glusterfs.git -

	Commit message (Collapse)	Author	Age	Files	Lines
*	glusterd: Brick process fails to come up with brickmux on	Vishal Pandey	2020-02-20	1	-1/+60
	Issue: 1. In a cluster of 3 nodes N1, N2, N3, create 3 volumes vol1, vol2, vol3 with 3 bricks each (one from each node). 2. Set cluster.brick-multiplex on. 3. Start all 3 volumes. 4. Check that all bricks on a node are running on the same port. 5. Kill N1. 6. Set performance.readdir-ahead for volumes vol1, vol2, vol3. 7. Bring N1 up and check volume status. 8. Not all brick processes are running on N1. Root cause: Since the volfile versions on N1 differ from those on N2 and N3, glusterd_import_friend_volume() is called. glusterd_import_friend_volume() copies the new_volinfo, deletes the old_volinfo and then calls glusterd_start_bricks(). glusterd_start_bricks() looks for the volfiles and sends an RPC request to glusterfs_handle_attach(). Because glusterd_delete_stale_volume() has already removed the volinfo from the priv->volumes list, and glusterd_create_volfiles_and_notify_services() and glusterd_list_add_order() are called only after glusterd_start_bricks(), the attach RPC request gets an empty volfile path, which causes the brick to crash. Fix: Call glusterd_list_add_order() and glusterd_create_volfiles_and_notify_services() before glusterd_start_bricks() is called in glusterd_import_friend_volume(). Change-Id: Idfe0e8710f7eb77ca3ddfa1cabeb45b2987f41aa Fixes: bz#1773856 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	afr: prevent spurious entry heals leading to gfid split-brain	Ravishankar N	2020-02-18	2	-14/+62
	Problem: In a hyperconverged setup with granular-entry-heal enabled, if a file is recreated while one of the bricks is down, and an index heal is triggered (with the brick still down), entry self-heal was doing a spurious heal with just the 2 good bricks. It was doing a post-op leading to removal of the filename from .glusterfs/indices/entry-changes as well as erroneous setting of afr xattrs on the parent. When the brick came up, the xattrs were cleared, resulting in the renamed file not getting healed and leading to gfid split-brain and EIO on the mount. Fix: Proceed with entry heal only when shd can connect to all bricks of the replica, just like in data and metadata heal. fixes: bz#1801624 Change-Id: I916ae26ad1fabf259bc6362da52d433b7223b17e Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	feature/changelog: Avoid thread creation if xlator is not enabled	Mohit Agrawal	2020-02-09	2	-3/+2
	Problem: Changelog creates threads even if the changelog is not enabled. Background: The changelog xlator broadly does two things: 1. Journalling (consumers are geo-rep and glusterfind). 2. Event notification for registered events like open, release etc. (consumers are bitrot and geo-rep). The existing option "changelog.changelog" controls journalling; there is no option to control event notification, which is enabled by default. So when bitrot/geo-rep is not enabled on the volume, the threads and resources (rpc and rbuf) related to event notification consume memory and CPU cycles unnecessarily. Solution: Have two different options as below. 1. changelog-notification: event notifications. 2. changelog: journalling. This patch introduces the option "changelog-notification", which is not exposed to the user. When either bitrot or changelog (journalling) is enabled, it internally enables 'changelog-notification'. But once 'changelog-notification' is enabled, it will not be disabled for the lifetime of the brick process, even after bitrot and changelog are disabled. As of now, rpc resource cleanup has a lot of races and is difficult to do cleanly. If allowed, it leads to memory leaks and crashes on enable/disable of bitrot or changelog (journal) in a loop. Hence, to be safer, event notification is not disabled within the lifetime of the process once enabled. Change-Id: Ifd00286e0966049e8eb9f21567fe407cf11bb02a Updates: #475 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	dht: Fix stale-layout and create issue	Susant Palai	2020-02-09	1	-0/+69
	Problem: With lookup-optimize set to on by default, a client with a stale layout can create a new file on the wrong subvol. This can lead to duplicate files if two different clients attempt to create the same file with two different layouts. Solution: Send the in-memory layout to be cross-checked at posix before committing a "create". In case of a mismatch, sync the client layout with that of the server and attempt the create fop one more time. test: Manual, testcase (attached) fixes: bz#1786679 Change-Id: Ife0941f105113f1c572f4363cbcee65e0dd9bd6a Signed-off-by: Susant Palai <spalai@redhat.com>
*	tests: fix test failures for nfsnobody user and group	Sunil Kumar Acharya	2020-01-24	2	-4/+20
	The 'nfsnobody' user and group are merged with the 'nobody' user and group in RHEL8. Tests are modified to use the appropriate user and group. BUG: 1756900 Change-Id: I59863da2262283b00b1cb417d3652ebe29a36407 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
*	afr: restore timestamp of files during metadata heal	Sheetal Pamecha	2020-01-24	1	-0/+74
	For files: During metadata heal, we restore timestamps only for non-regular (char, block etc.) files. Extending it to regular files, as the timestamp can also be updated via the touch command. fixes: bz#1787274 Change-Id: I26fe4fb6dff679422ba4698a7f828bf62ca7ca18 Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
*	server: Mount fails after reboot 1/3 gluster nodes	Mohit Agrawal	2020-01-22	1	-0/+1
	Problem: When one server node of a 1x3 volume comes up after a reboot, the client is unmounted. This happens because the client gets an AUTH_FAILED event and calls fini for the graph. The client gets AUTH_FAILED because the brick is not yet attached to a graph at that moment. Solution: To avoid unmounting the client graph, return ENOENT from the server if the brick is not attached to the server at the time clients authenticate. Credits: Xavi Hernandez <xhernandez@redhat.com> Change-Id: Ie6fbd73cbcf23a35d8db8841b3b6036e87682f5e Fixes: bz#1793852 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
*	glusterd: default options after volume reset	Sanju Rakonde	2020-01-01	1	-0/+5
	Problem: The default option transport.address-family disappears from the volume info output after a volume reset. Cause: From 3.8.0 onwards the volume option transport.address-family has a default value, so any volume which is created will have this option set and volume info will show it in its output. But volume reset does not handle this option. Solution: In glusterd_enable_default_options(), add this option along with the other default options. This function is called by glusterd_options_reset() on the volume reset command. fixes: bz#1786478 Change-Id: I58f7aa24cf01f308c4efe6cae748cc3bc8b99b1d Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
*	posix: Avoid disk space error in case of overwriting the data	Mohit Agrawal	2020-01-01	1	-0/+1
	Problem: Fops like posix_writev, posix_fallocate and posix_zerofile sometimes fail with ENOSPC once the storage.reserve threshold has been reached, even when the fop is only overwriting existing data. Solution: Retry the fop in case of an overwrite if the disk space check fails. Credits: kinsu <vpolakis@gmail.com> Change-Id: I987d73bcf47ed1bb27878df40c39751296e95fe8 Updates: #745 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
*	tests/fuse/bug-965974.t: turn off md-cache, to fix failure of the test	Raghavendra G	2019-12-19	1	-3/+2
	Imagine the following set of operations: 1. touch $M0/file 2. ln $M0/file $M0/file.lnk 3. rm $M1/file 4. wait for md-cache-timeout 5. stat $M0/file.lnk $M0/file. The stat on $M0/file in step 5 succeeds when md-cache-timeout is non-zero even though the file was removed from $M1. The reason is: 1. fuse-bridge on $M0 resolves both file and file.lnk to the same inode. This inode i1 is cached in md-cache. 2. After md-cache-timeout, the lookup on $M0/file.lnk is sent to the backend. This lookup succeeds as file.lnk is present on the backend, and it refreshes the md-cache. 3. The lookup on $M0/file, sent within md-cache-timeout after the lookup on $M0/file.lnk, is sent with i1. Since i1 was refreshed by the lookup on $M0/file.lnk, the cache is deemed valid and md-cache answers the lookup on $M0/file with success. To fix this failure we can either: 1. remove the lookup on $M0/file.lnk, so that the lookup on $M0/file is actually sent to the backend, or 2. turn off md-cache. This patch chooses option 2. credits: Csaba Henk <csaba@redhat.com> Change-Id: I352c2acd377fe10c4bdf3b6e53c1de86a4e544c7 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Updates: bz#1756900
*	[Cli] Removing old log rotate command.	kshithijiyer	2019-12-17	2	-2/+3
	The old command for log rotate is still present; removing it completely. Also adding a testcase to test the log rotate command with both the old and the new syntax, and fixing testcases which use the old syntax to use the new one. Code to be removed: 1. In cli-cmd-volume.c from struct cli_cmd volume_cmds[]: {"volume log rotate <VOLNAME> [BRICK]", cli_cmd_log_rotate_cbk, "rotate the log file for corresponding volume/brick" " NOTE: This is an old syntax, will be deprecated from next release."}, 2. In cli-cmd-volume.c from cli_cmd_log_rotate_cbk(): ||(strcmp("rotate", words[2]) == 0))) 3. In cli-cmd-parser.c from cli_cmd_log_rotate_parse(): if (strcmp("rotate", words[2]) == 0) volname = (char *)words[3]; else fixes: bz#1750387 Change-Id: I56e4d295044e8d5fd1fc0d848bc87e135e9e32b4 Signed-off-by: kshithijiyer <kshithij.ki@gmail.com>
*	glusterd: Client Handling of Elastic Clusters	Mohit Agrawal	2019-11-12	1	-0/+60
	Configure the list of gluster servers in the key GLUSTERD_BRICK_SERVERS at the time of the GETSPEC RPC call and read the value on the client side to update the volfile server list, so that the client can connect to the next volfile server if the current volfile server is down. Updates #741 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Change-Id: I23f36ddb92982bb02ffd83937a8bd8a2c97e8104
*	cli: display detailed rebalance info	Sanju Rakonde	2019-11-12	1	-0/+9
	Problem: When one of the nodes in the cluster is down, rebalance status does not display detailed information. Cause: In glusterd_volume_rebalance_use_rsp_dict() we aggregate the rsp from all the nodes into a dictionary and send it to the cli for printing. While assigning an index to keys we consider all the peers instead of only the peers which are up. Because of this, the index never reaches 1. While parsing the rsp, the cli is unable to find the status-1 key in the dictionary and exits without printing any information. Solution: The simplest fix without much code change is to continue to look for other keys when the status-1 key is not found. fixes: bz#1764119 Change-Id: I0062839933c9706119eb85416256eade97e976dc Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
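A minimal Python sketch of the parsing change described above, not the actual cli code: the response dictionary is modeled as a plain dict, the key `status-<i>` follows the commit message, and `node-name-<i>` and the function name are hypothetical. The point is only that a missing index is skipped instead of ending the scan.

```python
def collect_rebalance_rows(rsp, count):
    """Gather (node, status) rows from an aggregated response dict.

    `rsp` stands in for the response dictionary; `count` is the number
    of indices glusterd may have used. Key names other than "status-<i>"
    are assumptions made for this illustration.
    """
    rows = []
    for i in range(1, count + 1):
        status = rsp.get(f"status-{i}")
        if status is None:
            # Before the fix the scan stopped here; now it moves on
            # to the next index because a down peer leaves a hole.
            continue
        rows.append((rsp.get(f"node-name-{i}"), status))
    return rows
```

With a response that only carries index 2 (because the peer that would have been index 1 is down), the row for index 2 is still returned.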
*	ssl/test: Change the rsa key length to 2048	Mohammed Rafi KC	2019-10-29	2	-2/+2
	On a rhel-8 machine, we need to have a key length greater than or equal to 2048. So changing the values to 2048 to pass the test. Credits: Mohit Agrawal <moagrawal@redhat.com> Change-Id: I0f21db4d737203d0b2e44e7e61f50ae1279795ad Updates: bz#1756900 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	tests: Fix spurious failure	Pranith Kumar K	2019-10-16	1	-2/+2
	fixes: bz#1759002 Change-Id: I4d49e1c2ca9b3c1d74b9dd5a30f1c66983a76529 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	tests: Specify bs for dd	Pranith Kumar K	2019-10-16	1	-1/+1
	On some distros dd with the default bs is very slow and the test takes close to 2 minutes instead of 20 seconds. fixes: bz#1761769 Change-Id: If10d595a7ca05f053237f3c5ffbb09c5151eab35 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	tests: mark tests/bugs/glusterd/quorum-value-check.t as NFS test	Sanju Rakonde	2019-10-15	1	-0/+2
	Fixes: bz#1665358 Change-Id: Iea000dd839d4e4dbef45941f97ab3725a2aa1726 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
*	glusterd: rebalance start should fail when quorum is not met	Sanju Rakonde	2019-10-10	1	-0/+2
	Rebalance start should not succeed if quorum is not met. This patch adds a condition to check whether quorum is met in the pre-validation stage. fixes: bz#1760467 Change-Id: Ic7d0d08f69e4bc6d5e7abae713ec1881531c8ad4 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
*	Fix spurious failure in bug-1744548-heal-timeout.t	Pranith Kumar K	2019-10-09	1	-6/+11
	The script was assuming that the heal would have been triggered by the time the test was executed, which may not be the case. It can lead to the following failures when the race happens: ... 18:29:45 not ok 14 [ 85/ 1] < 26> '[ 331 == 333 ]' -> '' ... 18:29:45 not ok 16 [ 10097/ 1] < 33> '[ 668 == 666 ]' -> '' The heal on the 3rd brick didn't start completely the first time the command was executed, so the extra count got added to the next profile info. Fixed it by depending on cumulative stats and waiting until the count is satisfied using EXPECT_WITHIN. fixes: bz#1759002 Change-Id: I3b410671c902d6b1458a757fa245613cb29d967d Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	cluster/afr: Heal entries when there is a source & no healed_sinks	karthik-us	2019-10-09	1	-0/+89
	Problem: In a situation where B1 blames B2, B2 blames B1 and B3 doesn't blame anything for entry heal, heal will not complete even though we have clear source and sinks. This happens because in afr_selfheal_find_direction() only the bricks which are blamed by non-accused bricks are considered as sinks. Later, when __afr_selfheal_entry_finalize_source() tries to mark all the non-sources as sinks, it fails to do so because there won't be any healed_sinks marked, no witness present and there will be a source. Fix: If there is a source and no healed_sinks, then reset all the locked sources to 0 and healed sinks to 1 to do a conservative merge. Change-Id: If40d8bc95d52a52b2730f55bdcf135109b421548 Fixes: bz#1749322 Signed-off-by: karthik-us <ksubrahm@redhat.com>
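The fix rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the afr code: `sources`, `healed_sinks` and `locked_on` are modeled as per-brick lists of 0/1 flags, and the function name is hypothetical.

```python
def force_conservative_merge(sources, healed_sinks, locked_on):
    """Apply the rule from the commit message on copies of the flag lists.

    If at least one source exists but no healed sink was marked, every
    locked brick stops being a source and becomes a healed sink, which
    is what drives a conservative merge.
    """
    sources = list(sources)
    healed_sinks = list(healed_sinks)
    if any(sources) and not any(healed_sinks):
        for brick, locked in enumerate(locked_on):
            if locked:
                sources[brick] = 0
                healed_sinks[brick] = 1
    return sources, healed_sinks
```

For the B1/B2/B3 case in the message (only B3 is a source, nothing marked as healed sink, all three bricks locked), the result is no sources and all three bricks as healed sinks.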
*	tests: Fix spurious failure in bug-1134691-afr-lookup-metadata-heal.t	Ravishankar N	2019-10-09	1	-0/+2
	Problem: The .t was examining the sink brick's iatt value before the launched client-side metadata heal got a chance to complete. Fix: Wait for heal completion. Fixes: bz#1759081 Change-Id: I4dd4e3a1cccf35fd18e8cdfea6aa76a726a4763b Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	afr: support split-brain CLI for replica 3	Ravishankar N	2019-10-09	1	-0/+111
	Ever since we added quorum checks for lookups in afr via commit bd44d59741bb8c0f5d7a62c5b1094179dd0ce8a4, the split-brain resolution commands would not work for replica 3 because there would be no readables for the lookup fop. The argument was that split-brains do not occur in replica 3, but we do see (data/metadata) split-brain cases once in a while, which indicates that there are a few bugs/corner cases yet to be discovered and fixed. Fortunately, commit 8016d51a3bbd410b0b927ed66be50a09574b7982 added GF_CLIENT_PID_GLFS_HEALD as the pid for all fops made by glfsheal. If we leverage this and allow lookups in afr when the pid is GF_CLIENT_PID_GLFS_HEALD, split-brain resolution commands will work for replica 3 volumes too. Likewise, the check is added in shard_lookup as well to permit resolving split-brains by specifying "/.shard/shard-file.xx" as the file name (which previously used to fail with EPERM). Change-Id: I3c543dea79caf7cfbc1633e9089cb1cdd2538ba9 Fixes: bz#1756938 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	tests: add a pending test case	Amar Tumballi	2019-10-03	1	-1/+17
	While merging the protocol handshake fixes (bz#1620580), there was a case which was left out. Adding it separately now. Change-Id: I52133d5fe160b4567400a65e60aac8f7bc20697f Updates: bz#1193929 Signed-off-by: Amar Tumballi <amarts@gmail.com>
*	ssl: fix RHEL8 regression failure	Sanju Rakonde	2019-10-01	1	-1/+1
	This test is failing with "SSL routines:SSL_CTX_use_certificate:ee key too small" in RHEL8. This change is made according to https://access.redhat.com/solutions/4157431 updates: bz#1756900 Change-Id: Ib436372c3bd94bcf7324976337add7da4088b3d5 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
*	protocol/handshake: pass volume-id for extra check	Amar Tumballi	2019-09-30	1	-0/+51
	With the added check of volume-id during handshake, we can be sure not to connect to a brick if it gets re-used in another volume. This prevents accidental issues which can happen with a stale client process lurking along. Also added a test case for the same volume name fetching a different volfile (i.e. different bricks, different type), and for a different volume name with the same brick. For reference: Currently a client<->server handshake happens in glusterfs through the protocol/client translator (setvolume) to protocol/server using a dictionary which contains many keys. Rejection happens on the server side if some of the required keys are missing in the handshake dictionary. Till now, there was no single unique identifier that let a client tell the server which volume it expects to be talking to. All we look at in protocol/client is a key called 'remote-subvolume', which should match a subvolume name in the server volume file, and for any volume with the same brick name (which can be present in the same cluster due to recreation), it would be the same. This could cause a major issue: a client connected to a given brick in one volume would be connected to another volume's brick if the brick is re-created/re-used. To prevent this behavior, we are now passing along 'volume-id' in the handshake, which is preserved for the life of the client process and prevents these accidental connections. NOTE: This behavior wouldn't be applicable for user-snapshot enabled volumes, as snapshotted volumes would have a different volume-id. Fixes: bz#1620580 Change-Id: Ie98286e94ce95ae09c2135fd6ec7d7c2ca1e8095 Signed-off-by: Amar Tumballi <amarts@redhat.com>
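As a rough illustration of the extra handshake check, here is a Python sketch. It is not the protocol/server code: the handshake dictionary and the brick are modeled as plain dicts, the function name is hypothetical, and accepting a handshake that carries no 'volume-id' at all (for example from an older client) is an assumption of this sketch, not something the commit message states.

```python
def accept_setvolume(handshake, brick):
    """Decide whether a client handshake may attach to this brick.

    `handshake` mimics the setvolume dictionary; `brick` holds the
    brick's subvolume name and the id of the volume it belongs to.
    """
    # Existing check: the requested subvolume name must match.
    if handshake.get("remote-subvolume") != brick["name"]:
        return False
    # New check: a client that remembers a volume-id must not be
    # attached to a brick that now belongs to a different volume.
    client_volume_id = handshake.get("volume-id")
    if client_volume_id is not None and client_volume_id != brick["volume-id"]:
        return False
    return True
```

A stale client that still carries the id of the old volume is rejected even though the brick name matches.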
*	tests/shard: Remove dependence on distributed cache	Pranith Kumar K	2019-09-27	1	-3/+3
	fixes: bz#1756211 Change-Id: Iee5b37af89ab624c16a45df364806003238280e5 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	ctime/rebalance: Heal ctime xattr on directory during rebalance	Kotresh HR	2019-09-16	1	-0/+6
	After add-brick and rebalance, the ctime xattr is not present on rebalanced directories on the new brick. This patch fixes that. Note that ctime still doesn't support consistent time across distribute sub-volumes. This patch also fixes the in-memory inconsistency of time attributes when metadata is self-healed. Change-Id: Ia20506f1839021bf61d4753191e7dc34b31bb2df fixes: bz#1734026 Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	protocol/client: don't reopen fds on which POSIX locks are held after a reconnect	Raghavendra G	2019-09-12	1	-0/+63
	Bricks clean up any granted locks after a client disconnects, and currently these locks are not healed after a reconnect. This means that post reconnect a competing process could be granted a lock even though the first process which was granted locks has not unlocked. By not re-opening fds, subsequent operations on such fds will fail, forcing the application to close the current fd and reopen a new one. This way we prevent any silent corruption. A new option "client.strict-locks" is introduced to control this behaviour. This option is set to "off" by default. Change-Id: Ieed545efea466cb5e8f5a36199aa26380c301b9e Signed-off-by: Raghavendra G <rgowdapp@redhat.com> updates: bz#1694920
*	api: Cleanup of executable not done	Sheetal	2019-09-12	1	-1/+1
	In test tests/bugs/gfapi/bug-1447266/bug-1447266.t the actual file created is tests/bugs/gfapi/bug-1447266/bug-1447266, which is not cleaned up later. fixes: bz#1750618 Change-Id: I93120418e54b95018a7213d106a1f1c990766281 Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
*	tests: Fix spurious failure	Pranith Kumar K	2019-09-11	1	-2/+20
	If heal from the next brick starts after the first brick completes heal, then opendir on the brick can change atime, leading to failure of the test. When ctime is disabled it is better to just check that mtime is the same after heal. fixes: bz#1751134 Change-Id: Ia03e30fd547e6bbe85c1e299845ffa122f3a2692 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	afr/lookup: Pass xattr_req in while doing a selfheal in lookup	Mohammed Rafi KC	2019-09-05	1	-0/+52
	We were not passing xattr_req when doing a name self-heal as well as a metadata heal. Because of this, some xdata was missing, which causes I/O errors. Change-Id: Ibfb1205a7eb0195632dc3820116ffbbb8043545f Fixes: bz#1728770 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	tests: fix spurious failure of bug-1402841.t-mt-dir-scan-race.t	Ravishankar N	2019-09-04	1	-4/+5
	Problem: Since commit 600ba94183333c4af9b4a09616690994fd528478, shd starts healing as soon as it is toggled from disabled to enabled. This was causing the following line in the .t to fail on a 'fast' machine (always on my laptop and sometimes on the jenkins slaves): EXPECT_NOT "^0$" get_pending_heal_count $V0, because by the time shd was disabled, the heal was already completed. Fix: Increase the no. of files to be healed and make it a variable called FILE_COUNT, should we need to bump it up further because the machines become even faster. Also created pending metadata heals to increase the time taken to heal a file. fixes: bz#1748744 Change-Id: I5a26b08e45b8c19bce3c01ce67bdcc28ed48198d Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	afr: wake up index healer threads	Ravishankar N	2019-08-30	1	-0/+42
	...whenever shd is re-enabled after disabling or there is a change in `cluster.heal-timeout`, without needing to restart shd or waiting for the current `cluster.heal-timeout` seconds to expire. See BZ 1743988 for more details. Change-Id: Ia5ebd7c8e9f5b54cba3199c141fdd1af2f9b9bfe fixes: bz#1744548 Reported-by: Glen Kiessling <glenk1973@hotmail.com> Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	performance/md-cache: Do not skip caching of null character xattr values	Anoop C S	2019-08-20	1	-0/+22
	A null character string is a valid xattr value in a file system. But for those xattrs processed by md-cache, it does not update its entries if the value is null ('\0'). This results in ENODATA when those xattrs are queried afterwards via getxattr(), causing failures in basic operations like create, copy etc. in a specially configured Samba setup for Mac OS clients. On the other side, snapview-server internally sets an empty string ("") as the value for xattrs received as part of listxattr(), and these are not intended to be cached. Therefore we try to maintain that behaviour using an additional dictionary key to prevent updating of entries in the getxattr() and fgetxattr() callbacks in md-cache. Credits: Poornima G <pgurusid@redhat.com> Change-Id: I7859cbad0a06ca6d788420c2a495e658699c6ff7 Fixes: bz#1726205 Signed-off-by: Anoop C S <anoopcs@redhat.com>
*	glusterd: ./tests/bugs/glusterd/bug-1595320.t is failing	Mohit Agrawal	2019-08-19	1	-1/+1
	Problem: ./tests/bugs/glusterd/bug-1595320.t sometimes fails when checking brick_process after sending a kill signal to the brick process. Solution: Wait for some time after sending the kill signal to make sure the brick process has stopped. Change-Id: Iee9e91284618abfc62a550d47e4f9117785def58 Fixes: bz#1743200 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	tests: mark bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t as BRICK_MUX_BAD_TEST	Atin Mukherjee	2019-08-19	1	-0/+1
	Updates: bz#1743069 Change-Id: I1eea0186ca0c1b1226f4b3d0d7c0e41fc7821cbd Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
*	afr: restore timestamp of parent dir during entry-heal	Ravishankar N	2019-08-14	1	-0/+78
	Fixes: bz#1734370 Change-Id: I29e338bac62104233a6f80212df8d0fb016affda Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	features/shard: Send correct size when reads are sent beyond file size	Krutika Dhananjay	2019-08-12	1	-0/+29
	Change-Id: I0cebaaf55c09eb1fb77a274268ff564e871b743b fixes bz#1738419 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
*	tests: fix bug-880898.t crash	Ravishankar N	2019-08-12	1	-0/+7
	Problem: https://build.gluster.org/job/centos7-regression/7337/consoleFull indicates the shd crashing for this .t. On looking at the core, I see the crash is at the time of shd init and the glusterfs context is null: (gdb) bt (gdb) p ctx $2 = (glusterfs_ctx_t *) 0xf00000000 The .t is killing all gluster processes immediately after volume start, so it looks like a race between shd coming up and it being killed. Fix: Kill gluster processes only after they are up and running. Fixes: bz#1740017 Change-Id: I7cf589201669bd9f535e968d147015dc99e9a4b6 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	tests: Fix bug-1717819-metadata-split-brain-detection.t failure	karthik-us	2019-07-15	1	-0/+6
	Problem: tests/bugs/replicate/bug-1717819-metadata-split-brain-detection.t fails intermittently in test cases #49 & #50, which compare the values of the user-set xattrs after enabling the heal. We are not waiting for the heal to complete before comparing those values, which might cause those tests to fail. Fix: Wait till the HEAL-TIMEOUT before comparing the xattr values. Also checking for the shd to come up and the bricks to connect to the shd process in another case. Change-Id: I0e245b328da9df23ce70c5300278fad1c1d9f7ff Fixes: bz#1729847 Signed-off-by: karthik-us <ksubrahm@redhat.com>
*	cluster/afr: Fix incorrect reporting of gfid & type mismatch	karthik-us	2019-07-12	1	-0/+116
	Problems: 1. When checking for type and gfid mismatch, if the type or gfid is unknown because of a missing gfid handle and gfid xattr, it will be reported as a type or gfid mismatch and the heal will not complete. 2. If the source selected during entry heal has a null gfid, the same will be sent to afr_lookup_and_heal_gfid(). In this function, when we try to assign the gfid on the bricks where it does not exist, we take that same gfid and try to assign it on those bricks. This will fail in posix_gfid_set() since the gfid sent is null. Fix: If the gfid sent to afr_lookup_and_heal_gfid() is null, choose a valid gfid before proceeding to assign the gfid on the bricks where it is missing. In afr_selfheal_detect_gfid_and_type_mismatch(), do not report type/gfid mismatch if the type/gfid is unknown or not set. Change-Id: Ia06552e4dc4a9f89cb7f5302833604bd21bbf7da fixes: bz#1722507 Signed-off-by: karthik-us <ksubrahm@redhat.com>
*	posix: modify storage.reserve option to take size and percent	Sheetal Pamecha	2019-06-26	1	-17/+12
	* reverting changes made in https://review.gluster.org/#/c/glusterfs/+/21686/ * Now storage.reserve can take value in percent or bytes fixes: bz#1651445 Change-Id: Id4826210ec27991c55b17d1fecd90356bff3e036 Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
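To make the "percent or bytes" behaviour concrete, here is a small Python sketch of how such a value could be turned into a byte count. It is an assumption-laden illustration, not the posix xlator's parser: the function name is hypothetical, only a trailing `%` and a plain integer byte count are handled, and any size suffixes the real option may accept are left out.

```python
def reserve_in_bytes(value, disk_size_bytes):
    """Convert a storage.reserve style value into a byte count.

    "10%" is read as a percentage of the disk size; anything else is
    read as an absolute number of bytes.
    """
    value = value.strip()
    if value.endswith("%"):
        percent = float(value[:-1])
        if not 0 <= percent <= 100:
            raise ValueError(f"percentage out of range: {value}")
        return int(disk_size_bytes * percent / 100)
    reserved = int(value)
    if reserved < 0:
        raise ValueError(f"negative size: {value}")
    return reserved
```

Under these assumptions, `reserve_in_bytes("10%", 1000)` reserves 100 bytes and `reserve_in_bytes("4096", 1000)` reserves 4096 bytes regardless of the disk size.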
*	glusterd: Can't run rebalance due to long unix socket	Mohit Agrawal	2019-06-25	1	-0/+50
	Problem: glusterd builds the unix socket file name from the volname, and if the volname is long the socket system calls fail because the name exceeds the maximum length defined in the kernel. Solution: Convert the unix socket name to a hash to resolve the issue. Change-Id: I5072e8184013095587537dbfa4767286307fff65 fixes: bz#1720566 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
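The idea of hashing the name can be sketched as follows in Python. Everything here is illustrative: the directory layout, the file name pattern, the function name and the use of SHA-256 are assumptions, not what glusterd does. The 108-byte limit is the size of `sun_path` in `struct sockaddr_un` on Linux.

```python
import hashlib

UNIX_PATH_MAX = 108  # sizeof(sun_path) on Linux, including the trailing NUL


def rebalance_socket_path(run_dir, volname):
    """Build a socket path whose length does not grow with the volname.

    The volume name is replaced by a fixed-width digest, so a very long
    volname can no longer push the path past the kernel limit.
    """
    digest = hashlib.sha256(volname.encode("utf-8")).hexdigest()[:16]
    path = f"{run_dir}/gluster-rebalance-{digest}.sock"
    if len(path.encode("utf-8")) >= UNIX_PATH_MAX:
        raise ValueError("run_dir itself is too long for a unix socket path")
    return path
```

Two volumes with different names still get different paths (barring digest collisions), while the path length stays constant.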
*	test: Fix spurious failures in bug-1040275-brick-uid-reset-on-volume-restart.t	Mohit Agrawal	2019-06-25	1	-0/+8
	Problem: The test case fails right after starting the volume, when running the stat command on the mount point: the client gets the error transport endpoint is not connected. Solution: To avoid the error, make sure all brick instances are up and the mount point is active. Change-Id: I49553a04d5b13e155ee02f4a1888a07fe3ee2ff5 fixes: bz#1721590 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	encryption/crypt: remove from volume file	Amar Tumballi	2019-06-20	1	-43/+0
	The feature is not supported and was moved out of the codebase in the glusterfs-5.x release. It doesn't make sense to keep the code to support it. Those who want to upgrade from a version supporting it to a higher version should do a 'gluster volume reset $VOL encryption reset' and then continue with the upgrade process. updates: bz#1648169 Change-Id: I8cf822c0d7195940bd37f6af2432a3cac68d44d1 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	glusterd: add GF_TRANSPORT_BOTH_TCP_RDMA in glusterd_get_gfproxy_client_volfile	Atin Mukherjee	2019-06-17	1	-1/+3
	... without which volume creation fails with "volume create: <xyz>: failed: Failed to create volume files" Fixes: bz#1716812 Change-Id: I2f4c2c6d5290f066b54e1c1db19e25db9937bedb Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
*	tests: Add missing NFS test tag to the testfile	Aravinda VK	2019-06-15	1	-0/+2
	$SRC/glusterfs/bugs/nfs/showmount-many-clients.t Change-Id: I48758cc66fcb55f48c4a8a0a738b06867f6814a1 Signed-off-by: Aravinda VK <avishwan@redhat.com> Updates: bz#1193929
*	Cluster/afr: Don't treat all bricks having metadata pending as split-brain	karthik-us	2019-06-10	2	-64/+130
	Problem: We currently don't have a roll-back/undoing of post-ops if quorum is not met. Though the FOP is still unwound with failure, the xattrs remain on the disk. Due to these partial post-ops and partial heals (healing only when 2 bricks are up), we can end up in metadata split-brain purely from the afr xattrs point of view, i.e. each brick is blamed by at least one of the others for metadata. These scenarios are hit when there is frequent connect/disconnect of the client/shd to the bricks. Fix: Pick a source based on the xattr values. If 2 bricks blame one, the blamed one must be treated as sink. If there is no majority, all are sources. Once we pick a source, self-heal will then do the heal instead of erroring out due to split-brain. This patch also adds the restriction that all the bricks must be up to perform metadata heal, to avoid any metadata loss. Removed the test case tests/bugs/replicate/bug-1468279-source-not-blaming-sinks.t as it was doing metadata heal even when only 2 of 3 bricks were up. Change-Id: I07a9d62f84ceda329dcab1f02a33aeed258dcb09 fixes: bz#1717819 Signed-off-by: karthik-us <ksubrahm@redhat.com>
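The source-selection rule in the fix can be illustrated with a short Python sketch for a replica 3 set. It is a simplified model, not the afr implementation: the pending xattrs are reduced to a boolean blame matrix, the function name is hypothetical, and the case where every brick is blamed by two others is not covered by the commit message, so the sketch simply reports no sources there.

```python
def pick_metadata_sources(blames):
    """Split replica-3 bricks into sources and sinks.

    blames[i][j] is True when brick i blames brick j for metadata.
    A brick blamed by 2 other bricks is a sink; if no brick reaches
    that majority, all bricks are treated as sources.
    """
    bricks = range(len(blames))
    sinks = []
    for j in bricks:
        accusers = sum(1 for i in bricks if i != j and blames[i][j])
        if accusers >= 2:
            sinks.append(j)
    sources = [b for b in bricks if b not in sinks]
    return sources, sinks
```

For a circular blame pattern (0 blames 1, 1 blames 2, 2 blames 0) no brick has two accusers, so all three are sources; when bricks 0 and 1 both blame brick 2, brick 2 becomes the only sink.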
*	features/shard: Fix extra unref when inode object is lru'd out and added back	Krutika Dhananjay	2019-06-09	1	-0/+34
	Long tale of double unref! But do read...
	In cases where a shard base inode is evicted from the lru list while still being part of the fsync list, but added back soon before its unlink, there could be an extra inode_unref(), leading to premature inode destruction and a crash.
	One such specific case is the following. Consider features.shard-deletion-rate = features.shard-lru-limit = 2. This is an oversimplified example but explains the problem clearly.
	First, a file is FALLOCATE'd to a size such that the number of shards under /.shard = 3 > lru-limit. Shards 1, 2 and 3 need to be resolved. 1 and 2 are resolved first.
	Resultant lru list: 1 -----> 2
	refs on base inode: (1) + (1) = 2
	3 needs to be resolved. So 1 is lru'd out.
	Resultant lru list: 2 -----> 3
	refs on base inode: (1) + (1) = 2
	Note that 1 is inode_unlink()d but not destroyed because there are non-zero refs on it since it is still participating in this ongoing FALLOCATE operation.
	FALLOCATE is sent on all participant shards. In the cbk, all of them are added to the fsync list.
	Resulting fsync list: 1 -----> 2 -----> 3 (order doesn't matter)
	refs on base inode: (1) + (1) + (1) = 3
	Total refs = 3 + 2 = 5
	Now an attempt is made to unlink this file. Background deletion is triggered. The first $shard-deletion-rate shards need to be unlinked in the first batch. So shards 1 and 2 need to be resolved. inode_resolve fails on 1 but succeeds on 2, and so it's moved to the tail of the list.
	lru list now: 3 -----> 2
	No change in refs.
	Shard 1 is looked up. In lookup_cbk, it's linked and added back to the lru list at the cost of evicting shard 3.
	lru list now: 2 -----> 1
	refs on base inode: (1) + (1) = 2
	fsync list now: 1 -----> 2 (again order doesn't matter)
	refs on base inode: (1) + (1) = 2
	Total refs = 2 + 2 = 4
	After eviction, it is found 3 needs fsync. So fsync is wound, yet to be ack'd. So it is still inode_link()d.
	Now deletion of shards 1 and 2 completes. lru list is empty. Base inode unref'd and destroyed.
	In the next batched deletion, 3 needs to be deleted. It is inode_resolve()able. It is added back to the lru list, but the base inode passed to __shard_update_shards_inode_list() is NULL since the inode is destroyed. But its ctx->inode still contains the base inode ptr from the first addition to the lru list, with no additional ref on it.
	lru list now: 3
	refs on base inode: (0)
	Total refs on base inode = 0
	Unlink is sent on 3. It completes. Now since the ctx contains a ptr to base_inode and the shard is part of the lru list, base shard is unref'd, leading to a crash.
	FIX: When a shard is added back to the lru list, copy the base inode pointer as is into its inode ctx, even if it is NULL. This is needed to prevent double unrefs at the time of deleting it.
	Change-Id: I99a44039da2e10a1aad183e84f644d63ca552462 Updates: bz#1696136 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
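The effect of the fix described in this commit message (overwrite the base inode pointer in the shard's ctx when the shard re-enters the lru list, even when the new value is NULL) can be illustrated with a toy reference-counting model. This is a sketch under stated assumptions, not the shard translator's code: `BaseInode`, `ShardCtx`, `add_to_lru`, `evict_from_lru`, `delete_shard` and `run_scenario` are hypothetical names, and the model assumes eviction drops the lru ref while leaving the old pointer in the ctx.

```python
class BaseInode:
    """Toy refcounted object standing in for the base inode."""

    def __init__(self):
        self.refs = 0
        self.destroyed = False

    def ref(self):
        self.refs += 1

    def unref(self):
        if self.destroyed:
            raise RuntimeError("unref on destroyed inode")
        self.refs -= 1
        if self.refs == 0:
            self.destroyed = True


class ShardCtx:
    """Toy per-shard context holding a pointer to the base inode."""

    def __init__(self):
        self.base = None
        self.in_lru = False


def add_to_lru(ctx, base, copy_as_is):
    """Add a shard to the lru list; take a ref on base when it is given."""
    if base is not None:
        base.ref()
        ctx.base = base
    elif copy_as_is:
        # Modeled fix: store the pointer as is, even when it is None.
        ctx.base = None
    ctx.in_lru = True


def evict_from_lru(ctx):
    """Drop the lru ref. Assumption: the stale pointer stays in the ctx."""
    if ctx.base is not None:
        ctx.base.unref()
    ctx.in_lru = False


def delete_shard(ctx):
    """Unlink a shard: unref the base inode if the ctx still points at it."""
    if ctx.in_lru and ctx.base is not None:
        ctx.base.unref()
    ctx.in_lru = False


def run_scenario(copy_as_is):
    base = BaseInode()
    base.ref()                       # ref held by the rest of the operation
    ctx = ShardCtx()
    add_to_lru(ctx, base, copy_as_is)
    evict_from_lru(ctx)              # shard is lru'd out
    base.unref()                     # other shards deleted: base destroyed
    add_to_lru(ctx, None, copy_as_is)  # re-added; base is already gone
    try:
        delete_shard(ctx)
    except RuntimeError:
        return "double unref"
    return "ok"
```

In this model `run_scenario(False)` ends in the extra unref on a destroyed inode, and `run_scenario(True)` completes cleanly because the stale pointer was replaced on re-addition.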
*	tests/utils: Fix py2/py3 util python scripts	Kotresh HR	2019-06-07	2	-2/+2
	The following files are fixed: tests/bugs/distribute/overlap.py tests/utils/changelogparser.py tests/utils/create-files.py tests/utils/gfid-access.py tests/utils/libcxattr.py Change-Id: I3db857cc19e19163d368d913eaec1269fbc37140 updates: bz#1193929 Signed-off-by: Kotresh HR <khiremat@redhat.com>