glusterfs.git -

	Commit message (Collapse)	Author	Age	Files	Lines
*	dht - fixing xattr inconsistency	Barak Sason Rofman	2020-06-25	1	-0/+54
\| \| \| \| \| \| \| \| \| \| \| \| \|	The scenario of setting an xattr to a dir, killing one of the bricks, removing the xattr, bringing back the brick results in xattr inconsistency - The downed brick will still have the xattr, but the rest won't. This patch add a mechanism that will remove the extra xattrs during lookup. fixes: #1324 Change-Id: Ibcc449bad6c7cb46bcae380e42e4496d733b453d Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
*	glusterd: add-brick command failure	Sanju Rakonde	2020-06-21	2	-3/+48
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: add-brick operation is failing when replica or disperse count is not mentioned in the add-brick command. Reason: with commit a113d93 we are checking brick order while doing add-brick operation for replica and disperse volumes. If replica count or disperse count is not mentioned in the command, the dict get is failing and resulting add-brick operation failure. fixes: #1306 Change-Id: Ie957540e303bfb5f2d69015661a60d7e72557353 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
*	mount/fuse: use cookies to get fuse-interrupt-record instead of xdata	Pranith Kumar K	2020-06-18	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: On executing tests/features/flock_interrupt.t the following error log appears [2020-06-16 11:51:54.631072 +0000] E [fuse-bridge.c:4791:fuse_setlk_interrupt_handler_cbk] 0-glusterfs-fuse: interrupt record not found This happens because fuse-interrupt-record is never sent on the wire by getxattr fop and there is no guarantee that in the cbk it will be available in case of failures. Fix: wind getxattr fop with fuse-interrupt-record as cookie and recover it in the cbk Fixes: #1310 Change-Id: I4cfff154321a449114fc26e9440db0f08e5c7daa Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	tests/glusterd: spurious failure of ↵	Sanju Rakonde	2020-06-17	1	-3/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t Test Summary Report ------------------- tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t (Wstat: 0 Tests: 23 Failed: 3) Failed tests: 21-23 After glusterd restart, volume start is failing. Looks like, it need some time to sync the data. Adding sleep for the same. Note: All other changes are made to avoid spurious failures in the future. fixes: #1272 Change-Id: Ib184757fb936e03b5b6208465e44a8e790b71c1c Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
*	afr: more quorum checks in lookup and new entry marking	Ravishankar N	2020-06-16	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: See github issue for details. Fix: -In lookup if the entry exists in 2 out of 3 bricks, don't fail the lookup with ENOENT just because there is an entrylk on the parent. Consider quorum before deciding. -If entry FOP does not succeed on quorum no. of bricks, do not perform new entry mark. Fixes: #1303 Change-Id: I56df8c89ad53b29fa450c7930a7b7ccec9f4a6c5 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	Indicate timezone offsets in timestamps	Csaba Henk	2020-06-15	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Logs and other output carrying timestamps will have now timezone offsets indicated, eg.: [2020-03-12 07:01:05.584482 +0000] I [MSGID: 106143] [glusterd-pmap.c:388:pmap_registry_remove] 0-pmap: removing brick (null) on port 49153 To this end, - gf_time_fmt() now inserts timezone offset via %z strftime(3) template. - A new utility function has been added, gf_time_fmt_tv(), that takes a struct timeval pointer (tv) instead of a time_t value to specify the time. If tv->tv_usec is negative, gf_time_fmt_tv(... tv ...) is equivalent to gf_time_fmt(... tv->tv_sec ...) Otherwise it also inserts tv->tv_usec to the formatted string. - Building timestamps of usec precision has been converted to gf_time_fmt_tv, which is necessary because the method of appending a period and the usec value to the end of the timestamp does not work if the timestamp has zone offset, but it's also beneficial in terms of eliminating repetition. - The buffer passed to gf_time_fmt/gf_time_fmt_tv has been unified to be of GF_TIMESTR_SIZE size (256). We need slightly larger buffer space to accommodate the zone offset and it's preferable to use a buffer which is undisputedly large enough. This change does not* do the following: - Retaining a method of timestamp creation without timezone offset. As to my understanding we don't need such backward compatibility as the code just emits timestamps to logs and other diagnostic texts, and doesn't do any later processing on them that would rely on their format. An exception to this, ie. a case where timestamp is built for internal use, is graph.c:fill_uuid(). As far as I can see, what matters in that case is the uniqueness of the produced string, not the format. - Implementing a single-token (space free) timestamp format. While some timestamp formats used to be single-token, now all of them will include a space preceding the offset indicator. Again, I did not see a use case where this could be significant in terms of representation. - Moving the codebase to a single unified timestamp format and dropping the fmt argument of gf_time_fmt/gf_time_fmt_tv. While the gf_timefmt_FT format is almost ubiquitous, there are a few cases where different formats are used. I'm not convinced there is any reason to not use gf_timefmt_FT in those cases too, but I did not want to make a decision in this regard. Change-Id: I0af73ab5d490cca7ed8d07a2ce7ac22a6df2920a Updates: #837 Signed-off-by: Csaba Henk <csaba@redhat.com>
*	features/shard: Use fd lookup post file open	Vinayakswami Hariharmath	2020-06-11	1	-0/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Issue: When a process has the open fd and the same file is unlinked in middle of the operations, then file based lookup fails with ENOENT or stale file Solution: When the file already open and fd is available, use fstat to get the file attributes Change-Id: I0e83aee9f11b616dcfe13769ebfcda6742e4e0f4 Fixes: #1281 Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
*	test: Test case brick-mux-validation-in-cluster.t is failing on RHEL-8	Mohit Agrawal	2020-06-09	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Brick process are not properly attached on any cluster node while some volume options are changed on peer node and glusterd is down on that specific node. Solution: At the time of restart glusterd it got a friend update request from a peer node if peer node having some changes on volume.If the brick process is started before received a friend update request in that case brick_mux behavior is not workingproperly. All bricks are attached to the same process even volumes options are not the same. To avoid the issue introduce an atomic flag volpeerupdate and update the value while glusterd has received a friend update request from peer for a specific volume.If volpeerupdate flag is 1 volume is started by glusterd_import_friend_volume synctask Change-Id: I4c026f1e7807ded249153670e6967a2be8d22cb7 Credit: Sanju Rakaonde <srakonde@redhat.com> fixes: #1290 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	cluster/afr: Delay post-op for fsync	Pranith Kumar K	2020-06-08	3	-0/+175
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: AFR doesn't delay post-op for fsync fop. For fsync heavy workloads this leads to un-necessary fxattrop/finodelk for every fsync leading to bad performance. Fix: Have delayed post-op for fsync. Add special flag in xdata to indicate that afr shouldn't delay post-op in cases where either the process will terminate or graph-switch would happen. Otherwise it leads to un-necessary heals when the graph-switch/process-termination happens before delayed-post-op completes. Fixes: #1253 Change-Id: I531940d13269a111c49e0510d49514dc169f4577 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	cluster/afr: Prioritize ENOSPC over other errors	karthik-us	2020-06-05	1	-0/+80
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: In a replicate/arbiter volume if file creations or writes fails on quorum number of bricks and on one brick it is due to ENOSPC and on other brick it fails for a different reason, it may fail with errors other than ENOSPC in some cases. Fix: Prioritize ENOSPC over other lesser priority errors and do not set op_errno in posix_gfid_set if op_ret is 0 to avoid receiving any error_no which can be misinterpreted by __afr_dir_write_finalize(). Also removing the function afr_has_arbiter_fop_cbk_quorum() which might consider a successful reply form a single brick as quorum success in some cases, whereas we always need fop to be successful on quorum number of bricks in arbiter configuration. Change-Id: I106e267f8b9451f681022f1cccb410d9bc824c08 Fixes: #1254 Signed-off-by: karthik-us <ksubrahm@redhat.com>
*	open-behind: rewrite of internal logic	Xavi Hernandez	2020-06-04	5	-0/+872
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There was a critical flaw in the previous implementation of open-behind. When an open is done in the background, it's necessary to take a reference on the fd_t object because once we "fake" the open answer, the fd could be destroyed. However as long as there's a reference, the release function won't be called. So, if the application closes the file descriptor without having actually opened it, there will always remain at least 1 reference, causing a leak. To avoid this problem, the previous implementation didn't take a reference on the fd_t, so there were races where the fd could be destroyed while it was still in use. To fix this, I've implemented a new xlator cbk that gets called from fuse when the application closes a file descriptor. The whole logic of handling background opens have been simplified and it's more efficient now. Only if the fop needs to be delayed until an open completes, a stub is created. Otherwise no memory allocations are needed. Correctly handling the close request while the open is still pending has added a bit of complexity, but overall normal operation is simpler. Change-Id: I6376a5491368e0e1c283cc452849032636261592 Fixes: #1225 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
*	io-cache,quick-read: deprecate volume options with flawed semantics or naming	Csaba Henk	2020-06-02	4	-7/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- performance.cache-size has a flawed semantics, as it's dispatched on two independent translators, io-cache and quick-read. - performance.qr-cache-timeout has a confusing name, as other options affecting quick-read have an unabbreviated "quick-read-..." prefix in their names. We keep these options with unchanged operation, but in the help output we indicate their deprecation. The following better alternatives are introduced: - performance.io-cache-size to tune cache-size option of io-cache - performance.quick-read-cache-size to tune cache-size option of quick-read - performance.quick-read-cache-timeout as a preferred synonym for performance.qr-cache-timeout Fixes: #952 Change-Id: Ibd04fb638de8cac450ba992ad8a415154f9f4281 Signed-off-by: Csaba Henk <csaba@redhat.com>
*	dht - sparse files rebalance enhancements	Barak Sason Rofman	2020-06-01	1	-0/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently data migration in rebalance reads sparse file sequentially, disregarding which segments are holes and which are data. This can lead to extremely long migration time for large sparse file. Data migration mechanism needs to be enhanced so only data segments are read and migrated. This can be achieved using lseek to seek for holes and data in the file. This enhancement is a consequence of https://bugzilla.redhat.com/show_bug.cgi?id=1823703 fixes: #1222 Change-Id: If5f448a0c532926464e1f34f504c5c94749b08c3 Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
*	features/shard: Aggregate file size, block-count before unwinding removexattr	Krutika Dhananjay	2020-05-26	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \| \|	Posix translator returns pre and postbufs in the dict in {F}REMOVEXATTR fops. These iatts are further cached at layers like md-cache. Shard translator, in its current state, simply returns these values without updating the aggregated file size and block-count. This patch fixes this problem. Change-Id: I4b2dd41ede472c5829af80a67401ec5a6376d872 Fixes: #1243 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
*	features/shard: Aggregate size, block-count in iatt before unwinding setxattr	Krutika Dhananjay	2020-05-21	1	-0/+31
\| \| \| \| \| \| \| \| \| \| \| \| \|	Posix translator returns pre and postbufs in the dict in {F}SETXATTR fops. These iatts are further cached at layers like md-cache. Shard translator, in its current state, simply returns these values without updating the aggregated file size and block-count. This patch fixes this problem. Change-Id: I4da0eceb4235b91546df79270bcc0af8cd64e9ea Fixes: #1243 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
*	mgmt/glusterd: Stop old shd before increasing replica count	Pranith Kumar K	2020-05-16	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: In add-brick that increases replica count SHD was restarted after pending xattrs are set on the new bricks and adding bricks. But before restarting SHD there is a possibility that old SHD would do a scan on root-directory see no heal is needed and delete index for root-dir leading to no heals until lookup is executed on the mount Fix: Stop shd, perform pending-xattr setting/adding new bricks and then restart shd Fixes: #1240 Change-Id: I94fd7c6c909211b597185dfe097a559db6c0d00f Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	tests: Disable client-heals	Pranith Kumar K	2020-05-15	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: ok 32 [ 11/ 9] < 46> 'gf_rm_file_and_gfid_link /d/backends/patchy0 del-file' not ok 33 [ 13/ 131] < 48> '! dd if=/dev/zero of=/mnt/glusterfs/0/del-file bs=1M count=1 oflag=direct' -> '' The assumption in the test above is that the file wouldn't exist when dd happens. But heal can lead to creation of the file in some cases leading to spurious failures. Fix: Disable client side heal. Fixes: #1245 Change-Id: I96b2b45528f9dfb3199d503a467cafafba9b387f Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	Fix ./tests/basic/fencing/afr-lock-heal-basic.t failure	Pranith Kumar K	2020-05-13	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In brick-mux tests, all bricks of the volume have same pid. "generate_brick_statedump" cleans up the older statedumps with same brick pid. So successive calls to this function will delete previous brick's statedump as all bricks share same pid. So grep calls to the statedump were failing leading to failure of the .t To fix this, stored the result we need from statedump before calling next brick's statedump Fixes: #1234 Change-Id: I824ed4dff79e7242b3e980364836b9af0e87a6ee Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	tests: skip tests on absence of reflink in xfs	Pranith Kumar K	2020-05-06	1	-7/+9
\| \| \| \| \| \|	Fixes: #1223 Change-Id: I36cb72d920ffd77405051546615c5262c392daef Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	test: improve geo-rep non root test	Sunny Kumar	2020-05-02	2	-9/+23
\| \| \| \| \| \| \| \| \|	Make sure bricks are up before mounting volume. Also made sure that mount is available before creating test data. Change-Id: I4915b837df46e43be5678dac8ae5602021c52685 Updates: #1197 Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
*	tests: fix georep-upgrade.t failure	Sunny Kumar	2020-04-29	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes georep-upgrade.t test failure. Problem: `TEST upgrade_script=$(find / -type f -name glusterfs-georep-upgrade.py)` Multiple files with the same name can exist for temp fix picking only the 1st result. The proper fix should be finding a proper place for this upgrade script and use that. Change-Id: I8b388e30a30bc4a9a2f392bed42ceee7e8bc250a Updates: #1209 Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
*	tests: Fix bug-1101647.t test case failure	karthik-us	2020-04-29	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: tests/bugs/replicate/bug-1101647.t test case fails sporadically in the volume heal since connection to the bricks with shd was not being checked before running the index heal. Build link: https://build.gluster.org/job/regression-test-burn-in/5007/ Fix: Check for the connection status of the bricks with shd before performing the index heal. Change-Id: Ie7060f379b63bef39fd4f9804f6e22e0a25680c1 Updates: #1154 Signed-off-by: karthik-us <ksubrahm@redhat.com>
*	extras: upgrade script for geo-rep	Shwetha K Acharya	2020-04-27	1	-0/+76
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The patch https://review.gluster.org/#/c/glusterfs/+/23733/( which optimizes the changelog) introduces change in dirctory structure which is above changelog files. Thus, before upgrade, old version should get updated, with respect to the corresponding changes made by the above qouted patch. This upgrade script, 1) creates a temp htime file, with updated paths from the htime file. Updates temp htime file as htime file. 2) places the changelog files under the required directory structure. Updates: #154 Change-Id: I4b5a6cb9a9266a65972b419b329bc958de8fdf8a Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
*	afr: event gen changes	Ravishankar N	2020-04-24	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The general idea of the changes is to prevent resetting event generation to zero in the inode ctx, since event gen is something that should follow 'causal order'. Change #1: For a read txn, in inode refresh cbk, if event_generation is found zero, we are failing the read fop. This is not needed because change in event gen is only a marker for the next inode refresh to happen and should not be taken into account by the current read txn. Change #2: The event gen being zero above can happen if there is a racing lookup, which resets even get (in afr_lookup_done) if there are non zero afr xattrs. The resetting is done only to trigger an inode refresh and a possible client side heal on the next lookup. That can be acheived by setting the need_refresh flag in the inode ctx. So replaced all occurences of resetting even gen to zero with a call to afr_inode_need_refresh_set(). Change #3: In both lookup and discover path, we are doing an inode refresh which is not required since all 3 essentially do the same thing- update the inode ctx with the good/bad copies from the brick replies. Inode refresh also triggers background heals, but I think it is okay to do it when we call refresh during the read and write txns and not in the lookup path. The .ts which relied on inode refresh in lookup path to trigger heals are now changed to do read txn so that inode refresh and the heal happens. Change-Id: Iebf39a9be6ffd7ffd6e4046c96b0fa78ade6c5ec Fixes: #1179 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reported-by: Erik Jacobson <erik.jacobson at hpe.com>
*	tests: Fix spurious failure of tests/basic/quick-read-with-upcall.t	Pranith Kumar K	2020-04-22	1	-10/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: The test is failing at 14:56:41 ok 13, LINENUM:38 14:56:41 not ok 14 Got "test-message0" instead of "test-message1", LINENUM:41 14:56:41 FAILED COMMAND: test-message1 cat /mnt/glusterfs/1/test.txt This happens because fuse sometimes doesn't send 'read' fop to glusterfs and is served from cache. Fix: Mount with direct-io-mode=yes so that read is always received by gluster Fixes: #1190 Change-Id: I369e2024a85dc492dc24c7579b161fb965f55d19 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	tests: Fix for spurious failure for some test cases	Mohit Agrawal	2020-04-16	5	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Sometimes test case is failing at the time of creating files on mount point after mounting the volume Solution: After started the volume need to wait to make sure all bricks instances are completely started so put a online_brick_count check after just started the volume Change-Id: I5020e7e417539377277ca00189f9c51d2cf877a6 Fixes: #1162 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	tests: do not truncate file offsets and sizes to 32-bit	Dmitry Antipov	2020-04-15	4	-9/+13
\| \| \| \| \| \| \| \| \| \|	Do not truncate file offsets and sizes to 32-bit to prevent tests from spurious failures on >2Gb files. Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Change-Id: I2a77ea5f9f415249b23035eecf07129f19194ac2 Fixes: #1161
*	tests: Fix spurious failure of worm.t	Rinku Kothiya	2020-04-13	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \|	When the output of date command is a single digit number it is preceded by zero which is getting considered as an octal number. Removing the leading zero from the number solved the problem. Fixes: #1156 Change-Id: Iac4fa20607c0bb90d94dd8ff157ef6b60932c560 Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
*	test: Fix test "bug-1064148" to pass in mux	Barak Sason Rofman	2020-04-13	1	-10/+11
\| \| \| \| \| \| \| \|	Parts of the test weren't designed to run in mux mode, this is now fixed Change-Id: I428c2fcce6d047e324ca5dcaef677ee1794e3dfe updates: #1154 Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
*	tests: Fix spurious failure of ↵	Mohit Agrawal	2020-04-13	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	tests/bugs/glusterd/serialize-shd-manager-glusterd-restart.t Problem: Sometime volume status is failed after restart glusterd in one cluster node Solution: Wait to finish glusterd handshake on down cluster node Change-Id: Ib23ca41c943caf2903c61ebf42dc437c1b9d6054 Fixes: #1158 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	Adding basic glusterfind test case	Shwetha K Acharya	2020-04-09	2	-1/+85
\| \| \| \| \| \| \| \|	This test case includes all the basic glusterfind scenarios. fixes: #1044 Change-Id: I6021443729e35769fe855c5cc41bb3fbc6365ef0 Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
*	posix: Avoid dict_del logs in posix_is_layout_stale while key is NULL	Mohit Agrawal	2020-04-09	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Problem: The key "GF_PREOP_PARENT_KEY" has been populated by dht and for non-distribute volume like 1x3 key is not populated so posix_is_layout stale throw a message while a file is created Solution: To avoid a log put a condition before delete a key Change-Id: I813ee7960633e7f9f5e9ad2f42f288053d9eb71f Fixes: #1150 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	mount/fuse: Wait for 'mount' child to exit before dying	Pranith Kumar K	2020-04-09	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: tests/bugs/protocol/bug-1433815-auth-allow.t fails sometimes because of stale mount. This stale mount comes into picture when parent process dies without waiting for the child process which mounts fuse fs to die Fix: Wait for mounting child process to die before dying. Fixes: #1152 Change-Id: I8baee8720e88614fdb762ea822d5877973eef8dc Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	dht - fixing a permission update issue	Barak Sason Rofman	2020-04-08	1	-0/+71
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When bringing back a downed brick and performing lookup from the client side, the permission on said brick aren't updated on the first lookup, but only on the second. This patch modifies permission update logic so the first lookup will trigger a permission update on the downed brick. LIMITATIONS OF THE PATCH: As the choice of source depends on whether the directory has layout or not. Even the directories on the newly added brick will have layout xattr[zeroed], but the same is not true for a root directory. Hence, in case in the entire cluster only the newly added bricks are up [and others are down], then any change in permission during this time will be overwritten by the older permissions when the cluster is restarted. fixes: #999 Change-Id: Ieb70246d41e59f9cae9f70bc203627a433dfbd33 Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
*	tests: Fix spurious failure of tests/bugs/snapshot/bug-1111041.t	Pranith Kumar K	2020-04-07	1	-4/+6
\| \| \| \| \| \| \| \| \|	Test should wait for process down notification to be received by glusterd. Fixes: #1153 Change-Id: I9162b58a92c1a909ca98097f14c0714f9086bdd1 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	Posix: Optimize posix code to improve file creation	Mohit Agrawal	2020-04-06	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Before executing a fop in POSIX xlator it builds an internal path based on GFID.To validate the path it call's (l)stat system call and while .glusterfs is heavily loaded kernel takes time to lookup inode and due to that performance drops Solution: In this patch we followed two ways to improve the performance. 1) Keep open fd specific to first level directory(gfid[0]) in .glusterfs, it would force to kernel keep the inodes from all those files in cache. In case of memory pressure kernel won't uncache first level inodes. We need to open 256 fd's per brick to access the entry faster. 2) Use at based call's to access relative path to reduce path based lookup time. Note: To verify the patch we have executed kernel untar 100 times on 6 different clients after enabling metadata group-cache and some other option.We were getting more than 20 percent improvement in kenel untar after applying the patch. Credits: Xavi Hernandez <xhernandez@redhat.com> Change-Id: I1643e6b01ed669b2bb148d02f4e6a8e08da45343 updates: #891 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	cluster/ec: Add test for reset-brick command	Ashish Pandey	2020-04-06	1	-0/+50
\| \| \| \| \| \| \| \| \| \| \|	Following tests are done - 1 - After finishing reset-brick all the bricks should be up. 2 - Heal should be completed. 3 - Check number of entries present on brick which was reset. Change-Id: I9314bed180293a99d400d94bb8cc7ece999da29e Updates: #1144
*	fuse: Add error-logs to debug bug-1433815-auth-allow.t failures	Pranith Kumar K	2020-04-06	1	-0/+1
\| \| \| \| \| \|	Fixes: #1149 Change-Id: I38483fc7d76d7fe0ac9fb649669a46bdf9c82234 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	write-behind: fix data corruption	Xavi Hernandez	2020-04-03	2	-0/+307
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There was a bug in write-behind that allowed a previous completed write to overwrite the overlapping region of data from a future write. Suppose we want to send three writes (W1, W2 and W3). W1 and W2 are sequential, and W3 writes at the same offset of W2: W2.offset = W3.offset = W1.offset + W1.size Both W1 and W2 are sent in parallel. W3 is only sent after W2 completes. So W3 should always overwrite the overlapping part of W2. Suppose write-behind processes the requests from 2 concurrent threads: Thread 1 Thread 2 <received W1> <received W2> wb_enqueue_tempted(W1) /* W1 is assigned gen X / wb_enqueue_tempted(W2) / W2 is assigned gen X / wb_process_queue() __wb_preprocess_winds() / W1 and W2 are sequential and all * other requisites are met to merge * both requests. / __wb_collapse_small_writes(W1, W2) __wb_fulfill_request(W2) __wb_pick_unwinds() -> W2 / In this case, since the request is * already fulfilled, wb_inode->gen * is not updated. / wb_do_unwinds() STACK_UNWIND(W2) / The application has received the * result of W2, so it can send W3. / <received W3> wb_enqueue_tempted(W3) / W3 is assigned gen X / wb_process_queue() / Here we have W1 (which contains * the conflicting W2) and W3 with * same gen, so they are interpreted * as concurrent writes that do not * conflict. / __wb_pick_winds() -> W3 wb_do_winds() STACK_WIND(W3) wb_process_queue() / Eventually W1 will be * ready to be sent / __wb_pick_winds() -> W1 __wb_pick_unwinds() -> W1 / Here wb_inode->gen is * incremented. */ wb_do_unwinds() STACK_UNWIND(W1) wb_do_winds() STACK_WIND(W1) So, as we can see, W3 is sent before W1, which shouldn't happen. The problem is that wb_inode->gen is only incremented for requests that have not been fulfilled but, after a merge, the request is marked as fulfilled even though it has not been sent to the brick. This allows that future requests are assigned to the same generation, which could be internally reordered. Solution: Increment wb_inode->gen before any unwind, even if it's for a fulfilled request. Special thanks to Stefan Ring for writing a reproducer that has been crucial to identify the issue. Change-Id: Id4ab0f294a09aca9a863ecaeef8856474662ab45 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com> Fixes: #884
*	features/utime: Don't access frame after stack-wind	Pranith Kumar K	2020-04-03	1	-0/+32
\| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: frame is accessed after stack-wind. This can lead to crash if the cbk frees the frame. Fix: Use new frame for the wind instead. Updates: #832 Change-Id: I64754609f1114b0bbd4d1336fa81a56f2cca6e03 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	snap_scheduler: python3 compatibility and new test case	Sunny Kumar	2020-04-03	1	-0/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: "snap_scheduler.py init" command failing with the below traceback: [root@dhcp43-104 ~]# snap_scheduler.py init Traceback (most recent call last): File "/usr/sbin/snap_scheduler.py", line 941, in <module> sys.exit(main(sys.argv[1:])) File "/usr/sbin/snap_scheduler.py", line 851, in main initLogger() File "/usr/sbin/snap_scheduler.py", line 153, in initLogger logfile = os.path.join(process.stdout.read()[:-1], SCRIPT_NAME + ".log") File "/usr/lib64/python3.6/posixpath.py", line 94, in join genericpath._check_arg_types('join', a, *p) File "/usr/lib64/python3.6/genericpath.py", line 151, in _check_arg_types raise TypeError("Can't mix strings and bytes in path components") from None TypeError: Can't mix strings and bytes in path components Solution: Added the 'universal_newlines' flag to Popen to support backward compatibility. Added a basic test for snapshot scheduler. Change-Id: I78e8fabd866fd96638747ecd21d292f5ca074a4e Fixes: #1134 Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
*	afr: mark pending xattrs as a part of metadata heal	Ravishankar N	2020-04-02	1	-0/+59
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	...if pending xattrs are zero for all children. Problem: If there are no pending xattrs and a metadata heal needs to be performed, it can be possible that we end up with xattrs inadvertendly deleted from all bricks, as explained in the BZ. Fix: After picking one among the sources as the good copy, mark pending xattrs on all sources to blame the sinks. Now even if this metadata heal fails midway, a subsequent heal will still choose one of the valid sources that it picked previously. Fixes: #1067 Change-Id: If1b050b70b0ad911e162c04db4d89b263e2b8d7b Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	Posix: Use simple approach to close fd	Mohit Agrawal	2020-03-20	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: posix_release(dir) functions add the fd's into a ctx->janitor_fds and janitor thread closes the fd's.In brick_mux environment it is difficult to handle race condition in janitor threads because brick spawns a single janitor thread for all bricks. Solution: Use synctask to execute posix_release(dir) functions instead of using background a thread to close fds. Credits: Pranith Karampuri <pkarampu@redhat.com> Change-Id: Iffb031f0695a7da83d5a2f6bac8863dad225317e Fixes: bz#1811631 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	WORM-Xlator: Initial write of a file succeeds if auto-commit-period 0	David Spisla	2020-03-17	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If worm-file-level enabled and auto-commit-period 0 an initial write of a file (e.g. $ echo test >> file1.txt) would lead to an zero byte file because the WORM xlator immediately WORMed the file when it was created. To avoid this we move the setting of trusted.worm_file from worm_create_cbk to worm_release . This means that this xattr will set when the filehandle is closed and all initial WRITE FOPs succeed. Finally we also perform gf_worm_state_transition in worm_release to ensure that the file will be immediately WORMed after the file handle was closed. Change-Id: I5d02e18975b646ca1a27ed41d836e9d0dc333204 Fixes: bz#1808421 Signed-off-by: David Spisla <david.spisla@iternity.com>
*	tests: fix afr-lock-heal-* failure	Ravishankar N	2020-03-16	2	-12/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When brick-mux is enabled: i)brick statedumps seem to be listing the same lock information multiple times. While that is getting fixed, make changes to the .ts to check for unique values. ii)detecting a brick as online via brick_up_status() seems to be taking longer time when delaygen is enabled. Hence bump up PROCESS_UP_TIMEOUT to 90 for afr-lock-heal-advanced.t Updates: #1042 Change-Id: Ife76008f7a99dd1f1fe5791a32577366baaab4b3 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	cluster/afr: Fixes for halo	Pranith Kumar K	2020-03-13	3	-1/+72
\| \| \| \| \| \| \| \| \| \| \|	Current implementation assumes that ping-event will come after connect event but that may not be the case in the cases where after socket connection fds need to be re-opened which would consume more time. So handle any order of the ping/child-up events. fixes: bz#1800583 Change-Id: I6bcdc0caa503bdc039ef2b4739fbf4afae121f05 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	mgmt/glusterd: Adding validation for statedump path	yatipadia	2020-03-09	1	-5/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Description of problem: server.statedump-path is the path where statedumps are stored, by default it is /var/run/gluster. And can be set to any valid directory path. It was observed that server.statedump-path was also accepting file, non-existent file and non-existent paths as well. And statedump command was successful even when statedumps with all the invalid paths. a. A file b. A non-existent path Solution: Added a validation function in gluster-volume-set.c which will allow volume set to success if it's a valid directory and in all other cases, volume set should fail. Fixes: bz#1787122 Change-Id: Ia66e2b3d35f23efc5444c829928779a79d827b42 Signed-off-by: yatipadia <ypadia@redhat.com> Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
*	glusterd: Brick process fails to come up with brickmux on	Vishal Pandey	2020-02-20	1	-1/+60
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Issue: 1- In a cluster of 3 Nodes N1, N2, N3. Create 3 volumes vol1, vol2, vol3 with 3 bricks (one from each node) 2- Set cluster.brick-multiplex on 3- Start all 3 volumes 4- Check if all bricks on a node are running on same port 5- Kill N1 6- Set performance.readdir-ahead for volumes vol1, vol2, vol3 7- Bring N1 up and check volume status 8- All bricks processes not running on N1. Root Cause - Since, There is a diff in volfile versions in N1 as compared to N2 and N3 therefore glusterd_import_friend_volume() is called. glusterd_import_friend_volume() copies the new_volinfo and deletes old_volinfo and then calls glusterd_start_bricks(). glusterd_start_bricks() looks for the volfiles and sends an rpc request to glusterfs_handle_attach(). Now, since the volinfo has been deleted by glusterd_delete_stale_volume() from priv->volumes list before glusterd_start_bricks() and glusterd_create_volfiles_and_notify_services() and glusterd_list_add_order is called after glusterd_start_bricks(), therefore the attach RPC req gets an empty volfile path and that causes the brick to crash. Fix- Call glusterd_list_add_order() and glusterd_create_volfiles_and_notify_services before glusterd_start_bricks() cal is made in glusterd_import_friend_volume Change-Id: Idfe0e8710f7eb77ca3ddfa1cabeb45b2987f41aa Fixes: bz#1773856 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	Updating gluster manual.	Rishubh Jain	2020-02-18	1	-0/+4
\| \| \| \| \| \| \| \| \| \|	Adding disperse-data to gluster manual under volume create command Change-Id: Ic9eb47c9e71a1d7a11af9394c615c8e90f8d1d69 Fixes: bz#1668239 Signed-off-by: Rishubh Jain <risjain@redhat.com> Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
*	afr: prevent spurious entry heals leading to gfid split-brain	Ravishankar N	2020-02-18	2	-14/+62
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: In a hyperconverged setup with granular-entry-heal enabled, if a file is recreated while one of the bricks is down, and an index heal is triggered (with the brick still down), entry-self heal was doing a spurious heal with just the 2 good bricks. It was doing a post-op leading to removal of the filename from .glusterfs/indices/entry-changes as well as erroneous setting of afr xattrs on the parent. When the brick came up, the xattrs were cleared, resulting in the renamed file not getting healed and leading to gfid split-brain and EIO on the mount. Fix: Proceed with entry heal only when shd can connect to all bricks of the replica, just like in data and metadata heal. fixes: bz#1801624 Change-Id: I916ae26ad1fabf259bc6362da52d433b7223b17e Signed-off-by: Ravishankar N <ravishankar@redhat.com>