Commit message | Author | Age | Files | Lines
* storage/posix: fix initialization warning reported with clang-10 | Dmitry Antipov | 2020-04-30 | 1 | -8/+4
    xlators/storage/posix/src/posix-common.c:1440:18: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
        .validate = GF_OPT_VALIDATE_MAX,
                    ^~~~~~~~~~~~~~~~~~~
    xlators/storage/posix/src/posix-common.c:1439:18: note: previous initialization is here
        .validate = GF_OPT_VALIDATE_MIN,
                    ^~~~~~~~~~~~~~~~~~~
    [4 times]

    Use GF_OPT_VALIDATE_BOTH for min/max-bounded values.

    Fixes: #1208
    Change-Id: I073a27d23176f3b4a126f2eb50c079374a11418d
    Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
* core: Updated the GD_OP_VERSION | Rinku Kothiya | 2020-04-29 | 1 | -1/+3
    fixes: #1204
    Change-Id: Ied5d4d553771ff315ed3f1a7229f96733fe7ed00
    Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
* tests: fix georep-upgrade.t failure | Sunny Kumar | 2020-04-29 | 1 | -1/+4
    This patch fixes the georep-upgrade.t test failure.

    Problem:
    `TEST upgrade_script=$(find / -type f -name glusterfs-georep-upgrade.py)`
    Multiple files with the same name can exist. As a temporary fix, pick only the first result.

    The proper fix is to find a proper place for this upgrade script and use that.

    Change-Id: I8b388e30a30bc4a9a2f392bed42ceee7e8bc250a
    Updates: #1209
    Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
* github/stale-bot: close issues automatically | Amar Tumballi | 2020-04-29 | 2 | -0/+27
    Updates: #824
    Change-Id: I18e0a93c39e82bba74cc163457344fa6ce4d7fbb
    Signed-off-by: Amar Tumballi <amar@kadalu.io>
* tests: Fix bug-1101647.t test case failure | karthik-us | 2020-04-29 | 1 | -0/+2
    Problem:
    The tests/bugs/replicate/bug-1101647.t test case fails sporadically in the volume heal, because the connection of shd to the bricks was not being checked before running the index heal.
    Build link: https://build.gluster.org/job/regression-test-burn-in/5007/

    Fix:
    Check the connection status of the bricks with shd before performing the index heal.

    Change-Id: Ie7060f379b63bef39fd4f9804f6e22e0a25680c1
    Updates: #1154
    Signed-off-by: karthik-us <ksubrahm@redhat.com>
* rfc.sh: remove reference to bugzilla | Amar Tumballi | 2020-04-29 | 1 | -36/+18
    Updates: #824
    Change-Id: I2eb1b43f01ca0e705110bd5d28f204065e611d3e
    Signed-off-by: Amar Tumballi <amar@kadalu.io>
* Github template: release tracker template added | Hari Gowtham | 2020-04-29 | 2 | -4/+18
    Made minor changes to the issue template to fix the nits.

    Change-Id: Iadaee1289a98427c15ea03dd0a7c77a8e3ac3f7f
    Updates: #824
    Signed-off-by: Hari Gowtham <hgowtham@redhat.com>
* dht xlator: integer handling issue | nik-redhat | 2020-04-29 | 4 | -11/+19
    Issue: The ret value is passed to the log function instead of the proper errno value.

    Fix: Pass the generated errno to the log function.

    CID: 1415824 : Improper use of negative value
    CID: 1420205 : Improper use of negative value

    Change-Id: Iaa7407ebd03eda46a2c027695e6bf0f598b371b2
    Updates: #1060
    Signed-off-by: nik-redhat <nladha@redhat.com>
* dht: Handle setxattr and rm race for directory in rebalance | Susant Palai | 2020-04-28 | 3 | -0/+33
    Problem: Selfheal as part of directory does not return an error if the layout setxattr fails. This is because the actual lookup fop must have been successful to proceed to layout heal. Hence, we could not tell whether fix-layout failed in rebalance.

    Solution: We can check in the layout structure whether all the xlators have returned an error.

    fixes: #1200
    Change-Id: I3e5f2a36c0d934c21476a73a9a5473d8e490cde7
    Signed-off-by: Susant Palai <spalai@redhat.com>
* snapshot: fix python3 issue in gcron | Sunny Kumar | 2020-04-28 | 1 | -1/+2
    `$gcron.py test_vol Job`
    Traceback:
      File "/usr/sbin/gcron.py", line 189, in <module>
        main()
      File "/usr/sbin/gcron.py", line 121, in main
        initLogger(script_name)
      File "/usr/sbin/gcron.py", line 44, in initLogger
        logfile = os.path.join(out.strip(), script_name[:-3]+".log")
      File "/usr/lib64/python3.6/posixpath.py", line 94, in join
        genericpath._check_arg_types('join', a, *p)
      File "/usr/lib64/python3.6/genericpath.py", line 151, in _check_arg_types
        raise TypeError("Can't mix strings and bytes in path components") from None
    TypeError: Can't mix strings and bytes in path components

    Solution: Added the 'universal_newlines' flag to Popen.

    Change-Id: I4c7a0e5bce605e4c134f6786c9dd8162b89fc77f
    Fixes: #1193
    Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
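The TypeError in the traceback comes from joining Popen output (bytes under Python 3) with a str. The following is a minimal sketch of the described fix, not the actual gcron.py code: `read_stdout_text`, `build_logfile`, and `FAKE_CMD` are hypothetical names, and the command is a stand-in that simply prints a directory.

```python
import os
import subprocess
import sys


def read_stdout_text(cmd):
    """Run cmd and return its stdout as str, with surrounding whitespace stripped."""
    # universal_newlines=True makes Popen decode the pipe to str on Python 3
    # (and is accepted on Python 2), so os.path.join() only sees strings.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            universal_newlines=True)
    out, _ = proc.communicate()
    return out.strip()


def build_logfile(cmd, script_name):
    """Join the directory printed by cmd with '<script_name minus .py>.log'."""
    return os.path.join(read_stdout_text(cmd), script_name[:-3] + ".log")


# Hypothetical stand-in for the command that prints the log directory.
FAKE_CMD = [sys.executable, "-c", "print('/var/log/glusterfs')"]
print(build_logfile(FAKE_CMD, "gcron.py"))
```

Without the flag, `out` would be bytes on Python 3 and the join would raise the same TypeError shown in the traceback.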
* socket: Resolve ssl_ctx leak for a brick while only mgmt SSL is enabled | Mohit Agrawal | 2020-04-28 | 1 | -2/+2
    Problem: When only mgmt SSL is enabled for a brick process, the use_ssl flag is false for that process, and the socket APIs clean up ssl_ctx only when both use_ssl and ssl_ctx are valid.

    Solution: To avoid the leak, check only ssl_ctx; if it is valid, clean up ssl_ctx.

    Fixes: #1196
    Change-Id: I2f4295478f4149dcb7d608ea78ee5104f28812c3
    Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* build: geo-rep requires relevant selinux permission for rsync | Sunny Kumar | 2020-04-28 | 1 | -0/+15
    If selinux is set to enforcing mode, geo-rep goes into a faulty state. To prevent this, the relevant selinux booleans need to be set to 'on' to allow the rsync operation.

    Change-Id: Ia8ce530d6548c2a545f4c99c600f5aac2bbb3363
    Fixes: #1182
    Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
* extras: upgrade script for geo-rep | Shwetha K Acharya | 2020-04-27 | 2 | -0/+153
    The patch https://review.gluster.org/#/c/glusterfs/+/23733/ (which optimizes the changelog) introduces a change in the directory structure above the changelog files. Thus, before upgrade, the old version should be updated to match the changes made by the above quoted patch.

    This upgrade script:
    1) creates a temp htime file, with updated paths from the htime file, and then installs the temp htime file as the htime file.
    2) places the changelog files under the required directory structure.

    Updates: #154
    Change-Id: I4b5a6cb9a9266a65972b419b329bc958de8fdf8a
    Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
* afr: event gen changes | Ravishankar N | 2020-04-24 | 4 | -82/+29
    The general idea of the changes is to prevent resetting event generation to zero in the inode ctx, since event gen is something that should follow 'causal order'.

    Change #1:
    For a read txn, in the inode refresh cbk, if event_generation is found to be zero, we are failing the read fop. This is not needed, because a change in event gen is only a marker for the next inode refresh to happen and should not be taken into account by the current read txn.

    Change #2:
    The event gen being zero above can happen if there is a racing lookup, which resets event gen (in afr_lookup_done) if there are non-zero afr xattrs. The resetting is done only to trigger an inode refresh and a possible client side heal on the next lookup. That can be achieved by setting the need_refresh flag in the inode ctx. So replaced all occurrences of resetting event gen to zero with a call to afr_inode_need_refresh_set().

    Change #3:
    In both the lookup and discover paths, we are doing an inode refresh which is not required, since all 3 essentially do the same thing: update the inode ctx with the good/bad copies from the brick replies. Inode refresh also triggers background heals, but I think it is okay to do it when we call refresh during the read and write txns and not in the lookup path.

    The .ts which relied on inode refresh in the lookup path to trigger heals are now changed to do a read txn so that the inode refresh and the heal happen.

    Change-Id: Iebf39a9be6ffd7ffd6e4046c96b0fa78ade6c5ec
    Fixes: #1179
    Signed-off-by: Ravishankar N <ravishankar@redhat.com>
    Reported-by: Erik Jacobson <erik.jacobson at hpe.com>
* dht - fixing rebalance failures for files with holes | Barak Sason Rofman | 2020-04-24 | 1 | -11/+10
    Rebalance process handling of files which contain holes caused rebalance to fail with "No space left on device" errors. This patch modifies the code flow in such a way that files with holes will be rebalanced correctly.

    fixes: #1187
    Change-Id: I89bc3d4ea7f074db7213d759c49307f379543932
    Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
* md-cache: fix several NULL dereferences | Xavi Hernandez | 2020-04-23 | 1 | -66/+129
    This patch includes the following CID from Coverity Scan:
    * 1425196
    * 1425197
    * 1425198
    * 1425199
    * 1525200

    Change-Id: Iddcfea449d3dd56d4dfcc39f4c3c608518e611e4
    Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
    Updates: #1060
* tests: Fix spurious failure of tests/basic/quick-read-with-upcall.t | Pranith Kumar K | 2020-04-22 | 1 | -10/+3
    Problem:
    The test is failing at
    14:56:41 ok 13, LINENUM:38
    14:56:41 not ok 14 Got "test-message0" instead of "test-message1", LINENUM:41
    14:56:41 FAILED COMMAND: test-message1 cat /mnt/glusterfs/1/test.txt

    This happens because fuse sometimes doesn't send the 'read' fop to glusterfs and the read is served from cache.

    Fix:
    Mount with direct-io-mode=yes so that read is always received by gluster.

    Fixes: #1190
    Change-Id: I369e2024a85dc492dc24c7579b161fb965f55d19
    Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* dht/rebalance - fixing recursive failure issue | Barak Sason Rofman | 2020-04-21 | 1 | -1/+2
    If the rebalance process is failing, recursive failures appear in the log file, which distracts from the root cause. To avoid recursive failures, the error handling mechanism has been modified.

    fixes: #1072
    Change-Id: Iae19430323630acd97c2c8d35685626d8da747a7
    Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
* dht - Remove "tier" code (part 1) (tag: v9dev) | Barak Sason Rofman | 2020-04-17 | 3 | -476/+19
    This patch removes some of the "tier" code in the dht xlator, as it is no longer being used. Not all of the unneeded code is removed at once, so that reviewing is easier. Follow-up patches will remove additional unused code.

    This is based on the work done in https://review.gluster.org/#/c/glusterfs/+/23935/

    Change-Id: I3cb6a0c5d8f14afcd87cf021ef8f74b91c0f908a
    updates: #1097
    Signed-off-by: Barak Sason Rofman <bsaonro@redhat.com>
* tests: Fix for spurious failure for some test cases | Mohit Agrawal | 2020-04-16 | 5 | -1/+6
    Problem: Sometimes a test case fails while creating files on the mount point after mounting the volume.

    Solution: After starting the volume, wait until all brick instances are completely started, by adding an online_brick_count check right after the volume start.

    Change-Id: I5020e7e417539377277ca00189f9c51d2cf877a6
    Fixes: #1162
    Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* md-cache: avoid clearing cache when not necessary | Xavi Hernandez | 2020-04-16 | 1 | -72/+93
    mdc_inode_xatt_set() blindly cleared current cache when dict was not NULL, even if there was no xattr requested.

    This patch fixes this by only calling mdc_inode_xatt_set() when we have explicitly requested something to cache.

    Change-Id: Idc91a4693f1ff39f7059acde26682ccc361b947d
    Fixes: #1140
    Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* posix: fix GF_VALIDATE_OR_GOTO(this->name, this, out) | Sanju Rakonde | 2020-04-16 | 1 | -3/+1
    Remove GF_VALIDATE_OR_GOTO(this->name, this, out) when this is passed as an argument and is checked for NULL in the caller itself.

    GF_VALIDATE_OR_GOTO(this->name, this, out) is modified to use xlator name instead of this->name as we are still verifying whether this is NULL.

    updates: #1000
    Change-Id: Ide3180da29d0d4a35b2c5b9a7604fdf2ff4a9ffb
    Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* test: tests/bugs/rpc/bug-847624.t is crashed | Mohit Agrawal | 2020-04-15 | 2 | -7/+8
    Problem: glusterfs (GNFS) crashes while handling a Pollerr event in rpcsvc_drc_client_unref. GNFS crashed because ref was 0 at the time of unref, while the ref was taken when a Pollin event was successfully handled.

    Solution: Convert the drc_client ref to an atomic ref to avoid the crash.

    Change-Id: Ia4c054f2f388032a5cd99597d0cfa18b003ca690
    Fixes: #1038
    Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* tests: do not truncate file offsets and sizes to 32-bit | Dmitry Antipov | 2020-04-15 | 4 | -9/+13
    Do not truncate file offsets and sizes to 32-bit to prevent tests from spurious failures on >2Gb files.

    Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
    Change-Id: I2a77ea5f9f415249b23035eecf07129f19194ac2
    Fixes: #1161
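To illustrate why 32-bit truncation breaks on large files, here is a small sketch that models storing an offset in a signed 32-bit versus a signed 64-bit integer. The helper names are hypothetical and the 3 GiB offset is an arbitrary example, not a value taken from the tests.

```python
import ctypes


def as_int32(value):
    """Model storing value in a signed 32-bit integer (silently wraps)."""
    return ctypes.c_int32(value).value


def as_int64(value):
    """Model storing value in a signed 64-bit integer."""
    return ctypes.c_int64(value).value


offset = 3 * 1024 ** 3  # 3 GiB, beyond the signed 32-bit range

print("32-bit:", as_int32(offset))
print("64-bit:", as_int64(offset))
```

The 32-bit value wraps around to a negative number, while the 64-bit value keeps the offset intact.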
* common-ha: cluster status shows "FAILOVER" when actually HEALTHY | Kaleb S. KEITHLEY | 2020-04-14 | 1 | -1/+1
    pacemaker devs changed the format of the output of `pcs status`.

    Expected to find a line in the format:

    Online: ....

    but now it's

    * Online: ...

    And the `grep -E "^Online:"` no longer finds the list of nodes that are online.

    Change-Id: If2aa1e7b53c766c625d7b4cc222a83ea2c0bd72d
    Fixes: #1169
    Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
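One way to tolerate both output formats is a pattern with an optional leading bullet. This is only an illustrative sketch, not the change made by the commit: `ONLINE_RE` and `online_nodes` are hypothetical, and the bracketed node list is an assumption about what follows "Online:".

```python
import re

# Accepts "Online: [ ... ]" as well as the newer "  * Online: [ ... ]".
ONLINE_RE = re.compile(r"^\s*(?:\*\s+)?Online:\s*\[\s*(.*?)\s*\]")


def online_nodes(pcs_status_output):
    """Return the node names from the first 'Online:' line, or [] if none."""
    for line in pcs_status_output.splitlines():
        match = ONLINE_RE.match(line)
        if match:
            return match.group(1).split()
    return []


old_style = "Online: [ node1 node2 ]"
new_style = "  * Online: [ node1 node2 ]"
print(online_nodes(old_style), online_nodes(new_style))
```

An anchored pattern such as `^Online:` matches only the first form, which is the failure the commit message describes.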
* tests: Fix spurious failure of worm.t | Rinku Kothiya | 2020-04-13 | 1 | -2/+2
    When the output of the date command is a single-digit number, it is preceded by a zero and is therefore interpreted as an octal number. Removing the leading zero from the number solved the problem.

    Fixes: #1156
    Change-Id: Iac4fa20607c0bb90d94dd8ff157ef6b60932c560
    Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
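The octal pitfall can be modeled in a few lines. The sketch below only mimics the rule that a leading zero selects base 8; `shell_like_int` and `strip_leading_zeros` are hypothetical helpers, not code from worm.t.

```python
def shell_like_int(field):
    """Mimic arithmetic where a leading 0 selects base 8 (a model, not a shell)."""
    if len(field) > 1 and field.startswith("0"):
        return int(field, 8)  # "08" and "09" are not valid octal: ValueError
    return int(field)


def strip_leading_zeros(field):
    """Drop leading zeros so the field is read as decimal; keep a lone '0'."""
    return field.lstrip("0") or "0"


for raw in ("07", "08", "09"):
    print(raw, "->", shell_like_int(strip_leading_zeros(raw)))
```

Fields such as "08" and "09" fail only on certain dates or times, which is consistent with the failure being spurious.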
* mgmt/glusterd: use stat() syscall wrapper | Dmitry Antipov | 2020-04-13 | 1 | -2/+3
    Found with 0-symbol-check.t:

    ./tests/basic/0symbol-check.t ..
    1..2
    ./xlators/mgmt/glusterd/src/.libs/glusterd_la-glusterd-volume-set.o should call sys_stat, not stat
    ok 1 [ 40/ 41011] < 40> 'find . -name *.o -exec ./tests/basic/symbol-check.sh {} \;'
    not ok 2 [ 11/ 1] < 42> '[ ! -e ./.symbol-check-errors ]' -> ''
    Failed 1/2 subtests

    Change-Id: I8962f487cd88738a1f7a962049d513712687088c
    Fixes: #1160
    Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
* test: Fix test "bug-1064148" to pass in mux | Barak Sason Rofman | 2020-04-13 | 1 | -10/+11
    Parts of the test weren't designed to run in mux mode; this is now fixed.

    Change-Id: I428c2fcce6d047e324ca5dcaef677ee1794e3dfe
    updates: #1154
    Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
* tests: Fix spurious failure of tests/bugs/glusterd/serialize-shd-manager-glusterd-restart.t | Mohit Agrawal | 2020-04-13 | 1 | -1/+1
    Problem: Sometimes volume status fails after restarting glusterd on one cluster node.

    Solution: Wait for the glusterd handshake to finish on the down cluster node.

    Change-Id: Ib23ca41c943caf2903c61ebf42dc437c1b9d6054
    Fixes: #1158
    Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* protocol/server: Fix coverity issue RESOURCE_LEAK | Sheetal Pamecha | 2020-04-10 | 1 | -0/+1
    Handle case of arg not freed.

    CID: 1422174
    Updates: #1060
    Change-Id: Ibd03908a3ea8369035c2b7f6e024b3e5be48f436
    Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
* Adding basic glusterfind test case | Shwetha K Acharya | 2020-04-09 | 2 | -1/+85
    This test case includes all the basic glusterfind scenarios.

    fixes: #1044
    Change-Id: I6021443729e35769fe855c5cc41bb3fbc6365ef0
    Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
* posix: Avoid dict_del logs in posix_is_layout_stale while key is NULL | Mohit Agrawal | 2020-04-09 | 2 | -4/+5
    Problem: The key "GF_PREOP_PARENT_KEY" is populated by dht. For a non-distribute volume such as 1x3 the key is not populated, so posix_is_layout_stale logs a message whenever a file is created.

    Solution: To avoid the log, add a condition before deleting the key.

    Change-Id: I813ee7960633e7f9f5e9ad2f42f288053d9eb71f
    Fixes: #1150
    Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* mount/fuse: Wait for 'mount' child to exit before dying | Pranith Kumar K | 2020-04-09 | 2 | -1/+27
    Problem:
    tests/bugs/protocol/bug-1433815-auth-allow.t fails sometimes because of a stale mount. This stale mount appears when the parent process dies without waiting for the child process, which mounts the fuse fs, to die.

    Fix:
    Wait for the mounting child process to die before dying.

    Fixes: #1152
    Change-Id: I8baee8720e88614fdb762ea822d5877973eef8dc
    Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
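The general pattern of a parent waiting for its helper child can be sketched as follows. This is a generic illustration using subprocess, not the fuse mount code from the commit; `run_mount_helper` and `HELPER` are hypothetical, and the helper is a stand-in that exits immediately.

```python
import subprocess
import sys


def run_mount_helper(cmd):
    """Start the helper and block until it exits; return its exit status."""
    child = subprocess.Popen(cmd)
    # Waiting is the point: if the parent returned right away, the helper
    # could still be running (or complete later) after the parent is gone.
    return child.wait()


# Hypothetical stand-in for the child that performs the mount.
HELPER = [sys.executable, "-c", "import sys; sys.exit(0)"]
status = run_mount_helper(HELPER)
print("helper exited with", status)
```

Because the parent only proceeds after `wait()` returns, any cleanup it does afterwards cannot race with a helper that is still running.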
* [geo-rep] Merging multiple import into single import | kshithijiyer | 2020-04-08 | 9 | -38/+39
    Geo-replication has a large number of repeated imports as shown below:
    ```
    from syncdutils import set_term_handler, finalize, lf
    from syncdutils import log_raise_exception, FreeObject, escape
    ```
    These imports can be clubbed together as shown below:
    ```
    from syncdutils import (set_term_handler, finalize, lf,
                            log_raise_exception, FreeObject, escape)
    ```

    Fixes: #1105
    Change-Id: I59a48dd57a70fc851d93150b85e736ce41e8b793
    Signed-off-by: kshithijiyer <kshithij.ki@gmail.com>
* dht - fixing a permission update issue | Barak Sason Rofman | 2020-04-08 | 4 | -18/+110
    When bringing back a downed brick and performing lookup from the client side, the permissions on said brick aren't updated on the first lookup, but only on the second.

    This patch modifies the permission update logic so the first lookup will trigger a permission update on the downed brick.

    LIMITATIONS OF THE PATCH:
    The choice of source depends on whether the directory has a layout or not. Even the directories on the newly added brick will have a layout xattr [zeroed], but the same is not true for a root directory. Hence, if only the newly added bricks are up in the entire cluster [and the others are down], any change in permission during this time will be overwritten by the older permissions when the cluster is restarted.

    fixes: #999
    Change-Id: Ieb70246d41e59f9cae9f70bc203627a433dfbd33
    Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
* tests: Fix spurious failure of tests/bugs/snapshot/bug-1111041.t | Pranith Kumar K | 2020-04-07 | 1 | -4/+6
    Test should wait for process down notification to be received by glusterd.

    Fixes: #1153
    Change-Id: I9162b58a92c1a909ca98097f14c0714f9086bdd1
    Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* gfapi: Suspend synctasks instead of blocking them | Soumya Koduri | 2020-04-07 | 3 | -2/+50
    There are certain conditions which block the current execution thread (like waiting on a mutex lock, a condition variable, or an I/O response). In such cases, if it is a synctask thread, we should suspend the task instead of blocking it (as done in SYNCOP using synctask_yield).

    This is to avoid a deadlock like the one mentioned below:

    1) synctaskA sets fs->migration_in_progress to 1 and does I/O (LOOKUP)
    2) Other synctask threads wait for fs->migration_in_progress to be reset to 0 by synctaskA and are hence blocked
    3) but synctaskA cannot resume as all synctask threads are blocked on (2).

    Note: this same approach is already used by a few other components like syncbarrier etc.

    Change-Id: If90f870d663bb242c702a5b86ac52eeda67c6f0d
    Fixes: #1146
    Signed-off-by: Soumya Koduri <skoduri@redhat.com>
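The idea of suspending a task rather than blocking its worker can be modeled with cooperative tasks on a single worker. This asyncio sketch is only an analogy, not gfapi or synctask code: every name in it is hypothetical, and `await asyncio.sleep(0)` stands in for the LOOKUP I/O.

```python
import asyncio


async def migrating_task(fs, log):
    """Stands in for synctaskA: sets the flag, does I/O, then clears it."""
    fs["migration_in_progress"] = True
    await asyncio.sleep(0)  # the "I/O": yields the single worker to other tasks
    fs["migration_in_progress"] = False
    fs["migration_done"].set()
    log.append("migration finished")


async def waiting_task(fs, log):
    """Suspends itself until migration ends instead of blocking the worker."""
    await fs["migration_done"].wait()
    log.append("waiter resumed")


async def run_model():
    log = []
    fs = {"migration_in_progress": False, "migration_done": asyncio.Event()}
    await asyncio.gather(migrating_task(fs, log), waiting_task(fs, log))
    return log


print(asyncio.run(run_model()))
```

If the waiter spun in a blocking loop on the flag instead of awaiting, the single worker would never get back to the migrating task, which mirrors step 3 of the deadlock described in the commit.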
* cluster/afr: Removing unsupported options from code base to improve coverage | karthik-us | 2020-04-07 | 3 | -23/+4
    Support for gluster volume heal <volname> info healed/heal-failed was removed by commit bb02cfb56ae08f56df4452c2b948fa962ae1212b in release-3.6. The cli parser displays the usage message in all the supported versions whenever these CLIs are run, leading to some dead code in the latest branches.

    Since support for these CLIs was removed long ago, this should not cause any backward compatibility issues either. Hence removing the dead code from the code base, which will also lead to better code coverage by the regression runs.

    Updates: #1052
    Change-Id: I0c2b061469caf233c06d9699b0d159ce48e240b9
    Signed-off-by: karthik-us <ksubrahm@redhat.com>
* Posix: Optimize posix code to improve file creation | Mohit Agrawal | 2020-04-06 | 9 | -75/+255
    Problem: Before executing a fop, the POSIX xlator builds an internal path based on the GFID. To validate the path it calls the (l)stat system call, and while .glusterfs is heavily loaded the kernel takes time to look up the inode, so performance drops.

    Solution: In this patch we followed two ways to improve the performance.
    1) Keep an open fd for each first level directory (gfid[0]) in .glusterfs. This forces the kernel to keep the inodes of all those directories in cache, so that under memory pressure the kernel won't uncache the first level inodes. We need to open 256 fds per brick to access the entries faster.
    2) Use at-based calls to access relative paths, to reduce path based lookup time.

    Note: To verify the patch we executed a kernel untar 100 times on 6 different clients after enabling the metadata group-cache and some other options. We were getting more than 20 percent improvement in kernel untar after applying the patch.

    Credits: Xavi Hernandez <xhernandez@redhat.com>
    Change-Id: I1643e6b01ed669b2bb148d02f4e6a8e08da45343
    updates: #891
    Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
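The two techniques named above (cached first-level directory fds plus at-based relative lookups) can be sketched with the dir_fd support in Python's os module. This is an illustration under assumptions, not the posix xlator code: the handle layout `<gfid[0:2]>/<gfid[2:4]>/<gfid>` is assumed, the GFID is made up, all function names are hypothetical, and `os.O_DIRECTORY` and `dir_fd` require a POSIX platform such as Linux.

```python
import os
import tempfile


def open_first_level_dirs(handle_root):
    """Open one fd per existing first-level directory (00..ff)."""
    fds = {}
    for i in range(256):
        name = "%02x" % i
        path = os.path.join(handle_root, name)
        if os.path.isdir(path):
            fds[name] = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    return fds


def lstat_handle(fds, gfid):
    """lstat the handle relative to the cached first-level fd (fstatat-style)."""
    relative = os.path.join(gfid[2:4], gfid)
    return os.stat(relative, dir_fd=fds[gfid[:2]], follow_symlinks=False)


def demo():
    gfid = "ab12cdef-0000-4000-8000-000000000001"  # made-up GFID
    with tempfile.TemporaryDirectory() as root:
        leaf = os.path.join(root, gfid[:2], gfid[2:4])
        os.makedirs(leaf)
        with open(os.path.join(leaf, gfid), "wb") as handle:
            handle.write(b"hello")
        fds = open_first_level_dirs(root)
        try:
            return lstat_handle(fds, gfid).st_size
        finally:
            for fd in fds.values():
                os.close(fd)


print(demo())
```

The relative lookup starts at the already-open directory, so the kernel does not have to resolve the brick path and the first-level component again on every call.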
* cluster/ec: Add test for reset-brick command | Ashish Pandey | 2020-04-06 | 1 | -0/+50
    Following tests are done:
    1 - After finishing reset-brick all the bricks should be up.
    2 - Heal should be completed.
    3 - Check number of entries present on brick which was reset.

    Change-Id: I9314bed180293a99d400d94bb8cc7ece999da29e
    Updates: #1144
* mgmt/glusterd: Reduce log level of repetitive log | Vijay Bellur | 2020-04-06 | 1 | -1/+1
    Noticed that the following message repeats quite a bit in log files when an external monitoring tool queries gluster for list of volumes periodically:

    "Received get vol req"

    As there's not much value in having this log message at log level INFO, changing the log level to DEBUG to make glusterd.log a bit quieter.

    Change-Id: I4e791fc65b9a4f813d295e7b2b6a05f3c0782e69
    Updates: #1000
    Signed-off-by: Vijay Bellur <vbellur@redhat.com>
* fuse: Add error-logs to debug bug-1433815-auth-allow.t failures | Pranith Kumar K | 2020-04-06 | 2 | -0/+7
    Fixes: #1149
    Change-Id: I38483fc7d76d7fe0ac9fb649669a46bdf9c82234
    Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* storage/posix: log the ENOENT errors in posix_pstat | Raghavendra Bhat | 2020-04-04 | 1 | -0/+5
    Change-Id: I93f11dae6e4939ab79b0481ead2a4f7bb3085b70
    Fixes: #1142
    Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
* write-behind: fix data corruption | Xavi Hernandez | 2020-04-03 | 3 | -2/+309
    There was a bug in write-behind that allowed a previous completed write to overwrite the overlapping region of data from a future write.

    Suppose we want to send three writes (W1, W2 and W3). W1 and W2 are sequential, and W3 writes at the same offset of W2:

        W2.offset = W3.offset = W1.offset + W1.size

    Both W1 and W2 are sent in parallel. W3 is only sent after W2 completes. So W3 should *always* overwrite the overlapping part of W2.

    Suppose write-behind processes the requests from 2 concurrent threads:

        Thread 1                    Thread 2

        <received W1>
                                    <received W2>
        wb_enqueue_tempted(W1)
        /* W1 is assigned gen X */
                                    wb_enqueue_tempted(W2)
                                    /* W2 is assigned gen X */

                                    wb_process_queue()
                                      __wb_preprocess_winds()
                                        /* W1 and W2 are sequential and all
                                         * other requisites are met to merge
                                         * both requests. */
                                        __wb_collapse_small_writes(W1, W2)
                                        __wb_fulfill_request(W2)

                                      __wb_pick_unwinds() -> W2
                                      /* In this case, since the request is
                                       * already fulfilled, wb_inode->gen
                                       * is not updated. */

                                      wb_do_unwinds()
                                        STACK_UNWIND(W2)

                                    /* The application has received the
                                     * result of W2, so it can send W3. */
                                    <received W3>

                                    wb_enqueue_tempted(W3)
                                    /* W3 is assigned gen X */

                                    wb_process_queue()
                                      /* Here we have W1 (which contains
                                       * the conflicting W2) and W3 with
                                       * same gen, so they are interpreted
                                       * as concurrent writes that do not
                                       * conflict. */
                                      __wb_pick_winds() -> W3

                                      wb_do_winds()
                                        STACK_WIND(W3)

        wb_process_queue()
          /* Eventually W1 will be
           * ready to be sent */
          __wb_pick_winds() -> W1
          __wb_pick_unwinds() -> W1
            /* Here wb_inode->gen is
             * incremented. */

          wb_do_unwinds()
            STACK_UNWIND(W1)

          wb_do_winds()
            STACK_WIND(W1)

    So, as we can see, W3 is sent before W1, which shouldn't happen.

    The problem is that wb_inode->gen is only incremented for requests that have not been fulfilled but, after a merge, the request is marked as fulfilled even though it has not been sent to the brick. This allows future requests to be assigned to the same generation, so they could be internally reordered.

    Solution:

    Increment wb_inode->gen before any unwind, even if it's for a fulfilled request.

    Special thanks to Stefan Ring for writing a reproducer that has been crucial to identify the issue.

    Change-Id: Id4ab0f294a09aca9a863ecaeef8856474662ab45
    Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
    Fixes: #884
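The generation rule described above can be captured in a toy model. This is a heavily simplified sketch, not write-behind code: `WbInodeModel`, its methods, and `w3_shares_gen_with_w1` are hypothetical, and the only thing modeled is when the generation counter is bumped.

```python
class WbInodeModel:
    """Toy stand-in for wb_inode: tracks only the generation counter."""

    def __init__(self, bump_on_fulfilled):
        self.gen = 0
        self.bump_on_fulfilled = bump_on_fulfilled

    def enqueue(self):
        """A new request is tagged with the current generation."""
        return self.gen

    def unwind(self, fulfilled):
        """Old rule: bump gen only for unfulfilled requests. New rule: always."""
        if self.bump_on_fulfilled or not fulfilled:
            self.gen += 1


def w3_shares_gen_with_w1(bump_on_fulfilled):
    inode = WbInodeModel(bump_on_fulfilled)
    gen_w1 = inode.enqueue()
    inode.enqueue()                # W2: merged into W1 and marked fulfilled
    inode.unwind(fulfilled=True)   # W2 is unwound to the application
    gen_w3 = inode.enqueue()       # the application now sends W3
    return gen_w3 == gen_w1        # same gen means W3 may overtake W1


print("old rule:", w3_shares_gen_with_w1(False))
print("new rule:", w3_shares_gen_with_w1(True))
```

Under the old rule W3 lands in the same generation as the still-pending W1, so nothing orders them; under the new rule W3 gets a later generation.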
* features/utime: Don't access frame after stack-wind | Pranith Kumar K | 2020-04-03 | 2 | -15/+52
    Problem:
    frame is accessed after stack-wind. This can lead to crash if the cbk frees the frame.

    Fix:
    Use new frame for the wind instead.

    Updates: #832
    Change-Id: I64754609f1114b0bbd4d1336fa81a56f2cca6e03
    Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* snap_scheduler: python3 compatibility and new test case | Sunny Kumar | 2020-04-03 | 2 | -1/+50
    Problem:
    "snap_scheduler.py init" command failing with the below traceback:

    [root@dhcp43-104 ~]# snap_scheduler.py init
    Traceback (most recent call last):
      File "/usr/sbin/snap_scheduler.py", line 941, in <module>
        sys.exit(main(sys.argv[1:]))
      File "/usr/sbin/snap_scheduler.py", line 851, in main
        initLogger()
      File "/usr/sbin/snap_scheduler.py", line 153, in initLogger
        logfile = os.path.join(process.stdout.read()[:-1], SCRIPT_NAME + ".log")
      File "/usr/lib64/python3.6/posixpath.py", line 94, in join
        genericpath._check_arg_types('join', a, *p)
      File "/usr/lib64/python3.6/genericpath.py", line 151, in _check_arg_types
        raise TypeError("Can't mix strings and bytes in path components") from None
    TypeError: Can't mix strings and bytes in path components

    Solution:
    Added the 'universal_newlines' flag to Popen to support backward compatibility.

    Added a basic test for snapshot scheduler.

    Change-Id: I78e8fabd866fd96638747ecd21d292f5ca074a4e
    Fixes: #1134
    Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
* rpc: Make ssl log more useful | Mohit Agrawal | 2020-04-02 | 1 | -17/+22
    Currently, ssl_setup_connection_params logs 4 messages for every rpc connection, which irritates a user reading the logs. The same info can be printed in a single log message with peerinfo to make it more useful.

    ssl_setup_connection_params also tries to load dh_param even if the user has not configured it, and if a dh_param file is not available it logs a failure message. To avoid the message, load dh_param only when the user has configured it.

    Change-Id: I9ddb57f86a3fa3e519180cb5d88828e59fe0e487
    Fixes: #1141
    Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* afr: mark pending xattrs as a part of metadata heal | Ravishankar N | 2020-04-02 | 2 | -1/+120
    ...if pending xattrs are zero for all children.

    Problem:
    If there are no pending xattrs and a metadata heal needs to be performed, it is possible that we end up with xattrs inadvertently deleted from all bricks, as explained in the BZ.

    Fix:
    After picking one among the sources as the good copy, mark pending xattrs on all sources to blame the sinks. Now even if this metadata heal fails midway, a subsequent heal will still choose one of the valid sources that it picked previously.

    Fixes: #1067
    Change-Id: If1b050b70b0ad911e162c04db4d89b263e2b8d7b
    Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* add clean local after grant lock | l17zhou | 2020-04-01 | 1 | -7/+8
    Found by the flock test: without the correct ref count on the fd, the lock will not be correctly released.

    Fixes: bz#1779089
    Change-Id: I3e466b17c852eb219c8778e43af8ad670a8449cc
    Signed-off-by: l17zhou <cynthia.zhou@nokia-sbell.com>
* Marker: Logically deadcode found by coverity | Hari Gowtham | 2020-03-31 | 1 | -15/+11
    Removed the dead code found by Coverity, CID 1356503.

    Change-Id: Ieaa41e864538fb82dc967b4a214d4db09e267098
    Updates: #1060
    Signed-off-by: Hari Gowtham <hgowtham@redhat.com>