Commit message | Author | Age | Files | Lines
* tests: Increase timeout to 30 minutes to handle lcov slowness | Pranith Kumar K | 2018-08-16 | 1 | -1/+1
| | | | | | | | | | This script on a normal setup takes 15 minutes. With lcov it needs to be increased. Considering we did 1.5X of the default $run_timeout in run-tests.sh, I am doing the same for this script. fixes bz#1614718 Change-Id: Ia571b33ff13deb8cbd8e48561769e876aa0b1aff Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* contrib: Remove gf_mkostemp copied from GLibC | ShyamsundarR | 2018-08-16 | 4 | -114/+13
| | | | | | | | | | | | | | | | | | gf_mkostemp was borrowed from GLibC a long time back; we now have mkostemp or mkstemp alternatives in all distributions and versions that we care about. As a result, this removes it from the contrib directory and modifies the one instance that was still using it. This is part of code cleanup, as we cleaned up coverity SECURE_TEMP errors. Updates: bz#1193929 Change-Id: I1ad7863043cdb0845c53748f5a0522e767079130 Signed-off-by: ShyamsundarR <srangana@redhat.com>
* tests: Fix spurious failures in stats-dump.t test | ShyamsundarR | 2018-08-16 | 1 | -0/+8
| | | | | | | | | | | | | | | | | The test fails to grep and find queue_size in a brick stats dump, despite having successfully found aggr.* values in the same dump. The root cause is that the writer thread in io-stats, which dumps the stats at a fixed interval, truncates the file just before grep attempts to read the contents, hence the failure. The fix is to stop the dumper thread, wait a couple of seconds, and then check the output, so that the writer thread does not interfere with the test. Fixes: bz#1615582 Change-Id: I29f95488a2ad693abe1dd525b1d87a9d1eee29a2 Signed-off-by: ShyamsundarR <srangana@redhat.com>
* performance/md-cache: Use bitwise AND instead of logical AND | Vijay Bellur | 2018-08-16 | 1 | -1/+1
| | | | | | | | Addresses CID: 1394640 Change-Id: I1139222301569d17760df74624acd301594063b9 updates: bz#789278 Signed-off-by: Vijay Bellur <vbellur@redhat.com>
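The difference between the two operators can be seen in a minimal shell-arithmetic sketch. The flag names and values below are hypothetical and are not taken from md-cache; the sketch only relies on shell arithmetic following C semantics for `&` and `&&`.

```shell
# Hypothetical bit flags, for illustration only (not md-cache's real values).
FLAG_STAT=1
FLAG_XATTR=2
flags=$FLAG_XATTR          # only the xattr bit is set

# Logical AND: true whenever both operands are non-zero,
# regardless of whether any bits overlap.
logical=$(( flags && FLAG_STAT ))

# Bitwise AND: non-zero only when the tested bit is actually set in flags.
bitwise=$(( flags & FLAG_STAT ))

echo "logical=$logical bitwise=$bitwise"
```

With a logical AND, the check passes even though the stat bit is not set, which is the kind of defect a static analyzer flags.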
* contrib/xxhash: update to latest xxHash (0.6.5) | Yaniv Kaul | 2018-08-16 | 8 | -499/+730
| | | | | | | | | | | | | | | | | | | | | | Update to latest xxHash, which is supposed to be faster with small keys. Specifically, updated to 3064d42e7d74b0921bdd1818395d9cb37bb8976a, which is a bit later than 0.6.5. Compiled, hopefully, with a namespace (XXH_NAMESPACE=GF_), which allows using the XXH() funcs with no fear that they'll 'leak' from our library. Only compile tested! xxhsum is modified to display messages, which conflicted with the regression tests (TAP harness), so gfid2path_fuse.t and gfid2path_nfs.t were modified to account for that. updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I35cea5cc93f338c1023ac2c9bc6d7d13225a967b
* mountbroker : fix coverity issue in glusterd-mountbroker.c | Sunny Kumar | 2018-08-15 | 1 | -1/+4
| | | | | | | | | Fixes CID : 1124789 updates: bz#789278 Change-Id: I61c70f05e6377d7ddc8961556274714dd356a117 Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
* features/changelog: Fix a resource leak | Vijay Bellur | 2018-08-15 | 1 | -0/+1
| | | | | | | | Fixes CID 1382359 Change-Id: Iaafbdb9a45496091327e3dc9092e09148fa9a5c5 updates: bz#789278 Signed-off-by: Vijay Bellur <vbellur@redhat.com>
* Bash integration script should namespace variables | Mark Mielke | 2018-08-15 | 1 | -20/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | In the original submitted script, it looks like there was effort put into namespacing all global variables. However, a few mistakes remained. GLUSTER_TOP_SUBOPTIONSx were defined, but TOP_SUBOPTIONSx were referenced. This was likely an unrecognized defect in the original code submission? These are now corrected to refer to GLUSTER_TOP_SUBOPTIONSx. FINAL_LIST, LIST, and TOP were leaked into all Bash shells and used by the command completion functions. The most problematic of these was TOP, which was declared with "-i", making it an integer. This caused other code which used TOP to define a path to fail like this: $ bash $ TOP=/abc bash: /abc: syntax error: operand expected (error token is "/abc") These are now qualified as GLUSTER_FINAL_LIST, GLUSTER_LIST, and GLUSTER_TOP to reduce the impact on scripts that might choose to use these extremely common variable names. Change-Id: Ic96eda8efd1f3238bbade6c6ddb69118e8d82158 Fixes: bz#1425325 Signed-off-by: Mark Mielke <mark.mielke@gmail.com>
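The failure described above can be reproduced with a small sketch. It assumes `bash` is on the PATH and runs each case in a child shell, so nothing leaks into the caller. The `declare -i` lines only mimic what the completion script did; they are not copied from it.

```shell
# Case 1: an integer-typed TOP (as the old completion script left behind)
# makes a later path assignment fail with an arithmetic syntax error.
out=$(bash -c 'declare -i TOP; TOP=/abc' 2>&1) && status=0 || status=$?
echo "integer TOP: status=$status"
echo "$out"

# Case 2: with the integer variable namespaced, TOP is an ordinary variable
# and the same assignment works.
path=$(bash -c 'declare -i GLUSTER_TOP=0; TOP=/abc; echo "$TOP"')
echo "namespaced: TOP=$path"
```

Because the integer attribute forces arithmetic evaluation on every assignment, any script that later reuses the name for a string is affected, which is why the prefix matters.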
* glusterd: fix gcc7 warnings | Amar Tumballi | 2018-08-14 | 3 | -22/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | [sh]$ gcc --version gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5) Warnings were of the type below: xlators/mgmt/glusterd/src/glusterd-store.c:3285:33: warning: ‘/options’ directive output may be truncated writing 8 bytes into a region of size between 1 and 4096 [-Wformat-truncation=] snprintf (path, len, "%s/options", conf->workdir); ^~~~~~~~ xlators/mgmt/glusterd/src/glusterd-store.c:1280:39: warning: ‘/snaps/’ directive output may be truncated writing 7 bytes into a region of size between 1 and 4096 [-Wformat-truncation=] snprintf (snap_fpath, len, "%s/snaps/%s/%s", priv->workdir, ^~~~~~~ * Also changed some places where there were issues with key size * Made sure all the 'char buf[SOMESIZE] = {0,};' are changed to 'char buf[SOMESIZE] = "";' - In the files I changed * Also edited coding standard to reflect that. updates: bz#1193929 Change-Id: I04c652624ac63199cea2077e46b3a5def37c3689 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* mgmt/glusterd: Fix possible use after free in glusterd_op_ac_commit_op() | Vijay Bellur | 2018-08-14 | 1 | -1/+3
| | | | | | | | Fixes CID 1391418 Change-Id: I60ce6cd3b2528369f4dc1be81c0c15a1a806982a updates: bz#789278 Signed-off-by: Vijay Bellur <vbellur@redhat.com>
* mgmt/glusterd: Fix buffer length to prevent a memory overrun | Vijay Bellur | 2018-08-14 | 1 | -2/+2
| | | | | | | | Fixes CID 1394647, 1394658 Change-Id: I30cf6e793919a08e0a3fe10622351b8316d7767c updates: bz#789278 Signed-off-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: remove the unused databuf in rebalance structure | Amar Tumballi | 2018-08-14 | 1 | -1/+0
| | | | | | | | | While it is a one-line fix, it avoids a significant amount of unwanted memory being allocated for the defrag structure. Updates: bz#1193929 Change-Id: Idda70d1d3dc0e7be56c35e872aa6edfaf752290d Signed-off-by: Amar Tumballi <amarts@redhat.com>
* features/shard: Fix crash and test case in RENAME fop | Krutika Dhananjay | 2018-08-14 | 2 | -42/+61
| | | | | | | | | | | | | | Setting the refresh flag in inode ctx in shard_rename_src_cbk() is applicable only when the dst file exists and is sharded and has a hard link > 1 at the time of rename. But this piece of code is exercised even when dst doesn't exist. In this case, the mount crashes because local->int_inodelk.loc.inode is NULL. Change-Id: Iaf85a5ee3dff8b01a76e11972f10f2bb9dcbd407 Updates: bz#1611692 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* cluster/dht: Fixed rebalanced files | N Balachandran | 2018-08-14 | 1 | -1/+1
| | | | | | | | | An error caused skipped files to be counted as rebalanced files. Change-Id: I02333f099fb8b73ba953f41a2922021a1e4da7be fixes: bz#1615474 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: fix inode ref management in dht_heal_path | Susant Palai | 2018-08-14 | 1 | -2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In dht_heal_path, the inodes are created & looked up from top to down. If the path is "a/b/c", then lookup will be done on a, then b and so on. Here is a rough snippet of the function "dht_heal_path". <snippet> if (bname) { ref_count - loc.inode = create/grep inode 1 - syncop_lookup (loc.inode) - linked_inode = inode_link (loc.inode) 2 /*clean up current loc*/ - loc_wipe(&loc) 1 /*set up parent and bname for next child */ - loc.parent = inode - bname = next_child_name } out: - inode_ref (linked_inode) 2 - loc_wipe (&loc) 1 </snippet> The problem with the above code is that if _bname_ is empty, i.e. the chain lookup is done, we still populate loc.parent for the next iteration. Now that bname is empty, the loc_wipe is done in the _out_ section as well. Since loc.parent was set to the previous inode, we unintentionally lose a ref. Now a dht_local_wipe as part of the DHT_STACK_UNWIND takes away the last ref, leading to inode_destroy. This problem is currently observed with nfs-ganesha with the nameless lookup. Post the inode_purge, gfapi does not get the new inode to link and hence it links the inode it sent in the lookup fop, which does not have any dht related context (layout), leading to "invalid argument error" in the lookup path done in parallel with the tar operation. Test done in the following way: - create two nfs clients connected to two different nfs servers. - run untar on one client and run lookup continuously on the other. - Prior to this patch, invalid argument was seen, which is fixed with the current patch. Change-Id: Ifb90c178a2f3c16604068c7da8fa562b877f5c61 fixes: bz#1610256 Signed-off-by: Susant Palai <spalai@redhat.com>
* mgmt/glusterd: Fix a memory leak in volgen | Vijay Bellur | 2018-08-14 | 1 | -0/+1
| | | | | | | | Fixes CID 1325557 Change-Id: I5e33ae19ddf4c44a49a2b3b3dea0c739bc96d3a7 updates: bz#789278 Signed-off-by: Vijay Bellur <vbellur@redhat.com>
* All: remove memset() before sprintf() | Yaniv Kaul | 2018-08-14 | 31 | -761/+96
| | | | | | | | | | | | It's not needed. There's a good chance the compiler is smart enough to remove it anyway, but it can't hurt - I hope. Compile-tested only! Change-Id: Id7c054e146ba630227affa591007803f3046416b updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* error-gen, locks: Fix a typo in comments | Vijay Bellur | 2018-08-14 | 2 | -3/+3
| | | | | | | | s/coverty/coverity/ Change-Id: Iac7c13176162eace4247dd3236373aa76d906380 updates: bz#789278 Signed-off-by: Vijay Bellur <vbellur@redhat.com>
* tests: Fix for gfid-mismatch-resolution-with-fav-child-policy.t failure | karthik-us | 2018-08-14 | 1 | -0/+1
| | | | | | | | | | | | | | | | | | | This test was retried once on build https://build.gluster.org/job/regression-on-demand-multiplex/174/ (logs for the first try are not available with this build). The test case was failing at line #47, where it was checking for the heal count to be 0. Line #51 had passed, which means the gfid split brain of the file got resolved and both the bricks had the same gfids. At line #54 it failed again; this line checks the md5sum on both the bricks. At this point the md5sum on the brick where the file got impunged was the same as that of the newly created empty file, which means the data heal had not happened for the file. At line #64, enabling granular-entry-heal failed, but without the logs it is not possible to debug this issue. Change-Id: I56d854dbb9e188cafedfd24a9d463603ae79bd06 fixes: bz#1615331 Signed-off-by: karthik-us <ksubrahm@redhat.com>
* libglusterfs: Fix a resource leak in graph.c | Vijay Bellur | 2018-08-13 | 1 | -0/+1
| | | | | | | | Fixes CID 1382367 Change-Id: I02678fc71716ab0046ea2ef437c6594a8a34a4fc updates: bz#789278 Signed-off-by: Vijay Bellur <vbellur@redhat.com>
* cloudsync: fix -Werror=format-truncation error on gcc8 | Susant Palai | 2018-08-13 | 1 | -13/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Here is the gcc8 warning: libcloudsyncs3.c: In function ‘aws_download_s3’: libcloudsyncs3.c:480:48: error: ‘%s’ directive output may be truncated writing up to 4095 bytes into a region of size 1015 [-Werror=format-truncation=] snprintf(buf, sizeof(buf), "https://%s/%s", priv->hostname, resource); libcloudsyncs3.c:480:9: note: ‘snprintf’ output 10 or more bytes (assuming 4105) into a destination of size 1024 snprintf(buf, sizeof(buf), "https://%s/%s", priv->hostname, resource); Memleak: It fixes a memleak as well where sign_req in fn: aws_form_request was not freed. Adjusted the calloc size for sign_req as well to match with the demand. Test: Have tested the local cloudsync regression test to validate the changes. Smoke validation will be sufficient for the gcc8 warning fixes. Fixes: bz#1609126 Change-Id: I1c537b30168f2e0b54862344a951843e86b0b488 Signed-off-by: Susant Palai <spalai@redhat.com>
* tests: fix brick check orders | Atin Mukherjee | 2018-08-13 | 9 | -43/+66
| | | | | | | | | | | | fix brick checks for validating-server-quorum.t & quorum-validation.t ...and make brick_up_status_1 function more generic. Also fix a timing issue in bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t Change-Id: I797ef4cec5b160aafa979bae7151b1e99fcb48ac Updates: bz#1603063 Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* posix: Mark 'shared-brick-count' as settable | Prashanth Pai | 2018-08-13 | 1 | -0/+1
| | | | | | updates: #302 Change-Id: I9c1b9c9751c21866b074ac5d3ef15a58ae7aa707 Signed-off-by: Prashanth Pai <ppai@redhat.com>
* Fix a grammar error in the logs | Nigel Babu | 2018-08-13 | 1 | -1/+1
| | | | | | Change-Id: Ie4fe18d5094c051fa20de71f7fc841085cc6aaee Fixes: bz#1614142 Signed-off-by: Nigel Babu <nigelb@redhat.com>
* coverity: last of the secure temp fixes | ShyamsundarR | 2018-08-13 | 1 | -3/+1
| | | | | | | | | | | | | | | | | | Coverity ignore directive is not working if the comment is split across lines (or has an empty line at the end). This can be seen in this report: https://download.gluster.org/pub/gluster/glusterfs/static-analysis/master/glusterfs-coverity/2018-08-06-b982e09f/html/1/384glusterfsd-mgmt.c.html#error In other places the same pattern has prevented coverity from flagging the same call, except here. Updates: bz#789278 Change-Id: Ic35ff0fc91d0a42904630728ef7c18215aa277f3 Signed-off-by: ShyamsundarR <srangana@redhat.com>
* tests/quick-read/bug-846240.t: fix a wrong test | Raghavendra G | 2018-08-13 | 1 | -2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Earlier this test did following things on M0 and M1 mounted on same volume: 1 create file M0/testfile 2 open an fd on M0/testfile 3 remove the file from M1, M1/testfile 4 echo "data" >> M0/testfile The test expects appending data to M0/testfile to fail. However, redirector ">>" creates a file if it doesn't exist. So, the only reason test succeeded was due to lookup succeeding due to stale stat in md-cache. This hypothesis is verified by two experiments: * Add a sleep of 10 seconds before append operation. md-cache cache expires and lookup fails followed by creation of file and hence append succeeds to new file. * set md-cache timeout to 600 seconds and test never fails even with sleep 10 before append operation. Reason is stale stat in md-cache survives sleep 10. So, the spurious nature of failure was dependent on whether lookup is done when stat is present in md-cache or not. The actual test should've been to write to the fd opened in step 2 above. I've changed the test accordingly. Note that this patch also remounts M0 after initial file creation as open-behind disables opening-behind on witnessing a setattr on the inode and touch involves a setattr. On remount, create operation is not done and hence file is opened-behind. Change-Id: I739f255e0a62ff0024f0824dad3539974955df99 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Fixes: bz#1615096
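The shell behaviour behind this fix can be sketched on a local filesystem. This is an illustration only: it uses a temporary directory instead of the two Gluster mounts M0 and M1, and fd number 5 is an arbitrary choice. What the fd-based write returns on a Gluster mount is exactly what the corrected test checks, and it is not shown here.

```shell
dir=$(mktemp -d)
f="$dir/testfile"

echo "initial" > "$f"      # step 1: create the file
exec 5>>"$f"               # step 2: open an fd on it (append mode)
rm -f "$f"                 # step 3: remove the file by path

# Step 4 as the old test did it: a path-based append. The ">>" redirection
# silently creates a new file, so this cannot detect the removal.
echo "data" >> "$f"

# What the corrected test does instead: write through the fd from step 2.
# On a local filesystem this goes to the unlinked inode, not to the new file.
if echo "more" >&5; then fd_write=ok; else fd_write=failed; fi
exec 5>&-

path_content=$(cat "$f")
echo "path_content=$path_content fd_write=$fd_write"
rm -rf "$dir"
```

The recreated file contains only the path-based write, which shows that the two operations target different files once the original has been removed.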
* cluster/afr: Fix bug-1433571-undo-pending-only-on-up-bricks.t | karthik-us | 2018-08-13 | 1 | -2/+2
| | | | | | | | | | | | | | Problem: The test case was checking for the entry pending marker reset on the root after performing client side lookup at line #60-63. But sometimes the entry heal was not getting completed immediately. Fix: Wait for the entry heal to complete before checking the changelog. Change-Id: I42fde21b04a126ab044ce58373a996d72f125d96 fixes: bz#1614730 Signed-off-by: karthik-us <ksubrahm@redhat.com>
* tests: potential fixes to bugs/replicate/bug-1408712.t | Ravishankar N | 2018-08-13 | 1 | -2/+15
| | | | | | | | See BZ for details. Change-Id: I2cc2064f14d80271ebcc21747103ce4cee848cbf fixes: bz#1615078 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* tests: fix replace-brick-self-heal.t failure | Ravishankar N | 2018-08-13 | 1 | -1/+1
| | | | | | | | Please see BZ for details. Change-Id: Id9273432874bc6a452ac96b2b8c7a61ea6c5b98d Fixes: bz#1615239 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* gfapi : Handle the path == "" glfs_resolve_at | Jiffin Tony Thottan | 2018-08-13 | 1 | -7/+10
| | | | | | | | | | | Currently there is no check for path = "" in glfs_resolve_at. So if the application sends an empty path, the function resolves to the parent inode, which is incorrect. Also replaced "path" with "origpath" where possible in the same function. Change-Id: Ie5ff9ce4b771607b7dbb3fe00704fe670421792a fixes: bz#1610236 Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com>
* performance/quick-read: handle rollover of generation counter | Raghavendra G | 2018-08-13 | 2 | -36/+108
| | | | | | Change-Id: I37a6e0efda430b70d03dd431c35bef23b3d16361 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Updates: bz#1512691
* tests: fix tests/bugs/shard/configure-lru-limit.t | Atin Mukherjee | 2018-08-13 | 1 | -0/+4
| | | | | | | | Check for the bricks to be up before attempting to mount. Change-Id: I1224908137016df3007f4467aa9760967ce0694d Fixes: bz#1615092 Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* tests: potential fixes for tests/basic/afr/add-brick-self-heal.t | Ravishankar N | 2018-08-13 | 1 | -0/+7
| | | | | | | | Please see bug description for details. Change-Id: Ieb6bce6d1d5c4c31f1878dd1a1c3d007d8ff81d5 fixes: bz#1614654 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* geo-rep: Fix deadlock during worker start | Kotresh HR | 2018-08-13 | 2 | -4/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | Analysis: Monitor process spawns monitor threads (one per brick). Each monitor thread forks worker and agent processes. Each monitor thread, while initializing, updates the monitor status file. It is synchronized using flock. The race is that some thread can fork a worker while another thread has the status file open, resulting in the worker process holding a reference to the fd. Cause: flock gets unlocked either by specifically unlocking it or by closing all duplicate fds referring to the file. The code was relying on fd close, hence a reference in the worker/agent process by fork could cause the deadlock. Fix: 1. flock is unlocked specifically. 2. Also made sure to update the status file in appropriate places so that the reference is not leaked to the worker/agent process. With this fix, both the deadlock and the possible fd leaks are solved. fixes: bz#1614799 Change-Id: I0d1ce93072dab07d0dbcc7e779287368cd9f093d Signed-off-by: Kotresh HR <khiremat@redhat.com>
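The locking behaviour described in the analysis can be sketched with the util-linux `flock` utility, assuming it is installed. This is not geo-rep's Python code: the background `sleep` merely stands in for a forked worker that inherits the lock fd, and fd number 9 is an arbitrary choice.

```shell
lock=$(mktemp)

try_lock() {
    # Open the file afresh and attempt a non-blocking exclusive lock.
    if flock -n -x "$lock" true; then echo released; else echo still_locked; fi
}

# Case 1: rely on close() alone to drop the lock.
exec 9>"$lock"
flock -x 9
sleep 5 >/dev/null 2>&1 &      # stand-in for a forked worker; it inherits fd 9
worker=$!
exec 9>&-                      # parent closes its copy; the worker still holds one
after_close=$(try_lock)
kill "$worker"; wait "$worker" 2>/dev/null || true

# Case 2: unlock explicitly before closing, as the fix does.
exec 9>"$lock"
flock -x 9
sleep 5 >/dev/null 2>&1 &
worker=$!
flock -u 9                     # explicit unlock releases it for every duplicate fd
exec 9>&-
after_unlock=$(try_lock)
kill "$worker"; wait "$worker" 2>/dev/null || true

echo "after_close=$after_close after_unlock=$after_unlock"
rm -f "$lock"
```

In the first case the lock outlives the parent's close for as long as the child keeps its inherited copy of the fd, which is the condition that led to the deadlock.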
* glusterd: compare friend data within mutex | Atin Mukherjee | 2018-08-13 | 3 | -41/+48
| | | | | | | | | | | | | | | | | | | | | During friend handshake, if glusterd receives more than one friend update, it is quite possible that two threads end up working on two different volinfo references, and glusterd might end up updating the store with an old volinfo reference. While debugging a glusterd crash from the validating-server-quorum.t test file in the line-coverage regression, the same was observed. The solution is to run glusterd_compare_friend_data under a mutex. Test: As the crash was more visible in the line-coverage run (given lcov does some instrumentation and exposes the races), 6 manual lcov runs were triggered starting from https://build.gluster.org/job/line-coverage/443 to https://build.gluster.org/job/line-coverage/449/ and no crash was observed from validating-server-quorum.t Change-Id: I86fce473a76fd24742d51bf17a685d28b90a8941 Fixes: bz#1603063 Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* tests: Fix cleanup routine for some mux tests | ShyamsundarR | 2018-08-13 | 4 | -10/+7
| | | | | | | | | | | | | | | | | Some of the mux tests, set a trap to catch test exit and call cleanup. This will cause cleanup to be invoked twice in case the test times out, or even otherwise, as include.rc also sets a trap to cleanup on exit (TERM and others). This leads to the tarballs generated on failures for these tests to be empty and does not aid debugging. This patch corrects this pattern across the tests to the more standard cleanup at the end. Fixes: bz#1615037 Change-Id: Ib83aeb09fac2aa591b390b9fb9e1f605bfef9a8b Signed-off-by: ShyamsundarR <srangana@redhat.com>
* Make sure EXPECT_WITHIN executes the statement multiple times | Pranith Kumar K | 2018-08-12 | 2 | -6/+14
| | | | | | | | | | | | | When we pass a command to be executed in EXPECT_WITHIN using ``, the result is passed by value, so if the first execution gives a result that is different from the expected value, the EXPECT_WITHIN test will fail because the command will not be re-evaluated. Changed the expression with `` to a function. Added sleep(3) in afr.c for reconfigure, both to root-cause (RC) the failure and to re-test after the change. fixes bz#1614662 Change-Id: I3bc8a75b996729261aa48067f6ed8da9c6273b13 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
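The pass-by-value problem can be shown with a small sketch. `retry_until` is a hypothetical stand-in, not the real EXPECT_WITHIN from the test framework, and `next_value` simulates a status query whose answer changes over time. `$(...)` is used here as the equivalent of backticks.

```shell
state=$(mktemp)
echo 0 > "$state"

# Simulated status query: prints 1, 2, 3, ... on successive calls.
next_value() {
    n=$(( $(cat "$state") + 1 ))
    echo "$n" > "$state"
    echo "$n"
}

# Hypothetical retry helper: re-runs "$@" up to 5 times until it prints $1.
retry_until() {
    expected=$1; shift
    tries=0
    while [ "$tries" -lt 5 ]; do
        if [ "$("$@")" = "$expected" ]; then return 0; fi
        tries=$(( tries + 1 ))
    done
    return 1
}

# Pass-by-value: the substitution runs once, before the helper is called,
# so every retry re-checks the same stale "1".
if retry_until 3 echo "$(next_value)"; then by_value=pass; else by_value=fail; fi

# Passing a function name: the helper re-evaluates it on every attempt.
echo 0 > "$state"
if retry_until 3 next_value; then by_function=pass; else by_function=fail; fi

echo "by_value=$by_value by_function=$by_function"
rm -f "$state"
```

Only the second form gives the retry loop a chance to observe a changed result, which matches the change from a backtick expression to a function.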
* glusterd: Compare volume_id before start/attach a brick | Mohit Agrawal | 2018-08-10 | 2 | -24/+32
| | | | | | | | | | | | | | Problem: After a node reboot, a brick does not come up because the fsid comparison fails before the brick is started. Solution: Instead of comparing fsid, compare volume_id, because fsid changes after a node reboot whereas volume_id persists as an xattr on the brick_root path from the time the volume is created. Change-Id: Ic289aab1b4ebfd83bbcae8438fee26ae61a0fff4 fixes: bz#1612418 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* core: Update condition in get_xlator_by_name_or_type | Mohit Agrawal | 2018-08-10 | 1 | -1/+1
| | | | | | | | | | | | | | | | Problem: Sometimes a client connection fails after throwing the error "cleanup flag is set for xlator Try again later". This happens only when a detach request has been received but the brick stack is not completely detached, and at the same time the client initiates a connection with the brick. Solution: To resolve this, check the cleanup_starting flag in get_xlator_by_name_or_type; this function is called by server_setvolume to attach a client to a brick. Change-Id: I3720e42642fe495dd05211e2ed2cb976db9e231b fixes: bz#1614124 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* tests: kill_brick should wait for brick status to become offline | Atin Mukherjee | 2018-08-10 | 1 | -10/+10
| | | | | | Change-Id: I52e8eec7f334af37de433c444f4ddfc876fa56cc Fixes: bz#1614088 Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* tests: Add ability to preserve older tarball for retried tests | ShyamsundarR | 2018-08-09 | 1 | -0/+39
| | | | | | | | | | | | | | | | | | | When a test is retried, the cleanup directives overwrite the older tarball with the latest one, thus losing the logs from the failed run. This patch changes run-tests.sh to rename the older tarball when retrying a test, thus preserving the same. The tarball is renamed using a time stamp and optionally a trailing sequence number, in case the test fails within the very second. Although the sequence # is not strictly required as we retry only once, it provides a defence for any future enhancements to the same. Fixes: bz#1614062 Change-Id: I9afe486b0b6f6a26f2ad0642e38bc0ba15b3ecc9 Signed-off-by: ShyamsundarR <srangana@redhat.com>
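A rename scheme of the kind described above could look like the following sketch. The function name `preserve_tarball`, the timestamp format, and the optional second argument (a fixed stamp, present only to make the example deterministic) are assumptions for illustration; the actual code in run-tests.sh may differ.

```shell
# Hypothetical helper: rename an existing tarball so a retry cannot overwrite it.
preserve_tarball() {
    tarball=$1
    stamp=${2:-$(date +%Y%m%d-%H%M%S)}   # optional fixed stamp, for testing only
    [ -f "$tarball" ] || return 0        # nothing to preserve
    base=${tarball%.tar}
    saved="$base-$stamp.tar"
    n=0
    # If a tarball with the same time stamp already exists (a failure within
    # the very same second), append a trailing sequence number.
    while [ -e "$saved" ]; do
        n=$(( n + 1 ))
        saved="$base-$stamp-$n.tar"
    done
    mv "$tarball" "$saved"
    echo "$saved"
}

dir=$(mktemp -d)
: > "$dir/test.tar"
first=$(preserve_tarball "$dir/test.tar" 20180809-120000)
: > "$dir/test.tar"
second=$(preserve_tarball "$dir/test.tar" 20180809-120000)
echo "${first##*/}"
echo "${second##*/}"
rm -rf "$dir"
```

The sequence number only comes into play on a name collision, so the common single-retry case produces a plain time-stamped name.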
* tests: Set heal-timeout to 5 seconds | Pranith Kumar K | 2018-08-09 | 1 | -0/+1
| | | | | | | | | | | | | | | | | | | | | | Shd keeps doing heals in a loop until it heals at least one entry in the previous run. A heal is termed successful only if it does both the metadata heal and the entry/data heal, i.e. the entry needs to be completely healed by just that healer. In the tests/basic/afr/granular-esh/replace-brick.t test, brick-0 is old and brick-1 is new. After replace-brick, only root-gfid will be present in brick-0's index. 1) The shd-thread corresponding to brick-0 does the metadata heal; this creates root-gfid in brick-0's 'dirty' index. 2) Both healer threads, corresponding to brick-0 and brick-1, now try to heal root-gfid, and brick-1 gets the heal-domain lock. brick-0's shd-thread will experience a failure and goes back to waiting for 10 minutes (cluster.heal-timeout). 3) When brick-1's healer-thread completes healing root-gfid, it creates 5 files which create indices in brick-0, so until brick-0 triggers one more heal, the heal won't happen. $HEAL_TIMEOUT is set at 120 seconds, which is less than cluster.heal-timeout, so cluster.heal-timeout is decreased to 5 seconds so that the next heal, which will do the heals, is triggered in time. fixes bz#1613807 Change-Id: I881133fc28880d8615fbc4558a0dfa0dc63d7798 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* tests: Increase timeout for mpx restart crash test | ShyamsundarR | 2018-08-09 | 1 | -3/+6
| | | | | | | | | | | | | | | | | | | | In lcov based regression testing environments, all tests take more time than what occurs in centos7 regressions. Possibly due to code instrumentation for lcov purposes. Due to this the test, bug-1432542-mpx-restart-crash.t constantly times out. This patch increases the timeout for the same to enable lcov tests to pass on a more regular basis. It was also noted by Nithya that the test at times generated an OOM kill on the regression machines. In order to reduce runtime memory foot print of the tests, FUSE mounts are unmounted as soon as the required test is complete. Fixes: bz#1608568 Change-Id: I37f8d4b45807a69c52c7c7df4923c0fc33fab4e4 Signed-off-by: ShyamsundarR <srangana@redhat.com>
* glusterd: more stricter checks of if brick is running in multiplex mode | Atin Mukherjee | 2018-08-09 | 1 | -32/+39
| | | | | | | | | | | | | | | | | | | | | While the gf_attach () utility, which the kill_brick () function in tests/volume.rc uses, can help in detaching a brick instance from the brick process, it has the following caveats: 1. It doesn't ensure the respective brick is marked as stopped, which glusterd does from glusterd_brick_stop. 2. Sometimes, if kill_brick () is executed just after a brick stack is up, mgmt_rpc_notify () can take some time before marking priv->connected to 1, and if kill_brick () is executed before that, the brick will fail to initiate the pmap_signout, which in turn cleans up the pidfile. To avoid such possibilities, a stricter check on whether a brick is running in brick multiplexing mode has been brought in: it not only checks for the pid's existence but also checks that the respective process has the brick instance associated with it before checking the brick's status. Change-Id: I98b92df949076663b9686add7aab4ec2f24ad5ab Fixes: bz#1595320 Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* tests/bitrot: Fix tests/bitrot/bug-1373520.t | Kotresh HR | 2018-08-09 | 1 | -4/+13
| | | | | | | | | | | The test was failing with brick-mux enabled intermittently. As the test depends on lookup to recover file via heal, it's advisable to disable all perf xlators. Hence doing the same. fixes: bz#1611566 Change-Id: Ib7705e7951d53c435b8e390298164d73c6d71745 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* MAINTAINERS: Add Xavier Hernandez as peer for shard xlator | Krutika Dhananjay | 2018-08-07 | 1 | -0/+1
| | | | | | | | | | | | Shard module never had a peer, although Pranith reviewed most of the patches. Over the past few months, Xavier has reviewed shard patches - both big and small - and also found some great bugs in his reviews of some complex patches. Proposing that we add him as peer for shard translator. Change-Id: I29487052673f3738340764aa63bdd7586fb28def fixes: bz#1612017 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* tests: Add timeout option to run-tests.sh | ShyamsundarR | 2018-08-06 | 1 | -1/+2
| | | | | | | | | | | Added a '-t' timeout option to run-tests.sh, to be able to set this to higher than the default 200 in case of lcov based tests, as those take more time due to instrumentations added by lcov. Change-Id: Ibaf70e881bfa94f35e822124bcf9849b309e7cc1 Updates: bz#1608564 Signed-off-by: ShyamsundarR <srangana@redhat.com>
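As a rough sketch of how such an option can be handled, the following uses the POSIX `getopts` builtin with the default of 200 mentioned above. The function name `parse_timeout` is hypothetical, and run-tests.sh may parse its options differently; only the `-t` flag and the default value come from the commit message.

```shell
# Hypothetical option parser: prints the effective per-test timeout.
parse_timeout() {
    run_timeout=200            # default, as stated in the commit message
    OPTIND=1                   # reset so the function can be called repeatedly
    while getopts "t:" opt; do
        case "$opt" in
            t) run_timeout=$OPTARG ;;
            *) return 1 ;;
        esac
    done
    echo "$run_timeout"
}

default_timeout=$(parse_timeout)
lcov_timeout=$(parse_timeout -t 300)
echo "default=$default_timeout lcov=$lcov_timeout"
```

A slower, instrumented run can then raise the limit from the command line without editing the script.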
* performance/quick-read: don't update with stale data after invalidation | Raghavendra G | 2018-08-04 | 2 | -44/+233
| | | | | | | | | | | | Once invalidated, make sure that only ops incident after invalidation update the cache. This makes sure that ops before invalidation don't repopulate cache with stale data. This patch also uses an internal counter instead of frame->root->unique for keeping track of generations. Change-Id: I6b38b141985283bd54b287775f3ec67b88bf6cb8 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Updates: bz#1512691
* tests: fix online_brick_count function | Atin Mukherjee | 2018-08-03 | 1 | -1/+4
| | | | | | | | online_brick_count should discard Bitrot and Scrubber daemon. Change-Id: I301373ccdbeec1d1a5e6c6b137f48ed997f22556 Fixes: bz#1611103 Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* posix: prevent crash when SEEK_DATA/HOLE is not supported | Xavi Hernandez | 2018-08-03 | 2 | -4/+4
| | | | | | | | | Instead of not defining the 'seek' fop when it's not supported on the compilation platform, we simply return EINVAL when it's used. Fixes: bz#1611834 Change-Id: I253666d8910c5e2fffa3a3ba37085e5c1c058a8e Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>