summaryrefslogtreecommitdiffstats
path: root/tests/bugs/replicate
Commit message (Collapse)AuthorAgeFilesLines
* cluster/afr: Undo pending xattrs only on the up brickskarthik-us2017-03-271-0/+89
| | | | | | | | | | | | | | | | | | | | | Problem: While doing conservative merge, even if a brick is down, it will reset the pending xattr on that. When that brick comes up, as part of the heal, it will consider this brick as the source and removes the entries on the other bricks, which leads to data loss. Fix: Undo pending only for the bricks which are up. Change-Id: I18436fa0bb1faa5f60531b357dea3f6b20446303 BUG: 1433571 Signed-off-by: karthik-us <ksubrahm@redhat.com> Reviewed-on: https://review.gluster.org/16913 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com>
* afr: all children of AFR must be up to resolve s-brainRavishankar N2017-02-091-0/+66
| | | | | | | | | | | | | | | | | | | | | | | | Problem: The various split-brain resolution policies (favorite-child-policy based, CLI based and mount (get/setfattr) based) attempt to resolve split-brain even when not all bricks of replica are up. This can be a problem when say in a replica 3, the only good copy is down and the other 2 bricks are up and blame each other (i.e. split-brain). We end up healing the file in such a case and allow I/O on it. Fix: A decision on whether the file is in split-brain or not must be taken only if we are able to examine the afr xattrs of *all* bricks of a given replica. Change-Id: Icddb1268b380005799990f5379ef957d84639ef9 BUG: 1417522 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: https://review.gluster.org/16476 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* core: run many bricks within one glusterfsd processJeff Darcy2017-01-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | This patch adds support for multiple brick translator stacks running in a single brick server process. This reduces our per-brick memory usage by approximately 3x, and our appetite for TCP ports even more. It also creates potential to avoid process/thread thrashing, and to improve QoS by scheduling more carefully across the bricks, but realizing that potential will require further work. Multiplexing is controlled by the "cluster.brick-multiplex" global option. By default it's off, and bricks are started in separate processes as before. If multiplexing is enabled, then *compatible* bricks (mostly those with the same transport options) will be started in the same process. Change-Id: I45059454e51d6f4cbb29a4953359c09a408695cb BUG: 1385758 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: https://review.gluster.org/14763 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* tier : Tier as a servicehari gowtham2017-01-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tierd is implemented by separating from rebalance process. The commands affected: 1) Attach tier will trigger this process instead of old one 2) tier start and tier start force will also trigger this process. 3) volume status [tier] will show tier daemon as a process instead of task and normal tier status and tier detach status works. 4) tier stop implemented. 5) detach tier implemented separately along with new detach tier status 6) volume tier volname status will work using the changes. 7) volume set works This patch has separated the tier translator from the legacy DHT rebalance code. It now sends the RPCs from the CLI to glusterd separate to the DHT rebalance code. The daemon is now a service, similar to the snapshot daemon, and can be viewed using the volume status command. The code for the validation and commit phase are the same as the earlier tier validation code in DHT rebalance. The “brickop” phase has been changed so that the status command can use this framework. The service management framework is now used. DHT rebalance does not use this framework. This service framework takes care of : *) spawning the daemon, killing it and other such processes. *) volume set options , which are written on the volfile. *) restart and reconfigure functions. Restart is to restart the daemon at two points 1)after gluster goes down and comes up. 2) to stop detach tier. *) reconfigure is used to make immediate volfile changes. By doing this, we don’t restart the daemon. it has the code to rewrite the volfile for topological changes too (which comes into place during add and remove brick). With this patch the log, pid, and volfile are separated and put into respective directories. Change-Id: I3681d0d66894714b55aa02ca2a30ac000362a399 BUG: 1313838 Signed-off-by: hari gowtham <hgowtham@redhat.com> Reviewed-on: http://review.gluster.org/13365 Smoke: Gluster Build System <jenkins@build.gluster.org> Tested-by: hari gowtham <hari.gowtham005@gmail.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* cluster/afr: Fix missing name indices due to EEXIST errorKrutika Dhananjay2016-12-271-0/+87
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PROBLEM: Consider a volume with granular-entry-heal and sharding enabled. When a replica is down and a shard is created as part of a write, the name index is correctly created under indices/entry-changes/<dot-shard-gfid>. Now when a read on the same region triggers another MKNOD, the fop fails on the online bricks with EEXIST. By virtue of this being a symmetric error, the failed_subvols[] array is reset to all zeroes. Because of this, before post-op, the GF_XATTROP_ENTRY_OUT_KEY will be set, causing the name index, which was created in the previous MKNOD operation, to be wrongly deleted in THIS MKNOD operation. FIX: The ideal fix would have been for a transaction to delete the name index ONLY if it knows it is the one that created the index in the first place. This would involve gathering information as to whether THIS xattrop created the index from individual bricks, aggregating their responses and based on the various posisble combinations of responses, decide whether to delete the index or not. This is rather complex. Simpler fix would be for post-op to examine local->op_ret in the event of no failed_subvols to figure out whether to delete the name index or not. This can occasionally lead to creation of stale name indices but they won't be affecting the IO path or mess with pending changelogs in any way and self-heal in its crawl of "entry-changes" directory would take care to delete such indices. Change-Id: Ic1b5257f4dc9c20cb740a866b9598cf785a1affa BUG: 1408712 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/16286 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* tests: Fix spurious failure in tests/bugs/replicate/bug-1402730.tKrutika Dhananjay2016-12-211-2/+2
| | | | | | | | | | | | | | | Replace the EXPECT '00000001' with EXPECT_NOT '00000000'. This is because occasionally a name-heal is performing new-entry marking on 'c' causing the pending entry changelog on it to become '00000002'. Change-Id: I30916e6266534d18899cfa5771c892db8c51ad9a BUG: 1405902 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/16193 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: Fix per-txn optimistic changelog initialisationKrutika Dhananjay2016-12-121-0/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Incorrect initialisation of local->optimistic_change_log was leading to skipped pre-op and post-op even when a brick didn't participate in the txn because it was down. The result - missing granular name index resulting in some entries never getting healed. FIX: Initialise local->optimistic_change_log just before pre-op. Also fixed granular entry heal to create the granular name index in pre-op as opposed to post-op. This is to prevent loss of granular information when during an entry txn, the good (src) brick goes offline before the post-op is done. This would cause self-heal to do conservative merge (since dirty xattr is the only information available), which when granular-entry-heal is enabled, expects granular indices, the lack of which can lead to loss of data in the worst case. Change-Id: Ia3ad716d6fb1821555f02180e86e8711a79f958d BUG: 1402730 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/16075 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* afr: allow I/O when favorite-child-policy is enabledRavishankar N2016-11-271-0/+82
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Currently, I/O on a split-brained file fails even when the favorite-child-policy is set until the self-heal is complete. Fix: If a valid 'source' is found using the set favorite-child-policy, inspect and reset the afr pending xattrs on the 'sinks' (inside appropriate locks), refresh the inode and then proceed with the read or write transaction. The resetting itself happens in the self-heal code and hence can also happen in the client side background-heal or by the shd's index-heal in addition to the txn code path explained above. When it happens in via heal, we also add checks in undo-pending to not reset the sink xattrs again. Change-Id: Ic8c1317720cb26bd114b6fe6af4e58c73b864626 BUG: 1386188 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reported-by: Simon Turcotte-Langevin <simon.turcotte-langevin@ubisoft.com> Reviewed-on: http://review.gluster.org/15673 Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* glusterd: "gluster v heal test statistics heal-count replica" output is not ↵Mohit Agrawal2016-09-261-0/+25
| | | | | | | | | | | | | | | | | | | | correct Problem : "gluster v heal test statistcs heal-count replica" does not show correct output. Solution: After update condition (match brick name) in _select_hxlator_with_matching_brick, it shows correct output. BUG: 1325792 Change-Id: I60cc7c68ea70bce267a747570f91dcddbc1d9016 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Reviewed-on: http://review.gluster.org/15494 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd : Introduce reset brickAnuradha Talur2016-08-291-0/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The command basically allows replace brick with src and dst bricks as same. Usage: gluster v reset-brick <volname> <hostname:brick-path> start This command kills the brick to be reset. Once this command is run, admin can do other manual operations that they need to do, like configuring some options for the brick. Once this is done, resetting the brick can be continued with the following options. gluster v reset-brick <vname> <hostname:brick> <hostname:brick> commit {force} Does the job of resetting the brick. 'force' option should be used when the brick already contains volinfo id. Problem: On doing a disk-replacement of a brick in a replicate volume the following 2 scenarios may occur : a) there is a chance that reads are served from this replaced-disk brick, which leads to empty reads. b) potential data loss if next writes succeed only on replaced brick, and heal is done to other bricks from this one. Solution: After disk-replacement, make sure that reset-brick command is run for that brick so that pending markers are set for the brick and it is not chosen as source for reads and heal. But, as of now replace-brick for the same brick-path is not allowed. In order to fix the above mentioned problem, same brick-path replace-brick is needed. With this patch reset-brick commit {force} will be allowed even when source and destination <hostname:brickpath> are identical as long as 1) destination brick is not alive 2) source and destination brick have the same brick uuid and path. Also, the destination brick after replace-brick will use the same port as the source brick. Change-Id: I440b9e892ffb781ea4b8563688c3f85c7a7c89de BUG: 1266876 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/12250 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Ashish Pandey <aspandey@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* tests: change EXPECT_WITHIN timeoutsAnuradha Talur2016-08-281-1/+1
| | | | | | | | | | | | | | | | | Use defined HEAL and PROCESS_UP timeouts rather than hard code them in self-heald.t. Change-Id: I21586811904c8417b7208bb643f14dff20dc4832 BUG: 1370074 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/15316 Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Ravishankar N <ravishankar@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: Prevent split-brain when bricks are brought off and on in ↵Krutika Dhananjay2016-08-221-0/+112
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cyclic order When the bricks are brought offline and then online in cyclic order while writes are in progress on a file, thanks to inode refresh in write txns, AFR will mostly fail the write attempt when the only good copy is offline. However, there is still a remote possibility that the file will run into split-brain if the brick that has the lone good copy goes offline *after* the inode refresh but *before* the write txn completes (I call it in-flight split-brain in the patch for ease of reference), requiring intervention from admin to resolve the split-brain before the IO can resume normally on the file. To get around this, the patch does the following things: i) retains the dirty xattrs on the file ii) avoids marking the last of the good copies as bad (or accused) in case it is the one to go down during the course of a write. iii) fails that particular write with the appropriate errno. This way, we still have one good copy left despite the split-brain situation which when it is back online, will be chosen as source to do the heal. Change-Id: I9ca634b026ac830b172bac076437cc3bf1ae7d8a BUG: 1363721 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/15080 Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Oleksandr Natalenko <oleksandr@natalenko.name> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* glusterd: Convert volume to replica after adding brick self heal is not ↵Mohit Agrawal2016-08-111-0/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | triggered Problem: After add brick to a distribute volume to convert to replica is not triggering self heal. Solution: Modify the condition in brick_graph_add_index to set trusted.afr.dirty attribute in xlator. Test : To verify the patch followd below steps 1) Create a single node volume gluster volume create <DIS> <IP:/dist1/brick1> 2) Start volume and create mount point mount -t glusterfs <IP>:/DIS /mnt 3) Touch some file and write some data on file 4) Add another brick along with replica 2 gluster volume add-brick DIS replica 2 <IP>:/dist2/brick2 5) Before apply the patch file size is 0 bytes in mount point. BUG: 1365455 Change-Id: Ief0ccbf98ea21b53d0e27edef177db6cabb3397f Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Reviewed-on: http://review.gluster.org/15118 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Anuradha Talur <atalur@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* io-threads: remove least-rate-limit option and codeJeff Darcy2016-07-281-53/+0
| | | | | | | | | | | | | | This will be unnecessary, and mostly in the way, as real fairness guarantees are implemented. Change-Id: Ic61ec1c9e9add58385f1a4eafcfe2cc554ceefc8 BUG: 1360402 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/14989 Smoke: Gluster Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: Gluster Build System <jenkins@build.gluster.org>
* afr: some coverity fixesRavishankar N2016-07-261-1/+1
| | | | | | | | | | | | | | | Thanks to Krutika for a cleaner way to track inode refs in afr_set_split_brain_choice(). Change-Id: I2d968d05b815ad764b7e3f8aa9ad95a792b3c1df BUG: 1355604 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/14895 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* afr:Don't wind reads for files in metadata split-brainRavishankar N2016-06-242-1/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: For a read on a file in metadata split-brain: 1.lookup_done resets event_generation to zero. 2. readv is issued, goes to inode refresh due to mismatching event_gen. 3. After refresh is successful, we update event_generation, data and metdata readable. 3. We then call afr_read_txn_refresh_done() which in turn calls afr_inode_get_readable() but doesn't check for EIO. So afr_readv_wind is called with local->readable (which is populated with data_readable), thus winding the read to a brick. 4. Also, further parallel reads that come directly go to the wind path because there is no inode_refresh needed. Fix: 1.For any afr_read_txn(), readable must be an intersection of data and metadata readable. 2.Check for EIO in afr_read_txn_refresh_done(). Change-Id: I22dd221fdfaf96d7aced2f474e28ed1337d69f0e BUG: 1305031 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/13389 Reviewed-by: Ashish Pandey <aspandey@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr: Consider ENOSPC and EDQUOT as symmetric errorsRavishankar N2016-06-121-0/+63
| | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Since commit 8eaa3506ead4f11b81b146a9e56575c79f3aad7b, in replica 3, if a brick is down and a create fails on the other 2 brick with EDQUOT, we consider it an unsymmetric error and hence do not do post-op. So the dirty xattr remains set on the parent dir, leading to conservative merges during heal when all bricks are up. i.e. a file deleted on the source might re-appear after heal. Fix: Consider ENOSPC and EDQUOT as symmetric errors since there is no possibility of partial inode or entry modification operations possible when quota is enabled. IOW, if quota reports EDQUOT, the no. of bytes written (or not written) will be the same on all bricks of the replica. Likewise, the entry operation (create, mkdir...) will either succeed or not succeed on all bricks. Change-Id: Iacb1108e9ef4a918e36242fb4a957455133744e9 BUG: 1341650 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/14604 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Tested-by: Atin Mukherjee <amukherj@redhat.com>
* cluster/afr adding test case for http://review.gluster.org/#/c/14553/Jiffin Tony Thottan2016-05-311-0/+43
| | | | | | | | | | | | | | Change-Id: I23865343021ae65a36f6abc74d6bd594efd9dc7e BUG: 1340623 Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com> Reviewed-on: http://review.gluster.org/14561 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Anuradha Talur <atalur@redhat.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: Do not inode_link in afrPranith Kumar K2016-05-201-1/+2
| | | | | | | | | | | | | | | | | | Race is explained at https://bugzilla.redhat.com/show_bug.cgi?id=1337405#c0 This patch also handles performing of self-heal with shd-pid. Also performs the healing with this->itable's inode rather than main itable. BUG: 1337405 Change-Id: Id657a6623b71998b027b1dff6af5bbdf8cab09c9 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/14422 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
* cluster/afr : Do post-op in case of symmetric errorsAnuradha Talur2016-05-131-0/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In afr_changelog_post_op_now(), if there was any error, meaning op_ret < 0, post-op was not being done even when the errors were symmetric and there were no "failed subvols". Fix: When the errors are symmetric, perform post-op. How was the bug found : In a 1 X 3 volume with shard and write behind on when writes were done into a file with one brick down, the trusted.afr.dirty xattr's value for .shard directory would keep increasing as post op was not done but pre-op was. This incorrectly showed .shard to be in split-brain. RCA: When WB is on, due to multiple writes being sent on offset lying in the same shard, chances are that same shard file will be created more than once with the second one failing with op_ret < 0 and op_errno = EEXIST. As op_ret was negative, afr wouldn't do post-op, leading to no decrement of trusted.afr.dirty xattr. Thus showing .shard directory to be in split-brain. Change-Id: I711bdeaa1397244e6a7790e96f0c84501798fc59 BUG: 1335652 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/14310 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* glusterd: default value of nfs.disable, change from false to trueKaleb S KEITHLEY2016-04-272-0/+2
| | | | | | | | | | | | | | | | | | Next step in eventual deprecation of glusterfs nfs server in favor of ganesha.nfsd. Also replace several open-coded strings with constant. Change-Id: If52f5e880191a14fd38e69b70a32b0300dd93a50 BUG: 1092414 Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/13738 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Tested-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* cluster/ec: Rebalance hangs during renameAshish Pandey2016-03-301-0/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: During the rename of a particular file (ec is holding blocking inodelk on the parent directory), if the rename of another file under the same directory comes. EC does not release the lock and goes ahead and renames the "new" file with the "already held lock". That causes rebalance process to be blocked on a lock which has been acquired by rename. Solution: While rename fop comes, ec takes blocking inodelk on old and new parent of the file. Before releasing, every lock held by ec, it waits for some "time" to see if that lock can be reused by the next fop. If within this "time" some other request comes, it releases this lock based on condition "lock count > 1" To get this "lock count" for rename fop, we have implemented "pl_rename" in feature/lock. Also, on ec side, changed the condition to release the lock based on the type of fop and old and new parent directories. Change-Id: I979dbab1185df962e8f305a6074ae1186ffe7db0 Bug: 1304988 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/13460 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
* cluster/ec: Provide an option to enable/disable eager lockAshish Pandey2016-03-156-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: If a fop takes lock, and completes its operation, it waits for 1 second before releasing the lock. However, If ec find any lock contention within this time period, it release the lock immediately before time expires. As we take lock on first brick, for few operations, like read, it might happen that discovery of lock contention might take long time and can degrades the performance. Solution: Provide an option to enable/disable eager lock. If eager lock is disabled, lock will be released as soon as fop completes. gluster v set <VOLUME NAME> disperse.eager-lock on gluster v set <VOLUME NAME> disperse.eager-lock off Change-Id: I000985a787eba3c190fdcd5981dfbf04e64af166 BUG: 1314649 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/13605 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* afr: Add throttled background client-side healsRavishankar N2016-03-012-32/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | If a heal is needed after inode refresh (lookup, read_txn), launch it in the background instead of blocking the fop (that triggered refresh) until the heal happens. afr_replies_interpret() is modified such that the heal is launched only if atleast one sink brick is up. Max. no of heals that can happen in parallel is configurable via the 'background-self-heal-count' volume option. Any number greater than that is put in a wait queue whose length is configurable via 'heal-wait-queue-leng' volume option. If the wait queue is also full, further heals will be ignored. Default values: background-self-heal-count=8, heal-wait-queue-leng=128 Change-Id: I1d4a52814cdfd43d90591b6d2ad7b6219937ce70 BUG: 1297172 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/13207 Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* heal: Remove sleep()Pranith Kumar K2016-02-111-4/+4
| | | | | | | | | | | | | | | | | | I wrote this program from a sample gfapi program which had sleep. I am not sure why this sleep was needed. So removing it now. Changed tests/bugs/replicate/bug-1190069-afr-stale-index-entries.t to execute count_sh_entries every second, instead of comparing same value over and over. Change-Id: I7b89d6cab3e50bb7bf4d40a6064f2d8734155bea BUG: 1306199 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/13421 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
* cluster/afr: Fix heal-info slow response while IO is in progressKrutika Dhananjay2016-02-031-0/+43
| | | | | | | | | | | | | | | | | | Now heal-info does an open() on the file being examined so that the client at some point sees open-fd count being > 1 and releases the eager-lock so that heal-info doesn't remain blocked forever until IO completes. Change-Id: Icc478098e2bc7234408728b54d8185102b3540dc BUG: 1297695 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/13326 Reviewed-by: Ravishankar N <ravishankar@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* tests: Fix spurious failure in bug-1221481-allow-fops-on-dir-split-brain.t.Krutika Dhananjay2016-01-221-4/+1
| | | | | | | | | | | | | | | | | | | | Occasionally, when ls is executed, prior to READDIRP, a STAT is wound on the operand directory. And AFR fails STAT with EIO if it is in metadata split-brain which "dir" is in the test case in question. As a result, ls also fails with EIO, causing test 20 to return negative exit status. The fix is in the test script where the parts that cause the dir to go into metadata split-brain have been removed. Now "dir" will only have entry split-brain. Change-Id: I4e4e6ba0a2401c7168719cd44e5f4f4bcb8fdd89 BUG: 1295702 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/13172 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* tests: handle bad objects during lookup/inode_refreshRavishankar N2015-12-281-0/+53
| | | | | | | | | | | Change-Id: I1848f0e9243c9376e0deba6738757350fe8b704a BUG: 1290965 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/13044 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: Fix data loss due to race between sh and ongoing writeKrutika Dhananjay2015-12-221-0/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: When IO is happening on a file and a brick goes down comes back up during this time, protocol/client translator attempts reopening of the fd on the gfid handle of the file. But if another client renames this file while a brick was down && writes were in progress on it, once this brick is back up, there can be a race between reopening of the fd and entry self-heal replaying the effect of the rename() on the sink brick. If the reopening of the fd happens first, the application's writes continue to go into the data blocks associated with the gfid. Now entry-self-heal deletes 'src' and creates 'dst' file on the sink, marking dst as a 'newentry'. Data self-heal is also completed on 'dst' as a result and self-heal terminates. If at this point the application is still writing to this fd, all writes on the file after self-heal would go into the data blocks associated with this fd, which would be lost once the fd is closed. The result - the 'dst' file on the source and sink are not the same and there is no pending heal on the file, leading to silent corruption on the sink. Fix: Leverage http://review.gluster.org/#/c/12816/ to ensure the gfid handle path gets saved in .glusterfs/unlink until the fd is closed on the file. During this time, when self-heal sends mknod() with gfid of the file, do the following: link() the gfid handle under .glusterfs/unlink to the new path to be created in mknod() and rename() the gfid handle to go back under .glusterfs/ab/cd/. Change-Id: I86ef1f97a76ffe11f32653bb995f575f7648f798 BUG: 1292379 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/13001 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr : Examine data/metadata readable for read-subvolAnuradha Talur2015-08-251-0/+48
| | | | | | | | | | | | | | | | | | During lookup and discover, currently read_subvol is based only on data_readable. read_subvol should be decided based on both data_readable and metadata_readable. Credits to Ravishankar N for the logic of afr_first_up_child from http://review.gluster.org/10905/ . Change-Id: I98580b23c278172ee2902be08eeaafb6722e830c BUG: 1240244 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/11551 Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr: modify afr_txn_nothing_failed()Ravishankar N2015-08-252-0/+91
| | | | | | | | | | | | | | | | | | In an AFR transaction, we need to consider something as failed only if the failure (either in the pre-op or the FOP phase) occurs on the bricks on which a transaction lock was obtained. Without this, we would end up considering the transaction as failure even on the bricks on which the lock was not obtained, resulting in unnecessary fsyncs during the post-op phase of every write transaction for non-appending writes. Change-Id: Iee79e5d85dc7b4c41459d8bdd04a8454bdaf9a9d BUG: 1250170 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/11827 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* tests: Fix ./tests/bugs/replicate/bug-1238508-self-heal.tRavishankar N2015-07-071-1/+1
| | | | | | | | | | | | | | | | | | Test failed @ http://build.gluster.org/job/rackspace-regression-2GB-triggered/12010/consoleFull (Reported by Vijaykumar M) Fix: s/afr_get_pending_heal_count/get_pending_heal_count Signed-off-by: Ravishankar N <ravishankar@redhat.com> Change-Id: I69c44919ae68e3ebb9a5bc58a8e45a0a96fad62e BUG: 1238508 Reviewed-on: http://review.gluster.org/11556 Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* cluster/ec: Make background healing optional behaviorPranith Kumar K2015-07-063-3/+3
| | | | | | | | | | | Provide options to control number of active background heal count and qlen. Change-Id: Idc2419219d881f47e7d2e9bbc1dcdd999b372033 BUG: 1237381 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11473 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr : expunge first, impunge next in entry selfhealAnuradha Talur2015-07-062-0/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When entry self-heals are performed, the files/directories that are to be expunged should be removed first and then impunge should be done. Consider the following scenario : A volume with 2 bricks : b0 and b1. 1) With following hierarchy on both bricks: olddir |__ oldfile 2) Bring down b1 and do 'mv olddir newdir'. 3) Bring up b1 and self-heal. 4) Without patch, during self-heal the events occur in following order, a) Creation of newdir on the sink brick. Notice that gfid of olddir and newdir are same. As a result of which gfid-link file in .glusterfs directory still points to olddir and not to newdir. b) Deletion of olddir on the sink brick. As a part of this deletion, the gfid link file is also deleted. Now, there is no link file pointing to newdir. 5) Files under newdir will not get listed as part of readdir. To tackle this kind of scenario, an expunge should be done first and impunge later; which is the purpose of this patch. Change-Id: Idc8546f652adf11a13784ff989077cf79986bbd5 BUG: 1238508 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/11498 Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr: complete conservative merge even in case of gfid split-brain.Ravishankar N2015-06-221-0/+48
| | | | | | | | | | | | | | | | | | | | Problem: While performing conservative merge, we bail out of the merge if we encounter a file with mismatching gfid or type. What this means is all entries that come after the mismatching file (during the merge) never get healed, no matter how many index heals are done. Fix: Continue with the merging of rest of the entries even if a gfid/type mismatch is found, but ensure that post-op does not happen on the parent dir in such a case. Change-Id: I9bbfccc8906007daa53a0750ddd401dcf83943f8 BUG: 1180545 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/9429 Reviewed-by: Anuradha Talur <atalur@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* Tests portability: umount(8)Emmanuel Dreyfus2015-06-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1) Avoid hangs on unmounting NFS on NetBSD NetBSD umount(8) on a NFS mount whose server is gone will wait forever because umount(8) calls realpath(3) and tries to access the mount before it calls unmount(2). The non-portable, NetBSD-specific umount -R flag prevent that behavior. We therefore introduce UMOUNT_F, defined as "umount -f" on Linux and "umount -f -R" on NetBSD to take care of forced unmounts, especially in the NFS case. 2) Enforce usage of force_umount wrapper with timeout Whenever umount is used it should be wrapped in force_umount with tiemout handling. That saves us timing issues, and it handles the NetBSD NFS case. 3) Cleanup kernel cache flush. We used (cd $M0 && umount $M0 ) as a portable kernel cache flush trick, but it does not flush everything we need on Linux. Introduce a drop_cache() shell function that reverts to previously used echo 3 > /proc/sys/vm/drop_caches on Linux, and keeps (cd $M0 && umount $M0 ) on other systems. BUG: 1129939 Change-Id: Iab1f5a023405f1f7270c42b595573702ca1eb6f3 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/11114 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* afr: allow readdir to proceed for directories in split-brainRavishankar N2015-05-281-0/+37
| | | | | | | | | | | | | | | | | | | Problem: afr_read_txn() bails out if read_subvol==-1. This meant that for directories that were in entry split-brain, FOPS like readdir, access, stat etc were not allowed. Fix: Except for getxattr, all other FOPS are wound on the first up child of afr. Change-Id: Iacec8fbb1e75c4d2094baa304f62331c81a6f670 BUG: 1221481 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/10776 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Anuradha Talur <atalur@redhat.com> Tested-by: NetBSD Build System
* tests: Disable flush-behindPranith Kumar K2015-05-081-0/+1
| | | | | | | | | Change-Id: Ibbb6b03d2878ef4a049f737662c31e70a68e5755 BUG: 1219816 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10666 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* afr: remove stale index entriesRavishankar N2015-03-175-46/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: During pre-op phase, the index xlator 1. Creates the entry inside .glusterfs/indices/xattrop 2. Winds the xattrop fop to posix to mark dirty/pending changelogs. If the brick crashes after 1, the xattrop entry becomes stale and never gets removed by shd during subsequent crawls because there is nothing to heal (changelogs are zero). Though the stale entry does not get displayed in the output of 'heal info' command, it nevertheless stays there forever unless a new write transaction is performed on the file. Fix: During index self-heal if afr xattrs are found to be clean (indicated by ret value of 2 on a call to afr_shd_selfheal(), send a dummy post-op with all 0s for the xattr values, which makes the index xlator to unlink the stale entry. Change-Id: I02cb2bc937f2e3f3f3cb35d67b006664dc7ef919 BUG: 1190069 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/9714 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anuradha Talur <atalur@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: Enable auto-quorum for replicate with odd number of bricksPranith Kumar K2015-02-091-0/+1
| | | | | | | | | | | Change-Id: I908934f1f22cf7d2d0ceccc0dedf28a69861997f BUG: 1187885 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9517 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Anuradha Talur <atalur@redhat.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com>
* syncop: Provide syncop_ftw and syncop_dir_scan utilsPranith Kumar K2015-02-061-0/+2
| | | | | | | | | | | | | | | | | ftw provides file tree walk. dir_scan does just a readdir not readdirp. Also changed Afr's self-heal-daemon's crawling functions to use this. These utils will be used by ec in future to do proactive/full healing. Change-Id: I05715ddb789592c1b79a71e98f1e8cc29aac5c26 BUG: 1177601 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9485 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* tests: move all test-cases into component subdirectoriesNiels de Vos2015-01-0635-0/+1913
There are around 300 regression tests, 250 being in tests/bugs. Running partial set of tests/bugs is not easy because this is a flat directory with almost all tests inside. It would be valuable to make partial test/bugs easier, and allow the use of mulitple build hosts for a single commit, each running a subset of the tests for a quicker result. Additional changes made: - correct the include path for *.rc shell libraries and *.py utils - make the testcases pass checkpatch - arequal-checksum in afr/self-heal.t was never executed, now it is - include.rc now complains loudly if it fails to find env.rc Change-Id: I26ffd067e9853d3be1fd63b2f37d8aa0fd1b4fea BUG: 1178685 Reported-by: Emmanuel Dreyfus <manu@netbsd.org> Reported-by: Atin Mukherjee <amukherj@redhat.com> URL: http://www.gluster.org/pipermail/gluster-devel/2014-December/043414.html Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/9353 Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Emmanuel Dreyfus <manu@netbsd.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>