summaryrefslogtreecommitdiffstats
path: root/xlators/cluster/afr/src/afr-common.c
Commit message (Collapse)AuthorAgeFilesLines
* afr: Avoid resetting event_gen when brick is always downRavishankar N2017-01-131-11/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/#/c/16309/ Problem: __afr_set_in_flight_sb_status(), which resets event_gen to zero, is called if failed_subvols[i] is non-zero for any brick. But failed_subvols[i] is true even if the brick was down *before* the transaction started. Hence say if 1 brick is down in a replica-3, every writev that comes will trigger an inode refresh because of this resetting, as seen from the no. of FSTATs in the profile info in the BZ. Fix: Reset event gen only if the brick was previously a valid read child and the FOP failed on it the first time. Also `s/afr_inode_read_subvol_reset/afr_inode_event_gen_reset` because the function only resets event gen and not the data/metadata readable. Change-Id: I7840f7123d3b3e0404743988088ec349391ca980 BUG: 1412890 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/16387 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* afr: use accused matrix instead of readable matrix for deciding healsRavishankar N2016-12-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: afr_replies_interpret() used the 'readable' matrix to trigger client side heals after inode refresh. But for arbiter, readable is always zero. So when `dd` is run with a data brick down, spurious data heals are are triggered. These heals open an fd, causing eager lock to be disabled (open fd count >1) in afr transactions, leading to extra FXATTROPS Fix: Use the accused matrix (derived from interpreting the afr pending xattrs) to decide whether we can start heal or not. > Reviewed-on: http://review.gluster.org/16277 > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Smoke: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> > Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> (cherry picked from commit 5a7c86e578f5bbd793126a035c30e6b052177a9f) Change-Id: Ibbd56c9aed6026de6ec42422e60293702aaf55f9 BUG: 1408820 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/16299 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* features/shard: Fill loc.pargfid too for named lookups on individual shardsKrutika Dhananjay2016-11-081-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/#/c/15788/ On a sharded volume when a brick is replaced while IO is going on, named lookup on individual shards as part of read/write was failing with ENOENT on the replaced brick, and as a result AFR initiated name heal in lookup callback. But since pargfid was empty (which is what this patch attempts to fix), the resolution of the shards by protocol/server used to fail and the following pattern of logs was seen: Brick-logs: [2016-11-08 07:41:49.387127] W [MSGID: 115009] [server-resolve.c:566:server_resolve] 0-rep-server: no resolution type for (null) (LOOKUP) [2016-11-08 07:41:49.387157] E [MSGID: 115050] [server-rpc-fops.c:156:server_lookup_cbk] 0-rep-server: 91833: LOOKUP(null) (00000000-0000-0000-0000-000000000000/16d47463-ece5-4b33-9c93-470be918c0f6.82) ==> (Invalid argument) [Invalid argument] Client-logs: [2016-11-08 07:41:27.497687] W [MSGID: 114031] [client-rpc-fops.c:2930:client3_3_lookup_cbk] 2-rep-client-0: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument] [2016-11-08 07:41:27.497755] W [MSGID: 114031] [client-rpc-fops.c:2930:client3_3_lookup_cbk] 2-rep-client-1: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument] [2016-11-08 07:41:27.498500] W [MSGID: 114031] [client-rpc-fops.c:2930:client3_3_lookup_cbk] 2-rep-client-2: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument] [2016-11-08 07:41:27.499680] E [MSGID: 133010] Also, this patch makes AFR by itself choose a non-NULL pargfid even if its ancestors fail to initialize all pargfid placeholders. Change-Id: I34b9f90d0f09766b6d87b3994d5cd7a77b622dcb BUG: 1392853 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/15797 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/afr: Prevent split-brain when bricks are brought off and on in ↵Krutika Dhananjay2016-08-221-6/+119
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cyclic order Backport of: http://review.gluster.org/15080 When the bricks are brought offline and then online in cyclic order while writes are in progress on a file, thanks to inode refresh in write txns, AFR will mostly fail the write attempt when the only good copy is offline. However, there is still a remote possibility that the file will run into split-brain if the brick that has the lone good copy goes offline *after* the inode refresh but *before* the write txn completes (I call it in-flight split-brain in the patch for ease of reference), requiring intervention from admin to resolve the split-brain before the IO can resume normally on the file. To get around this, the patch does the following things: i) retains the dirty xattrs on the file ii) avoids marking the last of the good copies as bad (or accused) in case it is the one to go down during the course of a write. iii) fails that particular write with the appropriate errno. This way, we still have one good copy left despite the split-brain situation which when it is back online, will be chosen as source to do the heal. Change-Id: I7c13c6ddd5b8fe88b0f2684e8ce5f4a9c3a24a08 BUG: 1367270 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/15222 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Oleksandr Natalenko <oleksandr@natalenko.name> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr: some coverity fixesRavishankar N2016-07-281-92/+136
| | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/#/c/14895/ Thanks to Krutika for a cleaner way to track inode refs in afr_set_split_brain_choice(). Change-Id: I2d968d05b815ad764b7e3f8aa9ad95a792b3c1df BUG: 1360549 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/15017 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: Unwind xdata_rsp even in case of failuresPranith Kumar K2016-06-031-8/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | DHT expects GF_PREOP_CHECK_FAILED to be present in xdata_rsp in case of mkdir failures because of stale layout. But AFR was unwinding null xdata_rsp in case of failures. This was leading to mkdir failures just after remove-brick. Unwind the xdata_rsp in case of failures to make sure the response from brick reaches dht. >BUG: 1340623 >Change-Id: Idd3f7b95730e8ea987b608e892011ff190e181d1 >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> >Reviewed-on: http://review.gluster.org/14553 >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >Reviewed-by: Ravishankar N <ravishankar@redhat.com> >Smoke: Gluster Build System <jenkins@build.gluster.com> >CentOS-regression: Gluster Build System <jenkins@build.gluster.com> >Reviewed-by: Anuradha Talur <atalur@redhat.com> >Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> BUG: 1340992 Change-Id: I2641d35a851be692aa223dfea5d082245ac6c2bc Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/14633 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Smoke: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: Refresh inode for inode-write fops in needPranith Kumar K2016-05-291-10/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: If a named fresh-lookup is done on an loc and the fop fails on one of the bricks or not sent on one of the bricks, but by the time response comes to afr, if the brick is up, 'can_interpret' will be set to false in afr_lookup_done(), this will lead to inode-ctx for that inode to be not set, this can lead to EIO in case of a transaction as it depends on 'readable' array to be available by that point. Fix: Refresh inode for inode-write fops for the ctx to be set if it is not already done at the time of named fresh-lookup or if the file is in split-brain where we need to perform one more refresh before failing the fop to check if the file is still in split-brain or not. >BUG: 1336612 >Change-Id: I5c50b62c8de06129b8516039f7c252e5008c47a5 >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> >Reviewed-on: http://review.gluster.org/14368 >Smoke: Gluster Build System <jenkins@build.gluster.com> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >Reviewed-by: Ravishankar N <ravishankar@redhat.com> >CentOS-regression: Gluster Build System <jenkins@build.gluster.com> >Backport of http://review.gluster.org/14545 BUG: 1337831 Change-Id: If4465ab8fc506e1f905b623b82a53bdab8f5cffd Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/14453 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: Do not inode_link in afrPranith Kumar K2016-05-261-5/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Race is explained at https://bugzilla.redhat.com/show_bug.cgi?id=1337405#c0 This patch also handles performing of self-heal with shd-pid. Also performs the healing with this->itable's inode rather than main itable. >BUG: 1337405 >Change-Id: Id657a6623b71998b027b1dff6af5bbdf8cab09c9 >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> >Reviewed-on: http://review.gluster.org/14422 >Smoke: Gluster Build System <jenkins@build.gluster.com> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.com> >Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> BUG: 1337872 Change-Id: I6d8e79a44e4cc1c5489d81f05c82510e4e90546f Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/14456 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Smoke: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: Handle non-zero source in heal-info decisionPranith Kumar K2016-05-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/14302 Problem: Spurious entries are reported in heal info when the mount is on second/third brick of the replica pair because local-child is given preference in selecting source. The code is supposed to suggest the file needs heal if the (source < 0) (failure code path), but instead it is written as if any non-zero value is considered failure. Fix: Treat +ve source as success case BUG: 1334566 Change-Id: Iac6d68cc429496756a9d8f6e21e71aa5f6b932ee Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/14304 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Anuradha Talur <atalur@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* afr: propagate child up event after timeoutRavishankar N2016-04-271-96/+184
| | | | | | | | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/11113 Problem: During mount, afr waits for response from all its children before notifying the parent xlator. In a 1x2 replica volume , if one of the nodes is down, the mount will hang for more than a minute until child down is received from the client xlator for that node. Fix: When parent up is received by afr, start a 10 second timer. In the timer call back, if we receive a successful child up from atleast one brick, propagate the event to the parent xlator. Change-Id: I31e57c8802c1a03a4a5d581ee4ab82f3a9c8799d BUG: 1330855 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/14088 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* afr: replica pair going offline does not require CHILD_MODIFIED eventSakshi Bansal2016-04-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | As a part of CHILD_MODIFIED event DHT forgets the current layout and performs fresh lookup. However this is not required when a replica pair goes offline as the xattrs can be read from other replica pairs. Hence setting different event to handle replica pair going down. > Backport of http://review.gluster.org/#/c/12573/ > Change-Id: I5ede2a6398e63f34f89f9d3c9bc30598974402e3 > BUG: 1281230 > Signed-off-by: Sakshi Bansal <sabansal@redhat.com> > Reviewed-on: http://review.gluster.org/12573 > Reviewed-by: Ravishankar N <ravishankar@redhat.com> > Reviewed-by: Susant Palai <spalai@redhat.com> > Tested-by: NetBSD Build System <jenkins@build.gluster.org> > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Change-Id: Ida30240d1ad8b8730af7ab50b129dfb05264fdf9 BUG: 1283972 Signed-off-by: Sakshi Bansal <sabansal@redhat.com> Reviewed-on: http://review.gluster.org/12767 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: Fix spurious entries in heal infoPranith Kumar K2016-04-201-10/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Locking schemes in afr-v1 were locking the directory/file completely during self-heal. Newer schemes of locking don't require Full directory, file locking. But afr-v2 still has compatibility code to work-well with older clients, where in entry-self-heal it takes a lock on a special 256 character name which can't be created on the fs. Similarly for data self-heal there used to be a lock on (LLONG_MAX-2, 1). Old locking scheme requires heal info to take sh-domain locks before examining heal-state. If it doesn't take sh-domain locks, then there is a possibility of heal-info hanging till self-heal completes because of compatibility locks. But the problem with heal-info taking sh-domain locks is that if two heal-info or shd, heal-info try to inspect heal state in parallel using trylocks on sh-domain, there is a possibility that both of them assuming a heal is in progress. This was leading to spurious entries being shown in heal-info. Fix: As long as there is afr-v1 way of locking, we can't fix this problem with simple solutions. If we know that the cluster is running newer versions of locking schemes, in those cases we can give accurate information in heal-info. So introduce a new option called 'locking-scheme' which if it is 'granular' will give correct information in heal-info. Not only that, Extra network hops for taking compatibility locks, sh-domain locks in heal info will not be necessary anymore. Thus it improves performance. >BUG: 1322850 >Change-Id: Ia563c5f096b5922009ff0ec1c42d969d55d827a3 >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> >Reviewed-on: http://review.gluster.org/13873 >Smoke: Gluster Build System <jenkins@build.gluster.com> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.com> >Reviewed-by: Ashish Pandey <aspandey@redhat.com> >Reviewed-by: Anuradha Talur <atalur@redhat.com> >Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> >(cherry picked from commit b6a0780d86e7c6afe7ae0d9a87e6fe5c62b4d792) Change-Id: If7eee18843b48bbeff4c1355c102aa572b2c155a BUG: 1294675 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/14039 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: Don't lookup/forget inodesPranith Kumar K2016-04-171-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: All inodes that are looked-up are always forgotten without fail in afr removing the benefits of them being in lru. This same code can cause crashes if between inode_lookup, inode_forget in afr if the top xlator does inode_forget(0). Fix: Don't use lookup/forget in afr. No benefits are there at the moment for keeping this code. It is impossible to prevent top xlators to do inode_forget(0). Found similar instances in ec and removed them even though those code paths are not going to be executed in any place other than heal-daemon. >BUG: 1321554 >Change-Id: Ia4cb236178f7f129cc898d53f0bbd26f494a2a8d >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> >Reviewed-on: http://review.gluster.org/13834 >Smoke: Gluster Build System <jenkins@build.gluster.com> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.com> >Reviewed-by: Anuradha Talur <atalur@redhat.com> BUG: 1327864 Change-Id: I3507ed88cd75e069ed302525bfa259cf407871fb Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/14009 Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* cluster/afr: Don't delete gfid-req from lookup requestPranith Kumar K2016-04-121-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Afr does dict_ref of the xattr_req that comes to it and deletes "gfid-req" key. Dht uses same dict to send lookup to other subvolumes. So in case of directories and more than 1 dht subvolumes, second subvolume till the last subvolume won't get a lookup request with "gfid-req". So gfid reset never happens on the directories in distributed replicate subvolume for 2nd till last subvolumes. Fix: Make a copy of lookup xattr request. Also fixed replies_wipe possibly resetting gfid to NULL gfid >BUG: 1312816 >Change-Id: Ic16260e5a4664837d069c1dc05b9e96ca05bda88 >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> >Reviewed-on: http://review.gluster.org/13545 >Smoke: Gluster Build System <jenkins@build.gluster.com> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.com> >Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> >(cherry picked from commit 9b022c3a3f2f774904b5b458ae065425b46cc15d) Change-Id: Ia68193b559ec1dfd841cc5a22ef1fa801b866200 BUG: 1313693 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/13574 CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: Choose local child as source if possiblePranith Kumar K2016-03-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | It is better to choose local brick as source if possible to prevent over the wire read thus saving on bandwidth. Also changed code to not attempt data-heal if 'source' is selected as arbiter. >Change-Id: I9a328d0198422280b13a30ab99545370a301dfea >BUG: 1314150 >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> >Reviewed-on: http://review.gluster.org/13585 >Smoke: Gluster Build System <jenkins@build.gluster.com> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> >Tested-by: Krutika Dhananjay <kdhananj@redhat.com> >CentOS-regression: Gluster Build System <jenkins@build.gluster.com> >Reviewed-by: Jeff Darcy <jdarcy@redhat.com> >(cherry picked from commit 2807e3fc005630213ab7ad251fef13d61c07ac6b) Change-Id: I24ea66683f81e238a6c1850664a49fe554011a0a BUG: 1322521 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/13860 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com>
* afr : Enable auto heal when replica count increasesAnuradha Talur2016-03-231-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/12454 This patch is part two change to prevent data loss in a replicate volume on doing a add-brick operation. Problem: After doing add-brick, there is a chance that self heal might happen from the newly added brick rather than the source brick, leading to data loss. Solution: Mark pending changelogs on afr children for the new afr-child so that heal is performed in the correct direction. >Change-Id: I11871e55eef3593aec874f92214a2d97da229b17 >BUG: 1276203 >Signed-off-by: Anuradha Talur <atalur@redhat.com> >Reviewed-on: http://review.gluster.org/12454 >Smoke: Gluster Build System <jenkins@build.gluster.com> >Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> >Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> >CentOS-regression: Gluster Build System <jenkins@build.gluster.com> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Change-Id: Iae6af44f97e612cb3ee8c642254ec3d15ac063f5 BUG: 1320020 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/13807 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* afr: Add throttled background client-side healsRavishankar N2016-03-221-48/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/13207 If a heal is needed after inode refresh (lookup, read_txn), launch it in the background instead of blocking the fop (that triggered refresh) until the heal happens. afr_replies_interpret() is modified such that the heal is launched only if atleast one sink brick is up. Max. no of heals that can happen in parallel is configurable via the 'background-self-heal-count' volume option. Any number greater than that is put in a wait queue whose length is configurable via 'heal-wait-queue-leng' volume option. If the wait queue is also full, further heals will be ignored. Default values: background-self-heal-count=8, heal-wait-queue-leng=128 Change-Id: I9a134b2c29d66b70b7b1278811bd504963aabacc BUG: 1313312 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/13564 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr: misc performance improvementsRavishankar N2016-03-081-28/+46
| | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/#/c/13595/ 1. In afr_getxattr_cbk, consider the errno value before blindly launching an inode refresh and a subsequent retry on other children. 2. We want to accuse small files only when we know for sure that there is no IO happening on that inode. Otherwise, the ia_sizes obtained in the post-inode-refresh replies may mismatch due to a race between inode-refresh and ongoing writes, causing spurious heal launches. Change-Id: I9858485d1061db67353ccf99c59530731649c847 BUG: 1309462 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/13644 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* afr: do not set arbiter as a readable subvol in inode contextRavishankar N2016-03-071-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: If afr_lookup_done() or afr_read_subvol_select_by_policy() chooses the arbiter brick to serve the stat() data, file size will be reported as zero from the mount, despite other data bricks being available. This can break programs like tar which use the stat info to decide how much to read. Fix: In the inode-context, mark arbiter as a non-readable subvol for both data and metadata. It it to be noted that by making this fix, we are *not* going to serve metadata FOPS anymore from the arbiter brick despite the brick storing the metadata. It makes sense to do this because the ever increasing over-loaded FOPs (getxattr returning stat data etc.) and compound FOPS in gluster will otherwise make it difficult to add checks in code to handle corner cases. >Change-Id: Ic60b25d77fd05e0897481b7fcb3716d4f2101001 >BUG: 1310171 >Signed-off-by: Ravishankar N <ravishankar@redhat.com> >Reported-by: Mat Clayton <mat@mixcloud.com> >Reviewed-on: http://review.gluster.org/13539 >Reviewed-by: Anuradha Talur <atalur@redhat.com> >Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> >Smoke: Gluster Build System <jenkins@build.gluster.com> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.com> >Reviewed-by: Jeff Darcy <jdarcy@redhat.com> BUG: 1313921 Change-Id: I07fc08d633ca2af48f7354454bc2ab75cedb850a Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/13609 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* cluster/afr: Move remaining gf_logs to gf_msgsPranith Kumar K2016-03-061-5/+8
| | | | | | | | | | | | | | | | | | | | | >Change-Id: I48d9e5313bd3ccf9fe26c90a7051a8a174d75c49 >BUG: 1296818 >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> >Reviewed-on: http://review.gluster.org/13195 >Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> >Smoke: Gluster Build System <jenkins@build.gluster.com> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.com> >Reviewed-by: Vijay Bellur <vbellur@redhat.com> BUG: 1315142 Change-Id: Iab55c3fd41777e9fe295f674c3ee410fd780b418 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/13617 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
* Revert "afr: do not set arbiter as a readable subvol in inode context"Pranith Kumar Karampuri2016-03-051-8/+0
| | | | | | | | | | | | This reverts commit ad0b1253b9d74797620c493184818685c024f17c. Change-Id: Id43ba8e75d58325f897e15e3f64f9389236adb40 Reviewed-on: http://review.gluster.org/13608 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* afr: do not set arbiter as a readable subvol in inode contextRavishankar N2016-03-041-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backport-of: http://review.gluster.org/#/c/13539/ Problem: If afr_lookup_done() or afr_read_subvol_select_by_policy() chooses the arbiter brick to serve the stat() data, file size will be reported as zero from the mount, despite other data bricks being available. This can break programs like tar which use the stat info to decide how much to read. Fix: In the inode-context, mark arbiter as a non-readable subvol for both data and metadata. It it to be noted that by making this fix, we are *not* going to serve metadata FOPS anymore from the arbiter brick despite the brick storing the metadata. It makes sense to do this because the ever increasing over-loaded FOPs (getxattr returning stat data etc.) and compound FOPS in gluster will otherwise make it difficult to add checks in code to handle corner cases. Change-Id: Ic60b25d77fd05e0897481b7fcb3716d4f2101001 BUG: 1313921 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reported-by: Mat Clayton <mat@mixcloud.com> Reviewed-on: http://review.gluster.org/13582 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: Mark self-heal fops as internalPranith Kumar K2016-03-041-3/+3
| | | | | | | | | | | | | | | | | | >Change-Id: I8ae7af266d3e00460f0cfdc9389a926e5f2fee36 >BUG: 1282761 >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> >Reviewed-on: http://review.gluster.org/12598 >Tested-by: Gluster Build System <jenkins@build.gluster.com> >Tested-by: NetBSD Build System <jenkins@build.gluster.org> >Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> BUG: 1283757 Change-Id: Ic20d4ee031265305db1a6ed2cf591ce94b7d0749 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/12668 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: Fix heal-info slow response while IO is in progressKrutika Dhananjay2016-02-041-1/+16
| | | | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/#/c/13326/ Now heal-info does an open() on the file being examined so that the client at some point sees open-fd count being > 1 and releases the eager-lock so that heal-info doesn't remain blocked forever until IO completes. Change-Id: I7d4a8aa4de459216408b666894ee7bb42e406547 BUG: 1303899 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/13348 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* afr: handle bad objects during lookup/inode_refreshRavishankar N2016-01-211-1/+12
| | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/12955, http://review.gluster.org/#/c/13077/ and http://review.gluster.org/#/c/13185/ If an object (file) is marked bad by bitrot, do not consider the brick on which the object is present as a potential read subvolume for AFR irrespective of the pending xattr values. Also do not consider the brick containing the bad object while performing afr_accuse_smallfiles(). Otherwise if the bad object's size is bigger,we may end up considering that as the source. Change-Id: I4abc68e51e5c43c5adfa56e1c00b46db22c88cf7 BUG: 1293300 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/13041 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* all: reduce "inline" usageKaleb S KEITHLEY2016-01-181-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | There are three kinds of inline functions: plain inline, extern inline, and static inline. All three have been removed from .c files, except those in "contrib" which aren't our problem. Inlines in .h files, which are overwhelmingly "static inline" already, have generally been left alone. Over time we should be able to "lower" these into .c files, but that has to be done in a case-by-case fashion requiring more manual effort. This part was easy to do automatically without (as far as I can tell) any ill effect. In the process, several pieces of dead code were flagged by the compiler, and were removed. backport of Change-Id: I56a5e614735c9e0a6ee420dab949eac22e25c155, http://review.gluster.org/11769, BUG: 1245331 Change-Id: Iba1efb0bc578ea4a5e9bf76b7bd93dc1be9eba44 BUG: 1283302 Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/12646 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* afr: refresh inode using fstatRavishankar N2015-12-281-26/+91
| | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/12894 For fd based operations (fgetxattr, readv etc.) if an inode refresh is required, do so using fstat instead of lookup. This is because the file might have been deleted by another client before refresh but posix mandates that FOPS using already open fds must still succeed. Change-Id: Id5f71c3af4892b648eb747f363dffe6208e7ac09 BUG: 1290363 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/13040 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* heal : Changed heal info to process all indices directoriesAnuradha Talur2015-12-211-39/+61
| | | | | | | | | | | | Backport of http://review.gluster.org/#/c/12658/ Change-Id: Ida863844e14309b6526c1b8434273fbf05c410d2 BUG: 1287531 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/12854 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr : Readdirp performance enhancementAnuradha Talur2015-12-211-2/+103
| | | | | | | | | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/#/c/12507/ Things done : 1) during lookup and inode_refresh as part of read_txn, request is sent to detect if heal is required or not. 2) If heal is required, be conservative in setting the readdirp entry inodes to NULL, otherwise don't be. 3) Self-heal-daemon now crawls both indices/xattrop and indices/dirty directory while healing. Change-Id: Ic4a4da63fb7e0726eab5f341a200859b29cf7eb7 BUG: 1287531 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/12507 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/12853
* afr: fixes in transaction codeRavishankar N2015-10-261-7/+0
| | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/#/c/12368/ and http://review.gluster.org/#/c/12415/ 1. When winding the pre-op, transaction.pre_op[i] is set. If the pre-op fails, transaction.failed_subvols[i] is set. If if fails on all chidren, we can directly proceed to unlock (via afr_changelog_post_op_now) without trying to wind the write, fail and then going to unlock. 2. 'fop_subvols' seems to be an unused variable, hence removing it. 3. Call local->transaction.wind() only on subvols where pre-op succeeded. Change-Id: I9525628daf48082e979b0093fa0478934495e61f BUG: 1273334 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/12399 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr : get split-brain-status in a synctaskAnuradha Talur2015-09-151-6/+26
| | | | | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/#/c/12163/ On executing `getfattr -n replica.split-brain-status <file>` on mount, there is a possibility that the mount hangs. To avoid this hang, fetch the split-brain-status of a file in synctask. >Change-Id: I87b781419ffc63248f915325b845e3233143d385 >BUG: 1262345 >Signed-off-by: Anuradha Talur <atalur@redhat.com> Change-Id: I9f4f4b54e108d3a0017264353b8272e072170c16 BUG: 1262547 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/12166 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: Make [f]xattrop metadata transactionPranith Kumar K2015-08-311-164/+8
| | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.com/11809 Problem: When xlators above afr do [f]xattrop when one of the bricks is down, after the brick comes backup, the metadata is not healed because [f]xattrop is not considered a transaction. Fix: Treat [f]xattrop as transaction so that changes done by xlators above afr are marked for heal when some of the bricks were down at the time of [f]xattrop. BUG: 1248890 Change-Id: Ibe69aa0ca6be9b4b4134dc2879b306e2e9c4cde8 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11810 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com>
* cluster/afr : Examine data/metadata readable for read-subvolAnuradha Talur2015-08-281-20/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | During lookup and discover, currently read_subvol is based only on data_readable. read_subvol should be decided based on both data_readable and metadata_readable. Credits to Ravishankar N for the logic of afr_first_up_child from http://review.gluster.org/10905/ . > Change-Id: I98580b23c278172ee2902be08eeaafb6722e830c > BUG: 1240244 > Signed-off-by: Anuradha Talur <atalur@redhat.com> > Reviewed-on: http://review.gluster.org/11551 > Reviewed-by: Ravishankar N <ravishankar@redhat.com> > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> > Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> > (cherry picked from commit 36349fa250ace6109002dfa41305d9dcd54ce0a9) Change-Id: Ia068ef9deb97f7bc48ea0c56d5ab6851f8860118 BUG: 1256909 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/12011 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* afr: launch index heal on local subvols up on a child-up eventRavishankar N2015-08-231-17/+11
| | | | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/#/c/11912/ Problem: When a replica's child goes down and comes up, the index heal is triggered only on the child that just came up. This does not serve the intended purpose as the list of files that need to be healed to this child is actually captured on the other child of the replica. Fix: Launch index-heal on all local children of the replica xlator which just received a child up. Note that afr_selfheal_childup() eventually calls afr_shd_index_healer() which will not run the heal on non-local children. Signed-off-by: Ravishankar N <ravishankar@redhat.com> Change-Id: Ia23e47d197f983c695ec0bcd283e74931119ee55 BUG: 1255690 Reviewed-on: http://review.gluster.org/11982 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* afr: Do not wind statfs to arbiter brickRavishankar N2015-08-121-3/+6
| | | | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/#/c/11857/ Problem: AFR serves statfs from the brick having the least free space available. Since the size to be allocated to the arbiter brick in a 3 way replica is supposed to be considerably lesser than the other 2 bricks, statfs will be served from this brick which is incorrect. Fix: Don't serve statfs from the arbiter brick. Change-Id: Ia2d2402ba1e8f5d96831f71b3f8337f241e6753b BUG: 1251380 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/11858 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* afr: Porting messages to new logging frameworkarao2015-06-281-27/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | updated Backport of http://review.gluster.org/9897 Cherry picked from 58a736111fa1db4f10c6646e81066434260f674f >Change-Id: I94ac7b2cb0d43a82cf0eeee21407cff9b575c458 >BUG: 1194640 >Signed-off-by: arao <arao@redhat.com> >Signed-off-by: Mohamed Ashiq <mliyazud@redhat.com> >Reviewed-on: http://review.gluster.org/9897 >Tested-by: Gluster Build System <jenkins@build.gluster.com> >Tested-by: NetBSD Build System <jenkins@build.gluster.org> >Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Change-Id: I5fb464da38594579f31661b42a8a3e9d858a797e BUG: 1217722 Signed-off-by: arao <arao@redhat.com> Signed-off-by: Mohamed Ashiq <mliyazud@redhat.com> Reviewed-on: http://review.gluster.org/11351 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anuradha Talur <atalur@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr: Block fops when file is in split-brainRavishankar N2015-06-261-0/+58
| | | | | | | | | | | | | | | | Backport of http://review.gluster.org/#/c/11371/ For directories, block metadata FOPS. For non-directories, block data and metadata FOPS. Do not block entry FOPS. Change-Id: I23963648291ec1136a5a35052ed0b7bc1b3059e7 BUG: 1235934 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/11420 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: Pick gfid from poststat during fresh lookup for read child ↵Krutika Dhananjay2015-06-251-27/+41
| | | | | | | | | | | | | | calculation Backport of: http://review.gluster.org/11373 Change-Id: I3ddc70cb0e7dbd1ef8adb352393b5ec16464fc94 BUG: 1212842 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/11391 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr: honour selfheal enable/disable volume set optionsRavishankar N2015-06-031-0/+3
| | | | | | | | | | | | | | | | | | | | | | | afr-v1 had the following volume set options that are used to enable/ disable self-heals from happening in AFR xlator when loaded in the client graph: cluster.metadata-self-heal cluster.data-self-heal cluster.entry-self-heal In afr-v2, these 3 heals can happen from the client if there is an inode refresh. This patch allows such heals to proceed only if the corresponding volume set options are set to true. Change-Id: I8d97d6020611152e73a269f3fdb607652c66cc86 BUG: 1227674 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/11012 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> (cherry picked from commit da111ae21429d33179cd11409bc171fae9d55194) Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/11062
* cluster/afr: Treat op_ret >= 0 as success in afr_final_errno()Krutika Dhananjay2015-05-281-1/+1
| | | | | | | | | | | | Backport of: http://review.gluster.org/10946 Change-Id: I77e7ec8627846d1d7e3198459ea7b8624b5f450b BUG: 1225743 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/10960 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: NetBSD Build System
* cluster/afr : Do not copy dict when it is NULLAnuradha2015-05-151-1/+1
| | | | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/10755 In afr_lookup_xattr_req_prepare(), dict_copy was done even though source dict was NULL. Change-Id: I85a5d2823ba021e7f78c1ce13402a0f16b08cb51 BUG: 1221470 Reviewed-on: http://review.gluster.org/10755 Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Prashanth Pai <ppai@redhat.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Susant Palai <spalai@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/10777
* cluster/afr : Prevent inode-evict during split-brain resolutionAnuradha2015-05-091-24/+174
| | | | | | | | | | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/#/c/10134/ 1) Provided setfattr command to set timeout for split-brain choice. 2) If split-brain inspection/resolution is being done from the mount for a file, ref the inode when split-brain-choice is set. This inode will be unconditionally unref-ed after timeout seconds set by the user/default otherwise. 3) Updated the doc and testcase to reflect the changes. Change-Id: I15c9037dee28855f21e680e7e3632e1f48dba4e1 BUG: 1219388 Reviewed-on: http://review.gluster.org/10134 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/10679
* cluster/ec: Fix dictionary compare functionPranith Kumar K2015-05-081-35/+6
| | | | | | | | | | | | | | | | | | | | | If both dicts are NULL then equal. If one of the dicts is NULL but the other has only ignorable keys then also they are equal. If both dicts are non-null then check if for each non-ignorable key, values are same or not. value_ignore function is used to skip comparing values for the keys which must be present in both the dictionaries but the value could be different. geo-rep's stime xattr doesn't need to be present in list xattr but when getxattr comes on stime xattr even if there aren't enough responses with the xattr we should still give out an answer which is maximum of the stimes available. Change-Id: I8de2ceaa2db785b797f302f585d88e73b154167d BUG: 1216303 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10078 Reviewed-on: http://review.gluster.org/10690 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System
* afr: add arbitration supportRavishankar N2015-05-051-0/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/#/c/10258/ Add logic in afr to work in conjunction with the arbiter xlator when a replica 3 arbiter volume is created. More specifically, this patch: * Enables full locks for afr data transaction for such volumes. * Removes the upfront marking of pending xattrs at the time of pre-op and defer it to post-op. (This is an arbiter independent change and is made for all afr transactions.) * After pre-op stage, check if we can proceed with the fop stage without ending up in split-brain by examining the changelog xattrs. * Unwinds the fop with failure if only one source was available at the time of pre-op and the fop happened to fail on particular source brick. * Skips data self-heal if arbiter brick is the only source available. * Adds the arbiter-count option to the shd graph. This patch is a part of the arbiter logic implementation for 3 way AFR details of which can be found at http://review.gluster.org/#/c/9656/ Change-Id: I9603db9d04de5626eb2f4d8d959ef5b46113561d BUG: 1217689 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/10514 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr : null dereference coverity fix.Manikandan Selvaganesh2015-04-081-14/+25
| | | | | | | | | | | | CID : 1194648 Change-Id: Ib26e7cdbf412d563240885fb3113bcc1fe5c9c49 BUG: 789278 Signed-off-by: Manikandan Selvaganesh <mselvaga@redhat.com> Reviewed-on: http://review.gluster.org/9571 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* Avoid conflict between contrib/uuid and system uuidEmmanuel Dreyfus2015-04-041-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | glusterfs relies on Linux uuid implementation, which API is incompatible with most other systems's uuid. As a result, libglusterfs has to embed contrib/uuid, which is the Linux implementation, on non Linux systems. This implementation is incompatible with systtem's built in, but the symbols have the same names. Usually this is not a problem because when we link with -lglusterfs, libc's symbols are trumped. However there is a problem when a program not linked with -lglusterfs will dlopen() glusterfs component. In such a case, libc's uuid implementation is already loaded in the calling program, and it will be used instead of libglusterfs's implementation, causing crashes. A possible workaround is to use pre-load libglusterfs in the calling program (using LD_PRELOAD on NetBSD for instance), but such a mechanism is not portable, nor is it flexible. A much better approach is to rename libglusterfs's uuid_* functions to gf_uuid_* to avoid any possible conflict. This is what this change attempts. BUG: 1206587 Change-Id: I9ccd3e13afed1c7fc18508e92c7beb0f5d49f31a Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/10017 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* Tests: fix spurious failure in read-subvol-entry.tEmmanuel Dreyfus2015-04-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | read-subvol-entry.t tests that if a brick has pending operations, it is not used for readdir operations. On NetBSD this test exhibits spurious failures, with the wrong brick being used to perform readdir. It happens because when afr_replies_interpret() looks at xattr for pending attributes, it uses alternative bahvior whether it is working on a directory or another object. The decision is based on inode->ia_type, which may be IA_INVAL at that time if we come there from: afr_replies_interpret.() afr_xattrs_are_equal() afr_lookup_metadata_heal_chec() afr_lookup_entry_heal() afr_lookup_cbk() Using replies[i].poststat.ia_type, which is correctly set, works around the problem. BUG: 1129939 Change-Id: Id9ccdd8604f79a69db5f1902697f8913acac50ad Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/9831 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/bit-rot: Implementation of bit-rot xlatorVenky Shankar2015-03-241-12/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the "Signer" -- responsible for signing files with their checksums upon last file descriptor close (last release()). The event notification facility provided by the changelog xlator is made use of. Moreover, checksums are as of now SHA256 hash of the object data and is the only available hash at this point of time. Therefore, there is no special "what hash to use" type check, although it's does not take much to add various hashing algorithms to sign objects with. Signatures are stored in extended attributes of the objects along with the the type of hashing used to calculate the signature. This makes thing future proof when other hash types are added. The signature infrastructure is provided by bitrot stub: a little piece of code that sits over the POSIX xlator providing interfaces to "get or set" objects signature and it's staleness. Since objects are signed upon receiving release() notification, pre-existing data which are "never" modified would never be signed. To counter this, an initial crawler thread is spawned The crawler scans the entire brick for objects that are unsigned or "missed" signing due to the server going offline (node reboots, crashes, etc..) and triggers an explicit sign. This would also sign objects when bit-rot is enabled for a volume and/or after upgrade. Change-Id: I1d9a98bee6cad1c39c35c53c8fb0fc4bad2bf67b BUG: 1170075 Original-Author: Raghavendra Bhat <raghavendra@redhat.com> Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/9711 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr : enable inspection & resolution of files in split-brainAnuradha2015-03-191-16/+168
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Part 2/2 patch to enable users analyze and resolve split-brain. This patch enables : 1) Users to inspect the files in data and metadata split-brain. 2) Resolve the split-brain. Both using a series of setfattr commands. Consider a volume "test" with 2 bricks. 1) To inspect a file f1: setfattr -n replica.split-brain-choice -v test-client-0 f1 After the execution of this command, if no read_subvol is found, reads will be served from test-client-0 (corresponding to brick-0). 2) To resolve split-brain : setfattr -n replica.split-brain-heal-finalize -v test-client-0 f1 Execution of this command will lead to the resolution of data and metadata split-brain with subvol mentioned in the command (test-client-0 here) as the source and the rest as sink. Change-Id: Ia20f3ee5abd3119e3d54fcc599f1e55ac65fd179 BUG: 1191396 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/9743 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* afr: remove stale index entriesRavishankar N2015-03-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: During pre-op phase, the index xlator 1. Creates the entry inside .glusterfs/indices/xattrop 2. Winds the xattrop fop to posix to mark dirty/pending changelogs. If the brick crashes after 1, the xattrop entry becomes stale and never gets removed by shd during subsequent crawls because there is nothing to heal (changelogs are zero). Though the stale entry does not get displayed in the output of 'heal info' command, it nevertheless stays there forever unless a new write transaction is performed on the file. Fix: During index self-heal if afr xattrs are found to be clean (indicated by ret value of 2 on a call to afr_shd_selfheal(), send a dummy post-op with all 0s for the xattr values, which makes the index xlator to unlink the stale entry. Change-Id: I02cb2bc937f2e3f3f3cb35d67b006664dc7ef919 BUG: 1190069 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/9714 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anuradha Talur <atalur@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>