glusterfs.git -

	Commit message (Collapse)	Author	Age	Files	Lines
*	cluster/afr: Unwind xdata_rsp even in case of failures	Pranith Kumar K	2016-06-03	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	DHT expects GF_PREOP_CHECK_FAILED to be present in xdata_rsp in case of mkdir failures because of stale layout. But AFR was unwinding null xdata_rsp in case of failures. This was leading to mkdir failures just after remove-brick. Unwind the xdata_rsp in case of failures to make sure the response from brick reaches dht. >BUG: 1340623 >Change-Id: Idd3f7b95730e8ea987b608e892011ff190e181d1 >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> >Reviewed-on: http://review.gluster.org/14553 >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >Reviewed-by: Ravishankar N <ravishankar@redhat.com> >Smoke: Gluster Build System <jenkins@build.gluster.com> >CentOS-regression: Gluster Build System <jenkins@build.gluster.com> >Reviewed-by: Anuradha Talur <atalur@redhat.com> >Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> BUG: 1340992 Change-Id: I2641d35a851be692aa223dfea5d082245ac6c2bc Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/14633 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Smoke: Gluster Build System <jenkins@build.gluster.com>
*	cluster/afr: Refresh inode for inode-write fops in need	Pranith Kumar K	2016-05-29	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: If a named fresh-lookup is done on an loc and the fop fails on one of the bricks or not sent on one of the bricks, but by the time response comes to afr, if the brick is up, 'can_interpret' will be set to false in afr_lookup_done(), this will lead to inode-ctx for that inode to be not set, this can lead to EIO in case of a transaction as it depends on 'readable' array to be available by that point. Fix: Refresh inode for inode-write fops for the ctx to be set if it is not already done at the time of named fresh-lookup or if the file is in split-brain where we need to perform one more refresh before failing the fop to check if the file is still in split-brain or not. >BUG: 1336612 >Change-Id: I5c50b62c8de06129b8516039f7c252e5008c47a5 >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> >Reviewed-on: http://review.gluster.org/14368 >Smoke: Gluster Build System <jenkins@build.gluster.com> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >Reviewed-by: Ravishankar N <ravishankar@redhat.com> >CentOS-regression: Gluster Build System <jenkins@build.gluster.com> >Backport of http://review.gluster.org/14545 BUG: 1337831 Change-Id: If4465ab8fc506e1f905b623b82a53bdab8f5cffd Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/14453 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
*	afr: propagate child up event after timeout	Ravishankar N	2016-04-27	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Backport of: http://review.gluster.org/11113 Problem: During mount, afr waits for response from all its children before notifying the parent xlator. In a 1x2 replica volume , if one of the nodes is down, the mount will hang for more than a minute until child down is received from the client xlator for that node. Fix: When parent up is received by afr, start a 10 second timer. In the timer call back, if we receive a successful child up from atleast one brick, propagate the event to the parent xlator. Change-Id: I31e57c8802c1a03a4a5d581ee4ab82f3a9c8799d BUG: 1330855 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/14088 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
*	cluster/afr: Fix spurious entries in heal info	Pranith Kumar K	2016-04-20	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Locking schemes in afr-v1 were locking the directory/file completely during self-heal. Newer schemes of locking don't require Full directory, file locking. But afr-v2 still has compatibility code to work-well with older clients, where in entry-self-heal it takes a lock on a special 256 character name which can't be created on the fs. Similarly for data self-heal there used to be a lock on (LLONG_MAX-2, 1). Old locking scheme requires heal info to take sh-domain locks before examining heal-state. If it doesn't take sh-domain locks, then there is a possibility of heal-info hanging till self-heal completes because of compatibility locks. But the problem with heal-info taking sh-domain locks is that if two heal-info or shd, heal-info try to inspect heal state in parallel using trylocks on sh-domain, there is a possibility that both of them assuming a heal is in progress. This was leading to spurious entries being shown in heal-info. Fix: As long as there is afr-v1 way of locking, we can't fix this problem with simple solutions. If we know that the cluster is running newer versions of locking schemes, in those cases we can give accurate information in heal-info. So introduce a new option called 'locking-scheme' which if it is 'granular' will give correct information in heal-info. Not only that, Extra network hops for taking compatibility locks, sh-domain locks in heal info will not be necessary anymore. Thus it improves performance. >BUG: 1322850 >Change-Id: Ia563c5f096b5922009ff0ec1c42d969d55d827a3 >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> >Reviewed-on: http://review.gluster.org/13873 >Smoke: Gluster Build System <jenkins@build.gluster.com> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.com> >Reviewed-by: Ashish Pandey <aspandey@redhat.com> >Reviewed-by: Anuradha Talur <atalur@redhat.com> >Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> >(cherry picked from commit b6a0780d86e7c6afe7ae0d9a87e6fe5c62b4d792) Change-Id: If7eee18843b48bbeff4c1355c102aa572b2c155a BUG: 1294675 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/14039 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
*	cluster/afr: Use parallel dir scan functionality	Pranith Kumar K	2016-04-17	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	>BUG: 1221737 >Change-Id: I0ed71a72f0e33bd733723e00a01cf28378c5534e >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> >Reviewed-on: http://review.gluster.org/13755 >Reviewed-on: http://review.gluster.org/13992 >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.com> >Smoke: Gluster Build System <jenkins@build.gluster.com> >Reviewed-by: Jeff Darcy <jdarcy@redhat.com> BUG: 1325857 Change-Id: I7c6b2ea065edd7f5dafffeb42fd6c601b4ab8d14 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/14010 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
*	cluster/afr: Choose local child as source if possible	Pranith Kumar K	2016-03-30	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It is better to choose local brick as source if possible to prevent over the wire read thus saving on bandwidth. Also changed code to not attempt data-heal if 'source' is selected as arbiter. >Change-Id: I9a328d0198422280b13a30ab99545370a301dfea >BUG: 1314150 >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> >Reviewed-on: http://review.gluster.org/13585 >Smoke: Gluster Build System <jenkins@build.gluster.com> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> >Tested-by: Krutika Dhananjay <kdhananj@redhat.com> >CentOS-regression: Gluster Build System <jenkins@build.gluster.com> >Reviewed-by: Jeff Darcy <jdarcy@redhat.com> >(cherry picked from commit 2807e3fc005630213ab7ad251fef13d61c07ac6b) Change-Id: I24ea66683f81e238a6c1850664a49fe554011a0a BUG: 1322521 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/13860 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com>
*	afr : Enable auto heal when replica count increases	Anuradha Talur	2016-03-23	1	-3/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Backport of: http://review.gluster.org/12454 This patch is part two change to prevent data loss in a replicate volume on doing a add-brick operation. Problem: After doing add-brick, there is a chance that self heal might happen from the newly added brick rather than the source brick, leading to data loss. Solution: Mark pending changelogs on afr children for the new afr-child so that heal is performed in the correct direction. >Change-Id: I11871e55eef3593aec874f92214a2d97da229b17 >BUG: 1276203 >Signed-off-by: Anuradha Talur <atalur@redhat.com> >Reviewed-on: http://review.gluster.org/12454 >Smoke: Gluster Build System <jenkins@build.gluster.com> >Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> >Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> >CentOS-regression: Gluster Build System <jenkins@build.gluster.com> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Change-Id: Iae6af44f97e612cb3ee8c642254ec3d15ac063f5 BUG: 1320020 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/13807 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
*	afr: Add throttled background client-side heals	Ravishankar N	2016-03-22	1	-6/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Backport of: http://review.gluster.org/13207 If a heal is needed after inode refresh (lookup, read_txn), launch it in the background instead of blocking the fop (that triggered refresh) until the heal happens. afr_replies_interpret() is modified such that the heal is launched only if atleast one sink brick is up. Max. no of heals that can happen in parallel is configurable via the 'background-self-heal-count' volume option. Any number greater than that is put in a wait queue whose length is configurable via 'heal-wait-queue-leng' volume option. If the wait queue is also full, further heals will be ignored. Default values: background-self-heal-count=8, heal-wait-queue-leng=128 Change-Id: I9a134b2c29d66b70b7b1278811bd504963aabacc BUG: 1313312 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/13564 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
*	cluster/afr: Fix heal-info slow response while IO is in progress	Krutika Dhananjay	2016-02-04	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Backport of: http://review.gluster.org/#/c/13326/ Now heal-info does an open() on the file being examined so that the client at some point sees open-fd count being > 1 and releases the eager-lock so that heal-info doesn't remain blocked forever until IO completes. Change-Id: I7d4a8aa4de459216408b666894ee7bb42e406547 BUG: 1303899 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/13348 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
*	cluster/afr : Readdirp performance enhancement	Anuradha Talur	2015-12-21	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Backport of: http://review.gluster.org/#/c/12507/ Things done : 1) during lookup and inode_refresh as part of read_txn, request is sent to detect if heal is required or not. 2) If heal is required, be conservative in setting the readdirp entry inodes to NULL, otherwise don't be. 3) Self-heal-daemon now crawls both indices/xattrop and indices/dirty directory while healing. Change-Id: Ic4a4da63fb7e0726eab5f341a200859b29cf7eb7 BUG: 1287531 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/12507 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/12853
*	afr: write zeros to sink for non-sparse files	Ravishankar N	2015-10-29	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Backport of http://review.gluster.org/#/c/12371/ Problem: If a file is created with zeroes ('dd', 'fallocate' etc.) when a brick is down, the self-heal does not write the zeroes to the sink after it comes up. Consequenty, there is a mismatch in disk-usage amongst the bricks of the replica. Fix: If we definitely know that the file is not sparse, then write the zeroes to the sink even if the checksums match. Change-Id: Ic739b3da5dbf47d99801c0e1743bb13aeb3af864 BUG: 1275921 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/12436 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
*	afr: fixes in transaction code	Ravishankar N	2015-10-26	1	-5/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Backport of http://review.gluster.org/#/c/12368/ and http://review.gluster.org/#/c/12415/ 1. When winding the pre-op, transaction.pre_op[i] is set. If the pre-op fails, transaction.failed_subvols[i] is set. If if fails on all chidren, we can directly proceed to unlock (via afr_changelog_post_op_now) without trying to wind the write, fail and then going to unlock. 2. 'fop_subvols' seems to be an unused variable, hence removing it. 3. Call local->transaction.wind() only on subvols where pre-op succeeded. Change-Id: I9525628daf48082e979b0093fa0478934495e61f BUG: 1273334 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/12399 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
*	afr: perform replace-brick in a synctask	Ravishankar N	2015-09-15	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Backport of http://review.gluster.org/#/c/12169/ Problem: replace-brick setxattr is not performed inside a synctask. This can lead to hangs if the setxattr is executed by epoll thread, as the epoll thread will be waiting for replies to come where as epoll thread is the thread that needs to epoll_ctl for reading from socket and listen. Fix: Move replace-brick to synctask to prevent epoll thread hang. This patch is in line with the fix performed in http://review.gluster.org/#/c/12163/ Change-Id: I7284930ead9b0adaa0257f21ec2d893fa5a7146f BUG: 1262547 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/12172 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
*	afr : get split-brain-status in a synctask	Anuradha Talur	2015-09-15	1	-3/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Backport of: http://review.gluster.org/#/c/12163/ On executing `getfattr -n replica.split-brain-status <file>` on mount, there is a possibility that the mount hangs. To avoid this hang, fetch the split-brain-status of a file in synctask. >Change-Id: I87b781419ffc63248f915325b845e3233143d385 >BUG: 1262345 >Signed-off-by: Anuradha Talur <atalur@redhat.com> Change-Id: I9f4f4b54e108d3a0017264353b8272e072170c16 BUG: 1262547 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/12166 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
*	cluster/afr: Make [f]xattrop metadata transaction	Pranith Kumar K	2015-08-31	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Backport of http://review.gluster.com/11809 Problem: When xlators above afr do [f]xattrop when one of the bricks is down, after the brick comes backup, the metadata is not healed because [f]xattrop is not considered a transaction. Fix: Treat [f]xattrop as transaction so that changes done by xlators above afr are marked for heal when some of the bricks were down at the time of [f]xattrop. BUG: 1248890 Change-Id: Ibe69aa0ca6be9b4b4134dc2879b306e2e9c4cde8 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11810 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com>
*	afr: Do not wind statfs to arbiter brick	Ravishankar N	2015-08-12	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Backport of: http://review.gluster.org/#/c/11857/ Problem: AFR serves statfs from the brick having the least free space available. Since the size to be allocated to the arbiter brick in a 3 way replica is supposed to be considerably lesser than the other 2 bricks, statfs will be served from this brick which is incorrect. Fix: Don't serve statfs from the arbiter brick. Change-Id: Ia2d2402ba1e8f5d96831f71b3f8337f241e6753b BUG: 1251380 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/11858 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
*	afr: Block fops when file is in split-brain	Ravishankar N	2015-06-26	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Backport of http://review.gluster.org/#/c/11371/ For directories, block metadata FOPS. For non-directories, block data and metadata FOPS. Do not block entry FOPS. Change-Id: I23963648291ec1136a5a35052ed0b7bc1b3059e7 BUG: 1235934 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/11420 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com>
*	cluster/afr: Pick gfid from poststat during fresh lookup for read child ↵	Krutika Dhananjay	2015-06-25	1	-6/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	calculation Backport of: http://review.gluster.org/11373 Change-Id: I3ddc70cb0e7dbd1ef8adb352393b5ec16464fc94 BUG: 1212842 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/11391 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
*	cluster/afr : Prevent inode-evict during split-brain resolution	Anuradha	2015-05-09	1	-0/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Backport of: http://review.gluster.org/#/c/10134/ 1) Provided setfattr command to set timeout for split-brain choice. 2) If split-brain inspection/resolution is being done from the mount for a file, ref the inode when split-brain-choice is set. This inode will be unconditionally unref-ed after timeout seconds set by the user/default otherwise. 3) Updated the doc and testcase to reflect the changes. Change-Id: I15c9037dee28855f21e680e7e3632e1f48dba4e1 BUG: 1219388 Reviewed-on: http://review.gluster.org/10134 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/10679
*	afr: add arbitration support	Ravishankar N	2015-05-05	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Backport of http://review.gluster.org/#/c/10258/ Add logic in afr to work in conjunction with the arbiter xlator when a replica 3 arbiter volume is created. More specifically, this patch: * Enables full locks for afr data transaction for such volumes. * Removes the upfront marking of pending xattrs at the time of pre-op and defer it to post-op. (This is an arbiter independent change and is made for all afr transactions.) * After pre-op stage, check if we can proceed with the fop stage without ending up in split-brain by examining the changelog xattrs. * Unwinds the fop with failure if only one source was available at the time of pre-op and the fop happened to fail on particular source brick. * Skips data self-heal if arbiter brick is the only source available. * Adds the arbiter-count option to the shd graph. This patch is a part of the arbiter logic implementation for 3 way AFR details of which can be found at http://review.gluster.org/#/c/9656/ Change-Id: I9603db9d04de5626eb2f4d8d959ef5b46113561d BUG: 1217689 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/10514 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
*	arbiter: load arbiter xlator on every 3rd brick of a replica 3 AFR subvol	Ravishankar N	2015-05-03	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Backport of http://review.gluster.org/10257 Logic for adding the 'glusterd_brickinfo->group' member and using it to find the brick positon has been taken from http://review.gluster.org/#/c/9919. Thanks to Jeff Darcy for that. This patch is a part of the arbiter logic implementation for 3 way AFR details of which can be found at http://review.gluster.org/#/c/9656/ Change-Id: Idbfe4f29ee8e098e0102def8f38b32314316b188 BUG: 1217689 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/10479 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
*	libxlator: Change marker xattr handling interface	Pranith Kumar K	2015-03-25	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Changed the implementation of marker xattr handling to take just a function which populates important data that is different from default 'gauge' values and subvolumes where the call needs to be wound. - Removed duplicate code I found while reading the code and moved it to cluster_marker_unwind. Removed unused structure members. - Changed dht/afr/stripe implementations to follow the new implementation - Implemented marker xattr handling for ec. Change-Id: Ib0c3626fe31eb7c8aae841eabb694945bf23abd4 BUG: 1200372 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9892 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	cluster/afr : enable inspection & resolution of files in split-brain	Anuradha	2015-03-19	1	-0/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Part 2/2 patch to enable users analyze and resolve split-brain. This patch enables : 1) Users to inspect the files in data and metadata split-brain. 2) Resolve the split-brain. Both using a series of setfattr commands. Consider a volume "test" with 2 bricks. 1) To inspect a file f1: setfattr -n replica.split-brain-choice -v test-client-0 f1 After the execution of this command, if no read_subvol is found, reads will be served from test-client-0 (corresponding to brick-0). 2) To resolve split-brain : setfattr -n replica.split-brain-heal-finalize -v test-client-0 f1 Execution of this command will lead to the resolution of data and metadata split-brain with subvol mentioned in the command (test-client-0 here) as the source and the rest as sink. Change-Id: Ia20f3ee5abd3119e3d54fcc599f1e55ac65fd179 BUG: 1191396 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/9743 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
*	cluster/afr: Make read child match check in afr optional	Krutika Dhananjay	2015-03-18	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Having this particular check which was introduced by commit c78998c39f0857ea7aacba360632c148afc54a55 causes a drop in performance in readdirp. So the behavior is made configurable with this patch. Change-Id: I2858fc18b3539df7aa6d3f489e0d5cfaeb8a9b3c BUG: 1202669 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/9917 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
*	cluster/afr: Implementation of quorum-reads	Pranith Kumar K	2015-03-05	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	Provide a way of disabling reads when quorum is not met. Change-Id: Ic4f57c2b87a0b8514600759de3a7a47e217fe3b5 BUG: 1187885 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9543 Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
*	libglusterfs: Moved common functions as utils in syncop/common-utils	Pranith Kumar K	2015-02-27	1	-3/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	These will be used by both afr and ec. Moved syncop_dirfd, syncop_ftw, syncop_dir_scan functions also into syncop-utils.c Change-Id: I467253c74a346e1e292d36a8c1a035775c3aa670 BUG: 1177601 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9740 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Anuradha Talur <atalur@redhat.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	cluster/afr : provide split-brain info by using getxattr	Anuradha	2015-02-25	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch is one part to enable users analyze and resolve split-brain. Problem : To know if a file is in data/metadata split-brain Solution : Performing "getfattr -n afr.split-brain-status <path-to-file>" from the mount provides this information. Also provides the list of afr children to analyse to get more information. Change-Id: I4d9b429794759a906371416cb84c84a212e2c7b9 BUG: 1191396 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/9633 Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
*	cluster/afr: split-brain resolution CLI	Ravishankar N	2015-01-15	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Extend the AFR heal command to include automated split-brain resolution. This patch [3/3] is the final patch for afr automated split-brain resolution implementation. "gluster volume heal <VOLNAME> [full \| statistics [heal-count [replica <HOSTNAME:BRICKNAME>]] \|info [healed \| heal-failed \| split-brain]\| split-brain {bigger-file <FILE> \|source-brick <HOSTNAME:BRICKNAME> [<FILE>]}]" The new additions being: 1.gluster volume heal <VOLNAME> split-brain bigger-file <FILE> Locates the replica containing the FILE, selects bigger-file as source and completes heal. 2.gluster volume heal <VOLNAME> split-brain source-brick <HOSTNAME:BRICKNAME> <FILE> Selects <FILE> present in <HOSTNAME:BRICKNAME> as source and completes heal. 3.gluster volume heal <VOLNAME> split-brain <HOSTNAME:BRICKNAME> Selects all split-brained files in <HOSTNAME:BRICKNAME> as source and completes heal. Note: <FILE> can be either the full file name as seen from the root of the volume (or) the gfid-string representation of the file, which sometimes gets displayed in the heal info command's output. Entry/gfid split-brain resolution is not supported. Example can be found in the test case. Change-Id: I4649733922d406f14f28ee9033a5cb627b9538b3 BUG: 1136769 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/9377 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
*	cluster/afr: serialize inode locks	Pranith Kumar K	2015-01-04	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Afr winds inodelk calls without any order, so blocking inodelks from two different mounts can lead to dead lock when mount1 gets the lock on brick-1 and blocked on brick-2 where as mount2 gets lock on brick-2 and blocked on brick-1 Fix: Serialize the inodelks whether they are blocking inodelks or non-blocking inodelks. Non-blocking locks also need to be serialized. Otherwise there is a chance that both the mounts which issued same non-blocking inodelk may endup not acquiring the lock on any-brick. Ex: Mount1 and Mount2 request for full length lock on file f1. Mount1 afr may acquire the partial lock on brick-1 and may not acquire the lock on brick-2 because Mount2 already got the lock on brick-2, vice versa. Since both the mounts only got partial locks, afr treats them as failure in gaining the locks and unwinds with EAGAIN errno. Change-Id: Ie6cc3d564638ab3aad586f9a4064d81e42d52aef BUG: 1176008 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9372 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
*	afr: stop encoding subvolume id in readdir d_off	Anand Avati	2014-12-26	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The purpose of encoding d_off in AFR is to indicate the selected subvolume for the first readdir, and continue all further readdirs of the session on the same subvolume. This is required because, unlike files, dir d_offs are specific to the backend and cannot be re-used on another subvolume. The d_off transformation encodes the subvolume id and prevents such invalid use of d_offs on other servers. However, this approach could be quite wasteful of precious d_off bit-space. Unlike DHT, where server id can change from entry to entry and thus encoding the server id in the transformed d_off is necessary, we could take a slightly relaxed approach in AFR. The approach is to save the subvolume where the last readdir request was sent in the fd_ctx. This consumes constant space (i.e no per-entry cache), and serves the purpose of avoiding d_off "misuse" (i.e using d_off from one server on another). The compromise here is NFS resuming readdir from a non-0 cookie after an extended delay (either anonymous FD has been reclaimed, or server has restarted). In such cases a subvolume is picked freshly. To make this fresh picking more deterministic (i.e, to pick the same subvolume whenever possible, even after reboots), the function afr_hash_child (used by afr_read_subvol_select_by_policy) is modified to skip all dynamic inputs (i.e PID) for the case of directories. Change-Id: I46ad95feaeb21fb811b7e8d772866a646330c9d8 BUG: 1163161 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/9332 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
*	cluster/afr : Change in volume heal info command	Anuradha	2014-12-23	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	gluster volume heal <volname> info command will now also display if the files listed (in the output of the command) are in split-brain or possibly being healed. This patch also fixes build warning that occurs. Change-Id: I1fc92e62137f23b2b9ddf6e05819cee6230741d1 BUG: 1163804 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/9119 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
*	features/marker: Filter internal xattrs in lookup	Pranith Kumar K	2014-11-11	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	Afr should ignore quota-size-key as part of self-heal but should heal quota-limit key. Change-Id: Ic0b06bd20a563a00d6bfdc2dc5a76c661e533ecb BUG: 1161106 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9061 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	cluster/afr: Fix spurious metadata self-heals	Pranith Kumar K	2014-09-24	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	- Added logging for metadata and data self-heals which helped in debugging this issue. - Added checks to skip self-heals when no sinks are available to heal Change-Id: I0d50dceb84cd9ad4fe00e0b749ddf7d4ff42348a BUG: 1128721 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/8709 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
*	cluster/afr: Fixed mem leaks in self-heal code path.	Anuradha	2014-09-23	1	-1/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	AFR_STACK_RESET previously didn't cleanup afr_local_t, leading to memory leaks. With this patch, cleanup is done. All credit goes to Pranith Kumar Karampuri. Change-Id: I3c727ff4bb323dccb81da4b3168ac69bb340d17d BUG: 1145471 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/8821 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
*	cluster/afr: Handle EAGAIN properly in inodelk	Pranith Kumar K	2014-09-15	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: When one of the brick is taken down and brough back up in a replica pair, locks on that brick will be allowed. Afr returns inodelk success even when one of the bricks already has the lock taken. Fix: If any brick returns EAGAIN return failure to parent xlator. Change-Id: I5b842d0fc094359cc4231494053d2bfeb606bbbe BUG: 1141539 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/8710 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	cluster/afr: perform list-xattr during lookup	Ravishankar N	2014-09-05	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	Detect and heal mismatching user extended attributes during lookup. 'Forward' port of http://review.gluster.org/#/c/7444/ Change-Id: Id03c9746f083ffd3014711d0b3a2e5a71a45eed4 BUG: 1134691 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/8558 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
*	cluster/afr : Mark pending changelog xattrs for new creations	Anuradha	2014-09-03	1	-0/+16
\| \| \| \| \| \| \| \| \| \| \| \| \|	Based on type of file, set appropriate pending changelogs for new entries. Change-Id: Ifd124bf9bc54b996ce83ab9f39d03b3ccca7eb3c BUG: 1130892 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/8555 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
*	cluster/afr: Fix dict_t leaks	Krutika Dhananjay	2014-08-28	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \|	dict_t objects that are ref'd in alloca'd "replies" in afr_replies_copy() are not unref'd after "replies" go out of scope. Change-Id: Id5a6ca3c17a8de72b94b3e0f92165609da5a36ea BUG: 1134221 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/8553 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
*	cluster/afr: Fix all locked_on bricks are sinks check in self-heals	Pranith Kumar K	2014-08-26	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Counts may give wrong results when the number of bricks is > 2. If the locks are acquired on one source and sink, but the source accuses even the down sink then there will be 2 sinks and lock is acquired on 2 bricks so even when there is a clear source and sink **_finalize_source functions think the file/directory is in split-brain. Fix: Check that all the bricks which are locked are sinks. Change-Id: Ia43790e8e1bfb5e72a3d0b56bcad94abd0dc58ab BUG: 1128721 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/8456 Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
*	cluster/afr: Perform gfid heal inside locks.	Pranith Kumar K	2014-08-22	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Allowing lookup with 'gfid-req' will lead to assigning gfid at posix layer. When two mounts perform lookup in parallel that can lead to both bricks getting different gfids leading to gfid-mismatch/EIO for the lookup. Fix: Perform gfid heal inside lock. BUG: 1129529 Change-Id: I20c6c5e25ee27eeb906bff2f4c8ad0da18d00090 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/8512 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
*	Fix resolution issues across fuse/server/afr	Pranith Kumar K	2014-06-14	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problems with fuse/server: Fuse loc touch up sets loc->name even when pargfid is not known. Server lookup does (pargfid, name) based lookup when name is set ignoring the gfid. Because of this server resolver finds that the lookup came on (null-pargfid, name) and fails the lookup with EINVAL. Fix: Don't set loc->name in loc_touchup if the pargfid is not known. Did the same even for server-resolver Problem with afr: Lets say there is a directory hierarchy a/b/c/d on the mount and the user is cd'ed into the directory. Bring down one of the bricks of replica and remove all directories/files to simulate disk replacement on that brick. Now this brick is brought back up. Creates on the cd'ed directory fail with ESTALE. Basically before sending a create of 'f' inside 'd', fuse sends a lookup to make sure the file is not present. On one of the bricks 'd' is present and 'f' is not so it sends ENOENT as response. On the new brick 'd' itself is not present. So it sends ESTALE. In afr ESTALE is considered to be special errno on witnessing which lookup has to fail. And ESTALE is given more priority than ENOENT. Due to these reasons lookup fails with ESTALE rather than ENOENT. Since lookup didn't fail with ENOENT, 'create' can't be issued so the command is failed with ESTALE. Solution: Afr needs to consider ESTALE errno normally and ENOENT needs to be given more priority so that operations like create can proceed even when only one of the brick is up and running. Whenever client xlator identifies that gfid-changed, it sets that information in lookup xdata. Afr uses this information to fail the lookup with ESTALE so that top xlator can send fresh lookup. Change-Id: Ica6ce01baef08620154050a635e6f97d51029ef6 BUG: 1106408 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/8015 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	cluster/afr: move messages to new logging framework	Ravishankar N	2014-05-17	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Change important (from a diagnostics point of view) log messages to use the gf_msg() framework. Change-Id: I0a58184bbb78989db149e67f07c140a21c781bc2 BUG: 1075611 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/7784 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
*	cluster/afr: Fix bugs in quorum implementation	Pranith Kumar K	2014-05-14	1	-16/+0
\| \| \| \| \| \| \| \| \| \| \| \| \|	- Have common place to perform quorum fop wind check - Check if fop succeeded in a way that matches quorum to avoid marking changelog in split-brain. BUG: 1066996 Change-Id: Ibc5b80e01dc206b2abbea2d29e26f3c60ff4f204 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/7600 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com>
*	cluster/afr: Remove eager-lock stub on finodelk failure	Pranith Kumar K	2014-04-02	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: For write fops afr's transaction eager-lock init adds transactions that can share eager-lock to fdctx list. But if eager-lock finodelk fop fails the stub remains in the list. This could later lead to corruption of the list and lead to infinite loop on the list leading to a mount hang. Fix: Remove the stub when finodelk fails. Change-Id: I0ed4bc6b62f26c5e891c1181a6871ee6e4f4f5fd BUG: 1063190 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/6944 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	cluster/afr: refactor	Anand Avati	2014-03-22	1	-532/+291
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Remove client side self-healing completely (opendir, openfd, lookup) - Re-work readdir-failover to work reliably in case of NFS - Remove unused/dead lock recovery code - Consistently use xdata in both calls and callbacks in all FOPs - Per-inode event generation, used to force inode ctx refresh - Implement dirty flag support (in place of pending counts) - Eliminate inode ctx structure, use read subvol bits + event_generation - Implement inode ctx refreshing based on event generation - Provide backward compatibility in transactions - remove unused variables and functions - make code more consistent in style and pattern - regularize and clean up inode-write transaction code - regularize and clean up dir-write transaction code - regularize and clean up common FOPs - reorganize transaction framework code - skip setting xattrs in pending dict if nothing is pending - re-write self-healing code using syncops - re-write simpler self-heal-daemon Change-Id: I1e4080c9796c8a2815c2dab4be3073f389d614a8 BUG: 1021686 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6010 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	cluster/afr: Unable to self heal symbolic links	Venkatesh Somyajulu	2014-01-13	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Under the entry self heal, readlink is done at the source and sink. When readlink is done at the sink, because link is not present at the sink, afr expects ENOENT. AFR translator takes decisions for new link creation based on ENOENT but server translator is modified to return ESTALE because of which afr xlator is not able to heal. Fix: The check for inode absence at server includes ESTALE as well. Change-Id: I319e4cb4156a243afee79365b7b7a5a7823e9a24 BUG: 1046624 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/6599 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	cluster/afr: Add foreground self-heal launch capability through lookup	Pranith Kumar K	2013-12-16	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Also renamed allow-sh-for-running-transaction -> attempt-self-heal Change-Id: I134cc79e663b532e625ffc342c59e49e71644ab3 BUG: 1039544 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/6463 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: venkatesh somyajulu <vsomyaju@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	zerofill: Change the type of len argument of glfs_zerofill() to off_t	Bharata B Rao	2013-11-14	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	glfs_zerofill() can be potentially called to zero-out entire file and hence allow for bigger value of length parameter. Change-Id: I75f1d11af298915049a3f3a7cb3890a2d72fca63 BUG: 1028673 Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-on: http://review.gluster.org/6266 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: M. Mohan Kumar <mohan@in.ibm.com> Tested-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	glusterfs: zerofill support	M. Mohan Kumar	2013-11-10	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add support for a new ZEROFILL fop. Zerofill writes zeroes to a file in the specified range. This fop will be useful when a whole file needs to be initialized with zero (could be useful for zero filled VM disk image provisioning or during scrubbing of VM disk images). Client/application can issue this FOP for zeroing out. Gluster server will zero out required range of bytes ie server offloaded zeroing. In the absence of this fop, client/application has to repetitively issue write (zero) fop to the server, which is very inefficient method because of the overheads involved in RPC calls and acknowledgements. WRITESAME is a SCSI T10 command that takes a block of data as input and writes the same data to other blocks and this write is handled completely within the storage and hence is known as offload . Linux ,now has support for SCSI WRITESAME command which is exposed to the user in the form of BLKZEROOUT ioctl. BD Xlator can exploit BLKZEROOUT ioctl to implement this fop. Thus zeroing out operations can be completely offloaded to the storage device , making it highly efficient. The fop takes two arguments offset and size. It zeroes out 'size' number of bytes in an opened file starting from 'offset' position. This patch adds zerofill support to the following areas: - libglusterfs - io-stats - performance/md-cache,open-behind - quota - cluster/afr,dht,stripe - rpc/xdr - protocol/client,server - io-threads - marker - storage/posix - libgfapi Client applications can exloit this fop by using glfs_zerofill introduced in libgfapi.FUSE support to this fop has not been added as there is no system call for this fop. Changes from previous version 3: * Removed redundant memory failure log messages Changes from previous version 2: * Rebased and fixed build error Changes from previous version 1: * Rebased for latest master TODO : * Add zerofill support to trace xlator * Expose zerofill capability as part of gluster volume info Here is a performance comparison of server offloaded zeofill vs zeroing out using repeated writes. [root@llmvm02 remote]# time ./offloaded aakash-test log 20 real 3m34.155s user 0m0.018s sys 0m0.040s [root@llmvm02 remote]# time ./manually aakash-test log 20 real 4m23.043s user 0m2.197s sys 0m14.457s [root@llmvm02 remote]# time ./offloaded aakash-test log 25; real 4m28.363s user 0m0.021s sys 0m0.025s [root@llmvm02 remote]# time ./manually aakash-test log 25 real 5m34.278s user 0m2.957s sys 0m18.808s The argument log is a file which we want to set for logging purpose and the third argument is size in GB . As we can see there is a performance improvement of around 20% with this fop. Change-Id: I081159f5f7edde0ddb78169fb4c21c776ec91a18 BUG: 1028673 Signed-off-by: Aakash Lal Das <aakash@linux.vnet.ibm.com> Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/5327 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	afr: check for split-brain before proceeding with fops	Ravishankar N	2013-10-16	1	-0/+21
\| \| \| \| \| \| \| \| \| \| \|	Bail out of fops if split brain has been detected during lookup Change-Id: Id387dbb1a25eec4a121dedceadc6069bdea24b5d BUG: 1010834 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/5988 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>