path: root/xlators/cluster/afr/src/afr-read-txn.c
Commit message | Author | Age | Files | Lines
* afr: add checks for allowing lookups | Ravishankar N | 2017-11-18 | 1 | -2/+1

    Problem:
    In an arbiter volume, lookup was being served from one of the sink bricks (source brick was down). shard uses the iatt values from lookup cbk to calculate the size and block count, which in this case were incorrect values. shard_local_t->last_block was thus initialised to -1, resulting in an infinite while loop in shard_common_resolve_shards().

    Fix:
    Use client quorum logic to allow or fail the lookups from afr if there are no readable subvolumes. So in replica-3 or arbiter vols, if there is no good copy or if quorum is not met, fail lookup with ENOTCONN.

    With this fix, we are also removing support for quorum-reads xlator option. So if quorum is not met, neither read nor write txns are allowed and we fail the fop with ENOTCONN.

    Change-Id: Ic65c00c24f77ece007328b421494eee62a505fa0
    BUG: 1467250
    Signed-off-by: Ravishankar N <ravishankar@redhat.com>
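The gating rule described in this fix can be sketched roughly as follows. This is a minimal illustration and not the AFR code: `count_up` and `lookup_gate` are hypothetical names, and quorum is simplified to "more than half of the bricks are up", which is an assumption rather than the actual client-quorum policy.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: count the bricks that are currently up. */
static size_t count_up(const bool *child_up, size_t child_count)
{
    size_t up = 0;
    for (size_t i = 0; i < child_count; i++)
        if (child_up[i])
            up++;
    return up;
}

/*
 * Hypothetical gate for a lookup. Returns 0 when the lookup may be served
 * and -ENOTCONN otherwise. Quorum is simplified to a strict majority of
 * bricks being up; that rule is an assumption made for this sketch.
 */
static int lookup_gate(const bool *child_up, const bool *readable,
                       size_t child_count)
{
    assert(child_up != NULL && readable != NULL);

    /* A good copy is a brick that is both up and marked readable. */
    bool have_good_copy = false;
    for (size_t i = 0; i < child_count; i++)
        if (child_up[i] && readable[i])
            have_good_copy = true;

    bool quorum_met = 2 * count_up(child_up, child_count) > child_count;

    if (!have_good_copy || !quorum_met)
        return -ENOTCONN;
    return 0;
}
```

With three bricks where only the first is readable, the gate passes while that brick and one other are up, and returns -ENOTCONN once the readable brick goes down, which is the arbiter scenario described in the problem statement.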
* afr: do not mention split-brain in log message in read_txn | Ravishankar N | 2017-03-20 | 1 | -3/+2

    I am seeing a lot of messages in qe/customer logs where read_txn complains that the file is possibly in split-brain because no readable subvol was found, does an inode refresh, and then there is no split-brain message post the inode refresh. This means that a lookup was not issued on the inode to populate 'readable', or it can mean one brick is the source for data and the other for metadata, making readable zero (because readable = intersection of (data, metadata) readable since commit 7a1c1e290470149696).

    Since we anyway log actual split-brains post inode-refresh, move this message to DEBUG log level.

    Change-Id: Idb88b8ea362515279dc9b246f06b6b646c6d8013
    BUG: 1433838
    Signed-off-by: Ravishankar N <ravishankar@redhat.com>
    Reviewed-on: https://review.gluster.org/16879
    Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
    Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/afr: Do not log of split-brain when there isn't one | Krutika Dhananjay | 2017-01-11 | 1 | -18/+8

    - Even on errors like ENOENT, AFR logs split-brain after read-txn refresh, introduced by commit a07ddd8f. This can be a cause of much panic and confusion and needs to be fixed.
    - Also fixed this issue in write-txns.
    - Fixed afr read txns to log about split-brain only after knowing that there is no split-brain choice configured.
    - Removed code duplication.
    - Fixed incorrect passing of error code in afr_write_txn_refresh_done() (the function was passing -0 as errno to gf_msg()).

    Change-Id: I354f454ce5bf0e5f00bc27916eb597367cb7d927
    BUG: 1411625
    Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
    Reviewed-on: http://review.gluster.org/16362
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Ravishankar N <ravishankar@redhat.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr: Ignore event_generation checks post inode refresh for write txns | Ravishankar N | 2016-12-22 | 1 | -0/+1

    Before http://review.gluster.org/#/c/15673/, after inode refresh, we failed read txns in case of EIO or event_generation being zero. For write transactions, the check was only for EIO. 15673 re-factored the code to fail both read and write when event_generation=0. This seems to have caused a regression as explained in the BZ. This patch restores that behaviour in afr_txn_refresh_done().

    Change-Id: Ib8e116506badce6f58b55827dbe403d95069d744
    BUG: 1406224
    Signed-off-by: Ravishankar N <ravishankar@redhat.com>
    Reviewed-on: http://review.gluster.org/16205
    Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
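The restored behaviour amounts to a check that differs by transaction kind. The following is a minimal sketch of that rule only; the enum and the function `txn_fails_after_refresh` are hypothetical names and do not mirror the real afr_txn_refresh_done() signature.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical transaction kinds for this sketch. */
enum txn_kind_sketch { TXN_READ_SKETCH, TXN_WRITE_SKETCH };

/*
 * Decide whether a transaction should fail after an inode refresh.
 * Both kinds fail when the refresh reported EIO; only read transactions
 * additionally fail when event_generation is still zero.
 */
static bool txn_fails_after_refresh(bool refresh_saw_eio, int event_generation,
                                    enum txn_kind_sketch kind)
{
    assert(event_generation >= 0);

    if (refresh_saw_eio)
        return true;
    if (kind == TXN_READ_SKETCH && event_generation == 0)
        return true;
    return false;
}
```

A write transaction with event_generation equal to zero and no EIO is allowed to proceed here, which is the case the regression had broken.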
* afr: Fix the EIO that can occur in afr_inode_refresh as a result | Poornima G | 2016-11-28 | 1 | -1/+2

    of cache invalidation (upcall).

    Issue:
    ------
    When a cache invalidation is received as a result of changing the pending xattr, the read_subvol is reset. Consider the below chain of execution:

    CHILD_DOWN
    ...
    afr_readv
    ...
    afr_inode_refresh
    ...
    afr_inode_read_subvol_reset <- as a result of pending xattr set by some other client GF_EVENT_UPCALL will be sent
    afr_refresh_done -> this results in an EIO, as the read subvol was reset by the end of the afr_inode_refresh

    Solution:
    ---------
    When GF_EVENT_UPCALL is received, instead of resetting read_subvol, set a variable need_refresh in inode_ctx. The next time someone starts a txn, need_refresh also needs to be checked along with the event gen.

    Change-Id: Ifda21a7a8039b8874215e1afa4bdf20f7d991b58
    BUG: 1396952
    Signed-off-by: Poornima G <pgurusid@redhat.com>
    Reviewed-on: http://review.gluster.org/15892
    Reviewed-by: Ravishankar N <ravishankar@redhat.com>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
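The solution can be pictured with a small state sketch. The struct and function names below are hypothetical and the real inode ctx layout is not shown in this log; the point is only that an upcall marks the inode instead of clearing its read subvolume, so an in-flight refresh does not end with nothing readable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-inode state; field names are illustrative only. */
struct inode_ctx_sketch {
    int  read_subvol;   /* index of the subvolume reads are served from */
    int  event_gen;     /* event generation seen at the last refresh */
    bool need_refresh;  /* set by an upcall instead of resetting read_subvol */
};

/* On GF_EVENT_UPCALL: remember that a refresh is due, keep read_subvol. */
static void on_upcall(struct inode_ctx_sketch *ctx)
{
    assert(ctx != NULL);
    ctx->need_refresh = true;
}

/* At transaction start: refresh on a generation mismatch or a pending flag. */
static bool txn_needs_refresh(const struct inode_ctx_sketch *ctx,
                              int current_event_gen)
{
    assert(ctx != NULL);
    return ctx->need_refresh || ctx->event_gen != current_event_gen;
}

/* After a successful refresh: record the new state and clear the flag. */
static void refresh_done(struct inode_ctx_sketch *ctx, int current_event_gen,
                         int read_subvol)
{
    assert(ctx != NULL);
    ctx->event_gen = current_event_gen;
    ctx->read_subvol = read_subvol;
    ctx->need_refresh = false;
}
```

Because `on_upcall` leaves `read_subvol` untouched, a refresh that is already running can still finish with a valid read subvolume, and the next transaction picks up the pending flag.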
* afr: allow I/O when favorite-child-policy is enabled | Ravishankar N | 2016-11-27 | 1 | -10/+3

    Problem:
    Currently, I/O on a split-brained file fails even when the favorite-child-policy is set until the self-heal is complete.

    Fix:
    If a valid 'source' is found using the set favorite-child-policy, inspect and reset the afr pending xattrs on the 'sinks' (inside appropriate locks), refresh the inode and then proceed with the read or write transaction.

    The resetting itself happens in the self-heal code and hence can also happen in the client side background-heal or by the shd's index-heal in addition to the txn code path explained above. When it happens via heal, we also add checks in undo-pending to not reset the sink xattrs again.

    Change-Id: Ic8c1317720cb26bd114b6fe6af4e58c73b864626
    BUG: 1386188
    Signed-off-by: Ravishankar N <ravishankar@redhat.com>
    Reported-by: Simon Turcotte-Langevin <simon.turcotte-langevin@ubisoft.com>
    Reviewed-on: http://review.gluster.org/15673
    Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/afr: Give option to do consistent-io | Pranith Kumar K | 2016-08-22 | 1 | -0/+6

    Problem:
    When tiering/rebalance does migrations and afr with 2-way replica is in the picture, migration can read stale data if the source brick goes down and writes to the destination. After this, deletion of the file leads to permanent loss of data after migration.

    Fix:
    Rebalance/tiering should migrate only when the data is definitely not stale. So introduce an option in afr called consistent-io which will be enabled in migration daemons.

    BUG: 1306398
    Change-Id: I750f65091cc70a3ed4bf3c12f83d0949af43920a
    Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
    Reviewed-on: http://review.gluster.org/13425
    Reviewed-by: Anuradha Talur <atalur@redhat.com>
    Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* afr: Don't wind reads for files in metadata split-brain | Ravishankar N | 2016-06-24 | 1 | -10/+4

    Problem:
    For a read on a file in metadata split-brain:
    1. lookup_done resets event_generation to zero.
    2. readv is issued, goes to inode refresh due to mismatching event_gen.
    3. After refresh is successful, we update event_generation, data and metadata readable.
    4. We then call afr_read_txn_refresh_done() which in turn calls afr_inode_get_readable() but doesn't check for EIO. So afr_readv_wind is called with local->readable (which is populated with data_readable), thus winding the read to a brick.
    5. Also, further parallel reads that come directly go to the wind path because there is no inode_refresh needed.

    Fix:
    1. For any afr_read_txn(), readable must be an intersection of data and metadata readable.
    2. Check for EIO in afr_read_txn_refresh_done().

    Change-Id: I22dd221fdfaf96d7aced2f474e28ed1337d69f0e
    BUG: 1305031
    Signed-off-by: Ravishankar N <ravishankar@redhat.com>
    Reviewed-on: http://review.gluster.org/13389
    Reviewed-by: Ashish Pandey <aspandey@redhat.com>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
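The two parts of the fix, intersecting the readable sets and treating an empty result as EIO, can be sketched as below. Representing the sets as 32-bit masks and the name `read_txn_pick_subvol` are assumptions made for illustration; the sketch also simply picks the lowest-numbered readable brick, which is not necessarily how AFR chooses.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical read-subvolume choice. Bit i of each mask says whether
 * brick i is readable for data or for metadata. A brick qualifies only if
 * it is set in both masks; with no such brick the read fails with -EIO
 * instead of being wound to a brick that is good for data alone.
 */
static int read_txn_pick_subvol(uint32_t data_readable,
                                uint32_t metadata_readable,
                                unsigned child_count)
{
    assert(child_count >= 1 && child_count <= 32);

    /* Ignore bits beyond the configured number of bricks. */
    uint32_t mask = (child_count == 32) ? UINT32_MAX
                                        : ((UINT32_C(1) << child_count) - 1);
    uint32_t readable = data_readable & metadata_readable & mask;

    if (readable == 0)
        return -EIO;

    /* Return the index of the lowest set bit. */
    int idx = 0;
    while ((readable & 1u) == 0) {
        readable >>= 1;
        idx++;
    }
    return idx;
}
```

In a two-brick metadata split-brain where brick 0 is good only for data and brick 1 only for metadata, the intersection is empty and the read is refused.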
* cluster/afr: Refresh inode for inode-write fops in need | Pranith Kumar K | 2016-05-19 | 1 | -2/+2

    Problem:
    If a named fresh-lookup is done on a loc and the fop fails on one of the bricks or is not sent on one of the bricks, but the brick is up by the time the response comes to afr, 'can_interpret' will be set to false in afr_lookup_done(). This leaves the inode-ctx for that inode unset, which can lead to EIO in a transaction, as it depends on the 'readable' array being available by that point.

    Fix:
    Refresh the inode for inode-write fops so that the ctx is set if that was not already done at the time of the named fresh-lookup, or if the file is in split-brain, where we need to perform one more refresh before failing the fop to check whether the file is still in split-brain.

    BUG: 1336612
    Change-Id: I5c50b62c8de06129b8516039f7c252e5008c47a5
    Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
    Reviewed-on: http://review.gluster.org/14368
    Smoke: Gluster Build System <jenkins@build.gluster.com>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    Reviewed-by: Ravishankar N <ravishankar@redhat.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr : Examine data/metadata readable for read-subvol | Anuradha Talur | 2015-08-25 | 1 | -3/+13

    During lookup and discover, currently read_subvol is based only on data_readable. read_subvol should be decided based on both data_readable and metadata_readable.

    Credits to Ravishankar N for the logic of afr_first_up_child from http://review.gluster.org/10905/ .

    Change-Id: I98580b23c278172ee2902be08eeaafb6722e830c
    BUG: 1240244
    Signed-off-by: Anuradha Talur <atalur@redhat.com>
    Reviewed-on: http://review.gluster.org/11551
    Reviewed-by: Ravishankar N <ravishankar@redhat.com>
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: Fix incorrect logging in read transactions | Krutika Dhananjay | 2015-07-26 | 1 | -2/+6

    afr_read_txn_refresh_done() at its entry point can fail for reasons like ENOENT/ESTALE but seldom due to EIO, which is something _AFR_ would internally generate and not receive in response from a child translator.

    AFR is reporting "split-brain" for _any_ kind of failure in read txn, of the following kind:

    [2015-07-07 18:04:34.787612] E [MSGID: 108008] [afr-read-txn.c:76:afr_read_txn_refresh_done] 0-vol3-replicate-3: Failing STAT on gfid 18a973c4-73d3-48b8-942c-33a6f1a8e6b4: split-brain observed. [Input/output error]

    This patch fixes such misleading errors.

    To-Do: Avoid logging EIO if/when split-brain choice is set. Will do that as part of a separate commit.

    Change-Id: Ib513c75168f7026118ad5b3f0b35e9dd498cfe1e
    BUG: 1246052
    Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
    Reviewed-on: http://review.gluster.org/11756
    Tested-by: NetBSD Build System <jenkins@build.gluster.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
    Reviewed-by: Anuradha Talur <atalur@redhat.com>
    Reviewed-by: Ravishankar N <ravishankar@redhat.com>
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
* afr: Porting messages to new logging framework | arao | 2015-06-27 | 1 | -5/+7

    updated

    Change-Id: I94ac7b2cb0d43a82cf0eeee21407cff9b575c458
    BUG: 1194640
    Signed-off-by: arao <arao@redhat.com>
    Signed-off-by: Mohamed Ashiq <mliyazud@redhat.com>
    Reviewed-on: http://review.gluster.org/9897
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Tested-by: NetBSD Build System <jenkins@build.gluster.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr: Block fops when file is in split-brain | Ravishankar N | 2015-06-26 | 1 | -14/+8

    For directories, block metadata FOPS.
    For non-directories, block data and metadata FOPS.
    Do not block entry FOPS.

    Change-Id: Id7f656f4a513b9d33c457dd7f2d58028dbef8e61
    BUG: 1235007
    Signed-off-by: Ravishankar N <ravishankar@redhat.com>
    Reviewed-on: http://review.gluster.org/11371
    Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
    Tested-by: NetBSD Build System <jenkins@build.gluster.org>
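The three rules read naturally as a small decision table. The sketch below assumes the inode is already known to be in split-brain and uses hypothetical names (`txn_type_sketch`, `fop_blocked_in_split_brain`); how AFR actually classifies a fop is not shown in this log.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of a fop by the kind of state it touches. */
enum txn_type_sketch { TXN_DATA, TXN_METADATA, TXN_ENTRY };

/*
 * For an inode already known to be in split-brain, decide whether a fop of
 * the given type is blocked:
 *   entry fops        -> never blocked
 *   metadata fops     -> blocked for directories and non-directories
 *   data fops         -> blocked for non-directories only
 */
static bool fop_blocked_in_split_brain(bool is_dir, enum txn_type_sketch type)
{
    switch (type) {
    case TXN_ENTRY:
        return false;
    case TXN_METADATA:
        return true;
    case TXN_DATA:
        return !is_dir;
    }
    assert(!"unknown transaction type");
    return false;
}
```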
* cluster/afr: Pick gfid from poststat during fresh lookup for read child calculation | Krutika Dhananjay | 2015-06-24 | 1 | -2/+2

    Change-Id: I12c1e4f67f4ec4affbe13d7daf871044a8a2a12e
    BUG: 1235216
    Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
    Reviewed-on: http://review.gluster.org/11373
    Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
    Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* afr: allow readdir to proceed for directories in split-brain | Ravishankar N | 2015-05-28 | 1 | -18/+22

    Problem:
    afr_read_txn() bails out if read_subvol==-1. This meant that for directories that were in entry split-brain, FOPS like readdir, access, stat etc were not allowed.

    Fix:
    Except for getxattr, all other FOPS are wound on the first up child of afr.

    Change-Id: Iacec8fbb1e75c4d2094baa304f62331c81a6f670
    BUG: 1221481
    Signed-off-by: Ravishankar N <ravishankar@redhat.com>
    Reviewed-on: http://review.gluster.org/10776
    Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
    Reviewed-by: Anuradha Talur <atalur@redhat.com>
    Tested-by: NetBSD Build System
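The fallback described in the fix can be sketched as follows. `first_up_child` and `dir_read_subvol` are hypothetical helpers written for this illustration (the log mentions an afr_first_up_child, whose real signature is not shown here), and returning -1 for "no subvolume" is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: index of the first brick that is up, or -1. */
static int first_up_child(const bool *child_up, size_t child_count)
{
    assert(child_up != NULL);
    for (size_t i = 0; i < child_count; i++)
        if (child_up[i])
            return (int)i;
    return -1;
}

/*
 * Choose where to wind a read fop on a directory. A valid read_subvol is
 * used as is. When read_subvol is -1 (for example entry split-brain),
 * getxattr is still refused, while every other fop falls back to the
 * first up child. Returns -1 when no subvolume can be used.
 */
static int dir_read_subvol(int read_subvol, bool is_getxattr,
                           const bool *child_up, size_t child_count)
{
    if (read_subvol >= 0)
        return read_subvol;
    if (is_getxattr)
        return -1;
    return first_up_child(child_up, child_count);
}
```

With brick 0 down and bricks 1 and 2 up, a readdir on a split-brained directory is wound to brick 1 in this sketch, while a getxattr still gets no subvolume.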
* cluster/afr : enable inspection & resolution of files in split-brain | Anuradha | 2015-03-19 | 1 | -0/+7

    Part 2/2 patch to enable users to analyze and resolve split-brain.

    This patch enables users to:
    1) Inspect the files in data and metadata split-brain.
    2) Resolve the split-brain.
    Both are done using a series of setfattr commands.

    Consider a volume "test" with 2 bricks.

    1) To inspect a file f1:
       setfattr -n replica.split-brain-choice -v test-client-0 f1
       After the execution of this command, if no read_subvol is found, reads will be served from test-client-0 (corresponding to brick-0).

    2) To resolve split-brain:
       setfattr -n replica.split-brain-heal-finalize -v test-client-0 f1
       Execution of this command will lead to the resolution of data and metadata split-brain with the subvol mentioned in the command (test-client-0 here) as the source and the rest as sink.

    Change-Id: Ia20f3ee5abd3119e3d54fcc599f1e55ac65fd179
    BUG: 1191396
    Signed-off-by: Anuradha <atalur@redhat.com>
    Reviewed-on: http://review.gluster.org/9743
    Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: Implementation of quorum-reads | Pranith Kumar K | 2015-03-05 | 1 | -0/+10

    Provide a way of disabling reads when quorum is not met.

    Change-Id: Ic4f57c2b87a0b8514600759de3a7a47e217fe3b5
    BUG: 1187885
    Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
    Reviewed-on: http://review.gluster.org/9543
    Reviewed-by: Ravishankar N <ravishankar@redhat.com>
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: Fix spurious metadata self-heals | Pranith Kumar K | 2014-09-24 | 1 | -0/+3

    - Added logging for metadata and data self-heals which helped in debugging this issue.
    - Added checks to skip self-heals when no sinks are available to heal

    Change-Id: I0d50dceb84cd9ad4fe00e0b749ddf7d4ff42348a
    BUG: 1128721
    Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
    Reviewed-on: http://review.gluster.org/8709
    Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: move messages to new logging framework | Ravishankar N | 2014-05-17 | 1 | -3/+4

    Change important (from a diagnostics point of view) log messages to use the gf_msg() framework.

    Change-Id: I0a58184bbb78989db149e67f07c140a21c781bc2
    BUG: 1075611
    Signed-off-by: Ravishankar N <ravishankar@redhat.com>
    Reviewed-on: http://review.gluster.org/7784
    Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
    Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: refactor | Anand Avati | 2014-03-22 | 1 | -0/+239

    - Remove client side self-healing completely (opendir, openfd, lookup)
    - Re-work readdir-failover to work reliably in case of NFS
    - Remove unused/dead lock recovery code
    - Consistently use xdata in both calls and callbacks in all FOPs
    - Per-inode event generation, used to force inode ctx refresh
    - Implement dirty flag support (in place of pending counts)
    - Eliminate inode ctx structure, use read subvol bits + event_generation
    - Implement inode ctx refreshing based on event generation
    - Provide backward compatibility in transactions
    - remove unused variables and functions
    - make code more consistent in style and pattern
    - regularize and clean up inode-write transaction code
    - regularize and clean up dir-write transaction code
    - regularize and clean up common FOPs
    - reorganize transaction framework code
    - skip setting xattrs in pending dict if nothing is pending
    - re-write self-healing code using syncops
    - re-write simpler self-heal-daemon

    Change-Id: I1e4080c9796c8a2815c2dab4be3073f389d614a8
    BUG: 1021686
    Signed-off-by: Anand Avati <avati@redhat.com>
    Reviewed-on: http://review.gluster.org/6010
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Vijay Bellur <vbellur@redhat.com>