summaryrefslogtreecommitdiffstats
path: root/xlators/cluster/afr/src/afr-self-heal-common.c
Commit message (Collapse)AuthorAgeFilesLines
...
* cluster/afr: Fix witness counting code in src/sink detectionPranith Kumar K2016-04-111-7/+19
| | | | | | | | | | | | | | | | | | | | | | | Problem: In afr-v1 pre-op, xattrop increments self xattr first then it increments the value on rest. In post-op, xattr value is decreased first on rest and at last it gets decremented on self. So for a possible operation to be witnessed i.e. a fop is seen by the brick it is important to have at least 1 pending op because without completing pre-op fop won't come. The other possibility is when fop completes but at the time of post-op after decrementing pending counts on others just before decrementing its own pending count, the brick dies. Fix: Fix witness detection code in afr_self_heal_find_direction() BUG: 1322253 Change-Id: Ia7e76482c0a46e775e269bb96ec1b9490a3ac18f Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/13811 Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com>
* cluster/afr: Don't lookup/forget inodesPranith Kumar K2016-03-311-18/+2
| | | | | | | | | | | | | | | | | | | | | | | | Problem: All inodes that are looked-up are always forgotten without fail in afr removing the benefits of them being in lru. This same code can cause crashes if between inode_lookup, inode_forget in afr if the top xlator does inode_forget(0). Fix: Don't use lookup/forget in afr. No benefits are there at the moment for keeping this code. It is impossible to prevent top xlators to do inode_forget(0). Found similar instances in ec and removed them even though those code paths are not going to be executed in any place other than heal-daemon. BUG: 1321554 Change-Id: Ia4cb236178f7f129cc898d53f0bbd26f494a2a8d Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/13834 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anuradha Talur <atalur@redhat.com>
* afr: add mtime based split-brain resolution to CLIRavishankar N2016-03-291-10/+71
| | | | | | | | | | | | | | | | | | | Extended the CLI to include support for split-brain resolution based on mtime. The command syntax is: $:gluster volume heal <VOLNAME> split-brain latest-mtime <FILE> where <FILE> can be either the full file name as seen from the root of the volume (or) the gfid-string representation of the file. Change-Id: I7a16f72ff1a4495aa69f43f22758a9404e958b4f BUG: 1321322 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/13828 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* cluster/afr: Choose local child as source if possiblePranith Kumar K2016-03-111-0/+29
| | | | | | | | | | | | | | | | | It is better to choose local brick as source if possible to prevent over the wire read thus saving on bandwidth. Also changed code to not attempt data-heal if 'source' is selected as arbiter. Change-Id: I9a328d0198422280b13a30ab99545370a301dfea BUG: 1314150 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/13585 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: Krutika Dhananjay <kdhananj@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* afr: Add throttled background client-side healsRavishankar N2016-03-011-0/+110
| | | | | | | | | | | | | | | | | | | | | | | | | | | If a heal is needed after inode refresh (lookup, read_txn), launch it in the background instead of blocking the fop (that triggered refresh) until the heal happens. afr_replies_interpret() is modified such that the heal is launched only if atleast one sink brick is up. Max. no of heals that can happen in parallel is configurable via the 'background-self-heal-count' volume option. Any number greater than that is put in a wait queue whose length is configurable via 'heal-wait-queue-leng' volume option. If the wait queue is also full, further heals will be ignored. Default values: background-self-heal-count=8, heal-wait-queue-leng=128 Change-Id: I1d4a52814cdfd43d90591b6d2ad7b6219937ce70 BUG: 1297172 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/13207 Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* heal : Changed heal info to process all indices directoriesAnuradha Talur2015-12-021-4/+15
| | | | | | | | | | Change-Id: Ida863844e14309b6526c1b8434273fbf05c410d2 BUG: 1250803 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/12658 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr : Readdirp performance enhancementAnuradha Talur2015-11-301-1/+8
| | | | | | | | | | | | | | | | | | | | Things done : 1) during lookup and inode_refresh as part of read_txn, request is sent to detect if heal is required or not. 2) If heal is required, be conservative in setting the readdirp entry inodes to NULL, otherwise don't be. 3) Self-heal-daemon now crawls both indices/xattrop and indices/dirty directory while healing. Change-Id: Ic4a4da63fb7e0726eab5f341a200859b29cf7eb7 BUG: 1250803 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/12507 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: Mark self-heal fops as internalPranith Kumar K2015-11-181-1/+1
| | | | | | | | | | Change-Id: I8ae7af266d3e00460f0cfdc9389a926e5f2fee36 BUG: 1282761 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/12598 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* cluster/afr : Remove stale indicesAshish Pandey2015-11-161-1/+22
| | | | | | | | | Change-Id: Iba23338a452b49dc9fe6ae7b4ca108ebc377fe42 BUG: 1270668 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/12336 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* dict: dict_set_bin() should never free the pointer on errorNiels de Vos2015-07-241-2/+6
| | | | | | | | | | | | | | | | | | | | | | dict_set_bin() is handling the pointer that it passed inconsistently. Depending on the errors that can occur, the pointer passed to the dict can be free'd, but there is no guarantee. It is cleaner to have the caller free the pointer that allocated it and dict_set_bin() returned an error. When dict_set_bin() returned success, the given pointer will be free'd when dict_unref() calls data_destroy(). Many callers of dict_set_bin() already take care of free'ing the pointer on error. The ones that did not, are corrected with this change too. Change-Id: I39a4f7ebc0cae6d403baba99307d7ce408f25966 BUG: 1242280 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/11638 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* afr: Porting messages to new logging frameworkarao2015-06-271-28/+33
| | | | | | | | | | | | | updated Change-Id: I94ac7b2cb0d43a82cf0eeee21407cff9b575c458 BUG: 1194640 Signed-off-by: arao <arao@redhat.com> Signed-off-by: Mohamed Ashiq <mliyazud@redhat.com> Reviewed-on: http://review.gluster.org/9897 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr : set pending xattrs for replaced brickAnuradha2015-06-251-8/+8
| | | | | | | | | | | | | | | | | | | | | | This patch is part two change to prevent data loss in a replicate volume on doing a replace-brick commit force operation. Problem: After doing replace-brick commit force, there is a chance that self heal might happen from the replaced (sink) brick rather than the source brick leading to data loss. Solution: Mark pending changelogs on afr children for the replaced afr-child so that heal is performed in the correct direction. Change-Id: Icb9807e49b4c1c4f1dcab115318d9a58ccf95675 BUG: 1207829 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/10448 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
* afr: honour selfheal enable/disable volume set optionsRavishankar N2015-06-031-3/+8
| | | | | | | | | | | | | | | | | | | | afr-v1 had the following volume set options that are used to enable/ disable self-heals from happening in AFR xlator when loaded in the client graph: cluster.metadata-self-heal cluster.data-self-heal cluster.entry-self-heal In afr-v2, these 3 heals can happen from the client if there is an inode refresh. This patch allows such heals to proceed only if the corresponding volume set options are set to true. Change-Id: I8d97d6020611152e73a269f3fdb607652c66cc86 BUG: 1226507 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/11012 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* build: do not #include "config.h" in each fileNiels de Vos2015-05-291-5/+0
| | | | | | | | | | | | | | | | | | Instead of including config.h in each file, and have the additional config.h included from the compiler commandline (-include option). When a .c file tests for a certain #define, and config.h was not included, incorrect assumtions were made. With this change, it can not happen again. BUG: 1222319 Change-Id: I4f9097b8740b81ecfe8b218d52ca50361f74cb64 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/10808 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* Avoid conflict between contrib/uuid and system uuidEmmanuel Dreyfus2015-04-041-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | glusterfs relies on Linux uuid implementation, which API is incompatible with most other systems's uuid. As a result, libglusterfs has to embed contrib/uuid, which is the Linux implementation, on non Linux systems. This implementation is incompatible with systtem's built in, but the symbols have the same names. Usually this is not a problem because when we link with -lglusterfs, libc's symbols are trumped. However there is a problem when a program not linked with -lglusterfs will dlopen() glusterfs component. In such a case, libc's uuid implementation is already loaded in the calling program, and it will be used instead of libglusterfs's implementation, causing crashes. A possible workaround is to use pre-load libglusterfs in the calling program (using LD_PRELOAD on NetBSD for instance), but such a mechanism is not portable, nor is it flexible. A much better approach is to rename libglusterfs's uuid_* functions to gf_uuid_* to avoid any possible conflict. This is what this change attempts. BUG: 1206587 Change-Id: I9ccd3e13afed1c7fc18508e92c7beb0f5d49f31a Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/10017 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* cluster/afr : enable inspection & resolution of files in split-brainAnuradha2015-03-191-16/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Part 2/2 patch to enable users analyze and resolve split-brain. This patch enables : 1) Users to inspect the files in data and metadata split-brain. 2) Resolve the split-brain. Both using a series of setfattr commands. Consider a volume "test" with 2 bricks. 1) To inspect a file f1: setfattr -n replica.split-brain-choice -v test-client-0 f1 After the execution of this command, if no read_subvol is found, reads will be served from test-client-0 (corresponding to brick-0). 2) To resolve split-brain : setfattr -n replica.split-brain-heal-finalize -v test-client-0 f1 Execution of this command will lead to the resolution of data and metadata split-brain with subvol mentioned in the command (test-client-0 here) as the source and the rest as sink. Change-Id: Ia20f3ee5abd3119e3d54fcc599f1e55ac65fd179 BUG: 1191396 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/9743 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* afr: remove stale index entriesRavishankar N2015-03-171-3/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: During pre-op phase, the index xlator 1. Creates the entry inside .glusterfs/indices/xattrop 2. Winds the xattrop fop to posix to mark dirty/pending changelogs. If the brick crashes after 1, the xattrop entry becomes stale and never gets removed by shd during subsequent crawls because there is nothing to heal (changelogs are zero). Though the stale entry does not get displayed in the output of 'heal info' command, it nevertheless stays there forever unless a new write transaction is performed on the file. Fix: During index self-heal if afr xattrs are found to be clean (indicated by ret value of 2 on a call to afr_shd_selfheal(), send a dummy post-op with all 0s for the xattr values, which makes the index xlator to unlink the stale entry. Change-Id: I02cb2bc937f2e3f3f3cb35d67b006664dc7ef919 BUG: 1190069 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/9714 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anuradha Talur <atalur@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: Add self-heal-daemon command handlersPranith Kumar K2015-03-091-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces the changes required in ec xlator to handle index/full heal. Index healer threads: Ec xlator start an index healer thread per local brick. This thread keeps waking up every minute to check if there are any files to be healed based on the indices kept in index directory. Whenever child_up event comes, then also this index healer thread wakes up and crawls the indices and triggers heal. When self-heal-daemon is disabled on this particular volume then the healer thread keeps waiting until it is enabled again to perform heals. Full healer threads: Ec xlator starts a full healer thread for the local subvolume provided by glusterd to perform full crawl on the directory hierarchy to perform heals. Once the crawl completes the thread exits if no more full heals are issued. Changed xl-op prefix GF_AFR_OP to GF_SHD_OP to make it more generic. Change-Id: Idf9b2735d779a6253717be064173dfde6f8f824b BUG: 1177601 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9787 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Do not increment healed_count if no healing was performedKrutika Dhananjay2015-03-041-18/+41
| | | | | | | | | | | | | | | | | | | | | | | | PROBLEM: When file modifications are happening while index heal is launched, index healer could pick up entries which appeared in indices/xattrop transiently during the course of the operations on the mount point, and do not really need any heal. This will cause index healer to keep doing index-heal in a loop as long as it finds this entry, by believing that it did successfully heal some gfids even when it didn't. FIX: afr_selfheal() now returns a 1 to indicate that it did not (need to) heal a given gfid. afr_shd_selfheal() will not increment healed_count whenever afr_selfheal() returns a 1. Change-Id: I0d97e11392a032a852e8c6508f691300ef0e5b98 BUG: 1194305 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/9713 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: split-brain resolution CLIRavishankar N2015-01-151-1/+190
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extend the AFR heal command to include automated split-brain resolution. This patch [3/3] is the final patch for afr automated split-brain resolution implementation. "gluster volume heal <VOLNAME> [full | statistics [heal-count [replica <HOSTNAME:BRICKNAME>]] |info [healed | heal-failed | split-brain]| split-brain {bigger-file <FILE> |source-brick <HOSTNAME:BRICKNAME> [<FILE>]}]" The new additions being: 1.gluster volume heal <VOLNAME> split-brain bigger-file <FILE> Locates the replica containing the FILE, selects bigger-file as source and completes heal. 2.gluster volume heal <VOLNAME> split-brain source-brick <HOSTNAME:BRICKNAME> <FILE> Selects <FILE> present in <HOSTNAME:BRICKNAME> as source and completes heal. 3.gluster volume heal <VOLNAME> split-brain <HOSTNAME:BRICKNAME> Selects all split-brained files in <HOSTNAME:BRICKNAME> as source and completes heal. Note: <FILE> can be either the full file name as seen from the root of the volume (or) the gfid-string representation of the file, which sometimes gets displayed in the heal info command's output. Entry/gfid split-brain resolution is not supported. Example can be found in the test case. Change-Id: I4649733922d406f14f28ee9033a5cb627b9538b3 BUG: 1136769 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/9377 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* heal: glfs-heal implementationPranith Kumar K2014-10-151-8/+13
| | | | | | | | | | | Thanks a lot to Niels for helping me to get build stuff right. Change-Id: I634f24d90cd856ceab3cc0c6e9a91003f443403e BUG: 1147462 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/6529 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: Add afr-v1 xattr compatibilityPranith Kumar K2014-10-011-41/+98
| | | | | | | | | | | | | All the special cases v1 handles and also self-accusing pending changelog from v1 pre-op also is handled in this patch. Change-Id: Ie10f71633fb20276f01ecafbd728f20483e7029c BUG: 1128721 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/8536 Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr : Fix incorrect looping of index healerAnuradha2014-09-291-4/+9
| | | | | | | | | | | | | | | | Sending appropriate return value from afr_selfheal() fixes the issue. Credits : Krutika Dhananjay and Pranith Kumar. Change-Id: I01dbd49476f6bfbd02028fdde1f60cc0324a1e39 BUG: 1146812 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/8868 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: Fix inode leakKrutika Dhananjay2014-09-281-0/+2
| | | | | | | | | | Change-Id: I723b65d6fcae5bd39c0daece3b2f48baf0a72166 BUG: 1145471 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/8875 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: Fix spurious metadata self-healsPranith Kumar K2014-09-241-0/+32
| | | | | | | | | | | | | - Added logging for metadata and data self-heals which helped in debugging this issue. - Added checks to skip self-heals when no sinks are available to heal Change-Id: I0d50dceb84cd9ad4fe00e0b749ddf7d4ff42348a BUG: 1128721 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/8709 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: Don't start heal when lookup succeeds on < 2 childrenPranith Kumar K2014-09-231-0/+17
| | | | | | | | | | | | | | | | | | | Problem: When self-heal code doesn't see at least 2 successes on looking up children, then self-heal can't be done. What is happening now is if all the lookups fail then the pending changelog is all zeros in xattrs so all the children are becoming sources and leading to crashes when the code paths further assume that some data structures are populated properly Fix: Don't proceed with self-heals when < 2 children succeed lookups. BUG: 1128721 Change-Id: Iffdf0feebb6f98812d9d01cdd0cf97f3e19ba76f Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/8698 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: Set all the xattrs needed by index xlatorAnuradha2014-09-161-15/+10
| | | | | | | | | | | | | | | | | | | | | | Index xlator removes the index file from indices xattrop directory in case the value for keys sent are zero. If all the required keys are not set by afr then index file might be removed in an invalid way. With this change all the keys required by index xlator are set by afr such that invalid removal of files does not occur. Change-Id: Idbed0764a95157fd5cab8d6685057a43788fc7df BUG: 1139230 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/8652 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr : Mark pending changelog xattrs for new creationsAnuradha2014-09-031-0/+37
| | | | | | | | | | | | | Based on type of file, set appropriate pending changelogs for new entries. Change-Id: Ifd124bf9bc54b996ce83ab9f39d03b3ccca7eb3c BUG: 1130892 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/8555 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: Propagate EIO on inode's type mismatchKrutika Dhananjay2014-09-021-4/+5
| | | | | | | | | | | | | Original author of the test script: Pranith Kumar K <pkarampu@redhat.com> Change-Id: If515ecefd3c17f85f175b6a8cb4b78ce8c916de2 BUG: 1132469 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/8574 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: Fix dict_t leaksKrutika Dhananjay2014-08-281-0/+3
| | | | | | | | | | | | | dict_t objects that are ref'd in alloca'd "replies" in afr_replies_copy() are not unref'd after "replies" go out of scope. Change-Id: Id5a6ca3c17a8de72b94b3e0f92165609da5a36ea BUG: 1134221 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/8553 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* features/changelog: Capture "correct" internal FOPsVenky Shankar2014-07-231-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes changelog capturing internal FOPs in a cascaded setup, where the intermediate master would record internal FOPs (generated by DHT on link()/rename()). This is due to I/O happening on the intermediate slave on geo-replication's auxillary mount with client-pid -1. Currently, the internal FOP capturing logic depends on client pid being non-negative and the presence of a special key in dictionary. Due to this, internal FOPs on an inter-mediate master would be recorded in the changelog. Checking client-pid being non-negative was introduced to capture AFR self-heal traffic in changelog, thereby breaking cascading setups. By coincidence, AFR self-heal daemon uses -1 as frame->root->pid thereby making is hard to differentiate b/w geo-rep's auxillary mount and self-heal daemon. Change-Id: Ib7bd71e80dd1856770391edb621ba9819cab7056 BUG: 1122037 Original-Author: Venky Shankar <vshankar@redhat.com> Signed-off-by: Kotresh H R <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/8347 Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: move messages to new logging frameworkRavishankar N2014-05-171-1/+2
| | | | | | | | | | | | | | Change important (from a diagnostics point of view) log messages to use the gf_msg() framework. Change-Id: I0a58184bbb78989db149e67f07c140a21c781bc2 BUG: 1075611 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/7784 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: Remove stale index in self-heal codepathPranith Kumar K2014-05-081-1/+1
| | | | | | | | | Change-Id: I635fc0fa955b33590f1c5b4dfec22d591ea8575c BUG: 1032894 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/6592 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Fix inode_forget assert failurePranith Kumar K2014-04-281-25/+31
| | | | | | | | | | | | | | | | | | | | | | Problem: If two self-heals are triggered on same inode in parallel then one inode will be linked and the other inode will not be linked as an inode with that gfid is already linked in inode table. Calling inode-forget on that inode leads to assert failure. Fix: Always use linked inode for performing self-heal. Added inode-forgets in other places as well even though its not really a memory leak. Change-Id: Ib84bf080c8cb6a4243f66541ece587db28f9a052 BUG: 1091597 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/7567 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: refactorAnand Avati2014-03-221-2545/+742
| | | | | | | | | | | | | | | | | | | | | | | | | | | | - Remove client side self-healing completely (opendir, openfd, lookup) - Re-work readdir-failover to work reliably in case of NFS - Remove unused/dead lock recovery code - Consistently use xdata in both calls and callbacks in all FOPs - Per-inode event generation, used to force inode ctx refresh - Implement dirty flag support (in place of pending counts) - Eliminate inode ctx structure, use read subvol bits + event_generation - Implement inode ctx refreshing based on event generation - Provide backward compatibility in transactions - remove unused variables and functions - make code more consistent in style and pattern - regularize and clean up inode-write transaction code - regularize and clean up dir-write transaction code - regularize and clean up common FOPs - reorganize transaction framework code - skip setting xattrs in pending dict if nothing is pending - re-write self-healing code using syncops - re-write simpler self-heal-daemon Change-Id: I1e4080c9796c8a2815c2dab4be3073f389d614a8 BUG: 1021686 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6010 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Set size based source only when sizes are unequalPranith Kumar K2013-09-031-0/+13
| | | | | | | | | Change-Id: I18583f14edf1011401be15744371e2a6b79d75cc BUG: 1003842 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/5763 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Improvement in logging of self heal completion statusVenkatesh Somyajulu2013-08-291-17/+46
| | | | | | | | | | | Additional information for source and sinks are added. Change-Id: I1704956ff86ac3ae36744efe7499c1d1c43faeaf BUG: 968301 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/5638 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Add largest file is source policyAnand Avati2013-08-141-7/+93
| | | | | | | | | | | | | | For Write Once Read Many times type of work-load choosing largest file to be the source will always resolve fool-fool scenarios correctly. In other cases we fsync() the files and will have a reliable 'wise man'. Change-Id: Ic4dbea8d06db6d578fbcb866fb65ee2d066ac7ba BUG: 958118 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/5519 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: Print self-heal log when self-heal succeedsPranith Kumar K2013-07-311-32/+54
| | | | | | | | | Change-Id: I95e47e589419dc6a032cbd8ba01964b6c176c2d5 BUG: 927146 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/5408 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Refactor inodelk to handle multiple domainsPranith Kumar K2013-07-031-57/+32
| | | | | | | | | | | | | | | | | | - afr_local_copy should not be memduping locked nodes, that would mean that lock is taken in self-heal on those nodes even before it actually takes the lock. So removed memdup code. Even entry lock related copying (lockee info) is also not necessary for self-heal functionality, so removing that as well. Since it is not local_copy anymore changed its name. - My editor changed tabs to spaces. Change-Id: I8dfb92cb8338e9a967c06907a8e29a8404782d61 BUG: 967717 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/5099 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Allow data/entry self heal for metadata split-brainVenkatesh Somyajulu2013-07-021-74/+93
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Currently whenever there is metadata split-brain, a variable sh->op_failed is set to 1 to denote that self heal got failed. But if we proceed for data self heal, even code-path of data self heal also relies on the sh->op_failed variable. So if will check for sh->op_failed variable and will eventually fails to do data self heal. So needed a mechanism to allow data self heal even if metadata is in split brain. Fix: Some data structure revamp is done in http://review.gluster.com/#/c/5106/ fix and this patch is based on the above fix. Now we can store which particular self-heal got failed i.e GFID_OR_MISSING_ENTRY_SELF_HEAL, METADATA, DATA, ENTRY. And we can do two types of self heal failure check. 1. Individual type check: We can check which among all four (Metadata, Data, Gfid or missing entry, entry self heal) got failed. 2. In afr_self_heal_completion_cbk, we need to make check based on the fact that if any specific self heal got failed treat the complete self heal as failure so that it will populate corresponding circular buffer of event history accordingly. Change-Id: Icb91e513bcc752386fc8a78812405cfabe5cac2d BUG: 977797 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/5253 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Improvement in logging of self heal completion statusVenkatesh Somyajulu2013-06-131-33/+182
| | | | | | | | | | | | | | | | | | | | | | Problem: As the end of the self heal, message logged by "afr_self_heal_completion_cbk" is inadequate to determine what exactly failed during the course of afr self heal. It is worth to have knowledge of what all types of self heal got triggered for an entity and whether the status is success or failure. Fix: At the end of self heal, it will log information about out of 4 types of self heal (gfid or missing entry self heal, metadata, data and entry self heal), who all got triggered and who all got failed or successful at the end. Change-Id: I5360762fbd7d391ac4c6af6706b4835c5801835a BUG: 968301 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/5106 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Club missing entry, missing gfid self-healsPranith Kumar K2013-05-071-45/+8
| | | | | | | | | | | | | | | | | | | | | Problem: gfid-self-heal always assigns the gfid(GFID-1) it gets from lookup. Between the time of lookup to triggering the gfid-self-heal the entry could have changed. Now lets say there is a case where one of the files of the replica subolumes already has a gfid (GFID-2) and the other does not. In that case healing should happen with GFID-2 instead of GFID-1. Fix: Missing-entry-self-heal already handles all these cases. So removed separate handling of gfid-self-heal. Change-Id: Ie96261e9036c8f3cb4cad89347f9bf7b681cdc1a BUG: 767585 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/2670 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Made afr_sh_purge_entry_common message log more clear.Venkatesh Somyajulu2013-04-031-3/+1
| | | | | | | | | | | | | | | FIX: In missing entry self heal, once the source directories are determined after the lookup and if file is not present on any of the brick which contains the souce directory, the entry is removed from the directory. So log message should give information of "Purging of entry". Change-Id: I4d3deb602e0812dc1c9c8ba0a466716d81dede7e BUG: 947312 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/4753 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: added logging of changelog for split-brain in glustershd.log fileVenkatesh Somyajula2013-02-031-0/+60
| | | | | | | | | | Change-Id: Iaf119f839cb2113b8f8efb7bf7636d471b6541bf BUG: 866440 Signed-off-by: Venkatesh Somyajula <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/4385 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* afr: Modified book-keeping structures for entrylksKrishnan Parthasarathi2013-01-231-9/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * There are upto 3 entry lockees that may be needed to perform entrylk'ing in posix dir-write operations. * For eg, rmdir ("/a/b") needs to acquire locks on two entities, - entrylk ("/a", "b") - entrylk ("/a/b", null) * Changed existing entrylk/rename/selfheal (entrylk) transactions to use the new book-keeping structures * Fixed few issues in afr_trace_entry_lk{in,out} functions. Tracing is now aware of the new entry lockee structure. Implementation notes: * Changed 'cookie' sent in stack_wind to encode lockee_entity_no and subvol_no. cookie is a non-negative integer such that 0 <= cookie < replica_count, When more than one lock is being acquired across the subvolumes, cookie % replica_count gives the subvol_no cookie / replica_count gives the lockee_entity_no. Change-Id: Idbf41803387a7d59a0f7fcb1453d91cea74da153 BUG: 765564 Signed-off-by: Krishnan Parthasarathi <kp@gluster.com> Reviewed-on: http://review.gluster.org/2828 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* afr: conditionally prioritize EIO errors over ENOENTBrian Foster2013-01-181-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | The most important errno logic historically only prioritized ESTALE over ENOENT. Commit c8c0942d added EIO prioritization over ENOENT to ensure that split-brain was reported when it occurs in conjunction with bricks missing the file entry. The unintended side effect of this change is that (non split-brain) EIO errors reported from the bricks themselves are now reported to the client when the expectation is that afr should squash said errors in favor of marking the file inconsistent. The high-level problem is that EIO is overloaded with different meanings from different contexts. This commit adds an eio parameter to the errno priority logic to conditionally flag when EIO is of higher priority and should be propagated to the client. BUG: 892730 Change-Id: Ib692a8a1f1737ef190d57894f392ec53ffb33aab Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/4376 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* afr: replace afr_more_important_error with afr_most_important_errorBrian Foster2013-01-171-2/+1
| | | | | | | | | | | | | | | | | | | | | afr_more_important_error() is written to return whether a new errno should override an existing errno for high-level operations that could span multiple sub-operations. It specifically prioritizes ESTALE over EIO over ENOENT, and otherwise defaults to the latest error passed having priority. This change preserves current behavior, but rewrites the logic to return the higher priority error of the existing and new errno. The purpose of the change is to make the logic a bit more clear and set the stage for future changes to make the logic flexible based on context. BUG: 892730 Change-Id: Id1aa48855dfb0507abc9d1ef22f2259b30472576 Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/4375 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* replicate: don't clear changelog for un-healed replicasJeff Darcy2013-01-161-6/+44
| | | | | | | | | | Change-Id: Iebfa6770a688e89c051666b46977862188061738 BUG: 802417 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/4034 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Remember type of split-brain in inode-ctxPranith Kumar K2012-12-111-5/+2
| | | | | | | | | | | | | Along with this change, fixed the race of setting the split-brain status in inode-ctx after unwinding the fop from self-heal in case of back-ground self-heal. Change-Id: Ifc829300df485f50f139443802e8b6dc7038b4ad BUG: 873962 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4198 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>