summaryrefslogtreecommitdiffstats
path: root/xlators/cluster/ec/src
Commit message (Collapse)AuthorAgeFilesLines
* cluster/ec: Add support for hardware accelerationXavier Hernandez2016-09-0838-12302/+21106
| | | | | | | | | | | | | | | | | | | | | | | This patch implements functionalities for fast encoding/decoding using hardware support. Currently optimized x86_64, SSE and AVX is added. Additionally this patch implements a caching mecanism for inverse matrices to reduce computation time, as well as a new method for computing the inverse that takes quadratic time instead of cubic. Finally some unnecessary memory copies have been eliminated to further increase performance. Change-Id: I26c75f26fb4201bd22b51335448ea4357235065a BUG: 1289922 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/12837 Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: Add events for EC translatorAshish Pandey2016-09-071-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch will generates events in following cases which will be consumed by new event framework. Consider an EC volume with (K+M) configuration K = Data bricks M = Redundancy bricks 1- EVENT_EC_MIN_BRICKS_NOT_UP - When minimum "K" number of bricks, required for any ec fop, are not up. 2- EVENT_EC_MIN_BRICKS_UP When minimum "K" number of bricks, required for any ec fop, are up. Change-Id: I0414b8968c39740a171e5aa14b087afd524d574f BUG: 1371470 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/15348 Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: Use locks for opendirPranith Kumar K2016-08-251-1/+21
| | | | | | | | | | | | | | | | | | | | | | | Problem: In some cases we see that readdir keeps winding to the brick that doesn't have any blocked locks i.e. first brick. This is leading to the client assuming that there are no blocking locks on the inode so it won't give away the lock. Other clients end up blocked on the lock as if the command hung. Fix: Proper way to fix this issue is to use infra present in http://review.gluster.org/14736 This is a stop gap fix where we start taking inodelks in opendir which goes to all the bricks, this will detect if there is any contention. BUG: 1346719 Change-Id: I91109107a26f6535b945ac476338e9f21dc31eb9 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/15309 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Ashish Pandey <aspandey@redhat.com>
* cluster/ec: Do multi-threaded self-healPranith Kumar K2016-08-243-2/+39
| | | | | | | | | | | BUG: 1368451 Change-Id: I5d6b91d714ad6906dc478a401e614115c89a8fbb Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/15083 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Ashish Pandey <aspandey@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/afr: Prevent split-brain when bricks are brought off and on in ↵Krutika Dhananjay2016-08-224-32/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cyclic order When the bricks are brought offline and then online in cyclic order while writes are in progress on a file, thanks to inode refresh in write txns, AFR will mostly fail the write attempt when the only good copy is offline. However, there is still a remote possibility that the file will run into split-brain if the brick that has the lone good copy goes offline *after* the inode refresh but *before* the write txn completes (I call it in-flight split-brain in the patch for ease of reference), requiring intervention from admin to resolve the split-brain before the IO can resume normally on the file. To get around this, the patch does the following things: i) retains the dirty xattrs on the file ii) avoids marking the last of the good copies as bad (or accused) in case it is the one to go down during the course of a write. iii) fails that particular write with the appropriate errno. This way, we still have one good copy left despite the split-brain situation which when it is back online, will be chosen as source to do the heal. Change-Id: I9ca634b026ac830b172bac076437cc3bf1ae7d8a BUG: 1363721 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/15080 Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Oleksandr Natalenko <oleksandr@natalenko.name> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* libglusterfs: move alloca0 definition to common-utilsRavishankar N2016-08-121-1/+0
| | | | | | | | | | | | | | ...and remove their definitons from EC and AFR. Also `s/alloca+memset0/alloca0` wherever it is used. Change-Id: I3b71e596d12a7d8900f5d761af6b98305c8874d5 BUG: 1366226 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/15147 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/ec: Handle absence of keys in some callback dictAshish Pandey2016-07-251-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: This issue arises when we do a rolling update from 3.7.5 to 3.7.9. For 4+2 volume running 3.7.5, if we update 2 nodes and after heal completion kill 2 older nodes, this problem can be seen. After update and killing of bricks, 2 nodes will return inodelk count key in dict while other 2 nodes will not have inodelk count in dict. This is also true for get-link-count. During dictionary match , ec_dict_compare, this will lead to mismatch of answers and the file operation on mount point will fail with IO error. Solution: Don't match inode, entry and link count keys while comparing two dictionaries. However, while combining the data in ec_dict_combine, go through all the dictionaries and select the maximum values received in different dicts for these keys. Change-Id: I33546e3619fe8f909286ee48fb0df2009cd3d22f BUG: 1347686 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/14761 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* core: add a basis function to reduce verbose codeZhou Zhengping2016-07-181-15/+0
| | | | | | | | | | | Change-Id: Icebe1b865edb317685e93f3ef11d98fd9b2c2e9a BUG: 1357226 Signed-off-by: Zhou Zhengping <johnzzpcrystal@gmail.com> Reviewed-on: http://review.gluster.org/14936 Reviewed-by: Niels de Vos <ndevos@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* cluster/ec: Unlock stale locks when inodelk/entrylk/lk failsPranith Kumar K2016-06-141-6/+6
| | | | | | | | | | | | | | | | Thanks to Rafi for hinting a while back that this kind of problem he saw once. I didn't think the theory was valid. Could have caught it earlier if I had tested his theory. Change-Id: Iac6ffcdba2950aa6f8cf94f8994adeed6e6a9c9b BUG: 1344836 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/14703 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Smoke: Gluster Build System <jenkins@build.gluster.org> Tested-by: mohammed rafi kc <rkavunga@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/ec: Fix race in timer cancellationXavier Hernandez2016-06-131-15/+56
| | | | | | | | | | | | | | | | | | | A race in timer cancellation for delayed unlock could cause a crash if the cancelling thread fails to cancel the timer because it has already been fired but not executed, and the callback is scheduled out of the CPU, delaying it until the thread has released important resources needed by the callback. This patch improves the handling of this case to make it robust. Change-Id: I5c8a8c6610c5136f71b938aa78b5878ba05238d4 BUG: 1345855 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/14712 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: Fix invalid __fd_unref() callXavier Hernandez2016-06-131-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | __fd_unref() doesn't do any cleanup, so it cannot be called to release fd references, specially if it's the last reference. The code has been changed to avoid a call to this function. In the previous version we always tried to keep the newest fd in the ec_lock_t structure. However this is not necessary. We'll always keep one reference to an open file on the same inode. It's irrelevant if the reference is new or old. The function __fd_unref() has also been removed from fd.h to avoid being used in the future since it's useless as it's defined now. Change-Id: Ia728777fc8e464758d5ea4d3bf020f0603919039 BUG: 1344396 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/14683 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ashish Pandey <aspandey@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/ec: Pass xdata to dht in case of errorAshish Pandey2016-06-101-4/+6
| | | | | | | | | | | | | | | | | | | | | | | Problem: In case of mkdir failure, dht expects error information so that it can act accordingly. Aftre adding bricks and re balance, layout gets changed. Fop "mkdir" with old layout returns EIO. EC gets this error in xdata but does not pass it back to dht. In this case dht will not be able to take corrective action. Solution: Return xdata back to dht Change-Id: I24def8038e6880607689b7b046dc6428f564c6ab BUG: 1344277 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/14679 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Atin Mukherjee <amukherj@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* cluster/ec: Restrict the launch of replace brick healAshish Pandey2016-06-061-1/+2
| | | | | | | | | | | | | | | | | | | | Problem: When features.cache-invalidation is ON, a lot of ec_notify function gets called which leads to launch of too many heals. This leads to no heal completion, which causes accumulation of heals. Solution: ec_launch_replace_heal should not be launch for every event. Replace brick will trigger a child up event and then only this heal function should be called. Change-Id: I57b44c6a279d57230daea1d93229be6069245b7d BUG: 1342796 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/14649 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* cluster/ec: Add/Modify description for eager-lock optionAshish Pandey2016-06-031-2/+12
| | | | | | | | | | | | | | | | | | | This patch provides description for disperse.eager-lock option for disperse volume. It also modifies the description for cluster.eager-lock option to indicate that this option is only for replica volume. Change-Id: Ie73298947fcaaa6aaf825978bc2d27ceaff386d2 BUG: 1327171 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/13999 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: Use correct log levelsAshish Pandey2016-05-242-2/+2
| | | | | | | | | | | | | | | | | | | | | | Problem : Misleading messages are getting logged in mount logs and bricks log. "Mismatching xdata" and "Heal failed" are getting logged Solution : Reduce the level of logs from INFO, WARNING and NOTICE to DEBUG level wherever applicable OR use fop_log_level to get proper log level. Change-Id: Ia824c71e75ab683d3cb8949e1966ea09c9ccce72 BUG: 1231224 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/13266 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* core: assorted typos and spelling mistakes reported by Debian lintianKaleb S KEITHLEY2016-05-181-1/+1
| | | | | | | | | | | | | | | Also missing bang (!) in #!/bin/bash in shell scripts. Change-Id: I567a4be8f0f31f6285550f243fe802895f6bc43b BUG: 1336793 Reported-by: Patrick Matthäi <pmatthaei@debian.org> Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/14398 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kotresh HR <khiremat@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* cluster/ec: Fix issues with eager lockingXavier Hernandez2016-05-022-75/+192
| | | | | | | | | | | | | | | | | | Due to a race in timer cancellation, in some cases it was possible to unlock the lock while another concurrent fop that needed it continues execution as if it were not released. This patch also fixes an issue that caused a lock to not be released if an error was found while preparing ec_update_size_version(). Change-Id: I1344a3f5ecfc333f05a09e62653838264c9c26b1 BUG: 1331254 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/14112 Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Chen Chen <chenchen@smartquerier.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* cluster/afr: Don't lookup/forget inodesPranith Kumar K2016-03-311-11/+1
| | | | | | | | | | | | | | | | | | | | | | | | Problem: All inodes that are looked-up are always forgotten without fail in afr removing the benefits of them being in lru. This same code can cause crashes if between inode_lookup, inode_forget in afr if the top xlator does inode_forget(0). Fix: Don't use lookup/forget in afr. No benefits are there at the moment for keeping this code. It is impossible to prevent top xlators to do inode_forget(0). Found similar instances in ec and removed them even though those code paths are not going to be executed in any place other than heal-daemon. BUG: 1321554 Change-Id: Ia4cb236178f7f129cc898d53f0bbd26f494a2a8d Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/13834 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anuradha Talur <atalur@redhat.com>
* cluster/ec: Provide an option to enable/disable eager lockAshish Pandey2016-03-153-3/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: If a fop takes lock, and completes its operation, it waits for 1 second before releasing the lock. However, If ec find any lock contention within this time period, it release the lock immediately before time expires. As we take lock on first brick, for few operations, like read, it might happen that discovery of lock contention might take long time and can degrades the performance. Solution: Provide an option to enable/disable eager lock. If eager lock is disabled, lock will be released as soon as fop completes. gluster v set <VOLUME NAME> disperse.eager-lock on gluster v set <VOLUME NAME> disperse.eager-lock off Change-Id: I000985a787eba3c190fdcd5981dfbf04e64af166 BUG: 1314649 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/13605 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* cluster/ec: Do not ref dictionary in lookupPranith Kumar K2016-03-141-7/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: 1) dict_for_each loops over the elements without any locks, so the members of the dictionary can be ref/unrefed while dict_for_each is executed by another thread leading to crashes. Basically with distributed ec + disctributed replicate as cold, hot tiers. tier sends a lookup which fails on ec. (By this time dict already contains ec xattrs) After this lookup_everywhere code path is hit in tier which triggers lookup on each of distribute's hash lookup but fails which leads to the cold, hot dht's lookup_everywhere in two parallel epoll threads where in ec when it tries to set trusted.ec.version/dirty/size as keys in the dictionary, the older values against the same key get erased. While this erasing is going on if the thread that is doing lookup on afr's subvolume accesses these keys either in dict_copy_with_ref or client xlator trying to serialize, that can either lead to crash or hang based on if the spin/mutex lock is called on invalid memory. 2) EC deletes GF_CONTENT_KEY from the dictionary, this may lead to extra reads in case of lookup-everwhere for tiered volumes. Fix: Do dict_copy_with_ref() for the lookup-dictionary. This is avoiding the problem and is not actually fixing the 1st problem. 2nd problem will be fixed. Change-Id: I5427aa14c48cb7572977d4de9a28c5ffff2b4b95 BUG: 1315560 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/13680 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* cluster/ec: Fix invalid config check for directoriesXavier Hernandez2016-02-292-1/+15
| | | | | | | | | | | | | | | | | | | | | | The trusted.ec.config xattr is not defined for directories. However sometimes it could be requested because the inode type of a directory can temporarily be IA_INVAL. Requesting such xattr using the xattrop fop when it doesn't exist, returns a config value full of 0's, which is invalid and caused some fops to fail. This patch filters out this case by ignoring config xattr == 0. Change-Id: Ied51c35b313ea8c3eeae27812f9bae61d3808e92 BUG: 1293223 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/13446 Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ashish Pandey <aspandey@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/ec: Automate heal for replace brickAshish Pandey2016-02-083-0/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: After a replace brick command, newly added brick does not contain data which existed on old brick. Solution: Do getxattr after initialization of all the bricks. This will trigger heal for brick root as soon as it finds the version mismatch on newly added brick. Removing tests from ec-new-entry.t which were required to simulate automation of heal after replace brick. Change-Id: I08e3dfa565374097f6c08856325ea77727437e11 BUG: 1304686 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/13353 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* cluster/ec: Never return -ve statePranith Kumar K2016-02-081-1/+1
| | | | | | | | | | | | | Ec manager shouldn't return -ve states, but it is, fixed that. Change-Id: I3f97c6ba2dbf9da724e8e1ee9b2c9da73f40013d BUG: 1300929 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/13278 Tested-by: Xavier Hernandez <xhernandez@datalab.es> Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* cluster/ec: Handle non-existent config xattr for non regular filesXavier Hernandez2016-02-051-23/+25
| | | | | | | | | | | | | | | | | | | | Since we now try to get the 'trusted.ec.config' xattr for inodes of type IA_INVAL (these inodes will be set to some valid type later), if that inode corresponds to a non regular file, the xattr won't exist and we will handle this as an error when it's not. This patch solves the problem by only considering errors for inodes that are already known to be regular files. Change-Id: Id72f314e209459236d75cf087fc51e09943756b4 BUG: 1293223 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/13238 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ashish Pandey <aspandey@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/ec: add seek() FOPXavier Hernandez2016-02-054-1/+209
| | | | | | | | | | | | BUG: 1220173 Change-Id: Iaa23ba81df4ee78ddaab1f96b3d926a563b4bb3d Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/11494 Smoke: Gluster Build System <jenkins@build.gluster.com> Tested-by: Niels de Vos <ndevos@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/ec: Create this->itable in all casesPranith Kumar K2016-01-283-5/+5
| | | | | | | | | | | | | | | | | | | Problem: glfsheal operates based on mount's volfile which doesn't have iamshd flag due to which this->itable is NULL, this leads to "inode not found" logs in glfsheal logs. Fix: Ec only allocates itable with 10 inodes, so allocating this->itable in all cases in init. Change-Id: I01d3c05e93a17007a4716a2d6f392d2aa306a34b BUG: 1294743 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/13112 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* cluster/ec: Get size and config for invalid inodeAshish Pandey2016-01-131-11/+20
| | | | | | | | | | | | | | | | | | | Problem: After creating an inode and before linking it to inode table, if there is a request to setattr for that file, it fails and leads to crash. Before linking inode to inode table ia_type is IA_INVAL which will casue have_size and have_config as zero. Solution: Check and get size and config if an inode is invalid Change-Id: I0c0e564940b1b9f351369a76ab14f6b4aa81f23b BUG: 1293223 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/13039 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* build: export minimum symbols from xlators for correct resolutionKaleb S KEITHLEY2015-12-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | Revisiting http://review.gluster.org/#/c/11814/, which unintentionally introduced warnings from libtool about the xlator .so names. According to [1], the -module option must appear in the Makefile.am file(s); if -module is defined in a macro, e.g. in configure(.ac), then libtool will not recognize that this is a module and will emit a warning. [1] http://www.gnu.org/software/automake/manual/automake.html#Libtool-Modules Change-Id: Ifa5f9327d18d139597791c305aa10cc4410fb078 BUG: 1248669 Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/13003 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: soumya k <skoduri@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* cluster/ec: Create copy of dict for setting internal xattrsPranith Kumar K2015-12-012-19/+19
| | | | | | | | | | | | | | | | | | | | | | Problem: Ec takes a ref of the request xdata and sets trusted.ec.version/algo etc xattrs as part of it. But this request xdata could be using same dictionary to do the operation on multiple subvolumes, due to which other subvolumes will have internal xattrs of ec in it and will be created on subvols where they are not supposed to appear. Fix: Take a copy of the request xdata/dict to prevent this from happening. Most of the debugging work and test script is contributed by Nitya. BUG: 1286910 Change-Id: If146435dfb89656158dbed3862a3e9a0cda60581 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/12831 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* cluster/ec: Mark internal fops appropriatelyXavier Hernandez2015-11-194-27/+58
| | | | | | | | | | | | | 1) Mark read fops in read-modify-write by EC as internal. 2) Handle uid/gid set/reset correctly BUG: 1282761 Change-Id: I5c1ce0cd6213367eaead5fed33aa2397c4e46df7 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/12599 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: Mark self-heal fops as internalPranith Kumar K2015-11-182-3/+5
| | | | | | | | | | Change-Id: I8ae7af266d3e00460f0cfdc9389a926e5f2fee36 BUG: 1282761 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/12598 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* cluster/ec: fix bug in update_goodPranith Kumar K2015-11-111-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Bricks that didn't participate in the fops are considered to be good. This is happening two fold. Examples: Case-1: 1) 2+1 volume. 'd1' directory on Brick-0 is bad. 2) readdir takes locks and lock->good_mask is '7' 3) readdir does xattrop and fop->mask is '6'. 4) because fop->expected is '1' lock->good_mask remains '7' Case-2: 1) when all the bricks are up, it does lock + xattrop before op and figures out all the bricks are good. 2) By the time second operation starts brick-0 is down. Now lock->good_mask will always have the '0' bit set as long as the operations are happening on it. because: "lock->good_mask &= ~fop->mask | fop->remaining" fop->mask doesn't have '0' th bit. 3) When it comes time to perform the final xattrop in update_size_version brick-0 comes online because of which it gives the same version to brick-0 as well thinking it has participated in all the transactions till then, even when it didn't participate in the transactions. Fix: Case-1's fix: Update lock->good_mask in ec_prepare_update_cbk with latest good/bad bricks Case-2's fix: Consider non-participating brick as bad. Change-Id: Ic01a733f8180131ded6a3cc784fcb1960758cf23 BUG: 1276989 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/12561 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* cluster/ec: Fix bad management of lock ownersXavier Hernandez2015-11-053-10/+11
| | | | | | | | | | | | | | | | | | | | Since the addition of parallel reads patch for ec, a lock can have more than one owner at the same time. The list of owners was stored inside the 'owner_list' field of each fop. The problem was with fops that required more than one lock (like rename). In this case the same field was used to add the fop to more than one list, casing an overwrite of the previous list. This has been solved moving the 'owner_list' field from ec_fop_data_t to ec_lock_link_t structure. Change-Id: I6042129f09082497b80782b5704a52c35c78f44d BUG: 1276031 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/12445 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/ec: update version and size on good bricksAshish Pandey2015-10-281-10/+2
| | | | | | | | | | | | | | | | | | | | Problem: readdir/readdirp fops calls [f]xattrop with fop->good which contain only one brick for these operations. That causes xattrop to be failed as it requires at least "minimum" number of brick. Solution: Use lock->good_mask to call xattrop. lock->good_mask contain all the good locked bricks on which the previous write opearion was successfull. Change-Id: If1b500391aa6fca6bd863702e030957b694ab499 BUG: 1274629 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/12419 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec : Remove index entries if file/dir does not existAshish Pandey2015-10-151-33/+45
| | | | | | | | | | | | | | | | | | | | Problem: During write and rebalance if a brick is down, index entries will be created. If the same file gets migrated to other subvol by rebalance process, these index entries will remain in index directory. During heal, these indices should be removed when we get ENOENT or ESTALE for a index. Solution: Capture correct errno and take appropriate action to purge these indices. Change-Id: I1aad8b99e4df2e139648e3bf971e4cb1c4b38699 Bug: 1271358 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/12353 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: Implement gfid-hash read-policyPranith Kumar K2015-10-093-10/+73
| | | | | | | | | | | | | | Add a policy in ec to performs reads from same bricks as long as they are good. Based on the gfid of the file/directory it determines the bricks to be considered for reading. Change-Id: Ic97b5c54c086a28b5e07a330a4fd448551b49376 BUG: 1261260 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/12133 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* cluster/ec : Mark new entry changelog in entry self-healAshish Pandey2015-10-062-7/+79
| | | | | | | | | | | | | | | | | | | | Problem : When a new entry is created dirty mark xattrs are not created this will need full heal to be performed, even when there are partial failures. Solution : Marks new entry changelog in self-heal. PS: Also fixed erasing of dirty markers when no data heal is required. BUG: 1254121 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Change-Id: I156e3d3201afa77efe118e1aaace1d91c90a9613 Reviewed-on: http://review.gluster.org/11938 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* build: export minimum symbols from xlators for correct resolutionKaleb S. KEITHLEY2015-09-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | We've been lucky that we haven't had any symbol collisions until now. Now we have a collision between the snapview-client's svc_lookup() and libntirpc's svc_lookup() with nfs-ganesha's FSAL_GLUSTER and libgfapi. As a short term solution all the snapview-client's FOP methods were changed to static scope. See http://review.gluster.org/11805. This works in snapview-client because all the FOP methods are defined in a single source file. This solution doesn't work for other xlators with FOP methods defined in multiple source files. To address this we link with libtool's '-export-symbols $symbol-file' (a wrapper around `ld --version-script ...` --- on linux anyway) and only export the minimum required symbols from the xlator sharedlib. N.B. the libtool man page says that the symbol file should be named foo.sym, thus the rename of *.exports to *.sym. While foo.exports worked, we will follow the documentation. Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> BUG: 1248669 Change-Id: I1de68b3e3be58ae690d8bfb2168bfc019983627c Reviewed-on: http://review.gluster.org/11814 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: soumya k <skoduri@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* all: reduce "inline" usageJeff Darcy2015-09-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | There are three kinds of inline functions: plain inline, extern inline, and static inline. All three have been removed from .c files, except those in "contrib" which aren't our problem. Inlines in .h files, which are overwhelmingly "static inline" already, have generally been left alone. Over time we should be able to "lower" these into .c files, but that has to be done in a case-by-case fashion requiring more manual effort. This part was easy to do automatically without (as far as I can tell) any ill effect. In the process, several pieces of dead code were flagged by the compiler, and were removed. Change-Id: I56a5e614735c9e0a6ee420dab949eac22e25c155 BUG: 1245331 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/11769 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* cluster/ec: Allow read fops to be processed in parallelXavier Hernandez2015-08-297-192/+359
| | | | | | | | | | | | | | Currently ec only sends a single read request at a time for a given inode. Since reads do not interfere between them, this patch allows multiple concurrent read requests to be sent in parallel. Change-Id: If853430482a71767823f39ea70ff89797019d46b BUG: 1245689 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/11742 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* ec : trusted.ec.version xattr of all root directories of all bricks should ↵Ashish Pandey2015-08-291-0/+3
| | | | | | | | | | | | | | | | | | | | be same. Problem: After replacing the brick using "replace-brick" command and running "heal full", the version of the root directory of the newly added brick is not getting healed. heal starts running on the dentries of the root but does not run on root directory. Solution: Run heal on root directory. Change-Id: Ifd42a3fb341b049c895817e892e5b484a5aa6f80 BUG: 1243382 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/11676 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* fd: Do fd_bind on successful openPranith Kumar K2015-08-281-0/+1
| | | | | | | | | | | | | | | - fd_unref should decrement fd->inode->fd_count only if it is present in the inode's fd list. - successful open/opendir should perform fd_bind. Change-Id: I81dd04f330e2fee86369a6dc7147af44f3d49169 BUG: 1207735 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11044 Reviewed-by: Anoop C S <anoopcs@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/ec: Fix write size in self-healXavier Hernandez2015-08-142-0/+10
| | | | | | | | | | | | | | | | | | Self-heal was always using a fixed block size to heal a file. This was incorrect for dispersed volumes with a number of data bricks not being a power of 2. This patch adjusts the block size to a multiple of the stripe size of the volume. It also propagates errors detected during the data heal to stop healing the file and not mark it as healed. Change-Id: I9ee3fde98a9e5d6116fd096ceef88686fd1d28e2 BUG: 1251446 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/11862 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: Fix tracking of good bricksXavier Hernandez2015-08-0611-258/+122
| | | | | | | | | | | | | | | | | | | The bitmask of good and bad bricks was kept in the context of the corresponding inode or fd. This was problematic when an external process (another client or the self-heal process) did heal the bricks but no one changed the bitmaks of other clients. This patch removes the bitmask stored in the context and calculates which bricks are healthy after locking them and doing the initial xattrop. After that, it's updated using the result of each fop. Change-Id: I225e31cd219a12af4ca58871d8a4bb6f742b223c BUG: 1236065 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/11844 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: Minimize usage of EIO errorXavier Hernandez2015-07-2814-1890/+1201
| | | | | | | | | | Change-Id: I82e245615419c2006a2d1b5e94ff0908d2f5e891 BUG: 1245276 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/11741 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* dict: dict_set_bin() should never free the pointer on errorNiels de Vos2015-07-241-3/+17
| | | | | | | | | | | | | | | | | | | | | | dict_set_bin() is handling the pointer that it passed inconsistently. Depending on the errors that can occur, the pointer passed to the dict can be free'd, but there is no guarantee. It is cleaner to have the caller free the pointer that allocated it and dict_set_bin() returned an error. When dict_set_bin() returned success, the given pointer will be free'd when dict_unref() calls data_destroy(). Many callers of dict_set_bin() already take care of free'ing the pointer on error. The ones that did not, are corrected with this change too. Change-Id: I39a4f7ebc0cae6d403baba99307d7ce408f25966 BUG: 1242280 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/11638 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* cluster/ec: Handle race between unlock-timer, new lockPranith Kumar K2015-07-233-50/+17
| | | | | | | | | | | | | | | | | | | | | | | | Problem: New lock could come at the time timer is on the way to unlock. This was leading to crash in timer thread because thread executing new lock can free up the timer_link->fop and then timer thread will try to access structures already freed. Fix: If the timer event is fired, set lock->release to true and wait for unlock to complete. Thanks to Xavi and Bhaskar for helping in confirming that this race is the RC. Thanks to Kritika for pointing out and explaining how Avati's patch can be used to fix this bug. Change-Id: I45fa5470bbc1f03b5f3d133e26d1e0ab24303378 BUG: 1243187 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11670 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* cluster/ec: Propogate correct errno in case of failuresPranith Kumar K2015-07-141-1/+4
| | | | | | | | | | | | | | | | | | - Also remove internal-fop setting in create/mknod etc xattrs. Rebalance was failing because ec was giving EIO when lock acquiring fails as the file/dir doesn't exist. Posix_create/mknod are not setting config xattr because internal-fop key is present in dict and setxattr for this fails leading to failure in setting rest of xattrs. Change-Id: Ifb429c8db9df7cd51e4f8ce53fdf1e1b975c9993 BUG: 1242254 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11639 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* cluster/ec: wind readlink on good subvol(s)Pranith Kumar K2015-07-146-48/+90
| | | | | | | | | | BUG: 1232172 Change-Id: I3a56e487840d86147dd85bf5fbe79b165eae289f Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11589 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/ec: Prevent data corruptionsPranith Kumar K2015-07-143-14/+32
| | | | | | | | | | | | | | - On lock reuse preserve 'healing' bits - Don't set ctx->size outside locks in healing code - Allow xattrop internal fops also on the fop->mask. Change-Id: I6b76da5d7ebe367d8f3552cbf9fd18e556f2a171 BUG: 1232678 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11640 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>