path: root/xlators/cluster/ec/src/ec-common.c
Commit message | Author | Age | Files | Lines
* cluster/ec: create eager-lock option for non-regular files | Xavier Hernandez | 2017-11-16 | 1 | -1/+21
  A new option is added to allow independent configuration of eager locking for regular files and non-regular files.

  >Change-Id: I8f80e46d36d8551011132b15c0fac549b7fb1c60
  >BUG: 1502610
  >Signed-off-by: Xavier Hernandez <jahernan@redhat.com>

  Change-Id: I8f80e46d36d8551011132b15c0fac549b7fb1c60
  BUG: 1512460
  Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* cluster/ec: Allow parallel writes in EC if possible | Pranith Kumar K | 2017-10-24 | 1 | -59/+132
  Problem: EC currently sends one modification fop after another, so if some of the disks become slow for a while, the wait time for the writes queued behind them becomes very long.

  Fix: Allow parallel writes when possible. This requires three changes:
  1) Each fop now has range parameters describing the region it will update.
  2) Xattrop is changed to handle parallel xattrop requests where some would be modifying just the dirty xattr.
  3) Fops that refer to size now take locks and update the locks.

  Fixes #251
  Change-Id: Ibc3c15372f91bbd6fb617f0d99399b3149fa64b2
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
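The range parameters mentioned in change 1) of the parallel-writes commit can be pictured with a small overlap check. This is only a sketch under assumptions, not the code from the patch: `range_t`, `ranges_overlap` and `can_run_in_parallel` are hypothetical names, and it assumes each fop is described by a byte offset and a length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for the byte range a fop will modify. */
typedef struct {
    uint64_t offset; /* first byte touched */
    uint64_t size;   /* number of bytes touched; 0 means "nothing" */
} range_t;

/* Two half-open ranges [offset, offset + size) overlap when each one
 * starts before the other one ends. Empty ranges never overlap.
 * Assumes offset + size does not wrap around. */
bool
ranges_overlap(const range_t *a, const range_t *b)
{
    if (a->size == 0 || b->size == 0)
        return false;
    return a->offset < b->offset + b->size &&
           b->offset < a->offset + a->size;
}

/* A new write may be dispatched alongside the in-flight ones only if
 * it conflicts with none of them; otherwise it has to wait. */
bool
can_run_in_parallel(const range_t *inflight, size_t count,
                    const range_t *candidate)
{
    assert(candidate != NULL);
    for (size_t i = 0; i < count; i++) {
        if (ranges_overlap(&inflight[i], candidate))
            return false;
    }
    return true;
}
```

With in-flight writes covering bytes 0..4095 and 8192..12287, a write to 4096..8191 could proceed immediately under this check, while a write starting at byte 4000 would have to queue.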
* cluster/ec: Handle parallel get_size_version | Pranith Kumar K | 2017-10-10 | 1 | -55/+95
  Updates #251
  Change-Id: I6244014dbc90af3239d63d75a064ae22ec12a054
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* Coverity Issue Fix : CHECKED_RETURN | Subha sree Mohankumar | 2017-09-26 | 1 | -1/+1
  Issue: Event check_return: calling "ec_dict_set_number" without checking the return value.

  Fix: Cast the return value of the function "ec_dict_set_number" to void.

  Change-Id: Id97034f9b1b8591536d63dca680ca7c7a9c4fcc3
  BUG: 789278
  Signed-off-by: Subha sree Mohankumar <smohanku@redhat.com>
* cluster/ec: fix for BAD_SHIFT, follow-up patch | Kaleb S. KEITHLEY | 2017-09-20 | 1 | -11/+14
  Address comments on https://review.gluster.org/18067 (Change-Id I86e15d12939c610c99f5f96c551bb870df20f4b4), which was posted as an RFC as an example of a possible alternative fix to https://review.gluster.org/17860 (Change-Id I28a3bdd4a357526dba0cf84c262919c05cfa173e).

  The alternative fix preserves the unsignedness of the indexes throughout, obviating the need to check the value before using it to shift. (Shifting by a negative number is undefined, as is shifting by at least as many bits as the type has.)

  BUG: 1474309
  Change-Id: I46fe9cec140d3397463780748f6876251acb06dd
  Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
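The point about unsigned indexes in the BAD_SHIFT follow-up can be illustrated with a short sketch. It is not taken from the patch: `brick_bit` and `MASK_BITS` are hypothetical names, and `uintptr_t` is used as the mask type only because the related commit mentions it.

```c
#include <limits.h>
#include <stdint.h>

/* Number of bits available in the mask type. */
#define MASK_BITS (sizeof(uintptr_t) * CHAR_BIT)

/* Returns a mask with only bit 'idx' set, or 0 if 'idx' does not fit.
 * Because 'idx' is unsigned it can never be negative, so the single
 * upper-bound test is enough to keep the shift well defined. */
uintptr_t
brick_bit(uint32_t idx)
{
    if (idx >= MASK_BITS)
        return 0;
    return (uintptr_t)1 << idx;
}
```

With a signed index the function would also need an `idx < 0` test before shifting, which is the kind of check the commit message says becomes unnecessary.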
* cluster/ec: coverity, fix for BAD_SHIFT | Kaleb S. KEITHLEY | 2017-08-28 | 1 | -11/+14
  This is how I would like to see this fixed. It passes (eliminates the warning in) Coverity.

  The use of uintptr_t as a bitmask is a problem IMO, especially on 32-bit clients.

  Change-Id: I86e15d12939c610c99f5f96c551bb870df20f4b4
  Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
  Reviewed-on: https://review.gluster.org/18067
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* cluster/ec: Non-disruptive upgrade on EC volume fails | Sunil Kumar Acharya | 2017-07-14 | 1 | -1/+4
  Problem: Enabling optimistic changelog on an EC volume did not handle node-down scenarios appropriately, which made volume data inaccessible.

  Solution: Update the dirty xattr appropriately on good bricks whenever nodes are down. This fixes the metadata information as part of heal and thus ensures data accessibility.

  BUG: 1468261
  Change-Id: I08b0d28df386d9b2b49c3de84b4aac1c729ac057
  Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
  Reviewed-on: https://review.gluster.org/17703
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: Get size of file in EC [f]xattrop | Pranith Kumar K | 2017-07-13 | 1 | -2/+17
  Problem: To allow parallel writes we shouldn't depend on ia_size being the same for all the bricks in each write_cbk(). But we need to make sure the backend size is correct on all the bricks and that no crashes or manual modifications happened.

  Fix: At the time of get_size_version() we do one check to make sure the size of the file is the same across the bricks. From then on the FOPs report their own status, so we rely on this information to track which bricks are good or bad.

  Updates #251
  Change-Id: I1df645347e2e9f2e09cfa4411b6cc305d7f4e4e5
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: https://review.gluster.org/17741
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* cluster/ec: Update xattr and heal size properly | Ashish Pandey | 2017-06-06 | 1 | -2/+9
  Problem-1: Recursive healing of the same file happens while IO is going on, even after data heal completes.

  RCA: At the end of the write, when ec_update_size_version gets called, we send it only on good bricks and not on the healing brick. Due to this, the xattr on the healing brick always remains out of sync, and when the background heal checks source and sink, it finds this brick to be healed and starts healing from scratch. That involves ftruncate and writing all of the data again.

  Solution: Send xattrop on all the good bricks as well as the healing bricks.

  Problem-2: The above fix exposes data corruption during heal. If a write on a file is going on and heal finishes, we find that the file gets corrupted.

  RCA: The real problem happens in ec_rebuild_data(). Here we receive the 'size' argument, which contains the real file size at the time of starting self-heal, and it's assigned to heal->total_size.

  After that, a sequence of calls to ec_sync_heal_block() is done. Each call ends up calling ec_manager_heal_block(), which does the actual work of healing a block.

  First a lock on the inode is taken in state EC_STATE_INIT using ec_heal_inodelk(). When the lock is acquired, ec_heal_lock_cbk() is called. This function calls ec_set_inode_size() to store the real size of the inode (it uses heal->total_size).

  The next step is to read the block to be healed. This is done using a regular ec_readv(). One of the things this call does is to trim the returned size if the file is smaller than the requested size.

  In our case, when we read the last block of a file whose size was 512 mod 1024 at the time of starting self-heal, ec_readv() will return only the first 512 bytes, not the whole 1024 bytes.

  This isn't a problem in itself, since the following ec_writev() sent from the heal code only attempts to write the amount of data read, so it shouldn't modify the remaining 512 bytes.

  However, ec_writev() also checks the file size. If we are writing the last block of the file (determined by the size stored on the inode, which we have set to heal->total_size), any data beyond the (imposed) end of file will be cleared with 0's. This causes the 512 bytes after heal->total_size to be cleared. Since the file was written after heal started, these bytes contained data, so the block written to the damaged brick will be incorrect.

  Solution: Align heal->total_size to a multiple of the stripe size.

  Thanks to "Xavier Hernandez" <xhernandez@datalab.es> for finding the root cause and fixing the issue.

  Change-Id: I6c9f37b3ff9dd7f5dc1858ad6f9845c05b4e204e
  BUG: 1428673
  Signed-off-by: Ashish Pandey <aspandey@redhat.com>
  Reviewed-on: https://review.gluster.org/16985
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
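The solution to Problem-2 in the heal-size commit (aligning heal->total_size to a multiple of the stripe size) amounts to a round-up. The helper below is a minimal sketch with a hypothetical name, `round_up_to_stripe`; it is not the code from the patch, and it assumes the alignment is upward so that the whole last stripe counts as inside the file.

```c
#include <assert.h>
#include <stdint.h>

/* Round 'size' up to the next multiple of 'stripe_size'.
 * 'stripe_size' must be non-zero; it need not be a power of two. */
uint64_t
round_up_to_stripe(uint64_t size, uint64_t stripe_size)
{
    assert(stripe_size != 0);
    uint64_t rem = size % stripe_size;
    return rem == 0 ? size : size + (stripe_size - rem);
}
```

Taking the commit's own numbers, a file of 1536 bytes (512 mod 1024) with a 1024-byte stripe would be treated as 2048 bytes during heal, so the write of the last block no longer sees an end of file in the middle of the stripe and has no tail to zero.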
* cluster/ec : Don't count healing brick as healthy brick | Ashish Pandey | 2017-04-12 | 1 | -1/+1
  In ec_child_select, we should send the fop to healing bricks unconditionally, but when checking the number of healthy bricks against fragments and minimum count, we should not count these healing bricks. Count the bits of fop->mask before adding the healing bricks to fop->mask.

  Change-Id: I3fa80bdd5ca34ca070d610116b84154b917c5999
  BUG: 1439527
  Signed-off-by: Ashish Pandey <aspandey@redhat.com>
  Reviewed-on: https://review.gluster.org/17007
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
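The ordering described in the healing-brick commit (count first, add healing bricks afterwards) can be sketched as follows. The names `count_bits` and `select_children` are hypothetical, the two thresholds from the commit message are collapsed into a single `minimum` for brevity, and the real ec_child_select does considerably more than this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Count the bits set in a brick mask (portable, no compiler builtins). */
int
count_bits(uint64_t mask)
{
    int n = 0;
    while (mask != 0) {
        mask &= mask - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* 'healthy' holds the bricks known to be good, 'healing' the bricks
 * currently being healed. The availability check uses only the healthy
 * bricks; the healing ones are added to the target mask afterwards.
 * Returns false, leaving *out_mask untouched, if too few bricks are healthy. */
bool
select_children(uint64_t healthy, uint64_t healing, int minimum,
                uint64_t *out_mask)
{
    if (count_bits(healthy) < minimum)
        return false;
    *out_mask = healthy | healing;
    return true;
}
```

Counting after the OR would let two healthy bricks plus one healing brick pass a minimum of three, which is the situation the commit is guarding against.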
* cluster/ec: Don't mark dirty on entry/meta ops in query-info | Pranith Kumar K | 2017-03-07 | 1 | -6/+0
  We wanted to mark dirty for metadata/entry operations whenever query-info is set and the info is not yet there, because we are sending an xattrop over the network anyway. But this causes a 25% regression from 3.8.8, so this optimization is removed.

  Also fixed two small issues that we didn't find in the previous patch:
  1) reconfigure failure was sending return value 0 for optimistic-changelog
  2) ec->optimistic_changelog was set to true even before OPTION_INIT

  BUG: 1408809
  Change-Id: Iabb0b64bd4d3623688790e4b67e5c20b4da977a1
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: https://review.gluster.org/16865
  Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Smoke: Gluster Build System <jenkins@build.gluster.org>
* cluster/ec: Introduce optimistic changelog in EC | Pranith Kumar K | 2017-03-04 | 1 | -1/+48
  Problem: The fix for https://bugzilla.redhat.com/show_bug.cgi?id=1316873 changed the code to set the dirty flag before every update fop, data or metadata, and unset it after a successful operation. That makes some fops, such as entry or metadata operations, very slow.

  Solution: File data operations are the only operations that take some time, and setting the dirty flag before such a fop and unsetting it after serves the purpose, as the probability of a fop failing is higher the longer it runs. For all the other operations, set the dirty flag at the end of the fop if any brick is down and needs heal.

  The following option is provided to choose between high performance and better heal marking for metadata and entry fops:

  Set/Unset dirty flag for every update fop at the start of the fop. If ON, this option impacts performance of entry operations or metadata operations, as it will set the dirty flag at the start and unset it at the end of ALL update fops. If OFF and all the bricks are good, the dirty flag will be set at the start only for file fops. For metadata and entry fops the dirty flag will not be set at the start if all the bricks are good. This does not impact performance for metadata and entry operations, but leaves a very small window in which an entry that needs heal can miss being marked dirty.

  Thanks to Xavi and Ashish for the design.

  Picked the .t file from Ashish's patch https://review.gluster.org/16298

  BUG: 1408809
  Change-Id: I3ce860063f0e2901e50754dcfc3e4ed22daf819f
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: https://review.gluster.org/16821
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
  Tested-by: Xavier Hernandez <xhernandez@datalab.es>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
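The ON/OFF behaviour described in the optimistic-changelog commit can be written as a small decision function. This is a reading of the commit message, not the code from the patch: `fop_kind_t`, `mark_dirty_at_start` and the parameter `always_mark_at_start` (standing for the option as the message words it, "set/unset dirty flag for every update fop at the start") are hypothetical names, and marking at the start when some brick is not good is an assumption drawn from the "if all the bricks are good" condition.

```c
#include <stdbool.h>

/* Hypothetical classification of update fops. */
typedef enum { FOP_DATA, FOP_METADATA, FOP_ENTRY } fop_kind_t;

/* Decide whether the dirty flag is set before the fop is sent.
 *  - option ON: every update fop is marked at the start;
 *  - option OFF: data fops are still marked at the start, while
 *    metadata and entry fops are marked up front only when some brick
 *    is already known not to be good (assumption, see lead-in). */
bool
mark_dirty_at_start(bool always_mark_at_start, fop_kind_t kind,
                    bool all_bricks_good)
{
    if (always_mark_at_start)
        return true;
    if (kind == FOP_DATA)
        return true;
    return !all_bricks_good;
}
```

With the option OFF and all bricks healthy, a `FOP_ENTRY` operation skips the initial xattrop entirely, which is where the performance gain for entry and metadata operations comes from.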
* cluster/ec: Don't trigger data/metadata heal on Lookups | Pranith Kumar K | 2017-02-26 | 1 | -14/+52
  Problem-1: If a Lookup, which doesn't take any locks, observes a version mismatch, it can't be trusted. If we launch a heal based on this information it will lead to self-heals which will affect I/O performance in the cases where the Lookup is wrong. Considering that the self-heal-daemon and operations on the inode from the client which take locks can still trigger heal, we can choose to not attempt a heal on Lookup.

  Problem-2: Fixed spurious failure of tests/bitrot/bug-1373520.t. For the issues above, what was happening was that ec_heal_inspect() was preventing 'name' heal from happening.

  Problem-3: tests/basic/ec/ec-background-heals.t. To be honest I don't know what the problem was; while fixing the 2 problems above, I made some changes to ec_heal_inspect() and ec_need_heal(), after which when I tried to recreate the spurious failure it just didn't happen even after a long time.

  BUG: 1414287
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Change-Id: Ife2535e1d0b267712973673f6d474e288f3c6834
  Reviewed-on: https://review.gluster.org/16468
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Ashish Pandey <aspandey@redhat.com>
* cluster/ec: Change level of messages to DEBUG | Ashish Pandey | 2017-01-27 | 1 | -2/+2
  Heal failure or success should not be logged as a warning. Whether heal is happening can be observed from heal info. If we need to debug a case where heal is not happening, we can set the log level to DEBUG.

  Change-Id: I347665c8c8b6223bb08a9f3dd5643a10ddc3b93e
  BUG: 1417050
  Signed-off-by: Ashish Pandey <aspandey@redhat.com>
  Reviewed-on: https://review.gluster.org/16473
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/disperse: Do not log fop failed for lockless fops | Ashish Pandey | 2017-01-19 | 1 | -12/+13
  Problem: Operation-failed messages are getting logged based on the callbacks of lockless fops. If a fop does not take a lock, it is possible that it will get some out-of-sync xattrs and iatts. We cannot depend on these callbacks to say that the fop has failed.

  Solution: Print failed messages only for locked fops. However, heal would still be triggered.

  Change-Id: I4427402c8c944c23f16073613caa03ea788bead3
  BUG: 1414287
  Signed-off-by: Ashish Pandey <aspandey@redhat.com>
  Reviewed-on: http://review.gluster.org/16435
  Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/ec: Fixing log message | Sunil Kumar H G | 2017-01-08 | 1 | -5/+10
  Updating the warning message with details to improve user understanding.

  BUG: 1409202
  Change-Id: I001f8d5c01c97fff1e4e1a3a84b62e17c025c520
  Signed-off-by: Sunil Kumar H G <sheggodu@redhat.com>
  Reviewed-on: http://review.gluster.org/16315
  Tested-by: Sunil Kumar Acharya
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* cluster/ec: Do lookup on an existing file in link | Pranith Kumar K | 2017-01-05 | 1 | -3/+4
  Problem: In the link fop, lookup happens on the new link, which doesn't exist yet, so the iatt that ec serves to parent xlators has size zero, which leads to 'cat' giving empty output.

  Fix: Change the code so that lookup happens on the existing link instead.

  BUG: 1409730
  Change-Id: I70eb02fe0633e61d1d110575589cc2dbe5235d76
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: http://review.gluster.org/16320
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
  Tested-by: Xavier Hernandez <xhernandez@datalab.es>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* cluster/ec: Fix lk-owner set race in ec_unlock | Pranith Kumar K | 2016-12-13 | 1 | -6/+8
  Problem: Rename takes two locks. There is a case where, when it tries to unlock, it sends an xattrop of the directory with the new version, and the callbacks of these two xattrops can be picked up by two separate epoll threads. Both of them will try to set the lk-owner for unlock in parallel on the same frame, so one of these unlocks will fail because the lk-owner doesn't match.

  Fix: Specify the lk-owner which will be set on the inodelk frame, so that it will not be overwritten by any other thread/operation.

  BUG: 1402710
  Change-Id: I666ffc931440dc5253d72df666efe0ef1d73f99a
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: http://review.gluster.org/16074
  Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/ec: fix unused variable warnings/errors | Kaleb S. KEITHLEY | 2016-09-16 | 1 | -2/+0
  http://review.gluster.org/14085 fixes a "pragma leak" where the generated rpc/xdr headers have a pair of pragmas that disable these warnings. With the warnings disabled, many unused variables have crept into the code base.

  And 14085 won't pass its own smoke test until all these warnings are fixed.

  BUG: 1369124
  Change-Id: I24607fc2082c3424f876f740a88fb7d0173d322d
  Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
  Reviewed-on: http://review.gluster.org/15518
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/ec: set/unset dirty flag for data/metadata update | Ashish Pandey | 2016-09-15 | 1 | -122/+162
  Currently, for all the update operations, metadata or data, we set the dirty flag at the end of the operation only if a brick is down. This delays healing and in some cases prevents it altogether.

  In this patch we set (+1) the dirty flag at the start of the metadata or data update operations, and after successful completion of the fop we unset (-1) it again.

  Change-Id: Ide5668bdec7b937a61c5c840cdc79a967598e1e9
  BUG: 1316873
  Signed-off-by: Ashish Pandey <aspandey@redhat.com>
  Reviewed-on: http://review.gluster.org/13733
  Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* cluster/ec: Add support for hardware acceleration | Xavier Hernandez | 2016-09-08 | 1 | -1/+1
  This patch implements functionality for fast encoding/decoding using hardware support. Currently, optimized support for x86_64, SSE and AVX is added.

  Additionally this patch implements a caching mechanism for inverse matrices to reduce computation time, as well as a new method for computing the inverse that takes quadratic time instead of cubic.

  Finally some unnecessary memory copies have been eliminated to further increase performance.

  Change-Id: I26c75f26fb4201bd22b51335448ea4357235065a
  BUG: 1289922
  Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
  Reviewed-on: http://review.gluster.org/12837
  Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: Prevent split-brain when bricks are brought off and on in cyclic order | Krutika Dhananjay | 2016-08-22 | 1 | -8/+8
  When the bricks are brought offline and then online in cyclic order while writes are in progress on a file, thanks to inode refresh in write txns, AFR will mostly fail the write attempt when the only good copy is offline. However, there is still a remote possibility that the file will run into split-brain if the brick that has the lone good copy goes offline *after* the inode refresh but *before* the write txn completes (I call it in-flight split-brain in the patch for ease of reference), requiring intervention from admin to resolve the split-brain before the IO can resume normally on the file.

  To get around this, the patch does the following things:
  i) retains the dirty xattrs on the file
  ii) avoids marking the last of the good copies as bad (or accused) in case it is the one to go down during the course of a write.
  iii) fails that particular write with the appropriate errno.

  This way, we still have one good copy left despite the split-brain situation which, when it is back online, will be chosen as source to do the heal.

  Change-Id: I9ca634b026ac830b172bac076437cc3bf1ae7d8a
  BUG: 1363721
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
  Reviewed-on: http://review.gluster.org/15080
  Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Ravishankar N <ravishankar@redhat.com>
  Reviewed-by: Oleksandr Natalenko <oleksandr@natalenko.name>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: Fix race in timer cancellation | Xavier Hernandez | 2016-06-13 | 1 | -15/+56
  A race in timer cancellation for delayed unlock could cause a crash if the cancelling thread fails to cancel the timer because it has already been fired but not executed, and the callback is scheduled out of the CPU, delaying it until the thread has released important resources needed by the callback.

  This patch improves the handling of this case to make it robust.

  Change-Id: I5c8a8c6610c5136f71b938aa78b5878ba05238d4
  BUG: 1345855
  Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
  Reviewed-on: http://review.gluster.org/14712
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: Fix invalid __fd_unref() call | Xavier Hernandez | 2016-06-13 | 1 | -4/+1
  __fd_unref() doesn't do any cleanup, so it cannot be called to release fd references, especially if it's the last reference. The code has been changed to avoid a call to this function.

  In the previous version we always tried to keep the newest fd in the ec_lock_t structure. However this is not necessary. We'll always keep one reference to an open file on the same inode. It's irrelevant if the reference is new or old.

  The function __fd_unref() has also been removed from fd.h to avoid being used in the future, since it's useless as it's defined now.

  Change-Id: Ia728777fc8e464758d5ea4d3bf020f0603919039
  BUG: 1344396
  Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
  Reviewed-on: http://review.gluster.org/14683
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Ashish Pandey <aspandey@redhat.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* core: assorted typos and spelling mistakes reported by Debian lintian | Kaleb S KEITHLEY | 2016-05-18 | 1 | -1/+1
  Also missing bang (!) in #!/bin/bash in shell scripts.

  Change-Id: I567a4be8f0f31f6285550f243fe802895f6bc43b
  BUG: 1336793
  Reported-by: Patrick Matthäi <pmatthaei@debian.org>
  Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com>
  Reviewed-on: http://review.gluster.org/14398
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Niels de Vos <ndevos@redhat.com>
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Kotresh HR <khiremat@redhat.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* cluster/ec: Fix issues with eager locking | Xavier Hernandez | 2016-05-02 | 1 | -68/+174
  Due to a race in timer cancellation, in some cases it was possible to release the lock while another concurrent fop that needed it continued execution as if it had not been released.

  This patch also fixes an issue that caused a lock to not be released if an error was found while preparing ec_update_size_version().

  Change-Id: I1344a3f5ecfc333f05a09e62653838264c9c26b1
  BUG: 1331254
  Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
  Reviewed-on: http://review.gluster.org/14112
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Chen Chen <chenchen@smartquerier.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* cluster/ec: Provide an option to enable/disable eager lock | Ashish Pandey | 2016-03-15 | 1 | -3/+6
  Problem: If a fop takes a lock and completes its operation, it waits for 1 second before releasing the lock. However, if ec finds any lock contention within this time period, it releases the lock immediately, before the time expires. As we take the lock on the first brick, for a few operations, like read, discovering the lock contention might take a long time and can degrade performance.

  Solution: Provide an option to enable/disable eager lock. If eager lock is disabled, the lock will be released as soon as the fop completes.

  gluster v set <VOLUME NAME> disperse.eager-lock on
  gluster v set <VOLUME NAME> disperse.eager-lock off

  Change-Id: I000985a787eba3c190fdcd5981dfbf04e64af166
  BUG: 1314649
  Signed-off-by: Ashish Pandey <aspandey@redhat.com>
  Reviewed-on: http://review.gluster.org/13605
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* cluster/ec: Fix invalid config check for directories | Xavier Hernandez | 2016-02-29 | 1 | -1/+2
  The trusted.ec.config xattr is not defined for directories. However sometimes it could be requested because the inode type of a directory can temporarily be IA_INVAL.

  Requesting such xattr using the xattrop fop when it doesn't exist returns a config value full of 0's, which is invalid and caused some fops to fail.

  This patch filters out this case by ignoring config xattr == 0.

  Change-Id: Ied51c35b313ea8c3eeae27812f9bae61d3808e92
  BUG: 1293223
  Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
  Reviewed-on: http://review.gluster.org/13446
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Ashish Pandey <aspandey@redhat.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/ec: Handle non-existent config xattr for non regular files | Xavier Hernandez | 2016-02-05 | 1 | -23/+25
  Since we now try to get the 'trusted.ec.config' xattr for inodes of type IA_INVAL (these inodes will be set to some valid type later), if that inode corresponds to a non-regular file, the xattr won't exist and we will treat this as an error when it is not one.

  This patch solves the problem by only considering errors for inodes that are already known to be regular files.

  Change-Id: Id72f314e209459236d75cf087fc51e09943756b4
  BUG: 1293223
  Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
  Reviewed-on: http://review.gluster.org/13238
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Ashish Pandey <aspandey@redhat.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/ec: Get size and config for invalid inode | Ashish Pandey | 2016-01-13 | 1 | -11/+20
  Problem: After creating an inode and before linking it to the inode table, if there is a setattr request for that file, it fails and leads to a crash. Before the inode is linked to the inode table, ia_type is IA_INVAL, which causes have_size and have_config to be zero.

  Solution: Check and get size and config if an inode is invalid.

  Change-Id: I0c0e564940b1b9f351369a76ab14f6b4aa81f23b
  BUG: 1293223
  Signed-off-by: Ashish Pandey <aspandey@redhat.com>
  Reviewed-on: http://review.gluster.org/13039
  Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* cluster/ec: Mark internal fops appropriately | Xavier Hernandez | 2015-11-19 | 1 | -14/+4
  1) Mark read fops in read-modify-write by EC as internal.
  2) Handle uid/gid set/reset correctly

  BUG: 1282761
  Change-Id: I5c1ce0cd6213367eaead5fed33aa2397c4e46df7
  Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
  Reviewed-on: http://review.gluster.org/12599
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: fix bug in update_good | Pranith Kumar K | 2015-11-11 | 1 | -5/+7
  Problem: Bricks that didn't participate in the fops are considered to be good. This happens in two ways.

  Case-1:
  1) 2+1 volume. The 'd1' directory on Brick-0 is bad.
  2) readdir takes locks and lock->good_mask is '7'
  3) readdir does xattrop and fop->mask is '6'.
  4) because fop->expected is '1', lock->good_mask remains '7'

  Case-2:
  1) When all the bricks are up, it does lock + xattrop before the op and figures out that all the bricks are good.
  2) By the time the second operation starts, brick-0 is down. Now lock->good_mask will always have the '0' bit set as long as the operations are happening on it, because in "lock->good_mask &= ~fop->mask | fop->remaining" fop->mask doesn't have the '0'th bit.
  3) When it comes time to perform the final xattrop in update_size_version, brick-0 comes online, because of which it gives the same version to brick-0 as well, thinking it has participated in all the transactions till then, even though it didn't.

  Fix:
  Case-1's fix: Update lock->good_mask in ec_prepare_update_cbk with the latest good/bad bricks.
  Case-2's fix: Consider a non-participating brick as bad.

  Change-Id: Ic01a733f8180131ded6a3cc784fcb1960758cf23
  BUG: 1276989
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: http://review.gluster.org/12561
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
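The difference behind Case-2 of the update_good commit can be shown with two simplified mask updates. Both functions are hypothetical illustrations of the two policies ("clear only bricks that failed" versus "keep only bricks that took part and succeeded"); neither is the expression used in the source tree, which may differ.

```c
#include <stdint.h>

/* Policy that keeps the problem: a brick is dropped from the good mask
 * only if it reported a failure. A brick that was never sent the fop
 * cannot fail, so it stays marked as good. */
uint64_t
keep_unless_failed(uint64_t good_mask, uint64_t failed)
{
    return good_mask & ~failed;
}

/* Policy matching "consider non-participating brick as bad": a brick
 * stays in the good mask only if it took part and succeeded. */
uint64_t
keep_only_successful(uint64_t good_mask, uint64_t succeeded)
{
    return good_mask & succeeded;
}
```

On a 2+1 volume with good_mask 7, a fop sent to bricks 1 and 2 only (mask 6) that succeeds on both leaves the mask at 7 under the first policy and reduces it to 6 under the second, so brick-0 is no longer given the same version in the final xattrop.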
* cluster/ec: Fix bad management of lock owners | Xavier Hernandez | 2015-11-05 | 1 | -8/+8
  Since the addition of the parallel reads patch for ec, a lock can have more than one owner at the same time. The list of owners was stored inside the 'owner_list' field of each fop. The problem was with fops that required more than one lock (like rename). In this case the same field was used to add the fop to more than one list, causing an overwrite of the previous list.

  This has been solved by moving the 'owner_list' field from the ec_fop_data_t structure to the ec_lock_link_t structure.

  Change-Id: I6042129f09082497b80782b5704a52c35c78f44d
  BUG: 1276031
  Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
  Reviewed-on: http://review.gluster.org/12445
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/ec: update version and size on good bricks | Ashish Pandey | 2015-10-28 | 1 | -10/+2
|
| Problem: readdir/readdirp fops call [f]xattrop with fop->good, which
| contains only one brick for these operations. That causes xattrop to
| fail, as it requires at least the "minimum" number of bricks.
|
| Solution: Use lock->good_mask to call xattrop. lock->good_mask contains
| all the good locked bricks on which the previous write operation was
| successful.
|
| Change-Id: If1b500391aa6fca6bd863702e030957b694ab499
| BUG: 1274629
| Signed-off-by: Ashish Pandey <aspandey@redhat.com>
| Reviewed-on: http://review.gluster.org/12419
| Tested-by: NetBSD Build System <jenkins@build.gluster.org>
| Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
| Tested-by: Xavier Hernandez <xhernandez@datalab.es>
| Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: Implement gfid-hash read-policy | Pranith Kumar K | 2015-10-09 | 1 | -9/+28
|
| Add a policy in ec to perform reads from the same bricks as long as
| they are good. Based on the gfid of the file/directory, it determines
| the bricks to be considered for reading.
|
| Change-Id: Ic97b5c54c086a28b5e07a330a4fd448551b49376
| BUG: 1261260
| Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
| Reviewed-on: http://review.gluster.org/12133
| Tested-by: NetBSD Build System <jenkins@build.gluster.org>
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
| Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
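One way such a policy could work is to hash the gfid into a starting brick index and then take the first good bricks from there, so the same file keeps reading from the same bricks while they stay good. The sketch below assumes this scheme; the FNV-1a hash, the function names and the parameters are all hypothetical, and the commit message does not say which hash or selection order ec actually uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hash of a 16-byte gfid (FNV-1a, chosen only for brevity). */
static uint32_t gfid_hash(const unsigned char gfid[16])
{
    uint32_t h = 2166136261u;
    for (int i = 0; i < 16; i++) {
        h ^= gfid[i];
        h *= 16777619u;
    }
    return h;
}

/* Pick 'fragments' good bricks, scanning from a gfid-derived start index.
 * Returns the bitmask of chosen bricks, or 0 if too few bricks are good. */
static uint64_t pick_read_bricks(const unsigned char gfid[16],
                                 uint64_t good_mask, int nodes, int fragments)
{
    assert(nodes > 0 && nodes <= 64 && fragments > 0 && fragments <= nodes);

    int first = (int)(gfid_hash(gfid) % (uint32_t)nodes);
    uint64_t picked = 0;
    int count = 0;

    for (int i = 0; i < nodes && count < fragments; i++) {
        uint64_t bit = (uint64_t)1 << ((first + i) % nodes);
        if (good_mask & bit) {
            picked |= bit;
            count++;
        }
    }
    return count == fragments ? picked : 0;
}

/* Helper: number of bits set in a brick mask. */
static int count_bits(uint64_t mask)
{
    int n = 0;
    while (mask) {
        mask &= mask - 1;
        n++;
    }
    return n;
}
```

The selection is deterministic for a given gfid and good mask, and a brick that turns bad is simply skipped in favour of the next good one in scan order.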
* cluster/ec: Allow read fops to be processed in parallel | Xavier Hernandez | 2015-08-29 | 1 | -160/+311
|
| Currently ec only sends a single read request at a time for a given
| inode. Since reads do not interfere with each other, this patch allows
| multiple concurrent read requests to be sent in parallel.
|
| Change-Id: If853430482a71767823f39ea70ff89797019d46b
| BUG: 1245689
| Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
| Reviewed-on: http://review.gluster.org/11742
| Tested-by: NetBSD Build System <jenkins@build.gluster.org>
| Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/ec: Fix tracking of good bricks | Xavier Hernandez | 2015-08-06 | 1 | -146/+49
|
| The bitmask of good and bad bricks was kept in the context of the
| corresponding inode or fd. This was problematic when an external
| process (another client or the self-heal process) healed the bricks
| but no one changed the bitmask of the other clients.
|
| This patch removes the bitmask stored in the context and calculates
| which bricks are healthy after locking them and doing the initial
| xattrop. After that, it is updated using the result of each fop.
|
| Change-Id: I225e31cd219a12af4ca58871d8a4bb6f742b223c
| BUG: 1236065
| Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
| Reviewed-on: http://review.gluster.org/11844
| Tested-by: NetBSD Build System <jenkins@build.gluster.org>
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
| Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: Minimize usage of EIO error | Xavier Hernandez | 2015-07-28 | 1 | -50/+127
|
| Change-Id: I82e245615419c2006a2d1b5e94ff0908d2f5e891
| BUG: 1245276
| Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
| Reviewed-on: http://review.gluster.org/11741
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
| Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
| Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* cluster/ec: Handle race between unlock-timer, new lock | Pranith Kumar K | 2015-07-23 | 1 | -19/+14
|
| Problem:
| A new lock could arrive while the timer is on its way to unlock. This
| was leading to a crash in the timer thread, because the thread
| executing the new lock can free timer_link->fop and the timer thread
| will then try to access structures that are already freed.
|
| Fix:
| If the timer event is fired, set lock->release to true and wait for
| the unlock to complete.
|
| Thanks to Xavi and Bhaskar for helping to confirm that this race is
| the root cause. Thanks to Kritika for pointing out and explaining how
| Avati's patch can be used to fix this bug.
|
| Change-Id: I45fa5470bbc1f03b5f3d133e26d1e0ab24303378
| BUG: 1243187
| Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
| Reviewed-on: http://review.gluster.org/11670
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
| Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
| Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* cluster/ec: wind readlink on good subvol(s) | Pranith Kumar K | 2015-07-14 | 1 | -6/+10
|
| BUG: 1232172
| Change-Id: I3a56e487840d86147dd85bf5fbe79b165eae289f
| Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
| Reviewed-on: http://review.gluster.org/11589
| Tested-by: NetBSD Build System <jenkins@build.gluster.org>
| Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/ec: Prevent data corruptions | Pranith Kumar K | 2015-07-14 | 1 | -3/+20
|
| - On lock reuse preserve 'healing' bits
| - Don't set ctx->size outside locks in healing code
| - Allow xattrop internal fops also on the fop->mask.
|
| Change-Id: I6b76da5d7ebe367d8f3552cbf9fd18e556f2a171
| BUG: 1232678
| Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
| Reviewed-on: http://review.gluster.org/11640
| Tested-by: NetBSD Build System <jenkins@build.gluster.org>
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
| Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* cluster/ec: Don't read from bricks that are healing | Pranith Kumar K | 2015-07-09 | 1 | -1/+1
|
| BUG: 1232678
| Change-Id: I35503039e4723cf7f33d6797f0ba90dd0aca130b
| Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
| Reviewed-on: http://review.gluster.org/11580
| Tested-by: NetBSD Build System <jenkins@build.gluster.org>
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
| Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* cluster/ec: Fix use after free bug | Pranith Kumar K | 2015-07-07 | 1 | -0/+8
|
| In ec_lock() there is a chance that ec_resume is called on the fop
| even before ec_sleep. This can result in refs == 0 for the fop, leading
| to a use after free in this function when it calls ec_sleep. So do
| ec_sleep at the start and ec_resume at the end of this function.
|
| Change-Id: I879b2667bf71eaa56be1b53b5bdc91b7bb56c650
| BUG: 1240284
| Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
| Reviewed-on: http://review.gluster.org/11558
| Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
| Tested-by: NetBSD Build System <jenkins@build.gluster.org>
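The "sleep at the start, resume at the end" idea is a guard reference held for the duration of the function. A minimal single-threaded sketch follows, under the assumption that the operation is considered finished when its pending-job count drops to zero; `struct op`, `op_sleep`, `op_resume` and `op_start` are hypothetical names and do not reproduce the real ec_sleep/ec_resume logic.

```c
#include <assert.h>

/* Hypothetical operation: finished (and no longer safe to touch) once the
 * number of pending jobs returns to zero. */
struct op {
    int jobs;
    int finished;
};

static void op_sleep(struct op *o)
{
    assert(!o->finished);   /* touching a finished op models use after free */
    o->jobs++;
}

static void op_resume(struct op *o)
{
    assert(o->jobs > 0);
    if (--o->jobs == 0)
        o->finished = 1;
}

/* Issue 'requests' sub-requests whose callbacks may complete immediately.
 * The guard taken first keeps the op alive until the function is done. */
static void op_start(struct op *o, int requests)
{
    op_sleep(o);                 /* guard: taken at the start */
    for (int i = 0; i < requests; i++) {
        op_sleep(o);             /* one pending job per sub-request */
        op_resume(o);            /* simulated callback that fires at once */
    }
    op_resume(o);                /* guard dropped at the end */
}
```

Without the guard, the first immediate callback would bring the count to zero and finish the operation while the function is still using it.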
* cluster/ec: Don't read from bad subvols | Pranith Kumar K | 2015-07-06 | 1 | -18/+23
|
| Change-Id: Ic22813371faca4e8198c9b0b20518e68d275f3c1
| BUG: 1232678
| Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
| Reviewed-on: http://review.gluster.org/11531
| Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
| Tested-by: NetBSD Build System <jenkins@build.gluster.org>
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
* ec: Porting messages to new logging framework | Nandaja Varma | 2015-06-26 | 1 | -38/+65
|
| Change-Id: Ia05ae750a245a37d48978e5f37b52f4fb0507a8c
| BUG: 1194640
| Signed-off-by: Nandaja Varma <nandaja.varma@gmail.com>
| Reviewed-on: http://review.gluster.org/10465
| Tested-by: NetBSD Build System <jenkins@build.gluster.org>
| Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* cluster/ec: wind fops on good subvols for access/readdir[p] | Pranith Kumar K | 2015-06-26 | 1 | -6/+22
|
| Change-Id: I1e629a6adc803c4b7164a5a7a81ee5cb1d0e139c
| BUG: 1232172
| Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
| Reviewed-on: http://review.gluster.org/11246
| Tested-by: NetBSD Build System <jenkins@build.gluster.org>
| Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* cluster/ec: Avoid parallel executions of the same state machine | Xavier Hernandez | 2015-06-18 | 1 | -11/+13
|
| In very rare circumstances it was possible for a subfop started by
| another fop to finish fast enough that two or more instances of the
| same state machine were executing at the same time.
|
| Change-Id: I319924a18bd3f88115e751a66f8f4560435e0e0e
| BUG: 1233258
| Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
| Reviewed-on: http://review.gluster.org/11317
| Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
| Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: Prevent Null dereference in dht-rename | Pranith Kumar K | 2015-06-12 | 1 | -1/+1
|
| Change-Id: I3059f3b577f550c92fb77c6b6b44defd0584cd2e
| BUG: 1230647
| Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
| Reviewed-on: http://review.gluster.org/11178
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
| Tested-by: NetBSD Build System <jenkins@build.gluster.org>
| Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/ec: Wind unlock fops at all cost | Pranith Kumar K | 2015-06-10 | 1 | -3/+24
|
| Problem:
| While files are being created, if more than the redundancy number of
| bricks go down, the unlock for these fops does not go to the bricks.
| This leaves stale locks, which lead to hangs.
|
| Fix:
| Wind unlock fops at all costs.
|
| Change-Id: I50a87e8b4d6d2dde5bf7405b82e3aeecd95ad00e
| BUG: 1220348
| Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
| Reviewed-on: http://review.gluster.org/11152
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
| Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* cluster/ec: Prevent double unwind | Pranith Kumar K | 2015-06-08 | 1 | -4/+2
|
| Problem:
| 1) The ec_access/ec_readlink/ec_readdir[p] _cbks try to recover only
|    from ENOTCONN.
| 2) When the fop succeeds it unwinds right away. But when its
|    ec_fop_manager resumes, if the number of bricks that are up is less
|    than ec->fragments, the state machine will resume with
|    -EC_STATE_REPORT, which unwinds again. This will lead to crashes.
|
| Fix:
| - If the fop fails, retry on other subvols, as ESTALE/ENOENT/EBADFD
|   etc. are also recoverable.
| - Unwind success/failure in the _cbks.
|
| Change-Id: I2cac3c2f9669a4e6160f1ff4abc39f0299303222
| BUG: 1228952
| Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
| Reviewed-on: http://review.gluster.org/11111
| Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
| Tested-by: Gluster Build System <jenkins@build.gluster.com>