summaryrefslogtreecommitdiffstats
path: root/xlators/cluster/ec/src/ec-heal.c
Commit message (Collapse)AuthorAgeFilesLines
* cluster/ec: Fix tracking of good bricksXavier Hernandez2015-08-141-6/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | The bitmask of good and bad bricks was kept in the context of the corresponding inode or fd. This was problematic when an external process (another client or the self-heal process) did heal the bricks but no one changed the bitmaks of other clients. This patch removes the bitmask stored in the context and calculates which bricks are healthy after locking them and doing the initial xattrop. After that, it's updated using the result of each fop. > Change-Id: I225e31cd219a12af4ca58871d8a4bb6f742b223c > BUG: 1236065 > Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> > Reviewed-on: http://review.gluster.org/11844 > Tested-by: NetBSD Build System <jenkins@build.gluster.org> > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Change-Id: Idbe68b28b865c4b28366703ad1e96ae16ba44b66 BUG: 1235964 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/11867 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: Minimize usage of EIO errorXavier Hernandez2015-08-081-10/+17
| | | | | | | | | | | | | | | | | | >Change-Id: I82e245615419c2006a2d1b5e94ff0908d2f5e891 >BUG: 1245276 >Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> >Reviewed-on: http://review.gluster.org/11741 >Tested-by: Gluster Build System <jenkins@build.gluster.com> >Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> >Tested-by: NetBSD Build System <jenkins@build.gluster.org> Change-Id: Ifd3d63f88a686a2963c5ba2e62110249f84f338d BUG: 1250864 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/11852 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/ec: Prevent data corruptionsPranith Kumar K2015-07-211-11/+11
| | | | | | | | | | | | | | | | | | | | | - On lock reuse preserve 'healing' bits - Don't set ctx->size outside locks in healing code - Allow xattrop internal fops also on the fop->mask. >Change-Id: I6b76da5d7ebe367d8f3552cbf9fd18e556f2a171 >BUG: 1232678 >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> >Reviewed-on: http://review.gluster.org/11640 >Tested-by: NetBSD Build System <jenkins@build.gluster.org> >Tested-by: Gluster Build System <jenkins@build.gluster.com> >Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> BUG: 1243647 Change-Id: I1b3828e4d4a863b84b2c4e732e7965d1302cea47 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11686 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/ec: Remove failed subvols from source/sink computationPranith Kumar K2015-07-211-1/+6
| | | | | | | | | | | | | | | | | >Change-Id: Ib0de34c86ee25de361ec821d4015296c514742e0 >BUG: 1240210 >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> >Reviewed-on: http://review.gluster.org/11546 >Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> >Tested-by: NetBSD Build System <jenkins@build.gluster.org> >Tested-by: Gluster Build System <jenkins@build.gluster.com> BUG: 1243644 Change-Id: I41817ca933bb1eecdb3e895a16753226b608814a Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11681 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* cluster/ec: Make background healing optional behaviorPranith Kumar K2015-07-211-8/+6
| | | | | | | | | | | | | | | | | | Provide options to control number of active background heal count and qlen. >Change-Id: Idc2419219d881f47e7d2e9bbc1dcdd999b372033 >BUG: 1237381 >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> >Reviewed-on: http://review.gluster.org/11473 >Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> >Tested-by: Gluster Build System <jenkins@build.gluster.com> BUG: 1238476 Change-Id: I22ba902d9911195656db9e458c01b54cf0afcd7a Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11680 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* cluster/ec: Add throttling in background healingPranith Kumar K2015-07-211-5/+104
| | | | | | | | | | | | | | | | | | | | | - 8 parallel heals can happen. - 128 heals will wait for their turn - Heals will be rejected if 128 heals are already waiting. >Change-Id: I2e99bf064db7bce71838ed9901a59ffd565ac390 >BUG: 1237381 >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> >Reviewed-on: http://review.gluster.org/11471 >Tested-by: Gluster Build System <jenkins@build.gluster.com> >Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> >Tested-by: NetBSD Build System <jenkins@build.gluster.org> BUG: 1238476 Change-Id: Id9625536197c1f5d6c3a9ed53f80bd9e6d14c507 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11679 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* cluster/ec: Remove dead codePranith Kumar K2015-07-061-1376/+2
| | | | | | | | | | | | | | | | | | | | BUG: 1238476 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> > Change-Id: I99d7a038f29cebe823e17a8dda40d335441185bc > BUG: 1237381 > Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> > Reviewed-on: http://review.gluster.org/11472 > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> > Tested-by: NetBSD Build System <jenkins@build.gluster.org> > (cherry picked from commit c66026b9bf521172f49ce36a5a7b94fae1bbf267) Change-Id: If3cb2ca7ea5e3c2c365b414ccc41323bf9d183c1 Reviewed-on: http://review.gluster.org/11497 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* ec: Porting messages to new logging frameworkNandaja Varma2015-06-271-51/+61
| | | | | | | | | | | | | | | | | This is a backport of http://review.gluster.org/#/c/10465/ cherry-picked from commit b0b9eaea9dbb4e9a535f5e969defc4556a9e2204 >Change-Id: Ia05ae750a245a37d48978e5f37b52f4fb0507a8c >BUG: 1194640 >Signed-off-by: Nandaja Varma <nandaja.varma@gmail.com> Change-Id: Ia05ae750a245a37d48978e5f37b52f4fb0507a8c BUG: 1217722 Signed-off-by: Nandaja Varma <nandaja.varma@gmail.com> Reviewed-on: http://review.gluster.org/11429 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* EC : While Healing a file, set the config xattrAshish Pandey2015-06-261-1/+14
| | | | | | | | | | | | | | Problem : trusted.ec.config attr was missing for the healed file Solution: Writing trusted.ec.config while healing a file. Change-Id: I340dd45ff8ab5bc1cd6e9b0cd2b2ded236e5acf0 BUG: 1235629 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/11415 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/ec: Ignore differences in non locked inodesXavier Hernandez2015-05-301-7/+28
| | | | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/10974 When ec combines iatt structures from multiple bricks, it checks for equality in important fields. This is ok for iatt related to inodes involved in the operation that have been locked before starting execution. However some fops return iatt information from other inodes. For example a rename locks source and destination parent directories, but it also returns an iatt from the entry itself. In these cases we ignore differences in some fields to avoid false detection of inconsistencies and trigger unnecessary self-heals. Another issue is solved in this patch that caused that the real size of the file stored into the inode context was lost during self-heal. BUG: 1225796 Change-Id: I29f328a7b4895368ded859f3bae0359436c3588f Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/10983 Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/ec: Fix all EIO errors in ECPranith Kumar K2015-05-281-51/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/10770 Backport of http://review.gluster.org/10806 Backport of http://review.gluster.org/10787 Backport of http://review.gluster.org/10868 Backport of http://review.gluster.com/10852 - When a blocking lock is requested, lock request is succeeded even when ec->fragment number of locks are acquired successfully in non-blocking locking phase. This will lead to fop succeeding only on the bricks where the locks are acquired, leading to the necessity of self-heals. To prevent these un-necessary self-heals, if the remaining locks fail with EAGAIN in non-blocking lock phase try blocking locking phase instead. - Handle lookup failures while op in progress - cluster/ec: Correctly cleanup delayed locks When a delayed lock is pending, a graph switch doesn't correctly terminate it. This means that the update of version and size xattrs is lost, causing EIO errors. This patch handles GF_EVENT_PARENT_DOWN event to correctly finish pending udpdates before completing the graph switch. - Fix use after free crash ec_heal creates ec_fop_data but doesn't run ec_manager. ec_fop_data_allocate adds this fop to ec->pending_fops, because ec_manager is not run on this heal fop it is never removed from ec->pending_fops. When it is accessed after free it leads to crash. It is better to not to add HEAL fops to ec->pending_fops because we don't want graph switch to hang the mount because of a BIG file/directory heal. - Forced unlock when lock contention is detected EC uses an eager lock mechanism to optimize multiple read/write requests on the same entry or inode. This increases performance but can have adverse results when other clients try to access the same entry/inode. To solve this, this patch adds a functionality to detect when this happens and force an earlier release to not block other clients. The method consists on requesting GF_GLUSTERFS_INODELK_COUNT and GF_GLUSTERFS_ENTRYLK_COUNT for all fops that take a lock. When this count is greater than one, the lock is marked to be released. All fops already waiting for this lock will be executed normally before releasing the lock, but new requests that also require it will be blocked and restarted after the lock has been released and reacquired again. Another problem was that some operations did correctly lock the parent of an entry when needed, but got the size and version xattrs from the entry instead of the parent. This patch solves this problem by binding all queries of size and version to each lock and replacing all entrylk calls by inodelk ones to remove concurrent updates on directory metadata. This also allows rename to correctly update source and destination directories. BUG: 1225279 Change-Id: I02a6084b138dd38e018a462347cd9ce38610c7ef Reviewed-on: http://review.gluster.org/10926 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* ec: Fix failures with missing filesXavier Hernandez2015-05-091-32/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.com/9407 When a file does not exist on a brick but it does on others, there could be problems trying to access it because there was some loc_t structures with null 'pargfid' but 'name' was set. This forced inode resolution based on <pargfid>/name instead of <gfid> which would be the correct one. To solve this problem, 'name' is always set to NULL when 'pargfid' is not present. Another problem was caused by an incorrect management of errors while doing incremental locking. The only allowed error during an incremental locking was ENOTCONN, but missing files on a brick can be returned as ESTALE. This caused an EIO on the operation. This patch doesn't care of errors during an incremental locking. At the end of the operation it will check if there are enough successfully locked bricks to continue or not. BUG: 1220011 Change-Id: I4a1e6235d80e20ef7ef12daba0807b859ee5c435 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/10701 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/ec: Change meaning of trusted.ec.dirtyPranith Kumar K2015-05-081-168/+439
| | | | | | | | | | | | | | | | | - With this change, the xattr will represent if the file needs to be healed or not. It will have different values for data/entry and metadata changes. - inode ref leaks and dict_set_dynstr related leaks fixed - Added support for trylock/lock based on heal-cmd execution or not in data heal. - Made fixes to pass regression runs Change-Id: I9d8def4c2badde18a76b7898816fecfac113737a BUG: 1216303 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10385 Reviewed-on: http://review.gluster.org/10693 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/ec: data heal implementation for ecPranith Kumar K2015-05-081-7/+786
| | | | | | | | | | | | | | | | | | | | | | | | | Data self-heal: 1) Take inode lock in domain 'this->name:self-heal' on 0-0 range (full file), So that no other processes try to do self-heal at the same time. 2) Take inode lock in domain 'this->name' on 0-0 range (full file), 3) perform fxattrop+fstat and get the xattrs on all the bricks 3) Choose the brick with ec->fragment number of same version as source 4) Truncate sinks 5) Unlock lock taken in 2) 5) For each block take full file lock, Read from sources write to the sinks, Unlock 6) Take full file lock and see if the file is still sane copy i.e. File didn't become unusable while the bricks are offline. Update mtime to before healing 7) xattrop with -ve values of 'dirty' and difference of highest and its own version values for version xattr 8) unlock lock acquired in 6) 9) unlock lock acquired in 1) Change-Id: I6f4d42cd5423c767262c9d7bb5ca7767adb3e5fd BUG: 1216303 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10384 Reviewed-on: http://review.gluster.org/10692 Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/ec: metadata/name/entry heal implementation for ecPranith Kumar K2015-05-081-0/+1053
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Metadata self-heal: 1) Take inode lock in domain 'this->name' on 0-0 range (full file) 2) perform lookup and get the xattrs on all the bricks 3) Choose the brick with highest version as source 4) Setattr uid/gid/permissions 5) removexattr stale xattrs 6) Setxattr existing/new xattrs 7) xattrop with -ve values of 'dirty' and difference of highest and its own version values for version xattr 8) unlock lock acquired in 1) Entry self-heal: 1) take directory lock in domain 'this->name:self-heal' on 'NULL' to prevent more than one self-heal 2) we take directory lock in domain 'this->name' on 'NULL' 3) Perform lookup on version, dirty and remember the values 4) unlock lock acquired in 2) 5) readdir on all the bricks and trigger name heals 6) xattrop with -ve values of 'dirty' and difference of highest and its own version values for version xattr 7) unlock lock acquired in 1) Name heal: 1) Take 'name' lock in 'this->name' on 'NULL' 2) Perform lookup on 'name' and get stat and xattr structures 3) Build gfid_db where for each gfid we know what subvolumes/bricks have a file with 'name' 4) Delete all the stale files i.e. the file does not exist on more than ec->redundancy number of bricks 5) On all the subvolumes/bricks with missing entry create 'name' with same type,gfid,permissions etc. 6) Unlock lock acquired in 1) Known limitation: At the moment with present design, it conservatively preserves the 'name' in case it can not decide whether to delete it. this can happen in the following scenario: 1) we have 3=2+1 (bricks: A, B, C) ec volume and 1 brick is down (Lets say A) 2) rename d1/f1 -> d2/f2 is performed but the rename is successful only on one of the bricks (Lets say B) 3) Now name self-heal on d1 and d2 would re-create the file on both d1 and d2 resulting in d1/f1 and d2/f2. Because we wanted to prevent data loss in the case above, the following scenario is not healable, i.e. it needs manual intervention: 1) we have 3=2+1 (bricks: A, B, C) ec volume and 1 brick is down (Lets say A) 2) We have two hard links: d1/a, d2/b and another file d3/c even before the brick went down 3) rename d3/c -> d2/b is performed 4) Now name self-heal on d2/b doesn't heal because d2/b with older gfid will not be deleted. One could think why not delete the link if there is more than 1 hardlink, but that leads to similar data loss issue I described earlier: Scenario: 1) we have 3=2+1 (bricks: A, B, C) ec volume and 1 brick is down (Lets say A) 2) We have two hard links: d1/a, d2/b 3) rename d1/a -> d3/c, d2/b -> d4/d is performed and both the operations are successful only on one of the bricks (Lets say B) 4) Now name self-heal on the 'names' above which can happen in parallel can decide to delete the file thinking it has 2 links but after all the self-heals do unlinks we are left with data loss. Change-Id: I3a68218a47bb726bd684604efea63cf11cfd11be BUG: 1216303 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10298 Reviewed-on: http://review.gluster.org/10691 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System
* cluster/ec: add separate versions for data/entry, metadataAshish Pandey2015-05-081-2/+4
| | | | | | | | | | | | | | | | | | | Adding 64 bits in "version" key of extended attributes. First 64 bits (Left) represents Data version. Last 64 bits (right) represents Meta Data version. Note: 3.7 and 3.6 version ec can't co-exist with this change because xattrop in 3.6 will fail with ERANGE as the buffer passed to it will be '8' bytes where as the value will be 16 bytes in 3.7. Where as 3.7 version clients can work with old version files. For upgrades we need to tell users to complete heals and then upgrade BUG: 1215265 Change-Id: Ib85114680cb7e75b8371c984d9f7b6401c1ffb93 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/10312 Reviewed-on: http://review.gluster.org/10626 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: Entry self-heal fixesPranith Kumar K2015-04-131-79/+27
| | | | | | | | | | | | | | | | - Directory deletion should always happen with 'rm -rf' flag, otherwise the call may fail with ENOTEMPTY. - Instead of doing an explicit 'link' call, perform mknod call with GLUSTERFS_INTERNAL_FOP_KEY which acts as 'link' if the gfid already exists. Change-Id: I8826f92170421db37efb67dfc00afad4ab695907 BUG: 1207085 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10045 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* Avoid conflict between contrib/uuid and system uuidEmmanuel Dreyfus2015-04-041-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | glusterfs relies on Linux uuid implementation, which API is incompatible with most other systems's uuid. As a result, libglusterfs has to embed contrib/uuid, which is the Linux implementation, on non Linux systems. This implementation is incompatible with systtem's built in, but the symbols have the same names. Usually this is not a problem because when we link with -lglusterfs, libc's symbols are trumped. However there is a problem when a program not linked with -lglusterfs will dlopen() glusterfs component. In such a case, libc's uuid implementation is already loaded in the calling program, and it will be used instead of libglusterfs's implementation, causing crashes. A possible workaround is to use pre-load libglusterfs in the calling program (using LD_PRELOAD on NetBSD for instance), but such a mechanism is not portable, nor is it flexible. A much better approach is to rename libglusterfs's uuid_* functions to gf_uuid_* to avoid any possible conflict. This is what this change attempts. BUG: 1206587 Change-Id: I9ccd3e13afed1c7fc18508e92c7beb0f5d49f31a Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/10017 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* cluster/ec: Allow heal on name less locPranith Kumar K2015-03-051-7/+33
| | | | | | | | | | | | loc->parent may not always be populated. Even in those cases, self-heal should happen if it can be completed using nameless loc. Change-Id: I8871fc811bec8b881ae7fb09dcd202c6693b9877 BUG: 1177601 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9717 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* ec: Add trusted.ec.dirty xattrXavier Hernandez2015-02-231-36/+57
| | | | | | | | | | | | | | | | | This xattr will be incremented before each data modifying operation and decremented after it. This will add the possibility to detect partially updated writes and refuse them on reads. It will also be useful for interacting with index xlator and have a way to heal dispersed files from the self-heal daemon. Change-Id: Ie644a8dd074ae0f254c809c5863bdb030be5486a BUG: 1190581 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9607 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* ec: Fix posix compliance failuresXavier Hernandez2015-01-281-13/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch solves some problems that caused dispersed volumes to not pass posix smoke tests: * Problems in open/create with O_WRONLY Opening files with -w- permissions using O_WRONLY returned an EACCES error because internally O_WRONLY was replaced with O_RDWR. * Problems with entrylk on renames. When source and destination were the same, ec tried to acquire the same entrylk twice, causing a deadlock. * Overwrite of a variable when reordering locks. On a rename, if the second lock needed to be placed at the beggining of the list, the 'lock' variable was overwritten and later its timer was cancelled, cancelling the incorrect one. * Handle O_TRUNC in open. When O_TRUNC was received in an open call, it was blindly propagated to child subvolumes. This caused a discrepancy between real file size and the size stored into trusted.ec.size xattr. This has been solved by removing O_TRUNC from open and later calling ftruncate. Change-Id: I20c3d6e1c11be314be86879be54b728e01013798 BUG: 1161886 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9420 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* ec: Remove O_APPEND from flags on create and open.Xavier Hernandez2015-01-091-1/+1
| | | | | | | | | | | | | | | | | | | Allowing O_APPEND flag to pass through to the brick files corrupts fragment contents because writes are not stored on the desired place. Write fop has been modified so that it uses current file size as its write offset. This guarantees that all writes, even those comming from different file descriptors and clients, will write to the end of the file. Change-Id: I9f721f12217a98231fe52e344166d1c94172c272 BUG: 1161621 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9079 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/ec: Do not modify quota, selinux xattrs in healingPranith Kumar K2015-01-081-14/+38
| | | | | | | | | | | | | | | | | Problem: EC heal tries to heal quota-size, selinux xattrs as well. quota-size is private to the brick but since quotad accesses them using the standard interface as well, they can not be filtered in the fops. Fix: Ignore QUOTA_SIZE_KEY and SELINUX xattrs during heal. Change-Id: I1572f9e2fcba7f120b4265e034953a15ff297f04 BUG: 1179640 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9401 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* ec: Fix return errors when not enough bricksXavier Hernandez2014-12-051-0/+1
| | | | | | | | | | | | | | | | | | | | | Changes introduced by this patch: * Fix an incorrect error propagation when the state of the life cycle of a fop returns an error. * Fix incorrect unlocking of failed locks. * Return ENOTCONN if there aren't enough bricks online. * In readdir(p) check that the fd has been successfully open by a previous opendir. Change-Id: Ib44f25a1297849ebcbab839332f3b6359f275ebe BUG: 1162805 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9098 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* ec: Fix self-healing issues.Xavier Hernandez2014-12-041-45/+137
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Three problems have been detected: 1. Self healing is executed in background, allowing the fop that detected the problem to continue without blocks nor delays. While this is quite interesting to avoid unnecessary delays, it can cause spurious failures of self-heal because it may try to recover a file inside a directory that a previous self-heal has not recovered yet, causing the file self-heal to fail. 2. When a partial self-heal is being executed on a directory, if a full self-heal is attempted, it won't be executed because another self-heal is already in process, so the directory won't be fully repaired. 3. Information contained in loc's of some fop's is not enough to do a complete self-heal. To solve these problems, I've made some changes: * Improved ec_loc_from_loc() to add all available information to a loc. * Before healing an entry, it's parent is checked and partially healed if necessary to avoid failures. * All heal requests received for the same inode while another self-heal is being processed are queued. When the first heal completes, all pending requests are answered using the results of the first heal (without full execution), unless the first heal was a partial heal. In this case all partial heals are answered, and the first full heal is processed normally. * An special virtual xattr (not physically stored on bricks) named 'trusted.ec.heal' has been created to allow synchronous self-heal of files. Now, the recommended way to heal an entire volume is this: find <mount> -d -exec getfattr -h -n trusted.ec.heal {} \; Some minor changes: * ec_loc_prepare() has been renamed to ec_loc_update(). * All loc management functions return 0 on success and -1 on error. * Do not delay fop unlocks if heal is needed. * Added basic ec xattrs initially on create, mkdir and mknod fops. * Some coding style changes Change-Id: I2a5fd9c57349a153710880d6ac4b1fa0c1475985 BUG: 1161588 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9072 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* ec: Change licenseXavier Hernandez2014-12-031-16/+6
| | | | | | | | | | Change-Id: Iae90ade2421898417b53dec0417a610cf306c44b BUG: 1168167 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9201 Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* ec: Fix self-heal issuesXavier Hernandez2014-10-211-17/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Doing an 'ls' of a directory that has been modified while one of the bricks was down, sometimes returns the old directory contents. Cause: Directories are not marked when they are modified as files are. The ec xlator balances requests amongst available and healthy bricks. Since there is no way to detect that a directory is out of date in one of the bricks, it is used from time to time to return the directory contents. Solution: Basically the solution consists in use versioning information also for directories, however some additional changes have been necessary. Changes: * Use directory versioning: This required to lock full directory instead of a single entry for all requests that add or remove entries from it. This is needed to allow atomic version update. This affects the following fops: create, mkdir, mknod, link, symlink, rename, unlink, rmdir Another side effect is that opendir requires to do a previous lookup to get versioning information and discard out of date bricks for subsequent readdir(p) calls. * Restrict directory self-heal: Till now, when one discrepancy was found in lookup, a self-heal was automatically started. This caused the versioning information of a bad directory to be healed instantly, making the original problem to reapear again. To solve this, when a missing directory is detected in one or more bricks on lookup or opendir fops, only a partial self-heal is performed on it. A partial self-heal basically creates the directory but does not restore any additional information. This avoids that an 'ls' could repair the directory and cause the problem to happen again. With this change, output of 'ls' is always consistent. However, since the directory has been created in the brick, this allows any other operation on it (create new files, for example) to succeed on all bricks and not add additional work to the self-heal process. To force a self-heal of a directory, any other operation must be done on it. For example a getxattr. With these changes, the correct healing procedure that would avoid inconsistent directory browsing consists on a post-order traversal of directoriesi being healed. This way, the directory contents will be healed before healing the directory itslef. * Additional changes to fix self-heal errors - Don't use fop->fd to decide between fd/loc. open, opendir and create have an fd, but the correct data is in loc. - Fix incorrect management of bad bricks per inode/fd. - Fix incorrect selection of fop's target bricks when there are bad bricks involved. - Improved ec_loc_parent() to always return a parent loc as complete as possible. Change-Id: Iaf3df174d7857da57d4a87b4a8740a7048b366ad BUG: 1149726 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8916 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* ec: Fix incorrect management of healed bricksXavier Hernandez2014-10-201-6/+8
| | | | | | | | | | | | | | | | | | The final lookup made to restore final file attributes after a self-heal did clear the mask of bad bricks, causing that the final setattr won't modify any brick at all. This caused that some attriutes, specially the modification time of the file didn't get updated properly. Now the mask of healed bricks is saved before doing the last lookup. It's also used to correctly report the repaired bricks. Change-Id: Ib94083c9e1b562515dfb54f9574120f1f031dccc BUG: 1149723 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8905 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* ec: Fix invalid inode lock in ftruncateXavier Hernandez2014-09-191-2/+2
| | | | | | | | | | | | | | | | | | | The fops 'truncate' and 'ftruncate' share some code and inodelk() was always made against the inode inside the loc_t structure instead of that of fd_t. Since ftruncate has the loc initialized to NULL, this fop was executed without any lock, allowing some concurrent modifications in the file size. Also changed the way in which 'fop' and 'ffop' are differentiated in shared code. Now it uses 'id' field instead of checking if 'fd' is NULL. Change-Id: Ibd18accf2652193b395a841b9029729e5f4867c6 BUG: 1140396 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8695 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* ec: Only heal data/metadata when inode has enough informationXavier Hernandez2014-09-151-0/+8
| | | | | | | | | | | | | | Sometimes loc_t structure in a heal request doesn't contain enough information to do an inodelk call (basically the gfid is missing). In these cases, self heal only recovers entry information. Change-Id: I459990c7df728ff4baf164df046672ddcde3efa5 BUG: 1122581 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8368 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* porting: include compat-errno.h for errno'sHarshavardhana2014-07-221-0/+1
| | | | | | | | | | | | | | | disperse module fails to compile since ENODATA is non-existent on FreeBSD/Darwin Use errno conversion in compat-errno.h to avoid build issues. Change-Id: I8203b7195c198c77202bde9bbec1813a487f923a BUG: 1111774 Signed-off-by: Harshavardhana <harsha@harshavardhana.net> Reviewed-on: http://review.gluster.org/8320 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Humble Devassy Chirammal <humble.devassy@gmail.com>
* ec: Fixed coveriry scan issuesXavier Hernandez2014-07-211-28/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CID list: 1226163 Logically dead code 1226166 Missing break in switch 1226167 Missing break in switch 1226168 Missing break in switch 1226169 Missing break in switch 1226170 Missing break in switch 1226171 Missing break in switch 1226172 Missing break in switch 1226173 Missing break in switch 1226174 Missing break in switch 1226175 Missing break in switch 1226176 Missing break in switch 1226177 Missing break in switch 1226178 Data race condition 1226179 Data race condition 1226180 Data race condition 1226181 Thread deadlock 1226182 Uninitialized pointer read 1226183 Uninitialized pointer read 1226184 Read from pointer after free Change-Id: I4d33aa42289371927175c43bb29e018df64fb943 BUG: 789278 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8317 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/ec: Added erasure code translatorXavier Hernandez2014-07-111-0/+1470
Change-Id: I293917501d5c2ca4cdc6303df30cf0b568cea361 BUG: 1118629 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/7749 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>