summaryrefslogtreecommitdiffstats
path: root/xlators/cluster/ec/src/ec-locks.c
Commit message (Collapse)AuthorAgeFilesLines
* cluster/ec: Fix tracking of good bricksXavier Hernandez2015-08-141-16/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | The bitmask of good and bad bricks was kept in the context of the corresponding inode or fd. This was problematic when an external process (another client or the self-heal process) did heal the bricks but no one changed the bitmaks of other clients. This patch removes the bitmask stored in the context and calculates which bricks are healthy after locking them and doing the initial xattrop. After that, it's updated using the result of each fop. > Change-Id: I225e31cd219a12af4ca58871d8a4bb6f742b223c > BUG: 1236065 > Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> > Reviewed-on: http://review.gluster.org/11844 > Tested-by: NetBSD Build System <jenkins@build.gluster.org> > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Change-Id: Idbe68b28b865c4b28366703ad1e96ae16ba44b66 BUG: 1235964 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/11867 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: Minimize usage of EIO errorXavier Hernandez2015-08-081-146/+73
| | | | | | | | | | | | | | | | | | >Change-Id: I82e245615419c2006a2d1b5e94ff0908d2f5e891 >BUG: 1245276 >Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> >Reviewed-on: http://review.gluster.org/11741 >Tested-by: Gluster Build System <jenkins@build.gluster.com> >Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> >Tested-by: NetBSD Build System <jenkins@build.gluster.org> Change-Id: Ifd3d63f88a686a2963c5ba2e62110249f84f338d BUG: 1250864 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/11852 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/ec: Propogate correct errno in case of failuresPranith Kumar K2015-07-211-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | - Also remove internal-fop setting in create/mknod etc xattrs. Rebalance was failing because ec was giving EIO when lock acquiring fails as the file/dir doesn't exist. Posix_create/mknod are not setting config xattr because internal-fop key is present in dict and setxattr for this fails leading to failure in setting rest of xattrs. >Change-Id: Ifb429c8db9df7cd51e4f8ce53fdf1e1b975c9993 >BUG: 1242254 >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> >Reviewed-on: http://review.gluster.org/11639 >Reviewed-by: Raghavendra G <rgowdapp@redhat.com> >Tested-by: Gluster Build System <jenkins@build.gluster.com> >Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> >Tested-by: NetBSD Build System <jenkins@build.gluster.org> BUG: 1243654 Change-Id: Iedb90d6a7d980fb88d6dfa6a6c978a165a4be3fd Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11688 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* ec: Porting messages to new logging frameworkNandaja Varma2015-06-271-49/+104
| | | | | | | | | | | | | | | | | This is a backport of http://review.gluster.org/#/c/10465/ cherry-picked from commit b0b9eaea9dbb4e9a535f5e969defc4556a9e2204 >Change-Id: Ia05ae750a245a37d48978e5f37b52f4fb0507a8c >BUG: 1194640 >Signed-off-by: Nandaja Varma <nandaja.varma@gmail.com> Change-Id: Ia05ae750a245a37d48978e5f37b52f4fb0507a8c BUG: 1217722 Signed-off-by: Nandaja Varma <nandaja.varma@gmail.com> Reviewed-on: http://review.gluster.org/11429 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: Fix all EIO errors in ECPranith Kumar K2015-05-281-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/10770 Backport of http://review.gluster.org/10806 Backport of http://review.gluster.org/10787 Backport of http://review.gluster.org/10868 Backport of http://review.gluster.com/10852 - When a blocking lock is requested, lock request is succeeded even when ec->fragment number of locks are acquired successfully in non-blocking locking phase. This will lead to fop succeeding only on the bricks where the locks are acquired, leading to the necessity of self-heals. To prevent these un-necessary self-heals, if the remaining locks fail with EAGAIN in non-blocking lock phase try blocking locking phase instead. - Handle lookup failures while op in progress - cluster/ec: Correctly cleanup delayed locks When a delayed lock is pending, a graph switch doesn't correctly terminate it. This means that the update of version and size xattrs is lost, causing EIO errors. This patch handles GF_EVENT_PARENT_DOWN event to correctly finish pending udpdates before completing the graph switch. - Fix use after free crash ec_heal creates ec_fop_data but doesn't run ec_manager. ec_fop_data_allocate adds this fop to ec->pending_fops, because ec_manager is not run on this heal fop it is never removed from ec->pending_fops. When it is accessed after free it leads to crash. It is better to not to add HEAL fops to ec->pending_fops because we don't want graph switch to hang the mount because of a BIG file/directory heal. - Forced unlock when lock contention is detected EC uses an eager lock mechanism to optimize multiple read/write requests on the same entry or inode. This increases performance but can have adverse results when other clients try to access the same entry/inode. To solve this, this patch adds a functionality to detect when this happens and force an earlier release to not block other clients. The method consists on requesting GF_GLUSTERFS_INODELK_COUNT and GF_GLUSTERFS_ENTRYLK_COUNT for all fops that take a lock. When this count is greater than one, the lock is marked to be released. All fops already waiting for this lock will be executed normally before releasing the lock, but new requests that also require it will be blocked and restarted after the lock has been released and reacquired again. Another problem was that some operations did correctly lock the parent of an entry when needed, but got the size and version xattrs from the entry instead of the parent. This patch solves this problem by binding all queries of size and version to each lock and replacing all entrylk calls by inodelk ones to remove concurrent updates on directory metadata. This also allows rename to correctly update source and destination directories. BUG: 1225279 Change-Id: I02a6084b138dd38e018a462347cd9ce38610c7ef Reviewed-on: http://review.gluster.org/10926 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* ec: Fix failures with missing filesXavier Hernandez2015-05-091-211/+94
| | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.com/9407 When a file does not exist on a brick but it does on others, there could be problems trying to access it because there was some loc_t structures with null 'pargfid' but 'name' was set. This forced inode resolution based on <pargfid>/name instead of <gfid> which would be the correct one. To solve this problem, 'name' is always set to NULL when 'pargfid' is not present. Another problem was caused by an incorrect management of errors while doing incremental locking. The only allowed error during an incremental locking was ENOTCONN, but missing files on a brick can be returned as ESTALE. This caused an EIO on the operation. This patch doesn't care of errors during an incremental locking. At the end of the operation it will check if there are enough successfully locked bricks to continue or not. BUG: 1220011 Change-Id: I4a1e6235d80e20ef7ef12daba0807b859ee5c435 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/10701 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* ec: Fix return errors when not enough bricksXavier Hernandez2014-12-051-0/+3
| | | | | | | | | | | | | | | | | | | | | Changes introduced by this patch: * Fix an incorrect error propagation when the state of the life cycle of a fop returns an error. * Fix incorrect unlocking of failed locks. * Return ENOTCONN if there aren't enough bricks online. * In readdir(p) check that the fd has been successfully open by a previous opendir. Change-Id: Ib44f25a1297849ebcbab839332f3b6359f275ebe BUG: 1162805 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9098 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* ec: Change licenseXavier Hernandez2014-12-031-16/+6
| | | | | | | | | | Change-Id: Iae90ade2421898417b53dec0417a610cf306c44b BUG: 1168167 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9201 Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* ec: Fix self-heal issuesXavier Hernandez2014-10-211-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Doing an 'ls' of a directory that has been modified while one of the bricks was down, sometimes returns the old directory contents. Cause: Directories are not marked when they are modified as files are. The ec xlator balances requests amongst available and healthy bricks. Since there is no way to detect that a directory is out of date in one of the bricks, it is used from time to time to return the directory contents. Solution: Basically the solution consists in use versioning information also for directories, however some additional changes have been necessary. Changes: * Use directory versioning: This required to lock full directory instead of a single entry for all requests that add or remove entries from it. This is needed to allow atomic version update. This affects the following fops: create, mkdir, mknod, link, symlink, rename, unlink, rmdir Another side effect is that opendir requires to do a previous lookup to get versioning information and discard out of date bricks for subsequent readdir(p) calls. * Restrict directory self-heal: Till now, when one discrepancy was found in lookup, a self-heal was automatically started. This caused the versioning information of a bad directory to be healed instantly, making the original problem to reapear again. To solve this, when a missing directory is detected in one or more bricks on lookup or opendir fops, only a partial self-heal is performed on it. A partial self-heal basically creates the directory but does not restore any additional information. This avoids that an 'ls' could repair the directory and cause the problem to happen again. With this change, output of 'ls' is always consistent. However, since the directory has been created in the brick, this allows any other operation on it (create new files, for example) to succeed on all bricks and not add additional work to the self-heal process. To force a self-heal of a directory, any other operation must be done on it. For example a getxattr. With these changes, the correct healing procedure that would avoid inconsistent directory browsing consists on a post-order traversal of directoriesi being healed. This way, the directory contents will be healed before healing the directory itslef. * Additional changes to fix self-heal errors - Don't use fop->fd to decide between fd/loc. open, opendir and create have an fd, but the correct data is in loc. - Fix incorrect management of bad bricks per inode/fd. - Fix incorrect selection of fop's target bricks when there are bad bricks involved. - Improved ec_loc_parent() to always return a parent loc as complete as possible. Change-Id: Iaf3df174d7857da57d4a87b4a8740a7048b366ad BUG: 1149726 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8916 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* ec: Fix invalid inode lock in ftruncateXavier Hernandez2014-09-191-6/+6
| | | | | | | | | | | | | | | | | | | The fops 'truncate' and 'ftruncate' share some code and inodelk() was always made against the inode inside the loc_t structure instead of that of fd_t. Since ftruncate has the loc initialized to NULL, this fop was executed without any lock, allowing some concurrent modifications in the file size. Also changed the way in which 'fop' and 'ffop' are differentiated in shared code. Now it uses 'id' field instead of checking if 'fd' is NULL. Change-Id: Ibd18accf2652193b395a841b9029729e5f4867c6 BUG: 1140396 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8695 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* ec: Fixed coveriry scan issuesXavier Hernandez2014-07-211-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CID list: 1226163 Logically dead code 1226166 Missing break in switch 1226167 Missing break in switch 1226168 Missing break in switch 1226169 Missing break in switch 1226170 Missing break in switch 1226171 Missing break in switch 1226172 Missing break in switch 1226173 Missing break in switch 1226174 Missing break in switch 1226175 Missing break in switch 1226176 Missing break in switch 1226177 Missing break in switch 1226178 Data race condition 1226179 Data race condition 1226180 Data race condition 1226181 Thread deadlock 1226182 Uninitialized pointer read 1226183 Uninitialized pointer read 1226184 Read from pointer after free Change-Id: I4d33aa42289371927175c43bb29e018df64fb943 BUG: 789278 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8317 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/ec: Added erasure code translatorXavier Hernandez2014-07-111-0/+1292
Change-Id: I293917501d5c2ca4cdc6303df30cf0b568cea361 BUG: 1118629 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/7749 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>