glusterfs.git -

	Commit message (Collapse)	Author	Age	Files	Lines
*	cluster/ec: Fix issues with eager locking	Xavier Hernandez	2016-05-04	1	-7/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Due to a race in timer cancellation, in some cases it was possible to unlock the lock while another concurrent fop that needed it continues execution as if it were not released. This patch also fixes an issue that caused a lock to not be released if an error was found while preparing ec_update_size_version(). > Change-Id: I1344a3f5ecfc333f05a09e62653838264c9c26b1 > BUG: 1331254 > Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> > Reviewed-on: http://review.gluster.org/14112 > Smoke: Gluster Build System <jenkins@build.gluster.com> > CentOS-regression: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Chen Chen <chenchen@smartquerier.com> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Change-Id: I21edd17d914dfa8d2f98e6bbde50830496e12a92 BUG: 1330132 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/14174 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
*	cluster/ec: Allow read fops to be processed in parallel	Xavier Hernandez	2015-11-24	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently ec only sends a single read request at a time for a given inode. Since reads do not interfere between them, this patch allows multiple concurrent read requests to be sent in parallel. This is a backport of these patches: > Change-Id: If853430482a71767823f39ea70ff89797019d46b > BUG: 1245689 > Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> > Reviewed-on: http://review.gluster.org/11742 > Tested-by: NetBSD Build System <jenkins@build.gluster.org> > Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> > Tested-by: Gluster Build System <jenkins@build.gluster.com> > > Change-Id: I6042129f09082497b80782b5704a52c35c78f44d > BUG: 1276031 > Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Change-Id: I1b1146d1fd1828b12bfc566cd76e5ea110f8909b BUG: 1251467 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/12447 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
*	cluster/ec: Mark internal fops appropriately	Xavier Hernandez	2015-11-20	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	1) Mark read fops in read-modify-write by EC as internal. 2) Handle uid/gid set/reset correctly >BUG: 1282761 >Change-Id: I5c1ce0cd6213367eaead5fed33aa2397c4e46df7 >Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> >Reviewed-on: http://review.gluster.org/12599 >Tested-by: Gluster Build System <jenkins@build.gluster.com> >Tested-by: NetBSD Build System <jenkins@build.gluster.org> >Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> BUG: 1283757 Change-Id: I9f039cf3ec6351525fb65381bad44d986595844f Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/12669 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com>
*	cluster/ec : Mark new entry changelog in entry self-healv3.7.5	Ashish Pandey	2015-10-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem : When a new entry is created dirty mark xattrs are not created this will need full heal to be performed, even when there are partial failures. Solution : Marks new entry changelog in self-heal. PS: Also fixed erasing of dirty markers when no data heal is required. BUG: 1258313 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Change-Id: I156e3d3201afa77efe118e1aaace1d91c90a9613 Reviewed-on: http://review.gluster.org/12306 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
*	cluster/ec: Fix write size in self-heal	Xavier Hernandez	2015-08-14	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Self-heal was always using a fixed block size to heal a file. This was incorrect for dispersed volumes with a number of data bricks not being a power of 2. This patch adjusts the block size to a multiple of the stripe size of the volume. It also propagates errors detected during the data heal to stop healing the file and not mark it as healed. This is a backport if http//review.gluster.org/11862 Change-Id: I5104ae4bfed8585ca40cb45831ca20582566370c BUG: 1236050 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/11869 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
*	cluster/ec: Fix tracking of good bricks	Xavier Hernandez	2015-08-14	1	-3/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The bitmask of good and bad bricks was kept in the context of the corresponding inode or fd. This was problematic when an external process (another client or the self-heal process) did heal the bricks but no one changed the bitmaks of other clients. This patch removes the bitmask stored in the context and calculates which bricks are healthy after locking them and doing the initial xattrop. After that, it's updated using the result of each fop. > Change-Id: I225e31cd219a12af4ca58871d8a4bb6f742b223c > BUG: 1236065 > Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> > Reviewed-on: http://review.gluster.org/11844 > Tested-by: NetBSD Build System <jenkins@build.gluster.org> > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Change-Id: Idbe68b28b865c4b28366703ad1e96ae16ba44b66 BUG: 1235964 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/11867 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
*	cluster/ec: Prevent data corruptions	Pranith Kumar K	2015-07-21	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- On lock reuse preserve 'healing' bits - Don't set ctx->size outside locks in healing code - Allow xattrop internal fops also on the fop->mask. >Change-Id: I6b76da5d7ebe367d8f3552cbf9fd18e556f2a171 >BUG: 1232678 >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> >Reviewed-on: http://review.gluster.org/11640 >Tested-by: NetBSD Build System <jenkins@build.gluster.org> >Tested-by: Gluster Build System <jenkins@build.gluster.com> >Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> BUG: 1243647 Change-Id: I1b3828e4d4a863b84b2c4e732e7965d1302cea47 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11686 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com>
*	cluster/ec: wind readlink on good subvol(s)	Pranith Kumar K	2015-07-21	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	>BUG: 1232172 >Change-Id: I3a56e487840d86147dd85bf5fbe79b165eae289f >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> >Reviewed-on: http://review.gluster.org/11589 >Tested-by: NetBSD Build System <jenkins@build.gluster.org> >Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> >Tested-by: Gluster Build System <jenkins@build.gluster.com> BUG: 1234679 Change-Id: I08560eee095a3921e9c24f16dc2a242a76018a42 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11687 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
*	cluster/ec: Add throttling in background healing	Pranith Kumar K	2015-07-21	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- 8 parallel heals can happen. - 128 heals will wait for their turn - Heals will be rejected if 128 heals are already waiting. >Change-Id: I2e99bf064db7bce71838ed9901a59ffd565ac390 >BUG: 1237381 >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> >Reviewed-on: http://review.gluster.org/11471 >Tested-by: Gluster Build System <jenkins@build.gluster.com> >Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> >Tested-by: NetBSD Build System <jenkins@build.gluster.org> BUG: 1238476 Change-Id: Id9625536197c1f5d6c3a9ed53f80bd9e6d14c507 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11679 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
*	cluster/ec: wind fops on good subvols for access/readdir[p]	Pranith Kumar K	2015-07-01	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	Backport of http://review.gluster.com/11246 BUG: 1234679 Change-Id: I2d774f62740c82e922efab50fc78fa74050ede93 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11357 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com>
*	cluster/ec: EC_XATTR_DIRTY doesn't come in response	Pranith Kumar K	2015-06-06	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Backport of http://review.gluster.com/11078 Problem: ec_update_size_version expects all the keys it did xattrop with to come in response so that it can set the values again in ec_update_size_version_done. But EC_XATTR_DIRTY is not combined so the value won't be present in the response. So ctx->post/pre_dirty are not updated in ec_update_size_version_done. So these values are still non-zero. When ec_unlock_now is called as part of flush's unlock phase it again tries to perform same xattrop for EC_XATTR_DIRTY. But ec_update_size_version is not expected to be called in unlock phase of flush because ec_flush_size_version should have reset everything to zero and unlock is never invoked from ec_update_size_version_done for flush/fsync/fsyncdir. This leads to stale lock which leads to hang. Fix: EC_XATTR_DIRTY is removed in ex_xattrop_cbk and is never combined with other answers. So remove handling of this in the response. BUG: 1228160 Change-Id: I657efca6e706e7acb541f98f526943f67562da9f Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11084 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
*	cluster/ec: Fix all EIO errors in EC	Pranith Kumar K	2015-05-28	1	-36/+48
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Backport of http://review.gluster.org/10770 Backport of http://review.gluster.org/10806 Backport of http://review.gluster.org/10787 Backport of http://review.gluster.org/10868 Backport of http://review.gluster.com/10852 - When a blocking lock is requested, lock request is succeeded even when ec->fragment number of locks are acquired successfully in non-blocking locking phase. This will lead to fop succeeding only on the bricks where the locks are acquired, leading to the necessity of self-heals. To prevent these un-necessary self-heals, if the remaining locks fail with EAGAIN in non-blocking lock phase try blocking locking phase instead. - Handle lookup failures while op in progress - cluster/ec: Correctly cleanup delayed locks When a delayed lock is pending, a graph switch doesn't correctly terminate it. This means that the update of version and size xattrs is lost, causing EIO errors. This patch handles GF_EVENT_PARENT_DOWN event to correctly finish pending udpdates before completing the graph switch. - Fix use after free crash ec_heal creates ec_fop_data but doesn't run ec_manager. ec_fop_data_allocate adds this fop to ec->pending_fops, because ec_manager is not run on this heal fop it is never removed from ec->pending_fops. When it is accessed after free it leads to crash. It is better to not to add HEAL fops to ec->pending_fops because we don't want graph switch to hang the mount because of a BIG file/directory heal. - Forced unlock when lock contention is detected EC uses an eager lock mechanism to optimize multiple read/write requests on the same entry or inode. This increases performance but can have adverse results when other clients try to access the same entry/inode. To solve this, this patch adds a functionality to detect when this happens and force an earlier release to not block other clients. The method consists on requesting GF_GLUSTERFS_INODELK_COUNT and GF_GLUSTERFS_ENTRYLK_COUNT for all fops that take a lock. When this count is greater than one, the lock is marked to be released. All fops already waiting for this lock will be executed normally before releasing the lock, but new requests that also require it will be blocked and restarted after the lock has been released and reacquired again. Another problem was that some operations did correctly lock the parent of an entry when needed, but got the size and version xattrs from the entry instead of the parent. This patch solves this problem by binding all queries of size and version to each lock and replacing all entrylk calls by inodelk ones to remove concurrent updates on directory metadata. This also allows rename to correctly update source and destination directories. BUG: 1225279 Change-Id: I02a6084b138dd38e018a462347cd9ce38610c7ef Reviewed-on: http://review.gluster.org/10926 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
*	cluster/ec: Change meaning of trusted.ec.dirty	Pranith Kumar K	2015-05-08	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- With this change, the xattr will represent if the file needs to be healed or not. It will have different values for data/entry and metadata changes. - inode ref leaks and dict_set_dynstr related leaks fixed - Added support for trylock/lock based on heal-cmd execution or not in data heal. - Made fixes to pass regression runs Change-Id: I9d8def4c2badde18a76b7898816fecfac113737a BUG: 1216303 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10385 Reviewed-on: http://review.gluster.org/10693 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com>
*	cluster/ec: add separate versions for data/entry, metadata	Ashish Pandey	2015-05-08	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Adding 64 bits in "version" key of extended attributes. First 64 bits (Left) represents Data version. Last 64 bits (right) represents Meta Data version. Note: 3.7 and 3.6 version ec can't co-exist with this change because xattrop in 3.6 will fail with ERANGE as the buffer passed to it will be '8' bytes where as the value will be 16 bytes in 3.7. Where as 3.7 version clients can work with old version files. For upgrades we need to tell users to complete heals and then upgrade BUG: 1215265 Change-Id: Ib85114680cb7e75b8371c984d9f7b6401c1ffb93 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/10312 Reviewed-on: http://review.gluster.org/10626 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
*	cluster/ec: Perform inode-write on healing subvols	Pranith Kumar K	2015-05-07	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Backport of http://review.gluster.org/10372 If the version numbers do not match, then writes are performed only on at least N-R bricks which have same version. But if we want to do healing of files which are constantly modified we need to allow writes on subvols that are undergoing heal. Data healing will mark 62nd bit while the heal is going on. When the data transaction sees that this bit is set it needs to perform the fop on that subvol irrespective of whether the versions match or do not match. Fop is considered successful only if N-R non-healing bricks succeed. BUG: 1216303 Change-Id: I79aaf1ac86357c51547cdaaa56cf7338004cc512 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10440 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System
*	cluster/ec: Allow heal on name less loc	Pranith Kumar K	2015-03-05	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	loc->parent may not always be populated. Even in those cases, self-heal should happen if it can be completed using nameless loc. Change-Id: I8871fc811bec8b881ae7fb09dcd202c6693b9877 BUG: 1177601 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9717 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com>
*	ec: Add trusted.ec.dirty xattr	Xavier Hernandez	2015-02-23	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This xattr will be incremented before each data modifying operation and decremented after it. This will add the possibility to detect partially updated writes and refuse them on reads. It will also be useful for interacting with index xlator and have a way to heal dispersed files from the self-heal daemon. Change-Id: Ie644a8dd074ae0f254c809c5863bdb030be5486a BUG: 1190581 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9607 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	ec: Fix self-healing issues.	Xavier Hernandez	2014-12-04	1	-23/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Three problems have been detected: 1. Self healing is executed in background, allowing the fop that detected the problem to continue without blocks nor delays. While this is quite interesting to avoid unnecessary delays, it can cause spurious failures of self-heal because it may try to recover a file inside a directory that a previous self-heal has not recovered yet, causing the file self-heal to fail. 2. When a partial self-heal is being executed on a directory, if a full self-heal is attempted, it won't be executed because another self-heal is already in process, so the directory won't be fully repaired. 3. Information contained in loc's of some fop's is not enough to do a complete self-heal. To solve these problems, I've made some changes: * Improved ec_loc_from_loc() to add all available information to a loc. * Before healing an entry, it's parent is checked and partially healed if necessary to avoid failures. * All heal requests received for the same inode while another self-heal is being processed are queued. When the first heal completes, all pending requests are answered using the results of the first heal (without full execution), unless the first heal was a partial heal. In this case all partial heals are answered, and the first full heal is processed normally. * An special virtual xattr (not physically stored on bricks) named 'trusted.ec.heal' has been created to allow synchronous self-heal of files. Now, the recommended way to heal an entire volume is this: find <mount> -d -exec getfattr -h -n trusted.ec.heal {} \; Some minor changes: * ec_loc_prepare() has been renamed to ec_loc_update(). * All loc management functions return 0 on success and -1 on error. * Do not delay fop unlocks if heal is needed. * Added basic ec xattrs initially on create, mkdir and mknod fops. * Some coding style changes Change-Id: I2a5fd9c57349a153710880d6ac4b1fa0c1475985 BUG: 1161588 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9072 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
*	ec: Change license	Xavier Hernandez	2014-12-03	1	-16/+6
\| \| \| \| \| \| \| \| \| \|	Change-Id: Iae90ade2421898417b53dec0417a610cf306c44b BUG: 1168167 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9201 Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	ec: Fix rebalance issues	Xavier Hernandez	2014-10-27	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some issues in ec xlator made that rebalance didn't complete successfully and generated some warnings and errors in the log. The most critical error was a race condition that caused false corruption detection when two specific operations were executed sequentially and they shared the same lock. This explains the problem: 1. A setxattr is issued. 2. setxattr: ec locks the inode before updating the xattr. 3. setxattr: The xattr is updated. 4. setxattr: Upper xlator is notified that the operation completed. 5. setxattr: A background task is initiated to update the version of the file. 6. A stat is issued on the same file. 7. stat: Since the lock is already acquired, it's reused. 8. stat: A lookup is issued to determine version and size information of the file. At this point, operations 5 and 8 can interfere. This can make that lookup sees different information on each brick, determining that some bricks are corrupted and incorrectly excluding them from the operation and initiating a self-heal. In some cases this false detection combined with self-heal could lead to invalid updates of the trusted.ec.size xattr, leaving the file smaller than it should be. This only happens if the first operation does not perform a lookup, because chained operations reuse the information returned by the previous one, avoiding this kind of problems. To solve this, now the background update is executed atomically with the posterior unlock. This avoids some reuses of the lock while updating. However this reduces performance because the window in which new requests can reuse the lock is much smaller now. This has been alleviated by using the same technique implemented in AFR (i.e. waiting some time before releasing the lock). Some minor changes also introduced in this patch: * Bug in management of 'trusted.glusterfs.pathinfo' that was writing beyond the allocated space. * Uninitialized variable. * trusted.ec.config was not created for regular files created with mknod. * An invalid state was used in access fop. Change-Id: Idfaf69578ed04dbac97a62710326729715b9b395 BUG: 1152902 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8947 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	ec: Fix self-heal issues	Xavier Hernandez	2014-10-21	1	-23/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Doing an 'ls' of a directory that has been modified while one of the bricks was down, sometimes returns the old directory contents. Cause: Directories are not marked when they are modified as files are. The ec xlator balances requests amongst available and healthy bricks. Since there is no way to detect that a directory is out of date in one of the bricks, it is used from time to time to return the directory contents. Solution: Basically the solution consists in use versioning information also for directories, however some additional changes have been necessary. Changes: * Use directory versioning: This required to lock full directory instead of a single entry for all requests that add or remove entries from it. This is needed to allow atomic version update. This affects the following fops: create, mkdir, mknod, link, symlink, rename, unlink, rmdir Another side effect is that opendir requires to do a previous lookup to get versioning information and discard out of date bricks for subsequent readdir(p) calls. * Restrict directory self-heal: Till now, when one discrepancy was found in lookup, a self-heal was automatically started. This caused the versioning information of a bad directory to be healed instantly, making the original problem to reapear again. To solve this, when a missing directory is detected in one or more bricks on lookup or opendir fops, only a partial self-heal is performed on it. A partial self-heal basically creates the directory but does not restore any additional information. This avoids that an 'ls' could repair the directory and cause the problem to happen again. With this change, output of 'ls' is always consistent. However, since the directory has been created in the brick, this allows any other operation on it (create new files, for example) to succeed on all bricks and not add additional work to the self-heal process. To force a self-heal of a directory, any other operation must be done on it. For example a getxattr. With these changes, the correct healing procedure that would avoid inconsistent directory browsing consists on a post-order traversal of directoriesi being healed. This way, the directory contents will be healed before healing the directory itslef. * Additional changes to fix self-heal errors - Don't use fop->fd to decide between fd/loc. open, opendir and create have an fd, but the correct data is in loc. - Fix incorrect management of bad bricks per inode/fd. - Fix incorrect selection of fop's target bricks when there are bad bricks involved. - Improved ec_loc_parent() to always return a parent loc as complete as possible. Change-Id: Iaf3df174d7857da57d4a87b4a8740a7048b366ad BUG: 1149726 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8916 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
*	ec: Fix 32 bits issues on file size calculation	Xavier Hernandez	2014-10-20	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some additional 32 bits issues have been added by a recent patch. This patch solves it. Change-Id: Ice81032fbe8e36e5ccad19a781b7876891993906 BUG: 1146903 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8882 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Emmanuel Dreyfus <manu@netbsd.org> Tested-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	ec: Fix incorrect management of healed bricks	Xavier Hernandez	2014-10-20	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The final lookup made to restore final file attributes after a self-heal did clear the mask of bad bricks, causing that the final setattr won't modify any brick at all. This caused that some attriutes, specially the modification time of the file didn't get updated properly. Now the mask of healed bricks is saved before doing the last lookup. It's also used to correctly report the repaired bricks. Change-Id: Ib94083c9e1b562515dfb54f9574120f1f031dccc BUG: 1149723 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8905 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	ec: Add config information in an xattr	Xavier Hernandez	2014-09-23	1	-0/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	To simplify backward compatibility of the ec xlator when some parameter or the implementation itself is changed, a new xattr is added to each file with the configuration needed to recover it. The new attribute is called 'trusted.ec.config', and it's a 64-bit value containing the following information: 8 bits: version of the config information (currently always 0) 8 bits: algorithm used to encode the file (currently always 0) 8 bits: size of the galois field (currently always 8) 8 bits: number of bricks 8 bits: redundancy 24 bits: chunk size (currently 512) This new xattr could allow, in a future version, to have different configurations per file. Change-Id: I8c12d40ff546cc201fc66caa367484be3d48aeb4 BUG: 1140861 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8770 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	ec: Fix some size_t vars to 64 bits even on 32 bits machines	Xavier Hernandez	2014-09-19	1	-9/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The 64 bits 'trusted.ec.size' extended attribute was incorrectly computed on 32 bits machines due to an overflow on negative numbers. Also changed some potentially dangerous uses of size_t in other places. Change-Id: Id76cfe49a2f350e564b5c71d8c8644fb9ce86662 BUG: 1125312 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8738 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	ec: Optimize read/write performance	Xavier Hernandez	2014-09-15	1	-3/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch significantly improves performance of read/write operations on a dispersed volume by reusing previous inodelk/ entrylk operations on the same inode/entry. This reduces the latency of each individual operation considerably. Inode version and size are also updated when needed instead of on each request. This gives an additional boost. Change-Id: I4b98d5508c86b53032e16e295f72a3f83fd8fcac BUG: 1122586 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8369 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
*	cluster/ec: Added erasure code translator	Xavier Hernandez	2014-07-11	1	-0/+260
	Change-Id: I293917501d5c2ca4cdc6303df30cf0b568cea361 BUG: 1118629 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/7749 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>