summaryrefslogtreecommitdiffstats
path: root/xlators/cluster/ec
Commit message (Collapse)AuthorAgeFilesLines
...
* locks: added inodelk/entrylk contention upcall notificationsXavier Hernandez2018-01-164-94/+239
| | | | | | | | | | | | | | The locks xlator now is able to send a contention notification to the current owner of the lock. This is only a notification that can be used to improve performance of some client side operations that might benefit from extended duration of lock ownership. Nothing is done if the lock owner decides to ignore the message and to not release the lock. For forced release of acquired resources, leases must be used. Change-Id: I7f1ad32a0b4b445505b09908a050080ad848f8e0 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
* cluster/ec: OpenFD heal implementation for ECSunil Kumar Acharya2018-01-057-91/+243
| | | | | | | | | | | | | Existing EC code doesn't try to heal the OpenFD to avoid unnecessary healing of the data later. Fix implements the healing of open FDs before carrying out file operations on them by making an attempt to open the FDs on required up nodes. BUG: 1431955 Change-Id: Ib696f59c41ffd8d5678a484b23a00bb02764ed15 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
* posix: Introduce flags for validity of iatt membersRavishankar N2017-12-291-2/+2
| | | | | | | | | | | | | | | | | | v1 of the patch started off as adding new fields to iatt that can be filled up using statx but the discussions were more around introducing masks to check the validity of different fields from a RIO perspective. To that extent, I have dropped the statx call in this version and introduced a 64 bit mask for existing fields. The masks I have defined are similar with the statx() flags' masks. I have *not* changed iatt_to_stat() to use the macros IATT_TYPE_VALID, IATT_GFID_VALID etc before blindly copying from struct iatt to struct. Also fixed warnings in xlators because of atime/mtime/ctime seconds field change from uint32_t to int64_t. Change-Id: I4ac614f1e8d5c8246fc99d5bc2d2a23e7941512b Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* cluster/ec: Fix possible shift overflowXavier Hernandez2017-12-223-10/+12
| | | | | | | | | | A coverity scan has revelaed a potential shift overflow while scanning the bitmap of available subvolumes. The actual overflow cannot happen, but I've changed to test used to control the limit to make it explicit. Change-Id: Ieb55f010bbca68a1d86a93e47822f7c709a26e83 BUG: 789278 Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
* cluster/ec: Change [f]getxattr to parallel-dispatch-onePranith Kumar K2017-12-224-5/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | At the moment in EC, [f]getxattr operations wait to acquire a lock while other operations are in progress even when it is in the same mount with a lock on the file/directory. This happens because [f]getxattr operations follow the model where the operation is wound on 'k' of the bricks and are matched to make sure the data returned is same on all of them. This consistency check requires that no other operations are on-going while [f]getxattr operations are wound to the bricks. We can perform [f]getxattr in another way as well, where we find the good_mask from the lock that is already granted and wind the operation on any one of the good bricks and unwind the answer after adjusting size/blocks to the parent xlator. Since we are taking into account good_mask, the reply we get will either be before or after a possible on-going operation. Using this method, the operation doesn't need to depend on completion of on-going operations which could be taking long time (In case of some slow disks and writes are in progress etc). Thus we reduce the time to serve [f]getxattr requests. I changed [f]getxattr to dispatch-one and added extra logic in ec_link_has_lock_conflict() to not have any conflicts for fops with EC_MINIMUM_ONE as fop->minimum to achieve the effect described above. Modified scripts to make sure READ fop is received in EC to trigger heals. Updates gluster/glusterfs#368 Change-Id: I3b4ebf89181c336b7b8d5471b0454f016cdaf296 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/ec: Add default value for the redundancy optionKamal Mohanan2017-12-211-0/+1
| | | | | | | Updates #302 Change-Id: Ifccc6a64c2b2a3b3a0133e6c8042cdf61a0838ce Signed-off-by: Kamal Mohanan <kmohanan@redhat.com>
* all: Simplify component message id's definitionXavier Hernandez2017-12-141-572/+88
| | | | | | | | | This patch creates a new way of defining message id's that is easier and less error prone because it doesn't require so many manual changes each time a new component is defined or a new message created. Change-Id: I71ba8af9ac068f5add7e74f316a2478bc991c67b Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
* cluster/ec: Fix bugs in stripe-cache featureAshish Pandey2017-12-052-1/+4
| | | | | | | | | | | | | | 1 - This patch fixes a bug in ec_update_stripe() that prevented some stripes to be updated after a write. 2 - This patch also include code modification for the case in which a file does not exist and we write on unaligned offset and user size, the last stripe on which "end" will fall should also be cached. Change-Id: I069cb4be1c8d59c206e3b35a6991e1fbdbc9b474 BUG: 1520758 Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* cluster/ec: EC DISCARD doesn't punch hole properlySunil Kumar Acharya2017-11-281-2/+4
| | | | | | | | | | | | | | Problem: DISCARD operation on EC volume was punching hole of lesser size than the specified size in some cases. Solution: EC was not handling punch hole for tail part in some cases. Updated the code to handle it appropriately. BUG: 1516206 Change-Id: If3e69e417c3e5034afee04e78f5f78855e65f932 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
* cluster/ec: Prevent self-heal to work after PARENT_DOWNXavier Hernandez2017-11-282-28/+52
| | | | | | | | | | | | | | | | | | | When the volume is being stopped, PARENT_DOWN event is received. This instructs EC to wait until all pending operations are completed before declaring itself down. However heal operations are ignored and allowed to continue even after having said it was down. This may cause unexpected results and crashes. To solve this, heal operations are considered exactly equal as any other operation and EC won't propagate PARENT_DOWN until all operations, including healing, are complete. To avoid big delays if this happens in the middle of a big heal, a check has been added to quit current heal if shutdown is detected. Change-Id: I26645e236ebd115eb22c7ad4972461111a2d2034 BUG: 1515266 Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
* ec: Use tiebreaker_inodelk where necessaryPranith Kumar K2017-11-221-8/+11
| | | | | | | | | | | | | | | | | When there are big directories or files that need to be healed, other shds are stuck on getting lock on self-heal domain for these directories/files. If there is a tie-breaker logic, other shds can heal some other files/directories while 1 of the shds is healing the big file/directory. Before this patch: 96.67 4890.64 us 12.89 us 646115887.30us 340869 INODELK After this patch: 40.76 42.35 us 15.09 us 6546.50us 438478 INODELK Fixes gluster/glusterfs#354 Change-Id: Ia995b5576b44f770c064090705c78459e543cc64 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/ec: Fix op-version for disperse.other-eager-lockXavier Hernandez2017-11-141-1/+1
| | | | | | | | | The op-version used for the new option was wrong. It has been set to 3.13.0. Change-Id: I88fbd7834e4a8018c8906303e734c251e90be8cf BUG: 1502610 Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
* cluster/ec: Keep last written strip in in-memory cacheAshish Pandey2017-11-108-84/+519
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Consider an EC volume with configuration 4 + 2. The stripe size for this would be 512 * 4 = 2048. That means, 2048 bytes of user data stored in one stripe. Let's say 2048 + 512 = 2560 bytes are already written on this volume. 512 Bytes would be in second stripe. Now, if there are sequential writes with offset 2560 and of size 1 Byte, we have to read the whole stripe, encode it with 1 Byte and then again have to write it back. Next, write with offset 2561 and size of 1 Byte will again READ-MODIFY-WRITE the whole stripe. This is causing bad performance because of lots of READ request travelling over the network. There are some tools and scenario's where such kind of load is coming and users are not aware of that. Example: fio and zip Solution: One possible solution to deal with this issue is to keep last stripe in memory. This way, we need not to read it again and we can save READ fop going over the network. Considering the above example, we have to keep last 2048 bytes (maximum) in memory per file. Change-Id: I3f95e6fc3ff81953646d374c445a40c6886b0b85 BUG: 1471753 Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* cluster/ec: UNUSED_VALUE coverity fixSunil Kumar Acharya2017-11-081-0/+4
| | | | | | | | | | Problem: Return values are not used. Solution: Removed the unused values. BUG: 789278 Change-Id: Ica2b95bb659dfc7ec5346de0996b9d2fcd340f8d Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
* cluster/ec: REVERSE_INULL coverity fixSunil Kumar Acharya2017-11-081-2/+1
| | | | | | | | | | Problem: fop could be NULL. Solution: Check has been added to verify fop. BUG: 789278 Change-Id: I7e8d2c1bdd8960c609aa485f180688a95606ebf7 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
* cluster/ec: NEGATIVE_RETURNS coverity fixSunil Kumar Acharya2017-11-071-0/+5
| | | | | | | | | | Problem: Source could be negative. Solution: Check has been added to verify the same. BUG: 789278 Change-Id: I8cca6297327e7923b25949c20ec8d1a711207a76 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
* cluster/ec: FORWARD_NULL coverity fixSunil Kumar Acharya2017-11-071-4/+6
| | | | | | | | | | Problem: xattr could be NULL. Solution: Added check to verify the same. BUG: 789278 Change-Id: Ie013f5655f4621434e5023dd76cef44b976adc68 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
* cluster/ec: Remove possibility of NULL derefAshish Pandey2017-11-051-1/+1
| | | | | | | | | | | | | Coverity ID: 237 Problem: In ec_check_status we are trying to deref fop->answer which could be NULL. Solution: Check Null condition before using this pointer. Change-Id: I4f9a73dc2f062ca9c62b4c4baf0a6fcadade88f2 BUG: 789278 Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* cluster/ec: create eager-lock option for non-regular filesXavier Hernandez2017-11-053-13/+47
| | | | | | | | | A new option is added to allow independent configuration of eager locking for regular files and non-regular files. Change-Id: I8f80e46d36d8551011132b15c0fac549b7fb1c60 BUG: 1502610 Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
* cluster/ec: FORWARD_NULL coverity fixSunil Kumar Acharya2017-11-012-0/+5
| | | | | | | | | | Problem: cbk could be NULL. Solution: Assigned appropriate value to cbk. BUG: 789278 Change-Id: I2e4bba9a54f965c6a7bccf0b0cb6c5f75399f6e6 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
* cluster/ec: MISSING_BREAK coverity fixSunil Kumar Acharya2017-10-311-0/+3
| | | | | | | | | | Problem: switch case syntax issue. Solution: syntax fixed. BUG: 789278 Change-Id: I76da72c3ab6ffc5db671686a71d6a596beaf496e Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
* cluster/ec: Implement DISCARD FOP for ECSunil Kumar Acharya2017-10-255-48/+332
| | | | | | | | | | | Updates #254 This code change implements DISCARD FOP support for EC. BUG: 1461018 Change-Id: I09a9cb2aa9d91ec27add4f422dc9074af5b8b2db Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
* cluster/ec: Allow parallel writes in EC if possiblePranith Kumar K2017-10-248-142/+282
| | | | | | | | | | | | | | | | | | Problem: Ec at the moment sends one modification fop after another, so if some of the disks become slow, for a while then the wait time for the writes that are waiting in the queue becomes really bad. Fix: Allow parallel writes when possible. For this we need to make 3 changes. 1) Each fop now has range parameters they will be updating. 2) Xattrop is changed to handle parallel xattrop requests where some would be modifying just dirty xattr. 3) Fops that refer to size now take locks and update the locks. Fixes #251 Change-Id: Ibc3c15372f91bbd6fb617f0d99399b3149fa64b2 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/ec: Coverity Fix UNUSED_VALUE in ec_create_nameKamal Mohanan2017-10-161-1/+1
| | | | | | | | | | | | | Problem: The value returned by cluster_mkdir is assigned to ret at ec-heal.c:1076. But this value is overwritten before it can be used. Solution: The return value of cluster_mkdir is ignored. It is not assigned to ret. Change-Id: Iee6b8d8b04e0bd800dd30d2c24cab755b9e63443 BUG: 789278 Signed-off-by: Kamal Mohanan <kmohanan@redhat.com>
* cluster/ec: Improve heal info command to handle obvious casesAshish Pandey2017-10-163-24/+41
| | | | | | | | | | | | | | | | | | | | | | | | | Problem: 1 - If a brick is down and we see an index entry in .glusterfs/indices, we should show it in heal info output as it most certainly needs heal. 2 - The first problem is also not getting handled after ec_heal_inspect. Even if in ec_heal_inspect, lookup will mark need_heal as true, we don't handle it properly in ec_get_heal_info and continue with locked inspect which takes lot of time. Solution: 1 - In first case we need not to do any further invstigation. As soon as we see that a brick is down, we should say that this index entry needs heal for sure. 2 - In second case, if we have need_heal as _gf_true after ec_heal_inspect, we should show it as heal requires. Change-Id: Ibe7f9d7602cc0b382ba53bddaf75a2a2c3326aa6 BUG: 1476668 Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* cluster/ec: add functions for stripe alignmentXavier Hernandez2017-10-136-47/+129
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes old functions to align offsets and sizes to stripe size boundaries and adds new ones to offer more possibilities. The new functions are: * ec_adjust_offset_down() Aligns a given offset to a multiple of the stripe size equal or smaller than the initial one. It returns the size of the gap between the aligned offset and the given one. * ec_adjust_offset_up() Aligns a given offset to a multiple of the stripe size equal or greater than the initial one. It returns the size of the skipped region between the given offset and the aligned one. If an overflow happens, the returned valid has negative sign (but correct value) and the offset is set to the maximum value (not aligned). * ec_adjust_size_down() Aligns the given size to a multiple of the stripe size equal or smaller than the initial one. It returns the size of the missed region between the aligned size and the given one. * ec_adjust_size_up() Aligns the given size to a multiple of the stripe size equal or greater than the initial one. It returns the size of the gap between the given size and the aligned one. If an overflow happens, the returned value has negative sign (but correct value) and the size is set to the maximum value (not aligned). These functions have been defined in ec-helpers.h as static inline since they are very small and compilers can optimize them (specially the 'scale' argument). Change-Id: I4c91009ad02f76c73772034dfde27ee1c78a80d7 Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
* cluster/ec: Handle parallel get_size_versionPranith Kumar K2017-10-103-59/+102
| | | | | | Updates #251 Change-Id: I6244014dbc90af3239d63d75a064ae22ec12a054 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/ec: Improve performance with xattrop updateSunil Kumar Acharya2017-10-061-24/+102
| | | | | | | | | | | | Existing EC code updates the xattr on the subvolume in a sequential pattern resulting in very poor performance. With this fix EC now updates the xattr on the subvolume in parallel which improves the xattr update performance. BUG: 1445663 Change-Id: I3fc40d66db0b88875ca96a9fa01002ba386c0486 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
* cluster/ec: FORWARD_NULL coverity fixAkarsha Rai2017-09-291-1/+1
| | | | | | | | | | Problem: ctx pointer could be NULL Solution: Updated the code to verify ctx pointer BUG: 789278 Change-Id: I25e07a07c6ebe2f630c99ba3aa9a61656fbaa981 Signed-off-by: Akarsha Rai <akrai@redhat.com>
* cluster/ec: FORWARD_NULL coverity fixAkarsha Rai2017-09-291-0/+1
| | | | | | | | | | | | | Problem: cbk could be NULL. Solution: Returning NULL when memory is not allocated for cbk. BUG: 789278 Change-Id: Iea9128e0f3b95100deca560f690f9baaae226abf Signed-off-by: Akarsha Rai <akrai@redhat.com>
* cluster/ec: FORWARD_NULL coverity fixAkarsha Rai2017-09-281-1/+3
| | | | | | | | | | Problem: Pool pointer could be NULL while destroying it. Solution: Verifying pointer before destroying it. BUG: 789278 Change-Id: I497d1310aa47cb749a4c992aa961bd4dfa23ee48 Signed-off-by: Akarsha Rai <akrai@redhat.com>
* Coverity Issue Fix : CHECKED_RETURNSubha sree Mohankumar2017-09-261-1/+1
| | | | | | | | | | Issue :Event check_return: Calling "ec_dict_set_number" without checking return value. Fix : Type casted the return value of the function "ec_dict_set_number" to void. Change-Id: Id97034f9b1b8591536d63dca680ca7c7a9c4fcc3 BUG: 789278 Signed-off-by: Subha sree Mohankumar <smohanku@redhat.com>
* cluster/ec: fix for BAD_SHIFT, follow-up patchKaleb S. KEITHLEY2017-09-201-11/+14
| | | | | | | | | | | | | | | | | | Address comments to https://review.gluster.org/18067, (Change-Id I86e15d12939c610c99f5f96c551bb870df20f4b4) Which was posted as an RFC as an example of a possible alternative fix to https://review.gluster.org/17860 (Change-Id I28a3bdd4a357526dba0cf84c262919c05cfa173e) An alternative fix that preserved the unsignedness of the indexes throughout, obviating the need to check its value before using it to shift. (shift by negative number is undefined, as is shift by more bits than in the type.) BUG: 1474309 Change-Id: I46fe9cec140d3397463780748f6876251acb06dd Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
* mem-pool: track glusterfs_ctx_t in struct mem_poolNiels de Vos2017-08-291-1/+1
| | | | | | | | | | | | | | | | | | | | In order to generate statedumps per glusterfs_ctx_t, it is needed to place all the memory pools in a structure that the context can reach. The 'struct mem_pool' has been extended with a 'list_head owner' that is linked with the glusterfs_ctx_t->mempool_list. All callers of mem_pool_new() have been updated to pass the current glusterfs_ctx_t along. This context is needed to add the new memory pool to the list and for grabbing the ctx->lock while updating the glusterfs_ctx_t->mempool_list. Updates: #307 Change-Id: Ia9384424d8d1630ef3efc9d5d523bf739c356c6e Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: https://review.gluster.org/18075 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* cluster/ec: coverity, fix for BAD_SHIFTKaleb S. KEITHLEY2017-08-283-13/+16
| | | | | | | | | | | | | | | | | This is how I would like to see this fixed. passes (eliminates the warning in) coverity. The use of uintptr_t as a bitmask is a problem IMO, especially on 32-bit clients. Change-Id: I86e15d12939c610c99f5f96c551bb870df20f4b4 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: https://review.gluster.org/18067 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* ec/cluster: Update failure of fop on a brick properlyAshish Pandey2017-07-311-7/+16
| | | | | | | | | | | | | | | | | | | | | | | | Problem: In case of truncate, if writev or open fails on a brick, in some cases it does not mark the failure onlock->good_mask. This causes the update of size and version on all the bricks even if it has failed on one of the brick. That ultimately causes a data corruption. Solution: In callback of such writev and open calls, mark fop->good for parent too. Thanks Pranith Kumar K <pkarampu@redhat.com> for finding the root cause. Change-Id: I8a1da2888bff53b91a0d362b8c44fcdf658e7466 BUG: 1476205 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: https://review.gluster.org/17906 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* libglusterfs: Name threads on creationRaghavendra Talur2017-07-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Set names to threads on creation for easier debugging. Output of top -H -p <PID-OF-GLUSTERFSD> Before: 19773 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19774 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19775 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19776 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19777 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19778 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19779 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19780 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19781 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19782 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19783 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19784 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19785 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd 19786 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd 19787 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd 19789 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19790 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 25178 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 5398 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 7881 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd After: 19773 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19774 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustertimer 19775 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19776 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustermemsweep 19777 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustersproc0 19778 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustersproc1 19779 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll0 19780 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusteridxwrker 19781 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusteriotwr0 19782 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterbrssign 19783 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterbrswrker 19784 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterclogecon 19785 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd0 19786 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd1 19787 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd2 19789 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixjan 19790 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixfsy 25178 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll1 5398 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll2 7881 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixhc Change-Id: Id5f333755c1ba168a2ffaa4fce6e71c375e10703 BUG: 1254002 Updates: #271 Signed-off-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-on: https://review.gluster.org/11926 Reviewed-by: Prashanth Pai <ppai@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/ec: Non-disruptive upgrade on EC volume failsSunil Kumar Acharya2017-07-141-1/+4
| | | | | | | | | | | | | | | | | | | | Problem: Enabling optimistic changelog on EC volume was not handling node down scenarios appropriately resulting in volume data inaccessibility. Solution: Update dirty xattr appropriately on good bricks whenever nodes are down. This would fix the metadata information as part of heal and thus ensures data accessibility. BUG: 1468261 Change-Id: I08b0d28df386d9b2b49c3de84b4aac1c729ac057 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com> Reviewed-on: https://review.gluster.org/17703 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: Get size of file in EC [f]xattropPranith Kumar K2017-07-132-4/+17
| | | | | | | | | | | | | | | | | | | | Problem: For allowing parallel writes we shouldn't depend on ia_size to be same for all the bricks in each write_cbk(). But we need to make sure backend size is correct on all the bricks and no crashes/manual modifications happened. Fix: At the time of get_size_version() we do 1 check to make sure size of the file is same across the bricks. From then on the FOPs will give the status of the fop, so we rely on this information to keep which bricks are good/bad. Updates #251 Change-Id: I1df645347e2e9f2e09cfa4411b6cc305d7f4e4e5 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: https://review.gluster.org/17741 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* cluster/ec : Don't try to heal when no sink is UPAshish Pandey2017-07-071-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: 4 + 2 EC volume configuration. If untar of linux is going on and we kill a brick, indices will be created for the files/dir which need to be healed. ec_shd_index_sweep spawns threads to scan these entries and start heal. If in the middle of this we kill one more brick, we end up in a situation where we can not heal an entry as there are only "ec->fragment" number of bricks are UP. However, the scan will be continued and it will trigger the heal for those entries. Solution: When a heal is triggered for an entry, check if it *CAN* be healed or not. If not come out with ENOTCONN. Change-Id: I305be7701c289f36bd7bde22491b71074771424f BUG: 1464359 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: https://review.gluster.org/17692 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Sunil Kumar Acharya <sheggodu@redhat.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* cluster/ec: correctly handle end of file for seekXavier Hernandez2017-07-061-0/+18
| | | | | | | | | | | | | | | | | When a SEEK_HOLE was issued near to the end of file, sometimes an offset beyond the end of file was returned. Another problem was that using some offsets greater than the end of file returned successfully instead of failing with ENXIO. Change-Id: I238d2884ba02fd19a78116b0f8f8e8d6338fb3f5 BUG: 1449348 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: https://review.gluster.org/17228 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* ec: Increase notification in all the casesAshish Pandey2017-06-281-31/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: "gluster v heal <volname> info" is taking long time to respond when a brick is down. RCA: Heal info command does virtual mount. EC wait for 10 seconds, before sending UP call to upper xlator, to get notification (DOWN or UP) from all the bricks. Currently, we are increasing ec->xl_notify_count based on the current status of the brick. So, if a DOWN event notification has come and brick is already down, we are not increasing ec->xl_notify_count in ec_handle_down. Solution: Handle DOWN even as notification irrespective of what is the current status of brick. Change-Id: I0acac0db7ec7622d4c0584692e88ad52f45a910f BUG: 1464091 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: https://review.gluster.org/17606 Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* cluster/ec: Node uuid xattr support update for ECSunil Kumar Acharya2017-06-232-6/+23
| | | | | | | | | | | | | | | | | | | | | | | | | Problem: The change in EC to return list of node uuids for GF_XATTR_NODE_UUID_KEY was causing problems with geo-rep. Fix: This patch will allow to get the single node uuid as it was doing before with the key "GF_XATTR_NODE_UUID_KEY", and will also allow to get the list of node uuids by using a new key "GF_XATTR_LIST_NODE_UUIDS_KEY". This will solve the problem with geo-rep and any other features which were depending on this. BUG: 1462790 Change-Id: I2d9214a9658d4a41a3d6de08600884d2bda5f3eb Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com> Reviewed-on: https://review.gluster.org/17594 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/ec: lk shouldn't be a transactionPranith Kumar K2017-06-161-19/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: When application sends a blocking lock, the lk fop actually waits under inodelk. This can lead to a dead-lock. 1) Let's say app-1 takes exculsive-fcntl-lock on the file 2) app-2 attempts an exclusive-fcntl-lock on the file which goes to blocking stage note: app-2 is blocked inside transaction which holds an inode-lock 3) app-1 tries to perform write which needs inode-lock so it gets blocked on app-2 to unlock inodelk and app-2 is blocked on app-1 to unlock fcntl-lock Fix: Correct way to fix this issue and make fcntl locks perform well would be to introduce 2-phase locking for fcntl lock: 1) Implement a try-lock phase where locks xlator will not merge lk call with existing calls until a commit-lock phase. 2) If in try-lock phase we get quorum number of success without any EAGAIN error, then send a commit-lock which will merge locks. 3) In case there are any errors, unlock should just delete the lock-object which was tried earlier and shouldn't touch the committed locks. Unfortunately this is a sizeable feature and need to be thought through for any corner cases. Until then remove transaction from lk call. BUG: 1455049 Change-Id: I18a782903ba0eb43f1e6526fb0cf8c626c460159 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: https://review.gluster.org/17542 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Ashish Pandey <aspandey@redhat.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* cluster/ec: Update xattr and heal size properlyAshish Pandey2017-06-062-9/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem-1 : Recursive healing of same file is happening when IO is going on even after data heal completes. Solution: RCA: At the end of the write, when ec_update_size_version gets called, we send it only on good bricks and not on healing brick. Due to this, xattr on healing brick will always remain out of sync and when the background heal check source and sink, it finds this brick to be healed and start healing from scratch. That involve ftruncate and writing all of the data again. To solve this, send xattrop on all the good bricks as well as healing bricks. Problem-2: The above fix exposes the data corruption during heal. If the write on a file is going on and heal finishes, we find that the file gets corrupted. RCA: The real problem happens in ec_rebuild_data(). Here we receive the 'size' argument which contains the real file size at the time of starting self-heal and it's assigned to heal->total_size. After that, a sequence of calls to ec_sync_heal_block() are done. Each call ends up calling ec_manager_heal_block(), which does the actual work of healing a block. First a lock on the inode is taken in state EC_STATE_INIT using ec_heal_inodelk(). When the lock is acquired, ec_heal_lock_cbk() is called. This function calls ec_set_inode_size() to store the real size of the inode (it uses heal->total_size). The next step is to read the block to be healed. This is done using a regular ec_readv(). One of the things this call does is to trim the returned size if the file is smaller than the requested size. In our case, when we read the last block of a file whose size was = 512 mod 1024 at the time of starting self-heal, ec_readv() will return only the first 512 bytes, not the whole 1024 bytes. This isn't a problem since the following ec_writev() sent from the heal code only attempts to write the amount of data read, so it shouldn't modify the remaining 512 bytes. However ec_writev() also checks the file size. If we are writing the last block of the file (determined by the size stored on the inode that we have set to heal->total_size), any data beyond the (imposed) end of file will be cleared with 0's. This causes the 512 bytes after the heal->total_size to be cleared. Since the file was written after heal started, the these bytes contained data, so the block written to the damaged brick will be incorrect. Solution: Align heal->total_size to a multiple of the stripe size. Thanks "Xavier Hernandez" <xhernandez@datalab.es> to find out the root cause and to fix the issue. Change-Id: I6c9f37b3ff9dd7f5dc1858ad6f9845c05b4e204e BUG: 1428673 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: https://review.gluster.org/16985 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* core: fix spelling errorsKaleb S. KEITHLEY2017-06-021-2/+1
| | | | | | | | | | | | | | fixes for various minor spelling errors and typos Reported-by: Patrick Matthäi <pmatthaei@debian.org> Change-Id: Ic1be36f82e3d822bbdc9559878bd79520fc0fcd5 BUG: 1457808 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: https://review.gluster.org/17442 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org>
* cluster/ec: Implement FALLOCATE FOP for ECSunil Kumar Acharya2017-05-233-3/+209
| | | | | | | | | | | | | | | FALLOCATE file operations is not implemented in the existing EC code. This change set implements it for EC. BUG: 1448293 Change-Id: Id9ed914db984c327c16878a5b2304a0ea461b623 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com> Reviewed-on: https://review.gluster.org/15200 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/ec: return all node uuids from all subvolumesXavier Hernandez2017-05-172-105/+141
| | | | | | | | | | | | | | | | | | | | | | | | EC was retuning the UUID of the brick with smaller value. This had the side effect of not evenly balancing the load between bricks on rebalance operations. This patch modifies the common functions that combine multiple subvolume values into a single result to take into account the subvolume order and, optionally, other subvolumes that could be damaged. This makes easier to add future features where brick order is important. It also makes possible to easily identify the originating brick of each answer, in case some brick will have an special meaning in the future. Change-Id: Iee0a4da710b41224a6dc8e13fa8dcddb36c73a2f BUG: 1366817 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: https://review.gluster.org/17297 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Ashish Pandey <aspandey@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: fix incorrect answer check in seek fopXavier Hernandez2017-05-091-15/+8
| | | | | | | | | | | | | | | A bad check in the answer of a seek request caused a segmentation fault when seek reported an error. Change-Id: Ifb25ae8bf7cc4019d46171c431f7b09b376960e8 BUG: 1439068 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: https://review.gluster.org/16998 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: Implement self-heal-window_size optionSunil Kumar Acharya2017-04-253-3/+16
| | | | | | | | | | | | | | | | Fix implements the heal window size option for EC. This option control the maximum size of read/write operation carried out in self-heal process. BUG: 1441491 Change-Id: I6c0ef65c9ca18b0828f91b319d4f52ac5b77d0d8 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com> Reviewed-on: https://review.gluster.org/17098 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>