summaryrefslogtreecommitdiffstats
path: root/xlators/cluster
Commit message (Collapse)AuthorAgeFilesLines
* cluster/ec: Fix op-version for disperse.other-eager-lockXavier Hernandez2017-11-161-1/+1
| | | | | | | | | | | | | The op-version used for the new option was wrong. It has been set to 3.13.0. >Change-Id: I88fbd7834e4a8018c8906303e734c251e90be8cf >BUG: 1502610 >Signed-off-by: Xavier Hernandez <jahernan@redhat.com> Change-Id: I88fbd7834e4a8018c8906303e734c251e90be8cf BUG: 1512460 Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
* cluster/ec: create eager-lock option for non-regular filesXavier Hernandez2017-11-163-13/+47
| | | | | | | | | | | | | A new option is added to allow independent configuration of eager locking for regular files and non-regular files. >Change-Id: I8f80e46d36d8551011132b15c0fac549b7fb1c60 >BUG: 1502610 >Signed-off-by: Xavier Hernandez <jahernan@redhat.com> Change-Id: I8f80e46d36d8551011132b15c0fac549b7fb1c60 BUG: 1512460 Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* xlator/tier: flood of -Wformat-truncation warnings with gcc-7.1v4.0dev1Kaleb S. KEITHLEY2017-11-011-8/+9
| | | | | | | | | | | | | | | | Starting in Fedora 26 which has gcc-7.1.x, -Wformat-trunction is enabled with -Wformat, resulting in a flood of new warnings. This many warnings is a concern because it makes it hard(er) to see other warnings that should be addressed. An example is at https://kojipkgs.fedoraproject.org//packages/glusterfs/3.12.0/1.fc28/data/logs/x86_64/build.log For more info see https://review.gluster.org/#/c/18267/ Change-Id: Id7ef8e0dedd28ada55f72c03d91facbe1c9888bd BUG: 1492849 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
* cluster/ec: FORWARD_NULL coverity fixSunil Kumar Acharya2017-11-011-1/+3
| | | | | | | | | | Problem: frame could be NULL. Solution: Added check to verify frame. BUG: 789278 Change-Id: I55a64c936ae71ec8587a3f9dfa0fdee5d0ea5213 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
* cluster/ec: FORWARD_NULL coverity fixSunil Kumar Acharya2017-11-012-0/+5
| | | | | | | | | | Problem: cbk could be NULL. Solution: Assigned appropriate value to cbk. BUG: 789278 Change-Id: I2e4bba9a54f965c6a7bccf0b0cb6c5f75399f6e6 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
* cluster/ec: MISSING_BREAK coverity fixSunil Kumar Acharya2017-10-311-0/+3
| | | | | | | | | | Problem: switch case syntax issue. Solution: syntax fixed. BUG: 789278 Change-Id: I76da72c3ab6ffc5db671686a71d6a596beaf496e Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
* cluster/afr: Honor default timeout of 5min for analyzing split-brain fileskarthik-us2017-10-301-1/+5
| | | | | | | | | | | | | | | | | | | Problem: After setting split-brain-choice option to analyze the file to resolve the split brain using the command "setfattr -n replica.split-brain-choice -v "choiceX" <path-to-file>" should allow to access the file from mount for default timeout of 5mins. But the timeout was not honored and was able to access the file even after the timeout. Fix: Call the inode_invalidate() in afr_set_split_brain_choice_cbk() so that it will triger the cache invalidate after resetting the timer and the split brain choice. So the next calls to access the file will fail with EIO. Change-Id: I698cb833676b22ff3e4c6daf8b883a0958f51a64 BUG: 1503519 Signed-off-by: karthik-us <ksubrahm@redhat.com>
* cluster/afr: Fail open on split-brainPranith Kumar K2017-10-2611-95/+208
| | | | | | | | | | | | | | | | | Problem: Append on a file with split-brain succeeds. Open is intercepted by open-behind, when write comes on the file, open-behind does open+write. Open succeeds because afr doesn't fail it. Then write succeeds because write-behind intercepts it. Flush is also intercepted by write-behind, so the application never gets to know that the write failed. Fix: Fail open on split-brain, so that when open-behind does open+write open fails which leads to write failure. Application will know about this failure. Change-Id: I4bff1c747c97bb2925d6987f4ced5f1ce75dbc15 BUG: 1294051 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/ec: Implement DISCARD FOP for ECSunil Kumar Acharya2017-10-255-48/+332
| | | | | | | | | | | Updates #254 This code change implements DISCARD FOP support for EC. BUG: 1461018 Change-Id: I09a9cb2aa9d91ec27add4f422dc9074af5b8b2db Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
* cluster/ec: Allow parallel writes in EC if possiblePranith Kumar K2017-10-248-142/+282
| | | | | | | | | | | | | | | | | | Problem: Ec at the moment sends one modification fop after another, so if some of the disks become slow, for a while then the wait time for the writes that are waiting in the queue becomes really bad. Fix: Allow parallel writes when possible. For this we need to make 3 changes. 1) Each fop now has range parameters they will be updating. 2) Xattrop is changed to handle parallel xattrop requests where some would be modifying just dirty xattr. 3) Fops that refer to size now take locks and update the locks. Fixes #251 Change-Id: Ibc3c15372f91bbd6fb617f0d99399b3149fa64b2 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/dht: fix crash when deleting directoriesZhang Huan2017-10-161-2/+4
| | | | | | | | | | | | | | | | | | | | | In DHT, after locks on all subvolumes are acquired, it would perform the following steps sequentially, 1. send remove dir on all other subvolumes except the hashed one in a loop; 2. wait for all pending rmdir to be done 3. remove dir on the hashed subvolume The problem is that in step 1 there is a check to skip hashed subvolume in the loop. If the last subvolume to check is actually the hashed one, and step 3 is quickly done before the last and hashed subvolume is checked, by accessing shared context data be destroyed in step 3, would cause a crash. Fix by saving shared data in a local variable to access later in the loop. Change-Id: I8db7cf7cb262d74efcb58eb00f02ea37df4be4e2 BUG: 1490642 Signed-off-by: Zhang Huan <zhanghuan@open-fs.com>
* cluster/ec: Coverity Fix UNUSED_VALUE in ec_create_nameKamal Mohanan2017-10-161-1/+1
| | | | | | | | | | | | | Problem: The value returned by cluster_mkdir is assigned to ret at ec-heal.c:1076. But this value is overwritten before it can be used. Solution: The return value of cluster_mkdir is ignored. It is not assigned to ret. Change-Id: Iee6b8d8b04e0bd800dd30d2c24cab755b9e63443 BUG: 789278 Signed-off-by: Kamal Mohanan <kmohanan@redhat.com>
* cluster/ec: Improve heal info command to handle obvious casesAshish Pandey2017-10-163-24/+41
| | | | | | | | | | | | | | | | | | | | | | | | | Problem: 1 - If a brick is down and we see an index entry in .glusterfs/indices, we should show it in heal info output as it most certainly needs heal. 2 - The first problem is also not getting handled after ec_heal_inspect. Even if in ec_heal_inspect, lookup will mark need_heal as true, we don't handle it properly in ec_get_heal_info and continue with locked inspect which takes lot of time. Solution: 1 - In first case we need not to do any further invstigation. As soon as we see that a brick is down, we should say that this index entry needs heal for sure. 2 - In second case, if we have need_heal as _gf_true after ec_heal_inspect, we should show it as heal requires. Change-Id: Ibe7f9d7602cc0b382ba53bddaf75a2a2c3326aa6 BUG: 1476668 Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* cluster/ec: add functions for stripe alignmentXavier Hernandez2017-10-136-47/+129
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes old functions to align offsets and sizes to stripe size boundaries and adds new ones to offer more possibilities. The new functions are: * ec_adjust_offset_down() Aligns a given offset to a multiple of the stripe size equal or smaller than the initial one. It returns the size of the gap between the aligned offset and the given one. * ec_adjust_offset_up() Aligns a given offset to a multiple of the stripe size equal or greater than the initial one. It returns the size of the skipped region between the given offset and the aligned one. If an overflow happens, the returned valid has negative sign (but correct value) and the offset is set to the maximum value (not aligned). * ec_adjust_size_down() Aligns the given size to a multiple of the stripe size equal or smaller than the initial one. It returns the size of the missed region between the aligned size and the given one. * ec_adjust_size_up() Aligns the given size to a multiple of the stripe size equal or greater than the initial one. It returns the size of the gap between the given size and the aligned one. If an overflow happens, the returned value has negative sign (but correct value) and the size is set to the maximum value (not aligned). These functions have been defined in ec-helpers.h as static inline since they are very small and compilers can optimize them (specially the 'scale' argument). Change-Id: I4c91009ad02f76c73772034dfde27ee1c78a80d7 Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
* dht: free memory allocated in dht_init() and dht_init_subvolumes()Niels de Vos2017-10-111-1/+17
| | | | | | | | | When glfs_fini() is called, DHT fails to free all memory allocations which result in a considerable leak. Change-Id: I37c6de5c93ca4516266dbe8288b4a416f5589901 BUG: 1443145 Signed-off-by: Niels de Vos <ndevos@redhat.com>
* cluster/dht: Don't store the entire uuid for subvolsN Balachandran2017-10-104-19/+40
| | | | | | | | | | | | Comparing the uuid string of the local node against that stored in the local_subvol information is inefficient, especially as it is done for every file to be migrated. The code has now been changed to set the value of info to 1 if the nodeuuid is that of the node making the comparison so this becomes an integer comparison. Change-Id: I7491d59caad3b71dbf5facc94dcde0cd53962775 BUG: 1451434 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/ec: Handle parallel get_size_versionPranith Kumar K2017-10-103-59/+102
| | | | | | Updates #251 Change-Id: I6244014dbc90af3239d63d75a064ae22ec12a054 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* afr: heal gfid as a part of entry healRavishankar N2017-10-094-67/+120
| | | | | | | | | | | | | | | | | | Problem: If a brick crashes after an entry (file or dir) is created but before gfid is assigned, the good bricks will have pending entry heal xattrs but the heal won't complete because afr_selfheal_recreate_entry() tries to create the entry again and it fails with EEXIST. Fix: We could have fixed posx_mknod/mkdir etc to assign the gfid if the file already exists but the right thing to do seems to be to trigger a lookup on the bad brick and let it heal the gfid instead of winding an mknod/mkdir in the first place. Change-Id: I82f76665a7541f1893ef8d847b78af6466aff1ff BUG: 1493415 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* cluster/ec: Improve performance with xattrop updateSunil Kumar Acharya2017-10-061-24/+102
| | | | | | | | | | | | Existing EC code updates the xattr on the subvolume in a sequential pattern resulting in very poor performance. With this fix EC now updates the xattr on the subvolume in parallel which improves the xattr update performance. BUG: 1445663 Change-Id: I3fc40d66db0b88875ca96a9fa01002ba386c0486 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
* xlator/cluster/afr:coverity Issue "UNUSED_VALUE" in afr_get_split_brain_statusSubha sree Mohankumar2017-10-051-1/+7
| | | | | | | | | | | Issue: Event value_overwrite:Overwriting previous write to "ret" with value "-1". Fix : An "If" condition is added to check the value of "ret". Change-Id: I7b6bd4f20f73fa85eb8a5169644e275c7b56af51 BUG: 789278 Signed-off-by: Subha sree Mohankumar <smohanku@redhat.com>
* cluster/dht : User xattrs are not healed after brick stop/startMohit Agrawal2017-10-047-111/+2077
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: In a distributed volume custom extended attribute value for a directory does not display correct value after stop/start or added newly brick. If any extended(acl) attribute value is set for a directory after stop/added the brick the attribute(user|acl|quota) value is not updated on brick after start the brick. Solution: First store hashed subvol or subvol(has internal xattr) on inode ctx and consider it as a MDS subvol.At the time of update custom xattr (user,quota,acl, selinux) on directory first check the mds from inode ctx, if mds is not present on inode ctx then throw EINVAL error to application otherwise set xattr on MDS subvol with internal xattr value of -1 and then try to update the attribute on other non MDS volumes also.If mds subvol is down in that case throw an error "Transport endpoint is not connected". In dht_dir_lookup_cbk| dht_revalidate_cbk|dht_discover_complete call dht_call_dir_xattr_heal to heal custom extended attribute. In case of gnfs server if hashed subvol has not found based on loc then wind a call on all subvol to update xattr. Fix: 1) Save MDS subvol on inode ctx 2) Check if mds subvol is present on inode ctx 3) If mds subvol is down then call unwind with error ENOTCONN and if it is up then set new xattr "GF_DHT_XATTR_MDS" to -1 and wind a call on other subvol. 4) If setxattr fop is successful on non-mds subvol then increment the value of internal xattr to +1 5) At the time of directory_lookup check the value of new xattr GF_DHT_XATTR_MDS 6) If value is not 0 in dht_lookup_dir_cbk(other cbk) functions then call heal function to heal user xattr 7) syncop_setxattr on hashed_subvol to reset the value of xattr to 0 if heal is successful on all subvol. Test : To reproduce the issue followed below steps 1) Create a distributed volume and create mount point 2) Create some directory from mount point mkdir tmp{1..5} 3) Kill any one brick from the volume 4) Set extended attribute from mount point on directory setfattr -n user.foo -v "abc" ./tmp{1..5} It will throw error " Transport End point is not connected " for those hashed subvol is down 5) Start volume with force option to start brick process 6) Execute getfattr command on mount point for directory 7) Check extended attribute on brick getfattr -n user.foo <volume-location>/tmp{1..5} It shows correct value for directories for those xattr fop were executed successfully. Note: The patch will resolve xattr healing problem only for fuse mount not for nfs mount. BUG: 1371806 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Change-Id: I4eb137eace24a8cb796712b742f1d177a65343d5
* cluster/afr: Make choose-local "reconfigurable"Krutika Dhananjay2017-09-301-0/+11
| | | | | | | | | | With this change, enabling choose-local (which means its state makes transition from "off" to "on") will be effective after the first gfid-lookup on "/" since volume-set was executed. Change-Id: Ibab292ba705d993b475cd0303fb3318211fb2500 BUG: 1480525 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* cluster/ec: FORWARD_NULL coverity fixAkarsha Rai2017-09-291-1/+1
| | | | | | | | | | Problem: ctx pointer could be NULL Solution: Updated the code to verify ctx pointer BUG: 789278 Change-Id: I25e07a07c6ebe2f630c99ba3aa9a61656fbaa981 Signed-off-by: Akarsha Rai <akrai@redhat.com>
* cluster/ec: FORWARD_NULL coverity fixAkarsha Rai2017-09-291-0/+1
| | | | | | | | | | | | | Problem: cbk could be NULL. Solution: Returning NULL when memory is not allocated for cbk. BUG: 789278 Change-Id: Iea9128e0f3b95100deca560f690f9baaae226abf Signed-off-by: Akarsha Rai <akrai@redhat.com>
* dht: fix a coverity error of type - UNREACHABLEKamal Mohanan2017-09-281-3/+0
| | | | | | | | | | | | | | Problem: Unreachable assignment statement at dht-rebalance.c:1040 Fix: Delete line dht-rebalance.c:1040. The goto statements at lines 1037 and 1031 are also deleted since both branches of the if statement finally go to the same immediately-following label anyway. Change-Id: I5f47ea99244cae2a0a9f2aec7284faadf2ea286a BUG: 789278 Signed-off-by: Kamal Mohanan <kmohanan@redhat.com>
* cluster/ec: FORWARD_NULL coverity fixAkarsha Rai2017-09-281-1/+3
| | | | | | | | | | Problem: Pool pointer could be NULL while destroying it. Solution: Verifying pointer before destroying it. BUG: 789278 Change-Id: I497d1310aa47cb749a4c992aa961bd4dfa23ee48 Signed-off-by: Akarsha Rai <akrai@redhat.com>
* afr: don't check for file size in afr_mark_source_sinks_if_file_emptyRavishankar N2017-09-271-6/+7
| | | | | | | | | ... for AFR_METADATA_TRANSACTION and just mark source and sinks if metadata is the same. Change-Id: I69e55d3c842c7636e3538d1b57bc4deca67bed05 BUG: 1491670 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* Coverity Issue Fix : CHECKED_RETURNSubha sree Mohankumar2017-09-261-1/+1
| | | | | | | | | | Issue :Event check_return: Calling "ec_dict_set_number" without checking return value. Fix : Type casted the return value of the function "ec_dict_set_number" to void. Change-Id: Id97034f9b1b8591536d63dca680ca7c7a9c4fcc3 BUG: 789278 Signed-off-by: Subha sree Mohankumar <smohanku@redhat.com>
* Fix a coverity error of checker type: CHECKED_RETURNKamal Mohanan2017-09-261-1/+1
| | | | | | | | | | | Problem: dht_frame_return was being called without checking the return value. Solution: Typecast the value returned by the function to void. Change-Id: Idfc6a7ed467d1c8f5f8d09ec26d9059f3d23b760 BUG: 789278 Signed-off-by: Kamal Mohanan <kmohanan@redhat.com>
* afr: auto-resolve split-brains for zero-byte filesRavishankar N2017-09-263-0/+78
| | | | | | | | | | | | | | | | | | | | | | | | Problems: As described in BZ 1491670, renaming hardlinks can result in data/mdata split-brain of the DHT link-to files (T files) without any mismatch of data and metadata. As described in BZ 1486063, for a zero-byte file with only dirty bits set, arbiter brick will likely be chosen as the source brick. Fix: For zero byte files in split-brain, pick first brick as a) data source if file size is zero on all bricks. b) metadata source if metadata is the same on all bricks In arbiter case, if file size is zero on all bricks and there are no pending afr xattrs, pick 1st brick as data source. Change-Id: I0270a9a2f97c3b21087e280bb890159b43975e04 BUG: 1491670 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reported-by: Rahul Hinduja <rhinduja@redhat.com> Reported-by: Mabi <mabi@protonmail.ch>
* cluster/afr: Sending subvol up/down events when subvol comes up or goes downkarthik-us2017-09-201-0/+2
| | | | | | Change-Id: I6580351b245d5f868e9ddc6a4eb4dd6afa3bb6ec BUG: 1493539 Signed-off-by: karthik-us <ksubrahm@redhat.com>
* cluster/ec: fix for BAD_SHIFT, follow-up patchKaleb S. KEITHLEY2017-09-201-11/+14
| | | | | | | | | | | | | | | | | | Address comments to https://review.gluster.org/18067, (Change-Id I86e15d12939c610c99f5f96c551bb870df20f4b4) Which was posted as an RFC as an example of a possible alternative fix to https://review.gluster.org/17860 (Change-Id I28a3bdd4a357526dba0cf84c262919c05cfa173e) An alternative fix that preserved the unsignedness of the indexes throughout, obviating the need to check its value before using it to shift. (shift by negative number is undefined, as is shift by more bits than in the type.) BUG: 1474309 Change-Id: I46fe9cec140d3397463780748f6876251acb06dd Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
* dht: add FOP check to dht_file_setattr_cbkRavishankar N2017-09-071-2/+2
| | | | | | | | | | | | | | | | | | | | | | Problem: bug-797171.7 loaded error-gen xlator on the brick which sent EBADF for a non fd-based fop, namely setattr. This caused dht_check_and_open_fd_on_subvol_task() to crash as local->fd was NULL. Fix: Call dht_check_and_open_fd_on_subvol_task() from dht_file_setattr_cbk only for dht_fsetattr and not dht_setattr or dht_setattr2 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Change-Id: Iab4999e213bf2065804f3f8237e470ad454e3c99 BUG: 1488399 Reviewed-on: https://review.gluster.org/18208 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Susant Palai <spalai@redhat.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: N Balachandran <nbalacha@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* afr: discover/lookup heal fixesRavishankar N2017-09-043-53/+53
| | | | | | | | | | | | Addresses review comments in commit 468ca877807625817b72921d1e9585036687b640 Change-Id: I04b1bd3b00abfd6758798d6272954e36a24249a9 BUG: 1473636 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: https://review.gluster.org/18187 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr: check validity of afr_replyRavishankar N2017-08-315-47/+71
| | | | | | | | | | | | | | | | | ...in various self-heal code paths. Originally found by Pranith in __afr_selfheal_name_impunge () Also change __afr_selfheal_assign_gfid() to send lookup only on those bricks that don't have a gfid matching that of the source. Change-Id: I70a2ccd750a2af92c5fc36e0eefb2b6125404b4a BUG: 1482923 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: https://review.gluster.org/18065 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/dht: Log files skipped by rebalanceN Balachandran2017-08-302-1/+19
| | | | | | | | | | | | | | | | | There was no easy way to find out which files were skipped during a rebalance. Rebalance now logs a message for every skipped file using msgid 109126, making it easier to find all files that were skipped. Change-Id: I2cac7db7285e2f82354251f3ea4094827b0daf3e BUG: 1480445 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: https://review.gluster.org/18021 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: hari gowtham <hari.gowtham005@gmail.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: Aggregate xattrs only for dirs in dht_discover_cbkN Balachandran2017-08-301-2/+11
| | | | | | | | | | | | | | | | | If dht_discover finds data files on more than one subvol, racing calls to dht_discover_cbk could end up calling dht_aggregate_xattr which could delete dictionary data that is being accessed by higher layer translators. Fixed to call dht_aggregate_xattr only for directories and consider only the first file to be found. Change-Id: I4f3d2a405ec735d4f1bb33a04b7255eb2d179f8a BUG: 1484709 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: https://review.gluster.org/18137 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* mem-pool: track glusterfs_ctx_t in struct mem_poolNiels de Vos2017-08-291-1/+1
| | | | | | | | | | | | | | | | | | | | In order to generate statedumps per glusterfs_ctx_t, it is needed to place all the memory pools in a structure that the context can reach. The 'struct mem_pool' has been extended with a 'list_head owner' that is linked with the glusterfs_ctx_t->mempool_list. All callers of mem_pool_new() have been updated to pass the current glusterfs_ctx_t along. This context is needed to add the new memory pool to the list and for grabbing the ctx->lock while updating the glusterfs_ctx_t->mempool_list. Updates: #307 Change-Id: Ia9384424d8d1630ef3efc9d5d523bf739c356c6e Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: https://review.gluster.org/18075 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* cluster/ec: coverity, fix for BAD_SHIFTKaleb S. KEITHLEY2017-08-283-13/+16
| | | | | | | | | | | | | | | | | This is how I would like to see this fixed. passes (eliminates the warning in) coverity. The use of uintptr_t as a bitmask is a problem IMO, especially on 32-bit clients. Change-Id: I86e15d12939c610c99f5f96c551bb870df20f4b4 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: https://review.gluster.org/18067 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* cluster/dht: Reorder dir operations in gf_defrag_fix_layoutN Balachandran2017-08-201-76/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | Earlier, rebalance performed a fix-layout on a directory before healing its subdirectories. If there were a lot of subdirs, it could take a while before all subdirs were created on the newly added bricks. As dht_readdirp only lists dirs from their hashed subvol, those dirs which hashed to the newly added bricks but were not yet created on them were not listed. Now, the child dirs are listed and processed before the layout of the parent is fixed. This introduces a change in behaviour where files in subdirs are migrated before those in parent directories. Credit: Shyam <srangana@redhat.com> Github issue: #239 Change-Id: I8ae7f24a510754cd8d1b31e5d608bcf1928599e2 BUG: 1248393 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: https://review.gluster.org/18045 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* afr: heal metadata in discover code pathRavishankar N2017-08-163-31/+60
| | | | | | | | | | | | | | | | | | | | During graph switch, if fuse sends nameless (gfid) lookups, afr takes the discover code path to serve it. If there are pending metadata heals, they do not happen unless an inode refresh happens as a part of discover (which is not guaranteed to happen always). This patch fixes it by attempting metadata heal as a part of discover, just like how it is done in lookup code path. Also removed creating superfluous heal frames when launching heal. Change-Id: I49868649361ebe5d70b6ea150f4686169b6c3070 BUG: 1473636 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: https://review.gluster.org/17850 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Karthik U S <ksubrahm@redhat.com>
* cluster/dht: EBADF handling for fremovexattr and fsetxattrN Balachandran2017-08-093-3/+44
| | | | | | | | | | | | | Add EBADF handling for dht_fremovexattr and dht_fsetxattr. Change-Id: Ide0d5812dae79655d2565157e5baabcd753b4309 BUG: 1476665 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: https://review.gluster.org/17999 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: Check for open fd only on EBADFN Balachandran2017-08-084-160/+160
| | | | | | | | | | | | | | | | | | | | | | | | | | DHT fd based fops used to check if the fd was open on the cached subvol before winding the call. However, this introduced a performance regression of about 30% for reads. This check was introduced to handle cases where files were migrated while IOs were happening. As this is not the common case, dht will now check if the fd is open on the cached subvol only if the call fails with EBADF. This will prevent a performance hit where a rebalance is not running. Change-Id: I2035a858d63c3fcd22bb634055bbb0ad01686808 BUG: 1476665 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: https://review.gluster.org/17976 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Susant Palai <spalai@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* afr: Prevent null gfids in self-heal entry re-creationRavishankar N2017-08-081-3/+11
| | | | | | | | | | | Change-Id: I5acb8bd0a19fc4e764d61e349bb690b5236ee610 BUG: 1478297 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: https://review.gluster.org/17981 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Karthik U S <ksubrahm@redhat.com>
* Revert "cluster/dht: Check for open fd only on EBADF"N Balachandran2017-08-042-116/+160
| | | | | | | | | | | | This reverts commit 91c9f4a19fde4894576b398252c77f730832a26a. This patch needs to be reworked. Change-Id: I4c24f647c2b1abc68fc4e9fe6eb810418e2033aa BUG: 1476665 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: https://review.gluster.org/17970 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/dht: Handle wrong rebalance status reportingSusant Palai2017-08-011-29/+32
| | | | | | | | | | | Change-Id: Id91ef35f890055cd42b9a94462f92297c77f1fff Bug: 1475282 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: https://review.gluster.org/17868 Tested-by: Raghavendra G <rgowdapp@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org>
* cluster/dht: rebalance min-free-disk fixSusant Palai2017-08-011-6/+12
| | | | | | | | | | | | | | | | | | | | | | | | To calculate available space on a subvolume we used to do the following in __dht_check_free_space. post_availspace = (dst_statfs.f_bavail * dst_statfs.f_frsize) - stbuf->ia_size Now to subtracting the file size from available space is tricky here. Sometime available space will be lesser than the file size and since all the participating members in calculation are unsigned int, the result is a large number (integer overflow). Solution: We do not need to subtract the file size from the space available, since fallocate would have reserved file size space already. Change-Id: I4f724358c44b9911933742ff3ff8d55b3dfda1cb BUG: 1475282 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: https://review.gluster.org/17876 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: N Balachandran <nbalacha@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/dht: Check for open fd only on EBADFN Balachandran2017-07-312-160/+116
| | | | | | | | | | | | | | | | | | DHT fd based fops will now check if the fd is open on the cached subvol only if the call fails with EBADF. This will improve performance for scenarios where a rebalance is not running which would be most of the time. Change-Id: Idfaeb8927af769c6110d07a165a0fe2307369239 BUG: 1476665 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: https://review.gluster.org/17922 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* ec/cluster: Update failure of fop on a brick properlyAshish Pandey2017-07-311-7/+16
| | | | | | | | | | | | | | | | | | | | | | | | Problem: In case of truncate, if writev or open fails on a brick, in some cases it does not mark the failure onlock->good_mask. This causes the update of size and version on all the bricks even if it has failed on one of the brick. That ultimately causes a data corruption. Solution: In callback of such writev and open calls, mark fop->good for parent too. Thanks Pranith Kumar K <pkarampu@redhat.com> for finding the root cause. Change-Id: I8a1da2888bff53b91a0d362b8c44fcdf658e7466 BUG: 1476205 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: https://review.gluster.org/17906 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/dht: change log level to debug for thread activitySusant Palai2017-07-261-13/+11
| | | | | | | | | | | | | | | | | Every time all the thread sleeps or wakes up, we log a message about that event. Sometime this can be noisy where the number of files eligible to be migrated are placed far away from each other. Moving the logs to DEBUG. Change-Id: I4dc2cc9fdf4f42d4001754532a5bc4aeb3f0f959 BUG: 1474639 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: https://review.gluster.org/17866 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: N Balachandran <nbalacha@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>