summaryrefslogtreecommitdiffstats
path: root/xlators/cluster
Commit message (Collapse)AuthorAgeFilesLines
* cluster/afr: Allow lookup on root if it is from ADD_REPLICA_MOUNTkarthik-us2018-12-187-30/+78
| | | | | | | | | | | | | | | | | | | | | Problem: When trying to convert a plain distribute volume to replica-3 or arbiter type it is failing with ENOTCONN error as the lookup on the root will fail as there is no quorum. Fix: Allow lookup on root if it is coming from the ADD_REPLICA_MOUNT which is used while adding bricks to a volume. It will try to set the pending xattrs for the newly added bricks to allow the heal to happen in the right direction and avoid data loss scenarios. Note: This fix will solve the problem of type conversion only in the case where the volume was mounted at least once. The conversion of non mounted volumes will still fail since the dht selfheal tries to set the directory layout will fail as they do that with the PID GF_CLIENT_PID_NO_ROOT_SQUASH set in the frame->root. Change-Id: Ic511939981dad118cc946754341318b164954b3b fixes: bz#1655854 Signed-off-by: karthik-us <ksubrahm@redhat.com>
* xlators/cluster/afr/src/afr-self-heal-common.c: remove a variable array.Yaniv Kaul2018-12-181-10/+6
| | | | | | | | | | | | | Added '-Wvla' and saw this - gcc doesn't like variable arrays. There are plenty of others in the EC code, but this seems OK to remove: there is no use for the array members (I hope - that was from reading the code). Compile-tested only! updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I350f4520e52b86c8bbcd60eea1b27ef99cd119aa
* Don't depend on string options to be valid alwaysPranith Kumar K2018-12-175-41/+56
| | | | | | updates bz#1650403 Change-Id: Ib5a11e691599ce4bd93c1ed5aca6060592893961 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* AFR xlator: use dict_{setn|getn|deln|get_int32n|set_int32n|set_strn}Yaniv Kaul2018-12-1712-203/+282
| | | | | | | | | | | | | | | | | | | | In a previous patch (https://review.gluster.org/20769) we've added the key length to be passed to dict_* funcs, to remove the need to strlen() it. This patch moves some xlators to use it. - In some cases, moved strlen() of the key length outside of locks, which is usually a good thing. Please verify it's safe to do so. - In some cases, created a prefix for the keys, replacing something like "%d-%d" with a "%s" in snprintf(). Not sure it adds value, but improves readability. Please review carefully. Compile-tested only! Change-Id: I04f2a1eb2ecfc3283d849d150d10d088ae7aa7f1 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* cluster/afr: Fix mem leak reported by ASANKotresh HR2018-12-171-4/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Traceback: Direct leak of 765 byte(s) in 9 object(s) allocated from: #0 0x7ffb9cad2c48 in malloc (/lib64/libasan.so.5+0xeec48) #1 0x7ffb9c5f8949 in __gf_malloc ./libglusterfs/src/mem-pool.c:136 #2 0x7ffb9c5f91bb in gf_vasprintf ./libglusterfs/src/mem-pool.c:236 #3 0x7ffb9c5f938a in gf_asprintf ./libglusterfs/src/mem-pool.c:256 #4 0x7ffb826714ab in afr_get_heal_info ./xlators/cluster/afr/src/afr-common.c:6204 #5 0x7ffb825765e5 in afr_handle_heal_xattrs ./xlators/cluster/afr/src/afr-inode-read.c:1481 #6 0x7ffb825765e5 in afr_getxattr ./xlators/cluster/afr/src/afr-inode-read.c:1571 #7 0x7ffb9c635af7 in syncop_getxattr ./libglusterfs/src/syncop.c:1680 #8 0x406c78 in glfsh_process_entries ./heal/src/glfs-heal.c:810 #9 0x408555 in glfsh_crawl_directory ./heal/src/glfs-heal.c:898 #10 0x408cc0 in glfsh_print_pending_heals_type ./heal/src/glfs-heal.c:970 #11 0x408fc5 in glfsh_print_pending_heals ./heal/src/glfs-heal.c:1012 #12 0x409546 in glfsh_gather_heal_info ./heal/src/glfs-heal.c:1154 #13 0x403e96 in main ./heal/src/glfs-heal.c:1745 #14 0x7ffb99bc411a in __libc_start_main ../csu/libc-start.c:308 The dictionary is referenced by caller to print the status. So set it as dynstr, the last unref of dictionary will free it. updates: bz#1633930 Change-Id: Ib5a7cb891e6f7d90560859aaf6239e52ff5477d0 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* dht: Fix clang warnings in dht-common.cShyamsundarR2018-12-161-20/+37
| | | | | | Change-Id: I0894d62edd68e13d123aaa5ca1827b98283f0d3e Updates: bz#1622665 Signed-off-by: ShyamsundarR <srangana@redhat.com>
* cluster/ec: NULL pointer deferencing clang fixSheetal Pamecha2018-12-141-1/+0
| | | | | | | | Removing VALIDATE_OR_GOTO check on "this" Change-Id: I154deaca5302b41c1cafd87077de880dd03ec613 Updates: bz#1622665 Signed-off-by: Sheetal Pamecha <sheetal.pamecha08@gmail.com>
* all: remove code which is not being considered in buildAmar Tumballi2018-12-1311-11078/+0
| | | | | | | | | | | | | | | | | | | | | | | | | These xlators are now removed from build as per discussion/announcement done at https://lists.gluster.org/pipermail/gluster-users/2018-July/034400.html * move rot-13 to playground, as it is used only as demo purpose, and is documented in many places. * Removed code of below xlators: - cluster/stripe - cluster/tier - features/changetimerecorder - features/glupy - performance/symlink-cache - encryption/crypt - storage/bd - experimental/posix2 - experimental/dht2 - experimental/fdl - experimental/jbr updates: bz#1635688 Change-Id: I1d2d63c32535e149bc8dcb2daa76236c707996e8 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* xlator: make 'xlator_api' mandatoryAmar Tumballi2018-12-137-47/+54
| | | | | | | | | | | | | | * Remove the options to load old symbol. * keep only 'xlator_api' symbol from being exported using xlator.sym * add xlator_api to all the xlators where its missing NOTE: This covers all the xlators which has at least a test case to validate its loading. If there is a translator, which doesn't have any test, then we should probably remove that from codebase. fixes: #164 Change-Id: Ibcdc8c9844cda6b4463d907a15813745d14c1ebb Signed-off-by: Amar Tumballi <amarts@redhat.com>
* afr: some minor itable related cleanupsRavishankar N2018-12-124-12/+17
| | | | | | | | | | | | - this->itable always needs to be allocated, hence move it outside afr_selfheal_daemon_init(). - Invoke afr_selfheal_daemon_init() only for self-heal daemon case. - remove redundant itable allocation in afr_discover(). - destroy itable in fini. Updates bz#1193929 Change-Id: Ib28b50b607386f5a5aa7d2f743c8b506ccb10eae Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* cluster/afr: Do not update read_subvol in inode_ctx after rename/link fopkarthik-us2018-12-121-1/+3
| | | | | | | | | | | Since rename/link fops on a file will not change any data in it, it should not update the read_subvol values in the inode_ctx, which interprets the data & metadata readable subvols for that file. The old read_subvol values should be retained even after the rename/link operations. Change-Id: I068044a426823a566f5bea8aa063cd689199d6dd fixes: bz#1657783 Signed-off-by: karthik-us <ksubrahm@redhat.com>
* afr: Resource leak coverity fixesBhumika Goyal2018-12-111-2/+13
| | | | | | | | | | | | | Problem reported by Coverity: Leak of memory or pointers to system resources. Deallocate the memory pointed to by xattr_serz as the memory reference is not stored anywhere. Fixes CID: 1124760, 124787, 1382418 Change-Id: Ib9c2ef28c52e2d43de2552cfd959a98b26272bc1 updates: bz#789278 Signed-off-by: Bhumika Goyal <bgoyal@redhat.com>
* all: add xlator_api to many translatorsAmar Tumballi2018-12-061-2/+17
| | | | | | Fixes: #164 Change-Id: I93ad6f0232a1dc534df099059f69951e1339086f Signed-off-by: Amar Tumballi <amarts@redhat.com>
* libglusterfs: Move devel headers under glusterfs directoryShyamsundarR2018-12-0563-214/+214
| | | | | | | | | | | | | | | | | | | | | | | | libglusterfs devel package headers are referenced in code using include semantics for a program, this while it works can be better especially when dealing with out of tree xlator builds or in general out of tree devel package usage. Towards this, the following changes are done, - moved all devel headers under a glusterfs directory - Included these headers using system header notation <> in all code outside of libglusterfs - Included these headers using own program notation "" within libglusterfs This change although big, is just moving around the headers and making it correct when including these headers from other sources. This helps us correctly include libglusterfs includes without namespace conflicts. Change-Id: Id2a98854e671a7ee5d73be44da5ba1a74252423b Updates: bz#1193929 Signed-off-by: ShyamsundarR <srangana@redhat.com>
* afr: assign gfid during name heal when no 'source' is present.Ravishankar N2018-12-034-52/+52
| | | | | | | | | | | | | | | | | | Problem: If parent dir is in split-brain or has dirty xattrs set, and the file has gfid missing on one of the bricks, then name heal won't assign the gfid. Fix: Use the brick we select the gfid from as the 'source'. Note: Problem was found while trying to debug a split-brain issue on Cynthia Zhou's setup. updates: bz#1637249 Change-Id: Id088d4f0fb017aa35122de426654194e581ed742 Reported-by: Cynthia Zhou <cynthia.zhou@nokia-sbell.com> Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* Multiple xlator .h files: remove unused private gf_* memory types.Yaniv Kaul2018-11-303-21/+1
| | | | | | | | | | | | | It seems there were quite a few unused enums (that in turn cause unndeeded memory allocation) in some xlators. I've removed them, hopefully not causing any damage. Compile-tested only! updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I8252bd763dc1506e2d922496d896cd2fc0886ea7
* libglusterfs: rename macros roof and floor to not conflict with math.hRaghavendra Gowdappa2018-11-281-13/+14
| | | | | | Change-Id: I666eeb63ebd000711b3f793b948d4e0c04b1a242 Signed-off-by: Raghavendra Gowdappa <rgowdapp@redhat.com> Updates: bz#1644629
* dht: fix buffer overflowSusant Palai2018-11-231-6/+35
| | | | | | | | CID: 1382461 Change-Id: I25b5edf7fd5fdaa52079d0348ebb7f5de9f11503 updates: bz#789278 Signed-off-by: Susant Palai <spalai@redhat.com>
* cluster/dht: sync brick root perms on add brickN Balachandran2018-11-191-16/+9
| | | | | | | | | | | | | | | | | If a single brick is added to the volume and the newly added brick is the first to respond to a dht_revalidate call, its stbuf will not be merged into local->stbuf as the brick does not yet have a layout. The is_permission_different check therefore fails to detect that an attr heal is required as it only considers the stbuf values from existing bricks. To fix this, merge all stbuf values into local->stbuf and use local->prebuf to store the correct directory attributes. Change-Id: Ic9e8b04a1ab9ed1248b6b056e3450bbafe32e1bc fixes: bz#1648298 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* afr: open_ftruncate_cbk should read fd from local->cont.open structSoumya Koduri2018-11-151-2/+2
| | | | | | | | | | | | afr_open stores the fd as part of its local->cont.open struct but when it calls ftruncate (if open flags contain O_TRUNC), the corresponding cbk function (afr_ open_ftruncate_cbk) is incorrectly referencing uninitialized local->fd. This patch fixes the same. Change-Id: Icbdedbd1b8cfea11d8f41b6e5c4cb4b44d989aba updates: bz#1648687 Signed-off-by: Soumya Koduri <skoduri@redhat.com>
* core: fix strncpy warningsKaleb S. KEITHLE2018-11-151-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since gcc-8.2.x (fedora-28 or so) gcc has been emitting warnings about buggy use of strncpy. Most uses that gcc warns about in our sources are exactly backwards; the 'limit' or len is the strlen/size of the _source param_, giving exactly zero protection against overruns. (Which was, after all, one of the points of using strncpy in the first place.) IOW, many warnings are about uses that look approximately like this: ... char dest[8]; char src[] = "this is a string longer than eight chars"; ... strncpy (dest, src, sizeof(src)); /* boom */ ... The len/limit should be sizeof(dest). Note: the above example has a definite over-run. In our source the overrun is typically only theoretical (but possibly exploitable.) Also strncpy doesn't null-terminate on truncation; snprintf does; prefer snprintf over strncpy. Mildly surprising that coverity doesn't warn/isn't warning about this. Change-Id: I022d5c6346a751e181ad44d9a099531c1172626e updates: bz#1193929 Signed-off-by: Kaleb S. KEITHLE <kkeithle@redhat.com>
* cluster/afr: s/uuid_is_null/gf_uuid_is_nullPranith Kumar K2018-11-071-1/+1
| | | | | | Updates bz#1193929 Change-Id: I1b312dabffac7e101df8ce15557527fd28a2c61f Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* md-cache: request cached xattrs at getxattr/fgetxattrKinglong Mee2018-11-061-1/+8
| | | | | | Change-Id: I8e3ad961164815683776850e3a5fd4f510003690 Updates: bz#1634220 Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
* all: fix the format string exceptionsAmar Tumballi2018-11-053-12/+12
| | | | | | | | | | | | | | | | Currently, there are possibilities in few places, where a user-controlled (like filename, program parameter etc) string can be passed as 'fmt' for printf(), which can lead to segfault, if the user's string contains '%s', '%d' in it. While fixing it, makes sense to make the explicit check for such issues across the codebase, by making the format call properly. Fixes: CVE-2018-14661 Fixes: bz#1644763 Change-Id: Ib547293f2d9eb618594cbff0df3b9c800e88bde4 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* dht: fix use after free in dht_rmdir_readdirp_cbkKinglong Mee2018-11-051-8/+11
| | | | | | | | | The frame is freed when linkfile exist in dht_rmdir_is_subvol_empty(), the following message use the freed local. Change-Id: I41191e8bd477f031a2444d5f15e578dc4f086e6b Updates: bz#1640489 Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
* tiering: remove the translator from build and glusterdAmar Tumballi2018-11-022-27/+3
| | | | | | | | | | | | | Based on the proposal to remove few features as they are not actively maintained [1], removing tier translator from the build. Also make sure there are no regression tests involving tiering feature are present. [1] https://lists.gluster.org/pipermail/gluster-users/2018-July/034400.html Change-Id: I2c177f711f9b54b7b24e1a13525ff3132bd9a9c5 updates: bz#1642807 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* cluster/ec: prevent infinite loop in self-heal fullXavi Hernandez2018-10-311-5/+6
| | | | | | | | | | | | | | | | | | | | | | | There was a problem in commit 7f81067 that caused infinite loop when full heal was triggered. The previous commit was made to prevent self-heal to go idle after a replace brick operation. One of the changes consisted on setting a flag to force an immediate scan of the dirty directory if a heal on a directory succeeded (assuming it could have generated newer entries). However that change was causing an issue with a full self-heal, since every time an already healed directory was checked and it returned suceessfully, it was also setting the flag, forcing self-heal to start over again. This patch fixes this issue by only setting the flag if the heal is not full. It's assumed that a full self-heal will already traverse all entries automatically, so there's no need to force a new scan later. Change-Id: Id12dbfc04e622b18183e796cc6cc87ccc30a6d55 fixes: bz#1636631 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* cluster/dht: NULL pointer dereferencing clang fixHarpreet Lalwani2018-10-313-44/+32
| | | | | | | | | | | | | Dereferencing NUll pointers this,local and stbuf. 1.Replaced this->name with "dht". 2.Removed GF_VALIDATE_OR_GOTO. 3.Removed the check for "stbuf" and "this". Updates: bz#1622665 Change-Id: Id2fb2270d5ec37b76fa2aae1f1c8dca72dcc728a Signed-off-by: Harpreet Lalwani <hlalwani@redhat.com>
* cluster/ec: Change log level to DEBUG for lookup combineAshish Pandey2018-10-311-1/+1
| | | | | | | | | | | | | As lookup is not a locked fop, we can not trust the data received in this to be same. Changing the log level to DEBUG in case lookup finds any difference. Change-Id: I39499c44688a2455c7c6c69a798762d045d21b39 updates: bz#1640066 BUG: 1640066 Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* stripe: remove the translator from build and glusterdAmar Tumballi2018-10-311-1/+1
| | | | | | | | | | | | | | | | Based on the proposal to remove few features as they are not actively maintained [1], removing stripe translator from the build. Also make sure there are no regression tests involving stripe translator. [1] https://lists.gluster.org/pipermail/gluster-users/2018-July/034400.html Note that this patch aims at removing the translator from build, and a followup patch is needed to remove the code from repository. Updates: bz#1364707 Change-Id: I235b305338f138e29e9f30cba65bc0dadbebbbd5 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* afr/lease: Read child nodes from lease structureroot2018-10-251-1/+1
| | | | | | | | | | For lease operation, we allocate and store child nodes data in lease structure. Use the same in afr_lease_cbk() while checking for the quorum. Change-Id: If1fdd5a0798888afd39ad3df57d96487baf9d1e6 updates: #350 Signed-off-by: Soumya Koduri <skoduri@redhat.com>
* afr: thin-arbiter 2 domain locking and in-memory stateRavishankar N2018-10-256-76/+679
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 2 domain locking + xattrop for write-txn failures: -------------------------------------------------- - A post-op wound on TA takes AFR_TA_DOM_NOTIFY range lock and AFR_TA_DOM_MODIFY full lock, does xattrop on TA and releases AFR_TA_DOM_MODIFY lock and stores in-memory which brick is bad. - All further write txn failures are handled based on this in-memory value without querying the TA. - When shd heals the files, it does so by requesting full lock on AFR_TA_DOM_NOTIFY domain. Client uses this as a cue (via upcall), releases AFR_TA_DOM_NOTIFY range lock and invalidates its in-memory notion of which brick is bad. The next write txn failure is wound on TA to again update the in-memory state. - Any incomplete write txns before the AFR_TA_DOM_NOTIFY upcall release request is got is completed before the lock is released. - Any write txns got after the release request are maintained in a ta_waitq. - After the release is complete, the ta_waitq elements are spliced to a separate queue which is then processed one by one. - For fops that come in parallel when the in-memory bad brick is still unknown, only one is wound to TA on wire. The other ones are maintained in a ta_onwireq which is then processed after we get the response from TA. Change-Id: I32c7b61a61776663601ab0040e2f0767eca1fd64 updates: bz#1579788 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* cluster/afr : Check for UP bricks before starting healAshish Pandey2018-10-243-1/+19
| | | | | | | | | | | | | | | | | | | | | | | Problem: Currently for replica volume, even if only one brick is UP SHD will keep crawling index entries even if it can not heal anything. In thin-arbiter volume which is also a replica 2 volume, this causes inode lock contention which in turn sends upcall to all the clients to release notify locks, even if it can not do anything for healing. This will slow down the client performance and kills the purpose of keeping in memory information about bad brick. Solution: Before starting heal or even crawling, check if sufficient number of children are UP and available to check and heal entries. Change-Id: I011c9da3b37cae275f791affd56b8f1c1ac9255d updates: bz#1640581 Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* cluster/ec: NULL pointer deferencing clang fixSheetal Pamecha2018-10-191-2/+0
| | | | | | | | | Removing VALIDATE_OR_GOTO check on "this" Updates: bz#1622665 Change-Id: Ic7cffbb697da814f835d0ad46e25256da6afb406 Signed-off-by: Sheetal Pamecha <sheetal.pamecha08@gmail.com>
* cluster/dht: fixes to unlinking invalid linkto fileRaghavendra Gowdappa2018-10-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If unlinking of an invalid linkto file failed in lookup-everywhere codepath, lookup was failed with EIO. The rational as per the comment was, <snip> /* When dht_lookup_everywhere is performed, one cached *and one hashed file was found and hashed file does *not point to the above mentioned cached node. So it *was considered as stale and an unlink was performed. *But unlink fails. So may be rebalance is in progress. *now ideally we have two data-files. One obtained during *lookup_everywhere and one where unlink-failed. So *at this point in time we cannot decide which one to *choose because there are chances of first cached *file is truncated after rebalance and if it is chosen *as cached node, application will fail. So return EIO. */ </snip> However, this reasoning is only valid when * op_errno is EBUSY, indicating rebalance is in progress * op_errno is ENOTCONN as wecannot determine what was the status of file on brick. Hence this patch doesn't fail lookup unless unlink fails with an either EBUSY or ENOTCONN Change-Id: Ife55f3d97fe557f3db05beae0c2d786df31e8e55 Fixes: bz#1635145 Signed-off-by: Raghavendra Gowdappa <rgowdapp@redhat.com>
* libglusterfs/dict: Add sizeof()-1 variants of dict functionsPranith Kumar K2018-10-151-4/+4
| | | | | | | | | | | One needs to be very careful about giving same key for the key and SLEN(key) arguments in dict_xxxn() functions. Writing macros that would take care of passing the SLEN(key) would help reduce this burden on the developer and reviewer. updates: bz#1193929 Change-Id: I312c479b919826570b47ae2c219c53e2f9b2ddef Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/dht : Fix coverity issueAshish Pandey2018-10-111-2/+2
| | | | | | | | | | | | To check if the gfid is null or not we should use function gf_uuid_is_null CID: 1382364 https://scan6.coverity.com/reports.htm#v42607/p10714/fileInstanceId=86243257&defectInstanceId=26374360&mergedDefectId=1382364 Change-Id: I81944b823c9ee6e6dcc9695f64f7e5966e0d7030 updates: bz#789278 Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* afr: prevent winding inodelks twice for arbiter volumesRavishankar N2018-10-101-1/+1
| | | | | | | | | | | | | | | | | | Problem: In an arbiter volume, if there is a pending data heal of a file only on arbiter brick, self-heal takes inodelks twice due to a code-bug but unlocks it only once, leaving behind a stale lock on the brick. This causes the next write to the file to hang. Fix: Fix the code-bug to take lock only once. This bug was introduced master with commit eb472d82a083883335bc494b87ea175ac43471ff Thanks to Pranith Kumar K <pkarampu@redhat.com> for finding the RCA. fixes: bz#1637802 Change-Id: I15ad969e10a6a3c4bd255e2948b6be6dcddc61e1 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* all: fix warnings on non 64-bits architecturesXavi Hernandez2018-10-1020-126/+142
| | | | | | | | | | When compiling in other architectures there appear many warnings. Some of them are actual problems that prevent gluster to work correctly on those architectures. Change-Id: Icdc7107a2bc2da662903c51910beddb84bdf03c0 fixes: bz#1632717 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* dht: coverity fixesSusant Palai2018-10-054-35/+30
| | | | | | | | | | CID: 1356541 Issue: Dereference null return value CID: 1382411 Issue: Dereference after null check CID: 1391409 Issue: Unchecked return value Change-Id: Id3d4feb4e88df424003cc8e8a1540e77bbe030e3 Updates: bz#789278 Signed-off-by: Susant Palai <spalai@redhat.com>
* dht: volume_options 'options' collision with nfs-ganesha's 'options'Kaleb S. KEITHLEY2018-10-044-7/+11
| | | | | | | | | | | | | | | | | When dht was converted to xlator_api, the variable 'options' was not changed to dht_options, the same as was done to all the other xlators that were converted to xlator_api. Thus the reference to 'options' in dht.c is not resolved until runtime, and the RTlinker's search path starts with symbols in the executable, i.e. ganesha.nfsd's 'options'. (Which is obviously not the right one.) The unused extern references to 'options' (now dht_options) in nufa.c and switch.c is curious. Change-Id: Idf4a5d5fbd39aadfa5a4b529bceea65a3cbdf8f3 updates: bz#1193929 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
* Quota related files: use dict_{setn|getn|deln|get_int32n|set_int32n|set_strn}Yaniv Kaul2018-09-262-2/+3
| | | | | | | | | | | | | | In a previous patch (https://review.gluster.org/20769) we've added the key length to be passed to dict_* funcs, to remove the need to strlen() it. This patch moves some code to use it. Please review carefully. Compile-tested only! updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: If4f425a9827be7c36ccfbb9761006ae824a818c6
* cluster/ec: variable-length array (VLA) declaration clang fixSheetal Pamecha2018-09-261-1/+1
| | | | | | | | | | | Problem: variable-length array size becomes zero Modified array size to size+1 while declaring Updates: bz#1622665 Change-Id: I98ee8447c87f37c36c49f50058292e8c1757a1f9 Signed-off-by: Sheetal Pamecha <sheetal.pamecha08@gmail.com>
* dht: Coverity fixesSusant Palai2018-09-242-2/+3
| | | | | | | | | | | CID: 1274236 Issue: Logically dead code (op_errno will never be -1) CID: 1351652 Issue: Dereference after null check. (local->fd is dereferenced anyway and it should not be NULL ever for dht_readdirp_cbk.) Change-Id: Ied9c5f5b72536be1ca944237165acdc62b792e58 updates: bz#789278 Signed-off-by: Susant Palai <spalai@redhat.com>
* afr: fix incorrect reporting of directory split-brainRavishankar N2018-09-217-16/+22
| | | | | | | | | | | | | | | | | Problem: When a directory has dirty xattrs due to failed post-ops or when replace/reset brick is performed, AFR does a conservative merge as expected, but heal-info reports it as split-brain because there are no clear sources. Fix: Modify pending flag to contain information about pending heals and split-brains. For directories, if spit-brain flag is not set,just show them as needing heal and not being in split-brain. Fixes: bz#1626994 Change-Id: I09ef821f6887c87d315ae99e6b1de05103cd9383 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* cluster/afr: Make data eager-lock decision based on number of locksPranith Kumar K2018-09-213-6/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For both Virt and block workloads the file is opened multiple times leading to dynamically setting eager-lock to off for the workload. Instead of depending on the number-of-open-fds, if we change the logic to depend on number of inodelks, then it will give better performance than the earlier logic. When there is an eager-lock and number of inodelks is more than 1 we know that there is a conflicting lock, so depend on that information to decide whether to keep the current transaction go through delayed-post-op or not. Locks xlator doesn't have implementation to query number of locks in fxattrop in releases older than 3.10 so to keep things backward compatible in 3.12, data transactions will use new logic where as fxattrop transactions will use old logic. I am planning to send one more patch which makes metadata domain locks also depend on inodelk-count Profile info for a dd of 500MB to a file with another fd opened on the file using exec 250>filename Without this patch: 0.14 67.41 us 16.72 us 3870.82 us 892 FINODELK 0.59 279.87 us 95.71 us 2085.89 us 898 FXATTROP 3.46 366.43 us 81.75 us 6952.79 us 4000 WRITE 95.79 148733.99 us 50568.12 us 919127.86 us 273 FSYNC With this patch: 0.00 51.01 us 38.07 us 80.16 us 4 FINODELK 0.00 235.43 us 235.43 us 235.43 us 1 TRUNCATE 0.00 125.07 us 56.80 us 193.33 us 2 GETXATTR 0.00 135.86 us 62.13 us 209.59 us 2 INODELK 0.00 197.88 us 155.39 us 253.90 us 4 FXATTROP 0.00 450.59 us 394.28 us 506.89 us 2 XATTROP 0.00 56.96 us 19.06 us 406.59 us 23 FLUSH 37.81 273648.93 us 48.43 us 6017657.05 us 44 LOOKUP 62.18 4951.86 us 93.80 us 1143154.75 us 3999 WRITE postgresql benchmark performance changed from ~1130 TPS to ~2300TPS randio fio job inside Ovirt based VM went from ~600IOPs to ~2000IOPS fixes bz#1630368 Change-Id: If7f7388d2f08cf7f17ca517a4ea222560661dc36 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* dht: Operate internal fops with negative pidSusant Palai2018-09-201-0/+1
| | | | | | | | | | | | | | | | | | | With root-squash on, all root credentials are converted to a random uid, gid(65535). And ideally this does not carry the necessary permission bits to carry out the operation. But posix-acl will allow operations from this inode as long as its ctx has the ngroup information and ngroup has the owner group information. The problem we ran into recently was somehow posix-acl xlator did not cache the ngroup info and some of the dht internal fops(layout setxattr) failed with root-squash enabled. DHT internal fops now use a negative pid to pretend that the operation is from an internal client so posix-acl allows them to pass Change-Id: I5bb8d068389bf4c94629d668a16015a95ccb53ab fixes: bz#1624796 Signed-off-by: Susant Palai <spalai@redhat.com>
* cluster/afr: Use 2 domain locking in SHD for thin-arbiterkarthik-us2018-09-203-91/+162
| | | | | | | | | | | | | | | | | | | | With this change when SHD starts the index crawl it requests all the clients to release the AFR_TA_DOM_NOTIFY lock so that clients will know the in memory state is no more valid and any new operations needs to query the thin-arbiter if required. When SHD completes healing all the files without any failure, it will again take the AFR_TA_DOM_NOTIFY lock and gets the xattrs on TA to see whether there are any new failures happened by that time. If there are new failures marked on TA, SHD will start the crawl immediately to heal those failures as well. If there are no new failures, then SHD will take the AFR_TA_DOM_MODIFY lock and unsets the xattrs on TA, so that both the data bricks will be considered as good there after. Change-Id: I037b89a0823648f314580ba0716d877bd5ddb1f1 fixes: bz#1579788 Signed-off-by: karthik-us <ksubrahm@redhat.com>
* cluster/afr: Batch writes in same lock even when multiple fds are openPranith Kumar K2018-09-191-9/+2
| | | | | | | | | | | | | | | | | | | | | | Problem: When eager-lock is disabled because of multiple-fds opened and app writes come on conflicting regions, the number of locks grows very fast leading to all the CPU being spent just in locking and unlocking by traversing huge queues in locks xlator for granting locks. Fix: Reduce the number of locks in transit by bundling the writes in the same lock and disable delayed piggy-pack when we learn that multiple fds are open on the file. This will reduce the size of queues in the locks xlator. This also reduces the number of network calls like inodelk/fxattrop. Please note that this problem can still happen if eager-lock is disabled as the writes will not be bundled in the same lock. fixes bz#1625961 Change-Id: I8fd1cf229aed54ce5abd4e6226351a039924dd91 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* dht: utilize the framework to pass-through xlator tasksAmar Tumballi2018-09-197-27/+231
| | | | | | | | | | | | | | | | | | | | | Also fixes the issue caused due to not converting back the fn function to after getting its address. We wanted the value of the field, not the address of the pt_fop field. With this patch, DHT will always be started in pass-through mode if the number of subvols is just 1. Fixes some tests to make sure DHT is in full config (ie, subvols > 1). - increased timeout of brick-mux test as it was bordering on 300 seconds. - Also change the volume type to supported 'replica 3' from 'replica 2'. - also no DHT tests should assume presence of DHT when there is just 1 brick in volume Credits: Nithya B <nbalacha@redhat.com> fixes: #405 Change-Id: I8e55239ce58d6ac6ae1901e2e384be1ecbd33d6e Signed-off-by: Amar Tumballi <amarts@redhat.com>