summaryrefslogtreecommitdiffstats
path: root/xlators/cluster
Commit message (Collapse)AuthorAgeFilesLines
* xlators/cluster/afr/src/afr-inode-read.c: move to GF_MALLOC() instead of ↵Yaniv Kaul2018-09-041-1/+1
| | | | | | | | | | | | | | | | | | GF_CALLOC() when It doesn't make sense to calloc (allocate and clear) memory when the code right away fills that memory with data. It may be optimized by the compiler, or have a microscopic performance improvement. Please review carefully, especially for string allocation, with the terminating NULL string. Only compile-tested! Change-Id: Ief156de98769fea852553044a398a309e831754b updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* cluster/afr: Delegate name-heal when possiblePranith Kumar K2018-09-042-27/+85
| | | | | | | | | | | | | | | | | | | | | | | | | Problem: When name-self-heal is triggered on the mount, it blocks lookup until name-self-heal completes. But that can lead to hangs when lot of clients are accessing a directory which needs name heal and all of them trigger heals waiting for other clients to complete heal. Fix: When a name-heal is needed but quorum number of names have the file and pending xattrs exist on the parent, then better to delegate the heal to SHD which will be completed as part of entry-heal of the parent directory. We could also do the same for quorum-number of names not present but we don't have any known use-case where this is a frequent occurrence so not changing that part at the moment. When there is a gfid mismatch or missing gfid it is important to complete the heal so that next rename doesn't assume everything is fine and perform a rename etc fixes bz#1622821 Change-Id: I8b002c85dffc6eb6f2833e742684a233daefeb2c Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/dht: In rename, unlink after creating linkto fileN Balachandran2018-09-031-122/+132
| | | | | | | | | | | | | | The linkto file creation for the dst was done in parallel with the unlink of the old src linkto. If these operations reached the brick out of order, we end up with a dst linkto file without a .glusterfs handle. Fixed by the unlinking only after the linkto file creation has completed. Change-Id: I4246f7655f5bc180f5ded7fd34d263b7828a8110 fixes: bz#1621981 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/ec: Fix Coverity issueAshish Pandey2018-08-312-42/+0
| | | | | | | | | | | | | | | | | | | | | | | | Fix following coverity issues- CID: 1382378 1382459 https://scan6.coverity.com/reports.htm#v42607/p10714/fileInstanceId=85091670&defectInstanceId=25915064&mergedDefectId=1382459 https://scan6.coverity.com/reports.htm#v42607/p10714/fileInstanceId=85091670&defectInstanceId=25915063&mergedDefectId=1382378 Problem: ASSERT_LOCAL(this, healer) function is supposed to get the local healer so that we can take advantage of it while healing and reading data. However, we are not using healer->local anywhere. Also, this is not as useful in context of EC as it is in AFR. In EC we have to raed fragments from 4 bricks to heal a bad fragment on other brick. Change-Id: Iea8ce127ea02cc84e3823cb2be82a47872217b33 updates: bz#789278 Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* xlators/cluster/dht/src/dht-common.c: simplify some if statementsYaniv Kaul2018-08-311-48/+52
| | | | | | | | | | | | Use goto when some vars are not set to simplify long if statements. Please review logic has not changed! Compile-tested only! updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I45ea2e906d0ccb468af5e1fa65db008edb00d734
* build: add --enable-asan configure optionsNiels de Vos2018-08-301-1/+1
| | | | | | | | | | | | | | Introduce a `./configure --enable-asan` to build with `-fsanitize=address -fno-omit-frame-pointer` options. This uses the libasan.so shared library, so that needs to be available. While running builds with the ASAN options, several linker issues surfaced and these have been addressed with this change as well. Building with --enable-asan has been tested on Fedora 28. Change-Id: I428a9da70dd8f7d0056cfbe5c398619a571469b2 Updates: #492 Signed-off-by: Niels de Vos <ndevos@redhat.com>
* multiple files: remove unndeeded memset()Yaniv Kaul2018-08-293-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a squash of multiple commits: contrib/fuse-lib/misc.c: remove unneeded memset() All flock variables are properly set, no need to memset it. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I8e0512c5a88daadb0e587f545fdb9b32ca8858a2 libglusterfs/src/{client_t|fd|inode|stack}.c: remove some memset() I don't think there's a need for any of them. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I2be9ccc3a5cb5da51a92af73488cdabd1c527f59 libglusterfs/src/xlator.c: remove unneeded memset() All xl->mem_acct members are properly set, no need to memset it. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I7f264cd47e7a06255a3f3943c583de77ae8e3147 xlators/cluster/afr/src/afr-self-heal-common.c: remove unneeded memset() Since we are going over the whole array anyway, initialize it properly, to either 1 or 0. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: Ied4210388976b6a7a2e91cc3de334534d6fef201 xlators/cluster/dht/src/dht-common.c: remove unneeded memset() Since we are going over the whole array anyway it is initialized properly. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: Idc436d2bd0563b6582908d7cbebf9dbc66a42c9a xlators/cluster/ec/src/ec-helpers.c: remove unneeded memset() Since we are going over the whole array anyway it is initialized properly. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I81bf971f7fcecb4599e807d37f426f55711978fa xlators/mgmt/glusterd/src/glusterd-volgen.c: remove some memset() I don't think there's a need for any of them. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I476ea59ba53546b5153c269692cd5383da81ce2d xlators/mgmt/glusterd/src/glusterd-geo-rep.c: read() in 4K blocks The current 1K seems small. 4K is usually better (in Linux). Also remove a memset() that I don't think is needed between reads. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I5fb7950c92d282948376db14919ad12e589eac2b xlators/storage/posix/src/posix-{gfid-path|inode-fd-ops}.c: remove memset() before sys_*xattr() functions. I don't see a reason to memset the array sent to the functions sys_llistxattr(), sys_lgetxattr(), sys_lgetxattr(), sys_flistxattr(), sys_fgetxattr(). (Note: it's unclear to me why we are calling sys_*txattr() functions with XATTR_VAL_BUF_SIZE-1 size instead of XATTR_VAL_BUF_SIZE ). Only compile-tested! Change-Id: Ief2103b56ba6c71e40ed343a93684eef6b771346 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* cluster/afr: Coverity fixes in afrkarthik-us2018-08-292-3/+5
| | | | | | | | | | | | | | | Fixes the deadcode issue in "afr-common.c" and null pointer dereference isse in "afr-dir-read.c". CIDs: 1395160, 1389018 Scan details: https://scan6.coverity.com/reports.htm#v42418/p10714/fileInstanceId=85017760&defectInstanceId=25877740&mergedDefectId=1395160 https://scan6.coverity.com/reports.htm#v42418/p10714/fileInstanceId=85017734&defectInstanceId=25877951&mergedDefectId=1389018 Change-Id: I65dff57305aa3ae43544be5353f801d761193e97 updates: bz#789278 Signed-off-by: karthik-us <ksubrahm@redhat.com>
* multiple files: move from strlen() to sizeof()Yaniv Kaul2018-08-293-10/+10
| | | | | | | | | | | | | | | {glusterfsd|glusterfsd-mgmt|quota-common-utils|xlator|tier|stripe}.c tools/setgfid2path/src/main.c xlators/cluster/afr/src/afr-inode-read.c {glusterfs-acl|glusterfs}.h For const strings, just do compile time size calc instead of runtime. Compile-tested only! Change-Id: I303684b1ff29b05c10126fb1057f507e404ced07 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* cluster/afr: Delegate metadata heal with pending xattrs to SHDPranith Kumar K2018-08-283-38/+47
| | | | | | | | | | | | | | | | | | | Problem: When metadata-self-heal is triggered on the mount, it blocks lookup until metadata-self-heal completes. But that can lead to hangs when lot of clients are accessing a directory which needs metadata heal and all of them trigger heals waiting for other clients to complete heal. Fix: Only when the heal is needed but the pending xattrs are not set, trigger metadata heal that could block lookup. This is the only case where different clients may give different metadata to the clients without heals, which should be avoided. Updates bz#1622821 Change-Id: I6089e9fda0770a83fb287941b229c882711f4e66 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/ec: Prevent a possible out-of-bounds readVijay Bellur2018-08-271-0/+1
| | | | | | | | | | | Addresses CID 1370939 In ec_code_x64_epilog(), there is a possibility of reading from an incorrect index of ec_code_x64_regmap array Change-Id: Ib8a228bbe13631188343634b2bde5919cdaab5a4 Updates: bz#789278 Signed-off-by: Vijay Bellur <vbellur@redhat.com>
* multiple files: move from strlen() to sizeof()Yaniv Kaul2018-08-254-8/+8
| | | | | | | | | | | | {ec-heal|ec-combine|ec-helpers|ec-inode-read}.c For const strings, just do compile time size calc instead of runtime. Compile-tested only! Change-Id: If92ba0a7a20f64b898d01c6e3b6708190ca93e04 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* stripe: FORWARD_NULL coverity fixShwetha Acharya2018-08-241-6/+9
| | | | | | | | | | | | Problem: frame could be NULL. Solution: Added condition checks to avoid NULL pointer dereferencing. CID: 1124478, 1124501, 1124504, 1124510 BUG: 789278 Change-Id: I5c81d912102a7e672386db3fdb820f883d08666f Signed-off-by: Shwetha Acharya <shwetha174@gmail.com>
* dht/switch: Fix coverity issuesShyamsundarR2018-08-241-4/+19
| | | | | | | | | | | | | | | | CID: 1124779: dup_childs not freed in an err path CID: 1382398: option_string leaked CID: 1382389: memcpy may cause string overflow Also, fixed up NULL termination for the string Potential use after free (or double free in this case) as below, (link split into multiple lines) https://download.gluster.org/pub/gluster/glusterfs/static-analysis/ master/glusterfs-coverity/2018-08-22-0ebaa9c6/html/ 1/427switch.c.html#error Change-Id: I76681af6a8091666918a3d5dff30a152a7b97905 Updates: bz#789278 Signed-off-by: ShyamsundarR <srangana@redhat.com>
* afr: common thin-arbiter functionsRavishankar N2018-08-236-12/+173
| | | | | | | | | | | | | | ...that can be used by client and self-heal daemon, namely: afr_ta_post_op_lock() afr_ta_post_op_unlock() Note: These are not yet consumed. They will be used in the write txn changes patch which will introduce 2 domain locking. updates: bz#1579788 Change-Id: I636d50f8fde00736665060e8f9ee4510d5f38795 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* cluster/dht: coverity fixesN Balachandran2018-08-212-12/+14
| | | | | | | | | Fixes 1133997, 1370910, 1382387, 1382444, 1394635 Change-Id: Ie63ad47abd5519b9b9536da26b61ed4c9eaf2c75 updates: bz#789278 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* performance/readdir-ahead: keep stats of cached dentries in sync with ↵Krutika Dhananjay2018-08-181-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | modifications PROBLEM: Stats of dentries that are readdirp'd ahead can become stale due to fops like writes, truncate etc that modify the file pointed by dentries. When a readdir is finally wound at offset corresponding to these entries, the iatts that are returned to the application come from readdir-ahead's cache, which are stale by now. This problem gets further aggravated when caching translators/modules cache and continue to serve this stale information. FIX: * Store the iatt in context of the inode pointed by dentry. * Whenever the inode pointed by dentry undergoes modification, in cbk of modification fop, update the iatt stored in inode-ctx to reflect the modification. * When serving a readdirp response from application, update iatts of dentries with the iatts stored in the context of inodes pointed by these dentries. * Some fops don't have valid iatts in their responses. For eg., write response whose data is still cached in write-behind will have zeroed out stat. In this case keep only ia_type and ia_gfid and reset rest of the iatt members to zero. - fuse-bridge in this case just sends "entry" information back to kernel and attr is not sent. - gfapi sets entry->inode to NULL and zeroes out the entire stat * There is one tiny race between the entry creation and a readdirp on its parent dir, which could cause the inode-ctx setting and inode ctx reading to happen on two different inode objects. To prevent this, when entry->inode doesn't eqaul to linked_inode, - fuse-bridge is made to send only "entry" information without attributes - gfapi sets entry->inode to NULL and zeroes out the entire stat. Change-Id: Ia27ff49a61922e88c73a1547ad8aacc9968a69df BUG: 1390050 Updates: bz#1390050 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/ec: FORWARD_NULL coverity fixSunil Kumar Acharya2018-08-162-1/+5
| | | | | | | | | | | Fixing FORWARD_NULL coverify errors with EC. CID: 1394650 BUG: 789278 Change-Id: I52c99dac3483ca31a86cd7e3a959d4010b195f32 updates: bz#789278 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
* cluster/dht: Fixed rebalanced filesN Balachandran2018-08-141-1/+1
| | | | | | | | | An error caused skipped files to be counted as rebalanced files. Change-Id: I02333f099fb8b73ba953f41a2922021a1e4da7be fixes: bz#1615474 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: fix inode ref management in dht_heal_pathSusant Palai2018-08-141-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In dht_heal_path, the inodes are created & looked up from top to down. If the path is "a/b/c", then lookup will be done on a, then b and so on. Here is a rough snippet of the function "dht_heal_path". <snippet> if (bname) { ref_count - loc.inode = create/grep inode 1 - syncop_lookup (loc.inode) - linked_inode = inode_link (loc.inode) 2 /*clean up current loc*/ - loc_wipe(&loc) 1 /*set up parent and bname for next child */ - loc.parent = inode - bname = next_child_name } out: - inode_ref (linked_inode) 2 - loc_wipe (&loc) 1 </snippet> The problem with the above code is if _bname_ is empty ie the chain lookup is done, then for the next iteration we populate loc.parent anyway. Now that bname is empty, the loc_wipe is done in the _out_ section as well. Since, the loc.parent was set to the previous inode, we lose a ref unwantedly. Now a dht_local_wipe as part of the DHT_STACK_UNWIND takes away the last ref leading to inode_destroy. This problenm is observed currently with nfs-ganesha with the nameless lookup. Post the inode_purge, gfapi does not get the new inode to link and hence, it links the inode it sent in the lookup fop, which does not have any dht related context (layout) leading to "invalid argument error" in lookup path done parallely with tar operation. test done in the following way: - create two nfs client connected with two different nfs servers. - run untar on one client and run lookup continuously on the other. - Prior to this patch, invalid arguement was seen which is fixed with the current patch. Change-Id: Ifb90c178a2f3c16604068c7da8fa562b877f5c61 fixes: bz#1610256 Signed-off-by: Susant Palai <spalai@redhat.com>
* All: remove memset() before sprintf()Yaniv Kaul2018-08-143-3/+3
| | | | | | | | | | | | It's not needed. There's a good chance the compiler is smart enough to remove it anyway, but it can't hurt - I hope. Compile-tested only! Change-Id: Id7c054e146ba630227affa591007803f3046416b updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* Revert "performance/readdir-ahead: Invalidate cached dentries if they're ↵Raghavendra G2018-08-031-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | modified while in cache" This reverts commit 7131de81f72dda0ef685ed60d0887c6e14289b8c. With the latest master, I created a single brick volume and some files inside it. [root@rhgs313-6 ~]# umount -f /mnt/fuse1; mount -t glusterfs -s 192.168.122.6:/thunder /mnt/fuse1; ls -l /mnt/fuse1/; echo "Trying again"; ls -l /mnt/fuse1 umount: /mnt/fuse1: not mounted total 0 ----------. 0 root root 0 Jan 1 1970 file-1 ----------. 0 root root 0 Jan 1 1970 file-2 ----------. 0 root root 0 Jan 1 1970 file-3 ----------. 0 root root 0 Jan 1 1970 file-4 ----------. 0 root root 0 Jan 1 1970 file-5 d---------. 0 root root 0 Jan 1 1970 subdir Trying again total 3 -rw-r--r--. 1 root root 33 Aug 3 14:06 file-1 -rw-r--r--. 1 root root 33 Aug 3 14:06 file-2 -rw-r--r--. 1 root root 33 Aug 3 14:06 file-3 -rw-r--r--. 1 root root 33 Aug 3 14:06 file-4 -rw-r--r--. 1 root root 33 Aug 3 14:06 file-5 d---------. 0 root root 0 Jan 1 1970 subdir [root@rhgs313-6 ~]# Conversation can be followed on gluster-devel on thread with subj: tests/bugs/distribute/bug-1122443.t - spurious failure. git-bisected pointed this patch as culprit. Change-Id: I1eb46f6c196f44fde8ce991840a0e724e6f50862 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Updates: bz#1390050
* performance/readdir-ahead: Invalidate cached dentries if they're modified ↵Krutika Dhananjay2018-07-281-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | while in cache PROBLEM: Entries that are readdirp'd ahead can undergo modification in terms of writes, truncates which could modify their iatts. When a readdir is finally wound at offset corresponding to these entries, the iatts that are returned to the application come from readdir-ahead's cache, which are stale by now. This problem gets further aggravated when caching translators/modules cache and continue to serve this stale information. FIX: Whenever a dentry undergoes modification, in the cbk of the modification fop, a "dirty" flag (default 0) is set in its inode ctx. When it's time for readdir-ahead to serve these entries, it will read the inode ctx and check if the entry is "dirty", and if it is, set the entry's attrs to all zeroes, as an indicator to fuse, md-cache etc not to cache these attributes. Also there is one tiny race between the entry creation and a readdirp on its parent dir, which could cause the inode-ctx setting and inode ctx reading to happen on two different inode objects. To prevent this, fuse-bridge is made to drop entries for which dentry->inode is not the same as linked inode, in readdirp cbk. Change-Id: If7396507632b5268442ca580473d5155fee9cbef BUG: 1390050 Updates: bz#1390050 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* coverity: Ignore most of SECURE_TEMP issuesShyamsundarR2018-07-271-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | mkstemp as per the Linux man page, uses 0600 as the permission bits when creating the file. This is hence safe and a Coverity warning that should be ignored. Further, we are mostly a multi-threaded program in all our daemons and cannot set and unset umask at will in a multi-threaded program, to address the coverity issue. This change attempts to nudge coverity to ignore this warning, using the pattern, /* coverity[EVENT_TAG_NAME] ... */ <line of code that has the issue> This commit is an experiment, if post merge the next coverity report ignores these errors, the above pattern (as found using an internet search) works and can be applied to certain other warnings as well. Change-Id: I73a184ce1a54dd9e66542952b1190a74438c826a Updates: bz#789278 Signed-off-by: ShyamsundarR <srangana@redhat.com>
* core (named threads): flood of -Wformat-truncation warnings with gcc-7.1Kaleb S. KEITHLEY2018-07-231-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Starting in Fedora 26 which has gcc-7.1.x, -Wformat-trunction is enabled with -Wformat, resulting in a flood of new warnings. This many warnings is a concern because it makes it hard(er) to see other warnings that should be addressed. An example is at https://kojipkgs.fedoraproject.org//packages/glusterfs/3.12.0/1.fc28/data/logs/x86_64/build.log For more info see https://review.gluster.org/#/c/18267/ I can't find much (or good) documentation on the heuristics the compiler uses for this warning. In the case of printing integer types it appears it looks at the available space in the destination and the range of values for the variable and/or its type. To address the specific question about why 0x3ff versus 0xfff to mask the value, either would suffice to hint to the compiler that the printed value will fit in three characters. But the loop is from 0...1023 (or 0...0x3ff if you prefer) so I chose that as a more "accurate" mask to use as it exactly matches the range of values of the loop. Fixes: bz#1492847 Change-Id: I6e309ba42159841131d8241bfc0566ef09e00aa9
* All: run codespell on the code and fix issues.Yaniv Kaul2018-07-2223-50/+50
| | | | | | | | | | | | Please review, it's not always just the comments that were fixed. I've had to revert of course all calls to creat() that were changed to create() ... Only compile-tested! Change-Id: I7d02e82d9766e272a7fd9cc68e51901d69e5aab5 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* cluster/dht: Set loc->gfid before healing attrN Balachandran2018-07-181-1/+2
| | | | | | | | | | | | | AFR takes inodelks when setting attrs. The loc->gfid and loc->inode->gfid were both null when dht_dir_attr_heal was called during a fresh lookup of an existing directory. As the gfid is null, client_pre_inodelk asserts in the gfid check. We now set the loc->gfid before calling dht_dir_attr_heal. Change-Id: I457f5a73fd301d97a03ca032587e73d4803298ac fixes: bz#1602866 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* afr: switch lk_owner only when pre-op succeedsRavishankar N2018-07-181-10/+10
| | | | | | | | | | | | | | | | | | | Problem: In a disk full scenario, we take a failure path in afr_transaction_perform_fop() and go to unlock phase. But we change the lk-owner before that, causing unlock to fail. When mount issues another fop that takes locks on that file, it hangs. Fix: Change lk-owner only when we are about to perform the fop phase. Also fix the same issue for arbiters when afr_txn_arbitrate_fop() fails the fop. Also removed the DISK_SPACE_CHECK_AND_GOTO in posix_xattrop. Otherwise truncate to zero will fail pre-op phase with ENOSPC when the user is actually trying to freee up space. Change-Id: Ic4c8a596b4cdf4a7fc189bf00b561113cf114353 fixes: bz#1602236 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* dht: remove useless argument from dht_iatt_mergeKinglong Mee2018-07-187-88/+78
| | | | | | | | | | The last using of the subvol argument has been removed at 4e1ec35ef4f7 ("core: fill 'ia_ino' from 'ia_gfid' in 'storage/posix' ......") 7 years ago (2011-06-16). Change-Id: I9788d79e2e40cc153cf2960e28c7c1c1033dc8f7 fixes: bz#1601683 Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
* dht: delete tier related internal xattr in dht_getxattr_cbkSunny Kumar2018-07-171-2/+1
| | | | | | | | | | | | | | Use dict_del instead of GF_REMOVE_INTERNAL_XATTR. For problem and fix related information see here - https://review.gluster.org/20450. This patch have some modification as requested by reviewers on already merged patch : https://review.gluster.org/20450. Change-Id: I50c263e3411354bb9c1e028b64b9ebfd755dfe37 fixes: bz#1597563 Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
* tier: Move time string formattingYaniv Kaul2018-07-171-7/+10
| | | | | | | | | | | | | | | There is no need to format the time, unless ret is true. I don't think there's a reason to allocate memory for those struct and char array unless we are formatting either (But I'm not sure what the code convention is - are we ok with 'local' variable declarations?) Only compile-tested. Change-Id: I9feb09871943764bd76bdfc9ac6ca506f329aac1 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* cluster/afr: Mark dirty for entry transactions for quorum failureskarthik-us2018-07-161-11/+51
| | | | | | | | | | | | | | | | | | Problem: If an entry creation transaction fails on quprum number of bricks it might end up setting the pending changelogs on the file itself on the brick where it got created. But the parent does not have any entry pending marker set. This will lead to the entry not getting healed by the self heal daemon automatically. Fix: For entry transactions mark dirty on the parent if it fails on quorum number of bricks, so that the heal can do conservative merge and entry gets healed by shd. Change-Id: I56448932dd409b3ddb095e2ae32e037b6157a607 fixes: bz#1586020 Signed-off-by: karthik-us <ksubrahm@redhat.com>
* dht: delete tier related internal xattr in dht_getxattr_cbkSunny Kumar2018-07-161-0/+15
| | | | | | | | | | | | | | Problem : Hot and Cold tier brick changelogs report rsync failure Solution : georep session is failing to sync directory from master volume to slave volume due to lot of changelog retries, solution would be to ignore tier related internal xattrs trusted.tier.fix.layout.complete and trusted.tier.tier-dht.commithash in dht_getxattr_cbk. Change-Id: I3530ffe7c4157584b439486f33ecd82ed8d66aee fixes: bz#1597563 Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
* md-cache: Do not invalidate cache post set/remove xattrPoornima G2018-07-112-2/+65
| | | | | | | | | | | | | | | Since setxattr and removexattr fops cbk do not carry poststat, the stat cache was being invalidated in setxatr/remoxattr cbk. Hence the further lookup wouldn't be served from cache. To prevent this invalidation, md-cache is modified to get the poststat in set/removexattr_cbk in dict. Co-authored with Xavi Hernandez. Change-Id: I6b946be2d20b807e2578825743c25ba5927a60b4 fixes: bz#1586018 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com> Signed-off-by: Poornima G <pgurusid@redhat.com>
* dht: Inconsistent permission for directories after brick stop/startMohit Agrawal2018-07-113-9/+77
| | | | | | | | | | | | | | | | Problem: Inconsistent access permissions on directories after bringing back the down sub-volumes, in case of directories dht_setattr first wind a call on MDS once call is finished on MDS then wind a call on NON-MDS.At the time of revalidating dht just compare the uid/gid with stbuf uid/gid and if anyone differs set a flag to heal the same. Solution: Add a condition to compare permission also in dht_revalidate_cbk to set a flag to call dht_dir_attr_heal. BUG: 1584517 Change-Id: I3e039607148005015b5d93364536158380d4c5aa fixes: bz#1584517 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* Fix compile warningsXavi Hernandez2018-07-101-12/+25
| | | | | | | | | | | This patch fixes compile warnings that appear with newer compilers. The solution applied is only to remove the warnings, but it doesn't always solve the problem in the best way. It assumes that the problem will never happen, as the previous code assumed. Change-Id: I6e8470d6c2e2dbd3bd7d324b5fd2f92ffdc3d6ec updates: bz#1193929 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* cluster/afr: Prevent execution of code after call_count decrementingPranith Kumar K2018-07-101-7/+8
| | | | | | | | | | | | | | Problem: When call_count is decremented by one thread, another thread can go ahead with the operation leading to undefined behavior for the thread executing statements after decrementing call count. Fix: Do the operations necessary before decrementing call count. fixes bz#1598663 Change-Id: Icc90cd92ac16e5fbdfe534d9f0a61312943393fe Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/afr: Make sure lk-owner is assigned at the time of lockPranith Kumar K2018-07-041-2/+1
| | | | | | | | | | | | | | Problem: In the new eager-lock implementation lk-owner is assigned after the 'local' is added to the eager-lock list, so there exists a possibility of lock being sent even before lk-owner is assigned. Fix: Make sure to assign lk-owner before adding local to eager-lock list fixes bz#1597805 Change-Id: I26d1b7bcf3e8b22531f1dc0b952cae2d92889ef2 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* afr,ec: Print if the subvolume is up in statedumpPranith Kumar K2018-07-032-0/+2
| | | | | | fixes bz#1597156 Change-Id: I323eb9190e40b12df216698dcdba74a6d336beeb Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/dht: Do not try to use up the readdirp bufferN Balachandran2018-06-293-53/+66
| | | | | | | | | | | | | | | | | | | | | | | | | DHT attempts to use up the entire buffer in readdirp before unwinding in an attempt to reduce the number of calls. However, this has 2 disadvantages: 1. This can cause a stack overflow when parallel readdir is enabled. If the buffer only has a little space,rda can send back only one or two entries. If those entries are stripped out by dht_readdirp_cbk (linkto files for example) it will once again wind down to rda in an attempt to fill the buffer before unwinding to FUSE. This process can continue for several iterations, causing the stack to grow and eventually overflow, causing the process to crash. 2. If parallel readdir is disabled, dht could send readdirp calls with small buffers to the bricks, thus increasing the number of network calls. We are therefore reverting to the existing behaviour. Please note, this only mitigates the stack overflow, it does not prevent it from happening. This is still possible if a subvol has thousands of linkto files for instance. Change-Id: I291bc181c5249762d0c4fe27fa4fc2631166adf5 fixes: bz#1593548 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: Fix rename journal in changelogKotresh HR2018-06-241-0/+11
| | | | | | | | | | | | | | | With patch [1], renames are journalled only on cached subvolume. The dht sends the special key on the cached subvolume so that the changelog journals the rename. With single distribute sub-volume, the key is not being set. This patch fixes the same. [1] https://review.gluster.org/10410 fixes: bz#1583018 Change-Id: Ic2e35b40535916fa506a714f257ba325e22d0961 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* cluster/dht: refactor dht_lookupN Balachandran2018-06-211-211/+321
| | | | | | | | | | The dht lookup code is getting difficult to maintain due to its size. Refactoring the code will make it easier to modify it in future. Change-Id: Ic7cb5bf4f018504dfaa7f0d48cf42ab0aa34abdd updates: bz#1590385 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: Minor code cleanupN Balachandran2018-06-201-15/+13
| | | | | | | | Removed extra variable. Change-Id: If43c47f6630454aeadab357a36d061ec0b53cdb5 updates: bz#1590385 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* afr: heal gfids when file is not present on all bricksRavishankar N2018-06-195-12/+51
| | | | | | | | | | | commit 20fa80057eb430fd72b4fa31b9b65598b8ec1265 introduced a regression wherein if a file is present in only 1 brick of replica *and* doesn't have a gfid associated with it, it doesn't get healed upon the next lookup from the client. Fix it. Change-Id: I7d1111dcb45b1b8b8340a7d02558f05df70aa599 fixes: bz#1591193 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* afr: don't update readables if inode refresh failed on all childrenRavishankar N2018-06-184-32/+50
| | | | | | | | | | | | | | | | | | | | | | | | Problem: If inode refresh failed on all children of afr due to ENOENT (say file migrated by dht), it resets the readables to zero. Any inflight txn which then later comes on the inode fails with EIO because no readable children present for the inode. Fix: Don't update readables when inode refresh fails on *all* children of afr. In that way any inflight txns will either proceed with its own inode refresh if needed and fail it with the right errno or use the old value of readables and continue with the txn. Also, add quorum checks to the beginning of afr_transaction(). Otherwise, we seem to be winding the lock and checking for quorum only in pre-op pahse. Note: This should ideally fix BZ 1329505 since the stop gap fix for it is has been reverted at https://review.gluster.org/#/c/20028. Change-Id: Ia638c092d8d12dc27afb3cdad133394845061319 updates: bz#1584483 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* cluster/dht: Refactor rebalance codeN Balachandran2018-06-131-309/+253
| | | | | | | | | | Created init and cleanup functions for certain functionality in order to improve readability. Removed unused code. Change-Id: Ia6a2f4ab64923b6ea8e10487227fb5621eec1488 updates: bz#1586363 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: Leverage MDS subvol for dht_removexattr alsoMohit Agrawal2018-06-111-60/+190
| | | | | | | | | | | | | Problem: In a distributed volume situation can be arise when custom extended attributed are not removed from all bricks after stop/start or added newly brick. Solution: To resolve the same use MDS subvol for remove xattr also BUG: 1575587 Change-Id: I7701e0d3833e3064274cb269f26061bff9b71f50 fixes: bz#1575587 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* dht: Delete MDS internal xattr from dict in dht_getxattr_cbkMohit Agrawal2018-06-031-0/+4
| | | | | | | | | | | | | | Problem: At the time of fetching xattr to heal xattr by afr it is not able to fetch xattr because posix_getxattr has a check to ignore if xattr name is MDS Solution: To ignore same xattr update a check in dht_getxattr_cbk instead of having a check in posix_getxattr BUG: 1584098 Change-Id: I86cd2b2ee08488cb6c12f407694219d57c5361dc fixes: bz#1584098 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* cluster:dht: Corrected ret code checkN Balachandran2018-05-301-1/+1
| | | | | | | | syncop functions return -op_errno. Change-Id: Ifdb1bd1d1d11972b4306a2336e6737d6236a2fb1 fixes: bz#1580238 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* dht: Excessive 'dict is null' logs in dht_revalidate_cbkMohit Agrawal2018-05-291-2/+4
| | | | | | | | | | | | | Problem: In case of error(ESTALE/ENOENT) dht_revalidate_cbk throws "dict is null" error because xattr is not available Solution: To avoid the logs update condition in dht_revalidate_cbk and dht_lookup_dir_cbk BUG: 1583565 Change-Id: Ife6b3eeb6d91bf24403ed3100e237bb5d15b4357 fixes: bz#1583565 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>