summaryrefslogtreecommitdiffstats
path: root/xlators/cluster/dht
Commit message (Collapse)AuthorAgeFilesLines
...
* multiple files: calloc -> mallocYaniv Kaul2018-09-044-27/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xlators/cluster/stripe/src/stripe-helpers.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible xlators/cluster/dht/src/tier.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible xlators/cluster/dht/src/dht-layout.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible xlators/cluster/dht/src/dht-helper.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible xlators/cluster/dht/src/dht-common.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible xlators/cluster/afr/src/afr.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible xlators/cluster/afr/src/afr-inode-read.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible tests/bugs/replicate/bug-1250170-fsync.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible tests/basic/gfapi/gfapi-async-calls-test.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible tests/basic/ec/ec-fast-fgetxattr.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible rpc/xdr/src/glusterfs3.h: Move to GF_MALLOC() instead of GF_CALLOC() when possible rpc/rpc-transport/socket/src/socket.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible rpc/rpc-lib/src/rpc-clnt.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible extras/geo-rep/gsync-sync-gfid.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible cli/src/cli-xml-output.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible cli/src/cli-rpc-ops.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible cli/src/cli-cmd-volume.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible cli/src/cli-cmd-system.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible cli/src/cli-cmd-snapshot.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible cli/src/cli-cmd-peer.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible cli/src/cli-cmd-global.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible It doesn't make sense to calloc (allocate and clear) memory when the code right away fills that memory with data. It may be optimized by the compiler, or have a microscopic performance improvement. In some cases, also changed allocation size to be sizeof some struct or type instead of a pointer - easier to read. In some cases, removed redundant strlen() calls by saving the result into a variable. 1. Only done for the straightforward cases. There's room for improvement. 2. Please review carefully, especially for string allocation, with the terminating NULL string. Only compile-tested! updates: bz#1193929 Original-Author: Yaniv Kaul <ykaul@redhat.com> Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Signed-off-by: Amar Tumballi <amarts@redhat.com> Change-Id: I16274dca4078a1d06ae09a0daf027d734b631ac2
* cluster/dht: In rename, unlink after creating linkto fileN Balachandran2018-09-031-122/+132
| | | | | | | | | | | | | | The linkto file creation for the dst was done in parallel with the unlink of the old src linkto. If these operations reached the brick out of order, we end up with a dst linkto file without a .glusterfs handle. Fixed by the unlinking only after the linkto file creation has completed. Change-Id: I4246f7655f5bc180f5ded7fd34d263b7828a8110 fixes: bz#1621981 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* xlators/cluster/dht/src/dht-common.c: simplify some if statementsYaniv Kaul2018-08-311-48/+52
| | | | | | | | | | | | Use goto when some vars are not set to simplify long if statements. Please review logic has not changed! Compile-tested only! updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I45ea2e906d0ccb468af5e1fa65db008edb00d734
* build: add --enable-asan configure optionsNiels de Vos2018-08-301-1/+1
| | | | | | | | | | | | | | Introduce a `./configure --enable-asan` to build with `-fsanitize=address -fno-omit-frame-pointer` options. This uses the libasan.so shared library, so that needs to be available. While running builds with the ASAN options, several linker issues surfaced and these have been addressed with this change as well. Building with --enable-asan has been tested on Fedora 28. Change-Id: I428a9da70dd8f7d0056cfbe5c398619a571469b2 Updates: #492 Signed-off-by: Niels de Vos <ndevos@redhat.com>
* multiple files: remove unndeeded memset()Yaniv Kaul2018-08-291-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a squash of multiple commits: contrib/fuse-lib/misc.c: remove unneeded memset() All flock variables are properly set, no need to memset it. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I8e0512c5a88daadb0e587f545fdb9b32ca8858a2 libglusterfs/src/{client_t|fd|inode|stack}.c: remove some memset() I don't think there's a need for any of them. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I2be9ccc3a5cb5da51a92af73488cdabd1c527f59 libglusterfs/src/xlator.c: remove unneeded memset() All xl->mem_acct members are properly set, no need to memset it. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I7f264cd47e7a06255a3f3943c583de77ae8e3147 xlators/cluster/afr/src/afr-self-heal-common.c: remove unneeded memset() Since we are going over the whole array anyway, initialize it properly, to either 1 or 0. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: Ied4210388976b6a7a2e91cc3de334534d6fef201 xlators/cluster/dht/src/dht-common.c: remove unneeded memset() Since we are going over the whole array anyway it is initialized properly. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: Idc436d2bd0563b6582908d7cbebf9dbc66a42c9a xlators/cluster/ec/src/ec-helpers.c: remove unneeded memset() Since we are going over the whole array anyway it is initialized properly. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I81bf971f7fcecb4599e807d37f426f55711978fa xlators/mgmt/glusterd/src/glusterd-volgen.c: remove some memset() I don't think there's a need for any of them. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I476ea59ba53546b5153c269692cd5383da81ce2d xlators/mgmt/glusterd/src/glusterd-geo-rep.c: read() in 4K blocks The current 1K seems small. 4K is usually better (in Linux). Also remove a memset() that I don't think is needed between reads. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I5fb7950c92d282948376db14919ad12e589eac2b xlators/storage/posix/src/posix-{gfid-path|inode-fd-ops}.c: remove memset() before sys_*xattr() functions. I don't see a reason to memset the array sent to the functions sys_llistxattr(), sys_lgetxattr(), sys_lgetxattr(), sys_flistxattr(), sys_fgetxattr(). (Note: it's unclear to me why we are calling sys_*txattr() functions with XATTR_VAL_BUF_SIZE-1 size instead of XATTR_VAL_BUF_SIZE ). Only compile-tested! Change-Id: Ief2103b56ba6c71e40ed343a93684eef6b771346 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* multiple files: move from strlen() to sizeof()Yaniv Kaul2018-08-291-1/+1
| | | | | | | | | | | | | | | {glusterfsd|glusterfsd-mgmt|quota-common-utils|xlator|tier|stripe}.c tools/setgfid2path/src/main.c xlators/cluster/afr/src/afr-inode-read.c {glusterfs-acl|glusterfs}.h For const strings, just do compile time size calc instead of runtime. Compile-tested only! Change-Id: I303684b1ff29b05c10126fb1057f507e404ced07 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* dht/switch: Fix coverity issuesShyamsundarR2018-08-241-4/+19
| | | | | | | | | | | | | | | | CID: 1124779: dup_childs not freed in an err path CID: 1382398: option_string leaked CID: 1382389: memcpy may cause string overflow Also, fixed up NULL termination for the string Potential use after free (or double free in this case) as below, (link split into multiple lines) https://download.gluster.org/pub/gluster/glusterfs/static-analysis/ master/glusterfs-coverity/2018-08-22-0ebaa9c6/html/ 1/427switch.c.html#error Change-Id: I76681af6a8091666918a3d5dff30a152a7b97905 Updates: bz#789278 Signed-off-by: ShyamsundarR <srangana@redhat.com>
* cluster/dht: coverity fixesN Balachandran2018-08-212-12/+14
| | | | | | | | | Fixes 1133997, 1370910, 1382387, 1382444, 1394635 Change-Id: Ie63ad47abd5519b9b9536da26b61ed4c9eaf2c75 updates: bz#789278 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* performance/readdir-ahead: keep stats of cached dentries in sync with ↵Krutika Dhananjay2018-08-181-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | modifications PROBLEM: Stats of dentries that are readdirp'd ahead can become stale due to fops like writes, truncate etc that modify the file pointed by dentries. When a readdir is finally wound at offset corresponding to these entries, the iatts that are returned to the application come from readdir-ahead's cache, which are stale by now. This problem gets further aggravated when caching translators/modules cache and continue to serve this stale information. FIX: * Store the iatt in context of the inode pointed by dentry. * Whenever the inode pointed by dentry undergoes modification, in cbk of modification fop, update the iatt stored in inode-ctx to reflect the modification. * When serving a readdirp response from application, update iatts of dentries with the iatts stored in the context of inodes pointed by these dentries. * Some fops don't have valid iatts in their responses. For eg., write response whose data is still cached in write-behind will have zeroed out stat. In this case keep only ia_type and ia_gfid and reset rest of the iatt members to zero. - fuse-bridge in this case just sends "entry" information back to kernel and attr is not sent. - gfapi sets entry->inode to NULL and zeroes out the entire stat * There is one tiny race between the entry creation and a readdirp on its parent dir, which could cause the inode-ctx setting and inode ctx reading to happen on two different inode objects. To prevent this, when entry->inode doesn't eqaul to linked_inode, - fuse-bridge is made to send only "entry" information without attributes - gfapi sets entry->inode to NULL and zeroes out the entire stat. Change-Id: Ia27ff49a61922e88c73a1547ad8aacc9968a69df BUG: 1390050 Updates: bz#1390050 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: Fixed rebalanced filesN Balachandran2018-08-141-1/+1
| | | | | | | | | An error caused skipped files to be counted as rebalanced files. Change-Id: I02333f099fb8b73ba953f41a2922021a1e4da7be fixes: bz#1615474 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: fix inode ref management in dht_heal_pathSusant Palai2018-08-141-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In dht_heal_path, the inodes are created & looked up from top to down. If the path is "a/b/c", then lookup will be done on a, then b and so on. Here is a rough snippet of the function "dht_heal_path". <snippet> if (bname) { ref_count - loc.inode = create/grep inode 1 - syncop_lookup (loc.inode) - linked_inode = inode_link (loc.inode) 2 /*clean up current loc*/ - loc_wipe(&loc) 1 /*set up parent and bname for next child */ - loc.parent = inode - bname = next_child_name } out: - inode_ref (linked_inode) 2 - loc_wipe (&loc) 1 </snippet> The problem with the above code is if _bname_ is empty ie the chain lookup is done, then for the next iteration we populate loc.parent anyway. Now that bname is empty, the loc_wipe is done in the _out_ section as well. Since, the loc.parent was set to the previous inode, we lose a ref unwantedly. Now a dht_local_wipe as part of the DHT_STACK_UNWIND takes away the last ref leading to inode_destroy. This problenm is observed currently with nfs-ganesha with the nameless lookup. Post the inode_purge, gfapi does not get the new inode to link and hence, it links the inode it sent in the lookup fop, which does not have any dht related context (layout) leading to "invalid argument error" in lookup path done parallely with tar operation. test done in the following way: - create two nfs client connected with two different nfs servers. - run untar on one client and run lookup continuously on the other. - Prior to this patch, invalid arguement was seen which is fixed with the current patch. Change-Id: Ifb90c178a2f3c16604068c7da8fa562b877f5c61 fixes: bz#1610256 Signed-off-by: Susant Palai <spalai@redhat.com>
* All: remove memset() before sprintf()Yaniv Kaul2018-08-142-2/+2
| | | | | | | | | | | | It's not needed. There's a good chance the compiler is smart enough to remove it anyway, but it can't hurt - I hope. Compile-tested only! Change-Id: Id7c054e146ba630227affa591007803f3046416b updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* Revert "performance/readdir-ahead: Invalidate cached dentries if they're ↵Raghavendra G2018-08-031-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | modified while in cache" This reverts commit 7131de81f72dda0ef685ed60d0887c6e14289b8c. With the latest master, I created a single brick volume and some files inside it. [root@rhgs313-6 ~]# umount -f /mnt/fuse1; mount -t glusterfs -s 192.168.122.6:/thunder /mnt/fuse1; ls -l /mnt/fuse1/; echo "Trying again"; ls -l /mnt/fuse1 umount: /mnt/fuse1: not mounted total 0 ----------. 0 root root 0 Jan 1 1970 file-1 ----------. 0 root root 0 Jan 1 1970 file-2 ----------. 0 root root 0 Jan 1 1970 file-3 ----------. 0 root root 0 Jan 1 1970 file-4 ----------. 0 root root 0 Jan 1 1970 file-5 d---------. 0 root root 0 Jan 1 1970 subdir Trying again total 3 -rw-r--r--. 1 root root 33 Aug 3 14:06 file-1 -rw-r--r--. 1 root root 33 Aug 3 14:06 file-2 -rw-r--r--. 1 root root 33 Aug 3 14:06 file-3 -rw-r--r--. 1 root root 33 Aug 3 14:06 file-4 -rw-r--r--. 1 root root 33 Aug 3 14:06 file-5 d---------. 0 root root 0 Jan 1 1970 subdir [root@rhgs313-6 ~]# Conversation can be followed on gluster-devel on thread with subj: tests/bugs/distribute/bug-1122443.t - spurious failure. git-bisected pointed this patch as culprit. Change-Id: I1eb46f6c196f44fde8ce991840a0e724e6f50862 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Updates: bz#1390050
* performance/readdir-ahead: Invalidate cached dentries if they're modified ↵Krutika Dhananjay2018-07-281-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | while in cache PROBLEM: Entries that are readdirp'd ahead can undergo modification in terms of writes, truncates which could modify their iatts. When a readdir is finally wound at offset corresponding to these entries, the iatts that are returned to the application come from readdir-ahead's cache, which are stale by now. This problem gets further aggravated when caching translators/modules cache and continue to serve this stale information. FIX: Whenever a dentry undergoes modification, in the cbk of the modification fop, a "dirty" flag (default 0) is set in its inode ctx. When it's time for readdir-ahead to serve these entries, it will read the inode ctx and check if the entry is "dirty", and if it is, set the entry's attrs to all zeroes, as an indicator to fuse, md-cache etc not to cache these attributes. Also there is one tiny race between the entry creation and a readdirp on its parent dir, which could cause the inode-ctx setting and inode ctx reading to happen on two different inode objects. To prevent this, fuse-bridge is made to drop entries for which dentry->inode is not the same as linked inode, in readdirp cbk. Change-Id: If7396507632b5268442ca580473d5155fee9cbef BUG: 1390050 Updates: bz#1390050 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* core (named threads): flood of -Wformat-truncation warnings with gcc-7.1Kaleb S. KEITHLEY2018-07-231-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Starting in Fedora 26 which has gcc-7.1.x, -Wformat-trunction is enabled with -Wformat, resulting in a flood of new warnings. This many warnings is a concern because it makes it hard(er) to see other warnings that should be addressed. An example is at https://kojipkgs.fedoraproject.org//packages/glusterfs/3.12.0/1.fc28/data/logs/x86_64/build.log For more info see https://review.gluster.org/#/c/18267/ I can't find much (or good) documentation on the heuristics the compiler uses for this warning. In the case of printing integer types it appears it looks at the available space in the destination and the range of values for the variable and/or its type. To address the specific question about why 0x3ff versus 0xfff to mask the value, either would suffice to hint to the compiler that the printed value will fit in three characters. But the loop is from 0...1023 (or 0...0x3ff if you prefer) so I chose that as a more "accurate" mask to use as it exactly matches the range of values of the loop. Fixes: bz#1492847 Change-Id: I6e309ba42159841131d8241bfc0566ef09e00aa9
* All: run codespell on the code and fix issues.Yaniv Kaul2018-07-227-25/+25
| | | | | | | | | | | | Please review, it's not always just the comments that were fixed. I've had to revert of course all calls to creat() that were changed to create() ... Only compile-tested! Change-Id: I7d02e82d9766e272a7fd9cc68e51901d69e5aab5 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* cluster/dht: Set loc->gfid before healing attrN Balachandran2018-07-181-1/+2
| | | | | | | | | | | | | AFR takes inodelks when setting attrs. The loc->gfid and loc->inode->gfid were both null when dht_dir_attr_heal was called during a fresh lookup of an existing directory. As the gfid is null, client_pre_inodelk asserts in the gfid check. We now set the loc->gfid before calling dht_dir_attr_heal. Change-Id: I457f5a73fd301d97a03ca032587e73d4803298ac fixes: bz#1602866 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* dht: remove useless argument from dht_iatt_mergeKinglong Mee2018-07-187-88/+78
| | | | | | | | | | The last using of the subvol argument has been removed at 4e1ec35ef4f7 ("core: fill 'ia_ino' from 'ia_gfid' in 'storage/posix' ......") 7 years ago (2011-06-16). Change-Id: I9788d79e2e40cc153cf2960e28c7c1c1033dc8f7 fixes: bz#1601683 Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
* dht: delete tier related internal xattr in dht_getxattr_cbkSunny Kumar2018-07-171-2/+1
| | | | | | | | | | | | | | Use dict_del instead of GF_REMOVE_INTERNAL_XATTR. For problem and fix related information see here - https://review.gluster.org/20450. This patch have some modification as requested by reviewers on already merged patch : https://review.gluster.org/20450. Change-Id: I50c263e3411354bb9c1e028b64b9ebfd755dfe37 fixes: bz#1597563 Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
* tier: Move time string formattingYaniv Kaul2018-07-171-7/+10
| | | | | | | | | | | | | | | There is no need to format the time, unless ret is true. I don't think there's a reason to allocate memory for those struct and char array unless we are formatting either (But I'm not sure what the code convention is - are we ok with 'local' variable declarations?) Only compile-tested. Change-Id: I9feb09871943764bd76bdfc9ac6ca506f329aac1 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* dht: delete tier related internal xattr in dht_getxattr_cbkSunny Kumar2018-07-161-0/+15
| | | | | | | | | | | | | | Problem : Hot and Cold tier brick changelogs report rsync failure Solution : georep session is failing to sync directory from master volume to slave volume due to lot of changelog retries, solution would be to ignore tier related internal xattrs trusted.tier.fix.layout.complete and trusted.tier.tier-dht.commithash in dht_getxattr_cbk. Change-Id: I3530ffe7c4157584b439486f33ecd82ed8d66aee fixes: bz#1597563 Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
* md-cache: Do not invalidate cache post set/remove xattrPoornima G2018-07-111-2/+2
| | | | | | | | | | | | | | | Since setxattr and removexattr fops cbk do not carry poststat, the stat cache was being invalidated in setxatr/remoxattr cbk. Hence the further lookup wouldn't be served from cache. To prevent this invalidation, md-cache is modified to get the poststat in set/removexattr_cbk in dict. Co-authored with Xavi Hernandez. Change-Id: I6b946be2d20b807e2578825743c25ba5927a60b4 fixes: bz#1586018 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com> Signed-off-by: Poornima G <pgurusid@redhat.com>
* dht: Inconsistent permission for directories after brick stop/startMohit Agrawal2018-07-113-9/+77
| | | | | | | | | | | | | | | | Problem: Inconsistent access permissions on directories after bringing back the down sub-volumes, in case of directories dht_setattr first wind a call on MDS once call is finished on MDS then wind a call on NON-MDS.At the time of revalidating dht just compare the uid/gid with stbuf uid/gid and if anyone differs set a flag to heal the same. Solution: Add a condition to compare permission also in dht_revalidate_cbk to set a flag to call dht_dir_attr_heal. BUG: 1584517 Change-Id: I3e039607148005015b5d93364536158380d4c5aa fixes: bz#1584517 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* Fix compile warningsXavi Hernandez2018-07-101-12/+25
| | | | | | | | | | | This patch fixes compile warnings that appear with newer compilers. The solution applied is only to remove the warnings, but it doesn't always solve the problem in the best way. It assumes that the problem will never happen, as the previous code assumed. Change-Id: I6e8470d6c2e2dbd3bd7d324b5fd2f92ffdc3d6ec updates: bz#1193929 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* cluster/dht: Do not try to use up the readdirp bufferN Balachandran2018-06-293-53/+66
| | | | | | | | | | | | | | | | | | | | | | | | | DHT attempts to use up the entire buffer in readdirp before unwinding in an attempt to reduce the number of calls. However, this has 2 disadvantages: 1. This can cause a stack overflow when parallel readdir is enabled. If the buffer only has a little space,rda can send back only one or two entries. If those entries are stripped out by dht_readdirp_cbk (linkto files for example) it will once again wind down to rda in an attempt to fill the buffer before unwinding to FUSE. This process can continue for several iterations, causing the stack to grow and eventually overflow, causing the process to crash. 2. If parallel readdir is disabled, dht could send readdirp calls with small buffers to the bricks, thus increasing the number of network calls. We are therefore reverting to the existing behaviour. Please note, this only mitigates the stack overflow, it does not prevent it from happening. This is still possible if a subvol has thousands of linkto files for instance. Change-Id: I291bc181c5249762d0c4fe27fa4fc2631166adf5 fixes: bz#1593548 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: Fix rename journal in changelogKotresh HR2018-06-241-0/+11
| | | | | | | | | | | | | | | With patch [1], renames are journalled only on cached subvolume. The dht sends the special key on the cached subvolume so that the changelog journals the rename. With single distribute sub-volume, the key is not being set. This patch fixes the same. [1] https://review.gluster.org/10410 fixes: bz#1583018 Change-Id: Ic2e35b40535916fa506a714f257ba325e22d0961 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* cluster/dht: refactor dht_lookupN Balachandran2018-06-211-211/+321
| | | | | | | | | | The dht lookup code is getting difficult to maintain due to its size. Refactoring the code will make it easier to modify it in future. Change-Id: Ic7cb5bf4f018504dfaa7f0d48cf42ab0aa34abdd updates: bz#1590385 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: Minor code cleanupN Balachandran2018-06-201-15/+13
| | | | | | | | Removed extra variable. Change-Id: If43c47f6630454aeadab357a36d061ec0b53cdb5 updates: bz#1590385 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: Refactor rebalance codeN Balachandran2018-06-131-309/+253
| | | | | | | | | | Created init and cleanup functions for certain functionality in order to improve readability. Removed unused code. Change-Id: Ia6a2f4ab64923b6ea8e10487227fb5621eec1488 updates: bz#1586363 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: Leverage MDS subvol for dht_removexattr alsoMohit Agrawal2018-06-111-60/+190
| | | | | | | | | | | | | Problem: In a distributed volume situation can be arise when custom extended attributed are not removed from all bricks after stop/start or added newly brick. Solution: To resolve the same use MDS subvol for remove xattr also BUG: 1575587 Change-Id: I7701e0d3833e3064274cb269f26061bff9b71f50 fixes: bz#1575587 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* dht: Delete MDS internal xattr from dict in dht_getxattr_cbkMohit Agrawal2018-06-031-0/+4
| | | | | | | | | | | | | | Problem: At the time of fetching xattr to heal xattr by afr it is not able to fetch xattr because posix_getxattr has a check to ignore if xattr name is MDS Solution: To ignore same xattr update a check in dht_getxattr_cbk instead of having a check in posix_getxattr BUG: 1584098 Change-Id: I86cd2b2ee08488cb6c12f407694219d57c5361dc fixes: bz#1584098 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* cluster:dht: Corrected ret code checkN Balachandran2018-05-301-1/+1
| | | | | | | | syncop functions return -op_errno. Change-Id: Ifdb1bd1d1d11972b4306a2336e6737d6236a2fb1 fixes: bz#1580238 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* dht: Excessive 'dict is null' logs in dht_revalidate_cbkMohit Agrawal2018-05-291-2/+4
| | | | | | | | | | | | | Problem: In case of error(ESTALE/ENOENT) dht_revalidate_cbk throws "dict is null" error because xattr is not available Solution: To avoid the logs update condition in dht_revalidate_cbk and dht_lookup_dir_cbk BUG: 1583565 Change-Id: Ife6b3eeb6d91bf24403ed3100e237bb5d15b4357 fixes: bz#1583565 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* cluster/dht: Increase failure count for lookup failure in remove-brick opSusant Palai2018-05-281-3/+31
| | | | | | | | | | | | | | | | | | An entry from readdirp might get renamed just before migration leading to lookup failures. For such lookup failure, remove-brick process does not see any increment in failure count. Though there is a warning message after remove-brick commit for the user to check in the decommissioned brick for any files those are not migrated, it's better to increase the failure count so that user can check in the decommissioned bricks for files before commit. Note: This can result in false negative cases for rm -rf interaction with remove-brick op, where remove-brick shows non-zero failed count, but the entry was actually deleted by user. Fixes :bz#1580269 Change-Id: Icd1047ab9edc1d5bfc231a1f417a7801c424917c fixes: bz#1580269 Signed-off-by: Susant Palai <spalai@redhat.com>
* cluster/dht: Fix rebalance log msgN Balachandran2018-05-241-2/+2
| | | | | | | | | | Corrected the name of the xattr and fixed the code to log an error only if op_errno is not ENODATA or ENOATTR. Change-Id: I42c5b1d838eec586ac7bed2471eb1d27ff09a9ea fixes: bz#1580238 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* build: Disallow unresolved symbol referencesPrashanth Pai2018-05-181-8/+12
| | | | | | | | | | | | | | | | | | | | | | | | In the past, it was often[1] forgotten for xlators to be linked against the symbols they refer to. This often caused glusterd2 to fail while loading xlator's shared object (.so) file. This change adds "--no-undefined" as a linker flag which causes the linker to treat unresolved symbol references as an error and hence fail linking. [1]: https://review.gluster.org/#/c/19912/ https://review.gluster.org/#/c/19664/ https://review.gluster.org/#/c/19056/ https://review.gluster.org/#/c/17659/ https://bugzilla.redhat.com/show_bug.cgi?id=1532238 Bonus: Added cloudsync and utime xlator's generated source files to .gitignore Updates: bz#1193929 Change-Id: I9604a4a87b7313a5fa43bda5fdb37dfa7ef8facd Signed-off-by: Prashanth Pai <ppai@redhat.com>
* cluster/dht: Remove EIO from dht_inode_missingN Balachandran2018-05-172-4/+2
| | | | | | | | | Removed EIO from the list of errnos that triggered a migrate check task. Change-Id: I7f89c7a16056421588f1af2377cebe6affddcb47 fixes: bz#1578823 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* dht: Excessive 'dict is null' logs in dht_discover_completeMohit Agrawal2018-05-111-1/+2
| | | | | | | | | | | | Problem: In Geo-Rep setup excessive "dict is null" logs in dht_discover_complete while xattr is NULL Solution: To avoid the logs update a condition in dht_discover_complete BUG: 1576767 Change-Id: Ic7aad712d9b6d69b85b76e4fdf2881adb0512237 fixes: bz#1576767 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* cluster/dht: Debug virtual xattrs for dhtN Balachandran2018-05-092-0/+101
| | | | | | | | | Provide a virtual xattr with which to query the hashed subvol for a file. Change-Id: Ic7abd031f875da4b9084841ea7c25d6c8a851992 fixes: bz#1574421 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: Debug logs in dht_readdir(p)_cbkN Balachandran2018-05-091-2/+27
| | | | | | | | | Additional log messages to help debug issues with file listings. Change-Id: Iccd07498ba01d597c0c40f026f4177dd06d7e901 fixes: bz#1575887 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* dht: Avoid dict log flooding for internal MDS xattrMohit Agrawal2018-05-081-1/+1
| | | | | | | | | | | | | | Problem: Before populate MDS internal xattr first dht checks if MDS is present in xattr or not.If xattr dictionary is NULL dict_get log the message either dict or key is NULL Solution: Before call dict_get check xattr, if it is NULL then no need to call dict_get. BUG: 1575910 Change-Id: I81604ec5945b85eba14b42f4583d06ec713028f4 fixes: bz#1575910 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* cluster/dht: fixes to parallel renames to same destination codepathRaghavendra G2018-05-076-81/+558
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Test case: # while true; do uuid="`uuidgen`"; echo "some data" > "test$uuid"; mv "test$uuid" "test" -f || break; echo "done:$uuid"; done This script was run in parallel from multiple mountpoints Along the course of getting the above usecase working, many issues were found: Issue 1: ======= consider a case of rename (src, dst). We can encounter a situation where, * dst is a file present at the time of lookup * dst is removed by the time rename fop reaches glusterfs In this scenario, acquring inodelk on dst fails with ESTALE resulting in failure of rename. However, as per POSIX irrespective of whether dst is present or not, rename should be successful. Acquiring entrylk provides synchronization even in races like this. Algorithm: 1. Take inodelks on src and dst (if dst is present) on respective cached subvols. These inodelks are done to preserve backward compatibility with older clients, so that synchronization is preserved when a volume is mounted by clients of different versions. Once relevant older versions (3.10, 3.12, 3.13) reach EOL, this code can be removed. 2. Ignore ENOENT/ESTALE errors of inodelk on dst. 3. protect namespace of src and dst. To protect namespace of a file, take inodelk on parent on hashed subvol, then take entrylk on the same subvol on parent with basename of file. inodelk on parent is done to guard against changes to parent layout so that hashed subvol won't change during rename. 4. <rest of rename continues> 5. unlock all locks Issue 2: ======== linkfile creation in lookup codepath can race with a rename. Imagine the following scenario: * lookup finds a data-file with gfid - gfid-dst - without a corresponding linkto file on hashed-subvol. It decides to create linkto file with gfid - gfid-dst. - Note that some codepaths of dht-rename deletes linkto file of dst as first step. So, a lookup racing with an in-progress rename can easily run into this situation. * a rename (src-path:gfid-src, dst-path:gfid-dst) renames data-file and hence gfid of data-file changes to gfid-src with path dst-path. * lookup proceeds and creates linkto file - dst-path - with gfid - dst-gfid - on hashed-subvol. * rename tries to create a linkto file dst-path with src-gfid on hashed-subvol, but it fails with EEXIST. But EEXIST is ignored during linkto file creation. Now we've ended with dst-path having different gfids - dst-gfid on linkto file and src-gfid on data file. Future lookups on dst-path will always fail with ESTALE, due to differing gfids. The fix is to synchronize linkfile creation in lookup path with rename using the same mechanism of protecting namespace explained in solution of Issue 1. Once locks are acquired, before proceeding with linkfile creation, we check whether conditions for linkto file creation are still valid. If not, we skip linkto file creation. Issue 3: ======== gfid of dst-path can change by the time locks are acquired. This means, either another rename overwrote dst-path or dst-path was deleted and recreated by a different client. When this happens, cached-subvol for dst can change. If rename proceeds with old-gfid and old-cached subvol, we'll end up in inconsistent state(s) like dst-path with different gfids on different subvols, more than one data-file being present etc. Fix is to do the lookup with a new inode after protecting namespace of dst. Post lookup, we've to compare gfids and correct local state appropriately to be in sync with backend. Issue 4: ======== During revalidate lookup, if following a linkto file doesn't lead to a valid data-file, local->cached-subvol was not reset to NULL. This means we would be operating on a stale state which can lead to inconsistency. As a fix, reset it to NULL before proceeding with lookup everywhere. Issue 5: ======== Stale dentries left out in inode table on brick resulted in failures of link fop even though the file/dentry didn't exist on backend fs. A patch is submitted to fix this issue. Please check the dependency tree of current patch on gerrit for details In short, we fix the problem by not blindly trusting the inode-table. Instead we validate whether dentry is present by doing lookup on backend fs. Change-Id: I832e5c47d232f90c4edb1fafc512bf19bebde165 updates: bz#1543279 BUG: 1543279 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: log error only if layout healing is requiredRaghavendra G2018-05-072-188/+12
| | | | | | | | | | | | | | | | selfhealing of directory is invoked on two conditions: 1. no layout on disk or layout has some anomalies (holes/overlaps) 2. mds xattr is not set on the directory When dht_selfheal_directory is called with a correct layout just to set mds xattr, we see error msgs complaining about "not able to form layout on directory", which is misleading as the layout is correct. So, log this msg only if layout has anomalies. Change-Id: I4af25246fc3a2450c2426e9902d1a5b372eab125 updates: bz#1543279 BUG: 1543279 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: unwind if dht_selfheal_dir_mkdir returns an errorRaghavendra G2018-05-031-1/+5
| | | | | | | | | | If dht_selfheal_dir_mkdir returns an error, cbk passed to dht_selfheal_directory is not invoked. So, Current codepath leaves an unwound frame resulting in a hung fop forever. Change-Id: I422308b8a34a074301ca46b029ffe676f5e0f66c fixes: bz#1574305 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* dht: gf_defrag_settle_hash should ignore ENOENT and ESTALE errorSusant Palai2018-04-301-1/+8
| | | | | | | | | | | Problem: A directory deletion can happen just before gf_defrag_settle_hash which internally does a setxattr operation on a directory. Solution: Ignore ENOENT and ESTALE errors Fixes: bz#1572581 Change-Id: I2f91809f3b5e02976c4c3a5a596406a8b2f8f6f2 Signed-off-by: Susant Palai <spalai@redhat.com>
* cluster/dht: Fix dht_rename lock orderN Balachandran2018-04-231-18/+47
| | | | | | | | | | Fixed dht_order_rename_lock to use the same inodelk ordering as that of the dht selfheal locks (dictionary order of lock subvolumes). Change-Id: Ia3f8353b33ea2fd3bc1ba7e8e777dda6c1d33e0d fixes: bz#1568348 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* fuse: do fd_resolve in fuse_getattr if fd is receivedSusant Palai2018-04-181-2/+2
| | | | | | | | | | | | | | | | | | | | problem: With the current code, post graph switch the old fd is received for fuse_getattr and since it is associated with old inode, it does not have the inode ctx across xlators in new graph. Hence, dht errored out saying "no layout" for fstat call. Hence the EINVAL. Solution: if fd is passed, init and resolve fd to carry on getattr test case: - Created a single brick distributed volume - Started untar - Added a new-brick Without this fix, untar used to abort with ERROR. Change-Id: I5805c463fb9a04ba5c24829b768127097ff8b9f9 fixes: bz#1566207 Signed-off-by: Susant Palai <spalai@redhat.com>
* cluster/dht: Handle file migrations when brick downN Balachandran2018-04-131-5/+51
| | | | | | | | | | | | | | | The decision as to which node would migrate a file was based on the gfid of the file. Files were divided among the nodes for the replica/disperse set. However, if a brick was down when rebalance started, the nodeuuids would be saved as NULL and a set of files would not be migrated. Now, if the nodeuuid is NULL, the first non-null entry in the set is the node responsible for migrating the file. Change-Id: I72554c107792c7d534e0f25640654b6f8417d373 fixes: bz#1564198 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: Wind open to all subvolsN Balachandran2018-04-111-10/+5
| | | | | | | | | | dht_opendir should wind the open to all subvols whether or not local->subvols is set. This is because dht_readdirp winds the calls to all subvols. Change-Id: I67a96b06dad14a08967c3721301e88555aa01017 updates: bz#1564198 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: act as passthrough for renames on single child DHTRaghavendra G2018-04-101-7/+15
| | | | | | | | | | Various synchronization present in dht_rename while handling directories and files is necessary only if we have more than only one child. Change-Id: Ie21ad419125504ca2f391b1ae2e5c1d166fee247 fixes: bz#1563511 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>