summaryrefslogtreecommitdiffstats
path: root/xlators/cluster/dht/src/dht-rename.c
Commit message (Collapse)AuthorAgeFilesLines
* dht-rename.c: fix Coverity issues 1397018/7 - strcat into uninitialized valueYaniv Kaul2020-01-101-0/+4
| | | | | | | | | initialize both src and dst if they were not initialized already. fixes: CID#1397018 Change-Id: Ic91954423953e8bf24eaa11fc2798c554f304d28 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* cluster/dht: Fixed a memleak in dht_rename_cbkN Balachandran2019-07-021-11/+33
| | | | | | | | | Fixed a memleak in dht_rename_cbk when creating a linkto file. Change-Id: I705adef3cb79e33806520fc2b15558e90e2c211c fixes: bz#1722698 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* multiple files: another attempt to remove includesYaniv Kaul2019-06-141-2/+0
| | | | | | | | | | | | | | | | | | There are many include statements that are not needed. A previous more ambitious attempt failed because of *BSD plafrom (see https://review.gluster.org/#/c/glusterfs/+/21929/ ) Now trying a more conservative reduction. It does not solve all circular deps that we have, but it does reduce some of them. There is just too much to handle reasonably (dht-common.h includes dht-lock.h which includes dht-common.h ...), but it does reduce the overall number of lines of include we need to look at in the future to understand and fix the mess later one. Change-Id: I550cd001bdefb8be0fe67632f783c0ef6bee3f9f updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* libglusterfs: Move devel headers under glusterfs directoryShyamsundarR2018-12-051-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | libglusterfs devel package headers are referenced in code using include semantics for a program, this while it works can be better especially when dealing with out of tree xlator builds or in general out of tree devel package usage. Towards this, the following changes are done, - moved all devel headers under a glusterfs directory - Included these headers using system header notation <> in all code outside of libglusterfs - Included these headers using own program notation "" within libglusterfs This change although big, is just moving around the headers and making it correct when including these headers from other sources. This helps us correctly include libglusterfs includes without namespace conflicts. Change-Id: Id2a98854e671a7ee5d73be44da5ba1a74252423b Updates: bz#1193929 Signed-off-by: ShyamsundarR <srangana@redhat.com>
* dht: fix buffer overflowSusant Palai2018-11-231-6/+35
| | | | | | | | CID: 1382461 Change-Id: I25b5edf7fd5fdaa52079d0348ebb7f5de9f11503 updates: bz#789278 Signed-off-by: Susant Palai <spalai@redhat.com>
* cluster/dht : Fix coverity issueAshish Pandey2018-10-111-2/+2
| | | | | | | | | | | | To check if the gfid is null or not we should use function gf_uuid_is_null CID: 1382364 https://scan6.coverity.com/reports.htm#v42607/p10714/fileInstanceId=86243257&defectInstanceId=26374360&mergedDefectId=1382364 Change-Id: I81944b823c9ee6e6dcc9695f64f7e5966e0d7030 updates: bz#789278 Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* dht: Coverity fixesSusant Palai2018-09-241-1/+0
| | | | | | | | | | | CID: 1274236 Issue: Logically dead code (op_errno will never be -1) CID: 1351652 Issue: Dereference after null check. (local->fd is dereferenced anyway and it should not be NULL ever for dht_readdirp_cbk.) Change-Id: Ied9c5f5b72536be1ca944237165acdc62b792e58 updates: bz#789278 Signed-off-by: Susant Palai <spalai@redhat.com>
* dht: utilize the framework to pass-through xlator tasksAmar Tumballi2018-09-191-19/+21
| | | | | | | | | | | | | | | | | | | | | Also fixes the issue caused due to not converting back the fn function to after getting its address. We wanted the value of the field, not the address of the pt_fop field. With this patch, DHT will always be started in pass-through mode if the number of subvols is just 1. Fixes some tests to make sure DHT is in full config (ie, subvols > 1). - increased timeout of brick-mux test as it was bordering on 300 seconds. - Also change the volume type to supported 'replica 3' from 'replica 2'. - also no DHT tests should assume presence of DHT when there is just 1 brick in volume Credits: Nithya B <nbalacha@redhat.com> fixes: #405 Change-Id: I8e55239ce58d6ac6ae1901e2e384be1ecbd33d6e Signed-off-by: Amar Tumballi <amarts@redhat.com>
* Land part 2 of clang-format changesGluster Ant2018-09-121-1725/+1597
| | | | | Change-Id: Ia84cc24c8924e6d22d02ac15f611c10e26db99b4 Signed-off-by: Nigel Babu <nigelb@redhat.com>
* cluster/dht: In rename, unlink after creating linkto fileN Balachandran2018-09-031-122/+132
| | | | | | | | | | | | | | The linkto file creation for the dst was done in parallel with the unlink of the old src linkto. If these operations reached the brick out of order, we end up with a dst linkto file without a .glusterfs handle. Fixed by the unlinking only after the linkto file creation has completed. Change-Id: I4246f7655f5bc180f5ded7fd34d263b7828a8110 fixes: bz#1621981 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* dht: remove useless argument from dht_iatt_mergeKinglong Mee2018-07-181-18/+16
| | | | | | | | | | The last using of the subvol argument has been removed at 4e1ec35ef4f7 ("core: fill 'ia_ino' from 'ia_gfid' in 'storage/posix' ......") 7 years ago (2011-06-16). Change-Id: I9788d79e2e40cc153cf2960e28c7c1c1033dc8f7 fixes: bz#1601683 Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
* cluster/dht: Fix rename journal in changelogKotresh HR2018-06-241-0/+11
| | | | | | | | | | | | | | | With patch [1], renames are journalled only on cached subvolume. The dht sends the special key on the cached subvolume so that the changelog journals the rename. With single distribute sub-volume, the key is not being set. This patch fixes the same. [1] https://review.gluster.org/10410 fixes: bz#1583018 Change-Id: Ic2e35b40535916fa506a714f257ba325e22d0961 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* cluster/dht: fixes to parallel renames to same destination codepathRaghavendra G2018-05-071-54/+307
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Test case: # while true; do uuid="`uuidgen`"; echo "some data" > "test$uuid"; mv "test$uuid" "test" -f || break; echo "done:$uuid"; done This script was run in parallel from multiple mountpoints Along the course of getting the above usecase working, many issues were found: Issue 1: ======= consider a case of rename (src, dst). We can encounter a situation where, * dst is a file present at the time of lookup * dst is removed by the time rename fop reaches glusterfs In this scenario, acquring inodelk on dst fails with ESTALE resulting in failure of rename. However, as per POSIX irrespective of whether dst is present or not, rename should be successful. Acquiring entrylk provides synchronization even in races like this. Algorithm: 1. Take inodelks on src and dst (if dst is present) on respective cached subvols. These inodelks are done to preserve backward compatibility with older clients, so that synchronization is preserved when a volume is mounted by clients of different versions. Once relevant older versions (3.10, 3.12, 3.13) reach EOL, this code can be removed. 2. Ignore ENOENT/ESTALE errors of inodelk on dst. 3. protect namespace of src and dst. To protect namespace of a file, take inodelk on parent on hashed subvol, then take entrylk on the same subvol on parent with basename of file. inodelk on parent is done to guard against changes to parent layout so that hashed subvol won't change during rename. 4. <rest of rename continues> 5. unlock all locks Issue 2: ======== linkfile creation in lookup codepath can race with a rename. Imagine the following scenario: * lookup finds a data-file with gfid - gfid-dst - without a corresponding linkto file on hashed-subvol. It decides to create linkto file with gfid - gfid-dst. - Note that some codepaths of dht-rename deletes linkto file of dst as first step. So, a lookup racing with an in-progress rename can easily run into this situation. * a rename (src-path:gfid-src, dst-path:gfid-dst) renames data-file and hence gfid of data-file changes to gfid-src with path dst-path. * lookup proceeds and creates linkto file - dst-path - with gfid - dst-gfid - on hashed-subvol. * rename tries to create a linkto file dst-path with src-gfid on hashed-subvol, but it fails with EEXIST. But EEXIST is ignored during linkto file creation. Now we've ended with dst-path having different gfids - dst-gfid on linkto file and src-gfid on data file. Future lookups on dst-path will always fail with ESTALE, due to differing gfids. The fix is to synchronize linkfile creation in lookup path with rename using the same mechanism of protecting namespace explained in solution of Issue 1. Once locks are acquired, before proceeding with linkfile creation, we check whether conditions for linkto file creation are still valid. If not, we skip linkto file creation. Issue 3: ======== gfid of dst-path can change by the time locks are acquired. This means, either another rename overwrote dst-path or dst-path was deleted and recreated by a different client. When this happens, cached-subvol for dst can change. If rename proceeds with old-gfid and old-cached subvol, we'll end up in inconsistent state(s) like dst-path with different gfids on different subvols, more than one data-file being present etc. Fix is to do the lookup with a new inode after protecting namespace of dst. Post lookup, we've to compare gfids and correct local state appropriately to be in sync with backend. Issue 4: ======== During revalidate lookup, if following a linkto file doesn't lead to a valid data-file, local->cached-subvol was not reset to NULL. This means we would be operating on a stale state which can lead to inconsistency. As a fix, reset it to NULL before proceeding with lookup everywhere. Issue 5: ======== Stale dentries left out in inode table on brick resulted in failures of link fop even though the file/dentry didn't exist on backend fs. A patch is submitted to fix this issue. Please check the dependency tree of current patch on gerrit for details In short, we fix the problem by not blindly trusting the inode-table. Instead we validate whether dentry is present by doing lookup on backend fs. Change-Id: I832e5c47d232f90c4edb1fafc512bf19bebde165 updates: bz#1543279 BUG: 1543279 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: Fix dht_rename lock orderN Balachandran2018-04-231-18/+47
| | | | | | | | | | Fixed dht_order_rename_lock to use the same inodelk ordering as that of the dht selfheal locks (dictionary order of lock subvolumes). Change-Id: Ia3f8353b33ea2fd3bc1ba7e8e777dda6c1d33e0d fixes: bz#1568348 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: act as passthrough for renames on single child DHTRaghavendra G2018-04-101-7/+15
| | | | | | | | | | Various synchronization present in dht_rename while handling directories and files is necessary only if we have more than only one child. Change-Id: Ie21ad419125504ca2f391b1ae2e5c1d166fee247 fixes: bz#1563511 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: store the 'reaction' on failures per lockRaghavendra G2018-02-231-3/+5
| | | | | | | | | | | Currently its passed in dht_blocking_inode(entry)lk, which would be a global value for all the locks passed in the argument. This would be a limitation for cases where we want to ignore failures on only few locks and fail for others. Change-Id: I02cfbcaafb593ad8140c0e5af725c866b630fb6b BUG: 1543279 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* cluster:dht Fix crash in dht_rename_lock_cbkN Balachandran2017-06-291-2/+4
| | | | | | | | | | | | | | | | | | Use a local variable to store the call count in the STACK_WIND for loop. Using frame->local is dangerous as it could be freed while the loop is still being processed Change-Id: Ie65cdcfb7868509b4a83bc2a5b5d6304eabfbc8e BUG: 1466110 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: https://review.gluster.org/17645 Smoke: Gluster Build System <jenkins@build.gluster.org> Tested-by: Nigel Babu <nigelb@redhat.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* dht: send lookup on old name inside rename with bname and pargfidSusant Palai2017-04-291-9/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | Inside rename, a lookup is done on the source name to make sure that the file is there. But we used to do a gfid based lookup and hence, even if the source name was renamed to a new name from some other client, lookup will be successful as server3_3_lookup will fetch the new path based on the gfid. So even if the source file does not exist any more rename will carry on, and as server3_3_link(destination is hashed to a different brick other than source cached scenario) also does gfid based resolve, it wont detect that the source name does not exist and hardlink creation will be successful (since gfid based resolve will get the new dentry). To solve this problem, do a name based lookup inside rename. So that rename will fail right away if the source does not exist. Change-Id: Ieba8bdd6675088dbf18de90ed4622df043d163bd BUG: 1412135 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: https://review.gluster.org/16375 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* feature/dht: Directory synchronizationKotresh HR2017-04-261-138/+220
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Design doc: https://review.gluster.org/16876 Directory creation is now synchronized with blocking inodelk of the parent on the hashed subvolume followed by the entrylk on the hashed subvolume between dht_mkdir, dht_rmdir, dht_rename_dir and lookup selfheal mkdir. To maintain internal consistency of directories across all subvols of dht, we need locks. Specifically we are interested in: 1. Consistency of layout of a directory. Only one writer should modify the layout at a time. A writer (layout setting during directory heal as part of lookup) shouldn't modify the layout while there are readers (all other fops like create, mkdir etc., which consume layout) and readers shouldn't read the layout while a writer is in progress. Readers can read the layout simultaneously. Writer takes a WRITE inodelk on the directory (whose layout is being modified) across ALL subvols. Reader takes a READ inodelk on the directory (whose layout is being read) on ANY subvol. 2. Consistency of directory namespace across subvols. The path and associated gfid should be same on all subvols. A gfid should not be associated with more than one path on any subvol. All fops that can change directory names (mkdir, rmdir, renamedir, directory creation phase in lookup-heal) takes an entrylk on hashed subvol of the directory. NOTE1: In point 2 above, since dht takes entrylk on hashed subvol of a directory, the transaction itself is a consumer of layout on parent directory. So, the transaction is a reader of parent layout and does an inodelk on parent directory just like any other layout reader. So a mkdir (dir/subdir) would: > Acquire a READ inodelk on "dir" on any subvol. > Acquire an entrylk (dir, "subdir") on hashed subvol of "subdir". > creates directory on hashed subvol and possibly on non-hashed subvols. > UNLOCK (entrylk) > UNLOCK (inodelk) NOTE2: mkdir fop while setting the layout of the directory being created is considered as a reader, but NOT a writer. The reason is for a fop which can consume the layout of a directory to come either of the following conditions has to be true: > mkdir syscall from application has to complete. In this case no need of synchronization. > A lookup issued on the directory racing with mkdir has to complete. Since layout setting by a lookup is considered as a writer, only one of either mkdir or lookup will set the layout. Code re-organization: All the lock related routines are moved to "dht-lock.c" file. New wrapper function is introduced to take blocking inodelk followed by entrylk 'dht_protect_namespace' Updates #191 Change-Id: I01569094dfbe1852de6f586475be79c1ba965a31 Signed-off-by: Kotresh HR <khiremat@redhat.com> BUG: 1443373 Reviewed-on: https://review.gluster.org/15472 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org>
* feature/dht: undo partially successful dir renameCsaba Henk2017-01-111-7/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As with dht, dirs are present on all subvolumes, renaming them is a compound operation and thus a partial success + partial failure scenario is possible, resulting in an inconsistent state. For purposes of reproduction, such a scenario can easily be produced by stopping the volume, edit the volfile of a certain subvolume to get at an "option read-only on" setting, and then restart the volume. Thus those operations that are to make change on the affected subvolume will fail with EROFS. To handle such scenarios, we introduce an in-memory cache where we record the return values obtained from the subvolumes. At the final stage of the dir rename operation we check if it's a partial success/fail situation. If yes, then we perform a reverse rename op on those subvolumes where the operation succeeded. Change-Id: I3d05f74f53932cb984a918d252a7309c1009a51d BUG: 1412069 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/15739 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com>
* dht: At places needed use STACK_WIND_COOKIEPoornima G2017-01-091-91/+83
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Issue: frame has a void * cookie pointer. In case of STACK_WIND_COOKIE frame->cookie is assigned to what is sent by the caller. In case of STACK_WIND frame->cookie is assigned to point point to the frame itself. For ease of coding, at many places, the cookie in the cbk is used to get the pointer to the next xl. This is inconsistent when STACK_WIND_TAIL comes into picture. Eg: dht_setxattr () { for (i = 0 ; i < conf->subvolume_cnt ; i++) { STACK_WIND (..dht_checking_pathinfo_cbk, conf->subvolumes[i] ..); } dht_checking_pathinfo_cbk (...void *cookie...) { prev = cookie; ... for (i = 0; i < conf->subvolume_cnt; i++) { if (conf->subvolumes[i] == prev->this) { ... } } } Consider the below graph: dht (parent) readdir-ahead => Doesn't define setxattr and uses STACK_WIND_TAIL protocol-client With this graph, when dht_checking_pathinfo_cbk is called, cookie will have frame pointing to protocol-client. i.e. prev->this will be protocol-client. But dht was expecting it to be readdir-ahead as it has stored in conf->subvolumes[i] Solution: Hence, as a thumb rule, if cbk is using cookie, then we explicitly call STACK_WIND_COOKIE. Change-Id: I83aea1e24c809c5a91a0db7283e908e125471bd4 BUG: 1401812 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/16039 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/dht: Do rename cleanup as rootPranith Kumar K2017-01-051-0/+1
| | | | | | | | | | | | | | | | | | | | | | Problem: Rename linkfile cleanup is done as non-root which may not have priviliges to do the rename so it fails with EACCESS. MKDIR on that name in future will start to hole on this subvolume. It is not easy to hit on fuse mounts because vfs takes care of the permission checks even before rename fop is wound. But with nfs-ganesha mounts it happens. Fix: Do rename cleanup as root BUG: 1409727 Change-Id: I414c1eb6dce76b4516a6c940557b249e6c3f22f4 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/16317 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: N Balachandran <nbalacha@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* dht/rename : Incase of failure remove linkto file properlyJiffin Tony Thottan2016-12-011-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Generally linkto file is created using root user. Consider following case, a user is trying to rename a file which he is not permitted. So the rename fails with EACESS and when rename tries to cleanup the linkto file, it fails. The above issue happens when rename/00.t test executed on nfs-ganesha clients : Steps executed in script * create a file "abc" using root * rename the file "abc" to "xyz" using a non root user, it fails with EACESS * delete "abc" * create directory "abc" using root * again try ot rename "abc" to "xyz" using non root user, test hungs here which slowly leds to OOM kill of ganesha process RCA put forwarded by Du for OOM kill of ganesha Note that when we hit this bug, we've a scenario of a dentry being present as: * a linkto file on one subvol * a directory on rest of subvols When a lookup happens on the dentry in such a scenario, the control flow goes into an infinite loop of: dht_lookup_everywhere dht_lookup_everywhere_cbk dht_lookup_unlink_cbk dht_lookup_everywhere_done dht_lookup_directory (as local->dir_count > 0) dht_lookup_dir_cbk (sets to local->need_selfheal = 1 as the entry is a linkto file on one of the subvol) dht_lookup_everywhere (as need_selfheal = 1). This infinite loop can cause increased consumption of memory due to: 1) dht_lookup_directory assigns a new layout to local->layout unconditionally 2) Most of the functions in this loop do a stack_wind of various fops. This results in growing of call stack (note that call-stack is destroyed only after lookup response is received by fuse - which never happens in this case) Thanks Du for root causing the oom kill and Sushant for suggesting the fix Change-Id: I1e16bc14aa685542afbd21188426ecb61fd2689d BUG: 1397052 Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com> Reviewed-on: http://review.gluster.org/15894 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* features/shard: Get hard-link-count in {unlink,rename}_cbk before deleting ↵Krutika Dhananjay2016-05-201-8/+13
| | | | | | | | | | | | | shards Change-Id: I0606b74f11f5412c4d9af44a6505635ed9022c15 BUG: 1335858 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/14334 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* dht: rename takes lock on parent directory if destination existsSakshi Bansal2016-05-181-7/+32
| | | | | | | | | | | | | | | | For directory rename if destination exists the source directory is created as a child of the given destination directory. Since the new child directory does not exist take lock on parent of the child directory. Change-Id: I24a34605a2cd65984910643ff5462f35e8fc7e71 BUG: 1336698 Signed-off-by: Sakshi Bansal <sabansal@redhat.com> Reviewed-on: http://review.gluster.org/14371 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* dht: lock on subvols to prevent rename and lookup selfheal raceSakshi2016-04-061-43/+158
| | | | | | | | | | | | | | | | | | | | | | This patch addresses two races while renaming directories: 1) While renaming src to dst, if a lookup selfheal is triggered it can recreate src on those subvols where rename was successful. This leads to multiple directories (src and dst) having same gfid. To avoid this we must take locks on all subvols with src. 2) While renaming if the dst exists and a lookup selfheal is triggered it will find anomalies in the dst layout and try to heal the stale layout. To avoid this we must take lock on any one subvol with dst. Change-Id: I637f637d3241d9065cd5be59a671c7e7ca3eed53 BUG: 1252244 Signed-off-by: Sakshi <sabansal@redhat.com> Reviewed-on: http://review.gluster.org/11880 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* dht: lock on subvols to prevent lookup vs rmdir raceSakshi2016-04-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | There is a possibility that while an rmdir is completed on some non-hashed subvol and proceeding to others, a lookup selfheal can recreate the same directory on those subvols for which the rmdir had succeeded. Now the deletion of the parent directory will fail with an ENOTEMPTY. To fix this take blocking inodelk on the subvols before starting rmdir. Selfheal must also take blocking inodelk before creating the entry. Change-Id: I168a195c35ac1230ba7124d3b0ca157755b3df96 BUG: 1245065 Signed-off-by: Sakshi <sabansal@redhat.com> Reviewed-on: http://review.gluster.org/13528 CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* dht: report constant directory sizeJeff Darcy2016-03-201-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | Directory size is meaningless. Every filesystem has its own unpredictable way of increasing or decreasing it, based on internal data structures and even transient conditions. Some filesystems (e.g. ext4) never decrease it at all. Others (e.g. btrfs) don't even report it. Very few programs look at it, and those that do are broken. Unfortunately, one such program is GNU tar, which will complain when it sees different values because at different times we got the value from different DHT subvolumes. To avoid such problems, just report a constant value. Change-Id: Id64ce917c75b5f7ff50cb55b6e997f3b3556e7e3 BUG: 1302948 Original-author: Shyam <srangana@redhat.com> Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/13770 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* dht: cleanup dict and free memory in rename code pathSakshi Bansal2016-02-241-0/+4
| | | | | | | | | | | | Change-Id: I2458e18197bdf7565563a85e9021b5b2850c1825 BUG: 1303945 Signed-off-by: Sakshi Bansal <sabansal@redhat.com> Reviewed-on: http://review.gluster.org/13392 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht: file rename must take blocking inode locksSakshi Bansal2016-02-211-5/+5
| | | | | | | | | | | | | | | | Currently DHT takes non-blocking locks for file rename. Due to this during parallel renames some clients fail with EBUSY or ESTALE errors. Hence to avoid application discontinuity file rename must take blocking inode locks. Change-Id: I986e9d08b3be359f20b1a3e1564e049b0f3dffd3 BUG: 1304966 Signed-off-by: Sakshi Bansal <sabansal@redhat.com> Reviewed-on: http://review.gluster.org/13366 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* all: fix various cppcheck warningsKaleb S KEITHLEY2016-02-151-3/+1
| | | | | | | | | | | | | | | fixes for various warnings reported by cppcheck N.B. cppcheck output is in the bugzilla Change-Id: I33acec127bc4536935fdd8d52a0c490ec54d50b2 BUG: 1292954 Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/13006 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* cluster/dht: Cleanup dict in dht_do_rename()Vijay Bellur2016-02-041-0/+2
| | | | | | | | | | | | Change-Id: Ib4b3a843e78eccf5b8e0e7776cd0128013a59a3e BUG: 1303945 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/13322 CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht: reverting changes that takes lock on all subvols to prevent rmdir vs ↵Sakshi2015-09-141-1/+1
| | | | | | | | | | | | | | | lookup selfheal race Locking on all subvols before an rmdir is unable to remove all directory entries. Hence reverting the patch for now. Change-Id: I31baf2b2fa2f62c57429cd44f3f229c35eff1939 BUG: 1245065 Signed-off-by: Sakshi <sabansal@redhat.com> Reviewed-on: http://review.gluster.org/12125 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* fd: Do fd_bind on successful openPranith Kumar K2015-08-281-0/+1
| | | | | | | | | | | | | | | - fd_unref should decrement fd->inode->fd_count only if it is present in the inode's fd list. - successful open/opendir should perform fd_bind. Change-Id: I81dd04f330e2fee86369a6dc7147af44f3d49169 BUG: 1207735 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11044 Reviewed-by: Anoop C S <anoopcs@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht : lock on subvols to prevent lookup vs rmdir raceSakshi2015-08-271-1/+1
| | | | | | | | | | | | | | | | | There is a possibility that while an rmdir is completed on some non-hashed subvol and proceeding to others. A lookup selfheal can recreate the same directory on those subvols for which the rmdir had succeeded. The fix is to take a blocking inodelk on the subvols before starting rmdir. Since selfheal requires lock on all subvols, if an rmdir is in progess acquiring locks will fail and vice versa. Change-Id: I841a44758c3b88f5e04d1cb73ad36e0cac9fdabb BUG: 1245065 Signed-off-by: Sakshi <sabansal@redhat.com> Reviewed-on: http://review.gluster.org/11725 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dict: dict_set_bin() should never free the pointer on errorNiels de Vos2015-07-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | dict_set_bin() is handling the pointer that it passed inconsistently. Depending on the errors that can occur, the pointer passed to the dict can be free'd, but there is no guarantee. It is cleaner to have the caller free the pointer that allocated it and dict_set_bin() returned an error. When dict_set_bin() returned success, the given pointer will be free'd when dict_unref() calls data_destroy(). Many callers of dict_set_bin() already take care of free'ing the pointer on error. The ones that did not, are corrected with this change too. Change-Id: I39a4f7ebc0cae6d403baba99307d7ce408f25966 BUG: 1242280 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/11638 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* dht: Adding log messages to the new logging frameworkarao2015-06-231-9/+13
| | | | | | | | | | | | | Change-Id: Ib3bb61c5223f409c23c68100f3fe884918d2dc3f BUG: 1194640 Signed-off-by: arao <arao@redhat.com> Reviewed-on: http://review.gluster.org/10021 Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Joseph Fernandes Tested-by: Joseph Fernandes Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* features/changelog: Avoid setattr fop logging during renameSaravanakumar Arumugam2015-06-111-16/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: When a file is renamed and the (renamed)file's Hashing falls into a different brick, DHT creates a special file(linkto file) in the brick(Hashed subvolume) and carries out setattr operation on that file. Currently, Changelog records this(setattr) operation in Hashed subvolume. glusterfind in turn records this operation as MODIFY operation. So, there is a NEW entry in Cached subvolume and MODIFY entry in Hashed subvolume for the same file. Solution: Avoid logging setattr operation carried out, by marking the operation as internal fop using xdata. In changelog translator, check whether setattr is set as internal fop and skip accordingly. Change-Id: I21b09afb5a638b88a4ccb822442216680b7b74fd BUG: 1230007 Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com> Reviewed-on: http://review.gluster.org/11137 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* build: do not #include "config.h" in each fileNiels de Vos2015-05-291-5/+0
| | | | | | | | | | | | | | | | | | Instead of including config.h in each file, and have the additional config.h included from the compiler commandline (-include option). When a .c file tests for a certain #define, and config.h was not included, incorrect assumtions were made. With this change, it can not happen again. BUG: 1222319 Change-Id: I4f9097b8740b81ecfe8b218d52ca50361f74cb64 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/10808 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* geo-rep: rename handling in dht volumeNithya Balachandran2015-05-051-0/+82
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Background: Glusterfs changelogs are stored in each brick, which records the changes happened in that brick. Georep will run in all the nodes of master and processes changelogs "independently". Processing changelogs is in brick level, but all the fops will be replayed on "slave mount" point. Problem: With a DHT volume, in changelog "internal fops" are NOT recorded. For Rename case, Rename is recorded in "hashed" brick changelog. (DHT's internal fops like creating linkto file, unlink is NOT recorded). This lead us to inconsistent rename operations. For example, Distribute volume created with Two bricks B1, B2. //Consider master volume mounted @ /mnt/master and following operations executed: cd /mnt/master touch f1 // f1 falls on B1 Hash mv f1 f2 // f2 falls on B2 Hash // Here, Changelogs are recorded as below: @B1 CREATE f1 @B2 RENAME f1 f2 Here, race exists between Brick B1 and B2, say B2 will get executed first. Source file f1 itself is "NOT PRESENT", so it will go ahead and create f2 (Current implementation). We have this problem When rename falls in another brick and file is unlinked in Master. Similar kind of issue exists in following case too(multiple rename): CREATE f1 RENAME f1 f2 RENAME f2 f1 Solution: Instead of carrying out "changelogging" at "HASHED volume", carry out at the "CACHED volume". This way we have rename operations carried out where actual files are present. So,Changelog recorded as : @B1 CREATE f1 RENAME f1 f2 credit: sarumuga@redhat.com PS: Some of the races as the one below are _NOT_ fixed by this patch * f1 and f2 exist. B1 and B2 are their respective cached subvols. For both files hashed-subvol == cached-subvol * mv f1 f2 on master. * B1 has change-log entry of rename f1 f2 * rebalance migrates f2 from B1 and B2 * mv f2 f1 on master. * B2 has change-log entry of rename f2 f1 Since changelog entries (rename f1 f2) and (rename f2 f1) are processed independently by gsyncds, which of either f1 and f2 survives on slave is subject to race. Note that on master its file f1 with name f1 which survived. On slave it can be either file f1 with name f1 or file f2 with name f2 based on who wins the race of processing changelog. Change-Id: Iebc222f582613924c3a7cba37fb6d3e2d8332eda BUG: 1141379 Signed-off-by: Nithya Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/10410 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* dht : coverity fixesManikandan Selvaganesh2015-04-071-4/+0
| | | | | | | | | | | | | | | | CID : 1124352,1124365 (unchecked return value), 1124377 ( logically dead code), 1124511 (null dereference) Change-Id: I61e029a078559cfe15d36bf0aa53418f6214e5cb BUG: 789278 Signed-off-by: Manikandan Selvaganesh <mselvaga@redhat.com> Reviewed-on: http://review.gluster.org/9622 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Avoid conflict between contrib/uuid and system uuidEmmanuel Dreyfus2015-04-041-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | glusterfs relies on Linux uuid implementation, which API is incompatible with most other systems's uuid. As a result, libglusterfs has to embed contrib/uuid, which is the Linux implementation, on non Linux systems. This implementation is incompatible with systtem's built in, but the symbols have the same names. Usually this is not a problem because when we link with -lglusterfs, libc's symbols are trumped. However there is a problem when a program not linked with -lglusterfs will dlopen() glusterfs component. In such a case, libc's uuid implementation is already loaded in the calling program, and it will be used instead of libglusterfs's implementation, causing crashes. A possible workaround is to use pre-load libglusterfs in the calling program (using LD_PRELOAD on NetBSD for instance), but such a mechanism is not portable, nor is it flexible. A much better approach is to rename libglusterfs's uuid_* functions to gf_uuid_* to avoid any possible conflict. This is what this change attempts. BUG: 1206587 Change-Id: I9ccd3e13afed1c7fc18508e92c7beb0f5d49f31a Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/10017 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* cluster/dht: Fix subvol check, to correctly determine cached file renameShyam2014-11-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | The check to treat rename as a critical failure ignored when the cached file is being renamed to new name, as the new name falls on the same subvol as the cached file. This is in addition to when the target of the rename does not exist. The current change is simpler, as the rename logic, renames the cached file in case the target exists and falls on the same subvol as source name, OR the target does not exist and the hash of target falls on the same subvol as source cached. These conditions mean we are renaming the source, other conditions mean we are renaming the source linkto file which we do not want to treat as a critical failure (and we also instruct marker that it is an internal FOP and to not account for the same). Change-Id: I4414e61a0d2b28a429fa747e545ef953e48cfb5b BUG: 1161156 Signed-off-by: Shyam <srangana@redhat.com> Reviewed-on: http://review.gluster.org/9063 Reviewed-by: N Balachandran <nbalacha@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: susant palai <spalai@redhat.com> Reviewed-by: venkatesh somyajulu <vsomyaju@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Cluster/DHT: Changing rename log severityNithya Balachandran2014-09-031-6/+5
| | | | | | | | | | | | Changing log level for a rename message from debug to info to improve debuggability Change-Id: I53031fcf97fffd62095692477330ecde0cf47dcd BUG: 1130888 Signed-off-by: Nithya Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/8582 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/dht: Rename should not fail post hardlink creationShyam2014-09-021-36/+94
| | | | | | | | | | | | | | | | | | | | | | | | | | In the rename path, we wind the creation of newname hardlink and linkto file in dst hashed a the same time. If the linkto creation fails, but the link creation succeeds, we enter the failure code and cleanup the created newname hardlink. In the interim if another client looks up newname and finds it as a hardlink from FUSE, it could send an unlink for oldname instead of a rename. This combined with the above cleanup code could end up losing all the files copies, and thereby losing data. This fix separates these steps into 2 parts, creating the linkto first and then the link file, so that post link file creation no failures would cleanup the newname file. If linkto fails then link is not attempted, thereby not polluting the name space with newname. Change-Id: I61da8e906060da16a31ea1076eec2f01fd617f44 BUG: 1130888 Signed-off-by: Shyam <srangana@redhat.com> Reviewed-on: http://review.gluster.org/8570 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: Treat linkto file rename failure as non-critial errorShyam2014-09-021-8/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | It is a critical failure iff we fail to rename the cached file if the rename of the linkto failed, it is not a critical failure, and we do not want to lose the created hard link for the new name as that could have been read by other clients. NOTE: If another client is attempting the same oldname -> newname rename, and finds both file names as existing, and are hard links to each other, then FUSE would send in an unlink for oldname. In this time duration if we treat the linkto as a critical error and unlink the newname we created, we would have effectively lost the file to rename operations. Repercussions of treating this as a non-critical error is that we could leave behind a stale linkto file and/or not create the new linkto file, the second case would be rectified by a subsequent lookup, the first case by a rebalance, like for all stale linkto files Change-Id: Ia53ad8b43c3cf8f48ef5b43fd1fec4274e807556 BUG: 1130888 Signed-off-by: Shyam <srangana@redhat.com> Reviewed-on: http://review.gluster.org/8563 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: synchronize rename and file-migrationRaghavendra G2014-08-281-21/+239
| | | | | | | | | Change-Id: I4f243c946f76d440680b651235f925e3d0ebf0fd BUG: 1130888 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/8523 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Cluster/DHT : Additional log messagesNithya Balachandran2014-08-241-1/+1
| | | | | | | | | | | | Adding log messages in the rename and lookup calls to help with debugging. Change-Id: I13b1c6f98fb49ead45362550c46359ab1f9028c0 BUG: 1130888 Signed-off-by: Nithya Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/8516 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht: fix rename raceNithya Balachandran2014-07-301-2/+5
| | | | | | | | | | | | | Additional check to check if we created the linkto file before deleting it in the rename cleanup function Change-Id: I919cd7cb24f948ba4917eb9cf50d5169bb730a67 BUG: 1117851 Signed-off-by: Nithya Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/8338 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* dht: fix rename raceJeff Darcy2014-07-171-3/+9
| | | | | | | | | | | | | | | | | | | | | | | If two clients try to rename the same file at the same time, we sometimes end up with *no file at all* in either the old or new location. That's kind of bad. The culprit seems to be some overly aggressive cleanup code. AFAICT, based on today's study of the code, the intent of the changed section is to remove any linkfile we might have created before the actual rename. However, what we're removing might not be our extra link. If we're racing with another client that's also doing a rename, it might be the only remaining link to the user's data. The solution, which is good enough to pass this test but almost certainly still not complete, is to be more selective about when we do this unlink. Now, we only do it if we know that, at some point, we did in fact create the link without error (notably ENOENT on the source or EEXIST on the destination) ourselves. Change-Id: I8d8cce150b6f8b372c9fb813c90be58d69f8eb7b BUG: 1117851 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/8269 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>