path: root/xlators/cluster/dht
Commit message (Author, Age, Files, Lines)
...
* cluster/distribute: use a linked inode in directory heal codepath (Raghavendra G, 2016-05-16, 2 files, -11/+58)
  This is needed for the following reasons:
  * healing is done in the lookup and mkdir codepaths, where the inode is not linked _yet_, as linking is normally done in the interface layers (fuse-bridge, gfapi, nfsv3 etc).
  * healing consists of non-lookup fops like inodelk, setattr, setxattr etc. All non-lookup fops expect a linked inode.

  Change-Id: I1bd8157abbae58431b7f6f6fffee0abfe5225342
  BUG: 1334164
  Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
  Reviewed-on: http://review.gluster.org/14295
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Susant Palai <spalai@redhat.com>
  Reviewed-by: mohammed rafi kc <rkavunga@redhat.com>
* cluster/tier: downgrade max-cycle-time log message to INFO (Dan Lambright, 2016-05-16, 1 file, -1/+1)
  The "max cycle time" log message was incorrectly logged as an error. Downgrade it to INFO.

  Change-Id: Ia7d074423019fa79443bc6ea694148b7b8da455d
  BUG: 1335973
  Signed-off-by: Dan Lambright <dlambrig@redhat.com>
  Reviewed-on: http://review.gluster.org/14336
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: N Balachandran <nbalacha@redhat.com>
  Tested-by: N Balachandran <nbalacha@redhat.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* tier/detach: Clear tier-fix-layout-complete xattr after migration threads join (Joseph Fernandes, 2016-05-14, 1 file, -33/+42)
  Previously the tier-fix-layout-complete xattr was wrongly cleared before the migration threads were joined. A failure to clear the xattr could therefore cause the premature death of the migration threads. Now the xattr is cleared only after the data movement threads join, ensuring that all migration is done.

  Change-Id: I829b671efa165ae13dbff7b00707434970b37a09
  BUG: 1334839
  Signed-off-by: Joseph Fernandes <josferna@redhat.com>
  Reviewed-on: http://review.gluster.org/14285
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: Joseph Fernandes
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: N Balachandran <nbalacha@redhat.com>
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* tier: avoid pthread_join if pthread_create fails (Prasanna Kumar Kalever, 2016-05-11, 1 file, -2/+5)
  This patch rearranges the code to guard against pthread_create() failure: pthread_join() is called only if pthread_create() succeeded.

  Change-Id: I0836bc950a210574cfdc755a666c6ac5df6ab430
  BUG: 1332219
  Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
  Reviewed-on: http://review.gluster.org/14152
  Tested-by: Prasanna Kumar Kalever <pkalever@redhat.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Joseph Fernandes
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
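  The guard described in this commit can be illustrated with a minimal sketch. It is not the tier code itself; `worker` and `run_and_join` are hypothetical names chosen for the example, and only the standard `pthread_create()` and `pthread_join()` calls are assumed.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical worker standing in for a migration thread:
 * it sets the flag it is handed and exits. */
static void *worker(void *arg)
{
    *(int *)arg = 1;
    return NULL;
}

/* Returns 0 if the thread was created and joined, otherwise the
 * error code reported by pthread_create() or pthread_join(). */
int run_and_join(int *done)
{
    pthread_t tid;
    int ret;

    assert(done != NULL);

    ret = pthread_create(&tid, NULL, worker, done);
    if (ret != 0) {
        /* tid is not valid here, so joining it must be skipped. */
        return ret;
    }

    /* Only reached when the thread really exists. */
    return pthread_join(tid, NULL);
}
```

  Joining a thread id that was never filled in by a successful pthread_create() is undefined behaviour, which is why the early return comes before the join.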
* tier/detach : During detach check if background fixlayout is done (Joseph Fernandes, 2016-05-06, 1 file, -1/+16)
  During detach, check whether the background fixlayout is done. If it is not done, ignore the case and continue the detach.

  Change-Id: I5d5cfc0e73d0eb217fdeab54c432dc4af8bc598d
  BUG: 1332136
  Signed-off-by: Joseph Fernandes <josferna@redhat.com>
  Reviewed-on: http://review.gluster.org/14147
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: N Balachandran <nbalacha@redhat.com>
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* dht: remember locked subvol and send unlock to the same (Mohammed Rafi KC, 2016-05-06, 5 files, -21/+218)
  During locking we send the lock request to the cached subvol, and normally we unlock on the cached subvol too. But with a parallel fresh lookup on a directory there is a race window in which the cached subvol can change, so the unlock can go to a different subvol from the one on which we took the lock. This results in a stale lock held on one of the subvols. So we now store the details of the subvol on which we took the lock and unlock on that same subvol.

  Change-Id: I47df99491671b10624eb37d1d17e40bacf0b15eb
  BUG: 1311002
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  Reviewed-on: http://review.gluster.org/13492
  Reviewed-by: N Balachandran <nbalacha@redhat.com>
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
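  The idea of remembering the locked subvol can be sketched as follows. This is an illustration only: `struct subvol`, `struct lock_ctx`, `ctx_lock` and `ctx_unlock` are hypothetical names and do not correspond to the DHT data structures touched by the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subvolume with a counter standing in for held locks. */
struct subvol {
    const char *name;
    int lock_count;
};

/* Hypothetical per-operation context. */
struct lock_ctx {
    struct subvol *cached; /* may be changed by a concurrent lookup */
    struct subvol *locked; /* where the lock was actually taken, or NULL */
};

/* Take the lock on the current cached subvol and remember which one it was. */
void ctx_lock(struct lock_ctx *ctx)
{
    assert(ctx != NULL && ctx->cached != NULL && ctx->locked == NULL);
    ctx->locked = ctx->cached;
    ctx->locked->lock_count++;
}

/* Release the lock on the remembered subvol, not on whatever is cached now. */
void ctx_unlock(struct lock_ctx *ctx)
{
    assert(ctx != NULL);
    if (ctx->locked == NULL)
        return; /* nothing was locked */
    ctx->locked->lock_count--;
    ctx->locked = NULL;
}
```

  If `ctx_unlock` used `ctx->cached` instead, a change of the cached subvol between lock and unlock would leave the first subvol's counter stuck at one, which is the stale lock the commit describes.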
* cluster/dht: Perform NULL check on xdata before dict_get() (Krutika Dhananjay, 2016-05-04, 1 file, -1/+1)
  .. to prevent unnecessary logs from gf_msg_callingfn()

  Change-Id: I367628fee2f6783ba9ed6f918deabd034df820c9
  BUG: 1333043
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
  Reviewed-on: http://review.gluster.org/14212
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/dht: Handle rmdir failure correctly (N Balachandran, 2016-05-02, 2 files, -13/+108)
  DHT did not handle rmdir failures on non-hashed subvols correctly in a 2x2 dist-rep volume, causing the directory to be deleted from the hashed subvol. Also fixed an issue where the dht_selfheal_restore errcodes were overwriting the rmdir error codes.

  Change-Id: If2c6f8dc8ee72e3e6a7e04a04c2108243faca468
  BUG: 1330032
  Signed-off-by: N Balachandran <nbalacha@redhat.com>
  Reviewed-on: http://review.gluster.org/14060
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht/rebalance: add lock migration fop to dht_migrate_file (Susant Palai, 2016-05-01, 2 files, -63/+112)
  Change-Id: Id0e7400c8ae950c90d42a3ddf8b558a14959a1f8
  BUG: 1326085
  Signed-off-by: Susant Palai <spalai@redhat.com>
  Reviewed-on: http://review.gluster.org/14074
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Niels de Vos <ndevos@redhat.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* cluster/dht: handle EREMOTE in dht lk/flush (Susant Palai, 2016-05-01, 2 files, -7/+98)
  With lock-migration, we need to send requests to the destination brick after migration. Once the source brick marks the lock structure as already migrated, the requests will be redirected to the destination brick by dht_lk2/flush2.

  Change-Id: I50b14011c5ab68c34826fb7ba7f8c8d42a68ad97
  BUG: 1326085
  Signed-off-by: Susant Palai <spalai@redhat.com>
  Reviewed-on: http://review.gluster.org/13493
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Niels de Vos <ndevos@redhat.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* glusterd: volume set changes for lock migration (Susant Palai, 2016-05-01, 2 files, -3/+27)
  Change-Id: I48c6f9cdda47503615ba65882acd5eedf0a70c89
  BUG: 1326085
  Signed-off-by: Susant Palai <spalai@redhat.com>
  Reviewed-on: http://review.gluster.org/14024
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Niels de Vos <ndevos@redhat.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* tier/migrator: Fetch the next query file for the next cycle (Joseph Fernandes, 2016-04-30, 2 files, -0/+25)
  Problem: When we spawn the promote and demote threads, query files are built, and only the query file with index 0 is picked as the first query file for migration. This may not be suitable for scenarios where the files in that query file are too big to move in the first cycle; as a result, files in the other query files are always missed. We need to shuffle so that the other query files also get a chance.

  Fix: Remember the previous first query file and shift it by one index before the migration starts.

  Change-Id: I704947bcf4bab6b20b1179a6d9ae4a15a3d51bd9
  BUG: 1330353
  Signed-off-by: Joseph Fernandes <josferna@redhat.com>
  Reviewed-on: http://review.gluster.org/14068
  Tested-by: Joseph Fernandes
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  Tested-by: Dan Lambright <dlambrig@redhat.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
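  The "shift by one index" fix can be pictured as a round-robin over the query files. The sketch below assumes the shift wraps around at the end of the list, which the commit message does not state explicitly; `next_first_query_file` is a hypothetical helper, not a function from the tier code.

```c
#include <assert.h>

/* Returns the index of the query file to start the next cycle with.
 * prev_first is the index used in the previous cycle, or -1 if no
 * cycle has run yet. Wrap-around is an assumption of this sketch. */
int next_first_query_file(int prev_first, int file_count)
{
    assert(file_count > 0);
    assert(prev_first >= -1 && prev_first < file_count);

    if (prev_first < 0)
        return 0; /* very first cycle starts at index 0 */

    return (prev_first + 1) % file_count;
}
```

  With three query files the starting index would go 0, 1, 2, 0, and so on, so no single file with oversized entries can starve the others.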
* dht/afr/client/posix: Fail mkdir without gfid-req (Pranith Kumar K, 2016-04-29, 1 file, -0/+8)
  Do not allow directory creation without gfids: after such directories are created, operations on them fail anyway, so it is better to fail the mkdir.

  BUG: 1317361
  Change-Id: I8f8e3b38bbded1960b7215bac0432500f7e78038
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: http://review.gluster.org/13690
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* libglusterfs: Add debug and trace logs for stack trace (Raghavendra Talur, 2016-04-27, 1 file, -1/+2)
  It has become very difficult to identify the xlator which returned a negative op_ret. Being able to just change the log level and visualize the stack is helpful in such cases.

  Change-Id: I6545b4802c1ab4d0d230d5e9e036afb2384882e1
  BUG: 1330052
  Signed-off-by: Raghavendra Talur <rtalur@redhat.com>
  Reviewed-on: http://review.gluster.org/13448
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* tier/dht: check for rebalance completion for EIO error (Mohammed Rafi KC, 2016-04-25, 1 file, -1/+3)
  When a rebalance completion check task has been triggered by dht, there is a possible race between afr marking the subvol as non-readable and dht updating the cached subvol. In this window a write can fail with EIO.

  Change-Id: I42638e6d4104c0dbe893d1bc73e1366188458c5d
  BUG: 1329503
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  Reviewed-on: http://review.gluster.org/14049
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: N Balachandran <nbalacha@redhat.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* dht/rebalance: Handle GF_DEFRAG_STOP (Susant Palai, 2016-04-25, 1 file, -0/+17)
  Problem: On a rebalance stop, the migrator threads do not notify the crawler thread to wake up when it is waiting on a signal from a migrator thread.

  Change-Id: I3cc4be41a4db25f48fee059ebb79a97ee99dcd00
  BUG: 1327507
  Signed-off-by: Susant Palai <spalai@redhat.com>
  Reviewed-on: http://review.gluster.org/14004
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: N Balachandran <nbalacha@redhat.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* dht: Add lease() fop (Poornima G, 2016-04-25, 3 files, -0/+47)
  Change-Id: I0bbc2c2ef115c78393f6570815a5b80316e7e4be
  BUG: 1319992
  Signed-off-by: Poornima G <pgurusid@redhat.com>
  Reviewed-on: http://review.gluster.org/11720
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
  Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/distribute: detect stale layouts in entry fops (Raghavendra G, 2016-04-22, 5 files, -27/+646)
    dht_mkdir ()
    {
        first-hashed-subvol = hashed-subvol for "bname" in in-memory layout of "parent";
        inodelk (SETLKW, parent, "LAYOUT_HEAL_DOMAIN", "can be any subvol, but we choose first-hashed-subvol randomly");
        {
    begin:
            hashed-subvol = hashed-subvol for "bname" in in-memory layout of "parent";
            hash-range = extract hash-range from layout of "parent";
            ret = mkdir (parent/bname, hashed-subvol, hash-range);
            if (ret == "hash-value doesn't fall into layout stored on the brick (this error is returned by posix-mkdir)") {
                refresh_parent_layout ();
                goto begin;
            }
        }
        inodelk (UNLCK, parent, "LAYOUT_HEAL_DOMAIN", "first-hashed-subvol");
        proceed with other parts of dht_mkdir;
    }

    posix_mkdir (parent/bname, client-hash-range)
    {
        disk-hash-range = getxattr (parent, "dht-layout-key");
        if (disk-hash-range != client-hash-range) {
            fail-with-error ("hash-value doesn't fall into layout stored on the brick");
            return 0;
        }
        continue-with-posix-mkdir;
    }

  Similar changes need to be done for dentry operations like create, symlink, link, unlink, rmdir, rename. These will be addressed in subsequent patches. This patch addresses only the mkdir codepath.

  This change breaks stripe tests, as on some striped subvols dht layout xattrs are not set for some reason. This results in failure of mkdir. Since striped volumes are always created with dht, some tests associated with stripe also fail.

  So, I am making the following test changes (since stripe is out of maintenance):
  * modify ./tests/basic/rpc-coverage.t to not use striped volumes
  * mark all (2) tests in tests/bugs/stripe/ as bad tests

  Change-Id: Idd1ae879f24a48303dc743c1bb4d91f89a629e25
  BUG: 1323040
  Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
  Reviewed-on: http://review.gluster.org/13885
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: N Balachandran <nbalacha@redhat.com>
* quota: setting 'read-only' option in xdata to instruct DHT to not heal (Sakshi Bansal, 2016-04-19, 1 file, -2/+10)
  When quota is enabled, the quota enforcer tries to get the size of the source directory by sending a nameless lookup to quotad. But if the rename is successful even on one subvol, or the source layout has anomalies, then this nameless lookup in quotad tries to heal the directory, which requires a lock on as many subvols as it can. But src is already locked as part of the rename. For the rename to proceed in the brick it needs to complete a cluster-wide lookup. But the cluster-wide lookup in quotad is blocked on locks held by the rename, hence a deadlock. To avoid this, quota sends an option in xdata which instructs DHT not to heal.

  Change-Id: I792f9322331def0b1f4e16e88deef55d0c9f17f0
  BUG: 1252244
  Signed-off-by: Sakshi Bansal <sabansal@redhat.com>
  Reviewed-on: http://review.gluster.org/13988
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht: add "nuke" functionality for efficient server-side deletion (Jeff Darcy, 2016-04-07, 1 file, -0/+45)
  This turns a special xattr into an rmdir with flags set. When that hits the posix translator on the server side, that causes the file/directory to be moved into the special "landfill" directory. From there, the posix janitor thread will take care of deleting it entirely on the server side - traversing it recursively if necessary. A couple of secondary issues were fixed to make this effective.

  * FUSE now ensures that setxattr values are NUL terminated.
  * The janitor thread now gets woken up immediately when something is placed in 'landfill' instead of only when file descriptors need to be closed.
  * The default landfill-emptying interval was reduced to 10s.

  To use the feature, issue a setxattr something like this:

    setfattr -n glusterfs.dht.nuke -v "" /mnt/glusterfs/vol/some_dir

  The value doesn't actually matter; the mere receipt of a request with this key is sufficient. Some day it might be useful to allow setting a required value as a sort of password, so that only those who know it can access the underlying special functionality.

  Change-Id: I8a343c2cdb40a76d5a06c707191fb67babb8514f
  Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-on: http://review.gluster.org/13878
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht: lock on subvols to prevent rename and lookup selfheal race (Sakshi, 2016-04-06, 1 file, -43/+158)
  This patch addresses two races while renaming directories:
  1) While renaming src to dst, if a lookup selfheal is triggered it can recreate src on those subvols where the rename was successful. This leads to multiple directories (src and dst) having the same gfid. To avoid this we must take locks on all subvols with src.
  2) While renaming, if dst exists and a lookup selfheal is triggered, it will find anomalies in the dst layout and try to heal the stale layout. To avoid this we must take a lock on any one subvol with dst.

  Change-Id: I637f637d3241d9065cd5be59a671c7e7ca3eed53
  BUG: 1252244
  Signed-off-by: Sakshi <sabansal@redhat.com>
  Reviewed-on: http://review.gluster.org/11880
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* dht: lock on subvols to prevent lookup vs rmdir race (Sakshi, 2016-04-05, 5 files, -89/+435)
  There is a possibility that, while an rmdir has completed on some non-hashed subvols and is proceeding to others, a lookup selfheal recreates the same directory on those subvols for which the rmdir had succeeded. The deletion of the parent directory will then fail with ENOTEMPTY. To fix this, take a blocking inodelk on the subvols before starting rmdir. Selfheal must also take a blocking inodelk before creating the entry.

  Change-Id: I168a195c35ac1230ba7124d3b0ca157755b3df96
  BUG: 1245065
  Signed-off-by: Sakshi <sabansal@redhat.com>
  Reviewed-on: http://review.gluster.org/13528
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Tested-by: Raghavendra G <rgowdapp@redhat.com>
* tier/dht : Attach tier fix layout to run in background (Joseph Fernandes, 2016-03-28, 2 files, -25/+247)
  1. Spawn a thread for background fix-layout for the tier process.
  2. Once the fix-layout is completed, a marker xattr is set on the root of the volume to mark the completion of the background fixlayout, so that even if the tier process is spawned again, fixlayout will not be issued if it was completed last time.
  3. Please note that promotion of legacy files will happen eventually, as the ctr lookup heal in the fixlayout slowly heals the ctr db for legacy files OR the ctr lookup heal happens due to a name lookup.
  4. When a detach tier succeeds in evacuating data from the hot tier, the marker xattr is removed, so that the next attach tier runs the background tier fixlayout.

  What is remaining?
  1. Instead of clearing the marker xattr of tiering fix layout at the end of detach start, clear it during detach commit. But the issue is that detach commit is a glusterd operation and the volume is not mounted in glusterd. The reason we want to do it in detach commit is that if the admin wants to attach the same tier again, then a background fixlayout will be triggered, which would not be needed.
  2. Clearing the CTR DB of the cold bricks when there is a detach commit, as it will have entries which will be stale when the volume is used with ctr off (ctr is switched off only when we have detach commit).

  Change-Id: Ibe343572e95865325cd0eef4d0b976b626a3c0c5
  BUG: 1313228
  Signed-off-by: Joseph Fernandes <josferna@redhat.com>
  Reviewed-on: http://review.gluster.org/13491
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: Joseph Fernandes
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* dht: update attr information in refresh layout to avoid stale timestamp (Sakshi Bansal, 2016-03-23, 1 file, -1/+8)
  Consider the scenario where an mkdir has just created the directory but has not healed it yet. A parallel lookup on this entry will find anomalies and trigger a selfheal which will sample the ctime of the directory after the mkdir phase. Meanwhile the mkdir has completed setting the layout and updated the ctime. The selfheal then sees the layout to be healed and returns with the ctime it got after the mkdir phase which has now become stale. However if the lookup happens to unwind before the mkdir then the inode associated with lookup will get linked in the inode table which has the stale ctime. To avoid this selfheal must do an iatt_merge in refresh layout to get the latest timestamp irrespective of whether it needs to heal the layout or not.

  Change-Id: I3634c3978bcc1710705f44b48f3876601682d33e
  BUG: 1302948
  Signed-off-by: Sakshi Bansal <sabansal@redhat.com>
  Reviewed-on: http://review.gluster.org/13781
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* dht: report constant directory size (Jeff Darcy, 2016-03-20, 4 files, -1/+81)
  Directory size is meaningless. Every filesystem has its own unpredictable way of increasing or decreasing it, based on internal data structures and even transient conditions. Some filesystems (e.g. ext4) never decrease it at all. Others (e.g. btrfs) don't even report it. Very few programs look at it, and those that do are broken. Unfortunately, one such program is GNU tar, which will complain when it sees different values because at different times we got the value from different DHT subvolumes. To avoid such problems, just report a constant value.

  Change-Id: Id64ce917c75b5f7ff50cb55b6e997f3b3556e7e3
  BUG: 1302948
  Original-author: Shyam <srangana@redhat.com>
  Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
  Signed-off-by: N Balachandran <nbalacha@redhat.com>
  Reviewed-on: http://review.gluster.org/13770
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/tier: add tunable to migrate files based on size (Dan Lambright, 2016-03-16, 3 files, -0/+29)
  This fix adds a parameter "tier-max_promote_size" to control whether a file is migrated or not based on its size. By default the value is 0, meaning all files are migrated. If set to a non-zero value, files larger than the parameter won't be moved in tiered volumes.

  Change-Id: Ia6b88e9b2508935bef500d956f9192e59670fe00
  BUG: 1313495
  Signed-off-by: Dan Lambright <dlambrig@redhat.com>
  Reviewed-on: http://review.gluster.org/13570
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Joseph Fernandes
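  The size rule stated in this commit (0 means no limit, otherwise files larger than the limit are not moved) might be expressed as the predicate below. `tier_size_allows_migration` is a hypothetical name for illustration, and treating a file exactly equal to the limit as eligible follows from the wording "larger than" but is otherwise an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a file of file_size bytes may be migrated given the
 * configured maximum size. A maximum of 0 disables the check. */
bool tier_size_allows_migration(uint64_t file_size, uint64_t max_size)
{
    if (max_size == 0)
        return true; /* default: all files are migrated */

    return file_size <= max_size; /* only strictly larger files are skipped */
}
```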
* TIER: stopping the tierd when the volume goes down (hari gowtham, 2016-03-14, 1 file, -0/+9)
  If there is a large number of files to be migrated and the volume goes down in the meantime, the tierd has to be stopped. But on a huge query file list it keeps checking each file before stopping. If the volume comes up before the old tierd dies, then a new one won't be created because the old tierd is still present. After the old one completes its task, it dies and the status ends up as failed.

  This patch checks whether the status is still running and, if so, lets the tierd continue its work. Otherwise it stops the tierd.

  Change-Id: I6522a4e2919e84bf502b99b13873795b9274f3cd
  BUG: 1315659
  Signed-off-by: hari gowtham <hgowtham@redhat.com>
  Reviewed-on: http://review.gluster.org/13646
  Tested-by: Dan Lambright <dlambrig@redhat.com>
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* Tier: Avoiding stale entries from causing demotion to stop (hari gowtham, 2016-03-13, 2 files, -3/+36)
  When the parent GFID is a stale entry, the lookup on this parent fails, and this in turn fails the demotion process. This patch makes the stale entry error be skipped.

  Situation in which pargfid becomes stale: consider a folder from a tar file. Once the tar file is untarred, the files from it start to demote. If the actual folder is deleted while the demotion is in progress, the files under it which are undergoing demotion do a lookup on the deleted parent, which is now a stale entry. This stale entry fails the lookup, and that fails the demotion of the other files (not from the tar file) that are supposed to be demoted.

  Change-Id: I3d47c32c4077526d477a25912b0135bab98b23fc
  BUG: 1311178
  Signed-off-by: hari gowtham <hgowtham@redhat.com>
  Reviewed-on: http://review.gluster.org/13501
  Tested-by: hari gowtham <hari.gowtham005@gmail.com>
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* tier: Fix unused-but-set-variable warning (Raghavendra Talur, 2016-03-03, 1 file, -2/+0)
  Change-Id: Ie5eaf1075b1c9c29dd7d85bf7b61b22e1fbce422
  BUG: 1314291
  Signed-off-by: Raghavendra Talur <rtalur@redhat.com>
  Reviewed-on: http://review.gluster.org/13591
  Reviewed-by: hari gowtham <hari.gowtham005@gmail.com>
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/tier: make promotion and demotion independent (Dan Lambright, 2016-02-28, 2 files, -126/+127)
  Currently a main loop in tiering spawns promotion and demotion threads, and does a join to wait for them to complete. When one of the two threads takes a long time, the main thread waits for it before exiting the join. It may wait so long that the scheduled time for the other thread is skipped. In the case of demotion, it may be a long time before another attempt.

  This patch fixes that by making the promotion and demotion activities independent. A side effect of this change is that the logic is significantly simplified.

  Change-Id: I1196bd4bbfc95e8aa326a9bd4ebf395032369d1c
  BUG: 1306852
  Signed-off-by: Dan Lambright <dlambrig@redhat.com>
  Reviewed-on: http://review.gluster.org/13433
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Joseph Fernandes
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* dht: mkdir must unwind with latest ctime (Sakshi Bansal, 2016-02-26, 2 files, -0/+24)
  Currently fops like mkdir use the ctime they get after creating the directory entry. But setting the layout also updates the ctime of a directory. Hence DHT must get the ctime after the setxattr call and unwind with the latest ctime, to avoid a mismatch in the time seen by applications like tar.

  Change-Id: Iecbbe3aac5244af5da9788b48ccf299ca56b4bae
  BUG: 1302948
  Signed-off-by: Sakshi Bansal <sabansal@redhat.com>
  Reviewed-on: http://review.gluster.org/13352
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht: cleanup dict and free memory in rename code path (Sakshi Bansal, 2016-02-24, 1 file, -0/+4)
  Change-Id: I2458e18197bdf7565563a85e9021b5b2850c1825
  BUG: 1303945
  Signed-off-by: Sakshi Bansal <sabansal@redhat.com>
  Reviewed-on: http://review.gluster.org/13392
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht: file rename must take blocking inode locks (Sakshi Bansal, 2016-02-21, 1 file, -5/+5)
  Currently DHT takes non-blocking locks for file rename. Because of this, during parallel renames some clients fail with EBUSY or ESTALE errors. Hence, to avoid application discontinuity, file rename must take blocking inode locks.

  Change-Id: I986e9d08b3be359f20b1a3e1564e049b0f3dffd3
  BUG: 1304966
  Signed-off-by: Sakshi Bansal <sabansal@redhat.com>
  Reviewed-on: http://review.gluster.org/13366
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* all: fixes for clang compile warnings (Kaleb S KEITHLEY, 2016-02-15, 2 files, -2/+4)
  cli/src/cli-cmd-parser.c (chenk)
  cli/src/cli-xml-output.c (spandit)
  cli/src/cli.c (chenk)
  libglusterfs/src/common-utils.c (vmallika)
  libglusterfs/src/gfdb/gfdb_sqlite3.c (jfernand +1)
  rpc/rpc-transport/socket/src/socket.c (?)
  xlators/cluster/afr/src/afr-transaction.c (?)
  xlators/cluster/dht/src/dht-common.h (srangana +2)
  xlators/cluster/dht/src/dht-selfheal.c (srangana +2)
  xlators/debug/io-stats/src/io-stats.c (R. Wareing)
  xlators/features/barrier/src/barrier.c (vshastry)
  xlators/features/bit-rot/src/bitd/bit-rot-scrub.h (vshankar +1)
  xlators/features/shard/src/shard.c (kdhananj +1)
  xlators/mgmt/glusterd/src/glusterd-ganesha.c (skoduri)
  xlators/mgmt/glusterd/src/glusterd-handler.c (atinmu)
  xlators/mgmt/glusterd/src/glusterd-op-sm.h (atinmu)
  xlators/mgmt/glusterd/src/glusterd-snapshot.c (spandit)
  xlators/mgmt/glusterd/src/glusterd-syncop.c (atinmu)
  xlators/mgmt/glusterd/src/glusterd-volgen.c (atinmu)
  xlators/protocol/client/src/client-messages.h (mselvaga +1)
  xlators/storage/bd/src/bd-helper.c (M. Mohan Kumar)
  xlators/storage/bd/src/bd.c (M. Mohan Kumar)
  xlators/storage/posix/src/posix.c (nbalacha +1)

  Change-Id: I85934fbcaf485932136ef3acd206f6ebecde61dd
  BUG: 1293133
  Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com>
  Reviewed-on: http://review.gluster.org/13031
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* all: fix various cppcheck warningsKaleb S KEITHLEY2016-02-151-3/+1
| | | | | | | | | | | | | | | fixes for various warnings reported by cppcheck N.B. cppcheck output is in the bugzilla Change-Id: I33acec127bc4536935fdd8d52a0c490ec54d50b2 BUG: 1292954 Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/13006 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* cluster/dht: Skip subvols if no layout presentN Balachandran2016-02-141-0/+8
| | | | | | | | | | | | | | | | | | Running "rm -rf" on a tiered volume sometimes caused the client to crash because dht_readdirp_cbk referenced a NULL layout for the hot tier subvol. Now, entries are skipped if the layout is NULL. This can cause "rm -rf" to fail with ENOTEMPTY rmdir failures. Change-Id: Idd71a9d0f7ee712899cc7113bbf2cd3dcb25808b BUG: 1307208 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/13440 Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
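The NULL-layout guard described in this commit can be sketched roughly as below. This is a simplified illustration, not the actual dht_readdirp_cbk code: `struct layout`, `struct entry` and `count_usable_entries` are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for DHT's layout and directory-entry types. */
struct layout {
    int subvol_count;
};

struct entry {
    const char *name;
    const struct layout *layout; /* may be NULL, as in the crash described */
};

/* Count the entries that can be processed safely, skipping any entry
 * whose layout is NULL instead of dereferencing it. */
size_t count_usable_entries(const struct entry *entries, size_t n)
{
    size_t usable = 0;

    assert(entries != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (entries[i].layout == NULL)
            continue; /* skip rather than crash on a NULL layout */
        if (entries[i].layout->subvol_count > 0)
            usable++;
    }
    return usable;
}
```

Skipped entries are simply not reported to the caller, which matches the note above that a later rmdir may then fail with ENOTEMPTY.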
* cluster/tier : Fixed wrong variable comparisonN Balachandran2016-02-101-3/+15
| | | | | | | | | | | | | | The wrong variable was being checked to determine the watermark value. Change-Id: If4c97fa70b772187f1fcbdf5193e077cb356a8b1 BUG: 1303895 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/13357 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* dht/quota : heal the limit_objects_key xattr needed for inode-quotaManikandan Selvaganesh2016-02-101-0/+11
| | | | | | | | | | | | | | | | | | Whenever a new brick is added, quota-related xattrs should be healed, but currently the xattr "quota.limit-objects.<suffix>" needed for inode-quota is not being healed. This patch fixes the issue. Change-Id: I1e7b229126f7b058642bbc3fb5c109bfd8925325 BUG: 1302257 Signed-off-by: Manikandan Selvaganesh <mselvaga@redhat.com> Reviewed-on: http://review.gluster.org/13299 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijaikumar Mallikarjuna <vmallika@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/tier: Create linkfiles to hardlinks correctlyN Balachandran2016-02-073-1/+132
| | | | | | | | | | | | | | There is a bug in the way hardlinks are handled in tiered volumes. Ideally, when files on the hot tier are hardlinks of each other, their tier linkto files on the cold tier should themselves be hardlinks of each other. As they are not, they end up being files with the same gfid but different names for the cold tier dht, and end up overwriting the cached-subvol information stored in the dht inode-ctx. Change-Id: Ic658a316836e6a1729cfea848b7d212674b0edd2 BUG: 1305277 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/13391 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* cluster/dht: Cleanup dict in dht_do_rename()Vijay Bellur2016-02-041-0/+2
| | | | | | | | | | | | Change-Id: Ib4b3a843e78eccf5b8e0e7776cd0128013a59a3e BUG: 1303945 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/13322 CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* tier/gfdb : Round-Robin read of query filesJoseph Fernandes2016-02-033-81/+326
| | | | | | | | | | | | | | | | | | | 1. Each brick on a host will get a separate query file. 2. While reading query records from these query files we read them in a Round-Robin manner. 3. When an error occurs during migration we rename the query file with a timestamp and .err extension for better debugging. Change-Id: I27c4285d24fd695d2d5cbd9fd7db3879d277ecc8 BUG: 1302772 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/13293 Smoke: Gluster Build System <jenkins@build.gluster.com> Tested-by: N Balachandran <nbalacha@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Dan Lambright <dlambrig@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
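The round-robin read in point 2 of this commit might look something like the sketch below. It uses in-memory arrays in place of the per-brick query files, and `struct query_source` and `next_record` are hypothetical names, not the tier/gfdb API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory stand-in for one per-brick query file. */
struct query_source {
    const char **records; /* records in this "file" */
    size_t count;         /* number of records */
    size_t next;          /* index of the next unread record */
};

/* Return the next record in round-robin order across all sources, or
 * NULL once every source is exhausted. *cursor remembers which source
 * to try first on the next call. */
const char *next_record(struct query_source *src, size_t nsrc, size_t *cursor)
{
    assert(src != NULL && cursor != NULL);

    for (size_t tried = 0; tried < nsrc; tried++) {
        struct query_source *s = &src[(*cursor + tried) % nsrc];
        if (s->next < s->count) {
            /* Start from the following source on the next call. */
            *cursor = (*cursor + tried + 1) % nsrc;
            return s->records[s->next++];
        }
    }
    return NULL; /* all sources exhausted */
}
```

Taking one record per source in turn keeps a brick with a very long query file from starving the others; exhausted sources are skipped.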
* cluster/tier : Reset watermarks in tierN Balachandran2016-02-031-9/+36
| | | | | | | | | | | | | | | | A node which contains only cold bricks and has detected that the high watermark value has been breached on the hot tier will never reset the watermark to the correct value. The promotion check will thus always fail and no promotions will occur from that node. Change-Id: I0f0804744cd184c263acbea1ee50cd6010a49ec5 BUG: 1303895 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/13341 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* cluster/tier: break out of iterating query file once cycle time endsDan Lambright2016-02-031-0/+27
| | | | | | | | | | | | | | | | When iterating the query file during migration, tiering should break out of the loop once cycle time completes. Otherwise it may stay in the loop for a long time. If that happens, updates to files will become stale and have no impact on migration. Change-Id: Ib60cf74bc84e8646e6a0da21ff04954b1b83c414 BUG: 1301227 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/13284 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
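The early exit described in this commit can be illustrated with a simulated clock, as in the sketch below. `migrate_within_cycle` and its parameters are hypothetical; real code would presumably compare wall-clock time against the cycle deadline rather than summing per-record costs.

```c
#include <assert.h>
#include <stddef.h>

/* Simulate iterating n query records, where cost_secs[i] is the time
 * record i takes to handle. Stop as soon as the cycle time has been
 * used up. Returns the number of records handled in this cycle. */
size_t migrate_within_cycle(const unsigned *cost_secs, size_t n,
                            unsigned cycle_secs)
{
    unsigned elapsed = 0;
    size_t done = 0;

    assert(cost_secs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (elapsed >= cycle_secs)
            break; /* cycle is over: leave the rest for the next cycle */
        elapsed += cost_secs[i];
        done++;
    }
    return done;
}
```

The check sits at the top of the loop, so a record that is already in progress is finished, but no new record is started after the cycle ends.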
* tier/dht : Default value for demote-freq, max files and mbJoseph Fernandes2016-01-262-5/+6
| | | | | | | | | | | | | | | | | | | Default value for tier-demote-frequency is 3600 sec to avoid frequent demotions. Default value for tier-max-mb is 4000 MB. Default value for tier-max-files is 10000 files. Change-Id: Ie60951c478a7462c425059699ab82511aa13fa0a BUG: 1300412 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/13270 Smoke: Gluster Build System <jenkins@build.gluster.com> Tested-by: Dan Lambright <dlambrig@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* cluster/tier: Ignore quota-deem-statfs for watermark calculationN Balachandran2016-01-251-0/+21
| | | | | | | | | | | | | | | | | | The tier process watermark calculations were incorrect when the quota-deem-statfs option was enabled. We now ignore this while calculating hot tier usage to determine watermark levels. Change-Id: I308bc24432e2fa5ad1d5703e80fc391433538bbb BUG: 1301473 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/13288 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: mohammed rafi kc <rkavunga@redhat.com> Tested-by: mohammed rafi kc <rkavunga@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Dan Lambright <dlambrig@redhat.com>
* cluster/dht : Rebalance process crashes due to double fd_unrefN Balachandran2016-01-101-6/+11
| | | | | | | | | | | | | The dst_fd was being unrefed twice in case the call to __dht_rebalance_create_dst_file failed. Change-Id: I56c5aff7fa3827887e67936b8aa1ecbd1a08a9e9 BUG: 1296611 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/13193 Reviewed-by: Susant Palai <spalai@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
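One common way to guard against this kind of double unref is to clear the pointer as soon as the reference is dropped. The sketch below shows that pattern only; it is not necessarily how this patch fixed the bug, and `struct handle`, `handle_unref` and `handle_release` are hypothetical stand-ins for fd_t and fd_unref.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference-counted handle standing in for an fd_t. */
struct handle {
    int refs;
};

/* Drop one reference. Dropping more references than were taken is a
 * bug, which the assert catches. */
void handle_unref(struct handle *h)
{
    assert(h != NULL && h->refs > 0);
    h->refs--;
}

/* Release the reference held through *hp exactly once, then clear the
 * pointer so a later cleanup path sees NULL and does nothing. */
void handle_release(struct handle **hp)
{
    if (hp == NULL || *hp == NULL)
        return;
    handle_unref(*hp);
    *hp = NULL;
}
```

With this shape, both the failure path and the common cleanup path can call `handle_release` without unrefing twice.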
* cluster/tier: allow db queries to be interruptableDan Lambright2016-01-073-30/+110
| | | | | | | | | | | | | | | | | | | | A query to the database may take a long time if the database has many entries. The tier daemon also sends IPC calls to the bricks which can run slowly, especially in RHEL6. While it is possible to track down each such instance, the snapshot feature should not be affected by database operations. It requires no migration be underway. Therefore it is okay to pause tiering at any time except when DHT is moving a file. This fix implements this strategy by monitoring when control passes to DHT to migrate a file using the GF_XATTR_FILE_MIGRATE_KEY trigger. If it is not, the pause operation is successful. Change-Id: I21f168b1bd424077ad5f38cf82f794060a1fabf6 BUG: 1287842 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/13104 Reviewed-by: Joseph Fernandes Tested-by: Gluster Build System <jenkins@build.gluster.com>
* dht: misleading indentation, gcc-6Kaleb S KEITHLEY2016-01-051-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | gcc-6 now has -Wmisleading-indentation as part of -Wall. compiling with gcc-6 gives this warning. ... dht-diskusage.c: In function ‘dht_subvol_has_err’: dht-diskusage.c:361:33: warning: statement is indented as if it were guarded by... [-Wmisleading-indentation] goto out; ^~~~ dht-diskusage.c:358:25: note: ...this ‘if’ clause, but it is not if (conf->decommissioned_bricks[i] && ^~ ... Inspection of the source shows that without braces the loop is terminated prematurely. Change-Id: Ica48a8c59ee5d0a206797827d7920259d33b47ec BUG: 1295784 Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/13176 Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
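The class of bug that -Wmisleading-indentation flags here can be sketched as follows. This is a simplified, hypothetical reconstruction of the pattern, not the actual dht_subvol_has_err source; `is_decommissioned` and its parameters are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Buggy shape (shown as a comment so this file compiles cleanly):
 *
 *     for (i = 0; i < n; i++) {
 *         if (decommissioned[i] == target)
 *             ret = -1;
 *             goto out;    <- not guarded by the if: runs on the first pass
 *     }
 *
 * Fixed shape: braces make both statements part of the if body, so the
 * loop only ends early when a match is found. */
int is_decommissioned(const int *decommissioned, size_t n, int target)
{
    int ret = 0;

    assert(decommissioned != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (decommissioned[i] == target) {
            ret = -1;
            goto out;
        }
    }
out:
    return ret;
}
```

In the buggy shape the loop inspects only the first element, so a match at any later index is never seen.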
* tier/create: store TIER_LINKFILE_GFID in xattr dictionaryMohammed Rafi KC2016-01-051-17/+68
| | | | | | | | | | | | | | | | | | | In tier_create, a new key TIER_LINKFILE_GFID was introduced to avoid a race in stale linkfile deletion. Store this key in the xattr dictionary instead of the local->params dictionary: local->params was also used to create the file before stale linkfile deletion, which caused posix_create to fail when it tried to set the added key as an extended attribute. Change-Id: I24fecb62b47bee65a1e86103925a67d13304c5df BUG: 1290677 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/13130 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* cluster/tier: check watermark during migrationDan Lambright2016-01-051-46/+57
| | | | | | | | | | | | | | | Currently we check the watermarks only before a cycle begins. We should also check the hot tier's fullness against the watermarks during the migration so the watermark is not exceeded as files are promoted. Change-Id: I2ff87a1c308d64fbdca14bbdf55f3ec3007290ae BUG: 1293932 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/13103 Reviewed-by: Joseph Fernandes Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: N Balachandran <nbalacha@redhat.com>