summaryrefslogtreecommitdiffstats
path: root/xlators/cluster/dht/src/dht-common.c
Commit message (Collapse)AuthorAgeFilesLines
* dht/tiering: fix unused variable warnings/errorsKaleb S. KEITHLEY2016-08-301-1/+0
| | | | | | | | | | | | | | | | | | | http://review.gluster.org/14085 fixes a/the "leak" - via the generated rpc/xdr headers - of pragmas that mask these warnings. However 14085 won't pass the smoke test until all the warnings are fixed. Change-Id: I367a737570dd7d2f6cc25f4bf4299d31bb6826aa BUG: 1369124 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/15242 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Prashanth Pai <ppai@redhat.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* dht: Implement ipc fopPoornima G2016-08-271-0/+84
| | | | | | | | | | | | | | | | | | ipc is used by md-cache to communicate the list of xattrs that it is caching, to the upcall xlator. Hence implement this in dht, such that it winds to all the bricks if the ipc op is GF_IPC_MDC_TARGET_UPCALL. The ips should not fail if any of the bricks is down, as md-cache will replay the ipc late when the brick comes back up. Change-Id: Ica551a550c04cbb1240c0d211fe831c2e5eb6017 BUG: 1211863 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15225 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/tier: fix statfs for dht/tiered volumesDan Lambright2016-07-271-20/+27
| | | | | | | | | | | | | | | | | | | | | | | | | Return the correct size of the tiered volume in statfs. It should be the size of the cold tier, not the sum of the hot and cold tier, because the hot tier is a cache and not an extension of the volume's capacity. The number of free blocks, etc is the cold tier's capacity subtracted by the sum of utilization on the hot and cold tiers. Note if both tiers are part of the same file system this must be accounted for as well. The patch also fixes a pre-existing bug in the DHT/tier translators. If statfs was taken on a file, the code only calculated free space on the cached subvolume, not all subvolumes in the replica group. With the fix, this is corrected, except in the case where quota is used with the deem-statfs option set to "on". Change-Id: I2b8bcb4511edf83f12130960aad0a609fcf8f513 BUG: 1339689 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/14536 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/distribute: heal layout in discover codepath tooRaghavendra G2016-05-161-33/+7
| | | | | | | | | | | BUG: 1334164 Change-Id: I4259d88f2b6e4f9d4ad689bc4e438f1db9cfd177 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/14365 Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* cluster/distribute: use a linked inode in directory heal codepathRaghavendra G2016-05-161-1/+1
| | | | | | | | | | | | | | | | | | | | This is needed for following reasons: * healing is done in lookup and mkdir codepath where inode is not linked _yet_ as normally linking is done in interface layers (fuse-bridge, gfapi, nfsv3 etc). * healing consists of non-lookup fops like inodelk, setattr, setxattr etc. All non-lookup fops expect a linked inode. Change-Id: I1bd8157abbae58431b7f6f6fffee0abfe5225342 BUG: 1334164 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/14295 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Susant Palai <spalai@redhat.com> Reviewed-by: mohammed rafi kc <rkavunga@redhat.com>
* dht:remember locked subvol and send unlock to the sameMohammed Rafi KC2016-05-061-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | During locking we send lock request to cached subvol, and normally we unlock to the cached subvol But with parallel fresh lookup on a directory, there is a race window where the cached subvol can change and the unlock can go into a different subvol from which we took lock. This will result in a stale lock held on one of the subvol. So we will store the details of subvol which we took the lock and will unlock from the same subvol Change-Id: I47df99491671b10624eb37d1d17e40bacf0b15eb BUG: 1311002 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/13492 Reviewed-by: N Balachandran <nbalacha@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* cluster/dht: Perform NULL check on xdata before dict_get()Krutika Dhananjay2016-05-041-1/+1
| | | | | | | | | | | | | | .. to prevent unnecessary logs from gf_msg_callingfn() Change-Id: I367628fee2f6783ba9ed6f918deabd034df820c9 BUG: 1333043 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/14212 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/dht: Handle rmdir failure correctlyN Balachandran2016-05-021-12/+98
| | | | | | | | | | | | | | | | | DHT did not handle rmdir failures on non-hashed subvols correctly in a 2x2 dist-rep volume, causing the directory do be deleted from the hashed subvol. Also fixed an issue where the dht_selfheal_restore errcodes were overwriting the rmdir error codes. Change-Id: If2c6f8dc8ee72e3e6a7e04a04c2108243faca468 BUG: 1330032 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/14060 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht/afr/client/posix: Fail mkdir without gfid-reqPranith Kumar K2016-04-291-0/+8
| | | | | | | | | | | | | | | Do not allow directory creations without gfids as after the directories are created, operations on them fail anyway. So it is better to fail mkdir. BUG: 1317361 Change-Id: I8f8e3b38bbded1960b7215bac0432500f7e78038 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/13690 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* libglusterfs: Add debug and trace logs for stack traceRaghavendra Talur2016-04-271-1/+2
| | | | | | | | | | | | | | | | | It has become very difficult to identify the xlator which returned negative op_ret. Being able to just change the log level and visualize the stack is helpful in such cases. Change-Id: I6545b4802c1ab4d0d230d5e9e036afb2384882e1 BUG: 1330052 Signed-off-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-on: http://review.gluster.org/13448 CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/distribute: detect stale layouts in entry fopsRaghavendra G2016-04-221-24/+609
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dht_mkdir () { first-hashed-subvol = hashed-subvol for "bname" in in-memory layout of "parent"; inodelk (SETLKW, parent, "LAYOUT_HEAL_DOMAIN", "can be any subvol, but we choose first-hashed-subvol randomly"); { begin: hashed-subvol = hashed-subvol for "bname" in in-memory layout of "parent"; hash-range = extract hashe-range from layout of "parent"; ret = mkdir (parent/bname, hashed-subvol, hash-range); if (ret == "hash-value doesn't fall into layout stored on the brick (this error is returned by posix-mkdir)") { refresh_parent_layout (); goto begin; } } inodelk (UNLCK, parent, "LAYOUT_HEAL_DOMAIN", "first-hashed-subvol"); proceed with other parts of dht_mkdir; } posix_mkdir (parent/bname, client-hash-range) { disk-hash-range = getxattr (parent, "dht-layout-key"); if (disk-hash-range != client-hash-range) { fail-with-error ("hash-value doesn't fall into layout stored on the brick"); return 0; } continue-with-posix-mkdir; } Similar changes need to be done for dentry operations like create, symlink, link, unlink, rmdir, rename. These will be addressed in subsequent patches. This patch addresses only mkdir codepath. This change breaks stripe tests, as on some striped subvols dht layout xattrs are not set for some reason. This results in failure of mkdir. Since striped volumes are always created with dht, some tests associated with stripe also fail. So, I am making following tests changes (since stripe is out of maintainance): * modify ./tests/basic/rpc-coverage.t to not to use striped volumes * mark all (2) tests in tests/bugs/stripe/ as bad tests Change-Id: Idd1ae879f24a48303dc743c1bb4d91f89a629e25 BUG: 1323040 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/13885 Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com>
* quota: setting 'read-only' option in xdata to instruct DHT to not healSakshi Bansal2016-04-191-2/+10
| | | | | | | | | | | | | | | | | | | | | | When quota is enabled the quota enforcer tries to get the size of the source directory by sending nameless lookup to quotad. But if the rename is successful even on one subvol or the source layout has anomalies then this nameless lookup in quotad tries to heal the directory which requires a lock on as many subvols as it can. But src is already locked as part of rename. For rename to proceed in brick it needs to complete a cluster-wide lookup. But cluster-wide lookup in quotad is blocked on locks held by rename, hence a deadlock. To avoid this quota sends an option in xdata which instructs DHT not to heal. Change-Id: I792f9322331def0b1f4e16e88deef55d0c9f17f0 BUG: 1252244 Signed-off-by: Sakshi Bansal <sabansal@redhat.com> Reviewed-on: http://review.gluster.org/13988 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht: add "nuke" functionality for efficient server-side deletionJeff Darcy2016-04-071-0/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This turns a special xattr into an rmdir with flags set. When that hits the posix translator on the server side, that causes the file/directory to be moved into the special "landfill" directory. From there, the posix janitor thread will take care of deleting it entirely on the server side - traversing it recursively if necessary. A couple of secondary issues were fixed to make this effective. * FUSE now ensures that setxattr values are NUL terminated. * The janitor thread now gets woken up immediately when something is placed in 'landfill' instead of only when file descriptors need to be closed. * The default landfill-emptying interval was reduced to 10s. To use the feature, issue a setxattr something like this: setfattr -n glusterfs.dht.nuke -v "" /mnt/glusterfs/vol/some_dir The value doesn't actually matter; the mere receipt of a request with this key is sufficient. Some day it might be useful to allow setting a required value as a sort of password, so that only those who know it can access the underlying special functionality. Change-Id: I8a343c2cdb40a76d5a06c707191fb67babb8514f Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/13878 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht: lock on subvols to prevent lookup vs rmdir raceSakshi2016-04-051-26/+166
| | | | | | | | | | | | | | | | | | | | | | There is a possibility that while an rmdir is completed on some non-hashed subvol and proceeding to others, a lookup selfheal can recreate the same directory on those subvols for which the rmdir had succeeded. Now the deletion of the parent directory will fail with an ENOTEMPTY. To fix this take blocking inodelk on the subvols before starting rmdir. Selfheal must also take blocking inodelk before creating the entry. Change-Id: I168a195c35ac1230ba7124d3b0ca157755b3df96 BUG: 1245065 Signed-off-by: Sakshi <sabansal@redhat.com> Reviewed-on: http://review.gluster.org/13528 CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* dht: report constant directory sizeJeff Darcy2016-03-201-1/+62
| | | | | | | | | | | | | | | | | | | | | | | | Directory size is meaningless. Every filesystem has its own unpredictable way of increasing or decreasing it, based on internal data structures and even transient conditions. Some filesystems (e.g. ext4) never decrease it at all. Others (e.g. btrfs) don't even report it. Very few programs look at it, and those that do are broken. Unfortunately, one such program is GNU tar, which will complain when it sees different values because at different times we got the value from different DHT subvolumes. To avoid such problems, just report a constant value. Change-Id: Id64ce917c75b5f7ff50cb55b6e997f3b3556e7e3 BUG: 1302948 Original-author: Shyam <srangana@redhat.com> Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/13770 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* dht: mkdir must unwind with latest ctimeSakshi Bansal2016-02-261-0/+6
| | | | | | | | | | | | | | | | | | Currently fops like mkdir used the the ctime it gets after creating the directory entry. But setting layout also updates the ctime of a directory. Hence DHT must get the ctime after the setxattr call and unwind with the latest ctime to avoid mismatch in time seen by applications like tar. Change-Id: Iecbbe3aac5244af5da9788b48ccf299ca56b4bae BUG: 1302948 Signed-off-by: Sakshi Bansal <sabansal@redhat.com> Reviewed-on: http://review.gluster.org/13352 Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: Skip subvols if no layout presentN Balachandran2016-02-141-0/+8
| | | | | | | | | | | | | | | | | | Running "rm -rf" on a tiered volume sometimes caused the client to crash because dht_readdirp_cbk referenced a NULL layout for the hot tier subvol. Now, entries are skipped if the layout is NULL. This can cause "rm -rf" to fail with ENOTEMPTY rmdir failures. Change-Id: Idd71a9d0f7ee712899cc7113bbf2cd3dcb25808b BUG: 1307208 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/13440 Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* Tier: "tier start force" command implementationhari gowtham2015-12-221-1/+2
| | | | | | | | | | | | | | | | The start command doesnt restart the tier deamon if the deamon is running at one node. hence to bring up the tierd on the nodes where the deamon is down, the force command is implemented. It skips the check for tierd running. Change-Id: I0037d3e5ecfe56637d0da201a97903c435d26436 BUG: 1292112 Signed-off-by: hari gowtham <hgowtham@redhat.com> Reviewed-on: http://review.gluster.org/12983 Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* cluster/dht : Ftruncate on migrating file fails with EINVALN Balachandran2015-12-221-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | What: If dht_open is called on a migrating file after the inode_ctx is set, subsequent FOPs on that fd do not open the fd on the dst subvol. This is seen when the open-ftruncate-close sequence is repeatedly called on a migrating file. A second call to the sequence described above causes dht_truncate_cbk to call dht_truncate2 as the dht_inode_ctx was already set by the first call. As dht_rebalance_in_progress_check is not called, the fd is not opened on the dst subvol. On a distributed-replicate volume, this causes AFR to open the fd using afr_fix_open, but with the wrong flags, causing posix_ftruncate to fail with EINVAL. The fix: We require fd specific information to make a decision while handling migrating files. Set the fd_ctx to indicate the fd has been opened on the dst subvol and check if it has been set while processing Phase1/Phase2 checks in the FOP callback functions. Change-Id: I43cdcd8017b4a11e18afdd210469de7cd9a5ef14 BUG: 1284823 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/12985 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* tier:delete the linkfile if data file creation failsMohammed Rafi KC2015-12-221-81/+0
| | | | | | | | | | | | | | | | | | | | | | | If we are creating data file in a hot subvolume then we will create a linkfile in cold subvolume. Linkfile creation happens first. If linkfile creation was successful and data file creation failed, then linkfile in cold subvolume will become stale. This patch will delete the linkfile as well, if data file creation fails. Also this code duplicates dht_create to make tier_create Change-Id: I377a90dad47f288e9576c7323b23cf694a91a7a3 BUG: 1290677 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/12948 Reviewed-by: N Balachandran <nbalacha@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* cluster/dht: Handle failure in getxattrSusant Palai2015-12-161-0/+7
| | | | | | | | | | | | | | | | | Problem: Currently even if we have received xattrs from any one of the subvolume, we unwind with error in case the last subvol (which unwinds) received a negative response. To handle the case check if any of the subvolume has received a response and pass it down. Change-Id: Ia12a1f9671a6764f7550e6dc223324b1039fcc51 BUG: 1287539 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/12845 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* tier:unlink during migrationMohammed Rafi KC2015-12-161-74/+77
| | | | | | | | | | | | | | | | | | | | | | files deleted during promotion were not deleting as the files are moving from hashed to non-hashed. On deleting a file that is undergoing promotion, the unlink call is not sent to the dst file as the hashed subvol == cached subvol. This causes the file to reappear once the migration is complete. This patch also fixes a problem with stale linkfile deleting. Change-Id: I4b02a498218c9d8eeaa4556fa4219e91e7fa71e5 BUG: 1282390 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/12829 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* tier/dht: files are still going to decommissioned subvolMohammed Rafi KC2015-12-091-3/+27
| | | | | | | | | | | | | | After detach tier start, creates are still going to hot tier. Because when creating data files we are not checking for decommissioned bricks. Change-Id: I8e28258d9b2367dcc8ad6e5e91d0e54d92fdf771 BUG: 1289602 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/12914 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* cluster/tier: fix loading tier.so into glusterdN Balachandran2015-12-031-14/+6
| | | | | | | | | | | | | | | | | glusterd occasionally loads shared libraries of translators. This failed for tiering due to a reference to dht_methods which is defined as a global variable which is not necessary. The global variable has been removed and this is now a member of dht_conf and is now initialised in the *_init calls. Change-Id: Ifa0a21e3962b5cd8d9b927ef1d087d3b25312953 BUG: 1287842 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/12863 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* cluster/tier: readdirp to cold tier onlyDan Lambright2015-11-231-65/+124
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is possible a file would get migrated in the middle of a readdir operation. If there are four subvolumes A,B,C,D, and if readdir reads them in order and reaches subvol B, then, if a file is moved from D to A, it will not be included in the readdir output. This phenonema has pre-existed in DHT migration but is more apparent in tiering. When a file is moved off the hashed subvolume a T file is created. For tiering, we will make the cold subvolume the hashed subvolume. This will ensure the creation of a T file. Readdir will not skip T files in the tier translator. Making the cold subvolume the hashed subvolume ensures the T files created on promotions or creates will be less likely to fill the volume. Creates still put the data on the hot subvolume. Change-Id: Ifde557d3d0e94a4570ca9f115adee3db2ee75407 BUG: 1281598 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/12530 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht: loc should store proper gfidSakshi Bansal2015-11-171-0/+1
| | | | | | | | | | Change-Id: Ic1393d44a9ed4aaba23d7c9ddea45977b9dae5e4 BUG: 1281265 Signed-off-by: Sakshi Bansal <sabansal@redhat.com> Reviewed-on: http://review.gluster.org/12574 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht: set proper errno when hashed subvol is not foundSakshi Bansal2015-11-171-5/+5
| | | | | | | | | | | Change-Id: I0c4c72e2f5a9f8a7c60ef65251c596b54de89479 BUG: 1279705 Signed-off-by: Sakshi Bansal <sabansal@redhat.com> Reviewed-on: http://review.gluster.org/12559 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Susant Palai <spalai@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* afr: replica pair going offline does not require CHILD_MODIFIED eventSakshi Bansal2015-11-161-0/+6
| | | | | | | | | | | | | | | | | As a part of CHILD_MODIFIED event DHT forgets the current layout and performs fresh lookup. However this is not required when a replica pair goes offline as the xattrs can be read from other replica pairs. Hence setting different event to handle replica pair going down. Change-Id: I5ede2a6398e63f34f89f9d3c9bc30598974402e3 BUG: 1281230 Signed-off-by: Sakshi Bansal <sabansal@redhat.com> Reviewed-on: http://review.gluster.org/12573 Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Susant Palai <spalai@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* dht: heal directory path if the directory is not presentMohammed Rafi KC2015-11-081-7/+69
| | | | | | | | | | | | | | | | | | | After a successful nameless lookup if the directory is not present on any of the subvol, then we will get the path of the directory and will recursively send a named lookp on each parent directory. This will help particularly for the scenarios like add brick and attach-tier. Change-Id: I64c2118a5ab03bbaa59b0dfc62babdf4472a92a3 BUG: 1272949 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/12376 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht: update cached subvolume during readdirp cbkMohammed Rafi KC2015-11-081-32/+62
| | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit bb2370514598a99e6ab268af81df57dc16caa2c5. issue and impact: readdirp_cbk was not resetting the layout for files, this causes problem if the files is moved from one cached subvolume and if the layout was not proper, then there is chance to fail entry fops if the fops executed with out a lookup. Because the cached subvolume will not change and the application assumes the presence of file in cached subvol. so it fails with ENOENT. The patch preset the layout information in readdirp cbk for each files in the entry. That leaves the problem the commit bb2370514598a99e6ab268af81df57dc16caa2c5 try to fix. We will fix the problem in a separate patch. Change-Id: I878ec32f44edde2fb9d4f132d9b1b547cde993d9 BUG: 1272949 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/12449 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* quota: add version to quota xattrsvmallika2015-11-021-2/+2
| | | | | | | | | | | | | | | | | | | | | When a quota is disable and the clean-up process terminated without completely cleaning-up the quota xattrs. Now when quota is enabled again, this can mess-up the accounting A version number is suffixed for all quota xattrs and this version number is specific to marker xaltor, i.e when quota xattrs are requested by quotad/client marker will remove the version suffix in the key before sending the response Change-Id: I1ca2c11460645edba0f6b68db70d476d8d26e1eb BUG: 1272411 Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/12386 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Manikandan Selvaganesh <mselvaga@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/tier: add pause tier for snapshotsDan Lambright2015-10-211-1/+5
| | | | | | | | | | | | | | | | | | | | Snaps of tiered volumes cannot handle files undergoing migration. We implement a helper mechanism to "pause" migration. Any files undergoing migration are aborted. Clean up is done to remove sticky bits and data at the destination. Migration is restarted after snap completes. For testing an internal switch is added. It is not exposed externally. gluster volume set vol1 tier-pause [true|false] Change-Id: Ia85bbf89ac142e9b7e73fcbef98bb9da86097799 BUG: 1267950 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/12304 Reviewed-by: N Balachandran <nbalacha@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/tier: Handle FOPs on files being migratedN Balachandran2015-09-221-19/+107
| | | | | | | | | | | | | | | | Determine which DHT level is responsible for handling fops on a file undergoing migration based on the name of the the linkto xattr set on the file being migrated and process accordingly. Change-Id: I82772e39314d4fe7f2ba0dcf22de0c6a374ee139 BUG: 1254428 Signed-off-by: N Balachandran <nbalacha@redhat.com> Signed-off-by: Nithya Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/12090 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* tier/dht: unlink fails after lookup in a directoryMohammed Rafi KC2015-09-171-14/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | unlink fails with invalid argument for files that are being present on cold tier, before attaching. All of the fops will be hashed to hot_tier after attach-tier (unless explicitly set the "rule" option). Lookups sent to directory, will eventually search the directory using readdirp, and will populate inode_ctx for the inodes based on the output, in respective dht_xlators. So the readdirp will populate inodes_ctx for the files (that is already present in volume before attaching) in cold-dht only because it got the entries from the cold-tier. So when an unlink comes on such an inode, the lookup associated with the unlink will be send as a re validate request to cold-tier only, since already a lookup was performed on the inode, and the new lookup will succeed. So from the unlink of dht, it will hash to cold-tier but the cached_subvol will be cold, since there is a mismatch in hash and cach , it chose hashed subvolume and will sent the fop to hot dht, and the fops fail with EINVAL from the hot-dht since it does not have inode_ctx stored for that inode (because, no lookup was performed from hot-dht). Change-Id: Ib7c14a9297a22d615f7a890a060be4809b5a745a BUG: 1236032 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/11675 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht: reverting changes that takes lock on all subvols to prevent rmdir vs ↵Sakshi2015-09-141-157/+23
| | | | | | | | | | | | | | | lookup selfheal race Locking on all subvols before an rmdir is unable to remove all directory entries. Hence reverting the patch for now. Change-Id: I31baf2b2fa2f62c57429cd44f3f229c35eff1939 BUG: 1245065 Signed-off-by: Sakshi <sabansal@redhat.com> Reviewed-on: http://review.gluster.org/12125 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht/cluster: Avoid crash if local is NULLSusant Palai2015-09-131-5/+17
| | | | | | | | | | | | | | This patch addresses crash handling if local is NULL. In addition to that, we were not unwinding if no lock is taken in dht_linkfile_create_cbk(create/mknod). This patch handles that also. Change-Id: Ibcff317f10d60e7865fd7ffb9479b3af53c9ef17 BUG: 1260051 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/12160 Reviewed-by: N Balachandran <nbalacha@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht/remove-brick: Avoid data loss for hard link migrationSusant Palai2015-09-091-6/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: If the hashed subvol of a file has reached cluster.min-free-disk, for a create opertaion a linkto file will be created on the hashed and the data file will be created on some other brick. For creation of the linkfile we populate the dictionary with linkto key and value as the cached subvol. After successful linkto file creation, the linkto-key-value pair is not deleted form the dictionary and hence, the data file will also have linkto xattr which points to itself.This looks something like this. client-0 client-1 -------T file rwx------file linkto.xattr=client-1 linkto.xattr=client-1 Now coming to the data loss part. Hardlink migration highly depend on this linkto xattr on the data file. This value should be the new hashed subvol of the first hardlink encountered post fix-layout. But when it tries to read the linkto xattr it gets the same target as where it is sitting. Now the source and destination are same for migration. At the end of migration the source file is truncated and deleted, which in this case is the destination and also the only data file it self resulting in data loss. Change-Id: I36b1d105752bd9467757ecf3f103b45c666783d6 BUG: 1260051 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/12105 Reviewed-by: N Balachandran <nbalacha@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht: NULL dereferencing causes crashMohammed Rafi KC2015-09-081-2/+2
| | | | | | | | | | | | | | If linkfile_create is failed for some reason, then we are trying to dereference a null variable Change-Id: I3c6ff3715821b9b993d1bab7b90167de2861e190 BUG: 1260147 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/12106 Reviewed-by: Susant Palai <spalai@redhat.com> Reviewed-by: N Balachandran <nbalacha@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* fd: Do fd_bind on successful openPranith Kumar K2015-08-281-0/+1
| | | | | | | | | | | | | | | - fd_unref should decrement fd->inode->fd_count only if it is present in the inode's fd list. - successful open/opendir should perform fd_bind. Change-Id: I81dd04f330e2fee86369a6dc7147af44f3d49169 BUG: 1207735 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11044 Reviewed-by: Anoop C S <anoopcs@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht : lock on subvols to prevent lookup vs rmdir raceSakshi2015-08-271-23/+157
| | | | | | | | | | | | | | | | | There is a possibility that while an rmdir is completed on some non-hashed subvol and proceeding to others. A lookup selfheal can recreate the same directory on those subvols for which the rmdir had succeeded. The fix is to take a blocking inodelk on the subvols before starting rmdir. Since selfheal requires lock on all subvols, if an rmdir is in progess acquiring locks will fail and vice versa. Change-Id: I841a44758c3b88f5e04d1cb73ad36e0cac9fdabb BUG: 1245065 Signed-off-by: Sakshi <sabansal@redhat.com> Reviewed-on: http://review.gluster.org/11725 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: avoid mknod on decommissioned brickSusant Palai2015-08-251-35/+329
| | | | | | | | | | Change-Id: I8c39ce38e257758e27e11ccaaff4798138203e0c BUG: 1256243 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/11998 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht: block/handle create op falling to decommissioned brickSusant Palai2015-08-231-48/+408
| | | | | | | | | | | | | | | | | | | | | | | Problem: Post remove-brick start till commit phase, the client layout may not be in sync with disk layout because of lack of lookup. Hence,a create call may fall on the decommissioned brick. Solution: Will acquire a lock on hashed subvol. So that a fix-layout or selfheal can not step on layout while reading the layout. Even if we read a layout before remove-brick fix-layout and the file falls on the decommissioned brick, the file should be migrated to a new brick as per the fix-layout. Change-Id: If84a12ec34f981adb2b9b224e80f535cfe5bf9f2 BUG: 1232378 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/11260 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht: return non NULL xattr,xdata for ret >= 0Susant Palai2015-08-131-2/+2
| | | | | | | | | | Change-Id: I4a3dd8c00894ceeed4af77df2d960f372281a03b BUG: 1235989 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/11409 Reviewed-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* dht/tier :rename fails with EBUSYMohammed Rafi KC2015-08-131-8/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the files was in hot tier and the look up was done already, then hashed and cached subvolume will be hot-tier. Once the file is moved from hot-tier to cold-tier, then subsequent lookup will send a revalidate lookup to hot-tier and it will find out that the file was actually moved and there is only link in the cached subvolume. So dht will return an ESTALE to fuse. Upon receiving ESTALE for a lookup, fuse will create a new inode and sent a fresh lookup. This lookup will be successful, and it will locate the file properly. Then fuse try to link the inode, but the older inode was already there in inmemory inode cache with same gfid and that is also shared with fuse kernal. So inode_link will return the older ionode itself. So the subsequent rename fop will come to gluster with the older inode. From dht_rename, we will take a lock on the inode and after successful inodelk on inode dht will send lookup before creating a link. this lookup will again find out that the file is a link file, and then dht will think that file is migrating/migrated in the mean time, and will send EBUSY. Change-Id: Ib3a01e5b1d7f64514b04bb6234026d049f082679 BUG: 1248306 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/11768 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Dan Lambright <dlambrig@redhat.com>
* cluster/tier: fixes for migration over ec as cold tierDan Lambright2015-07-131-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | An opendir is done in rebalance. The graph constructed when EC is used in tiering may have no local volumes (if all the hot volumes are on one node and all the others on another node). Previously the opendir only sent fops down the local subvolumes for migration. They must be sent down both the hot and cold subvolumes for tiering. When setxattr2() received a NULL subvolume; this dereferenced an uninitialized variable. When a lookup is done during creation of the destination file, the xattr dict is "polluted" with virtual xattrs. These cause subsequent xattrs in the new file to not be written by posix. They are required by EC. The inode gfid for "entry_loc" in gf_defrag_migrate_single_file() was not initialized. This made underlying translators think the gfid was 0, and failed migration. Change-Id: I6ccda8ca8e43485b9b354341bbfcb302496f632c BUG: 1236212 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/11433 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* tiering/rebalance: tier daemon stopped with out updating statusMohammed Rafi KC2015-06-251-3/+0
| | | | | | | | | | | | | | | | When a subvol goes down, tier daemon stopped immediately, and the status shows as "Progressing". With this change, with respect to tier xlator, when a subvol goes offline it will update the status as failed. Change-Id: I9f722ed0d35cda8c7fc1a7e75af52222e2d0fdb7 BUG: 1227803 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/11068 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* dht: Adding log messages to the new logging frameworkarao2015-06-231-133/+173
| | | | | | | | | | | | | Change-Id: Ib3bb61c5223f409c23c68100f3fe884918d2dc3f BUG: 1194640 Signed-off-by: arao <arao@redhat.com> Reviewed-on: http://review.gluster.org/10021 Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Joseph Fernandes Tested-by: Joseph Fernandes Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/tier: account for reordered layoutsDan Lambright2015-06-111-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For a tiered volume the cold subvolume is always at a fixed position in the graph. DHT's layout array, on the other hand, may have the cold subvolume in either the first or second index, therefore code cannot make any assumptions. The fix searches the layout for the correct position dynamically rather than statically. The bug manifested itself in NFS, in which a newly attached subvolume had not received an existing directory. This case is a "stale entry" and marked as such in the layout for that directory. The code did not see this, because it looked at the wrong index in the layout array. The fix also adds the check for decomissioned bricks, and fixes a problem in detach tier related to starting the rebalance process: we never received the right defrag command and it did not get directed to the tier translator. Change-Id: I77cdf9fbb0a777640c98003188565a79be9d0b56 BUG: 1214289 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Reviewed-by: Joseph Fernandes <josferna@redhat.com> Reviewed-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/11092
* cluster/dht: fix incorrect dst subvol info in inode_ctxNithya Balachandran2015-06-021-22/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Stashing additional information in the inode_ctx to help decide whether the migration information is stale, which could happen if a file was migrated several times but FOPs only detected the P1 migration phase. If no FOP detects the P2 phase, the inode ctx1 is never reset. We now save the src subvol as well as the dst subvol in the inode ctx. The src subvol is the subvol on which the FOP was sent when the mig info was set in the inode ctx. This information is considered stale if: 1. The subvol on which the current FOP is sent is the same as the dst subvol in the ctx 2. The subvol on which the current FOP is sent is not the same as the src subvol in the ctx This does not handle the case where the same file might have been renamed such that the src subvol is the same but the dst subvol is different. However, that is unlikely to happen very often. Change-Id: I05a2e9b107ee64750c7ca629aee03b03a02ef75f BUG: 1142423 Signed-off-by: Nithya Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/10834 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: pass a destination subvol to fop2 variants to avoid races.Raghavendra G2015-06-021-59/+19
| | | | | | | | | | | | | | | | | | | The destination subvol used in the fop2 variants is either stored in inode-ctx1 or local->cached_subvol. However, it is not guaranteed that a value stored in these locations before invocation of fop2 is still present after the invocation as these locations are shared among different concurrent operations. So, to preserve the atomicity of "check dst-subvol and invoke fop2 variant if dst-subvol found", we pass down the dst-subvol to fop2 variant. This patch also fixes error handling in some fop2 variants. Change-Id: Icc226228a246d3f223e3463519736c4495b364d2 BUG: 1142423 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/10943 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com>