summaryrefslogtreecommitdiffstats
path: root/xlators/cluster
Commit message (Collapse)AuthorAgeFilesLines
* afr: Add lease() fopPoornima G2018-05-053-0/+157
| | | | | | | | Change-Id: Ied047dd5ee44e9d5a5d3db214826f7df30332ef9 updates: #350 BUG: 1319992 Signed-off-by: Poornima G <pgurusid@redhat.com> Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com>
* cluster/dht: unwind if dht_selfheal_dir_mkdir returns an errorRaghavendra G2018-05-031-1/+5
| | | | | | | | | | If dht_selfheal_dir_mkdir returns an error, cbk passed to dht_selfheal_directory is not invoked. So, Current codepath leaves an unwound frame resulting in a hung fop forever. Change-Id: I422308b8a34a074301ca46b029ffe676f5e0f66c fixes: bz#1574305 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* dht: gf_defrag_settle_hash should ignore ENOENT and ESTALE errorSusant Palai2018-04-301-1/+8
| | | | | | | | | | | Problem: A directory deletion can happen just before gf_defrag_settle_hash which internally does a setxattr operation on a directory. Solution: Ignore ENOENT and ESTALE errors Fixes: bz#1572581 Change-Id: I2f91809f3b5e02976c4c3a5a596406a8b2f8f6f2 Signed-off-by: Susant Palai <spalai@redhat.com>
* cluster/afr: shd changes for thin arbiterkarthik-us2018-04-301-0/+184
| | | | | | | Updates #352 Change-Id: I1bbb3c652ba33cec6aa37f3700370674077fb17d Signed-off-by: karthik-us <ksubrahm@redhat.com>
* afr: initial changes for thin arbiterRavishankar N2018-04-306-8/+229
| | | | | | | | | 1. Create thin arbiter index file during mount. 2. Set pending marker in thin arbiter id file in case of failure. Change-Id: I269eb8d069f0323f1fc616175e5e5eb7b91d5f82 updates: #352 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* libglusterfs: Capture the dict response in syncop_xattrop_cbkkarthik-us2018-04-271-1/+1
| | | | | | | | | | | | | | | Problem: Currently it is not possible to capture the xattrs values which are set on the bricks by calling syncop_(f)xattrop, because the response dict is not being assigned to any of the dictionaries. Fix: In the xattrop callback capture the response dict and send it back to the caller if it is requested. Change-Id: I9de9bcd97d6008091c9b060bcca3676cb9ae8ef9 fixes: bz#1572076 Signed-off-by: karthik-us <ksubrahm@redhat.com>
* cluster/afr: Keep child-up until ping-eventPranith Kumar K2018-04-253-25/+40
| | | | | | | | | | | | | | | | | | | | | Problem: If we have 2 bricks, brick-A and brick-B with brick-A within halo-max-latency and brick-B more than halo-max-latency. If we set both halo-min, halo-max replicas as '1'. In this case, brick-A comes online and then ping-latency will be updated for it. When brick-B comes online, we have 2 up-bricks, so the code tries to find the brick with worst latency to mark it down. Since Brick-B just came online it always had '0' latency so brick-B used to be marked offline and Brick-B would eventually be the one to be online even when brick-A is more suited. Fix: Consider latency of just-up child as HALO_MAX_LATENCY so that worst-child until ping-latency is found as the just-up brick. Also keep ping-latency as -1 until child-up during initialization. BUG: 1567881 fixes bz#1567881 Change-Id: I148262fe505468190f0eb99225d0f6d57cdb6f04 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/dht: Fix dht_rename lock orderN Balachandran2018-04-231-18/+47
| | | | | | | | | | Fixed dht_order_rename_lock to use the same inodelk ordering as that of the dht selfheal locks (dictionary order of lock subvolumes). Change-Id: Ia3f8353b33ea2fd3bc1ba7e8e777dda6c1d33e0d fixes: bz#1568348 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/afr: Need heal-timeout to be configured as low as 5 secondsPranith Kumar K2018-04-201-1/+1
| | | | | | | | | | | In Halo replication, there are pending heals more often than not. It makes sense to give users the capability to configure it as low as 5 seconds. BUG: 1569489 fixes bz#1569489 Change-Id: I451c1975827f66398b903f659c981ef3121d5376 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* fuse: do fd_resolve in fuse_getattr if fd is receivedSusant Palai2018-04-181-2/+2
| | | | | | | | | | | | | | | | | | | | problem: With the current code, post graph switch the old fd is received for fuse_getattr and since it is associated with old inode, it does not have the inode ctx across xlators in new graph. Hence, dht errored out saying "no layout" for fstat call. Hence the EINVAL. Solution: if fd is passed, init and resolve fd to carry on getattr test case: - Created a single brick distributed volume - Started untar - Added a new-brick Without this fix, untar used to abort with ERROR. Change-Id: I5805c463fb9a04ba5c24829b768127097ff8b9f9 fixes: bz#1566207 Signed-off-by: Susant Palai <spalai@redhat.com>
* cluster/afr: Make sure latency-arg is passed to afrPranith Kumar K2018-04-181-0/+2
| | | | | | | | | | | xlator_notify doesn't pass the extra arguments that come in the input function, so XLATOR_NOTIFY macro should be used instead to pass the extra arguments to the function. BUG: 1567881 fixes bz#1567881 Change-Id: Ic15b6c446638cbacf3149693147a754219037c47 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* afr: fixes to afr-eager lockingRavishankar N2018-04-181-0/+2
| | | | | | | | | | | | | 1. If pre-op fails on all bricks,set lock->release to true in afr_handle_lock_acquire_failure so that the GF_ASSERT in afr_unlock() does not crash. 2. Added a missing 'return' after handling pre-op failure in afr_transaction_perform_fop(), fixing a use-after-free issue. Change-Id: If0627a9124cb5d6405037cab3f17f8325eed2d83 fixes: bz#1561129 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* cluster/dht: Handle file migrations when brick downN Balachandran2018-04-131-5/+51
| | | | | | | | | | | | | | | The decision as to which node would migrate a file was based on the gfid of the file. Files were divided among the nodes for the replica/disperse set. However, if a brick was down when rebalance started, the nodeuuids would be saved as NULL and a set of files would not be migrated. Now, if the nodeuuid is NULL, the first non-null entry in the set is the node responsible for migrating the file. Change-Id: I72554c107792c7d534e0f25640654b6f8417d373 fixes: bz#1564198 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: Wind open to all subvolsN Balachandran2018-04-111-10/+5
| | | | | | | | | | dht_opendir should wind the open to all subvols whether or not local->subvols is set. This is because dht_readdirp winds the calls to all subvols. Change-Id: I67a96b06dad14a08967c3721301e88555aa01017 updates: bz#1564198 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: act as passthrough for renames on single child DHTRaghavendra G2018-04-101-7/+15
| | | | | | | | | | Various synchronization present in dht_rename while handling directories and files is necessary only if we have more than only one child. Change-Id: Ie21ad419125504ca2f391b1ae2e5c1d166fee247 fixes: bz#1563511 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/ec: Turn ON the stripe-cache option by defaultAshish Pandey2018-04-061-1/+1
| | | | | | Change-Id: I0a290396c30c635b13ee73004d20259efb76a954 fixes: bz#1563945 Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* afr: add quorum checks in pre-opRavishankar N2018-04-051-33/+31
| | | | | | | | | | | | | | | | | Problem: We seem to be winding the FOP if pre-op did not succeed on quorum bricks and then failing the FOP with EROFS since the fop did not meet quorum. This essentially masks the actual error due to which pre-op failed. (See BZ). Fix: Skip FOP phase if pre-op quorum is not met and go to post-op. Fixes: 1561129 Change-Id: Ie58a41e8fa1ad79aa06093706e96db8eef61b6d9 fixes: bz#1561129 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* cluster/dht: enable lookup-optimize by defaultN Balachandran2018-04-042-2/+4
| | | | | | | | | | | | | | Lookup-optimize has been shown to improve create performance. The code has been in the project for several years and is considered stable. Enabling this by default in order to test this in the upstream regression runs. Change-Id: Iab792979ee34f0af4713931e0b5b399c23f65313 updates: bz#1557435 BUG: 1557435 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/afr: Prevent ping-event handling on shdPranith Kumar K2018-04-031-0/+2
| | | | | | | | | On shd, we shouldn't treat any brick down based on latency, otherwise self-heal will never happen fixes: bz#1562717 Change-Id: Ica07fcc4fae91a6bfd9c9a670e2be464704d94b7 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/dht: Update dht option levelsN Balachandran2018-04-021-2/+16
| | | | | | | | | Set the levels for DHT options based on https://review.gluster.org/#/c/19466/ Change-Id: I51b31a706a0b9517404e83224c89de145fd5d7e1 updates: #430 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: Update layout in inode only on successN Balachandran2018-04-022-4/+24
| | | | | | | | | | | | | | | | | | | | | With lookup-optimize enabled, gf_defrag_settle_hash in rebalance sometimes flips the on-disk layout on volume root post the migration of all files in the directory. This is sometimes seen when attempting to fix the layout of a directory multiple times before calling gf_defrag_settle_hash. dht_fix_layout_of_directory generates a new layout in memory but updates it in the inode ctx before it is set on disk. The layout may be different the second time around due to dht_selfheal_layout_maximize_overlap. If the layout is then not written to the disk, the inode now contains the wrong layout. gf_defrag_settle_hash does not check the correctness of the layout in the inode before updating the commit-hash and writing it to the disk thus changing the layout of the directory. Change-Id: Ie1407d92982518f2a0c40ec70ad370b34a87b4d4 updates: bz#1557435 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* afr: add new value for read-hash-mode volume optionRavishankar N2018-03-296-32/+119
| | | | | | | | | | Updates: #363 This new value (3) will try to wind read requests to the child of AFR having the least amount of pending requests in its queue. Change-Id: If6bda2aac9bf7aec3fc39622f78659313c4b6508 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* cluster/ec: send list-node-uuids request to all subvolumesXavi Hernandez2018-03-281-1/+1
| | | | | | | | | | | | The xattr trusted.glusterfs.list-node-uuids was only sent to a single subvolume. This was returning null uuids from the other subvolumes as if they were down. This fix forces that xattr to be requested from all subvolumes. Change-Id: If62eb39a6857258923ba625e153d4ad79018ea2f fixes: bz#1561406 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* cluster/dht: ENOSPC will not fail rebalanceN Balachandran2018-03-281-6/+2
| | | | | | | | | ENOSPC returned by a file migration is no longer considered a rebalance failure. Change-Id: I21cf3a8acdc827bc478e138d6cb5db649d53a28c fixes: bz#1553598 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* Quota: heal directory on newly added bricks when quota limit is reachedSanoj Unnikrishnan2018-03-281-1/+16
| | | | | | | | | | | | | | | | | Problem: if a lookup is done on a newly added brick for a path on which limit has been reached, the lookup fails to heal the directory tree due to quota. Solution: Tag the lookup as an internal fop and ignore it in quota. Since marking internal fop does not usually give enough contextual information. Introducing new flags to pass the contextual info. Adding dict_check_flag and dict_set_flag to aid flag operations. A flag is a single bit in a bit array (currently limited to 256 bits). Change-Id: Ifb6a68bcaffedd425dd0f01f7db24edd5394c095 fixes: bz#1505355 BUG: 1505355 Signed-off-by: Sanoj Unnikrishnan <sunnikri@redhat.com>
* cluster/ec: fix SHD crash for null gfid'sXavi Hernandez2018-03-211-0/+8
| | | | | | | | | | | | | | | | | | | When the self-heal daemon is doing a full sweep it uses readdirp to get extra stat information from each file. This information is obtained in two steps by the posix xlator: first the directory is read to get the entries and then each entry is stated to get additional info. Between these two steps, it's possible that the file is removed by the user, so we'll get an error, leaving stat info empty. EC's heal daemon was using the gfid blindly, causing an assert failure when protocol/client was trying to encode the gfid. To fix the problem a check has been added. If we detect a null gfid, we simply ignore it and continue healing. Change-Id: I2e4acdcecd0b6951055e50d1c37d686a2186a228 BUG: 1558016 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* cluster/afr: Switch to active-fd-count for open-fd checksPranith Kumar K2018-03-211-8/+8
| | | | | | BUG: 1557932 Change-Id: I3783e41b3812267bc10c0d05d062a31396ce135b Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/afr: Make AFR eager-locking similar to ECPranith Kumar K2018-03-149-908/+813
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: 1) Afr's eager-lock only works for data transactions. 2) When there are conflicting writes, write with conflicting region initiates unlock of eager-lock leading to extra pre-ops and post-ops on the file. When eager-lock goes off, it leads to extra fsyncs for random-write workload in afr. Solution (that is modeled after EC): In EC, when there is a conflicting write, it waits for the current write to complete before it winds the conflicted write. This leads to better utilization of network and disk, because we will not be doing extra xattrops and FSYNCs and inodelk/unlock. Moved fd based counters to inode based counters. I tried to model the solution based on EC's locking, but it is not similar to AFR because we had to keep backward compatibility. Lifecycle of lock: ================== First transaction is added to inode->owners list and an inodelk will be sent on the wire. All the next transactions will be put in inode->waiters list until the first transaction completes inodelk and [f]xattrop completely. Once [f]xattrop also completes, all the requests in the inode->waiters list are checked if it conflict with any of the existing locks which are in inode->owners list and if not are added to inode->owners list and resumed with doing transaction. When these transactions complete fop phase they will be moved to inode->post_op list and resume the transactions that were paused because of conflicts. Post-op and unlock will not be issued on the wire until that is the last transaction on that inode. Last transaction when it has to perform post-op can choose to sleep for deyed-post-op-secs value. During that time if any other transaction comes, it will wake up the sleeping transaction and takes over the ownership of the lock and the cycle continues. If the dealyed-post-op-secs expire, then the timer thread will wakeup the sleeping transaction and it will set lock->release to true and starts doing post-op and then unlock. During this time if any other transactions come, they will be put in inode->frozen list. Once the previous unlock comes it will move the frozen list to waiters list and moves the first element from this waiters-list to owners-list and attempts the lock and the cycle continues. This is the general idea. There is logic at the time of dealying and at the time of new transaction or in flush fop to wakeup existing sleeping transactions or choosing whether to delay a transaction etc, which is subjected to change based on future enhancements etc. Fixes: #418 BUG: 1549606 Change-Id: I88b570bbcf332a27c82d2767dfa82472f60055dc Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/ec: Change default read policy to gfid-hashAshish Pandey2018-03-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Whenever we read data from file over NFS, NFS reads more data then requested and caches it. Based on the stat information it makes sure that the cached/pre-read data is valid or not. Consider 4 + 2 EC volume and all the bricks are on differnt nodes. In EC, with round-robin read policy, reads are sent on different set of data bricks. This way, it balances the read fops to go on all the bricks and avoid heating UP (overloading) same set of bricks. Due to small difference in clock speed, it is possible that we get minor difference for atime, mtime or ctime for different bricks. That might cause a different stat returned to NFS based on which NFS will discard cached/pre-read data which is actually not changed and could be used. Solution: Change read policy for EC as gfid-hash. That will force all the read to go to same set of bricks. Change-Id: I825441cc519e94bf3dc3aa0bd4cb7c6ae6392c84 BUG: 1554743 Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* cluster/ec: avoid delays in self-healXavi Hernandez2018-03-144-48/+93
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Self-heal creates a thread per brick to sweep the index looking for files that need to be healed. These threads are started before the volume comes online, so nothing is done but waiting for the next sweep. This happens once per minute. When a replace brick command is executed, the new graph is loaded and all index sweeper threads started. When all bricks have reported, a getxattr request is sent to the root directory of the volume. This causes a heal on it (because the new brick doesn't have good data), and marks its contents as pending to be healed. This is done by the index sweeper thread on the next round, one minute later. This patch solves this problem by waking all index sweeper threads after a successful check on the root directory. Additionally, the index sweep thread scans the index directory sequentially, but it might happen that after healing a directory entry more index entries are created but skipped by the current directory scan. This causes the remaining entries to be processed on the next round, one minute later. The same can happen in the next round, so the heal is running in bursts and taking a lot to finish, specially on volumes with many directory levels. This patch solves this problem by immediately restarting the index sweep if a directory has been healed. Change-Id: I58d9ab6ef17b30f704dc322e1d3d53b904e5f30e BUG: 1547662 Signed-off-by: Xavi Hernandez <jahernan@redhat.com>
* cluster/dht: Skipped files are not treated as errorsN Balachandran2018-03-121-9/+11
| | | | | | | | | For skipped files, use a return value of 1 to prevent error messages being logged. Change-Id: I18de31ac1a64d4460e88dea7826c3ba03c895861 BUG: 1553598 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/afr: Remove unused code pathsPranith Kumar K2018-03-068-760/+42
| | | | | | | | | | | | | | | | Removed 1) afr-v1 self-heal locks related code which is not used anymore 2) transaction has some data types that are not needed, so removed them 3) Never used lock tracing available in afr as gluster's network tracing does the job. So removed that as well. 4) Changelog is always enabled and afr is always used with locks, so __changelog_enabled, afr_lock_server_count etc functions can be deleted. 5) transaction.fop/done/resume always call the same functions, so no need to have these variables. BUG: 1549606 Change-Id: I370c146fec2892d40e674d232a5d7256e003c7f1 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/afr: Remove compound-fops usage in afrPranith Kumar K2018-03-065-396/+7
| | | | | | | | | We are not seeing much improvement with this change. So removing the feature so that it doesn't need to be maintained anymore. Fixes: #414 Change-Id: Ic7969b151544daf2547bd262a9fa03f575626411 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/afr: Make afr_fsync a transactionkarthik-us2018-03-025-164/+117
| | | | | | Change-Id: I713401feb96393f668efb074f2d5b870d19e6fda BUG: 1548361 Signed-off-by: karthik-us <ksubrahm@redhat.com>
* cluster/afr: Fix dict-leak in pre-opPranith Kumar K2018-02-283-20/+20
| | | | | | | | | | | | At the time of pre-op, pre_op_xdata is populted with the xattrs we get from the disk and at the time of post-op it gets over-written without unreffing the previous value stored leading to a leak. This is a regression we missed in https://review.gluster.org/#/q/ba149bac92d169ae2256dbc75202dc9e5d06538e BUG: 1550078 Change-Id: I0456f9ad6f77ce6248b747964a037193af3a3da7 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/dht: store the 'reaction' on failures per lockRaghavendra G2018-02-236-38/+46
| | | | | | | | | | | Currently its passed in dht_blocking_inode(entry)lk, which would be a global value for all the locks passed in the argument. This would be a limitation for cases where we want to ignore failures on only few locks and fail for others. Change-Id: I02cfbcaafb593ad8140c0e5af725c866b630fb6b BUG: 1543279 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: Handle single dht child in dht_lookupN Balachandran2018-02-221-0/+13
| | | | | | | | | | | | | | | This patch limits itself to only handling the case where no file (data or linkto) exists on the subvol. Additional cases to be handled: 1. A linkto file was found on the only child subvol. This currently calls dht_lookup_everywhere which eventually deletes it. It can be deleted directly as it will not be pointing to a valid subvol. 2. Directory lookups - locking might be unnecessary in some cases. Change-Id: I940ba34531f2aaee1d36fd9ca45ecfd46be662a4 BUG: 1546620 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: Ignore ENODATA from getxattr for posix aclsN Balachandran2018-02-221-6/+8
| | | | | | | | | dht_migrate_file no longer prints an error if getxattr for posix acls fails with ENODATA/ENOATTR. Change-Id: Id9ecf6852cb5294c1c154b28d609889ea3420e1c BUG: 1546954 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: Fixed a typoN Balachandran2018-02-211-2/+2
| | | | | | | | Replaced "then" with "than" Change-Id: I73090e8c1a639befd7c5458e8d63bd173248bc7d BUG: 1547128 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* xlators/features/namespace: Add namespace xlator and link into brick graphVarsha Rao2018-02-211-4/+3
| | | | | | | | | | | | | | | | | | | | | The following release-3.8-fb branch patch is upstreamed: > features/namespace: Add namespace xlator and link into brick graph > Commit ID: dbd30776f26e > https://review.gluster.org/#/c/18041/ > By Michael Goulet <mgoulet@fb.com> Changes in this patch: Removes extra config.h and namespace.h file in namespace.c Adds default_getspec_cbk to libglusterfs.sym Rename dict_for_each to dict_foreach_inline Remove fd.h header file stack.h Add test case for truncate, open and symlink This patch is required to forward port io-threads namespace patch. Updates: #401 Change-Id: Ib88c95b89eecee9b8957df8a4c8712c899c761d1 Signed-off-by: Varsha Rao <varao@redhat.com>
* posix/afr: handle backward compatibility for rchecksum fopRavishankar N2018-02-193-9/+29
| | | | | | | | | Added a volume option 'fips-mode-rchecksum' tied to op version 4. If not set, rchecksum fop will use MD5 instead of SHA256. updates: #230 Change-Id: Id8ea1303777e6450852c0bc25503cda341a6aec2 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* gfapi: return pre/post attributes from glfs_ftruncateKinglong Mee2018-02-121-4/+7
| | | | | | Updates: #389 Change-Id: I8faea0828921fb17f05f7321c3cb01747373f21e Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
* gfapi: return pre/post attributes from glfs_fsync/fdatasyncKinglong Mee2018-02-121-1/+1
| | | | | | Updates: #389 Change-Id: I4153df72d5eeecefa7579170899db4c340128bea Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
* gfapi: return pre/post attributes from glfs_pread/pwriteKinglong Mee2018-02-122-5/+6
| | | | | | | | | | | | | | | As nfs-ganesha, a wcc data contains pre/post attributes is return in read/write rpc reply. nfs-ganesha get those attributes by two getattr between the real read/write right now. But, gluster has return pre/post attributes from glusterfsd, those attributes are skipped in syncop/gfapi, if gfapi return them, the upper user (nfs-ganesha) can use them directly without any duplicate getattr. Updates: #389 Change-Id: I7b643ae4241cfe2aeb17063de00192d81674024a Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
* performance/io-threads: expose io-thread queue depthsVarsha Rao2018-02-082-1/+13
| | | | | | | | | | | | | | | | | | | | The following release-3.8-fb branch patch is upstreamed: > io-stats: Expose io-thread queue depths > Commit ID: 69509ee7d2 > https://review.gluster.org/#/c/18143/ > By Shreyas Siravara <sshreyas@fb.com> Changes in this patch: - Replace iot_pri_t with gf_fop_pri_t - Replace IOT_PRI_{HI, LO, NORMAL, MAX, LEAST} with GF_FOP_PRI_{HI, LO, NORMAL, MAX, LEAST} - Use dict_unref() instead of dict_destroy() This patch is required to forward port io-threads namespace patch. Updates: #401 Change-Id: I1b47a63185a441a30fbc423ca1015df7b36c2518 Signed-off-by: Varsha Rao <varao@redhat.com>
* cluster/dht: Unlink linkto files as rootN Balachandran2018-02-061-3/+7
| | | | | | | | | | | Non-privileged users cannot delete linkto files. However the failure to unlink a stale linkto causes DHT to fail the lookup with EIO and hence prevent access to the file. Change-Id: Id295362d41e52263790694602f36f1219f0646a2 BUG: 1542318 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: Cleanup on fallocate failureN Balachandran2018-02-051-1/+17
| | | | | | | | | | | It looks like fallocate leaves a non-empty file behind in case of some failures. We now truncate the file to 0 bytes on failure in __dht_rebalance_create_dst_file. Change-Id: Ia4ad7b94bb3624a301fcc87d9e36c4dc751edb59 BUG: 1541916 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/afr: remove unnecessary child_up initializationXavier Hernandez2018-02-031-7/+0
| | | | | | | | | | | | The child_up array was initialized with all elements being -1 to allow afr_notify() to differentiate down bricks from bricks that haven't reported yet. With current implementation this is not needed anymore and it was causing unexpected results when other parts of the code considered that if child_up[i] != 0, it meant that it was up. Change-Id: I2a9d712ee64c512f24bd5cd3a48dcb37e3139472 BUG: 1541038 Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
* cluster/dht: Fixed a leak in inode_refN Balachandran2018-02-021-3/+2
| | | | | | | | Introduced by commit d9f773ba719397c128 Change-Id: I3f3103a5a80daed7562ace72e5aa53b77e74fb94 BUG: 1541264 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: avoid overwriting client writes during migrationSusant Palai2018-02-024-12/+138
| | | | | | | | | | | | | | | | | | | | | | | | For more details on this issue see https://github.com/gluster/glusterfs/issues/308 Solution: This is a restrictive solution where a file will not be migrated if a client writes to it during the migration. This does not check if the writes from the rebalance and the client actually do overlap. If dht_writev_cbk finds that the file is being migrated (PHASE1) it will set an xattr on the destination file indicating the file was updated by a non-rebalance client. Rebalance checks if any other client has written to the dst file and aborts the file migration if it finds the xattr. updates gluster/glusterfs#308 Change-Id: I73aec28bc9dbb8da57c7425ec88c6b6af0fbc9dd Signed-off-by: Susant Palai <spalai@redhat.com> Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Signed-off-by: N Balachandran <nbalacha@redhat.com>