<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/xlators/cluster, branch v5.7</title>
<subtitle></subtitle>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/'/>
<entry>
<title>cluster/ec: honor contention notifications for partially acquired locks</title>
<updated>2019-06-28T11:07:08+00:00</updated>
<author>
<name>Xavi Hernandez</name>
<email>xhernandez@redhat.com</email>
</author>
<published>2019-05-09T09:07:18+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=257fee6312513498ee70765b9f343d5df77b1fce'/>
<id>257fee6312513498ee70765b9f343d5df77b1fce</id>
<content type='text'>
EC was ignoring lock contention notifications received while a lock was
being acquired. When a lock is partially acquired (some bricks have
granted the lock but others have not yet), we can receive notifications
from the bricks that have already granted it. These must be honored,
since we may not receive any further notifications afterwards.

Since EC was ignoring them, once the lock was acquired, it was not
released until the eager-lock timeout, causing unnecessary delays on
other clients.

This fix takes into account the notifications received before the full
lock acquisition has completed. Once the lock is fully acquired, it is
released as soon as possible.
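
A minimal sketch of the idea, with hypothetical names rather than the
actual ec xlator code: a contention notification that arrives while the
acquisition is still in flight is recorded, and acted on as soon as the
acquisition completes.

#include &lt;stdbool.h&gt;

/* Hypothetical lock state; not the real ec_lock_t. */
typedef struct {
    bool acquired;       /* all bricks granted the lock    */
    bool release_needed; /* contention seen, release early */
} eager_lock;

void do_release(eager_lock *lock)
{
    lock-&gt;acquired = false;
    lock-&gt;release_needed = false;
}

/* Upcall from a brick that has already granted the lock. */
void on_contention(eager_lock *lock)
{
    /* Honor the notification even if the acquisition is only
     * partial; that brick will not notify us again. */
    lock-&gt;release_needed = true;
    if (lock-&gt;acquired)
        do_release(lock);
}

/* The last brick granted the lock. */
void on_acquired(eager_lock *lock)
{
    lock-&gt;acquired = true;
    if (lock-&gt;release_needed)
        do_release(lock); /* no eager-lock timeout wait */
}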

Backport of:
&gt; BUG: bz#1708156
&gt; Change-Id: I2a306dbdb29fb557dcab7788a258bd75d826cc12
&gt; Signed-off-by: Xavi Hernandez &lt;xhernandez@redhat.com&gt;

Fixes: bz#1717282
Change-Id: I2a306dbdb29fb557dcab7788a258bd75d826cc12
Signed-off-by: Xavi Hernandez &lt;xhernandez@redhat.com&gt;
Signed-off-by: Hari Gowtham &lt;hgowtham@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
EC was ignoring lock contention notifications received while a lock was
being acquired. When a lock is partially acquired (some bricks have
granted the lock but others have not yet), we can receive notifications
from the bricks that have already granted it. These must be honored,
since we may not receive any further notifications afterwards.

Since EC was ignoring them, once the lock was acquired, it was not
released until the eager-lock timeout, causing unnecessary delays on
other clients.

This fix takes into account the notifications received before the full
lock acquisition has completed. Once the lock is fully acquired, it is
released as soon as possible.
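
A minimal sketch of the idea, with hypothetical names rather than the
actual ec xlator code: a contention notification that arrives while the
acquisition is still in flight is recorded, and acted on as soon as the
acquisition completes.

#include &lt;stdbool.h&gt;

/* Hypothetical lock state; not the real ec_lock_t. */
typedef struct {
    bool acquired;       /* all bricks granted the lock    */
    bool release_needed; /* contention seen, release early */
} eager_lock;

void do_release(eager_lock *lock)
{
    lock-&gt;acquired = false;
    lock-&gt;release_needed = false;
}

/* Upcall from a brick that has already granted the lock. */
void on_contention(eager_lock *lock)
{
    /* Honor the notification even if the acquisition is only
     * partial; that brick will not notify us again. */
    lock-&gt;release_needed = true;
    if (lock-&gt;acquired)
        do_release(lock);
}

/* The last brick granted the lock. */
void on_acquired(eager_lock *lock)
{
    lock-&gt;acquired = true;
    if (lock-&gt;release_needed)
        do_release(lock); /* no eager-lock timeout wait */
}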

Backport of:
&gt; BUG: bz#1708156
&gt; Change-Id: I2a306dbdb29fb557dcab7788a258bd75d826cc12
&gt; Signed-off-by: Xavi Hernandez &lt;xhernandez@redhat.com&gt;

Fixes: bz#1717282
Change-Id: I2a306dbdb29fb557dcab7788a258bd75d826cc12
Signed-off-by: Xavi Hernandez &lt;xhernandez@redhat.com&gt;
Signed-off-by: Hari Gowtham &lt;hgowtham@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ec: fix truncate lock to cover the write in truncate clean</title>
<updated>2019-05-08T14:20:13+00:00</updated>
<author>
<name>Kinglong Mee</name>
<email>kinglongmee@gmail.com</email>
</author>
<published>2019-04-12T03:35:55+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=515866cd49aadf52c9949f8745fd2ba1ea9c99c3'/>
<id>515866cd49aadf52c9949f8745fd2ba1ea9c99c3</id>
<content type='text'>
ec_truncate_clean performs a write under the lock granted for truncate,
but the lock offset is calculated with ec_adjust_offset_up, so the
write issued by ec_truncate_clean falls outside the locked range.
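
A minimal sketch of the arithmetic, with a hypothetical stripe size and
function names: truncating to an unaligned size means rewriting the
partial stripe that starts at the down-adjusted offset, so a lock that
starts at the up-adjusted offset cannot cover that write.

#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;

/* Round an offset down/up to a stripe boundary. */
static uint64_t adjust_down(uint64_t off, uint64_t stripe)
{
    return off - (off % stripe);
}

static uint64_t adjust_up(uint64_t off, uint64_t stripe)
{
    return adjust_down(off + stripe - 1, stripe);
}

int main(void)
{
    uint64_t stripe = 512;  /* hypothetical stripe size */
    uint64_t size = 1000;   /* truncate target          */

    /* The tail stripe [512, 1024) must be read, zero-padded and
     * rewritten. A lock starting at adjust_up(size) = 1024
     * misses it; it must start at adjust_down(size) = 512. */
    printf("write starts at %llu, up-adjusted lock at %llu\n",
           (unsigned long long)adjust_down(size, stripe),
           (unsigned long long)adjust_up(size, stripe));
    return 0;
}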

Updates: bz#1699500
Change-Id: I15ed1b0807d75c5eb817323f1c227e97d03e0e7c
Signed-off-by: Kinglong Mee &lt;kinglongmee@gmail.com&gt;
(cherry picked from commit 0e1223491e964096384edfae5032ed0d50d028ad)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
ec_truncate_clean performs a write under the lock granted for truncate,
but the lock offset is calculated with ec_adjust_offset_up, so the
write issued by ec_truncate_clean falls outside the locked range.
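
A minimal sketch of the arithmetic, with a hypothetical stripe size and
function names: truncating to an unaligned size means rewriting the
partial stripe that starts at the down-adjusted offset, so a lock that
starts at the up-adjusted offset cannot cover that write.

#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;

/* Round an offset down/up to a stripe boundary. */
static uint64_t adjust_down(uint64_t off, uint64_t stripe)
{
    return off - (off % stripe);
}

static uint64_t adjust_up(uint64_t off, uint64_t stripe)
{
    return adjust_down(off + stripe - 1, stripe);
}

int main(void)
{
    uint64_t stripe = 512;  /* hypothetical stripe size */
    uint64_t size = 1000;   /* truncate target          */

    /* The tail stripe [512, 1024) must be read, zero-padded and
     * rewritten. A lock starting at adjust_up(size) = 1024
     * misses it; it must start at adjust_down(size) = 512. */
    printf("write starts at %llu, up-adjusted lock at %llu\n",
           (unsigned long long)adjust_down(size, stripe),
           (unsigned long long)adjust_up(size, stripe));
    return 0;
}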

Updates: bz#1699500
Change-Id: I15ed1b0807d75c5eb817323f1c227e97d03e0e7c
Signed-off-by: Kinglong Mee &lt;kinglongmee@gmail.com&gt;
(cherry picked from commit 0e1223491e964096384edfae5032ed0d50d028ad)
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/afr: Remove local from owners_list on failure of lock-acquisition</title>
<updated>2019-05-08T14:07:12+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2019-04-04T10:01:56+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=65505150d665b207e2eb2143599a772c39a648f8'/>
<id>65505150d665b207e2eb2143599a772c39a648f8</id>
<content type='text'>
When eager-lock lock acquisition fails because of, say, a network
failure, the local was not being removed from owners_list. This led to
an accumulation of waiting frames, and the application would hang
because the waiting frames assumed that another transaction was still
in the process of acquiring the lock, since owners_list was not empty.
This patch handles that case as well, and adds asserts to make such
problems easier to find in the future.
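
A minimal sketch of the cleanup, using generic list code with
hypothetical names rather than the real afr structures: on lock
failure the local must be unlinked from owners_list, or waiters stall
forever.

#include &lt;assert.h&gt;

/* Generic intrusive list; stands in for the real afr lists. */
struct list { struct list *prev, *next; };

void list_del(struct list *n)
{
    n-&gt;prev-&gt;next = n-&gt;next;
    n-&gt;next-&gt;prev = n-&gt;prev;
    n-&gt;prev = n-&gt;next = n;
}

struct local {
    struct list owners; /* membership in owners_list */
    int lock_acquired;
};

/* Lock-acquisition callback: op_ret &lt; 0 means e.g. a network
 * failure while acquiring the eager lock. */
void lock_cbk(struct local *local, int op_ret)
{
    if (op_ret &lt; 0) {
        /* Leave owners_list, otherwise waiting frames assume a
         * transaction is still acquiring the lock and hang. */
        list_del(&amp;local-&gt;owners);
        assert(local-&gt;lock_acquired == 0);
        return;
    }
    local-&gt;lock_acquired = 1;
}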

fixes bz#1699736
Change-Id: I3101393265e9827755725b1f2d94a93d8709e923
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When eager-lock lock acquisition fails because of, say, a network
failure, the local was not being removed from owners_list. This led to
an accumulation of waiting frames, and the application would hang
because the waiting frames assumed that another transaction was still
in the process of acquiring the lock, since owners_list was not empty.
This patch handles that case as well, and adds asserts to make such
problems easier to find in the future.
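
A minimal sketch of the cleanup, using generic list code with
hypothetical names rather than the real afr structures: on lock
failure the local must be unlinked from owners_list, or waiters stall
forever.

#include &lt;assert.h&gt;

/* Generic intrusive list; stands in for the real afr lists. */
struct list { struct list *prev, *next; };

void list_del(struct list *n)
{
    n-&gt;prev-&gt;next = n-&gt;next;
    n-&gt;next-&gt;prev = n-&gt;prev;
    n-&gt;prev = n-&gt;next = n;
}

struct local {
    struct list owners; /* membership in owners_list */
    int lock_acquired;
};

/* Lock-acquisition callback: op_ret &lt; 0 means e.g. a network
 * failure while acquiring the eager lock. */
void lock_cbk(struct local *local, int op_ret)
{
    if (op_ret &lt; 0) {
        /* Leave owners_list, otherwise waiting frames assume a
         * transaction is still acquiring the lock and hang. */
        list_del(&amp;local-&gt;owners);
        assert(local-&gt;lock_acquired == 0);
        return;
    }
    local-&gt;lock_acquired = 1;
}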

fixes bz#1699736
Change-Id: I3101393265e9827755725b1f2d94a93d8709e923
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/dht: Request linkto xattrs in dht_rmdir opendir</title>
<updated>2019-04-10T07:51:40+00:00</updated>
<author>
<name>N Balachandran</name>
<email>nbalacha@redhat.com</email>
</author>
<published>2019-02-06T04:28:55+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=c94acb9efdba4d837f1029cdcbdec051a394447a'/>
<id>c94acb9efdba4d837f1029cdcbdec051a394447a</id>
<content type='text'>
If parallel-readdir is enabled, the rda xlator is loaded
below dht in the graph and proactively lists and caches
entries when an opendir is performed. dht_rmdir checks
whether the directory being deleted contains stale linkto
files by performing a readdirp on its child subvols.
However, as the entries are actually read in during the
opendir operation, which does not request the linkto xattr,
no linkto xattrs are present for the entries, causing dht
to incorrectly identify them as data files and fail the
rmdir operation with ENOTEMPTY. DHT now always adds the
linkto xattr to the list of xattrs requested in the opendir.
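
A minimal sketch of the shape of the fix, with a hypothetical
dict-style helper rather than the actual dht code: the xattrs a
readdirp reply carries are the ones requested at opendir time, so the
linkto key has to be added there.

#include &lt;stdio.h&gt;

/* Hypothetical stand-in for the dict of requested xattr keys. */
#define MAX_KEYS 8

struct xattr_req {
    const char *keys[MAX_KEYS];
    int nkeys;
};

int xattr_req_add(struct xattr_req *req, const char *key)
{
    if (req-&gt;nkeys == MAX_KEYS)
        return -1;
    req-&gt;keys[req-&gt;nkeys++] = key;
    return 0;
}

int main(void)
{
    struct xattr_req req = { .nkeys = 0 };

    /* Previously only the usual lookup keys were requested; with
     * the linkto key always included, the entries rda caches at
     * opendir time carry it and stale linkto files are detected. */
    xattr_req_add(&amp;req, "trusted.glusterfs.dht.linkto");
    printf("requesting %s in opendir\n", req.keys[0]);
    return 0;
}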

Change-Id: I0711198e66c59146282eb8b88084170bedfb4018
fixes: bz#1695399
Signed-off-by: N Balachandran &lt;nbalacha@redhat.com&gt;
(cherry picked from commit 110006bbcd5bb3e814b4cfe7d74cb41891ac3b0c)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If parallel-readdir is enabled, the rda xlator is loaded
below dht in the graph and proactively lists and caches
entries when an opendir is performed. dht_rmdir checks
whether the directory being deleted contains stale linkto
files by performing a readdirp on its child subvols.
However, as the entries are actually read in during the
opendir operation, which does not request the linkto xattr,
no linkto xattrs are present for the entries, causing dht
to incorrectly identify them as data files and fail the
rmdir operation with ENOTEMPTY. DHT now always adds the
linkto xattr to the list of xattrs requested in the opendir.
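
A minimal sketch of the shape of the fix, with a hypothetical
dict-style helper rather than the actual dht code: the xattrs a
readdirp reply carries are the ones requested at opendir time, so the
linkto key has to be added there.

#include &lt;stdio.h&gt;

/* Hypothetical stand-in for the dict of requested xattr keys. */
#define MAX_KEYS 8

struct xattr_req {
    const char *keys[MAX_KEYS];
    int nkeys;
};

int xattr_req_add(struct xattr_req *req, const char *key)
{
    if (req-&gt;nkeys == MAX_KEYS)
        return -1;
    req-&gt;keys[req-&gt;nkeys++] = key;
    return 0;
}

int main(void)
{
    struct xattr_req req = { .nkeys = 0 };

    /* Previously only the usual lookup keys were requested; with
     * the linkto key always included, the entries rda caches at
     * opendir time carry it and stale linkto files are detected. */
    xattr_req_add(&amp;req, "trusted.glusterfs.dht.linkto");
    printf("requesting %s in opendir\n", req.keys[0]);
    return 0;
}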

Change-Id: I0711198e66c59146282eb8b88084170bedfb4018
fixes: bz#1695399
Signed-off-by: N Balachandran &lt;nbalacha@redhat.com&gt;
(cherry picked from commit 110006bbcd5bb3e814b4cfe7d74cb41891ac3b0c)
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/dht: Fix lookup selfheal and rmdir race</title>
<updated>2019-04-08T14:06:17+00:00</updated>
<author>
<name>N Balachandran</name>
<email>nbalacha@redhat.com</email>
</author>
<published>2019-02-12T08:28:54+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=28bf5c049df5ccf117a049fcf44a388ebee73d26'/>
<id>28bf5c049df5ccf117a049fcf44a388ebee73d26</id>
<content type='text'>
A race between the lookup selfheal and rmdir can cause
directories to be healed only on non-hashed subvols.
This can prevent the directory from being listed from
the mount point and, in turn, cause rm -rf to fail with
ENOTEMPTY.
Fix: Update the layout information correctly, and reduce
the call count only after processing the response.
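
A minimal sketch of the ordering, using generic fan-out callback code
with hypothetical names: if the call count is dropped before the
response is merged, the last-response logic can run against a
partially updated layout.

#include &lt;pthread.h&gt;
#include &lt;stdatomic.h&gt;

/* Hypothetical per-operation state for a fan-out call. */
struct op {
    pthread_mutex_t lock;
    int layout_entries; /* stands in for the dht layout      */
    atomic_int call_cnt; /* one per subvol the op was sent to */
};

void subvol_cbk(struct op *op)
{
    /* 1. Merge this subvol's response into the layout first. */
    pthread_mutex_lock(&amp;op-&gt;lock);
    op-&gt;layout_entries++;
    pthread_mutex_unlock(&amp;op-&gt;lock);

    /* 2. Only then drop the call count; whoever drops it to
     * zero sees a fully updated layout. Decrementing first
     * lets the "last" callback act on a partial layout. */
    if (atomic_fetch_sub(&amp;op-&gt;call_cnt, 1) == 1) {
        /* resume the fop here */
    }
}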

Change-Id: I812779aaf3d7bcf24aab1cb158cb6ed50d212451
fixes: bz#1695403
Signed-off-by: N Balachandran &lt;nbalacha@redhat.com&gt;
(cherry picked from commit b0f1d782fc45313fce4e1c0e74127401d5342d05)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A race between the lookup selfheal and rmdir can cause
directories to be healed only on non-hashed subvols.
This can prevent the directory from being listed from
the mount point and, in turn, cause rm -rf to fail with
ENOTEMPTY.
Fix: Update the layout information correctly, and reduce
the call count only after processing the response.
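
A minimal sketch of the ordering, using generic fan-out callback code
with hypothetical names: if the call count is dropped before the
response is merged, the last-response logic can run against a
partially updated layout.

#include &lt;pthread.h&gt;
#include &lt;stdatomic.h&gt;

/* Hypothetical per-operation state for a fan-out call. */
struct op {
    pthread_mutex_t lock;
    int layout_entries; /* stands in for the dht layout      */
    atomic_int call_cnt; /* one per subvol the op was sent to */
};

void subvol_cbk(struct op *op)
{
    /* 1. Merge this subvol's response into the layout first. */
    pthread_mutex_lock(&amp;op-&gt;lock);
    op-&gt;layout_entries++;
    pthread_mutex_unlock(&amp;op-&gt;lock);

    /* 2. Only then drop the call count; whoever drops it to
     * zero sees a fully updated layout. Decrementing first
     * lets the "last" callback act on a partial layout. */
    if (atomic_fetch_sub(&amp;op-&gt;call_cnt, 1) == 1) {
        /* resume the fop here */
    }
}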

Change-Id: I812779aaf3d7bcf24aab1cb158cb6ed50d212451
fixes: bz#1695403
Signed-off-by: N Balachandran &lt;nbalacha@redhat.com&gt;
(cherry picked from commit b0f1d782fc45313fce4e1c0e74127401d5342d05)
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/afr: Send truncate on arbiter brick from SHD</title>
<updated>2019-03-12T07:21:26+00:00</updated>
<author>
<name>karthik-us</name>
<email>ksubrahm@redhat.com</email>
</author>
<published>2019-03-07T16:56:49+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=a086c6473875ab94a4a08baf454eac567f575b06'/>
<id>a086c6473875ab94a4a08baf454eac567f575b06</id>
<content type='text'>
Problem:
In an arbiter volume configuration, SHD will not send any writes to the
arbiter brick even if there is a data pending marker for it. With an
arbiter setup on the geo-rep master, if there are data pending markers
for files on the arbiter brick, SHD will not mark any data changelog
during healing. While syncing the data from master to slave, if the
arbiter brick is considered ACTIVE, there is a chance that the slave
will miss some data. If the arbiter brick has been newly added or
replaced, the slave may miss all the data during the sync.

Fix:
If there is a data pending marker for the arbiter brick, send a
truncate to the arbiter brick during heal, so that it records truncate
as the data transaction in the changelog.
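
A minimal sketch of the heal decision, with hypothetical names rather
than the actual shd code: the arbiter stores no data, so a truncate
stands in for the write purely to record a data transaction in the
changelog.

#include &lt;stdbool.h&gt;

/* Hypothetical per-sink view of a file under heal. */
struct sink {
    bool is_arbiter;
    bool data_pending; /* pending data marker names this brick */
};

/* Which fop to wind to this sink during data heal. */
const char *heal_data_fop(const struct sink *s)
{
    if (!s-&gt;data_pending)
        return "none";
    if (s-&gt;is_arbiter)
        /* No data lives on the arbiter; a truncate still records
         * a data transaction in the changelog, so geo-rep will
         * not treat the brick as already in sync. */
        return "ftruncate";
    return "writev";
}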

Change-Id: I3242ba6cea6da495c418ef860d9c3359c5459dec
fixes: bz#1687687
Signed-off-by: karthik-us &lt;ksubrahm@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
In an arbiter volume configuration, SHD will not send any writes to the
arbiter brick even if there is a data pending marker for it. With an
arbiter setup on the geo-rep master, if there are data pending markers
for files on the arbiter brick, SHD will not mark any data changelog
during healing. While syncing the data from master to slave, if the
arbiter brick is considered ACTIVE, there is a chance that the slave
will miss some data. If the arbiter brick has been newly added or
replaced, the slave may miss all the data during the sync.

Fix:
If there is a data pending marker for the arbiter brick, send a
truncate to the arbiter brick during heal, so that it records truncate
as the data transaction in the changelog.
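
A minimal sketch of the heal decision, with hypothetical names rather
than the actual shd code: the arbiter stores no data, so a truncate
stands in for the write purely to record a data transaction in the
changelog.

#include &lt;stdbool.h&gt;

/* Hypothetical per-sink view of a file under heal. */
struct sink {
    bool is_arbiter;
    bool data_pending; /* pending data marker names this brick */
};

/* Which fop to wind to this sink during data heal. */
const char *heal_data_fop(const struct sink *s)
{
    if (!s-&gt;data_pending)
        return "none";
    if (s-&gt;is_arbiter)
        /* No data lives on the arbiter; a truncate still records
         * a data transaction in the changelog, so geo-rep will
         * not treat the brick as already in sync. */
        return "ftruncate";
    return "writev";
}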

Change-Id: I3242ba6cea6da495c418ef860d9c3359c5459dec
fixes: bz#1687687
Signed-off-by: karthik-us &lt;ksubrahm@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/thin-arbiter: Consider thin-arbiter before marking new entry changelog</title>
<updated>2019-02-18T14:39:55+00:00</updated>
<author>
<name>Ashish Pandey</name>
<email>aspandey@redhat.com</email>
</author>
<published>2018-12-21T09:01:15+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=b2f561c069d38f5075907b9f5a255f585a602c79'/>
<id>b2f561c069d38f5075907b9f5a255f585a602c79</id>
<content type='text'>
If a fop that creates an entry fails on one of the data bricks,
we mark the pending changelog on the entry on the brick where it
succeeded. This is done as part of the post-op phase to make sure
that the entry gets healed even if it gets renamed to some other
path whose parent was not marked as bad.

As this happens as part of the post-op, we should consult the
thin-arbiter to check whether the brick that succeeded is the good
brick or not. This avoids split-brain and other issues.
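
A minimal sketch of the check, with hypothetical names rather than the
real afr thin-arbiter state: before marking pending changelog on
behalf of the surviving brick, make sure the thin-arbiter does not
already consider that brick bad.

#include &lt;stdbool.h&gt;

/* Hypothetical in-memory thin-arbiter verdict. */
enum ta_state { TA_UNKNOWN, TA_BRICK0_BAD, TA_BRICK1_BAD };

/* May we mark pending changelog on behalf of `brick`? Only if
 * the thin-arbiter does not already consider that brick bad. */
bool can_mark_pending(enum ta_state state, int brick)
{
    if (state == TA_BRICK0_BAD &amp;&amp; brick == 0)
        return false;
    if (state == TA_BRICK1_BAD &amp;&amp; brick == 1)
        return false;
    return true;
}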

&gt;Change-Id: I12686675be98f02f70a5186b3ed748c541514d53
&gt;Signed-off-by: Ashish Pandey &lt;aspandey@redhat.com&gt;

Change-Id: I12686675be98f02f70a5186b3ed748c541514d53
updates: bz#1672314
Signed-off-by: Ashish Pandey &lt;aspandey@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If a fop that creates an entry fails on one of the data bricks,
we mark the pending changelog on the entry on the brick where it
succeeded. This is done as part of the post-op phase to make sure
that the entry gets healed even if it gets renamed to some other
path whose parent was not marked as bad.

As this happens as part of the post-op, we should consult the
thin-arbiter to check whether the brick that succeeded is the good
brick or not. This avoids split-brain and other issues.
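
A minimal sketch of the check, with hypothetical names rather than the
real afr thin-arbiter state: before marking pending changelog on
behalf of the surviving brick, make sure the thin-arbiter does not
already consider that brick bad.

#include &lt;stdbool.h&gt;

/* Hypothetical in-memory thin-arbiter verdict. */
enum ta_state { TA_UNKNOWN, TA_BRICK0_BAD, TA_BRICK1_BAD };

/* May we mark pending changelog on behalf of `brick`? Only if
 * the thin-arbiter does not already consider that brick bad. */
bool can_mark_pending(enum ta_state state, int brick)
{
    if (state == TA_BRICK0_BAD &amp;&amp; brick == 0)
        return false;
    if (state == TA_BRICK1_BAD &amp;&amp; brick == 1)
        return false;
    return true;
}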

&gt;Change-Id: I12686675be98f02f70a5186b3ed748c541514d53
&gt;Signed-off-by: Ashish Pandey &lt;aspandey@redhat.com&gt;

Change-Id: I12686675be98f02f70a5186b3ed748c541514d53
updates: bz#1672314
Signed-off-by: Ashish Pandey &lt;aspandey@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/dht: Delete invalid linkto files in rmdir</title>
<updated>2019-02-04T14:51:06+00:00</updated>
<author>
<name>N Balachandran</name>
<email>nbalacha@redhat.com</email>
</author>
<published>2019-01-21T09:45:51+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=05ba01297657deef881aa9cd2770bbe760abf090'/>
<id>05ba01297657deef881aa9cd2770bbe760abf090</id>
<content type='text'>
rm -rf &lt;dir&gt; fails on dirs which contain linkto files
that point to themselves, because dht incorrectly treated
them as cached files after looking them up.
The fix treats them as invalid linkto files
and deletes them.
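
A minimal sketch of the classification, with hypothetical names rather
than the real dht code: a valid linkto file points at the subvol
holding the data file, so one that points back at the subvol it was
found on must be stale.

#include &lt;stdbool.h&gt;
#include &lt;string.h&gt;

/* Is this linkto file valid, given the subvol it was found on
 * and the subvol its linkto xattr names? */
bool linkto_is_valid(const char *found_on, const char *points_to)
{
    /* A linkto entry pointing to itself can never resolve to a
     * cached data file; treat it as invalid and delete it. */
    if (strcmp(found_on, points_to) == 0)
        return false;
    return true;
}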

Change-Id: I376c72a5309714ee339c74485e02cfb4e29be643
fixes: bz#1671611
Signed-off-by: N Balachandran &lt;nbalacha@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
rm -rf &lt;dir&gt; fails on dirs which contain linkto files
that point to themselves, because dht incorrectly treated
them as cached files after looking them up.
The fix treats them as invalid linkto files
and deletes them.
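
A minimal sketch of the classification, with hypothetical names rather
than the real dht code: a valid linkto file points at the subvol
holding the data file, so one that points back at the subvol it was
found on must be stale.

#include &lt;stdbool.h&gt;
#include &lt;string.h&gt;

/* Is this linkto file valid, given the subvol it was found on
 * and the subvol its linkto xattr names? */
bool linkto_is_valid(const char *found_on, const char *points_to)
{
    /* A linkto entry pointing to itself can never resolve to a
     * cached data file; treat it as invalid and delete it. */
    if (strcmp(found_on, points_to) == 0)
        return false;
    return true;
}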

Change-Id: I376c72a5309714ee339c74485e02cfb4e29be643
fixes: bz#1671611
Signed-off-by: N Balachandran &lt;nbalacha@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/dht: sync brick root perms on add brick</title>
<updated>2018-12-26T16:40:30+00:00</updated>
<author>
<name>N Balachandran</name>
<email>nbalacha@redhat.com</email>
</author>
<published>2018-11-09T11:36:22+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=e2f1bd54bd675e126e195d81a67b1d4331b77551'/>
<id>e2f1bd54bd675e126e195d81a67b1d4331b77551</id>
<content type='text'>
If a single brick is added to the volume and the
newly added brick is the first to respond to a
dht_revalidate call, its stbuf will not be merged
into local-&gt;stbuf as the brick does not yet have
a layout. The is_permission_different check therefore
fails to detect that an attr heal is required as it
only considers the stbuf values from existing bricks.
To fix this, merge all stbuf values into local-&gt;stbuf
and use local-&gt;prebuf to store the correct directory
attributes.
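
A minimal sketch of the permission check, using generic code with
hypothetical names: merging every reply's stbuf, including the one
from a brick that has no layout yet, is what lets the check see the
new brick's root permissions.

#include &lt;stdbool.h&gt;
#include &lt;sys/stat.h&gt;

/* Record the first reply's stat, then compare each later reply
 * against it. Returns true when an attr heal is needed. */
bool stbuf_perm_differs(struct stat *acc, const struct stat *reply,
                        bool *have_acc)
{
    if (!*have_acc) {
        *acc = *reply;
        *have_acc = true;
        return false;
    }
    /* Consider every brick, including a freshly added one that
     * has no layout yet; skipping it hides the needed attr heal. */
    return (acc-&gt;st_mode &amp; 07777) != (reply-&gt;st_mode &amp; 07777);
}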

Change-Id: Ic9e8b04a1ab9ed1248b6b056e3450bbafe32e1bc
fixes: bz#1660736
Signed-off-by: N Balachandran &lt;nbalacha@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If a single brick is added to the volume and the
newly added brick is the first to respond to a
dht_revalidate call, its stbuf will not be merged
into local-&gt;stbuf as the brick does not yet have
a layout. The is_permission_different check therefore
fails to detect that an attr heal is required as it
only considers the stbuf values from existing bricks.
To fix this, merge all stbuf values into local-&gt;stbuf
and use local-&gt;prebuf to store the correct directory
attributes.
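
A minimal sketch of the permission check, using generic code with
hypothetical names: merging every reply's stbuf, including the one
from a brick that has no layout yet, is what lets the check see the
new brick's root permissions.

#include &lt;stdbool.h&gt;
#include &lt;sys/stat.h&gt;

/* Record the first reply's stat, then compare each later reply
 * against it. Returns true when an attr heal is needed. */
bool stbuf_perm_differs(struct stat *acc, const struct stat *reply,
                        bool *have_acc)
{
    if (!*have_acc) {
        *acc = *reply;
        *have_acc = true;
        return false;
    }
    /* Consider every brick, including a freshly added one that
     * has no layout yet; skipping it hides the needed attr heal. */
    return (acc-&gt;st_mode &amp; 07777) != (reply-&gt;st_mode &amp; 07777);
}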

Change-Id: Ic9e8b04a1ab9ed1248b6b056e3450bbafe32e1bc
fixes: bz#1660736
Signed-off-by: N Balachandran &lt;nbalacha@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>afr: thin-arbiter 2 domain locking and in-memory state</title>
<updated>2018-12-12T14:26:42+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2018-09-23T11:29:58+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=d7a4d256bd86aadcd60668ee37079514dfcf41f3'/>
<id>d7a4d256bd86aadcd60668ee37079514dfcf41f3</id>
<content type='text'>
2 domain locking + xattrop for write-txn failures:
--------------------------------------------------
- A post-op wound on TA takes AFR_TA_DOM_NOTIFY range lock and
AFR_TA_DOM_MODIFY full lock, does xattrop on TA and releases
AFR_TA_DOM_MODIFY lock and stores in-memory which brick is bad.

- All further write txn failures are handled based on this in-memory
value without querying the TA.

- When shd heals the files, it does so by requesting full lock on
AFR_TA_DOM_NOTIFY domain. Client uses this as a cue (via upcall),
releases AFR_TA_DOM_NOTIFY range lock and invalidates its in-memory
notion of which brick is bad. The next write txn failure is wound on TA
to again update the in-memory state.

- Any write txns that are still incomplete when the AFR_TA_DOM_NOTIFY
upcall release request is received are completed before the lock is
released.

- Any write txns received after the release request are queued in a
ta_waitq.

- After the release is complete, the ta_waitq elements are spliced to a
separate queue which is then processed one by one.

- For fops that come in parallel when the in-memory bad brick is still
unknown, only one is wound to TA on the wire. The others are queued
in a ta_onwireq, which is processed after we get the response from TA
(see the queue-handling sketch below).
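
A minimal sketch of the queue handling, using generic list code with
hypothetical names rather than the real afr structures: once the
release completes, ta_waitq is spliced onto a private list and
drained, so that txns queued meanwhile accumulate on ta_waitq
undisturbed.

struct list { struct list *prev, *next; };

void list_init(struct list *h) { h-&gt;prev = h-&gt;next = h; }

int list_empty(const struct list *h) { return h-&gt;next == h; }

/* Splice all elements of `from` onto the head of `to`. */
void list_splice_init(struct list *from, struct list *to)
{
    if (list_empty(from))
        return;
    from-&gt;next-&gt;prev = to;
    from-&gt;prev-&gt;next = to-&gt;next;
    to-&gt;next-&gt;prev = from-&gt;prev;
    to-&gt;next = from-&gt;next;
    list_init(from);
}

/* After the NOTIFY lock release completes: move the queued txns
 * to a private list and resume them one by one. */
void process_ta_waitq(struct list *ta_waitq)
{
    struct list q;

    list_init(&amp;q);
    list_splice_init(ta_waitq, &amp;q);
    while (!list_empty(&amp;q)) {
        struct list *txn = q.next;
        txn-&gt;prev-&gt;next = txn-&gt;next; /* unlink */
        txn-&gt;next-&gt;prev = txn-&gt;prev;
        /* resume_txn(txn); */
    }
}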

Change-Id: I32c7b61a61776663601ab0040e2f0767eca1fd64
updates: bz#1648205
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
Signed-off-by: Ashish Pandey &lt;aspandey@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
2 domain locking + xattrop for write-txn failures:
--------------------------------------------------
- A post-op wound on TA takes AFR_TA_DOM_NOTIFY range lock and
AFR_TA_DOM_MODIFY full lock, does xattrop on TA and releases
AFR_TA_DOM_MODIFY lock and stores in-memory which brick is bad.

- All further write txn failures are handled based on this in-memory
value without querying the TA.

- When shd heals the files, it does so by requesting full lock on
AFR_TA_DOM_NOTIFY domain. Client uses this as a cue (via upcall),
releases AFR_TA_DOM_NOTIFY range lock and invalidates its in-memory
notion of which brick is bad. The next write txn failure is wound on TA
to again update the in-memory state.

- Any write txns that are still incomplete when the AFR_TA_DOM_NOTIFY
upcall release request is received are completed before the lock is
released.

- Any write txns received after the release request are queued in a
ta_waitq.

- After the release is complete, the ta_waitq elements are spliced to a
separate queue which is then processed one by one.

- For fops that come in parallel when the in-memory bad brick is still
unknown, only one is wound to TA on the wire. The others are queued
in a ta_onwireq, which is processed after we get the response from TA
(see the queue-handling sketch below).
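
A minimal sketch of the queue handling, using generic list code with
hypothetical names rather than the real afr structures: once the
release completes, ta_waitq is spliced onto a private list and
drained, so that txns queued meanwhile accumulate on ta_waitq
undisturbed.

struct list { struct list *prev, *next; };

void list_init(struct list *h) { h-&gt;prev = h-&gt;next = h; }

int list_empty(const struct list *h) { return h-&gt;next == h; }

/* Splice all elements of `from` onto the head of `to`. */
void list_splice_init(struct list *from, struct list *to)
{
    if (list_empty(from))
        return;
    from-&gt;next-&gt;prev = to;
    from-&gt;prev-&gt;next = to-&gt;next;
    to-&gt;next-&gt;prev = from-&gt;prev;
    to-&gt;next = from-&gt;next;
    list_init(from);
}

/* After the NOTIFY lock release completes: move the queued txns
 * to a private list and resume them one by one. */
void process_ta_waitq(struct list *ta_waitq)
{
    struct list q;

    list_init(&amp;q);
    list_splice_init(ta_waitq, &amp;q);
    while (!list_empty(&amp;q)) {
        struct list *txn = q.next;
        txn-&gt;prev-&gt;next = txn-&gt;next; /* unlink */
        txn-&gt;next-&gt;prev = txn-&gt;prev;
        /* resume_txn(txn); */
    }
}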

Change-Id: I32c7b61a61776663601ab0040e2f0767eca1fd64
updates: bz#1648205
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
Signed-off-by: Ashish Pandey &lt;aspandey@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
