glusterfs.git/xlators/cluster, branch release-3.12

afr: prevent winding inodelks twice for arbiter volumes

2018-10-12T04:09:30+00:00

Backport of https://review.gluster.org/#/c/glusterfs/+/21380/

Problem:
In an arbiter volume, if there is a pending data heal of a file only on
arbiter brick, self-heal takes inodelks twice due to a code-bug but unlocks
it only once, leaving behind a stale lock on the brick. This causes
the next write to the file to hang.

Fix:
Fix the code-bug to take lock only once. This bug was introduced master
with commit eb472d82a083883335bc494b87ea175ac43471ff

Thanks to  Pranith Kumar K  for finding the RCA.

fixes: bz#1637989
Change-Id: I15ad969e10a6a3c4bd255e2948b6be6dcddc61e1
BUG: 1637989
Signed-off-by: Ravishankar N

cluster/afr: Delegate metadata heal with pending xattrs to SHD

2018-10-12T04:08:57+00:00

Problem:
When metadata-self-heal is triggered on the mount, it blocks
lookup until metadata-self-heal completes. But that can lead
to hangs when lot of clients are accessing a directory which
needs metadata heal and all of them trigger heals waiting
for other clients to complete heal.

Fix:
Only when the heal is needed but the pending xattrs are not set,
trigger metadata heal that could block lookup. This is the only
case where different clients may give different metadata to the
clients without heals, which should be avoided.

Updates bz#1625588
Change-Id: I6089e9fda0770a83fb287941b229c882711f4e66
Signed-off-by: Pranith Kumar K

cluster/afr: Delegate name-heal when possible

2018-10-12T04:08:32+00:00

Problem:
When name-self-heal is triggered on the mount, it blocks
lookup until name-self-heal completes. But that can lead
to hangs when lot of clients are accessing a directory which
needs name heal and all of them trigger heals waiting
for other clients to complete heal.

Fix:
When a name-heal is needed but quorum number of names have the
file and pending xattrs exist on the parent, then better to
delegate the heal to SHD which will be completed as part of
entry-heal of the parent directory. We could also do the same
for quorum-number of names not present but we don't have
any known use-case where this is a frequent occurrence so
not changing that part at the moment. When there is a gfid
mismatch or missing gfid it is important to complete the heal
so that next rename doesn't assume everything is fine and
perform a rename etc

fixes bz#1625588
Change-Id: I8b002c85dffc6eb6f2833e742684a233daefeb2c
Signed-off-by: Pranith Kumar K

dht: Fill first_up_subvol before use in dht_opendir

2018-10-10T05:22:52+00:00

Reported by: Sam McLeod

Change-Id: Ic8f9b46b173796afd70aff1042834b03ac3e80b2
BUG: 1512371
Signed-off-by: Poornima G

afr: fix incorrect reporting of directory split-brain

2018-10-10T05:20:42+00:00

Backport of https://review.gluster.org/#/c/glusterfs/+/21135/

Problem:
When a directory has dirty xattrs due to failed post-ops or when
replace/reset brick is performed, AFR does a conservative merge as
expected, but heal-info reports it as split-brain because there are no
clear sources.

Fix:
Modify pending flag to contain information about pending heals and
split-brains. For directories, if spit-brain flag is not set,just show
them as needing heal and not being in split-brain.

Fixes: bz#1633625
Change-Id: I09ef821f6887c87d315ae99e6b1de05103cd9383
BUG: 1633625
Signed-off-by: Ravishankar N

cluster/afr: Fix dict-leak in pre-op

2018-08-17T05:34:35+00:00

At the time of pre-op, pre_op_xdata is populted with the xattrs we get from the
disk and at the time of post-op it gets over-written without unreffing the
previous value stored leading to a leak.
This is a regression we missed in
https://review.gluster.org/#/q/ba149bac92d169ae2256dbc75202dc9e5d06538e


Originally:
> Signed-off-by: Pranith Kumar K 
> (cherry picked from commit e7b79c59590c203c65f7ac8548b30d068c232d33)

Change-Id: I0456f9ad6f77ce6248b747964a037193af3a3da7
Fixes: bz#1613512
Signed-off-by: Amar Tumballi

afr: don't update readables if inode refresh failed on all children

2018-07-11T14:04:25+00:00

Backport of: https://review.gluster.org/#/c/20029/
3.12 still supports quorum-reads, hence modified afr_inode_refresh_done() to
support that.

If inode refresh failed on all children of afr due to ENOENT (say file
migrated by dht), it resets the readables to zero. Any inflight txn which
then later comes on the inode fails with EIO because no readable
children present for the inode.

Fix:
Don't update readables when inode refresh fails on *all* children of
afr. In that way any inflight txns will either proceed with its own inode
refresh if needed and fail it with the right errno or use the old value
of readables and continue with the txn.

Also, add quorum checks to the beginning of afr_transaction(). Otherwise, we
seem to be winding the lock and checking for quorum only in pre-op pahse.

Note: This should ideally fix BZ 1329505 since the stop gap fix for
it is has been reverted at https://review.gluster.org/#/c/20028.

Change-Id: I82990769f01be918a073fec83fc67ba4b3be24b1
BUG: 1599247
Signed-off-by: Ravishankar N

afr: heal gfids when file is not present on all bricks

2018-07-11T14:03:47+00:00

Backport of https://review.gluster.org/#/c/20271/ (only change is in .t)

commit 20fa80057eb430fd72b4fa31b9b65598b8ec1265 introduced a regression
wherein if a file is present in only 1 brick of replica *and* doesn't
have a gfid associated with it, it doesn't get healed upon the next
lookup from the client. Fix it.

Change-Id: I7d1111dcb45b1b8b8340a7d02558f05df70aa599
BUG: 1598121
fixes: bz#1598121
Signed-off-by: Ravishankar N 
(cherry picked from commit eb472d82a083883335bc494b87ea175ac43471ff)

afr: fix bug-1363721.t failure

2018-07-09T10:03:07+00:00

Backport of https://review.gluster.org/#/c/20036/
Note:  We need to update inode context's write_subvol even in case of compound
fops. This is not there in master and 4.1 since compound FOPS was removed in it.

Problem:
In the .t, when the only good brick was brought down, writes on the fd were
still succeeding on the bad bricks. The inflight split-brain check was
marking the write as failure but since the write succeeded on all the
bad bricks, afr_txn_nothing_failed() was set to true and we were
unwinding writev with success to DHT and then catching the failure in
post-op in the background.

Fix:
Don't wind the FOP phase if the write_subvol (which is populated with readable
subvols obtained in pre-op cbk) does not have at least 1 good brick which was up
when the transaction started.

Change-Id: I4a1fef4569609c31cffeaef591a64c10870e8d0b
BUG: 1598720
Signed-off-by: Ravishankar N

afr: add quorum checks in pre-op

2018-07-06T01:42:54+00:00

Backport of https://review.gluster.org/#/c/19781/

Problem:
We seem to be winding the FOP if pre-op did not succeed on quorum bricks
and then failing the FOP with EROFS since the fop did not meet quorum.
This essentially masks the actual error due to which pre-op failed. (See
BZ).

Fix:
Skip FOP phase if pre-op quorum is not met and go to post-op.

Change-Id: Ie58a41e8fa1ad79aa06093706e96db8eef61b6d9
BUG: 1597154
Signed-off-by: Ravishankar N