<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/xlators, branch v5.13</title>
<subtitle></subtitle>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/'/>
<entry>
<title>open-behind: fix missing fd reference</title>
<updated>2020-04-06T09:42:32+00:00</updated>
<author>
<name>Xavi Hernandez</name>
<email>xhernandez@redhat.com</email>
</author>
<published>2020-03-08T17:36:45+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=111c5f382f2ba6ac36b9ff702878214eb4d7381a'/>
<id>111c5f382f2ba6ac36b9ff702878214eb4d7381a</id>
<content type='text'>
Open behind was not keeping any reference on fds pending to be
opened. This made it possible for a concurrent close or an entry
fop (unlink, rename, ...) to destroy the fd while it was still
being used.

Change-Id: Ie9e992902cf2cd7be4af1f8b4e57af9bd6afd8e9
Fixes: #1028
Signed-off-by: Xavi Hernandez &lt;xhernandez@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Open behind was not keeping any reference on fds pending to be
opened. This made it possible for a concurrent close or an entry
fop (unlink, rename, ...) to destroy the fd while it was still
being used.

Change-Id: Ie9e992902cf2cd7be4af1f8b4e57af9bd6afd8e9
Fixes: #1028
Signed-off-by: Xavi Hernandez &lt;xhernandez@redhat.com&gt;
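
[Editor's note] A minimal, self-contained sketch of the pattern behind this
fix (hypothetical types and names, not the actual open-behind code): a
pending open holds its own reference on the fd, so a concurrent close or
entry fop cannot destroy the fd while it is still in use.

    #include &lt;assert.h&gt;
    #include &lt;stdlib.h&gt;

    typedef struct { int refcount; int opened; } fd_t;

    static fd_t *fd_ref(fd_t *fd) { fd-&gt;refcount++; return fd; }

    static void fd_unref(fd_t *fd) {
        assert(fd-&gt;refcount &gt; 0);
        if (--fd-&gt;refcount == 0)
            free(fd);              /* destroyed only when the last user is done */
    }

    /* Queueing a deferred open takes its own reference... */
    static fd_t *ob_open_pending(fd_t *fd) {
        return fd_ref(fd);         /* keeps fd alive across a concurrent close */
    }

    /* ...which is dropped only once the real open has completed. */
    static void ob_open_done(fd_t *fd) {
        fd-&gt;opened = 1;
        fd_unref(fd);
    }

    int main(void) {
        fd_t *fd = calloc(1, sizeof(*fd));
        fd-&gt;refcount = 1;                  /* application's reference */
        fd_t *pending = ob_open_pending(fd);
        fd_unref(fd);                      /* concurrent close: fd survives */
        ob_open_done(pending);             /* last reference dropped, fd freed */
        return 0;
    }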
</pre>
</div>
</content>
</entry>
<entry>
<title>features/shard: Fix crash during shards cleanup in error cases</title>
<updated>2020-04-06T09:41:56+00:00</updated>
<author>
<name>Krutika Dhananjay</name>
<email>kdhananj@redhat.com</email>
</author>
<published>2020-03-23T06:17:10+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=bd370c3d537a3d5a11387e38c1a2a3bd6e3a275d'/>
<id>bd370c3d537a3d5a11387e38c1a2a3bd6e3a275d</id>
<content type='text'>
A crash is seen when shard cleanup is reattempted in the background
after a remount, which means a remount is no workaround for the crash.

In such a situation, the in-memory base inode object will not exist
(new process, non-existent base shard), so local-&gt;resolver_base_inode
will be NULL.

In the event of an error (in this case, running out of space), the
process would crash at the time of logging the error in the following line -

        gf_msg(this-&gt;name, GF_LOG_ERROR, local-&gt;op_errno, SHARD_MSG_FOP_FAILED,
               "failed to delete shards of %s",
               uuid_utoa(local-&gt;resolver_base_inode-&gt;gfid));

Fixed that by using local-&gt;base_gfid as the source of gfid when
local-&gt;resolver_base_inode is NULL.

Change-Id: I0b49f2b58becd0d8874b3d4b14ff8d92a89d02d5
Fixes: #1127
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
(cherry picked from commit cc43ac8651de9aa508b01cb259b43c02d89b2afc)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A crash is seen when shard cleanup is reattempted in the background
after a remount, which means a remount is no workaround for the crash.

In such a situation, the in-memory base inode object will not exist
(new process, non-existent base shard), so local-&gt;resolver_base_inode
will be NULL.

In the event of an error (in this case, running out of space), the
process would crash at the time of logging the error in the following line -

        gf_msg(this-&gt;name, GF_LOG_ERROR, local-&gt;op_errno, SHARD_MSG_FOP_FAILED,
               "failed to delete shards of %s",
               uuid_utoa(local-&gt;resolver_base_inode-&gt;gfid));

Fixed that by using local-&gt;base_gfid as the source of gfid when
local-&gt;resolver_base_inode is NULL.

Change-Id: I0b49f2b58becd0d8874b3d4b14ff8d92a89d02d5
Fixes: #1127
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
(cherry picked from commit cc43ac8651de9aa508b01cb259b43c02d89b2afc)
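
[Editor's note] A minimal sketch of the guarded gfid selection described
above (stand-in types and plain logging; the real code uses gf_msg() and
uuid_utoa()): the gfid is taken from base_gfid whenever resolver_base_inode
is NULL.

    #include &lt;stdint.h&gt;
    #include &lt;stdio.h&gt;

    typedef struct { uint8_t gfid[16]; } inode_t;

    typedef struct {
        inode_t *resolver_base_inode;   /* may be NULL after a remount */
        uint8_t  base_gfid[16];         /* always populated */
        int      op_errno;
    } shard_local_t;

    static void log_cleanup_failure(const shard_local_t *local) {
        /* Fall back to base_gfid instead of dereferencing a NULL inode. */
        const uint8_t *gfid = local-&gt;resolver_base_inode
                                  ? local-&gt;resolver_base_inode-&gt;gfid
                                  : local-&gt;base_gfid;
        fprintf(stderr, "failed to delete shards of %02x%02x... (errno %d)\n",
                gfid[0], gfid[1], local-&gt;op_errno);
    }

    int main(void) {
        shard_local_t local = { .resolver_base_inode = NULL, .op_errno = 28 };
        log_cleanup_failure(&amp;local);    /* no crash even with a NULL inode */
        return 0;
    }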
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/afr: fix race when bricks come up</title>
<updated>2020-04-06T04:49:36+00:00</updated>
<author>
<name>Xavi Hernandez</name>
<email>xhernandez@redhat.com</email>
</author>
<published>2020-03-01T18:49:04+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=2cd80f4d3580ad442777e1171b6adfc889c7779e'/>
<id>2cd80f4d3580ad442777e1171b6adfc889c7779e</id>
<content type='text'>
There was a problem when self-heal was sending lookups at the same time
that one of the bricks was coming up. In this case there was a chance
that the number of 'up' bricks changed in the middle of sending the
requests to subvolumes, which caused a discrepancy between the expected
number of replies and the actual number of requests sent.

This discrepancy caused AFR to continue executing requests before all
requests were complete. Eventually, the frame of the pending request
was destroyed when the operation terminated, causing a use-after-free
issue when the answer was finally received.

In theory the same thing could happen in the reverse way, i.e. AFR
could wait for more replies than requests sent, causing a hang.

Backport of:
&gt; Change-Id: I7ed6108554ca379d532efb1a29b2de8085410b70
&gt; Signed-off-by: Xavi Hernandez &lt;xhernandez@redhat.com&gt;
&gt; Fixes: bz#1808875

Change-Id: I7ed6108554ca379d532efb1a29b2de8085410b70
Signed-off-by: Xavi Hernandez &lt;xhernandez@redhat.com&gt;
Fixes: bz#1809440
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There was a problem when self-heal was sending lookups at the same time
that one of the bricks was coming up. In this case there was a chance
that the number of 'up' bricks changed in the middle of sending the
requests to subvolumes, which caused a discrepancy between the expected
number of replies and the actual number of requests sent.

This discrepancy caused AFR to continue executing requests before all
requests were complete. Eventually, the frame of the pending request
was destroyed when the operation terminated, causing a use-after-free
issue when the answer was finally received.

In theory the same thing could happen in the reverse way, i.e. AFR
could wait for more replies than requests sent, causing a hang.

Backport of:
&gt; Change-Id: I7ed6108554ca379d532efb1a29b2de8085410b70
&gt; Signed-off-by: Xavi Hernandez &lt;xhernandez@redhat.com&gt;
&gt; Fixes: bz#1808875

Change-Id: I7ed6108554ca379d532efb1a29b2de8085410b70
Signed-off-by: Xavi Hernandez &lt;xhernandez@redhat.com&gt;
Fixes: bz#1809440
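
[Editor's note] The essence of the fix as a tiny sketch (hypothetical
names): a single snapshot of the 'up' state is taken, and both the requests
sent and the replies expected are derived from that same snapshot, so a
brick coming up mid-dispatch cannot make the two counts disagree.

    #include &lt;stdio.h&gt;

    #define CHILD_COUNT 3

    /* Shared state that another thread may flip while we dispatch. */
    static volatile unsigned char child_up[CHILD_COUNT] = {1, 0, 1};

    static int dispatch_lookups(void) {
        unsigned char up_snapshot[CHILD_COUNT];
        int expected = 0;

        /* Snapshot once; sends and expected-reply count both use it. */
        for (int i = 0; i &lt; CHILD_COUNT; i++) {
            up_snapshot[i] = child_up[i];
            if (up_snapshot[i])
                expected++;
        }

        for (int i = 0; i &lt; CHILD_COUNT; i++) {
            if (up_snapshot[i])
                printf("sending lookup to child %d\n", i);
            /* Re-reading child_up[i] here could disagree with 'expected'. */
        }
        return expected;
    }

    int main(void) {
        printf("expecting %d replies\n", dispatch_lookups());
        return 0;
    }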
</pre>
</div>
</content>
</entry>
<entry>
<title>afr: prevent spurious entry heals leading to gfid split-brain</title>
<updated>2020-04-06T04:46:50+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2020-02-11T09:04:48+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=71de96a43c799516342b325e5ab563763a9b6e75'/>
<id>71de96a43c799516342b325e5ab563763a9b6e75</id>
<content type='text'>
Problem:
In a hyperconverged setup with granular-entry-heal enabled, if a file is
recreated while one of the bricks is down, and an index heal is triggered
(with the brick still down), entry self-heal was doing a spurious heal
with just the 2 good bricks. It was doing a post-op leading to removal
of the filename from .glusterfs/indices/entry-changes as well as
erroneous setting of afr xattrs on the parent. When the brick came up,
the xattrs were cleared, resulting in the renamed file not getting
healed and leading to gfid split-brain and EIO on the mount.

Fix:
Proceed with entry heal only when shd can connect to all bricks of the replica,
just like in data and metadata heal.

fixes: #1103
Change-Id: I916ae26ad1fabf259bc6362da52d433b7223b17e
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
(cherry picked from commit 06453d77d056fbaa393a137ca277a20e38d2f67e)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
In a hyperconverged setup with granular-entry-heal enabled, if a file is
recreated while one of the bricks is down, and an index heal is triggered
(with the brick still down), entry self-heal was doing a spurious heal
with just the 2 good bricks. It was doing a post-op leading to removal
of the filename from .glusterfs/indices/entry-changes as well as
erroneous setting of afr xattrs on the parent. When the brick came up,
the xattrs were cleared, resulting in the renamed file not getting
healed and leading to gfid split-brain and EIO on the mount.

Fix:
Proceed with entry heal only when shd can connect to all bricks of the replica,
just like in data and metadata heal.

fixes: #1103
Change-Id: I916ae26ad1fabf259bc6362da52d433b7223b17e
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
(cherry picked from commit 06453d77d056fbaa393a137ca277a20e38d2f67e)
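
[Editor's note] A minimal sketch of the check this fix describes
(hypothetical names): entry heal proceeds only when shd is connected to
every brick of the replica, as data and metadata heal already require.

    #include &lt;stdio.h&gt;

    static int all_bricks_up(const unsigned char *child_up, int child_count) {
        for (int i = 0; i &lt; child_count; i++) {
            if (!child_up[i])
                return 0;
        }
        return 1;
    }

    int main(void) {
        unsigned char child_up[3] = {1, 1, 0};   /* one brick still down */

        if (!all_bricks_up(child_up, 3)) {
            printf("skipping entry heal: not all bricks are reachable\n");
            return 0;
        }
        printf("proceeding with entry heal\n");
        return 0;
    }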
</pre>
</div>
</content>
</entry>
<entry>
<title>afr: mark pending xattrs as a part of metadata heal</title>
<updated>2020-04-02T04:35:50+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2018-12-24T07:30:19+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=96abfede312435ae2745852c5020be5e2cbf12cc'/>
<id>96abfede312435ae2745852c5020be5e2cbf12cc</id>
<content type='text'>
...if pending xattrs are zero for all children.

Problem:
If there are no pending xattrs and a metadata heal needs to be
performed, it is possible that we end up with xattrs inadvertently
deleted from all bricks, as explained in the BZ.

Fix:
After picking one among the sources as the good copy, mark pending xattrs on
all sources to blame the sinks. Now even if this metadata heal fails midway,
a subsequent heal will still choose one of the valid sources that it
picked previously.

Updates: #1067
Change-Id: If1b050b70b0ad911e162c04db4d89b263e2b8d7b
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
(cherry picked from commit 2d5ba449e9200b16184b1e7fc84cabd015f1f779)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
...if pending xattrs are zero for all children.

Problem:
If there are no pending xattrs and a metadata heal needs to be
performed, it is possible that we end up with xattrs inadvertently
deleted from all bricks, as explained in the BZ.

Fix:
After picking one among the sources as the good copy, mark pending xattrs on
all sources to blame the sinks. Now even if this metadata heal fails midway,
a subsequent heal will still choose one of the valid sources that it
picked previously.

Updates: #1067
Change-Id: If1b050b70b0ad911e162c04db4d89b263e2b8d7b
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
(cherry picked from commit 2d5ba449e9200b16184b1e7fc84cabd015f1f779)
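
[Editor's note] A small sketch of the idea (hypothetical in-memory layout,
not the real xattr format): before healing, every source records a non-zero
pending count against each sink, so a heal that fails midway still leaves a
record of which copies were good.

    #include &lt;stdio.h&gt;

    #define CHILDREN 3

    int main(void) {
        int is_source[CHILDREN] = {1, 1, 0};      /* child 2 is the sink */
        int pending[CHILDREN][CHILDREN] = {{0}};  /* pending[i][j]: i blames j */

        /* Mark pending xattrs on all sources, blaming every sink. */
        for (int i = 0; i &lt; CHILDREN; i++) {
            if (!is_source[i])
                continue;
            for (int j = 0; j &lt; CHILDREN; j++) {
                if (!is_source[j])
                    pending[i][j]++;
            }
        }

        /* Even if the heal stops here, a later heal re-reads the pending
           counts and still picks one of the same sources. */
        for (int i = 0; i &lt; CHILDREN; i++)
            printf("child %d blames child 2: %d\n", i, pending[i][2]);
        return 0;
    }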
</pre>
</div>
</content>
</entry>
<entry>
<title>Cluster/afr: Don't treat all bricks having metadata pending as split-brain</title>
<updated>2020-03-02T09:24:06+00:00</updated>
<author>
<name>karthik-us</name>
<email>ksubrahm@redhat.com</email>
</author>
<published>2019-06-06T05:29:42+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=7c40f95744281f7a285ef3b81a87fa980243ad1e'/>
<id>7c40f95744281f7a285ef3b81a87fa980243ad1e</id>
<content type='text'>
Problem:
We currently don't have a roll-back/undoing of post-ops if quorum is not met.
Though the FOP is still unwound with failure, the xattrs remain on the disk.
Due to these partial post-ops and partial heals (healing only when 2 bricks
are up), we can end up in metadata split-brain purely from the afr xattrs
point of view, i.e. each brick is blamed by at least one of the others for
metadata. These scenarios are hit when there is frequent connect/disconnect
of the client/shd to the bricks.

Fix:
Pick a source based on the xattr values. If 2 bricks blame one, the blamed
one must be treated as a sink. If there is no majority, all are sources. Once
we pick a source, self-heal will then do the heal instead of erroring out
due to split-brain.
This patch also adds a restriction that all bricks must be up in order to
perform metadata heal, to avoid any metadata loss.

Removed the test case tests/bugs/replicate/bug-1468279-source-not-blaming-sinks.t
as it was doing metadata heal even when only 2 of 3 bricks were up.

Change-Id: I07a9d62f84ceda329dcab1f02a33aeed258dcb09
fixes: bz#1806931
Signed-off-by: karthik-us &lt;ksubrahm@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
We currently don't have a roll-back/undoing of post-ops if quorum is not met.
Though the FOP is still unwound with failure, the xattrs remain on the disk.
Due to these partial post-ops and partial heals (healing only when 2 bricks
are up), we can end up in metadata split-brain purely from the afr xattrs
point of view, i.e. each brick is blamed by at least one of the others for
metadata. These scenarios are hit when there is frequent connect/disconnect
of the client/shd to the bricks.

Fix:
Pick a source based on the xattr values. If 2 bricks blame one, the blamed
one must be treated as a sink. If there is no majority, all are sources. Once
we pick a source, self-heal will then do the heal instead of erroring out
due to split-brain.
This patch also adds a restriction that all bricks must be up in order to
perform metadata heal, to avoid any metadata loss.

Removed the test case tests/bugs/replicate/bug-1468279-source-not-blaming-sinks.t
as it was doing metadata heal even when only 2 of 3 bricks were up.

Change-Id: I07a9d62f84ceda329dcab1f02a33aeed258dcb09
fixes: bz#1806931
Signed-off-by: karthik-us &lt;ksubrahm@redhat.com&gt;
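
[Editor's note] A small sketch of the source-picking rule described above
(hypothetical layout; 'majority' is this editor's reading of the rule):
count how many bricks blame each brick, treat a brick blamed by most of the
others as a sink, and with no majority keep all bricks as sources.

    #include &lt;stdio.h&gt;

    #define CHILDREN 3

    int main(void) {
        /* blames[i][j] != 0 means brick i blames brick j for metadata. */
        int blames[CHILDREN][CHILDREN] = {
            {0, 0, 1},
            {0, 0, 1},
            {1, 0, 0},
        };
        int is_sink[CHILDREN] = {0};

        for (int j = 0; j &lt; CHILDREN; j++) {
            int blamed_by = 0;
            for (int i = 0; i &lt; CHILDREN; i++) {
                if (i != j &amp;&amp; blames[i][j])
                    blamed_by++;
            }
            /* Blamed by a majority of the other bricks: treat j as a sink. */
            if (2 * blamed_by &gt; CHILDREN - 1)
                is_sink[j] = 1;
        }

        for (int j = 0; j &lt; CHILDREN; j++)
            printf("brick %d: %s\n", j, is_sink[j] ? "sink" : "source");
        return 0;
    }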
</pre>
</div>
</content>
</entry>
<entry>
<title>protocol/client: Do not fallback to anon-fd if fd is not open</title>
<updated>2020-03-02T08:08:17+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2019-03-28T12:25:54+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=3eb0cfaf32b27adcb2a510e08597c38a82039f54'/>
<id>3eb0cfaf32b27adcb2a510e08597c38a82039f54</id>
<content type='text'>
If an open comes on a file when a brick is down, and a fop comes on that fd
after the brick comes up, the client xlator would still wind the fop on an
anon-fd, leading to wrong behavior of the fops in some cases.

Example:
If an lk fop is issued on the fd just after the brick is up in the scenario
above, the lk fop will be sent on the anon-fd instead of being failed on that
client xlator. This lock will never be freed upon close of the fd, as flush
on an anon-fd is invalid and is not wound below the server xlator.

As a fix, fail the fop unless the fd has the FALLBACK_TO_ANON_FD flag.

&gt; Change-Id: I77692d056660b2858e323bdabdfe0a381807cccc
&gt; fixes bz#1390914
&gt; Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;

Change-Id: I77692d056660b2858e323bdabdfe0a381807cccc
fixes bz#1808256
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If an open comes on a file when a brick is down, and a fop comes on that fd
after the brick comes up, the client xlator would still wind the fop on an
anon-fd, leading to wrong behavior of the fops in some cases.

Example:
If an lk fop is issued on the fd just after the brick is up in the scenario
above, the lk fop will be sent on the anon-fd instead of being failed on that
client xlator. This lock will never be freed upon close of the fd, as flush
on an anon-fd is invalid and is not wound below the server xlator.

As a fix, fail the fop unless the fd has the FALLBACK_TO_ANON_FD flag.

&gt; Change-Id: I77692d056660b2858e323bdabdfe0a381807cccc
&gt; fixes bz#1390914
&gt; Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;

Change-Id: I77692d056660b2858e323bdabdfe0a381807cccc
fixes bz#1808256
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;
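
[Editor's note] A minimal sketch of the decision this fix describes
(hypothetical flag and errno choice): a fop on an fd that was never
(re)opened on this brick is failed instead of being wound on an anonymous
fd, unless the fd explicitly allows the anon-fd fallback.

    #include &lt;errno.h&gt;
    #include &lt;stdio.h&gt;

    #define FD_FLAG_FALLBACK_TO_ANON_FD 0x1

    typedef struct {
        int remote_fd;   /* -1: not opened (or reopened) on this brick */
        int flags;
    } client_fd_ctx_t;

    /* Returns 0 if the fop may be wound, otherwise a negative errno. */
    static int client_check_fd(const client_fd_ctx_t *ctx) {
        if (ctx-&gt;remote_fd &gt;= 0)
            return 0;                            /* properly opened fd */
        if (ctx-&gt;flags &amp; FD_FLAG_FALLBACK_TO_ANON_FD)
            return 0;                            /* anon-fd explicitly allowed */
        return -EBADF;                           /* fail instead of using anon-fd */
    }

    int main(void) {
        client_fd_ctx_t ctx = { .remote_fd = -1, .flags = 0 };
        printf("check returned %d (expected %d)\n",
               client_check_fd(&amp;ctx), -EBADF);
        return 0;
    }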
</pre>
</div>
</content>
</entry>
<entry>
<title>afr: wake up index healer threads</title>
<updated>2020-02-27T07:16:20+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2020-02-26T10:38:05+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=2b578af8aad0757f5aed6611e2a03d70f3e295e2'/>
<id>2b578af8aad0757f5aed6611e2a03d70f3e295e2</id>
<content type='text'>
Backport of https://review.gluster.org/#/c/glusterfs/+/23288/

...whenever shd is re-enabled after being disabled or there is a change in
`cluster.heal-timeout`, without needing to restart shd or wait for the
current `cluster.heal-timeout` seconds to expire.

See BZ 1743988 for more details.

Change-Id: Ia5ebd7c8e9f5b54cba3199c141fdd1af2f9b9bfe
fixes: bz#1807431
Reported-by: Glen Kiessling &lt;glenk1973@hotmail.com&gt;
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Backport of https://review.gluster.org/#/c/glusterfs/+/23288/

...whenever shd is re-enabled after being disabled or there is a change in
`cluster.heal-timeout`, without needing to restart shd or wait for the
current `cluster.heal-timeout` seconds to expire.

See BZ 1743988 for more details.

Change-Id: Ia5ebd7c8e9f5b54cba3199c141fdd1af2f9b9bfe
fixes: bz#1807431
Reported-by: Glen Kiessling &lt;glenk1973@hotmail.com&gt;
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
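
[Editor's note] The wake-up mechanism in miniature (hypothetical names;
build with -lpthread): the healer sleeps on a condition variable bounded by
heal-timeout rather than a fixed sleep, so a reconfigure can signal it
immediately instead of waiting out the old timeout.

    #include &lt;errno.h&gt;
    #include &lt;pthread.h&gt;
    #include &lt;stdio.h&gt;
    #include &lt;time.h&gt;

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  wakeup = PTHREAD_COND_INITIALIZER;
    static int heal_timeout = 600;   /* cluster.heal-timeout, seconds */
    static int wake_pending = 0;

    /* Healer loop body: waits heal_timeout seconds, or less if woken early. */
    static void *index_healer(void *arg) {
        (void)arg;
        pthread_mutex_lock(&amp;lock);
        struct timespec deadline;
        clock_gettime(CLOCK_REALTIME, &amp;deadline);
        deadline.tv_sec += heal_timeout;
        while (!wake_pending) {
            if (pthread_cond_timedwait(&amp;wakeup, &amp;lock, &amp;deadline) == ETIMEDOUT)
                break;               /* normal heal-timeout expiry */
        }
        wake_pending = 0;
        pthread_mutex_unlock(&amp;lock);
        printf("healer woke up, starting index crawl\n");
        return NULL;
    }

    /* Called on reconfigure (shd re-enabled or heal-timeout changed). */
    static void notify_healer(int new_timeout) {
        pthread_mutex_lock(&amp;lock);
        heal_timeout = new_timeout;
        wake_pending = 1;
        pthread_cond_signal(&amp;wakeup);   /* no need to wait out the old timeout */
        pthread_mutex_unlock(&amp;lock);
    }

    int main(void) {
        pthread_t t;
        pthread_create(&amp;t, NULL, index_healer, NULL);
        notify_healer(60);
        pthread_join(t, NULL);
        return 0;
    }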
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/ec: Change handling of heal failure to avoid crash</title>
<updated>2020-02-25T07:04:17+00:00</updated>
<author>
<name>Ashish Pandey</name>
<email>aspandey@redhat.com</email>
</author>
<published>2019-07-11T11:22:49+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=64c9628da16cf0722d809e6c9adb7bc8d6fd7f1e'/>
<id>64c9628da16cf0722d809e6c9adb7bc8d6fd7f1e</id>
<content type='text'>
Problem:
ec_getxattr_heal_cbk was called with NULL as the second argument
when the heal was failing.
This function was dereferencing the "cookie" argument, which caused a crash.

Solution:
Cookie is changed to carry the value that was supposed to be
stored in fop-&gt;data, so even when fop is NULL in the error case,
there won't be any NULL dereference.

Thanks to Xavi for the suggestion about the fix.

Change-Id: I0798000d5cadb17c3c2fbfa1baf77033ffc2bb8c
fixes: bz#1805057
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
ec_getxattr_heal_cbk was called with NULL as the second argument
when the heal was failing.
This function was dereferencing the "cookie" argument, which caused a crash.

Solution:
Cookie is changed to carry the value that was supposed to be
stored in fop-&gt;data, so even when fop is NULL in the error case,
there won't be any NULL dereference.

Thanks to Xavi for the suggestion about the fix.

Change-Id: I0798000d5cadb17c3c2fbfa1baf77033ffc2bb8c
fixes: bz#1805057
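
[Editor's note] The shape of the fix as a tiny sketch (hypothetical callback
and types): the value the callback needs travels in the cookie itself, so
the error path that invokes the callback with a NULL fop no longer has to
dereference fop.

    #include &lt;stdio.h&gt;
    #include &lt;stdlib.h&gt;
    #include &lt;string.h&gt;

    typedef struct { void *data; } ec_fop_t;

    /* Before the fix, this callback read its context from fop-&gt;data and
       crashed whenever the error path called it with fop == NULL.
       Here the context arrives as the cookie instead. */
    static int getxattr_heal_cbk(void *cookie, ec_fop_t *fop, int op_ret) {
        char **heal_info = cookie;       /* valid even when fop is NULL */
        (void)fop;
        if (op_ret &lt; 0) {
            printf("heal failed, releasing %s\n", *heal_info);
            free(*heal_info);
            *heal_info = NULL;
            return 0;
        }
        printf("heal succeeded: %s\n", *heal_info);
        return 0;
    }

    int main(void) {
        char *heal_info = strdup("good=0x7, bad=0x0");
        if (!heal_info)
            return 1;
        getxattr_heal_cbk(&amp;heal_info, NULL, -1);  /* error path, fop is NULL */
        return 0;
    }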
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/ec: Update lock-&gt;good_mask on parent fop failure</title>
<updated>2020-02-25T07:04:17+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2019-08-02T06:35:09+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=9013328774c87d8a32ff80e78f6478e22c5157b9'/>
<id>9013328774c87d8a32ff80e78f6478e22c5157b9</id>
<content type='text'>
When discard/truncate performs a write fop, it should do so
after updating lock-&gt;good_mask to make sure readv happens
on the correct mask.

fixes: bz#1805056
Change-Id: Idfef0bbcca8860d53707094722e6ba3f81c583b7
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When discard/truncate performs a write fop, it should do so
after updating lock-&gt;good_mask to make sure readv happens
on the correct mask.

fixes: bz#1805056
Change-Id: Idfef0bbcca8860d53707094722e6ba3f81c583b7
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
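
[Editor's note] A tiny ordering sketch (hypothetical fields): good_mask is
corrected before the write triggered by discard/truncate is issued, so any
readv that follows already sees the right mask.

    #include &lt;stdio.h&gt;

    typedef struct {
        unsigned int good_mask;   /* bricks currently considered good */
    } ec_lock_t;

    static void issue_write(const ec_lock_t *lock) {
        printf("write goes to bricks in mask 0x%x\n", lock-&gt;good_mask);
    }

    static void issue_readv(const ec_lock_t *lock) {
        printf("readv uses mask 0x%x\n", lock-&gt;good_mask);
    }

    int main(void) {
        ec_lock_t lock = { .good_mask = 0x7 };   /* all three bricks good */
        unsigned int failed_on = 0x4;            /* parent fop failed here */

        /* Update the mask first... */
        lock.good_mask &amp;= ~failed_on;
        /* ...and only then perform the write done by discard/truncate. */
        issue_write(&amp;lock);
        issue_readv(&amp;lock);                      /* sees the corrected mask */
        return 0;
    }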
</pre>
</div>
</content>
</entry>
</feed>
