glusterfs.git/xlators, branch v7.4

cluster/ec: Change handling of heal failure to avoid crash

2020-03-17T14:27:36+00:00

Problem:
ec_getxattr_heal_cbk was called with NULL as second argument
in case heal was failing.
This function was dereferencing "cookie" argument which caused crash.

Solution:
Cookie is changed to carry the value that was supposed to be
stored in fop->data, so even in the case when fop is NULL in error
case, there won't be any NULL dereference.

Thanks to Xavi for the suggestion about the fix.

Change-Id: I0798000d5cadb17c3c2fbfa1baf77033ffc2bb8c
updates: #1061

glusterd: Brick process fails to come up with brickmux on

2020-03-17T06:04:40+00:00

Issue:
1- In a cluster of 3 Nodes N1, N2, N3. Create 3 volumes vol1,
vol2, vol3 with 3 bricks (one from each node)
2- Set cluster.brick-multiplex on
3- Start all 3 volumes
4- Check if all bricks on a node are running on same port
5- Kill N1
6- Set performance.readdir-ahead for volumes vol1, vol2, vol3
7- Bring N1 up and check volume status
8- All bricks processes not running on N1.

Root Cause -
Since, There is a diff in volfile versions in N1 as compared
to N2 and N3 therefore glusterd_import_friend_volume() is called.
glusterd_import_friend_volume() copies the new_volinfo and deletes
old_volinfo and then calls glusterd_start_bricks().
glusterd_start_bricks() looks for the volfiles and sends an rpc
request to glusterfs_handle_attach(). Now, since the volinfo
has been deleted by glusterd_delete_stale_volume()
from priv->volumes list before glusterd_start_bricks() and
glusterd_create_volfiles_and_notify_services() and
glusterd_list_add_order is called after glusterd_start_bricks(),
therefore the attach RPC req gets an empty volfile path
and that causes the brick to crash.

Fix- Call glusterd_list_add_order() and
glusterd_create_volfiles_and_notify_services before
glusterd_start_bricks() cal is made in glusterd_import_friend_volume

> Change-Id: Idfe0e8710f7eb77ca3ddfa1cabeb45b2987f41aa
> Bug: bz#1773856
> Signed-off-by: Mohammed Rafi KC 
(cherry picked from commit 45e81aae791da9d013aba2286af44826227c05ec)

Change-Id: Idfe0e8710f7eb77ca3ddfa1cabeb45b2987f41aa
fixes: bz#1808964
Signed-off-by: Sanju Rakonde

glusterd: stop stale bricks during handshaking in brick mux mode

2020-03-16T08:25:40+00:00

This patch addresses two problems:

1. During friend handshaking, if a volume is imported due to change in
the version, the old bricks were not stopped which would lead to a
situation where bricks will run with old volfiles.

2. As part of attaching shd service in glusterd_attach_svc, there might
be a case that the volume for which we're attempting to attach a shd
service might become stale and in the process of deletion and hence in
every retrials (if the rpc connection isn't ready) check for the
existance of the volume and then only attempt the further attach
request.

patch on master: https://review.gluster.org/#/c/glusterfs/+/23042/

> Bug: bz#1733425
> Change-Id: I6bac6b871f7e31cb5bf277db979289dec196a03e
> Signed-off-by: Atin Mukherjee 
> Signed-off-by: Mohammed Rafi KC 

fixes: bz#1812849
Change-Id: I6bac6b871f7e31cb5bf277db979289dec196a03e
Signed-off-by: Sanju Rakonde

multiple: fix bad type cast

2020-03-16T08:21:00+00:00

When using inode_ctx_get() or inode_ctx_set(), a 'uint64_t *' is expected.
In many cases, the value to retrieve or store is a pointer, which will be
of smaller size in some architectures (for example 32-bits). In this case,
directly passing the address of the pointer casted to an 'uint64_t *' is
wrong and can cause memory corruption.

Backport of:
> Change-Id: Iae616da9dda528df6743fa2f65ae5cff5ad23258
> Signed-off-by: Xavi Hernandez 
> Fixes: bz#1785611

Change-Id: Iae616da9dda528df6743fa2f65ae5cff5ad23258
Signed-off-by: Xavi Hernandez 
Fixes: bz#1785323

cluster/afr: fix race when bricks come up

2020-03-16T08:18:51+00:00

The was a problem when self-heal was sending lookups at the same time
that one of the bricks was coming up. In this case there was a chance
that the number of 'up' bricks changes in the middle of sending the
requests to subvolumes which caused a discrepancy in the expected
number of replies and the actual number of sent requests.

This discrepancy caused that AFR continued executing requests before
all requests were complete. Eventually, the frame of the pending
request was destroyed when the operation terminated, causing a use-
after-free issue when the answer was finally received.

In theory the same thing could happen in the reverse way, i.e. AFR
tries to wait for more replies than sent requests, causing a hang.

Backport of:
> Change-Id: I7ed6108554ca379d532efb1a29b2de8085410b70
> Signed-off-by: Xavi Hernandez 
> Fixes: bz#1808875

Change-Id: I7ed6108554ca379d532efb1a29b2de8085410b70
Signed-off-by: Xavi Hernandez 
Fixes: bz#1809438

cluster/ec: skip updating ctx->loc again when ec_fix_open/opendir

2020-03-16T08:01:00+00:00

The ec_manager_open/opendir memsets ctx->loc which causes
memory/inode leak, and ec_fheal uses ctx->loc out of fd->lock
that loc_copy may copy bad data when memset it.

This patch skips updating ctx->loc when it is initilizaed.
With it, ctx->loc is filled once, and never updated.

Change-Id: I3bf5ffce4caf4c1c667f7acaa14b451d37a3550a
fixes: bz#1806843
Signed-off-by: Kinglong Mee

afr: prevent spurious entry heals leading to gfid split-brain

2020-02-25T17:07:04+00:00

Problem:
In a hyperconverged setup with granular-entry-heal enabled, if a file is
recreated while one of the bricks is down, and an index heal is triggered
(with the brick still down), entry-self heal was doing a spurious heal
with just the 2 good bricks. It was doing a post-op leading to removal
of the filename from .glusterfs/indices/entry-changes as well as
erroneous setting of afr xattrs on the parent. When the brick came up,
the xattrs were cleared, resulting in the renamed file not getting
healed and leading to gfid split-brain and EIO on the mount.

Fix:
Proceed with entry heal only when shd can connect to all bricks of the replica,
just like in data and metadata heal.

fixes: bz#1804591
Change-Id: I916ae26ad1fabf259bc6362da52d433b7223b17e
Signed-off-by: Ravishankar N 
(cherry picked from commit 06453d77d056fbaa393a137ca277a20e38d2f67e)

cluster/thin-arbiter: Wait for TA connection before ta-file lookup

2020-02-18T06:49:31+00:00

Problem:
When we mount a ta volume, as soon as 2 data bricks are connected
we consider that the mount is done and then send a lookup/create on
ta file on ta node. However, this connection with ta node might not
have been completed.
Due to this delay, ta replica id file will not be created and we
will see ENOTCONN error in log file if we do lookup.

Solution:
As we know that this ta node could have a higher latency, we should
wait for reasonable time for connection to happen before sending
lookup/create on replica id file.

fixes: bz#1804058
Change-Id: I36f90865afe617e4e84cee57fec832a16f5dd6cc
(cherry picked from commit a7fa54ddea3fe429f143b37e4de06a93b49d776a)

volgen: make thin-arbiter name unique in 'pending-xattr' option

2020-02-17T09:52:06+00:00

Thin-arbiter module makes use of 'pending-xattr' name for the translator
as the filename which gets created in thin-arbiter node. By making this
unique, we can host single thin-arbiter node for multiple clusters.

Updates: #763
Change-Id: Ib3c732e7e04e6dba229e71ae3e64f1f3cb6d794d
Signed-off-by: Amar Tumballi 
(cherry picked from commit 8db8202f716fd24c8c52f8ee5f66e169310dc9b1)

tests: Fix spurious self-heald.t failure

2020-02-13T07:12:46+00:00

Problem:
heal-info code assumes that all indices in xattrop directory
definitely need heal. There is one corner case.
The very first xattrop on the file will lead to adding the
gfid to 'xattrop' index in fop path and in _cbk path it is
removed because the fop is zero-xattr xattrop in success case.
These gfids could be read by heal-info and shown as needing heal.

Fix:
Check the pending flag to see if the file definitely needs or
not instead of which index is being crawled at the moment.

fixes: bz#1802449
Change-Id: I79f00dc7366fedbbb25ec4bec838dba3b34c7ad5
Signed-off-by: Pranith Kumar K 
(cherry picked from commit d27df94016b5526c18ee964d4a47508326329dda)