glusterfs.git/xlators/cluster, branch v9dev

dht - Remove "tier" code (part 1)

2020-04-17T04:59:18+00:00

This patch removes some of the "tier" code from the dht xlator, as it is no
longer used.
The unused code is not removed all at once, to keep reviewing easier.
Follow-up patches will remove the remaining unused code.

This is based on the work done in https://review.gluster.org/#/c/glusterfs/+/23935/

Change-Id: I3cb6a0c5d8f14afcd87cf021ef8f74b91c0f908a
updates: #1097
Signed-off-by: Barak Sason Rofman

dht - fixing a permission update issue

2020-04-08T06:57:53+00:00

When bringing back a downed brick and performing a lookup from the client
side, the permissions on that brick aren't updated on the first lookup,
but only on the second.

This patch modifies the permission update logic so that the first lookup
triggers a permission update on the brick that was down.

LIMITATIONS OF THE PATCH:
The choice of source depends on whether the directory has a layout or not.
Even the directories on a newly added brick have a layout xattr [zeroed], but the same is not true for the root directory.
Hence, if only the newly added bricks are up in the entire cluster [and the others are down], any permission change made during this time will be overwritten by the older permissions when the cluster is restarted.

fixes: #999
Change-Id: Ieb70246d41e59f9cae9f70bc203627a433dfbd33
Signed-off-by: Barak Sason Rofman

cluster/afr: Removing unsupported options from code base to improve coverage

2020-04-07T04:26:33+00:00

Support for gluster volume heal <VOLNAME> info healed/heal-failed
was removed by commit bb02cfb56ae08f56df4452c2b948fa962ae1212b in
release-3.6. The CLI parser displays the usage message in all
supported versions whenever these commands are run, which leaves
dead code in the latest branches. Since support for these commands
was removed long ago, removing this code should not cause any backward
compatibility issues. Hence remove the dead code from
the code base, which also improves the code coverage of the
regression runs.

Updates: #1052
Change-Id: I0c2b061469caf233c06d9699b0d159ce48e240b9
Signed-off-by: karthik-us

afr: mark pending xattrs as a part of metadata heal

2020-04-02T04:32:57+00:00

...if pending xattrs are zero for all children.

Problem:
If there are no pending xattrs and a metadata heal needs to be
performed, it is possible that we end up with xattrs inadvertently
deleted from all bricks, as explained in the BZ.

Fix:
After picking one among the sources as the good copy, mark pending xattrs on
all sources to blame the sinks. Now even if this metadata heal fails midway,
a subsequent heal will still choose one of the valid sources that it
picked previously.
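
A minimal sketch of the marking step described in the fix, not the actual AFR
code. It assumes a hypothetical in-memory matrix `pending[i][j]` meaning
"brick i blames brick j" as a stand-in for the pending xattrs; `NBRICKS`,
`mark_sinks` and `is_blamed` are illustrative names that do not come from the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NBRICKS 3 /* assumed replica count, for illustration only */

/* On every source, record a pending mark against every sink. */
static void
mark_sinks(unsigned pending[NBRICKS][NBRICKS], const bool is_source[NBRICKS])
{
    for (size_t i = 0; i < NBRICKS; i++) {
        for (size_t j = 0; j < NBRICKS; j++) {
            if (is_source[i] && !is_source[j])
                pending[i][j]++;
        }
    }
}

/* A brick that any other brick blames must not be picked as a source. */
static bool
is_blamed(unsigned pending[NBRICKS][NBRICKS], size_t brick)
{
    assert(brick < NBRICKS);
    for (size_t i = 0; i < NBRICKS; i++) {
        if (i != brick && pending[i][brick] != 0)
            return true;
    }
    return false;
}
```

If the heal stops after `mark_sinks` has run, a later pass that consults
`is_blamed` still excludes the sinks, so it can only choose among the sources
selected earlier.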

Fixes: #1067
Change-Id: If1b050b70b0ad911e162c04db4d89b263e2b8d7b
Signed-off-by: Ravishankar N

dht: gf_defrag_process_dir is called even if gf_defrag_fix_layout has failed

2020-03-24T05:14:07+00:00

Currently, even when gf_defrag_fix_layout fails with ENOENT or ESTALE, a
subsequent call is made to gf_defrag_process_dir, leading to rebalance failure.
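
A sketch of the control flow this implies, under stated assumptions: the two
steps are modelled as hypothetical callbacks returning 0 or a negative errno,
which is not the real signature of gf_defrag_fix_layout or
gf_defrag_process_dir. The message does not say how the patch classifies a
vanished directory; treating it as a skip is an assumption made here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical step type: returns 0 on success or a negative errno value. */
typedef int (*dir_step_fn)(const char *path);

enum dir_result { DIR_DONE, DIR_SKIPPED, DIR_FAILED };

/* ENOENT and ESTALE both mean the directory is no longer there. */
static bool
dir_is_gone(int ret)
{
    return ret == -ENOENT || ret == -ESTALE;
}

/* Run the layout fix first; process the directory only if that succeeded. */
static enum dir_result
rebalance_one_dir(const char *path, dir_step_fn fix_layout,
                  dir_step_fn process_dir)
{
    assert(path != NULL && fix_layout != NULL && process_dir != NULL);

    int ret = fix_layout(path);
    if (dir_is_gone(ret))
        return DIR_SKIPPED; /* nothing left to process */
    if (ret < 0)
        return DIR_FAILED;

    return process_dir(path) < 0 ? DIR_FAILED : DIR_DONE;
}

/* Stubs used only to exercise the sketch. */
static int process_calls = 0;

static int stub_fix_ok(const char *path) { (void)path; return 0; }
static int stub_fix_enoent(const char *path) { (void)path; return -ENOENT; }
static int stub_fix_estale(const char *path) { (void)path; return -ESTALE; }
static int stub_process(const char *path)
{
    (void)path;
    process_calls++;
    return 0;
}
```

With `stub_fix_enoent` or `stub_fix_estale` as the first step, `stub_process`
is never invoked, which is the behaviour the commit title asks for.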

fixes: #1102
Change-Id: Ib0c309fd78e89a000fed3feb4bbe2c5b48e61478
Signed-off-by: Susant Palai

cluster/afr: Fixes for halo

2020-03-13T13:20:37+00:00

The current implementation assumes that the ping event comes after the connect
event, but that may not be the case when fds need to be re-opened after the
socket connects, which takes more time. So handle the ping/child-up events in
any order.
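
One way to make the handling order-independent is to record each event
separately and act only once both have been seen. The sketch below
illustrates that idea only; `child_state`, `child_event` and `child_on_event`
are hypothetical names and do not reflect the actual AFR halo data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum child_event { EV_CHILD_UP, EV_PING };

/* Per-child bookkeeping: each event sets its own flag. */
struct child_state {
    bool connected;     /* child-up event seen */
    bool latency_known; /* ping event seen */
    int64_t latency_ms; /* latency reported by the last ping */
};

/* Record one event; return true once both events have arrived. */
static bool
child_on_event(struct child_state *c, enum child_event ev, int64_t latency_ms)
{
    assert(c != NULL);

    switch (ev) {
        case EV_CHILD_UP:
            c->connected = true;
            break;
        case EV_PING:
            c->latency_known = true;
            c->latency_ms = latency_ms;
            break;
    }
    return c->connected && c->latency_known;
}
```

Because neither branch depends on the other flag, a ping that arrives before
child-up is kept rather than dropped, and the child becomes usable as soon as
the second event arrives.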

fixes: bz#1800583
Change-Id: I6bcdc0caa503bdc039ef2b4739fbf4afae121f05
Signed-off-by: Pranith Kumar K

dht - selfheal code cleaning

2020-03-12T07:31:30+00:00

1 - Converted methods to static
2 - Removed unused code

Change-Id: I49db3e28116da1c3c9ff0a33dcce7281bc3856f7
updates: bz#1193929
Signed-off-by: Barak Sason Rofman

dht/rebalance - fixing failure occurrence due to rebalance stop

2020-03-04T09:40:24+00:00

Problem description:
When stopping rebalance, the following error messages appear in the
rebalance log file:
[2020-01-28 14:31:42.452070] W [dht-rebalance.c:3447:gf_defrag_process_dir] 0-distrep-dht: Found error from gf_defrag_get_entry
[2020-01-28 14:31:42.452764] E [MSGID: 109111] [dht-rebalance.c:3971:gf_defrag_fix_layout] 0-distrep-dht: gf_defrag_process_dir failed for directory: /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23/24/25/26/27/28/29/30/31
[2020-01-28 14:31:42.453498] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23/24/25/26/27/28/29/30

In order to avoid seeing these error messages, the error handling mechanism
has been modified.
In addition, several log messages have been added to make debugging easier.

fixes: bz#1800956
Change-Id: Ifc82dae79ab3da9fe22ee25088a2a6b855afcfcf
Signed-off-by: Barak Sason Rofman

xlator/dht-helper: structure logging

2020-03-03T12:00:02+00:00

convert gf_msg() to gf_smsg()

Updates: #657

Change-Id: Iab35ac89b7d7fb6fb0074fc61b11bf679c517c9d
Signed-off-by: yatipadia 
Signed-off-by: yatip

cluster/afr: fix race when bricks come up

2020-03-02T07:13:55+00:00

There was a problem when self-heal was sending lookups at the same time
that one of the bricks was coming up. In this case there was a chance
that the number of 'up' bricks changed in the middle of sending the
requests to subvolumes, which caused a discrepancy between the expected
number of replies and the actual number of sent requests.

Because of this discrepancy, AFR continued executing before
all requests were complete. Eventually, the frame of the pending
request was destroyed when the operation terminated, causing a
use-after-free issue when the answer was finally received.

In theory the same thing could happen in the reverse way, i.e. AFR
tries to wait for more replies than sent requests, causing a hang.
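
A common way to avoid this kind of mismatch is to take one snapshot of the
'up' state and use that same snapshot both for counting expected replies and
for deciding where to send. The sketch below shows the idea only; `fanout`,
`MAX_CHILDREN` and the helper names are hypothetical, and whether the actual
patch is structured this way is not stated in the message. In real
multi-threaded code the snapshot would also have to be taken under the
appropriate lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 8 /* assumed upper bound, for illustration only */

struct fanout {
    bool target[MAX_CHILDREN]; /* private copy of the 'up' state */
    int expected;              /* replies to wait for */
    int received;              /* replies seen so far */
};

/* Copy the shared state once and derive the expected count from the copy. */
static int
fanout_prepare(struct fanout *f, const bool *child_up, size_t n)
{
    assert(f != NULL && child_up != NULL && n <= MAX_CHILDREN);

    f->expected = 0;
    f->received = 0;
    for (size_t i = 0; i < MAX_CHILDREN; i++) {
        f->target[i] = (i < n) && child_up[i];
        if (f->target[i])
            f->expected++;
    }
    return f->expected;
}

/* Decide from the snapshot, never from the live shared array. */
static bool
fanout_should_send(const struct fanout *f, size_t i)
{
    return i < MAX_CHILDREN && f->target[i];
}

/* Count one reply; return true when the last expected reply has arrived. */
static bool
fanout_on_reply(struct fanout *f)
{
    return ++f->received == f->expected;
}
```

A brick that comes up after `fanout_prepare` is simply not part of this round,
so the number of requests sent always matches `expected`, which rules out both
the early completion and the hang described above.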

Change-Id: I7ed6108554ca379d532efb1a29b2de8085410b70
Signed-off-by: Xavi Hernandez 
Fixes: bz#1808875