<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/tests/bugs/replicate, branch v9dev</title>
<subtitle></subtitle>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/'/>
<entry>
<title>afr: mark pending xattrs as a part of metadata heal</title>
<updated>2020-04-02T04:32:57+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2018-12-24T07:30:19+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=2d5ba449e9200b16184b1e7fc84cabd015f1f779'/>
<id>2d5ba449e9200b16184b1e7fc84cabd015f1f779</id>
<content type='text'>
...if pending xattrs are zero for all children.

Problem:
If there are no pending xattrs and a metadata heal needs to be
performed, we can end up with xattrs inadvertently deleted from all
bricks, as explained in the BZ.

Fix:
After picking one among the sources as the good copy, mark pending xattrs on
all sources to blame the sinks. Now even if this metadata heal fails midway,
a subsequent heal will still choose one of the valid sources that it
picked previously.
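
Illustration (not part of the patch; volume name, brick path and hex
value are hypothetical): the blame is visible in the afr pending
xattrs on the source brick, e.g.

  getfattr -d -m trusted.afr -e hex /bricks/brick0/file
  # trusted.afr.testvol-client-1=0x000000000000000100000000
  #   three 4-byte fields: data | metadata | entry pending counts
  # A non-zero metadata field on the source blames client-1 (the sink),
  # so a heal that restarts midway still picks the same good copy.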

Fixes: #1067
Change-Id: If1b050b70b0ad911e162c04db4d89b263e2b8d7b
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
</content>
</entry>
<entry>
<title>afr: prevent spurious entry heals leading to gfid split-brain</title>
<updated>2020-02-18T09:11:40+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2020-02-11T09:04:48+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=06453d77d056fbaa393a137ca277a20e38d2f67e'/>
<id>06453d77d056fbaa393a137ca277a20e38d2f67e</id>
<content type='text'>
Problem:
In a hyperconverged setup with granular-entry-heal enabled, if a file is
recreated while one of the bricks is down, and an index heal is triggered
(with the brick still down), entry self-heal was doing a spurious heal
with just the 2 good bricks. It was doing a post-op leading to removal
of the filename from .glusterfs/indices/entry-changes as well as
erroneous setting of afr xattrs on the parent. When the brick came up,
the xattrs were cleared, resulting in the renamed file not getting
healed and leading to gfid split-brain and EIO on the mount.

Fix:
Proceed with entry heal only when shd can connect to all bricks of the replica,
just like in data and metadata heal.
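
A hedged test-framework sketch of the expected behaviour (helpers such
as afr_child_up_status and get_pending_heal_count come from the test
suite's volume.rc; their exact use here is an assumption):

  TEST $CLI volume start $V0 force   # bring the downed brick back up
  EXPECT_WITHIN $CHILD_UP_TIMEOUT "1" afr_child_up_status $V0 2
  TEST $CLI volume heal $V0          # entry heal now sees all 3 bricks
  EXPECT_WITHIN $HEAL_TIMEOUT "^0$" get_pending_heal_count $V0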

fixes: bz#1801624
Change-Id: I916ae26ad1fabf259bc6362da52d433b7223b17e
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
</content>
</entry>
<entry>
<title>afr: restore timestamp of files during metadata heal</title>
<updated>2020-01-24T04:35:38+00:00</updated>
<author>
<name>Sheetal Pamecha</name>
<email>spamecha@redhat.com</email>
</author>
<published>2020-01-02T06:35:12+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=7dc4122b84c93cfa6355002fa651a806706e4990'/>
<id>7dc4122b84c93cfa6355002fa651a806706e4990</id>
<content type='text'>
During metadata heal, we restore timestamps only for non-regular
(char, block, etc.) files. This patch extends the restore to regular
files as well, since their timestamps can also be updated via the
touch command.
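
A hedged illustration of the case being fixed (mount and brick paths
are hypothetical):

  touch -d '2020-01-01 00:00' /mnt/glusterfs/file  # one brick is down
  gluster volume start testvol force               # brick comes back
  gluster volume heal testvol
  stat -c %Y /bricks/brick1/file  # healed brick should show the same mtime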

fixes: bz#1787274
Change-Id: I26fe4fb6dff679422ba4698a7f828bf62ca7ca18
Signed-off-by: Sheetal Pamecha &lt;spamecha@redhat.com&gt;
</content>
</entry>
<entry>
<title>tests: Fix spurious failure</title>
<updated>2019-10-16T12:31:03+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2019-10-14T04:59:31+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=1cb05730763d98f65c28246c8808217e6e9d6948'/>
<id>1cb05730763d98f65c28246c8808217e6e9d6948</id>
<content type='text'>
fixes: bz#1759002
Change-Id: I4d49e1c2ca9b3c1d74b9dd5a30f1c66983a76529
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</content>
</entry>
<entry>
<title>Fix spurious failure in bug-1744548-heal-timeout.t</title>
<updated>2019-10-09T07:17:49+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2019-10-07T06:57:01+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=31db2360b6076734f9506e17c4d2c93f7b6ee475'/>
<id>31db2360b6076734f9506e17c4d2c93f7b6ee475</id>
<content type='text'>
The script was assuming that the heal would have been triggered by the
time the test was executed, which may not be the case. This can lead
to the following failures when the race happens:

...
18:29:45 not ok  14 [     85/      1] &lt;  26&gt; '[ 331 == 333 ]' -&gt; ''
...
18:29:45 not ok  16 [  10097/      1] &lt;  33&gt; '[ 668 == 666 ]' -&gt; ''

Heal on the 3rd brick didn't start completely the first time the
command was executed, so the extra count got added to the next
profile info.

Fixed it by depending on cumulative stats and waiting until the count
is satisfied using EXPECT_WITHIN.
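
A hedged sketch of the pattern (the fop name grepped for and the awk
field below are assumptions, not the test's actual values):

  count_fops () {
          $CLI volume profile $V0 info cumulative | grep -w LOOKUP | awk '{sum += $8} END {print sum}'
  }
  EXPECT_WITHIN $HEAL_TIMEOUT "333" count_fops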

fixes: bz#1759002
Change-Id: I3b410671c902d6b1458a757fa245613cb29d967d
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</content>
</entry>
<entry>
<title>cluster/afr: Heal entries when there is a source &amp; no healed_sinks</title>
<updated>2019-10-09T07:16:11+00:00</updated>
<author>
<name>karthik-us</name>
<email>ksubrahm@redhat.com</email>
</author>
<published>2019-09-05T10:44:50+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=8d17a94a65ac968b179afa718321105856610e63'/>
<id>8d17a94a65ac968b179afa718321105856610e63</id>
<content type='text'>
Problem:
In a situation where B1 blames B2, B2 blames B1, and B3 doesn't blame
anything for entry heal, heal will not complete even though we have a
clear source and sinks. This happens because afr_selfheal_find_direction()
considers only the bricks which are blamed by non-accused bricks as
sinks. Later, when __afr_selfheal_entry_finalize_source() tries to mark
all the non-sources as sinks, it fails to do so because no healed_sinks
are marked, no witness is present, and there is a source.

Fix:
If there is a source and no healed_sinks, then reset all the locked
sources to 0 and healed sinks to 1 to do conservative merge.
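
Illustration of the stuck blame matrix (volume name and hex values are
hypothetical), as seen via getfattr on each brick:

  # B1: trusted.afr.testvol-client-1=0x000000000000000000000001  (blames B2)
  # B2: trusted.afr.testvol-client-0=0x000000000000000000000001  (blames B1)
  # B3: all-zero afr xattrs                                      (blames none)
  # B1 and B2 accuse each other, B3 accuses no one, so no brick is
  # blamed by a non-accused brick and healed_sinks stays empty.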

Change-Id: If40d8bc95d52a52b2730f55bdcf135109b421548
Fixes: bz#1749322
Signed-off-by: karthik-us &lt;ksubrahm@redhat.com&gt;
</content>
</entry>
<entry>
<title>tests: Fix spurious failure in bug-1134691-afr-lookup-metadata-heal.t</title>
<updated>2019-10-09T06:40:05+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2019-10-07T11:29:20+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=65764a8910e02ed2564f92141f6ee735c39b8172'/>
<id>65764a8910e02ed2564f92141f6ee735c39b8172</id>
<content type='text'>
Problem:
The .t was examining the sink brick's iatt value before the launched
client-side metadata heal got a chance to complete.

Fix:
Wait for heal completion.
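
A hedged sketch of the wait (brick path and expected mode are
hypothetical; EXPECT_WITHIN polls the command until the output
matches):

  EXPECT_WITHIN $HEAL_TIMEOUT "0644" stat -c %a $B0/${V0}2/file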

Fixes: bz#1759081
Change-Id: I4dd4e3a1cccf35fd18e8cdfea6aa76a726a4763b
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
</content>
</entry>
<entry>
<title>afr: support split-brain CLI for replica 3</title>
<updated>2019-10-09T06:35:43+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2019-09-28T03:23:08+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=47dbd753187f69b3835d2e42fdbe7485874c4b3e'/>
<id>47dbd753187f69b3835d2e42fdbe7485874c4b3e</id>
<content type='text'>
Ever since we added quorum checks for lookups in afr via commit
bd44d59741bb8c0f5d7a62c5b1094179dd0ce8a4, the split-brain resolution
commands would not work for replica 3 because there would be no
readables for the lookup fop.

The argument was that split-brains do not occur in replica 3, but we do
see (data/metadata) split-brain cases once in a while, which indicates
that there are a few bugs/corner cases yet to be discovered and fixed.

Fortunately, commit 8016d51a3bbd410b0b927ed66be50a09574b7982 added
GF_CLIENT_PID_GLFS_HEALD as the pid for all fops made by glfsheal. If we
leverage this and allow lookups in afr when pid is GF_CLIENT_PID_GLFS_HEALD,
split-brain resolution commands will work for replica 3 volumes too.

Likewise, the check is added in shard_lookup as well to permit resolving
split-brains by specifying "/.shard/shard-file.xx" as the file name
(which previously used to fail with EPERM).
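
Hedged usage sketch (volume, brick and file names are hypothetical;
the split-brain sub-commands themselves are existing CLI):

  gluster volume heal testvol split-brain latest-mtime /dir/file
  gluster volume heal testvol split-brain source-brick server1:/bricks/brick0 /.shard/shard-file.xx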

Change-Id: I3c543dea79caf7cfbc1633e9089cb1cdd2538ba9
Fixes: bz#1756938
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
</content>
</entry>
<entry>
<title>ctime/rebalance: Heal ctime xattr on directory during rebalance</title>
<updated>2019-09-16T06:42:29+00:00</updated>
<author>
<name>Kotresh HR</name>
<email>khiremat@redhat.com</email>
</author>
<published>2019-07-29T13:00:42+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=304640e55c0f3c6d15f4e230dc6376e4f5020fea'/>
<id>304640e55c0f3c6d15f4e230dc6376e4f5020fea</id>
<content type='text'>
After add-brick and rebalance, the ctime xattr is not present on
rebalanced directories on the new brick. This patch fixes that.

Note that ctime still doesn't support consistent time across
distribute sub-volumes.

This patch also fixes the in-memory inconsistency of time attributes
when metadata is self-healed.
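
A hedged way to verify on-brick (path hypothetical;
trusted.glusterfs.mdata is the xattr that carries ctime):

  getfattr -n trusted.glusterfs.mdata -e hex /bricks/newbrick/dir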

Change-Id: Ia20506f1839021bf61d4753191e7dc34b31bb2df
fixes: bz#1734026
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
</content>
</entry>
<entry>
<title>tests: Fix spurious failure</title>
<updated>2019-09-11T09:16:03+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2019-09-11T09:11:28+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=0e37cdf271a48d3e58c212e95664a2aa34da3940'/>
<id>0e37cdf271a48d3e58c212e95664a2aa34da3940</id>
<content type='text'>
If heal from the next brick starts after the first brick completes
heal, then opendir on the brick can change atime, leading to failure
of the test. When ctime is disabled, it is better to just check that
mtime is the same after heal.
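
A hedged sketch of the relaxed check (brick paths hypothetical):

  # compare only mtime across bricks; atime may differ due to shd's opendir
  EXPECT "$(stat -c %Y $B0/${V0}0/file)" stat -c %Y $B0/${V0}1/file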

fixes: bz#1751134
Change-Id: Ia03e30fd547e6bbe85c1e299845ffa122f3a2692
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</content>
</entry>
</feed>
