glusterfs.git/tests/basic/afr, branch v8dev

posix/ctime: Fix ctime upgrade issue

2019-06-21T11:09:32+00:00

Problem:
On an EC volume, during an upgrade from an older version where the
ctime feature is not enabled (or not present) to a newer version
where the ctime feature is available (and enabled by default),
self-heal hangs and doesn't complete.

Cause:
The ctime feature has both client-side code (utime) and
server-side code (posix), and the feature is driven from the client.
The server side should set the time attributes in the xattr only if
the client side has set the time in the frame, but posix
setattr/fsetattr was not checking this. When one of the server
nodes is updated, since ctime is enabled by default, it starts
setting the xattr on setattr/fsetattr on the updated node/brick.

On an EC volume, the first two updated nodes (bricks) are not a
problem because there are 4 other bricks with consistent data.
However, once the third brick is updated, the new attribute (the
mdata xattr) causes a metadata inconsistency across 3 bricks, which
prevents the file from being repaired.
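The brick counting above can be sketched as follows. This assumes a
4+2 disperse layout (six bricks, four of which must agree), inferred
from the "4 other bricks" remark rather than stated in this commit; the
constants and function name are hypothetical and are not EC xlator code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical 4+2 disperse volume: 6 bricks, and at least 4 bricks
 * must carry identical metadata for a heal to have a good source. */
#define EC_BRICKS 6
#define EC_DATA_FRAGMENTS 4

/* Returns true if, after `upgraded` bricks have gained the new mdata
 * xattr, some group of bricks with identical metadata is still large
 * enough to act as the heal source. Assumes the upgraded bricks agree
 * with each other, which the real system does not guarantee. */
static bool ec_metadata_healable(int upgraded)
{
    assert(upgraded >= 0 && upgraded <= EC_BRICKS);
    int without_xattr = EC_BRICKS - upgraded; /* not yet upgraded */
    int with_xattr = upgraded;                /* carry the new xattr */
    return without_xattr >= EC_DATA_FRAGMENTS ||
           with_xattr >= EC_DATA_FRAGMENTS;
}
```

With two upgraded bricks, four untouched bricks still agree; with
three, the bricks split 3/3 and neither side reaches four.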

Fix:
Don't create the mdata xattr on a utimes/utimensat system call;
only update it if it is already present.
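A minimal sketch of the "update only if present" rule, assuming a
simplified in-memory stand-in for a brick inode. The struct and
function names are hypothetical and do not reflect the posix xlator's
actual API or on-disk mdata format.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical brick-side inode carrying the ctime feature's mdata. */
struct brick_inode {
    bool has_mdata;        /* is the mdata xattr present on disk? */
    struct timespec atime; /* times stored in the mdata xattr */
    struct timespec mtime;
};

/* Fixed setattr/fsetattr path for utimes/utimensat: update the mdata
 * xattr only if it already exists, never create it. Returns true if
 * the xattr was updated. */
static bool setattr_update_mdata(struct brick_inode *inode,
                                 struct timespec atime,
                                 struct timespec mtime)
{
    assert(inode != NULL);
    if (!inode->has_mdata)
        return false; /* leave bricks without mdata untouched */
    inode->atime = atime;
    inode->mtime = mtime;
    return true;
}
```

Because a brick without the xattr is left as it was, upgraded and
not-yet-upgraded bricks keep identical metadata for such files.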

Change-Id: Ieacedecb8a738bb437283ef3e0f042fd49dc4c8c
fixes: bz#1720201
Signed-off-by: Kotresh HR

tests: Fix split-brain-favorite-child-policy.t failure

2019-06-10T18:13:26+00:00

Problem:
The test case fails to heal the volume within $HEAL_TIMEOUT @195.
This happens because, as part of split-brain resolution, the file
gets expunged from the sink and the new-entry mark for that file is
set on the source bricks as part of impunging. Since the shd threads
on the source bricks fail to get the heal-domain lock, they wait for
the heal-timeout of 10 minutes, which is greater than $HEAL_TIMEOUT.

Fix:
Set cluster.heal-timeout to 5 seconds so that a heal is triggered and
one of the source bricks heals the file within $HEAL_TIMEOUT.
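The timing argument can be reduced to simple arithmetic. This is an
illustrative sketch, not shd code: it assumes the shd threads retry
once per heal-timeout period, and the 120-second $HEAL_TIMEOUT used in
the example is a hypothetical value, not the one from the test suite.

```c
#include <assert.h>

/* Counts how many heal-timeout retries fall within the test's
 * $HEAL_TIMEOUT window (both in seconds). Zero means the shd threads
 * that missed the heal-domain lock never retry before the test gives
 * up. */
static unsigned retries_within_window(unsigned heal_timeout_opt,
                                      unsigned test_heal_timeout)
{
    assert(heal_timeout_opt > 0);
    return test_heal_timeout / heal_timeout_opt;
}
```

With the 600-second default and a 120-second window this gives 0
retries; with cluster.heal-timeout set to 5 it gives 24.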

Change-Id: Ie73c578cc5361c0d617a48ccc86026734d20ba8c
fixes: bz#1718998
Signed-off-by: karthik-us

tests: Fix spurious failures in ta-write-on-bad-brick.t

2019-05-24T15:00:28+00:00

Problem:
afr_child_up_status_meta works only when LOOKUP on $M0 is successful.
There are cases where quorum is not met and LOOKUP fails on $M0 which
leads to failures similar to:
grep: /mnt/glusterfs/0/.meta/graphs/active/patchy-replicate-0/private: Transport endpoint is not connected
This happened intermittently, when md-cache (depending on
attribute-timeout) did not serve the lookup.

Fix:
Determine the child-up status from a statedump instead. Also change
the mount options to include --entry-timeout=0 and --attribute-timeout=0.

updates bz#1193929
Change-Id: Ic0de72c3006d7399a5feb3e4d10d4748949b2ab3
Signed-off-by: Pranith Kumar K

tests: improve and fix some test scripts

2019-05-09T09:24:10+00:00

Change-Id: Iceefe22af754096c599dc570d4894d14fce4deae
Updates: bz#1193929
Signed-off-by: Xavier Hernandez

cluster/afr: Invalidate inode on change of split-brain-choice

2019-04-05T09:48:50+00:00

When the split-brain choice is changed from one brick to another,
the inode is not invalidated, so readv calls are served from the
cache, leading to failures in split-brain-resolution.t.
Fix this by calling inode_invalidate() when the choice changes.
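The invalidation rule can be modelled as follows. The struct and both
functions are hypothetical stand-ins: model_inode_invalidate() only
mimics the effect of inode_invalidate() (dropping cached state) and is
not AFR's implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-inode read state. */
struct sb_inode {
    int choice;        /* chosen brick index, -1 if none */
    bool cache_valid;  /* stands in for data cached above AFR */
    int invalidations; /* how many times the cache was dropped */
};

/* Stand-in for inode_invalidate(): the next readv must go to a brick
 * instead of being served from cache. */
static void model_inode_invalidate(struct sb_inode *inode)
{
    inode->cache_valid = false;
    inode->invalidations++;
}

static void set_split_brain_choice(struct sb_inode *inode, int new_choice)
{
    assert(new_choice >= -1);
    if (new_choice == inode->choice)
        return; /* unchanged: cached data came from the right brick */
    inode->choice = new_choice;
    model_inode_invalidate(inode); /* cached data came from the old brick */
}
```

Re-selecting the same brick keeps the cache; switching bricks drops it.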

updates bz#1193929
Change-Id: I2624614eec38c0303f3e1dc55dfae3d4b864218b
Signed-off-by: Pranith Kumar K

cluster/thin-arbiter: Consider thin-arbiter before marking new entry changelog

2019-02-01T05:44:36+00:00

If a fop that creates an entry fails on one of the data bricks,
we mark the pending changelog on the entry on the bricks where
it succeeded. This is done in the post-op phase to make sure
that the entry gets healed even if it is renamed to some other
path whose parent was not marked as bad.

Since this happens in the post-op, we should consult the
thin-arbiter to check whether the brick on which the fop succeeded
is the good brick. This avoids split-brain and other issues.
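A minimal sketch of the post-op decision described above, assuming a
replica 2 + thin-arbiter volume where the thin-arbiter records at most
one bad data brick. The function and parameter names are hypothetical
and do not correspond to AFR's actual code.

```c
#include <assert.h>
#include <stdbool.h>

#define DATA_BRICKS 2 /* replica 2 + thin-arbiter */

/* success[i]: did the entry fop succeed on data brick i?
 * ta_bad_brick: brick recorded as bad by the thin-arbiter, -1 if none.
 * mark[i] is set when the new-entry changelog should be recorded on
 * brick i. Returns the number of bricks marked. */
static int mark_new_entry_changelog(const bool success[DATA_BRICKS],
                                    int ta_bad_brick,
                                    bool mark[DATA_BRICKS])
{
    assert(ta_bad_brick >= -1 && ta_bad_brick < DATA_BRICKS);
    bool all_ok = true;
    for (int i = 0; i < DATA_BRICKS; i++)
        all_ok = all_ok && success[i];

    int marked = 0;
    for (int i = 0; i < DATA_BRICKS; i++) {
        /* Mark only if the fop partially failed, it succeeded on this
         * brick, and the thin-arbiter does not consider it bad. */
        mark[i] = !all_ok && success[i] && i != ta_bad_brick;
        if (mark[i])
            marked++;
    }
    return marked;
}
```

Without the thin-arbiter check, a brick already known to be bad could
be marked as a heal source, which is the split-brain risk named above.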

Change-Id: I12686675be98f02f70a5186b3ed748c541514d53
updates: bz#1662264
Signed-off-by: Ashish Pandey

tests: run nfs tests only if --enable-gnfs is provided

2019-01-24T15:18:00+00:00

Fixes: bz#1665358
Change-Id: Idbf88ec3ac683733b32c313377eeb72f2819bf0d
Signed-off-by: Amar Tumballi

cluster/afr: Disable client side heals in AFR by default.

2019-01-10T11:10:27+00:00

With this changeset, the default value of the AFR client-side
heal volume option is set to "off".

fixes: bz#1663102
Change-Id: Ie4016932339c4896487e3e7cb5caca68739b7ba2
Signed-off-by: Sunil Kumar Acharya

cluster/ta: Check number/type of locks held on ta file

2018-12-27T05:46:36+00:00

Change-Id: Iec47856ce2819e7d7d38a60279602e53ba45858d
updates: bz#1624332
Signed-off-by: Ashish Pandey

cluster/afr: Add test for thin-arbiter feature

2018-11-26T07:46:10+00:00

Test: Check the success/failure of a write fop while
different bricks or the ta process are down.
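The outcomes such a test checks can be summarised with a simplified
model. This is an assumption-laden sketch, not the client's logic: it
assumes replica 2 + thin-arbiter, that a write with one data brick down
needs the thin-arbiter to record the other brick as bad, and it ignores
any thin-arbiter state the client may cache. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* brick_up[i]: state of data brick i; ta_up: thin-arbiter reachable;
 * ta_bad_brick: brick already recorded as bad on the ta file, -1 if
 * none. Returns whether the write is expected to succeed. */
static bool ta_write_succeeds(const bool brick_up[2], bool ta_up,
                              int ta_bad_brick)
{
    assert(ta_bad_brick >= -1 && ta_bad_brick < 2);
    if (brick_up[0] && brick_up[1])
        return true; /* both data bricks up */
    if (!brick_up[0] && !brick_up[1])
        return false; /* no data brick up */
    int up = brick_up[0] ? 0 : 1;
    /* One data brick down: the ta must be reachable to record the down
     * brick as bad, and must not already consider the surviving brick
     * bad. */
    return ta_up && ta_bad_brick != up;
}
```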

Change-Id: I3c376935df93ebf1f794c964bd19bc1280d91c59
updates: bz#1624332
Signed-off-by: Ashish Pandey