glusterfs.git/tests/bugs, branch v4.0.1

cluster/ec: avoid delays in self-heal

2018-03-15T07:20:10+00:00

Self-heal creates a thread per brick to sweep the index looking for
files that need to be healed. These threads are started before the
volume comes online, so they do nothing but wait for the next sweep,
which happens once per minute.

When a replace brick command is executed, the new graph is loaded and
all index sweeper threads are started. When all bricks have reported, a
getxattr request is sent to the root directory of the volume. This
causes a heal on it (because the new brick doesn't have good data),
and marks its contents as pending to be healed. That pending heal is
only picked up by the index sweeper thread on its next round, one
minute later.

This patch solves this problem by waking all index sweeper threads
after a successful check on the root directory.
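
The wake-up can be pictured with a minimal Python sketch. It is not the
ec code itself: the IndexSweeper class, its 60 second interval argument
and on_root_check_success() are hypothetical names chosen for
illustration only.

```python
import threading


class IndexSweeper:
    """Hypothetical per-brick sweeper that sleeps between sweeps."""

    def __init__(self, interval=60.0):
        self.interval = interval          # seconds between sweeps (assumed)
        self._wake = threading.Event()

    def wake(self):
        # Ask the sweeper to start its next sweep right away.
        self._wake.set()

    def wait_for_next_sweep(self):
        # Block until the interval expires or wake() is called.
        # Returns True when the sweeper was woken early.
        woken = self._wake.wait(self.interval)
        self._wake.clear()
        return woken


def on_root_check_success(sweepers):
    # After a successful check on the root directory, wake every sweeper
    # instead of letting each one sleep out the rest of its interval.
    for sweeper in sweepers:
        sweeper.wake()
```

With this shape, a sweeper that was woken returns from
wait_for_next_sweep() immediately, while an undisturbed one still waits
for the full interval.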

Additionally, the index sweep thread scans the index directory
sequentially, but after healing a directory entry, more index entries
may be created that the current directory scan skips. The remaining
entries are then processed on the next round, one minute later. The
same can happen in that round too, so the heal runs in bursts and
takes a long time to finish, especially on volumes with many
directory levels.

This patch solves this problem by immediately restarting the index
sweep if a directory has been healed.
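
A minimal Python sketch of the restart logic follows, under the
assumption that a sweep can be modelled as one pass over a snapshot of
the index. The names sweep_until_stable(), list_index() and heal() are
hypothetical, and the small directory tree is made-up sample data.

```python
def sweep_until_stable(list_index, heal):
    """Repeat the sweep while the previous pass healed a directory.

    list_index() returns a snapshot of the current index entries.
    heal(entry) heals one entry and returns True if it was a directory,
    which may have added new entries to the index.
    Returns the number of passes that were needed.
    """
    passes = 0
    while True:
        passes += 1
        healed_directory = False
        for entry in list_index():
            if heal(entry):
                healed_directory = True
        if not healed_directory:
            return passes


# Toy volume: healing a directory queues its children for heal.
tree = {"root": ["a", "b"], "a": ["c"]}
index = {"root"}


def heal(entry):
    index.discard(entry)
    children = tree.get(entry)
    if children is None:
        return False            # plain file, nothing new to index
    index.update(children)      # directory heal marks children pending
    return True


passes = sweep_until_stable(lambda: sorted(index), heal)
```

In this toy run the three directory levels are drained in three
back-to-back passes rather than in three rounds spaced one minute
apart.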

Backport of:
> BUG: 1547662

Change-Id: I58d9ab6ef17b30f704dc322e1d3d53b904e5f30e
BUG: 1555198
Signed-off-by: Xavi Hernandez

glusterd: import volumes in separate synctask

2018-02-21T15:35:15+00:00

With brick multiplexing, a brick can be attached to an existing brick
process only after the compatible brick has finished its
initialization and portmap sign-in. The thread might therefore have to
sleep and context switch the synctask to allow the brick process to
communicate with glusterd. In the normal code path this works fine, as
glusterd_restart_bricks () is launched through a separate synctask.

In case there's a mismatch of the volume when glusterd restarts,
glusterd_import_friend_volume is invoked and then it tries to call
glusterd_start_bricks () from the main thread, which may eventually
land in a similar situation. Since this is not done through a separate
synctask, the first brick never gets its turn to finish its
handshaking and, as a consequence, all the other bricks fail to get
attached to it.

Solution: Execute import volume and glusterd restart bricks in a
separate synctask. Importing snaps also had to be done through a
synctask, as the parent volume needs to be available for the snap
import functionality to work.

>mainline patch : https://review.gluster.org/#/c/19357/
                  https://review.gluster.org/#/c/19536/
                  https://review.gluster.org/#/c/19539/

Change-Id: I290b244d456afcc9b913ab30be4af040d340428c
BUG: 1543706
Signed-off-by: Atin Mukherjee 
(cherry picked from commit cb0339f9229fc5c05d7ef4cfcc4ca9c4569f3755)

afr: don't treat all cases all bricks being blamed as split-brain

2018-02-06T14:25:14+00:00

Problem:
We currently don't have a roll-back/undoing of post-ops if quorum is not
met. Though the FOP is still unwound with failure, the xattrs remain on
the disk. Due to these partial post-ops and partial heals (healing only
when 2 bricks are up), we can end up in split-brain purely from the afr
xattrs point of view, i.e. each brick is blamed by at least one of the
others. These scenarios are hit when there is frequent
connect/disconnect of the client/shd to the bricks while I/O or heal
are in progress.

Fix:
Instead of undoing the post-op, pick a source based on the xattr values.
If 2 bricks blame one, the blamed one must be treated as sink.
If there is no majority, all are sources. Once we pick a source,
self-heal will then do the heal instead of erroring out due to
split-brain.
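
The selection rule can be sketched in Python as follows. This is an
illustration only, assuming a 3-brick replica and a boolean blame
matrix; pick_sources() and the matrix layout are hypothetical and not
the afr data structures. The case where every brick is blamed by two
others is not described above and is not handled specially here.

```python
def pick_sources(blames):
    """Split bricks into sources and sinks from a blame matrix.

    blames[i][j] is True when brick i holds pending xattrs blaming
    brick j. A brick blamed by at least 2 others becomes a sink; if no
    brick reaches that count, every brick stays a source.
    """
    n = len(blames)
    sinks = []
    for j in range(n):
        accusers = sum(1 for i in range(n) if i != j and blames[i][j])
        if accusers >= 2:
            sinks.append(j)
    sources = [j for j in range(n) if j not in sinks]
    return sources, sinks


# Every brick is blamed by someone, but only brick 2 by a majority.
majority = [
    [False, True, True],
    [False, False, True],
    [True, False, False],
]

# Circular blame: 0 blames 1, 1 blames 2, 2 blames 0. No majority.
circular = [
    [False, True, False],
    [False, False, True],
    [True, False, False],
]
```

For the first matrix the sketch returns bricks 0 and 1 as sources and
brick 2 as the sink; for the circular case all three bricks remain
sources.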

Change-Id: I3d0224b883eb0945785ade0e9697a1c828aec0ae
BUG: 1542380
Signed-off-by: Ravishankar N 
(cherry picked from commit 0e6e8216823c2d9dafb81aae0f6ee3497c23d140)

md-cache: Implement dynamic configuration of xattr list for caching

2018-01-22T04:10:52+00:00

Currently, the list of xattrs that md-cache can cache is hard coded
in the md-cache.c file. This necessitates a code change and rebuild
every time a new xattr needs to be added to the md-cache xattr cache
list.

With this patch, the user will be able to configure a comma
separated list of xattrs to be cached by md-cache.
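
As a rough illustration of the configuration format, a comma separated
value could be split like this. The parse_xattr_list() helper and the
sample xattr names are hypothetical; the actual option name and parsing
live in md-cache and are not shown here.

```python
def parse_xattr_list(value):
    """Split a comma separated option value into xattr names.

    Surrounding whitespace and empty items are dropped.
    """
    return [name.strip() for name in value.split(",") if name.strip()]


configured = parse_xattr_list("user.foo, user.bar,,")
```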

Updates #297

Change-Id: Ie35ed607d17182d53f6bb6e6c6563ac52bc3132e
Signed-off-by: Poornima G

afr: add quorum checks in post-op

2018-01-19T08:10:45+00:00

afr relies on pending changelog xattrs to identify sources and sinks,
and these xattrs are set in post-op. So if post-op fails, we need to
unwind the write txn with a failure.
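
A minimal sketch of such a check, assuming quorum means a simple
majority of bricks unless a count is given. post_op_met_quorum() is a
hypothetical helper, not an afr function, and the real quorum rules
depend on the volume's quorum settings.

```python
def post_op_met_quorum(post_op_ok, quorum_count=None):
    """Return True if enough bricks completed the post-op.

    post_op_ok holds one boolean per brick. When quorum_count is not
    given, a simple majority is assumed.
    """
    if quorum_count is None:
        quorum_count = len(post_op_ok) // 2 + 1
    return sum(1 for ok in post_op_ok if ok) >= quorum_count
```

A write whose post-op succeeded on only one of three bricks would then
be unwound with a failure, since the changelog xattrs cannot be trusted
to identify the sources.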

Change-Id: I0f019ac03890108324ee7672883d774918b20be1
BUG: 1506140
Signed-off-by: Ravishankar N

upcall: Allow md-cache to specify invalidations on xattr with wildcard

2018-01-19T03:59:30+00:00

Currently, md-cache sends a list of xattrs it is interested in receiving
invalidations for, but it cannot specify any wildcard in the xattr names.
E.g. user.* - invalidate on updating any xattr with the user. prefix.

This patch enables upcall to honor wildcards in the xattr key names.
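
The matching behaviour can be illustrated with shell-style patterns
from Python's fnmatch module. This is only a sketch of the idea:
needs_invalidation() and the registered list are hypothetical, and the
matcher upcall really uses is not shown in this message.

```python
from fnmatch import fnmatchcase


def needs_invalidation(xattr_key, registered_patterns):
    """Return True if the updated key matches any registered pattern."""
    return any(fnmatchcase(xattr_key, pat) for pat in registered_patterns)


registered = ["user.*", "security.selinux"]
```

Here an update to user.comment matches user.* and would trigger an
invalidation, while an update to trusted.foo would not.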

Updates: #297

Change-Id: I98caf0ed72f11ef10770bf2067d4428880e0a03a
Signed-off-by: Poornima G

posix: delete stale gfid handles in nameless lookup

2018-01-16T03:45:03+00:00

..in order for self-heal of symlinks to work properly (see BZ for
details).

Change-Id: I9a011d00b07a690446f7fd3589e96f840e8b7501
BUG: 1529488
Signed-off-by: Ravishankar N

tests: check volume status for shd being up

2018-01-12T05:56:15+00:00

so that glusterd is also aware that shd is up and running.

While not reproducible locally, on the Jenkins slaves, 'gluster vol heal patchy'
fails with "Self-heal daemon is not running. Check self-heal daemon log file.",
while in fact the afr_child_up_status_in_shd() checks before that passed. In the
shd log also, I see the shd being up and connected to at least one brick before
the heal is launched.

Change-Id: Id3801fa4ab56a70b1f0bd6a7e240f69bea74a5fc
BUG: 1515163
Signed-off-by: Ravishankar N

Revert "rpc: merge ssl infra with epoll infra"

2018-01-07T03:55:51+00:00

This reverts commit 56e5fdae74845dfec0ff7ad0c8fee77695d36ad5.

Change-Id: Ia62cee5440bbe8e23f5da9cff692d792091d544a
Signed-off-by: Milind Changire

cluster/ec: OpenFD heal implementation for EC

2018-01-05T06:55:44+00:00

Existing EC code doesn't try to heal open FDs, even though
doing so would avoid unnecessary healing of the data later.

This fix implements the healing of open FDs before
carrying out file operations on them, by attempting
to open the FDs on the required up nodes.

BUG: 1431955
Change-Id: Ib696f59c41ffd8d5678a484b23a00bb02764ed15
Signed-off-by: Sunil Kumar Acharya