<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/xlators/cluster/ec/src/ec-heald.c, branch v8.2</title>
<subtitle></subtitle>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/'/>
<entry>
<title>cluster/ec: Improve detection of new heals</title>
<updated>2020-08-19T18:00:31+00:00</updated>
<author>
<name>Xavi Hernandez</name>
<email>xhernandez@redhat.com</email>
</author>
<published>2020-07-02T16:08:52+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=269ece312c9fd890c74c46e79de70efe1720752c'/>
<id>269ece312c9fd890c74c46e79de70efe1720752c</id>
<content type='text'>
When EC successfully healed a directory, it assumed that other entries
inside that directory might have been created, which could require
additional heal cycles. For this reason, when the heal happened as part
of an index heal iteration, it triggered a new iteration.

The problem happened when the directory was healthy, so no new entries
were added, but its index entry was not removed for some reason. In
this case self-heal entered an endless loop, healing the same directory
continuously and causing high CPU utilization.

This patch improves detection of new files added to the heal index so
that a new index heal iteration is only triggered if there is new work
to do.
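
As a rough illustration of this logic (hypothetical names, not the
actual ec-heald.c code), a healer pass only requests another iteration
when the set of pending index entries has actually changed since the
previous pass:

    #include &lt;stdbool.h&gt;
    #include &lt;stdio.h&gt;

    /* Hypothetical healer state: how many index entries the previous
     * pass observed. */
    struct healer {
        unsigned long last_seen;
    };

    /* One index heal pass.  'pending' stands in for the number of
     * entries currently sitting in the heal index.  Returns true only
     * when the index changed since the last pass, i.e. there is
     * genuinely new work that justifies another iteration. */
    static bool
    heal_pass(struct healer *h, unsigned long pending)
    {
        bool rerun = (pending != h-&gt;last_seen);

        h-&gt;last_seen = pending;
        return rerun;
    }

    int
    main(void)
    {
        struct healer h = { 0 };

        /* First pass sees 3 entries: another iteration is warranted. */
        printf("rerun: %d\n", heal_pass(&amp;h, 3));
        /* Nothing new was added: no extra iteration, no busy loop. */
        printf("rerun: %d\n", heal_pass(&amp;h, 3));
        return 0;
    }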

Change-Id: I2355742b85fbfa6de758bccc5d2e1a283c82b53f
Fixes: #1354
Signed-off-by: Xavi Hernandez &lt;xhernandez@redhat.com&gt;
</content>
</entry>
<entry>
<title>multiple files: another attempt to remove includes</title>
<updated>2019-06-14T16:50:32+00:00</updated>
<author>
<name>Yaniv Kaul</name>
<email>ykaul@redhat.com</email>
</author>
<published>2019-06-09T10:31:31+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=0a6fe8551ac9807a8b6ad62241ec8048cf9f9025'/>
<id>0a6fe8551ac9807a8b6ad62241ec8048cf9f9025</id>
<content type='text'>
There are many include statements that are not needed.
A previous, more ambitious attempt failed because of the *BSD platform
(see https://review.gluster.org/#/c/glusterfs/+/21929/ )

Now trying a more conservative reduction.
It does not solve all the circular deps that we have, but it
does reduce some of them. There is just too much to handle
reasonably (dht-common.h includes dht-lock.h, which includes
dht-common.h ...), but it does reduce the overall number of
include lines we need to look at in the future to understand
and fix the mess later on.
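
As a hypothetical illustration of the kind of cycle mentioned above
(foo.h and bar.h stand in for the real dht headers), a forward
declaration is one common way such a loop can be broken:

    /* foo.h -- only needs a pointer to struct bar, so a forward
     * declaration is enough and bar.h does not have to be included. */
    #ifndef FOO_H
    #define FOO_H

    struct bar;                /* forward declaration breaks the cycle */

    struct foo {
        struct bar *lock;      /* pointer only: bar's size not needed */
    };

    #endif /* FOO_H */

    /* bar.h -- needs the full definition of struct foo, so it includes
     * foo.h; since foo.h no longer includes bar.h, there is no loop. */
    #ifndef BAR_H
    #define BAR_H

    #include "foo.h"

    struct bar {
        struct foo owner;
    };

    #endif /* BAR_H */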

Change-Id: I550cd001bdefb8be0fe67632f783c0ef6bee3f9f
updates: bz#1193929
Signed-off-by: Yaniv Kaul &lt;ykaul@redhat.com&gt;
</content>
</entry>
<entry>
<title>ec/fini: Fix race with ec_fini and ec_notify</title>
<updated>2019-05-21T12:57:39+00:00</updated>
<author>
<name>Mohammed Rafi KC</name>
<email>rkavunga@redhat.com</email>
</author>
<published>2019-05-21T11:45:07+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=43ade7abac6c1d648ef337ace92194d36c8894a4'/>
<id>43ade7abac6c1d648ef337ace92194d36c8894a4</id>
<content type='text'>
During a graph cleanup, we first send a PARENT_DOWN and wait for
a CHILD_DOWN event to ultimately free the xlator and the graph.

In the ec xlator, we clean up the threads when we get a PARENT_DOWN event.
But a racing event like CHILD_UP or an xl_op event may trigger healing
threads after the threads have been cleaned up.

So there is a chance that the threads might access a freed private
variable.
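
A small sketch (hypothetical names, not the actual ec code) of the kind
of guard that prevents a late CHILD_UP from spawning healer threads once
fini has already begun tearing things down:

    #include &lt;pthread.h&gt;
    #include &lt;stdbool.h&gt;

    /* Hypothetical per-xlator private data. */
    struct ec_priv {
        pthread_mutex_t lock;
        bool shutting_down;  /* set once PARENT_DOWN processing starts */
    };

    /* Called from notify() on CHILD_UP or an xl_op.  Refuses to start
     * new healer threads once shutdown has begun, so they can never
     * touch private data that fini() is about to free. */
    static void
    maybe_start_healers(struct ec_priv *priv)
    {
        pthread_mutex_lock(&amp;priv-&gt;lock);
        if (!priv-&gt;shutting_down) {
            /* ... launch healer threads here ... */
        }
        pthread_mutex_unlock(&amp;priv-&gt;lock);
    }

    /* Called while handling PARENT_DOWN, before priv is freed. */
    static void
    stop_healers(struct ec_priv *priv)
    {
        pthread_mutex_lock(&amp;priv-&gt;lock);
        priv-&gt;shutting_down = true;
        pthread_mutex_unlock(&amp;priv-&gt;lock);
        /* ... join any healer threads that are still running ... */
    }

    int
    main(void)
    {
        struct ec_priv priv = { PTHREAD_MUTEX_INITIALIZER, false };

        maybe_start_healers(&amp;priv);  /* allowed: shutdown not started */
        stop_healers(&amp;priv);
        maybe_start_healers(&amp;priv);  /* ignored: shutdown already began */
        return 0;
    }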

Change-Id: I252d10181bb67b95900c903d479de707a8489532
fixes: bz#1703948
Signed-off-by: Mohammed Rafi KC &lt;rkavunga@redhat.com&gt;
</content>
</entry>
<entry>
<title>ec/shd: Cleanup self heal daemon resources during ec fini</title>
<updated>2019-05-13T06:18:14+00:00</updated>
<author>
<name>Mohammed Rafi KC</name>
<email>rkavunga@redhat.com</email>
</author>
<published>2019-04-29T07:52:32+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=e2adc9dc66dc46519007790ecd7dd57642dff0fd'/>
<id>e2adc9dc66dc46519007790ecd7dd57642dff0fd</id>
<content type='text'>
We were not properly cleaning up self-heal daemon resources
during ec fini. With shd multiplexing, it is absolutely
necessary to clean up all the resources during ec fini.
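
A rough sketch (hypothetical structure, not the actual shd code) of the
teardown order this implies: stop the healer before freeing anything it
may still be using:

    #include &lt;pthread.h&gt;
    #include &lt;stdlib.h&gt;

    /* Hypothetical healer resources owned by one ec instance. */
    struct shd {
        pthread_t thread;
        int       running;
        void     *index_cache;
    };

    /* fini-time cleanup: stop the healer thread first, then release
     * everything it might still have been using. */
    static void
    shd_fini(struct shd *shd)
    {
        if (shd-&gt;running) {
            pthread_join(shd-&gt;thread, NULL);
            shd-&gt;running = 0;
        }
        free(shd-&gt;index_cache);
        shd-&gt;index_cache = NULL;
    }

    int
    main(void)
    {
        struct shd shd = { 0 };

        shd_fini(&amp;shd);  /* nothing running: join skipped, free(NULL) ok */
        return 0;
    }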

Change-Id: Iae4f1bce7d8c2e1da51ac568700a51088f3cc7f2
fixes: bz#1703948
Signed-off-by: Mohammed Rafi KC &lt;rkavunga@redhat.com&gt;
</content>
</entry>
<entry>
<title>cluster/ec: fix shd healer wait timeout</title>
<updated>2019-05-06T10:56:13+00:00</updated>
<author>
<name>Kinglong Mee</name>
<email>kinglongmee@gmail.com</email>
</author>
<published>2019-04-25T10:20:25+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=0d85c6abf5002ff88f17a5292cdde45847154d25'/>
<id>0d85c6abf5002ff88f17a5292cdde45847154d25</id>
<content type='text'>
Right now the timeout is hard-coded;
fix it by using the heal-timeout option.
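
A condensed sketch (hypothetical names) of what driving the wait from
the configured heal-timeout instead of a hard-coded constant looks like:

    #include &lt;pthread.h&gt;
    #include &lt;time.h&gt;

    /* Hypothetical healer wait state. */
    struct shd_wait {
        pthread_mutex_t mutex;
        pthread_cond_t  cond;
        int             heal_timeout;  /* seconds, from the volume option */
    };

    /* Sleep until new work is signalled or heal-timeout expires.  The
     * deadline used to come from a hard-coded constant; now it comes
     * from the configured option. */
    static void
    shd_wait_for_work(struct shd_wait *w)
    {
        struct timespec deadline;

        clock_gettime(CLOCK_REALTIME, &amp;deadline);
        deadline.tv_sec += w-&gt;heal_timeout;

        pthread_mutex_lock(&amp;w-&gt;mutex);
        pthread_cond_timedwait(&amp;w-&gt;cond, &amp;w-&gt;mutex, &amp;deadline);
        pthread_mutex_unlock(&amp;w-&gt;mutex);
    }

    int
    main(void)
    {
        struct shd_wait w = { PTHREAD_MUTEX_INITIALIZER,
                              PTHREAD_COND_INITIALIZER, 1 };

        shd_wait_for_work(&amp;w);   /* returns after about one second */
        return 0;
    }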

fixes: bz#1703020
Change-Id: I0d154e7807f9dba7efc3896805559bbfaa7af2ad
Signed-off-by: Kinglong Mee &lt;kinglongmee@gmail.com&gt;
</content>
</entry>
<entry>
<title>libglusterfs: Move devel headers under glusterfs directory</title>
<updated>2018-12-05T21:47:04+00:00</updated>
<author>
<name>ShyamsundarR</name>
<email>srangana@redhat.com</email>
</author>
<published>2018-11-29T19:08:06+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=20ef211cfa5b5fcc437484a879fdc5d4c66bbaf5'/>
<id>20ef211cfa5b5fcc437484a879fdc5d4c66bbaf5</id>
<content type='text'>
libglusterfs devel package headers are referenced in code using the
include semantics of a program's own headers. While this works, it can
be improved, especially when dealing with out-of-tree xlator builds or
out-of-tree devel package usage in general.

Towards this, the following changes are done:
- Moved all devel headers under a glusterfs directory
- Included these headers using the system header notation &lt;&gt; in all
  code outside of libglusterfs
- Included these headers using the own-program notation "" within
  libglusterfs

This change, although big, just moves the headers around and makes
their inclusion from other sources correct.

This helps us include libglusterfs headers correctly, without
namespace conflicts.
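
For illustration, the two notations mentioned above look like this to a
consumer (glusterfs.h and xlator.h are real headers; the split shown is
the one described above):

    /* Outside libglusterfs (xlators, out-of-tree code): system header
     * notation, resolved from the installed include path, with the new
     * glusterfs/ prefix avoiding clashes in the global namespace. */
    #include &lt;glusterfs/glusterfs.h&gt;
    #include &lt;glusterfs/xlator.h&gt;

    /* Within libglusterfs itself: own-program notation, resolved
     * relative to the source tree. */
    #include "glusterfs.h"
    #include "xlator.h"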

Change-Id: Id2a98854e671a7ee5d73be44da5ba1a74252423b
Updates: bz#1193929
Signed-off-by: ShyamsundarR &lt;srangana@redhat.com&gt;
</content>
</entry>
<entry>
<title>cluster/ec: prevent infinite loop in self-heal full</title>
<updated>2018-10-31T11:50:32+00:00</updated>
<author>
<name>Xavi Hernandez</name>
<email>xhernandez@redhat.com</email>
</author>
<published>2018-10-31T11:26:43+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=7150c51ad75ccba22045a35fc31e5037612d1ad4'/>
<id>7150c51ad75ccba22045a35fc31e5037612d1ad4</id>
<content type='text'>
There was a problem in commit 7f81067 that caused an infinite loop when
a full heal was triggered.

The previous commit was made to prevent self-heal from going idle after
a replace-brick operation. One of the changes consisted of setting a
flag to force an immediate scan of the dirty directory if a heal on
a directory succeeded (assuming it could have generated newer entries).

However, that change caused an issue with a full self-heal: every time
an already healed directory was checked and returned successfully, the
flag was set again, forcing self-heal to start over.

This patch fixes this issue by only setting the flag if the heal is not
full. It's assumed that a full self-heal will already traverse all
entries automatically, so there's no need to force a new scan later.
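
A tiny sketch (hypothetical names) of the guard this patch describes:
the rescan flag is only raised for index heals, never during a full
sweep:

    #include &lt;stdbool.h&gt;

    /* Hypothetical healer context. */
    struct healer_ctx {
        bool full;      /* true while a full self-heal sweep is running */
        bool rescan;    /* request an immediate new scan of the index   */
    };

    /* Called after a directory heals successfully.  During a full sweep
     * every directory will be visited anyway, so raising the flag there
     * only restarts the sweep forever; raise it for index heals only. */
    static void
    on_dir_healed(struct healer_ctx *ctx)
    {
        if (!ctx-&gt;full) {
            ctx-&gt;rescan = true;
        }
    }

    int
    main(void)
    {
        struct healer_ctx index_heal = { false, false };
        struct healer_ctx full_heal = { true, false };

        on_dir_healed(&amp;index_heal);   /* rescan becomes true */
        on_dir_healed(&amp;full_heal);    /* rescan stays false  */
        return (int)full_heal.rescan; /* 0: no infinite loop */
    }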

Change-Id: Id12dbfc04e622b18183e796cc6cc87ccc30a6d55
fixes: bz#1636631
Signed-off-by: Xavi Hernandez &lt;xhernandez@redhat.com&gt;
</content>
</entry>
<entry>
<title>Land part 2 of clang-format changes</title>
<updated>2018-09-12T12:22:45+00:00</updated>
<author>
<name>Gluster Ant</name>
<email>bugzilla-bot@gluster.org</email>
</author>
<published>2018-09-12T12:22:45+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=e16868dede6455cab644805af6fe1ac312775e13'/>
<id>e16868dede6455cab644805af6fe1ac312775e13</id>
<content type='text'>
Change-Id: Ia84cc24c8924e6d22d02ac15f611c10e26db99b4
Signed-off-by: Nigel Babu &lt;nigelb@redhat.com&gt;
</content>
</entry>
<entry>
<title>cluster/ec: Fix Coverity issue</title>
<updated>2018-08-31T12:56:49+00:00</updated>
<author>
<name>Ashish Pandey</name>
<email>aspandey@redhat.com</email>
</author>
<published>2018-08-30T09:38:02+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=67cfacaa9d5ce31240a4d39e81b2984a57ac8865'/>
<id>67cfacaa9d5ce31240a4d39e81b2984a57ac8865</id>
<content type='text'>
Fix the following Coverity issues:

CID:
1382378
1382459

https://scan6.coverity.com/reports.htm#v42607/p10714/fileInstanceId=85091670&amp;defectInstanceId=25915064&amp;mergedDefectId=1382459
https://scan6.coverity.com/reports.htm#v42607/p10714/fileInstanceId=85091670&amp;defectInstanceId=25915063&amp;mergedDefectId=1382378

Problem:
The ASSERT_LOCAL(this, healer) function is supposed to get the local
healer so that we can take advantage of it while healing and reading
data.

However, we are not using healer-&gt;local anywhere.
Also, this is not as useful in the context of EC as it is in AFR.
In EC we have to read fragments from 4 bricks to heal a bad
fragment on another brick.

Change-Id: Iea8ce127ea02cc84e3823cb2be82a47872217b33
updates: bz#789278
Signed-off-by: Ashish Pandey &lt;aspandey@redhat.com&gt;
</content>
</entry>
<entry>
<title>cluster/ec: fix SHD crash for null gfid's</title>
<updated>2018-03-21T16:27:36+00:00</updated>
<author>
<name>Xavi Hernandez</name>
<email>xhernandez@redhat.com</email>
</author>
<published>2018-03-20T09:57:13+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=b5f307fa5e7241dd000f0eeac27cc4638a5bccf8'/>
<id>b5f307fa5e7241dd000f0eeac27cc4638a5bccf8</id>
<content type='text'>
When the self-heal daemon is doing a full sweep, it uses readdirp to
get extra stat information for each file. This information is
obtained in two steps by the posix xlator: first the directory is
read to get the entries, and then each entry is stat'd to get
additional info. Between these two steps it's possible that the file
is removed by the user, so we'll get an error, leaving the stat info
empty.

EC's heal daemon was using the gfid blindly, causing an assert failure
when protocol/client was trying to encode the gfid.

To fix the problem a check has been added. If we detect a null gfid, we
simply ignore it and continue healing.
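
A simplified sketch of the kind of check described above
(uuid_is_null() is the libuuid call; the surrounding names are
hypothetical, as the real code uses glusterfs' own gfid helpers):

    #include &lt;uuid/uuid.h&gt;
    #include &lt;stdio.h&gt;

    /* Hypothetical per-entry handler in the heal sweep.  When readdirp
     * lost the race with an unlink, the stat information (and with it
     * the gfid) comes back empty; encoding such a gfid is what tripped
     * the assert. */
    static void
    heal_entry(const uuid_t gfid, const char *name)
    {
        if (uuid_is_null(gfid)) {
            /* File vanished between the readdir and the stat: nothing
             * to heal here, just move on to the next entry. */
            return;
        }
        printf("healing %s\n", name);
    }

    int
    main(void)
    {
        uuid_t missing = { 0 };     /* all-zero gfid, as left by the race */
        uuid_t present;

        uuid_generate(present);
        heal_entry(missing, "gone");       /* skipped */
        heal_entry(present, "still-here"); /* healed  */
        return 0;
    }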

Change-Id: I2e4acdcecd0b6951055e50d1c37d686a2186a228
BUG: 1558016
Signed-off-by: Xavi Hernandez &lt;xhernandez@redhat.com&gt;
</content>
</entry>
</feed>
