<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/xlators/cluster/ec/src/ec-heald.c, branch v8.2</title>
<subtitle></subtitle>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/'/>
<entry>
<title>cluster/ec: Improve detection of new heals</title>
<updated>2020-08-19T18:00:31+00:00</updated>
<author>
<name>Xavi Hernandez</name>
<email>xhernandez@redhat.com</email>
</author>
<published>2020-07-02T16:08:52+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=269ece312c9fd890c74c46e79de70efe1720752c'/>
<id>269ece312c9fd890c74c46e79de70efe1720752c</id>
<content type='text'>
When EC successfully healed a directory, it assumed that other entries
inside that directory might have been created, which could require
additional heal cycles. For this reason, when the heal happened as part
of an index heal iteration, it triggered a new iteration.

The problem happened when the directory was healthy, so no new entries
were added, but its index entry was not removed for some reason. In
this case self-heal entered an endless loop, healing the same directory
continuously and causing high CPU utilization.

This patch improves detection of new files added to the heal index so
that a new index heal iteration is only triggered if there is new work
to do.
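
As a rough illustration of this logic (hypothetical names, not the
actual ec-heald.c code), a healer pass only requests another iteration
when the set of pending index entries has actually changed since the
previous pass:

    #include &lt;stdbool.h&gt;
    #include &lt;stdio.h&gt;

    /* Hypothetical healer state: how many index entries the previous
     * pass observed. */
    struct healer {
        unsigned long last_seen;
    };

    /* One index heal pass.  'pending' stands in for the number of
     * entries currently sitting in the heal index.  Returns true only
     * when the index changed since the last pass, i.e. there is
     * genuinely new work that justifies another iteration. */
    static bool
    heal_pass(struct healer *h, unsigned long pending)
    {
        bool rerun = (pending != h-&gt;last_seen);

        h-&gt;last_seen = pending;
        return rerun;
    }

    int
    main(void)
    {
        struct healer h = { 0 };

        /* First pass sees 3 entries: another iteration is warranted. */
        printf("rerun: %d\n", heal_pass(&amp;h, 3));
        /* Nothing new was added: no extra iteration, no busy loop. */
        printf("rerun: %d\n", heal_pass(&amp;h, 3));
        return 0;
    }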

Change-Id: I2355742b85fbfa6de758bccc5d2e1a283c82b53f
Fixes: #1354
Signed-off-by: Xavi Hernandez &lt;xhernandez@redhat.com&gt;
</content>
</entry>
<entry>
<title>multiple files: another attempt to remove includes</title>
<updated>2019-06-14T16:50:32+00:00</updated>
<author>
<name>Yaniv Kaul</name>
<email>ykaul@redhat.com</email>
</author>
<published>2019-06-09T10:31:31+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=0a6fe8551ac9807a8b6ad62241ec8048cf9f9025'/>
<id>0a6fe8551ac9807a8b6ad62241ec8048cf9f9025</id>
<content type='text'>
There are many include statements that are not needed.
A previous, more ambitious attempt failed because of the *BSD platform
(see https://review.gluster.org/#/c/glusterfs/+/21929/ )

Now trying a more conservative reduction.
It does not solve all the circular deps that we have, but it
does reduce some of them. There is just too much to handle
reasonably (dht-common.h includes dht-lock.h, which includes
dht-common.h ...), but it does reduce the overall number of
include lines we need to look at in the future to understand
and fix the mess later on.
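
As a hypothetical illustration of the kind of cycle mentioned above
(foo.h and bar.h stand in for the real dht headers), a forward
declaration is one common way such a loop can be broken:

    /* foo.h -- only needs a pointer to struct bar, so a forward
     * declaration is enough and bar.h does not have to be included. */
    #ifndef FOO_H
    #define FOO_H

    struct bar;                /* forward declaration breaks the cycle */

    struct foo {
        struct bar *lock;      /* pointer only: bar's size not needed */
    };

    #endif /* FOO_H */

    /* bar.h -- needs the full definition of struct foo, so it includes
     * foo.h; since foo.h no longer includes bar.h, there is no loop. */
    #ifndef BAR_H
    #define BAR_H

    #include "foo.h"

    struct bar {
        struct foo owner;
    };

    #endif /* BAR_H */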

Change-Id: I550cd001bdefb8be0fe67632f783c0ef6bee3f9f
updates: bz#1193929
Signed-off-by: Yaniv Kaul &lt;ykaul@redhat.com&gt;
</content>
</entry>
<entry>
<title>ec/fini: Fix race with ec_fini and ec_notify</title>
<updated>2019-05-21T12:57:39+00:00</updated>
<author>
<name>Mohammed Rafi KC</name>
<email>rkavunga@redhat.com</email>
</author>
<published>2019-05-21T11:45:07+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=43ade7abac6c1d648ef337ace92194d36c8894a4'/>
<id>43ade7abac6c1d648ef337ace92194d36c8894a4</id>
<content type='text'>
During a graph cleanup, we first send a PARENT_DOWN and wait for
a CHILD_DOWN event to ultimately free the xlator and the graph.

In the ec xlator, we clean up the threads when we get a PARENT_DOWN event.
But a racing event like CHILD_UP or an xl_op event may trigger healing
threads after the threads have been cleaned up.

So there is a chance that the threads might access a freed private
variable.
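
A small sketch (hypothetical names, not the actual ec code) of the kind
of guard that prevents a late CHILD_UP from spawning healer threads once
fini has already begun tearing things down:

    #include &lt;pthread.h&gt;
    #include &lt;stdbool.h&gt;

    /* Hypothetical per-xlator private data. */
    struct ec_priv {
        pthread_mutex_t lock;
        bool shutting_down;  /* set once PARENT_DOWN processing starts */
    };

    /* Called from notify() on CHILD_UP or an xl_op.  Refuses to start
     * new healer threads once shutdown has begun, so they can never
     * touch private data that fini() is about to free. */
    static void
    maybe_start_healers(struct ec_priv *priv)
    {
        pthread_mutex_lock(&amp;priv-&gt;lock);
        if (!priv-&gt;shutting_down) {
            /* ... launch healer threads here ... */
        }
        pthread_mutex_unlock(&amp;priv-&gt;lock);
    }

    /* Called while handling PARENT_DOWN, before priv is freed. */
    static void
    stop_healers(struct ec_priv *priv)
    {
        pthread_mutex_lock(&amp;priv-&gt;lock);
        priv-&gt;shutting_down = true;
        pthread_mutex_unlock(&amp;priv-&gt;lock);
        /* ... join any healer threads that are still running ... */
    }

    int
    main(void)
    {
        struct ec_priv priv = { PTHREAD_MUTEX_INITIALIZER, false };

        maybe_start_healers(&amp;priv);  /* allowed: shutdown not started */
        stop_healers(&amp;priv);
        maybe_start_healers(&amp;priv);  /* ignored: shutdown already began */
        return 0;
    }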

Change-Id: I252d10181bb67b95900c903d479de707a8489532
fixes: bz#1703948
Signed-off-by: Mohammed Rafi KC &lt;rkavunga@redhat.com&gt;
</content>
</entry>
<entry>
<title>ec/shd: Cleanup self heal daemon resources during ec fini</title>
<updated>2019-05-13T06:18:14+00:00</updated>
<author>
<name>Mohammed Rafi KC</name>
<email>rkavunga@redhat.com</email>
</author>
<published>2019-04-29T07:52:32+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=e2adc9dc66dc46519007790ecd7dd57642dff0fd'/>
<id>e2adc9dc66dc46519007790ecd7dd57642dff0fd</id>
<content type='text'>
We were not properly cleaning up self-heal daemon resources
during ec fini. With shd multiplexing, it is absolutely
necessary to clean up all the resources during ec fini.
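
A rough sketch (hypothetical structure, not the actual shd code) of the
teardown order this implies: stop the healer before freeing anything it
may still be using:

    #include &lt;pthread.h&gt;
    #include &lt;stdlib.h&gt;

    /* Hypothetical healer resources owned by one ec instance. */
    struct shd {
        pthread_t thread;
        int       running;
        void     *index_cache;
    };

    /* fini-time cleanup: stop the healer thread first, then release
     * everything it might still have been using. */
    static void
    shd_fini(struct shd *shd)
    {
        if (shd-&gt;running) {
            pthread_join(shd-&gt;thread, NULL);
            shd-&gt;running = 0;
        }
        free(shd-&gt;index_cache);
        shd-&gt;index_cache = NULL;
    }

    int
    main(void)
    {
        struct shd shd = { 0 };

        shd_fini(&amp;shd);  /* nothing running: join skipped, free(NULL) ok */
        return 0;
    }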

Change-Id: Iae4f1bce7d8c2e1da51ac568700a51088f3cc7f2
fixes: bz#1703948
Signed-off-by: Mohammed Rafi KC &lt;rkavunga@redhat.com&gt;
</content>
</entry>
<entry>
<title>cluster/ec: fix shd healer wait timeout</title>
<updated>2019-05-06T10:56:13+00:00</updated>
<author>
<name>Kinglong Mee</name>
<email>kinglongmee@gmail.com</email>
</author>
<published>2019-04-25T10:20:25+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=0d85c6abf5002ff88f17a5292cdde45847154d25'/>
<id>0d85c6abf5002ff88f17a5292cdde45847154d25</id>
<content type='text'>
Right now the timeout is hard-coded;
fix it by using the heal-timeout option.
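
A condensed sketch (hypothetical names) of what driving the wait from
the configured heal-timeout instead of a hard-coded constant looks like:

    #include &lt;pthread.h&gt;
    #include &lt;time.h&gt;

    /* Hypothetical healer wait state. */
    struct shd_wait {
        pthread_mutex_t mutex;
        pthread_cond_t  cond;
        int             heal_timeout;  /* seconds, from the volume option */
    };

    /* Sleep until new work is signalled or heal-timeout expires.  The
     * deadline used to come from a hard-coded constant; now it comes
     * from the configured option. */
    static void
    shd_wait_for_work(struct shd_wait *w)
    {
        struct timespec deadline;

        clock_gettime(CLOCK_REALTIME, &amp;deadline);
        deadline.tv_sec += w-&gt;heal_timeout;

        pthread_mutex_lock(&amp;w-&gt;mutex);
        pthread_cond_timedwait(&amp;w-&gt;cond, &amp;w-&gt;mutex, &amp;deadline);
        pthread_mutex_unlock(&amp;w-&gt;mutex);
    }

    int
    main(void)
    {
        struct shd_wait w = { PTHREAD_MUTEX_INITIALIZER,
                              PTHREAD_COND_INITIALIZER, 1 };

        shd_wait_for_work(&amp;w);   /* returns after about one second */
        return 0;
    }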

fixes: bz#1703020
Change-Id: I0d154e7807f9dba7efc3896805559bbfaa7af2ad
Signed-off-by: Kinglong Mee &lt;kinglongmee@gmail.com&gt;
</content>
</entry>
<entry>
<title>libglusterfs: Move devel headers under glusterfs directory</title>
<updated>2018-12-05T21:47:04+00:00</updated>
<author>
<name>ShyamsundarR</name>
<email>srangana@redhat.com</email>
</author>
<published>2018-11-29T19:08:06+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=20ef211cfa5b5fcc437484a879fdc5d4c66bbaf5'/>
<id>20ef211cfa5b5fcc437484a879fdc5d4c66bbaf5</id>
<content type='text'>
libglusterfs devel package headers are referenced in code using the
include semantics of a program's own headers. While this works, it can
be improved, especially when dealing with out-of-tree xlator builds or
out-of-tree devel package usage in general.

Towards this, the following changes are done:
- Moved all devel headers under a glusterfs directory
- Included these headers using the system header notation &lt;&gt; in all
  code outside of libglusterfs
- Included these headers using the own-program notation "" within
  libglusterfs

This change, although big, just moves the headers around and makes
their inclusion from other sources correct.

This helps us include libglusterfs headers correctly, without
namespace conflicts.
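
For illustration, the two notations mentioned above look like this to a
consumer (glusterfs.h and xlator.h are real headers; the split shown is
the one described above):

    /* Outside libglusterfs (xlators, out-of-tree code): system header
     * notation, resolved from the installed include path, with the new
     * glusterfs/ prefix avoiding clashes in the global namespace. */
    #include &lt;glusterfs/glusterfs.h&gt;
    #include &lt;glusterfs/xlator.h&gt;

    /* Within libglusterfs itself: own-program notation, resolved
     * relative to the source tree. */
    #include "glusterfs.h"
    #include "xlator.h"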

Change-Id: Id2a98854e671a7ee5d73be44da5ba1a74252423b
Updates: bz#1193929
Signed-off-by: ShyamsundarR &lt;srangana@redhat.com&gt;
</content>
</entry>
<entry>
<title>cluster/ec: prevent infinite loop in self-heal full</title>
<updated>2018-10-31T11:50:32+00:00</updated>
<author>
<name>Xavi Hernandez</name>
<email>xhernandez@redhat.com</email>
</author>
<published>2018-10-31T11:26:43+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=7150c51ad75ccba22045a35fc31e5037612d1ad4'/>
<id>7150c51ad75ccba22045a35fc31e5037612d1ad4</id>
<content type='text'>
There was a problem in commit 7f81067 that caused an infinite loop when
a full heal was triggered.

The previous commit was made to prevent self-heal from going idle after
a replace-brick operation. One of the changes consisted of setting a
flag to force an immediate scan of the dirty directory if a heal on
a directory succeeded (assuming it could have generated newer entries).

However, that change caused an issue with a full self-heal: every time
an already healed directory was checked and returned successfully, the
flag was set again, forcing self-heal to start over.

This patch fixes this issue by only setting the flag if the heal is not
full. It's assumed that a full self-heal will already traverse all
entries automatically, so there's no need to force a new scan later.
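
A tiny sketch (hypothetical names) of the guard this patch describes:
the rescan flag is only raised for index heals, never during a full
sweep:

    #include &lt;stdbool.h&gt;

    /* Hypothetical healer context. */
    struct healer_ctx {
        bool full;      /* true while a full self-heal sweep is running */
        bool rescan;    /* request an immediate new scan of the index   */
    };

    /* Called after a directory heals successfully.  During a full sweep
     * every directory will be visited anyway, so raising the flag there
     * only restarts the sweep forever; raise it for index heals only. */
    static void
    on_dir_healed(struct healer_ctx *ctx)
    {
        if (!ctx-&gt;full) {
            ctx-&gt;rescan = true;
        }
    }

    int
    main(void)
    {
        struct healer_ctx index_heal = { false, false };
        struct healer_ctx full_heal = { true, false };

        on_dir_healed(&amp;index_heal);   /* rescan becomes true */
        on_dir_healed(&amp;full_heal);    /* rescan stays false  */
        return (int)full_heal.rescan; /* 0: no infinite loop */
    }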

Change-Id: Id12dbfc04e622b18183e796cc6cc87ccc30a6d55
fixes: bz#1636631
Signed-off-by: Xavi Hernandez &lt;xhernandez@redhat.com&gt;
</content>
</entry>
<entry>
<title>Land part 2 of clang-format changes</title>
<updated>2018-09-12T12:22:45+00:00</updated>
<author>
<name>Gluster Ant</name>
<email>bugzilla-bot@gluster.org</email>
</author>
<published>2018-09-12T12:22:45+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=e16868dede6455cab644805af6fe1ac312775e13'/>
<id>e16868dede6455cab644805af6fe1ac312775e13</id>
<content type='text'>
Change-Id: Ia84cc24c8924e6d22d02ac15f611c10e26db99b4
Signed-off-by: Nigel Babu &lt;nigelb@redhat.com&gt;
</content>
</entry>
<entry>
<title>cluster/ec: Fix Coverity issue</title>
<updated>2018-08-31T12:56:49+00:00</updated>
<author>
<name>Ashish Pandey</name>
<email>aspandey@redhat.com</email>
</author>
<published>2018-08-30T09:38:02+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=67cfacaa9d5ce31240a4d39e81b2984a57ac8865'/>
<id>67cfacaa9d5ce31240a4d39e81b2984a57ac8865</id>
<content type='text'>
Fix the following Coverity issues:

CID:
1382378
1382459

https://scan6.coverity.com/reports.htm#v42607/p10714/fileInstanceId=85091670&amp;defectInstanceId=25915064&amp;mergedDefectId=1382459
https://scan6.coverity.com/reports.htm#v42607/p10714/fileInstanceId=85091670&amp;defectInstanceId=25915063&amp;mergedDefectId=1382378

Problem:
The ASSERT_LOCAL(this, healer) function is supposed to get the local
healer so that we can take advantage of it while healing and reading
data.

However, we are not using healer-&gt;local anywhere.
Also, this is not as useful in the context of EC as it is in AFR.
In EC we have to read fragments from 4 bricks to heal a bad
fragment on another brick.

Change-Id: Iea8ce127ea02cc84e3823cb2be82a47872217b33
updates: bz#789278
Signed-off-by: Ashish Pandey &lt;aspandey@redhat.com&gt;
</content>
</entry>
<entry>
<title>cluster/ec: fix SHD crash for null gfid's</title>
<updated>2018-03-21T16:27:36+00:00</updated>
<author>
<name>Xavi Hernandez</name>
<email>xhernandez@redhat.com</email>
</author>
<published>2018-03-20T09:57:13+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=b5f307fa5e7241dd000f0eeac27cc4638a5bccf8'/>
<id>b5f307fa5e7241dd000f0eeac27cc4638a5bccf8</id>
<content type='text'>
When the self-heal daemon is doing a full sweep, it uses readdirp to
get extra stat information for each file. This information is
obtained in two steps by the posix xlator: first the directory is
read to get the entries, and then each entry is stat'd to get
additional info. Between these two steps it's possible that the file
is removed by the user, so we'll get an error, leaving the stat info
empty.

EC's heal daemon was using the gfid blindly, causing an assert failure
when protocol/client was trying to encode the gfid.

To fix the problem a check has been added. If we detect a null gfid, we
simply ignore it and continue healing.
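
A simplified sketch of the kind of check described above
(uuid_is_null() is the libuuid call; the surrounding names are
hypothetical, as the real code uses glusterfs' own gfid helpers):

    #include &lt;uuid/uuid.h&gt;
    #include &lt;stdio.h&gt;

    /* Hypothetical per-entry handler in the heal sweep.  When readdirp
     * lost the race with an unlink, the stat information (and with it
     * the gfid) comes back empty; encoding such a gfid is what tripped
     * the assert. */
    static void
    heal_entry(const uuid_t gfid, const char *name)
    {
        if (uuid_is_null(gfid)) {
            /* File vanished between the readdir and the stat: nothing
             * to heal here, just move on to the next entry. */
            return;
        }
        printf("healing %s\n", name);
    }

    int
    main(void)
    {
        uuid_t missing = { 0 };     /* all-zero gfid, as left by the race */
        uuid_t present;

        uuid_generate(present);
        heal_entry(missing, "gone");       /* skipped */
        heal_entry(present, "still-here"); /* healed  */
        return 0;
    }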

Change-Id: I2e4acdcecd0b6951055e50d1c37d686a2186a228
BUG: 1558016
Signed-off-by: Xavi Hernandez &lt;xhernandez@redhat.com&gt;
</content>
</entry>
</feed>
