<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/xlators/cluster/afr/src/afr-self-heal-name.c, branch v6.9</title>
<subtitle></subtitle>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/'/>
<entry>
<title>cluster/afr: fix race when bricks come up</title>
<updated>2020-04-20T07:36:23+00:00</updated>
<author>
<name>Xavi Hernandez</name>
<email>xhernandez@redhat.com</email>
</author>
<published>2020-03-01T18:49:04+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=5d5bcea32953680ab19afbeb620fcc0316d3f9cb'/>
<id>5d5bcea32953680ab19afbeb620fcc0316d3f9cb</id>
<content type='text'>
The was a problem when self-heal was sending lookups at the same time
that one of the bricks was coming up. In this case there was a chance
that the number of 'up' bricks changes in the middle of sending the
requests to subvolumes which caused a discrepancy in the expected
number of replies and the actual number of sent requests.

This discrepancy caused that AFR continued executing requests before
all requests were complete. Eventually, the frame of the pending
request was destroyed when the operation terminated, causing a use-
after-free issue when the answer was finally received.

In theory the same thing could happen in the reverse way, i.e. AFR
tries to wait for more replies than sent requests, causing a hang.

Backport of:
&gt; Change-Id: I7ed6108554ca379d532efb1a29b2de8085410b70
&gt; Signed-off-by: Xavi Hernandez &lt;xhernandez@redhat.com&gt;
&gt; Fixes: bz#1808875

Change-Id: I7ed6108554ca379d532efb1a29b2de8085410b70
Signed-off-by: Xavi Hernandez &lt;xhernandez@redhat.com&gt;
Fixes: bz#1809439
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The was a problem when self-heal was sending lookups at the same time
that one of the bricks was coming up. In this case there was a chance
that the number of 'up' bricks changes in the middle of sending the
requests to subvolumes which caused a discrepancy in the expected
number of replies and the actual number of sent requests.

This discrepancy caused that AFR continued executing requests before
all requests were complete. Eventually, the frame of the pending
request was destroyed when the operation terminated, causing a use-
after-free issue when the answer was finally received.

In theory the same thing could happen in the reverse way, i.e. AFR
tries to wait for more replies than sent requests, causing a hang.

Backport of:
&gt; Change-Id: I7ed6108554ca379d532efb1a29b2de8085410b70
&gt; Signed-off-by: Xavi Hernandez &lt;xhernandez@redhat.com&gt;
&gt; Fixes: bz#1808875

Change-Id: I7ed6108554ca379d532efb1a29b2de8085410b70
Signed-off-by: Xavi Hernandez &lt;xhernandez@redhat.com&gt;
Fixes: bz#1809439
</pre>
</div>
</content>
</entry>
<entry>
<title>afr: prevent spurious entry heals leading to gfid split-brain</title>
<updated>2020-02-28T06:06:10+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2020-02-11T09:04:48+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=559fd060c59edec69ba66be7e0a447c8e0408d51'/>
<id>559fd060c59edec69ba66be7e0a447c8e0408d51</id>
<content type='text'>
Problem:
In a hyperconverged setup with granular-entry-heal enabled, if a file is
recreated while one of the bricks is down, and an index heal is triggered
(with the brick still down), entry-self heal was doing a spurious heal
with just the 2 good bricks. It was doing a post-op leading to removal
of the filename from .glusterfs/indices/entry-changes as well as
erroneous setting of afr xattrs on the parent. When the brick came up,
the xattrs were cleared, resulting in the renamed file not getting
healed and leading to gfid split-brain and EIO on the mount.

Fix:
Proceed with entry heal only when shd can connect to all bricks of the replica,
just like in data and metadata heal.

fixes: bz#1804594
Change-Id: I916ae26ad1fabf259bc6362da52d433b7223b17e
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
(cherry picked from commit 06453d77d056fbaa393a137ca277a20e38d2f67e)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
In a hyperconverged setup with granular-entry-heal enabled, if a file is
recreated while one of the bricks is down, and an index heal is triggered
(with the brick still down), entry-self heal was doing a spurious heal
with just the 2 good bricks. It was doing a post-op leading to removal
of the filename from .glusterfs/indices/entry-changes as well as
erroneous setting of afr xattrs on the parent. When the brick came up,
the xattrs were cleared, resulting in the renamed file not getting
healed and leading to gfid split-brain and EIO on the mount.

Fix:
Proceed with entry heal only when shd can connect to all bricks of the replica,
just like in data and metadata heal.

fixes: bz#1804594
Change-Id: I916ae26ad1fabf259bc6362da52d433b7223b17e
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
(cherry picked from commit 06453d77d056fbaa393a137ca277a20e38d2f67e)
</pre>
</div>
</content>
</entry>
<entry>
<title>afr: add client-pid to all gf_event() calls</title>
<updated>2019-04-16T10:48:40+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2019-03-15T14:01:03+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=5946db166a428d37f5bbbb3df802a1e53cab5029'/>
<id>5946db166a428d37f5bbbb3df802a1e53cab5029</id>
<content type='text'>
client-pid for glustershd is GF_CLIENT_PID_SELF_HEALD
client-pid for glfsheal is GF_CLIENT_PID_GLFS_HEALD

updates: bz#1693155
Change-Id: Ib3a863af160ff48c822a5e6b0c27c575c9887470
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
(cherry picked from commit 8016d51a3bbd410b0b927ed66be50a09574b7982)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
client-pid for glustershd is GF_CLIENT_PID_SELF_HEALD
client-pid for glfsheal is GF_CLIENT_PID_GLFS_HEALD

updates: bz#1693155
Change-Id: Ib3a863af160ff48c822a5e6b0c27c575c9887470
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
(cherry picked from commit 8016d51a3bbd410b0b927ed66be50a09574b7982)
</pre>
</div>
</content>
</entry>
<entry>
<title>AFR xlator: use dict_{setn|getn|deln|get_int32n|set_int32n|set_strn}</title>
<updated>2018-12-17T11:39:14+00:00</updated>
<author>
<name>Yaniv Kaul</name>
<email>ykaul@redhat.com</email>
</author>
<published>2018-09-13T13:03:23+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=403c69d35827b6cbb430e97a797c318cca81e86e'/>
<id>403c69d35827b6cbb430e97a797c318cca81e86e</id>
<content type='text'>
In a previous patch (https://review.gluster.org/20769) we've
added the key length to be passed to dict_* funcs, to remove the need
to strlen() it. This patch moves some xlators to use it.

- In some cases, moved strlen() of the key length outside of locks,
which is usually a good thing. Please verify it's safe to do so.
- In some cases, created a prefix for the keys, replacing something like
"%d-%d" with a "%s" in snprintf(). Not sure it adds value, but improves
readability.

Please review carefully.

Compile-tested only!

Change-Id: I04f2a1eb2ecfc3283d849d150d10d088ae7aa7f1
updates: bz#1193929
Signed-off-by: Yaniv Kaul &lt;ykaul@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In a previous patch (https://review.gluster.org/20769) we've
added the key length to be passed to dict_* funcs, to remove the need
to strlen() it. This patch moves some xlators to use it.

- In some cases, moved strlen() of the key length outside of locks,
which is usually a good thing. Please verify it's safe to do so.
- In some cases, created a prefix for the keys, replacing something like
"%d-%d" with a "%s" in snprintf(). Not sure it adds value, but improves
readability.

Please review carefully.

Compile-tested only!

Change-Id: I04f2a1eb2ecfc3283d849d150d10d088ae7aa7f1
updates: bz#1193929
Signed-off-by: Yaniv Kaul &lt;ykaul@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libglusterfs: Move devel headers under glusterfs directory</title>
<updated>2018-12-05T21:47:04+00:00</updated>
<author>
<name>ShyamsundarR</name>
<email>srangana@redhat.com</email>
</author>
<published>2018-11-29T19:08:06+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=20ef211cfa5b5fcc437484a879fdc5d4c66bbaf5'/>
<id>20ef211cfa5b5fcc437484a879fdc5d4c66bbaf5</id>
<content type='text'>
libglusterfs devel package headers are referenced in code using
include semantics for a program, this while it works can be better
especially when dealing with out of tree xlator builds or in
general out of tree devel package usage.

Towards this, the following changes are done,
- moved all devel headers under a glusterfs directory
- Included these headers using system header notation &lt;&gt; in all
code outside of libglusterfs
- Included these headers using own program notation "" within
libglusterfs

This change although big, is just moving around the headers and
making it correct when including these headers from other sources.

This helps us correctly include libglusterfs includes without
namespace conflicts.

Change-Id: Id2a98854e671a7ee5d73be44da5ba1a74252423b
Updates: bz#1193929
Signed-off-by: ShyamsundarR &lt;srangana@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
libglusterfs devel package headers are referenced in code using
include semantics for a program, this while it works can be better
especially when dealing with out of tree xlator builds or in
general out of tree devel package usage.

Towards this, the following changes are done,
- moved all devel headers under a glusterfs directory
- Included these headers using system header notation &lt;&gt; in all
code outside of libglusterfs
- Included these headers using own program notation "" within
libglusterfs

This change although big, is just moving around the headers and
making it correct when including these headers from other sources.

This helps us correctly include libglusterfs includes without
namespace conflicts.

Change-Id: Id2a98854e671a7ee5d73be44da5ba1a74252423b
Updates: bz#1193929
Signed-off-by: ShyamsundarR &lt;srangana@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>afr: assign gfid during name heal when no 'source' is present.</title>
<updated>2018-12-03T06:44:19+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2018-09-28T11:30:00+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=4d58730c0cd6ab5db39aec8a15276f7bd3371b04'/>
<id>4d58730c0cd6ab5db39aec8a15276f7bd3371b04</id>
<content type='text'>
Problem:
If parent dir is in split-brain or has dirty xattrs set, and the file
has gfid missing on one of the bricks, then name heal won't assign the
gfid.

Fix:
Use the brick we select the gfid from as the 'source'.

Note: Problem was found while trying to debug a split-brain issue on
Cynthia Zhou's setup.

updates: bz#1637249
Change-Id: Id088d4f0fb017aa35122de426654194e581ed742
Reported-by: Cynthia Zhou &lt;cynthia.zhou@nokia-sbell.com&gt;
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
If parent dir is in split-brain or has dirty xattrs set, and the file
has gfid missing on one of the bricks, then name heal won't assign the
gfid.

Fix:
Use the brick we select the gfid from as the 'source'.

Note: Problem was found while trying to debug a split-brain issue on
Cynthia Zhou's setup.

updates: bz#1637249
Change-Id: Id088d4f0fb017aa35122de426654194e581ed742
Reported-by: Cynthia Zhou &lt;cynthia.zhou@nokia-sbell.com&gt;
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Land part 2 of clang-format changes</title>
<updated>2018-09-12T12:22:45+00:00</updated>
<author>
<name>Gluster Ant</name>
<email>bugzilla-bot@gluster.org</email>
</author>
<published>2018-09-12T12:22:45+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=e16868dede6455cab644805af6fe1ac312775e13'/>
<id>e16868dede6455cab644805af6fe1ac312775e13</id>
<content type='text'>
Change-Id: Ia84cc24c8924e6d22d02ac15f611c10e26db99b4
Signed-off-by: Nigel Babu &lt;nigelb@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Change-Id: Ia84cc24c8924e6d22d02ac15f611c10e26db99b4
Signed-off-by: Nigel Babu &lt;nigelb@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/afr: Delegate name-heal when possible</title>
<updated>2018-09-04T01:52:02+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2018-08-27T07:10:16+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=f29689783d970cba804505d7c778c905b2ba1992'/>
<id>f29689783d970cba804505d7c778c905b2ba1992</id>
<content type='text'>
Problem:
When name-self-heal is triggered on the mount, it blocks
lookup until name-self-heal completes. But that can lead
to hangs when lot of clients are accessing a directory which
needs name heal and all of them trigger heals waiting
for other clients to complete heal.

Fix:
When a name-heal is needed but quorum number of names have the
file and pending xattrs exist on the parent, then better to
delegate the heal to SHD which will be completed as part of
entry-heal of the parent directory. We could also do the same
for quorum-number of names not present but we don't have
any known use-case where this is a frequent occurrence so
not changing that part at the moment. When there is a gfid
mismatch or missing gfid it is important to complete the heal
so that next rename doesn't assume everything is fine and
perform a rename etc

fixes bz#1622821
Change-Id: I8b002c85dffc6eb6f2833e742684a233daefeb2c
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
When name-self-heal is triggered on the mount, it blocks
lookup until name-self-heal completes. But that can lead
to hangs when lot of clients are accessing a directory which
needs name heal and all of them trigger heals waiting
for other clients to complete heal.

Fix:
When a name-heal is needed but quorum number of names have the
file and pending xattrs exist on the parent, then better to
delegate the heal to SHD which will be completed as part of
entry-heal of the parent directory. We could also do the same
for quorum-number of names not present but we don't have
any known use-case where this is a frequent occurrence so
not changing that part at the moment. When there is a gfid
mismatch or missing gfid it is important to complete the heal
so that next rename doesn't assume everything is fine and
perform a rename etc

fixes bz#1622821
Change-Id: I8b002c85dffc6eb6f2833e742684a233daefeb2c
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>afr: heal gfids when file is not present on all bricks</title>
<updated>2018-06-19T06:05:20+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2018-06-14T07:29:06+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=eb472d82a083883335bc494b87ea175ac43471ff'/>
<id>eb472d82a083883335bc494b87ea175ac43471ff</id>
<content type='text'>
commit 20fa80057eb430fd72b4fa31b9b65598b8ec1265 introduced a regression
wherein if a file is present in only 1 brick of replica *and* doesn't
have a gfid associated with it, it doesn't get healed upon the next
lookup from the client. Fix it.

Change-Id: I7d1111dcb45b1b8b8340a7d02558f05df70aa599
fixes: bz#1591193
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 20fa80057eb430fd72b4fa31b9b65598b8ec1265 introduced a regression
wherein if a file is present in only 1 brick of replica *and* doesn't
have a gfid associated with it, it doesn't get healed upon the next
lookup from the client. Fix it.

Change-Id: I7d1111dcb45b1b8b8340a7d02558f05df70aa599
fixes: bz#1591193
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/afr: Fail open on split-brain</title>
<updated>2017-10-26T18:23:35+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2017-09-04T11:27:25+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=786343abca3474ff01aa1017210112d97cbc4843'/>
<id>786343abca3474ff01aa1017210112d97cbc4843</id>
<content type='text'>
Problem:
Append on a file with split-brain succeeds. Open is intercepted by open-behind,
when write comes on the file, open-behind does open+write. Open succeeds
because afr doesn't fail it. Then write succeeds because write-behind
intercepts it. Flush is also intercepted by write-behind, so the application
never gets to know that the write failed.

Fix:
Fail open on split-brain, so that when open-behind does open+write open fails
which leads to write failure. Application will know about this failure.

Change-Id: I4bff1c747c97bb2925d6987f4ced5f1ce75dbc15
BUG: 1294051
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
Append on a file with split-brain succeeds. Open is intercepted by open-behind,
when write comes on the file, open-behind does open+write. Open succeeds
because afr doesn't fail it. Then write succeeds because write-behind
intercepts it. Flush is also intercepted by write-behind, so the application
never gets to know that the write failed.

Fix:
Fail open on split-brain, so that when open-behind does open+write open fails
which leads to write failure. Application will know about this failure.

Change-Id: I4bff1c747c97bb2925d6987f4ced5f1ce75dbc15
BUG: 1294051
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
