<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/tests/bugs/libgfapi/bug-1093594.sh, branch release-3.7</title>
<subtitle></subtitle>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/'/>
<entry>
<title>gfapi: Fix IO error caused when there is consecutive graph switches</title>
<updated>2016-08-25T04:38:22+00:00</updated>
<author>
<name>Poornima G</name>
<email>pgurusid@redhat.com</email>
</author>
<published>2016-06-06T10:29:40+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=9cd5066226770cf3c06a21757b963d315b8fe32b'/>
<id>9cd5066226770cf3c06a21757b963d315b8fe32b</id>
<content type='text'>
Issue:
Consider a simple situation, where glfs_init() is done, i.e. initial
graph is up. Now perform 2 volume sets that results in 2 client side
graph changes. After this perform some IO, the IO fails with ENOTCON.
The only way to recover this client is i guess another graph switch
or restart.

What actually is happening from code perspective:
Initial graph lets say A, followed by 2 consecutive graph switches
to B and C without any IO those two switches.

- graph_setup (A) as a result of GF_EVENT_CHILD_UP, and
fs-&gt;next_subvol = A

- glfs_init() results in fs-&gt;active_subvol = A, fs-&gt;next_subvol = NULL

- graph_setup (B) as a result of GF_EVENT_CHILD_UP, and
fs-&gt;next_subvol = B

- graph_setup (C) as a result of GF_EVENT_CHILD_UP, and
fs-&gt;next_subvol = C. It also sees that the previous graph B was never
set as fs-&gt;active_subvol, i.e. no IO or anything happened on B, so
can safely send GF_EVENT_PARENT_DOWN (by calling glfs_subvol_done(B)).
This parent down on B, results in child_down(B), which is fine.
But child_down also triggers graph_setup(B).

- graph_setup(B) as a result of GF_EVENT_CHILD_DOWN, and
fs-&gt;next_subvol = B, and GF_EVENT_PARENT_DOWN on C as explained
above. This again leads to GF_EVENT_CHILD_DOWN on C.

- graph_setup(C) as a result of GF_EVENT_CHILD_DOWN, and
fs-&gt;next_subvol = C, and GF_EVENT_PARENT_DOWN on B as explained
above.

Thus both the graphs B and C are disconnected, and hence the ENOTCON

Solution:
Remove the call to graph_setup() when the event is GF_EVENT_CHILD_DOWN.
It don't see any reason why graph_setup should be called when there is
child_down. Not sure what the original reason was, to have graph_setup
in child_down. git hostory shows the first patch itself had this call.

&gt; Reviewed-on: http://review.gluster.org/14656
&gt; Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
&gt; CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
&gt; NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
&gt; Reviewed-by: Jeff Darcy &lt;jdarcy@redhat.com&gt;

BUG: 1367294
Change-Id: I9de86555f66cc94a05649ac863b40ed3426ffd4b
Signed-off-by: Poornima G &lt;pgurusid@redhat.com&gt;
Signed-off-by: Oleksandr Natalenko &lt;oleksandr@natalenko.name&gt;
Reviewed-on: http://review.gluster.org/14835
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Kaushal M &lt;kaushal@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Issue:
Consider a simple situation, where glfs_init() is done, i.e. initial
graph is up. Now perform 2 volume sets that results in 2 client side
graph changes. After this perform some IO, the IO fails with ENOTCON.
The only way to recover this client is i guess another graph switch
or restart.

What actually is happening from code perspective:
Initial graph lets say A, followed by 2 consecutive graph switches
to B and C without any IO those two switches.

- graph_setup (A) as a result of GF_EVENT_CHILD_UP, and
fs-&gt;next_subvol = A

- glfs_init() results in fs-&gt;active_subvol = A, fs-&gt;next_subvol = NULL

- graph_setup (B) as a result of GF_EVENT_CHILD_UP, and
fs-&gt;next_subvol = B

- graph_setup (C) as a result of GF_EVENT_CHILD_UP, and
fs-&gt;next_subvol = C. It also sees that the previous graph B was never
set as fs-&gt;active_subvol, i.e. no IO or anything happened on B, so
can safely send GF_EVENT_PARENT_DOWN (by calling glfs_subvol_done(B)).
This parent down on B, results in child_down(B), which is fine.
But child_down also triggers graph_setup(B).

- graph_setup(B) as a result of GF_EVENT_CHILD_DOWN, and
fs-&gt;next_subvol = B, and GF_EVENT_PARENT_DOWN on C as explained
above. This again leads to GF_EVENT_CHILD_DOWN on C.

- graph_setup(C) as a result of GF_EVENT_CHILD_DOWN, and
fs-&gt;next_subvol = C, and GF_EVENT_PARENT_DOWN on B as explained
above.

Thus both the graphs B and C are disconnected, and hence the ENOTCON

Solution:
Remove the call to graph_setup() when the event is GF_EVENT_CHILD_DOWN.
It don't see any reason why graph_setup should be called when there is
child_down. Not sure what the original reason was, to have graph_setup
in child_down. git hostory shows the first patch itself had this call.

&gt; Reviewed-on: http://review.gluster.org/14656
&gt; Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
&gt; CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
&gt; NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
&gt; Reviewed-by: Jeff Darcy &lt;jdarcy@redhat.com&gt;

BUG: 1367294
Change-Id: I9de86555f66cc94a05649ac863b40ed3426ffd4b
Signed-off-by: Poornima G &lt;pgurusid@redhat.com&gt;
Signed-off-by: Oleksandr Natalenko &lt;oleksandr@natalenko.name&gt;
Reviewed-on: http://review.gluster.org/14835
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Kaushal M &lt;kaushal@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
