From cf1e98ff5bf8233803b4f74debee1b1f474765af Mon Sep 17 00:00:00 2001
From: Poornima G
Date: Mon, 6 Jun 2016 06:29:40 -0400
Subject: gfapi: Fix IO error caused when there is consecutive graph switches

Backport of http://review.gluster.org/#/c/14656/

Issue:
Consider a simple situation where glfs_init() is done, i.e. the initial
graph is up. Now perform 2 volume sets that result in 2 client-side
graph changes. After this, perform some IO; the IO fails with ENOTCONN.
The only way to recover this client is, I guess, another graph switch
or a restart.

What actually happens, from the code's perspective:
Say the initial graph is A, followed by 2 consecutive graph switches to
B and C without any IO between those two switches.

- graph_setup(A) as a result of GF_EVENT_CHILD_UP, and
  fs->next_subvol = A
- glfs_init() results in fs->active_subvol = A, fs->next_subvol = NULL
- graph_setup(B) as a result of GF_EVENT_CHILD_UP, and
  fs->next_subvol = B
- graph_setup(C) as a result of GF_EVENT_CHILD_UP, and
  fs->next_subvol = C. It also sees that the previous graph B was never
  set as fs->active_subvol, i.e. no IO or anything else happened on B,
  so it can safely send GF_EVENT_PARENT_DOWN (by calling
  glfs_subvol_done(B)). This parent down on B results in child_down(B),
  which is fine. But child_down also triggers graph_setup(B).
- graph_setup(B) as a result of GF_EVENT_CHILD_DOWN, and
  fs->next_subvol = B, and GF_EVENT_PARENT_DOWN on C as explained
  above. This again leads to GF_EVENT_CHILD_DOWN on C.
- graph_setup(C) as a result of GF_EVENT_CHILD_DOWN, and
  fs->next_subvol = C, and GF_EVENT_PARENT_DOWN on B as explained
  above.

Thus both graphs B and C are disconnected, hence the ENOTCONN.

Solution:
Remove the call to graph_setup() when the event is GF_EVENT_CHILD_DOWN.
I don't see any reason why graph_setup should be called when there is a
child_down. Not sure what the original reason was for having
graph_setup in child_down; git history shows that the very first patch
already had this call.

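The sequence above can be replayed with a small toy model. This is only
a sketch written for illustration, not the gfapi code: the ToyFs class,
its method names, the `connected` set and the rule that a parent-down on
an already-down graph is a no-op are all assumptions made here; only the
active_subvol/next_subvol bookkeeping follows the description above.

```python
class ToyFs:
    """Toy model of the fs->active_subvol / fs->next_subvol bookkeeping."""

    def __init__(self, setup_on_child_down):
        # True models the behaviour described under "Issue",
        # False models the behaviour described under "Solution".
        self.setup_on_child_down = setup_on_child_down
        self.active_subvol = None
        self.next_subvol = None
        self.connected = set()  # graphs that have not received parent-down

    def child_up(self, graph):
        self.connected.add(graph)
        self.graph_setup(graph)

    def glfs_init(self):
        # The pending graph becomes the active one.
        self.active_subvol, self.next_subvol = self.next_subvol, None

    def graph_setup(self, graph):
        if graph in (self.active_subvol, self.next_subvol):
            return  # assumption: nothing to do for a graph already tracked
        old, self.next_subvol = self.next_subvol, graph
        if old is not None:
            # The old pending graph was never made active, so tear it down.
            self.parent_down(old)

    def parent_down(self, graph):
        if graph not in self.connected:
            return  # assumption: already down, no further events
        self.connected.discard(graph)
        self.child_down(graph)

    def child_down(self, graph):
        if self.setup_on_child_down:
            self.graph_setup(graph)


def simulate(setup_on_child_down):
    """Replay A, glfs_init, B, C and report the state of the pending graph."""
    fs = ToyFs(setup_on_child_down)
    fs.child_up("A")
    fs.glfs_init()
    fs.child_up("B")
    fs.child_up("C")
    return fs.next_subvol, fs.next_subvol in fs.connected


if __name__ == "__main__":
    print("graph_setup on child_down:", simulate(True))
    print("no graph_setup on child_down:", simulate(False))
```

In this toy model, calling graph_setup on child-down leaves the pending
graph C disconnected, while skipping it leaves C connected.
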
Change-Id: I9de86555f66cc94a05649ac863b40ed3426ffd4b
BUG: 1347489
Signed-off-by: Poornima G
Reviewed-on: http://review.gluster.org/14656
Smoke: Gluster Build System
NetBSD-regression: NetBSD Build System
CentOS-regression: Gluster Build System
Reviewed-by: Jeff Darcy
(cherry picked from commit b8ac20e888fbacad9d90cd8f1c6ff8579a5cefe9)
Reviewed-on: http://review.gluster.org/14747
---
 tests/bugs/libgfapi/bug-1093594.sh | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
 create mode 100755 tests/bugs/libgfapi/bug-1093594.sh

diff --git a/tests/bugs/libgfapi/bug-1093594.sh b/tests/bugs/libgfapi/bug-1093594.sh
new file mode 100755
index 00000000000..444319b8e63
--- /dev/null
+++ b/tests/bugs/libgfapi/bug-1093594.sh
@@ -0,0 +1,20 @@
+#!/bin/bash
+. $(dirname $0)/../../include.rc
+. $(dirname $0)/../../volume.rc
+
+cleanup;
+
+## Start and create a volume
+TEST glusterd;
+TEST pidof glusterd;
+TEST $CLI volume info;
+
+TEST $CLI volume create $V0 $H0:$B0/${V0}{1,2};
+TEST $CLI volume start $V0;
+logdir=`gluster --print-logdir`
+
+build_tester $(dirname $0)/bug-1093594.c -lgfapi
+TEST $(dirname $0)/bug-1093594 $V0 $logdir/bug-1093594.log
+
+cleanup_tester $(dirname $0)/bug-1093594
+cleanup;