From c9d462dc8c1250c3f3f42ca149bb062fe690335b Mon Sep 17 00:00:00 2001 From: Atin Mukherjee Date: Tue, 21 Jul 2015 09:57:43 +0530 Subject: glusterd: Don't allow remove brick start/commit if glusterd is down of the host of the brick remove brick stage blindly starts the remove brick operation even if the glusterd instance of the node hosting the brick is down. Operationally its incorrect and this could result into a inconsistent rebalance status across all the nodes as the originator of this command will always have the rebalance status to 'DEFRAG_NOT_STARTED', however when the glusterd instance on the other nodes comes up, will trigger rebalance and make the status to completed once the rebalance is finished. This patch fixes two things: 1. Add a validation in remove brick to check whether all the peers hosting the bricks to be removed are up. 2. Don't copy volinfo->rebal.dict from stale volinfo during restore as this might end up in a incosistent node_state.info file resulting into volume status command failure. Change-Id: Ia4a76865c05037d49eec5e3bbfaf68c1567f1f81 BUG: 1245045 Signed-off-by: Atin Mukherjee Reviewed-on: http://review.gluster.org/11726 Tested-by: NetBSD Build System Reviewed-by: N Balachandran Reviewed-by: Krishnan Parthasarathi --- .../glusterd/bug-1245045-remove-brick-validation.t | 54 ++++++++++++++++++++++ 1 file changed, 54 insertions(+) create mode 100644 tests/bugs/glusterd/bug-1245045-remove-brick-validation.t (limited to 'tests/bugs') diff --git a/tests/bugs/glusterd/bug-1245045-remove-brick-validation.t b/tests/bugs/glusterd/bug-1245045-remove-brick-validation.t new file mode 100644 index 00000000000..22a8d557d28 --- /dev/null +++ b/tests/bugs/glusterd/bug-1245045-remove-brick-validation.t @@ -0,0 +1,54 @@ +#!/bin/bash + +. $(dirname $0)/../../include.rc +. $(dirname $0)/../../cluster.rc + +cleanup + +TEST launch_cluster 3; +TEST $CLI_1 peer probe $H2; +TEST $CLI_1 peer probe $H3; +EXPECT_WITHIN $PROBE_TIMEOUT 2 peer_count + +TEST $CLI_1 volume create $V0 $H1:$B1/$V0 $H2:$B2/$V0 +TEST $CLI_1 volume start $V0 + +kill_glusterd 2 + +#remove-brick should fail as the peer hosting the brick is down +TEST ! $CLI_1 volume remove-brick $V0 $H2:$B2/${V0} start + +TEST start_glusterd 2 + +EXPECT_WITHIN $PROBE_TIMEOUT 2 peer_count + +#volume status should work +TEST $CLI_2 volume status + + +TEST $CLI_1 volume remove-brick $V0 $H2:$B2/${V0} start +kill_glusterd 2 + +#remove-brick commit should fail as the peer hosting the brick is down +TEST ! $CLI_1 volume remove-brick $V0 $H2:$B2/${V0} commit + +TEST start_glusterd 2 + +EXPECT_WITHIN $PROBE_TIMEOUT 2 peer_count + +#volume status should work +TEST $CLI_2 volume status + +TEST $CLI_1 volume remove-brick $V0 $H2:$B2/${V0} stop + +kill_glusterd 3 +EXPECT_WITHIN $PROBE_TIMEOUT 1 peer_count + +TEST $CLI_1 volume remove-brick $V0 $H2:$B2/${V0} start + +TEST start_glusterd 3 +EXPECT_WITHIN $PROBE_TIMEOUT 2 peer_count + +TEST $CLI_3 volume status + +cleanup -- cgit