| author | Atin Mukherjee <amukherj@redhat.com> | 2017-10-26 14:26:30 +0530 |
|---|---|---|
| committer | jiffin tony Thottan <jthottan@redhat.com> | 2017-11-06 06:09:21 +0000 |
| commit | 44e3c3b5c813168d72f10ecb3c058ac3489c719c (patch) | |
| tree | 9ae418d5cecd1b738a9926f227238d097bad8ca2 /xlators/mgmt/glusterd/src/glusterd-op-sm.c | |
| parent | 4e6dc4f134ed81005bb91f9cb4e18bf5836dffb5 (diff) | |
glusterd: fix brick restart parallelism
glusterd's brick restart logic is not always sequential, as there are
at least three different ways in which bricks are restarted:
1. through friend-sm and glusterd_spawn_daemons ()
2. through friend-sm and handling volume quorum action
3. through friend handshaking when there is a mismatch on quorum on
friend import.
In a brick multiplexing setup, glusterd could end up trying to spawn the
same brick process twice, because two threads could hit
glusterd_brick_start () within a fraction of a millisecond of each
other, and glusterd had no way to reject either of them since the brick
start criteria were met in both cases.
As a solution, this race is controlled by two mechanisms: a boolean flag
start_triggered, which indicates that a brick start has been triggered
and remains true until the brick dies or is killed, and a mutex lock to
ensure that for a particular brick we don't enter glusterd_brick_start ()
more than once at the same time.
Change-Id: I292f1e58d6971e111725e1baea1fe98b890b43e2
BUG: 1508283
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
(cherry picked from commit 82be66ef8e9e3127d41a4c843daf74c1d8aec4aa)
Diffstat (limited to 'xlators/mgmt/glusterd/src/glusterd-op-sm.c')
-rw-r--r-- | xlators/mgmt/glusterd/src/glusterd-op-sm.c | 31 |
1 file changed, 19 insertions(+), 12 deletions(-)
diff --git a/xlators/mgmt/glusterd/src/glusterd-op-sm.c b/xlators/mgmt/glusterd/src/glusterd-op-sm.c
index 83680cf7a7a..51579fe3826 100644
--- a/xlators/mgmt/glusterd/src/glusterd-op-sm.c
+++ b/xlators/mgmt/glusterd/src/glusterd-op-sm.c
@@ -2402,18 +2402,25 @@ glusterd_start_bricks (glusterd_volinfo_t *volinfo)
         GF_ASSERT (volinfo);
 
         cds_list_for_each_entry (brickinfo, &volinfo->bricks, brick_list) {
-                ret = glusterd_brick_start (volinfo, brickinfo, _gf_false);
-                if (ret) {
-                        gf_msg (THIS->name, GF_LOG_ERROR, 0,
-                                GD_MSG_BRICK_DISCONNECTED,
-                                "Failed to start %s:%s for %s",
-                                brickinfo->hostname, brickinfo->path,
-                                volinfo->volname);
-                        gf_event (EVENT_BRICK_START_FAILED,
-                                  "peer=%s;volume=%s;brick=%s",
-                                  brickinfo->hostname, volinfo->volname,
-                                  brickinfo->path);
-                        goto out;
+                if (!brickinfo->start_triggered) {
+                        pthread_mutex_lock (&brickinfo->restart_mutex);
+                        {
+                                ret = glusterd_brick_start (volinfo, brickinfo,
+                                                            _gf_false);
+                        }
+                        pthread_mutex_unlock (&brickinfo->restart_mutex);
+                        if (ret) {
+                                gf_msg (THIS->name, GF_LOG_ERROR, 0,
+                                        GD_MSG_BRICK_DISCONNECTED,
+                                        "Failed to start %s:%s for %s",
+                                        brickinfo->hostname, brickinfo->path,
+                                        volinfo->volname);
+                                gf_event (EVENT_BRICK_START_FAILED,
+                                          "peer=%s;volume=%s;brick=%s",
+                                          brickinfo->hostname, volinfo->volname,
+                                          brickinfo->path);
+                                goto out;
+                        }
                 }
         }
         ret = glusterd_store_volinfo (volinfo, GLUSTERD_VOLINFO_VER_AC_NONE);