author     Atin Mukherjee <amukherj@redhat.com>  2017-10-26 14:26:30 +0530
committer  Atin Mukherjee <amukherj@redhat.com>  2017-11-01 03:41:36 +0000
commit     82be66ef8e9e3127d41a4c843daf74c1d8aec4aa (patch)
tree       48a91287a7dd949ce7c9cb52760b337ad8a573dc /xlators/mgmt/glusterd/src/glusterd-op-sm.c
parent     bb7fd73ce4245f54517de1f378a9471f6c8bb454 (diff)
glusterd: fix brick restart parallelism
glusterd's brick restart logic is not always sequential, as there are at least three different ways bricks get restarted:

1. through friend-sm and glusterd_spawn_daemons ()
2. through friend-sm and handling volume quorum action
3. through friend handshaking when there is a mismatch on quorum on friend import

In a brick multiplexing setup, glusterd ended up trying to spawn the same brick process multiple times: two threads could hit glusterd_brick_start () within a fraction of a millisecond, and glusterd had no way to reject either attempt because the brick start criteria were met in both cases.

As a solution, this race is controlled by two different guards: the first is a boolean called start_triggered, which indicates that a brick start has been triggered and remains true until the brick dies or is killed; the second is a mutex lock that ensures that, for a particular brick, glusterd does not enter glusterd_brick_start () more than once at the same point in time.

Change-Id: I292f1e58d6971e111725e1baea1fe98b890b43e2
BUG: 1506513
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
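To make the intent of the two guards concrete, below is a minimal, self-contained sketch of the same flag-plus-mutex pattern (compile with -pthread). brick_t, spawn_brick_process (), and starter () are simplified stand-ins invented for illustration; only the start_triggered and restart_mutex field names mirror the patch, and glusterd's real glusterd_brickinfo_t and start path are more involved.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-in for glusterd_brickinfo_t. */
typedef struct brick {
        bool            start_triggered; /* true once a start has been issued */
        pthread_mutex_t restart_mutex;   /* serializes concurrent starters */
} brick_t;

/* Stand-in for actually spawning the brick process. */
static int
spawn_brick_process (brick_t *brick)
{
        (void)brick;
        printf ("spawning brick process\n");
        return 0;
}

/* Idempotent start: of all racing callers, at most one actually spawns. */
static int
brick_start (brick_t *brick)
{
        int ret = 0;

        pthread_mutex_lock (&brick->restart_mutex);
        {
                /* Check under the lock: a racing thread may have
                 * already triggered the start. */
                if (!brick->start_triggered) {
                        ret = spawn_brick_process (brick);
                        if (!ret)
                                brick->start_triggered = true;
                }
        }
        pthread_mutex_unlock (&brick->restart_mutex);

        return ret;
}

static void *
starter (void *arg)
{
        brick_start ((brick_t *)arg);
        return NULL;
}

int
main (void)
{
        brick_t   brick = { .start_triggered = false,
                            .restart_mutex   = PTHREAD_MUTEX_INITIALIZER };
        pthread_t t1, t2;

        /* Two restart paths racing, as in the multiplexing scenario;
         * only one of them ends up spawning the process. */
        pthread_create (&t1, NULL, starter, &brick);
        pthread_create (&t2, NULL, starter, &brick);
        pthread_join (t1, NULL);
        pthread_join (t2, NULL);
        return 0;
}

One design point worth noting: in the hunk below, the start_triggered check happens before the mutex is taken, so it can only serve as a cheap fast path; the authoritative decision has to be made while holding restart_mutex (in the actual patch, presumably re-checked inside glusterd_brick_start () itself), otherwise two threads that pass the flag check together could still both spawn the process.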
Diffstat (limited to 'xlators/mgmt/glusterd/src/glusterd-op-sm.c')
-rw-r--r--  xlators/mgmt/glusterd/src/glusterd-op-sm.c  31
1 file changed, 19 insertions(+), 12 deletions(-)
diff --git a/xlators/mgmt/glusterd/src/glusterd-op-sm.c b/xlators/mgmt/glusterd/src/glusterd-op-sm.c
index c53c1fbf08d..7b4825dd82e 100644
--- a/xlators/mgmt/glusterd/src/glusterd-op-sm.c
+++ b/xlators/mgmt/glusterd/src/glusterd-op-sm.c
@@ -2411,18 +2411,25 @@ glusterd_start_bricks (glusterd_volinfo_t *volinfo)
         GF_ASSERT (volinfo);
 
         cds_list_for_each_entry (brickinfo, &volinfo->bricks, brick_list) {
-                ret = glusterd_brick_start (volinfo, brickinfo, _gf_false);
-                if (ret) {
-                        gf_msg (THIS->name, GF_LOG_ERROR, 0,
-                                GD_MSG_BRICK_DISCONNECTED,
-                                "Failed to start %s:%s for %s",
-                                brickinfo->hostname, brickinfo->path,
-                                volinfo->volname);
-                        gf_event (EVENT_BRICK_START_FAILED,
-                                  "peer=%s;volume=%s;brick=%s",
-                                  brickinfo->hostname, volinfo->volname,
-                                  brickinfo->path);
-                        goto out;
+                if (!brickinfo->start_triggered) {
+                        pthread_mutex_lock (&brickinfo->restart_mutex);
+                        {
+                                ret = glusterd_brick_start (volinfo, brickinfo,
+                                                            _gf_false);
+                        }
+                        pthread_mutex_unlock (&brickinfo->restart_mutex);
+                        if (ret) {
+                                gf_msg (THIS->name, GF_LOG_ERROR, 0,
+                                        GD_MSG_BRICK_DISCONNECTED,
+                                        "Failed to start %s:%s for %s",
+                                        brickinfo->hostname, brickinfo->path,
+                                        volinfo->volname);
+                                gf_event (EVENT_BRICK_START_FAILED,
+                                          "peer=%s;volume=%s;brick=%s",
+                                          brickinfo->hostname, volinfo->volname,
+                                          brickinfo->path);
+                                goto out;
+                        }
                 }
         }