From 773f32caf190af4ee48818279b6e6d3c9f2ecc79 Mon Sep 17 00:00:00 2001 From: Raghavendra G Date: Tue, 28 Feb 2017 13:13:59 +0530 Subject: rpc/clnt: remove locks while notifying CONNECT/DISCONNECT Locking during notify was introduced as part of commit aa22f24f5db7659387704998ae01520708869873 [1]. The fix was introduced to fix out-of-order CONNECT/DISCONNECT events from rpc-clnt to parent xlators [2]. However as part of handling DISCONNECT protocol/client does unwind saved frames (with failure) waiting for responses. This saved_frames_unwind can be a costly operation and hence ideally shouldn't be included in the critical section of notifylock, as it unnecessarily delays the reconnection to same brick. Also, its not a good practise to pass control to other xlators holding a lock as it can lead to deadlocks. So, this patch removes locking in rpc-clnt while notifying parent xlators. To fix [2], two changes are present in this patch: * notify DISCONNECT before cleaning up rpc connection (same as commit a6b63e11b7758cf1bfcb6798, patch [3]). * protocol/client uses rpc_clnt_cleanup_and_start, which cleans up rpc connection and does a start while handling a DISCONNECT event from rpc. Note that patch [3] was reverted as rpc_clnt_start called in quick_reconnect path of protocol/client didn't invoke connect on transport as the connection was not cleaned up _yet_ (as cleanup was moved post notification in rpc-clnt). This resulted in clients never attempting connect to bricks. Note that one of the neater ways to fix [2] (without using locks) is to introduce generation numbers to map CONNECT and DISCONNECTS across epochs and ignore DISCONNECT events if they don't belong to current epoch. However, this approach is a bit complex to implement and requires time. So, current patch is a hacky stop-gap fix till we come up with a more cleaner solution. [1] http://review.gluster.org/15916 [2] https://bugzilla.redhat.com/show_bug.cgi?id=1386626 [3] http://review.gluster.org/15681 Change-Id: I62daeee8bb1430004e28558f6eb133efd4ccf418 Signed-off-by: Raghavendra G BUG: 1427012 Reviewed-on: https://review.gluster.org/16784 Smoke: Gluster Build System Reviewed-by: Milind Changire NetBSD-regression: NetBSD Build System CentOS-regression: Gluster Build System --- xlators/protocol/client/src/client.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'xlators/protocol/client/src/client.c') diff --git a/xlators/protocol/client/src/client.c b/xlators/protocol/client/src/client.c index 0287944ec98..e8db8eed166 100644 --- a/xlators/protocol/client/src/client.c +++ b/xlators/protocol/client/src/client.c @@ -2310,7 +2310,7 @@ client_rpc_notify (struct rpc_clnt *rpc, void *mydata, rpc_clnt_event_t event, if (conf->quick_reconnect) { conf->quick_reconnect = 0; - rpc_clnt_start (rpc); + rpc_clnt_cleanup_and_start (rpc); } else { rpc->conn.config.remote_port = 0; -- cgit