author Pranith Kumar K <pkarampu@redhat.com> 2016-10-19 15:50:50 +0530
committer Pranith Kumar Karampuri <pkarampu@redhat.com> 2016-10-25 04:00:46 -0700
commit e6c38ae1d3f3c53f8739ab2db7c4ecfdbc58fc44
tree 27d9d6c8fc65c5768fc733a49919b5f226000ed4
parent 10f2cbdfe6b375dbe602ba2c1da09008057f52c8
rpc: Fix the race between notification and reconnection
Problem:
There was a hang because unlock on an entry failed with ENOTCONN.
The client thinks the connection is down, whereas the server thinks the
connection is up.

This is the race we are seeing:
1) Connection from client to the brick disconnects.
2) Saved frames unwind is called, which unwinds all frames that were
   wound before the disconnect.
3) Connection from client to the brick happens, followed by setvolume.
4) Disconnect notification for the connection in 1) comes now and calls
   client_rpc_notify(), which marks the connection offline even though
   the connection is up.

This happens because I/O can retrigger the connection before the
disconnect notification is sent to the higher layers in rpc.

Fix:
Notify the higher layers that a disconnect happened, and then go ahead
with the reconnect logic.

For the logs that support the information above, see:
https://bugzilla.redhat.com/show_bug.cgi?id=1386626#c1

Thanks to Raghavendra G for suggesting the correct fix.

>BUG: 1386626
>Change-Id: I3c84ba1f17010bd69049fa88ec5f0ae431f8cda9
>Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
>Reviewed-on: http://review.gluster.org/15681
>NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
>Reviewed-by: Niels de Vos <ndevos@redhat.com>
>CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
>Smoke: Gluster Build System <jenkins@build.gluster.org>
>Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
>(cherry picked from commit a6b63e11b7758cf1bfcb67985e25ec02845f0995)

Change-Id: Ifa721193c26b70e26b47b7698c077da0ad5f2e1d
BUG: 1388323
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/15717
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
-rw-r--r-- rpc/rpc-lib/src/rpc-clnt.c 7
1 file changed, 4 insertions, 3 deletions
diff --git a/rpc/rpc-lib/src/rpc-clnt.c b/rpc/rpc-lib/src/rpc-clnt.c
index e8a8ea2ecd9..3caab985cfe 100644
--- a/rpc/rpc-lib/src/rpc-clnt.c
+++ b/rpc/rpc-lib/src/rpc-clnt.c
@@ -898,6 +898,10 @@ rpc_clnt_notify (rpc_transport_t *trans, void *mydata,
         switch (event) {
         case RPC_TRANSPORT_DISCONNECT:
         {
+                if (clnt->notifyfn)
+                        ret = clnt->notifyfn (clnt, clnt->mydata,
+                                              RPC_CLNT_DISCONNECT, NULL);
+
                 rpc_clnt_connection_cleanup (conn);
 
                 pthread_mutex_lock (&conn->lock);
@@ -921,9 +925,6 @@ rpc_clnt_notify (rpc_transport_t *trans, void *mydata,
                 }
                 pthread_mutex_unlock (&conn->lock);
 
-                if (clnt->notifyfn)
-                        ret = clnt->notifyfn (clnt, clnt->mydata,
-                                              RPC_CLNT_DISCONNECT, NULL);
                 if (unref_clnt)
                         rpc_clnt_ref (clnt);
 