afr: use data trylock mode in read/write self-heal trigger paths

Self-heal data lock contention between clients and glustershd instances can lead to long waits and slow user response times if the client ends up with its lock pending behind a glustershd self-heal of a large file. We have reports of guest VM instances going completely unresponsive during self-heal of virtual disk images.

Optimize the read/write self-heal trigger codepath (i.e., afr_open_fd_fix()) to trylock for self-heal and skip the self-heal if the lock is not immediately available. This minimizes the likelihood of a running/active guest competing with glustershd on arrival of a brick.

Note that lock contention is still possible from the client (e.g., via lookup).

BUG: 874045
Change-Id: I406443c061ff6acd2a851179626b78352caa5c03
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-on: http://review.gluster.org/4258
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
author: Brian Foster <bfoster@redhat.com> 2012-12-03 10:45:04 -0500
committer: Anand Avati <avati@redhat.com> 2012-12-04 14:45:23 -0800
commit: 741766c708f2a246854584c064d63d3fba67be90 (patch)
tree: 659ec297c9a4f0bdb9118bb16d791aea101b29fb
parent: e19bf891d5373e1660e666fecf6740062a375617 (diff)
1 files changed, 8 insertions, 1 deletions
diff --git a/xlators/cluster/afr/src/afr-self-heal-data.c b/xlators/cluster/afr/src/afr-self-heal-data.c
index bf20d8652..c7a97c991 100644
--- a/xlators/cluster/afr/src/afr-self-heal-data.c
+++ b/xlators/cluster/afr/src/afr-self-heal-data.c
@@ -1235,6 +1235,7 @@ afr_sh_data_open_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
         afr_private_t   *priv = NULL;
         int              call_count = 0;
         int              child_index = 0;
+	gf_boolean_t	 block = _gf_true;
 
         local = frame->local;
         sh = &local->self_heal;
@@ -1276,7 +1277,13 @@ afr_sh_data_open_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
                         "fd for %s opened, commencing sync",
                         local->loc.path);
 
-                afr_sh_data_lock (frame, this, 0, 0, _gf_true,
+		/*
+		 * The read and write self-heal trigger codepaths do not provide
+		 * an unwind callback. We run a trylock in these codepaths
+		 * because we are sensitive to locking latency.
+		 */
+		block = sh->unwind ? _gf_true : _gf_false;
+                afr_sh_data_lock (frame, this, 0, 0, block,
                                   afr_sh_data_big_lock_success,
                                   afr_sh_data_fail);
         }
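
The block-versus-trylock decision in the hunk above can be illustrated outside of GlusterFS with a small, self-contained sketch. Everything here is hypothetical: `heal_ctx`, `heal_data()` and the `HEAL_*` codes are invented for illustration, and a local pthread mutex stands in for the data lock, whereas afr_sh_data_lock() actually takes a lock across bricks. Only the rule "no unwind callback means trylock and skip on contention" is taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the self-heal state; only the presence or
 * absence of an unwind callback matters for this sketch. */
typedef void (*unwind_fn) (void);

struct heal_ctx {
        unwind_fn unwind;       /* NULL in the read/write trigger paths */
};

/* Illustrative result codes, not GlusterFS values. */
enum heal_result {
        HEAL_DONE    = 0,
        HEAL_SKIPPED = 1,
        HEAL_ERROR   = -1
};

enum heal_result
heal_data (struct heal_ctx *ctx, pthread_mutex_t *data_lock)
{
        bool block = false;
        int  ret   = 0;

        assert (ctx != NULL && data_lock != NULL);

        /* Same rule as the patch: callers that supplied an unwind
         * callback can afford to wait; latency-sensitive callers
         * (no callback) only try the lock. */
        block = ctx->unwind ? true : false;

        if (block)
                ret = pthread_mutex_lock (data_lock);
        else
                ret = pthread_mutex_trylock (data_lock);

        if (ret == EBUSY)
                return HEAL_SKIPPED;    /* someone else is healing */
        if (ret != 0)
                return HEAL_ERROR;

        /* ... the data sync itself would run here ... */

        pthread_mutex_unlock (data_lock);

        if (ctx->unwind)
                ctx->unwind ();

        return HEAL_DONE;
}
```

A caller with `unwind == NULL` that finds the lock already held gets `HEAL_SKIPPED` back immediately instead of stalling, which mirrors the intent of skipping the self-heal while glustershd holds the lock on a large file.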