author: Jeff Darcy <jdarcy@redhat.com> 2012-03-12 09:32:40 -0400
committer: Anand Avati <avati@redhat.com> 2012-05-31 17:29:01 -0700
commit: ddc044bfa2840981de4003c3b9efcac84387dc2b
tree: a83d476702cac7ecc7ae59057c368f622a51af4c /xlators/cluster/afr/src/afr.c
parent: e066a5fea7bdaa5da78e49c9a5bf344af2f33d3c
replicate: add hashed read-child method.
Both the first-to-respond method and the round-robin method are susceptible to clients repeatedly choosing the same servers across a series of opens, creating hot spots. Also, the code to handle a replica being down will ignore both methods and just choose the first remaining (which is not an issue for two-way but can be otherwise). The hashed method more reliably avoids such hot spots. There are three values/modes.

0: use the old (broken) methods.
1: select a read-child based on a hash of the file's GFID, so all clients will choose the same subvolume for a file (ensuring maximum consistency) but will distribute load for a set of files.
2: select a read-child based on a hash of the file's GFID plus the client's PID, so different children will distribute load even for one file.

Mode 2 will probably be optimal for most cases. Using response time when we open the file is problematic, both because a single sample might not have been representative even then and because load might have shifted in the hours or days since (for long-lived files). Trying to use more current load information can lead to "herd following" behavior which is just as bad. Pseudo-random distribution is likely to be the best we can reasonably do, just as it is for DHT.

Change-Id: I798c2760411eacf32e82a85f03bb7b08a4a49461
BUG: 802513
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.com/2926
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
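
The three modes described above can be illustrated with a small standalone sketch. This is not the code from the patch: the function names (sketch_hash, sketch_hashed_read_child), the 16-byte GFID length constant, and the use of FNV-1a as the hash are all assumptions made for illustration, and the real selection logic lives elsewhere in the AFR sources. The sketch also ignores the case of a child being down.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

#define GFID_LEN 16 /* assumption: a GFID is a 16-byte UUID */

/* FNV-1a (32-bit), used here only as a stand-in for whatever hash
 * function the real implementation uses. */
static uint32_t
sketch_hash (const unsigned char *buf, size_t len)
{
        uint32_t h = 2166136261u;
        size_t   i;

        for (i = 0; i < len; i++) {
                h ^= buf[i];
                h *= 16777619u;
        }
        return h;
}

/* Hypothetical helper: returns the index of the read child chosen by
 * hashing, or -1 when hashing does not apply (mode 0, an out-of-range
 * mode, or no children), in which case the caller would fall back to
 * the older selection methods. */
static int
sketch_hashed_read_child (uint32_t hash_mode,
                          const unsigned char gfid[GFID_LEN],
                          pid_t pid, unsigned int child_count)
{
        unsigned char key[GFID_LEN + sizeof (pid_t)];
        size_t        key_len = GFID_LEN;

        if (child_count == 0 || hash_mode == 0 || hash_mode > 2)
                return -1;

        /* Mode 1: the key is the GFID alone, so every client picks the
         * same child for a given file. */
        memcpy (key, gfid, GFID_LEN);

        /* Mode 2: append the client PID, so different clients can pick
         * different children even for one file. */
        if (hash_mode == 2) {
                memcpy (key + GFID_LEN, &pid, sizeof (pid));
                key_len += sizeof (pid);
        }

        return (int) (sketch_hash (key, key_len) % child_count);
}
```

In this sketch, mode 1 ignores the PID argument entirely, which is what makes the choice identical across clients; mode 2 is still deterministic for a given (GFID, PID) pair, so a single client keeps reading the same file from the same child.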
Diffstat (limited to 'xlators/cluster/afr/src/afr.c')
-rw-r--r-- xlators/cluster/afr/src/afr.c | 14
1 files changed, 14 insertions, 0 deletions
diff --git a/xlators/cluster/afr/src/afr.c b/xlators/cluster/afr/src/afr.c
index 8e94d549737..b7ba2619711 100644
--- a/xlators/cluster/afr/src/afr.c
+++ b/xlators/cluster/afr/src/afr.c
@@ -149,6 +149,9 @@ reconfigure (xlator_t *this, dict_t *options)
GF_OPTION_RECONF ("read-subvolume", read_subvol, options, xlator, out);
+ GF_OPTION_RECONF ("read-hash-mode", priv->hash_mode,
+ options, uint32, out);
+
if (read_subvol) {
index = xlator_subvolume_index (this, read_subvol);
if (index == -1) {
@@ -237,6 +240,8 @@ init (xlator_t *this)
}
}
+ GF_OPTION_INIT ("read-hash-mode", priv->hash_mode, uint32, out);
+
priv->favorite_child = -1;
GF_OPTION_INIT ("favorite-child", fav_child, xlator, out);
if (fav_child) {
@@ -494,6 +499,15 @@ struct volume_options options[] = {
{ .key = {"read-subvolume" },
.type = GF_OPTION_TYPE_XLATOR
},
+ { .key = {"read-hash-mode" },
+ .type = GF_OPTION_TYPE_INT,
+ .min = 0,
+ .max = 2,
+ .default_value = "0",
+ .description = "0 = first responder, "
+ "1 = hash by GFID (all clients use same subvolume), "
+ "2 = hash by GFID and client PID",
+ },
{ .key = {"favorite-child"},
.type = GF_OPTION_TYPE_XLATOR
},