summaryrefslogtreecommitdiffstats
path: root/xlators/cluster/afr/src/afr-transaction.c
diff options
context:
space:
mode:
authorJeff Darcy <jdarcy@redhat.com>2016-11-17 10:42:02 -0500
committerPranith Kumar Karampuri <pkarampu@redhat.com>2016-11-28 22:41:24 -0800
commit77f03db0131c88d607886bb02dd2a4276ab584d4 (patch)
tree8a3c464e02f2c9b391c94281821a2c8605c8c612 /xlators/cluster/afr/src/afr-transaction.c
parent1876454d2e7950f25d1e5bb8e2c07ab27d521498 (diff)
afr: fix auto-quorum
(1) afr_have_quorum is dead code. It was copied to afr_has_quorum, and everything else uses that, but the original was never deleted (until now). (2) Auto-quorum should be default for any N>2. Leaving quorum disabled is BAD, but apparently deemed acceptable for N=2 because there's no real quorum in that case. For any larger number (including arbiter configurations) there is such a thing as real quorum and we should use it by default. Note that for N=3 the answers we get from "N % 2" (the old check) and "N > 2" (the new one) are the same. (3) The special case for even N in afr_has_quorum has been simplified and explained more thoroughly in a comment. Change-Id: I48b33c15093512fecf516b26dcf09afecb7ae33b Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/15873 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Diffstat (limited to 'xlators/cluster/afr/src/afr-transaction.c')
-rw-r--r--xlators/cluster/afr/src/afr-transaction.c39
1 files changed, 29 insertions, 10 deletions
diff --git a/xlators/cluster/afr/src/afr-transaction.c b/xlators/cluster/afr/src/afr-transaction.c
index eb7571db5f1..d23654d8354 100644
--- a/xlators/cluster/afr/src/afr-transaction.c
+++ b/xlators/cluster/afr/src/afr-transaction.c
@@ -681,17 +681,36 @@ afr_has_quorum (unsigned char *subvols, xlator_t *this)
up_children_count = AFR_COUNT (subvols, priv->child_count);
if (priv->quorum_count == AFR_QUORUM_AUTO) {
- /*
- * Special case for even numbers of nodes in auto-quorum:
- * if we have exactly half children up
- * and that includes the first ("senior-most") node, then that counts
- * as quorum even if it wouldn't otherwise. This supports e.g. N=2
- * while preserving the critical property that there can only be one
- * such group.
- */
- if ((priv->child_count % 2 == 0) &&
- (up_children_count == (priv->child_count/2)))
+ /*
+ * Special case for auto-quorum with an even number of nodes.
+ *
+ * A replica set with even count N can only handle the same
+ * number of failures as odd N-1 before losing "vanilla"
+ * quorum, and the probability of more simultaneous failures is
+ * actually higher. For example, with a 1% chance of failure
+ * we'd have a 0.03% chance of two simultaneous failures with
+ * N=3 but a 0.06% chance with N=4. However, the special case
+ * is necessary for N=2 because there's no real quorum in that
+ * case (i.e. can't normally survive *any* failures). In that
+ * case, we treat the first node as a tie-breaker, allowing
+ * quorum to be retained in some cases while still honoring the
+ * all-important constraint that there can not simultaneously
+ * be two partitioned sets of nodes each believing they have
+ * quorum. Of two equally sized sets, the one without that
+ * first node will lose.
+ *
+ * It turns out that the special case is beneficial for higher
+ * values of N as well. Continuing the example above, the
+ * probability of losing quorum with N=4 and this type of
+ * quorum is (very) slightly lower than with N=3 and vanilla
+ * quorum. The difference becomes even more pronounced with
+ * higher N. Therefore, even though such replica counts are
+ * unlikely to be seen in practice, we might as well use the
+ * "special" quorum then as well.
+ */
+ if ((up_children_count * 2) == priv->child_count) {
return subvols[0];
+ }
}
if (priv->quorum_count == AFR_QUORUM_AUTO) {