From 06b888bbeac61aa1234b43e398431529988c28b6 Mon Sep 17 00:00:00 2001
From: Pranith Kumar K
Date: Tue, 10 Nov 2015 09:06:54 +0530
Subject: cluster/ec: fix bug in update_good

Backport of http://review.gluster.com/12561

Problem:
Bricks that didn't participate in the fops are considered to be good.
This happens in two ways.

Examples:
Case-1:
1) 2+1 volume. 'd1' directory on Brick-0 is bad.
2) readdir takes locks and lock->good_mask is '7'.
3) readdir does xattrop and fop->mask is '6'.
4) Because fop->expected is '1', lock->good_mask remains '7'.

Case-2:
1) When all the bricks are up, it does lock + xattrop before the op and
   figures out that all the bricks are good.
2) By the time the second operation starts, brick-0 is down. Now
   lock->good_mask will always have the '0' bit set as long as operations
   are happening on it, because in
   "lock->good_mask &= ~fop->mask | fop->remaining"
   fop->mask doesn't have the '0'th bit.
3) When it is time to perform the final xattrop in update_size_version,
   brick-0 comes back online, so it is given the same version as the other
   bricks, as if it had participated in all the transactions till then,
   even though it didn't.

Fix:
Case-1's fix: Update lock->good_mask in ec_prepare_update_cbk with the
latest good/bad bricks.
Case-2's fix: Consider a non-participating brick as bad.

BUG: 1278744
Change-Id: I5c2b07005107f3c067bac69da3b37ff39688bd69
Signed-off-by: Pranith Kumar K
Reviewed-on: http://review.gluster.org/12562
Tested-by: NetBSD Build System
Tested-by: Gluster Build System
Reviewed-by: Xavier Hernandez
---
 xlators/cluster/ec/src/ec-common.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/xlators/cluster/ec/src/ec-common.c b/xlators/cluster/ec/src/ec-common.c
index 152611f876f..be2df64d630 100644
--- a/xlators/cluster/ec/src/ec-common.c
+++ b/xlators/cluster/ec/src/ec-common.c
@@ -154,11 +154,12 @@ void ec_lock_update_good(ec_lock_t *lock, ec_fop_data_t *fop)
         return;
     }
 
-    /* When updating the good mask of the lock, we only take into
-     * consideration those bits corresponding to the bricks where
-     * the fop has been executed. */
-    lock->good_mask &= ~fop->mask | fop->remaining;
-    lock->good_mask |= fop->good;
+    /* When updating the good mask of the lock, we only take into consideration
+     * those bits corresponding to the bricks where the fop has been executed.
+     * Bad bricks are removed from good_mask, but once marked as bad it's never
+     * set to good until the lock is released and reacquired */
+
+    lock->good_mask &= fop->good | fop->remaining;
 }
 
 void __ec_fop_set_error(ec_fop_data_t * fop, int32_t error)
@@ -967,6 +968,7 @@ out:
     /* We don't allow the main fop to be executed on bricks that have not
      * succeeded the initial xattrop. */
     parent->mask &= fop->good;
+    ec_lock_update_good (lock, fop);
 
     /*As of now only data healing marks bricks as healing*/
     lock->healing |= fop->healing;
-- 
cgit
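
The mask arithmetic behind Case-2 can be checked in isolation. The sketch
below is not part of the patch: the demo_* types and functions are
hypothetical stand-ins that copy only the two update expressions from the
diff, with bit i standing for brick i, a 64-bit mask assumed, and
'remaining' assumed to be 0 (no brick still pending).

```c
#include <stdint.h>

/* Hypothetical stand-ins for the ec_lock_t / ec_fop_data_t fields used in
 * the patch; only the fields that appear in the changed lines are modelled. */
typedef struct {
    uint64_t good_mask; /* bricks currently considered good by the lock */
} demo_lock_t;

typedef struct {
    uint64_t mask;      /* bricks the fop was sent to */
    uint64_t remaining; /* assumed: bricks where the fop is still pending */
    uint64_t good;      /* bricks where the fop succeeded */
} demo_fop_t;

/* Update expression removed by the patch: bits outside fop->mask are left
 * untouched, and bits in fop->good are set again. */
void demo_update_good_old(demo_lock_t *lock, const demo_fop_t *fop)
{
    lock->good_mask &= ~fop->mask | fop->remaining;
    lock->good_mask |= fop->good;
}

/* Update expression added by the patch: a bit survives only if the brick
 * is in fop->good or fop->remaining, and no bit is ever set again. */
void demo_update_good_new(demo_lock_t *lock, const demo_fop_t *fop)
{
    lock->good_mask &= fop->good | fop->remaining;
}
```

Starting from good_mask = 7 and a fop with mask = 6, remaining = 0,
good = 6 (brick-0 down, as in Case-2), the old expression leaves good_mask
at 7 while the new one reduces it to 6. A later fop with mask = 7 and
good = 7 keeps the new good_mask at 6, matching the comment that a brick
marked bad is not set to good again until the lock is released and
reacquired.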