glusterfs.git/xlators/cluster/afr, branch v3.8.0

cluster/afr: Unwind with xdata in inode-write fops

2016-06-13T10:22:31+00:00

On failure, afr was not unwinding xdata to the xlators above it.
xdata need not be NULL on failure, so it must be passed up to the
parent xlators.

 >Change-Id: Ic36aac10a79fa91121961932dd1920cb1c2c3a4c
 >BUG: 1340623
 >Signed-off-by: Pranith Kumar K 
 >Reviewed-on: http://review.gluster.org/14567
 >Smoke: Gluster Build System 
 >NetBSD-regression: NetBSD Build System 
 >CentOS-regression: Gluster Build System 
 >Reviewed-by: Jeff Darcy 

BUG: 1342178
Change-Id: Idd74d2bc898fe5aef537ab48c1754510030c8825
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/14618
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Niels de Vos

afr: Consider ENOSPC and EDQUOT as symmetric errors

2016-06-13T10:14:49+00:00

Backport of http://review.gluster.org/#/c/14604/

Problem:
Since commit 8eaa3506ead4f11b81b146a9e56575c79f3aad7b, in replica 3, if a
brick is down and a create fails on the other 2 bricks with EDQUOT, we consider
it an asymmetric error and hence do not do post-op. So the dirty xattr
remains set on the parent dir, leading to conservative merges during heal when
all bricks are up, i.e. a file deleted on the source might re-appear after heal.

Fix:
Consider ENOSPC and EDQUOT as symmetric errors, since partial inode or
entry modification operations are not possible when quota is enabled.
IOW, if quota reports EDQUOT, the number of bytes written (or not
written) will be the same on all bricks of the replica.
Likewise, the entry operation (create, mkdir...) will either succeed or
not succeed on all bricks.
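
The rule above can be sketched in C roughly as follows. This is an
illustration only, not the AFR source: struct reply,
errno_is_quota_symmetric() and is_symmetric_quota_failure() are
hypothetical names, and the sketch assumes each brick reply carries a
validity flag, an op_ret and an op_errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-brick reply; not the real AFR struct. */
struct reply {
        bool valid;     /* false if the brick was down or never answered */
        int  op_ret;    /* < 0 on failure */
        int  op_errno;  /* meaningful only when op_ret < 0 */
};

/* The two errnos the commit message asks to treat as symmetric. */
static bool
errno_is_quota_symmetric (int op_errno)
{
        return op_errno == ENOSPC || op_errno == EDQUOT;
}

/*
 * True if at least one brick replied, every brick that replied failed,
 * and they all failed with the same ENOSPC/EDQUOT errno. Bricks that
 * did not reply are skipped rather than counted as a mismatch.
 */
static bool
is_symmetric_quota_failure (const struct reply *replies, size_t count)
{
        int    seen_errno = 0;
        size_t i;

        assert (replies != NULL || count == 0);

        for (i = 0; i < count; i++) {
                if (!replies[i].valid)
                        continue;       /* brick down: nothing to compare */
                if (replies[i].op_ret >= 0)
                        return false;   /* the fop succeeded somewhere */
                if (!errno_is_quota_symmetric (replies[i].op_errno))
                        return false;
                if (seen_errno != 0 && seen_errno != replies[i].op_errno)
                        return false;   /* bricks disagree on the errno */
                seen_errno = replies[i].op_errno;
        }
        return seen_errno != 0;         /* false if nobody replied */
}
```

In the replica 3 scenario from the problem statement (one brick down,
EDQUOT on the other two), this sketch returns true, which is the case
in which the post-op is expected to run.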

Change-Id: Iacb1108e9ef4a918e36242fb4a957455133744e9
BUG: 1344559
Signed-off-by: Ravishankar N 
Reviewed-on: http://review.gluster.org/14687
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri 
Reviewed-by: Niels de Vos

cluster/afr: Unwind xdata_rsp even in case of failures

2016-06-10T15:36:21+00:00

DHT expects GF_PREOP_CHECK_FAILED to be present in xdata_rsp when mkdir
fails because of a stale layout. But AFR was unwinding a NULL xdata_rsp in case
of failures. This was leading to mkdir failures just after remove-brick. Unwind
the xdata_rsp in case of failures to make sure the response from the brick
reaches DHT.
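
A minimal sketch of the behaviour change, under stated assumptions:
struct xdata, struct brick_reply and all function names below are
hypothetical stand-ins (the real code passes dictionaries), and
PREOP_CHECK_FAILED_KEY is a placeholder string, since the actual value
behind GF_PREOP_CHECK_FAILED is not given here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Placeholder key; the real GF_PREOP_CHECK_FAILED value is not shown here. */
#define PREOP_CHECK_FAILED_KEY "preop-check-failed"

/* Hypothetical stand-in for a response dictionary holding one key. */
struct xdata {
        const char *key;
};

/* Hypothetical, simplified reply received from a brick. */
struct brick_reply {
        int                 op_ret;    /* < 0 on failure */
        int                 op_errno;
        const struct xdata *xdata;     /* may be NULL */
};

/* Behaviour described as the bug: xdata is dropped on failure. */
static const struct xdata *
xdata_unwound_before_fix (const struct brick_reply *reply)
{
        assert (reply != NULL);
        return reply->op_ret < 0 ? NULL : reply->xdata;
}

/* Behaviour described as the fix: xdata is passed up on failure too. */
static const struct xdata *
xdata_unwound_after_fix (const struct brick_reply *reply)
{
        assert (reply != NULL);
        return reply->xdata;
}

/* What a parent xlator could check for in the unwound xdata. */
static bool
parent_sees_preop_check_failed (const struct xdata *xdata)
{
        return xdata != NULL && xdata->key != NULL &&
               strcmp (xdata->key, PREOP_CHECK_FAILED_KEY) == 0;
}
```

With the first function the parent never sees the key on a failed
mkdir; with the second it does, which is what lets the parent tell a
stale-layout failure apart from other errors.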

 >BUG: 1340623
 >Change-Id: Idd3f7b95730e8ea987b608e892011ff190e181d1
 >Signed-off-by: Pranith Kumar K 
 >Reviewed-on: http://review.gluster.org/14553
 >NetBSD-regression: NetBSD Build System 
 >Reviewed-by: Ravishankar N 
 >Smoke: Gluster Build System 
 >CentOS-regression: Gluster Build System 
 >Reviewed-by: Anuradha Talur 
 >Reviewed-by: Krutika Dhananjay 

BUG: 1342178
Change-Id: Iaacadcad0f76979fb250bd008b8e43f0e7acf642
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/14617
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Krutika Dhananjay 
Reviewed-by: Niels de Vos

afr: Automagic unsplit-brain by [ctime|mtime|size|majority]

2016-05-27T14:56:37+00:00

Backport of http://review.gluster.org/#/c/14026/

Introduce cluster.favorite-child-policy which, when set to one of
[ctime|mtime|size|majority], automatically heals files that are in
split-brain.

The majority policy will not pick a source if there is no majority.
The other three policies pick the first brick with a valid reply and
non-zero ctime/mtime/size as source.
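
The "no majority, no source" rule can be sketched as below. This is a
hypothetical illustration, not the AFR implementation: struct
brick_stat and pick_source_by_majority() are invented names, and
treating two bricks as agreeing when their mtime and size match is an
assumption made only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-brick view of the file. */
struct brick_stat {
        bool      valid;   /* false if the brick gave no usable reply */
        long long mtime;
        long long size;
};

/* Assumed notion of agreement: same mtime and same size. */
static bool
same_fingerprint (const struct brick_stat *a, const struct brick_stat *b)
{
        return a->mtime == b->mtime && a->size == b->size;
}

/*
 * Return the index of the first brick whose fingerprint is shared by
 * more than half of all bricks, or -1 if no such majority exists.
 */
static int
pick_source_by_majority (const struct brick_stat *st, size_t count)
{
        size_t i, j, votes;

        assert (st != NULL || count == 0);

        for (i = 0; i < count; i++) {
                if (!st[i].valid)
                        continue;
                votes = 0;
                for (j = 0; j < count; j++)
                        if (st[j].valid && same_fingerprint (&st[i], &st[j]))
                                votes++;
                if (votes * 2 > count)
                        return (int) i;
        }
        return -1;      /* no majority: do not pick a source */
}
```

Note that with two bricks that disagree, no fingerprint can exceed half
of the bricks, so the sketch never picks a source in that case.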

Change-Id: I93623a914dce2839957fce87b514050e9d274d4c
BUG: 1339639
Signed-off-by: Ravishankar N 
Reviewed-on: http://review.gluster.org/14535
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri

cluster/afr: Attempt name-index purge even on full-heal of directory

2016-05-27T14:51:00+00:00

Backport of http://review.gluster.org/#/c/14516/

Change-Id: I429ae628a310fd254bac7dde6d1e034d65608047
BUG: 1339436
Signed-off-by: Krutika Dhananjay 
Reviewed-on: http://review.gluster.org/14527
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri

cluster/afr: Do not inode_link in afr

2016-05-25T10:31:27+00:00

The race is explained at
https://bugzilla.redhat.com/show_bug.cgi?id=1337405#c0

This patch also performs self-heal with the shd pid, and heals using
the inode from this->itable rather than from the main itable.

 >BUG: 1337405
 >Change-Id: Id657a6623b71998b027b1dff6af5bbdf8cab09c9
 >Signed-off-by: Pranith Kumar K 
 >Reviewed-on: http://review.gluster.org/14422
 >Smoke: Gluster Build System 
 >NetBSD-regression: NetBSD Build System 
 >CentOS-regression: Gluster Build System 
 >Reviewed-by: Krutika Dhananjay 

BUG: 1337870
Change-Id: Ifb476eeed2ff73a44e481d64074599ab0707c725
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/14455
Smoke: Gluster Build System 
Reviewed-by: Krutika Dhananjay 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Niels de Vos

cluster/afr: Refresh inode for inode-write fops in need

2016-05-24T21:42:30+00:00

Problem:
If a named fresh-lookup is done on a loc and the fop either fails on, or is
not sent to, one of the bricks, but that brick is up by the time the response
reaches afr, 'can_interpret' is set to false in afr_lookup_done(). The
inode-ctx for that inode is then left unset, which can lead to EIO in a
transaction, since the transaction depends on the 'readable' array being
available by that point.

Fix:
Refresh the inode for inode-write fops in two cases: when the ctx was not set
at the time of the named fresh-lookup, so that it gets set, and when the file
is in split-brain, where one more refresh is needed before failing the fop to
check whether the file is still in split-brain.

 >BUG: 1336612
 >Change-Id: I5c50b62c8de06129b8516039f7c252e5008c47a5
 >Signed-off-by: Pranith Kumar K 
 >Reviewed-on: http://review.gluster.org/14368
 >Smoke: Gluster Build System 
 >NetBSD-regression: NetBSD Build System 
 >Reviewed-by: Ravishankar N 
 >CentOS-regression: Gluster Build System 

BUG: 1337822
Change-Id: I0f904ebaa78b99cbb11546e08c9fc1562e9a3eef
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/14449
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Krutika Dhananjay 
Reviewed-by: Anuradha Talur 
CentOS-regression: Gluster Build System 
Reviewed-by: Niels de Vos

cluster/afr: Check for required number of entrylks

2016-05-24T11:19:58+00:00

Backport of http://review.gluster.org/#/c/14358/

Problem:
Parallel rmdir operations on the same directory result in ENOTCONN
messages even though there was no network disconnect.

When taking blocking entry locks during rmdir, AFR takes 2 sets of locks on
all its children: one on (parentdir, name of dir to be deleted), the other a
full lock on the dir being deleted. We proceed to the pre-op stage even if
only a single lock (but not all the needed locks) was obtained, only to fail
it with ENOTCONN because afr_locked_nodes_get() returns zero nodes in
afr_changelog_pre_op().

Fix:
After we get replies for all blocking lock requests, if we don't have
the minimum number of locks to carry out the FOP, unlock and fail the
FOP. The op_errno will be that of the last failed reply we got, i.e.
whatever is set in afr_lock_cbk().
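
The check described in the fix can be sketched like this. It is a
simplified illustration, not the AFR code: struct lock_reply and
lock_phase_errno() are hypothetical names, the caller is assumed to
know how many locks the FOP needs, and falling back to ENOTCONN when
no reply carried an errno is an assumption of this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reply to one blocking lock request. */
struct lock_reply {
        bool replied;   /* false if no reply was received */
        int  op_ret;    /* 0 if the lock was granted */
        int  op_errno;  /* meaningful only when op_ret != 0 */
};

/*
 * Call once all lock replies are in. Returns 0 if at least `required`
 * locks were granted; otherwise returns the errno of the last failed
 * reply, in which case the caller should unlock and fail the FOP
 * instead of moving on to the pre-op stage.
 */
static int
lock_phase_errno (const struct lock_reply *replies, size_t count,
                  size_t required)
{
        size_t locked     = 0;
        int    last_errno = ENOTCONN;   /* assumed fallback, see above */
        size_t i;

        assert (replies != NULL || count == 0);

        for (i = 0; i < count; i++) {
                if (!replies[i].replied)
                        continue;
                if (replies[i].op_ret == 0)
                        locked++;
                else
                        last_errno = replies[i].op_errno;
        }
        return locked >= required ? 0 : last_errno;
}
```

The point of the sketch is the early exit: a partially locked
transaction reports the errno the lock request actually failed with,
rather than a misleading ENOTCONN later in the pre-op.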

Change-Id: Ibef25e65b468ebb5ea6ae1f5121a5f1201072293
BUG: 1338051
Signed-off-by: Ravishankar N 
Reviewed-on: http://review.gluster.org/14461
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Pranith Kumar Karampuri 
CentOS-regression: Gluster Build System

cluster/afr: If possible give errno received from lower xlators

2016-05-24T10:40:44+00:00

In case of 3-way replication with quorum enabled and sharding,
if one brick is brought down and brought back up, fops sometimes
fail with EROFS because the mknod of the shard file fails on the
two good nodes with EEXIST. So even when quorum is not met, it
makes sense to unwind with the errno returned by lower xlators
as much as possible.

 >Change-Id: Iabd91cd7c270f5dfe6cbd18c50e59c299a331552
 >BUG: 1336612
 >Signed-off-by: Pranith Kumar K 
 >Reviewed-on: http://review.gluster.org/14369
 >Smoke: Gluster Build System 
 >NetBSD-regression: NetBSD Build System 
 >CentOS-regression: Gluster Build System 
 >Reviewed-by: Ravishankar N 

BUG: 1337822
Change-Id: Ic2450d34d3bf1fb6be754ce890aeca960fe7ad1f
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/14448
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Anuradha Talur 
Reviewed-by: Niels de Vos

cluster/afr : Do post-op in case of symmetric errors

2016-05-24T08:37:10+00:00

Backport of http://review.gluster.org/#/c/14310/

In afr_changelog_post_op_now(), if there was any error,
meaning op_ret < 0, post-op was not being done even when
the errors were symmetric and there were no "failed
subvols".

Fix:
When the errors are symmetric, perform post-op.

How the bug was found:
In a 1 x 3 volume with shard and write-behind enabled,
when writes were done to a file with one brick down,
the trusted.afr.dirty xattr's value for the .shard directory
would keep increasing, as post-op was not done but pre-op was.
This incorrectly showed .shard to be in split-brain.

RCA:
When WB is on, multiple writes can be sent on offsets
lying in the same shard, so chances are that the
same shard file will be created more than once,
with the second create failing with op_ret < 0
and op_errno = EEXIST.

As op_ret was negative, afr wouldn't do post-op, so the
trusted.afr.dirty xattr was never decremented and the
.shard directory appeared to be in split-brain.
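
The effect on the dirty counter can be modelled with a small sketch.
It is deliberately simplified and hypothetical: struct txn,
should_post_op_old(), should_post_op_fixed() and dirty_after() are
invented names, the "no failed subvols" condition is folded into a
single symmetric_error flag, and the counter stands in for the
trusted.afr.dirty value on one directory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one transaction's outcome. */
struct txn {
        int  op_ret;            /* < 0 on failure */
        bool symmetric_error;   /* failed the same way on all bricks */
};

/* Behaviour described as the bug: any failure skips the post-op. */
static bool
should_post_op_old (const struct txn *t)
{
        return t->op_ret >= 0;
}

/* Behaviour described as the fix: symmetric failures still post-op. */
static bool
should_post_op_fixed (const struct txn *t)
{
        return t->op_ret >= 0 || t->symmetric_error;
}

/*
 * Replay a series of transactions: every pre-op increments the dirty
 * counter, and every post-op that is actually performed decrements it.
 */
static int
dirty_after (const struct txn *txns, size_t count,
             bool (*should_post_op) (const struct txn *))
{
        int    dirty = 0;
        size_t i;

        assert (txns != NULL || count == 0);
        assert (should_post_op != NULL);

        for (i = 0; i < count; i++) {
                dirty++;                        /* pre-op */
                if (should_post_op (&txns[i]))
                        dirty--;                /* post-op */
        }
        return dirty;
}
```

Replaying one successful create followed by two creates that fail
symmetrically with EEXIST leaves the counter at 2 under the old rule
and at 0 under the fixed rule, which mirrors the ever-increasing dirty
value described above.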

 >Change-Id: I711bdeaa1397244e6a7790e96f0c84501798fc59
 >BUG: 1335652
 >Signed-off-by: Anuradha Talur 

Change-Id: I711bdeaa1397244e6a7790e96f0c84501798fc59
BUG: 1335829
Signed-off-by: Anuradha Talur 
Reviewed-on: http://review.gluster.org/14331
NetBSD-regression: NetBSD Build System 
Smoke: Gluster Build System 
Reviewed-by: Ravishankar N 
CentOS-regression: Gluster Build System 
Reviewed-by: Niels de Vos