glusterfs.git/tests/bugs/replicate, branch v3.8.14

afr: don't do a post-op on a brick if op failed

2017-04-29T11:26:24+00:00

Problem:
In afr-v2, self-blaming xattrs are not there by design. But if the FOP
failed on a brick due to an error other than ENOTCONN (or even due to
ENOTCONN, but we regained connection before postop was wound), we wind
the post-op also on the failed brick, leading to setting self-blaming
xattrs on that brick. This can lead to undesired results like healing of
files in split-brain etc.

Fix:
If a fop failed on a brick on which pre-op was successful, do not
perform post-op on it. This also produces the desired effect of not
resetting the dirty xattr on the brick, which is how it should be
because if the fop failed on a brick, there is no reason to clear the
dirty bit which actually serves as an indication of the failure.

> Reviewed-on: https://review.gluster.org/16976
> Smoke: Gluster Build System 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Pranith Kumar Karampuri 

Change-Id: I5f1caf4d1b39f36cf8093ccef940118638caa9c4
BUG: 1443319
Signed-off-by: Ravishankar N 
Reviewed-on: https://review.gluster.org/17082
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri

cluster/afr: Undo pending xattrs only on the up bricks

2017-04-07T11:56:55+00:00

Problem:
While doing conservative merge, even if a brick is down, it will reset
the pending xattr on that. When that brick comes up, as part of the
heal, it will consider this brick as the source and removes the entries
on the other bricks, which leads to data loss.

Fix:
Undo pending only for the bricks which are up.

> Change-Id: I18436fa0bb1faa5f60531b357dea3f6b20446303
> BUG: 1433571
> Signed-off-by: karthik-us 
> Reviewed-on: https://review.gluster.org/16913
> Reviewed-by: Pranith Kumar Karampuri 
> Smoke: Gluster Build System 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Ravishankar N 
(cherry picked from commit f91596e6566c605e70a31a60523d11f78a097c3c)

Change-Id: Id20c9ce53ee59f005d977494903247e2a8024ed1
BUG: 1436231
Signed-off-by: karthik-us 
Reviewed-on: https://review.gluster.org/16956
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Ravishankar N 
Reviewed-by: Pranith Kumar Karampuri

afr: all children of AFR must be up to resolve s-brain

2017-02-15T15:08:04+00:00

Problem:
The various split-brain resolution policies (favorite-child-policy based,
CLI based and mount (get/setfattr) based) attempt to resolve split-brain
even when not all bricks of replica are up. This can be a problem when
say in a replica 3, the only good copy is down and the other 2 bricks
are up and blame each other (i.e. split-brain). We end up healing the
file in such a  case and allow I/O on it.

Fix:
A decision on whether the file is in split-brain or not must be taken
only if we are able to examine the afr xattrs of *all* bricks of a given
replica.

> Reviewed-on: https://review.gluster.org/16476
> Smoke: Gluster Build System 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Pranith Kumar Karampuri 

(cherry picked from commit 0e03336a9362e5717e561f76b0c543e5a197b31b)

Change-Id: Icddb1268b380005799990f5379ef957d84639ef9
BUG: 1420984
Signed-off-by: Ravishankar N 
Reviewed-on: https://review.gluster.org/16589
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Smoke: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri

afr: allow I/O when favorite-child-policy is enabled

2017-01-08T10:48:49+00:00

Problem:
Currently, I/O on a split-brained file fails even when the
favorite-child-policy is set until the self-heal is complete.

Fix:
If a valid 'source' is found using the set favorite-child-policy, inspect
and reset the afr pending xattrs on the 'sinks' (inside appropriate locks),
refresh the inode and then proceed with the read or write transaction.

The resetting itself happens in the self-heal code and hence can also
happen in the client side background-heal or by the shd's index-heal in
addition to the txn code path explained above. When it happens in via
heal, we also add checks in undo-pending to not reset the sink xattrs
again.

> Reviewed-on: http://review.gluster.org/15673
> Tested-by: Pranith Kumar Karampuri 
> Smoke: Gluster Build System 
> Reviewed-by: Pranith Kumar Karampuri 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 

Change-Id: Ic8c1317720cb26bd114b6fe6af4e58c73b864626
BUG: 1378547
Signed-off-by: Ravishankar N 
Reported-by: Simon Turcotte-Langevin 
Reviewed-on: http://review.gluster.org/16091
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Smoke: Gluster Build System 
Reviewed-by: Niels de Vos

cluster/afr: Fix missing name indices due to EEXIST error

2016-12-29T15:09:10+00:00

        Backport of: http://review.gluster.org/16286

PROBLEM:
Consider a volume with  granular-entry-heal and sharding enabled. When
a replica is down and a shard is created as part of a write, the name
index is correctly created under indices/entry-changes/.
Now when a read on the same region triggers another MKNOD, the fop
fails on the online bricks with EEXIST. By virtue of this being a
symmetric error, the failed_subvols[] array is reset to all zeroes.
Because of this, before post-op, the GF_XATTROP_ENTRY_OUT_KEY will be
set, causing the name index, which was created in the previous MKNOD
operation, to be wrongly deleted in THIS MKNOD operation.

FIX:
The ideal fix would have been for a transaction to delete the name
index ONLY if it knows it is the one that created the index in the first
place. This would involve gathering information as to whether THIS xattrop
created the index from individual bricks, aggregating their responses and
based on the various posisble combinations of responses, decide whether to
delete the index or not. This is rather complex. Simpler fix would be
for post-op to examine local->op_ret in the event of no failed_subvols
to figure out whether to delete the name index or not. This can occasionally
lead to creation of stale name indices but they won't be affecting the IO path
or mess with pending changelogs in any way and self-heal in its crawl of
"entry-changes" directory would take care to delete such indices.

Change-Id: Icc642a987d1b6a5097562315aecf1263ed35ceb6
BUG: 1408786
Signed-off-by: Krutika Dhananjay 
Reviewed-on: http://review.gluster.org/16293
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri

tests: Fix spurious failure in tests/bugs/replicate/bug-1402730.t

2016-12-23T09:00:04+00:00

        Backport of: http://review.gluster.org/16193

Replace the EXPECT '00000001' with EXPECT_NOT '00000000'. This is
because occasionally a name-heal is performing new-entry marking on
'c' causing the pending entry changelog on it to become '00000002'.

Change-Id: Ib7b0d64c8de2498c2ffb3b8e06228694f2c55755
BUG: 1406740
Signed-off-by: Krutika Dhananjay 
Reviewed-on: http://review.gluster.org/16224
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Pranith Kumar Karampuri

cluster/afr: Fix per-txn optimistic changelog initialisation

2016-12-13T09:03:15+00:00

        Backport of: http://review.gluster.org/16075

Incorrect initialisation of local->optimistic_change_log was leading
to skipped pre-op and post-op even when a brick didn't participate in
the txn because it was down.
The result - missing granular name index resulting in some entries
never getting healed.

FIX:
Initialise local->optimistic_change_log just before pre-op.

Also fixed granular entry heal to create the granular name index in
pre-op as opposed to post-op. This is to prevent loss of granular
information when during an entry txn, the good (src) brick goes
offline before the post-op is done. This would cause self-heal to
do conservative merge (since dirty xattr is the only information
available), which when granular-entry-heal is enabled, expects
granular indices, the lack of which can lead to loss of data in
the worst case.

Change-Id: Ibc0fbfb3fa21c578e28868d9e30b274e33c12064
BUG: 1403646
Signed-off-by: Krutika Dhananjay 
Reviewed-on: http://review.gluster.org/16105
Reviewed-by: Pranith Kumar Karampuri 
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System

cluster/afr: Prevent split-brain when bricks are brought off and on in cyclic order

2016-08-22T10:22:36+00:00

        Backport of: http://review.gluster.org/15080

When the bricks are brought offline and then online in cyclic
order while writes are in progress on a file, thanks to inode
refresh in write txns, AFR will mostly fail the write attempt
when the only good copy is offline. However, there is still a
remote possibility that the file will run into split-brain if
the brick that has the lone good copy goes offline *after* the
inode refresh but *before* the write txn completes (I call it
in-flight split-brain in the patch for ease of reference),
requiring intervention from admin to resolve the split-brain
before the IO can resume normally on the file. To get around this,
the patch does the following things:
i) retains the dirty xattrs on the file
ii) avoids marking the last of the good copies as bad (or accused)
    in case it is the one to go down during the course of a write.
iii) fails that particular write with the appropriate errno.

This way, we still have one good copy left despite the split-brain situation
which when it is back online, will be chosen as source to do the heal.

> Change-Id: I9ca634b026ac830b172bac076437cc3bf1ae7d8a
> BUG: 1363721
> Signed-off-by: Krutika Dhananjay 
> Reviewed-on: http://review.gluster.org/15080
> Tested-by: Pranith Kumar Karampuri 
> Smoke: Gluster Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Ravishankar N 
> Reviewed-by: Oleksandr Natalenko 
> NetBSD-regression: NetBSD Build System 
> Reviewed-by: Pranith Kumar Karampuri 
(cherry picked from commit fcb5b70b1099d0379b40c81f35750df8bb9545a5)

Change-Id: I157f1025aebd6624fa3d412abc69a4ae6f2fe9e0
BUG: 1367272
Signed-off-by: Krutika Dhananjay 
Signed-off-by: Oleksandr Natalenko 
Reviewed-on: http://review.gluster.org/15221
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Smoke: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri

glusterd: Convert volume to replica after adding brick self heal is not triggered

2016-08-18T16:55:17+00:00

Problem:  After add brick to a distribute volume to convert to replica is not
          triggering self heal.

Solution: Modify the condition in brick_graph_add_index to set trusted.afr.dirty
          attribute in xlator.

Test    : To verify the patch followd below steps
          1) Create a single node volume
             gluster volume create  
          2) Start volume and create mount point
             mount -t glusterfs :/DIS /mnt
          3) Touch some file and write some data on file
          4) Add another brick along with replica 2
             gluster volume add-brick DIS replica 2 :/dist2/brick2
          5) Before apply the patch file size is 0 bytes in mount point.
Backport of commit 87bb8d0400d4ed18dd3954b1d9e5ca6ee0fb9742
BUG: 1366440
Signed-off-by: Mohit Agrawal 

> Change-Id: Ief0ccbf98ea21b53d0e27edef177db6cabb3397f
> Signed-off-by: Mohit Agrawal 
> Reviewed-on: http://review.gluster.org/15118
> NetBSD-regression: NetBSD Build System 
> Reviewed-by: Ravishankar N 
> Reviewed-by: Anuradha Talur 
> Smoke: Gluster Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Atin Mukherjee 
> (cherry picked from commit 87bb8d0400d4ed18dd3954b1d9e5ca6ee0fb9742)

Change-Id: Icd104cf5a2152a9c606dac209746e2953c4d293e
Reviewed-on: http://review.gluster.org/15151
Smoke: Gluster Build System 
Reviewed-by: Ravishankar N 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Anuradha Talur 
Reviewed-by: Atin Mukherjee 
Tested-by: Atin Mukherjee

afr: some coverity fixes

2016-07-28T13:54:49+00:00

Note: This is a backport of http://review.gluster.org/14895.
It contains:
i) fixes that prevent deadlocks (afr-common.c).
ii) fixes over-writing op-errno=ENOMEM with possible other values
(afr-inode-read.c).
iii) prevents doing further operations with a NULL dictionary if
allocation fails (afr-self-heal-data.c).
iv) prevents falsely marking a sink as healed if metadata heal fails
midway(afr-self-heal-metadata.c).
v) other minor fixes.

Considering the above are not trivial fixes, the patch is a good
candidate for merging in 3.8 branch.

Thanks to Krutika for a cleaner way to track inode refs in
afr_set_split_brain_choice().

Change-Id: I2d968d05b815ad764b7e3f8aa9ad95a792b3c1df
BUG: 1360556
Signed-off-by: Ravishankar N 
Reviewed-on: http://review.gluster.org/15018
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Krutika Dhananjay 
Reviewed-by: Pranith Kumar Karampuri