glusterfs.git/xlators/cluster/afr/src, branch release-3.11

afr: mark non sources as sinks in metadata heal

2017-07-19T11:24:31+00:00

Problem:
In a 3-way replica, when the source brick does not have pending xattrs
for the sinks, but the 2 sinks blame each other, metadata heal was not
happening because we were not setting all non-sources as sinks.

Fix: Mark all non-sources as sinks, like it is done in data and entry
heal.
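
A minimal sketch of the idea, not the actual AFR code: the helper name
mark_non_sources_as_sinks() and the flat byte arrays are assumptions made
for illustration. Every brick that is not a source is flagged as a sink,
regardless of what the pending xattrs on the source say.

```c
#include <assert.h>

/* Hypothetical helper (not part of AFR): sources[i] is non-zero when
 * brick i was chosen as a heal source. Every other brick becomes a sink
 * so that it gets healed from a source. */
static void
mark_non_sources_as_sinks (const unsigned char *sources,
                           unsigned char *sinks, int child_count)
{
        int i;

        assert (sources && sinks && child_count > 0);

        for (i = 0; i < child_count; i++)
                sinks[i] = !sources[i];
}
```

With one source and two bricks that only blame each other, both of the
latter end up as sinks.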

> Reviewed-on: https://review.gluster.org/17717
> Smoke: Gluster Build System 
> Reviewed-by: Pranith Kumar Karampuri 
> CentOS-regression: Gluster Build System 
(cherry picked from commit 77c1ed5fd299914e91ff034d78ef6e3600b9151c)

Change-Id: I534978940f5087302e307fcc810a48ffe898ce08
BUG: 1471611
Signed-off-by: Ravishankar N 
Reviewed-on: https://review.gluster.org/17781
Smoke: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri 
CentOS-regression: Gluster Build System

cluster/afr: Returning single and list of node uuids from AFR

2017-06-22T20:54:58+00:00

Problem:
The change in afr to return list of node uuids was causing problems
with geo-rep.

Fix:
This patch restores the earlier behaviour of returning a single node uuid
for the key "GF_XATTR_NODE_UUID_KEY", and returns the list of node uuids
under a new key, "GF_XATTR_LIST_NODE_UUIDS_KEY". This solves the problem
with geo-rep and with any other features that depended on the old
behaviour.

> Change-Id: I09885dac6dfca127be94b708470c8c2941356f9a
> BUG: 1462790
> Signed-off-by: karthik-us 
> Reviewed-on: https://review.gluster.org/17576
> Smoke: Gluster Build System 
> Reviewed-by: Ravishankar N 
> Reviewed-by: Kotresh HR 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Jeff Darcy 

(cherry picked from commit 475ec9928ef96b63a0bfa859a9ae68709275033c)

Change-Id: I6f6e8320a1eb5909ef601e23f8a8d3499807e319
BUG: 1463250
Signed-off-by: karthik-us 
Reviewed-on: https://review.gluster.org/17602
Reviewed-by: Pranith Kumar Karampuri 
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System

cluster/afr: Implement quorum for lk fop

2017-06-20T13:49:23+00:00

Problem:
At the moment, in a replica 3 or arbiter setup, we report success to the
application even when lk succeeds on just one brick, which is wrong.

Fix:
Consider quorum-number of successes as success when quorum is enabled.
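
A rough sketch of the check, under stated assumptions: the function name
lk_has_succeeded(), the op_ret array and the convention that a
quorum_count of 0 means "quorum disabled" are all hypothetical and only
illustrate the rule described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch (not the AFR implementation): op_ret[i] is 0 when
 * lk succeeded on brick i and -1 otherwise. quorum_count == 0 is used
 * here to mean that quorum is disabled. */
static bool
lk_has_succeeded (const int *op_ret, int child_count, int quorum_count)
{
        int i;
        int success = 0;

        assert (op_ret && child_count > 0 && quorum_count >= 0);

        for (i = 0; i < child_count; i++) {
                if (op_ret[i] == 0)
                        success++;
        }

        if (quorum_count == 0)
                return success > 0;      /* old behaviour: any success */

        return success >= quorum_count;  /* need a quorum of successes */
}
```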

 >BUG: 1461792
 >Change-Id: I5789e6eb5defb68f8a0eb9cd594d316f5cdebaea
 >Signed-off-by: Pranith Kumar K 
 >Reviewed-on: https://review.gluster.org/17524
 >Smoke: Gluster Build System 
 >NetBSD-regression: NetBSD Build System 
 >CentOS-regression: Gluster Build System 
 >Reviewed-by: Ravishankar N 

BUG: 1462661
Change-Id: I5789e6eb5defb68f8a0eb9cd594d316f5cdebaea
Signed-off-by: Pranith Kumar K 
Reviewed-on: https://review.gluster.org/17578
Smoke: Gluster Build System 
Reviewed-by: Ravishankar N 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

afr: add errno to afr_inode_refresh_done()

2017-06-06T12:55:00+00:00

Backport of https://review.gluster.org/17413 and
https://review.gluster.org/17436

Problem:
When parallel `rm -rf`s were being done from cifs clients, opendir might
fail on some replicas with ENOENT. DHT ignores partial opendir failures
in dht_fd_cbk() and winds readdirs on those replicas. Afr inode refresh
(as a part of readdirp read_txn) sees in its fd context that the state
of the fds is *not* AFR_FD_OPENED and bails out to
afr_inode_refresh_done() without doing a refresh. When this happens, the
errno is set as EIO due to lack of readable subvols, logging split-brain
messages in the logs.

Fix:
Introduce an errno argument to afr_inode_refresh_done() to bail out with
the right error value when inode refresh is not performed.
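
A simplified sketch of the behaviour, assuming a hypothetical helper
refresh_done_errno() that stands in for the errno selection only. It does
not reproduce the real function, which works on the call frame.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the errno selection: 'error' is the value
 * passed in by the caller when the refresh was skipped (0 if the refresh
 * actually ran), readable_count is the number of readable subvols. */
static int
refresh_done_errno (int error, int readable_count)
{
        assert (error >= 0 && readable_count >= 0);

        if (error != 0)
                return error;   /* refresh skipped: report the real cause */

        if (readable_count == 0)
                return EIO;     /* refresh ran and nothing is readable */

        return 0;
}
```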

Change-Id: I075707fbb73fd93a923b77b923a96aac79e847f9
BUG: 1457616
Signed-off-by: Ravishankar N 
Reviewed-on: https://review.gluster.org/17434
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri

cluster/afr: Return the list of node_uuids for the subvolume

2017-05-19T13:20:57+00:00

Problem:
AFR was returning the node uuid of the first node for every file if
the replica set was healthy, which was resulting in only one node
migrating all the files.

Fix:
With this patch AFR returns the list of node_uuids to the upper layer,
so that they can decide on which node to migrate which files, resulting
in improved performance. Ordering of node uuids will be maintained based
on the ordering of the bricks. If a brick is down, then the node uuid
for that will be set to all zeros.
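
The list layout can be sketched roughly as follows. The helper name
build_node_uuid_list(), the space separator and the textual all-zero uuid
are assumptions made for illustration, not the actual wire format.

```c
#include <assert.h>
#include <string.h>

#define ZERO_UUID "00000000-0000-0000-0000-000000000000"

/* Hypothetical sketch: join the node uuids in brick order, substituting
 * an all-zero uuid for bricks that are down. Returns 0 on success and
 * -1 if buf is too small. */
static int
build_node_uuid_list (const char *const *uuids,
                      const unsigned char *child_up, int child_count,
                      char *buf, size_t size)
{
        size_t used = 0;
        int    i;

        assert (uuids && child_up && buf && size > 0 && child_count > 0);
        buf[0] = '\0';

        for (i = 0; i < child_count; i++) {
                const char *id   = child_up[i] ? uuids[i] : ZERO_UUID;
                size_t      len  = strlen (id);
                size_t      need = len + (i ? 1 : 0);

                if (used + need + 1 > size)
                        return -1;
                if (i)
                        buf[used++] = ' ';
                memcpy (buf + used, id, len);
                used += len;
                buf[used] = '\0';
        }

        return 0;
}
```

Because the position of each uuid matches the position of its brick, an
upper layer can tell which brick a given entry belongs to even when some
bricks are down.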

>Reviewed-on: https://review.gluster.org/17084
> Reviewed-by: Pranith Kumar Karampuri 
> Tested-by: Pranith Kumar Karampuri 
> Smoke: Gluster Build System 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
(cherry picked from commit 0a50167c0a8f950f5a1c76442b6c9abea466200d)

Change-Id: I73ee0f9898ae473584fdf487a2980d7a6db22f31
BUG: 1451573
Signed-off-by: karthik-us 
Reviewed-on: https://review.gluster.org/17336
Tested-by: Ravishankar N 
Smoke: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

afr: propagate correct errno for fop failures in arbiter

2017-05-16T16:29:05+00:00

Problem:
If quorum is not met in fop cbk, arbiter sends an ENOTCONN error to the
upper xlators. In a VM workload with sharding enabled, this was leading
to the VM pausing when replace-brick was performed as described in the BZ.

Fix:
Move the fop cbk arbitration logic to afr_handle_quorum() because in
normal replica volumes, that is the function that has the quorum and
errno checks in the fop cbk path before doing a post-op.

Thanks to Pranith for suggesting this approach.

> Reviewed-on: https://review.gluster.org/17235
> Smoke: Gluster Build System 
> Reviewed-by: Pranith Kumar Karampuri 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
(cherry picked from commit 93c850dd2a513fab75408df9634ad3c970a0e859)

Change-Id: Ie6315db30c5e36326b71b90a01da824109e86796
BUG: 1450933
Signed-off-by: Ravishankar N 
Reviewed-on: https://review.gluster.org/17294
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

afr: include quorum type and count when dumping afr priv

2017-05-12T13:36:40+00:00

Squash of  https://review.gluster.org/17196 and
           https://review.gluster.org/17215

Dump the client quorum type ('auto', 'fixed' or 'none'). If it is 'fixed',
also dump the quorum-count. This information will be available in the client
statedump and in
//.meta/graphs/active/testvol-replicate-X/private.

Also added a test-case.

Change-Id: I91367c5250b26efb35e5f7d7c397def09cc77cbc
BUG: 1449921
Signed-off-by: Ravishankar N 
Reviewed-on: https://review.gluster.org/17243
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

afr: send the correct iatt values in fsync cbk

2017-05-12T13:35:15+00:00

Problem:
afr unwinds the fsync fop with an iatt buffer from one of its children
on whom fsync was successful. But that child might not be a valid read
subvolume for that inode because of pending heals or because it happens
to be the arbiter brick etc. Thus we end up sending the wrong iatt to
mdcache which will in turn serve it to the application on a subsequent
stat call as reported in the BZ.

Fix:
Pick a child on whom the fsync was successful *and* that is readable as
indicated in the inode context.
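
A minimal sketch of the selection, assuming a hypothetical helper
pick_fsync_child() and flat arrays in place of the per-child replies and
the readable map kept in the inode context.

```c
#include <assert.h>

/* Hypothetical sketch: return the index of the first child on which
 * fsync succeeded (op_ret[i] == 0) and which is also marked readable,
 * or -1 if there is no such child. */
static int
pick_fsync_child (const int *op_ret, const unsigned char *readable,
                  int child_count)
{
        int i;

        assert (op_ret && readable && child_count > 0);

        for (i = 0; i < child_count; i++) {
                if (op_ret[i] == 0 && readable[i])
                        return i;
        }

        return -1;
}
```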

> Reviewed-on: https://review.gluster.org/17227
> CentOS-regression: Gluster Build System 
> Reviewed-by: Pranith Kumar Karampuri 
> NetBSD-regression: NetBSD Build System 
> Smoke: Gluster Build System 
(cherry picked from commit 1a8fa910ccba7aa941f673302c1ddbd7bd818e39)

Change-Id: Ie8647289219cebe02dde4727e19a729b3353ebcf
BUG: 1449924
RCA'ed-by: Miklós Fokin 
Signed-off-by: Ravishankar N 
Reviewed-on: https://review.gluster.org/17244
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

Halo Replication feature for AFR translator

2017-05-08T05:37:07+00:00

	Backport of https://review.gluster.org/16177
		    https://review.gluster.org/17174

Merged both these patches to make sure IPV6 changes don't make it to 3.11 at all.

Summary:
Halo Geo-replication is a feature which allows Gluster or NFS clients to write
locally to their region (as defined by a latency "halo" or threshold if you
like), and have their writes asynchronously propagate from their origin to the
rest of the cluster.  Clients can also write synchronously to the cluster
simply by specifying a halo-latency which is very large (e.g. 10 seconds) which
will include all bricks.

In other words, it allows clients to decide at mount time if they desire
synchronous or asynchronous IO into a cluster and the cluster can support both
of these modes to any number of clients simultaneously.

There are a few new volume options due to this feature:
  halo-shd-latency:  The threshold below which self-heal daemons will
  consider children (bricks) connected.

  halo-nfsd-latency: The threshold below which NFS daemons will consider
  children (bricks) connected.

  halo-latency: The threshold below which all other clients will
  consider children (bricks) connected.

  halo-min-replicas: The minimum number of replicas which are to
  be enforced regardless of latency specified in the above 3 options.
  If the number of children falls below this threshold the next
  best (chosen by latency) shall be swapped in.
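
How the latency threshold and halo-min-replicas interact can be sketched
as below. This is an illustration only: halo_select_children(), the use
of a negative latency to mean "brick is down" and the inclusive
comparison against the threshold are assumptions, not the actual code.

```c
#include <assert.h>

/* Hypothetical sketch: mark children whose latency is within the halo
 * as selected, then swap in the next best children (lowest latency
 * first) until min_replicas is reached. A negative latency means the
 * child is down. Returns the number of selected children. */
static int
halo_select_children (const long *latency_ms, int child_count,
                      long halo_latency_ms, int min_replicas,
                      unsigned char *selected)
{
        int i;
        int count = 0;

        assert (latency_ms && selected && child_count > 0);

        for (i = 0; i < child_count; i++) {
                selected[i] = (latency_ms[i] >= 0 &&
                               latency_ms[i] <= halo_latency_ms);
                count += selected[i];
        }

        while (count < min_replicas) {
                int best = -1;

                for (i = 0; i < child_count; i++) {
                        if (selected[i] || latency_ms[i] < 0)
                                continue;
                        if (best < 0 || latency_ms[i] < latency_ms[best])
                                best = i;
                }
                if (best < 0)
                        break;          /* no more live children */
                selected[best] = 1;
                count++;
        }

        return count;
}
```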

New FUSE mount options:
  halo-latency & halo-min-replicas: As described above.

This feature combined with multi-threaded SHD support (D1271745) results in
some pretty cool geo-replication possibilities.

Operational Notes:
- Global consistency is guaranteed for synchronous clients; this is provided by
  the existing entry-locking mechanism.
- Asynchronous clients, on the other hand, are merely consistent to their region.
  Writes & deletes will be protected via entry-locks as usual preventing
  concurrent writes into files which are undergoing replication.  Read operations
  on the other hand should never block.
- Writes are allowed from _any_ region and propagated from the origin to all
  other regions.  The takeaway is that care should be taken to ensure
  multiple writers do not write the same files, which would result in a gfid
  split-brain requiring resolution via split-brain policies (majority, mtime &
  size).  The recommended method for preventing this is to use the nfs-auth
  feature to define which region has RW permissions for each share; tiers not
  in the origin region should have RO perms.

TODO:
- Synchronous clients (including the SHD) should choose clients from their own
  region as preferred sources for reads.  Most of the plumbing is in place for
  this via the child_latency array.
- Better GFID split brain handling & better dent type split brain handling
  (i.e. create a trash can and move the offending files into it).
- Tagging in addition to latency as a means of defining which children you wish
  to synchronously write to

Test Plan:
- The usual suspects, clang, gcc w/ address sanitizer & valgrind
- Prove tests

Reviewers: jackl, dph, cjh, meyering

Reviewed By: meyering

Subscribers: ethanr

Differential Revision: https://phabricator.fb.com/D1272053

Tasks: 4117827

 >Change-Id: I694a9ab429722da538da171ec528406e77b5e6d1
 >BUG: 1428061
 >Signed-off-by: Kevin Vigor 
 >Reviewed-on: http://review.gluster.org/16099
 >Reviewed-on: https://review.gluster.org/16177
 >Tested-by: Pranith Kumar Karampuri 
 >Smoke: Gluster Build System 
 >NetBSD-regression: NetBSD Build System 
 >CentOS-regression: Gluster Build System 
 >Reviewed-by: Pranith Kumar Karampuri 

BUG: 1448416
Change-Id: I694a9ab429722da538da171ec528406e77b5e6d1
Signed-off-by: Pranith Kumar K 
Reviewed-on: https://review.gluster.org/17192
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Kaushal M

cluster/afr: GFID split brain resolution with favorite-child-policy

2017-04-21T00:38:54+00:00

Problem:
Currently, automatic split-brain resolution with the favorite child policy
does not resolve GFID split brains.

Fix:
When there is a GFID split brain and the favorite child policy is set to
size/mtime/ctime/majority, based on the policy decide on the source and
sinks. Delete the entry from the sinks and recreate it from the source.
Mark the appropriate pending attributes and resolve the GFID split brain.
When the heal takes place it will complete the pending heals and reset
the attributes.
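
The source selection for the size, mtime and ctime policies can be
sketched as follows. The types and the helper pick_favorite_child() are
hypothetical, the majority policy is left out, and the sketch assumes
that "size" prefers the largest file and "mtime"/"ctime" prefer the
latest timestamp.

```c
#include <assert.h>
#include <stdint.h>

enum fav_policy { FAV_BY_SIZE, FAV_BY_MTIME, FAV_BY_CTIME };

/* Hypothetical per-brick reply: valid is 0 when the brick did not
 * answer. */
struct brick_stat {
        int      valid;
        uint64_t size;
        int64_t  mtime;
        int64_t  ctime;
};

/* Hypothetical sketch: return the index of the brick to use as the
 * source for the given policy, or -1 if no brick replied. All other
 * bricks would then be treated as sinks. */
static int
pick_favorite_child (const struct brick_stat *st, int child_count,
                     enum fav_policy policy)
{
        int i;
        int best = -1;

        assert (st && child_count > 0);

        for (i = 0; i < child_count; i++) {
                if (!st[i].valid)
                        continue;
                if (best < 0) {
                        best = i;
                        continue;
                }
                switch (policy) {
                case FAV_BY_SIZE:
                        if (st[i].size > st[best].size)
                                best = i;
                        break;
                case FAV_BY_MTIME:
                        if (st[i].mtime > st[best].mtime)
                                best = i;
                        break;
                case FAV_BY_CTIME:
                        if (st[i].ctime > st[best].ctime)
                                best = i;
                        break;
                }
        }

        return best;
}
```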

Change-Id: Ie30e5373f94ca6f276745d9c3ad662b8acca6946
BUG: 1430719
Signed-off-by: karthik-us 
Reviewed-on: https://review.gluster.org/16878
Smoke: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri 
Tested-by: Pranith Kumar Karampuri 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Ravishankar N 
CentOS-regression: Gluster Build System