glusterfs.git/xlators/cluster, branch v4.1.3

dht: Delete MDS internal xattr from dict in dht_getxattr_cbk

2018-08-16T14:31:37+00:00

Problem: When afr fetches xattrs in order to heal them, it cannot
         fetch the MDS xattr because posix_getxattr has a check
         that ignores the xattr if its name is MDS

Solution: Ignore the same xattr with a check in dht_getxattr_cbk
          instead of the check in posix_getxattr

Backport of:
 > BUG: 1584098
 > Change-Id: I86cd2b2ee08488cb6c12f407694219d57c5361dc
 > Signed-off-by: Mohit Agrawal 

Change-Id: I86cd2b2ee08488cb6c12f407694219d57c5361dc
fixes: bz#1611116
Signed-off-by: Mohit Agrawal
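
The idea in the commit above (let the lower layers return the internal xattr, and strip it only in the dht callback, which sits above afr) can be sketched as follows. This is a minimal illustration, not the actual dht code: the array-based xattr list, the functions xattr_list_del() and upper_getxattr_cbk(), and the key name are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical internal key; the real MDS xattr name is not given in the
 * commit message. */
#define INTERNAL_MDS_KEY "example.internal.mds"

struct xattr {
    const char *name;
    const char *value;
};

/* Remove every entry named `key` from the first `count` entries of `xattrs`,
 * compacting the array in place. Returns the new entry count. */
size_t
xattr_list_del(struct xattr *xattrs, size_t count, const char *key)
{
    size_t kept = 0;

    assert(xattrs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(xattrs[i].name, key) == 0)
            continue; /* drop the internal xattr */
        xattrs[kept++] = xattrs[i];
    }
    return kept;
}

/* Stand-in for a getxattr callback in the upper layer: the lower layers
 * return everything, and the internal key is stripped here before the reply
 * is passed further up. */
size_t
upper_getxattr_cbk(struct xattr *reply, size_t count)
{
    return xattr_list_del(reply, count, INTERNAL_MDS_KEY);
}
```

Because the filtering happens only in the upper callback, a layer below it (afr in the commit above) still sees the full reply and can heal the xattr.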

afr: switch lk_owner only when pre-op succeeds

2018-07-23T23:42:15+00:00

Problem:
In a disk full scenario, we take a failure path in afr_transaction_perform_fop()
and go to unlock phase. But we change the lk-owner before that, causing unlock
to fail. When mount issues another fop that takes locks on that file, it hangs.

Fix:
Change lk-owner only when we are about to perform the fop phase.
Also fix the same issue for arbiters when afr_txn_arbitrate_fop() fails the fop.

Also removed the DISK_SPACE_CHECK_AND_GOTO in posix_xattrop. Otherwise truncate
to zero will fail pre-op phase with ENOSPC when the user is actually trying to
free up space.

Change-Id: Ic4c8a596b4cdf4a7fc189bf00b561113cf114353
fixes: bz#1603056
Signed-off-by: Ravishankar N 
(cherry picked from commit ec0d7d77de3e4bd485a4fa2e53c9137e25c71ce7)
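
The ordering problem described above can be modelled with a toy lock that only its owner may release. This is a sketch under assumptions: the brick_lock type, run_txn(), and the owner ids are hypothetical, and restoring the lock owner before unlock on the success path is an assumption about the normal flow rather than something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-lock brick: remembers which owner holds the lock. */
struct brick_lock {
    bool held;
    int owner;
};

static bool
brick_lock_acquire(struct brick_lock *l, int owner)
{
    if (l->held)
        return false;
    l->held = true;
    l->owner = owner;
    return true;
}

/* Unlock succeeds only for the owner that took the lock. */
static bool
brick_lock_release(struct brick_lock *l, int owner)
{
    if (!l->held || l->owner != owner)
        return false;
    l->held = false;
    return true;
}

/* Runs lock -> pre-op -> (fop) -> unlock and returns true if the lock was
 * released at the end. `switch_owner_early` models the buggy ordering. */
bool
run_txn(struct brick_lock *l, bool preop_ok, bool switch_owner_early)
{
    const int lock_owner = 1; /* hypothetical owner ids */
    const int fop_owner = 2;
    int current = lock_owner;

    if (!brick_lock_acquire(l, current))
        return false; /* in the real system the caller would block here */

    if (switch_owner_early)
        current = fop_owner; /* bug: switched before pre-op result is known */

    if (preop_ok) {
        current = fop_owner;  /* fix: switch only when entering fop phase */
        /* ... the fop itself would run here ... */
        current = lock_owner; /* assumed: restored before unlocking */
    }

    return brick_lock_release(l, current);
}
```

With preop_ok false and switch_owner_early true, the release is attempted with the wrong owner and the lock stays held, so the next run_txn() on the same lock cannot acquire it. That corresponds to the hang described in the commit message.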

cluster/afr: Prevent execution of code after call_count decrementing

2018-07-10T08:49:30+00:00

Problem:
When call_count is decremented by one thread, another thread can
go ahead with the operation leading to undefined behavior for the
thread executing statements after decrementing call count.

Fix:
Do the operations necessary before decrementing call count.

fixes bz#1599629
Change-Id: Icc90cd92ac16e5fbdfe534d9f0a61312943393fe
Signed-off-by: Pranith Kumar K 
(cherry picked from commit 03f1f5bdc46076178f1afdf8e2a76c5b973fe11f)
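
A minimal sketch of the "read first, decrement last" rule from the commit above. It uses C11 atomics for brevity, which is a swapped-in technique and not necessarily how afr guards its counter; the shared_local struct and callback_done() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-operation state shared by several callbacks. */
struct shared_local {
    atomic_int call_count; /* callbacks still outstanding */
    int op_ret;            /* state another thread may reuse or free */
};

/* Called once per callback. Everything that reads `local` happens before the
 * decrement; after it, only the thread that saw the count reach zero may
 * touch `local` again. Returns true for that last thread. */
bool
callback_done(struct shared_local *local, int *op_ret_out)
{
    /* 1. Copy what we need while we still hold a share of `local`. */
    *op_ret_out = local->op_ret;

    /* 2. Decrement last. atomic_fetch_sub returns the previous value. */
    int previous = atomic_fetch_sub(&local->call_count, 1);
    assert(previous > 0);

    /* 3. Do not dereference `local` here unless we are the last caller. */
    return previous == 1;
}
```

A caller that gets false back must work only with the copied value, since the thread that receives true is free to continue the operation and release `local`.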

afr: heal gfids when file is not present on all bricks

2018-07-09T15:18:31+00:00

commit 20fa80057eb430fd72b4fa31b9b65598b8ec1265 introduced a regression
wherein if a file is present in only 1 brick of replica *and* doesn't
have a gfid associated with it, it doesn't get healed upon the next
lookup from the client. Fix it.

Change-Id: I7d1111dcb45b1b8b8340a7d02558f05df70aa599
fixes: bz#1597117
Signed-off-by: Ravishankar N 
(cherry picked from commit eb472d82a083883335bc494b87ea175ac43471ff)

cluster/afr: Make sure lk-owner is assigned at the time of lock

2018-07-04T15:04:30+00:00

Problem:
In the new eager-lock implementation lk-owner is assigned after the
'local' is added to the eager-lock list, so there exists a possibility
of lock being sent even before lk-owner is assigned.

Fix:
Make sure to assign lk-owner before adding local to eager-lock list

fixes bz#1598193
Change-Id: I26d1b7bcf3e8b22531f1dc0b952cae2d92889ef2
Signed-off-by: Pranith Kumar K 
(cherry picked from commit c6f93e422855f656d3a86461a8458f37ad0103eb)

afr: don't update readables if inode refresh failed on all children

2018-07-02T17:24:47+00:00

Problem:
If inode refresh failed on all children of afr due to ENOENT (say file
migrated by dht), it resets the readables to zero. Any inflight txn which
then later comes on the inode fails with EIO because no readable
children are present for the inode.

Fix:
Don't update readables when inode refresh fails on *all* children of
afr. That way, any inflight txn will either proceed with its own inode
refresh if needed and fail with the right errno, or use the old value
of readables and continue with the txn.

Also, add quorum checks to the beginning of afr_transaction(). Otherwise, we
seem to be winding the lock and checking for quorum only in pre-op phase.

Note: This should ideally fix BZ 1329505 since the stop gap fix for
it has been reverted at https://review.gluster.org/#/c/20028.

Change-Id: Ia638c092d8d12dc27afb3cdad133394845061319
updates: bz#1597116
Signed-off-by: Ravishankar N 
(cherry picked from commit 0f13eed0c1fa74cefed486538b02e0c8a8708456)
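
The "leave readables alone when refresh fails everywhere" rule above can be sketched like this. It is a simplification: the inode_ctx struct and update_readables() are hypothetical, and readability is reduced here to "the refresh reply from that child succeeded", whereas the real computation involves more state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 8

/* Hypothetical cached per-inode state: which children are readable. */
struct inode_ctx {
    bool readable[MAX_CHILDREN];
};

/* refresh_ok[i] says whether the refresh reply from child i succeeded.
 * Returns true if the cache was updated, false if it was left untouched. */
bool
update_readables(struct inode_ctx *ctx, const bool *refresh_ok, size_t n)
{
    size_t ok = 0;

    assert(n <= MAX_CHILDREN);
    for (size_t i = 0; i < n; i++) {
        if (refresh_ok[i])
            ok++;
    }

    if (ok == 0)
        return false; /* refresh failed on all children: keep old value */

    for (size_t i = 0; i < n; i++)
        ctx->readable[i] = refresh_ok[i];
    return true;
}
```

An inflight transaction that consults ctx->readable after an all-children failure therefore still finds the previous readable set instead of an empty one.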

cluster/dht: Fix rebalance log msg

2018-05-31T13:13:39+00:00

Corrected the name of the xattr and fixed
the code to log an error only if op_errno
is not ENODATA or ENOATTR.

Change-Id: I42c5b1d838eec586ac7bed2471eb1d27ff09a9ea
fixes: bz#1583769
Signed-off-by: N Balachandran
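
The logging condition described above could look roughly like the following. The function name should_log_getxattr_error() is hypothetical, and the fallback define is an assumption for platforms (Linux among them) whose errno.h does not provide ENOATTR.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#ifndef ENOATTR
#define ENOATTR ENODATA /* fallback where errno.h lacks ENOATTR */
#endif

/* A missing xattr is an expected outcome, so only other failures are worth
 * an error-level log message. */
bool
should_log_getxattr_error(int op_ret, int op_errno)
{
    if (op_ret >= 0)
        return false; /* the call succeeded */
    return op_errno != ENODATA && op_errno != ENOATTR;
}
```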

afr: fix bug-1363721.t failure

2018-05-25T12:57:45+00:00

Problem:
In the .t, when the only good brick was brought down, writes on the fd were
still succeeding on the bad bricks. The inflight split-brain check was
marking the write as failure but since the write succeeded on all the
bad bricks, afr_txn_nothing_failed() was set to true and we were
unwinding writev with success to DHT and then catching the failure in
post-op in the background.

Fix:
Don't wind the FOP phase if the write_subvol (which is populated with readable
subvols obtained in pre-op cbk) does not have at least 1 good brick which was up
when the transaction started.

Note: This fix is not related to brick multiplexing. I ran the .t
10 times with this fix and brick-mux enabled without any failures.

Change-Id: I915c9c366aa32cd342b1565827ca2d83cb02ae85
updates: bz#1581548
Signed-off-by: Ravishankar N 
(cherry picked from commit 985a1d15db910e012ddc1dcdc2e333cc28a9968b)
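
The check described in the fix above amounts to asking whether any brick is both readable as of the pre-op callback and was up when the transaction started. A sketch with hypothetical names (should_wind_fop() and its two boolean arrays stand in for whatever afr actually tracks):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* readable_at_preop[i]: brick i was reported as a good (readable) copy in
 *                       the pre-op callback.
 * up_at_start[i]:       brick i was up when the transaction started.
 * Returns true if the fop phase may be wound. */
bool
should_wind_fop(const bool *readable_at_preop, const bool *up_at_start,
                size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (readable_at_preop[i] && up_at_start[i])
            return true; /* at least one good brick will receive the write */
    }
    return false; /* only bad bricks remain: fail instead of winding */
}
```

In the scenario from the .t, the only good brick is down, so the function returns false and the write is failed up front instead of being reported as a success and caught later in post-op.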

cluster/ec: Fix pre-op xattrop management

2018-05-25T02:06:11+00:00

Multiple pre-op xattrops can be processed simultaneously. In the cbk,
the code checked whether the fop was waiting for some specific data (like
size and version) and, if so, assumed that the answer would contain that
data.

This is not true, since a fop can be waiting for some data, but it may come
from the xattrop of another fop.

This patch differentiates between needing some information and providing it.

This is related to parallel writes. Disabling them fixed the problem, but
also prevented concurrent reads. A change has been made so that disabling
parallel writes still allows parallel reads.

Backport of:
> BUG: 1578325

Fixes: bz#1582056
Change-Id: I74772ad6b80b7b37805da93d5ec3ae099e96b041
Signed-off-by: Xavi Hernandez

cluster/dht: Remove EIO from dht_inode_missing

2018-05-22T10:27:57+00:00

Removed EIO from the list of errnos that triggered
a migrate check task.

(cherry picked from commit c925962b91c67c8cd2391df7dd0251e0cbf66648)

Change-Id: I7f89c7a16056421588f1af2377cebe6affddcb47
fixes: bz#1579674
Signed-off-by: N Balachandran
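
As an illustration of the change above, here is a sketch of an errno test that no longer treats EIO as "inode missing". The commit message does not list the errnos that remain, so ENOENT and ESTALE are assumptions, and inode_missing_sketch() is a hypothetical name, not the dht helper itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Returns true if the errno suggests the file may have moved, in which case
 * a migration check would be triggered. EIO is deliberately not in the set.
 * The two remaining errnos are assumed for illustration. */
bool
inode_missing_sketch(int op_errno)
{
    return op_errno == ENOENT || op_errno == ESTALE;
}
```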