glusterfs.git/xlators/cluster, branch v6.3

cluster/ec: honor contention notifications for partially acquired locks

2019-06-03T04:08:06+00:00

EC was ignoring lock contention notifications received while a lock was
being acquired. When a lock is partially acquired (some bricks have
granted the lock but some others not yet) we can receive notifications
from acquired bricks, which should be honored, since we may not receive
more notifications after that.

Since EC was ignoring them, once the lock was acquired, it was not
released until the eager-lock timeout, causing unnecessary delays on
other clients.

This fix takes into consideration the notifications received before
having completed the full lock acquisition. After that, the lock will
be releaed as soon as possible.

Backport of:
> BUG: bz#1708156
> Change-Id: I2a306dbdb29fb557dcab7788a258bd75d826cc12
> Signed-off-by: Xavi Hernandez 

Fixes: bz#1714172
Change-Id: I2a306dbdb29fb557dcab7788a258bd75d826cc12
Signed-off-by: Xavi Hernandez

cluster/ec: Reopen shouldn't happen with O_TRUNC

2019-05-15T10:36:50+00:00

Problem:
Doing re-open with O_TRUNC will truncate the fragment even when it is not
needed needing extra heals

Fix:
At the time of re-open don't use O_TRUNC.

fixes bz#1709660
Change-Id: Idc6408968efaad897b95a5a52481c66e843d3fb8
Signed-off-by: Pranith Kumar K

afr: thin-arbiter lock release fixes

2019-05-15T04:16:52+00:00

- pass fop state instead of afr local to
afr_ta_dom_lock_check_and_release()

- avoid afr_lock_release_synctask() being called simultaneosuly from
notify code path and transaction (post-op) code path due to races.

- Check if the post-op on TA is valid based on event_gen checks.

- Invalidate in-memory information when we get TA child down.

Note: Thi patch addresses some pending review comments of commit
053b1309dc8fbc05fcde5223e734da9f694cf5cc
(https://review.gluster.org/#/c/glusterfs/+/20095/)

fixes: bz#1709130
Change-Id: I2ccd7e1b53362f9f3fed8680aecb23b5011eb18c
Signed-off-by: Ravishankar N 
(cherry picked from commit 9ab2747da78061882f6734df4b265bce11adaef1)

cluster/afr : TA: Return actual error code in case of failure

2019-05-13T05:36:51+00:00

In afr_ta_post_op_do, we were sending EIO for every failure.
However, the original error code should be sent.

Change-Id: I9fdc15dac00d758baf8e6f14db244f526481a63a
updates: bz#1709143
Signed-off-by: Ashish Pandey 
(cherry picked from commit 63159cdb5374f458d7d2bffec24d4720ffc96d6c)

cluster/dht: Refactor dht lookup functions

2019-05-09T12:10:25+00:00

Part 2: Modify dht_revalidate_cbk to call
dht_selfheal_directory instead of separate calls
to heal attrs and xattrs.

Change-Id: Id41ac6c4220c2c35484812bbfc6157fc3c86b142
fixes: bz#1707393
Signed-off-by: N Balachandran

cluster/dht: refactor dht lookup functions

2019-05-08T14:00:05+00:00

Part 1:  refactor the dht_lookup_dir_cbk
and dht_selfheal_directory functions.
Added a simple dht selfheal directory test

Change-Id: I1410c26359e3c14b396adbe751937a52bd2fcff9
updates: bz#1707393
Signed-off-by: N Balachandran

cluster/ec: fix fd reopen

2019-05-08T13:54:59+00:00

Currently EC tries to reopen fd's that have been opened while a brick
was down. This is done as part of regular write operations, just after
having acquired the locks, and it's sent as a sub-fop of the main write
fop.

There were two problems:

1. The reopen was attempted on all UP bricks, even if a previous lock
didn't succeed. This is incorrect because most probably the open will
fail.

2. If reopen is sent and fails, the error is propagated to the main
operation, causing it to fail when it shouldn't.

To fix this, we only attempt reopens on bricks where the current fop
owns a lock, and we prevent any error to be propagated to the main
fop.

To implement this behaviour an argument used to indicate the minimum
number of required answers has overloaded to also include some flags. To
make the change consistent, it has been necessary to rename the
argument, which means that a lot of files have been changed. However
there are no functional changes.

This change has also uncovered a problem in discard code, which didn't
correctely process requests of small sizes because no real discard fop
was being processed, only a write of 0's on some region. In this case
some fields of the fop remained uninitialized or with incorrect values.
To fix this, a new function has been created to simulate success on a
fop and it's used in the discard case.

Thanks to Pranith for providing a test script that has also detected an
issue in this patch. This patch includes a small modification of this
script to force data to be written into bricks before stopping them.

Backport of:
> Change-Id: If272343873369186c2fb8f43c1d9c52c3ea304ec
> BUG: bz#1699866
> Signed-off-by: Xavi Hernandez 

Change-Id: If272343873369186c2fb8f43c1d9c52c3ea304ec
Fixes: bz#1699917
Signed-off-by: Xavi Hernandez

cluster/afr: Remove local from owners_list on failure of lock-acquisition

2019-04-16T11:29:03+00:00

When eager-lock lock acquisition fails because of say network failures, the
local is not being removed from owners_list, this leads to accumulation of
waiting frames and the application will hang because the waiting frames are
under the assumption that another transaction is in the process of acquiring
lock because owner-list is not empty. Handled this case as well in this patch.
Added asserts to make it easier to find these problems in future.

fixes bz#1699731
Change-Id: I3101393265e9827755725b1f2d94a93d8709e923
Signed-off-by: Pranith Kumar K

ec: fix truncate lock to cover the write in tuncate clean

2019-04-16T10:57:05+00:00

ec_truncate_clean does writing under the lock granted for truncate,
but the lock is calculated by ec_adjust_offset_up, so that,
the write in ec_truncate_clean is out of lock.

Updates: bz#1699499
Change-Id: Idbe1fd48d26afe49c36b77db9f12e0907f5a4134
Signed-off-by: Kinglong Mee 
(cherry picked from commit 0e1223491e964096384edfae5032ed0d50d028ad)

cluster/afr: Thin-arbiter SHD fixes

2019-04-16T10:53:04+00:00

This patch address post-merge review comments for commit
5784a00f997212d34bd52b2303e20c097240d91c

Change-Id: I7ed954664a2ae8e1091d23ee3ceb9c66e83bfeac
fixes: bz#1699319
Signed-off-by: karthik-us