glusterfs.git/xlators/cluster/afr/src/afr-transaction.c, branch v5.8

cluster/afr: Remove local from owners_list on failure of lock-acquisition

2019-05-08T14:07:12+00:00

When eager-lock lock acquisition fails because of say network failures, the
local is not being removed from owners_list, this leads to accumulation of
waiting frames and the application will hang because the waiting frames are
under the assumption that another transaction is in the process of acquiring
lock because owner-list is not empty. Handled this case as well in this patch.
Added asserts to make it easier to find these problems in future.

fixes bz#1699736
Change-Id: I3101393265e9827755725b1f2d94a93d8709e923
Signed-off-by: Pranith Kumar K

cluster/thin-arbiter: Consider thin-arbiter before marking new entry changelog

2019-02-18T14:39:55+00:00

If a fop to create an entry fails on one of the data brick,
we mark the pending changelog on the entry on brick for which
it was successful. This is done as part of post op phase to
make sure that entry gets healed even if it gets renamed to
some other path where its parent was not marked as bad.

As it happens as part of post op, we should consider thin-arbiter
to check if the brick, which was successful, is the good brick or not.
This will avoide split brain and other issues.

>Change-Id: I12686675be98f02f70a5186b3ed748c541514d53
>Signed-off-by: Ashish Pandey 

Change-Id: I12686675be98f02f70a5186b3ed748c541514d53
updates: bz#1672314
Signed-off-by: Ashish Pandey

afr: thin-arbiter 2 domain locking and in-memory state

2018-12-12T14:26:42+00:00

2 domain locking + xattrop for write-txn failures:
--------------------------------------------------
- A post-op wound on TA takes AFR_TA_DOM_NOTIFY range lock and
AFR_TA_DOM_MODIFY full lock, does xattrop on TA and releases
AFR_TA_DOM_MODIFY lock and stores in-memory which brick is bad.

- All further write txn failures are handled based on this in-memory
value without querying the TA.

- When shd heals the files, it does so by requesting full lock on
AFR_TA_DOM_NOTIFY domain. Client uses this as a cue (via upcall),
releases AFR_TA_DOM_NOTIFY range lock and invalidates its in-memory
notion of which brick is bad. The next write txn failure is wound on TA
to again update the in-memory state.

- Any incomplete write txns before the AFR_TA_DOM_NOTIFY upcall release
request is got is completed before the lock is released.

- Any write txns got after the release request are maintained in a ta_waitq.

- After the release is complete, the ta_waitq elements are spliced to a
separate queue which is then processed one by one.

- For fops that come in parallel when the in-memory bad brick is still
unknown, only one is wound to TA on wire. The other ones are maintained
in a ta_onwireq which is then processed after we get the response from
TA.

Change-Id: I32c7b61a61776663601ab0040e2f0767eca1fd64
updates: bz#1648205
Signed-off-by: Ravishankar N 
Signed-off-by: Ashish Pandey

cluster/afr: Make data eager-lock decision based on number of locks

2018-10-05T14:37:01+00:00

For both Virt and block workloads the file is opened multiple times
leading to dynamically setting eager-lock to off for the workload.
Instead of depending on the number-of-open-fds, if we change the
logic to depend on number of inodelks, then it will give better
performance than the earlier logic. When there is an eager-lock
and number of inodelks is more than 1 we know that there is a
conflicting lock, so depend on that information to decide whether
to keep the current transaction go through delayed-post-op or not.

Locks xlator doesn't have implementation to query number of locks in
fxattrop in releases older than 3.10 so to keep things backward
compatible in 3.12, data transactions will use new logic where as
fxattrop transactions will use old logic. I am planning to send one
more patch which makes metadata domain locks also depend on
inodelk-count

Profile info for a dd of 500MB to a file with another fd opened
on the file using exec 250>filename

Without this patch:
 0.14      67.41 us      16.72 us    3870.82 us  892 FINODELK
 0.59     279.87 us      95.71 us    2085.89 us  898 FXATTROP
 3.46     366.43 us      81.75 us    6952.79 us 4000 WRITE
95.79  148733.99 us   50568.12 us  919127.86 us  273 FSYNC

With this patch:
 0.00      51.01 us      38.07 us      80.16 us    4 FINODELK
 0.00     235.43 us     235.43 us     235.43 us    1 TRUNCATE
 0.00     125.07 us      56.80 us     193.33 us    2 GETXATTR
 0.00     135.86 us      62.13 us     209.59 us    2  INODELK
 0.00     197.88 us     155.39 us     253.90 us    4 FXATTROP
 0.00     450.59 us     394.28 us     506.89 us    2  XATTROP
 0.00      56.96 us      19.06 us     406.59 us   23    FLUSH
37.81  273648.93 us      48.43 us 6017657.05 us   44   LOOKUP
62.18    4951.86 us      93.80 us 1143154.75 us 3999    WRITE

postgresql benchmark performance changed from ~1130 TPS to ~2300TPS
randio fio job inside Ovirt based VM went from ~600IOPs to ~2000IOPS

fixes bz#1635972
Change-Id: If7f7388d2f08cf7f17ca517a4ea222560661dc36
Signed-off-by: Pranith Kumar K

cluster/afr: Batch writes in same lock even when multiple fds are open

2018-10-05T14:37:01+00:00

Problem:
When eager-lock is disabled because of multiple-fds opened and app
writes come on conflicting regions, the number of locks grows very
fast leading to all the CPU being spent just in locking and unlocking
by traversing huge queues in locks xlator for granting locks.

Fix:
Reduce the number of locks in transit by bundling the writes in the
same lock and disable delayed piggy-pack when we learn that multiple
fds are open on the file. This will reduce the size of queues in the
locks xlator.  This also reduces the number of network calls like
inodelk/fxattrop.

Please note that this problem can still happen if eager-lock is
disabled as the writes will not be bundled in the same lock.

fixes bz#1635975
Change-Id: I8fd1cf229aed54ce5abd4e6226351a039924dd91
Signed-off-by: Pranith Kumar K

Land part 2 of clang-format changes

2018-09-12T12:22:45+00:00

Change-Id: Ia84cc24c8924e6d22d02ac15f611c10e26db99b4
Signed-off-by: Nigel Babu

afr: common thin-arbiter functions

2018-08-23T06:37:27+00:00

...that can be used by client and self-heal daemon, namely:

afr_ta_post_op_lock()
afr_ta_post_op_unlock()

Note: These are not yet consumed. They will be used in the write txn
changes patch which will introduce 2 domain locking.

updates: bz#1579788
Change-Id: I636d50f8fde00736665060e8f9ee4510d5f38795
Signed-off-by: Ravishankar N

afr: switch lk_owner only when pre-op succeeds

2018-07-18T16:49:56+00:00

Problem:
In a disk full scenario, we take a failure path in afr_transaction_perform_fop()
and go to unlock phase. But we change the lk-owner before that, causing unlock
to fail. When mount issues another fop that takes locks on that file, it hangs.

Fix:
Change lk-owner only when we are about to perform the fop phase.
Also fix the same issue for arbiters when afr_txn_arbitrate_fop() fails the fop.

Also removed the DISK_SPACE_CHECK_AND_GOTO in posix_xattrop. Otherwise truncate
to zero will fail pre-op phase with ENOSPC when the user is actually trying to
freee up space.

Change-Id: Ic4c8a596b4cdf4a7fc189bf00b561113cf114353
fixes: bz#1602236
Signed-off-by: Ravishankar N

cluster/afr: Mark dirty for entry transactions for quorum failures

2018-07-16T16:31:34+00:00

Problem:
If an entry creation transaction fails on quprum number of bricks
it might end up setting the pending changelogs on the file itself
on the brick where it got created. But the parent does not have
any entry pending marker set. This will lead to the entry not
getting healed by the self heal daemon automatically.

Fix:
For entry transactions mark dirty on the parent if it fails on
quorum number of bricks, so that the heal can do conservative
merge and entry gets healed by shd.

Change-Id: I56448932dd409b3ddb095e2ae32e037b6157a607
fixes: bz#1586020
Signed-off-by: karthik-us

cluster/afr: Make sure lk-owner is assigned at the time of lock

2018-07-04T14:38:37+00:00

Problem:
In the new eager-lock implementation lk-owner is assigned after the
'local' is added to the eager-lock list, so there exists a possibility
of lock being sent even before lk-owner is assigned.

Fix:
Make sure to assign lk-owner before adding local to eager-lock list

fixes bz#1597805
Change-Id: I26d1b7bcf3e8b22531f1dc0b952cae2d92889ef2
Signed-off-by: Pranith Kumar K