glusterfs.git/tests/basic/afr, branch v5.2

afr: thin-arbiter 2 domain locking and in-memory state

2018-12-12T14:26:42+00:00

2 domain locking + xattrop for write-txn failures:
--------------------------------------------------
- A post-op wound on TA takes AFR_TA_DOM_NOTIFY range lock and
AFR_TA_DOM_MODIFY full lock, does xattrop on TA and releases
AFR_TA_DOM_MODIFY lock and stores in-memory which brick is bad.

- All further write txn failures are handled based on this in-memory
value without querying the TA.

- When shd heals the files, it does so by requesting full lock on
AFR_TA_DOM_NOTIFY domain. Client uses this as a cue (via upcall),
releases AFR_TA_DOM_NOTIFY range lock and invalidates its in-memory
notion of which brick is bad. The next write txn failure is wound on TA
to again update the in-memory state.

- Any incomplete write txns before the AFR_TA_DOM_NOTIFY upcall release
request is got is completed before the lock is released.

- Any write txns got after the release request are maintained in a ta_waitq.

- After the release is complete, the ta_waitq elements are spliced to a
separate queue which is then processed one by one.

- For fops that come in parallel when the in-memory bad brick is still
unknown, only one is wound to TA on wire. The other ones are maintained
in a ta_onwireq which is then processed after we get the response from
TA.

Change-Id: I32c7b61a61776663601ab0040e2f0767eca1fd64
updates: bz#1648205
Signed-off-by: Ravishankar N 
Signed-off-by: Ashish Pandey

cluster/afr: Use 2 domain locking in SHD for thin-arbiter

2018-11-29T15:33:37+00:00

With this change when SHD starts the index crawl it requests
all the clients to release the AFR_TA_DOM_NOTIFY lock so that
clients will know the in memory state is no more valid and
any new operations needs to query the thin-arbiter if required.

When SHD completes healing all the files without any failure, it
will again take the AFR_TA_DOM_NOTIFY lock and gets the xattrs on
TA to see whether there are any new failures happened by that time.
If there are new failures marked on TA, SHD will start the crawl
immediately to heal those failures as well. If there are no new
failures, then SHD will take the AFR_TA_DOM_MODIFY lock and unsets
the xattrs on TA, so that both the data bricks will be considered
as good there after.

>Change-Id: I037b89a0823648f314580ba0716d877bd5ddb1f1
>fixes: bz#1579788
>Signed-off-by: karthik-us 
(cherry picked from commit 5784a00f997212d34bd52b2303e20c097240d91c)

Change-Id: I037b89a0823648f314580ba0716d877bd5ddb1f1
fixes: bz#1648205

afr: thin-arbiter read txn changes

2018-09-05T08:28:23+00:00

If both data bricks are up, read subvol will be based on read_subvols.

If only one data brick is up:
- First qeury the data-brick that is up. If it blames the other brick,
allow the reads.

- If if doesn't, query the TA to obtain the source of truth.

TODO: See if in-memory state can be maintained for read txns (BZ 1624358).

updates: bz#1579788
Change-Id: I61eec35592af3a1aaf9f90846d9a358b2e4b2fcc
Signed-off-by: Ravishankar N

cluster/afr: Delegate name-heal when possible

2018-09-04T01:52:02+00:00

Problem:
When name-self-heal is triggered on the mount, it blocks
lookup until name-self-heal completes. But that can lead
to hangs when lot of clients are accessing a directory which
needs name heal and all of them trigger heals waiting
for other clients to complete heal.

Fix:
When a name-heal is needed but quorum number of names have the
file and pending xattrs exist on the parent, then better to
delegate the heal to SHD which will be completed as part of
entry-heal of the parent directory. We could also do the same
for quorum-number of names not present but we don't have
any known use-case where this is a frequent occurrence so
not changing that part at the moment. When there is a gfid
mismatch or missing gfid it is important to complete the heal
so that next rename doesn't assume everything is fine and
perform a rename etc

fixes bz#1622821
Change-Id: I8b002c85dffc6eb6f2833e742684a233daefeb2c
Signed-off-by: Pranith Kumar K

cluster/afr: Delegate metadata heal with pending xattrs to SHD

2018-08-28T14:12:55+00:00

Problem:
When metadata-self-heal is triggered on the mount, it blocks
lookup until metadata-self-heal completes. But that can lead
to hangs when lot of clients are accessing a directory which
needs metadata heal and all of them trigger heals waiting
for other clients to complete heal.

Fix:
Only when the heal is needed but the pending xattrs are not set,
trigger metadata heal that could block lookup. This is the only
case where different clients may give different metadata to the
clients without heals, which should be avoided.

Updates bz#1622821
Change-Id: I6089e9fda0770a83fb287941b229c882711f4e66
Signed-off-by: Pranith Kumar K

performance/readdir-ahead: keep stats of cached dentries in sync with modifications

2018-08-18T07:28:53+00:00

PROBLEM:

Stats of dentries that are readdirp'd ahead can become stale due to
fops like writes, truncate etc that modify the file pointed by
dentries. When a readdir is finally wound at offset corresponding to
these entries, the iatts that are returned to the application come
from readdir-ahead's cache, which are stale by now. This problem gets
further aggravated when caching translators/modules cache and continue
to serve this stale information.

FIX:

* Store the iatt in context of the inode pointed by dentry.
* Whenever the inode pointed by dentry undergoes modification, in cbk
  of modification fop, update the iatt stored in inode-ctx to reflect
  the modification.
* When serving a readdirp response from application, update iatts of
  dentries with the iatts stored in the context of inodes pointed by
  these dentries.
* Some fops don't have valid iatts in their responses. For eg., write
  response whose data is still cached in write-behind will have zeroed
  out stat. In this case keep only ia_type and ia_gfid and reset rest
  of the iatt members to zero.
  - fuse-bridge in this case just sends "entry" information back to
    kernel and attr is not sent.
  - gfapi sets entry->inode to NULL and zeroes out the entire stat
* There is one tiny race between the entry creation and a readdirp on
  its parent dir, which could cause the inode-ctx setting and inode
  ctx reading to happen on two different inode objects. To prevent
  this, when entry->inode doesn't eqaul to linked_inode,
  - fuse-bridge is made to send only "entry" information without
    attributes
  - gfapi sets entry->inode to NULL and zeroes out the entire stat.

Change-Id: Ia27ff49a61922e88c73a1547ad8aacc9968a69df
BUG: 1390050
Updates: bz#1390050
Signed-off-by: Krutika Dhananjay 
Signed-off-by: Raghavendra G

tests: Add thin-arbiter.rc for writing tests for thin-arbiter

2018-08-18T04:20:15+00:00

fixes bz#1615789
Change-Id: I1f42e78fec5ddaf2a425dc4b82c9a20472aa146d
Signed-off-by: Pranith Kumar K

tests: Fix for gfid-mismatch-resolution-with-fav-child-policy.t failure

2018-08-14T04:30:25+00:00

This test was retried once on build
https://build.gluster.org/job/regression-on-demand-multiplex/174/
(logs for the first try is not available with this build)
Test case was failing in line #47 where it was was checking for the
heal count to be 0. Line #51 had passed that means file got the gfid
split brain resolved, and both the bricks had same gfids.
At line #54 it again failed which checks for the md5sum on both the
bricks. At this point the md5sum of the brick where the file got
impunged had the md5sum same as the newly created empty file. This
means the data heal has not happened for the file.
At line #64 enabling granular-entry-heal faild, but without the logs
it is not possible to debug this issue.

Change-Id: I56d854dbb9e188cafedfd24a9d463603ae79bd06
fixes: bz#1615331
Signed-off-by: karthik-us

tests: fix replace-brick-self-heal.t failure

2018-08-13T12:10:13+00:00

Please see BZ for details.

Change-Id: Id9273432874bc6a452ac96b2b8c7a61ea6c5b98d
Fixes: bz#1615239
Signed-off-by: Ravishankar N

tests: potential fixes for tests/basic/afr/add-brick-self-heal.t

2018-08-13T04:56:22+00:00

Please see bug description for details.

Change-Id: Ieb6bce6d1d5c4c31f1878dd1a1c3d007d8ff81d5
fixes: bz#1614654
Signed-off-by: Ravishankar N