glusterfs.git/xlators/cluster/afr/src, branch v5.10

afr/lookup: Pass xattr_req in while doing a selfheal in lookup

2019-09-05T12:43:45+00:00

We were not passing xattr_req when doing a name self heal
as well as a meta data heal. Because of this, some xdata
was missing which causes i/o errors

Backport of>https://review.gluster.org/#/c/glusterfs/+/23024/
>Change-Id: Ibfb1205a7eb0195632dc3820116ffbbb8043545f
>Fixes: bz#1728770
>Signed-off-by: Mohammed Rafi KC 

Change-Id: I740ccdbfcf2667fe4a850c12ecdc4b9eeed08293
Fixes: bz#1749352
Signed-off-by: Mohammed Rafi KC

cluster/afr: Remove local from owners_list on failure of lock-acquisition

2019-05-08T14:07:12+00:00

When eager-lock lock acquisition fails because of say network failures, the
local is not being removed from owners_list, this leads to accumulation of
waiting frames and the application will hang because the waiting frames are
under the assumption that another transaction is in the process of acquiring
lock because owner-list is not empty. Handled this case as well in this patch.
Added asserts to make it easier to find these problems in future.

fixes bz#1699736
Change-Id: I3101393265e9827755725b1f2d94a93d8709e923
Signed-off-by: Pranith Kumar K

cluster/afr: Send truncate on arbiter brick from SHD

2019-03-12T07:21:26+00:00

Problem:
In an arbiter volume configuration SHD will not send any writes onto the arbiter
brick even if there is data pending marker for the arbiter brick. If we have a
arbiter setup on the geo-rep master and there are data pending markers for the files
on arbiter brick, SHD will not mark any data changelog during healing. While syncing
the data from master to slave, if the arbiter-brick is considered as ACTIVE, then
there is a chance that slave will miss out some data. If the arbiter brick is being
newly added or replaced there is a chance of slave missing all the data during sync.

Fix:
If there is data pending marker for the arbiter brick, send truncate on the arbiter
brick during heal, so that it will record truncate as the data transaction in changelog.

Change-Id: I3242ba6cea6da495c418ef860d9c3359c5459dec
fixes: bz#1687687
Signed-off-by: karthik-us

cluster/thin-arbiter: Consider thin-arbiter before marking new entry changelog

2019-02-18T14:39:55+00:00

If a fop to create an entry fails on one of the data brick,
we mark the pending changelog on the entry on brick for which
it was successful. This is done as part of post op phase to
make sure that entry gets healed even if it gets renamed to
some other path where its parent was not marked as bad.

As it happens as part of post op, we should consider thin-arbiter
to check if the brick, which was successful, is the good brick or not.
This will avoide split brain and other issues.

>Change-Id: I12686675be98f02f70a5186b3ed748c541514d53
>Signed-off-by: Ashish Pandey 

Change-Id: I12686675be98f02f70a5186b3ed748c541514d53
updates: bz#1672314
Signed-off-by: Ashish Pandey

afr: thin-arbiter 2 domain locking and in-memory state

2018-12-12T14:26:42+00:00

2 domain locking + xattrop for write-txn failures:
--------------------------------------------------
- A post-op wound on TA takes AFR_TA_DOM_NOTIFY range lock and
AFR_TA_DOM_MODIFY full lock, does xattrop on TA and releases
AFR_TA_DOM_MODIFY lock and stores in-memory which brick is bad.

- All further write txn failures are handled based on this in-memory
value without querying the TA.

- When shd heals the files, it does so by requesting full lock on
AFR_TA_DOM_NOTIFY domain. Client uses this as a cue (via upcall),
releases AFR_TA_DOM_NOTIFY range lock and invalidates its in-memory
notion of which brick is bad. The next write txn failure is wound on TA
to again update the in-memory state.

- Any incomplete write txns before the AFR_TA_DOM_NOTIFY upcall release
request is got is completed before the lock is released.

- Any write txns got after the release request are maintained in a ta_waitq.

- After the release is complete, the ta_waitq elements are spliced to a
separate queue which is then processed one by one.

- For fops that come in parallel when the in-memory bad brick is still
unknown, only one is wound to TA on wire. The other ones are maintained
in a ta_onwireq which is then processed after we get the response from
TA.

Change-Id: I32c7b61a61776663601ab0040e2f0767eca1fd64
updates: bz#1648205
Signed-off-by: Ravishankar N 
Signed-off-by: Ashish Pandey

afr: assign gfid during name heal when no 'source' is present.

2018-12-12T12:57:59+00:00

Problem:
If parent dir is in split-brain or has dirty xattrs set, and the file
has gfid missing on one of the bricks, then name heal won't assign the
gfid.

Fix:
Use the brick we select the gfid from as the 'source'.

Note: Problem was found while trying to debug a split-brain issue on
Cynthia Zhou's setup.

fixes: bz#1655545
Change-Id: Id088d4f0fb017aa35122de426654194e581ed742
Reported-by: Cynthia Zhou 
Signed-off-by: Ravishankar N 
(cherry picked from commit 4d58730c0cd6ab5db39aec8a15276f7bd3371b04)

afr: open_ftruncate_cbk should read fd from local->cont.open struct

2018-11-29T15:34:44+00:00

afr_open stores the fd as part of its local->cont.open struct
but when it calls ftruncate (if open flags contain O_TRUNC), the
corresponding cbk function (afr_ open_ftruncate_cbk) is
incorrectly referencing uninitialized local->fd. This patch fixes
the same.

Change-Id: Icbdedbd1b8cfea11d8f41b6e5c4cb4b44d989aba
updates: bz#1651322
Signed-off-by: Soumya Koduri 
(cherry picked from commit fda594875c4cdb2a22e27aa13f5c66bee032ccb5)

cluster/afr: Use 2 domain locking in SHD for thin-arbiter

2018-11-29T15:33:37+00:00

With this change when SHD starts the index crawl it requests
all the clients to release the AFR_TA_DOM_NOTIFY lock so that
clients will know the in memory state is no more valid and
any new operations needs to query the thin-arbiter if required.

When SHD completes healing all the files without any failure, it
will again take the AFR_TA_DOM_NOTIFY lock and gets the xattrs on
TA to see whether there are any new failures happened by that time.
If there are new failures marked on TA, SHD will start the crawl
immediately to heal those failures as well. If there are no new
failures, then SHD will take the AFR_TA_DOM_MODIFY lock and unsets
the xattrs on TA, so that both the data bricks will be considered
as good there after.

>Change-Id: I037b89a0823648f314580ba0716d877bd5ddb1f1
>fixes: bz#1579788
>Signed-off-by: karthik-us 
(cherry picked from commit 5784a00f997212d34bd52b2303e20c097240d91c)

Change-Id: I037b89a0823648f314580ba0716d877bd5ddb1f1
fixes: bz#1648205

all: fix the format string exceptions

2018-11-09T14:03:02+00:00

Currently, there are possibilities in few places, where a user-controlled
(like filename, program parameter etc) string can be passed as 'fmt' for
printf(), which can lead to segfault, if the user's string contains '%s',
'%d' in it.

While fixing it, makes sense to make the explicit check for such issues
across the codebase, by making the format call properly.

Fixes: CVE-2018-14661

Fixes: bz#1647666
Change-Id: Ib547293f2d9eb618594cbff0df3b9c800e88bde4
Signed-off-by: Amar Tumballi

cluster/afr : Check for UP bricks before starting heal

2018-11-08T19:05:46+00:00

Problem:
Currently for replica volume, even if only one brick is UP
SHD will keep crawling index entries even if it can not
heal anything.

In thin-arbiter volume which is also a replica 2 volume,
this causes inode lock contention which in turn sends
upcall to all the clients to release notify locks, even
if it can not do anything for healing.

This will slow down the client performance and kills the
purpose of keeping in memory information about bad brick.

Solution: Before starting heal or even crawling, check if
sufficient number of children are UP and available to check
and heal entries.

(cherry picked from commit f73b4476b15f9d6d3dc3c8e20c9742aacd857f9f)

Change-Id: I011c9da3b37cae275f791affd56b8f1c1ac9255d
updates: bz#1644645
Signed-off-by: Ashish Pandey