glusterfs.git/xlators/cluster/ec/src, branch v8dev

cluster/ec: Prevent double pre-op xattrops

2019-06-22T05:06:15+00:00

Problem:
Race:
Thread-1                                    Thread-2
1) Does ec_get_size_version() to perform
pre-op fxattrop as part of write-1
                                           2) Calls ec_set_dirty_flag() in
                                              ec_get_size_version() for write-2.
					      This sets dirty[] to 1
3) Completes executing
ec_prepare_update_cbk leading to
ctx->dirty[] = '1'
					   4) Takes LOCK(inode->lock) to check if there are
					      any flags and sets dirty-flag because
				              lock->waiting_flag is 0 now. This leads to
					      fxattrop to increment on-disk dirty[] to '2'

At the end of the writes the file will be marked for heal even when it doesn't need heal.

Fix:
Perform ec_set_dirty_flag() and other checks inside LOCK() to prevent dirty[] to be marked
as '1' in step 2) above

Updates bz#1593224
Change-Id: Icac2ab39c0b1e7e154387800fbededc561612865
Signed-off-by: Pranith Kumar K

ec-heal: check file's gfid when deleting stale name

2019-06-20T21:24:17+00:00

A name-less lookup does not contain parent's stat,
It is hard to check the lookuped file is at the right path.

This patch changes to a name lookup, and check file's gfid with
expected gfid. If the gfid is different, mark it estale.

fixes: bz#1702131
Change-Id: I2de20b10d680eed1e2fb1d3830b3b3dec4520dbf
Signed-off-by: Kinglong Mee

multiple files: another attempt to remove includes

2019-06-14T16:50:32+00:00

There are many include statements that are not needed.
A previous more ambitious attempt failed because of *BSD plafrom
(see https://review.gluster.org/#/c/glusterfs/+/21929/ )

Now trying a more conservative reduction.
It does not solve all circular deps that we have, but it
does reduce some of them. There is just too much to handle
reasonably (dht-common.h includes dht-lock.h which includes
dht-common.h ...), but it does reduce the overall number of lines
of include we need to look at in the future to understand and fix
the mess later one.

Change-Id: I550cd001bdefb8be0fe67632f783c0ef6bee3f9f
updates: bz#1193929
Signed-off-by: Yaniv Kaul

ec/fini: Fix race between xlator cleanup and on going async fop

2019-06-08T12:20:10+00:00

Problem:
While we process a cleanup, there is a chance for a race between
async operations, for example ec_launch_replace_heal. So this can
lead to invalid mem access.

Solution:
Just like we track on going heal fops, we can also track fops like
ec_launch_replace_heal, so that we can decide when to send a
PARENT_DOWN request.

Change-Id: I055391c5c6c34d58aef7336847f3b570cb831298
fixes: bz#1703948
Signed-off-by: Mohammed Rafi KC

cluster/ec: honor contention notifications for partially acquired locks

2019-05-25T04:52:34+00:00

EC was ignoring lock contention notifications received while a lock was
being acquired. When a lock is partially acquired (some bricks have
granted the lock but some others not yet) we can receive notifications
from acquired bricks, which should be honored, since we may not receive
more notifications after that.

Since EC was ignoring them, once the lock was acquired, it was not
released until the eager-lock timeout, causing unnecessary delays on
other clients.

This fix takes into consideration the notifications received before
having completed the full lock acquisition. After that, the lock will
be releaed as soon as possible.

Fixes: bz#1708156
Change-Id: I2a306dbdb29fb557dcab7788a258bd75d826cc12
Signed-off-by: Xavi Hernandez

ec/fini: Fix race with ec_fini and ec_notify

2019-05-21T12:57:39+00:00

During a graph cleanup, we first sent a PARENT_DOWN and wait for
a child down to ultimately free the xlator and the graph.

In the ec xlator, we cleanup the threads when we get a PARENT_DOWN event.
But a racing event like CHILD_UP or event xl_op may trigger healing threads
after threads cleanup.

So there is a chance that the threads might access a freed private variabe

Change-Id: I252d10181bb67b95900c903d479de707a8489532
fixes: bz#1703948
Signed-off-by: Mohammed Rafi KC

ec/shd: Cleanup self heal daemon resources during ec fini

2019-05-13T06:18:14+00:00

We were not properly cleaning self-heal daemon resources
during ec fini. With shd multiplexing, it is absolutely
necessary to cleanup all the resources during ec fini.

Change-Id: Iae4f1bce7d8c2e1da51ac568700a51088f3cc7f2
fixes: bz#1703948
Signed-off-by: Mohammed Rafi KC

cluster/ec: fix shd healer wait timeout

2019-05-06T10:56:13+00:00

Right now, the timeout is written by hard code,
fix it by using heal-timeout.

fixes: bz#1703020
Change-Id: I0d154e7807f9dba7efc3896805559bbfaa7af2ad
Signed-off-by: Kinglong Mee

cluster/ec: Reopen shouldn't happen with O_TRUNC

2019-05-05T16:16:47+00:00

Problem:
Doing re-open with O_TRUNC will truncate the fragment even when it is not
needed needing extra heals

Fix:
At the time of re-open don't use O_TRUNC.

fixes bz#1706603
Change-Id: Idc6408968efaad897b95a5a52481c66e843d3fb8
Signed-off-by: Pranith Kumar K

cluster/ec: fix fd reopen

2019-04-23T11:28:58+00:00

Currently EC tries to reopen fd's that have been opened while a brick
was down. This is done as part of regular write operations, just after
having acquired the locks, and it's sent as a sub-fop of the main write
fop.

There were two problems:

1. The reopen was attempted on all UP bricks, even if a previous lock
didn't succeed. This is incorrect because most probably the open will
fail.

2. If reopen is sent and fails, the error is propagated to the main
operation, causing it to fail when it shouldn't.

To fix this, we only attempt reopens on bricks where the current fop
owns a lock, and we prevent any error to be propagated to the main
fop.

To implement this behaviour an argument used to indicate the minimum
number of required answers has overloaded to also include some flags. To
make the change consistent, it has been necessary to rename the
argument, which means that a lot of files have been changed. However
there are no functional changes.

This change has also uncovered a problem in discard code, which didn't
correctely process requests of small sizes because no real discard fop
was being processed, only a write of 0's on some region. In this case
some fields of the fop remained uninitialized or with incorrect values.
To fix this, a new function has been created to simulate success on a
fop and it's used in the discard case.

Thanks to Pranith for providing a test script that has also detected an
issue in this patch. This patch includes a small modification of this
script to force data to be written into bricks before stopping them.

Change-Id: If272343873369186c2fb8f43c1d9c52c3ea304ec
Fixes: bz#1699866
Signed-off-by: Xavi Hernandez