glusterfs.git/xlators/cluster, branch v7.0

ctime/rebalance: Heal ctime xattr on directory during rebalance

2019-09-16T10:54:21+00:00

After add-brick and rebalance, the ctime xattr is not present
on rebalanced directories on new brick. This patch fixes the
same.

Note that ctime still doesn't support consistent time across
distribute sub-volume.

This patch also fixes the in-memory inconsistency of time attributes
when metadata is self healed.

Backport of:

 > Patch: https://review.gluster.org/23127/
 > Change-Id: Ia20506f1839021bf61d4753191e7dc34b31bb2df
 > BUG: 1734026
 > Signed-off-by: Kotresh HR 
(cherry picked from commit 304640e55c0f3c6d15f4e230dc6376e4f5020fea)

Change-Id: Ia20506f1839021bf61d4753191e7dc34b31bb2df
Signed-off-by: Kotresh HR 
fixes: bz#1752429

afr/lookup: Pass xattr_req in while doing a selfheal in lookup

2019-09-11T05:04:47+00:00

We were not passing xattr_req when doing a name self heal
as well as a meta data heal. Because of this, some xdata
was missing which causes i/o errors

Backport of > https://review.gluster.org/#/c/glusterfs/+/23024/


>Change-Id: Ibfb1205a7eb0195632dc3820116ffbbb8043545f
>Fixes: bz#1728770
>Signed-off-by: Mohammed Rafi KC 

Fixes: bz#1749305
Change-Id: Ibfb1205a7eb0195632dc3820116ffbbb8043545f
Signed-off-by: Mohammed Rafi KC 
(cherry picked from commit d026f0bcfd301712e4f0671ccf238f43f2e6dd30)

[RFC] change get_real_filename implementation to use ENOATTR instead of ENOENT

2019-09-03T13:21:37+00:00

get_real_filename is implemented as a virtual extended attribute to help
Samba implement the case-insensitive but case preserving SMB protocol
more efficiently. It is implemented as a getxattr call on the parent directory
with the virtual key of "get_real_filename:" by looking for a
spelling with different case for the provided file/dir name ()
and returning this correct spelling as a result if the entry is found.
Originally (05aaec645a6262d431486eb5ac7cd702646cfcfb), the
implementation used the ENOENT errno to return the authoritative answer
that  does not exist in any case folding.

Now this implementation is actually a violation or misuse of the defined
API for the getxattr call which returns ENOENT for the case that the dir
that the call is made against does not exist and ENOATTR (or the synonym
ENODATA) for the case that the xattr does not exist.

This was not a problem until the gluster fuse-bridge was changed
to do map ENOENT to ESTALE in 59629f1da9dca670d5dcc6425f7f89b3e96b46bf,
after which we the getxattr call for get_real_filename returned an
ESTALE instead of ENOENT breaking the expectation in Samba.

It is an independent problem that ESTALE should not leak out to user
space but is intended to trigger retries between fuse and gluster.
But nevertheless, the semantics seem to be incorrect here and should
be changed.

This patch changes the implementation of the get_real_filename virtual
xattr to correctly return ENOATTR instead of ENOENT if the file/directory
being looked up is not found.

The Samba glusterfs_fuse vfs module which takes advantage of the
get_real_filename over a fuse mount will receive a corresponding change
to map ENOATTR to ENOENT. Without this change, it will still work
correctly, but the performance optimization for nonexisting files is
lost. On the other hand side, this change removes the distinction
between the old not-implemented case and the implemented case.
So Samba changed to treat ENOATTR like ENOENT will not work correctly
any more against old servers that don't implement get_real_filename.
I.e. existing files will be reported as non-existing

Change-Id: I971b427ab8410636d5d201157d9af70e0d075b67
fixes: bz#1745914
Signed-off-by: Michael Adam 
(cherry picked from commit dc1b87fcfef08c9497b0c02b2410c9d18bbc2dba)

afr: wake up index healer threads

2019-08-30T05:04:56+00:00

...whenever shd is re-enabled after disabling or there is a change in
`cluster.heal-timeout`, without needing to restart shd or waiting for the
current `cluster.heal-timeout` seconds to expire.

See BZ 1743988 for more details.

Change-Id: Ia5ebd7c8e9f5b54cba3199c141fdd1af2f9b9bfe
fixes: bz#1747301
Reported-by: Glen Kiessling 
Signed-off-by: Ravishankar N 
(cherry picked from commit 600ba94183333c4af9b4a09616690994fd528478)

cluster/ec: Update lock->good_mask on parent fop failure

2019-08-22T05:56:58+00:00

When discard/truncate performs write fop, it should do so
after updating lock->good_mask to make sure readv happens
on the correct mask

fixes: bz#1739424
Change-Id: Idfef0bbcca8860d53707094722e6ba3f81c583b7
Signed-off-by: Pranith Kumar K

cluster/ec: Fix reopen flags to avoid misbehavior

2019-08-22T05:56:58+00:00

Problem:
when a file needs to be re-opened O_APPEND and O_EXCL
flags are not filtered in EC.

- O_APPEND should be filtered because EC doesn't send O_APPEND below EC for
open to make sure writes happen on the individual fragments instead of at the
end of the file.

- O_EXCL should be filtered because shd could have created the file so even
when file exists open should succeed

- O_CREAT should be filtered because open happens with gfid as parameter. So
open fop will create just the gfid which will lead to problems.

Fix:
Filter out these two flags in reopen.

Change-Id: Ia280470fcb5188a09caa07bf665a2a94bce23bc4
Fixes: bz#1739426
Signed-off-by: Pranith Kumar K

cluster/ec: Always read from good-mask

2019-08-22T05:56:58+00:00

There are cases where fop->mask may have fop->healing added
and readv shouldn't be wound on fop->healing. To avoid this
always wind readv to lock->good_mask

updates: bz#1739424
Change-Id: I2226ef0229daf5ff315d51e868b980ee48060b87
Signed-off-by: Pranith Kumar K

cluster/ec: fix EIO error for concurrent writes on sparse files

2019-08-22T05:56:58+00:00

EC doesn't allow concurrent writes on overlapping areas, they are
serialized. However non-overlapping writes are serviced in parallel.
When a write is not aligned, EC first needs to read the entire chunk
from disk, apply the modified fragment and write it again.

The problem appears on sparse files because a write to an offset
implicitly creates data on offsets below it (so, in some way, they
are overlapping). For example, if a file is empty and we read 10 bytes
from offset 10, read() will return 0 bytes. Now, if we write one byte
at offset 1M and retry the same read, the system call will return 10
bytes (all containing 0's).

So if we have two writes, the first one at offset 10 and the second one
at offset 1M, EC will send both in parallel because they do not overlap.
However, the first one will try to read missing data from the first chunk
(i.e. offsets 0 to 9) to recombine the entire chunk and do the final write.
This read will happen in parallel with the write to 1M. What could happen
is that half of the bricks process the write before the read, and the
half do the read before the write. Some bricks will return 10 bytes of
data while the otherw will return 0 bytes (because the file on the brick
has not been expanded yet).

When EC tries to recombine the answers from the bricks, it can't, because
it needs more than half consistent answers to recover the data. So this
read fails with EIO error. This error is propagated to the parent write,
which is aborted and EIO is returned to the application.

The issue happened because EC assumed that a write to a given offset
implies that offsets below it exist.

This fix prevents the read of the chunk from bricks if the current size
of the file is smaller than the read chunk offset. This size is
correctly tracked, so this fixes the issue.

Also modifying ec-stripe.t file for Test #13 within it.
In this patch, if a file size is less than the offset we are writing, we
fill zeros in head and tail and do not consider it strip cache miss.
That actually make sense as we know what data that part holds and there is
no need of reading it from bricks.

Change-Id: Ic342e8c35c555b8534109e9314c9a0710b6225d6
Fixes: bz#1739427
Signed-off-by: Xavi Hernandez

cluster/ec: inherit healing from lock when it has info

2019-08-22T05:56:23+00:00

If lock has info, fop should inherit healing mask from it.
Otherwise, fop cannot inherit right healing when changed_flags is zero.

Change-Id: Ife80c9169d2c555024347a20300b0583f7e8a87f
updates: bz#1739424
Signed-off-by: Kinglong Mee

afr: restore timestamp of parent dir during entry-heal

2019-08-21T11:45:24+00:00

Fixes: bz#1741041
Change-Id: I29e338bac62104233a6f80212df8d0fb016affda
Signed-off-by: Ravishankar N 
(cherry picked from commit 8e9c53ebf16705b9a1db2fc486dc24a5cb244ddd)