glusterfs.git/tests/basic/ec, branch v9dev

Posix: Optimize posix code to improve file creation

2020-04-06T10:43:26+00:00

Problem: Before executing a fop in POSIX xlator it builds an internal
         path based on GFID.To validate the path it call's (l)stat
         system call and while .glusterfs is heavily loaded kernel takes
         time to lookup inode and due to that performance drops

Solution: In this patch we followed two ways to improve the performance.
          1) Keep open fd specific to first level directory(gfid[0])
             in .glusterfs, it would force to kernel keep the inodes
             from all those files in cache. In case of memory pressure
             kernel won't uncache first level inodes. We need to open
             256 fd's per brick to access the entry faster.
          2) Use at based call's to access relative path to reduce
             path based lookup time.

Note: To verify the patch we have executed kernel untar 100 times on 6
      different clients after enabling metadata group-cache and some
      other option.We were getting more than 20 percent improvement in
      kenel untar after applying the patch.

Credits: Xavi Hernandez 
Change-Id: I1643e6b01ed669b2bb148d02f4e6a8e08da45343
updates: #891
Signed-off-by: Mohit Agrawal

cluster/ec: Add test for reset-brick command

2020-04-06T10:39:36+00:00

Following tests are done -

1 - After finishing reset-brick all the bricks should be up.
2 - Heal should be completed.
3 - Check number of entries present on brick which was reset.

Change-Id: I9314bed180293a99d400d94bb8cc7ece999da29e
Updates: #1144

cluster/afr: Add afr_seek to fops table

2019-10-14T11:26:56+00:00

fixes: bz#1760189
Change-Id: Iffbf8d6f4c50b8e2de8364658697bdbe96549f5d
Signed-off-by: Pranith Kumar K

cluster/ec: Implement read-mask feature

2019-09-27T03:32:27+00:00

fixes: #725
Change-Id: Iaaefe6f49c8193c476b987b92df6bab3e2f62601
Signed-off-by: Pranith Kumar K

read-ahead/io-cache: turn off by default

2019-09-26T09:33:38+00:00

We've found perf xlators io-cache and read-ahead not adding any
performance improvement. At best read-ahead is redundant due to kernel
read-ahead and at worst io-cache is degrading the performance for
workloads that doesn't involve re-read. Given that VFS already have
both these functionalities, this patch makes these two
translators turned off by default for native fuse mounts.

For non-native fuse mounts like gfapi (NFS-ganesha/samba) we can have
these xlators on by having custom profiles.

Change-Id: Ie7535788909d4c741844473696f001274dc0bb60
Signed-off-by: Raghavendra Gowdappa 
fixes: bz#1676479

cluster/ec: quorum-count implementation

2019-09-08T03:39:52+00:00

fixes: #721
Change-Id: I5333540e3c635ccf441cf1f4696e4c8986e38ea8
Signed-off-by: Pranith Kumar K

cluster/ec: Fail fsync/flush for files on update size/version failure

2019-09-06T07:27:21+00:00

Problem:
If update size/version is not successful on the file, updates on the
same stripe could lead to data corruptions if the earlier un-aligned
write is not successful on all the bricks. Application won't have
any knowledge of this because update size/version happens in the
background.

Fix:
Fail fsync/flush on fds that are opened before update-size-version
went bad.

fixes: bz#1748836
Change-Id: I9d323eddcda703bd27d55f340c4079d76e06e492
Signed-off-by: Pranith Kumar K

cluster/ec: fix EIO error for concurrent writes on sparse files

2019-07-24T10:20:08+00:00

EC doesn't allow concurrent writes on overlapping areas, they are
serialized. However non-overlapping writes are serviced in parallel.
When a write is not aligned, EC first needs to read the entire chunk
from disk, apply the modified fragment and write it again.

The problem appears on sparse files because a write to an offset
implicitly creates data on offsets below it (so, in some way, they
are overlapping). For example, if a file is empty and we read 10 bytes
from offset 10, read() will return 0 bytes. Now, if we write one byte
at offset 1M and retry the same read, the system call will return 10
bytes (all containing 0's).

So if we have two writes, the first one at offset 10 and the second one
at offset 1M, EC will send both in parallel because they do not overlap.
However, the first one will try to read missing data from the first chunk
(i.e. offsets 0 to 9) to recombine the entire chunk and do the final write.
This read will happen in parallel with the write to 1M. What could happen
is that half of the bricks process the write before the read, and the
half do the read before the write. Some bricks will return 10 bytes of
data while the otherw will return 0 bytes (because the file on the brick
has not been expanded yet).

When EC tries to recombine the answers from the bricks, it can't, because
it needs more than half consistent answers to recover the data. So this
read fails with EIO error. This error is propagated to the parent write,
which is aborted and EIO is returned to the application.

The issue happened because EC assumed that a write to a given offset
implies that offsets below it exist.

This fix prevents the read of the chunk from bricks if the current size
of the file is smaller than the read chunk offset. This size is
correctly tracked, so this fixes the issue.

Also modifying ec-stripe.t file for Test #13 within it.
In this patch, if a file size is less than the offset we are writing, we
fill zeros in head and tail and do not consider it strip cache miss.
That actually make sense as we know what data that part holds and there is
no need of reading it from bricks.

Change-Id: Ic342e8c35c555b8534109e9314c9a0710b6225d6
Fixes: bz#1730715
Signed-off-by: Xavi Hernandez

cluster/ec: Prevent double pre-op xattrops

2019-06-22T05:06:15+00:00

Problem:
Race:
Thread-1                                    Thread-2
1) Does ec_get_size_version() to perform
pre-op fxattrop as part of write-1
                                           2) Calls ec_set_dirty_flag() in
                                              ec_get_size_version() for write-2.
					      This sets dirty[] to 1
3) Completes executing
ec_prepare_update_cbk leading to
ctx->dirty[] = '1'
					   4) Takes LOCK(inode->lock) to check if there are
					      any flags and sets dirty-flag because
				              lock->waiting_flag is 0 now. This leads to
					      fxattrop to increment on-disk dirty[] to '2'

At the end of the writes the file will be marked for heal even when it doesn't need heal.

Fix:
Perform ec_set_dirty_flag() and other checks inside LOCK() to prevent dirty[] to be marked
as '1' in step 2) above

Updates bz#1593224
Change-Id: Icac2ab39c0b1e7e154387800fbededc561612865
Signed-off-by: Pranith Kumar K

core: improve timer accuracy

2019-06-17T10:29:50+00:00

Also fixed some issues on test ec-1468261.t.

Change-Id: If156f86af986d9eed13cdd1f15c5a7214cd11706
Updates: bz#1193929
Signed-off-by: Xavier Hernandez