<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/xlators/cluster/ec/src, branch v4.0.1</title>
<subtitle></subtitle>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/'/>
<entry>
<title>cluster/ec: Change default read policy to gfid-hash</title>
<updated>2018-03-19T08:52:04+00:00</updated>
<author>
<name>Ashish Pandey</name>
<email>aspandey@redhat.com</email>
</author>
<published>2018-03-13T08:33:20+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=a644d52a4fe000827020b28736062a54c9a91b44'/>
<id>a644d52a4fe000827020b28736062a54c9a91b44</id>
<content type='text'>
Problem:
Whenever we read data from a file over NFS, NFS reads
more data than requested and caches it. Based on the
stat information, it checks whether the cached/pre-read
data is still valid.

Consider a 4 + 2 EC volume with all the bricks on
different nodes.

In EC, with the round-robin read policy, reads are sent
to different sets of data bricks. This balances the read
fops across all the bricks and avoids heating up
(overloading) the same set of bricks.

Due to small differences in clock speed, it is possible
to get minor differences in atime, mtime or ctime across
bricks. That can cause a different stat to be returned
to NFS, based on which NFS will discard cached/pre-read
data that has not actually changed and could have been
used.

Solution:
Change the default read policy for EC to gfid-hash. That
forces all reads of a file to go to the same set of
bricks.
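
For reference, the policy can still be changed per volume
through the read-policy option (disperse.read-policy). A
minimal sketch of the gfid-hash idea (illustrative only,
not the exact tree code; the helper name is hypothetical):
the first data brick to read from is derived from the
file's gfid, so reads of a given file always land on the
same set of bricks.

    /* Hypothetical helper: map a 16-byte gfid to the
     * index of the first data brick to read from. */
    static int
    ec_gfid_hash_first(const unsigned char gfid[16],
                       int data_bricks)
    {
        unsigned int hash = 0;
        int i;

        for (i = 0; i &lt; 16; i++)
            hash = hash * 31 + gfid[i];

        return hash % data_bricks;
    }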

&gt;Change-Id: I825441cc519e94bf3dc3aa0bd4cb7c6ae6392c84
&gt;BUG: 1554743
&gt;Signed-off-by: Ashish Pandey &lt;aspandey@redhat.com&gt;

Change-Id: I825441cc519e94bf3dc3aa0bd4cb7c6ae6392c84
BUG: 1557906
Signed-off-by: Ashish Pandey &lt;aspandey@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
Whenever we read data from file over NFS, NFS reads
more data then requested and caches it. Based on the
stat information it makes sure that the cached/pre-read
data is valid or not.

Consider 4 + 2 EC volume and all the bricks are on
differnt nodes.

In EC, with round-robin read policy, reads are sent on
different set of data bricks. This way, it balances the
read fops to go on all the bricks and avoid heating UP
(overloading) same set of bricks.

Due to small difference in clock speed, it is possible
that we get minor difference for atime, mtime or ctime
for different bricks. That might cause a different stat
returned to NFS based on which NFS will discard
cached/pre-read data which is actually not changed and
could be used.

Solution:
Change read policy for EC as gfid-hash. That will force
all the read to go to same set of bricks.

&gt;Change-Id: I825441cc519e94bf3dc3aa0bd4cb7c6ae6392c84
&gt;BUG: 1554743
&gt;Signed-off-by: Ashish Pandey &lt;aspandey@redhat.com&gt;

Change-Id: I825441cc519e94bf3dc3aa0bd4cb7c6ae6392c84
BUG: 1557906
Signed-off-by: Ashish Pandey &lt;aspandey@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/ec: avoid delays in self-heal</title>
<updated>2018-03-15T07:20:10+00:00</updated>
<author>
<name>Xavi Hernandez</name>
<email>jahernan@redhat.com</email>
</author>
<published>2018-02-21T16:47:37+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=8fb21afdd033c5d466854400c6a7604fcf5241c3'/>
<id>8fb21afdd033c5d466854400c6a7604fcf5241c3</id>
<content type='text'>
Self-heal creates a thread per brick to sweep the index looking for
files that need to be healed. These threads are started before the
volume comes online, so they do nothing but wait for the next sweep.
Sweeps happen once per minute.

When a replace brick command is executed, the new graph is loaded and
all index sweeper threads are started. When all bricks have reported,
a getxattr request is sent to the root directory of the volume. This
causes a heal on it (because the new brick doesn't have good data) and
marks its contents as pending heal. That content is only picked up by
the index sweeper threads on the next round, one minute later.

This patch solves this problem by waking all index sweeper threads
after a successful check on the root directory.

Additionally, the index sweeper thread scans the index directory
sequentially, so it can happen that healing a directory entry creates
new index entries that the current scan has already passed over. The
remaining entries are then processed on the next round, one minute
later. The same can happen on that round too, so the heal runs in
bursts and takes a long time to finish, especially on volumes with
many directory levels.

This patch solves this problem by immediately restarting the index
sweep if a directory has been healed.
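
Both fixes amount to signalling the sweeper threads out of their
one-minute wait. A minimal sketch, assuming the sweepers block on a
pthread condition variable (all names here are hypothetical, not the
exact tree code):

    #include &lt;pthread.h&gt;

    struct sweeper {
        pthread_mutex_t lock;
        pthread_cond_t  wake;
        int             rerun; /* set when a heal created new work */
    };

    /* Wake every sweeper so the next sweep starts immediately
     * instead of after the rest of the 60-second interval. */
    static void
    sweeper_wake_all(struct sweeper *s)
    {
        pthread_mutex_lock(&amp;s-&gt;lock);
        s-&gt;rerun = 1;
        pthread_cond_broadcast(&amp;s-&gt;wake);
        pthread_mutex_unlock(&amp;s-&gt;lock);
    }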

Backport of:
&gt; BUG: 1547662

Change-Id: I58d9ab6ef17b30f704dc322e1d3d53b904e5f30e
BUG: 1555198
Signed-off-by: Xavi Hernandez &lt;jahernan@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Self-heal creates a thread per brick to sweep the index looking for
files that need to be healed. These threads are started before the
volume comes online, so nothing is done but waiting for the next
sweep. This happens once per minute.

When a replace brick command is executed, the new graph is loaded and
all index sweeper threads started. When all bricks have reported, a
getxattr request is sent to the root directory of the volume. This
causes a heal on it (because the new brick doesn't have good data),
and marks its contents as pending to be healed. This is done by the
index sweeper thread on the next round, one minute later.

This patch solves this problem by waking all index sweeper threads
after a successful check on the root directory.

Additionally, the index sweep thread scans the index directory
sequentially, but it might happen that after healing a directory entry
more index entries are created but skipped by the current directory
scan. This causes the remaining entries to be processed on the next
round, one minute later. The same can happen in the next round, so
the heal is running in bursts and taking a lot to finish, specially
on volumes with many directory levels.

This patch solves this problem by immediately restarting the index
sweep if a directory has been healed.

Backport of:
&gt; BUG: 1547662

Change-Id: I58d9ab6ef17b30f704dc322e1d3d53b904e5f30e
BUG: 1555198
Signed-off-by: Xavi Hernandez &lt;jahernan@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/ec: Do lock conflict check correctly for wait-list</title>
<updated>2018-02-02T15:04:04+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2018-01-31T16:40:46+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=2ef7410f00805b102689258a4063bb117b74158e'/>
<id>2ef7410f00805b102689258a4063bb117b74158e</id>
<content type='text'>
Problem:
ec_link_has_lock_conflict() traverses only owner_list,
but the function is also called with wait_list.

Fix:
Modify ec_link_has_lock_conflict() to traverse the lists
correctly, and update the callers to reflect the change.
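
A hedged sketch of the corrected traversal (list and field names
are assumed, not copied from the tree): conflicts are checked
against the fops that own the lock, and against the waiting fops
only when the caller asks for it.

    static gf_boolean_t
    ec_link_has_lock_conflict(ec_lock_link_t *link,
                              gf_boolean_t waitlist)
    {
        ec_lock_link_t *trav = NULL;

        /* Fops that currently own the lock. */
        list_for_each_entry(trav, &amp;link-&gt;lock-&gt;owners,
                            owner_list) {
            if (ec_lock_conflict(trav, link)) /* assumed helper */
                return _gf_true;
        }

        if (waitlist) {
            /* Fops queued behind the lock. */
            list_for_each_entry(trav, &amp;link-&gt;lock-&gt;waiting,
                                wait_list) {
                if (ec_lock_conflict(trav, link))
                    return _gf_true;
            }
        }

        return _gf_false;
    }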

BUG: 1540882
Change-Id: Ibd7ea10f4498e7c2761f9a6faac6d5cb7d750c91
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
ec_link_has_lock_conflict() is traversing over only owner_list
but the function is also getting called with wait_list.

Fix:
Modify ec_link_has_lock_conflict() to traverse lists correctly.
Updated the callers to reflect the changes.

BUG: 1540882
Change-Id: Ibd7ea10f4498e7c2761f9a6faac6d5cb7d750c91
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/ec : EC options for GD2</title>
<updated>2018-01-22T08:53:36+00:00</updated>
<author>
<name>Sunil Kumar Acharya</name>
<email>sheggodu@redhat.com</email>
</author>
<published>2017-09-01T09:34:09+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=10d74166f17fa44c06bd1357e0a4b0b052265425'/>
<id>10d74166f17fa44c06bd1357e0a4b0b052265425</id>
<content type='text'>
Updates #302

Change-Id: I31b4648f7b1a394fceece5cba8120c579c66edd9
Signed-off-by: Sunil Kumar Acharya &lt;sheggodu@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Updates #302

Change-Id: I31b4648f7b1a394fceece5cba8120c579c66edd9
Signed-off-by: Sunil Kumar Acharya &lt;sheggodu@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>locks: added inodelk/entrylk contention upcall notifications</title>
<updated>2018-01-16T10:37:22+00:00</updated>
<author>
<name>Xavier Hernandez</name>
<email>xhernandez@datalab.es</email>
</author>
<published>2016-06-15T12:42:19+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=7ba7a4b27d124f4ee16fe4776a4670cd5b0160c4'/>
<id>7ba7a4b27d124f4ee16fe4776a4670cd5b0160c4</id>
<content type='text'>
The locks xlator is now able to send a contention notification to
the current owner of a lock.

This is only a notification. It can be used to improve the
performance of some client-side operations that might benefit from
an extended duration of lock ownership. Nothing is done if the lock
owner decides to ignore the message and not release the lock. For
forced release of acquired resources, leases must be used.
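
A minimal sketch of how a client might consume the notification
(the handler wiring and release_eager_lock() are hypothetical; the
upcall event names are assumed to be the contention events this
patch adds):

    /* Hypothetical client-side upcall handler. */
    static void
    handle_upcall(struct gf_upcall *up)
    {
        switch (up-&gt;event_type) {
        case GF_UPCALL_INODELK_CONTENTION:
        case GF_UPCALL_ENTRYLK_CONTENTION:
            /* Someone else wants the lock: release it early if
             * possible. Ignoring the event is also safe. */
            release_eager_lock(up-&gt;gfid);
            break;
        default:
            break;
        }
    }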

Change-Id: I7f1ad32a0b4b445505b09908a050080ad848f8e0
Signed-off-by: Xavier Hernandez &lt;xhernandez@datalab.es&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The locks xlator now is able to send a contention notification to
the current owner of the lock.

This is only a notification that can be used to improve performance
of some client side operations that might benefit from extended
duration of lock ownership. Nothing is done if the lock owner decides
to ignore the message and to not release the lock. For forced
release of acquired resources, leases must be used.

Change-Id: I7f1ad32a0b4b445505b09908a050080ad848f8e0
Signed-off-by: Xavier Hernandez &lt;xhernandez@datalab.es&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/ec: OpenFD heal implementation for EC</title>
<updated>2018-01-05T06:55:44+00:00</updated>
<author>
<name>Sunil Kumar Acharya</name>
<email>sheggodu@redhat.com</email>
</author>
<published>2017-03-23T07:20:41+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=78d67da17356b48cf1d5a6595764650d5b200ba7'/>
<id>78d67da17356b48cf1d5a6595764650d5b200ba7</id>
<content type='text'>
Existing EC code doesn't try to heal open FDs, which would
avoid unnecessary healing of the data later.

The fix implements healing of open FDs before carrying out
file operations on them, by making an attempt to open the
FDs on the required up nodes.
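
A hedged sketch of the idea (all field and helper names are
assumed): before winding a fop on an fd, open it on any up
brick where it is not open yet, so subsequent writes reach
all good bricks instead of leaving data to be healed later.

    static void
    ec_fix_open_sketch(ec_t *ec, ec_fd_t *ctx, fd_t *fd)
    {
        /* Up bricks where this fd is not yet open. */
        uintptr_t need = ec-&gt;xl_up &amp; ~ctx-&gt;open;

        if (need != 0)
            ec_reopen_fd_on(fd, need); /* assumed helper */
    }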

BUG: 1431955
Change-Id: Ib696f59c41ffd8d5678a484b23a00bb02764ed15
Signed-off-by: Sunil Kumar Acharya &lt;sheggodu@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Existing EC code doesn't try to heal the OpenFD to
avoid unnecessary healing of the data later.

Fix implements the healing of open FDs before
carrying out file operations on them by making an
attempt to open the FDs on required up nodes.

BUG: 1431955
Change-Id: Ib696f59c41ffd8d5678a484b23a00bb02764ed15
Signed-off-by: Sunil Kumar Acharya &lt;sheggodu@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>posix: Introduce flags for validity of iatt members</title>
<updated>2017-12-29T17:13:25+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2017-12-25T13:49:53+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=9fd17c1c3c44944ea280c4c15bad0d49b298b8a9'/>
<id>9fd17c1c3c44944ea280c4c15bad0d49b298b8a9</id>
<content type='text'>
v1 of the patch started off by adding new fields to iatt that could
be filled using statx(), but the discussions were more around
introducing masks to check the validity of the different fields from
a RIO perspective. To that end, I have dropped the statx() call in
this version and introduced a 64-bit mask for the existing fields.
The masks I have defined are similar to the statx() mask flags.

I have *not* changed iatt_to_stat() to use the macros IATT_TYPE_VALID,
IATT_GFID_VALID etc. before blindly copying from struct iatt to
struct stat.

Also fixed warnings in xlators caused by the atime/mtime/ctime
seconds fields changing from uint32_t to int64_t.
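
A minimal sketch of the mask idea (bit values and the field name
are assumed; only the macro names come from the message above):

    #define IATT_TYPE_VALID  0x0000000000000001ULL
    #define IATT_GFID_VALID  0x0000000000000002ULL
    #define IATT_ATIME_VALID 0x0000000000000040ULL /* assumed bit */

    /* Consumers test the bit before trusting the field. */
    static gf_boolean_t
    iatt_atime_is_valid(struct iatt *buf)
    {
        return (buf-&gt;ia_flags &amp; IATT_ATIME_VALID)
               ? _gf_true : _gf_false;
    }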

Change-Id: I4ac614f1e8d5c8246fc99d5bc2d2a23e7941512b
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
v1 of the patch started off as adding new fields to iatt that can be
filled up using statx but the discussions were more around introducing
masks to check the validity of different fields from a RIO perspective.
To that extent, I have dropped the statx call in this version and
introduced a 64 bit mask for existing fields. The masks I have defined
are similar with the statx() flags' masks.

I have *not* changed iatt_to_stat() to use the macros IATT_TYPE_VALID,
IATT_GFID_VALID etc before blindly copying from struct iatt to struct.

Also fixed warnings in xlators because of atime/mtime/ctime seconds
field change from uint32_t to int64_t.

Change-Id: I4ac614f1e8d5c8246fc99d5bc2d2a23e7941512b
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/ec: Fix possible shift overflow</title>
<updated>2017-12-22T16:19:53+00:00</updated>
<author>
<name>Xavier Hernandez</name>
<email>jahernan@redhat.com</email>
</author>
<published>2017-12-13T16:27:42+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=41120aa8bab4ca4496bb37b8986434be404ae255'/>
<id>41120aa8bab4ca4496bb37b8986434be404ae255</id>
<content type='text'>
A Coverity scan has revealed a potential shift overflow while
scanning the bitmap of available subvolumes. The actual overflow
cannot happen, but I've changed the test used to control the limit
to make it explicit.
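
An illustrative sketch of the pattern (not the exact tree code):
bounding the loop by the index instead of comparing against a
shifted value keeps the shift amount provably below the width of
the type.

    #include &lt;stdint.h&gt;

    /* Return the index of the first set bit, or -1. Writing the
     * bound as "idx &lt; 64" makes it explicit that "map &gt;&gt; idx"
     * never shifts by the full width of uint64_t. */
    static int
    ec_first_set_bit(uint64_t map, int count)
    {
        int idx;

        for (idx = 0; idx &lt; count &amp;&amp; idx &lt; 64; idx++) {
            if ((map &gt;&gt; idx) &amp; 1)
                return idx;
        }

        return -1;
    }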

Change-Id: Ieb55f010bbca68a1d86a93e47822f7c709a26e83
BUG: 789278
Signed-off-by: Xavier Hernandez &lt;jahernan@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A coverity scan has revelaed a potential shift overflow while scanning
the bitmap of available subvolumes. The actual overflow cannot happen,
but I've changed to test used to control the limit to make it explicit.

Change-Id: Ieb55f010bbca68a1d86a93e47822f7c709a26e83
BUG: 789278
Signed-off-by: Xavier Hernandez &lt;jahernan@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/ec: Change [f]getxattr to parallel-dispatch-one</title>
<updated>2017-12-22T09:25:29+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2017-12-06T02:29:53+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=c96a1338fe8139d07a0aa1bc40f0843d033f0324'/>
<id>c96a1338fe8139d07a0aa1bc40f0843d033f0324</id>
<content type='text'>
At the moment in EC, [f]getxattr operations wait to acquire a lock
while other operations are in progress, even when the request comes
from the same mount that already holds a lock on the file/directory.
This happens because [f]getxattr operations follow the model where
the operation is wound on 'k' of the bricks and the answers are
matched to make sure the data returned is the same on all of them.
This consistency check requires that no other operations are ongoing
while [f]getxattr operations are wound to the bricks. We can perform
[f]getxattr in another way as well: find the good_mask from the lock
that is already granted, wind the operation on any one of the good
bricks, and unwind the answer to the parent xlator after adjusting
size/blocks. Since we take good_mask into account, the reply we get
will be from either before or after any ongoing operation. With this
method, the operation doesn't need to wait for the completion of
ongoing operations, which could take a long time (e.g., when disks
are slow and writes are in progress). Thus we reduce the time taken
to serve [f]getxattr requests.

I changed [f]getxattr to dispatch-one and added extra logic in
ec_link_has_lock_conflict() so that fops with EC_MINIMUM_ONE as
fop-&gt;minimum do not cause any conflicts, achieving the effect
described above. Also modified the test scripts to make sure a READ
fop is received in EC to trigger heals.
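
A hedged sketch of the dispatch-one selection (the helper name is
hypothetical): pick one brick known to be good under the currently
granted lock and wind the fop only there.

    /* Return the index of the first good brick, or -1. */
    static int
    ec_dispatch_one_pick(uintptr_t good_mask, int bricks)
    {
        int idx;

        for (idx = 0; idx &lt; bricks; idx++) {
            if ((good_mask &gt;&gt; idx) &amp; 1)
                return idx;
        }

        return -1;
    }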

Updates gluster/glusterfs#368
Change-Id: I3b4ebf89181c336b7b8d5471b0454f016cdaf296
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
At the moment in EC, [f]getxattr operations wait to acquire a lock
while other operations are in progress even when it is in the same mount with a
lock on the file/directory. This happens because [f]getxattr operations
follow the model where the operation is wound on 'k' of the bricks and are
matched to make sure the data returned is same on all of them. This consistency
check requires that no other operations are on-going while [f]getxattr
operations are wound to the bricks. We can perform [f]getxattr in
another way as well, where we find the good_mask from the lock that is already
granted and wind the operation on any one of the good bricks and unwind the
answer after adjusting size/blocks to the parent xlator. Since we are taking
into account good_mask, the reply we get will either be before or after a
possible on-going operation. Using this method, the operation doesn't need to
depend on completion of on-going operations which could be taking long time (In
case of some slow disks and writes are in progress etc). Thus we reduce the
time to serve [f]getxattr requests.

I changed [f]getxattr to dispatch-one and added extra logic in
ec_link_has_lock_conflict() to not have any conflicts for fops with
EC_MINIMUM_ONE as fop-&gt;minimum to achieve the effect described above.
Modified scripts to make sure READ fop is received in EC to trigger heals.

Updates gluster/glusterfs#368
Change-Id: I3b4ebf89181c336b7b8d5471b0454f016cdaf296
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/ec: Add default value for the redundancy option</title>
<updated>2017-12-21T07:04:32+00:00</updated>
<author>
<name>Kamal Mohanan</name>
<email>kmohanan@redhat.com</email>
</author>
<published>2017-12-12T11:47:37+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=8564311e8bb5971823c6b32aa5c9620d80c7cabd'/>
<id>8564311e8bb5971823c6b32aa5c9620d80c7cabd</id>
<content type='text'>
Updates #302

Change-Id: Ifccc6a64c2b2a3b3a0133e6c8042cdf61a0838ce
Signed-off-by: Kamal Mohanan &lt;kmohanan@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Updates #302

Change-Id: Ifccc6a64c2b2a3b3a0133e6c8042cdf61a0838ce
Signed-off-by: Kamal Mohanan &lt;kmohanan@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
