glusterfs.git/xlators/cluster/ec/src/ec-inode-write.c, branch v4.1.6

cluster/ec: Fix bugs in stripe-cache feature

2017-12-05T15:46:53+00:00

1 - This patch fixes a bug in ec_update_stripe()
that prevented some stripes from being updated after a write.

2 - This patch also modifies the code for the case in
which a file does not exist and we write with an
unaligned offset and user size: the last stripe, on
which "end" falls, should also be cached.

Change-Id: I069cb4be1c8d59c206e3b35a6991e1fbdbc9b474
BUG: 1520758
Signed-off-by: Ashish Pandey

cluster/ec: EC DISCARD doesn't punch hole properly

2017-11-28T09:34:33+00:00

Problem:
In some cases, a DISCARD operation on an EC volume punched
a hole smaller than the specified size.

Solution:
EC was not handling punch hole for tail part in some cases.
Updated the code to handle it appropriately.
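
As an illustration only (this is not the code from the patch), the
sketch below splits a DISCARD range into an unaligned head, a
stripe-aligned middle and an unaligned tail. The struct and function
names are hypothetical, the stripe size is passed in by the caller,
and overflow of offset + len is not handled. Forgetting the tail part
is the kind of mistake that yields a hole smaller than requested.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of how a discard range is split. */
struct discard_split {
    uint64_t head_len; /* unaligned bytes before the first full stripe */
    uint64_t mid_off;  /* start of the stripe-aligned middle */
    uint64_t mid_len;  /* length of the middle, a multiple of the stripe */
    uint64_t tail_len; /* unaligned bytes after the last full stripe */
};

static struct discard_split
split_discard(uint64_t stripe, uint64_t offset, uint64_t len)
{
    assert(stripe > 0);

    struct discard_split s = {0, 0, 0, 0};
    uint64_t end = offset + len;
    uint64_t first = (offset + stripe - 1) / stripe * stripe; /* round up */
    uint64_t last = end / stripe * stripe;                    /* round down */

    if (first > last) {
        /* The whole range lies inside one stripe: no aligned middle. */
        s.head_len = len;
        s.mid_off = end;
        return s;
    }

    s.head_len = first - offset;
    s.mid_off = first;
    s.mid_len = last - first;
    s.tail_len = end - last; /* must not be dropped */
    return s;
}
```

With a 2048-byte stripe, discarding 4096 bytes at offset 2560 gives a
1536-byte head, a 2048-byte middle starting at 4096 and a 512-byte
tail.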

BUG: 1516206
Change-Id: If3e69e417c3e5034afee04e78f5f78855e65f932
Signed-off-by: Sunil Kumar Acharya

cluster/ec: Keep last written strip in in-memory cache

2017-11-10T22:15:37+00:00

Problem:
Consider an EC volume with a 4 + 2 configuration.
The stripe size for this would be 512 * 4 = 2048.
That means 2048 bytes of user data are stored in one
stripe. Let's say 2048 + 512 = 2560 bytes are
already written on this volume, so 512 bytes are
in the second stripe. Now, if there are sequential
writes at offset 2560 with a size of 1 byte, we have
to read the whole stripe, encode it with the 1 byte and
then write it back. The next write, at
offset 2561 with a size of 1 byte, will again
READ-MODIFY-WRITE the whole stripe. This causes
bad performance because of the many READ requests
travelling over the network.

Some tools and scenarios generate this kind of load
without users being aware of it.
Examples: fio and zip

Solution:
One possible solution to this issue is to keep the
last stripe in memory. This way, we do not need to
read it again and we save a READ fop going over the
network. In the above example, we have to keep at
most the last 2048 bytes in memory per file.
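
The idea can be sketched as follows. This is a minimal illustration
under the 4 + 2 example above, not the data structure used by the
patch: struct stripe_cache, cache_lookup() and cache_store() are
hypothetical names, and only a single stripe per file is kept.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRAGMENT_SIZE 512u                          /* bytes per fragment */
#define DATA_BRICKS   4u                            /* the "4" in 4 + 2 */
#define STRIPE_SIZE   (FRAGMENT_SIZE * DATA_BRICKS) /* 2048 bytes */

/* Hypothetical per-file cache holding the last written stripe. */
struct stripe_cache {
    int      valid;  /* non-zero when data[] holds a stripe */
    uint64_t offset; /* stripe-aligned file offset of data[] */
    uint8_t  data[STRIPE_SIZE];
};

/* Offset of the stripe that contains the given file offset. */
static uint64_t stripe_start(uint64_t offset)
{
    return offset - (offset % STRIPE_SIZE);
}

/* On a hit, copy the stripe into out and return 1; return 0 on a miss. */
static int cache_lookup(const struct stripe_cache *c, uint64_t offset,
                        uint8_t *out)
{
    assert(c != NULL && out != NULL);
    if (!c->valid || c->offset != stripe_start(offset))
        return 0;
    memcpy(out, c->data, STRIPE_SIZE);
    return 1;
}

/* Remember the stripe that has just been written. */
static void cache_store(struct stripe_cache *c, uint64_t offset,
                        const uint8_t *stripe)
{
    assert(c != NULL && stripe != NULL);
    c->offset = stripe_start(offset);
    memcpy(c->data, stripe, STRIPE_SIZE);
    c->valid = 1;
}
```

After the 1-byte write at offset 2560 stores its stripe, the write at
offset 2561 maps to the same stripe start (2048), so the lookup hits
and no READ has to go over the network.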

Change-Id: I3f95e6fc3ff81953646d374c445a40c6886b0b85
BUG: 1471753
Signed-off-by: Ashish Pandey

cluster/ec: Implement DISCARD FOP for EC

2017-10-25T11:52:41+00:00

Updates #254

This code change implements DISCARD FOP support for
EC.

BUG: 1461018
Change-Id: I09a9cb2aa9d91ec27add4f422dc9074af5b8b2db
Signed-off-by: Sunil Kumar Acharya

cluster/ec: Allow parallel writes in EC if possible

2017-10-24T09:30:25+00:00

Problem:
At the moment, EC sends one modification fop after another, so if some of
the disks become slow for a while, the wait time for the writes that
are waiting in the queue becomes really bad.

Fix:
Allow parallel writes when possible. For this we need to make 3 changes.
1) Each fop now carries range parameters describing the region it will update.
2) Xattrop is changed to handle parallel xattrop requests where some
   would be modifying just dirty xattr.
3) Fops that refer to size now take locks and update the locks.

Fixes #251
Change-Id: Ibc3c15372f91bbd6fb617f0d99399b3149fa64b2
Signed-off-by: Pranith Kumar K

cluster/ec: add functions for stripe alignment

2017-10-13T08:17:27+00:00

This patch removes old functions to align offsets and sizes
to stripe size boundaries and adds new ones to offer more
possibilities.

The new functions are:

 * ec_adjust_offset_down()
     Aligns a given offset to a multiple of the stripe size
     equal or smaller than the initial one. It returns the
     size of the gap between the aligned offset and the given
     one.

 * ec_adjust_offset_up()
     Aligns a given offset to a multiple of the stripe size
     equal or greater than the initial one. It returns the
     size of the skipped region between the given offset and
     the aligned one. If an overflow happens, the returned
     value has negative sign (but correct value) and the
     offset is set to the maximum value (not aligned).

 * ec_adjust_size_down()
     Aligns the given size to a multiple of the stripe size
     equal or smaller than the initial one. It returns the
     size of the missed region between the aligned size and
     the given one.

 * ec_adjust_size_up()
     Aligns the given size to a multiple of the stripe size
     equal or greater than the initial one. It returns the
     size of the gap between the given size and the aligned
     one. If an overflow happens, the returned value has
     negative sign (but correct value) and the size is set
     to the maximum value (not aligned).

These functions have been defined in ec-helpers.h as static
inline since they are very small and compilers can optimize
them (especially the 'scale' argument).

Change-Id: I4c91009ad02f76c73772034dfde27ee1c78a80d7
Signed-off-by: Xavier Hernandez

ec/cluster: Update failure of fop on a brick properly

2017-07-31T11:55:27+00:00

Problem:
In case of truncate, if writev or open fails on a brick,
in some cases the failure is not marked on lock->good_mask.
This causes the size and version to be updated on all the
bricks even if the fop has failed on one of them, which
ultimately causes data corruption.

Solution:
In callback of such writev and open calls, mark fop->good
for parent too.
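
The effect of the fix can be illustrated with a bitmask sketch, where
each bit stands for one brick and a cleared bit means the operation
failed there. The struct op type and mark_brick_failed() are
hypothetical and only model the idea of propagating a failure from a
child fop to its parent; they are not the EC structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fop with a per-brick "good" mask and an optional parent. */
struct op {
    uint64_t   good;   /* bit i set: brick i is still good */
    struct op *parent; /* NULL for a top-level fop */
};

/* Clear the brick's bit on the fop and on every ancestor. */
static void mark_brick_failed(struct op *op, unsigned brick)
{
    assert(op != NULL && brick < 64);
    for (; op != NULL; op = op->parent)
        op->good &= ~(UINT64_C(1) << brick);
}
```

If only the child's mask were updated, the parent would still see all
bricks as good and would go on to update size and version on the
brick where the write failed.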

Thanks to Pranith Kumar K for finding the
root cause.

Change-Id: I8a1da2888bff53b91a0d362b8c44fcdf658e7466
BUG: 1476205
Signed-off-by: Ashish Pandey 
Reviewed-on: https://review.gluster.org/17906
Reviewed-by: Pranith Kumar Karampuri 
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System

cluster/ec: Implement FALLOCATE FOP for EC

2017-05-23T07:13:06+00:00

The FALLOCATE file operation is not implemented in the
existing EC code. This change set implements it
for EC.

BUG: 1448293
Change-Id: Id9ed914db984c327c16878a5b2304a0ea461b623
Signed-off-by: Sunil Kumar Acharya 
Reviewed-on: https://review.gluster.org/15200
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Pranith Kumar Karampuri 
CentOS-regression: Gluster Build System

cluster/ec: Check xdata to avoid memory leak

2016-12-02T16:24:32+00:00

Problem: ec_writev_start calls ec_make_internal_fop_xdata
to set "yes" in xdata before ec_readv (an internal fop)
is called for head and tail. The second call to this function
overwrites the dict_t previously allocated in "xdata",
which results in a memory leak.

Solution: In ec_make_internal_fop_xdata, check whether *xdata
is NULL to avoid overwriting it.
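
The pattern behind the fix can be sketched as below. This is only an
illustration: struct fake_dict stands in for dict_t and
make_internal_xdata() is a hypothetical helper, not the real
ec_make_internal_fop_xdata(). The allocation happens only when the
caller does not already hold a dictionary, so a second call reuses
the first one instead of leaking it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a dictionary object (not dict_t). */
struct fake_dict {
    int internal_fop; /* models the "yes" flag stored in xdata */
};

/* Returns 0 on success, -1 if the allocation fails. */
static int make_internal_xdata(struct fake_dict **xdata)
{
    assert(xdata != NULL);

    if (*xdata == NULL) {
        /* Allocate only when nothing has been allocated before. */
        *xdata = calloc(1, sizeof(**xdata));
        if (*xdata == NULL)
            return -1;
    }

    (*xdata)->internal_fop = 1;
    return 0;
}
```

Calling the helper twice with the same pointer, as happens for the
head and tail reads, leaves the pointer unchanged after the second
call.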

Change-Id: I49b83923e11aff9b92d002e86424c0c2e1f5f74f
BUG: 1400818
Signed-off-by: Ashish Pandey 
Reviewed-on: http://review.gluster.org/16007
Reviewed-by: Xavier Hernandez 
Reviewed-by: Pranith Kumar Karampuri 
Tested-by: Pranith Kumar Karampuri 
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System

cluster/ec: Add support for hardware acceleration

2016-09-08T17:08:25+00:00

This patch implements fast encoding/decoding using hardware
support. Currently, optimized code for x86_64, SSE and AVX is
added.

Additionally, this patch implements a caching mechanism for inverse
matrices to reduce computation time, as well as a new method for
computing the inverse that takes quadratic time instead of cubic.

Finally some unnecessary memory copies have been eliminated to
further increase performance.

Change-Id: I26c75f26fb4201bd22b51335448ea4357235065a
BUG: 1289922
Signed-off-by: Xavier Hernandez 
Reviewed-on: http://review.gluster.org/12837
Tested-by: Pranith Kumar Karampuri 
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri