glusterfs.git/xlators/cluster/ec/src/ec-helpers.c, branch v4.0.1

cluster/ec: OpenFD heal implementation for EC

2018-01-05T06:55:44+00:00

Existing EC code doesn't try to heal the OpenFD to
avoid unnecessary healing of the data later.

Fix implements the healing of open FDs before
carrying out file operations on them by making an
attempt to open the FDs on required up nodes.

BUG: 1431955
Change-Id: Ib696f59c41ffd8d5678a484b23a00bb02764ed15
Signed-off-by: Sunil Kumar Acharya

cluster/ec: Keep last written strip in in-memory cache

2017-11-10T22:15:37+00:00

Problem:
Consider an EC volume with configuration  4 + 2.
The stripe size for this would be 512 * 4 = 2048.
That means, 2048 bytes of user data stored in one
stripe. Let's say 2048 + 512 = 2560 bytes are
already written on this volume. 512 Bytes would
be in second stripe. Now, if there are sequential
writes with offset 2560 and of size 1 Byte, we have
to read the whole stripe, encode it with 1 Byte and
then again have to write it back. Next, write with
offset 2561 and size of 1 Byte will again
READ-MODIFY-WRITE the whole stripe. This is causing
bad performance because of lots of READ request
travelling over the network.

There are some tools and scenario's where such kind
of load is coming and users are not aware of that.
Example: fio and zip

Solution:
One possible solution to deal with this issue is to
keep last stripe in memory. This way, we need not to
read it again and we can save READ fop going over the
network. Considering the above example, we have to
keep last 2048 bytes (maximum) in memory per file.

Change-Id: I3f95e6fc3ff81953646d374c445a40c6886b0b85
BUG: 1471753
Signed-off-by: Ashish Pandey

cluster/ec: add functions for stripe alignment

2017-10-13T08:17:27+00:00

This patch removes old functions to align offsets and sizes
to stripe size boundaries and adds new ones to offer more
possibilities.

The new functions are:

 * ec_adjust_offset_down()
     Aligns a given offset to a multiple of the stripe size
     equal or smaller than the initial one. It returns the
     size of the gap between the aligned offset and the given
     one.

 * ec_adjust_offset_up()
     Aligns a given offset to a multiple of the stripe size
     equal or greater than the initial one. It returns the
     size of the skipped region between the given offset and
     the aligned one. If an overflow happens, the returned
     valid has negative sign (but correct value) and the
     offset is set to the maximum value (not aligned).

 * ec_adjust_size_down()
     Aligns the given size to a multiple of the stripe size
     equal or smaller than the initial one. It returns the
     size of the missed region between the aligned size and
     the given one.

 * ec_adjust_size_up()
     Aligns the given size to a multiple of the stripe size
     equal or greater than the initial one. It returns the
     size of the gap between the given size and the aligned
     one. If an overflow happens, the returned value has
     negative sign (but correct value) and the size is set
     to the maximum value (not aligned).

These functions have been defined in ec-helpers.h as static
inline since they are very small and compilers can optimize
them (specially the 'scale' argument).

Change-Id: I4c91009ad02f76c73772034dfde27ee1c78a80d7
Signed-off-by: Xavier Hernandez

cluster/ec: FORWARD_NULL coverity fix

2017-09-29T15:09:36+00:00

Problem: ctx pointer could be NULL

Solution: Updated the code to verify ctx pointer

BUG: 789278
Change-Id: I25e07a07c6ebe2f630c99ba3aa9a61656fbaa981
Signed-off-by: Akarsha Rai

core: fix spelling errors

2017-06-02T11:50:43+00:00

fixes for various minor spelling errors and typos

Reported-by: Patrick Matthäi 
Change-Id: Ic1be36f82e3d822bbdc9559878bd79520fc0fcd5
BUG: 1457808
Signed-off-by: Kaleb S. KEITHLEY 
Reviewed-on: https://review.gluster.org/17442
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Niels de Vos 
Smoke: Gluster Build System

build: clang has __builtin_popcount() and __builtin_ffs()

2017-04-05T06:12:53+00:00

Note: Even though gcc(1) will automatically treat ffs() and popcount()
as built-in, calling them explicitly as __builtin in the source helps
make it easier to find them. (And if no __builtin_ffs use the one in
libc.)

Change-Id: Ib74d9b221ff03a01df5ad05907024da1a83a7a88
BUG: 1438772
Signed-off-by: Kaleb S. KEITHLEY 
Reviewed-on: https://review.gluster.org/16993
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Jeff Darcy 
Reviewed-by: Xavier Hernandez

cluster/ec: Don't trigger data/metadata heal on Lookups

2017-02-27T03:06:55+00:00

Problem-1
If Lookup which doesn't take any locks observes version mismatch it can't be
trusted. If we launch a heal based on this information it will lead to
self-heals which will affect I/O performance in the cases where Lookup is
wrong. Considering self-heal-daemon and operations on the inode from client
which take locks can still trigger heal we can choose to not attempt a heal on
Lookup.

Problem-2:
Fixed spurious failure of
tests/bitrot/bug-1373520.t
For the issues above, what was happening was that ec_heal_inspect()
is preventing 'name' heal to happen

Problem-3:
tests/basic/ec/ec-background-heals.t
To be honest I don't know what the problem was, while fixing
the 2 problems above, I made some changes to ec_heal_inspect() and
ec_need_heal() after which when I tried to recreate the spurious
failure it just didn't happen even after a long time.

BUG: 1414287
Signed-off-by: Pranith Kumar K 
Change-Id: Ife2535e1d0b267712973673f6d474e288f3c6834
Reviewed-on: https://review.gluster.org/16468
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Xavier Hernandez 
CentOS-regression: Gluster Build System 
Reviewed-by: Ashish Pandey

cluster/ec: Fix lk-owner set race in ec_unlock

2016-12-13T15:24:51+00:00

Problem:
Rename does two locks. There is a case where when it tries to unlock it sends
xattrop of the directory with new version, callback of these two xattrops can
be picked up by two separate epoll threads. Both of them will try to set the
lk-owner for unlock in parallel on the same frame so one of these unlocks will
fail because the lk-owner doesn't match.

Fix:
Specify the lk-owner which will be set on inodelk frame which will not be over
written by any other thread/operation.

BUG: 1402710
Change-Id: I666ffc931440dc5253d72df666efe0ef1d73f99a
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/16074
Reviewed-by: Xavier Hernandez 
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System

cluster/ec: Add support for hardware acceleration

2016-09-08T17:08:25+00:00

This patch implements functionalities for fast encoding/decoding
using hardware support. Currently optimized x86_64, SSE and AVX is
added.

Additionally this patch implements a caching mecanism for inverse
matrices to reduce computation time, as well as a new method for
computing the inverse that takes quadratic time instead of cubic.

Finally some unnecessary memory copies have been eliminated to
further increase performance.

Change-Id: I26c75f26fb4201bd22b51335448ea4357235065a
BUG: 1289922
Signed-off-by: Xavier Hernandez 
Reviewed-on: http://review.gluster.org/12837
Tested-by: Pranith Kumar Karampuri 
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri

cluster/afr: Prevent split-brain when bricks are brought off and on in cyclic order

2016-08-22T09:38:36+00:00

When the bricks are brought offline and then online in cyclic
order while writes are in progress on a file, thanks to inode
refresh in write txns, AFR will mostly fail the write attempt
when the only good copy is offline. However, there is still a
remote possibility that the file will run into split-brain if
the brick that has the lone good copy goes offline *after* the
inode refresh but *before* the write txn completes (I call it
in-flight split-brain in the patch for ease of reference),
requiring intervention from admin to resolve the split-brain
before the IO can resume normally on the file. To get around this,
the patch does the following things:
i) retains the dirty xattrs on the file
ii) avoids marking the last of the good copies as bad (or accused)
    in case it is the one to go down during the course of a write.
iii) fails that particular write with the appropriate errno.

This way, we still have one good copy left despite the split-brain situation
which when it is back online, will be chosen as source to do the heal.

Change-Id: I9ca634b026ac830b172bac076437cc3bf1ae7d8a
BUG: 1363721
Signed-off-by: Krutika Dhananjay 
Reviewed-on: http://review.gluster.org/15080
Tested-by: Pranith Kumar Karampuri 
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Ravishankar N 
Reviewed-by: Oleksandr Natalenko 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Pranith Kumar Karampuri