glusterfs.git/xlators/cluster/ec/src, branch v3.11.2

cluster/ec: Non-disruptive upgrade on EC volume fails

2017-07-19T11:26:21+00:00

Problem:
Enabling optimistic changelog on an EC volume did not
handle node-down scenarios correctly, which resulted
in volume data becoming inaccessible.

Solution:
Update the dirty xattr on the good bricks whenever
nodes are down. Heal can then fix the metadata,
which keeps the volume data accessible.
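A minimal sketch of the rule described above, assuming a simplified model in which optimistic changelog skips the dirty marker in the pre-op only while every brick is up and good. The function name, the bitmask representation (bit i = brick i) and the 64-brick limit are hypothetical, not the EC xlator's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns the bitmask of bricks whose dirty xattr
 * should be incremented in the pre-op of an update fop.
 *   up_mask:    bricks currently UP (bit i = brick i)
 *   good_mask:  bricks holding consistent data
 *   nodes:      total number of bricks (at most 64 in this sketch)
 *   optimistic: optimistic changelog enabled */
static uint64_t dirty_mask_for_preop(uint64_t up_mask, uint64_t good_mask,
                                     unsigned nodes, bool optimistic)
{
    uint64_t all = (nodes >= 64) ? UINT64_MAX : ((1ULL << nodes) - 1);
    uint64_t good_up = up_mask & good_mask & all;

    /* Optimistic path: every brick is up and good, so the dirty marker
     * is skipped. */
    if (optimistic && good_up == all)
        return 0;

    /* Some brick is down or bad: mark dirty on every good brick that is
     * up, so a later heal knows the other bricks must be repaired. */
    return good_up;
}
```

With six bricks and brick 5 down, `dirty_mask_for_preop(0x1F, 0x3F, 6, true)` marks bricks 0 to 4 instead of skipping the marker.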

>BUG: 1468261
>Change-Id: I08b0d28df386d9b2b49c3de84b4aac1c729ac057
>Signed-off-by: Sunil Kumar Acharya 
>Reviewed-on: https://review.gluster.org/17703
>Smoke: Gluster Build System 
>CentOS-regression: Gluster Build System 
>Reviewed-by: Pranith Kumar Karampuri 

BUG: 1470938
Change-Id: I08b0d28df386d9b2b49c3de84b4aac1c729ac057
Signed-off-by: Sunil Kumar Acharya 
Reviewed-on: https://review.gluster.org/17773
Smoke: Gluster Build System 
Reviewed-by: Ashish Pandey 
Reviewed-by: Pranith Kumar Karampuri 
CentOS-regression: Gluster Build System 
NetBSD-regression: NetBSD Build System

cluster/ec : Don't try to heal when no sink is UP

2017-07-10T13:58:33+00:00

Problem:
4 + 2 EC volume configuration.
If an untar of linux is in progress and we kill a brick,
indices are created for the files/dirs that need
to be healed. ec_shd_index_sweep spawns threads to
scan these entries and start heal. If we kill one more
brick in the middle of this, we cannot heal an entry,
because only "ec->fragment" bricks are UP and none of
them is a sink. However, the scan continues and
triggers heal for those entries.

Solution:
When a heal is triggered for an entry, check whether it
*CAN* be healed. If not, fail with ENOTCONN.
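The check can be sketched with brick bitmasks, assuming that a heal needs at least `fragments` good bricks UP to read from and at least one UP brick that needs healing (a sink) to write to. The function and parameter names are hypothetical; this is not the actual EC heal code.

```c
#include <errno.h>
#include <stdint.h>

/* Count set bits portably. */
static unsigned count_bits(uint64_t mask)
{
    unsigned n = 0;
    while (mask) {
        mask &= mask - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Hypothetical pre-heal check: returns 0 if a heal can proceed, or
 * -ENOTCONN if it cannot.
 *   up_mask:   bricks currently UP
 *   good_mask: bricks with consistent data (possible heal sources)
 *   fragments: number of data fragments (4 in a 4 + 2 volume) */
static int ec_can_heal_sketch(uint64_t up_mask, uint64_t good_mask,
                              unsigned fragments)
{
    uint64_t sources = up_mask & good_mask;
    uint64_t sinks = up_mask & ~good_mask;

    if (count_bits(sources) < fragments || sinks == 0)
        return -ENOTCONN;
    return 0;
}
```

In the scenario above, brick 5 needs healing but is down and brick 4 is killed too, so `up_mask` is `0x0F`: four sources, no sink, and the check returns `-ENOTCONN`.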

>Change-Id: I305be7701c289f36bd7bde22491b71074771424f
>BUG: 1464359
>Signed-off-by: Ashish Pandey 
>Reviewed-on: https://review.gluster.org/17692
>Smoke: Gluster Build System 
>CentOS-regression: Gluster Build System 
>NetBSD-regression: NetBSD Build System 
>Reviewed-by: Pranith Kumar Karampuri 
>Reviewed-by: Sunil Kumar Acharya 
>Reviewed-by: Xavier Hernandez 
>Signed-off-by: Ashish Pandey 

Change-Id: I305be7701c289f36bd7bde22491b71074771424f
BUG: 1468457
Signed-off-by: Ashish Pandey 
Reviewed-on: https://review.gluster.org/17724
Smoke: Gluster Build System 
Reviewed-by: Xavier Hernandez 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System

cluster/ec: correctly handle end of file for seek

2017-07-06T14:03:32+00:00

When a SEEK_HOLE was issued near the end of the file, an offset beyond
the end of the file was sometimes returned. Another problem was that
some offsets greater than the end of the file succeeded instead of
failing with ENXIO.
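A sketch of the expected end-of-file behavior, assuming the hole offset reconstructed from the fragments can be rounded up to a stripe boundary and therefore land past the real size. The helper name is hypothetical; the ENXIO rule for `offset >= size` mirrors Linux's generic lseek behavior for SEEK_HOLE.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical post-processing of a SEEK_HOLE answer.
 *   offset:   offset requested by the application
 *   raw_hole: hole offset reconstructed from the fragments; it may be
 *             rounded up to a stripe boundary and exceed the file size
 *   size:     real file size
 * Returns the final offset, or -1 with *err set to ENXIO. */
static int64_t seek_hole_result(uint64_t offset, uint64_t raw_hole,
                                uint64_t size, int *err)
{
    /* Nothing can be found at or past the end of file. */
    if (offset >= size) {
        *err = ENXIO;
        return -1;
    }
    *err = 0;
    /* Every file has an implicit hole at EOF, so never report beyond it. */
    if (raw_hole > size)
        return (int64_t)size;
    return (int64_t)raw_hole;
}
```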

 >Change-Id: I238d2884ba02fd19a78116b0f8f8e8d6338fb3f5
 >BUG: 1449348
 >Signed-off-by: Xavier Hernandez 
 >Reviewed-on: https://review.gluster.org/17228
 >Smoke: Gluster Build System 
 >NetBSD-regression: NetBSD Build System 
 >CentOS-regression: Gluster Build System 
 >Reviewed-by: Amar Tumballi 
 >Reviewed-by: Pranith Kumar Karampuri 
 >(cherry picked from commit eb96dd45f8e583c6bad84bf32ca17e2bb01dd38f)

Change-Id: I238d2884ba02fd19a78116b0f8f8e8d6338fb3f5
BUG: 1468118
Signed-off-by: Xavier Hernandez 
Reviewed-on: https://review.gluster.org/17710
Reviewed-by: Pranith Kumar Karampuri 
Tested-by: Pranith Kumar Karampuri 
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

ec: Increase notification in all the cases

2017-07-03T12:43:49+00:00

Problem:
"gluster v heal  info" takes a long time to respond
when a brick is down.

RCA:
The heal info command does a virtual mount. Before sending
the UP event to the upper xlator, EC waits up to 10 seconds
for a notification (DOWN or UP) from every brick.

Currently, ec->xl_notify_count is incremented based on the
current status of the brick. So if a DOWN notification
arrives for a brick that is already down, ec_handle_down
does not increment ec->xl_notify_count, the count never
reaches the number of bricks, and EC waits for the whole
10 seconds.

Solution:
Count a DOWN event as a notification irrespective of the
current status of the brick.
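A simplified model of the fixed bookkeeping: the first event from each brick is counted whether it is UP or DOWN, and whether or not it changes the brick's state. The struct and function are hypothetical and only loosely mirror `ec->xl_notify_count`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of EC's notification bookkeeping. */
struct notify_state {
    unsigned nodes;        /* total bricks */
    uint64_t up_mask;      /* bricks currently UP */
    uint64_t notified;     /* bricks that have sent at least one event */
    unsigned notify_count; /* number of distinct bricks that notified */
};

/* Record an event (UP or DOWN) from brick idx. The first event from a
 * brick is counted regardless of the brick's current state. Returns true
 * once every brick has reported, so EC need not wait for the timeout. */
static bool record_event(struct notify_state *s, unsigned idx, bool is_up)
{
    uint64_t bit = 1ULL << idx;

    if (!(s->notified & bit)) {
        s->notified |= bit;
        s->notify_count++;
    }
    if (is_up)
        s->up_mask |= bit;
    else
        s->up_mask &= ~bit;

    return s->notify_count == s->nodes;
}
```

With six bricks where brick 5 is already down, its DOWN event is still counted, so the sixth call returns true instead of leaving EC waiting.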

>Change-Id: I0acac0db7ec7622d4c0584692e88ad52f45a910f
>BUG: 1464091
>Signed-off-by: Ashish Pandey 
>Reviewed-on: https://review.gluster.org/17606
>Tested-by: Pranith Kumar Karampuri 
>Reviewed-by: Pranith Kumar Karampuri 
>Smoke: Gluster Build System 
>CentOS-regression: Gluster Build System 
>Reviewed-by: Xavier Hernandez 
>NetBSD-regression: NetBSD Build System 
>Signed-off-by: Ashish Pandey 

Change-Id: I0acac0db7ec7622d4c0584692e88ad52f45a910f
BUG: 1465854
Signed-off-by: Ashish Pandey 
Reviewed-on: https://review.gluster.org/17642
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Xavier Hernandez 
CentOS-regression: Gluster Build System

cluster/ec: Node uuid xattr support update for EC

2017-06-26T10:32:31+00:00

        Backport of https://review.gluster.org/17594

Problem:
The change in EC to return a list of node uuids for
GF_XATTR_NODE_UUID_KEY was causing problems with
geo-rep.

Fix:
This patch returns a single node uuid for the key
"GF_XATTR_NODE_UUID_KEY", as before, and allows
getting the list of node uuids through a new key,
"GF_XATTR_LIST_NODE_UUIDS_KEY". This solves the
problem with geo-rep and with any other features
that depended on the old behavior.
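A sketch of how a reply can depend on the requested key. The string values of the two keys are assumptions about GlusterFS's definitions, and the choice of the first uuid for the single-value reply and the space separator for the list are illustrative assumptions; the function itself is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Assumed values of GF_XATTR_NODE_UUID_KEY and
 * GF_XATTR_LIST_NODE_UUIDS_KEY. */
#define NODE_UUID_KEY       "trusted.glusterfs.node-uuid"
#define LIST_NODE_UUIDS_KEY "trusted.glusterfs.list-node-uuids"

/* Hypothetical reply builder: a single uuid for the old key, a
 * space-separated list for the new key. Returns the string length, or -1
 * for an unknown key, no uuids, or a buffer that is too small. */
static int node_uuid_reply(const char *key, const char *const *uuids,
                           size_t n, char *buf, size_t size)
{
    if (n == 0 || size == 0)
        return -1;
    if (strcmp(key, NODE_UUID_KEY) == 0) {
        int len = snprintf(buf, size, "%s", uuids[0]);
        return (len < 0 || (size_t)len >= size) ? -1 : len;
    }
    if (strcmp(key, LIST_NODE_UUIDS_KEY) == 0) {
        size_t used = 0;
        for (size_t i = 0; i < n; i++) {
            int len = snprintf(buf + used, size - used, "%s%s",
                               i ? " " : "", uuids[i]);
            if (len < 0 || (size_t)len >= size - used)
                return -1;
            used += (size_t)len;
        }
        return (int)used;
    }
    return -1;
}
```

Keeping the old key's single-value answer is what lets existing consumers such as geo-rep keep working, while new consumers opt in to the list.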

BUG: 1463250
Change-Id: I2d9214a9658d4a41a3d6de08600884d2bda5f3eb
Signed-off-by: Sunil Kumar Acharya 
Reviewed-on: https://review.gluster.org/17615
Tested-by: Pranith Kumar Karampuri 
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Xavier Hernandez 
CentOS-regression: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri

cluster/ec: lk shouldn't be a transaction

2017-06-19T15:49:50+00:00

Problem:
When an application sends a blocking lock, the lk fop waits while holding
an inodelk. This can lead to a deadlock:
1) Let's say app-1 takes an exclusive fcntl lock on the file.
2) app-2 attempts an exclusive fcntl lock on the file, which blocks.
   Note: app-2 is blocked inside a transaction that holds an inodelk.
3) app-1 tries to perform a write, which needs the inodelk, so it waits
   for app-2 to release the inodelk, while app-2 waits for app-1 to
   release the fcntl lock.

Fix:
The correct way to fix this issue and make fcntl locks perform well would be
to introduce 2-phase locking for fcntl locks:
1) Implement a try-lock phase in which the locks xlator does not merge the lk
   call with existing calls until a commit-lock phase.
2) If the try-lock phase gets a quorum number of successes without any EAGAIN
   error, send a commit-lock, which merges the locks.
3) If there are any errors, unlock should only delete the lock object that
   was tried earlier and must not touch the committed locks.
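The decision at the end of the proposed try-lock phase can be sketched as below. This design is not implemented by this change; the enum, the function and the `quorum` parameter are hypothetical, and per-brick results are modeled as 0 for success or a positive errno value.

```c
#include <errno.h>
#include <stddef.h>

enum lk_phase_next { LK_COMMIT, LK_ROLLBACK };

/* Hypothetical decision step for the proposed 2-phase fcntl locking.
 * results[i] is brick i's try-lock outcome: 0 on success, otherwise a
 * positive errno value such as EAGAIN or ENOTCONN. */
static enum lk_phase_next try_lock_outcome(const int *results, size_t n,
                                           size_t quorum)
{
    size_t ok = 0;

    for (size_t i = 0; i < n; i++) {
        if (results[i] == 0)
            ok++;
        else if (results[i] == EAGAIN)
            return LK_ROLLBACK; /* a conflicting lock exists somewhere */
    }
    /* Commit (merge) only with a quorum of successes; otherwise unlock,
     * which deletes only the tried lock objects. */
    return ok >= quorum ? LK_COMMIT : LK_ROLLBACK;
}
```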

Unfortunately this is a sizeable feature and needs to be thought through for
corner cases. Until then, remove the transaction from the lk call.

 >BUG: 1455049
 >Change-Id: I18a782903ba0eb43f1e6526fb0cf8c626c460159
 >Signed-off-by: Pranith Kumar K 
 >Reviewed-on: https://review.gluster.org/17542
 >Smoke: Gluster Build System 
 >NetBSD-regression: NetBSD Build System 
 >CentOS-regression: Gluster Build System 
 >Reviewed-by: Ashish Pandey 
 >Reviewed-by: Xavier Hernandez 

BUG: 1462121
Change-Id: I18a782903ba0eb43f1e6526fb0cf8c626c460159
Signed-off-by: Pranith Kumar K 
Reviewed-on: https://review.gluster.org/17556
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

core: fix spelling errors

2017-06-13T14:34:34+00:00

fixes for various minor spelling errors and typos

master BUG: 1457808
master: https://review.gluster.org/17442

Reported-by: Patrick Matthäi 
Change-Id: Ic1be36f82e3d822bbdc9559878bd79520fc0fcd5
BUG: 1459090
Signed-off-by: Kaleb S. KEITHLEY 
Reviewed-on: https://review.gluster.org/17475
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

cluster/ec: Update xattr and heal size properly

2017-06-07T13:29:39+00:00

Problem-1: The same file is healed over and over while IO is going
on, even after data heal completes.

Solution:
RCA: At the end of a write, ec_update_size_version sends
the xattrop only to the good bricks and not to the healing
brick. Because of this, the xattrs on the healing brick
always remain out of sync, and when the background heal
checks sources and sinks, it finds that this brick needs
healing and starts healing it from scratch. That involves
an ftruncate and writing all of the data again.

To solve this, send the xattrop to all the good bricks as
well as to the healing bricks.
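A toy model of why the healing brick keeps looking stale, assuming each brick stores a single version counter and heal treats any brick whose version lags the newest one as a sink. Brick 5 is the healing brick and is assumed to already hold the healed data at version 10; all names are hypothetical.

```c
#include <stdint.h>

/* End of a write: bump the version only on the bricks in targets
 * (bit i = brick i). */
static void end_of_write(uint64_t *versions, unsigned n, uint64_t targets)
{
    for (unsigned i = 0; i < n; i++)
        if (targets & (1ULL << i))
            versions[i]++;
}

/* Bricks whose version lags the newest one look like heal sinks. */
static uint64_t lagging_bricks(const uint64_t *versions, unsigned n)
{
    uint64_t max = 0, mask = 0;

    for (unsigned i = 0; i < n; i++)
        if (versions[i] > max)
            max = versions[i];
    for (unsigned i = 0; i < n; i++)
        if (versions[i] < max)
            mask |= 1ULL << i;
    return mask;
}
```

Updating only the good bricks (`0x1F`) leaves brick 5 lagging (`0x20`), which restarts its heal; including the healing brick (`0x3F`) leaves no lagging brick.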

Problem-2: The above fix exposes data corruption during
heal. If a write to a file is in progress when heal
finishes, the file ends up corrupted.

RCA:
The real problem happens in ec_rebuild_data(). Here we receive the
'size' argument which contains the real file size at the time of
starting self-heal and it's assigned to heal->total_size.

After that, a sequence of calls to ec_sync_heal_block() is made. Each
call ends up calling ec_manager_heal_block(), which does the actual work
of healing a block.

First a lock on the inode is taken in state EC_STATE_INIT using
ec_heal_inodelk(). When the lock is acquired, ec_heal_lock_cbk() is
called. This function calls ec_set_inode_size() to store the real size
of the inode (it uses heal->total_size).

The next step is to read the block to be healed. This is done using a
regular ec_readv(). One of the things this call does is to trim the
returned size if the file is smaller than the requested size.

In our case, when we read the last block of a file whose size modulo
1024 was 512 at the time of starting self-heal, ec_readv() returns only
the first 512 bytes, not the whole 1024 bytes.

This isn't a problem since the following ec_writev() sent from the heal
code only attempts to write the amount of data read, so it shouldn't
modify the remaining 512 bytes.

However ec_writev() also checks the file size. If we are writing the
last block of the file (determined by the size stored on the inode that
we have set to heal->total_size), any data beyond the (imposed) end of
file will be cleared with 0's. This causes the 512 bytes after the
heal->total_size to be cleared. Since the file was written after heal
started, these bytes contained data, so the block written to the
damaged brick will be incorrect.

Solution:
Align heal->total_size to a multiple of the stripe size.
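The arithmetic behind the fix, using the 1024-byte block from the example above and assuming the stripe size equals it. `bytes_zeroed` is a hypothetical, simplified model of the zero-fill done by ec_writev() past the imposed end of file.

```c
#include <stdint.h>

/* Round size up to the next multiple of stripe (stripe > 0). */
static uint64_t align_to_stripe(uint64_t size, uint64_t stripe)
{
    return (size + stripe - 1) / stripe * stripe;
}

/* Simplified zero-fill model: if the imposed file size ends inside the
 * block at block_off, the bytes from that size to the end of the block
 * are cleared. Returns how many bytes would be cleared. */
static uint64_t bytes_zeroed(uint64_t block_off, uint64_t block_size,
                             uint64_t imposed_size)
{
    uint64_t block_end = block_off + block_size;

    if (imposed_size > block_off && imposed_size < block_end)
        return block_end - imposed_size;
    return 0;
}
```

A 1536-byte file (512 mod 1024) gives `bytes_zeroed(1024, 1024, 1536) == 512`; with `heal->total_size` aligned to 2048 the same block write clears nothing.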

Thanks to "Xavier Hernandez" for finding the root cause
and fixing the issue.

>Change-Id: I6c9f37b3ff9dd7f5dc1858ad6f9845c05b4e204e
>BUG: 1428673
>Signed-off-by: Ashish Pandey 
>Reviewed-on: https://review.gluster.org/16985
>Smoke: Gluster Build System 
>NetBSD-regression: NetBSD Build System 
>CentOS-regression: Gluster Build System 
>Reviewed-by: Pranith Kumar Karampuri 
>Reviewed-by: Xavier Hernandez 
>Signed-off-by: Ashish Pandey 

Change-Id: I6c9f37b3ff9dd7f5dc1858ad6f9845c05b4e204e
BUG: 1459392
Signed-off-by: Ashish Pandey 
Reviewed-on: https://review.gluster.org/17482
Smoke: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri 
Tested-by: Pranith Kumar Karampuri 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System

cluster/ec: Implement FALLOCATE FOP for EC

2017-05-25T13:12:58+00:00

The FALLOCATE file operation is not implemented in the
existing EC code. This change set implements it
for EC.
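From an application's side, preallocation goes through the ordinary POSIX call; on a GlusterFS FUSE mount such a request is expected to reach the volume as a FALLOCATE fop. A minimal sketch: the `preallocate` helper is hypothetical and the path passed to it is a placeholder.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Reserve len bytes, starting at offset 0, for the file at path.
 * Returns 0 on success or an errno value on failure. */
static int preallocate(const char *path, off_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return errno;

    /* posix_fallocate() returns an errno value instead of setting errno. */
    int rc = posix_fallocate(fd, 0, len);
    if (close(fd) != 0 && rc == 0)
        rc = errno;
    return rc;
}
```

After a successful call the file size is at least `len`, and the blocks are reserved, so later writes in that range should not fail for lack of space.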

>BUG: 1448293
>Change-Id: Id9ed914db984c327c16878a5b2304a0ea461b623
>Signed-off-by: Sunil Kumar Acharya 
>Reviewed-on: https://review.gluster.org/15200
>Smoke: Gluster Build System 
>NetBSD-regression: NetBSD Build System 
>Reviewed-by: Pranith Kumar Karampuri 
>CentOS-regression: Gluster Build System 

BUG: 1454686
Change-Id: Id9ed914db984c327c16878a5b2304a0ea461b623
Signed-off-by: Sunil Kumar Acharya 
Reviewed-on: https://review.gluster.org/17369
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Ashish Pandey 
CentOS-regression: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri

cluster/ec: return all node uuids from all subvolumes

2017-05-22T22:41:18+00:00

EC was returning the UUID of the brick with the smallest value. This had
the side effect of not evenly balancing the load between bricks on
rebalance operations.

This patch modifies the common functions that combine multiple subvolume
values into a single result to take into account the subvolume order
and, optionally, other subvolumes that could be damaged.

This makes it easier to add future features where brick order is
important. It also makes it possible to easily identify the originating
brick of each answer, in case some brick has a special meaning in the
future.
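A hypothetical illustration of why a full uuid list helps balance rebalance work: if each node only processes files it "owns", a single (smallest) uuid makes one node own everything, while a list lets ownership be spread, here by a file hash modulo the list length. The hashing scheme is an assumption for illustration, not necessarily what DHT does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the node with my_uuid should process the file with the
 * given hash, assuming ownership is picked by hash modulo list length. */
static int node_owns_file(uint32_t file_hash, const char *const *uuids,
                          size_t n, const char *my_uuid)
{
    if (n == 0)
        return 0;
    return strcmp(uuids[file_hash % n], my_uuid) == 0;
}
```

With the list `{"n1", "n2", "n3"}`, hashes 0 to 5 are split two per node; with the single entry `{"n1"}`, node n1 owns every file.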

 >Change-Id: Iee0a4da710b41224a6dc8e13fa8dcddb36c73a2f
 >BUG: 1366817
 >Signed-off-by: Xavier Hernandez 
 >Reviewed-on: https://review.gluster.org/17297
 >Smoke: Gluster Build System 
 >NetBSD-regression: NetBSD Build System 
 >CentOS-regression: Gluster Build System 
 >Reviewed-by: Ashish Pandey 
 >Reviewed-by: Pranith Kumar Karampuri 
 >(cherry picked from commit bcc34ce05c1be76dae42838d55c15d3af5f80e48)

Change-Id: I055713c3c25b7ba99248be880414fb0e8f36a67e
BUG: 1451573
Signed-off-by: Pranith Kumar Karampuri 
Reviewed-on: https://review.gluster.org/17318
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System