glusterfs.git/tests/basic/ec, branch release-3.11

cluster/ec: Test script failing with brick multiplexing enabled

2017-07-20T11:51:55+00:00

Problem:
Killing the bricks (using a kill signal) in test scripts results in
test failures when brick multiplexing is enabled.

Solution:
Updated the script to use the kill_brick function to bring down
the bricks.
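The failure mode can be modelled in a few lines. With multiplexing, several bricks share one glusterfsd process, so a kill signal aimed at that process takes down every brick it hosts, not only the one the test meant to stop. The sketch below is a hypothetical model: the `brick` struct and both helpers are invented for illustration and are not GlusterFS code, and it assumes kill_brick stops only the named brick.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical model of bricks hosted by glusterfsd processes. */
struct brick {
    const char *path;
    pid_t pid;  /* process hosting this brick */
    int up;     /* 1 while the brick is serving I/O */
};

/* A kill signal sent to a PID stops every brick in that process.
 * Returns how many bricks went down. */
static size_t kill_process(struct brick *b, size_t n, pid_t pid)
{
    size_t down = 0;
    for (size_t i = 0; i < n; i++) {
        if (b[i].pid == pid && b[i].up) {
            b[i].up = 0;
            down++;
        }
    }
    return down;
}

/* Assumed kill_brick-style behaviour: stop only the selected brick. */
static size_t stop_one_brick(struct brick *b, size_t n, size_t idx)
{
    assert(idx < n);
    if (!b[idx].up)
        return 0;
    b[idx].up = 0;
    return 1;
}
```

With three bricks multiplexed into one process, kill_process() brings all three down, which is more than an EC test expecting a single failed brick is prepared for; stop_one_brick() leaves the other two running.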

>BUG: 1472094
>Change-Id: Ibbf1fdc1be660ad3cd93e95af2838c0aae0181af
>Signed-off-by: Sunil Kumar Acharya 
>Reviewed-on: https://review.gluster.org/17809
>Smoke: Gluster Build System 
>CentOS-regression: Gluster Build System 
>Reviewed-by: Pranith Kumar Karampuri 

BUG: 1472794
Change-Id: Ibbf1fdc1be660ad3cd93e95af2838c0aae0181af
Signed-off-by: Sunil Kumar Acharya 
Reviewed-on: https://review.gluster.org/17823
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

cluster/ec: Non-disruptive upgrade on EC volume fails

2017-07-19T11:26:21+00:00

Problem:
With optimistic changelog enabled, the EC volume did not
handle node-down scenarios appropriately, which made
volume data inaccessible.

Solution:
Update the dirty xattr appropriately on the good bricks
whenever nodes are down. This fixes the metadata information
as part of heal and thus ensures data accessibility.
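A minimal sketch of the idea, assuming bricks are represented as a bitmask of reachable bricks and each brick keeps a single dirty counter. The `ec_model` struct and `mark_dirty_if_degraded()` are hypothetical names, not the actual EC xattr code; they only show that the good bricks record the dirty state whenever some brick is down.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BRICKS 16

/* Hypothetical model: bit i of 'up' set => brick i is reachable. */
struct ec_model {
    unsigned nodes;              /* total number of bricks */
    uint32_t up;                 /* reachable bricks */
    uint64_t dirty[MAX_BRICKS];  /* per-brick dirty counter */
};

/* After an update fop, mark the good bricks dirty if any brick is down,
 * so that a later heal knows the down bricks need repair. */
static void mark_dirty_if_degraded(struct ec_model *m)
{
    assert(m->nodes > 0 && m->nodes <= MAX_BRICKS);
    uint32_t all = (1u << m->nodes) - 1;
    if ((m->up & all) == all)
        return;  /* every brick is good: nothing to record */
    for (unsigned i = 0; i < m->nodes; i++)
        if (m->up & (1u << i))
            m->dirty[i]++;
}
```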

>BUG: 1468261
>Change-Id: I08b0d28df386d9b2b49c3de84b4aac1c729ac057
>Signed-off-by: Sunil Kumar Acharya 
>Reviewed-on: https://review.gluster.org/17703
>Smoke: Gluster Build System 
>CentOS-regression: Gluster Build System 
>Reviewed-by: Pranith Kumar Karampuri 

BUG: 1470938
Change-Id: I08b0d28df386d9b2b49c3de84b4aac1c729ac057
Signed-off-by: Sunil Kumar Acharya 
Reviewed-on: https://review.gluster.org/17773
Smoke: Gluster Build System 
Reviewed-by: Ashish Pandey 
Reviewed-by: Pranith Kumar Karampuri 
CentOS-regression: Gluster Build System 
NetBSD-regression: NetBSD Build System

cluster/ec: correctly handle end of file for seek

2017-07-06T14:03:32+00:00

When a SEEK_HOLE was issued near the end of the file, an offset
beyond the end of file was sometimes returned. Another problem was
that some offsets greater than the end of file returned successfully
instead of failing with ENXIO.
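The expected semantics can be captured in a small post-processing step. This is a sketch under stated assumptions: `raw` stands for an offset derived from the bricks' answers, which may be rounded up to a stripe boundary and so lie past the real EOF, and `seek_adjust()` is a hypothetical helper, not the function changed by this patch. Following Linux lseek(2) behaviour, an offset at or past EOF fails with ENXIO, and a hole search never returns more than the file size because the implicit hole starts at EOF.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical: adjust a SEEK_DATA/SEEK_HOLE result for a file of 'size'
 * bytes. 'want_hole' is nonzero for SEEK_HOLE. On failure returns -1 and
 * stores the error in *err; on success *err is 0. */
static int64_t seek_adjust(int64_t size, int64_t offset, int want_hole,
                           int64_t raw, int *err)
{
    assert(offset >= 0 && size >= 0);
    *err = 0;
    if (offset >= size) {        /* nothing to find at or after EOF */
        *err = ENXIO;
        return -1;
    }
    if (want_hole && raw > size)
        raw = size;              /* never report a hole past EOF */
    return raw;
}
```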

 >Change-Id: I238d2884ba02fd19a78116b0f8f8e8d6338fb3f5
 >BUG: 1449348
 >Signed-off-by: Xavier Hernandez 
 >Reviewed-on: https://review.gluster.org/17228
 >Smoke: Gluster Build System 
 >NetBSD-regression: NetBSD Build System 
 >CentOS-regression: Gluster Build System 
 >Reviewed-by: Amar Tumballi 
 >Reviewed-by: Pranith Kumar Karampuri 
 >(cherry picked from commit eb96dd45f8e583c6bad84bf32ca17e2bb01dd38f)

Change-Id: I238d2884ba02fd19a78116b0f8f8e8d6338fb3f5
BUG: 1468118
Signed-off-by: Xavier Hernandez 
Reviewed-on: https://review.gluster.org/17710
Reviewed-by: Pranith Kumar Karampuri 
Tested-by: Pranith Kumar Karampuri 
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

cluster/ec: Update xattr and heal size properly

2017-06-07T13:29:39+00:00

Problem-1: The same file is healed repeatedly while I/O is
going on, even after data heal completes.

Solution:
RCA: At the end of a write, when ec_update_size_version()
gets called, we send it only to the good bricks and not
to the healing brick. As a result, the xattr on the healing
brick always remains out of sync, and when the background
heal checks sources and sinks, it finds this brick in need
of heal and starts healing it from scratch. That involves an
ftruncate and writing all of the data again.

To solve this, send the xattrop to all the good bricks as
well as to the healing bricks.
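The change amounts to widening the set of bricks that receive the size/version xattrop. A minimal sketch, assuming bricks are encoded as bitmasks (bit i stands for brick i); both function names are hypothetical and only contrast the old and new target sets.

```c
#include <assert.h>
#include <stdint.h>

/* Old behaviour: only good bricks get the xattrop, healing ones are skipped. */
static uint32_t xattrop_targets_before(uint32_t good, uint32_t healing)
{
    (void)healing;
    return good;
}

/* New behaviour: healing bricks are included, so their xattrs stay in sync. */
static uint32_t xattrop_targets_after(uint32_t good, uint32_t healing)
{
    return good | healing;
}
```

With bricks 0 and 1 good (0x3) and brick 2 being healed (0x4), the old mask leaves brick 2's xattrs stale, so the next heal check sees a mismatch and starts over; the new mask (0x7) includes it.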

Problem-2: The above fix exposes a data corruption
during heal. If a write to a file is in progress when
heal finishes, the file gets corrupted.

RCA:
The real problem happens in ec_rebuild_data(). Here we receive the
'size' argument which contains the real file size at the time of
starting self-heal and it's assigned to heal->total_size.

After that, a sequence of calls to ec_sync_heal_block() is made. Each
call ends up calling ec_manager_heal_block(), which does the actual work
of healing a block.

First a lock on the inode is taken in state EC_STATE_INIT using
ec_heal_inodelk(). When the lock is acquired, ec_heal_lock_cbk() is
called. This function calls ec_set_inode_size() to store the real size
of the inode (it uses heal->total_size).

The next step is to read the block to be healed. This is done using a
regular ec_readv(). One of the things this call does is to trim the
returned size if the file is smaller than the requested size.

In our case, when we read the last block of a file whose size was
congruent to 512 mod 1024 at the time self-heal started, ec_readv()
returns only the first 512 bytes, not the whole 1024 bytes.
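For example, with the 1024-byte block used above and a file of 1536 bytes at heal start (1536 mod 1024 = 512), the last block starts at offset 1024 and the read returns 512 bytes. The hypothetical helper below models only that trimming rule; it is not ec_readv() itself.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes returned by a read of [offset, offset + len) from a file of
 * 'size' bytes: anything past EOF is trimmed off. */
static uint64_t readv_trimmed(uint64_t size, uint64_t offset, uint64_t len)
{
    if (offset >= size)
        return 0;
    return (size - offset < len) ? size - offset : len;
}
```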

This isn't a problem since the following ec_writev() sent from the heal
code only attempts to write the amount of data read, so it shouldn't
modify the remaining 512 bytes.

However, ec_writev() also checks the file size. If we are writing the
last block of the file (as determined by the size stored on the inode,
which we have set to heal->total_size), any data beyond the (imposed)
end of file is cleared with zeros. This causes the 512 bytes after
heal->total_size to be cleared. Since the file was written after heal
started, these bytes contained data, so the block written to the
damaged brick is incorrect.
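The corruption can be reproduced in a small model: the file is 1536 bytes when heal starts and a writer later fills it to 2048 bytes. `heal_block()` is a hypothetical simplification that copies the bytes a trimmed read would return and zero-fills the rest of the block past the imposed end of file, as described for ec_writev(); it returns how many bytes of the healed block differ from the source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 1024u

/* Hypothetical model of healing the block at 'offset' onto 'sink'.
 * 'imposed_size' is the size stored on the inode when heal started. */
static size_t heal_block(const uint8_t *src, uint8_t *sink,
                         uint64_t offset, uint64_t imposed_size)
{
    uint64_t n = 0;
    if (offset < imposed_size)
        n = (imposed_size - offset < BLOCK) ? imposed_size - offset : BLOCK;
    memcpy(sink + offset, src + offset, n);   /* data the read returned */
    memset(sink + offset + n, 0, BLOCK - n);  /* cleared past imposed EOF */

    size_t diff = 0;
    for (uint64_t i = 0; i < BLOCK; i++)
        diff += sink[offset + i] != src[offset + i];
    return diff;
}
```

With an imposed size of 1536, the 512 bytes at offsets 1536..2047 end up as zeros on the sink although the source holds data there; with an imposed size of 2048 the healed block matches the source.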

Solution:
Align heal->total_size to a multiple of the stripe size.
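A minimal sketch of the alignment, assuming the size is rounded up (so the last, partially filled stripe is healed in full) and treating the stripe size as a plain parameter; `align_up()` is a hypothetical helper, not the function used in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Round 'size' up to the next multiple of 'stripe'. */
static uint64_t align_up(uint64_t size, uint64_t stripe)
{
    assert(stripe > 0);
    return ((size + stripe - 1) / stripe) * stripe;
}
```

With a 1024-byte stripe, a heal starting on a 1536-byte file would use 2048 as its total size, so the last block is no longer treated as ending at byte 1536.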

Thanks to Xavier Hernandez for finding the root cause
and fixing the issue.

>Change-Id: I6c9f37b3ff9dd7f5dc1858ad6f9845c05b4e204e
>BUG: 1428673
>Signed-off-by: Ashish Pandey 
>Reviewed-on: https://review.gluster.org/16985
>Smoke: Gluster Build System 
>NetBSD-regression: NetBSD Build System 
>CentOS-regression: Gluster Build System 
>Reviewed-by: Pranith Kumar Karampuri 
>Reviewed-by: Xavier Hernandez 

Change-Id: I6c9f37b3ff9dd7f5dc1858ad6f9845c05b4e204e
BUG: 1459392
Signed-off-by: Ashish Pandey 
Reviewed-on: https://review.gluster.org/17482
Smoke: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri 
Tested-by: Pranith Kumar Karampuri 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System

cluster/ec: Implement FALLOCATE FOP for EC

2017-05-25T13:12:58+00:00

The FALLOCATE file operation is not implemented in the
existing EC code. This change implements it
for EC.
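From an application's point of view, the fop is what a preallocation call on a file turns into. The sketch below uses the standard POSIX posix_fallocate() on a temporary file; on a GlusterFS mount of an EC volume such a call would presumably be translated into the FALLOCATE fop, but that path is an assumption, and `preallocate_tmp()` is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create a temporary file from 'template' (must end in XXXXXX),
 * preallocate 'len' bytes in it, and return the resulting file size,
 * or -1 on error. The file is removed before returning. */
static off_t preallocate_tmp(char *template, off_t len)
{
    int fd = mkstemp(template);
    if (fd < 0)
        return -1;

    off_t result = -1;
    struct stat st;
    /* posix_fallocate() returns an error number instead of setting errno. */
    if (posix_fallocate(fd, 0, len) == 0 && fstat(fd, &st) == 0)
        result = st.st_size;

    close(fd);
    unlink(template);
    return result;
}
```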

>BUG: 1448293
>Change-Id: Id9ed914db984c327c16878a5b2304a0ea461b623
>Signed-off-by: Sunil Kumar Acharya 
>Reviewed-on: https://review.gluster.org/15200
>Smoke: Gluster Build System 
>NetBSD-regression: NetBSD Build System 
>Reviewed-by: Pranith Kumar Karampuri 
>CentOS-regression: Gluster Build System 

BUG: 1454686
Change-Id: Id9ed914db984c327c16878a5b2304a0ea461b623
Signed-off-by: Sunil Kumar Acharya 
Reviewed-on: https://review.gluster.org/17369
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Ashish Pandey 
CentOS-regression: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri

Fixes quota aux mount failure

2017-05-17T23:26:20+00:00

The aux mount is created on the first limit/remove_limit/list command
and remains until the volume is stopped or deleted (or quota is
disabled), at which point we do a lazy unmount. If the process is
terminated uncleanly, the mount entry remains and we get a "Transport
disconnected" error on subsequent quota list/limit-usage/remove commands.

Second, there is also a risk that an inadvertent rm -rf on
/var/run/gluster causes data loss for the user. /var/run is meant
to be a temporary path for application use, and removing it should
not cause any loss of persistent data.

Solution:
1) Unmount the aux mount after each use.
2) Clean up any stale mount before mounting.

One caveat of mounting and unmounting on each command is that we cannot
use the same mount point for both the list and limit commands.
The list command needs the mount to remain accessible in the CLI after
the response from glusterd, so a limit command executed in parallel
could unmount it if both used the same mount point.
Hence we use separate mount points for the list and limit commands.
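A minimal sketch of keeping the two mount points apart. The directory layout built here (`/var/run/gluster/<vol>_quota_list` versus `..._quota_limit`) and every name in the block are hypothetical, chosen only to show that list and limit commands never share one path.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum quota_cmd { QUOTA_CMD_LIST, QUOTA_CMD_LIMIT };  /* limit also covers remove */

/* Build a per-volume, per-command aux mount path into 'buf'.
 * Returns 0 on success, -1 if the path does not fit. */
static int aux_mount_path(char *buf, size_t len, const char *volname,
                          enum quota_cmd cmd)
{
    const char *suffix = (cmd == QUOTA_CMD_LIST) ? "list" : "limit";
    int n = snprintf(buf, len, "/var/run/gluster/%s_quota_%s", volname, suffix);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```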

>Reviewed-on: https://review.gluster.org/16938
>NetBSD-regression: NetBSD Build System 
>Smoke: Gluster Build System 
>Reviewed-by: Manikandan Selvaganesh 
>CentOS-regression: Gluster Build System 
>Reviewed-by: Raghavendra G 
>Reviewed-by: Atin Mukherjee 
>(cherry picked from commit 2ae4b4058691b324535d802f4e6d24cce89a10e5)

Change-Id: I4f9e39da2ac2b65941399bffb6440db8a6ba59d0
BUG: 1449775
Signed-off-by: Sanoj Unnikrishnan 
Reviewed-on: https://review.gluster.org/17240
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Atin Mukherjee

cluster/ec: Introduce optimistic changelog in EC

2017-03-04T12:37:56+00:00

Problem: The fix for https://bugzilla.redhat.com/show_bug.cgi?id=1316873 sets
the dirty flag before every update fop, data or metadata, and unsets it after
a successful operation. This makes some fops, such as entry operations and
metadata operations, very slow.

Solution: File data operations are the only operations that take significant
time, and setting the dirty flag before such a fop and unsetting it afterwards
is worthwhile because the probability of failure grows with the duration of the
fop. For all other operations, set the dirty flag at the end of the fop, only
if some brick is down and needs heal.

The following option is provided to choose between high performance and better
heal marking for metadata and entry fops:

Set/unset the dirty flag for every update fop at the start of the fop. If ON,
this option impacts the performance of entry and metadata operations, as it
sets the dirty flag at the start and unsets it at the end of ALL update fops.
If OFF and all the bricks are good, the dirty flag is set at the start only for
file fops; for metadata and entry fops it is not set at the start if all the
bricks are good. This does not impact the performance of metadata and entry
operations, but leaves a very small window in which an entry that needs heal
may not be marked dirty.
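The start-of-fop decision described above can be written as one predicate. This is a sketch only: `always_mark` stands for the option being ON, and the enum and function names are hypothetical rather than taken from the EC sources.

```c
#include <assert.h>
#include <stdbool.h>

enum fop_kind { FOP_DATA, FOP_METADATA, FOP_ENTRY };

/* Should the dirty flag be set before this update fop starts? */
static bool dirty_at_start(bool always_mark, enum fop_kind kind,
                           bool all_bricks_good)
{
    if (always_mark)
        return true;           /* option ON: every update fop */
    if (kind == FOP_DATA)
        return true;           /* long-running file data fops */
    return !all_bricks_good;   /* metadata/entry: only when degraded */
}
```

With the option OFF, a metadata or entry fop on a healthy volume skips the start-of-fop marking; if a brick goes down during the fop, the dirty flag is set at the end instead, which is the small window mentioned above.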

Thanks to Xavi and Ashish for the design.
Picked the .t file from Ashish's patch https://review.gluster.org/16298

BUG: 1408809
Change-Id: I3ce860063f0e2901e50754dcfc3e4ed22daf819f
Signed-off-by: Pranith Kumar K 
Reviewed-on: https://review.gluster.org/16821
Smoke: Gluster Build System 
Reviewed-by: Xavier Hernandez 
Tested-by: Xavier Hernandez 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System

cluster/ec: Don't trigger data/metadata heal on Lookups

2017-02-27T03:06:55+00:00

Problem-1:
A version mismatch observed by Lookup, which doesn't take any locks, can't be
trusted. Launching a heal based on this information leads to self-heals that
affect I/O performance whenever Lookup is wrong. Since the self-heal daemon and
client operations on the inode that take locks can still trigger heal, we
choose not to attempt a heal on Lookup.
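The rule reduces to trusting a version mismatch only when it was observed under a lock. A minimal sketch with hypothetical names; it models the decision for data/metadata heal only, not name heal.

```c
#include <assert.h>
#include <stdbool.h>

enum heal_origin { ORIGIN_LOOKUP, ORIGIN_LOCKED_FOP, ORIGIN_SHD };

/* Should an observed version mismatch trigger a data/metadata heal?
 * Lookup reads versions without locks, so its view may be stale. */
static bool trigger_data_metadata_heal(enum heal_origin origin, bool mismatch)
{
    if (!mismatch)
        return false;
    return origin != ORIGIN_LOOKUP;
}
```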

Problem-2:
Fixed a spurious failure of
tests/bitrot/bug-1373520.t.
The cause was that ec_heal_inspect() was preventing 'name' heal
from happening.

Problem-3:
Spurious failure of tests/basic/ec/ec-background-heals.t.
To be honest, I don't know what the problem was. While fixing
the two problems above, I made some changes to ec_heal_inspect() and
ec_need_heal(), after which I could not recreate the spurious
failure, even after trying for a long time.

BUG: 1414287
Signed-off-by: Pranith Kumar K 
Change-Id: Ife2535e1d0b267712973673f6d474e288f3c6834
Reviewed-on: https://review.gluster.org/16468
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Xavier Hernandez 
CentOS-regression: Gluster Build System 
Reviewed-by: Ashish Pandey

core: run many bricks within one glusterfsd process

2017-01-31T00:13:58+00:00

This patch adds support for multiple brick translator stacks running
in a single brick server process.  This reduces our per-brick memory usage by
approximately 3x, and our appetite for TCP ports even more.  It also creates
potential to avoid process/thread thrashing, and to improve QoS by scheduling
more carefully across the bricks, but realizing that potential will require
further work.

Multiplexing is controlled by the "cluster.brick-multiplex" global option.  By
default it's off, and bricks are started in separate processes as before.  If
multiplexing is enabled, then *compatible* bricks (mostly those with the same
transport options) will be started in the same process.
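The placement rule can be sketched as a grouping pass: with multiplexing off, every brick gets its own process; with it on, a brick joins the process of an earlier brick whose transport options match. This is a hypothetical model in which each brick's transport options are reduced to one comparable string; the real compatibility check in glusterd is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 'opts[i]' summarizes brick i's transport options; 'proc[i]' receives
 * the index of the process hosting it. Returns the number of processes. */
static size_t assign_processes(const char *const *opts, size_t n,
                               int multiplex, size_t *proc)
{
    size_t nproc = 0;
    for (size_t i = 0; i < n; i++) {
        proc[i] = nproc;                  /* default: start a new process */
        if (multiplex) {
            for (size_t j = 0; j < i; j++) {
                if (strcmp(opts[i], opts[j]) == 0) {
                    proc[i] = proc[j];    /* join a compatible brick */
                    break;
                }
            }
        }
        if (proc[i] == nproc)
            nproc++;
    }
    return nproc;
}
```

For four bricks with options "tcp", "tcp", "rdma", "tcp", multiplexing yields two processes instead of four.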

Change-Id: I45059454e51d6f4cbb29a4953359c09a408695cb
BUG: 1385758
Signed-off-by: Jeff Darcy 
Reviewed-on: https://review.gluster.org/14763
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Vijay Bellur

cluster/ec: mark ec-background-heal.t as bad

2017-01-26T12:58:19+00:00

Change-Id: I0c54c62cdeb40b983da2392296762471a5474652
BUG: 1416689
Signed-off-by: Xavier Hernandez 
Reviewed-on: https://review.gluster.org/16470
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Atin Mukherjee 
CentOS-regression: Gluster Build System