glusterfs.git/tests/include.rc, branch v5.0rc1

tests: Preserve tarball of tests when they timeout

2018-08-27T02:42:19+00:00

When tests timeout, the timeout command sends TERM
signal to the command being executed. In the case of run-tests.sh
it invokes prove, which further invokes perl and finally the test
is run using bash. The TERM signal does not seem to be reachnig
the end bash that is actually executing the tests, and hence
when any test is terminated due to a timeout, the cleanup routine
in include.rc does not get a chance to run and preserve the
tarball.

Further, cleanup invokes tarball generation, but is invoked at
the beginning and end of every test, and at times in beteween
as well. This caused way too many tarballs in case we decide to
preserve the same whenever generated by cleanup.

This patch hence moves the tarball generation to run-tests.sh
instead, and further stores them named -iteration-.tar
and also prints tarball name generated and stored per iteration.

This should help relate failed runs to the tarball iteration #
and to look at relevant logs.

Further the patch also provides a -p option to run-tests.sh for
unit testing purposes, where running a test in a loop without the
option will generate as many tarballs, and using the option will
reduce this to preserving the last tarball, saving space in
smaller unit test setups.

Fixes: bz#1614062
Change-Id: I0aee76c89df0691cf4d0c1fcd4c04dffe0d7c896
Signed-off-by: ShyamsundarR

glusterd: address test failures with brick mux enabled

2018-05-31T04:27:26+00:00

This patch addresses following:
1. On volume stop, for the last brick, pmap_registry_remove () is
invoked by glusterd.
2. If a brick process is sigkilled, remove all the associated brick
instances from the portmap.
3. Bump up PROCESS_UP_TIMEOUT to 45.
4. gf_attach to kill a brick takes more time in mux (which is an
issue that needs a fix), but in the interim, give br-state-check.t
more time to complete (there are 2 kill_bricks, each taking 120
seconds, and the test usually passes in 30 odd seconds, hence bumping
this up to 350 seconds)
5. The test bug-1559004-EMLINK-handling.t is taking ~950 seconds at
times on master without mux, in mux cases, when it fails, it is almost
at the last iteration, hence bumping the timeout for this test case
to reduce regression error rates

Updates: bz#1577672
Change-Id: I1922675e112baca4c125c4c094eaa42a11e34e67
Signed-off-by: Atin Mukherjee

cluster/dht: fixes to parallel renames to same destination codepath

2018-05-07T06:04:10+00:00

Test case:
 # while true; do uuid="`uuidgen`"; echo "some data" > "test$uuid"; mv
   "test$uuid" "test" -f || break; echo "done:$uuid"; done

 This script was run in parallel from multiple mountpoints

Along the course of getting the above usecase working, many issues
were found:

Issue 1:
=======
consider a case of rename (src, dst). We can encounter a situation
where,
* dst is a file present at the time of lookup
* dst is removed by the time rename fop reaches glusterfs

In this scenario, acquring inodelk on dst fails with ESTALE resulting
in failure of rename. However, as per POSIX irrespective of whether
dst is present or not, rename should be successful. Acquiring entrylk
provides synchronization even in races like this.

Algorithm:
1. Take inodelks on src and dst (if dst is present) on respective
   cached subvols. These inodelks are done to preserve backward
   compatibility with older clients, so that synchronization is
   preserved when a volume is mounted by clients of different
   versions. Once relevant older versions (3.10, 3.12, 3.13) reach
   EOL, this code can be removed.
2. Ignore ENOENT/ESTALE errors of inodelk on dst.
3. protect namespace of src and dst. To protect namespace of a file,
   take inodelk on parent on hashed subvol, then take entrylk on the
   same subvol on parent with basename of file. inodelk on parent is
   done to guard against changes to parent layout so that hashed
   subvol won't change during rename.
4. 
5. unlock all locks

Issue 2:
========
linkfile creation in lookup codepath can race with a rename. Imagine
the following scenario:
* lookup finds a data-file with gfid - gfid-dst - without a
  corresponding linkto file on hashed-subvol. It decides to create
  linkto file with gfid - gfid-dst.
    - Note that some codepaths of dht-rename deletes linkto file of
      dst as first step. So, a lookup racing with an in-progress
      rename can easily run into this situation.
* a rename (src-path:gfid-src, dst-path:gfid-dst) renames data-file
  and hence gfid of data-file changes to gfid-src with path dst-path.
* lookup proceeds and creates linkto file - dst-path - with gfid -
  dst-gfid - on hashed-subvol.
* rename tries to create a linkto file dst-path with src-gfid on
  hashed-subvol, but it fails with EEXIST. But EEXIST is ignored
  during linkto file creation.

Now we've ended with dst-path having different gfids - dst-gfid on
linkto file and src-gfid on data file. Future lookups on dst-path will
always fail with ESTALE, due to differing gfids.

The fix is to synchronize linkfile creation in lookup path with rename
using the same mechanism of protecting namespace explained in solution
of Issue 1. Once locks are acquired, before proceeding with linkfile
creation, we check whether conditions for linkto file creation are
still valid. If not, we skip linkto file creation.

Issue 3:
========
gfid of dst-path can change by the time locks are acquired. This
means, either another rename overwrote dst-path or dst-path was
deleted and recreated by a different client. When this happens,
cached-subvol for dst can change. If rename proceeds with old-gfid and
old-cached subvol, we'll end up in inconsistent state(s) like dst-path
with different gfids on different subvols, more than one data-file
being present etc.

Fix is to do the lookup with a new inode after protecting namespace of
dst. Post lookup, we've to compare gfids and correct local state
appropriately to be in sync with backend.

Issue 4:
========
During revalidate lookup, if following a linkto file doesn't lead to a
valid data-file, local->cached-subvol was not reset to NULL. This
means we would be operating on a stale state which can lead to
inconsistency. As a fix, reset it to NULL before proceeding with
lookup everywhere.

Issue 5:
========
Stale dentries left out in inode table on brick resulted in failures
of link fop even though the file/dentry didn't exist on backend fs. A
patch is submitted to fix this issue. Please check the dependency tree
of current patch on gerrit for details

In short, we fix the problem by not blindly trusting the
inode-table. Instead we validate whether dentry is present by doing
lookup on backend fs.

Change-Id: I832e5c47d232f90c4edb1fafc512bf19bebde165
updates: bz#1543279
BUG: 1543279
Signed-off-by: Raghavendra G

tests: don't kill the process directly with KILL signal

2018-03-08T10:15:01+00:00

Instead send the SIGTERM (default, 15) first, and at the end
send SIGKILL. If SIGKILL is sent directly, we miss many tests
like valgrind, lcov etc., not able to process the information
properly.

BUG: 1549000
Change-Id: I664de12ee7dbf47eb98b8141004cd51f6006b314
Signed-off-by: Amar Tumballi

tests: fix bug-1483058-replace-brick-quorum-validation.t spurious failure

2017-11-12T11:28:13+00:00

Change-Id: I04c35305bfb663eabbf715eee78695adfd4a2d20
BUG: 1511310
Signed-off-by: Atin Mukherjee

cluster/ec: Implement DISCARD FOP for EC

2017-10-25T11:52:41+00:00

Updates #254

This code change implements DISCARD FOP support for
EC.

BUG: 1461018
Change-Id: I09a9cb2aa9d91ec27add4f422dc9074af5b8b2db
Signed-off-by: Sunil Kumar Acharya

glusterd: Gluster should keep PID file in correct location

2017-08-11T07:36:41+00:00

Currently Gluster keeps process pid information of all the daemons
and brick processes in Gluster configuration file directory
(ie., /var/lib/glusterd/*).

These pid files should be seperate from configuration files.
Deletion of the configuration file directory might result into serious problems.
Also, /var/run/gluster is the default placeholder directory for pid files.

So, with this fix Gluster will keep all process pid information of all
processes in /var/run/gluster/* directory.

Change-Id: Idb09e3fccb6a7355fbac1df31082637c8d7ab5b4
BUG: 1258561
Signed-off-by: Gaurav Kumar Garg 
Signed-off-by: Saravanakumar Arumugam 
Reviewed-on: https://review.gluster.org/13580
Tested-by: MOHIT AGRAWAL 
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Atin Mukherjee

tests: Minor fix in error condition

2017-08-02T13:58:32+00:00

Change-Id: I2dcc8d88234d2ce92dd8506c61cb84ab253decab
Signed-off-by: Rajesh Joseph 
Reviewed-on: https://review.gluster.org/16191
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Zhou Zhengping

cluster/ec: Non-disruptive upgrade on EC volume fails

2017-07-14T00:26:04+00:00

Problem:
Enabling optimistic changelog on EC volume was not
handling node down scenarios appropriately resulting
in volume data inaccessibility.

Solution:
Update dirty xattr appropriately on good bricks whenever
nodes are down. This would fix the metadata information
as part of heal and thus ensures data accessibility.

BUG: 1468261
Change-Id: I08b0d28df386d9b2b49c3de84b4aac1c729ac057
Signed-off-by: Sunil Kumar Acharya 
Reviewed-on: https://review.gluster.org/17703
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri

cluster/ec: Update xattr and heal size properly

2017-06-06T14:41:52+00:00

Problem-1 : Recursive healing of same file is happening
when IO is going on even after data heal completes.

Solution:
RCA: At the end of the write, when ec_update_size_version
gets called, we send it only on good bricks and not
on healing brick. Due to this, xattr on healing brick
will always remain out of sync and when the background
heal check source and sink, it finds this brick to be
healed and start healing from scratch. That involve
ftruncate and writing all of the data again.

To solve this, send xattrop on all the good bricks as
well as healing bricks.

Problem-2: The above fix exposes the data corruption
during heal. If the write on a file is going on and
heal finishes, we find that the file gets corrupted.

RCA:
The real problem happens in ec_rebuild_data(). Here we receive the
'size' argument which contains the real file size at the time of
starting self-heal and it's assigned to heal->total_size.

After that, a sequence of calls to ec_sync_heal_block() are done. Each
call ends up calling ec_manager_heal_block(), which does the actual work
of healing a block.

First a lock on the inode is taken in state EC_STATE_INIT using
ec_heal_inodelk(). When the lock is acquired, ec_heal_lock_cbk() is
called. This function calls ec_set_inode_size() to store the real size
of the inode (it uses heal->total_size).

The next step is to read the block to be healed. This is done using a
regular ec_readv(). One of the things this call does is to trim the
returned size if the file is smaller than the requested size.

In our case, when we read the last block of a file whose size was = 512
mod 1024 at the time of starting self-heal, ec_readv() will return only
the first 512 bytes, not the whole 1024 bytes.

This isn't a problem since the following ec_writev() sent from the heal
code only attempts to write the amount of data read, so it shouldn't
modify the remaining 512 bytes.

However ec_writev() also checks the file size. If we are writing the
last block of the file (determined by the size stored on the inode that
we have set to heal->total_size), any data beyond the (imposed) end of
file will be cleared with 0's. This causes the 512 bytes after the
heal->total_size to be cleared. Since the file was written after heal
started, the these bytes contained data, so the block written to the
damaged brick will be incorrect.

Solution:
Align heal->total_size to a multiple of the stripe size.

Thanks "Xavier Hernandez" 
to find out the root cause and to fix the issue.

Change-Id: I6c9f37b3ff9dd7f5dc1858ad6f9845c05b4e204e
BUG: 1428673
Signed-off-by: Ashish Pandey 
Reviewed-on: https://review.gluster.org/16985
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri 
Reviewed-by: Xavier Hernandez