glusterfs.git, branch v3.12.8

Release notes for 3.12.8

2018-04-12T17:07:38+00:00

Change-Id: If49d876df9b96acbc20c6378cea4ba0fed386b9f
BUG: 1564465
Signed-off-by: Jiffin Tony Thottan

features/index: Choose different base file on EMLINK error

2018-04-12T05:20:36+00:00

Change-Id: I4648816af908539efdc2528608aa2ebf7f0d0e2f
fixes: bz#1565655
Signed-off-by: Pranith Kumar K 
(cherry picked from commit bb12f2109a01856e8184e13cf984210d20155b13)

timer: Fix possible race during cleanup

2018-04-10T11:22:50+00:00

As mentioned in bug 1509189, there is a possible race
between gf_timer_cancel(), gf_timer_proc() and
gf_timer_registry_destroy(), leading to a use-after-free.

Problem:

1) gf_timer_proc() is called, locks reg, and gets an event.
It unlocks reg, and calls the callback.

2) Meanwhile gf_timer_registry_destroy() is called, and removes
reg from ctx, and joins on gf_timer_proc().

3) gf_timer_call_cancel() is called on the event being
processed.  It cannot find reg (since it's been removed from ctx),
so it frees event.

4) the callback returns into gf_timer_proc(), which tries to free
event, but it has already been freed, resulting in a double free.

Solution:
The fix is to bail out in gf_timer_cancel() when the registry
is not found. The reasoning is that gf_timer_cancel() is only
ever called on an existing event, so a valid registry must have
existed when that event was created. The only reason the registry
cannot be found now is that it was set to NULL when context
cleanup started.
Since gf_timer_proc() takes care of releasing all the remaining
events active on that registry, it seems safe to bail out
in gf_timer_cancel().
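
The bail-out can be modelled with a minimal Python sketch. This is
not the C code from the patch: Registry, Context, timer_cancel and
the freed list are hypothetical stand-ins, and the only behaviour
taken from the description above is that cancel returns without
freeing the event when the registry is no longer attached to ctx.

```python
import threading


class Registry:
    """Hypothetical stand-in for the timer registry (reg)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.events = []


class Context:
    """Hypothetical stand-in for ctx."""

    def __init__(self):
        # Assumption: set to None once context cleanup has started.
        self.timer = Registry()


def timer_cancel(ctx, event, freed):
    """Return True only if this call released the event."""
    reg = ctx.timer
    if reg is None:
        # Registry is gone, so cleanup is in progress and the timer
        # thread owns the remaining events. Bail out without freeing.
        return False
    with reg.lock:
        if event not in reg.events:
            # Already picked up by the timer thread.
            return False
        reg.events.remove(event)
    freed.append(event)  # models freeing the event exactly once
    return True
```

With the registry detached, timer_cancel() leaves the event alone,
so the thread that is still processing it remains the only one that
releases it.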

master https://review.gluster.org/18652
master BZ: 1509189

Change-Id: Ia9b088533141c3bb335eff2fe06b52d1575bb34f
BUG: 1565590
Reported-by: Daniel Gryniewicz 
Signed-off-by: Soumya Koduri 
Signed-off-by: Kaleb S. KEITHLEY

cluster/dht: Skipped files are not treated as errors

2018-04-06T12:50:35+00:00

For skipped files, use a return value of 1 to prevent
error messages being logged.
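
A minimal Python sketch of the idea, not the DHT code itself: only
the return value of 1 for skipped files comes from this message.
The 0 and -1 values, the function names and the counters are
assumptions made for illustration.

```python
import logging

log = logging.getLogger("rebalance-sketch")

# Only SKIPPED = 1 is taken from the commit message; the other two
# values are assumptions for this sketch.
MIGRATED, SKIPPED, FAILED = 0, 1, -1


def migrate_file(path, should_skip, can_migrate):
    """Hypothetical per-file migration step."""
    if should_skip(path):
        return SKIPPED
    return MIGRATED if can_migrate(path) else FAILED


def rebalance(paths, should_skip, can_migrate):
    """Count outcomes, logging an error only for real failures."""
    counts = {"migrated": 0, "skipped": 0, "failed": 0}
    for path in paths:
        ret = migrate_file(path, should_skip, can_migrate)
        if ret < 0:
            log.error("migration failed for %s", path)
            counts["failed"] += 1
        elif ret == SKIPPED:
            counts["skipped"] += 1  # positive value: no error logged
        else:
            counts["migrated"] += 1
    return counts
```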

> Change-Id: I18de31ac1a64d4460e88dea7826c3ba03c895861
> BUG: 1553598
> Signed-off-by: N Balachandran 

Change-Id: I18de31ac1a64d4460e88dea7826c3ba03c895861
BUG: 1555161
Signed-off-by: N Balachandran

cluster/afr: Prevent ping-event handling on shd

2018-04-06T12:50:03+00:00

On shd, we shouldn't treat any brick as down based
on latency; otherwise self-heal will never happen.

fixes: 1562723
Change-Id: Ica07fcc4fae91a6bfd9c9a670e2be464704d94b7
BUG: 1562723
Signed-off-by: Pranith Kumar K

cluster/ec: send list-node-uuids request to all subvolumes

2018-04-06T12:49:33+00:00

The xattr trusted.glusterfs.list-node-uuids was only sent to a single
subvolume. As a result, null uuids were returned for the other
subvolumes, as if they were down.

This fix forces that xattr to be requested from all subvolumes.
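
The difference can be illustrated with a small Python model. It is
a sketch under assumptions, not the EC implementation: subvolumes
are represented as a list of uuid strings (None for a subvolume
that is really down), and the all-zero NULL_UUID string and both
function names are hypothetical.

```python
# Assumed textual form of a null uuid, for illustration only.
NULL_UUID = "00000000-0000-0000-0000-000000000000"


def query_single(subvols, chosen=0):
    """Old behaviour as described: only one subvolume is asked."""
    return [
        uuid if (i == chosen and uuid) else NULL_UUID
        for i, uuid in enumerate(subvols)
    ]


def query_all(subvols):
    """Fixed behaviour: every subvolume is asked for its uuid."""
    return [uuid if uuid else NULL_UUID for uuid in subvols]
```

In this model, query_single() reports healthy subvolumes as null,
while query_all() returns a null uuid only for subvolumes that are
actually down.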

Backport of:
> BUG: 1561406

Change-Id: If62eb39a6857258923ba625e153d4ad79018ea2f
BUG: 1561731
Signed-off-by: Xavi Hernandez

cluster/ec: Change default read policy to gfid-hash

2018-04-06T12:49:09+00:00

Problem:
Whenever we read data from a file over NFS, NFS reads
more data than requested and caches it. Based on the
stat information, it checks whether the cached/pre-read
data is still valid.

Consider a 4 + 2 EC volume where all the bricks are on
different nodes.

In EC, with the round-robin read policy, reads are sent to
different sets of data bricks. This balances the read fops
across all the bricks and avoids heating up (overloading)
the same set of bricks.

Due to small differences in clock speed, it is possible
that atime, mtime or ctime differ slightly between
bricks. That can cause a different stat to be returned
to NFS, based on which NFS will discard cached/pre-read
data that has not actually changed and could still be
used.

Solution:
Change the default read policy for EC to gfid-hash. That
forces all reads of a file to go to the same set of bricks.
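
The two policies can be contrasted with a minimal Python sketch.
It is an illustration under assumptions, not the EC code: the use
of crc32 as the hash, the way a starting index is turned into a
set of bricks, and all names below are hypothetical. Only the
4 + 2 layout and the policy names come from the text above.

```python
import itertools
import uuid
import zlib

N_BRICKS, K_DATA = 6, 4  # the 4 + 2 volume from the example


def bricks_from(start):
    """Pick K_DATA consecutive bricks beginning at index start."""
    return tuple((start + i) % N_BRICKS for i in range(K_DATA))


_counter = itertools.count()


def round_robin(_gfid):
    """Each read starts at the next brick, whatever the file is."""
    return bricks_from(next(_counter) % N_BRICKS)


def gfid_hash(gfid):
    """Reads of one file always start at the same brick."""
    # crc32 is an assumed hash, chosen only for this sketch.
    return bricks_from(zlib.crc32(gfid.bytes) % N_BRICKS)
```

Two consecutive reads of the same gfid hit different brick sets
under round_robin() but the same set under gfid_hash(), so the
stat seen by NFS comes from the same bricks every time.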

>Change-Id: I825441cc519e94bf3dc3aa0bd4cb7c6ae6392c84
>BUG: 1554743
>Signed-off-by: Ashish Pandey 

Change-Id: I825441cc519e94bf3dc3aa0bd4cb7c6ae6392c84
BUG: 1558352
Signed-off-by: Ashish Pandey

cluster/ec: avoid delays in self-heal

2018-04-06T12:48:38+00:00

Self-heal creates a thread per brick to sweep the index looking for
files that need to be healed. These threads are started before the
volume comes online, so they find nothing to do and simply wait for
the next sweep, which happens once per minute.

When a replace brick command is executed, the new graph is loaded and
all index sweeper threads are started. When all bricks have reported,
a getxattr request is sent to the root directory of the volume. This
causes a heal on it (because the new brick doesn't have good data),
and marks its contents as pending to be healed. Those contents are
only healed by the index sweeper thread on its next round, one minute
later.

This patch solves this problem by waking all index sweeper threads
after a successful check on the root directory.

Additionally, the index sweep thread scans the index directory
sequentially, but it might happen that after healing a directory entry
more index entries are created but skipped by the current directory
scan. This causes the remaining entries to be processed on the next
round, one minute later. The same can happen in the next round, so
the heal runs in bursts and takes a long time to finish, especially
on volumes with many directory levels.

This patch solves this problem by immediately restarting the index
sweep if a directory has been healed.
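
The restart logic can be sketched in Python as follows. This models
only the second fix (restarting the sweep after a directory heal)
and is not the EC self-heal code: the index is a plain list, heal
is a hypothetical callback returning the new index entries plus a
flag saying whether the healed entry was a directory.

```python
def sweep_once(index, heal):
    """Scan a snapshot of the index.

    Returns True if at least one directory was healed. Entries added
    while scanning are not part of the snapshot, which models the
    entries "skipped by the current directory scan".
    """
    healed_dir = False
    for entry in list(index):
        index.remove(entry)
        new_entries, is_dir = heal(entry)
        index.extend(new_entries)
        healed_dir = healed_dir or is_dir
    return healed_dir


def sweep_until_idle(index, heal):
    """Restart immediately while directories keep being healed."""
    rounds = 1
    while sweep_once(index, heal):
        rounds += 1  # restart now instead of one minute later
    return rounds
```

For a tree with two directory levels this takes three back-to-back
rounds, where the old behaviour would have waited one minute
between each of them.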

Backport of:
> BUG: 1547662

Change-Id: I58d9ab6ef17b30f704dc322e1d3d53b904e5f30e
BUG: 1555201
Signed-off-by: Xavi Hernandez

extras/hooks: Fix S10selinux-label-brick.sh hook script

2018-04-06T12:47:56+00:00

* script was failing due to a syntax error
* shellcheck issues fixed
* improved performance: semanage & restorecon are run once per unique path
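
The performance point in the last item can be sketched in Python
(the hook itself is a shell script, so this is only a model). The
function names are hypothetical and label stands in for the
semanage and restorecon invocations, which are not reproduced here.

```python
def unique_paths(brick_paths):
    """Drop duplicate paths while keeping first-seen order."""
    return list(dict.fromkeys(brick_paths))


def label_bricks(brick_paths, label):
    """Call label once per unique brick path."""
    for path in unique_paths(brick_paths):
        label(path)
```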

Upstream reference:
>Change-Id: I58b357d9fd37586004a2a518f7a5d1c5c9ddd7e3
>BUG: 1533342
>Signed-off-by: Milan Zink 

Change-Id: I58b357d9fd37586004a2a518f7a5d1c5c9ddd7e3
BUG: 1546627
Signed-off-by: Jiffin Tony Thottan

glusterfsd: Memleak in glusterfsd process while brick mux is on

2018-04-06T12:47:34+00:00

Problem: When a volume is stopped while brick multiplex is
         enabled, memory is not cleaned up for all server side
         xlators.

Solution: To clean up memory for all server side xlators, call fini
          in glusterfs_handle_terminate after sending the
          GF_EVENT_CLEANUP notification to the top xlator.
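
The ordering described in the solution can be modelled with a short
Python sketch. It is not the glusterfsd code: the Xlator class, the
handle_terminate function and the notify_cleanup callback are
hypothetical stand-ins, and the graph walk is an assumed way of
reaching every xlator below the top one.

```python
class Xlator:
    """Hypothetical stand-in for a server side xlator."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.cleaned = False

    def fini(self):
        self.cleaned = True  # models releasing the xlator's memory


def handle_terminate(top, notify_cleanup):
    """Notify the top xlator first, then call fini on every xlator."""
    notify_cleanup(top)  # stands in for the GF_EVENT_CLEANUP step
    stack = [top]
    while stack:
        xl = stack.pop()
        xl.fini()
        stack.extend(xl.children)
```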

> BUG: 1544090
> Signed-off-by: Mohit Agrawal 
> (cherry picked from commit 7c3cc485054e4ede1efb358552135b432fb7047a)

>Note: Ran all test cases in a separate build (https://review.gluster.org/19574)
>      with the same patch after forcefully enabling brick mux; all test
>      cases passed.

BUG: 1549473
Signed-off-by: Mohit Agrawal 
Change-Id: Ia10dc7f2605aa50f2b90b3fe4eb380ba9299e2fc