glusterfs.git, branch v3.13.2

doc: Added release notes for 3.13.2

2018-01-20T00:15:42+00:00

Change-Id: I80f411f3820f82cb27fd5f8cf1cf99d5565d8b9d
BUG: 1530334
Signed-off-by: ShyamsundarR

selinux-xlator : validate dict before calling dict_rename_key()

2018-01-19T14:57:52+00:00

Upstream reference :
>Change-Id: I71da3b64e5e8c82e8842e119b2b05da3e2ace550
>BUG: 1535772
>Signed-off-by: Jiffin Tony Thottan 
>(cherry picked from commit bee06ccd7b80e3f5804f0c7c7c56936fed6d2b4e)

Change-Id: I71da3b64e5e8c82e8842e119b2b05da3e2ace550
BUG: 1536294

cluster/afr: Adding option to take full file lock

2018-01-19T14:24:52+00:00

Problem:
In replica 3 volumes there is a possibilities of ending up in split
brain scenario, when multiple clients writing data on the same file
at non overlapping regions in parallel.

Scenario:
- Initially all the copies are good and all the clients gets the value
  of data readables as all good.
- Client C0 performs write W1 which fails on brick B0 and succeeds on
  other two bricks.
- C1 performs write W2 which fails on B1 and succeeds on other two bricks.
- C2 performs write W3 which fails on B2 and succeeds on other two bricks.
- All the 3 writes above happen in parallel and fall on different ranges
  so afr takes granular locks and all the writes are performed in parallel.
  Since each client had data-readables as good, it does not see
  file going into split-brain in the in_flight_split_brain check, hence
  performs the post-op marking the pending xattrs. Now all the bricks
  are being blamed by each other, ending up in split-brain.

Fix:
Have an option to take either full lock or range lock on files while
doing data transactions, to prevent the possibility of ending up in
split brains. With this change, by default the files will take full
lock while doing IO. If you want to make use of the old range lock
change the value of "cluster.full-lock" to "no".

Change-Id: I7893fa33005328ed63daa2f7c35eeed7c5218962
BUG: 1535438
Signed-off-by: karthik-us

tests: Use /dev/urandom instead of /dev/random for dd

2018-01-19T06:50:16+00:00

If there's not enough entropy in the system then reading /dev/random would take
a significant time since it would take a long time for the /dev/random buffers
to get full as is desired in this dd run.
Milind found that this test file takes almost a 1000 seconds or more to pass
instead of just a minute because of this.

Backport of:
>BUG: 1431955

BUG: 1533023
Signed-off-by: Pranith Kumar K 
Change-Id: I9145b17f77f09d0ab71816ae249c69b8fe14c1a5

cluster/afr: Fixing the flaws in arbiter becoming source patch

2018-01-18T18:20:08+00:00

Problem:
Setting the write_subvol value to read_subvol in case of metadata
transaction during pre-op (commit 19f9bcff4aada589d4321356c2670ed283f02c03)
might lead to the original problem of arbiter becoming source.

Scenario:
1) All bricks are up and good
2) 2 writes w1 and w2 are in progress in parallel
3) ctx->read_subvol is good for all the subvolumes
4) w1 succeeds on brick0 and fails on brick1, yet to do post-op on
   the disk
5) read/lookup comes on the same file and refreshes read_subvols back
   to all good
6) metadata transaction happens which makes ctx->write_subvol to be
   assigned with ctx->read_subvol which is all good
7) w2 succeeds on brick1 and fails on brick0 and this will update the
   brick in reverse order leading to arbiter becoming source

Fix:
Instead of setting the ctx->write_subvol to ctx->read_subvol in the
pre-op statge, if there is a metadata transaction, check in the
function __afr_set_in_flight_sb_status() if it is a data/metadata
transaction. Use the value of ctx->write_subvol if it is a data
transactions and ctx->read_subvol value for other transactions.

With this patch we assign the value of ctx->write_subvol in the
afr_transaction_perform_fop() with the on disk value, instead of
assigning it in the afr_changelog_pre_op() with the in memory value.

Change-Id: Id2025a7e965f0578af35b1abaac793b019c43cc4
BUG: 1516313
Signed-off-by: karthik-us 
(cherry picked from commit ba149bac92d169ae2256dbc75202dc9e5d06538e)

cluster/ec: OpenFD heal implementation for EC

2018-01-18T18:18:53+00:00

Existing EC code doesn't try to heal the OpenFD to
avoid unnecessary healing of the data later.

Fix implements the healing of open FDs before
carrying out file operations on them by making an
attempt to open the FDs on required up nodes.

>BUG: 1431955
>Change-Id: Ib696f59c41ffd8d5678a484b23a00bb02764ed15
>Signed-off-by: Sunil Kumar Acharya 

Upstream Patch: https://review.gluster.org/#/c/17077/

BUG: 1533023
Change-Id: Ib696f59c41ffd8d5678a484b23a00bb02764ed15
Signed-off-by: Sunil Kumar Acharya

posix: delete stale gfid handles in nameless lookup

2018-01-16T20:18:35+00:00

..in order for self-heal of symlinks to work properly (see BZ for
details).

Backport of https://review.gluster.org/#/c/19070/
Signed-off-by: Ravishankar N 

Change-Id: I9a011d00b07a690446f7fd3589e96f840e8b7501
BUG: 1534842

cluster/dht: Add migration checks to dht_(f)xattrop

2018-01-10T08:51:05+00:00

The earlier backport was incorrect. Added the missing
lines of code.

The dht_(f)xattrop implementation did not implement
migration phase1/phase2 checks which could cause issues
with rebalance on sharded volumes.
This does not solve the issue where fops may reach the target
out of order.

> Change-Id: I2416fc35115e60659e35b4b717fd51f20746586c
> BUG: 1471031
> Signed-off-by: N Balachandran 

BUG: 1515434
Change-Id: I183d52530e0220e3007e73672991cb79b44c022a
Signed-off-by: N Balachandran

mount/fuse: use fstat in getattr implementation if any opened fd is available

2018-01-09T14:00:08+00:00

The restriction of using fds opened by the same Pid means fds cannot
be shared across threads of multithreaded application. Note that fops
from kernel have different Pid for different threads. Imagine
following sequence of operations:

* Turn off performance.open-behind
* Thread t1 opens an fd - fd1 - on file "file". Let's assume nodeid of
  "file" is "nodeid-file".
* Thread t2 does RENAME ("newfile", "file"). Let's assume nodeid of
  "newfile" as "nodeid-newfile".
* t2 proceeds to do fstat (fd1)

The above set of operations can sometimes result in ESTALE/ENOENT
errors. RENAME overwrites "file" with "newfile" changing its nodeid
from "nodeid-file" to "nodeid-newfile" and post RENAME, "nodeid-file" is
removed from the backend. If fstat carries nodeid-file as argument,
which can happen if lookup has not refreshed the nodeid of "file" and
since t2 doesn't have an fd opened, fuse_getattr_resume uses STAT
which will fail as "nodeid-file" no longer exists.

Since the above set of operations and sharing of fds across
multiple threads are valid, this is a bug.

The fix is to use any fd opened on the inode. In this specific example
fuse_getattr_resume will find fd1 and winds down the call as fstat
(fd1) which won't fail.

Cross-checked with "Miklos Szeredi"  for
any security issues with this solution and he approves the solution.

Thanks to "Miklos Szeredi"  for all the
pointers and discussions.

>Change-Id: I88dd29b3607cd2594eee9d72a1637b5346c8d49c
>BUG: 1510401
>Signed-off-by: Raghavendra G 

(cherry picked from commit 8b57378e5596f287a7b9d106dd6fb56a624b42ee)
Change-Id: I88dd29b3607cd2594eee9d72a1637b5346c8d49c
BUG: 1529084
Signed-off-by: Raghavendra G

glusterd: Nullify pmap entry for bricks belonging to same port

2018-01-09T13:59:10+00:00

Commit 30e0b86 tried to address all the stale port issues glusterd had
in case of a brick is abruptly killed. For brick multiplexing case
because of a bug the portmap entry was not getting removed. This patch
addresses the same.

>mainline patch : https://review.gluster.org/#/c/19119/

Change-Id: Ib020b967a9b92f1abae9cab9492f0cacec59aaa1
BUG: 1530449
Signed-off-by: Atin Mukherjee