glusterfs.git/xlators, branch v3.10.9

glusterd: Nullify pmap entry for bricks belonging to same port

2018-01-03T15:19:03+00:00

Commit 30e0b86 tried to address all the stale port issues glusterd had
when a brick is abruptly killed. With brick multiplexing, however, a
bug prevented the portmap entry from being removed. This patch fixes
that.

>mainline patch : https://review.gluster.org/#/c/19119/

Change-Id: Ib020b967a9b92f1abae9cab9492f0cacec59aaa1
BUG: 1530450
Signed-off-by: Atin Mukherjee

mount/fuse: use fstat in getattr implementation if any opened fd is available

2018-01-03T14:54:19+00:00

The restriction of using fds opened by the same Pid means fds cannot
be shared across threads of a multithreaded application. Note that fops
from the kernel carry a different Pid for each thread. Imagine the
following sequence of operations:

* Turn off performance.open-behind
* Thread t1 opens an fd - fd1 - on file "file". Let's assume nodeid of
  "file" is "nodeid-file".
* Thread t2 does RENAME ("newfile", "file"). Let's assume the nodeid of
  "newfile" is "nodeid-newfile".
* t2 proceeds to do fstat (fd1)

The above set of operations can sometimes result in ESTALE/ENOENT
errors. RENAME overwrites "file" with "newfile", changing its nodeid
from "nodeid-file" to "nodeid-newfile", and after the RENAME
"nodeid-file" is removed from the backend. fstat can still carry
"nodeid-file" as its argument if lookup has not yet refreshed the
nodeid of "file". Since t2 has no fd of its own opened,
fuse_getattr_resume then uses STAT, which fails because "nodeid-file"
no longer exists.

Since the above set of operations and sharing of fds across
multiple threads are valid, this is a bug.

The fix is to use any fd opened on the inode. In this specific example
fuse_getattr_resume will find fd1 and wind the call down as fstat
(fd1), which won't fail.
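
The difference between the old and new fd selection can be sketched as
follows. This is only an illustrative model, not the fuse-bridge code:
the sketch_* types and functions are hypothetical stand-ins, and the
real fuse_getattr_resume works on glusterfs inode and fd structures.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical, simplified stand-ins for the real fd/inode structures. */
struct sketch_fd {
    pid_t pid;              /* pid of the thread that opened the fd */
    struct sketch_fd *next; /* next fd opened on the same inode */
};

struct sketch_inode {
    struct sketch_fd *fd_list; /* all fds currently open on the inode */
};

enum sketch_op { SKETCH_STAT, SKETCH_FSTAT };

/* Old behaviour: only an fd opened by the calling pid qualifies. */
struct sketch_fd *
sketch_lookup_fd_same_pid(const struct sketch_inode *inode, pid_t pid)
{
    assert(inode != NULL);
    for (struct sketch_fd *fd = inode->fd_list; fd != NULL; fd = fd->next) {
        if (fd->pid == pid)
            return fd;
    }
    return NULL;
}

/* New behaviour: any fd opened on the inode qualifies. */
struct sketch_fd *
sketch_lookup_fd_any(const struct sketch_inode *inode)
{
    assert(inode != NULL);
    return inode->fd_list;
}

/* getattr becomes fstat whenever some fd is open, and stat otherwise. */
enum sketch_op
sketch_choose_getattr_op(const struct sketch_inode *inode)
{
    return sketch_lookup_fd_any(inode) != NULL ? SKETCH_FSTAT : SKETCH_STAT;
}
```

In the scenario above, fd1 was opened by t1, so a same-pid lookup on
behalf of t2 finds nothing, while the any-fd lookup returns fd1 and the
call goes out as fstat.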

Cross-checked with Miklos Szeredi for any security issues with this
solution, and he approves it.

Thanks to Miklos Szeredi for all the pointers and discussions.

>Change-Id: I88dd29b3607cd2594eee9d72a1637b5346c8d49c
>BUG: 1510401
>Signed-off-by: Raghavendra G 

(cherry picked from commit 8b57378e5596f287a7b9d106dd6fb56a624b42ee)
Change-Id: I88dd29b3607cd2594eee9d72a1637b5346c8d49c
BUG: 1529086
Signed-off-by: Raghavendra G

Revert "mount/fuse: report ESTALE as ENOENT"

2018-01-02T16:21:38+00:00

This reverts commit 26d16b90ec7f8acbe07e56e8fe1baf9c9fa1519e.

Consider rename (index.new, store.idx) and open (store.idx) being
executed in parallel. Broken down into individual operations, the
following sequence is possible:

* lookup (store.idx) - as part of open(store.idx) returns gfid1 as the
  result.
* rename (index.new, store.idx) changes gfid of store.idx to
  gfid2. Note that gfid2 was the nodeid of index.new. Since rename is
  successful, gfid2 is associated with store.idx.
* open (store.idx) resumes and issues open fop to glusterfs with
  gfid1. open in glusterfs fails as gfid1 doesn't exist and the error
  returned by glusterfs to kernel-fuse is ENOENT.
* kernel passes the same error back to the application as the result
  of open.

This error could have been prevented if the kernel had retried open
with gfid2. Interestingly, the kernel does retry open when it receives
an ESTALE error. Even though the failure to find the gfid resulted in
ESTALE, commit 26d16b90ec7f8acb converted that error to ENOENT while
sending the error reply to the kernel. This prevented the kernel from
retrying open, so the application saw the error.
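
The effect of the ESTALE to ENOENT conversion on the retry can be
modelled with a small sketch. Everything here is hypothetical (the
sketch_* names, the integer gfids, the single retry); it only mirrors
the sequence described above and is not kernel or glusterfs code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: the name store.idx currently resolves to one gfid. */
struct sketch_fs {
    int current_gfid;
};

/* Revalidating lookup: returns the gfid the name resolves to right now. */
int
sketch_lookup(const struct sketch_fs *fs)
{
    assert(fs != NULL);
    return fs->current_gfid;
}

/*
 * Filesystem side of open. Returns 0 on success or a positive errno.
 * A missing gfid is ESTALE, unless the reverted conversion is enabled,
 * in which case it is reported as ENOENT.
 */
int
sketch_fs_open(const struct sketch_fs *fs, int gfid, bool estale_as_enoent)
{
    assert(fs != NULL);
    if (gfid == fs->current_gfid)
        return 0;
    return estale_as_enoent ? ENOENT : ESTALE;
}

/* Kernel side: open with the cached gfid; on ESTALE, look up and retry. */
int
sketch_kernel_open(const struct sketch_fs *fs, int cached_gfid,
                   bool estale_as_enoent)
{
    int err = sketch_fs_open(fs, cached_gfid, estale_as_enoent);

    if (err == ESTALE) {
        int fresh_gfid = sketch_lookup(fs);
        err = sketch_fs_open(fs, fresh_gfid, estale_as_enoent);
    }
    return err;
}
```

With the conversion enabled, an open that raced with the rename ends in
ENOENT; with it reverted, the same open is retried with the fresh gfid
and succeeds.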

>Change-Id: I2e752ca60dd8af1b989dd1d29c7b002ee58440b4
>BUG: 1500269
>Signed-off-by: Raghavendra G 

(cherry picked from commit 019a55e708375d2b1e576fcc948a691bcdc5c749)
Change-Id: I2e752ca60dd8af1b989dd1d29c7b002ee58440b4
BUG: 1529089
Signed-off-by: Raghavendra G

mount/fuse: never fail open(dir) with ENOENT

2018-01-02T16:20:39+00:00

open(dir), being an operation on an inode, should never fail with
ENOENT. If the gfid is not present, the appropriate error is ESTALE.
This enables the kernel to retry open after a revalidate lookup.
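
As a minimal sketch, the error translation described here amounts to
the mapping below. The function name is hypothetical and the actual
patch may apply the mapping differently.

```c
#include <errno.h>

/*
 * Hypothetical helper: the errno reported to the kernel for open(dir).
 * ENOENT is replaced by ESTALE; every other value passes through.
 */
int
sketch_opendir_errno(int op_errno)
{
    return op_errno == ENOENT ? ESTALE : op_errno;
}
```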

>Change-Id: I8d07d2ebb5a0da6c3ea478317442cb42f1797a4b
>BUG: 1500269
>Signed-off-by: Raghavendra G 

(cherry picked from commit fb4b914ce84bc83a5f418719c5ba7c25689a9251)
Change-Id: I8d07d2ebb5a0da6c3ea478317442cb42f1797a4b
BUG: 1529089
Signed-off-by: Raghavendra G

performance/write-behind: fix bug while handling short writes

2017-12-26T10:40:19+00:00

The variable "fulfilled" in wb_fulfill_short_write is not reset to 0
while handling each member of the list.

This has some interesting consequences:

* If we break from the loop while processing the last member of the
  list head->winds, req is reset to head as the list is a circular
  one. However, head is already fulfilled and can potentially be
  freed. So, we end up adding a freed request to the wb_inode->todo
  list. This is the RCA for the crash tracked by the bug associated
  with this patch (note that we saw "holder", which is freed, in the
  todo list).

* If we break from the loop while processing any member of the list
  head->winds other than the last, req is set to the next member in
  the list, skipping the current request, even though it is not
  entirely synced. This can lead to data corruption.

The fix is simple: change the code so that "fulfilled" reflects only
whether the current request is fulfilled, and carries no history of
previous requests in the list.
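
The idea of a per-request "fulfilled" flag can be sketched as below.
This is a simplified model under stated assumptions: it uses a plain
array instead of the circular head->winds list, and sketch_req and
sketch_fulfill_short_write are hypothetical names, not the
write-behind code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued write request. */
struct sketch_req {
    size_t size; /* bytes still to be written for this request */
    bool done;   /* true once the request is completely written */
};

/*
 * Account "written" bytes of a short write against the requests in
 * order. Returns the index of the first request that still has to be
 * retried, or n if every request was fulfilled.
 */
size_t
sketch_fulfill_short_write(struct sketch_req *reqs, size_t n, size_t written)
{
    size_t i;

    assert(reqs != NULL || n == 0);

    for (i = 0; i < n; i++) {
        /* Computed afresh for each request; no state from earlier ones. */
        bool fulfilled = (written >= reqs[i].size);

        if (!fulfilled) {
            /* Partially written: keep only the unwritten remainder. */
            reqs[i].size -= written;
            break;
        }

        written -= reqs[i].size;
        reqs[i].done = true;
    }

    return i;
}
```

Because the flag is recomputed on every iteration, the request at which
the loop stops is always the one that is not yet fully written, so it
is neither skipped nor confused with an earlier, already fulfilled
request.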

>Change-Id: Ia3d6988175a51c9e08efdb521a7b7938b01f93c8
>BUG: 1528558
>Signed-off-by: Raghavendra G 

(cherry picked from commit 0bc22bef7f3c24663aadfb3548b348aa121e3047)
Change-Id: Ia3d6988175a51c9e08efdb521a7b7938b01f93c8
BUG: 1529096
Signed-off-by: Raghavendra G

glusterd: Free up svc->conn on volume delete

2017-12-07T05:01:34+00:00

Daemons like snapd, tierd and gfproxyd are maintained on per volume
basis and on a volume delete we should destroy the rpc connection
established for them.

>mainline patch : https://review.gluster.org/#/c/18957

Change-Id: Id1440e39da07b990fdb9b207df18da04b1ca8014
BUG: 1523050
Signed-off-by: Atin Mukherjee

features/locks: Fix memory leaks

2017-11-27T18:12:28+00:00

Backport of:
> BUG: 1515161

Change-Id: Ic1d2e17a7d14389b6734d1b88bd28c0a2907bbd6
BUG: 1517682
Signed-off-by: Xavier Hernandez

cluster/afr: Honor default timeout of 5min for analyzing split-brain files

2017-11-27T13:50:23+00:00

Problem:
Setting the split-brain-choice option to analyze a file and resolve
the split brain, using the command
"setfattr -n replica.split-brain-choice -v "choiceX" "
should allow access to the file from the mount for the default timeout
of 5 minutes. But the timeout was not honored, and the file remained
accessible even after the timeout.

Fix:
Call inode_invalidate() in afr_set_split_brain_choice_cbk() so that
it triggers cache invalidation after resetting the timer and the
split-brain choice. Subsequent attempts to access the file then fail
with EIO.
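
A rough model of why the invalidation matters is sketched below. It
assumes that, without invalidation, cached state keeps serving the
file after the choice is reset; that is an interpretation of the
description above, and all sketch_* names are hypothetical rather than
AFR code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NO_CHOICE (-1)

/* Hypothetical per-inode state. */
struct sketch_inode {
    int spb_choice;   /* chosen replica index, or SKETCH_NO_CHOICE */
    bool cache_valid; /* true while cached data may be served */
};

/* Access the file: returns 0 on success or EIO for a split-brain file. */
int
sketch_access(struct sketch_inode *inode)
{
    assert(inode != NULL);
    if (inode->cache_valid)
        return 0; /* served from cache, choice is not consulted */
    if (inode->spb_choice == SKETCH_NO_CHOICE)
        return EIO;
    inode->cache_valid = true; /* a successful access populates the cache */
    return 0;
}

/* Timeout handler: reset the choice and optionally invalidate the cache. */
void
sketch_choice_timeout(struct sketch_inode *inode, bool invalidate)
{
    assert(inode != NULL);
    inode->spb_choice = SKETCH_NO_CHOICE;
    if (invalidate)
        inode->cache_valid = false;
}
```

In this model, a timeout without invalidation leaves the file
accessible, while a timeout with invalidation makes the next access
fail with EIO.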

Change-Id: I698cb833676b22ff3e4c6daf8b883a0958f51a64
BUG: 1514388
Signed-off-by: karthik-us 
(cherry picked from commit 933ec57ccda2c1ba5ce6f207313c3b6802e67ca3)

glusterd: use sys_lstat instead of lstat

2017-11-01T10:44:57+00:00

Showed up in 0symbol-check.t while testing something else.  Might as
well fix it now.

> Signed-off-by: Jeff Darcy 
> Reviewed-on: https://review.gluster.org/16820
> Reviewed-by: Prashanth Pai 

(cherry picked from commit 9ed98f23564387c5b436a0c6ec6d4393f970dcb9)
BUG: 1508036
Change-Id: Ic6b8214de6f486187afc4987c5ffbbca02c8997f

glusterd: delete source brick only once in reset-brick commit force

2017-10-31T18:08:14+00:00

While stopping the brick which is to be reset and replaced, the
delete_brick flag was passed as true, which caused glusterd to free the
source brick before the actual operation. As a result, commit force
fails because it cannot find the source brickinfo.

> mainline patch : https://review.gluster.org/#/c/18581/

Change-Id: I1aa7508eff7cc9c9b5d6f5163f3bb92736d6df44
BUG: 1507880
Signed-off-by: Atin Mukherjee 
(cherry picked from commit 0fb8acaa6ff80c43e46deac0ce66b29ae0df0ca4)