glusterfs.git/xlators, branch v4.0.1

rpcsvc: scale rpcsvc_request_handler threads

2018-03-20T12:17:40+00:00

Scale rpcsvc_request_handler threads to match the scaling of event
handler threads.

Please refer to https://bugzilla.redhat.com/show_bug.cgi?id=1467614#c51
for a discussion about why we need multi-threaded rpcsvc request
handlers.

mainline:
> Reviewed-on: https://review.gluster.org/19337
> Reviewed-by: Raghavendra G 
> Signed-off-by: Milind Changire 
(cherry picked from commit 7d641313f46789ec0a7ba0cc04f504724c780855)

Change-Id: Ib6838fb8b928e15602a3d36fd66b7ba08999430b
BUG: 1550946
Signed-off-by: Milind Changire

protocol/client: fix memory corruption

2018-03-20T10:59:57+00:00

Some accesses to the saved_fds list were protected by the wrong
mutex (lock instead of fd_lock).

Additionally, the retrieval of fdctx from the fd's context and any
checks done on it are now also protected by fd_lock, so that fdctx
cannot become outdated just after it has been retrieved.
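
The locking rule described above can be sketched as follows. This is a
minimal illustration, not the client xlator's code: struct conf, struct
fdctx and both helpers are hypothetical stand-ins, and only the names
lock, fd_lock and saved_fds come from the commit message.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the client xlator's structures. */
struct fdctx {
    struct fdctx *next; /* link in the saved_fds list */
    int remote_fd;      /* fd number on the brick */
};

struct conf {
    pthread_mutex_t lock;    /* protects other state, NOT saved_fds */
    pthread_mutex_t fd_lock; /* the only mutex that protects saved_fds */
    struct fdctx *saved_fds;
};

static void conf_init(struct conf *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_mutex_init(&c->fd_lock, NULL);
    c->saved_fds = NULL;
}

/* Every list modification takes fd_lock, never lock. */
static void saved_fds_add(struct conf *c, struct fdctx *ctx)
{
    assert(ctx != NULL);
    pthread_mutex_lock(&c->fd_lock);
    ctx->next = c->saved_fds;
    c->saved_fds = ctx;
    pthread_mutex_unlock(&c->fd_lock);
}

/* Lookup and validation happen in one critical section, so the entry
 * cannot change between finding it and reading it. Returns the remote
 * fd, or -1 if the context is not in the list. */
static int saved_fds_get_remote_fd(struct conf *c, const struct fdctx *wanted)
{
    int remote_fd = -1;

    pthread_mutex_lock(&c->fd_lock);
    for (struct fdctx *it = c->saved_fds; it != NULL; it = it->next) {
        if (it == wanted) {
            remote_fd = it->remote_fd;
            break;
        }
    }
    pthread_mutex_unlock(&c->fd_lock);
    return remote_fd;
}
```

The value is copied out while fd_lock is held, so callers never
dereference a list entry outside the critical section.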

Backport of:
> BUG: 1553129

Change-Id: If2910508bcb7d1ff23debb30291391f00903a6fe
BUG: 1554235
Signed-off-by: Xavi Hernandez

cluster/ec: Change default read policy to gfid-hash

2018-03-19T08:52:04+00:00

Problem:
Whenever we read data from a file over NFS, NFS reads
more data than requested and caches it. It uses the
stat information to decide whether the cached/pre-read
data is still valid.

Consider a 4 + 2 EC volume with all the bricks on
different nodes.

In EC, with the round-robin read policy, reads are sent to
different sets of data bricks. This balances the read fops
across all the bricks and avoids overloading the same
set of bricks.

Due to small differences in clock speed, atime, mtime or
ctime can differ slightly between bricks. NFS may then
receive a different stat depending on which bricks served
the request, and discard cached/pre-read data that has
not actually changed and could still be used.

Solution:
Change the default read policy for EC to gfid-hash. That
forces all reads of a given file to go to the same set of bricks.
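
The difference between the two policies can be sketched as below. This
is only an illustration under assumptions: the function names are
hypothetical and FNV-1a is used as a stand-in hash, not necessarily the
one EC uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GFID_LEN 16

/* Round-robin: every read starts at the next brick, so consecutive
 * reads of the same file are served by different sets of bricks. */
static unsigned int rr_first_brick(unsigned int *counter, unsigned int nbricks)
{
    assert(nbricks > 0);
    return (*counter)++ % nbricks;
}

/* gfid-hash: the starting brick depends only on the file's gfid, so all
 * reads of one file go to the same set of bricks, while different files
 * still spread over the volume. FNV-1a is an assumption made for this
 * sketch. */
static unsigned int gfid_hash_first_brick(const uint8_t gfid[GFID_LEN],
                                          unsigned int nbricks)
{
    uint32_t h = 2166136261u; /* FNV-1a offset basis */

    assert(nbricks > 0);
    for (size_t i = 0; i < GFID_LEN; i++) {
        h ^= gfid[i];
        h *= 16777619u; /* FNV-1a prime */
    }
    return h % nbricks;
}
```

Because the result is a pure function of the gfid, the stat returned to
NFS for a given file always comes from the same bricks.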

>Change-Id: I825441cc519e94bf3dc3aa0bd4cb7c6ae6392c84
>BUG: 1554743
>Signed-off-by: Ashish Pandey 

Change-Id: I825441cc519e94bf3dc3aa0bd4cb7c6ae6392c84
BUG: 1557906
Signed-off-by: Ashish Pandey

glusterd: volume get fixes for client-io-threads & quorum-type

2018-03-16T13:39:18+00:00

1. If a replica volume created on glusterfs-3.8 was upgraded to
glusterfs-3.12, `gluster vol get volname client-io-threads` displayed
'on' even though it wasn't and the xlator wasn't loaded on
the client-graph. This was due to removing certain checks in
glusterd_get_default_val_for_volopt as a part of commit
47604fad4c2a3951077e41e0c007ceb979bb2c24. Fix it.

2. Also, as a part of op-version bump-up, client-io-threads was being
loaded on the clients during volfile regeneration. Prevent it.

3. AFR assumes quorum-type to be auto in newly created replica 3 (odd
replica in general) volumes but `gluster vol get volname quorum-type`
displays 'none'. Fix it.

Change-Id: I19e586361ed1065c70fb378533d3b4dac1095df9
BUG: 1552404
Signed-off-by: Ravishankar N 
(cherry picked from commit bd2c45fe3180fe36b042d5eabd348b6eaeb8d3e2)

cluster/ec: avoid delays in self-heal

2018-03-15T07:20:10+00:00

Self-heal creates a thread per brick to sweep the index looking for
files that need to be healed. These threads are started before the
volume comes online, so they do nothing but wait for the next
sweep, which happens once per minute.

When a replace brick command is executed, the new graph is loaded and
all index sweeper threads are started. When all bricks have reported, a
getxattr request is sent to the root directory of the volume. This
causes a heal on it (because the new brick doesn't have good data),
and marks its contents as pending to be healed. The contents are then
healed by the index sweeper thread on its next round, one minute later.

This patch solves this problem by waking all index sweeper threads
after a successful check on the root directory.

Additionally, the index sweep thread scans the index directory
sequentially, but it can happen that, after healing a directory entry,
more index entries are created but skipped by the current directory
scan. This causes the remaining entries to be processed on the next
round, one minute later. The same can happen in the next round, so
the heal runs in bursts and takes a long time to finish, especially
on volumes with many directory levels.

This patch solves this problem by immediately restarting the index
sweep if a directory has been healed.
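
The effect of restarting the sweep can be shown with a small model.
This is a sketch under assumptions, not the EC self-heal code: struct
heal_index, sweep_once() and heal_all() are hypothetical, and the model
simply assumes that healing a directory creates an index entry for the
next level that the current scan does not see.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEPTH 8

/* Hypothetical model: pending[d] is true when the directory at depth d
 * has an index entry and needs healing. */
struct heal_index {
    bool pending[MAX_DEPTH];
    int depth; /* number of directory levels in the model */
};

static bool any_pending(const struct heal_index *idx)
{
    for (int d = 0; d < idx->depth; d++)
        if (idx->pending[d])
            return true;
    return false;
}

/* One sequential scan. Entries created while the scan is running are
 * not seen by it. Returns true if a directory was healed. */
static bool sweep_once(struct heal_index *idx)
{
    bool snapshot[MAX_DEPTH];
    bool healed_dir = false;

    assert(idx->depth <= MAX_DEPTH);
    for (int d = 0; d < idx->depth; d++)
        snapshot[d] = idx->pending[d];

    for (int d = 0; d < idx->depth; d++) {
        if (!snapshot[d])
            continue;
        idx->pending[d] = false;
        healed_dir = true;
        if (d + 1 < idx->depth)
            idx->pending[d + 1] = true; /* new entry, missed by this scan */
    }
    return healed_dir;
}

/* Returns how many one-minute waits are needed to drain the index.
 * restart_on_heal selects the behaviour introduced by the patch. */
static int heal_all(struct heal_index *idx, bool restart_on_heal)
{
    int waits = 0;

    while (any_pending(idx)) {
        bool healed_dir = sweep_once(idx);
        if (any_pending(idx) && !(restart_on_heal && healed_dir))
            waits++; /* sleep until the next round */
    }
    return waits;
}
```

In this model a four-level tree needs three waits without the restart
and none with it, which matches the burst behaviour described above.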

Backport of:
> BUG: 1547662

Change-Id: I58d9ab6ef17b30f704dc322e1d3d53b904e5f30e
BUG: 1555198
Signed-off-by: Xavi Hernandez

protocol: Fix 4.0 client, parsing older iatt in dict

2018-03-11T10:22:22+00:00

In a mixed mode cluster involving 4.0 and older 3.x bricks, if
clients are newer, then the iatt encoded in the dictionary can be
of the older iatt format, which a newer client will map incorrectly
to the newer structure.

This causes failures in FOPs that depend on this iatt for some
functionality (seen in mkdir operations failing as EIO, when DHT
hits its internal setxattr call).

The fix provided is to convert the iatt in the dict, based on which
RPC version is used to communicate with the server.
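
A minimal sketch of version-dependent decoding follows. The layouts of
struct old_iatt and struct new_iatt, the rpc_version enum and the
function name are all hypothetical and far simpler than the real iatt
formats. The point is only that the decoder picks the wire format from
the RPC version instead of assuming the newer one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layouts, for illustration only. */
struct old_iatt {
    uint64_t ino;
    uint64_t size;
    uint32_t mtime;
    uint32_t mode;
};

struct new_iatt {
    uint64_t flags; /* field that does not exist in the old format */
    uint64_t ino;
    uint64_t size;
    int64_t mtime;
    uint32_t mode;
};

enum rpc_version { RPC_V3 = 3, RPC_V4 = 4 };

/* Decode an iatt stored as a raw dict value. The RPC version negotiated
 * with the server decides which layout the bytes are in. Returns 0 on
 * success and -1 if the length does not match the expected layout. */
static int iatt_from_dict_value(const void *buf, size_t len,
                                enum rpc_version ver, struct new_iatt *out)
{
    assert(buf != NULL && out != NULL);

    if (ver == RPC_V3) {
        struct old_iatt old;

        if (len != sizeof(old))
            return -1;
        memcpy(&old, buf, sizeof(old));
        memset(out, 0, sizeof(*out)); /* fields unknown to 3.x stay zero */
        out->ino = old.ino;
        out->size = old.size;
        out->mtime = old.mtime;
        out->mode = old.mode;
        return 0;
    }

    if (len != sizeof(*out))
        return -1;
    memcpy(out, buf, sizeof(*out));
    return 0;
}
```

Copying field by field, rather than casting the buffer, is what avoids
the incorrect mapping described above.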

IOW, this is the reverse of the change in commit "b966c7790e"

Tested using a mixed mode cluster (i.e., bricks in 3.12 and 4.0 versions)
and a mixed set of clients, 3.12 and 4.0 clients.

There is no regression test provided, as this needs a mixed mode cluster
to test and validate.

>Change-Id: I454e54651ca836b9f7c28f45f51d5956106aefa9
>BUG: 1554053
>Signed-off-by: ShyamsundarR 

Change-Id: I454e54651ca836b9f7c28f45f51d5956106aefa9
BUG: 1554077
Signed-off-by: ShyamsundarR 
Signed-off-by: Raghavendra G

protocol: Added iatt conversion to older format

2018-03-11T10:22:22+00:00

Added iatt conversion to an older format, when dealing with
older RPC versions. This enables iatt structure conformance
when dealing with older clients.

This helps fix rolling upgrade from 3.x versions to 4.0 version
of gluster by sending the right iatt in the dictionary when DHT
requests the same.

(cherry picked from commit b966c7790e35de353ae09ee48d4e2f55e0117f7e)

Change-Id: Ieaf925f81f8c7798a8fba1e90a59fa9dec82856c
BUG: 1551112
Signed-off-by: ShyamsundarR

build: address linkage issues

2018-03-06T13:43:53+00:00

We have the following undefined symbol errors from protocol/server.so:

  glusterfs_mgmt_pmap_signout
  glusterfs_autoscale_threads

See https://review.gluster.org/19225 (bz#1532238)
and https://review.gluster.org/19657 (bz#1550895)

IMO this is a cleaner solution. I.e. moving the above two functions
to libgfrpc (.../rpc/rpc-lib/...)

I would also, for (foolish) consistency's sake, like to see
glusterfs_mgmt_pmap_signin() moved from glusterfsd to libgfrpc as
well.

This works on f28/rawhide, with its new, more restrictive run-time
link semantics. The smoke and regression tests on earlier fedora and
centos will confirm that it works on those platforms too.

master: 1550895
master: https://review.gluster.org/19664

Change-Id: I9cfbd1cc15e7ebd9fc31b56ac791287fa2c584de
BUG: 1551640
Signed-off-by: Kaleb S. KEITHLEY

cluster/afr: Fix dict-leak in pre-op

2018-03-03T04:33:26+00:00

At the time of pre-op, pre_op_xdata is populated with the xattrs we get
from the disk. At the time of post-op it gets overwritten without
unreffing the previously stored value, which leads to a leak.
This is a regression we missed in
https://review.gluster.org/#/q/ba149bac92d169ae2256dbc75202dc9e5d06538e
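
The leak pattern and its fix can be sketched with a toy refcounted
object. Everything here is a hypothetical stand-in (struct dict, struct
local, the helpers and the live_dicts counter). Only the field name
pre_op_xdata comes from the commit message.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy refcounted object standing in for a dict. */
struct dict {
    int refcount;
};

static int live_dicts; /* allocations not yet freed, for demonstration */

static struct dict *dict_new_ref(void)
{
    struct dict *d = calloc(1, sizeof(*d));

    assert(d != NULL);
    d->refcount = 1;
    live_dicts++;
    return d;
}

static void dict_unref_sketch(struct dict *d)
{
    if (d == NULL)
        return;
    if (--d->refcount == 0) {
        free(d);
        live_dicts--;
    }
}

/* Hypothetical stand-in for the per-operation local state. */
struct local {
    struct dict *pre_op_xdata;
};

/* Store a new value, taking over the caller's reference. The previous
 * value is unreffed first. Skipping that step is the leak. */
static void set_pre_op_xdata(struct local *l, struct dict *xdata)
{
    if (l->pre_op_xdata != NULL)
        dict_unref_sketch(l->pre_op_xdata);
    l->pre_op_xdata = xdata;
}
```

Without the unref in set_pre_op_xdata(), the first dict would stay
allocated with no pointer left to release it.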

 >BUG: 1550078
 >Change-Id: I0456f9ad6f77ce6248b747964a037193af3a3da7
 >Signed-off-by: Pranith Kumar K 
 >(cherry picked from commit e7b79c59590c203c65f7ac8548b30d068c232d33)

BUG: 1550808
Change-Id: I0456f9ad6f77ce6248b747964a037193af3a3da7

protocol/server: Insert dummy clnt-lk-version to avoid upgrade failure

2018-03-02T18:57:48+00:00

This is required because older clients check for 'clnt-lk-version' in
the SETVOLUME callback when connected to newer servers. The change is
similar to what we have done via https://review.gluster.org/#/c/19560/.

(cherry picked from commit fecb0fc748806d4e6d61bcbef976acf473e55c82)

Change-Id: If333c20cf9503f40687ec926c44c7e50222c05b5
BUG: 1551112
Signed-off-by: Anoop C S