glusterfs.git/rpc/rpc-lib, branch testing-regression-job

rpc/transport: Missing a ref on dict while creating transport object

2019-03-20T13:24:44+00:00

while creating rpc_tranpsort object, we store a dictionary without
taking a ref on dict but it does an unref during the cleaning of the
transport object.

So the rpc layer expect the caller to take a ref on the dictionary
before passing dict to rpc layer. This leads to a lot of confusion
across the code base and leads to ref leaks.

Semantically, this is not correct. It is the rpc layer responsibility
to take a ref when storing it, and free during the cleanup.

I'm listing down the total issues or leaks across the code base because
of this confusion. These issues are currently present in the upstream
master.

1) changelog_rpc_client_init

2) quota_enforcer_init

3) rpcsvc_create_listeners : when there are two transport, like tcp,rdma.

4) quotad_aggregator_init

5) glusterd: init

6) nfs3_init_state

7) server: init

8) client:init

This patch does the cleanup according to the semantics.

Change-Id: I46373af9630373eb375ee6de0e6f2bbe2a677425
updates: bz#1659708
Signed-off-by: Mohammed Rafi KC

core: implement a global thread pool

2019-02-18T02:58:24+00:00

This patch implements a thread pool that is wait-free for adding jobs to
the queue and uses a very small locked region to get jobs. This makes it
possible to decrease contention drastically. It's based on wfcqueue
structure provided by urcu library.

It automatically enables more threads when load demands it, and stops
them when not needed. There's a maximum number of threads that can be
used. This value can be configured.

Depending on the workload, the maximum number of threads plays an
important role. So it needs to be configured for optimal performance.
Currently the thread pool doesn't self adjust the maximum for the
workload, so this configuration needs to be changed manually.

For this reason, the global thread pool has been made optional, so that
volumes can still use the thread pool provided by io-threads.

To enable it for bricks, the following option needs to be set:

   config.global-threading = on

This option has no effect if bricks are already running. A restart is
required to activate it. It's recommended to also enable the following
option when running bricks with the global thread pool:

   performance.iot-pass-through = on

To enable it for a FUSE mount point, the option '--global-threading'
must be added to the mount command. To change it, an umount and remount
is needed. It's recommended to disable the following option when using
global threading on a mount point:

   performance.client-io-threads = off

To enable it for services managed by glusterd, glusterd needs to be
started with option '--global-threading'. In this case all daemons, like
self-heal, will be using the global thread pool.

Currently it can only be enabled for bricks, FUSE mounts and glusterd
services.

The maximum number of threads for clients and bricks can be configured
using the following options:

   config.client-threads
   config.brick-threads

These options can be applied online and its effect is immediate most of
the times. If one of them is set to 0, the maximum number of threads
will be calcutated as #cores * 2.

Some distributions use a very old userspace-rcu library (version 0.7)
for this reason, some header files from version 0.10 have been copied
into contrib/userspace-rcu and are used if the detected version is 0.7
or older.

An additional change has been made to io-threads to prevent that threads
are started when iot-pass-through is set.

Change-Id: I09d19e246b9e6d53c6247b29dfca6af6ee00a24b
updates: #532
Signed-off-by: Xavi Hernandez

clnt/rpc: ref leak during disconnect.

2019-02-12T07:05:58+00:00

During disconnect cleanup, we are not cancelling reconnect
timer, which causes a ref leak each time when a disconnect
happen.

Change-Id: I9d05d1f368d080e04836bf6a0bb018bf8f7b5b8a
updates: bz#1659708
Signed-off-by: Mohammed Rafi KC

Multiple files: reduce work while under lock.

2019-01-29T09:27:22+00:00

Mostly, unlock before logging.
In some cases, moved different code that was not needed
to be under lock (for example, taking time, or malloc'ing)
to be executed before taking the lock.

Note: logging might be slightly less accurate in order, since it may
not be done now under the lock, so order of logs is racy. I think
it's a reasonable compromise.

Compile-tested only!
updates: bz#1193929
Signed-off-by: Yaniv Kaul 

Change-Id: I2438710016afc9f4f62a176ef1a0d3ed793b4f89

core: heketi-cli is throwing error "target is busy"

2019-01-24T06:54:41+00:00

Problem: At the time of deleting block hosting volume
         through heketi-cli , it is throwing an error "target is busy".
         cli is throwing an error because brick is not detached successfully
         and brick is not detached due to race condition to cleanp xprt
         associated with detached brick

Solution: To avoid xprt specifc race condition introduce an atomic flag
          on rpc_transport

Change-Id: Id4ff1fe8375a63be71fb3343f455190a1b8bb6d4
fixes: bz#1668190
Signed-off-by: Mohit Agrawal

rpc: use address-family option from vol file

2019-01-22T13:47:19+00:00

This patch helps enable IPv6 connections in the cluster.
The default address-family is IPv4 without using this option explicitly.

When address-family is set to "inet6" in the /etc/glusterfs/glusterd.vol
file, the mount command-line also needs to have
-o xlator-option="transport.address-family=inet6" added to it.

This option also gets added to the brick command-line.
Snapshot and gfapi use-cases should also use this option to pass in the
inet6 address-family.

Change-Id: I97db91021af27bacb6d7578e33ea4817f66d7270
fixes: bz#1635863
Signed-off-by: Milind Changire

rpc-clnt: reduce transport connect log for EINPROGRESS

2019-01-07T03:19:34+00:00

quotad and ganesha.nfsd prints many logs as,

[rpc-clnt.c:1739:rpc_clnt_submit ] 0--quota: error returned while attempting to connect to host: (null), port 0

Change-Id: Ic0c815400619e4a87a772a51b19822920228c1ef
Updates: bz#1596787
Signed-off-by: Kinglong Mee

rpcsvc: Don't expect dictionary values to be available

2019-01-07T03:17:49+00:00

When reconfigure happens, string values from one dictionary
are directly set in another dictionary. This can lead to
invalid memory when the first dictionary is freed up.
So do dict_set_dynstr_with_alloc instead of dict_set_str

updates bz#1650403
Change-Id: Id53236467521cfdeb07e7178d87ba6cf88d17003
Signed-off-by: Pranith Kumar K

rpc/rpc-lib: fix coverity issue

2018-12-28T10:40:49+00:00

Defect: Code can never be reached because of the
condition queue_index > 1024 cannot be true.

CID: 1398471 Logically dead code
updates: bz#789278

Change-Id: I367cda7e734f6d774900a58d8664cffcab69126f
Signed-off-by: Sheetal Pamecha

rpc : fix coverity in rpc/rpc-lib/src/rpcsvc.c

2018-12-28T10:31:53+00:00

This patch fixes newly introduced coverity.

CID: 1398472: Dereference before null check.
updates: bz#789278

Change-Id: Ie9b13084097de8f24b138acd7608c3e15b3bba9c
Signed-off-by: Sunny Kumar