glusterfs.git/rpc, branch testing-regression-job

rpc/transport: Missing a ref on dict while creating transport object

2019-03-20T13:24:44+00:00

while creating rpc_tranpsort object, we store a dictionary without
taking a ref on dict but it does an unref during the cleaning of the
transport object.

So the rpc layer expect the caller to take a ref on the dictionary
before passing dict to rpc layer. This leads to a lot of confusion
across the code base and leads to ref leaks.

Semantically, this is not correct. It is the rpc layer responsibility
to take a ref when storing it, and free during the cleanup.

I'm listing down the total issues or leaks across the code base because
of this confusion. These issues are currently present in the upstream
master.

1) changelog_rpc_client_init

2) quota_enforcer_init

3) rpcsvc_create_listeners : when there are two transport, like tcp,rdma.

4) quotad_aggregator_init

5) glusterd: init

6) nfs3_init_state

7) server: init

8) client:init

This patch does the cleanup according to the semantics.

Change-Id: I46373af9630373eb375ee6de0e6f2bbe2a677425
updates: bz#1659708
Signed-off-by: Mohammed Rafi KC

socket/ssl: fix crl handling

2019-03-19T09:38:28+00:00

Problem:
Just setting the path to the CRL directory in socket_init() wasn't working.

Solution:
Need to use special API to retrieve and set X509_VERIFY_PARAM and set
the CRL checking flags explicitly.
Also, setting the CRL checking flags is a big pain, since the connection
is declared as failed if any CRL isn't found in the designated file or
directory. A comment has been added to the code appropriately.

Change-Id: I8a8ed2ddaf4b5eb974387d2f7b1a85c1ca39fe79
fixes: bz#1687326
Signed-off-by: Milind Changire

dict: handle STR_OLD data type in xdr conversions

2019-03-08T09:38:35+00:00

Currently a dict conversion on wire for 3.x protocol happens using
`dict_unserialize()`, which sets the type of data as STR_OLD. But the
new protocol doesn't send it over the wire as its not considered as a
valid format in new processes.

But considering we deal with old and new protocol when we do a rolling
upgrade, it will allow us to get all the information properly with new
protocol.

Credits: Krutika Dhananjay

Fixes: bz#1684385
Change-Id: I165c0021fb195b399790b9cf14a7416ae75ec84f
Signed-off-by: Amar Tumballi

core: implement a global thread pool

2019-02-18T02:58:24+00:00

This patch implements a thread pool that is wait-free for adding jobs to
the queue and uses a very small locked region to get jobs. This makes it
possible to decrease contention drastically. It's based on wfcqueue
structure provided by urcu library.

It automatically enables more threads when load demands it, and stops
them when not needed. There's a maximum number of threads that can be
used. This value can be configured.

Depending on the workload, the maximum number of threads plays an
important role. So it needs to be configured for optimal performance.
Currently the thread pool doesn't self adjust the maximum for the
workload, so this configuration needs to be changed manually.

For this reason, the global thread pool has been made optional, so that
volumes can still use the thread pool provided by io-threads.

To enable it for bricks, the following option needs to be set:

   config.global-threading = on

This option has no effect if bricks are already running. A restart is
required to activate it. It's recommended to also enable the following
option when running bricks with the global thread pool:

   performance.iot-pass-through = on

To enable it for a FUSE mount point, the option '--global-threading'
must be added to the mount command. To change it, an umount and remount
is needed. It's recommended to disable the following option when using
global threading on a mount point:

   performance.client-io-threads = off

To enable it for services managed by glusterd, glusterd needs to be
started with option '--global-threading'. In this case all daemons, like
self-heal, will be using the global thread pool.

Currently it can only be enabled for bricks, FUSE mounts and glusterd
services.

The maximum number of threads for clients and bricks can be configured
using the following options:

   config.client-threads
   config.brick-threads

These options can be applied online and its effect is immediate most of
the times. If one of them is set to 0, the maximum number of threads
will be calcutated as #cores * 2.

Some distributions use a very old userspace-rcu library (version 0.7)
for this reason, some header files from version 0.10 have been copied
into contrib/userspace-rcu and are used if the detected version is 0.7
or older.

An additional change has been made to io-threads to prevent that threads
are started when iot-pass-through is set.

Change-Id: I09d19e246b9e6d53c6247b29dfca6af6ee00a24b
updates: #532
Signed-off-by: Xavi Hernandez

socket: socket event handlers now return void

2019-02-18T02:45:40+00:00

Problem:
Returning any value from socket event handlers to the event sub-system
doesn't make sense since event sub-system cannot handle socket
sub-system errors.

Solution:
Change return type of all socket event handlers to 'void'

Change-Id: I70dc2c57f12b7ea2fae41120f71aa0d7fe0b2b6f
Fixes: bz#1651246
Signed-off-by: Milind Changire

clnt/rpc: ref leak during disconnect.

2019-02-12T07:05:58+00:00

During disconnect cleanup, we are not cancelling reconnect
timer, which causes a ref leak each time when a disconnect
happen.

Change-Id: I9d05d1f368d080e04836bf6a0bb018bf8f7b5b8a
updates: bz#1659708
Signed-off-by: Mohammed Rafi KC

Multiple files: reduce work while under lock.

2019-01-29T09:27:22+00:00

Mostly, unlock before logging.
In some cases, moved different code that was not needed
to be under lock (for example, taking time, or malloc'ing)
to be executed before taking the lock.

Note: logging might be slightly less accurate in order, since it may
not be done now under the lock, so order of logs is racy. I think
it's a reasonable compromise.

Compile-tested only!
updates: bz#1193929
Signed-off-by: Yaniv Kaul 

Change-Id: I2438710016afc9f4f62a176ef1a0d3ed793b4f89

core: heketi-cli is throwing error "target is busy"

2019-01-24T06:54:41+00:00

Problem: At the time of deleting block hosting volume
         through heketi-cli , it is throwing an error "target is busy".
         cli is throwing an error because brick is not detached successfully
         and brick is not detached due to race condition to cleanp xprt
         associated with detached brick

Solution: To avoid xprt specifc race condition introduce an atomic flag
          on rpc_transport

Change-Id: Id4ff1fe8375a63be71fb3343f455190a1b8bb6d4
fixes: bz#1668190
Signed-off-by: Mohit Agrawal

core: move logs which are only developer relevant to DEBUG level

2019-01-23T16:05:52+00:00

We had only changed the log level to DEBUG in release branch earlier.
But considering 90%+ of our deployments happen in same env, we can look
at these specific logs on need basis. With this change, the master
branch will be easier to debug with lesser logs.

Change-Id: I4157a7ec7d5ec9c2948b2bbc1e4cb8317f28d6b8
Updates: bz#1666833
Signed-off-by: Amar Tumballi

rpc: use address-family option from vol file

2019-01-22T13:47:19+00:00

This patch helps enable IPv6 connections in the cluster.
The default address-family is IPv4 without using this option explicitly.

When address-family is set to "inet6" in the /etc/glusterfs/glusterd.vol
file, the mount command-line also needs to have
-o xlator-option="transport.address-family=inet6" added to it.

This option also gets added to the brick command-line.
Snapshot and gfapi use-cases should also use this option to pass in the
inet6 address-family.

Change-Id: I97db91021af27bacb6d7578e33ea4817f66d7270
fixes: bz#1635863
Signed-off-by: Milind Changire