<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/rpc/rpc-lib/src/rpcsvc.c, branch v3.13.1</title>
<subtitle></subtitle>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/'/>
<entry>
<title>gluster: IPv6 single stack support</title>
<updated>2017-10-24T09:28:08+00:00</updated>
<author>
<name>Kevin Vigor</name>
<email>kvigor@fb.com</email>
</author>
<published>2017-04-28T23:44:29+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=1260ee53b1674234e6f083563bdcd258e46a6faa'/>
<id>1260ee53b1674234e6f083563bdcd258e46a6faa</id>
<content type='text'>
Summary:
- This diff changes all locations in the code to prefer the inet6 address
  family instead of inet (see the sketch after this list).  This allows
  GlusterFS to operate via IPv6 instead of IPv4 for all internal operations
  while still being able to serve (FUSE or NFS) clients via IPv4.
- The changes apply to NFS as well.
- This diff ports D1892990, D1897341 &amp; D1896522 to the 3.8 branch.
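
The gist of "prefer inet6" at the socket layer is roughly the sketch
below. This is an illustration of the general technique, not code taken
from this patch; the variable names are hypothetical.

    /* Resolve a peer preferring the IPv6 family.  With AI_V4MAPPED,
     * IPv4-only peers still resolve as v4-mapped IPv6 addresses, so a
     * single AF_INET6 socket can keep serving IPv4 clients. */
    struct addrinfo hints = {0}, *res = NULL;
    hints.ai_family   = AF_INET6;                /* was AF_INET */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags    = AI_V4MAPPED | AI_ADDRCONFIG;
    if (getaddrinfo (host, port, &amp;hints, &amp;res) != 0)
            return -1;   /* caller falls back / reports the error as before */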

Test Plan: Prove tests!

Reviewers: dph, rwareing

Signed-off-by: Shreyas Siravara &lt;sshreyas@fb.com&gt;

Change-Id: I34fdaaeb33c194782255625e00616faf75d60c33
BUG: 1406898
Reviewed-on-3.8-fb: http://review.gluster.org/16059
Reviewed-by: Shreyas Siravara &lt;sshreyas@fb.com&gt;
Tested-by: Shreyas Siravara &lt;sshreyas@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Summary:
- This diff changes all locations in the code to prefer the inet6 address
  family instead of inet (see the sketch after this list).  This allows
  GlusterFS to operate via IPv6 instead of IPv4 for all internal operations
  while still being able to serve (FUSE or NFS) clients via IPv4.
- The changes apply to NFS as well.
- This diff ports D1892990, D1897341 &amp; D1896522 to the 3.8 branch.
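
The gist of "prefer inet6" at the socket layer is roughly the sketch
below. This is an illustration of the general technique, not code taken
from this patch; the variable names are hypothetical.

    /* Resolve a peer preferring the IPv6 family.  With AI_V4MAPPED,
     * IPv4-only peers still resolve as v4-mapped IPv6 addresses, so a
     * single AF_INET6 socket can keep serving IPv4 clients. */
    struct addrinfo hints = {0}, *res = NULL;
    hints.ai_family   = AF_INET6;                /* was AF_INET */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags    = AI_V4MAPPED | AI_ADDRCONFIG;
    if (getaddrinfo (host, port, &amp;hints, &amp;res) != 0)
            return -1;   /* caller falls back / reports the error as before */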

Test Plan: Prove tests!

Reviewers: dph, rwareing

Signed-off-by: Shreyas Siravara &lt;sshreyas@fb.com&gt;

Change-Id: I34fdaaeb33c194782255625e00616faf75d60c33
BUG: 1406898
Reviewed-on-3.8-fb: http://review.gluster.org/16059
Reviewed-by: Shreyas Siravara &lt;sshreyas@fb.com&gt;
Tested-by: Shreyas Siravara &lt;sshreyas@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rpc: destroy transport after client_t</title>
<updated>2017-08-31T03:45:09+00:00</updated>
<author>
<name>Milind Changire</name>
<email>mchangir@redhat.com</email>
</author>
<published>2017-08-30T05:55:29+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=24b95089a18a6a40e7703cb344e92025d67f3086'/>
<id>24b95089a18a6a40e7703cb344e92025d67f3086</id>
<content type='text'>
Problem:
1. The ref count on the client_t object is incremented in
   rpcsvc_request_init(), which is incorrect.
2. No ref is taken when delegating to grace_time_handler().

Solution:
1. Only fop requests which require processing down the graph via
   stack 'frames' now ref count the request in get_frame_from_request()
2. Take a ref on the client_t object in server_rpc_notify() but avoid
   dropping it in RPCSVC_EVENT_TRANSPORT_DESTROY. Drop the ref
   unconditionally when exiting out of grace_time_handler().
   Also, avoid dropping the ref on client_t in
   RPCSVC_EVENT_TRANSPORT_DESTROY when ref management has been
   delegated to grace_time_handler() (see the sketch below).
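
In pseudocode, the intended ref hand-off is roughly the following
(illustrative only, not the literal patch; the "delegated" flag name is
hypothetical):

    /* server_rpc_notify(): RPCSVC_EVENT_TRANSPORT_DISCONNECT */
    gf_client_ref (client);                  /* ref now owned by the timer */
    if (gf_timer_call_after (ctx, grace_ts, grace_time_handler, client))
            delegated = _gf_true;            /* destroy path must not unref */

    /* grace_time_handler() */
    /* ... grace-period work ... */
    gf_client_unref (client);                /* dropped unconditionally on exit */

    /* server_rpc_notify(): RPCSVC_EVENT_TRANSPORT_DESTROY */
    if (!delegated)
            gf_client_unref (client);        /* only when not handed to the timer */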

Change-Id: Ic16246bebc7ea4490545b26564658f4b081675e4
BUG: 1481600
Reported-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
Signed-off-by: Milind Changire &lt;mchangir@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17982
Tested-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
1. The ref count on the client_t object is incremented in
   rpcsvc_request_init(), which is incorrect.
2. No ref is taken when delegating to grace_time_handler().

Solution:
1. Only fop requests which require processing down the graph via
   stack 'frames' now ref count the request in get_frame_from_request()
2. Take a ref on the client_t object in server_rpc_notify() but avoid
   dropping it in RPCSVC_EVENT_TRANSPORT_DESTROY. Drop the ref
   unconditionally when exiting out of grace_time_handler().
   Also, avoid dropping the ref on client_t in
   RPCSVC_EVENT_TRANSPORT_DESTROY when ref management has been
   delegated to grace_time_handler() (see the sketch below).
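
In pseudocode, the intended ref hand-off is roughly the following
(illustrative only, not the literal patch; the "delegated" flag name is
hypothetical):

    /* server_rpc_notify(): RPCSVC_EVENT_TRANSPORT_DISCONNECT */
    gf_client_ref (client);                  /* ref now owned by the timer */
    if (gf_timer_call_after (ctx, grace_ts, grace_time_handler, client))
            delegated = _gf_true;            /* destroy path must not unref */

    /* grace_time_handler() */
    /* ... grace-period work ... */
    gf_client_unref (client);                /* dropped unconditionally on exit */

    /* server_rpc_notify(): RPCSVC_EVENT_TRANSPORT_DESTROY */
    if (!delegated)
            gf_client_unref (client);        /* only when not handed to the timer */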

Change-Id: Ic16246bebc7ea4490545b26564658f4b081675e4
BUG: 1481600
Reported-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
Signed-off-by: Milind Changire &lt;mchangir@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17982
Tested-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rpc: prevent logging null client [Invalid argument]</title>
<updated>2017-07-25T12:47:02+00:00</updated>
<author>
<name>Prashanth Pai</name>
<email>ppai@redhat.com</email>
</author>
<published>2017-07-21T10:18:23+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=7140858c3816c89d22d81ddfaa1d270db5dfd622'/>
<id>7140858c3816c89d22d81ddfaa1d270db5dfd622</id>
<content type='text'>
The following log entry is observed during volume operations such
as volume create:

[2017-07-20 05:13:43.213797] E [client_t.c:321:gf_client_ref]
(--&gt;/usr/local/lib/libgfrpc.so.0(rpcsvc_request_create+0x1a4) [0x7f987f66cd20]
--&gt;/usr/local/lib/libgfrpc.so.0(rpcsvc_request_init+0xd0) [0x7f987f66ca23]
--&gt;/usr/local/lib/libglusterfs.so.0(gf_client_ref+0x56) [0x7f987f91cbd5] )
0-client_t: null client [Invalid argument]

Change-Id: I49ba753e8d1a828bb275b0ccb1a181706774f387
BUG: 1193929
Signed-off-by: Prashanth Pai &lt;ppai@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17848
Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
Reviewed-by: Amar Tumballi &lt;amarts@redhat.com&gt;
Tested-by: Amar Tumballi &lt;amarts@redhat.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The following log entry is observed during volume operations such
as volume create:

[2017-07-20 05:13:43.213797] E [client_t.c:321:gf_client_ref]
(--&gt;/usr/local/lib/libgfrpc.so.0(rpcsvc_request_create+0x1a4) [0x7f987f66cd20]
--&gt;/usr/local/lib/libgfrpc.so.0(rpcsvc_request_init+0xd0) [0x7f987f66ca23]
--&gt;/usr/local/lib/libglusterfs.so.0(gf_client_ref+0x56) [0x7f987f91cbd5] )
0-client_t: null client [Invalid argument]

Change-Id: I49ba753e8d1a828bb275b0ccb1a181706774f387
BUG: 1193929
Signed-off-by: Prashanth Pai &lt;ppai@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17848
Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
Reviewed-by: Amar Tumballi &lt;amarts@redhat.com&gt;
Tested-by: Amar Tumballi &lt;amarts@redhat.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libglusterfs: Name threads on creation</title>
<updated>2017-07-19T14:16:19+00:00</updated>
<author>
<name>Raghavendra Talur</name>
<email>rtalur@redhat.com</email>
</author>
<published>2017-07-18T06:06:19+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=33db9aff1deaa028f30516e49fdb1e8d6e31bb73'/>
<id>33db9aff1deaa028f30516e49fdb1e8d6e31bb73</id>
<content type='text'>
Set names for threads on creation to make
debugging easier.

Output of top -H -p &lt;PID-OF-GLUSTERFSD&gt;
Before:
19773 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19774 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19775 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19776 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19777 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19778 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19779 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19780 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19781 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19782 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19783 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19784 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19785 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.01 glusterfsd
19786 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.01 glusterfsd
19787 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.01 glusterfsd
19789 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19790 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
25178 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
 5398 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
 7881 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd

After:
19773 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19774 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glustertimer
19775 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19776 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glustermemsweep
19777 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glustersproc0
19778 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glustersproc1
19779 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterepoll0
19780 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusteridxwrker
19781 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusteriotwr0
19782 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterbrssign
19783 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterbrswrker
19784 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterclogecon
19785 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.01 glusterclogd0
19786 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.01 glusterclogd1
19787 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.01 glusterclogd2
19789 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterposixjan
19790 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterposixfsy
25178 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterepoll1
 5398 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterepoll2
 7881 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterposixhc
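
On Linux, this kind of naming is typically done with pthread_setname_np;
a minimal sketch of the idea (not the patch itself; the wrapper name is
hypothetical):

    #define _GNU_SOURCE
    #include &lt;pthread.h&gt;
    #include &lt;stdio.h&gt;

    /* Give a thread a short, descriptive name.  Linux limits names to
     * 15 characters plus the terminating NUL, hence the terse
     * "gluster..." prefixes seen above. */
    static void
    gf_set_thread_name (pthread_t thread, const char *name)
    {
            char buf[16];

            snprintf (buf, sizeof (buf), "%s", name);
            pthread_setname_np (thread, buf);
    }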

Change-Id: Id5f333755c1ba168a2ffaa4fce6e71c375e10703
BUG: 1254002
Updates: #271
Signed-off-by: Raghavendra Talur &lt;rtalur@redhat.com&gt;
Reviewed-on: https://review.gluster.org/11926
Reviewed-by: Prashanth Pai &lt;ppai@redhat.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Niels de Vos &lt;ndevos@redhat.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Set names for threads on creation to make
debugging easier.

Output of top -H -p &lt;PID-OF-GLUSTERFSD&gt;
Before:
19773 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19774 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19775 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19776 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19777 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19778 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19779 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19780 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19781 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19782 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19783 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19784 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19785 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.01 glusterfsd
19786 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.01 glusterfsd
19787 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.01 glusterfsd
19789 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19790 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
25178 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
 5398 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
 7881 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd

After:
19773 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19774 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glustertimer
19775 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterfsd
19776 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glustermemsweep
19777 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glustersproc0
19778 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glustersproc1
19779 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterepoll0
19780 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusteridxwrker
19781 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusteriotwr0
19782 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterbrssign
19783 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterbrswrker
19784 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterclogecon
19785 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.01 glusterclogd0
19786 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.01 glusterclogd1
19787 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.01 glusterclogd2
19789 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterposixjan
19790 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterposixfsy
25178 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterepoll1
 5398 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterepoll2
 7881 root      20   0 1301.3m  12.6m   8.4m S  0.0  0.1   0:00.00 glusterposixhc
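
On Linux, this kind of naming is typically done with pthread_setname_np;
a minimal sketch of the idea (not the patch itself; the wrapper name is
hypothetical):

    #define _GNU_SOURCE
    #include &lt;pthread.h&gt;
    #include &lt;stdio.h&gt;

    /* Give a thread a short, descriptive name.  Linux limits names to
     * 15 characters plus the terminating NUL, hence the terse
     * "gluster..." prefixes seen above. */
    static void
    gf_set_thread_name (pthread_t thread, const char *name)
    {
            char buf[16];

            snprintf (buf, sizeof (buf), "%s", name);
            pthread_setname_np (thread, buf);
    }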

Change-Id: Id5f333755c1ba168a2ffaa4fce6e71c375e10703
BUG: 1254002
Updates: #271
Signed-off-by: Raghavendra Talur &lt;rtalur@redhat.com&gt;
Reviewed-on: https://review.gluster.org/11926
Reviewed-by: Prashanth Pai &lt;ppai@redhat.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Niels de Vos &lt;ndevos@redhat.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>program/GF-DUMP: Shield ping processing from traffic to Glusterfs Program</title>
<updated>2017-07-18T10:45:29+00:00</updated>
<author>
<name>Raghavendra G</name>
<email>rgowdapp@redhat.com</email>
</author>
<published>2017-04-25T05:13:07+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=2e72b24707f1886833db0b09e48b3f48b8d68d37'/>
<id>2e72b24707f1886833db0b09e48b3f48b8d68d37</id>
<content type='text'>
Since the poller thread bears the brunt of execution till the request is
handed over to io-threads, it experiences lock contention in the control
flow up to io-threads, which slows it down. This delay invariably affects
reading ping requests from the network and responding to them, resulting
in increased ping latencies, which sometimes result in a ping-timer expiry
on the client, leading to disconnect of the transport. So, this patch aims
to free the poller thread from executing code of the Glusterfs Program.
We do this by:

* The Glusterfs Program registers itself asking rpcsvc to execute its
  actors in the program's own (dedicated) threads.
* The GF-DUMP Program registers itself asking rpcsvc to _NOT_ execute
  its actors in dedicated threads; otherwise those threads would become
  a bottleneck in processing ping traffic. This means that the poller
  thread reads a ping packet, invokes its actor and hands the response
  msg to the transport queue (see the sketch after this list).
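
Concretely, the registration ends up looking roughly like the sketch
below (a sketch assuming an ownthread flag on rpcsvc_program_t, i.e. the
mechanism this patch adds; unrelated fields are trimmed and names are
illustrative):

    /* protocol/server fop program: actors run in a dedicated thread */
    rpcsvc_program_t glusterfs3_3_fop_prog = {
            .progname  = "GlusterFS 3.3",
            .actors    = glusterfs3_3_fop_actors,
            .ownthread = _gf_true,    /* keep fops off the poller thread */
    };

    /* GF-DUMP (ping) program: stays on the poller thread */
    rpcsvc_program_t gluster_dump_prog = {
            .progname  = "GF-DUMP",
            .actors    = gluster_dump_actors,
            .ownthread = _gf_false,   /* ping replies never wait behind fops */
    };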

Change-Id: I526268c10bdd5ef93f322a4f95385137550a6a49
Signed-off-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
BUG: 1421938
Reviewed-on: https://review.gluster.org/17105
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Amar Tumballi &lt;amarts@redhat.com&gt;
Reviewed-by: Jeff Darcy &lt;jeff@pl.atyp.us&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since the poller thread bears the brunt of execution till the request is
handed over to io-threads, it experiences lock contention in the control
flow up to io-threads, which slows it down. This delay invariably affects
reading ping requests from the network and responding to them, resulting
in increased ping latencies, which sometimes result in a ping-timer expiry
on the client, leading to disconnect of the transport. So, this patch aims
to free the poller thread from executing code of the Glusterfs Program.
We do this by:

* The Glusterfs Program registers itself asking rpcsvc to execute its
  actors in the program's own (dedicated) threads.
* The GF-DUMP Program registers itself asking rpcsvc to _NOT_ execute
  its actors in dedicated threads; otherwise those threads would become
  a bottleneck in processing ping traffic. This means that the poller
  thread reads a ping packet, invokes its actor and hands the response
  msg to the transport queue (see the sketch after this list).
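
Concretely, the registration ends up looking roughly like the sketch
below (a sketch assuming an ownthread flag on rpcsvc_program_t, i.e. the
mechanism this patch adds; unrelated fields are trimmed and names are
illustrative):

    /* protocol/server fop program: actors run in a dedicated thread */
    rpcsvc_program_t glusterfs3_3_fop_prog = {
            .progname  = "GlusterFS 3.3",
            .actors    = glusterfs3_3_fop_actors,
            .ownthread = _gf_true,    /* keep fops off the poller thread */
    };

    /* GF-DUMP (ping) program: stays on the poller thread */
    rpcsvc_program_t gluster_dump_prog = {
            .progname  = "GF-DUMP",
            .actors    = gluster_dump_actors,
            .ownthread = _gf_false,   /* ping replies never wait behind fops */
    };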

Change-Id: I526268c10bdd5ef93f322a4f95385137550a6a49
Signed-off-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
BUG: 1421938
Reviewed-on: https://review.gluster.org/17105
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Amar Tumballi &lt;amarts@redhat.com&gt;
Reviewed-by: Jeff Darcy &lt;jeff@pl.atyp.us&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rpc: Remove accidental IPV6 changes</title>
<updated>2017-05-05T11:39:28+00:00</updated>
<author>
<name>Kaushal M</name>
<email>kaushal@redhat.com</email>
</author>
<published>2017-05-04T06:45:09+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=de80a9153e3b681d8b83aa6164c94e2829b9cf49'/>
<id>de80a9153e3b681d8b83aa6164c94e2829b9cf49</id>
<content type='text'>
They snuck in with the HALO patch (07cc8679c)

Change-Id: I8ced6cbb0b49554fc9d348c453d4d5da00f981f6
BUG: 1447953
Signed-off-by: Kaushal M &lt;kaushal@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17174
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Niels de Vos &lt;ndevos@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Amar Tumballi &lt;amarts@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
They snuck in with the HALO patch (07cc8679c)

Change-Id: I8ced6cbb0b49554fc9d348c453d4d5da00f981f6
BUG: 1447953
Signed-off-by: Kaushal M &lt;kaushal@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17174
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Niels de Vos &lt;ndevos@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Amar Tumballi &lt;amarts@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Halo Replication feature for AFR translator</title>
<updated>2017-05-02T10:23:53+00:00</updated>
<author>
<name>Kevin Vigor</name>
<email>kvigor@fb.com</email>
</author>
<published>2017-03-21T15:23:25+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=07cc8679cdf3b29680f4f105d0222da168d8bfc1'/>
<id>07cc8679cdf3b29680f4f105d0222da168d8bfc1</id>
<content type='text'>
Summary:
Halo Geo-replication is a feature which allows Gluster or NFS clients to write
locally to their region (as defined by a latency "halo" or threshold if you
like), and have their writes asynchronously propagate from their origin to the
rest of the cluster.  Clients can also write synchronously to the cluster
simply by specifying a very large halo-latency (e.g. 10 seconds), which
will include all bricks.

In other words, it allows clients to decide at mount time if they desire
synchronous or asynchronous IO into a cluster and the cluster can support both
of these modes for any number of clients simultaneously.

There are a few new volume options due to this feature:
  halo-shd-latency:  The threshold below which self-heal daemons will
  consider children (bricks) connected.

  halo-nfsd-latency: The threshold below which NFS daemons will consider
  children (bricks) connected.

  halo-latency: The threshold below which all other clients will
  consider children (bricks) connected.

  halo-min-replicas: The minimum number of replicas which are to
  be enforced regardless of latency specified in the above 3 options.
  If the number of children falls below this threshold the next
  best (chosen by latency) shall be swapped in.

New FUSE mount options:
  halo-latency &amp; halo-min-replicas: As described above.

This feature combined with multi-threaded SHD support (D1271745) results in
some pretty cool geo-replication possibilities.

Operational Notes:
- Global consistency is guaranteed for synchronous clients; this is provided by
  the existing entry-locking mechanism.
- Asynchronous clients, on the other hand, are merely consistent within their
  region.  Writes &amp; deletes will be protected via entry-locks as usual,
  preventing concurrent writes into files which are undergoing replication.
  Read operations, on the other hand, should never block.
- Writes are allowed from _any_ region and propagated from the origin to all
  other regions.  The takeaway is that care should be taken to ensure that
  multiple writers do not write the same files, resulting in a gfid split-brain
  which will require resolution via split-brain policies (majority, mtime &amp;
  size).  The recommended method for preventing this is to use the nfs-auth
  feature to define which region for each share has RW permissions; tiers not
  in the origin region should have RO perms.

TODO:
- Synchronous clients (including the SHD) should choose clients from their own
  region as preferred sources for reads.  Most of the plumbing is in place for
  this via the child_latency array.
- Better GFID split brain handling &amp; better dent type split brain handling
  (i.e. create a trash can and move the offending files into it).
- Tagging in addition to latency as a means of defining which children you wish
  to synchronously write to

Test Plan:
- The usual suspects, clang, gcc w/ address sanitizer &amp; valgrind
- Prove tests

Reviewers: jackl, dph, cjh, meyering

Reviewed By: meyering

Subscribers: ethanr

Differential Revision: https://phabricator.fb.com/D1272053

Tasks: 4117827

Change-Id: I694a9ab429722da538da171ec528406e77b5e6d1
BUG: 1428061
Signed-off-by: Kevin Vigor &lt;kvigor@fb.com&gt;
Reviewed-on: http://review.gluster.org/16099
Reviewed-on: https://review.gluster.org/16177
Tested-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Summary:
Halo Geo-replication is a feature which allows Gluster or NFS clients to write
locally to their region (as defined by a latency "halo" or threshold if you
like), and have their writes asynchronously propagate from their origin to the
rest of the cluster.  Clients can also write synchronously to the cluster
simply by specifying a very large halo-latency (e.g. 10 seconds), which
will include all bricks.

In other words, it allows clients to decide at mount time if they desire
synchronous or asynchronous IO into a cluster and the cluster can support both
of these modes for any number of clients simultaneously.

There are a few new volume options due to this feature:
  halo-shd-latency:  The threshold below which self-heal daemons will
  consider children (bricks) connected.

  halo-nfsd-latency: The threshold below which NFS daemons will consider
  children (bricks) connected.

  halo-latency: The threshold below which all other clients will
  consider children (bricks) connected.

  halo-min-replicas: The minimum number of replicas which are to
  be enforced regardless of latency specified in the above 3 options.
  If the number of children falls below this threshold the next
  best (chosen by latency) shall be swapped in.

New FUSE mount options:
  halo-latency &amp; halo-min-replicas: As described above.

This feature combined with multi-threaded SHD support (D1271745) results in
some pretty cool geo-replication possibilities.

Operational Notes:
- Global consistency is guaranteed for synchronous clients; this is provided by
  the existing entry-locking mechanism.
- Asynchronous clients, on the other hand, are merely consistent within their
  region.  Writes &amp; deletes will be protected via entry-locks as usual,
  preventing concurrent writes into files which are undergoing replication.
  Read operations, on the other hand, should never block.
- Writes are allowed from _any_ region and propagated from the origin to all
  other regions.  The takeaway is that care should be taken to ensure that
  multiple writers do not write the same files, resulting in a gfid split-brain
  which will require resolution via split-brain policies (majority, mtime &amp;
  size).  The recommended method for preventing this is to use the nfs-auth
  feature to define which region for each share has RW permissions; tiers not
  in the origin region should have RO perms.

TODO:
- Synchronous clients (including the SHD) should choose clients from their own
  region as preferred sources for reads.  Most of the plumbing is in place for
  this via the child_latency array.
- Better GFID split brain handling &amp; better dent type split brain handling
  (i.e. create a trash can and move the offending files into it).
- Tagging in addition to latency as a means of defining which children you wish
  to synchronously write to

Test Plan:
- The usual suspects, clang, gcc w/ address sanitizer &amp; valgrind
- Prove tests

Reviewers: jackl, dph, cjh, meyering

Reviewed By: meyering

Subscribers: ethanr

Differential Revision: https://phabricator.fb.com/D1272053

Tasks: 4117827

Change-Id: I694a9ab429722da538da171ec528406e77b5e6d1
BUG: 1428061
Signed-off-by: Kevin Vigor &lt;kvigor@fb.com&gt;
Reviewed-on: http://review.gluster.org/16099
Reviewed-on: https://review.gluster.org/16177
Tested-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rpcsvc: Add rpchdr and proghdr to iobref before submitting to transport</title>
<updated>2017-02-16T04:09:36+00:00</updated>
<author>
<name>Poornima G</name>
<email>pgurusid@redhat.com</email>
</author>
<published>2017-02-14T07:15:36+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=8607f22dcd1bc9b84e452ae90102fa9d345ad3db'/>
<id>8607f22dcd1bc9b84e452ae90102fa9d345ad3db</id>
<content type='text'>
Issue:
When fio is run on multiple clients (each client writes to its own files)
and meanwhile a client does a readdirp, the client which did the readdirp
will now receive the upcalls. In this scenario the client disconnects
with an "rpc decode failed" error.

RCA:
Upcall calls rpcsvc_request_submit to submit the request to the socket
layer. rpcsvc_request_submit currently does:
rpcsvc_request_submit () {
   iobuf = iobuf_new
   iov = iobuf-&gt;ptr
   fill iobuf to contain xdrised upcall content - proghdr
   rpcsvc_callback_submit (..iov..)
   ...
   if (iobuf)
       iobuf_unref (iobuf)
}

rpcsvc_callback_submit (... iov...) {
   ...
   iobuf = iobuf_new
   iov1 = iobuf-&gt;ptr
   fill iobuf to contain xdrised rpc header - rpchdr
   msg.rpchdr = iov1
   msg.proghdr = iov
   ...
   rpc_transport_submit_request (msg)
   ...
   if (iobuf)
       iobuf_unref (iobuf)
}

rpcsvc_callback_submit assumes that once rpc_transport_submit_request()
returns, the msg has been written to the socket and thus the buffers
(rpchdr, proghdr) can be freed, which is not the case. Especially under
high workload, rpc_transport_submit_request() may not be able to write to
the socket immediately; it then adds the msg to its own queue and returns
success. Thus we have a use-after-free of rpchdr and proghdr: the client
gets garbage rpchdr and proghdr, fails to decode the rpc, and disconnects.

To prevent this, we need to add the rpchdr and proghdr iobufs to an iobref
and send it in the msg:
   iobref_add (iobref, iobufs)
   msg.iobref = iobref;
The socket layer takes a ref on msg.iobref if it cannot write to the socket
and has to queue the msg. Thus the use-after-free is gone (see the sketch
below).
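
A minimal sketch of the fixed flow, in the same pseudocode style as above
(illustrative only; the variable names are hypothetical):

    iobref = iobref_new ()
    iobref_add (iobref, rpchdr_iobuf)       /* keep the rpc header alive */
    iobref_add (iobref, proghdr_iobuf)      /* keep the program header alive */
    msg.rpchdr  = iov1
    msg.proghdr = iov
    msg.iobref  = iobref                    /* transport refs this if it queues */
    rpc_transport_submit_request (msg)
    iobref_unref (iobref)                   /* safe: the queue holds its own ref */
    iobuf_unref (rpchdr_iobuf)
    iobuf_unref (proghdr_iobuf)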

Thank You for discussing, debugging and fixing along:
Prashanth Pai &lt;ppai@redhat.com&gt;
Raghavendra G &lt;rgowdapp@redhat.com&gt;
Rajesh Joseph &lt;rjoseph@redhat.com&gt;
Kotresh HR &lt;khiremat@redhat.com&gt;
Mohammed Rafi KC &lt;rkavunga@redhat.com&gt;
Soumya Koduri &lt;skoduri@redhat.com&gt;

Change-Id: Ifa6bf6f4879141f42b46830a37c1574b21b37275
BUG: 1421937
Signed-off-by: Poornima G &lt;pgurusid@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16613
Reviewed-by: Prashanth Pai &lt;ppai@redhat.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: soumya k &lt;skoduri@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Issue:
When fio is run on multiple clients (each client writes to its own files)
and meanwhile a client does a readdirp, the client which did the readdirp
will now receive the upcalls. In this scenario the client disconnects
with an "rpc decode failed" error.

RCA:
Upcall calls rpcsvc_request_submit to submit the request to the socket
layer. rpcsvc_request_submit currently does:
rpcsvc_request_submit () {
   iobuf = iobuf_new
   iov = iobuf-&gt;ptr
   fill iobuf to contain xdrised upcall content - proghdr
   rpcsvc_callback_submit (..iov..)
   ...
   if (iobuf)
       iobuf_unref (iobuf)
}

rpcsvc_callback_submit (... iov...) {
   ...
   iobuf = iobuf_new
   iov1 = iobuf-&gt;ptr
   fill iobuf to contain xdrised rpc header - rpchdr
   msg.rpchdr = iov1
   msg.proghdr = iov
   ...
   rpc_transport_submit_request (msg)
   ...
   if (iobuf)
       iobuf_unref (iobuf)
}

rpcsvc_callback_submit assumes that once rpc_transport_submit_request()
returns, the msg has been written to the socket and thus the buffers
(rpchdr, proghdr) can be freed, which is not the case. Especially under
high workload, rpc_transport_submit_request() may not be able to write to
the socket immediately; it then adds the msg to its own queue and returns
success. Thus we have a use-after-free of rpchdr and proghdr: the client
gets garbage rpchdr and proghdr, fails to decode the rpc, and disconnects.

To prevent this, we need to add the rpchdr and proghdr iobufs to an iobref
and send it in the msg:
   iobref_add (iobref, iobufs)
   msg.iobref = iobref;
The socket layer takes a ref on msg.iobref if it cannot write to the socket
and has to queue the msg. Thus the use-after-free is gone (see the sketch
below).
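
A minimal sketch of the fixed flow, in the same pseudocode style as above
(illustrative only; the variable names are hypothetical):

    iobref = iobref_new ()
    iobref_add (iobref, rpchdr_iobuf)       /* keep the rpc header alive */
    iobref_add (iobref, proghdr_iobuf)      /* keep the program header alive */
    msg.rpchdr  = iov1
    msg.proghdr = iov
    msg.iobref  = iobref                    /* transport refs this if it queues */
    rpc_transport_submit_request (msg)
    iobref_unref (iobref)                   /* safe: the queue holds its own ref */
    iobuf_unref (rpchdr_iobuf)
    iobuf_unref (proghdr_iobuf)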

Thank You for discussing, debugging and fixing along:
Prashanth Pai &lt;ppai@redhat.com&gt;
Raghavendra G &lt;rgowdapp@redhat.com&gt;
Rajesh Joseph &lt;rjoseph@redhat.com&gt;
Kotresh HR &lt;khiremat@redhat.com&gt;
Mohammed Rafi KC &lt;rkavunga@redhat.com&gt;
Soumya Koduri &lt;skoduri@redhat.com&gt;

Change-Id: Ifa6bf6f4879141f42b46830a37c1574b21b37275
BUG: 1421937
Signed-off-by: Poornima G &lt;pgurusid@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16613
Reviewed-by: Prashanth Pai &lt;ppai@redhat.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: soumya k &lt;skoduri@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>socket: socket disconnect should wait for poller thread exit</title>
<updated>2016-12-22T04:49:19+00:00</updated>
<author>
<name>Rajesh Joseph</name>
<email>rjoseph@redhat.com</email>
</author>
<published>2016-12-13T09:58:42+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=af6769675acbbfd780fa2ece8587502d6d579372'/>
<id>af6769675acbbfd780fa2ece8587502d6d579372</id>
<content type='text'>
When SSL is enabled or the "transport.socket.own-thread" option is set,
socket_poller runs as a separate thread. Currently, during a
disconnect or PARENT_DOWN scenario we don't wait for this thread
to terminate. PARENT_DOWN will disconnect the socket layer and
clean up resources used by socket_poller.

Therefore, before disconnecting we should wait for the poller thread to
exit (see the sketch below).
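
A minimal sketch of the idea (pseudocode, not the actual patch; the
priv-&gt;own_thread and priv-&gt;thread field names are assumptions):

    /* socket disconnect / PARENT_DOWN path */
    if (priv-&gt;own_thread &amp;&amp; priv-&gt;thread) {
            /* ask socket_poller to stop, then wait for it to exit
             * before tearing down the resources it still uses */
            shutdown (priv-&gt;sock, SHUT_RDWR);
            pthread_join (priv-&gt;thread, NULL);
            priv-&gt;thread = 0;
    }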

Change-Id: I71f984b47d260ffd979102f180a99a0bed29f0d6
BUG: 1404181
Signed-off-by: Rajesh Joseph &lt;rjoseph@redhat.com&gt;
Reviewed-on: http://review.gluster.org/16141
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Kaushal M &lt;kaushal@redhat.com&gt;
Reviewed-by: Raghavendra Talur &lt;rtalur@redhat.com&gt;
Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When SSL is enabled or the "transport.socket.own-thread" option is set,
socket_poller runs as a separate thread. Currently, during a
disconnect or PARENT_DOWN scenario we don't wait for this thread
to terminate. PARENT_DOWN will disconnect the socket layer and
clean up resources used by socket_poller.

Therefore, before disconnecting we should wait for the poller thread to
exit (see the sketch below).
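
A minimal sketch of the idea (pseudocode, not the actual patch; the
priv-&gt;own_thread and priv-&gt;thread field names are assumptions):

    /* socket disconnect / PARENT_DOWN path */
    if (priv-&gt;own_thread &amp;&amp; priv-&gt;thread) {
            /* ask socket_poller to stop, then wait for it to exit
             * before tearing down the resources it still uses */
            shutdown (priv-&gt;sock, SHUT_RDWR);
            pthread_join (priv-&gt;thread, NULL);
            priv-&gt;thread = 0;
    }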

Change-Id: I71f984b47d260ffd979102f180a99a0bed29f0d6
BUG: 1404181
Signed-off-by: Rajesh Joseph &lt;rjoseph@redhat.com&gt;
Reviewed-on: http://review.gluster.org/16141
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Kaushal M &lt;kaushal@redhat.com&gt;
Reviewed-by: Raghavendra Talur &lt;rtalur@redhat.com&gt;
Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>compound fops: Fix file corruption issue</title>
<updated>2016-10-24T14:11:08+00:00</updated>
<author>
<name>Krutika Dhananjay</name>
<email>kdhananj@redhat.com</email>
</author>
<published>2016-10-17T09:43:28+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=41dc5ee07ffba6d17459757abf13fae9f174e6b6'/>
<id>41dc5ee07ffba6d17459757abf13fae9f174e6b6</id>
<content type='text'>
1. The address of a local variable @args is copied into state-&gt;req
in server3_3_compound (). But even after the function has gone out of
scope, this pointer is accessed and dereferenced in
server_compound_resume (). This patch fixes that.

2. Compound fops, by virtue of NOT having a vector sizer (like the one
writev has), end up having both the header and the data (in case one of
the member fops is WRITEV) in the same hdr_iobuf. This buffer was not
being preserved through the lifetime of the compound fop, causing it to
be overwritten by a parallel write fop even when the writev associated
with the currently executing compound fop had yet to hit the disk, thereby
corrupting the file's data. This is fixed by associating the hdr_iobuf with
the iobref so its memory remains valid through the lifetime of the fop
(see the sketch after this list).

3. Also fixed a use-after-free bug in protocol/client in compound fops cbk,
missed by Linux but caught by NetBSD.
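
A minimal sketch of fix 2, reusing the iobref APIs (illustrative only;
the variable names are assumptions):

    /* while building the server-side state for a compound request:
     * pin the header iobuf to the request's iobref so it stays valid
     * until every member fop of the compound has completed */
    iobref_add (state-&gt;iobref, hdr_iobuf);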

Finally, big thanks to Pranith Kumar K and Raghavendra Gowdappa for their
help in debugging this file corruption issue.

Change-Id: I6d5c04f400ecb687c9403a17a12683a96c2bf122
BUG: 1378778
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
Reviewed-on: http://review.gluster.org/15654
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
1. The address of a local variable @args is copied into state-&gt;req
in server3_3_compound (). But even after the function has gone out of
scope, this pointer is accessed and dereferenced in
server_compound_resume (). This patch fixes that.

2. Compound fops, by virtue of NOT having a vector sizer (like the one
writev has), end up having both the header and the data (in case one of
the member fops is WRITEV) in the same hdr_iobuf. This buffer was not
being preserved through the lifetime of the compound fop, causing it to
be overwritten by a parallel write fop even when the writev associated
with the currently executing compound fop had yet to hit the disk, thereby
corrupting the file's data. This is fixed by associating the hdr_iobuf with
the iobref so its memory remains valid through the lifetime of the fop
(see the sketch after this list).

3. Also fixed a use-after-free bug in protocol/client in compound fops cbk,
missed by Linux but caught by NetBSD.
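
A minimal sketch of fix 2, reusing the iobref APIs (illustrative only;
the variable names are assumptions):

    /* while building the server-side state for a compound request:
     * pin the header iobuf to the request's iobref so it stays valid
     * until every member fop of the compound has completed */
    iobref_add (state-&gt;iobref, hdr_iobuf);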

Finally, big thanks to Pranith Kumar K and Raghavendra Gowdappa for their
help in debugging this file corruption issue.

Change-Id: I6d5c04f400ecb687c9403a17a12683a96c2bf122
BUG: 1378778
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
Reviewed-on: http://review.gluster.org/15654
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
