glusterfs.git/rpc, branch release-3.11

socket/reconfigure: reconfigure should be done on new dict

2017-07-03T12:49:32+00:00

In socket reconfigure, the reconfiguration was being done with the old
dict values. It should be done with the new, reconfigured dict values.
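
A minimal sketch of the idea, assuming a hypothetical key/value store in
place of dict_t. The struct names, opts_get() and the option key are
illustrative only and are not taken from the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for dict_t and the socket private state. */
struct kv { const char *key; const char *val; };
struct opts { const struct kv *items; size_t count; };
struct sock_priv { int keepalive; };

static const char *opts_get(const struct opts *o, const char *key)
{
    for (size_t i = 0; i < o->count; i++)
        if (strcmp(o->items[i].key, key) == 0)
            return o->items[i].val;
    return NULL;
}

/* Reconfigure from the NEW options passed in, not from the options
 * that were stored at init time. Returns 1 if a value was applied. */
static int sock_reconfigure(struct sock_priv *priv, const struct opts *new_opts)
{
    assert(priv != NULL && new_opts != NULL);
    const char *v = opts_get(new_opts, "transport.socket.keepalive");
    if (v == NULL)
        return 0;                 /* option not present: keep current value */
    priv->keepalive = (strcmp(v, "on") == 0);
    return 1;
}
```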

Backport of>
>Change-Id: Iac5ad4382fe630806af14c99bb7950a288756a87
>BUG: 1456405
>Signed-off-by: Mohammed Rafi KC 
>Reviewed-on: https://review.gluster.org/17412
>Smoke: Gluster Build System 
>NetBSD-regression: NetBSD Build System 
>CentOS-regression: Gluster Build System 
>Reviewed-by: Raghavendra G 

Change-Id: Iac5ad4382fe630806af14c99bb7950a288756a87
BUG: 1463517
Signed-off-by: Mohammed Rafi KC 
Reviewed-on: https://review.gluster.org/17588
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Zhou Zhengping 
Reviewed-by: Shyamsundar Ranganathan

event/epoll: Add back socket for polling of events immediately after reading the entire rpc message from the wire

2017-06-05T13:37:33+00:00

Currently socket is added back for future events after higher layers
(rpc, xlators etc) have processed the message. If message processing
involves significant delay (as in writev replies processed by Erasure
Coding), performance takes hit. Hence this patch modifies
transport/socket to add back the socket for polling of events
immediately after reading the entire rpc message, but before
notification to higher layers.
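
The ordering can be sketched as follows, assuming a Linux epoll set
registered with EPOLLONESHOT. handle_pollin(), count_bytes() and the
single read() standing in for "read the entire rpc message" are
hypothetical and are not the transport/socket code:

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

typedef void (*notify_fn)(const char *msg, size_t len, void *data);

/* Example consumer: adds the message length to a running total. */
static void count_bytes(const char *msg, size_t len, void *data)
{
    (void)msg;
    *(size_t *)data += len;
}

/* Read one message, re-arm the one-shot fd, and only then notify the
 * (possibly slow) higher layer. Returns 0 on success, -1 on error. */
static int handle_pollin(int epfd, int fd, notify_fn notify, void *data)
{
    char buf[256];
    ssize_t n = read(fd, buf, sizeof(buf));  /* stand-in for a full rpc message */
    if (n <= 0)
        return -1;

    struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT, .data.fd = fd };
    if (epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev) != 0)
        return -1;                           /* fd is polled again from here on */

    notify(buf, (size_t)n, data);            /* delay here no longer blocks polling */
    return 0;
}
```

Because the fd is re-armed before notify() runs, another thread blocked
in epoll_wait() can pick up the next message while the current one is
still being processed.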

credits: Thanks to "Kotresh Hiremath Ravishankar"
          for assistance in fixing a regression in
         bitrot caused by this patch.

>Reviewed-on: https://review.gluster.org/15036
>CentOS-regression: Gluster Build System 
>NetBSD-regression: NetBSD Build System 
>Smoke: Gluster Build System 
>Reviewed-by: Amar Tumballi 

Change-Id: I04b6b9d0b51a1cfb86ecac3c3d87a5f388cf5800
BUG: 1456259
Signed-off-by: Raghavendra G 
Reviewed-on: https://review.gluster.org/17391
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Smoke: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

rpc: fix a routine to destroy RDMA qp(queue-pair)

2017-05-16T00:30:34+00:00

    This is backport of https://review.gluster.org/#/c/17249/

Problem: If an error has occurred with rdma_create_id() in gf_rdma_connect(),
         the process will jump to the 'unlock' label and then call gf_rdma_teardown(),
         which calls __gf_rdma_teardown().
         Presently, __gf_rdma_teardown() checks the InfiniBand QP with peer->cm_id->qp.
         Unfortunately, cm_id is not allocated in this situation, so the process will crash.

Solution: If 'this->private->peer->cm_id' member is null, do not check
          'this->private->peer->cm_id->qp'.
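
A minimal sketch of the guard, using simplified hypothetical structs
(fake_peer, fake_cm_id, fake_qp) rather than the real rdma-cm types:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the rdma-cm types. */
struct fake_qp { int num; };
struct fake_cm_id { struct fake_qp *qp; };
struct fake_peer { struct fake_cm_id *cm_id; };

/* Returns 1 if a QP was found and torn down, 0 if there was nothing to do. */
static int teardown_peer(struct fake_peer *peer)
{
    if (peer == NULL || peer->cm_id == NULL)
        return 0;                 /* cm_id was never allocated: do not touch ->qp */
    if (peer->cm_id->qp != NULL) {
        peer->cm_id->qp = NULL;   /* stand-in for destroying the QP */
        return 1;
    }
    return 0;
}
```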

> Change-Id: Ie321b8cf175ef4f1bdd9733d73840f03ddff8c3b
> BUG: 1449495
> Signed-off-by: Ji-Hyeon Gim 
> Reviewed-on: https://review.gluster.org/17249
> Reviewed-by: Amar Tumballi 
> Reviewed-by: Prashanth Pai 
> NetBSD-regression: NetBSD Build System 
> Tested-by: Ji-Hyeon Gim
> CentOS-regression: Gluster Build System 
> Smoke: Gluster Build System 
> Reviewed-by: Jeff Darcy 

(cherry picked from commit ccfa06767f1282d9a3783e37555515a63cc62e69)

Change-Id: Ie321b8cf175ef4f1bdd9733d73840f03ddff8c3b
BUG: 1450565
Signed-off-by: Ji-Hyeon Gim 
Reviewed-on: https://review.gluster.org/17282
Smoke: Gluster Build System 
Tested-by: Ji-Hyeon Gim
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Amar Tumballi

rpc: fix transport add/remove race on port probing

2017-05-12T13:41:06+00:00

Problem:
Spurious __gf_free() assertion failures seen all over the place with
header->magic being overwritten when running port probing tests with
'nmap'

Solution:
Fix sequence of:
1. add accept()ed socket connection fd to epoll set
2. add newly created rpc_transport_t object in RPCSVC service list

Correct sequence is #2 followed by #1.

Reason:
Adding new fd returned by accept() to epoll set causes an epoll_wait()
to return immediately with a POLLIN event. This races ahead to a readv()
which returns with errno:104 (Connection reset by peer) during port
probing using 'nmap'. The error is then handled by POLLERR code to
remove the new transport object from RPCSVC service list and later
unref and destroy the rpc transport object.
socket_server_event_handler() then catches up with registering the
unref'd/destroyed rpc transport object. This later manifests as
assertion failures in __gf_free() with the header->magic field botched
due to invalid address references.
All this does not result in a Segmentation Fault since the address
space continues to be mapped into the process and pages still being
referenced elsewhere.

As a further note:
This race happens only in accept() codepath. Only in this codepath,
the notify will be referring to two transports:
1. listener transport and
2. newly accepted transport
All other notify refer to only one transport i.e., the transport/socket
on which the event is received. Since epoll is ONE_SHOT another event
won't arrive on the same socket till the current event is processed.
However, in the accept() codepath, the current event - ACCEPT - and the
new event - POLLIN/POLLERR - arrive on two different sockets:
1. ACCEPT on listener socket and
2. POLLIN/POLLERR on newly registered socket.
Also, note that these two events are handled in different thread contexts.

Cleanup:
Critical section in socket_server_event_handler() has been removed.
Instead, an additional ref on new_trans has been used to avoid ref/unref
race when notifying RPCSVC.
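
The corrected ordering and the extra ref can be illustrated with a
single-threaded simulation. All names here (struct xprt, on_pollerr(),
accept_path()) are hypothetical, and the "event" is simulated by a
direct call rather than a second thread:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified transport object. */
struct xprt { int refs; int listed; int destroyed; };

static void xprt_unref(struct xprt *t)
{
    if (--t->refs == 0)
        t->destroyed = 1;         /* stand-in for freeing the object */
}

/* Simulates the POLLERR path that may run as soon as the fd is polled. */
static void on_pollerr(struct xprt *t)
{
    t->listed = 0;                /* remove from the service list */
    xprt_unref(t);                /* drop the transport's base ref */
}

/* Returns 1 if the transport was still alive while the accept path
 * was using it, 0 otherwise. */
static int accept_path(struct xprt *t, int peer_resets)
{
    assert(t != NULL && t->refs > 0);
    t->refs++;                    /* extra ref held by the accept path */
    t->listed = 1;                /* #2 first: add to the service list */
    if (peer_resets)              /* #1 second: fd joins the epoll set, */
        on_pollerr(t);            /* so an event can fire immediately   */
    int alive = !t->destroyed;    /* still safe to touch t here */
    xprt_unref(t);                /* release the extra ref */
    return alive;
}
```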

mainline:
> BUG: 1438966
> Signed-off-by: Milind Changire 
> Reviewed-on: https://review.gluster.org/17139
> Smoke: Gluster Build System 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Amar Tumballi 
> Reviewed-by: Oleksandr Natalenko 
> Reviewed-by: Jeff Darcy 
(cherry picked from commit 4f7ef3020edcc75cdeb22d8da8a1484f9db77ac9)

Change-Id: I4417924bc9e6277d24bd1a1c5bcb7445bcb226a3
BUG: 1449191
Signed-off-by: Milind Changire 
Reviewed-on: https://review.gluster.org/17218
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

Halo Replication feature for AFR translator

2017-05-08T05:37:07+00:00

	Backport of https://review.gluster.org/16177
		    https://review.gluster.org/17174

Merged both these patches to make sure IPV6 changes don't make it to 3.11 at all.

Summary:
Halo Geo-replication is a feature which allows Gluster or NFS clients to write
locally to their region (as defined by a latency "halo" or threshold if you
like), and have their writes asynchronously propagate from their origin to the
rest of the cluster.  Clients can also write synchronously to the cluster
simply by specifying a very large halo-latency (e.g. 10 seconds), which
will include all bricks.

In other words, it allows clients to decide at mount time if they desire
synchronous or asynchronous IO into a cluster and the cluster can support both
of these modes to any number of clients simultaneously.

There are a few new volume options due to this feature:
  halo-shd-latency:  The threshold below which self-heal daemons will
  consider children (bricks) connected.

  halo-nfsd-latency: The threshold below which NFS daemons will consider
  children (bricks) connected.

  halo-latency: The threshold below which all other clients will
  consider children (bricks) connected.

  halo-min-replicas: The minimum number of replicas which are to
  be enforced regardless of latency specified in the above 3 options.
  If the number of children falls below this threshold the next
  best (chosen by latency) shall be swapped in.
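
The interaction between a latency threshold and halo-min-replicas can be
sketched as below. This is an illustration under assumptions, not the AFR
code: halo_select() and its parameters are hypothetical, latencies are in
milliseconds, and "below the threshold" is taken to include equality.

```c
#include <assert.h>
#include <stddef.h>

/* Marks up[i] = 1 for each child treated as connected; returns how many. */
static size_t halo_select(const long *latency_ms, size_t n,
                          long halo_ms, size_t min_replicas, int *up)
{
    size_t count = 0;

    assert(latency_ms != NULL && up != NULL);

    /* First pass: every child inside the halo is considered connected. */
    for (size_t i = 0; i < n; i++) {
        up[i] = latency_ms[i] <= halo_ms;
        count += (size_t)up[i];
    }

    /* Too few replicas: swap in the next best children by latency. */
    while (count < min_replicas && count < n) {
        size_t best = n;
        for (size_t i = 0; i < n; i++)
            if (!up[i] && (best == n || latency_ms[i] < latency_ms[best]))
                best = i;
        up[best] = 1;
        count++;
    }
    return count;
}
```

With latencies {2, 40, 15, 90}, a 5 ms halo and a minimum of 2 replicas,
the first and third children are selected: the first is inside the halo
and the third is the next best by latency.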

New FUSE mount options:
  halo-latency & halo-min-replicas: As described above.

This feature combined with multi-threaded SHD support (D1271745) results in
some pretty cool geo-replication possibilities.

Operational Notes:
- Global consistency is guaranteed for synchronous clients; this is provided by
  the existing entry-locking mechanism.
- Asynchronous clients, on the other hand, are merely consistent to their region.
  Writes & deletes will be protected via entry-locks as usual preventing
  concurrent writes into files which are undergoing replication.  Read operations
  on the other hand should never block.
- Writes are allowed from _any_ region and propagated from the origin to all
  other regions.  The takeaway is that care should be taken to ensure
  multiple writers do not write the same files, resulting in a gfid split-brain
  which will require resolution via split-brain policies (majority, mtime &
  size).  The recommended method for preventing this is using the nfs-auth feature
  to define which region has RW permissions for each share; tiers not in the
  origin region should have RO perms.

TODO:
- Synchronous clients (including the SHD) should choose clients from their own
  region as preferred sources for reads.  Most of the plumbing is in place for
  this via the child_latency array.
- Better GFID split brain handling & better dent type split brain handling
  (i.e. create a trash can and move the offending files into it).
- Tagging in addition to latency as a means of defining which children you wish
  to synchronously write to

Test Plan:
- The usual suspects, clang, gcc w/ address sanitizer & valgrind
- Prove tests

Reviewers: jackl, dph, cjh, meyering

Reviewed By: meyering

Subscribers: ethanr

Differential Revision: https://phabricator.fb.com/D1272053

Tasks: 4117827

 >Change-Id: I694a9ab429722da538da171ec528406e77b5e6d1
 >BUG: 1428061
 >Signed-off-by: Kevin Vigor 
 >Reviewed-on: http://review.gluster.org/16099
 >Reviewed-on: https://review.gluster.org/16177
 >Tested-by: Pranith Kumar Karampuri 
 >Smoke: Gluster Build System 
 >NetBSD-regression: NetBSD Build System 
 >CentOS-regression: Gluster Build System 
 >Reviewed-by: Pranith Kumar Karampuri 

BUG: 1448416
Change-Id: I694a9ab429722da538da171ec528406e77b5e6d1
Signed-off-by: Pranith Kumar K 
Reviewed-on: https://review.gluster.org/17192
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Kaushal M

glusterd: Fix removing pmap entry on rpc disconnect

2017-04-28T17:15:30+00:00

Problem:
The following line of code intended to remove pmap entry for the
connection during disconnects:

    pmap_registry_remove (this, 0, NULL, GF_PMAP_PORT_NONE, xprt);

However, no pmap entry will have its type set to GF_PMAP_PORT_NONE
at any point in time. So a call to pmap_registry_search_by_xprt() in
pmap_registry_remove() will always fail to find a match.

Fix:
Optionally ignore pmap entry's type in pmap_registry_search_by_xprt().
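
A sketch of a lookup that can optionally ignore the entry type. The
struct, the enum and the ignore_type flag are hypothetical
simplifications; how the real pmap_registry_search_by_xprt() expresses
"optionally" is not shown in this message:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified port-map entry; not the glusterd definition. */
enum port_type { PORT_NONE, PORT_BRICK, PORT_LEASED };
struct port_entry { enum port_type type; const void *xprt; };

/* Returns the index of the first entry owned by xprt, or -1 if none
 * matches. When ignore_type is true, the entry type is not compared. */
static int search_by_xprt(const struct port_entry *tab, int n,
                          const void *xprt, enum port_type type,
                          bool ignore_type)
{
    assert(tab != NULL || n == 0);
    for (int i = 0; i < n; i++) {
        if (tab[i].xprt != xprt)
            continue;
        if (ignore_type || tab[i].type == type)
            return i;
    }
    return -1;
}
```

Searching with PORT_NONE and ignore_type set to false reproduces the
original problem: no entry ever carries that type, so nothing matches.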

BUG: 1193929
Change-Id: I705f101739ab1647ff52a92820d478354407264a
Signed-off-by: Prashanth Pai 
Reviewed-on: https://review.gluster.org/17129
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Jeff Darcy

glusterd: Add client details to get-state output

2017-04-13T03:43:08+00:00

This commit optionally adds client details corresponding to the
locally running bricks to the get-state output. Since getting
the client details involves sending RPC requests to the respective
local bricks, this is a relatively more costly operation. These
client details would be added to the get-state output only if the
get-state command is invoked with the 'detail' option.

This commit therefore also changes the get-state CLI usage. The
modified usage is as follows:

 # gluster get-state [] [[odir ] \
[file ]] [detail]

Change-Id: I42cd4ef160f9e96d55a08a10d32c8ba44e4cd3d8
BUG: 1431183
Signed-off-by: Samikshan Bairagya 
Reviewed-on: https://review.gluster.org/17003
NetBSD-regression: NetBSD Build System 
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Atin Mukherjee

rpc: add options to manage socket keepalive lifespan

2017-04-12T09:44:01+00:00

Problem:
Default values for handling socket timeouts for brick responses are
insufficient for aggressive applications such as databases.

Solution:
Add 1:1 gluster options for keepalive, keepalive-idle,
keepalive-interval and keepalive-timeout, mirroring the socket-level
options described in the tcp(7) man page.

Default values for options are NOT aggressive and continue to be values
which result in default timeout when only the keep alive option is
turned on.

These options are Linux specific and will not be applicable to the
*BSDs.
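
For reference, a Linux-only sketch of the socket-level calls such options
would typically drive. apply_keepalive() is a hypothetical helper, and the
mapping of keepalive-timeout to TCP_USER_TIMEOUT is an assumption based on
tcp(7), not something this message states:

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Linux only. Returns 0 on success, -1 on the first setsockopt() failure.
 * idle_s and interval_s are in seconds, timeout_ms in milliseconds. */
static int apply_keepalive(int fd, int idle_s, int interval_s,
                           unsigned int timeout_ms)
{
    int on = 1;

    assert(fd >= 0);
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s,
                   sizeof(interval_s)) != 0)
        return -1;
    /* Assumed mapping for keepalive-timeout. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &timeout_ms,
                   sizeof(timeout_ms)) != 0)
        return -1;
    return 0;
}
```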

Change-Id: I2a08ecd949ca8ceb3e090d336ad634341e2dbf14
BUG: 1426059
Signed-off-by: Milind Changire 
Reviewed-on: https://review.gluster.org/16731
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Raghavendra G

build: place generated XDR .h and .c files under $(top_builddir)

2017-04-05T01:27:49+00:00

Change-Id: I0487337223a54a52e73088cb6dd812ce6d47178d
BUG: 1429696
Signed-off-by: Niels de Vos 
Reviewed-on: https://review.gluster.org/16994
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Zhou Zhengping 
Tested-by: Zhou Zhengping 
Reviewed-by: Kaleb KEITHLEY

glusterd : Disallow peer detach if snapshot bricks exist on it

2017-04-01T01:53:10+00:00

Problem :
- Deploy gluster on 2 nodes, one brick each, one volume replicated
- Create a snapshot
- Lose one server
- Add a replacement peer and new brick with a new IP address
- replace-brick the missing brick onto the new server
  (wait for replication to finish)
- peer detach the old server
- After the above steps, glusterd fails to restart.

Solution:
  With the fix, peer detach fails with the error: "N2 is part of
  existing snapshots. Remove those snapshots before proceeding".
  This forces the user either to keep that peer or to delete
  all snapshots.
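
The check amounts to refusing the detach when any snapshot brick lives on
the peer. A minimal sketch, assuming a hypothetical flattened list of
snapshot brick hosts (the glusterd data structures are not shown here):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* snap_brick_hosts: host name of every brick in every snapshot
 * (hypothetical flattened view). Returns 1 if detaching peer is
 * allowed, 0 if a snapshot brick still resides on it. */
static int peer_detach_allowed(const char *const *snap_brick_hosts,
                               size_t n, const char *peer)
{
    assert(peer != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(snap_brick_hosts[i], peer) == 0)
            return 0;
    return 1;
}
```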

Change-Id: I3699afb9b2a5f915768b77f885e783bd9b51818c
BUG: 1322145
Signed-off-by: Gaurav Yadav 
Reviewed-on: https://review.gluster.org/16907
Smoke: Gluster Build System 
Reviewed-by: Atin Mukherjee 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System