glusterfs.git/rpc/rpc-lib, branch v3.9.0

Revert "rpc: Fix the race between notification and reconnection"

2016-11-12T05:06:49+00:00

This reverts commit e6c38ae1d3f3c53f8739ab2db7c4ecfdbc58fc44.

Mount is intermittently failing with this patch, which means
this patch introduced another race. Until that race is fixed,
revert this patch and make the release.

BUG: 1388323
Change-Id: I6e110caf38fcc6a309b2abdc864bc4fbdb3a7588
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/15825
Reviewed-by: Raghavendra Talur 
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System

rpc: Fix the race between notification and reconnection

2016-10-25T11:00:46+00:00

Problem:
There was a hang because unlock on an entry failed with
ENOTCONN.
Client thinks the connection is down whereas server thinks
the connection is up.

This is the race we are seeing:
1) Connection from client to the brick disconnects.
2) Saved frames unwind is called which unwinds all
   frames that were wound before disconnect.
3) Connection from client to the brick is re-established
   and setvolume happens.
4) Disconnect notification for the connection in 1)
   comes now and calls client_rpc_notify() which
   marks the connection to be offline even when the
   connection is up.

This is happening because I/O can retrigger connection
before disconnect notification is sent to the higher
layers in rpc.

Fix:
Notify the higher layers that a disconnect happened and then
go ahead with reconnect logic.
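
The ordering problem can be illustrated with a toy model. This is only a
sketch, not the rpc-lib code: the Client class, its notify() method and the
event names are hypothetical stand-ins for client_rpc_notify() and the
transport events. It assumes the higher layer simply records the last event
it was told about.

```python
class Client:
    """Toy model of the client's view of one brick connection."""

    def __init__(self):
        self.connected = True

    def notify(self, event):
        # Hypothetical stand-in for client_rpc_notify(): the client
        # believes whatever the most recent notification said.
        self.connected = (event == "CONNECT")


def run(events):
    """Deliver events in order and return the client's final view."""
    client = Client()
    for event in events:
        client.notify(event)
    return client.connected


# Racy order: reconnect and setvolume finish before the stale disconnect
# notification for the old connection is delivered.
racy = ["CONNECT", "DISCONNECT"]

# Order intended by the fix: notify the disconnect first, then reconnect.
fixed = ["DISCONNECT", "CONNECT"]
```

With the racy order the model ends up marked offline although the last real
event was a successful reconnect; with the fixed order it ends up online.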

For the logs which point to the information above check:
https://bugzilla.redhat.com/show_bug.cgi?id=1386626#c1

Thanks to Raghavendra G for suggesting the correct fix.

 >BUG: 1386626
 >Change-Id: I3c84ba1f17010bd69049fa88ec5f0ae431f8cda9
 >Signed-off-by: Pranith Kumar K 
 >Reviewed-on: http://review.gluster.org/15681
 >NetBSD-regression: NetBSD Build System 
 >Reviewed-by: Niels de Vos 
 >CentOS-regression: Gluster Build System 
 >Smoke: Gluster Build System 
 >Reviewed-by: Raghavendra G 
 >(cherry picked from commit a6b63e11b7758cf1bfcb67985e25ec02845f0995)

Change-Id: Ifa721193c26b70e26b47b7698c077da0ad5f2e1d
BUG: 1388323
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/15717
NetBSD-regression: NetBSD Build System 
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System

compound fops: Fix file corruption issue

2016-10-25T04:14:54+00:00

        Backport of: http://review.gluster.org/#/c/15654/

1. Address of a local variable @args is copied into state->req
in server3_3_compound (). But even after the function has gone out of
scope, in server_compound_resume () this pointer is accessed and
dereferenced. This patch fixes that.

2. Compound fops, by virtue of NOT having a vector sizer (like the one
writev has), ends up having both the header and the data (in case one of
its member fops is WRITEV) in the same hdr_iobuf. This buffer was not
being preserved through the lifetime of the compound fop, causing it to
be overwritten by a parallel write fop, even when the writev associated
with the currently executing compound fop is yet to hit the disk, thereby
corrupting the file's data. This is fixed by associating the hdr_iobuf with
the iobref so its memory remains valid through the lifetime of the fop.

3. Also fixed a use-after-free bug in protocol/client in compound fops cbk,
missed by Linux but caught by NetBSD.
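
Point 2 can be modelled with a small reference-counted buffer pool. This is
a sketch under stated assumptions, not the GlusterFS iobuf code: IOBuf, Pool
and receive() are hypothetical names, and the pool is assumed to hand a
buffer out again as soon as its reference count drops to zero.

```python
class IOBuf:
    """Toy pooled buffer: goes back to the pool when its refcount hits zero."""

    def __init__(self, pool):
        self.pool = pool
        self.data = b""
        self.refcount = 0

    def ref(self):
        self.refcount += 1

    def unref(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.pool.free.append(self)


class Pool:
    """Toy pool that prefers to reuse a free buffer over allocating one."""

    def __init__(self):
        self.free = []

    def get(self, data):
        buf = self.free.pop() if self.free else IOBuf(self)
        buf.data = data
        buf.ref()
        return buf


def receive(pool, payload, keep_in_iobref):
    """Model receiving one request. The transport holds one reference and
    drops it after hand-off; keep_in_iobref=True models the fix, where the
    fop keeps its own reference to the header buffer."""
    buf = pool.get(payload)
    if keep_in_iobref:
        buf.ref()
    buf.unref()
    return buf
```

Without the extra reference, a second receive() reuses the same buffer and
overwrites the payload of the first request. With it, the second request
gets a different buffer and the first payload stays intact until the fop
drops its reference.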

Finally, big thanks to Pranith Kumar K and Raghavendra Gowdappa for their
help in debugging this file corruption issue.

Change-Id: I58da39ae544ad81192849926399a971c4c01c986
BUG: 1387984
Signed-off-by: Krutika Dhananjay 
Reviewed-on: http://review.gluster.org/15709
NetBSD-regression: NetBSD Build System 
Reviewed-by: Raghavendra G 
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System

rpc: increase RPC/XID with each callback

2016-09-21T03:11:47+00:00

The RPC/XID for callbacks has been hardcoded to GF_UNIVERSAL_ANSWER. In
Wireshark these RPC-calls are marked as "RPC retransmissions" because of
the repeating RPC/XID. This is most confusing when verifying the
callbacks that the upcall framework sends. There is no way to see the
difference between real retransmissions and new callbacks.

This change was verified by creating and removing files through
different Gluster clients. The RPC/XID is increased on a per-connection
(or per-client) basis. The expectations of the RPC protocol are met this way.
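
A minimal sketch of a per-connection callback XID counter follows. The
Connection class and next_callback_xid() are hypothetical names, not the
rpc-lib API. The sketch assumes the counter wraps at 32 bits, since the
RPC XID is a 32-bit unsigned field; real code would also need a lock or an
atomic increment.

```python
class Connection:
    """Hypothetical per-connection state holding the next callback XID."""

    XID_MASK = 0xFFFFFFFF  # the RPC XID is a 32-bit unsigned value

    def __init__(self, start=1):
        self._next_xid = start & self.XID_MASK

    def next_callback_xid(self):
        """Return the XID for the next callback and advance the counter."""
        xid = self._next_xid
        self._next_xid = (self._next_xid + 1) & self.XID_MASK
        return xid
```

Each connection counts independently, so two clients may both see XID 1,
but no single connection repeats an XID until the counter wraps.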

> Change-Id: I2116bec0e294df4046d168d8bcbba011284cd0b2
> BUG: 1377097
> Signed-off-by: Niels de Vos 
> Reviewed-on: http://review.gluster.org/15524
> Smoke: Gluster Build System 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Raghavendra G 
(cherry picked from commit e9b39527d5dcfba95c4c52a522c8ce1f4512ac21)

Change-Id: I2116bec0e294df4046d168d8bcbba011284cd0b2
BUG: 1377288
Signed-off-by: Niels de Vos 
Reviewed-on: http://review.gluster.org/15527
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Raghavendra G

glusterd : Introduce reset brick

2016-08-30T02:55:53+00:00

The command allows a replace-brick operation in which the source and
destination bricks are the same.

Usage:
gluster v reset-brick   start
This command kills the brick to be reset. Once this command is run,
admin can do other manual operations that they need to do,
like configuring some options for the brick. Once this is done,
resetting the brick can be continued with the following options.

gluster v reset-brick    commit {force}

Does the job of resetting the brick. 'force' option should be used
when the brick already contains volinfo id.

Problem: On doing a disk-replacement of a brick in a replicate volume
the following 2 scenarios may occur :

a) there is a chance that reads are served from this replaced-disk brick,
   which leads to empty reads.
b) potential data loss if next writes succeed only on replaced brick, and
   heal is done to other bricks from this one.

Solution: After disk-replacement, make sure that reset-brick command is
run for that brick so that pending markers are set for the brick and it
is not chosen as source for reads and heal. But, as of now replace-brick
for the same brick-path is not allowed. In order to fix the above
mentioned problem, same brick-path replace-brick is needed.
With this patch reset-brick commit {force} will be allowed even when
source and destination  are identical as long as
1) destination brick is not alive
2) source and destination brick have the same brick uuid and path.
Also, the destination brick after replace-brick will use the same port
as the source brick.
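
The two conditions above can be written as a single predicate. This is an
illustrative sketch only: the Brick type and same_brick_commit_allowed()
are hypothetical and do not reflect glusterd's actual data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Brick:
    """Hypothetical minimal description of a brick."""
    uuid: str
    path: str
    alive: bool


def same_brick_commit_allowed(src, dst):
    """Allow a same-brick commit only if the destination is not running
    and both bricks have the same uuid and path."""
    return (not dst.alive) and src.uuid == dst.uuid and src.path == dst.path
```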

Change-Id: I440b9e892ffb781ea4b8563688c3f85c7a7c89de
BUG: 1266876
Signed-off-by: Anuradha Talur 
Reviewed-on: http://review.gluster.org/12250
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Ashish Pandey 
Reviewed-by: Pranith Kumar Karampuri

rpc: fix unused variable warnings/errors

2016-08-29T12:00:49+00:00

http://review.gluster.org/14085 fixes a/the "leak" - via the
generated rpc/xdr headers - of pragmas that mask these warnings.

However 14085 won't pass the smoke test until all the warnings are
fixed.

Change-Id: I20d91091bee0bf8f198a307ebba4b284bc3817ff
BUG: 1369124
Signed-off-by: Kaleb S. KEITHLEY 
Reviewed-on: http://review.gluster.org/15240
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Smoke: Gluster Build System

glusterd/cli: cli to get local state representation from glusterd

2016-08-26T15:23:37+00:00

Currently there is no CLI that can be used to get the
local state representation of the cluster as maintained in glusterd
in a readable as well as parseable format.

The CLI added has the following usage:

 # gluster get-state [daemon] [odir ] [file ]

This would dump data points that reflect the local state
representation of the cluster as maintained in glusterd (no other
daemons are supported as of now) to a file inside the specified
output directory. The default output directory and filename is
/var/run/gluster and glusterd_state_ respectively. The
option for specifying the daemon name leaves room to add support for
other daemons in the future. Following are the data points captured
as of now to represent the state from the local glusterd pov:

 * Peer:
    - Primary hostname
    - uuid
    - state
    - connection status
    - List of hostnames

 * Volumes:
    - name, id, transport type, status
    - counts: bricks, snap, subvol, stripe, arbiter, disperse,
      redundancy
    - snapd status
    - quorum status
    - tiering related information
    - rebalance status
    - replace bricks status
    - snapshots

 * Bricks:
    - Path, hostname (this info will be shown for all bricks)
    - port, rdma port, status, mount options, filesystem type and
      signed in status for bricks running locally.

 * Services:
    - name, online status for initialised services

 * Others:
    - Base port, last allocated port
    - op-version
    - MYUUID

Change-Id: I4a45cc5407ab92d8afdbbd2098ece851f7e3d618
BUG: 1353156
Signed-off-by: Samikshan Bairagya 
Reviewed-on: http://review.gluster.org/14873
Reviewed-by: Avra Sengupta 
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Atin Mukherjee

rpc: fixing illegal memory access

2016-08-11T13:19:58+00:00

CID:1357876

Change-Id: I34eee0bf0367f963ddf6e9d6b28b7d3a9af12c90
BUG: 789278
Signed-off-by: Muthu-vigneshwaran 
Reviewed-on: http://review.gluster.org/15094
Tested-by: Muthu Vigneshwaran
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Smoke: Gluster Build System 
Reviewed-by: Jeff Darcy

changelog/rpc: Fix rpc_clnt_t mem leaks

2016-07-22T15:12:52+00:00

PROBLEM:
   1. Freeing up the rpc_clnt object might lead to crashes. Until
      now it was not necessary to free the rpc-clnt object
      because all the existing use cases need to reconnect
      on disconnect. Hence the timer code was not taking
      a ref on the rpc-clnt object.

      Glusterd had some use-cases that led to crash due to
      ping-timer and they fixed only those code paths that
      involve ping-timer.

      Now, since changelog has a use case where rpc-clnt
      needs to be freed up, we need to fix timer code to take
      refs.

   2. In changelog, because of issue 1, only mydata was being
      freed which is incorrect. And there are races where
      rpc-clnt object would access the freed mydata which
      would lead to crashes.

      Since changelog xlator resides on brick side and is a long
      living process, if multiple libgfchangelog consumers
      register to changelog and disconnect/reconnect multiple
      times, it would result in a leak of the 'rpc-clnt' object
      for every connect/disconnect.

SOLUTION:
   1. Handle ref/unref of 'rpc_clnt' structure in timer
      functions properly.
   2. In changelog, unref 'rpc_clnt' in RPC_CLNT_DISCONNECT
      after disabling timers and free mydata on RPC_CLNT_DESTROY.

RPC SETUP IN CHANGELOG:
   1. changelog xlator initiates rpc server say 'changelog_rpc_server'
   2. libgfchangelog initiates one rpc server say 'libgfchangelog_rpc_server'
   3. libgfchangelog initiates rpc client and connects to 'changelog_rpc_server'
   4. In return changelog_rpc_server initiates an rpc client and connects back
      to 'libgfchangelog_rpc_server'

REF/UNREF HANDLING IN TIMER FUNCTIONS:
Let's say rpc clnt refcount = 1
   1. Take the ref before registering callback to timer queue
           >>>>  rpc_clnt_ref (say ref count becomes = 2)
   2. Register a callback to timer say 'callback1'
   3. If register fails:
           >>>> rpc_clnt_unref (ref count = 1)
   4. On timer expiration, 'callback1' gets called. So unref rpc clnt at the end
      in 'callback1'. This is corresponding to ref taken in step 1
           >>>> rpc_clnt_unref (ref count = 1)
   5. The cycle from step-1 to step-4 continues....until timer cancel event happens
   6. timer cancel of say 'callback1'
           If timer cancel fails:
                 Do nothing, Step-4 would have unrefd
           If timer cancel succeeds:
                 >>>> rpc_clnt_unref (ref count = 1)
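
The ref/unref steps above can be sketched as a small model. This is not the
rpc-lib timer code: RpcClnt, TimerQueue, schedule_timer() and
cancel_timer() are hypothetical names, and the timer queue is assumed to
report whether registration or cancellation succeeded.

```python
class RpcClnt:
    """Toy refcounted client object; starts with one reference."""

    def __init__(self):
        self.refcount = 1
        self.destroyed = False

    def ref(self):
        self.refcount += 1

    def unref(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.destroyed = True


class TimerQueue:
    """Toy timer queue; accept=False simulates a failed registration."""

    def __init__(self, accept=True):
        self.accept = accept
        self.pending = []

    def register(self, callback):
        if not self.accept:
            return False
        self.pending.append(callback)
        return True

    def cancel(self, callback):
        if callback in self.pending:
            self.pending.remove(callback)
            return True
        return False  # already fired, nothing to cancel

    def expire(self):
        callbacks, self.pending = self.pending, []
        for callback in callbacks:
            callback()


def make_callback(clnt):
    def callback1():
        # ... timer work would go here ...
        clnt.unref()  # step 4: drop the ref taken in step 1
    return callback1


def schedule_timer(clnt, timers, callback):
    clnt.ref()                         # step 1
    if not timers.register(callback):  # step 2
        clnt.unref()                   # step 3: registration failed
        return False
    return True


def cancel_timer(clnt, timers, callback):
    if timers.cancel(callback):        # step 6: cancel succeeded
        clnt.unref()
    # If cancel failed, the callback itself has dropped the reference.
```

In every path the reference taken in step 1 is dropped exactly once, so the
count returns to its starting value and the object is never destroyed while
a timer can still reach it.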

Change-Id: I91389bc511b8b1a17824941970ee8d2c29a74a09
BUG: 1316178
Signed-off-by: Kotresh HR 
Reviewed-on: http://review.gluster.org/13658
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Raghavendra G

rpc: fix several problems in failure handle logic

2016-07-14T21:14:28+00:00

Once dynstr is set into a dict by function dict_set_dynstr, its free
operation will be called by this dict when the dict is destroyed.
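
The ownership rule can be illustrated with a toy model. This is a sketch,
not the libglusterfs dict implementation: Heap, Dict, set_dynstr() and the
two cleanup functions are hypothetical, and the toy allocator is assumed to
raise on a double free instead of corrupting memory.

```python
class Heap:
    """Toy allocator that tracks live pointers and rejects a double free."""

    def __init__(self):
        self.live = set()
        self._next = 0

    def alloc(self):
        self._next += 1
        self.live.add(self._next)
        return self._next

    def free(self, ptr):
        if ptr not in self.live:
            raise RuntimeError("double free")
        self.live.remove(ptr)


class Dict:
    """Toy dict that owns every dynstr stored in it."""

    def __init__(self, heap):
        self.heap = heap
        self.owned = {}

    def set_dynstr(self, key, ptr):
        self.owned[key] = ptr  # ownership moves to the dict

    def destroy(self):
        for ptr in self.owned.values():
            self.heap.free(ptr)
        self.owned.clear()


def cleanup_buggy(heap, d, ptr):
    heap.free(ptr)  # wrong: the dict already owns ptr
    d.destroy()     # frees ptr a second time


def cleanup_correct(heap, d, ptr):
    d.destroy()     # the dict frees ptr exactly once
```

A failure path that frees the string itself and then destroys the dict
frees the same memory twice; once the set has succeeded, only the dict
should release it.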

Signed-off-by: Zhou Zhengping 

Change-Id: Idd2bd19a041bcb477e1c897428ca1740fb75c5f3
BUG: 1354141
Reviewed-on: http://review.gluster.org/14882
Tested-by: Zhou Zhengping 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Vijay Bellur 
Smoke: Gluster Build System