glusterfs.git/rpc/rpc-transport/socket/src/socket.c, branch v3.7.16

socket: log the client identifier in ssl connect

2016-08-04T05:10:26+00:00

Backport of commit d308fb5e152d8c908bf4f5da81f553fbe3d0400a

> Change-Id: I4b463ecafb66de16cbe7ed23fae800bb1204f829
> BUG: 1333912
> Signed-off-by: Raghavendra Bhat 
> Reviewed-on: http://review.gluster.org/14242
> Tested-by: Vijay Bellur 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Jeff Darcy 
> Smoke: Gluster Build System 
> (cherry picked from commit d308fb5e152d8c908bf4f5da81f553fbe3d0400a)

Change-Id: I2a57a206edab3e0c05ce28c299d78264c9a33d8b
BUG: 1351933
Signed-off-by: Mohit Agrawal 
Reviewed-on: http://review.gluster.org/14844
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Smoke: Gluster Build System 
Reviewed-by: Prashanth Pai 
Reviewed-by: Atin Mukherjee

rpc/socket: pthread resources are not cleaned up

2016-07-27T10:12:58+00:00

A socket_connect failure creates a new pthread which
is not a detached thread. As no pthread_join is called,
the thread resources are not cleaned up, causing a memory leak.

Now, socket_connect creates a detached thread to handle failure.
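
For illustration, a minimal sketch of creating a detached thread with plain
pthreads. The names spawn_detached and mark_done are hypothetical and are not
taken from socket.c; the actual patch may structure this differently.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical worker used only for this sketch: sets a flag and exits. */
static void *mark_done(void *arg)
{
    atomic_store((atomic_int *)arg, 1);
    return NULL;
}

/* Hypothetical helper: start fn(arg) in a detached thread.
 * Returns 0 on success or the error number reported by pthreads. */
static int spawn_detached(void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    pthread_t tid;
    int ret;

    ret = pthread_attr_init(&attr);
    if (ret != 0)
        return ret;

    /* A detached thread releases its resources when it exits, so no
     * pthread_join() is needed (and none may be called on it). */
    ret = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (ret == 0)
        ret = pthread_create(&tid, &attr, fn, arg);

    pthread_attr_destroy(&attr);
    return ret;
}
```

A caller would use spawn_detached(mark_done, &flag) and never join the thread.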

> Change-Id: Idbf25d312f91464ae20c97d501b628bfdec7cf0c
> BUG: 1343374
> Signed-off-by: N Balachandran 
> Reviewed-on: http://review.gluster.org/14875
> Smoke: Gluster Build System 
> Reviewed-by: Atin Mukherjee 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Jeff Darcy 
(cherry picked from commit 9886d568a7a8839bf3acc81cb1111fa372ac5270)

Change-Id: If0a65c50fef2a32148cf3a1d7992e63f044bf0ad
BUG: 1360553
Signed-off-by: N Balachandran 
Reviewed-on: http://review.gluster.org/15019
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Tested-by: Oleksandr Natalenko 
Reviewed-by: Raghavendra G

rpc/socket.c : Modify socket_poller code in case of ENODATA error code.

2016-07-26T17:07:28+00:00

Problem:  Polling failure errors are logged repeatedly until the volume
          comes up when SSL is enabled.

Solution: To avoid these messages, update one condition in the socket_poller
          code so that the thread does not exit when ENODATA is received
          from the ssl_do function.
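
A rough sketch of the idea, assuming a polling loop that treats ENODATA as
"nothing to read yet" rather than as a fatal error. The real ssl_do signature
is not shown in this log, so io_step_fn, scripted_step and poll_loop are
hypothetical stand-ins.

```c
#include <errno.h>

#ifndef ENODATA
#define ENODATA 61 /* fallback so the sketch compiles where ENODATA is missing */
#endif

/* Hypothetical stand-in for one I/O step of the poller.
 * Returns 0 on success, -1 with errno set on failure. */
typedef int (*io_step_fn)(void *ctx);

/* Scripted step for demonstration: replays a list of errno values,
 * where 0 means "this step succeeds". */
struct script {
    const int *errs;
    int len;
    int pos;
};

static int scripted_step(void *ctx)
{
    struct script *s = ctx;
    int e = (s->pos < s->len) ? s->errs[s->pos] : 0;

    s->pos++;
    if (e == 0)
        return 0;
    errno = e;
    return -1;
}

/* Run at most max_iters steps and return how many succeeded.
 * ENODATA keeps the loop alive; any other error ends it. */
static int poll_loop(io_step_fn step, void *ctx, int max_iters)
{
    int ok = 0;

    for (int i = 0; i < max_iters; i++) {
        if (step(ctx) == 0) {
            ok++;
            continue;
        }
        if (errno == ENODATA)
            continue; /* no data yet: not fatal, poll again */
        break;        /* real failure: leave the loop */
    }
    return ok;
}
```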

Backport of commit 84e9fc2fb5fabf9d1e553a420854a306cdb8a168

> Change-Id: Ia514e99b279b07b372ee950f4368ac0d9c702d82
> BUG: 1349709
> Signed-off-by: Mohit Agrawal 
> Reviewed-on: http://review.gluster.org/14786
> Smoke: Gluster Build System 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Jeff Darcy 
> (cherry picked from commit 84e9fc2fb5fabf9d1e553a420854a306cdb8a168)

BUG: 1359651
Signed-off-by: Mohit Agrawal 

Change-Id: I86aa9955eca13d23120ba17b787f619c7de6be0c
Reviewed-on: http://review.gluster.org/14998
Tested-by: MOHIT AGRAWAL 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Smoke: Gluster Build System 
Reviewed-by: Prashanth Pai 
Reviewed-by: Jeff Darcy

rpc/socket.c: Modify approach to cleanup threads of socket_poller in socket_spawn.

2016-07-26T11:42:18+00:00

Problem: The current approach to cleaning up socket_poller threads is not appropriate.

Solution: Set the detach flag at thread creation time in socket_spawn.

Fix: Add a new wrapper (gf_create_detach_thread) that creates a detached thread
         instead of storing thread IDs in a queue.

Test: The fix was verified on a gluster process using the following
      procedure:
      Enable the client.ssl and server.ssl options on the volume.
      Start the volume and count the anon segments in the pmap output for the glusterd process:
      pmap -x  | grep "\[ anon \]" | wc -l
      Stop the volume and count the anon segments again; the count should not increase.

Backport of commit 2ee48474be32f6ead2f3834677fee89d88348382

> Signed-off-by: Mohit Agrawal 
> Change-Id: Ib8f7ec7504ec8f6f74b45ce6719b6fb47f9fdc37
> BUG: 1336508
> Reviewed-on: http://review.gluster.org/14694
> Smoke: Gluster Build System 
> Reviewed-by: Atin Mukherjee 
> CentOS-regression: Gluster Build System 
> NetBSD-regression: NetBSD Build System 
> Reviewed-by: Jeff Darcy 

BUG: 1354394
Change-Id: I271e83e7a210ecd27a7471c53147ceb837a33cad
Signed-off-by: Mohit Agrawal 
Reviewed-on: http://review.gluster.org/14886
CentOS-regression: Gluster Build System 
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Raghavendra G

rpc: invalid argument when function setsockopt sets option TCP_USER_TIMEOUT

2016-07-12T12:07:16+00:00

If option "transport.tcp-user-timeout" has not been set, glusterd's
priv->timeout will be -1, which causes an invalid argument error when
setting TCP_USER_TIMEOUT.
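
A minimal sketch of the guard, assuming the timeout is configured in seconds
and that a negative value means "not configured". apply_user_timeout is a
hypothetical helper, not the function used in socket.c.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: apply a TCP user timeout given in seconds.
 * A negative value is treated as "option not set" and skipped, which
 * avoids handing setsockopt() a negative value.
 * Returns 0 on success or skip, -1 with errno set on failure. */
static int apply_user_timeout(int fd, int timeout_sec)
{
    if (timeout_sec < 0)
        return 0;

#ifdef TCP_USER_TIMEOUT
    /* The kernel expects the timeout in milliseconds. */
    unsigned int ms = (unsigned int)timeout_sec * 1000u;

    return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &ms, sizeof(ms));
#else
    (void)fd; /* option not available on this platform */
    return 0;
#endif
}
```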

Cherry picked from commit b2c73cbf423de6201f956f522b7429615c88869d:
> Change-Id: Ibc16264ceac0e69ab4a217ffa27c549b9fa21df9
> BUG: 1349657
> Signed-off-by: Zhou Zhengping 
> Reviewed-on: http://review.gluster.org/14785
> CentOS-regression: Gluster Build System 
> Smoke: Gluster Build System 
> NetBSD-regression: NetBSD Build System 
> Reviewed-by: Jeff Darcy 

Change-Id: Ibc16264ceac0e69ab4a217ffa27c549b9fa21df9
BUG: 1354404
Signed-off-by: Niels de Vos 
Reviewed-on: http://review.gluster.org/14889
Smoke: Gluster Build System 
Reviewed-by: Zhou Zhengping 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System

socket: Fix incorrect handling of partial reads

2016-05-12T05:17:39+00:00

The usage of function local variables in the protocol state
machine caused an incorrect behaviour when a partial read
from the socket forced the function to return and restart
later when more data was available. At this point the local
variables contained incorrect data.

> Change-Id: I4db1f4ef5c46a3d2d7f7c5328e906188c3af49e6
> BUG: 1334285
> Signed-off-by: Xavier Hernandez 
> Reviewed-on: http://review.gluster.org/14270
> Reviewed-by: Raghavendra G 
> Smoke: Gluster Build System 
> NetBSD-regression: NetBSD Build System 
> Tested-by: Raghavendra G 
> CentOS-regression: Gluster Build System 

Change-Id: I0465969f27a38912a1b2cd50f5c8ae61bc782e8c
BUG: 1331502
Signed-off-by: Xavier Hernandez 
Reviewed-on: http://review.gluster.org/14292
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Raghavendra G 
CentOS-regression: Gluster Build System

glusterd: add defence mechanism to avoid brick port clashes

2016-05-05T03:23:19+00:00

Intro:
Currently glusterd maintains a portmap registry that records which ports
in the range 49152 - 65535 are free to use. This registry is initialized
once and updated as and when glusterd sees that ports are being used.

Glusterd first looks in the portmap registry for a port marked FREE, then
checks whether that port is currently free using connect(), and then passes
it to the brick process, which has to bind to it.

Problem:
There is a time gap between glusterd checking the port with connect() and
the brick process actually binding to it. In this gap any other process
could occupy the port, in which case the brick fails to bind and exits.

Case 1:
To prevent a gluster client process from occupying the port supplied by glusterd:

The client port map range has been separated from the brick port map range; more at
http://review.gluster.org/#/c/13998/

Case 2: (Handled by this patch)
To prevent a foreign process from occupying the port supplied by glusterd:

To handle this situation, this patch implements a mechanism to return the
EADDRINUSE error code to glusterd, upon which glusterd allocates a new port
and tries to restart the brick process with the newly allocated port.
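
The retry idea can be sketched in a single function. In the patch the bind
happens in the brick process and the reallocation in glusterd; this example
collapses both into one hypothetical helper, bind_first_free, purely to show
the EADDRINUSE handling.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a TCP socket on 127.0.0.1 to the first
 * free port in [first, last]. On success returns the socket fd and
 * stores the port in *port; on failure returns -1 with errno set. */
static int bind_first_free(int first, int last, int *port)
{
    for (int p = first; p <= last; p++) {
        struct sockaddr_in addr;
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        int err;

        if (fd < 0)
            return -1;

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        addr.sin_port = htons((uint16_t)p);

        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
            *port = p;
            return fd;
        }

        err = errno;
        close(fd);
        if (err != EADDRINUSE) {
            errno = err;
            return -1; /* unrelated failure: do not retry */
        }
        /* Someone else took this port first: try the next one. */
    }

    errno = EADDRINUSE;
    return -1;
}
```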

Note: When glusterd restarts, i.e. with runner_run_nowait(), there is no way to
handle Case 2, because runner_run_nowait() does not wait for the return/exit
code of the executed command (brick process). Hence, as of now, in that case
we cannot know with what error the brick failed to connect.

This patch also fixes runner_end() to perform some cleanup w.r.t.
return values.

Backport of:
> Change-Id: Iec52e7f5d87ce938d173f8ef16aa77fd573f2c5e
> BUG: 1322805
> Signed-off-by: Prasanna Kumar Kalever 
> Reviewed-on: http://review.gluster.org/14043
> Tested-by: Prasanna Kumar Kalever 
> Reviewed-by: Atin Mukherjee 
> Smoke: Gluster Build System 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Raghavendra G 
> Signed-off-by: Prasanna Kumar Kalever 

Change-Id: Ief247b4d4538c1ca03e73aa31beb5fa99853afd6
BUG: 1323564
Signed-off-by: Prasanna Kumar Kalever 
Reviewed-on: http://review.gluster.org/14208
Tested-by: Prasanna Kumar Kalever 
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Raghavendra G

socket: Reap own-threads

2016-05-03T10:10:55+00:00

  Backport of f8948e2 from master

Dead own-threads are reaped periodically (currently every minute). This
helps avoid memory being leaked, and should help prevent memory
starvation issues with GlusterD.

Change-Id: Ifb3442a91891b164655bb2aa72210b13cee31599
BUG: 1268125
Signed-off-by: Kaushal M 
Reviewed-originally-on: http://review.gluster.org/14101
Reviewed-on: http://review.gluster.org/14143
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Jeff Darcy

socket: Don't cleanup encrypted transport in socket_connect()

2016-04-10T04:54:47+00:00

..instead cleanup only in socket_poller()

  Backport of be99ddd from master

With commit d117466 socket_poller() wasn't launched from socket_connect
(for encrypted connections), if connect() failed. This was done to
prevent the socket private data from being double unreffed, from the
cleanups in both socket_poller() and socket_connect(). This allowed
future reconnects to happen successfully.

Whether a socket reconnects is effectively decided by the registered rpc
notify function. The above change worked with glusterd, as the glusterd rpc
notify function (glusterd_peer_rpc_notify()) continuously allowed
reconnects on failure.

mgmt_rpc_notify(), the rpc notify function in glusterfsd, behaves
differently.

For a DISCONNECT event, if more volfile servers are available or if more
addresses are available in the dns cache, it allows reconnects. If not
it terminates the program.

For a CONNECT event, it attempts to do a volfile fetch rpc request. If
sending this rpc fails, it immediately terminates the program.

One side effect of commit d117466, was that the encrypted socket was
registered with epoll, unintentionally, on a connect failure.  A weird
thing happens because of this. The epoll notifier notifies
mgmt_rpc_notify() of a CONNECT event, instead of a DISCONNECT as
expected. This causes mgmt_rpc_notify() to attempt an unsuccessful
volfile fetch rpc request, and terminate.
(I still don't know why the epoll raises the CONNECT event)

Commit 46bd29e fixed some issues with IPv6 in GlusterFS. This caused
address resolution in GlusterFS to also request IPv6 addresses
(AF_UNSPEC) instead of just IPv4. On most systems, this causes the IPv6
addresses to be returned first.

GlusterD listens on 0.0.0.0:24007 by default. While this attaches to all
interfaces, it only listens on IPv4 addresses. GlusterFS daemons and
bricks are given 'localhost' as the volfile server. This resolves to
'::1' as the first address.
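
To see which address family a host resolves to first, something like the
following sketch can be used. resolve_families is a hypothetical helper; the
port string "24007" matches the GlusterD port mentioned above, and the result
order depends on the local resolver configuration.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: resolve host and count IPv4 and IPv6 results.
 * *first_family receives the family of the first result, which is the
 * one a client would normally try first.
 * Returns 0 on success or the getaddrinfo() error code. */
static int resolve_families(const char *host, int family,
                            int *n4, int *n6, int *first_family)
{
    struct addrinfo hints, *res = NULL, *ai;
    int ret;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = family;        /* AF_UNSPEC asks for IPv4 and IPv6 */
    hints.ai_socktype = SOCK_STREAM; /* one entry per address */

    ret = getaddrinfo(host, "24007", &hints, &res);
    if (ret != 0)
        return ret;

    *n4 = 0;
    *n6 = 0;
    *first_family = res ? res->ai_family : AF_UNSPEC;
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        if (ai->ai_family == AF_INET)
            (*n4)++;
        else if (ai->ai_family == AF_INET6)
            (*n6)++;
    }

    freeaddrinfo(res);
    return 0;
}
```

Calling it with "localhost" and AF_UNSPEC, then again with AF_INET, shows
whether the IPv6 address is preferred on a given system.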

When using management encryption, the above reasons cause the daemon
processes to fail to fetch volfiles and terminate.

Solution
--------
The solution to this is simple. Instead of cleaning up the encrypted
socket in socket_connect(), launch socket_poller() and let it cleanup
the socket instead. This prevents the unintentional registration with
epoll, and socket_poller() sends the correct events to the rpc notify
functions, which allows proper reconnects to happen.

Change-Id: Idb0c0a828743cccca51cfdd1aa6458cfa0a9d100
BUG: 1325491
Signed-off-by: Kaushal M 
Reviewed-on: http://review.gluster.org/13931
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Tested-by: Gluster Build System 
CentOS-regression: Gluster Build System

socket: Launch socket_poller only if connect succeeded

2016-03-09T09:10:06+00:00

  Backport of 92abe07 from master

For an encrypted connection, socket_connect() used to launch
socket_poller() in its own thread (ON by default), even if the connect
failed. This would cause two unrefs to be done on the transport, once in
socket_poller() and once in socket_connect(), causing the transport to
be freed and cleaned up. This would cause further reconnect attempts
to fail as the transport wouldn't be available.

By starting socket_poller() only if connect succeeded, this is avoided.

BUG: 1314641
Change-Id: Ifd1bc4d48a8bdf741e32d02bdbac91530e0e8111
Signed-off-by: Kaushal M 
Originally-reviewed-on: http://review.gluster.org/13554
Reviewed-on: http://review.gluster.org/13604
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Vijay Bellur 
CentOS-regression: Gluster Build System