glusterfs.git/tests/bugs/glusterd, branch release-3.11

libglusterfs : Fix crash in glusterd while peer probing

2017-06-13T14:20:32+00:00

glusterd crashes when a port is explicitly set to a range
that exceeds the range of the short data type.
E.g. sysctl net.ipv4.ip_local_reserved_ports="49152-49156"
In the above case glusterd crashes while parsing the port.

With this fix glusterd can handle port ranges
between INT_MIN and INT_MAX.
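
As an illustration only (this is not the glusterd code), the following
Python sketch shows why a port such as 49152 does not fit in a signed
16-bit short, and how a reserved-ports string can be parsed with plain
integers instead. The helper name parse_reserved_ports and the constant
SHRT_MAX are assumptions made for this example; the input format is the
comma-separated list of ports or "low-high" ranges that the sysctl accepts.

```python
import ctypes

SHRT_MAX = 32767  # largest value of a signed 16-bit short


def parse_reserved_ports(spec):
    """Parse a string such as "49152-49156,50000" into (low, high) pairs.

    Hypothetical helper: values stay plain integers, so ports above
    SHRT_MAX are kept without truncation.
    """
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        low_text, sep, high_text = part.partition("-")
        low = int(low_text)
        high = int(high_text) if sep else low
        if low > high:
            raise ValueError(f"invalid port range: {part!r}")
        ranges.append((low, high))
    return ranges


# Storing 49152 in a signed short wraps around to a negative number.
wrapped = ctypes.c_short(49152).value
parsed = parse_reserved_ports("49152-49156")
```

Here wrapped is negative, while parsed keeps both ends of the range
intact.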

> Reviewed-on: https://review.gluster.org/17359
> Smoke: Gluster Build System 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Samikshan Bairagya 
> Reviewed-by: Atin Mukherjee 
> Reviewed-by: Niels de Vos 
> Reviewed-by: Jeff Darcy 
Change-Id: I7c75ee67937b0e3384502973d96b1c36c89e0fe1
BUG: 1459759
Signed-off-by: Gaurav Yadav 
Reviewed-on: https://review.gluster.org/17496
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Samikshan Bairagya 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

glusterd: Don't spawn new glusterfsds on node reboot with brick-mux

2017-05-22T14:17:23+00:00

With brick multiplexing enabled, upon a node reboot new bricks were
not being attached to the first spawned brick process even though
there weren't any compatibility issues.

The reason for this is that upon glusterd restart after a node
reboot, since brick services aren't running, glusterd starts the
bricks in a "no-wait" mode. So after a brick process is spawned for
the first brick, there isn't enough time for the corresponding pid
file to get populated with a value before the compatibility check is
made for the next brick.

This commit solves this by iteratively waiting for the pidfile to be
populated in the brick compatibility comparison stage before checking
if the brick process is alive.
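
The waiting step can be pictured with a short Python sketch. It is a
simplified illustration under stated assumptions, not the glusterd
implementation: the names wait_for_pid and is_alive, the retry count and
the delay are all hypothetical.

```python
import os
import tempfile
import time


def wait_for_pid(pidfile, retries=5, delay=0.05):
    """Poll pidfile until it holds a pid; return the pid, or None on timeout."""
    for _ in range(retries):
        try:
            with open(pidfile) as f:
                text = f.read().strip()
        except FileNotFoundError:
            text = ""
        if text.isdigit():
            return int(text)
        time.sleep(delay)
    return None


def is_alive(pid):
    """Return True if a process with this pid exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks existence without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


# Demo: an empty pidfile yields None; once populated, the pid is returned.
with tempfile.TemporaryDirectory() as tmp:
    pidfile = os.path.join(tmp, "brick.pid")
    open(pidfile, "w").close()
    before = wait_for_pid(pidfile, retries=2, delay=0.01)
    with open(pidfile, "w") as f:
        f.write(f"{os.getpid()}\n")
    after = wait_for_pid(pidfile)
```

Checking liveness only after wait_for_pid returns a pid avoids treating
a not-yet-written pidfile as a dead brick process.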

> Reviewed-on: https://review.gluster.org/17307
> Reviewed-by: Atin Mukherjee 
> Smoke: Gluster Build System 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 

(cherry picked from commit 13e7b3b354a252ad4065f7b2f0f805c40a3c5d18)

Change-Id: Ibd1f8e54c63e4bb04162143c9d70f09918a44aa4
BUG: 1453086
Signed-off-by: Samikshan Bairagya 
Reviewed-on: https://review.gluster.org/17351
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Atin Mukherjee

Fixes quota aux mount failure

2017-05-17T23:26:20+00:00

The aux mount is created on the first limit/remove_limit/list command
and remains until the volume is stopped or deleted, or quota is
disabled, at which point we do a lazy unmount. If the process is
terminated uncleanly, the mount entry remains and we get a
(Transport disconnected) error on subsequent attempts to run quota
list/limit-usage/remove commands.

Second, there is a risk that an inadvertent rm -rf on
/var/run/gluster causes data loss for the user. Ideally, /var/run is
a temp path for application use, and removing it should not cause any
data loss to persistent storage.

Solution:
1) unmount the aux mount after each use.
2) clean stale mount before mounting, if any.

One caveat with doing mount/unmount on each command is that we cannot
use the same mount point for both list and limit commands.
The list command needs the mount to remain accessible in the cli
after the response from glusterd, so it could be unmounted by a
limit command executed in parallel (had we used the same mount point).
Hence we use separate mount points for list and limit commands.
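
The race described above can be modelled with a toy Python sketch. No
real mounts happen here: MountTable is a hypothetical stand-in for the
mount table, and the paths are made up for the example.

```python
class MountTable:
    """Toy stand-in for the mount table (no real mounts happen)."""

    def __init__(self):
        self.mounted = set()

    def mount(self, path):
        self.mounted.discard(path)  # clean a stale mount first, if any
        self.mounted.add(path)

    def unmount(self, path):
        self.mounted.discard(path)


def run_interleaved(list_path, limit_path):
    """Interleave a list and a limit command; report if list keeps its mount."""
    table = MountTable()
    table.mount(list_path)     # glusterd mounts for "list" and replies to the cli
    table.mount(limit_path)    # a parallel "limit" command mounts ...
    table.unmount(limit_path)  # ... and unmounts right after use
    usable = list_path in table.mounted  # the cli now needs its mount
    table.unmount(list_path)
    return usable


shared = run_interleaved("/tmp/example/vol0", "/tmp/example/vol0")
separate = run_interleaved("/tmp/example/vol0_list", "/tmp/example/vol0_limit")
```

With a shared path the list command loses its mount before the cli can
use it (shared is False); with separate paths it does not (separate is
True).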

>Reviewed-on: https://review.gluster.org/16938
>NetBSD-regression: NetBSD Build System 
>Smoke: Gluster Build System 
>Reviewed-by: Manikandan Selvaganesh 
>CentOS-regression: Gluster Build System 
>Reviewed-by: Raghavendra G 
>Reviewed-by: Atin Mukherjee 
>(cherry picked from commit 2ae4b4058691b324535d802f4e6d24cce89a10e5)

Change-Id: I4f9e39da2ac2b65941399bffb6440db8a6ba59d0
BUG: 1449775
Signed-off-by: Sanoj Unnikrishnan 
Reviewed-on: https://review.gluster.org/17240
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Atin Mukherjee

glusterd: Make reset-brick work correctly if brick-mux is on

2017-05-16T00:29:48+00:00

Reset brick currently kills the corresponding brick process.
However, with brick multiplexing enabled, stopping the brick
process would render all bricks attached to it unavailable. To
handle this correctly, we need to make sure that the brick process
is terminated only if brick-multiplexing is disabled. Otherwise,
we should send the GLUSTERD_BRICK_TERMINATE rpc to the respective
brick process to detach the brick that is to be reset.
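
The decision can be summarised in a few lines of Python. This is a
sketch of the branching only: the function name, the dictionary keys and
the returned action labels are hypothetical, and only the
GLUSTERD_BRICK_TERMINATE name comes from the description above.

```python
def stop_brick_for_reset(brick, brick_mux_enabled):
    """Return the action taken on `brick` as an (action, target) tuple."""
    if not brick_mux_enabled:
        # One process per brick: terminating it affects only this brick.
        return ("kill_process", brick["pid"])
    # Shared process: ask it to detach just this brick, leaving the
    # other bricks attached to the same process available.
    return ("send_rpc:GLUSTERD_BRICK_TERMINATE", brick["path"])


brick = {"pid": 4242, "path": "/bricks/example/b1"}
without_mux = stop_brick_for_reset(brick, brick_mux_enabled=False)
with_mux = stop_brick_for_reset(brick, brick_mux_enabled=True)
```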

> Reviewed-on: https://review.gluster.org/17128
> Smoke: Gluster Build System 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Atin Mukherjee 

(cherry picked from commit 74383e3ec6f8244b3de9bf14016452498c1ddcf0)

Change-Id: I69002d66ffe6ec36ef48af09b66c522c6d35ac58
BUG: 1449933
Signed-off-by: Samikshan Bairagya 
Reviewed-on: https://review.gluster.org/17245
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Atin Mukherjee

glusterd: socketfile & pidfile related fixes for brick multiplexing feature

2017-05-10T14:05:52+00:00

Problem: With brick multiplexing on, after restarting glusterd the CLI does
         not show the pid of all brick processes in all volumes.

Solution: With brick-mux on, all local brick processes communicate through one
          UNIX socket, but the current code (glusterd_brick_start) tries
          to communicate through a separate UNIX socket for each volume, whose
          path is derived from the brick name and volume name. Because of the
          multiplexing design only one UNIX socket is open, so this throws a
          poller error and the CLI cannot fetch the correct status of the
          brick process.
          To resolve the problem, add a new function glusterd_set_socket_filepath_for_mux
          that is called by glusterd_brick_start to validate the existence of the socket path.
          To avoid continuous EPOLLERR errors in the logs, update the socket_connect code.

Test:     To reproduce the issue, follow the steps below
          1) Create two distributed volumes (dist1 and dist2)
          2) Set cluster.brick-multiplex to on
          3) Kill glusterd
          4) Run the command gluster v status
          After applying the patch, it shows the correct pid for all volumes

> BUG: 1444596
> Change-Id: I5d10af69dea0d0ca19511f43870f34295a54a4d2
> Signed-off-by: Mohit Agrawal 
> Reviewed-on: https://review.gluster.org/17101
> Smoke: Gluster Build System 
> Reviewed-by: Prashanth Pai 
> NetBSD-regression: NetBSD Build System 
> CentOS-regression: Gluster Build System 
> Reviewed-by: Atin Mukherjee 
> (cherry picked from commit 21c7f7baccfaf644805e63682e5a7d2a9864a1e6)

Change-Id: Ia95b9d36e50566b293a8d6350f8316dafc27033b
BUG: 1449004
Signed-off-by: Mohit Agrawal 
Reviewed-on: https://review.gluster.org/17212
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Atin Mukherjee 
Reviewed-by: Prashanth Pai 
CentOS-regression: Gluster Build System

glusterd : Disallow peer detach if snapshot bricks exist on it

2017-04-01T01:53:10+00:00

Problem :
- Deploy gluster on 2 nodes, one brick each, one volume replicated
- Create a snapshot
- Lose one server
- Add a replacement peer and new brick with a new IP address
- replace-brick the missing brick onto the new server
  (wait for replication to finish)
- peer detach the old server
- After the above steps, glusterd fails to restart.

Solution:
  With the fix, peer detach fails with the error: "N2 is part of
  existing snapshots. Remove those snapshots before proceeding".
  This forces the user to either keep that peer or delete
  all snapshots.
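
  A minimal Python sketch of such a pre-detach check follows. It is not
  the glusterd code: the function name check_peer_detach and the
  snapshot data layout are assumptions for illustration; only the error
  text is taken from the description above.

```python
def check_peer_detach(peer, snapshots):
    """Return an error string if `peer` hosts any snapshot brick, else None."""
    for snap in snapshots:
        for brick_host in snap["brick_hosts"]:
            if brick_host == peer:
                return (f"{peer} is part of existing snapshots. "
                        "Remove those snapshots before proceeding")
    return None


snapshots = [{"name": "snap1", "brick_hosts": ["N1", "N2"]}]
blocked = check_peer_detach("N2", snapshots)
allowed = check_peer_detach("N3", snapshots)
```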

Change-Id: I3699afb9b2a5f915768b77f885e783bd9b51818c
BUG: 1322145
Signed-off-by: Gaurav Yadav 
Reviewed-on: https://review.gluster.org/16907
Smoke: Gluster Build System 
Reviewed-by: Atin Mukherjee 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System

rpc: bump up conn->cleanup_gen in rpc_clnt_reconnect_cleanup

2017-03-20T23:34:16+00:00

Commit 086436a introduced a generation number (cleanup_gen) to ensure that
the rpc layer doesn't end up cleaning up the connection object if the
application layer has already destroyed it. cleanup_gen was bumped
only in rpc_clnt_connection_cleanup (). However, the same is needed
in rpc_clnt_reconnect_cleanup () too: without it, if the object gets destroyed
through the reconnect event in the application layer, the rpc layer will
still try to delete the object, resulting in a double free
and a crash.
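
The generation-number idea can be sketched in Python as follows. This is
a simplified model, not the rpc code: the Connection class, the frees
counter and handle_event are hypothetical, and "freeing" is only counted
rather than performed.

```python
class Connection:
    """Toy connection object; `frees` counts how often it was "freed"."""

    def __init__(self):
        self.cleanup_gen = 0
        self.frees = 0

    def cleanup(self, bump_gen=True):
        """Application-side cleanup; bumps the generation when bump_gen is set."""
        if bump_gen:
            self.cleanup_gen += 1
        self.frees += 1


def handle_event(conn, notify):
    """RPC-side handler: clean up only if the application has not done so."""
    gen_before = conn.cleanup_gen
    notify(conn)  # the application may destroy the connection here
    if conn.cleanup_gen == gen_before:
        conn.frees += 1  # the rpc layer performs the cleanup itself


fixed = Connection()
handle_event(fixed, lambda c: c.cleanup())  # generation bumped

buggy = Connection()
handle_event(buggy, lambda c: c.cleanup(bump_gen=False))  # bump missing
```

When the bump is missing, the handler cannot tell that the application
already cleaned up, so the object is freed twice (buggy.frees is 2,
fixed.frees is 1).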

Peer probing an invalid host/IP was the basic test to catch this issue.

Change-Id: Id5332f3239cb324cead34eb51cf73d426733bd46
BUG: 1433578
Signed-off-by: Atin Mukherjee 
Reviewed-on: https://review.gluster.org/16914
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Milind Changire 
CentOS-regression: Gluster Build System 
Reviewed-by: Jeff Darcy

TESTS/TIER: bug-1303028-Rebalance-glusterd-rpc-connection-issue.t

2017-02-23T11:50:21+00:00

PROBLEM: spurious failure of the test.

CAUSE: The function "rebalance_run_time" calculates the total time
the tier has been running. Because this is a test case, the run time
of the tier can be 0; the function then adds up zeros, the result is
zero, and the test fails.

FIX: Allow some time to pass so that the function has values to add up.

Signed-off-by: hari gowtham 

Change-Id: Ie270f3f3c8942081cca85dc49ef8fec76f3a261a
BUG: 1425743
Reviewed-on: https://review.gluster.org/16711
Smoke: Gluster Build System 
Tested-by: hari gowtham 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan

glusterd: ignore return code of glusterd_restart_bricks

2017-02-09T16:45:59+00:00

When GlusterD is restarted on a multi-node cluster, while syncing the
global options from another GlusterD it checks for quorum and, based on
the result, decides whether to stop or start a brick. However, we act on
the return code of glusterd_restart_bricks (): if we don't want to start
any bricks, ret is nonzero and we end up failing the import, which is
incorrect.

The fix is simply to ignore the return code of glusterd_restart_bricks ().

Change-Id: I37766b0bba138d2e61d3c6034bd00e93ba43e553
BUG: 1420637
Signed-off-by: Atin Mukherjee 
Reviewed-on: https://review.gluster.org/16574
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Samikshan Bairagya 
Reviewed-by: Jeff Darcy

tests : turn off nfs.disable in bug-1238706-daemons-stop-on-peer-cleanup.t

2017-02-07T13:03:08+00:00

To validate this test and remove it from the list of bad tests, turn off
the nfs.disable option so that the nfs daemon can come up.

Change-Id: I8146c2d7f72ac53cac7e395dbb9e819d729eb6a9
BUG: 1257792
Signed-off-by: Atin Mukherjee 
Reviewed-on: https://review.gluster.org/16514
CentOS-regression: Gluster Build System 
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Vijay Bellur