<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/tests/bugs/glusterd, branch v4.0dev</title>
<subtitle></subtitle>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/'/>
<entry>
<title>glusterfs: Not able to mount a running volume after enabling brick mux and stopping any volume</title>
<updated>2017-05-31T20:43:53+00:00</updated>
<author>
<name>Mohit Agrawal</name>
<email>moagrawa@redhat.com</email>
</author>
<published>2017-05-25T16:13:42+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=dba55ae364a2772904bb68a6bd0ea87289ee1470'/>
<id>dba55ae364a2772904bb68a6bd0ea87289ee1470</id>
<content type='text'>
Problem: With brick mux enabled, if any volume goes down and a mount of
         a running volume is then attempted, the mount command hangs.

Solution: With brick mux enabled, the server shares one data structure,
          server_conf, across all associated subvolumes. After any
          subvolume goes down in some ungraceful manner (e.g. its brick
          directory is removed), the posix xlator sends a
          GF_EVENT_CHILD_DOWN event to its parent xlators, and server
          notify sets child_up to false in server_conf. When a client
          tries to communicate with the server through a mount, the
          server checks conf-&gt;child_up, finds it FALSE, and reports
          "translators are not yet ready". This patch updates the
          server_conf structure to save the child_up status per xlator.
          Another important correction in this patch is to clean up the
          server-side xlators' threads after a volume is stopped.
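
A minimal sketch of the per-xlator bookkeeping (simplified stand-in
types, not the actual server_conf definition in protocol/server):

#include &lt;stdbool.h&gt;
#include &lt;string.h&gt;

#define MAX_CHILDREN 1024

struct child_status {
        char name[64];   /* bottom xlator (brick) name    */
        bool child_up;   /* per-brick CHILD_UP/DOWN state */
};

struct server_conf {
        struct child_status kids[MAX_CHILDREN];
        int nkids;
};

/* On GF_EVENT_CHILD_DOWN, flip only the entry for that xlator so
 * mounts of the other multiplexed bricks keep working. */
static void
server_mark_child (struct server_conf *conf, const char *name, bool up)
{
        for (int i = 0; i &lt; conf-&gt;nkids; i++) {
                if (strcmp (conf-&gt;kids[i].name, name) == 0) {
                        conf-&gt;kids[i].child_up = up;
                        return;
                }
        }
}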

BUG: 1453977
Change-Id: Ic54da3f01881b7c9429ce92cc569236eb1d43e0d
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17356
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Raghavendra Talur &lt;rtalur@redhat.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Jeff Darcy &lt;jeff@pl.atyp.us&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem: With brick mux enabled, if any volume goes down and a mount of
         a running volume is then attempted, the mount command hangs.

Solution: With brick mux enabled, the server shares one data structure,
          server_conf, across all associated subvolumes. After any
          subvolume goes down in some ungraceful manner (e.g. its brick
          directory is removed), the posix xlator sends a
          GF_EVENT_CHILD_DOWN event to its parent xlators, and server
          notify sets child_up to false in server_conf. When a client
          tries to communicate with the server through a mount, the
          server checks conf-&gt;child_up, finds it FALSE, and reports
          "translators are not yet ready". This patch updates the
          server_conf structure to save the child_up status per xlator.
          Another important correction in this patch is to clean up the
          server-side xlators' threads after a volume is stopped.
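
A minimal sketch of the per-xlator bookkeeping (simplified stand-in
types, not the actual server_conf definition in protocol/server):

#include &lt;stdbool.h&gt;
#include &lt;string.h&gt;

#define MAX_CHILDREN 1024

struct child_status {
        char name[64];   /* bottom xlator (brick) name    */
        bool child_up;   /* per-brick CHILD_UP/DOWN state */
};

struct server_conf {
        struct child_status kids[MAX_CHILDREN];
        int nkids;
};

/* On GF_EVENT_CHILD_DOWN, flip only the entry for that xlator so
 * mounts of the other multiplexed bricks keep working. */
static void
server_mark_child (struct server_conf *conf, const char *name, bool up)
{
        for (int i = 0; i &lt; conf-&gt;nkids; i++) {
                if (strcmp (conf-&gt;kids[i].name, name) == 0) {
                        conf-&gt;kids[i].child_up = up;
                        return;
                }
        }
}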

BUG: 1453977
Change-Id: Ic54da3f01881b7c9429ce92cc569236eb1d43e0d
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17356
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Raghavendra Talur &lt;rtalur@redhat.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Jeff Darcy &lt;jeff@pl.atyp.us&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libglusterfs: Fix crash in glusterd while peer probing</title>
<updated>2017-05-26T12:08:36+00:00</updated>
<author>
<name>Gaurav Yadav</name>
<email>gyadav@redhat.com</email>
</author>
<published>2017-05-22T17:55:47+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=23930326e0378edace9c8c41e8ae95931a2f68ba'/>
<id>23930326e0378edace9c8c41e8ae95931a2f68ba</id>
<content type='text'>
glusterd crashes when the port is explicitly set to a value greater
than the short data type range can hold, e.g.:
sysctl net.ipv4.ip_local_reserved_ports="49152-49156"
In the above case glusterd crashes while parsing the port.

With this fix glusterd is able to handle port values in the range
INT_MIN to INT_MAX.
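
A hedged sketch of an overflow-safe parse (assumed helper name; the
real fix lives in libglusterfs' parsing helpers):

#include &lt;errno.h&gt;
#include &lt;limits.h&gt;
#include &lt;stdlib.h&gt;

/* Read the port into a long via strtol and range-check it before
 * narrowing, instead of storing it straight into a short. */
static int
parse_port (const char *str, int *port)
{
        char *end = NULL;
        long  val;

        errno = 0;
        val = strtol (str, &amp;end, 10);
        if (errno != 0 || end == str || *end != '\0')
                return -1;              /* not a clean number        */
        if (val &lt; INT_MIN || val &gt; INT_MAX)
                return -1;              /* outside the handled range */

        *port = (int) val;              /* safe: checked above */
        return 0;
}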

Change-Id: I7c75ee67937b0e3384502973d96b1c36c89e0fe1
BUG: 1454418
Signed-off-by: Gaurav Yadav &lt;gyadav@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17359
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Samikshan Bairagya &lt;samikshan@gmail.com&gt;
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
Reviewed-by: Niels de Vos &lt;ndevos@redhat.com&gt;
Reviewed-by: Jeff Darcy &lt;jeff@pl.atyp.us&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
glusterd crashes when the port is explicitly set to a value greater
than the short data type range can hold, e.g.:
sysctl net.ipv4.ip_local_reserved_ports="49152-49156"
In the above case glusterd crashes while parsing the port.

With this fix glusterd is able to handle port values in the range
INT_MIN to INT_MAX.
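
A hedged sketch of an overflow-safe parse (assumed helper name; the
real fix lives in libglusterfs' parsing helpers):

#include &lt;errno.h&gt;
#include &lt;limits.h&gt;
#include &lt;stdlib.h&gt;

/* Read the port into a long via strtol and range-check it before
 * narrowing, instead of storing it straight into a short. */
static int
parse_port (const char *str, int *port)
{
        char *end = NULL;
        long  val;

        errno = 0;
        val = strtol (str, &amp;end, 10);
        if (errno != 0 || end == str || *end != '\0')
                return -1;              /* not a clean number        */
        if (val &lt; INT_MIN || val &gt; INT_MAX)
                return -1;              /* outside the handled range */

        *port = (int) val;              /* safe: checked above */
        return 0;
}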

Change-Id: I7c75ee67937b0e3384502973d96b1c36c89e0fe1
BUG: 1454418
Signed-off-by: Gaurav Yadav &lt;gyadav@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17359
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Samikshan Bairagya &lt;samikshan@gmail.com&gt;
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
Reviewed-by: Niels de Vos &lt;ndevos@redhat.com&gt;
Reviewed-by: Jeff Darcy &lt;jeff@pl.atyp.us&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>glusterd: Don't spawn new glusterfsds on node reboot with brick-mux</title>
<updated>2017-05-18T16:45:28+00:00</updated>
<author>
<name>Samikshan Bairagya</name>
<email>samikshan@gmail.com</email>
</author>
<published>2017-05-16T09:37:21+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=13e7b3b354a252ad4065f7b2f0f805c40a3c5d18'/>
<id>13e7b3b354a252ad4065f7b2f0f805c40a3c5d18</id>
<content type='text'>
With brick multiplexing enabled, upon a node reboot new bricks were
not being attached to the first spawned brick process even though
there weren't any compatibility issues.

The reason for this is that upon glusterd restart after a node
reboot, since brick services aren't running, glusterd starts the
bricks in a "no-wait" mode. So after a brick process is spawned for
the first brick, there isn't enough time for the corresponding pid
file to get populated with a value before the compatibility check is
made for the next brick.

This commit solves this by iteratively waiting for the pidfile to be
populated in the brick compatibility comparison stage before checking
if the brick process is alive.
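
A rough sketch of that wait (simplified; the real loop sits in
glusterd's brick compatibility check):

#include &lt;signal.h&gt;
#include &lt;stdio.h&gt;
#include &lt;sys/types.h&gt;
#include &lt;unistd.h&gt;

/* Poll the pidfile until a pid shows up, then test liveness, instead
 * of reading it once and racing the just-spawned brick process. */
static pid_t
wait_for_pidfile (const char *path, int max_tries)
{
        for (int i = 0; i &lt; max_tries; i++) {
                FILE *fp = fopen (path, "r");
                if (fp) {
                        int pid = 0;
                        int n   = fscanf (fp, "%d", &amp;pid);
                        fclose (fp);
                        if (n == 1 &amp;&amp; pid &gt; 0 &amp;&amp; kill (pid, 0) == 0)
                                return pid;    /* populated and alive */
                }
                usleep (100000);               /* retry after 100 ms  */
        }
        return -1;
}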

Change-Id: Ibd1f8e54c63e4bb04162143c9d70f09918a44aa4
BUG: 1451248
Signed-off-by: Samikshan Bairagya &lt;samikshan@gmail.com&gt;
Reviewed-on: https://review.gluster.org/17307
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With brick multiplexing enabled, upon a node reboot new bricks were
not being attached to the first spawned brick process even though
there weren't any compatibility issues.

The reason for this is that upon glusterd restart after a node
reboot, since brick services aren't running, glusterd starts the
bricks in a "no-wait" mode. So after a brick process is spawned for
the first brick, there isn't enough time for the corresponding pid
file to get populated with a value before the compatibility check is
made for the next brick.

This commit solves this by iteratively waiting for the pidfile to be
populated in the brick compatibility comparison stage before checking
if the brick process is alive.
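
A rough sketch of that wait (simplified; the real loop sits in
glusterd's brick compatibility check):

#include &lt;signal.h&gt;
#include &lt;stdio.h&gt;
#include &lt;sys/types.h&gt;
#include &lt;unistd.h&gt;

/* Poll the pidfile until a pid shows up, then test liveness, instead
 * of reading it once and racing the just-spawned brick process. */
static pid_t
wait_for_pidfile (const char *path, int max_tries)
{
        for (int i = 0; i &lt; max_tries; i++) {
                FILE *fp = fopen (path, "r");
                if (fp) {
                        int pid = 0;
                        int n   = fscanf (fp, "%d", &amp;pid);
                        fclose (fp);
                        if (n == 1 &amp;&amp; pid &gt; 0 &amp;&amp; kill (pid, 0) == 0)
                                return pid;    /* populated and alive */
                }
                usleep (100000);               /* retry after 100 ms  */
        }
        return -1;
}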

Change-Id: Ibd1f8e54c63e4bb04162143c9d70f09918a44aa4
BUG: 1451248
Signed-off-by: Samikshan Bairagya &lt;samikshan@gmail.com&gt;
Reviewed-on: https://review.gluster.org/17307
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>glusterd: remove useless options from glusterd's volume set table</title>
<updated>2017-05-17T12:31:11+00:00</updated>
<author>
<name>Zhou Zhengping</name>
<email>johnzzpcrystal@gmail.com</email>
</author>
<published>2017-05-09T04:13:37+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=be14d189360d09f8e10e6b75326df6d1db306467'/>
<id>be14d189360d09f8e10e6b75326df6d1db306467</id>
<content type='text'>
These options cause complaints like the following in the brick's log:
_log_if_unknown_option] 0-patchy-quota: option 'timeout' is not recognized
_log_if_unknown_option] 0-patchy-server: option 'ping-timeout' is not recognized

Change-Id: Ida2add13f792736a4e52bfaf38d1169309283a3f
BUG: 1449008
Signed-off-by: Zhou Zhengping &lt;johnzzpcrystal@gmail.com&gt;
Reviewed-on: https://review.gluster.org/17213
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Prashanth Pai &lt;ppai@redhat.com&gt;
Reviewed-by: Jeff Darcy &lt;jeff@pl.atyp.us&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
These options cause complaints like the following in the brick's log:
_log_if_unknown_option] 0-patchy-quota: option 'timeout' is not recognized
_log_if_unknown_option] 0-patchy-server: option 'ping-timeout' is not recognized

Change-Id: Ida2add13f792736a4e52bfaf38d1169309283a3f
BUG: 1449008
Signed-off-by: Zhou Zhengping &lt;johnzzpcrystal@gmail.com&gt;
Reviewed-on: https://review.gluster.org/17213
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Prashanth Pai &lt;ppai@redhat.com&gt;
Reviewed-by: Jeff Darcy &lt;jeff@pl.atyp.us&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>glusterd: Make reset-brick work correctly if brick-mux is on</title>
<updated>2017-05-10T18:58:21+00:00</updated>
<author>
<name>Samikshan Bairagya</name>
<email>samikshan@gmail.com</email>
</author>
<published>2017-04-24T16:30:17+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=74383e3ec6f8244b3de9bf14016452498c1ddcf0'/>
<id>74383e3ec6f8244b3de9bf14016452498c1ddcf0</id>
<content type='text'>
Reset brick currently kills off the corresponding brick process.
However, with brick multiplexing enabled, stopping the brick
process would render all bricks attached to it unavailable. To
handle this correctly, we need to make sure that the brick process
is terminated only if brick-multiplexing is disabled. Otherwise,
we should send the GLUSTERD_BRICK_TERMINATE rpc to the respective
brick process to detach the brick that is to be reset.
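
A minimal sketch of that branch (hypothetical stand-in type and stub
helpers; GLUSTERD_BRICK_TERMINATE is the rpc named above):

#include &lt;stdbool.h&gt;

typedef struct { const char *path; int pid; } brickinfo_t;
static int kill_brick_process (brickinfo_t *b)    { (void) b; return 0; }
static int send_brick_detach_rpc (brickinfo_t *b) { (void) b; return 0; }

static int
stop_brick_for_reset (brickinfo_t *b, bool brick_mux)
{
        if (!brick_mux)
                /* the process serves only this brick: safe to kill */
                return kill_brick_process (b);

        /* mux on: the process hosts other bricks too, so send
         * GLUSTERD_BRICK_TERMINATE and detach only the brick being
         * reset instead of killing the whole process */
        return send_brick_detach_rpc (b);
}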

Change-Id: I69002d66ffe6ec36ef48af09b66c522c6d35ac58
BUG: 1446172
Signed-off-by: Samikshan Bairagya &lt;samikshan@gmail.com&gt;
Reviewed-on: https://review.gluster.org/17128
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Reset brick currently kills off the corresponding brick process.
However, with brick multiplexing enabled, stopping the brick
process would render all bricks attached to it unavailable. To
handle this correctly, we need to make sure that the brick process
is terminated only if brick-multiplexing is disabled. Otherwise,
we should send the GLUSTERD_BRICK_TERMINATE rpc to the respective
brick process to detach the brick that is to be reset.
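
A minimal sketch of that branch (hypothetical stand-in type and stub
helpers; GLUSTERD_BRICK_TERMINATE is the rpc named above):

#include &lt;stdbool.h&gt;

typedef struct { const char *path; int pid; } brickinfo_t;
static int kill_brick_process (brickinfo_t *b)    { (void) b; return 0; }
static int send_brick_detach_rpc (brickinfo_t *b) { (void) b; return 0; }

static int
stop_brick_for_reset (brickinfo_t *b, bool brick_mux)
{
        if (!brick_mux)
                /* the process serves only this brick: safe to kill */
                return kill_brick_process (b);

        /* mux on: the process hosts other bricks too, so send
         * GLUSTERD_BRICK_TERMINATE and detach only the brick being
         * reset instead of killing the whole process */
        return send_brick_detach_rpc (b);
}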

Change-Id: I69002d66ffe6ec36ef48af09b66c522c6d35ac58
BUG: 1446172
Signed-off-by: Samikshan Bairagya &lt;samikshan@gmail.com&gt;
Reviewed-on: https://review.gluster.org/17128
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>glusterd: socketfile &amp; pidfile related fixes for brick multiplexing feature</title>
<updated>2017-05-09T01:30:01+00:00</updated>
<author>
<name>Mohit Agrawal</name>
<email>moagrawa@redhat.com</email>
</author>
<published>2017-05-08T13:59:22+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=21c7f7baccfaf644805e63682e5a7d2a9864a1e6'/>
<id>21c7f7baccfaf644805e63682e5a7d2a9864a1e6</id>
<content type='text'>
Problem: With brick multiplexing on, after restarting glusterd the CLI
         does not show the pid of every brick process in every volume.

Solution: With brick-mux on, all local brick processes communicate
          through one UNIX socket, but the current code
          (glusterd_brick_start) tries to communicate through a
          separate UNIX socket for each volume, with a path derived
          from the brick name and volume name. Because of the
          multiplexing design only one UNIX socket is opened, so
          glusterd hits poller errors and cannot fetch the correct
          status of the brick processes through the cli process. To
          resolve the problem, add a new function
          glusterd_set_socket_filepath_for_mux, called by
          glusterd_brick_start, to validate the existence of the
          socket path. To avoid continuous EPOLLERR errors in the
          logs, the socket_connect code is updated as well.
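
A rough sketch of the path selection (hypothetical helper name and
socket paths; the patch's real function is
glusterd_set_socket_filepath_for_mux):

#include &lt;stdbool.h&gt;
#include &lt;stdio.h&gt;

/* With mux on, reuse the one socket path of the shared brick process
 * instead of deriving a per-volume/brick path nothing listens on. */
static void
brick_socketpath (char *buf, size_t len, bool brick_mux,
                  const char *volname, const char *brickname)
{
        if (brick_mux)
                snprintf (buf, len,
                          "/var/run/gluster/shared-brick.socket");
        else
                snprintf (buf, len, "/var/run/gluster/%s-%s.socket",
                          volname, brickname);
}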

Test:     To reproduce the issue, follow these steps:
          1) Create two distributed volumes (dist1 and dist2)
          2) Set cluster.brick-multiplex to on
          3) Kill glusterd
          4) Run the command gluster v status
          After applying the patch, it shows the correct pid for all
          volumes.

BUG: 1444596
Change-Id: I5d10af69dea0d0ca19511f43870f34295a54a4d2
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17101
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Prashanth Pai &lt;ppai@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem: With brick multiplexing on, after restarting glusterd the CLI
         does not show the pid of every brick process in every volume.

Solution: With brick-mux on, all local brick processes communicate
          through one UNIX socket, but the current code
          (glusterd_brick_start) tries to communicate through a
          separate UNIX socket for each volume, with a path derived
          from the brick name and volume name. Because of the
          multiplexing design only one UNIX socket is opened, so
          glusterd hits poller errors and cannot fetch the correct
          status of the brick processes through the cli process. To
          resolve the problem, add a new function
          glusterd_set_socket_filepath_for_mux, called by
          glusterd_brick_start, to validate the existence of the
          socket path. To avoid continuous EPOLLERR errors in the
          logs, the socket_connect code is updated as well.
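
A rough sketch of the path selection (hypothetical helper name and
socket paths; the patch's real function is
glusterd_set_socket_filepath_for_mux):

#include &lt;stdbool.h&gt;
#include &lt;stdio.h&gt;

/* With mux on, reuse the one socket path of the shared brick process
 * instead of deriving a per-volume/brick path nothing listens on. */
static void
brick_socketpath (char *buf, size_t len, bool brick_mux,
                  const char *volname, const char *brickname)
{
        if (brick_mux)
                snprintf (buf, len,
                          "/var/run/gluster/shared-brick.socket");
        else
                snprintf (buf, len, "/var/run/gluster/%s-%s.socket",
                          volname, brickname);
}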

Test:     To reproduce the issue, follow these steps:
          1) Create two distributed volumes (dist1 and dist2)
          2) Set cluster.brick-multiplex to on
          3) Kill glusterd
          4) Run the command gluster v status
          After applying the patch, it shows the correct pid for all
          volumes.

BUG: 1444596
Change-Id: I5d10af69dea0d0ca19511f43870f34295a54a4d2
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17101
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Prashanth Pai &lt;ppai@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Fixes quota aux mount failure</title>
<updated>2017-05-08T06:15:55+00:00</updated>
<author>
<name>Sanoj Unnikrishnan</name>
<email>sunnikri@redhat.com</email>
</author>
<published>2017-03-22T09:32:12+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=2ae4b4058691b324535d802f4e6d24cce89a10e5'/>
<id>2ae4b4058691b324535d802f4e6d24cce89a10e5</id>
<content type='text'>
The aux mount is created on the first limit/remove_limit/list command
and remains until the volume is stopped or deleted (or quota is
disabled), at which point we do a lazy unmount. If the process is
terminated uncleanly, the mount entry remains and we get a (Transport
disconnected) error on subsequent attempts to run quota
list/limit-usage/remove commands.

Second issue: there is also a risk of an inadvertent rm -rf on
/var/run/gluster causing data loss for the user. Ideally, /var/run is
a temp path for application use and should not cause any data loss to
persistent storage.

Solution:
1) unmount the aux mount after each use.
2) clean stale mount before mounting, if any.

One caveat of mounting/unmounting on each command is that we cannot
use the same mount point for both the list and limit commands: the
list command needs the mount to be accessible in the cli after the
response from glusterd, so a limit command executed in parallel could
unmount it from under us (had we used the same mount point). Hence we
use separate mount points for the list and limit commands.
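
A small sketch of the split (hypothetical path scheme, not necessarily
the exact one the cli uses):

#include &lt;stdbool.h&gt;
#include &lt;stdio.h&gt;

/* Give list and limit their own aux mountpoints so a parallel limit's
 * unmount cannot pull the mount out from under a running list. */
static void
quota_aux_mountpoint (char *buf, size_t len, const char *volname,
                      bool is_list)
{
        snprintf (buf, len, "/var/run/gluster/%s_quota_%s/",
                  volname, is_list ? "list" : "limit");
}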

Change-Id: I4f9e39da2ac2b65941399bffb6440db8a6ba59d0
BUG: 1433906
Signed-off-by: Sanoj Unnikrishnan &lt;sunnikri@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16938
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Manikandan Selvaganesh &lt;manikandancs333@gmail.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The aux mount is created on the first limit/remove_limit/list command
and remains until the volume is stopped or deleted (or quota is
disabled), at which point we do a lazy unmount. If the process is
terminated uncleanly, the mount entry remains and we get a (Transport
disconnected) error on subsequent attempts to run quota
list/limit-usage/remove commands.

Second issue: there is also a risk of an inadvertent rm -rf on
/var/run/gluster causing data loss for the user. Ideally, /var/run is
a temp path for application use and should not cause any data loss to
persistent storage.

Solution:
1) unmount the aux mount after each use.
2) clean stale mount before mounting, if any.

One caveat of mounting/unmounting on each command is that we cannot
use the same mount point for both the list and limit commands: the
list command needs the mount to be accessible in the cli after the
response from glusterd, so a limit command executed in parallel could
unmount it from under us (had we used the same mount point). Hence we
use separate mount points for the list and limit commands.
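
A small sketch of the split (hypothetical path scheme, not necessarily
the exact one the cli uses):

#include &lt;stdbool.h&gt;
#include &lt;stdio.h&gt;

/* Give list and limit their own aux mountpoints so a parallel limit's
 * unmount cannot pull the mount out from under a running list. */
static void
quota_aux_mountpoint (char *buf, size_t len, const char *volname,
                      bool is_list)
{
        snprintf (buf, len, "/var/run/gluster/%s_quota_%s/",
                  volname, is_list ? "list" : "limit");
}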

Change-Id: I4f9e39da2ac2b65941399bffb6440db8a6ba59d0
BUG: 1433906
Signed-off-by: Sanoj Unnikrishnan &lt;sunnikri@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16938
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Manikandan Selvaganesh &lt;manikandancs333@gmail.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>glusterd : Disallow peer detach if snapshot bricks exist on it</title>
<updated>2017-04-01T01:53:10+00:00</updated>
<author>
<name>Gaurav Yadav</name>
<email>gyadav@redhat.com</email>
</author>
<published>2017-03-16T09:26:39+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=1c92f83ec041176ad7c42ef83525cda7d3eda3c5'/>
<id>1c92f83ec041176ad7c42ef83525cda7d3eda3c5</id>
<content type='text'>
Problem:
- Deploy gluster on 2 nodes, one brick each, one volume replicated
- Create a snapshot
- Lose one server
- Add a replacement peer and new brick with a new IP address
- replace-brick the missing brick onto the new server
  (wait for replication to finish)
- peer detach the old server
- after the above steps, glusterd fails to restart.

Solution:
  With the fix, peer detach reports an error: "N2 is part of
  existing snapshots. Remove those snapshots before proceeding".
  This forces the user either to keep that peer or to delete all
  snapshots first.
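
A hedged sketch of the guard (stand-in types for glusterd's snapshot
metadata, just to make the check concrete):

#include &lt;stdbool.h&gt;
#include &lt;string.h&gt;

struct snap_brick { const char *hostname; };
struct snap       { struct snap_brick *bricks; int nbricks; };

/* Refuse the detach while any snapshot still has a brick on the peer
 * being removed; otherwise glusterd cannot restore those snaps. */
static bool
peer_has_snap_bricks (struct snap *snaps, int nsnaps, const char *peer)
{
        for (int i = 0; i &lt; nsnaps; i++)
                for (int j = 0; j &lt; snaps[i].nbricks; j++)
                        if (strcmp (snaps[i].bricks[j].hostname,
                                    peer) == 0)
                                return true;   /* block the detach */
        return false;
}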

Change-Id: I3699afb9b2a5f915768b77f885e783bd9b51818c
BUG: 1322145
Signed-off-by: Gaurav Yadav &lt;gyadav@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16907
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
- Deploy gluster on 2 nodes, one brick each, one volume replicated
- Create a snapshot
- Lose one server
- Add a replacement peer and new brick with a new IP address
- replace-brick the missing brick onto the new server
  (wait for replication to finish)
- peer detach the old server
- after the above steps, glusterd fails to restart.

Solution:
  With the fix, peer detach reports an error: "N2 is part of
  existing snapshots. Remove those snapshots before proceeding".
  This forces the user either to keep that peer or to delete all
  snapshots first.
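
A hedged sketch of the guard (stand-in types for glusterd's snapshot
metadata, just to make the check concrete):

#include &lt;stdbool.h&gt;
#include &lt;string.h&gt;

struct snap_brick { const char *hostname; };
struct snap       { struct snap_brick *bricks; int nbricks; };

/* Refuse the detach while any snapshot still has a brick on the peer
 * being removed; otherwise glusterd cannot restore those snaps. */
static bool
peer_has_snap_bricks (struct snap *snaps, int nsnaps, const char *peer)
{
        for (int i = 0; i &lt; nsnaps; i++)
                for (int j = 0; j &lt; snaps[i].nbricks; j++)
                        if (strcmp (snaps[i].bricks[j].hostname,
                                    peer) == 0)
                                return true;   /* block the detach */
        return false;
}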

Change-Id: I3699afb9b2a5f915768b77f885e783bd9b51818c
BUG: 1322145
Signed-off-by: Gaurav Yadav &lt;gyadav@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16907
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rpc: bump up conn-&gt;cleanup_gen in rpc_clnt_reconnect_cleanup</title>
<updated>2017-03-20T23:34:16+00:00</updated>
<author>
<name>Atin Mukherjee</name>
<email>amukherj@redhat.com</email>
</author>
<published>2017-03-18T10:59:10+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=39e09ad1e0e93f08153688c31433c38529f93716'/>
<id>39e09ad1e0e93f08153688c31433c38529f93716</id>
<content type='text'>
Commit 086436a introduced a generation number (cleanup_gen) to ensure
that the rpc layer doesn't end up cleaning up the connection object if
the application layer has already destroyed it. Bumping up cleanup_gen
was done only in rpc_clnt_connection_cleanup (). However, the same is
needed in rpc_clnt_reconnect_cleanup () too: without it, if the object
gets destroyed through the reconnect event in the application layer,
the rpc layer will still end up trying to delete the object, resulting
in a double free and a crash.

Peer probing an invalid host/IP was the basic test to catch this issue.
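
A minimal sketch of the generation bump (simplified struct and
locking; the cleanup_gen field name follows the commit):

#include &lt;pthread.h&gt;
#include &lt;stdint.h&gt;

struct rpc_conn {
        pthread_mutex_t lock;
        uint64_t        cleanup_gen;
};

/* Both cleanup paths must bump cleanup_gen so a caller holding an
 * older generation skips the already-destroyed object instead of
 * freeing it a second time. */
static void
reconnect_cleanup (struct rpc_conn *conn)
{
        pthread_mutex_lock (&amp;conn-&gt;lock);
        conn-&gt;cleanup_gen++;   /* the bump this patch adds */
        /* ... cancel the reconnect timer, tear down the link ... */
        pthread_mutex_unlock (&amp;conn-&gt;lock);
}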

Change-Id: Id5332f3239cb324cead34eb51cf73d426733bd46
BUG: 1433578
Signed-off-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16914
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Milind Changire &lt;mchangir@redhat.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Jeff Darcy &lt;jeff@pl.atyp.us&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 086436a introduced a generation number (cleanup_gen) to ensure
that the rpc layer doesn't end up cleaning up the connection object if
the application layer has already destroyed it. Bumping up cleanup_gen
was done only in rpc_clnt_connection_cleanup (). However, the same is
needed in rpc_clnt_reconnect_cleanup () too: without it, if the object
gets destroyed through the reconnect event in the application layer,
the rpc layer will still end up trying to delete the object, resulting
in a double free and a crash.

Peer probing an invalid host/IP was the basic test to catch this issue.
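
A minimal sketch of the generation bump (simplified struct and
locking; the cleanup_gen field name follows the commit):

#include &lt;pthread.h&gt;
#include &lt;stdint.h&gt;

struct rpc_conn {
        pthread_mutex_t lock;
        uint64_t        cleanup_gen;
};

/* Both cleanup paths must bump cleanup_gen so a caller holding an
 * older generation skips the already-destroyed object instead of
 * freeing it a second time. */
static void
reconnect_cleanup (struct rpc_conn *conn)
{
        pthread_mutex_lock (&amp;conn-&gt;lock);
        conn-&gt;cleanup_gen++;   /* the bump this patch adds */
        /* ... cancel the reconnect timer, tear down the link ... */
        pthread_mutex_unlock (&amp;conn-&gt;lock);
}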

Change-Id: Id5332f3239cb324cead34eb51cf73d426733bd46
BUG: 1433578
Signed-off-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16914
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Milind Changire &lt;mchangir@redhat.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Jeff Darcy &lt;jeff@pl.atyp.us&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>TESTS/TIER: bug-1303028-Rebalance-glusterd-rpc-connection-issue.t</title>
<updated>2017-02-23T11:50:21+00:00</updated>
<author>
<name>hari gowtham</name>
<email>hgowtham@redhat.com</email>
</author>
<published>2017-02-22T09:49:20+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=a88cb01c78c9a13d2879917b54ead26dce8cc63c'/>
<id>a88cb01c78c9a13d2879917b54ead26dce8cc63c</id>
<content type='text'>
PROBLEM: spurious failure of the test.

CAUSE: the function "rebalance_run_time" calculates the total time the
tier has been running for. This being a test case, the run time of the
tier can be 0, and when the function adds up zeroes the result is
zero, so the test starts to fail.

FIX: Give the function some time, so that it has nonzero values to
add up.

Signed-off-by: hari gowtham &lt;hgowtham@redhat.com&gt;

Change-Id: Ie270f3f3c8942081cca85dc49ef8fec76f3a261a
BUG: 1425743
Reviewed-on: https://review.gluster.org/16711
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Tested-by: hari gowtham &lt;hari.gowtham005@gmail.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Shyamsundar Ranganathan &lt;srangana@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
PROBLEM: spurious failure of the test.

CAUSE: the function "rebalance_run_time" calculates the total time the
tier has been running for. This being a test case, the run time of the
tier can be 0, and when the function adds up zeroes the result is
zero, so the test starts to fail.

FIX: Give the function some time, so that it has nonzero values to
add up.

Signed-off-by: hari gowtham &lt;hgowtham@redhat.com&gt;

Change-Id: Ie270f3f3c8942081cca85dc49ef8fec76f3a261a
BUG: 1425743
Reviewed-on: https://review.gluster.org/16711
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Tested-by: hari gowtham &lt;hari.gowtham005@gmail.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Shyamsundar Ranganathan &lt;srangana@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
