glusterfs.git/tests/bugs/glusterd, branch release-7

glusterd: Brick process fails to come up with brickmux on

2020-03-17T06:04:40+00:00

Issue:
1- In a cluster of 3 nodes N1, N2, N3, create 3 volumes vol1,
vol2, vol3 with 3 bricks each (one from each node)
2- Set cluster.brick-multiplex on
3- Start all 3 volumes
4- Check if all bricks on a node are running on same port
5- Kill N1
6- Set performance.readdir-ahead for volumes vol1, vol2, vol3
7- Bring N1 up and check volume status
8- The brick processes are not running on N1.

Root Cause -
Since there is a difference in volfile versions on N1 compared
to N2 and N3, glusterd_import_friend_volume() is called.
glusterd_import_friend_volume() copies the new_volinfo, deletes
the old_volinfo, and then calls glusterd_start_bricks().
glusterd_start_bricks() looks for the volfiles and sends an rpc
request to glusterfs_handle_attach(). However, the volinfo
has already been deleted from the priv->volumes list by
glusterd_delete_stale_volume() before glusterd_start_bricks()
runs, while glusterd_create_volfiles_and_notify_services() and
glusterd_list_add_order() are called only after
glusterd_start_bricks(). As a result, the attach RPC request gets
an empty volfile path, which causes the brick to crash.

Fix- Call glusterd_list_add_order() and
glusterd_create_volfiles_and_notify_services() before the
glusterd_start_bricks() call is made in glusterd_import_friend_volume()
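
The ordering problem can be illustrated with a toy model. This is only a
sketch in Python, not glusterd's C code: the dictionary stands in for the
priv->volumes list, and the function names, the register_first flag and the
volfile path are hypothetical.

```python
# Toy model only: these names mirror the commit message, not glusterd's C code.
volumes = {}  # stands in for the priv->volumes list


def volfile_path(volname):
    """Return the volfile path for a registered volume, or '' if unknown."""
    info = volumes.get(volname)
    return info["volfile"] if info else ""


def import_friend_volume(volname, register_first):
    """Replace the stale volinfo; return the path an attach request would carry."""
    volumes.pop(volname, None)  # the stale volinfo is deleted
    new_info = {"volfile": f"/hypothetical/vols/{volname}/{volname}.vol"}
    if register_first:
        volumes[volname] = new_info   # add to the list before starting bricks
        return volfile_path(volname)  # "start bricks": the lookup succeeds
    path = volfile_path(volname)      # "start bricks": volume not listed yet
    volumes[volname] = new_info       # registration happens too late
    return path


old_order = import_friend_volume("vol1", register_first=False)
new_order = import_friend_volume("vol1", register_first=True)
print(repr(old_order), repr(new_order))
```

With the old ordering the lookup yields an empty path; with the new
ordering it yields the volfile path.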

> Change-Id: Idfe0e8710f7eb77ca3ddfa1cabeb45b2987f41aa
> Bug: bz#1773856
> Signed-off-by: Mohammed Rafi KC 
(cherry picked from commit 45e81aae791da9d013aba2286af44826227c05ec)

Change-Id: Idfe0e8710f7eb77ca3ddfa1cabeb45b2987f41aa
fixes: bz#1808964
Signed-off-by: Sanju Rakonde

glusterd: ./tests/bugs/glusterd/bug-1595320.t is failing

2019-08-26T06:35:31+00:00

Problem: Sometimes ./tests/bugs/glusterd/bug-1595320.t fails
         when checking brick_process after sending a kill
         signal to the brick process

Solution: Wait for some time just after sending a kill signal to the
          brick process to make sure the brick process has stopped
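
One way to express such a wait is to poll with a timeout instead of
checking once. The sketch below is an illustration in Python, not the
actual test change: wait_until is a hypothetical helper, and a sleeping
child process stands in for the brick process.

```python
import subprocess
import sys
import time


def wait_until(predicate, timeout=10.0, interval=0.1):
    """Poll predicate until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()


# A long-running child stands in for the brick process (assumption).
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
child.kill()  # send the kill signal
# Checking immediately could race with process teardown, so poll instead.
stopped = wait_until(lambda: child.poll() is not None)
```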

> Change-Id: Iee9e91284618abfc62a550d47e4f9117785def58
> Fixes: bz#1743200
> Signed-off-by: Mohit Agrawal 
> (cherry picked from commit 8f1620ad7f5d3d040fee55c5f873349800e2268d)

Change-Id: Iee9e91284618abfc62a550d47e4f9117785def58
Fixes: bz#1745422
Signed-off-by: Mohit Agrawal

glusterd: add GF_TRANSPORT_BOTH_TCP_RDMA in glusterd_get_gfproxy_client_volfile

2019-06-17T10:27:15+00:00

... without which volume creation fails with "volume create: : failed:
Failed to create volume files"

Fixes: bz#1716812
Change-Id: I2f4c2c6d5290f066b54e1c1db19e25db9937bedb
Signed-off-by: Atin Mukherjee

glusterfsd/cleanup: Protect graph object under a lock

2019-05-31T11:27:37+00:00

While processing the cleanup_and_exit function, we access
a graph object, but this access is not protected by a lock.
A parallel cleanup of the graph is quite possible, which
might lead to an invalid memory access.
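
The idea can be sketched as follows. This is a simplified Python model
under stated assumptions, not the glusterfsd code: the Context class, its
fields and cleanup_graph are hypothetical names.

```python
import threading


class Context:
    """Hypothetical stand-in for the process context that owns the graph."""

    def __init__(self):
        self.lock = threading.Lock()
        self.graph = {"name": "active-graph"}
        self.cleanups = 0

    def cleanup_graph(self):
        # Take the lock before touching the graph so that only one thread
        # can observe it as present and tear it down.
        with self.lock:
            graph, self.graph = self.graph, None
            if graph is None:
                return  # another thread already cleaned up
            self.cleanups += 1


ctx = Context()
threads = [threading.Thread(target=ctx.cleanup_graph) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Eight threads race to clean up, but the graph is torn down exactly once.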

Change-Id: Id05ca70d5b57e172b0401d07b6a1f5386c044e79
fixes: bz#1708926
Signed-off-by: Mohammed Rafi KC

glusterd: bulkvoldict thread is not handling all volumes

2019-05-27T14:58:45+00:00

Problem: In commit ac70f66c5805e10b3a1072bd467918730c0aeeb4 I
         missed one condition to populate the volume dictionary in
         multiple threads while brick_multiplex is enabled. Due
         to that, glusterd is not sending the volume dictionary for
         all volumes to the peer.

Solution: Update the condition in the code and update the test
          case to avoid the issue

Change-Id: I06522dbdfee4f7e995d9cc7b7098fdf35340dc52
fixes: bz#1711250
Signed-off-by: Mohit Agrawal

glusterd: Add gluster volume stop operation to glusterd_validate_quorum()

2019-05-11T04:14:47+00:00

ISSUE: gluster volume stop succeeds even if quorum is not met.

Fix: Add GD_OP_STOP_VOLUME to the glusterd_validate_quorum() check in
glusterd_mgmt_v3_pre_validate ().

When the volume stop command was ported from synctask to mgmt_v3,
the quorum check was missed.
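
A pre-validation step of this kind can be pictured as a membership test
followed by a quorum check. The Python sketch below is illustrative only:
the operation names, the strict-majority rule and the pre_validate
signature are assumptions, not glusterd's implementation.

```python
# Hypothetical operation names and quorum rule, for illustration only.
QUORUM_CHECKED_OPS = {"START_VOLUME", "STOP_VOLUME", "SET_VOLUME"}


def pre_validate(op, active_peers, total_peers):
    """Reject a quorum-checked op when no strict majority of peers is active."""
    if op in QUORUM_CHECKED_OPS and active_peers * 2 <= total_peers:
        return False, "quorum not met"
    return True, ""


# With STOP_VOLUME in the set, stopping a volume needs a majority of peers.
result_no_quorum = pre_validate("STOP_VOLUME", active_peers=1, total_peers=3)
result_quorum = pre_validate("STOP_VOLUME", active_peers=2, total_peers=3)
```

An operation missing from the set passes pre-validation regardless of
quorum, which is the failure mode described above.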

Change-Id: I7a634ad89ec2e286ea262d7952061efad5360042
fixes: bz#1690753
Signed-off-by: Vishal Pandey

shd/glusterd: Serialize shd manager to prevent race condition

2019-05-10T14:19:29+00:00

At the time of a glusterd restart, while doing a handshake,
there is a possibility that multiple shd managers might get
executed. Because of this, there is a chance that multiple
shd processes get spawned during a glusterd restart.
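
Serializing the manager amounts to making the "is it running?" check and
the spawn one atomic step. A minimal Python sketch, assuming a single
process-wide lock; shd_manager and the spawned list are hypothetical
stand-ins, not glusterd code:

```python
import threading

manager_lock = threading.Lock()
spawned = []  # one entry per shd "process" started


def shd_manager():
    """Start the daemon only if it is not already running, under a lock."""
    with manager_lock:
        if not spawned:  # the check and the spawn happen atomically
            spawned.append("shd")


# Several handshakes invoke the manager at about the same time.
threads = [threading.Thread(target=shd_manager) for _ in range(16)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```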

Change-Id: Ie20798441e07d7d7a93b7d38dfb924cea178a920
fixes: bz#1707081
Signed-off-by: Mohammed Rafi KC

glusterd: define dumpops in the xlator_api of glusterd

2019-04-27T13:36:11+00:00

Problem: statedump is not capturing information related to glusterd

Solution: statedump is not capturing glusterd info because
trav->dumpops is null in gf_proc_dump_single_xlator_info (),
where trav is the glusterd xlator object. trav->dumpops is null
because dumpops was not defined in the xlator_api of glusterd.
Defining dumpops in the xlator_api of glusterd fixes the issue.

fixes: bz#1703629
Change-Id: If85429ecb1ef580aced8d5b88d09fc15258bfc4c
Signed-off-by: Sanju Rakonde

glusterd: Optimize glusterd handshaking code path

2019-04-15T15:20:50+00:00

Problem: At the time of handshaking, glusterd populates volume
         data in a dictionary. When more than 1500 volumes are
         configured, glusterd takes more than 10 min to generate
         the data. Because of the delay, the rpc request times out
         and rpc starts bailing out call frames.

Solution: To optimize the code, the following changes were made:
          1) Spawn multiple threads to populate volumes data in bulk
             in separate dictionary and introduce an option
             glusterd.brick-dict-thread-count to configure no. of threads
             to populate volume data.
          2) Populate tier data only while volume type is tier
          3) Compare snap data only while snap_count is non zero
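
Change 1) can be pictured as splitting the volume list into slices, filling
one dictionary per thread, and merging the results. The sketch below is a
Python illustration under stated assumptions, not the glusterd code: the
function names and the per-volume payload are hypothetical, and
thread_count only plays the role of the configurable thread count.

```python
from concurrent.futures import ThreadPoolExecutor


def build_volume_entry(name):
    """Hypothetical per-volume payload; the real data is far richer."""
    return {"name": name, "version": 1}


def populate_chunk(names):
    """Fill a separate dictionary for one slice of the volume list."""
    return {name: build_volume_entry(name) for name in names}


def populate_all(names, thread_count):
    """Split names into slices and merge the per-thread dictionaries."""
    # Ceiling division so the last slice picks up any remainder.
    size = max(1, -(-len(names) // thread_count))
    chunks = [names[i:i + size] for i in range(0, len(names), size)]
    merged = {}
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        for partial in pool.map(populate_chunk, chunks):
            merged.update(partial)
    return merged


volumes = [f"vol{i}" for i in range(1, 1502)]
peer_dict = populate_all(volumes, thread_count=4)
```

The ceiling division matters: if the slice size were rounded down, the
volumes in the remainder would never be sent to the peer.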

Fixes: bz#1699339
Change-Id: I38dc71970c049217f9d1a06fc0aaf4c26eab18f5
Signed-off-by: Mohit Agrawal

core: Log level changes do not take effect on running client process

2019-04-15T04:30:43+00:00

Problem: commit c34e4161f3cb6539ec83a9020f3d27eb4759a975 set log-level
         per xlator during reconfigure only for a brick process, not for
         the client process.

Solution: 1) Change per xlator log-level only if brick_mux is enabled. To
             detect brick multiplexing, introduce a flag brick_mux at
             ctx->cmd_args.

Note: There are two other changes done with this patch
      1) Ignore the client-log-level option when attaching a brick to an
         already running brick process if brick_mux is enabled
      2) Add a log to print the pid of the running process to make
         debugging easier

Change-Id: I39e85de778e150d0685cd9a79425ce8b4783f9c9
Signed-off-by: Mohit Agrawal 
Fixes: bz#1696046