<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/xlators/mgmt/glusterd/src/glusterd.h, branch v7.0rc2</title>
<subtitle></subtitle>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/'/>
<entry>
<title>glusterd/thin-arbiter: Thin-arbiter integration with GD1</title>
<updated>2019-07-04T07:42:11+00:00</updated>
<author>
<name>Vishal Pandey</name>
<email>vpandey@redhat.com</email>
</author>
<published>2019-04-24T08:07:16+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=08c87ae4208b73f4f183f7b54ebcb373e8bc0ede'/>
<id>08c87ae4208b73f4f183f7b54ebcb373e8bc0ede</id>
<content type='text'>
gluster volume create &lt;VOLNAME&gt; replica 2 thin-arbiter 1 &lt;host1&gt;:&lt;brick1&gt; &lt;host2&gt;:&lt;brick2&gt;
&lt;thin-arbiter-host&gt;:&lt;path-to-store-replica-id-file&gt; [force]
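
For illustration only (hypothetical volume name, hosts and brick paths),
an invocation of the syntax above might look like:

gluster volume create testvol replica 2 thin-arbiter 1 server1:/bricks/b1 server2:/bricks/b2
server3:/bricks/ta

Here server3:/bricks/ta names the thin-arbiter node and the path where
the replica id file will be stored.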

The changes have been made so that the last brick in the bricks list
is treated as the thin-arbiter.
GD1 is made to treat the replica count as 2 and continue creating the
volume like any other replica 2 volume, but since thin-arbiter volumes need
a ta-brick client xlator entry for each subvolume in the fuse volfile, volfile
generation is modified to inject these entries separately into the volfile for
every subvolume.

A few more additions -
1- Save the volinfo with the new fields ta_bricks (a list) and thin_arbiter_count.
2- Introduce a new option, client.ta-brick-port, to add a remote-port to the
   ta-brick xlator entry in fuse volfiles. The option can be set using the
   following CLI syntax -
   gluster volume set &lt;VOLNAME&gt; client.ta-brick-port &lt;PORTNO.&gt;
3- Volume Info will contain a Thin-Arbiter-path entry to distinguish
   these volumes from other replicate volumes.

Change-Id: Ib434e2313b29716f32476c6c211d282c4ef39406
Updates #687
Signed-off-by: Vishal Pandey &lt;vpandey@redhat.com&gt;
(cherry picked from commit 9b223b15ab69fce4076de036ee162f36a058bcd2)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
gluster volume create &lt;VOLNAME&gt; replica 2 thin-arbiter 1 &lt;host1&gt;:&lt;brick1&gt; &lt;host2&gt;:&lt;brick2&gt;
&lt;thin-arbiter-host&gt;:&lt;path-to-store-replica-id-file&gt; [force]

The changes have been made so that the last brick in the bricks list
is treated as the thin-arbiter.
GD1 is made to treat the replica count as 2 and continue creating the
volume like any other replica 2 volume, but since thin-arbiter volumes need
a ta-brick client xlator entry for each subvolume in the fuse volfile, volfile
generation is modified to inject these entries separately into the volfile for
every subvolume.

A few more additions -
1- Save the volinfo with the new fields ta_bricks (a list) and thin_arbiter_count.
2- Introduce a new option, client.ta-brick-port, to add a remote-port to the
   ta-brick xlator entry in fuse volfiles. The option can be set using the
   following CLI syntax -
   gluster volume set &lt;VOLNAME&gt; client.ta-brick-port &lt;PORTNO.&gt;
3- Volume Info will contain a Thin-Arbiter-path entry to distinguish
   these volumes from other replicate volumes.

Change-Id: Ib434e2313b29716f32476c6c211d282c4ef39406
Updates #687
Signed-off-by: Vishal Pandey &lt;vpandey@redhat.com&gt;
(cherry picked from commit 9b223b15ab69fce4076de036ee162f36a058bcd2)
</pre>
</div>
</content>
</entry>
<entry>
<title>glusterd-volgen.c: remove BD xlator from the graph</title>
<updated>2019-06-18T12:09:09+00:00</updated>
<author>
<name>Yaniv Kaul</name>
<email>ykaul@redhat.com</email>
</author>
<published>2019-05-26T08:18:05+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=2d278f0407ab7d29507dc697653b39d72ddee472'/>
<id>2d278f0407ab7d29507dc697653b39d72ddee472</id>
<content type='text'>
The BD xlator was removed some time ago. Remove it from the graph.
We can also remove the caps settings - only the BD xlator
was using them.

Lastly, remove the caps (which only BD was using) and the document
describing the translator.

Change-Id: Id0adcb2952f4832a5dc6301e726874522e07935d
updates: bz#1193929
Signed-off-by: Yaniv Kaul &lt;ykaul@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The BD xlator was removed some time ago. Remove it from the graph.
We can also remove the caps settings - only the BD xlator
was using them.

Lastly, remove the caps (which only BD was using) and the document
describing the translator.

Change-Id: Id0adcb2952f4832a5dc6301e726874522e07935d
updates: bz#1193929
Signed-off-by: Yaniv Kaul &lt;ykaul@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>glusterd: Optimize code to copy dictionary in handshake code path</title>
<updated>2019-05-31T14:20:25+00:00</updated>
<author>
<name>Mohit Agrawal</name>
<email>moagrawal@redhat.com</email>
</author>
<published>2019-05-17T13:56:48+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=f8f09178bb890924a8050b466cc2e7a0a30e35a7'/>
<id>f8f09178bb890924a8050b466cc2e7a0a30e35a7</id>
<content type='text'>
Problem: When a high number of volumes (around 2000) is configured,
         glusterd hits a bottleneck during the handshake while
         copying the dictionary.

Solution: To avoid the bottleneck, serialize the dictionary instead
          of copying it key-value pair by key-value pair.
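
A minimal sketch of the idea (illustrative only, not the actual patch;
it assumes glusterfs' dict_allocate_and_serialize()/dict_unserialize()
helpers and indicative header paths):

#include &lt;glusterfs/dict.h&gt;      /* dict_t, dict_new, serialize helpers */
#include &lt;glusterfs/mem-pool.h&gt;  /* GF_FREE */

/* Copy 'src' into a new dictionary via one serialize/unserialize round
 * trip over a flat buffer, instead of walking key-value pairs one by one. */
static int
copy_dict_by_serializing(dict_t *src, dict_t **dst)
{
    char *buf = NULL;
    unsigned int len = 0;
    int ret = -1;

    ret = dict_allocate_and_serialize(src, &amp;buf, &amp;len);
    if (ret)
        goto out;

    *dst = dict_new();
    if (!*dst) {
        ret = -1;
        goto out;
    }

    ret = dict_unserialize(buf, len, dst);
out:
    GF_FREE(buf);
    return ret;
}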

Change-Id: I9fb332f432e4f915bc3af8dcab38bed26bda2b9a
fixes: bz#1711297
Signed-off-by: Mohit Agrawal &lt;moagrawal@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem: When a high number of volumes (around 2000) is configured,
         glusterd hits a bottleneck during the handshake while
         copying the dictionary.

Solution: To avoid the bottleneck, serialize the dictionary instead
          of copying it key-value pair by key-value pair.

Change-Id: I9fb332f432e4f915bc3af8dcab38bed26bda2b9a
fixes: bz#1711297
Signed-off-by: Mohit Agrawal &lt;moagrawal@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>glusterd/tier: remove tier related code from glusterd</title>
<updated>2019-05-27T07:50:24+00:00</updated>
<author>
<name>Hari Gowtham</name>
<email>hgowtham@redhat.com</email>
</author>
<published>2019-05-02T13:03:34+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=e1cc4275583dfd8ae8d0433587f39854c1851794'/>
<id>e1cc4275583dfd8ae8d0433587f39854c1851794</id>
<content type='text'>
The handler functions are pointed to dummy functions.
The switch-case handling for tier has also been moved to
the default case, to avoid issues if tier is reintroduced.

The tier changes in DHT still remain as they are.

updates: bz#1693692

Change-Id: I80d80c9a3eb862b4440a36b31ae82b2e9d92e4dc
Signed-off-by: Hari Gowtham &lt;hgowtham@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The handler functions are pointed to dummy functions.
The switch-case handling for tier has also been moved to
the default case, to avoid issues if tier is reintroduced.

The tier changes in DHT still remain as they are.

updates: bz#1693692

Change-Id: I80d80c9a3eb862b4440a36b31ae82b2e9d92e4dc
Signed-off-by: Hari Gowtham &lt;hgowtham@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>shd/glusterd: Serialize shd manager to prevent race condition</title>
<updated>2019-05-10T14:19:29+00:00</updated>
<author>
<name>Mohammed Rafi KC</name>
<email>rkavunga@redhat.com</email>
</author>
<published>2019-05-06T18:05:08+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=b475551c66b86863cc93ebdfa6daeeab67bdbd9e'/>
<id>b475551c66b86863cc93ebdfa6daeeab67bdbd9e</id>
<content type='text'>
At the time of a glusterd restart, while doing a handshake,
there is a possibility that the shd manager might get
executed multiple times. Because of this, there is a chance that
multiple shd processes get spawned during a glusterd restart.
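
Purely as an illustration of the fix idea (generic locking, not the
actual patch, which works inside glusterd's own service framework),
serializing the manager entry point stops two handshake paths from
running it at the same time:

#include &lt;pthread.h&gt;

/* Hypothetical stand-in for the real shd manager entry point. */
extern int shd_manager(void *svc);

static pthread_mutex_t shd_manager_lock = PTHREAD_MUTEX_INITIALIZER;

/* Only one caller at a time may run the manager, so a concurrent
 * handshake cannot race it into spawning a duplicate self-heal daemon. */
int
shd_manager_serialized(void *svc)
{
    int ret;

    pthread_mutex_lock(&amp;shd_manager_lock);
    ret = shd_manager(svc);
    pthread_mutex_unlock(&amp;shd_manager_lock);

    return ret;
}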

Change-Id: Ie20798441e07d7d7a93b7d38dfb924cea178a920
fixes: bz#1707081
Signed-off-by: Mohammed Rafi KC &lt;rkavunga@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
At the time of a glusterd restart, while doing a handshake,
there is a possibility that the shd manager might get
executed multiple times. Because of this, there is a chance that
multiple shd processes get spawned during a glusterd restart.

Change-Id: Ie20798441e07d7d7a93b7d38dfb924cea178a920
fixes: bz#1707081
Signed-off-by: Mohammed Rafi KC &lt;rkavunga@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>glusterd: Optimize glusterd handshaking code path</title>
<updated>2019-04-15T15:20:50+00:00</updated>
<author>
<name>Mohit Agrawal</name>
<email>moagrawal@redhat.com</email>
</author>
<published>2019-03-29T06:18:32+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=26a19d9da3ab5604db02d4ca02ce868fb57193a4'/>
<id>26a19d9da3ab5604db02d4ca02ce868fb57193a4</id>
<content type='text'>
Problem: At the time of handshaking, glusterd populates volume
         data in a dictionary. When more than 1500 volumes are
         configured, glusterd takes more than 10 min to generate
         the data. Because this takes so long, the rpc request
         times out and rpc starts bailing out call frames.

Solution: To optimize the code, make the changes below
          1) Spawn multiple threads to populate volume data in bulk
             into separate dictionaries, and introduce an option
             glusterd.brick-dict-thread-count to configure the number
             of threads used to populate volume data.
          2) Populate tier data only when the volume type is tier.
          3) Compare snap data only when snap_count is non-zero.
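
A rough sketch of change 1), with hypothetical helper names and a fixed
thread count standing in for glusterd.brick-dict-thread-count; it only
illustrates the split-and-merge pattern, not the actual glusterd code:

#include &lt;glusterfs/dict.h&gt;
#include &lt;pthread.h&gt;

#define NTHREADS 4   /* would come from glusterd.brick-dict-thread-count */

struct slice {
    dict_t *dict;    /* dictionary filled by one worker thread */
    int start;       /* first volume index for this worker */
    int end;         /* one past the last volume index */
};

static void *
fill_slice(void *arg)
{
    struct slice *s = arg;

    s-&gt;dict = dict_new();
    /* populate_volumes(s-&gt;dict, s-&gt;start, s-&gt;end); -- hypothetical
     * helper that adds the volumes in [start, end) to this dictionary. */
    return NULL;
}

static int
populate_volumes_in_parallel(dict_t *out, int nvols)
{
    pthread_t tid[NTHREADS];
    struct slice sl[NTHREADS];
    int per = (nvols + NTHREADS - 1) / NTHREADS;
    int i;

    for (i = 0; i &lt; NTHREADS; i++) {
        sl[i].start = i * per;
        sl[i].end = (i + 1) * per &gt; nvols ? nvols : (i + 1) * per;
        pthread_create(&amp;tid[i], NULL, fill_slice, &amp;sl[i]);
    }
    for (i = 0; i &lt; NTHREADS; i++) {
        pthread_join(tid[i], NULL);
        dict_copy_with_ref(sl[i].dict, out);   /* merge into the final dict */
        dict_unref(sl[i].dict);
    }
    return 0;
}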

Fixes: bz#1699339
Change-Id: I38dc71970c049217f9d1a06fc0aaf4c26eab18f5
Signed-off-by: Mohit Agrawal &lt;moagrawal@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem: At the time of handshaking, glusterd populates volume
         data in a dictionary. When more than 1500 volumes are
         configured, glusterd takes more than 10 min to generate
         the data. Because this takes so long, the rpc request
         times out and rpc starts bailing out call frames.

Solution: To optimize the code, make the changes below
          1) Spawn multiple threads to populate volume data in bulk
             into separate dictionaries, and introduce an option
             glusterd.brick-dict-thread-count to configure the number
             of threads used to populate volume data.
          2) Populate tier data only when the volume type is tier.
          3) Compare snap data only when snap_count is non-zero.

Fixes: bz#1699339
Change-Id: I38dc71970c049217f9d1a06fc0aaf4c26eab18f5
Signed-off-by: Mohit Agrawal &lt;moagrawal@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mgmt/shd: Implement multiplexing in self heal daemon</title>
<updated>2019-04-01T03:44:23+00:00</updated>
<author>
<name>Mohammed Rafi KC</name>
<email>rkavunga@redhat.com</email>
</author>
<published>2019-02-25T04:35:32+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=bc3694d7cfc868a2ed6344ea123faf19fce28d13'/>
<id>bc3694d7cfc868a2ed6344ea123faf19fce28d13</id>
<content type='text'>
Problem:

The shd daemon is per node, which means it creates a graph
with all volumes on it. While this is great for utilizing
resources, it is not so good in terms of performance and manageability.

This is because self-heal daemons don't have the capability to
automatically reconfigure their graphs. So each time any configuration
change happens to the volumes (replicate/disperse), we need to restart
shd to bring the changes into the graph.

Because of this, all ongoing heals for all other volumes have to be
stopped in the middle and restarted all over again.

Solution:

This change makes shd a per-volume daemon, so that a graph
is generated for each volume.

When we want to start/reconfigure shd for a volume, we first search
for an existing shd running on the node. If there is none, we
start a new process. If a daemon is already running for shd, then
we simply detach the graph for the volume and reattach the updated
graph for the volume. This won't touch any of the ongoing operations
for any other volumes on the shd daemon.
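
The start/reconfigure flow above, as a purely illustrative sketch
(every helper below is hypothetical):

/* Hypothetical stand-ins for the real glusterd/shd routines. */
extern int shd_running_on_node(void);
extern int start_shd_process(const char *volname);
extern int detach_volume_graph(const char *volname);
extern int attach_volume_graph(const char *volname);

int
shd_start_or_reconfigure(const char *volname)
{
    if (!shd_running_on_node())
        return start_shd_process(volname);   /* first volume: spawn shd */

    /* A daemon already exists: swap only this volume's graph and leave
     * heals running for every other volume untouched. */
    detach_volume_graph(volname);
    return attach_volume_graph(volname);
}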

Example of an shd graph when it is per volume

                           graph
                     -----------------------
                     |     debug-iostat    |
                     -----------------------
                    /         |             \
                   /          |              \
               ---------    ---------      ----------
               | AFR-1 |    | AFR-2 |      |  AFR-3 |
               ---------    ---------      ----------

A running shd daemon with 3 volumes will look like this:

                           graph
                     -----------------------
                     |     debug-iostat    |
                     -----------------------
                    /           |           \
                   /            |            \
              ------------   ------------  ------------
              | volume-1 |   | volume-2 |  | volume-3 |
              ------------   ------------  ------------

Change-Id: Idcb2698be3eeb95beaac47125565c93370afbd99
fixes: bz#1659708
Signed-off-by: Mohammed Rafi KC &lt;rkavunga@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:

The shd daemon is per node, which means it creates a graph
with all volumes on it. While this is great for utilizing
resources, it is not so good in terms of performance and manageability.

This is because self-heal daemons don't have the capability to
automatically reconfigure their graphs. So each time any configuration
change happens to the volumes (replicate/disperse), we need to restart
shd to bring the changes into the graph.

Because of this, all ongoing heals for all other volumes have to be
stopped in the middle and restarted all over again.

Solution:

This change makes shd a per-volume daemon, so that a graph
is generated for each volume.

When we want to start/reconfigure shd for a volume, we first search
for an existing shd running on the node. If there is none, we
start a new process. If a daemon is already running for shd, then
we simply detach the graph for the volume and reattach the updated
graph for the volume. This won't touch any of the ongoing operations
for any other volumes on the shd daemon.

Example of an shd graph when it is per volume

                           graph
                     -----------------------
                     |     debug-iostat    |
                     -----------------------
                    /         |             \
                   /          |              \
               ---------    ---------      ----------
               | AFR-1 |    | AFR-2 |      |  AFR-3 |
               ---------    ---------      ----------

A running shd daemon with 3 volumes will look like this:

                           graph
                     -----------------------
                     |     debug-iostat    |
                     -----------------------
                    /           |           \
                   /            |            \
              ------------   ------------  ------------
              | volume-1 |   | volume-2 |  | volume-3 |
              ------------   ------------  ------------

Change-Id: Idcb2698be3eeb95beaac47125565c93370afbd99
fixes: bz#1659708
Signed-off-by: Mohammed Rafi KC &lt;rkavunga@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>glusterd: fix potential locking issue on peer probe</title>
<updated>2019-03-27T17:17:19+00:00</updated>
<author>
<name>Zhang Huan</name>
<email>zhanghuan@open-fs.com</email>
</author>
<published>2019-03-26T02:08:00+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=a7110486d048fa379f33ae4e0ba94b5ef3560489'/>
<id>a7110486d048fa379f33ae4e0ba94b5ef3560489</id>
<content type='text'>
There are two cases in which bricks are restarted: one is when glusterd
starts or quorum is met, the other is when new peers join and quorum
changes. In the latter case, sync_lock is not taken, which may cause lock
corruption.
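
The shape of the fix, illustratively (assuming libglusterfs'
synclock_lock()/synclock_unlock(); the lock and the restart routine
below are stand-ins for the real glusterd ones):

#include &lt;glusterfs/syncop.h&gt;    /* synclock_t, synclock_lock/unlock */

extern synclock_t big_lock;               /* stand-in for glusterd's sync lock */
extern int restart_bricks(void *opaque);  /* stand-in for the restart routine */

/* On the peer-join path, take the sync lock around the brick restart,
 * just as the glusterd-start/quorum path already does. */
int
restart_bricks_locked(void *opaque)
{
    int ret;

    synclock_lock(&amp;big_lock);
    ret = restart_bricks(opaque);
    synclock_unlock(&amp;big_lock);

    return ret;
}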

Change-Id: I0844e7a631350f5ee00bdacb613602bffffcdf9f
fixes: bz#1692612
Signed-off-by: Zhang Huan &lt;zhanghuan@open-fs.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There are two cases in which bricks are restarted: one is when glusterd
starts or quorum is met, the other is when new peers join and quorum
changes. In the latter case, sync_lock is not taken, which may cause lock
corruption.

Change-Id: I0844e7a631350f5ee00bdacb613602bffffcdf9f
fixes: bz#1692612
Signed-off-by: Zhang Huan &lt;zhanghuan@open-fs.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>glusterd: migrating rebalance commands to mgmt_v3 framework</title>
<updated>2018-12-18T04:01:52+00:00</updated>
<author>
<name>Sanju Rakonde</name>
<email>srakonde@redhat.com</email>
</author>
<published>2018-11-30T10:46:55+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=0b4b111fbd80a5d400a07d61e2b99f230f9be76f'/>
<id>0b4b111fbd80a5d400a07d61e2b99f230f9be76f</id>
<content type='text'>
Current rebalance commands use the op_state machine framework.
Port them to use the mgmt_v3 framework.

Change-Id: I6faf4a6335c2e2f3d54bbde79908a7749e4613e7
fixes: bz#1655827
Signed-off-by: Sanju Rakonde &lt;srakonde@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Current rebalance commands use the op_state machine framework.
Port them to use the mgmt_v3 framework.

Change-Id: I6faf4a6335c2e2f3d54bbde79908a7749e4613e7
fixes: bz#1655827
Signed-off-by: Sanju Rakonde &lt;srakonde@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>glusterd: fix get_mux_limit_per_process to read default value</title>
<updated>2018-12-07T07:08:58+00:00</updated>
<author>
<name>Atin Mukherjee</name>
<email>amukherj@redhat.com</email>
</author>
<published>2018-12-06T17:44:57+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=916df2c12b19ac84b7806d31226d7f832ca7e2bb'/>
<id>916df2c12b19ac84b7806d31226d7f832ca7e2bb</id>
<content type='text'>
get_mux_limit_per_process () reads the global option dictionary and, in
case it doesn't find the key, assumes that the
cluster.max-bricks-per-process option isn't configured; however, the
default value should be picked up in that case.
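
The intent, sketched (the function below is illustrative, and the
default constant is an assumption rather than a value taken from the
code):

#include &lt;glusterfs/dict.h&gt;
#include &lt;stdlib.h&gt;

/* Assumed default; the real value lives in glusterd's option table. */
#define ASSUMED_DEFAULT_MAX_BRICKS_PER_PROC 250

/* If the global option dictionary has no entry for the key, fall back
 * to the default instead of treating the option as unset. */
static int
get_mux_limit_sketch(dict_t *global_opts)
{
    char *val = NULL;

    if (dict_get_str(global_opts, "cluster.max-bricks-per-process", &amp;val))
        return ASSUMED_DEFAULT_MAX_BRICKS_PER_PROC;

    return atoi(val);
}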

Change-Id: I35dd8da084adbf59793d58557e818d8e6c17f9f3
Fixes: bz#1656951
Signed-off-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
get_mux_limit_per_process () reads the global option dictionary and, in
case it doesn't find the key, assumes that the
cluster.max-bricks-per-process option isn't configured; however, the
default value should be picked up in that case.

Change-Id: I35dd8da084adbf59793d58557e818d8e6c17f9f3
Fixes: bz#1656951
Signed-off-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
