<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/xlators/mgmt/glusterd/src/glusterd-volume-ops.c, branch v3.12.6</title>
<subtitle></subtitle>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/'/>
<entry>
<title>glusterd: connect to an existing brick process when quorum status is NOT_APPLICABLE_QUORUM</title>
<updated>2018-01-12T05:43:49+00:00</updated>
<author>
<name>Atin Mukherjee</name>
<email>amukherj@redhat.com</email>
</author>
<published>2018-01-03T08:59:51+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=8679151392e50e1684ed721710f44dd4fbb992b9'/>
<id>8679151392e50e1684ed721710f44dd4fbb992b9</id>
<content type='text'>
First of all, this patch reverts commit 635c1c3, as it causes a
regression where bricks do not come up in time when a node is rebooted.
This patch instead fixes the problem in a different way, by simply
trying to connect to an existing running brick process when the quorum
status is not applicable.
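
A minimal sketch of the new behaviour, with hypothetical names (not the
actual glusterd code):

    /* Sketch only: when quorum is not applicable and a brick process is
     * already running, attach to it instead of spawning another one. */
    #include &lt;stdbool.h&gt;
    #include &lt;stdio.h&gt;

    enum quorum_status { NOT_APPLICABLE_QUORUM, MEETS_QUORUM, DOESNT_MEET_QUORUM };
    struct brick { const char *path; int pid; };

    static bool brick_proc_running(const struct brick *b) { return b-&gt;pid &gt; 0; }

    static void handle_brick(struct brick *b, enum quorum_status q)
    {
        if (q == NOT_APPLICABLE_QUORUM &amp;&amp; brick_proc_running(b))
            printf("connecting to existing brick process %d\n", b-&gt;pid);
        else
            printf("starting brick %s\n", b-&gt;path);   /* normal (re)start path */
    }

    int main(void)
    {
        struct brick b = { "/bricks/b0", 4321 };
        handle_brick(&amp;b, NOT_APPLICABLE_QUORUM);
        return 0;
    }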

&gt;mainline patch : https://review.gluster.org/#/c/19134/

Change-Id: I0efb5901832824b1c15dcac529bffac85173e097
BUG: 1511301
Signed-off-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
First of all, this patch reverts commit 635c1c3, as it causes a
regression where bricks do not come up in time when a node is rebooted.
This patch instead fixes the problem in a different way, by simply
trying to connect to an existing running brick process when the quorum
status is not applicable.
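
A minimal sketch of the new behaviour, with hypothetical names (not the
actual glusterd code):

    /* Sketch only: when quorum is not applicable and a brick process is
     * already running, attach to it instead of spawning another one. */
    #include &lt;stdbool.h&gt;
    #include &lt;stdio.h&gt;

    enum quorum_status { NOT_APPLICABLE_QUORUM, MEETS_QUORUM, DOESNT_MEET_QUORUM };
    struct brick { const char *path; int pid; };

    static bool brick_proc_running(const struct brick *b) { return b-&gt;pid &gt; 0; }

    static void handle_brick(struct brick *b, enum quorum_status q)
    {
        if (q == NOT_APPLICABLE_QUORUM &amp;&amp; brick_proc_running(b))
            printf("connecting to existing brick process %d\n", b-&gt;pid);
        else
            printf("starting brick %s\n", b-&gt;path);   /* normal (re)start path */
    }

    int main(void)
    {
        struct brick b = { "/bricks/b0", 4321 };
        handle_brick(&amp;b, NOT_APPLICABLE_QUORUM);
        return 0;
    }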

&gt;mainline patch : https://review.gluster.org/#/c/19134/

Change-Id: I0efb5901832824b1c15dcac529bffac85173e097
BUG: 1511301
Signed-off-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>glusterd: fix brick restart parallelism</title>
<updated>2017-11-06T06:09:21+00:00</updated>
<author>
<name>Atin Mukherjee</name>
<email>amukherj@redhat.com</email>
</author>
<published>2017-10-26T08:56:30+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=44e3c3b5c813168d72f10ecb3c058ac3489c719c'/>
<id>44e3c3b5c813168d72f10ecb3c058ac3489c719c</id>
<content type='text'>
glusterd's brick restart logic is not always sequential, as there are
at least three different ways in which bricks are restarted:
1. through friend-sm and glusterd_spawn_daemons ()
2. through friend-sm and handling volume quorum action
3. through friend handshaking when there is a mismatch on quorum on
friend import.

In a brick multiplexing setup, glusterd could end up trying to spawn the
same brick process a couple of times, because within a fraction of a
millisecond two threads would hit glusterd_brick_start (); glusterd then
had no way of rejecting either of them, as the brick start criteria were
met in both cases.

As a solution, this is controlled by two different flags: a boolean
called start_triggered, which indicates that a brick start has been
triggered and stays true until the brick dies or is killed, and a mutex
lock to ensure that, for a particular brick, we do not enter
glusterd_brick_start () more than once at the same point in time.
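
A minimal sketch of this guard, with hypothetical simplified types
(the real glusterd structures and functions differ):

    /* Sketch only: one start_triggered flag plus a per-brick mutex keep
     * two racing threads from spawning the same brick process twice. */
    #include &lt;pthread.h&gt;
    #include &lt;stdbool.h&gt;
    #include &lt;stdio.h&gt;

    typedef struct {
        char            name[64];
        bool            start_triggered;  /* true once a start was issued   */
        pthread_mutex_t start_lock;       /* serialises concurrent starters */
    } brick_t;

    static void brick_start(brick_t *brick)
    {
        pthread_mutex_lock(&amp;brick-&gt;start_lock);
        if (brick-&gt;start_triggered) {       /* already started; reject     */
            pthread_mutex_unlock(&amp;brick-&gt;start_lock);
            return;
        }
        brick-&gt;start_triggered = true;      /* stays true until brick dies */
        pthread_mutex_unlock(&amp;brick-&gt;start_lock);
        printf("spawning brick process for %s\n", brick-&gt;name);
    }

    static void *starter(void *arg)
    {
        brick_start((brick_t *)arg);        /* both threads race here */
        return NULL;
    }

    int main(void)
    {
        brick_t brick = { .name = "vol0-brick0", .start_triggered = false };
        pthread_mutex_init(&amp;brick.start_lock, NULL);

        pthread_t t1, t2;
        pthread_create(&amp;t1, NULL, starter, &amp;brick);
        pthread_create(&amp;t2, NULL, starter, &amp;brick);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);   /* only one "spawning ..." line prints */
        return 0;
    }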

Change-Id: I292f1e58d6971e111725e1baea1fe98b890b43e2
BUG: 1508283
Signed-off-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
(cherry picked from commit 82be66ef8e9e3127d41a4c843daf74c1d8aec4aa)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
glusterd's brick restart logic is not always sequential, as there are
at least three different ways in which bricks are restarted:
1. through friend-sm and glusterd_spawn_daemons ()
2. through friend-sm and handling volume quorum action
3. through friend handshaking when there is a mismatch on quorum on
friend import.

In a brick multiplexing setup, glusterd could end up trying to spawn the
same brick process a couple of times, because within a fraction of a
millisecond two threads would hit glusterd_brick_start (); glusterd then
had no way of rejecting either of them, as the brick start criteria were
met in both cases.

As a solution, this is controlled by two different flags: a boolean
called start_triggered, which indicates that a brick start has been
triggered and stays true until the brick dies or is killed, and a mutex
lock to ensure that, for a particular brick, we do not enter
glusterd_brick_start () more than once at the same point in time.
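
A minimal sketch of this guard, with hypothetical simplified types
(the real glusterd structures and functions differ):

    /* Sketch only: one start_triggered flag plus a per-brick mutex keep
     * two racing threads from spawning the same brick process twice. */
    #include &lt;pthread.h&gt;
    #include &lt;stdbool.h&gt;
    #include &lt;stdio.h&gt;

    typedef struct {
        char            name[64];
        bool            start_triggered;  /* true once a start was issued   */
        pthread_mutex_t start_lock;       /* serialises concurrent starters */
    } brick_t;

    static void brick_start(brick_t *brick)
    {
        pthread_mutex_lock(&amp;brick-&gt;start_lock);
        if (brick-&gt;start_triggered) {       /* already started; reject     */
            pthread_mutex_unlock(&amp;brick-&gt;start_lock);
            return;
        }
        brick-&gt;start_triggered = true;      /* stays true until brick dies */
        pthread_mutex_unlock(&amp;brick-&gt;start_lock);
        printf("spawning brick process for %s\n", brick-&gt;name);
    }

    static void *starter(void *arg)
    {
        brick_start((brick_t *)arg);        /* both threads race here */
        return NULL;
    }

    int main(void)
    {
        brick_t brick = { .name = "vol0-brick0", .start_triggered = false };
        pthread_mutex_init(&amp;brick.start_lock, NULL);

        pthread_t t1, t2;
        pthread_create(&amp;t1, NULL, starter, &amp;brick);
        pthread_create(&amp;t2, NULL, starter, &amp;brick);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);   /* only one "spawning ..." line prints */
        return 0;
    }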

Change-Id: I292f1e58d6971e111725e1baea1fe98b890b43e2
BUG: 1508283
Signed-off-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
(cherry picked from commit 82be66ef8e9e3127d41a4c843daf74c1d8aec4aa)
</pre>
</div>
</content>
</entry>
<entry>
<title>glusterd: fix client io-threads option for replicate volumes</title>
<updated>2017-10-09T12:21:08+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2017-10-09T12:16:16+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=47604fad4c2a3951077e41e0c007ceb979bb2c24'/>
<id>47604fad4c2a3951077e41e0c007ceb979bb2c24</id>
<content type='text'>
Backport of https://review.gluster.org/#/c/18430/

Problem:
Commit ff075a3d6f9b142911d25c27fd209838782bfff0 disabled loading
client-io-threads for replicate volumes (it was set to on by default in
commit e068c1997314046658dd502e9118dab32decf879) due to performance
issues, but in doing so it inadvertently failed to load the xlator even
if the user explicitly enabled the option using the volume set command.
This was despite returning success for the volume set.

Fix:
Modify the check in perfxl_option_handler() and add checks in volume
create/add-brick/remove-brick code paths, tying it all to
GD_OP_VERSION_3_12_2.

Change-Id: Ib612973a999a7da818cc926f5c2601b1f0794fcf
BUG: 1499158
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Backport of https://review.gluster.org/#/c/18430/

Problem:
Commit ff075a3d6f9b142911d25c27fd209838782bfff0 disabled loading
client-io-threads for replicate volumes (it was set to on by default in
commit e068c1997314046658dd502e9118dab32decf879) due to performance
issues, but in doing so it inadvertently failed to load the xlator even
if the user explicitly enabled the option using the volume set command.
This was despite returning success for the volume set.

Fix:
Modify the check in perfxl_option_handler() and add checks in volume
create/add-brick/remove-brick code paths, tying it all to
GD_OP_VERSION_3_12_2.

Change-Id: Ib612973a999a7da818cc926f5c2601b1f0794fcf
BUG: 1499158
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mgmt/glusterd: Provide more information in command message</title>
<updated>2017-08-12T13:38:51+00:00</updated>
<author>
<name>Ashish Pandey</name>
<email>aspandey@redhat.com</email>
</author>
<published>2017-08-10T07:26:32+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=cdb03a71fe0af131d09046cd673dfc58b4003550'/>
<id>cdb03a71fe0af131d09046cd673dfc58b4003550</id>
<content type='text'>
Problem:
When more than one brick is present on the same node while creating a
volume, we get a warning message that the setup is not optimal. We need
to add more information to this error/warning.

Solution:
Add the following line to the current message:
Bricks should be on different nodes to have best fault
tolerant configuration.

&gt;Change-Id: Ica72bd6e68dff7e41c37617f3b775a981fa40c69
&gt;BUG: 1480099
&gt;Signed-off-by: Ashish Pandey &lt;aspandey@redhat.com&gt;
&gt;Reviewed-on: https://review.gluster.org/18014
&gt;CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
&gt;Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
&gt;Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
&gt;Signed-off-by: Ashish Pandey &lt;aspandey@redhat.com&gt;

Change-Id: Ica72bd6e68dff7e41c37617f3b775a981fa40c69
BUG: 1480448
Signed-off-by: Ashish Pandey &lt;aspandey@redhat.com&gt;
Reviewed-on: https://review.gluster.org/18022
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
When more than one brick is present on the same node while creating a
volume, we get a warning message that the setup is not optimal. We need
to add more information to this error/warning.

Solution:
Add the following line to the current message:
Bricks should be on different nodes to have best fault
tolerant configuration.

&gt;Change-Id: Ica72bd6e68dff7e41c37617f3b775a981fa40c69
&gt;BUG: 1480099
&gt;Signed-off-by: Ashish Pandey &lt;aspandey@redhat.com&gt;
&gt;Reviewed-on: https://review.gluster.org/18014
&gt;CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
&gt;Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
&gt;Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
&gt;Signed-off-by: Ashish Pandey &lt;aspandey@redhat.com&gt;

Change-Id: Ica72bd6e68dff7e41c37617f3b775a981fa40c69
BUG: 1480448
Signed-off-by: Ashish Pandey &lt;aspandey@redhat.com&gt;
Reviewed-on: https://review.gluster.org/18022
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>posix: option to handle the shared bricks for statvfs()</title>
<updated>2017-07-31T15:34:58+00:00</updated>
<author>
<name>Amar Tumballi</name>
<email>amarts@redhat.com</email>
</author>
<published>2017-06-23T07:40:56+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=61ea2a44b509cebc566fc18b2c356d88a3f1fdc8'/>
<id>61ea2a44b509cebc566fc18b2c356d88a3f1fdc8</id>
<content type='text'>
Currently the 'storage/posix' xlator has an option,
`export-statfs-size no`, which exports zero as the value for a few
fields in `struct statvfs`. In the case of a backend brick shared
between multiple brick processes, the values of these fields
should be `field_value / number-of-bricks-at-node`. This way,
issues with 'min-free-disk' etc. at different layers are also
handled properly when the statfs() syscall is made.
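
A minimal sketch of the described adjustment, with hypothetical function
and path names (not the actual posix xlator code):

    /* Sketch only: report each brick's share of a shared backend
     * filesystem as field_value / number-of-bricks-at-node. */
    #include &lt;stdio.h&gt;
    #include &lt;sys/statvfs.h&gt;

    static int shared_brick_statfs(const char *brick_path, unsigned bricks_on_node,
                                   struct statvfs *out)
    {
        if (statvfs(brick_path, out) != 0)
            return -1;
        if (bricks_on_node &gt; 1) {
            out-&gt;f_blocks /= bricks_on_node;  /* total size            */
            out-&gt;f_bfree  /= bricks_on_node;  /* free blocks           */
            out-&gt;f_bavail /= bricks_on_node;  /* available to non-root */
        }
        return 0;
    }

    int main(void)
    {
        struct statvfs buf;
        if (shared_brick_statfs("/bricks/brick1", 4, &amp;buf) == 0)
            printf("per-brick size: %llu blocks\n",
                   (unsigned long long)buf.f_blocks);
        return 0;
    }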

Fixes #241

&gt; Reviewed-on: https://review.gluster.org/17618
&gt; Reviewed-by: Jeff Darcy &lt;jeff@pl.atyp.us&gt;
&gt; Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
&gt; (cherry picked from commit febf5ed4848ad705a34413353559482417c61467)

Change-Id: I2e320e1fdcc819ab9173277ef3498201432c275f
Signed-off-by: Amar Tumballi &lt;amarts@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17903
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Shyamsundar Ranganathan &lt;srangana@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently the 'storage/posix' xlator has an option,
`export-statfs-size no`, which exports zero as the value for a few
fields in `struct statvfs`. In the case of a backend brick shared
between multiple brick processes, the values of these fields
should be `field_value / number-of-bricks-at-node`. This way,
issues with 'min-free-disk' etc. at different layers are also
handled properly when the statfs() syscall is made.
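
A minimal sketch of the described adjustment, with hypothetical function
and path names (not the actual posix xlator code):

    /* Sketch only: report each brick's share of a shared backend
     * filesystem as field_value / number-of-bricks-at-node. */
    #include &lt;stdio.h&gt;
    #include &lt;sys/statvfs.h&gt;

    static int shared_brick_statfs(const char *brick_path, unsigned bricks_on_node,
                                   struct statvfs *out)
    {
        if (statvfs(brick_path, out) != 0)
            return -1;
        if (bricks_on_node &gt; 1) {
            out-&gt;f_blocks /= bricks_on_node;  /* total size            */
            out-&gt;f_bfree  /= bricks_on_node;  /* free blocks           */
            out-&gt;f_bavail /= bricks_on_node;  /* available to non-root */
        }
        return 0;
    }

    int main(void)
    {
        struct statvfs buf;
        if (shared_brick_statfs("/bricks/brick1", 4, &amp;buf) == 0)
            printf("per-brick size: %llu blocks\n",
                   (unsigned long long)buf.f_blocks);
        return 0;
    }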

Fixes #241

&gt; Reviewed-on: https://review.gluster.org/17618
&gt; Reviewed-by: Jeff Darcy &lt;jeff@pl.atyp.us&gt;
&gt; Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
&gt; (cherry picked from commit febf5ed4848ad705a34413353559482417c61467)

Change-Id: I2e320e1fdcc819ab9173277ef3498201432c275f
Signed-off-by: Amar Tumballi &lt;amarts@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17903
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Shyamsundar Ranganathan &lt;srangana@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>glusterd: Introduce option to limit no. of muxed bricks per process</title>
<updated>2017-07-10T04:33:19+00:00</updated>
<author>
<name>Samikshan Bairagya</name>
<email>samikshan@gmail.com</email>
</author>
<published>2017-06-02T04:42:12+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=9e8ee31e643b7fbf7d46092c395ea27aaeb82f6b'/>
<id>9e8ee31e643b7fbf7d46092c395ea27aaeb82f6b</id>
<content type='text'>
This commit introduces a new global option that can be set to limit
the number of multiplexed bricks in one process.

Usage:
`# gluster volume set all cluster.max-bricks-per-process &lt;value&gt;`

If this option is not set, then multiplexing will happen for now
with no limit; i.e. a brick process will have as many bricks
multiplexed to it as possible. In other words, the current
multiplexing behaviour won't change if this option isn't set to
any value.

This commit also introduces a brick process instance that contains
information about brick processes, like the number of bricks
handled by the process (which is 1 in non-multiplexing cases), the
list of bricks, and the port number, which also serves as a unique
identifier for each brick process instance. The brick process list
is maintained in 'glusterd_conf_t'.
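
A rough sketch of such a brick-process instance, with hypothetical type
and field names (the real glusterd structure may differ):

    /* Sketch only: a brick process keyed by its port, tracking how many
     * bricks it multiplexes and the list of those bricks. */
    #include &lt;stdint.h&gt;
    #include &lt;stdio.h&gt;
    #include &lt;sys/types.h&gt;

    struct brick;                        /* a brick attached to the process */

    typedef struct brick_proc {
        int                port;         /* unique identifier of the process   */
        pid_t              pid;
        uint32_t           brick_count;  /* 1 in the non-multiplexing case     */
        struct brick      *bricks;       /* bricks handled by this process     */
        struct brick_proc *next;         /* process list kept in glusterd conf */
    } brick_proc_t;

    /* A brick is attached to an existing process only while brick_count
     * stays below cluster.max-bricks-per-process (0 means "no limit"). */
    static int can_attach(const brick_proc_t *proc, uint32_t max_per_process)
    {
        return max_per_process == 0 || proc-&gt;brick_count &lt; max_per_process;
    }

    int main(void)
    {
        brick_proc_t proc = { .port = 49152, .pid = 1234, .brick_count = 250 };
        printf("can attach: %s\n", can_attach(&amp;proc, 250) ? "yes" : "no");
        return 0;
    }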

Updates: #151
Change-Id: Ib987d14ab0a4f6034dac01b73a4b2839f7b0b695
Signed-off-by: Samikshan Bairagya &lt;samikshan@gmail.com&gt;
Reviewed-on: https://review.gluster.org/17469
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This commit introduces a new global option that can be set to limit
the number of multiplexed bricks in one process.

Usage:
`# gluster volume set all cluster.max-bricks-per-process &lt;value&gt;`

If this option is not set, then multiplexing will happen for now
with no limit; i.e. a brick process will have as many bricks
multiplexed to it as possible. In other words, the current
multiplexing behaviour won't change if this option isn't set to
any value.

This commit also introduces a brick process instance that contains
information about brick processes, like the number of bricks
handled by the process (which is 1 in non-multiplexing cases), the
list of bricks, and the port number, which also serves as a unique
identifier for each brick process instance. The brick process list
is maintained in 'glusterd_conf_t'.
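
A rough sketch of such a brick-process instance, with hypothetical type
and field names (the real glusterd structure may differ):

    /* Sketch only: a brick process keyed by its port, tracking how many
     * bricks it multiplexes and the list of those bricks. */
    #include &lt;stdint.h&gt;
    #include &lt;stdio.h&gt;
    #include &lt;sys/types.h&gt;

    struct brick;                        /* a brick attached to the process */

    typedef struct brick_proc {
        int                port;         /* unique identifier of the process   */
        pid_t              pid;
        uint32_t           brick_count;  /* 1 in the non-multiplexing case     */
        struct brick      *bricks;       /* bricks handled by this process     */
        struct brick_proc *next;         /* process list kept in glusterd conf */
    } brick_proc_t;

    /* A brick is attached to an existing process only while brick_count
     * stays below cluster.max-bricks-per-process (0 means "no limit"). */
    static int can_attach(const brick_proc_t *proc, uint32_t max_per_process)
    {
        return max_per_process == 0 || proc-&gt;brick_count &lt; max_per_process;
    }

    int main(void)
    {
        brick_proc_t proc = { .port = 49152, .pid = 1234, .brick_count = 250 };
        printf("can attach: %s\n", can_attach(&amp;proc, 250) ? "yes" : "no");
        return 0;
    }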

Updates: #151
Change-Id: Ib987d14ab0a4f6034dac01b73a4b2839f7b0b695
Signed-off-by: Samikshan Bairagya &lt;samikshan@gmail.com&gt;
Reviewed-on: https://review.gluster.org/17469
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>glusterd: Make reset-brick work correctly if brick-mux is on</title>
<updated>2017-05-10T18:58:21+00:00</updated>
<author>
<name>Samikshan Bairagya</name>
<email>samikshan@gmail.com</email>
</author>
<published>2017-04-24T16:30:17+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=74383e3ec6f8244b3de9bf14016452498c1ddcf0'/>
<id>74383e3ec6f8244b3de9bf14016452498c1ddcf0</id>
<content type='text'>
Reset brick currently kills off the corresponding brick process.
However, with brick multiplexing enabled, stopping the brick
process would render all bricks attached to it unavailable. To
handle this correctly, we need to make sure that the brick process
is terminated only if brick-multiplexing is disabled. Otherwise,
we should send the GLUSTERD_BRICK_TERMINATE rpc to the respective
brick process to detach the brick that is to be reset.
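
A minimal sketch of this branch, with hypothetical helper names (not the
actual glusterd implementation):

    /* Sketch only: kill the process when multiplexing is off, otherwise
     * detach just the brick being reset from the shared process. */
    #include &lt;stdbool.h&gt;
    #include &lt;stdio.h&gt;

    struct brick { const char *path; int pid; };

    static void reset_brick_stop(struct brick *b, bool brick_mux_enabled)
    {
        if (!brick_mux_enabled)
            printf("killing brick process %d\n", b-&gt;pid);
        else  /* keep the shared process and its other bricks alive */
            printf("sending GLUSTERD_BRICK_TERMINATE to detach %s\n", b-&gt;path);
    }

    int main(void)
    {
        struct brick b = { "/bricks/b1", 1234 };
        reset_brick_stop(&amp;b, true);
        return 0;
    }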

Change-Id: I69002d66ffe6ec36ef48af09b66c522c6d35ac58
BUG: 1446172
Signed-off-by: Samikshan Bairagya &lt;samikshan@gmail.com&gt;
Reviewed-on: https://review.gluster.org/17128
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Reset brick currently kills off the corresponding brick process.
However, with brick multiplexing enabled, stopping the brick
process would render all bricks attached to it unavailable. To
handle this correctly, we need to make sure that the brick process
is terminated only if brick-multiplexing is disabled. Otherwise,
we should send the GLUSTERD_BRICK_TERMINATE rpc to the respective
brick process to detach the brick that is to be reset.
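
A minimal sketch of this branch, with hypothetical helper names (not the
actual glusterd implementation):

    /* Sketch only: kill the process when multiplexing is off, otherwise
     * detach just the brick being reset from the shared process. */
    #include &lt;stdbool.h&gt;
    #include &lt;stdio.h&gt;

    struct brick { const char *path; int pid; };

    static void reset_brick_stop(struct brick *b, bool brick_mux_enabled)
    {
        if (!brick_mux_enabled)
            printf("killing brick process %d\n", b-&gt;pid);
        else  /* keep the shared process and its other bricks alive */
            printf("sending GLUSTERD_BRICK_TERMINATE to detach %s\n", b-&gt;path);
    }

    int main(void)
    {
        struct brick b = { "/bricks/b1", 1234 };
        reset_brick_stop(&amp;b, true);
        return 0;
    }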

Change-Id: I69002d66ffe6ec36ef48af09b66c522c6d35ac58
BUG: 1446172
Signed-off-by: Samikshan Bairagya &lt;samikshan@gmail.com&gt;
Reviewed-on: https://review.gluster.org/17128
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Fixes quota aux mount failure</title>
<updated>2017-05-08T06:15:55+00:00</updated>
<author>
<name>Sanoj Unnikrishnan</name>
<email>sunnikri@redhat.com</email>
</author>
<published>2017-03-22T09:32:12+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=2ae4b4058691b324535d802f4e6d24cce89a10e5'/>
<id>2ae4b4058691b324535d802f4e6d24cce89a10e5</id>
<content type='text'>
The aux mount is created on the first limit/remove_limit/list command
and remains until the volume is stopped or deleted, or quota is
disabled, at which point we do a lazy unmount. If the process is
uncleanly terminated, the mount entry remains and we get a (Transport
disconnected) error on subsequent attempts to run quota
list/limit-usage/remove commands.

Second issue: there is also a risk of an inadvertent rm -rf on
/var/run/gluster causing data loss for the user. Ideally, /var/run is
a temporary path for application use and should not cause any data
loss to persistent storage.

Solution:
1) unmount the aux mount after each use.
2) clean stale mount before mounting, if any.

One caveat with doing a mount/unmount on each command is that we
cannot use the same mount point for both list and limit commands.
The reason is that the list command needs the mount to be accessible
in the CLI after the response from glusterd, so it could be unmounted
by a limit command executed in parallel had we used the same mount
point. Hence we use separate mount points for the list and limit
commands.
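
A minimal sketch of this flow, with hypothetical mount-point paths and a
simplified unmount call (the real glusterd code sets up and tears down
the aux mount differently):

    /* Sketch only: separate mount points for list and limit, a lazy
     * cleanup of any stale mount, and an unmount after each use. */
    #include &lt;stdio.h&gt;
    #include &lt;sys/mount.h&gt;

    #define AUX_MOUNT_LIST  "/var/run/gluster/VOLNAME_quota_list/"
    #define AUX_MOUNT_LIMIT "/var/run/gluster/VOLNAME_quota_limit/"

    static int run_quota_command(const char *mountpoint)
    {
        umount2(mountpoint, MNT_DETACH);  /* clean a stale mount, if any */
        /* ... create the aux mount and run the quota operation here ... */
        printf("using aux mount %s\n", mountpoint);
        return umount2(mountpoint, MNT_DETACH);  /* unmount after each use */
    }

    int main(void)
    {
        run_quota_command(AUX_MOUNT_LIST);   /* list never races with limit */
        run_quota_command(AUX_MOUNT_LIMIT);
        return 0;
    }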

Change-Id: I4f9e39da2ac2b65941399bffb6440db8a6ba59d0
BUG: 1433906
Signed-off-by: Sanoj Unnikrishnan &lt;sunnikri@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16938
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Manikandan Selvaganesh &lt;manikandancs333@gmail.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The aux mount is created on the first limit/remove_limit/list command
and remains until the volume is stopped or deleted, or quota is
disabled, at which point we do a lazy unmount. If the process is
uncleanly terminated, the mount entry remains and we get a (Transport
disconnected) error on subsequent attempts to run quota
list/limit-usage/remove commands.

Second issue: there is also a risk of an inadvertent rm -rf on
/var/run/gluster causing data loss for the user. Ideally, /var/run is
a temporary path for application use and should not cause any data
loss to persistent storage.

Solution:
1) unmount the aux mount after each use.
2) clean stale mount before mounting, if any.

One caveat with doing a mount/unmount on each command is that we
cannot use the same mount point for both list and limit commands.
The reason is that the list command needs the mount to be accessible
in the CLI after the response from glusterd, so it could be unmounted
by a limit command executed in parallel had we used the same mount
point. Hence we use separate mount points for the list and limit
commands.
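
A minimal sketch of this flow, with hypothetical mount-point paths and a
simplified unmount call (the real glusterd code sets up and tears down
the aux mount differently):

    /* Sketch only: separate mount points for list and limit, a lazy
     * cleanup of any stale mount, and an unmount after each use. */
    #include &lt;stdio.h&gt;
    #include &lt;sys/mount.h&gt;

    #define AUX_MOUNT_LIST  "/var/run/gluster/VOLNAME_quota_list/"
    #define AUX_MOUNT_LIMIT "/var/run/gluster/VOLNAME_quota_limit/"

    static int run_quota_command(const char *mountpoint)
    {
        umount2(mountpoint, MNT_DETACH);  /* clean a stale mount, if any */
        /* ... create the aux mount and run the quota operation here ... */
        printf("using aux mount %s\n", mountpoint);
        return umount2(mountpoint, MNT_DETACH);  /* unmount after each use */
    }

    int main(void)
    {
        run_quota_command(AUX_MOUNT_LIST);   /* list never races with limit */
        run_quota_command(AUX_MOUNT_LIMIT);
        return 0;
    }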

Change-Id: I4f9e39da2ac2b65941399bffb6440db8a6ba59d0
BUG: 1433906
Signed-off-by: Sanoj Unnikrishnan &lt;sunnikri@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16938
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Manikandan Selvaganesh &lt;manikandancs333@gmail.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rpc: Remove accidental IPV6 changes</title>
<updated>2017-05-05T11:39:28+00:00</updated>
<author>
<name>Kaushal M</name>
<email>kaushal@redhat.com</email>
</author>
<published>2017-05-04T06:45:09+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=de80a9153e3b681d8b83aa6164c94e2829b9cf49'/>
<id>de80a9153e3b681d8b83aa6164c94e2829b9cf49</id>
<content type='text'>
They snuck in with the HALO patch (07cc8679c)

Change-Id: I8ced6cbb0b49554fc9d348c453d4d5da00f981f6
BUG: 1447953
Signed-off-by: Kaushal M &lt;kaushal@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17174
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Niels de Vos &lt;ndevos@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Amar Tumballi &lt;amarts@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
They snuck in with the HALO patch (07cc8679c)

Change-Id: I8ced6cbb0b49554fc9d348c453d4d5da00f981f6
BUG: 1447953
Signed-off-by: Kaushal M &lt;kaushal@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17174
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Niels de Vos &lt;ndevos@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Amar Tumballi &lt;amarts@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Halo Replication feature for AFR translator</title>
<updated>2017-05-02T10:23:53+00:00</updated>
<author>
<name>Kevin Vigor</name>
<email>kvigor@fb.com</email>
</author>
<published>2017-03-21T15:23:25+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=07cc8679cdf3b29680f4f105d0222da168d8bfc1'/>
<id>07cc8679cdf3b29680f4f105d0222da168d8bfc1</id>
<content type='text'>
Summary:
Halo Geo-replication is a feature which allows Gluster or NFS clients to write
locally to their region (as defined by a latency "halo" or threshold if you
like), and have their writes asynchronously propagate from their origin to the
rest of the cluster.  Clients can also write synchronously to the cluster
simply by specifying a very large halo-latency (e.g. 10 seconds), which
will include all bricks.

In other words, it allows clients to decide at mount time if they desire
synchronous or asynchronous IO into a cluster and the cluster can support both
of these modes to any number of clients simultaneously.

There are a few new volume options due to this feature:
  halo-shd-latency:  The threshold below which self-heal daemons will
  consider children (bricks) connected.

  halo-nfsd-latency: The threshold below which NFS daemons will consider
  children (bricks) connected.

  halo-latency: The threshold below which all other clients will
  consider children (bricks) connected.

  halo-min-replicas: The minimum number of replicas which are to
  be enforced regardless of latency specified in the above 3 options.
  If the number of children falls below this threshold the next
  best (chosen by latency) shall be swapped in (see the sketch below).
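
A minimal sketch of this selection rule (hypothetical and simplified;
not the actual AFR implementation):

    /* Sketch only: children within the latency halo are marked connected;
     * if fewer than min_replicas qualify, the next best by latency are
     * swapped in. */
    #include &lt;stdio.h&gt;

    struct child { const char *name; double latency_ms; int in_halo; };

    static void apply_halo(struct child *c, int n, double halo_latency,
                           int min_replicas)
    {
        int count = 0;
        for (int i = 0; i &lt; n; i++) {
            c[i].in_halo = c[i].latency_ms &lt;= halo_latency;
            count += c[i].in_halo;
        }
        while (count &lt; min_replicas) {        /* swap in the next best */
            int best = -1;
            for (int i = 0; i &lt; n; i++)
                if (!c[i].in_halo &amp;&amp; (best &lt; 0 || c[i].latency_ms &lt; c[best].latency_ms))
                    best = i;
            if (best &lt; 0)
                break;
            c[best].in_halo = 1;
            count++;
        }
    }

    int main(void)
    {
        struct child children[] = { { "brick-a", 0.4, 0 }, { "brick-b", 1.2, 0 },
                                    { "brick-c", 35.0, 0 }, { "brick-d", 80.0, 0 } };
        apply_halo(children, 4, 5.0, 3);      /* brick-c gets swapped in */
        for (int i = 0; i &lt; 4; i++)
            printf("%s: %s\n", children[i].name,
                   children[i].in_halo ? "in halo" : "out of halo");
        return 0;
    }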

New FUSE mount options:
  halo-latency &amp; halo-min-replicas: As described above.

This feature combined with multi-threaded SHD support (D1271745) results in
some pretty cool geo-replication possibilities.

Operational Notes:
- Global consistency is guaranteed for synchronous clients; this is provided by
  the existing entry-locking mechanism.
- Asynchronous clients, on the other hand, are merely consistent within their region.
  Writes &amp; deletes will be protected via entry-locks as usual, preventing
  concurrent writes into files which are undergoing replication.  Read operations,
  on the other hand, should never block.
- Writes are allowed from _any_ region and propagated from the origin to all
  other regions.  The takeaway is that care should be taken to ensure that
  multiple writers do not write the same files, resulting in a gfid split-brain
  which will require resolution via split-brain policies (majority, mtime &amp;
  size).  The recommended method for preventing this is to use the nfs-auth
  feature to define which region for each share has RW permissions; tiers not
  in the origin region should have RO perms.

TODO:
- Synchronous clients (including the SHD) should choose clients from their own
  region as preferred sources for reads.  Most of the plumbing is in place for
  this via the child_latency array.
- Better GFID split brain handling &amp; better dent type split brain handling
  (i.e. create a trash can and move the offending files into it).
- Tagging, in addition to latency, as a means of defining which children you
  wish to synchronously write to.

Test Plan:
- The usual suspects, clang, gcc w/ address sanitizer &amp; valgrind
- Prove tests

Reviewers: jackl, dph, cjh, meyering

Reviewed By: meyering

Subscribers: ethanr

Differential Revision: https://phabricator.fb.com/D1272053

Tasks: 4117827

Change-Id: I694a9ab429722da538da171ec528406e77b5e6d1
BUG: 1428061
Signed-off-by: Kevin Vigor &lt;kvigor@fb.com&gt;
Reviewed-on: http://review.gluster.org/16099
Reviewed-on: https://review.gluster.org/16177
Tested-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Summary:
Halo Geo-replication is a feature which allows Gluster or NFS clients to write
locally to their region (as defined by a latency "halo" or threshold if you
like), and have their writes asynchronously propagate from their origin to the
rest of the cluster.  Clients can also write synchronously to the cluster
simply by specifying a very large halo-latency (e.g. 10 seconds), which
will include all bricks.

In other words, it allows clients to decide at mount time if they desire
synchronous or asynchronous IO into a cluster and the cluster can support both
of these modes to any number of clients simultaneously.

There are a few new volume options due to this feature:
  halo-shd-latency:  The threshold below which self-heal daemons will
  consider children (bricks) connected.

  halo-nfsd-latency: The threshold below which NFS daemons will consider
  children (bricks) connected.

  halo-latency: The threshold below which all other clients will
  consider children (bricks) connected.

  halo-min-replicas: The minimum number of replicas which are to
  be enforced regardless of latency specified in the above 3 options.
  If the number of children falls below this threshold the next
  best (chosen by latency) shall be swapped in (see the sketch below).
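
A minimal sketch of this selection rule (hypothetical and simplified;
not the actual AFR implementation):

    /* Sketch only: children within the latency halo are marked connected;
     * if fewer than min_replicas qualify, the next best by latency are
     * swapped in. */
    #include &lt;stdio.h&gt;

    struct child { const char *name; double latency_ms; int in_halo; };

    static void apply_halo(struct child *c, int n, double halo_latency,
                           int min_replicas)
    {
        int count = 0;
        for (int i = 0; i &lt; n; i++) {
            c[i].in_halo = c[i].latency_ms &lt;= halo_latency;
            count += c[i].in_halo;
        }
        while (count &lt; min_replicas) {        /* swap in the next best */
            int best = -1;
            for (int i = 0; i &lt; n; i++)
                if (!c[i].in_halo &amp;&amp; (best &lt; 0 || c[i].latency_ms &lt; c[best].latency_ms))
                    best = i;
            if (best &lt; 0)
                break;
            c[best].in_halo = 1;
            count++;
        }
    }

    int main(void)
    {
        struct child children[] = { { "brick-a", 0.4, 0 }, { "brick-b", 1.2, 0 },
                                    { "brick-c", 35.0, 0 }, { "brick-d", 80.0, 0 } };
        apply_halo(children, 4, 5.0, 3);      /* brick-c gets swapped in */
        for (int i = 0; i &lt; 4; i++)
            printf("%s: %s\n", children[i].name,
                   children[i].in_halo ? "in halo" : "out of halo");
        return 0;
    }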

New FUSE mount options:
  halo-latency &amp; halo-min-replicas: As described above.

This feature combined with multi-threaded SHD support (D1271745) results in
some pretty cool geo-replication possibilities.

Operational Notes:
- Global consistency is guaranteed for synchronous clients; this is provided by
  the existing entry-locking mechanism.
- Asynchronous clients, on the other hand, are merely consistent within their region.
  Writes &amp; deletes will be protected via entry-locks as usual, preventing
  concurrent writes into files which are undergoing replication.  Read operations,
  on the other hand, should never block.
- Writes are allowed from _any_ region and propagated from the origin to all
  other regions.  The takeaway is that care should be taken to ensure that
  multiple writers do not write the same files, resulting in a gfid split-brain
  which will require resolution via split-brain policies (majority, mtime &amp;
  size).  The recommended method for preventing this is to use the nfs-auth
  feature to define which region for each share has RW permissions; tiers not
  in the origin region should have RO perms.

TODO:
- Synchronous clients (including the SHD) should choose clients from their own
  region as preferred sources for reads.  Most of the plumbing is in place for
  this via the child_latency array.
- Better GFID split brain handling &amp; better dent type split brain handling
  (i.e. create a trash can and move the offending files into it).
- Tagging, in addition to latency, as a means of defining which children you
  wish to synchronously write to.

Test Plan:
- The usual suspects, clang, gcc w/ address sanitizer &amp; valgrind
- Prove tests

Reviewers: jackl, dph, cjh, meyering

Reviewed By: meyering

Subscribers: ethanr

Differential Revision: https://phabricator.fb.com/D1272053

Tasks: 4117827

Change-Id: I694a9ab429722da538da171ec528406e77b5e6d1
BUG: 1428061
Signed-off-by: Kevin Vigor &lt;kvigor@fb.com&gt;
Reviewed-on: http://review.gluster.org/16099
Reviewed-on: https://review.gluster.org/16177
Tested-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
