<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/tests/volume.rc, branch v3.11.0beta1</title>
<subtitle></subtitle>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/'/>
<entry>
<title>cluster/ec: Don't trigger data/metadata heal on Lookups</title>
<updated>2017-02-27T03:06:55+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2017-01-25T10:01:44+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=c1fc1fc9cb5a13e6ddf8c9270deb0c7609333540'/>
<id>c1fc1fc9cb5a13e6ddf8c9270deb0c7609333540</id>
<content type='text'>
Problem-1:
Lookup doesn't take any locks, so a version mismatch observed by Lookup can't
be trusted. If we launch a heal based on this information, it leads to
self-heals that hurt I/O performance in the cases where Lookup is wrong.
Since the self-heal-daemon and operations on the inode from clients which do
take locks can still trigger heals, we can choose not to attempt a heal on
Lookup.
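
A rough way to observe the new behaviour from a test, using the test
framework's conventions ($CLI, $V0, $M0, $HEAL_TIMEOUT, get_pending_heal_count);
the file name and the way the mismatch is created are illustrative only:

  TEST dd if=/dev/urandom of=$M0/file1 bs=1k count=1
  # ... create a version mismatch on one brick, e.g. by taking it down
  # around a write ...
  TEST stat $M0/file1               # a plain Lookup no longer queues a heal
  TEST $CLI volume heal $V0         # the locked/index path still heals it
  EXPECT_WITHIN $HEAL_TIMEOUT "0" get_pending_heal_count $V0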

Problem-2:
Fixed the spurious failure of
tests/bitrot/bug-1373520.t
Here, what was happening was that ec_heal_inspect()
was preventing the 'name' heal from happening.

Problem-3:
tests/basic/ec/ec-background-heals.t
To be honest, I don't know what the problem was. While fixing
the two problems above I made some changes to ec_heal_inspect() and
ec_need_heal(), after which I could no longer recreate the spurious
failure even after a long time.

BUG: 1414287
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
Change-Id: Ife2535e1d0b267712973673f6d474e288f3c6834
Reviewed-on: https://review.gluster.org/16468
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Xavier Hernandez &lt;xhernandez@datalab.es&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Ashish Pandey &lt;aspandey@redhat.com&gt;
</content>
</entry>
<entry>
<title>glusterd: keep snapshot bricks separate from regular ones</title>
<updated>2017-02-10T13:16:48+00:00</updated>
<author>
<name>Jeff Darcy</name>
<email>jdarcy@redhat.com</email>
</author>
<published>2017-02-03T15:51:21+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=f1c6ae24361b1bf39794a34ea35a0202a6b49fa6'/>
<id>f1c6ae24361b1bf39794a34ea35a0202a6b49fa6</id>
<content type='text'>
The problem here is that a volume's transport options can change, but
any snapshots' bricks don't follow along even though they're now
incompatible (with respect to multiplexing).  This was causing the
USS+SSL test to fail.  By keeping the snapshot bricks separate
(though still potentially multiplexed with other snapshot bricks
including those for other volumes) we can ensure that they remain
unaffected by changes to their parent volumes.
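
An illustrative sequence of the kind of scenario involved (a sketch only,
assuming a volume $V0 that already has a snapshot; these are not the exact
steps of the USS+SSL test):

  gluster volume set all cluster.brick-multiplex on
  gluster snapshot create snap1 $V0 no-timestamp
  gluster snapshot activate snap1
  # the parent volume's transport options change while the snapshot
  # bricks are already running
  gluster volume set $V0 server.ssl on
  gluster volume set $V0 client.ssl on
  # snap1's bricks must stay in a snapshot-only brick process instead of
  # being multiplexed with the now-incompatible parent bricks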

Also fixed various issues with how the test waits (or, more precisely, didn't
wait) for various events to complete before continuing.

Change-Id: Iab4a8a44fac5760373fac36956a3bcc27cf969da
BUG: 1385758
Signed-off-by: Jeff Darcy &lt;jdarcy@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16544
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Avra Sengupta &lt;asengupt@redhat.com&gt;
Tested-by: Avra Sengupta &lt;asengupt@redhat.com&gt;
</content>
</entry>
<entry>
<title>tests: fix online_brick_count for multiplexing</title>
<updated>2017-02-08T03:22:19+00:00</updated>
<author>
<name>Jeff Darcy</name>
<email>jdarcy@redhat.com</email>
</author>
<published>2017-02-02T15:22:00+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=4a7fd196d4a141f2b693d5b49995733f6ad1776f'/>
<id>4a7fd196d4a141f2b693d5b49995733f6ad1776f</id>
<content type='text'>
With multiplexing, the number of brick processes no longer matches the number
of bricks, so counting processes doesn't work.  Counting *pidfiles* does.
Ironically, the fix broke multiplex.t, which used this function, so that test
now uses a different function with the old process-counting behavior.
Also had to fix online_brick_count and kill_node in cluster.rc to be
consistent with the new reality.
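
A minimal sketch of the pidfile-counting idea (not necessarily the exact
implementation in volume.rc; the path pattern is an assumption based on the
usual $GLUSTERD_WORKDIR layout):

  function count_brick_pidfiles {
          find $GLUSTERD_WORKDIR/vols -name '*.pid' | wc -l
  }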

Change-Id: I4e81a6633b93227e10604f53e18a0b802c75cbcc
BUG: 1385758
Signed-off-by: Jeff Darcy &lt;jdarcy@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16527
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Vijay Bellur &lt;vbellur@redhat.com&gt;
</content>
</entry>
<entry>
<title>core: run many bricks within one glusterfsd process</title>
<updated>2017-01-31T00:13:58+00:00</updated>
<author>
<name>Jeff Darcy</name>
<email>jdarcy@redhat.com</email>
</author>
<published>2016-12-08T21:24:15+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=1a95fc3036db51b82b6a80952f0908bc2019d24a'/>
<id>1a95fc3036db51b82b6a80952f0908bc2019d24a</id>
<content type='text'>
This patch adds support for multiple brick translator stacks running
in a single brick server process.  This reduces our per-brick memory usage by
approximately 3x, and our appetite for TCP ports even more.  It also creates
potential to avoid process/thread thrashing, and to improve QoS by scheduling
more carefully across the bricks, but realizing that potential will require
further work.

Multiplexing is controlled by the "cluster.brick-multiplex" global option.  By
default it's off, and bricks are started in separate processes as before.  If
multiplexing is enabled, then *compatible* bricks (mostly those with the same
transport options) will be started in the same process.
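
For example (illustrative; since this is a global option it is set on "all"
rather than on an individual volume):

  gluster volume set all cluster.brick-multiplex on
  # bricks of compatible volumes now share one glusterfsd process
  gluster volume set all cluster.brick-multiplex off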

Change-Id: I45059454e51d6f4cbb29a4953359c09a408695cb
BUG: 1385758
Signed-off-by: Jeff Darcy &lt;jdarcy@redhat.com&gt;
Reviewed-on: https://review.gluster.org/14763
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Vijay Bellur &lt;vbellur@redhat.com&gt;
</content>
</entry>
<entry>
<title>features/bit-rot-stub: use the correct spelling of quarantine for bad objects</title>
<updated>2017-01-30T20:17:51+00:00</updated>
<author>
<name>Raghavendra Bhat</name>
<email>raghavendra@redhat.com</email>
</author>
<published>2016-12-02T20:24:46+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=2e0dceb6165d0a93581284fa2e0d74abe4ee615b'/>
<id>2e0dceb6165d0a93581284fa2e0d74abe4ee615b</id>
<content type='text'>
The "bad objects" container directory (the directory holding the list of bad
objects) was named "quanrantine" instead of "quarantine".
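
With the fix, the container sits at the correctly spelled path under each
brick (path shown for illustration, assuming the default layout and the test
framework's $B0 brick-directory convention):

  ls $B0/brick0/.glusterfs/quarantine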

Change-Id: I8c20381ac637201d9d1a224f5223e8dfbed53f1e
BUG: 1401571
Signed-off-by: Raghavendra Bhat &lt;raghavendra@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16027
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Kotresh HR &lt;khiremat@redhat.com&gt;
</content>
</entry>
<entry>
<title>tier : Tier as a service</title>
<updated>2017-01-17T04:49:47+00:00</updated>
<author>
<name>hari gowtham</name>
<email>hgowtham@redhat.com</email>
</author>
<published>2016-07-12T11:10:28+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=3263d1c4f4b7efd1a018c17e1ba4dd9245094f48'/>
<id>3263d1c4f4b7efd1a018c17e1ba4dd9245094f48</id>
<content type='text'>
tierd is implemented by separating it from the rebalance process.

The commands affected:

1) Attach tier will trigger this process instead of the old one.
2) tier start and tier start force will also trigger this process.
3) volume status [tier] will show the tier daemon as a process instead
of a task, and normal tier status and tier detach status work.
4) tier stop is implemented.
5) detach tier is implemented separately, along with a new detach tier
status.
6) volume tier volname status will work using the changes.
7) volume set works.
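
For example, the tier commands handled by the new daemon look like this
(illustrative; $V0, $H0 and $B0 follow the test framework's conventions):

  gluster volume tier $V0 attach replica 2 $H0:$B0/hot1 $H0:$B0/hot2
  gluster volume tier $V0 status
  gluster volume tier $V0 detach start
  gluster volume tier $V0 detach status
  gluster volume status $V0          # tierd now appears as a service/process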

This patch separates the tier translator from the legacy
DHT rebalance code. It now sends the RPCs from the CLI
to glusterd separately from the DHT rebalance code.
The daemon is now a service, similar to the snapshot daemon,
and can be viewed using the volume status command.

The code for the validation and commit phases is the same
as the earlier tier validation code in DHT rebalance.

The "brickop" phase has been changed so that the status
command can use this framework.

The service management framework is now used.
DHT rebalance does not use this framework.

This service framework takes care of:

*) spawning the daemon, killing it, and other such operations.
*) volume set options, which are written into the volfile.
*) restart and reconfigure functions. Restart is used to restart
the daemon at two points:
        1) after gluster goes down and comes up.
        2) to stop detach tier.
*) reconfigure, which is used to make immediate volfile changes
without restarting the daemon.
It also has the code to rewrite the volfile for topological
changes (which come into play during add-brick and remove-brick).

With this patch the log, pid, and volfile are separated
and put into their respective directories.

Change-Id: I3681d0d66894714b55aa02ca2a30ac000362a399
BUG: 1313838
Signed-off-by: hari gowtham &lt;hgowtham@redhat.com&gt;
Reviewed-on: http://review.gluster.org/13365
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Tested-by: hari gowtham &lt;hari.gowtham005@gmail.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Dan Lambright &lt;dlambrig@redhat.com&gt;
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
</content>
</entry>
<entry>
<title>cluster/ec: Implement heal info with lock</title>
<updated>2016-10-11T09:29:27+00:00</updated>
<author>
<name>Ashish Pandey</name>
<email>aspandey@redhat.com</email>
</author>
<published>2016-09-20T07:02:28+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=0fed7e7f0aad9973900c89434f736797d9ace2bd'/>
<id>0fed7e7f0aad9973900c89434f736797d9ace2bd</id>
<content type='text'>
Problem: Currently the heal info command prints all
files/directories whose index is present in the
.glusterfs/indices folder.
After patch http://review.gluster.org/#/c/13733/ the
index of a file that is going through an update fop
is also present in .glusterfs/indices even
if the fop succeeds on all the bricks. If the heal
info command is run at this time, it also displays this
file, which is actually healthy and does not require any heal.

Solution: Take a lock on the file corresponding to each index
and inspect its xattrs to decide whether the file needs heal or not.
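
Illustrative only: the per-brick xattrs consulted by the ec xlator when
deciding whether a file really needs heal can be inspected with getfattr
($B0 is the test framework's brick-directory convention):

  getfattr -d -m . -e hex $B0/brick0/file1
  # trusted.ec.version, trusted.ec.size and trusted.ec.dirty must agree
  # across the bricks for the file to be considered healthy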

Change-Id: I6361e2813ece369be12d02e74816df4eddb81cfa
BUG: 1366815
Signed-off-by: Ashish Pandey &lt;aspandey@redhat.com&gt;
Reviewed-on: http://review.gluster.org/15543
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
Reviewed-by: Xavier Hernandez &lt;xhernandez@datalab.es&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</content>
</entry>
<entry>
<title>tests: Fix races in open-behind.t</title>
<updated>2016-09-27T11:44:27+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2016-09-27T02:21:48+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=cd072b61841c19ec942871e3f06519d2a938814b'/>
<id>cd072b61841c19ec942871e3f06519d2a938814b</id>
<content type='text'>
Problems:
1) flush-behind is on by default, so just because a write completes doesn't
   mean it is on disk; it could still be in write-behind's cache. This leads
   to failures where you write from one mount, expect the data to be there on
   the other mount, and sometimes it isn't.
2) Sometimes the graph switch has not completed by the time we issue the read,
   which leads to opens not being sent to the brick and hence to failures.

Fixes:
1) Disable flush-behind.
2) Add new functions to check that the new graph is in place and connected to
   the bricks before 'cat' is executed.
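
A rough sketch of the two fixes in test-script form ($CLI, $V0, $M0,
EXPECT_WITHIN and $PROCESS_UP_TIMEOUT are test framework conventions; the
wait function named here is illustrative, not necessarily the one the patch
adds):

  TEST $CLI volume set $V0 performance.flush-behind off
  EXPECT_WITHIN $PROCESS_UP_TIMEOUT "Y" new_graph_connected_to_bricks $M0
  TEST cat $M0/testfile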

BUG: 1379511
Change-Id: I0faed684e0dc70cfd2258ce6fdaed655ee915ae6
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
Reviewed-on: http://review.gluster.org/15575
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</content>
</entry>
<entry>
<title>feature/bitrot: Fix recovery of corrupted hardlink</title>
<updated>2016-09-08T17:09:33+00:00</updated>
<author>
<name>Kotresh HR</name>
<email>khiremat@redhat.com</email>
</author>
<published>2016-09-06T12:58:42+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=b86a7de9b5ea9dcd0a630dbe09fce6d9ad0d8944'/>
<id>b86a7de9b5ea9dcd0a630dbe09fce6d9ad0d8944</id>
<content type='text'>
Problem:
When a file with hardlinks is corrupted in an ec volume,
the recovery steps mentioned below were not working.
Only the name and metadata were healing, but not the data.

Cause:
The bad-file marker in the inode context is not removed.
Hence, when self-heal tries to open the file for data
healing, it fails with EIO.

Background:
Bitrot deletes the inode context during forget.

Briefly, the recovery involves the following steps.
   1. Delete the entry marked with the bad-file xattr
      from the backend. Delete all the hardlinks, including
      the .glusterfs hardlink, as well.
   2. Access each hardlink of the file, including the
      original, from the mount.
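
A rough illustration of those steps (file names are made up, $B0/$M0 follow
the test framework's conventions, and the gfid hardlink path under
.glusterfs is abbreviated):

   # on the brick backend
   rm $B0/brick0/file1 $B0/brick0/link1
   rm $B0/brick0/.glusterfs/aa/bb/aabb...-gfid
   # from the mount, access every hardlink of the file
   stat $M0/file1
   stat $M0/link1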

Step 2 sends a lookup to the brick where the files
were deleted from the backend, and it returns ENOENT. On
ENOENT, the server xlator forgets the inode if there are
no dentries associated with it. But in the case of hardlinks,
forget won't be called because dentries (the other hardlink
files) are still associated with the inode. Hence bitrot-stub
won't delete its context, failing the data self-heal.

Fix:
Bitrot-stub should delete the inode context on getting
ENOENT during lookup.

Change-Id: Ice6adc18625799e7afd842ab33b3517c2be264c1
BUG: 1373520
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
Reviewed-on: http://review.gluster.org/15408
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Raghavendra Bhat &lt;raghavendra@redhat.com&gt;
</content>
</entry>
<entry>
<title>feature/bitrot: Ondemand scrub option for bitrot</title>
<updated>2016-08-25T21:39:38+00:00</updated>
<author>
<name>Kotresh HR</name>
<email>khiremat@redhat.com</email>
</author>
<published>2016-08-05T03:33:22+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=0b3e4130b576c11156d6327e4cc3c9310a74c143'/>
<id>0b3e4130b576c11156d6327e4cc3c9310a74c143</id>
<content type='text'>
The bitrot scrubber takes 'hourly/daily/biweekly/monthly'
as the values for 'scrub-frequency'. There is no way
to run the scrubbing exactly when the admin wants it.

Ondemand scrubbing brings in the new option 'ondemand',
with which the admin can start scrubbing on demand.
It starts the scrubbing immediately.

Ondemand scrubbing succeeds only if the scrubber
is in the 'Active (Idle)' state (waiting for its next frequency
cycle to start scrubbing). It is not entertained when
the scrubber is 'Paused' or already running.

Here is the command line syntax:

gluster volume bitrot &lt;vol name&gt; scrub ondemand
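
For example, assuming bitrot is already enabled on volume $V0 ($V0 stands in
for the volume name; the status check is shown for illustration):

  gluster volume bitrot $V0 scrub status     # scrubber should be Active (Idle)
  gluster volume bitrot $V0 scrub ondemand   # starts a scrub run immediately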

Change-Id: I84c28904367eed827a7dae8d6a535c14b28e9f4d
BUG: 1366195
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
Reviewed-on: http://review.gluster.org/15111
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Venky Shankar &lt;vshankar@redhat.com&gt;
</content>
</entry>
</feed>
