<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/xlators/cluster/afr/src/afr.h, branch v4.0.2</title>
<subtitle></subtitle>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/'/>
<entry>
<title>cluster/afr: Fix dict-leak in pre-op</title>
<updated>2018-03-03T04:33:26+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2018-02-28T12:28:31+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=de000fd58d783e8a8fce3a6c7f5b85be3c1b7178'/>
<id>de000fd58d783e8a8fce3a6c7f5b85be3c1b7178</id>
<content type='text'>
At the time of pre-op, pre_op_xdata is populated with the xattrs we get from the
disk, and at the time of post-op it gets over-written without unreffing the
previously stored value, leading to a leak.
This is a regression we missed in
https://review.gluster.org/#/q/ba149bac92d169ae2256dbc75202dc9e5d06538e

 &gt;BUG: 1550078
 &gt;Change-Id: I0456f9ad6f77ce6248b747964a037193af3a3da7
 &gt;Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
 &gt;(cherry picked from commit e7b79c59590c203c65f7ac8548b30d068c232d33)

BUG: 1550808
Change-Id: I0456f9ad6f77ce6248b747964a037193af3a3da7
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
At the time of pre-op, pre_op_xdata is populated with the xattrs we get from the
disk, and at the time of post-op it gets over-written without unreffing the
previously stored value, leading to a leak.
This is a regression we missed in
https://review.gluster.org/#/q/ba149bac92d169ae2256dbc75202dc9e5d06538e

 &gt;BUG: 1550078
 &gt;Change-Id: I0456f9ad6f77ce6248b747964a037193af3a3da7
 &gt;Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
 &gt;(cherry picked from commit e7b79c59590c203c65f7ac8548b30d068c232d33)

BUG: 1550808
Change-Id: I0456f9ad6f77ce6248b747964a037193af3a3da7
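
A minimal sketch of the unref-before-overwrite fix described above. The
dict_ref()/dict_unref() refcount helpers are real libglusterfs APIs; the
exact field path and index variable are assumptions for illustration:

    /* drop the reference taken when pre_op_xdata was populated during
     * pre-op, otherwise overwriting it at post-op leaks the old dict */
    if (local-&gt;transaction.pre_op_xdata[i])
            dict_unref (local-&gt;transaction.pre_op_xdata[i]);
    local-&gt;transaction.pre_op_xdata[i] = dict_ref (xdata);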
</pre>
</div>
</content>
</entry>
<entry>
<title>posix/afr: handle backward compatibility for rchecksum fop</title>
<updated>2018-02-20T03:12:02+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2018-02-11T01:24:35+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=325d714e40b273b99a63f58a4c6c83b7f1143ee5'/>
<id>325d714e40b273b99a63f58a4c6c83b7f1143ee5</id>
<content type='text'>
Added a volume option 'fips-mode-rchecksum' tied to op version 4.
If not set, rchecksum fop will use MD5 instead of SHA256.

updates: #230
Change-Id: Id8ea1303777e6450852c0bc25503cda341a6aec2
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
(cherry picked from commit 6daa6535692b2c68b493636a9bbfdcbc475b3d80)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Added a volume option 'fips-mode-rchecksum' tied to op version 4.
If not set, rchecksum fop will use MD5 instead of SHA256.

updates: #230
Change-Id: Id8ea1303777e6450852c0bc25503cda341a6aec2
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
(cherry picked from commit 6daa6535692b2c68b493636a9bbfdcbc475b3d80)
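
A rough sketch of the compatibility switch this option introduces, assuming
a priv-&gt;fips_mode_rchecksum flag backs the volume option (MD5() and
SHA256() are OpenSSL's one-shot digest helpers):

    if (priv-&gt;fips_mode_rchecksum)
            SHA256 ((unsigned char *) buf, len, strong_checksum);
    else
            MD5 ((unsigned char *) buf, len, strong_checksum); /* pre-4.0 behaviour */

The option would typically be enabled per volume, e.g.
"gluster volume set &lt;volname&gt; fips-mode-rchecksum on", once the cluster
runs the required op version.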
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/afr: Adding option to take full file lock</title>
<updated>2018-01-19T00:15:19+00:00</updated>
<author>
<name>karthik-us</name>
<email>ksubrahm@redhat.com</email>
</author>
<published>2018-01-17T12:00:06+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=84c5c540b26c8f3dcb9845344dd48df063e57845'/>
<id>84c5c540b26c8f3dcb9845344dd48df063e57845</id>
<content type='text'>
Problem:
In replica 3 volumes there is a possibility of ending up in a split-brain
scenario when multiple clients write data to the same file at
non-overlapping regions in parallel.

Scenario:
- Initially all the copies are good and all the clients get the value
  of data-readables as all good.
- Client C0 performs write W1 which fails on brick B0 and succeeds on
  other two bricks.
- C1 performs write W2 which fails on B1 and succeeds on other two bricks.
- C2 performs write W3 which fails on B2 and succeeds on other two bricks.
- All the 3 writes above happen in parallel and fall on different ranges,
  so afr takes granular locks and all the writes are performed in parallel.
  Since each client had data-readables as good, it does not see the
  file going into split-brain in the in_flight_split_brain check, and hence
  performs the post-op, marking the pending xattrs. Now all the bricks
  are blamed by each other, ending up in split-brain.

Fix:
Add an option to take either a full lock or a range lock on files while
doing data transactions, to prevent the possibility of ending up in
split-brain. With this change, by default files take a full lock
while doing IO. To make use of the old range lock, change the value of
"cluster.full-lock" to "no".

Change-Id: I7893fa33005328ed63daa2f7c35eeed7c5218962
BUG: 1535438
Signed-off-by: karthik-us &lt;ksubrahm@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
In replica 3 volumes there is a possibility of ending up in a split-brain
scenario when multiple clients write data to the same file at
non-overlapping regions in parallel.

Scenario:
- Initially all the copies are good and all the clients get the value
  of data-readables as all good.
- Client C0 performs write W1 which fails on brick B0 and succeeds on
  other two bricks.
- C1 performs write W2 which fails on B1 and succeeds on other two bricks.
- C2 performs write W3 which fails on B2 and succeeds on other two bricks.
- All the 3 writes above happen in parallel and fall on different ranges,
  so afr takes granular locks and all the writes are performed in parallel.
  Since each client had data-readables as good, it does not see the
  file going into split-brain in the in_flight_split_brain check, and hence
  performs the post-op, marking the pending xattrs. Now all the bricks
  are blamed by each other, ending up in split-brain.

Fix:
Add an option to take either a full lock or a range lock on files while
doing data transactions, to prevent the possibility of ending up in
split-brain. With this change, by default files take a full lock
while doing IO. To make use of the old range lock, change the value of
"cluster.full-lock" to "no".

Change-Id: I7893fa33005328ed63daa2f7c35eeed7c5218962
BUG: 1535438
Signed-off-by: karthik-us &lt;ksubrahm@redhat.com&gt;
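
A sketch of the lock-range selection the new option controls; variable
names are assumptions, and in struct gf_flock an l_len of 0 means
"lock to the end of the file":

    flock.l_type = F_WRLCK;
    if (priv-&gt;full_lock) {
            flock.l_start = 0;
            flock.l_len   = 0;        /* whole-file lock: the new default      */
    } else {
            flock.l_start = offset;   /* old behaviour: lock only the IO range */
            flock.l_len   = size;
    }

To keep the old range-lock behaviour, e.g.
"gluster volume set &lt;volname&gt; cluster.full-lock no".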
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/afr: Fixing the flaws in arbiter becoming source patch</title>
<updated>2018-01-13T02:55:44+00:00</updated>
<author>
<name>karthik-us</name>
<email>ksubrahm@redhat.com</email>
</author>
<published>2017-12-18T11:16:39+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=ba149bac92d169ae2256dbc75202dc9e5d06538e'/>
<id>ba149bac92d169ae2256dbc75202dc9e5d06538e</id>
<content type='text'>
Problem:
Setting the write_subvol value to read_subvol in the case of a metadata
transaction during pre-op (commit 19f9bcff4aada589d4321356c2670ed283f02c03)
might lead to the original problem of the arbiter becoming source.

Scenario:
1) All bricks are up and good
2) 2 writes w1 and w2 are in progress in parallel
3) ctx-&gt;read_subvol is good for all the subvolumes
4) w1 succeeds on brick0 and fails on brick1; the post-op is yet to be
   done on disk
5) read/lookup comes on the same file and refreshes read_subvols back
   to all good
6) a metadata transaction happens, which assigns ctx-&gt;write_subvol the
   value of ctx-&gt;read_subvol, which is all good
7) w2 succeeds on brick1 and fails on brick0, and this updates the
   bricks in reverse order, leading to the arbiter becoming source

Fix:
Instead of setting ctx-&gt;write_subvol to ctx-&gt;read_subvol in the
pre-op stage when there is a metadata transaction, check in the
function __afr_set_in_flight_sb_status() whether it is a data or metadata
transaction. Use the value of ctx-&gt;write_subvol if it is a data
transaction and the ctx-&gt;read_subvol value for other transactions.

With this patch we assign the value of ctx-&gt;write_subvol in
afr_transaction_perform_fop() with the on-disk value, instead of
assigning it in afr_changelog_pre_op() with the in-memory value.

Change-Id: Id2025a7e965f0578af35b1abaac793b019c43cc4
BUG: 1482064
Signed-off-by: karthik-us &lt;ksubrahm@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
Setting the write_subvol value to read_subvol in the case of a metadata
transaction during pre-op (commit 19f9bcff4aada589d4321356c2670ed283f02c03)
might lead to the original problem of the arbiter becoming source.

Scenario:
1) All bricks are up and good
2) 2 writes w1 and w2 are in progress in parallel
3) ctx-&gt;read_subvol is good for all the subvolumes
4) w1 succeeds on brick0 and fails on brick1; the post-op is yet to be
   done on disk
5) read/lookup comes on the same file and refreshes read_subvols back
   to all good
6) a metadata transaction happens, which assigns ctx-&gt;write_subvol the
   value of ctx-&gt;read_subvol, which is all good
7) w2 succeeds on brick1 and fails on brick0, and this updates the
   bricks in reverse order, leading to the arbiter becoming source

Fix:
Instead of setting ctx-&gt;write_subvol to ctx-&gt;read_subvol in the
pre-op stage when there is a metadata transaction, check in the
function __afr_set_in_flight_sb_status() whether it is a data or metadata
transaction. Use the value of ctx-&gt;write_subvol if it is a data
transaction and the ctx-&gt;read_subvol value for other transactions.

With this patch we assign the value of ctx-&gt;write_subvol in
afr_transaction_perform_fop() with the on-disk value, instead of
assigning it in afr_changelog_pre_op() with the in-memory value.

Change-Id: Id2025a7e965f0578af35b1abaac793b019c43cc4
BUG: 1482064
Signed-off-by: karthik-us &lt;ksubrahm@redhat.com&gt;
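
A sketch of the per-transaction-type choice described in the fix.
AFR_DATA_TRANSACTION is the real enum value from afr.h; the local and
inode-context variable names are assumptions:

    if (local-&gt;transaction.type == AFR_DATA_TRANSACTION)
            subvols = ctx-&gt;write_subvol;   /* set with the on-disk value in
                                              afr_transaction_perform_fop() */
    else
            subvols = ctx-&gt;read_subvol;    /* metadata and other transactions */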
</pre>
</div>
</content>
</entry>
<entry>
<title>rchecksum/fips: Replace MD5 usage to enable fips support</title>
<updated>2017-12-21T04:31:31+00:00</updated>
<author>
<name>Kotresh HR</name>
<email>khiremat@redhat.com</email>
</author>
<published>2017-12-19T12:21:07+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=1d32720335ffd8835c4a7b3164fe9aa9028f77a5'/>
<id>1d32720335ffd8835c4a7b3164fe9aa9028f77a5</id>
<content type='text'>
rchecksum uses MD5, which is not FIPS compliant. Hence,
use SHA256 instead.

Updates: #230
Change-Id: I7fad016fcc2a9900395d0da919cf5ba996ec5278
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
rchecksum uses MD5, which is not FIPS compliant. Hence,
use SHA256 instead.

Updates: #230
Change-Id: I7fad016fcc2a9900395d0da919cf5ba996ec5278
Signed-off-by: Kotresh HR &lt;khiremat@redhat.com&gt;
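
A minimal sketch of the replacement, assuming OpenSSL's one-shot SHA256()
helper; note the strong-checksum buffer grows from MD5's 16 bytes to
SHA256_DIGEST_LENGTH (32 bytes):

    unsigned char strong_checksum[SHA256_DIGEST_LENGTH] = {0};
    SHA256 ((const unsigned char *) buf, len, strong_checksum);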
</pre>
</div>
</content>
</entry>
<entry>
<title>afr: add checks for allowing lookups</title>
<updated>2017-11-18T00:38:47+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2017-08-16T12:31:17+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=bd44d59741bb8c0f5d7a62c5b1094179dd0ce8a4'/>
<id>bd44d59741bb8c0f5d7a62c5b1094179dd0ce8a4</id>
<content type='text'>
Problem:
In an arbiter volume, lookup was being served from one of the sink
bricks (source brick was down). shard uses the iatt values from lookup cbk
to calculate the size and block count, which in this case were incorrect
values. shard_local_t-&gt;last_block was thus initialised to -1, resulting
in an infinite while loop in shard_common_resolve_shards().

Fix:
Use client quorum logic to allow or fail the lookups from afr if there
are no readable subvolumes. So in replica-3 or arbiter vols, if there is
no good copy or if quorum is not met, fail lookup with ENOTCONN.

With this fix, we are also removing support for the quorum-reads xlator
option. So if quorum is not met, neither read nor write txns are allowed
and we fail the fop with ENOTCONN.

Change-Id: Ic65c00c24f77ece007328b421494eee62a505fa0
BUG: 1467250
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
In an arbiter volume, lookup was being served from one of the sink
bricks (source brick was down). shard uses the iatt values from lookup cbk
to calculate the size and block count, which in this case were incorrect
values. shard_local_t-&gt;last_block was thus initialised to -1, resulting
in an infinite while loop in shard_common_resolve_shards().

Fix:
Use client quorum logic to allow or fail the lookups from afr if there
are no readable subvolumes. So in replica-3 or arbiter vols, if there is
no good copy or if quorum is not met, fail lookup with ENOTCONN.

With this fix, we are also removing support for the quorum-reads xlator
option. So if quorum is not met, neither read nor write txns are allowed
and we fail the fop with ENOTCONN.

Change-Id: Ic65c00c24f77ece007328b421494eee62a505fa0
BUG: 1467250
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
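
A sketch of the check described in the fix; helper and variable names are
assumptions, and the real logic sits in afr's lookup callback path:

    if (read_subvol &lt; 0 || !afr_has_quorum (success_replies, this)) {
            op_ret   = -1;
            op_errno = ENOTCONN;   /* no readable copy, or quorum not met */
    }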
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/afr: Fix for arbiter becoming source</title>
<updated>2017-11-18T00:38:20+00:00</updated>
<author>
<name>karthik-us</name>
<email>ksubrahm@redhat.com</email>
</author>
<published>2017-08-16T11:56:48+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=19f9bcff4aada589d4321356c2670ed283f02c03'/>
<id>19f9bcff4aada589d4321356c2670ed283f02c03</id>
<content type='text'>
Problem:
When eager-lock is on and two writes happen in parallel on an FD,
we were observing the following behaviour:
- First write fails on one data brick
- Since the post-op has not yet happened, the inode refresh gets
  both the data bricks as readable and sets that in the inode context
- The in-flight split-brain check sees both the data bricks as readable
  and allows the second write
- Second write fails on the other data brick
- Now the post-op happens, marks both the data bricks as bad, and the
  arbiter becomes the source for healing

Fix:
Add one more variable called write_subvol in the inode context; it
holds the in-memory representation of the writable subvols. Inode
refresh will not update this value, and its lifetime is from pre-op
through unlock in the afr transaction. Initially the pre-op sets this
value to the same as read_subvol in the inode context, and then the
in-flight split-brain check uses this value instead of read_subvol.
After all the checks we update this value and set read_subvol to the
same, to avoid keeping an incorrect value in it.

Change-Id: I2ef6904524ab91af861d59690974bbc529ab1af3
BUG: 1482064
Signed-off-by: karthik-us &lt;ksubrahm@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
When eager-lock is on and two writes happen in parallel on an FD,
we were observing the following behaviour:
- First write fails on one data brick
- Since the post-op has not yet happened, the inode refresh gets
  both the data bricks as readable and sets that in the inode context
- The in-flight split-brain check sees both the data bricks as readable
  and allows the second write
- Second write fails on the other data brick
- Now the post-op happens, marks both the data bricks as bad, and the
  arbiter becomes the source for healing

Fix:
Add one more variable called write_subvol in the inode context; it
holds the in-memory representation of the writable subvols. Inode
refresh will not update this value, and its lifetime is from pre-op
through unlock in the afr transaction. Initially the pre-op sets this
value to the same as read_subvol in the inode context, and then the
in-flight split-brain check uses this value instead of read_subvol.
After all the checks we update this value and set read_subvol to the
same, to avoid keeping an incorrect value in it.

Change-Id: I2ef6904524ab91af861d59690974bbc529ab1af3
BUG: 1482064
Signed-off-by: karthik-us &lt;ksubrahm@redhat.com&gt;
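
A sketch of the lifetime described in the fix (field names are
assumptions): pre-op seeds the in-memory copy from read_subvol, the
in-flight split-brain check consults only that copy, and after the checks
it is published back:

    ctx-&gt;write_subvol = ctx-&gt;read_subvol;   /* pre-op: seed from current readables */
    /* ... in-flight split-brain check uses ctx-&gt;write_subvol, not read_subvol ... */
    ctx-&gt;read_subvol  = ctx-&gt;write_subvol;  /* after the checks: publish back      */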
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/afr: Fail open on split-brain</title>
<updated>2017-10-26T18:23:35+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2017-09-04T11:27:25+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=786343abca3474ff01aa1017210112d97cbc4843'/>
<id>786343abca3474ff01aa1017210112d97cbc4843</id>
<content type='text'>
Problem:
Append on a file in split-brain succeeds. Open is intercepted by open-behind;
when a write comes on the file, open-behind does open+write. The open succeeds
because afr doesn't fail it. Then the write succeeds because write-behind
intercepts it. Flush is also intercepted by write-behind, so the application
never gets to know that the write failed.

Fix:
Fail open on split-brain, so that when open-behind does open+write, the open
fails, which leads to write failure. The application will then know about this
failure.

Change-Id: I4bff1c747c97bb2925d6987f4ced5f1ce75dbc15
BUG: 1294051
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
Append on a file in split-brain succeeds. Open is intercepted by open-behind;
when a write comes on the file, open-behind does open+write. The open succeeds
because afr doesn't fail it. Then the write succeeds because write-behind
intercepts it. Flush is also intercepted by write-behind, so the application
never gets to know that the write failed.

Fix:
Fail open on split-brain, so that when open-behind does open+write, the open
fails, which leads to write failure. The application will then know about this
failure.

Change-Id: I4bff1c747c97bb2925d6987f4ced5f1ce75dbc15
BUG: 1294051
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>afr: check validity of afr_reply</title>
<updated>2017-08-31T09:40:58+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2017-08-18T12:35:54+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=d594900dbca92c356152be65fce16f77c402117c'/>
<id>d594900dbca92c356152be65fce16f77c402117c</id>
<content type='text'>
...in various self-heal code paths.

Originally found by Pranith in __afr_selfheal_name_impunge ()

Also change __afr_selfheal_assign_gfid() to send lookup only on those
bricks that don't have a gfid matching that of the source.

Change-Id: I70a2ccd750a2af92c5fc36e0eefb2b6125404b4a
BUG: 1482923
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
Reviewed-on: https://review.gluster.org/18065
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
...in various self-heal code paths.

Originally found by Pranith in __afr_selfheal_name_impunge ()

Also change __afr_selfheal_assign_gfid() to send lookup only on those
bricks that don't have a gfid matching that of the source.

Change-Id: I70a2ccd750a2af92c5fc36e0eefb2b6125404b4a
BUG: 1482923
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
Reviewed-on: https://review.gluster.org/18065
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
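
A sketch of the defensive pattern being added; afr_reply's 'valid' and
'op_ret' fields are defined in afr.h, while the loop variables are
assumptions:

    for (i = 0; i &lt; priv-&gt;child_count; i++) {
            if (!replies[i].valid || replies[i].op_ret != 0)
                    continue;   /* skip bricks that did not reply successfully */
            /* only now is it safe to use replies[i].poststat / replies[i].xdata */
    }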
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/afr: Implement quorum for lk fop</title>
<updated>2017-06-19T05:17:29+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2017-06-12T16:36:18+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=45ebcf7009f549f022c36b4d537eeac62ecfd020'/>
<id>45ebcf7009f549f022c36b4d537eeac62ecfd020</id>
<content type='text'>
Problem:
At the moment, with a replica 3 or arbiter setup, even when
lk succeeds on just one brick we give success to the application, which
is wrong.

Fix:
Consider quorum-number of successes as success when quorum is enabled.

BUG: 1461792
Change-Id: I5789e6eb5defb68f8a0eb9cd594d316f5cdebaea
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17524
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
At the moment, with a replica 3 or arbiter setup, even when
lk succeeds on just one brick we give success to the application, which
is wrong.

Fix:
Consider quorum-number of successes as success when quorum is enabled.

BUG: 1461792
Change-Id: I5789e6eb5defb68f8a0eb9cd594d316f5cdebaea
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17524
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
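
A sketch of the quorum accounting described in the fix; afr_has_quorum()
and afr_quorum_errno() are existing afr helpers, and the success bitmap
name is an assumption:

    if (priv-&gt;quorum_count &amp;&amp; !afr_has_quorum (lock_success, this)) {
            op_ret   = -1;
            op_errno = afr_quorum_errno (priv);   /* typically ENOTCONN */
    }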
</pre>
</div>
</content>
</entry>
</feed>
