glusterfs.git/xlators/cluster/dht, branch v3.4.6

Cluster/DHT: Changing rename log severity

2014-10-20T14:56:16+00:00

Changing log level for a rename message from debug
to info to improve debuggability

Change-Id: I53031fcf97fffd62095692477330ecde0cf47dcd
BUG: 1139998
Signed-off-by: Nithya Balachandran 
Reviewed-on: http://review.gluster.org/8582
Reviewed-by: Vijay Bellur 
Tested-by: Gluster Build System 
Reviewed-on: http://review.gluster.org/8685
Reviewed-by: Kaleb KEITHLEY

cluster/dht: Rename should not fail post hardlink creation

2014-10-20T14:56:00+00:00

In the rename path, we wind the creation of the newname hardlink and
the linkto file on the dst hashed subvolume at the same time. If the
linkto creation fails but the link creation succeeds, we enter the
failure code and clean up the newly created newname hardlink.

In the interim, if another client looks up newname and finds it to be
a hardlink, FUSE on that client could send an unlink for oldname
instead of a rename. Combined with the cleanup above, this could end
up removing every copy of the file, and thereby losing data.

This fix separates these steps into two parts: creating the linkto
first and then the link file, so that no failure after the link file
is created cleans up the newname file. If the linkto creation fails,
the link is not attempted, so the namespace is not polluted with
newname.
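
The ordering change described above can be illustrated with a small
self-contained simulation. This is only a sketch: struct rename_sim and
both functions are hypothetical names made up for this example, not
GlusterFS code, and each step is reduced to a boolean outcome.

```c
#include <stdbool.h>

/* Hypothetical simulation state; none of these names exist in GlusterFS. */
struct rename_sim {
    bool linkto_ok;               /* does creating the linkto file succeed? */
    bool link_ok;                 /* does creating the newname hardlink succeed? */
    bool newname_visible;         /* was newname ever visible to other clients? */
    bool cleanup_removed_newname; /* did failure cleanup remove a visible newname? */
};

/* Old behaviour: both creations are issued together, so the hardlink can
 * become visible even though the linkto creation failed. */
static int rename_parallel(struct rename_sim *s)
{
    s->newname_visible = s->link_ok;
    if (!s->linkto_ok || !s->link_ok) {
        /* The failure path removes whatever was created, including newname. */
        s->cleanup_removed_newname = s->newname_visible;
        return -1;
    }
    return 0;
}

/* Fixed behaviour: linkto first, and the hardlink only if linkto succeeded. */
static int rename_sequential(struct rename_sim *s)
{
    if (!s->linkto_ok)
        return -1;                /* newname was never created */
    if (!s->link_ok)
        return -1;                /* still nothing visible to clean up */
    s->newname_visible = true;    /* no later failure removes newname */
    return 0;
}
```

With linkto_ok = false and link_ok = true, rename_parallel() reports that
a visible newname was removed, while rename_sequential() never exposes
newname at all.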

Change-Id: I61da8e906060da16a31ea1076eec2f01fd617f44
BUG: 1139998
Signed-off-by: Shyam 
Reviewed-on: http://review.gluster.org/8570
Tested-by: Gluster Build System 
Reviewed-by: Jeff Darcy 
Reviewed-by: Vijay Bellur 
Reviewed-on: http://review.gluster.org/8683
Reviewed-by: Kaleb KEITHLEY

cluster/dht: Treat linkto file rename failure as non-critical error

2014-10-20T14:55:49+00:00

It is a critical failure if and only if we fail to rename the cached
file. If only the rename of the linkto file failed, it is not a
critical failure, and we do not want to lose the hard link created for
the new name, as it could already have been read by other clients.

NOTE: If another client is attempting the same oldname -> newname
rename and finds that both names exist and are hard links to each
other, FUSE would send an unlink for oldname. If, during this window,
we treat the linkto failure as a critical error and unlink the newname
we created, we would effectively lose the file to the rename
operations.

The repercussion of treating this as a non-critical error is that
we could leave behind a stale linkto file and/or fail to create the
new linkto file. The second case would be rectified by a subsequent
lookup, and the first by a rebalance, as for all stale linkto
files.
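
The classification above can be written down as a small decision
function. This is a minimal sketch under stated assumptions: the enum
and function names are hypothetical and do not come from the DHT
sources.

```c
#include <stdbool.h>

/* Hypothetical result codes for this sketch only. */
enum rename_result {
    RENAME_OK,              /* both renames succeeded */
    RENAME_OK_STALE_LINKTO, /* cached file renamed, linkto rename failed */
    RENAME_FAILED           /* cached file rename failed: critical */
};

/* Only a failure to rename the cached file counts as critical. */
static enum rename_result classify_rename(bool cached_rename_ok,
                                          bool linkto_rename_ok)
{
    if (!cached_rename_ok)
        return RENAME_FAILED;
    if (!linkto_rename_ok)
        return RENAME_OK_STALE_LINKTO; /* left for lookup/rebalance to fix */
    return RENAME_OK;
}

/* The newname hardlink is removed only on a critical failure. */
static bool should_unlink_newname(enum rename_result r)
{
    return r == RENAME_FAILED;
}
```

A linkto rename failure alone yields RENAME_OK_STALE_LINKTO, so the
newname hardlink that other clients may already have seen is kept.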

Change-Id: Ia53ad8b43c3cf8f48ef5b43fd1fec4274e807556
BUG: 1139998
Signed-off-by: Shyam 
Reviewed-on: http://review.gluster.org/8563
Tested-by: Gluster Build System 
Reviewed-by: Jeff Darcy 
Reviewed-by: Vijay Bellur 
Reviewed-on: http://review.gluster.org/8682
Reviewed-by: Kaleb KEITHLEY

cluster/dht: synchronize rename and file-migration

2014-10-20T14:55:36+00:00

Change-Id: I4f243c946f76d440680b651235f925e3d0ebf0fd
Signed-off-by: Raghavendra G 
Reviewed-on: http://review.gluster.org/8523
Tested-by: Gluster Build System 
Reviewed-by: Vijay Bellur 
BUG: 1139998
Reviewed-on: http://review.gluster.org/8681
Reviewed-by: Kaleb KEITHLEY

cluster/dht: introduce locking api.

2014-10-20T14:55:11+00:00

Change-Id: I41389ba91951d3e63e617aa32cd0bee848261c72
BUG: 1139998
Signed-off-by: Raghavendra G 
Reviewed-on: http://review.gluster.org/8521
Reviewed-by: Pranith Kumar Karampuri 
Tested-by: Gluster Build System 
Reviewed-by: Vijay Bellur 
Reviewed-on: http://review.gluster.org/8679
Reviewed-by: Kaleb KEITHLEY

cluster/dht: Fix dht_access treating directory like files

2014-10-20T14:54:59+00:00

When the cluster topology changes due to add-brick, not all
subvolumes of DHT contain the directories until a rebalance
is completed. Until the rebalance is run, if a caller bypasses
lookup and calls access using saved/cached inode information
(as the NFS server does), dht_access misreads the error
(ESTALE/ENOENT) from the new subvolumes and incorrectly tries
to handle the inode as a file. This corrupts the directory's
in-memory state in DHT, which does not heal even after
a rebalance.

This commit fixes the problem in dht_access thereby preventing
DHT from misrepresenting a directory as a file in the case
presented above.
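
One plausible shape of such a check is sketched below. It is an
illustration only: the enums and next_action() are hypothetical, and the
actual patch may handle the directory case differently (for example in
how it picks the next subvolume).

```c
#include <errno.h>

/* Hypothetical types for this sketch; not taken from the DHT sources. */
enum inode_kind { KIND_FILE, KIND_DIR };

enum access_action {
    ACT_UNWIND,              /* return the result to the caller as is */
    ACT_TRY_OTHER_SUBVOL,    /* directory missing here: ask another subvolume */
    ACT_CHECK_FILE_MIGRATION /* file may have moved: run file-specific handling */
};

/* Decide what to do with the reply to an access call. */
static enum access_action next_action(enum inode_kind kind,
                                      int op_ret, int op_errno)
{
    if (op_ret == 0)
        return ACT_UNWIND;
    if (op_errno != ENOENT && op_errno != ESTALE)
        return ACT_UNWIND;
    /* The key point: never run the file path for a directory. */
    if (kind == KIND_DIR)
        return ACT_TRY_OTHER_SUBVOL;
    return ACT_CHECK_FILE_MIGRATION;
}
```

The inode type is checked before the ENOENT/ESTALE reply is interpreted,
so a directory that is simply absent on a newly added brick is never
treated as a migrated file.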

Change-Id: Idcdaa3837db71c8fe0a40ec0084a6c3dbe27e772
BUG: 1139997
Signed-off-by: Shyam 
Reviewed-on: http://review.gluster.org/8462
Tested-by: Gluster Build System 
Reviewed-by: Vijay Bellur 
Reviewed-on: http://review.gluster.org/8678
Reviewed-by: Kaleb KEITHLEY

cluster/dht: Prevent dht_access from going into a loop.

2014-10-20T14:54:49+00:00

If access fails with ENOTCONN, do not wind to the same subvol.
We wind to first-up-subvol if access fails with ENOTCONN.
In a few cases, if dht has only 1 subvolume and access fails with
ENOTCONN, we go into an infinite loop of winding to the same subvol.

The fix is to check whether we previously wound to the same subvol,
and to fail if first-up-subvol is the same.
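
The check can be sketched as follows. struct subvol and both helper
functions are hypothetical stand-ins written for this example; they are
not the DHT data structures.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical subvolume record for this sketch. */
struct subvol {
    const char *name;
    int up; /* non-zero if the subvolume is connected */
};

/* Return the first connected subvolume, or NULL if none is up. */
static struct subvol *first_up_subvol(struct subvol *subvols, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (subvols[i].up)
            return &subvols[i];
    return NULL;
}

/* Return the subvolume to retry on, or NULL if the call should fail. */
static struct subvol *pick_retry_subvol(struct subvol *subvols, size_t n,
                                        const struct subvol *prev,
                                        int op_errno)
{
    if (op_errno != ENOTCONN)
        return NULL;
    struct subvol *next = first_up_subvol(subvols, n);
    /* Retrying on the subvolume we just used would loop forever. */
    if (next == NULL || next == prev)
        return NULL;
    return next;
}
```

With a single subvolume, first_up_subvol() can only return the
subvolume that was just tried, so pick_retry_subvol() returns NULL and
the call fails instead of being wound again.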

Change-Id: Ib5d3ce7d33e8ea09147905a7df1ed280874fa549
BUG: 1139996
Signed-off-by: shishir gowda 
Reviewed-on: http://review.gluster.org/5319
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati 
Reviewed-on: http://review.gluster.org/8677
Reviewed-by: N Balachandran 
Reviewed-by: Kaleb KEITHLEY

dht: fix rename race

2014-10-20T14:54:38+00:00

Add a check that we created the linkto file before deleting it in
the rename cleanup function.
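
A minimal sketch of such a guard, assuming a per-operation flag that
records whether this rename created the linkto file. The struct and
field names are hypothetical and chosen only for this example.

```c
#include <stdbool.h>

/* Hypothetical per-rename state for this sketch. */
struct rename_local {
    bool linkto_created; /* set only when this rename created the linkto */
    int unlink_calls;    /* counts simulated unlinks of the linkto file */
};

/* Cleanup removes the linkto file only if this rename created it. */
static void rename_cleanup(struct rename_local *local)
{
    if (!local->linkto_created)
        return;          /* a linkto created by someone else is left alone */
    local->unlink_calls++;
    local->linkto_created = false;
}
```

A linkto file that already existed, or that another client created
concurrently, is therefore not removed by this rename's cleanup.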

Change-Id: I919cd7cb24f948ba4917eb9cf50d5169bb730a67
BUG: 1139988
Signed-off-by: Nithya Balachandran 
Reviewed-on: http://review.gluster.org/8338
Tested-by: Gluster Build System 
Reviewed-by: Raghavendra G 
Reviewed-by: Vijay Bellur 
Reviewed-on: http://review.gluster.org/8676
Reviewed-by: Kaleb KEITHLEY

cluster/dht: Fix races to avoid deletion of linkto file

2014-10-20T14:53:56+00:00

Explanation of Race between rebalance processes:
https://bugzilla.redhat.com/show_bug.cgi?id=1110694#c4

scenario-1:
===========

STATE 1:                          BRICK-1
only one brick                   Cached File
in the system

STATE 2:
Add brick-2                       BRICK-1                BRICK-2

STATE 3:                                       Lookup of File on brick-2
                                               by this node's rebalance
                                               will fail because hashed
                                               file is not created yet.
                                               So dht_lookup_everywhere is
                                               about to get called.

STATE 4:                         As part of lookup
                                 link file at brick-2
                                 will be created.

STATE 5:                         getxattr to check that
                                 cached file belongs to
                                 this node is done

STATE 6:

                                            dht_lookup_everywhere_cbk detects
                                            the link created by rebalance-1.
                                            It will unlink it.

STATE 7:                        getxattr on the link
                                file with the "pathinfo" key
                                will be called and will fail,
                                as the link file was deleted
                                by rebalance on node-2

Fix:
So in STATE 6 we should avoid deleting the link file. Every time
dht_lookup_everywhere gets called, a lookup is performed on all the nodes.
To avoid STATE 6, if a linkto file is found, it is not deleted until a valid
case is identified in dht_lookup_everywhere_done.

Case 1: if linkto file points to cached node, and cached file exists,
        unwind with success.

Case 2: if linkto does not point to current cached node, and cached file
        exists:
        a) Unlink stale link file
        b) Create new link file

Case 3: Only linkto file exists:
        Delete linkto file

Case 4: Only cached file
        Create link file (handled even without patch)

Case 5: Neither cached nor hashed file is present
        Return with ENOENT (handled even without patch)
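
The five cases above map onto a small decision function. This is a
sketch only: the enum and the function signature are hypothetical and
reduce the real callback state to three booleans.

```c
#include <stdbool.h>

/* Hypothetical action codes, one per case listed above. */
enum done_action {
    UNWIND_SUCCESS, /* Case 1 */
    REPLACE_LINKTO, /* Case 2: unlink stale link file, create new one */
    DELETE_LINKTO,  /* Case 3 */
    CREATE_LINKTO,  /* Case 4 */
    UNWIND_ENOENT   /* Case 5 */
};

/* Decide what to do once replies from all subvolumes are collected. */
static enum done_action lookup_everywhere_done(bool cached_found,
                                               bool linkto_found,
                                               bool linkto_points_to_cached)
{
    if (cached_found && linkto_found)
        return linkto_points_to_cached ? UNWIND_SUCCESS : REPLACE_LINKTO;
    if (linkto_found)
        return DELETE_LINKTO;
    if (cached_found)
        return CREATE_LINKTO;
    return UNWIND_ENOENT;
}
```

The decision is taken only after every subvolume has replied, which is
what prevents the early unlink in STATE 6.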

Reviewed-on: http://review.gluster.org/8231

******************************************************************************

scenario-2:
===========
cluster/dht: Modified logic of linkto file deletion on non-hashed

Currently, whenever dht_lookup_everywhere gets called, if
dht_lookup_everywhere_cbk finds a linkto file on a non-hashed
subvolume, the file is unlinked. But there are cases when this file
is under migration. Under such conditions, we should avoid deleting
the file.

When some other rebalance process changes the layout of the parent
such that dst_file (w.r.t. migration) falls on a non-hashed node,
the lookup could have found it as a linkto file, but just
before the unlink the file is under migration or already migrated.
In such cases the unlink can be avoided.

Race:
-------
If we have two bricks (brick-1 and brick-2) with initial file "a"
under BaseDir which is hashed as well as cached on (brick-1).

Assume hashing "a" gives 44.

                              Brick-1              Brick-2

Initial Setup:               BaseDir/a             BaseDir
                             [1-50]                [51-100]

Now add new-brick Brick-3.

1. Rebalance-1 on node Node-1 (Brick-1 node) will reset
the BaseDir Layout.

2. After that it will perform
a)  Create linkto file on  new-hashed (brick-2)
b)  Perform file migration.

1.Rebalance-1 Fixes the base-layout:
                 Brick-1             Brick-2           Brick-3
                 ---------         ----------         ------------
                 BaseDir/a            BaseDir           BaseDir
                  [1-33]              [34-66]           [67-100]

2. Only a) is     BaseDir/a          BaseDir/a(linkto)   BaseDir
   performed                         Create linktofile

Now rebalance-2 on node-2 jumps in and performs
steps 1 and 2-a.

After (rebal-2, step-1), it changes the layout of the BaseDir.
                    BaseDir/a     BaseDir/a(link)    BaseDir
                    [67-100]           [1-33]        [34-66]

For (rebal-2, step-2), it will perform a lookup on Brick-3, since with
the new layout 44 falls on brick-3. But the lookup will fail,
so dht_lookup_everywhere gets called.
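
The hash placement used in this walkthrough can be checked with a short
range lookup. The struct and function below are hypothetical helpers
written for this example; the ranges are the ones given in the layouts
above.

```c
#include <stddef.h>

/* Hypothetical inclusive hash range owned by one brick. */
struct range {
    int start;
    int stop;
};

/* Return the 1-based index of the brick whose range contains hash,
 * or -1 if no range matches. layout[i] belongs to brick i+1. */
static int hashed_brick(const struct range *layout, size_t n, int hash)
{
    for (size_t i = 0; i < n; i++)
        if (hash >= layout[i].start && hash <= layout[i].stop)
            return (int)i + 1;
    return -1;
}
```

For hash 44, the initial layout {[1-50], [51-100]} gives brick-1, the
layout after rebalance-1 {[1-33], [34-66], [67-100]} gives brick-2, and
the layout after rebalance-2 {[67-100], [1-33], [34-66]} gives brick-3.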

NOTE: On brick-2 by rebalance-1, a linkto file was created.

Currently that linkto file gets deleted by the rebalance-2 lookup, as it
is considered a stale linkto file. But with the patch, if rebalance of
the file is already in progress or is over, the linkto file will not be
unlinked. If rebalance is in progress, the fd will be open, and if
rebalance is over, the linkto xattr won't be set.

Reviewed-on: http://review.gluster.org/8345

*******************************************************************************

scenario-3:
===========

cluster/dht: Added keys in dht_lookup_everywhere_done

Case where both the cached (C1) and hashed files are found,
but the hashed file does not point to the above cached node (C1): then
do not unlink if either an fd is open on the hashed file or
the linkto xattr is not found.
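
The rule can be expressed as a single predicate. This is a sketch with
hypothetical names; the three inputs stand for information the lookup
replies are assumed to carry.

```c
#include <stdbool.h>

/* Decide whether the file found on the hashed subvolume may be unlinked
 * as a stale linkto file. */
static bool may_unlink_hashed(bool points_to_cached, bool fd_open,
                              bool has_linkto_xattr)
{
    if (points_to_cached)
        return false; /* valid linkto file, nothing to clean up */
    if (fd_open)
        return false; /* migration in progress */
    if (!has_linkto_xattr)
        return false; /* migration finished: this is now a data file */
    return true;      /* stale linkto file */
}
```

Only a file that still carries the linkto xattr, has no open fd, and
points somewhere other than C1 is treated as stale.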

Reviewed-on: http://review.gluster.org/8429
BUG: 1139995
Signed-off-by: Venkatesh Somyajulu 
Change-Id: I86d0a21d4c0501c45d837101ced4f96d6fedc5b9
Tested-by: Gluster Build System 
Reviewed-by: susant palai 
Reviewed-by: Raghavendra G 
Reviewed-by: Vijay Bellur 
Reviewed-on: http://review.gluster.org/8674
Reviewed-by: Kaleb KEITHLEY

DHT/Create : Failing to identify a linkto file in lookup_everywhere_cbk path

2014-10-20T14:53:33+00:00

In case a file is not found in its cached subvol, we proceed with
dht_lookup_everywhere. But as we don't add the linkto xattr to the
dictionary, we fail to identify any linkto file encountered. The
implication is that we end up treating the linkto file as a regular
file and proceed with the fop.
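
The effect of the missing key can be shown with a toy model in which a
reply reveals the linkto xattr only if the request asked for it.
Everything here is hypothetical: LINKTO_KEY is a placeholder string, and
the types do not correspond to the real dictionary API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Placeholder xattr key for this sketch; not the real key name. */
#define LINKTO_KEY "example.linkto"

/* Hypothetical on-disk entry: true if it is really a linkto file. */
struct entry {
    bool is_linkto;
};

enum entry_kind { KIND_REGULAR, KIND_LINKTO };

/* Classify an entry given the xattr keys the lookup asked for. */
static enum entry_kind classify(const struct entry *e,
                                const char *const *requested, size_t n)
{
    bool asked = false;
    for (size_t i = 0; i < n; i++)
        if (strcmp(requested[i], LINKTO_KEY) == 0)
            asked = true;
    /* Without the key in the request, the xattr is never returned,
     * so a linkto file is indistinguishable from a regular file. */
    return (asked && e->is_linkto) ? KIND_LINKTO : KIND_REGULAR;
}
```

In this model the same linkto entry is classified as KIND_REGULAR when
the key is not requested and as KIND_LINKTO when it is.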

Change-Id: Iab02dc60e84bb1aeab49182f680c0631c33947e2
BUG: 1139992
Signed-off-by: Susant Palai 
Reviewed-on: http://review.gluster.org/8277
Reviewed-by: Vijay Bellur 
Tested-by: Gluster Build System 
Reviewed-on: http://review.gluster.org/8673
Reviewed-by: N Balachandran 
Reviewed-by: Kaleb KEITHLEY