glusterfs.git/xlators/cluster/dht/src/dht-messages.h, branch v3.12dev

feature/dht: Directory synchronization

2017-04-26T09:00:34+00:00

Design doc: https://review.gluster.org/16876

Directory creation is now synchronized with blocking inodelk of the
parent on the hashed subvolume followed by the entrylk on the hashed
subvolume between dht_mkdir, dht_rmdir, dht_rename_dir and lookup
selfheal mkdir.

To maintain internal consistency of directories across all subvols of
dht, we need locks. Specifically we are interested in:

 1. Consistency of layout of a directory. Only one writer should modify
    the layout at a time. A writer (layout setting during directory heal
    as part of lookup) shouldn't modify the layout while there are
    readers (all other fops like create, mkdir etc., which consume
    layout) and readers shouldn't read the layout while a writer is in
    progress. Readers can read the layout simultaneously. Writer takes
    a WRITE inodelk on the directory (whose layout is being modified)
    across ALL subvols. Reader takes a READ inodelk on the directory
    (whose layout is being read) on ANY subvol.

 2. Consistency of directory namespace across subvols. The path and
    associated gfid should be same on all subvols. A gfid should not be
    associated with more than one path on any subvol. All fops that can
    change directory names (mkdir, rmdir, renamedir, directory creation
    phase in lookup-heal) take an entrylk on hashed subvol of the
    directory.
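
The reader/writer rule in point 1 can be summarised as a compatibility
check. This is only a minimal sketch: the enum and function names are
hypothetical and are not taken from the dht sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock modes for the layout inodelk (illustration only). */
enum sketch_lk { SKETCH_LK_READ, SKETCH_LK_WRITE };

/*
 * Two layout locks on the same directory can be held together only when
 * both are READ locks. A WRITE lock excludes readers and other writers.
 */
static bool
sketch_lk_compatible(enum sketch_lk held, enum sketch_lk requested)
{
    return held == SKETCH_LK_READ && requested == SKETCH_LK_READ;
}
```

Readers such as create or mkdir therefore never block each other, while a
layout heal has to wait for all of them to release their locks.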

 NOTE1: In point 2 above, since dht takes entrylk on hashed subvol of a
        directory, the transaction itself is a consumer of layout on
        parent directory. So, the transaction is a reader of parent
        layout and does an inodelk on parent directory just like any
        other layout reader. So a mkdir (dir/subdir) would:

     > Acquire a READ inodelk on "dir" on any subvol.
     > Acquire an entrylk (dir, "subdir") on hashed subvol of "subdir".
     > Create the directory on hashed subvol and possibly on non-hashed subvols.
     > UNLOCK (entrylk)
     > UNLOCK (inodelk)
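
The ordering of the steps above can be sketched as follows. The step
names, the trace structure and sketch_mkdir are hypothetical stand-ins
that only record the order of operations; they do not call into dht.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical step codes; none of these names come from the dht sources. */
enum sketch_step {
    STEP_INODELK_READ,   /* READ inodelk on "dir" on any subvol */
    STEP_ENTRYLK,        /* entrylk (dir, "subdir") on the hashed subvol */
    STEP_MKDIR,          /* create the directory */
    STEP_UNLOCK_ENTRYLK,
    STEP_UNLOCK_INODELK
};

struct sketch_trace {
    enum sketch_step steps[8];
    size_t n;
};

static void
sketch_record(struct sketch_trace *t, enum sketch_step s)
{
    assert(t->n < sizeof(t->steps) / sizeof(t->steps[0]));
    t->steps[t->n++] = s;
}

/*
 * Returns 0 on success and -1 when the simulated mkdir fails. Locks are
 * released in the reverse order of acquisition on both paths.
 */
static int
sketch_mkdir(struct sketch_trace *t, int mkdir_fails)
{
    int ret;

    sketch_record(t, STEP_INODELK_READ);
    sketch_record(t, STEP_ENTRYLK);
    sketch_record(t, STEP_MKDIR);
    ret = mkdir_fails ? -1 : 0;
    sketch_record(t, STEP_UNLOCK_ENTRYLK);
    sketch_record(t, STEP_UNLOCK_INODELK);
    return ret;
}
```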

 NOTE2: mkdir fop while setting the layout of the directory being created
        is considered as a reader, but NOT a writer. The reason is that
        before any fop that consumes the layout of the directory can
        arrive, one of the following conditions has to be true:

     > mkdir syscall from application has to complete. In this case no
       synchronization is needed.
     > A lookup issued on the directory racing with mkdir has to complete.
       Since layout setting by a lookup is considered as a writer, only
       one of either mkdir or lookup will set the layout.

Code re-organization:
   All the lock related routines are moved to the "dht-lock.c" file.
   A new wrapper function, 'dht_protect_namespace', is introduced to
   take a blocking inodelk followed by an entrylk.

Updates #191
Change-Id: I01569094dfbe1852de6f586475be79c1ba965a31
Signed-off-by: Kotresh HR 
BUG: 1443373
Reviewed-on: https://review.gluster.org/15472
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Raghavendra G 
Smoke: Gluster Build System

dht/cluster: add logs to fix-layout code path

2017-01-20T10:49:57+00:00

Currently there are no helpful logs in the fix-layout code path. Add
logs to help debug fix-layout failures.

BUG: 1414782
Change-Id: I61c29ceedcaa2e235fa7be99866709d6ca6de3ae
Signed-off-by: Susant Palai 
Reviewed-on: http://review.gluster.org/16040
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Raghavendra G

dht/rebalance: reverify lookup failures

2016-12-22T14:28:03+00:00

race: readdirp has read one entry and is doing a lookup on
that entry, but the user might have renamed/removed that entry just
after readdirp but before lookup.

Since remove-brick is a costly operation, we will ignore any
ENOENT/ESTALE failures and move on.
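
A minimal sketch of that decision, assuming a hypothetical helper name
(the actual rebalance code is not shown in this message):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: returns true when a failed lookup on an entry
 * returned by readdirp can be skipped, because the entry was most likely
 * renamed or removed between readdirp and lookup.
 */
static bool
sketch_lookup_failure_is_ignorable(int op_errno)
{
    return op_errno == ENOENT || op_errno == ESTALE;
}
```

Any other errno would still be treated as a real failure by the caller.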

Change-Id: I62c7fa93c0b9b7e764065ad1574b97acf51b5996
BUG: 1408115
Signed-off-by: Susant Palai 
Reviewed-on: http://review.gluster.org/15846
Reviewed-by: Raghavendra G 
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
NetBSD-regression: NetBSD Build System

quotad: fix potential buffer overflows

2016-08-25T12:18:09+00:00

This converts sprintf to gf_asprintf in the following components:
* quotad.c
* dht
* afr
* protocol/client
* rpc/rpc-lib
* rpc/rpc-transport
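
To illustrate why an allocating formatter avoids the overflow, here is a
portable sketch built on vsnprintf. sketch_asprintf is a hypothetical
stand-in written for this example; it is not the gf_asprintf
implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Formats into a freshly allocated buffer sized from the formatted
 * length, so the output can never overrun a fixed-size array.
 * Returns the string length, or -1 on error. The caller frees *out.
 */
static int
sketch_asprintf(char **out, const char *fmt, ...)
{
    va_list ap, ap2;
    int len;
    char *buf;

    assert(out != NULL && fmt != NULL);
    *out = NULL;

    va_start(ap, fmt);
    va_copy(ap2, ap);
    len = vsnprintf(NULL, 0, fmt, ap);   /* first pass: measure only */
    va_end(ap);
    if (len < 0) {
        va_end(ap2);
        return -1;
    }

    buf = malloc((size_t)len + 1);
    if (buf == NULL) {
        va_end(ap2);
        return -1;
    }

    vsnprintf(buf, (size_t)len + 1, fmt, ap2);   /* second pass: write */
    va_end(ap2);
    *out = buf;
    return len;
}
```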

Change-Id: If8a267bab3d91003bdef3a92664077a0136745ee
BUG: 1332073
Signed-off-by: Raghavendra G 
Reviewed-on: http://review.gluster.org/14102
Tested-by: Manikandan Selvaganesh 
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Manikandan Selvaganesh

dht:remember locked subvol and send unlock to the same

2016-05-06T08:54:29+00:00

During locking we send the lock request to the cached subvol,
and normally we send the unlock to the cached subvol as well.
But with a parallel fresh lookup on a directory, there
is a race window where the cached subvol can change
and the unlock can go to a different subvol from the one
on which we took the lock.

This will result in a stale lock held on one of the
subvols.

So we will store the details of the subvol on which we took the
lock and will unlock on the same subvol.
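
The idea can be sketched as below. All structure and function names are
hypothetical; the point is only that unlock uses the subvol recorded at
lock time rather than the current cached subvol.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not the dht structures. */
struct sketch_subvol {
    const char *name;
    int lock_count;
};

struct sketch_inode {
    struct sketch_subvol *cached;   /* may change under a fresh lookup */
};

struct sketch_lock {
    struct sketch_subvol *locked_on;   /* remembered at lock time */
};

static void
sketch_lock_take(struct sketch_inode *ino, struct sketch_lock *lk)
{
    assert(ino->cached != NULL);
    lk->locked_on = ino->cached;
    lk->locked_on->lock_count++;
}

/* Unlock goes to the remembered subvol, never to ino->cached. */
static void
sketch_lock_release(struct sketch_lock *lk)
{
    assert(lk->locked_on != NULL);
    lk->locked_on->lock_count--;
    lk->locked_on = NULL;
}
```

If the cached subvol changes between the two calls, the lock count on the
original subvol still returns to zero and no stale lock is left behind.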

Change-Id: I47df99491671b10624eb37d1d17e40bacf0b15eb
BUG: 1311002
Signed-off-by: Mohammed Rafi KC 
Reviewed-on: http://review.gluster.org/13492
Reviewed-by: N Balachandran 
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Raghavendra G 
CentOS-regression: Gluster Build System

dht/rebalance: add lock migration fop to dht_migrate_file

2016-05-02T01:05:14+00:00

Change-Id: Id0e7400c8ae950c90d42a3ddf8b558a14959a1f8
BUG: 1326085
Signed-off-by: Susant Palai 
Reviewed-on: http://review.gluster.org/14074
Smoke: Gluster Build System 
Reviewed-by: Niels de Vos 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System

cluster/distribute: detect stale layouts in entry fops

2016-04-22T17:28:54+00:00

dht_mkdir ()
{
      first-hashed-subvol = hashed-subvol for "bname" in in-memory
                            layout of "parent";
      inodelk (SETLKW, parent, "LAYOUT_HEAL_DOMAIN", "can be any
               subvol, but we choose first-hashed-subvol randomly");
      {
begin:
            hashed-subvol = hashed-subvol for "bname" in in-memory
                            layout of "parent";
            hash-range = extract hash-range from layout of "parent";

            ret = mkdir (parent/bname, hashed-subvol, hash-range);
            if (ret == "hash-value doesn't fall into layout stored on
                       the brick (this error is returned by posix-mkdir)")
            {
                refresh_parent_layout ();
                goto begin;
            }

      }
      inodelk (UNLCK, parent, "LAYOUT_HEAL_DOMAIN",
               "first-hashed-subvol");

      proceed with other parts of dht_mkdir;
}

posix_mkdir (parent/bname, client-hash-range)
{

       disk-hash-range = getxattr (parent, "dht-layout-key");
       if (disk-hash-range != client-hash-range) {
              fail-with-error ("hash-value doesn't fall into layout
                                stored on the brick");
              return 0;
       }

       continue-with-posix-mkdir;
}

Similar changes need to be done for dentry operations like create,
symlink, link, unlink, rmdir, rename. These will be addressed in
subsequent patches. This patch addresses only mkdir codepath.

This change breaks stripe tests, as on some striped subvols dht layout
xattrs are not set for some reason. This results in failure of
mkdir. Since striped volumes are always created with dht, some tests
associated with stripe also fail. So, I am making the following test
changes (since stripe is out of maintenance):
* modify ./tests/basic/rpc-coverage.t to not use striped volumes
* mark all (2) tests in tests/bugs/stripe/ as bad tests

Change-Id: Idd1ae879f24a48303dc743c1bb4d91f89a629e25
BUG: 1323040
Signed-off-by: Raghavendra G 
Reviewed-on: http://review.gluster.org/13885
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: N Balachandran

Tier: Avoiding stale entries from causing demotion to stop

2016-03-14T06:05:42+00:00

When the parent GFID is a stale entry, the lookup on this parent
fails and this in turn fails the demotion process.

This patch makes the demotion skip the stale entry error.

Situation in which pargfid becomes stale:
Consider a folder extracted from a tar file. Once the tar file is
untarred, the files from it start to demote. If the folder is deleted
while the demotion is in progress, the files under it that are
undergoing demotion do a lookup on the deleted parent, which is now
a stale entry. The stale entry fails the lookup, and this in turn
fails the demotion of the other files (not from the tar file) that
are supposed to be demoted.

Change-Id: I3d47c32c4077526d477a25912b0135bab98b23fc
BUG: 1311178
Signed-off-by: hari gowtham 
Reviewed-on: http://review.gluster.org/13501
Tested-by: hari gowtham 
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Dan Lambright 
NetBSD-regression: NetBSD Build System

cluster/dht : Ftruncate on migrating file fails with EINVAL

2015-12-22T19:13:00+00:00

What:
If dht_open is called on a migrating file after the inode_ctx is set,
subsequent FOPs on that fd do not open the fd on the dst subvol.
This is seen when the open-ftruncate-close sequence is repeatedly
called on a migrating file.
A second call to the sequence described above causes dht_truncate_cbk
to call dht_truncate2 as the dht_inode_ctx was already set by the first
call. As dht_rebalance_in_progress_check is not called, the fd is not
opened on the dst subvol.
On a distributed-replicate volume, this causes AFR to
open the fd using afr_fix_open, but with the wrong flags, causing
posix_ftruncate to fail with EINVAL.
The fix: We require fd specific information to make a decision while
handling migrating files.
Set the fd_ctx to indicate the fd has been opened on the dst subvol
and check if it has been set while processing Phase1/Phase2 checks
in the FOP callback functions.
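
A minimal sketch of the per-fd flag, with hypothetical names; the real
fd_ctx handling in dht is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-fd state; stands in for what is kept in the fd_ctx. */
struct sketch_fd {
    bool opened_on_dst;   /* set once the fd is open on the dst subvol */
    int dst_open_calls;   /* counts simulated opens on the dst subvol */
};

/*
 * Called from a fop callback when the file is found to be migrating.
 * The decision is made per fd, not per inode: every fd that has not yet
 * been opened on the dst subvol gets opened there exactly once.
 */
static void
sketch_ensure_open_on_dst(struct sketch_fd *fd)
{
    assert(fd != NULL);
    if (!fd->opened_on_dst) {
        fd->dst_open_calls++;
        fd->opened_on_dst = true;
    }
}
```

With an inode-level flag alone, the second fd in an open-ftruncate-close
loop would be skipped; with the per-fd flag each new fd is handled.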

Change-Id: I43cdcd8017b4a11e18afdd210469de7cd9a5ef14
BUG: 1284823
Signed-off-by: N Balachandran 
Reviewed-on: http://review.gluster.org/12985
Reviewed-by: Raghavendra G 
Tested-by: Gluster Build System 
Reviewed-by: Dan Lambright 
Tested-by: Dan Lambright

cluster/dht/rebalance: rebalance failure handling

2015-10-29T08:30:10+00:00

In its current state, rebalance aborts on basically any failure,
such as fix-layout of a directory, readdirp, opendir etc. Unless it is
a remove-brick process, we can ignore these failures.

Major impact:  Any failure in the gf_defrag_process_dir means there
are files left unmigrated in the directory.

Fix-layout (setxattr) failure will impact its child subtree, i.e.
the child subtree will not be rebalanced.

Settle-hash (commit-hash) failure will trigger lookup_everywhere for
immediate children until the next commit-hash.

Note: Remove-brick operation is still sensitive to any kind of failure.
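
The policy can be sketched as a loop over per-directory results. The
command enum, the summary structure and the function name are
hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical command kinds; not the defrag command enum from dht. */
enum sketch_cmd { SKETCH_REBALANCE, SKETCH_REMOVE_BRICK };

struct sketch_summary {
    size_t processed;          /* directories handled without error */
    size_t skipped_failures;   /* failures ignored by plain rebalance */
    bool aborted;              /* true when remove-brick hit a failure */
};

/*
 * results[i] is 0 when directory i was processed and non-zero on
 * failure. Plain rebalance counts the failure and moves on, while
 * remove-brick stops at the first failure.
 */
static struct sketch_summary
sketch_process_dirs(enum sketch_cmd cmd, const int *results, size_t count)
{
    struct sketch_summary s = {0, 0, false};
    size_t i;

    assert(results != NULL || count == 0);
    for (i = 0; i < count; i++) {
        if (results[i] == 0) {
            s.processed++;
            continue;
        }
        if (cmd == SKETCH_REMOVE_BRICK) {
            s.aborted = true;
            break;
        }
        s.skipped_failures++;
    }
    return s;
}
```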

Change-Id: I08ab71909bc832f03cc1517172525376f7aed14a
BUG: 1257076
Signed-off-by: Susant Palai 
Reviewed-on: http://review.gluster.org/12013
Tested-by: NetBSD Build System 
Reviewed-by: Raghavendra G