glusterfs.git/xlators/cluster/dht/src/Makefile.am, branch release-4.0

feature/dht: Directory synchronization

2017-04-26T09:00:34+00:00

Design doc: https://review.gluster.org/16876

Directory creation is now synchronized with blocking inodelk of the
parent on the hashed subvolume followed by the entrylk on the hashed
subvolume between dht_mkdir, dht_rmdir, dht_rename_dir and lookup
selfheal mkdir.

To maintain internal consistency of directories across all subvols of
dht, we need locks. Specifically we are interested in:

 1. Consistency of layout of a directory. Only one writer should modify
    the layout at a time. A writer (layout setting during directory heal
    as part of lookup) shouldn't modify the layout while there are
    readers (all other fops like create, mkdir etc., which consume
    layout) and readers shouldn't read the layout while a writer is in
    progress. Readers can read the layout simultaneously. Writer takes
    a WRITE inodelk on the directory (whose layout is being modified)
    across ALL subvols. Reader takes a READ inodelk on the directory
    (whose layout is being read) on ANY subvol.

 2. Consistency of directory namespace across subvols. The path and
    associated gfid should be same on all subvols. A gfid should not be
    associated with more than one path on any subvol. All fops that can
    change directory names (mkdir, rmdir, renamedir, directory creation
    phase in lookup-heal) takes an entrylk on hashed subvol of the
    directory.

 NOTE1: In point 2 above, since dht takes entrylk on hashed subvol of a
        directory, the transaction itself is a consumer of layout on
        parent directory. So, the transaction is a reader of parent
        layout and does an inodelk on parent directory just like any
        other layout reader. So a mkdir (dir/subdir) would:

     > Acquire a READ inodelk on "dir" on any subvol.
     > Acquire an entrylk (dir, "subdir") on hashed subvol of "subdir".
     > creates directory on hashed subvol and possibly on non-hashed subvols.
     > UNLOCK (entrylk)
     > UNLOCK (inodelk)

 NOTE2: mkdir fop while setting the layout of the directory being created
        is considered as a reader, but NOT a writer. The reason is for
        a fop which can consume the layout of a directory to come either
        of the following conditions has to be true:

     > mkdir syscall from application has to complete. In this case no
       need of synchronization.
     > A lookup issued on the directory racing with mkdir has to complete.
       Since layout setting by a lookup is considered as a writer, only
       one of either mkdir or lookup will set the layout.

Code re-organization:
   All the lock related routines are moved to "dht-lock.c" file.
   New wrapper function is introduced to take blocking inodelk
   followed by entrylk 'dht_protect_namespace'

Updates #191
Change-Id: I01569094dfbe1852de6f586475be79c1ba965a31
Signed-off-by: Kotresh HR 
BUG: 1443373
Reviewed-on: https://review.gluster.org/15472
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Raghavendra G 
Smoke: Gluster Build System

build: out-of-tree builds generates files in the wrong directory

2016-09-18T16:34:37+00:00

And minor cleanup of a few of the Makefile.am files while we're
at it.

Rewrite the make rules to do what xdrgen does. Now we can get rid
of xdrgen.

Note 1. netbsd6's sed doesn't do -i. Why are we still running
smoke tests on netbsd6 and not netbsd7? We barely support netbsd7
as it is.

Note 2. Why is/was libgfxdr.so (.../rpc/xdr/src/...) linked with
libglusterfs? A cut-and-paste mistake? It has no references to
symbols in libglusterfs.

Note3. "/#ifndef\|#define\|#endif/" (note the '\'s) is a _basic_
regex that matches the same lines as the _extended_ regex
"/#(ifndef|define|endif)/". To match the extended regex sed needs to
be run with -r on Linux; with -E on *BSD. However NetBSD's and
FreeBSD's sed helpfully also provide -r for compatibility. Using a
basic regex avoids having to use a kludge in order to run sed with
the correct option on OS X.

Note 4. Not copying the bit of xdrgen that inserts copyright/license
boilerplate. AFAIK it's silly to pretend that machine generated
files like these can be copyrighted or need license boilerplate.
The XDR source files have their own copyright and license; and
their copyrights are bound to be more up to date than old
boilerplate inserted by a script. From what I've seen of other
Open Source projects -- e.g. gcc and its C parser files generated
by yacc and lex -- IIRC they don't bother to add copyright/license
boilerplate to their generated files.

It appears that it's a long-standing feature of make (SysV, BSD,
gnu) for out-of-tree builds to helpfully pretend that the source
files it can find in the VPATH "exist" as if they are in the $cwd.
rpcgen doesn't work well in this situation and generates files
with "bad" #include directives.

E.g. if you `rpcgen ../../../../$srcdir/rpc/xdr/src/glusterfs3-xdr.x`,
you get an #include directive in the generated .c file like this:

  ...
  #include "../../../../$srcdir/rpc/xdr/src/glusterfs3-xdr.h"
  ...

which (obviously) results in compile errors on out-of-tree build
because the (generated) header file doesn't exist at that location.
Compared to `rpcgen ./glusterfs3-xdr.x` where you get:

  ...
  #include "glusterfs3-xdr.h"
  ...

Which is what we need. We have to resort to some Stupid Make Tricks
like the addition of various .PHONY targets to work around the VPATH
"help".

Warning: When doing an in-tree build, -I$(top_builddir)/rpc/xdr/...
looks exactly like -I$(top_srcdir)/rpc/xdr/...  Don't be fooled though.
And don't delete the -I$(top_builddir)/rpc/xdr/... bits

Change-Id: Iba6ab96b2d0a17c5a7e9f92233993b318858b62e
BUG: 1330604
Signed-off-by: Kaleb S KEITHLEY 
Reviewed-on: http://review.gluster.org/14085
Tested-by: Niels de Vos 
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Niels de Vos

cluster/tier: add tiering events

2016-09-08T12:32:20+00:00

Add events for:
* tier attach and detach
* tier pause and resume
* tier rising and dropping hi and lo watermarks

Update eventskeygen.py with tiering events.
Update cli help with:
* attach: add optional force argument
* detach: make force available as non-optional argument on its own

Change-Id: I43990d3a8742151a4a7889bafa19cb572fe661bd
BUG: 1368336
Signed-off-by: Milind Changire 
Reviewed-on: http://review.gluster.org/15232
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Dan Lambright 
Tested-by: Dan Lambright

cluster/tier: readdirp to cold tier only

2015-11-23T12:05:55+00:00

It is possible a file would get migrated in the middle
of a readdir operation. If there are four subvolumes A,B,C,D,
and if readdir reads them in order and reaches subvol B,
then, if a file is moved from D to A, it will not be included
in the readdir output.

This phenonema has pre-existed in DHT migration but is more
apparent in tiering.

When a file is moved off the hashed subvolume a T file is created.
For tiering, we will make the cold subvolume the hashed subvolume.
This will ensure the creation of a T file. Readdir will not skip T
files in the tier translator.

Making the cold subvolume the hashed subvolume ensures the T
files created on promotions or creates will be less likely to
fill the volume.

Creates still put the data on the hot subvolume.

Change-Id: Ifde557d3d0e94a4570ca9f115adee3db2ee75407
BUG:  1281598
Signed-off-by: Dan Lambright 
Reviewed-on: http://review.gluster.org/12530
Tested-by: Gluster Build System 
Tested-by: NetBSD Build System 
Reviewed-by: N Balachandran 
Reviewed-by: Raghavendra G

build: export minimum symbols from xlators for correct resolution

2015-09-24T14:37:42+00:00

We've been lucky that we haven't had any symbol collisions until now.
Now we have a collision between the snapview-client's svc_lookup() and
libntirpc's svc_lookup() with nfs-ganesha's FSAL_GLUSTER and libgfapi.

As a short term solution all the snapview-client's FOP methods were
changed to static scope. See http://review.gluster.org/11805. This
works in snapview-client because all the FOP methods are defined in
a single source file. This solution doesn't work for other xlators
with FOP methods defined in multiple source files.

To address this we link with libtool's '-export-symbols $symbol-file'
(a wrapper around `ld --version-script ...` --- on linux anyway) and
only export the minimum required symbols from the xlator sharedlib.

N.B. the libtool man page says that the symbol file should be named
foo.sym, thus the rename of *.exports to *.sym. While foo.exports
worked, we will follow the documentation.

Signed-off-by: Kaleb S. KEITHLEY 
BUG: 1248669
Change-Id: I1de68b3e3be58ae690d8bfb2168bfc019983627c
Reviewed-on: http://review.gluster.org/11814
Tested-by: NetBSD Build System 
Tested-by: Gluster Build System 
Reviewed-by: soumya k 
Reviewed-by: Niels de Vos

core: use reference counting for mem_acct structures

2015-05-09T13:28:13+00:00

When freeing memory, our memory-accounting code expects to be able to
dereference from the (previously) allocated block to its owning
translator.  However, as we have already found once in option
validation and twice in logging, that translator might itself have
been freed and the dereference attempt causes on of our daemons to
crash with SIGSEGV.  This patch attempts to fix that as follows:

 * We no longer embed a struct mem_acct directly in a struct xlator,
   but instead allocate it separately.

 * Allocated memory blocks now contain a pointer to the mem_acct
   instead of the xlator.

 * The mem_acct structure contains a reference count, manipulated in
   both the normal and translator allocate/free code using atomic
   increments and decrements.

 * Because it's now a separate structure, we can defer freeing the
   mem_acct until its reference count reaches zero (either way).

 * Some unit tests were disabled, because they embedded their own
   copies of the implementation for what they were supposedly testing.
   Life's too short to spend time fixing tests that seem designed to
   impede progress by requiring a certain implementation as well as
   behavior.

Change-Id: Id929b11387927136f78626901729296b6c0d0fd7
BUG: 1211749
Signed-off-by: Jeff Darcy 
Reviewed-on: http://review.gluster.org/10417
Tested-by: Gluster Build System 
Reviewed-by: Krishnan Parthasarathi 
Reviewed-by: Niels de Vos 
Reviewed-by: Pranith Kumar Karampuri

cluster/dht: fix tier.c problems found prior to feature freeze

2015-04-06T09:35:37+00:00

This patch resolves tiering translator issues taken from the list
in bug 1203776. These issues have been selected to be fixed
first. The rest will be fixed in a subsequent patch (or are not a
problem).

3. Replace hardcoded #defines of promote/demote file names

6. Use loc_wipe() in migrate_using_query_file()

9. Only promote/demote files on the same node on which they reside.

14. Replace calloc with GF_CALLOC in tier.c and ensure freeing done
properly.

15. Handle if parse_query_str fails

22. Only load gfdb library on server side, remove SQL references
    from client.

Change-Id: I6563b11e58ab2e4c6b1ce44db755781ad6d930fb
BUG: 1203776
Signed-off-by: Dan Lambright 
Reviewed-on: http://review.gluster.org/9987
Tested-by: Gluster Build System 
Reviewed-by: N Balachandran 
Reviewed-by: Niels de Vos

Fixing build of xlator/cluster/dht/src/dht-rebalance.c when tiering is disabled

2015-03-25T13:23:10+00:00

1) Removed unnecessary include tier.h in dht-rebalance.c
2) tier xlator will only compile when tiering is enabled in configure.ac

Change-Id: Ia21aa9ff403506dc898a83236e9e2d382a0594da
BUG: 1204604
Signed-off-by: Joseph Fernandes 
Reviewed-on: http://review.gluster.org/9973
Reviewed-by: Niels de Vos 
Reviewed-by: Sachin Pandit 
Tested-by: Gluster Build System

build: pass SQLITE_CFLAGS while building dht files that include tier.h

2015-03-23T09:45:50+00:00

Building on FreeBSD has been broken by http://review.gluster.org/9724
which introduces the cluster/tiering xlator. The CFLAGS passed to the
compiler do not include the path where sqlite3.h can be found.

In fact, an attempt was made to pass the flags on, but a later variable
overwrite these again.

BUG: 1194753
Change-Id: I1c890fa9a0d82492726306fe6b03bd50ca985e31
Signed-off-by: Niels de Vos 
Reviewed-on: http://review.gluster.org/9964
Tested-by: Gluster Build System 
Reviewed-by: Sachin Pandit

cluster/dht: Add tier translator.

2015-03-21T16:50:29+00:00

The tier translator shares most of DHT's code. It differs in how
subvolumes are chosen for I/Os, and how file migration (cache promotion
and demotion) is managed. That different functionality is split to either
DHT or tier logic according to the "tier_methods" structure.

A cache promotion and demotion thread is created in a manner
similar to the rebalance daemon. The thread operates a timing
wheel which periodically checks for promotion and demotion candidates
(files). Candidates are queued and then migrated. Candidates must exist on
the same node as the daemon and meet other critera per caching policies.

This patch has two authors (Dan Lambright and Joseph Fernandes). Dan
did the DHT changes and Joe wrote the cache policies. The fix depends on
DHT readidr changes and the database library which have been submitted
separately.  Header files in libglusterfs/src/gfdb should be reviewed in
patch 9683.

For more background and design see the feature page [1].

[1]
http://www.gluster.org/community/documentation/index.php/Features/data-classification

Change-Id: Icc26c517ccecf5c42aef039f5b9c6f7afe83e46c
BUG: 1194753
Signed-off-by: Dan Lambright 
Reviewed-on: http://review.gluster.org/9724
Reviewed-by: Vijay Bellur 
Tested-by: Vijay Bellur