Diffstat (limited to 'doc/hacker-guide')
-rw-r--r-- doc/hacker-guide/en-US/markdown/adding-fops.md 18
-rw-r--r-- doc/hacker-guide/en-US/markdown/afr.md 191
-rw-r--r-- doc/hacker-guide/en-US/markdown/coding-standard.md 402
-rw-r--r-- doc/hacker-guide/en-US/markdown/inode.md 226
-rw-r--r-- doc/hacker-guide/en-US/markdown/posix.md 59
-rw-r--r-- doc/hacker-guide/en-US/markdown/translator-development.md 666
-rw-r--r-- doc/hacker-guide/en-US/markdown/unittest.md 228
-rw-r--r-- doc/hacker-guide/en-US/markdown/write-behind.md 56
8 files changed, 0 insertions, 1846 deletions
diff --git a/doc/hacker-guide/en-US/markdown/adding-fops.md b/doc/hacker-guide/en-US/markdown/adding-fops.md
deleted file mode 100644
index 3f72ed3e23a..00000000000
--- a/doc/hacker-guide/en-US/markdown/adding-fops.md
+++ /dev/null
@@ -1,18 +0,0 @@
-Adding a new FOP
-================
-
-Steps to be followed when adding a new FOP to GlusterFS:
-
-1. Edit `glusterfs.h` and add a `GF_FOP_*` constant.
-2. Edit `xlator.[ch]` and:
-    * add the new prototype for the fop and its callback.
-    * edit the `xlator_fops` structure.
-3. Edit `xlator.c` and add the fop to `fill_defaults`.
-4. Edit `protocol.h` and add the struct necessary for the new FOP.
-5. Edit `defaults.[ch]` and provide a default implementation.
-6. Edit `call-stub.[ch]` and provide a stub implementation.
-7. Edit `common-utils.c` and add the fop to `gf_global_variable_init()`.
-8. Edit client-protocol and add your FOP.
-9. Edit server-protocol and add your FOP.
-10. Implement your FOP in any translator for which the default implementation
-    is not sufficient.
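Steps 2, 3 and 5 of the removed `adding-fops.md` come down to adding a slot to a table of function pointers and making sure the slot falls back to a default when a translator does not implement it. The following is a minimal standalone sketch of that pattern only. Every name in it (`toy_fops`, `toy_fill_defaults`, `frobnicate`) is hypothetical, and it is not the GlusterFS code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fop signature; real fops take a frame, an xlator, etc. */
typedef int (*toy_fop_t) (int arg);

/* Toy stand-in for a fop dispatch table. */
struct toy_fops {
        toy_fop_t stat;
        toy_fop_t frobnicate;   /* the newly added (hypothetical) fop */
};

/* Default implementations simply pass the argument through. */
static int
toy_default_stat (int arg)
{
        return arg;
}

static int
toy_default_frobnicate (int arg)
{
        return arg;
}

/* A translator-specific override, used in the usage note below. */
int
toy_custom_stat (int arg)
{
        return arg + 1;
}

/* Fill every slot the translator left empty with its default. */
void
toy_fill_defaults (struct toy_fops *fops)
{
        assert (fops != NULL);

        if (fops->stat == NULL) {
                fops->stat = toy_default_stat;
        }

        if (fops->frobnicate == NULL) {
                fops->frobnicate = toy_default_frobnicate;
        }
}
```

A translator that sets only `.stat = toy_custom_stat` keeps its own `stat` after `toy_fill_defaults` and gets `toy_default_frobnicate` for the new slot. Forgetting the equivalent of the second `if` would leave a NULL pointer in the table, which is presumably why the checklist names the defaults step explicitly.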
diff --git a/doc/hacker-guide/en-US/markdown/afr.md b/doc/hacker-guide/en-US/markdown/afr.md
deleted file mode 100644
index 566573a4e26..00000000000
--- a/doc/hacker-guide/en-US/markdown/afr.md
+++ /dev/null
@@ -1,191 +0,0 @@
-cluster/afr translator
-======================
-
-Locking
--------
-
-Before understanding replicate, one must understand two internal FOPs:
-
-### `GF_FILE_LK`
-
-This is exactly like `fcntl(2)` locking, except the locks are in a
-separate domain from locks held by applications.
-
-### `GF_DIR_LK (loc_t *loc, char *basename)`
-
-This allows one to lock a name under a directory. For example,
-to lock /mnt/glusterfs/foo, one would use the call:
-
-```
-GF_DIR_LK ({loc_t for "/mnt/glusterfs"}, "foo")
-```
-
-If one wishes to lock *all* the names under a particular directory,
-supply the basename argument as `NULL`.
-
-The locks can either be read locks or write locks; consult the
-function prototype for more details.
-
-Both these operations are implemented by the features/locks (earlier
-known as posix-locks) translator.
-
-Basic design
-------------
-
-All FOPs can be classified into four major groups:
-
-### inode-read
-
-Operations that read an inode's data (file contents) or metadata (perms, etc.).
-
-access, getxattr, fstat, readlink, readv, stat.
-
-### inode-write
-
-Operations that modify an inode's data or metadata.
-
-chmod, chown, truncate, writev, utimens.
-
-### dir-read
-
-Operations that read a directory's contents or metadata.
-
-readdir, getdents, checksum.
-
-### dir-write
-
-Operations that modify a directory's contents or metadata.
-
-create, link, mkdir, mknod, rename, rmdir, symlink, unlink.
-
-Some of these make a subgroup in that they modify *two* different entries:
-link, rename, symlink.
-
-### Others
-
-Other operations.
-
-flush, lookup, open, opendir, statfs.
-
-Algorithms
-----------
-
-Each of the four major groups has its own algorithm:
-
-### inode-read, dir-read
-
-1. Send a request to the first child that is up:
-    * if it fails:
-        * try the next available child
-    * if we have exhausted all children:
-        * return failure
-
-### inode-write
-
- All operations are done in parallel unless specified otherwise.
-
-1. Send a `GF_FILE_LK` request on all children for a write lock on the
-   appropriate region
-   (for metadata operations: the entire file (0, 0); for writev:
-   (offset, offset+size of buffer)).
-    * If a lock request fails on a child:
-        * unlock all children
-        * try to acquire a blocking lock (`F_SETLKW`) on each child, serially.
-          If this fails (due to `ENOTCONN` or `EINVAL`),
-          consider this child as dead for the rest of the transaction.
-2. Mark all children as "pending" on all (alive) children (see below for
-   the meaning of "pending").
-    * If it fails on any child:
-        * mark it as dead (in transaction local state).
-3. Perform the operation on all (alive) children.
-    * If it fails on any child:
-        * mark it as dead (in transaction local state).
-4. Mark all successful children as no longer "pending" on all nodes.
-5. Unlock the region on all (alive) children.
-
-### dir-write
-
- The algorithm for dir-write is the same as above except that, instead of holding
- `GF_FILE_LK` locks, we hold a `GF_DIR_LK` lock on the name being operated upon.
- In case of link-type calls, we hold locks on both the operand names.
-
-"pending"
----------
-
-The "pending" number is like a journal entry. A pending entry is an
-array of 32-bit integers stored in network byte-order as the extended
-attribute of an inode (which can be a directory as well).
-
-There are three keys corresponding to three types of pending operations:
-
-### `AFR_METADATA_PENDING`
-
-There are some metadata operations pending on this inode (perms, ctime/mtime,
-xattr, etc.).
-
-### `AFR_DATA_PENDING`
-
-There is some data pending on this inode (writev).
-
-### `AFR_ENTRY_PENDING`
-
-There are some directory operations pending on this directory
-(create, unlink, etc.).
-
-Self heal
----------
-
-* On lookup, gather extended attribute data:
-    * If entry is a regular file:
-        * If an entry is present on one child and not on others:
-            * create entry on others.
-        * If entries exist but have different metadata (perms, etc.):
-            * consider the entry with the highest `AFR_METADATA_PENDING` number as
-              definitive and replicate its attributes on children.
-    * If entry is a directory:
-        * Consider the entry with the highest `AFR_ENTRY_PENDING` number as
-          definitive and replicate its contents on all children.
-    * If any two entries have non-matching types (i.e., one is file and
-      other is directory):
-        * Announce to the user via log that a split-brain situation has been
-          detected, and do nothing.
-* On open, gather extended attribute data:
-    * Consider the file with the highest `AFR_DATA_PENDING` number as
-      the definitive one and replicate its contents on all other
-      children.
-
-During all self heal operations, appropriate locks must be held on all
-regions/entries being affected.
-
-Inode scaling
--------------
-
-Inode scaling is necessary because if a situation arises where an inode number
-is returned for a directory (by lookup) which was previously the inode number
-of a file (as per FUSE's table), then FUSE gets horribly confused (consult a
-FUSE expert for more details).
-
-To avoid such a situation, we distribute the 64-bit inode space equally
-among all children of replicate.
-
-To illustrate:
-
-If c1, c2, c3 are children of replicate, they each get 1/3 of the available
-inode space:
-
-------------- -- -- -- -- -- -- -- -- -- -- -- ---
-Child: c1 c2 c3 c1 c2 c3 c1 c2 c3 c1 c2 ...
-Inode number: 1 2 3 4 5 6 7 8 9 10 11 ...
-------------- -- -- -- -- -- -- -- -- -- -- -- ---
-
-Thus, if lookup on c1 returns an inode number "2", it is scaled to "4"
-(which is the second inode number in c1's space).
-
-This way we ensure that there is never a collision of inode numbers from
-two different children.
-
-This reduction of inode space doesn't really reduce the usability of
-replicate since even if we assume replicate has 1024 children (which would be a
-highly unusual scenario), each child still has a 54-bit inode space:
-$2^{54} \sim 1.8 \times 10^{16}$, which is much larger than any real
-world requirement.
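The inode scaling table in the removed `afr.md` interleaves the children's inode numbers, so the mapping can be written as simple arithmetic. The sketch below is one way to express it, assuming 0-based child indices (c1 is index 0) and 1-based inode numbers. The function names are hypothetical and this is not taken from the replicate translator's source.

```c
#include <assert.h>
#include <stdint.h>

/* Map a child's local inode number (1-based) into the interleaved
 * global space. Hypothetical helper, shown for illustration only;
 * overflow for very large inode numbers is not handled here. */
uint64_t
toy_scale_inode (uint64_t child_ino, uint32_t child_index,
                 uint32_t child_count)
{
        assert (child_ino >= 1);
        assert (child_index < child_count);

        return ((child_ino - 1) * child_count) + child_index + 1;
}

/* Recover the child's local inode number from a scaled one. */
uint64_t
toy_unscale_inode (uint64_t scaled_ino, uint32_t child_count)
{
        assert (scaled_ino >= 1);
        assert (child_count > 0);

        return ((scaled_ino - 1) / child_count) + 1;
}

/* Recover the 0-based index of the child that owns a scaled inode. */
uint32_t
toy_owning_child (uint64_t scaled_ino, uint32_t child_count)
{
        assert (scaled_ino >= 1);
        assert (child_count > 0);

        return (uint32_t) ((scaled_ino - 1) % child_count);
}
```

With three children, `toy_scale_inode (2, 0, 3)` gives 4, matching the text's example of inode "2" on c1 being scaled to "4", and `toy_owning_child (11, 3)` gives 1 (c2), matching the table. The 54-bit figure follows the same way: 1024 children consume 10 of the 64 bits, leaving $2^{54}$ numbers per child.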
diff --git a/doc/hacker-guide/en-US/markdown/coding-standard.md b/doc/hacker-guide/en-US/markdown/coding-standard.md
deleted file mode 100644
index 368c5553464..00000000000
--- a/doc/hacker-guide/en-US/markdown/coding-standard.md
+++ /dev/null
@@ -1,402 +0,0 @@
-GlusterFS Coding Standards
-==========================
-
-Structure definitions should have a comment per member
-------------------------------------------------------
-
-Every member in a structure definition must have a comment about its
-purpose. The comment should be descriptive without being overly verbose.
-
-*Bad:*
-
-```
-gf_lock_t lock; /* lock */
-```
-
-*Good:*
-
-```
-DBTYPE access_mode; /* access mode for accessing
- * the databases, can be
- * DB_HASH, DB_BTREE
- * (option access-mode <mode>)
- */
-```
-
-Declare all variables at the beginning of the function
-------------------------------------------------------
-
-All local variables in a function must be declared immediately after the
-opening brace. This makes it easy to keep track of memory that needs to be freed
-during exit. It also helps debugging, since gdb cannot handle variables
-declared inside loops or other such blocks.
-
-Always initialize local variables
----------------------------------
-
-Every local variable should be initialized to a sensible default value
-at the point of its declaration. All pointers should be initialized to NULL,
-and all integers should be zero or (if it makes sense) an error value.
-
-
-*Good:*
-
-```
-int ret = 0;
-char *databuf = NULL;
-int _fd = -1;
-```
-
-Initialization should always be done with a constant value
-----------------------------------------------------------
-
-Never use a non-constant expression as the initialization value for a variable.
-
-
-*Bad:*
-
-```
-pid_t pid = frame->root->pid;
-char *databuf = malloc (1024);
-```
-
-Validate all arguments to a function
-------------------------------------
-
-All pointer arguments to a function must be checked for `NULL`.
-A macro named `VALIDATE` (in `common-utils.h`)
-takes one argument, and if it is `NULL`, writes a log message and
-jumps to a label called `err` after setting op_ret and op_errno
-appropriately. It is recommended to use this template.
-
-
-*Good:*
-
-```
-VALIDATE(frame);
-VALIDATE(this);
-VALIDATE(inode);
-```
-
-Never rely on precedence of operators
--------------------------------------
-
-Never write code that relies on the precedence of operators to execute
-correctly. Such code can be hard to read and someone else might not
-know the precedence of operators as accurately as you do.
-
-*Bad:*
-
-```
-if (op_ret == -1 && errno != ENOENT)
-```
-
-*Good:*
-
-```
-if ((op_ret == -1) && (errno != ENOENT))
-```
-
-Use exactly matching types
---------------------------
-
-Use a variable of the exact type declared in the manual to hold the
-return value of a function. Do not use an "equivalent" type.
-
-
-*Bad:*
-
-```
-int len = strlen (path);
-```
-
-*Good:*
-
-```
-size_t len = strlen (path);
-```
-
-Never write code such as `foo->bar->baz`; check every pointer
--------------------------------------------------------------
-
-Do not write code that blindly follows a chain of pointer
-references. Any pointer in the chain may be `NULL` and thus
-cause a crash. Verify that each pointer is non-null before following
-it.
-
-Check return value of all functions and system calls
-----------------------------------------------------
-
-The return value of all system calls and API functions must be checked
-for success or failure.
-
-*Bad:*
-
-```
-close (fd);
-```
-
-*Good:*
-
-```
-op_ret = close (_fd);
-if (op_ret == -1) {
- gf_log (this->name, GF_LOG_ERROR,
- "close on file %s failed (%s)", real_path,
- strerror (errno));
- op_errno = errno;
- goto out;
-}
-```
-
-
-Gracefully handle failure of malloc
------------------------------------
-
-GlusterFS should never crash or exit due to lack of memory. If a
-memory allocation fails, the call should be unwound and an error
-returned to the user.
-
-*Use result args and reserve the return value to indicate success or failure:*
-
-The return value of every function must indicate success or failure (unless
-it is impossible for the function to fail, e.g., boolean functions). If
-the function needs to return additional data, it must be returned using a
-result (pointer) argument.
-
-*Bad:*
-
-```
-int32_t dict_get_int32 (dict_t *this, char *key);
-```
-
-*Good:*
-
-```
-int dict_get_int32 (dict_t *this, char *key, int32_t *val);
-```
-
-Always use the `n` versions of string functions
------------------------------------------------
-
-Unless impossible, use the length-limited versions of the string functions.
-
-*Bad:*
-
-```
-strcpy (entry_path, real_path);
-```
-
-*Good:*
-
-```
-strncpy (entry_path, real_path, entry_path_len);
-```
-
-No dead or commented code
--------------------------
-
-There must be no dead code (code to which control can never be passed) or
-commented out code in the codebase.
-
-Only one unwind and return per function
----------------------------------------
-
-There must be only one exit out of a function. `UNWIND` and return
-should happen at only one point in the function.
-
-Function length or Keep functions small
----------------------------------------
-
-We live in the UNIX world, where modules do one thing and do it well.
-This rule should apply to our functions also. If a function is very long, try splitting it
-into many little helper functions. The question is: in a coding
-spree, how do we know a function is long and unreadable? One rule of
-thumb given by Linus Torvalds is that a function should be broken up
-if you have 4 or more levels of indentation going on for more than 3-4
-lines.
-
-*Example for a helper function:*
-```
-static int
-same_owner (posix_lock_t *l1, posix_lock_t *l2)
-{
-        return ((l1->client_pid == l2->client_pid) &&
-                (l1->transport  == l2->transport));
-}
-```
-
-Defining functions as static
-----------------------------
-
-Define internal functions as static only if you're
-very sure that there will not be a crash (of any kind) emanating from
-that function. If there is even a remote possibility, perhaps due to
-pointer dereferencing, etc., declare the function as non-static. This
-ensures that when a crash does happen, the function name shows up
-in the backtrace generated by libc. However, doing so has the potential
-to pollute the function namespace, so to avoid conflicts with other
-components in other parts, ensure that the function names are
-prepended with a prefix that identifies the component to which they
-belong. For example, non-static functions in the io-threads translator start
-with `iot_`.
-
-Ensure function calls wrap around after 80-columns
---------------------------------------------------
-
-Place remaining arguments on the next line if needed.
-
-Function arguments and function definition
--------------------------------------------
-
-Place all the arguments of a function definition on the same line
-until the line goes beyond 80 columns. Arguments that extend beyond
-80 columns should be placed on the next line.
-
-Style issues
-------------
-
-### Brace placement
-
-Use K&R/Linux style of brace placement for blocks.
-
-*Good:*
-
-```
-int some_function (...)
-{
-        if (...) {
-                /* ... */
-        } else if (...) {
-                /* ... */
-        } else {
-                /* ... */
-        }
-
-        do {
-                /* ... */
-        } while (cond);
-}
-```
-
-### Indentation
-
-Use *eight* spaces for indenting blocks. Ensure that your
-file contains only spaces and not tab characters. You can do this
-in Emacs by selecting the entire file (`C-x h`) and
-running `M-x untabify`.
-
-To make Emacs indent lines automatically by eight spaces, add this
-line to your `.emacs`:
-
-```
-(add-hook 'c-mode-hook (lambda () (c-set-style "linux")))
-```
-
-### Comments
-
-Write a comment before every function describing its purpose (one-line),
-its arguments, and its return value. Mention whether it is an internal
-function or an exported function.
-
-Write a comment before every structure describing its purpose, and
-write comments about each of its members.
-
-Follow the style shown below for comments, since such comments
-can then be automatically extracted by doxygen to generate
-documentation.
-
-*Good:*
-
-```
-/**
- * hash_name - hash function for filenames
- * @par: parent inode number
- * @name: basename of inode
- * @mod: number of buckets in the hashtable
- *
- * @return: success: bucket number
- *          failure: -1
- *
- * Not for external use.
- */
-```
-
-### Indicating critical sections
-
-To clearly show regions of code which execute with locks held, use
-the following format:
-
-```
-pthread_mutex_lock (&mutex);
-{
-        /* code */
-}
-pthread_mutex_unlock (&mutex);
-```
-
-*A skeleton fop function:*
-
-This is the recommended template for any fop. In the beginning come
-the initializations. After that, the "success" control flow should be
-linear. Any error condition should cause a `goto` to a single
-point, `out`. At that point, the code should detect the error
-that has occurred and do the appropriate cleanup.
-
-```
-int32_t
-sample_fop (call_frame_t *frame, xlator_t *this, ...)
-{
-        char            *var1     = NULL;
-        int32_t          op_ret   = -1;
-        int32_t          op_errno = 0;
-        DIR             *dir      = NULL;
-        struct posix_fd *pfd      = NULL;
-
-        VALIDATE_OR_GOTO (frame, out);
-        VALIDATE_OR_GOTO (this, out);
-
-        /* other validations */
-
-        dir = opendir (...);
-
-        if (dir == NULL) {
-                op_errno = errno;
-                gf_log (this->name, GF_LOG_ERROR,
-                        "opendir failed on %s (%s)", loc->path,
-                        strerror (op_errno));
-                goto out;
-        }
-
-        /* another system call */
-        if (...) {
-                op_errno = ENOMEM;
-                gf_log (this->name, GF_LOG_ERROR,
-                        "out of memory :(");
-                goto out;
-        }
-
-        /* ... */
-
-out:
-        if (op_ret == -1) {
-
-                /* check for all the cleanup that needs to be
-                   done */
-
-                if (dir) {
-                        closedir (dir);
-                        dir = NULL;
-                }
-
-                if (pfd) {
-                        FREE (pfd->path);
-                        FREE (pfd);
-                        pfd = NULL;
-                }
-        }
-
-        STACK_UNWIND (frame, op_ret, op_errno, fd);
-        return 0;
-}
-```
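Several rules in the removed `coding-standard.md` interact: constant initialization, argument validation, result arguments, a single exit point, and the length-limited string functions. One detail the `strncpy` example there leaves out is that `strncpy` does not NUL-terminate the destination when the source is at least as long as the limit. The sketch below combines those rules in a small helper. `copy_path` and its parameters are hypothetical and not part of GlusterFS.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Copy src into dst (capacity dst_len bytes), reporting the copied
 * length through the result argument. Returns 0 on success, -1 on
 * failure with errno set. Hypothetical helper for illustration. */
int
copy_path (char *dst, size_t dst_len, const char *src, size_t *copied)
{
        int    ret = -1;
        size_t len = 0;

        if ((dst == NULL) || (src == NULL) || (copied == NULL) ||
            (dst_len == 0)) {
                errno = EINVAL;
                goto out;
        }

        len = strlen (src);
        if (len >= dst_len) {
                /* refuse to truncate silently */
                errno = ENAMETOOLONG;
                goto out;
        }

        strncpy (dst, src, dst_len);
        dst[dst_len - 1] = '\0';   /* strncpy alone does not guarantee this */

        *copied = len;
        ret = 0;
out:
        return ret;
}
```

Rejecting over-long input instead of truncating is a design choice of this sketch. A caller that prefers truncation would drop the length check and rely on the explicit terminator.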
diff --git a/doc/hacker-guide/en-US/markdown/inode.md b/doc/hacker-guide/en-US/markdown/inode.md
deleted file mode 100644
index a340ab9ca8e..00000000000
--- a/doc/hacker-guide/en-US/markdown/inode.md
+++ /dev/null
@@ -1,226 +0,0 @@
-# Inode and dentry management in GlusterFS
-
-## Background
-Filesystems internally refer to files and directories via inodes. Inodes
-are unique identifiers of the entities stored in a filesystem. Whenever an
-application has to operate on a file/directory (read/modify), the filesystem
-maps that file/directory to the right inode and starts referring to that inode
-whenever an operation has to be performed on the file/directory.
-
-In GlusterFS a new inode gets created whenever a new file/directory is created
-OR when a successful lookup is done on a file/directory for the first time.
-Inodes in GlusterFS are maintained by the inode table, which is initialized when
-the filesystem daemon is started (both for the brick process as well as the
-mount process). Below are some important data structures for inode management.
-
-## Data-structure (inode-table)
-```
-struct _inode_table {
-        pthread_mutex_t    lock;
-        size_t             hashsize;    /* bucket size of inode hash and dentry hash */
-        char              *name;        /* name of the inode table, just for gf_log() */
-        inode_t           *root;        /* root directory inode, with inode
-                                           number and gfid 1 */
-        xlator_t          *xl;          /* xlator to be called to do purge and
-                                           the xlator which maintains the inode table */
-        uint32_t           lru_limit;   /* maximum LRU cache size */
-        struct list_head  *inode_hash;  /* buckets for inode hash table */
-        struct list_head  *name_hash;   /* buckets for dentry hash table */
-        struct list_head   active;      /* list of inodes currently active (in an fop) */
-        uint32_t           active_size; /* count of inodes in active list */
-        struct list_head   lru;         /* list of inodes recently used.
-                                           lru.next most recent */
-        uint32_t           lru_size;    /* count of inodes in lru list */
-        struct list_head   purge;       /* list of inodes to be purged soon */
-        uint32_t           purge_size;  /* count of inodes in purge list */
-
-        struct mem_pool   *inode_pool;  /* memory pool for inodes */
-        struct mem_pool   *dentry_pool; /* memory pool for dentries */
-        struct mem_pool   *fd_mem_pool; /* memory pool for fd_t */
-        int                ctxcount;    /* number of slots in inode->ctx */
-};
-```
-
-# Life-cycle
-
-```
-inode_table_new (size_t lru_limit, xlator_t *xl)
-```
-This is a function which allocates a new inode table. Usually the top xlators in
-the graph, such as protocol/server (for bricks), fuse and nfs (for fuse and nfs
-mounts) and libgfapi, do inode management. Hence they are the ones which will
-allocate a new inode table by calling the above function.
-
-Each xlator graph in glusterfs maintains an inode table. So in fuse clients,
-whenever there is a graph change due to add brick/remove brick or
-addition/removal of some other xlators, a new graph is created which creates a
-new inode table.
-
-Thus an allocated inode table is destroyed only when the filesystem daemon is
-killed or unmounted.
-
-
-
-# What it contains
-
-
-The inode table in glusterfs mainly contains a hash table for maintaining inodes.
-In general a file/directory is considered to exist if there is a
-corresponding inode present in the inode table. If an inode for a file/directory
-cannot be found in the inode table, glusterfs tries to resolve it by sending a
-lookup on the entry for which the inode is needed. If the lookup is successful, then
-a new inode corresponding to the entry is added to the hash table present in the
-inode table. Thus an inode present in the hash table means it is an existing
-file/directory within the filesystem. The inode table also contains the hash
-size of the hash table (as of now it is hard-coded to 14057; the hash value of
-an inode is calculated using its gfid).
-
-Apart from the hash table, the inode table also maintains 3 important lists of inodes:
-1) Active list:
-Active list contains all the active inodes (i.e. inodes which are currently part
-of some fop).
-2) Lru list:
-Least recently used inodes list. A limit can be set for the size of the lru
-list. For bricks it is 16384 and for clients it is infinity.
-3) Purge list:
-List of all the inodes which have to be purged (i.e. inodes which have to be
-deleted from the inode table due to unlink/rmdir/forget).
-
-Finally, it also contains the mem-pools for allocating inodes and dentries, so
-that frequent malloc/calloc and free of the data structures can be avoided.
-
-
-# Data structure (inode)
-```
-struct _inode {
-        inode_table_t     *table;       /* the table this inode belongs to */
-        uuid_t             gfid;        /* unique identifier of the inode */
-        gf_lock_t          lock;
-        uint64_t           nlookup;
-        uint32_t           fd_count;    /* Open fd count */
-        uint32_t           ref;         /* reference count on this inode */
-        ia_type_t          ia_type;     /* what kind of file */
-        struct list_head   fd_list;     /* list of open files on this inode */
-        struct list_head   dentry_list; /* list of directory entries for this inode */
-        struct list_head   hash;        /* hash table pointers */
-        struct list_head   list;        /* active/lru/purge */
-
-        struct _inode_ctx *_ctx;        /* place holder for keeping the
-                                           information about the inode by different xlators */
-};
-```
-As said above, inodes are the internal way of identifying files/directories. An
-inode uniquely represents a file/directory. A new inode is created whenever a
-create/mkdir/symlink/mknod operation is performed. Apart from that, a new inode
-is created upon the successful fresh lookup of a file/directory. Say the
-filesystem contained some file "a" within root and the filesystem was
-unmounted. Now when glusterfs is mounted and some operation is performed on "/a",
-glusterfs tries to get the inode for the entry "a" with parent inode as
-root. But, since glusterfs just came up, it will not be able to find the inode
-for "a" and will send a lookup on "/a". If the lookup operation succeeds (i.e.
-the root of glusterfs contains an entry called "a"), then a new inode for "/a"
-is created and added to the inode table.
-
-Depending upon the situation, an inode can be in one of the 3 lists maintained
-by the inode table. If some fop is happening on the inode, then the inode will
-be present in the active inodes list maintained by the inode table. Active
-inodes are those inodes whose refcount is greater than zero. Whenever some
-operation comes on a file/directory, and the resolver tries to find the inode
-for it, it increments the refcount of the inode before returning the inode. The
-refcount of an inode can be incremented by calling the function below:
-```
-inode_ref (inode_t *inode)
-```
-Any xlator which wants to operate on an inode as part of some fop (or wants the
-inode in the callback) should hold a ref on the inode.
-Once the fop is completed, before sending the reply of the fop to the above
-layers, the inode has to be unrefed. When the refcount of an inode becomes
-zero, it is removed from the active inodes list and put into the LRU list maintained
-by the inode table. Thus, in short, if some fop is happening on a file/directory,
-the corresponding inode will be in the active list; otherwise it will be in the LRU
-list.
-
-
-# Life Cycle
-
-A new inode is created whenever a new file/directory/symlink is created OR a
-successful lookup of an existing entry is done. The xlators which do inode
-management (as of now protocol/server, fuse, nfs, gfapi) will perform an inode_link
-operation upon successful lookup or successful creation of a new entry.
-
-inode_link (inode_t *inode, inode_t *parent, const char *name,
- struct iatt *buf);
-
-inode_link actually adds the inode to the inode table (to be precise, it adds
-the inode to the hash table maintained by the inode table; the hash value is
-calculated based on the gfid). It copies the gfid to the inode (the gfid is
-present in the iatt structure) and creates a dentry with the new name.
-
-An inode is removed from the inode table and eventually destroyed when an unlink
-or rmdir operation is performed on a file/directory, or the lru limit of
-the inode table has been exceeded.
-
-# Data structure (dentry)
-```
-
-struct _dentry {
-        struct list_head   inode_list;   /* list of dentries of inode */
-        struct list_head   hash;         /* hash table pointers */
-        inode_t           *inode;        /* inode of this directory entry */
-        char              *name;         /* name of the directory entry */
-        inode_t           *parent;       /* directory of the entry */
-};
-```
-A dentry is the presence of an entry for a file/directory within its parent
-directory. A dentry usually points to the inode to which it belongs. In
-glusterfs a dentry contains the following fields:
-1) A hook using which it can add itself to the list of
-the dentries maintained by the inode to which it points.
-2) A hash table pointer.
-3) Pointer to the inode to which it belongs.
-4) Name of the dentry.
-5) Pointer to the inode of the parent directory in which the dentry is present.
-
-A new dentry is created when a new file/directory/symlink is created or a hard
-link to an existing file is created.
-```
-__dentry_create (inode_t *inode, inode_t *parent, const char *name);
-```
-A dentry holds a refcount on the parent
-directory so that the parent inode is never removed from the active inodes list
-and put into the lru list (if the lru limit of the lru list is exceeded, there is
-a chance of the parent inode being destroyed; to avoid this, the dentries hold a
-reference to the parent inode). A dentry is removed whenever an unlink/rmdir
-is performed on a file/directory, or when the lru limit has been exceeded and the
-oldest inodes are purged out of the inode table, during which all the dentries
-of the inode are removed.
-
-Whenever an unlink/rmdir comes on a file/directory, the corresponding inode
-should be removed from the inode table. So upon unlink/rmdir, the inode will
-be moved to the purge list maintained by the inode table and from there it is
-destroyed. To be more specific, if an inode has to be destroyed, its refcount
-and nlookup count both should become 0. For the refcount to become 0, the inode
-should not be part of any fop (there should not be any open fds). Or if the
-inode belongs to a directory, then there should not be any fop happening on the
-directory and it should not contain any dentries within it. For the nlookup count to
-become zero, a forget has to be sent on the inode with nlookup count set to 0 as
-an argument. For fuse clients, forget is sent by the kernel itself whenever an
-unlink/rmdir is performed. But for brick processes, upon unlink/rmdir, the
-protocol/server itself has to do inode_forget. Whenever the inode has to be
-deleted due to file removal or the lru limit being exceeded, the inode is retired
-(i.e. all the dentries of the inode are deleted and the inode is moved to the
-purge list maintained by the inode table) and the nlookup count is set to 0 via
-the inode_forget api. The inode table then prunes all the inodes from the purge
-list by destroying the inode contexts maintained by each xlator.
-
-Unlinking of the dentry is done via inode_unlink:
-```
-void
-inode_unlink (inode_t *inode, inode_t *parent, const char *name);
-```
-If the inode has multiple hard links, then the unlink operation performed by
-the application results just in the removal of the dentry with the name provided
-by the application. For the inode to be removed, all the dentries of the inode
-should be unlinked.
-
-
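The removed `inode.md` describes three lists (active, LRU, purge) and two counters (`ref` and `nlookup`) that decide which list an inode sits on. The toy model below restates those rules as code: a referenced inode is active, an unreferenced inode that has been looked up is on the LRU list, and an inode with both counters at zero is ready to be purged. It is a sketch under those stated rules only. The `toy_*` names are hypothetical, real list manipulation, locking and dentries are left out, and it should not be read as the libglusterfs implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_list { TOY_ACTIVE, TOY_LRU, TOY_PURGE };

/* Toy stand-in for inode_t: only the two counters and the list tag. */
struct toy_inode {
        uint32_t      ref;      /* references held by in-flight fops */
        uint64_t      nlookup;  /* lookups not yet forgotten */
        enum toy_list list;     /* which list the inode is on */
};

/* Re-evaluate which list the inode belongs on. */
static void
toy_place (struct toy_inode *inode)
{
        if (inode->ref > 0) {
                inode->list = TOY_ACTIVE;
        } else if (inode->nlookup > 0) {
                inode->list = TOY_LRU;
        } else {
                inode->list = TOY_PURGE;
        }
}

void
toy_lookup (struct toy_inode *inode)
{
        assert (inode != NULL);
        inode->nlookup++;
        toy_place (inode);
}

void
toy_ref (struct toy_inode *inode)
{
        assert (inode != NULL);
        inode->ref++;
        toy_place (inode);
}

void
toy_unref (struct toy_inode *inode)
{
        assert ((inode != NULL) && (inode->ref > 0));
        inode->ref--;
        toy_place (inode);
}

/* Models a forget that sets the nlookup count to 0. */
void
toy_forget_all (struct toy_inode *inode)
{
        assert (inode != NULL);
        inode->nlookup = 0;
        toy_place (inode);
}
```

In this model, a forget that arrives while a fop still holds a reference leaves the inode on the active list, and it moves to the purge list only on the final unref. That mirrors the text's requirement that both counts reach zero before destruction.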
diff --git a/doc/hacker-guide/en-US/markdown/posix.md b/doc/hacker-guide/en-US/markdown/posix.md
deleted file mode 100644
index 84c813e55a2..00000000000
--- a/doc/hacker-guide/en-US/markdown/posix.md
+++ /dev/null
@@ -1,59 +0,0 @@
-storage/posix translator
-========================
-
-Notes
------
-
-### `SET_FS_ID`
-
-This is so that all filesystem checks are done with the user's
-uid/gid and not GlusterFS's uid/gid.
-
-### `MAKE_REAL_PATH`
-
-This macro concatenates the base directory of the posix volume
-('option directory') with the given path.
-
-### `need_xattr` in lookup
-
-If this flag is passed, lookup returns an xattr dictionary that contains
-the file's create time, the file's contents, and the version number
-of the file.
-
-This is a hack to increase small file performance. If an application
-wants to read a small file, it can finish its job with just a lookup
-call instead of a lookup followed by read.
-
-### `getdents`/`setdents`
-
-These are used by unify to set and get directory entries.
-
-### `ALIGN_BUF`
-
-Macro to align an address to a page boundary (4K).
-
-### `priv->export_statfs`
-
-In some cases, two exported volumes may reside on the same
-partition on the server. Sending statvfs info for both
-the volumes will lead to erroneous df output at the client,
-since free space on the partition will be counted twice.
-
-In such cases, the user can disable exporting statvfs info
-on one of the volumes by setting this option.
-
-### `xattrop`
-
-This fop is used by replicate to set version numbers on files.
-
-### `getxattr`/`setxattr` hack to read/write files
-
-A key, `GLUSTERFS_FILE_CONTENT_STRING`, is handled in a special way by
-`getxattr`/`setxattr`. A getxattr with the key will return the entire
-content of the file as the value. A `setxattr` with the key will write
-the value as the entire content of the file.
-
-### `posix_checksum`
-
-This calculates a simple XOR checksum on all entry names in a
-directory that is used by unify to compare directory contents.
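The `ALIGN_BUF` note in the removed `posix.md` says only that the macro aligns an address to a 4K page boundary. A common way to do that is to round up with a power-of-two mask, sketched below. Whether the original macro rounds up or down, and its exact argument types, are assumptions here. `toy_align_up` and `TOY_PAGE_SIZE` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u

/* Round addr up to the next multiple of bound.
 * bound must be a non-zero power of two for the mask trick to work. */
uintptr_t
toy_align_up (uintptr_t addr, uintptr_t bound)
{
        assert ((bound != 0) && ((bound & (bound - 1)) == 0));

        return (addr + (bound - 1)) & ~(bound - 1);
}
```

An address that is already aligned is returned unchanged, and any other address moves to the start of the next page. A caller aligning a buffer this way needs to over-allocate by up to `bound - 1` bytes so the aligned pointer still lies inside the allocation.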
diff --git a/doc/hacker-guide/en-US/markdown/translator-development.md b/doc/hacker-guide/en-US/markdown/translator-development.md
deleted file mode 100644
index edadd5150dc..00000000000
--- a/doc/hacker-guide/en-US/markdown/translator-development.md
+++ /dev/null
@@ -1,666 +0,0 @@
-Translator development
-======================
-
-Setting the Stage
------------------
-
-This is the first post in a series that will explain some of the details of
-writing a GlusterFS translator, using some actual code to illustrate.
-
-Before we begin, a word about environments. GlusterFS is over 300K lines of
-code spread across a few hundred files. That's no Linux kernel or anything, but
-you're still going to be navigating through a lot of code in every
-code-editing session, so some kind of cross-referencing is *essential*. I use
-cscope with the vim bindings, and if I couldn't do Ctrl+G and such to jump
-between definitions all the time my productivity would be cut in half. You may
-prefer different tools, but as I go through these examples you'll need
-something functionally similar to follow along. OK, on with the show.
-
-The first thing you need to know is that translators are not just bags of
-functions and variables. They need to have a very definite internal structure
-so that the translator-loading code can figure out where all the pieces are.
-The way it does this is to use `dlsym` to look for specific names within your
-shared-object file, as follows (from `xlator.c`):
-
-```
-if (!(xl->fops = dlsym (handle, "fops"))) {
- gf_log ("xlator", GF_LOG_WARNING, "dlsym(fops) on %s",
- dlerror ());
- goto out;
-}
-
-if (!(xl->cbks = dlsym (handle, "cbks"))) {
- gf_log ("xlator", GF_LOG_WARNING, "dlsym(cbks) on %s",
- dlerror ());
- goto out;
-}
-
-if (!(xl->init = dlsym (handle, "init"))) {
- gf_log ("xlator", GF_LOG_WARNING, "dlsym(init) on %s",
- dlerror ());
- goto out;
-}
-
-if (!(xl->fini = dlsym (handle, "fini"))) {
- gf_log ("xlator", GF_LOG_WARNING, "dlsym(fini) on %s",
- dlerror ());
- goto out;
-}
-```
-
-In this example, `xl` is a pointer to the in-memory object for the translator
-we're loading. As you can see, it's looking up various symbols *by name* in the
-shared object it just loaded, and storing pointers to those symbols. Some of
-them (e.g. `init`) are functions, while others (e.g. `fops`) are dispatch tables
-containing pointers to many functions. Together, these make up the translator's
-public interface.
-
-Most of this glue or boilerplate can easily be found at the bottom of one of
-the source files that make up each translator. We're going to use the `rot-13`
-translator just for fun, so in this case you'd look in `rot-13.c` to see this:
-
-```
-struct xlator_fops fops = {
- .readv = rot13_readv,
- .writev = rot13_writev
-};
-
-struct xlator_cbks cbks = {
-};
-
-struct volume_options options[] = {
-{ .key = {"encrypt-write"},
- .type = GF_OPTION_TYPE_BOOL
-},
-{ .key = {"decrypt-read"},
- .type = GF_OPTION_TYPE_BOOL
-},
-{ .key = {NULL} },
-};
-```
-
-The `fops` table, defined in `xlator.h`, is one of the most important pieces.
-This table contains a pointer to each of the filesystem functions that your
-translator might implement -- `open`, `read`, `stat`, `chmod`, and so on. There
-are 82 such functions in all, but don't worry; any that you don't specify here
-will be seen as null and filled with defaults from `defaults.c` when your
-translator is loaded. In this particular example, since `rot-13` is an
-exceptionally simple translator, we only fill in two entries for `readv` and
-`writev`.
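-
-To make the "filled with defaults" idea concrete, here's a toy model of
-it. Everything in this sketch is hypothetical (a three-entry table and
-made-up `toy_` names rather than the real `xlator_fops` or the code in
-`defaults.c`); it only shows the shape of the technique.
-
-```c
-#include <stddef.h>
-
-/* Toy stand-ins for the real types; all names here are hypothetical. */
-typedef int (*toy_fop_t) (int arg);
-
-struct toy_fops {
-        toy_fop_t readv;
-        toy_fop_t writev;
-        toy_fop_t stat;
-};
-
-static int default_readv (int arg)  { return arg; }
-static int default_writev (int arg) { return arg; }
-static int default_stat (int arg)   { return arg; }
-
-/* Replace every NULL slot with the pass-through default. */
-static void
-toy_fill_defaults (struct toy_fops *fops)
-{
-        if (!fops->readv)
-                fops->readv = default_readv;
-        if (!fops->writev)
-                fops->writev = default_writev;
-        if (!fops->stat)
-                fops->stat = default_stat;
-}
-```
-
-Entries the translator did set are left alone, so after the fill every
-slot is safe to call through without a `NULL` check.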
-
-There are actually two other tables, also required to have predefined names,
-that are also used to find translator functions: `cbks` (which is empty in this
-snippet) and `dumpops` (which is missing entirely). The first of these specifies
-entry points for when inodes are forgotten or file descriptors are released.
-In other words, they're destructors for objects in which your translator might
-have an interest. Mostly you can ignore them, because the default behavior
-handles even the simpler cases of translator-specific inode/fd context
-automatically. However, if the context you attach is a complex structure
-requiring complex cleanup, you'll need to supply these functions. As for
-`dumpops`, that's just used if you want to provide functions to pretty-print
-various structures in logs. I've never used it myself, though I probably
-should. What's noteworthy here is that we don't even define `dumpops`. That's
-because all of the functions that might use these dispatch functions will check
-for `xl->dumpops` being `NULL` before calling through it. This is in sharp
-contrast to the behavior for `fops` and `cbks`, which *must* be present. If
-they're not, translator loading will fail because these pointers are not
-checked every time and if they're `NULL` then we'll segfault. That's why we
-provide an empty definition for `cbks`; it's OK for the individual function
-pointers to be `NULL`, but not for the whole table to be absent.
-
-The last piece I'll cover today is options. As you can see, this is a table of
-translator-specific option names and some information about their types.
-GlusterFS actually provides a pretty rich set of types (`volume_option_type_t`
-in `options.h`) which includes paths, translator names, percentages, and times
-in addition to the obvious integers and strings. Also, the `volume_option_t`
-structure can include information about alternate names, min/max/default
-values, enumerated string values, and descriptions. We don't see any of these
-here, so let's take a quick look at some more complex examples from `afr.c` and
-then come back to `rot-13`.
-
-```
-{ .key = {"data-self-heal-algorithm"},
- .type = GF_OPTION_TYPE_STR,
- .default_value = "",
- .description = "Select between \"full\", \"diff\". The "
- "\"full\" algorithm copies the entire file from "
- "source to sink. The \"diff\" algorithm copies to "
- "sink only those blocks whose checksums don't match "
- "with those of source.",
- .value = { "diff", "full", "" }
-},
-{ .key = {"data-self-heal-window-size"},
- .type = GF_OPTION_TYPE_INT,
- .min = 1,
- .max = 1024,
- .default_value = "1",
- .description = "Maximum number blocks per file for which "
- "self-heal process would be applied simultaneously."
-},
-```
-
-When your translator is loaded, all of this information is used to parse the
-options actually provided in the volfile, and then the result is turned into a
-dictionary and stored as `xl->options`. This dictionary is then processed by
-your `init` function, which you can see being looked up in the first code
-fragment above. We're only going to look at a small part of `rot-13`'s
-`init` for now.
-
-```
-priv->decrypt_read = 1;
-priv->encrypt_write = 1;
-
-data = dict_get (this->options, "encrypt-write");
-if (data) {
- if (gf_string2boolean (data->data, &priv->encrypt_write)
- == -1) {
- gf_log (this->name, GF_LOG_ERROR,
- "encrypt-write takes only boolean options");
- return -1;
- }
-}
-```
-
-What we can see here is that we're setting some defaults in our priv structure,
-then looking to see if an `encrypt-write` option was actually provided. If so,
-we convert and store it. This is a pretty classic use of dict_get to fetch a
-field from a dictionary, and of using one of many conversion functions in
-`common-utils.c` to convert `data->data` into something we can use.
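-
-If you're curious what such a conversion function might look like, here's
-a self-contained sketch. It is not the real `gf_string2boolean`; the name
-`toy_string2boolean` and the set of accepted spellings are my own
-assumptions, chosen only to show the "return -1 on garbage" contract that
-the `init` fragment above relies on.
-
-```c
-#include <strings.h>
-
-/* Hypothetical converter: returns 0 and stores the result on success,
- * -1 if the string is not a recognized boolean spelling. */
-static int
-toy_string2boolean (const char *str, int *out)
-{
-        if (!str || !out)
-                return -1;
-
-        if (!strcasecmp (str, "on") || !strcasecmp (str, "yes") ||
-            !strcasecmp (str, "true") || !strcasecmp (str, "1")) {
-                *out = 1;
-                return 0;
-        }
-
-        if (!strcasecmp (str, "off") || !strcasecmp (str, "no") ||
-            !strcasecmp (str, "false") || !strcasecmp (str, "0")) {
-                *out = 0;
-                return 0;
-        }
-
-        return -1;
-}
-```
-
-Note that the output is only written on success, so a default set
-beforehand (as `init` does with `priv->encrypt_write = 1`) survives a
-failed parse.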
-
-So far we've covered the basics of how a translator gets loaded, how we find its
-various parts, and how we process its options. In my next Translator 101 post,
-we'll go a little deeper into other things that `init` and its companion `fini`
-might do, and how some other fields in our `xlator_t` structure (commonly
-referred to as `this`) are commonly used.
-
-`init`, `fini`, and private context
------------------------------------
-
-In the previous Translator 101 post, we looked at some of the dispatch tables
-and options processing in a translator. This time we're going to cover the rest
-of the "shell" of a translator -- i.e. the other global parts not specific to
-handling a particular request.
-
-Let's start by looking at the relationship between a translator and its shared
-library. At a first approximation, this is the relationship between an object
-and a class in just about any object-oriented programming language. The class
-defines behaviors, but has to be instantiated as an object to have any kind of
-existence. In our case the object is an `xlator_t`. Several of these might be
-created within the same daemon, sharing all of the same code through `init`/`fini`
-and dispatch tables, but sharing *no data*. You could implement shared data (as
-static variables in your shared libraries) but that's strongly discouraged.
-Every function in your shared library will get an `xlator_t` as an argument,
-and should use it. This lack of class-level data is one of the points where
-the analogy to common OOP systems starts to break down. Another place is the
-complete lack of inheritance. Translators inherit behavior (code) from exactly
-one shared library -- looked up and loaded using the `type` field in a volfile
-`volume ... end-volume` block -- and that's it -- not even single inheritance,
-no subclasses or superclasses, no mixins or prototypes, just the relationship
-between an object and its class. With that in mind, let's turn to the init
-function that we just barely touched on last time.
-
-```
-int32_t
-init (xlator_t *this)
-{
- data_t *data = NULL;
- rot_13_private_t *priv = NULL;
-
- if (!this->children || this->children->next) {
- gf_log ("rot13", GF_LOG_ERROR,
- "FATAL: rot13 should have exactly one child");
- return -1;
- }
-
- if (!this->parents) {
- gf_log (this->name, GF_LOG_WARNING,
- "dangling volume. check volfile ");
- }
-
- priv = GF_CALLOC (sizeof (rot_13_private_t), 1, 0);
- if (!priv)
- return -1;
-```
-
-At the very top, we see the function signature -- we get a pointer to the
-`xlator_t` object that we're initializing, and we return an `int32_t` status.
-As with most functions in the translator API, this should be zero to indicate
-success. In this case it's safe to return -1 for failure, but watch out: in
-dispatch-table functions, the return value means the status of the *function
-call* rather than the *request*. A request error should be reflected as a
-callback with a non-zero `op_ret` value, but the dispatch function itself
-should still return zero. In fact, the handling of a non-zero return from a
-dispatch function is not all that robust (we recently had a bug report in
-HekaFS related to this) so it's something you should probably avoid
-altogether. This only underscores the difference between dispatch functions
-and `init`/`fini` functions, where non-zero returns *are* expected and handled
-logically by aborting the translator setup. We can see that down at the
-bottom, where we return -1 to indicate that we couldn't allocate our
-private-data area (more about that later).
-
-The first thing this init function does is check that the translator is being
-set up in the right kind of environment. Translators are called by parents and
-in turn call children. Some translators are "initial" translators that inject
-requests into the system from elsewhere -- e.g. mount/fuse injecting requests
-from the kernel, protocol/server injecting requests from the network. Those
-translators don't need parents, but `rot-13` does and so we check for that.
-Similarly, some translators are "final" translators that (from the perspective
-of the current process) terminate requests instead of passing them on -- e.g.
-`protocol/client` passing them to another node, `storage/posix` passing them to
-a local filesystem. Other translators "multiplex" between multiple children --
-passing each parent request on to one (`cluster/dht`), some
-(`cluster/stripe`), or all (`cluster/afr`) of those children. `rot-13` fits
-into none of those categories either, so it checks that it has *exactly one*
-child. It might be more convenient or robust if translator shared libraries
-had standard variables describing these requirements, to be checked in a
-consistent way by the translator-loading infrastructure itself instead of by
-each separate init function, but this is the way translators work today.
-
-The last thing we see in this fragment is allocating our private data area.
-This can literally be anything we want; the infrastructure just provides the
-`private` pointer as a convenience but takes no responsibility for how it's used. In
-this case we're using `GF_CALLOC` to allocate our own `rot_13_private_t`
-structure. This gets us all the benefits of GlusterFS's memory-leak detection
-infrastructure, but the way we're calling it is not quite ideal. For one thing,
-the first two arguments -- from `calloc(3)`, which takes the element count
-first and the element size second -- are kind of reversed. For
-another, notice how the last argument is zero. That can actually be an
-enumerated value, to tell the GlusterFS allocator *what* type we're
-allocating. This can be very useful information for memory profiling and leak
-detection, so it's recommended that you follow the example of any
-`xxx-mem-types.h` file elsewhere in the source tree instead of just passing
-zero here (even though that works).
-
-To finish our tour of standard initialization/termination, let's look at the
-end of `init` and the beginning of `fini`:
-
-```
- this->private = priv;
- gf_log ("rot13", GF_LOG_DEBUG, "rot13 xlator loaded");
- return 0;
-}
-
-void
-fini (xlator_t *this)
-{
- rot_13_private_t *priv = this->private;
-
- if (!priv)
- return;
- this->private = NULL;
- GF_FREE (priv);
-```
-
-At the end of `init` we're just storing our private-data pointer in the `private`
-field of our `xlator_t`, then returning zero to indicate that initialization
-succeeded. As is usually the case, our fini is even simpler. All it really has
-to do is `GF_FREE` our private-data pointer, which we do in a slightly
-roundabout way here. Notice how we don't even have a return value here, since
-there's nothing obvious and useful that the infrastructure could do if `fini`
-failed.
-
-That's practically everything we need to know to get our translator through
-loading, initialization, options processing, and termination. If we had defined
-no dispatch functions, we could actually configure a daemon to use our
-translator and it would work as a basic pass-through from its parent to a
-single child. In the next post I'll cover how to build the translator and
-configure a daemon to use it, so that we can actually step through it in a
-debugger and see how it all fits together before we actually start adding
-functionality.
-
-This Time For Real
-------------------
-
-In the first two parts of this series, we learned how to write a basic
-translator skeleton that can get through loading, initialization, and option
-processing. This time we'll cover how to build that translator, configure a
-volume to use it, and run the glusterfs daemon in debug mode.
-
-Unfortunately, there's not much direct support for writing new translators. You
-can check out a GlusterFS tree and splice in your own translator directory, but
-that's a bit painful because you'll have to update multiple makefiles plus a
-bunch of autoconf garbage. As part of the HekaFS project, I basically reverse
-engineered the truly necessary parts of the translator-building process and
-then pestered one of the Fedora glusterfs package maintainers (thanks
-daMaestro!) to add a `glusterfs-devel` package with the required headers. Since
-then the complexity level in the HekaFS tree has crept back up a bit, but I
-still remember the simple method and still consider it the easiest way to get
-started on a new translator. For the sake of those not using Fedora, I'm going
-to describe a method that doesn't depend on that header package. What it does
-depend on is a GlusterFS source tree, much as you might have cloned from GitHub
-or the Gluster review site. This tree doesn't have to be fully built, but you
-do need to run `autogen.sh` and `configure` in it. Then you can take the
-following simple makefile and put it in a directory with your actual source.
-
-```
-# Change these to match your source code.
-TARGET = rot-13.so
-OBJECTS = rot-13.o
-
-# Change these to match your environment.
-GLFS_SRC = /srv/glusterfs
-GLFS_LIB = /usr/lib64
-HOST_OS = GF_LINUX_HOST_OS
-
-# You shouldn't need to change anything below here.
-
-CFLAGS = -fPIC -Wall -O0 -g \
- -DHAVE_CONFIG_H -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE \
- -D$(HOST_OS) -I$(GLFS_SRC) -I$(GLFS_SRC)/contrib/uuid \
- -I$(GLFS_SRC)/libglusterfs/src
-LDFLAGS = -shared -nostartfiles -L$(GLFS_LIB)
-LIBS = -lglusterfs -lpthread
-
-$(TARGET): $(OBJECTS)
-	$(CC) $(LDFLAGS) -o $(TARGET) $(OBJECTS) $(LIBS)
-```
-
-Yes, it's still Linux-specific. Mea culpa. As you can see, we're sticking with
-the `rot-13` example, so you can just copy the files from
-`xlators/encryption/rot-13/src` in your GlusterFS tree to follow along. Type
-`make` and you should be rewarded with a nice little `.so` file.
-
-```
-xlator_example$ ls -l rot-13.so
--rwxr-xr-x. 1 jeff jeff 40784 Nov 16 16:41 rot-13.so
-```
-
-Notice that we've built with optimization level zero and debugging symbols
-included, which would not typically be the case for a packaged version of
-GlusterFS. Let's put our version of `rot-13.so` into a slightly different file
-on our system, so that it doesn't stomp on the installed version (not that
-you'd ever want to use that anyway).
-
-```
-xlator_example# ls /usr/lib64/glusterfs/3git/xlator/encryption/
-crypt.so crypt.so.0 crypt.so.0.0.0 rot-13.so rot-13.so.0
-rot-13.so.0.0.0
-xlator_example# cp rot-13.so \
- /usr/lib64/glusterfs/3git/xlator/encryption/my-rot-13.so
-```
-
-These paths represent the current Gluster filesystem layout, which is likely to
-be deprecated in favor of the Fedora layout; your paths may vary. At this point
-we're ready to configure a volume using our new translator. To do that, I'm
-going to suggest something that's strongly discouraged except during
-development (the Gluster guys are going to hate me for this): write our own
-volfile. Here's just about the simplest volfile you'll ever see.
-
-```
-volume my-posix
- type storage/posix
- option directory /srv/export
-end-volume
-
-volume my-rot13
- type encryption/my-rot-13
- subvolumes my-posix
-end-volume
-```
-
-All we have here is a basic brick using `/srv/export` for its data, and then
-an instance of our translator layered on top -- no client or server is
-necessary for what we're doing, and the system will automatically push a
-mount/fuse translator on top if there's no server translator. To try this out,
-all we need is the following command (assuming the directories involved already
-exist).
-
-```
-xlator_example$ glusterfs --debug -f my.vol /srv/import
-```
-
-You should be rewarded with a whole lot of log output, including the text of
-the volfile (this is very useful for debugging problems in the field). If you
-go to another window on the same machine, you can see that you have a new
-filesystem mounted.
-
-```
-~$ df /srv/import
-Filesystem 1K-blocks Used Available Use% Mounted on
-/srv/xlator_example/my.vol
- 114506240 2706176 105983488 3% /srv/import
-```
-
-Just for fun, write something into a file in `/srv/import`, then look at the
-corresponding file in `/srv/export` to see it all `rot-13`'ed for you.
-
-```
-~$ echo hello > /srv/import/a_file
-~$ cat /srv/export/a_file
-uryyb
-```
-
-There you have it -- functionality you control, implemented easily, layered on
-top of local storage. Now you could start adding functionality -- real
-encryption, perhaps -- and inevitably having to debug it. You could do that the
-old-school way, with `gf_log` (preferred) or even plain old `printf`, or you
-could run daemons under `gdb` instead. Alternatively, you could wait for the
-next Translator 101 post, where we'll be doing exactly that.
-
-Debugging a Translator
-----------------------
-
-Now that we've learned what a translator looks like and how to build one, it's
-time to run one and actually watch it work. The best way to do this is good
-old-fashioned `gdb`, as follows (using some of the examples from last time).
-
-```
-xlator_example# gdb glusterfs
-GNU gdb (GDB) Red Hat Enterprise Linux (7.2-50.el6)
-...
-(gdb) r --debug -f my.vol /srv/import
-Starting program: /usr/sbin/glusterfs --debug -f my.vol /srv/import
-...
-[2011-11-23 11:23:16.495516] I [fuse-bridge.c:2971:fuse_init]
- 0-glusterfs-fuse: FUSE inited with protocol versions:
- glusterfs 7.13 kernel 7.13
-```
-
-If you get to this point, your glusterfs client process is already running. You
-can go to another window to see the mountpoint, do file operations, etc.
-
-```
-~# df /srv/import
-Filesystem 1K-blocks Used Available Use% Mounted on
-/root/xlator_example/my.vol
- 114506240 2643968 106045568 3% /srv/import
-~# ls /srv/import
-a_file
-~# cat /srv/import/a_file
-hello
-```
-
-Now let's interrupt the process and see where we are.
-
-```
-^C
-Program received signal SIGINT, Interrupt.
-0x0000003a0060b3dc in pthread_cond_wait@@GLIBC_2.3.2 ()
- from /lib64/libpthread.so.0
-(gdb) info threads
- 5 Thread 0x7fffeffff700 (LWP 27206) 0x0000003a002dd8c7
- in readv ()
- from /lib64/libc.so.6
- 4 Thread 0x7ffff50e3700 (LWP 27205) 0x0000003a0060b75b
- in pthread_cond_timedwait@@GLIBC_2.3.2 ()
- from /lib64/libpthread.so.0
- 3 Thread 0x7ffff5f02700 (LWP 27204) 0x0000003a0060b3dc
- in pthread_cond_wait@@GLIBC_2.3.2 ()
- from /lib64/libpthread.so.0
- 2 Thread 0x7ffff6903700 (LWP 27203) 0x0000003a0060f245
- in sigwait ()
- from /lib64/libpthread.so.0
-* 1 Thread 0x7ffff7957700 (LWP 27196) 0x0000003a0060b3dc
- in pthread_cond_wait@@GLIBC_2.3.2 ()
- from /lib64/libpthread.so.0
-```
-
-Like any non-toy server, this one has multiple threads. What are they all
-doing? Honestly, even I don't know. Thread 1 turns out to be in
-`event_dispatch_epoll`, which means it's the one handling all of our network
-I/O. Note that with the socket multi-threading patch this will change, with one
-thread in `socket_poller` per connection. Thread 2 is in `glusterfs_sigwaiter`
-which means signals will be isolated to that thread. Thread 3 is in
-`syncenv_task`, so it's a worker thread for synchronous requests such as
-those used by the rebalance and repair code. Thread 4 is in
-`janitor_get_next_fd`, so it's waiting for a chance to close no-longer-needed
-file descriptors on the local filesystem. (I admit I had to look that one up,
-BTW.) Lastly, thread 5 is in `fuse_thread_proc`, so it's the one fetching
-requests from our FUSE interface. You'll often see many more threads than
-this, but it's a pretty good basic set. Now, let's set a breakpoint so we can
-actually watch a request.
-
-```
-(gdb) b rot13_writev
-Breakpoint 1 at 0x7ffff50e4f0b: file rot-13.c, line 119.
-(gdb) c
-Continuing.
-```
-
-At this point we go into our other window and do something that will involve a write.
-
-```
-~# echo goodbye > /srv/import/another_file
-(back to the first window)
-[Switching to Thread 0x7fffeffff700 (LWP 27206)]
-
-Breakpoint 1, rot13_writev (frame=0x7ffff6e4402c, this=0x638440,
- fd=0x7ffff409802c, vector=0x7fffe8000cd8, count=1, offset=0,
- iobref=0x7fffe8001070) at rot-13.c:119
-119 rot_13_private_t *priv = (rot_13_private_t *)this->private;
-```
-
-Remember how we built with debugging symbols enabled and no optimization? That
-will be pretty important for the next few steps. As you can see, we're in
-`rot13_writev`, with several parameters.
-
-* `frame` is our always-present frame pointer for this request. Also,
- `frame->local` will point to any local data we created and attached to the
- request ourselves.
-* `this` is a pointer to our instance of the `rot-13` translator. You can examine
- it if you like to see the name, type, options, parent/children, inode table,
- and other stuff associated with it.
-* `fd` is a pointer to a file-descriptor *object* (`fd_t`, not just a
- file-descriptor index which is what most people use "fd" for). This in turn
- points to an inode object (`inode_t`) and we can associate our own
- `rot-13`-specific data with either of these.
-* `vector` and `count` together describe the data buffers for this write, which
- we'll get to in a moment.
-* `offset` is the offset into the file at which we're writing.
-* `iobref` is a buffer-reference object, which is used to track the life cycle
- of buffers containing read/write data. If you look closely, you'll notice that
- `vector[0].iov_base` points to the same address as `iobref->iobrefs[0].ptr`, which
- should give you some idea of the inter-relationships between vector and iobref.
-
-OK, now what about that `vector`? We can use it to examine the data being
-written, like this.
-
-```
-(gdb) p vector[0]
-$2 = {iov_base = 0x7ffff7936000, iov_len = 8}
-(gdb) x/s 0x7ffff7936000
-0x7ffff7936000: "goodbye\n"
-```
-
-It's not always safe to view this data as a string, because it might just as
-well be binary data, but since we're generating the write this time it's safe
-and convenient. With that knowledge, let's step through things a bit.
-
-```
-(gdb) s
-120 if (priv->encrypt_write)
-(gdb)
-121 rot13_iovec (vector, count);
-(gdb)
-rot13_iovec (vector=0x7fffe8000cd8, count=1) at rot-13.c:57
-57 for (i = 0; i < count; i++) {
-(gdb)
-58 rot13 (vector[i].iov_base, vector[i].iov_len);
-(gdb)
-rot13 (buf=0x7ffff7936000 "goodbye\n", len=8) at rot-13.c:45
-45 for (i = 0; i < len; i++) {
-(gdb)
-46 if (buf[i] >= 'a' && buf[i] <= 'z')
-(gdb)
-47 buf[i] = 'a' + ((buf[i] - 'a' + 13) % 26);
-```
-
-Here we've stepped into `rot13_iovec`, which iterates through our vector
-calling `rot13`, which in turn iterates through the characters in that chunk
-doing the `rot-13` operation if/as appropriate. This is pretty straightforward
-stuff, so let's skip to the next interesting bit.
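-
-For reference, here's a self-contained sketch of those two helpers,
-reconstructed from the lines visible in the `gdb` session above. Only the
-lowercase branch appears in that trace; the uppercase branch is my
-assumption about what a complete version would do, so treat this as an
-illustration rather than a copy of `rot-13.c`.
-
-```c
-#include <stddef.h>
-#include <sys/uio.h>
-
-/* Rotate ASCII letters by 13 places in place; other bytes are untouched. */
-static void
-rot13 (char *buf, size_t len)
-{
-        size_t i;
-
-        for (i = 0; i < len; i++) {
-                if (buf[i] >= 'a' && buf[i] <= 'z')
-                        buf[i] = 'a' + ((buf[i] - 'a' + 13) % 26);
-                else if (buf[i] >= 'A' && buf[i] <= 'Z')
-                        buf[i] = 'A' + ((buf[i] - 'A' + 13) % 26);
-        }
-}
-
-/* Apply rot13 to every buffer described by the vector. */
-static void
-rot13_iovec (struct iovec *vector, int count)
-{
-        int i;
-
-        for (i = 0; i < count; i++)
-                rot13 (vector[i].iov_base, vector[i].iov_len);
-}
-```
-
-Since 13 is half of 26, applying the transformation twice restores the
-original bytes, which is why the same helper can serve both the write and
-the read path.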
-
-```
-(gdb) fin
-Run till exit from #0 rot13 (buf=0x7ffff7936000 "goodbye\n",
- len=8) at rot-13.c:47
-rot13_iovec (vector=0x7fffe8000cd8, count=1) at rot-13.c:57
-57 for (i = 0; i < count; i++) {
-(gdb) fin
-Run till exit from #0 rot13_iovec (vector=0x7fffe8000cd8,
- count=1) at rot-13.c:57
-rot13_writev (frame=0x7ffff6e4402c, this=0x638440,
- fd=0x7ffff409802c, vector=0x7fffe8000cd8, count=1,
- offset=0, iobref=0x7fffe8001070) at rot-13.c:123
-123 STACK_WIND (frame,
-(gdb) b 129
-Breakpoint 2 at 0x7ffff50e4f35: file rot-13.c, line 129.
-(gdb) b rot13_writev_cbk
-Breakpoint 3 at 0x7ffff50e4db3: file rot-13.c, line 106.
-(gdb) c
-```
-
-So we've set breakpoints on both the callback and the statement following the
-`STACK_WIND`. Which one will we hit first?
-
-```
-Breakpoint 3, rot13_writev_cbk (frame=0x7ffff6e4402c,
- cookie=0x7ffff6e440d8, this=0x638440, op_ret=8, op_errno=0,
- prebuf=0x7fffefffeca0, postbuf=0x7fffefffec30)
- at rot-13.c:106
-106 STACK_UNWIND_STRICT (writev, frame, op_ret, op_errno,
- prebuf, postbuf);
-(gdb) bt
-#0 rot13_writev_cbk (frame=0x7ffff6e4402c,
- cookie=0x7ffff6e440d8, this=0x638440, op_ret=8, op_errno=0,
- prebuf=0x7fffefffeca0, postbuf=0x7fffefffec30)
- at rot-13.c:106
-#1 0x00007ffff52f1b37 in posix_writev (frame=0x7ffff6e440d8,
- this=<value optimized out>, fd=<value optimized out>,
- vector=<value optimized out>, count=1,
- offset=<value optimized out>, iobref=0x7fffe8001070)
- at posix.c:2217
-#2 0x00007ffff50e513e in rot13_writev (frame=0x7ffff6e4402c,
- this=0x638440, fd=0x7ffff409802c, vector=0x7fffe8000cd8,
- count=1, offset=0, iobref=0x7fffe8001070) at rot-13.c:123
-```
-
-Surprise! We're in `rot13_writev_cbk` now, called (indirectly) while we're
-still in `rot13_writev` before `STACK_WIND` returns (still at rot-13.c:123). If
-you did any request cleanup here, then you need to be careful about what you
-do in the remainder of `rot13_writev` because data may have been freed etc.
-It's tempting to say you should just do the cleanup in `rot13_writev` after
-the `STACK_WIND`, but that's not valid because it's also possible that some
-other translator returned without calling `STACK_UNWIND` -- i.e. before
-`rot13_writev_cbk` is called, so then it would be the one getting null-pointer
-errors instead. To put it another way, the callback and the return from
-`STACK_WIND` can occur in either order or even simultaneously on different
-threads. Even if you were to use reference counts, you'd have to make sure to
-use locking or atomic operations to avoid races, and it's not worth it. Unless
-you *really* understand the possible flows of control and know what you're
-doing, it's better to do cleanup in the callback and nothing after
-`STACK_WIND`.
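-
-Here's a toy model of that rule, stripped of everything GlusterFS-specific.
-All of the names are hypothetical (there is no real `STACK_WIND` here, just
-a direct call into a "child" that completes synchronously), but it shows
-why the callback is the only safe place for per-request cleanup.
-
-```c
-#include <stdlib.h>
-
-/* Toy model, all names hypothetical: "winding" calls the child, and a
- * synchronous child invokes the callback before the wind returns. */
-struct toy_local {
-        int value;
-};
-
-struct toy_frame {
-        struct toy_local *local;
-        int               cbk_ran;
-};
-
-typedef void (*toy_cbk_t) (struct toy_frame *frame, int op_ret);
-
-/* The callback owns cleanup of the per-request data. */
-static void
-toy_writev_cbk (struct toy_frame *frame, int op_ret)
-{
-        (void) op_ret;
-        free (frame->local);
-        frame->local = NULL;
-        frame->cbk_ran = 1;
-}
-
-/* A child that completes immediately, like the posix call in the trace. */
-static void
-toy_child_writev (struct toy_frame *frame, toy_cbk_t cbk)
-{
-        cbk (frame, 8);
-}
-
-static int
-toy_writev (struct toy_frame *frame)
-{
-        frame->local = calloc (1, sizeof (*frame->local));
-        if (!frame->local)
-                return -1;
-
-        toy_child_writev (frame, toy_writev_cbk);  /* the "wind" */
-
-        /* frame->local may already be gone here: do not touch it. */
-        return 0;
-}
-```
-
-With a child that defers its work to another thread, the callback would run
-later instead, which is exactly why code after the wind can't assume either
-ordering.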
-
-At this point all that's left is a `STACK_UNWIND` and a return. The
-`STACK_UNWIND` invokes our parent's completion callback, and in this case our
-parent is FUSE so at that point the VFS layer is notified of the write being
-complete. Finally, we return through several levels of normal function calls
-until we come back to `fuse_thread_proc`, which waits for the next request.
-
-So that's it. For extra fun, you might want to repeat this exercise by stepping
-through some other call -- `stat` or `setxattr` might be good choices -- but you'll
-have to use a translator that actually implements those calls to see much
-that's interesting. Then you'll pretty much know everything I knew when I
-started writing my first for-real translators, and probably even a bit more. I
-hope you've enjoyed this series, or at least found it useful, and if you have
-any suggestions for other topics I should cover please let me know (via
-comments or email, IRC or Twitter).
diff --git a/doc/hacker-guide/en-US/markdown/unittest.md b/doc/hacker-guide/en-US/markdown/unittest.md
deleted file mode 100644
index 5c6c0a8a039..00000000000
--- a/doc/hacker-guide/en-US/markdown/unittest.md
+++ /dev/null
@@ -1,228 +0,0 @@
-# Unit Tests in GlusterFS
-
-## Overview
-[Art-of-unittesting][definitionofunittest] provides a good definition for unit tests. A good unit test is:
-
-* Able to be fully automated
-* Has full control over all the pieces running (Use mocks or stubs to achieve this isolation when needed)
-* Can be run in any order if part of many other tests
-* Runs in memory (no DB or File access, for example)
-* Consistently returns the same result (You always run the same test, so no random numbers, for example. Save those for integration or range tests)
-* Runs fast
-* Tests a single logical concept in the system
-* Readable
-* Maintainable
-* Trustworthy (when you see its result, you don’t need to debug the code just to be sure)
-
-## cmocka
-The GlusterFS unit test framework is based on [cmocka][]. cmocka provides
-developers with methods to isolate and test modules written in C. It
-also provides integration with Jenkins by providing JUnit XML compliant unit
-test results.
-
-## Running Unit Tests
-To execute the unit tests, all you need is to type `make check`. Here is a step-by-step example assuming you just cloned a GlusterFS tree:
-
-```
-$ ./autogen.sh
-$ ./configure --enable-debug
-$ make check
-```
-
-Sample output:
-
-```
-PASS: mem_pool_unittest
-============================================================================
-Testsuite summary for glusterfs 3git
-============================================================================
-# TOTAL: 1
-# PASS: 1
-# SKIP: 0
-# XFAIL: 0
-# FAIL: 0
-# XPASS: 0
-# ERROR: 0
-============================================================================
-```
-
-In this example, `mem_pool_unittest` has multiple tests inside, but `make check` assumes that the program itself is the test, and that is why it only shows one test. Here is the output when we run `mem_pool_unittest` directly:
-
-```
-$ ./libglusterfs/src/mem_pool_unittest
-[==========] Running 10 test(s).
-[ RUN ] test_gf_mem_acct_enable_set
-Expected assertion data != ((void *)0) occurred
-[ OK ] test_gf_mem_acct_enable_set
-[ RUN ] test_gf_mem_set_acct_info_asserts
-Expected assertion xl != ((void *)0) occurred
-Expected assertion size > ((4 + sizeof (size_t) + sizeof (xlator_t *) + 4 + 8) + 8) occurred
-Expected assertion type <= xl->mem_acct.num_types occurred
-[ OK ] test_gf_mem_set_acct_info_asserts
-[ RUN ] test_gf_mem_set_acct_info_memory
-[ OK ] test_gf_mem_set_acct_info_memory
-[ RUN ] test_gf_calloc_default_calloc
-[ OK ] test_gf_calloc_default_calloc
-[ RUN ] test_gf_calloc_mem_acct_enabled
-[ OK ] test_gf_calloc_mem_acct_enabled
-[ RUN ] test_gf_malloc_default_malloc
-[ OK ] test_gf_malloc_default_malloc
-[ RUN ] test_gf_malloc_mem_acct_enabled
-[ OK ] test_gf_malloc_mem_acct_enabled
-[ RUN ] test_gf_realloc_default_realloc
-[ OK ] test_gf_realloc_default_realloc
-[ RUN ] test_gf_realloc_mem_acct_enabled
-[ OK ] test_gf_realloc_mem_acct_enabled
-[ RUN ] test_gf_realloc_ptr
-Expected assertion ((void *)0) != ptr occurred
-[ OK ] test_gf_realloc_ptr
-[==========] 10 test(s) run.
-[ PASSED ] 10 test(s).
-[ FAILED ] 0 test(s).
-[ REPORT ] Created libglusterfs_mem_pool_xunit.xml report
-```
-
-
-## Writing Unit Tests
-
-### Enhancing your C functions
-
-#### Programming by Contract
-Add the following to your C file:
-
-```c
-#include <cmocka_pbc.h>
-```
-
-```c
-/*
- * Programming by Contract is a programming methodology
- * which binds the caller and the function called to a
- * contract. The contract is represented using Hoare Triple:
- * {P} C {Q}
- * where {P} is the precondition before executing command C,
- * and {Q} is the postcondition.
- *
- * See also:
- * http://en.wikipedia.org/wiki/Design_by_contract
- * http://en.wikipedia.org/wiki/Hoare_logic
- * http://dlang.org/dbc.html
- */
-#ifndef CMOCKERY_PBC_H_
-#define CMOCKERY_PBC_H_
-
-#if defined(UNIT_TESTING) || defined (DEBUG)
-
-#include <assert.h>
-
-/*
- * Checks caller responsibility against contract
- */
-#define REQUIRE(cond) assert(cond)
-
-/*
- * Checks function responsibility against contract.
- */
-#define ENSURE(cond) assert(cond)
-
-/*
- * While REQUIRE and ENSURE apply to functions, INVARIANT
- * applies to classes/structs. It ensures that instances
- * of the class/struct are consistent. In other words,
- * that the instance has not been corrupted.
- */
-#define INVARIANT(invariant_fnc) do { (invariant_fnc); } while (0)
-
-#else
-#define REQUIRE(cond) do { } while (0)
-#define ENSURE(cond) do { } while (0)
-#define INVARIANT(invariant_fnc) do { } while (0)
-
-#endif /* defined(UNIT_TESTING) || defined (DEBUG) */
-#endif /* CMOCKERY_PBC_H_ */
-```
-
-##### Example
-This is an _extremely_ simple example:
-
-```c
-int divide (int n, int d)
-{
- int ans;
-
- REQUIRE(d != 0);
-
- ans = n / d;
-
- // As code is added to this function throughout its lifetime,
- // ENSURE will assert that data will be returned
- // according to the contract. Again this is an
- // extremely simple example. :-D
- ENSURE( ans == (n / d) );
-
- return ans;
-}
-
-```
-
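The same idea extends to structs. The following is a minimal sketch, not code from the GlusterFS tree: `demo_stack` and its functions are hypothetical names, and the three macros are redefined locally (always enabled) only so that the sketch compiles without `cmocka_pbc.h`.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the cmocka_pbc.h macros (assumption: always on). */
#define REQUIRE(cond) assert(cond)
#define ENSURE(cond) assert(cond)
#define INVARIANT(invariant_fnc) do { (invariant_fnc); } while (0)

/* Hypothetical fixed-capacity stack, used only for illustration. */
#define DEMO_STACK_CAP 4

struct demo_stack {
        int    items[DEMO_STACK_CAP];
        size_t count;
};

/* Invariant: the element count never exceeds the capacity. */
void
demo_stack_invariant (const struct demo_stack *s)
{
        assert (s != NULL);
        assert (s->count <= DEMO_STACK_CAP);
}

void
demo_stack_init (struct demo_stack *s)
{
        REQUIRE (s != NULL);
        s->count = 0;
        INVARIANT (demo_stack_invariant (s));
}

void
demo_stack_push (struct demo_stack *s, int value)
{
        size_t old_count;

        /* Caller's side of the contract: valid stack with room left. */
        REQUIRE (s != NULL);
        REQUIRE (s->count < DEMO_STACK_CAP);
        INVARIANT (demo_stack_invariant (s));

        old_count = s->count;
        s->items[s->count++] = value;

        /* Function's side of the contract: exactly one item was added. */
        ENSURE (s->count == old_count + 1);
        ENSURE (s->items[s->count - 1] == value);
        INVARIANT (demo_stack_invariant (s));
}
```

Calling `demo_stack_push` on a full stack trips the `REQUIRE`, which points at the caller rather than at the function itself.
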
-##### Important Note
-`REQUIRE`, `ENSURE`, and `INVARIANT` are only available when `DEBUG` or `UNIT_TESTING` is defined in the CFLAGS. To enable PBC in builds other than the unit tests, pass `--enable-debug` to `./configure`.
-
-#### Overriding functions
-Cmockery2 provides its own memory allocation functions which check for buffer overrun and memory leaks. The following header file must be included **last** to be able to override any of the memory allocation functions:
-
-```c
-#include <cmocka.h>
-```
-
-This file will only take effect when the `UNIT_TESTING` CFLAG is set.
-
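Why the header has to come last can be seen in a small stand-alone sketch. This is not the library's implementation: every `demo_` name is hypothetical, and the two `#define` lines stand in for what a test header included last would provide. Anything compiled after them calls the tracking versions, while anything before still sees the real allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical leak counter kept by the tracking allocator. */
static long demo_live_blocks = 0;

/* Defined before the override, so malloc/free here are the real ones. */
void *
demo_test_malloc (size_t size)
{
        void *p = malloc (size);
        if (p != NULL)
                demo_live_blocks++;
        return p;
}

void
demo_test_free (void *p)
{
        if (p != NULL)
                demo_live_blocks--;
        free (p);
}

/* Stand-in for the part of a test header that must be included last. */
#define malloc(size) demo_test_malloc (size)
#define free(ptr) demo_test_free (ptr)

/* Code under test: compiled after the override, so it is tracked. */
char *
demo_make_buffer (size_t len)
{
        return malloc (len);
}

void
demo_drop_buffer (char *buf)
{
        free (buf);
}

long
demo_leaked_blocks (void)
{
        return demo_live_blocks;
}
```

A test can then assert that `demo_leaked_blocks ()` is zero once the code under test has released everything it allocated.
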
-### Creating a unit test
-Once you identify the C file you would like to test, first create a `unittest` directory under the directory where the C file is located. This will isolate the unittests to a different directory.
-
-Next, edit the `Makefile.am` file in the directory where your C file is located. If the
-`Makefile.am` does not already contain the following section, add it:
-
-```
-#### UNIT TESTS #####
-CLEANFILES += *.gcda *.gcno *_xunit.xml
-noinst_PROGRAMS =
-TESTS =
-```
-
-Now you can add the following for each of the unit tests that you would like to build:
-
-```
-### UNIT TEST xxx_unittest ###
-xxx_unittest_CPPFLAGS = $(xxx_CPPFLAGS)
-xxx_unittest_SOURCES = xxx.c \
- unittest/xxx_unittest.c
-xxx_unittest_CFLAGS = $(UNITTEST_CFLAGS)
-xxx_unittest_LDFLAGS = $(UNITTEST_LDFLAGS)
-noinst_PROGRAMS += xxx_unittest
-TESTS += xxx_unittest
-```
-
-Where `xxx` is the name of your C file. For example, look at `libglusterfs/src/Makefile.am`.
-
-Copy the simple unit test from the [cmocka API][cmockapi] to `unittest/xxx_unittest.c`. If you would like to see an example of a unit test, please refer to `libglusterfs/src/unittest/mem_pool_unittest.c`.
-
-#### Mocking
-You may find that the linker complains about missing functions needed by the C file you would like to test. Identify the required functions, place their stubs in a file called `unittest/xxx_mock.c`, and add this file to `xxx_unittest_SOURCES` in `Makefile.am`. This will allow you to use Cmockery2's mocking functions.
-
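What such a stub does can be sketched without the test library. In the sketch below, `demo_will_return` and `demo_mock` are hypothetical stand-ins for a mock-value queue, not the library's own functions, and `demo_disk_free_percent` is an invented dependency. The stub hands back whatever the test queued, so the function under test can be driven down each branch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical queue of values that the stub will hand back, in order. */
#define DEMO_MOCK_QUEUE_LEN 8

static int    demo_mock_values[DEMO_MOCK_QUEUE_LEN];
static size_t demo_mock_head = 0;
static size_t demo_mock_tail = 0;

/* Called by the test to queue the stub's next return value. */
void
demo_will_return (int value)
{
        assert (demo_mock_tail < DEMO_MOCK_QUEUE_LEN);
        demo_mock_values[demo_mock_tail++] = value;
}

/* Called by the stub to fetch the next queued value. */
int
demo_mock (void)
{
        assert (demo_mock_head < demo_mock_tail);
        return demo_mock_values[demo_mock_head++];
}

/* Stub (would live in unittest/xxx_mock.c): replaces a dependency
 * that the linker would otherwise report as missing. */
int
demo_disk_free_percent (void)
{
        return demo_mock ();
}

/* Function under test (would live in xxx.c). */
int
demo_disk_is_low (void)
{
        return demo_disk_free_percent () < 10;
}
```
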
-#### Running the unit test
-You can type `make` in the directory where the C file is located. Once it builds without errors, you can run the test either by executing the program directly (in the example above it is called `xxx_unittest`) or by running `make check`.
-
-#### Debugging
-Sometimes you may need to debug your unit test. To do that, you will have to point `gdb` to the binary which is located in the same directory as the source. For example, you can do the following from the root of the source tree to debug `mem_pool_unittest`:
-
-```
-$ gdb libglusterfs/src/mem_pool_unittest
-```
-
-
-[cmocka]: https://cmocka.org
-[definitionofunittest]: http://artofunittesting.com/definition-of-a-unit-test/
-[cmockapi]: https://api.cmocka.org
diff --git a/doc/hacker-guide/en-US/markdown/write-behind.md b/doc/hacker-guide/en-US/markdown/write-behind.md
deleted file mode 100644
index 0d78964fa20..00000000000
--- a/doc/hacker-guide/en-US/markdown/write-behind.md
+++ /dev/null
@@ -1,56 +0,0 @@
-performance/write-behind translator
-===================================
-
-Basic working
---------------
-
-Write-behind is a translator that tells the application its write requests
-have completed before they have actually been carried out.
-
-On a regular translator tree without write-behind, control flow is like this:
-
-1. application makes a `write()` system call.
-2. VFS ==> FUSE ==> `/dev/fuse`.
-3. fuse-bridge initiates a glusterfs `writev()` call.
-4. `writev()` is `STACK_WIND()`ed up to client-protocol or storage translator.
-5. client-protocol, on receiving reply from server, starts `STACK_UNWIND()` towards the fuse-bridge.
-
-On a translator tree with write-behind, control flow is like this:
-
-1. application makes a `write()` system call.
-2. VFS ==> FUSE ==> `/dev/fuse`.
-3. fuse-bridge initiates a glusterfs `writev()` call.
-4. `writev()` is `STACK_WIND()`ed up to write-behind translator.
-5. write-behind adds the write buffer to its internal queue and does a `STACK_UNWIND()` towards the fuse-bridge.
-
-From the application's perspective, the write call is now complete. After
-`STACK_UNWIND()`ing towards the fuse-bridge, write-behind initiates a fresh
-`writev()` call to its child translator, whose replies are consumed by
-write-behind itself. Write-behind _doesn't_ cache the write buffer unless
-`option flush-behind on` is specified in the volume specification file.
-
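The early acknowledgement can be modelled as a toy in plain C. This is only a sketch under simplifying assumptions: `demo_wb` and its functions are hypothetical, byte counters stand in for the real `STACK_UNWIND()` and `STACK_WIND()` calls, and none of it is the translator's actual data structure.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_QUEUE_LEN 16

/* Hypothetical model of write-behind's internal queue. */
struct demo_wb {
        size_t queued[DEMO_QUEUE_LEN]; /* sizes of writes not yet sent down */
        size_t nqueued;
        size_t bytes_acked;            /* reported to the application */
        size_t bytes_sent;             /* actually passed to the child */
};

/* Steps 4 and 5: queue the buffer and acknowledge it immediately. */
int
demo_wb_writev (struct demo_wb *wb, size_t size)
{
        if (wb->nqueued == DEMO_QUEUE_LEN)
                return -1;              /* toy queue is full */

        wb->queued[wb->nqueued++] = size;
        wb->bytes_acked += size;        /* stands in for STACK_UNWIND() */
        return 0;
}

/* Afterwards: send every queued write to the child translator. */
void
demo_wb_drain (struct demo_wb *wb)
{
        size_t i;

        for (i = 0; i < wb->nqueued; i++)
                wb->bytes_sent += wb->queued[i]; /* stands in for STACK_WIND() */
        wb->nqueued = 0;
}
```

Between `demo_wb_writev` and `demo_wb_drain`, `bytes_acked` is ahead of `bytes_sent`. That gap is the data the application believes is written but the child has not yet seen.
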
-Windowing
----------
-
-With respect to write-behind, each write-buffer has three flags: `stack_wound`, `write_behind` and `got_reply`.
-
-* `stack_wound`: if set, indicates that write-behind has initiated `STACK_WIND()` towards child translator.
-* `write_behind`: if set, indicates that write-behind has done `STACK_UNWIND()` towards fuse-bridge.
-* `got_reply`: if set, indicates that write-behind has received a reply from the child translator for a `writev()` `STACK_WIND()`. A request is destroyed by write-behind only if this flag is set.
-
-Currently pending write requests = aggregate size of requests with `write_behind = 1` and `got_reply = 0`.
-
-The window size limits the aggregate size of currently pending write requests.
-Once the pending requests' size has reached the window size, write-behind
-blocks `writev()` calls from the fuse-bridge. Blocking is only from the
-application's perspective: write-behind does the `STACK_WIND()` to the child
-translator straight away, but holds back the `STACK_UNWIND()` towards the
-fuse-bridge. The `STACK_UNWIND()` is done only once write-behind has received
-enough replies to accommodate the currently blocked request.
-
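The window accounting described above can be written out as a short sketch. The struct layout and function names are hypothetical; only the three flag names and the pending-size rule come from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-request record carrying the three flags. */
struct demo_wb_request {
        size_t size;          /* bytes in this write buffer */
        int    stack_wound;   /* STACK_WIND() towards the child was issued */
        int    write_behind;  /* STACK_UNWIND() towards fuse-bridge was done */
        int    got_reply;     /* the child replied to the writev() */
};

/* Aggregate size of requests with write_behind = 1 and got_reply = 0. */
size_t
demo_pending_bytes (const struct demo_wb_request *reqs, size_t count)
{
        size_t total = 0;
        size_t i;

        for (i = 0; i < count; i++) {
                if (reqs[i].write_behind && !reqs[i].got_reply)
                        total += reqs[i].size;
        }
        return total;
}

/* A new write may be acknowledged early only while the pending size
 * is still below the window size; otherwise its unwind is held back. */
int
demo_can_unwind_now (const struct demo_wb_request *reqs, size_t count,
                     size_t window_size)
{
        return demo_pending_bytes (reqs, count) < window_size;
}
```

A request that has been wound to the child but not yet acknowledged to the application (`write_behind = 0`) does not count towards the pending size in this model.
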
-Flush behind
-------------
-
-If `option flush-behind on` is specified in the volume specification file,
-write-behind sends aggregated write requests to the child translator instead
-of the regular per-request `STACK_WIND()`s.