From aa2f48dbd8f8ff1d10230fb9656f2ac7d99a48f8 Mon Sep 17 00:00:00 2001
From: Shyam
Date: Mon, 27 Feb 2017 13:25:14 -0500
Subject: doc: Moved feature pages that were delivered as a part of 3.10.0

Change-Id: I35a6b599eebbe42b5ef1244d2d72fa103bcf8acb
Signed-off-by: Shyam
Reviewed-on: https://review.gluster.org/16775
Reviewed-by: Vijay Bellur
---
 done/GlusterFS 3.10/client-opversion.md    | 111 +++++++++++++++++++
 done/GlusterFS 3.10/max-opversion.md       | 118 ++++++++++++++++++++
 done/GlusterFS 3.10/multiplexing.md        | 141 ++++++++++++++++++++++
 done/GlusterFS 3.10/readdir-ahead.md       | 167 +++++++++++++++++++++++++++++
 done/GlusterFS 3.10/rebalance-estimates.md | 128 ++++++++++++++++++++
 done/GlusterFS 3.10/tier_service.md        | 130 ++++++++++++++++++++++
 under_review/client-opversion.md           | 111 -------------------
 under_review/max-opversion.md              | 118 --------------------
 under_review/multiplexing.md               | 141 ------------------------
 under_review/readdir-ahead.md              | 167 -----------------------------
 under_review/rebalance-estimates.md        | 128 ----------------------
 under_review/tier_service.md               | 130 ----------------------
 12 files changed, 795 insertions(+), 795 deletions(-)
 create mode 100644 done/GlusterFS 3.10/client-opversion.md
 create mode 100644 done/GlusterFS 3.10/max-opversion.md
 create mode 100644 done/GlusterFS 3.10/multiplexing.md
 create mode 100644 done/GlusterFS 3.10/readdir-ahead.md
 create mode 100644 done/GlusterFS 3.10/rebalance-estimates.md
 create mode 100644 done/GlusterFS 3.10/tier_service.md
 delete mode 100644 under_review/client-opversion.md
 delete mode 100644 under_review/max-opversion.md
 delete mode 100644 under_review/multiplexing.md
 delete mode 100644 under_review/readdir-ahead.md
 delete mode 100644 under_review/rebalance-estimates.md
 delete mode 100644 under_review/tier_service.md

diff --git a/done/GlusterFS 3.10/client-opversion.md b/done/GlusterFS 3.10/client-opversion.md
new file mode 100644
index 0000000..8c9991e
--- /dev/null
+++ b/done/GlusterFS 3.10/client-opversion.md
@@ -0,0 +1,111 @@
+Feature
+-------
+
+Summary
+-------
+
+Support to get the op-version information for each client through the volume
+status command.
+
+Owners
+------
+
+Samikshan Bairagya
+
+Current status
+--------------
+
+Currently the only way to get an idea regarding the version of the connected
+clients is to grep for "accepted client from" in /var/log/glusterfs/bricks.
+There is no command that gives that information out to the users.
+
+Related Feature Requests and Bugs
+---------------------------------
+
+https://bugzilla.redhat.com/show_bug.cgi?id=1409078
+
+Detailed Description
+--------------------
+
+The op-version information for each client can be added to the already existing
+volume status command. `volume status clients` currently gives the
+following information for each client:
+
+* Hostname:port
+* Bytes Read
+* Bytes Written
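+
+As a rough illustration (the exact column layout of the enhanced output is an
+assumption, not a final format), the workflow changes from grepping brick logs
+to a single CLI call:
+
+```sh
+# Today: infer client versions indirectly from the brick logs
+grep "accepted client from" /var/log/glusterfs/bricks/*.log
+
+# Proposed: op-version reported per client (illustrative output)
+gluster volume status <volname> clients
+#   Hostname:port        Bytes Read   Bytes Written   Op-version
+#   192.0.2.11:49152     2312         4096            30712
+```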
+
+Benefit to GlusterFS
+--------------------
+
+This would improve the user experience by making it easier for users to know
+the op-version of each client from a single command.
+
+Scope
+-----
+
+#### Nature of proposed change
+
+Adds more information to `volume status clients` output.
+
+#### Implications on manageability
+
+None.
+
+#### Implications on presentation layer
+
+None.
+
+#### Implications on persistence layer
+
+None.
+
+#### Implications on 'GlusterFS' backend
+
+None.
+
+#### Modification to GlusterFS metadata
+
+None.
+
+#### Implications on 'glusterd'
+
+None.
+
+How To Test
+-----------
+
+This can be tested by having clients with different glusterfs versions connected
+to running volumes, and executing the `volume status clients`
+command.
+
+User Experience
+---------------
+
+Users can use the `volume status clients` command to get
+information on the op-version of each client along with the information that
+was already available (hostname, bytes read and bytes written).
+
+Dependencies
+------------
+
+None
+
+Documentation
+-------------
+
+None.
+
+Status
+------
+
+In development.
+
+Comments and Discussion
+-----------------------
+
+ 1. Discussion on gluster-devel ML:
+    - [Thread 1](http://www.gluster.org/pipermail/gluster-users/2016-January/025064.html)
+    - [Thread 2](http://www.gluster.org/pipermail/gluster-devel/2017-January/051820.html)
+ 2. [Discussion on Github](https://github.com/gluster/glusterfs/issues/79)
+
diff --git a/done/GlusterFS 3.10/max-opversion.md b/done/GlusterFS 3.10/max-opversion.md
new file mode 100644
index 0000000..16d4ee4
--- /dev/null
+++ b/done/GlusterFS 3.10/max-opversion.md
@@ -0,0 +1,118 @@
+Feature
+-------
+
+Summary
+-------
+
+Support to retrieve the maximum supported op-version (cluster.op-version) in a
+heterogeneous cluster.
+
+Owners
+------
+
+Samikshan Bairagya
+
+Current status
+--------------
+
+Currently users can retrieve the op-version on which a cluster is operating by
+using the gluster volume get command on the global option cluster.op-version as
+follows:
+
+# gluster volume get <volname> cluster.op-version
+
+There is, however, no way for a user to find out the maximum op-version to
+which the cluster could be bumped up.
+
+Related Feature Requests and Bugs
+---------------------------------
+
+https://bugzilla.redhat.com/show_bug.cgi?id=1365822
+
+Detailed Description
+--------------------
+
+A heterogeneous cluster operates on a common op-version that can be supported
+across all the nodes in the trusted storage pool. Upon upgrade of the nodes in
+the cluster, the cluster might support a higher op-version. However, since it
+is currently not possible for the user to get this op-version value, it is
+difficult for them to bump up the op-version of the cluster to the supported
+value.
+
+The maximum supported op-version in a cluster would be the minimum of the
+maximum op-versions supported by each of the nodes. To retrieve this, the
+volume get functionality could be invoked as follows:
+
+# gluster volume get all cluster.max-op-version
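+
+Putting the two together, a typical check-and-bump sequence after an upgrade
+would look roughly like this (the `volume set` invocation is the usual way to
+raise cluster.op-version and is shown here as an illustration):
+
+```sh
+# What the cluster is running at today
+gluster volume get <volname> cluster.op-version
+
+# Highest op-version every node in the pool can support
+gluster volume get all cluster.max-op-version
+
+# If the two differ, bump the cluster to the reported maximum, e.g. 31000
+gluster volume set all cluster.op-version 31000
+```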
+
+Benefit to GlusterFS
+--------------------
+
+This would improve the user experience by making it easier for users to know
+the maximum op-version on which the cluster can operate.
+
+Scope
+-----
+
+#### Nature of proposed change
+
+This adds a new non-settable global option, cluster.max-op-version.
+
+#### Implications on manageability
+
+None.
+
+#### Implications on presentation layer
+
+None.
+
+#### Implications on persistence layer
+
+None.
+
+#### Implications on 'GlusterFS' backend
+
+None.
+
+#### Modification to GlusterFS metadata
+
+None.
+
+#### Implications on 'glusterd'
+
+None.
+
+How To Test
+-----------
+
+This can be tested on a cluster with at least one node running on version 'n+1'
+and others on version 'n' where n = 3.10. The maximum supported op-version
+(cluster.max-op-version) should be returned by `volume get` as n in this case.
+
+User Experience
+---------------
+
+Upon upgrade of one or more nodes in a cluster, users can get the new maximum
+op-version the cluster can support.
+
+Dependencies
+------------
+
+None
+
+Documentation
+-------------
+
+None.
+
+Status
+------
+
+In development.
+
+Comments and Discussion
+-----------------------
+
+ 1. [Discussion on gluster-devel ML](http://www.gluster.org/pipermail/gluster-devel/2016-December/051650.html)
+ 2. [Discussion on Github](https://github.com/gluster/glusterfs/issues/56)
+
diff --git a/done/GlusterFS 3.10/multiplexing.md b/done/GlusterFS 3.10/multiplexing.md
new file mode 100644
index 0000000..fd06150
--- /dev/null
+++ b/done/GlusterFS 3.10/multiplexing.md
@@ -0,0 +1,141 @@
+Feature
+-------
+Brick Multiplexing
+
+Summary
+-------
+
+Use one process (and port) to serve multiple bricks.
+
+Owners
+------
+
+Jeff Darcy (jdarcy@redhat.com)
+
+Current status
+--------------
+
+In development.
+
+Related Feature Requests and Bugs
+---------------------------------
+
+Mostly N/A, except that this will make implementing real QoS easier at some
+point in the future.
+
+Detailed Description
+--------------------
+
+The basic idea is very simple: instead of spawning a new process for every
+brick, we send an RPC to an existing brick process telling it to attach the new
+brick (identified and described by a volfile) beneath its protocol/server
+instance. Likewise, instead of killing a process to terminate a brick, we tell
+it to detach one of its (possibly several) brick translator stacks.
+
+Bricks can *not* share a process if they use incompatible transports (e.g. TLS
+vs. non-TLS). Also, a brick process serving several bricks is a larger failure
+domain than we have with a process per brick, so we might voluntarily decide to
+spawn a new process anyway just to keep the failure domains smaller. Lastly,
+there should always be a fallback to current brick-per-process behavior, by
+simply pretending that all bricks' transports are incompatible with each other.
+
+Benefit to GlusterFS
+--------------------
+
+Multiplexing should significantly reduce resource consumption:
+
+ * Each *process* will consume one TCP port, instead of each *brick* doing so.
+
+ * The cost of global data structures and object pools will be reduced to 1/N
+   of what it is now, where N is the average number of bricks per process.
+
+ * Thread counts will also be reduced to 1/N. This avoids the exponentially
+   bad thrashing effects as the total number of threads far exceeds the number
+   of cores, made worse by multiple processes trying to auto-scale the number
+   of network and disk I/O threads independently.
+
+These resource issues are already limiting the number of bricks and volumes we
+can support. By reducing all forms of resource consumption at once, we should
+be able to raise these user-visible limits by a corresponding amount.
+
+Scope
+-----
+
+#### Nature of proposed change
+
+The largest changes are at the two places where we do brick and process
+management - GlusterD at one end, generic glusterfsd code at the other. The
+new messages require changes to rpc and client/server translator code. The
+server translator needs further changes to look up one among several child
+translators instead of assuming only one. Auth code must be changed to handle
+separate permissions/credentials on each brick.
+
+Beyond these "obvious" changes, many lesser changes will undoubtedly be needed
+anywhere that we make assumptions about the relationships between bricks and
+processes. Anything that involves a "helper" daemon - e.g. self-heal, quota -
+is particularly suspect in this regard.
+
+#### Implications on manageability
+
+The fact that bricks can only share a process when they have compatible
+transports might affect decisions about what transport options to use for
+separate volumes.
+
+#### Implications on presentation layer
+
+N/A
+
+#### Implications on persistence layer
+
+N/A
+
+#### Implications on 'GlusterFS' backend
+
+N/A
+
+#### Modification to GlusterFS metadata
+
+N/A
+
+#### Implications on 'glusterd'
+
+GlusterD changes are integral to this feature, and described above.
+
+How To Test
+-----------
+
+For the most part, testing is of the "do no harm" sort; the most thorough test
+of this feature is to run our current regression suite. Only one additional
+test is needed - create/start a volume with multiple bricks on one node, and
+check that only one glusterfsd process is running.
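+
+A minimal sketch of that check (volume name and brick paths are placeholders):
+
+```sh
+# Create and start a volume with several bricks on this node
+gluster volume create testvol $(hostname):/bricks/b{1,2,3} force
+gluster volume start testvol
+
+# With multiplexing, one glusterfsd should serve all three bricks
+pgrep -c glusterfsd          # expected: 1
+
+# volume status should show the same port and PID for every brick
+gluster volume status testvol
+```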
+
+User Experience
+---------------
+
+Volume status can now include the possibly-surprising result of multiple bricks
+on the same node having the same port number and PID. Anything that relies on
+these values, such as monitoring or automatic firewall configuration (or our
+regression tests) could get confused and/or end up doing the wrong thing.
+
+Dependencies
+------------
+
+N/A
+
+Documentation
+-------------
+
+TBD (very little)
+
+Status
+------
+
+Very basic functionality - starting/stopping bricks along with volumes,
+mounting, doing I/O - works. Some features, especially snapshots, probably do
+not work. Currently running tests to identify the precise extent of needed
+fixes.
+
+Comments and Discussion
+-----------------------
+
+N/A
diff --git a/done/GlusterFS 3.10/readdir-ahead.md b/done/GlusterFS 3.10/readdir-ahead.md
new file mode 100644
index 0000000..71e5b62
--- /dev/null
+++ b/done/GlusterFS 3.10/readdir-ahead.md
@@ -0,0 +1,167 @@
+Feature
+-------
+Improve directory enumeration performance
+
+Summary
+-------
+Improve directory enumeration performance by implementing parallel readdirp
+at the dht layer.
+
+Owners
+------
+
+Raghavendra G
+Poornima G
+Rajesh Joseph
+
+Current status
+--------------
+
+In development.
+
+Related Feature Requests and Bugs
+---------------------------------
+https://bugzilla.redhat.com/show_bug.cgi?id=1401812
+
+Detailed Description
+--------------------
+
+Currently readdirp is sequential at the dht layer.
+This makes find and recursive listing of small directories very slow
+(directories whose contents can be accommodated in one readdirp call,
+e.g. ~600 entries if the buf size is 128k).
+
+The number of readdirp fops required to fetch the ls -l -R for nested
+directories is:
+no. of fops = (x + 1) * m * n
+n = number of bricks
+m = number of directories
+x = number of readdirp calls required to fetch the dentries completely
+(this depends on the size of the directory and the readdirp buf size)
+1 = readdirp fop that is sent just to detect the end of the directory.
+
+Eg: Let's say we list 800 directories with ~300 files each, with a readdirp
+buf size of 128K, on distribute 6:
+(1+1) * 800 * 6 = 9600 fops
+
+And all the readdirp fops are sent in a sequential manner to all the bricks.
+With parallel readdirp, the number of fops may not decrease drastically,
+but since they are issued in parallel, it will increase the throughput.
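+
+The arithmetic can be sanity-checked with a trivial helper (the numbers are the
+example values from this page):
+
+```sh
+# no. of fops = (x + 1) * m * n, per the formula above
+readdirp_fops() {
+    local x=$1 m=$2 n=$3    # readdirp calls per dir, directories, bricks
+    echo $(( (x + 1) * m * n ))
+}
+
+readdirp_fops 1 800 6       # prints 9600, matching the example
+```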
+
+Why it's not a straightforward problem to solve:
+One needs to briefly understand how the directory offset is handled in dht.
+[1], [2], [3] are some links that hint at the same.
+- The d_off is in the order of bricks identified by dht. Hence, the dentries
+should always be returned in the same order as the bricks, i.e. brick2 entries
+shouldn't be returned before brick1 reaches EOD.
+- We cannot store any info about the offset read so far, etc., in inode_ctx or
+fd_ctx.
+- In the case of very large directories, with a readdirp buf too small to hold
+all the dentries in any brick, parallel readdirp is an overhead. Sequential
+readdirp best suits large directories. This demands that dht be aware of, or
+speculate, the directory size.
+
+There were two solutions that we evaluated:
+1. Change dht_readdirp itself to wind readdirp in parallel
+   http://review.gluster.org/15160
+   http://review.gluster.org/15159
+   http://review.gluster.org/15169
+2. Load readdir-ahead as a child of dht
+   http://review.gluster.org/#/q/status:open+project:glusterfs+branch:master+topic:bug-1401812
+
+For the reasons mentioned below, we go with the second approach, suggested by
+Raghavendra G:
+- It requires no or very few changes in dht
+- Along with empty/small directories it also benefits large directories
+The only slightly complicated part would be to tune the readdir-ahead
+buffer size for each instance.
+
+The perf gain observed is directly proportional to:
+- The number of nodes in the cluster/volume
+- The latency between the client and each node in the volume.
+
+Some references:
+[1] http://review.gluster.org/#/c/4711
+[2] https://www.mail-archive.com/gluster-devel@gluster.org/msg02834.html
+[3] http://www.gluster.org/pipermail/gluster-devel/2015-January/043592.html
+
+Benefit to GlusterFS
+--------------------
+
+Improves directory enumeration performance in large clusters.
+
+Scope
+-----
+
+#### Nature of proposed change
+
+- Changes in the readdir-ahead and dht xlators.
+- Change glusterd to load readdir-ahead as a child of dht
+  without breaking upgrade and downgrade scenarios.
+
+#### Implications on manageability
+
+N/A
+
+#### Implications on presentation layer
+
+N/A
+
+#### Implications on persistence layer
+
+N/A
+
+#### Implications on 'GlusterFS' backend
+
+N/A
+
+#### Modification to GlusterFS metadata
+
+N/A
+
+#### Implications on 'glusterd'
+
+GlusterD changes are integral to this feature, and described above.
+
+How To Test
+-----------
+
+For the most part, testing is of the "do no harm" sort; the most thorough test
+of this feature is to run our current regression suite.
+Some specific test cases include readdirp on all kinds of volumes:
+- distribute
+- replicate
+- shard
+- disperse
+- tier
+Also, readdirp while:
+- rebalance is in progress
+- tiering migration is in progress
+- self heal is in progress
+
+All of these test cases should be run while the memory consumption of the
+process is monitored.
+
+User Experience
+---------------
+
+Faster directory enumeration
+
+Dependencies
+------------
+
+N/A
+
+Documentation
+-------------
+
+TBD (very little)
+
+Status
+------
+
+Development in progress
+
+Comments and Discussion
+-----------------------
+
+N/A
diff --git a/done/GlusterFS 3.10/rebalance-estimates.md b/done/GlusterFS 3.10/rebalance-estimates.md
new file mode 100644
index 0000000..2a2c299
--- /dev/null
+++ b/done/GlusterFS 3.10/rebalance-estimates.md
@@ -0,0 +1,128 @@
+Feature
+-------
+
+Summary
+-------
+
+Provide a user interface to determine when the rebalance process will complete
+
+Owners
+------
+Nithya Balachandran
+
+
+Current status
+--------------
+Patch being worked on.
+
+
+Related Feature Requests and Bugs
+---------------------------------
+https://bugzilla.redhat.com/show_bug.cgi?id=1396004
+Desc: RFE: An administrator friendly way to determine rebalance completion time
+
+
+Detailed Description
+--------------------
+The rebalance operation starts a rebalance process on each node of the volume.
+Each process scans the files and directories on the local subvols, fixes the
+layout for each directory and migrates files to their new hashed subvolumes
+based on the new layouts.
+
+Currently we do not have any way to determine how long the rebalance process
+will take to complete.
+
+The proposed approach is as follows:
+
+ 1. Determine the total number of files and directories on the local subvol
+ 2. Calculate the rate at which files have been processed since the rebalance started
+ 3. Calculate the time required to process all the files based on the rate calculated
+ 4. Send these values in the rebalance status response
+ 5. Calculate the maximum time required among all the rebalance processes
+ 6. Display the time required along with the rebalance status output
+
+
+The time taken is a factor of the number and size of the files and the number
+of directories.
+Determining the number of files and directories is difficult as Glusterfs
+currently does not keep track of the number of files on each brick.
+
+The current approach uses the statfs call to determine the number of used
+inodes and uses that number as a rough estimate of how many files/directories
+are present on the brick. However, this number is not very accurate because the
+.glusterfs directory contributes heavily to this number.
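+
+A back-of-the-envelope sketch of that estimate (the variable names and values
+are illustrative; the real numbers come from the rebalance process and statfs):
+
+```sh
+# Inputs: files processed so far, seconds elapsed, and a rough total taken
+# from the used-inode count reported by statfs.
+processed=120000
+elapsed=600                       # seconds since the rebalance started
+total=1500000                     # used inodes on the brick (rough file count)
+
+rate=$(( processed / elapsed ))   # files per second so far
+remaining=$(( (total - processed) / rate ))
+echo "estimated seconds left: ${remaining}"
+```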
+
+Benefit to GlusterFS
+--------------------
+Improves the usability of rebalance operations.
+Administrators can now determine how long a rebalance operation will take to
+complete, allowing better planning.
+
+
+Scope
+-----
+
+#### Nature of proposed change
+
+Modifications required to the rebalance and the cli code.
+
+#### Implications on manageability
+
+The gluster volume rebalance status output will be modified.
+
+#### Implications on presentation layer
+
+None
+
+#### Implications on persistence layer
+
+None
+
+#### Implications on 'GlusterFS' backend
+
+None
+
+#### Modification to GlusterFS metadata
+
+None
+
+#### Implications on 'glusterd'
+
+None
+
+How To Test
+-----------
+
+Run a rebalance and compare the estimates with the time actually taken to
+complete the rebalance.
+
+The feature needs to be tested against large workloads to determine the
+accuracy of the calculated times.
+
+User Experience
+---------------
+
+The gluster volume rebalance status output
+will display the expected time left for the rebalance process to complete.
+
+
+Dependencies
+------------
+
+None
+
+Documentation
+-------------
+
+Documents to be updated with the changes in the rebalance status output.
+
+
+Status
+------
+In development.
+
+
+
+Comments and Discussion
+-----------------------
+
+*Follow here*
diff --git a/done/GlusterFS 3.10/tier_service.md b/done/GlusterFS 3.10/tier_service.md
new file mode 100644
index 0000000..47640ee
--- /dev/null
+++ b/done/GlusterFS 3.10/tier_service.md
@@ -0,0 +1,130 @@
+Feature
+-------
+
+Tier as a daemon with the service framework of gluster.
+
+Summary
+-------
+
+The current tier process uses the same dht code. If any change is made to DHT
+it affects tier, and vice versa. To support add-brick on a tiered
+volume, we need a rebalance daemon. So the current tier daemon has to be
+separated from DHT. The new daemon has therefore been split from DHT and
+brought under the service framework.
+
+Owners
+------
+
+Dan Lambright
+
+Hari Gowtham
+
+Current status
+--------------
+
+In the current code, the tier daemon does not fall under the service framework,
+which makes it hard for gluster to manage. Moving it into gluster's service
+framework makes it easier to manage.
+
+Related Feature Requests and Bugs
+---------------------------------
+
+[BUG] https://bugzilla.redhat.com/show_bug.cgi?id=1313838
+
+Detailed Description
+--------------------
+
+This change is similar to the other daemons that come under the service
+framework. The service framework takes care of:
+
+*) Spawning the daemon, killing it, and other such operations.
+*) Volume set options.
+*) Restarting the daemon at two points
+   1) when gluster goes down and comes up.
+   2) to stop detach tier.
+*) Reconfigure is used to make volfile changes. The reconfigure checks if the
+daemon needs a restart or not and then does it as per the requirement.
+By doing this, we don't restart the daemon every time.
+*) Volume status lists the status of the tier daemon as a process instead of
+a task.
+*) remove-brick and detach tier are separated at the code level.
+
+With this patch the log, pid, and volfile are separated and put into their
+respective directories.
+
+
+Benefit to GlusterFS
+--------------------
+
+Improved stability; helps glusterd manage the daemon during situations
+like updates, node down, and restart.
+
+Scope
+-----
+
+#### Nature of proposed change
+
+A new service will be made available. The existing code will be removed in a
+while to make DHT rebalance easy to maintain, as the DHT and tier code are
+separated.
+
+#### Implications on manageability
+
+The older gluster commands are designed to be compatible with this change.
+
+#### Implications on presentation layer
+
+None.
+
+#### Implications on persistence layer
+
+None.
+
+#### Implications on 'GlusterFS' backend
+
+Remains the same as for Tier.
+
+#### Modification to GlusterFS metadata
+
+None.
+
+#### Implications on 'glusterd'
+
+The data related to tier is made persistent (it will be available after a
+reboot).
+The brick op phase, which is different for Tier (the brick op phase was earlier
+used to communicate with the daemon instead of the bricks), has been
+implemented in the commit phase.
+The volfile changes for setting the options are also taken care of using the
+service framework.
+
+How To Test
+-----------
+
+The basic tier commands need to be tested, as not much changes from the user's
+perspective. The same tests (attaching a tier, detaching it,
+status) used for testing tier have to be used.
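+
+For reference, a sketch of those commands (the `gluster volume tier` syntax is
+assumed to match the 3.10-era CLI; volume and brick names are placeholders):
+
+```sh
+# Attach a hot tier, check the tier daemon's status, then detach it again
+gluster volume tier tvol attach replica 2 node1:/hot/b1 node2:/hot/b1
+gluster volume tier tvol status
+gluster volume tier tvol detach start
+gluster volume tier tvol detach commit
+```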
+
+User Experience
+---------------
+
+No changes.
+
+Dependencies
+------------
+
+None.
+
+Documentation
+-------------
+
+https://docs.google.com/document/d/1_iyjiwTLnBJlCiUgjAWnpnPD801h5LNxLhHmN7zmk1o/edit?usp=sharing
+
+Status
+------
+
+Code being reviewed.
+
+Comments and Discussion
+-----------------------
+
+*Follow here*
diff --git a/under_review/client-opversion.md b/under_review/client-opversion.md
deleted file mode 100644
index 8c9991e..0000000
--- a/under_review/client-opversion.md
+++ /dev/null
@@ -1,111 +0,0 @@
-Feature
--------
-
-Summary
--------
-
-Support to get the op-version information for each client through the volume
-status command.
- -Owners ------- - -Samikshan Bairagya - -Current status --------------- - -Currently the only way to get an idea regarding the version of the connected -clients is to grep for "accepted client from" in /var/log/glusterfs/bricks. -There is no command that gives that information out to the users. - -Related Feature Requests and Bugs ---------------------------------- - -https://bugzilla.redhat.com/show_bug.cgi?id=1409078 - -Detailed Description --------------------- - -The op-version information for each client can be added to the already existing -volume status command. `volume status clients` currently gives the -following information for each client: - -* Hostname:port -* Bytes Read -* Bytes Written - -Benefit to GlusterFS --------------------- - -This would make the user-experience better as it would make it easier for users -to know the op-version of each client from a single command. - -Scope ------ - -#### Nature of proposed change - -Adds more information to `volume status clients` output. - -#### Implications on manageability - -None. - -#### Implications on presentation layer - -None. - -#### Implications on persistence layer - -None. - -#### Implications on 'GlusterFS' backend - -None. - -#### Modification to GlusterFS metadata - -None. - -#### Implications on 'glusterd' - -None. - -How To Test ------------ - -This can be tested by having clients with different glusterfs versions connected -to running volumes, and executing the `volume status clients` -command. - -User Experience ---------------- - -Users can use the `volume status clients` command to get -information on the op-versions for each client along with information that were -already available like (hostname, bytes read and bytes written). - -Dependencies ------------- - -None - -Documentation -------------- - -None. - -Status ------- - -In development. - -Comments and Discussion ------------------------ - - 1. Discussion on gluster-devel ML: - - [Thread 1](http://www.gluster.org/pipermail/gluster-users/2016-January/025064.html) - - [Thread 2](http://www.gluster.org/pipermail/gluster-devel/2017-January/051820.html) - 2. [Discussion on Github](https://github.com/gluster/glusterfs/issues/79) - diff --git a/under_review/max-opversion.md b/under_review/max-opversion.md deleted file mode 100644 index 16d4ee4..0000000 --- a/under_review/max-opversion.md +++ /dev/null @@ -1,118 +0,0 @@ -Feature -------- - -Summary -------- - -Support to retrieve the maximum supported op-version (cluster.op-version) in a -heterogeneous cluster. - -Owners ------- - -Samikshan Bairagya - -Current status --------------- - -Currently users can retrieve the op-version on which a cluster is operating by -using the gluster volume get command on the global option cluster.op-version as -follows: - -# gluster volume get cluster.op-version - -There is however no way for an user to find out the maximum op-version to which -the cluster could be bumped upto. - -Related Feature Requests and Bugs ---------------------------------- - -https://bugzilla.redhat.com/show_bug.cgi?id=1365822 - -Detailed Description --------------------- - -A heterogeneous cluster operates on a common op-version that can be supported -across all the nodes in the trusted storage pool.Upon upgrade of the nodes in -the cluster, the cluster might support a higher op-version. However, since it -is currently not possible for the user to get this op-version value, it is -difficult for them to bump up the op-version of the cluster to the supported -value. 
- -The maximum supported op-version in a cluster would be the minimum of the -maximum op-versions in each of the nodes. To retrieve this, the volume get -functionality could be invoked as follows: - -# gluster volume get all cluster.max-op-version - -Benefit to GlusterFS --------------------- - -This would make the user-experience better as it would make it easier for users -to know the maximum op-version on which the cluster can operate. - -Scope ------ - -#### Nature of proposed change - -This adds a new non-settable global option, cluster.max-op-version. - -#### Implications on manageability - -None. - -#### Implications on presentation layer - -None. - -#### Implications on persistence layer - -None. - -#### Implications on 'GlusterFS' backend - -None. - -#### Modification to GlusterFS metadata - -None. - -#### Implications on 'glusterd' - -None. - -How To Test ------------ - -This can be tested on a cluster with at least one node running on version 'n+1' -and others on version 'n' where n = 3.10. The maximum supported op-version -(cluster.max-op-version) should be returned by `volume get` as n in this case. - -User Experience ---------------- - -Upon upgrade of one or more nodes in a cluster, users can get the new maximum -op-version the cluster can support. - -Dependencies ------------- - -None - -Documentation -------------- - -None. - -Status ------- - -In development. - -Comments and Discussion ------------------------ - - 1. [Discussion on gluster-devel ML](http://www.gluster.org/pipermail/gluster-devel/2016-December/051650.html) - 2. [Discussion on Github](https://github.com/gluster/glusterfs/issues/56) - diff --git a/under_review/multiplexing.md b/under_review/multiplexing.md deleted file mode 100644 index fd06150..0000000 --- a/under_review/multiplexing.md +++ /dev/null @@ -1,141 +0,0 @@ -Feature -------- -Brick Multiplexing - -Summary -------- - -Use one process (and port) to serve multiple bricks. - -Owners ------- - -Jeff Darcy (jdarcy@redhat.com) - -Current status --------------- - -In development. - -Related Feature Requests and Bugs ---------------------------------- - -Mostly N/A, except that this will make implementing real QoS easier at some -point in the future. - -Detailed Description --------------------- - -The basic idea is very simple: instead of spawning a new process for every -brick, we send an RPC to an existing brick process telling it to attach the new -brick (identified and described by a volfile) beneath its protocol/server -instance. Likewise, instead of killing a process to terminate a brick, we tell -it to detach one of its (possibly several) brick translator stacks. - -Bricks can *not* share a process if they use incompatible transports (e.g. TLS -vs. non-TLS). Also, a brick process serving several bricks is a larger failure -domain than we have with a process per brick, so we might voluntarily decide to -spawn a new process anyway just to keep the failure domains smaller. Lastly, -there should always be a fallback to current brick-per-process behavior, by -simply pretending that all bricks' transports are incompatible with each other. - -Benefit to GlusterFS --------------------- - -Multiplexing should significantly reduce resource consumption: - - * Each *process* will consume one TCP port, instead of each *brick* doing so. - - * The cost of global data structures and object pools will be reduced to 1/N - of what it is now, where N is the average number of bricks per process. - - * Thread counts will also be reduced to 1/N. 
This avoids the exponentially - bad thrashing effects as the total number of threads far exceeds the number - of cores, made worse by multiple processes trying to auto-scale the nunber - of network and disk I/O threads independently. - -These resource issues are already limiting the number of bricks and volumes we -can support. By reducing all forms of resource consumption at once, we should -be able to raise these user-visible limits by a corresponding amount. - -Scope ------ - -#### Nature of proposed change - -The largest changes are at the two places where we do brick and process -management - GlusterD at one end, generic glusterfsd code at the other. The -new messages require changes to rpc and client/server translator code. The -server translator needs further changes to look up one among several child -translators instead of assuming only one. Auth code must be changed to handle -separate permissions/credentials on each brick. - -Beyond these "obvious" changes, many lesser changes will undoubtedly be needed -anywhere that we make assumptions about the relationships between bricks and -processes. Anything that involves a "helper" daemon - e.g. self-heal, quota - -is particularly suspect in this regard. - -#### Implications on manageability - -The fact that bricks can only share a process when they have compatible -transports might affect decisions about what transport options to use for -separate volumes. - -#### Implications on presentation layer - -N/A - -#### Implications on persistence layer - -N/A - -#### Implications on 'GlusterFS' backend - -N/A - -#### Modification to GlusterFS metadata - -N/A - -#### Implications on 'glusterd' - -GlusterD changes are integral to this feature, and described above. - -How To Test ------------ - -For the most part, testing is of the "do no harm" sort; the most thorough test -of this feature is to run our current regression suite. Only one additional -test is needed - create/start a volume with multiple bricks on one node, and -check that only one glusterfsd process is running. - -User Experience ---------------- - -Volume status can now include the possibly-surprising result of multiple bricks -on the same node having the same port number and PID. Anything that relies on -these values, such as monitoring or automatic firewall configuration (or our -regression tests) could get confused and/or end up doing the wrong thing. - -Dependencies ------------- - -N/A - -Documentation -------------- - -TBD (very little) - -Status ------- - -Very basic functionality - starting/stopping bricks along with volumes, -mounting, doing I/O - work. Some features, especially snapshots, probably do -not work. Currently running tests to identify the precise extent of needed -fixes. - -Comments and Discussion ------------------------ - -N/A diff --git a/under_review/readdir-ahead.md b/under_review/readdir-ahead.md deleted file mode 100644 index 71e5b62..0000000 --- a/under_review/readdir-ahead.md +++ /dev/null @@ -1,167 +0,0 @@ -Feature -------- -Improve directory enumeration performance - -Summary -------- -Improve directory enumeration performance by implementing parallel readdirp -at the dht layer. - -Owners ------- - -Raghavendra G -Poornima G -Rajesh Joseph - -Current status --------------- - -In development. - -Related Feature Requests and Bugs ---------------------------------- -https://bugzilla.redhat.com/show_bug.cgi?id=1401812 - -Detailed Description --------------------- - -Currently readdirp is sequential at the dht layer. 
-This makes find and recursive listing of small directories very slow -(directory whose content can be accomodated in one readdirp call, -eg: ~600 entries if buf size is 128k). - -The number of readdirp fops required to fetch the ls -l -R for nested -directories is: -no. of fops = (x + 1) * m * n -n = number of bricks -m = number of directories -x = number of readdirp calls required to fetch the dentries completely -(this depends on the size of the directory and the readdirp buf size) -1 = readdirp fop that is sent to just detect the end of directory. - -Eg: Let's say, to list 800 directories with files ~300 each and readdirp -buf size 128K, on distribute 6: -(1+1) * 800 * 6 = 9600 fops - -And all the readdirp fops are sent in sequential manner to all the bricks. -With parallel readdirp, the number of fops may not decrease drastically -but since they are issued in parallel, it will increase the throughput. - -Why its not a straightforward problem to solve: -One needs to briefly understand, how the directory offset is handled in dht. -[1], [2], [3] are some of the links that will hint the same. -- The d_off is in the order of bricks identfied by dht. Hence, the dentries -should always be returned in the same order as bricks. i.e. brick2 entries -shouldn't be returned before brick1 reaches EOD. -- We cannot store any info of offset read so far etc. in inode_ctx or fd_ctx -- In case of a very large directories, and readdirp buf too small to hold -all the dentries in any brick, parallel readdirp is a overhead. Sequential -readdirp best suits the large directories. This demands dht be aware of or -speculate the directory size. - -There were two solutions that we evaluated: -1. Change dht_readdirp itself to wind readdirp parallely - http://review.gluster.org/15160 - http://review.gluster.org/15159 - http://review.gluster.org/15169 -2. Load readd-ahead as a child of dht - http://review.gluster.org/#/q/status:open+project:glusterfs+branch:master+topic:bug-1401812 - -For the below mentioned reasons we go with the second approach suggested by -Ragavendra G: -- It requires nil or very less changes in dht -- Along with empty/small directories it also benifits large directories -The only slightly complecated part would be to tune the readdir-ahead -buffer size for each instance. - -The perf gain observed is directly proportional to the: -- Number of nodes in the cluster/Volume -- Latency between client and each node in the volume. - -Some references: -[1] http://review.gluster.org/#/c/4711 -[2] https://www.mail-archive.com/gluster-devel@gluster.org/msg02834.html -[3] http://www.gluster.org/pipermail/gluster-devel/2015-January/043592.html - -Benefit to GlusterFS --------------------- - -Improves directory enumeration performance in large clusters. - -Scope ------ - -#### Nature of proposed change - -- Changes in readdir-ahead, dht xlators. -- Change glusterd to load readdir-ahead as a child of dht - and without breaking upgrade and downgrade scenarios - -#### Implications on manageability - -N/A - -#### Implications on presentation layer - -N/A - -#### Implications on persistence layer - -N/A - -#### Implications on 'GlusterFS' backend - -N/A - -#### Modification to GlusterFS metadata - -N/A - -#### Implications on 'glusterd' - -GlusterD changes are integral to this feature, and described above. - -How To Test ------------ - -For the most part, testing is of the "do no harm" sort; the most thorough test -of this feature is to run our current regression suite. 
-Some specific test cases include readdirp on all kind of volumes: -- distribute -- replicate -- shard -- disperse -- tier -Also, readdirp while: -- rebalance in progress -- tiering migration in progress -- self heal in progress - -And all the test cases being run while the memory consumption of the process -is monitored. - -User Experience ---------------- - -Faster directory enumeration - -Dependencies ------------- - -N/A - -Documentation -------------- - -TBD (very little) - -Status ------- - -Development in progress - -Comments and Discussion ------------------------ - -N/A diff --git a/under_review/rebalance-estimates.md b/under_review/rebalance-estimates.md deleted file mode 100644 index 2a2c299..0000000 --- a/under_review/rebalance-estimates.md +++ /dev/null @@ -1,128 +0,0 @@ -Feature -------- - -Summary -------- - -Provide a user interface to determine when the rebalance process will complete - -Owners ------- -Nithya Balachandran - - -Current status --------------- -Patch being worked on. - - -Related Feature Requests and Bugs ---------------------------------- -https://bugzilla.redhat.com/show_bug.cgi?id=1396004 -Desc: RFE: An administrator friendly way to determine rebalance completion time - - -Detailed Description --------------------- -The rebalance operation starts a rebalance process on each node of the volume. -Each process scans the files and directories on the local subvols, fixes the layout -for each directory and migrates files to their new hashed subvolumes based on the -new layouts. - -Currently we do not have any way to determine how long the rebalance process will -take to complete. - -The proposed approach is as follows: - - 1. Determine the total number of files and directories on the local subvol - 2. Calculate the rate at which files have been processed since the rebalance started - 3. Calculate the time required to process all the files based on the rate calculated - 4. Send these values in the rebalance status response - 5. Calculate the maximum time required among all the rebalance processes - 6. Display the time required along with the rebalance status output - - -The time taken is a factor or the number and size of the files and the number of directories. -Determining the number of files and directories is difficult as Glusterfs currently -does not keep track of the number of files on each brick. - -The current approach uses the statfs call to determine the number of used inodes -and uses that number as a rough estimate of how many files/directories ae present -on the brick. However, this number is not very accurate because the .glusterfs -directory contributes heavily to this number. - -Benefit to GlusterFS --------------------- -Improves the usability of rebalance operations. -Administrators can now determine how long a rebalance operation will take to complete -allowing better planning. - - -Scope ------ - -#### Nature of proposed change - -Modifications required to the rebalance and the cli code. - -#### Implications on manageability - -gluster volume rebalance status output will be modified - -#### Implications on presentation layer - -None - -#### Implications on persistence layer - -None - -#### Implications on 'GlusterFS' backend - -None - -#### Modification to GlusterFS metadata - -None - -#### Implications on 'glusterd' - -None - -How To Test ------------ - -Run a rebalance and compare the estimates with the time actually taken to complete -the rebalance. 
- -The feature needs to be tested against large workloads to determine the accuracy -of the calculated times. - -User Experience ---------------- - -Gluster volume rebalance status -will display the expected time left for the rebalance process to complete - - -Dependencies ------------- - -None - -Documentation -------------- - -Documents to be updated with the changes in the rebalance status output. - - -Status ------- -In development. - - - -Comments and Discussion ------------------------ - -*Follow here* diff --git a/under_review/tier_service.md b/under_review/tier_service.md deleted file mode 100644 index 47640ee..0000000 --- a/under_review/tier_service.md +++ /dev/null @@ -1,130 +0,0 @@ -Feature -------- - -Tier as a daemon with the service framework of gluster. - -Summary -------- - -Current tier process uses the same dht code. If any change is made to DHT -it affects tier and vice versa. On an attempt to support add brick on tiered -volume, we need a rebalance daemon. So the current tier daemon has to be -separated from DHT. And so the new Daemon has been split from DHT and comes -under the service framework. - -Owners ------- - -Dan Lambright - -Hari Gowtham - -Current status --------------- - -In the current code, it doesn't fall under the service framework and this -makes it hard for gluster to manage the daemon. Moving it into the gluster's -service framework makes it easier to be managed. - -Related Feature Requests and Bugs ---------------------------------- - -[BUG] https://bugzilla.redhat.com/show_bug.cgi?id=1313838 - -Detailed Description --------------------- - -This change is similar to the other daemons that come under service framework. -The service framework takes care of : - -*) Spawning the daemon, killing it and other such processes. -*) Volume set options. -*) Restarting the daemon at two points - 1) when gluster goes down and comes up. - 2) to stop detach tier. -*) Reconfigure is used to make volfile changes. The reconfigure checks if the -daemons needs a restart or not and then does it as per the requirement. -By doing this, we don’t restart the daemon everytime. -*) Volume status lists the status of tier daemon as a process instead of -a task. -*) remove-brick and detach tier are separated from code level. - -With this patch the log, pid, and volfile are separated and put into respective -directories. - - -Benefit to GlusterFS --------------------- - -Improved Stability, helps the glusterd to manage the daemon during situations -like update, node down, and restart. - -Scope ------ - -#### Nature of proposed change - -A new service will be made available. The existing code will be removed in a -while to make DHT rebalance easy to maintain as the DHT and tier code are -separated. - -#### Implications on manageability - -The older gluster commands are designed to be compatible with this change. - -#### Implications on presentation layer - -None. - -#### Implications on persistence layer - -None. - -#### Implications on 'GlusterFS' backend - -Remains the same as for Tier. - -#### Modification to GlusterFS metadata - -None. - -#### Implications on 'glusterd' - -The data related to tier is made persistent (will be available after reboot). -The brick op phase being different for Tier (brick op phase was earlier used -to communicate with the daemon instead of bricks) has been implemented in -the commit phase. -The volfile changes for setting the options are also take care of using the -service framework. 
- -How To Test ------------ - -The basic tier commands need to be tested as it doesn't change much -in the user perspective. The same test (like attaching tier, detaching it, -status) used for testing tier have to be used. - -User Experience ---------------- - -No changes. - -Dependencies ------------- - -None. - -Documentation -------------- - -https://docs.google.com/document/d/1_iyjiwTLnBJlCiUgjAWnpnPD801h5LNxLhHmN7zmk1o/edit?usp=sharing - -Status ------- - -Code being reviewed. - -Comments and Discussion ------------------------ - -*Follow here* -- cgit