From 1ab87415ec80c541cfba9e86823ebb4d6ffbc5dc Mon Sep 17 00:00:00 2001
From: Jeff Darcy
Date: Thu, 28 Jul 2016 12:58:22 -0400
Subject: Add brick-multiplexing page.

Signed-off-by: Jeff Darcy
Change-Id: I4ea5a4c0bd78a140bf8e94ef614bb5a4f1917a3f
Reviewed-on: http://review.gluster.org/15038
Reviewed-by: Niels de Vos
Reviewed-by: Joe Julian
Reviewed-by: Vijay Bellur
Reviewed-by: Shyamsundar Ranganathan
Tested-by: Shyamsundar Ranganathan
---
 under_review/multiplexing.md | 141 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 141 insertions(+)
 create mode 100644 under_review/multiplexing.md

Feature
-------
Brick Multiplexing

Summary
-------

Use one process (and port) to serve multiple bricks.

Owners
------

Jeff Darcy (jdarcy@redhat.com)

Current status
--------------

In development.

Related Feature Requests and Bugs
---------------------------------

Mostly N/A, except that this will make implementing real QoS easier at some
point in the future.

Detailed Description
--------------------

The basic idea is very simple: instead of spawning a new process for every
brick, we send an RPC to an existing brick process telling it to attach the
new brick (identified and described by a volfile) beneath its protocol/server
instance. Likewise, instead of killing a process to terminate a brick, we tell
it to detach one of its (possibly several) brick translator stacks.

Bricks can *not* share a process if they use incompatible transports (e.g. TLS
vs. non-TLS). Also, a brick process serving several bricks is a larger failure
domain than one process per brick, so we might voluntarily decide to spawn a
new process anyway just to keep the failure domains smaller. Lastly, there
should always be a fallback to the current brick-per-process behavior,
achieved by simply pretending that all bricks' transports are incompatible
with each other.

Benefit to GlusterFS
--------------------

Multiplexing should significantly reduce resource consumption:

 * Each *process* will consume one TCP port, instead of each *brick* doing so.

 * The cost of global data structures and object pools will be reduced to 1/N
   of what it is now, where N is the average number of bricks per process.

 * Thread counts will also be reduced to 1/N. This avoids the severe thrashing
   that sets in when the total number of threads far exceeds the number of
   cores, made worse by multiple processes trying to auto-scale the number of
   network and disk I/O threads independently.

These resource issues already limit the number of bricks and volumes we can
support. By reducing all forms of resource consumption at once, we should be
able to raise these user-visible limits by a corresponding amount.

Scope
-----

#### Nature of proposed change

The largest changes are at the two places where we do brick and process
management: GlusterD at one end and the generic glusterfsd code at the other.
The new messages require changes to the RPC and client/server translator code.
The server translator needs further changes to look up one among several child
translators instead of assuming only one. Auth code must be changed to handle
separate permissions/credentials on each brick.
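
To make these new attach and detach messages concrete, here is a minimal
conceptual sketch (plain Python, not GlusterFS code; every name in it is
invented for illustration) of a brick process that hosts several brick
translator stacks and attaches or detaches them on request instead of being
started and stopped:

```python
# Conceptual sketch only, not GlusterFS code: models one brick process
# (one port) hosting several brick translator stacks, keyed by brick path.
class BrickStack:
    """Stands in for the translator graph built from one brick's volfile."""
    def __init__(self, volfile_id, path):
        self.volfile_id = volfile_id
        self.path = path


class BrickProcess:
    """Stands in for a single glusterfsd serving many bricks on one port."""
    def __init__(self, port):
        self.port = port
        self.bricks = {}  # brick path -> BrickStack

    def attach(self, volfile_id, path):
        # Analogue of the proposed "attach" RPC: build the brick's graph
        # from its volfile and hang it beneath the existing protocol/server.
        if path in self.bricks:
            raise ValueError("brick already attached: " + path)
        self.bricks[path] = BrickStack(volfile_id, path)

    def detach(self, path):
        # Analogue of the proposed "detach" RPC: tear down only this brick's
        # translator stack; the process and its port stay up for the rest.
        del self.bricks[path]
        return len(self.bricks)  # at zero, the caller may exit the process


proc = BrickProcess(port=49152)
proc.attach("testvol.node1.bricks-b1", "/bricks/b1")
proc.attach("testvol.node1.bricks-b2", "/bricks/b2")
proc.detach("/bricks/b1")
```

The point of the sketch is the shape of the state: one process and one port,
plus a per-brick map that the attach and detach messages operate on. The real
implementation has to do the equivalent under the server translator, with all
of the graph construction, locking, and cleanup that entails.
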
Beyond these "obvious" changes, many lesser changes will undoubtedly be needed
anywhere that we make assumptions about the relationships between bricks and
processes. Anything that involves a "helper" daemon (e.g. self-heal, quota) is
particularly suspect in this regard.

#### Implications on manageability

The fact that bricks can only share a process when they have compatible
transports might affect decisions about which transport options to use for
separate volumes.

#### Implications on presentation layer

N/A

#### Implications on persistence layer

N/A

#### Implications on 'GlusterFS' backend

N/A

#### Modification to GlusterFS metadata

N/A

#### Implications on 'glusterd'

GlusterD changes are integral to this feature and are described above.

How To Test
-----------

For the most part, testing is of the "do no harm" sort; the most thorough test
of this feature is to run our current regression suite. Only one additional
test is needed: create and start a volume with multiple bricks on one node,
then check that only one glusterfsd process is running (a rough sketch of such
a check appears at the end of this page).

User Experience
---------------

Volume status can now include the possibly surprising result of multiple
bricks on the same node having the same port number and PID. Anything that
relies on these values, such as monitoring or automatic firewall configuration
(or our own regression tests), could get confused and/or end up doing the
wrong thing.

Dependencies
------------

N/A

Documentation
-------------

TBD (very little)

Status
------

Very basic functionality (starting and stopping bricks along with volumes,
mounting, doing I/O) works. Some features, especially snapshots, probably do
not work. Tests are currently running to identify the precise extent of the
fixes needed.

Comments and Discussion
-----------------------

N/A
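
As promised under "How To Test", here is a rough sketch of the extra check
(in Python, assuming a single-node setup where every brick of the test volume
lives on the local host); it is illustrative only, not part of the existing
regression suite:

```python
#!/usr/bin/env python3
# Rough single-node check: with brick multiplexing, all local bricks should
# be served by a single glusterfsd process.  Illustrative only.
import subprocess

def glusterfsd_pids():
    """Return the PIDs of all running glusterfsd (brick) processes."""
    result = subprocess.run(["pgrep", "-x", "glusterfsd"],
                            capture_output=True, text=True)
    return [int(pid) for pid in result.stdout.split()]

pids = glusterfsd_pids()
print("local brick processes:", pids)
assert len(pids) == 1, "expected one multiplexed brick process, got %d" % len(pids)
```

A stricter version could also confirm, via `gluster volume status`, that all
of the volume's local bricks report the same port and PID.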