summaryrefslogtreecommitdiffstats
path: root/doc/hacker-guide/write-behind.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/hacker-guide/write-behind.txt')
-rw-r--r--doc/hacker-guide/write-behind.txt45
1 files changed, 45 insertions, 0 deletions
diff --git a/doc/hacker-guide/write-behind.txt b/doc/hacker-guide/write-behind.txt
new file mode 100644
index 00000000000..498e95480ae
--- /dev/null
+++ b/doc/hacker-guide/write-behind.txt
@@ -0,0 +1,45 @@
+basic working
+--------------
+
+ write behind is basically a translator to lie to the application that the write-requests are finished, even before it is actually finished.
+
+ on a regular translator tree without write-behind, control flow is like this:
+
+ 1. application makes a write() system call.
+ 2. VFS ==> FUSE ==> /dev/fuse.
+ 3. fuse-bridge initiates a glusterfs writev() call.
+ 4. writev() is STACK_WIND()ed upto client-protocol or storage translator.
+ 5. client-protocol, on recieving reply from server, starts STACK_UNWIND() towards the fuse-bridge.
+
+ on a translator tree with write-behind, control flow is like this:
+
+ 1. application makes a write() system call.
+ 2. VFS ==> FUSE ==> /dev/fuse.
+ 3. fuse-bridge initiates a glusterfs writev() call.
+ 4. writev() is STACK_WIND()ed upto write-behind translator.
+ 5. write-behind adds the write buffer to its internal queue and does a STACK_UNWIND() towards the fuse-bridge.
+
+ write call is completed in application's percepective. after STACK_UNWIND()ing towards the fuse-bridge, write-behind initiates a fresh writev() call to its child translator, whose replies will be consumed by write-behind itself. write-behind _doesn't_ cache the write buffer, unless 'option flush-behind on' is specified in volume specification file.
+
+windowing
+---------
+
+ write respect to write-behind, each write-buffer has three flags: 'stack_wound', 'write_behind' and 'got_reply'.
+
+ stack_wound: if set, indicates that write-behind has initiated STACK_WIND() towards child translator.
+
+ write_behind: if set, indicates that write-behind has done STACK_UNWIND() towards fuse-bridge.
+
+ got_reply: if set, indicates that write-behind has recieved reply from child translator for a writev() STACK_WIND(). a request will be destroyed by write-behind only if this flag is set.
+
+ currently pending write requests = aggregate size of requests with write_behind = 1 and got_reply = 0.
+
+ window size limits the aggregate size of currently pending write requests. once the pending requests' size has reached the window size, write-behind blocks writev() calls from fuse-bridge.
+ blocking is only from application's perspective. write-behind does STACK_WIND() to child translator straight-away, but hold behind the STACK_UNWIND() towards fuse-bridge. STACK_UNWIND() is done only once write-behind gets enough replies to accomodate for currently blocked request.
+
+flush behind
+------------
+
+ if 'option flush-behind on' is specified in volume specification file, then write-behind sends aggregate write requests to child translator, instead of regular per request STACK_WIND()s.
+
+