1 files changed, 211 insertions, 0 deletions
diff --git a/doc/developer-guide/fuse-interrupt.md b/doc/developer-guide/fuse-interrupt.md
new file mode 100644
index 00000000000..ec991b81ec5
--- /dev/null
+++ b/doc/developer-guide/fuse-interrupt.md
@@ -0,0 +1,211 @@
+# Fuse interrupt handling
+
+## Conventions followed
+
+- *FUSE* refers to the "wire protocol" between kernel and userspace and
+  related specifications.
+- *fuse* refers to the kernel subsystem and also to the GlusterFs translator.
+
+## FUSE interrupt handling spec
+
+The [Linux kernel FUSE documentation](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/fuse.txt?h=v4.18#n148)
+desrcibes how interrupt handling happens in fuse.
+
+## Interrupt handling in the fuse translator
+
+### Declarations
+
+This document describes the internal API in the fuse translator with which
+interrupt can be handled.
+
+The API being internal (to be used only in fuse-bridge.c; the functions are
+not exported to a header file).
+
+```
+enum fuse_interrupt_state {
+    /* ... */
+    INTERRUPT_SQUELCHED,
+    INTERRUPT_HANDLED,
+    /* ... */
+};
+typedef enum fuse_interrupt_state fuse_interrupt_state_t;
+struct fuse_interrupt_record;
+typedef struct fuse_interrupt_record fuse_interrupt_record_t;
+typedef void (*fuse_interrupt_handler_t)(xlator_t *this,
+                                         fuse_interrupt_record_t *);
+struct fuse_interrupt_record {
+    fuse_in_header_t fuse_in_header;
+    void *data;
+    /*
+       ...
+     */
+};
+
+fuse_interrupt_record_t *
+fuse_interrupt_record_new(fuse_in_header_t *finh,
+                          fuse_interrupt_handler_t handler);
+
+void
+fuse_interrupt_record_insert(xlator_t *this, fuse_interrupt_record_t *fir);
+
+gf_boolean_t
+fuse_interrupt_finish_fop(call_frame_t *frame, xlator_t *this,
+                          gf_boolean_t sync, void **datap);
+
+void
+fuse_interrupt_finish_interrupt(xlator_t *this, fuse_interrupt_record_t *fir,
+                                fuse_interrupt_state_t intstat,
+                                gf_boolean_t sync, void **datap);
+```
+
+The code demonstrates the usage of the API through `fuse_flush()`. (It's a
+dummy implementation only for demonstration purposes.) Flush is chosen
+because a `FLUSH` interrupt is easy to trigger (see
+*tests/features/interrupt.t*). Interrupt handling for flush is switched on
+by `--fuse-flush-handle-interrupt` (a hidden glusterfs command line flag).
+The implementation of flush interrupt is contained in the
+`fuse_flush_interrupt_handler()` function and blocks guarded by the
+
+```
+if (priv->flush_handle_interrupt) { ...
+```
+
+conditional (where `priv` is a `*fuse_private_t`).
+
+### Overview
+
+"Regular" fuse fops and interrupt handlers interact via a list containing
+interrupt records.
+
+If a fop wishes to have its interrupts handled, it needs to set up an
+interrupt record and insert it into the list; also when it's to finish
+(ie. in its "cbk" stage) it needs to delete the record from the list.
+
+If no interrupt happens, basically that's all to it - a list insertion
+and deletion.
+
+However, if an interrupt comes for the fop, the interrupt FUSE request
+will carry the data identifying an ongoing fop (that is, its `unique`),
+and based on that, the interrupt record will be looked up in the list, and
+the specific interrupt handler (a member of the interrupt record) will be
+called.
+
+Usually the fop needs to share some data with the interrupt handler to
+enable it to perform its task (also shared via the interrupt record).
+The interrupt API offers two approaches to manage shared data:
+- _Async or reference-counting strategy_: from the point on when the interrupt
+  record is inserted to the list, it's owned jointly by the regular fop and
+  the prospective interrupt handler. Both of them need to check before they
+  return if the other is still holding a reference; if not, then they are
+  responsible for reclaiming the shared data.
+- _Sync or borrow strategy_: the interrupt handler is considered a borrower
+  of the shared data. The interrupt handler should not reclaim the shared
+  data. The fop will wait for the interrupt handler to finish (ie., the borrow
+  to be returned), then it has to reclaim the shared data.
+
+The user of the interrupt API need to call the following functions to
+instrument this control flow:
+- `fuse_interrupt_record_insert()` in the fop to insert the interrupt record to
+   the list;
+- `fuse_interrupt_finish_fop()`in the fop (cbk) and
+- `fuse_interrupt_finish_interrupt()`in the interrupt handler
+
+to perform needed synchronization at the end their tenure. The data management
+strategies are implemented by the `fuse_interrupt_finish_*()` functions (which
+have an argument to specify which strategy to use); these routines take care
+of freeing the interrupt record itself, while the reclamation of the shared data
+is left to the API user.
+
+### Usage
+
+A given FUSE fop can be enabled to handle interrupts via the following
+steps:
+
+- Define a handler function (of type `fuse_interrupt_handler_t`).
+  It should implement the interrupt handling logic and in the end
+  call (directly or as async callback) `fuse_interrupt_finish_interrupt()`.
+  The `intstat` argument to `fuse_interrupt_finish_interrupt` should be
+  either `INTERRUPT_SQUELCHED` or `INTERRUPT_HANDLED`.
+    - `INTERRUPT_SQUELCHED` means that the interrupt could not be delivered
+      and the fop is going on uninterrupted.
+    - `INTERRUPT_HANDLED` means that the interrupt was actually handled. In
+      this case the fop will be answered from interrupt context with errno
+      `EINTR` (that is, the fop should not send a response to the kernel).
+
+  (the enum `fuse_interrupt_state` includes further members, which are reserved
+  for internal use).
+
+  We return to the `sync` and `datap` arguments later.
+- In the `fuse_<FOP>` function create an interrupt record using
+  `fuse_interrupt_record_new()`, passing the incoming `fuse_in_header` and
+  the above handler function to it.
+    - Arbitrary further data can be referred to via the `data` member of the
+      interrupt record that is to be passed on from fop context to
+      interrupt context.
+- When it's set up, pass the interrupt record to
+  `fuse_interrupt_record_insert()`.
+- In `fuse_<FOP>_cbk` call `fuse_interrupt_finish_fop()`.
+    - `fuse_interrupt_finish_fop()` returns a Boolean according to whether the
+      interrupt was handled. If it was, then the FUSE request is already
+      answered and the stack gets destroyed in `fuse_interrupt_finish_fop` so
+      `fuse_<FOP>_cbk()` can just return (zero). Otherwise follow the standard
+      cbk logic (answer the FUSE request and destroy the stack -- these are
+      typically accomplished by `fuse_err_cbk()`).
+- The last two argument of `fuse_interrupt_finish_fop()` and
+  `fuse_interrupt_finish_interrupt()` are `gf_boolean_t sync` and
+  `void **datap`.
+    - `sync` represents the strategy for freeing the interrupt record. The
+      interrupt handler and the fop handler are in race to get at the interrupt
+      record first (interrupt handler for purposes of doing the interrupt
+      handling, fop handler for purposes of deactivating the interrupt record
+      upon completion of the fop handling).
+        - If `sync` is true, then the fop handler will wait for the interrupt
+          handler to finish and it takes care of freeing.
+        - If `sync` is false, the loser of the above race will perform freeing.
+
+      Freeing is done within the respective interrupt finish routines, except
+      for the `data` field of the interrupt record; with respect to that, see
+      the discussion of the `datap` parameter below. The strategy has to be
+      consensual, that is, `fuse_interrupt_finish_fop()` and
+      `fuse_interrupt_finish_interrupt()` must pass the same value for `sync`.
+      If dismantling the resources associated with the interrupt record is
+      simple, `sync = _gf_false` is the suggested choice; `sync = _gf_true` can
+      be useful in the opposite case, when dismantling those resources would
+      be inconvenient to implement in two places or to enact in non-fop context.
+    - If `datap` is `NULL`, the `data` member of the interrupt record will be
+      freed within the interrupt finish routine.  If it points to a valid
+      `void *` pointer, and if caller is doing the cleanup (see `sync` above),
+      then that pointer will be directed to the `data` member of the interrupt
+      record and it's up to the caller what it's doing with it.
+        - If `sync` is true, interrupt handler can use `datap = NULL`, and
+          fop handler will have `datap` point to a valid pointer.
+        - If `sync` is false, and handlers pass a pointer to a pointer for
+          `datap`, they should check if the pointed pointer is NULL before
+          attempting to deal with the data.
+
+### FUSE answer for the interrupted fop
+
+The kernel acknowledges a successful interruption for a given FUSE request
+if the filesystem daemon answers it with errno EINTR; upon that, the syscall
+which induced the request will be abruptly terminated with an interrupt, rather
+than returning a value.
+
+In glusterfs, this can be arranged in two ways.
+
+- If the interrupt handler wins the race for the interrupt record, ie.
+  `fuse_interrupt_finish_fop()` returns true to `fuse_<FOP>_cbk()`, then, as
+  said above, `fuse_<FOP>_cbk()` does not need to answer the FUSE request.
+  That's because then the interrupt handler will take care about answering
+  it (with errno EINTR).
+- If `fuse_interrupt_finish_fop()` returns false to `fuse_<FOP>_cbk()`, then
+  this return value does not inform the fop handler whether there was an interrupt
+  or not. This return value occurs both when fop handler won the race for the
+  interrupt record against the interrupt handler, and when there was no interrupt
+  at all.
+
+  However, the internal logic of the fop handler might detect from other
+  circumstances that an interrupt was delivered. For example, the fop handler
+  might be sleeping, waiting for some data to arrive, so that a premature
+  wakeup (with no data present) occurs if the interrupt handler intervenes. In
+  such cases it's the responsibility of the fop handler to reply the FUSE
+  request with errro EINTR.