doc/developer-guide/fuse-interrupt.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211

# Fuse interrupt handling

## Conventions followed

- *FUSE* refers to the "wire protocol" between kernel and userspace and
  related specifications.
- *fuse* refers to the kernel subsystem and also to the GlusterFs translator.

## FUSE interrupt handling spec

The [Linux kernel FUSE documentation](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/fuse.txt?h=v4.18#n148)
desrcibes how interrupt handling happens in fuse.

## Interrupt handling in the fuse translator

### Declarations

This document describes the internal API in the fuse translator with which
interrupt can be handled.

The API being internal (to be used only in fuse-bridge.c; the functions are
not exported to a header file).

```
enum fuse_interrupt_state {
    /* ... */
    INTERRUPT_SQUELCHED,
    INTERRUPT_HANDLED,
    /* ... */
};
typedef enum fuse_interrupt_state fuse_interrupt_state_t;
struct fuse_interrupt_record;
typedef struct fuse_interrupt_record fuse_interrupt_record_t;
typedef void (*fuse_interrupt_handler_t)(xlator_t *this,
                                         fuse_interrupt_record_t *);
struct fuse_interrupt_record {
    fuse_in_header_t fuse_in_header;
    void *data;
    /*
       ...
     */
};

fuse_interrupt_record_t *
fuse_interrupt_record_new(fuse_in_header_t *finh,
                          fuse_interrupt_handler_t handler);

void
fuse_interrupt_record_insert(xlator_t *this, fuse_interrupt_record_t *fir);

gf_boolean_t
fuse_interrupt_finish_fop(call_frame_t *frame, xlator_t *this,
                          gf_boolean_t sync, void **datap);

void
fuse_interrupt_finish_interrupt(xlator_t *this, fuse_interrupt_record_t *fir,
                                fuse_interrupt_state_t intstat,
                                gf_boolean_t sync, void **datap);
```

The code demonstrates the usage of the API through `fuse_flush()`. (It's a
dummy implementation only for demonstration purposes.) Flush is chosen
because a `FLUSH` interrupt is easy to trigger (see
*tests/features/interrupt.t*). Interrupt handling for flush is switched on
by `--fuse-flush-handle-interrupt` (a hidden glusterfs command line flag).
The implementation of flush interrupt is contained in the
`fuse_flush_interrupt_handler()` function and blocks guarded by the

```
if (priv->flush_handle_interrupt) { ...
```

conditional (where `priv` is a `*fuse_private_t`).

### Overview

"Regular" fuse fops and interrupt handlers interact via a list containing
interrupt records.

If a fop wishes to have its interrupts handled, it needs to set up an
interrupt record and insert it into the list; also when it's to finish
(ie. in its "cbk" stage) it needs to delete the record from the list.

If no interrupt happens, basically that's all to it - a list insertion
and deletion.

However, if an interrupt comes for the fop, the interrupt FUSE request
will carry the data identifying an ongoing fop (that is, its `unique`),
and based on that, the interrupt record will be looked up in the list, and
the specific interrupt handler (a member of the interrupt record) will be
called.

Usually the fop needs to share some data with the interrupt handler to
enable it to perform its task (also shared via the interrupt record).
The interrupt API offers two approaches to manage shared data:
- _Async or reference-counting strategy_: from the point on when the interrupt
  record is inserted to the list, it's owned jointly by the regular fop and
  the prospective interrupt handler. Both of them need to check before they
  return if the other is still holding a reference; if not, then they are
  responsible for reclaiming the shared data.
- _Sync or borrow strategy_: the interrupt handler is considered a borrower
  of the shared data. The interrupt handler should not reclaim the shared
  data. The fop will wait for the interrupt handler to finish (ie., the borrow
  to be returned), then it has to reclaim the shared data.

The user of the interrupt API need to call the following functions to
instrument this control flow:
- `fuse_interrupt_record_insert()` in the fop to insert the interrupt record to
   the list;
- `fuse_interrupt_finish_fop()`in the fop (cbk) and
- `fuse_interrupt_finish_interrupt()`in the interrupt handler

to perform needed synchronization at the end their tenure. The data management
strategies are implemented by the `fuse_interrupt_finish_*()` functions (which
have an argument to specify which strategy to use); these routines take care
of freeing the interrupt record itself, while the reclamation of the shared data
is left to the API user.

### Usage

A given FUSE fop can be enabled to handle interrupts via the following
steps:

- Define a handler function (of type `fuse_interrupt_handler_t`).
  It should implement the interrupt handling logic and in the end
  call (directly or as async callback) `fuse_interrupt_finish_interrupt()`.
  The `intstat` argument to `fuse_interrupt_finish_interrupt` should be
  either `INTERRUPT_SQUELCHED` or `INTERRUPT_HANDLED`.
    - `INTERRUPT_SQUELCHED` means that the interrupt could not be delivered
      and the fop is going on uninterrupted.
    - `INTERRUPT_HANDLED` means that the interrupt was actually handled. In
      this case the fop will be answered from interrupt context with errno
      `EINTR` (that is, the fop should not send a response to the kernel).

  (the enum `fuse_interrupt_state` includes further members, which are reserved
  for internal use).

  We return to the `sync` and `datap` arguments later.
- In the `fuse_<FOP>` function create an interrupt record using
  `fuse_interrupt_record_new()`, passing the incoming `fuse_in_header` and
  the above handler function to it.
    - Arbitrary further data can be referred to via the `data` member of the
      interrupt record that is to be passed on from fop context to
      interrupt context.
- When it's set up, pass the interrupt record to
  `fuse_interrupt_record_insert()`.
- In `fuse_<FOP>_cbk` call `fuse_interrupt_finish_fop()`.
    - `fuse_interrupt_finish_fop()` returns a Boolean according to whether the
      interrupt was handled. If it was, then the FUSE request is already
      answered and the stack gets destroyed in `fuse_interrupt_finish_fop` so
      `fuse_<FOP>_cbk()` can just return (zero). Otherwise follow the standard
      cbk logic (answer the FUSE request and destroy the stack -- these are
      typically accomplished by `fuse_err_cbk()`).
- The last two argument of `fuse_interrupt_finish_fop()` and
  `fuse_interrupt_finish_interrupt()` are `gf_boolean_t sync` and
  `void **datap`.
    - `sync` represents the strategy for freeing the interrupt record. The
      interrupt handler and the fop handler are in race to get at the interrupt
      record first (interrupt handler for purposes of doing the interrupt
      handling, fop handler for purposes of deactivating the interrupt record
      upon completion of the fop handling).
        - If `sync` is true, then the fop handler will wait for the interrupt
          handler to finish and it takes care of freeing.
        - If `sync` is false, the loser of the above race will perform freeing.

      Freeing is done within the respective interrupt finish routines, except
      for the `data` field of the interrupt record; with respect to that, see
      the discussion of the `datap` parameter below. The strategy has to be
      consensual, that is, `fuse_interrupt_finish_fop()` and
      `fuse_interrupt_finish_interrupt()` must pass the same value for `sync`.
      If dismantling the resources associated with the interrupt record is
      simple, `sync = _gf_false` is the suggested choice; `sync = _gf_true` can
      be useful in the opposite case, when dismantling those resources would
      be inconvenient to implement in two places or to enact in non-fop context.
    - If `datap` is `NULL`, the `data` member of the interrupt record will be
      freed within the interrupt finish routine.  If it points to a valid
      `void *` pointer, and if caller is doing the cleanup (see `sync` above),
      then that pointer will be directed to the `data` member of the interrupt
      record and it's up to the caller what it's doing with it.
        - If `sync` is true, interrupt handler can use `datap = NULL`, and
          fop handler will have `datap` point to a valid pointer.
        - If `sync` is false, and handlers pass a pointer to a pointer for
          `datap`, they should check if the pointed pointer is NULL before
          attempting to deal with the data.

### FUSE answer for the interrupted fop

The kernel acknowledges a successful interruption for a given FUSE request
if the filesystem daemon answers it with errno EINTR; upon that, the syscall
which induced the request will be abruptly terminated with an interrupt, rather
than returning a value.

In glusterfs, this can be arranged in two ways.

- If the interrupt handler wins the race for the interrupt record, ie.
  `fuse_interrupt_finish_fop()` returns true to `fuse_<FOP>_cbk()`, then, as
  said above, `fuse_<FOP>_cbk()` does not need to answer the FUSE request.
  That's because then the interrupt handler will take care about answering
  it (with errno EINTR).
- If `fuse_interrupt_finish_fop()` returns false to `fuse_<FOP>_cbk()`, then
  this return value does not inform the fop handler whether there was an interrupt
  or not. This return value occurs both when fop handler won the race for the
  interrupt record against the interrupt handler, and when there was no interrupt
  at all.

  However, the internal logic of the fop handler might detect from other
  circumstances that an interrupt was delivered. For example, the fop handler
  might be sleeping, waiting for some data to arrive, so that a premature
  wakeup (with no data present) occurs if the interrupt handler intervenes. In
  such cases it's the responsibility of the fop handler to reply the FUSE
  request with errro EINTR.