Feature
-------
Provide mandatory lock support for multiprotocol environments.

Summary
-------
POSIX.1 does not specify any scheme for mandatory locking. The Linux kernel,
however, supports mandatory locks based on file mode bits, as explained
at <https://www.kernel.org/doc/Documentation/filesystems/mandatory-locking.txt>.
The proposed feature does not adhere completely to the semantics described
by the Linux kernel. Instead, we enforce core mandatory lock semantics at
byte-range granularity, as detailed below, without the help of file mode bits.
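
For context, the file-mode-bit convention referred to above marks a file as a
mandatory locking candidate when its set-group-ID bit is set and its
group-execute bit is cleared. A minimal sketch of that check follows; the
function name `is_mandatory_lock_candidate` is hypothetical and is not part of
GlusterFS or the kernel.

```python
import stat


def is_mandatory_lock_candidate(mode: int) -> bool:
    """Return True when the mode bits follow the Linux kernel convention:
    set-group-ID bit set and group-execute bit cleared.

    Hypothetical helper for illustration only.
    """
    setgid_set = bool(mode & stat.S_ISGID)
    group_exec_set = bool(mode & stat.S_IXGRP)
    return setgid_set and not group_exec_set


# Mode 2644: setgid set, group-execute cleared -> candidate.
print(is_mandatory_lock_candidate(0o2644))
# Mode 2755: setgid set, but group-execute also set -> not a candidate.
print(is_mandatory_lock_candidate(0o2755))
```

The proposed feature deliberately avoids depending on this check for the
'optimal' and 'forced' modes described later.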

Owners
------
Anoop C S               <anoopcs@redhat.com>

Poornima G              <pgurusid@redhat.com>

Raghavendra Talur       <rtalur@redhat.com>

Rajesh Joseph           <rjoseph@redhat.com>

Current status
--------------
As of now, the POSIX locks translator is loaded on the server side to handle
advisory in-memory locks, and we have support for mandatory locks according
to Linux kernel semantics. But as we integrate GlusterFS with other protocols
like NFS or SMB, the default POSIX-style advisory locks (or even mandatory
locks as defined by the Linux kernel) are not enough: we will have to let
applications use mandatory locks independent of the file mode bits defined by
the Linux kernel. This is a hard requirement for GlusterFS to cope with a
multiprotocol environment.

Related Feature Requests and Bugs
---------------------------------
* BZ 762184  - Support mandatory locking in glusterfs

    <https://bugzilla.redhat.com/show_bug.cgi?id=762184>

* BZ 1194546 - Write behind returns success for a write irrespective of a
                conflicting lock held by another application.

    <https://bugzilla.redhat.com/show_bug.cgi?id=1194546>

* BZ 1287099 - Race between mandatory lock request and ongoing read/write

    <https://bugzilla.redhat.com/show_bug.cgi?id=1287099>

Detailed Description
--------------------
*   By default, mandatory locking is disabled for a volume. With all the
    patches listed towards the end, four modes ('off', 'file', 'forced' and
    'optimal') are available for the volume set option 'mandatory-locking',
    which is initially set to 'off'.

*   Following the POSIX standard, lock requests from all glusterfs native
    clients are treated as advisory locks. If the cluster is accessed only by
    native clients, concurrent access to a single file can be controlled by
    setting the group-id bit in its file mode while removing the group-execute
    bit (as per the mandatory locks documentation for the Linux kernel). This
    behaviour is selected by setting the mandatory locking option to 'file'
    (implementation not yet done). As a result, normal fcntl locks on files
    whose mode bits are set this way are treated as mandatory, and a
    byte-range conflict check is performed during every data-modifying fop.

*   With other protocol clients accessing the cluster, the 'forced' mode
    provides stronger protection for files residing in a volume. It enforces
    volume-wide mandatory lock behaviour by checking for conflicting advisory
    or mandatory byte-range locks before every read/write/truncate/zerofill
    fop.

*   In a similar environment with different protocol clients accessing the
    cluster, an 'optimal' mode is more suitable based on the following
    semantics:

    * Each read/write/truncate checks for conflicting mandatory locks
    and the fop is blocked or allowed accordingly.

    * Locks from POSIX clients are always advisory, and any other POSIX
    client can read/write/truncate without a conflicting/overlapping
    byte-range check.

    * No client is allowed to read from or write to a region where a
    conflicting/overlapping mandatory byte-range lock is held by another
    client.

    * Any gfapi client that requires a mandatory lock on a particular
    byte range will have to use the glfs_common_lock() API to do so.

*   Mandatory locks can be of two types, shared and exclusive, with
    semantics similar to the current fcntl locks. The following table explains
    the extra checks made when various fops act on an overlapping byte range
    of a particular file:


            +----------------------------+-------------+----------------+
            | Incoming FOP/Existing LOCK | SHARED_LOCK | EXCLUSIVE_LOCK |
            +----------------------------+-------------+----------------+
            |  READ                      |  Success    |   Wait/Fail    |
            |  WRITE/FTRUNCATE/ZEROFILL  |  Wait/Fail  |   Wait/Fail    |
            |  SHARED_LOCK               |  Granted    |   Wait/Fail    |
            |  EXCLUSIVE_LOCK            |  Wait/Fail  |   Wait/Fail    |
            |  OPEN (with O_TRUNC)       |  Fail       |   Fail         |
            +----------------------------+-------------+----------------+
        * Wait = make the fop wait until the conflicting lock is released. This
          is the default.
        * Fail = return EAGAIN if the fd flags contain O_NONBLOCK, except in the
          case of an OPEN call with the O_TRUNC flag, where we return EAGAIN
          without checking O_NONBLOCK.

*   Unlike normal client-initiated file ops, internal fops associated with
    GlusterFS such as self-heal, rebalance etc. will have to bypass the
    byte-range conflict check for mandatory locks and proceed as normal under
    the optimal and forced mandatory locking modes for a volume (see
    Dependencies for more details on problems associated with lock-healing and
    lock-migration w.r.t. self-heal and rebalance).
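
The mode semantics and the conflict table above can be sketched as a small
decision function. This is only an illustrative model under stated
assumptions, not the locks translator implementation: the names `Lock`,
`overlaps` and `check_fop`, the string results, and the fop name `OPEN_TRUNC`
are all hypothetical, byte ranges are treated as inclusive, and the 'file'
mode and internal fops are left out.

```python
from dataclasses import dataclass

SHARED, EXCLUSIVE = "shared", "exclusive"


@dataclass(frozen=True)
class Lock:
    owner: str       # client holding the lock
    start: int       # first byte of the range
    end: int         # last byte of the range (inclusive, by assumption)
    kind: str        # SHARED or EXCLUSIVE
    mandatory: bool  # False for advisory locks from POSIX clients


def overlaps(lock: Lock, start: int, end: int) -> bool:
    """True when [start, end] intersects the lock's byte range."""
    return lock.start <= end and start <= lock.end


def check_fop(mode, fop, owner, start, end, locks, nonblock=False):
    """Return 'proceed', 'wait' or 'EAGAIN' for an incoming data fop.

    mode is 'off', 'optimal' or 'forced'; fop is 'READ', 'WRITE',
    'FTRUNCATE', 'ZEROFILL' or 'OPEN_TRUNC' (OPEN with O_TRUNC).
    """
    if mode == "off":
        return "proceed"
    for lock in locks:
        # Only locks held by another client on an overlapping range matter.
        if lock.owner == owner or not overlaps(lock, start, end):
            continue
        # 'optimal' ignores advisory locks; 'forced' checks both kinds.
        if mode == "optimal" and not lock.mandatory:
            continue
        # Per the table, READ is compatible only with a shared lock.
        if fop == "READ" and lock.kind == SHARED:
            continue
        # OPEN with O_TRUNC fails without looking at O_NONBLOCK.
        if fop == "OPEN_TRUNC" or nonblock:
            return "EAGAIN"
        return "wait"
    return "proceed"


locks = [
    Lock("smb-client", 0, 99, EXCLUSIVE, mandatory=True),
    Lock("posix-client", 200, 299, SHARED, mandatory=False),
]
print(check_fop("optimal", "READ", "nfs-client", 50, 60, locks))
print(check_fop("optimal", "WRITE", "nfs-client", 250, 260, locks))
print(check_fop("forced", "WRITE", "nfs-client", 250, 260, locks))
```

In this sketch the same write over the advisory lock proceeds under 'optimal'
but waits under 'forced', which is the main behavioural difference between the
two modes.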

Dependencies
------------
1. When a client disconnects from a server, its locks get cleaned up on the
   server side. When the connection comes back, we do not heal fcntl locks, so
   there is no way to recover the locks for that particular brick. Related
   problems and proposed solutions can be reviewed at the following link:

    <https://docs.google.com/document/d/1py5uDvvbbL3piEnuCa_vo37Kq_vZzHhzKgnc2OiGzg8/edit?pli=1#heading=h.2vntu84m9e2j>

2. There is a race window when a mandatory lock request from a new client
   arrives for a byte range that overlaps a write from an older client which
   is still ongoing in the backend: we end up granting the lock request, which
   breaks mandatory lock semantics for that range. In this scenario we will
   have to check for a conflicting inodelk on the range being written by the
   older client. But this check does not cover the ongoing read use case,
   since reads are not associated with an inodelk. Instead, we can internally
   take byte-range locks for fops like read, write etc. to ensure mandatory
   lock semantics in a much better way. The following BZ has been created to
   track the issue:

    <https://bugzilla.redhat.com/show_bug.cgi?id=1287099>

3. When files are rebalanced, we have to make sure that the locks associated
   with a file are also migrated to its new destination. Failure to migrate
   locks may result in undesired access to files at the new destination. The
   proposed design can be tracked at the following link:

   <https://github.com/gluster/glusterfs-specs/blob/master/accepted/Lock-Migration.md>
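
The idea in item 2 of taking internal byte-range locks for in-flight fops can
be sketched as follows. This is a single-threaded illustration under stated
assumptions, not the proposed fix: the class `InflightTracker` and its methods
are hypothetical, ranges are inclusive, and no real synchronization is shown.

```python
from contextlib import contextmanager


class InflightTracker:
    """Remember the byte range of every data fop while it is in progress."""

    def __init__(self):
        self._ranges = []  # entries are (owner, start, end, is_write)

    @contextmanager
    def fop(self, owner, start, end, is_write):
        # Register the range for the duration of the read or write.
        entry = (owner, start, end, is_write)
        self._ranges.append(entry)
        try:
            yield
        finally:
            self._ranges.remove(entry)

    def can_grant(self, owner, start, end, exclusive):
        """Decide whether a new mandatory lock may be granted right now."""
        for other, s, e, is_write in self._ranges:
            if other == owner or e < start or end < s:
                continue  # same client, or no overlap
            # An in-flight write conflicts with any lock; an in-flight
            # read conflicts only with an exclusive lock.
            if is_write or exclusive:
                return False
        return True


tracker = InflightTracker()
with tracker.fop("old-client", 0, 99, is_write=True):
    # The overlapping lock request must not be granted mid-write.
    print(tracker.can_grant("new-client", 50, 150, exclusive=False))
# Once the write has finished, the same request can be granted.
print(tracker.can_grant("new-client", 50, 150, exclusive=True))
```

Because reads are registered the same way as writes, this approach also covers
the ongoing-read case that an inodelk check alone would miss.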

Relevant patches
----------------
* <http://review.gluster.org/#/c/9768/>
* <http://review.gluster.org/#/c/11177/>

Benefit to GlusterFS
--------------------
This feature will allow other protocol clients like SMB or NFS accessing
gluster volumes via libgfapi to make use of mandatory locking semantics
described above.

Scope
-----

#### Nature of proposed change
To implement this feature we make use of the current locks translator to
accommodate the necessary changes without introducing a new translator. It is
important to note that mandatory lock requests received while mandatory
locking is disabled for a volume are stored as they are inside the locks
translator. Enforcement of advisory and mandatory lock requests is based on
the current mandatory-locking mode.

#### Implications on manageability
A new volume set option 'mandatory-locking' will be available, accepting the
following values:
* off
* file
* optimal
* forced

Note: Changes to this volume set option take effect only after a subsequent
start/restart of the volume.
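
As an illustration of how the option might be applied, the sketch below builds
a `gluster volume set` command line for one of the four values. It assumes the
option key is exactly 'mandatory-locking' as written in this document (a
released build may name it differently); the helper `volume_set_command` and
the volume name `testvol` are hypothetical.

```python
VALID_MODES = ("off", "file", "optimal", "forced")


def volume_set_command(volname, mode):
    """Build the argv for setting the mandatory-locking mode on a volume.

    Assumption: the option key is 'mandatory-locking', as in this document.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported mandatory-locking mode: {mode!r}")
    return ["gluster", "volume", "set", volname, "mandatory-locking", mode]


# 'testvol' is a placeholder volume name.
print(" ".join(volume_set_command("testvol", "forced")))
```

The resulting list could be passed to `subprocess.run()` on a node where the
gluster CLI is available, followed by a volume restart as noted above.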

#### Implications on presentation layer
A new locking API, glfs_lock(), will be exposed to allow applications to
request mandatory locks.

#### Implications on persistence layer
None
#### Implications on 'GlusterFS' backend
None
#### Modification to GlusterFS metadata
None
#### Implications on 'glusterd'
None

How To Test
-----------

User Experience
---------------

Dependencies
------------

Documentation
-------------
TBD

Status
------
In development

Comments and Discussion
-----------------------