summaryrefslogtreecommitdiffstats
path: root/doc/stat-prefetch-design.txt
blob: 06d0ad37e7d340c5666c06bc8df600475324d08b (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
what is stat-prefetch?
======================
It is a translator which caches the dentries read in readdir. This dentry
list is stored in the context of fd. Later when lookup happens on 
[parent-inode, basename (path)] combination, this list is searched for the
basename. The dentry thus searched is used to fill up the stat corresponding
to path being looked upon, thereby short-cutting lookup calls. This cache is
preserved till closedir is called on the fd. The purpose of this translator 
is to optimize operations like 'ls -l', where a readdir is followed by 
lookup (stat) calls on each directory entry.

1. stat-prefetch harnesses the efficiency of short lookup calls 
   (saves network roundtrip time for lookup calls from being accounted to 
   the stat call).
2. To maintain the correctness, it does lookup-behind - lookup is winded to 
   underlying translators after it is unwound to upper translators. 
   lookup-behind is necessary as inode gets populated in server inode table
   only in lookup-cbk and also because various translators store their 
   contexts in inode contexts during lookup calls.

fops to be implemented:
=======================
* lookup
  1. check the dentry cache stored in context of fds opened by the same process 
     on parent inode for basename. If found unwind with cached stat, else wind
     the lookup call to underlying translators. 
  2. stat is stored in the context of inode if the path being looked upon 
     happens to be directory. This stat will be used to fill postparent stat
     when lookup happens on any of the directory contents.

* readdir
  1. cache the direntries returned in readdir_cbk in the context of fd.
  2. if the readdir is happening on non-expected offsets (means a seekdir/rewinddir 
     has happened), cache has to be flushed.
  3. delete the entry corresponding to basename of path on which fd is opened 
     from cache stored in parent.

* chmod/fchmod
  delete the entry corresponding to basename from cache stored in context of
  fds opened on parent inode, since these calls change st_mode and st_ctime of 
  stat.
 
* chown/fchown
  delete the entry corresponding to basename from cache stored in context of 
  fds opened on parent inode, since these calls change st_uid/st_gid and 
  st_ctime of stat.

* truncate/ftruncate
  delete the entry corresponding to basename from cache stored in context of 
  fds opened on parent inode, since these calls change st_size/st_mtime of stat.

* utimens
  delete the entry corresponding to basename from cache stored in context of 
  fds opened on parent inode, since this call changes st_atime/st_mtime of stat.

* readlink
  delete the entry corresponding to basename from cache stored in context of fds
  opened on parent inode, since this call changes st_atime of stat.
 
* unlink
  1. delete the entry corresponding to basename from cache stored in context of 
     fds, opened on parent directory containing the file being unlinked.
  2. delete the entry corresponding to basename of parent directory from cache
     of grand-parent.

* rmdir
  1. delete the entry corresponding to basename from cache stored in context of
     fds opened on parent inode.
  2. remove the entire cache from all fds opened on inode corresponding to 
     directory being removed.
  3. delete the entry correspondig to basename of parent from cache stored in
     grand-parent.

* readv
  delete the entry corresponding to basename from cache stored in context of fds
  opened on parent inode, since readv changes st_atime of file. 

* writev
  delete the entry corresponding to basename from cache stored in context of fds
  opened on parent inode, since writev can possibly change st_size and definitely
  changes st_mtime of file.

* fsync
  there is a confusion here as to whether fsync updates mtime/ctimes. Disk based
  filesystems (atleast ext2) just writes the times stored in inode to disk 
  during fsync and not the time at which fsync is being done. But in glusterfs, 
  a translator like write-behind actually sends writes during fsync which will 
  change mtime/ctime. Hence stat-prefetch implements fsync to delete the entry 
  corresponding to basename from cache stored in context of fds opened on parent
  inode.
 
* rename
  1. remove entry corresponding to oldname from cache stored in fd contexts of 
     oldparent.
  2. remove entry corresponding to newname from cache stored in fd contexts of
     newparent. 
  3. remove entry corresponding to oldparent from cache stored in 
     old-grand-parent, since removing oldname changes st_mtime and st_ctime
     of oldparent stat.
  4. remove entry corresponding to newparent from cache stored in 
     new-grand-parent, since adding newname changes st_mtime and st_ctime
     of newparent stat.
  5. if oldname happens to be a directory, remove entire cache from all fds 
     opened on it.

* create/mknod/mkdir/symlink/link
  delete entry corresponding to basename of parent directory in which these 
  operations are happening, from cache stored in context of fds opened on
  grand-parent, since adding a new entry to a directory changes st_mtime
  and st_ctime of parent directory.

* setxattr/removexattr
  delete the entry corresponding to basename from cache stored in context of
  fds opened on parent inode, since setxattr changes st_ctime of file.

* setdents
  1. remove entry corresponding to basename of path on which fd is opened from
     cache stored in context of fds opened on parent.
  2. for each of the entry in the direntry list, delete from cache stored in 
     context of fd, the entry corresponding to basename of path being passed.

* getdents
  1. remove entry corresponding to basename of path on which fd is opened from
     cache stored in parent, since getdents changes st_atime. 
  2. remove entries corresponding to symbolic links from cache, since readlink 
     would've changed st_atime.

* checksum
  delete the entry corresponding to basename from cache stored in context of
  fds opened on parent inode, since st_atime is changed during this call.

* xattrop/fxattrop
  delete the entry corresponding to basename from cache stored in context of fds
  opened on parent inode, since these calls modify st_ctime of file.

callbacks to be implemented:
============================
* releasedir
  free the context stored in fd.

* forget
  dree the stat if the inode corresponds to a directory.

limitations:
============
* since a readdir does not return extended attributes of file, if need_xattr is
  set, short-cutting of lookup does not happen and lookup is passed to 
  underlying translators.

* posix_readdir does not check whether the dentries are spanning across multiple
  mount points. Hence it is not transforming inode numbers in stat buffers if 
  posix is configured to allow export directory spanning on multiple mountpoints.
  This is a bug which needs to be fixed. posix_readdir should treat dentries the 
  same way as if lookup is happening on dentries.