Diffstat (limited to 'doc')
-rw-r--r-- doc/stat-prefetch-design.txt 128
1 files changed, 128 insertions, 0 deletions
diff --git a/doc/stat-prefetch-design.txt b/doc/stat-prefetch-design.txt
new file mode 100644
index 00000000000..65f1b922705
--- /dev/null
+++ b/doc/stat-prefetch-design.txt
@@ -0,0 +1,128 @@
+what is stat-prefetch?
+======================
+It is a translator that caches the dentries read in readdir. The dentry
+list is stored in the context of the fd. Later, when a lookup happens on a
+[parent-inode, basename (path)] combination, this list is searched for the
+basename. The dentry found is used to fill in the stat for the path being
+looked up, thereby short-cutting the lookup call. The cache is preserved
+until closedir is called on the fd. The purpose of this translator is to
+optimize operations like 'ls -l', where a readdir is followed by lookup
+(stat) calls on each directory entry.
+
+1. stat-prefetch answers lookups from its cache where it can, which saves
+ the network round-trip time of the lookup from being accounted to the
+ stat call.
+2. To maintain correctness, it does lookup-behind: the lookup is wound to
+ the underlying translators after it has been unwound to the upper
+ translators. A lookup-behind is necessary because the inode gets
+ populated in the server inode table only in lookup-cbk. Also, various
+ translators store their contexts in the inode context during lookup
+ calls.
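The cache search described above can be sketched in C as follows. This is a minimal illustration, not the translator's code: `struct sp_dentry`, `struct sp_cache` and `sp_cache_lookup` are hypothetical names, and a real implementation would hang the list off the fd context and deal with locking.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical cache entry: one dentry returned by readdir. */
struct sp_dentry {
    const char  *name;  /* basename within the directory */
    struct stat  buf;   /* stat cached alongside the dentry */
};

/* Hypothetical per-fd cache: the dentry list kept in the fd context. */
struct sp_cache {
    struct sp_dentry *entries;
    size_t            count;
};

/*
 * Search the cache for basename. On a hit, copy the cached stat into
 * *out and return 1, so the caller can unwind the lookup right away.
 * On a miss, return 0, so the caller winds the lookup down as usual.
 */
int sp_cache_lookup(const struct sp_cache *cache, const char *basename,
                    struct stat *out)
{
    assert(cache != NULL && basename != NULL && out != NULL);
    for (size_t i = 0; i < cache->count; i++) {
        if (strcmp(cache->entries[i].name, basename) == 0) {
            *out = cache->entries[i].buf;
            return 1;
        }
    }
    return 0;
}
```

A linear scan keeps the sketch short; whether a hash on basename pays off depends on directory sizes, which the design above does not specify.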
+
+fops to be implemented:
+=======================
+* lookup
+ Check the dentry cache stored in the context of fds opened by the same
+ process on the parent inode for basename. If found, unwind with the cached
+ stat; else wind the lookup call to the underlying translators. We also
+ store the stat of the path in the inode context if the path being looked
+ up happens to be a directory. This stat is used to fill the postparent
+ stat when a lookup happens on any of the directory's contents.
+
+* readdir
+ Cache the direntries returned in readdir_cbk in the context of the fd. If
+ the readdir happens at an unexpected offset (meaning a seekdir/rewinddir
+ has happened), the cache has to be flushed.
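The offset check for readdir can be sketched as below. The names (`struct sp_fd_state`, `sp_readdir_check`, `sp_readdir_done`) are hypothetical, and the sketch assumes the expected offset is simply the offset reported for the last entry cached so far; the actual bookkeeping in the translator may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical per-fd state; a real translator would keep more than this. */
struct sp_fd_state {
    off_t  expected_offset;  /* offset the next sequential readdir should use */
    size_t cached;           /* number of dentries currently cached */
};

/*
 * Called when a readdir arrives with offset off. If the offset is not the
 * expected one (a seekdir/rewinddir happened), flush the cache and restart
 * tracking from off. Returns 1 if the cache was flushed, 0 otherwise.
 */
int sp_readdir_check(struct sp_fd_state *st, off_t off)
{
    assert(st != NULL);
    if (off != st->expected_offset) {
        st->cached = 0;
        st->expected_offset = off;
        return 1;
    }
    return 0;
}

/*
 * Called from the readdir callback after n entries have been cached;
 * last_off is the offset reported for the last of those entries.
 */
void sp_readdir_done(struct sp_fd_state *st, size_t n, off_t last_off)
{
    assert(st != NULL);
    st->cached += n;
    st->expected_offset = last_off;
}
```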
+
+* chmod/fchmod
+ Delete the entry corresponding to basename from cache stored in context of
+ fds opened on parent inode, since these calls change st_mode and st_ctime
+ of stat.
+
+* chown/fchown
+ Delete the entry corresponding to basename from cache stored in context of
+ fds opened on parent inode, since these calls change st_uid/st_gid and
+ st_ctime of stat.
+
+* truncate/ftruncate
+ Delete the entry corresponding to basename from cache stored in context of
+ fds opened on parent inode, since these calls change st_size/st_mtime of stat.
+
+* utimens
+ Delete the entry corresponding to basename from cache stored in context of
+ fds opened on parent inode, since this call changes st_atime/st_mtime of stat.
+
+* readlink
+ Delete the entry corresponding to basename from cache stored in context of fds
+ opened on parent inode, since this call changes st_atime of stat.
+
+* unlink
+ 1. Delete the entry corresponding to basename from cache stored in context of
+ fds opened on parent directory containing file being unlinked.
+ 2. Delete the entry corresponding to basename of parent directory from cache
+ of its parent directory.
+
+* rmdir
+ 1. Delete the entry corresponding to basename from cache stored in context of
+ fds opened on parent inode.
+ 2. Remove the entire cache from all fds opened on inode corresponding to
+ directory being removed.
+
+* readv
+ Delete the entry corresponding to basename from cache stored in context of fds
+ opened on parent inode, since readv changes st_atime of file.
+
+* writev
+ Delete the entry corresponding to basename from cache stored in context of fds
+ opened on parent inode, since writev can possibly change st_size and definitely
+ changes st_mtime of file.
+
+* fsync
+ It is unclear whether fsync updates mtime/ctime. Disk-based filesystems
+ (at least ext2) just write the times stored in the inode to disk during
+ fsync, not the time at which the fsync is done. But in glusterfs, a
+ translator like write-behind actually sends writes during fsync, which
+ changes mtime/ctime. Hence stat-prefetch implements fsync to delete the
+ entry corresponding to basename from cache stored in context of fds opened
+ on parent inode.
+
+* rename
+ 1. Remove entry corresponding to oldname from cache stored in fd contexts of
+ old parent directory.
+ 2. Remove entry corresponding to new parent directory from cache stored in
+ fd contexts of its parent directory.
+
+* create/mknod/mkdir/symlink/link
+ Delete the entry corresponding to basename of the directory in which these
+ operations are happening, from cache stored in context of fds of its parent
+ directory. Note that the directory holding the cache is the parent of the
+ directory in which these operations are happening.
+
+* setxattr/removexattr
+ Delete the entry corresponding to basename from cache stored in context of fds
+ opened on parent inode, since these calls change st_ctime of file.
+
+* setdents/getdents/checksum/xattrop/fxattrop
+ These calls modify various times of the stat structure, hence the
+ appropriate entries have to be removed from the cache. I am leaving these
+ calls unimplemented in stat-prefetch for the time being. Once we have a
+ working translator, these five fops will be implemented.
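Most of the fops above perform the same invalidation: drop the entry for a basename from a cached dentry list. A minimal C sketch of that step for a single fd's cache is given here. `struct sp_entry`, `struct sp_entries`, `sp_cache_remove` and `SP_NAME_MAX` are hypothetical names, and a real implementation would repeat this for every fd opened on the parent inode.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SP_NAME_MAX 256  /* assumed upper bound on a basename, for the sketch */

/* Hypothetical cache entry; the cached stat is omitted for brevity. */
struct sp_entry {
    char name[SP_NAME_MAX];
};

/* Hypothetical dentry list kept in one fd's context. */
struct sp_entries {
    struct sp_entry *items;
    size_t           count;
};

/*
 * Remove the entry named basename, if present, by overwriting it with the
 * last entry and shrinking the list. The order of the remaining entries is
 * not preserved. Returns 1 if an entry was removed, 0 otherwise.
 */
int sp_cache_remove(struct sp_entries *c, const char *basename)
{
    assert(c != NULL && basename != NULL);
    for (size_t i = 0; i < c->count; i++) {
        if (strcmp(c->items[i].name, basename) == 0) {
            c->items[i] = c->items[c->count - 1];
            c->count--;
            return 1;
        }
    }
    return 0;
}
```

After removal, the next lookup on that basename misses the cache and is wound to the underlying translators, which returns the up-to-date stat.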
+
+callbacks to be implemented:
+============================
+* releasedir
+ Flush the stat-prefetch cache.
+
+* forget
+ Free the stat if the inode corresponds to a directory.
+
+limitations:
+============
+* Since readdir does not return the extended attributes of a file, if
+ need_xattr is set the lookup is not short-cut and is passed to the
+ underlying translators.
+
+* posix_readdir does not check whether the dentries span multiple mount
+ points. Hence it does not transform inode numbers in stat buffers when
+ posix is configured to allow the export directory to span multiple mount
+ points. This is a bug which needs to be fixed: posix_readdir should treat
+ dentries the same way as lookup does.