Diffstat (limited to 'doc/legacy/hacker-guide')
-rw-r--r--  doc/legacy/hacker-guide/adding-fops.txt    33
-rw-r--r--  doc/legacy/hacker-guide/bdb.txt            70
-rw-r--r--  doc/legacy/hacker-guide/lock-ahead.txt     80
-rw-r--r--  doc/legacy/hacker-guide/posix.txt          59
-rw-r--r--  doc/legacy/hacker-guide/write-behind.txt   45
5 files changed, 0 insertions, 287 deletions
diff --git a/doc/legacy/hacker-guide/adding-fops.txt b/doc/legacy/hacker-guide/adding-fops.txt
deleted file mode 100644
index e70dbbdc8..000000000
--- a/doc/legacy/hacker-guide/adding-fops.txt
+++ /dev/null
@@ -1,33 +0,0 @@
-	HOW TO ADD A NEW FOP TO GlusterFS
-	=================================
-
-Steps to be followed when adding a new FOP to GlusterFS:
-
-1. Edit glusterfs.h and add a GF_FOP_* constant.
-
-2. Edit xlator.[ch] and:
-   2a. add the new prototype for fop and callback.
-   2b. edit xlator_fops structure.
-
-3. Edit xlator.c and add to fill_defaults.
-
-4. Edit protocol.h and add struct necessary for the new FOP.
-
-5. Edit defaults.[ch] and provide default implementation.
-
-6. Edit call-stub.[ch] and provide stub implementation.
-
-7. Edit common-utils.c and add to gf_global_variable_init().
-
-8. Edit client-protocol and add your FOP.
-
-9. Edit server-protocol and add your FOP.
-
-10. Implement your FOP in any translator for which the default implementation
-    is not sufficient.
-
-==========================================
-Last updated: Mon Oct 27 21:35:49 IST 2008
-
-Author: Vikas Gorur <vikas@gluster.com>
-==========================================
diff --git a/doc/legacy/hacker-guide/bdb.txt b/doc/legacy/hacker-guide/bdb.txt
deleted file mode 100644
index 1a80be813..000000000
--- a/doc/legacy/hacker-guide/bdb.txt
+++ /dev/null
@@ -1,70 +0,0 @@
-
-* How does a file translate to a key/value pair?
-------------------------------------------------
-
-	in bdb a file is identified by key (obtained by taking basename() of the path of
-the file) and file contents are stored as value corresponding to the key in database
-file (defaults to glusterfs_storage.db under dirname() directory).
-
-* symlinks, directories
------------------------
-
-	symlinks and directories are stored as is.
-
-* db (database) files
----------------------
-
-	every directory, including the root directory, contains a database file called
-glusterfs_storage.db. all the regular files contained in the directory are stored
-as key/value pairs inside glusterfs_storage.db.
-
-* internal data cache
----------------------
-
-	db does not provide a way to find out the size of the value corresponding to a key.
-so, bdb makes a DB->get() call for the key and takes the length of the value returned.
-since DB->get() also returns file contents for key, bdb maintains an internal cache and
-stores the file contents in the cache.
-	every directory maintains a separate cache.
-
-* inode number transformation
------------------------------
-
-	bdb allocates an inode number to each file and directory on its own. bdb maintains a
-global counter and increments it after allocating an inode number for each file
-(regular, symlink or directory). NOTE: bdb does not guarantee persistent inode numbers.
-
-* checkpoint thread
--------------------
-
-	bdb creates a checkpoint thread at the time of init(). checkpoint thread does a
-periodic checkpoint on the DB_ENV. checkpoint is the mechanism, provided by db, to
-forcefully commit the logged transactions to the storage.
-
-NOTES ABOUT FOPS:
------------------
-
-lookup()
-
-	1> do lstat() on the path; if lstat fails, we assume that the file being looked up
-	   is either a regular file or doesn't exist.
-	2> lookup in the DB of parent directory for key corresponding to path. if key exists,
-	   return key, with.
-	   NOTE: 'struct stat' stat()ed from DB file is used as a container for 'struct stat'
-	   of the regular file. st_ino, st_size, st_blocks are updated with file's values.
-
-readv()
-
-	1> do a lookup in bctx cache. if successful, return the requested data from cache.
-	2> on a cache miss, DB->get() the entire file content and insert it into the cache.
-
-writev():
-	1> flush any cached content of this file.
-	2> do a DB->put(), with DB_DBT_PARTIAL flag.
-	   NOTE: DB_DBT_PARTIAL is used to do a partial update of a value in DB.
-
-readdir():
-	1> regular readdir() in a loop, skipping all DB_ENV log files and DB files that
-	   we encounter.
-	2> if the readdir() buffer still has space, open a DB cursor and do a sequential
-	   DBC->get() to fill the readdir buffer.
-
-
diff --git a/doc/legacy/hacker-guide/lock-ahead.txt b/doc/legacy/hacker-guide/lock-ahead.txt
deleted file mode 100644
index 70aa452d3..000000000
--- a/doc/legacy/hacker-guide/lock-ahead.txt
+++ /dev/null
@@ -1,80 +0,0 @@
-	Lock-ahead translator
-	---------------------
-
-The objective of the lock-ahead translator is to speculatively
-hold locks (inodelk and entrylk) on the universal set (0 - infinity
-in case of inodelk and all basenames in case of entrylk) even
-when a lock is requested only on a subset, in anticipation that
-further locks will be requested within the same universal set.
-
-So, for example, when cluster/replicate locks a region before
-writing to it, lock-ahead would instead lock the entire file.
-On further writes, lock-ahead can immediately return success for
-the lock requests, since the entire file has been previously locked.
-
-To avoid starvation of other clients/mountpoints, we employ a
-notify mechanism, described below.
-
-typedef struct {
-	struct list_head subset_locks;
-} la_universal_lock_t;
-
-Universal lock structure is stored in the inode context.
-
-typedef struct {
-	enum {LOCK_AHEAD_ENTRYLK, LOCK_AHEAD_FENTRYLK,
-	      LOCK_AHEAD_INODELK, LOCK_AHEAD_FINODELK};
-
-	union {
-		fd_t *fd;
-		loc_t loc;
-	};
-
-	off_t l_start;
-	off_t l_len;
-
-	const char *basename;
-
-	struct list_head universal_lock;
-} la_subset_lock_t;
-
-
-fops implemented:
-
-* inodelk/finodelk/entrylk/fentrylk:
-
-lock:
-	if universal lock held:
-		add subset to it (save loc_t or fd) and return success
-	else:
-		send lock-notify fop
-		hold universal lock and return
-		(set inode context, add subset to it, save loc_t or fd)
-
-		if this fails:
-			forward the lock request
-
-unlock:
-	if subset exists in universal lock:
-		delete subset lock from list
-	else:
-		forward it
-
-* release:
-	hold subset locks (each subset lock using the saved loc_t or fd)
-	and release universal lock
-
-* lock-notify (on unwind) (new fop)
-	hold subset locks and release universal lock
-
-
-lock-notify in locks translator:
-
-if a subset lock in entrylk/inodelk cannot be satisfied
-because of a universal lock held by someone else:
-	unwind the lock-notify fop
-
-==============================================
-$ Last updated: Tue Feb 17 11:31:18 IST 2009 $
-$ Author: Vikas Gorur <vikas@gluster.com>    $
-==============================================
diff --git a/doc/legacy/hacker-guide/posix.txt b/doc/legacy/hacker-guide/posix.txt
deleted file mode 100644
index 7958af2ea..000000000
--- a/doc/legacy/hacker-guide/posix.txt
+++ /dev/null
@@ -1,59 +0,0 @@
----------------
-* storage/posix
----------------
-
-- SET_FS_ID
-
-	This is so that all filesystem checks are done with the user's
-	uid/gid and not GlusterFS's uid/gid.
-
-- MAKE_REAL_PATH
-
-	This macro concatenates the base directory of the posix volume
-	('option directory') with the given path.
-
-- need_xattr in lookup
-
-	If this flag is passed, lookup returns an xattr dictionary that contains
-	the file's create time, the file's contents, and the version number
-	of the file.
-
-	This is a hack to increase small file performance. If an application
-	wants to read a small file, it can finish its job with just a lookup
-	call instead of a lookup followed by a read.
-
-- getdents/setdents
-
-	These are used by unify to set and get directory entries.
-
-- ALIGN_BUF
-
-	Macro to align an address to a page boundary (4K).
-
-- priv->export_statfs
-
-	In some cases, two exported volumes may reside on the same
-	partition on the server. Sending statvfs info for both
-	the volumes will lead to erroneous df output at the client,
-	since free space on the partition will be counted twice.
-
-	In such cases, the user can disable exporting statvfs info
-	on one of the volumes by setting this option.
-
-- xattrop
-
-	This fop is used by replicate to set version numbers on files.
-
-- getxattr/setxattr hack to read/write files
-
-	A key, GLUSTERFS_FILE_CONTENT_STRING, is handled in a special way by
-	getxattr/setxattr. A getxattr with the key will return the entire
-	content of the file as the value. A setxattr with the key will write
-	the value as the entire content of the file.
-
-- posix_checksum
-
-	This calculates a simple XOR checksum on all entry names in a
-	directory that is used by unify to compare directory contents.
-
-
diff --git a/doc/legacy/hacker-guide/write-behind.txt b/doc/legacy/hacker-guide/write-behind.txt
deleted file mode 100644
index 50b7d2a1d..000000000
--- a/doc/legacy/hacker-guide/write-behind.txt
+++ /dev/null
@@ -1,45 +0,0 @@
-basic working
---------------
-
-	write-behind is basically a translator that lies to the application, reporting write requests as finished even before they actually are.
-
-	on a regular translator tree without write-behind, control flow is like this:
-
-	1. application makes a write() system call.
-	2. VFS ==> FUSE ==> /dev/fuse.
-	3. fuse-bridge initiates a glusterfs writev() call.
-	4. writev() is STACK_WIND()ed up to client-protocol or storage translator.
-	5. client-protocol, on receiving reply from server, starts STACK_UNWIND() towards the fuse-bridge.
-
-	on a translator tree with write-behind, control flow is like this:
-
-	1. application makes a write() system call.
-	2. VFS ==> FUSE ==> /dev/fuse.
-	3. fuse-bridge initiates a glusterfs writev() call.
-	4. writev() is STACK_WIND()ed up to write-behind translator.
-	5. write-behind adds the write buffer to its internal queue and does a STACK_UNWIND() towards the fuse-bridge.
-
-	the write call is complete from the application's perspective. after STACK_UNWIND()ing towards the fuse-bridge, write-behind initiates a fresh writev() call to its child translator, whose replies will be consumed by write-behind itself. write-behind _doesn't_ cache the write buffer, unless 'option flush-behind on' is specified in volume specification file.
-
-windowing
----------
-
-	with respect to write-behind, each write-buffer has three flags: 'stack_wound', 'write_behind' and 'got_reply'.
-
-	stack_wound: if set, indicates that write-behind has initiated STACK_WIND() towards child translator.
-
-	write_behind: if set, indicates that write-behind has done STACK_UNWIND() towards fuse-bridge.
-
-	got_reply: if set, indicates that write-behind has received reply from child translator for a writev() STACK_WIND(). a request will be destroyed by write-behind only if this flag is set.
-
-	currently pending write requests = aggregate size of requests with write_behind = 1 and got_reply = 0.
-
-	window size limits the aggregate size of currently pending write requests. once the pending requests' size has reached the window size, write-behind blocks writev() calls from fuse-bridge.
-	blocking is only from the application's perspective. write-behind does STACK_WIND() to child translator straight away, but holds back the STACK_UNWIND() towards fuse-bridge. STACK_UNWIND() is done only once write-behind gets enough replies to accommodate the currently blocked request.
- -flush behind ------------- - - if 'option flush-behind on' is specified in volume specification file, then write-behind sends aggregate write requests to child translator, instead of regular per request STACK_WIND()s. - - |
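The windowing rules in the deleted write-behind.txt come down to a small amount of bookkeeping over the three per-request flags. The C sketch below is a hypothetical illustration, not the translator's code: `wb_request_t` and the helper names are invented here, and it assumes a request's unwind is held back exactly when the pending total has reached the window size, as the windowing paragraph states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request record; the field names follow the three flags
 * described in write-behind.txt, but this is not the translator's struct. */
typedef struct {
    size_t size;       /* bytes carried by this writev() request */
    bool stack_wound;  /* STACK_WIND()ed towards the child translator */
    bool write_behind; /* STACK_UNWIND()ed towards fuse-bridge: app sees it done */
    bool got_reply;    /* reply received from the child translator */
} wb_request_t;

/* Pending bytes = aggregate size of requests with write_behind = 1
 * and got_reply = 0 (acknowledged to the app, not yet confirmed). */
static size_t wb_pending_bytes(const wb_request_t *reqs, size_t n)
{
    assert(reqs != NULL || n == 0);
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (reqs[i].write_behind && !reqs[i].got_reply)
            total += reqs[i].size;
    }
    return total;
}

/* Assumption: a new writev() may be unwound to fuse-bridge at once only
 * while the pending total is below the window; otherwise its unwind is
 * held back (its STACK_WIND() to the child still happens immediately). */
static bool wb_can_unwind_now(const wb_request_t *reqs, size_t n,
                              size_t window_size)
{
    return wb_pending_bytes(reqs, n) < window_size;
}

/* A request may be destroyed only once its reply has arrived. */
static bool wb_can_destroy(const wb_request_t *req)
{
    return req->got_reply;
}
```

For example, with a 4096-byte window and one acknowledged but unreplied 4096-byte write outstanding, `wb_can_unwind_now()` returns false until that reply arrives. Rescanning the queue keeps the sketch short; a running counter updated on each unwind and each reply would avoid the O(n) walk.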

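posix.txt describes ALIGN_BUF only as a macro that aligns an address to a 4K page boundary; its definition is not part of this diff. The following is a hypothetical reconstruction, assuming it rounds an address up to the next multiple of 4096. `ALIGN_BUF_SKETCH`, `SKETCH_PAGE_SIZE` and `alloc_page_aligned` are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: 4 KiB page size, as stated in posix.txt. */
#define SKETCH_PAGE_SIZE ((uintptr_t)4096)

/* Round ptr up to the next SKETCH_PAGE_SIZE boundary; an address that is
 * already aligned is returned unchanged. Because the page size is a power
 * of two, ~(SKETCH_PAGE_SIZE - 1) clears the low 12 bits. */
#define ALIGN_BUF_SKETCH(ptr) \
    ((void *)(((uintptr_t)(ptr) + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1)))

/* Allocate at least len usable bytes starting on a page boundary.
 * *raw_out receives the pointer that must later be passed to free(). */
static void *alloc_page_aligned(size_t len, void **raw_out)
{
    assert(raw_out != NULL);
    char *raw = malloc(len + SKETCH_PAGE_SIZE - 1);
    *raw_out = raw;
    if (raw == NULL)
        return NULL;
    return ALIGN_BUF_SKETCH(raw);
}
```

Keeping the raw pointer matters: `free()` must receive the address `malloc()` returned, not the aligned one. Over-allocating by `SKETCH_PAGE_SIZE - 1` bytes guarantees that `len` bytes still fit after rounding up.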