Diffstat (limited to 'doc/data-structures')
 doc/data-structures/iobuf.md    | 259 ----
 doc/data-structures/mem-pool.md | 124 ----
2 files changed, 0 insertions, 383 deletions
diff --git a/doc/data-structures/iobuf.md b/doc/data-structures/iobuf.md
deleted file mode 100644
index 5f521f1485f..00000000000
--- a/doc/data-structures/iobuf.md
+++ /dev/null
@@ -1,259 +0,0 @@
#Iobuf-pool
##Datastructures
###iobuf
Short for IO Buffer. It is one allocatable unit for the consumers of the IOBUF
API; each unit hosts @page_size (defined in the arena structure) bytes of memory.
As an initial step of processing a fop, the IO buffer passed to GlusterFS by
other applications (FUSE VFS / applications using gfapi) is copied into GlusterFS
space, i.e. iobufs. Hence iobufs are mostly allocated/deallocated in the FUSE,
gfapi and protocol xlators, and also in performance xlators to cache IO buffers.
```
struct iobuf {
        union {
                struct list_head list;
                struct {
                        struct iobuf *next;
                        struct iobuf *prev;
                };
        };
        struct iobuf_arena *iobuf_arena;

        gf_lock_t lock; /* for ->ptr and ->ref */
        int ref;        /* 0 == passive, >0 == active */

        void *ptr;      /* usable memory region by the consumer */

        void *free_ptr; /* in case of stdalloc, this is the
                           one to be freed, not *ptr */
};
```

###iobref
A single fop may need multiple iobufs, e.g. in vectored read/write.
Hence multiple iobufs (16 by default) are encapsulated under one iobref.
```
struct iobref {
        gf_lock_t lock;
        int ref;
        struct iobuf **iobrefs; /* list of iobufs */
        int alloced;            /* 16 by default, grows as required */
        int used;               /* number of iobufs added to this iobref */
};
```
###iobuf_arenas
One region of memory mmap'd from the operating system. Each region maps
@arena_size bytes of memory and hosts @arena_size / @page_size iobufs.
Iobufs of the same size are grouped into one arena, for sanity of access.

```
struct iobuf_arena {
        union {
                struct list_head list;
                struct {
                        struct iobuf_arena *next;
                        struct iobuf_arena *prev;
                };
        };

        size_t page_size;  /* size of all iobufs in this arena */
        size_t arena_size; /* this is equal to
                              (iobuf_pool->arena_size / page_size)
                              * page_size */
        size_t page_count;

        struct iobuf_pool *iobuf_pool;

        void *mem_base;
        struct iobuf *iobufs; /* allocated iobufs list */

        int active_cnt;
        struct iobuf active;  /* head node iobuf
                                 (unused by itself) */
        int passive_cnt;
        struct iobuf passive; /* head node iobuf
                                 (unused by itself) */
        uint64_t alloc_cnt;   /* total allocs in this pool */
        int max_active;       /* max active buffers at a given time */
};

```
###iobuf_pool
Pool of iobufs. As the filesystem may require many IO buffers, a pool of iobufs
is preallocated and kept; only if these preallocated ones are exhausted is the
standard malloc/free path taken, thus improving performance. There is generally
one iobuf pool per process, allocated during glusterfs_ctx_t init
(glusterfs_ctx_defaults_init); currently the preallocated iobuf pool memory is
freed on process exit. The iobuf pool is globally accessible across GlusterFS,
hence iobufs allocated by any xlator can be accessed by any other xlator
(provided the iobuf is passed along).
```
struct iobuf_pool {
        pthread_mutex_t mutex;
        size_t arena_size;        /* size of memory region in
                                     arena */
        size_t default_page_size; /* default size of iobuf */

        int arena_cnt;
        struct list_head arenas[GF_VARIABLE_IOBUF_COUNT];
        /* array of arenas.
           Each element of the array is a list of arenas
           holding iobufs of a particular page_size */

        struct list_head filled[GF_VARIABLE_IOBUF_COUNT];
        /* array of arenas without free iobufs */

        struct list_head purge[GF_VARIABLE_IOBUF_COUNT];
        /* array of arenas which can be purged */

        uint64_t request_misses; /* mostly requests for higher
                                    sizes of iobufs */
};
```
~~~
The default size of the iobuf_pool (as of yet):
1024 iobufs of 128Bytes = 128KB
512 iobufs of 512Bytes  = 256KB
512 iobufs of 2KB       = 1MB
128 iobufs of 8KB       = 1MB
64 iobufs of 32KB       = 2MB
32 iobufs of 128KB      = 4MB
8 iobufs of 256KB       = 2MB
2 iobufs of 1MB         = 2MB
Total ~13MB
~~~
As seen in the data structure, iobuf_pool has 3 arena lists.

- arenas:
The arenas allocated during iobuf_pool creation are part of this list. This list
also contains arenas that are partially filled, i.e. that hold both active and
passive iobufs (passive_cnt != 0, active_cnt != 0, except for initially allocated
arenas). By default there will be 8 arenas of the sizes mentioned above.
- filled:
If all the iobufs in an arena are filled (passive_cnt = 0), the arena is moved
to the filled list. If any iobuf from a filled arena is iobuf_put, the arena
moves back to the 'arenas' list.
- purge:
If there are no active iobufs in an arena (active_cnt = 0), the arena is moved
to the purge list. iobuf_put() triggers destruction of the arenas in this list.
The arenas in the purge list are destroyed only if there is at least one arena
in the 'arenas' list; that way there won't be spurious mmap/munmap of buffers.
(e.g. if there is an arena (page_size=128KB, count=32) in the purge list, this
arena is destroyed (munmap'd) only if there is an arena in the 'arenas' list
with page_size=128KB.)

##APIs
###iobuf_get

```
struct iobuf *iobuf_get (struct iobuf_pool *iobuf_pool);
```
Creates a new iobuf of the default page size (128KB, hard-coded as of yet).
Also takes a reference (increments the ref count), so there is no need to take
one explicitly after getting the iobuf.

###iobuf_get2

```
struct iobuf *iobuf_get2 (struct iobuf_pool *iobuf_pool, size_t page_size);
```
Creates a new iobuf of the specified page size; if page_size=0 the default page
size is used.
```
if (requested iobuf size > max iobuf size in the pool (1MB as of yet))
{
        Perform standard allocation (CALLOC) of the requested size and
        add it to the list iobuf_pool->arenas[IOBUF_ARENA_MAX_INDEX].
}
else
{
        - Round up the page size to match the standard sizes in the iobuf
          pool (e.g. if 3KB is requested, it is rounded to 8KB).
        - Select the arena list corresponding to the rounded size
          (e.g. select the 8KB arena).
          If the selected arena has passive count > 0, then return an
          iobuf from this arena, setting the counters (passive/active/etc.)
          appropriately.
          Else the arena is full; allocate a new arena with the rounded
          size and the standard page count and add it to the arena list
          (e.g. 128 iobufs of 8KB are allocated).
}
```
Also takes a reference (increments the ref count), so there is no need to take
one explicitly after getting the iobuf.

###iobuf_ref

```
struct iobuf *iobuf_ref (struct iobuf *iobuf);
```
Take a reference on the iobuf. When using an iobuf allocated by some other
xlator/function, it is good practice to take a reference so that the iobuf is
not deleted by the allocator.

###iobuf_unref
```
void iobuf_unref (struct iobuf *iobuf);
```
Unreference the iobuf; if the ref count reaches zero the iobuf is considered free.

```
- Delete the iobuf, if allocated from standard alloc, and return.
- Set the active/passive count appropriately.
- If passive count > 0 then add the arena to the 'arenas' list.
- If active count = 0 then add the arena to the 'purge' list.
```
Every iobuf_ref should have a corresponding iobuf_unref, and every
iobuf_get/iobuf_get2 should also have a corresponding iobuf_unref.

###iobref_new
```
struct iobref *iobref_new ();
```
Creates a new iobref structure and returns its pointer.

###iobref_ref
```
struct iobref *iobref_ref (struct iobref *iobref);
```
Take a reference on the iobref.

###iobref_unref
```
void iobref_unref (struct iobref *iobref);
```
Decrements the reference count of the iobref. If the ref count reaches 0, then
unref all the iobufs (iobuf_unref) in the iobref and destroy the iobref.

###iobref_add
```
int iobref_add (struct iobref *iobref, struct iobuf *iobuf);
```
Adds the given iobuf to the iobref. It takes a ref on the iobuf before adding
it, so an explicit iobuf_ref is not required when adding to the iobref.

###iobref_merge
```
int iobref_merge (struct iobref *to, struct iobref *from);
```
Adds all the iobufs in the 'from' iobref to the 'to' iobref. Merging does not
delete the 'from' iobref; it therefore results in another ref on all the iobufs
added to the 'to' iobref. Hence iobref_unref should be performed on both the
'from' and 'to' iobrefs (performing iobref_unref only on 'to' will not free the
iobufs and may result in a leak).

###iobref_clear
```
void iobref_clear (struct iobref *iobref);
```
Unreferences all the iobufs in the iobref, and also unrefs the iobref.

##Iobuf Leaks
If any iobuf_ref/iobuf_get does not have a corresponding iobuf_unref, the
iobufs are not freed, and recurring execution of such a code path may lead to
huge memory leaks. The easiest way to identify whether a memory leak is caused
by iobufs is to take a statedump. If the statedump shows a lot of filled arenas
then it is a sure sign of a leak. Refer to doc/debugging/statedump.md for more
details.

If iobufs are leaking, the next step is to find where the iobuf_unref went
missing. There is no standard/easy way of debugging this; code reading and logs
are the only ways. If there is the liberty to reproduce the memory leak at
will, then logs (gf_callinginfo) in iobuf_ref/unref might help.
TODO: An easier way to debug iobuf leaks.
diff --git a/doc/data-structures/mem-pool.md b/doc/data-structures/mem-pool.md
deleted file mode 100644
index c71aa2a8ddd..00000000000
--- a/doc/data-structures/mem-pool.md
+++ /dev/null
@@ -1,124 +0,0 @@
#Mem-pool
##Background
There was a time when every fop in glusterfs incurred the cost of
allocations/de-allocations for every stack wind/unwind between xlators, because
stack/frame/*_local_t in every wind/unwind was allocated and de-allocated.
Because of all these system calls in the fop path there was a lot of latency,
and the worst part is that most of the time the number of frames/stacks active
at any moment wouldn't cross a threshold. So it was decided that this threshold
number of frames/stacks would be allocated at the beginning of the process,
only once: get one of them from the pool of stacks/frames whenever `STACK_WIND`
is performed and put it back into the pool in `STACK_UNWIND`/`STACK_DESTROY`,
without incurring any extra system calls. The data structures are heap-allocated
only when the threshold number of such items are in active use, i.e. the pool is
in complete use. The % increase in performance once this was added to all the
common data structures (inode/fd/dict etc.) in xlators throughout the stack was
tremendous.

## Data structure
```
struct mem_pool {
        struct list_head list; /* Each member in the mempool is an element
                                  padded with a doubly-linked-list + ptr of
                                  mempool + in-use info. This list is used to
                                  add the element to the list of free members
                                  in the mem-pool */
        int hot_count;  /* number of mempool elements that are in active use */
        int cold_count; /* number of mempool elements that are not in use. If
                           a new allocation is required it will be served from
                           here until all the elements in the pool are in use,
                           i.e.
                           cold-count becomes 0. */
        gf_lock_t lock; /* synchronization mechanism */
        unsigned long padded_sizeof_type; /* Each mempool element is padded
                                             with a doubly-linked-list + ptr
                                             of mempool + in-use info to
                                             operate the pool of elements;
                                             this size is the element size
                                             after padding */
        void *pool;     /* starting address of pool */
        void *pool_end; /* ending address of pool */
        /* If an element's address is in the range between the pool and
           pool_end addresses then it was alloced from the pool, otherwise it
           was 'calloced'; this is very useful for functions like 'mem_put' */
        int real_sizeof_type; /* size of just the element without any padding */
        uint64_t alloc_count; /* Number of times this type of data is
                                 allocated throughout the life of this
                                 process. This may include calloced elements
                                 as well */
        uint64_t pool_misses; /* Number of times the element had to be
                                 allocated from the heap because all elements
                                 from the pool are in active use. */
        int max_alloc; /* Maximum number of elements from the pool in active
                          use at any point in the life of the process. This
                          does *not* include calloced elements */
        int curr_stdalloc; /* Number of elements that are allocated from the
                              heap at the moment because the pool is in
                              complete use.
                              It should be '0' when the pool is not in
                              complete use */
        int max_stdalloc; /* Maximum number of allocations from the heap,
                             after the pool is completely used, that are in
                             active use at any point in the life of the
                             process. */
        char *name; /* Contains xlator-name:data-type as a string */
        struct list_head global_list; /* This is used to insert it into the
                                         global_list of mempools maintained
                                         in 'glusterfs-ctx' */
};
```

##Life-cycle
```
mem_pool_new (data_type, unsigned long count)

This is a macro which expands to
mem_pool_new_fn (sizeof (data_type), count, string-rep-of-data_type)

struct mem_pool *
mem_pool_new_fn (unsigned long sizeof_type, unsigned long count, char *name)

Padded element:
 ----------------------------------------
|list-ptr|mem-pool-address|in-use|Element|
 ----------------------------------------
```

This function allocates the `mem-pool` structure and sets up the pool for use.
The `name` parameter above is the `string` containing the type of the datatype.
This `name` is appended to `xlator-name + ':'` so that it can be easily
identified in things like statedump. `count` is the number of elements that
need to be allocated. `sizeof_type` is the size of each element. Ideally
`('sizeof_type' * 'count')` would be the size of the total pool, but to manage
the pool using `mem_get`/`mem_put` (explained after this section) each element
needs to be padded in front with a `('list', 'mem-pool-address', 'in_use')`
header. So the actual size of the pool allocated is
`('padded_sizeof_type' * 'count')`. Why these extra fields are needed will be
evident after understanding how `mem_get` and `mem_put` are implemented. This
function just initializes all the `list` structures in front of each element
and adds them to `mem_pool->list`, which represents the list of `cold` elements
that can be handed out whenever `mem_get` is called on this mem_pool. It
remembers the mem_pool's start and end addresses in `mem_pool->pool` and
`mem_pool->pool_end` respectively.
It initializes `mem_pool->cold_count` to `count` and `mem_pool->hot_count` to
`0`. This mem-pool is added to the `global_list` maintained in `glusterfs-ctx`.


```
void *mem_get (struct mem_pool *mem_pool)

Initial list before mem_get
 ----------------
| Pool           |
| -----------    |      ----------------------------------------      ----------------------------------------
| | pool-list |  |<--->|list-ptr|mem-pool-address|in-use|Element|<--->|list-ptr|mem-pool-address|in-use|Element|
| -----------    |      ----------------------------------------      ----------------------------------------
 ----------------

List after one mem_get from the pool
 ----------------
| Pool           |
| -----------    |      ----------------------------------------
| | pool-list |  |<--->|list-ptr|mem-pool-address|in-use|Element|
| -----------    |      ----------------------------------------
 ----------------

List when the pool is full:
 ----------------
| Pool           |       extra element that is allocated
| -----------    |      ----------------------------------------
| | pool-list |  |      |list-ptr|mem-pool-address|in-use|Element|
| -----------    |      ----------------------------------------
 ----------------
```

This function is similar to `malloc()`, but it gives memory of type `element`
of this pool. When this function is called it increments
`mem_pool->alloc_count` and checks whether there are any free elements in the
pool that can be returned, by inspecting `mem_pool->cold_count`. If
`mem_pool->cold_count` is non-zero then there are elements in the pool which
are not in active use. It deletes one element from the list of free elements,
decrements `mem_pool->cold_count`, increments `mem_pool->hot_count` to indicate
there is one more element in active use, and updates `mem_pool->max_alloc`
accordingly. It sets `element->in_use` in the padded memory to `1`, and sets
the `element->mem_pool` address to this mem_pool, also in the padded memory
(this is useful for mem_put).
It returns the address of the memory after the padded boundary to the caller.
In the case where all the elements in the pool are in active use, it `callocs`
the element with the padded size and sets the mem_pool address in the padded
memory. To record the pool miss and give useful accounting information about
pool usage, it increments `mem_pool->pool_misses` and
`mem_pool->curr_stdalloc`, and updates `mem_pool->max_stdalloc` accordingly.

```
void *mem_get0 (struct mem_pool *mem_pool)
```
Just as `calloc` is to `malloc`, `mem_get0` is to `mem_get`. It memsets the
memory to all '0' before returning the element.


```
void mem_put (void *ptr)

List before mem_put to the pool
 ----------------
| Pool           |
| -----------    |      ----------------------------------------
| | pool-list |  |<--->|list-ptr|mem-pool-address|in-use|Element|
| -----------    |      ----------------------------------------
 ----------------

List after mem_put to the pool
 ----------------
| Pool           |
| -----------    |      ----------------------------------------      ----------------------------------------
| | pool-list |  |<--->|list-ptr|mem-pool-address|in-use|Element|<--->|list-ptr|mem-pool-address|in-use|Element|
| -----------    |      ----------------------------------------      ----------------------------------------
 ----------------

If mem_put is putting an element that is not from the pool then it is just
freed, so there is no change to the pool
 ----------------
| Pool           |
| -----------    |
| | pool-list |  |
| -----------    |
 ----------------
```

This function is similar to `free()`. Remember that the ptr passed to this
function is the address of the element, so this function first gets the
pointer to the head of the padding in front of it. If this memory falls
between `mem_pool->pool` and `mem_pool->pool_end` then the memory is part of
the 'pool' memory that was allocated, so it does some sanity checks to see
whether the memory is indeed the head of an element, by checking that `in_use`
is set to `1`. It then resets `in_use` to `0`.
It gets the mem_pool address stored in the padded region and adds this element
to the list of free elements, decreasing `mem_pool->hot_count` and increasing
`mem_pool->cold_count`. In the case where the padded element's address does not
fall in the range `mem_pool->pool` to `mem_pool->pool_end`, it just frees the
element and decreases `mem_pool->curr_stdalloc`.

```
void
mem_pool_destroy (struct mem_pool *pool)
```
Deletes this pool from the `global_list` maintained by `glusterfs-ctx` and
frees all the memory allocated in `mem_pool_new`.


###How to pick pool-size
This varies from workload to workload. Create the mem-pool with some reasonable
size and run the workload, then take a statedump after the workload is
complete. In the statedump, if `max_alloc` is always less than `cold_count`,
consider reducing the size of the pool closer to `max_alloc`. On the other
hand, if there are lots of `pool_misses`, then increase the pool size by
`max_stdalloc` to achieve a better hit rate for the pool.