summaryrefslogtreecommitdiffstats
path: root/xlators/performance
Commit message (Collapse)AuthorAgeFilesLines
* nl-cache: Fix a possible crash and stale cachePoornima G2017-06-133-48/+141
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Issue1: Consider the followinf sequence of operations: ... nlc_ctx = nlc_ctx_get (inode i1) ....... -> nlc_clear_cache (i1) gets called as a part of nlc_invalidate or any other callers ... GF_FREE (ii nlc_ctx) LOCK (nlc_ctx->lock); -> This will result in crash as the ctx got freed in nlc_clear_cache. Issue2: lookup on dir1/file1 result in ENOENT add cache to dir1 at time T1 .... CHILD_DOWN at T2 lookup on dir1/file2 result in ENOENT add cache to dir1, but the cache time is still T1 lookup on dir1/file2 - should have been served from cache but the cache time is T1 < T2, hence cache is considered as invalid. So, after CHILD_DOWN the right thing would be to clear the cache and restart caching on that inode. Solution: Do not free nlc_ctx in nlc_clear_cache, but only in inode_forget() The fix for both issue1 and 2 is interleaved hence sending it as single patch. Change-Id: I83d8ed36c049a93567c6d7e63d045dc14ccbb397 BUG: 1458539 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: https://review.gluster.org/17453 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* readdir-ahead: Fix duplicate listing and cache size calculationPoornima G2017-06-121-19/+16
| | | | | | | | | | | | | | | | | | | | | | | Issue: If a opendir is followed by a closedir without readdir, though the prefetched entries were freed, the freed size was not accounted in priv->rda_cache_size. Thus the cache limit will exceed if there are multiple opendir followed by closedir. Fix: Fix the pric->rda_cache_size calculation. Also have removed the inode_ctx_size. Each perf xlator has its own cache limit that it works with. Also the inode_ctx size can change, if a forget/ invalidate or any other factor triggers the inode_ctx size. Change-Id: I9707ec558076ce046e58a55989ec9513c70ea029 BUG: 1431908 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: https://review.gluster.org/17504 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* md-cache: Fix the dump of stat inode in .meta and statedumpPoornima G2017-06-121-8/+8
| | | | | | | | | | Change-Id: If61ed5e4462e98d18a1370734a0bcee1ed94d82d Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: https://review.gluster.org/17491 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* core: fix spelling errorsKaleb S. KEITHLEY2017-06-021-1/+1
| | | | | | | | | | | | | | fixes for various minor spelling errors and typos Reported-by: Patrick Matthäi <pmatthaei@debian.org> Change-Id: Ic1be36f82e3d822bbdc9559878bd79520fc0fcd5 BUG: 1457808 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: https://review.gluster.org/17442 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org>
* glusterfs: Not able to mount running volume after enable brick mux and ↵Mohit Agrawal2017-05-311-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | stopped any volume Problem: After enabled brick mux if any volume has down and then try ot run mount with running volume , mount command is hung. Solution: After enable brick mux server has shared one data structure server_conf for all associated subvolumes.After down any subvolume in some ungraceful manner (remove brick directory) posix xlator sends GF_EVENT_CHILD_DOWN event to parent xlatros and server notify updates the child_up to false in server_conf.When client is trying to communicate with server through mount it checks conf->child_up and it is FALSE so it throws message "translator are not yet ready". From this patch updated structure server_conf to save child_up status for xlator wise. Another improtant correction from this patch is cleanup threads from server side xlators after stop the volume. BUG: 1453977 Change-Id: Ic54da3f01881b7c9429ce92cc569236eb1d43e0d Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Reviewed-on: https://review.gluster.org/17356 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra Talur <rtalur@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* perf/ioc: Fix race causing crash when accessing freed pageN Balachandran2017-05-301-38/+40
| | | | | | | | | | | | | | | | | | | | | | ioc_inode_wakeup does not lock the ioc_inode for the duration of the operation, leaving a window where ioc_prune could find a NULL waitq and hence free the page which ioc_inode_wakeup later tries to access. Thanks to Mohit for the analysis. credit: moagrawa@redhat.com Change-Id: I54b064857e2694826d0c03b23f8014e3984a3330 BUG: 1456385 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: https://review.gluster.org/17410 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* nl-cache: Remove null check validation for frame->local in lookup cbkRavishankar N2017-05-301-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | For nameless lookups, nl-cache does not init frame local, so the cbk throws up messages like these flooding the logs, especially whenenver gfid lookup on '/' is done (i.e. loc.path="/" and loc.gfid=1). [2017-05-30 04:35:31.628443] E [nl-cache.c:201:nlc_lookup_cbk] (-->/usr/lib64/glusterfs/3.8.4/xlator/performance/io-cache.so(+0x3d81) [0x7f0883005d81] -->/usr/lib64/glusterfs/3.8.4/xlator/performance/quick-read.so(+0x3127) [0x7f0882dfb127] -->/usr/lib64/glusterfs/3.8.4/xlator/performance/nl-cache.so(+0x4cd3) [0x7f08829e0cd3] ) 0-distrep-nl-cache: invalid argument: local [Invalid argument] Fixed it. Change-Id: I21cb44a9d2a324617e43f46fed83c9a0942d3a0b BUG: 1456653 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: https://review.gluster.org/17417 Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Poornima G <pgurusid@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* nl-cache: In case of nameless operations do not cachePoornima G2017-05-221-4/+7
| | | | | | | | | | | | | | | | | | Issue: In nameless lookup/other fops, parent inode will be NULL, when we try to add the cache to the NULL inode, it causes a crash. Hence handle the scenario of nameless fops, and do not cache/serve the nameless fops. Change-Id: I3b90f882ac89e6aaf3419db89e6f890797f37700 BUG: 1451588 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: https://review.gluster.org/17316 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* rda, glusterd: Change the max of rda-cache-limit to INFINITYPoornima G2017-05-211-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Issue: The max value of rda-cache-limit is 1GB before this patch. When parallel-readdir is enabled, there will be many instances of readdir-ahead, hence the rda-cache-limit depends on the number of instances. Eg: On a volume with distribute count 4, rda-cache-limit when parallel-readdir is enabled, will be 4GB instead of 1GB. Consider a followinf sequence of operations: - Enable parallel readdir - Set rda-cache-limit to lets say 3GB - Disable parallel-readdir, this results in one instance of readdir-ahead and the rda-cache-limit will be back to 1GB, but the current value is 3GB and hence the mount will stop working as 3GB > max 1GB. Solution: To fix this, we can limit the cache to 1GB even when parallel-readdir is enabled. But there is no necessity to limit the cache to 1GB, it can be increased if the system has enough resources. Hence getting rid of the rda-cache-limit max value is more apt. If we just change the rda-cache-limit max to INFINITY, we will render older(<3.11) clients broken, when the rda-cache-limit is set to > 1GB (as the older clients still expect a value < 1GB). To safely change the max value of rda-cache-limit to INFINITY, add a check in glusted to verify all the clients are > 3.11 if the value exceeds 1GB. Change-Id: Id0cdda3b053287b659c7bf511b13db2e45b92032 BUG: 1446516 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: https://review.gluster.org/17338 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* nl-cache: Remove the max limit for nl-cache-limit and nl-cache-timeoutPoornima G2017-05-151-2/+0
| | | | | | | | | | | | | | The max limit is better unset when arbitrary. Otherwise in the future if max has to be changed, it can break backward compatility. Change-Id: I4337a3789a2d0d5cc8e2bf687a22536c97608461 BUG: 1442569 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: https://review.gluster.org/17261 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* libglusterfs: fix race condition in client_ctx_setZhou Zhengping2017-05-121-2/+4
| | | | | | | | | | | | | | | | | | | follow procedures: 1.thread1 client_ctx_get return NULL 2.thread 2 client_ctx_set ctx1 ok 3.thread1 client_ctx_set ctx2 ok thread1 use ctx1, thread2 use ctx2 and ctx1 will leak Change-Id: I990b02905edd1b3179323ada56888f852d20f538 BUG: 1449232 Signed-off-by: Zhou Zhengping <johnzzpcrystal@gmail.com> Reviewed-on: https://review.gluster.org/17219 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* performance/read-ahead: prevent stale data being returned to application.Raghavendra G2017-05-091-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Assume that fd is shared by two application threads/processes. T0 read is triggered from app-thread t1 and read call passes through write-behind. T1 app-thread t2 issues a write. The page on which read from t1 is waiting is marked stale T2 write-behind caches write and indicates to application as write complete. T3 app-thread t2 issues read to same region. Since, there is already a page for that region (created as part of read at T0), this read request waits on that page to be filled (though it is stale, which is a bug). T4 read (triggered at T0) completes from brick (with write still pending). Now both read requests from t1 and t2 are served this data (though data is stale from app-thread t2's perspective - which is a bug) T5 write is flushed to brick by write-behind. Fix is to not to serve data from a stale page, but instead initiate a fresh read to back-end. Change-Id: Id6af733464fa41bb4e81fd29c7451c73d06453fb BUG: 1414242 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: https://review.gluster.org/7447 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Csaba Henk <csaba@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Zhou Zhengping <johnzzpcrystal@gmail.com> Reviewed-by: Amar Tumballi <amarts@redhat.com>
* libglusterfs: stop special casing "cache-size" in size_t validationCsaba Henk2017-05-082-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The original situation was as follows: The function that validates xlator options indicating a size, xlator_option_validate_sizet(), handles the case when the name of the option is "cache-size" in a special way. - Xlator options (things of type volume_option_t) has a min and max attribute of type double. - An xlator option is endowed with a gluster specific type (not C type). An instance of an xlator option goes through a validation process by a type specific validator function (which are collected in option.c). - Validators of numeric types - size being one of them - make use the min and max attributes to perform a range check, except in one case: if an option is defined with min = max = 0, then this option will be exempt of range checking. (Note: the volume_option_t definition features the following comments along the min, max fields: double min; /* 0 means no range */ double max; /* 0 means no range */ which is slightly misleading as it lets one to conclude that zeroing min or max buys exemption from low or high boundary check, which is not true -- only *both* being zero buys exemption.) - Besides this, the validator for options of size type, xlator_option_validate_sizet() special cases options named "cache-size" so that only min is enforced. (The only consequence of a value exceeding max is that glusterd logs a warning about it, but the cli user who makes such a setting gets no feedback on it.) - This was introduced because a hard coded limit is not useful for io-cache and quick-read. They rather use a runtime calculated upper limit. (See changes I7dd4d8c53051b89a293696abf1ee8dc237e39a20 I9c744b5ace10604d5a814e6218ca0d83c796db80 about the last two points.) - As an unintended consequence, the upper limit check of cache-size of write-behind, for which a conventional hard coded limit is specified, is defeated. What we do about it: - Remove the special casing clause for cache-size in xlator_option_validate_sizet. Thus the general range check policy (as described above) will apply to cache-size too. - To implement a lower bound only check by the validator for cache-size of io-cache and quick-read, change the max attribute of these options to INFINITY. The only behavioral difference is the omission of the warnings about cache-size of io-cache and quick-read exceeding the former max values. (They were rather heuristic anyway.) BUG: 1445609 Change-Id: I0bd8bd391fa7d926f76e214a2178833fe4673b4a Signed-off-by: Csaba Henk <csaba@redhat.com> Reviewed-on: https://review.gluster.org/17125 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* nl-cache: free nlc_conf_t in fini()Niels de Vos2017-05-021-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The (xlator_t*)->private structure in negative-lookup-cache is allocated in the init() function of the xlator, but never free'd. Valgrind detected this as: 656 bytes in 1 blocks are definitely lost in loss record X of Y at 0x..+ calloc (/builddir/build/BUILD/valgrind-3.11.0/coregrind/m_replacemalloc/vg_replace_malloc.c:711) by 0x.. __gf_calloc (/usr/src/debug/glusterfs-3.11dev/libglusterfs/src/mem-pool.c:117) by 0x.. init (/usr/src/debug/glusterfs-3.11dev/xlators/performance/nl-cache/src/nl-cache.c:669) by 0x.. __xlator_init (/usr/src/debug/glusterfs-3.11dev/libglusterfs/src/xlator.c:472) by 0x.. xlator_init (/usr/src/debug/glusterfs-3.11dev/libglusterfs/src/xlator.c:498) by 0x.. glusterfs_graph_init (/usr/src/debug/glusterfs-3.11dev/libglusterfs/src/graph.c:321) by 0x.. glusterfs_graph_activate (/usr/src/debug/glusterfs-3.11dev/libglusterfs/src/graph.c:693) by 0x.. glfs_process_volfp (/usr/src/debug/glusterfs-3.11dev/api/src/glfs-mgmt.c:79) by 0x.. glfs_volumes_init (/usr/src/debug/glusterfs-3.11dev/api/src/glfs.c:160) by 0x.. glfs_init_common (/usr/src/debug/glusterfs-3.11dev/api/src/glfs.c:868) by 0x.. glfs_init@@GFAPI_3.4.0 (/usr/src/debug/glusterfs-3.11dev/api/src/glfs.c:913) by 0x.. main (/root/gluster-debug/gfapi-load-volfile/gfapi-load-volfile.c:54) When the xlators is unloaded, it should free the resources it allocated. This can easily be done in the fini() function. Change-Id: I079e78cc207145bc542e2282fc4cf2bb4dadc28a BUG: 1442569 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: https://review.gluster.org/17143 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* core: make the per glusterfs_ctx_t timer-wheel refcountedNiels de Vos2017-05-011-11/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | xlators can use a 'global' timer-wheel for scheduling events. This timer-wheel is managed per glusterfs_ctx_t, but does not need to be allocated for every graph. When an xlator wants to use the timer-wheel, it will be instanciated on demand, and provided to xlators that request it later on. By adding a reference counter to the glusterfs_ctx_t for the timer-wheel, the threads and structures can be cleaned up when the last xlator does not have a need for it anymore. In general, the xlators request the timer-wheel in init(), and they should return it in fini(). Because the timer-wheel is managed per glusterfs_ctx_t, the functions can be added to ctx.c and do not need to live in their very minimal tw.[ch] files. Change-Id: I19d225b39aaa272d9005ba7adc3104c3764f1572 BUG: 1442788 Reported-by: Poornima G <pgurusid@redhat.com> Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: https://review.gluster.org/17068 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Zhou Zhengping <johnzzpcrystal@gmail.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* nl-cache: Fix the issue in refd_inode counting and prune the cachePoornima G2017-04-263-6/+13
| | | | | | | | | | | | | Change-Id: I5b9beb8502667bc3876385900bc01b6491348716 BUG: 1442569 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: https://review.gluster.org/17110 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* Implement negative lookup cachePoornima G2017-04-208-1/+2169
| | | | | | | | | | | | | | | | | | | | | Before creating any file negative lookups(1 in Fuse, 4 in SMB etc.) are sent to verify if the file already exists. By serving these lookups from the cache when possible, increases the create performance by multiple folds in SMB access and some percentage in Fuse/NFS access. Feature page: https://review.gluster.org/#/c/16436 Updates #82 Change-Id: Ib1c0e7ac7a386f943d84f6398c27f9a03665b2a4 BUG: 1442569 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: https://review.gluster.org/16952 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* dht: The xattrs sent in readdirp should be sent in opendir aswellPoornima G2017-04-062-33/+16
| | | | | | | | | | | | | | As readdir-ahead can be loaded as a child of dht, dht has to specify the xattrs it is intrested in, as part of opendir call itself. Change-Id: I012ef96cc143b0cef942df78aa7150d85ec38606 BUG: 1431908 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: https://review.gluster.org/16902 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* libglusterfs: provide standardized atomic operationsNiels de Vos2017-04-051-48/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The current macros ATOMIC_INCREMENT() and ATOMIC_DECREMENT() expect a lock as first argument. There are at least two issues with this approach: 1. this lock is unused on architectures that have atomic operations 2. some structures use a single lock for multiple variables By defining a gf_atomic_t type, the unused lock can be removed, saving a few bytes on modern architectures. Because the gf_atomic_t type locates the lock for the variable (in case of older architectures), each variable is protected the same on all architectures. This makes the behaviour across all architectures more equal (per variable locking, by a gf_lock_t or compiler optimization). BUG: 1437037 Change-Id: Ic164892b06ea676e6a9566f8a98b7faf0efe76d6 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: https://review.gluster.org/16963 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* reddir-ahead: Fix EOD propagation problemPoornima G2017-04-051-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | In readdirp fop, op_errno is overloaded to indicate the EOD detection. If op_errno contains ENOENT, then it indicates that there are no further entries pending read in the directory. Currently NFS uses the ENOENT to identify the EOD. Issue: NFS clients issues a 4K buffer for readdirp, readdir-ahead converts it to 128K buffer as its reading ahead. If there are 100 entries in the bricks, 128K can get all 100 and store in readdir-ahead, but only 23 entries that can be fit in 4K will be sent to NFS. Since the whole 100 entries were read from brick, the op_errno is set to ENOENT, and the op_errno is propagated as is when sent to NFS. Hence NFS client in reading 23 entries thinks it reached EOD. Solution: Do not propogate ENOENT errno, unless all the entries are read from the readdir ahead buffer. Change-Id: I4f173a77b21ab9e98ae35e291a45b8fc0cde65bd BUG: 1436086 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: https://review.gluster.org/16953 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* build: miscellaneous spelling fixesPatrick Matthäi2017-04-021-1/+1
| | | | | | | | | | | | | | | | Debian builds detected spelling issues with GlusterFS 3.10.1. Instead of carrying the patch in the Debian sources, let's include the fixes here too. Change-Id: I38db6adf142f7ec247bffd47aa1e6ff1a0c49e00 Reported-by: Patrick Matthäi <pmatthaei@debian.org> BUG: 1437853 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: https://review.gluster.org/16973 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* performance/write-behind: Honor the client pid setKotresh HR2017-03-101-0/+3
| | | | | | | | | | | | | | | | write-behind xlator does not honor the client pid being set. It doesn't pass down the client pid saved in 'frame->root->pid'. This patch fixes the same. Change-Id: I838dcf43f56d6d0aa1d2c88811a2b271d9e88d05 BUG: 1430608 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: https://review.gluster.org/16854 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* events: use attribute(format(/printf)) to catch fmt string errorsKaleb S. KEITHLEY2017-02-261-1/+1
| | | | | | | | | | | | | and statedump too. Also "const char *" (versus just "char *") for the fmt param. Change-Id: Ic63734a673208a2cd49aebccce7659816e6179e3 BUG: 1399196 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: https://review.gluster.org/15881 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* glusterd, readdir-ahead: Fix backward incompatibilityPoornima G2017-02-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Issue: Any opion is spcified in two places: In the options[] of xlator itself and glusterd-volume-set.c. The default value of this option can be specified in both the places. If its specified only in xlator then the volfile generated will not have the option and default value, it will be assigned during graph initialization. With patch [1] the option rda-request-size was changed from INT to SIZET type, and default was changed from 131072 to 128KB, but was specified only in the readdir-ahead.c. Thus with this patch alone the volfile entry for readdir-ahead looks like: volume patchy-readdir-ahead type performance/readdir-ahead subvolumes patchy-read-ahead end-volume With patch [2], the default of option rda-request-size was specified in glusterd-volume-set.c as well(as it was necessary fr parallel readdir). With this patch the readdir entry in the volfile will look like: volume patchy-readdir-ahead type performance/readdir-ahead option rda-cache-limit 10MB option rda-request-size 128KB option parallel-readdir off subvolumes patchy-read-ahead end-volume Now consider the server has both these patches and client doesn't. Server will generate a volfile with entry: The old clients which thought the option rda-request-size is of type INT will now recieve the value 128KB which it willn't understand, and hence fail the mount. The issue is seen only with the combination of [1] and [2]. Solution: Instead of specifying 128KB as default in glusterd we specify 131072 so that the old clients will interpret as INT and new ones as 128KB Credits: Raghavendra G Change-Id: I0c269a5890957fd8a38e9a05bdec088645a6688a BUG: 1423410 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: https://review.gluster.org/16657 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* readdir-ahead: Remove unnecessary loggingPoornima G2017-02-171-6/+0
| | | | | | | | | | | | | | dict_get_int can return < 0 when key is not found is a valid case. Hence no need to log. Change-Id: If0795b0f178adbb94b10efc563506993f7411962 BUG: 1423369 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: https://review.gluster.org/16654 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* performance/decompounder: Have terminal value in options[]Pranith Kumar K2017-02-161-0/+1
| | | | | | | | | | | | | | Absence of terminal values is leading to buffer-over-flow errors in address sanitizer. BUG: 1422152 Change-Id: I769c0e4b5bbb3ef2849b8d1097b9def522ae08d9 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: https://review.gluster.org/16615 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* md-cache: initialize mdc_counter.lockVitaly Lipatov2017-02-031-0/+1
| | | | | | | | | | | | | | | | | | add missed LOCK_INIT to fix INCREMENT_ATOMIC on conf->mdc_counter.lock when pthread_spin_* using Change-Id: I680bd6f41e3b8a1852ed969bf6794cbf4c1ccdd4 BUG: 1417913 Signed-off-by: Vitaly Lipatov <lav@etersoft.ru> Reviewed-on: https://review.gluster.org/16515 Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Niels de Vos <ndevos@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Poornima G <pgurusid@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* performance/write-behind: access stub only if available duringRaghavendra G2017-02-021-12/+12
| | | | | | | | | | | | | statedump Change-Id: Ia5dd718458a5e32138012f81f014d13fc6b28be2 BUG: 1415115 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: https://review.gluster.org/16440 Reviewed-by: N Balachandran <nbalacha@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* libglusterfs+transport+io-threads: fix 256KB stack abuseJeff Darcy2017-02-011-1/+1
| | | | | | | | | | | | | | | | | Some functions were allocating 64K booleans, which are (crazily) mapped to 4-byte ints, for a total of 256KB per call. Changed to use bitfields instead, so usage is now only 8KB per call. This was the impediment to changing the io-threads stack size, so that has been adjusted too. Change-Id: I8781c4f2c8f2b830f4535e366995fac8dd0a8653 BUG: 1418095 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: https://review.gluster.org/15745 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* performance/write-behind: do __wb_request_unref within locksRaghavendra G2017-01-261-3/+2
| | | | | | | | | | | | | | | | | Since __wb_request_unref can remove the request from various lists, calling it without holding wb_inode->lock results in corruptions when other threads simultaneously try to access the lists this request is part of. Thanks to "Nithya Balachandran" <nbalacha@redhat.com> for pointing out the bug. Change-Id: I78fb6433c2e212500d07780f7b45c5a0e2bf9209 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: https://review.gluster.org/16464 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* Readdir-ahead : Honor readdir-optimise option of dhtPoornima G2017-01-243-1/+20
| | | | | | | | | | | Change-Id: I9c5e65b32e316e6a2fc7e1f5c79fce79386b78e2 BUG: 1401812 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: https://review.gluster.org/16071 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* md-cache: Cache security.ima xattrsPoornima G2017-01-201-2/+79
| | | | | | | | | | | | | | | | From kernel version 3.X or greater, creating of a file results in removexattr call on security.ima xattr. But this xattr is not set on the file unless IMA feature is active. With this patch, removxattr call returns ENODATA if it is not found in the cache. Change-Id: I8136096598a983aebc09901945eba1db1b2f93c9 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/16296 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* gluster: Typos in logs and commentsN Balachandran2017-01-191-4/+4
| | | | | | | | | | | | | | | Replaced 'recieve' with 'receive'. Change-Id: I4c1c9147db5437feb81e4c83ed074440aaa28e07 BUG: 1414645 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/16429 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Manikandan Selvaganesh <manikandancs333@gmail.com> Tested-by: Manikandan Selvaganesh <manikandancs333@gmail.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* readdir-ahead : Perform STACK_UNWIND outside of mutex locksPoornima G2017-01-091-48/+67
| | | | | | | | | | | | | | | | | | Currently STACK_UNWIND is performnd within ctx->lock. If readdir-ahead is loaded as a child of dht, then there can be scenarios where the function calling STACK_UNWIND becomes re-entrant. Its a good practice to not call STACK_WIND/UNWIND within local mutex's Change-Id: If4e869849d99ce233014a8aad7c4d5eef8dc2e98 BUG: 1401812 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/16068 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* md-cache: Cache updated as a part of invalidate should not update timePoornima G2017-01-081-14/+25
| | | | | | | | | | | | | | | | | Currently when a invalidate happens we update the cache along with the cache time. The problem with this is, upcall doesn't update the last access time of a client when an invalidation is sent, thus resulting in a timewindow where the md-cache has cached, but the upcall is unaware and hence upcall will not further invalidate the cache(unless a fop is sent from the same client, and upcall updates its database to reflect the same) Change-Id: Ibceb8d2fc360582752846bbf7fd59697d5424754 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/16295 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* performance/write-behind: Add debug messagesRaghavendra G2017-01-081-17/+168
| | | | | | | | | | Change-Id: I2ea1350fcbe4b6c06dcb8093b28316f734cd3b48 BUG: 1379655 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/16285 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* ec: Invalidations in disperse volume should not update the statPoornima G2017-01-051-2/+2
| | | | | | | | | | | | | | | | | | | | | | Issue: In disperse volume, the file is present across bricks, hence the stat from one brick doesn't carry the valid size of the file. Therefore the upcall from one brick updating the md-cache results in wrong size being updated. Fix: If the notification is cache invalidation then, indicate md-cache that the attributes is invalid. BUG: 1410375 Change-Id: Id89d2283478e70b62b435a8891fffc86d2be8cb2 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/16329 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* readdir-ahead: Enhance EOD detection logicPoornima G2017-01-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Issue: Currently end of directory is identified on obtaining a readdirp_cbk with op_ret = 0 (i.e. 0 entries fetched in readdirp). Thus an extra readdirp is required for every directory just to identify EOD. Consider a case of listing large number of small directories. The readdirp fops required are doubled in that case. Solution: On reaching the EOD, posix sets the op_errno to ENOENT, hence along with looking for 'op_ret == 0' we also look for 'operrno == ENOENT' ehile checking for EOD condition Change-Id: I7a5b52e7b98f5dc236c387635fcc651dac0171b3 BUG: 1401812 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/16042 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* performance/readdir-ahead: limit cache sizeRaghavendra G2016-12-222-35/+85
| | | | | | | | | | | | | | | | This patch introduces a new option called "rda-cache-limit", which is the maximum value the entire readdir-ahead cache can grow into. Since, readdir-ahead holds a reference to inode through dentries, this patch also accounts memory stored by various xlators in inode contexts. Change-Id: I84cc0ca812f35e0f9041f8cc71effae53a9e7f99 BUG: 1356960 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/16137 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Poornima G <pgurusid@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* performance/write-behind: Add more fops for checking dependency withRaghavendra G2016-12-121-0/+317
| | | | | | | | | | | | | | | | | | cached writes Fops like readdirp, link, fallocate, discard, zerofill return iatt of files in their responses. This iatt can be cached by md-cache. Hence it is important that write-behind maintains relative ordering of these fops with cached writes. Failure to do so, can result in md-cache storing stale iatts and returning the same to applications. Change-Id: Icfe12ad807e42fe9e52a9f63e47ce63f511c6946 BUG: 1390050 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/15757 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* dht/md-cache: Filter invalidate if the file is made a linkto filePoornima G2016-12-021-0/+51
| | | | | | | | | | | | | | | | | | | | | | | | Upcall as a part of setattr, sends an invalidation and the invalidation carries the resulting stat value. When a file is converted to linkto files, even then an invalidation is set and as a result the mountpoint shows the sticky bit in the stat of the file. eg: ---------T. 945 root root 0 Nov 8 10:14 hardlink.999 Fix: When dht recieves a notification of sticky bit change, it updates the flag, to indicate md-cache to send the subsequent lookup. Change-Id: Ic2fd7a5b196db0754f9b97072e644e6bf69da606 BUG: 1392713 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15789 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Susant Palai <spalai@redhat.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
* io-cache: Fix a read hangPoornima G2016-11-231-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Issue: ===== In certain cases, there was no unwind of read from read-ahead xlator, thus resulting in hang. RCA: ==== In certain cases, ioc_readv() issues STACK_WIND_TAIL() instead of STACK_WIND(). One such case is when inode_ctx for that file is not present (can happen if readdirp was called, and populates md-cache and serves all the lookups from cache). Consider the following graph: ... io-cache (parent) | readdir-ahead | read-ahead ... Below is the code snippet of ioc_readv calling STACK_WIND_TAIL: ioc_readv() { ... if (!inode_ctx) STACK_WIND_TAIL (frame, FIRST_CHILD (frame->this), FIRST_CHILD (frame->this)->fops->readv, fd, size, offset, flags, xdata); /* Ideally, this stack_wind should wind to readdir-ahead:readv() but it winds to read-ahead:readv(). See below for explaination. */ ... } STACK_WIND_TAIL (frame, obj, fn, ...) { frame->this = obj; /* for the above mentioned graph, frame->this will be readdir-ahead * frame->this = FIRST_CHILD (frame->this) i.e. readdir-ahead, which * is as expected */ ... THIS = obj; /* THIS will be read-ahead instead of readdir-ahead!, as obj expands * to "FIRST_CHILD (frame->this)" and frame->this was pointing * to readdir-ahead in the previous statement. */ ... fn (frame, obj, params); /* fn will call read-ahead:readv() instead of readdir-ahead:readv()! * as fn expands to "FIRST_CHILD (frame->this)->fops->readv" and * frame->this was pointing ro readdir-ahead in the first statement */ ... } Thus, the readdir-ahead's readv() implementation will be skipped, and ra_readv() will be called with frame->this = "readdir-ahead" and this = "read-ahead". This can lead to corruption / hang / other problems. But in this perticular case, when 'frame->this' and 'this' passed to ra_readv() doesn't match, it causes ra_readv() to call ra_readv() again!. Thus the logic of read-ahead readv() falls apart and leads to hang. Solution: ========= Ideally, STACK_WIND_TAIL() should be modified as: STACK_WIND_TAIL (frame, obj, fn, ...) { next_xl = obj /* resolve obj as the variables passed in obj macro can be overwritten in the further instrucions */ next_xl_fn = fn /* resolve fn and store in a tmp variable, before modifying any variables */ frame->this = next_xl; ... THIS = next_xl; ... next_xl_fn (frame, next_xl, params); ... } But for this solution, knowing the type of variable 'next_xl_fn' is a challenge and is not easy. Hence just modifying all the existing callers to pass "FIRST_CHILD (this)" as obj, instead of "FIRST_CHILD (frame->this)". Change-Id: I179ffe3d1f154bc5a1935fd2ee44e912eb0fbb61 BUG: 1388292 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15901 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* performance/io-threads: Exit threads in fini() as wellPranith Kumar K2016-11-212-11/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: io-threads starts the thread in 'init()' but doesn't clean them up on 'fini()'. It relies on PARENT_DOWN to exit threads but there can be cases where event before PARENT_UP the graph init code can think of issuing fini(). This code path is hit when glfs_init() is called on a volume that is in 'stopped' state. It leads to a crash in ganesha process, because the io-thread tries to access freed memory. Fix: Ideal fix would be to wait for all fops in io-thread list to be completed on PARENT_DOWN, and have fini() do cleanup of threads. Because there is no proper documentation about how PARENT_DOWN/fini are supposed to be used, we are getting different kinds of sequences in different higher level protocols. So for now cleaning up in both PARENT_DOWN and fini(). Fuse doesn't call fini() gfapi is not calling PARENT_DOWN in some cases, so for now I don't see another way out. BUG: 1396793 Change-Id: I9c9154e7d57198dbaff0f30d3ffc25f6d8088aec Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/15888 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* afr,dht,ec: Replace GF_EVENT_CHILD_MODIFIED with event SOME_DESCENDENT_DOWN/UPPoornima G2016-11-211-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently these are few events related to child_up/down: GF_EVENT_CHILD_UP : Issued when any of the protocol client connects. GF_EVENT_CHILD_MODIFIED : Issued by afr/dht/ec GF_EVENT_CHILD_DOWN : Issued when any of the protocol client disconnects. These events get modified at the dht/afr/ec layers. Here is a brief on the same. DHT: - All the subvolumes reported once, and atleast one child came up, then GF_EVENT_CHILD_UP is issued - connect GF_EVENT_CHILD_UP is issued - disconnect GF_EVENT_CHILD_MODIFIED is issued - All the subvolumes disconnected, GF_EVENT_CHILD_DOWN is issued AFR: - First subvolume came up, then GF_EVENT_CHILD_UP is issued - Subsequent subvolumes coming up, results in GF_EVENT_CHILD_MODIFIED - Any of the subvolumes go down, then GF_EVENT_SOME_CHILD_DOWN is issued - Last up subvolume goes down, then GF_EVENT_CHILD_DOWN is issued Until the patch [1] introduced GF_EVENT_SOME_CHILD_UP, GF_EVENT_CHILD_MODIFIED was issued by afr/dht when any of the subvolumes go up or down. Now with md-cache changes, there is a necessity to differentiate between child up and down. Hence, introducing GF_EVENT_SOME_DESCENDENT_DOWN/UP and getting rid of GF_EVENT_CHILD_MODIFIED. [1] http://review.gluster.org/12573 Change-Id: I704140b6598f7ec705493251d2dbc4191c965a58 BUG: 1396038 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15764 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
* performance/open-behind: Avoid deadlock in statedumpPranith Kumar K2016-11-091-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: open-behind is taking fd->lock then inode->lock where as statedump is taking inode->lock then fd->lock, so it is leading to deadlock In open-behind, following code exists: void ob_fd_free (ob_fd_t *ob_fd) { loc_wipe (&ob_fd->loc); <<--- this takes (inode->lock) ....... } int ob_wake_cbk (call_frame_t *frame, void *cookie, xlator_t *this, int op_ret, int op_errno, fd_t *fd_ret, dict_t *xdata) { ....... LOCK (&fd->lock); <<---- fd->lock { ....... __fd_ctx_del (fd, this, NULL); ob_fd_free (ob_fd); <<<--------------- } UNLOCK (&fd->lock); ....... } ================================================================= In statedump this code exists: inode_dump (inode_t *inode, char *prefix) { ....... ret = TRY_LOCK(&inode->lock); <<---- inode->lock ....... fd_ctx_dump (fd, prefix); <<<----- ....... } fd_ctx_dump (fd_t *fd, char *prefix) { ....... LOCK (&fd->lock); <<<------------------ this takes fd-lock { ....... } Fix: Make sure open-behind doesn't call ob_fd_free() inside fd->lock BUG: 1393259 Change-Id: I4abdcfc5216270fa1e2b43f7b73445f49e6d6e6e Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/15808 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Poornima G <pgurusid@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* performance/write-behind: fix flush stuck by former failed writesRyan Ding2016-11-011-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the issue is happened in this case: assume a file is opened with fd1 and fd2. 1. some WRITE opto fd1 got error, they were add back to 'todo' queue because of those error. 2. fd2 closed, a FLUSH op is send to write-behind. 3. FLUSH can not be unwind because it's not a legal waiter for those failed write(as func __wb_request_waiting_on() say). and those failed WRITE also can not be ended if fd1 is not closed. fd2 stuck in close syscall. to resolve this issue, we can change the way we determine 2 requests is 'conflict': flush/fsync is not conflict with those write that is not belonged to them. so __wb_pick_winds() can wind the FLUSH op. below is some information when the stuck issue happen: glusterdump logs: [xlator.performance.write-behind.wb_inode] path=/ltp-F9eG0ZSOME/rw-buffered-16436 inode=0x7fdbe8039b9c window_conf=1048576 window_current=249856 transit-size=0 dontsync=0 [.WRITE] request-ptr=0x7fdbe8020200 refcount=1 wound=no generation-number=4 req->op_ret=-1 req->op_errno=116 sync-attempts=3 sync-in-progress=no size=131072 offset=1220608 lied=-1 append=0 fulfilled=0 go=0 [.WRITE] request-ptr=0x7fdbe8068c30 refcount=1 wound=no generation-number=5 req->op_ret=-1 req->op_errno=116 sync-attempts=2 sync-in-progress=no size=118784 offset=1351680 lied=-1 append=0 fulfilled=0 go=0 [.FLUSH] request-ptr=0x7fdbe8021cd0 refcount=1 wound=no generation-number=6 req->op_ret=0 req->op_errno=0 sync-attempts=0 gdb detail about above 3 requests: (gdb) print *((wb_request_t *)0x7fdbe8021cd0) $2 = {all = {next = 0x7fdbe803a608, prev = 0x7fdbe8068c30}, todo = {next = 0x7fdbe803a618, prev = 0x7fdbe8068c40}, lie = {next = 0x7fdbe8021cf0, prev = 0x7fdbe8021cf0}, winds = {next = 0x7fdbe8021d00, prev = 0x7fdbe8021d00}, unwinds = {next = 0x7fdbe8021d10, prev = 0x7fdbe8021d10}, wip = { next = 0x7fdbe8021d20, prev = 0x7fdbe8021d20}, stub = 0x7fdbe80224dc, write_size = 0, orig_size = 0, total_size = 0, op_ret = 0, op_errno = 0, refcount = 1, wb_inode = 0x7fdbe803a5f0, fop = GF_FOP_FLUSH, lk_owner = {len = 8, data = "W\322T\f\271\367y$", '\000' <repeats 1015 times>}, iobref = 0x0, gen = 6, fd = 0x7fdbe800f0dc, wind_count = 0, ordering = {size = 0, off = 0, append = 0, tempted = 0, lied = 0, fulfilled = 0, go = 0}} (gdb) print *((wb_request_t *)0x7fdbe8020200) $3 = {all = {next = 0x7fdbe8068c30, prev = 0x7fdbe803a608}, todo = {next = 0x7fdbe8068c40, prev = 0x7fdbe803a618}, lie = {next = 0x7fdbe8068c50, prev = 0x7fdbe803a628}, winds = {next = 0x7fdbe8020230, prev = 0x7fdbe8020230}, unwinds = {next = 0x7fdbe8020240, prev = 0x7fdbe8020240}, wip = { next = 0x7fdbe8020250, prev = 0x7fdbe8020250}, stub = 0x7fdbe8062c3c, write_size = 131072, orig_size = 4096, total_size = 0, op_ret = -1, op_errno = 116, refcount = 1, wb_inode = 0x7fdbe803a5f0, fop = GF_FOP_WRITE, lk_owner = {len = 8, data = '\000' <repeats 1023 times>}, iobref = 0x7fdbe80311a0, gen = 4, fd = 0x7fdbe805c89c, wind_count = 3, ordering = {size = 131072, off = 1220608, append = 0, tempted = -1, lied = -1, fulfilled = 0, go = 0}} (gdb) print *((wb_request_t *)0x7fdbe8068c30) $4 = {all = {next = 0x7fdbe8021cd0, prev = 0x7fdbe8020200}, todo = {next = 0x7fdbe8021ce0, prev = 0x7fdbe8020210}, lie = {next = 0x7fdbe803a628, prev = 0x7fdbe8020220}, winds = {next = 0x7fdbe8068c60, prev = 0x7fdbe8068c60}, unwinds = {next = 0x7fdbe8068c70, prev = 0x7fdbe8068c70}, wip = { next = 0x7fdbe8068c80, prev = 0x7fdbe8068c80}, stub = 0x7fdbe806746c, write_size = 118784, orig_size = 4096, total_size = 0, op_ret = -1, op_errno = 116, refcount = 1, wb_inode = 0x7fdbe803a5f0, fop = GF_FOP_WRITE, lk_owner = {len = 8, data = '\000' <repeats 1023 times>}, iobref = 0x7fdbe8052b10, gen = 5, fd = 0x7fdbe805c89c, wind_count = 2, ordering = {size = 118784, off = 1351680, append = 0, tempted = -1, lied = -1, fulfilled = 0, go = 0}} you can see they are all on 'todo' queue, and FLUSH op fd is not the same WRITE op fd. Change-Id: Id687f9cd3b9f281e1a97c83f1ce981ede272b8ab BUG: 1372211 Signed-off-by: Ryan Ding <ryan.ding@open-fs.com> Reviewed-on: http://review.gluster.org/15380 Tested-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* md-cache: Invalidate cache entry for open() with O_TRUNCSoumya Koduri2016-10-261-0/+48
| | | | | | | | | | | | | | | | | When a file is opened with O_TRUNC flag set, its size gets set to '0'. This case needs to be handled in md-cache to avoid sending incorrect cached stat. Change-Id: I95d1f8a6634734898883ede010c3e7b0b7eb97d9 BUG: 1382266 Signed-off-by: Soumya Koduri <skoduri@redhat.com> Reviewed-on: http://review.gluster.org/15618 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> Tested-by: jiffin tony Thottan <jthottan@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* performance/io-threads: Exit all threads on PARENT_DOWNPranith Kumar K2016-10-232-17/+40
| | | | | | | | | | | | | | | | | | | | Problem: When glfs_fini() is called on a volume where client.io-threads is enabled, fini() will free up iothread xl's private structure but there would be some threads that are sleeping which would wakeup after the timedwait completes leading to accessing already free'd memory. Fix: As part of parent-down, exit all sleeping threads. BUG: 1381830 Change-Id: I0bb8d90241112c355fb22ee3fbfd7307f475b339 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/15620 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* performance/write-behind: remove the request from liability queue inRaghavendra G2016-10-161-2/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | wb_fulfill_request Before this patch, a request is removed from liability queue only when ref count of request hits 0. Though, wb_fulfill_request does an unref, it need not be the last unref and hence the request may survive in liability queue till the last unref. Let, T1: the time at which wb_fulfill_request is invoked T2: the time at which last unref is done on request Let's consider a case of T2 > T1. In the time window between T1 and T2, any other request (waiter) conflicting with request in liability queue (blocker - basically a write which has been lied) is blocked from winding. If T2 happens to be when wb_do_unwinds is invoked, no further processing of request list happens and "waiter" would get blocked forever. An example imaginary sequence of events is given below: 1. A write request w1 is picked up for unwinding in __wb_pick_unwinds (but unwind is not done _yet_ and hence reference remains). However, w1 is moved to liability queue. Let's call this invocation of wb_process_queue by wb_writev as PQ1. 2. A flush (f1) request hits write behind. Since the liability queue of inode is not empty, f1 is not picked for unwinding. Let's call the invocation of wb_process_queue by wb_flush as PQ2. 3. PQ2 continues and picks w1 for fulfilling and invokes wb_fulfill. As part of successful wb_fulfill_cbk, wb_fulfill_request (w1) is invoked. But, w1 is not freed (and hence not removed from liability queue) as w1 is not unwound _yet_ and a ref remains (PQ1 has not invoked wb_do_unwinds _yet_). 4. wb_fulfill_cbk (triggered by PQ2) invokes a wb_process_queue (let's say PQ3). f1 is not resumed in PQ3 as w1 is still in liability queue. At this time, PQ2 and PQ3 are complete. 5. PQ1 continues, unwinds w1 and does last unref on w1 and w1 is freed (and removed from liability queue). Since PQ1 didn't invoke wb_fulfill on any other write requests, there won't be any future codepaths that would invoke wb_process_queue and f1 is stuck forever. With this fix, w1 is removed from liability queue in step 3 above and PQ3 resumes f1 in step 4 (as there are no requests conflicting with f1 in liability queue during execution of PQ3). Signed-off-by: Raghavendra G <rgowdapp@redhat.com> BUG: 1379655 Change-Id: Idacda1fcd520ac27f30224f8dfe8360dba6ac6cb Reviewed-on: http://review.gluster.org/15579 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* performance/open-behind: Pass O_DIRECT flags for anon fd reads when requiredKrutika Dhananjay2016-09-222-39/+28
| | | | | | | | | | | | | | | | Writes are already passing the correct flags at the time of open(). Also, make io-cache honor direct-io for anon-fds with O_DIRECT flag during reads. Change-Id: I215cb09ef1b607b9f95cabf0ef3065c00edd9e78 BUG: 1377556 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/15537 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>