summaryrefslogtreecommitdiffstats
path: root/xlators/performance
Commit message (Collapse)AuthorAgeFilesLines
* Multiple xlator .h files: remove unused private gf_* memory types.Yaniv Kaul2018-11-302-6/+1
| | | | | | | | | | | | | It seems there were quite a few unused enums (that in turn cause unndeeded memory allocation) in some xlators. I've removed them, hopefully not causing any damage. Compile-tested only! updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I8252bd763dc1506e2d922496d896cd2fc0886ea7
* libglusterfs: rename macros roof and floor to not conflict with math.hRaghavendra Gowdappa2018-11-284-10/+10
| | | | | | Change-Id: I666eeb63ebd000711b3f793b948d4e0c04b1a242 Signed-off-by: Raghavendra Gowdappa <rgowdapp@redhat.com> Updates: bz#1644629
* coverity: Fix coverity issuesMohammed Rafi KC2018-11-261-1/+3
| | | | | | | | | | | | | | | | This patch fixes coverity CID : 1356537 https://scan6.coverity.com/reports.htm#v42907/p10714/fileInstanceId=87389108&defectInstanceId=26791927&mergedDefectId=1356537 CID : 1395666 https://scan6.coverity.com/reports.htm#v42907/p10714/fileInstanceId=87389187&defectInstanceId=26791932&mergedDefectId=1395666 CID : 1351707 https://scan6.coverity.com/reports.htm#v42907/p10714/fileInstanceId=87389027&defectInstanceId=26791973&mergedDefectId=1351707 CID : 1396910 https://scan6.coverity.com/reports.htm#v42907/p10714/fileInstanceId=87389027&defectInstanceId=26791973&mergedDefectId=13596910 Change-Id: I8094981a741f4d61b083c05a98df23dcf5b022a2 updates: bz#789278 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
* md-cache: request ACLs related xattrs when mode changeKinglong Mee2018-11-121-0/+62
| | | | | | | | If glusterfs client changes mode, ACLs related xattrs may changed too. Change-Id: Ifa5bff1f77ab7b176e54da4607ea9c1e66fc5588 Updates: bz#1634220 Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
* md-cache: new option "cache-glusterfs-acl" for virtual glusterfs ACLsKinglong Mee2018-11-061-3/+21
| | | | | | Change-Id: I020ab08dba48f13cf7b8908e96280f1e92e9b9db Updates: bz#1634220 Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
* md-cache: request cached xattrs at getxattr/fgetxattrKinglong Mee2018-11-061-4/+32
| | | | | | Change-Id: I8e3ad961164815683776850e3a5fd4f510003690 Updates: bz#1634220 Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
* md-cache: extends xa_time when previous attributes equals cached attributesKinglong Mee2018-11-061-7/+15
| | | | | | | | | | | | Some operations like read/write only update iatt and ia_time, without any operion request xattrs from glusterfsd, the xa_time is timeout before ia_time always. This patch updates xa_time when update ia_ttime. Change-Id: I77e3984f38c1c4dbebfde9729b8117fbacde9674 Updates: bz#1634220 Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
* md-cache: remove xattr setting after new file createdKinglong Mee2018-11-061-3/+0
| | | | | | | | | Fops of creating file does not request cached xattrs, the xattr in reply is not cached xattrs. Change-Id: Iab2db686e92466e72cfee8ac494e851d797c10b3 Updates: bz#1634220 Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
* all: fix the format string exceptionsAmar Tumballi2018-11-059-40/+38
| | | | | | | | | | | | | | | | Currently, there are possibilities in few places, where a user-controlled (like filename, program parameter etc) string can be passed as 'fmt' for printf(), which can lead to segfault, if the user's string contains '%s', '%d' in it. While fixing it, makes sense to make the explicit check for such issues across the codebase, by making the format call properly. Fixes: CVE-2018-14661 Fixes: bz#1644763 Change-Id: Ib547293f2d9eb618594cbff0df3b9c800e88bde4 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* md-cache: removexattr must be send to glusterfsdKinglong Mee2018-11-021-6/+14
| | | | | | Change-Id: I53a583ec14bce65e8914bc496123dee3abe61f6c Updates: bz#1634220 Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
* md-cache: request cached xattrs at stat/fstatKinglong Mee2018-10-231-0/+20
| | | | | | | | | Ganesha always operate file by filehandle, and translates to glusterfs's stat/fstat many time. Change-Id: Idd0dc33c31131331ac948754c8b7f898777c31d3 Updates: bz#1634220 Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
* md-cache: fix dict leak of cached xattrKinglong Mee2018-10-221-0/+13
| | | | | | Change-Id: I52f8e13e68528ba9679537ffdddf58ec08f9fd0c Updates: bz#1634220 Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
* xlators: some high severity coverity fixesBhumika Goyal2018-10-221-0/+2
| | | | | | | | Fixes: 124759 1288787 Change-Id: Ib8999242fc3ea5f4ea80246659899d2d4f06c506 updates: bz#789278 Signed-off-by: Bhumika Goyal <bgoyal@redhat.com>
* performance/write-behind: Fix NULL dereference issueVarsha Rao2018-10-221-2/+2
| | | | | | | | | This patches fixes the following coverity issues: CID: 1396101, 1396102 - Dereference null return value. Change-Id: I7ec783a61c06a1378863e974ff6e0baae418aec2 updates: bz#789278 Signed-off-by: Varsha Rao <varao@redhat.com>
* Multiple xlators: perform gettimeofday() not under lockYaniv Kaul2018-10-163-4/+20
| | | | | | | | | | | | | While it may slightly reduce accuracy, I think it's better to acquire the time outside a lock and then memcpy the value while under lock. Compile-tested only! updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: Ic8fb94a56c4fa2d3b13e59767e13646fb39342ba
* core: glusterfsd keeping fd open in index xlatorMohit Agrawal2018-10-122-12/+15
| | | | | | | | | | | | | | | | | | | | | | Problem: At the time of processing GF_EVENT_PARENT_DOWN at brick xlator, it forwards the event to next xlator only while xlator ensures no stub is in progress. At io-thread xlator it decreases stub_cnt before the process a stub and notify EVENT to next xlator Solution: Introduce a new counter to save stub_cnt and decrease the counter after process the stub completely at io-thread xlator. To avoid brick crash at the time of call xlator_mem_cleanup move only brick xlator if detach brick name has found in the graph Note: Thanks to pranith for sharing a simple reproducer to reproduce the same fixes bz#1637934 Change-Id: I1a694a001f7a5417e8771e3adf92c518969b6baa Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* performance/nl-cache: clang fixIraj Jamali2018-10-121-1/+3
| | | | | | | | | | | Argument with 'nonnull' attribute passed null. Adding a check to avoid the issue. Updates: bz#1622665 Change-Id: I1d3a166e154a51da59bebb93a49f5174e593c98e Signed-off-by: Iraj Jamali <ijamali@redhat.com>
* performance/write-behind: NULL pointer passed to a nonnull parameterShwetha Acharya2018-10-111-0/+6
| | | | | | | | | | | | Problem: wb_directory_inode->lock can be null. Solution: added a condition, if(!wb_directory_inode->lock.spinlock) to address the issue (checked one of the attributes of union lock to ensure that union is not null). Updates: bz#1622665 Change-Id: I0749ee16aa2c23f51d4b4c7b0979d494bcd4d90e Signed-off-by: Shwetha Acharya <sacharya@redhat.com>
* all: fix warnings on non 64-bits architecturesXavi Hernandez2018-10-103-9/+9
| | | | | | | | | | When compiling in other architectures there appear many warnings. Some of them are actual problems that prevent gluster to work correctly on those architectures. Change-Id: Icdc7107a2bc2da662903c51910beddb84bdf03c0 fixes: bz#1632717 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* core: glusterfsd keeping fd open in index xlatorMohit Agrawal2018-10-082-9/+44
| | | | | | | | | | | | | | Problem: Current resource cleanup sequence is not perfect while brick mux is enabled Solution: 1) Destroying xprt after cleanup all fd associated with a client 2) Before call fini for brick xlators ensure no stub should be running on a brick Change-Id: I86195785e428f57d3ef0da3e4061021fafacd435 fixes: bz#1631357 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* mdcache: Fix asan reported potential heap buffer overflowShyamsundarR2018-09-281-0/+1
| | | | | | | | | | | | | | The char pointer mdc_xattr_str in function mdc_xattr_list_populate is malloc'd and doing a strcat into a malloc'd region can overflow content allocated based on prior contents of the memory region. Added a NULL terimation to the malloc'd region to prevent the overflow, and treat it as an empty string. Change-Id: If0decab669551581230a8ede4c44c319ff04bac9 Updates: bz#1633930 Signed-off-by: ShyamsundarR <srangana@redhat.com>
* glusterd: Update op-version from 4.2 to 5.0ShyamsundarR2018-09-131-1/+1
| | | | | | | | | | | | Post changing the max op-version to 4.2, after release 4.1 branching, the decision was to go with increasing release numbers. Thus this needs to change to 5.0. This commit addresses the above change. Fixes: bz#1628664 Change-Id: Ifcc0c6da90fdd51e4eceea40749511110a432cce Signed-off-by: ShyamsundarR <srangana@redhat.com>
* Land part 2 of clang-format changesGluster Ant2018-09-1215-15373/+14685
| | | | | Change-Id: Ia84cc24c8924e6d22d02ac15f611c10e26db99b4 Signed-off-by: Nigel Babu <nigelb@redhat.com>
* Land clang-format changesGluster Ant2018-09-1228-721/+637
| | | | Change-Id: I6f5d8140a06f3c1b2d196849299f8d483028d33b
* performance/write-behind: remove the request from wip queue in ↵Raghavendra G2018-09-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | wb_fulfill_request The bug is very similar to bz 1379655 and the fix too very similar to commit a8b2a981881221925bb5edfe7bb65b25ad855c04. Before this patch, a request is removed from wip queue only when ref count of request hits 0. Though, wb_fulfill_request does an unref, it need not be the last unref and hence the request may survive in wip queue till the last unref. Let, T1: the time at which wb_fulfill_request is invoked T2: the time at which last unref is done on request Let's consider a case of T2 > T1. In the time window between T1 and T2, any other request (waiter) conflicting with request in liability queue (blocker - basically a write which has been lied) is blocked from winding. If T2 happens to be when wb_do_unwinds is invoked, no further processing of request list happens and "waiter" would get blocked forever. An example imaginary sequence of events is given below: 1. A write request w1 is picked up for winding in __wb_pick_winds and w1 is moved to wip queue. Let's call this invocation of wb_process_queue by wb_writev as PQ1. Note w1 is not unwound. 2. A dependent write (w2) hits write-behind and is unwound followed by a flush (f1) request. Since the liability queue of inode is not empty, w2 and f1 are not picked for unwinding. Let's call the invocation of wb_process_queue by wb_flush as PQ2. Note that invocation of wb_process_queue by w2 doesn't wind w2 instead unwinds it after which we hit PQ2 3. PQ2 continues and picks w1 for fulfilling and invokes wb_fulfill. As part of successful wb_fulfill_cbk, wb_fulfill_request (w1) is invoked. But, w1 is not freed (and hence not removed from wip queue) as w1 is not unwound _yet_ and a ref remains (PQ1 has not invoked wb_do_unwinds _yet_). 4. wb_fulfill_cbk (triggered by PQ2) invokes a wb_process_queue (let's say PQ3). w2 is not picked up for winding in PQ3 as w1 is still in wip queue. At this time, PQ2 and PQ3 are complete. 5. PQ1 continues, unwinds w1 and does last unref on w1 and w1 is freed (and removed from wip queue). Since PQ1 didn't invoke wb_fulfill on any other write requests, there won't be any future codepaths that would invoke wb_process_queue and w2 is stuck forever. This will prevent f2 too and hence close syscall is hung With this fix, w1 is removed from liability queue in step 3 above and PQ3 winds w2 in step 4 (as there are no requests conflicting with w2 in liability queue during execution of PQ3). Once w2 is complete, f1 is resumed. Change-Id: Ia972fad0858dc4abccdc1227cb4d880f85b3b89b Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Fixes: bz#1626787
* xlators: add classification flag to someAmar Tumballi2018-09-043-0/+3
| | | | | | | | | Add classification to those translators which has `xlator_api_t` already defined and used. Updates: #430 Change-Id: I9d2772cb2c4ed4ab06aaa546500cf3b7d00bddac Signed-off-by: Amar Tumballi <amarts@redhat.com>
* IO cache : fix coverity issue in page.cSunny Kumar2018-09-041-3/+3
| | | | | | | | This patch fixes CID 1382439 and 1382412. Change-Id: I8696623c168ba76ae2ecac7c9582b4e50437bc53 updates: bz#789278 Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
* Multiple files: calloc -> mallocYaniv Kaul2018-09-043-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xlators/storage/posix/src/posix-inode-fd-ops.c: xlators/storage/posix/src/posix-helpers.c: xlators/storage/bd/src/bd.c: xlators/protocol/client/src/client-lk.c: xlators/performance/quick-read/src/quick-read.c: xlators/performance/io-cache/src/page.c xlators/nfs/server/src/nfs3-helpers.c xlators/nfs/server/src/nfs-fops.c xlators/nfs/server/src/mount3udp_svc.c xlators/nfs/server/src/mount3.c xlators/mount/fuse/src/fuse-helpers.c xlators/mount/fuse/src/fuse-bridge.c xlators/mgmt/glusterd/src/glusterd-utils.c xlators/mgmt/glusterd/src/glusterd-syncop.h xlators/mgmt/glusterd/src/glusterd-snapshot.c xlators/mgmt/glusterd/src/glusterd-rpc-ops.c xlators/mgmt/glusterd/src/glusterd-replace-brick.c xlators/mgmt/glusterd/src/glusterd-op-sm.c xlators/mgmt/glusterd/src/glusterd-mgmt.c xlators/meta/src/subvolumes-dir.c xlators/meta/src/graph-dir.c xlators/features/trash/src/trash.c xlators/features/shard/src/shard.h xlators/features/shard/src/shard.c xlators/features/marker/src/marker-quota.c xlators/features/locks/src/common.c xlators/features/leases/src/leases-internal.c xlators/features/gfid-access/src/gfid-access.c xlators/features/cloudsync/src/cloudsync-plugins/src/cloudsyncs3/src/libcloudsyncs3.c xlators/features/bit-rot/src/bitd/bit-rot.c xlators/features/bit-rot/src/bitd/bit-rot-scrub.c bxlators/encryption/crypt/src/metadata.c xlators/encryption/crypt/src/crypt.c xlators/performance/md-cache/src/md-cache.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible It doesn't make sense to calloc (allocate and clear) memory when the code right away fills that memory with data. It may be optimized by the compiler, or have a microscopic performance improvement. In some cases, also changed allocation size to be sizeof some struct or type instead of a pointer - easier to read. In some cases, removed redundant strlen() calls by saving the result into a variable. 1. Only done for the straightforward cases. There's room for improvement. 2. Please review carefully, especially for string allocation, with the terminating NULL string. Only compile-tested! .. and allocate memory as much as needed. xlators/nfs/server/src/mount3.c : Don't blindly allocate PATH_MAX, but strlen() the string and allocate appropriately. Also, align error messges. updates: bz#1193929 Original-Author: Yaniv Kaul <ykaul@redhat.com> Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: Ibda6f33dd180b7f7694f20a12af1e9576fe197f5
* multiple xlators: move from strlen() to sizeof()Yaniv Kaul2018-08-312-3/+3
| | | | | | | | | | | | | | | xlators/performance/nl-cache/src/nl-cache.c xlators/performance/md-cache/src/md-cache.c xlators/protocol/server/src/authenticate.c xlators/storage/bd/src/bd-helper.c For const strings, just do compile time size calc instead of runtime. Compile-tested only! Change-Id: I9b98940a38d85321a69436a1871930da367b918a updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* IO cache : fix coverity issues in io-cache.cSunny Kumar2018-08-301-3/+8
| | | | | | | | This patch fixes CID 1382361, 1124714 and 1382432. Change-Id: I0407f35ee44ec6e4522de46092658223d0c8ee6a updates: bz#789278 Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
* performance/write-behind: fix fulfill and readdirp raceRaghavendra G2018-08-231-33/+136
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current invalidation of stats in wb_readdirp_cbk is prone to races. As the deleted comment explains, <snip> We cannot guarantee integrity of entry->d_stat as there are cached writes. The stat is most likely stale as it doesn't account the cached writes. However, checking for non-empty liability list here is not a fool-proof solution as there can be races like, 1. readdirp is successful on posix 2. sync of cached write is successful on posix 3. write-behind received sync response and removed the request from liability queue 4. readdirp response is processed at write-behind. In the above scenario, stat for the file is sent back in readdirp response but it is stale. </snip> The fix is to mark readdirp sessions (tracked in this patch by non-zero value of "readdirps" on parent inode) and if fulfill completes when one or more readdirp sessions are in progress, mark the inode so that wb_readdirp_cbk doesn't send iatts for that in inode in readdirp response. Note that wb_readdirp_cbk already checks for presence of a non-empty liability queue and invalidates iatt. Since the only way a liability queue can shrink is by fulfilling requests in liability queue, wb_fulfill_cbk indicates wb_readdirp_cbk that a potential race could've happened b/w readdirp and fulfill. Change-Id: I12d167bf450648baa64be1cbe1ca0fddf5379521 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> updates: bz#1512691
* Revert "performance/write-behind: better invalidation in readdirp"Raghavendra G2018-08-211-28/+23
| | | | | | | | | | | This reverts commit 4d3c62e71f3250f10aa0344085a5ec2d45458d5c. Traversing all children of a directory in wb_readdirp caused significant performance regression. Hence reverting this patch Change-Id: I6c3b6cee2dd2aca41d49fe55ecdc6262e7cc5f34 updates: bz#1512691 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* write-behind: coverity fixesBhumika Goyal2018-08-201-3/+7
| | | | | | | | Fixes CID: 1124360 1291740 1370918 Change-Id: I008c7ade8f9809d040f42f6d3e9af70fff2f3dc6 updates: bz#789278 Signed-off-by: Bhumika Goyal <bgoyal@redhat.com>
* performance/readdir-ahead: keep stats of cached dentries in sync with ↵Krutika Dhananjay2018-08-183-20/+606
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | modifications PROBLEM: Stats of dentries that are readdirp'd ahead can become stale due to fops like writes, truncate etc that modify the file pointed by dentries. When a readdir is finally wound at offset corresponding to these entries, the iatts that are returned to the application come from readdir-ahead's cache, which are stale by now. This problem gets further aggravated when caching translators/modules cache and continue to serve this stale information. FIX: * Store the iatt in context of the inode pointed by dentry. * Whenever the inode pointed by dentry undergoes modification, in cbk of modification fop, update the iatt stored in inode-ctx to reflect the modification. * When serving a readdirp response from application, update iatts of dentries with the iatts stored in the context of inodes pointed by these dentries. * Some fops don't have valid iatts in their responses. For eg., write response whose data is still cached in write-behind will have zeroed out stat. In this case keep only ia_type and ia_gfid and reset rest of the iatt members to zero. - fuse-bridge in this case just sends "entry" information back to kernel and attr is not sent. - gfapi sets entry->inode to NULL and zeroes out the entire stat * There is one tiny race between the entry creation and a readdirp on its parent dir, which could cause the inode-ctx setting and inode ctx reading to happen on two different inode objects. To prevent this, when entry->inode doesn't eqaul to linked_inode, - fuse-bridge is made to send only "entry" information without attributes - gfapi sets entry->inode to NULL and zeroes out the entire stat. Change-Id: Ia27ff49a61922e88c73a1547ad8aacc9968a69df BUG: 1390050 Updates: bz#1390050 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* performance/md-cache: Use bitwise AND instead of logical ANDVijay Bellur2018-08-161-1/+1
| | | | | | | | Addresses CID: 1394640 Change-Id: I1139222301569d17760df74624acd301594063b9 updates: bz#789278 Signed-off-by: Vijay Bellur <vbellur@redhat.com>
* performance/quick-read: handle rollover of generation counterRaghavendra G2018-08-132-36/+108
| | | | | | Change-Id: I37a6e0efda430b70d03dd431c35bef23b3d16361 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Updates: bz#1512691
* performance/quick-read: don't update with stale data after invalidationRaghavendra G2018-08-042-44/+233
| | | | | | | | | | | | Once invalidated, make sure that only ops incident after invalidation update the cache. This makes sure that ops before invalidation don't repopulate cache with stale data. This patch also uses an internal counter instead of frame->root->unique for keeping track of generations. Change-Id: I6b38b141985283bd54b287775f3ec67b88bf6cb8 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Updates: bz#1512691
* Revert "performance/readdir-ahead: Invalidate cached dentries if they're ↵Raghavendra G2018-08-033-597/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | modified while in cache" This reverts commit 7131de81f72dda0ef685ed60d0887c6e14289b8c. With the latest master, I created a single brick volume and some files inside it. [root@rhgs313-6 ~]# umount -f /mnt/fuse1; mount -t glusterfs -s 192.168.122.6:/thunder /mnt/fuse1; ls -l /mnt/fuse1/; echo "Trying again"; ls -l /mnt/fuse1 umount: /mnt/fuse1: not mounted total 0 ----------. 0 root root 0 Jan 1 1970 file-1 ----------. 0 root root 0 Jan 1 1970 file-2 ----------. 0 root root 0 Jan 1 1970 file-3 ----------. 0 root root 0 Jan 1 1970 file-4 ----------. 0 root root 0 Jan 1 1970 file-5 d---------. 0 root root 0 Jan 1 1970 subdir Trying again total 3 -rw-r--r--. 1 root root 33 Aug 3 14:06 file-1 -rw-r--r--. 1 root root 33 Aug 3 14:06 file-2 -rw-r--r--. 1 root root 33 Aug 3 14:06 file-3 -rw-r--r--. 1 root root 33 Aug 3 14:06 file-4 -rw-r--r--. 1 root root 33 Aug 3 14:06 file-5 d---------. 0 root root 0 Jan 1 1970 subdir [root@rhgs313-6 ~]# Conversation can be followed on gluster-devel on thread with subj: tests/bugs/distribute/bug-1122443.t - spurious failure. git-bisected pointed this patch as culprit. Change-Id: I1eb46f6c196f44fde8ce991840a0e724e6f50862 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Updates: bz#1390050
* performance/ob: stringent synchronization between rename/unlink and openRaghavendra G2018-08-032-67/+330
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Issue 1: ======== open all pending fds before resuming rename and unlink currently ob uses fd_lookup to find out the opened-behind. But, fd_lookup gives the recent fd opened on the inode, but the oldest fd(s) (there can be multiple fds opened-behind when the very first opens on an inode are issued in parallel) are the candidates for fds with pending opens on backend. So, this patch explictily tracks the opened-behind fds on an inode and opens them before resuming rename or unlink. similar code changes are also done for setattr and setxattr to make sure pending opens are complete before permission change. This patch also adds a check for an open-in-progress to ob_get_wind_fd. If there is already an open-in-progress, ob_get_wind_fd won't return an anonymous fd as a result. This is done to make sure that rename/unlink/setattr/setxattr don't race with an operation like readv/fstat on an anonymous fd already in progress. Issue 2: ======== once renamed/unlinked, don't open-behind any future opens on the same inode. Issue 3: ======== Don't use anonymous fds by default. Note that rename/unlink can race with a read/fd on anonymous fds and these operations can fail with ESTALE. So, for better consistency in default mode, don't use anonymous fds. If performance is needed with tradeoff of consistency, one can switch on the option "use-anonymous-fd" Change-Id: Iaf130db71ce61ac37269f422e348a45f6ae6e82c Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Updates: bz#1512691
* performance/open-behind: don't use anonymous fds for reads by defaultRaghavendra G2018-08-021-1/+1
| | | | | | | | | | | | | | | | | | | | anonymous fds interfere with working of read-ahead as read-ahead won't be able to store its cache in fd. Also, as seen in bz 1455872, anonymous fds also affect performance of large file sequential reads as the cost of opening fd for each read on brick stack is significant. So, have a proper fd which enables read-ahead to store its cache and brick stack to reuse the fd during reads. With this change test tests/bugs/snapshot/bug-1167580-set-proper-uid-and-gid-during-nfs-access.t fails consistently. The failure can also be seen with open-behind off. bz 1611532 has been filed to track the issue with test. Thanks to Rafi <rkavunga@redhat.com> for assistance provided in debugging test failure. Change-Id: Ifa52d8ff017f115e83247f3396b9d27f0295ce3f Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Fixes: bz#1455872
* performance/md-cache: update cache only from fops issued after previous ↵Raghavendra G2018-08-021-98/+238
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | invalidation Invalidations are triggered mainly by two codepaths - upcall and write-behind unwinding a cached write with zeroed out stat. For the case of upcall, following race can happen: * stat s1 is fetched from brick * invalidation is detected on brick * invalidation is propagated to md-cache and cache is invalidated * s1 updates md-cache with a stale state For the case of write-behind, imagine following sequence of operations, * A stat s1 was issued from application thread t1 when size of file was s1 * stat s1 completes on brick stack, but yet to reach md-cache * A write w1 from application thread t2 extends file to size s2 is cached in write-behind and response is unwound with zeroed out stat * md-cache while handling write-cbk, invalidates cache * md-cache receives response for s1, updates cache with stale stat with size of s1 overwriting invalidation state Fix is to remember when s1 was incident on md-cache and update cache with results of s1 only if the it was incident after invalidation of cache. This patch identified some bugs in regression tests which is tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1608158. As a stop gap measure I am marking following tests as bad basic/afr/split-brain-resolution.t bugs/bug-1368312.t bugs/replicate/bug-1238398-split-brain-resolution.t bugs/replicate/bug-1417522-block-split-brain-resolution.t bugs/replicate/bug-1438255-do-not-mark-self-accusing-xattrs.t Change-Id: Ia4bb9dd36494944e2d91e9e71a79b5a3974a8c77 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Updates: bz#1512691
* performance/write-behind: synchronize rename with cached writes on srcRaghavendra G2018-08-021-0/+40
| | | | | | | | | | | rename response contains a postbuf stat of src-inode. Since md-cache caches stat in rename codepath too, we've to make sure stat accounts any cached writes in write-behind. So, we make sure rename is resumed only after any cached writes are committed to backend. Change-Id: Ic9f2adf8edd0b58ebaf661f3a8d0ca086bc63111 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Updates: bz#1512691
* performance/write-behind: better invalidation in readdirpRaghavendra G2018-07-281-23/+28
| | | | | | | | | | | | | | | | | | | | | | | | Current invalidation of stats in wb_readdirp_cbk is prone to races. As the deleted comment explains, <snip> We cannot guarantee integrity of entry->d_stat as there are cached writes. The stat is most likely stale as it doesn't account the cached writes. However, checking for non-empty liability list here is not a fool-proof solution as there can be races like, 1. readdirp is successful on posix 2. sync of cached write is successful on posix 3. write-behind received sync response and removed the request from liability queue 4. readdirp response is processed at write-behind. In the above scenario, stat for the file is sent back in readdirp response but it is stale. </snip> Change-Id: I6ce170985cc6ce3df2382ec038dd5415beefded5 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Updates: bz#1512691
* performance/readdir-ahead: Invalidate cached dentries if they're modified ↵Krutika Dhananjay2018-07-283-19/+597
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | while in cache PROBLEM: Entries that are readdirp'd ahead can undergo modification in terms of writes, truncates which could modify their iatts. When a readdir is finally wound at offset corresponding to these entries, the iatts that are returned to the application come from readdir-ahead's cache, which are stale by now. This problem gets further aggravated when caching translators/modules cache and continue to serve this stale information. FIX: Whenever a dentry undergoes modification, in the cbk of the modification fop, a "dirty" flag (default 0) is set in its inode ctx. When it's time for readdir-ahead to serve these entries, it will read the inode ctx and check if the entry is "dirty", and if it is, set the entry's attrs to all zeroes, as an indicator to fuse, md-cache etc not to cache these attributes. Also there is one tiny race between the entry creation and a readdirp on its parent dir, which could cause the inode-ctx setting and inode ctx reading to happen on two different inode objects. To prevent this, fuse-bridge is made to drop entries for which dentry->inode is not the same as linked inode, in readdirp cbk. Change-Id: If7396507632b5268442ca580473d5155fee9cbef BUG: 1390050 Updates: bz#1390050 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* core (named threads): flood of -Wformat-truncation warnings with gcc-7.1Kaleb S. KEITHLEY2018-07-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Starting in Fedora 26 which has gcc-7.1.x, -Wformat-trunction is enabled with -Wformat, resulting in a flood of new warnings. This many warnings is a concern because it makes it hard(er) to see other warnings that should be addressed. An example is at https://kojipkgs.fedoraproject.org//packages/glusterfs/3.12.0/1.fc28/data/logs/x86_64/build.log For more info see https://review.gluster.org/#/c/18267/ I can't find much (or good) documentation on the heuristics the compiler uses for this warning. In the case of printing integer types it appears it looks at the available space in the destination and the range of values for the variable and/or its type. To address the specific question about why 0x3ff versus 0xfff to mask the value, either would suffice to hint to the compiler that the printed value will fit in three characters. But the loop is from 0...1023 (or 0...0x3ff if you prefer) so I chose that as a more "accurate" mask to use as it exactly matches the range of values of the loop. Fixes: bz#1492847 Change-Id: I6e309ba42159841131d8241bfc0566ef09e00aa9
* All: run codespell on the code and fix issues.Yaniv Kaul2018-07-224-10/+10
| | | | | | | | | | | | Please review, it's not always just the comments that were fixed. I've had to revert of course all calls to creat() that were changed to create() ... Only compile-tested! Change-Id: I7d02e82d9766e272a7fd9cc68e51901d69e5aab5 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* performance/read-ahead: stricter adherence to force-atime-updateRaghavendra G2018-07-191-12/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | Throwaway read-ahead cache in fstat only if force-atime-update is set. Note that fstat flushes read-ahead cache only for atime consistency. However if atime consistency is needed user is required to set force-atime-update which updates atime on backend fs even though application reads are served from read-ahead cache. So, if user has not set force-atime-update, atime won't be accurate and there is no point in flushing read-ahead cache in fstats. mounts requiring atime consistency have to mandatorily set force-atime-update. Also note that normally kernel interspers reads with fstat. So, read-ahead is not effective as fstats flush read-ahead-cache. Instead it regresses performance due to wasted network reads. It is recommended to turn off read-ahead if applications require atime consistency. This patch is aimed at applications which don't require atime consistency. Without atime consistency required, read-ahead cache is effective and increases performance of sequential reads. Change-Id: I122bbc410cee96661823f9c4b934383495c18446 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Fixes: bz#1601166
* md-cache: Do not invalidate cache post set/remove xattrPoornima G2018-07-111-4/+52
| | | | | | | | | | | | | | | Since setxattr and removexattr fops cbk do not carry poststat, the stat cache was being invalidated in setxatr/remoxattr cbk. Hence the further lookup wouldn't be served from cache. To prevent this invalidation, md-cache is modified to get the poststat in set/removexattr_cbk in dict. Co-authored with Xavi Hernandez. Change-Id: I6b946be2d20b807e2578825743c25ba5927a60b4 fixes: bz#1586018 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com> Signed-off-by: Poornima G <pgurusid@redhat.com>
* performance/md-cache: Fix issue on lock being used before init.Zhang Huan2018-06-271-1/+2
| | | | | | | | | lock is used in mdc_xattr_list_populate(), but got init after call. Fix this issue by moving initing lock ahead. Change-Id: I94b08303a8ba74b1e9388f700587a00b7ae3fd78 fixes: bz#1595174 Signed-off-by: Zhang Huan <zhanghuan@open-fs.com>
* performance/quick-read: provide an invalidation based on ctimeRaghavendra G2018-06-182-1/+49
| | | | | | | | | | | | | | | | | | | | Quick-read by default uses mtime to identify changes to file data. However there are applications like rsync which explicitly set mtime making it unreliable for the purpose of identifying change in file content. Since ctime also changes when content of a file changes and it cannot be set explicitly, it becomes suitable for identifying staleness of cached data. This option makes quick-read to prefer ctime over mtime to validate its cache. However, using ctime can result in false positives as ctime changes with just attribute changes like permission without changes to file data. So, use this option only when mtime is not reliable. credits to Kotresh Hiremath Ravishankar <khiremat@redhat.com> for suggestion on using ctime instead of mtime. Change-Id: Ib3ae39a3252b2876c8ffe81f471d02a87190e9b9 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Updates: bz#1591621