summaryrefslogtreecommitdiffstats
path: root/xlators/mount/fuse
Commit message (Collapse)AuthorAgeFilesLines
* fuse : fix high sev coverity issueSunny Kumar2019-03-211-1/+7
| | | | | | | | | | | | This patch fixed coverity issue in fuse-bridge.c. CID : 1398630 : Resource leak CID : 1399757 : Uninitialized pointer read updates: bz#789278 Change-Id: I69f8591400ee56a5d215eeac443a8e3d7777db27 Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
* mount/fuse: Fix spelling mistakePranith Kumar K2019-03-151-1/+2
| | | | | | updates bz#1193929 Change-Id: I55ffa8f086ad9570f2526d91c196d7de9ffe6add Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* fuse : fix memory leakSunny Kumar2019-02-251-0/+4
| | | | | | | | | | | | | | | | | | | | This patch fixes memory leak reported by ASan. Tracebacks: ERROR: LeakSanitizer: detected memory leaks Direct leak of 712 byte(s) in 1 object(s) allocated from: #0 0x7f35139dc848 in __interceptor_malloc (/lib64/libasan.so.5+0xef848) #1 0x7f35136efb29 in __gf_malloc ../libglusterfs/src/mem-pool.c:136 #2 0x7f3510591ce9 in fuse_thread_proc ../xlators/mount/fuse/src/fuse-bridge.c:5929 #3 0x7f351336d58d in start_thread (/lib64/libpthread.so.0+0x858d) SUMMARY: AddressSanitizer: 712 byte(s) leaked in 1 allocation(s). updates: bz#1633930 Change-Id: Ie5b4da6b338d8e5fc770c5b2da1238e3462468ac Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
* core: implement a global thread poolXavi Hernandez2019-02-182-21/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements a thread pool that is wait-free for adding jobs to the queue and uses a very small locked region to get jobs. This makes it possible to decrease contention drastically. It's based on wfcqueue structure provided by urcu library. It automatically enables more threads when load demands it, and stops them when not needed. There's a maximum number of threads that can be used. This value can be configured. Depending on the workload, the maximum number of threads plays an important role. So it needs to be configured for optimal performance. Currently the thread pool doesn't self adjust the maximum for the workload, so this configuration needs to be changed manually. For this reason, the global thread pool has been made optional, so that volumes can still use the thread pool provided by io-threads. To enable it for bricks, the following option needs to be set: config.global-threading = on This option has no effect if bricks are already running. A restart is required to activate it. It's recommended to also enable the following option when running bricks with the global thread pool: performance.iot-pass-through = on To enable it for a FUSE mount point, the option '--global-threading' must be added to the mount command. To change it, an umount and remount is needed. It's recommended to disable the following option when using global threading on a mount point: performance.client-io-threads = off To enable it for services managed by glusterd, glusterd needs to be started with option '--global-threading'. In this case all daemons, like self-heal, will be using the global thread pool. Currently it can only be enabled for bricks, FUSE mounts and glusterd services. The maximum number of threads for clients and bricks can be configured using the following options: config.client-threads config.brick-threads These options can be applied online and its effect is immediate most of the times. If one of them is set to 0, the maximum number of threads will be calcutated as #cores * 2. Some distributions use a very old userspace-rcu library (version 0.7) for this reason, some header files from version 0.10 have been copied into contrib/userspace-rcu and are used if the detected version is 0.7 or older. An additional change has been made to io-threads to prevent that threads are started when iot-pass-through is set. Change-Id: I09d19e246b9e6d53c6247b29dfca6af6ee00a24b updates: #532 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* mount/fuse: fix bug related to --auto-invalidation in mount scriptRaghavendra Gowdappa2019-02-091-1/+1
| | | | | | | | | | | | When "auto-invalidation" option was not specified for mount script, glusterfs cmdline ended with "--auto-invalidation=" option. This patch fixes that bug in mount script. Thanks to Amar for reporting it. Change-Id: Ie5cd4c6ffb3ac644d9d2b032035f914a935d05a8 Signed-off-by: Raghavendra Gowdappa <rgowdapp@redhat.com> updates: bz#1664934
* fuse: correctly handle setxattr valuesXavi Hernandez2019-02-071-4/+16
| | | | | | | | | | The setxattr function receives a pointer to raw data, which may not be null-terminated. When this data needs to be interpreted as a string, an explicit null termination needs to be added before using the value. Change-Id: Id110f9b215b22786da5782adec9449ce38d0d563 updates: bz#1193929 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* mount/fuse: expose auto-invalidation as a mount optionRaghavendra Gowdappa2019-02-023-10/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Auto invalidation is necessary when same (meta)data is shared/access across multiple mounts. However, if (meta)data is not shared, all relevant I/O goes through the cache of single mount and hence is coherent with (meta)data on bricks always. So, fuse-auto-invalidation can be disabled for this case which gives a huge performance boost for workloads that write data and then immediately read the data they just wrote. From glusterfs --help, <snip> --auto-invalidation[=BOOL] controls whether fuse-kernel can auto-invalidate attribute, dentry and page-cache. Disable this only if same files/directories are not accessed across two different mounts concurrently [default: "on"] </snip> Details on how disabling auto-invalidation helped to reduce pgbench init times can be found at [1]. Time taken for pgbench init of scale 8000 was 8340s. That will be an improvement of 86% (59280s vs 8340s) with auto-invalidations turned off along with other optimizations. Just disabling auto-invalidation contributed 56% improvement by reducing the total time taken by 33260s. [1] https://www.spinics.net/lists/gluster-devel/msg25907.html Change-Id: I0ed730dba9064bd9c576ad1800170a21e100e1ce Signed-off-by: Raghavendra Gowdappa <rgowdapp@redhat.com> updates: bz#1664934
* fix 32-bit-build-smoke warningsIraj Jamali2019-01-111-1/+1
| | | | | | | fixes: bz#1622665 Change-Id: I777d67b1b62c284c62a02277238ad7538eef001e Signed-off-by: Iraj Jamali <ijamali@redhat.com>
* Revert "iobuf: Get rid of pre allocated iobuf_pool and use per thread mem pool"Amar Tumballi2019-01-081-1/+2
| | | | | | | | | This reverts commit b87c397091bac6a4a6dec4e45a7671fad4a11770. There seems to be some performance regression with the patch and hence recommended to have it reverted. Updates: #325 Change-Id: Id85d6203173a44fad6cf51d39b3e96f37afcec09
* multiple-files: clang-scan fixesAmar Tumballi2018-12-311-3/+12
| | | | | | updates: bz#1622665 Change-Id: I9f3a75ed9be3d90f37843a140563c356830ef945 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* iobuf: Get rid of pre allocated iobuf_pool and use per thread mem poolPoornima G2018-12-181-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current implementation of iobuf_pool has two problems: - prealloc of 12.5MB memory, this limits the scale factor of the gluster processes due to RAM requirements - lock contention, as the current implementation has one global iobuf_pool lock. Credits for debugging and addressing the same goes to Krutika Dhananjay <kdhananj@redhat.com>. Issue: #410 Hence changing the iobuf implementation to use per thread mem pool. This may theoritically appear to cause perf dip as there is no preallocation. But per thread mem pool will not have significant perf impact as the last allocated memory is kept alive for subsequent allocs, for some time. The worst case would be if iobufs requested are of random sizes each time. The best case is, if we get iobuf request of the same size. From the perf tests, this patch did not seem to cause any perf decrease. Note that, with this patch, the rdma performance is going to degrade drastically. In one of the previous patchsets we had fixes to not degrade rdma perf, but rdma is not supported and also not tested [1]. Hence the decision was to not have code in rdma that is not tested and not supported. [1] https://lists.gluster.org/pipermail/gluster-users.old/2018-July/034400.html Updates: #325 Change-Id: Ic2ef3bd498f9250dea25f25ba0c01fde19584b27 Signed-off-by: Poornima G <pgurusid@redhat.com>
* fuse: SETLKW interruptCsaba Henk2018-12-141-0/+130
| | | | | | | | | Use the (f)getxattr based clearlocks interface to interrupt a pending lock request. updates: #465 Change-Id: I4e91a4d8791fc688fed400a02de4c53487e61be2 Signed-off-by: Csaba Henk <csaba@redhat.com>
* fuse: add --lru-limit optionAmar Tumballi2018-12-143-51/+86
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The inode LRU mechanism is moot in fuse xlator (ie. there is no limit for the LRU list), as fuse inodes are referenced from kernel context, and thus they can only be dropped on request of the kernel. This might results in a high number of passive inodes which are useless for the glusterfs client, causing a significant memory overhead. This change tries to remedy this by extending the LRU semantics and allowing to set a finite limit on the fuse inode LRU. A brief history of problem: When gluster's inode table was designed, fuse didn't have any 'invalidate' method, which means, userspace application could never ask kernel to send a 'forget()' fop, instead had to wait for kernel to send it based on kernel's parameters. Inode table remembers the number of times kernel has cached the inode based on the 'nlookup' parameter. And 'nlookup' field is not used by no other entry points (like server-protocol, gfapi etc). Hence the inode_table of fuse module always has to have lru-limit as '0', which means no limit. GlusterFS always had to keep all inodes in memory as kernel would have had a reference to it. Again, the reason for this is, kernel's glusterfs inode reference was pointer of 'inode_t' structure in glusterfs. As it is a pointer, we could never free it (to prevent segfault, or memory corruption). Solution: In the inode table, handle the prune case of inodes with 'nlookup' differently, and call a 'invalidator' method, which in this case is fuse_invalidate(), and it sends the request to kernel for getting the forget request. When the kernel sends the forget, it means, it has dropped all the reference to the inode, and it will send the forget with the 'nlookup' parameter too. We just need to make sure to reduce the 'nlookup' value we have when we get forget. That automatically cause the relevant prune to happen. Credits: Csaba Henk, Xavier Hernandez, Raghavendra Gowdappa, Nithya B fixes: bz#1560969 Change-Id: Ifee0737b23b12b1426c224ec5b8f591f487d83a2 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* copy_file_range support in GlusterFSRaghavendra Bhat2018-12-122-0/+150
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * libglusterfs changes to add new fop * Fuse changes: - Changes in fuse bridge xlator to receive and send responses * posix changes to perform the op on the backend filesystem * protocol and rpc changes for sending and receiving the fop * gfapi changes for performing the fop * tools: glfs-copy-file-range tool for testing copy_file_range fop - Although, copy_file_range support has been added to the upstream fuse kernel module, no release has been made yet of a kernel which contains the support. It is expected to come in the upcoming release of linux-4.20 So, as of now, executing copy_file_range fop on a fused based filesystem results in fuse kernel module sending read on the source fd and write on the destination fd. Therefore a small gfapi based tool has been written to be able test the copy_file_range fop. This tool is similar (in functionality) to the example program given in copy_file_range man page. So, running regular copy_file_range on a fuse mount point and running gfapi based glfs-copy-file-range tool gives some idea about how fast, the copy_file_range (or reflink) can be. On the local machine this was the result obtained. mount -t glusterfs workstation:new /mnt/glusterfs [root@workstation ~]# cd /mnt/glusterfs/ [root@workstation glusterfs]# ls file [root@workstation glusterfs]# cd [root@workstation ~]# time /tmp/a.out /mnt/glusterfs/file /mnt/glusterfs/new real 0m6.495s user 0m0.000s sys 0m1.439s [root@workstation ~]# time glfs-copy-file-range $(hostname) new /tmp/glfs.log /file /rrr OPEN_SRC: opening /file is success OPEN_DST: opening /rrr is success FSTAT_SRC: fstat on /rrr is success copy_file_range successful real 0m0.309s user 0m0.039s sys 0m0.017s This tool needs following arguments 1) hostname 2) volume name 3) log file path 4) source file path (relative to the gluster volume root) 5) destination file path (relative to the gluster volume root) "glfs-copy-file-range <hostname> <volume> <log file path> <source> <destination>" - Added a testcase as well to run glfs-copy-file-range tool * io-stats changes to capture the fop for profiling * NOTE: - Added conditional check to see whether the copy_file_range syscall is available or not. If not, then return ENOSYS. - Added conditional check for kernel minor version in fuse_kernel.h and fuse-bridge while referring to copy_file_range. And the kernel minor version is kept as it is. i.e. 24. Increment it in future when there is a kernel release which contains the support for copy_file_range fop in fuse kernel module. * The document which contains a writeup on this enhancement can be found at https://docs.google.com/document/d/1BSILbXr_knynNwxSyyu503JoTz5QFM_4suNIh2WwrSc/edit Change-Id: I280069c814dd21ce6ec3be00a884fc24ab692367 updates: #536 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
* all: add xlator_api to many translatorsAmar Tumballi2018-12-061-0/+14
| | | | | | Fixes: #164 Change-Id: I93ad6f0232a1dc534df099059f69951e1339086f Signed-off-by: Amar Tumballi <amarts@redhat.com>
* libglusterfs: Move devel headers under glusterfs directoryShyamsundarR2018-12-053-17/+17
| | | | | | | | | | | | | | | | | | | | | | | | libglusterfs devel package headers are referenced in code using include semantics for a program, this while it works can be better especially when dealing with out of tree xlator builds or in general out of tree devel package usage. Towards this, the following changes are done, - moved all devel headers under a glusterfs directory - Included these headers using system header notation <> in all code outside of libglusterfs - Included these headers using own program notation "" within libglusterfs This change although big, is just moving around the headers and making it correct when including these headers from other sources. This helps us correctly include libglusterfs includes without namespace conflicts. Change-Id: Id2a98854e671a7ee5d73be44da5ba1a74252423b Updates: bz#1193929 Signed-off-by: ShyamsundarR <srangana@redhat.com>
* core: fix strncpy warningsKaleb S. KEITHLE2018-11-152-2/+2
| | | | | | | | | | | | | | | | | | | Since gcc-8.2.x (fedora-28 or so) gcc has been emitting warnings about buggy use of strncpy. e.g. warning: ‘strncpy’ output truncated before terminating nul copying as many bytes from a string as its length and warning: ‘strncpy’ specified bound depends on the length of the source argument Since we're copying string fragments and explicitly null terminating use memcpy to silence the warning Change-Id: I413d84b5f4157f15c99e9af3e154ce594d5bcdc1 updates: bz#1193929 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
* fuse: diagnostic FLUSH interruptCsaba Henk2018-11-063-2/+69
| | | | | | | | | | | | | | | | | | | We add dummy interrupt handling for the FLUSH fuse message. It can be enabled by the "--fuse-flush-handle-interrupt" hidden command line option, or "-ofuse-flush-handle-interrupt=yes" mount option. It serves no other than diagnostic & demonstational purposes -- to exercise the interrupt handling framework a bit and to give an usage example. Documentation is also provided that showcases interrupt handling via FLUSH. Change-Id: I522f1e798501d06b74ac3592a5f73c1ab0590c60 updates: #465 Signed-off-by: Csaba Henk <csaba@redhat.com>
* fuse: interrupt handling frameworkCsaba Henk2018-11-063-1/+512
| | | | | | | | | | | | | | | | | | | | - add sub-framework to send timed responses to kernel - add interrupt handler queue - implement INTERRUPT fuse_interrupt looks up handlers for interrupted messages in the queue. If found, it invokes the handler function. Else responds with EAGAIN with a delay. See spec at https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/fuse.txt?h=v4.17#n148 and explanation in comments. Change-Id: I1a79d3679b31f36e14b4ac8f60b7f2c1ea2badfb updates: #465 Signed-off-by: Csaba Henk <csaba@redhat.com>
* all: fix the format string exceptionsAmar Tumballi2018-11-051-3/+3
| | | | | | | | | | | | | | | | Currently, there are possibilities in few places, where a user-controlled (like filename, program parameter etc) string can be passed as 'fmt' for printf(), which can lead to segfault, if the user's string contains '%s', '%d' in it. While fixing it, makes sense to make the explicit check for such issues across the codebase, by making the format call properly. Fixes: CVE-2018-14661 Fixes: bz#1644763 Change-Id: Ib547293f2d9eb618594cbff0df3b9c800e88bde4 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* mount.glusterfs: A more explicit check to avoid identical mountsHan Han2018-10-241-1/+1
| | | | | | | | | | | Change check condition from "[[:space:]+]${mount_point}[[:space:]+]fuse" to "[[:space:]+]${mount_point}[[:space:]+]fuse.glusterfs". Fix false postive check result for mount points of other FUSEes, such as "fuse.sshfs". Change-Id: I13898b50a651a8f5ecc3a94d01b3b5de37ec4cbc fixes: bz#1640026 Signed-off-by: Han Han <hhan@redhat.com>
* mount/fuse: return ESTALE instead of ENOENT on all inode based operationsRaghavendra Gowdappa2018-10-201-2/+115
| | | | | | | | | | | | | | | | | | | | This patch is continuation of commit fb4b914ce84bc83a5f418719c5ba7c25689a9251. This patch extends that logic to all inode based operations and not just open(dir). <snip> mount/fuse: never fail open(dir) with ENOENT open(dir) being an operation on inode should never fail with ENOENT. If gfid is not present, the appropriate error is ESTALE. This will enable kernel to retry open after a revalidate lookup. </snip> Change-Id: I6313f520827e9af725485631cb6a9d9718243bc4 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Fixes: bz#1627620
* all: fix warnings on non 64-bits architecturesXavi Hernandez2018-10-101-1/+1
| | | | | | | | | | When compiling in other architectures there appear many warnings. Some of them are actual problems that prevent gluster to work correctly on those architectures. Change-Id: Icdc7107a2bc2da662903c51910beddb84bdf03c0 fixes: bz#1632717 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* fuse: prevent error message "can't shift that many"Niels de Vos2018-10-011-3/+3
| | | | | | | | | | | | | | | | | | | | | | On systems where /bin/sh is not Bash, running plain mount.glusterfs gives the unhelpful error "can't shift that many". The argument parsing can be a little improved. Adding a check for the number of arguments, minimal two (Gluster ip:/volume, and mountpoint), but possibly more (-o, -v etc.). With the additional check, running 'mount.glusterfs -h' now shows the following messags: Usage: /sbin/mount.glusterfs <server>:<volume/subdir> <mountpoint> -o<options> Options: man 8 mount.glusterfs To display the version number of the mount helper: /sbin/mount.glusterfs -V Change-Id: I50e3ade0c6217fab4155f35ad8cb35d99d52e133 Fixes: bz#1564890 Reported-by: Alexander Zimmermann <alexander.zimmermann96@gmail.com> Signed-off-by: Niels de Vos <ndevos@redhat.com>
* Land part 2 of clang-format changesGluster Ant2018-09-123-5816/+5624
| | | | | Change-Id: Ia84cc24c8924e6d22d02ac15f611c10e26db99b4 Signed-off-by: Nigel Babu <nigelb@redhat.com>
* Land clang-format changesGluster Ant2018-09-122-361/+373
| | | | Change-Id: I6f5d8140a06f3c1b2d196849299f8d483028d33b
* mount/fuse: convert ENOENT to ESTALE in open(dir)_resumeRaghavendra G2018-09-111-0/+9
| | | | | | | | | | | | | | | | | | | | | | This patch is continuation of commit fb4b914ce84bc83a5f418719c5ba7c25689a9251. <snip> mount/fuse: never fail open(dir) with ENOENT open(dir) being an operation on inode should never fail with ENOENT. If gfid is not present, the appropriate error is ESTALE. This will enable kernel to retry open after a revalidate lookup. </snip> Earlier commit failed to fix codepath where error response is sent back on gfid resolution failures in fuse_open(dir)_resume. Current patch completes that work Change-Id: Ia07e3cece404811703c8cfbac9b402ca5fe98c1e Signed-off-by: Raghavendra G <rgowdapp@redhat.com> updates: bz#1627620
* Multiple files: calloc -> mallocYaniv Kaul2018-09-042-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xlators/storage/posix/src/posix-inode-fd-ops.c: xlators/storage/posix/src/posix-helpers.c: xlators/storage/bd/src/bd.c: xlators/protocol/client/src/client-lk.c: xlators/performance/quick-read/src/quick-read.c: xlators/performance/io-cache/src/page.c xlators/nfs/server/src/nfs3-helpers.c xlators/nfs/server/src/nfs-fops.c xlators/nfs/server/src/mount3udp_svc.c xlators/nfs/server/src/mount3.c xlators/mount/fuse/src/fuse-helpers.c xlators/mount/fuse/src/fuse-bridge.c xlators/mgmt/glusterd/src/glusterd-utils.c xlators/mgmt/glusterd/src/glusterd-syncop.h xlators/mgmt/glusterd/src/glusterd-snapshot.c xlators/mgmt/glusterd/src/glusterd-rpc-ops.c xlators/mgmt/glusterd/src/glusterd-replace-brick.c xlators/mgmt/glusterd/src/glusterd-op-sm.c xlators/mgmt/glusterd/src/glusterd-mgmt.c xlators/meta/src/subvolumes-dir.c xlators/meta/src/graph-dir.c xlators/features/trash/src/trash.c xlators/features/shard/src/shard.h xlators/features/shard/src/shard.c xlators/features/marker/src/marker-quota.c xlators/features/locks/src/common.c xlators/features/leases/src/leases-internal.c xlators/features/gfid-access/src/gfid-access.c xlators/features/cloudsync/src/cloudsync-plugins/src/cloudsyncs3/src/libcloudsyncs3.c xlators/features/bit-rot/src/bitd/bit-rot.c xlators/features/bit-rot/src/bitd/bit-rot-scrub.c bxlators/encryption/crypt/src/metadata.c xlators/encryption/crypt/src/crypt.c xlators/performance/md-cache/src/md-cache.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible It doesn't make sense to calloc (allocate and clear) memory when the code right away fills that memory with data. It may be optimized by the compiler, or have a microscopic performance improvement. In some cases, also changed allocation size to be sizeof some struct or type instead of a pointer - easier to read. In some cases, removed redundant strlen() calls by saving the result into a variable. 1. Only done for the straightforward cases. There's room for improvement. 2. Please review carefully, especially for string allocation, with the terminating NULL string. Only compile-tested! .. and allocate memory as much as needed. xlators/nfs/server/src/mount3.c : Don't blindly allocate PATH_MAX, but strlen() the string and allocate appropriately. Also, align error messges. updates: bz#1193929 Original-Author: Yaniv Kaul <ykaul@redhat.com> Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: Ibda6f33dd180b7f7694f20a12af1e9576fe197f5
* performance/readdir-ahead: keep stats of cached dentries in sync with ↵Krutika Dhananjay2018-08-181-4/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | modifications PROBLEM: Stats of dentries that are readdirp'd ahead can become stale due to fops like writes, truncate etc that modify the file pointed by dentries. When a readdir is finally wound at offset corresponding to these entries, the iatts that are returned to the application come from readdir-ahead's cache, which are stale by now. This problem gets further aggravated when caching translators/modules cache and continue to serve this stale information. FIX: * Store the iatt in context of the inode pointed by dentry. * Whenever the inode pointed by dentry undergoes modification, in cbk of modification fop, update the iatt stored in inode-ctx to reflect the modification. * When serving a readdirp response from application, update iatts of dentries with the iatts stored in the context of inodes pointed by these dentries. * Some fops don't have valid iatts in their responses. For eg., write response whose data is still cached in write-behind will have zeroed out stat. In this case keep only ia_type and ia_gfid and reset rest of the iatt members to zero. - fuse-bridge in this case just sends "entry" information back to kernel and attr is not sent. - gfapi sets entry->inode to NULL and zeroes out the entire stat * There is one tiny race between the entry creation and a readdirp on its parent dir, which could cause the inode-ctx setting and inode ctx reading to happen on two different inode objects. To prevent this, when entry->inode doesn't eqaul to linked_inode, - fuse-bridge is made to send only "entry" information without attributes - gfapi sets entry->inode to NULL and zeroes out the entire stat. Change-Id: Ia27ff49a61922e88c73a1547ad8aacc9968a69df BUG: 1390050 Updates: bz#1390050 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* All: run codespell on the code and fix issues.Yaniv Kaul2018-07-221-1/+1
| | | | | | | | | | | | Please review, it's not always just the comments that were fixed. I've had to revert of course all calls to creat() that were changed to create() ... Only compile-tested! Change-Id: I7d02e82d9766e272a7fd9cc68e51901d69e5aab5 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* add check if no matching password record was found with getpwuid_r(uid)Vitaly Lipatov2018-07-131-0/+5
| | | | | | Change-Id: Iae712828ee656008faf5fe2bc4e6f96fa12ea4cb fixes: bz#1600687 Signed-off-by: Vitaly Lipatov <lav@etersoft.ru>
* fuse: avoid using the which commandJohn Mulligan2018-06-211-1/+1
| | | | | | | | | | | In mount.glusterfs avoid using the which tool as it may not exist on minimal system installs. Use the "command -v" builtin as it is expected to be more portable. Remove a extra semicolon while we're at it. Change-Id: Ib682ed4955d5bad1beb94b65d10f4c44e9490767 fixes: bz#1593351 Signed-off-by: John Mulligan <jmulligan@redhat.com>
* fuse: make sure the send lookup on root instead of getattr()Amar Tumballi2018-05-071-0/+20
| | | | | | | | | | | | | | | | | This change was done in https://review.gluster.org/16945. While the changes added there were required, it was not necessary to remove the getattr part. As fuse's lookup on root(/) comes as getattr only, this change is very much required. The previous fix for this bug was to add the check for revalidation in lookup when it was sent on root. But I had removed the part where getattr is coming on root. The removing was not requried to fix the issue then. Added back this part of the code, to make sure we have proper validation of root inode in many places like acl, etc. updates: bz#1437780 Change-Id: I859c4ee1a3f407465cbf19f8934530848424ff50 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* mount,fuse: make fuse dumping available as mount optionCsaba Henk2018-05-041-0/+7
| | | | | | Updates: bz#1193929 Change-Id: I4dd4d0e607f89650ebb74b893b911b554472826d Signed-off-by: Csaba Henk <csaba@redhat.com>
* fuse: add support for kernel writeback cacheCsaba Henk2018-05-043-4/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Added kernel-writeback-cache command line and xlator option for requesting utilisation of the writeback cache of the kernel in FUSE_INIT (see [1]). - Added attr-times-granularity command line and xlator option via which granularity of the {a,m,c}time in stat (attr) data that we support can be indicated to kernel. This is a means to avoid divergence of the attr times between kernel and userspace that could occur with writeback-cache, while still maintaining maximum time precision the FUSE server is capable of (see [2]). - Handling FATTR_CTIME flag in FUSE_SETATTR that indicates presence of ctime in setattr payload. Currently we cannot associate arbitrary ctimes to files on backend, so we just touch them to update their ctimes to current time. Having ctimes in setattr payload is also a side effect of writeback cache (see [3] and [4]). [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4d99ff8, "fuse: Turn writeback cache on" [2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e27c9d3, "fuse: fuse: add time_gran to INIT_OUT" [3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1e18bda, "fuse: add .write_inode" [4]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ab9e13f, "fuse: allow ctime flushing to userspace" Updates: #435 Change-Id: Id174c8e0c815c4456c35f8c53e41a6a507d91855 Signed-off-by: Csaba Henk <csaba@redhat.com>
* fuse: do fd_resolve in fuse_getattr if fd is receivedSusant Palai2018-04-181-5/+8
| | | | | | | | | | | | | | | | | | | | problem: With the current code, post graph switch the old fd is received for fuse_getattr and since it is associated with old inode, it does not have the inode ctx across xlators in new graph. Hence, dht errored out saying "no layout" for fstat call. Hence the EINVAL. Solution: if fd is passed, init and resolve fd to carry on getattr test case: - Created a single brick distributed volume - Started untar - Added a new-brick Without this fix, untar used to abort with ERROR. Change-Id: I5805c463fb9a04ba5c24829b768127097ff8b9f9 fixes: bz#1566207 Signed-off-by: Susant Palai <spalai@redhat.com>
* fuse: retire statvfs tweakCsaba Henk2018-04-161-13/+0
| | | | | | | | | | | | | | | | | | | | | | | | fuse xlator used to override the filesystem block size of the storage backend to indicate its preferences. Now we retire this tweak and pass on what we get from the backend. This fixes the anomaly reported in the referred BUG. For more background, see the following email, which was sent out to gluster-devel and gluster-users mailing lists to gauge if anyone sees any use of this tweak: http://lists.gluster.org/pipermail/gluster-devel/2018-March/054660.html http://lists.gluster.org/pipermail/gluster-users/2018-March/033775.html Noone vetoed the removal of it but it got endorsement: http://lists.gluster.org/pipermail/gluster-devel/2018-March/054686.html BUG: 1523219 Change-Id: I3b7111d3037a1b91a288c1589f407b2c48d81bfa Signed-off-by: Csaba Henk <csaba@redhat.com>
* mount/fuse: Set default fuse reader thread count to 1Krutika Dhananjay2018-04-021-1/+1
| | | | | | | Updates #412 Change-Id: Ida53d8b630feabb856a3551fa888f92382ade768 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* mount/fuse: Add support for multi-threaded fuse readersKrutika Dhananjay2018-04-025-83/+168
| | | | | | | | | | | | | | Usage: Use 'reader-thread-count=<NUM>' as command line option to set the thread count at the time of mounting the volume. Next task is to make these threads auto-scale based on the load, instead of having the user remount the volume everytime to change the thread count. Updates #412 Change-Id: I94aa1505e5ae6a133683d473e0e4e0edd139b76b Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* fuse: enable proper "fgetattr"-like semanticsCsaba Henk2018-03-061-1/+13
| | | | | | | | | | | | | | | | | | | | | GETATTR FUSE message can carry a file handle reference in which case it serves as a hint for the FUSE server that the stat data is preferably acquired in context of the given filehandle (which we call '"fgetattr"-like semantics'). So far FUSE ignored the GETTATTR provided filehandle and grabbed a file handle heuristically. This caused confusion in the caching layers, which has been tracked down as one of the reasons of referred BUG. As of the BUG, this is just a partial fix. BUG: 1512691 Change-Id: I67eebbf5407ca725ed111fbda4181ead10d03f6d Signed-off-by: Csaba Henk <csaba@redhat.com>
* gfapi: return pre/post attributes from glfs_fsync/fdatasyncKinglong Mee2018-02-121-1/+2
| | | | | | Updates: #389 Change-Id: I4153df72d5eeecefa7579170899db4c340128bea Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
* fuse: write out reverse notification to fuse dumpCsaba Henk2018-01-171-30/+59
| | | | | | BUG: 1534602 Change-Id: Ide42cf9cffe462d0cc46272b327c2a05999f09ba Signed-off-by: Csaba Henk <csaba@redhat.com>
* dict: add more types for valuesAmar Tumballi2018-01-052-3/+3
| | | | | | | | | | Added 2 more types which are present in gluster codebase, mainly IATT and UUID. Updates #203 Change-Id: Ib6d6d6aefb88c3494fbf93dcbe08d9979484968f Signed-off-by: Amar Tumballi <amarts@redhat.com>
* libglusterfs: fix the call_stack_set_group() functionCsaba Henk2017-11-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - call_stack_set_group() will take the ownership of passed buffer from caller; - to indicate the change, its signature is changed from including the buffer directly to take a pointer to it; - either the content of the buffer is copied to the groups_small embedded buffer of the call stack, or the buffer is set as groups_large member of the call stack; - the groups member of the call stack is set to, respectively, groups_small or groups_large, according to the memory management conventions of the call stack; - the buffer address is overwritten with junk to effectively prevent the caller from using it further on. Also move call_stack_set_group to stack.c from stack.h to prevent "defined but not used [-Wunused-function]" warnings (not using it anymore in call_stack_alloc_group() implementation, which saved us from this so far). protocol/server: refactor gid_resolve() In gid_resolve there are two cases: either the gid_cache_lookup() call returns a value or not. The result is caputured in the agl variable, and throughout the function, each particular stage of the implementation comes with an agl and a no-agl variant. In most cases this is explicitly indicated via an if (agl) { ... } else { ... } but some of this branching are expressed via goto constructs (obfuscating the fact we stated above, that is, each particular stage having an agl/no-agl variant). In the current refactor, we bring the agl conditional to the top, and present the agl/non-agl implementations sequentially. Also we take the opportunity to clean up and fix the agl case: - remove the spurious gl.gl_list = agl->gl_list; setting, as gl is not used in the agl caae - populate the group list of call stack from agl, fixing thus referred BUG. Also fixes BUG: 1513920 Change-Id: I61f4574ba21969f7661b9ff0c9dce202b874025d BUG: 1513928 Signed-off-by: Csaba Henk <csaba@redhat.com>
* mount/fuse: use fstat in getattr implementation if any opened fd is availableRaghavendra G2017-11-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The restriction of using fds opened by the same Pid means fds cannot be shared across threads of multithreaded application. Note that fops from kernel have different Pid for different threads. Imagine following sequence of operations: * Turn off performance.open-behind * Thread t1 opens an fd - fd1 - on file "file". Let's assume nodeid of "file" is "nodeid-file". * Thread t2 does RENAME ("newfile", "file"). Let's assume nodeid of "newfile" as "nodeid-newfile". * t2 proceeds to do fstat (fd1) The above set of operations can sometimes result in ESTALE/ENOENT errors. RENAME overwrites "file" with "newfile" changing its nodeid from "nodeid-file" to "nodeid-newfile" and post RENAME, "nodeid-file" is removed from the backend. If fstat carries nodeid-file as argument, which can happen if lookup has not refreshed the nodeid of "file" and since t2 doesn't have an fd opened, fuse_getattr_resume uses STAT which will fail as "nodeid-file" no longer exists. Since the above set of operations and sharing of fds across multiple threads are valid, this is a bug. The fix is to use any fd opened on the inode. In this specific example fuse_getattr_resume will find fd1 and winds down the call as fstat (fd1) which won't fail. Cross-checked with "Miklos Szeredi" <mszeredi.at.redhat.dot.com> for any security issues with this solution and he approves the solution. Thanks to "Miklos Szeredi" <mszeredi.at.redhat.dot.com> for all the pointers and discussions. Change-Id: I88dd29b3607cd2594eee9d72a1637b5346c8d49c BUG: 1510401 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* core: make gf_boolean_t a C99 bool instead of an enumJeff Darcy2017-11-031-2/+5
| | | | | | | | | | | | This reduces the space used from four bytes to one, and allows new code to use familiar C99 types/values interoperably with our old cruft. It does *not* change current declarations or code; that will be left for a separate - much larger - patch. Updates: #80 Change-Id: I5baedd17d3fb05b38f0d8b8bb9dd62824475842e Signed-off-by: Jeff Darcy <jdarcy@fb.com>
* gfproxyd: Let glusterd manage gfproxy daemonPoornima G2017-10-183-0/+18
| | | | | | | Updates: #242 BUG: 1428063 Change-Id: Iaaf2edf99b2ecc75f6d30762c752a6d445c1c826 Signed-off-by: Poornima G <pgurusid@redhat.com>
* mount/fuse: never fail open(dir) with ENOENTRaghavendra G2017-10-171-0/+7
| | | | | | | | | | open(dir) being an operation on inode should never fail with ENOENT. If gfid is not present, the appropriate error is ESTALE. This will enable kernel to retry open after a revalidate lookup. Change-Id: I8d07d2ebb5a0da6c3ea478317442cb42f1797a4b BUG: 1500269 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* Revert "mount/fuse: report ESTALE as ENOENT"Raghavendra G2017-10-171-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 26d16b90ec7f8acbe07e56e8fe1baf9c9fa1519e. Consider rename (index.new, store.idx) and open (store.idx) being executed in parallel. When we break down operations following sequence is possible. * lookup (store.idx) - as part of open(store.idx) returns gfid1 as the result. * rename (index.new, store.idx) changes gfid of store.idx to gfid2. Note that gfid2 was the nodeid of index.new. Since rename is successful, gfid2 is associated with store.idx. * open (store.idx) resumes and issues open fop to glusterfs with gfid1. open in glusterfs fails as gfid1 doesn't exist and the error returned by glusterfs to kernel-fuse is ENOENT. * kernel passes back the same error to application as a result to open. This error could've been prevented if kernel retries open with gfid2. Interestingly kernel do retry open when it receives ESTALE error. Even though failure to find gfid resulted in ESTALE error, commit 26d16b90ec7f8acb converted that error to ENOENT while sending an error reply to kernel. This prevented kernel from retrying open resulting in error. Change-Id: I2e752ca60dd8af1b989dd1d29c7b002ee58440b4 BUG: 1500269 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* mount/fuse : Fix parsing of vol_id for snapshot volumeMohammed Rafi KC2017-10-151-2/+4
| | | | | | | | | | | | | | | | For supporting sub-dir mount, we changed the volid. Which means anything after a '/' in volume_id will be considered as sub-dir path. But snapshot volume has vol_id stracture of /snaps/<volname>/<snapname> which has to be considered as during the parsing. Note 1: sub-dir mount is not supported on snapshot volume Note 2: With sub-dir mount changes brick based mount for quota cannot be executed via mount command. It has to be a direct call via glusterfs Change-Id: I0d824de0236b803db8a918f683dabb0cb523cb04 BUG: 1501235 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>