summaryrefslogtreecommitdiffstats
path: root/xlators/mount/fuse/src/fuse-bridge.h
Commit message (Collapse)AuthorAgeFilesLines
* fuse: fix waiting for interrupt handlerCsaba Henk2020-07-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | | With 'sync' strategy, a fop's cbk waits for the interrupt handler to finish by making a call to fuse_interrupt_finish_fop() with sync = true. The wait is implemented by monitoring an interrupt_state struct member via a condition variable. However, due to broken code logic, the pthread_cond_wait() call is never reached. This change introduces a new member to the fuse_interrupt_state_t enum (the type of aforementioned struct member), FUSE_INTERRUPT_WAITING_HANDLER, which is then used for indicating the state of waiting for the interrupt handler. Change-Id: I72ab06c37f45ff8f212a6a632bac1f647af05cbd Updates: #1374 Signed-off-by: Csaba Henk <csaba@redhat.com>
* fuse - remove unnecessary code blockBarak Sason Rofman2020-06-021-24/+0
| | | | | | | | | Per a "TODO" comment in the code, a code block can be removed as it's no longer required Change-Id: I60e064ece985ff2ea2a686bbd2f0e6cc850899e9 updates: #1000 Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
* fuse: occasional logging for fuse device 'weird' write errorsCsaba Henk2020-05-211-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change is a followup to I510158843e4b1d482bdc496c2e97b1860dc1ba93. In referred change we pushed log messages about 'weird' write errors to fuse device out of sight, by reporting them at Debug loglevel instead of Error (where 'weird' means errno is not POSIX compliant but having meaningful semantics for FUSE protocol). This solved the issue of spurious error reporting. And so far so good: these messages don't indicate an error condition by themselves. However, when they come in high repetitions, that indicates a suboptimal condition which should be reported.[1] Therefore now we shall emit a Warning if a certain errno occurs a certain number of times[2] as the outcome of a write to the fuse device. ___ [1] typically ENOENTs and ENOTDIRs accumulate when glusterfs' inode invalidation lags behind the kernel's internal inode garbage collection (in this case above errnos mean that the inode which we requested to be invalidated is not found in kernel). This can be mitigated with the invalidate-limit command line / mount option, cf. bz#1732717. [2] 256, as of the current implementation. Change-Id: I8cc7fe104da43a88875f93b0db49d5677cc16045 Updates: #1000 Signed-off-by: Csaba Henk <csaba@redhat.com>
* fuse: degrade logging of write failure to fuse deviceCsaba Henk2020-01-091-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | Problem: FUSE uses failures of communicating with /dev/fuse with various errnos to indicate in-kernel conditions to userspace. Some of these shouldn't be handled as an application error. Also the standard POSIX errno description should not be shown as they are misleading in this context. Solution: When writing to the fuse device, the caller of the respective convenience routine can mask those errnos which don't qualify to be an error for the application in that context, so then those shall be reported at DEBUG level. The possible non-standard errnos are reported with their POSIX name instead of their description to avoid confusion. (Eg. for ENOENT we don't log "no such file or directory", we log indeed literal "ENOENT".) Change-Id: I510158843e4b1d482bdc496c2e97b1860dc1ba93 updates: bz#1193929 Signed-off-by: Csaba Henk <csaba@redhat.com>
* glusterfs/fuse: Reduce the default lru-limit valueN Balachandran2019-09-241-1/+1
| | | | | | | | | | The current lru-limit value still uses memory for upto 128K inodes. Reduce the default value of lru-limit to 64K. Change-Id: Ica2dd4f8f5fde45cb5180d8f02c3d86114ac52b3 Fixes: bz#1753880 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* fuse: Set limit on invalidate queue sizeN Balachandran2019-08-141-2/+2
| | | | | | | | | | | | | If the glusterfs fuse client process is unable to process the invalidate requests quickly enough, the number of such requests quickly grows large enough to use a significant amount of memory. We are now introducing another option to set an upper limit on these to prevent runaway memory usage. Change-Id: Iddfff1ee2de1466223e6717f7abd4b28ed947788 Fixes: bz#1732717 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* fuse: rate limit reading from fuse device upon receiving EPERMCsaba Henk2019-08-081-0/+2
| | | | | | Fixes: bz#1644322 Change-Id: I53e8fa362cd8c7d04fb1c4abb606a9abb642c592 Signed-off-by: Csaba Henk <csaba@redhat.com>
* mount/fuse: expose auto-invalidation as a mount optionRaghavendra Gowdappa2019-02-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Auto invalidation is necessary when same (meta)data is shared/access across multiple mounts. However, if (meta)data is not shared, all relevant I/O goes through the cache of single mount and hence is coherent with (meta)data on bricks always. So, fuse-auto-invalidation can be disabled for this case which gives a huge performance boost for workloads that write data and then immediately read the data they just wrote. From glusterfs --help, <snip> --auto-invalidation[=BOOL] controls whether fuse-kernel can auto-invalidate attribute, dentry and page-cache. Disable this only if same files/directories are not accessed across two different mounts concurrently [default: "on"] </snip> Details on how disabling auto-invalidation helped to reduce pgbench init times can be found at [1]. Time taken for pgbench init of scale 8000 was 8340s. That will be an improvement of 86% (59280s vs 8340s) with auto-invalidations turned off along with other optimizations. Just disabling auto-invalidation contributed 56% improvement by reducing the total time taken by 33260s. [1] https://www.spinics.net/lists/gluster-devel/msg25907.html Change-Id: I0ed730dba9064bd9c576ad1800170a21e100e1ce Signed-off-by: Raghavendra Gowdappa <rgowdapp@redhat.com> updates: bz#1664934
* fuse: add --lru-limit optionAmar Tumballi2018-12-141-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The inode LRU mechanism is moot in fuse xlator (ie. there is no limit for the LRU list), as fuse inodes are referenced from kernel context, and thus they can only be dropped on request of the kernel. This might results in a high number of passive inodes which are useless for the glusterfs client, causing a significant memory overhead. This change tries to remedy this by extending the LRU semantics and allowing to set a finite limit on the fuse inode LRU. A brief history of problem: When gluster's inode table was designed, fuse didn't have any 'invalidate' method, which means, userspace application could never ask kernel to send a 'forget()' fop, instead had to wait for kernel to send it based on kernel's parameters. Inode table remembers the number of times kernel has cached the inode based on the 'nlookup' parameter. And 'nlookup' field is not used by no other entry points (like server-protocol, gfapi etc). Hence the inode_table of fuse module always has to have lru-limit as '0', which means no limit. GlusterFS always had to keep all inodes in memory as kernel would have had a reference to it. Again, the reason for this is, kernel's glusterfs inode reference was pointer of 'inode_t' structure in glusterfs. As it is a pointer, we could never free it (to prevent segfault, or memory corruption). Solution: In the inode table, handle the prune case of inodes with 'nlookup' differently, and call a 'invalidator' method, which in this case is fuse_invalidate(), and it sends the request to kernel for getting the forget request. When the kernel sends the forget, it means, it has dropped all the reference to the inode, and it will send the forget with the 'nlookup' parameter too. We just need to make sure to reduce the 'nlookup' value we have when we get forget. That automatically cause the relevant prune to happen. Credits: Csaba Henk, Xavier Hernandez, Raghavendra Gowdappa, Nithya B fixes: bz#1560969 Change-Id: Ifee0737b23b12b1426c224ec5b8f591f487d83a2 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* copy_file_range support in GlusterFSRaghavendra Bhat2018-12-121-0/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * libglusterfs changes to add new fop * Fuse changes: - Changes in fuse bridge xlator to receive and send responses * posix changes to perform the op on the backend filesystem * protocol and rpc changes for sending and receiving the fop * gfapi changes for performing the fop * tools: glfs-copy-file-range tool for testing copy_file_range fop - Although, copy_file_range support has been added to the upstream fuse kernel module, no release has been made yet of a kernel which contains the support. It is expected to come in the upcoming release of linux-4.20 So, as of now, executing copy_file_range fop on a fused based filesystem results in fuse kernel module sending read on the source fd and write on the destination fd. Therefore a small gfapi based tool has been written to be able test the copy_file_range fop. This tool is similar (in functionality) to the example program given in copy_file_range man page. So, running regular copy_file_range on a fuse mount point and running gfapi based glfs-copy-file-range tool gives some idea about how fast, the copy_file_range (or reflink) can be. On the local machine this was the result obtained. mount -t glusterfs workstation:new /mnt/glusterfs [root@workstation ~]# cd /mnt/glusterfs/ [root@workstation glusterfs]# ls file [root@workstation glusterfs]# cd [root@workstation ~]# time /tmp/a.out /mnt/glusterfs/file /mnt/glusterfs/new real 0m6.495s user 0m0.000s sys 0m1.439s [root@workstation ~]# time glfs-copy-file-range $(hostname) new /tmp/glfs.log /file /rrr OPEN_SRC: opening /file is success OPEN_DST: opening /rrr is success FSTAT_SRC: fstat on /rrr is success copy_file_range successful real 0m0.309s user 0m0.039s sys 0m0.017s This tool needs following arguments 1) hostname 2) volume name 3) log file path 4) source file path (relative to the gluster volume root) 5) destination file path (relative to the gluster volume root) "glfs-copy-file-range <hostname> <volume> <log file path> <source> <destination>" - Added a testcase as well to run glfs-copy-file-range tool * io-stats changes to capture the fop for profiling * NOTE: - Added conditional check to see whether the copy_file_range syscall is available or not. If not, then return ENOSYS. - Added conditional check for kernel minor version in fuse_kernel.h and fuse-bridge while referring to copy_file_range. And the kernel minor version is kept as it is. i.e. 24. Increment it in future when there is a kernel release which contains the support for copy_file_range fop in fuse kernel module. * The document which contains a writeup on this enhancement can be found at https://docs.google.com/document/d/1BSILbXr_knynNwxSyyu503JoTz5QFM_4suNIh2WwrSc/edit Change-Id: I280069c814dd21ce6ec3be00a884fc24ab692367 updates: #536 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
* libglusterfs: Move devel headers under glusterfs directoryShyamsundarR2018-12-051-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | libglusterfs devel package headers are referenced in code using include semantics for a program, this while it works can be better especially when dealing with out of tree xlator builds or in general out of tree devel package usage. Towards this, the following changes are done, - moved all devel headers under a glusterfs directory - Included these headers using system header notation <> in all code outside of libglusterfs - Included these headers using own program notation "" within libglusterfs This change although big, is just moving around the headers and making it correct when including these headers from other sources. This helps us correctly include libglusterfs includes without namespace conflicts. Change-Id: Id2a98854e671a7ee5d73be44da5ba1a74252423b Updates: bz#1193929 Signed-off-by: ShyamsundarR <srangana@redhat.com>
* fuse: diagnostic FLUSH interruptCsaba Henk2018-11-061-1/+3
| | | | | | | | | | | | | | | | | | | We add dummy interrupt handling for the FLUSH fuse message. It can be enabled by the "--fuse-flush-handle-interrupt" hidden command line option, or "-ofuse-flush-handle-interrupt=yes" mount option. It serves no other than diagnostic & demonstational purposes -- to exercise the interrupt handling framework a bit and to give an usage example. Documentation is also provided that showcases interrupt handling via FLUSH. Change-Id: I522f1e798501d06b74ac3592a5f73c1ab0590c60 updates: #465 Signed-off-by: Csaba Henk <csaba@redhat.com>
* fuse: interrupt handling frameworkCsaba Henk2018-11-061-0/+39
| | | | | | | | | | | | | | | | | | | | - add sub-framework to send timed responses to kernel - add interrupt handler queue - implement INTERRUPT fuse_interrupt looks up handlers for interrupted messages in the queue. If found, it invokes the handler function. Else responds with EAGAIN with a delay. See spec at https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/fuse.txt?h=v4.17#n148 and explanation in comments. Change-Id: I1a79d3679b31f36e14b4ac8f60b7f2c1ea2badfb updates: #465 Signed-off-by: Csaba Henk <csaba@redhat.com>
* Land clang-format changesGluster Ant2018-09-121-349/+362
| | | | Change-Id: I6f5d8140a06f3c1b2d196849299f8d483028d33b
* fuse: add support for kernel writeback cacheCsaba Henk2018-05-041-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Added kernel-writeback-cache command line and xlator option for requesting utilisation of the writeback cache of the kernel in FUSE_INIT (see [1]). - Added attr-times-granularity command line and xlator option via which granularity of the {a,m,c}time in stat (attr) data that we support can be indicated to kernel. This is a means to avoid divergence of the attr times between kernel and userspace that could occur with writeback-cache, while still maintaining maximum time precision the FUSE server is capable of (see [2]). - Handling FATTR_CTIME flag in FUSE_SETATTR that indicates presence of ctime in setattr payload. Currently we cannot associate arbitrary ctimes to files on backend, so we just touch them to update their ctimes to current time. Having ctimes in setattr payload is also a side effect of writeback cache (see [3] and [4]). [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4d99ff8, "fuse: Turn writeback cache on" [2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e27c9d3, "fuse: fuse: add time_gran to INIT_OUT" [3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1e18bda, "fuse: add .write_inode" [4]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ab9e13f, "fuse: allow ctime flushing to userspace" Updates: #435 Change-Id: Id174c8e0c815c4456c35f8c53e41a6a507d91855 Signed-off-by: Csaba Henk <csaba@redhat.com>
* mount/fuse: Add support for multi-threaded fuse readersKrutika Dhananjay2018-04-021-2/+7
| | | | | | | | | | | | | | Usage: Use 'reader-thread-count=<NUM>' as command line option to set the thread count at the time of mounting the volume. Next task is to make these threads auto-scale based on the load, instead of having the user remount the volume everytime to change the thread count. Updates #412 Change-Id: I94aa1505e5ae6a133683d473e0e4e0edd139b76b Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* gfproxyd: Let glusterd manage gfproxy daemonPoornima G2017-10-181-0/+3
| | | | | | | Updates: #242 BUG: 1428063 Change-Id: Iaaf2edf99b2ecc75f6d30762c752a6d445c1c826 Signed-off-by: Poornima G <pgurusid@redhat.com>
* mount/fuse: Make event-history feature configurableKrutika Dhananjay2017-09-241-5/+10
| | | | | | | | | | | | | | | | | | | | | | ... and disable it by default. This is because having it disabled seems to improve performance. This could be due to the lock contention by the different epoll threads on the circular buff lock in the fop cbks just before writing their response to /dev/fuse. Just to provide some data - wrt ovirt-gluster hyperconverged environment, I saw an increase in IOPs by 12K with event-history disabled for randrom read workload. Usage: mount -t glusterfs -o event-history=on $HOSTNAME:$VOLNAME $MOUNTPOINT OR glusterfs --event-history=on --volfile-server=$HOSTNAME --volfile-id=$VOLNAME $MOUNTPOINT Change-Id: Ia533788d309c78688a315dc8cd04d30fad9e9485 BUG: 1467614 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* fuse: implement "-oauto_unmount"Csaba Henk2017-05-231-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | libfuse has an auto_unmount option which, if enabled, ensures that the file system is unmounted at FUSE server termination by running a separate monitor process that performs the unmount when that occurs. (This feature would probably better be called "robust auto-unmount", as FUSE servers usually do try to unmount their file systems upon termination, it's just this mechanism is not crash resilient.) This change implements that option and behavior for glusterfs. Note that "auto unmount" (robust or not) is a leaky abstraction, as the kernel cannot guarantee that at the path where the FUSE fs is mounted is actually the toplevel mount at the time of the umount(2) call, for multiple reasons, among others, see: fuse-devel: "fuse: feasible to distinguish between umount and abort?" http://fuse.996288.n3.nabble.com/fuse-feasible-to-distinguish-between-umount-and-abort-tt14358.html https://github.com/libfuse/libfuse/issues/122 Updates #153 Change-Id: Ia4432580c9fd2c156d9c73c3a44f4bfd42437599 Signed-off-by: Csaba Henk <csaba@redhat.com> Reviewed-on: https://review.gluster.org/17230 Tested-by: Amar Tumballi <amarts@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* dict: Don't expose get_new_dict/dict_destroyPranith Kumar K2016-07-251-2/+0
| | | | | | | | | | | | | | | get_new_dict/dict_destroy is causing confusion where, dict_new/dict_destroy or get_new_dict/dict_unref are used instead of dict_new/dict_unref. Change-Id: I4cc69f5b6711d720823395e20fd624a0c6c1168c BUG: 1296043 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/13183 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
* fuse: Address the review comments in the backportPoornima G2016-03-091-0/+1
| | | | | | | | | | | | | | | | | Backport @ http://review.gluster.org/#/c/13626/3 Fix a typo error, consolidate the selinux and capability check in getxattr and setxattr. Change-Id: I4303de3d4dd00853169b07577311e03cbb912ed7 BUG: 1316327 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/13653 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Vijay Bellur <vbellur@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* fuse: Add a new mount option capabilityPoornima G2016-03-071-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | Originally all security.* xattrs were forbidden if selinux is disabled, which was causing Samba's acl_xattr module to not work, as it would store the NTACL in security.NTACL. To fix this http://review.gluster.org/#/c/12826/ was sent, which forbid only security.selinux. This opened up a getxattr call on security.capability before every write fop and others. Capabilities can be used without selinux, hence if selinux is disabled, security.capability cannot be forbidden. Hence adding a new mount option called capability. Only when "--capability" or "--selinux" mount option is used, security.capability is sent to the brick, else it is forbidden. Change-Id: I77f60e0fb541deaa416159e45c78dd2ae653105e BUG: 1309462 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/13540 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* fuse: add support for SEEK_HOLE and SEEK_DATA through lseek()Niels de Vos2016-02-101-1/+3
| | | | | | | | | | | | | | | | | | | | The Linux FUSE kernel module has gained support for passing SEEK_HOLE and SEEK_DATA on through lseek(). This can greatly improve performance when working with sparse files. Linux FUSE introduced support for lseek() with version 4.5. The commit in mainline Linux is 0b5da8db145bfd44266ac964a2636a0cf8d7c286. URL: http://thread.gmane.org/gmane.comp.file-systems.fuse.devel/14752 Change-Id: I12496d788e59461a3023ddd30e0ea3179007f77e BUG: 1220173 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/11474 Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* Revert "fuse: resolve complete path after a graph switch"Mohammed Rafi KC2015-11-081-7/+0
| | | | | | | | | | | | | | This reverts commit d0edb6d555d687f76837515207b9408be0bdd55e. The same functionality will be provided in a different patch Change-Id: I3139478b218fa32e803bb088df585fbbdf94af34 BUG: 1272949 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/12375 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* mount/fuse: use a queue instead of pipe to communicate with threadRaghavendra G2015-10-261-7/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | doing inode/entry invalidations. Writing to pipe can block if pipe is full. This can lead to deadlocks in some situations. Consider following situation: 1. Kernel sends a write on an inode. Client is waiting for a response to write from brick. 2. A lookup happens on behalf of different application/thread on the same inode. In response, mdc tries to invalidate the inode. 3. fuse_invalidate_inode is called. It writes a invalidation request to pipe. Another thread which reads from this pipe writes the request to /dev/fuse. The invalidate code in fuse-kernel-module, tries to acquire lock on all pages for the inode and is blocked as a write is in progress on same inode (step 1) 4. Now, poller thread is blocked in invalidate notification and cannot receive any messages from same socket (on which lookup response came). But client is expecting a response for write from same socket (again step1) and we've a deadlock. The deadlock can be solved in two ways: 1. Use a queue (and a conditional variable for notifications) to pass invalidation requests from poller to invalidate thread. This is a variant of using non-blocking pipe, but doesn't have any limit on the amount of data (worst case we run out of memory and error out). 2. Allow events from sockets, immediately after we read one rpc-msg. Currently we disallow events till that rpc-msg is read from socket, processed and handled by higher layers. That way we won't run into these kind of issues. Also, it'll increase parallelism in way of reading from sockets. This patch implements solution 1 above. Change-Id: I8e8199fd7f4da9eab46a719d9292f35c039967e1 BUG: 1273387 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/12402
* fuse: resolve complete path after a graph switchMohammed Rafi KC2015-10-081-0/+7
| | | | | | | | | | | | | | | | | | If a graph switch has happended as part of a attach-tier, then there is a chance to hash fops to newly added brick before fix-layout. This causes on going i/o to fail. This patch will resolve a path, for graph switch by sending recursive lookup to the parent directories. Those lookups will help to heal the directory. Change-Id: Ia2bb4b43a21e5cc6875ba1205628744c3f0ce4e5 BUG: 1263549 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/12184 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* fuse: add "resolve-gids" mount option to overcome 32-groups limitNiels de Vos2015-08-051-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a --resolve-gids commandline option to the glusterfs binary. This option gets set when executing "mount -t glusterfs -o resolve-gids ...". This option is most useful in combination with the "acl" mount option. POSIX ACL permission checking is done on the FUSE-client side to improve performance (in addition to the checking on the bricks). The fuse-bridge reads /proc/$PID/status by default, and this file contains maximum 32 groups. Any local (client-side) permission checking that requires more than the first 32 groups will fail. By enabling the "resolve-gids" option, the fuse-bridge will call getgrouplist() to retrieve all the groups from the user accessing the mountpoint. This is comparable to how "nfs.server-aux-gids" works. Note that when a user belongs to more than ~93 groups, the volume option server.manage-gids needs to be enabled too. Without this option, the RPC-layer will need to reduce the number of groups to make them fit in the RPC-header. Change-Id: I7ede90d0e41bcf55755cced5747fa0fb1699edb2 BUG: 1246275 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/11732 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* build: do not #include "config.h" in each fileNiels de Vos2015-05-291-5/+0
| | | | | | | | | | | | | | | | | | Instead of including config.h in each file, and have the additional config.h included from the compiler commandline (-include option). When a .c file tests for a certain #define, and config.h was not included, incorrect assumtions were made. With this change, it can not happen again. BUG: 1222319 Change-Id: I4f9097b8740b81ecfe8b218d52ca50361f74cb64 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/10808 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* fuse: squash 64-bit inodes in readdirp when enable-ino32 is setNiels de Vos2015-05-281-0/+3
| | | | | | | | | | | | | | | The structures returned by readdirp contain the inode 2x. Only one of them was squashed into 32-bits when enable-ino32 is enabled. Change-Id: I33a6d28fb118bb23971f918ffeb983d7f033106e BUG: 1223889 Signed-off-by: Niels de Vos <ndevos@redhat.com> Tested-by: Cyril Peponnet <cyril@peponnet.fr> [on release-3.5] Reviewed-on: http://review.gluster.org/10881 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* fuse: Fix cores in notify function when this is executed in parallelShyam2015-01-121-0/+4
| | | | | | | | | | | | | | | | | | | | | | | The fuse notify function gets called by the epoll or the poll thread and till the point there is a single epoll thread, 2 notify instances would not race with each other. With the upcoming multi thread epoll changes, it is possible that 2 epoll threads invoke the notify function. As a result races in this function are fixed with this commit. The races seen are detailed in the bug, and the fix here is to enforce a (slightly) longer critical section when updating the fuse private structure and reserving state updates post error handling. Change-Id: I6974bc043cb59eb6dc39c5777123364dcefca358 BUG: 1180231 Signed-off-by: Shyam <srangana@redhat.com> Reviewed-on: http://review.gluster.org/9421 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* mount/fuse: Handle fd resolution failuresPranith Kumar K2014-08-251-156/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Even when the fd resolution failed, the fop is continuing on the new graph which may not have valid inode. This lead to NULL layout subvols in dht which lead to crash in fsync after graph migration. Fix: - Remove resolution error handling in FUSE_FOP as it was only added to handle fd migration failures. - check in fuse_resolve_done for fd resolution failures and fail the fop right away. - loc resolution failures are already handled in the corresponding fops. - Return errno from state->resolve.op_errno in resume functions. - Send error to fuse on frame allocation failures. - Removed unused variable state->resolved - Removed unused macro FUSE_FOP_COOKIE Change-Id: I479d6e1ff2ca626ad8c8fcb6f293022149474992 BUG: 1126048 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/8402 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* porting: Port for FreeBSD rebased from Mike Ma's effortsHarshavardhana2014-07-021-3/+1
| | | | | | | | | | | | | | | | | | | - Provides a working Gluster Management Daemon, CLI - Provides a working GlusterFS server, GlusterNFS server - Provides a working GlusterFS client - execinfo port from FreeBSD is moved into ./contrib/libexecinfo for ease of portability on NetBSD. (FreeBSD 10 and OSX provide execinfo natively) - More portability cleanups for Darwin, FreeBSD and NetBSD - Provides a new rc script for FreeBSD Change-Id: I8dff336f97479ca5a7f9b8c6b730051c0f8ac46f BUG: 1111774 Original-Author: Mike Ma <mikemandarine@gmail.com> Signed-off-by: Harshavardhana <harsha@harshavardhana.net> Reviewed-on: http://review.gluster.org/8141 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* protocol/server: do not do root-squashing for trusted clientsRaghavendra Bhat2014-02-101-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * As of now clients mounting within the storage pool using that machine's ip/hostname are trusted clients (i.e clients local to the glusterd). * Be careful when the request itself comes in as nfsnobody (ex: posix tests). So move the squashing part to protocol/server when it creates a new frame for the request, instead of auth part of rpc layer. * For nfs servers do root-squashing without checking if it is trusted client, as all the nfs servers would be running within the storage pool, hence will be trusted clients for the bricks. * Provide one more option for mounting which actually says root-squash should/should not happen. This value is given priority only for the trusted clients. For non trusted clients, the volume option takes the priority. But for trusted clients if root-squash should not happen, then they have to be mounted with root-squash=no option. (This is done because by default blocking root-squashing for the trusted clients will cause problems for smb and UFO clients for which the requests have to be squashed if the option is enabled). * For geo-replication and defrag clients do not do root-squashing. * Introduce a new option in open-behind for doing read after successful open. Change-Id: I8a8359840313dffc34824f3ea80a9c48375067f0 BUG: 954057 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/4863 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* fuse: Check the return status from state->resolve_nowv3.5.0qa1Vijaykumar M2013-11-141-7/+14
| | | | | | | | | Change-Id: I85fc6dd393449d365bb908b38c2827b58cb08171 BUG: 1030208 Signed-off-by: Vijaykumar M <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/6262 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* fuse-bridge: enable --fopen-keep-cache based on FUSE_AUTO_INVAL_DATA.Anand Avati2013-09-171-1/+1
| | | | | | | | | | | | | If kernel supports FUSE_AUTO_INVAL_DATA then it is safe(r) to turn on --fopen-keep-cache mode by default. Users report significant improvement in perf by enabling the mode. Change-Id: Icf9df4b7b43950d7e25302d9c2a1a7d14571a9a9 BUG: 990744 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/5770 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
* Revert "fuse: auxiliary gfid mount support"Amar Tumballi2013-08-211-25/+1
| | | | | | | | | | | | | | | | | This reverts commit 4c0f4c8a89039b1fa1c9c015fb6f273268164c20. Conflicts: xlators/mount/fuse/src/fuse-bridge.c For build issues added CREATE_MODE_KEY definition in: libglusterfs/src/glusterfs.h Change-Id: I8093c2a0b5349b01e1ee6206025edbdbee43055e BUG: 952029 Signed-off-by: Amar Tumballi <amarts@redhat.com> Reviewed-on: http://review.gluster.org/5495 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* fuse: auxiliary gfid mount supportRaghavendra G2013-07-191-1/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * files can be accessed directly through their gfid and not just through their paths. For eg., if the gfid of a file is f3142503-c75e-45b1-b92a-463cf4c01f99, that file can be accessed using <gluster-mount>/.gfid/f3142503-c75e-45b1-b92a-463cf4c01f99 .gfid is a virtual directory used to seperate out the namespace for accessing files through gfid. This way, we do not conflict with filenames which can be qualified as uuids. * A new file/directory/symlink can be created with a pre-specified gfid. A setxattr done on parent directory with fuse_auxgfid_newfile_args_t initialized with appropriate fields as value to key "glusterfs.gfid.newfile" results in the entry <parent>/bname whose gfid is set to args.gfid. The contents of the structure should be in network byte order. struct auxfuse_symlink_in { char linkpath[]; /* linkpath is a null terminated string */ } __attribute__ ((__packed__)); struct auxfuse_mknod_in { unsigned int mode; unsigned int rdev; unsigned int umask; } __attribute__ ((__packed__)); struct auxfuse_mkdir_in { unsigned int mode; unsigned int umask; } __attribute__ ((__packed__)); typedef struct { unsigned int uid; unsigned int gid; char gfid[UUID_CANONICAL_FORM_LEN + 1]; /* a null terminated gfid string * in canonical form. */ unsigned int st_mode; char bname[]; /* bname is a null terminated string */ union { struct auxfuse_mkdir_in mkdir; struct auxfuse_mknod_in mknod; struct auxfuse_symlink_in symlink; } __attribute__ ((__packed__)) args; } __attribute__ ((__packed__)) fuse_auxgfid_newfile_args_t; An initial consumer of this feature would be geo-replication to create files on slave mount with same gfids as that on master. It will also help gsyncd to access files directly through their gfids. gsyncd in its newer version will be consuming a changelog (of master) containing operations on gfids and sync corresponding files to slave. * Also, bring in support to heal gfids with a specific value. fuse-bridge sends across a gfid during a lookup, which storage translators assign to an inode (file/directory etc) if there is no gfid associated it. This patch brings in support to specify that gfid value from an application, instead of relying on random gfid generated by fuse-bridge. gfids can be healed through setxattr interface. setxattr should be done on parent directory. The key used is "glusterfs.gfid.heal" and the value should be the following structure whose contents should be in network byte order. typedef struct { char gfid[UUID_CANONICAL_FORM_LEN + 1]; /* a null terminated gfid * string in canonical form */ char bname[]; /* a null terminated basename */ } __attribute__((__packed__)) fuse_auxgfid_heal_args_t; This feature can be used for upgrading older geo-rep setups where gfids of files are different on master and slave to newer setups where they should be same. One can delete gfids on slave using setxattr -x and .glusterfs and issue stat on all the files with gfids from master. Thanks to "Amar Tumballi" <amarts@redhat.com> and "Csaba Henk" <csaba@redhat.com> for their inputs. Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Change-Id: Ie8ddc0fb3732732315c7ec49eab850c16d905e4e BUG: 952029 Reviewed-on: http://review.gluster.com/#/c/4702 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Amar Tumballi <amarts@redhat.com> Reviewed-on: http://review.gluster.org/4702 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* mount/fuse: Provide option to use/not use kernel-readdirpPranith Kumar K2013-07-121-0/+3
| | | | | | | | | | | | | | By default fuse kernel readdirp usage in fuse xlator is off. When mount option use-readdirp=yes is provided it starts using fuse-kernel's readdirp. Change-Id: Id37edc53b1adc1638186d956c2f74c1e4e48aa59 BUG: 983477 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/5322 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* fuse-bridge: use READDIRPLUS support when availableAnand Avati2013-02-071-1/+1
| | | | | | | | | | | | | This patch makes use of READDIRPLUS call when support is available in the kernel. Change-Id: Iac78881179567856b55af1f46594a2b2859309f0 BUG: 908128 Signed-off-by: Anand V. Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/3905 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
* xlator/fuse: integrate fuse with event-historyRaghavendra Bhat2012-11-121-0/+89
| | | | | | | | | | | | | | | | | use event-history framework for saving and dumping (on necessity) important xlator specific information. Tests: Included the regression testcase. Change-Id: I6c0532e9ffe0b624286cdc4d2637b1bd2c0579e0 BUG: 858215 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Signed-off-by: root <root@thinkpad.(none)> Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/3925 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* fuse: create a new fd during fd-migration.Raghavendra G2012-10-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | Migration of fd to new graph involves creation of a new fd to be used only for calls sent in that graph. Earlier approach of using same fd across all graphs, with the associated inode always guaranteed to be the one valid in currently active graph, had issues because of the broken immutability of the association of fd with an inode (for the life of fd). With this patch, there will be a basefd, which the kernel will be aware of. This basefd, contains a mapping of an fd which is valid in currently active graph. Signed-off-by: Raghavendra G <raghavendra@gluster.com> Change-Id: I2b459f05bc2690a66498be107fad6444e3158138 BUG: 802414 Reviewed-on: http://review.gluster.org/3566 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* mount/fuse: readdir() should return 32-bit inodes when 'enable-ino32' is usedNiels de Vos2012-09-181-0/+2
| | | | | | | | | | | | | | | | | | | | The glusterfs mount option 'enable-ino32' does not change the behaviour of readdir(). fuse_readdir_cbk() uses entry->d_ino directly, and this was missed in commit c13823bd16b26bc471d3efb15f63b76fbfdf0309. By adding the function gf_fuse_fill_dirent(), the fuse_dirent structure is filled in a similar way as the fuse_attr structure. This helper uses the same function to squash the 64-bit inode in a 32-bit attribute. Change-Id: Ia20e7144613124a58691e7935cb793b6256aef79 BUG: 850352 URL: http://lists.nongnu.org/archive/html/gluster-devel/2012-09/msg00051.html Tested-by: Steve Bakke <sbakke@netzyn.com> Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/3955 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* All: License message changeVarun Shastry2012-09-131-7/+6
| | | | | | | | | | | | License message changed for server-side, dual license GPLV2 and LGPLv3+. Change-Id: Ia9e53061b9d2df3b3ef3bc9778dceff77db46a09 BUG: 852318 Signed-off-by: Varun Shastry <vshastry@redhat.com> Reviewed-on: http://review.gluster.org/3940 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* fuse-bridge: Pass unknown option down to fuseLubomir Rintel2012-09-121-0/+1
| | | | | | | | | | | | | | | | | | In Linux, certain "filesystem-specific" options (passed in string form in last argument to mount(2)), such as "rootcontext" or "context" are in fact common to all filesystems, including fuse. We should pass them down to the actual mount(2) call untouched. This is achieved by adding "fuse-mountopts" option to mount/fuse translator and adjusting the mount helper to propagate it with unrecognized options as they are encountered. BUG: 852754 Change-Id: I309203090c02025334561be235864d8d04e4159b Signed-off-by: Lubomir Rintel <lubo.rintel@gooddata.com> Reviewed-on: http://review.gluster.org/3871 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* mount/fuse: add mount-option "enable-ino32" for the native clientNiels de Vos2012-09-061-1/+3
| | | | | | | | | | | | | | By default the GlusterFS-native client uses 64-bit inodes. Some 32-bit applications can not handle these correctly. Introduce a client-side mount option "enable-ino32" which causes the FUSE-client to squash the 64-bit inodes into a 32-bit value. Change-Id: I3296d16528bfb50457b9675f6b8701234ed82ff0 BUG: 850352 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/3885 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* All: License message changeVarun Shastry2012-08-281-16/+7
| | | | | | | | | | | | | | | | | | The license message is changed to Copyright (c) 2008-2012 Red Hat, Inc. <http://www.redhat.com> This file is part of GlusterFS. This file is licensed to you under your choice of the GNU Lesser General Public License, version 3 or any later version (LGPLv3 or later), or the GNU General Public License, version 2 (GPLv2), in all cases as published by the Free Software Foundation. Change-Id: I07d2b63ed5fbbbd1884f1e74f2dd56013d15b0f4 BUG: 852318 Signed-off-by: Varun Shastry <vshastry@redhat.com> Reviewed-on: http://review.gluster.org/3858 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* fuse: make background queue length configurableAmar Tumballi2012-08-221-0/+4
| | | | | | | | | | | | | | | | * also make 'congestion_threshold' an option * make 'congestion_threshold' as 75% of background queue length if not explicitely specified * in glusterfsd.c, moved all the fuse option dictionary setting code to separate function Change-Id: Ie1680eefaed9377720770a09222282321bd4132e Signed-off-by: Amar Tumballi <amarts@redhat.com> BUG: 845214 Reviewed-on: http://review.gluster.org/3830 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* mount/fuse: ignore any erros that might've happened while resolving entry in ↵Raghavendra G2012-08-031-22/+93
| | | | | | | | | | | | | | | | | | | resolver. One error we hit was absence of gfid on backend. While the lookup code-path generates a new uuid and sets it on file, resolver code doesn't do that. Since, functionally (atleast after resolving parent inode, we would be resolving the path in new-graph) both resolver and lookup does same work, it would be no harm in ignoring errors during resolving the entry. This would help us to continue with the _extra_ work (like healing gfid as of now) in fuse_lookup_resume. Change-Id: If46d5e07c32e67b5744287a6ef55d0b0fe347689 BUG: 821138 Signed-off-by: Raghavendra G <raghavendra@gluster.com> Reviewed-on: http://review.gluster.com/3344 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* fuse-bridge: expose negative entry caching of FUSEAnand Avati2012-07-191-0/+1
| | | | | | | | | | | | | | | | Fuse kernel module supports caching negative entries, enabled by specifying a timeout while returning ENOENT to lookup. This patch enables the functionality to be enabled with the command line. Also fixed a typo bug in mount.glusterfs.in. Change-Id: I47eab2834cca9a05887266358afbf504bbb4c489 BUG: 841417 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.com/3696 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
* FUSE: ignore setxattr for some keys from gsyncd aux mountVenky Shankar2012-07-191-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Context ------- gsyncd/geo-rep plans to rely on Rsync to sync extended attributes. When this is in place, all xattrs *visible* on the mount point would be candidate for syncing. This set could include gluster internal xattrs too (as xome xlators do not filter out in their cbks). Syncing these xattrs to the slave could result in unexpected functioning of the slave mount. Soln. ----- For gsyncd auxillary mounts (identified by client_pid -1), we only allow xtime related xattrs to go through and silently ignore (w/o propagating error back to the client) the rest of them. This provides a future proof solution as we need not worry about what xattrs show up on the mounts. Also, 'user' namespace xattrs are always passed through even if it's from a gsyncd aux mount. Signed-off-by: Venky Shankar <vshankar@redhat.com> Change-Id: I6fac5e03d2b25fa4cdece4b2897fb202617b3c23 BUG: 841062 Reviewed-on: http://review.gluster.com/3687 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>