summaryrefslogtreecommitdiffstats
path: root/xlators/mount/fuse/src/fuse-bridge.c
Commit message (Collapse)AuthorAgeFilesLines
* fuse: write out reverse notification to fuse dumpCsaba Henk2018-01-171-30/+59
| | | | | | BUG: 1534602 Change-Id: Ide42cf9cffe462d0cc46272b327c2a05999f09ba Signed-off-by: Csaba Henk <csaba@redhat.com>
* dict: add more types for valuesAmar Tumballi2018-01-051-1/+1
| | | | | | | | | | Added 2 more types which are present in gluster codebase, mainly IATT and UUID. Updates #203 Change-Id: Ib6d6d6aefb88c3494fbf93dcbe08d9979484968f Signed-off-by: Amar Tumballi <amarts@redhat.com>
* mount/fuse: use fstat in getattr implementation if any opened fd is availableRaghavendra G2017-11-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The restriction of using fds opened by the same Pid means fds cannot be shared across threads of multithreaded application. Note that fops from kernel have different Pid for different threads. Imagine following sequence of operations: * Turn off performance.open-behind * Thread t1 opens an fd - fd1 - on file "file". Let's assume nodeid of "file" is "nodeid-file". * Thread t2 does RENAME ("newfile", "file"). Let's assume nodeid of "newfile" as "nodeid-newfile". * t2 proceeds to do fstat (fd1) The above set of operations can sometimes result in ESTALE/ENOENT errors. RENAME overwrites "file" with "newfile" changing its nodeid from "nodeid-file" to "nodeid-newfile" and post RENAME, "nodeid-file" is removed from the backend. If fstat carries nodeid-file as argument, which can happen if lookup has not refreshed the nodeid of "file" and since t2 doesn't have an fd opened, fuse_getattr_resume uses STAT which will fail as "nodeid-file" no longer exists. Since the above set of operations and sharing of fds across multiple threads are valid, this is a bug. The fix is to use any fd opened on the inode. In this specific example fuse_getattr_resume will find fd1 and winds down the call as fstat (fd1) which won't fail. Cross-checked with "Miklos Szeredi" <mszeredi.at.redhat.dot.com> for any security issues with this solution and he approves the solution. Thanks to "Miklos Szeredi" <mszeredi.at.redhat.dot.com> for all the pointers and discussions. Change-Id: I88dd29b3607cd2594eee9d72a1637b5346c8d49c BUG: 1510401 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* core: make gf_boolean_t a C99 bool instead of an enumJeff Darcy2017-11-031-2/+5
| | | | | | | | | | | | This reduces the space used from four bytes to one, and allows new code to use familiar C99 types/values interoperably with our old cruft. It does *not* change current declarations or code; that will be left for a separate - much larger - patch. Updates: #80 Change-Id: I5baedd17d3fb05b38f0d8b8bb9dd62824475842e Signed-off-by: Jeff Darcy <jdarcy@fb.com>
* gfproxyd: Let glusterd manage gfproxy daemonPoornima G2017-10-181-0/+8
| | | | | | | Updates: #242 BUG: 1428063 Change-Id: Iaaf2edf99b2ecc75f6d30762c752a6d445c1c826 Signed-off-by: Poornima G <pgurusid@redhat.com>
* mount/fuse: never fail open(dir) with ENOENTRaghavendra G2017-10-171-0/+7
| | | | | | | | | | open(dir) being an operation on inode should never fail with ENOENT. If gfid is not present, the appropriate error is ESTALE. This will enable kernel to retry open after a revalidate lookup. Change-Id: I8d07d2ebb5a0da6c3ea478317442cb42f1797a4b BUG: 1500269 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* Revert "mount/fuse: report ESTALE as ENOENT"Raghavendra G2017-10-171-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 26d16b90ec7f8acbe07e56e8fe1baf9c9fa1519e. Consider rename (index.new, store.idx) and open (store.idx) being executed in parallel. When we break down operations following sequence is possible. * lookup (store.idx) - as part of open(store.idx) returns gfid1 as the result. * rename (index.new, store.idx) changes gfid of store.idx to gfid2. Note that gfid2 was the nodeid of index.new. Since rename is successful, gfid2 is associated with store.idx. * open (store.idx) resumes and issues open fop to glusterfs with gfid1. open in glusterfs fails as gfid1 doesn't exist and the error returned by glusterfs to kernel-fuse is ENOENT. * kernel passes back the same error to application as a result to open. This error could've been prevented if kernel retries open with gfid2. Interestingly kernel do retry open when it receives ESTALE error. Even though failure to find gfid resulted in ESTALE error, commit 26d16b90ec7f8acb converted that error to ENOENT while sending an error reply to kernel. This prevented kernel from retrying open resulting in error. Change-Id: I2e752ca60dd8af1b989dd1d29c7b002ee58440b4 BUG: 1500269 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* mount/fuse: Make event-history feature configurableKrutika Dhananjay2017-09-241-9/+25
| | | | | | | | | | | | | | | | | | | | | | ... and disable it by default. This is because having it disabled seems to improve performance. This could be due to the lock contention by the different epoll threads on the circular buff lock in the fop cbks just before writing their response to /dev/fuse. Just to provide some data - wrt ovirt-gluster hyperconverged environment, I saw an increase in IOPs by 12K with event-history disabled for randrom read workload. Usage: mount -t glusterfs -o event-history=on $HOSTNAME:$VOLNAME $MOUNTPOINT OR glusterfs --event-history=on --volfile-server=$HOSTNAME --volfile-id=$VOLNAME $MOUNTPOINT Change-Id: Ia533788d309c78688a315dc8cd04d30fad9e9485 BUG: 1467614 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* fuse/readdirp: Remove need_lookup from fuse_readdirp_cbkSusant Palai2017-09-201-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | background: Various xlators used to populate their ctx, on an explicit lookup. That means without a lookup, the translator will have either null or stale data to function. E.g. dht would depend on lookup to create linkto files on the correct node/hashed subvol, afr would rely on this lookup to heal pending data/metadata etc. So to complete above actions a lookup used to be issued on files, even their inode was populated on a readdirp_cbk. This was done by setting the need_lookup flag on all the files those were read on readdirp fop. We tried a small test on "ACL client". For listing 50k files on root itself, it took around 50seconds with readdirp enabled while the same operation took 5-6 seconds with readdirp disabled. Both the times md-cache was enabled. We observed that on the 1st test case (readdirp enabled), post readdirp a getxattr is done. The number of getxattr depends on the number of acl xattrs (I saw requests on these two: system.posix_acl_default, system.posix_acl_access). Since need_lookup flag is set, during fuse_resolve a nameless lookup is executed on the inode(getxattr being inode operation, hence the nameless lookup). Since md-cache does not serve nameless lookup, a network hop is needed for each file, costing the time. With readdirp disabled, the getxattrs are served from md-cache itself(note: we are discussing the 2nd attempt of ls -l use case). _Current affairs around need of lookup for a file to populate it's ctx_: For the xlators on client stack we discussed quite extensively about the need for a lookup fop post readdirp in all three cluster translators - afr, EC and dht. EC and dht don't really need a nameless lookup post readdirp. For afr too, the need for lookup was negated with patch (http://review.gluster.org/6010 - AFRV2), where afr added a function called afr_inode_refresh() which does a lookup and populates its inode context in case a FOP came to AFR without a lookup being issued prior to it. We ran a thread on gluster-devel asking for feedback on the need of explicit lookup post readdirp. For responses refer [1]. Refer [2] for discussions happened on gerrit. After gathering inputs from [1] and [2], it looks like there is no xlator in current state that requires an explicit lookup post readdirp to function properly. * A separate similar patch will be sent for gfapi/nfs/nfs-ganesha. Note: Only file's inode is built with readdirp. [1] http://lists.gluster.org/pipermail/gluster-devel/2017-August/053505.html [2] https://review.gluster.org/#/c/17985/ Change-Id: Ie1d68ce7bea5e1f8a1fab9a62217f478322554f5 BUG: 1492996 Signed-off-by: Susant Palai <spalai@redhat.com>
* mount/fuse: Include sub-directory in source argument for mount()Vijay Bellur2017-09-071-1/+7
| | | | | | | | | | | | | | | With this, mount of a sub-directory 'foo' gets listed in /proc/mounts as: <hostname>:<volname>/foo on /mnt/glusterfs type fuse.glusterfs (rw,relatime...) Signed-off-by: Vijay Bellur <vbellur@redhat.com> BUG: 1488913 Change-Id: Ib1e1ac3741bf66e1a912d792f2948b748931f2b0 Reviewed-on: https://review.gluster.org/18210 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com>
* libglusterfs: Name threads on creationRaghavendra Talur2017-07-191-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Set names to threads on creation for easier debugging. Output of top -H -p <PID-OF-GLUSTERFSD> Before: 19773 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19774 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19775 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19776 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19777 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19778 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19779 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19780 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19781 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19782 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19783 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19784 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19785 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd 19786 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd 19787 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd 19789 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19790 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 25178 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 5398 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 7881 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd After: 19773 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19774 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustertimer 19775 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19776 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustermemsweep 19777 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustersproc0 19778 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustersproc1 19779 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll0 19780 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusteridxwrker 19781 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusteriotwr0 19782 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterbrssign 19783 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterbrswrker 19784 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterclogecon 19785 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd0 19786 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd1 19787 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd2 19789 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixjan 19790 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixfsy 25178 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll1 5398 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll2 7881 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixhc Change-Id: Id5f333755c1ba168a2ffaa4fce6e71c375e10703 BUG: 1254002 Updates: #271 Signed-off-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-on: https://review.gluster.org/11926 Reviewed-by: Prashanth Pai <ppai@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* fuse: memory leak fixesDanny Couture2017-07-141-38/+30
| | | | | | | | | | | | | | | | | | | Fix fuse ctx memory leak in case an error occurs and the cleanup path is different than usual. Also fix a memory leak in logging if eh_save_history() fails. Change-Id: I7ec967c807b0ed91184e5b958be70702215c46c9 BUG: 1470220 Signed-off-by: Danny Couture <couture.danny@gmail.com> Reviewed-on: https://review.gluster.org/17759 Reviewed-by: Niels de Vos <ndevos@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Prashanth Pai <ppai@redhat.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Amar Tumballi <amarts@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* fuse-bridge: cleanup first_lookup()Amar Tumballi2017-05-301-68/+13
| | | | | | | | | | | | | | | use syncop_lookup instead of synchronising stack_wind/unwind again. Updates #175 Change-Id: Iad4a181d8601235a999039979bfb7ec688675520 Signed-off-by: Amar Tumballi <amarts@redhat.com> Reviewed-on: https://review.gluster.org/17075 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Prashanth Pai <ppai@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* fuse: implement "-oauto_unmount"Csaba Henk2017-05-231-3/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | libfuse has an auto_unmount option which, if enabled, ensures that the file system is unmounted at FUSE server termination by running a separate monitor process that performs the unmount when that occurs. (This feature would probably better be called "robust auto-unmount", as FUSE servers usually do try to unmount their file systems upon termination, it's just this mechanism is not crash resilient.) This change implements that option and behavior for glusterfs. Note that "auto unmount" (robust or not) is a leaky abstraction, as the kernel cannot guarantee that at the path where the FUSE fs is mounted is actually the toplevel mount at the time of the umount(2) call, for multiple reasons, among others, see: fuse-devel: "fuse: feasible to distinguish between umount and abort?" http://fuse.996288.n3.nabble.com/fuse-feasible-to-distinguish-between-umount-and-abort-tt14358.html https://github.com/libfuse/libfuse/issues/122 Updates #153 Change-Id: Ia4432580c9fd2c156d9c73c3a44f4bfd42437599 Signed-off-by: Csaba Henk <csaba@redhat.com> Reviewed-on: https://review.gluster.org/17230 Tested-by: Amar Tumballi <amarts@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* mount/fuse: Handle racing notify on more than one graph properlyRaghavendra G2017-05-101-2/+6
| | | | | | | | | | | | | | | Make sure that we always use latest graph as a candidate for active-subvol. Change-Id: Ie37c818366f28ba6b1570d65a9eb17697d38a6c5 BUG: 1448364 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: https://review.gluster.org/17200 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* fuse: enhance fusedump to include timestamp and a signaturev3.12devCsaba Henk2017-04-301-12/+67
| | | | | | | | | | | | | | (Also referred to as "fusedump v2".) Change-Id: I837944024efd1b9055c2f5f91bd5723ef350e688 Signed-off-by: Csaba Henk <csaba@redhat.com> Reviewed-on: https://review.gluster.org/16422 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* mount/fuse: Replace GF_LOG_OCCASIONALLY with gf_log() to report fop failure ↵Krutika Dhananjay2017-04-301-5/+2
| | | | | | | | | | | | | at all times Change-Id: Ibd8e1c6172812951092ff6097ba4bed943051b7c BUG: 1440051 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: https://review.gluster.org/17086 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* fuse: clean up mount flag processingCsaba Henk2017-04-271-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In general, when one invokes a mount helper program -- basically anything that mounts something based on its command line, so thinking of mount(8), mount.<fs-type> or fusermount, but also of FUSE servers in general, including glusterfs -- the command line arguments that are to affect mount(2) are mapped to a bitmask called the mount flags, which is passed to mount(2), so that the kernel can interpret the flag bits and adjusts properties of the mount accordingly. There is a traditional syntax for this mechanism as implemented in mount(8): one passes "-ocomma,separated,mount,options" and the individual option name strings are mapped to flag bits in mount(8). FUSE further explores this idea and typically the FUSE server command lines allow further option names to be used in the "-ooption,name,list" which are then separated from the kernel sanctioned option names (to which we'll refer as "system mount options") and are passed to a platform specific lower level fuse mount helper interface. The separation of system mount option names and FUSE specific option names is also platform specific, so the general mount interface function, which in case of glusterfs is gf_fuse_mount(), should abstract this away. Therefore we change the signature of this function from int gf_fuse_mount (const char *mountpoint, char *fsname, unsigned long mountflags, char *mnt_param, pid_t *mtab_pid, int status_fd); to int gf_fuse_mount (const char *mountpoint, char *fsname, char *mnt_param, pid_t *mtab_pid, int status_fd); and deal with flag extraction in platform specific mount code. Note that the sole purpose of the mountflags argument was to indicate read-only mounting. The other system mount option names were expected to reside in the comma-separated mnt_param string, but they were not properly processed (see the referred BUG). With the new gf_fuse_mount signature read-only mounting is to be indicated as a "ro" component in mnt_param. - For Darwin, which has a dedicated, separate gf_fuse_mount implementation, gf_fuse_mount was ignoring mountflags, so only the signature had to to be adjusted. However, as bonus, we gain read-only support for Darwin, which was missing so far, given that it was indicated via the ignored mountflags. Darwin's low level mount helper relies on the "ro" component of the option string, which agrees with the new calling convention of gf_fuse_mount. - On Linux, system mount option name handling (apart from the distinguished read-only option) used to have the inadvertent side effect of adding "nosuid,nodev" as indicated in BUG; since Ia89d975d1e27fcfa5ab2036ba546aa8fa0d2d1b0 this side effect is removed, but system mount option name handling was left broken (passing system mount options other than "ro" fails to mount). - On other platforms, system mount option name handling is broken (expect for the distinguished read-only option). As of this change, in the general (non-Darwin) implementation of gf_fuse_mount we take care of proper separation of system mount names and their conversion to mount flags. For Linux, we adopt the conversion table from FUSE upstream. For other systems we just provide a best effort to support those system mount options which are understood across all Unices (nosuid,nodev,noatime,noexec,ro). (This can be improved later to provide proper plaform support.) BUG: 1297182 Change-Id: I5d10b5df46feba7a02bf5bf1018db69e6b52260a Signed-off-by: Csaba Henk <csaba@redhat.com> Reviewed-on: https://review.gluster.org/16313 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Amar Tumballi <amarts@redhat.com>
* fuse-bridge: fuse_getattr(): send root lookup() only in case of failure.Amar Tumballi2017-04-071-21/+37
| | | | | | | | | | | | | | this will make sure we don't fail in any current cases, and also will enable the performance better. Change-Id: Ia421e1913e1b00f0730a004bf7c84bf7e2a62636 BUG: 1437780 Signed-off-by: Amar Tumballi <amarts@redhat.com> Reviewed-on: https://review.gluster.org/16945 Reviewed-by: Niels de Vos <ndevos@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* core: run many bricks within one glusterfsd processJeff Darcy2017-01-301-0/+10
| | | | | | | | | | | | | | | | | | | | | | | This patch adds support for multiple brick translator stacks running in a single brick server process. This reduces our per-brick memory usage by approximately 3x, and our appetite for TCP ports even more. It also creates potential to avoid process/thread thrashing, and to improve QoS by scheduling more carefully across the bricks, but realizing that potential will require further work. Multiplexing is controlled by the "cluster.brick-multiplex" global option. By default it's off, and bricks are started in separate processes as before. If multiplexing is enabled, then *compatible* bricks (mostly those with the same transport options) will be started in the same process. Change-Id: I45059454e51d6f4cbb29a4953359c09a408695cb BUG: 1385758 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: https://review.gluster.org/14763 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* fuse: fix memory leak in setxattrXavier Hernandez2017-01-121-27/+21
| | | | | | | | | | | | | | | | | | | If there's some failed check in setxattr of mount/fuse before actually starting the operation, a fuse_state_t structure is leaked. This fix correctly releases allocated resources in case of error. Change-Id: I8b1cda67a613c13b6bc38947352e2ccfccf96a1d BUG: 1412174 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/16380 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* mount/fuse: Fix the place where graph-switch event is loggedKrutika Dhananjay2017-01-041-3/+4
| | | | | | | | | | Change-Id: I3c8577b87db02a2a6ce6159e7d04cf58a2bda0c1 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/16302 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* fuse: fix fuse dumping for FUSE_WRITECsaba Henk2016-09-221-1/+6
| | | | | | | | | | | | | | | | | Data coming with FUSE_WRITE requests are arranged with a special alignment, cf. 15d85ff1. fuse_dumper() was not aware of this and didn't dump the proper reqion for FUSE_WRITE. BUG: 1377427 Signed-off-by: Csaba Henk <csaba@redhat.com> Change-Id: I36255ca3336e95be6e2d256c8199761ddec41869 Reviewed-on: http://review.gluster.org/15525 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* dict: Don't expose get_new_dict/dict_destroyPranith Kumar K2016-07-251-2/+1
| | | | | | | | | | | | | | | get_new_dict/dict_destroy is causing confusion where, dict_new/dict_destroy or get_new_dict/dict_unref are used instead of dict_new/dict_unref. Change-Id: I4cc69f5b6711d720823395e20fd624a0c6c1168c BUG: 1296043 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/13183 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
* fuse: unref dict even if fuse_first_lookup failsKremmyda, Olia (NSN - GR/Athens)2016-05-231-2/+1
| | | | | | | | | | | | | | | | | In fuse_first_lookup function, "dict_unref (dict)" should be included in the out label, in case create_frame returns an empty pointer the dict to be unreferenced as well. Bug: 1338544 Change-Id: Ifb8a3378aec6521c1aa848f818968b6bfdb72089 Signed-off-by: Olia Kremmyda <olympia.kremmyda@nokia.com> Reviewed-on: http://review.gluster.org/14464 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* mount/fuse: Log gfid and fd ptr as well when writev/readv failKrutika Dhananjay2016-05-121-3/+9
| | | | | | | | | | | | Change-Id: Iaed3850171155f2452fc29ebecd350c2da0b55cb BUG: 1335091 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/14291 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* mount/fuse: report ESTALE as ENOENTRaghavendra G2016-04-181-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | When the inode/gfid is missing, brick report back as an ESTALE error. However, most of the applications don't accept ESTALE as an error for a file-system object missing, changing their behaviour. For eg., rm -rf ignores ENOENT errors during unlink of files/directories. But with ESTALE error it doesn't send rmdir on a directory if unlink had failed with ESTALE for any of the files or directories within it. Thanks to Ravishankar N <ravishankar@redhat.com>, here is a link as to why we split up ENOENT into ESTALE and ENOENT. http://review.gluster.org/#/c/6318/ Change-Id: I467df0fdf22734a8ef20c79ac52606410fad04d1 BUG: 1245065 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/13816 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: N Balachandran <nbalacha@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* dht: add "nuke" functionality for efficient server-side deletionJeff Darcy2016-04-071-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This turns a special xattr into an rmdir with flags set. When that hits the posix translator on the server side, that causes the file/directory to be moved into the special "landfill" directory. From there, the posix janitor thread will take care of deleting it entirely on the server side - traversing it recursively if necessary. A couple of secondary issues were fixed to make this effective. * FUSE now ensures that setxattr values are NUL terminated. * The janitor thread now gets woken up immediately when something is placed in 'landfill' instead of only when file descriptors need to be closed. * The default landfill-emptying interval was reduced to 10s. To use the feature, issue a setxattr something like this: setfattr -n glusterfs.dht.nuke -v "" /mnt/glusterfs/vol/some_dir The value doesn't actually matter; the mere receipt of a request with this key is sufficient. Some day it might be useful to allow setting a required value as a sort of password, so that only those who know it can access the underlying special functionality. Change-Id: I8a343c2cdb40a76d5a06c707191fb67babb8514f Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/13878 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* fuse: Address the review comments in the backportPoornima G2016-03-091-26/+10
| | | | | | | | | | | | | | | | | Backport @ http://review.gluster.org/#/c/13626/3 Fix a typo error, consolidate the selinux and capability check in getxattr and setxattr. Change-Id: I4303de3d4dd00853169b07577311e03cbb912ed7 BUG: 1316327 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/13653 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Vijay Bellur <vbellur@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* fuse: Add a new mount option capabilityPoornima G2016-03-071-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | Originally all security.* xattrs were forbidden if selinux is disabled, which was causing Samba's acl_xattr module to not work, as it would store the NTACL in security.NTACL. To fix this http://review.gluster.org/#/c/12826/ was sent, which forbid only security.selinux. This opened up a getxattr call on security.capability before every write fop and others. Capabilities can be used without selinux, hence if selinux is disabled, security.capability cannot be forbidden. Hence adding a new mount option called capability. Only when "--capability" or "--selinux" mount option is used, security.capability is sent to the brick, else it is forbidden. Change-Id: I77f60e0fb541deaa416159e45c78dd2ae653105e BUG: 1309462 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/13540 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* fuse: add support for SEEK_HOLE and SEEK_DATA through lseek()Niels de Vos2016-02-101-0/+64
| | | | | | | | | | | | | | | | | | | | The Linux FUSE kernel module has gained support for passing SEEK_HOLE and SEEK_DATA on through lseek(). This can greatly improve performance when working with sparse files. Linux FUSE introduced support for lseek() with version 4.5. The commit in mainline Linux is 0b5da8db145bfd44266ac964a2636a0cf8d7c286. URL: http://thread.gmane.org/gmane.comp.file-systems.fuse.devel/14752 Change-Id: I12496d788e59461a3023ddd30e0ea3179007f77e BUG: 1220173 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/11474 Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* Fuse: Add a check for NULL in fuse_itable_dumpAshish Pandey2016-02-081-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Immediately after starting a disperse volume (2+1) kill one brick and just after that try to mount it through fuse. This lead to crash. Our test scripts use process statedumps to determine various things like whether they are up, connected to bricks etc. It takes some time for an active_subvol to be be associated with fuse even after mount process is daemonized. This time is normally a function of completion of handshake with bricks. So, if we try to take statedump in this time window, fuse wouldn't have an active_subvol associated with it leading to this crash. This happened while executing ec-notify.t, which contains above steps. Solution: Check priv and priv->active_subvol for NULL before inode_table_dump. If priv->active_subvol is null its perfectly fine to skip dumping of inode table as inode table is associated with an active_subvol. A Null active_subvol indicates initialization in progress and fuse wouldn't even have started reading requests from /dev/fuse and hence there wouldn't be any inodes or file system activity. Change-Id: I323a154789edf8182dbd1ac5ec7ae07bf59b2060 BUG: 1299410 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/13253 Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* fuse: update fuse_kernel.h to version 23Ravishankar N2016-02-061-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following changes were made upstream: - add FUSE_WRITEBACK_CACHE - add time_gran to fuse_init_out - add reserved space to fuse_init_out - add FATTR_CTIME - add ctime and ctimensec to fuse_setattr_in - add FUSE_RENAME2 request - add FUSE_NO_OPEN_SUPPORT flag Including these changes will make it easier to backport support for lseek(). Because the fuse_init_out structure changed its size, older versions of FUSE would fail initializing. When an older version of FUSE is detected, the fuse_init_out structure is reduced to the previous size. This is harmless, as the attributes that are not passed, are not used for earlier versions anyway. BUG: 1220173 Change-Id: I58c74e161638b2d4ce12fc91a206fdc1b96de14d Signed-off-by: Ravishankar N <ravishankar@redhat.com> [ndevos: splitted from http://review.gluster.org/11474 old version fuse_init_out size correction] Reviewed-on: http://review.gluster.org/11537 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* fuse: fix inode and dentry leaksXavier Hernandez2016-02-031-19/+27
| | | | | | | | | | | | | | | | | | When a readdirp was executed, the nlookup count for each inode of the returned entries was incremented. However the kernel does not increment the counter for '.' and '..' entries. This caused kernel to send forgets with a counter smaller than the inode's current value. This prevented these inodes to be retired when ref count was 0. Change-Id: I31901af36ab7b4cdc3e6fa2f30a0263a1a2daef8 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/13327 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* fuse: use-after-free fix in fuse-bridge, revisitedKaleb S KEITHLEY2016-02-021-14/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prompted by the email exchange in gluster-devel between Oleksandr Natalenko, xavi, and soumyak, I looked at this because the fuse client on the longevity cluster has also been suffering from a serious memory leak for some time. (longevity cluster is currently running 3.7.6) The longevity cluster manifests the same kernel notifier loop terminated log message the Oleksandr sees, and some sample runs suggest that the length passed to the (sys_)write call is unexpectedly and abnormally large. Basically this fix a) uses correct types for len and rv, b) copies the len from potentially incorrectly aligned memory (in a way that should minimize potential performance issues related to accessing unaligned memory.) c) changes log level of the kernel notifier loop terminated message d) fixes a potential mutex lock/unlock issue Change-Id: Icedb3525706f59803878bb37ef6b4ffe4a986880 BUG: 1288857 Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/13274 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* fuse: send lookup if inode_ctx is not setMohammed Rafi KC2016-01-131-4/+13
| | | | | | | | | | | | | | | During resolving of an entry or inode, if inode ctx was not set, we will send a lookup. This patch also make sure that inode_ctx will be created after every inode_link Change-Id: I4211533ca96a51b89d9f010fc57133470e52dc11 BUG: 1297311 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/13225 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* performance/write-behind: retry "failed syncs to backend"Raghavendra G2015-12-221-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. When sync fails, the cached-write is still preserved unless there is a flush/fsync waiting on it. 2. When a sync fails and there is a flush/fsync waiting on the cached-write, the cache is thrown away and no further retries will be made. In other words flush/fsync act as barriers for all the previous writes. The behaviour of fsync acting as a barrier is controlled by an option (see below for details). All previous writes are either successfully synced to backend or forgotten in case of an error. Without such barrier fop (especially flush which is issued prior to a close), we end up retrying for ever even after fd is closed. 3. If a fop is waiting on cached-write and syncing to backend fails, the waiting fop is failed. 4. sync failures when no fop is waiting are ignored and are not propagated to application. For eg., a. first attempt of sync of a cached-write w1 fails b. second attempt of sync of w1 succeeds If there are no fops dependent on w1 are issued b/w a and b, application won't know about failure encountered in a. 5. The effect of repeated sync failures is that, there will be no cache for future writes and they cannot be written behind. fsync as a barrier and resync of cached writes post fsync failure: ================================================================== Whether to keep retrying failed syncs post fsync is controlled by an option "resync-failed-syncs-after-fsync". By default, this option is set to "off". If sync of "cached-writes issued before fsync" (to backend) fails, this option configures whether to retry syncing them after fsync or forget them. If set to on, cached-writes are retried till a "flush" fop (or a successful sync) on sync failures. fsync itself is failed irrespective of the value of this option, when there is a sync failure of any cached-writes issued before fsync. Change-Id: I6097c9257bfb9ee5b15616fbe6a0576ae9af369a Signed-off-by: Raghavendra G <rgowdapp@redhat.com> BUG: 1279730 Reviewed-on: http://review.gluster.org/12594
* fuse: forbid only access to security.selinux xattr if not mounted with 'selinux'Michael Adam2015-12-101-2/+2
| | | | | | | | | | | | | | | | | Originally, all selinux.* xattrs were forbidden, causing for example Samba's acl_xattr module which uses security.NTACL to fail without the 'selinux' mount option, which is confusing at least. This change specializes the check to the security.selinux attribute, so other selinux.* attributes work with or without the option. Change-Id: I9d3083123efbf403f20572cfb325a300ce2e90d9 BUG: 1283103 Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-on: http://review.gluster.org/12826 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* mount/fuse: Fix use-after-free crashPranith Kumar K2015-12-061-3/+6
| | | | | | | | | | | | fouh->len is accessed after 'node' is freed. Also 'rv' is int where as fouh->len is uint32, changed comparison to ssize_t variables. BUG: 1288857 Change-Id: Ied43d29e1e52719f9b52fe839cee31ce65711eea Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/12886 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* geo-rep: Don't log geo-rep safe errors in mount logsKotresh HR2015-11-181-6/+8
| | | | | | | | | | | | | | | | | | | | | ENOENT is a safe error for geo-replication in case of rm -rf. RMDIR is recorded in changelog of each brick, geo-rep processes all changelogs among which one will succeed and rest will get ENOENT which can be ignored. Similarly ENOENT can also be ignored in case of all unlink operation during changelog replay that can happen when worker goes down and comes back. Change-Id: I6756f8f4c3fce7a159751a2bfce891ff16ad31a4 BUG: 1250009 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/11833 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Aravinda VK <avishwan@redhat.com> Reviewed-by: Milind Changire <mchangir@redhat.com> Reviewed-by: Saravanakumar Arumugam <sarumuga@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* Revert "fuse: resolve complete path after a graph switch"Mohammed Rafi KC2015-11-081-41/+8
| | | | | | | | | | | | | | This reverts commit d0edb6d555d687f76837515207b9408be0bdd55e. The same functionality will be provided in a different patch Change-Id: I3139478b218fa32e803bb088df585fbbdf94af34 BUG: 1272949 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/12375 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* fuse: Avoid redundant lookup on "." and ".."Susant Palai2015-10-291-1/+3
| | | | | | | | | | credit: R. Gowdappa Change-Id: I3bc1534e499f2eccd114db69a29c0b2ce82775db BUG: 1273315 Reviewed-on: http://review.gluster.org/12374 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* core: use syscall wrappers instead of direct syscalls - miscellaneousKaleb S. KEITHLEY2015-10-281-18/+19
| | | | | | | | | | | | | | | various xlators and other components are invoking system calls directly instead of using the libglusterfs/syscall.[ch] wrappers. If not using the system call wrappers there should be a comment in the source explaining why the wrapper isn't used. Change-Id: I1f47820534c890a00b452fa61f7438eb2b3f667c BUG: 1267967 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/12276 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* mount/fuse: use a queue instead of pipe to communicate with threadRaghavendra G2015-10-261-56/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | doing inode/entry invalidations. Writing to pipe can block if pipe is full. This can lead to deadlocks in some situations. Consider following situation: 1. Kernel sends a write on an inode. Client is waiting for a response to write from brick. 2. A lookup happens on behalf of different application/thread on the same inode. In response, mdc tries to invalidate the inode. 3. fuse_invalidate_inode is called. It writes a invalidation request to pipe. Another thread which reads from this pipe writes the request to /dev/fuse. The invalidate code in fuse-kernel-module, tries to acquire lock on all pages for the inode and is blocked as a write is in progress on same inode (step 1) 4. Now, poller thread is blocked in invalidate notification and cannot receive any messages from same socket (on which lookup response came). But client is expecting a response for write from same socket (again step1) and we've a deadlock. The deadlock can be solved in two ways: 1. Use a queue (and a conditional variable for notifications) to pass invalidation requests from poller to invalidate thread. This is a variant of using non-blocking pipe, but doesn't have any limit on the amount of data (worst case we run out of memory and error out). 2. Allow events from sockets, immediately after we read one rpc-msg. Currently we disallow events till that rpc-msg is read from socket, processed and handled by higher layers. That way we won't run into these kind of issues. Also, it'll increase parallelism in way of reading from sockets. This patch implements solution 1 above. Change-Id: I8e8199fd7f4da9eab46a719d9292f35c039967e1 BUG: 1273387 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/12402
* fuse: resolve complete path after a graph switchMohammed Rafi KC2015-10-081-8/+41
| | | | | | | | | | | | | | | | | | If a graph switch has happended as part of a attach-tier, then there is a chance to hash fops to newly added brick before fix-layout. This causes on going i/o to fail. This patch will resolve a path, for graph switch by sending recursive lookup to the parent directories. Those lookups will help to heal the directory. Change-Id: Ia2bb4b43a21e5cc6875ba1205628744c3f0ce4e5 BUG: 1263549 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/12184 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* all: reduce "inline" usageJeff Darcy2015-09-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | There are three kinds of inline functions: plain inline, extern inline, and static inline. All three have been removed from .c files, except those in "contrib" which aren't our problem. Inlines in .h files, which are overwhelmingly "static inline" already, have generally been left alone. Over time we should be able to "lower" these into .c files, but that has to be done in a case-by-case fashion requiring more manual effort. This part was easy to do automatically without (as far as I can tell) any ill effect. In the process, several pieces of dead code were flagged by the compiler, and were removed. Change-Id: I56a5e614735c9e0a6ee420dab949eac22e25c155 BUG: 1245331 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/11769 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* mount/fuse: Log ENODATA as DEBUG in {f}removexattrVijay Bellur2015-08-281-2/+43
| | | | | | | | | | | | | | | | | Logging ENODATA errors for {f}removexattr at a higher loglevel does not add a lot of value and causes a log message flood as per multiple reports. Added a new cbk, fuse_removexattr_cbk() to be used with removexattr fops. ENODATA now gets logged at loglevel DEBUG in fuse_removexattr_cbk(). This also prevents more conditional checks in the common fuse_err_cbk() callback. Change-Id: I1585b4d627e0095022016c47d7fd212018a7194b BUG: 1257110 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/12015 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* fuse: add "resolve-gids" mount option to overcome 32-groups limitNiels de Vos2015-08-051-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a --resolve-gids commandline option to the glusterfs binary. This option gets set when executing "mount -t glusterfs -o resolve-gids ...". This option is most useful in combination with the "acl" mount option. POSIX ACL permission checking is done on the FUSE-client side to improve performance (in addition to the checking on the bricks). The fuse-bridge reads /proc/$PID/status by default, and this file contains maximum 32 groups. Any local (client-side) permission checking that requires more than the first 32 groups will fail. By enabling the "resolve-gids" option, the fuse-bridge will call getgrouplist() to retrieve all the groups from the user accessing the mountpoint. This is comparable to how "nfs.server-aux-gids" works. Note that when a user belongs to more than ~93 groups, the volume option server.manage-gids needs to be enabled too. Without this option, the RPC-layer will need to reduce the number of groups to make them fit in the RPC-header. Change-Id: I7ede90d0e41bcf55755cced5747fa0fb1699edb2 BUG: 1246275 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/11732 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* libgfapi: send explicit lookups on inodes linked in readdirpRaghavendra Bhat2015-07-021-34/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | If the inode is linked via readdirp, then the consuners of gfapi which are using handles (got either in lookup or readdirp) might not send an explicit lookup on that object again (ex: NFS, samba, USS). If there is a replicate volume where the replicas of the object are not in sync, then readdirp followed by fops might lead data being served from the subvolume which is not in sync with latest data. And since lookup is needed to trigger self-heal on that object the consumers might keep getting wrong data until an explicit lookup is not done. Fuse handles this situation by sending an explicit lookup by itself (fuse xlator) on those inodes which are linked via readdirp, whenever a fop comes on that inode. The same procedure is done in gfapi as well to address this situation. Thanks to shyam(srangana@redhat.com) for valuable inputs Change-Id: I64f0591495dddc1dea7f8dc319f2558a7e342871 BUG: 1236009 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/11236 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* fuse: squash 64-bit inodes in readdirp when enable-ino32 is setNiels de Vos2015-05-281-1/+6
| | | | | | | | | | | | | | | The structures returned by readdirp contain the inode 2x. Only one of them was squashed into 32-bits when enable-ino32 is enabled. Change-Id: I33a6d28fb118bb23971f918ffeb983d7f033106e BUG: 1223889 Signed-off-by: Niels de Vos <ndevos@redhat.com> Tested-by: Cyril Peponnet <cyril@peponnet.fr> [on release-3.5] Reviewed-on: http://review.gluster.org/10881 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>