glusterfs.git -

	Commit message (Collapse)	Author	Age	Files	Lines
*	glusterfsd: Brick is getting crash at the time of startup	Mohit Agrawal	2019-03-13	1	-5/+5
	Problem: The brick crashes because the graph was not activated at the time server_conf was accessed. Solution: To avoid the crash, check ctx->active before processing a request. > Change-Id: Ib112e0eace19189e45f430abdac5511c026bed47 > fixes: bz#1687705 > (cherry picked from commit 67f48bfcc16a38052e6c9ae7c25e69b03b8ae008) > (Reviewed on upstream link https://review.gluster.org/#/c/glusterfs/+/22339/) Change-Id: I1367c564f04edbad145575b811c67522cc318851 fixes: bz#1688218 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	rpm: add thin-arbiter package [tag: v6.0rc1]	Ashish Pandey	2019-03-13	1	-2/+0
	Discussion on thin arbiter volume: https://github.com/gluster/glusterfs/issues/352#issuecomment-350981148 The main idea of this rpm package is to deploy thin-arbiter without glusterd and other commands on a node; all we need on that tie-breaker node is to run a single glusterfs command. Also note that no other glusterfs installation needs thin-arbiter.so. Make sure the RPM contains a sample vol file, which can work by default, and a script to configure that volfile, along with the translator image. Change-Id: Ibace758373d8a991b6a19b2ecc60c93b2f8fc489 updates: bz#1672818 Signed-off-by: Amar Tumballi <amarts@redhat.com> Signed-off-by: Ashish Pandey <aspandey@redhat.com> (cherry picked from commit ca9bef7f1538beb570fcb190ff94f86f0b8ba38a)
*	cluster/afr: Send truncate on arbiter brick from SHD	karthik-us	2019-03-12	1	-15/+13
	Problem: In an arbiter volume configuration, SHD will not send any writes to the arbiter brick even if there is a data pending marker for the arbiter brick. If we have an arbiter setup on the geo-rep master and there are data pending markers for the files on the arbiter brick, SHD will not mark any data changelog during healing. While syncing the data from master to slave, if the arbiter brick is considered ACTIVE, there is a chance that the slave will miss some data. If the arbiter brick is being newly added or replaced, there is a chance of the slave missing all the data during sync. Fix: If there is a data pending marker for the arbiter brick, send truncate on the arbiter brick during heal, so that it will record truncate as the data transaction in the changelog. Change-Id: I3242ba6cea6da495c418ef860d9c3359c5459dec fixes: bz#1687672 Signed-off-by: karthik-us <ksubrahm@redhat.com>
*	glusterd: glusterd memory leak while running "gluster v profile" in a loop	Mohit Agrawal	2019-03-12	2	-3/+6
	Problem: glusterd leaks memory while running "gluster v profile" in a loop. Solution: Fix the leaking code path. > Change-Id: Id608703ff6d0ad34ed8f921a5d25544e24cfadcd > fixes: bz#1685414 > (Cherry pick from commit 9374484917466dff4688d96ff7faa0de1c804a6c) > (Reviewed on link https://review.gluster.org/#/c/glusterfs/+/22301/) Change-Id: I1ca118265f97b188f94b3d5cff649ec36cb18ca0 fixes: bz#1685771 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	core: make compute_cksum function op_version compatible	Sanju Rakonde	2019-03-08	1	-4/+8
	Problem: commit 5a152a changed the mechanism of computing the checksum. In a heterogeneous cluster, peers are running into rejected state because upgraded and non-upgraded nodes use different cksum computation mechanisms. Solution: add a check for op-version so that all the nodes in the cluster follow the same mechanism for computing the cksum. fixes: bz#1684029 > Change-Id: I1508f000e8c9895588b6011b8b6cc0eda7102193 > BUG: bz#1685120 > Signed-off-by: Sanju Rakonde <srakonde@redhat.com> > (cherry picked from commit 073444b693b7a91c42963512e0fdafb57ad46670) Change-Id: I1508f000e8c9895588b6011b8b6cc0eda7102193
*	performance/readdir-ahead: fix deadlock	Raghavendra Gowdappa	2019-03-08	1	-1/+2
	This deadlock happens while processing the dentry corresponding to the current directory (.) in rda_fill_readdirp. In this case the following lock order is followed: LOCK(directory_fd_ctx->lock); rda_inode_ctx_get_iatt -> LOCK(directory_inode->lock); However, in rda_mark_inode_dirty the following lock order is followed: LOCK(directory_inode->lock); LOCK(directory_fd_ctx->lock); These two codepaths, when executed concurrently, resulted in a deadlock. The current patch fixes this by removing the locking of the directory inode and fd-ctx in rda_fill_readdirp. This is fine as the directory inode's stat won't change due to writes to files within the directory. Change-Id: Ic93a67a0dac8229bb0d79582e526a512e6f2569c Signed-off-by: Raghavendra Gowdappa <rgowdapp@redhat.com> Fixes: bz#1686399
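
The lock-order inversion described in the readdir-ahead commit above can be illustrated with a minimal sketch. This is not the actual patch: `struct dir_state`, `mark_dirty` and `fill_dot_entry` are hypothetical names, and plain pthread mutexes stand in for the inode and fd-ctx locks. The point is only that the "." path no longer takes the inode lock while holding the fd-ctx lock, so the two paths can never wait on each other in opposite orders.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the directory inode and its fd context. */
struct dir_state {
    pthread_mutex_t inode_lock;  /* plays the role of directory_inode->lock */
    pthread_mutex_t fd_ctx_lock; /* plays the role of directory_fd_ctx->lock */
    int dirty;
    int dot_entries;
};

void dir_state_init(struct dir_state *d)
{
    assert(d != NULL);
    pthread_mutex_init(&d->inode_lock, NULL);
    pthread_mutex_init(&d->fd_ctx_lock, NULL);
    d->dirty = 0;
    d->dot_entries = 0;
}

/* Path A: inode lock first, then fd-ctx lock (the order used when
 * marking the inode dirty). */
void mark_dirty(struct dir_state *d)
{
    pthread_mutex_lock(&d->inode_lock);
    pthread_mutex_lock(&d->fd_ctx_lock);
    d->dirty = 1;
    pthread_mutex_unlock(&d->fd_ctx_lock);
    pthread_mutex_unlock(&d->inode_lock);
}

/* Path B: handling of the "." entry. It never acquires inode_lock while
 * holding fd_ctx_lock, so it cannot form a cycle with path A. */
void fill_dot_entry(struct dir_state *d)
{
    pthread_mutex_lock(&d->fd_ctx_lock);
    d->dot_entries++;
    pthread_mutex_unlock(&d->fd_ctx_lock);
}
```

A deadlock needs two threads that each hold one lock and wait for the other; removing the nested acquisition from one path breaks that cycle.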
*	io-threads: Prioritize fops with NO_ROOT_SQUASH pid	Susant Palai	2019-03-06	1	-1/+3
	There was a 30% regression observed in the mkdir path with commit b139bc58eb504adf5ef81658896c9283ae21f390. On analysis it was found that the io-threads xlator deprioritizes fops with any negative pid. Some context on the no-root-squash pid requirement: the DHT xlator does some of its internal fops with root privileges. This is needed so that operations like layout healing are not abandoned because a non-root user is operating. If the root-squash option is enabled, the layout set operation loses its root privilege as the server xlator converts the uid and pid to random numbers. Hence, the above mentioned commit converted the pid to GF_CLIENT_PID_NO_ROOT_SQUASH to continue fops as root. Combining the above, I am proposing not to deprioritize fops with the no-root-squash pid. > Change-Id: I54d056c01b25729304a77f9242fbaff39c5672ba > fixes: bz#1676430 > Signed-off-by: Susant Palai <spalai@redhat.com> (cherry picked from commit f5c3b1727f55ffaa3dcdb3c3a09b968ebb45dbb2) Change-Id: I54d056c01b25729304a77f9242fbaff39c5672ba fixes: bz#1676429 Signed-off-by: Susant Palai <spalai@redhat.com>
*	glusterd: remove experimental xlator options from glusterd-volume-set.c	Sanju Rakonde	2019-02-27	1	-20/+0
	Experimental xlators have been removed from the codebase, but the options related to them were not removed. This patch removes those options. fixes: bz#1683506 Change-Id: I3fa7e14c6cd8ebde5cebc8d2b0cb2409bf37c1ae Signed-off-by: Sanju Rakonde <srakonde@redhat.com> (cherry picked from commit 5cddd4d758014fe116d9c130632eada2ecded88c)
*	md-cache: Adapt integer data types to avoid integer overflow	David Spisla	2019-02-25	1	-3/+3
	The "struct iatt" in iatt.h uses int64_t types for storing the atime, mtime and ctime. Therefore 'struct md_cache' in md-cache.c should also use these types to avoid an integer overflow. This can happen e.g. if someone uses a very high default-retention-period in the WORM-Xlator. Change-Id: I605268d300ab622b9c8ab30e459dc00d9340aad1 fixes: bz#1680020 Signed-off-by: David Spisla <david.spisla@iternity.com> (cherry picked from commit 15423e14f16dd1a15ee5e5cbbdbdd370e57ed59f)
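
The overflow mentioned in the md-cache commit above can be reproduced in isolation. The sketch below is an assumption-laden illustration: the commit only says the cache should use int64_t like struct iatt, so the 32-bit field and the function names `store_narrow` and `store_wide` are hypothetical. It shows that a timestamp beyond the 32-bit range does not survive a narrow field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical narrow cache field: the value is reduced modulo 2^32,
 * so large timestamps are silently changed. */
uint32_t store_narrow(int64_t t)
{
    assert(t >= 0); /* this sketch only considers non-negative timestamps */
    return (uint32_t)t;
}

/* Cache field with the same width as the int64_t source: lossless. */
int64_t store_wide(int64_t t)
{
    return t;
}
```

For example, a time value of 2^32 + 5 comes back as 5 from the narrow field, while the wide field returns it unchanged.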
*	cluster/dht: Request linkto xattrs in dht_rmdir opendir	N Balachandran	2019-02-22	1	-1/+26
	If parallel-readdir is enabled, the rda xlator is loaded below dht in the graph and proactively lists and caches entries when an opendir is performed. dht_rmdir checks if the directory being deleted contains stale linkto files by performing a readdirp on its child subvols. However, as the entries are actually read in during the opendir operation, which does not request the linkto xattr, no linkto xattrs are present for the entries, causing dht to incorrectly identify them as data files and fail the rmdir operation with ENOTEMPTY. DHT now always adds the linkto xattr in the list of xattrs requested in the opendir. Change-Id: I0711198e66c59146282eb8b88084170bedfb4018 fixes: bz#1679004 Signed-off-by: N Balachandran <nbalacha@redhat.com>
*	dht: fix double extra unref of inode at heal path	Kinglong Mee	2019-02-22	1	-1/+1
	The loc_wipe is done in the _out_ section; inode_unref(loc.parent) here causes a double extra unref of loc.parent. > Change-Id: I2dc809328d3d34bf7b02c7df9a4f97788af511e6 > Signed-off-by: Kinglong Mee <mijinlong@open-fs.com> (cherry-pick of https://review.gluster.org/#/c/glusterfs/+/21998/) Change-Id: I2dc809328d3d34bf7b02c7df9a4f97788af511e6 updates: bz#1679275 Signed-off-by: Susant Palai <spalai@redhat.com>
*	performance/write-behind: fix use-after-free in readdirp	Raghavendra Gowdappa	2019-02-22	1	-18/+22
	Two issues were found: 1. In wb_readdirp_cbk, the inode should be unrefed after wb_inode is unlocked. Otherwise, the inode, and hence the context wb_inode, can be freed by the time we try to unlock wb_inode. 2. wb_readdirp_mark_end iterates over a list of wb_inodes of children of a directory. But the inodes could have been freed and hence the list might be corrupted. To fix this, take a reference on the inode before adding it to the invalidate_list of the parent. Change-Id: I911b0e0b2060f7f41ded0b05db11af6f9b7c09c5 Signed-off-by: Raghavendra Gowdappa <rgowdapp@redhat.com> Updates: bz#1678570 (cherry picked from commit 64cc4458918e8c8bfdeb114da0a6501b2b98491a)
*	performance/write-behind: handle call-stub leaks	Raghavendra Gowdappa	2019-02-19	1	-0/+8
	Change-Id: I7be9a5f48dcad1b136c479c58b1dca1e0488166d Signed-off-by: Raghavendra Gowdappa <rgowdapp@redhat.com> Fixes: bz#1678570 (cherry picked from commit 6175cb10cd5f59f3c7ae4100bc78f359b68ca3e9)
*	cluster/dht: Fix lookup selfheal and rmdir race	N Balachandran	2019-02-18	1	-9/+25
	A race between the lookup selfheal and rmdir can cause directories to be healed only on non-hashed subvols. This can prevent the directory from being listed from the mount point and in turn causes rm -rf to fail with ENOTEMPTY. Fix: Update the layout information correctly and reduce the call count only after processing the response. Change-Id: I812779aaf3d7bcf24aab1cb158cb6ed50d212451 fixes: bz#1677260 Signed-off-by: N Balachandran <nbalacha@redhat.com>
*	performance/md-cache: introduce an option to control invalidation of inodes	Raghavendra Gowdappa	2019-02-18	2	-10/+50
	Explicit invalidation by calling inode_invalidate is necessary when the same (meta)data is shared/accessed across multiple mounts. Without an explicit inode_invalidate call, caches in the mount which didn't witness writes wouldn't be aware of changes, as writes wouldn't have passed through them. However, if (meta)data is not shared, all relevant I/O goes through the cache of a single mount and hence is always coherent with (meta)data on bricks. So, explicit inode invalidation can be disabled for this case, which gives a huge performance boost for workloads that write data and then immediately read the data they just wrote. Note that otherwise, local writes (which pass through the cache) will change ctime and cause unnecessary invalidations. The name of the option that controls this behavior is "performance.global-cache-invalidation". This option is global and it purges caches both in glusterfs and the kernel stack for native FUSE mounts. For non-native FUSE mounts, it purges the cache only from the glusterfs stack. This option is effective only when performance.stat-prefetch is on. Note that there is a similar option "performance.cache-invalidation", but the scope of that option is limited to quick-read and md-cache. Change-Id: I462bb4b65ff9aae1f6ba76f50b1f2f94fb10323b Signed-off-by: Raghavendra Gowdappa <rgowdapp@redhat.com> updates: bz#1674364 (cherry picked from commit 2b5aa4489de2017a03bcb6ec8986286f0c76a670)
*	mount/fuse: fix bug related to --auto-invalidation in mount script	Raghavendra Gowdappa	2019-02-18	1	-1/+1
	When the "auto-invalidation" option was not specified for the mount script, the glusterfs cmdline ended with an empty "--auto-invalidation=" option. This patch fixes that bug in the mount script. Thanks to Amar for reporting it. Change-Id: Ie5cd4c6ffb3ac644d9d2b032035f914a935d05a8 Signed-off-by: Raghavendra Gowdappa <rgowdapp@redhat.com> updates: bz#1674364
*	glusterd: improve logging	Atin Mukherjee	2019-02-11	1	-1/+3
	The glusterd_resolve_all_bricks failure log should highlight the brick identifier. Change-Id: I035b4650ef6a14bb1e1221d3bad1c40f9d43dbdd fixes: bz#1673972 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> (cherry picked from commit 12af2067a82e37079e76723d3e25ba1c72ca078a)
*	glusterd: get-state command should not fail if any brick is gone bad [tag: v7dev]	Sanju Rakonde	2019-02-05	2	-5/+7
	Problem: The get-state command will error out if any of the underlying brick(s) of volume(s) in the cluster go bad. It is expected that the get-state command should not error out, but should generate an output successfully. Solution: In glusterd_get_state(), a statfs call is made on the brick path for every brick of the volumes to calculate the total and free memory available. If the statfs call fails on any brick, we should not error out and should report the total memory and free memory of that brick as 0. This patch also handles a statfs failure scenario in glusterd_store_retrieve_bricks(). fixes: bz#1672205 Change-Id: Ia9e8a1d8843b65949d72fd6809bd21d39b31ad83 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
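
The "report 0 instead of failing" behaviour described in the get-state commit above could look roughly like the sketch below. It is a hedged illustration, not glusterd code: `brick_space` is a hypothetical helper, and it uses the POSIX statvfs call rather than the statfs call the commit mentions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/statvfs.h>

/* Hypothetical helper: fill in total and free bytes for a brick path.
 * If the filesystem query fails (for example, the brick is gone), both
 * values are reported as 0 so the caller can still produce output. */
int brick_space(const char *path, uint64_t *total, uint64_t *avail)
{
    struct statvfs st;

    assert(path != NULL && total != NULL && avail != NULL);

    if (statvfs(path, &st) != 0) {
        *total = 0;
        *avail = 0;
        return -1; /* informational only; the caller need not abort */
    }

    *total = (uint64_t)st.f_blocks * st.f_frsize;
    *avail = (uint64_t)st.f_bfree * st.f_frsize;
    return 0;
}
```

A caller that iterates over all bricks can ignore the return value and print the zeros for the failed brick, which keeps one bad brick from failing the whole report.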
*	glusterd: manage upgrade to current master	Amar Tumballi	2019-02-04	2	-12/+13
	Scenarios tested: * Upgrade the node when stripe / tiering and regular types of volumes are present. - All volumes are started fine (as the change was not on brick volfile) - For tier, the functionality may not even work, as changetimerecorder is not present. - 'gluster volume info' properly shows 'NOT SUPPORTED' for stripe and tier types of volume. * Upgrade in a rolling upgrade scenario, where an old version is able to connect to higher master. - on a normal volume, if the volfile-server was new, the newer client volfiles needed to have utime xlator conditionally. - with this one change, all other changes seem to work fine. Change-Id: Ib2d3b69dafa02b2c695a735b13c1aa70aba07cb8 updates: bz#1635688 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	cluster/dht: Do not use gfid-req in fresh lookup	N Balachandran	2019-02-02	2	-8/+60
	Fuse sets a random gfid-req value for a fresh lookup. Posix lookup will set this gfid on entries with missing gfids, causing a GFID mismatch for directories. DHT will now ignore the Fuse provided gfid-req and use the GFID returned from other subvols to heal the missing gfid. Change-Id: I5f541978808f246ba4542564251e341ec490db14 fixes: bz#1670259 Signed-off-by: N Balachandran <nbalacha@redhat.com>
*	mount/fuse: expose auto-invalidation as a mount option	Raghavendra Gowdappa	2019-02-02	3	-10/+33
	Auto invalidation is necessary when the same (meta)data is shared/accessed across multiple mounts. However, if (meta)data is not shared, all relevant I/O goes through the cache of a single mount and hence is always coherent with (meta)data on bricks. So, fuse-auto-invalidation can be disabled for this case, which gives a huge performance boost for workloads that write data and then immediately read the data they just wrote. From glusterfs --help, <snip> --auto-invalidation[=BOOL] controls whether fuse-kernel can auto-invalidate attribute, dentry and page-cache. Disable this only if same files/directories are not accessed across two different mounts concurrently [default: "on"] </snip> Details on how disabling auto-invalidation helped to reduce pgbench init times can be found at [1]. Time taken for pgbench init of scale 8000 was 8340s. That is an improvement of 86% (59280s vs 8340s) with auto-invalidations turned off along with other optimizations. Just disabling auto-invalidation contributed a 56% improvement by reducing the total time taken by 33260s. [1] https://www.spinics.net/lists/gluster-devel/msg25907.html Change-Id: I0ed730dba9064bd9c576ad1800170a21e100e1ce Signed-off-by: Raghavendra Gowdappa <rgowdapp@redhat.com> updates: bz#1664934
*	core: make gf_thread_create() easier to use	Xavi Hernandez	2019-02-01	4	-23/+5
	This patch creates a specific function to set the thread name using a string format and a variable argument list, like printf(). This function is used to set the thread name from gf_thread_create(), which now accepts a variable argument list to create the full name. It's not necessary anymore to use a local array to build the name of the thread. This is done automatically. Change-Id: Idd8d01fd462c227359b96e98699f8c6d962dc17c Updates: bz#1193929 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
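
The printf-style naming described in the gf_thread_create() commit above rests on a standard C pattern: forward a format string and a va_list to vsnprintf. The sketch below shows only that pattern; `format_thread_name` is a hypothetical name and is not the function the patch adds.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: build a thread name from a printf-style format.
 * Returns 0 on success and -1 if formatting fails or the name does not
 * fit in the caller's buffer. */
int format_thread_name(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    int len;

    assert(buf != NULL && fmt != NULL);

    va_start(args, fmt);
    len = vsnprintf(buf, size, fmt, args);
    va_end(args);

    if (len < 0 || (size_t)len >= size)
        return -1;
    return 0;
}
```

With such a helper a caller writes `format_thread_name(name, sizeof(name), "worker%d", i)` directly, instead of first building the name in a local array with snprintf.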
*	cluster/thin-arbiter: Consider thin-arbiter before marking new entry changelog	Ashish Pandey	2019-02-01	4	-19/+87
	If a fop to create an entry fails on one of the data bricks, we mark the pending changelog on the entry on the brick for which it was successful. This is done as part of the post-op phase to make sure that the entry gets healed even if it gets renamed to some other path where its parent was not marked as bad. As it happens as part of post-op, we should consider thin-arbiter to check whether the brick which was successful is the good brick or not. This will avoid split brain and other issues. Change-Id: I12686675be98f02f70a5186b3ed748c541514d53 updates: bz#1662264 Signed-off-by: Ashish Pandey <aspandey@redhat.com>
*	cluster/dht: Remove internal permission bits	N Balachandran	2019-02-01	1	-0/+6
	Rebalance sets the sgid and t bits on a file that is being migrated. These permissions are not removed in dht_readdirp_cbk when listing files, causing them to show up on the mountpoint. We now remove these permissions if a non-linkto file has the linkto xattr set. Change-Id: I5c69b2ecfe2df804fe50faea903b242d01729596 fixes: bz#1669937 Signed-off-by: N Balachandran <nbalacha@redhat.com>
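
The permission clean-up described in the dht commit above amounts to masking two mode bits under one condition. The sketch below is a simplified illustration under stated assumptions: `struct listed_entry` and `strip_internal_bits` are hypothetical, and the two flags stand in for whatever checks DHT actually performs on the entry.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical view of a directory entry as seen while listing. */
struct listed_entry {
    mode_t mode;
    int is_linkto_file;   /* entry is itself a linkto file */
    int has_linkto_xattr; /* entry carries the linkto xattr */
};

/* Clear the sgid and sticky ("t") bits that migration set internally,
 * but only for a non-linkto file that has the linkto xattr. */
void strip_internal_bits(struct listed_entry *e)
{
    assert(e != NULL);
    if (!e->is_linkto_file && e->has_linkto_xattr)
        e->mode &= ~(mode_t)(S_ISGID | S_ISVTX);
}
```

Entries that do not meet the condition keep their mode untouched, so an sgid bit set by a user on an ordinary file is not lost.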
*	feature/bitrot: Avoid thread creation if xlator is not enabled	Mohit Agrawal	2019-01-31	1	-8/+64
	Problem: Threads are created for bitrot-stub for a volume even if the feature is not enabled. Solution: Before thread creation, check the flag to see whether the feature is enabled. Updates: #475 Change-Id: I2c6cc35bba142d4b418cc986ada588e558512c8e Signed-off-by: Mohit Agrawal <moagrawal@redhat.com> Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	core: heketi-cli is throwing error "target is busy"	Mohit Agrawal	2019-01-31	1	-2/+2
	Problem: When rpc-transport-disconnect happens, server_connection_cleanup_flush_cbk() is supposed to call rpc_transport_unref() after open files on that transport are flushed per transport. But open-fd-count is maintained in bound_xl->fd_count, which can be incremented/decremented cumulatively in server_connection_cleanup() by all transport disconnect paths. So instead of rpc_transport_unref() happening per transport, it ends up happening only once after all the files on all the transports for the brick are flushed, leading to rpc leaks. Solution: To avoid races, maintain fd_cnt at the client instead of maintaining it on the brick. Credits: Pranith Kumar Karampuri Change-Id: I6e8ea37a61f82d9aefb227c5b3ab57a7a36850e6 fixes: bz#1668190 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	readdir-ahead: do not zero-out iatt in fop cbk	Ravishankar N	2019-01-31	1	-20/+4
	...when ctime is zero. ia_type and ia_gfid always need to be non-zero for things to work correctly. Problem: Commit c9bde3021202f1d5c5a2d19ac05a510fc1f788ac zeroed out the iatt buffer in the cbks of modification fops before unwinding if the ctime in the buffer was zero. This was causing the fops to fail: noticeable when AFR's 'consistent-metadata' option was enabled. (AFR zeros out the ctime when the option is set. See commit 4c4624c9bad2edf27128cb122c64f15d7d63bbc8). Fixes: - Do not zero out the ia_type and ia_gfid of the iatt buff under any circumstance. - Also, fixed _rda_inode_ctx_update_iatts() to always update these values from the incoming buf when ctime is zero. Otherwise we end up with zero ia_type and ia_gfid the first time the function is called and the incoming buf has ctime set to zero. fixes: bz#1670253 Reported-By: Michael Hanselmann <public@hansmi.ch> Change-Id: Ib72228892d42c3513c19fc6dfb543f2aa3489eca Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	features/sdfs: disable by default	Amar Tumballi	2019-01-29	2	-2/+2
	With the feature enabled, some of the performance testing results, especially those which create millions of small files, showed an approximately 4x regression compared to the version before enabling this. On master without this patch: 765 creates/sec On master with this patch : 3380 creates/sec Also there seems to be a regression caused by this in the 'ls -l' workload. On master without this patch: 3030 files/sec On master with this patch : 16610 files/sec This is a feature added to handle multiple clients operating in parallel (especially those which race for file creates with the same name) on a single namespace/directory. Considering that is < 3% of Gluster's use cases right now, it makes sense to disable the feature by default, so we don't penalize the default users who do not care about this use case. Also note that the client side translators, especially distribute, replicate and disperse, already handle the issue in up to 99.5% of the cases without SDFS, so it makes sense to keep the feature disabled by default. Credits: Shyamsunder <srangana@redhat.com> for running the tests and getting the numbers. Change-Id: Iec49ce1d82e621e9db25eb633fcb1d932e74f4fc Updates: bz#1670031 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	Multiple files: reduce work while under lock.	Yaniv Kaul	2019-01-29	13	-117/+120
	Mostly, unlock before logging. In some cases, moved other code that did not need to be under lock (for example, taking time, or malloc'ing) to be executed before taking the lock. Note: logging might be slightly less accurate in order, since it may not be done under the lock now, so the order of logs is racy. I think it's a reasonable compromise. Compile-tested only! updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I2438710016afc9f4f62a176ef1a0d3ed793b4f89
*	features/shard: Ref shard inode while adding to fsync list	Krutika Dhananjay	2019-01-24	1	-8/+22
	PROBLEM: Many of the earlier changes in the management of shards in the lru and fsync lists assumed that if a given shard exists in the fsync list, it must be part of the lru list as well. This was found to be not true. Consider this: a file is FALLOCATE'd to a size which would make the number of participant shards greater than the lru list size. In this case, some of the resolved shards that are to participate in this fop will be evicted from the lru list to give way to the rest of the shards. And once FALLOCATE completes, these shards are added to the fsync list but without a ref. After the fop completes, these shard inodes are unref'd and destroyed while their inode ctxs are still part of the fsync list. Now when an FSYNC is called on the base file and the fsync list is traversed, the client crashes due to illegal memory access. FIX: Hold a ref on the shard inode when adding to the fsync list as well. And unref under the following conditions: 1. when the shard is evicted from the lru list 2. when the base file is fsync'd 3. when the shards are deleted. Change-Id: Iab460667d091b8388322f59b6cb27ce69299b1b2 fixes: bz#1669077 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
*	afr/self-heal:Fix wrong type checking	Ravishankar N	2019-01-24	2	-32/+9
	The gf_dirent struct has a d_type variable which should be checked against DT_DIR instead of IA_IFDIR; alternatively, IA_IFDIR has to be compared with entry->d_stat.ia_type. Change-Id: Idf1059ce2a590734bc5b6adaad73604d9a708804 updates: bz#1653359 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	core: heketi-cli is throwing error "target is busy"	Mohit Agrawal	2019-01-24	1	-0/+17
	Problem: At the time of deleting a block hosting volume through heketi-cli, it throws the error "target is busy". The cli throws the error because the brick is not detached successfully, and the brick is not detached due to a race condition in cleaning up the xprt associated with the detached brick. Solution: To avoid the xprt specific race condition, introduce an atomic flag on rpc_transport. Change-Id: Id4ff1fe8375a63be71fb3343f455190a1b8bb6d4 fixes: bz#1668190 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	performance/readdir-ahead: Fix deadlock in readdir ahead.	Mohammed Rafi KC	2019-01-23	1	-4/+14
	This patch fixes a lock contention in the readdir-ahead xlator. There are two issues: one is the processing of the "." and ".." entries while holding an fd_ctx lock. The other one is destroying the stack inside an fd_ctx lock. Change-Id: Id0bf83a3d9fea6b40015b8d167525c59c6cfa25e updates: bz#1659708 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	posix: Change data type to dump nr_files to statedump	Mohit Agrawal	2019-01-23	1	-1/+1
	Problem: In commit 2261e444a47ffffb5d64305efceee1d5a734cd75 the wrong data type was used for dumping nr_files to the statedump, so the build is failing on 32-bit environments. Solution: Change the data type to avoid errors. Change-Id: Ifb4b19feda6e0e56d110b23351e7a0efd5bfa29b fixes: bz#1657607 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	rpc: use address-family option from vol file	Milind Changire	2019-01-22	3	-4/+19
	This patch helps enable IPv6 connections in the cluster. The default address-family is IPv4 without using this option explicitly. When address-family is set to "inet6" in the /etc/glusterfs/glusterd.vol file, the mount command-line also needs to have -o xlator-option="transport.address-family=inet6" added to it. This option also gets added to the brick command-line. Snapshot and gfapi use-cases should also use this option to pass in the inet6 address-family. Change-Id: I97db91021af27bacb6d7578e33ea4817f66d7270 fixes: bz#1635863 Signed-off-by: Milind Changire <mchangir@redhat.com>
*	afr: not resolve splitbrains when copies are of same size	Iraj Jamali	2019-01-22	1	-18/+25
	Automatic split-brain resolution with size as the policy must not resolve split-brains when the copies are of the same size. The patch determines whether the sizes of the copies are the same and returns -1 in that case. updates: bz#1655052 Change-Id: I3d8e8b4d7962b070ed16c3ee02a1e5a926fd5eab Signed-off-by: Iraj Jamali <ijamali@redhat.com>
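
The rule in the afr commit above (no automatic resolution when the copies have the same size, signalled by -1) can be sketched as a small selection function. This is an illustration only: `pick_source_by_size` is a hypothetical name, and the assumption that the size policy otherwise prefers the largest copy is mine, not stated in the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the index of the single largest copy, or
 * -1 when the largest size is shared by more than one copy (or when
 * there are no copies), so that no source is chosen arbitrarily. */
int pick_source_by_size(const uint64_t *sizes, int count)
{
    int i, source = -1, ties = 0;
    uint64_t max = 0;

    assert(sizes != NULL);

    for (i = 0; i < count; i++) {
        if (source == -1 || sizes[i] > max) {
            max = sizes[i];
            source = i;
            ties = 0;
        } else if (sizes[i] == max) {
            ties++;
        }
    }
    return ties > 0 ? -1 : source;
}
```

Returning -1 leaves the file in split-brain for the administrator to resolve, which is safer than healing from a copy that was picked at random.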
*	locks/fencing: Add a security knob for fencing	Susant Palai	2019-01-22	3	-9/+39
	There is a low level security issue with fencing since one client can preempt another client's lock. This patch does not completely eliminate the issue of a client misbehaving, but it certainly adds a security layer for default use cases that do not need fencing. Change-Id: I55cd15f2ed1ae0f2556e3d27a2ef4bc10fdada1c updates: #466 Signed-off-by: Susant Palai <spalai@redhat.com>
*	cluster/dht: Delete invalid linkto files in rmdir	N Balachandran	2019-01-22	1	-1/+4
	rm -rf <dir> fails on dirs which contain linkto files that point to themselves because dht incorrectly thought that they were cached files after looking them up. The fix now treats them as invalid linkto files and deletes them. Change-Id: I376c72a5309714ee339c74485e02cfb4e29be643 fixes: bz#1667804 Signed-off-by: N Balachandran <nbalacha@redhat.com>
*	glusterd: Avoid dict_leak in __glusterd_handle_cli_uuid_get function	Mohit Agrawal	2019-01-22	1	-0/+2
	Change-Id: Iefe08b136044495f6fa2b092c9e8c833efee1400 fixes: bz#1667905 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	glusterd: Resolve memory leak in get-state command	Mohit Agrawal	2019-01-21	1	-0/+10
	In the gluster get-state volumeoptions command, some memory leaks were observed. This fix resolves the identified leaks. Change-Id: Ibde5743d1136fa72c531d48bb1b0b5da0c0b82a1 fixes: bz#1667779 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	posix: fix coverity issue	Iraj Jamali	2019-01-21	1	-6/+1
	Logically dead code CID: 1398468 Updates: bz#789278 Change-Id: I8713a0c51777eb64e617d00ab72fd1db4994b6ab Signed-off-by: Iraj Jamali <ijamali@redhat.com>
*	afr: Splitbrain with size as policy must not resolve for directory	Sheetal Pamecha	2019-01-21	1	-2/+12
	In automatic split-brain resolution, when the favorite child policy is set to size, split-brain resolution must not work for directories. Currently, if a directory is in split-brain with both copies having the same size, the source is selected arbitrarily and healed. fixes: bz#1655050 Change-Id: I5739498639c17c89874cc577362e543adab55f5d Signed-off-by: Sheetal Pamecha <sheetal.pamecha08@gmail.com>
*	quotad: fix wrong memory free	Kinglong Mee	2019-01-21	3	-19/+7
	1. cli_req.dict.dict_val: it must be freed no matter whether the operation fails or succeeds. Fix it the way lookup does, by allocating the memory with "alloca" before decode. 2. args.xdata.xdata_val: it is allocated by "alloca", so a free is unneeded. 3. qd_nameless_lookup: it only needs the gfid, so a gfs3_lookup_req argument is unneeded. Change-Id: I746dddf7f3d1465b1885af2644afe0bcf0a5665b fixes: bz#1656682 Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
*	core: Feature added to accept CidrIp in auth.allow	Rinku Kothiya	2019-01-18	2	-5/+13
	Added functionality to the gluster volume set auth.allow command to accept CIDR IP addresses. Modified a few functions to isolate the cidr feature so that other gluster commands such as peer probe are prevented from using cidr format IPs. The functions are modified in such a way that they have an option to enable accepting the cidr format for other gluster commands if required in future. updates: bz#1138841 Change-Id: Ie6734002a7078f1820e5df42d404411cce945e8b Credits: Mohit Agrawal Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
*	lock: Add fencing support	Susant Palai	2019-01-17	6	-122/+708
	Design reference: https://review.gluster.org/#/c/glusterfs-specs/+/21925/ This patch adds lock preempt support. Note: The current model stores lock enforcement information as a separate xattr on disk. There is another effort going on in parallel to store this in stat(x) of the file. This patch is self-sufficient to add fencing support. Based on the availability of the stat(x) support, either I will rebase this patch or we can modify the necessary bits after merging this patch. Change-Id: If4a42f3e0afaee1f66cdb0360ad4e0c005b5b017 updates: #466 Signed-off-by: Susant Palai <spalai@redhat.com>
*	core: Resolve memory leak for brick	Mohit Agrawal	2019-01-16	2	-0/+5
	Problem: Some functions are not freeing memory allocated by xdr_to_generic, so it is leaked. Solution: Call free to avoid the leak. Change-Id: I3524fe2831d1511d378a032f21467edae3850314 fixes: bz#1656682 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	posix: Convert several posix_private members to gf_atomic	Mohit Agrawal	2019-01-15	5	-48/+18
	Change-Id: I629698d8ddf6f15428880bdc1501d36bc37b8ebb fixes: bz#1657607 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	core: Resolve dict_leak at the time of destroying graph	Mohit Agrawal	2019-01-14	1	-4/+0
	Problem: In some places the gluster code calls get_new_dict to create a dictionary without taking a reference, so at the time of dict_unref it becomes a leak. Solution: To resolve this, call dict_new instead of get_new_dict. updates bz#1650403 Change-Id: I3ccbbf5af07079a4fa09aad2cd0458c8625b2f06 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	cluster/afr: fix zerofill transaction.start	Xiubo Li	2019-01-14	1	-1/+1
	This may have been a coding mistake. Fixes: bz#1665332 Change-Id: Ia8f8dadf4a71579240ff9950b141ca528bd342b3 Signed-off-by: Xiubo Li <xiubli@redhat.com>
*	glusterd: fix crash	Sanju Rakonde	2019-01-13	1	-3/+2
	Problem: running "gluster get-state glusterd odir /get-state" resulted in a glusterd crash. Cause: In the above command the output directory has been specified without "/" at the end. If "/" is not given at the end, "/" is added to the path using "strcat", so the added character "/" does not have memory allocated. When glusterd tries to free the path, it crashes as "/" has no memory allocated. Solution: Instead of concatenating "/" to the output directory, add it to the output filename. Change-Id: I5dc00a71e46fbef4d07fe99ae23b36fb60dec1c2 fixes: bz#1665038 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
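
The strcat problem described in the glusterd crash fix above comes from appending to a buffer that has no room for the extra byte. A hedged sketch of the safer pattern follows; `build_state_path` is a hypothetical helper, not the function changed by the patch. It leaves the caller's directory string untouched and writes the joined path into a separate, size-checked buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: join an output directory and a file name into
 * `out`, adding a "/" only when the directory does not already end
 * with one. Returns 0 on success, -1 if the result does not fit. */
int build_state_path(char *out, size_t size, const char *odir,
                     const char *filename)
{
    size_t len;
    int written;

    assert(out != NULL && odir != NULL && filename != NULL);

    len = strlen(odir);
    written = snprintf(out, size, "%s%s%s", odir,
                       (len > 0 && odir[len - 1] == '/') ? "" : "/",
                       filename);
    if (written < 0 || (size_t)written >= size)
        return -1;
    return 0;
}
```

Because the directory string is only read, it can later be freed exactly as it was allocated, whatever the user typed on the command line.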