glusterfs.git/libglusterfs, branch v8.0

open-behind: rewrite of internal logic

2020-06-29T12:51:52+00:00

There was a critical flaw in the previous implementation of open-behind.

When an open is done in the background, it's necessary to take a
reference on the fd_t object because once we "fake" the open answer,
the fd could be destroyed. However as long as there's a reference,
the release function won't be called. So, if the application closes
the file descriptor without having actually opened it, there will
always remain at least 1 reference, causing a leak.

To avoid this problem, the previous implementation didn't take a
reference on the fd_t, so there were races where the fd could be
destroyed while it was still in use.

To fix this, I've implemented a new xlator cbk that gets called from
fuse when the application closes a file descriptor.

The whole logic of handling background opens have been simplified and
it's more efficient now. Only if the fop needs to be delayed until an
open completes, a stub is created. Otherwise no memory allocations are
needed.

Correctly handling the close request while the open is still pending
has added a bit of complexity, but overall normal operation is simpler.

Change-Id: I6376a5491368e0e1c283cc452849032636261592
Fixes: #1225
Signed-off-by: Xavi Hernandez

syncop: improve scaling and implement more tools

2020-05-30T05:17:12+00:00

The current scaling of the syncop thread pool is not working properly
and can leave some tasks in the run queue more time than necessary
when the maximum number of threads is not reached.

This patch provides a better scaling condition to react faster to
pending work.

Condition variables and sleep in the context of a synctask have also
been implemented. Their purpose is to replace regular condition
variables and sleeps that block synctask threads and prevent other
tasks to be executed.

The new features have been applied to several places in glusterd.

Change-Id: Ic50b7c73c104f9e41f08101a357d30b95efccfbf
Fixes: #1116
Signed-off-by: Xavi Hernandez

tests: skip tests on absence of reflink in xfs

2020-05-29T07:02:11+00:00

Fixes: #1223
Change-Id: I36cb72d920ffd77405051546615c5262c392daef
Signed-off-by: Pranith Kumar K 
(cherry picked from commit b85f01abab658d1d704cd6caf84dd64eddafbff7)

Posix: Optimize posix code to improve file creation

2020-04-06T10:43:26+00:00

Problem: Before executing a fop in POSIX xlator it builds an internal
         path based on GFID.To validate the path it call's (l)stat
         system call and while .glusterfs is heavily loaded kernel takes
         time to lookup inode and due to that performance drops

Solution: In this patch we followed two ways to improve the performance.
          1) Keep open fd specific to first level directory(gfid[0])
             in .glusterfs, it would force to kernel keep the inodes
             from all those files in cache. In case of memory pressure
             kernel won't uncache first level inodes. We need to open
             256 fd's per brick to access the entry faster.
          2) Use at based call's to access relative path to reduce
             path based lookup time.

Note: To verify the patch we have executed kernel untar 100 times on 6
      different clients after enabling metadata group-cache and some
      other option.We were getting more than 20 percent improvement in
      kenel untar after applying the patch.

Credits: Xavi Hernandez 
Change-Id: I1643e6b01ed669b2bb148d02f4e6a8e08da45343
updates: #891
Signed-off-by: Mohit Agrawal

core[brick_mux]: brick crashed when creating and deleting volumes over time

2020-03-27T15:19:20+00:00

Problem: In brick_mux environment, while volumes are created/stopped in a loop
         after running a long time the main brick is crashed.The brick is crashed
         because the main brick process was not cleaned up memory for all objects
         at the time of detaching a volume.
         Below are the objects that are missed at the time of detaching a volume
         1) xlator object for a brick graph
         2) local_pool for posix_lock xlator
         3) rpc object cleanup at quota xlator
         4) inode leak at brick xlator

Solution: To avoid the crash resolve all leak at the time of detaching a brick
Change-Id: Ibb6e46c5fba22b9441a88cbaf6b3278823235913
updates: #977
Signed-off-by: Mohit Agrawal

Posix: Use simple approach to close fd

2020-03-20T04:08:42+00:00

Problem: posix_release(dir) functions add the fd's into a ctx->janitor_fds
         and janitor thread closes the fd's.In brick_mux environment it is
         difficult to handle race condition in janitor threads because brick
         spawns a single janitor thread for all bricks.

Solution: Use synctask to execute posix_release(dir) functions instead of 
          using background a thread to close fds.

Credits: Pranith Karampuri 
Change-Id: Iffb031f0695a7da83d5a2f6bac8863dad225317e
Fixes: bz#1811631
Signed-off-by: Mohit Agrawal

utime: resolve an issue of permission denied logs

2020-03-13T12:15:21+00:00

In case where uid is not set to be 0, there are possible errors
from acl xlator. So, set `uid = 0;` with pid indicating this is
set from UTIME activity.

The message "E [MSGID: 148002] [utime.c:146:gf_utime_set_mdata_setxattr_cbk] 0-dev_SNIP_data-utime: dict set of key for set-ctime-mdata failed [Permission denied]" repeated 2 times between [2019-12-19 21:27:55.042634] and [2019-12-19 21:27:55.047887]

Change-Id: Ieadf329835a40a13ac0bf908dac776e66954466c
Fixes: #832
Signed-off-by: Amar Tumballi

Segmentation fault occurs during truncate

2020-02-24T18:31:04+00:00

Problem:
Segmentation fault occurs when bricks are nearly full 100% and in
parallel truncate of a file is attempted (No space left on device).
Prerequicite is that performance xlators are activated
(read-ahead, write-behind etc)
while stack unwind of the frames following an error responce
from brick (No space left on device) frame->local includes a memory
location that is not allocated via mem_get but via calloc.
The destroyed frame is always ra_truncate_cbk winded from ra_ftruncate
and the inode ptr is copied to the frame local in the wb_ftruncate.

Fix:
extra check is added for the pool ptr

Change-Id: Ic5d3bd0ab7011e40b2811c6dece063b256e4d9d1
Fixes: bz#1797882
Signed-off-by: kinsu

core: Prevent crash on process termination

2020-02-19T11:30:43+00:00

A previous patch (ce61da816a) has fixed a use-after-free issue,
but it doesn't work well when the final cleanup is done at process
termination because gluster doesn't stop other threads before
calling exit().

For this reason, the final cleanup is removed to avoid the crash,
at least until the termination sequence properly stops all gluster
threads before exiting the program.

Change-Id: Id7cfb4407fcf208e28f03a7c3cdc3ef9c1f3bf9b
Fixes: bz#1801684
Signed-off-by: Xavi Hernandez

core: fix memory pool management races

2020-02-18T17:29:51+00:00

Objects allocated from a per-thread memory pool keep a reference to it
to be able to return the object to the pool when not used anymore. The
object holding this reference can have a long life cycle that could
survive a glfs_fini() call.

This means that it's unsafe to destroy memory pools from glfs_fini().

Another side effect of destroying memory pools from glfs_fini() is that
the TLS variable that points to one of those pools cannot be reset for
all alive threads.  This means that any attempt to allocate memory from
those threads will access already free'd memory, which is very
dangerous.

To fix these issues, mem_pools_fini() doesn't destroy pool lists
anymore. Only at process termination the pools are destroyed.

Change-Id: Ib189a5510ab6bdac78983c6c65a022e9634b0965
Fixes: bz#1801684
Signed-off-by: Xavi Hernandez