glusterfs.git/glusterfsd, branch v7dev

mount/fuse: expose auto-invalidation as a mount option

2019-02-02T03:07:35+00:00

Auto invalidation is necessary when same (meta)data is shared/access
across multiple mounts. However, if (meta)data is not shared, all
relevant I/O goes through the cache of single mount and hence is
coherent with (meta)data on bricks always. So, fuse-auto-invalidation
can be disabled for this case which gives a huge performance boost for
workloads that write data and then immediately read the data they just
wrote.

From glusterfs --help,


      --auto-invalidation[=BOOL]   controls whether fuse-kernel can
                             auto-invalidate attribute, dentry and page-cache.
                             Disable this only if same files/directories are
                             not accessed across two different mounts
                             concurrently [default: "on"]


Details on how disabling auto-invalidation helped to reduce pgbench
init times can be found at [1]. Time taken for pgbench init of scale
8000 was 8340s. That will be an improvement of 86% (59280s vs 8340s)
with auto-invalidations turned off along with other
optimizations. Just disabling auto-invalidation contributed 56%
improvement by reducing the total time taken by 33260s.

[1] https://www.spinics.net/lists/gluster-devel/msg25907.html

Change-Id: I0ed730dba9064bd9c576ad1800170a21e100e1ce
Signed-off-by: Raghavendra Gowdappa 
updates: bz#1664934

Multiple files: reduce work while under lock.

2019-01-29T09:27:22+00:00

Mostly, unlock before logging.
In some cases, moved different code that was not needed
to be under lock (for example, taking time, or malloc'ing)
to be executed before taking the lock.

Note: logging might be slightly less accurate in order, since it may
not be done now under the lock, so order of logs is racy. I think
it's a reasonable compromise.

Compile-tested only!
updates: bz#1193929
Signed-off-by: Yaniv Kaul 

Change-Id: I2438710016afc9f4f62a176ef1a0d3ed793b4f89

rpc: Fix double free

2019-01-22T17:22:13+00:00

The value rsp.xdata.xdata_val was being freed twice. It was assigned
to dict->extra_stdfree, dict_destroy would free it and also there was
an explicit free. Getting rid of explicit free in this patch.

Change-Id: Ia9c73454bec3970b33f154fa754398bf3b045645
fixes: bz#1668268
Signed-off-by: Poornima G

rpc: use address-family option from vol file

2019-01-22T13:47:19+00:00

This patch helps enable IPv6 connections in the cluster.
The default address-family is IPv4 without using this option explicitly.

When address-family is set to "inet6" in the /etc/glusterfs/glusterd.vol
file, the mount command-line also needs to have
-o xlator-option="transport.address-family=inet6" added to it.

This option also gets added to the brick command-line.
Snapshot and gfapi use-cases should also use this option to pass in the
inet6 address-family.

Change-Id: I97db91021af27bacb6d7578e33ea4817f66d7270
fixes: bz#1635863
Signed-off-by: Milind Changire

core: Resolve memory leak for brick

2019-01-16T17:31:59+00:00

Problem: Some functions are not freeing memory allocated by
         xdr_to_genric so it has become leak

Solution: Call free to avoid leak

Change-Id: I3524fe2831d1511d378a032f21467edae3850314
fixes: bz#1656682
Signed-off-by: Mohit Agrawal

core: glusterd/add-brick-and-validate-replicated-volume-options.t is crash

2019-01-14T12:34:40+00:00

Problem: Sometime brick is getting crash at the time of handling
         pmap signin request

Solution: glusterfs_mgmt_pamp_signin is using same frame to send
          pmap signin request so to avoid crash send signin request
          on separate frame

Change-Id: I443f854171ec4372e8d5f84bdc576c468e92c493
fixes: bz#1665656
Signed-off-by: Mohit Agrawal

core: Resolve dict_leak at the time of destroying graph

2019-01-14T04:01:54+00:00

Problem: In gluster code some of the places it call's get_new_dict
         to create a dictionary without taking reference so at the time
         of dict_unref it has become a leak

Solution: To resolve the same call dict_new instead of get_new_dict

updates bz#1650403
Change-Id: I3ccbbf5af07079a4fa09aad2cd0458c8625b2f06
Signed-off-by: Mohit Agrawal

glusterd: kill the process without releasing the cleanup mutex lock

2019-01-02T07:23:14+00:00

Problem:
glusterd acquires a cleanup mutex lock before it starts
cleanup process, so that any other thread which tries to acquire
lock on any resource will be blocked on cleanup mutex lock.

We don't want any thread to try to acquire any resource, once
the cleanup is started. because other threads might try to acquire
lock on resources which are already freed by the thread which is
going though the cleanup phase.

previously we were releasing the cleanup mutex lock before the
process exit. As we are releasing the cleanup mutex lock, before
the process can exit some other thread which is blocked on
cleanup mutex lock is acquiring the cleanup mutex lock and
trying to acquire some resources which are already freed as a
part of cleanup. This is leading glusterd to crash.

Solution: We should exit the process without releasing the
cleanup mutex lock.

Change-Id: Ibae1c62260f141019017f7a547519a5d38dc2bb6
fixes: bz#1654270
Signed-off-by: Sanju Rakonde

posix: use synctask for janitor

2018-12-19T14:36:52+00:00

With brick mux, the number of threads increases as the number of
bricks increases. As an initiative to reduce the number of
threads in brick mux scenario, replacing janitor thread to use
synctask infra.

Now close() and closedir() handle by separate janitor
thread which is linked with glusterfs_ctx.

Updates #475
Change-Id: I0c4aaf728125ab7264442fde59f3d08542785f73
Signed-off-by: Poornima G

fuse: add --lru-limit option

2018-12-14T17:34:28+00:00

The inode LRU mechanism is moot in fuse xlator (ie. there is no
limit for the LRU list), as fuse inodes are referenced from
kernel context, and thus they can only be dropped on request of
the kernel. This might results in a high number of passive
inodes which are useless for the glusterfs client, causing a
significant memory overhead.

This change tries to remedy this by extending the LRU semantics
and allowing to set a finite limit on the fuse inode LRU.

A brief history of problem:

When gluster's inode table was designed, fuse didn't have any
'invalidate' method, which means, userspace application could
never ask kernel to send a 'forget()' fop, instead had to wait
for kernel to send it based on kernel's parameters. Inode table
remembers the number of times kernel has cached the inode based
on the 'nlookup' parameter. And 'nlookup' field is not used by
no other entry points (like server-protocol, gfapi etc).

Hence the inode_table of fuse module always has to have lru-limit
as '0', which means no limit. GlusterFS always had to keep all
inodes in memory as kernel would have had a reference to it.
Again, the reason for this is, kernel's glusterfs inode reference
was pointer of 'inode_t' structure in glusterfs. As it is a
pointer, we could never free it (to prevent segfault, or memory
corruption).

Solution:

In the inode table, handle the prune case of inodes with 'nlookup'
differently, and call a 'invalidator' method, which in this case is
fuse_invalidate(), and it sends the request to kernel for getting
the forget request.

When the kernel sends the forget, it means, it has dropped all
the reference to the inode, and it will send the forget with the
'nlookup' parameter too. We just need to make sure to reduce the
'nlookup' value we have when we get forget. That automatically
cause the relevant prune to happen.

Credits: Csaba Henk, Xavier Hernandez, Raghavendra Gowdappa, Nithya B

fixes: bz#1560969
Change-Id: Ifee0737b23b12b1426c224ec5b8f591f487d83a2
Signed-off-by: Amar Tumballi