glusterfs.git/xlators, branch v8.0

features/shard: Aggregate file size, block-count before unwinding removexattr

2020-06-29T14:33:05+00:00

Posix translator returns pre and postbufs in the dict in {F}REMOVEXATTR fops.
These iatts are further cached at layers like md-cache.
Shard translator, in its current state, simply returns these values without
updating the aggregated file size and block-count.

This patch fixes this problem.

Change-Id: I4b2dd41ede472c5829af80a67401ec5a6376d872
Fixes: #1243
Signed-off-by: Krutika Dhananjay 
(cherry picked from commit 32519525108a2ac6bcc64ad931dc8048d33d64de)

open-behind: rewrite of internal logic

2020-06-29T12:51:52+00:00

There was a critical flaw in the previous implementation of open-behind.

When an open is done in the background, it's necessary to take a
reference on the fd_t object because once we "fake" the open answer,
the fd could be destroyed. However as long as there's a reference,
the release function won't be called. So, if the application closes
the file descriptor without having actually opened it, there will
always remain at least 1 reference, causing a leak.

To avoid this problem, the previous implementation didn't take a
reference on the fd_t, so there were races where the fd could be
destroyed while it was still in use.

To fix this, I've implemented a new xlator cbk that gets called from
fuse when the application closes a file descriptor.

The whole logic of handling background opens has been simplified and
is more efficient now. A stub is created only if the fop needs to be
delayed until an open completes. Otherwise no memory allocations are
needed.

Correctly handling the close request while the open is still pending
has added a bit of complexity, but overall normal operation is simpler.
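
The reference problem described above can be illustrated with a toy
model. This is only a sketch under stated assumptions: toy_fd,
toy_fd_unref and toy_fd_close are hypothetical names, not the
open-behind or fuse code, and the model ignores the harder case of a
close arriving while the real open is in flight. It shows why a close
notification is needed: without it, the reference held for the
background open is never dropped and the fd is never released.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of an fd with a reference count. */
struct toy_fd {
    int refs;          /* outstanding references */
    bool open_pending; /* a reference is held for the background open */
    bool released;     /* set once the last reference is dropped */
};

/* Drop one reference; the release function would run at zero. */
static void
toy_fd_unref(struct toy_fd *fd)
{
    assert(fd->refs > 0);
    if (--fd->refs == 0)
        fd->released = true;
}

/* Close notification: drop the reference held for the background
 * open (if the open never happened), then the application's own. */
static void
toy_fd_close(struct toy_fd *fd)
{
    if (fd->open_pending) {
        fd->open_pending = false;
        toy_fd_unref(fd);
    }
    toy_fd_unref(fd);
}
```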

Change-Id: I6376a5491368e0e1c283cc452849032636261592
Fixes: #1225
Signed-off-by: Xavi Hernandez

features/shard: Aggregate size, block-count in iatt before unwinding setxattr

2020-06-29T12:51:41+00:00

Posix translator returns pre and postbufs in the dict in {F}SETXATTR fops.
These iatts are further cached at layers like md-cache.
Shard translator, in its current state, simply returns these values without
updating the aggregated file size and block-count.

This patch fixes this problem.

Change-Id: I4da0eceb4235b91546df79270bcc0af8cd64e9ea
Fixes: #1243
Signed-off-by: Krutika Dhananjay 
(cherry picked from commit 29ec66c6ab77e2d6893c6e213a3d1fb148702c99)

afr: more quorum checks in lookup and new entry marking

2020-06-29T12:51:32+00:00

Problem: See github issue for details.

Fix:
- In lookup, if the entry exists in 2 out of 3 bricks, don't fail the
  lookup with ENOENT just because there is an entrylk on the parent.
  Consider quorum before deciding.

- If the entry FOP does not succeed on a quorum number of bricks, do
  not perform the new entry mark.
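
Both points above come down to a quorum test over per-brick results.
A minimal sketch, assuming a simple-majority rule (AFR's actual quorum
is configurable); toy_has_quorum is a hypothetical helper, not an AFR
function:

```c
#include <assert.h>
#include <stdbool.h>

/* Return true when more than half of the bricks reported success.
 * Assumption: simple majority; the real quorum rule may differ. */
static bool
toy_has_quorum(const bool *success, int brick_count)
{
    int ok = 0;

    assert(brick_count > 0);
    for (int i = 0; i < brick_count; i++) {
        if (success[i])
            ok++;
    }
    return ok > brick_count / 2;
}
```

With three bricks, an entry found on two of them meets quorum, so the
lookup would not be failed with ENOENT; an entry FOP that succeeded on
only one brick does not, so no new entry mark would be performed.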

Fixes: #1303
Change-Id: I56df8c89ad53b29fa450c7930a7b7ccec9f4a6c5
Signed-off-by: Ravishankar N 
(cherry picked from commit c4a6748f25d2c1ab3ebcf89952278ebf94c8d371)

locks: prevent deletion of locked entries

2020-06-29T12:51:20+00:00

To keep consistency inside transactions started by locking an entry or
an inode, this change delays the removal of entries that are currently
locked by one or more clients. Once all locks are released, the removal
is processed.
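
The delayed removal can be sketched as a small state machine. This is
an illustration only; toy_entry, toy_remove and toy_unlock are
hypothetical names and do not reflect the locks translator's data
structures:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of an entry that may be locked. */
struct toy_entry {
    int lock_count;      /* locks currently held by clients */
    bool remove_pending; /* a removal arrived while locked */
    bool removed;        /* the removal has been processed */
};

/* Process a removal now, or defer it while locks are held. */
static void
toy_remove(struct toy_entry *e)
{
    if (e->lock_count > 0)
        e->remove_pending = true;
    else
        e->removed = true;
}

/* Release one lock; run the deferred removal on the last release. */
static void
toy_unlock(struct toy_entry *e)
{
    assert(e->lock_count > 0);
    if (--e->lock_count == 0 && e->remove_pending) {
        e->remove_pending = false;
        e->removed = true;
    }
}
```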

The detection of stale inodes in the locking code of EC has also been
improved.

Fixes: #990
Change-Id: Ic8ba23d9480f80c7f74e7a310bf8a15922320fd5
Signed-off-by: Xavi Hernandez

cluster/afr: Prioritize ENOSPC over other errors

2020-06-16T04:56:19+00:00

Problem:
In a replicate/arbiter volume, if file creations or writes fail on a
quorum number of bricks, with ENOSPC on one brick and a different
reason on another brick, the fop may fail with an error other than
ENOSPC in some cases.

Fix:
Prioritize ENOSPC over other lesser priority errors and do not set
op_errno in posix_gfid_set if op_ret is 0 to avoid receiving any
error_no which can be misinterpreted by __afr_dir_write_finalize().

Also remove the function afr_has_arbiter_fop_cbk_quorum(), which
might consider a successful reply from a single brick as quorum
success in some cases, whereas we always need the fop to be successful
on a quorum number of bricks in an arbiter configuration.
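
The ENOSPC prioritization described above can be sketched as follows.
This is an illustration under assumptions: toy_pick_errno is a
hypothetical helper, the input is one errno per brick with 0 meaning
success, and the ordering among errors other than ENOSPC (first one
seen wins) is invented for the example, not AFR's actual ordering:

```c
#include <assert.h>
#include <errno.h>

/* Pick the errno to report from per-brick results (0 = success).
 * ENOSPC wins over any other error; otherwise the first error seen
 * is returned, or 0 if no brick failed. */
static int
toy_pick_errno(const int *brick_errnos, int brick_count)
{
    int chosen = 0;

    assert(brick_count > 0);
    for (int i = 0; i < brick_count; i++) {
        if (brick_errnos[i] == ENOSPC)
            return ENOSPC;
        if (chosen == 0)
            chosen = brick_errnos[i];
    }
    return chosen;
}
```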

Change-Id: I106e267f8b9451f681022f1cccb410d9bc824c08
Fixes: #1254
Signed-off-by: karthik-us 
(cherry picked from commit fa63b45ca5edf172b1b89b28b5db3c5129cc57b6)

md-cache: fix several NULL dereferences

2020-05-31T05:56:08+00:00

This patch fixes the following CIDs from Coverity Scan:

  * 1425196
  * 1425197
  * 1425198
  * 1425199
  * 1525200

Change-Id: Iddcfea449d3dd56d4dfcc39f4c3c608518e611e4
Signed-off-by: Xavi Hernandez 
Updates: #1060
(cherry picked from commit b53ba17dbfd2d18c10e2c308b8899d36726ab440)

syncop: improve scaling and implement more tools

2020-05-30T05:17:12+00:00

The current scaling of the syncop thread pool is not working properly
and can leave some tasks in the run queue more time than necessary
when the maximum number of threads is not reached.

This patch provides a better scaling condition to react faster to
pending work.

Condition variables and sleep in the context of a synctask have also
been implemented. Their purpose is to replace regular condition
variables and sleeps that block synctask threads and prevent other
tasks from being executed.

The new features have been applied to several places in glusterd.

Change-Id: Ic50b7c73c104f9e41f08101a357d30b95efccfbf
Fixes: #1116
Signed-off-by: Xavi Hernandez

tests: skip tests on absence of reflink in xfs

2020-05-29T07:02:11+00:00

Fixes: #1223
Change-Id: I36cb72d920ffd77405051546615c5262c392daef
Signed-off-by: Pranith Kumar K 
(cherry picked from commit b85f01abab658d1d704cd6caf84dd64eddafbff7)

fuse: occasional logging for fuse device 'weird' write errors

2020-05-28T07:29:02+00:00

This change is a followup to
I510158843e4b1d482bdc496c2e97b1860dc1ba93.

In the referred change we pushed log messages about 'weird'
write errors to the fuse device out of sight, by reporting
them at Debug loglevel instead of Error (where
'weird' means the errno is not POSIX compliant but has
meaningful semantics for the FUSE protocol).

This solved the issue of spurious error reporting.
And so far so good: these messages don't indicate
an error condition by themselves. However, when they
come in high repetitions, that indicates a suboptimal
condition which should be reported.[1]

Therefore now we shall emit a Warning if a certain
errno occurs a certain number of times[2] as the
outcome of a write to the fuse device.

___
[1] typically ENOENTs and ENOTDIRs accumulate
when glusterfs' inode invalidation lags behind
the kernel's internal inode garbage collection
(in this case the above errnos mean that the inode
which we requested to be invalidated is not found
in the kernel). This can be mitigated with the
invalidate-limit command line / mount option,
cf. bz#1732717.

[2] 256, as of the current implementation.

Change-Id: I8cc7fe104da43a88875f93b0db49d5677cc16045
Updates: #1000
Signed-off-by: Csaba Henk 
(cherry picked from commit c1baf3c68b87584aea5389af958326f6ed01d7ec)