| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The order of operation in rebalance is as follows:
gf_defrag_fix_layout - > gf_defrag_process_dir - > gf_defrag_get_entry
gf_defrag_process_dir is passing to gf_defrag_get_entry a pointer to a
variable 'gf_defrag_get_entry', however this value is ignored (remains
unchanged in the method).
Based on the return value from gf_defrag_get_entry,
gf_defrag_process_dir may change it's return value to the 'magical'
number 2, however since the value of 'should_commit_hash' never changes,
this never happnes.
All of this is propagated back to gf_defrag_fix_layout and is now
removed from there as well.
Change-Id: Ibff297650cf84139bd26c830bfa44f81119b60d4
updates: #1002
Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Part 1 of this patch https://review.gluster.org/#/c/glusterfs/+/24328/
Following part 1, this patch complety removes all traces of
"tier" feature in dht.
This is based in the work done in
https://review.gluster.org/#/c/glusterfs/+/23935/
Change-Id: I7fba1ab7249719301ca578b4a6f4acac748da145
updates: #1097
Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The allocation
dir_dfmeta = GF_CALLOC(1, sizeof(*dir_dfmeta), gf_common_mt_pointer);
seems to cause a crash.
From the logs:
[2020-09-24 13:10:13.225935 +0000] I [dht-rebalance.c:3273:gf_defrag_process_dir]
0-dist-dht: migrate data called on /dir1
[2020-09-24 13:10:13.226587 +0000] E [mem-pool.c:61:gf_mem_set_acct_info]
(-->/usr/local/lib/glusterfs/9dev/xlator/cluster/distribute.so(+0x18e60)
[0x7f4b1f71ee60] -->/usr/local/lib/glusterfs/9dev/xlator/cluster/distribute.so(+0x173ab)
[0x7f4b1f71d3ab] -->/usr/local/lib/libglusterfs.so.0(+0x4d8e5) [0x7f4b357668e5] )
0-: Assertion failed: type <= mem_acct->num_types
[2020-09-24 13:10:13.226623 +0000] E [mem-pool.c:61:gf_mem_set_acct_info]
(-->/usr/local/lib/glusterfs/9dev/xlator/cluster/distribute.so(+0x18e60)
[0x7f4b1f71ee60] -->/usr/local/lib/glusterfs/9dev/xlator/cluster/distribute.so(+0x173d3)
[0x7f4b1f71d3d3] -->/usr/local/lib/libglusterfs.so.0(+0x4d8e5) [0x7f4b357668e5] )
0-: Assertion failed: type <= mem_acct->num_types
The following change fixes that crash.
fixes: #1511
Change-Id: Ibf605648981f7108e863c91a80370cf077ad7c4a
Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
The 3rd argument of gf_thread_create is a pointer to a method, however
in this case, a pointer to a pointer to a method is passed (2 levels of
indirection instead of 1).
Change-Id: Ic2d4ea75aa54c6bc85a80bd0277a0efa5e5814ad
updates: #1002
Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Based on the discussion on the issue, it is decided that it is
better to not have this implementation of the feature.
This reverts commit 3af9443c770837abe4f54db399623380ab9767a7.
Change-Id: I4e3bf18fc376cdb0cf29f1d98a915deca17c3496
Updates: #1422
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Prefer time_t and gf_time() over 'struct timeval' and gettimeofday()
where microseconds are not really used, drop unneeded 'struct timeval'
to 'struct timespec' conversion in dht_file_counter_thread().
Change-Id: Ibd802f79b8848df3f6175ca1fd82e93532bba38d
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Updates: #1002
|
|
|
|
|
|
|
|
|
|
| |
Add gf_tvdiff() and gf_tsdiff() to calculate the difference
between 'struct timeval' and 'struct timespec' values, use
them where appropriate.
Change-Id: I172be06ee84e99a1da76847c15e5ea3fbc059338
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Updates: #1002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For distribute only volumes we can use the information for
local subvolumes to avoid syncop calls which goes through the
whole stack to fetch stat and entries.
A separate function gf_defrag_fix_layout_puredist is introduced.
TODO: A glusterd flag needs to be introduced in case we want to
fall back to run the old way.
Perf numbers:
DirSize - 1Million Old New %diff
Depth - 100 (Run 1) 353 74 +377%
Depth - 100 (Run 2) 348 72 +377~%
Depth - 50 246 122 +100%
Depth - 3 174 114 +52%
Change-Id: I67cc136cebd34092fd775e69f74c2d5b33d3156d
Fixes: #1242
Signed-off-by: Susant Palai <spalai@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A rebalance process does a lookup for every file in the dir it is processing
before checking if it supposed to migrate the file.
In this issue there are two rebalance processses running on a replica subvol:
R1 is migrating the FILE.
R2 is not supposed to migrate the FILE, but it does a lookup and
finds a stale linkfile which is mostly due to a stale layout.
Then, it tries to unlink the stale linkfile and gets EBUSY
as the linkfile fd is open due R1 migration.
As a result a misleading error msg about FILE migration failure
due EBUSY is logged in R2 logfile.
Fix:
suppress the error in case it occured in a node that
is not supposed to migrate the file.
fixes: #1371
Change-Id: I37832b404e2b0cc40ac5caf45f14c32c891e71f3
Signed-off-by: Tamar Shacked <tshacked@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
AFR doesn't delay post-op for fsync fop. For fsync heavy workloads
this leads to un-necessary fxattrop/finodelk for every fsync leading
to bad performance.
Fix:
Have delayed post-op for fsync. Add special flag in xdata to indicate
that afr shouldn't delay post-op in cases where either the
process will terminate or graph-switch would happen. Otherwise it leads
to un-necessary heals when the graph-switch/process-termination
happens before delayed-post-op completes.
Fixes: #1253
Change-Id: I531940d13269a111c49e0510d49514dc169f4577
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently data migration in rebalance reads sparse file sequentially,
disregarding which segments are holes and which are data. This can lead
to extremely long migration time for large sparse file.
Data migration mechanism needs to be enhanced so only data segments are
read and migrated. This can be achieved using lseek to seek for holes
and data in the file.
This enhancement is a consequence of
https://bugzilla.redhat.com/show_bug.cgi?id=1823703
fixes: #1222
Change-Id: If5f448a0c532926464e1f34f504c5c94749b08c3
Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
|
|
|
|
|
|
| |
fixes: #1258
Change-Id: I9d1fb512072bcc540d21d47da5b15ae1b79cf2b8
Signed-off-by: Susant Palai <spalai@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently opendir is done from the cluster view. Hence, even if
one opendir is successful, the opendir operation as a whole is considered
successful.
But since in gf_defrag_get_entry we fetch entries selectively from
local_subvols, we need to opendir individually on those local subvols
and keep track of fds separately. Otherwise it is possible that opendir
failed on one of the subvol and we wind readdirp call on the fd to the
corresponding subvol, which will ultimately result in EINVAL error.
fixes: #1218
Change-Id: I50dd88b9597852a15579f4ee325918979417f570
Signed-off-by: Susant Palai <spalai@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: Selfheal as part of directory does not return an error if
the layout setxattr fails. This is because the actual lookup fop
must have been successful to proceed for layout heal. Hence, we could
not tell if fix-layout failed in rebalance.
Solution: We can check this information in the layout structure that
whether all the xlators have returned error.
fixes: #1200
Change-Id: I3e5f2a36c0d934c21476a73a9a5473d8e490cde7
Signed-off-by: Susant Palai <spalai@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Rebalance process handling of files which contains holes casued
rebalance to fail with "No space left on device" errors.
This patch modifies the code-flow in such a way that files with holes
will be rebalanced correctly.
fixes: #1187
Change-Id: I89bc3d4ea7f074db7213d759c49307f379543932
Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
If rebalance process is failing, recursive failures appear in the log
file, which is distracting from the root cause.
In order to avoid recursive failure, error handling mechanism has
been modified.
fixes: #1072
Change-Id: Iae19430323630acd97c2c8d35685626d8da747a7
Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch is removing some of the "tier" code in dht xlator, as it is no longer
being used.
Not all of the not-needed code is removed at once, so reviewing is easier.
Follow up patches removing additional unused code will follow.
This is based in the work done in https://review.gluster.org/#/c/glusterfs/+/23935/
Change-Id: I3cb6a0c5d8f14afcd87cf021ef8f74b91c0f908a
updates: #1097
Signed-off-by: Barak Sason Rofman <bsaonro@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Currently even though gf_defrag_fix_layout fails with ENOENT or ESTALE, a
subsequent call is made to gf_defrag_process_dir leading to rebalance failure.
fixes: #1102
Change-Id: Ib0c309fd78e89a000fed3feb4bbe2c5b48e61478
Signed-off-by: Susant Palai <spalai@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Probelm description:
When topping rebalance, the following error messages appear in the
rebalance log file:
[2020-01-28 14:31:42.452070] W [dht-rebalance.c:3447:gf_defrag_process_dir] 0-distrep-dht: Found error from gf_defrag_get_entry
[2020-01-28 14:31:42.452764] E [MSGID: 109111] [dht-rebalance.c:3971:gf_defrag_fix_layout] 0-distrep-dht: gf_defrag_process_dir failed for directory: /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23/24/25/26/27/28/29/30/31
[2020-01-28 14:31:42.453498] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23/24/25/26/27/28/29/30
In order to avoid seing these error messages, a modification to the
error handling mechanism has been made.
In addition, several log messages had been added in order to improve debugging efficiency
fixes: bz#1800956
Change-Id: Ifc82dae79ab3da9fe22ee25088a2a6b855afcfcf
Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
|
|
|
|
|
|
| |
Change-Id: Ieb7531af19ae89fb8a8387e81663c7f157b10c02
Updates: bz#1765421
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In many places we use it, compare to it, etc. It could be a static variable,
as it really doesn't change. I think it's better than initializing to 0
and then doing gfid[15] = 1 or other tricks.
I think there are additional oppportunuties to make more variables static.
This is an attempt at an easy one.
Change-Id: I7f23a30a94056d8f043645371ab841cbd0f90d19
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
|
|
|
|
|
|
|
|
|
| |
File truncate operations during a migration were not handled properly.
This has been fixed.
Change-Id: Ic642d257e893641236a4a21ab69fcc7a569dd70a
Fixes: bz#1745967
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Initialize a dictionary for example seems to be prefectly fine to be done
before taking a lock.
Change-Id: Ib29516c4efa8f0e2b526d512beab488fcd16d2e7
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are many include statements that are not needed.
A previous more ambitious attempt failed because of *BSD plafrom
(see https://review.gluster.org/#/c/glusterfs/+/21929/ )
Now trying a more conservative reduction.
It does not solve all circular deps that we have, but it
does reduce some of them. There is just too much to handle
reasonably (dht-common.h includes dht-lock.h which includes
dht-common.h ...), but it does reduce the overall number of lines
of include we need to look at in the future to understand and fix
the mess later one.
Change-Id: I550cd001bdefb8be0fe67632f783c0ef6bee3f9f
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A rebalance process currently only looks up files
that it is supposed to migrate. This could cause issues
when lookup-optimize is enabled as the dir layout can be
updated with the commit hash before all files are looked up.
This is expecially problematic of one of the rebalance processes
fails to complete as clients will try to access files whose
linkto files might not have been created.
Each process will now lookup every file in the directory it is
processing.
Pros: Less likely that files will be inaccessible.
Cons: More lookup requests sent to the bricks and a potential
performance hit.
Note: this does not handle races such as when a layout is updated on disk
just as the create fop is sent by the client.
Change-Id: I22b55846effc08d3b827c3af9335229335f67fb8
fixes: bz#1711764
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch creates a specific function to set the thread name using a
string format and a variable argument list, like printf().
This function is used to set the thread name from gf_thread_create(),
which now accepts a variable argument list to create the full name. It's
not necessary anymore to use a local array to build the name of the
thread. This is done automatically.
Change-Id: Idd8d01fd462c227359b96e98699f8c6d962dc17c
Updates: bz#1193929
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* we shouldn't be using 'local' after DHT_STACK_UNWIND() as it frees
the content of local. Add a 'goto out' or similar logic to handle
the situation.
* fix possible overlook of unref(dict), instead of unref(xdata).
* make coverity happy by re-ordering unref in meta-defaults.
* gfid-access: re-order dictionary allocation so we don't have to
do a extra unref.
* other obvious errors reported.
updates: bz#789278
Change-Id: If05961ee946b0c4868df19861d7e4a927a2a2489
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
libglusterfs devel package headers are referenced in code using
include semantics for a program, this while it works can be better
especially when dealing with out of tree xlator builds or in
general out of tree devel package usage.
Towards this, the following changes are done,
- moved all devel headers under a glusterfs directory
- Included these headers using system header notation <> in all
code outside of libglusterfs
- Included these headers using own program notation "" within
libglusterfs
This change although big, is just moving around the headers and
making it correct when including these headers from other sources.
This helps us correctly include libglusterfs includes without
namespace conflicts.
Change-Id: Id2a98854e671a7ee5d73be44da5ba1a74252423b
Updates: bz#1193929
Signed-off-by: ShyamsundarR <srangana@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Based on the proposal to remove few features as they are not
actively maintained [1], removing tier translator from the
build. Also make sure there are no regression tests involving
tiering feature are present.
[1] https://lists.gluster.org/pipermail/gluster-users/2018-July/034400.html
Change-Id: I2c177f711f9b54b7b24e1a13525ff3132bd9a9c5
updates: bz#1642807
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Dereferencing NUll pointers this,local and stbuf.
1.Replaced this->name with "dht".
2.Removed GF_VALIDATE_OR_GOTO.
3.Removed the check for "stbuf" and "this".
Updates: bz#1622665
Change-Id: Id2fb2270d5ec37b76fa2aae1f1c8dca72dcc728a
Signed-off-by: Harpreet Lalwani <hlalwani@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
When compiling in other architectures there appear many warnings. Some
of them are actual problems that prevent gluster to work correctly on
those architectures.
Change-Id: Icdc7107a2bc2da662903c51910beddb84bdf03c0
fixes: bz#1632717
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
|
|
|
|
|
| |
Change-Id: Ia84cc24c8924e6d22d02ac15f611c10e26db99b4
Signed-off-by: Nigel Babu <nigelb@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
xlators/cluster/afr/src/afr-common.c
xlators/cluster/dht/src/dht-common.c
xlators/cluster/dht/src/dht-rebalance.c
xlators/cluster/stripe/src/stripe-helpers.c
strncpy may not be very efficient for short strings copied into
a large buffer: If the length of src is less than n,
strncpy() writes additional null bytes to dest to ensure
that a total of n bytes are written.
Instead, use snprintf().
Also:
- save the result of strlen() and re-use it when possible.
- move from strlen to SLEN (sizeof() ) for const strings.
Compile-tested only!
Change-Id: Icdf79dd3d9f9ff120e4720ff2b8bd016df575c38
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
dict_set
There is no need to remove an item before re-setting it.
Compile-tested only!
Change-Id: I2869aec9ebf474859127b8b38d284246e6097e84
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Fixes 1133997, 1370910, 1382387, 1382444,
1394635
Change-Id: Ie63ad47abd5519b9b9536da26b61ed4c9eaf2c75
updates: bz#789278
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
| |
An error caused skipped files to be counted as
rebalanced files.
Change-Id: I02333f099fb8b73ba953f41a2922021a1e4da7be
fixes: bz#1615474
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Starting in Fedora 26 which has gcc-7.1.x, -Wformat-trunction is enabled
with -Wformat, resulting in a flood of new warnings. This many warnings
is a concern because it makes it hard(er) to see other warnings that
should be addressed.
An example is at
https://kojipkgs.fedoraproject.org//packages/glusterfs/3.12.0/1.fc28/data/logs/x86_64/build.log
For more info see https://review.gluster.org/#/c/18267/
I can't find much (or good) documentation on the heuristics the
compiler uses for this warning. In the case of printing integer types
it appears it looks at the available space in the destination and the
range of values for the variable and/or its type.
To address the specific question about why 0x3ff versus 0xfff to mask
the value, either would suffice to hint to the compiler that the
printed value will fit in three characters. But the loop is from
0...1023 (or 0...0x3ff if you prefer) so I chose that as a more
"accurate" mask to use as it exactly matches the range of values of
the loop.
Fixes: bz#1492847
Change-Id: I6e309ba42159841131d8241bfc0566ef09e00aa9
|
|
|
|
|
|
|
|
|
|
|
|
| |
Please review, it's not always just the comments that were fixed.
I've had to revert of course all calls to creat() that were changed
to create() ...
Only compile-tested!
Change-Id: I7d02e82d9766e272a7fd9cc68e51901d69e5aab5
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Created init and cleanup functions for certain
functionality in order to improve readability.
Removed unused code.
Change-Id: Ia6a2f4ab64923b6ea8e10487227fb5621eec1488
updates: bz#1586363
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
| |
syncop functions return -op_errno.
Change-Id: Ifdb1bd1d1d11972b4306a2336e6737d6236a2fb1
fixes: bz#1580238
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
An entry from readdirp might get renamed just before migration leading to
lookup failures. For such lookup failure, remove-brick process does not
see any increment in failure count. Though there is a warning message
after remove-brick commit for the user to check in the decommissioned brick
for any files those are not migrated, it's better to increase the failure count
so that user can check in the decommissioned bricks for files before commit.
Note: This can result in false negative cases for rm -rf interaction with
remove-brick op, where remove-brick shows non-zero failed count, but the
entry was actually deleted by user.
Fixes :bz#1580269
Change-Id: Icd1047ab9edc1d5bfc231a1f417a7801c424917c
fixes: bz#1580269
Signed-off-by: Susant Palai <spalai@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Corrected the name of the xattr and fixed
the code to log an error only if op_errno
is not ENODATA or ENOATTR.
Change-Id: I42c5b1d838eec586ac7bed2471eb1d27ff09a9ea
fixes: bz#1580238
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test case:
# while true; do uuid="`uuidgen`"; echo "some data" > "test$uuid"; mv
"test$uuid" "test" -f || break; echo "done:$uuid"; done
This script was run in parallel from multiple mountpoints
Along the course of getting the above usecase working, many issues
were found:
Issue 1:
=======
consider a case of rename (src, dst). We can encounter a situation
where,
* dst is a file present at the time of lookup
* dst is removed by the time rename fop reaches glusterfs
In this scenario, acquring inodelk on dst fails with ESTALE resulting
in failure of rename. However, as per POSIX irrespective of whether
dst is present or not, rename should be successful. Acquiring entrylk
provides synchronization even in races like this.
Algorithm:
1. Take inodelks on src and dst (if dst is present) on respective
cached subvols. These inodelks are done to preserve backward
compatibility with older clients, so that synchronization is
preserved when a volume is mounted by clients of different
versions. Once relevant older versions (3.10, 3.12, 3.13) reach
EOL, this code can be removed.
2. Ignore ENOENT/ESTALE errors of inodelk on dst.
3. protect namespace of src and dst. To protect namespace of a file,
take inodelk on parent on hashed subvol, then take entrylk on the
same subvol on parent with basename of file. inodelk on parent is
done to guard against changes to parent layout so that hashed
subvol won't change during rename.
4. <rest of rename continues>
5. unlock all locks
Issue 2:
========
linkfile creation in lookup codepath can race with a rename. Imagine
the following scenario:
* lookup finds a data-file with gfid - gfid-dst - without a
corresponding linkto file on hashed-subvol. It decides to create
linkto file with gfid - gfid-dst.
- Note that some codepaths of dht-rename deletes linkto file of
dst as first step. So, a lookup racing with an in-progress
rename can easily run into this situation.
* a rename (src-path:gfid-src, dst-path:gfid-dst) renames data-file
and hence gfid of data-file changes to gfid-src with path dst-path.
* lookup proceeds and creates linkto file - dst-path - with gfid -
dst-gfid - on hashed-subvol.
* rename tries to create a linkto file dst-path with src-gfid on
hashed-subvol, but it fails with EEXIST. But EEXIST is ignored
during linkto file creation.
Now we've ended with dst-path having different gfids - dst-gfid on
linkto file and src-gfid on data file. Future lookups on dst-path will
always fail with ESTALE, due to differing gfids.
The fix is to synchronize linkfile creation in lookup path with rename
using the same mechanism of protecting namespace explained in solution
of Issue 1. Once locks are acquired, before proceeding with linkfile
creation, we check whether conditions for linkto file creation are
still valid. If not, we skip linkto file creation.
Issue 3:
========
gfid of dst-path can change by the time locks are acquired. This
means, either another rename overwrote dst-path or dst-path was
deleted and recreated by a different client. When this happens,
cached-subvol for dst can change. If rename proceeds with old-gfid and
old-cached subvol, we'll end up in inconsistent state(s) like dst-path
with different gfids on different subvols, more than one data-file
being present etc.
Fix is to do the lookup with a new inode after protecting namespace of
dst. Post lookup, we've to compare gfids and correct local state
appropriately to be in sync with backend.
Issue 4:
========
During revalidate lookup, if following a linkto file doesn't lead to a
valid data-file, local->cached-subvol was not reset to NULL. This
means we would be operating on a stale state which can lead to
inconsistency. As a fix, reset it to NULL before proceeding with
lookup everywhere.
Issue 5:
========
Stale dentries left out in inode table on brick resulted in failures
of link fop even though the file/dentry didn't exist on backend fs. A
patch is submitted to fix this issue. Please check the dependency tree
of current patch on gerrit for details
In short, we fix the problem by not blindly trusting the
inode-table. Instead we validate whether dentry is present by doing
lookup on backend fs.
Change-Id: I832e5c47d232f90c4edb1fafc512bf19bebde165
updates: bz#1543279
BUG: 1543279
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: A directory deletion can happen just before gf_defrag_settle_hash
which internally does a setxattr operation on a directory.
Solution: Ignore ENOENT and ESTALE errors
Fixes: bz#1572581
Change-Id: I2f91809f3b5e02976c4c3a5a596406a8b2f8f6f2
Signed-off-by: Susant Palai <spalai@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The decision as to which node would migrate a file
was based on the gfid of the file. Files were divided
among the nodes for the replica/disperse set. However,
if a brick was down when rebalance started, the nodeuuids
would be saved as NULL and a set of files would not be migrated.
Now, if the nodeuuid is NULL, the first non-null entry in
the set is the node responsible for migrating the file.
Change-Id: I72554c107792c7d534e0f25640654b6f8417d373
fixes: bz#1564198
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Lookup-optimize has been shown to improve create
performance. The code has been in the project for several
years and is considered stable.
Enabling this by default in order to test this in the
upstream regression runs.
Change-Id: Iab792979ee34f0af4713931e0b5b399c23f65313
updates: bz#1557435
BUG: 1557435
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
| |
ENOSPC returned by a file migration is no longer
considered a rebalance failure.
Change-Id: I21cf3a8acdc827bc478e138d6cb5db649d53a28c
fixes: bz#1553598
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
| |
For skipped files, use a return value of 1 to prevent
error messages being logged.
Change-Id: I18de31ac1a64d4460e88dea7826c3ba03c895861
BUG: 1553598
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
| |
dht_migrate_file no longer prints an error if getxattr for
posix acls fails with ENODATA/ENOATTR.
Change-Id: Id9ecf6852cb5294c1c154b28d609889ea3420e1c
BUG: 1546954
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
| |
Replaced "then" with "than"
Change-Id: I73090e8c1a639befd7c5458e8d63bd173248bc7d
BUG: 1547128
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|