| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After add-brick and rebalance, the ctime xattr is not present
on rebalanced directories on new brick. This patch fixes the
same.
Note that ctime still doesn't support consistent time across
distribute sub-volume.
This patch also fixes the in-memory inconsistency of time attributes
when metadata is self healed.
Backport of:
> Patch: https://review.gluster.org/23127/
> Change-Id: Ia20506f1839021bf61d4753191e7dc34b31bb2df
> BUG: 1734026
> Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit 304640e55c0f3c6d15f4e230dc6376e4f5020fea)
Change-Id: Ia20506f1839021bf61d4753191e7dc34b31bb2df
Signed-off-by: Kotresh HR <khiremat@redhat.com>
fixes: bz#1752429
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were not passing xattr_req when doing a name self heal
as well as a meta data heal. Because of this, some xdata
was missing which causes i/o errors
Backport of > https://review.gluster.org/#/c/glusterfs/+/23024/
>Change-Id: Ibfb1205a7eb0195632dc3820116ffbbb8043545f
>Fixes: bz#1728770
>Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Fixes: bz#1749305
Change-Id: Ibfb1205a7eb0195632dc3820116ffbbb8043545f
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
(cherry picked from commit d026f0bcfd301712e4f0671ccf238f43f2e6dd30)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
get_real_filename is implemented as a virtual extended attribute to help
Samba implement the case-insensitive but case preserving SMB protocol
more efficiently. It is implemented as a getxattr call on the parent directory
with the virtual key of "get_real_filename:<entryname>" by looking for a
spelling with different case for the provided file/dir name (<entryname>)
and returning this correct spelling as a result if the entry is found.
Originally (05aaec645a6262d431486eb5ac7cd702646cfcfb), the
implementation used the ENOENT errno to return the authoritative answer
that <entryname> does not exist in any case folding.
Now this implementation is actually a violation or misuse of the defined
API for the getxattr call which returns ENOENT for the case that the dir
that the call is made against does not exist and ENOATTR (or the synonym
ENODATA) for the case that the xattr does not exist.
This was not a problem until the gluster fuse-bridge was changed
to do map ENOENT to ESTALE in 59629f1da9dca670d5dcc6425f7f89b3e96b46bf,
after which we the getxattr call for get_real_filename returned an
ESTALE instead of ENOENT breaking the expectation in Samba.
It is an independent problem that ESTALE should not leak out to user
space but is intended to trigger retries between fuse and gluster.
But nevertheless, the semantics seem to be incorrect here and should
be changed.
This patch changes the implementation of the get_real_filename virtual
xattr to correctly return ENOATTR instead of ENOENT if the file/directory
being looked up is not found.
The Samba glusterfs_fuse vfs module which takes advantage of the
get_real_filename over a fuse mount will receive a corresponding change
to map ENOATTR to ENOENT. Without this change, it will still work
correctly, but the performance optimization for nonexisting files is
lost. On the other hand side, this change removes the distinction
between the old not-implemented case and the implemented case.
So Samba changed to treat ENOATTR like ENOENT will not work correctly
any more against old servers that don't implement get_real_filename.
I.e. existing files will be reported as non-existing
Change-Id: I971b427ab8410636d5d201157d9af70e0d075b67
fixes: bz#1745914
Signed-off-by: Michael Adam <obnox@samba.org>
(cherry picked from commit dc1b87fcfef08c9497b0c02b2410c9d18bbc2dba)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
...whenever shd is re-enabled after disabling or there is a change in
`cluster.heal-timeout`, without needing to restart shd or waiting for the
current `cluster.heal-timeout` seconds to expire.
See BZ 1743988 for more details.
Change-Id: Ia5ebd7c8e9f5b54cba3199c141fdd1af2f9b9bfe
fixes: bz#1747301
Reported-by: Glen Kiessling <glenk1973@hotmail.com>
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
(cherry picked from commit 600ba94183333c4af9b4a09616690994fd528478)
|
|
|
|
|
|
|
|
|
|
| |
When discard/truncate performs write fop, it should do so
after updating lock->good_mask to make sure readv happens
on the correct mask
fixes: bz#1739424
Change-Id: Idfef0bbcca8860d53707094722e6ba3f81c583b7
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
when a file needs to be re-opened O_APPEND and O_EXCL
flags are not filtered in EC.
- O_APPEND should be filtered because EC doesn't send O_APPEND below EC for
open to make sure writes happen on the individual fragments instead of at the
end of the file.
- O_EXCL should be filtered because shd could have created the file so even
when file exists open should succeed
- O_CREAT should be filtered because open happens with gfid as parameter. So
open fop will create just the gfid which will lead to problems.
Fix:
Filter out these two flags in reopen.
Change-Id: Ia280470fcb5188a09caa07bf665a2a94bce23bc4
Fixes: bz#1739426
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
There are cases where fop->mask may have fop->healing added
and readv shouldn't be wound on fop->healing. To avoid this
always wind readv to lock->good_mask
updates: bz#1739424
Change-Id: I2226ef0229daf5ff315d51e868b980ee48060b87
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
EC doesn't allow concurrent writes on overlapping areas, they are
serialized. However non-overlapping writes are serviced in parallel.
When a write is not aligned, EC first needs to read the entire chunk
from disk, apply the modified fragment and write it again.
The problem appears on sparse files because a write to an offset
implicitly creates data on offsets below it (so, in some way, they
are overlapping). For example, if a file is empty and we read 10 bytes
from offset 10, read() will return 0 bytes. Now, if we write one byte
at offset 1M and retry the same read, the system call will return 10
bytes (all containing 0's).
So if we have two writes, the first one at offset 10 and the second one
at offset 1M, EC will send both in parallel because they do not overlap.
However, the first one will try to read missing data from the first chunk
(i.e. offsets 0 to 9) to recombine the entire chunk and do the final write.
This read will happen in parallel with the write to 1M. What could happen
is that half of the bricks process the write before the read, and the
half do the read before the write. Some bricks will return 10 bytes of
data while the otherw will return 0 bytes (because the file on the brick
has not been expanded yet).
When EC tries to recombine the answers from the bricks, it can't, because
it needs more than half consistent answers to recover the data. So this
read fails with EIO error. This error is propagated to the parent write,
which is aborted and EIO is returned to the application.
The issue happened because EC assumed that a write to a given offset
implies that offsets below it exist.
This fix prevents the read of the chunk from bricks if the current size
of the file is smaller than the read chunk offset. This size is
correctly tracked, so this fixes the issue.
Also modifying ec-stripe.t file for Test #13 within it.
In this patch, if a file size is less than the offset we are writing, we
fill zeros in head and tail and do not consider it strip cache miss.
That actually make sense as we know what data that part holds and there is
no need of reading it from bricks.
Change-Id: Ic342e8c35c555b8534109e9314c9a0710b6225d6
Fixes: bz#1739427
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
|
|
|
|
|
|
|
|
|
| |
If lock has info, fop should inherit healing mask from it.
Otherwise, fop cannot inherit right healing when changed_flags is zero.
Change-Id: Ife80c9169d2c555024347a20300b0583f7e8a87f
updates: bz#1739424
Signed-off-by: Kinglong Mee <mijinlong@horiscale.com>
|
|
|
|
|
|
|
| |
Fixes: bz#1741041
Change-Id: I29e338bac62104233a6f80212df8d0fb016affda
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
(cherry picked from commit 8e9c53ebf16705b9a1db2fc486dc24a5cb244ddd)
|
|
|
|
|
|
|
|
|
| |
Fixed a memleak in dht_rename_cbk when creating
a linkto file.
Change-Id: I705adef3cb79e33806520fc2b15558e90e2c211c
fixes: bz#1739337
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
In case of thin arbiter, before index healer starts crawling the
indices at every heal-timeout interval, even if there is nothing to
be healed it will send an upcall notification to all the clients to
release any AFR_TA_DOM_NOTIFY locks that they hold. SHD will wait
for the upcall to return before proceeding with the heal even though
there is nothing to be healed. This will also invalidates the cached
information about the bricks states on the clients which leads to
extra calls on TA from clients for the next reads & writes if needed.
This will impact the IO performance.
Fix:
- Before sending the upcall to the clients, check for any pending heals
on TA without taking any locks.
- If there is nothing marked bad on TA, then continue with the index
crawl to heal any dirty markings present on the files due to any post-op
failure.
- If there is a brick marked as bad on TA, then take the
AFR_TA_DOM_NOTIFY lock on TA from SHD, get the state on TA and
continue with the current healing process.
Change-Id: Ieb477bc6cb18bbdfd4e7a0453c5ed79b574ec9d6
fixes: bz#1729483
Signed-off-by: karthik-us <ksubrahm@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problems:
1. When checking for type and gfid mismatch, if the type or gfid
is unknown because of missing gfid handle and the gfid xattr
it will be reported as type or gfid mismatch and the heal will
not complete.
2. If the source selected during entry heal has null gfid the same
will be sent to afr_lookup_and_heal_gfid(). In this function when
we try to assign the gfid on the bricks where it does not exist,
we are considering the same gfid and try to assign that on those
bricks. This will fail in posix_gfid_set() since the gfid sent
is null.
Fix:
If the gfid sent to afr_lookup_and_heal_gfid() is null choose a
valid gfid before proceeding to assign the gfid on the bricks
where it is missing.
In afr_selfheal_detect_gfid_and_type_mismatch(), do not report
type/gfid mismatch if the type/gfid is unknown or not set.
Change-Id: Ia06552e4dc4a9f89cb7f5302833604bd21bbf7da
fixes: bz#1729481
Signed-off-by: karthik-us <ksubrahm@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Race:
Thread-1 Thread-2
1) Does ec_get_size_version() to perform
pre-op fxattrop as part of write-1
2) Calls ec_set_dirty_flag() in
ec_get_size_version() for write-2.
This sets dirty[] to 1
3) Completes executing
ec_prepare_update_cbk leading to
ctx->dirty[] = '1'
4) Takes LOCK(inode->lock) to check if there are
any flags and sets dirty-flag because
lock->waiting_flag is 0 now. This leads to
fxattrop to increment on-disk dirty[] to '2'
At the end of the writes the file will be marked for heal even when it doesn't need heal.
Fix:
Perform ec_set_dirty_flag() and other checks inside LOCK() to prevent dirty[] to be marked
as '1' in step 2) above
Updates bz#1593224
Change-Id: Icac2ab39c0b1e7e154387800fbededc561612865
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
A name-less lookup does not contain parent's stat,
It is hard to check the lookuped file is at the right path.
This patch changes to a name lookup, and check file's gfid with
expected gfid. If the gfid is different, mark it estale.
fixes: bz#1702131
Change-Id: I2de20b10d680eed1e2fb1d3830b3b3dec4520dbf
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Network latency is an important factor selecting a read subvolume.
So this patch is adding two new policy.
1) We measure the latency of a child during a GF_DUMP rpc call.
Then use this latency to pick a read subvol having the least
latency.
2) Second one is an hybrid mode where it calculates the effective
latency by multiplying outstanding pending read request and
latency, and choose the least one.
Change-Id: Ia49c8a08ab61f7dcdad8b8950aa4d338e7accf97
fixes: #520
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Some internal DHT xattrs were not being
removed when calling getxattr in pass-through mode.
This has been fixed.
Change-Id: If7e3dbc7b495db88a566bd560888e3e9c167defa
fixes: bz#1721435
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
| |
We should free the mem_pool local_pool during an afr_fini.
Otherwise this will lead to mem leak for shd
Change-Id: I805a34a88077bf7b886c28b403798bf9eeeb1c0b
Updates: bz#1716695
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
dht-common.c: because there was a 'goto err' before assigning
the 'local' variable, there is possibility of NULL
dereference. As the check which was done wouldn't
ever be true, removed the check.
glusterd-geo-rep.c: a possible path where 'slave_host' could be
NULL when it gets passed to strcmp() is found.
strcmp() expects a valid string. Add a NULL check.
Updates: bz#1622665
Change-Id: I64c280bc1beac9a2b109e8fa88f2a5ce8b823c3a
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are many include statements that are not needed.
A previous more ambitious attempt failed because of *BSD plafrom
(see https://review.gluster.org/#/c/glusterfs/+/21929/ )
Now trying a more conservative reduction.
It does not solve all circular deps that we have, but it
does reduce some of them. There is just too much to handle
reasonably (dht-common.h includes dht-lock.h which includes
dht-common.h ...), but it does reduce the overall number of lines
of include we need to look at in the future to understand and fix
the mess later one.
Change-Id: I550cd001bdefb8be0fe67632f783c0ef6bee3f9f
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
We currently don't have a roll-back/undoing of post-ops if quorum is not met.
Though the FOP is still unwound with failure, the xattrs remain on the disk.
Due to these partial post-ops and partial heals (healing only when 2 bricks
are up), we can end up in metadata split-brain purely from the afr xattrs
point of view i.e each brick is blamed by atleast one of the others for
metadata. These scenarios are hit when there is frequent connect/disconnect
of the client/shd to the bricks.
Fix:
Pick a source based on the xattr values. If 2 bricks blame one, the blamed
one must be treated as sink. If there is no majority, all are sources. Once
we pick a source, self-heal will then do the heal instead of erroring out
due to split-brain.
This patch also adds restriction of all the bricks to be up to perform
metadata heal to avoid any metadata loss.
Removed the test case tests/bugs/replicate/bug-1468279-source-not-blaming-sinks.t
as it was doing metadata heal even when only 2 of 3 bricks were up.
Change-Id: I07a9d62f84ceda329dcab1f02a33aeed258dcb09
fixes: bz#1717819
Signed-off-by: karthik-us <ksubrahm@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
While we process a cleanup, there is a chance for a race between
async operations, for example ec_launch_replace_heal. So this can
lead to invalid mem access.
Solution:
Just like we track on going heal fops, we can also track fops like
ec_launch_replace_heal, so that we can decide when to send a
PARENT_DOWN request.
Change-Id: I055391c5c6c34d58aef7336847f3b570cb831298
fixes: bz#1703948
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Fixed a bug in the revalidate code path that wiped out
directory permissions if no mds subvol was found.
Change-Id: I8b4239ffee7001493c59d4032a2d3062586ea115
fixes: bz#1716830
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes the following CID's:
* 1124829
* 1274075
* 1274083
* 1274128
* 1274135
* 1274141
* 1274143
* 1274197
* 1274205
* 1274210
* 1274211
* 1288801
* 1398629
Change-Id: Ia7c86cfab3245b20777ffa296e1a59748040f558
Updates: bz#789278
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
EC was ignoring lock contention notifications received while a lock was
being acquired. When a lock is partially acquired (some bricks have
granted the lock but some others not yet) we can receive notifications
from acquired bricks, which should be honored, since we may not receive
more notifications after that.
Since EC was ignoring them, once the lock was acquired, it was not
released until the eager-lock timeout, causing unnecessary delays on
other clients.
This fix takes into consideration the notifications received before
having completed the full lock acquisition. After that, the lock will
be releaed as soon as possible.
Fixes: bz#1708156
Change-Id: I2a306dbdb29fb557dcab7788a258bd75d826cc12
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A rebalance process currently only looks up files
that it is supposed to migrate. This could cause issues
when lookup-optimize is enabled as the dir layout can be
updated with the commit hash before all files are looked up.
This is expecially problematic of one of the rebalance processes
fails to complete as clients will try to access files whose
linkto files might not have been created.
Each process will now lookup every file in the directory it is
processing.
Pros: Less likely that files will be inaccessible.
Cons: More lookup requests sent to the bricks and a potential
performance hit.
Note: this does not handle races such as when a layout is updated on disk
just as the create fop is sent by the client.
Change-Id: I22b55846effc08d3b827c3af9335229335f67fb8
fixes: bz#1711764
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
During a graph cleanup, we first sent a PARENT_DOWN and wait for
a child down to ultimately free the xlator and the graph.
In the ec xlator, we cleanup the threads when we get a PARENT_DOWN event.
But a racing event like CHILD_UP or event xl_op may trigger healing threads
after threads cleanup.
So there is a chance that the threads might access a freed private variabe
Change-Id: I252d10181bb67b95900c903d479de707a8489532
fixes: bz#1703948
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
In function "afr_selfheal_entry_granular", after completing the
heal we are not destroying the frame. This will lead to crash.
when we execute statedump operation, where it tried to access
xlator object. If this xlator object is freed as part of the
graph destroy this will lead to an invalid memory access
Change-Id: I0a5e78e704ef257c3ac0087eab2c310e78fbe36d
fixes: bz#1708926
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
We were not properly cleaning self-heal daemon resources
during ec fini. With shd multiplexing, it is absolutely
necessary to cleanup all the resources during ec fini.
Change-Id: Iae4f1bce7d8c2e1da51ac568700a51088f3cc7f2
fixes: bz#1703948
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- pass fop state instead of afr local to
afr_ta_dom_lock_check_and_release()
- avoid afr_lock_release_synctask() being called simultaneosuly from
notify code path and transaction (post-op) code path due to races.
- Check if the post-op on TA is valid based on event_gen checks.
- Invalidate in-memory information when we get TA child down.
Note: Thi patch addresses some pending review comments of commit
053b1309dc8fbc05fcde5223e734da9f694cf5cc
(https://review.gluster.org/#/c/glusterfs/+/20095/)
fixes: bz#1698449
Change-Id: I2ccd7e1b53362f9f3fed8680aecb23b5011eb18c
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I was working on a blog about troubleshooting AFR issues and I wanted to copy
the messages logged by self-heal for my blog. I then realized that AFR-v2 is not
logging *before* attempting data heal while it logs it for metadata and entry
heals.
I [MSGID: 108026] [afr-self-heal-entry.c:883:afr_selfheal_entry_do]
0-testvol-replicate-0: performing entry selfheal on
d120c0cf-6e87-454b-965b-0d83a4c752bb
I [MSGID: 108026] [afr-self-heal-common.c:1741:afr_log_selfheal]
0-testvol-replicate-0: Completed entry selfheal on
d120c0cf-6e87-454b-965b-0d83a4c752bb. sources=[0] 2 sinks=1
I [MSGID: 108026] [afr-self-heal-common.c:1741:afr_log_selfheal]
0-testvol-replicate-0: Completed data selfheal on
a9b5f183-21eb-4fb3-a342-287d3a7dddc5. sources=[0] 2 sinks=1
I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
0-testvol-replicate-0: performing metadata selfheal on
a9b5f183-21eb-4fb3-a342-287d3a7dddc5
I [MSGID: 108026] [afr-self-heal-common.c:1741:afr_log_selfheal]
0-testvol-replicate-0: Completed metadata selfheal on
a9b5f183-21eb-4fb3-a342-287d3a7dddc5. sources=[0] 2 sinks=1
Adding it in this patch. Now there is a 'performing' and a corresponding
'Completed' message for every type of heal.
fixes: bz#1707746
Change-Id: I0b954cf1e17b48280aefa76640b5119b92133d61
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: If any custom xattrs are set on the directory before
add a brick, xattrs are not healed on the directory
after adding a brick.
Solution: xattr are not healed because dht_selfheal_dir_mkdir_lookup_cbk
checks the value of MDS and if MDS value is not negative
selfheal code path does not take reference of MDS xattrs.Change the
condition to take reference of MDS xattr so that custom xattrs are
populated on newly added brick
Updates: bz#1702299
Change-Id: Id14beedb98cce6928055f294e1594b22132e811c
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Fixed coverity error, "Unchecked return value (CHECKED_RETURN)".
Checking return value & logging error message if afr_set_pending_dict
fails.
updates: bz#789278
Change-Id: Iab7da6b4f3cd0622b95b8e1c412b007a330467e5
Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Right now, the timeout is written by hard code,
fix it by using heal-timeout.
fixes: bz#1703020
Change-Id: I0d154e7807f9dba7efc3896805559bbfaa7af2ad
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Doing re-open with O_TRUNC will truncate the fragment even when it is not
needed needing extra heals
Fix:
At the time of re-open don't use O_TRUNC.
fixes bz#1706603
Change-Id: Idc6408968efaad897b95a5a52481c66e843d3fb8
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Part 2: Modify dht_revalidate_cbk to call
dht_selfheal_directory instead of separate calls
to heal attrs and xattrs.
Change-Id: Id41ac6c4220c2c35484812bbfc6157fc3c86b142
updates: bz#1590385
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently EC tries to reopen fd's that have been opened while a brick
was down. This is done as part of regular write operations, just after
having acquired the locks, and it's sent as a sub-fop of the main write
fop.
There were two problems:
1. The reopen was attempted on all UP bricks, even if a previous lock
didn't succeed. This is incorrect because most probably the open will
fail.
2. If reopen is sent and fails, the error is propagated to the main
operation, causing it to fail when it shouldn't.
To fix this, we only attempt reopens on bricks where the current fop
owns a lock, and we prevent any error to be propagated to the main
fop.
To implement this behaviour an argument used to indicate the minimum
number of required answers has overloaded to also include some flags. To
make the change consistent, it has been necessary to rename the
argument, which means that a lot of files have been changed. However
there are no functional changes.
This change has also uncovered a problem in discard code, which didn't
correctely process requests of small sizes because no real discard fop
was being processed, only a write of 0's on some region. In this case
some fields of the fop remained uninitialized or with incorrect values.
To fix this, a new function has been created to simulate success on a
fop and it's used in the discard case.
Thanks to Pranith for providing a test script that has also detected an
issue in this patch. This patch includes a small modification of this
script to force data to be written into bricks before stopping them.
Change-Id: If272343873369186c2fb8f43c1d9c52c3ea304ec
Fixes: bz#1699866
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
|
|
|
|
|
|
| |
Updates: bz#1624701
Change-Id: I7152c28ad85925abccdcc4cd6de8cb2a2b847a51
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When eager-lock lock acquisition fails because of say network failures, the
local is not being removed from owners_list, this leads to accumulation of
waiting frames and the application will hang because the waiting frames are
under the assumption that another transaction is in the process of acquiring
lock because owner-list is not empty. Handled this case as well in this patch.
Added asserts to make it easier to find these problems in future.
fixes bz#1696599
Change-Id: I3101393265e9827755725b1f2d94a93d8709e923
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
| |
memdup() and gf_memdup() have the same implementation. Removed one API
as the presence of both can be confusing.
Change-Id: I562130c668457e13e4288e592792872d2e49887e
updates: bz#1193929
Signed-off-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
ec_truncate_clean does writing under the lock granted for truncate,
but the lock is calculated by ec_adjust_offset_up, so that,
the write in ec_truncate_clean is out of lock.
Updates: bz#1699189
Change-Id: Idbe1fd48d26afe49c36b77db9f12e0907f5a4134
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
|
|
|
|
|
|
|
|
|
| |
This patch address post-merge review comments for commit
5784a00f997212d34bd52b2303e20c097240d91c
Change-Id: I7ed954664a2ae8e1091d23ee3ceb9c66e83bfeac
fixes: bz#1697930
Signed-off-by: karthik-us <ksubrahm@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Part 1: refactor the dht_lookup_dir_cbk
and dht_selfheal_directory functions.
Added a simple dht selfheal directory test
Change-Id: I1410c26359e3c14b396adbe751937a52bd2fcff9
updates: bz#1590385
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
When split-brain choice is changed from one brick to another
brick, inode-invalidate is not called so readv call is served
from cache leading to failures in split-brain-resolution.t.
Fixed it by calling inode_invaldate() when this happens.
updates bz#1193929
Change-Id: I2624614eec38c0303f3e1dc55dfae3d4b864218b
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we use heal info command, it takes lot of time as in
some cases it takes lock on entries to find out if the
entry actually needs heal or not.
There are some cases where we can avoid these locks and
can conclude if the entry needs heal or not.
1 - We do a lookup (without lock) on an entry, which we found in
.glusterfs/indices/xattrop, and find that lock count is
zero. Now if the file contains dirty bit set on all or any
brick, we can say that this entry needs heal.
2 - If the lock count is one and dirty is greater than 1,
then it also means that some fop had left the dirty bit set
which made the dirty count of current fop (which has taken lock)
more than one. At this point also we can definitely say that
this entry needs heal.
This patch is modifying code to take into consideration above two
points.
It is also changing code to not to call ec_heal_inspect if ec_heal_do
was called from client side heal. Client side heal triggeres heal
only when it is sure that it requires heal.
[We have changed the code to not to call heal for lookup]
updates bz#1689799
Change-Id: I7f09f0ecd12f65a353297aefd57026fd2bebdf9c
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Found missing assignment of lk-owner for an inodelk/entrylk before winding
the fops. locks xlator at the moment allows this operation. This leads to
multiple threads in the same client being able to get locks on the inode
because lk-owner is same and transport is same. So isolation with locks can't
be achieved. To fix it, we need locks xlator change which will disallow
null-lk-owner based inodelk/entrylk/lk. To achieve that we need to first
fix all the places which do this mistake.
updates bz#1624701
Change-Id: Ic3431da3f451a1414f1f4fdcfc4cf41e555f69dd
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Fixes afr_ta_read_txn() to handle inode refresh failures.
code-path.
- Fixes a double free issue of dict.
Note: This patch address post-merge review comments for commit
69532c141be160b3fea03c1579ae4ac13018dcdf
fixes: bz#1686398
Change-Id: Id5299b45b68569d47df6b73755918237a1592cb4
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
1 - heal-wait-qlength is by default 128. If shd is disabled
and we need to heal files, client side heal is needed.
If we access these files that will trigger the heal.
However, it has been observed that a file will be enqueued
multiple times in the heal wait queue, which in turn causes
queue to be filled and prevent other files to be enqueued.
2 - While a file is going through healing and a write fop from
mount comes on that file, it sends write on all the bricks including
healing one. At the end it updates version and size on all the
bricks. However, it does not unset dirty flag on all the bricks,
even if this write fop was successful on all the bricks.
After healing completion this dirty flag remain set and never
gets cleaned up if SHD is disabled.
Solution:
1 - If an entry is already in queue or going through heal process,
don't enqueue next client side request to heal the same file.
2 - Unset dirty on all the bricks at the end if fop has succeeded on
all the bricks even if some of the bricks are going through heal.
Change-Id: Ia61ffe230c6502ce6cb934425d55e2f40dd1a727
updates: bz#1593224
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
|
|
|
|
|
|
|
|
|
| |
client-pid for glustershd is GF_CLIENT_PID_SELF_HEALD
client-pid for glfsheal is GF_CLIENT_PID_GLFS_HEALD
updates: bz#1689250
Change-Id: Ib3a863af160ff48c822a5e6b0c27c575c9887470
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
| |
updates bz#1193929
Change-Id: I01b60d644f517c00a1bcc127bf9a8ed90b6eb7a0
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|