| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For nameless LOOKUPs, server creates a new inode which shall
remain invalid until the fop is successfully processed post
which it is linked to the inode table.
But incase if there is an already linked inode for that entry,
it discards that newly created inode which results in upcall
notification. This may result in client being bombarded with
unnecessary upcalls affecting performance if the data set is huge.
This issue can be avoided by looking up and storing the upcall
context in the original linked inode (if exists), thus saving up on
those extra callbacks.
Change-Id: I044a1737819bb40d1a049d2f53c0566e746d2a17
fixes: bz#1718338
Signed-off-by: Soumya Koduri <skoduri@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch cleans some iovec code and creates two additional helper
functions to simplify management of iovec structures.
iov_range_copy(struct iovec *dst, uint32_t dst_count, uint32_t dst_offset,
struct iovec *src, uint32_t src_count, uint32_t src_offset,
uint32_t size);
This function copies up to 'size' bytes from 'src' at offset
'src_offset' to 'dst' at 'dst_offset'. It returns the number of
bytes copied.
iov_skip(struct iovec *iovec, uint32_t count, uint32_t size);
This function removes the initial 'size' bytes from 'iovec' and
returns the updated number of iovec vectors remaining.
The signature of iov_subset() has also been modified to make it safer
and easier to use. The new signature is:
iov_subset(struct iovec *src, int src_count, uint32_t start, uint32_t size,
struct iovec **dst, int32_t dst_count);
This function creates a new iovec array containing the subset of the
'src' vector starting at 'start' with size 'size'. The resulting
array is allocated if '*dst' is NULL, or copied to '*dst' if it fits
(based on 'dst_count'). It returns the number of iovec vectors used.
A new set of functions to iterate through an iovec array have been
created. They can be used to simplify the implementation of other
iovec-based helper functions.
Change-Id: Ia5fe57e388e23392a8d6cdab17670e337cadd587
Updates: bz#1193929
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
We currently don't have a roll-back/undoing of post-ops if quorum is not met.
Though the FOP is still unwound with failure, the xattrs remain on the disk.
Due to these partial post-ops and partial heals (healing only when 2 bricks
are up), we can end up in metadata split-brain purely from the afr xattrs
point of view i.e each brick is blamed by atleast one of the others for
metadata. These scenarios are hit when there is frequent connect/disconnect
of the client/shd to the bricks.
Fix:
Pick a source based on the xattr values. If 2 bricks blame one, the blamed
one must be treated as sink. If there is no majority, all are sources. Once
we pick a source, self-heal will then do the heal instead of erroring out
due to split-brain.
This patch also adds restriction of all the bricks to be up to perform
metadata heal to avoid any metadata loss.
Removed the test case tests/bugs/replicate/bug-1468279-source-not-blaming-sinks.t
as it was doing metadata heal even when only 2 of 3 bricks were up.
Change-Id: I07a9d62f84ceda329dcab1f02a33aeed258dcb09
fixes: bz#1717819
Signed-off-by: karthik-us <ksubrahm@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Long tale of double unref! But do read...
In cases where a shard base inode is evicted from lru list while still
being part of fsync list but added back soon before its unlink, there
could be an extra inode_unref() leading to premature inode destruction
leading to crash.
One such specific case is the following -
Consider features.shard-deletion-rate = features.shard-lru-limit = 2.
This is an oversimplified example but explains the problem clearly.
First, a file is FALLOCATE'd to a size so that number of shards under
/.shard = 3 > lru-limit.
Shards 1, 2 and 3 need to be resolved. 1 and 2 are resolved first.
Resultant lru list:
1 -----> 2
refs on base inode - (1) + (1) = 2
3 needs to be resolved. So 1 is lru'd out. Resultant lru list -
2 -----> 3
refs on base inode - (1) + (1) = 2
Note that 1 is inode_unlink()d but not destroyed because there are
non-zero refs on it since it is still participating in this ongoing
FALLOCATE operation.
FALLOCATE is sent on all participant shards. In the cbk, all of them are
added to fync_list.
Resulting fsync list -
1 -----> 2 -----> 3 (order doesn't matter)
refs on base inode - (1) + (1) + (1) = 3
Total refs = 3 + 2 = 5
Now an attempt is made to unlink this file. Background deletion is triggered.
The first $shard-deletion-rate shards need to be unlinked in the first batch.
So shards 1 and 2 need to be resolved. inode_resolve fails on 1 but succeeds
on 2 and so it's moved to tail of list.
lru list now -
3 -----> 2
No change in refs.
shard 1 is looked up. In lookup_cbk, it's linked and added back to lru list
at the cost of evicting shard 3.
lru list now -
2 -----> 1
refs on base inode: (1) + (1) = 2
fsync list now -
1 -----> 2 (again order doesn't matter)
refs on base inode - (1) + (1) = 2
Total refs = 2 + 2 = 4
After eviction, it is found 3 needs fsync. So fsync is wound, yet to be ack'd.
So it is still inode_link()d.
Now deletion of shards 1 and 2 completes. lru list is empty. Base inode unref'd and
destroyed.
In the next batched deletion, 3 needs to be deleted. It is inode_resolve()able.
It is added back to lru list but base inode passed to __shard_update_shards_inode_list()
is NULL since the inode is destroyed. But its ctx->inode still contains base inode ptr
from first addition to lru list for no additional ref on it.
lru list now -
3
refs on base inode - (0)
Total refs on base inode = 0
Unlink is sent on 3. It completes. Now since the ctx contains ptr to base_inode and the
shard is part of lru list, base shard is unref'd leading to a crash.
FIX:
When shard is readded back to lru list, copy the base inode pointer as is into its inode ctx,
even if it is NULL. This is needed to prevent double unrefs at the time of deleting it.
Change-Id: I99a44039da2e10a1aad183e84f644d63ca552462
Updates: bz#1696136
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
While we process a cleanup, there is a chance for a race between
async operations, for example ec_launch_replace_heal. So this can
lead to invalid mem access.
Solution:
Just like we track on going heal fops, we can also track fops like
ec_launch_replace_heal, so that we can decide when to send a
PARENT_DOWN request.
Change-Id: I055391c5c6c34d58aef7336847f3b570cb831298
fixes: bz#1703948
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
|
|
| |
* Also some logging enhancements in snapview-server
Change-Id: I6a7646771cedf4bd1c62806eea69d720bbaf0c83
fixes: bz#1715921
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
commit 146e4b45d0ce906ae50fd6941a1efafd133897ea enabled
storage.fips-mode-rchecksum option for all new volumes with op-version
>=GD_OP_VERSION_7_0 but `gluster vol get $volname
storage.fips-mode-rchecksum` was displaying it as 'off'. This patch fixes it.
fixes: bz#1717782
Change-Id: Ie09f89838893c5776a3f60569dfe8d409d1494dd
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
| |
Change-Id: I8cf0a153f84ef2d162e6dd03261441d211c07d40
updates: bz#1193929
Signed-off-by: Anoop C S <anoopcs@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Fixed a bug in the revalidate code path that wiped out
directory permissions if no mds subvol was found.
Change-Id: I8b4239ffee7001493c59d4032a2d3062586ea115
fixes: bz#1716830
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
| |
All these checks are done after analyzing clang-scan report produced
by the CI job @ https://build.gluster.org/job/clang-scan
updates: bz#1622665
Change-Id: I590305af4ceb779be952974b2a36066ffc4865ca
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The way delta_blocks is computed in shard is incorrect, when a file
is truncated to a lower size. The accounting only considers change
in size of the last of the truncated shards.
FIX:
Get the block-count of each shard just before an unlink at posix in
xdata. Their summation plus the change in size of last shard
(from an actual truncate) is used to compute delta_blocks which is
used in the xattrop for size update.
Change-Id: I9128a192e9bf8c3c3a959e96b7400879d03d7c53
fixes: bz#1705884
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
1401716: Resource leak
1401714: Dereference before null check
updates: bz#789278
Change-Id: I8fb0b143a1d4b37ee6be7d880d9b5b84ba00bf36
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: While tier code was removed, the is_tier_enabled
related to tier wasn't handled for upgrade.
As this option was missing in the info file, the checksum
mismatch issue happens during upgrade.
This results in the peer rejections happening.
Fix: use the op_version check and note down the is_tier_enabled
always. This way it will be dummy key, but the future upgrades
will work fine.
NOTE: Just having the key from 3.10 to 7 will cause issues when
upgraded from 5 to 8 or any such upgrade which skips the version
where we handle it.
Change-Id: I9951e2b74f16e58e884e746c34dcf53e559c7143
fixes: bz#1714973
Signed-off-by: hari gowtham <hgowtham@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
upcall: remove extra variable assignment and use just one
initialization.
open-behind: reduce the overall number of lines, in functions
not frequently called
selinux: reduce some lines in init failure cases
updates: bz#1693692
Change-Id: I7c1de94f2ec76a5bfe1f48a9632879b18e5fbb95
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* locks/posix.c: key was not freed in one of the cases.
* locks/common.c: lock was being free'd out of context.
* nfs/exports: handle case of missing free.
* protocol/client: handle case of entry not freed.
* storage/posix: handle possible case of double free
CID: 1398628, 1400731, 1400732, 1400756, 1124796, 1325526
updates: bz#789278
Change-Id: Ieeaca890288bc4686355f6565f853dc8911344e8
Signed-off-by: Amar Tumballi <amarts@redhat.com>
Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
storage.reserve-size option will take size as input
instead of percentage. If set, priority will be given to
storage.reserve-size over storage.reserve. Default value
of this option is 0.
fixes: bz#1651445
Change-Id: I7a7342c68e436e8bf65bd39c567512ee04abbcea
Signed-off-by: Sheetal Pamecha <sheetal.pamecha08@gmail.com>
|
|
|
|
|
|
|
| |
updates: bz#1193929
Change-Id: Ieb5e35d454498bc389972f9f15fe46b640f1b97d
Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: While high no. of volumes are configured around 2000
glusterd has bottleneck during handshake at the time
of copying dictionary
Solution: To avoid the bottleneck serialize a dictionary instead
of copying key-value pair one by one
Change-Id: I9fb332f432e4f915bc3af8dcab38bed26bda2b9a
fixes: bz#1711297
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Traditionally all svc manager will execute process stop and then
followed by start each time when they called. But that is not
required by shd, because the attach request implemented in the shd
multiplex has the intelligence to check whether a detach is required
prior to attaching the graph. So there is no need to send an explicit
detach request if we are sure that the next call is an attach request
Change-Id: I9157c8dcaffdac038f73286bcf5646a3f1d3d8ec
fixes: bz#1710054
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
While restarting a glusterd process, when we have a stale pid
we were doing a simple kill. Instead we can use glusterd_proc_stop
Because it has more logging plus force kill in case if there is
any problem with kill signal handling.
Change-Id: I4a2dadc210a7a65762dd714e809899510622b7ec
updates: bz#1710054
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
glusterd_svcs_stop should call individual wrapper function to stop a
daemon rather than calling glusterd_svc_stop. For example for shd,
it should call glusterd_shdsvc_stop instead of calling basic API
function to stop. Because the individual functions for each daemon
could be doing some specific operation in their wrapper function.
Change-Id: Ie6d40590251ad470ef3901d1141ab7b22c3498f5
fixes: bz#1712741
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: "gluster v status" is hung in heterogenous cluster
when issued from a non-upgraded node.
Cause: commit 34e010d64 fixes the txn-opinfo mem leak
in op-sm framework by not setting the txn-opinfo if some
conditions are true. When vol status is issued from a
non-upgraded node, command is hanging in its upgraded peer
as the upgraded node setting the txn-opinfo based on new
conditions where as non-upgraded nodes are following diff
conditions.
Fix: Add an op-version check, so that all the nodes follow
same set of conditions to set txn-opinfo.
fixes: bz#1710159
Change-Id: Ie1f353212c5931ddd1b728d2e6949dfe6225c4ab
Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
At the moment new stack doesn't populate frame->root->unique in all cases. This
makes it difficult to debug hung frames by examining successive state dumps.
Fuse and server xlators populate it whenever they can, but other xlators won't
be able to assign 'unique' when they need to create a new frame/stack because
they don't know what 'unique' fuse/server xlators already used. What we need is
for unique to be correct. If a stack with same unique is present in successive
statedumps, that means the same operation is still in progress. This makes
'finding hung frames' part of debugging hung frames easier.
fixes bz#1714098
Change-Id: I3e9a8f6b4111e260106c48a2ac3a41ef29361b9e
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
| |
1401590: Deadcode
updates: bz#789278
Change-Id: I3aa1d3aa9769e6990f74b6a53e288e788173c5e0
Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
|
|
|
|
|
|
|
|
|
| |
After basic analysis, found that these methods were not being
used at all.
updates: bz#1693692
Change-Id: If9cfa1ab189e6e7b56230c4e1d8e11f9694a9a65
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: In commit ac70f66c5805e10b3a1072bd467918730c0aeeb4 I
missed one condition to populate volume dictionary in
multiple threads while brick_multiplex is enabled.Due
to that glusterd is not sending volume dictionary for
all volumes to peer.
Solution: Update the condition in code as well as update test case
also to avoid the issue
Change-Id: I06522dbdfee4f7e995d9cc7b7098fdf35340dc52
fixes: bz#1711250
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The handler functions are pointed to dummy functions.
The switch case handling for tier also have been moved to
point default case to avoid issues, if reintroduced.
The tier changes in DHT still remain as such.
updates: bz#1693692
Change-Id: I80d80c9a3eb862b4440a36b31ae82b2e9d92e4dc
Signed-off-by: Hari Gowtham <hgowtham@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes the following CID's:
* 1124829
* 1274075
* 1274083
* 1274128
* 1274135
* 1274141
* 1274143
* 1274197
* 1274205
* 1274210
* 1274211
* 1288801
* 1398629
Change-Id: Ia7c86cfab3245b20777ffa296e1a59748040f558
Updates: bz#789278
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
EC was ignoring lock contention notifications received while a lock was
being acquired. When a lock is partially acquired (some bricks have
granted the lock but some others not yet) we can receive notifications
from acquired bricks, which should be honored, since we may not receive
more notifications after that.
Since EC was ignoring them, once the lock was acquired, it was not
released until the eager-lock timeout, causing unnecessary delays on
other clients.
This fix takes into consideration the notifications received before
having completed the full lock acquisition. After that, the lock will
be releaed as soon as possible.
Fixes: bz#1708156
Change-Id: I2a306dbdb29fb557dcab7788a258bd75d826cc12
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
|
|
|
|
|
|
|
|
|
| |
CID: 1401345 - Unused value
updates: bz#789278
Change-Id: I6b8f2611151ce0174042384b7632019c312ebae3
Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We only need to calculate and write the checksum in case of
!is_quota_conf .
Align the code in accordance.
Also, use a smaller buffer (to write few chars).
Change-Id: I40c83ce10447df77ff9975d314d768ec2c0087c2
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A rebalance process currently only looks up files
that it is supposed to migrate. This could cause issues
when lookup-optimize is enabled as the dir layout can be
updated with the commit hash before all files are looked up.
This is expecially problematic of one of the rebalance processes
fails to complete as clients will try to access files whose
linkto files might not have been created.
Each process will now lookup every file in the directory it is
processing.
Pros: Less likely that files will be inaccessible.
Cons: More lookup requests sent to the bricks and a potential
performance hit.
Note: this does not handle races such as when a layout is updated on disk
just as the create fop is sent by the client.
Change-Id: I22b55846effc08d3b827c3af9335229335f67fb8
fixes: bz#1711764
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
| |
updates: bz#1712322
Change-Id: I120a1d23506f9ebcf88c7ea2f2eff4978a61cf4a
Signed-off-by: Susant Palai <spalai@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
During a graph cleanup, we first sent a PARENT_DOWN and wait for
a child down to ultimately free the xlator and the graph.
In the ec xlator, we cleanup the threads when we get a PARENT_DOWN event.
But a racing event like CHILD_UP or event xl_op may trigger healing threads
after threads cleanup.
So there is a chance that the threads might access a freed private variabe
Change-Id: I252d10181bb67b95900c903d479de707a8489532
fixes: bz#1703948
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
In function "afr_selfheal_entry_granular", after completing the
heal we are not destroying the frame. This will lead to crash.
when we execute statedump operation, where it tried to access
xlator object. If this xlator object is freed as part of the
graph destroy this will lead to an invalid memory access
Change-Id: I0a5e78e704ef257c3ac0087eab2c310e78fbe36d
fixes: bz#1708926
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Consider the following case -
1. A file gets FALLOCATE'd such that > "shard-lru-limit" number of
shards are created.
2. And then it is deleted after that.
The unique thing about FALLOCATE is that unlike WRITE, all of the
participant shards are resolved and created and fallocated in a single
batch. This means, in this case, after the first "shard-lru-limit"
number of shards are resolved and added to lru list, as part of
resolution of the remaining shards, some of the existing shards in lru
list will need to be evicted. So these evicted shards will be
inode_unlink()d as part of eviction. Now once the fop gets to the actual
FALLOCATE stage, the lru'd-out shards get added to fsync list.
2 things to note at this point:
i. the lru'd out shards are only part of fsync list, so each holds 1 ref
on base shard
ii. and the more recently used shards are part of both fsync and lru list.
So each of these shards holds 2 refs on base inode - one for being
part of fsync list, and the other for being part of lru list.
FALLOCATE completes successfully and then this very file is deleted, and
background shard deletion launched. Here's where the ref counts get mismatched.
First as part of inode_resolve()s during the deletion, the lru'd-out inodes
return NULL, because they are inode_unlink()'d by now. So these inodes need to
be freshly looked up. But as part of linking them in lookup_cbk (precisely in
shard_link_block_inode()), inode_link() returns the lru'd-out inode object.
And its inode ctx is still valid and ctx->base_inode valid from the last
time it was added to list.
But shard_common_lookup_shards_cbk() passes NULL in the place of base_pointer
to __shard_update_shards_inode_list(). This means, as part of adding the lru'd out
inode back to lru list, base inode is not ref'd since its NULL.
Whereas post unlinking this shard, during shard_unlink_block_inode(),
ctx->base_inode is accessible and is unref'd because the shard was found to be part
of LRU list, although the matching ref didn't occur. This at some point leads to
base_inode refcount becoming 0 and it getting destroyed and released back while some
of its associated shards are continuing to be unlinked in parallel and the client crashes
whenever it is accessed next.
Fix is to pass base shard correctly, if available, in shard_link_block_inode().
Also, the patch fixes the ret value check in tests/bugs/shard/shard-fallocate.c
Change-Id: Ibd0bc4c6952367608e10701473cbad3947d7559f
Updates: bz#1696136
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
We were not properly cleaning self-heal daemon resources
during ec fini. With shd multiplexing, it is absolutely
necessary to cleanup all the resources during ec fini.
Change-Id: Iae4f1bce7d8c2e1da51ac568700a51088f3cc7f2
fixes: bz#1703948
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ISSUE: gluster volume stop succeeds even if quorum is not met.
Fix: Add GD_OP_STOP_VOLUME to gluster_validate_quorum in
glusterd_mgmt_v3_pre_validate ().
Since the volume stop command has been ported from synctask to mgmt_v3,
the quorum check was missed out.
Change-Id: I7a634ad89ec2e286ea262d7952061efad5360042
fixes: bz#1690753
Signed-off-by: Vishal Pandey <vpandey@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
At the time of a glusterd restart, while doing a handshake
there is a possibility that multiple shd manager might get
executed. Because of this, there is a chance that multiple
shd get spawned during a glusterd restart
Change-Id: Ie20798441e07d7d7a93b7d38dfb924cea178a920
fixes: bz#1707081
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- pass fop state instead of afr local to
afr_ta_dom_lock_check_and_release()
- avoid afr_lock_release_synctask() being called simultaneosuly from
notify code path and transaction (post-op) code path due to races.
- Check if the post-op on TA is valid based on event_gen checks.
- Invalidate in-memory information when we get TA child down.
Note: Thi patch addresses some pending review comments of commit
053b1309dc8fbc05fcde5223e734da9f694cf5cc
(https://review.gluster.org/#/c/glusterfs/+/20095/)
fixes: bz#1698449
Change-Id: I2ccd7e1b53362f9f3fed8680aecb23b5011eb18c
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
volume get all all | grep <key> & volume get <volname> all | grep <key>
dumps two different output value for cluster.brick-multiplex and
cluster.server-quorum-ratio
Fixes: bz#1707700
Change-Id: Id131734e0502aa514b84768cf67fce3c22364eae
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Similar to https://review.gluster.org/#/c/glusterfs/+/22652/ ,
reduce some of the work by using smaller buffers and less
conversion of parameters when snprintf()'ing them.
On the way, remove some clang warnings, mainly on dead assignment.
Change-Id: Ie51e6d6f14df6b2ccbebba314cf937af08839741
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
|
|
|
|
|
|
|
| |
updates: bz#1193929
Change-Id: Idad745d5869c92e6bed71842f14bc1a3362ca4bd
Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I was working on a blog about troubleshooting AFR issues and I wanted to copy
the messages logged by self-heal for my blog. I then realized that AFR-v2 is not
logging *before* attempting data heal while it logs it for metadata and entry
heals.
I [MSGID: 108026] [afr-self-heal-entry.c:883:afr_selfheal_entry_do]
0-testvol-replicate-0: performing entry selfheal on
d120c0cf-6e87-454b-965b-0d83a4c752bb
I [MSGID: 108026] [afr-self-heal-common.c:1741:afr_log_selfheal]
0-testvol-replicate-0: Completed entry selfheal on
d120c0cf-6e87-454b-965b-0d83a4c752bb. sources=[0] 2 sinks=1
I [MSGID: 108026] [afr-self-heal-common.c:1741:afr_log_selfheal]
0-testvol-replicate-0: Completed data selfheal on
a9b5f183-21eb-4fb3-a342-287d3a7dddc5. sources=[0] 2 sinks=1
I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
0-testvol-replicate-0: performing metadata selfheal on
a9b5f183-21eb-4fb3-a342-287d3a7dddc5
I [MSGID: 108026] [afr-self-heal-common.c:1741:afr_log_selfheal]
0-testvol-replicate-0: Completed metadata selfheal on
a9b5f183-21eb-4fb3-a342-287d3a7dddc5. sources=[0] 2 sinks=1
Adding it in this patch. Now there is a 'performing' and a corresponding
'Completed' message for every type of heal.
fixes: bz#1707746
Change-Id: I0b954cf1e17b48280aefa76640b5119b92133d61
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of saving each key-value separately, which is slow (
especially as we fflush() after each!), store them all as one
string and write all together.
Implements https://github.com/gluster/glusterfs/issues/629
Change-Id: Ie77a272446b0b6785584b710a4fdd9c613dd9578
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat,.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: If any custom xattrs are set on the directory before
add a brick, xattrs are not healed on the directory
after adding a brick.
Solution: xattr are not healed because dht_selfheal_dir_mkdir_lookup_cbk
checks the value of MDS and if MDS value is not negative
selfheal code path does not take reference of MDS xattrs.Change the
condition to take reference of MDS xattr so that custom xattrs are
populated on newly added brick
Updates: bz#1702299
Change-Id: Id14beedb98cce6928055f294e1594b22132e811c
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Fixed coverity error, "Unchecked return value (CHECKED_RETURN)".
Checking return value & logging error message if afr_set_pending_dict
fails.
updates: bz#789278
Change-Id: Iab7da6b4f3cd0622b95b8e1c412b007a330467e5
Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is a race in the way O_DIRECT writes are handled. Assume two
overlapping write requests w1 and w2.
* w1 is issued and is in wb_inode->wip queue as the response is still
pending from bricks. Also wb_request_unref in wb_do_winds is not yet
invoked.
list_for_each_entry_safe (req, tmp, tasks, winds) {
list_del_init (&req->winds);
if (req->op_ret == -1) {
call_unwind_error_keep_stub (req->stub, req->op_ret,
req->op_errno);
} else {
call_resume_keep_stub (req->stub);
}
wb_request_unref (req);
}
* w2 is issued and wb_process_queue is invoked. w2 is not picked up
for winding as w1 is still in wb_inode->wip. w1 is added to todo
list and wb_writev for w2 returns.
* response to w1 is received and invokes wb_request_unref. Assume
wb_request_unref in wb_do_winds (see point 1) is not invoked
yet. Since there is one more refcount, wb_request_unref in
wb_writev_cbk of w1 doesn't remove w1 from wip.
* wb_process_queue is invoked as part of wb_writev_cbk of w1. But, it
fails to wind w2 as w1 is still in wip.
* wb_requet_unref is invoked on w1 as part of wb_do_winds. w1 is
removed from all queues including w1.
* After this point there is no invocation of wb_process_queue unless
new request is issued from application causing w2 to be hung till
the next request.
This bug is similar to bz 1626780 and bz 1379655.
Change-Id: Iaa47437613591699d4c8ad18bc0b32de6affcc31
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Fixes: bz#1705865
|
|
|
|
|
|
|
|
| |
CID: 1382403 (CHECKED_RETURN)
Updates: bz#789278
Change-Id: I4c57b93fd3d14c524ff8519ed876f029834de306
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Right now, the timeout is written by hard code,
fix it by using heal-timeout.
fixes: bz#1703020
Change-Id: I0d154e7807f9dba7efc3896805559bbfaa7af2ad
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
|