| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When there are big directories or files that need to be healed,
other shds are stuck on getting lock on self-heal domain for these
directories/files. If there is a tie-breaker logic, other shds
can heal some other files/directories while 1 of the shds is healing
the big file/directory.
Before this patch:
96.67 4890.64 us 12.89 us 646115887.30us 340869 INODELK
After this patch:
40.76 42.35 us 15.09 us 6546.50us 438478 INODELK
Fixes gluster/glusterfs#354
Change-Id: Ia995b5576b44f770c064090705c78459e543cc64
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
| |
The op-version used for the new option was wrong. It has been set
to 3.13.0.
Change-Id: I88fbd7834e4a8018c8906303e734c251e90be8cf
BUG: 1502610
Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Consider an EC volume with configuration 4 + 2.
The stripe size for this would be 512 * 4 = 2048.
That means, 2048 bytes of user data stored in one
stripe. Let's say 2048 + 512 = 2560 bytes are
already written on this volume. 512 Bytes would
be in second stripe. Now, if there are sequential
writes with offset 2560 and of size 1 Byte, we have
to read the whole stripe, encode it with 1 Byte and
then again have to write it back. Next, write with
offset 2561 and size of 1 Byte will again
READ-MODIFY-WRITE the whole stripe. This is causing
bad performance because of lots of READ request
travelling over the network.
There are some tools and scenario's where such kind
of load is coming and users are not aware of that.
Example: fio and zip
Solution:
One possible solution to deal with this issue is to
keep last stripe in memory. This way, we need not to
read it again and we can save READ fop going over the
network. Considering the above example, we have to
keep last 2048 bytes (maximum) in memory per file.
Change-Id: I3f95e6fc3ff81953646d374c445a40c6886b0b85
BUG: 1471753
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Problem: Return values are not used.
Solution: Removed the unused values.
BUG: 789278
Change-Id: Ica2b95bb659dfc7ec5346de0996b9d2fcd340f8d
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Problem: fop could be NULL.
Solution: Check has been added to verify fop.
BUG: 789278
Change-Id: I7e8d2c1bdd8960c609aa485f180688a95606ebf7
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Problem: Source could be negative.
Solution: Check has been added to verify the same.
BUG: 789278
Change-Id: I8cca6297327e7923b25949c20ec8d1a711207a76
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Problem: xattr could be NULL.
Solution: Added check to verify the same.
BUG: 789278
Change-Id: Ie013f5655f4621434e5023dd76cef44b976adc68
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Coverity ID: 237
Problem: In ec_check_status we are trying to deref fop->answer
which could be NULL.
Solution: Check Null condition before using this pointer.
Change-Id: I4f9a73dc2f062ca9c62b4c4baf0a6fcadade88f2
BUG: 789278
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
|
|
|
|
|
|
|
|
|
| |
A new option is added to allow independent configuration of eager
locking for regular files and non-regular files.
Change-Id: I8f80e46d36d8551011132b15c0fac549b7fb1c60
BUG: 1502610
Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Problem: cbk could be NULL.
Solution: Assigned appropriate value to cbk.
BUG: 789278
Change-Id: I2e4bba9a54f965c6a7bccf0b0cb6c5f75399f6e6
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Problem: switch case syntax issue.
Solution: syntax fixed.
BUG: 789278
Change-Id: I76da72c3ab6ffc5db671686a71d6a596beaf496e
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Updates #254
This code change implements DISCARD FOP support for
EC.
BUG: 1461018
Change-Id: I09a9cb2aa9d91ec27add4f422dc9074af5b8b2db
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Ec at the moment sends one modification fop after another, so if some of
the disks become slow, for a while then the wait time for the writes that
are waiting in the queue becomes really bad.
Fix:
Allow parallel writes when possible. For this we need to make 3 changes.
1) Each fop now has range parameters they will be updating.
2) Xattrop is changed to handle parallel xattrop requests where some
would be modifying just dirty xattr.
3) Fops that refer to size now take locks and update the locks.
Fixes #251
Change-Id: Ibc3c15372f91bbd6fb617f0d99399b3149fa64b2
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: The value returned by cluster_mkdir is assigned to ret at
ec-heal.c:1076. But this value is overwritten before it can be
used.
Solution: The return value of cluster_mkdir is ignored. It is not
assigned to ret.
Change-Id: Iee6b8d8b04e0bd800dd30d2c24cab755b9e63443
BUG: 789278
Signed-off-by: Kamal Mohanan <kmohanan@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
1 - If a brick is down and we see an index entry in
.glusterfs/indices, we should show it in heal info
output as it most certainly needs heal.
2 - The first problem is also not getting handled after
ec_heal_inspect. Even if in ec_heal_inspect, lookup will
mark need_heal as true, we don't handle it properly in
ec_get_heal_info and continue with locked inspect which
takes lot of time.
Solution:
1 - In first case we need not to do any further invstigation.
As soon as we see that a brick is down, we should say that
this index entry needs heal for sure.
2 - In second case, if we have need_heal as _gf_true after
ec_heal_inspect, we should show it as heal requires.
Change-Id: Ibe7f9d7602cc0b382ba53bddaf75a2a2c3326aa6
BUG: 1476668
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch removes old functions to align offsets and sizes
to stripe size boundaries and adds new ones to offer more
possibilities.
The new functions are:
* ec_adjust_offset_down()
Aligns a given offset to a multiple of the stripe size
equal or smaller than the initial one. It returns the
size of the gap between the aligned offset and the given
one.
* ec_adjust_offset_up()
Aligns a given offset to a multiple of the stripe size
equal or greater than the initial one. It returns the
size of the skipped region between the given offset and
the aligned one. If an overflow happens, the returned
valid has negative sign (but correct value) and the
offset is set to the maximum value (not aligned).
* ec_adjust_size_down()
Aligns the given size to a multiple of the stripe size
equal or smaller than the initial one. It returns the
size of the missed region between the aligned size and
the given one.
* ec_adjust_size_up()
Aligns the given size to a multiple of the stripe size
equal or greater than the initial one. It returns the
size of the gap between the given size and the aligned
one. If an overflow happens, the returned value has
negative sign (but correct value) and the size is set
to the maximum value (not aligned).
These functions have been defined in ec-helpers.h as static
inline since they are very small and compilers can optimize
them (specially the 'scale' argument).
Change-Id: I4c91009ad02f76c73772034dfde27ee1c78a80d7
Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
|
|
|
|
|
|
| |
Updates #251
Change-Id: I6244014dbc90af3239d63d75a064ae22ec12a054
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Existing EC code updates the xattr on the subvolume
in a sequential pattern resulting in very poor performance.
With this fix EC now updates the xattr on the subvolume
in parallel which improves the xattr update performance.
BUG: 1445663
Change-Id: I3fc40d66db0b88875ca96a9fa01002ba386c0486
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Problem: ctx pointer could be NULL
Solution: Updated the code to verify ctx pointer
BUG: 789278
Change-Id: I25e07a07c6ebe2f630c99ba3aa9a61656fbaa981
Signed-off-by: Akarsha Rai <akrai@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
cbk could be NULL.
Solution:
Returning NULL when memory is not allocated
for cbk.
BUG: 789278
Change-Id: Iea9128e0f3b95100deca560f690f9baaae226abf
Signed-off-by: Akarsha Rai <akrai@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Problem: Pool pointer could be NULL while destroying it.
Solution: Verifying pointer before destroying it.
BUG: 789278
Change-Id: I497d1310aa47cb749a4c992aa961bd4dfa23ee48
Signed-off-by: Akarsha Rai <akrai@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Issue :Event check_return: Calling "ec_dict_set_number" without checking return value.
Fix : Type casted the return value of the function "ec_dict_set_number" to void.
Change-Id: Id97034f9b1b8591536d63dca680ca7c7a9c4fcc3
BUG: 789278
Signed-off-by: Subha sree Mohankumar <smohanku@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Address comments to https://review.gluster.org/18067, (Change-Id
I86e15d12939c610c99f5f96c551bb870df20f4b4)
Which was posted as an RFC as an example of a possible alternative
fix to https://review.gluster.org/17860 (Change-Id
I28a3bdd4a357526dba0cf84c262919c05cfa173e)
An alternative fix that preserved the unsignedness of the indexes
throughout, obviating the need to check its value before using it to
shift. (shift by negative number is undefined, as is shift by more
bits than in the type.)
BUG: 1474309
Change-Id: I46fe9cec140d3397463780748f6876251acb06dd
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to generate statedumps per glusterfs_ctx_t, it is needed to
place all the memory pools in a structure that the context can reach.
The 'struct mem_pool' has been extended with a 'list_head owner' that is
linked with the glusterfs_ctx_t->mempool_list.
All callers of mem_pool_new() have been updated to pass the current
glusterfs_ctx_t along. This context is needed to add the new memory pool
to the list and for grabbing the ctx->lock while updating the
glusterfs_ctx_t->mempool_list.
Updates: #307
Change-Id: Ia9384424d8d1630ef3efc9d5d523bf739c356c6e
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: https://review.gluster.org/18075
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is how I would like to see this fixed.
passes (eliminates the warning in) coverity.
The use of uintptr_t as a bitmask is a problem IMO, especially on
32-bit clients.
Change-Id: I86e15d12939c610c99f5f96c551bb870df20f4b4
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: https://review.gluster.org/18067
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
In case of truncate, if writev or open fails on a brick,
in some cases it does not mark the failure onlock->good_mask.
This causes the update of size and version on all the bricks
even if it has failed on one of the brick. That ultimately
causes a data corruption.
Solution:
In callback of such writev and open calls, mark fop->good
for parent too.
Thanks Pranith Kumar K <pkarampu@redhat.com> for finding the
root cause.
Change-Id: I8a1da2888bff53b91a0d362b8c44fcdf658e7466
BUG: 1476205
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-on: https://review.gluster.org/17906
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Set names to threads on creation for easier
debugging.
Output of top -H -p <PID-OF-GLUSTERFSD>
Before:
19773 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19774 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19775 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19776 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19777 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19778 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19779 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19780 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19781 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19782 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19783 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19784 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19785 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd
19786 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd
19787 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd
19789 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19790 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
25178 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
5398 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
7881 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
After:
19773 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19774 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustertimer
19775 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19776 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustermemsweep
19777 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustersproc0
19778 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustersproc1
19779 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll0
19780 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusteridxwrker
19781 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusteriotwr0
19782 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterbrssign
19783 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterbrswrker
19784 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterclogecon
19785 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd0
19786 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd1
19787 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd2
19789 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixjan
19790 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixfsy
25178 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll1
5398 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll2
7881 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixhc
Change-Id: Id5f333755c1ba168a2ffaa4fce6e71c375e10703
BUG: 1254002
Updates: #271
Signed-off-by: Raghavendra Talur <rtalur@redhat.com>
Reviewed-on: https://review.gluster.org/11926
Reviewed-by: Prashanth Pai <ppai@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Enabling optimistic changelog on EC volume was not
handling node down scenarios appropriately resulting
in volume data inaccessibility.
Solution:
Update dirty xattr appropriately on good bricks whenever
nodes are down. This would fix the metadata information
as part of heal and thus ensures data accessibility.
BUG: 1468261
Change-Id: I08b0d28df386d9b2b49c3de84b4aac1c729ac057
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
Reviewed-on: https://review.gluster.org/17703
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
For allowing parallel writes we shouldn't depend on ia_size to be same for
all the bricks in each write_cbk(). But we need to make sure backend size
is correct on all the bricks and no crashes/manual modifications happened.
Fix:
At the time of get_size_version() we do 1 check to make sure size of the file
is same across the bricks. From then on the FOPs will give the status of the
fop, so we rely on this information to keep which bricks are good/bad.
Updates #251
Change-Id: I1df645347e2e9f2e09cfa4411b6cc305d7f4e4e5
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: https://review.gluster.org/17741
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
4 + 2 EC volume configuration.
If untar of linux is going on and we kill a brick,
indices will be created for the files/dir which need
to be healed. ec_shd_index_sweep spawns threads to
scan these entries and start heal. If in the middle
of this we kill one more brick, we end up in a
situation where we can not heal an entry as there
are only "ec->fragment" number of bricks are UP.
However, the scan will be continued and it will
trigger the heal for those entries.
Solution:
When a heal is triggered for an entry, check if it
*CAN* be healed or not. If not come out with ENOTCONN.
Change-Id: I305be7701c289f36bd7bde22491b71074771424f
BUG: 1464359
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-on: https://review.gluster.org/17692
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-by: Sunil Kumar Acharya <sheggodu@redhat.com>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a SEEK_HOLE was issued near to the end of file, sometimes an
offset beyond the end of file was returned. Another problem was that
using some offsets greater than the end of file returned successfully
instead of failing with ENXIO.
Change-Id: I238d2884ba02fd19a78116b0f8f8e8d6338fb3f5
BUG: 1449348
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: https://review.gluster.org/17228
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
"gluster v heal <volname> info" is taking
long time to respond when a brick is down.
RCA:
Heal info command does virtual mount.
EC wait for 10 seconds, before sending UP call to upper xlator,
to get notification (DOWN or UP) from all the bricks.
Currently, we are increasing ec->xl_notify_count based on
the current status of the brick. So, if a DOWN event notification
has come and brick is already down, we are not increasing
ec->xl_notify_count in ec_handle_down.
Solution:
Handle DOWN even as notification irrespective of what
is the current status of brick.
Change-Id: I0acac0db7ec7622d4c0584692e88ad52f45a910f
BUG: 1464091
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-on: https://review.gluster.org/17606
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
The change in EC to return list of node uuids for
GF_XATTR_NODE_UUID_KEY was causing problems with
geo-rep.
Fix:
This patch will allow to get the single node uuid
as it was doing before with the key
"GF_XATTR_NODE_UUID_KEY", and will also allow to get
the list of node uuids by using a new key
"GF_XATTR_LIST_NODE_UUIDS_KEY". This will solve
the problem with geo-rep and any other features which
were depending on this.
BUG: 1462790
Change-Id: I2d9214a9658d4a41a3d6de08600884d2bda5f3eb
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
Reviewed-on: https://review.gluster.org/17594
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
When application sends a blocking lock, the lk fop actually waits under
inodelk. This can lead to a dead-lock.
1) Let's say app-1 takes exculsive-fcntl-lock on the file
2) app-2 attempts an exclusive-fcntl-lock on the file which goes to blocking
stage note: app-2 is blocked inside transaction which holds an inode-lock
3) app-1 tries to perform write which needs inode-lock so it gets blocked on
app-2 to unlock inodelk and app-2 is blocked on app-1 to unlock fcntl-lock
Fix:
Correct way to fix this issue and make fcntl locks perform well would be to
introduce
2-phase locking for fcntl lock:
1) Implement a try-lock phase where locks xlator will not merge lk call with
existing calls until a commit-lock phase.
2) If in try-lock phase we get quorum number of success without any EAGAIN
error, then send a commit-lock which will merge locks.
3) In case there are any errors, unlock should just delete the lock-object
which was tried earlier and shouldn't touch the committed locks.
Unfortunately this is a sizeable feature and need to be thought through for any
corner cases. Until then remove transaction from lk call.
BUG: 1455049
Change-Id: I18a782903ba0eb43f1e6526fb0cf8c626c460159
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: https://review.gluster.org/17542
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem-1 : Recursive healing of same file is happening
when IO is going on even after data heal completes.
Solution:
RCA: At the end of the write, when ec_update_size_version
gets called, we send it only on good bricks and not
on healing brick. Due to this, xattr on healing brick
will always remain out of sync and when the background
heal check source and sink, it finds this brick to be
healed and start healing from scratch. That involve
ftruncate and writing all of the data again.
To solve this, send xattrop on all the good bricks as
well as healing bricks.
Problem-2: The above fix exposes the data corruption
during heal. If the write on a file is going on and
heal finishes, we find that the file gets corrupted.
RCA:
The real problem happens in ec_rebuild_data(). Here we receive the
'size' argument which contains the real file size at the time of
starting self-heal and it's assigned to heal->total_size.
After that, a sequence of calls to ec_sync_heal_block() are done. Each
call ends up calling ec_manager_heal_block(), which does the actual work
of healing a block.
First a lock on the inode is taken in state EC_STATE_INIT using
ec_heal_inodelk(). When the lock is acquired, ec_heal_lock_cbk() is
called. This function calls ec_set_inode_size() to store the real size
of the inode (it uses heal->total_size).
The next step is to read the block to be healed. This is done using a
regular ec_readv(). One of the things this call does is to trim the
returned size if the file is smaller than the requested size.
In our case, when we read the last block of a file whose size was = 512
mod 1024 at the time of starting self-heal, ec_readv() will return only
the first 512 bytes, not the whole 1024 bytes.
This isn't a problem since the following ec_writev() sent from the heal
code only attempts to write the amount of data read, so it shouldn't
modify the remaining 512 bytes.
However ec_writev() also checks the file size. If we are writing the
last block of the file (determined by the size stored on the inode that
we have set to heal->total_size), any data beyond the (imposed) end of
file will be cleared with 0's. This causes the 512 bytes after the
heal->total_size to be cleared. Since the file was written after heal
started, the these bytes contained data, so the block written to the
damaged brick will be incorrect.
Solution:
Align heal->total_size to a multiple of the stripe size.
Thanks "Xavier Hernandez" <xhernandez@datalab.es>
to find out the root cause and to fix the issue.
Change-Id: I6c9f37b3ff9dd7f5dc1858ad6f9845c05b4e204e
BUG: 1428673
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-on: https://review.gluster.org/16985
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
fixes for various minor spelling errors and typos
Reported-by: Patrick Matthäi <pmatthaei@debian.org>
Change-Id: Ic1be36f82e3d822bbdc9559878bd79520fc0fcd5
BUG: 1457808
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: https://review.gluster.org/17442
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
FALLOCATE file operations is not implemented in the
existing EC code. This change set implements it
for EC.
BUG: 1448293
Change-Id: Id9ed914db984c327c16878a5b2304a0ea461b623
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
Reviewed-on: https://review.gluster.org/15200
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
EC was retuning the UUID of the brick with smaller value. This had
the side effect of not evenly balancing the load between bricks on
rebalance operations.
This patch modifies the common functions that combine multiple subvolume
values into a single result to take into account the subvolume order
and, optionally, other subvolumes that could be damaged.
This makes easier to add future features where brick order is important.
It also makes possible to easily identify the originating brick of each
answer, in case some brick will have an special meaning in the future.
Change-Id: Iee0a4da710b41224a6dc8e13fa8dcddb36c73a2f
BUG: 1366817
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: https://review.gluster.org/17297
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A bad check in the answer of a seek request caused a segmentation
fault when seek reported an error.
Change-Id: Ifb25ae8bf7cc4019d46171c431f7b09b376960e8
BUG: 1439068
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: https://review.gluster.org/16998
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix implements the heal window size option for
EC. This option control the maximum size of
read/write operation carried out in self-heal
process.
BUG: 1441491
Change-Id: I6c0ef65c9ca18b0828f91b319d4f52ac5b77d0d8
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
Reviewed-on: https://review.gluster.org/17098
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In ec_child_select, we should send fop on healing bricks
unconditionaly but to check the number of healthy bricks
against fragments and minimum count, we should not count
these healing bricks.
Count bits of fop->mask before adding ealing brick to
fop->mask
Change-Id: I3fa80bdd5ca34ca070d610116b84154b917c5999
BUG: 1439527
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-on: https://review.gluster.org/17007
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Note: Even though gcc(1) will automatically treat ffs() and popcount()
as built-in, calling them explicitly as __builtin in the source helps
make it easier to find them. (And if no __builtin_ffs use the one in
libc.)
Change-Id: Ib74d9b221ff03a01df5ad05907024da1a83a7a88
BUG: 1438772
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: https://review.gluster.org/16993
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Debian builds detected spelling issues with GlusterFS 3.10.1. Instead of
carrying the patch in the Debian sources, let's include the fixes here
too.
Change-Id: I38db6adf142f7ec247bffd47aa1e6ff1a0c49e00
Reported-by: Patrick Matthäi <pmatthaei@debian.org>
BUG: 1437853
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: https://review.gluster.org/16973
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
During meatadata heal, we were not updating the version
though all the inode attributes were in sync.
Updated the code to adjust version when all the inode
attributes are in sync.
BUG: 1425703
Change-Id: I6723be3c5f748b286d4efdaf3c71e9d2087c7235
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
Reviewed-on: https://review.gluster.org/16772
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We wanted to mark dirty for metadata/entry operations
whenever query-info is set and info is not yet there because we
are anyway sending xattrop over the network. But this is causing
25% regression from 3.8.8 so removing this optimization
Also fixed two small issues that we didn't find in the previous
patch
1) reconfigure failure was sending return value 0 for optimistic-changelog
2) ec->optimistic_changelog was set to true even before OPTION_INIT
BUG: 1408809
Change-Id: Iabb0b64bd4d3623688790e4b67e5c20b4da977a1
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: https://review.gluster.org/16865
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: Fix to https://bugzilla.redhat.com/show_bug.cgi?id=1316873 has made
changes to set dirty flag before every update fop, data or metadata, and unset
it after successful operation. That makes some of the fops very slow such as
entry operations or metadata operations.
Solution: File data operations are the only operation which take some time and
setting dirty flag before a fop and unsetting it after serves the purpose as
probability of failure of a fop is high when the time duration is more. For all
the other operations, set dirty flag at the end of the fop, if any brick is
down and need heal.
Providing following option to choose between high performance or better heal
marking for metadata and entry fops.
Set/Unset dirty flag for every update fop at the start of the fop. If ON, this
option impacts performance of entry operations or metadata operations as it
will set dirty flag at the start and unset it at the end of ALL update fop. If
OFF and all the bricks are good, dirty flag will be set at the start only for
file fops For metadata and entry fops dirty flag will not be set at the start,
if all the bricks are good. This does not impact performance for metadata
operations and entry operation but has a very small window to miss marking
entry as dirty in case it is required to be healed.
Thanks to Xavi and Ashish for the design
Picked the .t file from Ashish' patch https://review.gluster.org/16298
BUG: 1408809
Change-Id: I3ce860063f0e2901e50754dcfc3e4ed22daf819f
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: https://review.gluster.org/16821
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Tested-by: Xavier Hernandez <xhernandez@datalab.es>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem-1
If Lookup which doesn't take any locks observes version mismatch it can't be
trusted. If we launch a heal based on this information it will lead to
self-heals which will affect I/O performance in the cases where Lookup is
wrong. Considering self-heal-daemon and operations on the inode from client
which take locks can still trigger heal we can choose to not attempt a heal on
Lookup.
Problem-2:
Fixed spurious failure of
tests/bitrot/bug-1373520.t
For the issues above, what was happening was that ec_heal_inspect()
is preventing 'name' heal to happen
Problem-3:
tests/basic/ec/ec-background-heals.t
To be honest I don't know what the problem was, while fixing
the 2 problems above, I made some changes to ec_heal_inspect() and
ec_need_heal() after which when I tried to recreate the spurious
failure it just didn't happen even after a long time.
BUG: 1414287
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Change-Id: Ife2535e1d0b267712973673f6d474e288f3c6834
Reviewed-on: https://review.gluster.org/16468
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Ashish Pandey <aspandey@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When dynamic code generation fails for some reason, instead of causing
a failure in encode/decode, fallback to the precompiled version.
Change-Id: I4f8a97d3033aa5885779722b19c6e611caa4ffea
BUG: 1421955
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: https://review.gluster.org/16614
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On FreeBSD the S_ISVTX flag is completely ignored when creating a
regular file. Since gluster needs to create files with this flag set,
specialy for DHT link files, it's necessary to force the flag.
This fix does this by calling fchmod() after creating a file that
must have this flag set.
Change-Id: I51eecfe4642974df6106b9084a0b144835a4997a
BUG: 1411228
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: https://review.gluster.org/16417
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Heal failed or passed should not be logged as info. These can be
observed from heal info if the heal is happening or not. If we require
to debug a case where heal is not happening, we can set the level to
DEBUG.
Change-Id: I062668eadd145ef809b25e818e6bca1094f54cd6
BUG: 1420619
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
Reviewed-on: https://review.gluster.org/16580
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Ashish Pandey <aspandey@redhat.com>
|