| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The locks xlator now is able to send a contention notification to
the current owner of the lock.
This is only a notification that can be used to improve performance
of some client side operations that might benefit from extended
duration of lock ownership. Nothing is done if the lock owner decides
to ignore the message and to not release the lock. For forced
release of acquired resources, leases must be used.
Change-Id: I7f1ad32a0b4b445505b09908a050080ad848f8e0
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Existing EC code doesn't try to heal the OpenFD to
avoid unnecessary healing of the data later.
Fix implements the healing of open FDs before
carrying out file operations on them by making an
attempt to open the FDs on required up nodes.
BUG: 1431955
Change-Id: Ib696f59c41ffd8d5678a484b23a00bb02764ed15
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v1 of the patch started off as adding new fields to iatt that can be
filled up using statx but the discussions were more around introducing
masks to check the validity of different fields from a RIO perspective.
To that extent, I have dropped the statx call in this version and
introduced a 64 bit mask for existing fields. The masks I have defined
are similar with the statx() flags' masks.
I have *not* changed iatt_to_stat() to use the macros IATT_TYPE_VALID,
IATT_GFID_VALID etc before blindly copying from struct iatt to struct.
Also fixed warnings in xlators because of atime/mtime/ctime seconds
field change from uint32_t to int64_t.
Change-Id: I4ac614f1e8d5c8246fc99d5bc2d2a23e7941512b
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
A coverity scan has revelaed a potential shift overflow while scanning
the bitmap of available subvolumes. The actual overflow cannot happen,
but I've changed to test used to control the limit to make it explicit.
Change-Id: Ieb55f010bbca68a1d86a93e47822f7c709a26e83
BUG: 789278
Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
At the moment in EC, [f]getxattr operations wait to acquire a lock
while other operations are in progress even when it is in the same mount with a
lock on the file/directory. This happens because [f]getxattr operations
follow the model where the operation is wound on 'k' of the bricks and are
matched to make sure the data returned is same on all of them. This consistency
check requires that no other operations are on-going while [f]getxattr
operations are wound to the bricks. We can perform [f]getxattr in
another way as well, where we find the good_mask from the lock that is already
granted and wind the operation on any one of the good bricks and unwind the
answer after adjusting size/blocks to the parent xlator. Since we are taking
into account good_mask, the reply we get will either be before or after a
possible on-going operation. Using this method, the operation doesn't need to
depend on completion of on-going operations which could be taking long time (In
case of some slow disks and writes are in progress etc). Thus we reduce the
time to serve [f]getxattr requests.
I changed [f]getxattr to dispatch-one and added extra logic in
ec_link_has_lock_conflict() to not have any conflicts for fops with
EC_MINIMUM_ONE as fop->minimum to achieve the effect described above.
Modified scripts to make sure READ fop is received in EC to trigger heals.
Updates gluster/glusterfs#368
Change-Id: I3b4ebf89181c336b7b8d5471b0454f016cdaf296
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
| |
Updates #302
Change-Id: Ifccc6a64c2b2a3b3a0133e6c8042cdf61a0838ce
Signed-off-by: Kamal Mohanan <kmohanan@redhat.com>
|
|
|
|
|
|
|
|
|
| |
This patch creates a new way of defining message id's that is easier
and less error prone because it doesn't require so many manual changes
each time a new component is defined or a new message created.
Change-Id: I71ba8af9ac068f5add7e74f316a2478bc991c67b
Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1 - This patch fixes a bug in ec_update_stripe()
that prevented some stripes to be updated after a write.
2 - This patch also include code modification for the
case in which a file does not exist and we write on
unaligned offset and user size, the last stripe on
which "end" will fall should also be cached.
Change-Id: I069cb4be1c8d59c206e3b35a6991e1fbdbc9b474
BUG: 1520758
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
DISCARD operation on EC volume was punching hole of lesser
size than the specified size in some cases.
Solution:
EC was not handling punch hole for tail part in some cases.
Updated the code to handle it appropriately.
BUG: 1516206
Change-Id: If3e69e417c3e5034afee04e78f5f78855e65f932
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the volume is being stopped, PARENT_DOWN event is received.
This instructs EC to wait until all pending operations are completed
before declaring itself down. However heal operations are ignored
and allowed to continue even after having said it was down.
This may cause unexpected results and crashes.
To solve this, heal operations are considered exactly equal as any
other operation and EC won't propagate PARENT_DOWN until all
operations, including healing, are complete. To avoid big delays
if this happens in the middle of a big heal, a check has been
added to quit current heal if shutdown is detected.
Change-Id: I26645e236ebd115eb22c7ad4972461111a2d2034
BUG: 1515266
Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When there are big directories or files that need to be healed,
other shds are stuck on getting lock on self-heal domain for these
directories/files. If there is a tie-breaker logic, other shds
can heal some other files/directories while 1 of the shds is healing
the big file/directory.
Before this patch:
96.67 4890.64 us 12.89 us 646115887.30us 340869 INODELK
After this patch:
40.76 42.35 us 15.09 us 6546.50us 438478 INODELK
Fixes gluster/glusterfs#354
Change-Id: Ia995b5576b44f770c064090705c78459e543cc64
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
| |
The op-version used for the new option was wrong. It has been set
to 3.13.0.
Change-Id: I88fbd7834e4a8018c8906303e734c251e90be8cf
BUG: 1502610
Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Consider an EC volume with configuration 4 + 2.
The stripe size for this would be 512 * 4 = 2048.
That means, 2048 bytes of user data stored in one
stripe. Let's say 2048 + 512 = 2560 bytes are
already written on this volume. 512 Bytes would
be in second stripe. Now, if there are sequential
writes with offset 2560 and of size 1 Byte, we have
to read the whole stripe, encode it with 1 Byte and
then again have to write it back. Next, write with
offset 2561 and size of 1 Byte will again
READ-MODIFY-WRITE the whole stripe. This is causing
bad performance because of lots of READ request
travelling over the network.
There are some tools and scenario's where such kind
of load is coming and users are not aware of that.
Example: fio and zip
Solution:
One possible solution to deal with this issue is to
keep last stripe in memory. This way, we need not to
read it again and we can save READ fop going over the
network. Considering the above example, we have to
keep last 2048 bytes (maximum) in memory per file.
Change-Id: I3f95e6fc3ff81953646d374c445a40c6886b0b85
BUG: 1471753
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Problem: Return values are not used.
Solution: Removed the unused values.
BUG: 789278
Change-Id: Ica2b95bb659dfc7ec5346de0996b9d2fcd340f8d
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Problem: fop could be NULL.
Solution: Check has been added to verify fop.
BUG: 789278
Change-Id: I7e8d2c1bdd8960c609aa485f180688a95606ebf7
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Problem: Source could be negative.
Solution: Check has been added to verify the same.
BUG: 789278
Change-Id: I8cca6297327e7923b25949c20ec8d1a711207a76
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Problem: xattr could be NULL.
Solution: Added check to verify the same.
BUG: 789278
Change-Id: Ie013f5655f4621434e5023dd76cef44b976adc68
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Coverity ID: 237
Problem: In ec_check_status we are trying to deref fop->answer
which could be NULL.
Solution: Check Null condition before using this pointer.
Change-Id: I4f9a73dc2f062ca9c62b4c4baf0a6fcadade88f2
BUG: 789278
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
|
|
|
|
|
|
|
|
|
| |
A new option is added to allow independent configuration of eager
locking for regular files and non-regular files.
Change-Id: I8f80e46d36d8551011132b15c0fac549b7fb1c60
BUG: 1502610
Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Problem: cbk could be NULL.
Solution: Assigned appropriate value to cbk.
BUG: 789278
Change-Id: I2e4bba9a54f965c6a7bccf0b0cb6c5f75399f6e6
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Problem: switch case syntax issue.
Solution: syntax fixed.
BUG: 789278
Change-Id: I76da72c3ab6ffc5db671686a71d6a596beaf496e
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Updates #254
This code change implements DISCARD FOP support for
EC.
BUG: 1461018
Change-Id: I09a9cb2aa9d91ec27add4f422dc9074af5b8b2db
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Ec at the moment sends one modification fop after another, so if some of
the disks become slow, for a while then the wait time for the writes that
are waiting in the queue becomes really bad.
Fix:
Allow parallel writes when possible. For this we need to make 3 changes.
1) Each fop now has range parameters they will be updating.
2) Xattrop is changed to handle parallel xattrop requests where some
would be modifying just dirty xattr.
3) Fops that refer to size now take locks and update the locks.
Fixes #251
Change-Id: Ibc3c15372f91bbd6fb617f0d99399b3149fa64b2
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: The value returned by cluster_mkdir is assigned to ret at
ec-heal.c:1076. But this value is overwritten before it can be
used.
Solution: The return value of cluster_mkdir is ignored. It is not
assigned to ret.
Change-Id: Iee6b8d8b04e0bd800dd30d2c24cab755b9e63443
BUG: 789278
Signed-off-by: Kamal Mohanan <kmohanan@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
1 - If a brick is down and we see an index entry in
.glusterfs/indices, we should show it in heal info
output as it most certainly needs heal.
2 - The first problem is also not getting handled after
ec_heal_inspect. Even if in ec_heal_inspect, lookup will
mark need_heal as true, we don't handle it properly in
ec_get_heal_info and continue with locked inspect which
takes lot of time.
Solution:
1 - In first case we need not to do any further invstigation.
As soon as we see that a brick is down, we should say that
this index entry needs heal for sure.
2 - In second case, if we have need_heal as _gf_true after
ec_heal_inspect, we should show it as heal requires.
Change-Id: Ibe7f9d7602cc0b382ba53bddaf75a2a2c3326aa6
BUG: 1476668
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch removes old functions to align offsets and sizes
to stripe size boundaries and adds new ones to offer more
possibilities.
The new functions are:
* ec_adjust_offset_down()
Aligns a given offset to a multiple of the stripe size
equal or smaller than the initial one. It returns the
size of the gap between the aligned offset and the given
one.
* ec_adjust_offset_up()
Aligns a given offset to a multiple of the stripe size
equal or greater than the initial one. It returns the
size of the skipped region between the given offset and
the aligned one. If an overflow happens, the returned
valid has negative sign (but correct value) and the
offset is set to the maximum value (not aligned).
* ec_adjust_size_down()
Aligns the given size to a multiple of the stripe size
equal or smaller than the initial one. It returns the
size of the missed region between the aligned size and
the given one.
* ec_adjust_size_up()
Aligns the given size to a multiple of the stripe size
equal or greater than the initial one. It returns the
size of the gap between the given size and the aligned
one. If an overflow happens, the returned value has
negative sign (but correct value) and the size is set
to the maximum value (not aligned).
These functions have been defined in ec-helpers.h as static
inline since they are very small and compilers can optimize
them (specially the 'scale' argument).
Change-Id: I4c91009ad02f76c73772034dfde27ee1c78a80d7
Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
|
|
|
|
|
|
| |
Updates #251
Change-Id: I6244014dbc90af3239d63d75a064ae22ec12a054
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Existing EC code updates the xattr on the subvolume
in a sequential pattern resulting in very poor performance.
With this fix EC now updates the xattr on the subvolume
in parallel which improves the xattr update performance.
BUG: 1445663
Change-Id: I3fc40d66db0b88875ca96a9fa01002ba386c0486
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Problem: ctx pointer could be NULL
Solution: Updated the code to verify ctx pointer
BUG: 789278
Change-Id: I25e07a07c6ebe2f630c99ba3aa9a61656fbaa981
Signed-off-by: Akarsha Rai <akrai@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
cbk could be NULL.
Solution:
Returning NULL when memory is not allocated
for cbk.
BUG: 789278
Change-Id: Iea9128e0f3b95100deca560f690f9baaae226abf
Signed-off-by: Akarsha Rai <akrai@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Problem: Pool pointer could be NULL while destroying it.
Solution: Verifying pointer before destroying it.
BUG: 789278
Change-Id: I497d1310aa47cb749a4c992aa961bd4dfa23ee48
Signed-off-by: Akarsha Rai <akrai@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Issue :Event check_return: Calling "ec_dict_set_number" without checking return value.
Fix : Type casted the return value of the function "ec_dict_set_number" to void.
Change-Id: Id97034f9b1b8591536d63dca680ca7c7a9c4fcc3
BUG: 789278
Signed-off-by: Subha sree Mohankumar <smohanku@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Address comments to https://review.gluster.org/18067, (Change-Id
I86e15d12939c610c99f5f96c551bb870df20f4b4)
Which was posted as an RFC as an example of a possible alternative
fix to https://review.gluster.org/17860 (Change-Id
I28a3bdd4a357526dba0cf84c262919c05cfa173e)
An alternative fix that preserved the unsignedness of the indexes
throughout, obviating the need to check its value before using it to
shift. (shift by negative number is undefined, as is shift by more
bits than in the type.)
BUG: 1474309
Change-Id: I46fe9cec140d3397463780748f6876251acb06dd
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to generate statedumps per glusterfs_ctx_t, it is needed to
place all the memory pools in a structure that the context can reach.
The 'struct mem_pool' has been extended with a 'list_head owner' that is
linked with the glusterfs_ctx_t->mempool_list.
All callers of mem_pool_new() have been updated to pass the current
glusterfs_ctx_t along. This context is needed to add the new memory pool
to the list and for grabbing the ctx->lock while updating the
glusterfs_ctx_t->mempool_list.
Updates: #307
Change-Id: Ia9384424d8d1630ef3efc9d5d523bf739c356c6e
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: https://review.gluster.org/18075
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is how I would like to see this fixed.
passes (eliminates the warning in) coverity.
The use of uintptr_t as a bitmask is a problem IMO, especially on
32-bit clients.
Change-Id: I86e15d12939c610c99f5f96c551bb870df20f4b4
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: https://review.gluster.org/18067
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
In case of truncate, if writev or open fails on a brick,
in some cases it does not mark the failure onlock->good_mask.
This causes the update of size and version on all the bricks
even if it has failed on one of the brick. That ultimately
causes a data corruption.
Solution:
In callback of such writev and open calls, mark fop->good
for parent too.
Thanks Pranith Kumar K <pkarampu@redhat.com> for finding the
root cause.
Change-Id: I8a1da2888bff53b91a0d362b8c44fcdf658e7466
BUG: 1476205
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-on: https://review.gluster.org/17906
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Set names to threads on creation for easier
debugging.
Output of top -H -p <PID-OF-GLUSTERFSD>
Before:
19773 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19774 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19775 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19776 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19777 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19778 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19779 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19780 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19781 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19782 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19783 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19784 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19785 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd
19786 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd
19787 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd
19789 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19790 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
25178 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
5398 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
7881 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
After:
19773 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19774 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustertimer
19775 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19776 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustermemsweep
19777 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustersproc0
19778 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustersproc1
19779 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll0
19780 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusteridxwrker
19781 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusteriotwr0
19782 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterbrssign
19783 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterbrswrker
19784 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterclogecon
19785 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd0
19786 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd1
19787 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd2
19789 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixjan
19790 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixfsy
25178 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll1
5398 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll2
7881 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixhc
Change-Id: Id5f333755c1ba168a2ffaa4fce6e71c375e10703
BUG: 1254002
Updates: #271
Signed-off-by: Raghavendra Talur <rtalur@redhat.com>
Reviewed-on: https://review.gluster.org/11926
Reviewed-by: Prashanth Pai <ppai@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Enabling optimistic changelog on EC volume was not
handling node down scenarios appropriately resulting
in volume data inaccessibility.
Solution:
Update dirty xattr appropriately on good bricks whenever
nodes are down. This would fix the metadata information
as part of heal and thus ensures data accessibility.
BUG: 1468261
Change-Id: I08b0d28df386d9b2b49c3de84b4aac1c729ac057
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
Reviewed-on: https://review.gluster.org/17703
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
For allowing parallel writes we shouldn't depend on ia_size to be same for
all the bricks in each write_cbk(). But we need to make sure backend size
is correct on all the bricks and no crashes/manual modifications happened.
Fix:
At the time of get_size_version() we do 1 check to make sure size of the file
is same across the bricks. From then on the FOPs will give the status of the
fop, so we rely on this information to keep which bricks are good/bad.
Updates #251
Change-Id: I1df645347e2e9f2e09cfa4411b6cc305d7f4e4e5
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: https://review.gluster.org/17741
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
4 + 2 EC volume configuration.
If untar of linux is going on and we kill a brick,
indices will be created for the files/dir which need
to be healed. ec_shd_index_sweep spawns threads to
scan these entries and start heal. If in the middle
of this we kill one more brick, we end up in a
situation where we can not heal an entry as there
are only "ec->fragment" number of bricks are UP.
However, the scan will be continued and it will
trigger the heal for those entries.
Solution:
When a heal is triggered for an entry, check if it
*CAN* be healed or not. If not come out with ENOTCONN.
Change-Id: I305be7701c289f36bd7bde22491b71074771424f
BUG: 1464359
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-on: https://review.gluster.org/17692
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-by: Sunil Kumar Acharya <sheggodu@redhat.com>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a SEEK_HOLE was issued near to the end of file, sometimes an
offset beyond the end of file was returned. Another problem was that
using some offsets greater than the end of file returned successfully
instead of failing with ENXIO.
Change-Id: I238d2884ba02fd19a78116b0f8f8e8d6338fb3f5
BUG: 1449348
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: https://review.gluster.org/17228
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
"gluster v heal <volname> info" is taking
long time to respond when a brick is down.
RCA:
Heal info command does virtual mount.
EC wait for 10 seconds, before sending UP call to upper xlator,
to get notification (DOWN or UP) from all the bricks.
Currently, we are increasing ec->xl_notify_count based on
the current status of the brick. So, if a DOWN event notification
has come and brick is already down, we are not increasing
ec->xl_notify_count in ec_handle_down.
Solution:
Handle DOWN even as notification irrespective of what
is the current status of brick.
Change-Id: I0acac0db7ec7622d4c0584692e88ad52f45a910f
BUG: 1464091
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-on: https://review.gluster.org/17606
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
The change in EC to return list of node uuids for
GF_XATTR_NODE_UUID_KEY was causing problems with
geo-rep.
Fix:
This patch will allow to get the single node uuid
as it was doing before with the key
"GF_XATTR_NODE_UUID_KEY", and will also allow to get
the list of node uuids by using a new key
"GF_XATTR_LIST_NODE_UUIDS_KEY". This will solve
the problem with geo-rep and any other features which
were depending on this.
BUG: 1462790
Change-Id: I2d9214a9658d4a41a3d6de08600884d2bda5f3eb
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
Reviewed-on: https://review.gluster.org/17594
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
When application sends a blocking lock, the lk fop actually waits under
inodelk. This can lead to a dead-lock.
1) Let's say app-1 takes exculsive-fcntl-lock on the file
2) app-2 attempts an exclusive-fcntl-lock on the file which goes to blocking
stage note: app-2 is blocked inside transaction which holds an inode-lock
3) app-1 tries to perform write which needs inode-lock so it gets blocked on
app-2 to unlock inodelk and app-2 is blocked on app-1 to unlock fcntl-lock
Fix:
Correct way to fix this issue and make fcntl locks perform well would be to
introduce
2-phase locking for fcntl lock:
1) Implement a try-lock phase where locks xlator will not merge lk call with
existing calls until a commit-lock phase.
2) If in try-lock phase we get quorum number of success without any EAGAIN
error, then send a commit-lock which will merge locks.
3) In case there are any errors, unlock should just delete the lock-object
which was tried earlier and shouldn't touch the committed locks.
Unfortunately this is a sizeable feature and need to be thought through for any
corner cases. Until then remove transaction from lk call.
BUG: 1455049
Change-Id: I18a782903ba0eb43f1e6526fb0cf8c626c460159
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: https://review.gluster.org/17542
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem-1 : Recursive healing of same file is happening
when IO is going on even after data heal completes.
Solution:
RCA: At the end of the write, when ec_update_size_version
gets called, we send it only on good bricks and not
on healing brick. Due to this, xattr on healing brick
will always remain out of sync and when the background
heal check source and sink, it finds this brick to be
healed and start healing from scratch. That involve
ftruncate and writing all of the data again.
To solve this, send xattrop on all the good bricks as
well as healing bricks.
Problem-2: The above fix exposes the data corruption
during heal. If the write on a file is going on and
heal finishes, we find that the file gets corrupted.
RCA:
The real problem happens in ec_rebuild_data(). Here we receive the
'size' argument which contains the real file size at the time of
starting self-heal and it's assigned to heal->total_size.
After that, a sequence of calls to ec_sync_heal_block() are done. Each
call ends up calling ec_manager_heal_block(), which does the actual work
of healing a block.
First a lock on the inode is taken in state EC_STATE_INIT using
ec_heal_inodelk(). When the lock is acquired, ec_heal_lock_cbk() is
called. This function calls ec_set_inode_size() to store the real size
of the inode (it uses heal->total_size).
The next step is to read the block to be healed. This is done using a
regular ec_readv(). One of the things this call does is to trim the
returned size if the file is smaller than the requested size.
In our case, when we read the last block of a file whose size was = 512
mod 1024 at the time of starting self-heal, ec_readv() will return only
the first 512 bytes, not the whole 1024 bytes.
This isn't a problem since the following ec_writev() sent from the heal
code only attempts to write the amount of data read, so it shouldn't
modify the remaining 512 bytes.
However ec_writev() also checks the file size. If we are writing the
last block of the file (determined by the size stored on the inode that
we have set to heal->total_size), any data beyond the (imposed) end of
file will be cleared with 0's. This causes the 512 bytes after the
heal->total_size to be cleared. Since the file was written after heal
started, the these bytes contained data, so the block written to the
damaged brick will be incorrect.
Solution:
Align heal->total_size to a multiple of the stripe size.
Thanks "Xavier Hernandez" <xhernandez@datalab.es>
to find out the root cause and to fix the issue.
Change-Id: I6c9f37b3ff9dd7f5dc1858ad6f9845c05b4e204e
BUG: 1428673
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-on: https://review.gluster.org/16985
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
fixes for various minor spelling errors and typos
Reported-by: Patrick Matthäi <pmatthaei@debian.org>
Change-Id: Ic1be36f82e3d822bbdc9559878bd79520fc0fcd5
BUG: 1457808
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: https://review.gluster.org/17442
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
FALLOCATE file operations is not implemented in the
existing EC code. This change set implements it
for EC.
BUG: 1448293
Change-Id: Id9ed914db984c327c16878a5b2304a0ea461b623
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
Reviewed-on: https://review.gluster.org/15200
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
EC was retuning the UUID of the brick with smaller value. This had
the side effect of not evenly balancing the load between bricks on
rebalance operations.
This patch modifies the common functions that combine multiple subvolume
values into a single result to take into account the subvolume order
and, optionally, other subvolumes that could be damaged.
This makes easier to add future features where brick order is important.
It also makes possible to easily identify the originating brick of each
answer, in case some brick will have an special meaning in the future.
Change-Id: Iee0a4da710b41224a6dc8e13fa8dcddb36c73a2f
BUG: 1366817
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: https://review.gluster.org/17297
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A bad check in the answer of a seek request caused a segmentation
fault when seek reported an error.
Change-Id: Ifb25ae8bf7cc4019d46171c431f7b09b376960e8
BUG: 1439068
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: https://review.gluster.org/16998
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix implements the heal window size option for
EC. This option control the maximum size of
read/write operation carried out in self-heal
process.
BUG: 1441491
Change-Id: I6c0ef65c9ca18b0828f91b319d4f52ac5b77d0d8
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
Reviewed-on: https://review.gluster.org/17098
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|