| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The bitmask of good and bad bricks was kept in the context of the
corresponding inode or fd. This was problematic when an external
process (another client or the self-heal process) did heal the
bricks but no one changed the bitmaks of other clients.
This patch removes the bitmask stored in the context and calculates
which bricks are healthy after locking them and doing the initial
xattrop. After that, it's updated using the result of each fop.
> Change-Id: I225e31cd219a12af4ca58871d8a4bb6f742b223c
> BUG: 1236065
> Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
> Reviewed-on: http://review.gluster.org/11844
> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
> Tested-by: Gluster Build System <jenkins@build.gluster.com>
> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Change-Id: Idbe68b28b865c4b28366703ad1e96ae16ba44b66
BUG: 1235964
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/11867
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
>Change-Id: I82e245615419c2006a2d1b5e94ff0908d2f5e891
>BUG: 1245276
>Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
>Reviewed-on: http://review.gluster.org/11741
>Tested-by: Gluster Build System <jenkins@build.gluster.com>
>Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
>Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Change-Id: Ifd3d63f88a686a2963c5ba2e62110249f84f338d
BUG: 1250864
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/11852
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
New lock could come at the time timer is on the way to unlock. This was leading
to crash in timer thread because thread executing new lock can free up the
timer_link->fop and then timer thread will try to access structures already
freed.
Fix:
If the timer event is fired, set lock->release to true and wait for unlock to
complete.
Thanks to Xavi and Bhaskar for helping in confirming that this race is the RC.
Thanks to Kritika for pointing out and explaining how Avati's patch can be used
to fix this bug.
> Change-Id: I45fa5470bbc1f03b5f3d133e26d1e0ab24303378
> BUG: 1243187
> Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
> Reviewed-on: http://review.gluster.org/11670
> Tested-by: Gluster Build System <jenkins@build.gluster.com>
> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Change-Id: I9af012e717493684b7cd7d1c63baf2fa401fb542
BUG: 1246121
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/11752
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- On lock reuse preserve 'healing' bits
- Don't set ctx->size outside locks in healing code
- Allow xattrop internal fops also on the fop->mask.
>Change-Id: I6b76da5d7ebe367d8f3552cbf9fd18e556f2a171
>BUG: 1232678
>Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
>Reviewed-on: http://review.gluster.org/11640
>Tested-by: NetBSD Build System <jenkins@build.gluster.org>
>Tested-by: Gluster Build System <jenkins@build.gluster.com>
>Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
BUG: 1243647
Change-Id: I1b3828e4d4a863b84b2c4e732e7965d1302cea47
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/11686
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
>BUG: 1232172
>Change-Id: I3a56e487840d86147dd85bf5fbe79b165eae289f
>Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
>Reviewed-on: http://review.gluster.org/11589
>Tested-by: NetBSD Build System <jenkins@build.gluster.org>
>Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
>Tested-by: Gluster Build System <jenkins@build.gluster.com>
BUG: 1234679
Change-Id: I08560eee095a3921e9c24f16dc2a242a76018a42
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/11687
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
>BUG: 1232678
>Change-Id: I35503039e4723cf7f33d6797f0ba90dd0aca130b
>Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
>Reviewed-on: http://review.gluster.org/11580
>Tested-by: NetBSD Build System <jenkins@build.gluster.org>
>Tested-by: Gluster Build System <jenkins@build.gluster.com>
>Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
BUG: 1243647
Change-Id: I4eb45197a5a8d9652eded37ba1e67d9ea745a583
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/11685
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In ec_lock() there is a chance that ec_resume is called on fop even before
ec_sleep. This can result in refs == 0 for fop leading to use after free in
this function when it calls ec_sleep so do ec_sleep at start and ec_resume at
end of this function.
>Change-Id: I879b2667bf71eaa56be1b53b5bdc91b7bb56c650
>BUG: 1240284
>Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
>Reviewed-on: http://review.gluster.org/11558
>Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
>Tested-by: Gluster Build System <jenkins@build.gluster.com>
>Tested-by: NetBSD Build System <jenkins@build.gluster.org>
BUG: 1243648
Change-Id: I57515d1f478b2a41a20d37368c947049d23778f0
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/11683
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
>Change-Id: Ic22813371faca4e8198c9b0b20518e68d275f3c1
>BUG: 1232678
>Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
>Reviewed-on: http://review.gluster.org/11531
>Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
>Tested-by: NetBSD Build System <jenkins@build.gluster.org>
>Tested-by: Gluster Build System <jenkins@build.gluster.com>
BUG: 1243647
Change-Id: Ic8570710a16715322bc0be59367007567f6438cd
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/11682
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
|
|
|
|
|
|
|
|
|
|
|
| |
Backport of http://review.gluster.com/11246
BUG: 1234679
Change-Id: I2d774f62740c82e922efab50fc78fa74050ede93
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/11357
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a backport of http://review.gluster.org/#/c/10465/
cherry-picked from commit b0b9eaea9dbb4e9a535f5e969defc4556a9e2204
>Change-Id: Ia05ae750a245a37d48978e5f37b52f4fb0507a8c
>BUG: 1194640
>Signed-off-by: Nandaja Varma <nandaja.varma@gmail.com>
Change-Id: Ia05ae750a245a37d48978e5f37b52f4fb0507a8c
BUG: 1217722
Signed-off-by: Nandaja Varma <nandaja.varma@gmail.com>
Reviewed-on: http://review.gluster.org/11429
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I264e47ca679d8b57cd8c80306c07514e826f92d8
BUG: 1229563
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-on: http://review.gluster.org/10784
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-on: http://review.gluster.org/11132
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Backport of http://review.gluster.org/11317
In very rare circumstances it was possible that a subfop started
by another fop could finish fast enough to cause that two or more
instances of the same state machine be executing at the same time.
Change-Id: I319924a18bd3f88115e751a66f8f4560435e0e0e
BUG: 1233484
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/11319
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Backport of http://review.gluster.org/11152
Problem:
While files are being created if more than redundancy number of bricks
go down, then unlock for these fops do not go to the bricks. This will
lead to stale locks leading to hangs.
Fix:
Wind unlock fops at all costs.
BUG: 1230350
Change-Id: I3312b0fe1694ad02af5307bcbaf233ac63058846
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/11166
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Backport of http://review.gluster.com/11111
Problem:
1) ec_access/ec_readlink_/ec_readdir[p] _cbks are trying to recover only from
ENOTCONN.
2) When the fop succeeds it unwinds right away. But when its
ec_fop_manager resumes, if the number of bricks that are up is less than
ec->fragments, the the state machine will resume with -EC_STATE_REPORT which
unwinds again. This will lead to crashes.
Fix:
- If fop fails retry on other subvols, as ESTALE/ENOENT/EBADFD etc are also
recoverable.
- unwind success/failure in _cbks
BUG: 1229331
Change-Id: I7510984a237761efba65e872313a8ede8b7543e5
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/11128
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
|
|
|
|
|
|
|
|
|
|
|
| |
Backport of http://review.gluster.com/11178
BUG: 1230653
Change-Id: I708d4d84b1341fb7cb515808836721735dc63f14
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/11179
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Backport of http://review.gluster.com/11078
Problem:
ec_update_size_version expects all the keys it did xattrop with to come in
response so that it can set the values again in ec_update_size_version_done.
But EC_XATTR_DIRTY is not combined so the value won't be present in the
response. So ctx->post/pre_dirty are not updated in
ec_update_size_version_done. So these values are still non-zero. When
ec_unlock_now is called as part of flush's unlock phase it again tries to
perform same xattrop for EC_XATTR_DIRTY. But ec_update_size_version is not
expected to be called in unlock phase of flush because ec_flush_size_version
should have reset everything to zero and unlock is never invoked from
ec_update_size_version_done for flush/fsync/fsyncdir. This leads to stale lock
which leads to hang.
Fix:
EC_XATTR_DIRTY is removed in ex_xattrop_cbk and is never combined with other
answers. So remove handling of this in the response.
BUG: 1228160
Change-Id: I657efca6e706e7acb541f98f526943f67562da9f
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/11084
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Backport of http://review.gluster.org/10770
Backport of http://review.gluster.org/10806
Backport of http://review.gluster.org/10787
Backport of http://review.gluster.org/10868
Backport of http://review.gluster.com/10852
- When a blocking lock is requested, lock request is succeeded even when
ec->fragment number of locks are acquired successfully in non-blocking locking
phase. This will lead to fop succeeding only on the bricks where the locks are
acquired, leading to the necessity of self-heals. To prevent these un-necessary
self-heals, if the remaining locks fail with EAGAIN in non-blocking lock phase
try blocking locking phase instead.
- Handle lookup failures while op in progress
- cluster/ec: Correctly cleanup delayed locks
When a delayed lock is pending, a graph switch doesn't correctly
terminate it. This means that the update of version and size xattrs
is lost, causing EIO errors. This patch handles GF_EVENT_PARENT_DOWN
event to correctly finish pending udpdates before completing the
graph switch.
- Fix use after free crash
ec_heal creates ec_fop_data but doesn't run ec_manager. ec_fop_data_allocate
adds this fop to ec->pending_fops, because ec_manager is not run on this heal
fop it is never removed from ec->pending_fops. When it is accessed after free
it leads to crash. It is better to not to add HEAL fops to ec->pending_fops
because we don't want graph switch to hang the mount because of a BIG
file/directory heal.
- Forced unlock when lock contention is detected
EC uses an eager lock mechanism to optimize multiple read/write
requests on the same entry or inode. This increases performance
but can have adverse results when other clients try to access the
same entry/inode. To solve this, this patch adds a functionality
to detect when this happens and force an earlier release to not
block other clients.
The method consists on requesting GF_GLUSTERFS_INODELK_COUNT and
GF_GLUSTERFS_ENTRYLK_COUNT for all fops that take a lock. When this
count is greater than one, the lock is marked to be released. All
fops already waiting for this lock will be executed normally before
releasing the lock, but new requests that also require it will be
blocked and restarted after the lock has been released and reacquired
again.
Another problem was that some operations did correctly lock the
parent of an entry when needed, but got the size and version xattrs
from the entry instead of the parent.
This patch solves this problem by binding all queries of size and
version to each lock and replacing all entrylk calls by inodelk ones
to remove concurrent updates on directory metadata. This also allows
rename to correctly update source and destination directories.
BUG: 1225279
Change-Id: I02a6084b138dd38e018a462347cd9ce38610c7ef
Reviewed-on: http://review.gluster.org/10926
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Backport of http://review.gluster.com/9407
When a file does not exist on a brick but it does on others, there
could be problems trying to access it because there was some loc_t
structures with null 'pargfid' but 'name' was set. This forced
inode resolution based on <pargfid>/name instead of <gfid> which
would be the correct one. To solve this problem, 'name' is always
set to NULL when 'pargfid' is not present.
Another problem was caused by an incorrect management of errors
while doing incremental locking. The only allowed error during an
incremental locking was ENOTCONN, but missing files on a brick can
be returned as ESTALE. This caused an EIO on the operation.
This patch doesn't care of errors during an incremental locking. At
the end of the operation it will check if there are enough successfully
locked bricks to continue or not.
BUG: 1220011
Change-Id: I4a1e6235d80e20ef7ef12daba0807b859ee5c435
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/10701
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- With this change, the xattr will represent if the file needs to be healed or
not. It will have different values for data/entry and metadata changes.
- inode ref leaks and dict_set_dynstr related leaks fixed
- Added support for trylock/lock based on heal-cmd execution or not
in data heal.
- Made fixes to pass regression runs
Change-Id: I9d8def4c2badde18a76b7898816fecfac113737a
BUG: 1216303
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/10385
Reviewed-on: http://review.gluster.org/10693
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adding 64 bits in "version" key of extended attributes. First 64 bits (Left)
represents Data version. Last 64 bits (right) represents Meta Data version.
Note: 3.7 and 3.6 version ec can't co-exist with this change because xattrop in
3.6 will fail with ERANGE as the buffer passed to it will be '8' bytes where as
the value will be 16 bytes in 3.7. Where as 3.7 version clients can work with
old version files. For upgrades we need to tell users to complete heals and
then upgrade
BUG: 1215265
Change-Id: Ib85114680cb7e75b8371c984d9f7b6401c1ffb93
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-on: http://review.gluster.org/10312
Reviewed-on: http://review.gluster.org/10626
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: Ia7d43cb3b222db34ecb0e35424f1766715ed8e6a
BUG: 1219358
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-on: http://review.gluster.org/10176
Reviewed-on: http://review.gluster.org/10625
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Backport of http://review.gluster.org/10372
If the version numbers do not match, then writes are performed only on at least
N-R bricks which have same version. But if we want to do healing of files which
are constantly modified we need to allow writes on subvols that are undergoing
heal. Data healing will mark 62nd bit while the heal is going on. When the data
transaction sees that this bit is set it needs to perform the fop on that
subvol irrespective of whether the versions match or do not match. Fop is
considered successful only if N-R non-healing bricks succeed.
BUG: 1216303
Change-Id: I79aaf1ac86357c51547cdaaa56cf7338004cc512
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/10440
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
glusterfs relies on Linux uuid implementation, which
API is incompatible with most other systems's uuid. As
a result, libglusterfs has to embed contrib/uuid,
which is the Linux implementation, on non Linux systems.
This implementation is incompatible with systtem's
built in, but the symbols have the same names.
Usually this is not a problem because when we link
with -lglusterfs, libc's symbols are trumped. However
there is a problem when a program not linked with
-lglusterfs will dlopen() glusterfs component. In
such a case, libc's uuid implementation is already
loaded in the calling program, and it will be used
instead of libglusterfs's implementation, causing
crashes.
A possible workaround is to use pre-load libglusterfs
in the calling program (using LD_PRELOAD on NetBSD for
instance), but such a mechanism is not portable, nor
is it flexible. A much better approach is to rename
libglusterfs's uuid_* functions to gf_uuid_* to avoid
any possible conflict. This is what this change attempts.
BUG: 1206587
Change-Id: I9ccd3e13afed1c7fc18508e92c7beb0f5d49f31a
Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
Reviewed-on: http://review.gluster.org/10017
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I5d3aca101c8cdda406d31d06c40404fa6a2b7170
BUG: 1192378
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/9995
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This xattr will be incremented before each data modifying operation and
decremented after it. This will add the possibility to detect partially
updated writes and refuse them on reads.
It will also be useful for interacting with index xlator and have a way
to heal dispersed files from the self-heal daemon.
Change-Id: Ie644a8dd074ae0f254c809c5863bdb030be5486a
BUG: 1190581
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/9607
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch solves some problems that caused dispersed volumes to not
pass posix smoke tests:
* Problems in open/create with O_WRONLY
Opening files with -w- permissions using O_WRONLY returned an EACCES
error because internally O_WRONLY was replaced with O_RDWR.
* Problems with entrylk on renames.
When source and destination were the same, ec tried to acquire
the same entrylk twice, causing a deadlock.
* Overwrite of a variable when reordering locks.
On a rename, if the second lock needed to be placed at the beggining
of the list, the 'lock' variable was overwritten and later its timer
was cancelled, cancelling the incorrect one.
* Handle O_TRUNC in open.
When O_TRUNC was received in an open call, it was blindly propagated
to child subvolumes. This caused a discrepancy between real file
size and the size stored into trusted.ec.size xattr. This has been
solved by removing O_TRUNC from open and later calling ftruncate.
Change-Id: I20c3d6e1c11be314be86879be54b728e01013798
BUG: 1161886
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/9420
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Changes introduced by this patch:
* Fix an incorrect error propagation when the state of the life
cycle of a fop returns an error.
* Fix incorrect unlocking of failed locks.
* Return ENOTCONN if there aren't enough bricks online.
* In readdir(p) check that the fd has been successfully open by
a previous opendir.
Change-Id: Ib44f25a1297849ebcbab839332f3b6359f275ebe
BUG: 1162805
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/9098
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch solves 3 issues detected by coverity scan:
CID1241484 Data race condition
CID1241486 Data race condition
CID1256173 Thread deadlock
With this patch, inode lock is never acquired inside a region locked
with fop->lock.
Change-Id: I35c4633efd1b68b9f72b42661fa7c728b1f52c6a
BUG: 1170254
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/9230
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Three problems have been detected:
1. Self healing is executed in background, allowing the fop that
detected the problem to continue without blocks nor delays.
While this is quite interesting to avoid unnecessary delays,
it can cause spurious failures of self-heal because it may
try to recover a file inside a directory that a previous
self-heal has not recovered yet, causing the file self-heal
to fail.
2. When a partial self-heal is being executed on a directory,
if a full self-heal is attempted, it won't be executed
because another self-heal is already in process, so the
directory won't be fully repaired.
3. Information contained in loc's of some fop's is not enough
to do a complete self-heal.
To solve these problems, I've made some changes:
* Improved ec_loc_from_loc() to add all available information
to a loc.
* Before healing an entry, it's parent is checked and partially
healed if necessary to avoid failures.
* All heal requests received for the same inode while another
self-heal is being processed are queued. When the first heal
completes, all pending requests are answered using the results
of the first heal (without full execution), unless the first
heal was a partial heal. In this case all partial heals are
answered, and the first full heal is processed normally.
* An special virtual xattr (not physically stored on bricks)
named 'trusted.ec.heal' has been created to allow synchronous
self-heal of files.
Now, the recommended way to heal an entire volume is this:
find <mount> -d -exec getfattr -h -n trusted.ec.heal {} \;
Some minor changes:
* ec_loc_prepare() has been renamed to ec_loc_update().
* All loc management functions return 0 on success and -1 on
error.
* Do not delay fop unlocks if heal is needed.
* Added basic ec xattrs initially on create, mkdir and mknod
fops.
* Some coding style changes
Change-Id: I2a5fd9c57349a153710880d6ac4b1fa0c1475985
BUG: 1161588
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/9072
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: Iae90ade2421898417b53dec0417a610cf306c44b
BUG: 1168167
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/9201
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To avoid inconsistent directory listings, a full self-heal
cannot happen on a directory until all its contents have
been healed. This is controlled by a manual command using
getfattr recursively and in post-order.
While navigating the directories, sometimes an (f)stat fop
can be sent. This fop caused a full self-heal of the directory.
This patch makes that (f)stat only initiates a partial self-heal.
Change-Id: I0a92bda8f4f9e43c1acbceab2d7926944a8a4d9a
BUG: 1163760
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/9117
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some issues in ec xlator made that rebalance didn't complete
successfully and generated some warnings and errors in the
log. The most critical error was a race condition that caused
false corruption detection when two specific operations were
executed sequentially and they shared the same lock.
This explains the problem:
1. A setxattr is issued.
2. setxattr: ec locks the inode before updating the xattr.
3. setxattr: The xattr is updated.
4. setxattr: Upper xlator is notified that the operation completed.
5. setxattr: A background task is initiated to update the version
of the file.
6. A stat is issued on the same file.
7. stat: Since the lock is already acquired, it's reused.
8. stat: A lookup is issued to determine version and size
information of the file.
At this point, operations 5 and 8 can interfere. This can make that
lookup sees different information on each brick, determining that
some bricks are corrupted and incorrectly excluding them from the
operation and initiating a self-heal. In some cases this false
detection combined with self-heal could lead to invalid updates of
the trusted.ec.size xattr, leaving the file smaller than it should
be.
This only happens if the first operation does not perform a lookup,
because chained operations reuse the information returned by the
previous one, avoiding this kind of problems.
To solve this, now the background update is executed atomically with
the posterior unlock. This avoids some reuses of the lock while
updating. However this reduces performance because the window in
which new requests can reuse the lock is much smaller now. This has
been alleviated by using the same technique implemented in AFR (i.e.
waiting some time before releasing the lock).
Some minor changes also introduced in this patch:
* Bug in management of 'trusted.glusterfs.pathinfo' that was writing
beyond the allocated space.
* Uninitialized variable.
* trusted.ec.config was not created for regular files created with
mknod.
* An invalid state was used in access fop.
Change-Id: Idfaf69578ed04dbac97a62710326729715b9b395
BUG: 1152902
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8947
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: Doing an 'ls' of a directory that has been modified while one
of the bricks was down, sometimes returns the old directory
contents.
Cause: Directories are not marked when they are modified as files are.
The ec xlator balances requests amongst available and healthy
bricks. Since there is no way to detect that a directory is
out of date in one of the bricks, it is used from time to time
to return the directory contents.
Solution: Basically the solution consists in use versioning information
also for directories, however some additional changes have
been necessary.
Changes:
* Use directory versioning:
This required to lock full directory instead of a single entry for
all requests that add or remove entries from it. This is needed to
allow atomic version update. This affects the following fops:
create, mkdir, mknod, link, symlink, rename, unlink, rmdir
Another side effect is that opendir requires to do a previous
lookup to get versioning information and discard out of date
bricks for subsequent readdir(p) calls.
* Restrict directory self-heal:
Till now, when one discrepancy was found in lookup, a self-heal
was automatically started. This caused the versioning information
of a bad directory to be healed instantly, making the original
problem to reapear again.
To solve this, when a missing directory is detected in one or more
bricks on lookup or opendir fops, only a partial self-heal is
performed on it. A partial self-heal basically creates the
directory but does not restore any additional information.
This avoids that an 'ls' could repair the directory and cause the
problem to happen again. With this change, output of 'ls' is
always consistent. However, since the directory has been created
in the brick, this allows any other operation on it (create new
files, for example) to succeed on all bricks and not add additional
work to the self-heal process.
To force a self-heal of a directory, any other operation must be
done on it. For example a getxattr.
With these changes, the correct healing procedure that would avoid
inconsistent directory browsing consists on a post-order traversal
of directoriesi being healed. This way, the directory contents will
be healed before healing the directory itslef.
* Additional changes to fix self-heal errors
- Don't use fop->fd to decide between fd/loc.
open, opendir and create have an fd, but the correct data is in
loc.
- Fix incorrect management of bad bricks per inode/fd.
- Fix incorrect selection of fop's target bricks when there are bad
bricks involved.
- Improved ec_loc_parent() to always return a parent loc as
complete as possible.
Change-Id: Iaf3df174d7857da57d4a87b4a8740a7048b366ad
BUG: 1149726
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8916
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some additional 32 bits issues have been added by a recent patch.
This patch solves it.
Change-Id: Ice81032fbe8e36e5ccad19a781b7876891993906
BUG: 1146903
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8882
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Emmanuel Dreyfus <manu@netbsd.org>
Tested-by: Emmanuel Dreyfus <manu@netbsd.org>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The final lookup made to restore final file attributes after a self-heal
did clear the mask of bad bricks, causing that the final setattr won't
modify any brick at all. This caused that some attriutes, specially the
modification time of the file didn't get updated properly.
Now the mask of healed bricks is saved before doing the last lookup.
It's also used to correctly report the repaired bricks.
Change-Id: Ib94083c9e1b562515dfb54f9574120f1f031dccc
BUG: 1149723
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8905
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Operations processed by ec_dispatch_one() were not correctly
completed by ec_complete(), leaving some structures in memory.
Now ec_complete() also calls ec_resume() for this type of fops.
Change-Id: Iaf0f2e8227399ebb735db9f1bd007593e0ece041
BUG: 1148520
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8896
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To simplify backward compatibility of the ec xlator when some
parameter or the implementation itself is changed, a new xattr
is added to each file with the configuration needed to recover
it.
The new attribute is called 'trusted.ec.config', and it's a 64-bit
value containing the following information:
8 bits: version of the config information (currently always 0)
8 bits: algorithm used to encode the file (currently always 0)
8 bits: size of the galois field (currently always 8)
8 bits: number of bricks
8 bits: redundancy
24 bits: chunk size (currently 512)
This new xattr could allow, in a future version, to have different
configurations per file.
Change-Id: I8c12d40ff546cc201fc66caa367484be3d48aeb4
BUG: 1140861
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8770
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The 64 bits 'trusted.ec.size' extended attribute was incorrectly
computed on 32 bits machines due to an overflow on negative
numbers.
Also changed some potentially dangerous uses of size_t in other
places.
Change-Id: Id76cfe49a2f350e564b5c71d8c8644fb9ce86662
BUG: 1125312
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8738
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch significantly improves performance of read/write
operations on a dispersed volume by reusing previous inodelk/
entrylk operations on the same inode/entry. This reduces the
latency of each individual operation considerably.
Inode version and size are also updated when needed instead
of on each request. This gives an additional boost.
Change-Id: I4b98d5508c86b53032e16e295f72a3f83fd8fcac
BUG: 1122586
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8369
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
|
Change-Id: I293917501d5c2ca4cdc6303df30cf0b568cea361
BUG: 1118629
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/7749
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|