| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
This reverts commit 81456ec2dfb312ae60c5c4e6f960a3cbf8aaaa4c.
Change-Id: Id03335117f5137f5d09781850bf4fba6eca0f73d
Reviewed-on: http://review.gluster.com/492
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch is a change in the way write transactions hold a lock
which optimizes the case of sequential writes from a single writer.
Lock phase of a transaction has two sub-phases. First is an attempt
to acquire locks in parallel by broadcasting non-blocking lock
requests. If lock aquistion fails on any server, then the held locks
are unlocked and revert to a blocking locked mode sequentially on
one server after another.
The change in this patch is to make the initial broadcasting lock
request attempt to acquire lock on the entire file. If this fails,
we revert back to the sequential "regional" blocking lock as before.
In the case where such an "eager" lock is granted in the non-blocking
phase, it gives rise to an opportunity for optimization. i.e, if
the next write transaction on the same FD arrives before the unlock
phase of the first transaction, it "takes over" the full file lock.
Similarly if yet another transaction arrives before the unlock phase
of the "optimized" transaction, that in turn "takes over" the lock
as well. The actual unlock now happens at the end of the last
"optimzed" transaction.
Any operation which arrives before the unlock phase of the previous
transaction is a potential candidate to become an "optimized"
transaction. In cases where the previous transaction had aquired
lock as a "regional" blocking lock, and the next transaction comes
in before its unlock phase, then it would not be an "optimized"
transaction.
Implied assumption
------------------
Since two or more transactions can now operate within the same
large lock, there is a possibility that overlapping transactions
can arrive at oppoosite orders on the servers. However in the
larger picture this is not possible as write-behind already
ensures that no two overlapping writes on an inode are in transit
at the same time. Overlapping writes across clients are not a
problem as they compete at locks anyways.
Theoretical benefits and potential harms
----------------------------------------
In case of a single writer: The benefits are large for sequential
writes. In the best case the entire file write can happen with just
one lock and unlock per server, provided writes are coming in fast
enough and getting pipelined by write-behind soon enough (which is
usually the case). If the writes are not coming in fast enough, then
the optimization "kicks in" for only those subsets of writes which
are close enough to get "piggybacked". For random writes the benefits
are the same as well. In any case the overall performance is better
than or equal to the performance without this optimization for a single
writer.
In case of multiple writers: When multiple writers are not writing
concurrently, there is no negative performance impact. When multiple
writers are writing concurrently to the same region, there is no
negative impact either, as they were previously getting arbitrated
at the locks translator too. In the case of multiple writers writing
to different regions concurrently, there will be an increased number
of "failovers" from failed parallel non-blocking to sequential blocking
regional locks. This above "worst case" has a simple workaround that
as soon as we detect > 1 open-fd-count in lookup xattr, we can disable
this optimization on those fds.
Beneficial side-effects
-----------------------
There is another similar optimization in AFR for changelogs which goes
by the name of "changelog-piggybacking". That works in a similar way where
pending flags get 'taken over' or 'piggybacked' by the next transaction
if its 'pre-op' phase kicks in before the 'post-op' phase of the
previous transaction. It has been observed that this changelog-piggybacking
optimization gives a saving of about ~55% savings of xattr calls hitting
the wire, measured across various types of network interfaces. The side
effect of this eager-lock optimization is that it gives an almost 100%
saving of xattr calls by making the optimistic-changelog work much more
efficiently as it gives a wider overlap of the xattr phases of two
consecutive transactions.
Change-Id: I41c02eb3b64c14c68ef66a344610ec3f024cd59d
BUG: 3409
Reviewed-on: http://review.gluster.com/243
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
|
|
|
|
|
|
|
|
| |
Change-Id: Ie67c4da49876555c3162909e474b9089a85f99a6
BUG: 3182
Reviewed-on: http://review.gluster.com/256
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
|
|
|
|
|
|
|
|
| |
Change-Id: Id1f1a91cf15d933d5621a0073ddaebe02df0f159
BUG: 3348
Reviewed-on: http://review.gluster.com/198
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
|
|
|
|
|
|
|
|
| |
Change-Id: Ibf5f45431d7a55b70d7304649af652d6f25bb688
BUG: 3348
Reviewed-on: http://review.gluster.com/183
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
|
|
|
|
|
|
|
|
| |
Signed-off-by: Amar Tumballi <amar@gluster.com>
Signed-off-by: Vijay Bellur <vijay@dev.gluster.com>
BUG: 2346 (Log message enhancements in GlusterFS - phase 1)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2346
|
|
|
|
|
|
|
|
| |
Signed-off-by: Amar Tumballi <amar@gluster.com>
Signed-off-by: Vijay Bellur <vijay@dev.gluster.com>
BUG: 2346 (Log message enhancements in GlusterFS - phase 1)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2346
|
|
|
|
|
|
|
|
| |
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Signed-off-by: Vijay Bellur <vijay@dev.gluster.com>
BUG: 2517 (the size of allocated memory may be wrong)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2517
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The standard way of maintaining changelog in replicate has been to
write out pending flags and to unset the pending flag post the
actual operation.
This new optimization kicks in only when all subvolumes are up.
The optimization is that, during pre-op, no changelog is written for
METADATA and ENTRY/RENAME operations. If during the operation nothing
failed, no changelog is updated in post-op either. If however,
something does fail during an operation, then, pending flags get
written during post op pointing only towards the failed nodes.
DATA transactions continue to work the way they are.
If one subvolume is down, pending flags are written in pre-op changelog
itself as before.
The impact of this optimization is only in the case when both servers
die or the client dies while the 'FOP' stage of the transaction is
in progress. By nature of METADATA and ENTRY operations, detecting a
mismatch later is not dependent on the presence of changelog. Changelog
only determines the direction in which self-heal happens for these types
of transactions. For the direction too this optimization does not have
a major impact because in the cases of failure (both servers dieing or
client dieing) the final state (direction of self-heal) would be
arbitrary anyways as the syscall wouldn't have completed.
Signed-off-by: Anand V. Avati <avati@blackhole.gluster.com>
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
BUG: 2068 (performance enhancements)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2068
|
|
|
|
|
|
|
|
| |
Signed-off-by: Vijay Bellur <vijay@gluster.com>
Signed-off-by: Vijay Bellur <vijay@dev.gluster.com>
BUG: 971 (dynamic volume management)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=971
|
|
|
|
|
|
|
|
| |
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Signed-off-by: Vijay Bellur <vijay@dev.gluster.com>
BUG: 1388 ()
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=1388
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In every transaction check if the currently set read child in the
inode context failed in the fop and set it to another subvol on
which the latest fop has passed. This will prevent read fops landing
on subvols which have witnessed a failure.
Signed-off-by: Anand V. Avati <avati@amp.gluster.com>
Signed-off-by: Vijay Bellur <vijay@dev.gluster.com>
BUG: 1172 (ls -lh on NFS mount of 2-mirror replicate gives incorrect file size)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=1172
|
|
|
|
|
|
|
|
|
|
|
|
| |
use a changelog piggybacking optimization instead of first-write-to-flush
optimization and do other cleanups (removal of post-post-op hook etc.)
Signed-off-by: Anand V. Avati <avati@blackhole.gluster.com>
Signed-off-by: Anand V. Avati <avati@amp.gluster.com>
Signed-off-by: Vijay Bellur <vijay@dev.gluster.com>
BUG: 1235 (Bug for all pump/migrate commits)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=1235
|
|
|
|
|
|
|
|
|
| |
Signed-off-by: Anand V. Avati <avati@blackhole.gluster.com>
Signed-off-by: Anand V. Avati <avati@amp.gluster.com>
Signed-off-by: Vijay Bellur <vijay@dev.gluster.com>
BUG: 1235 (Bug for all pump/migrate commits)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=1235
|
|
|
|
|
|
|
|
| |
Signed-off-by: Pavan Vilas Sondur <pavan@gluster.com>
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
BUG: 960 ()
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=960
|
|
|
|
|
|
|
|
| |
Signed-off-by: Pavan Vilas Sondur <pavan@gluster.com>
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
BUG: 1235 (Bug for all pump/migrate commits)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=1235
|
|
|
|
|
|
|
|
| |
Signed-off-by: Vijay Bellur <vijay@gluster.com>
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
BUG: 824 (Crash in afr rename transaction)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=824
|
|
|
|
|
|
|
|
|
|
|
|
| |
reset pre_op_done[i] to 0 after issuing a postop in flush. this was
missed during the introduction of pre_op_done[] array and was resulting
in a lot of spurious self heals when spurious flushes were received
Signed-off-by: Anand V. Avati <avati@blackhole.gluster.com>
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
BUG: 170 (Auto-heal fails on files that are open()-ed/mmap()-ed)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=170
|
|
|
|
|
|
|
|
|
|
|
| |
alloca.h should be included on a platform-specific basis.
Lets common-utils.h handle that.
Signed-off-by: Vikas Gorur <vikas@gluster.com>
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
BUG: 349 (FreeBSD compilation error (alloca.h).)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=349
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch completes the previous patch for self-heal of
open fds in replicate.
If an fd was never opened on a subvolume, we remember that
and do the open after we've done self-heal on that fd.
Signed-off-by: Vikas Gorur <vikas@gluster.com>
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
BUG: 170 (Auto-heal fails on files that are open()-ed/mmap()-ed)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=170
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch brings in partial support for self-heal of open
fds. The precondition is that the fd should have been opened
successfully during the initial open() (or create()), and we
assume that protocol/client has successfully reopened the fd
when the subvolume comes back up.
It works by doing an "up/down flush" (a dummy flush transaction
to do post-op wherever necessary) and then triggering
data self-heal on the file in the post-post-op hook of the
dummy flush transaction. This ensures that any writes
that come in during self-heal will wait until self-heal completes.
The up/down flush is also done when a subvolume goes down,
so that post-op is done on all subvolumes where pre-op was done.
Signed-off-by: Vikas Gorur <vikas@gluster.com>
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
BUG: 170 (Auto-heal fails on files that are open()-ed/mmap()-ed)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=170
|
|
|
|
|
|
|
|
| |
Signed-off-by: Vikas Gorur <vikas@gluster.com>
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
BUG: 170 (Auto-heal fails on files that are open()-ed/mmap()-ed)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=170
|
|
|
|
|
|
|
|
|
|
|
| |
Data self-heal now holds blocking locks, and instead of locking
on all subvolumes, it only locks on {data-lock-server-count} subvolumes.
Signed-off-by: Vikas Gorur <vikas@gluster.com>
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
BUG: 170 (Auto-heal fails on files that are open()-ed/mmap()-ed)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=170
|
|
|
|
|
|
|
|
|
|
|
| |
For ENTRY_RENAME_TRANSACTIONs, keep track separately whether the
lower_path and the higher_path have been locked, and unlock only
those which have been.
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
BUG: 112 (parallel deletion of files mounted by different clients on the same back-end hangs and/or does not completely delete)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=112
|
|
|
|
|
|
|
| |
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
BUG: 112 (parallel deletion of files mounted by different clients on the same back-end hangs and/or does not completely delete)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=112
|
|
|
|
|
|
|
|
|
| |
mark a subvol with held lock only if op_ret == 0
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
BUG: 112 (parallel deletion of files mounted by different clients on the same back-end hangs and/or does not completely delete)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=112
|
|
|
|
|
|
|
|
|
|
|
|
| |
transactions.
Hold the lock on the {higher_path} only after the lock on the
{lower_path} has been granted successfully.
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
BUG: 112 (parallel deletion of files mounted by different clients on the same back-end hangs and/or does not completely delete)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=112
|
|
|
|
| |
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
each of the subvolume
- This patch fixes bug #29.
- Using separate copies of dictionaries also eliminates a potential bug in a
setup consisting of afr with a posix and client, each having io-threads on
top as children. Since posix_xattrop after performing required operations
on the xattr array passed in dictionary, sets the result at the same key
and in the same dictionary passed as input argument,
there can be race conditions where in the results of the operation on
posix-child can be sent to the other child as input argument for xattrop,
which ofcourse is wrong.
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
|
|
|
|
| |
Signed-off-by: Anand V. Avati <avati@amp.gluster.com>
|
|
|
|
| |
__if_fd_pre_op_done - reset fd_ctx->pre_op_done to 0 so that double flushes do not result in two xattrop() calls
|
|
|
|
|
|
|
|
| |
Save the original pid while locking and restore it
after the FOP is done. This ensures posix-locks can
release locks (fcntl) properly.
Signed-off-by: Anand V. Avati <avati@amp.gluster.com>
|
|
|
|
|
|
| |
subvolumes while keeping existing data.
Signed-off-by: Anand V. Avati <avati@amp.gluster.com>
|
|
|
|
|
|
| |
If a writev fails, remember it by marking it in the fd context.
Signed-off-by: Anand V. Avati <avati@amp.gluster.com>
|
|
|
|
| |
Signed-off-by: Anand V. Avati <avati@amp.gluster.com>
|
|
|
|
|
|
|
| |
Earlier the check was in afr_flush(), which caused race conditions
with writev()
Signed-off-by: Anand V. Avati <avati@amp.gluster.com>
|
| |
|
|
|
|
| |
Signed-off-by: Anand V. Avati <avati@amp.gluster.com>
|
|
|
|
|
|
| |
updated copyright header to include 2009.
Signed-off-by: Anand V. Avati <avati@amp.gluster.com>
|
|
|
|
| |
Signed-off-by: Anand V. Avati <avati@amp.gluster.com>
|
|
|