gettimeofday() returns the current wall clock time and timezone.
Using these functions to measure the passage of time (how long an
operation took) therefore seems like a no-brainer. However, they
suffer from some limitations:
a. They have a low resolution: "high-performance" timing, by
definition, requires clock resolutions in the microseconds
or better.
b. They can jump forwards and backwards in time: computer
clocks all tick at slightly different rates, which causes
the time to drift. Most systems have NTP enabled, which
periodically adjusts the system clock to keep it in sync
with "actual" time. The adjustment can cause the clock to
suddenly jump forward (artificially inflating your timing
numbers) or jump backwards (causing your timing calculations
to go negative or hugely positive). In such cases the timer
thread could go into an infinite loop.
From 'man gettimeofday':
----------
..
..
The time returned by gettimeofday() is affected by discontinuous
jumps in the system time (e.g., if the system administrator manually
changes the system time). If you need a monotonically increasing
clock, see clock_gettime(2).
..
..
----------
Rationale:
For calculating interval timing for the timer thread, all that is
needed is a clock that acts as a simple counter incrementing at a
stable rate. To avoid the jumps caused by "wall time", this counter
must be monotonic: one that can never "tick" backwards.
Change-Id: I701d31e71a85a73d21a6c5cd15583e7a5a645eeb
BUG: 1017993
Signed-off-by: Harshavardhana <harsha@harshavardhana.net>
Reviewed-on: http://review.gluster.org/6070
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
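
For illustration, a minimal sketch of interval timing with a
monotonic clock (plain POSIX, not the patched timer code; link with
-lrt on older glibc):

#include <stdio.h>
#include <time.h>

int
main (void)
{
        struct timespec start, end;
        double elapsed;

        clock_gettime (CLOCK_MONOTONIC, &start);
        /* ... the operation being timed ... */
        clock_gettime (CLOCK_MONOTONIC, &end);

        /* A monotonic clock never ticks backwards, so the result can
         * never go negative, regardless of NTP adjustments. */
        elapsed = (end.tv_sec - start.tv_sec) +
                  (end.tv_nsec - start.tv_nsec) / 1e9;
        printf ("elapsed: %.9f s\n", elapsed);
        return 0;
}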
Currently, to know the number of files to be healed, the user either
has to go to the backend and check the number of entries present in
the indices/xattrop directory of each brick, or run the "gluster
volume heal vol-name info" command. If a volume consists of a large
number of bricks, going to each backend and counting the entries is
a time-consuming task, and if the number of entries in the indices/
xattrop directory is very large, the info command also consumes a
lot of time.
So, as a feature, a new command is implemented.
Command 1: gluster volume heal vn statistics heal-count
This command will get the number of entries present in
every brick of a volume. The output displays only the entry
count.
Command 2: gluster volume heal vn statistics heal-count
replica 192.168.122.1:/home/user/brickname
Use this form when only one replica is of concern:
providing any one brick of a replica will get the number of
entries to be healed for that replica only.
Example:
Replicate volume with replica count 2.
Backend status:
--------------
[root@dhcp-0-17 xattrop]# ls -lia | wc -l
1918
NOTE: Out of 1918, 2 entries are <xattrop-gfid> dummy
entries, so the actual number of entries to be healed is
1916.
[root@dhcp-0-17 xattrop]# pwd
/home/user/2ty/.glusterfs/indices/xattrop
Command output:
--------------
Gathering count of entries to be healed on volume volume3 has been successful
Brick 192.168.122.1:/home/user/22iu
Status: Brick is Not connected
Entries count is not available
Brick 192.168.122.1:/home/user/2ty
Number of entries: 1916
Change-Id: I72452f3de50502dc898076ec74d434d9e77fd290
BUG: 1015990
Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/6044
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
"gluster volume heal volumename statistics" command gives the summary
of the afr crawl done based on the entries present in the xattrop
directory. Whenever afr crawls are attempted, the beginning time of
crawl, end time of crawl, no of files healed, heal-failed count and
number of files in split brain are shown along with the type of the
crawl. If crawl is already in progress then it will give the number
of files healed, heal failed count and number of files in split-brain
from the beginning of the crawl and instead of telling the end time of
the crawl, "CRAWL IN PROGRESS" message will be shown.
Output format:
command: "gluster volume heal volume-name statistics"
Output:
Gathering afr crawl statistics crawl statistics on volume volume-name
has been successful
------------------------------------------------
Crawl statistics for brick no 0
Hostname of brick 192.168.122.248
Starting time of crawl: Wed Jul 10 15:52:38 2013
Ending time of crawl: Wed Jul 10 15:52:38 2013
Type of crawl: INDEX
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 0
Starting time of crawl: Wed Jul 10 15:52:38 2013
Ending time of crawl: Wed Jul 10 15:52:38 2013
Type of crawl: INDEX
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 0
------------------------------------------------
Crawl statistics for brick no 1
Hostname of brick 192.168.122.1
Starting time of crawl: Wed Jul 10 15:52:42 2013
Ending time of crawl: Wed Jul 10 15:52:42 2013
Type of crawl: INDEX
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 0
Starting time of crawl: Wed Jul 10 15:52:42 2013
Ending time of crawl: Wed Jul 10 15:52:42 2013
Type of crawl: INDEX
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 0
--------------------------------------------------
Change-Id: I10bf9d10b005741db9973fb1352e0dd59ed99aa9
BUG: 949400
Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/4790
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
Quota size xattrs are not maintained by afr. There is a possibility
that they differ even when both of the directory changelog xattrs
suggest everything is fine. So if there is at least one 'source',
check among the sources for the one with the maximum quota size;
otherwise check among all the available copies for the maximum quota
size. This way, if there is a source and stale copies, it always
votes for the 'source'.
Change-Id: Ia222379cbafa7043dd03f533c105860f2c7b8b0d
BUG: 1016683
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/6052
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Varun Shastry <vshastry@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
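
A minimal sketch of the selection rule described above; the names
and data layout are hypothetical stand-ins for afr's internal state:

#include <stdint.h>

/* Prefer the maximum quota size among sources; fall back to the
 * maximum among all copies when no source exists. */
uint64_t
pick_quota_size (const uint64_t *sizes, const int *is_source, int n)
{
        uint64_t max_all = 0, max_src = 0;
        int      i, have_src = 0;

        for (i = 0; i < n; i++) {
                if (sizes[i] > max_all)
                        max_all = sizes[i];
                if (is_source[i]) {
                        have_src = 1;
                        if (sizes[i] > max_src)
                                max_src = sizes[i];
                }
        }
        return have_src ? max_src : max_all;
}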
'-' can be present in a volume name. This may lead to domain
collisions in the future.
Tests:
Checked in gdb that the domain comes with the ':' separator:
Breakpoint 1, pl_common_inodelk (frame=0x7fdabcce51a4,
this=0x8bde20, volume=0x8b50d0 "r2-replicate-0:self-heal",
inode=0x7fdab822f0e8, cmd=6, flock=0x7fdabc76eee4,
loc=0x7fdabc76ede4, fd=0x0, xdata=0x7fdabc6e0ab0) at inodelk.c:597
Change-Id: I4456ae35ac8bf21e6361c34e9ad437f744a2e84b
BUG: 993981
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/6025
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
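
A minimal sketch of building such a domain string (illustrative
only; it assumes ':' is not a legal character in a volume name,
which is why it avoids collisions):

#include <stdio.h>

int
main (void)
{
        char domain[256];

        /* volume name + ":" + lock purpose, matching the gdb output
         * above: "r2-replicate-0:self-heal". */
        snprintf (domain, sizeof (domain), "%s:%s",
                  "r2-replicate-0", "self-heal");
        puts (domain);
        return 0;
}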
"unlock failed on 1 unlock" seems meaningless in the log message. Improved it.
Change-Id: If67d3f9d4aa5310d0b6728a6c89fa58a5cc93d12
BUG: 1012947
Signed-off-by: Anuradha <atalur@redhat.com>
Reviewed-on: http://review.gluster.org/6012
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Change-Id: Ia7b324b86d6a7051d187106d7a060155e77defc5
BUG: 910217
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5238
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
Change-Id: I18583f14edf1011401be15744371e2a6b79d75cc
BUG: 1003842
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5763
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
If truncate/ftruncate is called with an offset equal to the current
size of the file, then skip the durability fsync and unwind quickly.
Change-Id: I0baec68d96c6d4d8217d33bd9738f7ed0d1b40c5
BUG: 958118
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/5737
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
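
A minimal sketch of the short-circuit in plain POSIX (not the actual
afr code):

#include <sys/stat.h>
#include <unistd.h>

/* Truncating to the current size changes nothing on disk, so the
 * durability fsync can be skipped entirely. */
int
truncate_durably (int fd, off_t offset)
{
        struct stat st;

        if (fstat (fd, &st) == 0 && st.st_size == offset)
                return 0;               /* no-op: skip fsync */
        if (ftruncate (fd, offset) != 0)
                return -1;
        return fsync (fd);
}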
Additional information for source and sinks are added.
Change-Id: I1704956ff86ac3ae36744efe7499c1d1c43faeaf
BUG: 968301
Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/5638
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
Problem:
internal_lock->lk_attempted_count keeps track of the number of
blocking locks attempted. lk_expected_count keeps track of the
number of locks expected. Here is the sequence of steps that leads
to the illusion that a full file lock is achieved, even without
attempting any lock.
2 mounts are doing dd on the same file. Both of them witness a brick
going down and coming back up again. Both of the mounts issue
self-heal:
1) Both mount-1 and mount-2 attempt full file locks in the self-heal
domain. Let's say mount-1 got the lock and mount-2 attempts a
blocking lock.
2) mount-1 attempts a full file lock in the data domain. It goes
into blocking mode because some other writes are in progress.
Eventually it gets the lock. But this leaves lk_attempted_count at 2,
and it is not reset. mount-1 completes syncing the data.
3) mount-1, before unlocking the final small range lock, attempts a
full file lock in the data domain to figure out the source/sink.
This is put into blocked mode again because some other writes are in
progress. But this time, seeing the stale value of
lk_attempted_count being equal to lk_expected_count, the
blocking_lock phase thinks it completed locking without acquiring a
single lock :-O.
4) mount-1 reads xattrs without any lock, but since it does not
modify the xattrs, no harm is done by this phase. It tries to do
unlocks, and the unlocks fail because the locks were never taken in
the data domain. mount-1 also unlocks the self-heal domain locks.
Our beloved mount-2 now gets the chance to cause horror :-(.
5) mount-2 gets the full range blocking lock in the self-heal
domain. Please note that this sets lk_attempted_count to 2.
6) mount-2 attempts a full range lock in the data domain; since
there are still writes going on, it switches to blocking mode. But
since lk_attempted_count is 2, which is the same as
lk_expected_count, the blocking phase thinks it actually got the
full range locks even though not a single lock request went out on
the wire.
7) mount-2 reads the change-log xattrs, which give the number of
operations in progress (let's call this 'X'). It does the syncing
and at the end of the sync decrements the changelog by 'X'. But
since that 'X' was introduced by 'X' transactions that are in
progress, they also decrement the changelog by 'X'. Effectively, for
'X' operations, 'X' pre-ops are done but 2 times 'X' post-ops are
done, resulting in negative changelog numbers.
Fix:
Reset lk_attempted_count and the inode locks array that is used to
remember granted locks.
Change-Id: Ic0a79cd16f32392ea7c790511343c73592bbe6bd
BUG: 1002698
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5736
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
Otherwise this results in a missing frame, causing a hang.
Change-Id: Ib5f3dc6a3999449faa2853cee2944af2fb065a20
BUG: 1002399
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/5731
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
The idea is to not leave the file in a FOOL-FOOL scenario when the
data transaction fails with EDQUOT on all the bricks, to avoid
adding unnecessary self-heal load to the system.
For directory transactions, don't leave a pending changelog when the
failures are seen on all the subvolumes.
Change-Id: I38a5561d1d581a78347a76a4a509514e4a0c3fb7
BUG: 969461
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5709
Reviewed-by: Anand Avati <avati@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
- Enhance syncenv_new() to accept scaling parameters of syncproc.
Previously the scaling parameters were hardcoded and decided at
compile time.
- New API synctask_create(), which returns the created synctask.
This is similar to synctask_new(), which only returned the status of
whether a synctask could be created or not.
A NULL cbk in synctask_create() means the task is "joinable". Until
synctask_join() is called on such a synctask, the task is not reaped
and its resources are not destroyed. The task remains in a zombie
state after synctask_fn returns and before synctask_join() is
called.
Change-Id: I368ec9037de9510d2ba951f0aad86aaf18d9a6b6
BUG: 986775
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/5365
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
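
A hypothetical usage sketch of the joinable-task pattern; the
prototypes are assumptions paraphrased from this commit message, not
verified against the libglusterfs headers:

/* Assumed per the commit text: a NULL cbk makes the task "joinable",
 * and it stays a zombie after synctask_fn returns until
 * synctask_join() reaps it. */
synctask_t *task = synctask_create (env, my_task_fn, NULL /* cbk */,
                                    frame, opaque);
if (task != NULL) {
        /* ... do other work while my_task_fn runs ... */
        synctask_join (task);   /* reap the task, free its resources */
}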
Change-Id: Ib0c3af6babc61dc3ed45252582876e2f243d6446
BUG: 958118
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5635
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
For write-once-read-many types of workload, choosing the largest
file to be the source will always resolve fool-fool scenarios
correctly. In other cases we fsync() the files and will have a
reliable 'wise man'.
Change-Id: Ic4dbea8d06db6d578fbcb866fb65ee2d066ac7ba
BUG: 958118
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/5519
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Durability of appending writes is implicit in the file size.
Performing an explicit fsync() is therefore unnecessary in such
cases, as self-heal can check the size of the file when the pending
changelog is ambiguous.
Change-Id: I05446180a91d20e0dbee5de5a7085b87d57f178a
BUG: 927146
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/5501
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
When the user has explicitly configured the client to disable entry
self-heal, it is wrong to do the healing in opendir, so skip it.
This is especially useful to reduce opendir() times after graph
switches.
Change-Id: Ic6eb9ff2334a5b8417f2f35410a366a536bad5df
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/5528
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Change-Id: I5701bf115e0aa1adb4fb52f5418534910a2268d4
BUG: 994959
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5531
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
When one of the bricks of a 1x2 replicate volume is down, writes to
the volume cause a race between afr_flush_wrapper() and
afr_flush_cbk(). The latter frees up the call_frame's local
variables in the unwind, while the former accesses them in the for
loop and sends a stack wind a second time. This causes the FUSE
mount process (glusterfs) to receive a SIGSEGV when the
corresponding unwind is hit.
This patch adds the call_count check which was removed when
afr_flush_wrapper() was introduced in commit 29619b4e.
Change-Id: I87d12ef39ea61cc4c8244c7f895b7492b90a7042
BUG: 988182
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/5393
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
Let's say mount1 has an eager-lock (full lock) and, after the
eager-lock is taken, mount2 opened the same file; mount2 won't be
able to perform any data operations until mount1 releases the
eager-lock. To avoid such a scenario, do not enable eager-lock for a
transaction if the open-fd-count is > 1. Delaying of changelog
piggybacking is avoided in this situation.
Change-Id: I51b45d6a7c216a78860aff0265a0b8dabc6423a5
BUG: 910217
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5432
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: venkatesh somyajulu <vsomyaju@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Change-Id: I95e47e589419dc6a032cbd8ba01964b6c176c2d5
BUG: 927146
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5408
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Change-Id: Ib99f79d3fa607c818dbc62006516480f598d8add
BUG: 886998
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4640
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Because of the extra fsync()s issued by the AFR transaction, they
could potentially "clog" all the io-threads, denying unrelated
operations from making progress.
This patch assigns a dedicated thread to issue fsyncs, as an
experimental feature to understand performance characteristics of
the approach.
As a basis, incoming individual fsync requests are grouped into
batches, falling in the same @batch-fsync-delay-usec window of
time. These windows can extend in practice, as processing of
the previous batch can take longer than @batch-fsync-delay-usec
while new requests are getting batched.
The feature supports four modes (similar to the -S modes of fs_mark):
- syncfs: In this mode one syncfs() is issued per batch, instead
of N fsync()s (one per file.)
- syncfs-single-fsync: In this mode one syncfs() is issued per
batch (which, on Linux, guarantees the completion of write-out
of dirty pages in the filesystem up to that point) and one single
fsync() to synchronize or flush the controller/drive cache. This
corresponds to -S 2 of fs_mark.
- syncfs-reverse-fsync: In this mode, one syncfs() is issued per
batch, and all the open files in that batch are fsync()'ed in
the reverse order of the queue. This corresponds to -S 4 of
fs_mark.
- reverse-fsync: In this mode, no syncfs() is issued and all the
files in the batch are fsync()'ed in the reverse order. This
corresponds to -S 3 of fs_mark.
Change-Id: Ia1e170a810c780c8d80e02cf910accc4170c4cd4
BUG: 927146
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/4746
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
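
A minimal sketch of the syncfs-reverse-fsync mode over one batch of
fds (plain Linux API, not the translator code):

#define _GNU_SOURCE
#include <unistd.h>

/* One syncfs() for the whole batch, then fsync() each file in
 * reverse queue order (corresponds to -S 4 of fs_mark). */
void
batch_syncfs_reverse_fsync (int *fds, int n)
{
        int i;

        if (n <= 0)
                return;
        syncfs (fds[0]);        /* write out the filesystem once */
        for (i = n - 1; i >= 0; i--)
                fsync (fds[i]); /* flush controller/drive cache */
}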
Change-Id: Ieda11870c65edae500140b6c061f15a7b3f264f3
BUG: 986905
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5370
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Using the new 'pluggable policies' API of libxlator.
Change-Id: Ie7528182dff8fb42c6e8287a106d3057944df775
BUG: 847839
Original Author: Csaba Henk <csaba@redhat.com>
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-on: http://review.gluster.org/4904
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
The API is described in libxlator.h.
Behavior remains the same for this commit; this
is a preparatory step for per-translator customization
of aggregation.
Change-Id: I5d42923af59b2fd78e1ff59c12763875b57c5190
BUG: 847839
Original Author: Csaba Henk <csaba@redhat.com>
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-on: http://review.gluster.org/4903
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* in both distribute and replicate (ignoring stripe for now),
add logic to calculate the min() of stime values.
* What is an 'stime'? Why is this required:
- stime means 'slave xtime', mainly used to keep track of a slave
node's sync status when distributed geo-replication is used.
The logic of calculating the 'min()' of this stime is very
important: in case of crashes/reboots/shutdowns, we will have to
restart crawling from the stime value read from the mount point,
which gives the 'min()' of all the bricks. This means we don't
miss syncing any files in the above cases.
Change-Id: I2be8d434326572be9d4986db665570a6181db1ee
BUG: 847839
Original Author: Amar Tumballi <amarts@redhat.com>
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-on: http://review.gluster.org/4893
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
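
A minimal sketch of the aggregation step; the (sec, nsec) layout of
an stime is an assumption for illustration:

#include <stdint.h>

struct stime {
        uint32_t sec;
        uint32_t nsec;
};

/* Keep the minimum stime across bricks, so a crawl restarted from
 * it never skips files that some brick has not yet synced. */
struct stime
stime_min (struct stime a, struct stime b)
{
        if (a.sec != b.sec)
                return (a.sec < b.sec) ? a : b;
        return (a.nsec < b.nsec) ? a : b;
}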
Change the logging levels from WARNING to DEBUG in the lookup path to
minimize incessant logging in case of gfid mismatch errors.
Change-Id: I631b16df3249cf826606f547531f985dac696088
BUG: 959083
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/4939
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
Problem:
At the moment data-self-heal acquires locks in the following
pattern. It takes a full file lock, then gets xattrs on the files on
both replicas and decides sources/sinks based on the xattrs. Now it
acquires a lock on 0-128k, then unlocks the full file lock. It syncs
the 0-128k range from source to sink, then acquires a lock on 128k+1
to 256k and unlocks 0-128k, syncs the 128k+1 to 256k block... and so
on. Finally it takes the full file lock again, then unlocks the
final small range block. It decrements pending counts and then
unlocks the full file lock.
This pattern of locks is chosen to avoid more than 1 self-heal being
in progress. BUT if another self-heal tries to take a full file lock
while a self-heal is already in progress, it will be put in the
blocked queue, and further inodelks from writes by the application
will also be put in the blocked queue because of the way the locks
xlator grants inodelks. So until the self-heal is complete, writes
are blocked.
Here is the code:
xlators/features/locks/src/inodelk.c - line 225
if (__blocked_lock_conflict (dom, lock) && !(__owner_has_lock (dom, lock))) {
        ret = -EAGAIN;
        if (can_block == 0)
                goto out;
        gettimeofday (&lock->blkd_time, NULL);
        list_add_tail (&lock->blocked_locks, &dom->blocked_inodelks);
}
This leads to hangs in applications.
Fix:
Since we want to prevent two parallel self-heals, we let them
compete in a separate "domain". Let's call the domain on which the
locks were taken in the previous approach the "data-domain".
In the new approach, when a self-heal is triggered, it acquires a
full lock in the new domain "self-heal-domain". After this it
performs the data self-heal using the locks in the "data-domain" as
before, and then unlocks the full file lock in "self-heal-domain".
With this approach, the application's writevs don't have to wait in
the pending queue when more than 1 self-heal is triggered.
Change-Id: Id79aef3dfa888945977fb9758374ac41c320d0d5
BUG: 967717
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5100
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
- afr_local_copy should not be memduping locked nodes; that would
mean the lock is considered taken in self-heal on those nodes even
before it actually takes the lock. So removed the memdup code. Even
the entry-lock-related copying (lockee info) is not necessary for
self-heal functionality, so removing that as well. Since it is not a
local copy anymore, changed its name.
- My editor changed tabs to spaces.
Change-Id: I8dfb92cb8338e9a967c06907a8e29a8404782d61
BUG: 967717
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5099
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Change-Id: I40eec20ca6b3f857245a2438883822e251077ee9
BUG: 979365
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5269
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Problem:
At the moment afr-flush makes sure that a delayed post-op is woken
up, but it does not wait for the post-op to complete before flush
unwinds.
These are the steps that are happening:
1) flush fop comes on an fd which wakes up a delayed post-op
and continues with the flush fop.
2) post-op sends fsync on the wire.
3) flush completes and unwinds to fuse.
4) graph switch happens on the fuse mount disconnecting the
old graph's client connections to bricks.
5) xattrop after fsync fails with ENOTCONN because the
connections from old graph are taken down now.
Fix:
Wait for the post-op to complete before starting to flush. We could
make flush act similar to fsync (i.e. wind flush as-is but wait for
the post-op to complete before unwinding flush), but it is better to
send flush as the final fop. So the wind of flush will start only
after the post-op is complete. Had to change fsync to accommodate
this change.
Change-Id: I93aa642647751969511718b0e137afbd067b388a
BUG: 980548
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5274
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Problem:
Currently, whenever there is a metadata split-brain, a variable
sh->op_failed is set to 1 to denote that the self-heal failed. But
if we proceed to data self-heal, the code path of data self-heal
also relies on the sh->op_failed variable, so it will check
sh->op_failed and eventually fail to do the data self-heal. So a
mechanism was needed to allow data self-heal even if the metadata is
in split-brain.
Fix:
Some data structure revamp was done in the
http://review.gluster.com/#/c/5106/ fix, and this patch is based on
that fix. Now we can store which particular self-heal failed, i.e.
GFID_OR_MISSING_ENTRY_SELF_HEAL, METADATA, DATA or ENTRY, and we can
do two types of self-heal failure checks:
1. Individual type check: we can check which among all four
(metadata, data, gfid-or-missing-entry, entry self-heal) failed.
2. In afr_self_heal_completion_cbk, if any specific self-heal
failed, treat the complete self-heal as a failure, so that the
corresponding circular buffer of event history is populated
accordingly.
Change-Id: Icb91e513bcc752386fc8a78812405cfabe5cac2d
BUG: 977797
Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/5253
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Problem:
If fdctx is NULL in afr_fsync, the process crashes because of a NULL
dereference.
Fix:
If fdctx is NULL, always say that an unstable write was witnessed so
that fsyncs are done properly. Handled fdctx being NULL in
afr_delayed_changelog_post_op; otherwise the fsync stub is never
resumed and the mount hangs.
Change-Id: Icacc900e9be63c29db3325cb0e19cc250adebaac
BUG: 978794
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5258
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
We added this code as an interim fix until afr could handle
split-brains even when opens are not issued. The afr code has
matured to reject fd-based fops when there are split-brains, so we
can remove it.
Change-Id: Ib337f78eccee86469a5eaabed1a547a2cea2bdcf
BUG: 974972
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5227
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Change-Id: I764883811e30ca9d9c249ad00b6762101083a2fe
BUG: 976800
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5248
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Problem:
The nfs xlator never does an open on a file for performing writes,
and afr does not perform a changelog wakeup for this fd. So
operations which perform metadata operations as soon as the data
operations are completed perceive a delay of 'post-op-delay-secs'.
Fix:
Perform the changelog wakeup on the anon fd if an fd with the same
pid is not present in the inode list.
Note:
This approach is a short-term fix. A proper fix needs a new domain
for taking metadata locks so that data/metadata locks don't compete
with each other.
Change-Id: I253afb289eadf30c7951e56fb2c4840d7132f5e4
BUG: 966018
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5066
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Problem:
At the end of the self-heal, the message logged by
"afr_self_heal_completion_cbk" is inadequate to determine what
exactly failed during the course of the afr self-heal. It is worth
knowing which of the types of self-heal were triggered for an entity
and whether the status is success or failure.
Fix:
At the end of the self-heal, it will log which of the 4 types of
self-heal (gfid or missing-entry self-heal, metadata, data and entry
self-heal) were triggered, and which of them failed or succeeded.
Change-Id: I5360762fbd7d391ac4c6af6706b4835c5801835a
BUG: 968301
Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/5106
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
Add support for the DISCARD file operation. Discard punches a hole
in a file in the provided range. Block de-allocation is implemented
via fallocate() (as requested via fuse and passed on to the brick
fs) but a separate fop is created within gluster to emphasize the
fact that discard changes file data (the discarded region is
replaced with zeroes) and must invalidate caches where appropriate.
BUG: 963678
Change-Id: I34633a0bfff2187afeab4292a15f3cc9adf261af
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-on: http://review.gluster.org/5090
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
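
On Linux, the brick-side de-allocation boils down to a punch-hole
fallocate(); a minimal sketch (FALLOC_FL_* may need <linux/falloc.h>
on older glibc):

#define _GNU_SOURCE
#include <fcntl.h>

/* Deallocate [offset, offset+len) while keeping the file size
 * unchanged; subsequent reads of the range return zeroes. */
int
discard_range (int fd, off_t offset, off_t len)
{
        return fallocate (fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                          offset, len);
}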
Implement support for the fallocate file operation. fallocate
allocates blocks for a particular inode such that future writes
to the associated region of the file are guaranteed not to fail
with ENOSPC.
This patch adds fallocate support to the following areas:
- libglusterfs
- mount/fuse
- io-stats
- performance/md-cache,open-behind
- quota
- cluster/afr,dht,stripe
- rpc/xdr
- protocol/client,server
- io-threads
- marker
- storage/posix
- libgfapi
BUG: 949242
Change-Id: Ice8e61351f9d6115c5df68768bc844abbf0ce8bd
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-on: http://review.gluster.org/4969
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
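
For contrast with discard above, a minimal sketch of the plain
allocation call on Linux:

#define _GNU_SOURCE
#include <fcntl.h>

/* Preallocate [offset, offset+len): later writes in this range are
 * guaranteed not to fail with ENOSPC; mode 0 also grows the file
 * size when the range extends past EOF. */
int
preallocate (int fd, off_t offset, off_t len)
{
        return fallocate (fd, 0, offset, len);
}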
Problem:
When taking blocking entrylks, afr orders the entrylks based on
uuid_compare of the gfids of the parent dirs; if they are equal,
then it orders them based on the basenames. While this approach
works fine, the implementation assumes loc->gfid to be populated at
the time of the comparison, but loc may have the gfid in
loc->inode->gfid instead of loc->gfid, which was leading to order
mismatches and deadlocks.
Fix:
Implemented loc_gfid(), which gives the gfid by checking both
loc->gfid and loc->inode->gfid. Used this for ordering the blocking
entrylks.
Change-Id: Ib0db36bbaf0df09fa87c3c3bb6a834db74fc2154
BUG: 965987
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5062
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
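
A minimal sketch of the ordering rule (the struct is a hypothetical
stand-in for afr's lockee; link with -luuid):

#include <string.h>
#include <uuid/uuid.h>

struct entry_lockee {
        uuid_t      parent_gfid;   /* resolved via loc_gfid(), per the fix */
        const char *basename;
};

/* Order by parent gfid first, then by basename, so every client
 * acquires blocking entrylks in the same global order. */
int
lockee_cmp (const struct entry_lockee *a, const struct entry_lockee *b)
{
        int ret = uuid_compare (a->parent_gfid, b->parent_gfid);

        return ret ? ret : strcmp (a->basename, b->basename);
}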
Problem: If it is a fresh volume with no files created, the xattrop
directory is not present. If a crawl happens at that time, the
lookup for the xattrop directory will fail, and this results in
printing a misleading log.
Fix: Changed the misleading log.
Change-Id: Iae5da3e8423564d64096f88abdaf8c98e4935840
BUG: 928575
Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/4993
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Problem:
gfid-self-heal always assigns the gfid (GFID-1) it gets from lookup.
Between the time of the lookup and triggering the gfid-self-heal,
the entry could have changed. Now let's say there is a case where
one of the files of the replica subvolumes already has a gfid
(GFID-2) and the other does not. In that case healing should happen
with GFID-2 instead of GFID-1.
Fix:
Missing-entry-self-heal already handles all these cases, so removed
the separate handling of gfid-self-heal.
Change-Id: Ie96261e9036c8f3cb4cad89347f9bf7b681cdc1a
BUG: 767585
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/2670
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Since removexattr() fails to remove "security.selinux" on a system
where SELinux is enforcing, xattr self-healing fails. As a
consequence, user extended attributes are not being healed. Added a
check in afr to prune the SELinux xattr from the dictionary used for
removing xattrs from the sink.
Minor changes in tests and md-cache as well.
Signed-off-by: Vijay Bellur <vbellur@redhat.com>
Change-Id: I854bfc0098dde812ce2afe64b125ee40c04bdeb1
BUG: 957877
Reviewed-on: http://review.gluster.org/4905
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
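
A standalone sketch of the pruning idea using the plain xattr API
(afr itself prunes a dictionary, not an on-disk list):

#include <string.h>
#include <sys/xattr.h>

/* Remove every xattr from a sink file except security.selinux,
 * which an enforcing SELinux refuses to delete and would abort the
 * heal. */
void
remove_xattrs_except_selinux (const char *path)
{
        char    names[4096];
        char   *n;
        ssize_t len = listxattr (path, names, sizeof (names));

        for (n = names; len > 0 && n < names + len; n += strlen (n) + 1) {
                if (strcmp (n, "security.selinux") == 0)
                        continue;       /* pruned, as in the afr fix */
                removexattr (path, n);
        }
}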
Change-Id: Ifa42762adde8b55ef1e2b51a59c93cebd983343f
BUG: 912581
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4792
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Today there is a non-obvious dependence of eager-locking on
write-behind. The reason is that eager-locking works as long as the
inheriting transaction has no overlaps with any of the transactions
already in progress. While write-behind provides non-overlapping
writes as a side effect most of the time (and only guarantees it
when the strict-write-ordering option is enabled, which is not on by
default), eager-lock needs this behavior as a guarantee. This leads
to complex and unwanted checks for the presence of write-behind in
the graph, for the simple task of checking for overlaps.
This patch removes the interdependence between eager-locking
and write-behind by making eager-locking do its own overlap checks
with in-progress writes.
Change-Id: Iccba1185aeb5f1e7f060089c895a62840787133f
BUG: 912581
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/4782
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
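
The overlap test itself is simple; a minimal sketch (not the actual
afr code):

#include <stdbool.h>
#include <sys/types.h>

/* Two byte ranges overlap iff each one starts before the other
 * ends. Eager-lock may be inherited only when the new write
 * overlaps none of the writes already in progress. */
bool
writes_overlap (off_t a_off, size_t a_len, off_t b_off, size_t b_len)
{
        return a_off < (off_t)(b_off + b_len) &&
               b_off < (off_t)(a_off + a_len);
}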
For example:
If a new-entry-creation fop fails with EEXIST, or a delete-entry fop
fails with ENOENT, on all the subvols to which the fop is wound,
then no change took place in the directory. So we can treat that
case as if no change happened to the directory.
Change-Id: I3b3a7931954da2166a9cba19ff9f76f37739d751
BUG: 860210
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4626
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
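
A minimal sketch of that classification (names are illustrative):

#include <errno.h>
#include <stdbool.h>

/* If every subvolume the fop was wound to failed this way, the
 * directory is unchanged and no pending changelog needs to be set. */
bool
entry_fop_changed_nothing (bool is_create, int op_errno)
{
        return (is_create && op_errno == EEXIST) ||
               (!is_create && op_errno == ENOENT);
}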
FIX:
In missing-entry self-heal, once the source directories are
determined after the lookup, if the file is not present on any of
the bricks which contain the source directory, the entry is removed
from the directory. So the log message should give information about
the "purging" of the entry.
Change-Id: I4d3deb602e0812dc1c9c8ba0a466716d81dede7e
BUG: 947312
Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/4753
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
Problem:
In Pump, entry self-heal happens for each directory during the first
opendir, using conservative merge. But in entry self-heal, readdir
is issued with a size of '0', so entry self-heal does not create any
files. After pump thinks the entry self-heal is complete, it
proceeds to heal each of the files in the directory it just healed.
Fortunately, most of the time it chooses the source brick in pump as
the read-child for readdir. This happens because the read-child is
the subvolume on which lookup succeeds first, and in pump, lookup
succeeds faster in the local process than on the destination brick
process most of the time. For all the entries pump finds in readdir,
it does a lookup. During this lookup, the entry on the destination
brick is created and healed. This is the reason why replace-brick
succeeds whenever the read-child for the directory is chosen as the
source brick, which is most of the time. When the read-child chosen
is the destination brick, readdir returns no entries, so
replace-brick completes without syncing the whole data.
Fix:
Set the readdir size in pump so that entry self-heal happens with a
64KB size. This ensures that entry self-heal triggered from opendir
actually creates the files on the destination brick.
Change-Id: I65ea45d3c2735a9578f3aa34eff771b6563241ca
BUG: 909800
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4712
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>