| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
xattr name can legally be NULL. Handle that case without crashing.
Change-Id: Ie214cb05ccd52565dc247a9234ad83ae799d3866
BUG: 1036879
Signed-off-by: Poornima <pgurusid@redhat.com>
Reviewed-on: http://review.gluster.org/6412
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I267b935a16a6fdc72a4e791f681289e6868baee6
BUG: 1010834
Reviewed-on: http://review.gluster.org/6385
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
@key can legally be NULL. Handle that case without crashing.
Change-Id: Iaae293caa7eeb24afc9cd2580799173e2ce00911
BUG: 1036879
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/6395
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When getxattr fails with errno other than ENODATA fail rebalance
on that file. Log the reason for error.
Change-Id: Ia519870b88e6e6dd464d1c0415411aa999f80bc9
BUG: 1032927
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/6341
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Shishir Gowda <sgowda@redhat.com>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Creating linkfile could have failed, but we dont care about linkfile
for setting layout in the inode ctx (could be EEXIST etc.)
So ignore @inode in cbk and pick it up from local->loc.inode
Change-Id: I2952799d7ae0d3441b84b2ca2981afd75d7576e2
BUG: 1032859
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/6319
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When clients refer to a GFID which does not exist, the errno to
be returned in ESTALE (and not ENOENT). Even though ENOENT might
look "proper" most of the time, as the application eventually expects
ENOENT even if a parent directory does not exist, not returning
ESTALE results in resolvers (FUSE and GFAPI) to not retry resolution
in uncached mode. This can result in spurious ENOENTs during
concurrent path modification operations.
Change-Id: I7a06ea6d6a191739f2e9c6e333a1969615e05936
BUG: 1032894
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/6318
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@gmail.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is needed for two reasons:
* since dht-linkfiles are internal, they shouldn't be accounted.
* hardlink handling in marker is broken. link/unlink of hardlinks
present in same directory can break marker accounting. Hence, if src
and dst are in same directory in case of rename, dht - if it breaks
rename into link/unlink operations - should instruct marker to not to
do accounting.
Change-Id: I9c9f7384569f75a2792f6450ee7a5279bf751ae7
BUG: 1022995
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/6203
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
components other than distribute (like marker to exclude linkfiles
from being accounted) also need awareness of what constitutes a
linkfile. Hence its good to separate out this functionality into
core.
Change-Id: Ib944eeacc991bb1de464c9e73ee409fc7a689ff1
BUG: 1022995
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/6152
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Two stages of quota enforcement is done:
Soft and hard quota Upon reaching soft quota limit on the directory
it logs/alerts in the quota daemon log (ie DEFAULT_LOG_DIR/quotad.log)
and no more writes allowed after hard
quota limit. After reaching the soft-limit the daemon alerts the
user/admin repeatively for every 'alert-time', which is
configurable.
* Quota enforcer is moved to server-side.
It takes care of enforcing quota. Since enforcer doesn't have the
cluster view, it relies on another service called
quota-aggregator. Aggregator, on query can return the size of a
directory based on the cluster view.
Enforcer is always loaded in the server graph and is by passed if
the feature is not enabled.
Options specific to enforcer:
server-quota - Specifies whether the feature is on/off. It is used
to by pass the quota if turned off.
deem-statfs - If set to on, it takes quota limits into consideration
while estimating fs size. (df command). The algorithm followed is,
i. Adjust statvfs based on limit configured on root.
ii. If limit is set on the inode passed, use size/limits on that inode to
populate statvfs. Otherwise, use size/limits configured on root.
iii. Upon statvfs, update the ctx->size on the inode.
iv. Don't let DHT aggregate, instead take the maximum of the usages from the
subvols of the DHT, since each of it contains the complete information.
Enforcer also makes use of gfid-to-path conversion functionality to
work correctly when a client like nfs predominently relies on
nameless lookups.
* Quota Aggregator acts as a thin client to provide cluster view
Its a lightweight *gluster client* process with no mount point,
started upon enabling quota or restarting the volume. This is a
single process run on each brick, which can answer queries on all
volumes in the cluster. Its volfile stored in
GLUSTERD_DEFAULT_WORKING_DIR/quotad/quotad.vol.
Credits:
Raghavendra Bhat <rabhat@redhat.com>
Varun Shastry <vshastry@redhat.com>
Shishir Gowda <sgowda@redhat.com>
Kruthika Dhananjay <kdhananj@redhat.com>
Brian Foster <bfoster@redhat.com>
Krishnan Parthasarathi <kparthas@redhat.com>
Change-Id: Id1cb25b414951da34c665a55f77385d482e0f9de
BUG: 969461
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/5952
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
afr_[f]getxattr_pathinfo_cbks fail the fop even when it succeeded on
one of the bricks. This can happen if the last response to pathinfo
[f]getxattr is a failure.
Fix:
Remember if any of the [f]getxattr_pathinfos are successful and send
that as the op_ret/op_errno value to the xlators above.
Note:
Winding fop to a client xlator that is not connected to server produces
an error log. Preventing that by not even winding fop when client xlator
is DOWN.
Change-Id: I846e8c47423ffcfa2eabffe8924534781a36841a
BUG: 1032927
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/6332
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
| |
BUG: 1028673
Change-Id: I7c75738cca22c81c5629d579ef5bea24000e622e
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Reviewed-on: http://review.gluster.org/6291
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
| |
BUG: 764655
Change-Id: I4d18c9a6c00cb4696645fcb437398562f00b9d24
Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
Reviewed-on: http://review.gluster.org/6284
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
glfs_zerofill() can be potentially called to zero-out entire file and
hence allow for bigger value of length parameter.
Change-Id: I75f1d11af298915049a3f3a7cb3890a2d72fca63
BUG: 1028673
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-on: http://review.gluster.org/6266
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: M. Mohan Kumar <mohan@in.ibm.com>
Tested-by: M. Mohan Kumar <mohan@in.ibm.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* migrate all the fd's on an inode to newer subvol after rebalance
* use the migration in progress flag in inode, so all the operations
on the inode can make use of it
Change-Id: Ib807a46e927a1062688fc15119c916797c52a350
BUG: 1013456
Signed-off-by: Amar Tumballi <amarts@redhat.com>
Reviewed-on: http://review.gluster.org/5891
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make changes to distributed xlator to work with BD xlator. Unlike files,
a block device can't be removed when its opened. So some part of the
code were moved down to avoid this situation. Also before truncating a
BD file its BD_XATTR should be set otherwise truncate will result in
truncating posix file. So file is created with needed BD_XATTR and
truncate is invoked. Also enables BD xlator in stripe volume type.
Change-Id: If127516e261fac5fc5b137e7fe33e100bc92acc0
BUG: 1028672
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Reviewed-on: http://review.gluster.org/5235
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for a new ZEROFILL fop. Zerofill writes zeroes to a file in
the specified range. This fop will be useful when a whole file needs to
be initialized with zero (could be useful for zero filled VM disk image
provisioning or during scrubbing of VM disk images).
Client/application can issue this FOP for zeroing out. Gluster server
will zero out required range of bytes ie server offloaded zeroing. In
the absence of this fop, client/application has to repetitively issue
write (zero) fop to the server, which is very inefficient method because
of the overheads involved in RPC calls and acknowledgements.
WRITESAME is a SCSI T10 command that takes a block of data as input and
writes the same data to other blocks and this write is handled
completely within the storage and hence is known as offload . Linux ,now
has support for SCSI WRITESAME command which is exposed to the user in
the form of BLKZEROOUT ioctl. BD Xlator can exploit BLKZEROOUT ioctl to
implement this fop. Thus zeroing out operations can be completely
offloaded to the storage device , making it highly efficient.
The fop takes two arguments offset and size. It zeroes out 'size' number
of bytes in an opened file starting from 'offset' position.
This patch adds zerofill support to the following areas:
- libglusterfs
- io-stats
- performance/md-cache,open-behind
- quota
- cluster/afr,dht,stripe
- rpc/xdr
- protocol/client,server
- io-threads
- marker
- storage/posix
- libgfapi
Client applications can exloit this fop by using glfs_zerofill introduced in
libgfapi.FUSE support to this fop has not been added as there is no system call
for this fop.
Changes from previous version 3:
* Removed redundant memory failure log messages
Changes from previous version 2:
* Rebased and fixed build error
Changes from previous version 1:
* Rebased for latest master
TODO :
* Add zerofill support to trace xlator
* Expose zerofill capability as part of gluster volume info
Here is a performance comparison of server offloaded zeofill vs zeroing
out using repeated writes.
[root@llmvm02 remote]# time ./offloaded aakash-test log 20
real 3m34.155s
user 0m0.018s
sys 0m0.040s
[root@llmvm02 remote]# time ./manually aakash-test log 20
real 4m23.043s
user 0m2.197s
sys 0m14.457s
[root@llmvm02 remote]# time ./offloaded aakash-test log 25;
real 4m28.363s
user 0m0.021s
sys 0m0.025s
[root@llmvm02 remote]# time ./manually aakash-test log 25
real 5m34.278s
user 0m2.957s
sys 0m18.808s
The argument log is a file which we want to set for logging purpose and
the third argument is size in GB .
As we can see there is a performance improvement of around 20% with this
fop.
Change-Id: I081159f5f7edde0ddb78169fb4c21c776ec91a18
BUG: 1028673
Signed-off-by: Aakash Lal Das <aakash@linux.vnet.ibm.com>
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Reviewed-on: http://review.gluster.org/5327
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch avoids giving more info to the user about the
internal heuristic employed in afr, for quota sizes.
Change-Id: Ice3a164399f09b6967500ec0c17dc340e7ae9aba
BUG: 1016683
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/6098
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Bail out of fops if split brain has been detected during lookup
Change-Id: Id387dbb1a25eec4a121dedceadc6069bdea24b5d
BUG: 1010834
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/5988
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Two glusterfs clients return inconsistent errnos when the bricks of the volume
were down. Consider two gluster mounts. Mount 1 was done when the bricks were
online. Mount 2 was done after the bricks were killed, (using the 'glusterfs'
command instead of the mount script).
For any request, mount 1 will return ENOTCONN, where as mount 2 will return
ENOENT.
This happens because for the 2nd mount, a fuse would send a lookup on '/' for
any request, as it hadn't been done yet. The client xlator returns ENOTCONN,
but the dht_lookup_dir_cbk changed this to ENOENT unconditionally when
aggregating. So, fuse returned ENOENT, even though the errno should have been
ENOTCONN.
Change-Id: I4b7a6d84ce5153045a807fccc01485afe0377117
BUG: 1019095
Signed-off-by: Kaushal M <kaushal@redhat.com>
Reviewed-on: http://review.gluster.org/6072
Reviewed-by: Anand Avati <avati@redhat.com>
Tested-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
gettimeofday() returns the current wall clock time and timezone.
Using these functions in order to measure the passage of time
(how long an operation took) therefore seems like a no-brainer.
This time suffer's from some limitations:
a. They have a low resolution: “High-performance” timing by
definition, requires clock resolutions into the microseconds
or better.
b. They can jump forwards and backwards in time: Computer
clocks all tick at slightly different rates, which causes
the time to drift. Most systems have NTP enabled which
periodically adjusts the system clock to keep them in sync
with “actual” time. The adjustment can cause the clock to
suddenly jump forward (artificially inflating your timing
numbers) or jump backwards (causing your timing calculations
to go negative or hugely positive). In such cases timer
thread could go into an infinite loop.
From 'man gettimeofday':
----------
..
..
The time returned by gettimeofday() is affected by discontinuous
jumps in the system time (e.g., if the system administrator manually
changes the system time). If you need a monotonically increasing
clock, see clock_gettime(2).
..
..
----------
Rationale:
For calculating interval timing for Timer thread, all that’s
needed should be clock as a simple counter that increments
at a stable rate.
This is necessary to avoid the jumps which are caused by using
"wall time", this counter must be monotonic that can never
“tick” backwards, ever.
Change-Id: I701d31e71a85a73d21a6c5cd15583e7a5a645eeb
BUG: 1017993
Signed-off-by: Harshavardhana <harsha@harshavardhana.net>
Reviewed-on: http://review.gluster.org/6070
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently to know the number of files to be healed, either user
has to go to backend and check the number of entries present in
indices/xattrop directory. But if a volume consists of large
number of bricks, going to each backend and counting the number
of entries is a time-taking task. Otherwise user can give
gluster volume heal vol-name info command but with this
approach if no. of entries are very hugh in the indices/
xattrop directory, it will comsume time.
So as a feature, new command is implemented.
Command 1: gluster volume heal vn statistics heal-count
This command will get the number of entries present in
every brick of a volume. The output displays only entries
count.
Command 2: gluster volume heal vn statistics heal-count
replica 192.168.122.1:/home/user/brickname
Here if we are concerned with just one replica.
So providing any one of the brick of a replica will get
the number of entries to be healed for that replica only.
Example:
Replicate volume with replica count 2.
Backend status:
--------------
[root@dhcp-0-17 xattrop]# ls -lia | wc -l
1918
NOTE: Out of 1918, 2 entries are <xattrop-gfid> dummy
entries so actual no. of entries to be healed are
1916.
[root@dhcp-0-17 xattrop]# pwd
/home/user/2ty/.glusterfs/indices/xattrop
Command output:
--------------
Gathering count of entries to be healed on volume volume3 has been successful
Brick 192.168.122.1:/home/user/22iu
Status: Brick is Not connected
Entries count is not available
Brick 192.168.122.1:/home/user/2ty
Number of entries: 1916
Change-Id: I72452f3de50502dc898076ec74d434d9e77fd290
BUG: 1015990
Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/6044
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
"gluster volume heal volumename statistics" command gives the summary
of the afr crawl done based on the entries present in the xattrop
directory. Whenever afr crawls are attempted, the beginning time of
crawl, end time of crawl, no of files healed, heal-failed count and
number of files in split brain are shown along with the type of the
crawl. If crawl is already in progress then it will give the number
of files healed, heal failed count and number of files in split-brain
from the beginning of the crawl and instead of telling the end time of
the crawl, "CRAWL IN PROGRESS" message will be shown.
Output format:
command: "gluster volume heal volume-name statistics"
Output:
Gathering afr crawl statistics crawl statistics on volume volume-name
has been successful
------------------------------------------------
Crawl statistics for brick no 0
Hostname of brick 192.168.122.248
Starting time of crawl: Wed Jul 10 15:52:38 2013
Ending time of crawl: Wed Jul 10 15:52:38 2013
Type of crawl: INDEX
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 0
Starting time of crawl: Wed Jul 10 15:52:38 2013
Ending time of crawl: Wed Jul 10 15:52:38 2013
Type of crawl: INDEX
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 0
------------------------------------------------
Crawl statistics for brick no 1
Hostname of brick 192.168.122.1
Starting time of crawl: Wed Jul 10 15:52:42 2013
Ending time of crawl: Wed Jul 10 15:52:42 2013
Type of crawl: INDEX
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 0
Starting time of crawl: Wed Jul 10 15:52:42 2013
Ending time of crawl: Wed Jul 10 15:52:42 2013
Type of crawl: INDEX
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 0
--------------------------------------------------
Change-Id: I10bf9d10b005741db9973fb1352e0dd59ed99aa9
BUG: 949400
Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/4790
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Quota size xattrs are not maintained by afr. There is a
possibility that they differ even when both the directory
changelog xattrs suggest everything is fine. So if there is at
least one 'source' check among the sources which has the maximum
quota size. Otherwise check among all the available ones for
maximum quota size. This way if there is a source and stale copies
it always votes for the 'source'.
Change-Id: Ia222379cbafa7043dd03f533c105860f2c7b8b0d
BUG: 1016683
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/6052
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Varun Shastry <vshastry@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
'-' can be present in a volume. This may lead to domain
collisions in future.
Tests:
Checked in gdb that domain comes with ':' separator:
Breakpoint 1, pl_common_inodelk (frame=0x7fdabcce51a4,
this=0x8bde20, volume=0x8b50d0 "r2-replicate-0:self-heal",
inode=0x7fdab822f0e8, cmd=6, flock=0x7fdabc76eee4,
loc=0x7fdabc76ede4, fd=0x0, xdata=0x7fdabc6e0ab0) at inodelk.c:597
Change-Id: I4456ae35ac8bf21e6361c34e9ad437f744a2e84b
BUG: 993981
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/6025
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
"unlock failed on 1 unlock" seems meaningless in the log message. Improved it.
Change-Id: If67d3f9d4aa5310d0b6728a6c89fa58a5cc93d12
BUG: 1012947
Signed-off-by: Anuradha <atalur@redhat.com>
Reviewed-on: http://review.gluster.org/6012
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Incorrect NFS ACL encoding causes "system.posix_acl_default"
setxattr failure on bricks on XFS file system. XFS (potentially
others?) doesn't understand when the 0x10 prefix is added to the
ACL type field for default ACLs (which the Linux NFS client adds)
which causes setfacl()->setxattr() to fail silently. NFS client
adds NFS_ACL_DEFAULT(0x1000) for default ACL.
FIX:
Mask the prefix (added by NFS client) OFF, so the setfacl is not
rejected when it hits the FS.
Original patch by: "Richard Wareing"
Change-Id: I17ad27d84f030cdea8396eb667ee031f0d41b396
BUG: 1009210
Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com>
Reviewed-on: http://review.gluster.org/5980
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Block all signal except those which are set for explicit handling
in glusterfs_signals_setup(). Since thread spawning code in
libglusterfs and xlators can get called from application threads
when used through libgfapi, it is necessary to do this blocking.
Change-Id: Ia320f80521a83d2edcda50b9ad414583a0175281
BUG: 1011662
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/5995
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit a3e593f9f17cb1e68db97bb5a0d8074793a33964 which
was bought into fix dht_layout_anomalies error handling.
We still see applications error'ing out due to graph switches.
To fix the above issue -
We cannot heal in dht_discover, as it is a gfid based lookup, and not
path based. So, returning error here would lead to app's to see failure.
Also, update the layout in inode_ctx even if it has anomalies. Let
subsequent heals fix the issue.
Conflicts:
xlators/cluster/dht/src/dht-common.c
Signed-off-by: shishir gowda <sgowda@redhat.com>
Change-Id: I68c1056c3587e04a02344548546ddd06034489c5
BUG: 960348
Reviewed-on: http://review.gluster.org/5443
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were wrongly detecting holes/overlaps for already accounted
errors. Additionally, sort should also handle zero'ed out layout
Change-Id: Ic3d13e1d735b914f9acc01fe919bc90656baea48
BUG: 1003851
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/5762
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Earlier disk space check had an issue which didn't
provide the needed functionality to avoid migration
when the destination had lesser available space,
scenario we need to avoid is stated below :
During rebalance `migrate-data` - Destination subvol experiences
a `reduction` in 'blocks' of free space, at the same time source
subvol gains certain 'blocks' of free space. A valid check is
necessary here to avoid errorneous move to destination where
the space could be scantily available.
This patch provides a proper fix in place by subtracting
necessary file blocks from destination and adding those blocks
to source.
Change-Id: I9c7840716a4256ef614ffc0fbfd9f2b456ac28c8
BUG: 982919
Signed-off-by: Harshavardhana <harsha@harshavardhana.net>
Reviewed-on: http://review.gluster.org/5961
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Shishir Gowda <sgowda@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Currenly the CLI rebalance status command output does not indicate the
'type' of rebalance, i.e. whether a full rebalance or only a fix-layout
was carried out.
Fix: After the rebalance status of all peers is received by the
originator glusterd, alter it to reflect the type of rebalance
before passing it on to the CLI process.
Change-Id: I1940ffda0d36e25e5b33c84a0ea210394cc9e1d3
BUG: 1004744
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/5826
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: Ia7b324b86d6a7051d187106d7a060155e77defc5
BUG: 910217
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5238
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 215fea41a96479312a5ab8783c13b30ab9fe00fa
Realized soon after merging, that this patch is actually useless.
Checking for inode utilization on dst relative to src is of no use
because dst is already storing linkfile and inode is already consumed
no matter what. We might as well perform the rebalance and save on an
inode on the src node.
Change-Id: I7b8b8d35d9063a710e1bee1c7c6c7739fcc45ad4
Reviewed-on: http://review.gluster.org/5960
Reviewed-by: Harshavardhana <harsha@harshavardhana.net>
Tested-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently during `MIGRATE_DATA` we never verified about the total
inode usage among new and old bricks. Such checks are available for
disk space usage but it is also needed for inodes during data
migration. Such a check leads to uniform outcome of file distribution
upon rebalance.
Patch provides:
- Check dst_inodes < src_inodes, a friendly `warning` message to
indicate we have to ignore such an attempt
- Rename __dht_check_free_space() --> __dht_check_free_space_and_inodes()
Change-Id: I7bc4dd8b507883f0fb836300e99f0bb083493f5f
BUG: 982919
Signed-off-by: Harshavardhana <harsha@harshavardhana.net>
Reviewed-on: http://review.gluster.org/5948
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current self-healing algorithm is ignoring missing directories
for assigning new layout. When lookup() is racing against mkdir()
or when self-healing a half-done mkdir(), the layout assignment split
must happen based on the final number of directories, and not the
currently existing number of directories (because we finish mkdir()
of missing directories before hash layout assignment).
Without this fix, concurrent mkdir() and lookup() will step on
each others feet, create a messed up layout on disk, and end up
with different in-memory layouts.
Once two clients have different in-memory layouts, creation of
subdirectory will not arbitrate on the same hashed subvolume and will
result in GFID mismatch of the sub-directory.
Change-Id: Ia47acad67c265060405984c822b4d37512b9dbb3
BUG: 907072
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/5849
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Peter Portante <pportant@redhat.com>
Tested-by: Peter Portante <pportant@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I18583f14edf1011401be15744371e2a6b79d75cc
BUG: 1003842
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5763
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If truncate/ftruncate is called with the offset as the current size
of file, then skip the durability fsync and unwind quickly.
Change-Id: I0baec68d96c6d4d8217d33bd9738f7ed0d1b40c5
BUG: 958118
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/5737
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Additional information for source and sinks are added.
Change-Id: I1704956ff86ac3ae36744efe7499c1d1c43faeaf
BUG: 968301
Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/5638
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
internal_lock->lk_attempted_count keeps track of the number of blocking
locks attempted. lk_expected_count keeps track of the number locks expected.
Here are the sequence of steps that happen which lead to the illution that
a full file lock is achieved, even without attempting any lock.
2 mounts are doing dd on same file. Both of them witness a brick going
down and coming back up again. Both of the mounts issue self-heal
1) Both mount-1, mount-2 attempt full file locks in self-heal domain.
lets say mount-1 got the lock, mount-2 attempts blocking lock.
2) mount-1 attempts full file lock in data domain. It goes into blocking
mode because some other writes are in progress. Eventually it gets the lock.
But this results in lk_attempted_count to be still as 2 and will not be reset.
It completes syncing the data.
3) mount-1 before unlocking final small range lock attempts full file lock in
data domain to figure out the source/sink. This will be put into blocked mode
again because some other writes are in progress. But this time seeing the
stale value of lk_attempted_count being equal to lk_expected_count, blocking_lock
phase thinks it completed locking without acquiring a single lock :-O.
4) mount-1 reads xattrs without any lock but since it does not modify the xattrs,
no harm is done by this phase. It tries to do unlocks and the unlocks will fail
because the locks are never taken in data domain. mount-1 also unlocks
self-heal domain locks.
Our beloved mount-2 now gets the chance to cause horror :-(.
5) mount-2 gets the full range blocking lock in self-heal domain.
Please note that this sets lk_attempted_count to 2.
6) mount-2 attempts full range lock in data domain, since there are still
writes on going, it switches to blocking mode. But since lk_attempted_count is 2
which is same as lk_expected_count, blocking phase locks thinks it actually got
the full range locks even though not a single lock request went out the wire.
7) mount-2 reads the change-log xattrs, which would give the number of operations
in progress (lets call this 'X'). It does the syncing and at the end of the sync
decrements the changelog by 'X'. But since that 'X' was introduced by 'X' number
of transactions that are in progress, they also decrement the changelog by 'X'.
Effectively for 'X' operations 'X' number of pre-ops are done but 2 times 'X'
number of post-ops are done resulting in -ve changelog numbers.
Fix:
Reset the lk_attempted_count and inode locks array that is used to remember locks
that are granted.
Change-Id: Ic0a79cd16f32392ea7c790511343c73592bbe6bd
BUG: 1002698
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5736
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Else this results in a missing frame causing a hang
Change-Id: Ib5f3dc6a3999449faa2853cee2944af2fb065a20
BUG: 1002399
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/5731
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
It has been available for a while now and is probably the sane
default due to the more efficient layout and performance benefit.
BUG: 1001207
Change-Id: I6275f9741866c0afd6e685f8dc5867a86485fd20
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-on: http://review.gluster.org/5624
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Idea is to not leave the file in FOOL-FOOL scenario in case on
all the bricks data transaction failed with EDQUOT to avoid
increasing un-necessary load of self-heals in the system.
For directory transactions don't leave pending changelog in case
the failures are seen on all the subvolumes.
Change-Id: I38a5561d1d581a78347a76a4a509514e4a0c3fb7
BUG: 969461
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5709
Reviewed-by: Anand Avati <avati@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I9c27b1edab111031ca8eea9cc49480ea01e39089
BUG: 1002207
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/5716
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Enhance syncenv_new() to accept scaling parameters of syncproc.
Previously the scaling parameters were hardcoded and decided at
compile time.
- New API synctask_create() which returns the created synctask. This
is similar to synctask_new which only returned the status of whether
a synctask could be created or not.
The meaning of NULL cbk in synctask_create() means the task is
"joinable". Until synctask_join() is called on such a synctask,
the task is not reaped and resources are not destroyed. The
task would be in a zombie state after synctask_fn returns and
before synctask_join() is called.
Change-Id: I368ec9037de9510d2ba951f0aad86aaf18d9a6b6
BUG: 986775
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/5365
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: Ib0c3af6babc61dc3ed45252582876e2f243d6446
BUG: 958118
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5635
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I23b8cb7223b91a55af1cd4214f61bbe0e87351f6
BUG: 952029
Signed-off-by: Amar Tumballi <amarts@redhat.com>
Reviewed-on: http://review.gluster.org/5683
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I38d2fdc47e4b805deafca6805e54807976ffdb7e
Signed-off-by: Amar Tumballi <amarts@redhat.com>
BUG: 952029
Reviewed-on: http://review.gluster.org/5496
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 4c0f4c8a89039b1fa1c9c015fb6f273268164c20.
Conflicts:
xlators/mount/fuse/src/fuse-bridge.c
For build issues added CREATE_MODE_KEY definition in:
libglusterfs/src/glusterfs.h
Change-Id: I8093c2a0b5349b01e1ee6206025edbdbee43055e
BUG: 952029
Signed-off-by: Amar Tumballi <amarts@redhat.com>
Reviewed-on: http://review.gluster.org/5495
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, we sent GF_READDIR_SKIP_DIRS for all subvolumes if
first_subvol != first_up_subvolume.
Also first_up_subvolume can change with-in the life of a call and
cbk. Saving the first_up_subvol in dht_local for checks.
Change-Id: I6e369e63f29c9761993f2a66ed768c424bb44d27
BUG: 996474
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/5577
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For Write Once Read Many times type of work-load choosing largest
file to be the source will always resolve fool-fool
scenarios correctly. In other cases we fsync() the files and
will have a reliable 'wise man'.
Change-Id: Ic4dbea8d06db6d578fbcb866fb65ee2d066ac7ba
BUG: 958118
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/5519
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|