| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
Change-Id: Ia84cc24c8924e6d22d02ac15f611c10e26db99b4
Signed-off-by: Nigel Babu <nigelb@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
The recent changes to use malloc instead
of calloc left the new_name and new_path
non-null terminated. We now use snprintf
instead of strncpy to fix this.
Change-Id: I1a31701ca9447efde38921be0ba2c73cde2e7976
fixes: bz#1626346
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
xlators/cluster/stripe/src/stripe-helpers.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
xlators/cluster/dht/src/tier.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
xlators/cluster/dht/src/dht-layout.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
xlators/cluster/dht/src/dht-helper.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
xlators/cluster/dht/src/dht-common.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
xlators/cluster/afr/src/afr.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
xlators/cluster/afr/src/afr-inode-read.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
tests/bugs/replicate/bug-1250170-fsync.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
tests/basic/gfapi/gfapi-async-calls-test.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
tests/basic/ec/ec-fast-fgetxattr.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
rpc/xdr/src/glusterfs3.h: Move to GF_MALLOC() instead of GF_CALLOC() when possible
rpc/rpc-transport/socket/src/socket.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
rpc/rpc-lib/src/rpc-clnt.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
extras/geo-rep/gsync-sync-gfid.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
cli/src/cli-xml-output.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
cli/src/cli-rpc-ops.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
cli/src/cli-cmd-volume.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
cli/src/cli-cmd-system.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
cli/src/cli-cmd-snapshot.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
cli/src/cli-cmd-peer.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
cli/src/cli-cmd-global.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
It doesn't make sense to calloc (allocate and clear) memory
when the code right away fills that memory with data.
It may be optimized by the compiler, or have a microscopic
performance improvement.
In some cases, also changed allocation size to be sizeof some
struct or type instead of a pointer - easier to read.
In some cases, removed redundant strlen() calls by saving the result
into a variable.
1. Only done for the straightforward cases. There's room for improvement.
2. Please review carefully, especially for string allocation, with the
terminating NULL string.
Only compile-tested!
updates: bz#1193929
Original-Author: Yaniv Kaul <ykaul@redhat.com>
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
Signed-off-by: Amar Tumballi <amarts@redhat.com>
Change-Id: I16274dca4078a1d06ae09a0daf027d734b631ac2
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In dht_heal_path, the inodes are created & looked up from top to down.
If the path is "a/b/c", then lookup will be done on a, then b and so
on. Here is a rough snippet of the function "dht_heal_path".
<snippet>
if (bname) { ref_count
- loc.inode = create/grep inode 1
- syncop_lookup (loc.inode)
- linked_inode = inode_link (loc.inode) 2
/*clean up current loc*/
- loc_wipe(&loc) 1
/*set up parent and bname for next child */
- loc.parent = inode
- bname = next_child_name
}
out:
- inode_ref (linked_inode) 2
- loc_wipe (&loc) 1
</snippet>
The problem with the above code is if _bname_ is empty ie the chain lookup is
done, then for the next iteration we populate loc.parent anyway. Now that
bname is empty, the loc_wipe is done in the _out_ section as well. Since, the
loc.parent was set to the previous inode, we lose a ref unwantedly. Now a
dht_local_wipe as part of the DHT_STACK_UNWIND takes away the last ref leading
to inode_destroy.
This problenm is observed currently with nfs-ganesha with the nameless lookup.
Post the inode_purge, gfapi does not get the new inode to link and hence, it links
the inode it sent in the lookup fop, which does not have any dht related context
(layout) leading to "invalid argument error" in lookup path done parallely with tar
operation.
test done in the following way:
- create two nfs client connected with two different nfs servers.
- run untar on one client and run lookup continuously on the other.
- Prior to this patch, invalid arguement was seen which is fixed with
the current patch.
Change-Id: Ifb90c178a2f3c16604068c7da8fa562b877f5c61
fixes: bz#1610256
Signed-off-by: Susant Palai <spalai@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's not needed.
There's a good chance the compiler is smart enough to remove it
anyway, but it can't hurt - I hope.
Compile-tested only!
Change-Id: Id7c054e146ba630227affa591007803f3046416b
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Please review, it's not always just the comments that were fixed.
I've had to revert of course all calls to creat() that were changed
to create() ...
Only compile-tested!
Change-Id: I7d02e82d9766e272a7fd9cc68e51901d69e5aab5
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
The last using of the subvol argument has been removed at 4e1ec35ef4f7
("core: fill 'ia_ino' from 'ia_gfid' in 'storage/posix' ......")
7 years ago (2011-06-16).
Change-Id: I9788d79e2e40cc153cf2960e28c7c1c1033dc8f7
fixes: bz#1601683
Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
DHT attempts to use up the entire buffer in readdirp before
unwinding in an attempt to reduce the number of calls.
However, this has 2 disadvantages:
1. This can cause a stack overflow when parallel readdir
is enabled. If the buffer only has a little space,rda can send back
only one or two entries. If those entries are stripped out by
dht_readdirp_cbk (linkto files for example) it will once again
wind down to rda in an attempt to fill the buffer before unwinding to FUSE.
This process can continue for several iterations, causing the stack
to grow and eventually overflow, causing the process to crash.
2. If parallel readdir is disabled, dht could send readdirp
calls with small buffers to the bricks, thus increasing the
number of network calls.
We are therefore reverting to the existing behaviour.
Please note, this only mitigates the stack overflow, it does
not prevent it from happening. This is still possible if
a subvol has thousands of linkto files for instance.
Change-Id: I291bc181c5249762d0c4fe27fa4fc2631166adf5
fixes: bz#1593548
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test case:
# while true; do uuid="`uuidgen`"; echo "some data" > "test$uuid"; mv
"test$uuid" "test" -f || break; echo "done:$uuid"; done
This script was run in parallel from multiple mountpoints
Along the course of getting the above usecase working, many issues
were found:
Issue 1:
=======
consider a case of rename (src, dst). We can encounter a situation
where,
* dst is a file present at the time of lookup
* dst is removed by the time rename fop reaches glusterfs
In this scenario, acquring inodelk on dst fails with ESTALE resulting
in failure of rename. However, as per POSIX irrespective of whether
dst is present or not, rename should be successful. Acquiring entrylk
provides synchronization even in races like this.
Algorithm:
1. Take inodelks on src and dst (if dst is present) on respective
cached subvols. These inodelks are done to preserve backward
compatibility with older clients, so that synchronization is
preserved when a volume is mounted by clients of different
versions. Once relevant older versions (3.10, 3.12, 3.13) reach
EOL, this code can be removed.
2. Ignore ENOENT/ESTALE errors of inodelk on dst.
3. protect namespace of src and dst. To protect namespace of a file,
take inodelk on parent on hashed subvol, then take entrylk on the
same subvol on parent with basename of file. inodelk on parent is
done to guard against changes to parent layout so that hashed
subvol won't change during rename.
4. <rest of rename continues>
5. unlock all locks
Issue 2:
========
linkfile creation in lookup codepath can race with a rename. Imagine
the following scenario:
* lookup finds a data-file with gfid - gfid-dst - without a
corresponding linkto file on hashed-subvol. It decides to create
linkto file with gfid - gfid-dst.
- Note that some codepaths of dht-rename deletes linkto file of
dst as first step. So, a lookup racing with an in-progress
rename can easily run into this situation.
* a rename (src-path:gfid-src, dst-path:gfid-dst) renames data-file
and hence gfid of data-file changes to gfid-src with path dst-path.
* lookup proceeds and creates linkto file - dst-path - with gfid -
dst-gfid - on hashed-subvol.
* rename tries to create a linkto file dst-path with src-gfid on
hashed-subvol, but it fails with EEXIST. But EEXIST is ignored
during linkto file creation.
Now we've ended with dst-path having different gfids - dst-gfid on
linkto file and src-gfid on data file. Future lookups on dst-path will
always fail with ESTALE, due to differing gfids.
The fix is to synchronize linkfile creation in lookup path with rename
using the same mechanism of protecting namespace explained in solution
of Issue 1. Once locks are acquired, before proceeding with linkfile
creation, we check whether conditions for linkto file creation are
still valid. If not, we skip linkto file creation.
Issue 3:
========
gfid of dst-path can change by the time locks are acquired. This
means, either another rename overwrote dst-path or dst-path was
deleted and recreated by a different client. When this happens,
cached-subvol for dst can change. If rename proceeds with old-gfid and
old-cached subvol, we'll end up in inconsistent state(s) like dst-path
with different gfids on different subvols, more than one data-file
being present etc.
Fix is to do the lookup with a new inode after protecting namespace of
dst. Post lookup, we've to compare gfids and correct local state
appropriately to be in sync with backend.
Issue 4:
========
During revalidate lookup, if following a linkto file doesn't lead to a
valid data-file, local->cached-subvol was not reset to NULL. This
means we would be operating on a stale state which can lead to
inconsistency. As a fix, reset it to NULL before proceeding with
lookup everywhere.
Issue 5:
========
Stale dentries left out in inode table on brick resulted in failures
of link fop even though the file/dentry didn't exist on backend fs. A
patch is submitted to fix this issue. Please check the dependency tree
of current patch on gerrit for details
In short, we fix the problem by not blindly trusting the
inode-table. Instead we validate whether dentry is present by doing
lookup on backend fs.
Change-Id: I832e5c47d232f90c4edb1fafc512bf19bebde165
updates: bz#1543279
BUG: 1543279
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The dht_(f)xattrop implementation did not implement
migration phase1/phase2 checks which could cause issues
with rebalance on sharded volumes.
This does not solve the issue where fops may reach the target
out of order.
Change-Id: I2416fc35115e60659e35b4b717fd51f20746586c
BUG: 1471031
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Coverity ID: 245
Check statvfs received as cbk before using it
Coverity ID: 228
Check NULL loc before freeing it.
Change-Id: I1b153ed5e7b81bcf7033bf710808e95908dcfef4
BUG: 789278
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Comparing the uuid string of the local node against that stored in the
local_subvol information is inefficient, especially as it is
done for every file to be migrated. The code has now been changed
to set the value of info to 1 if the nodeuuid is that of the node
making the comparison so this becomes an integer comparison.
Change-Id: I7491d59caad3b71dbf5facc94dcde0cd53962775
BUG: 1451434
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: In a distributed volume custom extended attribute value for a directory
does not display correct value after stop/start or added newly brick.
If any extended(acl) attribute value is set for a directory after stop/added
the brick the attribute(user|acl|quota) value is not updated on brick
after start the brick.
Solution: First store hashed subvol or subvol(has internal xattr) on inode ctx and
consider it as a MDS subvol.At the time of update custom xattr
(user,quota,acl, selinux) on directory first check the mds from
inode ctx, if mds is not present on inode ctx then throw EINVAL error
to application otherwise set xattr on MDS subvol with internal xattr
value of -1 and then try to update the attribute on other non MDS
volumes also.If mds subvol is down in that case throw an
error "Transport endpoint is not connected". In dht_dir_lookup_cbk|
dht_revalidate_cbk|dht_discover_complete call dht_call_dir_xattr_heal
to heal custom extended attribute.
In case of gnfs server if hashed subvol has not found based on
loc then wind a call on all subvol to update xattr.
Fix: 1) Save MDS subvol on inode ctx
2) Check if mds subvol is present on inode ctx
3) If mds subvol is down then call unwind with error ENOTCONN and if it is up
then set new xattr "GF_DHT_XATTR_MDS" to -1 and wind a call on other
subvol.
4) If setxattr fop is successful on non-mds subvol then increment the value of
internal xattr to +1
5) At the time of directory_lookup check the value of new xattr GF_DHT_XATTR_MDS
6) If value is not 0 in dht_lookup_dir_cbk(other cbk) functions then call heal
function to heal user xattr
7) syncop_setxattr on hashed_subvol to reset the value of xattr to 0
if heal is successful on all subvol.
Test : To reproduce the issue followed below steps
1) Create a distributed volume and create mount point
2) Create some directory from mount point mkdir tmp{1..5}
3) Kill any one brick from the volume
4) Set extended attribute from mount point on directory
setfattr -n user.foo -v "abc" ./tmp{1..5}
It will throw error " Transport End point is not connected "
for those hashed subvol is down
5) Start volume with force option to start brick process
6) Execute getfattr command on mount point for directory
7) Check extended attribute on brick
getfattr -n user.foo <volume-location>/tmp{1..5}
It shows correct value for directories for those
xattr fop were executed successfully.
Note: The patch will resolve xattr healing problem only for fuse mount
not for nfs mount.
BUG: 1371806
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
Change-Id: I4eb137eace24a8cb796712b742f1d177a65343d5
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add EBADF handling for dht_fremovexattr and dht_fsetxattr.
Change-Id: Ide0d5812dae79655d2565157e5baabcd753b4309
BUG: 1476665
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17999
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
DHT fd based fops used to check if the fd was open
on the cached subvol before winding the call. However,
this introduced a performance regression of about
30% for reads.
This check was introduced to handle cases where files
were migrated while IOs were happening. As this is not
the common case, dht will now check if the fd is
open on the cached subvol only if the call fails
with EBADF.
This will prevent a performance hit where a rebalance
is not running.
Change-Id: I2035a858d63c3fcd22bb634055bbb0ad01686808
BUG: 1476665
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17976
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Susant Palai <spalai@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is a another race between the cached subvol
being updated in the inode_ctx and the fd being opened on
the target.
1. fop1 -> fd1 -> subvol0
2. file migrated from subvol0 to subvol1 and cached_subvol
changed to subvol1 in inode_ctx
3. fop2 -> fd1 -> subvol1 [takes new cached subvol]
4. fop2 -> checks fd ctx (fd not open on subvol1) -> opens fd1 on subvol1
5. fop1 -> checks fd ctx (fd not open on subvol0)
-> tries to open fd1 on subvol0 -> fails with "No such file on directory".
Fix:
If dht_fd_open_on_dst fails with ENOENT or ESTALE, wind to old subvol
and let the phase1/phase2 checks handle it.
Change-Id: I34f8011574a8b72e3bcfe03b0cc4f024b352f225
BUG: 1465075
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17731
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If an fd is opened on a file, the file is migrated
and the cached subvol is updated in the inode_ctx
before an fd based fop is sent, the fop is sent to
the dst subvol on which the fd is not opened.
This causes the FOP to fail with EBADF.
Now, every fd based fop will check to see that the fd
has been opened on the dst subvol before winding it down.
Change-Id: Id92ef5eb7a5b5226688e2d2868b15e383f5f240e
BUG: 1465075
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17630
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-by: Susant Palai <spalai@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
dht_readdirp must unwind with list of entries only after
the entire buffer requested by kernel is filled to avoid
extra syscalls occuring when returning partially filled
buffer. Also wind readdir call to next subvol on reaching
EOD for directory on that subvol to avoid extra network call.
Change-Id: If2e1a2722f813d95457c7542bff25fef56c7a041
BUG: 1356453
Signed-off-by: Sakshi <sabansal@redhat.com>
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: https://review.gluster.org/12271
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Susant Palai <spalai@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Rebalance compares the node-uuid of a file against its own
to and migrates a file only if they match. However, the
current behaviour in both AFR and EC is to return
the node-uuid of the first brick in a replica set for all
files. This means a single node ends up migrating all
the files if the first brick of every replica set is on the
same node.
Fix:
AFR and EC will return all node-uuids for the replica set.
The rebalance process will divide the files to be migrated
among all the nodes by hashing the gfid of the file and
using that value to select a node to perform the migration.
This patch makes the required DHT and tiering changes.
Some tests in rebal-all-nodes-migrate.t will need to be
uncommented once the AFR and EC changes are merged.
Change-Id: I5ce41600f5ba0e244ddfd986e2ba8fa23329ff0c
BUG: 1366817
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17239
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Design doc: https://review.gluster.org/16876
Directory creation is now synchronized with blocking inodelk of the
parent on the hashed subvolume followed by the entrylk on the hashed
subvolume between dht_mkdir, dht_rmdir, dht_rename_dir and lookup
selfheal mkdir.
To maintain internal consistency of directories across all subvols of
dht, we need locks. Specifically we are interested in:
1. Consistency of layout of a directory. Only one writer should modify
the layout at a time. A writer (layout setting during directory heal
as part of lookup) shouldn't modify the layout while there are
readers (all other fops like create, mkdir etc., which consume
layout) and readers shouldn't read the layout while a writer is in
progress. Readers can read the layout simultaneously. Writer takes
a WRITE inodelk on the directory (whose layout is being modified)
across ALL subvols. Reader takes a READ inodelk on the directory
(whose layout is being read) on ANY subvol.
2. Consistency of directory namespace across subvols. The path and
associated gfid should be same on all subvols. A gfid should not be
associated with more than one path on any subvol. All fops that can
change directory names (mkdir, rmdir, renamedir, directory creation
phase in lookup-heal) takes an entrylk on hashed subvol of the
directory.
NOTE1: In point 2 above, since dht takes entrylk on hashed subvol of a
directory, the transaction itself is a consumer of layout on
parent directory. So, the transaction is a reader of parent
layout and does an inodelk on parent directory just like any
other layout reader. So a mkdir (dir/subdir) would:
> Acquire a READ inodelk on "dir" on any subvol.
> Acquire an entrylk (dir, "subdir") on hashed subvol of "subdir".
> creates directory on hashed subvol and possibly on non-hashed subvols.
> UNLOCK (entrylk)
> UNLOCK (inodelk)
NOTE2: mkdir fop while setting the layout of the directory being created
is considered as a reader, but NOT a writer. The reason is for
a fop which can consume the layout of a directory to come either
of the following conditions has to be true:
> mkdir syscall from application has to complete. In this case no
need of synchronization.
> A lookup issued on the directory racing with mkdir has to complete.
Since layout setting by a lookup is considered as a writer, only
one of either mkdir or lookup will set the layout.
Code re-organization:
All the lock related routines are moved to "dht-lock.c" file.
New wrapper function is introduced to take blocking inodelk
followed by entrylk 'dht_protect_namespace'
Updates #191
Change-Id: I01569094dfbe1852de6f586475be79c1ba965a31
Signed-off-by: Kotresh HR <khiremat@redhat.com>
BUG: 1443373
Reviewed-on: https://review.gluster.org/15472
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All of the functions called to free the refcounted structure are doing a
typecast from (void*) to their own type taht is being free'd. This
really is not needed and the refcount interface is made a little simpler
without the requirement of typecasting.
With this small improvement in the API, all callers are updated too.
Change-Id: I32473b6d1799f62861d4b2d78ea30c09e6c80ab1
BUG: 1416889
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: https://review.gluster.org/16471
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently there is no helpful log in fix-layout code path. Adding
the logs to be helpful for debugging fix-layout failures.
BUG: 1414782
Change-Id: I61c29ceedcaa2e235fa7be99866709d6ca6de3ae
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/16040
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As with dht, dirs are present on all subvolumes,
renaming them is a compound operation and thus a
partial success + partial failure scenario is
possible, resulting in an inconsistent state.
For purposes of reproduction, such a scenario can
easily be produced by stopping the volume, edit the
volfile of a certain subvolume to get at an
"option read-only on" setting, and then restart
the volume. Thus those operations that are to make change
on the affected subvolume will fail with EROFS.
To handle such scenarios, we introduce an in-memory cache
where we record the return values obtained from the
subvolumes. At the final stage of the dir rename operation
we check if it's a partial success/fail situation. If yes,
then we perform a reverse rename op on those subvolumes
where the operation succeeded.
Change-Id: I3d05f74f53932cb984a918d252a7309c1009a51d
BUG: 1412069
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/15739
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixed a memleak where dict was not being unrefed
in the dht_migration_complete_check_task and
dht_rebalance_inprogress_task functions.
Change-Id: I3d42e9a2e5c8596c985bf6431a68fd3905227383
BUG: 1409186
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/16308
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: MOHIT AGRAWAL <moagrawa@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Removed a structure member that was not used
from dht_local_t. Skipped some unnecessary checks
in dht_filter_loc_subvol_key.
Change-Id: I81740b6528e063fb9cf5817e05865ff4d77aa748
BUG: 1378305
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/15542
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This converts sprintf to gf_asprintf in following components: * quotad.c
* dht
* afr
* protocol/client
* rpc/rpc-lib
* rpc/rpc-transport
Change-Id: If8a267bab3d91003bdef3a92664077a0136745ee
BUG: 1332073
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/14102
Tested-by: Manikandan Selvaganesh <mselvaga@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Manikandan Selvaganesh <mselvaga@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When DHT traverses the inode->fd_list, it does that in an unsafe
way that can generate races with fd_unref() called from other threads.
This patch fixes this problem taking the inode->lock and adding a
reference to the fd while it's being used outside of the mutex
protected region.
A minor change in storage/posix has been done to also access the
inode->fd_list in a safe way.
Change-Id: I10d469ca6a8f76e950a8c9779ae9c8b70f88ef93
BUG: 1344340
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/14682
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
During locking we send lock request to cached subvol,
and normally we unlock to the cached subvol
But with parallel fresh lookup on a directory, there
is a race window where the cached subvol can change
and the unlock can go into a different subvol from
which we took lock.
This will result in a stale lock held on one of the
subvol.
So we will store the details of subvol which we took the lock
and will unlock from the same subvol
Change-Id: I47df99491671b10624eb37d1d17e40bacf0b15eb
BUG: 1311002
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/13492
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
dht_mkdir ()
{
first-hashed-subvol = hashed-subvol for "bname" in in-memory
layout of "parent";
inodelk (SETLKW, parent, "LAYOUT_HEAL_DOMAIN", "can be any
subvol, but we choose first-hashed-subvol randomly");
{
begin:
hashed-subvol = hashed-subvol for "bname" in in-memory
layout of "parent";
hash-range = extract hashe-range from layout of "parent";
ret = mkdir (parent/bname, hashed-subvol, hash-range);
if (ret == "hash-value doesn't fall into layout stored on
the brick (this error is returned by posix-mkdir)")
{
refresh_parent_layout ();
goto begin;
}
}
inodelk (UNLCK, parent, "LAYOUT_HEAL_DOMAIN",
"first-hashed-subvol");
proceed with other parts of dht_mkdir;
}
posix_mkdir (parent/bname, client-hash-range)
{
disk-hash-range = getxattr (parent, "dht-layout-key");
if (disk-hash-range != client-hash-range) {
fail-with-error ("hash-value doesn't fall into layout
stored on the brick");
return 0;
}
continue-with-posix-mkdir;
}
Similar changes need to be done for dentry operations like create,
symlink, link, unlink, rmdir, rename. These will be addressed in
subsequent patches. This patch addresses only mkdir codepath.
This change breaks stripe tests, as on some striped subvols dht layout
xattrs are not set for some reason. This results in failure of
mkdir. Since striped volumes are always created with dht, some tests
associated with stripe also fail. So, I am making following tests
changes (since stripe is out of maintainance):
* modify ./tests/basic/rpc-coverage.t to not to use striped volumes
* mark all (2) tests in tests/bugs/stripe/ as bad tests
Change-Id: Idd1ae879f24a48303dc743c1bb4d91f89a629e25
BUG: 1323040
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/13885
Smoke: Gluster Build System <jenkins@build.gluster.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is a possibility that while an rmdir is completed on
some non-hashed subvol and proceeding to others, a lookup
selfheal can recreate the same directory on those subvols
for which the rmdir had succeeded. Now the deletion of the
parent directory will fail with an ENOTEMPTY.
To fix this take blocking inodelk on the subvols before
starting rmdir. Selfheal must also take blocking inodelk
before creating the entry.
Change-Id: I168a195c35ac1230ba7124d3b0ca157755b3df96
BUG: 1245065
Signed-off-by: Sakshi <sabansal@redhat.com>
Reviewed-on: http://review.gluster.org/13528
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Directory size is meaningless. Every filesystem has its own
unpredictable way of increasing or decreasing it, based on internal data
structures and even transient conditions. Some filesystems (e.g. ext4)
never decrease it at all. Others (e.g. btrfs) don't even report it.
Very few programs look at it, and those that do are broken.
Unfortunately, one such program is GNU tar, which will complain when it
sees different values because at different times we got the value from
different DHT subvolumes. To avoid such problems, just report a
constant value.
Change-Id: Id64ce917c75b5f7ff50cb55b6e997f3b3556e7e3
BUG: 1302948
Original-author: Shyam <srangana@redhat.com>
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/13770
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
What:
If dht_open is called on a migrating file after the inode_ctx is set,
subsequent FOPs on that fd do not open the fd on the dst subvol.
This is seen when the open-ftruncate-close sequence is repeatedly
called on a migrating file.
A second call to the sequence described above causes dht_truncate_cbk
to call dht_truncate2 as the dht_inode_ctx was already set by the first
call. As dht_rebalance_in_progress_check is not called, the fd is not
opened on the dst subvol.
On a distributed-replicate volume, this causes AFR to
open the fd using afr_fix_open, but with the wrong flags, causing
posix_ftruncate to fail with EINVAL.
The fix: We require fd specific information to make a decision while
handling migrating files.
Set the fd_ctx to indicate the fd has been opened on the dst subvol
and check if it has been set while processing Phase1/Phase2 checks
in the FOP callback functions.
Change-Id: I43cdcd8017b4a11e18afdd210469de7cd9a5ef14
BUG: 1284823
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/12985
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
glusterd occasionally loads shared libraries of translators. This
failed for tiering due to a reference to dht_methods which is defined
as a global variable which is not necessary.
The global variable has been removed and this is now a member of
dht_conf and is now initialised in the *_init calls.
Change-Id: Ifa0a21e3962b5cd8d9b927ef1d087d3b25312953
BUG: 1287842
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/12863
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After a successful nameless lookup if the directory is not
present on any of the subvol, then we will get the path of
the directory and will recursively send a named lookp on
each parent directory.
This will help particularly for the scenarios like add brick
and attach-tier.
Change-Id: I64c2118a5ab03bbaa59b0dfc62babdf4472a92a3
BUG: 1272949
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/12376
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: Ib3911dfa1f950ff9decbe249ad798e97226dd06d
BUG: 1266877
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/12295
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Determine which DHT level is responsible for
handling fops on a file undergoing migration based
on the name of the the linkto xattr set on the file
being migrated and process accordingly.
Change-Id: I82772e39314d4fe7f2ba0dcf22de0c6a374ee139
BUG: 1254428
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Signed-off-by: Nithya Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/12090
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
lookup selfheal race
Locking on all subvols before an rmdir is unable to remove all
directory entries. Hence reverting the patch for now.
Change-Id: I31baf2b2fa2f62c57429cd44f3f229c35eff1939
BUG: 1245065
Signed-off-by: Sakshi <sabansal@redhat.com>
Reviewed-on: http://review.gluster.org/12125
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are three kinds of inline functions: plain inline, extern inline,
and static inline. All three have been removed from .c files, except
those in "contrib" which aren't our problem. Inlines in .h files, which
are overwhelmingly "static inline" already, have generally been left
alone. Over time we should be able to "lower" these into .c files, but
that has to be done in a case-by-case fashion requiring more manual
effort. This part was easy to do automatically without (as far as I can
tell) any ill effect.
In the process, several pieces of dead code were flagged by the
compiler, and were removed.
Change-Id: I56a5e614735c9e0a6ee420dab949eac22e25c155
BUG: 1245331
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/11769
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is a possibility that while an rmdir is completed on
some non-hashed subvol and proceeding to others. A lookup
selfheal can recreate the same directory on those subvols
for which the rmdir had succeeded. The fix is to take a
blocking inodelk on the subvols before starting rmdir.
Since selfheal requires lock on all subvols, if an rmdir
is in progess acquiring locks will fail and vice versa.
Change-Id: I841a44758c3b88f5e04d1cb73ad36e0cac9fdabb
BUG: 1245065
Signed-off-by: Sakshi <sabansal@redhat.com>
Reviewed-on: http://review.gluster.org/11725
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
dht_rebalance_inprogress_task() was not sending lookups to the
destination subvolume for a file undergoing writes during rebalance. Due to
this, afr was not able to populate the read_subvol and failed the write
with EIO.
Fix:
Send lookup for fd based operations as well.
Thanks to Raghavendra G for helping with the RCA.
Change-Id: I638c203abfaa45b29aa5902ffd76e692a8212a19
BUG: 1244165
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/11713
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
information.
Without refcounting, we might free up memory while other fops are
still accessing it.
BUG: 1235927
Change-Id: Ia4fa4a651cd6fe2394a0c20cef83c8d2cbc8750f
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/11418
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: Ib3bb61c5223f409c23c68100f3fe884918d2dc3f
BUG: 1194640
Signed-off-by: arao <arao@redhat.com>
Reviewed-on: http://review.gluster.org/10021
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Reviewed-by: Joseph Fernandes
Tested-by: Joseph Fernandes
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I1ea358b83267b0bcdf654ce18fe881fd4a6bf08d
BUG: 1233139
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/11313
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I2d1f5bb2dd27f6cea52c059b4ff08ca0fa63b140
BUG: 1231425
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/11209
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Stashing additional information in the inode_ctx to help
decide whether the migration information is stale, which could
happen if a file was migrated several times but FOPs only detected
the P1 migration phase. If no FOP detects the P2 phase, the inode
ctx1 is never reset.
We now save the src subvol as well as the dst subvol in the
inode ctx. The src subvol is the subvol on which the FOP was sent
when the mig info was set in the inode ctx. This information is
considered stale if:
1. The subvol on which the current FOP is sent is the same as
the dst subvol in the ctx
2. The subvol on which the current FOP is sent is not the same
as the src subvol in the ctx
This does not handle the case where the same file might have been
renamed such that the src subvol is the same but the dst subvol
is different. However, that is unlikely to happen very often.
Change-Id: I05a2e9b107ee64750c7ca629aee03b03a02ef75f
BUG: 1142423
Signed-off-by: Nithya Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/10834
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The destination subvol used in the fop2 variants is either stored in
inode-ctx1 or local->cached_subvol. However, it is not guaranteed that
a value stored in these locations before invocation of fop2 is still
present after the invocation as these locations are shared among
different concurrent operations. So, to preserve the atomicity of
"check dst-subvol and invoke fop2 variant if dst-subvol found", we
pass down the dst-subvol to fop2 variant.
This patch also fixes error handling in some fop2 variants.
Change-Id: Icc226228a246d3f223e3463519736c4495b364d2
BUG: 1142423
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/10943
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of including config.h in each file, and have the additional
config.h included from the compiler commandline (-include option).
When a .c file tests for a certain #define, and config.h was not
included, incorrect assumtions were made. With this change, it can not
happen again.
BUG: 1222319
Change-Id: I4f9097b8740b81ecfe8b218d52ca50361f74cb64
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: http://review.gluster.org/10808
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
phase 2 of migration.
linkto xattr on source file cannot be relied to find where the data
file currently resides. This can happen if there are multiple
migrations before phase 2 detection by a client. For eg.,
* migration (M1, node1, node2) starts.
* application writes some data. DHT correctly stores the state in
inode context that phase-1 of migration is in progress
* migration M1 completes
* migration (M2, node2, node3) is triggered and completed
* application resumes writes to the file. DHT identifies it as phase-2
of migration. However, linkto xattr on node1 points to node2, but
the file is on node3. A lookup correctly identifies node3 as cached
subvol
TBD:
When we identify phase-2 of a previous migration (say M1), there
might be a migration in progress - say (M3, node3, node4). In this
case we need to send writes to both (node3, node4) not just
node3. Also, the inode state needs to correctly indicate that its in
phase-1 of migration. I'll send this as a different patch.
Change-Id: I1a861f766258170af2f6c0935468edb6be687b95
BUG: 1142423
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/10805
Tested-by: NetBSD Build System
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a read IO occurs against a file that has reached rebalance
phase 2, we redirect the IO to the destination. For tiered
volumes, when we try to reopen the file (on the destination),
the lower level DHT receives the open call and fails; it does
not have a "cached subvol". Fix is to "teach" the lower level
DHT of the new location by sending a locate before the open.
Change-Id: Ia4acb0035ff1da15f6a8f9ed54f43c76e8b98f5f
BUG: 1214048
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Signed-off-by: root <root@gprfs018.sbu.lab.eng.bos.redhat.com>
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/10324
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current patch address two part of the design proposed.
1. Rebalance multiple files in parallel
2. Crawl only bricks that belong to the current node
Brief design explanation for the above two points.
1. Rebalance multiple files in parallel:
-------------------------------------
The existing rebalance engine is single threaded. Hence, introduced
multiple threads which will be running parallel to the crawler. The
current rebalance migration is converted to a "Producer-Consumer"
frame work.
Where Producer is : Crawler
Consumer is : Migrating Threads
Crawler: Crawler is the main thread. The job of the crawler is now
limited to fix-layout of each directory and add the files which are
eligible for the migration to a global queue in a round robin manner
so that we will use all the disk resources efficiently. Hence, the
crawler will not be "blocked" by migration process.
Producer: Producer will monitor the global queue. If any file is
added to this queue, it will dqueue that entry and migrate the file.
Currently 20 migration threads are spawned at the beginning of the
rebalance process. Hence, multiple file migration happens in parallel.
2. Crawl only bricks that belong to the current node:
--------------------------------------------------
As rebalance process is spawned per node, it migrates only the files
that belongs to it's own node for the sake of load balancing. But it
also reads entries from the whole cluster, which is not necessary as
readdir hits other nodes.
New Design:
As part of the new design the rebalancer decides the subvols
that are local to the rebalancer node by checking the node-uuid of
root directory prior to the crawler starts. Hence, readdir won't hit
the whole cluster as it has already the context of local subvols and
also node-uuid request for each file can be avoided. This makes the
rebalance process "more scalable".
Change-Id: I73ed6ff807adea15086eabbb8d9883e88571ebc1
BUG: 1171954
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/9657
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
|