| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
DHT fd based fops used to check if the fd was open
on the cached subvol before winding the call. However,
this introduced a performance regression of about
30% for reads.
This check was introduced to handle cases where files
were migrated while IOs were happening. As this is not
the common case, dht will now check if the fd is
open on the cached subvol only if the call fails
with EBADF.
This will prevent a performance hit where a rebalance
is not running.
> BUG: 1476665
> Signed-off-by: N Balachandran <nbalacha@redhat.com>
> Reviewed-on: https://review.gluster.org/17976
> Smoke: Gluster Build System <jenkins@build.gluster.org>
> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
> Reviewed-by: Amar Tumballi <amarts@redhat.com>
> Reviewed-by: Susant Palai <spalai@redhat.com>
> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
(cherry picked from commit cdca1cb26a0aba390c6d8485c0d6d95e22ffc8bd)
Change-Id: I2035a858d63c3fcd22bb634055bbb0ad01686808
BUG: 1479303
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17995
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A brief about how hardlink migration works:
- Different hardlinks (to the same file) may hash to different bricks,
but their cached subvol will be same. Rebalance picks up the first hardlink,
calculates it's hash(call it TARGET) and set the hashed subvolume as an xattr
on the data file.
- Now all the hardlinks those come after this will fetch that xattr and will
create linkto files on TARGET (all linkto files for the hardlinks will be hardlink
to each other on TARGET).
- When number of hardlinks on source is equal to the number of hardlinks on
TARGET, the data migration will happen.
RACE:1
Since rebalance is multi-threaded, the first lookup (which decides where the TARGET
subvol should be), can be called by two hardlink migration parallely and they may end
up creating linkto files on two different TARGET subvols. Hence, hardlinks won't be
migrated.
Fix: Rely on the xattr response of lookup inside gf_defrag_handle_hardlink since it
is executed under synclock.
RACE:2
The linkto files on TARGET can be created by other clients also if they are doing
lookup on the hardlinks. Consider a scenario where you have 100 hardlinks. When
rebalance is migrating 99th hardlink, as a result of continuous lookups from other
client, linkcount on TARGET is equal to source linkcount. Rebalance will migrate data
on the 99th hardlink itself. On 100th hardlink migration, hardlink will have TARGET as
cached subvolume. If it's hash is also the same, then a migration will be triggered from
TARGET to TARGET leading to data loss.
Fix: Make sure before the final data migration, source is not same as destination.
RACE:3
Since a hardlink can be migrating to a non-hashed subvolume, a lookup from other
client or even the rebalance it self, might delete the linkto file on TARGET leading
to hardlinks never getting migrated.
This will be addressed in a different patch in future.
Change-Id: If0f6852f0e662384ee3875a2ac9d19ac4a6cea98
BUG: 1469964
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/17755
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The earlier approach of using the number of files
to determine when the rebalance would complete did
not work well when file sizes differed widely.
The new approach now gets the total data size and
uses that information to determine how long
the rebalance is expected to take.
Change-Id: I84e80a0893efab72ff06130e4596fa71c9c8c868
BUG: 1467209
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17668
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: MOHIT AGRAWAL <moagrawa@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If an fd is opened on a file, the file is migrated
and the cached subvol is updated in the inode_ctx
before an fd based fop is sent, the fop is sent to
the dst subvol on which the fd is not opened.
This causes the FOP to fail with EBADF.
Now, every fd based fop will check to see that the fd
has been opened on the dst subvol before winding it down.
Change-Id: Id92ef5eb7a5b5226688e2d2868b15e383f5f240e
BUG: 1465075
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17630
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-by: Susant Palai <spalai@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The rebalance used to get the file count in the beginning
and not update it. This caused estimates to fail
if the number changed during the rebalance.
The rebalance now updates the file count periodically.
Change-Id: I1667ee69e8a1d7d6bc6bc2f060fad7f989d19ed4
BUG: 1464110
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17607
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a similar issue fixed for rename in https://review.gluster.org/#/c/16016/
For hardlinks, if cached and hashed subvolumes are different, then it will first
create linkto file in hashed using root permission, but actually hardlink creation
fails with EACESS and stale linkto file is never removed.All the followup hardlink
calls with file name will result ESTALE because linktofile creation fails with EEXIST
and follow up lookup on linkto file returns gfid-mismatching(old linkto file) and
finally fails with ESTALE
Steps to produce :
(From link/00.t test from posix-testsuite)
Steps executed in script
* create a file "abc" using root
* change the ownership of file to a non root user
* create hardlink "link" for "abc" using a non root user, it fails with EACESS
* delete "abc"
* create directory "abc" using root
* again try to create hadrlink "link" for "abc" using non root user, fails with ESTALE
Also tried to fix other bugs in dht_linkfile_create_cbk() and posix_lookup.
Thanks Susant for the help in debugging the issue and suggestion for this patch.
Change-Id: I7a5a1899d3fd1fdb13578b37f9d52a084492e35d
BUG: 1452084
Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com>
Reviewed-on: https://review.gluster.org/17331
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Empty directories were not being considered while
calculating rebalance estimates leading to negative
time-left values being displayed as part of the
rebalance status.
Change-Id: I48d41d702e72db30af10e6b87b628baa605afa98
BUG: 1457985
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17448
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
dht_readdirp must unwind with list of entries only after
the entire buffer requested by kernel is filled to avoid
extra syscalls occuring when returning partially filled
buffer. Also wind readdir call to next subvol on reaching
EOD for directory on that subvol to avoid extra network call.
Change-Id: If2e1a2722f813d95457c7542bff25fef56c7a041
BUG: 1356453
Signed-off-by: Sakshi <sabansal@redhat.com>
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: https://review.gluster.org/12271
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Susant Palai <spalai@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On demand migration of files i.e. migration done by clients
triggered by a setfattr was broken.
Dependency on defrag led to crash when migration was triggered from
client.
Note: This functionality is not available for tiered volumes. Migration
from tier served client will fail with ENOTSUP.
usage (But refer to the steps mentioned below to avoid any issues) :
setfattr -n "trusted.distribute.migrate-data" -v "1" <filename>
The purpose of fixing the on-demand client migration was to give a
workaround where the user has lots of empty directories compared to
files and want to do a remove-brick process.
Here are the steps to trigger file migration for remove-brick process from
client. (This is highly recommended to follow below steps as is)
Let's say it is a replica volume and user want to remove a replica pair
named brick1 and brick2. (Make sure healing is completed before you run
these steps)
Step-1: Start remove-brick process
- gluster v remove-brick <volname> brick1 brick2 start
Step-2: Kill the rebalance daemon
- ps aux | grep glusterfs | grep rebalance\/ | awk '{print $2}' | xargs kill
Step-3: Do a fresh mount as mentioned here
- glusterfs -s ${localhostname} --volfile-id rebalance/$volume-name /tmp/mount/point
Step-4: Go to one of the bricks (among brick1 and brick2)
- cd <brick1 path>
Step-5: Run the following command.
- find . -not \( -path ./.glusterfs -prune \) -type f -not -perm 01000 -exec bash -c 'setfattr -n "distribute.fix.layout" -v "1" ${mountpoint}/$(dirname '{}')' \; -exec setfattr -n "trusted.distribute.migrate-data" -v "1" ${mountpoint}/'{}' \;
This command will ignore the linkto files and empty directories. Do a fix-layout of
the parent directory. And trigger a migration operation on the files.
Step-6: Once this process is completed do "remove-brick force"
- gluster v remove-brick <volname> brick1 brick2 force
Note: Use the above script only when there are large number of empty directories.
Since the script does a crawl on the brick side directly and avoids directories those
are empty, the time spent on fixing layout on those directories are eliminated(even if the script
does not do fix-layout on empty directories, post remove-brick a fresh layout will be built
for the directory, hence not affecting application continuity).
Detailing the expectation for hardlink migartion with this patch:
Hardlink is migrated only for remove-brick process. It is highly essential
to have a new mount(step-3) for the hardlink migration to happen. Why?:
setfattr operation is an inode based operation. Since, we are doing setfattr from
fuse mount here, inode_path will try to build path from the linked dentries to the inode.
For a file without hardlinks the path construction will be correct. But for hardlinks,
the inode will have multiple dentries linked.
Without fresh mount, inode_path will always get the most recently linked dentry.
e.g. if there are three hardlinks named dir1/link1, dir2/link2, dir3/link3, on a client
where these hardlinks are looked up, inode_path will always return the path dir3/link3
if dir3/link3 was looked up most recently. Hence, we won't be able to create linkto
files for all other hardlinks on destination (read gf_defrag_handle_hardlink for more details
on hardlink migration).
With a fresh mount, the lookup and setfattr become serialized. e.g. link2 won't be
looked up until link1 is looked up and migrated. Hence, inode_path will always have the correct
path, in this case link1 dentry is picked up(as this is the most recently looked up inode) and
the path is built right.
Note: If you run the above script on an existing mount(all entries looked up), hard links may
not be migrated, but there should not be any other issue. Please raise a bug, if you find any
issue.
Tests: Manual
Change-Id: I9854cdd4955d9e24494f348fb29ba856ea7ac50a
BUG: 1450975
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/17115
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Rebalance compares the node-uuid of a file against its own
to and migrates a file only if they match. However, the
current behaviour in both AFR and EC is to return
the node-uuid of the first brick in a replica set for all
files. This means a single node ends up migrating all
the files if the first brick of every replica set is on the
same node.
Fix:
AFR and EC will return all node-uuids for the replica set.
The rebalance process will divide the files to be migrated
among all the nodes by hashing the gfid of the file and
using that value to select a node to perform the migration.
This patch makes the required DHT and tiering changes.
Some tests in rebal-all-nodes-migrate.t will need to be
uncommented once the AFR and EC changes are merged.
Change-Id: I5ce41600f5ba0e244ddfd986e2ba8fa23329ff0c
BUG: 1366817
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17239
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Current rebalance throttle options: lazy/normal/aggressive may not always be
sufficient for the purpose of throttling. In our recent test, we observed for
certain setups, normal and aggressive modes behaved similarly consuming full
disk bandwidth. So in cases like this admin should be able to tune it
down(or vice versa) depending on the need.
Along with old throttle configurations, thread counts are tuned based on number.
e.g. gluster v set vol-name cluster-rebal.throttle 5.
Admin can tune up/down between 0 and the number of cores available.
Note: For heterogenous servers, validation will fail on the old server if "number"
is given for throttle configuration.
The message looks something like this:
"volume set: failed: Staging failed on vm2. Error: cluster.rebal-throttle should be {lazy|normal|aggressive}"
Test: Manual test by logging active thread number after reconfiguring throttle option.
testcase: tests/basic/distribute/throttle-rebal.t
Change-Id: I46e3cde546900307831028b344ecf601fd9b02c3
BUG: 1438370
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/16980
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: Throttle settings "normal" and "aggressive" for rebalance
did not have performance difference.
normal mode spawns $(no. of cores - 4)/2 threads and aggressive
spawns $(no. of cores - 4) threads. Though aggressive mode has twice
the number of threads compared to that of normal mode, there was no
performance gain when switched to aggressive mode from normal mode.
RCA:
During the course of debugging the above problem, we tried assigning
migration job to migration threads spawned by rebalance, rather than
synctasks(as there is more overhead associated to manage the task
queue and threads). This gave us a significant improvement over rebalance
under synctasks. This patch does not really gurantee that there will be a
clear performance difference between normal and aggressive mode, but this
patch certainly maximized the disk utilization for 1GBfiles run.
Results:
Test enviroment:
Gluster Config:
Number of Bricks: 2 (one brick per disk(RAID-6 12 disk))
Bricks:
Brick1: server1:/brick/test1/1
Brick2: server2:/brick/test1/1
Options Reconfigured:
performance.readdir-ahead: on
server.event-threads: 4
client.event-threads: 4
1000 files with 1GB each were created/renamed such that all files will have
server1 as cached and server2 as hashed, so that all files will be migrated.
Test machines had 24 cores each.
Results with/without synctask based migration:
-----------------------------------------------
mode normal(10threads) aggressive(20threads)
timetaken 0:55:30 (h:m:s) 0:56:3 (h:m:s)
withsynctask
timetaken
with migrator 0:38:3 (h:m:s) 0:23:41 (h:m:s)
threads
From above table it can be seen that, there is a clear 2x perf gain between
rebalance with synctask vs rebalance with migrator threads.
Additionally this patch modifies the code so that caller will have the exact error
number returned by dht_migrate_file(earlier the errno meaning was overloaded). This
will help avoiding scenarios where migration failure due to ENOENT, can result in
rebalance abort/failure.
Change-Id: I8904e2fb147419d4a51c1267be11a08ffd52168e
BUG: 1420166
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/16427
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Design doc: https://review.gluster.org/16876
Directory creation is now synchronized with blocking inodelk of the
parent on the hashed subvolume followed by the entrylk on the hashed
subvolume between dht_mkdir, dht_rmdir, dht_rename_dir and lookup
selfheal mkdir.
To maintain internal consistency of directories across all subvols of
dht, we need locks. Specifically we are interested in:
1. Consistency of layout of a directory. Only one writer should modify
the layout at a time. A writer (layout setting during directory heal
as part of lookup) shouldn't modify the layout while there are
readers (all other fops like create, mkdir etc., which consume
layout) and readers shouldn't read the layout while a writer is in
progress. Readers can read the layout simultaneously. Writer takes
a WRITE inodelk on the directory (whose layout is being modified)
across ALL subvols. Reader takes a READ inodelk on the directory
(whose layout is being read) on ANY subvol.
2. Consistency of directory namespace across subvols. The path and
associated gfid should be same on all subvols. A gfid should not be
associated with more than one path on any subvol. All fops that can
change directory names (mkdir, rmdir, renamedir, directory creation
phase in lookup-heal) takes an entrylk on hashed subvol of the
directory.
NOTE1: In point 2 above, since dht takes entrylk on hashed subvol of a
directory, the transaction itself is a consumer of layout on
parent directory. So, the transaction is a reader of parent
layout and does an inodelk on parent directory just like any
other layout reader. So a mkdir (dir/subdir) would:
> Acquire a READ inodelk on "dir" on any subvol.
> Acquire an entrylk (dir, "subdir") on hashed subvol of "subdir".
> creates directory on hashed subvol and possibly on non-hashed subvols.
> UNLOCK (entrylk)
> UNLOCK (inodelk)
NOTE2: mkdir fop while setting the layout of the directory being created
is considered as a reader, but NOT a writer. The reason is for
a fop which can consume the layout of a directory to come either
of the following conditions has to be true:
> mkdir syscall from application has to complete. In this case no
need of synchronization.
> A lookup issued on the directory racing with mkdir has to complete.
Since layout setting by a lookup is considered as a writer, only
one of either mkdir or lookup will set the layout.
Code re-organization:
All the lock related routines are moved to "dht-lock.c" file.
New wrapper function is introduced to take blocking inodelk
followed by entrylk 'dht_protect_namespace'
Updates #191
Change-Id: I01569094dfbe1852de6f586475be79c1ba965a31
Signed-off-by: Kotresh HR <khiremat@redhat.com>
BUG: 1443373
Reviewed-on: https://review.gluster.org/15472
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
criteria happens to be the same subvol containing data-file
Rebalance need to figure out a new subvol in case the hashed subvol
does not have enough space. In the process of figuring out the new subvol,
we need to ignore the source subvol, otherwise it will lead to data loss.
Test: Manual
Ran the following
sizeof /tmp/1: 1.5GB
sizeof /brick/1: 16GB
sizeof /tmp/2: 1.5GB
<start>
glusterd; gluster v create test1 vm1:/brick/1 vm1:/tmp/1;
gluster v start test1;
mount -t glusterfs vm1:test1 /mnt;
for i in {1..2000}
do
dd if=/dev/zero of=/mnt/file$i bs=1KB count=1 &> /dev/null;
done
gluster v add-brick test1 vm1:/tmp/2
gluster v set test1 min-free-disk 12GB
gluster v remove-brick test1 vm1:/tmp/1 star
<end>
file count and data were intact.
Change-Id: Ib8fc8467a3d48a7c12958824c4f0b88e160b86c1
BUG: 1441508
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/17064
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
test: Manual
created files of size 1K on 2 brick(of size 1GB) setup .
added a brick of size 16GB.
set min-free-disk to 12GB(so that first two bricks won't receive any files).
removed one of the 1st brick of size 1GB.
Logs from test:
[2017-04-12 08:52:08.196484] W [MSGID: 0] [dht-rebalance.c:895:__dht_check_free_space]
0-test1-dht: Write will cross min-free-disk for file - /tile32 on subvol - test1-client-1.
Looking for new subvol.
[2017-04-12 08:52:08.196904] I [MSGID: 0] [dht-rebalance.c:925:__dht_check_free_space]
0-test1-dht: new target found - test1-client-2 for file - /tile32
- Post migration we have two files. The new destination (/brick/1) has the data file
[root@vm1 ~]# ll /brick/1/tile32
-rw-r--r--. 2 root root 0 Apr 12 14:22 /brick/1/tile32
- On the old target the linkto file is there with linkto xattr pointing to /brick/1
[root@vm1 ~]# ll /tmp/2/tile32
---------T. 2 root root 1000 Apr 12 14:22 /tmp/2/tile32
[root@vm1 ~]# getfattr -m . -de text /tmp/2/tile32
getfattr: Removing leading '/' from absolute path names
security.selinux="unconfined_u:object_r:user_tmp_t:s0"
trusted.gfid="����:Aс�#�/'b2"
trusted.glusterfs.dht.linkto="test1-client-2"
Marking ./tests/features/worm_sh.t as bad test.
Reason being, this patch failed on master branch as well and it has nothing
to do with rebalance/remove-brick.
BUG: 1441508
Change-Id: I90bae251cda3d957a49cdceda90cd08311a392fb
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/17034
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Coverity complain about enum mismatch since we assign GF_OP_CMD_NONE
to a variable holding gf_defrag_type.
Change-Id: I63e71f552b3cc752c26c1b8705420f38908e17e6
BUG: 789278
Signed-off-by: Michael Scherer <misc@redhat.com>
Reviewed-on: https://review.gluster.org/16756
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Tested-by: Michael Scherer <misc@fedoraproject.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
tierd is implemented by separating from rebalance process.
The commands affected:
1) Attach tier will trigger this process instead of old one
2) tier start and tier start force will also trigger this process.
3) volume status [tier] will show tier daemon as a process instead
of task and normal tier status and tier detach status works.
4) tier stop implemented.
5) detach tier implemented separately along with new detach tier
status
6) volume tier volname status will work using the changes.
7) volume set works
This patch has separated the tier translator from the legacy
DHT rebalance code. It now sends the RPCs from the CLI
to glusterd separate to the DHT rebalance code.
The daemon is now a service, similar to the snapshot daemon,
and can be viewed using the volume status command.
The code for the validation and commit phase are the same
as the earlier tier validation code in DHT rebalance.
The “brickop” phase has been changed so that the status
command can use this framework.
The service management framework is now used.
DHT rebalance does not use this framework.
This service framework takes care of :
*) spawning the daemon, killing it and other such processes.
*) volume set options , which are written on the volfile.
*) restart and reconfigure functions. Restart is to restart
the daemon at two points
1)after gluster goes down and comes up.
2) to stop detach tier.
*) reconfigure is used to make immediate volfile changes.
By doing this, we don’t restart the daemon.
it has the code to rewrite the volfile for topological
changes too (which comes into place during add and remove brick).
With this patch the log, pid, and volfile are separated
and put into respective directories.
Change-Id: I3681d0d66894714b55aa02ca2a30ac000362a399
BUG: 1313838
Signed-off-by: hari gowtham <hgowtham@redhat.com>
Reviewed-on: http://review.gluster.org/13365
Smoke: Gluster Build System <jenkins@build.gluster.org>
Tested-by: hari gowtham <hari.gowtham005@gmail.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As with dht, dirs are present on all subvolumes,
renaming them is a compound operation and thus a
partial success + partial failure scenario is
possible, resulting in an inconsistent state.
For purposes of reproduction, such a scenario can
easily be produced by stopping the volume, edit the
volfile of a certain subvolume to get at an
"option read-only on" setting, and then restart
the volume. Thus those operations that are to make change
on the affected subvolume will fail with EROFS.
To handle such scenarios, we introduce an in-memory cache
where we record the return values obtained from the
subvolumes. At the final stage of the dir rename operation
we check if it's a partial success/fail situation. If yes,
then we perform a reverse rename op on those subvolumes
where the operation succeeded.
Change-Id: I3d05f74f53932cb984a918d252a7309c1009a51d
BUG: 1412069
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/15739
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
private
If reconfigure is executed parallely (or concurrently with dht_init),
there are races that can corrupt memory. One such race is modification
of regexes stored in conf (conf->rsync_regex_valid and
conf->extra_regex_valid) through dht_init_regex. With change [1],
reconfigure codepath can get executed parallely (with itself or with
dht_init) and this fix is needed.
Also, a reconfigure can race with any thread doing dht_layout_search,
resulting in dht_layout_search accessing regex freed up by reconfigure
(like in bz 1399134).
[1] http://review.gluster.org/15046
Change-Id: I039422a65374cf0ccbe0073441f0e8c442ebf830
BUG: 1399134
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/15945
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Demote files on priority if hi-watermark has been breached and continue
to demote until the watermark drops below hi-watermark.
Monitor watermark more frequently.
Trigger demotion as soon as hi-watermark is breached.
Add cluster.tier-emergency-demote-query-limit option to limit number
of files returned from the database query for every iteration of
tier_migrate_using_query_file(). If watermark hasn't dropped below
hi-watermark during the first iteration, the next iteration will be
triggered approximately 1 second after tier_demote() returns to the
main tiering loop.
Update changetimerecorder xlator to handle query for emergency demote
mode.
Add tier-ctr-interface.h:
Move tier and ctr interface specific macros and struct definition from
libglusterfs/src/gfdb/gfdb_data_store.h to new header
libglusterfs/src/tier-ctr-interface.h
Change-Id: If56af78c6c81d37529b9b6e65ae606ba5c99a811
BUG: 1366648
Signed-off-by: Milind Changire <mchangir@redhat.com>
Reviewed-on: http://review.gluster.org/15158
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Removed a structure member that was not used
from dht_local_t. Skipped some unnecessary checks
in dht_filter_loc_subvol_key.
Change-Id: I81740b6528e063fb9cf5817e05865ff4d77aa748
BUG: 1378305
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/15542
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: In a distributed-replicate volume attribute
"replica.split-brain-status" value does not display split-brain
condition though directory is in split-brain.
If directory is in split brain on mutiple replica-pairs
it does not show full list of replica pairs.
Solution: Update the dht_aggregate code to aggregate the xattr
value in this specific condition.
Fix: 1) function getChoices returns the choices from split-brain
status string.
2) function add_opt adding the choices to local buffer to
store in dictionary
3) For the key "replica.split-brain-status" function dht_aggregate
call dht_aggregate_split_brain_xattr to prepare the list.
Test: To verify the patch followed below steps
1) Create a distributed replica volume and create mount point
2) Stop heal daemon
3) Touch file and directories on mount point
mkdir test{1..5};touch tmp{1..5}
4) Down brick process on one of the replica set
pkill -9 glusterfsd
5) Change permission of dir on mount point
chmod 755 test{1..5}
6) Restart brick process on node with force option
7) kill brick process on other node in same replica set
8) Change permission of dir again on mount point
chmod 766 test{1..5}
9) Reexecute same step from 4-9 on other replica set also
10) After check heal status on server it will show dir's are
in split brain on all replica sets
11) After check the replica.split-brain-status attr on mount
point it will show wrong status of split brain.
12) After apply the patch the attribute shows correct value.
BUG: 1368312
Change-Id: Icdfd72005a4aa82337c342762775a3d1761bbe4a
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
Reviewed-on: http://review.gluster.org/15201
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add events for:
* tier attach and detach
* tier pause and resume
* tier rising and dropping hi and lo watermarks
Update eventskeygen.py with tiering events.
Update cli help with:
* attach: add optional force argument
* detach: make force available as non-optional argument on its own
Change-Id: I43990d3a8742151a4a7889bafa19cb572fe661bd
BUG: 1368336
Signed-off-by: Milind Changire <mchangir@redhat.com>
Reviewed-on: http://review.gluster.org/15232
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: As metadata in the database fills up, querying the database
take a long time. As a result, tier migration slows down. To
counteract this, we added a way to enable the compaction methods of
the underlying database. The goal is to reduce the size of the
underlying file by eliminating database fragmentation.
NOTE: There is currently a bug where sometimes a brick will
attempt to activate compaction. This happens even compaction is already
turned on.
The cause is narrowed down to the compact_mode_switch flipping its value.
Changes: libglusterfs/src/gfdb - Added a gfdb function to compact the
underlying database, compact_db() This is a no-op if the database has
no such option.
- Added a compaction function for SQLite3 that does the following
1) Changes the auto_vacuum pragma of the database
2) Compacts the database according to the type of compaction requested
- Compaction type can be changed by changing the macro
GF_SQL_COMPACT_DEF to one of the 4 compaction types in
gfdb_sqlite3.h
It is currently set to GF_SQL_COMPACT_INCR, or incremental
vacuuming.
xlators/cluster/dht/src - Added the following command-line option to
enable SQLite3 compaction.
gluster volume set <vol-name> tier-compact <off|on>
- Added the following command-line option to change the frequency the
hot and cold tier are ordered to compact.
gluster volume set <vol-name> tier-hot-compact-frequency <int>
gluster volume set <vol-name> tier-cold-compact-frequency <int>
- tier daemon periodically sends the (new)
GFDB_IPC_CTR_SET_COMPACT_PRAGMA IPC to the CTR xlator. The IPC
triggers compaction of the database.
The inputs are both gf_boolean_t.
IPC Input:
compact_active: Is compaction currently on for the db.
compact_mode_switched: Did we flip the compaction switch recently?
IPC Output:
0 if the compaction succeeds.
Non-zero otherwise.
xlators/features/changetimerecorder/src/ - When the CTR gets the
compaction IPC, it launches a thread that will perform the
compaction. The IPC ends after the thread is launched. To avoid extra
allocations, the parameters are passed using static variables.
Change-Id: I5e1433becb9eeff2afe8dcb4a5798977bf5ba0dd
Signed-off-by: Diogenes Nunez <dnunez@redhat.com>
Reviewed-on: http://review.gluster.org/15031
Reviewed-by: Milind Changire <mchangir@redhat.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ipc is used by md-cache to communicate the list of xattrs that
it is caching, to the upcall xlator. Hence implement this in
dht, such that it winds to all the bricks if the ipc op is
GF_IPC_MDC_TARGET_UPCALL. The ips should not fail if any of
the bricks is down, as md-cache will replay the ipc late when
the brick comes back up.
Change-Id: Ica551a550c04cbb1240c0d211fe831c2e5eb6017
BUG: 1211863
Signed-off-by: Poornima G <pgurusid@redhat.com>
Reviewed-on: http://review.gluster.org/15225
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add test to fail promotion if estimated block consumption grows
beyond hi watermark.
Skip file migrations until next cycle if tier_get_fs_stat() fails
in tier_migrate_using_query_file()
Change-Id: Ice04572fa739c09109c4433e65965197482a7beb
BUG: 1349284
Signed-off-by: Milind Changire <mchangir@redhat.com>
Reviewed-on: http://review.gluster.org/14780
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: mohammed rafi kc <rkavunga@redhat.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Return the correct size of the tiered volume in statfs. It should
be the size of the cold tier, not the sum of the hot and cold tier,
because the hot tier is a cache and not an extension of the volume's
capacity. The number of free blocks, etc is the cold tier's capacity
subtracted by the sum of utilization on the hot and cold tiers. Note
if both tiers are part of the same file system this must be accounted
for as well.
The patch also fixes a pre-existing bug in the DHT/tier
translators. If statfs was taken on a file, the code only calculated
free space on the cached subvolume, not all subvolumes in the replica
group. With the fix, this is corrected, except in the case
where quota is used with the deem-statfs option set to "on".
Change-Id: I2b8bcb4511edf83f12130960aad0a609fcf8f513
BUG: 1339689
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/14536
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
During locking we send lock request to cached subvol,
and normally we unlock to the cached subvol
But with parallel fresh lookup on a directory, there
is a race window where the cached subvol can change
and the unlock can go into a different subvol from
which we took lock.
This will result in a stale lock held on one of the
subvol.
So we will store the details of subvol which we took the lock
and will unlock from the same subvol
Change-Id: I47df99491671b10624eb37d1d17e40bacf0b15eb
BUG: 1311002
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/13492
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With lock-migration, we need to send requests to destination
brick post migration. Once, the source brick marks the lock
structure to be already migrated, the requests will be redirected
to destination brick by dht_lk2/flush2.
Change-Id: I50b14011c5ab68c34826fb7ba7f8c8d42a68ad97
BUG: 1326085
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/13493
Smoke: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I48c6f9cdda47503615ba65882acd5eedf0a70c89
BUG: 1326085
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/14024
Smoke: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: When we spawn promote and demote thread, query files
are build. And only query file with index 0 is picked for migration
as the first query file. This may not be suitable for scenarios,
where the file in the query are too big to move in the first cycle,
as a result file in the other query files always get missed. We need to
shuffle so that other query files also get a chance.
Fix: Remember the previous first query file and shift it by one index,
before the migration starts.
Change-Id: I704947bcf4bab6b20b1179a6d9ae4a15a3d51bd9
BUG: 1330353
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/14068
Tested-by: Joseph Fernandes
Smoke: Gluster Build System <jenkins@build.gluster.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When an ongoing rebalance completion check task been
triggered by dht, there is a possibility of a race
between afr setting subvol as non-readable and dht updates
the cached subvol. In this window a write can fail with EIO.
Change-Id: I42638e6d4104c0dbe893d1bc73e1366188458c5d
BUG: 1329503
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/14049
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I0bbc2c2ef115c78393f6570815a5b80316e7e4be
BUG: 1319992
Signed-off-by: Poornima G <pgurusid@redhat.com>
Reviewed-on: http://review.gluster.org/11720
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
dht_mkdir ()
{
first-hashed-subvol = hashed-subvol for "bname" in in-memory
layout of "parent";
inodelk (SETLKW, parent, "LAYOUT_HEAL_DOMAIN", "can be any
subvol, but we choose first-hashed-subvol randomly");
{
begin:
hashed-subvol = hashed-subvol for "bname" in in-memory
layout of "parent";
hash-range = extract hashe-range from layout of "parent";
ret = mkdir (parent/bname, hashed-subvol, hash-range);
if (ret == "hash-value doesn't fall into layout stored on
the brick (this error is returned by posix-mkdir)")
{
refresh_parent_layout ();
goto begin;
}
}
inodelk (UNLCK, parent, "LAYOUT_HEAL_DOMAIN",
"first-hashed-subvol");
proceed with other parts of dht_mkdir;
}
posix_mkdir (parent/bname, client-hash-range)
{
disk-hash-range = getxattr (parent, "dht-layout-key");
if (disk-hash-range != client-hash-range) {
fail-with-error ("hash-value doesn't fall into layout
stored on the brick");
return 0;
}
continue-with-posix-mkdir;
}
Similar changes need to be done for dentry operations like create,
symlink, link, unlink, rmdir, rename. These will be addressed in
subsequent patches. This patch addresses only mkdir codepath.
This change breaks stripe tests, as on some striped subvols dht layout
xattrs are not set for some reason. This results in failure of
mkdir. Since striped volumes are always created with dht, some tests
associated with stripe also fail. So, I am making following tests
changes (since stripe is out of maintainance):
* modify ./tests/basic/rpc-coverage.t to not to use striped volumes
* mark all (2) tests in tests/bugs/stripe/ as bad tests
Change-Id: Idd1ae879f24a48303dc743c1bb4d91f89a629e25
BUG: 1323040
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/13885
Smoke: Gluster Build System <jenkins@build.gluster.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is a possibility that while an rmdir is completed on
some non-hashed subvol and proceeding to others, a lookup
selfheal can recreate the same directory on those subvols
for which the rmdir had succeeded. Now the deletion of the
parent directory will fail with an ENOTEMPTY.
To fix this take blocking inodelk on the subvols before
starting rmdir. Selfheal must also take blocking inodelk
before creating the entry.
Change-Id: I168a195c35ac1230ba7124d3b0ca157755b3df96
BUG: 1245065
Signed-off-by: Sakshi <sabansal@redhat.com>
Reviewed-on: http://review.gluster.org/13528
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. Spawn a thread for background fix-layout for tier process.
2. Once the fix-layout is completed a marker xttr is set on the root of
volume to mark the completion of the background fixlayout, so that
even if the tier process is spawned again, fixlayout will not be
issued, if it was completed last time.
3. Please note that promotion of legacy files will happen eventually as
the ctr lookup heal in the fixlayout slowly heals the ctr db for legacy
files OR the ctr lookup heal happend due to a name lookup.
4. When a detach tier is successful in evacuation data from hot tier, we remove
the marker xattr is removed. So that next attach tier runs the background
tier fixlayout.
what is remaining ?
1. Instead of clearing the marker xattr of tiering fix layout at the end of detach start
clear it during detach commit. But the issue is detach commit is a glusterd operation
and the volume is not mounted in glusterd.
The reason we want to do it in detach commit is that if the admin wants to attach the
same tier again, then a background fixlayout will be triggered, which would not be needed.
2. Clearing the CTR DB of the cold bricks when there is a detach commit, as it will be having
entries which will be stale when the volume is used, with ctr off (ctr is switched off only when
we have detach commit.)
Change-Id: Ibe343572e95865325cd0eef4d0b976b626a3c0c5
BUG: 1313228
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/13491
Smoke: Gluster Build System <jenkins@build.gluster.com>
Tested-by: Joseph Fernandes
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Directory size is meaningless. Every filesystem has its own
unpredictable way of increasing or decreasing it, based on internal data
structures and even transient conditions. Some filesystems (e.g. ext4)
never decrease it at all. Others (e.g. btrfs) don't even report it.
Very few programs look at it, and those that do are broken.
Unfortunately, one such program is GNU tar, which will complain when it
sees different values because at different times we got the value from
different DHT subvolumes. To avoid such problems, just report a
constant value.
Change-Id: Id64ce917c75b5f7ff50cb55b6e997f3b3556e7e3
BUG: 1302948
Original-author: Shyam <srangana@redhat.com>
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/13770
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fix adds a paramater "tier-max_promote_size" to control wether
a file is migrated or not based on its size. By default the value
is 0, meaning all files are migrated. If set to a non-zero
value, files larger than the parameter won't be moved
in tiered volumes.
Change-Id: Ia6b88e9b2508935bef500d956f9192e59670fe00
BUG: 1313495
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/13570
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Joseph Fernandes
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
cli/src/cli-cmd-parser.c (chenk)
cli/src/cli-xml-output.c (spandit)
cli/src/cli.c (chenk)
libglusterfs/src/common-utils.c (vmallika)
libglusterfs/src/gfdb/gfdb_sqlite3.c (jfernand +1)
rpc/rpc-transport/socket/src/socket.c (?)
xlators/cluster/afr/src/afr-transaction.c (?)
xlators/cluster/dht/src/dht-common.h (srangana +2)
xlators/cluster/dht/src/dht-selfheal.c (srangana +2)
xlators/debug/io-stats/src/io-stats.c (R. Wareing)
xlators/features/barrier/src/barrier.c (vshastry)
xlators/features/bit-rot/src/bitd/bit-rot-scrub.h (vshankar +1)
xlators/features/shard/src/shard.c (kdhananj +1)
xlators/mgmt/glusterd/src/glusterd-ganesha.c (skoduri)
xlators/mgmt/glusterd/src/glusterd-handler.c (atinmu)
xlators/mgmt/glusterd/src/glusterd-op-sm.h (atinmu)
xlators/mgmt/glusterd/src/glusterd-snapshot.c (spandit)
xlators/mgmt/glusterd/src/glusterd-syncop.c (atinmu)
xlators/mgmt/glusterd/src/glusterd-volgen.c (atinmu)
xlators/protocol/client/src/client-messages.h (mselvaga +1)
xlators/storage/bd/src/bd-helper.c (M. Mohan Kumar)
xlators/storage/bd/src/bd.c (M. Mohan Kumar)
xlators/storage/posix/src/posix.c (nbalacha +1)
Change-Id: I85934fbcaf485932136ef3acd206f6ebecde61dd
BUG: 1293133
Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/13031
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A query to the database may take a long time if the database
has many entries. The tier daemon also sends IPC calls to the
bricks which can run slowly, espcially in RHEL6. While it is
possible to track down each such instance, the snapshot
feature should not be affected by database operations. It requires
no migration be underway. Therefore it is okay to pause tiering
at any time except when DHT is moving a file. This fix implements
this strategy by monitoring when control passes to DHT to
migrate a file using the GF_XATTR_FILE_MIGRATE_KEY trigger. If it
is not, the pause operation is successful.
Change-Id: I21f168b1bd424077ad5f38cf82f794060a1fabf6
BUG: 1287842
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/13104
Reviewed-by: Joseph Fernandes
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
What:
If dht_open is called on a migrating file after the inode_ctx is set,
subsequent FOPs on that fd do not open the fd on the dst subvol.
This is seen when the open-ftruncate-close sequence is repeatedly
called on a migrating file.
A second call to the sequence described above causes dht_truncate_cbk
to call dht_truncate2 as the dht_inode_ctx was already set by the first
call. As dht_rebalance_in_progress_check is not called, the fd is not
opened on the dst subvol.
On a distributed-replicate volume, this causes AFR to
open the fd using afr_fix_open, but with the wrong flags, causing
posix_ftruncate to fail with EINVAL.
The fix: We require fd specific information to make a decision while
handling migrating files.
Set the fd_ctx to indicate the fd has been opened on the dst subvol
and check if it has been set while processing Phase1/Phase2 checks
in the FOP callback functions.
Change-Id: I43cdcd8017b4a11e18afdd210469de7cd9a5ef14
BUG: 1284823
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/12985
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We had run sleep() in the pause tier callback. Blocking within
a synctask is dangerous. The sleep() call does not inform
the synctask scheduler that a thread is no longer running.
It therefore believes it is running. If a second synctask already
exists, it may not be able to run. This occurs if the thread
limit in the pool has been reached.
Note the pool size only grows when a synctask is created, not
when it is moved from wait state to run state, as is the case
when an FOP completes. When the tier is paused during migration,
synctasks already exist waiting for responses to FOPs to the
server with high probability.
The fix is to yield() in the RPC callback, which will place
the synctask into the wait queue and free up a thread for the
FOP callback. A timer wakes the callback after sufficient
time has elapsed for the pause to occur.
Change-Id: I6a947ee04c6e5649946cb6d8207ba17263a67fc6
BUG: 1267950
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/12987
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
files deleted during promotion were not deleting as the
files are moving from hashed to non-hashed.
On deleting a file that is undergoing promotion,
the unlink call is not sent to the dst file as the
hashed subvol == cached subvol. This causes
the file to reappear once the migration is complete.
This patch also fixes a problem with stale linkfile
deleting.
Change-Id: I4b02a498218c9d8eeaa4556fa4219e91e7fa71e5
BUG: 1282390
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/12829
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
glusterd occasionally loads shared libraries of translators. This
failed for tiering due to a reference to dht_methods which is defined
as a global variable which is not necessary.
The global variable has been removed and this is now a member of
dht_conf and is now initialised in the *_init calls.
Change-Id: Ifa0a21e3962b5cd8d9b927ef1d087d3b25312953
BUG: 1287842
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/12863
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After a successful nameless lookup if the directory is not
present on any of the subvol, then we will get the path of
the directory and will recursively send a named lookp on
each parent directory.
This will help particularly for the scenarios like add brick
and attach-tier.
Change-Id: I64c2118a5ab03bbaa59b0dfc62babdf4472a92a3
BUG: 1272949
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/12376
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Snaps of tiered volumes cannot handle files undergoing migration.
We implement a helper mechanism to "pause" migration. Any files
undergoing migration are aborted. Clean up is done to remove
sticky bits and data at the destination. Migration is restarted
after snap completes.
For testing an internal switch is added. It is not exposed externally.
gluster volume set vol1 tier-pause [true|false]
Change-Id: Ia85bbf89ac142e9b7e73fcbef98bb9da86097799
BUG: 1267950
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/12304
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fix introduces infrastructure to support different
policies for promotion and demotion.
Currently the tier feature automatically promotes and demotes
files periodically based on access. This is good for testing
but too stringent for most real workloads. It makes it
difficult to fully utilize a hot tier- data will be demoted
before it is touched- its unlikely a 100GB hot SSD will have
all its data touched in a window of time.
A new parameter "mode" allows the user to pick promotion/demotion
polcies.
The "test mode" will be used for *.t and other general testing.
This is the current mechanism.
The "cache mode" introduces watermarks. The watermarks
represent levels of data residing on the hot tier.
"cache mode" policy:
The % the hot tier is full is called P.
Do not promote or demote more than D MB or F files.
A random number [0-100] is called R.
Rules for migration:
if (P < watermark_low) don't demote, always promote.
if (P >= watermark_low) && (P < watermark_hi) demote if R < P; promote if R > P.
if (P > watermark_hi) always demote, don't promote.
gluster volume set {vol} cluster.watermark-hi %
gluster volume set {vol} cluster.watermark-low %
gluster volume set {vol} cluster.tier-max-mb {D}
gluster volume set {vol} cluster.tier-max-files {F}
gluster volume set {vol} cluster.tier-mode {test|cache}
Change-Id: I157f19667ec95aa1d53406041c1e3b073be127c2
BUG: 1257911
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/12039
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Determine which DHT level is responsible for
handling fops on a file undergoing migration based
on the name of the the linkto xattr set on the file
being migrated and process accordingly.
Change-Id: I82772e39314d4fe7f2ba0dcf22de0c6a374ee139
BUG: 1254428
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Signed-off-by: Nithya Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/12090
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
lookup selfheal race
Locking on all subvols before an rmdir is unable to remove all
directory entries. Hence reverting the patch for now.
Change-Id: I31baf2b2fa2f62c57429cd44f3f229c35eff1939
BUG: 1245065
Signed-off-by: Sakshi <sabansal@redhat.com>
Reviewed-on: http://review.gluster.org/12125
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is a possibility that while an rmdir is completed on
some non-hashed subvol and proceeding to others. A lookup
selfheal can recreate the same directory on those subvols
for which the rmdir had succeeded. The fix is to take a
blocking inodelk on the subvols before starting rmdir.
Since selfheal requires lock on all subvols, if an rmdir
is in progess acquiring locks will fail and vice versa.
Change-Id: I841a44758c3b88f5e04d1cb73ad36e0cac9fdabb
BUG: 1245065
Signed-off-by: Sakshi <sabansal@redhat.com>
Reviewed-on: http://review.gluster.org/11725
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|