| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
DHT fd based fops used to check if the fd was open
on the cached subvol before winding the call. However,
this introduced a performance regression of about
30% for reads.
This check was introduced to handle cases where files
were migrated while IOs were happening. As this is not
the common case, dht will now check if the fd is
open on the cached subvol only if the call fails
with EBADF.
This will prevent a performance hit where a rebalance
is not running.
> BUG: 1476665
> Signed-off-by: N Balachandran <nbalacha@redhat.com>
> Reviewed-on: https://review.gluster.org/17976
> Smoke: Gluster Build System <jenkins@build.gluster.org>
> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
> Reviewed-by: Amar Tumballi <amarts@redhat.com>
> Reviewed-by: Susant Palai <spalai@redhat.com>
> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
(cherry picked from commit cdca1cb26a0aba390c6d8485c0d6d95e22ffc8bd)
Change-Id: I2035a858d63c3fcd22bb634055bbb0ad01686808
BUG: 1479303
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17995
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
> Change-Id: Id91ef35f890055cd42b9a94462f92297c77f1fff
> Bug: 1475282
> Signed-off-by: Susant Palai <spalai@redhat.com>
> Reviewed-on: https://review.gluster.org/17868
> Tested-by: Raghavendra G <rgowdapp@redhat.com>
> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
> Smoke: Gluster Build System <jenkins@build.gluster.org>
> Signed-off-by: Susant Palai <spalai@redhat.com>
Change-Id: Id91ef35f890055cd42b9a94462f92297c77f1fff
Bug: 1477152
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/17943
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To calculate available space on a subvolume we used to do
the following in __dht_check_free_space.
post_availspace = (dst_statfs.f_bavail * dst_statfs.f_frsize) - stbuf->ia_size
Now to subtracting the file size from available space is tricky here.
Sometime available space will be lesser than the file size and since all the
participating members in calculation are unsigned int, the result is a large
number (integer overflow).
Solution: We do not need to subtract the file size from the space available,
since fallocate would have reserved file size space already.
> Change-Id: I4f724358c44b9911933742ff3ff8d55b3dfda1cb
> BUG: 1475282
> Signed-off-by: Susant Palai <spalai@redhat.com>
> Reviewed-on: https://review.gluster.org/17876
> Smoke: Gluster Build System <jenkins@build.gluster.org>
> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
> Reviewed-by: N Balachandran <nbalacha@redhat.com>
> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
> Signed-off-by: Susant Palai <spalai@redhat.com>
Change-Id: I4f724358c44b9911933742ff3ff8d55b3dfda1cb
BUG: 1477152
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/17942
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Corrected the iterator for looping over the list of
decommissioned bricks while checking if the new target
determined because of min-free-disk values has been
decommissioned.
> BUG: 1474318
> Signed-off-by: N Balachandran <nbalacha@redhat.com>
> Reviewed-on: https://review.gluster.org/17861
> Reviewed-by: Susant Palai <spalai@redhat.com>
(cherry picked from commit 8c3e766fe0a473734e8eca0f70d0318a2b909e2e)
Change-Id: Iee778547eb7370a8069e954b5d629fcedf54e59b
BUG: 1475181
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17872
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The size of non-migrated files was not added to the
size_processed causing incorrect rebalance estimate
calculations. This has been fixed.
> BUG: 1467209
> Signed-off-by: N Balachandran <nbalacha@redhat.com>
> Reviewed-on: https://review.gluster.org/17867
> Reviewed-by: Amar Tumballi <amarts@redhat.com>
> Smoke: Gluster Build System <jenkins@build.gluster.org>
> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
(cherry picked from commit 24ab0ef44a1646223b59e33d0109d8424f8eddd0)
Change-Id: I9f338c44da22b856e9fdc6dc558f732ae9a22f15
BUG: 1475192
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17873
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The calculation of the rebalance estimates will start
after the rebalance operation has been running for 10
minutes. This patch also changes the cli rebalance status
code to use unsigned variables for the time calculations.
> BUG: 1457985
> Signed-off-by: N Balachandran <nbalacha@redhat.com>
> Reviewed-on: https://review.gluster.org/17863
> Reviewed-by: Amar Tumballi <amarts@redhat.com>
> Smoke: Gluster Build System <jenkins@build.gluster.org>
> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
(cherry picked from commit e21c915679244ddc1fae886e52badf02b4d95efc)
Change-Id: Ic76f517c59ad938a407f1cf5e3b9add571690a6c
BUG: 1475399
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17882
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Every time all the thread sleeps or wakes up, we log a message
about that event. Sometime this can be noisy where the number of
files eligible to be migrated are placed far away from each other.
Moving the logs to DEBUG.
> Change-Id: I4dc2cc9fdf4f42d4001754532a5bc4aeb3f0f959
> BUG: 1474639
> Signed-off-by: Susant Palai <spalai@redhat.com>
> Reviewed-on: https://review.gluster.org/17866
> Smoke: Gluster Build System <jenkins@build.gluster.org>
> Reviewed-by: Amar Tumballi <amarts@redhat.com>
> Reviewed-by: N Balachandran <nbalacha@redhat.com>
> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
> Signed-off-by: Susant Palai <spalai@redhat.com>
Change-Id: I4dc2cc9fdf4f42d4001754532a5bc4aeb3f0f959
BUG: 1475662
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/17893
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The local->call_cnt was being accessed and updated inside
the loop where the entries were being processed and the calls
were being wound.
This could end up in a scenario where the local->call_cnt became
0 before the processing was complete causing the crash when the
next entry was being processed.
Change-Id: I930f61f1a1d1948f90d4e58e80b7d6680cf27f2f
BUG: 1472949
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17825
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Set names to threads on creation for easier
debugging.
Output of top -H -p <PID-OF-GLUSTERFSD>
Before:
19773 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19774 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19775 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19776 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19777 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19778 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19779 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19780 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19781 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19782 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19783 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19784 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19785 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd
19786 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd
19787 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd
19789 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19790 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
25178 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
5398 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
7881 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
After:
19773 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19774 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustertimer
19775 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19776 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustermemsweep
19777 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustersproc0
19778 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustersproc1
19779 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll0
19780 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusteridxwrker
19781 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusteriotwr0
19782 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterbrssign
19783 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterbrswrker
19784 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterclogecon
19785 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd0
19786 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd1
19787 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd2
19789 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixjan
19790 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixfsy
25178 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll1
5398 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll2
7881 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixhc
Change-Id: Id5f333755c1ba168a2ffaa4fce6e71c375e10703
BUG: 1254002
Updates: #271
Signed-off-by: Raghavendra Talur <rtalur@redhat.com>
Reviewed-on: https://review.gluster.org/11926
Reviewed-by: Prashanth Pai <ppai@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A brief about how hardlink migration works:
- Different hardlinks (to the same file) may hash to different bricks,
but their cached subvol will be same. Rebalance picks up the first hardlink,
calculates it's hash(call it TARGET) and set the hashed subvolume as an xattr
on the data file.
- Now all the hardlinks those come after this will fetch that xattr and will
create linkto files on TARGET (all linkto files for the hardlinks will be hardlink
to each other on TARGET).
- When number of hardlinks on source is equal to the number of hardlinks on
TARGET, the data migration will happen.
RACE:1
Since rebalance is multi-threaded, the first lookup (which decides where the TARGET
subvol should be), can be called by two hardlink migration parallely and they may end
up creating linkto files on two different TARGET subvols. Hence, hardlinks won't be
migrated.
Fix: Rely on the xattr response of lookup inside gf_defrag_handle_hardlink since it
is executed under synclock.
RACE:2
The linkto files on TARGET can be created by other clients also if they are doing
lookup on the hardlinks. Consider a scenario where you have 100 hardlinks. When
rebalance is migrating 99th hardlink, as a result of continuous lookups from other
client, linkcount on TARGET is equal to source linkcount. Rebalance will migrate data
on the 99th hardlink itself. On 100th hardlink migration, hardlink will have TARGET as
cached subvolume. If it's hash is also the same, then a migration will be triggered from
TARGET to TARGET leading to data loss.
Fix: Make sure before the final data migration, source is not same as destination.
RACE:3
Since a hardlink can be migrating to a non-hashed subvolume, a lookup from other
client or even the rebalance it self, might delete the linkto file on TARGET leading
to hardlinks never getting migrated.
This will be addressed in a different patch in future.
Change-Id: If0f6852f0e662384ee3875a2ac9d19ac4a6cea98
BUG: 1469964
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/17755
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the target of a file migration was changed because
of min-free-disk limits, the dst_fd was closed but the
clean_dst flag was not set to false. If the file could
not be created on the new target for some reason, the
ftruncate call to clean up the dst was sent on the now
invalid fd causing the process to deadlock.
Change-Id: I5bfa80f519b04567413d84229cf62d143c6e2f04
BUG: 1469029
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17735
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is a another race between the cached subvol
being updated in the inode_ctx and the fd being opened on
the target.
1. fop1 -> fd1 -> subvol0
2. file migrated from subvol0 to subvol1 and cached_subvol
changed to subvol1 in inode_ctx
3. fop2 -> fd1 -> subvol1 [takes new cached subvol]
4. fop2 -> checks fd ctx (fd not open on subvol1) -> opens fd1 on subvol1
5. fop1 -> checks fd ctx (fd not open on subvol0)
-> tries to open fd1 on subvol0 -> fails with "No such file on directory".
Fix:
If dht_fd_open_on_dst fails with ENOENT or ESTALE, wind to old subvol
and let the phase1/phase2 checks handle it.
Change-Id: I34f8011574a8b72e3bcfe03b0cc4f024b352f225
BUG: 1465075
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17731
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The earlier approach of using the number of files
to determine when the rebalance would complete did
not work well when file sizes differed widely.
The new approach now gets the total data size and
uses that information to determine how long
the rebalance is expected to take.
Change-Id: I84e80a0893efab72ff06130e4596fa71c9c8c868
BUG: 1467209
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17668
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: MOHIT AGRAWAL <moagrawa@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Plus minor readability improvements.
Reported-by: pmatthaei@debian.org
Change-Id: I5393819a2fc9f240a19811143bb57b127df717cf
BUG: 1466785
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: https://review.gluster.org/17660
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use a local variable to store the call count
in the STACK_WIND for loop. Using frame->local
is dangerous as it could be freed while the loop
is still being processed
Change-Id: Ie65cdcfb7868509b4a83bc2a5b5d6304eabfbc8e
BUG: 1466110
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17645
Smoke: Gluster Build System <jenkins@build.gluster.org>
Tested-by: Nigel Babu <nigelb@redhat.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If an fd is opened on a file, the file is migrated
and the cached subvol is updated in the inode_ctx
before an fd based fop is sent, the fop is sent to
the dst subvol on which the fd is not opened.
This causes the FOP to fail with EBADF.
Now, every fd based fop will check to see that the fd
has been opened on the dst subvol before winding it down.
Change-Id: Id92ef5eb7a5b5226688e2d2868b15e383f5f240e
BUG: 1465075
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17630
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-by: Susant Palai <spalai@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Afr has introduced a new key GF_XATTR_LIST_NODE_UUIDS_KEY,
through which rebalance will figure out its local subvolumes.(Reference
bugid=1463250)
key: GF_XATTR_NODE_UUID_KEY will continue to serve it's old
purpose of returning the first afr chiild.
test: prove tests/basic/distribute/rebal-all-nodes-migrate.t
Change-Id: I4d602feda2a05b29d2210c712a07a4ac6b8bc112
BUG: 1463648
Signed-off-by: Susant Palai <spalai@redhat.com>
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17595
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The rebalance used to get the file count in the beginning
and not update it. This caused estimates to fail
if the number changed during the rebalance.
The rebalance now updates the file count periodically.
Change-Id: I1667ee69e8a1d7d6bc6bc2f060fad7f989d19ed4
BUG: 1464110
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17607
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a similar issue fixed for rename in https://review.gluster.org/#/c/16016/
For hardlinks, if cached and hashed subvolumes are different, then it will first
create linkto file in hashed using root permission, but actually hardlink creation
fails with EACESS and stale linkto file is never removed.All the followup hardlink
calls with file name will result ESTALE because linktofile creation fails with EEXIST
and follow up lookup on linkto file returns gfid-mismatching(old linkto file) and
finally fails with ESTALE
Steps to produce :
(From link/00.t test from posix-testsuite)
Steps executed in script
* create a file "abc" using root
* change the ownership of file to a non root user
* create hardlink "link" for "abc" using a non root user, it fails with EACESS
* delete "abc"
* create directory "abc" using root
* again try to create hadrlink "link" for "abc" using non root user, fails with ESTALE
Also tried to fix other bugs in dht_linkfile_create_cbk() and posix_lookup.
Thanks Susant for the help in debugging the issue and suggestion for this patch.
Change-Id: I7a5a1899d3fd1fdb13578b37f9d52a084492e35d
BUG: 1452084
Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com>
Reviewed-on: https://review.gluster.org/17331
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are many calls in gf_msg where errno
is needed to pass as an argument instead of
strerrno(error)
Change-Id: I15048a5e0b41f9752a2023afe8470eca6f2cd383
Bug: 1454701
Signed-off-by: AnkitRaj <anraj@redhat.com>
Reviewed-on: https://review.gluster.org/17464
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: MOHIT AGRAWAL <moagrawa@redhat.com>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Tested-by: N Balachandran <nbalacha@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Karthik U S <ksubrahm@redhat.com>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The rebalance estimates calculation was not handling
calculations correctly when no files had been processed,
i.e., when rate_lookedup was 0.
Now, the estimated time is set to 0 in such scenarios as
there is no way for rebalance to figure out how long the
process will take to complete without knowing the rate at
which the files are being processed.
Change-Id: I7b6378e297e1ba139852bcb2239adf2477336b5b
BUG: 1457985
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17564
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Updates #234
Change-Id: I016f6d4f1e5ad2ea56a611c1bffbd189f10650db
Signed-off-by: Amar Tumballi <amarts@redhat.com>
Reviewed-on: https://review.gluster.org/17525
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: hari gowtham <hari.gowtham005@gmail.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Susant Palai <spalai@redhat.com>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Empty directories were not being considered while
calculating rebalance estimates leading to negative
time-left values being displayed as part of the
rebalance status.
Change-Id: I48d41d702e72db30af10e6b87b628baa605afa98
BUG: 1457985
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17448
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
fixes for various minor spelling errors and typos
Reported-by: Patrick Matthäi <pmatthaei@debian.org>
Change-Id: Ic1be36f82e3d822bbdc9559878bd79520fc0fcd5
BUG: 1457808
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: https://review.gluster.org/17442
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
dht_readdirp must unwind with list of entries only after
the entire buffer requested by kernel is filled to avoid
extra syscalls occuring when returning partially filled
buffer. Also wind readdir call to next subvol on reaching
EOD for directory on that subvol to avoid extra network call.
Change-Id: If2e1a2722f813d95457c7542bff25fef56c7a041
BUG: 1356453
Signed-off-by: Sakshi <sabansal@redhat.com>
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: https://review.gluster.org/12271
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Susant Palai <spalai@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On demand migration of files i.e. migration done by clients
triggered by a setfattr was broken.
Dependency on defrag led to crash when migration was triggered from
client.
Note: This functionality is not available for tiered volumes. Migration
from tier served client will fail with ENOTSUP.
usage (But refer to the steps mentioned below to avoid any issues) :
setfattr -n "trusted.distribute.migrate-data" -v "1" <filename>
The purpose of fixing the on-demand client migration was to give a
workaround where the user has lots of empty directories compared to
files and want to do a remove-brick process.
Here are the steps to trigger file migration for remove-brick process from
client. (This is highly recommended to follow below steps as is)
Let's say it is a replica volume and user want to remove a replica pair
named brick1 and brick2. (Make sure healing is completed before you run
these steps)
Step-1: Start remove-brick process
- gluster v remove-brick <volname> brick1 brick2 start
Step-2: Kill the rebalance daemon
- ps aux | grep glusterfs | grep rebalance\/ | awk '{print $2}' | xargs kill
Step-3: Do a fresh mount as mentioned here
- glusterfs -s ${localhostname} --volfile-id rebalance/$volume-name /tmp/mount/point
Step-4: Go to one of the bricks (among brick1 and brick2)
- cd <brick1 path>
Step-5: Run the following command.
- find . -not \( -path ./.glusterfs -prune \) -type f -not -perm 01000 -exec bash -c 'setfattr -n "distribute.fix.layout" -v "1" ${mountpoint}/$(dirname '{}')' \; -exec setfattr -n "trusted.distribute.migrate-data" -v "1" ${mountpoint}/'{}' \;
This command will ignore the linkto files and empty directories. Do a fix-layout of
the parent directory. And trigger a migration operation on the files.
Step-6: Once this process is completed do "remove-brick force"
- gluster v remove-brick <volname> brick1 brick2 force
Note: Use the above script only when there are large number of empty directories.
Since the script does a crawl on the brick side directly and avoids directories those
are empty, the time spent on fixing layout on those directories are eliminated(even if the script
does not do fix-layout on empty directories, post remove-brick a fresh layout will be built
for the directory, hence not affecting application continuity).
Detailing the expectation for hardlink migartion with this patch:
Hardlink is migrated only for remove-brick process. It is highly essential
to have a new mount(step-3) for the hardlink migration to happen. Why?:
setfattr operation is an inode based operation. Since, we are doing setfattr from
fuse mount here, inode_path will try to build path from the linked dentries to the inode.
For a file without hardlinks the path construction will be correct. But for hardlinks,
the inode will have multiple dentries linked.
Without fresh mount, inode_path will always get the most recently linked dentry.
e.g. if there are three hardlinks named dir1/link1, dir2/link2, dir3/link3, on a client
where these hardlinks are looked up, inode_path will always return the path dir3/link3
if dir3/link3 was looked up most recently. Hence, we won't be able to create linkto
files for all other hardlinks on destination (read gf_defrag_handle_hardlink for more details
on hardlink migration).
With a fresh mount, the lookup and setfattr become serialized. e.g. link2 won't be
looked up until link1 is looked up and migrated. Hence, inode_path will always have the correct
path, in this case link1 dentry is picked up(as this is the most recently looked up inode) and
the path is built right.
Note: If you run the above script on an existing mount(all entries looked up), hard links may
not be migrated, but there should not be any other issue. Please raise a bug, if you find any
issue.
Tests: Manual
Change-Id: I9854cdd4955d9e24494f348fb29ba856ea7ac50a
BUG: 1450975
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/17115
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Self heal directory code path doesn't always
have local->hashed_subvol populated. Populating
the same which otherwise would fail the self
heal.
Change-Id: I03b64709fd7a68e28f9e7438243e817c53c6ef5d
BUG: 1455104
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Reviewed-on: https://review.gluster.org/17381
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use a local variable to store the call cnt used in the
for loop for the STACK_WIND so as not to access local
which may be freed by STACK_UNWIND after all fops return.
Change-Id: I24f49b6dbd29a2b706e388e2f6d5196c0f80afc5
BUG: 1452102
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17343
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Normal value were different in dht_init and dht_reconfigure.
Initialization/reconfigure of throttle option are carved out to a separate function
(dht_configure_throttle) now. Normal value will be "2".
Change-Id: Ie323eae019af41d6bef0a136e3d284dc82bab9a1
BUG: 1451162
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/17303
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Zhou Zhengping <johnzzpcrystal@gmail.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Rebalance compares the node-uuid of a file against its own
to and migrates a file only if they match. However, the
current behaviour in both AFR and EC is to return
the node-uuid of the first brick in a replica set for all
files. This means a single node ends up migrating all
the files if the first brick of every replica set is on the
same node.
Fix:
AFR and EC will return all node-uuids for the replica set.
The rebalance process will divide the files to be migrated
among all the nodes by hashing the gfid of the file and
using that value to select a node to perform the migration.
This patch makes the required DHT and tiering changes.
Some tests in rebal-all-nodes-migrate.t will need to be
uncommented once the AFR and EC changes are merged.
Change-Id: I5ce41600f5ba0e244ddfd986e2ba8fa23329ff0c
BUG: 1366817
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17239
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using local->call_cnt to check STACK_WINDs can
cause dht_rmdir_do to be called erroneously if
dht_rmdir_readdirp_cbk unwinds before we check if
local->call_cnt is zero in dht_rmdir_opendir_cbk.
This can cause frame corruptions and crashes.
Thanks to Shyam (srangana@redhat.com) for the
analysis.
Change-Id: I5362cf78f97f21b3fade0b9e94d492002a8d4a11
BUG: 1451083
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17305
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
tier_readdirp_cbk updates the cached subvol to
the hot tier if it finds a linkto file.
However, if no lookup has been sent to the hot tier,
lower layers will not have updated the inode-ctx causing
later fops to fail.
Change-Id: Ib8a5e58a6e7fd7750cf6a0ea85da611aa24c7512
BUG: 1402406
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/16163
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Dan Lambright <dlambrig@gmail.com>
Tested-by: Dan Lambright <dlambrig@gmail.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixed an incorrect return code check in the rebalance
code.
Change-Id: I60804ff121cec7a2f0419e2ee70dd22ea7533c0c
BUG: 1448640
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17197
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Coverty rightfully note that if we verify that
A =! C or A != B, it will always be true.
In one case, that prevent healing from continuing. In the other,
that trigger useless logs. Fixing this bug also show that
ENOSPC shouldn't abort the rebalance operation, as seen during
the review of the first patch on
https://review.gluster.org/#/c/16676/1/xlators/cluster/dht/src/dht-rebalance.c
Change-Id: I93c4df43b880b211da202a7e49cef6b1ce7ab68f
BUG: 1424817
Signed-off-by: Michael Scherer <misc@redhat.com>
Reviewed-on: https://review.gluster.org/16676
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
Tested-by: Jeff Darcy <jeff@pl.atyp.us>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Current rebalance throttle options: lazy/normal/aggressive may not always be
sufficient for the purpose of throttling. In our recent test, we observed for
certain setups, normal and aggressive modes behaved similarly consuming full
disk bandwidth. So in cases like this admin should be able to tune it
down(or vice versa) depending on the need.
Along with old throttle configurations, thread counts are tuned based on number.
e.g. gluster v set vol-name cluster-rebal.throttle 5.
Admin can tune up/down between 0 and the number of cores available.
Note: For heterogenous servers, validation will fail on the old server if "number"
is given for throttle configuration.
The message looks something like this:
"volume set: failed: Staging failed on vm2. Error: cluster.rebal-throttle should be {lazy|normal|aggressive}"
Test: Manual test by logging active thread number after reconfiguring throttle option.
testcase: tests/basic/distribute/throttle-rebal.t
Change-Id: I46e3cde546900307831028b344ecf601fd9b02c3
BUG: 1438370
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/16980
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Inside rename, a lookup is done on the source name to make sure that
the file is there. But we used to do a gfid based lookup and hence,
even if the source name was renamed to a new name from some other client,
lookup will be successful as server3_3_lookup will fetch the new path
based on the gfid.
So even if the source file does not exist any more rename will carry on,
and as server3_3_link(destination is hashed to a different brick other
than source cached scenario) also does gfid based resolve, it wont
detect that the source name does not exist and hardlink creation will be
successful (since gfid based resolve will get the new dentry).
To solve this problem, do a name based lookup inside rename. So that
rename will fail right away if the source does not exist.
Change-Id: Ieba8bdd6675088dbf18de90ed4622df043d163bd
BUG: 1412135
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/16375
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: Throttle settings "normal" and "aggressive" for rebalance
did not have performance difference.
normal mode spawns $(no. of cores - 4)/2 threads and aggressive
spawns $(no. of cores - 4) threads. Though aggressive mode has twice
the number of threads compared to that of normal mode, there was no
performance gain when switched to aggressive mode from normal mode.
RCA:
During the course of debugging the above problem, we tried assigning
migration job to migration threads spawned by rebalance, rather than
synctasks(as there is more overhead associated to manage the task
queue and threads). This gave us a significant improvement over rebalance
under synctasks. This patch does not really gurantee that there will be a
clear performance difference between normal and aggressive mode, but this
patch certainly maximized the disk utilization for 1GBfiles run.
Results:
Test enviroment:
Gluster Config:
Number of Bricks: 2 (one brick per disk(RAID-6 12 disk))
Bricks:
Brick1: server1:/brick/test1/1
Brick2: server2:/brick/test1/1
Options Reconfigured:
performance.readdir-ahead: on
server.event-threads: 4
client.event-threads: 4
1000 files with 1GB each were created/renamed such that all files will have
server1 as cached and server2 as hashed, so that all files will be migrated.
Test machines had 24 cores each.
Results with/without synctask based migration:
-----------------------------------------------
mode normal(10threads) aggressive(20threads)
timetaken 0:55:30 (h:m:s) 0:56:3 (h:m:s)
withsynctask
timetaken
with migrator 0:38:3 (h:m:s) 0:23:41 (h:m:s)
threads
From above table it can be seen that, there is a clear 2x perf gain between
rebalance with synctask vs rebalance with migrator threads.
Additionally this patch modifies the code so that caller will have the exact error
number returned by dht_migrate_file(earlier the errno meaning was overloaded). This
will help avoiding scenarios where migration failure due to ENOENT, can result in
rebalance abort/failure.
Change-Id: I8904e2fb147419d4a51c1267be11a08ffd52168e
BUG: 1420166
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/16427
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: Id84bc87e48f435573eba3b24d3fb3c411fd2445d
BUG: 1440051
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: https://review.gluster.org/17126
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Removing redundant logs were introduced in
https://review.gluster.org/#/c/17065/
Change-Id: I0d6055488b51a13c91d2121e87f653cdb94888b0
BUG: 1445590
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17118
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Susant Palai <spalai@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Design doc: https://review.gluster.org/16876
Directory creation is now synchronized with blocking inodelk of the
parent on the hashed subvolume followed by the entrylk on the hashed
subvolume between dht_mkdir, dht_rmdir, dht_rename_dir and lookup
selfheal mkdir.
To maintain internal consistency of directories across all subvols of
dht, we need locks. Specifically we are interested in:
1. Consistency of layout of a directory. Only one writer should modify
the layout at a time. A writer (layout setting during directory heal
as part of lookup) shouldn't modify the layout while there are
readers (all other fops like create, mkdir etc., which consume
layout) and readers shouldn't read the layout while a writer is in
progress. Readers can read the layout simultaneously. Writer takes
a WRITE inodelk on the directory (whose layout is being modified)
across ALL subvols. Reader takes a READ inodelk on the directory
(whose layout is being read) on ANY subvol.
2. Consistency of directory namespace across subvols. The path and
associated gfid should be same on all subvols. A gfid should not be
associated with more than one path on any subvol. All fops that can
change directory names (mkdir, rmdir, renamedir, directory creation
phase in lookup-heal) takes an entrylk on hashed subvol of the
directory.
NOTE1: In point 2 above, since dht takes entrylk on hashed subvol of a
directory, the transaction itself is a consumer of layout on
parent directory. So, the transaction is a reader of parent
layout and does an inodelk on parent directory just like any
other layout reader. So a mkdir (dir/subdir) would:
> Acquire a READ inodelk on "dir" on any subvol.
> Acquire an entrylk (dir, "subdir") on hashed subvol of "subdir".
> creates directory on hashed subvol and possibly on non-hashed subvols.
> UNLOCK (entrylk)
> UNLOCK (inodelk)
NOTE2: mkdir fop while setting the layout of the directory being created
is considered as a reader, but NOT a writer. The reason is for
a fop which can consume the layout of a directory to come either
of the following conditions has to be true:
> mkdir syscall from application has to complete. In this case no
need of synchronization.
> A lookup issued on the directory racing with mkdir has to complete.
Since layout setting by a lookup is considered as a writer, only
one of either mkdir or lookup will set the layout.
Code re-organization:
All the lock related routines are moved to "dht-lock.c" file.
New wrapper function is introduced to take blocking inodelk
followed by entrylk 'dht_protect_namespace'
Updates #191
Change-Id: I01569094dfbe1852de6f586475be79c1ba965a31
Signed-off-by: Kotresh HR <khiremat@redhat.com>
BUG: 1443373
Reviewed-on: https://review.gluster.org/15472
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With rebalance doing fallocate on destination, we don't need to
add file size to the "destination available space" to decide whether
to migrate the file or not.
Notes: Fallocate would have already occupied the file size space on
destination
Change-Id: If7f6a6654e6257726680cf20d618482a6e9095a6
BUG: 1441508
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/17104
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This bug was causing VMs to pause during rebalance. When qemu winds
down a STAT, shard fills the trusted.glusterfs.shard.file-size attribute
in the req dict which DHT doesn't wind its STAT fop with upon detecting
the file has undergone migration. As a result shard doesn't find the
value to this key in the unwind path, causing it to fail the STAT
with EINVAL.
Also, the same bug exists in other fops too, which is also fixed in
this patch.
Change-Id: Id7823fd932b4e5a9b8779ebb2b612a399c0ef5f0
BUG: 1440051
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: https://review.gluster.org/17085
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
rm -rf <dir> fails with ENOENT if dir contains a lot of
stale linkto files. This is because a single
readdirp is sent as part of the rmdir which would return
and delete only as many linkto files on the bricks as would fit
in one readdirp buffer. Running rm -rf <dir> multiple times
will eventually delete all the files. The fix sends readdirp
on each subvol until no more entries are returned.
Change-Id: I447f2d193de4bd8ac16e4541c6b919d22250e39e
BUG: 1442724
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17065
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
criteria happens to be the same subvol containing data-file
Rebalance need to figure out a new subvol in case the hashed subvol
does not have enough space. In the process of figuring out the new subvol,
we need to ignore the source subvol, otherwise it will lead to data loss.
Test: Manual
Ran the following
sizeof /tmp/1: 1.5GB
sizeof /brick/1: 16GB
sizeof /tmp/2: 1.5GB
<start>
glusterd; gluster v create test1 vm1:/brick/1 vm1:/tmp/1;
gluster v start test1;
mount -t glusterfs vm1:test1 /mnt;
for i in {1..2000}
do
dd if=/dev/zero of=/mnt/file$i bs=1KB count=1 &> /dev/null;
done
gluster v add-brick test1 vm1:/tmp/2
gluster v set test1 min-free-disk 12GB
gluster v remove-brick test1 vm1:/tmp/1 star
<end>
file count and data were intact.
Change-Id: Ib8fc8467a3d48a7c12958824c4f0b88e160b86c1
BUG: 1441508
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/17064
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I6adce98f52e17953f501bc590ff7189cceac3c31
BUG: 1431908
Signed-off-by: Poornima G <pgurusid@redhat.com>
Reviewed-on: https://review.gluster.org/17057
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
test: Manual
created files of size 1K on 2 brick(of size 1GB) setup .
added a brick of size 16GB.
set min-free-disk to 12GB(so that first two bricks won't receive any files).
removed one of the 1st brick of size 1GB.
Logs from test:
[2017-04-12 08:52:08.196484] W [MSGID: 0] [dht-rebalance.c:895:__dht_check_free_space]
0-test1-dht: Write will cross min-free-disk for file - /tile32 on subvol - test1-client-1.
Looking for new subvol.
[2017-04-12 08:52:08.196904] I [MSGID: 0] [dht-rebalance.c:925:__dht_check_free_space]
0-test1-dht: new target found - test1-client-2 for file - /tile32
- Post migration we have two files. The new destination (/brick/1) has the data file
[root@vm1 ~]# ll /brick/1/tile32
-rw-r--r--. 2 root root 0 Apr 12 14:22 /brick/1/tile32
- On the old target the linkto file is there with linkto xattr pointing to /brick/1
[root@vm1 ~]# ll /tmp/2/tile32
---------T. 2 root root 1000 Apr 12 14:22 /tmp/2/tile32
[root@vm1 ~]# getfattr -m . -de text /tmp/2/tile32
getfattr: Removing leading '/' from absolute path names
security.selinux="unconfined_u:object_r:user_tmp_t:s0"
trusted.gfid="����:Aс�#�/'b2"
trusted.glusterfs.dht.linkto="test1-client-2"
Marking ./tests/features/worm_sh.t as bad test.
Reason being, this patch failed on master branch as well and it has nothing
to do with rebalance/remove-brick.
BUG: 1441508
Change-Id: I90bae251cda3d957a49cdceda90cd08311a392fb
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/17034
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The job of the crawler in rebalance is to fetch files from each
local subvolume and push them to migration queue if it is eligible for
migration. And we do a lookup on the entries received to figure out the
eligibilty. Since, the lookup done is on a local subvolume we receive
linkto files and regular files as well. This requires us to do two lookups.
first: do a lookup on the file to figure out whether it is a linkto file
second: do a lookup on the file to figure out if it should be migrated
Note: The migrator thread also does one lookup for the file before
migration.
Optimization: Remove the lookup done by the crawler. Offload these task
to the migrator threads. For linkto file verification get the stat and
xattr information from readdirp.
So in total we have one lookup instead of three for each entry.
Performance numbers:
Create two node, two brick setup. Created 100000 files. And started
rebalance. Since, there is no add-brick, no files will be migrated and
we will get the crawler performance.
Without patch:
[root@gprfs039 ~]# grs
Node Rebalanced-files size
scanned failures skipped status run time in
h:m:s
--------- ----------- -----------
----------- ----------- ----------- ------------
--------------
localhost 0 0Bytes
50070 0 0 completed 0:0:48
server2 0 0Bytes
49930 0 0 completed 0:0:44
volume rebalance: test1: success
Total: 48 seconds
WiththecurrentPatch:
[root@gprfs039 mnt]# gluster v rebalance test1 status
Node Rebalanced-files size
scanned failures skipped status run time in
h:m:s
--------- ----------- -----------
----------- ----------- ----------- ------------
--------------
localhost 0 0Bytes
50070 0 0 completed 0:0:12
server2 0 0Bytes
49930 0 0 completed 0:0:12
volume rebalance: test1: success
Total: 12 seconds
That's 4X speed gain. :)
Updates glusterfs#155
Change-Id: Idc8e5b366e76c54aa40d698876ae62fe1630b6cc
BUG: 1439571
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/15781
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As readdir-ahead can be loaded as a child of dht, dht has to specify
the xattrs it is intrested in, as part of opendir call itself.
Change-Id: I012ef96cc143b0cef942df78aa7150d85ec38606
BUG: 1431908
Signed-off-by: Poornima G <pgurusid@redhat.com>
Reviewed-on: https://review.gluster.org/16902
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
local->loc.gfid in dht_lookup_directory() will be null-gfid for a fresh lookup.
dht_lookup_dir_cbk() updates local->loc.gfid while in other thread dht_lookup_directory()
is still winding lookup calls to subvolumes so there is a chance of partial gfid being
seen by EC.
We saw in 12x(4+2) volume, ec is receiving an loc where the gfid has last 10 bytes matching
with the gfid of the directory and the first 4 bytes are all-zeros. This is leading to EC
erroring out the lookup with EINVAL which leads to NFS failing lookup with EIO.
snip from gdb:
$37 = (dht_local_t *) 0x7fde5de5b3cc
(gdb) p /x $37->loc.gfid
$39 = {0x3b, 0x82, 0x10, 0x5e, 0x40, 0x65, 0x43, 0x14, 0xa0, 0xc6, 0x8, 0xf5,
0x6c, 0x2c, 0xb8, 0x56}
(gdb) fr 7
state=<optimized out>) at ec-generic.c:837
837 ec_lookup_rebuild(fop->xl->private, fop, cbk);
(gdb) p /x fop->loc[0].gfid
$40 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x43, 0x14, 0xa0, 0xc6, 0x8, 0xf5, 0x6c,
0x2c, 0xb8, 0x56}
snip from log:
[2017-01-29 03:22:30.132328] W [MSGID: 122019]
[ec-helpers.c:354:ec_loc_gfid_check] 0-butcher-disperse-4: Mismatching GFID's
in loc [2017-01-29 03:22:30.132709] W [MSGID: 112199]
[nfs3-helpers.c:3515:nfs3_log_newfh_res] 0-nfs-nfsv3:
/linux-4.9.5/Documentation => (XID: b27b9474, MKDIR: NFS: 5(I/O error), POSIX:
5(Input/output error)), FH: exportid 00000000-0000-0000-0000-000000000000, gfid
00000000-0000-0000-0000-000000000000, mountid
00000000-0000-0000-0000-000000000000 [Invalid argument]
Fix:
update local->loc.gfid in last-call to make sure there are no races.
BUG: 1438411
Change-Id: Ifcb7e911568c1f1f83123da6ff0cf742b91800a0
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: https://review.gluster.org/16986
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
function "rebalance_task_completion" did not honor returned values other
than -1 and 1. This led to ignoring errno e.g in the current bug,
ENOTSUP returned by __is_file_migratable and it returned
EINVAL(op_errno initialized) to it's caller leading to failure messages
logged in ERROR level.
Change-Id: I45549fba35c72b278539269b750768fd89e0faf2
BUG: 1393338
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/15810
Smoke: Gluster Build System <jenkins@build.gluster.org>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|