| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
Fixes: #1438
Change-Id: If9f1c2e89e504512bcf77608fb4a4fedb19f0399
Signed-off-by: Kotresh HR <khiremat@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Brick paths in Volinfo used `:` as delimiter, Geo-rep uses split
based on `:` char. This will go wrong with IPv6.
This patch handles the IPv6 case and handles the split properly.
Fixes: #1366
Change-Id: I25e88d693744381c0ccf3c1dbf1541b84be2499d
Signed-off-by: Aravinda Vishwanathapura <aravinda@kadalu.io>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
The issue is being hit during hybrid mode while handling rename on slave.
In this special case the rename is recorded as mkdir and geo-rep process it
by resolving the path form backend.
While resolving the backend path during this special handling one corner case is not considered.
<snip>
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 118, in worker
res = getattr(self.obj, rmeth)(*in_data[2:])
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 588, in entry_ops
src_entry = get_slv_dir_path(slv_host, slv_volume, gfid)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 710, in get_slv_dir_path
dir_entry = os.path.join(pfx, pargfid, basename)
File "/usr/lib64/python2.7/posixpath.py", line 75, in join
if b.startswith('/'):
AttributeError: 'int' object has no attribute 'startswith'
In pyhthon3:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python3.8/posixpath.py", line 90, in join
genericpath._check_arg_types('join', a, *p)
File "/usr/lib64/python3.8/genericpath.py", line 152, in _check_arg_types
raise TypeError(f'{funcname}() argument must be str, bytes, or '
TypeError: join() argument must be str, bytes, or os.PathLike object, not 'int'
</snip>
Change-Id: I8b926899c60ad8c4ffc886d57028ba70fd21e332
Fixes: #1250
Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Geo-replication has a large number of repeated imports as shown below:
```
from syncdutils import set_term_handler, finalize, lf
from syncdutils import log_raise_exception, FreeObject, escape
```
There imports can be clubbed together as shown below:
``
from syncdutils import (set_term_handler, finalize, lf,
log_raise_exception, FreeObject, escape)
```
Fixes: #1105
Change-Id: I59a48dd57a70fc851d93150b85e736ce41e8b793
Signed-off-by: kshithijiyer <kshithij.ki@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
| |
With this patch now you can notice log if it is due to EIO:
[2020-03-16 16:24:48.293837] E [syncdutils(worker /bricks/brick1/mbr3):348:log_raise_exception] <top>: Getting "Input/Output error" is most likely due to a. Brick is down or b. Split brain issue.
[2020-03-16 16:24:48.293915] E [syncdutils(worker /bricks/brick1/mbr3):352:log_raise_exception] <top>: This is expected as per design to keep the consistency of the file system. Once the above issue is resolved geo-rep would automatically proceed further.
Change-Id: Ie33f2440bc96089731ce12afa8dab91d9550a7ca
Fixes: #1104
Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
problem: Geo-rep gsyncd process mounts the master and slave volume
on master nodes and slave nodes respectively and starts
the sync. But it doesn't wait for the mount to be in ready
state to accept I/O. The gluster mount is considered to be
ready when all the distribute sub-volumes is up. If the all
the distribute subvolumes are not up, it can cause ENOTCONN
error, when lookup on file comes and file is on the subvol
that is down.
solution: Added a Virtual Xattr "dht.subvol.status" which returns "1"
if all subvols are up and "0" if all subvols are not up.
Geo-rep then uses this virtual xattr after a fresh mount, to
check whether all subvols are up or not and then starts the
I/O.
fixes: bz#1664335
Change-Id: If3ad01d728b1372da7c08ccbe75a45bdc1ab2a91
Signed-off-by: Harpreet Kaur <hlalwani@redhat.com>
Signed-off-by: Kotresh HR <khiremat@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When gsyncd failed with tracebacks during start, it prints
only exception object but not the error string. This patch
adds the error string as well.
Earlier:
"failed with ImportError"
Now:
"failed with ImportError: No module named _io."
fixes: bz#1771895
Change-Id: I0d772a250d4c2010a0c35053aa7b165b71f8434e
Signed-off-by: Kotresh HR <khiremat@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
- libgfchangelog is simplified by removing unnecessary API Class
- Merged Agent logic into Worker instead of running Worker and Agent as
two separate processes and maintaining RPC between Worker and Agent.
- Geo-rep command Pause and Resume will continue without any changes.
But Agent functionality also gets paused with that.
Updates: #755
Change-Id: Ie2c00fa7dddf21f180f0649e0aaf084d29023c98
Signed-off-by: Aravinda VK <avishwan@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
While syncing rename of directory in hybrid crawl, geo-rep
crashes as below.
Traceback (most recent call last):
File "/usr/local/libexec/glusterfs/python/syncdaemon/repce.py", line 118, in worker
res = getattr(self.obj, rmeth)(*in_data[2:])
File "/usr/local/libexec/glusterfs/python/syncdaemon/resource.py", line 588, in entry_ops
src_entry = get_slv_dir_path(slv_host, slv_volume, gfid)
File "/usr/local/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 687, in get_slv_dir_path
[ENOENT], [ESTALE])
File "/usr/local/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 546, in errno_wrap
return call(*arg)
PermissionError: [Errno 13] Permission denied: '/bricks/brick1/b1/.glusterfs/8e/c0/8ec0fcd4-d50f-4a6e-b473-a7943ab66640'
Cause:
Conversion of gfid to path for a directory uses readlink on backend
.glusterfs gfid path. But this fails for non root user with
permission denied.
Fix:
Use gfid2path interface to get the path from gfid
Change-Id: I9d40c713a1b32cea95144cbc0f384ada82972222
fixes: bz#1763439
Signed-off-by: Kotresh HR <khiremat@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
The bug[1] addresses issue of data inconsistency when handling RENAME with
existing destination. This fix requires some performance tuning considering
this issue occurs in heavy rename workload.
Solution:
If distribution count for master volume is one do not verify op's on
master and go ahead with rename.
The performance improvement with this patch can only be observed if
master volume has distribution count one.
[1]. https://bugzilla.redhat.com/show_bug.cgi?id=1694820
fixes: bz#1753857
Change-Id: I8e9bcd575e7e35f40f9f78b7961c92dee642f47b
Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Structured logging format changed in the below patch for better
readability and parsing. This patch changes the Geo-replication logs
to that format.
https://review.gluster.org/#/c/glusterfs/+/22987/
Before:
```
[2019-08-21 04:23:01.13672] I [monitor(monitor):157:monitor] Monitor: \
starting gsyncd worker<TAB>brick=/bricks/b1<TAB>slave_node=f29.sonne
```
After:
```
[2019-08-21 04:23:01.13672] I [monitor(monitor):157:monitor] Monitor: \
starting gsyncd worker [{brick=/bricks/b1}, {slave_node=f29.sonne}]
```
Change-Id: I65ec69a2869e2febeedc558f88620399e4a1a6a4
Updates: #657
Signed-off-by: Aravinda VK <avishwan@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
gluster command not found.
Cause:
In Volinfo class we issue command 'gluster vol info' to get information
about volume like getting brick_root to perform various operation.
When geo-rep session is configured for non-root user Volinfo class
fails to issue gluster command due to unavailability of gluster
binary path for non-root user.
Solution:
Use config value 'slave-gluster-command-dir'/'gluster-command-dir' to get path
for gluster command based on caller.
fixes: bz#1722740
Change-Id: I4ec46373da01f5d00ecd160c4e8c6239da8b3859
Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
- Fixed Relative import and non-package import related issues.
- socketserver import issues fix
- Renamed installed directory name to `gfevents` from `events`(To
avoid any issues with other global libs)
Fixes: bz#1679406
Change-Id: I3dc38bc92b23387a6dfbcc0ab8283178235bf756
Signed-off-by: Aravinda VK <avishwan@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: When geo-rep is configured as hybrid crawl
directory renames are not synced to the slave.
Solution: Rename sync of directory was failing due to incorrect
destination path calculation.
During check for existence on slave we miscalculated
realpath. <host:brickpath/dir>.
Change-Id: I23f1ea60e86a917598fe869d5d24f8da654d8a0a
fixes: bz#1665826
Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
|
|
|
|
|
|
|
|
|
| |
1. scheduler - Popen
2. syncdutils - corner case on failure
fixes: bz#1643932
Change-Id: I65af97a244a8790e976acedc2728db6ebbf2ae10
Signed-off-by: Kotresh HR <khiremat@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. ctypes/syscalls
A) arguments is expected to be encoded
B) Raw conversion of return value from bytearray into string
2. struct pack/unpack - Raw converstion of string to bytearray
3. basestring -> str
Updates: #411
Change-Id: I80f939adcdec0ed0022c87c0b76d057ad5559e5a
Signed-off-by: Kotresh HR <khiremat@redhat.com>
|
|
|
|
|
|
|
|
|
| |
1. Fix fdopen used for pid file
2. Fix sha256 checksum calculation
Updates: #411
Change-Id: Ic173d104a73822c29aca260ba6de872cd8d23f86
Signed-off-by: Kotresh HR <khiremat@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The file objects for python3 by default is opened
in binary mode where as in python2 it's opened
as text by default.
The geo-rep code parses the output of Popen assuming
it as text, hence used the 'universal_newlines' flag
which provides backward compatibility for the same.
Change-Id: I371a03b6348af9666164cb2e8b93d47475431ad9
Updates: #411
Signed-off-by: Kotresh HR <khiremat@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
'os.pipe' returns pair of file descriptors
which are non-inheritable by child processes.
But geo-rep uses te inheritable nature of
pipe fds to communicate between parent and
child processes. Hence wrote a compatiable
pipe routine which works well both with python2
and python3 with inheritable nature.
Updates: #411
Change-Id: I869d7a52eeecdecf3851d44ed400e69b32a612d9
Signed-off-by: Kotresh HR <khiremat@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Please review, it's not always just the comments that were fixed.
I've had to revert of course all calls to creat() that were changed
to create() ...
Only compile-tested!
Change-Id: I7d02e82d9766e272a7fd9cc68e51901d69e5aab5
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. MKDIR/RMDIR is recorded on all bricks. So if
one brick succeeds creating it, other bricks
should ignore it. But this was not happening.
The fix rename of directories in hybrid crawl,
was trying to rename the directory to itself
and in the process crashing with ENOENT if the
directory is removed.
2. If file is created, deleted and a directory is
created with same name, it was failing to sync.
Again the issue is around the fix for rename
of directories in hybrid crawl. Fixed the same.
If the same case was done with hardlink present
for the file, it was failing. This patch fixes
that too.
fixes: bz#1598884
Change-Id: I6f3bca44e194e415a3d4de3b9d03cc8976439284
Signed-off-by: Kotresh HR <khiremat@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Geo-rep mounts are private to worker. It uses
mount namespace using unshare command to achieve
the same. Well, the unshare command has to support
'--propagation' option. So geo-rep breaks on the
systems with older unshare version. The patch
makes it fall back to lazy umount behaviour if
the unshare does not support propagation option.
fixes: bz#1589782
Change-Id: Ia614f068aede288d63ac62fea4461b1865066054
Signed-off-by: Kotresh HR <khiremat@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
see https://review.gluster.org/#/c/19788/,
https://review.gluster.org/#/c/19871/, and
https://review.gluster.org/#/c/19952/
This patch adds version agnostic imports for urllib, cpickle,
socketserver, _thread, queue, etc., suggested by Aravinda in
https://review.gluster.org/#/c/19767/1
Note: Fedora packaging guidelines require explicit shebangs, so
popular practices like #!/usr/bin/env python and #!/usr/bin/python
are not allowed; they must be #!/usr/bin/python2 or #!/usr/bin/python3
Note: Selected small fixes from 2to3 utility. Specifically apply,
basestring, funcattrs, idioms, numliterals, set_literal, types, urllib,
and zip have already been applied.
Note: these 2to3 fixes report no changes are necessary: exec, execfile,
exitfunc, filter, getcwdu, intern, itertools, metaclass, methodattrs, ne,
next, nonzero, operator, paren, raw_input, reduce, reload, renames, repr,
standarderror, sys_exc, throw, tuple_params, xreadlines.
Change-Id: I8d393064a1837874d8b4bc87c8ce05c679664642
updates: #411
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Lazy umounting the master volume by worker causes
issues with rsync's usage of getcwd. Henc removing
the lazy umount and using private mount namespace
for the same. On the slave, the lazy umount is
retained as we can't use private namespace in non
root geo-rep setup.
Change-Id: I403375c02cb3cc7d257a5f72bbdb5118b4c8779a
BUG: 1546129
Signed-off-by: Kotresh HR <khiremat@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Once Geo-replication is started, it runs Gluster commands to get Volume
info from Master and Slave. With this patch, Georep can get Volume info
from Conf file if `--use-gconf-volinfo` argument is specified to monitor
Create a config(Or add to the config if exists) with following fields
[vars]
master-bricks=NODEID:HOSTNAME:PATH,..
slave-bricks=NODEID:HOSTNAME,..
master-volume-id=
slave-volume-id=
master-replica-count=
master-disperse_count=
Note: Exising Geo-replication is not affected since this is activated
only when `--use-gconf-volinfo` is passed while spawning `gsyncd
monitor`
Tiering support is not yet added since Tiering + Glusterd2 is still
under discussion.
Fixes: #396
Change-Id: I281baccbad03686c00f6488a8511dd6db0edc57a
Signed-off-by: Aravinda VK <avishwan@redhat.com>
|
|
|
|
|
|
| |
BUG: 1529480
Change-Id: If4775ed9886990c0e1bcf4e44c7dfef95cc4f0c3
Signed-off-by: Aravinda VK <avishwan@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
MD5 is not fips compliant. Hence replacing
with SHA256.
NOTE:
The hash is used to form the ctl_path for the ssh connection.
The length of ctl_path for ssh connection should not be > 108.
ssh fails with ctl_path too long if it is so. But when rsync
is piped to ssh, it is not taking > 90. rsync is failing with
error number 12. Hence using first 32 bytes of hash. Hash
collision doesn't matter as only one sock file is created
per directory.
Change-Id: I58aeb32a80b5422f6ac0188cf33fbecccbf08ae7
Updates: #230
Signed-off-by: Kotresh HR <khiremat@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Fixed Python pep8 issues
- Removed dead code
- Rewritten configuration management
- Rewritten Arguments/subcommands handling
- Added Args upgrade to accommodate all these changes without changing
glusterd code
- use of md5 removed, which was used to hash the brick path for workdir
Both Master and Slave nodes will have subdir for session in the
format "<mastervol>_<primary_slave_host>_<slavevol>
$GLUSTER_LOGDIR/geo-replication/<mastervol>_<primary_slave_host>_<slavevol>
$GLUSTER_LOGDIR/geo-replication-slaves/<mastervol>_<primary_slave_host>_<slavevol>
Log file paths renamed since session info is available with directory
name itself.
$LOG_DIR_MASTER/
- gsyncd.log - Gsyncd, Worker monitor logs
- mnt-<brick-path>.log - Aux mount logs, mounted by each worker
- changes-<brick-path>.log - Changelog related logs(One per brick)
$LOG_DIR_SLAVE/
- gsyncd.log - Slave Gsyncd logs
- mnt-<master-node>-<master-brick-path>.log - Aux mount logs,
mounted for each connection from master-node:master-brick
- mnt-mbr-<master-node>-<master-brick-path>.log - Same as above,
but mountbroker setup
Fixes: #73
Change-Id: I2ec2a21e4e2a92fd92899d026e8543725276f021
Signed-off-by: Aravinda VK <avishwan@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In hybrid crawl, renames and unlink can't be
synced but directory renames can be detected.
While syncing the directory on slave, if the
gfid already exists, it should be rename.
Hence if directory gfid already exists, rename
it.
Change-Id: Ibf9f99e76a3e02795a3c2befd8cac48a5c365bb6
BUG: 1499566
Signed-off-by: Kotresh HR <khiremat@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
If there is a hardlink to a symlink on master
and if the symlink file is deleted on master,
geo-rep fails to sync the hardlink.
Typical Usecase:
It's easily hit with rsnapshot use case where
it uses hardlinks.
Example Reproducer:
Setup geo-replication between master and slave
volume and in master mount point, do the following.
1. mkdir /tmp/symlinkbug
2. ln -f -s /does/not/exist /tmp/symlinkbug/a_symlink
3. rsync -a /tmp/symlinkbug ./
4. cp -al symlinkbug symlinkbug.0
5. ln -f -s /does/not/exist2 /tmp/symlinkbug/a_symlink
6. rsync -a /tmp/symlinkbug ./
7. cp -al symlinkbug symlinkbug.1
Cause:
If the source was not present while syncing hardlink,
it was always packing the blob as regular file.
Fix:
If the source was not present while syncing hardlink,
pack the blob based on the mode.
Change-Id: Iaa12d6f99de47b18e0650e7c4eb455f23f8390f2
BUG: 1432046
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Reported-by: Christian Lohmaier <lohmaier+rhbz@gmail.com>
Reviewed-on: https://review.gluster.org/18011
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Aravinda VK <avishwan@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
libgfchangelog was encoding path using spec rfc3986, but encoding only
required for SPACE and NEWLINE chars since the NEWLINE char is used as
record separator and SPACE as field separator in the parsed changelogs
output.
Changed the encoding function to encode only SPACE and NEWLINE.
BUG: 1451724
Change-Id: I1936efad31788a9e636f912c832ed7d7efea4fe2
Signed-off-by: Aravinda VK <avishwan@redhat.com>
Reviewed-on: https://review.gluster.org/17787
Reviewed-by: Prashanth Pai <ppai@redhat.com>
Reviewed-by: Kotresh HR <khiremat@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Changed all log messages to structured log format
Change-Id: Idae25f8b4ad0bbae38f4362cbda7bbf51ce7607b
Updates: #240
Signed-off-by: Aravinda VK <avishwan@redhat.com>
Reviewed-on: https://review.gluster.org/17551
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Kotresh HR <khiremat@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Flag: --ignore-missing-args
This Rsync flag reduces sync failures if the source file is
unlinked but present in --files-from list. This reduces
Rsync retries in Geo-rep and improves the performance
Flag: --existing
Rsync in Geo-rep never creates target files. Using RPC Geo-rep creates
entry in Slave and rsync --inplace used to prevent creating temporary file
and rename.(To avoid different GFID in Slave). If the entry is missing in
Slave then Geo-rep Rsync gets Permission denied errors when it tries to
create file with name as GFID inside .gfid dir.(Geo-rep rsync syncs data
using GFIDS with aux-gfid-mount)
To disable these flags,
gluster volume geo-replication <session> config \
rsync-opt-ignore-missing-args false
gluster volume geo-replication <session> config \
rsync-opt-existing false
Thanks Kotresh for finding these awesome tunables.
BUG: 1400924
Change-Id: I6a84fb86a589bf6edc8dfd1086456a84b05a64fc
Signed-off-by: Aravinda VK <avishwan@redhat.com>
Reviewed-on: https://review.gluster.org/16010
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Kotresh HR <khiremat@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On corner cases, mount cleanup might cause
worker crash. Fixing the same.
Change-Id: I38c0af51d10673765cdb37bc5b17bb37efd043b8
BUG: 1433506
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Reviewed-on: https://review.gluster.org/17015
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Aravinda VK <avishwan@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
EBUSY was added to retry list of errno_wrap
without importing. Fixing the same.
Change-Id: Ide81a9ccc9b948a96265b6890da078b722b45d51
BUG: 1434018
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Reviewed-on: https://review.gluster.org/17011
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Aravinda VK <avishwan@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Do not crash on EBUSY error. Add EBUSY
retry errno list. Crash only if the error
persists even after max retries.
Change-Id: Ia067ccc6547731f28f2a315d400705e616cbf662
BUG: 1434018
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Reviewed-on: https://review.gluster.org/16924
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Aravinda VK <avishwan@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to improve debuggability, it is important
to have access to geo-rep master and slave mounts.
With the default behaviour, geo-rep lazy unmounts
the mounts after changing the current working
directory into the mount point. It also cleans
up the mount points. So only geo-rep worker has
the access and it becomes impossible to take the
client profile info and do any other client statck
analysis. Hence the following new config is being
introduced to allow access to mounts.
gluster vol geo-rep <mastervol> <slavehost>::<slavevol> \
config access_mount true
The default value of 'access_mount' is false.
Change-Id: I53dce4ea86a6ffc979c82f9330e8954327180ca3
BUG: 1433506
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Reviewed-on: https://review.gluster.org/16912
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To spawn workers for each local brick, Geo-rep was collecting all
the machine IPs based on hostname and finds based on the connectivity.
With this patch, Geo-rep finds local brick if host UUID matches with
UUID of the brick from Volume info.
BUG: 1401801
Change-Id: Ic83c65df89e43cb86346e3ede227aa84d17ffd79
Signed-off-by: Aravinda VK <avishwan@redhat.com>
Reviewed-on: http://review.gluster.org/16035
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Kotresh HR <khiremat@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Added following events
EVENT_GEOREP_ACTIVE
{
"nodeid": NODEID,
"ts": TIMESTAMP,
"event": "GEOREP_ACTIVE",
"message": {
"master_volume": MASTER_VOLUME_NAME,
"slave_host": SLAVE_HOST,
"slave_volume": SLAVE_VOLUME,
"brick_path": BRICK_PATH
}
}
EVENT_GEOREP_PASSIVE
{
"nodeid": NODEID,
"ts": TIMESTAMP,
"event": "GEOREP_PASSIVE",
"message": {
"master_volume": MASTER_VOLUME_NAME,
"slave_host": SLAVE_HOST,
"slave_volume": SLAVE_VOLUME,
"brick_path": BRICK_PATH
}
}
EVENT_GEOREP_CHECKPOINT_COMPLETED
{
"nodeid": NODEID,
"ts": TIMESTAMP,
"event": "GEOREP_ACTIVE",
"message": {
"master_volume": MASTER_VOLUME_NAME,
"slave_host": SLAVE_HOST,
"slave_volume": SLAVE_VOLUME,
"brick_path": BRICK_PATH,
"checkpoint_time": CHECKPOINT_TIME,
"checkpoint_completion_time": CHECKPOINT_COMPLETION_TIME
}
}
BUG: 1379330
Change-Id: I90716175868c59dd65c8d202e73e0ede90347b6a
Signed-off-by: Aravinda VK <avishwan@redhat.com>
Reviewed-on: http://review.gluster.org/15630
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Kotresh HR <khiremat@redhat.com>
Tested-by: Kotresh HR <khiremat@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If glusterfs-events rpm is not installed, Geo-replication will
fail since it imports eventtypes.
Any call to gsyncd will fail with Import error. Glusterd start
fails since it runs `gsyncd.py --version`
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py",
line 29, in <module>
from syncdutils import FreeObject, norm, grabpidfile, finalize
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py",
line 28, in <module>
from events import eventtypes
ImportError: No module named events
BUG: 1378057
Change-Id: I1a9bc086c3d52449ec7296cb2f9ceb16cd41a8a4
Signed-off-by: Aravinda VK <avishwan@redhat.com>
Reviewed-on: http://review.gluster.org/15539
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Saravanakumar Arumugam <sarumuga@redhat.com>
Reviewed-by: Kotresh HR <khiremat@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
libgfchangelog was not respecting the log_level configured
in Geo-replication. With this patch Libgfchangelog log level
can be configured using `config changelog_log_level TRACE`.
Default Changelog log level is INFO
BUG: 1363965
Change-Id: Ida714931129f6a1331b9d0815da77efcb2b898e3
Signed-off-by: Aravinda VK <avishwan@redhat.com>
Reviewed-on: http://review.gluster.org/15078
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Kotresh HR <khiremat@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Event Type defined in #15351 to avoid merge conflicts
Add geo-rep events applicable to changes in
geo-rep session in the server side.
Change-Id: Ia66574d2abccad7fce6a96667efbc7c6c8903fc6
BUG: 1370445
Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com>
Reviewed-on: http://review.gluster.org/15328
Tested-by: Aravinda VK <avishwan@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Aravinda VK <avishwan@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this patch, Data and Meta GFIDs are post processed. If Changelog has
UNLINK entry then remove from Data and Meta GFIDs list(If stat on GFID is
ENOENT in Master).
While processing Changelogs,
- Collect all the data and meta operations in a temporary database
- Delete all Data and Meta GFIDs which are already unlinked as per Changelogs
(unlink only if stat on GFID is ENOENT)
- Process all Entry operations as usual
- Process data and meta operations in batch(Fetch from Db in batch)
- Data sync is again batched based on number of changelogs(Default 1day
changelogs). Once the sync is complete, Update last Changelog's time as last_synced
time as usual.
Additionally maintain entry_stime on Brick root, ignore Entry ops if changelog
suffix time is less than entry_stime. If data stime is more than entry_stime,
this can happen only when passive worker updates stime by itself by getting
mount point stime. Use entry_stime = data_stime in this case.
New configurations:
max-rsync-retries - Default Value is 10
max-data-changelogs-in-batch - Max number of changelogs to be considered in a
batch for syncing. Default value is 5760(4 changelogs per min * 60 min *
24 hours)
max-history-changelogs-in-batch - Max number of history changelogs to be
processed at once. Default value 86400(4 changelogs per min * 60 min * 24
hours * 15 days)
BUG: 1364420
Change-Id: I7b665895bf4806035c2a8573d361257cbadbea17
Signed-off-by: Aravinda VK <avishwan@redhat.com>
Reviewed-on: http://review.gluster.org/15110
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Kotresh HR <khiremat@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add error logs if gf_history_changelog fails. If requested
changelog range is not available, log the error and exit
instead of continuing the loop and exiting in readdir
without logging. Also fixed the duplicate MSGID number in
'changelog-lib-messages.h'
Change-Id: Icd71b89ae23b48a71380657ba5649029c32fabfd
BUG: 1362151
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Reviewed-on: http://review.gluster.org/15064
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Aravinda VK <avishwan@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If ssh returns 127 that means the remote gsyncd path is wrong
or push-pem failed during create. Existing error message was
pointing old documentation.
Change-Id: Ifbbb4a604fc0ae0fd5cb2746df6363bf28cde1e9
BUG: 1343943
Signed-off-by: Aravinda VK <avishwan@redhat.com>
Reviewed-on: http://review.gluster.org/14673
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Reviewed-by: Kotresh HR <khiremat@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Handle ESTALE returned by lstat gracefully
by retrying it. Do not crash the worker.
Change-Id: I2527cd8bd1f7d2428cb4fa3f20782bebaf2df12a
BUG: 1247529
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Reviewed-on: http://review.gluster.org/11772
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Aravinda VK <avishwan@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While doing RMDIR worker gets ENOTEMPTY because same directory will
have files from other bricks which are not deleted since that worker
is slow processing. So geo-rep does recursive_delete.
Recursive delete was done using shutil.rmtree. once started, it will
not check disk_gfid in between. So it ends up deleting the new files
created by other workers. Also if other worker creates files after one
worker gets list of files to be deleted, then first worker will again
get ENOTEMPTY again.
To fix these races, retry is added when it gets ENOTEMPTY/ESTALE/ENODATA.
And disk_gfid check added for original path for which recursive_delete is
called. This disk gfid check executed before every Unlink/Rmdir. If disk
gfid is not matching with GFID from Changelog, that means other worker
deleted the directory. Even if the subdir/file present, it belongs to
different parent. Exit without performing further deletes.
Retry on ENOENT during create is ignored, since if CREATE/MKNOD/MKDIR
failed with ENOENT will not succeed unless parent directory is created
again.
Rsync errors handling was handling unlinked_gfids_list only for one
Changelog, but when processed in batch it fails to detect unlinked_gfids
and retries again. Finally skips the entire Changelogs in that batch.
Fixed this issue by moving self.unlinked_gfids reset logic before batch
start and after batch end.
Most of the Geo-rep races with rm -rf is eliminated with this patch,
but in some cases stale directories left in some bricks and in mount
point we get ENOTEMPTY.(DHT issue, Error will be logged in Slave log)
BUG: 1211037
Change-Id: I8716b88e4c741545f526095bf789f7c1e28008cb
Signed-off-by: Aravinda VK <avishwan@redhat.com>
Reviewed-on: http://review.gluster.org/10204
Reviewed-by: Kotresh HR <khiremat@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ENTRY operations failures on slave left no trace for debugging purposes.
This patch captures such failures on slave cluster and forwards them to
the master and logs them. Failures of specific interest are the ones
which return code EEXIST on the failing operations.
Change-Id: Iecab876f16593c746d53f4b7ec2e0783367856bb
BUG: 1207115
Signed-off-by: Milind Changire <mchangir@redhat.com>
Reviewed-on: http://review.gluster.org/10048
Reviewed-by: Aravinda VK <avishwan@redhat.com>
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Making geo-rep use the common storage shared by nfs,
snapshot and geo-rep. The meta volume should be named
as gluster_shared_storage, and it should be mounted
at "/var/run/gluster/shared_storage/".
geo-rep will have create a directory called 'geo-rep'
in the meta-volume and all the lock files are created
inside it.
Change-Id: I82d0bff9be191f75f643606a9a21d53559047ac4
BUG: 1210344
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Reviewed-on: http://review.gluster.org/10196
Reviewed-by: Aravinda VK <avishwan@redhat.com>
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CURRENT DESIGN AND ITS LIMITATIONS:
-----------------------------------
Geo-replication syncs changes across geography using changelogs captured
by changelog translator. Changelog translator sits on server side just
above posix translator. Hence, in distributed replicated setup, both
replica pairs collect changelogs w.r.t their bricks. Geo-replication
syncs the changes using only one brick among the replica pair at a time,
calling it as "ACTIVE" and other non syncing brick as "PASSIVE".
Let's consider below example of distributed replicated setup where
NODE-1 as b1 and its replicated brick b1r is in NODE-2
NODE-1 NODE-2
b1 b1r
At the beginning, geo-replication chooses to sync changes from NODE-1:b1
and NODE-2:b1r will be "PASSIVE". The logic depends on virtual getxattr
'trusted.glusterfs.node-uuid' which always returns first up subvolume
i.e., NODE-1. When NODE-1 goes down, the above xattr returns NODE-2 and
that is made 'ACTIVE'. But when NODE-1 comes back again, the above xattr
returns NODE-1 and it is made 'ACTIVE' again. So for a brief interval of
time, if NODE-2 had not finished processing the changelog, both NODE-2
and NODE-1 will be ACTIVE causing rename race as mentioned in the bug.
SOLUTION:
---------
1. Have a shared replicated storage, a glusterfs management volume specific
to geo-replication.
2. Geo-rep creates a file per replica set on management volume.
3. fcntl lock on the above said file is used for synchronization
between geo-rep workers belonging to same replica set.
4. If management volume is not configured, geo-replication will back
to previous logic of using first up sub volume.
Each worker tries to lock the file on shared storage, who ever wins will
be ACTIVE. With this, we are able to solve the problem but there is an
issue when the shared replicated storage goes down (when all replicas
goes down). In that case, the lock state is lost. So AFR needs to rebuild the
lock state after brick comes up.
NOTE:
-----
This patch brings in the, pre-requisite step of setting up management volume
for geo-replication during creation.
1. Create mgmt-vol for geo-replicatoin and start it. Management volume should
be part of master cluster and recommended to be three way replicated
volume having each brick in different nodes for availability.
2. Create geo-rep session.
3. Configure mgmt-vol created with geo-replication session as follows.
gluster vol geo-rep <mastervol> slavenode::<slavevol> config meta_volume \
<meta-vol-name>
4. Start geo-rep session.
Backward Compatiability:
-----------------------
If management volume is not configured, it falls back to previous logic of
using node-uuid virtual xattr. But it is not recommended.
Change-Id: I7319d2289516f534b69edd00c9d0db5a3725661a
BUG: 1196632
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Reviewed-on: http://review.gluster.org/9759
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|