summaryrefslogtreecommitdiffstats
path: root/geo-replication/syncdaemon/resource.py
Commit message (Collapse)AuthorAgeFilesLines
* [geo-rep] Merging multiple import into single importkshithijiyer2020-04-081-14/+14
| | | | | | | | | | | | | | | | | | Geo-replication has a large number of repeated imports as shown below: ``` from syncdutils import set_term_handler, finalize, lf from syncdutils import log_raise_exception, FreeObject, escape ``` There imports can be clubbed together as shown below: `` from syncdutils import (set_term_handler, finalize, lf, log_raise_exception, FreeObject, escape) ``` Fixes: #1105 Change-Id: I59a48dd57a70fc851d93150b85e736ce41e8b793 Signed-off-by: kshithijiyer <kshithij.ki@gmail.com>
* geo-rep: Fix for "Transport End Point not connected" issueHarpreet Kaur2020-01-311-0/+11
| | | | | | | | | | | | | | | | | | | | | | problem: Geo-rep gsyncd process mounts the master and slave volume on master nodes and slave nodes respectively and starts the sync. But it doesn't wait for the mount to be in ready state to accept I/O. The gluster mount is considered to be ready when all the distribute sub-volumes is up. If the all the distribute subvolumes are not up, it can cause ENOTCONN error, when lookup on file comes and file is on the subvol that is down. solution: Added a Virtual Xattr "dht.subvol.status" which returns "1" if all subvols are up and "0" if all subvols are not up. Geo-rep then uses this virtual xattr after a fresh mount, to check whether all subvols are up or not and then starts the I/O. fixes: bz#1664335 Change-Id: If3ad01d728b1372da7c08ccbe75a45bdc1ab2a91 Signed-off-by: Harpreet Kaur <hlalwani@redhat.com> Signed-off-by: Kotresh HR <khiremat@redhat.com>
* georep: Merge Worker and Agent as a single processAravinda VK2019-11-071-19/+9
| | | | | | | | | | | | - libgfchangelog is simplified by removing unnecessary API Class - Merged Agent logic into Worker instead of running Worker and Agent as two separate processes and maintaining RPC between Worker and Agent. - Geo-rep command Pause and Resume will continue without any changes. But Agent functionality also gets paused with that. Updates: #755 Change-Id: Ie2c00fa7dddf21f180f0649e0aaf084d29023c98 Signed-off-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Fix spelling errorsShwetha K Acharya2019-09-241-1/+1
| | | | | | Fixes: bz#1741779 Change-Id: I708b6b7e6c520dee10445528e6f99ba38e141c25 Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
* geo-rep: performance improvement while syncing renames with existing gfidSunny Kumar2019-09-231-2/+11
| | | | | | | | | | | | | | | | | | | | Problem: The bug[1] addresses issue of data inconsistency when handling RENAME with existing destination. This fix requires some performance tuning considering this issue occurs in heavy rename workload. Solution: If distribution count for master volume is one do not verify op's on master and go ahead with rename. The performance improvement with this patch can only be observed if master volume has distribution count one. [1]. https://bugzilla.redhat.com/show_bug.cgi?id=1694820 fixes: bz#1753857 Change-Id: I8e9bcd575e7e35f40f9f78b7961c92dee642f47b Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
* geo-rep: Fix sync hang with tarsshKotresh HR2019-05-131-4/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Geo-rep sync hangs when tarssh is used as sync engine at heavy workload. Analysis and Root cause: It's found out that the tar process was hung. When debugged further, it's found out that stderr buffer of tar process on master was full i.e., 64k. When the buffer was copied to a file from /proc/pid/fd/2, the hang is resolved. This can happen when files picked by tar process to sync doesn't exist on master anymore. If this count increases around 1k, the stderr buffer is filled up. Fix: The tar process is executed using Popen with stderr as PIPE. The final execution is something like below. tar | ssh <args> root@slave tar --overwrite -xf - -C <path> It was waiting on ssh process first using communicate() and then tar. Note that communicate() reads stdout and stderr. So when stderr of tar process is filled up, there is no one to read until untar via ssh is completed. This can't happen and leads to deadlock. Hence we should be waiting on both process parallely, so that stderr is read on both processes. Change-Id: I609c7cc5c07e210c504771115b4d551a2e891adf fixes: bz#1707728 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Fix sync-method configKotresh HR2019-05-091-3/+4
| | | | | | | | | | | | | | | | | | Problem: When 'use_tarssh' is set to true, it exits with successful message but the default 'rsync' was used as sync-engine. The new config 'sync-method' is not allowed to set from cli. Analysis and Fix: The 'use_tarssh' config is deprecated with new config framework and 'sync-method' is the new config to choose sync-method i.e. tarssh or rsync. This patch fixes the 'sync-method' config. The allowed values are tarssh and rsync. Change-Id: I0edb0319cad0455b29e49f2f08a64ce324735e84 fixes: bz#1707686 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Fix rename with existing destination with same gfidSunny Kumar2019-04-261-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Geo-rep fails to sync the rename properly if destination exists. It results in source to be remained on slave causing more number of files on slave. Also heavy rename workload like logrotate caused lot of ESTALE errors Cause: Geo-rep fails to sync rename if destination exists if creation of source file also falls into single batch of changelogs being processed. This is because, after fixing problematic gfids verifying from master, while re-processing original entries, CREATE also was re-processed causing more files on slave and rename to be failed. Solution: Entries need to be removed from retrial list after fixing problematic gfids on slave so that it's not re-created again on slave. Also treat ESTALE as EEXIST so that the error is properly handled verifying the op on master volume. Change-Id: I50cf289e06b997adddff0552bf2466d9201dd1f9 fixes: bz#1694820 Signed-off-by: Kotresh HR <khiremat@redhat.com> Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
* geo-rep: Fix syncing multiple rename of symlinkKotresh HR2019-03-291-9/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Geo-rep fails to sync rename of symlink if it's renamed multiple times if creation and rename happened successively Worker crash at slave: Traceback (most recent call last): File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", in worker res = getattr(self.obj, rmeth)(*in_data[2:]) File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", in entry_ops [ESTALE, EINVAL, EBUSY]) File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", in errno_wrap return call(*arg) File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", in lsetxattr cls.raise_oserr() File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", in raise_oserr raise OSError(errn, os.strerror(errn)) OSError: [Errno 12] Cannot allocate memory Geo-rep Behaviour: 1. SYMLINK doesn't record target path in changelog. So while syncing SYMLINK, readlink is done on master to get target path. 2. Geo-rep will create destination if source is not present while syncing RENAME. Hence while syncing RENAME of SYMLINK, target path is collected from destination. Cause: If symlink is created and renamed multiple times, creation of symlink is ignored, as it's no longer present on master at that path. While symlink is renamed multiple times at master, when syncing first RENAME of SYMLINK, both source and destination is not present, hence target path is not known. In this case, while creating destination directly at slave, regular file attributes were encoded into blob instead of symlink, causing failure in gfid-access translator while decoding blob. Solution: While syncing of RENAME of SYMLINK, when target is not known and when src and destination is not present on the master, don't create destination. Ignore the rename. It's ok to ignore. If it's unliked, it's fine. If it's renamed to something else, it will be synced then. Change-Id: Ibdfa495513b7c05b5370ab0b89c69a6802338d87 fixes: bz#1693648 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* geo-rep : fix rename sync on hybrid crawlSunny Kumar2019-01-171-0/+2
| | | | | | | | | | | | | | Problem: When geo-rep is configured as hybrid crawl directory renames are not synced to the slave. Solution: Rename sync of directory was failing due to incorrect destination path calculation. During check for existence on slave we miscalculated realpath. <host:brickpath/dir>. Change-Id: I23f1ea60e86a917598fe869d5d24f8da654d8a0a fixes: bz#1665826 Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
* geo-rep: Fix syncing of files with non-ascii filenamesKotresh HR2018-12-041-43/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Creation of files/directories with non-ascii names fails to sync to the slave. It crashes with below traceback on slave. ... File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/repce.py", line 118, in worker res = getattr(self.obj, rmeth)(*in_data[2:]) File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/resource.py", line 709, in entry_ops [ESTALE, EINVAL, EBUSY]) File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/syncdutils.py", line 546, in errno_wrap return call(*arg) File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/libcxattr.py", line 83, in lsetxattr cls.raise_oserr() File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/libcxattr.py", line 38, in raise_oserr raise OSError(errn, os.strerror(errn)) OSError: [Errno 12] Cannot allocate memory Cause: The length calculation arguments passed to blob creation was done before encoding. Hence was failing in gfid-access layer. Fix: It appears that the calculating lenght properly fixes this issue. But it will cause issues in other places in 'python2' and not in 'python3'. So encoding and decoding each required string to make geo-rep compatible with both 'python2' and 'python3' is a nightmare and is not fool proof. Hence kept 'python2' code as is with out encode/decode and applied encode/decode only to 'python3' Added non-ascii filename tests to regression fixes: bz#1650893 Change-Id: I35cfaf848e07b1a0b5cb93c01b98b472f08271a6 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Fix traceback with symlink metadata syncKotresh HR2018-11-061-15/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | While syncing metadata, 'os.chmod', 'os.chown', 'os.utime' should be used without de-reference. But python supports only 'os.chown' without de-reference. That's mostly because Linux doesn't support 'chmod' on symlink file itself but it does support 'chown'. So while syncing metadata ops, if it's symlink we should only sync 'chown' and not do 'chmod' and 'utime'. It will lead to tracebacks with errors like EROFS, EPERM, ACCESS, ENOENT. All the three errors (EPERM, ACCESS, ENOENT) were handled except EROFS. But the way it was handled was not fool proof. The operation is tried and failure was handled based on the errors. All the errors with symlink file for 'chown', 'utime' had to be passed to safe errors list of 'errno_wrap'. This patch handles it better by avoiding 'chmod' and 'utime' if it's symlink file. fixes: bz#1646104 Change-Id: Ic354206455cdc7ab2a87d741d81f4efe1f19d77d Signed-off-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Fix issue in gfid-conflict-resolutionKotresh HR2018-10-261-17/+24
| | | | | | | | | | | | | | | | | | | | | | | | | Problem: During gfid-conflict-resolution, geo-rep crashes with 'ValueError: list.remove(x): x not in list' Cause and Analysis: During gfid-conflict-resolution, the entry blob is passed back to master along with additional information to verify it's integrity. If everything looks fine, the entry creation is ignored and is deleted from the original list. But it is crashing during removal of entry from the list saying entry not in list. The reason is that the stat information in the entry blob was modified and sent back to master if present. Fix: Send back the correct stat information for gfid-conflict-resolution. fixes: bz#1642865 Change-Id: I47a6aa60b2a495465aa9314eebcb4085f0b1c4fd Signed-off-by: Kotresh HR <khiremat@redhat.com>
* georep: python2 to python3 compat - syscallsKotresh HR2018-10-081-6/+15
| | | | | | | | | | | | 1. ctypes/syscalls A) arguments is expected to be encoded B) Raw conversion of return value from bytearray into string 2. struct pack/unpack - Raw converstion of string to bytearray 3. basestring -> str Updates: #411 Change-Id: I80f939adcdec0ed0022c87c0b76d057ad5559e5a Signed-off-by: Kotresh HR <khiremat@redhat.com>
* georep: python2 to python3 compatibility-mount writeKotresh HR2018-10-021-3/+9
| | | | | | | | | | python3 expects byte string for os.write. This works for both py2 and py3. Fixed the same for geo-rep mount testing code path. Updates: #411 Change-Id: I2dfedcb0869457707bcca4d2847ef0d52bff1987 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* georep: python3 compatibility (Popen)Kotresh HR2018-09-251-7/+12
| | | | | | | | | | | | | | The file objects for python3 by default is opened in binary mode where as in python2 it's opened as text by default. The geo-rep code parses the output of Popen assuming it as text, hence used the 'universal_newlines' flag which provides backward compatibility for the same. Change-Id: I371a03b6348af9666164cb2e8b93d47475431ad9 Updates: #411 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* georep: Fix python3 compatibility (os.pipe)Kotresh HR2018-09-251-2/+2
| | | | | | | | | | | | | | 'os.pipe' returns pair of file descriptors which are non-inheritable by child processes. But geo-rep uses te inheritable nature of pipe fds to communicate between parent and child processes. Hence wrote a compatiable pipe routine which works well both with python2 and python3 with inheritable nature. Updates: #411 Change-Id: I869d7a52eeecdecf3851d44ed400e69b32a612d9 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* All: run codespell on the code and fix issues.Yaniv Kaul2018-07-221-2/+2
| | | | | | | | | | | | Please review, it's not always just the comments that were fixed. I've had to revert of course all calls to creat() that were changed to create() ... Only compile-tested! Change-Id: I7d02e82d9766e272a7fd9cc68e51901d69e5aab5 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* geo-rep: Fix issues with gfid conflict handlingKotresh HR2018-07-201-23/+34
| | | | | | | | | | | | | | | | | | | | | | | 1. MKDIR/RMDIR is recorded on all bricks. So if one brick succeeds creating it, other bricks should ignore it. But this was not happening. The fix rename of directories in hybrid crawl, was trying to rename the directory to itself and in the process crashing with ENOENT if the directory is removed. 2. If file is created, deleted and a directory is created with same name, it was failing to sync. Again the issue is around the fix for rename of directories in hybrid crawl. Fixed the same. If the same case was done with hardlink present for the file, it was failing. This patch fixes that too. fixes: bz#1598884 Change-Id: I6f3bca44e194e415a3d4de3b9d03cc8976439284 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Fix symlink rename syncing issueKotresh HR2018-07-121-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Geo-rep sometimes fails to sync the rename of symlink if the I/O is as follows 1. touch file1 2. ln -s "./file1" sym_400 3. mv sym_400 renamed_sym_400 4. mkdir sym_400 The file 'renamed_sym_400' failed to sync to slave Cause: Assume there are three distribute subvolume (brick1, brick2, brick3). The changelogs are recorded as follows for above I/O pattern. Note that the MKDIR is recorded on all bricks. 1. brick1: ------- CREATE file1 SYMLINK sym_400 RENAME sym_400 renamed_sym_400 MKDIR sym_400 2. brick2: ------- MKDIR sym_400 3. brick3: ------- MKDIR sym_400 The operations on 'brick1' should be processed sequentially. But since MKDIR is recorded on all the bricks, The brick 'brick2/brick3' processed MKDIR first before 'brick1' causing out of order syncing and created directory sym_400 first. Now 'brick1' processed it's changelog. CREATE file1 -> succeeds SYMLINK sym_400 -> No longer present in master. Ignored RENAME sym_400 renamed_sym_400 While processing RENAME, if source('sym_400') doesn't present, destination('renamed_sym_400') is created. But geo-rep stats the name 'sym_400' to confirm source file's presence. In this race, since source name 'sym_400' is present as directory, it doesn't create destination. Hence RENAME is ignored. Fix: The fix is not rely only on stat of source name during RENAME. It should stat the name and if the name is present, gfid should be same. Only then it can conclude the presence of source. fixes: bz#1600405 Change-Id: I9fbec4f13ca6a182798a7f81b356fe2003aff969 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* core/various: python3 compat, prepare for python2 -> python3Kaleb S. KEITHLEY2018-07-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | see https://review.gluster.org/#/c/19788/, https://review.gluster.org/#/c/19871/, https://review.gluster.org/#/c/19952/, https://review.gluster.org/#/c/20104/, https://review.gluster.org/#/c/20162/, https://review.gluster.org/#/c/20185/, https://review.gluster.org/#/c/20207/, https://review.gluster.org/#/c/20227/, https://review.gluster.org/#/c/20307/, https://review.gluster.org/#/c/20320/, https://review.gluster.org/#/c/20332/, and https://review.gluster.org/#/c/20364/ Fixes glupy.py python2isms, iteritems -> items, and some overlooked print() in georep/peer_mountbroker.in Note: Fedora packaging guidelines and SUSE rpmlint require explicit shebangs; popular practices like #!/usr/bin/env python and #!/usr/bin/python are not allowed; they must be #!/usr/bin/python2 or #!/usr/bin/python3 Note: Selected small fixes from 2to3 utility. Specifically apply, basestring, funcattrs, has_key, idioms, map, numliterals, raise, set_literal, types, urllib, and zip have already been applied. Also version agnostic imports for urllib, cpickle, socketserver, _thread, queue, etc., suggested by Aravinda in https://review.gluster.org/#/c/19767/1 Note: these 2to3 fixes report no changes are necessary: asserts, buffer, exec, execfile, exitfunc, filter, getcwdu, imports2, input, intern, itertools, metaclass, methodattrs, ne, next, nonzero, operator, paren, raw_input, reduce, reload, renames, repr, standarderror, sys_exc, throw, tuple_params, xreadlines. Change-Id: Idda031c1ec975417c79323aea33e7b694e752b2a updates: #411 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
* geo-rep: Fix geo-rep for older versions of unshareKotresh HR2018-06-221-3/+14
| | | | | | | | | | | | | | Geo-rep mounts are private to worker. It uses mount namespace using unshare command to achieve the same. Well, the unshare command has to support '--propagation' option. So geo-rep breaks on the systems with older unshare version. The patch makes it fall back to lazy umount behaviour if the unshare does not support propagation option. fixes: bz#1589782 Change-Id: Ia614f068aede288d63ac62fea4461b1865066054 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Fix syncing of symlinkKotresh HR2018-04-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Problem: If symlink is created on master pointing to current directory (e.g symlink -> ".") with non root uid or gid, geo-rep worker crashes with ENOTSUP. Cause: Geo-rep creates the symlink on slave and fixes the uid and gid using chown cmd. os.chown dereferences the symlink which is pointing to ".gfid" which is not supported. Note that geo-rep operates on aux-gfid-mount (e.g. "/mnt/.gfid/<gfid-of-symlink-file>"). Solution: The uid or gid change is acutally on symlink file. So use os.lchown, i.e, don't deference. BUG: 1567209 Change-Id: I63575fc589d71f987bef1d350c030987738c78ad updates: bz#1567209 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Remove lazy umount and use mount namespacesKotresh HR2018-02-221-8/+11
| | | | | | | | | | | | | Lazy umounting the master volume by worker causes issues with rsync's usage of getcwd. Henc removing the lazy umount and using private mount namespace for the same. On the slave, the lazy umount is retained as we can't use private namespace in non root geo-rep setup. Change-Id: I403375c02cb3cc7d257a5f72bbdb5118b4c8779a BUG: 1546129 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* tests: Enable geo-rep test casesKotresh HR2018-01-051-3/+7
| | | | | | | | | | | | | | | | | This patch re-enables the geo-rep test cases. Along with it does following optimizations. 1. Use EXPECT_WITHIN instead of sleep 2. Clean up geo-rep ssh key after test 3. Changes to gverify.sh and S56glusterd-geo-rep-create-post.sh to use the given ssh identity file for geo-rep create 4. Make gluster-command-dir configurable and introduce slave-gluster-command-dir which points the parent directory of gluster binaries in master and slave respectively. Change-Id: Ia7696278d9dd3ba04224dcd7c3564088ca970b04 BUG: 1480491 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Log message improvementsAravinda VK2017-12-281-2/+2
| | | | | | BUG: 1529480 Change-Id: If4775ed9886990c0e1bcf4e44c7dfef95cc4f0c3 Signed-off-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Refactoring Config and Arguments parsingAravinda VK2017-11-151-784/+605
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Fixed Python pep8 issues - Removed dead code - Rewritten configuration management - Rewritten Arguments/subcommands handling - Added Args upgrade to accommodate all these changes without changing glusterd code - use of md5 removed, which was used to hash the brick path for workdir Both Master and Slave nodes will have subdir for session in the format "<mastervol>_<primary_slave_host>_<slavevol> $GLUSTER_LOGDIR/geo-replication/<mastervol>_<primary_slave_host>_<slavevol> $GLUSTER_LOGDIR/geo-replication-slaves/<mastervol>_<primary_slave_host>_<slavevol> Log file paths renamed since session info is available with directory name itself. $LOG_DIR_MASTER/ - gsyncd.log - Gsyncd, Worker monitor logs - mnt-<brick-path>.log - Aux mount logs, mounted by each worker - changes-<brick-path>.log - Changelog related logs(One per brick) $LOG_DIR_SLAVE/ - gsyncd.log - Slave Gsyncd logs - mnt-<master-node>-<master-brick-path>.log - Aux mount logs, mounted for each connection from master-node:master-brick - mnt-mbr-<master-node>-<master-brick-path>.log - Same as above, but mountbroker setup Fixes: #73 Change-Id: I2ec2a21e4e2a92fd92899d026e8543725276f021 Signed-off-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Fix data sync issue during hardlink, renameKotresh HR2017-11-141-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: The data is not getting synced if master witnessed IO as below. 1. echo "test_data" > f1 2. ln f1 f2 3. mv f2 f3 4. unlink f1 On master, 'f3' exists with data "test_data" but on slave, only f3 exists with zero byte file without backend gfid link. Cause: On master, since 'f2' no longer exists, the hardlink is skipped during processing. Later, on trying to sync rename, since source ('f2') doesn't exist, dst ('f3') is created with same gfid. But in this use case, it succeeds but backend gfid would not have linked as 'f1' exists with the same gfid. So, rsync would fail with ENOENT as backend gfid is not linked with 'f3' and 'f1' is unlinked. Fix: On processing rename, if src doesn't exist on slave, don't blindly create dst with same gfid. The gfid needs to be checked, if it exists, hardlink needs to be created instead of mknod. Thanks Aravinda for helping in RCA :) Change-Id: I5af4f99798ed1bcb297598a4bc796b701d1e0130 Signed-off-by: Kotresh HR <khiremat@redhat.com> BUG: 1512483 Reporter: dimitri.ars@gmail.com
* geo-rep: Fix rename of directory in hybrid crawlKotresh HR2017-11-101-155/+34
| | | | | | | | | | | | | In hybrid crawl, renames and unlink can't be synced but directory renames can be detected. While syncing the directory on slave, if the gfid already exists, it should be rename. Hence if directory gfid already exists, rename it. Change-Id: Ibf9f99e76a3e02795a3c2befd8cac48a5c365bb6 BUG: 1499566 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Add ENODATA to retry list on gfid getxattrKotresh HR2017-10-111-8/+5
| | | | | | | | | | | | | During xsync crawl, worker occasionally crashed with ENODATA on getting gfid from backend. This is not persistent and is transient. Worker restart invovles re-processing of few entries in changenlogs. So adding ENODATA to retry list to avoid worker restart. Change-Id: Ib78d1e925c0a83c78746f28f7c79792a327dfd3e BUG: 1499391 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Add ENOTSUP error to retry listKotresh HR2017-10-101-1/+3
| | | | | | | | | | os.listdir gives ENOTSUP on gfid path occasionally which is not persistant. Adding it to retry list to avoid worker to crash if it's transient error. Change-Id: Ic795dd1f02a27c9e5d901e20722ee32451838feb BUG: 1499180 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Fix syncing of hardlink of symlinkKotresh HR2017-08-241-58/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: If there is a hardlink to a symlink on master and if the symlink file is deleted on master, geo-rep fails to sync the hardlink. Typical Usecase: It's easily hit with rsnapshot use case where it uses hardlinks. Example Reproducer: Setup geo-replication between master and slave volume and in master mount point, do the following. 1. mkdir /tmp/symlinkbug 2. ln -f -s /does/not/exist /tmp/symlinkbug/a_symlink 3. rsync -a /tmp/symlinkbug ./ 4. cp -al symlinkbug symlinkbug.0 5. ln -f -s /does/not/exist2 /tmp/symlinkbug/a_symlink 6. rsync -a /tmp/symlinkbug ./ 7. cp -al symlinkbug symlinkbug.1 Cause: If the source was not present while syncing hardlink, it was always packing the blob as regular file. Fix: If the source was not present while syncing hardlink, pack the blob based on the mode. Change-Id: Iaa12d6f99de47b18e0650e7c4eb455f23f8390f2 BUG: 1432046 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reported-by: Christian Lohmaier <lohmaier+rhbz@gmail.com> Reviewed-on: https://review.gluster.org/18011 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Fix syncing of self healed hardlinksKotresh HR2017-07-281-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: In a distribute replicate volume, if the hardlinks are created when a subvolume is down, it gets healed from other subvolume when it comes up. If this subvolume becomes ACTIVE in geo-rep there are chances that those hardlinks won't be synced to slave. Cause: AFR can't detect hardlinks during self heal. It just create those files using mknod and the same is recorded in changelog. Geo-rep processes these mknod and ignores it as it finds gfid already on slave. Solution: Geo-rep should process the mknod as link if the gfid already exists on slave. Change-Id: I2f721b462b38a74c60e1df261662db4b99b32057 BUG: 1475308 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: https://review.gluster.org/17880 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Handle possible entry failures gracefullyKotresh HR2017-07-211-12/+15
| | | | | | | | | | | Updates: #246 Change-Id: If0ce83fe8dd3068bfb671f398b2e82ac831288d0 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: https://review.gluster.org/17577 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Structured log supportAravinda VK2017-06-201-39/+53
| | | | | | | | | | | | | Changed all log messages to structured log format Change-Id: Idae25f8b4ad0bbae38f4362cbda7bbf51ce7607b Updates: #240 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: https://review.gluster.org/17551 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Fix meta data sync on symlinkKotresh HR2017-06-041-11/+37
| | | | | | | | | | | | | | | | | | chmod doesn't support 'no dereference' option. It always deference the symlink. But 'chown' does support metadata changes on symlink itself, which was not taken care while syncing. This patch fixes the same. Change-Id: Ic9985f4e39d15b5a9deb379841bcfb2c263d3e6c BUG: 1455559 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: https://review.gluster.org/17389 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* geo-rep: Rsync tunables for performance improvementsAravinda VK2017-05-231-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Flag: --ignore-missing-args This Rsync flag reduces sync failures if the source file is unlinked but present in --files-from list. This reduces Rsync retries in Geo-rep and improves the performance Flag: --existing Rsync in Geo-rep never creates target files. Using RPC Geo-rep creates entry in Slave and rsync --inplace used to prevent creating temporary file and rename.(To avoid different GFID in Slave). If the entry is missing in Slave then Geo-rep Rsync gets Permission denied errors when it tries to create file with name as GFID inside .gfid dir.(Geo-rep rsync syncs data using GFIDS with aux-gfid-mount) To disable these flags, gluster volume geo-replication <session> config \ rsync-opt-ignore-missing-args false gluster volume geo-replication <session> config \ rsync-opt-existing false Thanks Kotresh for finding these awesome tunables. BUG: 1400924 Change-Id: I6a84fb86a589bf6edc8dfd1086456a84b05a64fc Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: https://review.gluster.org/16010 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Fix mount cleanupKotresh HR2017-04-271-1/+1
| | | | | | | | | | | | | | On corner cases, mount cleanup might cause worker crash. Fixing the same. Change-Id: I38c0af51d10673765cdb37bc5b17bb37efd043b8 BUG: 1433506 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: https://review.gluster.org/17015 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Remove unlink fop during rmdirKotresh HR2017-04-171-10/+10
| | | | | | | | | | | | | | | | Even though it is known to be 'RMDIR', os.unlink was being tried and os.rmdir is issued upon receiving EISDIR. It's unnecessary unlink call for 'RMDIR'. Fixed the same. Change-Id: I8dbb680ee2c7f0c32b7799b1ed5351b3621cb42a BUG: 1441106 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: https://review.gluster.org/17041 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Fix EBUSY tracebackKotresh HR2017-04-101-1/+1
| | | | | | | | | | | | | | EBUSY was added to retry list of errno_wrap without importing. Fixing the same. Change-Id: Ide81a9ccc9b948a96265b6890da078b722b45d51 BUG: 1434018 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: https://review.gluster.org/17011 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Improve worker log messagesKotresh HR2017-04-071-0/+8
| | | | | | | | | | | | | | | | | | | Monitor process expects worker to establish SSH Tunnel to slave node and mount master volume locally with in 60 secs and acknowledge monitor process by closing feedback fd. If something goes wrong and worker does not close feedback fd with in 60 secs, monitor kills the worker. But there was no clue in log message about the actual issue. This patch adds log and indicates whether the worker is hung during SSH or master mount. Change-Id: Id08a12fa6f3bba1d4fe8036728dbc290e6c14c8c BUG: 1261689 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: https://review.gluster.org/16997 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Retry on EBUSYKotresh HR2017-04-071-8/+9
| | | | | | | | | | | | | | | Do not crash on EBUSY error. Add EBUSY retry errno list. Crash only if the error persists even after max retries. Change-Id: Ia067ccc6547731f28f2a315d400705e616cbf662 BUG: 1434018 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: https://review.gluster.org/16924 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* geo-rep: Optionally allow access to mountsKotresh HR2017-03-191-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | In order to improve debuggability, it is important to have access to geo-rep master and slave mounts. With the default behaviour, geo-rep lazy unmounts the mounts after changing the current working directory into the mount point. It also cleans up the mount points. So only geo-rep worker has the access and it becomes impossible to take the client profile info and do any other client statck analysis. Hence the following new config is being introduced to allow access to mounts. gluster vol geo-rep <mastervol> <slavehost>::<slavevol> \ config access_mount true The default value of 'access_mount' is false. Change-Id: I53dce4ea86a6ffc979c82f9330e8954327180ca3 BUG: 1433506 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: https://review.gluster.org/16912 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* geo-rep: Separate slave mount logs for each connectionKotresh HR2017-01-181-0/+6
| | | | | | | | | | | | | | | | | | | | | Geo-rep worker mounts the slave volume on the slave node. If multiple worker connects to same slave node, all workers share the same mount log file. This is very difficult to debug as logs are cluttered from different mounts. Hence creating separate mount log file for each connection from worker. Each connection from worker is identified uniquely using 'mastervol uuid', 'master host', 'master brickpath', 'salve vol'. The log file name will be combination of the above. Change-Id: I67871dc8e8ea5864e2ad55e2a82063be0138bf0c BUG: 1412689 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/16384 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Handle directory sync failure as hard errorKotresh HR2017-01-131-6/+17
| | | | | | | | | | | | | | | | | | If directory creation is failed, return immediately before further processing. Allowing it to further process will fail the entire directory tree syncing to slave. Hence master will log and raise exception if it's directory failure. Earlier, master used to log the failure and proceed. Change-Id: Iba2a8b5d3d0092e7a9c8a3c2cdf9e6e29c73ddf0 BUG: 1411607 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/16364 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Fix log-rsync-performance config issueAravinda VK2016-12-141-4/+5
| | | | | | | | | | | | | | | | If log-rsync-performance config is not set, gconf.get_realtime will return None, Added default value as False if config file doesn't have this option set. BUG: 1393678 Change-Id: I89016ab480a16179db59913d635d8553beb7e14f Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/16102 Smoke: Gluster Build System <jenkins@build.gluster.org> Tested-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Kotresh HR <khiremat@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* geo-rep: Do not restart workers when log-rsync-performance config changeAravinda VK2016-12-071-2/+4
| | | | | | | | | | | | | | | | | | Geo-rep restarts workers when any of the configurations changed. We don't need to restart workers if tunables like log-rsync-performance is modified. With this patch, Geo-rep workers will get new "log-rsync-performance" config automatically without restart. BUG: 1393678 Change-Id: I40ec253892ea7e70c727fa5d3c540a11e891897b Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/15816 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kotresh HR <khiremat@redhat.com>
* geo-rep/eventsapi: Add Master node information in Geo-rep EventsAravinda VK2016-12-021-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Added Master node information to GEOREP_ACTIVE, GEOREP_PASSIVE, GEOREP_FAULTY and GEOREP_CHECKPOINT_COMPLETED events. EVENT_GEOREP_ACTIVE(master_node and master_node_id are new fields) { "nodeid": NODEID, "ts": TIMESTAMP, "event": "GEOREP_ACTIVE", "message": { "master_volume": MASTER_VOLUME_NAME, "master_node": MASTER_NODE, "master_node_id": MASTER_NODE_ID, "slave_host": SLAVE_HOST, "slave_volume": SLAVE_VOLUME, "brick_path": BRICK_PATH } } EVENT_GEOREP_PASSIVE(master_node and master_node_id are new fields) { "nodeid": NODEID, "ts": TIMESTAMP, "event": "GEOREP_PASSIVE", "message": { "master_volume": MASTER_VOLUME_NAME, "master_node": MASTER_NODE, "master_node_id": MASTER_NODE_ID, "slave_host": SLAVE_HOST, "slave_volume": SLAVE_VOLUME, "brick_path": BRICK_PATH } } EVENT_GEOREP_FAULTY(master_node and master_node_id are new fields) { "nodeid": NODEID, "ts": TIMESTAMP, "event": "GEOREP_FAULTY", "message": { "master_volume": MASTER_VOLUME_NAME, "master_node": MASTER_NODE, "master_node_id": MASTER_NODE_ID, "current_slave_host": CURRENT_SLAVE_HOST, "slave_host": SLAVE_HOST, "slave_volume": SLAVE_VOLUME, "brick_path": BRICK_PATH } } EVENT_GEOREP_CHECKPOINT_COMPLETED(master_node and master_node_id are new fields) { "nodeid": NODEID, "ts": TIMESTAMP, "event": "GEOREP_CHECKPOINT_COMPLETED", "message": { "master_volume": MASTER_VOLUME_NAME, "master_node": MASTER_NODE, "master_node_id": MASTER_NODE_ID, "slave_host": SLAVE_HOST, "slave_volume": SLAVE_VOLUME, "brick_path": BRICK_PATH, "checkpoint_time": CHECKPOINT_TIME, "checkpoint_completion_time": CHECKPOINT_COMPLETION_TIME } } BUG: 1395660 Change-Id: Ic91af52fa248c8e982e93a06be861dfd69689f34 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/15858 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Handle ENOENT during unlinkAravinda VK2016-11-221-2/+4
| | | | | | | | | | | | | | Do not raise traceback if a file/dir not exists during unlink or rmdir BUG: 1396062 Change-Id: Idd43ca1fa6ae6056c3cd493f0e2f151880a3968c Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/15868 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* geo-rep/eventsapi: Additional EventsAravinda VK2016-10-181-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Added following events EVENT_GEOREP_ACTIVE { "nodeid": NODEID, "ts": TIMESTAMP, "event": "GEOREP_ACTIVE", "message": { "master_volume": MASTER_VOLUME_NAME, "slave_host": SLAVE_HOST, "slave_volume": SLAVE_VOLUME, "brick_path": BRICK_PATH } } EVENT_GEOREP_PASSIVE { "nodeid": NODEID, "ts": TIMESTAMP, "event": "GEOREP_PASSIVE", "message": { "master_volume": MASTER_VOLUME_NAME, "slave_host": SLAVE_HOST, "slave_volume": SLAVE_VOLUME, "brick_path": BRICK_PATH } } EVENT_GEOREP_CHECKPOINT_COMPLETED { "nodeid": NODEID, "ts": TIMESTAMP, "event": "GEOREP_ACTIVE", "message": { "master_volume": MASTER_VOLUME_NAME, "slave_host": SLAVE_HOST, "slave_volume": SLAVE_VOLUME, "brick_path": BRICK_PATH, "checkpoint_time": CHECKPOINT_TIME, "checkpoint_completion_time": CHECKPOINT_COMPLETION_TIME } } BUG: 1379330 Change-Id: I90716175868c59dd65c8d202e73e0ede90347b6a Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/15630 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kotresh HR <khiremat@redhat.com> Tested-by: Kotresh HR <khiremat@redhat.com>