summaryrefslogtreecommitdiffstats
path: root/geo-replication/syncdaemon/syncdutils.py
Commit message (Collapse)AuthorAgeFilesLines
* geo-rep: Fix string comparisonKotresh HR2020-08-171-1/+1
| | | | | | Fixes: #1438 Change-Id: If9f1c2e89e504512bcf77608fb4a4fedb19f0399 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* geo-replication: Fix IPv6 parsingAravinda Vishwanathapura2020-07-121-3/+40
| | | | | | | | | | | Brick paths in Volinfo used `:` as delimiter, Geo-rep uses split based on `:` char. This will go wrong with IPv6. This patch handles the IPv6 case and handles the split properly. Fixes: #1366 Change-Id: I25e88d693744381c0ccf3c1dbf1541b84be2499d Signed-off-by: Aravinda Vishwanathapura <aravinda@kadalu.io>
* geo-rep: Fix corner case in rename on mkdir during hybrid crawlSunny Kumar2020-07-071-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: The issue is being hit during hybrid mode while handling rename on slave. In this special case the rename is recorded as mkdir and geo-rep process it by resolving the path form backend. While resolving the backend path during this special handling one corner case is not considered. <snip> Traceback (most recent call last):   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 118, in worker     res = getattr(self.obj, rmeth)(*in_data[2:])   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 588, in entry_ops     src_entry = get_slv_dir_path(slv_host, slv_volume, gfid)   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 710, in get_slv_dir_path     dir_entry = os.path.join(pfx, pargfid, basename)   File "/usr/lib64/python2.7/posixpath.py", line 75, in join     if b.startswith('/'): AttributeError: 'int' object has no attribute 'startswith' In pyhthon3: Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/usr/lib64/python3.8/posixpath.py", line 90, in join genericpath._check_arg_types('join', a, *p) File "/usr/lib64/python3.8/genericpath.py", line 152, in _check_arg_types raise TypeError(f'{funcname}() argument must be str, bytes, or ' TypeError: join() argument must be str, bytes, or os.PathLike object, not 'int' </snip> Change-Id: I8b926899c60ad8c4ffc886d57028ba70fd21e332 Fixes: #1250 Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
* [geo-rep] Merging multiple import into single importkshithijiyer2020-04-081-2/+2
| | | | | | | | | | | | | | | | | | Geo-replication has a large number of repeated imports as shown below: ``` from syncdutils import set_term_handler, finalize, lf from syncdutils import log_raise_exception, FreeObject, escape ``` There imports can be clubbed together as shown below: `` from syncdutils import (set_term_handler, finalize, lf, log_raise_exception, FreeObject, escape) ``` Fixes: #1105 Change-Id: I59a48dd57a70fc851d93150b85e736ce41e8b793 Signed-off-by: kshithijiyer <kshithij.ki@gmail.com>
* geo-rep: descriptive message when worker crashes due to EIOSunny Kumar2020-03-171-1/+12
| | | | | | | | | | | With this patch now you can notice log if it is due to EIO: [2020-03-16 16:24:48.293837] E [syncdutils(worker /bricks/brick1/mbr3):348:log_raise_exception] <top>: Getting "Input/Output error" is most likely due to a. Brick is down or b. Split brain issue. [2020-03-16 16:24:48.293915] E [syncdutils(worker /bricks/brick1/mbr3):352:log_raise_exception] <top>: This is expected as per design to keep the consistency of the file system. Once the above issue is resolved geo-rep would automatically proceed further. Change-Id: Ie33f2440bc96089731ce12afa8dab91d9550a7ca Fixes: #1104 Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
* geo-rep: Fix for "Transport End Point not connected" issueHarpreet Kaur2020-01-311-3/+17
| | | | | | | | | | | | | | | | | | | | | | problem: Geo-rep gsyncd process mounts the master and slave volume on master nodes and slave nodes respectively and starts the sync. But it doesn't wait for the mount to be in ready state to accept I/O. The gluster mount is considered to be ready when all the distribute sub-volumes is up. If the all the distribute subvolumes are not up, it can cause ENOTCONN error, when lookup on file comes and file is on the subvol that is down. solution: Added a Virtual Xattr "dht.subvol.status" which returns "1" if all subvols are up and "0" if all subvols are not up. Geo-rep then uses this virtual xattr after a fresh mount, to check whether all subvols are up or not and then starts the I/O. fixes: bz#1664335 Change-Id: If3ad01d728b1372da7c08ccbe75a45bdc1ab2a91 Signed-off-by: Harpreet Kaur <hlalwani@redhat.com> Signed-off-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Improve debuggingKotresh HR2019-11-131-1/+1
| | | | | | | | | | | | | | | | When gsyncd failed with tracebacks during start, it prints only exception object but not the error string. This patch adds the error string as well. Earlier: "failed with ImportError" Now: "failed with ImportError: No module named _io." fixes: bz#1771895 Change-Id: I0d772a250d4c2010a0c35053aa7b165b71f8434e Signed-off-by: Kotresh HR <khiremat@redhat.com>
* georep: Merge Worker and Agent as a single processAravinda VK2019-11-071-2/+0
| | | | | | | | | | | | - libgfchangelog is simplified by removing unnecessary API Class - Merged Agent logic into Worker instead of running Worker and Agent as two separate processes and maintaining RPC between Worker and Agent. - Geo-rep command Pause and Resume will continue without any changes. But Agent functionality also gets paused with that. Updates: #755 Change-Id: Ie2c00fa7dddf21f180f0649e0aaf084d29023c98 Signed-off-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Fix Permission denied traceback on non root setupKotresh HR2019-10-211-9/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: While syncing rename of directory in hybrid crawl, geo-rep crashes as below. Traceback (most recent call last): File "/usr/local/libexec/glusterfs/python/syncdaemon/repce.py", line 118, in worker res = getattr(self.obj, rmeth)(*in_data[2:]) File "/usr/local/libexec/glusterfs/python/syncdaemon/resource.py", line 588, in entry_ops src_entry = get_slv_dir_path(slv_host, slv_volume, gfid) File "/usr/local/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 687, in get_slv_dir_path [ENOENT], [ESTALE]) File "/usr/local/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 546, in errno_wrap return call(*arg) PermissionError: [Errno 13] Permission denied: '/bricks/brick1/b1/.glusterfs/8e/c0/8ec0fcd4-d50f-4a6e-b473-a7943ab66640' Cause: Conversion of gfid to path for a directory uses readlink on backend .glusterfs gfid path. But this fails for non root user with permission denied. Fix: Use gfid2path interface to get the path from gfid Change-Id: I9d40c713a1b32cea95144cbc0f384ada82972222 fixes: bz#1763439 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: performance improvement while syncing renames with existing gfidSunny Kumar2019-09-231-0/+11
| | | | | | | | | | | | | | | | | | | | Problem: The bug[1] addresses issue of data inconsistency when handling RENAME with existing destination. This fix requires some performance tuning considering this issue occurs in heavy rename workload. Solution: If distribution count for master volume is one do not verify op's on master and go ahead with rename. The performance improvement with this patch can only be observed if master volume has distribution count one. [1]. https://bugzilla.redhat.com/show_bug.cgi?id=1694820 fixes: bz#1753857 Change-Id: I8e9bcd575e7e35f40f9f78b7961c92dee642f47b Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
* geo-rep: Structured logging new formatAravinda VK2019-08-211-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | Structured logging format changed in the below patch for better readability and parsing. This patch changes the Geo-replication logs to that format. https://review.gluster.org/#/c/glusterfs/+/22987/ Before: ``` [2019-08-21 04:23:01.13672] I [monitor(monitor):157:monitor] Monitor: \ starting gsyncd worker<TAB>brick=/bricks/b1<TAB>slave_node=f29.sonne ``` After: ``` [2019-08-21 04:23:01.13672] I [monitor(monitor):157:monitor] Monitor: \ starting gsyncd worker [{brick=/bricks/b1}, {slave_node=f29.sonne}] ``` Change-Id: I65ec69a2869e2febeedc558f88620399e4a1a6a4 Updates: #657 Signed-off-by: Aravinda VK <avishwan@redhat.com>
* geo-rep : fix gluster command path for non-root sessionSunny Kumar2019-06-281-3/+9
| | | | | | | | | | | | | | | | | | | | | Problem: gluster command not found. Cause: In Volinfo class we issue command 'gluster vol info' to get information about volume like getting brick_root to perform various operation. When geo-rep session is configured for non-root user Volinfo class fails to issue gluster command due to unavailability of gluster binary path for non-root user. Solution: Use config value 'slave-gluster-command-dir'/'gluster-command-dir' to get path for gluster command based on caller. fixes: bz#1722740 Change-Id: I4ec46373da01f5d00ecd160c4e8c6239da8b3859 Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
* eventsapi: Fix Python3 compatibility issuesAravinda VK2019-02-251-5/+5
| | | | | | | | | | | - Fixed Relative import and non-package import related issues. - socketserver import issues fix - Renamed installed directory name to `gfevents` from `events`(To avoid any issues with other global libs) Fixes: bz#1679406 Change-Id: I3dc38bc92b23387a6dfbcc0ab8283178235bf756 Signed-off-by: Aravinda VK <avishwan@redhat.com>
* geo-rep : fix rename sync on hybrid crawlSunny Kumar2019-01-171-13/+9
| | | | | | | | | | | | | | Problem: When geo-rep is configured as hybrid crawl directory renames are not synced to the slave. Solution: Rename sync of directory was failing due to incorrect destination path calculation. During check for existence on slave we miscalculated realpath. <host:brickpath/dir>. Change-Id: I23f1ea60e86a917598fe869d5d24f8da654d8a0a fixes: bz#1665826 Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
* georep: python2 to python3 compat - schedulerKotresh HR2018-10-301-1/+1
| | | | | | | | | 1. scheduler - Popen 2. syncdutils - corner case on failure fixes: bz#1643932 Change-Id: I65af97a244a8790e976acedc2728db6ebbf2ae10 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* georep: python2 to python3 compat - syscallsKotresh HR2018-10-081-12/+0
| | | | | | | | | | | | 1. ctypes/syscalls A) arguments is expected to be encoded B) Raw conversion of return value from bytearray into string 2. struct pack/unpack - Raw converstion of string to bytearray 3. basestring -> str Updates: #411 Change-Id: I80f939adcdec0ed0022c87c0b76d057ad5559e5a Signed-off-by: Kotresh HR <khiremat@redhat.com>
* georep: python2 and python3 compat - bytes-strKotresh HR2018-10-021-3/+5
| | | | | | | | | 1. Fix fdopen used for pid file 2. Fix sha256 checksum calculation Updates: #411 Change-Id: Ic173d104a73822c29aca260ba6de872cd8d23f86 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* georep: python3 compatibility (Popen)Kotresh HR2018-09-251-4/+7
| | | | | | | | | | | | | | The file objects for python3 by default is opened in binary mode where as in python2 it's opened as text by default. The geo-rep code parses the output of Popen assuming it as text, hence used the 'universal_newlines' flag which provides backward compatibility for the same. Change-Id: I371a03b6348af9666164cb2e8b93d47475431ad9 Updates: #411 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* georep: Fix python3 compatibility (os.pipe)Kotresh HR2018-09-251-0/+12
| | | | | | | | | | | | | | 'os.pipe' returns pair of file descriptors which are non-inheritable by child processes. But geo-rep uses te inheritable nature of pipe fds to communicate between parent and child processes. Hence wrote a compatiable pipe routine which works well both with python2 and python3 with inheritable nature. Updates: #411 Change-Id: I869d7a52eeecdecf3851d44ed400e69b32a612d9 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* All: run codespell on the code and fix issues.Yaniv Kaul2018-07-221-1/+1
| | | | | | | | | | | | Please review, it's not always just the comments that were fixed. I've had to revert of course all calls to creat() that were changed to create() ... Only compile-tested! Change-Id: I7d02e82d9766e272a7fd9cc68e51901d69e5aab5 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* geo-rep: Fix issues with gfid conflict handlingKotresh HR2018-07-201-0/+35
| | | | | | | | | | | | | | | | | | | | | | | 1. MKDIR/RMDIR is recorded on all bricks. So if one brick succeeds creating it, other bricks should ignore it. But this was not happening. The fix rename of directories in hybrid crawl, was trying to rename the directory to itself and in the process crashing with ENOENT if the directory is removed. 2. If file is created, deleted and a directory is created with same name, it was failing to sync. Again the issue is around the fix for rename of directories in hybrid crawl. Fixed the same. If the same case was done with hardlink present for the file, it was failing. This patch fixes that too. fixes: bz#1598884 Change-Id: I6f3bca44e194e415a3d4de3b9d03cc8976439284 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Fix geo-rep for older versions of unshareKotresh HR2018-06-221-0/+17
| | | | | | | | | | | | | | Geo-rep mounts are private to worker. It uses mount namespace using unshare command to achieve the same. Well, the unshare command has to support '--propagation' option. So geo-rep breaks on the systems with older unshare version. The patch makes it fall back to lazy umount behaviour if the unshare does not support propagation option. fixes: bz#1589782 Change-Id: Ia614f068aede288d63ac62fea4461b1865066054 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* core/various: python3 compat, prepare for python2 -> python3Kaleb S. KEITHLEY2018-06-041-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | see https://review.gluster.org/#/c/19788/, https://review.gluster.org/#/c/19871/, and https://review.gluster.org/#/c/19952/ This patch adds version agnostic imports for urllib, cpickle, socketserver, _thread, queue, etc., suggested by Aravinda in https://review.gluster.org/#/c/19767/1 Note: Fedora packaging guidelines require explicit shebangs, so popular practices like #!/usr/bin/env python and #!/usr/bin/python are not allowed; they must be #!/usr/bin/python2 or #!/usr/bin/python3 Note: Selected small fixes from 2to3 utility. Specifically apply, basestring, funcattrs, idioms, numliterals, set_literal, types, urllib, and zip have already been applied. Note: these 2to3 fixes report no changes are necessary: exec, execfile, exitfunc, filter, getcwdu, intern, itertools, metaclass, methodattrs, ne, next, nonzero, operator, paren, raw_input, reduce, reload, renames, repr, standarderror, sys_exc, throw, tuple_params, xreadlines. Change-Id: I8d393064a1837874d8b4bc87c8ce05c679664642 updates: #411 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
* geo-rep: Remove lazy umount and use mount namespacesKotresh HR2018-02-221-6/+8
| | | | | | | | | | | | | Lazy umounting the master volume by worker causes issues with rsync's usage of getcwd. Henc removing the lazy umount and using private mount namespace for the same. On the slave, the lazy umount is retained as we can't use private namespace in non root geo-rep setup. Change-Id: I403375c02cb3cc7d257a5f72bbdb5118b4c8779a BUG: 1546129 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Support for using Volinfo from Conf fileAravinda VK2018-01-231-0/+88
| | | | | | | | | | | | | | | | | | | | | | | | | | | Once Geo-replication is started, it runs Gluster commands to get Volume info from Master and Slave. With this patch, Georep can get Volume info from Conf file if `--use-gconf-volinfo` argument is specified to monitor Create a config(Or add to the config if exists) with following fields [vars] master-bricks=NODEID:HOSTNAME:PATH,.. slave-bricks=NODEID:HOSTNAME,.. master-volume-id= slave-volume-id= master-replica-count= master-disperse_count= Note: Exising Geo-replication is not affected since this is activated only when `--use-gconf-volinfo` is passed while spawning `gsyncd monitor` Tiering support is not yet added since Tiering + Glusterd2 is still under discussion. Fixes: #396 Change-Id: I281baccbad03686c00f6488a8511dd6db0edc57a Signed-off-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Log message improvementsAravinda VK2017-12-281-1/+1
| | | | | | BUG: 1529480 Change-Id: If4775ed9886990c0e1bcf4e44c7dfef95cc4f0c3 Signed-off-by: Aravinda VK <avishwan@redhat.com>
* fips/geo-rep: Replace MD5 with SHA256Kotresh HR2017-12-221-6/+14
| | | | | | | | | | | | | | | | | | MD5 is not fips compliant. Hence replacing with SHA256. NOTE: The hash is used to form the ctl_path for the ssh connection. The length of ctl_path for ssh connection should not be > 108. ssh fails with ctl_path too long if it is so. But when rsync is piped to ssh, it is not taking > 90. rsync is failing with error number 12. Hence using first 32 bytes of hash. Hash collision doesn't matter as only one sock file is created per directory. Change-Id: I58aeb32a80b5422f6ac0188cf33fbecccbf08ae7 Updates: #230 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Refactoring Config and Arguments parsingAravinda VK2017-11-151-71/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Fixed Python pep8 issues - Removed dead code - Rewritten configuration management - Rewritten Arguments/subcommands handling - Added Args upgrade to accommodate all these changes without changing glusterd code - use of md5 removed, which was used to hash the brick path for workdir Both Master and Slave nodes will have subdir for session in the format "<mastervol>_<primary_slave_host>_<slavevol> $GLUSTER_LOGDIR/geo-replication/<mastervol>_<primary_slave_host>_<slavevol> $GLUSTER_LOGDIR/geo-replication-slaves/<mastervol>_<primary_slave_host>_<slavevol> Log file paths renamed since session info is available with directory name itself. $LOG_DIR_MASTER/ - gsyncd.log - Gsyncd, Worker monitor logs - mnt-<brick-path>.log - Aux mount logs, mounted by each worker - changes-<brick-path>.log - Changelog related logs(One per brick) $LOG_DIR_SLAVE/ - gsyncd.log - Slave Gsyncd logs - mnt-<master-node>-<master-brick-path>.log - Aux mount logs, mounted for each connection from master-node:master-brick - mnt-mbr-<master-node>-<master-brick-path>.log - Same as above, but mountbroker setup Fixes: #73 Change-Id: I2ec2a21e4e2a92fd92899d026e8543725276f021 Signed-off-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Fix rename of directory in hybrid crawlKotresh HR2017-11-101-1/+237
| | | | | | | | | | | | | In hybrid crawl, renames and unlink can't be synced but directory renames can be detected. While syncing the directory on slave, if the gfid already exists, it should be rename. Hence if directory gfid already exists, rename it. Change-Id: Ibf9f99e76a3e02795a3c2befd8cac48a5c365bb6 BUG: 1499566 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Fix syncing of hardlink of symlinkKotresh HR2017-08-241-0/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: If there is a hardlink to a symlink on master and if the symlink file is deleted on master, geo-rep fails to sync the hardlink. Typical Usecase: It's easily hit with rsnapshot use case where it uses hardlinks. Example Reproducer: Setup geo-replication between master and slave volume and in master mount point, do the following. 1. mkdir /tmp/symlinkbug 2. ln -f -s /does/not/exist /tmp/symlinkbug/a_symlink 3. rsync -a /tmp/symlinkbug ./ 4. cp -al symlinkbug symlinkbug.0 5. ln -f -s /does/not/exist2 /tmp/symlinkbug/a_symlink 6. rsync -a /tmp/symlinkbug ./ 7. cp -al symlinkbug symlinkbug.1 Cause: If the source was not present while syncing hardlink, it was always packing the blob as regular file. Fix: If the source was not present while syncing hardlink, pack the blob based on the mode. Change-Id: Iaa12d6f99de47b18e0650e7c4eb455f23f8390f2 BUG: 1432046 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reported-by: Christian Lohmaier <lohmaier+rhbz@gmail.com> Reviewed-on: https://review.gluster.org/18011 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Fix changelog encoding to encode only space and newlineAravinda VK2017-07-211-0/+15
| | | | | | | | | | | | | | | | | | libgfchangelog was encoding path using spec rfc3986, but encoding only required for SPACE and NEWLINE chars since the NEWLINE char is used as record separator and SPACE as field separator in the parsed changelogs output. Changed the encoding function to encode only SPACE and NEWLINE. BUG: 1451724 Change-Id: I1936efad31788a9e636f912c832ed7d7efea4fe2 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: https://review.gluster.org/17787 Reviewed-by: Prashanth Pai <ppai@redhat.com> Reviewed-by: Kotresh HR <khiremat@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* geo-rep: Structured log supportAravinda VK2017-06-201-6/+21
| | | | | | | | | | | | | Changed all log messages to structured log format Change-Id: Idae25f8b4ad0bbae38f4362cbda7bbf51ce7607b Updates: #240 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: https://review.gluster.org/17551 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Rsync tunables for performance improvementsAravinda VK2017-05-231-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Flag: --ignore-missing-args This Rsync flag reduces sync failures if the source file is unlinked but present in --files-from list. This reduces Rsync retries in Geo-rep and improves the performance Flag: --existing Rsync in Geo-rep never creates target files. Using RPC Geo-rep creates entry in Slave and rsync --inplace used to prevent creating temporary file and rename.(To avoid different GFID in Slave). If the entry is missing in Slave then Geo-rep Rsync gets Permission denied errors when it tries to create file with name as GFID inside .gfid dir.(Geo-rep rsync syncs data using GFIDS with aux-gfid-mount) To disable these flags, gluster volume geo-replication <session> config \ rsync-opt-ignore-missing-args false gluster volume geo-replication <session> config \ rsync-opt-existing false Thanks Kotresh for finding these awesome tunables. BUG: 1400924 Change-Id: I6a84fb86a589bf6edc8dfd1086456a84b05a64fc Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: https://review.gluster.org/16010 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Fix mount cleanupKotresh HR2017-04-271-1/+6
| | | | | | | | | | | | | | On corner cases, mount cleanup might cause worker crash. Fixing the same. Change-Id: I38c0af51d10673765cdb37bc5b17bb37efd043b8 BUG: 1433506 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: https://review.gluster.org/17015 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Fix EBUSY tracebackKotresh HR2017-04-101-1/+1
| | | | | | | | | | | | | | EBUSY was added to retry list of errno_wrap without importing. Fixing the same. Change-Id: Ide81a9ccc9b948a96265b6890da078b722b45d51 BUG: 1434018 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: https://review.gluster.org/17011 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Retry on EBUSYKotresh HR2017-04-071-2/+2
| | | | | | | | | | | | | | | Do not crash on EBUSY error. Add EBUSY retry errno list. Crash only if the error persists even after max retries. Change-Id: Ia067ccc6547731f28f2a315d400705e616cbf662 BUG: 1434018 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: https://review.gluster.org/16924 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* geo-rep: Optionally allow access to mountsKotresh HR2017-03-191-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | In order to improve debuggability, it is important to have access to geo-rep master and slave mounts. With the default behaviour, geo-rep lazy unmounts the mounts after changing the current working directory into the mount point. It also cleans up the mount points. So only geo-rep worker has the access and it becomes impossible to take the client profile info and do any other client statck analysis. Hence the following new config is being introduced to allow access to mounts. gluster vol geo-rep <mastervol> <slavehost>::<slavevol> \ config access_mount true The default value of 'access_mount' is false. Change-Id: I53dce4ea86a6ffc979c82f9330e8954327180ca3 BUG: 1433506 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: https://review.gluster.org/16912 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* geo-rep: Use Host UUID to find local Gluster nodeAravinda VK2016-12-131-42/+20
| | | | | | | | | | | | | | | | | To spawn workers for each local brick, Geo-rep was collecting all the machine IPs based on hostname and finds based on the connectivity. With this patch, Geo-rep finds local brick if host UUID matches with UUID of the brick from Volume info. BUG: 1401801 Change-Id: Ic83c65df89e43cb86346e3ede227aa84d17ffd79 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/16035 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kotresh HR <khiremat@redhat.com>
* geo-rep/eventsapi: Additional EventsAravinda VK2016-10-181-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Added following events EVENT_GEOREP_ACTIVE { "nodeid": NODEID, "ts": TIMESTAMP, "event": "GEOREP_ACTIVE", "message": { "master_volume": MASTER_VOLUME_NAME, "slave_host": SLAVE_HOST, "slave_volume": SLAVE_VOLUME, "brick_path": BRICK_PATH } } EVENT_GEOREP_PASSIVE { "nodeid": NODEID, "ts": TIMESTAMP, "event": "GEOREP_PASSIVE", "message": { "master_volume": MASTER_VOLUME_NAME, "slave_host": SLAVE_HOST, "slave_volume": SLAVE_VOLUME, "brick_path": BRICK_PATH } } EVENT_GEOREP_CHECKPOINT_COMPLETED { "nodeid": NODEID, "ts": TIMESTAMP, "event": "GEOREP_ACTIVE", "message": { "master_volume": MASTER_VOLUME_NAME, "slave_host": SLAVE_HOST, "slave_volume": SLAVE_VOLUME, "brick_path": BRICK_PATH, "checkpoint_time": CHECKPOINT_TIME, "checkpoint_completion_time": CHECKPOINT_COMPLETION_TIME } } BUG: 1379330 Change-Id: I90716175868c59dd65c8d202e73e0ede90347b6a Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/15630 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kotresh HR <khiremat@redhat.com> Tested-by: Kotresh HR <khiremat@redhat.com>
* eventsapi/geo-rep: Geo-rep will not work without eventsapi rpmsAravinda VK2016-09-281-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | If glusterfs-events rpm is not installed, Geo-replication will fail since it imports eventtypes. Any call to gsyncd will fail with Import error. Glusterd start fails since it runs `gsyncd.py --version` Traceback (most recent call last): File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 29, in <module> from syncdutils import FreeObject, norm, grabpidfile, finalize File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 28, in <module> from events import eventtypes ImportError: No module named events BUG: 1378057 Change-Id: I1a9bc086c3d52449ec7296cb2f9ceb16cd41a8a4 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/15539 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Saravanakumar Arumugam <sarumuga@redhat.com> Reviewed-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Use configured log_level for libgfchangelog logsAravinda VK2016-09-091-0/+17
| | | | | | | | | | | | | | | | libgfchangelog was not respecting the log_level configured in Geo-replication. With this patch Libgfchangelog log level can be configured using `config changelog_log_level TRACE`. Default Changelog log level is INFO BUG: 1363965 Change-Id: Ida714931129f6a1331b9d0815da77efcb2b898e3 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/15078 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: add geo-rep events for server side changesSaravanakumar Arumugam2016-08-311-0/+9
| | | | | | | | | | | | | | | | | Event Type defined in #15351 to avoid merge conflicts Add geo-rep events applicable to changes in geo-rep session in the server side. Change-Id: Ia66574d2abccad7fce6a96667efbc7c6c8903fc6 BUG: 1370445 Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com> Reviewed-on: http://review.gluster.org/15328 Tested-by: Aravinda VK <avishwan@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Post process Data and Meta ChangelogsAravinda VK2016-08-261-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With this patch, Data and Meta GFIDs are post processed. If Changelog has UNLINK entry then remove from Data and Meta GFIDs list(If stat on GFID is ENOENT in Master). While processing Changelogs, - Collect all the data and meta operations in a temporary database - Delete all Data and Meta GFIDs which are already unlinked as per Changelogs (unlink only if stat on GFID is ENOENT) - Process all Entry operations as usual - Process data and meta operations in batch(Fetch from Db in batch) - Data sync is again batched based on number of changelogs(Default 1day changelogs). Once the sync is complete, Update last Changelog's time as last_synced time as usual. Additionally maintain entry_stime on Brick root, ignore Entry ops if changelog suffix time is less than entry_stime. If data stime is more than entry_stime, this can happen only when passive worker updates stime by itself by getting mount point stime. Use entry_stime = data_stime in this case. New configurations: max-rsync-retries - Default Value is 10 max-data-changelogs-in-batch - Max number of changelogs to be considered in a batch for syncing. Default value is 5760(4 changelogs per min * 60 min * 24 hours) max-history-changelogs-in-batch - Max number of history changelogs to be processed at once. Default value 86400(4 changelogs per min * 60 min * 24 hours * 15 days) BUG: 1364420 Change-Id: I7b665895bf4806035c2a8573d361257cbadbea17 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/15110 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Kotresh HR <khiremat@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* features/libgfchangelog: Log failure in gf_histroy_changelogKotresh HR2016-08-091-0/+2
| | | | | | | | | | | | | | | | | Add error logs if gf_history_changelog fails. If requested changelog range is not available, log the error and exit instead of continuing the loop and exiting in readdir without logging. Also fixed the duplicate MSGID number in 'changelog-lib-messages.h' Change-Id: Icd71b89ae23b48a71380657ba5649029c32fabfd BUG: 1362151 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/15064 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Error message cleanupAravinda VK2016-06-151-10/+13
| | | | | | | | | | | | | | | | If ssh returns 127 that means the remote gsyncd path is wrong or push-pem failed during create. Existing error message was pointing old documentation. Change-Id: Ifbbb4a604fc0ae0fd5cb2746df6363bf28cde1e9 BUG: 1343943 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/14673 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Milind Changire <mchangir@redhat.com> Reviewed-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Do not crash worker on ESTALEKotresh HR2015-08-031-9/+1
| | | | | | | | | | | | | | Handle ESTALE returned by lstat gracefully by retrying it. Do not crash the worker. Change-Id: I2527cd8bd1f7d2428cb4fa3f20782bebaf2df12a BUG: 1247529 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/11772 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* geo-rep: Minimize rm -rf race in Geo-repAravinda VK2015-05-051-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While doing RMDIR worker gets ENOTEMPTY because same directory will have files from other bricks which are not deleted since that worker is slow processing. So geo-rep does recursive_delete. Recursive delete was done using shutil.rmtree. once started, it will not check disk_gfid in between. So it ends up deleting the new files created by other workers. Also if other worker creates files after one worker gets list of files to be deleted, then first worker will again get ENOTEMPTY again. To fix these races, retry is added when it gets ENOTEMPTY/ESTALE/ENODATA. And disk_gfid check added for original path for which recursive_delete is called. This disk gfid check executed before every Unlink/Rmdir. If disk gfid is not matching with GFID from Changelog, that means other worker deleted the directory. Even if the subdir/file present, it belongs to different parent. Exit without performing further deletes. Retry on ENOENT during create is ignored, since if CREATE/MKNOD/MKDIR failed with ENOENT will not succeed unless parent directory is created again. Rsync errors handling was handling unlinked_gfids_list only for one Changelog, but when processed in batch it fails to detect unlinked_gfids and retries again. Finally skips the entire Changelogs in that batch. Fixed this issue by moving self.unlinked_gfids reset logic before batch start and after batch end. Most of the Geo-rep races with rm -rf is eliminated with this patch, but in some cases stale directories left in some bricks and in mount point we get ENOTEMPTY.(DHT issue, Error will be logged in Slave log) BUG: 1211037 Change-Id: I8716b88e4c741545f526095bf789f7c1e28008cb Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/10204 Reviewed-by: Kotresh HR <khiremat@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* geo-rep: log ENTRY failures from slave on masterMilind Changire2015-04-271-1/+1
| | | | | | | | | | | | | | | ENTRY operations failures on slave left no trace for debugging purposes. This patch captures such failures on slave cluster and forwards them to the master and logs them. Failures of specific interest are the ones which return code EEXIST on the failing operations. Change-Id: Iecab876f16593c746d53f4b7ec2e0783367856bb BUG: 1207115 Signed-off-by: Milind Changire <mchangir@redhat.com> Reviewed-on: http://review.gluster.org/10048 Reviewed-by: Aravinda VK <avishwan@redhat.com> Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com>
* geo-rep: Adhering to the common storage for geo-repKotresh HR2015-04-271-3/+0
| | | | | | | | | | | | | | | | | | | Making geo-rep use the common storage shared by nfs, snapshot and geo-rep. The meta volume should be named as gluster_shared_storage, and it should be mounted at "/var/run/gluster/shared_storage/". geo-rep will have create a directory called 'geo-rep' in the meta-volume and all the lock files are created inside it. Change-Id: I82d0bff9be191f75f643606a9a21d53559047ac4 BUG: 1210344 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/10196 Reviewed-by: Aravinda VK <avishwan@redhat.com> Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com>
* feature/geo-rep: Active Passive Switching logic flockKotresh HR2015-03-151-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CURRENT DESIGN AND ITS LIMITATIONS: ----------------------------------- Geo-replication syncs changes across geography using changelogs captured by changelog translator. Changelog translator sits on server side just above posix translator. Hence, in distributed replicated setup, both replica pairs collect changelogs w.r.t their bricks. Geo-replication syncs the changes using only one brick among the replica pair at a time, calling it as "ACTIVE" and other non syncing brick as "PASSIVE". Let's consider below example of distributed replicated setup where NODE-1 as b1 and its replicated brick b1r is in NODE-2 NODE-1 NODE-2 b1 b1r At the beginning, geo-replication chooses to sync changes from NODE-1:b1 and NODE-2:b1r will be "PASSIVE". The logic depends on virtual getxattr 'trusted.glusterfs.node-uuid' which always returns first up subvolume i.e., NODE-1. When NODE-1 goes down, the above xattr returns NODE-2 and that is made 'ACTIVE'. But when NODE-1 comes back again, the above xattr returns NODE-1 and it is made 'ACTIVE' again. So for a brief interval of time, if NODE-2 had not finished processing the changelog, both NODE-2 and NODE-1 will be ACTIVE causing rename race as mentioned in the bug. SOLUTION: --------- 1. Have a shared replicated storage, a glusterfs management volume specific to geo-replication. 2. Geo-rep creates a file per replica set on management volume. 3. fcntl lock on the above said file is used for synchronization between geo-rep workers belonging to same replica set. 4. If management volume is not configured, geo-replication will back to previous logic of using first up sub volume. Each worker tries to lock the file on shared storage, who ever wins will be ACTIVE. With this, we are able to solve the problem but there is an issue when the shared replicated storage goes down (when all replicas goes down). In that case, the lock state is lost. So AFR needs to rebuild the lock state after brick comes up. NOTE: ----- This patch brings in the, pre-requisite step of setting up management volume for geo-replication during creation. 1. Create mgmt-vol for geo-replicatoin and start it. Management volume should be part of master cluster and recommended to be three way replicated volume having each brick in different nodes for availability. 2. Create geo-rep session. 3. Configure mgmt-vol created with geo-replication session as follows. gluster vol geo-rep <mastervol> slavenode::<slavevol> config meta_volume \ <meta-vol-name> 4. Start geo-rep session. Backward Compatiability: ----------------------- If management volume is not configured, it falls back to previous logic of using node-uuid virtual xattr. But it is not recommended. Change-Id: I7319d2289516f534b69edd00c9d0db5a3725661a BUG: 1196632 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/9759 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>