path: root/geo-replication
Commit message | Author | Age | Files | Lines
* contrib: Remove contrib/ipaddr-py entirely. | Nigel Babu | 2018-07-09 | 1 | -2/+2
    This module is no longer being used.

    Fixes: bz#1597512
    Change-Id: Ie5faf55c5961d9d7b5082c9c257351af712c41d7
    Signed-off-by: Nigel Babu <nigelb@redhat.com>
* core/various: python3 compat, prepare for python2 -> python3 | Kaleb S. KEITHLEY | 2018-07-09 | 3 | -13/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | see https://review.gluster.org/#/c/19788/, https://review.gluster.org/#/c/19871/, https://review.gluster.org/#/c/19952/, https://review.gluster.org/#/c/20104/, https://review.gluster.org/#/c/20162/, https://review.gluster.org/#/c/20185/, https://review.gluster.org/#/c/20207/, https://review.gluster.org/#/c/20227/, https://review.gluster.org/#/c/20307/, https://review.gluster.org/#/c/20320/, https://review.gluster.org/#/c/20332/, and https://review.gluster.org/#/c/20364/ Fixes glupy.py python2isms, iteritems -> items, and some overlooked print() in georep/peer_mountbroker.in Note: Fedora packaging guidelines and SUSE rpmlint require explicit shebangs; popular practices like #!/usr/bin/env python and #!/usr/bin/python are not allowed; they must be #!/usr/bin/python2 or #!/usr/bin/python3 Note: Selected small fixes from 2to3 utility. Specifically apply, basestring, funcattrs, has_key, idioms, map, numliterals, raise, set_literal, types, urllib, and zip have already been applied. Also version agnostic imports for urllib, cpickle, socketserver, _thread, queue, etc., suggested by Aravinda in https://review.gluster.org/#/c/19767/1 Note: these 2to3 fixes report no changes are necessary: asserts, buffer, exec, execfile, exitfunc, filter, getcwdu, imports2, input, intern, itertools, metaclass, methodattrs, ne, next, nonzero, operator, paren, raw_input, reduce, reload, renames, repr, standarderror, sys_exc, throw, tuple_params, xreadlines. Change-Id: Idda031c1ec975417c79323aea33e7b694e752b2a updates: #411 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
* geo-rep: Fix geo-rep for older versions of unshare | Kotresh HR | 2018-06-22 | 3 | -7/+42
    Geo-rep mounts are private to the worker. This is achieved with a mount namespace created by the unshare command, which must support the '--propagation' option, so geo-rep breaks on systems with an older unshare version. The patch makes geo-rep fall back to the lazy umount behaviour if unshare does not support the propagation option.

    fixes: bz#1589782
    Change-Id: Ia614f068aede288d63ac62fea4461b1865066054
    Signed-off-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Fix problems in python2 -> python3 compat | Kotresh HR | 2018-06-11 | 2 | -3/+3
    1. Import configparser module correctly
    2. Import thread module correctly

    Updates: #411
    Change-Id: I522453d23c256b694fa58d285f413b8c4dd6595c
    Signed-off-by: Kotresh HR <khiremat@redhat.com>
* core/various: python3 compat, prepare for python2 -> python3 | Kaleb S. KEITHLEY | 2018-06-07 | 1 | -4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | see https://review.gluster.org/#/c/19788/, https://review.gluster.org/#/c/19871/, https://review.gluster.org/#/c/19952/, and https://review.gluster.org/#/c/20104/ https://review.gluster.org/#/c/20162/ This patch changes uses of map() and raise(), and a few cases of print() that were overlooked in the prior patch that fixed print. Note: Fedora packaging guidelines require explicit shebangs, so popular practices like #!/usr/bin/env python and #!/usr/bin/python are not allowed; they must be #!/usr/bin/python2 or #!/usr/bin/python3 Note: Selected small fixes from 2to3 utility. Specifically apply, basestring, funcattrs, idioms, numliterals, set_literal, types, urllib, zip, map, and raise have already been applied. Also version agnostic imports for urllib, cpickle, socketserver, _thread, queue, etc., suggested by Aravinda in https://review.gluster.org/#/c/19767/1 Note: these 2to3 fixes report no changes are necessary: asserts, buffer, exec, execfile, exitfunc, filter, getcwdu, intern, itertools, metaclass, methodattrs, ne, next, nonzero, operator, paren, raw_input, reduce, reload, renames, repr, standarderror, sys_exc, throw, tuple_params, xreadlines. Change-Id: Id62ea491e4ab5dd390075c5c6d9d889cf6f9da27 updates: #411 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
* core/various: python3 compat, prepare for python2 -> python3 | Kaleb S. KEITHLEY | 2018-06-04 | 8 | -6/+77
| | | | | | | | | | | | | | | | | | | | | | | | | | | see https://review.gluster.org/#/c/19788/, https://review.gluster.org/#/c/19871/, and https://review.gluster.org/#/c/19952/ This patch adds version agnostic imports for urllib, cpickle, socketserver, _thread, queue, etc., suggested by Aravinda in https://review.gluster.org/#/c/19767/1 Note: Fedora packaging guidelines require explicit shebangs, so popular practices like #!/usr/bin/env python and #!/usr/bin/python are not allowed; they must be #!/usr/bin/python2 or #!/usr/bin/python3 Note: Selected small fixes from 2to3 utility. Specifically apply, basestring, funcattrs, idioms, numliterals, set_literal, types, urllib, and zip have already been applied. Note: these 2to3 fixes report no changes are necessary: exec, execfile, exitfunc, filter, getcwdu, intern, itertools, metaclass, methodattrs, ne, next, nonzero, operator, paren, raw_input, reduce, reload, renames, repr, standarderror, sys_exc, throw, tuple_params, xreadlines. Change-Id: I8d393064a1837874d8b4bc87c8ce05c679664642 updates: #411 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
* geo-rep: Fix upgrade issue | Kotresh HR | 2018-05-07 | 1 | -2/+3
    Cause and Analysis:
    The last synced changelog for entry operations is marked in the current version to avoid re-processing already processed entry operations in a batch when geo-rep crashes or restarts. This marker was not present in previous versions. It is maintained in a dictionary under the key 'last_synced_entry', and the dictionary is persisted to the status file. So upgrading to the current version, in which the marker is present, was failing with KeyError.

    Solution:
    Load the dictionary with the default keys first, which contain all the keys including the latest ones, and then load the values from the status file, instead of doing it the other way around.

    fixes: bz#1575490
    Change-Id: Ic654e6f9a3c97f616761f1362f890352a2186fb4
    Signed-off-by: Kotresh HR <khiremat@redhat.com>
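The "defaults first, then stored values" approach can be sketched as below. This is an illustration only: `DEFAULT_STATUS`, its `last_synced` key and `load_status` are hypothetical names, and only the key `last_synced_entry` comes from the commit message.

```python
import json

# Hypothetical defaults; only 'last_synced_entry' is named in the commit message.
DEFAULT_STATUS = {
    "last_synced": 0,
    "last_synced_entry": 0,
}


def load_status(raw_json):
    """Start from the defaults, then overlay whatever the status file holds."""
    status = dict(DEFAULT_STATUS)       # every known key is present
    status.update(json.loads(raw_json))  # stored values win over defaults
    return status


# A status file written by an older version lacks the newer key.
old_status = load_status('{"last_synced": 1525000000}')
```

Because the defaults are loaded first, reading `old_status["last_synced_entry"]` returns the default instead of raising KeyError.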
* core/various: python3 compat, prepare for python2 -> python3 | Kaleb S. KEITHLEY | 2018-05-02 | 4 | -9/+13
    see https://review.gluster.org/#/c/19788/

    use print fn from __future__

    Change-Id: If5075d8d9ca9641058fbc71df8a52aa35804cda4
    updates: #411
    Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
* geo-rep: Fix syncing of symlink | Kotresh HR | 2018-04-13 | 1 | -1/+1
    Problem:
    If a symlink pointing to the current directory (e.g. symlink -> ".") is created on master with a non-root uid or gid, the geo-rep worker crashes with ENOTSUP.

    Cause:
    Geo-rep creates the symlink on slave and fixes the uid and gid using chown. os.chown dereferences the symlink, which points to ".gfid", and that is not supported. Note that geo-rep operates on the aux-gfid-mount (e.g. "/mnt/.gfid/<gfid-of-symlink-file>").

    Solution:
    The uid or gid change is actually on the symlink file itself. So use os.lchown, i.e. don't dereference.

    BUG: 1567209
    Change-Id: I63575fc589d71f987bef1d350c030987738c78ad
    updates: bz#1567209
    Signed-off-by: Kotresh HR <khiremat@redhat.com>
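The difference between os.chown and os.lchown can be seen on an ordinary filesystem. The sketch below is not the geo-rep code: `fix_ownership` is a hypothetical helper, and it uses a temporary directory rather than an aux-gfid-mount.

```python
import os
import tempfile


def fix_ownership(path, uid, gid):
    """Change ownership of the entry itself; a symlink is not followed."""
    os.lchown(path, uid, gid)


workdir = tempfile.mkdtemp()
link = os.path.join(workdir, "symlink")
os.symlink(".", link)  # symlink -> ".", as in the reported case

# Setting the current uid/gid keeps the sketch runnable without root.
fix_ownership(link, os.getuid(), os.getgid())
```

os.chown(link, ...) would have operated on the directory that "." resolves to, whereas os.lchown operates on the link entry.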
* core/build/various: python3 compat, prepare for python2 -> python3 | Kaleb S. KEITHLEY | 2018-04-12 | 11 | -10/+12
    Note 1) we're not supposed to be using #!/usr/bin/env python, see https://fedoraproject.org/wiki/Packaging:Guidelines?rd=Packaging/Guidelines#Shebang_lines

    Note 2) we're also not supposed to be using #!/usr/bin/python, see https://fedoraproject.org/wiki/Changes/Avoid_usr_bin_python_in_RPM_Build#Quick_Opt-Out

    The previous patch (https://review.gluster.org/19767) tried to do too much in one patch, so it was abandoned.

    This patch does two things:
    1) minor cleanup of configure(.ac) to explicitly use python2
    2) change all the shebang lines to #!/usr/bin/python2 and add them where they were missing, based on warnings emitted during rpmbuild.

    In a follow-up patch python2 will eventually be changed to python3. Before that, python2-isms (e.g. print, string.join(), etc.) need to be converted to python3. Some of those can be rewritten in version agnostic python. E.g. print statements become print() with "from __future__ import print_function". The python 2to3 utility will be used for some of those. Also Aravinda has given guidance in the comments to the first patch for changes.

    updates: #411
    Change-Id: I471730962b2526022115a1fc33629fb078b74338
    Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
* python: Remove all uses of find_library. Fixes #1450593 | Niklas Hambüchen | 2018-03-24 | 2 | -4/+2
    `find_library()` doesn't consider LD_LIBRARY_PATH on Python < 3.6.

    Change-Id: Iee26085cb5d14061001f19f032c2664d69a378a8
    BUG: 1450593
    Signed-off-by: Niklas Hambüchen <mail@nh2.me>
* geo-rep: Remove lazy umount and use mount namespaces | Kotresh HR | 2018-02-22 | 4 | -15/+29
    Lazy umounting of the master volume by the worker causes issues with rsync's usage of getcwd. Hence removing the lazy umount and using a private mount namespace for the same. On the slave, the lazy umount is retained as we can't use a private namespace in a non-root geo-rep setup.

    Change-Id: I403375c02cb3cc7d257a5f72bbdb5118b4c8779a
    BUG: 1546129
    Signed-off-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Remove unused working directory check in gsyncd | Aravinda VK | 2018-01-29 | 1 | -37/+1
| | | | | | | | | | | To append the default config file path, gsyncd calls gluster command to get the workdir path and constructs config file path. This is not required now since the Config management in Geo-replication is changed with patch 18257(Issue #73) BUG: 1539545 Change-Id: Ia7eb39e36ed59ece4de65ea7ec71a0f615e338bb Signed-off-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Detailed JSON output for config | Aravinda VK | 2018-01-26 | 3 | -10/+45
| | | | | | | | | | | | | | | | | JSON output of `config-get` command now returns in the following format { "name": CONFIG_NAME, "value": CONFIG_VALUE, "default_value": DEFAULT_VALUE, # Only if modified == true "configurable": true|false, "modified": true|false } Change-Id: I6193de48cd33655df7ecef5a0d83d7cb147089cf Fixes: #361 Signed-off-by: Aravinda VK <avishwan@redhat.com>
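An entry in the format listed above could be assembled as in the sketch below. `config_entry` and the option name are hypothetical, and deriving `modified` by comparing the value with its default is an assumption; the log does not show how gsyncd computes it.

```python
import json


def config_entry(name, value, default_value, configurable=True):
    """Build one config-get style entry (hypothetical helper)."""
    modified = value != default_value  # assumption: modified means "differs from default"
    entry = {
        "name": name,
        "value": value,
        "configurable": configurable,
        "modified": modified,
    }
    if modified:
        # default_value is only reported for modified options
        entry["default_value"] = default_value
    return entry


changed = config_entry("example-option", 3, 1)
unchanged = config_entry("example-option", 1, 1)
output = json.dumps([changed, unchanged], indent=2)
```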
* geo-rep: Support for using Volinfo from Conf file | Aravinda VK | 2018-01-23 | 4 | -50/+128
    Once Geo-replication is started, it runs Gluster commands to get Volume info from Master and Slave. With this patch, Georep can get Volume info from the Conf file if the `--use-gconf-volinfo` argument is specified to monitor.

    Create a config (or add to the config if it exists) with the following fields:

    [vars]
    master-bricks=NODEID:HOSTNAME:PATH,..
    slave-bricks=NODEID:HOSTNAME,..
    master-volume-id=
    slave-volume-id=
    master-replica-count=
    master-disperse_count=

    Note: Existing Geo-replication is not affected since this is activated only when `--use-gconf-volinfo` is passed while spawning `gsyncd monitor`.

    Tiering support is not yet added since Tiering + Glusterd2 is still under discussion.

    Fixes: #396
    Change-Id: I281baccbad03686c00f6488a8511dd6db0edc57a
    Signed-off-by: Aravinda VK <avishwan@redhat.com>
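A config with these fields can be read with Python's standard configparser, as in the sketch below. The field names come from the commit message; the sample values, `parse_volinfo` and the shape of the returned dictionary are made up for illustration and do not reflect gsyncd's own parsing.

```python
import configparser

# Sample values are invented; only the field names come from the commit message.
SAMPLE_CONF = """\
[vars]
master-bricks=node-1:host1:/bricks/b1,node-2:host2:/bricks/b2
slave-bricks=node-3:host3,node-4:host4
master-volume-id=master-vol-id
slave-volume-id=slave-vol-id
master-replica-count=2
master-disperse_count=1
"""


def parse_volinfo(text):
    """Parse the [vars] section into plain Python structures."""
    conf = configparser.ConfigParser()
    conf.read_string(text)
    section = conf["vars"]
    # master-bricks entries are NODEID:HOSTNAME:PATH
    master_bricks = [
        dict(zip(("node_id", "host", "path"), item.split(":", 2)))
        for item in section["master-bricks"].split(",")
    ]
    # slave-bricks entries are NODEID:HOSTNAME
    slave_bricks = [
        dict(zip(("node_id", "host"), item.split(":", 1)))
        for item in section["slave-bricks"].split(",")
    ]
    return {
        "master_bricks": master_bricks,
        "slave_bricks": slave_bricks,
        "master_replica_count": section.getint("master-replica-count"),
        "master_disperse_count": section.getint("master-disperse_count"),
    }


volinfo = parse_volinfo(SAMPLE_CONF)
```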
* geo-rep: Improve geo-rep pre-validation logs | Kotresh HR | 2018-01-22 | 1 | -4/+5
| | | | | | | | | | | | | | | | | | | | | | Geo-rep runs gverify.sh which does pre-validation. As part of it, master and slave volume is mounted to verify the size. If for some reason, the mount fails, the error message does not point out the mount log file location. Also both master and slave mount logs are same. Patch does following improvements. 1. Master and slave mount logs are separated and error message points the log file to be looked for. 2. The log location is changed to /var/log/glusterfs/geo-replication instead of /var/log/glusterfs/geo-replication-slaves 3. The log file name is changed to "gverify-mastermnt.log" and "gverify-slavemnt.log" for master and slave mount respectively Fixes: #395 Change-Id: Ia644ec0afebbdaae92e01adf03c635e5f8866a02 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Validate availability of gluster binary on slave | Kotresh HR | 2018-01-19 | 1 | -1/+12
| | | | | | | | | | 1. Adds validation to check if gluster binary is available on slave 2. Add a simple geo-rep setup test case to verify whether setup is fine. It's named in such a way that it runs first. BUG: 1532591 Change-Id: Ie777e55ae13db8fa97d4e32464ad82269ee5fd07 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* tests: Enable geo-rep test cases | Kotresh HR | 2018-01-05 | 4 | -10/+33
| | | | | | | | | | | | | | | | | This patch re-enables the geo-rep test cases. Along with it does following optimizations. 1. Use EXPECT_WITHIN instead of sleep 2. Clean up geo-rep ssh key after test 3. Changes to gverify.sh and S56glusterd-geo-rep-create-post.sh to use the given ssh identity file for geo-rep create 4. Make gluster-command-dir configurable and introduce slave-gluster-command-dir which points the parent directory of gluster binaries in master and slave respectively. Change-Id: Ia7696278d9dd3ba04224dcd7c3564088ca970b04 BUG: 1480491 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Log message improvements | Aravinda VK | 2017-12-28 | 5 | -9/+9
    BUG: 1529480
    Change-Id: If4775ed9886990c0e1bcf4e44c7dfef95cc4f0c3
    Signed-off-by: Aravinda VK <avishwan@redhat.com>
* fips/geo-rep: Replace MD5 with SHA256 | Kotresh HR | 2017-12-22 | 1 | -6/+14
| | | | | | | | | | | | | | | | | | MD5 is not fips compliant. Hence replacing with SHA256. NOTE: The hash is used to form the ctl_path for the ssh connection. The length of ctl_path for ssh connection should not be > 108. ssh fails with ctl_path too long if it is so. But when rsync is piped to ssh, it is not taking > 90. rsync is failing with error number 12. Hence using first 32 bytes of hash. Hash collision doesn't matter as only one sock file is created per directory. Change-Id: I58aeb32a80b5422f6ac0188cf33fbecccbf08ae7 Updates: #230 Signed-off-by: Kotresh HR <khiremat@redhat.com>
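Truncating a SHA256 digest to keep the control path short can be sketched as follows. `short_hash` and the input string are hypothetical, and reading "first 32 bytes of hash" as the first 32 characters of the hex digest is an assumption.

```python
import hashlib


def short_hash(key, length=32):
    """Return the first `length` hex characters of the SHA256 digest of `key`."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()  # 64 hex characters
    return digest[:length]


# Hypothetical input; the string that geo-rep actually hashes is not shown in this log.
name = short_hash("user@slavehost:slavevol")
```

The truncated value is half the length of the full hex digest, which leaves more room under the path length limits the message mentions.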
* geo-rep: Cleanup stale unprocessed xsync changelogs | Kotresh HR | 2017-12-11 | 1 | -0/+5
    Fixes: #376
    Change-Id: Ib92920c716c7d27e1eeb4bc4ebaf3efb48e0694d
    Signed-off-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Fix access-mount geo-rep config | Kotresh HR | 2017-11-30 | 3 | -4/+12
    Fix access-mount and slave-access-mount configs.

    Change-Id: Ib586677755e76a51b9f20093e441b72789b4fecc
    Signed-off-by: Kotresh HR <khiremat@redhat.com>
    BUG: 1517633
* geo-rep: Fix slave side custom config issue | Aravinda VK | 2017-11-27 | 2 | -4/+21
| | | | | | | | | | | | | Slave gsyncd will not use session config files, Slave configs are stored in Master config file itself and sent as argument to slave gsyncd. With this patch, gconf default values are overwritten if argument name starts with "slave-" Change-Id: Iebc51f52232c0cd30b29199f03015f97b70ce537 Signed-off-by: Aravinda VK <avishwan@redhat.com> BUG: 1517068
* geo-rep: JSON output for status and config | Aravinda VK | 2017-11-24 | 4 | -4/+26
| | | | | | | | | For Glusterd2 integration, JSON output of status and config is very useful from gsyncd Fixes: #361 Change-Id: I53c61f19033ad4ac601ea49469e4e7c7c8e9af3d Signed-off-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Refactoring Config and Arguments parsing | Aravinda VK | 2017-11-15 | 18 | -2310/+2605
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Fixed Python pep8 issues - Removed dead code - Rewritten configuration management - Rewritten Arguments/subcommands handling - Added Args upgrade to accommodate all these changes without changing glusterd code - use of md5 removed, which was used to hash the brick path for workdir Both Master and Slave nodes will have subdir for session in the format "<mastervol>_<primary_slave_host>_<slavevol> $GLUSTER_LOGDIR/geo-replication/<mastervol>_<primary_slave_host>_<slavevol> $GLUSTER_LOGDIR/geo-replication-slaves/<mastervol>_<primary_slave_host>_<slavevol> Log file paths renamed since session info is available with directory name itself. $LOG_DIR_MASTER/ - gsyncd.log - Gsyncd, Worker monitor logs - mnt-<brick-path>.log - Aux mount logs, mounted by each worker - changes-<brick-path>.log - Changelog related logs(One per brick) $LOG_DIR_SLAVE/ - gsyncd.log - Slave Gsyncd logs - mnt-<master-node>-<master-brick-path>.log - Aux mount logs, mounted for each connection from master-node:master-brick - mnt-mbr-<master-node>-<master-brick-path>.log - Same as above, but mountbroker setup Fixes: #73 Change-Id: I2ec2a21e4e2a92fd92899d026e8543725276f021 Signed-off-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Fix data sync issue during hardlink, rename | Kotresh HR | 2017-11-14 | 1 | -2/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: The data is not getting synced if master witnessed IO as below. 1. echo "test_data" > f1 2. ln f1 f2 3. mv f2 f3 4. unlink f1 On master, 'f3' exists with data "test_data" but on slave, only f3 exists with zero byte file without backend gfid link. Cause: On master, since 'f2' no longer exists, the hardlink is skipped during processing. Later, on trying to sync rename, since source ('f2') doesn't exist, dst ('f3') is created with same gfid. But in this use case, it succeeds but backend gfid would not have linked as 'f1' exists with the same gfid. So, rsync would fail with ENOENT as backend gfid is not linked with 'f3' and 'f1' is unlinked. Fix: On processing rename, if src doesn't exist on slave, don't blindly create dst with same gfid. The gfid needs to be checked, if it exists, hardlink needs to be created instead of mknod. Thanks Aravinda for helping in RCA :) Change-Id: I5af4f99798ed1bcb297598a4bc796b701d1e0130 Signed-off-by: Kotresh HR <khiremat@redhat.com> BUG: 1512483 Reporter: dimitri.ars@gmail.com
* geo-rep: Fix rename of directory in hybrid crawl | Kotresh HR | 2017-11-10 | 4 | -240/+276
    In hybrid crawl, renames and unlinks can't be synced, but directory renames can be detected. While syncing a directory on slave, if its gfid already exists, the operation should be a rename. Hence, if the directory gfid already exists, rename it.

    Change-Id: Ibf9f99e76a3e02795a3c2befd8cac48a5c365bb6
    BUG: 1499566
    Signed-off-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Fix passive brick's last sync time | Kotresh HR | 2017-10-11 | 1 | -0/+1
| | | | | | | | | | | | | | Passive brick's stime was not updated to the status file immediately after updating the brick root. As a result the last sync time was showing '0' until it finishes first crawl if passive worker becomes active after restart. Fix is to update the status file immediately after upgrading the brick root. Change-Id: I248339497303bad20b7f5a1d42ab44a1fe6bca99 BUG: 1500346 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Add EINTR to retry list while doing readlink | Kotresh HR | 2017-10-11 | 1 | -4/+6
    The worker occasionally crashed with EINTR on readlink. The error is transient, not persistent. A worker restart involves re-processing of a few entries in changelogs. So adding EINTR to the retry list to avoid the worker restart.

    Change-Id: Iefe641437b5d5be583f079fc2a7a8443bcd19f9d
    BUG: 1499393
    Signed-off-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Add ENODATA to retry list on gfid getxattr | Kotresh HR | 2017-10-11 | 1 | -8/+5
    During xsync crawl, the worker occasionally crashed with ENODATA on getting the gfid from the backend. The error is transient, not persistent. A worker restart involves re-processing of a few entries in changelogs. So adding ENODATA to the retry list to avoid the worker restart.

    Change-Id: Ib78d1e925c0a83c78746f28f7c79792a327dfd3e
    BUG: 1499391
    Signed-off-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Fix status transition | Kotresh HR | 2017-10-11 | 1 | -1/+0
    The status transition is as below, which is wrong.

    Created->Initializing->Active->Active/Passive->Stopped

    As soon as the monitor spawns the worker, the state is changed from 'Initializing' to 'Active' and then to 'Active/Passive' based on whether the worker gets the lock or not. This is wrong; it should transition directly as below.

    Created->Initializing->Active/Passive->Stopped

    Change-Id: Ibf5ca5c4fdf168c403c6da01db60b93f0604aae7
    BUG: 1500284
    Signed-off-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Add ENOTSUP error to retry list | Kotresh HR | 2017-10-10 | 1 | -1/+3
    os.listdir occasionally gives ENOTSUP on a gfid path, which is not persistent. Adding it to the retry list to avoid a worker crash if it's a transient error.

    Change-Id: Ic795dd1f02a27c9e5d901e20722ee32451838feb
    BUG: 1499180
    Signed-off-by: Kotresh HR <khiremat@redhat.com>
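The retry-list idea used for EINTR, ENODATA and ENOTSUP can be sketched as a small wrapper that retries a call when it fails with one of the listed errno values. `call_with_retry`, its parameters and the retry limit are hypothetical and are not the signature of geo-rep's own helper.

```python
import errno
import time


def call_with_retry(call, args=(), retry_errnos=(), max_tries=5, delay=0.0):
    """Run call(*args); retry while it fails with an errno from retry_errnos."""
    tries = 0
    while True:
        try:
            return call(*args)
        except OSError as exc:
            tries += 1
            # Re-raise errors that are not in the retry list, or give up after max_tries.
            if exc.errno not in retry_errnos or tries >= max_tries:
                raise
            time.sleep(delay)


# Simulated transient failure: fails twice with EINTR, then succeeds.
attempts = {"count": 0}


def flaky_readlink(path):
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise OSError(errno.EINTR, "Interrupted system call")
    return "target-of-" + path


result = call_with_retry(flaky_readlink, ("link",), retry_errnos=(errno.EINTR,))
```

An error outside the retry list still propagates on the first failure, so persistent problems are not hidden.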
* geo-rep: Fix syncing of hardlink of symlink | Kotresh HR | 2017-08-24 | 3 | -64/+105
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: If there is a hardlink to a symlink on master and if the symlink file is deleted on master, geo-rep fails to sync the hardlink. Typical Usecase: It's easily hit with rsnapshot use case where it uses hardlinks. Example Reproducer: Setup geo-replication between master and slave volume and in master mount point, do the following. 1. mkdir /tmp/symlinkbug 2. ln -f -s /does/not/exist /tmp/symlinkbug/a_symlink 3. rsync -a /tmp/symlinkbug ./ 4. cp -al symlinkbug symlinkbug.0 5. ln -f -s /does/not/exist2 /tmp/symlinkbug/a_symlink 6. rsync -a /tmp/symlinkbug ./ 7. cp -al symlinkbug symlinkbug.1 Cause: If the source was not present while syncing hardlink, it was always packing the blob as regular file. Fix: If the source was not present while syncing hardlink, pack the blob based on the mode. Change-Id: Iaa12d6f99de47b18e0650e7c4eb455f23f8390f2 BUG: 1432046 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reported-by: Christian Lohmaier <lohmaier+rhbz@gmail.com> Reviewed-on: https://review.gluster.org/18011 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Convert gfid mismatch logs to structured logging | Kotresh HR | 2017-07-28 | 1 | -10/+17
| | | | | | | | | | | | | | Convert the logs related to entry failures fix due to gfid mismatch logs into structured logging format Change-Id: I9bce950c5339b48d3ec8b84bddee38b0473b7634 Updates: #246 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: https://review.gluster.org/17896 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Fix syncing of self healed hardlinks | Kotresh HR | 2017-07-28 | 1 | -0/+8
    Problem:
    In a distribute replicate volume, if hardlinks are created when a subvolume is down, they get healed from the other subvolume when it comes up. If this subvolume becomes ACTIVE in geo-rep, there are chances that those hardlinks won't be synced to slave.

    Cause:
    AFR can't detect hardlinks during self heal. It just creates those files using mknod, and the same is recorded in the changelog. Geo-rep processes these mknod entries and ignores them as it finds the gfid already on slave.

    Solution:
    Geo-rep should process the mknod as a link if the gfid already exists on slave.

    Change-Id: I2f721b462b38a74c60e1df261662db4b99b32057
    BUG: 1475308
    Signed-off-by: Kotresh HR <khiremat@redhat.com>
    Reviewed-on: https://review.gluster.org/17880
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Handle possible entry failures gracefully | Kotresh HR | 2017-07-21 | 2 | -13/+107
    Updates: #246
    Change-Id: If0ce83fe8dd3068bfb671f398b2e82ac831288d0
    Signed-off-by: Kotresh HR <khiremat@redhat.com>
    Reviewed-on: https://review.gluster.org/17577
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Fix changelog encoding to encode only space and newline | Aravinda VK | 2017-07-21 | 2 | -13/+33
| | | | | | | | | | | | | | | | | | libgfchangelog was encoding path using spec rfc3986, but encoding only required for SPACE and NEWLINE chars since the NEWLINE char is used as record separator and SPACE as field separator in the parsed changelogs output. Changed the encoding function to encode only SPACE and NEWLINE. BUG: 1451724 Change-Id: I1936efad31788a9e636f912c832ed7d7efea4fe2 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: https://review.gluster.org/17787 Reviewed-by: Prashanth Pai <ppai@redhat.com> Reviewed-by: Kotresh HR <khiremat@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
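Encoding only the separator characters can be sketched as below. The percent-escape forms `%20` and `%0A` are assumed, and escaping a literal `%` as `%25` is an extra assumption added here so that the round trip is unambiguous; the commit message itself only mentions SPACE and NEWLINE. Both function names are hypothetical.

```python
def escape_space_newline(path):
    """Escape only the field and record separators (plus '%', an assumption)."""
    return (path.replace("%", "%25")   # assumption: keeps decoding unambiguous
                .replace(" ", "%20")   # SPACE is the field separator
                .replace("\n", "%0A"))  # NEWLINE is the record separator


def unescape_space_newline(path):
    """Reverse escape_space_newline; '%25' is decoded last."""
    return (path.replace("%20", " ")
                .replace("%0A", "\n")
                .replace("%25", "%"))


original = "dir/my file%20name\nwith/slash"
encoded = escape_space_newline(original)
decoded = unescape_space_newline(encoded)
```

Characters such as `/` pass through untouched, unlike with full RFC 3986 encoding.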
* geo-rep: Fix entry failure because parent dir doesn't exist | Kotresh HR | 2017-07-05 | 1 | -0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In a distributed volume on master, it can so happen that the RMDIR followed by MKDIR is recorded in changelog on a particular subvolume with same gfid and pargfid/bname but not on all subvolumes as below. E 61c67a2e-07f2-45a9-95cf-d8f16a5e9c36 RMDIR \ 9cc51be8-91c3-4ef4-8ae3-17596fcfed40%2Ffedora2 E 61c67a2e-07f2-45a9-95cf-d8f16a5e9c36 MKDIR 16877 0 0 \ 9cc51be8-91c3-4ef4-8ae3-17596fcfed40%2Ffedora2 While processing this changelog, geo-rep thinks RMDIR is successful and does recursive rmdir on slave. But in the master the directory still exists. This could lead to data discrepancy between master and slave. Cause: RMDIR-MKDIR pair gets recorded so in changelog when the directory removal is successful on cached subvolume and failed in one of hashed subvol for some reason (may be down). In this case, the directory is re-created on cached subvol which gets recorded as MKDIR again in changelog. Solution: So while processing RMDIR geo-replication should stat on master with gfid and should not delete it if it's present. Change-Id: If5da1d6462eb4d9ebe2e88b3a70cc454411a133e BUG: 1467718 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: https://review.gluster.org/17695 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Structured log support | Aravinda VK | 2017-06-20 | 7 | -210/+289
| | | | | | | | | | | | | Changed all log messages to structured log format Change-Id: Idae25f8b4ad0bbae38f4362cbda7bbf51ce7607b Updates: #240 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: https://review.gluster.org/17551 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Added metrics related to Sync Time | Aravinda VK | 2017-06-15 | 1 | -2/+11
| | | | | | | | | | | | | | | | | | | | In Geo-rep, Sync jobs can be configured using, `config sync-jobs 3`. This patch adds following information related to the sync job(Rsync/Tarssh) Example output: [2017-06-13 09:09:32.532181] I [master(/bricks/b1):1713:syncjob] Syncer: \ Sync Time Taken (Job:2 Files:5484 ReturnCode:0): 4.8774 secs Change-Id: Ifceb96d4b8d14e00fd1290c0aeff60d64b4d7f37 BUG: 1455179 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: https://review.gluster.org/17531 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Fix string format issue caused due to #17489 | Aravinda VK | 2017-06-13 | 1 | -1/+1
    With Patch #17489, values from the Geo-rep config are always represented as Unicode strings, which is not compatible with the rest of the code. Changed the format with this patch to fix the issue.

    BUG: 1459620
    Change-Id: I935fca0d24f02e90757f688f92ef73fad9f9b8e1
    Signed-off-by: Aravinda VK <avishwan@redhat.com>
    Reviewed-on: https://review.gluster.org/17503
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Fix ConfigInterface Template issue | Aravinda VK | 2017-06-08 | 1 | -0/+6
| | | | | | | | | | | | | | | | | | ConfigParser uses string Template to substitute the dynamic values for config. For some of the configurations, Geo-rep worker will not restart. Due to this conf object may have non string values. If val is not string in Template(val), then it fails with "TypeError: expected string or buffer" BUG: 1459620 Change-Id: I25b8bbc1df42f6f29e9563a55b3e27a228321c44 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: https://review.gluster.org/17489 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kotresh HR <khiremat@redhat.com>
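The failure mode described here is that `string.Template` needs a string to substitute into. One way to guard against non-string values is sketched below; `render_value` and the variable names are hypothetical, and converting with `str()` is an assumption, since the log does not show what the actual fix does with such values.

```python
from string import Template


def render_value(val, variables):
    """Substitute ${name} placeholders, tolerating non-string config values."""
    if not isinstance(val, str):
        # Assumption: convert to text first so Template() receives a string.
        val = str(val)
    return Template(val).substitute(variables)


# Hypothetical variable names for illustration.
variables = {"mastervol": "gv0"}
log_name = render_value("${mastervol}.log", variables)
jobs = render_value(3, variables)
```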
* geo-rep: Fix meta data sync on symlink | Kotresh HR | 2017-06-04 | 1 | -11/+37
    chmod doesn't support a 'no dereference' option. It always dereferences the symlink. But 'chown' does support metadata changes on the symlink itself, which was not taken care of while syncing. This patch fixes the same.

    Change-Id: Ic9985f4e39d15b5a9deb379841bcfb2c263d3e6c
    BUG: 1455559
    Signed-off-by: Kotresh HR <khiremat@redhat.com>
    Reviewed-on: https://review.gluster.org/17389
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    Reviewed-by: Aravinda VK <avishwan@redhat.com>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Amar Tumballi <amarts@redhat.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* geo-rep: Log time taken to sync entries | Kotresh HR | 2017-05-29 | 1 | -1/+60
    Logging the type and count of each fop in every batch helps to know the kind of I/O. Logging the time taken to sync entry ops, metadata ops and data ops gives a good understanding of where most of the time is being spent. This patch does the same.

    Change-Id: Ib52a0f9ede905f28a468b68bdf6d23e4b043f3e3
    BUG: 1455179
    Signed-off-by: Kotresh HR <khiremat@redhat.com>
    Reviewed-on: https://review.gluster.org/17066
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Amar Tumballi <amarts@redhat.com>
    Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Make changelog-batch-size configurable | Kotresh HR | 2017-05-24 | 2 | -9/+8
| | | | | | | | | | | | | | | | | | | Changelog batch size is set to 727040 bytes which is the size of all the changelogs in a single batch. It's based on few tests which approximately processes 5K entries. But it might vary on different machines. Making it configurable gives more control on the frequency of stime updates. This patch does the same. Change-Id: I9a5ebb3d92c1327dded0e0a712c43a5a9046c1b0 BUG: 1454872 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: https://review.gluster.org/17376 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Rsync tunables for performance improvements | Aravinda VK | 2017-05-23 | 3 | -1/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Flag: --ignore-missing-args This Rsync flag reduces sync failures if the source file is unlinked but present in --files-from list. This reduces Rsync retries in Geo-rep and improves the performance Flag: --existing Rsync in Geo-rep never creates target files. Using RPC Geo-rep creates entry in Slave and rsync --inplace used to prevent creating temporary file and rename.(To avoid different GFID in Slave). If the entry is missing in Slave then Geo-rep Rsync gets Permission denied errors when it tries to create file with name as GFID inside .gfid dir.(Geo-rep rsync syncs data using GFIDS with aux-gfid-mount) To disable these flags, gluster volume geo-replication <session> config \ rsync-opt-ignore-missing-args false gluster volume geo-replication <session> config \ rsync-opt-existing false Thanks Kotresh for finding these awesome tunables. BUG: 1400924 Change-Id: I6a84fb86a589bf6edc8dfd1086456a84b05a64fc Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: https://review.gluster.org/16010 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kotresh HR <khiremat@redhat.com>
* geo-rep: Fix mount cleanup  Kotresh HR  2017-04-27  2  -2/+7
| | | | | | | | | | | | | | In corner cases, mount cleanup might cause a worker crash. This patch fixes that. Change-Id: I38c0af51d10673765cdb37bc5b17bb37efd043b8 BUG: 1433506 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: https://review.gluster.org/17015 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* geo-rep: Remove unlink fop during rmdir  Kotresh HR  2017-04-17  1  -10/+10
| | | | | | | | | | | | | | | | Even though the operation is known to be 'RMDIR', os.unlink was tried first and os.rmdir was issued only after receiving EISDIR. The unlink call is unnecessary for 'RMDIR', so it has been removed. Change-Id: I8dbb680ee2c7f0c32b7799b1ed5351b3621cb42a BUG: 1441106 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: https://review.gluster.org/17041 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
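The idea of dispatching directly on the known operation type, instead of trying `os.unlink` and falling back on EISDIR, can be sketched as below. The function `remove_entry` and the string tag `"RMDIR"` as its argument are hypothetical names for illustration, not the patched Geo-rep code; `os.rmdir` and `os.unlink` are the standard library calls named in the commit message.

```python
import os


def remove_entry(path, op):
    """Remove path using the call that matches the operation type.

    op is a hypothetical string tag: "RMDIR" means path is a directory;
    any other value is treated as a file removal.
    """
    if op == "RMDIR":
        os.rmdir(path)   # no os.unlink attempt, so no EISDIR round trip
    else:
        os.unlink(path)
```

This saves one failing system call per directory removal compared with the try-unlink-then-rmdir pattern.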
* geo-rep: Fix EBUSY traceback  Kotresh HR  2017-04-10  2  -2/+2
| | | | | | | | | | | | | | EBUSY was added to the retry list of errno_wrap without being imported, which caused a traceback. This patch adds the missing import. Change-Id: Ide81a9ccc9b948a96265b6890da078b722b45d51 BUG: 1434018 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: https://review.gluster.org/17011 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
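A retry-on-errno wrapper of the kind described can be sketched as follows. This is not the Geo-rep `errno_wrap` function: `call_with_retry` and its parameters are hypothetical, chosen only to show why the errno constant has to be imported. In Python, a name that is referenced without being imported raises NameError when that line runs.

```python
import time
from errno import EBUSY  # must be imported before it is referenced below


def call_with_retry(call, args=(), retry_errnos=(EBUSY,), attempts=3, delay=0.0):
    """Call call(*args), retrying while OSError.errno is in retry_errnos.

    Hypothetical helper: any other errno, or a failure on the final
    attempt, is re-raised to the caller.
    """
    for attempt in range(attempts):
        try:
            return call(*args)
        except OSError as e:
            is_last = attempt == attempts - 1
            if e.errno not in retry_errnos or is_last:
                raise
            time.sleep(delay)
```

A caller might wrap an operation that can transiently fail on a busy mount, for instance `call_with_retry(os.rmdir, (path,), delay=1.0)`.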
* geo-rep: Improve worker log messages  Kotresh HR  2017-04-07  3  -2/+16
| | | | | | | | | | | | | | | | | | | The monitor process expects the worker to establish an SSH tunnel to the slave node and mount the master volume locally within 60 seconds, and then to acknowledge the monitor process by closing the feedback fd. If something goes wrong and the worker does not close the feedback fd within 60 seconds, the monitor kills the worker. Until now the log messages gave no clue about the actual issue. This patch adds a log message that indicates whether the worker hung during SSH or during the master mount. Change-Id: Id08a12fa6f3bba1d4fe8036728dbc290e6c14c8c BUG: 1261689 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: https://review.gluster.org/16997 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
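The "acknowledge by closing the feedback fd" handshake can be sketched with a pipe and `select`. This is a minimal illustration under assumptions, not the monitor code: `wait_for_ack` is a hypothetical name, and the sketch relies on the POSIX behaviour that the read end of a pipe becomes readable (at end-of-file) once every write end is closed.

```python
import os
import select


def wait_for_ack(feedback_fd, timeout):
    """Return True if the peer closed (or wrote to) the pipe within timeout seconds.

    feedback_fd is the read end of a pipe whose write end is held by the
    worker. Closing the write end makes the read end readable at EOF,
    which is what this hypothetical monitor-side helper waits for.
    """
    readable, _, _ = select.select([feedback_fd], [], [], timeout)
    return bool(readable)
```

A monitor built on this sketch would call `wait_for_ack(read_fd, 60)` and, on `False`, log which startup stage the worker had reached before terminating it.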