summaryrefslogtreecommitdiffstats
path: root/geo-replication/syncdaemon
Commit message (Collapse)AuthorAgeFilesLines
* geo-rep: Do not use xsync_upper_limit for change detectionAravinda VK2015-03-152-34/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Use register time(xsync_upper_limit) only for stime update, do not use for change detection. Problem 1: If a file created before geo-rep, xtime xattr does not exist. Geo-rep updates xtime of the file to current time if not exists. xtime > upper_limit so geo-rep will not pick those files. Changelog either will have SETXATTR, and fails to sync the file. Problem 2: If a file is created before geo-rep create and updated after geo-rep start. xtime of the file is greater than upper limit(geo-rep start time/changelog register time). Geo-rep(XSync) will not pick this file for syncing. Changelog will have only DATA recorded for that file. Geo-rep tries DATA without any ENTRY ops and fails with rsync error. BUG: 1200733 Change-Id: Ie4e8f284db689d2c755ef8e7ecbb658db1c0785f Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/9855 Reviewed-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Saravanakumar Arumugam <sarumuga@redhat.com> Tested-by: Saravanakumar Arumugam <sarumuga@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* feature/geo-rep: Active Passive Switching logic flockKotresh HR2015-03-154-4/+88
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CURRENT DESIGN AND ITS LIMITATIONS: ----------------------------------- Geo-replication syncs changes across geography using changelogs captured by changelog translator. Changelog translator sits on server side just above posix translator. Hence, in distributed replicated setup, both replica pairs collect changelogs w.r.t their bricks. Geo-replication syncs the changes using only one brick among the replica pair at a time, calling it as "ACTIVE" and other non syncing brick as "PASSIVE". Let's consider below example of distributed replicated setup where NODE-1 as b1 and its replicated brick b1r is in NODE-2 NODE-1 NODE-2 b1 b1r At the beginning, geo-replication chooses to sync changes from NODE-1:b1 and NODE-2:b1r will be "PASSIVE". The logic depends on virtual getxattr 'trusted.glusterfs.node-uuid' which always returns first up subvolume i.e., NODE-1. When NODE-1 goes down, the above xattr returns NODE-2 and that is made 'ACTIVE'. But when NODE-1 comes back again, the above xattr returns NODE-1 and it is made 'ACTIVE' again. So for a brief interval of time, if NODE-2 had not finished processing the changelog, both NODE-2 and NODE-1 will be ACTIVE causing rename race as mentioned in the bug. SOLUTION: --------- 1. Have a shared replicated storage, a glusterfs management volume specific to geo-replication. 2. Geo-rep creates a file per replica set on management volume. 3. fcntl lock on the above said file is used for synchronization between geo-rep workers belonging to same replica set. 4. If management volume is not configured, geo-replication will back to previous logic of using first up sub volume. Each worker tries to lock the file on shared storage, who ever wins will be ACTIVE. With this, we are able to solve the problem but there is an issue when the shared replicated storage goes down (when all replicas goes down). In that case, the lock state is lost. So AFR needs to rebuild the lock state after brick comes up. NOTE: ----- This patch brings in the, pre-requisite step of setting up management volume for geo-replication during creation. 1. Create mgmt-vol for geo-replicatoin and start it. Management volume should be part of master cluster and recommended to be three way replicated volume having each brick in different nodes for availability. 2. Create geo-rep session. 3. Configure mgmt-vol created with geo-replication session as follows. gluster vol geo-rep <mastervol> slavenode::<slavevol> config meta_volume \ <meta-vol-name> 4. Start geo-rep session. Backward Compatiability: ----------------------- If management volume is not configured, it falls back to previous logic of using node-uuid virtual xattr. But it is not recommended. Change-Id: I7319d2289516f534b69edd00c9d0db5a3725661a BUG: 1196632 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/9759 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* geo-rep: Handle ENOENT during cleanupAravinda VK2015-03-051-1/+6
| | | | | | | | | | | | | | | | shutil.rmtree was failing to remove file if file was not exists. Added error handling function to ignore ENOENT if a file/dir not present. BUG: 1198101 Change-Id: I1796db2642f81d9e2b5e52c6be34b4ad6f1c9786 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/9792 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Prashanth Pai <ppai@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Reviewed-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Saravanakumar Arumugam <sarumuga@redhat.com>
* geo-rep: Add support for xattrsAravinda VK2015-03-023-3/+13
| | | | | | | | | | | | | | | | | | | | | | This patch adds support for xattrs. When it sees SETXATTR in Changelog, it adds the file to data queue. rsync/tar+ssh will take care of syncing xattrs. User set xattrs will be synced to Slave. New config interface is introduced, sync-xattrs Which can be set using geo-rep config(Default is True) gluster volume geo-replication <VOLUME> <SLAVEHOST>::<SLAVEVOL> \ config sync-xattrs false Change-Id: I70626d854a0d616469dd54d61e5ef155ed8b67d8 BUG: 1196690 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/9499 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Saravanakumar Arumugam <sarumuga@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* geo-rep: Archive Changelogs and avoid generating empty XSync changelogsAravinda VK2015-02-192-5/+79
| | | | | | | | | | | | | | | With this patch, - Hybrid Crawl will not generate empty Changelogs - Archives Changelogs when processed(Hybrid(XSync), History, and Changelog Crawl - Passive worker cleans up its processing directory BUG: 1169331 Change-Id: I1383ffaed261cdf50da91b14260b4d43177657d1 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/9453 Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Venky Shankar <vshankar@redhat.com>
* geo-rep: Fixing the typo errorsarao2015-02-036-13/+13
| | | | | | | | | Change-Id: Iacc67e4ba9ac45e0858f3befe84ffb8fccf7e1c3 BUG: 1075417 Signed-off-by: arao <arao@redhat.com> Reviewed-on: http://review.gluster.org/9502 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Humble Devassy Chirammal <humble.devassy@gmail.com>
* geo-rep: Handle Volume status error while getting slave nodesAravinda VK2015-01-121-1/+6
| | | | | | | | | | | | | | | gluster volume status command not returns xml output, when any error like "Transaction in Progress", we need to handle returncode along with xml error. BUG: 1151412 Signed-off-by: Aravinda VK <avishwan@redhat.com> Change-Id: Id5b7712df7cff58744b4c5a0d00870aec1d926a8 Reviewed-on: http://review.gluster.org/9432 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Venky Shankar <vshankar@redhat.com>
* geo-rep: Error handling in tar+ssh modeAravinda VK2014-12-291-12/+12
| | | | | | | | | | | | | | | | | | | | | | | Georep raises exception if tar+ssh fails and worker dies due to the exception. This patch adds resilience to tar+ssh error and geo-rep worker retries when error, and skips those changelogs after maximum retries.(same as rsync mode) Removed warning messages for each rsync/tar+ssh failure per GFID, since skipped list will be populated after Max retry. Retry changelog files log also available, hence warning message for each GFID is redundent. BUG: 1177527 Change-Id: I3019c5c1ada7fc0822e4b14831512d283755b1ea Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/9356 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Venky Shankar <vshankar@redhat.com>
* geo-rep: Sync atime, mtime to slave when SETATTRAravinda VK2014-12-252-3/+15
| | | | | | | | | | | | Existing geo-rep only syncs chown and chmod, with this patch geo-rep also syncs atime and mtime. Change-Id: Iea52d86682873bb4a47eeb0d325f5b9ddf2de2cf Signed-off-by: Aravinda VK <avishwan@redhat.com> BUG: 1176934 Reviewed-on: http://review.gluster.org/9331 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* geo-rep: gid is not set in entry opsAravinda VK2014-10-281-2/+2
| | | | | | | | | | | | | | uid is sent in place of gid while CREATE and MKDIR. Change-Id: Icd1072cb9dcbfc1f419a3cdd456f3d02168175fa BUG: 1104954 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/7984 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Venky Shankar <vshankar@redhat.com>
* geo-rep: Failover when a Slave node goes downAravinda VK2014-10-191-5/+64
| | | | | | | | | | | | | | | | | | When a slave node goes down, worker in master node can connect to different slave node and resume the operation. Existing georep was not checking the status of slave node before worker restart. With this patch, geo-rep worker will check the node status using `gluster volume status` when it goes faulty. BUG: 1151412 Change-Id: If3ab7fdcf47f5b3f3ba383c515703c5f1f9dd668 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/8921 Reviewed-by: Kotresh HR <khiremat@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Venky Shankar <vshankar@redhat.com>
* feature/geo-rep: Fix skipped files logging.Kotresh H R2014-10-161-0/+10
| | | | | | | | | | | | | | | | When changelog is failed to process even after max retries, files are skipped but were not logging. This patch addresses this issue to log the skipped files. Change-Id: Ic617e4151231fe18a171efa57a119e21164e1a6a BUG: 1103577 Signed-off-by: Kotresh H R <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/7945 Reviewed-by: Aravinda VK <avishwan@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Venky Shankar <vshankar@redhat.com>
* geo-rep: Fix rename of directory syncing.Kotresh HR2014-09-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | The rename of directories are captured in all distributed brick changelogs. gsyncd processess these changelogs on each brick parallellaly. The first changelog to get processed will be successful. All subsequent ones will stat the 'src' and if not present, tries to create freshly on slave. It should be done only for files and not for directories. Hence when this code path was hit, regular file's blob is sent as directory's blob and gfid-access translator was erroring out as 'Invalid blob length' with errno as 'ENOMEM' Change-Id: I50545b02b98846464876795159d2446340155c82 BUG: 1146823 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/8865 Reviewed-by: Aravinda VK <avishwan@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Venky Shankar <vshankar@redhat.com>
* geo-rep: fix same file different gfid in master and slaveAravinda VK2014-09-262-17/+38
| | | | | | | | | | | | | | | | | | | | | While processing RENAME in changelog, if the file is unlinked in master, then geo-rep was sending UNLINK to slave instead of RENAME. If rsync job fails if one of the file failed to sync in the job. This patch adds logic to remove GFID from data list if the same changelog has UNLINK entry for it after the DATA. Or it removes those GFIDs during retry of changelogs processing. BUG: 1143853 Change-Id: I982dc976397cd0ab676bb912583f66a28f821926 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/8761 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Venky Shankar <vshankar@redhat.com>
* geo-rep: Fixing issue with xsync upper limitAravinda VK2014-08-192-59/+50
| | | | | | | | | | | | | | | | | | | | While identifying the file/dir to sync, xtime of the file was compared with xsync_upper_limit as `xtime < xsync_upper_limit` After the sync, xtime of parent directory is updated as stime. With the upper limit condition, stime is updated as MIN(xtime_parent, xsync_upper_limit) With this files will get missed if `xtime_of_file == xsync_upper_limit` With this patch xtime_of_file is compared as xtime_of_file <= xsync_upper_limit BUG: 1128093 Change-Id: I5022ce67ba503ed4621531a649a16fc06b2229d9 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/8439 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* geo-rep: Handle RMDIR recursivelyAravinda VK2014-08-141-0/+12
| | | | | | | | | | | | | | | | If RMDIR is recorded in brick changelog which is due to self heal traffic then it will not have UNLINK entries for child files. Geo-rep hangs with ENOTEMPTY error on slave. Now geo-rep recursively deletes the dir if it gets ENOTEMPTY. BUG: 1129702 Change-Id: Iacfe6a05d4b3a72b68c3be7fd19f10af0b38bcd1 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/8477 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Venky Shankar <vshankar@redhat.com>
* build: make GLUSTERD_WORKDIR rely on localstatedirHarshavardhana2014-08-071-7/+3
| | | | | | | | | | | | | | | | | | | | | | - Break-way from '/var/lib/glusterd' hard-coded previously, instead rely on 'configure' value from 'localstatedir' - Provide 's/lib/db' as default working directory for gluster management daemon for BSD and Darwin based installations - loff_t is really off_t on Darwin - fix-off the warnings generated by clang on FreeBSD/Darwin - Now 'tests/*' use GLUSTERD_WORKDIR a common variable for all platforms. - Define proper environment for running tests, define correct PATH and LD_LIBRARY_PATH when running tests, so that the desired version of glusterfs is used, regardless where it is installed. (Thanks to manu@netbsd.org for this additional work) Change-Id: I2339a0d9275de5939ccad3e52b535598064a35e7 BUG: 1111774 Signed-off-by: Harshavardhana <harsha@harshavardhana.net> Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/8246 Tested-by: Gluster Build System <jenkins@build.gluster.com>
* geo-rep: minimize xsync crawl usage and set upper limit to xsync crawlAravinda VK2014-07-232-38/+118
| | | | | | | | | | | | | | | | | | | | | | | | | | | | For effective handling of deletes and renames use history crawl as much as possible. History crawl will run in loop till it syncs all data before live changelog time. When it uses xsync crawl(fallback when changelog not available, or very first crawl) it sets upper limit to crawl. After completing History crawl, it checks actual end time returned by history api to compare with register time, if actual end is less than register time then run history crawl one more time. If first turn history processing time is less than the CHANGELOG ROLLOVER TIME then sleep for the difference, After sleep if it is guaranteed that rollover will happen and switches to live changelog consumption without switching to xsync. This sleep is only when history processing completed < CHANGELOG_ROLLOVER_TIME and sleep only after the first turn, So will not affect the performance. BUG: 1112238 Change-Id: I644ef24b07e42e81cec96a025ebd21244a555ec0 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/8151 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Venky Shankar <vshankar@redhat.com>
* geo-rep: Avoid duplicate stat in xsync changelog processingAravinda VK2014-07-091-6/+24
| | | | | | | | | | | | | | | | | | | | When A file/dir is identified for metadata sync, it was doing duplicate stat to get the metadata to sync. With this patch it avoids doing one additional stat call. Xsync performance will improve. rsync will copy files metadata, so no need to include for processing. BUG: 1111490 Change-Id: I79dad6375fa4742d9aaca7d9856993c184a744dc Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/8124 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Venky Shankar <vshankar@redhat.com>
* geo-rep: History Change detector method select issueAravinda VK2014-07-011-1/+2
| | | | | | | | | | | | | | | | | Geo-rep does history crawl even if change-detector is set to xsync. This patch fixes by taking priority to user configured change detector. BUG: 1113525 Change-Id: Ic6c34e187c9cb6608c9ef8a010ea07015ba60a80 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/8183 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Venky Shankar <vshankar@redhat.com>
* geo-rep: Fix the fd leak in worker/agent spawnAravinda VK2014-06-301-2/+5
| | | | | | | | | | | | | | | | | | worker and agent uses pipe to communicate, if worker dies for some reason agent should get EOF and terminate. Each worker-agent spawning is done in thread, Due to race if multiple workers in same node retain the pipe refs of other workers. Hence agent will not get EOF even if worker dies. BUG: 1114003 Change-Id: I36b9709b9392299483606bd3ef1db764fa3f2bff Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/8194 Tested-by: Justin Clift <justin@gluster.org> Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Venky Shankar <vshankar@redhat.com>
* feature/geo-rep: Fix to retain pause state of gsyncd on restartKotresh H R2014-06-202-14/+18
| | | | | | | | | | | | | | On soft reboot, geo-rep monitor is writing 'faulty' into status file. It should not do it if previous state is paused as glusterd depend on the state file on node restart. Change-Id: Idd45abf13350b087371935f1b4f6e1a346433d27 BUG: 1101410 Signed-off-by: Kotresh H R <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/8097 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Venky Shankar <vshankar@redhat.com>
* feature/geo-rep: Fix for changelog agent becoming zombie.Kotresh H R2014-06-171-4/+12
| | | | | | | | | | | | | | | | | | Monitor process spawns changelog agent and is not wait on it, hence becoming zombie. When worker is dies/killed, it respawns both worker and corresponding agent leaving the earlier changelog agent in zombie state. This patch addresses this issue by waiting on agent process in montor process. Change-Id: I571b7d6487133848edca67e7446f1caa70ae01c9 BUG: 1103643 Signed-off-by: Kotresh H R <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/7956 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Aravinda VK <avishwan@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Venky Shankar <vshankar@redhat.com>
* geo-rep: Making replica failover check interval configurableAravinda VK2014-06-112-1/+3
| | | | | | | | | | | | | | | | | | Replica failover check interval is hardcoded to 60 sec by default. Now this option is made configurable and defaulted to 1 sec. To change the default value gluster volume geo-replication <MASTERVOL> \ <SLAVEHOST>::<SLAVEVOL> config replica_failover_interval 15 Change-Id: Iada1b80d510452dcfedebd8a21bebd62394b0597 BUG: 1066410 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/8003 Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: Venky Shankar <vshankar@redhat.com>
* geo-rep: entry_ops errors handlingAravinda VK2014-06-112-8/+4
| | | | | | | | | | | | | | | Xattr.lsetxattr_l call will not raise OSError which errno_wrap can handle, so we need to use Xattr.lsetxattr to get proper OSError and errno_wrap ignores or retries accordingly. Change-Id: Ie0a777152ddbaf9ed80c977e4704974fec997bea BUG: 1105083 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/7972 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Venky Shankar <vshankar@redhat.com>
* gsyncd / geo-rep: FSH recommended log locationsVenky Shankar2014-06-104-8/+31
| | | | | | | | | | | | | | | | Upgrading "working_dir" on the fly is a bit unclean yet (though it works) as currently config upgrade does not support "old" values to be expanded by using configuration variables. Change-Id: I44ed65c281f2e0ce3b6b467addc5c1c88ac674e7 BUG: 1077516 Signed-off-by: Venky Shankar <vshankar@redhat.com> Signed-off-by: Kotresh H R <khiremat@redhat.com> Signed-off-by: Aravinda VK <avishwan@redhat.com> Signed-off-by: Ajeet Jha <ajha@redhat.com> Reviewed-on: http://review.gluster.org/7375 Tested-by: Gluster Build System <jenkins@build.gluster.com>
* gsyncd / geo-rep: Xsync crawl metadata synchronizationVenky Shankar2014-06-101-0/+2
| | | | | | | | | | | | | Added "metadata" record for directory and file creations during the intial crawl. Change-Id: I811ae26e0144cadf7249cb64541ec354ab83fe66 BUG: 1106604 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/8018 Reviewed-by: Aravinda VK <avishwan@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* feature/geo-rep: Fix to retain pause state of gsyncd on restart.Kotresh H R2014-06-053-2/+16
| | | | | | | | | | | | | | | A new gsyncd options '--pause-on-start' is introduced. When node reboots, if the status is paused, gsyncd is started with this option. After gsyncd spawns worker and agent, worker will send SIGSTOP to negative pid of monitor to enter pause mode. Change-Id: I5aad82c9a9fc8c243f384940b77d25e26e520d6d BUG: 1101410 Signed-off-by: Kotresh H R <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/7885 Reviewed-by: Aravinda VK <avishwan@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Venky Shankar <vshankar@redhat.com>
* gsyncd / geo-rep: Mountbroker cli to use INET socketsVenky Shankar2014-05-251-1/+1
| | | | | | | | | | | | | | | | | | | | unprivileged geo-replication session runs the slave gsyncd process as unprivileged, thereby executing gluster cli commands as an unprivileged user. By default, cli to glusterd uses unix domain sockets, thereby restricting cli command execution by non root users. This patch introduces '--remote-host' cli option to force cli to use INET socket. For this to work, the following needs to be added in glusterd volfile option rpc-auth-allow-insecure on Change-Id: I84b1711281bbcbde156200f80ebdb065afb55488 BUG: 1077452 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/7820 Tested-by: Gluster Build System <jenkins@build.gluster.com>
* gsyncd / geo-rep: fix cli query for volinfo fetchVenky Shankar2014-05-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | With an unprivileged geo-replication session, monitor was using user@slave for --remote-host option for gluster cli, thereby failing to sucessfully connect to the slave glusterd. This patch fixes the issue by selecting the hostname/IP from the speicified slave endpoint url. - For privileged geo-replication sessions, this patch has no effect as the slave endpoint url is just the hostname/IP. Change-Id: I88f66c406a8d9a34db7fc626965f949075e3ceac BUG: 1077452 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/7818 Reviewed-by: Aravinda VK <avishwan@redhat.com> Reviewed-by: Kotresh HR <khiremat@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* gsyncd/geo-rep: Fix remote vol info fetching for non-rootKotresh H R2014-05-151-1/+1
| | | | | | | | | | Signed-off-by: Kotresh H R <khiremat@redhat.com> Change-Id: If1d2cab3fcfe2391105551e54f0b9729a7c204e4 BUG: 1077452 Reviewed-on: http://review.gluster.org/7767 Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Venky Shankar <vshankar@redhat.com>
* gsyncd / geo-rep: Partial support for Non-root geo-replication.Venky Shankar2014-05-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | This patch enables geo-replication to be run as an unprivileged user. As of now, this is just the partial support, but is very close to achieve full functionality. Current limitation * Geo-replication executed Gluster CLI commands on the slave via SSH. On a non-root setup, Gluster CLI would run as an unprivileged user, failing to execute the command. As a workaround (for testing), setuid(2) Gluster CLI executable or use the glusterd option to accept commands by unprivileged CLI process. The nature of cli commands are "system::" commands (for key management) and remote volume info fetching. Remote volume info fetching has been modified to use --remote-host gluster cli option rather than ssh and remote cli execution. Change-Id: Ica89e2ba9b7f48fd6e1c876c477d7822dc693617 BUG: 1077452 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/7658 Tested-by: Gluster Build System <jenkins@build.gluster.com>
* geo-rep: Changelog History API changesAravinda VK2014-05-132-10/+17
| | | | | | | | | | | | | | | Additional argument added to API gf_history_changelog, actual_end - The end time till where changelogs are available. Added sort to history_get_changes API output. BUG: 1091961 Change-Id: Id043409882a83cd0a7b9adc3d34d5147d17e532e Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/7747 Reviewed-by: ajeet jha <ajha@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Venky Shankar <vshankar@redhat.com>
* geo-rep: Pause and Resume feature for geo-replicationAravinda VK2014-05-097-71/+162
| | | | | | | | | | | | | | | | Changelog consumption/processing now happens in seperate process group than monitor. When monitor process group gets SIGSTOP all worker process, ssh, rsync will be paused except the changelog processing. When it gets SIGCONT it resumes its operation. Changelog agent runs as RepceServer, geo-rep worker communicates with changelog agent using RepceClient. Change-Id: I35c333e4d8b13d03a7808aed601960eef23cfa04 BUG: 1093602 Signed-off-by: Venky Shankar <vshankar@redhat.com> Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/7322
* geo-rep: Changelog History consumption more fixesAravinda VK2014-05-093-12/+14
| | | | | | | | | | | Number of parallel threads to process changelog history is made configurable via sync_jobs Change-Id: Idcd8e655d9df540cfa48648b9e98af941f95e9d0 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/7660 Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Venky Shankar <vshankar@redhat.com>
* geo-rep: Loading libgfchangelog.so only while running geo-repAravinda VK2014-05-093-5/+6
| | | | | | | | | | | | | | | | | In source install, libgfchangelog is installed in /usr/local/lib When glusterd runs /usr/local/libexec/glusterfs/python/gsyncd --version it fails to find library without LD_LIBRARY_PATH. This patch avoids loading library when it is run from glusterd during start. BUG: 1096026 Change-Id: I59912227ac27ff4877d947a7c8f1fe2e8c5be06e Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/7713 Reviewed-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Venky Shankar <vshankar@redhat.com>
* geo-rep: Consume Changelog History APIAravinda VK2014-04-304-31/+198
| | | | | | | | | | | | | | | | | | | | | | | | Every time when geo-rep restarts it first does FS crawl using XCrawl and then switches to Changelog Mode. This is because changelog only had live API, that is we can get changes only after registering. Now this(http://review.gluster.org/#/c/6930/) patch introduces History API for changelogs. If history is available then geo-rep will use it instead of FS Crawl. History API returns TS till what time history is available for given start and end time. If TS < endtime then switch to FS Crawl. (History => FS Crawl => Live Changelog) If TS >= endtime, then switch directly to Changelog mode (History => Live Changelog) Change-Id: I4922f62b9f899c40643bd35720e0c81c36b2f255 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/6938 Reviewed-by: Venky Shankar <vshankar@redhat.com> Reviewed-by: Humble Devassy Chirammal <humble.devassy@gmail.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* geo-rep: fix the code bug introduced due to flake8 refactoringAravinda VK2014-04-081-6/+9
| | | | | | | | | | | Sorry for the bug, which got introduced due to code refactoring. http://review.gluster.org/#/c/7311/ Change-Id: Ide519ca114aa8a7d7624d7af99945c857f069069 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/7417 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* geo-rep: code pep8/flake8 fixesAravinda VK2014-04-0712-447/+912
| | | | | | | | | | | | | | | | | | | | | | | | | | | pep8 is a style guide for python. http://legacy.python.org/dev/peps/pep-0008/ pep8 can be installed using, `pip install pep8` Usage: `pep8 <python file>`, For example, `pep8 master.py` will display all the coding standard errors. flake8 is used to identify unused imports and other issues in code. pip install flake8 cd $GLUSTER_REPO/geo-replication/ flake8 syncdaemon Updated license headers to each source file. Change-Id: I01c7d0a6091d21bfa48720e9fb5624b77fa3db4a Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/7311 Reviewed-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Prashanth Pai <ppai@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* geo-rep: Fix ValueError - signal only works in main threadAravinda VK2014-03-201-8/+7
| | | | | | | | | | | | | | | | | | | | When a worker process not confirmed within 60 seconds of start then monitor thread was terminated instead of stopping and restarting the worker thread. Before terminate monitor thread tries to add a signal handler for SIGTERM to cleanup the stuff before terminate. Signal handling will not work inside thread, so ValueError was raised. This patch will not terminate monitor thread, instead only kills and restarts the worker. Change-Id: I14df26c0cc3097af29293c81536c13b86075e28f BUG: 1078068 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/7294 Reviewed-by: Venky Shankar <vshankar@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* geo-rep: quick-fix for remote xtime set failedKotresh H R2014-03-071-1/+1
| | | | | | | | | | | | | | | | Remote xtime is required for failover/failback, this patch is quick fix to avoid the OSError. Code is masked out, this need to be resolved when failover/failback is worked on. Change-Id: If339d88a2ccd8ef18a3b3c015df765c93dcb020c BUG: 1073844 Signed-off-by: Kotresh H R <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/7206 Reviewed-by: Aravinda VK <avishwan@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* geo-rep: Config file upgradeAravinda VK2014-02-102-1/+55
| | | | | | | | | | | | | | | | | | | When old config file is used with new geo-rep, config item like 'georep_session_working_dir' was missing in old config file. With this patch geo-rep sets the default value for new items. Following config options supported: - georep_session_working_dir - gluster_params - ssh_command_tar BUG: 1036539 Change-Id: I389c62e749f3b567f9ecf96d4b41367ef962c025 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/6934 Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* gsyncd / geo-rep: invoke changelog process() on non-empty change listVenky Shankar2014-02-061-1/+2
| | | | | | | | | Change-Id: Ida4890abdc90d683a4a83127a1573bbb3829ea23 BUG: 1036539 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/6793 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* gsyncd / geo-rep: ignore DHTs sticky bit file during crawlVenky Shankar2014-02-062-0/+25
| | | | | | | | | | Change-Id: Ide927759c6a3d5301475eac9f6e785aa901d426e BUG: 1036539 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/6792 Reviewed-by: Kotresh HR <khiremat@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* gsyncd / geo-rep: "patch" up missing stimeVenky Shankar2014-02-062-0/+5
| | | | | | | | | | | | | In cases (mostly upgrade) of unavailability of "stime" key and availability of "xtime" (slave's xtime), introduce "stime" key on the fly by setting it to the value to "xtime". Change-Id: Iaa424662d838154c8abc2cf00830c7f9d6be45ac BUG: 1036539 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/6791 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* gsyncd / geo-rep: cleanup the "tar" processVenky Shankar2014-02-061-0/+4
| | | | | | | | | | | | A missing cleanup for the "tar" process (when tar+ssh is used as the sync engine). Change-Id: Ib9599b43e7ec606c70b7c5598793417142be3c0b BUG: 1036539 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/6794 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* geo-rep: get new config value after config value resetAravinda VK2014-01-231-0/+1
| | | | | | | | | | | | | | | | | | | | When config.read is called it preserves the previous values from the the previously opened config file. For example import ConfigParser config = ConfigParser.RawConfigParser() config.read("defaults.conf") config.read("preferences.conf") When change in config file is identified it will open new instance of config to avoid getting old config values. Change-Id: Iec677e61ebd2c59c95aea94481f569d78bd913e4 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/6747 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* geo-rep: optimizing update stime after directory synchronizationAravinda VK2014-01-221-6/+14
| | | | | | | | | | | | | | | | | | | | | Since xsync crawl generates new changelog when number of entries reaches 8K or when directory is reached. If a directory has number of files less than 8K then respective changelog file will have less entries. Since xsync generated changelog files processed one after the other, so syncjobs are underutilized. hence low bandwidth utilization. With this patch, changelog will be generated for 8K entries only, but stime will be accumulated. Multiple dirs stime will be updated together since the generated changelog will have entries accross the dirs. Change-Id: Ib0b40962a070f855f47f887d0840e412fb7928e1 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/6744 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Venky Shankar <vshankar@redhat.com>
* gsyncd / geo-rep: geo-replication fixesAjeet Jha2013-12-125-267/+761
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | -> "threaded" hybrid crawl. -> Enabling metatadata synchronization. -> Handling EINVAL/ESTALE gracefully while syncing metadata. -> Improvments to changelog crawl code. -> Initial crawl changelog generation format. -> No gsyncd restart when checkpoint updated. -> Fix symlink handling in hybrid crawl. -> Slave's xtime key is 'stime'. -> tar+ssh as data synchronization. -> Instead of 'raise', just log in warning level for xtime missing cases. -> Fix for JSON object load failure -> Get new config value after config value reset. -> Skip already processed changelogs. -> Saving status of each individual worker thread. -> GFID fetch on slave for purges. -> Add tar ssh keys and config options. -> Fix nlink count when using backend. -> Include "data" operation for hardlink. -> Use changelog time prefix as slave's time. -> Process changelogs in parallel. Change-Id: I09fcbb2e2e418149a6d8435abd2ac6b2f015bb06 BUG: 1036539 Signed-off-by: Ajeet Jha <ajha@redhat.com> Reviewed-on: http://review.gluster.org/6404 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* geo-rep: logrotate: Logrotate handlingAravinda VK2013-10-032-15/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In existing georep logrotate was implemented by handling SIGSTOP and SIGCONT, gsyncd was failing to start again after SIGSTOP. New approach uses WatchedFileHandler in logging, which tracks the log file changes or logrotate. Reopens the log file if logrotate is triggered or if same log file is updated from other process. As per python doc: http://docs.python.org/2/library/logging.handlers.html: The WatchedFileHandler class, located in the logging.handlers module, is a FileHandler which watches the file it is logging to. If the file changes, it is closed and reopened using the file name. A file change can happen because of usage of programs such as newsyslog and logrotate which perform log file rotation. This handler, intended for use under Unix/Linux, watches the file to see if it has changed since the last emit. (A file is deemed to have changed if its device or inode have changed.) If the file has changed, the old file stream is closed, and the file opened to get a new stream. Change-Id: I30f65eb1e9778b12943d6e43b60a50344a7885c6 BUG: 1012776 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/5968 Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Harshavardhana <harsha@harshavardhana.net> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>