summaryrefslogtreecommitdiffstats
path: root/geo-replication/syncdaemon/changelogsdb.py
diff options
context:
space:
mode:
authorAravinda VK <avishwan@redhat.com>2016-08-08 17:02:37 +0530
committerAravinda VK <avishwan@redhat.com>2016-08-26 10:45:58 -0700
commit6c283f107b646405936520e2549510115bf2ef64 (patch)
tree67459f0c7a502a68413c5cfad5865ca9dcb240e0 /geo-replication/syncdaemon/changelogsdb.py
parent4a3454753f6e4ddc309c8d1cb11a6e4e432c1da6 (diff)
geo-rep: Post process Data and Meta Changelogs
With this patch, Data and Meta GFIDs are post processed. If Changelog has UNLINK entry then remove from Data and Meta GFIDs list(If stat on GFID is ENOENT in Master). While processing Changelogs, - Collect all the data and meta operations in a temporary database - Delete all Data and Meta GFIDs which are already unlinked as per Changelogs (unlink only if stat on GFID is ENOENT) - Process all Entry operations as usual - Process data and meta operations in batch(Fetch from Db in batch) - Data sync is again batched based on number of changelogs(Default 1day changelogs). Once the sync is complete, Update last Changelog's time as last_synced time as usual. Additionally maintain entry_stime on Brick root, ignore Entry ops if changelog suffix time is less than entry_stime. If data stime is more than entry_stime, this can happen only when passive worker updates stime by itself by getting mount point stime. Use entry_stime = data_stime in this case. New configurations: max-rsync-retries - Default Value is 10 max-data-changelogs-in-batch - Max number of changelogs to be considered in a batch for syncing. Default value is 5760(4 changelogs per min * 60 min * 24 hours) max-history-changelogs-in-batch - Max number of history changelogs to be processed at once. Default value 86400(4 changelogs per min * 60 min * 24 hours * 15 days) BUG: 1364420 Change-Id: I7b665895bf4806035c2a8573d361257cbadbea17 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/15110 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Kotresh HR <khiremat@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Diffstat (limited to 'geo-replication/syncdaemon/changelogsdb.py')
-rw-r--r--geo-replication/syncdaemon/changelogsdb.py111
1 file changed, 111 insertions, 0 deletions
diff --git a/geo-replication/syncdaemon/changelogsdb.py b/geo-replication/syncdaemon/changelogsdb.py
new file mode 100644
index 00000000000..7e64158e7af
--- /dev/null
+++ b/geo-replication/syncdaemon/changelogsdb.py
@@ -0,0 +1,111 @@
+#
+# Copyright (c) 2016 Red Hat, Inc. <http://www.redhat.com>
+# This file is part of GlusterFS.
+
+# This file is licensed to you under your choice of the GNU Lesser
+# General Public License, version 3 or any later version (LGPLv3 or
+# later), or the GNU General Public License, version 2 (GPLv2), in all
+# cases as published by the Free Software Foundation.
+#
+
+import os
+import sqlite3
+from errno import ENOENT
+
# Module-level sqlite3 handles shared by every db_* helper below.
# Both are None until db_init() is called; callers must invoke db_init()
# before any other function in this module.
conn = None
cursor = None
+
+
def db_commit():
    """Flush all recorded/removed rows to disk in a single transaction."""
    conn.commit()
+
+
def db_init(db_path):
    """
    Initialize the temporary changelogs database.

    Removes any stale database (and its sqlite "-journal" file) left
    behind by a previous run, then opens a fresh connection and creates
    empty ``data`` and ``meta`` tables.  Each table maps a GFID to the
    changelog time that first recorded it; the PRIMARY KEY with
    ON CONFLICT IGNORE silently deduplicates repeated inserts of the
    same GFID.  Populates the module-level ``conn``/``cursor`` globals
    used by every other db_* helper.
    """
    global conn, cursor
    # Remove the Temp Db and its journal independently: a missing db
    # file must not prevent cleanup of a leftover journal file (the
    # original code skipped the journal unlink whenever the db file
    # was already gone).
    for stale in (db_path, db_path + "-journal"):
        try:
            os.unlink(stale)
        except OSError as e:
            if e.errno != ENOENT:
                raise

    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()
    # Defensive drop in case unlink raced with another writer.
    cursor.execute("DROP TABLE IF EXISTS data")
    cursor.execute("DROP TABLE IF EXISTS meta")
    query = """CREATE TABLE IF NOT EXISTS data(
    gfid VARCHAR(100) PRIMARY KEY ON CONFLICT IGNORE,
    changelog_time VARCHAR(100)
    )"""
    cursor.execute(query)

    query = """CREATE TABLE IF NOT EXISTS meta(
    gfid VARCHAR(100) PRIMARY KEY ON CONFLICT IGNORE,
    changelog_time VARCHAR(100)
    )"""
    cursor.execute(query)
+
+
def db_record_data(gfid, changelog_time):
    """Remember a GFID whose data changed, tagged with its changelog time.

    Duplicate GFIDs are ignored by the table's PRIMARY KEY conflict
    clause, so only the first recorded changelog_time survives.
    """
    cursor.execute("INSERT INTO data(gfid, changelog_time) VALUES(?, ?)",
                   (gfid, changelog_time))
+
+
def db_record_meta(gfid, changelog_time):
    """Remember a GFID whose metadata changed, tagged with its changelog time.

    Duplicate GFIDs are ignored by the table's PRIMARY KEY conflict
    clause, so only the first recorded changelog_time survives.
    """
    cursor.execute("INSERT INTO meta(gfid, changelog_time) VALUES(?, ?)",
                   (gfid, changelog_time))
+
+
def db_remove_meta(gfid):
    """Forget a pending meta operation for *gfid* (no-op if absent)."""
    cursor.execute("DELETE FROM meta WHERE gfid = ?", (gfid, ))
+
+
def db_remove_data(gfid):
    """Forget a pending data operation for *gfid* (no-op if absent)."""
    cursor.execute("DELETE FROM data WHERE gfid = ?", (gfid, ))
+
+
def db_get_data(start, end, limit, offset):
    """Return a batch of data GFIDs recorded between *start* and *end*.

    The changelog_time range is inclusive (SQL BETWEEN); *limit* and
    *offset* page through the matching rows so callers can sync GFIDs
    in batches instead of loading them all at once.
    """
    query = """SELECT gfid FROM data WHERE changelog_time
    BETWEEN ? AND ? LIMIT ? OFFSET ?"""
    cursor.execute(query, (start, end, limit, offset))
    return [gfid for (gfid,) in cursor]
+
+
def db_get_meta(start, end, limit, offset):
    """Return a batch of meta GFIDs recorded between *start* and *end*.

    The changelog_time range is inclusive (SQL BETWEEN); *limit* and
    *offset* page through the matching rows so callers can sync GFIDs
    in batches instead of loading them all at once.
    """
    query = """SELECT gfid FROM meta WHERE changelog_time
    BETWEEN ? AND ? LIMIT ? OFFSET ?"""
    cursor.execute(query, (start, end, limit, offset))
    return [gfid for (gfid,) in cursor]
+
+
def db_delete_meta_if_exists_in_data():
    """Drop meta entries whose GFID is already queued for data sync.

    A data sync carries the metadata along anyway, so a GFID present in
    both tables only needs to be processed once via the data table.
    """
    cursor.execute("""
    DELETE FROM meta WHERE gfid in
    (SELECT M.gfid
    FROM meta M INNER JOIN data D
    ON M.gfid = D.gfid)
    """)
+
+
def db_get_data_count():
    """Return the number of GFIDs currently pending data sync."""
    row = cursor.execute("SELECT COUNT(gfid) FROM data").fetchone()
    return row[0]
+
+
def db_get_meta_count():
    """Return the number of GFIDs currently pending metadata sync."""
    row = cursor.execute("SELECT COUNT(gfid) FROM meta").fetchone()
    return row[0]