author     Avra Sengupta <asengupt@redhat.com>    2015-02-09 18:03:20 +0530
committer  Vijay Bellur <vbellur@redhat.com>      2015-03-18 18:31:54 -0700
commit     7c4461329bba38b72536ee71a8172bc861ddf890 (patch)
tree       c9cb709f17892f20b3e3addafad5f62a41590b1e /extras
parent     3e18f093974c85ac92a4c48f0cd13aa9ff9c5cac (diff)
snapshot/scheduling: A cron based scheduler for snapshot scheduling
GlusterFS volume snapshot provides a point-in-time copy of a GlusterFS volume. Currently, GlusterFS volume snapshots can be scheduled by setting up cron jobs on one of the nodes in the GlusterFS trusted storage pool. This is a single point of failure (SPOF), as scheduled jobs can be missed if the node running the cron jobs dies. This patch addresses that problem. The snap_scheduler.py helper script expects the user to install the argparse python module before using it.

Further details are available at:
http://www.gluster.org/community/documentation/index.php/Features/Scheduling_of_Snapshot

Change-Id: I2c357af5b7d3e66f270d20eef50cdeecdcbe15c7
BUG: 1198027
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
Signed-off-by: Aravinda VK <avishwan@redhat.com>
Reviewed-on: http://review.gluster.org/9788
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Diffstat (limited to 'extras')
-rw-r--r--  extras/Makefile.am                       |   2
-rw-r--r--  extras/snap_scheduler/Makefile.am        |   7
-rw-r--r--  extras/snap_scheduler/README.md          | 125
-rwxr-xr-x  extras/snap_scheduler/gcron.py           | 146
-rwxr-xr-x  extras/snap_scheduler/snap_scheduler.py  | 525
5 files changed, 804 insertions(+), 1 deletion(-)
diff --git a/extras/Makefile.am b/extras/Makefile.am
index 89f69440423..1b66708fd90 100644
--- a/extras/Makefile.am
+++ b/extras/Makefile.am
@@ -5,7 +5,7 @@ EditorModedir = $(docdir)
EditorMode_DATA = glusterfs-mode.el glusterfs.vim
SUBDIRS = init.d systemd benchmarking hook-scripts $(OCF_SUBDIR) LinuxRPM \
- $(GEOREP_EXTRAS_SUBDIR) ganesha
+ $(GEOREP_EXTRAS_SUBDIR) ganesha snap_scheduler
confdir = $(sysconfdir)/glusterfs
conf_DATA = glusterfs-logrotate gluster-rsyslog-7.2.conf gluster-rsyslog-5.8.conf \
diff --git a/extras/snap_scheduler/Makefile.am b/extras/snap_scheduler/Makefile.am
new file mode 100644
index 00000000000..24aa8241727
--- /dev/null
+++ b/extras/snap_scheduler/Makefile.am
@@ -0,0 +1,7 @@
+snap_schedulerdir = $(DESTDIR)$(sbindir)/
+
+snap_scheduler_SCRIPTS = gcron.py snap_scheduler.py
+
+EXTRA_DIST = gcron.py snap_scheduler.py
+
+CLEANFILES =
diff --git a/extras/snap_scheduler/README.md b/extras/snap_scheduler/README.md
new file mode 100644
index 00000000000..1316bb76469
--- /dev/null
+++ b/extras/snap_scheduler/README.md
@@ -0,0 +1,125 @@
+Snapshot Scheduler
+==============================
+
+SUMMARY
+-------
+
+GlusterFS volume snapshot provides a point-in-time copy of a GlusterFS volume. Currently, GlusterFS volume snapshots can be scheduled by setting up cron jobs on one of the nodes in the GlusterFS trusted storage pool. This is a single point of failure (SPOF), as scheduled jobs can be missed if the node running the cron jobs dies.
+
+We can avoid the SPOF by distributing the cron jobs to all nodes of the trusted storage pool.
+
+DETAILED DESCRIPTION
+--------------------
+
+The solution to the above problem involves the use of:
+
+* A shared storage - Any shared storage (another gluster volume, an NFS mount, etc.) that is used to share the schedule configuration and to help coordinate the jobs.
+* An agent - The agent performs the actual snapshot commands instead of cron, and contains the logic to perform coordinated snapshots.
+* A helper script - The script allows the user to initialise the scheduler on the local node, enable/disable scheduling, and add/edit/list/delete snapshot schedules.
+* cronie - The default cron daemon shipped with RHEL. It invokes the agent at the intervals specified in the user's schedule to perform the snapshot operation on the scheduled volume.
+
+INITIAL SETUP
+-------------
+
+The administrator first needs to create a shared storage that is available to all nodes across the cluster. A GlusterFS volume can be used for this purpose. It is preferable that the *shared volume* be a replicated volume, to avoid a SPOF.
+
+Once the shared storage is created, it should be mounted on all nodes in the trusted storage pool that will participate in the scheduling. The location where the shared storage must be mounted on these nodes (/var/run/gluster/snaps/shared_storage) is fixed and is not configurable. Each node participating in the scheduling then needs to initialise the snapshot scheduler by invoking the following:
+
+snap_scheduler.py init
+
+NOTE: This command needs to be run on all the nodes participating in the scheduling
+
+HELPER SCRIPT
+-------------
+
+The helper script (snap_scheduler.py) initialises the scheduler on the local node, enables/disables scheduling, and adds/edits/lists/deletes snapshot schedules.
+
+a) snap_scheduler.py init
+
+This command initialises the snap scheduler and interfaces it with the crond running on the local node. It is the first step, and must be performed before executing any scheduling-related commands from a node.
+
+NOTE: The helper script needs to be run with this option on all the nodes participating in the scheduling. Other options of the helper script can be run independently from any node where initialisation has completed successfully.
+
+b) snap_scheduler.py enable
+
+The snap scheduler is disabled by default after initialisation. This command enables the snap scheduler.
+
+c) snap_scheduler.py disable
+
+This command disables the snap scheduler.
+
+d) snap_scheduler.py status
+
+This command displays the current status (Enabled/Disabled) of the snap scheduler.
+
+e) snap_scheduler.py add "Job Name" "Schedule" "Volume Name"
+
+This command adds a new snapshot schedule. It takes three arguments, each of which must be provided within double quotes (""):
+
+-> Job Name: This name uniquely identifies this particular schedule, and can be used to reference this schedule for future events like edit/delete. If a schedule already exists for the specified Job Name, the add command will fail.
+
+-> Schedule: Schedules are accepted in the format that crond understands:
+
+Example of job definition:
+.---------------- minute (0 - 59)
+| .------------- hour (0 - 23)
+| | .---------- day of month (1 - 31)
+| | | .------- month (1 - 12) OR jan,feb,mar,apr ...
+| | | | .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
+| | | | |
+* * * * * user-name command to be executed
+Although all valid cron schedules are accepted, the supported granularity of snapshot schedules is currently limited to a maximum of half-hourly snapshots.
+
+-> Volume Name: The name of the volume on which the scheduled snapshot operation will be performed.
+
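For illustration (the job name "Job0" and volume name "test_vol" here are hypothetical), a half-hourly schedule added as `snap_scheduler.py add "Job0" "*/30 * * * *" "test_vol"` would be translated by the helper script into a crond entry of the form:

```
*/30 * * * * root PATH=$PATH:/usr/local/sbin:/usr/sbin gcron.py test_vol Job0
```

This is the entry format written by write_tasks_to_file in snap_scheduler.py: the schedule, the invoking user (root), and the gcron.py agent with the volume name and job name as its arguments.
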
+f) snap_scheduler.py edit "Job Name" "Schedule" "Volume Name"
+
+This command edits an existing snapshot schedule. It takes the same three arguments as the add option, each provided within double quotes (""). If no schedule exists for the specified Job Name, the edit command will fail.
+
+g) snap_scheduler.py delete "Job Name"
+
+This command deletes an existing snapshot schedule. It takes the job name of the schedule as its argument, provided within double quotes (""). If no schedule exists for the specified Job Name, the delete command will fail.
+
+h) snap_scheduler.py list
+
+This command lists the existing snapshot schedules in the following format:
+
+# snap_scheduler.py list
+JOB_NAME SCHEDULE OPERATION VOLUME NAME
+--------------------------------------------------------------------
+Job0 * * * * * Snapshot Create test_vol
+
+THE AGENT
+---------
+
+The snapshot schedules created with the help of the helper script are read by crond, which invokes the agent (gcron.py) at the scheduled intervals to perform the snapshot operations on the specified volumes. The agent uses the following algorithm to coordinate:
+
+start_time = get current time
+lock_file = job_name passed as an argument
+vol_name = volume name passed as an argument
+try POSIX locking the $lock_file
+    if lock is obtained, then
+        mod_time = get modification time of $lock_file
+        if $mod_time < $start_time, then
+            take snapshot of $vol_name
+            if snapshot failed, then
+                log the failure
+            update modification time of $lock_file to current time
+        unlock the $lock_file
+
+Coordination with the scripts running on other nodes is handled by the use of POSIX locks. All instances of the script attempt to lock the lock_file, which is essentially an empty file named after the job, and the one that gets the lock takes the snapshot.
+
+To avoid redoing a completed task, the script makes use of the mtime attribute of the lock_file. At the beginning of its execution, the script saves its start time. Once the script obtains the lock on the lock_file, and before taking the snapshot, it compares the mtime of the lock_file with the start time. The snapshot is only taken if the mtime is smaller than the start time. Once the snapshot command completes, the script updates the mtime of the lock_file to the current time before unlocking.
+
+If a snapshot command fails, the script logs the failure (in syslog) and continues with its operation. It does not retry the failed snapshot in the current schedule, but will attempt it again at the next scheduled run. It is left to the administrator to monitor the logs and decide what to do after a failure.
+
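A minimal, self-contained sketch of this flock-plus-mtime coordination (using a temporary file in place of the real per-job lock file, and a stand-in callable in place of the snapshot command):

```python
import fcntl
import os
import tempfile
import time


def run_once(lock_path, start_time, job):
    """Run job only if no peer has completed it since start_time."""
    fd = os.open(lock_path, os.O_RDWR | os.O_NONBLOCK)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except IOError:
            # Another agent holds the lock and is processing this job.
            return "locked-elsewhere"
        if os.path.getmtime(lock_path) >= start_time:
            # A peer already finished this job after our start time.
            return "already-done"
        job()
        # Record completion so peers in the same schedule skip the job.
        os.utime(lock_path, None)
        return "done"
    finally:
        os.close(fd)


# Demo: within one schedule, only the first caller takes the "snapshot".
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.close()
os.utime(tmp.name, (0, 0))      # pretend the last run happened long ago
start = time.time()
snaps = []
first = run_once(tmp.name, start, lambda: snaps.append("snap"))
second = run_once(tmp.name, start, lambda: snaps.append("snap"))
print(first, second, snaps)
os.remove(tmp.name)
```

In the real agent the lock file lives under /var/run/gluster/snaps/shared_storage/lock_files/ (one file per job name) and the job is the gluster snapshot create invocation.
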
+ASSUMPTIONS AND LIMITATIONS
+---------------------------
+
+It is assumed that all nodes in the trusted storage pool have their times synced using NTP or some other mechanism. This is a hard requirement for this feature to work.
+
+The administrator needs to have python2.7 or higher installed, as well as the argparse module, to be able to use the helper script (snap_scheduler.py).
+
+There is a latency of one minute between issuing a command via the helper script and that command taking effect. Hence, snapshot schedules with per-minute granularity are currently not supported.
+
+The administrator can, however, use the scheduler to schedule snapshots at half-hourly, hourly, daily, weekly, monthly, or yearly intervals. Schedules can also be customised by specifying the minute of the hour, the day of the week, the week of the month, and the month of the year at which the snapshot operation should run.
diff --git a/extras/snap_scheduler/gcron.py b/extras/snap_scheduler/gcron.py
new file mode 100755
index 00000000000..763eb1460b8
--- /dev/null
+++ b/extras/snap_scheduler/gcron.py
@@ -0,0 +1,146 @@
+#!/usr/bin/env python
+#
+# Copyright (c) 2015 Red Hat, Inc. <http://www.redhat.com>
+# This file is part of GlusterFS.
+#
+# This file is licensed to you under your choice of the GNU Lesser
+# General Public License, version 3 or any later version (LGPLv3 or
+# later), or the GNU General Public License, version 2 (GPLv2), in all
+# cases as published by the Free Software Foundation.
+
+from __future__ import print_function
+import subprocess
+import os
+import os.path
+import sys
+import time
+import logging
+import logging.handlers
+import fcntl
+
+
+GCRON_TASKS = "/var/run/gluster/snaps/shared_storage/glusterfs_snap_cron_tasks"
+GCRON_CROND_TASK = "/etc/cron.d/glusterfs_snap_cron_tasks"
+LOCK_FILE_DIR = "/var/run/gluster/snaps/shared_storage/lock_files/"
+log = logging.getLogger("gcron-logger")
+start_time = 0.0
+
+
+def initLogger(script_name):
+ log.setLevel(logging.DEBUG)
+ logFormat = "[%(asctime)s %(filename)s:%(lineno)s %(funcName)s] "\
+ "%(levelname)s %(message)s"
+ formatter = logging.Formatter(logFormat)
+
+ sh = logging.handlers.SysLogHandler()
+ sh.setLevel(logging.ERROR)
+ sh.setFormatter(formatter)
+
+ process = subprocess.Popen(["gluster", "--print-logdir"],
+ stdout=subprocess.PIPE)
+ out, err = process.communicate()
+    if process.returncode == 0:
+        logfile = os.path.join(out.strip(), script_name[:-3]+".log")
+
+        # Attach the file handler only when the log dir could be determined;
+        # otherwise logfile is undefined and FileHandler would raise.
+        fh = logging.FileHandler(logfile)
+        fh.setLevel(logging.DEBUG)
+        fh.setFormatter(formatter)
+        log.addHandler(fh)
+
+    log.addHandler(sh)
+
+
+def takeSnap(volname=""):
+ success = True
+ if volname == "":
+ log.debug("No volname given")
+ return False
+
+    timeStr = time.strftime("%Y%m%d%H%M%S")
+    # Pass the snapshot name and the volume name as separate list elements;
+    # combining them into one element would hand gluster a single argv
+    # entry containing a space.
+    cli = ["gluster",
+           "snapshot",
+           "create",
+           "%s-snapshot-%s" % (volname, timeStr),
+           volname]
+ log.debug("Running command '%s'", " ".join(cli))
+
+ p = subprocess.Popen(cli, stdout=subprocess.PIPE,
+ stderr=subprocess.PIPE)
+ out, err = p.communicate()
+ rv = p.returncode
+
+ log.debug("Command '%s' returned '%d'", " ".join(cli), rv)
+
+ if rv:
+ log.error("Snapshot of %s failed", volname)
+ log.error("Command output:")
+ log.error(err)
+ success = False
+ else:
+ log.info("Snapshot of %s successful", volname)
+
+ return success
+
+
+def doJob(name, lockFile, jobFunc, volname):
+ success = True
+ try:
+ f = os.open(lockFile, os.O_RDWR | os.O_NONBLOCK)
+ try:
+ fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
+ mtime = os.path.getmtime(lockFile)
+ global start_time
+ log.debug("%s last modified at %s", lockFile, time.ctime(mtime))
+ if mtime < start_time:
+ log.debug("Processing job %s", name)
+ if jobFunc(volname):
+ log.info("Job %s succeeded", name)
+ else:
+ log.error("Job %s failed", name)
+ success = False
+ os.utime(lockFile, None)
+ else:
+ log.info("Job %s has been processed already", name)
+ fcntl.flock(f, fcntl.LOCK_UN)
+ except IOError as (errno, strerror):
+ log.info("Job %s is being processed by another agent", name)
+ os.close(f)
+ except IOError as (errno, strerror):
+ log.debug("Failed to open lock file %s : %s", lockFile, strerror)
+ log.error("Failed to process job %s", name)
+ success = False
+
+ return success
+
+
+def main():
+ script_name = os.path.basename(__file__)
+ initLogger(script_name)
+ global start_time
+    if len(sys.argv) < 2:
+        sys.exit("Usage: %s --update | <volname> <jobname>" % script_name)
+    if sys.argv[1] == "--update":
+ if os.lstat(GCRON_TASKS).st_mtime > \
+ os.lstat(GCRON_CROND_TASK).st_mtime:
+ try:
+ process = subprocess.Popen(["touch", "-h", GCRON_CROND_TASK],
+ stdout=subprocess.PIPE,
+ stderr=subprocess.PIPE)
+ out, err = process.communicate()
+ if process.returncode != 0:
+ log.error("Failed to touch %s. Error: %s.",
+ GCRON_CROND_TASK, err)
+ except IOError as (errno, strerror):
+ log.error("Failed to touch %s. Error: %s.",
+ GCRON_CROND_TASK, strerror)
+ return
+
+ volname = sys.argv[1]
+ locking_file = os.path.join(LOCK_FILE_DIR, sys.argv[2])
+ log.debug("locking_file = %s", locking_file)
+ log.debug("volname = %s", volname)
+
+ start_time = int(time.time())
+
+ doJob("Snapshot-" + volname, locking_file, takeSnap, volname)
+
+
+if __name__ == "__main__":
+ main()
diff --git a/extras/snap_scheduler/snap_scheduler.py b/extras/snap_scheduler/snap_scheduler.py
new file mode 100755
index 00000000000..fda0fd3310f
--- /dev/null
+++ b/extras/snap_scheduler/snap_scheduler.py
@@ -0,0 +1,525 @@
+#!/usr/bin/env python
+#
+# Copyright (c) 2015 Red Hat, Inc. <http://www.redhat.com>
+# This file is part of GlusterFS.
+#
+# This file is licensed to you under your choice of the GNU Lesser
+# General Public License, version 3 or any later version (LGPLv3 or
+# later), or the GNU General Public License, version 2 (GPLv2), in all
+# cases as published by the Free Software Foundation.
+
+from __future__ import print_function
+import subprocess
+import os
+import os.path
+import logging
+import argparse
+import fcntl
+import logging.handlers
+from errno import EEXIST
+
+
+SCRIPT_NAME = "snap_scheduler"
+scheduler_enabled = False
+log = logging.getLogger(SCRIPT_NAME)
+GCRON_DISABLED = "/var/run/gluster/snaps/shared_storage/gcron_disabled"
+GCRON_ENABLED = "/var/run/gluster/snaps/shared_storage/gcron_enabled"
+GCRON_TASKS = "/var/run/gluster/snaps/shared_storage/glusterfs_snap_cron_tasks"
+GCRON_CROND_TASK = "/etc/cron.d/glusterfs_snap_cron_tasks"
+LOCK_FILE_DIR = "/var/run/gluster/snaps/shared_storage/lock_files/"
+LOCK_FILE = "/var/run/gluster/snaps/shared_storage/lock_files/lock_file"
+TMP_FILE = "/var/run/gluster/snaps/shared_storage/tmp_file"
+GCRON_UPDATE_TASK = "/etc/cron.d/gcron_update_task"
+tasks = {}
+longest_field = 12
+
+
+def output(msg):
+ print("%s: %s" % (SCRIPT_NAME, msg))
+
+
+def initLogger():
+ log.setLevel(logging.DEBUG)
+ logFormat = "[%(asctime)s %(filename)s:%(lineno)s %(funcName)s] "\
+ "%(levelname)s %(message)s"
+ formatter = logging.Formatter(logFormat)
+
+ sh = logging.handlers.SysLogHandler()
+ sh.setLevel(logging.ERROR)
+ sh.setFormatter(formatter)
+
+ process = subprocess.Popen(["gluster", "--print-logdir"],
+ stdout=subprocess.PIPE)
+ logfile = os.path.join(process.stdout.read()[:-1], SCRIPT_NAME + ".log")
+
+ fh = logging.FileHandler(logfile)
+ fh.setLevel(logging.DEBUG)
+ fh.setFormatter(formatter)
+
+ log.addHandler(sh)
+ log.addHandler(fh)
+
+
+def scheduler_status():
+ success = False
+ global scheduler_enabled
+ try:
+ f = os.path.realpath(GCRON_TASKS)
+ if f != GCRON_ENABLED or not os.path.exists(GCRON_ENABLED):
+ log.info("Snapshot scheduler is currently disabled.")
+ scheduler_enabled = False
+ else:
+ log.info("Snapshot scheduler is currently enabled.")
+ scheduler_enabled = True
+ success = True
+    except:
+        log.error("Failed to check the status of the snapshot scheduler. "
+                  "Error: Failed to check the status of %s.", GCRON_ENABLED)
+
+ return success
+
+
+def enable_scheduler():
+ ret = scheduler_status()
+ if ret:
+ if not scheduler_enabled:
+ log.info("Enabling snapshot scheduler.")
+ try:
+ if os.path.exists(GCRON_DISABLED):
+ os.remove(GCRON_DISABLED)
+ if os.path.lexists(GCRON_TASKS):
+ os.remove(GCRON_TASKS)
+ try:
+ f = os.open(GCRON_ENABLED, os.O_CREAT | os.O_NONBLOCK,
+ 0644)
+ os.close(f)
+ except IOError as (errno, strerror):
+ log.error("Failed to open %s. Error: %s.",
+ GCRON_ENABLED, strerror)
+ ret = False
+ return ret
+ os.symlink(GCRON_ENABLED, GCRON_TASKS)
+ log.info("Snapshot scheduling is enabled")
+ output("Snapshot scheduling is enabled")
+ except IOError as (errno, strerror):
+ log.error("Failed to enable snapshot scheduling. Error: %s.",
+ strerror)
+ ret = False
+ else:
+ print_str = "Failed to enable snapshot scheduling. " \
+ "Error: Snapshot scheduling is already enabled."
+ log.error(print_str)
+ output(print_str)
+
+ return ret
+
+
+def disable_scheduler():
+ ret = scheduler_status()
+ if ret:
+ if scheduler_enabled:
+ log.info("Disabling snapshot scheduler.")
+ try:
+ if os.path.exists(GCRON_DISABLED):
+ os.remove(GCRON_DISABLED)
+ if os.path.lexists(GCRON_TASKS):
+ os.remove(GCRON_TASKS)
+ f = os.open(GCRON_DISABLED, os.O_CREAT, 0644)
+ os.close(f)
+ os.symlink(GCRON_DISABLED, GCRON_TASKS)
+ log.info("Snapshot scheduling is disabled")
+ output("Snapshot scheduling is disabled")
+ except IOError as (errno, strerror):
+ log.error("Failed to disable snapshot scheduling. Error: %s.",
+ strerror)
+ ret = False
+ else:
+ print_str = "Failed to disable scheduling. " \
+ "Error: Snapshot scheduling is already disabled."
+ log.error(print_str)
+ output(print_str)
+
+ return ret
+
+
+def load_tasks_from_file():
+ global tasks
+ global longest_field
+ try:
+ with open(GCRON_ENABLED, 'r') as f:
+ for line in f:
+ line = line.rstrip('\n')
+ if not line:
+ break
+ line = line.split("gcron.py")
+ schedule = line[0].split("root")[0].rstrip(' ')
+ line = line[1].split(" ")
+ volname = line[1]
+ jobname = line[2]
+ longest_field = max(longest_field, len(jobname), len(volname),
+ len(schedule))
+ tasks[jobname] = schedule+":"+volname
+ ret = True
+ except IOError as (errno, strerror):
+ log.error("Failed to open %s. Error: %s.", GCRON_ENABLED, strerror)
+ ret = False
+
+ return ret
+
+
+def list_schedules():
+ ret = scheduler_status()
+ if ret:
+ if scheduler_enabled:
+ log.info("Listing snapshot schedules.")
+ ret = load_tasks_from_file()
+ if ret:
+ if len(tasks) == 0:
+ output("No snapshots scheduled")
+ else:
+ jobname = "JOB_NAME".ljust(longest_field+5)
+ schedule = "SCHEDULE".ljust(longest_field+5)
+ operation = "OPERATION".ljust(longest_field+5)
+ volname = "VOLUME NAME".ljust(longest_field+5)
+ hyphens = "".ljust((longest_field+5) * 4, '-')
+ print(jobname+schedule+operation+volname)
+ print(hyphens)
+ for key in sorted(tasks):
+ jobname = key.ljust(longest_field+5)
+ schedule = tasks[key].split(":")[0].ljust(
+ longest_field + 5)
+ volname = tasks[key].split(":")[1].ljust(
+ longest_field + 5)
+ operation = "Snapshot Create".ljust(longest_field+5)
+ print(jobname+schedule+operation+volname)
+ else:
+ log.error("Failed to load tasks from %s.", GCRON_ENABLED)
+ else:
+ print_str = "Failed to list snapshot schedules. " \
+ "Error: Snapshot scheduling is currently disabled."
+ log.error(print_str)
+ output(print_str)
+
+ return ret
+
+
+def write_tasks_to_file():
+ ret = False
+ try:
+ with open(TMP_FILE, "w", 0644) as f:
+ # If tasks is empty, just create an empty tmp file
+ if len(tasks) != 0:
+ for key in sorted(tasks):
+ jobname = key
+ schedule = tasks[key].split(":")[0]
+ volname = tasks[key].split(":")[1]
+ f.write("%s root PATH=$PATH:/usr/local/sbin:/usr/sbin "
+ "gcron.py %s %s\n" % (schedule, volname, jobname))
+ f.write("\n")
+ f.flush()
+ os.fsync(f.fileno())
+ except IOError as (errno, strerror):
+ log.error("Failed to open %s. Error: %s.", TMP_FILE, strerror)
+ ret = False
+ return ret
+
+ os.rename(TMP_FILE, GCRON_ENABLED)
+ ret = True
+
+ return ret
+
+
+def add_schedules(jobname, schedule, volname):
+ ret = scheduler_status()
+ if ret:
+ if scheduler_enabled:
+ log.info("Adding snapshot schedules.")
+ ret = load_tasks_from_file()
+ if ret:
+ if jobname in tasks:
+ print_str = ("%s already exists in schedule. Use "
+ "'edit' to modify %s" % (jobname, jobname))
+ log.error(print_str)
+ output(print_str)
+ else:
+ tasks[jobname] = schedule + ":" + volname
+ ret = write_tasks_to_file()
+ if ret:
+ # Create a LOCK_FILE for the job
+ job_lockfile = LOCK_FILE_DIR + jobname
+ try:
+ f = os.open(job_lockfile,
+ os.O_CREAT | os.O_NONBLOCK, 0644)
+ os.close(f)
+ except IOError as (errno, strerror):
+ log.error("Failed to open %s. Error: %s.",
+ job_lockfile, strerror)
+ ret = False
+ return ret
+ log.info("Successfully added snapshot schedule %s"
+ % jobname)
+ output("Successfully added snapshot schedule")
+ else:
+ log.error("Failed to load tasks from %s.", GCRON_ENABLED)
+ else:
+ print_str = ("Failed to add snapshot schedule. "
+ "Error: Snapshot scheduling is currently disabled.")
+ log.error(print_str)
+ output(print_str)
+
+ return ret
+
+
+def delete_schedules(jobname):
+ ret = scheduler_status()
+ if ret:
+ if scheduler_enabled:
+            log.info("Deleting snapshot schedules.")
+ ret = load_tasks_from_file()
+ if ret:
+ if jobname in tasks:
+ del tasks[jobname]
+ ret = write_tasks_to_file()
+ if ret:
+ # Delete the LOCK_FILE for the job
+ job_lockfile = LOCK_FILE_DIR+jobname
+ try:
+ os.remove(job_lockfile)
+ except IOError as (errno, strerror):
+ log.error("Failed to open %s. Error: %s.",
+ job_lockfile, strerror)
+ log.info("Successfully deleted snapshot schedule %s"
+ % jobname)
+ output("Successfully deleted snapshot schedule")
+ else:
+ print_str = ("Failed to delete %s. Error: No such "
+ "job scheduled" % jobname)
+ log.error(print_str)
+ output(print_str)
+ else:
+ log.error("Failed to load tasks from %s.", GCRON_ENABLED)
+ else:
+ print_str = ("Failed to delete snapshot schedule. "
+ "Error: Snapshot scheduling is currently disabled.")
+ log.error(print_str)
+ output(print_str)
+
+ return ret
+
+
+def edit_schedules(jobname, schedule, volname):
+ ret = scheduler_status()
+ if ret:
+ if scheduler_enabled:
+ log.info("Editing snapshot schedules.")
+ ret = load_tasks_from_file()
+ if ret:
+ if jobname in tasks:
+ tasks[jobname] = schedule+":"+volname
+ ret = write_tasks_to_file()
+ if ret:
+ log.info("Successfully edited snapshot schedule %s"
+ % jobname)
+ output("Successfully edited snapshot schedule")
+ else:
+ print_str = ("Failed to edit %s. Error: No such "
+ "job scheduled" % jobname)
+ log.error(print_str)
+ output(print_str)
+ else:
+ log.error("Failed to load tasks from %s.", GCRON_ENABLED)
+ else:
+ print_str = ("Failed to edit snapshot schedule. "
+ "Error: Snapshot scheduling is currently disabled.")
+ log.error(print_str)
+ output(print_str)
+
+ return ret
+
+
+def initialise_scheduler():
+ try:
+ with open("/tmp/crontab", "w+", 0644) as f:
+ updater = ("* * * * * root PATH=$PATH:/usr/local/sbin:"
+ "/usr/sbin gcron.py --update\n")
+ f.write("%s\n" % updater)
+ f.flush()
+ os.fsync(f.fileno())
+ except IOError as (errno, strerror):
+ log.error("Failed to open /tmp/crontab. Error: %s.", strerror)
+ ret = False
+ return ret
+
+ os.rename("/tmp/crontab", GCRON_UPDATE_TASK)
+
+ if not os.path.lexists(GCRON_TASKS):
+ try:
+ f = open(GCRON_TASKS, "w", 0644)
+ f.close()
+ except IOError as (errno, strerror):
+ log.error("Failed to open %s. Error: %s.", GCRON_TASKS, strerror)
+ ret = False
+ return ret
+
+ if os.path.lexists(GCRON_CROND_TASK):
+ os.remove(GCRON_CROND_TASK)
+
+ os.symlink(GCRON_TASKS, GCRON_CROND_TASK)
+
+    log.info("Successfully initialised snapshot scheduler for this node")
+    output("Successfully initialised snapshot scheduler for this node")
+
+ ret = True
+ return ret
+
+
+def perform_operation(args):
+ ret = False
+
+ # Initialise snapshot scheduler on local node
+ if args.action == "init":
+ ret = initialise_scheduler()
+ if not ret:
+ output("Failed to initialise snapshot scheduling")
+ else:
+ return ret
+
+ # Check if the symlink to GCRON_TASKS is properly set in the shared storage
+ if (not os.path.lexists(GCRON_UPDATE_TASK) or
+ not os.path.lexists(GCRON_CROND_TASK) or
+ os.readlink(GCRON_CROND_TASK) != GCRON_TASKS):
+        print_str = ("Please run 'snap_scheduler.py init' to initialise "
+                     "the snap scheduler for the local node.")
+ log.error(print_str)
+ output(print_str)
+ return
+
+ # Check status of snapshot scheduler.
+ if args.action == "status":
+ ret = scheduler_status()
+ if ret:
+ if scheduler_enabled:
+ output("Snapshot scheduling status: Enabled")
+ else:
+ output("Snapshot scheduling status: Disabled")
+ else:
+ output("Failed to check status of snapshot scheduler")
+ return ret
+
+ # Enable snapshot scheduler
+ if args.action == "enable":
+ ret = enable_scheduler()
+ if not ret:
+ output("Failed to enable snapshot scheduling")
+ else:
+ subprocess.Popen(["touch", "-h", GCRON_TASKS])
+ return ret
+
+ # Disable snapshot scheduler
+ if args.action == "disable":
+ ret = disable_scheduler()
+ if not ret:
+ output("Failed to disable snapshot scheduling")
+ else:
+ subprocess.Popen(["touch", "-h", GCRON_TASKS])
+ return ret
+
+ # List snapshot schedules
+ if args.action == "list":
+ ret = list_schedules()
+ if not ret:
+ output("Failed to list snapshot schedules")
+ return ret
+
+ # Add snapshot schedules
+ if args.action == "add":
+ ret = add_schedules(args.jobname, args.schedule, args.volname)
+ if not ret:
+ output("Failed to add snapshot schedule")
+ else:
+ subprocess.Popen(["touch", "-h", GCRON_TASKS])
+ return ret
+
+ # Delete snapshot schedules
+ if args.action == "delete":
+ ret = delete_schedules(args.jobname)
+ if not ret:
+ output("Failed to delete snapshot schedule")
+ else:
+ subprocess.Popen(["touch", "-h", GCRON_TASKS])
+ return ret
+
+ # Edit snapshot schedules
+ if args.action == "edit":
+ ret = edit_schedules(args.jobname, args.schedule, args.volname)
+ if not ret:
+ output("Failed to edit snapshot schedule")
+ else:
+ subprocess.Popen(["touch", "-h", GCRON_TASKS])
+ return ret
+
+ return ret
+
+
+def main():
+ initLogger()
+ parser = argparse.ArgumentParser()
+ subparsers = parser.add_subparsers(dest="action")
+ subparsers.add_parser('init',
+ help="Initialise the node for snapshot scheduling")
+
+ subparsers.add_parser("status",
+ help="Check if snapshot scheduling is "
+ "enabled or disabled")
+ subparsers.add_parser("enable",
+ help="Enable snapshot scheduling")
+ subparsers.add_parser("disable",
+ help="Disable snapshot scheduling")
+ subparsers.add_parser("list",
+ help="List snapshot schedules")
+ parser_add = subparsers.add_parser("add",
+ help="Add snapshot schedules")
+ parser_add.add_argument("jobname", help="Job Name")
+ parser_add.add_argument("schedule", help="Schedule")
+ parser_add.add_argument("volname", help="Volume Name")
+
+ parser_delete = subparsers.add_parser("delete",
+ help="Delete snapshot schedules")
+ parser_delete.add_argument("jobname", help="Job Name")
+ parser_edit = subparsers.add_parser("edit",
+ help="Edit snapshot schedules")
+ parser_edit.add_argument("jobname", help="Job Name")
+ parser_edit.add_argument("schedule", help="Schedule")
+ parser_edit.add_argument("volname", help="Volume Name")
+
+ args = parser.parse_args()
+
+ if not os.path.exists(LOCK_FILE_DIR):
+ try:
+ os.makedirs(LOCK_FILE_DIR)
+ except IOError as (errno, strerror):
+ if errno != EEXIST:
+ log.error("Failed to create %s : %s", LOCK_FILE_DIR, strerror)
+ output("Failed to create %s. Error: %s"
+ % (LOCK_FILE_DIR, strerror))
+
+ try:
+ f = os.open(LOCK_FILE, os.O_CREAT | os.O_RDWR | os.O_NONBLOCK, 0644)
+ try:
+ fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
+ perform_operation(args)
+ fcntl.flock(f, fcntl.LOCK_UN)
+ except IOError as (errno, strerror):
+ log.info("%s is being processed by another agent.", LOCK_FILE)
+ output("Another snap_scheduler command is running. "
+ "Please try again after some time.")
+ os.close(f)
+ except IOError as (errno, strerror):
+ log.error("Failed to open %s : %s", LOCK_FILE, strerror)
+ output("Failed to open %s. Error: %s" % (LOCK_FILE, strerror))
+
+ return
+
+
+if __name__ == "__main__":
+ main()