Diffstat (limited to 'design/Cloud-Archival/')
1 files changed, 60 insertions, 0 deletions
diff --git a/design/Cloud-Archival/ b/design/Cloud-Archival/
new file mode 100644
index 0000000..9506429
--- /dev/null
+++ b/design/Cloud-Archival/
@@ -0,0 +1,60 @@
+This document gives a high-level overview of CloudArchival. The design is being
+refined as we go along, and this document will be updated accordingly.
+## Introduction
+This design addresses the use case where data that requires high-speed access
+is retained internally, i.e. on Glusterfs, and lower-priority data is moved to
+a low-cost cloud-based archive storage. This reduces storage cost for use cases
+where a majority of the data is cold and can be archived.
+## Architectural Overview
+CloudArchival has two components: a scanner/uploader tool and a downloader
+xlator in the Glusterfs stack.
+### 1. Scanner/uploader
+This tool will scan the file system and, based on a policy, upload the data
+to a predetermined Cloud Storage. The policy can be user-defined. A simple
+example would be: upload any file that has not been accessed for one month.
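A minimal sketch of the example policy above, assuming "one month" means 30 days and that the access time reported by `stat` is trustworthy on the scanned mount. The names `is_cold` and `scan` are hypothetical and not part of the design.

```python
import os
import time

# Assumption: "one month" is taken as 30 days.
ONE_MONTH_SECONDS = 30 * 24 * 60 * 60


def is_cold(path, now=None, max_idle=ONE_MONTH_SECONDS):
    """Example policy: True if the file was last accessed more than max_idle seconds ago."""
    now = time.time() if now is None else now
    return now - os.stat(path).st_atime > max_idle


def scan(root, policy=is_cold):
    """Walk root and yield every regular file that the policy selects for upload."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and policy(path):
                yield path
```

Passing the policy as a callable keeps it user-definable: any function that takes a path and returns a boolean could replace `is_cold`.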
+### 2. Downloader
+This xlator will download the file from Cloud-Storage when a read/write request
+(basically any operation that needs the file data) is made. This xlator will
+be placed on the client side, because the AFR and EC xlators are client xlators.
+## Work Flow
+- Phase I - After scanning, the uploader will select the files to be archived
+  to Cloud. Once the data migration to Cloud is complete, the uploader will do
+  a setxattr operation on the file to inform the downloader xlator to truncate
+  the data. As part of this operation, the downloader will store the size
+  information as an xattr on the file, to serve lookup/stat etc., and will
+  then truncate the data.
+- Phase II - While the data resides on Cloud, all metadata operations can be
+  performed locally on Glusterfs. The data will be downloaded only when the
+  file data is needed, i.e. on a read/write request. For such a request, the
+  downloader will stub the request and start downloading the file from Cloud.
+  Upon successful download, the stubbed request will be resumed.
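The two phases can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the cloud is a plain `dict`, the xattr names are hypothetical, the upload and truncate steps are folded into one method, and the "stub and resume" step is modelled synchronously rather than as a real stubbed fop.

```python
class ArchivedFile:
    """In-memory stand-in for one file on the volume (hypothetical model)."""

    SIZE_XATTR = "user.archive.size"  # hypothetical xattr name
    KEY_XATTR = "user.archive.key"    # hypothetical xattr name

    def __init__(self, data):
        self.data = bytearray(data)
        self.xattrs = {}

    def archive(self, cloud, key):
        """Phase I: upload the data, record its size in an xattr, truncate locally."""
        cloud[key] = bytes(self.data)
        self.xattrs[self.SIZE_XATTR] = len(self.data)
        self.xattrs[self.KEY_XATTR] = key
        self.data = bytearray()

    def is_archived(self):
        return self.SIZE_XATTR in self.xattrs

    def stat_size(self):
        """Metadata is served locally from the xattr, without touching the cloud."""
        return self.xattrs.get(self.SIZE_XATTR, len(self.data))

    def _ensure_local(self, cloud):
        """Phase II: download the data before the pending request continues."""
        if self.is_archived():
            self.data = bytearray(cloud[self.xattrs.pop(self.KEY_XATTR)])
            del self.xattrs[self.SIZE_XATTR]

    def read(self, cloud):
        self._ensure_local(cloud)
        return bytes(self.data)

    def write(self, cloud, offset, payload):
        self._ensure_local(cloud)
        self.data[offset:offset + len(payload)] = payload
```

In this model `stat_size()` on an archived file never consults `cloud`, while the first `read()` or `write()` restores the data and clears the archive xattrs.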
+## Cloud Information and Security
+Cloud information, such as the Cloud provider and its access information, can
+be stored on a per-volume basis through Glusterd. Only one cloud storage can be
+attached to a volume.
+Since the communication channel to Cloud needs to be secured, the access
+information for Cloud must reside on the trusted storage pool.
+GF-proxy fits this requirement nicely as it runs on the trusted storage pool
+(as of now). Hence, the downloader will be part of the GF-proxy daemon on the
+trusted storage pool.
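The per-volume bookkeeping described in this section could be modelled roughly as follows. This is a sketch only: `CloudRegistry` and its methods are hypothetical names, and it does not reflect how Glusterd actually stores volume options.

```python
class CloudRegistry:
    """Hypothetical per-volume store of cloud provider and access information."""

    def __init__(self):
        self._by_volume = {}

    def attach(self, volume, provider, access_info):
        """Attach a cloud storage to a volume; at most one is allowed per volume."""
        if volume in self._by_volume:
            raise ValueError(f"volume {volume!r} already has a cloud storage attached")
        self._by_volume[volume] = {"provider": provider, "access": dict(access_info)}

    def lookup(self, volume):
        """Return the cloud information for a volume, or None if nothing is attached."""
        return self._by_volume.get(volume)
```

A second `attach` for the same volume raises `ValueError`, which mirrors the one-cloud-storage-per-volume restriction.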
+#### Note: The initial implementation will integrate with Amazon Web Services (AWS).
+Integration with other Cloud Storage will be left open for development to the