GlusterFS Hadoop Plugin
=======================

INTRODUCTION
------------

This document describes how to use GlusterFS (http://www.gluster.org/) as a backing store with Hadoop.


REQUIREMENTS
------------

* Supported OS is GNU/Linux
* GlusterFS and Hadoop installed on all machines in the cluster
* Java Runtime Environment (JRE)
* Maven (needed if you are building the plugin from source)
* JDK (needed if you are building the plugin from source)
NOTE: The plugin relies on two *nix command-line utilities to function properly:

* mount: used to mount GlusterFS volumes.
* getfattr: used to fetch extended attributes of a file.

Make sure both are installed on all hosts in the cluster and their locations are in the $PATH
environment variable.
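
A quick way to confirm both utilities are present and on $PATH (the paths shown are illustrative;
yours may differ):

   # type mount getfattr
   mount is /bin/mount
   getfattr is /usr/bin/getfattr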


INSTALLATION
------------

** NOTE: The example below is for Hadoop version 0.20.2 ($GLUSTER_HOME/hdfs/0.20.2) **

* Building the plugin from source [Maven (http://maven.apache.org/) and a JDK are required to build the plugin]

  Change to the glusterfs-hadoop directory in the GlusterFS source tree and build the plugin.

   # cd $GLUSTER_HOME/hdfs/0.20.2
   # mvn package

  On a successful build, the plugin will be present in the `target` directory.
  (NOTE: the version number is part of the plugin file name)

   # ls target/
   classes  glusterfs-0.20.2-0.1.jar  maven-archiver  surefire-reports  test-classes
            ^^^^^^^^^^^^^^^^^^^^^^^^

  Copy the plugin to the lib/ directory under $HADOOP_HOME.

   # cp target/glusterfs-0.20.2-0.1.jar $HADOOP_HOME/lib

  Copy the sample configuration file that ships with this source (conf/core-site.xml) to the conf
  directory under $HADOOP_HOME.

   # cp conf/core-site.xml $HADOOP_HOME/conf
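
  To double-check that both files landed where Hadoop expects them (paths assume the defaults used
  above):

   # ls $HADOOP_HOME/lib/glusterfs-0.20.2-0.1.jar $HADOOP_HOME/conf/core-site.xml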

* Installing the plugin from RPM

  See the plugin documentation for installing from RPM.


CLUSTER INSTALLATION
--------------------

  If it is tedious to repeat the above step(s) on all hosts in the cluster, use the build-and-deploy.py script to
  build the plugin in one place and deploy it (along with the configuration file) on all other hosts.

  This should be run on the Hadoop master host [Job Tracker].

* STEPS (you would have done steps 1 and 2 anyway while deploying Hadoop)

  1. Edit the conf/slaves file in your Hadoop distribution; one line for each slave.
  2. Set up password-less SSH between the Hadoop master and the slave(s).
  3. Edit conf/core-site.xml with all GlusterFS-related configuration (see CONFIGURATION).
  4. Run the following:
       # cd $GLUSTER_HOME/hdfs/0.20.2/tools
       # python ./build-and-deploy.py -b -d /path/to/hadoop/home -c

  This will build the plugin and copy it (and the config file) to all slaves (those listed in
  $HADOOP_HOME/conf/slaves).

  Script options:
   -b : build the plugin
   -d : location of the hadoop directory
   -c : deploy core-site.xml
   -m : deploy mapred-site.xml
   -h : deploy hadoop-env.sh
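
  For example, to build once and push all three configuration files to every slave in a single pass
  (flag combination inferred from the option list above):

   # python ./build-and-deploy.py -b -d /path/to/hadoop/home -c -m -h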


CONFIGURATION
-------------

  All plugin configuration is done in a single XML file (core-site.xml), with <name> and <value> tags inside each
  <property> block.

  A brief explanation of the tunables and the values they accept (change them wherever needed) follows; a complete
  sample core-site.xml appears after the list.

  name: fs.glusterfs.impl
  value: org.apache.hadoop.fs.glusterfs.GlusterFileSystem

  The default FileSystem API to use (there is little reason to modify this).

  name: fs.default.name
  value: glusterfs://server:port

  The default name that Hadoop uses to represent files as URIs (typically a server:port tuple). Use any host
  in the cluster as the server and any port number. This option has to be in server:port format for Hadoop
  to create file URIs, but it is not otherwise used by the plugin.

  name: fs.glusterfs.volname
  value: volume-dist-rep

  The volume to mount.

  name: fs.glusterfs.mount
  value: /mnt/glusterfs

  The directory the plugin will use to mount (FUSE mount) the volume.

  name: fs.glusterfs.server
  value: 192.168.1.36, hackme.zugzug.org

  To mount a volume, the plugin needs to know the hostname or IP of a GlusterFS server in the cluster.
  Specify it here.

  name: quick.slave.io
  value: [On/Off], [Yes/No], [1/0]

  NOTE: This option is untested as of now.

  This is a performance tunable. Hadoop schedules jobs to the hosts that hold the file's data, and the job
  then does I/O on the file (via FUSE in the case of GlusterFS). When this option is set, the plugin will try
  to do I/O directly from the backing filesystem (ext3, ext4, etc.) the file resides on, so read performance
  improves and jobs run faster.
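
  Putting it together, a minimal core-site.xml might look like the following (the volume name, mount point,
  and server are the illustrative values from above, and the port number is arbitrary; substitute your own):

   <?xml version="1.0"?>
   <configuration>
     <property>
       <name>fs.glusterfs.impl</name>
       <value>org.apache.hadoop.fs.glusterfs.GlusterFileSystem</value>
     </property>
     <property>
       <name>fs.default.name</name>
       <value>glusterfs://hackme.zugzug.org:9000</value>
     </property>
     <property>
       <name>fs.glusterfs.volname</name>
       <value>volume-dist-rep</value>
     </property>
     <property>
       <name>fs.glusterfs.mount</name>
       <value>/mnt/glusterfs</value>
     </property>
     <property>
       <name>fs.glusterfs.server</name>
       <value>hackme.zugzug.org</value>
     </property>
     <property>
       <name>quick.slave.io</name>
       <!-- untested, as noted above; leave Off unless experimenting -->
       <value>Off</value>
     </property>
   </configuration>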


USAGE
-----

  Once configured, start the Hadoop Map/Reduce daemons:

   # cd $HADOOP_HOME
   # ./bin/start-mapred.sh

  If the map/reduce job/task trackers are up, all I/O will be done to GlusterFS.
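
  As a quick sanity check (a hypothetical session; it assumes the fs.glusterfs.mount value configured
  above), anything created through Hadoop should also be visible under the FUSE mount:

   # ./bin/hadoop fs -mkdir /tmp/smoke-test
   # ls /mnt/glusterfs/tmp
   smoke-test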


FOR HACKERS
-----------

* Source Layout

** version specific: hdfs/<version> **
./src
./src/main
./src/main/java
./src/main/java/org
./src/main/java/org/apache
./src/main/java/org/apache/hadoop
./src/main/java/org/apache/hadoop/fs
./src/main/java/org/apache/hadoop/fs/glusterfs
./src/main/java/org/apache/hadoop/fs/glusterfs/GlusterFSBrickClass.java
./src/main/java/org/apache/hadoop/fs/glusterfs/GlusterFSXattr.java <--- Fetch/parse extended attributes of a file
./src/main/java/org/apache/hadoop/fs/glusterfs/GlusterFUSEInputStream.java <--- Input stream (instantiated during open() calls; quick read from the backing FS)
./src/main/java/org/apache/hadoop/fs/glusterfs/GlusterFSBrickRepl.java
./src/main/java/org/apache/hadoop/fs/glusterfs/GlusterFUSEOutputStream.java <--- Output stream (instantiated during create() calls)
./src/main/java/org/apache/hadoop/fs/glusterfs/GlusterFileSystem.java <--- Entry point for the plugin (extends the Hadoop FileSystem class)
./src/test
./src/test/java
./src/test/java/org
./src/test/java/org/apache
./src/test/java/org/apache/hadoop
./src/test/java/org/apache/hadoop/fs
./src/test/java/org/apache/hadoop/fs/glusterfs
./src/test/java/org/apache/hadoop/fs/glusterfs/AppTest.java <--- Your test cases go here (if any :-))
./tools/build-deploy-jar.py <--- Build and deployment script
./conf
./conf/core-site.xml <--- Sample configuration file
./pom.xml <--- Build file (used by Maven)

** toplevel: hdfs/ **
./COPYING <--- License
./README <--- This file