From 2e69ae03c3c3fcb86e4d5347008834ad1dfb42b4 Mon Sep 17 00:00:00 2001
From: Niels de Vos
Date: Sun, 24 Jan 2016 13:23:19 +0100
Subject: Point users of glusterfs-hadoop to the upstream project

The code we have in glusterfs-hadoop/ is old and should not be used anymore.
The plugin for Hadoop HCFS is maintained at the glusterfs-hadoop project on
GitHub: https://github.com/gluster/glusterfs-hadoop

Removing the old code from the repository, and adding a pointer to the
project's wiki in the MAINTAINERS file.

Change-Id: Ia86d08fb0c73a3f75b706b1e0793e3d7a0f4984c
BUG: 1301352
CC: Jay Vyas
CC: Bradley Childs
Signed-off-by: Niels de Vos
Reviewed-on: http://review.gluster.org/13286
Smoke: Gluster Build System
Reviewed-by: Kaleb KEITHLEY
Reviewed-by: Venky Shankar
NetBSD-regression: NetBSD Build System
CentOS-regression: Gluster Build System
Reviewed-by: Vijay Bellur
---
 glusterfs-hadoop/README | 182 ------------------------------------------------
 1 file changed, 182 deletions(-)
 delete mode 100644 glusterfs-hadoop/README

diff --git a/glusterfs-hadoop/README b/glusterfs-hadoop/README
deleted file mode 100644
index 3026f11c035..00000000000
--- a/glusterfs-hadoop/README
+++ /dev/null
@@ -1,182 +0,0 @@

GlusterFS Hadoop Plugin
=======================

INTRODUCTION
------------

This document describes how to use GlusterFS (http://www.gluster.org/) as a
backing store with Hadoop.


REQUIREMENTS
------------

* Supported OS is GNU/Linux
* GlusterFS and Hadoop installed on all machines in the cluster
* Java Runtime Environment (JRE)
* Maven (needed if you are building the plugin from source)
* JDK (needed if you are building the plugin from source)

NOTE: The plugin relies on two *nix command-line utilities to function
properly. They are:

* mount: Used to mount GlusterFS volumes.
* getfattr: Used to fetch the extended attributes of a file.

Make sure they are installed on all hosts in the cluster and that their
locations are in the $PATH environment variable.


INSTALLATION
------------

** NOTE: The example below is for Hadoop version 0.20.2 ($GLUSTER_HOME/hdfs/0.20.2) **

* Building the plugin from source [Maven (http://maven.apache.org/) and a JDK
  are required to build the plugin]

  Change to the glusterfs-hadoop directory in the GlusterFS source tree and
  build the plugin.

  # cd $GLUSTER_HOME/hdfs/0.20.2
  # mvn package

  On a successful build the plugin will be present in the `target` directory.
  (NOTE: the version number will be a part of the plugin file name)

  # ls target/
  classes  glusterfs-0.20.2-0.1.jar  maven-archiver  surefire-reports  test-classes
           ^^^^^^^^^^^^^^^^^^^^^^^^

  Copy the plugin to the lib/ directory in your $HADOOP_HOME dir.

  # cp target/glusterfs-0.20.2-0.1.jar $HADOOP_HOME/lib

  Copy the sample configuration file that ships with this source
  (conf/core-site.xml) to the conf directory in your $HADOOP_HOME dir.

  # cp conf/core-site.xml $HADOOP_HOME/conf

* Installing the plugin from RPM

  See the plugin documentation for installing from RPM.


CLUSTER INSTALLATION
--------------------

  Since it is tedious to do the above step(s) on all hosts in the cluster, use
  the build-and-deploy.py script to build the plugin in one place and deploy
  it (along with the configuration file) on all other hosts.

  The script should be run on the host that is the Hadoop master [Job Tracker].

* STEPS (you would have done steps 1 and 2 anyway while deploying Hadoop)

  1. Edit the conf/slaves file in your Hadoop distribution; one line for each slave.
  2. Set up password-less SSH between the Hadoop master and the slave(s).
  3. Edit conf/core-site.xml with all GlusterFS related configuration (see CONFIGURATION).
  4.
Run the following:

     # cd $GLUSTER_HOME/hdfs/0.20.2/tools
     # python ./build-and-deploy.py -b -d /path/to/hadoop/home -c

     This will build the plugin and copy it (and the config file) to all the
     slaves (mentioned in $HADOOP_HOME/conf/slaves).

  Script options:
    -b : build the plugin
    -d : location of the hadoop directory
    -c : deploy core-site.xml
    -m : deploy mapred-site.xml
    -h : deploy hadoop-env.sh


CONFIGURATION
-------------

  All plugin configuration is done in a single XML file (core-site.xml), with
  a name/value pair inside each property block.

  A brief explanation of the tunables and the values they accept (change them
  wherever needed) is given below.

  name:  fs.glusterfs.impl
  value: org.apache.hadoop.fs.glusterfs.GlusterFileSystem

    The default FileSystem API to use (there is little reason to modify this).

  name:  fs.default.name
  value: glusterfs://server:port

    The default name that Hadoop uses to represent a file as a URI (typically
    a server:port tuple). Use any host in the cluster as the server and any
    port number. This option has to be in server:port format for Hadoop to
    create the file URI, but it is not used by the plugin.

  name:  fs.glusterfs.volname
  value: volume-dist-rep

    The volume to mount.

  name:  fs.glusterfs.mount
  value: /mnt/glusterfs

    This is the directory that the plugin will use to mount (FUSE mount) the
    volume.

  name:  fs.glusterfs.server
  value: 192.168.1.36, hackme.zugzug.org

    To mount a volume the plugin needs to know the hostname or the IP of a
    GlusterFS server in the cluster. Mention it here.

  name:  quick.slave.io
  value: [On/Off], [Yes/No], [1/0]

    NOTE: This option is not tested as of now.

    This is a performance tunable option. Hadoop schedules jobs to the hosts
    that contain a part of the file's data. The job then does I/O on the file
    (via FUSE in the case of GlusterFS). When this option is set, the plugin
    will try to do I/O directly from the backing filesystem (ext3, ext4,
    etc.) the file resides on.
Hence read performance will improve and jobs will run faster.


USAGE
-----

  Once configured, start the Hadoop Map/Reduce daemons:

  # cd $HADOOP_HOME
  # ./bin/start-mapred.sh

  If the map/reduce job/task trackers are up, all I/O will be done to
  GlusterFS.


FOR HACKERS
-----------

* Source Layout

** version specific: hdfs/ **
./src
./src/main
./src/main/java
./src/main/java/org
./src/main/java/org/apache
./src/main/java/org/apache/hadoop
./src/main/java/org/apache/hadoop/fs
./src/main/java/org/apache/hadoop/fs/glusterfs
./src/main/java/org/apache/hadoop/fs/glusterfs/GlusterFSBrickClass.java
./src/main/java/org/apache/hadoop/fs/glusterfs/GlusterFSXattr.java          <--- Fetch/parse extended attributes of a file
./src/main/java/org/apache/hadoop/fs/glusterfs/GlusterFUSEInputStream.java  <--- Input stream (instantiated during open() calls; quick read from the backing FS)
./src/main/java/org/apache/hadoop/fs/glusterfs/GlusterFSBrickRepl.java
./src/main/java/org/apache/hadoop/fs/glusterfs/GlusterFUSEOutputStream.java <--- Output stream (instantiated during creat() calls)
./src/main/java/org/apache/hadoop/fs/glusterfs/GlusterFileSystem.java       <--- Entry point for the plugin (extends the Hadoop FileSystem class)
./src/test
./src/test/java
./src/test/java/org
./src/test/java/org/apache
./src/test/java/org/apache/hadoop
./src/test/java/org/apache/hadoop/fs
./src/test/java/org/apache/hadoop/fs/glusterfs
./src/test/java/org/apache/hadoop/fs/glusterfs/AppTest.java <--- Your test cases go here (if any :-))
./tools/build-deploy-jar.py <--- Build and deployment script
./conf
./conf/core-site.xml        <--- Sample configuration file
./pom.xml                   <--- Build file (used by Maven)

** toplevel: hdfs/ **
./COPYING <--- License
./README  <--- This file
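For reference, the tunables described in the deleted README's CONFIGURATION
section combine into a core-site.xml along these lines. This is only a sketch:
the volume name, mount point, and server address are the README's illustrative
values, and the port in fs.default.name is an arbitrary placeholder (the README
notes any port works, since the plugin does not use it).

```xml
<configuration>
  <property>
    <!-- FileSystem implementation; little reason to modify this -->
    <name>fs.glusterfs.impl</name>
    <value>org.apache.hadoop.fs.glusterfs.GlusterFileSystem</value>
  </property>
  <property>
    <!-- must be server:port so Hadoop can build file URIs; not used by the plugin -->
    <name>fs.default.name</name>
    <value>glusterfs://192.168.1.36:9000</value>
  </property>
  <property>
    <!-- the GlusterFS volume to FUSE-mount (README's example volume name) -->
    <name>fs.glusterfs.volname</name>
    <value>volume-dist-rep</value>
  </property>
  <property>
    <!-- directory the plugin uses for the FUSE mount -->
    <name>fs.glusterfs.mount</name>
    <value>/mnt/glusterfs</value>
  </property>
  <property>
    <!-- hostname or IP of any GlusterFS server in the cluster -->
    <name>fs.glusterfs.server</name>
    <value>192.168.1.36</value>
  </property>
  <property>
    <!-- untested performance tunable; when on, reads bypass FUSE and
         go directly to the backing filesystem -->
    <name>quick.slave.io</name>
    <value>Off</value>
  </property>
</configuration>
```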