#Managing Hadoop Compatible Storage

GlusterFS provides compatibility with Apache Hadoop: it uses the standard file system APIs available in Hadoop to provide a new storage option for Hadoop deployments. Existing MapReduce-based applications can use GlusterFS seamlessly. This functionality opens up data within Hadoop deployments to any file-based or object-based application.

##Architecture Overview

The following diagram illustrates Hadoop integration with GlusterFS:

![ Hadoop Architecture ](../images/Hadoop_Architecture.png)

##Advantages

The following are the advantages of Hadoop Compatible Storage with GlusterFS:

- Provides simultaneous file-based and object-based access within Hadoop.
- Eliminates the centralized metadata server.
- Provides compatibility with existing MapReduce applications; no rewrite is required.
- Provides a fault-tolerant file system.

##Preparing to Install Hadoop Compatible Storage

This section describes the prerequisites and the dependencies that are installed along with Hadoop Compatible Storage.

###Pre-requisites

The following are the pre-requisites to install Hadoop Compatible Storage:

- Hadoop 0.20.2 is installed, configured, and running on all machines in the cluster.
- Java Runtime Environment
- Maven (mandatory only if you are building the plugin from source)
- JDK (mandatory only if you are building the plugin from source)
- getfattr - command line utility

##Installing and Configuring Hadoop Compatible Storage

This section describes how to install and configure Hadoop Compatible Storage in your storage environment and verify that it is functioning correctly (a short verification sketch follows the steps below).

1.  Download the `glusterfs-hadoop-0.20.2-0.1.x86_64.rpm` file to each server in your cluster. You can download the file from <http://download.gluster.com/pub/gluster/glusterfs/qa-releases/3.3-beta-2/glusterfs-hadoop-0.20.2-0.1.x86_64.rpm>.

2.  To install Hadoop Compatible Storage on all servers in your cluster, run the following command:

    `# rpm -ivh --nodeps glusterfs-hadoop-0.20.2-0.1.x86_64.rpm`

    The following files are extracted:

    - `/usr/local/lib/glusterfs-<Hadoop version>-<gluster plugin version>.jar`
    - `/usr/local/lib/conf/core-site.xml`

3.  (Optional) To install Hadoop Compatible Storage in a different location, run the following command:

    `# rpm -ivh --nodeps --prefix /usr/local/glusterfs/hadoop glusterfs-hadoop-0.20.2-0.1.x86_64.rpm`

4.  Edit the `conf/core-site.xml` file. The following is a sample `conf/core-site.xml` file:

        <configuration>
          <property>
            <name>fs.glusterfs.impl</name>
            <value>org.apache.hadoop.fs.glusterfs.GlusterFileSystem</value>
          </property>
          <property>
            <name>fs.default.name</name>
            <value>glusterfs://fedora1:9000</value>
          </property>
          <property>
            <name>fs.glusterfs.volname</name>
            <value>hadoopvol</value>
          </property>
          <property>
            <name>fs.glusterfs.mount</name>
            <value>/mnt/glusterfs</value>
          </property>
          <property>
            <name>fs.glusterfs.server</name>
            <value>fedora2</value>
          </property>
          <property>
            <name>quick.slave.io</name>
            <value>Off</value>
          </property>
        </configuration>

    The following are the configurable fields:

    Property Name | Default Value | Description
    --- | --- | ---
    fs.default.name | glusterfs://fedora1:9000 | Any hostname in the cluster as the server and any port number.
    fs.glusterfs.volname | hadoopvol | GlusterFS volume to mount.
    fs.glusterfs.mount | /mnt/glusterfs | The directory used to FUSE mount the volume.
    fs.glusterfs.server | fedora2 | Any hostname or IP address on the cluster except the client/master.
    quick.slave.io | Off | Performance tunable option. If this option is set to On, the plugin tries to perform I/O directly against the disk file system (such as ext3 or ext4) on which the file resides, improving read performance and speeding up jobs. **Note**: This option is not widely tested.

5.  Create a soft link in Hadoop’s library and configuration directory for the files installed in Step 2, using the following command:

    `# ln -s <target> <link name>`

    For example,

    `# ln -s /usr/local/lib/glusterfs-0.20.2-0.1.jar <hadoop-home>/lib/glusterfs-0.20.2-0.1.jar`

    `# ln -s /usr/local/lib/conf/core-site.xml <hadoop-home>/conf/core-site.xml`

6.  (Optional) Instead of repeating the above steps, you can run the following command on the Hadoop master to build the plugin and deploy it along with the `core-site.xml` file:

    `# build-deploy-jar.py -d <hadoop-home> -c`
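To verify the setup, you can exercise the plugin through Hadoop's standard `FileSystem` API, which is the interface the GlusterFS plugin implements. The following is a minimal sketch, assuming the sample `core-site.xml` above (default file system `glusterfs://fedora1:9000`) is on the classpath; the class name `GlusterFsCheck` and the test path are illustrative placeholders, not part of the plugin.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GlusterFsCheck {
    public static void main(String[] args) throws Exception {
        // Reads core-site.xml from the classpath, so fs.default.name resolves
        // to glusterfs://fedora1:9000 and FileSystem.get() returns the plugin.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Default file system: " + fs.getUri());

        // Hypothetical test path on the GlusterFS volume.
        Path testFile = new Path("/tmp/glusterfs-check.txt");

        // Write a small file, then read it back.
        FSDataOutputStream out = fs.create(testFile, true);
        out.writeUTF("hello from glusterfs");
        out.close();

        FSDataInputStream in = fs.open(testFile);
        System.out.println("Read back: " + in.readUTF());
        in.close();

        fs.delete(testFile, true);
        fs.close();
    }
}
```

Compile the class against the Hadoop and plugin jars and run it with both on the classpath; if the plugin is wired up correctly, the default file system printed is the `glusterfs://` URI.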
##Starting and Stopping the Hadoop MapReduce Daemon

To start and stop the MapReduce daemon:

- To start the MapReduce daemon manually, enter the following command:

  `# <hadoop-home>/bin/start-mapred.sh`

- To stop the MapReduce daemon manually, enter the following command:

  `# <hadoop-home>/bin/stop-mapred.sh`

> **Note**
> You must start the Hadoop MapReduce daemon on all servers.
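Because the plugin implements Hadoop's standard file system interface, existing MapReduce applications run against GlusterFS without modification. The following is an illustrative word-count job written against the Hadoop 0.20.2 `org.apache.hadoop.mapred` API; the class name and the `glusterfs://fedora1:9000/input` and `/output` paths are example placeholders that match the sample configuration above, not names defined by the plugin.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class GlusterWordCount {

    // Emits (word, 1) for every token in each input line.
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                output.collect(word, ONE);
            }
        }
    }

    // Sums the counts for each word.
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(GlusterWordCount.class);
        conf.setJobName("gluster-wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);

        // Input and output live on the GlusterFS volume; because fs.default.name
        // points at glusterfs://fedora1:9000, plain paths would resolve there too.
        FileInputFormat.setInputPaths(conf, new Path("glusterfs://fedora1:9000/input"));
        FileOutputFormat.setOutputPath(conf, new Path("glusterfs://fedora1:9000/output"));

        JobClient.runJob(conf);
    }
}
```

Package the class into a jar and submit it as usual, for example with `hadoop jar`, once the MapReduce daemons are running.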