From 36d2975714ed6ef98c0f86a2fac22fc382ea8a9d Mon Sep 17 00:00:00 2001
From: Xavier Hernandez
Date: Mon, 29 Sep 2014 15:48:55 +0200
Subject: doc: added documentation for dispersed volumes

Change-Id: I8a8368bdbe31af30a239aaf8cc478429e10c3f57
BUG: 1147563
Signed-off-by: Xavier Hernandez
Reviewed-on: http://review.gluster.org/8885
Tested-by: Gluster Build System
Reviewed-by: Jeff Darcy
---
 .../en-US/markdown/admin_setting_volumes.md | 170 ++++++++++++++++++++-
 doc/gluster.8                               |   2 +-
 2 files changed, 170 insertions(+), 2 deletions(-)

diff --git a/doc/admin-guide/en-US/markdown/admin_setting_volumes.md b/doc/admin-guide/en-US/markdown/admin_setting_volumes.md
index 028cd30647a..395aa2c79a9 100644
--- a/doc/admin-guide/en-US/markdown/admin_setting_volumes.md
+++ b/doc/admin-guide/en-US/markdown/admin_setting_volumes.md
@@ -52,11 +52,24 @@ start it before attempting to mount it.
     and performance is critical. In this release, configuration of this volume type is supported only for Map Reduce workloads.

+  - **Dispersed** - Dispersed volumes are based on erasure codes,
+    providing space-efficient protection against disk or server failures.
+    They store an encoded fragment of the original file on each brick in
+    such a way that only a subset of the fragments is needed to recover the
+    original file. The number of bricks that can be missing without
+    losing access to data is configured by the administrator at volume
+    creation time.
+
+  - **Distributed Dispersed** - Distributed dispersed volumes distribute
+    files across dispersed subvolumes. This has the same advantages as
+    distributed replicated volumes, but uses disperse to store the data
+    on the bricks.
+
 **To create a new volume**

 - Create a new volume :

-    `# gluster volume create [stripe | replica ] [transport tcp | rdma | tcp, rdma] `
+    `# gluster volume create [stripe | replica | disperse] [transport tcp | rdma | tcp, rdma] `

     For example, to create a volume called test-volume consisting of server3:/exp3 and server4:/exp4:
@@ -389,6 +402,161 @@ of this volume type is supported only for Map Reduce workloads.

     > Use the `force` option at the end of command if you want to create the volume in this case.

+##Creating Dispersed Volumes
+
+Dispersed volumes are based on erasure codes. They stripe the encoded data of
+files, with some redundancy added, across multiple bricks in the volume. You
+can use dispersed volumes to get a configurable level of reliability with
+minimal wasted space.
+
+**Redundancy**
+
+Each dispersed volume has a redundancy value defined when the volume is
+created. This value determines how many bricks can be lost without
+interrupting the operation of the volume. It also determines the amount of
+usable space of the volume using this formula:
+
+    <Usable size> = <Brick size> * (#Bricks - Redundancy)
+
+All bricks of a disperse set should have the same capacity; otherwise, when
+the smallest brick becomes full, no additional data will be allowed in the
+disperse set.
+
+It's important to note that a configuration with 3 bricks and redundancy 1
+will have less usable space (66.7% of the total physical space) than a
+configuration with 10 bricks and redundancy 1 (90%). However, the first one
+will be safer than the second one (roughly, the probability of failure of
+the second configuration is more than 4.5 times that of the first one).
+
+For example, a dispersed volume composed of 6 bricks of 4TB and a redundancy
+of 2 will be completely operational even with two bricks inaccessible. However,
+a third inaccessible brick will bring the volume down because it won't be
+possible to read or write to it. The usable space of the volume will be equal
+to 16TB.
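The usable-space formula in the Redundancy section can be checked with a few lines of Python. This is only an illustrative sketch, not part of the patch or of GlusterFS; the function names `usable_size` and `space_efficiency` are hypothetical, and all bricks are assumed to have the same capacity.

```python
def usable_size(brick_size, bricks, redundancy):
    """Usable size = brick size * (#Bricks - Redundancy), in the unit of brick_size."""
    return brick_size * (bricks - redundancy)


def space_efficiency(bricks, redundancy):
    """Fraction of the total physical space that remains usable."""
    return (bricks - redundancy) / bricks


# 6 bricks of 4TB with redundancy 2, as in the example above
print(usable_size(4, 6, 2))

# 3 bricks vs. 10 bricks, both with redundancy 1, as a percentage
print(round(space_efficiency(3, 1) * 100, 1))
print(round(space_efficiency(10, 1) * 100, 1))
```

The three calls reproduce the 16TB, 66.7% and 90% figures quoted in the text.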
+
+The implementation of erasure codes in GlusterFS limits the redundancy to a
+value smaller than #Bricks / 2 (or equivalently, redundancy * 2 < #Bricks).
+Having a redundancy equal to half of the number of bricks would be almost
+equivalent to a replica-2 volume, and probably a replicated volume will
+perform better in this case.
+
+**Optimal volumes**
+
+One of the biggest performance costs of erasure codes is the RMW
+(Read-Modify-Write) cycle. Erasure codes operate on blocks of a certain
+size and cannot work with smaller ones. This means that if a user issues
+a write of a portion of a file that doesn't fill a full block, the volume must
+read the remaining portion from the current contents of the file, merge them,
+compute the updated encoded block and, finally, write the resulting data.
+
+This adds latency, reducing performance when this happens. Some GlusterFS
+performance xlators can help to reduce or even eliminate this problem for
+some workloads, but it should be taken into account when using dispersed
+volumes for a specific use case.
+
+The current implementation of dispersed volumes uses blocks of a size that
+depends on the number of bricks and redundancy: 512 * (#Bricks - redundancy)
+bytes. This value is also known as the stripe size.
+
+Using combinations of #Bricks/redundancy that give a power of two for the
+stripe size will make the disperse volume perform better in most workloads
+because it's more typical to write information in blocks whose size is a
+power of two (for example databases, virtual machines and many applications).
+
+These combinations are considered *optimal*.
+
+For example, a configuration with 6 bricks and redundancy 2 will have a stripe
+size of 512 * (6 - 2) = 2048 bytes, so it's considered optimal. A configuration
+with 7 bricks and redundancy 2 would have a stripe size of 2560 bytes, needing
+an RMW cycle for many writes (of course this always depends on the use case).
+
+**To create a dispersed volume**
+
+1.  Create a trusted storage pool.
+
+2.  Create the dispersed volume:
+
+    `# gluster volume create [disperse []] [redundancy ] [transport tcp | rdma | tcp,rdma]`
+
+    A dispersed volume can be created by specifying the number of bricks in a
+    disperse set, by specifying the number of redundancy bricks, or both.
+
+    If *disperse* is not specified, or the _<count>_ is missing, the
+    entire volume will be treated as a single disperse set composed of all
+    bricks enumerated in the command line.
+
+    If *redundancy* is not specified, it is computed automatically to be the
+    optimal value. If this value does not exist, it's assumed to be '1' and a
+    warning message is shown:
+
+        # gluster volume create test-volume disperse 4 server{1..4}:/bricks/test-volume
+        There isn't an optimal redundancy value for this configuration. Do you want to create the volume with redundancy 1 ? (y/n)
+
+    In all cases where *redundancy* is automatically computed and it's not
+    equal to '1', a warning message is displayed:
+
+        # gluster volume create test-volume disperse 6 server{1..6}:/bricks/test-volume
+        The optimal redundancy for this configuration is 2. Do you want to create the volume with this value ? (y/n)
+
+    _redundancy_ must be greater than 0, and the total number of bricks must
+    be greater than 2 * _redundancy_. This means that a dispersed volume must
+    have a minimum of 3 bricks.
+
+    If the transport type is not specified, *tcp* is used as the default. You
+    can also set additional options if required, as with the other volume
+    types.
+
+    > **Note**:
+
+    > - Make sure you start your volumes before you try to mount them or
+    > else client operations after the mount will hang.
+
+    > - GlusterFS will fail to create a dispersed volume if more than one brick of a disperse set is present on the same peer.
+
+    > ```
+    # gluster volume create disperse 3 server1:/brick{1..3}
+    volume create: : failed: Multiple bricks of a replicate volume are present on the same server. This setup is not optimal.
+    Do you still want to continue creating the volume? (y/n)```
+
+    > Use the `force` option at the end of command if you want to create the volume in this case.
+
+##Creating Distributed Dispersed Volumes
+
+Distributed dispersed volumes are the equivalent of distributed replicated
+volumes, but using dispersed subvolumes instead of replicated ones.
+
+**To create a distributed dispersed volume**
+
+1.  Create a trusted storage pool.
+
+2.  Create the distributed dispersed volume:
+
+    `# gluster volume create disperse [redundancy ] [transport tcp | rdma | tcp,rdma]`
+
+    To create a distributed dispersed volume, the *disperse* keyword and
+    <count> are mandatory, and the number of bricks specified in the
+    command line must be a multiple of the disperse count.
+
+    *redundancy* is exactly the same as in the dispersed volume.
+
+    If the transport type is not specified, *tcp* is used as the default. You
+    can also set additional options if required, as with the other volume
+    types.
+
+    > **Note**:
+
+    > - Make sure you start your volumes before you try to mount them or
+    > else client operations after the mount will hang.
+
+    > - GlusterFS will fail to create a distributed dispersed volume if more than one brick of a disperse set is present on the same peer.
+
+    > ```
+    # gluster volume create disperse 3 server1:/brick{1..6}
+    volume create: : failed: Multiple bricks of a replicate volume are present on the same server. This setup is not optimal.
+    Do you still want to continue creating the volume? (y/n)```
+
+    > Use the `force` option at the end of command if you want to create the volume in this case.
+
 ##Starting Volumes

 You must start your volumes before you try to mount them.
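The sizing rules documented in the hunk above (redundancy limits, stripe size, optimal combinations, and the brick-count requirement for distributed dispersed volumes) can be sketched in Python. This is an illustration of the rules as the documentation states them, not the GlusterFS implementation; every function name here is hypothetical, and the automatic redundancy selection is an assumption inferred from the two warning messages shown in the text.

```python
def check_redundancy(bricks_per_set, redundancy):
    """Documented limits: redundancy > 0 and #Bricks > 2 * redundancy."""
    if redundancy <= 0:
        raise ValueError("redundancy must be greater than 0")
    if bricks_per_set <= 2 * redundancy:
        raise ValueError("number of bricks must be greater than 2 * redundancy")


def stripe_size(bricks_per_set, redundancy):
    """Stripe size in bytes: 512 * (#Bricks - redundancy)."""
    return 512 * (bricks_per_set - redundancy)


def is_optimal(bricks_per_set, redundancy):
    """True when the stripe size is a power of two."""
    data_bricks = bricks_per_set - redundancy
    return data_bricks > 0 and (data_bricks & (data_bricks - 1)) == 0


def default_redundancy(bricks_per_set):
    """Assumed selection rule: return (redundancy, found_optimal).

    Picks a redundancy that yields an optimal stripe size, or falls back
    to 1 when no valid redundancy does.
    """
    for redundancy in range(1, (bricks_per_set - 1) // 2 + 1):
        if is_optimal(bricks_per_set, redundancy):
            return redundancy, True
    return 1, False


def disperse_sets(total_bricks, disperse_count):
    """Number of disperse subvolumes in a distributed dispersed volume."""
    if total_bricks % disperse_count != 0:
        raise ValueError("number of bricks must be a multiple of the disperse count")
    return total_bricks // disperse_count


print(stripe_size(6, 2), is_optimal(6, 2))
print(stripe_size(7, 2), is_optimal(7, 2))
print(default_redundancy(4), default_redundancy(6))
print(disperse_sets(6, 3))
```

Under these assumptions, `default_redundancy(4)` finds no optimal value and falls back to 1, while `default_redundancy(6)` selects 2, which matches the two prompts quoted in the documentation.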
diff --git a/doc/gluster.8 b/doc/gluster.8
index c4603e1a877..b7e5e205e9f 100644
--- a/doc/gluster.8
+++ b/doc/gluster.8
@@ -36,7 +36,7 @@ The Gluster Console Manager is a command line utility for elastic volume managem
 \fB\ volume info [all|] \fR
 Display information about all volumes, or the specified volume.
 .TP
-\fB\ volume create [stripe ] [replica ] [transport ] ... \fR
+\fB\ volume create [stripe ] [replica ] [disperse []] [redundancy ] [transport ] ... \fR
 Create a new volume of the specified type using the specified bricks and transport type (the default transport type is tcp).
 To create a volume with both transports (tcp and rdma), give 'transport tcp,rdma' as an option.
 .TP
-- 
cgit