glusterfs-afrv1.git -

	Commit message (Collapse)	Author	Age	Files	Lines
*	cli/glusterd: Changes to quota command Quota feature	Raghavendra G	2013-11-26	1	-23/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	re-work. Following are the cli commands that are new/re-worked: ====================================================== volume quota <VOLNAME> {enable\|disable\|list [<path> ...]\|remove <path>\| default-soft-limit <percent>} \| volume quota <VOLNAME> {limit-usage <path> <size> [<percent>]} \| volume quota <VOLNAME> {alert-time\|soft-timeout\|hard-timeout} {<time>} volume status [all \| <VOLNAME> [nfs\|shd\|<BRICK>\|quotad]] [detail\|clients\|mem\|inode\|fd\|callpool] volume statedump <VOLNAME> [nfs\|quotad] [all\|mem\|iobuf\|callpool\|priv\|fd\|inode\|history] glusterd changes: ================= * Quota limits are now set as extended attributes by glusterd from the aux mount created by the cli. * The gfids of the directories on which quota limits are set for a given volume are stored in /var/lib/glusterd/vols/<volname>/quota.conf file in binary format, and whose cksum and version is stored in /var/lib/glusterd/vols/<volname>/quota.cksum. Original-author: Krutika Dhananjay <kdhananj@redhat.com> Original-author: Krishnan Parthasarathi <kparthas@redhat.com> BUG: 969461 Change-Id: If32bba36c67f9c2a30417af9c6389045b2b7c13b Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/6003 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	Transparent data encryption and metadata authentication	Edward Shishkin	2013-11-13	1	-0/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	.. in the systems with non-trusted server This new functionality can be useful in various cloud technologies. It is implemented via a special encryption/crypt translator,which works on the client side and performs encryption and authentication; 1. Class of supported algorithms The crypt translator can support any atomic symmetric block cipher algorithms (which require to pad plain/cipher text before performing encryption/decryption transform (see glossary in atom.c for definitions). In particular, it can support algorithms with the EOF issue (which require to pad the end of file by extra-data). Crypt translator performs translations user -> (offset, size) -> (aligned-offset, padded-size) ->server (and backward), and resolves individual FOPs (write(), truncate(), etc) to read-modify-write sequences. A volume can contain files encrypted by different algorithms of the mentioned class. To change some option value just reconfigure the volume. Currently only one algorithm is supported: AES_XTS. Example of algorithms, which can not be supported by the crypt translator: 1. Asymmetric block cipher algorithms, which inflate data, e.g. RSA; 2. Symmetric block cipher algorithms with inline MACs for data authentication. 2. Implementation notes. a) Atomic algorithms Since any process in a stackable file system manipulates with local data (which can be obsoleted by local data of another process), any atomic cipher algorithm without proper support can lead to non-POSIX behavior. To resolve the "collisions" we introduce locks: before performing FOP->read(), FOP->write(), etc. the process should first lock the file. b) Algorithms with EOF issue Such algorithms require to pad the end of file with some extra-data. Without proper support this will result in losing information about real file size. Keeping a track of real file size is a responsibility of the crypt translator. A special extended attribute with the name "trusted.glusterfs.crypt.att.size" is used for this purpose. All files contained in bricks of encrypted volume do have "padded" sizes. 3. Non-trusted servers and Metadata authentication We assume that server, where user's data is stored on is non-trusted. It means that the server can be subjected to various attacks directed to reveal user's encrypted personal data. We provide protection against such attacks. Every encrypted file has specific private attributes (cipher algorithm id, atom size, etc), which are packed to a string (so-called "format string") and stored as a special extended attribute with the name "trusted.glusterfs.crypt.att.cfmt". We protect the string from tampering. This protection is mandatory, hardcoded and is always on. Without such protection various attacks (based on extending the scope of per-file secret keys) are possible. Our authentication method has been developed in tight collaboration with Red Hat security team and is implemented as "metadata loader of version 1" (see file metadata.c). This method is NIST-compliant and is based on checking 8-byte per-hardlink MACs created(updated) by FOP->create(), FOP->link(), FOP->unlink(), FOP->rename() by the following unique entities: . file (hardlink) name; . verified file's object id (gfid). Every time, before manipulating with a file, we check it's MACs at FOP->open() time. Some FOPs don't require a file to be opened (e.g. FOP->truncate()). In such cases the crypt translator opens the file mandatory. 4. Generating keys Unique per-file keys are derived by NIST-compliant methods from the a) parent key; b) unique verified object-id of the file (gfid); Per-volume master key, provided by user at mount time is in the root of this "tree of keys". Those keys are used to: 1) encrypt/decrypt file data; 2) encrypt/decrypt file metadata; 3) create per-file and per-link MACs for metadata authentication. 5. Instructions Getting started with crypt translator Example: 1) Create a volume "myvol" and enable encryption: # gluster volume create myvol pepelac:/vols/xvol # gluster volume set myvol encryption on 2) Set location (absolute pathname) of your master key: # gluster volume set myvol encryption.master-key /home/me/mykey 3) Set other options to override default options, if needed. Start the volume. 4) On the client side make sure that the file /home/me/mykey exists and contains proper per-volume master key (that is 256-bit AES key). This key has to be in hex form, i.e. should be represented by 64 symbols from the set {'0', ..., '9', 'a', ..., 'f'}. The key should start at the beginning of the file. All symbols at offsets >= 64 are ignored. 5) Mount the volume "myvol" on the client side: # glusterfs --volfile-server=pepelac --volfile-id=myvol /mnt After successful mount the file which contains master key may be removed. NOTE: Keeping the master key between mount sessions is in user's competence. ******************************************************************** WARNING! Losing the master key will make content of all regular files inaccessible. Mount with improper master key allows to access content of directories: file names are not encrypted. ******************************************************************** 6. Options of crypt translator 1) "master-key": specifies location (absolute pathname) of the file which contains per-volume master key. There is no default location for master key. 2) "data-key-size": specifies size of per-file key for data encryption Possible values: . "256" default value . "512" 3) "block-size": specifies atom size. Possible values: . "512" . "1024" . "2048" . "4096" default value; 7. Test cases Any workload, which involves the following file operations: ->create(); ->open(); ->readv(); ->writev(); ->truncate(); ->ftruncate(); ->link(); ->unlink(); ->rename(); ->readdirp(). 8. TODOs: 1) Currently size of IOs issued by crypt translator is restricted by block_size (4K by default). We can use larger IOs to improve performance. Change-Id: I2601fe95c5c4dc5b22308a53d0cbdc071d5e5cee BUG: 1030058 Signed-off-by: Edward Shishkin <edward@redhat.com> Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4667 Tested-by: Gluster Build System <jenkins@build.gluster.com>
*	bd: Add aio support to BD xlator	M. Mohan Kumar	2013-11-13	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \|	Volume option bd-aio controls AIO feature for BD xlator. Code taken from posix-aio.c Change-Id: Ib049bd59c9d3f9101d33939838322cfa808de053 BUG: 1028672 Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/5748 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	features/compress: Compression/DeCompression translator	Prashanth Pai	2013-11-11	1	-0/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* When a writev call occurs, the client compresses the data before sending it to server. On the server, compressed data is decompressed. Similarly, when a readv call occurs, the server compresses the data before sending it to client. On the client, the compressed data is decompressed. Thus the amount of data sent over the wire is minimized. * Compression/Decompression is done using Zlib library. * During normal operation, this is the format of data sent over wire : <compressed-data> + trailer(8) The trailer contains the CRC32 checksum and length of original uncompressed data. This is used for validation. HOW TO USE ---------- Turning on compression xlator: gluster volume set <vol_name> compress on Configurable options: gluster volume set <vol_name> compress.compression-level 8 gluster volume set <vol_name> compress.min-size 50 Change-Id: Ib7a66b6f1f70fe002b7c513588cdf75c69370805 BUG: 923540 Original-author : Venky Shankar <vshankar@redhat.com> Signed-off-by: Venky Shankar <vshankar@redhat.com> Signed-off-by: Prashanth Pai <nullpai@gmail.com> Signed-off-by: Prashanth Pai <ppai@redhat.com> Reviewed-on: http://review.gluster.org/3251 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	rpcsvc: implement per-client RPC throttling	Anand Avati	2013-10-28	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Implement a limit on the total number of outstanding RPC requests from a given cient. Once the limit is reached the client socket is removed from POLL-IN event polling. Change-Id: I8071b8c89b78d02e830e6af5a540308199d6bdcd BUG: 1008301 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6114 Reviewed-by: Santosh Pradhan <spradhan@redhat.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Harshavardhana <harsha@harshavardhana.net> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
*	gNFS: NFS daemon is limiting IOs to 64KB	Santosh Kumar Pradhan	2013-09-20	1	-0/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Gluster NFS server is hard-coding the max rsize/wsize to 64KB which is very less for NFS running over 10GE NIC. The existing options nfs.read-size, nfs.write-size are not working as expected. FIX: Make the options nfs.read-size (for rsize) and nfs.write-size (for wsize) work to tune the NFS I/O size. Value range would be 4KB(Min)-64KB(Default)-1MB(max). NB: Credit to "Richard Wareing" for catching it. Change-Id: I2754ecb0975692304308be8bcf496c713355f1c8 BUG: 1009223 Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com> Reviewed-on: http://review.gluster.org/5964 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	features/marker: force xtime updates (configurable) for client-pid = -1	Venky Shankar	2013-09-04	1	-0/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is required by Geo-Replication that does auxillary mount with client-pid as -1 (which has special treatment at specific places in GlusterFS), to trigger xtime updates on the intermediate master in a cascading setup. Marker too had a check to "not" mark updates for geo-replication's auxillary mounts. With the new geo-replication design, xtimes are not set by the master on the slave for all entities. Due to this cascading setups were broken. This patch introduces "geo-replication.ignore-pid-check" option as a "override" for the client-pid check for gsyncd's client-pid. When this options is enabled, marker start "marking" even if the updates are from the special client. Geo-Replication on the detection of itself being an intermediate master, enables this option. Change-Id: I9f7140edd12fef5480595ee0f93f35b94cdb8345 BUG: 996371 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/5591 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Avra Sengupta <asengupt@redhat.com> Tested-by: Avra Sengupta <asengupt@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	performance/readdir-ahead: introduce directory read-ahead translator	Brian Foster	2013-09-04	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a translator to improve the performance of typical, sequential directory reads (i.e., ls). readdir-ahead begins preloading the contents of a directory on open and serves readdir requests from the preloaded content. readdir-ahead is currently implemented to only handle the single threaded directory read case. readdir-ahead is currently disabled by default. It can be enabled with the following command: gluster volume set <volname> readdir-ahead on The following are results of a getdents test on a single brick volume. Test info: - Single VM, gluster client/server. - Volume mounted with native client using --gid-timeout=2. - getdents on single directory with 100k 0-byte files. Test results: - !readdir-ahead read 3120080 bytes from offset 0 3 MiB, 4348 ops, 0:00:07.00 (416.590 KiB/sec and 594.4737 ops/sec) - readdir-ahead read 3120080 bytes from offset 0 3 MiB, 4348 ops, 0:00:03.00 (820.116 KiB/sec and 1170.3043 ops/sec) BUG: 980517 Change-Id: Ieceb9e1eb47d1d5b5af8da2bf03839537364653f Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/4519 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	features/qemu-block: support for QCOW2 and QED formats	Anand Avati	2013-09-03	1	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds support for internals snapshots using QCOW2 and general framework for external snapshots (next patch) with QCOW2 and QED. For internal snapshots, the file must be "initialized" or "formatted" into QCOW2 format, and specify a file size. Snapshots can be created, deleted, and applied ("goto"). e.g: // Format and Initialize sh# setfattr -n trusted.glusterfs.block-format -v qcow2:10GB /mnt/imgfile sh# ls -l /mnt/imgfile -rw-r--r-- 1 root root 10G Jul 18 21:20 imgfile // Create a snapshot sh# setfattr -n trusted.glusterfs.block-snapshot-create -v name1 imgfile // Apply a snapshot sh# setfattr -n trusted.gluterfs.block-snapshot-goto -v name1 imgfile Change-Id: If993e057a9455967ba3fa9dcabb7f74b8b2cf4c3 BUG: 986775 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/5367 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
*	nfs: persistent caching of connected NFS-clients	Niels de Vos	2013-08-28	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Introduce /var/lib/glusterfs/nfs/rmtab to contain a list of NFS-clients which have a volume mounted. The volume option 'nfs.mount-rmtab' can be set to an alternative filename. When the file is located on shared storage, multiple gNFS servers can use the same file to present a single NFS-server. This cache is read when a system administrator calls 'showmount -a' and updated when an NFS-client calls MNT or UMNT from the MOUNT protocol. Usage: - create a volume for storing the shared rmtab file - mount the volume on all storage servers, at the same location - make sure that the volume is mounted at boot (add to /etc/fstab) - place the rmtab file on the volume: # gluster volume set <VOLUME> nfs.mount-rmtab <MOUNTPOINT>/<FILENAME> - any subsequent mount requests will add an entry to this file - 'showmount -a' requests will return the NFS-clients using the cluster Note: The NFS-server does currently not support reconfigure(). When a configuration option is set/changed, the NFS-server glusterfs process gets restarted. This causes the active NFS-clients to be forgotten (the entries are saved in the old rmtab, but we do not have a reference to that file any more, so we can't re-add them). Therefor a re-mount done by the NFS-clients is needed before they get listed in the rmtab again. Change-Id: I58f47135d60ad112849d647bea4e1129683dd2b3 BUG: 904065 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/4430 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Harshavardhana <harsha@harshavardhana.net> Tested-by: Harshavardhana <harsha@harshavardhana.net> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
*	Add switch and nufa options to 'gluster cli'	Harshavardhana	2013-08-03	1	-6/+31
\| \| \| \| \| \| \| \| \|	Change-Id: Ic3c43291e0e1ead0d89c0436e8d70aa5dee2f543 BUG: 924488 Signed-off-by: Harshavardhana <harsha@harshavardhana.net> Reviewed-on: http://review.gluster.org/5391 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	storage/posix: implement batched fsync in a single thread	Anand Avati	2013-07-23	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Because of the extra fsync()s issued by AFR transaction, they could potentially "clog" all the io-threads denying unrelated operations from making progress. This patch assigns a dedicated thread to issues fsyncs, as an experimental feature to understand performance characteristics with the approach. As a basis, incoming individual fsync requests are grouped into batches, falling in the same @batch-fsync-delay-usec window of time. These windows can extend in practice, as processing of the previous batch can take longer than @batch-fsync-delay-usec while new requests are getting batched. The feature support three modes (similar to the -S modes of fs_mark) - syncfs: In this mode one syncfs() is issued per batch, instead of N fsync()s (one per file.) - syncfs-single-fsync: In this mode one syncfs() is issued per batch (which, on Linux, guarantees the completion of write-out of dirty pages in the filesystem up to that point) and one single fsync() to synchronize or flush the controller/drive cache. This corresponds to -S 2 of fsmark. - syncfs-reverse-fsync: In this mode, one syncfs() is issued per batch, and all the open files in that batch are fsync()'ed in the reverse order of the queue. This corresponds to -S 4 of fsmark. - reverse-fsync: In this mode, no syncfs() is issued and all the files in the batch are fsync()'ed in the reverse order. This corresponds to -S 3 of fsmark. Change-Id: Ia1e170a810c780c8d80e02cf910accc4170c4cd4 BUG: 927146 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4746 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	features/changelog: changelog translator	Avra Sengupta	2013-07-22	1	-0/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the initial version of the Changelog Translator. What is it ----------- Goal is to capture changes performed on a GlusterFS volume. The translator needs to be loaded on the server (bricks) and captures changes in a plain text file inside a configured directory path (controlled by "changelog-dir", should be somewhere in <export>/.glusterfs/changelog by default). Changes are classified into 3 types: - Data: : TYPE-I - Metadata : TYPE-II - Entry : TYPE-III Changelog file is rolled over after a certain time interval (defauls to 60 seconds) after which a changelog is started. The thing to be noted here is that for a time interval (time slice) multiple changes for an inode are recorded only once (ie. say for 100+ writes on an inode that happens within the time slice has only a single corresponding entry in the changelog file). That way we do not bloat up the changelog and also save lots of writes. Changelog Format ----------------- TYPE-I and TYPE-II changes have the gfid on the entity on which the operation happened. TYPE-III being a entry op requires the parent gfid and the basename. Changelog format has been kept to a minimal and it's upto the consumers to do the heavy loading of figuring out deletes, renames etc.. A single changelog file records all three types of changes, with each change starting with an identifier ("D": DATA, "M": METADATA and "E": ENTRY). Option is provided for the encoding type (See TUNABLES). Consumers ---------- The only consumer as of today would be geo-replication, although backup utilities, self-heal, bit-rot detection could be possible consumers in the future. CLI ---- By default, change-logging is disabled (the translator is present in the server graph but does nothing). When enabled (via cli) each brick starts to log the changes. There are a set of tunable that can be used to change the translators behaviour: - enable/disable changelog (disabled by default) gluster volume set <volume> changelog {on\|off} - set the logging directory (<brick>/.glusterfs/changelogs is the default) gluster volume set <volume> changelog-dir /path/to/dir - select encoding type (binary (default) or ascii) gluster volume set <volume> encoding {binary\|ascii} - change the rollover time for the logs (60 secs by default) gluster volume set <volume> rollover-time <secs> - when secs > 0, changelog file is not open()'d with O_SYNC flag - and fsync is trigerred periodically every <secs> seconds. gluster volume set <volume> fsync-interval <secs> features/changelog: changelog consumer library (libgfchangelog) A shared library is provided for the consumer of the changelogs for easy acess via APIs. Application can link against this library and request for changelog updates. Conversion of binary logs to human-readable ascii format is also taken care by the library which keeps a copy of the changelog in application provided working directory. Change-Id: I75575fb7f1c53d2bec3dba1a329ea7bb3c628497 BUG: 847839 Original Author: Venky Shankar <vshankar@redhat.com> Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/5127 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	glusterd: Correct op-version of some options	Kaushal M	2013-07-11	1	-23/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	New options being introduced in the master branch should now have op-version set to the GD_OP_VERSION_MAX (3). Some of the options have been backported to release-3.3 branch and hence should have their op-version reduced. Some other options had op-version incorrectly set as 1. Change-Id: If40325b7b2da7aa36f90261024117cd18cf51ef0 BUG: 981278 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/5318 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
*	posix: add a simple health-checker	Niels de Vos	2013-07-03	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Goal of this health-checker is to detect fatal issues of the underlying storage that is used for exporting a brick. The current implementation requires the filesystem to detect the storage error, after which it will notify the parent xlators and exit the glusterfsd (brick) process to prevent further troubles. The interval the health-check runs can be configured per volume with the storage.health-check-interval option. The default interval is 30 seconds. It is not trivial to write an automated test-case with the current prove-framework. These are the manual steps that can be done to verify the functionality: - setup a Logical Volume (/dev/bz970960/xfs) and format is as XFS for brick usage - create a volume with the one brick # gluster volume create failing_xfs glufs1:/bricks/failing_xfs/data # gluster volume start failing_xfs - mount the volume and verify the functionality - make the storage fail (use device-mapper, or pull disks) # dmsetup table .. bz970960-xfs: 0 196608 linear 7:0 2048 # echo 0 196608 error > dmsetup-error-target # dmsetup load bz970960-xfs dmsetup-error-target # dmsetup resume bz970960-xfs # dmsetup table ... bz970960-xfs: 0 196608 error - notice the errors caught by syslog: Jun 24 11:31:49 vm130-32 kernel: XFS (dm-2): metadata I/O error: block 0x0 ("xfs_buf_iodone_callbacks") error 5 buf count 512 Jun 24 11:31:49 vm130-32 kernel: XFS (dm-2): I/O Error Detected. Shutting down filesystem Jun 24 11:31:49 vm130-32 kernel: XFS (dm-2): Please umount the filesystem and rectify the problem(s) Jun 24 11:31:49 vm130-32 kernel: VFS:Filesystem freeze failed Jun 24 11:31:50 vm130-32 GlusterFS[1969]: [2013-06-24 10:31:50.500674] M [posix-helpers.c:1114:posix_health_check_thread_proc] 0-failing_xfs-posix: health-check failed, going down Jun 24 11:32:09 vm130-32 kernel: XFS (dm-2): xfs_log_force: error 5 returned. Jun 24 11:32:20 vm130-32 GlusterFS[1969]: [2013-06-24 10:32:20.508690] M [posix-helpers.c:1119:posix_health_check_thread_proc] 0-failing_xfs-posix: still alive! -> SIGTERM - these errors are in the log of the brick as well: [2013-06-24 10:31:50.500607] W [posix-helpers.c:1102:posix_health_check_thread_proc] 0-failing_xfs-posix: stat() on /bricks/failing_xfs/data returned: Input/output error [2013-06-24 10:31:50.500674] M [posix-helpers.c:1114:posix_health_check_thread_proc] 0-failing_xfs-posix: health-check failed, going down [2013-06-24 10:32:20.508690] M [posix-helpers.c:1119:posix_health_check_thread_proc] 0-failing_xfs-posix: still alive! -> SIGTERM - the glusterfsd process has exited correctly: # gluster volume status Status of volume: failing_xfs Gluster process Port Online Pid ------------------------------------------------------------------------------ Brick glufs1:/bricks/failing_xfs/data N/A N N/A NFS Server on localhost 2049 Y 1897 Change-Id: Ic247fbefb97f7e861307a5998a9a7a3ecc80aa07 BUG: 971774 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/5176 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	cluster/afr: Provide an option to disable afr durability	Pranith Kumar K	2013-07-03	1	-0/+5
\| \| \| \| \| \| \| \| \|	Change-Id: I40eec20ca6b3f857245a2438883822e251077ee9 BUG: 979365 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/5269 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	rpc: duplicate request cache for nfs	Rajesh Amaravathi	2013-06-21	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Duplicate request cache provides a mechanism for detecting duplicate rpc requests from clients. DRC caches replies and on duplicate requests, sends the cached reply instead of re-processing the request. Change-Id: I3d62a6c4aa86c92bf61f1038ca62a1a46bf1c303 BUG: 847624 Signed-off-by: Rajesh Amaravathi <rajesh@redhat.com> Reviewed-on: http://review.gluster.org/4049 Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	nfs: option to disable acl	Rajesh Amaravathi	2013-06-15	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	1. Option to disable or enable acl with nfs.acl boolean option. 2. Deregister the acl service with the portmapper service when no longer required. Change-Id: I6562b6b40138d040aa2bf1e5641f4c0e0e9f9d09 BUG: 970070 Signed-off-by: Rajesh Amaravathi <rajesh@redhat.com> Reviewed-on: http://review.gluster.org/5136 Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	glusterd-volgen: Improve volume op-versions calculation	Kaushal M	2013-05-31	1	-451/+458
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Volume op-versions calculations now take into account if an option, a. enables/disables an xlator, or b. is a boolean option. This prevents op-versions from being updated when a feature is disabled. Also, correctly close the dynamically loaded xlators in xlator_volopt_dynload() and prevent leaks. Change-Id: I895ddeeec6f6a33e509325f0ce6f01b7aad3cf5c BUG: 954256 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/4952 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	glusterd: Introduce volume op-versions	Kaushal M	2013-04-26	1	-343/+416
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Each volume is now associated with two op-versions, * op_version - the op-version of the highest op-versioned feature enabled * client_op_version - the op-version of the highest op-versioned feature enabled which affects the clients only. These two op-versions are generated dynamically and kept updated during runtime. Glusterd now uses the respective volumes' client-op-version during getspec requests. To achieve the above a new field in the vme table is introduced, client_option, this boolean field tells if the option is a client side option. Change-Id: I12c83b1dd29ab506026efd50d448cebbcee53c27 BUG: 907311 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/4584 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	storage/posix: introduce node-uuid-pathinfo	Venky Shankar	2013-04-05	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	enabling this option has an effect on pathinfo xattr request returning <node-uuid>:<path> instead of the default - which is <hostname>:<path>. Change-Id: Ice1b38abf8e5df1568bab6d79ec0d53dfa520332 BUG: 765380 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/4567 Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	mgmt/glusterd: Enable write-behind in nfs	Pranith Kumar K	2013-04-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	We observed that the number of write requests thus inodelks are increasing very rapidly to thousands without write-behind in the graph. Change-Id: Id71c9c2b0a4c9601a4644a58a933221c62dab0c0 BUG: 928341 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4734 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	dht: make DHT xattr names configurable	Jeff Darcy	2013-03-21	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is necessary to support "DHT over DHT" configurations, so that the upper and lower instances of DHT don't step all over each other. Why would we even consider such a thing? Because it gives us the ability to do data tiering and rack-aware placement, either by themselves or as complements to other functionality such as erasure codes or deduplication which save space but cost performance. By setting up the top-level DHT to place data into one of several lower-level DHT pools based on policy instead of pure elastic hashing, we get better performance for 90% of accesses and better storage efficiency for 90% of data, all for relatively low effort. Change-Id: I72e65c29edfc80babf39f7a2a00090f4588c4070 BUG: 924265 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/4694 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	glusterd: Added description for nfs.transport-type option in volume set help.	Avra Sengupta	2013-03-01	1	-1/+4
\| \| \| \| \| \| \| \| \| \|	Change-Id: I9fe81dc1c3172158e8dd86c4fa2a04af18cb9dde BUG: 782285 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/4582 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	glusterd: Added the validation function for subvols-per-directory	Avra Sengupta	2013-02-28	1	-18/+55
\| \| \| \| \| \| \| \| \|	Change-Id: Ie2259023b9001311a2032792639c3093054f6750 BUG: 896431 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/4552 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
*	glusterd: Fix some options in vme table	Kaushal M	2013-02-28	1	-7/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some of the options had invalid '.flags' members. In the original table these table these were supposed to be the op-versions, but, the entries for the below options were missing the flags field the op-version was entered in that place. Change-Id: I408f5a972743eb37d9a58a809e8be8cb385bced8 BUG: 903478 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/4593 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
*	glusterd: Added validation function for stripe-block-size.	Avra Sengupta	2013-02-27	1	-2/+41
\| \| \| \| \| \| \| \| \|	Change-Id: I050d01b01eac46550aa435da7d96a972e0393d35 BUG: 770655 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/4561 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
*	mgmt/glusterd: Expose error-gen options through volume set.	Vijay Bellur	2013-02-26	1	-0/+24
\| \| \| \| \| \| \| \| \|	Change-Id: I7c696d99b43544923fb96d177229cdbac32c09fe BUG: 915280 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/4577 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
*	glusterd: Added validation function for quota-timeout.	Avra Sengupta	2013-02-22	1	-2/+50
\| \| \| \| \| \| \| \| \|	Change-Id: I7f82f4217a41e0fe41272e6ef82925e1fe97fcd5 BUG: 765230 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/4557 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
*	glusterd: Added validation function for performance cache max and min size.	Avra Sengupta	2013-02-19	1	-2/+151
\| \| \| \| \| \| \| \| \|	Change-Id: I0b8dbc4b65412b8aff24873f030c03e3dcfcb988 BUG: 782095 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/4541 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	features/quota: Add option to consider the quota limit in statfs estimation	Varun Shastry	2013-02-19	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Adds an option, features.quota-deem-statfs (default off) to consider the quota limits while calculating the volume stats. Eg: Backend is of size 10GB and limit set on / is 5GB. If the option is off df show actual size to be 10GB and when it is on df shows 5GB. Change-Id: Ib30733bb69afecce1dea9d0491af89d551d214cc BUG: 905425 Signed-off-by: Varun Shastry <vshastry@redhat.com> Reviewed-on: http://review.gluster.org/4511 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	performance/open-behind: use anonymous fd for doing fstat and readv	Raghavendra Bhat	2013-02-19	1	-0/+5
\| \| \| \| \| \| \| \| \|	Change-Id: I61a3c221e0a15736ab6315e2538c03dac27480a5 BUG: 846240 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/4483 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	glusterd: Added option description, and validation function fields.	Avra Sengupta	2013-02-19	1	-14/+22
\| \| \| \| \| \| \| \| \| \| \| \|	In volopt_map_entry table, added option description field, and option validation function pointer. Change-Id: I21c6bccd175970592b470ce3ef3f418cb99a5a43 BUG: 903478 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/4535 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	distribute: add hash-name-regex option	Jeff Darcy	2013-02-18	1	-1/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is to support the common "write to temp file then rename" idiom. In our case this causes us to create a linkfile during the rename in (N-1)/N cases, with a significant impact on performance e.g. for UFO which uses this idiom. This patch allows the user to specify up to two regular expressions that separate the permanent and transient parts of a temp-file name: rsync-hash-regex reimplements the existing RSYNC_FRIENDLY_NAME pattern where the temporary name for EXAMPLE is .EXAMPLE.suffix extra-hash-regex can be used for a second pattern, active concurrently, and can include alternatives via the POSIX extended-regex syntax Change-Id: Ic1a6ed19324bc31fefe91ee34b8478877a9c5d2c BUG: 912564 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/4116 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	glusterd: Changing the volume entry table's representation.	Avra Sengupta	2013-02-18	1	-138/+699
\| \| \| \| \| \| \| \| \|	Change-Id: Ifd3a20452cc18ec2604aa97776e705b1998be03c BUG: 903478 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/4518 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
*	rpc: bring in root-squashing behavior in rpc	Raghavendra Bhat	2013-02-17	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* requests coming in as root are converted to nfsnobody * with open-behind some acl checks wont happen and nfsnobody can read the file "whose owner is root and other users do not have permission to read the file". This is becasue open-behind does not send the open to the brick and sends success to the application, thus the acl related tests on the file wont happen which would have prevented the file from being opened. Change-Id: I73afbfd904f0beb3a2ebe807b938ac2fecd4976b BUG: 887145 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/4516 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
*	glusterd: Moved the volume entry table to a separate file.	Avra Sengupta	2013-02-15	1	-0/+241
	Change-Id: I893f41bd505fc0e02aec1e71f7a6209759b24a89 BUG: 903478 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/4517 Tested-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Kaushal M <kaushal@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>