summaryrefslogtreecommitdiffstats
path: root/xlators/mgmt/glusterd
Commit message (Collapse)AuthorAgeFilesLines
...
* snapshot: Snapshot restoreRajesh Joseph2013-11-156-37/+752
| | | | | | | | | | | | GL-31: Ability to restore snapshot Implemented snapshot restore for thin logical volume. As of now snapshot restore for CG is not tested. Testing for snapshot restore of a volume is done by changing the snapshot create process to create a thick snapshot. This is done because --merge option to restore thin volume is not working in the latest kernel. Change-Id: Ia3ded7e6c4da5957a74e269a25ba3200e6fb2d8b Signed-off-by: Rajesh Joseph <rjoseph@redhat.com>
* mgmt/glusterd: save snapshot config values in storeshishir gowda2013-11-151-1/+71
| | | | | Change-Id: Ia755e5c4af84827cc9b8876054cc48cfdc598876 Signed-off-by: shishir gowda <sgowda@redhat.com>
* mgmt/glusterd: store option to update only snap_list.infoshishir gowda2013-11-152-0/+53
| | | | | Change-Id: I16b17ca60b5f9b34b7d238d8a3424a3b7a1dc435 Signed-off-by: shishir gowda <sgowda@redhat.com>
* mgmt/glusterd: snapshot config changesshishir gowda2013-11-155-153/+481
| | | | | | | | Also refactored code in glusterd for create command Additionally, removed brick-op func from mgmt_iniate_all_phases Change-Id: Iddcc332009c5716adee7f2b04c93b352fb983446 Signed-off-by: shishir gowda <sgowda@redhat.com>
* CLI : Snapshot List,Integration with glusterdSachin Pandit2013-11-151-10/+20
| | | | | | | | | | | | | | | | | | Change in Naming convention: "snap_details", "snap_count" and so on is replaced by "snap-details", "snap-count" so on. Total snapcount introduced. Separate check is made for repeated Volume Name Ex : "gluster snapshot list vol1 vol2 vol1 vol2" is considered as "gluster snapshot list vol1 vol2" *This is still a work in progress* *have to test CG list once CG Store is ready* Change-Id: I45e2904eb8bdbf78de8665f20ba9605c38320307 Signed-off-by: Sachin Pandit <spandit@redhat.com>
* mgmt/glusterd: Fix glusterd crash due to extra unrefshishir gowda2013-11-151-2/+4
| | | | | | Change-Id: I9d600b4d971b7fdcd54da50e4a069eab19648fa6 Original-author: Rajesh Joseph <rajeshatredhat@redhat.com> Signed-off-by: shishir gowda <sgowda@redhat.com>
* Snapshot: Added new XDR typesRajesh Joseph2013-11-151-13/+26
| | | | | | | Added new XDR types for all the snapshot command. Change-Id: I46c02ea8e9c81c7967a773386c4b77b5eb6d5075 Signed-off-by: Rajesh Joseph <rjoseph@redhat.com>
* mgmt/glusterd: handle snap volume store and actual volume store separatelyRaghavendra Bhat2013-11-154-21/+64
| | | | | Change-Id: I8b88fe94d0f9ee1089cafdda037abcf2f7a180ca Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
* mgmt/glusterd: Snapstore fixes to make info file persistentshishir gowda2013-11-151-6/+11
| | | | | Change-Id: I30cbbeb135c2d0a780e9e414ac0a96739e25647b Signed-off-by: shishir gowda <sgowda@redhat.com>
* mgmt/snapshot: brick op for starting/stopping barriershishir gowda2013-11-155-4/+278
| | | | | Change-Id: Iafbd0ec95de0c41455fb79953fb4bb07721334a5 Signed-off-by: shishir gowda <sgowda@redhat.com>
* mgmt/glusterd: changes to start the brick process of the snap volumeRaghavendra Bhat2013-11-153-10/+54
| | | | | Change-Id: I54db2fa67ebb6b57629f9536c296fbae07a1d159 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
* mgmt/glusterd:store for CGshishir gowda2013-11-154-110/+493
| | | | | Change-Id: I6ec888a5553ad29ded032c02c80dd940b2aae007 Signed-off-by: shishir gowda <sgowda@redhat.com>
* mgmt/glusterd: snapshot create commandRaghavendra Bhat2013-11-1511-98/+1511
| | | | | | | | | | | | | | | | | This is still a work in progress. As of now, these things are done: * Take the snapshot of the backend brick * Create the new volume for the snapshot * Create the brick and the client volfiles * Store the snapshot related info in /var/lib/glusterd * Create the snap object representing the snapshot TODO: Start the brick processes for the snapshot Change-Id: I26fbb0f8e5cf004d4c1dbca51819bab1cd1bac15 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
* mgmt/glusterd: Snapshot list supportRajesh Joseph2013-11-152-19/+1005
| | | | | | | | Handles snapshot list command issued by cli. Details of all the snapshots will be sent back to the caller in required format. Change-Id: I01e512290548007c06e90b40a59cdde048fab954 Signed-off-by: Rajesh Joseph <rjoseph@redhat.com>
* mgmt/glusterd: Store for snapshotshishir gowda2013-11-156-13/+644
| | | | | | | | | | | | | | | | | | | | Introduced a new store for storing snapshot list for a given volume. $GLUSTERD_INSTALL_PATH/vols/<volname>/snap_list.info $GLUSTERD_INSTALL_PATH/vols/<volname>/snaps/ $GLUSTERD_INSTALL_PATH/vols/<volname>/snaps/<snap-name>/info <-snapshot volume info $GLUSTERD_INSTALL_PATH/vols/<volname>/snaps/<snap-name>/bricks <-snapshot volume brick dir $GLUSTERD_INSTALL_PATH/vols/<volname>/snaps/<snap-name>/bricks/<infos> <-snapshot volume brick info files store delete options TODO - $GLUSTERD_INSTALL_PATH/CG/ <-place holder for all cg's .../CG/<cg-name>/info <- per cg information placeholder Change-Id: I1f9fd8ff7cc0682d05b33965736a43dca6adb3e9 Signed-off-by: shishir gowda <sgowda@redhat.com>
* glusterd/locks: Adding multiple volume locks supportsAvra Sengupta2013-11-156-109/+275
| | | | | | | Also linking snap create command to mgmt_v3 Change-Id: If2ed29be072e10d0b0bd271d53e48eeaa6501ed7 Signed-off-by: Avra Sengupta <asengupt@redhat.com>
* glusterd: Defining interfaces to make use of mgmt_v3 rpcsAvra Sengupta2013-11-153-1/+1246
| | | | | | | | | Defining separate interfaces for every phase to make use of the rpcs and providing set of integrated interfaces for commands to consume Change-Id: I6d464326c5a8b5875a7c2539c9df072b23fe61a9 Signed-off-by: Avra Sengupta <asengupt@redhat.com>
* glusterd/mgmt_v3: Initating complete synctask on glusterdAvra Sengupta2013-11-158-218/+1038
| | | | | | | | | | | | glusterd mgmt_v3 is nothing but a complete synctask approach for glusterd to function. The commands making use of this won't be using the op-state machine to inject events and will be using the synctask framework to perform operations across all nodes in the cluster. This patch defines the program and the handlers used. Change-Id: Ibff2c62b0187c40cdea7254c85786297bba60372 Signed-off-by: Avra Sengupta <asengupt@redhat.com>
* cli: snapshot create cli interface.Avra Sengupta2013-11-154-0/+107
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | $ gluster snapshot help snapshot help - display help for snapshot commands snapshot create <volnames> [-n <snap-name/cg-name>] [-d <description>] - Snapshot Create. $ gluster snapshot create vol1 snapshot create: ???: snap created successfully $ gluster snapshot create vol1 vol2 snapshot create: ???: consistency group created successfully (The ??? will be replaced by the glusterd snap create command with the generated snap-name or cg-name) $ gluster snapshot create vol1 vol2 -n CG1 snapshot create: CG1: consistency group created successfully $ gluster snapshot create vol1 -n snap1 -d Description snapshot create: snap1: snap created successfully $ gluster snapshot create vol1 -n snap1 -d "Description can have -d within quotes" snapshot create: snap1: snap created successfully $ gluster snapshot create vol1 -n snap1 -d Description cant have -d without quotes snapshot create: failed: Options(-n/-d) are not valid descriptions Usage: snapshot create <volnames> [-n <snap-name/cg-name>] [-d <description>] $ gluster snapshot create vol1 -n "Multi word snap name" -d Description snapshot create: failed: Invalid snap name Usage: snapshot create <volnames> [-n <snap-name/cg-name>] [-d <description>] $ gluster snapshot create vol1 -d Description -n "-d" snapshot create: failed: Options(-n/-d) are not valid snap names Usage: snapshot create <volnames> [-n <snap-name/cg-name>] [-d <description>] $ gluster snapshot create vol1 -d -n snap1 snapshot create: failed: No description provided Usage: snapshot create <volnames> [-n <snap-name/cg-name>] [-d <description>] Change-Id: I74b5a8406d72282fbb7ba7d07e0c7fe395148d38 Signed-off-by: Avra Sengupta <asengupt@redhat.com>
* glusterd/utils: Get brick mount's device nameshishir gowda2013-11-152-0/+47
| | | | | Change-Id: I03ff9e8094e7e36b28b521380949c7e9044c2e4e Signed-off-by: shishir gowda <sgowda@redhat.com>
* mgmt/glusterd: Introduce snapshot infrastructureshishir gowda2013-11-1518-190/+2329
| | | | | | | | API's for creating, adding, finding, removing snapshots and consistency groups are provided. Change-Id: Ic28da69a075b062aefdf14754c68259ca58bd427 Signed-off-by: shishir gowda <sgowda@redhat.com>
* glusterd: Volume locks and transaction specific opinfosAvra Sengupta2013-11-151-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | With this patch we are replacing the existing cluster-wide lock taken on glusterds across the cluster, with volume locks which are also taken on glusterds across the cluster, but are volume specific. So with the volume locks we are able to perform more than one gluster operation at the same time, as long as the operations are being performed on different volumes. We maintain a global list of volume-locks (using a dict for a list) where the key is the volume name, and which saves the uuid of the originator glusterd. These locks are held and released per volume transaction. In order to acheive multiple gluster operations occuring at the same time, we also separate opinfos in the op-state-machine, as a part of this patch. To do so, we generate a unique transaction-id (uuid) per gluster transaction. An opinfo is then associated with this transaction id, which is used throughout the transaction. We maintain a run-time global list(using a dict) of transaction-ids, and their respective opinfos to achieve this. Change-Id: Iaad505a854bac8de8f83beec0357eb6cde3f7ea8 BUG: 1011470 Signed-off-by: Avra Sengupta <asengupt@redhat.com>
* gNFS: RFE for NFS connection behaviorSantosh Kumar Pradhan2013-11-144-0/+148
| | | | | | | | | | | | | | | Implement reconfigure() for NFS xlator so that volume set/reset wont restart the NFS server process. But few options can not be reconfigured dynamically e.g. nfs.mem-factor, nfs.port etc which needs NFS to be restarted. Change-Id: Ic586fd55b7933c0a3175708d8c41ed0475d74a1c BUG: 1027409 Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com> Reviewed-on: http://review.gluster.org/6236 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Transparent data encryption and metadata authenticationEdward Shishkin2013-11-132-11/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | .. in the systems with non-trusted server This new functionality can be useful in various cloud technologies. It is implemented via a special encryption/crypt translator,which works on the client side and performs encryption and authentication; 1. Class of supported algorithms The crypt translator can support any atomic symmetric block cipher algorithms (which require to pad plain/cipher text before performing encryption/decryption transform (see glossary in atom.c for definitions). In particular, it can support algorithms with the EOF issue (which require to pad the end of file by extra-data). Crypt translator performs translations user -> (offset, size) -> (aligned-offset, padded-size) ->server (and backward), and resolves individual FOPs (write(), truncate(), etc) to read-modify-write sequences. A volume can contain files encrypted by different algorithms of the mentioned class. To change some option value just reconfigure the volume. Currently only one algorithm is supported: AES_XTS. Example of algorithms, which can not be supported by the crypt translator: 1. Asymmetric block cipher algorithms, which inflate data, e.g. RSA; 2. Symmetric block cipher algorithms with inline MACs for data authentication. 2. Implementation notes. a) Atomic algorithms Since any process in a stackable file system manipulates with local data (which can be obsoleted by local data of another process), any atomic cipher algorithm without proper support can lead to non-POSIX behavior. To resolve the "collisions" we introduce locks: before performing FOP->read(), FOP->write(), etc. the process should first lock the file. b) Algorithms with EOF issue Such algorithms require to pad the end of file with some extra-data. Without proper support this will result in losing information about real file size. Keeping a track of real file size is a responsibility of the crypt translator. A special extended attribute with the name "trusted.glusterfs.crypt.att.size" is used for this purpose. All files contained in bricks of encrypted volume do have "padded" sizes. 3. Non-trusted servers and Metadata authentication We assume that server, where user's data is stored on is non-trusted. It means that the server can be subjected to various attacks directed to reveal user's encrypted personal data. We provide protection against such attacks. Every encrypted file has specific private attributes (cipher algorithm id, atom size, etc), which are packed to a string (so-called "format string") and stored as a special extended attribute with the name "trusted.glusterfs.crypt.att.cfmt". We protect the string from tampering. This protection is mandatory, hardcoded and is always on. Without such protection various attacks (based on extending the scope of per-file secret keys) are possible. Our authentication method has been developed in tight collaboration with Red Hat security team and is implemented as "metadata loader of version 1" (see file metadata.c). This method is NIST-compliant and is based on checking 8-byte per-hardlink MACs created(updated) by FOP->create(), FOP->link(), FOP->unlink(), FOP->rename() by the following unique entities: . file (hardlink) name; . verified file's object id (gfid). Every time, before manipulating with a file, we check it's MACs at FOP->open() time. Some FOPs don't require a file to be opened (e.g. FOP->truncate()). In such cases the crypt translator opens the file mandatory. 4. Generating keys Unique per-file keys are derived by NIST-compliant methods from the a) parent key; b) unique verified object-id of the file (gfid); Per-volume master key, provided by user at mount time is in the root of this "tree of keys". Those keys are used to: 1) encrypt/decrypt file data; 2) encrypt/decrypt file metadata; 3) create per-file and per-link MACs for metadata authentication. 5. Instructions Getting started with crypt translator Example: 1) Create a volume "myvol" and enable encryption: # gluster volume create myvol pepelac:/vols/xvol # gluster volume set myvol encryption on 2) Set location (absolute pathname) of your master key: # gluster volume set myvol encryption.master-key /home/me/mykey 3) Set other options to override default options, if needed. Start the volume. 4) On the client side make sure that the file /home/me/mykey exists and contains proper per-volume master key (that is 256-bit AES key). This key has to be in hex form, i.e. should be represented by 64 symbols from the set {'0', ..., '9', 'a', ..., 'f'}. The key should start at the beginning of the file. All symbols at offsets >= 64 are ignored. 5) Mount the volume "myvol" on the client side: # glusterfs --volfile-server=pepelac --volfile-id=myvol /mnt After successful mount the file which contains master key may be removed. NOTE: Keeping the master key between mount sessions is in user's competence. ********************************************************************** WARNING! Losing the master key will make content of all regular files inaccessible. Mount with improper master key allows to access content of directories: file names are not encrypted. ********************************************************************** 6. Options of crypt translator 1) "master-key": specifies location (absolute pathname) of the file which contains per-volume master key. There is no default location for master key. 2) "data-key-size": specifies size of per-file key for data encryption Possible values: . "256" default value . "512" 3) "block-size": specifies atom size. Possible values: . "512" . "1024" . "2048" . "4096" default value; 7. Test cases Any workload, which involves the following file operations: ->create(); ->open(); ->readv(); ->writev(); ->truncate(); ->ftruncate(); ->link(); ->unlink(); ->rename(); ->readdirp(). 8. TODOs: 1) Currently size of IOs issued by crypt translator is restricted by block_size (4K by default). We can use larger IOs to improve performance. Change-Id: I2601fe95c5c4dc5b22308a53d0cbdc071d5e5cee BUG: 1030058 Signed-off-by: Edward Shishkin <edward@redhat.com> Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4667 Tested-by: Gluster Build System <jenkins@build.gluster.com>
* bd: Add support to create clone, snapshot and merge of LV images.M. Mohan Kumar2013-11-134-10/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | Special xattr names "clone" & "snapshot" can be used to create full and linked clone of the LV images. GFID of destination posix file (to be mapped) is passed as a value to the xattr. Destination posix file must exist before running this operation. These operations form a basis for offloading storage related operations from QEMU to GlusterFS. Syntax for full clone: xattr name: "clone" value: "gfid-of-dest-file" Syntax for linked clone: xattr name: "snapshot" value: "gfid-of-dest-file" Syntax for merging: xattr name: "merge" value: "path-to-snapshot-file" Example: setfattr -n clone -v <gfid-of-dest-file> /media/source setfattr -n snapshot -v <gfid-of-dest-file> /media/source setfattr -n merge -v "/media/sn" /media/sn Change-Id: Id9f984a709d4c2e52a64ae75bb12a8ecb01f8776 BUG: 1028672 Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/5626 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* bd: Add aio support to BD xlatorM. Mohan Kumar2013-11-131-0/+4
| | | | | | | | | | | | Volume option bd-aio controls AIO feature for BD xlator. Code taken from posix-aio.c Change-Id: Ib049bd59c9d3f9101d33939838322cfa808de053 BUG: 1028672 Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/5748 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* bd: posix/multi-brick support to BD xlatorM. Mohan Kumar2013-11-1311-1/+367
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current BD xlator (block backend) has a few limitations such as * Creation of directories not supported * Supports only single brick * Does not use extended attributes (and client gfid) like posix xlator * Creation of special files (symbolic links, device nodes etc) not supported Basic limitation of not allowing directory creation is blocking oVirt/VDSM to consume BD xlator as part of Gluster domain since VDSM creates multi-level directories when GlusterFS is used as storage backend for storing VM images. To overcome these limitations a new BD xlator with following improvements is suggested. * New hybrid BD xlator that handles both regular files and block device files * The volume will have both POSIX and BD bricks. Regular files are created on POSIX bricks, block devices are created on the BD brick (VG) * BD xlator leverages exiting POSIX xlator for most POSIX calls and hence sits above the POSIX xlator * Block device file is differentiated from regular file by an extended attribute * The xattr 'user.glusterfs.bd' (BD_XATTR) plays a role in mapping a posix file to Logical Volume (LV). * When a client sends a request to set BD_XATTR on a posix file, a new LV is created and mapped to posix file. So every block device will have a representative file in POSIX brick with 'user.glusterfs.bd' (BD_XATTR) set. * Here after all operations on this file results in LV related operations. For example opening a file that has BD_XATTR set results in opening the LV block device, reading results in reading the corresponding LV block device. When BD xlator gets request to set BD_XATTR via setxattr call, it creates a LV and information about this LV is placed in the xattr of the posix file. xattr "user.glusterfs.bd" used to identify that posix file is mapped to BD. Usage: Server side: [root@host1 ~]# gluster volume create bdvol host1:/storage/vg1_info?vg1 host2:/storage/vg2_info?vg2 It creates a distributed gluster volume 'bdvol' with Volume Group vg1 using posix brick /storage/vg1_info in host1 and Volume Group vg2 using /storage/vg2_info in host2. [root@host1 ~]# gluster volume start bdvol Client side: [root@node ~]# mount -t glusterfs host1:/bdvol /media [root@node ~]# touch /media/posix It creates regular posix file 'posix' in either host1:/vg1 or host2:/vg2 brick [root@node ~]# mkdir /media/image [root@node ~]# touch /media/image/lv1 It also creates regular posix file 'lv1' in either host1:/vg1 or host2:/vg2 brick [root@node ~]# setfattr -n "user.glusterfs.bd" -v "lv" /media/image/lv1 [root@node ~]# Above setxattr results in creating a new LV in corresponding brick's VG and it sets 'user.glusterfs.bd' with value 'lv:<default-extent-size' [root@node ~]# truncate -s5G /media/image/lv1 It results in resizig LV 'lv1'to 5G New BD xlator code is placed in xlators/storage/bd directory. Also add volume-uuid to the VG so that same VG can't be used for other bricks/volumes. After deleting a gluster volume, one has to manually remove the associated tag using vgchange <vg-name> --deltag <trusted.glusterfs.volume-id:<volume-id>> Changes from previous version V5: * Removed support for delayed deleting of LVs Changes from previous version V4: * Consolidated the patches * Removed usage of BD_XATTR_SIZE and consolidated it in BD_XATTR. Changes from previous version V3: * Added support in FUSE to support full/linked clone * Added support to merge snapshots and provide information about origin * bd_map xlator removed * iatt structure used in inode_ctx. iatt is cached and updated during fsync/flush * aio support * Type and capabilities of volume are exported through getxattr Changes from version 2: * Used inode_context for caching BD size and to check if loc/fd is BD or not. * Added GlusterFS server offloaded copy and snapshot through setfattr FOP. As part of this libgfapi is modified. * BD xlator supports stripe * During unlinking if a LV file is already opened, its added to delete list and bd_del_thread tries to delete from this list when a last reference to that file is closed. Changes from previous version: * gfid is used as name of LV * ? is used to specify VG name for creating BD volume in volume create, add-brick. gluster volume create volname host:/path?vg * open-behind issue is fixed * A replicate brick can be added dynamically and LVs from source brick are replicated to destination brick * A distribute brick can be added dynamically and rebalance operation distributes existing LVs/files to the new brick * Thin provisioning support added. * bd_map xlator support retained * setfattr -n user.glusterfs.bd -v "lv" creates a regular LV and setfattr -n user.glusterfs.bd -v "thin" creates thin LV * Capability and backend information added to gluster volume info (and --xml) so that management tools can exploit BD xlator. * tracing support for bd xlator added TODO: * Add support to display snapshots for a given LV * Display posix filename for list-origin instead of gfid Change-Id: I00d32dfbab3b7c806e0841515c86c3aa519332f2 BUG: 1028672 Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/4809 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* bd_map: Remove bd_map xlatorM. Mohan Kumar2013-11-1311-423/+16
| | | | | | | | | | | Remove bd_map xlator and CLI related changes. Change-Id: If7086205df1907127c1a1fa4ba603f1c48421d09 BUG: 1028672 Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/5747 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* features/compress: Compression/DeCompression translatorPrashanth Pai2013-11-112-0/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * When a writev call occurs, the client compresses the data before sending it to server. On the server, compressed data is decompressed. Similarly, when a readv call occurs, the server compresses the data before sending it to client. On the client, the compressed data is decompressed. Thus the amount of data sent over the wire is minimized. * Compression/Decompression is done using Zlib library. * During normal operation, this is the format of data sent over wire : <compressed-data> + trailer(8) The trailer contains the CRC32 checksum and length of original uncompressed data. This is used for validation. HOW TO USE ---------- Turning on compression xlator: gluster volume set <vol_name> compress on Configurable options: gluster volume set <vol_name> compress.compression-level 8 gluster volume set <vol_name> compress.min-size 50 Change-Id: Ib7a66b6f1f70fe002b7c513588cdf75c69370805 BUG: 923540 Original-author : Venky Shankar <vshankar@redhat.com> Signed-off-by: Venky Shankar <vshankar@redhat.com> Signed-off-by: Prashanth Pai <nullpai@gmail.com> Signed-off-by: Prashanth Pai <ppai@redhat.com> Reviewed-on: http://review.gluster.org/3251 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: modify remove-brick CLI message.Ravishankar N ravishankar@redhat.com2013-11-071-5/+4
| | | | | | | | | | | | | | | | | | | | | In the current context "replica_cnt" is used just to know whether the specific key exists or not by calling "dict_get_int32", which we can replace by "dict_get ()". And changing the log message as it is more appropriate to say "migration of data" rather than "rebalance". This patch refactors commit 51c6fa7a354826744de98 against BZ 961669 reviewed on : http://review.gluster.org/5566 Change-Id: I48eae206a28d4083975e64407ed8fe4539f9c24b BUG: 1027270 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Original patch: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/6001 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: susant palai <spalai@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* mgmt/glusterd: add option to specify a different base-portKaleb S. KEITHLEY2013-10-314-117/+127
| | | | | | | | | | | | | | | | | | This is (arguably) a hack to work around a bug in libvirt which is not well behaved wrt to using TCP ports in the unreserved space between 49152-65535. (See RFC 6335) Normally glusterd starts and binds to the first available port in range, usually 49152. libvirt's live migration also tries to use ports in this range, but has no fallback to use (an)other port(s) when the one it wants is already in use. Change-Id: Id8fe35c08b6ce4f268d46804bbb6dddab7a6b7bb BUG: 1018178 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/6210 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Release big-lock after log-rotate handler returnsKrutika Dhananjay2013-10-301-1/+8
| | | | | | | | | | | | ... so that subsequent volume commands don't block waiting forever, for the lock to be released. Change-Id: I24b5ec47f6982900ab74ff1b492d523f31ecfb7f BUG: 1022055 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/6122 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd : Improved quota volume reset commandAnuradha2013-10-282-10/+23
| | | | | | | | | | | | | | | | | | | | | | | Quota volume reset command without "force" option fixed, doesn't fail anymore. It resets unprotected fields and not the protected ones. Also, an appropriate message is provided to the user for the following cases : 1. only unprotected fields are reset, "force" option should be used to reset protected fields. 2. Both protected and unprotected fields are reset. 3. No field was reset, "force" option required. Test case for the same also added. Change-Id: I24e8f1be87b79ccd81bf6f933e00608b861c7a16 BUG: 1022905 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/6135 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* rpcsvc: implement per-client RPC throttlingAnand Avati2013-10-281-0/+12
| | | | | | | | | | | | | | | | Implement a limit on the total number of outstanding RPC requests from a given cient. Once the limit is reached the client socket is removed from POLL-IN event polling. Change-Id: I8071b8c89b78d02e830e6af5a540308199d6bdcd BUG: 1008301 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6114 Reviewed-by: Santosh Pradhan <spradhan@redhat.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Harshavardhana <harsha@harshavardhana.net> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* mgmt/glusterd: Relax extended attribute checks for volume create and add ↵Vijay Bellur2013-10-172-5/+10
| | | | | | | | | | | | | | brick force. Expectation with force is that user is aware of the consequences of sanity checks not being triggered. Change-Id: I79dfeed16a23829a7217cef33ab83f9f0ffae336 Signed-off-by: Vijay Bellur <vbellur@redhat.com> BUG: 1007509 Reviewed-on: http://review.gluster.org/5746 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cli,glusterd: Changes to cli-glusterd communicationKaushal M2013-10-174-10/+200
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Glusterd changes: With this patch, glusterd creates a socket file in DATADIR/run/glusterd.socket , and listen on it for cli requests. It listens for 2 rpc programs on the socket file, - The glusterd cli rpc program, for all cli commands - A reduced glusterd handshake program, just for the 'system:: getspec' command The location of the socket file can be changed with the glusterd option 'glusterd-sockfile'. To retain compatibility with the '--remote-host' cli option, glusterd also listens for the cli requests on port 24007. But, for the sake of security, it listens using a reduced cli rpc program on the port. The reduced rpc program only contains read-only procs used for 'volume (info|list|status)', 'peer status' and 'system:: getwd' cli commands. CLI changes: The gluster cli now uses the glusterd socket file for communicating with glusterd by default. A new option '--gluster-sock' has been added to allow specifying the sockfile used to connect. Using the '--remote-host' option will make cli connect to the given host & port. Tests changes: cluster.rc has been modified to make use of socket files and use different log files for each glusterd. Some of the tests using cluster.rc have been fixed. Change-Id: Iaf24bc22f42f8014a5fa300ce37c7fc9b1b92b53 BUG: 980754 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/5280 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* libglusterfs: Add monotonic clocking counter for timer threadHarshavardhana2013-10-151-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gettimeofday() returns the current wall clock time and timezone. Using these functions in order to measure the passage of time (how long an operation took) therefore seems like a no-brainer. This time suffer's from some limitations: a. They have a low resolution: “High-performance” timing by definition, requires clock resolutions into the microseconds or better. b. They can jump forwards and backwards in time: Computer clocks all tick at slightly different rates, which causes the time to drift. Most systems have NTP enabled which periodically adjusts the system clock to keep them in sync with “actual” time. The adjustment can cause the clock to suddenly jump forward (artificially inflating your timing numbers) or jump backwards (causing your timing calculations to go negative or hugely positive). In such cases timer thread could go into an infinite loop. From 'man gettimeofday': ---------- .. .. The time returned by gettimeofday() is affected by discontinuous jumps in the system time (e.g., if the system administrator manually changes the system time). If you need a monotonically increasing clock, see clock_gettime(2). .. .. ---------- Rationale: For calculating interval timing for Timer thread, all that’s needed should be clock as a simple counter that increments at a stable rate. This is necessary to avoid the jumps which are caused by using "wall time", this counter must be monotonic that can never “tick” backwards, ever. Change-Id: I701d31e71a85a73d21a6c5cd15583e7a5a645eeb BUG: 1017993 Signed-off-by: Harshavardhana <harsha@harshavardhana.net> Reviewed-on: http://review.gluster.org/6070 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: [Feature] Command implementation to get heal-countVenkatesh Somyajulu2013-10-144-25/+178
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently to know the number of files to be healed, either user has to go to backend and check the number of entries present in indices/xattrop directory. But if a volume consists of large number of bricks, going to each backend and counting the number of entries is a time-taking task. Otherwise user can give gluster volume heal vol-name info command but with this approach if no. of entries are very hugh in the indices/ xattrop directory, it will comsume time. So as a feature, new command is implemented. Command 1: gluster volume heal vn statistics heal-count This command will get the number of entries present in every brick of a volume. The output displays only entries count. Command 2: gluster volume heal vn statistics heal-count replica 192.168.122.1:/home/user/brickname Here if we are concerned with just one replica. So providing any one of the brick of a replica will get the number of entries to be healed for that replica only. Example: Replicate volume with replica count 2. Backend status: -------------- [root@dhcp-0-17 xattrop]# ls -lia | wc -l 1918 NOTE: Out of 1918, 2 entries are <xattrop-gfid> dummy entries so actual no. of entries to be healed are 1916. [root@dhcp-0-17 xattrop]# pwd /home/user/2ty/.glusterfs/indices/xattrop Command output: -------------- Gathering count of entries to be healed on volume volume3 has been successful Brick 192.168.122.1:/home/user/22iu Status: Brick is Not connected Entries count is not available Brick 192.168.122.1:/home/user/2ty Number of entries: 1916 Change-Id: I72452f3de50502dc898076ec74d434d9e77fd290 BUG: 1015990 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/6044 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr : Implementation of command "gluster volume heal vn statistics"Venkatesh Somyajulu2013-10-142-1/+85
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "gluster volume heal volumename statistics" command gives the summary of the afr crawl done based on the entries present in the xattrop directory. Whenever afr crawls are attempted, the beginning time of crawl, end time of crawl, no of files healed, heal-failed count and number of files in split brain are shown along with the type of the crawl. If crawl is already in progress then it will give the number of files healed, heal failed count and number of files in split-brain from the beginning of the crawl and instead of telling the end time of the crawl, "CRAWL IN PROGRESS" message will be shown. Output format: command: "gluster volume heal volume-name statistics" Output: Gathering afr crawl statistics crawl statistics on volume volume-name has been successful ------------------------------------------------ Crawl statistics for brick no 0 Hostname of brick 192.168.122.248 Starting time of crawl: Wed Jul 10 15:52:38 2013 Ending time of crawl: Wed Jul 10 15:52:38 2013 Type of crawl: INDEX No. of entries healed: 0 No. of entries in split-brain: 0 No. of heal failed entries: 0 Starting time of crawl: Wed Jul 10 15:52:38 2013 Ending time of crawl: Wed Jul 10 15:52:38 2013 Type of crawl: INDEX No. of entries healed: 0 No. of entries in split-brain: 0 No. of heal failed entries: 0 ------------------------------------------------ Crawl statistics for brick no 1 Hostname of brick 192.168.122.1 Starting time of crawl: Wed Jul 10 15:52:42 2013 Ending time of crawl: Wed Jul 10 15:52:42 2013 Type of crawl: INDEX No. of entries healed: 0 No. of entries in split-brain: 0 No. of heal failed entries: 0 Starting time of crawl: Wed Jul 10 15:52:42 2013 Ending time of crawl: Wed Jul 10 15:52:42 2013 Type of crawl: INDEX No. of entries healed: 0 No. of entries in split-brain: 0 No. of heal failed entries: 0 -------------------------------------------------- Change-Id: I10bf9d10b005741db9973fb1352e0dd59ed99aa9 BUG: 949400 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/4790 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cli,glusterd: Implement 'volume status tasks'Krutika Dhananjay2013-10-084-27/+91
| | | | | | | | | | | | | | | | | | | oVirt's Gluster Integration needs an inexpensive command that can be executed every 10 seconds to monitor async tasks and their parameters, for all volumes. The solution involves adding a 'tasks' sub-command to 'volume status' to fetch only the async task IDs, type and other relevant parameters. Only the originator glusterd participates in this command as all the information needed is available on all the nodes. This is to make the command suitable for being executed every 10 seconds. Change-Id: I1edc607baf29b001a5585079dec681d7c641b3d1 BUG: 1012346 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/6006 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaushal M <kaushal@redhat.com>
* cli: add node uuid in rebalance and remove brick status xml outputBala.FA2013-10-031-1/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds node uuid in rebalance/remove-brick status xml output. Output XML will look like <?xml version="1.0" encoding="UTF-8" standalone="yes"?> <cliOutput> <opRet>0</opRet> <opErrno>0</opErrno> <opErrstr/> <volRebalance> <op>3</op> <nodeCount>1</nodeCount> <node> <nodeName>localhost</nodeName> ==>> <id>883626f8-4d29-4d02-8c5d-c9f48c5b2445</id> <files>0</files> <size>0</size> <lookups>0</lookups> <failures>0</failures> <status>3</status> <statusStr>completed</statusStr> </node> <aggregate> <files>0</files> <size>0</size> <lookups>0</lookups> <failures>0</failures> <status>3</status> <statusStr>completed</statusStr> </aggregate> </volRebalance> </cliOutput> Change-Id: I5a1d4f9043b33b9e88150647a243ddb16154e843 BUG: 1012296 Signed-off-by: Bala.FA <barumuga@redhat.com> Reviewed-on: http://review.gluster.org/6005 Reviewed-by: Kaushal M <kaushal@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glusterd: Validating invalid log-level under geo-rep config options.Ajeet Jha2013-10-032-5/+18
| | | | | | | | | | | Change-Id: I8ff6b48ef41fd6e9ea68c42dfb9878f8a08ed627 BUG: 1010874 Signed-off-by: Ajeet Jha <ajha@redhat.com> Reviewed-on: http://review.gluster.org/5989 Reviewed-by: Avra Sengupta <asengupt@redhat.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* glusterd: Calculate volume op-versions only on set/resetKaushal M2013-09-293-6/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The volume op-versions are calculated during a volume set/reset, reading a volume from disk and importing a volume during probe or volume sync. The calculation of the volume op-version depends on the clusters op-version as some features are enabled automatically depending on the clusters op-version. We also don't store the volume op-versions persistently and don't export the volume op-versions during sync. Due to this, there can occur cases which will lead to inconsistencies in volumes in different peers. One such case is below, Consider, a cluster made up 3 peers P1, P2 and P3, operating at op-version N. The cluster has two volumes V1 and V2, which have volume op-versions N (since volume op-version cannot be greater than cluster op-version). We have, Cluster-op-version = N V1 op-version = N V2 op-version = N A set operation on V1 causes the clusters op-version to be bumped up to N+1. Assume that there exist some features that are automatically enabled on op-version N+1. The op-version of V2 remains at N as no operation has been performed on it. So, Cluster op-version = N+1 V1 op-version = N+1 V2 op-version = N Now, we probe a new peer P4. On the new peer we will have the following op-versions, Cluster op-version = N+1 V1 op-version = N+1 V2 op-version = N+1 This happens because we don't send volume op-versions during the sync after probe. P4 will freshly calculate the op-version of V2 (assuming features have been auto enabled due to the cluster op-version being N+1) as N+1. Another case is when glusterd on a peer restarts. Assume P3 was restarted, glusterd will recalculate the volume op-versions during the restore state. Again, op-version of V2 will be calculated as N+1 assuming auto enabled features. This will lead to inconsistency in the volume representation in memory and on disk, as glusterd will assume the volume contains auto enabled features, but the volfiles don't contain them as they were not regenrated. These kind of issues can be solved by calculating the volume op-version only when features are enabled and disabled (ie. during volume set/reset), persisting the volume-op-versions and exporting/importing them. Change-Id: I52de0668c92628622e85f4588fb28829a7231132 BUG: 1005043 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/5568 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Fix storing volumes on setting global optsKaushal M2013-09-291-3/+6
| | | | | | | | | | | | | | | | | | Glusterd would not store all the volumes when a global options were set. When setting a global option, like 'nfs.*' options, glusterd used to modify the volinfo for all the volumes, but would store only the volinfo for the named volume. This lead to mismatch in the persisted and the in-memory representation of those volumes, which lead to problems like peers being rejected because of volume mismatches. Change-Id: I8bca10585e34b7135cb32af0055dbd462b3fb9b5 BUG: 1012400 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/6007 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Set errstr appropriately on peer op failureKrutika Dhananjay2013-09-291-11/+17
| | | | | | | | | | Change-Id: I27f5f7cd54115d7b236b42f6beaaa05a8b379dd7 BUG: 1010153 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/5978 Reviewed-by: Harshavardhana <harsha@harshavardhana.net> Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* gNFS: Incorrect NFS ACL encoding for XFSSantosh Kumar Pradhan2013-09-291-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | Problem: Incorrect NFS ACL encoding causes "system.posix_acl_default" setxattr failure on bricks on XFS file system. XFS (potentially others?) doesn't understand when the 0x10 prefix is added to the ACL type field for default ACLs (which the Linux NFS client adds) which causes setfacl()->setxattr() to fail silently. NFS client adds NFS_ACL_DEFAULT(0x1000) for default ACL. FIX: Mask the prefix (added by NFS client) OFF, so the setfacl is not rejected when it hits the FS. Original patch by: "Richard Wareing" Change-Id: I17ad27d84f030cdea8396eb667ee031f0d41b396 BUG: 1009210 Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com> Reviewed-on: http://review.gluster.org/5980 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* logging: Remove multiple definitions of DEFAULT_LOG_FILE_DIRECTORYVijay Bellur2013-09-241-1/+0
| | | | | | | | | Change-Id: I8d670a228d3c1282aa7d70b151f166d04abc40e5 BUG: 764890 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/5909 Reviewed-by: Anand Avati <avati@redhat.com> Tested-by: Anand Avati <avati@redhat.com>
* glusterd/cli: Status detail cli parse check and vol geo status crash fixAvra Sengupta2013-09-211-7/+12
| | | | | | | | | | Change-Id: I1841864273fc4242de15fbfcf76fd5de40269f28 BUG: 1006249 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/5889 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Adding transaction checks for cluster unlock.Avra Sengupta2013-09-201-3/+12
| | | | | | | | | | | | | | | | | | | | | | While a gluster command holding lock is in execution, any other gluster command which tries to run will fail to acquire the lock. As a result command#2 will follow the cleanup code flow, which also includes unlocking the held locks. As both the commands are run from the same node, command#2 will end up releasing the locks held by command#1 even before command#1 reaches completion. Now we call the unlock routine in the code path, of the cluster has been locked during the same transaction. Signed-off-by: Avra Sengupta <asengupt@redhat.com> Change-Id: I7b7aa4d4c7e565e982b75b8ed1e550fca528c834 BUG: 1008172 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/5937 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* gNFS: NFS daemon is limiting IOs to 64KBSantosh Kumar Pradhan2013-09-201-0/+18
| | | | | | | | | | | | | | | | | | | | | | | Problem: Gluster NFS server is hard-coding the max rsize/wsize to 64KB which is very less for NFS running over 10GE NIC. The existing options nfs.read-size, nfs.write-size are not working as expected. FIX: Make the options nfs.read-size (for rsize) and nfs.write-size (for wsize) work to tune the NFS I/O size. Value range would be 4KB(Min)-64KB(Default)-1MB(max). NB: Credit to "Richard Wareing" for catching it. Change-Id: I2754ecb0975692304308be8bcf496c713355f1c8 BUG: 1009223 Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com> Reviewed-on: http://review.gluster.org/5964 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>