summaryrefslogtreecommitdiffstats
path: root/xlators/mgmt/glusterd/src/glusterd-volgen.c
Commit message (Collapse)AuthorAgeFilesLines
...
* afr: arbiter xlatorRavishankar N2015-03-191-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | This patch adds the arbiter translator into the tree. This is a server side xlator used for replica 3 volumes. It sits above posix and will be loaded on the 3rd (last) brick of every afr subvolume in a replica 3 configuration. It intercepts inode read/write operations: reads are unwound with ENOTCONN, inode writes are unwound with success without actually passing them down to posix. Metadata operations are allowed to pass through. The CLI for creating a 3 way replica with arbiter is also added but kept disabled (A 'normal' 3 way replica is created instead). This patch is a part of the arbiter logic implementation for 3 way AFR, details of which can be found at http://review.gluster.org/#/c/9656/ Change-Id: I395b81f49d5da52c466daf5c8518f1bbad9c16fa BUG: 1199985 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/9840 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: CLI commands to create and manage tiered volumes.Dan Lambright2015-03-191-22/+159
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A tiered volume is a normal volume with some number of new bricks representing "hot" storage. The "hot" bricks can be attached or detached dynamically to a normal volume. When this happens, a new graph is constructed. The root of the new graph is an instance of the tier translator. One subvolume of the tier translator leads to the old volume, and another leads to the new hot bricks. attach-tier <VOLNAME> [<replica> <COUNT>] <NEW-BRICK> ... [force] volume detach-tier <VOLNAME> [replica <COUNT>] <BRICK> ... <start|stop|status|commit|force> gluster volume rebalance <volume> tier start gluster volume rebalance <volume> tier stop gluster volume rebalance <volume> tier status The "tier start" CLI command starts a server side daemon. The daemon initiates file level migration based on caching policies. The daemon's status can be monitored and stopped. Note development on the "tier status" command is incomplete. It will be added in a subsequent patch. When the "hot" storage is detached, the tier translator is removed from the graph and the tiered volume reverts to its original state as described in the volume's info file. For more background and design see the feature page [1]. [1] http://www.gluster.org/community/documentation/index.php/Features/data-classification Change-Id: Ic8042ce37327b850b9e199236e5be3dae95d2472 BUG: 1194753 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/9753 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* Adding ChangeTimeRecorder(CTR) Xlator to GlusterFSJoseph Fernandes2015-03-191-2/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ********************************************************************** ChangeTimeRecorder(CTR) Xlator | ********************************************************************** ChangeTimeRecorder(CTR) is server side xlator(translator) which sits just above posix xlator. The main role of this xlator is to record the access/write patterns on a file residing the brick. It records the read(only data) and write(data and metadata) times and also count on how many times a file is read or written. This xlator also captures the hard links to a file(as its required by data tiering to move files). CTR Xlator is the consumer of libgfdb. To Enable/Disable CTR Xlator: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ gluster volume set <volume-name> features.ctr-enabled {on/off} To Enable/Disable Frequency Counter Recording in CTR Xlator: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ gluster volume set <volume-name> features.record-counters {on/off} Change-Id: I5d3cf056af61ac8e3f8250321a27cb240a214ac2 BUG: 1194753 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/9935 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* NFS-Ganesha: Volume set option for managing NFS-Ganesha exports.Meghana Madhusudhan2015-03-181-28/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | A dummy translator has been introduced as a place holder for functions related to managing NFS-Ganesha exports. A volume set option is introduced to manage volume level exports. gluster vol set <volname> ganesha.enable ON/OFF 1. gluster volume set <volname> ganesha.enable ON It creates the export config file with a unique export ID. Sends a DBus signal to export this volume dynamically. 2. gluster vol set <volname> ganesha.enable OFF Unexports the specific volume. Deletes the specfic config file related to the volume. This change also removes the handling of the older keys "nfs-ganesha.enable" and "nfs-ganesha.host" Change-Id: I8d4a0b542326a6a0c8e4711600b106274d666587 BUG: 1188184 Signed-off-by: Meghana Madhusudhan <mmadhusu@redhat.com> Reviewed-on: http://review.gluster.org/9585 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* Upcall: New xlator to store various states and send cbk eventsSoumya Koduri2015-03-171-1/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | Framework on the server-side, to handle certain state of the files accessed and send notifications to the clients connected. A generic and extensible framework, used to maintain states in the glusterfsd process for each of the files accessed (including the clients info doing the fops) and send notifications to the respective glusterfs clients incase of any change in that state. This patch handles "Inode Update/Invalidation" upcall event. Feature page: URL: http://www.gluster.org/community/documentation/index.php/Features/Upcall-infrastructure Below link has a writeup which explains the code changes done - URL: https://soumyakoduri.wordpress.com/2015/02/25/glusterfs-understanding-upcall-infrastructure-and-cache-invalidation-support/ Change-Id: Ie3d724be9a3419fcf18901a753e8ec2df2ac802f BUG: 1200262 Signed-off-by: Soumya Koduri <skoduri@redhat.com> Reviewed-on: http://review.gluster.org/9535 Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* Features/trash : Combined patches for trash translatorAnoop C S2015-03-161-1/+33
| | | | | | | | | | | | | | | | | | | | | | | | | This is the combined patch set for supporting trash feature. http://www.gluster.org/community/documentation/index.php/Features/Trash Current patch includes the following features: * volume set options for enabling trash globally and exclusively for internal operations like self-heal and re-balance * volume set options for setting the eliminate path, trash directory path and maximum trashable file size. * test script for checking the functionality of the feature * brief documentation on different aspects of trash feature. Change-Id: Ic7486982dcd6e295d1eba0f4d5ee6d33bf1b4cb3 BUG: 1132465 Signed-off-by: Anoop C S <achiraya@redhat.com> Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com> Reviewed-on: http://review.gluster.org/8312 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* mgmt/glusterd: Changes required for disperse volume heal commandsPranith Kumar K2015-03-101-0/+6
| | | | | | | | | | | | | - Include xattrop64-watchlist for index xlator for disperse volumes. - Change the functions that exist to consider disperse volumes also for sending commands to disperse xls in self-heal-daemon. Change-Id: Iae75a5d3dd5642454a2ebf5840feba35780d8adb BUG: 1177601 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9793 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaushal M <kaushal@redhat.com>
* glusterd: Replace libglusterfs lists with liburcu listsKaushal M2015-03-031-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch replaces usage of the libglusterfs lists data structures and API in glusterd with the lists data structures and API from liburcu. The liburcu data structes and APIs are a drop-in replacement for libglusterfs lists. All usages have been changed to keep the code consistent, and free from confusion. NOTE: glusterd_conf_t->xprt_list still uses the libglusterfs data structures and API, as it holds rpc_transport_t objects, which is not a part of glusterd and is not being changed in this patch. This change was developed on the git branch at [1]. This commit is a combination of the following commits on the development branch. 6dac576 Replace libglusterfs lists with liburcu lists a51b5ab Fix compilation issues d98a06f Fix merge issues a5d918e Remove merge remnant 1cca113 More style cleanup 1917be3 Address review comments on 9624/1 8d10f13 Use cds_lists for glusterd_svc_t 524ad5d Add rculist header in glusterd-conn-helper.c 646f294 glusterd: add list_add_order API honouring rcu [1]: https://github.com/kshlm/glusterfs/tree/urcu Change-Id: Ic613c5b6e496a677b9d3de15fc042a0492109fb0 BUG: 1191030 Signed-off-by: Kaushal M <kaushal@redhat.com> Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/9624 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Gaurav Kumar Garg <ggarg@redhat.com> Reviewed-by: Anand Nekkunti <anekkunt@redhat.com>
* mgmt/glusterd: Refactor brick graph generation in volgen.Vijay Bellur2015-02-261-280/+548
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit does the following: 1. Adds several new functions for generation of brick xlator units in a volgen. Each such function takes care of generation of only one xlator in volgen. 2. A new table, server_graph_table, links all individual graph generation functions together. The order of xlator function generators in the table determines the topology of the brick graph. 3. server_graph_builder() invokes individual graph generators by walking through server_graph_table. Addition of debug xlators into the brick graph is also handled by this walk. As a result, a lot of cruft that is present in the exisiting implementation of this function gets cleaned up. 4. get_server_xlator() now makes use of server_graph_table to determine whether a xlator key corresponds to a server xlator or not. Change-Id: I46bb6e331544150302eb5b33c4007917aff2586d BUG: 1188196 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/9751 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Krishnan Parthasarathi <kparthas@redhat.com>
* glusterd: nfs,shd,quotad,snapd daemons refactoringAtin Mukherjee2015-02-201-252/+86
| | | | | | | | | | | | | This patch ports nfs, shd, quotad & snapd with the approach suggested in http://www.gluster.org/pipermail/gluster-devel/2014-December/043180.html Change-Id: I4ea5b38793f87fc85cc9d2cf873727351dedffd2 BUG: 1191486 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/9428 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Nekkunti <anekkunt@redhat.com>
* glusterd/geo-rep: Allow replace/remove brick if geo-rep is stopped.Kotresh HR2015-02-161-8/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | Replace brick: If geo-replication was configured on a volume, replace brick used to fail. This patch allows replace brick to go through if all geo-rep sessions corresponding to that volume is stopped. Remove brick: There was no check for geo-replication for remove brick. Enforce 'remove brick commit' to fail if geo-rep session corresponding to volume is running. Allow 'remove brick commit' only if all of the geo-rep sessions corresponding to that volume is stopped. Code is re-organized for better readability. Change-Id: I02282c2764d8b81e319489c977847e6e437511a4 BUG: 1179638 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/9402 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Aravinda VK <avishwan@redhat.com> Reviewed-by: ajeet jha <ajha@redhat.com> Reviewed-by: Avra Sengupta <asengupt@redhat.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Krishnan Parthasarathi <kparthas@redhat.com>
* mgmt/glusterd: Implement Volume heal enable/disablePranith Kumar K2015-01-201-160/+270
| | | | | | | | | | | | | | | | | | For volumes with replicate, disperse xlators, self-heal daemon should do healing. This patch provides enable/disable functionality for the xlators to be part of self-heal-daemon. Replicate already had this functionality with 'gluster volume set cluster.self-heal-daemon on/off'. But this patch makes it uniform for both types of volumes. Internally it still does 'volume set' based on the volume type. Change-Id: Ie0f3799b74c2afef9ac658ef3d50dce3e8072b29 BUG: 1177601 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9358 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Krishnan Parthasarathi <kparthas@redhat.com>
* glusterd: Refactor glusterd-utils.cAvra Sengupta2015-01-081-0/+1
| | | | | | | | | | | | | | | | Refactor glusterd-utils.c to create glusterd-snapshot-utils.c consisting of all snapshot utility functions. Change-Id: Id9823a2aec9b115f9c040c9940f288d4fe753d9b BUG: 1176770 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/9391 Reviewed-by: Kaushal M <kaushal@redhat.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Krishnan Parthasarathi <kparthas@redhat.com>
* glusterd/uss: Create rebalance volfile.Avra Sengupta2014-11-301-14/+47
| | | | | | | | | | | | | | | | | | | | Create a new rebalance volfile, which will not contain snap-view client translators, irrespective of the status of USS. This volfile, will be created and regenerated everytime the fuse-volfile is generated, and will be consumed by the rebalance process. Change-Id: I514a8e88d06c0b8fb6949c3a3e6dc4dbe55e38af BUG: 1164711 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/9190 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Krishnan Parthasarathi <kparthas@redhat.com>
* rdma :mount fails for nfs protocol in rdma volumesJiffin Tony Thottan2014-11-191-8/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we mount rdma only volume or tcp,rdma volume using newly peer probed IP's(nfs-server on new nodes) through nfs protocol, mount fails for rdma only volume and mount happens with help of tcp protocol in the case of tcp,rdma volumes. That is for newly added servers will always get transport type as "socket". This is due to nfs_transport_type is exported correctly and imported wrongly. This can be verified by the following , * Create a rdma only volume or tcp,rdma volume * Add a new server into the trusted pool. * Checkout the client transport type specified nfs-server volgraph.It will be always tcp(socket type) instead of rdma. * And also for rdma only volume in the nfs log, we can see 'connection refused' message for every reconnect between nfs server and glusterfsd. BUG: 1157381 Change-Id: I6bd4979e31adfc72af92c1da06a332557b6289e2 Author: Jiffin Tony Thottan <jthottan@redhat.com> Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com> Reviewed-on: http://review.gluster.org/8975 Reviewed-by: Meghana M <mmadhusu@redhat.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Niels de Vos <ndevos@redhat.com>
* rdma: Wrong volfile fetch on fuse mounting tcp,rdma volume via rdmaAnoop C S2014-11-181-11/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As of now for both tcp only volumes and rdma only volumes, volfile names are in the format <volname>-fuse.vol. This patch will change the client volfile namings as shown below. * TCP mounts always use <volname>-fuse.vol * RDMA mounts always use <volname>.rdma-fuse.vol Following the above naming convention, for tcp,rdma volumes both volfiles will be present under /var/lib/glusterd/vols/<volname>/ such that rdma only volume can be mounted as mount -t glusterfs -o transport=rdma <server/ip>:/<volname> <mount-point> OR mount -t glusterfs <server/ip>:/<volname>.rdma <mount-point> The above command format can also be used to fuse mount a tcp,rdma volume via rdma transport. When we try to fuse mount a tcp,rdma volume with transport-type as rdma it silently mounts via tcp. This change will also make sure that it fetches the correct volfile based on the transport-type specified from client side. BUG: 1131502 Change-Id: I34da4b01ac813b69494a43188f51145457412923 Signed-off-by: Anoop C S <achiraya@redhat.com> Reviewed-on: http://review.gluster.org/8498 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* mgmt/glusterd: Validate the options of ussvmallika2014-11-141-1/+4
| | | | | | | | | | | Change-Id: Id13dc4cd3f5246446a9dfeabc9caa52f91477524 BUG: 1111554 Signed-off-by: Varun Shastry <vshastry@redhat.com> Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/8133 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Krishnan Parthasarathi <kparthas@redhat.com>
* glusterd: file-snapshot and features-encryption options should be validate ↵Gaurav Kumar Garg2014-09-251-2/+3
| | | | | | | | | | | | | | | | | | | | | | correctly By giving non-boolean value to volume set command for features.file-snapshot and features.encryption option the command failed after that subsequent volume set request with valid value of the existing any volume set option fail. Previously when user supplies a non-boolean value in volume set command for features.file-snapshot and features.encryption option's then validation of that value was done by volinfo->dict but actual value of that option store in input dictonary. Now with this change it will refer correct dictonary for validation of supplies value. Change-Id: I4a93d8be848cd33fdf4b4eb9b1a8d15ec9d1e66a BUG: 1140162 Reviewed-on: http://review.gluster.org/8688 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Kaushal M <kaushal@redhat.com>
* cli/glusterd: Support of volume get for a specific volume optionAtin Mukherjee2014-08-261-134/+5
| | | | | | | | | | | | | | This patch introduces a cli command to display a specific volume option/all volume options of a specific volume with the following usage: Usage: volume get <VOLNAME> <key|all> Change-Id: Ic88edb33c5509d7a37cd5ade6341e45e3cdbf59d BUG: 983317 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: http://review.gluster.org/8305 Reviewed-by: Kaushal M <kaushal@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glusterd: Coverity fix for going out of scope leaks of directory pointerggarg2014-08-131-0/+2
| | | | | | | | | | | | | | | | In function volgen_apply_filters() directory stream associated with "filterdir" should be close after opening directory stream corresponding to directory name. closedir() also closes the underlying file descriptor associated with "filterdir". Coverity CID: 1124723 Change-Id: I78ed04047ded98bf95d201afed01c727aa506882 BUG: 789278 Reviewed-on: http://review.gluster.org/8088 Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaushal M <kaushal@redhat.com>
* feature/geo-rep: Keep marker.tstamp's mtime unchangeable during snapshot.Kotresh H R2014-08-041-11/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Geo-replicatoin does a full xsync crawl after snapshot restoration of slave and master. It does not do history crawl. Analysis: Marker creates 'marker.tstamp' file when geo-rep is started for the first time. The virtual extended attribute 'trusted.glusterfs.volume-mark' is maintained and whenever it is queried on gluster mount point, marker fills it on the fly and returns the combination of uuid, ctime of marker.tstamp and others. So ctime of marker.tstamp, in other sense 'volume-mark' marks the geo-rep start time when the session is freshly created. From the above, after the first filesystem crawl(xsync) is done during first geo-rep start, stime should always be less than 'volume-mark'. So whenever stime is less than volume-mark, it does full filesystem crawl (xsync). Root Cause: When snapshot is restored, marker.tstamp file is freshly created losing the timestamps, it was originally created with. Solution: 1. Change is made to depend on mtime instead of ctime. 2. mtime and atime of marker.tstamp is restored back when snapshot is created and restored. Change-Id: I4891b112f4aedc50cfae402832c50c5145807d7a BUG: 1125918 Signed-off-by: Kotresh H R <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/8401 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cli/glusterd: Added support for dispersed volumesXavier Hernandez2014-07-111-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Two new options have been added to the 'create' command of the cli interface: disperse [<count>] redundancy <count> Both are optional. A dispersed volume is created by specifying, at least, one of them. If 'disperse' is missing or it's present but '<count>' does not, the number of bricks enumerated in the command line is taken as the disperse count. If 'redundancy' is missing, the lowest optimal value is assumed. A configuration is considered optimal (for most workloads) when the disperse count - redundancy count is a power of 2. If the resulting redundancy is 1, the volume is created normally, but if it's greater than 1, a warning is shown to the user and he/she must answer yes/no to continue volume creation. If there isn't any optimal value for the given number of bricks, a warning is also shown and, if the user accepts, a redundancy of 1 is used. If 'redundancy' is specified and the resulting volume is not optimal, another warning is shown to the user. A distributed-disperse volume can be created using a number of bricks multiple of the disperse count. Change-Id: Iab93efbe78e905cdb91f54f3741599f7ea6645e4 BUG: 1118629 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/7782 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* socket: add certificate-depth and cipher-list options for SSLJeff Darcy2014-07-041-0/+75
| | | | | | | | | | Change-Id: I82757f8461807301a4a4f28c4f5bf7f0ee315113 BUG: 1114604 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/8040 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* rpc/auth: allow SSL identity to be used for authorizationJeff Darcy2014-07-021-0/+10
| | | | | | | | | | | | | | | | | | | Access to a volume is now controlled by the following options, based on whether SSL is enabled or not. * server.ssl-allow: get identity from certificate, no password needed * auth.allow: get identity and matching password from command line It is not possible to allow both simultaneously, since the connection itself is either using SSL or it isn't. Change-Id: I5a5be66520f56778563d62f4b3ab35c66cc41ac0 BUG: 1114604 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/3695 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* gNFS: Make NFS DRC off by defaultNiels de Vos2014-06-091-1/+1
| | | | | | | | | | | | | | | DRC in NFS causes memory bloat and there are known memory corruptions. It would be good to disable drc by default till the feature is stable. Change-Id: I93db6ef5298672c56fb117370bb582a5e5550b17 BUG: 1105524 Original-patch-by: Santosh Kumar Pradhan <spradhan@redhat.com> Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/8004 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Santosh Pradhan <spradhan@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Changes to provide interface for USSVarun Shastry2014-06-031-128/+328
| | | | | | | | | | | | | | | The changes which consists of the translators for the USS (User Servicable Snapshots) is submitted as a separate patch. Current patch provides the CLI access to the feature. Change-Id: I6b98a42fcfa82f0870d8048fe0bb53141565e9c6 BUG: 1094815 Signed-off-by: Varun Shastry <vshastry@redhat.com> Reviewed-on: http://review.gluster.org/7705 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Krishnan Parthasarathi <kparthas@redhat.com>
* glusterd: Disable ping-timer between glusterd and brick processVijaikumar M2014-05-191-0/+3
| | | | | | | | | | | | | | | | | | | | | | When there are too many IO happening, brick process epoll thread will be busy and fails to respond to the glusterd pick packet within 30sec. Also epoll thread can be blocked by a big-lock. Solution is to disable ping-timer by default and only enable where ever required Later when the epoll thread model changed and made lighter, we need to revert back this change. http://review.gluster.com/3842 is one such approach. Change-Id: I7f80ad3eb00f7d9c4d4527305932f7cf4920e73f BUG: 1097224 Signed-off-by: Vijaikumar M <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/7753 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* rpc: implement server.manage-gids for group resolving on the bricksNiels de Vos2014-05-091-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | The new volume option 'server.manage-gids' can be enabled in environments where a user belongs to more than the current absolute maximum of 93 groups. This option triggers the following behavior: 1. The AUTH_GLUSTERFS structure sent by GlusterFS clients (fuse, nfs or libgfapi) will contain only one (1) auxiliary group, instead of a full list. This reduces network usage and prevents problems in encoding the AUTH_GLUSTERFS structure which should fit in 400 bytes. 2. The single group in the RPC Calls received by the server is replaced by resolving the groups server-side. Permission checks and similar in lower xlators are applied against the full list of groups where the user belongs to, and not the single auxiliary group that the client sent. Change-Id: I9e540de13e3022f8b63ff893ecba511129a47b91 BUG: 1053579 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/7501 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Santosh Pradhan <spradhan@redhat.com> Reviewed-by: Harshavardhana <harsha@harshavardhana.net> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd : Volname, brickpath & volfpath length validationAtin Mukherjee2014-05-031-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | While creating a volume and adding a brick validation for _POSIX_PATH_MAX is done on absolute pathname instead of relative pathname due to which a brickpath having less than _POSIX_PATH_MAX may also fail the validation if the directory length is greater than (_POSIX_PATH_MAX -strlen(brickpath/volume name). Also this fix addresses one cli response message correction which says the volume file is too long instead of brick path is too long (when brickpath length validation doesn't fail and vol file length validation fails.) It is also important to note that with the current design of volfile naming, it can not be guranteed that volname and brickpath can have max of _POSIX_PATH_MAX characters. Change-Id: I1283d1f9dea96ae797620002c8723719f26a866d BUG: 1085330 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: http://review.gluster.org/7420 Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cli/hooks : Add volume set options to enable/disable nfs-ganesha support.Meghana M2014-05-031-0/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. gluster volume set nfs-ganesha.enable ON/OFF If the option is set to ON, the volume field in the nfs-ganesha configuartion file is edited. Gluster-nfs is disabled on that volume and the volume is exported using nfs-ganesha. 2.gluster volume set nfs-ganesha.host IP This is used to provide the IP of the nfs-ganesha host. Note : nfs-ganesha.host MUST be set before using nfs-ganesha.enable ON The switch from gluster-nfs to nfs-ganesha is mostly done by the hook-scripts in the post phase of the 'set' option. As a result, gluster volume reset does not function as it is expected to. By default, nfs-ganesha will be set to off but the process will not be killed. Hence, a few changes have to be made post 'reset' option as well. Those changes also have been added. Change-Id: I7fdc14ee49d1724af96eda33c6a3ec08b1020788 BUG: 1092283 Signed-off-by: Meghana <mmadhusu@redhat.com> Reviewed-on: http://review.gluster.org/7321 Reviewed-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-by: Santosh Pradhan <spradhan@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd/snapshot: Move read-only xlator to client graphRajesh Joseph2014-05-021-2/+16
| | | | | | | | | | | | | | | read-only xlator is moved from server graph to client graph so that AFR & DHT healing can take place at server Change-Id: I140ec962330c59d3b44f9bc8084a1544a1fd6c54 BUG: 1061685 Signed-off-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-on: http://review.gluster.org/7582 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Sachin Pandit <spandit@redhat.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* logging: Introduce suppression of repetitive log messagesKrutika Dhananjay2014-04-301-1/+75
| | | | | | | | | Change-Id: I8efa08cc9832ad509fba65a88bb0cddbaf056404 BUG: 1075611 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/7475 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cli: Add a cli command to enable/disable barrierKaushal M2014-04-291-3/+17
| | | | | | | | | | | | | | | This patch adds a new 'gluster volume barrier <VOLNAME> {enable|disable}' cli command. This helps in testing the brick op code path when testing the barrier xlator. This patch can be reverted later if not required for end users. Change-Id: Icd86a2d13e7f276dda1ecbb2593d60638ece7dcd BUG: 1060002 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/6958 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd/snapshot: Perform missed snap deletes and restores.Avra Sengupta2014-04-281-125/+0
| | | | | | | | | | | | | | | | | Replacing is_volume_restored(gf_boolean_t) with restored_from_snap(uuid_t) in glusterd_volinfo_ Also removed gd_restore_snap_volume from glusterd-volgen.c to glusterd-snapshot.c Change-Id: Ic615a1658cfaffa98d4590506ac82f20bf709ad6 BUG: 1089906 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/7455 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Barrier: Barrier translator options configurationAtin Mukherjee2014-04-271-0/+4
| | | | | | | | | | | barrier enable/disable, barrier-timeout configuration in barrier translator. Change-Id: I7cbf9cd4f5e55d42dcc6b7cd6827234566c7b6f3 BUG: 1060002 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: http://review.gluster.org/7177 Reviewed-by: Kaushal M <kaushal@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* build: MacOSX Porting fixesHarshavardhana2014-04-241-10/+10
| | | | | | | | | | | | | | | | | | | | | git@forge.gluster.org:~schafdog/glusterfs-core/osx-glusterfs Working functionality on MacOSX - GlusterD (management daemon) - GlusterCLI (management cli) - GlusterFS FUSE (using OSXFUSE) - GlusterNFS (without NLM - issues with rpc.statd) Change-Id: I20193d3f8904388e47344e523b3787dbeab044ac BUG: 1089172 Signed-off-by: Harshavardhana <harsha@harshavardhana.net> Signed-off-by: Dennis Schafroth <dennis@schafroth.com> Tested-by: Harshavardhana <harsha@harshavardhana.net> Tested-by: Dennis Schafroth <dennis@schafroth.com> Reviewed-on: http://review.gluster.org/7503 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: pass the right argument for perf subgraphKrishnan Parthasarathi2014-04-211-1/+1
| | | | | | | | Change-Id: Ic292dcd8e477066c1079f0f1e170f5153459b029 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/7514 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* logging: Make logger and log format configurable through cliKrutika Dhananjay2014-04-111-6/+81
| | | | | | | | | | Change-Id: Ic4b701a6621578848ff67ae4ecb5a10b5f32f93b BUG: 1075611 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/7372 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* gluster: GlusterFS Volume Snapshot FeatureAvra Sengupta2014-04-111-1/+131
| | | | | | | | | | | | | | | | | | | | | | | | | This is the initial patch for the Snapshot feature. Current patch includes following features: * Snapshot create * Snapshot delete * Snapshot restore * Snapshot list * Snapshot info * Snapshot status * Snapshot config Change-Id: I2f46920c0d61c515f6a60e0f8b46fff886d9f6a9 BUG: 1061685 Signed-off-by: shishir gowda <sgowda@redhat.com> Signed-off-by: Sachin Pandit <spandit@redhat.com> Signed-off-by: Vijaikumar M <vmallika@redhat.com> Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Signed-off-by: Rajesh Joseph <rjoseph@redhat.com> Signed-off-by: Joseph Fernandes <josferna@redhat.com> Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/7128 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: persistent client xlator/ afr changelog namesRavishankar N2014-03-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | -Add a unique brick-id field to glusterd_brickinfo_t -Persist the id to the brickinfo file -Use the brick-id as the client xlator name during vol create, add-brick and replace-brick operations. -For older volumes,generate the id in-memory during glusterd restore but defer writing it to the brickinfo file until the next volume set operation. -send and receive the brick-ids during peer probe. Feature page: www.gluster.org/community/documentation/index.php/Features/persistent-AFR-changelog-xattributes Related patch: http://review.gluster.org/#/c/7122 Change-Id: Ib7f1570004e33f4144476410eec2b84df4e41448 BUG: 1066778 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/7155 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Kaushal M <kaushal@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* feature/compress: Validate option and enable docPrashanth Pai2014-02-261-5/+10
| | | | | | | | | | | | | | | | | * Validate network.compression option * Enable descriptions of xlator configurable options * Improve indentation in code * Make network.compression.mode not configurable by user. This is similar to "iam-self-heal-daemon" option in AFR xlator. Fixes BUGs: 1065658, 1065640, 1065655 Change-Id: I99d82b574ee0e5c8c2baf5f5d52dbf8d015d330a BUG: 1065640 Signed-off-by: Prashanth Pai <ppai@redhat.com> Reviewed-on: http://review.gluster.org/7024 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* add build-gfid option to enable pgfid tracking ...Krishnan Parthasarathi2014-02-141-3/+13
| | | | | | | | | | | .. for inode to pathname mapping Change-Id: I0486d85b02e86d739fc1d8ea16d118fb666abf60 BUG: 1064863 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/6989 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* protocol/server: do not do root-squashing for trusted clientsRaghavendra Bhat2014-02-101-1/+90
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * As of now clients mounting within the storage pool using that machine's ip/hostname are trusted clients (i.e clients local to the glusterd). * Be careful when the request itself comes in as nfsnobody (ex: posix tests). So move the squashing part to protocol/server when it creates a new frame for the request, instead of auth part of rpc layer. * For nfs servers do root-squashing without checking if it is trusted client, as all the nfs servers would be running within the storage pool, hence will be trusted clients for the bricks. * Provide one more option for mounting which actually says root-squash should/should not happen. This value is given priority only for the trusted clients. For non trusted clients, the volume option takes the priority. But for trusted clients if root-squash should not happen, then they have to be mounted with root-squash=no option. (This is done because by default blocking root-squashing for the trusted clients will cause problems for smb and UFO clients for which the requests have to be squashed if the option is enabled). * For geo-replication and defrag clients do not do root-squashing. * Introduce a new option in open-behind for doing read after successful open. Change-Id: I8a8359840313dffc34824f3ea80a9c48375067f0 BUG: 954057 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/4863 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* mgmt/glusterd: Fix double free.Poornima2014-02-051-3/+2
| | | | | | | | | Change-Id: I7a8b7772849715b019c86c6c768f33c1d9dcb27c BUG: 789278 Signed-off-by: Poornima <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/6881 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/compress: rename "compress" option to "network.compression"Niels de Vos2014-01-241-4/+4
| | | | | | | | | | | | | | | | | | Prevent mistaking the "compress" options for storage (at rest) compression. The cdc-xlator is implemented to support compressing of network traffic (READ and WRITE FOPs). URL: http://www.gluster.org/community/documentation/index.php/Features/On-Wire_Compression_+_Decompression Change-Id: I9fedf4106dcb226d135ab92e4b533aff284881d7 BUG: 1053670 CC: Venky Shankar <vshankar@redhat.com> CC: Prashanth Pai <ppai@redhat.com> Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/6765 Reviewed-by: Prashanth Pai <ppai@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: make volinfo a refcnt'ed object.Krishnan Parthasarathi2013-12-201-1/+1
| | | | | | | | | | | | | Add glusterd_volinfo_remove(..) which removes @volinfo from the list of volumes in the cluster and performs an unref on @volinfo Change-Id: I5f546ca58f61bc334ab1bab4c51c4a21e1f66161 BUG: 1038051 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/6521 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaushal M <kaushal@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cli/glusterd: Changes to quota command Quota featureRaghavendra G2013-11-261-66/+196
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | re-work. Following are the cli commands that are new/re-worked: ====================================================== volume quota <VOLNAME> {enable|disable|list [<path> ...]|remove <path>| default-soft-limit <percent>} | volume quota <VOLNAME> {limit-usage <path> <size> [<percent>]} | volume quota <VOLNAME> {alert-time|soft-timeout|hard-timeout} {<time>} volume status [all | <VOLNAME> [nfs|shd|<BRICK>|quotad]] [detail|clients|mem|inode|fd|callpool] volume statedump <VOLNAME> [nfs|quotad] [all|mem|iobuf|callpool|priv|fd|inode|history] glusterd changes: ================= * Quota limits are now set as extended attributes by glusterd from the aux mount created by the cli. * The gfids of the directories on which quota limits are set for a given volume are stored in /var/lib/glusterd/vols/<volname>/quota.conf file in binary format, and whose cksum and version is stored in /var/lib/glusterd/vols/<volname>/quota.cksum. Original-author: Krutika Dhananjay <kdhananj@redhat.com> Original-author: Krishnan Parthasarathi <kparthas@redhat.com> BUG: 969461 Change-Id: If32bba36c67f9c2a30417af9c6389045b2b7c13b Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/6003 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* posix: placeholders for GFID to path conversionRaghavendra G2013-11-261-19/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | what? ===== The following is an attempt to generate the paths of a file when only its gfid is known. To find the path of a directory, the symlink handle to the directory maintained in the ".glusterfs" backend directory is read. The symlink handle is generated using the gfid of the directory. It (handle) contains the directory's name and parent gfid, which are used to recursively construct the absolute path as seen by the user from the mount point. A similar approach cannot be used for a regular file or a symbolic link since its hardlink handle, generated using its gfid, doesn't contain its parent gfid and basename. So xattrs are set to store the parent gfids and the number of hardlinks to a file or a symlink having the same parent gfid. When an user/application requests for the paths of a regular file or a symlink with multiple hardlinks, using the parent gfids stored in the xattrs, the paths of the parent directories are generated as mentioned earlier. The base names of the hardlinks (with the same parent gfid) are determined by matching the actual backend inode numbers of each entry in the parent directory with that of the hardlink handle. Xattr is set on a regular file, link, and symbolic link as follows, Xattr name : trusted.pgfid.<pargfidstr> Xattr value : <number of hardlinks to a regular file/symlink with the same parentgfid> If a regular file, hard link, symbolic link is created then an xattr in the above format is set in the backend. how to use? =========== This functionality can be used through getxattr interface. Two keys - glusterfs.ancestry.dentry and glusterfs.ancestry.path - enable usage of this functionality. A successful getxattr will have the result stored under same keys. Values will be, glusterfs.ancestry.dentry: -------------------------- A linked list of gf-dirent structures for all possible paths from root to this gfid. If there are multiple paths, the linked-list will be a series of paths one after another. Each path will be a series of dentries representing all components of the path. This key is primarily for internal usage within glusterfs. glusterfs.ancestry.path: ------------------------ A string containing all possible paths from root to this gfid. Multiple hardlinks of a file or a symlink are displayed as a colon seperated list (this could interfere with path components containing ':'). e.g. If there is a file "file1" in root directory with two hardlinks, "/dir2/link2tofile1" and "/dir1/link1tofile1", then [root@alpha gfsmntpt]# getfattr -n glusterfs.ancestry.path -e text file1 glusterfs.ancestry.path="/file1:/dir2/link2tofile1:/dir1/link1tofile1" Thanks Amar, Avati and Venky for the inputs. Original Author: Ramana Raja <rraja@redhat.com> BUG: 990028 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Change-Id: I0eaa9101e333e0c1f66ccefd9e95944dd4a27497 Reviewed-on: http://review.gluster.org/5951 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* gNFS: RFE for NFS connection behaviorSantosh Kumar Pradhan2013-11-141-0/+48
| | | | | | | | | | | | | | | Implement reconfigure() for NFS xlator so that volume set/reset wont restart the NFS server process. But few options can not be reconfigured dynamically e.g. nfs.mem-factor, nfs.port etc which needs NFS to be restarted. Change-Id: Ic586fd55b7933c0a3175708d8c41ed0475d74a1c BUG: 1027409 Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com> Reviewed-on: http://review.gluster.org/6236 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Transparent data encryption and metadata authenticationEdward Shishkin2013-11-131-11/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | .. in the systems with non-trusted server This new functionality can be useful in various cloud technologies. It is implemented via a special encryption/crypt translator,which works on the client side and performs encryption and authentication; 1. Class of supported algorithms The crypt translator can support any atomic symmetric block cipher algorithms (which require to pad plain/cipher text before performing encryption/decryption transform (see glossary in atom.c for definitions). In particular, it can support algorithms with the EOF issue (which require to pad the end of file by extra-data). Crypt translator performs translations user -> (offset, size) -> (aligned-offset, padded-size) ->server (and backward), and resolves individual FOPs (write(), truncate(), etc) to read-modify-write sequences. A volume can contain files encrypted by different algorithms of the mentioned class. To change some option value just reconfigure the volume. Currently only one algorithm is supported: AES_XTS. Example of algorithms, which can not be supported by the crypt translator: 1. Asymmetric block cipher algorithms, which inflate data, e.g. RSA; 2. Symmetric block cipher algorithms with inline MACs for data authentication. 2. Implementation notes. a) Atomic algorithms Since any process in a stackable file system manipulates with local data (which can be obsoleted by local data of another process), any atomic cipher algorithm without proper support can lead to non-POSIX behavior. To resolve the "collisions" we introduce locks: before performing FOP->read(), FOP->write(), etc. the process should first lock the file. b) Algorithms with EOF issue Such algorithms require to pad the end of file with some extra-data. Without proper support this will result in losing information about real file size. Keeping a track of real file size is a responsibility of the crypt translator. A special extended attribute with the name "trusted.glusterfs.crypt.att.size" is used for this purpose. All files contained in bricks of encrypted volume do have "padded" sizes. 3. Non-trusted servers and Metadata authentication We assume that server, where user's data is stored on is non-trusted. It means that the server can be subjected to various attacks directed to reveal user's encrypted personal data. We provide protection against such attacks. Every encrypted file has specific private attributes (cipher algorithm id, atom size, etc), which are packed to a string (so-called "format string") and stored as a special extended attribute with the name "trusted.glusterfs.crypt.att.cfmt". We protect the string from tampering. This protection is mandatory, hardcoded and is always on. Without such protection various attacks (based on extending the scope of per-file secret keys) are possible. Our authentication method has been developed in tight collaboration with Red Hat security team and is implemented as "metadata loader of version 1" (see file metadata.c). This method is NIST-compliant and is based on checking 8-byte per-hardlink MACs created(updated) by FOP->create(), FOP->link(), FOP->unlink(), FOP->rename() by the following unique entities: . file (hardlink) name; . verified file's object id (gfid). Every time, before manipulating with a file, we check it's MACs at FOP->open() time. Some FOPs don't require a file to be opened (e.g. FOP->truncate()). In such cases the crypt translator opens the file mandatory. 4. Generating keys Unique per-file keys are derived by NIST-compliant methods from the a) parent key; b) unique verified object-id of the file (gfid); Per-volume master key, provided by user at mount time is in the root of this "tree of keys". Those keys are used to: 1) encrypt/decrypt file data; 2) encrypt/decrypt file metadata; 3) create per-file and per-link MACs for metadata authentication. 5. Instructions Getting started with crypt translator Example: 1) Create a volume "myvol" and enable encryption: # gluster volume create myvol pepelac:/vols/xvol # gluster volume set myvol encryption on 2) Set location (absolute pathname) of your master key: # gluster volume set myvol encryption.master-key /home/me/mykey 3) Set other options to override default options, if needed. Start the volume. 4) On the client side make sure that the file /home/me/mykey exists and contains proper per-volume master key (that is 256-bit AES key). This key has to be in hex form, i.e. should be represented by 64 symbols from the set {'0', ..., '9', 'a', ..., 'f'}. The key should start at the beginning of the file. All symbols at offsets >= 64 are ignored. 5) Mount the volume "myvol" on the client side: # glusterfs --volfile-server=pepelac --volfile-id=myvol /mnt After successful mount the file which contains master key may be removed. NOTE: Keeping the master key between mount sessions is in user's competence. ********************************************************************** WARNING! Losing the master key will make content of all regular files inaccessible. Mount with improper master key allows to access content of directories: file names are not encrypted. ********************************************************************** 6. Options of crypt translator 1) "master-key": specifies location (absolute pathname) of the file which contains per-volume master key. There is no default location for master key. 2) "data-key-size": specifies size of per-file key for data encryption Possible values: . "256" default value . "512" 3) "block-size": specifies atom size. Possible values: . "512" . "1024" . "2048" . "4096" default value; 7. Test cases Any workload, which involves the following file operations: ->create(); ->open(); ->readv(); ->writev(); ->truncate(); ->ftruncate(); ->link(); ->unlink(); ->rename(); ->readdirp(). 8. TODOs: 1) Currently size of IOs issued by crypt translator is restricted by block_size (4K by default). We can use larger IOs to improve performance. Change-Id: I2601fe95c5c4dc5b22308a53d0cbdc071d5e5cee BUG: 1030058 Signed-off-by: Edward Shishkin <edward@redhat.com> Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4667 Tested-by: Gluster Build System <jenkins@build.gluster.com>