summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* syncops: expose @flags in syncop_rmdir()Anand Avati2013-11-214-5/+5
| | | | | | | | | Change-Id: I9b73c1db728e4cb3948fc118cceb292b21d48b96 BUG: 1021686 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6112 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cli: fix possible memory leaksBala.FA2013-11-212-1/+6
| | | | | | | | | BUG: 955548 Change-Id: Iae410712e7e6d7a76cd537c77f1919e3b4cdf6bb Signed-off-by: Bala.FA <barumuga@redhat.com> Reviewed-on: http://review.gluster.org/6328 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* Add description for git based installations.Vijay Bellur2013-11-211-2/+4
| | | | | | | | Change-Id: I60e445539f255b3220f885bd790f800e4c1ea55a Signed-off-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/6333 Reviewed-by: Lalatendu Mohanty <lmohanty@redhat.com> Tested-by: Lalatendu Mohanty <lmohanty@redhat.com>
* libglusterfs: use correct check for linux falloc.h availabilityBrian Foster2013-11-201-2/+11
| | | | | | | | | | | | We should check for HAVE_LINUX_FALLOC_H rather than HAVE_FALLOC_H to determine whether to include linux/falloc.h. Change-Id: I05eca4de2893a88d6b9cc5ebfce738708b9960d4 BUG: 1032378 Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/6314 Reviewed-by: Anand Avati <avati@redhat.com> Tested-by: Anand Avati <avati@redhat.com>
* Build storage/posix xlator if fallocate() does not existsEmmanuel Dreyfus2013-11-201-1/+12
| | | | | | | | | | | If fallocate() does not exists, just return EOPNOTSUPP BUG: 764655 Change-Id: I808114f733c88985519dc47fb7537e1ced1db077 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/6289 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* bd: Add Zerofill FOP supportM. Mohan Kumar2013-11-205-3/+295
| | | | | | | | | | BUG: 1028673 Change-Id: I9ba8e3e6cf2f888640b4d2a2eb934a27ff903c42 Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/6290 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* mount/fuse: handle --gid-timeout=0 properlyAnand Avati2013-11-202-2/+5
| | | | | | | | | | | | | | Fix the bug which was using the timeout value as a flag to indicate if it was set (and hence would fail when timeout=0 would evaluate as False) Change-Id: Ie9a8f28d35603458cdac26c9a4e0343e7eda7344 BUG: 1032438 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6308 Reviewed-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-by: Harshavardhana <harsha@harshavardhana.net> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* gNFS: Coverity fix for CID 1128906Santosh Kumar Pradhan2013-11-201-11/+13
| | | | | | | | | | | | | Fix the Coverity issue introduced in RFE: NFS volume set/reset commit i.e. http://review.gluster.org/6236 Change-Id: I817b9da03a3ce7f5511303faea0c50dfdad60ff4 BUG: 1027409 Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com> Reviewed-on: http://review.gluster.org/6307 Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Start rebalance only where requiredKaushal M2013-11-203-1/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | Gluster was starting rebalance processes on peers where it wasn't required in two cases. - For a normal rebalance command on a volume, rebalance processes were started on all peers instead of just the peers which contain bricks of the volume - For rebalance process being restarted by a volume sync, caused by a new peer being probed or a peer restarting, rebalance processes were started on all peers, for both a normal rebalance and for remove-brick needing rebalance. This patch adds a new check before starting rebalance process in the above two cases. - For rebalance process required by a rebalance command, each peer will check if it contains atleast one brick of the volume - For rebalance process required by a remove-brick command, each peer will check if it contains atleast one of the bricks being removed Change-Id: I512da16994f0d5482889c3a009c46dc20a8a15bb BUG: 1031887 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/6301 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cli: List only nodes which have rebalance started in rebalance statusKaushal M2013-11-202-214/+137
| | | | | | | | | | | | | | | | Listing the nodes on which rebalance hasn't been started is just giving out extraneous information. Also, refactor the rebalance status printing code into a single function and use it for both rebalance and remove-brick status. BUG: 1031887 Change-Id: I47bd561347dfd6ef76c52a1587916d6a71eac369 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/6300 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* mgmt/glusterd: fix undefined sybmol error related to BDPranith Kumar K2013-11-193-1/+5
| | | | | | | | | | Change-Id: I2210f1ac7de04c6025c0ec02d998b626d41466ae BUG: 1028672 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/6303 Reviewed-by: M. Mohan Kumar <mohan@in.ibm.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Do not build fallocate FUSE FOP if the system call does not existEmmanuel Dreyfus2013-11-191-0/+4
| | | | | | | | | BUG: 764655 Change-Id: Ica310e75bee16741b837e658981238c1b99c254f Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/6288 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Python build flag detectionEmmanuel Dreyfus2013-11-191-0/+15
| | | | | | | | | | | Ask python-config for proper python build flags BUG: 764655 Change-Id: I7aede0f93637c61dbafc43580bff46af60f0f0d3 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/6283 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* NetBSD missing backtrace(3) portability fixEmmanuel Dreyfus2013-11-193-1/+60
| | | | | | | | | | | | Implement backtrace(3) and backtrace_symbols(3) which do not exist in NetBSD While there, remove duplicate #include <stdio.h> BUG: 764655 Change-Id: Iccd695765906e085c3f8fcb670506d4fea68fa39 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/6285 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Search gettext() in -lintlEmmanuel Dreyfus2013-11-191-0/+1
| | | | | | | | | | | | If gettext() is not found in libc, look it up in libintl (this is where NetBSD has it) BUG: 764655 Change-Id: Ifba8681b8603ead5d0b8587b71457250982077e1 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/6287 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* autogen.sh portability fixesEmmanuel Dreyfus2013-11-191-2/+11
| | | | | | | | | | | | - Do not assume tar has --version, as BSD tar does not - Allow specifying python binary through PYTHONBIN in case it is e.g. python2.7 BUG: 764655 Change-Id: I71f0f4830e10915782775de811c92db8e6ab4c55 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/6281 Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* Fix xml compilation errorM. Mohan Kumar2013-11-191-1/+6
| | | | | | | | | | | | | | | Compiling GlusterFS without xml package results in following build error cli-rpc-ops.o: In function `gf_cli_status_cbk': /home/mohan/Work/glusterfs/cli/src/cli-rpc-ops.c:6430: undefined reference to `cli_xml_output_vol_status_tasks_detail' Change-Id: I49b3c46ac3340c40e372bef4690cedb41df20e8a Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/6295 Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* BD fixes for coverity scanM. Mohan Kumar2013-11-191-5/+9
| | | | | | | | | BUG: 1028672 Change-Id: I2e7889fb113cedd2d5928b210149d3fd7b8b22ab Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/6292 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Fixes for ZF reported by coverityM. Mohan Kumar2013-11-193-3/+11
| | | | | | | | | BUG: 1028673 Change-Id: I7c75738cca22c81c5629d579ef5bea24000e622e Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/6291 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Have #include <signal.h> for kill(2)Emmanuel Dreyfus2013-11-182-0/+2
| | | | | | | | | BUG: 764655 Change-Id: I4d18c9a6c00cb4696645fcb437398562f00b9d24 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/6284 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* NetBSD missing loff_t portability fixEmmanuel Dreyfus2013-11-171-0/+4
| | | | | | | | | | | define loff_t as off_t, is is already long long anyway. BUG: 764655 Change-Id: I99edda9b804475a8696c2d32ccf8eae152851e21 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/6286 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cli: add peerid to volume status xml outputBala.FA2013-11-143-0/+46
| | | | | | | | | | | | This patch adds <peerid> tag to bricks and nfs/shd like services to volume status xml output. BUG: 955548 Change-Id: I9aaa9266e4d56f632235eaeef565e92d757c0694 Signed-off-by: Bala.FA <barumuga@redhat.com> Reviewed-on: http://review.gluster.org/6162 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaushal M <kaushal@redhat.com>
* zerofill: Change the type of len argument of glfs_zerofill() to off_tBharata B Rao2013-11-1424-36/+36
| | | | | | | | | | | | | | glfs_zerofill() can be potentially called to zero-out entire file and hence allow for bigger value of length parameter. Change-Id: I75f1d11af298915049a3f3a7cb3890a2d72fca63 BUG: 1028673 Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-on: http://review.gluster.org/6266 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: M. Mohan Kumar <mohan@in.ibm.com> Tested-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-by: Anand Avati <avati@redhat.com>
* fuse: Check the return status from state->resolve_nowVijaykumar M2013-11-142-7/+49
| | | | | | | | | Change-Id: I85fc6dd393449d365bb908b38c2827b58cb08171 BUG: 1030208 Signed-off-by: Vijaykumar M <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/6262 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* gNFS: RFE for NFS connection behaviorSantosh Kumar Pradhan2013-11-1424-159/+1041
| | | | | | | | | | | | | | | Implement reconfigure() for NFS xlator so that volume set/reset wont restart the NFS server process. But few options can not be reconfigured dynamically e.g. nfs.mem-factor, nfs.port etc which needs NFS to be restarted. Change-Id: Ic586fd55b7933c0a3175708d8c41ed0475d74a1c BUG: 1027409 Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com> Reviewed-on: http://review.gluster.org/6236 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* gfapi: Closed the logfile fd in glfs_finiPoornima2013-11-141-0/+3
| | | | | | | | | | | | | | | | The logfile fd is not closed even after calling glfs_fini, hence in smb mount if connection to glusterfs volume fails at a point after the log file was opened, the fd would remain open until the process dies. This patch closes the logfile fd in glfs_fini. Change-Id: I608bfac9c6833b42750b0383ad26fd33ee378ee1 BUG: 1030228 Signed-off-by: Poornima <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/6263 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Transparent data encryption and metadata authenticationEdward Shishkin2013-11-1315-12/+8408
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | .. in the systems with non-trusted server This new functionality can be useful in various cloud technologies. It is implemented via a special encryption/crypt translator,which works on the client side and performs encryption and authentication; 1. Class of supported algorithms The crypt translator can support any atomic symmetric block cipher algorithms (which require to pad plain/cipher text before performing encryption/decryption transform (see glossary in atom.c for definitions). In particular, it can support algorithms with the EOF issue (which require to pad the end of file by extra-data). Crypt translator performs translations user -> (offset, size) -> (aligned-offset, padded-size) ->server (and backward), and resolves individual FOPs (write(), truncate(), etc) to read-modify-write sequences. A volume can contain files encrypted by different algorithms of the mentioned class. To change some option value just reconfigure the volume. Currently only one algorithm is supported: AES_XTS. Example of algorithms, which can not be supported by the crypt translator: 1. Asymmetric block cipher algorithms, which inflate data, e.g. RSA; 2. Symmetric block cipher algorithms with inline MACs for data authentication. 2. Implementation notes. a) Atomic algorithms Since any process in a stackable file system manipulates with local data (which can be obsoleted by local data of another process), any atomic cipher algorithm without proper support can lead to non-POSIX behavior. To resolve the "collisions" we introduce locks: before performing FOP->read(), FOP->write(), etc. the process should first lock the file. b) Algorithms with EOF issue Such algorithms require to pad the end of file with some extra-data. Without proper support this will result in losing information about real file size. Keeping a track of real file size is a responsibility of the crypt translator. A special extended attribute with the name "trusted.glusterfs.crypt.att.size" is used for this purpose. All files contained in bricks of encrypted volume do have "padded" sizes. 3. Non-trusted servers and Metadata authentication We assume that server, where user's data is stored on is non-trusted. It means that the server can be subjected to various attacks directed to reveal user's encrypted personal data. We provide protection against such attacks. Every encrypted file has specific private attributes (cipher algorithm id, atom size, etc), which are packed to a string (so-called "format string") and stored as a special extended attribute with the name "trusted.glusterfs.crypt.att.cfmt". We protect the string from tampering. This protection is mandatory, hardcoded and is always on. Without such protection various attacks (based on extending the scope of per-file secret keys) are possible. Our authentication method has been developed in tight collaboration with Red Hat security team and is implemented as "metadata loader of version 1" (see file metadata.c). This method is NIST-compliant and is based on checking 8-byte per-hardlink MACs created(updated) by FOP->create(), FOP->link(), FOP->unlink(), FOP->rename() by the following unique entities: . file (hardlink) name; . verified file's object id (gfid). Every time, before manipulating with a file, we check it's MACs at FOP->open() time. Some FOPs don't require a file to be opened (e.g. FOP->truncate()). In such cases the crypt translator opens the file mandatory. 4. Generating keys Unique per-file keys are derived by NIST-compliant methods from the a) parent key; b) unique verified object-id of the file (gfid); Per-volume master key, provided by user at mount time is in the root of this "tree of keys". Those keys are used to: 1) encrypt/decrypt file data; 2) encrypt/decrypt file metadata; 3) create per-file and per-link MACs for metadata authentication. 5. Instructions Getting started with crypt translator Example: 1) Create a volume "myvol" and enable encryption: # gluster volume create myvol pepelac:/vols/xvol # gluster volume set myvol encryption on 2) Set location (absolute pathname) of your master key: # gluster volume set myvol encryption.master-key /home/me/mykey 3) Set other options to override default options, if needed. Start the volume. 4) On the client side make sure that the file /home/me/mykey exists and contains proper per-volume master key (that is 256-bit AES key). This key has to be in hex form, i.e. should be represented by 64 symbols from the set {'0', ..., '9', 'a', ..., 'f'}. The key should start at the beginning of the file. All symbols at offsets >= 64 are ignored. 5) Mount the volume "myvol" on the client side: # glusterfs --volfile-server=pepelac --volfile-id=myvol /mnt After successful mount the file which contains master key may be removed. NOTE: Keeping the master key between mount sessions is in user's competence. ********************************************************************** WARNING! Losing the master key will make content of all regular files inaccessible. Mount with improper master key allows to access content of directories: file names are not encrypted. ********************************************************************** 6. Options of crypt translator 1) "master-key": specifies location (absolute pathname) of the file which contains per-volume master key. There is no default location for master key. 2) "data-key-size": specifies size of per-file key for data encryption Possible values: . "256" default value . "512" 3) "block-size": specifies atom size. Possible values: . "512" . "1024" . "2048" . "4096" default value; 7. Test cases Any workload, which involves the following file operations: ->create(); ->open(); ->readv(); ->writev(); ->truncate(); ->ftruncate(); ->link(); ->unlink(); ->rename(); ->readdirp(). 8. TODOs: 1) Currently size of IOs issued by crypt translator is restricted by block_size (4K by default). We can use larger IOs to improve performance. Change-Id: I2601fe95c5c4dc5b22308a53d0cbdc071d5e5cee BUG: 1030058 Signed-off-by: Edward Shishkin <edward@redhat.com> Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4667 Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/dht - rebalance: handle the rebalance @ inode level (!fd level)Amar Tumballi2013-11-134-128/+148
| | | | | | | | | | | | | * migrate all the fd's on an inode to newer subvol after rebalance * use the migration in progress flag in inode, so all the operations on the inode can make use of it Change-Id: Ib807a46e927a1062688fc15119c916797c52a350 BUG: 1013456 Signed-off-by: Amar Tumballi <amarts@redhat.com> Reviewed-on: http://review.gluster.org/5891 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* libglusterfs/inode: introduce new APIs for ctx handlingAmar Tumballi2013-11-133-21/+265
| | | | | | | | | | | | | | * inode_ctx_reset{0,1,2}() for reseting value1, value2, and both respectively * inode_ctx_get0() - to get the first value only * inode_ctx_set0() - to set the first value only * inode_ctx_get1() - to get the second value only * inode_ctx_set1() - to set the second value only Change-Id: I4dfbdac81d6a3f4e5784e060c76edabb1692ce03 Signed-off-by: Amar Tumballi <amarts@redhat.com> Reviewed-on: http://review.gluster.org/5890 Reviewed-by: Anand Avati <avati@redhat.com> Tested-by: Anand Avati <avati@redhat.com>
* bd: Add test case for bd xlatorM. Mohan Kumar2013-11-131-0/+131
| | | | | | | | | Change-Id: I73a0bfa7085d2e71b2489687fa53f5fe7d1e8ea1 BUG: 1028672 Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/6050 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* bd: Add support to create clone, snapshot and merge of LV images.M. Mohan Kumar2013-11-137-56/+654
| | | | | | | | | | | | | | | | | | | | | | | | | | Special xattr names "clone" & "snapshot" can be used to create full and linked clone of the LV images. GFID of destination posix file (to be mapped) is passed as a value to the xattr. Destination posix file must exist before running this operation. These operations form a basis for offloading storage related operations from QEMU to GlusterFS. Syntax for full clone: xattr name: "clone" value: "gfid-of-dest-file" Syntax for linked clone: xattr name: "snapshot" value: "gfid-of-dest-file" Syntax for merging: xattr name: "merge" value: "path-to-snapshot-file" Example: setfattr -n clone -v <gfid-of-dest-file> /media/source setfattr -n snapshot -v <gfid-of-dest-file> /media/source setfattr -n merge -v "/media/sn" /media/sn Change-Id: Id9f984a709d4c2e52a64ae75bb12a8ecb01f8776 BUG: 1028672 Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/5626 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* bd: Add aio support to BD xlatorM. Mohan Kumar2013-11-137-24/+651
| | | | | | | | | | | | Volume option bd-aio controls AIO feature for BD xlator. Code taken from posix-aio.c Change-Id: Ib049bd59c9d3f9101d33939838322cfa808de053 BUG: 1028672 Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/5748 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* bd: Add BD support to other xlatorsM. Mohan Kumar2013-11-133-26/+67
| | | | | | | | | | | | | | | | Make changes to distributed xlator to work with BD xlator. Unlike files, a block device can't be removed when its opened. So some part of the code were moved down to avoid this situation. Also before truncating a BD file its BD_XATTR should be set otherwise truncate will result in truncating posix file. So file is created with needed BD_XATTR and truncate is invoked. Also enables BD xlator in stripe volume type. Change-Id: If127516e261fac5fc5b137e7fe33e100bc92acc0 BUG: 1028672 Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/5235 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* bd: posix/multi-brick support to BD xlatorM. Mohan Kumar2013-11-1321-3/+3289
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current BD xlator (block backend) has a few limitations such as * Creation of directories not supported * Supports only single brick * Does not use extended attributes (and client gfid) like posix xlator * Creation of special files (symbolic links, device nodes etc) not supported Basic limitation of not allowing directory creation is blocking oVirt/VDSM to consume BD xlator as part of Gluster domain since VDSM creates multi-level directories when GlusterFS is used as storage backend for storing VM images. To overcome these limitations a new BD xlator with following improvements is suggested. * New hybrid BD xlator that handles both regular files and block device files * The volume will have both POSIX and BD bricks. Regular files are created on POSIX bricks, block devices are created on the BD brick (VG) * BD xlator leverages exiting POSIX xlator for most POSIX calls and hence sits above the POSIX xlator * Block device file is differentiated from regular file by an extended attribute * The xattr 'user.glusterfs.bd' (BD_XATTR) plays a role in mapping a posix file to Logical Volume (LV). * When a client sends a request to set BD_XATTR on a posix file, a new LV is created and mapped to posix file. So every block device will have a representative file in POSIX brick with 'user.glusterfs.bd' (BD_XATTR) set. * Here after all operations on this file results in LV related operations. For example opening a file that has BD_XATTR set results in opening the LV block device, reading results in reading the corresponding LV block device. When BD xlator gets request to set BD_XATTR via setxattr call, it creates a LV and information about this LV is placed in the xattr of the posix file. xattr "user.glusterfs.bd" used to identify that posix file is mapped to BD. Usage: Server side: [root@host1 ~]# gluster volume create bdvol host1:/storage/vg1_info?vg1 host2:/storage/vg2_info?vg2 It creates a distributed gluster volume 'bdvol' with Volume Group vg1 using posix brick /storage/vg1_info in host1 and Volume Group vg2 using /storage/vg2_info in host2. [root@host1 ~]# gluster volume start bdvol Client side: [root@node ~]# mount -t glusterfs host1:/bdvol /media [root@node ~]# touch /media/posix It creates regular posix file 'posix' in either host1:/vg1 or host2:/vg2 brick [root@node ~]# mkdir /media/image [root@node ~]# touch /media/image/lv1 It also creates regular posix file 'lv1' in either host1:/vg1 or host2:/vg2 brick [root@node ~]# setfattr -n "user.glusterfs.bd" -v "lv" /media/image/lv1 [root@node ~]# Above setxattr results in creating a new LV in corresponding brick's VG and it sets 'user.glusterfs.bd' with value 'lv:<default-extent-size' [root@node ~]# truncate -s5G /media/image/lv1 It results in resizig LV 'lv1'to 5G New BD xlator code is placed in xlators/storage/bd directory. Also add volume-uuid to the VG so that same VG can't be used for other bricks/volumes. After deleting a gluster volume, one has to manually remove the associated tag using vgchange <vg-name> --deltag <trusted.glusterfs.volume-id:<volume-id>> Changes from previous version V5: * Removed support for delayed deleting of LVs Changes from previous version V4: * Consolidated the patches * Removed usage of BD_XATTR_SIZE and consolidated it in BD_XATTR. Changes from previous version V3: * Added support in FUSE to support full/linked clone * Added support to merge snapshots and provide information about origin * bd_map xlator removed * iatt structure used in inode_ctx. iatt is cached and updated during fsync/flush * aio support * Type and capabilities of volume are exported through getxattr Changes from version 2: * Used inode_context for caching BD size and to check if loc/fd is BD or not. * Added GlusterFS server offloaded copy and snapshot through setfattr FOP. As part of this libgfapi is modified. * BD xlator supports stripe * During unlinking if a LV file is already opened, its added to delete list and bd_del_thread tries to delete from this list when a last reference to that file is closed. Changes from previous version: * gfid is used as name of LV * ? is used to specify VG name for creating BD volume in volume create, add-brick. gluster volume create volname host:/path?vg * open-behind issue is fixed * A replicate brick can be added dynamically and LVs from source brick are replicated to destination brick * A distribute brick can be added dynamically and rebalance operation distributes existing LVs/files to the new brick * Thin provisioning support added. * bd_map xlator support retained * setfattr -n user.glusterfs.bd -v "lv" creates a regular LV and setfattr -n user.glusterfs.bd -v "thin" creates thin LV * Capability and backend information added to gluster volume info (and --xml) so that management tools can exploit BD xlator. * tracing support for bd xlator added TODO: * Add support to display snapshots for a given LV * Display posix filename for list-origin instead of gfid Change-Id: I00d32dfbab3b7c806e0841515c86c3aa519332f2 BUG: 1028672 Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/4809 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* bd_map: Remove bd_map xlatorM. Mohan Kumar2013-11-1335-4602/+20
| | | | | | | | | | | Remove bd_map xlator and CLI related changes. Change-Id: If7086205df1907127c1a1fa4ba603f1c48421d09 BUG: 1028672 Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/5747 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* gfapi: introduce glfs_readdir() and glfs_readdirplus() APIsAnand Avati2013-11-135-2/+96
| | | | | | | | | Change-Id: I6b233bf647585675f233898351bf593f251716cc BUG: 839950 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6201 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
* server/rpc: bricks goes offline and comes back online, with lots of "No such ↵Vikhyat Umrao2013-11-121-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | file or directory" log messages Problem: Messages were getting logged very frequently at log level INFO. [2013-03-01 11:34:28.029222] I [server3_1-fops.c:1541:server_open_cbk] vol-server: 993888: OPEN (null) (--) ==> -1 (No such file or directory) [2013-03-01 11:34:28.031579] I [server3_1-fops.c:252:server_inodelk_cbk] vol-server: 993896: INODELK (null) (--) ==> -1 (No such file or directory) [2013-03-01 11:34:28.034041] I [server3_1-fops.c:252:server_inodelk_cbk] vol-server: 993914: INODELK (null) (--) ==> -1 (No such file or directory) [2013-03-01 11:34:28.040435] I [server3_1-fops.c:1338:server_flush_cbk] vol-server: 993938: FLUSH -2 (--) ==> -1 (No such file or directory) Solution: Moved them to DEBUG log level if error number equlas to ENOENT else to ERROR log level. It will help in decreasing the size of log files at INFO log level. For server_open_cbk and for some other functions we already have a patch, below is the URL for it. URL- http://review.gluster.org/6241 This patch solves logging problem for functions server_inodelk_cbk and server_flush_cbk. Change-Id: I57372e851371e466f1674726015e28378b826f5f BUG: 1029372 Signed-off-by: Vikhyat Umrao<vumrao@redhat.com> Reviewed-on: http://review.gluster.org/6252 Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* zerofill: Update API versionM. Mohan Kumar2013-11-121-1/+1
| | | | | | | | | | | | version 6 adds zerofill FOP BUG: 1028673 Change-Id: I27cfc48cd6f7f0f6daf94e1c9cfbe420a0d090af Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/6255 Reviewed-by: Bharata B Rao <bharata.rao@gmail.com> Tested-by: Bharata B Rao <bharata.rao@gmail.com> Reviewed-by: Anand Avati <avati@redhat.com>
* features/compress: Compression/DeCompression translatorPrashanth Pai2013-11-1111-1/+1256
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * When a writev call occurs, the client compresses the data before sending it to server. On the server, compressed data is decompressed. Similarly, when a readv call occurs, the server compresses the data before sending it to client. On the client, the compressed data is decompressed. Thus the amount of data sent over the wire is minimized. * Compression/Decompression is done using Zlib library. * During normal operation, this is the format of data sent over wire : <compressed-data> + trailer(8) The trailer contains the CRC32 checksum and length of original uncompressed data. This is used for validation. HOW TO USE ---------- Turning on compression xlator: gluster volume set <vol_name> compress on Configurable options: gluster volume set <vol_name> compress.compression-level 8 gluster volume set <vol_name> compress.min-size 50 Change-Id: Ib7a66b6f1f70fe002b7c513588cdf75c69370805 BUG: 923540 Original-author : Venky Shankar <vshankar@redhat.com> Signed-off-by: Venky Shankar <vshankar@redhat.com> Signed-off-by: Prashanth Pai <nullpai@gmail.com> Signed-off-by: Prashanth Pai <ppai@redhat.com> Reviewed-on: http://review.gluster.org/3251 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cli: Set the o/p width of hostname to 8 charactersVijaykumar M2013-11-111-1/+1
| | | | | | | | | Change-Id: I91dcb19ba4d31c17e6041155c0e59af457b87f1b BUG: 1028871 Signed-off-by: Vijaykumar M <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/6245 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* server/rpc: Numerous entries of error - "No such file or directory" in ↵Vikhyat Umrao2013-11-111-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bricks log Problem: Messages were getting logged very frequently at log level INFO. One of the log file snippet - [2013-10-27 00:05:01.501355] I [server3_1-fops.c:1707:server_stat_cbk] 0-vol-server: 24846575: STAT (null) (--) ==> -1 (No such file or directory) [2013-10-27 00:05:01.505101] I [server3_1-fops.c:1707:server_stat_cbk] 0-vol-server: 24846577: STAT (null) (--) ==> -1 (No such file or directory) [2013-10-27 00:05:01.507299] I [server3_1-fops.c:1707:server_stat_cbk] 0-vol-server: 24846578: STAT (null) (--) ==> -1 (No such file or directory) [2013-10-20 19:50:35.554563] I [server3_1-fops.c:1538:server_open_cbk] 0-vol-server: 18714687: OPEN <gfid:01c70ca0-1952-4e82-abee-a07205757d8e> (01c70ca0-1952-4e82-abee-a07205757d8e) ==> -1 (No such file or directory) [2013-10-20 19:50:35.555520] I [server3_1-fops.c:1538:server_open_cbk] 0-vol-server: 18714697: OPEN <gfid:01c70ca0-1952-4e82-abee-a07205757d8e> (01c70ca0-1952-4e82-abee-a07205757d8e) ==> -1 (No such file or directory) [2013-10-20 19:50:35.558292] I [server3_1-fops.c:1538:server_open_cbk] 0-vol-server: 18714712: OPEN (null) (--) ==> -1 (No such file or directory) Solution: Moved them to DEBUG log level if error number equlas to ENOENT else to ERROR log level. It will help in decreasing the size of log files at INFO log level. Change-Id: I23d74320c9c21bbce4805c20295556cc2cc0a8d8 BUG: 808073 Signed-off-by: Vikhyat Umrao <vumrao@redhat.com> Reviewed-on: http://review.gluster.org/6241 Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cli: write 'volume rebalance' error message in xml format whenDawit Alemu2013-11-101-4/+12
| | | | | | | | | | | | | | | | | | | --xml is specified When 'volume rebalance' encounters an error the cli prints the error message in plain text independent of whether --xml is specified. This throws off client application that expect xml output (as mentioned in bz1026143). Now, if the --xml flag is supplied, the cli print 'volume rebalance' error messages in xml format. Change-Id: I16c6a7a4cdd2819eb73422ab849125986dc299a6 BUG: 1026143 Signed-off-by: Dawit Alemu <dalemu@redhat.com> Reviewed-on: http://review.gluster.org/6242 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* features/qemu-block: simplify coroutine model to use single synctask, ucontextBrian Foster2013-11-109-249/+297
| | | | | | | | | | | | | | | | | | | | | | | | The current coroutine model, mapping synctasks 1-1 with qemu internal Coroutines, has some unresolved raciness issues. This problem usually manifests as lifecycle mismatches between top-level (gluster created) synctasks and the subsequently created internal coroutines from that context. Qemu's internal queueing (and locking) can cause situations where the top-level synctask is destroyed before the internal scheduler has released references to memory, leading to use after free crashes and asserts. Simplify the coroutine model to use a single synctask as a coroutine processor and rely on the existing native ucontext coroutine implementation. The syncenv thread is donated to qemu and ensures a single top-level coroutine is processed at a time. Qemu now has complete control over coroutine scheduling. BUG: 986775 Change-Id: I38223479a608d80353128e390f243933fc946fd6 Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/6110 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* features/qemu-block: add qemu backing image support (clone)Brian Foster2013-11-104-23/+166
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add basic backing image support to the block-format mechanism. This is a functionality checkpoint that enables the raw mechanism required to support client driven "snapshot" and "clone" requests. This change enhances the block-format setxattr command to support an additional and optional backing image reference. For example: setxattr -n trusted.glusterfs.block-format -v "qcow2:10GB:<bimg>" ./newimage ... where <bimg> refers to the backing image for unallocated blocks in newimage. <bimg> can be provided in one of two formats: - a gfid string in the following format (assuming a valid gfid): <gfid:00000000-0000-0000-0000-000000000000> - or a filename that must be resident in the same directory as the new clone file being formatted. E.g., setxattr -n trusted.glusterfs.block-format -v "qcow2:10GB:baseimg" ./newimage This latter format is more restrictive, simply provided for convenience or until something more refined is available. This change makes no assumptions about the backing image file and affords no additional protection. It is up to the user/client to recognize the relationship between the files and manage them appropriately (i.e., no writes to the backing image, etc.). BUG: 986775 Change-Id: I7aff7bdc59b85a6459001a6bfeae4db6bf74f703 Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/5967 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterfs: zerofill supportM. Mohan Kumar2013-11-1036-16/+1661
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for a new ZEROFILL fop. Zerofill writes zeroes to a file in the specified range. This fop will be useful when a whole file needs to be initialized with zero (could be useful for zero filled VM disk image provisioning or during scrubbing of VM disk images). Client/application can issue this FOP for zeroing out. Gluster server will zero out required range of bytes ie server offloaded zeroing. In the absence of this fop, client/application has to repetitively issue write (zero) fop to the server, which is very inefficient method because of the overheads involved in RPC calls and acknowledgements. WRITESAME is a SCSI T10 command that takes a block of data as input and writes the same data to other blocks and this write is handled completely within the storage and hence is known as offload . Linux ,now has support for SCSI WRITESAME command which is exposed to the user in the form of BLKZEROOUT ioctl. BD Xlator can exploit BLKZEROOUT ioctl to implement this fop. Thus zeroing out operations can be completely offloaded to the storage device , making it highly efficient. The fop takes two arguments offset and size. It zeroes out 'size' number of bytes in an opened file starting from 'offset' position. This patch adds zerofill support to the following areas: - libglusterfs - io-stats - performance/md-cache,open-behind - quota - cluster/afr,dht,stripe - rpc/xdr - protocol/client,server - io-threads - marker - storage/posix - libgfapi Client applications can exloit this fop by using glfs_zerofill introduced in libgfapi.FUSE support to this fop has not been added as there is no system call for this fop. Changes from previous version 3: * Removed redundant memory failure log messages Changes from previous version 2: * Rebased and fixed build error Changes from previous version 1: * Rebased for latest master TODO : * Add zerofill support to trace xlator * Expose zerofill capability as part of gluster volume info Here is a performance comparison of server offloaded zeofill vs zeroing out using repeated writes. [root@llmvm02 remote]# time ./offloaded aakash-test log 20 real 3m34.155s user 0m0.018s sys 0m0.040s [root@llmvm02 remote]# time ./manually aakash-test log 20 real 4m23.043s user 0m2.197s sys 0m14.457s [root@llmvm02 remote]# time ./offloaded aakash-test log 25; real 4m28.363s user 0m0.021s sys 0m0.025s [root@llmvm02 remote]# time ./manually aakash-test log 25 real 5m34.278s user 0m2.957s sys 0m18.808s The argument log is a file which we want to set for logging purpose and the third argument is size in GB . As we can see there is a performance improvement of around 20% with this fop. Change-Id: I081159f5f7edde0ddb78169fb4c21c776ec91a18 BUG: 1028673 Signed-off-by: Aakash Lal Das <aakash@linux.vnet.ibm.com> Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/5327 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* libglusterfs: fix bug in timespec adjustmentRavishankar N2013-11-103-6/+6
| | | | | | | | | | | | | The argument to the timespec_adjust_delta() function introudced in commit 6836118b21 needs to be passed by reference rather than by value for the function to do it's job. BUG: 1028663 Change-Id: I62a3636906e67ed35b7786e9553f6819b48f3626 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/6243 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* gfapi: remove unnecessary call to glfs_resolve_base()Anand Avati2013-11-071-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Calling glfs_resolve_base() on the root inode for every resolver invocation is unnecessary and wasteful. Here are the results from running a test program which performs path based operations (creates and deletes 1000 files): Without patch: [root@blackbox ~]# sync [root@blackbox ~]# time ./a.out 1 real 0m4.314s user 0m1.923s sys 0m1.144s [root@blackbox ~]# sync [root@blackbox ~]# time ./a.out 1 real 0m4.383s user 0m1.940s sys 0m1.177s [root@blackbox ~]# sync [root@blackbox ~]# time ./a.out 1 real 0m4.339s user 0m1.863s sys 0m1.129s With patch: [root@blackbox ~]# sync [root@blackbox ~]# time ./a.out 1 real 0m3.005s user 0m1.162s sys 0m0.816s [root@blackbox ~]# sync [root@blackbox ~]# time ./a.out 1 real 0m3.188s user 0m1.222s sys 0m0.867s [root@blackbox ~]# sync [root@blackbox ~]# time ./a.out 1 real 0m2.999s user 0m1.131s sys 0m0.832s Change-Id: Id160a24f44b4dccfcfce99a6f69ddb8938523cd5 BUG: 953694 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6131 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: modify remove-brick CLI message.Ravishankar N ravishankar@redhat.com2013-11-072-6/+5
| | | | | | | | | | | | | | | | | | | | | In the current context "replica_cnt" is used just to know whether the specific key exists or not by calling "dict_get_int32", which we can replace by "dict_get ()". And changing the log message as it is more appropriate to say "migration of data" rather than "rebalance". This patch refactors commit 51c6fa7a354826744de98 against BZ 961669 reviewed on : http://review.gluster.org/5566 Change-Id: I48eae206a28d4083975e64407ed8fe4539f9c24b BUG: 1027270 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Original patch: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/6001 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: susant palai <spalai@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* posix: Fix excessive logging resulting from get_real_filename failure from sambaKrutika Dhananjay2013-11-071-2/+3
| | | | | | | | | | Change-Id: I641d028165da7b8501bd372c62d2df89a9d4db1f BUG: 1027174 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/6237 Reviewed-by: poornima g <pgurusid@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* gNFS: NFS segfaults with nfstest_posix toolSantosh Kumar Pradhan2013-11-042-31/+48
| | | | | | | | | | | | | | | | | | | | | | Problem: nfs3_stat_to_fattr3() missed a NULL check. FIX: (1) Added a NULL check. (2) In all fop cbk path, if the op_ret is -1 and op_errno is 0, then handle it as a special case. Set the NFS3 status as NFS3ERR_SERVERFAULT instead of NFS3_OK. (3) The other component of FIX would be in DHT module and is on the way. Change-Id: I6f03c9a02d794f8b807574f2755094dab1b90c92 BUG: 1010241 Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com> Reviewed-on: http://review.gluster.org/6026 Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>