summaryrefslogtreecommitdiffstats
path: root/xlators/storage/posix/src
Commit message (Collapse)AuthorAgeFilesLines
...
* storage/posix: Janitor should guard against dir renames.Pranith Kumar K2014-06-121-1/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Directory rename while a brick is down can cause gfid handle of that directory to be deleted until next lookup happens on that directory. *) Self-heal does not have intelligence to detect renames at the moment. So it has to delete the directory 'd' using special flags, because it has to perform 'rm -rf' of that directory as it is not empty. Posix xlator implements this by renaming the directory deleted to 'landfill' directory in '.glusterfs' where janitor thread will perform actual rm -rf by traversing the directory. Janitor thread wakes up every 10 minutes to check if there are any directories to be deleted and deletes them. As part of deleting it also deletes the gfid-handles. Steps to hit the problem: 1) On a replicate volume create a directory 'd', file in 'd' called 'f' so the directory 'd' is not empty. 2) bring one of the bricks down (lets call it brick-a, the other one is brick-b 3) Rename d to d1 4) When brick-a comes online again, self-heal deletes directory 'd' and creates directory 'd1' on brick-a for performing self-heal. So on brick-a, gfid-handle of 'd' pointing to 'da is deleted and recreated to point to 'd1'. 5) This directory 'b' with all its directory hierarchy (for now just the file 'f') will be under 'landfill' directory. 6) When janitor thread wakes up and deletes directory 'd' and gfid-handle of 'd' without realizing that it is now pointing to 'd1'. Thus 'd1' loses its gfid-handle Fix: Delete gfid-handle for a directory only when the gfid-handle is stale. Change-Id: I21265b3bd3852f0967d916aaa21108ae5c9e7373 BUG: 1101143 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/7879 Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* storage/posix: Remove special handling of ENOTSUP for [f]setxattr.Pranith Kumar K2014-05-311-18/+2
| | | | | | | | | | | Change-Id: I7df7b2263336af0abe5bc91c674f9401aff6c3e0 BUG: 1098794 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/7788 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Posix/readdirp : Avoid filling wrong stat info for dentry in readdirpSusant Palai2014-05-281-2/+5
| | | | | | | | | | | | | | | | | Problem: Upon no entry found for a dentry, posix_readdirp_fill used to fill the stat for the current entry with the previous one. Solution: Continue with other entries if lstat failed for current one Change-Id: Ic96b5900451ed6c8de59acf9fee2e116649d3cdb BUG: 1096578 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/7733 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* storage/posix: Don't allow open on symlinkPranith Kumar K2014-05-221-2/+8
| | | | | | | | | | | | Change-Id: Ie38ecad621d5cb351c607c9676814c573834cb0b BUG: 1099858 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/7823 Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* storage/posix: do not dereference gfid symlinks before ↵Xavier Hernandez2014-05-022-8/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | posix_handle_mkdir_hashes() Whenever a new directory is created, its corresponding gfid file must also be created. This was done first calling MAKE_HANDLE_PATH() to get the path of the gfid file, then calling posix_handle_mkdir_hashes() to create the parent directories of the gfid, and finally creating the soft-link. In normal circumstances, the gfid we want to create won't exist and MAKE_HANDLE_PATH() will return a simple path to the new gfid. However if the volume is damaged and a self-heal is running, it is possible that we try to create an already existing gfid. In this case, MAKE_HANDLE_PATH() will return a path to the directory instead of the path to the gfid. To solve this problem, every time a path to a gfid is needed, a call to MAKE_HANDLE_ABSPATH() is made instead of the call to MAKE_HANDLE_PATH(). Change-Id: Ic319cc38c170434db8e86e2f89f0b8c28c0d611a BUG: 859581 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/5075 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* storage/posix: Remove dead code in xattropPranith Kumar K2014-05-011-32/+0
| | | | | | | | | Change-Id: I48ba81ad342e3c8fe1d81f8e57e61676dd356fb0 BUG: 1092749 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/7318 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* storage/posix: add list xattr capability to lookupPranith Kumar K2014-04-292-16/+94
| | | | | | | | | BUG: 1078061 Change-Id: Ie26d28b8a74aa0d1eceff14a84c3cd3e302dcdb5 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/7293 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Update references to the maillinglist to gluster-devel@gluster.orgNiels de Vos2014-04-271-1/+1
| | | | | | | | | | | | | | gluster-devel@nongnu.org has moved to gluster-devel@gluster.org. All occurrences in the current (non legacy) documentation and code have been adjusted. Change-Id: I053162e633f7ea14fd3eed239ded017df165147c BUG: 1091705 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/7573 Reviewed-by: Justin Clift <justin@gluster.org> Reviewed-by: Anand Avati <avati@redhat.com> Tested-by: Anand Avati <avati@redhat.com>
* build: MacOSX Porting fixesHarshavardhana2014-04-243-34/+197
| | | | | | | | | | | | | | | | | | | | | git@forge.gluster.org:~schafdog/glusterfs-core/osx-glusterfs Working functionality on MacOSX - GlusterD (management daemon) - GlusterCLI (management cli) - GlusterFS FUSE (using OSXFUSE) - GlusterNFS (without NLM - issues with rpc.statd) Change-Id: I20193d3f8904388e47344e523b3787dbeab044ac BUG: 1089172 Signed-off-by: Harshavardhana <harsha@harshavardhana.net> Signed-off-by: Dennis Schafroth <dennis@schafroth.com> Tested-by: Harshavardhana <harsha@harshavardhana.net> Tested-by: Dennis Schafroth <dennis@schafroth.com> Reviewed-on: http://review.gluster.org/7503 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* build: GlusterFS Unit Test FrameworkLuis Pabon2014-03-061-0/+10
| | | | | | | | | | | | | | | | | | | | This patch will allow for developers to create unit tests for their code. Documentation has been added to the patch and is available here: doc/hacker-guide/en-US/markdown/unittest.md Also, unit tests are run when RPM is created. BUG: 1067059 Change-Id: I95cf8bb0354d4ca4ed4476a0f2385436a17d2369 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Signed-off-by: Luis Pabon <lpabon@redhat.com> Reviewed-on: http://review.gluster.org/7145 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Justin Clift <justin@gluster.org> Tested-by: Justin Clift <justin@gluster.org>
* Reduce logging caused by non-existing extended attributesNiels de Vos2014-03-011-2/+4
| | | | | | | | | | | | | | | | | This changes the following log messages from INFO (default value) to DEBUG. We do not really care if someone tries to read extended attributes that do not exist. [2013-12-09 12:19:05.924497] E [posix.c:3539:posix_fgetxattr] 0-dis-rep-posix: fgetxattr failed on key system.posix_acl_access (No data available) [2013-12-09 12:19:05.924545] I [server-rpc-fops.c:863:server_fgetxattr_cbk] 0-dis-rep-server: 13074: FGETXATTR 1 (b8381953-ffa5-40fa-90dd-ae122335cc4b) (system.posix_acl_access) ==> (No data available) Change-Id: Idbbeb026f81e67025a2b36d7bfeb125ad2a1f61b BUG: 1027174 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/7171 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Harshavardhana <harsha@harshavardhana.net> Reviewed-by: Anand Avati <avati@redhat.com>
* storage/posix: lgetxattr called with invalid keys on the bricksKaleb S. KEITHLEY2014-02-161-110/+112
| | | | | | | | | | | | | | More invalid keys have crept in since this was fixed. We need a better strategy for avoiding this than the current noticed-in-an-strace... Cleaning tabs while I'm at it. Change-Id: I2ea97f6d1ab2a9fd569b5b5e01a4de891401fb81 BUG: 765202 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/7003 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* add build-gfid option to enable pgfid tracking ...Krishnan Parthasarathi2014-02-141-1/+1
| | | | | | | | | | | .. for inode to pathname mapping Change-Id: I0486d85b02e86d739fc1d8ea16d118fb666abf60 BUG: 1064863 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/6989 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* storage/posix: calculate checksum only over read dataAnand Avati2014-02-131-2/+2
| | | | | | | | | | | | | | If the last block of a file is not aligned to the requested size, checksum is calculated over junk data in the iobuf. Or it could be zeroes, resulting in a spurious checksum match in self-heal. Change-Id: I41422e08de90013dabfc348ec6fbb8ecdd4f8fb8 BUG: 1021686 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6932 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* storage/posix: perform chmod after chown.Ravishankar N2014-02-111-6/+6
| | | | | | | | | | | | | | | | | | | | | | | Problem: When a replica brick is added to a volume, set-user-ID and set-group-ID permission bits of files are not set correctly in the new brick. The issue is in the posix_setattr() call where we do a chmod followed by a chown. But according to the man pages for chown: When the owner or group of an executable file are changed by an unprivileged user the S_ISUID and S_ISGID mode bits are cleared. POSIX does not specify whether this also should happen when root does the chown(). Fix: Swap the chmod and chown calls in posix_setattr() Change-Id: I094e47a995c210d2fdbc23ae7a5718286e7a9cf8 BUG: 1058797 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/6862 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* features/marker-quota: more stringent error handling in rename.Raghavendra G2014-02-081-1/+4
| | | | | | | | | | | | | | | If an error occurs and op_errno is not set to non-zero value, we can end up in loosing a frame resulting in a hung syscall. This patch adds code setting op_errno appropriately in storage/posix and makes marker to set err to a default non-zero value in case of op_errno being zero. Change-Id: Idc2c3e843b932709a69b32ba67deb284547168f2 BUG: 833586 Signed-off-by: Raghavendra G <raghavendra@gluster.com> Reviewed-on: http://review.gluster.org/5032 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Problem : getfattr fails and mount point becomes inaccessible while Atin Mukherjee2014-02-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | fetching glusterfs.ancestry.path key value for a gfid from aux-gfid-mount based mount point. Analysis : The caller of posix_make_ancestryfromgfid() function i.e. posix_get_ancestry_non_directory() is sending an incorrect type value as an argument leading to a crash in posix_make_ancestral_node() (head dirent pointer is NULL and an attempt to dereference causes seg fault). For a non directory operation this piece of code should not have been executed but due to incorrect type value this happened. Solution : In posix_make_ancestryfromgfid() call type is passed as logical or of type and POSIX_ANCESTRY_PATH instead of logical or of POSIX_ANCESTRY_PATH and POSX_ANCESTRY_DENTRY. Change-Id: Iaf844bea91396c5e2ee295d5a082998fda66f0a9 BUG: 1060104 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: http://review.gluster.org/6892 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/quota: remove in-memory accounting of files in enforcerRaghavendra G2014-01-251-1/+0
| | | | | | | | | | | | | | | Accounting was done in enforcer (though marker is the ultimate source of truth) to offset cached directory size becoming stale. However, with enforcer being moved to brick we can no longer maintain correct cluster wide size for a directory. Hence removing accounting code from enforcer. Change-Id: I5ea94234da4da85ed5f5ced1354d8de3454b3fcb BUG: 969461 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/6434 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* storage/posix: HANDLE_PFX is redundant use GF_HIDDEN_PATH insteadHarshavardhana2014-01-223-13/+12
| | | | | | | | | | | | | GF_HIDDEN_PATH usage would help in better readability of the code and avoids bugs produced from redundant macro constants. Change-Id: I2fd7e92e87783ba462ae438ced2cf4f720a25f5c BUG: 990028 Signed-off-by: Harshavardhana <harsha@harshavardhana.net> Reviewed-on: http://review.gluster.org/6756 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* consolidate code for #ifdef HAVE_LINKAT usageVijay Bellur2014-01-142-32/+6
| | | | | | | | | | | | | | | | | | | | | sys_link() now does ifdef HAVE_LINKAT linkat (...) else link (...) endif Use sys_link() in all places where we previously had the conditional behavior. Change-Id: I8bce5ac1175efd2ba7ab4bb5b372f6d1e0365d28 BUG: 764655 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/6633 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-by: Anand Avati <avati@redhat.com>
* storage/posix: UNWIND right op_error and op_errno in *setxattr()Vijay Bellur2014-01-142-4/+8
| | | | | | | | | | | | | | | | | | | | | | 1. errno was being set after gf_log() in posix_{f}handle_pair, this would cause errno to be overwritten. 2. dht would expect -1 for indication of failure in setxattr callback (dht_err_cbk()). posix_{f}setxattr has been changed to set op_ret as -1 instead of -op_errno. 3. dict_foreach() has been changed to return an error if the invoked fn() returns < 0. Bug report and test case credits to Zorro Lang <zlang@redhat.com> Change-Id: I96c15f12a5d7717b7584ba392f390a0b4f704a98 BUG: 1051896 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/6684 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* pathinfo: Provide user namespace access.Vijaykumar M2014-01-031-0/+5
| | | | | | | | | | | Do not allow to setxattr for pathinfo This change was missed out when submitted patch: http://review.gluster.org/5101/ Change-Id: Ifd32d95089b9bacc5dee80a8b924bb8713dca8a1 Signed-off-by: Vijaykumar M <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/6535 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* pathinfo: Provide user namespace access.Vijaykumar M2013-12-162-5/+6
| | | | | | | | | | | | | | | | | Locality can be now queried by unprivileged users with key "glusterfs.pathinfo". Setting both "glusterfs.pathinfo" and "trusted.glusterfs.pathinfo" on disk is prevented with this patch. Original Author: Vijay Bellur <vbellur@redhat.com> Change-Id: I4f7a0db8ad59165c4aeda04b23173255157a8b79 Signed-off-by: Vijaykumar M <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/5101 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* posix: if brick-uid or brick-gid is not specified, do not setAnand Avati2013-12-121-12/+29
| | | | | | | | | | | | | | Current code would set owner uid/gid explicitly to 0/0 on start even if none was specified. Fix it. Change-Id: I72dec9e79c51bd1eb3af5334c42b7c23b01d0258 BUG: 1040275 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6476 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: Lukáš Bezdička <lukas.bezdicka@gooddata.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* posix: do not allow to set/get "trusted.glusterfs.volume-id" xattrVijaykumar M2013-11-261-0/+15
| | | | | | | | | Change-Id: I2e9a2264b1fd5ebc1ed0aff30225e89acbd0bcb4 BUG: 1034716 Signed-off-by: Vijaykumar M <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/6361 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* core: fix errno for non-existent GFIDAnand Avati2013-11-261-0/+6
| | | | | | | | | | | | | | | | | | | When clients refer to a GFID which does not exist, the errno to be returned in ESTALE (and not ENOENT). Even though ENOENT might look "proper" most of the time, as the application eventually expects ENOENT even if a parent directory does not exist, not returning ESTALE results in resolvers (FUSE and GFAPI) to not retry resolution in uncached mode. This can result in spurious ENOENTs during concurrent path modification operations. Change-Id: I7a06ea6d6a191739f2e9c6e333a1969615e05936 BUG: 1032894 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6318 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/marker: quota friendly changesRaghavendra G2013-11-261-0/+92
| | | | | | | | | | | | | | | | | | | | | | | | | * handles renames on dht linkfiles correctly * nameless lookup friendly changes. uses gfid-to-path conversion functionality from storage/posix to build ancestry till root. * log message cleanup. * build inode contexts in readdirp * Accounting still not correct with hardlinks. Credits: ======== Vijay Bellur <vbellur@redhat.com> Raghavendra Bhat <rabhat@redhat.com> Change-Id: I415b6fbbc9691f5a38d9fd3c5d083a61e578bb81 BUG: 969461 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/5953 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* posix: placeholders for GFID to path conversionRaghavendra G2013-11-265-152/+963
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | what? ===== The following is an attempt to generate the paths of a file when only its gfid is known. To find the path of a directory, the symlink handle to the directory maintained in the ".glusterfs" backend directory is read. The symlink handle is generated using the gfid of the directory. It (handle) contains the directory's name and parent gfid, which are used to recursively construct the absolute path as seen by the user from the mount point. A similar approach cannot be used for a regular file or a symbolic link since its hardlink handle, generated using its gfid, doesn't contain its parent gfid and basename. So xattrs are set to store the parent gfids and the number of hardlinks to a file or a symlink having the same parent gfid. When an user/application requests for the paths of a regular file or a symlink with multiple hardlinks, using the parent gfids stored in the xattrs, the paths of the parent directories are generated as mentioned earlier. The base names of the hardlinks (with the same parent gfid) are determined by matching the actual backend inode numbers of each entry in the parent directory with that of the hardlink handle. Xattr is set on a regular file, link, and symbolic link as follows, Xattr name : trusted.pgfid.<pargfidstr> Xattr value : <number of hardlinks to a regular file/symlink with the same parentgfid> If a regular file, hard link, symbolic link is created then an xattr in the above format is set in the backend. how to use? =========== This functionality can be used through getxattr interface. Two keys - glusterfs.ancestry.dentry and glusterfs.ancestry.path - enable usage of this functionality. A successful getxattr will have the result stored under same keys. Values will be, glusterfs.ancestry.dentry: -------------------------- A linked list of gf-dirent structures for all possible paths from root to this gfid. If there are multiple paths, the linked-list will be a series of paths one after another. Each path will be a series of dentries representing all components of the path. This key is primarily for internal usage within glusterfs. glusterfs.ancestry.path: ------------------------ A string containing all possible paths from root to this gfid. Multiple hardlinks of a file or a symlink are displayed as a colon seperated list (this could interfere with path components containing ':'). e.g. If there is a file "file1" in root directory with two hardlinks, "/dir2/link2tofile1" and "/dir1/link1tofile1", then [root@alpha gfsmntpt]# getfattr -n glusterfs.ancestry.path -e text file1 glusterfs.ancestry.path="/file1:/dir2/link2tofile1:/dir1/link1tofile1" Thanks Amar, Avati and Venky for the inputs. Original Author: Ramana Raja <rraja@redhat.com> BUG: 990028 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Change-Id: I0eaa9101e333e0c1f66ccefd9e95944dd4a27497 Reviewed-on: http://review.gluster.org/5951 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Build storage/posix xlator if fallocate() does not existsEmmanuel Dreyfus2013-11-201-1/+12
| | | | | | | | | | | If fallocate() does not exists, just return EOPNOTSUPP BUG: 764655 Change-Id: I808114f733c88985519dc47fb7537e1ced1db077 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/6289 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Fixes for ZF reported by coverityM. Mohan Kumar2013-11-191-1/+5
| | | | | | | | | BUG: 1028673 Change-Id: I7c75738cca22c81c5629d579ef5bea24000e622e Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/6291 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* zerofill: Change the type of len argument of glfs_zerofill() to off_tBharata B Rao2013-11-141-8/+8
| | | | | | | | | | | | | | glfs_zerofill() can be potentially called to zero-out entire file and hence allow for bigger value of length parameter. Change-Id: I75f1d11af298915049a3f3a7cb3890a2d72fca63 BUG: 1028673 Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-on: http://review.gluster.org/6266 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: M. Mohan Kumar <mohan@in.ibm.com> Tested-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterfs: zerofill supportM. Mohan Kumar2013-11-102-16/+186
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for a new ZEROFILL fop. Zerofill writes zeroes to a file in the specified range. This fop will be useful when a whole file needs to be initialized with zero (could be useful for zero filled VM disk image provisioning or during scrubbing of VM disk images). Client/application can issue this FOP for zeroing out. Gluster server will zero out required range of bytes ie server offloaded zeroing. In the absence of this fop, client/application has to repetitively issue write (zero) fop to the server, which is very inefficient method because of the overheads involved in RPC calls and acknowledgements. WRITESAME is a SCSI T10 command that takes a block of data as input and writes the same data to other blocks and this write is handled completely within the storage and hence is known as offload . Linux ,now has support for SCSI WRITESAME command which is exposed to the user in the form of BLKZEROOUT ioctl. BD Xlator can exploit BLKZEROOUT ioctl to implement this fop. Thus zeroing out operations can be completely offloaded to the storage device , making it highly efficient. The fop takes two arguments offset and size. It zeroes out 'size' number of bytes in an opened file starting from 'offset' position. This patch adds zerofill support to the following areas: - libglusterfs - io-stats - performance/md-cache,open-behind - quota - cluster/afr,dht,stripe - rpc/xdr - protocol/client,server - io-threads - marker - storage/posix - libgfapi Client applications can exloit this fop by using glfs_zerofill introduced in libgfapi.FUSE support to this fop has not been added as there is no system call for this fop. Changes from previous version 3: * Removed redundant memory failure log messages Changes from previous version 2: * Rebased and fixed build error Changes from previous version 1: * Rebased for latest master TODO : * Add zerofill support to trace xlator * Expose zerofill capability as part of gluster volume info Here is a performance comparison of server offloaded zeofill vs zeroing out using repeated writes. [root@llmvm02 remote]# time ./offloaded aakash-test log 20 real 3m34.155s user 0m0.018s sys 0m0.040s [root@llmvm02 remote]# time ./manually aakash-test log 20 real 4m23.043s user 0m2.197s sys 0m14.457s [root@llmvm02 remote]# time ./offloaded aakash-test log 25; real 4m28.363s user 0m0.021s sys 0m0.025s [root@llmvm02 remote]# time ./manually aakash-test log 25 real 5m34.278s user 0m2.957s sys 0m18.808s The argument log is a file which we want to set for logging purpose and the third argument is size in GB . As we can see there is a performance improvement of around 20% with this fop. Change-Id: I081159f5f7edde0ddb78169fb4c21c776ec91a18 BUG: 1028673 Signed-off-by: Aakash Lal Das <aakash@linux.vnet.ibm.com> Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/5327 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* posix: Fix excessive logging resulting from get_real_filename failure from sambaKrutika Dhananjay2013-11-071-2/+3
| | | | | | | | | | Change-Id: I641d028165da7b8501bd372c62d2df89a9d4db1f BUG: 1027174 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/6237 Reviewed-by: poornima g <pgurusid@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* posix: Fix readv FOPM. Mohan Kumar2013-10-172-10/+2
| | | | | | | | | | | Suggested by Anand Avati in BD xlator code review. Change-Id: I31c353a26dfdeb3d0023c3f7e03ed25461d13c16 Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> BUG: 837495 Reviewed-on: http://review.gluster.org/6077 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* storage/posix: Lower log severity for ENODATA errors in getxattr()Vijay Bellur2013-10-151-1/+2
| | | | | | | | | Change-Id: I101e329cce4c1305615c63ebcb42355f6c3e85e0 BUG: 918052 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/6084 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* gNFS: Incorrect NFS ACL encoding for XFSSantosh Kumar Pradhan2013-09-292-7/+9
| | | | | | | | | | | | | | | | | | | | | | | | Problem: Incorrect NFS ACL encoding causes "system.posix_acl_default" setxattr failure on bricks on XFS file system. XFS (potentially others?) doesn't understand when the 0x10 prefix is added to the ACL type field for default ACLs (which the Linux NFS client adds) which causes setfacl()->setxattr() to fail silently. NFS client adds NFS_ACL_DEFAULT(0x1000) for default ACL. FIX: Mask the prefix (added by NFS client) OFF, so the setfacl is not rejected when it hits the FS. Original patch by: "Richard Wareing" Change-Id: I17ad27d84f030cdea8396eb667ee031f0d41b396 BUG: 1009210 Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com> Reviewed-on: http://review.gluster.org/5980 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* core: block unused signals in created threadsAnand Avati2013-09-253-7/+7
| | | | | | | | | | | | | | | Block all signal except those which are set for explicit handling in glusterfs_signals_setup(). Since thread spawning code in libglusterfs and xlators can get called from application threads when used through libgfapi, it is necessary to do this blocking. Change-Id: Ia320f80521a83d2edcda50b9ad414583a0175281 BUG: 1011662 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/5995 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* core: remove GLUSTERFS_CREATE_MODE_KEY usageAmar Tumballi2013-08-231-14/+0
| | | | | | | | | Change-Id: I23b8cb7223b91a55af1cd4214f61bbe0e87351f6 BUG: 952029 Signed-off-by: Amar Tumballi <amarts@redhat.com> Reviewed-on: http://review.gluster.org/5683 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* core: changes to support gfid-accessAmar Tumballi2013-08-211-0/+14
| | | | | | | | | Change-Id: I38d2fdc47e4b805deafca6805e54807976ffdb7e Signed-off-by: Amar Tumballi <amarts@redhat.com> BUG: 952029 Reviewed-on: http://review.gluster.org/5496 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Revert "fuse: auxiliary gfid mount support"Amar Tumballi2013-08-211-14/+0
| | | | | | | | | | | | | | | | | This reverts commit 4c0f4c8a89039b1fa1c9c015fb6f273268164c20. Conflicts: xlators/mount/fuse/src/fuse-bridge.c For build issues added CREATE_MODE_KEY definition in: libglusterfs/src/glusterfs.h Change-Id: I8093c2a0b5349b01e1ee6206025edbdbee43055e BUG: 952029 Signed-off-by: Amar Tumballi <amarts@redhat.com> Reviewed-on: http://review.gluster.org/5495 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* afr: treat appending writes as stable writes.Anand Avati2013-08-131-2/+39
| | | | | | | | | | | | | Durability of appending writes is implicit in the file size. Therefore performing an explicit fsync() is unnecessary in such cases as self-heal can check for the size of file when pending changelog is not unambiguous. Change-Id: I05446180a91d20e0dbee5de5a7085b87d57f178a BUG: 927146 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/5501 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* posix: Default value for `batch-fsync-delay-usec` should be '0'Harshavardhana2013-08-131-3/+3
| | | | | | | | | | | | Also fixes for failing testcase `./tests/bugs/bug-888174.t`, which has been failing sporadically for many patches. Change-Id: Ic7d2c95da5d3126623cec403207afadd449bf950 BUG: 927146 Signed-off-by: Harshavardhana <harsha@harshavardhana.net> Reviewed-on: http://review.gluster.org/5620 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* storage/posix: Enable Open-fd-count query in writevPranith Kumar K2013-08-021-1/+39
| | | | | | | | | | Change-Id: I86bdf865730416150c10617dcbad5c037579acde BUG: 910217 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/5433 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Revert "storage/posix: Remove the interim fix that handles the gfid race"shishir gowda2013-07-303-2/+70
| | | | | | | | | | | | | | | | | | This reverts commit 97807e75956a2d240282bc64fab1b71762de0546. In a distribute or distribute-replica volume, this fix is required to prevent gfid mis-match due to race issues. test script bug-767585-gfid.t needs a sleep of 2, cause after setting backend gfid directly, we try to heal, and with this fix, we do not allow setxattr of gfid within creation of 1 second if not created by itself Change-Id: Ie3f4b385416889fd5de444638a64a7eaaf24cd60 BUG: 951195 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/5240 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com>
* storage/posix: Fix conditional compiling for syncfsPranith Kumar K2013-07-241-1/+4
| | | | | | | | | Change-Id: Ief22e1c0f2b5074060752d70da41ae93f1028d62 BUG: 927146 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/5381 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* storage/posix: implement batched fsync in a single threadAnand Avati2013-07-233-0/+270
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because of the extra fsync()s issued by AFR transaction, they could potentially "clog" all the io-threads denying unrelated operations from making progress. This patch assigns a dedicated thread to issues fsyncs, as an experimental feature to understand performance characteristics with the approach. As a basis, incoming individual fsync requests are grouped into batches, falling in the same @batch-fsync-delay-usec window of time. These windows can extend in practice, as processing of the previous batch can take longer than @batch-fsync-delay-usec while new requests are getting batched. The feature support three modes (similar to the -S modes of fs_mark) - syncfs: In this mode one syncfs() is issued per batch, instead of N fsync()s (one per file.) - syncfs-single-fsync: In this mode one syncfs() is issued per batch (which, on Linux, guarantees the completion of write-out of dirty pages in the filesystem up to that point) and one single fsync() to synchronize or flush the controller/drive cache. This corresponds to -S 2 of fsmark. - syncfs-reverse-fsync: In this mode, one syncfs() is issued per batch, and all the open files in that batch are fsync()'ed in the reverse order of the queue. This corresponds to -S 4 of fsmark. - reverse-fsync: In this mode, no syncfs() is issued and all the files in the batch are fsync()'ed in the reverse order. This corresponds to -S 3 of fsmark. Change-Id: Ia1e170a810c780c8d80e02cf910accc4170c4cd4 BUG: 927146 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4746 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* fuse: auxiliary gfid mount supportRaghavendra G2013-07-191-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * files can be accessed directly through their gfid and not just through their paths. For eg., if the gfid of a file is f3142503-c75e-45b1-b92a-463cf4c01f99, that file can be accessed using <gluster-mount>/.gfid/f3142503-c75e-45b1-b92a-463cf4c01f99 .gfid is a virtual directory used to seperate out the namespace for accessing files through gfid. This way, we do not conflict with filenames which can be qualified as uuids. * A new file/directory/symlink can be created with a pre-specified gfid. A setxattr done on parent directory with fuse_auxgfid_newfile_args_t initialized with appropriate fields as value to key "glusterfs.gfid.newfile" results in the entry <parent>/bname whose gfid is set to args.gfid. The contents of the structure should be in network byte order. struct auxfuse_symlink_in { char linkpath[]; /* linkpath is a null terminated string */ } __attribute__ ((__packed__)); struct auxfuse_mknod_in { unsigned int mode; unsigned int rdev; unsigned int umask; } __attribute__ ((__packed__)); struct auxfuse_mkdir_in { unsigned int mode; unsigned int umask; } __attribute__ ((__packed__)); typedef struct { unsigned int uid; unsigned int gid; char gfid[UUID_CANONICAL_FORM_LEN + 1]; /* a null terminated gfid string * in canonical form. */ unsigned int st_mode; char bname[]; /* bname is a null terminated string */ union { struct auxfuse_mkdir_in mkdir; struct auxfuse_mknod_in mknod; struct auxfuse_symlink_in symlink; } __attribute__ ((__packed__)) args; } __attribute__ ((__packed__)) fuse_auxgfid_newfile_args_t; An initial consumer of this feature would be geo-replication to create files on slave mount with same gfids as that on master. It will also help gsyncd to access files directly through their gfids. gsyncd in its newer version will be consuming a changelog (of master) containing operations on gfids and sync corresponding files to slave. * Also, bring in support to heal gfids with a specific value. fuse-bridge sends across a gfid during a lookup, which storage translators assign to an inode (file/directory etc) if there is no gfid associated it. This patch brings in support to specify that gfid value from an application, instead of relying on random gfid generated by fuse-bridge. gfids can be healed through setxattr interface. setxattr should be done on parent directory. The key used is "glusterfs.gfid.heal" and the value should be the following structure whose contents should be in network byte order. typedef struct { char gfid[UUID_CANONICAL_FORM_LEN + 1]; /* a null terminated gfid * string in canonical form */ char bname[]; /* a null terminated basename */ } __attribute__((__packed__)) fuse_auxgfid_heal_args_t; This feature can be used for upgrading older geo-rep setups where gfids of files are different on master and slave to newer setups where they should be same. One can delete gfids on slave using setxattr -x and .glusterfs and issue stat on all the files with gfids from master. Thanks to "Amar Tumballi" <amarts@redhat.com> and "Csaba Henk" <csaba@redhat.com> for their inputs. Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Change-Id: Ie8ddc0fb3732732315c7ec49eab850c16d905e4e BUG: 952029 Reviewed-on: http://review.gluster.com/#/c/4702 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Amar Tumballi <amarts@redhat.com> Reviewed-on: http://review.gluster.org/4702 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* posix: add a simple health-checkerNiels de Vos2013-07-033-0/+140
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Goal of this health-checker is to detect fatal issues of the underlying storage that is used for exporting a brick. The current implementation requires the filesystem to detect the storage error, after which it will notify the parent xlators and exit the glusterfsd (brick) process to prevent further troubles. The interval the health-check runs can be configured per volume with the storage.health-check-interval option. The default interval is 30 seconds. It is not trivial to write an automated test-case with the current prove-framework. These are the manual steps that can be done to verify the functionality: - setup a Logical Volume (/dev/bz970960/xfs) and format is as XFS for brick usage - create a volume with the one brick # gluster volume create failing_xfs glufs1:/bricks/failing_xfs/data # gluster volume start failing_xfs - mount the volume and verify the functionality - make the storage fail (use device-mapper, or pull disks) # dmsetup table .. bz970960-xfs: 0 196608 linear 7:0 2048 # echo 0 196608 error > dmsetup-error-target # dmsetup load bz970960-xfs dmsetup-error-target # dmsetup resume bz970960-xfs # dmsetup table ... bz970960-xfs: 0 196608 error - notice the errors caught by syslog: Jun 24 11:31:49 vm130-32 kernel: XFS (dm-2): metadata I/O error: block 0x0 ("xfs_buf_iodone_callbacks") error 5 buf count 512 Jun 24 11:31:49 vm130-32 kernel: XFS (dm-2): I/O Error Detected. Shutting down filesystem Jun 24 11:31:49 vm130-32 kernel: XFS (dm-2): Please umount the filesystem and rectify the problem(s) Jun 24 11:31:49 vm130-32 kernel: VFS:Filesystem freeze failed Jun 24 11:31:50 vm130-32 GlusterFS[1969]: [2013-06-24 10:31:50.500674] M [posix-helpers.c:1114:posix_health_check_thread_proc] 0-failing_xfs-posix: health-check failed, going down Jun 24 11:32:09 vm130-32 kernel: XFS (dm-2): xfs_log_force: error 5 returned. Jun 24 11:32:20 vm130-32 GlusterFS[1969]: [2013-06-24 10:32:20.508690] M [posix-helpers.c:1119:posix_health_check_thread_proc] 0-failing_xfs-posix: still alive! -> SIGTERM - these errors are in the log of the brick as well: [2013-06-24 10:31:50.500607] W [posix-helpers.c:1102:posix_health_check_thread_proc] 0-failing_xfs-posix: stat() on /bricks/failing_xfs/data returned: Input/output error [2013-06-24 10:31:50.500674] M [posix-helpers.c:1114:posix_health_check_thread_proc] 0-failing_xfs-posix: health-check failed, going down [2013-06-24 10:32:20.508690] M [posix-helpers.c:1119:posix_health_check_thread_proc] 0-failing_xfs-posix: still alive! -> SIGTERM - the glusterfsd process has exited correctly: # gluster volume status Status of volume: failing_xfs Gluster process Port Online Pid ------------------------------------------------------------------------------ Brick glufs1:/bricks/failing_xfs/data N/A N N/A NFS Server on localhost 2049 Y 1897 Change-Id: Ic247fbefb97f7e861307a5998a9a7a3ecc80aa07 BUG: 971774 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/5176 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterfs: discard (hole punch) supportBrian Foster2013-06-131-27/+63
| | | | | | | | | | | | | | | | Add support for the DISCARD file operation. Discard punches a hole in a file in the provided range. Block de-allocation is implemented via fallocate() (as requested via fuse and passed on to the brick fs) but a separate fop is created within gluster to emphasize the fact that discard changes file data (the discarded region is replaced with zeroes) and must invalidate caches where appropriate. BUG: 963678 Change-Id: I34633a0bfff2187afeab4292a15f3cc9adf261af Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/5090 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* gluster: add fallocate fop supportBrian Foster2013-06-131-0/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement support for the fallocate file operation. fallocate allocates blocks for a particular inode such that future writes to the associated region of the file are guaranteed not to fail with ENOSPC. This patch adds fallocate support to the following areas: - libglusterfs - mount/fuse - io-stats - performance/md-cache,open-behind - quota - cluster/afr,dht,stripe - rpc/xdr - protocol/client,server - io-threads - marker - storage/posix - libgfapi BUG: 949242 Change-Id: Ice8e61351f9d6115c5df68768bc844abbf0ce8bd Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/4969 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>