glusterfs.git -

	Commit message	Author	Age	Files	Lines
*	tests: Fix timing issue in ec.t	Pranith Kumar K	2016-07-22	1	-2/+2
	Problem: Because of a timing issue, the mount is sometimes unmounted even before the version is updated, so heals are not triggered.
	Fix: One way to fix this would be to increase 'sleep 2' to 'sleep 10', but that would slow things down. Instead, I changed the way ec learns that it needs xattr healing, so that it triggers heals even when the xattrs are not marked correctly.
	Change-Id: I1c82041166443ae7079dd99b89ea2ed170233ba3
	BUG: 1359001
	Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
	Reviewed-on: http://review.gluster.org/14980
	Smoke: Gluster Build System <jenkins@build.gluster.org>
	NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
	CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
	Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
*	nsr/jbr: Renaming nsr to jbr	Avra Sengupta	2016-04-25	1	-1/+1
	As per community consensus, we have decided to rename nsr to jbr (Journal-Based-Replication). This patch renames the "nsr" code to "jbr".
	Change-Id: Id2a9837f2ec4da89afc32438b91a1c302bb4104f
	BUG: 1328043
	Signed-off-by: Avra Sengupta <asengupt@redhat.com>
	Reviewed-on: http://review.gluster.org/13899
	Smoke: Gluster Build System <jenkins@build.gluster.com>
	CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
	NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
	Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
*	tests: use trap mechanism to ensure that proper cleanups happen	Jeff Darcy	2016-04-12	1	-5/+3
	This actually consists of several parts:
	* Added a generic cleanup-scheduling mechanism. Instead of calling "trap ... EXIT" directly, just call "push_trapfunc ..." and your cleanup function will be called along with any others.
	* Converted a few tests to use push_trapfunc.
	* Added "push_trapfunc cleanup_lvm" to snapshot.rc to address the particular problem that's driving this: snapshot tests not calling cleanup_lvm on their own and leaving bad state for the next test.
	Change-Id: I548a97a26328390992fc71ee1f03c0463703f9d7
	Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
	Reviewed-on: http://review.gluster.org/13933
	Smoke: Gluster Build System <jenkins@build.gluster.com>
	NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
	CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
	Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
	Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
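The cleanup-scheduling idea can be sketched in a few lines of bash. This is a minimal sketch, not the code the commit added: it assumes cleanups are stored as command strings in a hypothetical `TRAPFUNCS` array and run by a hypothetical `run_trapfuncs` from a single EXIT trap, most recently pushed first. Only `push_trapfunc` and `cleanup_lvm` come from the commit message.

```shell
#!/bin/bash
# Sketch of a shared cleanup stack. TRAPFUNCS and run_trapfuncs are
# hypothetical names; only push_trapfunc is named in the commit.
# Tests push cleanups instead of calling "trap ... EXIT" themselves,
# so no test overwrites another test's EXIT handler.

TRAPFUNCS=()

# Register one or more cleanups. Each argument is one command string
# (typically a function name such as cleanup_lvm).
push_trapfunc () {
    TRAPFUNCS+=("$@")
}

# Run every registered cleanup, most recently pushed first (assumed order).
# Entries are expanded unquoted so "func arg" strings are word-split.
run_trapfuncs () {
    local i
    for (( i = ${#TRAPFUNCS[@]} - 1; i >= 0; i-- )); do
        ${TRAPFUNCS[i]}
    done
}

# A single trap for the whole script; individual tests never call trap.
trap run_trapfuncs EXIT
```

A snapshot helper would then just call `push_trapfunc cleanup_lvm`; because every registration lands in the same array, it no longer matters whether the test itself also registered a cleanup. Running entries in reverse order is a design choice of this sketch, so that later setup is torn down first.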
*	Tests portability: umount(8)	Emmanuel Dreyfus	2015-06-09	1	-2/+2
	1) Avoid hangs when unmounting NFS on NetBSD
	NetBSD umount(8) on an NFS mount whose server is gone will wait forever, because umount(8) calls realpath(3) and tries to access the mount before it calls unmount(2). The non-portable, NetBSD-specific umount -R flag prevents that behavior. We therefore introduce UMOUNT_F, defined as "umount -f" on Linux and "umount -f -R" on NetBSD, to take care of forced unmounts, especially in the NFS case.
	2) Enforce usage of the force_umount wrapper with timeout
	Whenever umount is used, it should be wrapped in force_umount with timeout handling. That saves us timing issues, and it handles the NetBSD NFS case.
	3) Clean up kernel cache flush
	We used (cd $M0 && umount $M0) as a portable kernel cache flush trick, but it does not flush everything we need on Linux. Introduce a drop_cache() shell function that reverts to the previously used echo 3 > /proc/sys/vm/drop_caches on Linux, and keeps (cd $M0 && umount $M0) on other systems.
	BUG: 1129939
	Change-Id: Iab1f5a023405f1f7270c42b595573702ca1eb6f3
	Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
	Reviewed-on: http://review.gluster.org/11114
	Tested-by: NetBSD Build System <jenkins@build.gluster.org>
	Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
	Tested-by: Gluster Build System <jenkins@build.gluster.com>
	Reviewed-by: Vijay Bellur <vbellur@redhat.com>
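The three pieces of this commit can be sketched as bash helpers, assuming `uname -s` is used to tell the platforms apart. The helper `umount_cmd_for`, the argument conventions, and the 20-second default timeout in `force_umount` are hypothetical; the commit only names `UMOUNT_F`, `force_umount`, and `drop_cache`, and the project's own definitions may differ.

```shell
#!/bin/bash
# Forced-unmount command for an OS name as printed by "uname -s".
# umount_cmd_for is a hypothetical helper name.
umount_cmd_for () {
    case "$1" in
        NetBSD) echo "umount -f -R" ;;  # -R avoids the realpath(3) NFS hang
        *)      echo "umount -f" ;;     # Linux (assumed default elsewhere)
    esac
}

UMOUNT_F=$(umount_cmd_for "$(uname -s)")

# Retry the forced unmount of $1 until it succeeds or $2 seconds
# (assumed default: 20) have elapsed. Returns 1 on timeout.
force_umount () {
    local mnt=$1 timeout=${2:-20} start=$SECONDS
    # UMOUNT_F is expanded unquoted so "umount -f -R" splits into words.
    until $UMOUNT_F "$mnt"; do
        (( SECONDS - start >= timeout )) && return 1
        sleep 1
    done
}

# Flush the kernel cache for the mount point $1.
drop_cache () {
    case "$(uname -s)" in
        Linux)
            # Drop page cache, dentries and inodes (requires root).
            echo 3 > /proc/sys/vm/drop_caches
            ;;
        *)
            # Portable trick kept from the earlier tests: attempt to unmount
            # the mount point from inside it; per the commit, this acts as a
            # kernel cache flush on non-Linux systems.
            (cd "$1" && umount "$1")
            ;;
    esac
}
```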
*	Regression test portability: ec.t	Emmanuel Dreyfus	2014-12-17	1	-4/+4
	This test unmounts and remounts the filesystem to invalidate the cache, but this leads to timing problems on NetBSD. We can work around them without sleeping by remounting on another mount point.
	BUG: 1129939
	Change-Id: I10b3183e5e715053de162a6980af188710b607bb
	Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
	Reviewed-on: http://review.gluster.org/9285
	Tested-by: Gluster Build System <jenkins@build.gluster.com>
	Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
*	ec: Fix rebalance issues	Xavier Hernandez	2014-10-27	1	-0/+4
	Some issues in the ec xlator prevented rebalance from completing successfully and generated some warnings and errors in the log.
	The most critical error was a race condition that caused false corruption detection when two specific operations were executed sequentially and shared the same lock. This explains the problem:
	1. A setxattr is issued.
	2. setxattr: ec locks the inode before updating the xattr.
	3. setxattr: The xattr is updated.
	4. setxattr: The upper xlator is notified that the operation completed.
	5. setxattr: A background task is initiated to update the version of the file.
	6. A stat is issued on the same file.
	7. stat: Since the lock is already acquired, it's reused.
	8. stat: A lookup is issued to determine version and size information of the file.
	At this point, operations 5 and 8 can interfere. This can cause the lookup to see different information on each brick, determining that some bricks are corrupted, incorrectly excluding them from the operation, and initiating a self-heal. In some cases this false detection, combined with self-heal, could lead to invalid updates of the trusted.ec.size xattr, leaving the file smaller than it should be.
	This only happens if the first operation does not perform a lookup, because chained operations reuse the information returned by the previous one, avoiding this kind of problem.
	To solve this, the background update is now executed atomically with the subsequent unlock. This avoids some reuses of the lock while updating. However, this reduces performance because the window in which new requests can reuse the lock is now much smaller. This has been alleviated by using the same technique implemented in AFR (i.e. waiting some time before releasing the lock).
	Some minor changes are also introduced in this patch:
	* Bug in management of 'trusted.glusterfs.pathinfo' that was writing beyond the allocated space.
	* Uninitialized variable.
	* trusted.ec.config was not created for regular files created with mknod.
	* An invalid state was used in access fop.
	Change-Id: Idfaf69578ed04dbac97a62710326729715b9b395
	BUG: 1152902
	Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
	Reviewed-on: http://review.gluster.org/8947
	Tested-by: Gluster Build System <jenkins@build.gluster.com>
	Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	ec: Fix self-heal issues	Xavier Hernandez	2014-10-21	1	-19/+40
	Problem: Doing an 'ls' of a directory that has been modified while one of the bricks was down sometimes returns the old directory contents.
	Cause: Unlike files, directories are not marked when they are modified. The ec xlator balances requests amongst available and healthy bricks. Since there is no way to detect that a directory is out of date on one of the bricks, that brick is used from time to time to return the directory contents.
	Solution: Basically, the solution consists of using versioning information for directories too; however, some additional changes have been necessary.
	Changes:
	* Use directory versioning:
	  This required locking the full directory, instead of a single entry, for all requests that add or remove entries from it. This is needed to allow an atomic version update. This affects the following fops: create, mkdir, mknod, link, symlink, rename, unlink, rmdir.
	  Another side effect is that opendir requires a previous lookup to get versioning information and discard out-of-date bricks for subsequent readdir(p) calls.
	* Restrict directory self-heal:
	  Until now, when a discrepancy was found in lookup, a self-heal was automatically started. This caused the versioning information of a bad directory to be healed instantly, making the original problem reappear.
	  To solve this, when a missing directory is detected on one or more bricks in lookup or opendir fops, only a partial self-heal is performed on it. A partial self-heal basically creates the directory but does not restore any additional information. This prevents an 'ls' from repairing the directory and causing the problem to happen again. With this change, the output of 'ls' is always consistent. However, since the directory has been created on the brick, any other operation on it (creating new files, for example) can succeed on all bricks without adding work to the self-heal process.
	  To force a self-heal of a directory, any other operation must be done on it, for example a getxattr.
	  With these changes, the correct healing procedure that avoids inconsistent directory browsing consists of a post-order traversal of the directories being healed. This way, the directory contents are healed before the directory itself.
	* Additional changes to fix self-heal errors:
	  - Don't use fop->fd to decide between fd/loc. open, opendir and create have an fd, but the correct data is in loc.
	  - Fix incorrect management of bad bricks per inode/fd.
	  - Fix incorrect selection of a fop's target bricks when there are bad bricks involved.
	  - Improved ec_loc_parent() to always return a parent loc as complete as possible.
	Change-Id: Iaf3df174d7857da57d4a87b4a8740a7048b366ad
	BUG: 1149726
	Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
	Reviewed-on: http://review.gluster.org/8916
	Reviewed-by: Dan Lambright <dlambrig@redhat.com>
	Tested-by: Gluster Build System <jenkins@build.gluster.com>
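The post-order healing procedure can be sketched with `find -depth`, which reports a directory only after everything beneath it. This is a hypothetical helper meant to be run against a mount point, assuming (as the commit states) that any non-lookup operation on a directory triggers its self-heal. The name `heal_dirs_postorder` and the default trigger `getfattr -d -m .` (a getxattr-style dump) are assumptions for illustration, so the per-directory command is left pluggable.

```shell
#!/bin/bash
# Visit every directory under $1 in post-order (contents before the
# directory itself) and run a heal-triggering command on each one.
# Usage: heal_dirs_postorder ROOT [CMD ARGS...]
# CMD defaults to "getfattr -d -m ." (assumed trigger; any non-lookup
# operation on the directory should do).
heal_dirs_postorder () {
    local root=$1
    shift
    if [ $# -eq 0 ]; then
        set -- getfattr -d -m .
    fi
    # -depth makes find print a directory only after its contents;
    # -print0 / read -d '' keep names with spaces or newlines intact.
    find "$root" -depth -type d -print0 |
        while IFS= read -r -d '' dir; do
            "$@" "$dir" || echo "heal trigger failed: $dir" >&2
        done
}
```

For example, `heal_dirs_postorder "$M0/dir" > /dev/null` would walk that tree, while passing `echo` as the command prints the visiting order without touching any xattrs.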
*	test/ec: Fix spurious failures caused by self-heal	Xavier Hernandez	2014-10-03	1	-10/+14
	Running sha1sum on a file may update its access time. If this happens while a brick is down, as the test forces, that brick doesn't get the update and gets out of sync. When the brick is restarted, self-heal repairs the file, but the test shouldn't access brick contents until self-heal finishes. If this is combined with killing another brick before self-heal has finished repairing the file, the volume could become inaccessible.
	Since the purpose of these tests is only to check ec functionality (there is another test that checks self-heal), the test that corrupts the file has been removed. Additional checks to validate the state of the volume have been added to avoid some timing issues.
	BUG: 1144108
	Change-Id: Ibd9288de519914663998a1fbc4321ec92ed6082c
	Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
	Reviewed-on: http://review.gluster.org/8892
	Reviewed-by: Emmanuel Dreyfus <manu@netbsd.org>
	Tested-by: Emmanuel Dreyfus <manu@netbsd.org>
	Tested-by: Gluster Build System <jenkins@build.gluster.com>
	Reviewed-by: Dan Lambright <dlambrig@redhat.com>
	Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	Regression test portability: truncate	Emmanuel Dreyfus	2014-10-01	1	-2/+2
	Use truncate -s 1M instead of truncate --size=1m for portability's sake.
	BUG: 1129939
	Change-Id: I5bf6ca1f9bb4fa3c91796a659a06bf368776b3e5
	Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
	Reviewed-on: http://review.gluster.org/8894
	Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
	Tested-by: Gluster Build System <jenkins@build.gluster.com>
	Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
	Reviewed-by: Harshavardhana <harsha@harshavardhana.net>
	Tested-by: Harshavardhana <harsha@harshavardhana.net>
*	porting: Provide setfattr/getfattr implementation	Harshavardhana	2014-09-05	1	-2/+2
	- Use 'getfattr' properly, avoiding redundant options during xattr queries.
	- Untabify certain parts of the tests (remove tabs).
	- Avoid backtick evaluation for certain values to make the code more portable.
	- Use awk on FreeBSD/Darwin, since the 'wc' implementation there is broken and adds spurious spaces to its output.
	Change-Id: I7dcc0b70874e43b4cda8c306ed18a31b7a3f990a
	BUG: 1131713
	Signed-off-by: Harshavardhana <harsha@harshavardhana.net>
	Reviewed-on: http://review.gluster.org/8520
	Tested-by: Gluster Build System <jenkins@build.gluster.com>
	Reviewed-by: Emmanuel Dreyfus <manu@netbsd.org>
	Tested-by: Emmanuel Dreyfus <manu@netbsd.org>
*	porting: various fixes regression tests OSX/FreeBSD	Harshavardhana	2014-08-29	1	-4/+3
	- `wc -l` on OSX/FreeBSD adds spurious spaces, which clobbers TAP output parsers; fix it.
	- `umount -l` doesn't exist on OSX/FreeBSD; use 'umount -f' if available.
	- Add a check for the 'file' version, to handle MIME type variations across versions.
	- Converge 'glusterfs --attribute-timeout=0 --entry-timeout=0' into '$GFS'.
	- Change the remaining 'mount -t nfs' invocations to use 'mount_nfs'.
	- Update sha1sum for OSX to use 'openssl sha1'.
	Change-Id: Id1012faa5d67a921513d220e7fa9cebafe830d34
	BUG: 1131713
	Signed-off-by: Harshavardhana <harsha@harshavardhana.net>
	Reviewed-on: http://review.gluster.org/8501
	Tested-by: Gluster Build System <jenkins@build.gluster.com>
	Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
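The `wc -l` padding and the missing `sha1sum` on OS X can both be sidestepped with small wrappers. In this sketch (hypothetical helper names), awk counts lines so no padding can appear, and the SHA-1 helper prefers `sha1sum`, falls back to `openssl sha1`, and keeps only the digest field in either case.

```shell
#!/bin/bash
# Count lines on stdin and print a bare number (no padding spaces).
line_count () {
    awk 'END { print NR }'
}

# Print only the hex SHA-1 digest of file $1.
sha1_of () {
    if command -v sha1sum > /dev/null 2>&1; then
        # GNU format: "<digest>  <file>"
        sha1sum "$1" | awk '{ print $1 }'
    else
        # OS X fallback: "SHA1(<file>)= <digest>"
        openssl sha1 "$1" | awk '{ print $NF }'
    fi
}
```

One difference to keep in mind: awk also counts a final line that lacks a trailing newline, whereas `wc -l` counts newline characters, so the two can differ by one on such input.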
*	Regression test portability: mktemp	Emmanuel Dreyfus	2014-08-20	1	-1/+1
	Linux mktemp accepts being run without a template; NetBSD mandates one. Since the template option has the same syntax on both, add it everywhere. While there, also do this in scripts outside of regression testing.
	BUG: 764655
	Change-Id: I3ec140afbc9009257c81a56d77afcc21fef74cc4
	Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
	Reviewed-on: http://review.gluster.org/8432
	Tested-by: Gluster Build System <jenkins@build.gluster.com>
	Reviewed-by: Harshavardhana <harsha@harshavardhana.net>
	Tested-by: Harshavardhana <harsha@harshavardhana.net>
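The portable forms from the truncate and mktemp commits combine naturally when a test needs a scratch file of a given size. This is a minimal sketch with a hypothetical helper name and an arbitrary `/tmp/ec-test.XXXXXX` template: it always passes an explicit template, as NetBSD requires, and uses the `truncate -s` spelling with an uppercase `M` suffix.

```shell
#!/bin/bash
# Create a temporary file of size $1 (default 1M) and print its path.
# make_sized_tmpfile and the /tmp/ec-test prefix are illustrative only.
make_sized_tmpfile () {
    local f
    # Explicit template: accepted on Linux, mandatory on NetBSD.
    f=$(mktemp /tmp/ec-test.XXXXXX) || return 1
    # "-s 1M" rather than "--size=1m", per the portability commit.
    truncate -s "${1:-1M}" "$f" || return 1
    echo "$f"
}
```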
*	cli/glusterd: Added support for dispersed volumes	Xavier Hernandez	2014-07-11	1	-0/+233
	Two new options have been added to the 'create' command of the cli interface:
	    disperse [<count>]
	    redundancy <count>
	Both are optional. A dispersed volume is created by specifying at least one of them. If 'disperse' is missing, or it's present but '<count>' is not, the number of bricks enumerated on the command line is taken as the disperse count.
	If 'redundancy' is missing, the lowest optimal value is assumed. A configuration is considered optimal (for most workloads) when the disperse count minus the redundancy count is a power of 2. If the resulting redundancy is 1, the volume is created normally, but if it's greater than 1, a warning is shown to the user and he/she must answer yes/no to continue volume creation. If there isn't any optimal value for the given number of bricks, a warning is also shown and, if the user accepts, a redundancy of 1 is used.
	If 'redundancy' is specified and the resulting volume is not optimal, another warning is shown to the user.
	A distributed-disperse volume can be created using a number of bricks that is a multiple of the disperse count.
	Change-Id: Iab93efbe78e905cdb91f54f3741599f7ea6645e4
	BUG: 1118629
	Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
	Reviewed-on: http://review.gluster.org/7782
	Tested-by: Gluster Build System <jenkins@build.gluster.com>
	Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
	Reviewed-by: Vijay Bellur <vbellur@redhat.com>
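The redundancy rule above can be sketched as a small search: try redundancies from 1 upward and stop at the first one that leaves a power-of-2 data count. This is not the CLI's code. The function names are hypothetical, and the sketch additionally ASSUMES that redundancy must stay below half the disperse count (2 * r < n), a constraint the commit message does not state.

```shell
#!/bin/bash
# Succeed if $1 is a power of 2 (1, 2, 4, 8, ...).
is_power_of_2 () {
    (( $1 > 0 && ($1 & ($1 - 1)) == 0 ))
}

# Print the lowest optimal redundancy for disperse count $1, i.e. the
# smallest r with (n - r) a power of 2. ASSUMPTION: r is limited to
# 2 * r < n. Returns 1 with no output when no optimal value exists,
# the case where the CLI would ask before falling back to redundancy 1.
optimal_redundancy () {
    local n=$1 r
    for (( r = 1; 2 * r < n; r++ )); do
        if is_power_of_2 $(( n - r )); then
            echo "$r"
            return 0
        fi
    done
    return 1
}

# Distributed-disperse: the brick count must be a multiple of the
# disperse count. Print the number of disperse subvolumes, or fail.
disperse_subvolumes () {
    local bricks=$1 disperse=$2
    (( disperse > 0 && bricks % disperse == 0 )) || return 1
    echo $(( bricks / disperse ))
}
```

For example, with 6 bricks and no explicit redundancy the sketch picks 2 (6 - 2 = 4), a value above 1, which the commit says triggers a confirmation prompt; with 4 bricks it finds no optimal value, the case where the CLI offers redundancy 1 instead.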