summaryrefslogtreecommitdiffstats
path: root/tests
diff options
context:
space:
mode:
authorPranith Kumar K <pkarampu@redhat.com>2019-04-16 14:19:47 +0530
committerShyamsundar Ranganathan <srangana@redhat.com>2019-05-08 13:54:59 +0000
commitbf69fa4727f2b4432d7d54312bc0177cbcf44936 (patch)
tree0c65ceaac4cc55cd6a42696c599264029d99ff5b /tests
parentab6c9ff91a56e38ca80b0ff855f3aeafefd99fbb (diff)
cluster/ec: fix fd reopen
Currently EC tries to reopen fd's that have been opened while a brick was down. This is done as part of regular write operations, just after having acquired the locks, and it's sent as a sub-fop of the main write fop. There were two problems: 1. The reopen was attempted on all UP bricks, even if a previous lock didn't succeed. This is incorrect because most probably the open will fail. 2. If reopen is sent and fails, the error is propagated to the main operation, causing it to fail when it shouldn't. To fix this, we only attempt reopens on bricks where the current fop owns a lock, and we prevent any error to be propagated to the main fop. To implement this behaviour an argument used to indicate the minimum number of required answers has overloaded to also include some flags. To make the change consistent, it has been necessary to rename the argument, which means that a lot of files have been changed. However there are no functional changes. This change has also uncovered a problem in discard code, which didn't correctely process requests of small sizes because no real discard fop was being processed, only a write of 0's on some region. In this case some fields of the fop remained uninitialized or with incorrect values. To fix this, a new function has been created to simulate success on a fop and it's used in the discard case. Thanks to Pranith for providing a test script that has also detected an issue in this patch. This patch includes a small modification of this script to force data to be written into bricks before stopping them. Backport of: > Change-Id: If272343873369186c2fb8f43c1d9c52c3ea304ec > BUG: bz#1699866 > Signed-off-by: Xavi Hernandez <xhernandez@redhat.com> Change-Id: If272343873369186c2fb8f43c1d9c52c3ea304ec Fixes: bz#1699917 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
Diffstat (limited to 'tests')
-rw-r--r--tests/basic/ec/self-heal-read-write-fail.t69
-rw-r--r--tests/bugs/ec/bug-1699866-check-reopen-fd.t34
2 files changed, 103 insertions, 0 deletions
diff --git a/tests/basic/ec/self-heal-read-write-fail.t b/tests/basic/ec/self-heal-read-write-fail.t
new file mode 100644
index 00000000000..0ba591b5bb2
--- /dev/null
+++ b/tests/basic/ec/self-heal-read-write-fail.t
@@ -0,0 +1,69 @@
+#!/bin/bash
+
+. $(dirname $0)/../../include.rc
+. $(dirname $0)/../../volume.rc
+
+#This test verifies that self-heal fails when read/write fails as part of heal
+cleanup
+
+TEST glusterd
+TEST pidof glusterd
+TEST $CLI volume info
+
+TEST $CLI volume create $V0 disperse 3 redundancy 1 $H0:$B0/${V0}{0,1,2}
+TEST $CLI volume heal $V0 disable
+TEST $CLI volume start $V0
+
+TEST $GFS --volfile-id=/$V0 --volfile-server=$H0 $M0
+EXPECT_WITHIN $CHILD_UP_TIMEOUT "3" ec_child_up_count $V0 0
+TEST touch $M0/a
+TEST kill_brick $V0 $H0 $B0/${V0}0
+echo abc >> $M0/a
+
+# Umount the volume to force all pending writes to reach the bricks
+EXPECT_WITHIN $UMOUNT_TIMEOUT "Y" force_umount $M0
+
+#Load error-gen and fail read fop and test that heal fails
+TEST $CLI volume stop $V0 #Stop volume so that error-gen can be loaded
+TEST $CLI volume set $V0 debug.error-gen posix
+TEST $CLI volume set $V0 debug.error-fops read
+TEST $CLI volume set $V0 debug.error-number EBADF
+TEST $CLI volume set $V0 debug.error-failure 100
+
+TEST $CLI volume start $V0
+TEST $GFS --volfile-id=/$V0 --volfile-server=$H0 $M0
+EXPECT_WITHIN $CHILD_UP_TIMEOUT "3" ec_child_up_count $V0 0
+EXPECT_WITHIN $HEAL_TIMEOUT "^2$" get_pending_heal_count $V0
+TEST ! getfattr -n trusted.ec.heal $M0/a
+EXPECT_WITHIN $HEAL_TIMEOUT "^2$" get_pending_heal_count $V0
+
+#fail write fop and test that heal fails
+TEST $CLI volume stop $V0
+TEST $CLI volume set $V0 debug.error-fops write
+
+TEST $CLI volume start $V0
+EXPECT_WITHIN $CHILD_UP_TIMEOUT "3" ec_child_up_count $V0 0
+EXPECT_WITHIN $HEAL_TIMEOUT "^2$" get_pending_heal_count $V0
+TEST ! getfattr -n trusted.ec.heal $M0/a
+EXPECT_WITHIN $HEAL_TIMEOUT "^2$" get_pending_heal_count $V0
+
+TEST $CLI volume stop $V0 #Stop volume so that error-gen can be disabled
+TEST $CLI volume reset $V0 debug.error-gen
+TEST $CLI volume reset $V0 debug.error-fops
+TEST $CLI volume reset $V0 debug.error-number
+TEST $CLI volume reset $V0 debug.error-failure
+
+TEST $CLI volume start $V0
+EXPECT_WITHIN $CHILD_UP_TIMEOUT "3" ec_child_up_count $V0 0
+EXPECT_WITHIN $HEAL_TIMEOUT "^2$" get_pending_heal_count $V0
+TEST getfattr -n trusted.ec.heal $M0/a
+EXPECT "^0$" get_pending_heal_count $V0
+
+#Test that heal worked as expected by forcing read from brick0
+#remount to make sure data is not served from any cache
+EXPECT_WITHIN $UMOUNT_TIMEOUT "Y" force_umount $M0
+TEST $GFS --volfile-id=/$V0 --volfile-server=$H0 $M0
+TEST kill_brick $V0 $H0 $B0/${V0}2
+EXPECT "abc" cat $M0/a
+
+cleanup
diff --git a/tests/bugs/ec/bug-1699866-check-reopen-fd.t b/tests/bugs/ec/bug-1699866-check-reopen-fd.t
new file mode 100644
index 00000000000..4386d010318
--- /dev/null
+++ b/tests/bugs/ec/bug-1699866-check-reopen-fd.t
@@ -0,0 +1,34 @@
+#!/bin/bash
+
+. $(dirname $0)/../../include.rc
+. $(dirname $0)/../../volume.rc
+. $(dirname $0)/../../fileio.rc
+
+cleanup
+
+TEST glusterd
+TEST pidof glusterd
+TEST $CLI volume create $V0 disperse 6 redundancy 2 $H0:$B0/${V0}{0..5}
+TEST $CLI volume heal $V0 disable
+TEST $CLI volume set $V0 disperse.background-heals 0
+TEST $CLI volume set $V0 write-behind off
+TEST $CLI volume set $V0 open-behind off
+TEST $CLI volume start $V0
+TEST glusterfs --entry-timeout=0 --attribute-timeout=0 -s $H0 --volfile-id $V0 $M0
+EXPECT_WITHIN $CHILD_UP_TIMEOUT "6" ec_child_up_count $V0 0
+
+TEST mkdir -p $M0/dir
+
+fd="$(fd_available)"
+
+TEST kill_brick $V0 $H0 $B0/${V0}0
+EXPECT_WITHIN $CHILD_UP_TIMEOUT "5" ec_child_up_count $V0 0
+
+TEST fd_open ${fd} rw $M0/dir/test
+TEST fd_write ${fd} "test1"
+TEST $CLI volume replace-brick ${V0} $H0:$B0/${V0}0 $H0:$B0/${V0}0_1 commit force
+EXPECT_WITHIN $CHILD_UP_TIMEOUT "6" ec_child_up_count $V0 0
+TEST fd_write ${fd} "test2"
+TEST fd_close ${fd}
+
+cleanup