<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/tests/bugs, branch v3.4.0beta4</title>
<subtitle></subtitle>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/'/>
<entry>
<title>cluster/afr: Perform delayed changelog wakeups for anon fd</title>
<updated>2013-06-10T20:04:21+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2013-05-22T09:48:19+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=2fda7a9de2edd624a03558d9a9a580e5639fef1f'/>
<id>2fda7a9de2edd624a03558d9a9a580e5639fef1f</id>
<content type='text'>
Problem:
Nfs xlator never does open on a file for performing writes,
afr does not perform changelog wakeup for this fd so operations
which do metadata operations as soon as the data operations are
completed perceive a delay od 'post-op-delay-secs'.

Fix:
Perform changelog wakeup on anon-fd if the fd with same pid is
not present in inode-list.

Note:
This approach is a short-term fix. A proper fix needs a new domain
for taking metadata locks so that data/metadata locks don't compete
with each other.

BUG: 966018
Change-Id: Ia9188a253e7943801b665e1b9205e2f551952d87
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
Reviewed-on: http://review.gluster.org/5067
Tested-by: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Anand Avati &lt;avati@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
Nfs xlator never does open on a file for performing writes,
afr does not perform changelog wakeup for this fd so operations
which do metadata operations as soon as the data operations are
completed perceive a delay od 'post-op-delay-secs'.

Fix:
Perform changelog wakeup on anon-fd if the fd with same pid is
not present in inode-list.

Note:
This approach is a short-term fix. A proper fix needs a new domain
for taking metadata locks so that data/metadata locks don't compete
with each other.

BUG: 966018
Change-Id: Ia9188a253e7943801b665e1b9205e2f551952d87
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
Reviewed-on: http://review.gluster.org/5067
Tested-by: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Anand Avati &lt;avati@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>glusterd: Backport of vme table changes from master</title>
<updated>2013-06-05T12:24:06+00:00</updated>
<author>
<name>Kaushal M</name>
<email>kaushal@redhat.com</email>
</author>
<published>2013-05-06T12:34:47+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=4965109a3c7d456b9f19eb67cf023ba86069e6e7'/>
<id>4965109a3c7d456b9f19eb67cf023ba86069e6e7</id>
<content type='text'>
This patch backports the following changes from the master branch

99fe09f glusterd: Moved the volume entry table to a separate file.
e306d08 glusterd: Changing the volume entry table's representation.
eac54f6 glusterd: Added option description, and validation function fields.
bcb4235 glusterd: Added validation function for performance cache max and min size.
8897d08 glusterd: Added validation function for quota-timeout.
4579609 glusterd: Added validation function for stripe-block-size.
6788bad glusterd: Fix some options in vme table
549231d glusterd: Added the validation function for subvols-per-directory
9636e63 glusterd: Added description for nfs.transport-type option in volume set help.

Change-Id: I4a64ad94f17df4b45a3a32262a83e2c35fb5f7da
BUG: 907311
Signed-off-by: Kaushal M &lt;kaushal@redhat.com&gt;
Reviewed-on: http://review.gluster.org/4956
Tested-by: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Vijay Bellur &lt;vbellur@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch backports the following changes from the master branch

99fe09f glusterd: Moved the volume entry table to a separate file.
e306d08 glusterd: Changing the volume entry table's representation.
eac54f6 glusterd: Added option description, and validation function fields.
bcb4235 glusterd: Added validation function for performance cache max and min size.
8897d08 glusterd: Added validation function for quota-timeout.
4579609 glusterd: Added validation function for stripe-block-size.
6788bad glusterd: Fix some options in vme table
549231d glusterd: Added the validation function for subvols-per-directory
9636e63 glusterd: Added description for nfs.transport-type option in volume set help.

Change-Id: I4a64ad94f17df4b45a3a32262a83e2c35fb5f7da
BUG: 907311
Signed-off-by: Kaushal M &lt;kaushal@redhat.com&gt;
Reviewed-on: http://review.gluster.org/4956
Tested-by: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Vijay Bellur &lt;vbellur@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tests: Modified test to use remove-brick instead of 'start' variant</title>
<updated>2013-05-21T11:29:57+00:00</updated>
<author>
<name>Kaushal M</name>
<email>kaushal@redhat.com</email>
</author>
<published>2013-05-21T07:31:08+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=8abe8f794913daa81d9f1dc0ddab26c9c414abf8'/>
<id>8abe8f794913daa81d9f1dc0ddab26c9c414abf8</id>
<content type='text'>
        Backport of change f75be77 from master

remove-brick start doesn't remove the brick from the volume immediately.
It would wait until migration of data to other bricks are complete. Even
when there is no data to be migrated, one can expect a finite delay from
the time of remove-brick start command's exit and removal of brick(s).
This may cause subsequent checks on brick count to fail in a
non-deterministic manner.

Also, renamed the test file name to reflect bug-id corresponding to
community release.

BUG: 878004
Change-Id: Ic6e1360ae5a5280d0d7efe8c3e9a0aa57dddb508
Signed-off-by: Kaushal M &lt;kaushal@redhat.com&gt;
Reviewed-on: http://review.gluster.org/5052
Tested-by: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Vijay Bellur &lt;vbellur@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
        Backport of change f75be77 from master

remove-brick start doesn't remove the brick from the volume immediately.
It would wait until migration of data to other bricks are complete. Even
when there is no data to be migrated, one can expect a finite delay from
the time of remove-brick start command's exit and removal of brick(s).
This may cause subsequent checks on brick count to fail in a
non-deterministic manner.

Also, renamed the test file name to reflect bug-id corresponding to
community release.

BUG: 878004
Change-Id: Ic6e1360ae5a5280d0d7efe8c3e9a0aa57dddb508
Signed-off-by: Kaushal M &lt;kaushal@redhat.com&gt;
Reviewed-on: http://review.gluster.org/5052
Tested-by: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Vijay Bellur &lt;vbellur@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tests: Increase wait time for nfs mount.</title>
<updated>2013-05-21T04:47:43+00:00</updated>
<author>
<name>Vijay Bellur</name>
<email>vbellur@redhat.com</email>
</author>
<published>2013-05-20T13:10:52+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=9a4585f4b3a7ea7bb4b3931462c06cbf13c4fcff'/>
<id>9a4585f4b3a7ea7bb4b3931462c06cbf13c4fcff</id>
<content type='text'>
Change-Id: I61815b502c90314ea6924e3046fb9b396ff56e8b
BUG: 927616
Signed-off-by: Vijay Bellur &lt;vbellur@redhat.com&gt;
Reviewed-on: http://review.gluster.org/5051
Tested-by: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Anand Avati &lt;avati@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Change-Id: I61815b502c90314ea6924e3046fb9b396ff56e8b
BUG: 927616
Signed-off-by: Vijay Bellur &lt;vbellur@redhat.com&gt;
Reviewed-on: http://review.gluster.org/5051
Tested-by: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Anand Avati &lt;avati@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>configure.ac: enable-debug change breaks _hardened_build</title>
<updated>2013-05-19T13:05:54+00:00</updated>
<author>
<name>Kaleb S. KEITHLEY</name>
<email>kkeithle@redhat.com</email>
</author>
<published>2013-05-13T18:07:01+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=4a631d7876e6de555683939dd87baedb11bcfd6a'/>
<id>4a631d7876e6de555683939dd87baedb11bcfd6a</id>
<content type='text'>
See RHBZ 955283, and http://fedoraproject.org/wiki/Packaging:Guidelines#PIE

The previous change for BZ 851092 in
commit 058a736f9e36238c284ca80e7ed5f62434655019
breaks the ability to enable _hardened_build in release-3.4 and master

wrt test/bugs/bug-884455.t; passes on master/HEAD, passes on my dev box,
passed once with prove -rfvc in run-tests.sh.

BUG: 851092
Change-Id: Ic2afc53bcdf11ede4a543b87aa7c7a3a41ed6f1d
Signed-off-by: Kaleb S. KEITHLEY &lt;kkeithle@redhat.com&gt;
Reviewed-on: http://review.gluster.org/4997
Reviewed-by: Niels de Vos &lt;ndevos@redhat.com&gt;
Tested-by: Gluster Build System &lt;jenkins@build.gluster.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
See RHBZ 955283, and http://fedoraproject.org/wiki/Packaging:Guidelines#PIE

The previous change for BZ 851092 in
commit 058a736f9e36238c284ca80e7ed5f62434655019
breaks the ability to enable _hardened_build in release-3.4 and master

wrt test/bugs/bug-884455.t; passes on master/HEAD, passes on my dev box,
passed once with prove -rfvc in run-tests.sh.

BUG: 851092
Change-Id: Ic2afc53bcdf11ede4a543b87aa7c7a3a41ed6f1d
Signed-off-by: Kaleb S. KEITHLEY &lt;kkeithle@redhat.com&gt;
Reviewed-on: http://review.gluster.org/4997
Reviewed-by: Niels de Vos &lt;ndevos@redhat.com&gt;
Tested-by: Gluster Build System &lt;jenkins@build.gluster.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tests: Ensure portmap registration before nfs mount.</title>
<updated>2013-05-16T00:31:55+00:00</updated>
<author>
<name>Vijay Bellur</name>
<email>vbellur@redhat.com</email>
</author>
<published>2013-05-15T10:00:33+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=3c34e97027ec614582559ac65c530eb290cfa7c9'/>
<id>3c34e97027ec614582559ac65c530eb290cfa7c9</id>
<content type='text'>
Change-Id: I12a660a7dfbe4a2d0428910d762434043395fe02
BUG: 927616
Signed-off-by: Vijay Bellur &lt;vbellur@redhat.com&gt;
Reviewed-on: http://review.gluster.org/5010
Tested-by: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Anand Avati &lt;avati@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Change-Id: I12a660a7dfbe4a2d0428910d762434043395fe02
BUG: 927616
Signed-off-by: Vijay Bellur &lt;vbellur@redhat.com&gt;
Reviewed-on: http://review.gluster.org/5010
Tested-by: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Anand Avati &lt;avati@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/afr: Don't queue transactions during open-fd fix</title>
<updated>2013-05-09T03:52:37+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2013-02-20T04:23:41+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=2c80052dbe5aca895a13597e36add51f796000e0'/>
<id>2c80052dbe5aca895a13597e36add51f796000e0</id>
<content type='text'>
Before Anonymous fds are available, afr had to queue up
transactions if the file is not opened on one of its
subvolumes. This happens until the attempt to open the
file either succeeds or fails. These attempts happen
until the file is successfully opened on the subvolume.
Now client xlator uses anonymous fds to perform the fops
if the fd used for the fop is not 'opened'.
Fops will be successful even when the file is not opened
so there is no need to queue up the transactions anymore in afr.
Open is attempted on the subvolume where it is not
opened independent of the fop.

Change-Id: I6d59293023e2de41c606395028c8980b83faca3f
BUG: 953887
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
Reviewed-on: http://review.gluster.org/4868
Tested-by: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Vijay Bellur &lt;vbellur@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Before Anonymous fds are available, afr had to queue up
transactions if the file is not opened on one of its
subvolumes. This happens until the attempt to open the
file either succeeds or fails. These attempts happen
until the file is successfully opened on the subvolume.
Now client xlator uses anonymous fds to perform the fops
if the fd used for the fop is not 'opened'.
Fops will be successful even when the file is not opened
so there is no need to queue up the transactions anymore in afr.
Open is attempted on the subvolume where it is not
opened independent of the fop.

Change-Id: I6d59293023e2de41c606395028c8980b83faca3f
BUG: 953887
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
Reviewed-on: http://review.gluster.org/4868
Tested-by: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Vijay Bellur &lt;vbellur@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>glusterd: big lock - a coarse-grained locking to prevent races</title>
<updated>2013-04-17T12:48:50+00:00</updated>
<author>
<name>Krishnan Parthasarathi</name>
<email>kparthas@redhat.com</email>
</author>
<published>2013-04-15T10:26:57+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=92729add67e2e7b8c7589c2dfab0bde071a7faf2'/>
<id>92729add67e2e7b8c7589c2dfab0bde071a7faf2</id>
<content type='text'>
There are primarily three lists that are part of glusterd process,
that are concurrently accessed. Namely, priv-&gt;volumes, priv-&gt;peers
and volinfo-&gt;bricks_list.

Big-lock approach
-----------------
WHAT IS IT?
Big lock is a coarse-grained lock which protects all three
lists, mentioned above, from racy access.

HOW DOES IT WORK?
At any given point in time, glusterd's thread(s) are in execution
_iff_ there is a preceding, inbound network event. Of course, the
sigwaiter thread and timer thread are exceptions.
A network event is an external trigger to glusterd, via the epoll
thread, in the form of POLLIN and POLLERR.
As long as we take the big-lock at all such entry points and yield
it when we are done, we are guaranteed that all the network events,
accessing the global lists, are serialised.

This amounts to holding the big lock at
- all the handlers of all the actors in glusterd. (POLLIN)
- all the cbks in glusterd. (POLLIN)
- rpc_notify (DISCONNECT event), if we access/modify
  one of the three lists. (POLLERR)

In the case of synctask'ized volume operations, we must remember that,
if we held the big lock for the entire duration of the handler,
we may block other non-synctask rpc actors from executing.
For eg, volume-start would block in PMAP SIGNIN, if done incorrectly.
To prevent this, we need to yield the big lock, when we yield the
synctask, and reacquire on waking up of the synctask.

BUG: 948686
Change-Id: I429832f1fed67bcac0813403d58346558a403ce9
Signed-off-by: Krishnan Parthasarathi &lt;kparthas@redhat.com&gt;
Reviewed-on: http://review.gluster.org/4835
Tested-by: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Vijay Bellur &lt;vbellur@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There are primarily three lists that are part of glusterd process,
that are concurrently accessed. Namely, priv-&gt;volumes, priv-&gt;peers
and volinfo-&gt;bricks_list.

Big-lock approach
-----------------
WHAT IS IT?
Big lock is a coarse-grained lock which protects all three
lists, mentioned above, from racy access.

HOW DOES IT WORK?
At any given point in time, glusterd's thread(s) are in execution
_iff_ there is a preceding, inbound network event. Of course, the
sigwaiter thread and timer thread are exceptions.
A network event is an external trigger to glusterd, via the epoll
thread, in the form of POLLIN and POLLERR.
As long as we take the big-lock at all such entry points and yield
it when we are done, we are guaranteed that all the network events,
accessing the global lists, are serialised.

This amounts to holding the big lock at
- all the handlers of all the actors in glusterd. (POLLIN)
- all the cbks in glusterd. (POLLIN)
- rpc_notify (DISCONNECT event), if we access/modify
  one of the three lists. (POLLERR)

In the case of synctask'ized volume operations, we must remember that,
if we held the big lock for the entire duration of the handler,
we may block other non-synctask rpc actors from executing.
For eg, volume-start would block in PMAP SIGNIN, if done incorrectly.
To prevent this, we need to yield the big lock, when we yield the
synctask, and reacquire on waking up of the synctask.

BUG: 948686
Change-Id: I429832f1fed67bcac0813403d58346558a403ce9
Signed-off-by: Krishnan Parthasarathi &lt;kparthas@redhat.com&gt;
Reviewed-on: http://review.gluster.org/4835
Tested-by: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Vijay Bellur &lt;vbellur@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tests: fix dependency on sleep in bug-874498.t</title>
<updated>2013-04-17T11:36:27+00:00</updated>
<author>
<name>Krishnan Parthasarathi</name>
<email>kparthas@redhat.com</email>
</author>
<published>2013-04-15T10:22:53+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=63098d9ff8dcfc08fd2ed83c62c4ffb63fc2126f'/>
<id>63098d9ff8dcfc08fd2ed83c62c4ffb63fc2126f</id>
<content type='text'>
With the introduction of http://review.gluster.org/4784, there are
delays which breaks bug-874498.t which wrongly depends on healing
to finish within 2 seconds.

Fix this by using 'EXPECT_WITHIN 60' instead of sleep 2.

BUG: 874498
Change-Id: I7131699908e63b024d2dd71395b3e94c15fe925c
Original-author: Anand Avati &lt;avati@redhat.com&gt;
Signed-off-by: Krishnan Parthasarathi &lt;kparthas@redhat.com&gt;
Reviewed-on: http://review.gluster.org/4832
Reviewed-by: Vijay Bellur &lt;vbellur@redhat.com&gt;
Tested-by: Gluster Build System &lt;jenkins@build.gluster.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With the introduction of http://review.gluster.org/4784, there are
delays which breaks bug-874498.t which wrongly depends on healing
to finish within 2 seconds.

Fix this by using 'EXPECT_WITHIN 60' instead of sleep 2.

BUG: 874498
Change-Id: I7131699908e63b024d2dd71395b3e94c15fe925c
Original-author: Anand Avati &lt;avati@redhat.com&gt;
Signed-off-by: Krishnan Parthasarathi &lt;kparthas@redhat.com&gt;
Reviewed-on: http://review.gluster.org/4832
Reviewed-by: Vijay Bellur &lt;vbellur@redhat.com&gt;
Tested-by: Gluster Build System &lt;jenkins@build.gluster.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tests: fix further issues with bug-874498.t</title>
<updated>2013-04-17T08:53:43+00:00</updated>
<author>
<name>Krishnan Parthasarathi</name>
<email>kparthas@redhat.com</email>
</author>
<published>2013-04-15T10:21:14+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=28da431e5bedba64380cc3886cbab03c0d7a3cfd'/>
<id>28da431e5bedba64380cc3886cbab03c0d7a3cfd</id>
<content type='text'>
The failure of bug-874498.t seems to be a "bug" in glustershd.
The situation seems to be when both subvolumes of a replica are
"local" to glustershd, and in such cases glustershd is sensitive
to the order in which the subvols come up.

The core of the issue itself is that, without the patch (#4784),
self-heal daemon completes the processing of index and no entries
are left inside the xattrop index after a few seconds of volume
start force. However with the patch, the stale "backing file"
(against which index performs link()) is left. The likely reason
is that an "INDEX" based crawl is not happening against the subvol
when this patch is applied.

Before #4784 patch, the order in which subvols came up was :

  [2013-04-09 22:55:35.117679] I [client-handshake.c:1456:client_setvolume_cbk] 0-patchy-client-0: Connected to 10.3.129.13:49156, attached to remote volume '/d/backends/brick1'.
  ...
  [2013-04-09 22:55:35.118399] I [client-handshake.c:1456:client_setvolume_cbk] 0-patchy-client-1: Connected to 10.3.129.13:49157, attached to remote volume '/d/backends/brick2'.

However, with the patch, the order is reversed:

  [2013-04-09 22:53:34.945370] I [client-handshake.c:1456:client_setvolume_cbk] 0-patchy-client-1: Connected to 10.3.129.13:49153, attached to remote volume '/d/backends/brick2'.
  ...
  [2013-04-09 22:53:34.950966] I [client-handshake.c:1456:client_setvolume_cbk] 0-patchy-client-0: Connected to 10.3.129.13:49152, attached to remote volume '/d/backends/brick1'.

The index in brick2 has the list of files/gfid to heal. It appears
to be the case that when brick1 is the first subvol to be detected
as coming up, somehow an INDEX based crawl is clearing all the
index entries in brick2, but if brick2 comes up as the first subvol,
then the backing file is left stale.

Also, doing a "gluster volume heal full" seems to leave out stale
backing files too. As the crawl is performed on the namespace and
the backing file is never encountered there to get cleared out.

So the interim (possibly permanent) fix is to have the script issue
a regular self-heal command (and not a "full" one).

The failure of the script itself is non-critical. The data files are
all healed, and it is just the backing file which is left behind. The
stale backing file too gets cleared in the next index based healing,
either triggered manually or after 10mins.

BUG: 874498
Change-Id: I601e9adec46bb7f8ba0b1ba09d53b83bf317ab6a
Original-author: Anand Avati &lt;avati@redhat.com&gt;
Signed-off-by: Krishnan Parthasarathi &lt;kparthas@redhat.com&gt;
Reviewed-on: http://review.gluster.org/4831
Tested-by: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Vijay Bellur &lt;vbellur@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The failure of bug-874498.t seems to be a "bug" in glustershd.
The situation seems to be when both subvolumes of a replica are
"local" to glustershd, and in such cases glustershd is sensitive
to the order in which the subvols come up.

The core of the issue itself is that, without the patch (#4784),
self-heal daemon completes the processing of index and no entries
are left inside the xattrop index after a few seconds of volume
start force. However with the patch, the stale "backing file"
(against which index performs link()) is left. The likely reason
is that an "INDEX" based crawl is not happening against the subvol
when this patch is applied.

Before #4784 patch, the order in which subvols came up was :

  [2013-04-09 22:55:35.117679] I [client-handshake.c:1456:client_setvolume_cbk] 0-patchy-client-0: Connected to 10.3.129.13:49156, attached to remote volume '/d/backends/brick1'.
  ...
  [2013-04-09 22:55:35.118399] I [client-handshake.c:1456:client_setvolume_cbk] 0-patchy-client-1: Connected to 10.3.129.13:49157, attached to remote volume '/d/backends/brick2'.

However, with the patch, the order is reversed:

  [2013-04-09 22:53:34.945370] I [client-handshake.c:1456:client_setvolume_cbk] 0-patchy-client-1: Connected to 10.3.129.13:49153, attached to remote volume '/d/backends/brick2'.
  ...
  [2013-04-09 22:53:34.950966] I [client-handshake.c:1456:client_setvolume_cbk] 0-patchy-client-0: Connected to 10.3.129.13:49152, attached to remote volume '/d/backends/brick1'.

The index in brick2 has the list of files/gfid to heal. It appears
to be the case that when brick1 is the first subvol to be detected
as coming up, somehow an INDEX based crawl is clearing all the
index entries in brick2, but if brick2 comes up as the first subvol,
then the backing file is left stale.

Also, doing a "gluster volume heal full" seems to leave out stale
backing files too. As the crawl is performed on the namespace and
the backing file is never encountered there to get cleared out.

So the interim (possibly permanent) fix is to have the script issue
a regular self-heal command (and not a "full" one).

The failure of the script itself is non-critical. The data files are
all healed, and it is just the backing file which is left behind. The
stale backing file too gets cleared in the next index based healing,
either triggered manually or after 10mins.

BUG: 874498
Change-Id: I601e9adec46bb7f8ba0b1ba09d53b83bf317ab6a
Original-author: Anand Avati &lt;avati@redhat.com&gt;
Signed-off-by: Krishnan Parthasarathi &lt;kparthas@redhat.com&gt;
Reviewed-on: http://review.gluster.org/4831
Tested-by: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Vijay Bellur &lt;vbellur@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
