<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/tests/bugs/replicate, branch v3.7.11</title>
<subtitle></subtitle>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/'/>
<entry>
<title>cluster/afr: Don't delete gfid-req from lookup request</title>
<updated>2016-04-12T11:57:36+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2016-02-27T17:38:06+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=e1396f078aec7dee9007f35c074a90391bdaf64f'/>
<id>e1396f078aec7dee9007f35c074a90391bdaf64f</id>
<content type='text'>
Problem:
Afr does dict_ref of the xattr_req that comes to it and deletes the "gfid-req"
key. Dht uses the same dict to send lookups to the other subvolumes. So for
directories with more than one dht subvolume, the second through the last
subvolumes never get a lookup request with "gfid-req". As a result, gfid reset
never happens on directories in the second through last subvolumes of a
distributed replicate volume.

Fix:
Make a copy of lookup xattr request.

Also fixed replies_wipe possibly resetting gfid to NULL gfid
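
Illustration (not the actual patch): a minimal Python sketch of the aliasing
problem and the copy-based fix described above. The function names and the
plain dict standing in for xattr_req are hypothetical; the real code is C and
works on the glusterfs dict type.

```python
def afr_lookup_shared(xattr_req):
    """Buggy pattern: mutates the caller's dict while holding only a reference."""
    had_gfid_req = xattr_req.pop("gfid-req", None) is not None
    return had_gfid_req


def afr_lookup_copied(xattr_req):
    """Fixed pattern: works on a private copy, so the caller's dict is untouched."""
    local_req = dict(xattr_req)
    had_gfid_req = local_req.pop("gfid-req", None) is not None
    return had_gfid_req


def dht_lookup(afr_lookup, subvolume_count):
    """Send the same request dict to every subvolume, as a dht-like caller would."""
    xattr_req = {"gfid-req": "hypothetical-gfid"}  # made-up value
    return [afr_lookup(xattr_req) for _ in range(subvolume_count)]


# One entry per subvolume: did that subvolume see "gfid-req" in its request?
shared_result = dht_lookup(afr_lookup_shared, 3)
copied_result = dht_lookup(afr_lookup_copied, 3)
print(shared_result)
print(copied_result)
```

With the shared dict only the first subvolume sees the key; with a copy per
lookup every subvolume does.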

 &gt;BUG: 1312816
 &gt;Change-Id: Ic16260e5a4664837d069c1dc05b9e96ca05bda88
 &gt;Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
 &gt;Reviewed-on: http://review.gluster.org/13545
 &gt;Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
 &gt;NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
 &gt;CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
 &gt;Reviewed-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
 &gt;(cherry picked from commit 9b022c3a3f2f774904b5b458ae065425b46cc15d)

Change-Id: Ia68193b559ec1dfd841cc5a22ef1fa801b866200
BUG: 1313693
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
Reviewed-on: http://review.gluster.org/13574
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
Afr does dict_ref of the xattr_req that comes to it and deletes the "gfid-req"
key. Dht uses the same dict to send lookups to the other subvolumes. So for
directories with more than one dht subvolume, the second through the last
subvolumes never get a lookup request with "gfid-req". As a result, gfid reset
never happens on directories in the second through last subvolumes of a
distributed replicate volume.

Fix:
Make a copy of lookup xattr request.

Also fixed replies_wipe possibly resetting gfid to NULL gfid

 &gt;BUG: 1312816
 &gt;Change-Id: Ic16260e5a4664837d069c1dc05b9e96ca05bda88
 &gt;Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
 &gt;Reviewed-on: http://review.gluster.org/13545
 &gt;Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
 &gt;NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
 &gt;CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
 &gt;Reviewed-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
 &gt;(cherry picked from commit 9b022c3a3f2f774904b5b458ae065425b46cc15d)

Change-Id: Ia68193b559ec1dfd841cc5a22ef1fa801b866200
BUG: 1313693
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
Reviewed-on: http://review.gluster.org/13574
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Revert "cluster/ec: Rebalance hangs during rename"</title>
<updated>2016-04-01T07:15:51+00:00</updated>
<author>
<name>Kaushal M</name>
<email>kaushal@redhat.com</email>
</author>
<published>2016-04-01T07:11:54+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=34a7b3435af90e1e175fde31f61755d6fabda7ef'/>
<id>34a7b3435af90e1e175fde31f61755d6fabda7ef</id>
<content type='text'>
This reverts commit 3d34c495d547866a533bc0614b14163381830095, which
broke building rpms and possibly other packages as well.

Change-Id: I2c10a613599e63bc0cbdb1b405cd87be9efa4a99
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This reverts commit 3d34c495d547866a533bc0614b14163381830095, which
broke building rpms and possibly other packages as well.

Change-Id: I2c10a613599e63bc0cbdb1b405cd87be9efa4a99
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/ec: Rebalance hangs during rename</title>
<updated>2016-03-31T12:45:50+00:00</updated>
<author>
<name>Ashish Pandey</name>
<email>aspandey@redhat.com</email>
</author>
<published>2016-02-17T10:27:02+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=3d34c495d547866a533bc0614b14163381830095'/>
<id>3d34c495d547866a533bc0614b14163381830095</id>
<content type='text'>
Problem:
While a file is being renamed (ec is holding
a blocking inodelk on the parent directory),
if a rename of another file under the same
directory arrives, EC does not release the
lock and goes ahead and renames the "new"
file with the "already held lock".

That causes rebalance process to be blocked
on a lock which has been acquired by rename.

Solution:
When a rename fop comes, ec takes a blocking inodelk
on the old and new parents of the file. Before releasing
any lock it holds, ec waits for some "time" to
see if that lock can be reused by the next fop.
If some other request comes within this "time",
it releases the lock based on the condition
"lock count &gt; 1".

To get this "lock count" for rename fop, we have
implemented "pl_rename" in feature/lock. Also,
on ec side, changed the condition to release the lock
based on the type of fop and old and new parent
directories.

master-
http://review.gluster.org/#/c/13460/

Change-Id: I979dbab1185df962e8f305a6074ae1186ffe7db0
Bug: 1322299
Signed-off-by: Ashish Pandey &lt;aspandey@redhat.com&gt;
Reviewed-on: http://review.gluster.org/13849
Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
While a file is being renamed (ec is holding
a blocking inodelk on the parent directory),
if a rename of another file under the same
directory arrives, EC does not release the
lock and goes ahead and renames the "new"
file with the "already held lock".

That causes rebalance process to be blocked
on a lock which has been acquired by rename.

Solution:
When a rename fop comes, ec takes a blocking inodelk
on the old and new parents of the file. Before releasing
any lock it holds, ec waits for some "time" to
see if that lock can be reused by the next fop.
If some other request comes within this "time",
it releases the lock based on the condition
"lock count &gt; 1".

To get this "lock count" for rename fop, we have
implemented "pl_rename" in feature/lock. Also,
on ec side, changed the condition to release the lock
based on the type of fop and old and new parent
directories.

master-
http://review.gluster.org/#/c/13460/

Change-Id: I979dbab1185df962e8f305a6074ae1186ffe7db0
Bug: 1322299
Signed-off-by: Ashish Pandey &lt;aspandey@redhat.com&gt;
Reviewed-on: http://review.gluster.org/13849
Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>afr: Add throttled background client-side heals</title>
<updated>2016-03-23T02:21:35+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2016-03-22T08:56:32+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=4a5d8f65b9b04385dcae8b16a650f4e8ed357f8b'/>
<id>4a5d8f65b9b04385dcae8b16a650f4e8ed357f8b</id>
<content type='text'>
Backport of: http://review.gluster.org/13207

If a heal is needed after inode refresh (lookup, read_txn), launch it in
the background instead of blocking the fop (that triggered refresh)
until the heal happens.

afr_replies_interpret() is modified such that the heal is
launched only if at least one sink brick is up.

The maximum number of heals that can run in parallel is configurable via the
'background-self-heal-count' volume option. Heals beyond that number are
put in a wait queue whose length is configurable via the
'heal-wait-queue-length' volume option. If the wait queue is also full,
further heals will be ignored.

Default values:  background-self-heal-count=8, heal-wait-queue-length=128
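
Illustration (not the actual patch): a toy Python model of the throttling
policy described above, with a cap on heals running in the background, a
bounded wait queue, and heals dropped once both are full. The class and method
names are hypothetical; the defaults mirror the values listed above.

```python
from collections import deque


class HealThrottle:
    """Toy model: run up to max_background heals, queue up to max_waiting more."""

    def __init__(self, max_background=8, max_waiting=128):
        self.max_background = max_background
        self.max_waiting = max_waiting
        self.running = set()
        self.waiting = deque()

    def submit(self, heal_id):
        """Return 'started', 'queued', or 'ignored' for a newly requested heal."""
        # Both containers grow one item at a time and never exceed their
        # limits, so an inequality test is enough to detect free capacity.
        if len(self.running) != self.max_background:
            self.running.add(heal_id)
            return "started"
        if len(self.waiting) != self.max_waiting:
            self.waiting.append(heal_id)
            return "queued"
        return "ignored"

    def finish(self, heal_id):
        """Mark a running heal as done and promote the oldest waiting heal."""
        if heal_id not in self.running:
            return
        self.running.discard(heal_id)
        if self.waiting:
            self.running.add(self.waiting.popleft())


throttle = HealThrottle(max_background=2, max_waiting=1)
outcomes = [throttle.submit(name) for name in ("a", "b", "c", "d")]
throttle.finish("a")
running_after = sorted(throttle.running)
print(outcomes, running_after)
```

Heal "d" is ignored because both the background slots and the wait queue are
full when it arrives; finishing "a" frees a slot, which "c" then takes.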

Change-Id: I9a134b2c29d66b70b7b1278811bd504963aabacc
BUG: 1313312
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
Reviewed-on: http://review.gluster.org/13564
Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Backport of: http://review.gluster.org/13207

If a heal is needed after inode refresh (lookup, read_txn), launch it in
the background instead of blocking the fop (that triggered refresh)
until the heal happens.

afr_replies_interpret() is modified such that the heal is
launched only if at least one sink brick is up.

The maximum number of heals that can run in parallel is configurable via the
'background-self-heal-count' volume option. Heals beyond that number are
put in a wait queue whose length is configurable via the
'heal-wait-queue-length' volume option. If the wait queue is also full,
further heals will be ignored.

Default values:  background-self-heal-count=8, heal-wait-queue-length=128

Change-Id: I9a134b2c29d66b70b7b1278811bd504963aabacc
BUG: 1313312
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
Reviewed-on: http://review.gluster.org/13564
Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/ec: Provide an option to enable/disable eager lock</title>
<updated>2016-03-21T05:52:53+00:00</updated>
<author>
<name>Ashish Pandey</name>
<email>aspandey@redhat.com</email>
</author>
<published>2016-03-04T07:35:09+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=46920e3bd38d9ae7c1910d0bd83eff309ab20c66'/>
<id>46920e3bd38d9ae7c1910d0bd83eff309ab20c66</id>
<content type='text'>
Problem: If a fop takes a lock and completes its operation,
it waits for 1 second before releasing the lock. However,
if ec finds any lock contention within this time period,
it releases the lock immediately, before the time expires. As we
take the lock on the first brick, for a few operations, like read,
discovery of lock contention might take a long time and
can degrade performance.

Solution: Provide an option to enable/disable eager lock.
If eager lock is disabled, lock will be released as soon
as fop completes.

gluster v set &lt;VOLUME NAME&gt; disperse.eager-lock on
gluster v set &lt;VOLUME NAME&gt; disperse.eager-lock off

master-
http://review.gluster.org/13605

Change-Id: I000985a787eba3c190fdcd5981dfbf04e64af166
BUG: 1318965
Signed-off-by: Ashish Pandey &lt;aspandey@redhat.com&gt;
Reviewed-on: http://review.gluster.org/13773
Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem: If a fop takes a lock and completes its operation,
it waits for 1 second before releasing the lock. However,
if ec finds any lock contention within this time period,
it releases the lock immediately, before the time expires. As we
take the lock on the first brick, for a few operations, like read,
discovery of lock contention might take a long time and
can degrade performance.

Solution: Provide an option to enable/disable eager lock.
If eager lock is disabled, lock will be released as soon
as fop completes.

gluster v set &lt;VOLUME NAME&gt; disperse.eager-lock on
gluster v set &lt;VOLUME NAME&gt; disperse.eager-lock off

master-
http://review.gluster.org/13605

Change-Id: I000985a787eba3c190fdcd5981dfbf04e64af166
BUG: 1318965
Signed-off-by: Ashish Pandey &lt;aspandey@redhat.com&gt;
Reviewed-on: http://review.gluster.org/13773
Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>heal: Remove sleep()</title>
<updated>2016-02-12T04:00:04+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2016-02-10T09:58:39+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=059710ad62d828d40760c9c63a109286e5e5dca3'/>
<id>059710ad62d828d40760c9c63a109286e5e5dca3</id>
<content type='text'>
I wrote this program from a sample gfapi program which had sleep.
I am not sure why this sleep was needed. So removing it now.

Changed tests/bugs/replicate/bug-1190069-afr-stale-index-entries.t
to execute count_sh_entries every second, instead of comparing
same value over and over.

 &gt;Change-Id: I7b89d6cab3e50bb7bf4d40a6064f2d8734155bea
 &gt;BUG: 1306199
 &gt;Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
 &gt;Reviewed-on: http://review.gluster.org/13421
 &gt;Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
 &gt;NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
 &gt;CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
 &gt;Reviewed-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
 &gt;(cherry picked from commit 320779d53ae013147d5e2556d2946c73e45734ab)

Change-Id: Ia98bb4b35b0e778d777705a03b2415f2093863f7
BUG: 1306738
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
Reviewed-on: http://review.gluster.org/13431
Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
I wrote this program from a sample gfapi program which had sleep.
I am not sure why this sleep was needed. So removing it now.

Changed tests/bugs/replicate/bug-1190069-afr-stale-index-entries.t
to execute count_sh_entries every second, instead of comparing
same value over and over.

 &gt;Change-Id: I7b89d6cab3e50bb7bf4d40a6064f2d8734155bea
 &gt;BUG: 1306199
 &gt;Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
 &gt;Reviewed-on: http://review.gluster.org/13421
 &gt;Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
 &gt;NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
 &gt;CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
 &gt;Reviewed-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
 &gt;(cherry picked from commit 320779d53ae013147d5e2556d2946c73e45734ab)

Change-Id: Ia98bb4b35b0e778d777705a03b2415f2093863f7
BUG: 1306738
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
Reviewed-on: http://review.gluster.org/13431
Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/afr: Fix heal-info slow response while IO is in progress</title>
<updated>2016-02-05T04:31:54+00:00</updated>
<author>
<name>Krutika Dhananjay</name>
<email>kdhananj@redhat.com</email>
</author>
<published>2016-02-01T06:16:08+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=476abe074b63e4b348b48af9b04a3d27244d7d17'/>
<id>476abe074b63e4b348b48af9b04a3d27244d7d17</id>
<content type='text'>
        Backport of: http://review.gluster.org/#/c/13326/

Now heal-info does an open() on the file being examined so that
the client at some point sees open-fd count being &gt; 1 and releases
the eager-lock so that heal-info doesn't remain blocked forever
until IO completes.

Change-Id: I7d4a8aa4de459216408b666894ee7bb42e406547
BUG: 1303899
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
Reviewed-on: http://review.gluster.org/13348
Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
Tested-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
        Backport of: http://review.gluster.org/#/c/13326/

Now heal-info does an open() on the file being examined so that
the client at some point sees open-fd count being &gt; 1 and releases
the eager-lock so that heal-info doesn't remain blocked forever
until IO completes.

Change-Id: I7d4a8aa4de459216408b666894ee7bb42e406547
BUG: 1303899
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
Reviewed-on: http://review.gluster.org/13348
Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
Tested-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tests: Fix spurious failure in bug-1221481-allow-fops-on-dir-split-brain.t</title>
<updated>2016-02-02T10:47:28+00:00</updated>
<author>
<name>Krutika Dhananjay</name>
<email>kdhananj@redhat.com</email>
</author>
<published>2016-01-07T07:31:02+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=10153901f0649b2d12c505a0f9fbef7a69acf128'/>
<id>10153901f0649b2d12c505a0f9fbef7a69acf128</id>
<content type='text'>
        Backport of: http://review.gluster.org/#/c/13172/

Occasionally, when ls is executed, prior to READDIRP, a STAT is wound
on the operand directory. And AFR fails STAT with EIO if it is in
metadata split-brain, which "dir" is in the test case in question.
As a result, ls also fails with EIO, causing test 20 to return negative
exit status.
The fix is in the test script where the parts that cause the dir to go
into metadata split-brain have been removed. Now "dir" will only have
entry split-brain.

Change-Id: Icf3998ad6f8735c283171e22445406a2eaaaa23f
BUG: 1296400
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
Reviewed-on: http://review.gluster.org/13190
Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
Tested-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
        Backport of: http://review.gluster.org/#/c/13172/

Occasionally, when ls is executed, prior to READDIRP, a STAT is wound
on the operand directory. And AFR fails STAT with EIO if it is in
metadata split-brain, which "dir" is in the test case in question.
As a result, ls also fails with EIO, causing test 20 to return negative
exit status.
The fix is in the test script where the parts that cause the dir to go
into metadata split-brain have been removed. Now "dir" will only have
entry split-brain.

Change-Id: Icf3998ad6f8735c283171e22445406a2eaaaa23f
BUG: 1296400
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
Reviewed-on: http://review.gluster.org/13190
Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
Tested-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/afr: Fix data loss due to race between sh and ongoing write.</title>
<updated>2016-01-29T01:43:33+00:00</updated>
<author>
<name>Krutika Dhananjay</name>
<email>kdhananj@redhat.com</email>
</author>
<published>2015-12-17T12:11:08+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=b43aa481712dab5df813050119ba6c08f50dbfd9'/>
<id>b43aa481712dab5df813050119ba6c08f50dbfd9</id>
<content type='text'>
        Backport of: http://review.gluster.org/#/c/13001/

Problem:

When IO is happening on a file and a brick goes down and comes back up
during this time, protocol/client translator attempts reopening of the
fd on the gfid handle of the file. But if another client renames this
file while a brick was down &amp;&amp; writes were in progress on it, once this
brick is back up, there can be a race between reopening of the fd and
entry self-heal replaying the effect of the rename() on the sink brick.
If the reopening of the fd happens first, the application's writes
continue to go into the data blocks associated with the gfid.
Now entry-self-heal deletes 'src' and creates 'dst' file on the sink,
marking dst as a 'newentry'.  Data self-heal is also completed on 'dst'
as a result and self-heal terminates. If at this point the application
is still writing to this fd, all writes on the file after self-heal
would go into the data blocks associated with this fd, which would be
lost once the fd is closed. The result - the 'dst' file on the source
and sink are not the same and there is no pending heal on the file,
leading to silent corruption on the sink.

Fix:

Leverage http://review.gluster.org/#/c/12816/ to ensure the gfid handle
path gets saved in .glusterfs/unlink until the fd is closed on the file.
During this time, when self-heal sends mknod() with gfid of the file,
do the following:
link() the gfid handle under .glusterfs/unlink to the new path to be
created in mknod() and
rename() the gfid handle to go back under .glusterfs/ab/cd/.
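
Illustration (not the actual patch): a small Python sketch of why the link()
plus rename() sequence above keeps an open fd and the newly created path on
the same data. The temporary directory standing in for a brick, the file name
'dst' and the gfid string are made up; only the directory layout follows the
description above.

```python
import os
import shutil
import tempfile

# A throwaway directory standing in for a brick (hypothetical layout).
brick = tempfile.mkdtemp()
unlink_dir = os.path.join(brick, ".glusterfs", "unlink")
handle_dir = os.path.join(brick, ".glusterfs", "ab", "cd")
os.makedirs(unlink_dir)
os.makedirs(handle_dir)

# The gfid handle is parked under .glusterfs/unlink while an fd is still open.
parked = os.path.join(unlink_dir, "abcd-hypothetical-gfid")
fd = os.open(parked, os.O_CREAT | os.O_WRONLY)
os.write(fd, b"before heal ")

# Self-heal's mknod for the new name: hard-link the parked handle to the new
# path, then move the handle back to its usual location.
dst = os.path.join(brick, "dst")
os.link(parked, dst)
restored = os.path.join(handle_dir, "abcd-hypothetical-gfid")
os.rename(parked, restored)

# Writes through the old fd still land in the inode that 'dst' now names.
os.write(fd, b"after heal")
os.close(fd)

with open(dst, "rb") as f:
    content = f.read()
same_inode = os.stat(dst).st_ino == os.stat(restored).st_ino
print(content, same_inode)

shutil.rmtree(brick)
```

Because 'dst' is a hard link to the inode the fd already refers to, writes
made after the heal are visible through the new path instead of being lost
when the fd is closed.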


Change-Id: I5dc49c127ef0a1bf3cf4ce1b24610b1527f84d6f
BUG: 1293265
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
Reviewed-on: http://review.gluster.org/13036
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
Tested-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
        Backport of: http://review.gluster.org/#/c/13001/

Problem:

When IO is happening on a file and a brick goes down and comes back up
during this time, protocol/client translator attempts reopening of the
fd on the gfid handle of the file. But if another client renames this
file while a brick was down &amp;&amp; writes were in progress on it, once this
brick is back up, there can be a race between reopening of the fd and
entry self-heal replaying the effect of the rename() on the sink brick.
If the reopening of the fd happens first, the application's writes
continue to go into the data blocks associated with the gfid.
Now entry-self-heal deletes 'src' and creates 'dst' file on the sink,
marking dst as a 'newentry'.  Data self-heal is also completed on 'dst'
as a result and self-heal terminates. If at this point the application
is still writing to this fd, all writes on the file after self-heal
would go into the data blocks associated with this fd, which would be
lost once the fd is closed. The result - the 'dst' file on the source
and sink are not the same and there is no pending heal on the file,
leading to silent corruption on the sink.

Fix:

Leverage http://review.gluster.org/#/c/12816/ to ensure the gfid handle
path gets saved in .glusterfs/unlink until the fd is closed on the file.
During this time, when self-heal sends mknod() with gfid of the file,
do the following:
link() the gfid handle under .glusterfs/unlink to the new path to be
created in mknod() and
rename() the gfid handle to go back under .glusterfs/ab/cd/.


Change-Id: I5dc49c127ef0a1bf3cf4ce1b24610b1527f84d6f
BUG: 1293265
Signed-off-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
Reviewed-on: http://review.gluster.org/13036
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
Tested-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>afr: modify afr_txn_nothing_failed()</title>
<updated>2015-08-31T15:03:09+00:00</updated>
<author>
<name>Ravishankar N</name>
<email>ravishankar@redhat.com</email>
</author>
<published>2015-08-04T13:07:47+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=7924eb1a11fe0b1443903a69b7e93e4767061064'/>
<id>7924eb1a11fe0b1443903a69b7e93e4767061064</id>
<content type='text'>
Backport of http://review.gluster.org/#/c/11827/

In an AFR transaction, we need to consider something as failed only if the
failure (either in the pre-op or the FOP phase) occurs on the bricks on which a
transaction lock was obtained.

Without this, we would end up considering the transaction as failed even on the
bricks on which the lock was not obtained, resulting in unnecessary fsyncs
during the post-op phase of every write transaction for non-appending writes.
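
Illustration (not the actual patch): a minimal Python sketch of the rule
described above, where a failure counts only on bricks that held the
transaction lock. The function name and the two per-brick boolean lists are
hypothetical stand-ins for the state AFR tracks in C.

```python
def txn_nothing_failed(locked, failed):
    """Return True when no brick that held the transaction lock failed."""
    return not any(is_locked and has_failed
                   for is_locked, has_failed in zip(locked, failed))


# Brick 1 never got the lock, so its failure is not counted.
unlocked_brick_failed = txn_nothing_failed([True, False, True],
                                           [False, True, False])
# Brick 0 held the lock and failed, so the transaction did see a failure.
locked_brick_failed = txn_nothing_failed([True, False, True],
                                         [True, False, False])
print(unlocked_brick_failed, locked_brick_failed)
```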

Change-Id: Iee79e5d85dc7b4c41459d8bdd04a8454bdaf9a9d
BUG: 1255698
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
Reviewed-on: http://review.gluster.org/11985
Tested-by: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
Tested-by: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Backport of http://review.gluster.org/#/c/11827/

In an AFR transaction, we need to consider something as failed only if the
failure (either in the pre-op or the FOP phase) occurs on the bricks on which a
transaction lock was obtained.

Without this, we would end up considering the transaction as failed even on the
bricks on which the lock was not obtained, resulting in unnecessary fsyncs
during the post-op phase of every write transaction for non-appending writes.

Change-Id: Iee79e5d85dc7b4c41459d8bdd04a8454bdaf9a9d
BUG: 1255698
Signed-off-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
Reviewed-on: http://review.gluster.org/11985
Tested-by: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
Tested-by: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
