<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/xlators/cluster/ec, branch v3.7.14</title>
<subtitle></subtitle>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/'/>
<entry>
<title>cluster/ec: Unlock stale locks when inodelk/entrylk/lk fails</title>
<updated>2016-07-30T01:04:22+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2016-06-11T13:13:42+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=4a49d3116b00811b7a4ba98f51bb27da0fa63d5c'/>
<id>4a49d3116b00811b7a4ba98f51bb27da0fa63d5c</id>
<content type='text'>
Thanks to Rafi for hinting a while back that this kind of
problem he saw once. I didn't think the theory was valid.
Could have caught it earlier if I had tested his theory.

 &gt;Change-Id: Iac6ffcdba2950aa6f8cf94f8994adeed6e6a9c9b
 &gt;BUG: 1344836
 &gt;Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
 &gt;Reviewed-on: http://review.gluster.org/14703
 &gt;Reviewed-by: Xavier Hernandez &lt;xhernandez@datalab.es&gt;
 &gt;Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
 &gt;Tested-by: mohammed rafi  kc &lt;rkavunga@redhat.com&gt;
 &gt;NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
 &gt;CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;

BUG: 1361402
Change-Id: If9ccf0b3db7159b87ddcdc7b20e81cde8c3c76f0
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
Reviewed-on: http://review.gluster.org/15040
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Xavier Hernandez &lt;xhernandez@datalab.es&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Thanks to Rafi for hinting a while back that this kind of
problem he saw once. I didn't think the theory was valid.
Could have caught it earlier if I had tested his theory.

 &gt;Change-Id: Iac6ffcdba2950aa6f8cf94f8994adeed6e6a9c9b
 &gt;BUG: 1344836
 &gt;Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
 &gt;Reviewed-on: http://review.gluster.org/14703
 &gt;Reviewed-by: Xavier Hernandez &lt;xhernandez@datalab.es&gt;
 &gt;Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
 &gt;Tested-by: mohammed rafi  kc &lt;rkavunga@redhat.com&gt;
 &gt;NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
 &gt;CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;

BUG: 1361402
Change-Id: If9ccf0b3db7159b87ddcdc7b20e81cde8c3c76f0
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
Reviewed-on: http://review.gluster.org/15040
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Xavier Hernandez &lt;xhernandez@datalab.es&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/ec: Handle absence of keys in some callback dict</title>
<updated>2016-07-27T07:00:04+00:00</updated>
<author>
<name>Ashish Pandey</name>
<email>aspandey@redhat.com</email>
</author>
<published>2016-06-17T12:22:56+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=1e3a8f47cd88c39c41519d143b001d45387eb4b8'/>
<id>1e3a8f47cd88c39c41519d143b001d45387eb4b8</id>
<content type='text'>
Problem: This issue arises when we do a rolling update
from 3.7.5 to 3.7.9.
For 4+2 volume running 3.7.5, if we update 2 nodes
and after heal completion  kill 2 older nodes, this
problem can be seen. After update and killing of
bricks, 2 nodes will return inodelk count key in dict
while other 2 nodes will not have inodelk count in dict.
This is also true for get-link-count.
During dictionary match , ec_dict_compare, this will
lead to mismatch of answers and the file operation
on mount point will fail with IO error.

Solution:
Don't match inode, entry and link count keys while
comparing two dictionaries. However, while combining the
data in ec_dict_combine, go through all the dictionaries
and select the maximum values received in different dicts
for these keys.

master-
http://review.gluster.org/#/c/14761/

Change-Id: I33546e3619fe8f909286ee48fb0df2009cd3d22f
BUG: 1360152
Signed-off-by: Ashish Pandey &lt;aspandey@redhat.com&gt;
Reviewed-on: http://review.gluster.org/14761
Reviewed-by: Xavier Hernandez &lt;xhernandez@datalab.es&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Signed-off-by: Ashish Pandey &lt;aspandey@redhat.com&gt;
Reviewed-on: http://review.gluster.org/15012
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem: This issue arises when we do a rolling update
from 3.7.5 to 3.7.9.
For 4+2 volume running 3.7.5, if we update 2 nodes
and after heal completion  kill 2 older nodes, this
problem can be seen. After update and killing of
bricks, 2 nodes will return inodelk count key in dict
while other 2 nodes will not have inodelk count in dict.
This is also true for get-link-count.
During dictionary match , ec_dict_compare, this will
lead to mismatch of answers and the file operation
on mount point will fail with IO error.

Solution:
Don't match inode, entry and link count keys while
comparing two dictionaries. However, while combining the
data in ec_dict_combine, go through all the dictionaries
and select the maximum values received in different dicts
for these keys.

master-
http://review.gluster.org/#/c/14761/

Change-Id: I33546e3619fe8f909286ee48fb0df2009cd3d22f
BUG: 1360152
Signed-off-by: Ashish Pandey &lt;aspandey@redhat.com&gt;
Reviewed-on: http://review.gluster.org/14761
Reviewed-by: Xavier Hernandez &lt;xhernandez@datalab.es&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Signed-off-by: Ashish Pandey &lt;aspandey@redhat.com&gt;
Reviewed-on: http://review.gluster.org/15012
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/ec: Fix race in timer cancellation</title>
<updated>2016-07-18T06:29:05+00:00</updated>
<author>
<name>Xavier Hernandez</name>
<email>xhernandez@datalab.es</email>
</author>
<published>2016-06-13T10:42:47+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=74d2aaf51c7ff601e4394cad9f8e23092267af55'/>
<id>74d2aaf51c7ff601e4394cad9f8e23092267af55</id>
<content type='text'>
A race in timer cancellation for delayed unlock could cause a crash
if the cancelling thread fails to cancel the timer because it has
already been fired but not executed, and the callback is scheduled
out of the CPU, delaying it until the thread has released important
resources needed by the callback.

This patch improves the handling of this case to make it robust.

Backport of:
&gt; Change-Id: I5c8a8c6610c5136f71b938aa78b5878ba05238d4
&gt; BUG: 1345855
&gt; Signed-off-by: Xavier Hernandez &lt;xhernandez@datalab.es&gt;
&gt; Reviewed-on: http://review.gluster.org/14712
&gt; Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
&gt; NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
&gt; CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
&gt; Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;

Change-Id: I5c8a8c6610c5136f71b938aa78b5878ba05238d4
BUG: 1346156
Signed-off-by: Xavier Hernandez &lt;xhernandez@datalab.es&gt;
Reviewed-on: http://review.gluster.org/14724
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A race in timer cancellation for delayed unlock could cause a crash
if the cancelling thread fails to cancel the timer because it has
already been fired but not executed, and the callback is scheduled
out of the CPU, delaying it until the thread has released important
resources needed by the callback.

This patch improves the handling of this case to make it robust.

Backport of:
&gt; Change-Id: I5c8a8c6610c5136f71b938aa78b5878ba05238d4
&gt; BUG: 1345855
&gt; Signed-off-by: Xavier Hernandez &lt;xhernandez@datalab.es&gt;
&gt; Reviewed-on: http://review.gluster.org/14712
&gt; Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
&gt; NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
&gt; CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
&gt; Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;

Change-Id: I5c8a8c6610c5136f71b938aa78b5878ba05238d4
BUG: 1346156
Signed-off-by: Xavier Hernandez &lt;xhernandez@datalab.es&gt;
Reviewed-on: http://review.gluster.org/14724
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/ec: Fix invalid __fd_unref() call</title>
<updated>2016-06-14T01:03:50+00:00</updated>
<author>
<name>Xavier Hernandez</name>
<email>xhernandez@datalab.es</email>
</author>
<published>2016-06-09T15:29:26+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=8383a5163d0b97105e600e9361d322882023295f'/>
<id>8383a5163d0b97105e600e9361d322882023295f</id>
<content type='text'>
__fd_unref() doesn't do any cleanup, so it cannot be called to release
fd references, specially if it's the last reference.

The code has been changed to avoid a call to this function.

In the previous version we always tried to keep the newest fd in the
ec_lock_t structure. However this is not necessary. We'll always keep
one reference to an open file on the same inode. It's irrelevant if
the reference is new or old.

The function __fd_unref() has also been removed from fd.h to avoid being
used in the future since it's useless as it's defined now.

Backport of http://review.gluster.org/14683

Change-Id: Ia728777fc8e464758d5ea4d3bf020f0603919039
BUG: 1344422
Signed-off-by: Xavier Hernandez &lt;xhernandez@datalab.es&gt;
Reviewed-on: http://review.gluster.org/14685
Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
__fd_unref() doesn't do any cleanup, so it cannot be called to release
fd references, specially if it's the last reference.

The code has been changed to avoid a call to this function.

In the previous version we always tried to keep the newest fd in the
ec_lock_t structure. However this is not necessary. We'll always keep
one reference to an open file on the same inode. It's irrelevant if
the reference is new or old.

The function __fd_unref() has also been removed from fd.h to avoid being
used in the future since it's useless as it's defined now.

Backport of http://review.gluster.org/14683

Change-Id: Ia728777fc8e464758d5ea4d3bf020f0603919039
BUG: 1344422
Signed-off-by: Xavier Hernandez &lt;xhernandez@datalab.es&gt;
Reviewed-on: http://review.gluster.org/14685
Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/ec: Pass xdata to dht in case of error</title>
<updated>2016-06-13T08:30:20+00:00</updated>
<author>
<name>Ashish Pandey</name>
<email>aspandey@redhat.com</email>
</author>
<published>2016-06-09T10:49:37+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=fb7adf2e1f8ab582173fa38e5aaa584659ddd2a5'/>
<id>fb7adf2e1f8ab582173fa38e5aaa584659ddd2a5</id>
<content type='text'>
Problem: In case of mkdir failure, dht expects
error information so that it can act accordingly.
Aftre adding bricks and re balance, layout gets
changed. Fop "mkdir" with old layout returns EIO.
EC gets this error in xdata but does not pass it
back to dht. In this case dht will not be able to
take corrective action.

Solution: Return xdata back to dht

master -
http://review.gluster.org/#/c/14679/

Change-Id: I24def8038e6880607689b7b046dc6428f564c6ab
BUG: 1344595
Signed-off-by: Ashish Pandey &lt;aspandey@redhat.com&gt;
Reviewed-on: http://review.gluster.org/14689
Reviewed-by: Xavier Hernandez &lt;xhernandez@datalab.es&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem: In case of mkdir failure, dht expects
error information so that it can act accordingly.
Aftre adding bricks and re balance, layout gets
changed. Fop "mkdir" with old layout returns EIO.
EC gets this error in xdata but does not pass it
back to dht. In this case dht will not be able to
take corrective action.

Solution: Return xdata back to dht

master -
http://review.gluster.org/#/c/14679/

Change-Id: I24def8038e6880607689b7b046dc6428f564c6ab
BUG: 1344595
Signed-off-by: Ashish Pandey &lt;aspandey@redhat.com&gt;
Reviewed-on: http://review.gluster.org/14689
Reviewed-by: Xavier Hernandez &lt;xhernandez@datalab.es&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/ec: Restrict the launch of replace brick heal</title>
<updated>2016-06-09T09:04:07+00:00</updated>
<author>
<name>Ashish Pandey</name>
<email>aspandey@redhat.com</email>
</author>
<published>2016-06-06T04:47:54+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=37ac79fe54e91a8ffe23318855f2cda582a27798'/>
<id>37ac79fe54e91a8ffe23318855f2cda582a27798</id>
<content type='text'>
Problem: When features.cache-invalidation is ON, a lot of
ec_notify function gets called which leads to launch of
too many heals. This leads to no heal completion,
which causes accumulation of heals.

Solution: ec_launch_replace_heal should not be launch
for every event. Replace brick will trigger a child up
event and then only this heal function should be called.

master -
http://review.gluster.org/#/c/14649/

Change-Id: I57b44c6a279d57230daea1d93229be6069245b7d
BUG: 1342964
Signed-off-by: Ashish Pandey &lt;aspandey@redhat.com&gt;
Reviewed-on: http://review.gluster.org/14652
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Xavier Hernandez &lt;xhernandez@datalab.es&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem: When features.cache-invalidation is ON, a lot of
ec_notify function gets called which leads to launch of
too many heals. This leads to no heal completion,
which causes accumulation of heals.

Solution: ec_launch_replace_heal should not be launch
for every event. Replace brick will trigger a child up
event and then only this heal function should be called.

master -
http://review.gluster.org/#/c/14649/

Change-Id: I57b44c6a279d57230daea1d93229be6069245b7d
BUG: 1342964
Signed-off-by: Ashish Pandey &lt;aspandey@redhat.com&gt;
Reviewed-on: http://review.gluster.org/14652
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Xavier Hernandez &lt;xhernandez@datalab.es&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/ec: Fix issues with eager locking</title>
<updated>2016-05-04T11:29:04+00:00</updated>
<author>
<name>Xavier Hernandez</name>
<email>xhernandez@datalab.es</email>
</author>
<published>2016-04-28T06:42:40+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=6e1de9e46b12b25d27d852d3cccadc51768e1150'/>
<id>6e1de9e46b12b25d27d852d3cccadc51768e1150</id>
<content type='text'>
Due to a race in timer cancellation, in some cases it was possible
to unlock the lock while another concurrent fop that needed it
continues execution as if it were not released.

This patch also fixes an issue that caused a lock to not be released
if an error was found while preparing ec_update_size_version().

&gt; Change-Id: I1344a3f5ecfc333f05a09e62653838264c9c26b1
&gt; BUG: 1331254
&gt; Signed-off-by: Xavier Hernandez &lt;xhernandez@datalab.es&gt;
&gt; Reviewed-on: http://review.gluster.org/14112
&gt; Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
&gt; CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
&gt; Reviewed-by: Chen Chen &lt;chenchen@smartquerier.com&gt;
&gt; NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;

Change-Id: I21edd17d914dfa8d2f98e6bbde50830496e12a92
BUG: 1330132
Signed-off-by: Xavier Hernandez &lt;xhernandez@datalab.es&gt;
Reviewed-on: http://review.gluster.org/14174
Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Jeff Darcy &lt;jdarcy@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Due to a race in timer cancellation, in some cases it was possible
to unlock the lock while another concurrent fop that needed it
continues execution as if it were not released.

This patch also fixes an issue that caused a lock to not be released
if an error was found while preparing ec_update_size_version().

&gt; Change-Id: I1344a3f5ecfc333f05a09e62653838264c9c26b1
&gt; BUG: 1331254
&gt; Signed-off-by: Xavier Hernandez &lt;xhernandez@datalab.es&gt;
&gt; Reviewed-on: http://review.gluster.org/14112
&gt; Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
&gt; CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
&gt; Reviewed-by: Chen Chen &lt;chenchen@smartquerier.com&gt;
&gt; NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;

Change-Id: I21edd17d914dfa8d2f98e6bbde50830496e12a92
BUG: 1330132
Signed-off-by: Xavier Hernandez &lt;xhernandez@datalab.es&gt;
Reviewed-on: http://review.gluster.org/14174
Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Jeff Darcy &lt;jdarcy@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/afr: Don't lookup/forget inodes</title>
<updated>2016-04-17T14:11:08+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2016-03-28T11:01:12+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=20300fa96802ec6a0cd17edba38baf9639561d55'/>
<id>20300fa96802ec6a0cd17edba38baf9639561d55</id>
<content type='text'>
Problem:
All inodes that are looked-up are always forgotten without fail in
afr removing the benefits of them being in lru. This same code can
cause crashes if between inode_lookup, inode_forget in afr if the
top xlator does inode_forget(0).

Fix:
Don't use lookup/forget in afr. No benefits are there at the moment
for keeping this code. It is impossible to prevent top xlators to
do inode_forget(0). Found similar instances in ec
and removed them even though those code paths are not going to
be executed in any place other than heal-daemon.

 &gt;BUG: 1321554
 &gt;Change-Id: Ia4cb236178f7f129cc898d53f0bbd26f494a2a8d
 &gt;Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
 &gt;Reviewed-on: http://review.gluster.org/13834
 &gt;Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
 &gt;NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
 &gt;CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
 &gt;Reviewed-by: Anuradha Talur &lt;atalur@redhat.com&gt;

BUG: 1327864
Change-Id: I3507ed88cd75e069ed302525bfa259cf407871fb
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
Reviewed-on: http://review.gluster.org/14009
Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
All inodes that are looked-up are always forgotten without fail in
afr removing the benefits of them being in lru. This same code can
cause crashes if between inode_lookup, inode_forget in afr if the
top xlator does inode_forget(0).

Fix:
Don't use lookup/forget in afr. No benefits are there at the moment
for keeping this code. It is impossible to prevent top xlators to
do inode_forget(0). Found similar instances in ec
and removed them even though those code paths are not going to
be executed in any place other than heal-daemon.

 &gt;BUG: 1321554
 &gt;Change-Id: Ia4cb236178f7f129cc898d53f0bbd26f494a2a8d
 &gt;Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
 &gt;Reviewed-on: http://review.gluster.org/13834
 &gt;Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
 &gt;NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
 &gt;CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
 &gt;Reviewed-by: Anuradha Talur &lt;atalur@redhat.com&gt;

BUG: 1327864
Change-Id: I3507ed88cd75e069ed302525bfa259cf407871fb
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
Reviewed-on: http://review.gluster.org/14009
Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/ec: Do not ref dictionary in lookup</title>
<updated>2016-04-09T18:51:08+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2016-03-08T17:35:08+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=0830a6b05fbc7db89e984bb12c40c8eb7dbe119f'/>
<id>0830a6b05fbc7db89e984bb12c40c8eb7dbe119f</id>
<content type='text'>
Problem:
1) dict_for_each loops over the elements without any locks, so the members of
   the dictionary can be ref/unrefed while dict_for_each is executed by another
   thread leading to crashes.

Basically with distributed ec + disctributed replicate as cold, hot tiers. tier
sends a lookup which fails on ec. (By this time dict already contains ec
xattrs) After this lookup_everywhere code path is hit in tier which triggers
lookup on each of distribute's hash lookup but fails which leads to the cold,
hot dht's lookup_everywhere in two parallel epoll threads where in ec when it
tries to set trusted.ec.version/dirty/size as keys in the dictionary, the older
values against the same key get erased. While this erasing is going on if the
thread that is doing lookup on afr's subvolume accesses these keys either in
dict_copy_with_ref or client xlator trying to serialize, that can either lead
to crash or hang based on if the spin/mutex lock is called on invalid memory.

2) EC deletes GF_CONTENT_KEY from the dictionary, this may lead to extra reads
   in case of lookup-everwhere for tiered volumes.

Fix:
Do dict_copy_with_ref() for the lookup-dictionary.
This is avoiding the problem and is not actually fixing the 1st problem.
2nd problem will be fixed.

 &gt;Change-Id: I5427aa14c48cb7572977d4de9a28c5ffff2b4b95
 &gt;BUG: 1315560
 &gt;Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
 &gt;Reviewed-on: http://review.gluster.org/13680
 &gt;Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
 &gt;NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
 &gt;CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
 &gt;Reviewed-by: Xavier Hernandez &lt;xhernandez@datalab.es&gt;
 &gt;(cherry picked from commit 64cba025b13aad7fb3020a04930cfa22fbfcb859)

Change-Id: I2828a0d9e730bc4b0ea6cee037365131767ae43e
BUG: 1322520
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
Reviewed-on: http://review.gluster.org/13859
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
Reviewed-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
1) dict_for_each loops over the elements without any locks, so the members of
   the dictionary can be ref/unrefed while dict_for_each is executed by another
   thread leading to crashes.

Basically with distributed ec + disctributed replicate as cold, hot tiers. tier
sends a lookup which fails on ec. (By this time dict already contains ec
xattrs) After this lookup_everywhere code path is hit in tier which triggers
lookup on each of distribute's hash lookup but fails which leads to the cold,
hot dht's lookup_everywhere in two parallel epoll threads where in ec when it
tries to set trusted.ec.version/dirty/size as keys in the dictionary, the older
values against the same key get erased. While this erasing is going on if the
thread that is doing lookup on afr's subvolume accesses these keys either in
dict_copy_with_ref or client xlator trying to serialize, that can either lead
to crash or hang based on if the spin/mutex lock is called on invalid memory.

2) EC deletes GF_CONTENT_KEY from the dictionary, this may lead to extra reads
   in case of lookup-everwhere for tiered volumes.

Fix:
Do dict_copy_with_ref() for the lookup-dictionary.
This is avoiding the problem and is not actually fixing the 1st problem.
2nd problem will be fixed.

 &gt;Change-Id: I5427aa14c48cb7572977d4de9a28c5ffff2b4b95
 &gt;BUG: 1315560
 &gt;Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
 &gt;Reviewed-on: http://review.gluster.org/13680
 &gt;Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
 &gt;NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
 &gt;CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
 &gt;Reviewed-by: Xavier Hernandez &lt;xhernandez@datalab.es&gt;
 &gt;(cherry picked from commit 64cba025b13aad7fb3020a04930cfa22fbfcb859)

Change-Id: I2828a0d9e730bc4b0ea6cee037365131767ae43e
BUG: 1322520
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
Reviewed-on: http://review.gluster.org/13859
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Ravishankar N &lt;ravishankar@redhat.com&gt;
Reviewed-by: Krutika Dhananjay &lt;kdhananj@redhat.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cluster/ec: Provide an option to enable/disable eager lock</title>
<updated>2016-03-21T05:52:53+00:00</updated>
<author>
<name>Ashish Pandey</name>
<email>aspandey@redhat.com</email>
</author>
<published>2016-03-04T07:35:09+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=46920e3bd38d9ae7c1910d0bd83eff309ab20c66'/>
<id>46920e3bd38d9ae7c1910d0bd83eff309ab20c66</id>
<content type='text'>
Problem: If a fop takes lock, and completes its operation,
it waits for 1 second before releasing the lock. However,
If ec find any lock contention within this time period,
it release the lock immediately before time expires. As we
take lock on first brick, for few operations, like read, it
might happen that discovery of lock contention might take
long time and can degrades the performance.

Solution: Provide an option to enable/disable eager lock.
If eager lock is disabled, lock will be released as soon
as fop completes.

gluster v set &lt;VOLUME NAME&gt; disperse.eager-lock on
gluster v set &lt;VOLUME NAME&gt; disperse.eager-lock off

master-
http://review.gluster.org/13605

Change-Id: I000985a787eba3c190fdcd5981dfbf04e64af166
BUG: 1318965
Signed-off-by: Ashish Pandey &lt;aspandey@redhat.com&gt;
Reviewed-on: http://review.gluster.org/13773
Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem: If a fop takes lock, and completes its operation,
it waits for 1 second before releasing the lock. However,
If ec find any lock contention within this time period,
it release the lock immediately before time expires. As we
take lock on first brick, for few operations, like read, it
might happen that discovery of lock contention might take
long time and can degrades the performance.

Solution: Provide an option to enable/disable eager lock.
If eager lock is disabled, lock will be released as soon
as fop completes.

gluster v set &lt;VOLUME NAME&gt; disperse.eager-lock on
gluster v set &lt;VOLUME NAME&gt; disperse.eager-lock off

master-
http://review.gluster.org/13605

Change-Id: I000985a787eba3c190fdcd5981dfbf04e64af166
BUG: 1318965
Signed-off-by: Ashish Pandey &lt;aspandey@redhat.com&gt;
Reviewed-on: http://review.gluster.org/13773
Smoke: Gluster Build System &lt;jenkins@build.gluster.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
