<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/tests/bugs/glusterd, branch v4.0dev</title>
<subtitle></subtitle>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/'/>
<entry>
<title>glusterfs: Not able to mount a running volume after enabling brick mux and stopping any volume</title>
<updated>2017-05-31T20:43:53+00:00</updated>
<author>
<name>Mohit Agrawal</name>
<email>moagrawa@redhat.com</email>
</author>
<published>2017-05-25T16:13:42+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=dba55ae364a2772904bb68a6bd0ea87289ee1470'/>
<id>dba55ae364a2772904bb68a6bd0ea87289ee1470</id>
<content type='text'>
Problem: With brick mux enabled, if any volume goes down and a mount of
         a running volume is then attempted, the mount command hangs.

Solution: With brick mux enabled, the server shares one data structure,
          server_conf, across all associated subvolumes. After any
          subvolume goes down in some ungraceful manner (e.g. its brick
          directory is removed), the posix xlator sends a
          GF_EVENT_CHILD_DOWN event to its parent xlators, and server
          notify sets child_up to false in server_conf. When a client
          tries to communicate with the server through a mount, the
          server checks conf-&gt;child_up, finds it FALSE, and reports
          "translators are not yet ready". This patch updates the
          server_conf structure to save the child_up status per xlator.
          Another important correction in this patch is to clean up the
          server-side xlators' threads after a volume is stopped.
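
A minimal sketch of the per-xlator bookkeeping (simplified stand-in
types, not the actual server_conf definition in protocol/server):

#include &lt;stdbool.h&gt;
#include &lt;string.h&gt;

#define MAX_CHILDREN 1024

struct child_status {
        char name[64];   /* bottom xlator (brick) name    */
        bool child_up;   /* per-brick CHILD_UP/DOWN state */
};

struct server_conf {
        struct child_status kids[MAX_CHILDREN];
        int nkids;
};

/* On GF_EVENT_CHILD_DOWN, flip only the entry for that xlator so
 * mounts of the other multiplexed bricks keep working. */
static void
server_mark_child (struct server_conf *conf, const char *name, bool up)
{
        for (int i = 0; i &lt; conf-&gt;nkids; i++) {
                if (strcmp (conf-&gt;kids[i].name, name) == 0) {
                        conf-&gt;kids[i].child_up = up;
                        return;
                }
        }
}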

BUG: 1453977
Change-Id: Ic54da3f01881b7c9429ce92cc569236eb1d43e0d
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17356
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Raghavendra Talur &lt;rtalur@redhat.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Jeff Darcy &lt;jeff@pl.atyp.us&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem: With brick mux enabled, if any volume goes down and a mount of
         a running volume is then attempted, the mount command hangs.

Solution: With brick mux enabled, the server shares one data structure,
          server_conf, across all associated subvolumes. After any
          subvolume goes down in some ungraceful manner (e.g. its brick
          directory is removed), the posix xlator sends a
          GF_EVENT_CHILD_DOWN event to its parent xlators, and server
          notify sets child_up to false in server_conf. When a client
          tries to communicate with the server through a mount, the
          server checks conf-&gt;child_up, finds it FALSE, and reports
          "translators are not yet ready". This patch updates the
          server_conf structure to save the child_up status per xlator.
          Another important correction in this patch is to clean up the
          server-side xlators' threads after a volume is stopped.
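
A minimal sketch of the per-xlator bookkeeping (simplified stand-in
types, not the actual server_conf definition in protocol/server):

#include &lt;stdbool.h&gt;
#include &lt;string.h&gt;

#define MAX_CHILDREN 1024

struct child_status {
        char name[64];   /* bottom xlator (brick) name    */
        bool child_up;   /* per-brick CHILD_UP/DOWN state */
};

struct server_conf {
        struct child_status kids[MAX_CHILDREN];
        int nkids;
};

/* On GF_EVENT_CHILD_DOWN, flip only the entry for that xlator so
 * mounts of the other multiplexed bricks keep working. */
static void
server_mark_child (struct server_conf *conf, const char *name, bool up)
{
        for (int i = 0; i &lt; conf-&gt;nkids; i++) {
                if (strcmp (conf-&gt;kids[i].name, name) == 0) {
                        conf-&gt;kids[i].child_up = up;
                        return;
                }
        }
}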

BUG: 1453977
Change-Id: Ic54da3f01881b7c9429ce92cc569236eb1d43e0d
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17356
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Raghavendra Talur &lt;rtalur@redhat.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Jeff Darcy &lt;jeff@pl.atyp.us&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libglusterfs: Fix crash in glusterd while peer probing</title>
<updated>2017-05-26T12:08:36+00:00</updated>
<author>
<name>Gaurav Yadav</name>
<email>gyadav@redhat.com</email>
</author>
<published>2017-05-22T17:55:47+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=23930326e0378edace9c8c41e8ae95931a2f68ba'/>
<id>23930326e0378edace9c8c41e8ae95931a2f68ba</id>
<content type='text'>
glusterd crashes when the port is explicitly set to a value greater
than the short data type range can hold, e.g.:
sysctl net.ipv4.ip_local_reserved_ports="49152-49156"
In the above case glusterd crashes while parsing the port.

With this fix glusterd is able to handle port values in the range
INT_MIN to INT_MAX.
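
A hedged sketch of an overflow-safe parse (assumed helper name; the
real fix lives in libglusterfs' parsing helpers):

#include &lt;errno.h&gt;
#include &lt;limits.h&gt;
#include &lt;stdlib.h&gt;

/* Read the port into a long via strtol and range-check it before
 * narrowing, instead of storing it straight into a short. */
static int
parse_port (const char *str, int *port)
{
        char *end = NULL;
        long  val;

        errno = 0;
        val = strtol (str, &amp;end, 10);
        if (errno != 0 || end == str || *end != '\0')
                return -1;              /* not a clean number        */
        if (val &lt; INT_MIN || val &gt; INT_MAX)
                return -1;              /* outside the handled range */

        *port = (int) val;              /* safe: checked above */
        return 0;
}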

Change-Id: I7c75ee67937b0e3384502973d96b1c36c89e0fe1
BUG: 1454418
Signed-off-by: Gaurav Yadav &lt;gyadav@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17359
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Samikshan Bairagya &lt;samikshan@gmail.com&gt;
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
Reviewed-by: Niels de Vos &lt;ndevos@redhat.com&gt;
Reviewed-by: Jeff Darcy &lt;jeff@pl.atyp.us&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
glusterd crashes when the port is explicitly set to a value greater
than the short data type range can hold, e.g.:
sysctl net.ipv4.ip_local_reserved_ports="49152-49156"
In the above case glusterd crashes while parsing the port.

With this fix glusterd is able to handle port values in the range
INT_MIN to INT_MAX.
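
A hedged sketch of an overflow-safe parse (assumed helper name; the
real fix lives in libglusterfs' parsing helpers):

#include &lt;errno.h&gt;
#include &lt;limits.h&gt;
#include &lt;stdlib.h&gt;

/* Read the port into a long via strtol and range-check it before
 * narrowing, instead of storing it straight into a short. */
static int
parse_port (const char *str, int *port)
{
        char *end = NULL;
        long  val;

        errno = 0;
        val = strtol (str, &amp;end, 10);
        if (errno != 0 || end == str || *end != '\0')
                return -1;              /* not a clean number        */
        if (val &lt; INT_MIN || val &gt; INT_MAX)
                return -1;              /* outside the handled range */

        *port = (int) val;              /* safe: checked above */
        return 0;
}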

Change-Id: I7c75ee67937b0e3384502973d96b1c36c89e0fe1
BUG: 1454418
Signed-off-by: Gaurav Yadav &lt;gyadav@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17359
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Samikshan Bairagya &lt;samikshan@gmail.com&gt;
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
Reviewed-by: Niels de Vos &lt;ndevos@redhat.com&gt;
Reviewed-by: Jeff Darcy &lt;jeff@pl.atyp.us&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>glusterd: Don't spawn new glusterfsds on node reboot with brick-mux</title>
<updated>2017-05-18T16:45:28+00:00</updated>
<author>
<name>Samikshan Bairagya</name>
<email>samikshan@gmail.com</email>
</author>
<published>2017-05-16T09:37:21+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=13e7b3b354a252ad4065f7b2f0f805c40a3c5d18'/>
<id>13e7b3b354a252ad4065f7b2f0f805c40a3c5d18</id>
<content type='text'>
With brick multiplexing enabled, upon a node reboot new bricks were
not being attached to the first spawned brick process even though
there weren't any compatibility issues.

The reason for this is that upon glusterd restart after a node
reboot, since brick services aren't running, glusterd starts the
bricks in a "no-wait" mode. So after a brick process is spawned for
the first brick, there isn't enough time for the corresponding pid
file to get populated with a value before the compatibility check is
made for the next brick.

This commit solves this by iteratively waiting for the pidfile to be
populated in the brick compatibility comparison stage before checking
if the brick process is alive.
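
A rough sketch of that wait (simplified; the real loop sits in
glusterd's brick compatibility check):

#include &lt;signal.h&gt;
#include &lt;stdio.h&gt;
#include &lt;sys/types.h&gt;
#include &lt;unistd.h&gt;

/* Poll the pidfile until a pid shows up, then test liveness, instead
 * of reading it once and racing the just-spawned brick process. */
static pid_t
wait_for_pidfile (const char *path, int max_tries)
{
        for (int i = 0; i &lt; max_tries; i++) {
                FILE *fp = fopen (path, "r");
                if (fp) {
                        int pid = 0;
                        int n   = fscanf (fp, "%d", &amp;pid);
                        fclose (fp);
                        if (n == 1 &amp;&amp; pid &gt; 0 &amp;&amp; kill (pid, 0) == 0)
                                return pid;    /* populated and alive */
                }
                usleep (100000);               /* retry after 100 ms  */
        }
        return -1;
}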

Change-Id: Ibd1f8e54c63e4bb04162143c9d70f09918a44aa4
BUG: 1451248
Signed-off-by: Samikshan Bairagya &lt;samikshan@gmail.com&gt;
Reviewed-on: https://review.gluster.org/17307
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With brick multiplexing enabled, upon a node reboot new bricks were
not being attached to the first spawned brick process even though
there weren't any compatibility issues.

The reason for this is that upon glusterd restart after a node
reboot, since brick services aren't running, glusterd starts the
bricks in a "no-wait" mode. So after a brick process is spawned for
the first brick, there isn't enough time for the corresponding pid
file to get populated with a value before the compatibility check is
made for the next brick.

This commit solves this by iteratively waiting for the pidfile to be
populated in the brick compatibility comparison stage before checking
if the brick process is alive.
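
A rough sketch of that wait (simplified; the real loop sits in
glusterd's brick compatibility check):

#include &lt;signal.h&gt;
#include &lt;stdio.h&gt;
#include &lt;sys/types.h&gt;
#include &lt;unistd.h&gt;

/* Poll the pidfile until a pid shows up, then test liveness, instead
 * of reading it once and racing the just-spawned brick process. */
static pid_t
wait_for_pidfile (const char *path, int max_tries)
{
        for (int i = 0; i &lt; max_tries; i++) {
                FILE *fp = fopen (path, "r");
                if (fp) {
                        int pid = 0;
                        int n   = fscanf (fp, "%d", &amp;pid);
                        fclose (fp);
                        if (n == 1 &amp;&amp; pid &gt; 0 &amp;&amp; kill (pid, 0) == 0)
                                return pid;    /* populated and alive */
                }
                usleep (100000);               /* retry after 100 ms  */
        }
        return -1;
}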

Change-Id: Ibd1f8e54c63e4bb04162143c9d70f09918a44aa4
BUG: 1451248
Signed-off-by: Samikshan Bairagya &lt;samikshan@gmail.com&gt;
Reviewed-on: https://review.gluster.org/17307
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>glusterd: remove useless options from glusterd's volume set table</title>
<updated>2017-05-17T12:31:11+00:00</updated>
<author>
<name>Zhou Zhengping</name>
<email>johnzzpcrystal@gmail.com</email>
</author>
<published>2017-05-09T04:13:37+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=be14d189360d09f8e10e6b75326df6d1db306467'/>
<id>be14d189360d09f8e10e6b75326df6d1db306467</id>
<content type='text'>
These options cause complaints like the following in the brick's log:
_log_if_unknown_option] 0-patchy-quota: option 'timeout' is not recognized
_log_if_unknown_option] 0-patchy-server: option 'ping-timeout' is not recognized

Change-Id: Ida2add13f792736a4e52bfaf38d1169309283a3f
BUG: 1449008
Signed-off-by: Zhou Zhengping &lt;johnzzpcrystal@gmail.com&gt;
Reviewed-on: https://review.gluster.org/17213
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Prashanth Pai &lt;ppai@redhat.com&gt;
Reviewed-by: Jeff Darcy &lt;jeff@pl.atyp.us&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
These options cause complaints like the following in the brick's log:
_log_if_unknown_option] 0-patchy-quota: option 'timeout' is not recognized
_log_if_unknown_option] 0-patchy-server: option 'ping-timeout' is not recognized

Change-Id: Ida2add13f792736a4e52bfaf38d1169309283a3f
BUG: 1449008
Signed-off-by: Zhou Zhengping &lt;johnzzpcrystal@gmail.com&gt;
Reviewed-on: https://review.gluster.org/17213
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Prashanth Pai &lt;ppai@redhat.com&gt;
Reviewed-by: Jeff Darcy &lt;jeff@pl.atyp.us&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>glusterd: Make reset-brick work correctly if brick-mux is on</title>
<updated>2017-05-10T18:58:21+00:00</updated>
<author>
<name>Samikshan Bairagya</name>
<email>samikshan@gmail.com</email>
</author>
<published>2017-04-24T16:30:17+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=74383e3ec6f8244b3de9bf14016452498c1ddcf0'/>
<id>74383e3ec6f8244b3de9bf14016452498c1ddcf0</id>
<content type='text'>
Reset brick currently kills off the corresponding brick process.
However, with brick multiplexing enabled, stopping the brick
process would render all bricks attached to it unavailable. To
handle this correctly, we need to make sure that the brick process
is terminated only if brick-multiplexing is disabled. Otherwise,
we should send the GLUSTERD_BRICK_TERMINATE rpc to the respective
brick process to detach the brick that is to be reset.
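
A minimal sketch of that branch (hypothetical stand-in type and stub
helpers; GLUSTERD_BRICK_TERMINATE is the rpc named above):

#include &lt;stdbool.h&gt;

typedef struct { const char *path; int pid; } brickinfo_t;
static int kill_brick_process (brickinfo_t *b)    { (void) b; return 0; }
static int send_brick_detach_rpc (brickinfo_t *b) { (void) b; return 0; }

static int
stop_brick_for_reset (brickinfo_t *b, bool brick_mux)
{
        if (!brick_mux)
                /* the process serves only this brick: safe to kill */
                return kill_brick_process (b);

        /* mux on: the process hosts other bricks too, so send
         * GLUSTERD_BRICK_TERMINATE and detach only the brick being
         * reset instead of killing the whole process */
        return send_brick_detach_rpc (b);
}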

Change-Id: I69002d66ffe6ec36ef48af09b66c522c6d35ac58
BUG: 1446172
Signed-off-by: Samikshan Bairagya &lt;samikshan@gmail.com&gt;
Reviewed-on: https://review.gluster.org/17128
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Reset brick currently kills off the corresponding brick process.
However, with brick multiplexing enabled, stopping the brick
process would render all bricks attached to it unavailable. To
handle this correctly, we need to make sure that the brick process
is terminated only if brick-multiplexing is disabled. Otherwise,
we should send the GLUSTERD_BRICK_TERMINATE rpc to the respective
brick process to detach the brick that is to be reset.
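
A minimal sketch of that branch (hypothetical stand-in type and stub
helpers; GLUSTERD_BRICK_TERMINATE is the rpc named above):

#include &lt;stdbool.h&gt;

typedef struct { const char *path; int pid; } brickinfo_t;
static int kill_brick_process (brickinfo_t *b)    { (void) b; return 0; }
static int send_brick_detach_rpc (brickinfo_t *b) { (void) b; return 0; }

static int
stop_brick_for_reset (brickinfo_t *b, bool brick_mux)
{
        if (!brick_mux)
                /* the process serves only this brick: safe to kill */
                return kill_brick_process (b);

        /* mux on: the process hosts other bricks too, so send
         * GLUSTERD_BRICK_TERMINATE and detach only the brick being
         * reset instead of killing the whole process */
        return send_brick_detach_rpc (b);
}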

Change-Id: I69002d66ffe6ec36ef48af09b66c522c6d35ac58
BUG: 1446172
Signed-off-by: Samikshan Bairagya &lt;samikshan@gmail.com&gt;
Reviewed-on: https://review.gluster.org/17128
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>glusterd: socketfile &amp; pidfile related fixes for brick multiplexing feature</title>
<updated>2017-05-09T01:30:01+00:00</updated>
<author>
<name>Mohit Agrawal</name>
<email>moagrawa@redhat.com</email>
</author>
<published>2017-05-08T13:59:22+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=21c7f7baccfaf644805e63682e5a7d2a9864a1e6'/>
<id>21c7f7baccfaf644805e63682e5a7d2a9864a1e6</id>
<content type='text'>
Problem: With brick multiplexing on, after restarting glusterd the CLI
         does not show the pid of every brick process in every volume.

Solution: With brick-mux on, all local brick processes communicate
          through one UNIX socket, but the current code
          (glusterd_brick_start) tries to communicate through a
          separate UNIX socket for each volume, with a path derived
          from the brick name and volume name. Because of the
          multiplexing design only one UNIX socket is opened, so
          glusterd hits poller errors and cannot fetch the correct
          status of the brick processes through the cli process. To
          resolve the problem, add a new function
          glusterd_set_socket_filepath_for_mux, called by
          glusterd_brick_start, to validate the existence of the
          socket path. To avoid continuous EPOLLERR errors in the
          logs, the socket_connect code is updated as well.
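
A rough sketch of the path selection (hypothetical helper name and
socket paths; the patch's real function is
glusterd_set_socket_filepath_for_mux):

#include &lt;stdbool.h&gt;
#include &lt;stdio.h&gt;

/* With mux on, reuse the one socket path of the shared brick process
 * instead of deriving a per-volume/brick path nothing listens on. */
static void
brick_socketpath (char *buf, size_t len, bool brick_mux,
                  const char *volname, const char *brickname)
{
        if (brick_mux)
                snprintf (buf, len,
                          "/var/run/gluster/shared-brick.socket");
        else
                snprintf (buf, len, "/var/run/gluster/%s-%s.socket",
                          volname, brickname);
}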

Test:     To reproduce the issue, follow these steps:
          1) Create two distributed volumes (dist1 and dist2)
          2) Set cluster.brick-multiplex to on
          3) Kill glusterd
          4) Run the command gluster v status
          After applying the patch, it shows the correct pid for all
          volumes.

BUG: 1444596
Change-Id: I5d10af69dea0d0ca19511f43870f34295a54a4d2
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17101
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Prashanth Pai &lt;ppai@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem: With brick multiplexing on, after restarting glusterd the CLI
         does not show the pid of every brick process in every volume.

Solution: With brick-mux on, all local brick processes communicate
          through one UNIX socket, but the current code
          (glusterd_brick_start) tries to communicate through a
          separate UNIX socket for each volume, with a path derived
          from the brick name and volume name. Because of the
          multiplexing design only one UNIX socket is opened, so
          glusterd hits poller errors and cannot fetch the correct
          status of the brick processes through the cli process. To
          resolve the problem, add a new function
          glusterd_set_socket_filepath_for_mux, called by
          glusterd_brick_start, to validate the existence of the
          socket path. To avoid continuous EPOLLERR errors in the
          logs, the socket_connect code is updated as well.
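
A rough sketch of the path selection (hypothetical helper name and
socket paths; the patch's real function is
glusterd_set_socket_filepath_for_mux):

#include &lt;stdbool.h&gt;
#include &lt;stdio.h&gt;

/* With mux on, reuse the one socket path of the shared brick process
 * instead of deriving a per-volume/brick path nothing listens on. */
static void
brick_socketpath (char *buf, size_t len, bool brick_mux,
                  const char *volname, const char *brickname)
{
        if (brick_mux)
                snprintf (buf, len,
                          "/var/run/gluster/shared-brick.socket");
        else
                snprintf (buf, len, "/var/run/gluster/%s-%s.socket",
                          volname, brickname);
}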

Test:     To reproduce the issue, follow these steps:
          1) Create two distributed volumes (dist1 and dist2)
          2) Set cluster.brick-multiplex to on
          3) Kill glusterd
          4) Run the command gluster v status
          After applying the patch, it shows the correct pid for all
          volumes.

BUG: 1444596
Change-Id: I5d10af69dea0d0ca19511f43870f34295a54a4d2
Signed-off-by: Mohit Agrawal &lt;moagrawa@redhat.com&gt;
Reviewed-on: https://review.gluster.org/17101
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Prashanth Pai &lt;ppai@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Fixes quota aux mount failure</title>
<updated>2017-05-08T06:15:55+00:00</updated>
<author>
<name>Sanoj Unnikrishnan</name>
<email>sunnikri@redhat.com</email>
</author>
<published>2017-03-22T09:32:12+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=2ae4b4058691b324535d802f4e6d24cce89a10e5'/>
<id>2ae4b4058691b324535d802f4e6d24cce89a10e5</id>
<content type='text'>
The aux mount is created on the first limit/remove_limit/list command
and remains until the volume is stopped or deleted (or quota is
disabled), at which point we do a lazy unmount. If the process is
terminated uncleanly, the mount entry remains and we get a (Transport
disconnected) error on subsequent attempts to run quota
list/limit-usage/remove commands.

Second issue: there is also a risk of an inadvertent rm -rf on
/var/run/gluster causing data loss for the user. Ideally, /var/run is
a temp path for application use and should not cause any data loss to
persistent storage.

Solution:
1) unmount the aux mount after each use.
2) clean stale mount before mounting, if any.

One caveat of mounting/unmounting on each command is that we cannot
use the same mount point for both the list and limit commands: the
list command needs the mount to be accessible in the cli after the
response from glusterd, so a limit command executed in parallel could
unmount it from under us (had we used the same mount point). Hence we
use separate mount points for the list and limit commands.
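
A small sketch of the split (hypothetical path scheme, not necessarily
the exact one the cli uses):

#include &lt;stdbool.h&gt;
#include &lt;stdio.h&gt;

/* Give list and limit their own aux mountpoints so a parallel limit's
 * unmount cannot pull the mount out from under a running list. */
static void
quota_aux_mountpoint (char *buf, size_t len, const char *volname,
                      bool is_list)
{
        snprintf (buf, len, "/var/run/gluster/%s_quota_%s/",
                  volname, is_list ? "list" : "limit");
}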

Change-Id: I4f9e39da2ac2b65941399bffb6440db8a6ba59d0
BUG: 1433906
Signed-off-by: Sanoj Unnikrishnan &lt;sunnikri@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16938
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Manikandan Selvaganesh &lt;manikandancs333@gmail.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The aux mount is created on the first limit/remove_limit/list command
and remains until the volume is stopped or deleted (or quota is
disabled), at which point we do a lazy unmount. If the process is
terminated uncleanly, the mount entry remains and we get a (Transport
disconnected) error on subsequent attempts to run quota
list/limit-usage/remove commands.

Second issue: there is also a risk of an inadvertent rm -rf on
/var/run/gluster causing data loss for the user. Ideally, /var/run is
a temp path for application use and should not cause any data loss to
persistent storage.

Solution:
1) unmount the aux mount after each use.
2) clean stale mount before mounting, if any.

One caveat of mounting/unmounting on each command is that we cannot
use the same mount point for both the list and limit commands: the
list command needs the mount to be accessible in the cli after the
response from glusterd, so a limit command executed in parallel could
unmount it from under us (had we used the same mount point). Hence we
use separate mount points for the list and limit commands.
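
A small sketch of the split (hypothetical path scheme, not necessarily
the exact one the cli uses):

#include &lt;stdbool.h&gt;
#include &lt;stdio.h&gt;

/* Give list and limit their own aux mountpoints so a parallel limit's
 * unmount cannot pull the mount out from under a running list. */
static void
quota_aux_mountpoint (char *buf, size_t len, const char *volname,
                      bool is_list)
{
        snprintf (buf, len, "/var/run/gluster/%s_quota_%s/",
                  volname, is_list ? "list" : "limit");
}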

Change-Id: I4f9e39da2ac2b65941399bffb6440db8a6ba59d0
BUG: 1433906
Signed-off-by: Sanoj Unnikrishnan &lt;sunnikri@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16938
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Manikandan Selvaganesh &lt;manikandancs333@gmail.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Raghavendra G &lt;rgowdapp@redhat.com&gt;
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>glusterd : Disallow peer detach if snapshot bricks exist on it</title>
<updated>2017-04-01T01:53:10+00:00</updated>
<author>
<name>Gaurav Yadav</name>
<email>gyadav@redhat.com</email>
</author>
<published>2017-03-16T09:26:39+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=1c92f83ec041176ad7c42ef83525cda7d3eda3c5'/>
<id>1c92f83ec041176ad7c42ef83525cda7d3eda3c5</id>
<content type='text'>
Problem:
- Deploy gluster on 2 nodes, one brick each, one volume replicated
- Create a snapshot
- Lose one server
- Add a replacement peer and new brick with a new IP address
- replace-brick the missing brick onto the new server
  (wait for replication to finish)
- peer detach the old server
- after the above steps, glusterd fails to restart.

Solution:
  With the fix, peer detach reports an error: "N2 is part of
  existing snapshots. Remove those snapshots before proceeding".
  This forces the user either to keep that peer or to delete all
  snapshots first.
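
A hedged sketch of the guard (stand-in types for glusterd's snapshot
metadata, just to make the check concrete):

#include &lt;stdbool.h&gt;
#include &lt;string.h&gt;

struct snap_brick { const char *hostname; };
struct snap       { struct snap_brick *bricks; int nbricks; };

/* Refuse the detach while any snapshot still has a brick on the peer
 * being removed; otherwise glusterd cannot restore those snaps. */
static bool
peer_has_snap_bricks (struct snap *snaps, int nsnaps, const char *peer)
{
        for (int i = 0; i &lt; nsnaps; i++)
                for (int j = 0; j &lt; snaps[i].nbricks; j++)
                        if (strcmp (snaps[i].bricks[j].hostname,
                                    peer) == 0)
                                return true;   /* block the detach */
        return false;
}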

Change-Id: I3699afb9b2a5f915768b77f885e783bd9b51818c
BUG: 1322145
Signed-off-by: Gaurav Yadav &lt;gyadav@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16907
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem:
- Deploy gluster on 2 nodes, one brick each, one volume replicated
- Create a snapshot
- Lose one server
- Add a replacement peer and new brick with a new IP address
- replace-brick the missing brick onto the new server
  (wait for replication to finish)
- peer detach the old server
- after the above steps, glusterd fails to restart.

Solution:
  With the fix, peer detach reports an error: "N2 is part of
  existing snapshots. Remove those snapshots before proceeding".
  This forces the user either to keep that peer or to delete all
  snapshots first.
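
A hedged sketch of the guard (stand-in types for glusterd's snapshot
metadata, just to make the check concrete):

#include &lt;stdbool.h&gt;
#include &lt;string.h&gt;

struct snap_brick { const char *hostname; };
struct snap       { struct snap_brick *bricks; int nbricks; };

/* Refuse the detach while any snapshot still has a brick on the peer
 * being removed; otherwise glusterd cannot restore those snaps. */
static bool
peer_has_snap_bricks (struct snap *snaps, int nsnaps, const char *peer)
{
        for (int i = 0; i &lt; nsnaps; i++)
                for (int j = 0; j &lt; snaps[i].nbricks; j++)
                        if (strcmp (snaps[i].bricks[j].hostname,
                                    peer) == 0)
                                return true;   /* block the detach */
        return false;
}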

Change-Id: I3699afb9b2a5f915768b77f885e783bd9b51818c
BUG: 1322145
Signed-off-by: Gaurav Yadav &lt;gyadav@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16907
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rpc: bump up conn-&gt;cleanup_gen in rpc_clnt_reconnect_cleanup</title>
<updated>2017-03-20T23:34:16+00:00</updated>
<author>
<name>Atin Mukherjee</name>
<email>amukherj@redhat.com</email>
</author>
<published>2017-03-18T10:59:10+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=39e09ad1e0e93f08153688c31433c38529f93716'/>
<id>39e09ad1e0e93f08153688c31433c38529f93716</id>
<content type='text'>
Commit 086436a introduced a generation number (cleanup_gen) to ensure
that the rpc layer doesn't end up cleaning up the connection object if
the application layer has already destroyed it. Bumping up cleanup_gen
was done only in rpc_clnt_connection_cleanup (). However, the same is
needed in rpc_clnt_reconnect_cleanup () too: without it, if the object
gets destroyed through the reconnect event in the application layer,
the rpc layer will still end up trying to delete the object, resulting
in a double free and a crash.

Peer probing an invalid host/IP was the basic test to catch this issue.
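
A minimal sketch of the generation bump (simplified struct and
locking; the cleanup_gen field name follows the commit):

#include &lt;pthread.h&gt;
#include &lt;stdint.h&gt;

struct rpc_conn {
        pthread_mutex_t lock;
        uint64_t        cleanup_gen;
};

/* Both cleanup paths must bump cleanup_gen so a caller holding an
 * older generation skips the already-destroyed object instead of
 * freeing it a second time. */
static void
reconnect_cleanup (struct rpc_conn *conn)
{
        pthread_mutex_lock (&amp;conn-&gt;lock);
        conn-&gt;cleanup_gen++;   /* the bump this patch adds */
        /* ... cancel the reconnect timer, tear down the link ... */
        pthread_mutex_unlock (&amp;conn-&gt;lock);
}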

Change-Id: Id5332f3239cb324cead34eb51cf73d426733bd46
BUG: 1433578
Signed-off-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16914
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Milind Changire &lt;mchangir@redhat.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Jeff Darcy &lt;jeff@pl.atyp.us&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 086436a introduced a generation number (cleanup_gen) to ensure
that the rpc layer doesn't end up cleaning up the connection object if
the application layer has already destroyed it. Bumping up cleanup_gen
was done only in rpc_clnt_connection_cleanup (). However, the same is
needed in rpc_clnt_reconnect_cleanup () too: without it, if the object
gets destroyed through the reconnect event in the application layer,
the rpc layer will still end up trying to delete the object, resulting
in a double free and a crash.

Peer probing an invalid host/IP was the basic test to catch this issue.
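
A minimal sketch of the generation bump (simplified struct and
locking; the cleanup_gen field name follows the commit):

#include &lt;pthread.h&gt;
#include &lt;stdint.h&gt;

struct rpc_conn {
        pthread_mutex_t lock;
        uint64_t        cleanup_gen;
};

/* Both cleanup paths must bump cleanup_gen so a caller holding an
 * older generation skips the already-destroyed object instead of
 * freeing it a second time. */
static void
reconnect_cleanup (struct rpc_conn *conn)
{
        pthread_mutex_lock (&amp;conn-&gt;lock);
        conn-&gt;cleanup_gen++;   /* the bump this patch adds */
        /* ... cancel the reconnect timer, tear down the link ... */
        pthread_mutex_unlock (&amp;conn-&gt;lock);
}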

Change-Id: Id5332f3239cb324cead34eb51cf73d426733bd46
BUG: 1433578
Signed-off-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
Reviewed-on: https://review.gluster.org/16914
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Milind Changire &lt;mchangir@redhat.com&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Jeff Darcy &lt;jeff@pl.atyp.us&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>TESTS/TIER: bug-1303028-Rebalance-glusterd-rpc-connection-issue.t</title>
<updated>2017-02-23T11:50:21+00:00</updated>
<author>
<name>hari gowtham</name>
<email>hgowtham@redhat.com</email>
</author>
<published>2017-02-22T09:49:20+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=a88cb01c78c9a13d2879917b54ead26dce8cc63c'/>
<id>a88cb01c78c9a13d2879917b54ead26dce8cc63c</id>
<content type='text'>
PROBLEM: spurious failure of the test.

CAUSE: the function "rebalance_run_time" calculates the total time the
tier has been running for. This being a test case, the run time of the
tier can be 0, and when the function adds up zeroes the result is
zero, so the test starts to fail.

FIX: Give the function some time, so that it has nonzero values to
add up.

Signed-off-by: hari gowtham &lt;hgowtham@redhat.com&gt;

Change-Id: Ie270f3f3c8942081cca85dc49ef8fec76f3a261a
BUG: 1425743
Reviewed-on: https://review.gluster.org/16711
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Tested-by: hari gowtham &lt;hari.gowtham005@gmail.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Shyamsundar Ranganathan &lt;srangana@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
PROBLEM: spurious failure of the test.

CAUSE: the function "rebalance_run_time" calculates the total time the
tier has been running for. This being a test case, the run time of the
tier can be 0, and when the function adds up zeroes the result is
zero, so the test starts to fail.

FIX: Give the function some time, so that it has nonzero values to
add up.

Signed-off-by: hari gowtham &lt;hgowtham@redhat.com&gt;

Change-Id: Ie270f3f3c8942081cca85dc49ef8fec76f3a261a
BUG: 1425743
Reviewed-on: https://review.gluster.org/16711
Smoke: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Tested-by: hari gowtham &lt;hari.gowtham005@gmail.com&gt;
NetBSD-regression: NetBSD Build System &lt;jenkins@build.gluster.org&gt;
CentOS-regression: Gluster Build System &lt;jenkins@build.gluster.org&gt;
Reviewed-by: Shyamsundar Ranganathan &lt;srangana@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
