path: root/xlators/mgmt/glusterd/src/glusterd-utils.c
Commit log (subject, author, date; files changed, lines +added/-removed)
* glusterd: Free up svc->conn on volume delete (Atin Mukherjee, 2017-12-07; 1 file, +4/-0)

  Daemons like snapd, tierd and gfproxyd are maintained on a per-volume
  basis, so on a volume delete we should destroy the rpc connection
  established for them.

  > mainline patch: https://review.gluster.org/#/c/18957

  Change-Id: Id1440e39da07b990fdb9b207df18da04b1ca8014
  BUG: 1523050
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd: persist brickinfo's port change into glusterd's store (Gaurav Yadav, 2017-10-31; 1 file, +19/-0)

  Problem:
  Consider a case where a node reboot is performed and prior to the
  reboot the brick was listening on 49153. Post reboot, glusterd
  assigned 49152 to the brick and started the brick process, but the new
  port was never persisted. Now when glusterd restarts, it always reads
  the port from its persisted store, i.e. 49153, whereas the pmap signin
  happens with the correct port, i.e. 49152.

  Fix:
  Make sure that when glusterd_brick_start is called,
  glusterd_store_volinfo is eventually invoked.

  Change-Id: Ic0efbd48c51d39729ed951a42922d0e59f7115a1
  BUG: 1507752
  Signed-off-by: Gaurav Yadav <gyadav@redhat.com>
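  A minimal sketch of the ordering this fix enforces; the function names
  follow the glusterd sources, but the exact call site and flags here
  are assumptions:

      /* After (re)starting a brick, persist volinfo (which carries
       * brickinfo->port) so glusterd's store never goes stale. */
      int ret = glusterd_brick_start (volinfo, brickinfo, _gf_false);
      if (!ret)
              ret = glusterd_store_volinfo
                        (volinfo, GLUSTERD_VOLINFO_VER_AC_INCREMENT);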
* glusterd: Gluster should keep PID file in correct location (Gaurav Kumar Garg, 2017-10-25; 1 file, +6/-8)

  Currently Gluster keeps the process pid information of all the daemons
  and brick processes in the Gluster configuration file directory (i.e.
  /var/lib/glusterd/*). These pid files should be separate from the
  configuration files: deletion of the configuration file directory
  could result in serious problems. Also, /var/run/gluster is the
  default placeholder directory for pid files. So, with this fix Gluster
  will keep the pid information of all processes in the
  /var/run/gluster/* directory.

  > Change-Id: Idb09e3fccb6a7355fbac1df31082637c8d7ab5b4
  > BUG: 1258561
  > Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com>
  > Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com>
  > Reviewed-on: https://review.gluster.org/13580
  > Tested-by: MOHIT AGRAWAL <moagrawa@redhat.com>
  > Smoke / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  > Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
  > (cherry picked from commit 220d406ad13d840e950eef001a2b36f87570058d)

  BUG: 1491059
  Change-Id: Idb09e3fccb6a7355fbac1df31082637c8d7ab5b4
  Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* glusterd: glusterd fails to start when peer's network interface is down (Gaurav Yadav, 2017-09-17; 1 file, +9/-2)

  Problem:
  glusterd fails to start on nodes where it tries to come up even before
  the network is up.

  Fix:
  On startup glusterd tries to resolve the brick path, which is based on
  hostname/ip; in the above scenario, when the network interface is not
  up, glusterd is not able to resolve the brick path using the ip
  address or hostname. With this fix glusterd will use the UUID to
  resolve the brick path.

  > Reviewed-on: https://review.gluster.org/17813
  > Smoke / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  > Reviewed-by: Prashanth Pai <ppai@redhat.com>
  > Reviewed-by: Atin Mukherjee <amukherj@redhat.com>

  Change-Id: Icfa7b2652417135530479d0aa4e2a82b0476f710
  BUG: 1482857
  Signed-off-by: Gaurav Yadav <gyadav@redhat.com>
  Reviewed-on: https://review.gluster.org/18063
  Reviewed-by: Prashanth Pai <ppai@redhat.com>
  Smoke / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd: Eliminate race in brick compatibility checking stage (Samikshan Bairagya, 2017-05-31; 1 file, +5/-2)

  In https://review.gluster.org/17307/, while looking for compatible
  bricks for multiplexing, we check that the brick pidfile exists before
  checking if the corresponding brick process is running. However,
  checking if the brick process is running just after checking that the
  pidfile exists isn't enough: there is a race window in which the
  pidfile has been created but hasn't been updated with a pid value yet.
  This commit solves that by waiting iteratively until the pid value is
  updated as well.

  > Reviewed-on: https://review.gluster.org/17375
  > Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
  > Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  > (cherry picked from commit a8624b8b13a1f4222e4d3e33fa5836d7b45369bc)

  Change-Id: Ib7a158f95566486f7c1f84b6357c9b89e4c797ae
  BUG: 1453087
  Signed-off-by: Samikshan Bairagya <samikshan@gmail.com>
  Reviewed-on: https://review.gluster.org/17425
  Tested-by: Raghavendra Talur <rtalur@redhat.com>
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
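  A sketch of the iterative wait described above, assuming a plain
  fopen/fscanf read of the pidfile; the retry count and sleep interval
  are illustrative, not the values glusterd uses:

      #include <stdio.h>
      #include <unistd.h>

      /* Poll the pidfile until it holds a valid pid, so we never race
       * against a pidfile that exists but is still empty. */
      static int
      wait_for_pidfile_pid (const char *pidfile, int *pid)
      {
              for (int tries = 0; tries < 10; tries++) {
                      FILE *fp = fopen (pidfile, "r");
                      if (fp) {
                              int n = fscanf (fp, "%d", pid);
                              fclose (fp);
                              if (n == 1 && *pid > 0)
                                      return 0;   /* pid populated */
                      }
                      usleep (100000);            /* wait and retry */
              }
              return -1;   /* never populated: treat as not running */
      }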
* glusterd: Don't spawn new glusterfsds on node reboot with brick-mux (Samikshan Bairagya, 2017-05-30; 1 file, +18/-0)

  With brick multiplexing enabled, upon a node reboot new bricks were
  not being attached to the first spawned brick process even though
  there were no compatibility issues. The reason is that upon a glusterd
  restart after a node reboot, since brick services aren't running,
  glusterd starts the bricks in a "no-wait" mode. So after a brick
  process is spawned for the first brick, there isn't enough time for
  the corresponding pid file to get populated with a value before the
  compatibility check is made for the next brick. This commit solves
  this by iteratively waiting for the pidfile to be populated in the
  brick compatibility comparison stage before checking if the brick
  process is alive (see the sketch after the previous entry).

  > Reviewed-on: https://review.gluster.org/17307
  > Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
  > Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  > (cherry picked from commit 13e7b3b354a252ad4065f7b2f0f805c40a3c5d18)

  Change-Id: Ibd1f8e54c63e4bb04162143c9d70f09918a44aa4
  BUG: 1453087
  Signed-off-by: Samikshan Bairagya <samikshan@gmail.com>
  Reviewed-on: https://review.gluster.org/17352
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
  Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
* Fixes quota aux mount failure (Sanoj Unnikrishnan, 2017-05-13; 1 file, +1/-10)

  The aux mount is created on the first limit/remove_limit/list command
  and it remains until the volume is stopped/deleted (or quota is
  disabled), at which point we do a lazy unmount. If the process is
  uncleanly terminated, the mount entry remains and we get a (Transport
  disconnected) error on subsequent attempts to run quota
  list/limit-usage/remove commands.

  Second issue: there is also a risk of an inadvertent rm -rf on
  /var/run/gluster causing data loss for the user. Ideally, /var/run is
  a temp path for application use and should not cause any data loss to
  persistent storage.

  Solution:
  1) unmount the aux mount after each use.
  2) clean any stale mount before mounting.

  One caveat with doing mount/unmount on each command is that we cannot
  use the same mount point for both list and limit commands. The reason
  is that the list command needs the mount to be accessible in the cli
  after the response from glusterd, so it could be unmounted by a limit
  command executing in parallel (had we used the same mount point).
  Hence we use separate mount points for the list and limit commands.

  > Reviewed-on: https://review.gluster.org/16938
  > Reviewed-by: Manikandan Selvaganesh <manikandancs333@gmail.com>
  > Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  > Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
  > Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  > (cherry picked from commit 2ae4b4058691b324535d802f4e6d24cce89a10e5)

  Change-Id: I4f9e39da2ac2b65941399bffb6440db8a6ba59d0
  BUG: 1449779
  Signed-off-by: Sanoj Unnikrishnan <sunnikri@redhat.com>
  Reviewed-on: https://review.gluster.org/17241
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
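  A sketch of the per-command lifecycle under those two rules; the
  gf_umount_lazy() helper exists in libglusterfs, while the mountdir
  layout and the other helper names here are assumptions:

      char mountdir[PATH_MAX];

      /* separate mountpoints keep a parallel 'list' from being
       * unmounted by a 'limit' command */
      snprintf (mountdir, sizeof (mountdir),
                "/var/run/gluster/%s_quota_limit/", volname);

      cleanup_stale_mount (mountdir);          /* 2) clean stale mount */
      ret = mount_aux (volname, mountdir);     /* hypothetical helper */
      if (!ret)
              ret = send_limit_usage_req (volname, mountdir);
      gf_umount_lazy ("cli", mountdir, 1);     /* 1) unmount after use */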
* glusterd: Make reset-brick work correctly if brick-mux is on (Samikshan Bairagya, 2017-05-12; 1 file, +33/-11)

  Reset brick currently kills off the corresponding brick process.
  However, with brick multiplexing enabled, stopping the brick process
  would render all bricks attached to it unavailable. To handle this
  correctly, we need to make sure that the brick process is terminated
  only if brick multiplexing is disabled. Otherwise, we should send the
  GLUSTERD_BRICK_TERMINATE rpc to the respective brick process to
  detach the brick that is to be reset.

  > Signed-off-by: Samikshan Bairagya <samikshan@gmail.com>
  > Reviewed-on: https://review.gluster.org/17128
  > Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  > Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
  > (cherry picked from commit 74383e3ec6f8244b3de9bf14016452498c1ddcf0)

  Change-Id: I69002d66ffe6ec36ef48af09b66c522c6d35ac58
  BUG: 1449934
  Signed-off-by: Samikshan Bairagya <samikshan@gmail.com>
  Reviewed-on: https://review.gluster.org/17253
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
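  A sketch of that branch; is_brick_mx_enabled () and
  GLUSTERD_BRICK_TERMINATE match glusterd's naming, while the stop and
  send helpers are simplified stand-ins:

      if (!is_brick_mx_enabled ()) {
              /* one brick per process: safe to kill the process */
              ret = glusterd_brick_stop (volinfo, brickinfo, _gf_false);
      } else {
              /* multiplexed: ask the process to drop only this brick */
              ret = send_brick_op (brickinfo->rpc,
                                   GLUSTERD_BRICK_TERMINATE,
                                   brickinfo->path);
      }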
* glusterd: cleanup pidfile on pmap signout (Atin Mukherjee, 2017-05-10; 1 file, +62/-0)

  This patch ensures that
  1. the brick pidfile is cleaned up on pmap signout
  2. a pmap signout event is sent for all the bricks when a brick
     process shuts down.

  > Reviewed-on: https://review.gluster.org/17168
  > Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  > Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
  > (cherry picked from commit 3d35e21ffb15713237116d85711e9cd1dda1688a)

  Change-Id: I7606a60775b484651d4b9743b6037b40323931a2
  BUG: 1449002
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
  Reviewed-on: https://review.gluster.org/17209
  Reviewed-by: Prashanth Pai <ppai@redhat.com>
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
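  A sketch of the signout-side cleanup, assuming glusterd's
  GLUSTERD_GET_BRICK_PIDFILE macro and the sys_unlink() wrapper; error
  handling is trimmed:

      char pidfile[PATH_MAX] = {0};

      GLUSTERD_GET_BRICK_PIDFILE (pidfile, volinfo, brickinfo, conf);
      if (sys_unlink (pidfile) && errno != ENOENT)
              gf_log (THIS->name, GF_LOG_WARNING,
                      "failed to remove pidfile %s", pidfile);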
* glusterd: socketfile & pidfile related fixes for brick multiplexing feature (Mohit Agrawal, 2017-05-10; 1 file, +97/-25)

  Problem:
  With brick multiplexing on, after restarting glusterd the CLI does not
  show the pid of all brick processes in all volumes.

  Solution:
  While brick-mux is on, all local brick processes communicate through
  one UNIX socket, but as per the current code (glusterd_brick_start)
  glusterd tries to communicate over a separate UNIX socket for each
  volume, populated based on brick name and volume name. Because of the
  multiplexing design only one UNIX socket is opened, so glusterd throws
  a poller error and is not able to fetch the correct status of the
  brick process through the cli. To resolve the problem, a new function
  glusterd_set_socket_filepath_for_mux is added, called by
  glusterd_brick_start, to validate the existence of the socketpath. To
  avoid the continuous EPOLLERR errors in the logs, the socket_connect
  code is updated.

  Test: To reproduce the issue, follow the steps below:
  1) Create two distributed volumes (dist1 and dist2)
  2) Set cluster.brick-multiplex on
  3) Kill glusterd
  4) Run "gluster v status"
  After applying the patch it shows the correct pid for all volumes.

  > BUG: 1444596
  > Change-Id: I5d10af69dea0d0ca19511f43870f34295a54a4d2
  > Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
  > Reviewed-on: https://review.gluster.org/17101
  > Reviewed-by: Prashanth Pai <ppai@redhat.com>
  > Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
  > Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  > (cherry picked from commit 21c7f7baccfaf644805e63682e5a7d2a9864a1e6)

  Change-Id: I1892c80b9ffa93974f20c92d421660bcf93c4cda
  BUG: 1449002
  Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
  Reviewed-on: https://review.gluster.org/17210
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
  Reviewed-by: Prashanth Pai <ppai@redhat.com>
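  A sketch of the socket-path selection this commit introduces;
  glusterd_set_socket_filepath_for_mux is the new function named above,
  while the legacy helper's arguments and the surrounding glue are
  approximations:

      if (is_brick_mx_enabled ())
              /* one UNIX socket per brick process, shared by all the
               * bricks attached to it */
              glusterd_set_socket_filepath_for_mux (brickinfo, sockpath,
                                                    sizeof (sockpath));
      else
              /* legacy: one socket per <volname>-<brickname> pair */
              glusterd_set_brick_socket_filepath (volinfo, brickinfo,
                                                  sockpath,
                                                  sizeof (sockpath));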
* glusterd/geo-rep: Fix snapshot create in geo-rep setup (Kotresh HR, 2017-05-02; 1 file, +3/-4)

  glusterd persists geo-rep sessions in the glusterd info file, which is
  represented by the dictionary 'volinfo->gsync_slaves' in memory.
  glusterd also maintains the in-memory active geo-rep sessions in the
  dictionary 'volinfo->gsync_active_slaves', whose key is
  "<slave_url>::<slavehost>". When glusterd is restarted while geo-rep
  sessions are active, it builds 'volinfo->gsync_active_slaves' from the
  persisted glusterd info file. Since the slave volume uuid is added to
  'volinfo->gsync_slaves' with the commit
  http://review.gluster.org/13111, it builds it with the key
  "<slave_url>::<slavehost>:<slavevol_uuid>", which is wrong. So during
  snapshot pre-validation, which checks whether geo-rep is active or
  not, it always says it is ACTIVE, as geo-rep stop would not have
  deleted this key. Fixed the same in this patch.

  > BUG: 1443977
  > Signed-off-by: Kotresh HR <khiremat@redhat.com>
  > Reviewed-on: https://review.gluster.org/17093
  > Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  > Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
  > (cherry picked from commit f071d2a285ea4802fe8f328f9f275180983fbbba)

  Change-Id: I185178910b4b8a62e66aba406d88d12fabc5c122
  BUG: 1445209
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
  Reviewed-on: https://review.gluster.org/17108
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Aravinda VK <avishwan@redhat.com>
* glusterd: fix glusterd_wait_for_blockers going into an infinite loop (Atin Mukherjee, 2017-04-26; 1 file, +4/-6)

  In send_attach_req () conf->blockers is bumped up before
  rpc_clnt_submit; however it is bumped down twice, once from the
  callback and once from the negative-ret handling, which can very well
  happen if the rpc submit fails.

  > Reviewed-on: https://review.gluster.org/17055
  > Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  > Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
  > (cherry picked from commit 090c8866eb3ae174be50dec8d9d5ecf978d18a45)

  Change-Id: Icb820694034cbfcb3d427911e192ac4a0f4540f6
  BUG: 1445408
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
  Reviewed-on: https://review.gluster.org/17117
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Prashanth Pai <ppai@redhat.com>
  Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
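  A sketch of the single-decrement rule the fix restores;
  submit_attach_req () stands in for the real rpc_clnt_submit call:

      conf->blockers++;                     /* taken before submission */
      ret = submit_attach_req (rpc, frame);
      if (ret)
              /* submit failed: the callback will never fire, so this
               * path performs the one and only matching decrement */
              conf->blockers--;
      /* on success, only the rpc callback decrements */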
* glusterd: set conn->reconnect to null on timer cancellation (Atin Mukherjee, 2017-04-24; 1 file, +1/-0)

  > Reviewed-on: https://review.gluster.org/17088
  > Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  > Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
  > (cherry picked from commit 98dc1f08c114adea1f4133c12dff0d4c3d75b30d)

  Change-Id: Ic48e6652f431daeb0db027660f6c9de16d893f08
  BUG: 1444128
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
  Reviewed-on: https://review.gluster.org/17095
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* glusterd: hold off volume deletes while still restarting bricks (Jeff Darcy, 2017-04-24; 1 file, +28/-11)

  We need to do this because modifying the volume/brick tree while
  glusterd_restart_bricks is still walking it can lead to segfaults.
  Without waiting we could accidentally "slip in" while attach_brick has
  released big_lock between retries and make such a modification.

  Backport of:
  > Commit a7ce0548b7969050644891cd90c0bf134fa1594c
  > BUG: 1432542
  > Reviewed-on: https://review.gluster.org/16927

  Change-Id: I30ccc4efa8d286aae847250f5d4fb28956a74b03
  BUG: 1441476
  Signed-off-by: Jeff Darcy <jeff@pl.atyp.us>
  Reviewed-on: https://review.gluster.org/17044
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
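  A sketch of the delete-side wait, assuming conf->blockers counts
  in-flight attach/restart work; big_lock is dropped while sleeping so
  the restart path can keep making progress:

      static void
      glusterd_wait_for_blockers (glusterd_conf_t *conf)
      {
              while (conf->blockers) {
                      synclock_unlock (&conf->big_lock);
                      sleep (1);
                      synclock_lock (&conf->big_lock);
              }
      }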
* glusterd: support filesystems with dynamic inode sizes (Niels de Vos, 2017-03-28; 1 file, +11/-0)

  btrfs and zfs are two filesystems that do not have fixed sizes for
  inodes. Instead of logging an error, skip the check and mark the size
  as "N/A" like other properties that cannot be reported.

  The error message that was reported by users on the mailinglist shows
  up like:

      [glusterd-utils.c:5458:glusterd_add_inode_size_to_dict]
      0-management: could not find (null) to getinode size for
      /dev/vdb (btrfs): (null) package missing?

  Cherry picked from commit 12921693b572f642156d3167d1c92d3449dfc8ec:
  > Change-Id: Ib10b7a3669f2f4221075715d9fd44ce1ffc35324
  > Reported-by: Arman Khalatyan <arm2arm@gmail.com>
  > URL: http://lists.gluster.org/pipermail/gluster-users/2017-March/030189.html
  > BUG: 1433425
  > Signed-off-by: Niels de Vos <ndevos@redhat.com>
  > Reviewed-on: https://review.gluster.org/16867
  > Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
  > Reviewed-by: Prashanth Pai <ppai@redhat.com>
  > Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>

  Change-Id: Ib10b7a3669f2f4221075715d9fd44ce1ffc35324
  Reported-by: Arman Khalatyan <arm2arm@gmail.com>
  BUG: 1436411
  Signed-off-by: Niels de Vos <ndevos@redhat.com>
  Reviewed-on: https://review.gluster.org/16959
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
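  A sketch of the skip, modelled on glusterd_add_inode_size_to_dict;
  the variable names are assumptions:

      if (!strcmp (fs_name, "btrfs") || !strcmp (fs_name, "zfs")) {
              /* dynamic inode sizes: nothing meaningful to probe */
              snprintf (inode_size, sizeof (inode_size), "N/A");
              goto out;   /* skip the fs-specific probe tool */
      }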
* glusterd: don't queue attach reqs before connecting (Jeff Darcy, 2017-03-10; 1 file, +18/-11)

  This was causing USS tests to fail. The underlying problem here is
  that if we try to queue the attach request too soon after starting a
  brick process then the socket code will get an error trying to write
  to the still-unconnected socket. Its response is to shut down the
  socket, which causes the queued attach requests to be force-unwound.
  There's nothing to retry them, so they effectively never happen, and
  those bricks (second and succeeding for a snapshot) never become
  available.

  We *do* have a retry loop for attach requests, but currently it
  breaks out as soon as a request is queued - not actually sent. The
  fix is to modify that loop so it will wait some more if the rpc
  connection isn't even complete yet. Now we break out only when we
  have a completed connection *and* a queued request.

  Backport of:
  > 53e2c875cf97df8337f7ddb5124df2fc6dd37bca
  > BUG: 1430148
  > Reviewed-on: https://review.gluster.org/16868

  Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
  BUG: 1431176
  Change-Id: Ib6be13646f1fa9072b4a944ab5f13e1b29084841
  Reviewed-on: https://review.gluster.org/16887
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
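  A sketch of the corrected loop; rpc_connected () is a hypothetical
  predicate standing in for the transport's connection check:

      for (tries = 15; tries > 0; tries--) {
              if (rpc_connected (rpc)) {
                      ret = send_attach_req (this, rpc, path, op);
                      if (!ret)
                              break;   /* queued on a live connection */
              }
              sleep (1);   /* not connected yet: wait some more */
      }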
* glusterd: take conn->lock around operations on conn->reconnect (Jeff Darcy, 2017-03-10; 1 file, +2/-1)

  Failure to do this could lead to a race in which a timer would be
  removed twice concurrently, corrupting the timer list (because
  gf_timer_call_cancel has no internal protection against this) and
  possibly causing a crash.

  Backport of:
  > 4e0d4b15717da1f6466133158a26927fb91384b8
  > BUG: 1421721
  > Reviewed-on: https://review.gluster.org/16662

  Change-Id: Ic1a8b612d436daec88fd6cee935db0ae81a47d5c
  BUG: 1431175
  Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-on: https://review.gluster.org/16885
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
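  A sketch of the locking discipline, which also illustrates the
  related "set conn->reconnect to null on timer cancellation" fix
  above; the ctx argument is an assumption:

      pthread_mutex_lock (&conn->lock);
      {
              if (conn->reconnect) {
                      gf_timer_call_cancel (ctx, conn->reconnect);
                      conn->reconnect = NULL;   /* never cancel twice */
              }
      }
      pthread_mutex_unlock (&conn->lock);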
* glusterd: unref brickinfo object on volume stop (Atin Mukherjee, 2017-02-21; 1 file, +1/-1)

  If brick multiplexing is enabled, on a volume stop glusterd was not
  unrefing the brickinfo rpc object, which led to a flood of stale rpc
  logs.

  > Reviewed-on: https://review.gluster.org/16699
  > Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  > Reviewed-by: Samikshan Bairagya <samikshan@gmail.com>
  > Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  > (cherry picked from commit 9cdfbdced23cd43b8738636a3ed906c8d4267d67)

  Change-Id: I18fedcd6921042ef2e945605466194b7b53fe2f7
  BUG: 1425556
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
  Reviewed-on: https://review.gluster.org/16703
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* glusterd: add a cli command to trigger a statedump on a client (Poornima G, 2017-02-13; 1 file, +43/-0)

  With this, we will be able to trigger statedumps on remote Gluster
  clients, mainly targeted at applications using libgfapi.

  Design:
  The SIGUSR signal is the most common way of taking a statedump in
  Gluster. But it cannot be used for libgfapi-based processes, as the
  process loading the library might have already consumed the SIGUSR
  signal. Hence we go the command way: one has to issue a Gluster
  command to initiate a statedump on a libgfapi-based client. The
  command takes hostname and PID as arguments. All the glusterds in the
  cluster check if they are connected to the specified hostname, and
  send an RPC request to all the connected clients from that hostname
  (via the mgmt connection).

  > URL: http://review.gluster.org/16357
  > BUG: 1169302
  > Signed-off-by: Poornima G <pgurusid@redhat.com>
  > [ndevos: minor fixes and split patch in smaller pieces]
  > Reviewed-on-master: https://review.gluster.org/9228
  > Reviewed-by: Niels de Vos <ndevos@redhat.com>
  > Tested-by: Niels de Vos <ndevos@redhat.com>
  > Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
  > Reviewed-by: Samikshan Bairagya <samikshan@gmail.com>

  BUG: 1418981
  Change-Id: Icbe4d2f026b32a2c7d5535e1bfb2cdaaff042e91
  Signed-off-by: Shyam <srangana@redhat.com>
  Reviewed-on: https://review.gluster.org/16601
  Reviewed-by: Niels de Vos <ndevos@redhat.com>
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
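  A usage sketch of the resulting command; the syntax follows the
  upstream statedump documentation for this feature and should be
  treated as an assumption here:

      # gluster volume statedump <VOLNAME> client <hostname>:<pid>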
* glusterd: keep snapshot bricks separate from regular ones (Jeff Darcy, 2017-02-10; 1 file, +124/-82)

  The problem here is that a volume's transport options can change, but
  any snapshots' bricks don't follow along, even though they're now
  incompatible (with respect to multiplexing). This was causing the
  USS+SSL test to fail. By keeping the snapshot bricks separate (though
  still potentially multiplexed with other snapshot bricks, including
  those for other volumes) we can ensure that they remain unaffected by
  changes to their parent volumes.

  Also fixed various issues with how the test waits (or more precisely
  didn't) for various events to complete before it continues.

  Backport of:
  > Change-Id: Iab4a8a44fac5760373fac36956a3bcc27cf969da
  > BUG: 1385758
  > Reviewed-on: https://review.gluster.org/16544

  Change-Id: I91c73e3fdf20d23bff15fbfcc03a8a1922acec27
  BUG: 1418091
  Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-on: https://review.gluster.org/16597
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* glusterd: ignore return code of glusterd_restart_bricks (Atin Mukherjee, 2017-02-10; 1 file, +3/-9)

  When glusterd is restarted on a multi-node cluster, while syncing the
  global options from the other glusterds it checks for quorum, based
  on which it decides whether to stop/start a brick. However we handle
  the return code of this function, so when quorum dictates that no
  bricks be started the return value is non-zero and we end up failing
  the import, which is incorrect.

  The fix is just to ignore the return code of
  glusterd_restart_bricks ().

  > Reviewed-on: https://review.gluster.org/16574
  > Reviewed-by: Samikshan Bairagya <samikshan@gmail.com>
  > Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  > Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  > (cherry picked from commit 55625293093d485623f3f3d98687cd1e2c594460)

  Change-Id: I37766b0bba138d2e61d3c6034bd00e93ba43e553
  BUG: 1420991
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
  Reviewed-on: https://review.gluster.org/16593
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Samikshan Bairagya <samikshan@gmail.com>
  Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* glusterd: do not load io-threads in client graph for replicate volumes (Atin Mukherjee, 2017-02-06; 1 file, +19/-1)

  client.io-threads has been turned on by default from release-3.9
  onwards; however this has adverse effects on replicate volumes due to
  design limitations in replication. Till that gets addressed through
  server-side replication, as a preventive measure it is wiser not to
  load io-threads in the client graph for replicate volumes.

  > Reviewed-on: https://review.gluster.org/16502
  > Reviewed-by: Prashanth Pai <ppai@redhat.com>
  > Reviewed-by: Ravishankar N <ravishankar@redhat.com>
  > Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  > Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>

  Change-Id: Ibc576d4517da23fcdf55c6f4d17b90152a8817d7
  BUG: 1419305
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
  Reviewed-on: https://review.gluster.org/16545
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Prashanth Pai <ppai@redhat.com>
  Reviewed-by: Ravishankar N <ravishankar@redhat.com>
* core: run many bricks within one glusterfsd process (Jeff Darcy, 2017-02-01; 1 file, +562/-51)

  This patch adds support for multiple brick translator stacks running
  in a single brick server process. This reduces our per-brick memory
  usage by approximately 3x, and our appetite for TCP ports even more.
  It also creates potential to avoid process/thread thrashing, and to
  improve QoS by scheduling more carefully across the bricks, but
  realizing that potential will require further work.

  Multiplexing is controlled by the "cluster.brick-multiplex" global
  option. By default it's off, and bricks are started in separate
  processes as before. If multiplexing is enabled, then *compatible*
  bricks (mostly those with the same transport options) will be started
  in the same process.

  Backport of:
  > Change-Id: I45059454e51d6f4cbb29a4953359c09a408695cb
  > BUG: 1385758
  > Reviewed-on: https://review.gluster.org/14763

  Change-Id: I4bce9080f6c93d50171823298fdf920258317ee8
  BUG: 1418091
  Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-on: https://review.gluster.org/16496
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
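  Since it is a global option, enabling it targets the special volume
  name "all" (the same convention the "volume get all" entries below
  describe):

      # gluster volume set all cluster.brick-multiplex on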
* glusterd: daemon restart logic should adhere to server-side quorum (Atin Mukherjee, 2017-01-30; 1 file, +7/-6)

  Just like brick processes, other daemon services should also follow
  the same quorum-check logic to decide whether a particular service
  needs to come up when glusterd is restarted or an incoming friend
  add/update request is received (in the glusterd_restart_bricks ()
  function).

  > Reviewed-on: https://review.gluster.org/15626
  > Reviewed-by: Prashanth Pai <ppai@redhat.com>
  > Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>

  Change-Id: I54a1fbdaa1571cc45eed627181b81463fead47a3
  BUG: 1417042
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
  Reviewed-on: https://review.gluster.org/16472
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
  Reviewed-by: Samikshan Bairagya <samikshan@gmail.com>
  Reviewed-by: Prashanth Pai <ppai@redhat.com>
* dht/rebalance: Estimate time to complete rebalance (N Balachandran, 2017-01-24; 1 file, +45/-2)

  The estimates will be logged to the rebalance log on running

      gluster v rebalance <vol> status

  > Change-Id: I9d51b139cd4c8dfde1ff2c2050720ae606c13fc6
  > BUG: 1396004
  > Signed-off-by: N Balachandran <nbalacha@redhat.com>
  > Reviewed-on: http://review.gluster.org/15893
  > Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  > Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  (cherry picked from commit 2edd75ec8de17da89004859375844f60890a4df0)

  Signed-off-by: N Balachandran <nbalacha@redhat.com>
  Change-Id: I58b0550210149443966798d9a26e73cb598eeb6a
  BUG: 1415915
  Reviewed-on: https://review.gluster.org/16458
  Tested-by: N Balachandran <nbalacha@redhat.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
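  A sketch of the rate-based projection such an estimate implies; the
  field names are assumptions, not the exact dht code:

      /* files/sec observed so far, projected over what is left */
      uint64_t elapsed   = now - start_time;               /* seconds */
      uint64_t rate      = elapsed ? (files_processed / elapsed) : 0;
      uint64_t remaining = total_files - files_processed;
      uint64_t eta_secs  = rate ? (remaining / rate) : 0;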
* glusterd: Change the volfile to have readdir-ahead as a child of dht (Poornima G, 2017-01-17; 1 file, +0/-16)

  As mentioned in the feature page http://review.gluster.org/#/c/16090/,
  readdir-ahead will optionally be placed below dht. There are two
  options:
  1. performance.readdir-ahead
  2. performance.parallel-readdir

  If only option 1 is enabled, then readdir-ahead is placed at its
  original place as an ancestor of dht. If both options 1 and 2 are
  enabled, then readdir-ahead is placed as a child of dht.

  Also, changes have been made so that the rebalance, quotad and snapd
  volfiles remain unchanged.

  Change-Id: I0adf0b476fcbf91251f5a2fee2241786a3d8255a
  BUG: 1401812
  Signed-off-by: Poornima G <pgurusid@redhat.com>
  Reviewed-on: http://review.gluster.org/16072
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* tier: Tier as a service (hari gowtham, 2017-01-16; 1 file, +352/-72)

  tierd is implemented by separating it from the rebalance process. The
  commands affected:
  1) "attach tier" triggers this process instead of the old one.
  2) "tier start" and "tier start force" also trigger this process.
  3) "volume status [tier]" shows the tier daemon as a process instead
     of a task, and normal tier status and tier detach status work.
  4) "tier stop" is implemented.
  5) "detach tier" is implemented separately, along with a new detach
     tier status.
  6) "volume tier <volname> status" works using the changes.
  7) "volume set" works.

  This patch has separated the tier translator from the legacy DHT
  rebalance code. It now sends the RPCs from the CLI to glusterd
  separately from the DHT rebalance code. The daemon is now a service,
  similar to the snapshot daemon, and can be viewed using the volume
  status command. The code for the validation and commit phases is the
  same as the earlier tier validation code in DHT rebalance. The
  "brickop" phase has been changed so that the status command can use
  this framework.

  The service management framework is now used; DHT rebalance does not
  use it. This framework takes care of:
  *) spawning the daemon, killing it, and other such processes.
  *) volume set options, which are written to the volfile.
  *) restart and reconfigure functions. Restart is to restart the
     daemon at two points: 1) after gluster goes down and comes up, and
     2) to stop detach tier.
  *) reconfigure is used to make immediate volfile changes without
     restarting the daemon; it also has the code to rewrite the volfile
     for topological changes (which come into play during add and
     remove brick).

  With this patch the log, pid and volfile are separated and put into
  their respective directories.

  Change-Id: I3681d0d66894714b55aa02ca2a30ac000362a399
  BUG: 1313838
  Signed-off-by: hari gowtham <hgowtham@redhat.com>
  Reviewed-on: http://review.gluster.org/13365
  Tested-by: hari gowtham <hari.gowtham005@gmail.com>
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd: remove extra space in error message (Michael Adam, 2017-01-16; 1 file, +2/-2)

  BUG: 1402237
  Change-Id: Ib6efca655555a92a0542ef6056f3357f390eeb38
  Signed-off-by: Michael Adam <obnox@samba.org>
  Reviewed-on: http://review.gluster.org/16048
  Tested-by: Atin Mukherjee <amukherj@redhat.com>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* glusterd: Get maximum supported op-version in a cluster (Samikshan Bairagya, 2017-01-08; 1 file, +94/-1)

  "gluster volume get <VOLNAME> cluster.op-version" gives us the current
  op-version on which the cluster is operating. There is no command that
  lets the user know the maximum supported op-version that the cluster
  can run on. This patch adds a new global option,
  cluster.max-op-version, that can be used to retrieve the maximum
  supported op-version in a cluster.

  Usage:
      # gluster volume get all cluster.max-op-version

  Example output:
      Option                                  Value
      ------                                  -----
      cluster.max-op-version                  30900

  NOTE: The only way to test this feature for now is to set the
  GD_OP_VERSION_MAX macro to different values (30800 for 3.8, 30900 for
  3.9, and so on) and rebuild glusterd. Since the regression test
  framework currently doesn't have support to simulate these tests,
  there are no accompanying regression tests for this feature. It
  should be possible to add tests once glusto comes in and makes it
  easier to run a heterogeneous cluster.

  Change-Id: I547480ee5e7912664784643e436feb198b6d16d0
  BUG: 1365822
  Signed-off-by: Samikshan Bairagya <samikshan@gmail.com>
  Reviewed-on: http://review.gluster.org/16283
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd, cli: Get global options through volume get functionality (Samikshan Bairagya, 2016-12-30; 1 file, +120/-0)

  Currently it is not possible to retrieve values of global options
  using the 'gluster volume get' functionality if there are no volumes
  present. In order to get the global options one has to use 'gluster
  volume get' with a specific volume name. This usage creates the
  illusion that the option is set only on one volume, which is
  incorrect. When setting global options, 'gluster volume set' provides
  a way to set them using the volume name 'all'. Similarly, retrieving
  global options should be made possible by using the volume name 'all'
  with the 'gluster volume get' functionality. This patch adds that
  functionality to 'volume get'.

  Usage:
      # gluster volume get all <OPTION/all>

  Change-Id: Ic2fdb9eda69d4806d432dae26d117d9660fe6d4e
  BUG: 1378842
  Signed-off-by: Samikshan Bairagya <samikshan@gmail.com>
  Reviewed-on: http://review.gluster.org/15563
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* socket: socket disconnect should wait for poller thread exit (Rajesh Joseph, 2016-12-21; 1 file, +2/-1)

  When SSL is enabled, or if the "transport.socket.own-thread" option
  is set, then socket_poller runs as a separate thread. Currently,
  during disconnect or a PARENT_DOWN scenario, we don't wait for this
  thread to terminate. PARENT_DOWN will disconnect the socket layer and
  clean up resources used by socket_poller. Therefore, before
  disconnect we should wait for the poller thread to exit.

  Change-Id: I71f984b47d260ffd979102f180a99a0bed29f0d6
  BUG: 1404181
  Signed-off-by: Rajesh Joseph <rjoseph@redhat.com>
  Reviewed-on: http://review.gluster.org/16141
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Kaushal M <kaushal@redhat.com>
  Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
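  A sketch of the wait, assuming the poller's thread id is stored on
  the socket private struct as in rpc-transport/socket; field names are
  approximations:

      /* ask the poller loop to stop, then join it before tearing down
       * the resources it still uses */
      if (priv->own_thread && priv->thread != 0) {
              pthread_join (priv->thread, NULL);
              priv->thread = 0;
      }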
* glusterd: coverity fix for string overflow (Muthu-vigneshwaran, 2016-12-06; 1 file, +4/-2)

  CID: 1357872, 1357873, 1351695

  BUG: 789278
  Change-Id: I2ee01a6054326f35de621ee7a1f2afd09c5738fe
  Signed-off-by: Muthu-vigneshwaran <mvignesh@redhat.com>
  Reviewed-on: http://review.gluster.org/15989
  Tested-by: Muthu Vigneshwaran
  Reviewed-by: Manikandan Selvaganesh <manikandancs333@gmail.com>
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
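  The usual remedy for this class of Coverity finding, shown here as a
  generic sketch rather than the exact change:

      char brick_path[PATH_MAX] = {0};

      /* bound the copy to the destination size instead of strcpy */
      if ((size_t) snprintf (brick_path, sizeof (brick_path), "%s",
                             src_path) >= sizeof (brick_path)) {
              ret = -1;   /* would have truncated: treat as error */
              goto out;
      }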
* glusterd: fix few events generation (Atin Mukherjee, 2016-11-23; 1 file, +6/-5)

  This patch does the following:
  1. Generate a PEER_REJECT event if the peer add request is from an
     unknown peer during peer handshaking.
  2. EVENT_COMPARE_FRIEND_VOLUME_FAILED should be generated based on
     status code, not ret.
  3. Add an EVENT_BRICKPATH_RESOLVE_FAILED event for the case where
     glusterd fails to resolve bricks, mainly in the restore path.
  4. Remove the EVENT_BRICKS_START_FAILED event, as we already have
     EVENT_BRICK_START_FAILED.

  Change-Id: I90e5bc4a331166d0bb3554eb2ec9df2526837a1d
  BUG: 1397424
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
  Reviewed-on: http://review.gluster.org/15903
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Samikshan Bairagya <samikshan@gmail.com>
* events: Add FMT_WARN for gf_event (Pranith Kumar K, 2016-11-18; 1 file, +3/-10)

  Raghavendra G found that posix is trying to print a %s but passing an
  int when HEALTH_CHECK fails in posix. These are the kind of bugs that
  should be caught at compilation itself. Also fixed the problematic
  gf_event() callers.

  BUG: 1386097
  Change-Id: Id7bd6d9a9690237cec3ca1aefa2aac085e8a1270
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: http://review.gluster.org/15671
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
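  A sketch of what printf-style format checking for gf_event() looks
  like; the exact FMT_WARN macro in libglusterfs may differ:

      #define FMT_WARN(fmt_pos, arg_pos) \
              __attribute__ ((__format__ (__printf__, fmt_pos, arg_pos)))

      int
      gf_event (eventtypes_t event, const char *fmt, ...) FMT_WARN (2, 3);

      /* gf_event (EVENT_X, "%s", some_int) now warns at compile time */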
* snapshot: Fix for memory leaks in snapshot code path (Avra Sengupta, 2016-10-25; 1 file, +15/-2)

  Change-Id: Idc2cb16574d166e3c0ee1f7c3a485f1acb19fc8c
  BUG: 1386088
  Signed-off-by: Avra Sengupta <asengupt@redhat.com>
  Reviewed-on: http://review.gluster.org/15668
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
* cli, glusterd: Address issues in get-state cli output (Samikshan Bairagya, 2016-10-20; 1 file, +2/-2)

  This fixes the following data points:
  1. Volume type
  2. Peer state
  3. List of other hostnames for a peer
  4. Data unit information for rebalance

  The following data points are removed:
  1. Mount options and filesystem types for bricks
  2. global-option-version from the list of global options

  The following data points are added:
  1. Replica count
  2. Tier type for bricks belonging to hot/cold tier

  Change-Id: I5011250e863fdc4929b203cdb345d79b2f16c6a5
  BUG: 1385839
  Signed-off-by: Samikshan Bairagya <samikshan@gmail.com>
  Reviewed-on: http://review.gluster.org/15662
  Reviewed-by: mohammed rafi kc <rkavunga@redhat.com>
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd: set the brickinfo->port before spawning the bricks (Atin Mukherjee, 2016-10-18; 1 file, +8/-4)

  As of now, when glusterd spawns a brick process, the brickinfo's port
  is set only after the spawn. The side effect of this is that it opens
  up a window where the pmap_signin event can be initiated by the brick
  to glusterd, and glusterd fails to update the signed_in flag since
  the brickinfo port is still 0 and the comparison of the port with
  brickinfo->port fails.

  As a solution, set brickinfo->port just after pmap_registry_alloc,
  and if the brick spawn fails reset it to 0. The same logic applies to
  the rdma port too.

  Change-Id: I00a13d4c6d6809ebd19a972aa13e71ee5eac7e35
  BUG: 1385575
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
  Reviewed-on: http://review.gluster.org/15655
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Samikshan Bairagya <samikshan@gmail.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
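  A sketch of that ordering; pmap_registry_alloc () matches the pmap
  code, while the spawn helper and rollback are simplified:

      brickinfo->port = pmap_registry_alloc (THIS);
      ret = glusterd_volume_start_glusterfs (volinfo, brickinfo, wait);
      if (ret)
              brickinfo->port = 0;   /* spawn failed: no stale port */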
* xlators/glusterd: fix unused variable warnings/errors (Kaleb S. KEITHLEY, 2016-09-13; 1 file, +0/-2)

  http://review.gluster.org/14085 fixes a "pragma leak" where the
  generated rpc/xdr headers have a pair of pragmas that disable these
  warnings. With the warnings disabled, many unused variables have
  crept into the code base. And 14085 won't pass its own smoke test
  until all these warnings are fixed.

  BUG: 1369124
  Change-Id: I7ccaa4f1cc817fa81082cee83e99a2dc7e542e17
  Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
  Reviewed-on: http://review.gluster.org/15479
  Reviewed-by: Kotresh HR <khiremat@redhat.com>
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd (utils): fix unused variable warnings/errors (Kaleb S. KEITHLEY, 2016-09-12; 1 file, +3/-39)

  http://review.gluster.org/14085 fixes a/the "leak" - via the
  generated rpc/xdr headers - of pragmas that mask these warnings.
  However 14085 won't pass the smoke test until all the warnings are
  fixed.

  Change-Id: I3efaad13cc4cbba7853ada20e838b6fe417ca2d6
  BUG: 1369124
  Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
  Reviewed-on: http://review.gluster.org/15280
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd: Introduce reset brick (Anuradha Talur, 2016-08-29; 1 file, +324/-3)

  The command basically allows replace-brick with the src and dst
  bricks being the same.

  Usage:
      gluster v reset-brick <volname> <hostname:brick-path> start

  This command kills the brick to be reset. Once this command is run,
  the admin can do other manual operations that they need to do, like
  configuring some options for the brick. Once this is done, resetting
  the brick can be continued with the following options:

      gluster v reset-brick <vname> <hostname:brick> <hostname:brick> commit {force}

  This does the job of resetting the brick. The 'force' option should
  be used when the brick already contains the volinfo id.

  Problem: On doing a disk replacement of a brick in a replicate
  volume, the following two scenarios may occur:
  a) there is a chance that reads are served from this replaced-disk
     brick, which leads to empty reads.
  b) potential data loss if the next writes succeed only on the
     replaced brick, and heal is done to the other bricks from this
     one.

  Solution: After disk replacement, make sure that the reset-brick
  command is run for that brick so that pending markers are set for
  the brick and it is not chosen as a source for reads and heal. But,
  as of now, replace-brick for the same brick path is not allowed. In
  order to fix the above-mentioned problem, same-brick-path
  replace-brick is needed.

  With this patch, reset-brick commit {force} will be allowed even
  when source and destination <hostname:brickpath> are identical, as
  long as
  1) the destination brick is not alive, and
  2) the source and destination bricks have the same brick uuid and
     path.
  Also, the destination brick after replace-brick will use the same
  port as the source brick.

  Change-Id: I440b9e892ffb781ea4b8563688c3f85c7a7c89de
  BUG: 1266876
  Signed-off-by: Anuradha Talur <atalur@redhat.com>
  Reviewed-on: http://review.gluster.org/12250
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Ashish Pandey <aspandey@redhat.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* glusterd/cli: cli to get local state representation from glusterd (Samikshan Bairagya, 2016-08-26; 1 file, +194/-0)

  Currently there is no CLI that can be used to get the local state
  representation of the cluster as maintained in glusterd in a readable
  as well as parseable format. The CLI added has the following usage:

      # gluster get-state [daemon] [odir <path/to/output/dir>] [file <filename>]

  This dumps data points that reflect the local state representation of
  the cluster as maintained in glusterd (no other daemons are supported
  as of now) to a file inside the specified output directory. The
  default output directory and filename are /var/run/gluster and
  glusterd_state_<timestamp> respectively. The option for specifying
  the daemon name leaves room to add support for other daemons in the
  future.

  Following are the data points captured as of now to represent the
  state from the local glusterd pov:

  * Peer:
    - Primary hostname
    - uuid
    - state
    - connection status
    - List of hostnames
  * Volumes:
    - name, id, transport type, status
    - counts: bricks, snap, subvol, stripe, arbiter, disperse,
      redundancy
    - snapd status
    - quorum status
    - tiering related information
    - rebalance status
    - replace bricks status
    - snapshots
  * Bricks:
    - Path, hostname (this info will be shown for all bricks)
    - port, rdma port, status, mount options, filesystem type and
      signed-in status for bricks running locally
  * Services:
    - name, online status for initialised services
  * Others:
    - Base port, last allocated port
    - op-version
    - MYUUID

  Change-Id: I4a45cc5407ab92d8afdbbd2098ece851f7e3d618
  BUG: 1353156
  Signed-off-by: Samikshan Bairagya <samikshan@gmail.com>
  Reviewed-on: http://review.gluster.org/14873
  Reviewed-by: Avra Sengupta <asengupt@redhat.com>
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* Revert "glusterd-ganesha : copy ganesha export configuration files during reboot" (Jiffin Tony Thottan, 2016-08-26; 1 file, +0/-32)

  This reverts commit f71e2fa49af185779b9f43e146effd122d4e9da0.

  Reason: As part of sync-up on node reboot, this patch copied the
  ganesha export conf file from a source node. This change is no longer
  required if the export files are available in shared storage.

  Change-Id: Id9c1ae78377bbd7d5d80aa1c14f534e30feaae97
  BUG: 1355956
  Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com>
  Reviewed-on: http://review.gluster.org/14907
  Reviewed-by: soumya k <skoduri@redhat.com>
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* glusterd: add async events (part 2) (Atin Mukherjee, 2016-08-23; 1 file, +49/-11)

  Change-Id: I7a5687143713c283f0051aac2383f780e3e43646
  BUG: 1360809
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
  Reviewed-on: http://review.gluster.org/15153
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Samikshan Bairagya <samikshan@gmail.com>
* glusterd: Fix volume restart issue upon glusterd restart (Samikshan Bairagya, 2016-08-17; 1 file, +1/-1)

  http://review.gluster.org/#/c/14758/ introduces a check in
  glusterd_restart_bricks that makes sure that if server quorum is
  enabled and the glusterd instance has been restarted, the bricks do
  not get started. This prevents bricks which have been brought down
  purposely, say for maintenance, from getting started upon a glusterd
  restart. However, this change introduced a regression for a situation
  involving multiple volumes: the bricks from the first volume get
  started, but for the subsequent volumes the bricks do not. This patch
  fixes that by setting the value of conf->restart_done to _gf_true
  only after bricks are started correctly for all volumes.

  Change-Id: I2c685b43207df2a583ca890ec54dcccf109d22c3
  BUG: 1367478
  Signed-off-by: Samikshan Bairagya <samikshan@gmail.com>
  Reviewed-on: http://review.gluster.org/15183
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd: clean up old port and allocate new one on every restart (Atin Mukherjee, 2016-08-03; 1 file, +6/-6)

  GlusterD as of now was blindly assuming that the brick port which was
  already allocated would be available to be reused, and that
  assumption is absolutely wrong.

  Solution: On a first attempt, we thought GlusterD should check if the
  already allocated brick ports are free, and if not allocate new ports
  and pass them to the daemon. But with that approach there is a
  possibility that if a PMAP_SIGNOUT is missed, the stale port will be
  given back to the clients, where connections will keep on failing.
  Now, given that port allocation always starts from base_port, even if
  a new port has to be allocated for the daemons every time, the port
  range will still be under control. So this fix tries to clean up the
  old port using pmap_registry_remove (), if any, and then goes for
  pmap_registry_alloc ().

  Change-Id: If54a055d01ab0cbc06589dc1191d8fc52eb2c84f
  BUG: 1221623
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
  Reviewed-on: http://review.gluster.org/15005
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Avra Sengupta <asengupt@redhat.com>
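  A sketch of the restart-time sequence; the pmap_registry_* calls are
  the ones named above, while their exact argument lists here are
  approximations:

      if (brickinfo->port)
              /* drop whatever the store remembered from last boot */
              pmap_registry_remove (THIS, brickinfo->port,
                                    brickinfo->path,
                                    GF_PMAP_PORT_BRICKSERVER, NULL);

      /* always hand the daemon a freshly allocated port */
      brickinfo->port = pmap_registry_alloc (THIS);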
* changelog/rpc: Fix rpc_clnt_t mem leaks (Kotresh HR, 2016-07-22; 1 file, +1/-0)

  PROBLEM:

  1. Freeing up the rpc_clnt object might lead to crashes. Well, it was
     not a necessity to free the rpc-clnt object till now, because all
     the existing use cases need to reconnect back on disconnects.
     Hence the timer code was not taking a ref on the rpc-clnt object.
     Glusterd had some use cases that led to crashes due to the
     ping-timer, and they fixed only those code paths that involve the
     ping-timer. Now, since changelog has a use case where rpc-clnt
     needs to be freed up, we need to fix the timer code to take refs.

  2. In changelog, because of issue 1, only mydata was being freed,
     which is incorrect. And there are races where the rpc-clnt object
     would access the freed mydata, which would lead to crashes. Since
     the changelog xlator resides on the brick side and is a
     long-living process, if multiple libgfchangelog consumers register
     to changelog and disconnect/reconnect multiple times, it would
     result in a leak of the 'rpc-clnt' object for every
     connect/disconnect.

  SOLUTION:

  1. Handle ref/unref of the 'rpc_clnt' structure in the timer
     functions properly.
  2. In changelog, unref 'rpc_clnt' in RPC_CLNT_DISCONNECT after
     disabling timers, and free mydata on RPC_CLNT_DESTROY.

  RPC SETUP IN CHANGELOG:

  1. The changelog xlator initiates an rpc server, say
     'changelog_rpc_server'.
  2. libgfchangelog initiates one rpc server, say
     'libgfchangelog_rpc_server'.
  3. libgfchangelog initiates an rpc client and connects to
     'changelog_rpc_server'.
  4. In return, changelog_rpc_server initiates an rpc client and
     connects back to 'libgfchangelog_rpc_server'.

  REF/UNREF HANDLING IN TIMER FUNCTIONS:

  Let's say the rpc-clnt ref count = 1.
  1. Take a ref before registering a callback to the timer queue:
     rpc_clnt_ref (say ref count becomes = 2)
  2. Register a callback to the timer, say 'callback1'.
  3. If registration fails: rpc_clnt_unref (ref count = 1)
  4. On timer expiration, 'callback1' gets called. So unref the
     rpc-clnt at the end of 'callback1'; this corresponds to the ref
     taken in step 1: rpc_clnt_unref (ref count = 1)
  5. The cycle from step 1 to step 4 continues... until a timer-cancel
     event happens.
  6. On timer cancel of, say, 'callback1':
     if the timer cancel fails, do nothing; step 4 would have unrefd.
     If the timer cancel succeeds: rpc_clnt_unref (ref count = 1)

  Change-Id: I91389bc511b8b1a17824941970ee8d2c29a74a09
  BUG: 1316178
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
  Reviewed-on: http://review.gluster.org/13658
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
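  A sketch of that discipline; rpc_clnt_ref/unref and
  gf_timer_call_after are real libglusterfs calls, while the callback
  body and the timeout value are illustrative:

      static void
      ping_timer_cbk (void *data)
      {
              struct rpc_clnt *rpc = data;
              /* ... timer work ... */
              rpc_clnt_unref (rpc);   /* step 4: drop the step-1 ref */
      }

      static int
      register_ping_timer (glusterfs_ctx_t *ctx, struct rpc_clnt *rpc)
      {
              struct timespec delta = { .tv_sec = 10, .tv_nsec = 0 };
              gf_timer_t *timer;

              rpc_clnt_ref (rpc);     /* step 1: ref before registering */
              timer = gf_timer_call_after (ctx, delta,
                                           ping_timer_cbk, rpc);
              if (!timer) {
                      rpc_clnt_unref (rpc);   /* step 3: register failed */
                      return -1;
              }
              return 0;
      }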
* glusterd: fix compilation warning (Atin Mukherjee, 2016-07-18; 1 file, +4/-3)

      glusterd-utils.c: In function 'glusterd_handle_replicate_brick_ops':
      glusterd-utils.c:11402:9: warning: this 'if' clause does not guard... [-Wmisleading-indentation]
               if (dict_get_str (THIS->options, "transport.socket.bind-address",
               ^~
      glusterd-utils.c:11406:17: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the 'if'
                       snprintf (logfile, sizeof (logfile),

  Solution: indentation does the magic :)

  Change-Id: I887fcba69ba1e952cc635d939e636d69e227f8b8
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
  Reviewed-on: http://review.gluster.org/14937
  Reviewed-by: Anuradha Talur <atalur@redhat.com>
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
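  What the remedied shape looks like: brace the guarded statement so
  the indentation matches the control flow. The call arguments beyond
  what the warning text shows are assumptions:

      if (dict_get_str (THIS->options, "transport.socket.bind-address",
                        &volfileserver) == 0) {
              /* only this block is guarded by the if */
              snprintf (logfile, sizeof (logfile),
                        "%s/%s-replace-brick-mount.log",
                        DEFAULT_LOG_FILE_DIRECTORY, volinfo->volname);
      }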
* glusterd: Fix gsyncd upgrade issue (Kotresh HR, 2016-07-13; 1 file, +4/-28)

  Problem: gluster upgrade is not generating new volfiles.

  Cause: During an upgrade, "glusterd --xlator-option *.upgrade=on -N"
  is run to generate new volfiles. It is run post 'glusterfs' rpm
  installation. The above command fails during an upgrade if
  geo-replication is installed, because on glusterd start the 'gsyncd'
  binary is called to configure geo-replication related stuff. Since
  the 'glusterfs' rpm is installed prior to the 'geo-rep' rpm, the
  'gsyncd' binary used during the glusterd upgrade command is of the
  old version, and hence it fails before generating new volfiles.

  Solution: Don't call the geo-replication configure during
  upgrade/downgrade. Geo-replication configuration happens during the
  start of glusterd after the upgrade.

  Change-Id: Id58ea44ead9f69982f86fb68dc5b9ee3f6cd11a1
  BUG: 1355628
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
  Reviewed-on: http://review.gluster.org/14898
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* glusterd/geo-rep: Handle empty monitor.status during upgrade (Saravanakumar Arumugam, 2016-07-11; 1 file, +4/-4)

  Problem: Consider geo-replication in the Stopped state, following
  which glusterfs is upgraded (where monitor.status is the new status
  file). Now, when the geo-replication status command is run, an empty
  monitor status file gets created. If glusterd is then restarted, it
  reads the empty monitor status and starts the geo-replication
  session. This is incorrect, as the session was in the Stopped state
  earlier.

  Solution: If the monitor status is empty, error out and avoid
  starting the geo-replication session.

  Note: if the monitor status is empty, the geo-rep session is
  displayed as being in the Stopped state.

  Change-Id: Ifb3db896e5ed92b927764cf1163503765cb08bb4
  BUG: 1351071
  Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com>
  Reviewed-on: http://review.gluster.org/14830
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* feature/bitrot: Show whether scrub is in progress/idle (Kotresh HR, 2016-07-11; 1 file, +26/-0)

  Bitrot scrub status shows whether the scrub is paused or active, but
  not whether the scrubber is actually scrubbing or waiting in the
  timer wheel for the next schedule. This patch shows these states with
  "In Progress" and "Idle" respectively.

  Change-Id: I995d8553d1ff166503ae1e7b46282fc3ba961f0b
  BUG: 1352871
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
  Reviewed-on: http://review.gluster.org/14864
  Smoke / NetBSD-regression / CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>