path: root/xlators/mgmt/glusterd/src
Commit message | Author | Age | Files | Lines
* glusterd: fix crash on statedump when no volumes are started | Atin Mukherjee | 2017-06-15 | 1 | -12/+17

    The pmap object is created when glusterd allocates a port for the very first time. If a statedump is taken before that, glusterd crashes.

    Solution: add a NULL check before accessing the pmap reference.

    Change-Id: I206b02e07a4717e68af2c6bf05fac55119353de8
    BUG: 1461655
    Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
    Reviewed-on: https://review.gluster.org/17549
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Samikshan Bairagya <samikshan@gmail.com>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* glusterd: fix quorum calculation if percentage is not specified correctly | Michael Adam | 2017-06-14 | 1 | -1/+0

    There was an extra "ratio = _gf_true".

    - In case the ratio was specified correctly in the volfile, this is redundant.
    - In case the ratio was specified, but not parseable into a percentage, this is wrong and would lead to a quorum count of 0 instead of falling back to the default of 50% + 1.

    This patch removes the extra setting of "ratio".

    Change-Id: I2bd57ebf1b8989e905481a2b6285a1f422942f72
    BUG: 1461129
    Signed-off-by: Michael Adam <obnox@samba.org>
    Reviewed-on: https://review.gluster.org/17538
    Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
    Tested-by: Atin Mukherjee <amukherj@redhat.com>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
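The fallback described in this commit can be sketched as follows. This is a minimal illustration in Python, not glusterd's C code: the function name `quorum_count`, the `ratio_option` parameter, and the round-up rule for a valid percentage are all assumptions made for the example. Only the "fall back to 50% + 1 when the percentage cannot be parsed" behaviour comes from the commit message.

```python
import math


def quorum_count(active_count, ratio_option=None):
    """Return how many active peers are needed to meet quorum.

    ratio_option is the raw option string (hypothetical name); None means
    the option was not set at all.
    """
    percent = None
    if ratio_option is not None:
        try:
            percent = float(ratio_option.rstrip("%"))
        except ValueError:
            # Unparseable value: treat it exactly like "not specified".
            percent = None
    if percent is not None and 0 <= percent <= 100:
        # Assumption: a valid percentage is rounded up to whole peers.
        return math.ceil(active_count * percent / 100)
    # Default from the commit message: 50% + 1.
    return active_count * 50 // 100 + 1
```

With this shape, `quorum_count(10, "abc")` takes the default branch rather than yielding 0, which is the failure the commit removes.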
* Revert "glusterd: disallow rebalance & remove-brick on a sharded volume" | Krutika Dhananjay | 2017-06-13 | 2 | -19/+0

    This reverts commit 8375b3d70d5c6268c6770b42a18b2e1bc09e411e.

    Now that some users have confirmed that rebalance works fine without causing corruption of VMs, it is time to revert the CLI restriction.

    Change-Id: I45493fcbb1f25fd0fff27b2b3526c42642ccb464
    BUG: 1460585
    Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
    Reviewed-on: https://review.gluster.org/17506
    Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* nl-cache: Fix a possible crash and stale cache | Poornima G | 2017-06-13 | 1 | -1/+0

    Issue1:
    Consider the following sequence of operations:
      ...
      nlc_ctx = nlc_ctx_get (inode i1)
      ....... -> nlc_clear_cache (i1) gets called as a part of nlc_invalidate or any other callers
                 ...
                 GF_FREE (ii nlc_ctx)
      LOCK (nlc_ctx->lock); -> This will result in crash as the ctx got freed in nlc_clear_cache.

    Issue2:
      lookup on dir1/file1 result in ENOENT
      add cache to dir1 at time T1
      ....
      CHILD_DOWN at T2
      lookup on dir1/file2 result in ENOENT
      add cache to dir1, but the cache time is still T1
      lookup on dir1/file2 - should have been served from cache but the cache time is T1 < T2, hence cache is considered as invalid.
    So, after CHILD_DOWN the right thing would be to clear the cache and restart caching on that inode.

    Solution:
    Do not free nlc_ctx in nlc_clear_cache, but only in inode_forget()

    The fixes for issue1 and issue2 are interleaved, hence they are sent as a single patch.

    Change-Id: I83d8ed36c049a93567c6d7e63d045dc14ccbb397
    BUG: 1458539
    Signed-off-by: Poornima G <pgurusid@redhat.com>
    Reviewed-on: https://review.gluster.org/17453
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* glusterd: log stale rpc disconnects occasionally | Atin Mukherjee | 2017-06-09 | 1 | -3/+6

    When brick mux is enabled and a brick process is killed with SIGKILL (not SIGTERM), glusterd may continue to receive disconnect events from the stale rpc, which can flood the glusterd log. The fix is to use GF_LOG_OCCASIONALLY.

    Change-Id: I95a10c8be2346614e0a3458f98d9f99aab34800a
    BUG: 1460225
    Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
    Reviewed-on: https://review.gluster.org/17499
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
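The idea behind occasional logging is to emit only every N-th occurrence of a repeating message. The sketch below is a Python illustration of that pattern, not the GF_LOG_OCCASIONALLY macro itself; the class name `OccasionalLogger` and the interval of 42 are assumptions chosen for the example.

```python
class OccasionalLogger:
    """Emit the first message and then every `every`-th one after it."""

    def __init__(self, every=42, sink=print):
        self.every = every  # assumed interval, not taken from the commit
        self.count = 0
        self.sink = sink

    def log(self, message):
        # Decide before incrementing so that the very first call is emitted.
        emit = self.count % self.every == 0
        self.count += 1
        if emit:
            self.sink(message)
        return emit
```

A caller keeps one such counter per call site, so a burst of stale disconnect events produces a handful of log lines instead of one per event.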
* protocol/server: make listen backlog value as configurable | Mohammed Rafi KC | 2017-06-08 | 4 | -7/+68

    problem:
    When we call listen from protocol/server, we pass a hard-coded value of 10 if none is given manually. With multiplexing, especially when glusterd restarts, all clients may try to connect to the server at the same time. This overflows the queue, and the kernel complains about the errors.

    Solution:
    This patch introduces a volume set command to make the backlog value configurable. It also changes the default backlog from 10 to 128. This change applies only to sockets listening from protocol.

    Example:
    gluster volume set <volname> transport.listen-backlog 1024

    Note:
    1 Brick has to be restarted to get this value in effect
    2 This change won't be reflected in glusterd, or in other xlators which call listen. If you need it there, you have to add this option to the volfile.

    Change-Id: I0c5a2bbf28b5db612f9979e7560e05dd82b41477
    BUG: 1456405
    Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
    Reviewed-on: https://review.gluster.org/17411
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
    Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
    Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
    Reviewed-by: Niels de Vos <ndevos@redhat.com>
    Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
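To make the role of the backlog concrete, here is a small Python sketch of a listener whose backlog comes from an options dictionary. The option key and the default of 128 are taken from the commit message; the helper names `parse_backlog` and `open_listener`, and the use of a plain dict for options, are assumptions for illustration only.

```python
import socket

DEFAULT_BACKLOG = 128  # new default named in the commit message (was 10)


def parse_backlog(options):
    """Read 'transport.listen-backlog' from a dict, else use the default."""
    raw = options.get("transport.listen-backlog")
    if raw is None:
        return DEFAULT_BACKLOG
    value = int(raw)
    if value < 0:
        raise ValueError("listen backlog must be non-negative")
    return value


def open_listener(options, host="127.0.0.1", port=0):
    """Bind a TCP socket and listen with the configured backlog."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, port))
    # The backlog bounds the queue of pending, not yet accepted connections.
    sock.listen(parse_backlog(options))
    return sock
```

The operating system may cap the effective backlog below the requested value, so a larger setting is a request rather than a guarantee.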
* glusterd: fix glusterd crash from glusterd_op_ac_rcvd_brick_op_acc | Atin Mukherjee | 2017-06-07 | 1 | -1/+1

    In the out label, before checking ev_ctx->rsp_dict we should first check that ev_ctx is not NULL.

    Change-Id: I28f4f1ee9070617a0e6a23a43af8c5756f96a47e
    BUG: 1452956
    Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
    Reviewed-on: https://review.gluster.org/17478
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Gaurav Yadav <gyadav@redhat.com>
    Reviewed-by: Samikshan Bairagya <samikshan@gmail.com>
* glusterd: fix brick start race | Atin Mukherjee | 2017-06-06 | 5 | -21/+10

    This commit tries to handle a race where we might end up trying to spawn the brick process twice with two different sets of ports. The glusterd portmapper then has the same brick entry on two different ports, and clients fail to connect to bricks because glusterd communicates incorrect ports back to them.

    In glusterd_brick_start (), checking the brickinfo->status flag to identify whether a brick has been started by glusterd is not sufficient. When glusterd restarts, glusterd_restart_bricks () is called through glusterd_spawn_daemons () in a synctask, and immediately afterwards glusterd_do_volume_quorum_action (), with server-side-quorum set to on, tries to start the brick again. If the RPC_CLNT_CONNECT event for the same brick hasn't been processed by glusterd by that time, brickinfo->status is still marked as GF_BRICK_STOPPED. The result is a reattempt to start the brick with a different port, which corrupts the portmap and makes clients fetch an incorrect port.

    The fix is to introduce another enum value called GF_BRICK_STARTING in brickinfo->status, which is set when a brick start is attempted by glusterd and changed to started through the RPC_CLNT_CONNECT event. For brick multiplexing, on an attach brick request the brickinfo->status flag is marked as started directly, so this value has no effect.

    This patch also removes the started_here flag, as it looks to be redundant with brickinfo->status.

    Change-Id: I9dda1a9a531b67734a6e8c7619677867b520dcb2
    BUG: 1457981
    Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
    Reviewed-on: https://review.gluster.org/17447
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
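The three-state approach described in this commit can be illustrated with a small state machine. This is a Python sketch under stated assumptions: the `Brick` class, the `spawn_count` counter that stands in for forking a process, and the lock are all hypothetical, and only the STOPPED / STARTING / STARTED progression mirrors the commit message.

```python
import enum
import threading


class BrickStatus(enum.Enum):
    STOPPED = 0
    STARTING = 1  # start attempted, connect event not yet processed
    STARTED = 2


class Brick:
    def __init__(self, path):
        self.path = path
        self.status = BrickStatus.STOPPED
        self.spawn_count = 0  # stands in for spawning the brick process
        self._lock = threading.Lock()  # assumption: callers may race

    def start(self):
        """Spawn the brick only if no start is in flight or complete."""
        with self._lock:
            if self.status is not BrickStatus.STOPPED:
                return False
            self.status = BrickStatus.STARTING
            self.spawn_count += 1
            return True

    def on_connect(self):
        """Called when the connect event for this brick is processed."""
        with self._lock:
            if self.status is BrickStatus.STARTING:
                self.status = BrickStatus.STARTED
```

A second `start()` issued before `on_connect()` runs is rejected, so the brick is spawned once and gets a single port.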
* core: fix spelling errors | Kaleb S. KEITHLEY | 2017-06-02 | 1 | -3/+3

    fixes for various minor spelling errors and typos

    Reported-by: Patrick Matthäi <pmatthaei@debian.org>

    Change-Id: Ic1be36f82e3d822bbdc9559878bd79520fc0fcd5
    BUG: 1457808
    Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
    Reviewed-on: https://review.gluster.org/17442
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Niels de Vos <ndevos@redhat.com>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
* glusterd: Fix regression wrt add-brick on replica count change | Samikshan Bairagya | 2017-06-01 | 2 | -3/+11

    tests/bugs/glusterd/bug-1406411-fail-add-brick-on-replica-count-change.t was failing on centos machines with brick multiplexing enabled. Detaching individual bricks manually from the backend, as the regression test framework does with 'kill_brick', fails to send an RPC_CLNT_DISCONNECT to glusterd when multiplexing is enabled. As a result, the add-brick command does not fail when one of the bricks is killed using kill_brick in the regression test framework.

    To fix this, set the brick status to GF_BRICK_STOPPED on the glusterd end during portmap signout. This commit also sets the brick status in the glusterd_brick_stop() function so that the brick status is correctly set to 'stopped' even when the function is called independently for individual bricks.

    Change-Id: I4d6f7b579069d0cfa53cb2b0cff78876e1f31594
    BUG: 1456898
    Signed-off-by: Samikshan Bairagya <samikshan@gmail.com>
    Reviewed-on: https://review.gluster.org/17422
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
    Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* glusterfs: Not able to mount running volume after enable brick mux and stopped any volume | Mohit Agrawal | 2017-05-31 | 1 | -1/+1

    Problem: After enabling brick mux, if any volume goes down and a mount is then attempted on a running volume, the mount command hangs.

    Solution: With brick mux enabled, the server shares one data structure, server_conf, for all associated subvolumes. When any subvolume goes down in some ungraceful manner (remove brick directory), the posix xlator sends a GF_EVENT_CHILD_DOWN event to the parent xlators and server notify updates child_up to false in server_conf. When a client tries to communicate with the server through mount, it checks conf->child_up, which is FALSE, so it throws the message "translator are not yet ready". This patch updates the server_conf structure to save the child_up status per xlator. Another important correction in this patch is the cleanup of threads from server side xlators after the volume is stopped.

    BUG: 1453977
    Change-Id: Ic54da3f01881b7c9429ce92cc569236eb1d43e0d
    Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
    Reviewed-on: https://review.gluster.org/17356
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* tier: port value missing on cli parsing | hari gowtham | 2017-05-31 | 1 | -0/+10

    problem: Because tier didn't have a port, all the values regarding the port were removed. However, the cli needs a port value to parse and print the status.

    fix: fake the port value with a zero.

    Change-Id: I6491f6c441f7cfddbdaa724fcbe7c30e348aa765
    BUG: 1452006
    Signed-off-by: hari gowtham <hgowtham@redhat.com>
    Reviewed-on: https://review.gluster.org/17419
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    Tested-by: hari gowtham <hari.gowtham005@gmail.com>
    Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
    Reviewed-by: Prashanth Pai <ppai@redhat.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Samikshan Bairagya <samikshan@gmail.com>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* Tier: removing port allocated for tier | hari gowtham | 2017-05-30 | 3 | -35/+0

    Problem: Tier has a port which it doesn't use.

    Fix: Remove the port getting allocated for tier.

    Change-Id: If0fe393fc335d9f622a063787e0a3c6db9b7a50c
    BUG: 1452006
    Signed-off-by: hari gowtham <hgowtham@redhat.com>
    Reviewed-on: https://review.gluster.org/17328
    Tested-by: hari gowtham <hari.gowtham005@gmail.com>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* glusterd: ignore incorrect uuid validation if uuid_str is empty | Atin Mukherjee | 2017-05-24 | 1 | -11/+16

    If uuid_str is not filled in the dictionary (when the glusterd bits are old), we shouldn't do additional validation against the peer uuid; otherwise the handshake request will fail.

    Refer : http://lists.gluster.org/pipermail/gluster-users/2017-May/031187.html

    Credits : pawan@platform.sh

    Change-Id: I2c30bf0490c31d1418b31d555e7758696e79409f
    BUG: 1454375
    Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
    Reviewed-on: https://review.gluster.org/17358
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Amar Tumballi <amarts@redhat.com>
    Reviewed-by: Prashanth Pai <ppai@redhat.com>
* glusterd: Eliminate race in brick compatibility checking stage | Samikshan Bairagya | 2017-05-24 | 1 | -2/+5

    In https://review.gluster.org/17307/, while looking for compatible bricks for multiplexing, it is checked if the brick pidfile exists before checking if the corresponding brick process is running. However, checking if the brick process is running just after checking if the pidfile exists isn't enough, since there might be race conditions where the pidfile has been created but hasn't been updated with a pid value yet.

    This commit solves that by making sure that we wait iteratively till the pid value is updated as well.

    Change-Id: Ib7a158f95566486f7c1f84b6357c9b89e4c797ae
    BUG: 1451248
    Signed-off-by: Samikshan Bairagya <samikshan@gmail.com>
    Reviewed-on: https://review.gluster.org/17375
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
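The "wait iteratively till the pid value is updated" step can be sketched as a bounded polling loop. The following Python is an illustration only: the function names, the retry count, and the delay are assumptions, and the liveness check uses signal 0 on POSIX systems rather than whatever glusterd does internally.

```python
import os
import time


def wait_for_pid(pidfile, retries=10, delay=0.05):
    """Poll pidfile until it holds a pid; return the pid, or None on timeout."""
    for _ in range(retries):
        try:
            with open(pidfile) as fh:
                text = fh.read().strip()
        except FileNotFoundError:
            text = ""
        if text.isdigit():
            return int(text)
        # File missing or created but still empty: wait and look again.
        time.sleep(delay)
    return None


def is_running(pid):
    """Return True if a process with this pid exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks existence without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

A caller would treat a `None` result as "not compatible yet" rather than as proof that the brick process is dead.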
* glusterd : volume profile command on one of the node crashes glusterd | Gaurav Yadav | 2017-05-23 | 1 | -5/+6

    When the volume profile command is issued on one of the nodes, glusterd crashes. It is a race condition which may be hit when the profile command and the status command are executed from node A and node B respectively. While doing so, the event GD_OP_STATE_BRICK_OP_SENT/GD_OP_STATE_BRICK_COMMITTED is triggered. As the handling of the event is not thread safe, the context gets modified and glusterd crashes.

    With the fix, we now validate the context before using it.

    Change-Id: Ic07c3cdc5644677b0e40ff0fac6fcca834158913
    BUG: 1452956
    Signed-off-by: Gaurav Yadav <gyadav@redhat.com>
    Reviewed-on: https://review.gluster.org/17350
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Samikshan Bairagya <samikshan@gmail.com>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* rda, glusterd: Change the max of rda-cache-limit to INFINITY | Poornima G | 2017-05-21 | 1 | -0/+31

    Issue:
    The max value of rda-cache-limit is 1GB before this patch. When parallel-readdir is enabled, there will be many instances of readdir-ahead, hence the rda-cache-limit depends on the number of instances. Eg: On a volume with distribute count 4, rda-cache-limit when parallel-readdir is enabled will be 4GB instead of 1GB.

    Consider the following sequence of operations:
    - Enable parallel readdir
    - Set rda-cache-limit to, let's say, 3GB
    - Disable parallel-readdir. This results in one instance of readdir-ahead and the rda-cache-limit max will be back to 1GB, but the current value is 3GB and hence the mount will stop working as 3GB > max 1GB.

    Solution:
    To fix this, we can limit the cache to 1GB even when parallel-readdir is enabled. But there is no necessity to limit the cache to 1GB; it can be increased if the system has enough resources. Hence getting rid of the rda-cache-limit max value is more apt. If we just change the rda-cache-limit max to INFINITY, we will render older (<3.11) clients broken when the rda-cache-limit is set to > 1GB (as the older clients still expect a value < 1GB). To safely change the max value of rda-cache-limit to INFINITY, add a check in glusterd to verify all the clients are > 3.11 if the value exceeds 1GB.

    Change-Id: Id0cdda3b053287b659c7bf511b13db2e45b92032
    BUG: 1446516
    Signed-off-by: Poornima G <pgurusid@redhat.com>
    Reviewed-on: https://review.gluster.org/17338
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
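The version gate described in the solution can be sketched as a small validation function. This Python is illustrative only: the function name, the representation of client versions as `(major, minor)` tuples, and the reading of the cutoff as "3.11 or newer is acceptable" are assumptions, since the commit message says both "older (<3.11)" and "> 3.11".

```python
GIB = 1024 ** 3
OLD_MAX = 1 * GIB  # the limit that pre-3.11 clients still expect


def validate_rda_cache_limit(value_bytes, client_versions):
    """Return (ok, reason) for a requested rda-cache-limit.

    client_versions is an iterable of (major, minor) tuples (hypothetical).
    """
    if value_bytes <= OLD_MAX:
        return True, ""  # safe for every client, old or new
    # Assumption: clients at 3.11 or newer accept values above 1GB.
    too_old = [v for v in client_versions if v < (3, 11)]
    if too_old:
        return False, "clients older than 3.11 are connected"
    return True, ""
```

For example, a 3GB request is accepted when all clients report `(3, 11)` or later and rejected as soon as one `(3, 10)` client is connected.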
* glusterd: Don't spawn new glusterfsds on node reboot with brick-mux | Samikshan Bairagya | 2017-05-18 | 3 | -0/+25

    With brick multiplexing enabled, upon a node reboot new bricks were not being attached to the first spawned brick process even though there weren't any compatibility issues.

    The reason for this is that upon glusterd restart after a node reboot, since brick services aren't running, glusterd starts the bricks in a "no-wait" mode. So after a brick process is spawned for the first brick, there isn't enough time for the corresponding pid file to get populated with a value before the compatibility check is made for the next brick.

    This commit solves this by iteratively waiting for the pidfile to be populated in the brick compatibility comparison stage before checking if the brick process is alive.

    Change-Id: Ibd1f8e54c63e4bb04162143c9d70f09918a44aa4
    BUG: 1451248
    Signed-off-by: Samikshan Bairagya <samikshan@gmail.com>
    Reviewed-on: https://review.gluster.org/17307
    Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* glusterd: remove useless options from glusterd's volume set table | Zhou Zhengping | 2017-05-17 | 1 | -13/+0

    These options cause complaints in the brick's log:
    _log_if_unknown_option] 0-patchy-quota: option 'timeout' is not recognized
    _log_if_unknown_option] 0-patchy-server: option 'ping-timeout' is not recognized

    Change-Id: Ida2add13f792736a4e52bfaf38d1169309283a3f
    BUG: 1449008
    Signed-off-by: Zhou Zhengping <johnzzpcrystal@gmail.com>
    Reviewed-on: https://review.gluster.org/17213
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Prashanth Pai <ppai@redhat.com>
    Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* glusterd: coverity fix for string overflow | Sakshi Bansal | 2017-05-12 | 1 | -2/+3

    coverity CID: 1124852

    Change-Id: Ifb04ad36b0652474007d2768737722231a5c1df0
    BUG: 789278
    Signed-off-by: Sakshi Bansal <sabansal@redhat.com>
    Reviewed-on: https://review.gluster.org/9539
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
    Tested-by: Kaleb KEITHLEY <kkeithle@redhat.com>
    Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* glusterd: Make reset-brick work correctly if brick-mux is on | Samikshan Bairagya | 2017-05-10 | 9 | -80/+89

    Reset brick currently kills off the corresponding brick process. However, with brick multiplexing enabled, stopping the brick process would render all bricks attached to it unavailable. To handle this correctly, we need to make sure that the brick process is terminated only if brick-multiplexing is disabled. Otherwise, we should send the GLUSTERD_BRICK_TERMINATE rpc to the respective brick process to detach the brick that is to be reset.

    Change-Id: I69002d66ffe6ec36ef48af09b66c522c6d35ac58
    BUG: 1446172
    Signed-off-by: Samikshan Bairagya <samikshan@gmail.com>
    Reviewed-on: https://review.gluster.org/17128
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
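The branching this commit describes is simple enough to show as a few lines. The Python below is only a sketch of the decision, with hypothetical callables `kill_process` and `send_detach` standing in for terminating the process and for sending the GLUSTERD_BRICK_TERMINATE rpc; none of these names come from glusterd.

```python
def stop_brick_for_reset(brick, mux_enabled, kill_process, send_detach):
    """Stop one brick without taking down others that share its process."""
    if mux_enabled:
        # Other bricks live in the same process: detach only this one.
        send_detach(brick)
        return "detached"
    # One brick per process: terminating the process is safe.
    kill_process(brick)
    return "killed"
```

Passing the two actions in as callables keeps the sketch free of any real process or rpc handling.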
* glusterd: socketfile & pidfile related fixes for brick multiplexing feature | Mohit Agrawal | 2017-05-09 | 7 | -32/+114

    Problem: While brick-multiplexing is on, after restarting glusterd the CLI does not show the pid of all brick processes in all volumes.

    Solution: While brick-mux is on, all local brick processes communicate through one UNIX socket, but the current code (glusterd_brick_start) tries to communicate with a separate UNIX socket for each volume, populated based on brick-name and vol-name. Because of the multiplexing design only one UNIX socket is opened, so it throws a poller error and is not able to fetch the correct status of the brick process through the cli process. To resolve the problem, write a new function glusterd_set_socket_filepath_for_mux that is called by glusterd_brick_start to validate the existence of the socketpath. To avoid the continuous EPOLLERR errors in logs, update the socket_connect code.

    Test: To reproduce the issue, follow the steps below
    1) Create two distributed volumes(dist1 and dist2)
    2) Set cluster.brick-multiplex is on
    3) kill glusterd
    4) run command gluster v status
    After applying the patch it shows the correct pid for all volumes

    BUG: 1444596
    Change-Id: I5d10af69dea0d0ca19511f43870f34295a54a4d2
    Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
    Reviewed-on: https://review.gluster.org/17101
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Prashanth Pai <ppai@redhat.com>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd: cleanup pidfile on pmap signout | Atin Mukherjee | 2017-05-08 | 3 | -5/+92

    This patch ensures
    1. brick pidfile is cleaned up on pmap signout
    2. pmap signout event is sent for all the bricks when a brick process shuts down.

    Change-Id: I7606a60775b484651d4b9743b6037b40323931a2
    BUG: 1444596
    Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
    Reviewed-on: https://review.gluster.org/17168
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* Fixes quota aux mount failure | Sanoj Unnikrishnan | 2017-05-08 | 4 | -60/+48

    The aux mount is created on the first limit/remove_limit/list command and remains until the volume is stopped, deleted, or quota is disabled, at which point we do a lazy unmount. If the process is uncleanly terminated, the mount entry remains and we get a (Transport disconnected) error on subsequent attempts to run quota list/limit-usage/remove commands.

    Second issue: there is also a risk of an inadvertent rm -rf on /var/run/gluster causing data loss for the user. Ideally, /var/run is a temp path for application use and should not cause any data loss to persistent storage.

    Solution:
    1) unmount the aux mount after each use.
    2) clean stale mount before mounting, if any.

    One caveat with doing mount/unmount on each command is that we cannot use the same mount point for both list and limit commands. The list command needs the mount to be accessible in the cli after the response from glusterd, so it could be unmounted by a limit command executed in parallel (had we used the same mount point). Hence we use separate mount points for list and limit commands.

    Change-Id: I4f9e39da2ac2b65941399bffb6440db8a6ba59d0
    BUG: 1433906
    Signed-off-by: Sanoj Unnikrishnan <sunnikri@redhat.com>
    Reviewed-on: https://review.gluster.org/16938
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Manikandan Selvaganesh <manikandancs333@gmail.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
    Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* snapview-server : Refresh the snapshot list during each reconnect | Mohammed Rafi KC | 2017-05-08 | 1 | -0/+1

    Currently we refresh the snapshot list either when there is a request from glusterd or during the very first initialization. But if anything changed while glusterd was down, there is no mechanism to refresh the snapshot dentries.

    This patch refreshes the snapshot list during each reconnect.

    Change-Id: I3ed655572d777f60d57dd479d190f75553591267
    BUG: 1448150
    Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
    Reviewed-on: https://review.gluster.org/17178
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    Reviewed-by: Amar Tumballi <amarts@redhat.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
    Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* Tier: Watermark check for hi and low value being equal | hari gowtham | 2017-05-08 | 1 | -2/+3

    Problem: Both the low and hi watermarks can be set to the same value, as the check missed the case of them being equal.

    Fix: Extend the check to reject the hi and low values being equal, along with the low value being higher than the hi value.

    Change-Id: Ia235163aeefdcb2a059e2e58a5cfd8fb7f1a4c64
    BUG: 1447960
    Signed-off-by: hari gowtham <hgowtham@redhat.com>
    Reviewed-on: https://review.gluster.org/17175
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    Tested-by: hari gowtham <hari.gowtham005@gmail.com>
    Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
    Reviewed-by: Milind Changire <mchangir@redhat.com>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
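The corrected comparison amounts to rejecting `low >= hi` instead of only `low > hi`. A minimal Python sketch, with a hypothetical function name and error text:

```python
def validate_watermarks(low, hi):
    """Raise ValueError unless the low watermark is strictly below hi."""
    # The earlier check only rejected low > hi, so low == hi slipped through.
    if low >= hi:
        raise ValueError("low watermark must be less than hi watermark")
    return low, hi
```

With this check, `validate_watermarks(75, 75)` is rejected in the same way as `validate_watermarks(90, 75)`.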
* rpc: Remove accidental IPV6 changes | Kaushal M | 2017-05-05 | 1 | -4/+0

    They snuck in with the HALO patch (07cc8679c)

    Change-Id: I8ced6cbb0b49554fc9d348c453d4d5da00f981f6
    BUG: 1447953
    Signed-off-by: Kaushal M <kaushal@redhat.com>
    Reviewed-on: https://review.gluster.org/17174
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Niels de Vos <ndevos@redhat.com>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Amar Tumballi <amarts@redhat.com>
* glusterd: disallow rebalance & remove-brick on a sharded volume | Atin Mukherjee | 2017-05-04 | 2 | -0/+19

    Change-Id: Idfbdbc61ca18054fdbf7556f74e195a63cd8a554
    BUG: 1447630
    Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
    Reviewed-on: https://review.gluster.org/17160
    Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    Reviewed-by: N Balachandran <nbalacha@redhat.com>
    Reviewed-by: Amar Tumballi <amarts@redhat.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* mgmt: Fix null pointer deref | Michael Scherer | 2017-05-04 | 1 | -1/+0

    since "this" can be NULL, we should skip this validation, there is another call to GF_VALIDATE_OR_GOTO in the following lines.

    Found by cppcheck.

    Change-Id: I329f50b986a9eaf3315e09f851080ab41bea57c0
    BUG: 789278
    Signed-off-by: Michael Scherer <misc@redhat.com>
    Reviewed-on: https://review.gluster.org/16742
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
    Tested-by: Jeff Darcy <jeff@pl.atyp.us>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* SELinux : implementation of SELinux translator | Manikandan Selvaganesh | 2017-05-03 | 3 | -5/+37

    The patch implements a part of the SELinux translator to support setting SELinux contexts on files in a glusterfs volume.

    URL: https://github.com/gluster/glusterfs-specs/blob/master/accepted/SELinux-client-support.md

    Change-Id: Id8916bd8e064ccf74ba86225ead95f86dc5a1a25
    BUG: 1318100
    Fixes : #55
    Signed-off-by: Manikandan Selvaganesh <mselvaga@redhat.com>
    Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com>
    Signed-off-by: Niels de Vos <ndevos@redhat.com>
    Reviewed-on: https://review.gluster.org/13762
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Manikandan Selvaganesh <manikandancs333@gmail.com>
    Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd: skip nfs svc reconfigure if nfs xlator is not installed | Atin Mukherjee | 2017-05-02 | 1 | -0/+9

    With 83abcba, nfs svc is not (re)started or stopped if the NFS so file is not installed. However, the same check was missing in nfs svc reconfigure, which was causing all volume set commands to fail.

    Change-Id: Ie87b5dba44ac59e890cbd60f85944f8e685ad52b
    BUG: 1326219
    Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
    Reviewed-on: https://review.gluster.org/17149
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    Reviewed-by: Niels de Vos <ndevos@redhat.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* Halo Replication feature for AFR translatorKevin Vigor2017-05-023-2/+63
| Summary:
| Halo Geo-replication is a feature which allows Gluster or NFS clients to
| write locally to their region (as defined by a latency "halo" or threshold
| if you like), and have their writes asynchronously propagate from their
| origin to the rest of the cluster. Clients can also write synchronously to
| the cluster simply by specifying a halo-latency which is very large (e.g.
| 10 seconds), which will include all bricks. In other words, it allows
| clients to decide at mount time whether they want synchronous or
| asynchronous IO into a cluster, and the cluster can support both of these
| modes for any number of clients simultaneously.
|
| There are a few new volume options due to this feature:
|   halo-shd-latency: The threshold below which self-heal daemons will
|     consider children (bricks) connected.
|   halo-nfsd-latency: The threshold below which NFS daemons will consider
|     children (bricks) connected.
|   halo-latency: The threshold below which all other clients will consider
|     children (bricks) connected.
|   halo-min-replicas: The minimum number of replicas which are to be
|     enforced regardless of the latency specified in the above 3 options.
|     If the number of children falls below this threshold, the next best
|     (chosen by latency) shall be swapped in.
|
| New FUSE mount options:
|   halo-latency & halo-min-replicas: As described above.
|
| This feature combined with multi-threaded SHD support (D1271745) results in
| some pretty cool geo-replication possibilities.
|
| Operational Notes:
|   - Global consistency is guaranteed for synchronous clients; this is
|     provided by the existing entry-locking mechanism.
|   - Asynchronous clients, on the other hand, are merely consistent to their
|     region. Writes & deletes will be protected via entry-locks as usual,
|     preventing concurrent writes into files which are undergoing
|     replication. Read operations, on the other hand, should never block.
|   - Writes are allowed from _any_ region and propagated from the origin to
|     all other regions. The takeaway is that care should be taken to ensure
|     multiple writers do not write the same files, resulting in a gfid
|     split-brain which will require resolution via split-brain policies
|     (majority, mtime & size). The recommended method for preventing this is
|     using the nfs-auth feature to define which region for each share has RW
|     permissions; tiers not in the origin region should have RO perms.
|
| TODO:
|   - Synchronous clients (including the SHD) should choose clients from
|     their own region as preferred sources for reads. Most of the plumbing
|     is in place for this via the child_latency array.
|   - Better GFID split brain handling & better dent type split brain
|     handling (i.e. create a trash can and move the offending files into it).
|   - Tagging in addition to latency as a means of defining which children
|     you wish to synchronously write to
|
| Test Plan:
|   - The usual suspects: clang, gcc w/ address sanitizer & valgrind
|   - Prove tests
|
| Reviewers: jackl, dph, cjh, meyering
| Reviewed By: meyering
| Subscribers: ethanr
| Differential Revision: https://phabricator.fb.com/D1272053
| Tasks: 4117827
|
| Change-Id: I694a9ab429722da538da171ec528406e77b5e6d1
| BUG: 1428061
| Signed-off-by: Kevin Vigor <kvigor@fb.com>
| Reviewed-on: http://review.gluster.org/16099
| Reviewed-on: https://review.gluster.org/16177
| Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
| Smoke: Gluster Build System <jenkins@build.gluster.org>
| NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
| CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
| Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
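The interaction between the latency threshold and halo-min-replicas described in this commit can be sketched roughly as follows. This is an illustrative Python model, not the AFR code: the function name, the dictionary of per-brick latencies, and the strict "less than" comparison are all assumptions made for the example.

```python
from typing import Dict, List


def select_halo_children(latencies_ms: Dict[str, float],
                         halo_latency_ms: float,
                         min_replicas: int) -> List[str]:
    """Pick the bricks a client would treat as connected (hypothetical model).

    Bricks whose latency is below the halo threshold are inside the halo.
    If fewer than min_replicas qualify, the next best bricks by latency
    are swapped in until the minimum is met (or no bricks are left).
    """
    # Order bricks from lowest to highest latency; the name breaks ties
    # so the result is deterministic.
    by_latency = sorted(latencies_ms, key=lambda b: (latencies_ms[b], b))
    inside = [b for b in by_latency if latencies_ms[b] < halo_latency_ms]
    if len(inside) < min_replicas:
        # The bricks inside the halo are always a prefix of by_latency,
        # so taking a longer prefix adds the next best candidates.
        inside = by_latency[:min_replicas]
    return inside
```

With a very large threshold every brick qualifies, which matches the "synchronous client" case in the summary; with a small threshold only nearby bricks qualify unless the minimum forces more in.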
* cluster/dht: Make rebalance throttle option tuned by number  Susant Palai  2017-04-29  1  -4/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current rebalance throttle options: lazy/normal/aggressive may not always be sufficient for the purpose of throttling. In our recent test, we observed that for certain setups, normal and aggressive modes behaved similarly, consuming full disk bandwidth. So in cases like this the admin should be able to tune it down (or vice versa) depending on the need. Along with the old throttle configurations, thread counts are tuned based on a number, e.g. gluster v set vol-name cluster.rebal-throttle 5. The admin can tune up/down between 0 and the number of cores available. Note: For heterogeneous servers, validation will fail on the old server if "number" is given for throttle configuration. The message looks something like this: "volume set: failed: Staging failed on vm2. Error: cluster.rebal-throttle should be {lazy|normal|aggressive}" Test: Manual test by logging active thread number after reconfiguring throttle option. testcase: tests/basic/distribute/throttle-rebal.t Change-Id: I46e3cde546900307831028b344ecf601fd9b02c3 BUG: 1438370 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: https://review.gluster.org/16980 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
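The validation rule in this commit (either one of the legacy mode names or a number bounded by the core count) might look something like the sketch below. It is a hypothetical Python model, not the DHT validation code: the function name is invented, and treating both 0 and the core count as valid endpoints is an assumption, since the message only says "between 0 and the number of cores available".

```python
import os
from typing import Optional

# Mode names taken from the commit message.
LEGACY_MODES = ("lazy", "normal", "aggressive")


def is_valid_rebal_throttle(value: str, cores: Optional[int] = None) -> bool:
    """Accept a legacy mode name or an integer in [0, cores] (assumed inclusive)."""
    if value in LEGACY_MODES:
        return True
    if cores is None:
        # Fall back to the local core count; 1 if it cannot be determined.
        cores = os.cpu_count() or 1
    try:
        number = int(value)
    except ValueError:
        return False
    return 0 <= number <= cores
```

An older server that only knows the three mode names would reject the numeric form, which is the staging failure the note about heterogeneous servers describes.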
* glusterd: Fix removing pmap entry on rpc disconnect  Prashanth Pai  2017-04-28  2  -8/+10
| | | | | | | | | | | | | | | | | | | | | | | | Problem: The following line of code is intended to remove the pmap entry for the connection during disconnects: pmap_registry_remove (this, 0, NULL, GF_PMAP_PORT_NONE, xprt); However, no pmap entry will have its type set to GF_PMAP_PORT_NONE at any point in time. So a call to pmap_registry_search_by_xprt() in pmap_registry_remove() will always fail to find a match. Fix: Optionally ignore pmap entry's type in pmap_registry_search_by_xprt(). BUG: 1193929 Change-Id: I705f101739ab1647ff52a92820d478354407264a Signed-off-by: Prashanth Pai <ppai@redhat.com> Reviewed-on: https://review.gluster.org/17129 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
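Why the lookup could never match, and how an optional "ignore the type" flag changes that, can be illustrated with a small model. This is a Python sketch under assumptions, not glusterd's pmap code: the PortEntry class, the string type names, and the ignore_type parameter are all hypothetical stand-ins for the C structures and the actual signature change.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PortEntry:
    port: int
    port_type: str   # illustrative stand-in for the pmap port type enum
    xprt: object     # the transport (connection) that registered the port


def search_by_xprt(entries: List[PortEntry], xprt: object,
                   port_type: str, ignore_type: bool = False) -> int:
    """Return the port registered by xprt, or 0 when nothing matches."""
    for entry in entries:
        if entry.xprt is not xprt:
            continue
        # Before the fix the type always had to match, so searching with
        # a type that no entry ever carries could not succeed.
        if ignore_type or entry.port_type == port_type:
            return entry.port
    return 0
```

Searching with a type that is never stored returns 0 for every connection; passing ignore_type=True finds the entry by transport alone, which is what a disconnect handler needs.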
* build: conditionally build legacy gNFS server and associated sub-packaging  Kaleb S. KEITHLEY  2017-04-28  6  -50/+36
| | | | | | | | | | | | | | | | | | | Plus some additional logic in glusterd to ensure gnfs (glusterfs) daemons are never started if server/nfs xlator is not installed. As a service, nfs is still initialized. The glusterfs-gnfs RPM may be installed or uninstalled independent of anything else, including on a system where gluster is actively running, so the existence of the xlator is always tested before trying to start gnfs. Change-Id: I56743ad1cb36a84917226d7d26cb9d015d441e66 BUG: 1326219 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: https://review.gluster.org/16958 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* cluster/ec: Implement self-heal-window_size option  Sunil Kumar Acharya  2017-04-25  1  -0/+5
| | | | | | | | | | | | | | | | This fix implements the heal window size option for EC. The option controls the maximum size of the read/write operations carried out during self-heal. BUG: 1441491 Change-Id: I6c0ef65c9ca18b0828f91b319d4f52ac5b77d0d8 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com> Reviewed-on: https://review.gluster.org/17098 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* glusterd/geo-rep: Fix snapshot create in geo-rep setup  Kotresh HR  2017-04-24  1  -4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | glusterd persists geo-rep sessions in the glusterd info file, which is represented in memory by the dictionary 'volinfo->gsync_slaves'. Glusterd also maintains active geo-rep sessions in memory in the dictionary 'volinfo->gsync_active_slaves', whose key is "<slave_url>::<slavehost>". When glusterd is restarted while the geo-rep sessions are active, it builds 'volinfo->gsync_active_slaves' from the persisted glusterd info file. Since the slave volume uuid was added to "volinfo->gsync_slaves" with the commit "http://review.gluster.org/13111", it builds it with the key "<slave_url>::<slavehost>:<slavevol_uuid>", which is wrong. So during snapshot pre-validation, which checks whether geo-rep is active or not, it always says it is ACTIVE, as geo-rep stop would not delete this key. Fixed the same in this patch. Change-Id: I185178910b4b8a62e66aba406d88d12fabc5c122 BUG: 1443977 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: https://review.gluster.org/17093 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd: set conn->reconnect to null on timer cancellation  Atin Mukherjee  2017-04-20  1  -0/+1
| | | | | | | | | | | Change-Id: Ic48e6652f431daeb0db027660f6c9de16d893f08 BUG: 1443896 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: https://review.gluster.org/17088 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* Implement negative lookup cache  Poornima G  2017-04-20  1  -0/+31
| | | | | | | | | | | | | | | | | | | | | Before creating any file, negative lookups (1 in Fuse, 4 in SMB etc.) are sent to verify whether the file already exists. Serving these lookups from the cache when possible increases create performance severalfold for SMB access and by some percentage for Fuse/NFS access. Feature page: https://review.gluster.org/#/c/16436 Updates #82 Change-Id: Ib1c0e7ac7a386f943d84f6398c27f9a03665b2a4 BUG: 1442569 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: https://review.gluster.org/16952 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
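The general idea of a negative lookup cache, remembering names that were recently found not to exist so repeated "does it exist?" checks skip the backend, can be sketched as below. This is a generic Python illustration and not the translator added by this commit: the class, its methods, and the explicit invalidate call are assumptions, and a real implementation would also need expiry and invalidation driven by other clients.

```python
from typing import Callable, Set


class NegativeLookupCache:
    """Remember paths known to be absent (hypothetical minimal model)."""

    def __init__(self) -> None:
        self._missing: Set[str] = set()

    def lookup(self, path: str, backend_lookup: Callable[[str], bool]) -> bool:
        """Return True if path exists, consulting the backend only if needed."""
        if path in self._missing:
            return False  # negative answer served from the cache
        found = backend_lookup(path)
        if not found:
            self._missing.add(path)
        return found

    def invalidate(self, path: str) -> None:
        """Forget a cached negative entry, e.g. after the path is created."""
        self._missing.discard(path)
```

The saving comes from the repeated misses: if a protocol sends several lookups for a name before creating it, only the first one reaches the backend in this model.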
* features/bit-rot-stub: bring in optional versioning  Raghavendra Bhat  2017-04-18  1  -2/+15
| | | | | | | | | | | | | | | | * As of now bit-rot-stub always does versioning. This leads to lots of getxattr calls being made in lookups. So make object versioning optional. Change-Id: I83713e45ae59fb28004bb3cfa008f2d69edebbfa BUG: 1359599 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: https://review.gluster.org/14442 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Fix snapshot failure in non-root geo-rep setup  Kotresh HR  2017-04-18  1  -2/+5
| | | | | | | | | | | | | | | | | | | | Geo-replication session directory name has the form '<mastervol>_<slavehost>_<slavevol>'. But in non-root geo-replication setup, while preparing geo-replication session directory name, glusterd is including 'user@' resulting in "<mastervol>_<user@slavehost>_<slavevol>". Hence snapshot is failing to copy geo-rep specific session files. Fixing the same. Change-Id: Id214d3186e40997d2827a0bb60d3676ca2552df7 BUG: 1442760 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: https://review.gluster.org/17067 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
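The naming rule this fix restores, '<mastervol>_<slavehost>_<slavevol>' with any 'user@' prefix dropped from the host, can be sketched in a few lines. This is an illustrative Python helper, not the glusterd C code; the function name and the choice to split on the last "@" are assumptions.

```python
def georep_session_dir(master_vol: str, slave_host: str, slave_vol: str) -> str:
    """Build '<mastervol>_<slavehost>_<slavevol>' without a 'user@' prefix."""
    # In a non-root setup the slave may be given as 'user@host'; keep only
    # the host part so the name matches the directory that actually exists.
    host = slave_host.rsplit("@", 1)[-1]
    return f"{master_vol}_{host}_{slave_vol}"
```

Root and non-root sessions then resolve to the same directory name for the same host, so the snapshot code can find the session files to copy.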
* dht: Add readdir-ahead in rebalance graph if parallel-readdir is on  Poornima G  2017-04-18  2  -5/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Issue: The value of linkto xattr is generally the name of the dht's next subvol; this requires that the next subvol of dht is not changed for the lifetime of the volume. But with parallel readdir enabled, the readdir-ahead loaded below dht is optional. The linkto xattr for first subvol, when: - parallel readdir is enabled : "<volname>-readdir-head-0" - plain distribute volume : "<volname>-client-0" - distribute replicate volume : "<volname>-afr-0" The value of linkto xattr is "<volname>-readdir-head-0" when parallel readdir is enabled, and is "<volname>-client-0" if it is disabled. But dht_lookup takes care of healing if it cannot identify which linkto subvol the xattr points to. In dht_lookup_cbk, if linkto xattr is found to be "<volname>-client-0" and parallel readdir is enabled, then it cannot understand the value "<volname>-client-0" as it expects "<volname>-readdir-head-0". In that case, dht_lookup_everywhere is issued and then the linkto file is unlinked and recreated with the right linkto xattr. The issue arises when parallel readdir is enabled and the mount point accesses a file that is currently being migrated. Since the rebalance process doesn't have the parallel-readdir feature, it expects "<volname>-client-0", whereas the mount expects "<volname>-readdir-head-0". Thus at some point either the mount or rebalance will fail. Solution: Enable parallel-readdir for rebalance as well and then do not allow enabling/disabling parallel-readdir if rebalance is in progress. 
Change-Id: I241ab966bdd850e667f7768840540546f5289483 BUG: 1436090 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: https://review.gluster.org/17056 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* glusterd: Add brick capacity details to get-state CLI output  Samikshan Bairagya  2017-04-14  1  -0/+19
| | | | | | | | | | | Change-Id: I53fe180e71d41d56b129254b93bb74014a2cdb43 BUG: 1431192 Signed-off-by: Samikshan Bairagya <samikshan@gmail.com> Reviewed-on: https://review.gluster.org/17029 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd: fix glusterd_wait_for_blockers to go in infinite loop  Atin Mukherjee  2017-04-13  1  -6/+4
| | | | | | | | | | | | | | | | In send_attach_req (), conf->blockers is bumped up before rpc_clnt_submit; however, it is bumped down twice, once from the callback and once from the negative ret handling, which can happen when the rpc submit fails. Change-Id: Icb820694034cbfcb3d427911e192ac4a0f4540f6 BUG: 1441910 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: https://review.gluster.org/17055 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
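The accounting problem can be modelled with a plain counter. The sketch below is hypothetical Python, not glusterd code: the Conf class, the submit stand-in, and the buggy flag are invented for illustration, and it assumes (as the commit message implies) that the callback runs even when the submit reports failure.

```python
class Conf:
    """Stand-in for the daemon's private state; only the counter matters."""

    def __init__(self) -> None:
        self.blockers = 0


def on_reply(conf: Conf) -> None:
    # Callback path: the request is finished, release the blocker.
    conf.blockers -= 1


def submit(conf: Conf, fail: bool) -> int:
    # Stand-in for the RPC submit. Assumption: the callback is invoked
    # even when submission fails, and the failure is also returned.
    on_reply(conf)
    return -1 if fail else 0


def send_attach_req(conf: Conf, fail: bool, buggy: bool = False) -> int:
    conf.blockers += 1          # bumped up before submitting
    ret = submit(conf, fail)
    if ret != 0 and buggy:
        conf.blockers -= 1      # second decrement for the same request
    return ret
```

In the buggy variant a failed submit leaves the counter at -1 rather than 0, so a waiter that loops until the counter returns to exactly zero may never finish; dropping the extra decrement keeps one increment paired with one decrement.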
* glusterd: Propagate EADDRINUSE correctly to parent process  Prashanth Pai  2017-04-12  1  -1/+1
| | | | | | | | | | | | | | | | | | | | | | exit()/_exit(): Only the least significant 8 bits i.e (err & 255) shall be available to the waiting parent process on calling _exit() or exit() with an integer exit status. If this number is negative, the parent process doesn't readily get what it's really looking forward to handle. For example: EADDRINUSE is 98 and if exit status code is set to -98, the waiting parent process shall get 158 (= -98 & 255) as exit status. BUG: 1193929 Change-Id: Idc6b0f40c2332e087e584b4b40cbf0d29168c9cd Signed-off-by: Prashanth Pai <ppai@redhat.com> Reviewed-on: https://review.gluster.org/16200 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
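The arithmetic in the example above can be checked directly. The snippet below is a small Python illustration of the (err & 255) rule quoted in the commit message; the function name is made up, and 98 is used literally because that is the EADDRINUSE value the message cites (the number differs on some platforms).

```python
def status_seen_by_parent(exit_code: int) -> int:
    """Return the low 8 bits of an exit code, as a waiting parent sees it."""
    return exit_code & 0xFF


EADDRINUSE_VALUE = 98  # value quoted in the commit message

# Exiting with the negated errno hides the real value from the parent.
negated = status_seen_by_parent(-EADDRINUSE_VALUE)
# Exiting with the positive errno lets the parent compare it directly.
positive = status_seen_by_parent(EADDRINUSE_VALUE)
```

Here negated evaluates to 158 and positive to 98, so only the positive form lets the parent recognise the "address in use" condition without extra decoding.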
* glusterd: Add client details to get-state output  Samikshan Bairagya  2017-04-12  3  -1/+207
| | | | | | | | | | | | | | | | | | | | | | | | This commit optionally adds client details corresponding to the locally running bricks to the get-state output. Since getting the client details involves sending RPC requests to the respective local bricks, this is a relatively more costly operation. These client details would be added to the get-state output only if the get-state command is invoked with the 'detail' option. This commit therefore also changes the get-state CLI usage. The modified usage is as follows: # gluster get-state [<daemon>] [[odir </path/to/output/dir/>] \ [file <filename>]] [detail] Change-Id: I42cd4ef160f9e96d55a08a10d32c8ba44e4cd3d8 BUG: 1431183 Signed-off-by: Samikshan Bairagya <samikshan@gmail.com> Reviewed-on: https://review.gluster.org/17003 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* rpc: add options to manage socket keepalive lifespan  Milind Changire  2017-04-12  1  -1/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Default values for handling socket timeouts for brick responses are insufficient for aggressive applications such as databases. Solution: Add 1:1 gluster options for keepalive, keepalive-idle, keepalive-interval and keepalive-timeout, matching the socket level options described in the tcp(7) man page. Default values for options are NOT aggressive and continue to be values which result in default timeout when only the keep alive option is turned on. These options are Linux specific and will not be applicable to the *BSDs. Change-Id: I2a08ecd949ca8ceb3e090d336ad634341e2dbf14 BUG: 1426059 Signed-off-by: Milind Changire <mchangir@redhat.com> Reviewed-on: https://review.gluster.org/16731 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
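For readers unfamiliar with the socket-level knobs that tcp(7) describes, the sketch below shows how keepalive, idle time and probe interval are set on a TCP socket from Python. It is not gluster code, and the mapping from the gluster option names to these constants is an assumption; the helper name and the example values are arbitrary. The TCP_KEEP* constants are platform specific, so the code checks for them before use.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int = 20,
                     interval_s: int = 2) -> None:
    """Turn on TCP keepalive and, where supported, tune idle and interval."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Seconds of idleness before the first probe (Linux-style constant).
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    # Seconds between successive probes.
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
```

Setting only SO_KEEPALIVE leaves the kernel defaults in place for idle time and interval, which corresponds to the non-aggressive default behaviour the commit message mentions.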
* glusterd: Add validation for options rda-cache-limit rda-request-size  Poornima G  2017-04-11  2  -7/+35
| | | | | | | | | | | | | | | | | Currently, when parallel readdir is enabled, setting any junk value to rda-cache-limit and rda-request-size succeeds. This is because of a bug in the special handling of these options. Fixing the same in this patch. Change-Id: I902cd9ac9134c158ab6f8aea4b001254a03547bd BUG: 1439640 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: https://review.gluster.org/17008 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* xlator: do not call dlclose() when debugging  Niels de Vos  2017-04-07  7  -11/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Valgrind cannot show the symbols of a .so after dlclose() has been called. The unhelpful ??? in the output gets resolved properly with this change: ==25170== 344 bytes in 1 blocks are definitely lost in loss record 233 of 324 ==25170== at 0x4C29975: calloc (vg_replace_malloc.c:711) ==25170== by 0x52C7C0B: __gf_calloc (mem-pool.c:117) ==25170== by 0x12B0638A: ??? ==25170== by 0x528FCE6: __xlator_init (xlator.c:472) ==25170== by 0x528FE16: xlator_init (xlator.c:498) ==25170== by 0x52DA8D6: glusterfs_graph_init (graph.c:321) ==25170== by 0x52DB587: glusterfs_graph_activate (graph.c:695) ==25170== by 0x5046407: glfs_process_volfp (glfs-mgmt.c:79) ==25170== by 0x5043B9E: glfs_volumes_init (glfs.c:281) ==25170== by 0x5044FEC: glfs_init_common (glfs.c:986) ==25170== by 0x50451A7: glfs_init@@GFAPI_3.4.0 (glfs.c:1031) By not calling dlclose(), the dynamically loaded .so is still available upon program exit, and Valgrind is able to resolve the symbols. This will add an additional leak, so dlclose() is called for normal builds, but skipped when configuring with "./configure --enable-valgrind" or passing the "run-with-valgrind" xlator option. URL: http://valgrind.org/docs/manual/faq.html#faq.unhelpful Change-Id: I2044e21b1b8fcce32ad1a817fdd795218f967731 BUG: 1425623 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: https://review.gluster.org/16809 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Samikshan Bairagya <samikshan@gmail.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* Fixes Stale auxiliary mount when crawler fails to spawn  Sanoj Unnikrishnan  2017-04-05  1  -2/+8
| | | | | | | | | | | | | | | | The auxiliary mount created for crawling remains if the crawler was not successfully spawned due to transport disconnect or other such issues. The patch ensures the mount is cleared in those code paths as well. Change-Id: I659fcc1d1956f8e05a37b75ebe3f3a00c24693e8 BUG: 1429330 Signed-off-by: Sanoj Unnikrishnan <sunnikri@redhat.com> Reviewed-on: https://review.gluster.org/16853 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Zhou Zhengping <johnzzpcrystal@gmail.com> Reviewed-by: Manikandan Selvaganesh <manikandancs333@gmail.com>