summaryrefslogtreecommitdiffstats
path: root/tests/bugs/glusterd
Commit message (Collapse)AuthorAgeFilesLines
* glusterd: ./tests/bugs/glusterd/bug-1595320.t is failingMohit Agrawal2019-08-261-1/+1
| | | | | | | | | | | | | | | | | | Problem: sometime ./tests/bugs/glusterd/bug-1595320.t is failing is failing at the time of checking brick_process after sending a kill signal to brick process Solution: Wait sometime after just sending a kill signal to brick process to make sure brick process is stopped > Change-Id: Iee9e91284618abfc62a550d47e4f9117785def58 > Fixes: bz#1743200 > Signed-off-by: Mohit Agrawal <moagrawal@redhat.com> > (cherry picked from commit 8f1620ad7f5d3d040fee55c5f873349800e2268d) Change-Id: Iee9e91284618abfc62a550d47e4f9117785def58 Fixes: bz#1745421 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* glusterd: add GF_TRANSPORT_BOTH_TCP_RDMA in glusterd_get_gfproxy_client_volfileAtin Mukherjee2019-07-161-1/+3
| | | | | | | | | | | | | ... with out which volume creation fails with "volume create: <xyz>: failed: Failed to create volume files" >Fixes: bz#1716812 >Change-Id: I2f4c2c6d5290f066b54e1c1db19e25db9937bedb >Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Fixes: bz#1721105 Change-Id: I2f4c2c6d5290f066b54e1c1db19e25db9937bedb Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd: define dumpops in the xlator_api of glusterdSanju Rakonde2019-05-081-0/+13
| | | | | | | | | | | | | | | Problem: statedump is not capturing information related to glusterd Solution: statdump is not capturing glusterd info because trav->dumpops is null in gf_proc_dump_single_xlator_info () where trav is glusterd xlator object. trav->dumpops is null because we missed to define dumpops in xlator_api of glusterd. defining dumpops in xlator_api of glusterd fixes the issue. fixes: bz#1703759 Change-Id: If85429ecb1ef580aced8d5b88d09fc15258bfc4c Signed-off-by: Sanju Rakonde <srakonde@redhat.com> (cherry picked from commit 5d866c13efdcdeddf184f012aa88a652e90ff22e)
* core: Log level changes do not effect on running client processMohit Agrawal2019-04-161-0/+113
| | | | | | | | | | | | | | | | | | | | | | | | | Problem: commit c34e4161f3cb6539ec83a9020f3d27eb4759a975 set log-level per xlator during reconfigure only for a brick process not for the client process. Solution: 1) Change per xlator log-level only if brick_mux is enabled.To make sure about brick multiplex introudce a flag brick_mux at ctx->cmd_args. Note: There are two other changes done with this patch 1) Ignore client-log-level option to attach a brick with already running brick if brick_mux is enabled 2) Add a log to print pid of the running process to make easier debugging > Change-Id: I39e85de778e150d0685cd9a79425ce8b4783f9c9 > Signed-off-by: Mohit Agrawal <moagrawal@redhat.com> > Fixes: bz#1696046 > (Cherry picked from commit 798aadbe51a9a02dd98a0f861cc239ecf7c8ed57) > (Reviewed on upstream link https://review.gluster.org/#/c/glusterfs/+/22495/) Change-Id: If91682830f894ab8f6857f19dcb1797fc15ca64c Fixes: bz#1699715 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* Bump up timeout for tests on AWSNigel Babu2019-02-071-2/+2
| | | | | | Fixes: bz#1673267 Change-Id: I2b9be45f199f6436b858536c6f49be85902217f0 Signed-off-by: Nigel Babu <nigelb@redhat.com>
* tests: run nfs tests only if --enable-gnfs is providedAmar Tumballi2019-01-241-0/+2
| | | | | | Fixes: bz#1665358 Change-Id: Idbf88ec3ac683733b32c313377eeb72f2819bf0d Signed-off-by: Amar Tumballi <amarts@redhat.com>
* libglusterfs/common-utils.c: Fix buffer size for checksum computationVarsha Rao2019-01-111-0/+35
| | | | | | | | | | | | | | Problem: When quorum count option is updated, the change is not reflected in the nfs-server.vol file. This is because in get_checksum_for_file(), when the last part of the file read has size less than buffer size, the read buffer stores old data value along with correct data value. Solution: Pass the bytes read instead of fixed buffer size, for calculating checksum. Change-Id: I4b641607c8a262961b3f3da0028a54e08c3f8589 fixes: bz#1657744 Signed-off-by: Varsha Rao <varao@redhat.com>
* cluster/afr: Disable client side heals in AFR by default.Sunil Kumar Acharya2019-01-101-0/+3
| | | | | | | | | With this changeset, default value for the AFR client side heal volume option is set to "off" fixes: bz#1663102 Change-Id: Ie4016932339c4896487e3e7cb5caca68739b7ba2 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
* glusterd: coverity fixesAtin Mukherjee2018-11-031-0/+3
| | | | | | | | | | | | | Addresses CIDs : 1124769, 1124852, 1124864, 1134024, 1229876, 1382382 Also addressed a spurious failure in tests/bugs/glusterd/df-results-post-replace-brick-operations.t to ensure post replace brick operation and before triggering 'df' from mount, client has connection to the newly replaced bricks. Change-Id: Ie5d7e02f89400a661491d7fc2a120d6f6a83a1cc Updates: bz#789278 Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* tiering: remove the translator from build and glusterdAmar Tumballi2018-11-021-78/+0
| | | | | | | | | | | | | Based on the proposal to remove few features as they are not actively maintained [1], removing tier translator from the build. Also make sure there are no regression tests involving tiering feature are present. [1] https://lists.gluster.org/pipermail/gluster-users/2018-July/034400.html Change-Id: I2c177f711f9b54b7b24e1a13525ff3132bd9a9c5 updates: bz#1642807 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* glusterd: set fsid while performing replace brickSanju Rakonde2018-11-021-0/+58
| | | | | | | | | | While performing the replace-brick operation, we should set fsid value to the new brick. fixes: bz#1637196 Change-Id: I9e9a4962fc0c2f5dff43e4ac11767814a0c0beaf Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* tests: brick-mux-fd-cleanup.t should be under core directoryAtin Mukherjee2018-10-311-78/+0
| | | | | | Fixes: bz#1637934 Change-Id: I5f95beab62bd2bdde3bbee94c308b0ad03e94379 Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* stripe: remove the translator from build and glusterdAmar Tumballi2018-10-312-9/+2
| | | | | | | | | | | | | | | | Based on the proposal to remove few features as they are not actively maintained [1], removing stripe translator from the build. Also make sure there are no regression tests involving stripe translator. [1] https://lists.gluster.org/pipermail/gluster-users/2018-July/034400.html Note that this patch aims at removing the translator from build, and a followup patch is needed to remove the code from repository. Updates: bz#1364707 Change-Id: I235b305338f138e29e9f30cba65bc0dadbebbbd5 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* glusterd: ensure volinfo->caps is set to correct value.Sanju Rakonde2018-10-251-0/+9
| | | | | | | | | | | | | | | | | With the commit febf5ed4848, during the volume create op, we are setting volinfo->caps to 0, only if any of the bricks belong to the same node and brickinfo->vg[0] is null. Previously, we used to set volinfo->caps to 0, when either brick doesn't belong to the same node or brickinfo->vg[0] is null. With this patch, we set volinfo->caps to 0, when either brick doesn't belong to the same node or brickinfo->vg[0] is null. (as we do earlier without commit febf5ed4848). fixes: bz#1635820 Change-Id: I00a97415786b775fb088ac45566ad52b402f1a49 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* tests: correction in tests/bugs/glusterd/optimized-basic-testcases-in-cluster.tSanju Rakonde2018-10-251-9/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch https://review.gluster.org/#/c/glusterfs/+/19135/ has optimised glusterd test cases by clubbing the similar test cases into a single test case. https://review.gluster.org/#/c/glusterfs/+/19135/15/tests/bugs/glusterd/bug-1293414-import-brickinfo-uuid.t test case has been deleted and added as a part of tests/bugs/glusterd/optimized-basic-testcases-in-cluster.t In the original test case, we create a volume with two bricks, each on a separate node(N1 & N2). From another node in cluster(N3), we try to detach a node which is hosting bricks. It fails. In the new test, we created volume with single brick on N1. and from another node in cluster, we tried to detach N1. we expect peer detach to fail, but peer detach was success as the node is hosting all the bricks of volume. Now, changing the new test case to cover the original test case scenario. Please refer https://bugzilla.redhat.com/show_bug.cgi?id=1642597#c1 to understand why the new test case is not failing in centos-regression. fixes: bz#1642597 Change-Id: Ifda12b5677143095f263fbb97a6808573f513234 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* core: glusterfsd keeping fd open in index xlatorMohit Agrawal2018-10-121-0/+78
| | | | | | | | | | | | | | | | | | | | | | Problem: At the time of processing GF_EVENT_PARENT_DOWN at brick xlator, it forwards the event to next xlator only while xlator ensures no stub is in progress. At io-thread xlator it decreases stub_cnt before the process a stub and notify EVENT to next xlator Solution: Introduce a new counter to save stub_cnt and decrease the counter after process the stub completely at io-thread xlator. To avoid brick crash at the time of call xlator_mem_cleanup move only brick xlator if detach brick name has found in the graph Note: Thanks to pranith for sharing a simple reproducer to reproduce the same fixes bz#1637934 Change-Id: I1a694a001f7a5417e8771e3adf92c518969b6baa Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* tests: fix test case failureSanju Rakonde2018-09-201-1/+1
| | | | | | | | | | | | | | | | tests/bugs/glusterd/bug-1595320.t is failing in downstream. In downstream repo, enabling the brick multiplexing made interactive, so it will throw an prompt for the user input. As no input is provided during the test case execution, the test is failing. Using macro CLI instead of using gluster command, will bypass the interacive commands. so replacing the gluster command with CLI macro will address the issue. Change-Id: I6b39052d8e415a8ed08de7c80a91dadce155146a updates: bz#1193929 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* Land part 2 of clang-format changesGluster Ant2018-09-121-7/+7
| | | | | Change-Id: Ia84cc24c8924e6d22d02ac15f611c10e26db99b4 Signed-off-by: Nigel Babu <nigelb@redhat.com>
* glusterd: avoid using glusterd's working directory as a brickSanju Rakonde2018-09-081-0/+9
| | | | | | | | | Adding checks for avoiding glusterd's working directory used as a brick for volume creation. fixes: bz#853601 Change-Id: I4b16a05f752e92216aa628f542a4fdbf59b3c669 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* tests: fix brick check ordersAtin Mukherjee2018-08-134-33/+51
| | | | | | | | | | | | fix brick checks for validating-server-quorum.t & quorum-validation.t ...and make brick_up_status_1 function more generic. Also fix a timing issue in bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t Change-Id: I797ef4cec5b160aafa979bae7151b1e99fcb48ac Updates: bz#1603063 Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* tests: Fix cleanup routine for some mux testsShyamsundarR2018-08-131-3/+2
| | | | | | | | | | | | | | | | | Some of the mux tests, set a trap to catch test exit and call cleanup. This will cause cleanup to be invoked twice in case the test times out, or even otherwise, as include.rc also sets a trap to cleanup on exit (TERM and others). This leads to the tarballs generated on failures for these tests to be empty and does not aid debugging. This patch corrects this pattern across the tests to the more standard cleanup at the end. Fixes: bz#1615037 Change-Id: Ib83aeb09fac2aa591b390b9fb9e1f605bfef9a8b Signed-off-by: ShyamsundarR <srangana@redhat.com>
* glusterd: Compare volume_id before start/attach a brickMohit Agrawal2018-08-101-0/+93
| | | | | | | | | | | | | | Problem: After reboot a node brick is not coming up because fsid comparison is failed before start a brick Solution: Instead of comparing fsid compare volume_id to resolve the same because fsid is changed after reboot a node but volume_id persist as a xattr on brick_root path at the time of creating a volume. Change-Id: Ic289aab1b4ebfd83bbcae8438fee26ae61a0fff4 fixes: bz#1612418 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* glusterd: Bricks of a normal volumes should not attach to ↵Sanju Rakonde2018-08-032-33/+51
| | | | | | | | | | | | | | | | | | | gluster_shared_storage bricks Problem: In a brick multiplexing environment, Bricks of a normal volume created by user are getting attached to the bricks of a volume "gluster_shared_storage" which is created by enabling the enable-shared-storage option. Mounting gluster_shared_storage has strict authentication checks. when we attach bricks of a normal volume to bricks of gluster_shared_storage, mounting the normal volume created by user will fail due to strict authentication checks. Solution: We should not attach bricks of a normal volume to brick process of gluster_shared_storage volume and vice versa. fixes: bz#1610726 Change-Id: If1b5a2a02675789a2915ba480fb48c145449163d Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* glusterd: Introduce daemon-log-level cluster wide optionAtin Mukherjee2018-07-031-0/+93
| | | | | | | | | | | | This option, applicable to the node level daemons can be very helpful in controlling the log level of these services. Please note any daemon which is started prior to setting the specific value of this option (if not INFO) will need to go through a restart to have this change into effect. Change-Id: I7f6d2620bab2b094c737f5cc816bc093e9c9c4c9 fixes: bz#1597473 Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd: handling brick termination in brick-muxSanju Rakonde2018-05-071-0/+33
| | | | | | | | | | | | | | | | | Problem: There's a race between the glusterfs_handle_terminate() response sent to glusterd from last brick of the process and the socket disconnect event that encounters after the brick process got killed. Solution: When it is a last brick for the brick process, instead of sending GLUSTERD_BRICK_TERMINATE to brick process, glusterd will kill the process (same as we do it in case of non brick multiplecing). The test case is added for https://bugzilla.redhat.com/show_bug.cgi?id=1549996 Change-Id: If94958cd7649ea48d09d6af7803a0f9437a85503 fixes: bz#1545048 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* gluster: Sometimes Brick process is crashed at the time of stopping brickMohit Agrawal2018-04-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Problem: Sometimes brick process is getting crashed at the time of stop brick while brick mux is enabled. Solution: Brick process was getting crashed because of rpc connection was not cleaning properly while brick mux is enabled.In this patch after sending GF_EVENT_CLEANUP notification to xlator(server) waits for all rpc client connection destroy for specific xlator.Once rpc connections are destroyed in server_rpc_notify for all associated client for that brick then call xlator_mem_cleanup for for brick xlator as well as all child xlators.To avoid races at the time of cleanup introduce two new flags at each xlator cleanup_starting, call_cleanup. BUG: 1544090 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Note: Run all test-cases in separate build (https://review.gluster.org/#/c/19700/) with same patch after enable brick mux forcefully, all test cases are passed. Change-Id: Ic4ab9c128df282d146cf1135640281fcb31997bf updates: bz#1544090
* glusterd: mark port_registered to true for all running bricks with brick muxAtin Mukherjee2018-04-051-2/+13
| | | | | | | | | | | | | | | | | | | | | | | | glusterd maintains a boolean flag 'port_registered' which is used to determine if a brick has completed its portmap sign in process. This flag is (re)set in pmap_sigin and pmap_signout events. In case of brick multiplexing this flag is the identifier to determine if the very first brick with which the process is spawned up has completed its sign in process. However in case of glusterd restart when a brick is already identified as running, glusterd does a pmap_registry_bind to ensure its portmap table is updated but this flag isn't which is fine in case of non brick multiplex case but causes an issue if the very first brick which came as part of process is replaced and then the subsequent brick attach will fail. One of the way to validate this is to create and start a volume, remove the first brick and then add-brick a new one. Add-brick operation will take a very long time and post that the volume status will show all other brick status apart from the new brick as down. Solution is to set brickinfo->port_registered to true for all the running bricks when brick multiplexing is enabled. Change-Id: Ib0662d99d0fa66b1538947fd96b43f1cbc04e4ff Fixes: bz#1560957 Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* Revert "glusterd: handling brick termination in brick-mux"Sanju Rakonde2018-03-291-32/+0
| | | | | | | | | | | | | This reverts commit a60fc2ddc03134fb23c5ed5c0bcb195e1649416b. This commit was causing multiple tests to time out when brick multiplexing is enabled. With further debugging, it's found that even though the volume stop transaction is converted into mgmt_v3 to allow the remote nodes to follow the synctask framework to process the command, there are other callers of glusterd_brick_stop () which are not synctask based. Change-Id: I7aee687abc6bfeaa70c7447031f55ed4ccd64693 updates: bz#1545048
* glusterd: handling brick termination in brick-muxSanju Rakonde2018-03-281-0/+32
| | | | | | | | | | | | | | | Problem: There's a race between the last glusterfs_handle_terminate() response sent to glusterd and the kill that happens immediately if the terminated brick is the last brick. Solution: When it is a last brick for the brick process, instead of glusterfsd killing itself, glusterd will kill the process in case of brick multiplexing. And also changing gf_attach utility accordingly. Change-Id: I386c19ca592536daa71294a13d9fc89a26d7e8c0 fixes: bz#1545048 BUG: 1545048 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* glusterd: optimization of test casesSanju Rakonde2018-02-1093-2696/+1575
| | | | | | | | | | | | | | | To reduce the overall time taken by the every regression job for all glusterd test cases, avoiding some duplicate tests by clubbing similar test cases into one. real time taken for all regression jobs of glusterd without this patch is 1959 seconds, with this patch it is 1059 seconds. Look at the below document for your reference. https://docs.google.com/document/d/1u8o4-wocrsuPDI8BwuBU6yi_x4xA_pf2qSrFY6WEQpo/edit?usp=sharing Change-Id: Ib14c61ace97e62c3abce47230dd40598640fe9cb BUG: 1530905 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* tests: fix for bug-1260185-donot-allow-detach-commit-unnecessarily.t failurehari gowtham2017-11-301-49/+0
| | | | | | | | | | problem: detach commit was issues before detach start was completed. fix: wait for detach start to finish and then detach commit. Change-Id: I639962be6de6dbd1512f0a5617050d1e6872eac8 BUG: 1517961 Signed-off-by: hari gowtham <hgowtham@redhat.com>
* tests: mark currently failing regression tests as known issuesAmar Tumballi2017-11-281-0/+6
| | | | | | Change-Id: If6c36dc6c395730dfb17b5b4df6f24629d904926 BUG: 1517961 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* tests: fix bug-1483058-replace-brick-quorum-validation.t spurious failureAtin Mukherjee2017-11-121-1/+8
| | | | | | Change-Id: I04c35305bfb663eabbf715eee78695adfd4a2d20 BUG: 1511310 Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* tests: fix spurious failure in bug-1345727-bricks-stop-on-no-quorum-validation.tAtin Mukherjee2017-11-071-0/+1
| | | | | | | | Add peer_count check before checking for brick status Change-Id: I0179ec29729ab6bbc3571eb6ffd631b7b0d15f7c BUG: 1510415 Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd: delete source brick only once in reset-brick commit forceAtin Mukherjee2017-10-311-0/+24
| | | | | | | | | | | While stopping the brick which is to be reset and replaced delete_brick flag was passed as true which resulted glusterd to free up to source brick before the actual operation. This results commit force to fail failing to find the source brickinfo. Change-Id: I1aa7508eff7cc9c9b5d6f5163f3bb92736d6df44 BUG: 1507466 Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* tests: Update tier CLI in .t filesN Balachandran2017-10-301-1/+1
| | | | | | | | Update .t tier tests to use the new tier CLI. Change-Id: I0e7f1769071108d8266fc86378c4466bcaf96e7d BUG: 1505253 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* glusterd:Marking all the brick status as stopped when a process goes down in ↵Sanju Rakonde2017-10-121-0/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | brick multiplexing In brick multiplexing environment, if a brick process goes down i.e., if we kill it with SIGKILL, the status of the brick for which the process came up for the first time is only changing to stopped. all other brick statuses are remain started. This is happening because the process was killed abruptly using SIGKILL signal and signal handler wasn't invoked and further cleanup wasn't triggered. When we try to start a volume using force, it shows error saying "Request timed out", since all the brickinfo->status are still in started state, we're waiting for one of the brick process to come up which never going to happen since the brick process was killed. To resolve this, In the disconnect event, We are checking all the processes that whether the brick which got disconnected belongs the process. Once we get the process we are calling a function named glusterd_mark_bricks_stopped_by_proc() and sending brick_proc_t object as an argument. From the glusterd_brick_proc_t we can get all the bricks attached to that process. but these are duplicated ones. To get the original brickinfo we are reading volinfo from brick. In volinfo we will have original brickinfo copies. We are changing brickinfo->status to stopped for all the bricks. Change-Id: Ifb9054b3ee081ef56b39b2903ae686984fe827e7 BUG: 1499509 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* glusterd: disallow replace brick for dist only volumesAtin Mukherjee2017-09-194-17/+17
| | | | | | | | | | | | | | | | | | Allowing replace-brick on dist only volumes will lead to data loss. This patch blocks replace brick commit force to fail if a volume is dist only. Also removing tests/basic/pump.t as its of no use as per the discussion in http://lists.gluster.org/pipermail/gluster-devel/2017-September/053652.html Change-Id: Iabb0c16f865f3fc361b64a19bfcf0c0fbb5c2682 BUG: 1489432 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: https://review.gluster.org/18226 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* glusterd: glusterd fails to start if peers file has blank lineGaurav Yadav2017-08-241-0/+29
| | | | | | | | | | | | | | | | | | | | | | | Problem: On start of glusterd service, glusterd fetch data from store, while parsing data from store if peers file consists of blank line glusterd fails to start. Fix: With this fix while parsing peers file glusterd will skip blank lines if it contains any. Signed-off-by: Gaurav Yadav <gyadav@redhat.com> Change-Id: I53cd65a54de5f57baef292b2118b70ffb7f99388 BUG: 1482906 Reviewed-on: https://review.gluster.org/18066 Tested-by: Gaurav Yadav <gyadav@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Prashanth Pai <ppai@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* glusterd: replace-brick executing successfully when quorum does not metGaurav Yadav2017-08-221-0/+51
| | | | | | | | | | | | | | | | | | Problem: replace-brick command on a setup where quorum does not met executing successfully. Fix: With the fix glusterd is validating whether server is in quorum or not during replace-brick staging Change-Id: I8017154bb62bdcc6c6490e720ecfe9cde090c161 BUG: 1483058 Signed-off-by: Gaurav Yadav <gyadav@redhat.com> Reviewed-on: https://review.gluster.org/18068 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd: disallow volume specific options to be set with all as volume nameAtin Mukherjee2017-08-181-0/+25
| | | | | | | | | | | | | | | | All the .validate_fn defined in volume map entry table refers to volinfo object. And if we end up in trying to set a volume level option cluster wide glusterd results into a crash. Change-Id: I7c877aee0ff5c8c1d8c95662fdc8c8923355ae7b BUG: 1482344 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: https://review.gluster.org/18052 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Prashanth Pai <ppai@redhat.com> Reviewed-by: mohammed rafi kc <rkavunga@redhat.com> Reviewed-by: Gaurav Yadav <gyadav@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* glusterd: Gluster should keep PID file in correct locationGaurav Kumar Garg2017-08-112-3/+2
| | | | | | | | | | | | | | | | | | | | | | | Currently Gluster keeps process pid information of all the daemons and brick processes in Gluster configuration file directory (ie., /var/lib/glusterd/*). These pid files should be seperate from configuration files. Deletion of the configuration file directory might result into serious problems. Also, /var/run/gluster is the default placeholder directory for pid files. So, with this fix Gluster will keep all process pid information of all processes in /var/run/gluster/* directory. Change-Id: Idb09e3fccb6a7355fbac1df31082637c8d7ab5b4 BUG: 1258561 Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com> Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com> Reviewed-on: https://review.gluster.org/13580 Tested-by: MOHIT AGRAWAL <moagrawa@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* glusterfs: Not able to mount running volume after enable brick mux and ↵Mohit Agrawal2017-05-311-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | stopped any volume Problem: After enabled brick mux if any volume has down and then try ot run mount with running volume , mount command is hung. Solution: After enable brick mux server has shared one data structure server_conf for all associated subvolumes.After down any subvolume in some ungraceful manner (remove brick directory) posix xlator sends GF_EVENT_CHILD_DOWN event to parent xlatros and server notify updates the child_up to false in server_conf.When client is trying to communicate with server through mount it checks conf->child_up and it is FALSE so it throws message "translator are not yet ready". From this patch updated structure server_conf to save child_up status for xlator wise. Another improtant correction from this patch is cleanup threads from server side xlators after stop the volume. BUG: 1453977 Change-Id: Ic54da3f01881b7c9429ce92cc569236eb1d43e0d Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Reviewed-on: https://review.gluster.org/17356 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra Talur <rtalur@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* libglusterfs : Fix crash in glusterd while peer probingGaurav Yadav2017-05-261-0/+25
| | | | | | | | | | | | | | | | | | | | | | glusterd crashes when port is being set explcitly to a range which is outside greater than short data type range. Eg. sysctl net.ipv4.ip_local_reserved_ports="49152-49156" In above case glusterd crashes while parsing the port. With this fix glusterd will be able to handle port range between INT_MIN to INT_MAX Change-Id: I7c75ee67937b0e3384502973d96b1c36c89e0fe1 BUG: 1454418 Signed-off-by: Gaurav Yadav <gyadav@redhat.com> Reviewed-on: https://review.gluster.org/17359 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Samikshan Bairagya <samikshan@gmail.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* glusterd: Don't spawn new glusterfsds on node reboot with brick-muxSamikshan Bairagya2017-05-181-0/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | With brick multiplexing enabled, upon a node reboot new bricks were not being attached to the first spawned brick process even though there wasn't any compatibility issues. The reason for this is that upon glusterd restart after a node reboot, since brick services aren't running, glusterd starts the bricks in a "no-wait" mode. So after a brick process is spawned for the first brick, there isn't enough time for the corresponding pid file to get populated with a value before the compatibilty check is made for the next brick. This commit solves this by iteratively waiting for the pidfile to be populated in the brick compatibility comparison stage before checking if the brick process is alive. Change-Id: Ibd1f8e54c63e4bb04162143c9d70f09918a44aa4 BUG: 1451248 Signed-off-by: Samikshan Bairagya <samikshan@gmail.com> Reviewed-on: https://review.gluster.org/17307 Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* glusterd: remove useless options from glusterd's volume set tableZhou Zhengping2017-05-171-10/+10
| | | | | | | | | | | | | | | | These options will cause brick's log complains: _log_if_unknown_option] 0-patchy-quota: option 'timeout' is not recognized _log_if_unknown_option] 0-patchy-server: option 'ping-timeout' is not recognized Change-Id: Ida2add13f792736a4e52bfaf38d1169309283a3f BUG: 1449008 Signed-off-by: Zhou Zhengping <johnzzpcrystal@gmail.com> Reviewed-on: https://review.gluster.org/17213 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Prashanth Pai <ppai@redhat.com> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* glusterd: Make reset-brick work correctly if brick-mux is onSamikshan Bairagya2017-05-101-0/+79
| | | | | | | | | | | | | | | | | | | Reset brick currently kills of the corresponding brick process. However, with brick multiplexing enabled, stopping the brick process would render all bricks attached to it unavailable. To handle this correctly, we need to make sure that the brick process is terminated only if brick-multiplexing is disabled. Otherwise, we should send the GLUSTERD_BRICK_TERMINATE rpc to the respective brick process to detach the brick that is to be reset. Change-Id: I69002d66ffe6ec36ef48af09b66c522c6d35ac58 BUG: 1446172 Signed-off-by: Samikshan Bairagya <samikshan@gmail.com> Reviewed-on: https://review.gluster.org/17128 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd: socketfile & pidfile related fixes for brick multiplexing featureMohit Agrawal2017-05-093-0/+116
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: While brick-muliplexing is on after restarting glusterd, CLI is not showing pid of all brick processes in all volumes. Solution: While brick-mux is on all local brick process communicated through one UNIX socket but as per current code (glusterd_brick_start) it is trying to communicate with separate UNIX socket for each volume which is populated based on brick-name and vol-name.Because of multiplexing design only one UNIX socket is opened so it is throwing poller error and not able to fetch correct status of brick process through cli process. To resolve the problem write a new function glusterd_set_socket_filepath_for_mux that will call by glusterd_brick_start to validate about the existence of socketpath. To avoid the continuous EPOLLERR erros in logs update socket_connect code. Test: To reproduce the issue followed below steps 1) Create two distributed volumes(dist1 and dist2) 2) Set cluster.brick-multiplex is on 3) kill glusterd 4) run command gluster v status After apply the patch it shows correct pid for all volumes BUG: 1444596 Change-Id: I5d10af69dea0d0ca19511f43870f34295a54a4d2 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Reviewed-on: https://review.gluster.org/17101 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Prashanth Pai <ppai@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* Fixes quota aux mount failureSanoj Unnikrishnan2017-05-081-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The aux mount is created on the first limit/remove_limit/list command and it remains until volume is stopped / deleted / (quota is disabled) , where we do a lazy unmount. If the process is uncleanly terminated, then the mount entry remains and we get (Transport disconnected) error on subsequent attempts to run quota list/limit-usage/remove commands. Second issue, There is also a risk of inadvertent rm -rf on the /var/run/gluster causing data loss for the user. Ideally, /var/run is a temp path for application use and should not cause any data loss to persistent storage. Solution: 1) unmount the aux mount after each use. 2) clean stale mount before mounting, if any. One caveat with doing mount/unmount on each command is that we cannot use same mount point for both list and limit commands. The reason for this is that list command needs mount to be accessible in cli after response from glusterd, So it could be unmounted by a limit command if executed in parallel (had we used same mount point) Hence we use separate mount points for list and limit commands. Change-Id: I4f9e39da2ac2b65941399bffb6440db8a6ba59d0 BUG: 1433906 Signed-off-by: Sanoj Unnikrishnan <sunnikri@redhat.com> Reviewed-on: https://review.gluster.org/16938 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Manikandan Selvaganesh <manikandancs333@gmail.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd : Disallow peer detach if snapshot bricks exist on itGaurav Yadav2017-03-311-0/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | Problem : - Deploy gluster on 2 nodes, one brick each, one volume replicated - Create a snapshot - Lose one server - Add a replacement peer and new brick with a new IP address - replace-brick the missing brick onto the new server (wait for replication to finish) - peer detach the old server - after doing above steps, glusterd fails to restart. Solution: With the fix detach peer will populate an error : "N2 is part of existing snapshots. Remove those snapshots before proceeding". While doing so we force user to stay with that peer or to delete all snapshots. Change-Id: I3699afb9b2a5f915768b77f885e783bd9b51818c BUG: 1322145 Signed-off-by: Gaurav Yadav <gyadav@redhat.com> Reviewed-on: https://review.gluster.org/16907 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>