summaryrefslogtreecommitdiffstats
path: root/xlators/mgmt/glusterd/src/glusterd-rebalance.c
Commit message (Collapse)AuthorAgeFilesLines
* build: extend --enable-valgrind to support Memcheck and DRDDmitry Antipov2020-09-051-3/+15
| | | | | | | | | Extend '-enable-valgrind' to '--enable=valgrind[=memcheck,drd]' to enable Memcheck or DRD Valgrind tool, respectively. Change-Id: I80d13d72ba9756e0cbcdbeb6766b5c98e3e8c002 Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Updates: #1002
* glusterd: additional log informationnik-redhat2020-06-291-1/+4
| | | | | | | | | | | | | | | Issue: Some of the functions didn't had sufficient logging of information in case of failure. Fix: Added log information in few functions in case of failure indicating the cause of such event. Change-Id: I301cf3a1c8d2c94505c6ae0d83072b0241c36d84 fixes: #874 Signed-off-by: nik-redhat <nladha@redhat.com>
* glusterd: create separate logdirs for cluster.rc instancesN Balachandran2019-08-141-3/+3
| | | | | | | | | | Create a separate logdir for each host instance created by cluster.rc. This makes it easier to determine the files belonging to a particular instance. Change-Id: Ic8321f83f98995412b7d5f095b3d3f0391767a8b Fixes: bz#1733042 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* core: use more restrictive mode while creating the directoriesSanju Rakonde2019-07-231-1/+1
| | | | | | | fixes: bz#1724024 Change-Id: I539fb7248b2cfc037ec29f1413ea648f9ec21ef2 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* glusterd: Can't run rebalance due to long unix socketMohit Agrawal2019-06-251-51/+9
| | | | | | | | | | | | | Problem: glusterd populate unix socket file name based on volname and if volname is lengthy socket system call's are failed due to breach maximum length is defined in the kernel. Solution:Convert unix socket name to hash to resolve the issue Change-Id: I5072e8184013095587537dbfa4767286307fff65 fixes: bz#1720566 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* glusterd.h: remove unneeded macros or move them to their users.Yaniv Kaul2019-06-251-0/+25
| | | | | | | | | | Some macros were not used, so removed. Some macros were quite local, so moved to the respective users. Some macros simplified (removed an allocation here and there) Change-Id: Ifaf1aff15a78f105b1549ab8053378933b35df43 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* glusterd/tier: remove tier related code from glusterdHari Gowtham2019-05-271-179/+11
| | | | | | | | | | | | | The handler functions are pointed to dummy functions. The switch case handling for tier also have been moved to point default case to avoid issues, if reintroduced. The tier changes in DHT still remain as such. updates: bz#1693692 Change-Id: I80d80c9a3eb862b4440a36b31ae82b2e9d92e4dc Signed-off-by: Hari Gowtham <hgowtham@redhat.com>
* rpc/transport: Missing a ref on dict while creating transport objectMohammed Rafi KC2019-03-201-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | while creating rpc_tranpsort object, we store a dictionary without taking a ref on dict but it does an unref during the cleaning of the transport object. So the rpc layer expect the caller to take a ref on the dictionary before passing dict to rpc layer. This leads to a lot of confusion across the code base and leads to ref leaks. Semantically, this is not correct. It is the rpc layer responsibility to take a ref when storing it, and free during the cleanup. I'm listing down the total issues or leaks across the code base because of this confusion. These issues are currently present in the upstream master. 1) changelog_rpc_client_init 2) quota_enforcer_init 3) rpcsvc_create_listeners : when there are two transport, like tcp,rdma. 4) quotad_aggregator_init 5) glusterd: init 6) nfs3_init_state 7) server: init 8) client:init This patch does the cleanup according to the semantics. Change-Id: I46373af9630373eb375ee6de0e6f2bbe2a677425 updates: bz#1659708 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
* glusterd: migrating rebalance commands to mgmt_v3 frameworkSanju Rakonde2018-12-181-9/+487
| | | | | | | | | Current rebalance commands use the op_state machine framework. Porting it to use the mgmt_v3 framework. Change-Id: I6faf4a6335c2e2f3d54bbde79908a7749e4613e7 fixes: bz#1655827 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* libglusterfs: Move devel headers under glusterfs directoryShyamsundarR2018-12-051-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | libglusterfs devel package headers are referenced in code using include semantics for a program, this while it works can be better especially when dealing with out of tree xlator builds or in general out of tree devel package usage. Towards this, the following changes are done, - moved all devel headers under a glusterfs directory - Included these headers using system header notation <> in all code outside of libglusterfs - Included these headers using own program notation "" within libglusterfs This change although big, is just moving around the headers and making it correct when including these headers from other sources. This helps us correctly include libglusterfs includes without namespace conflicts. Change-Id: Id2a98854e671a7ee5d73be44da5ba1a74252423b Updates: bz#1193929 Signed-off-by: ShyamsundarR <srangana@redhat.com>
* all: fix the format string exceptionsAmar Tumballi2018-11-051-1/+1
| | | | | | | | | | | | | | | | Currently, there are possibilities in few places, where a user-controlled (like filename, program parameter etc) string can be passed as 'fmt' for printf(), which can lead to segfault, if the user's string contains '%s', '%d' in it. While fixing it, makes sense to make the explicit check for such issues across the codebase, by making the format call properly. Fixes: CVE-2018-14661 Fixes: bz#1644763 Change-Id: Ib547293f2d9eb618594cbff0df3b9c800e88bde4 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* Land part 2 of clang-format changesGluster Ant2018-09-121-937/+925
| | | | | Change-Id: Ia84cc24c8924e6d22d02ac15f611c10e26db99b4 Signed-off-by: Nigel Babu <nigelb@redhat.com>
* Some (mgmt) xlators: use dict_{setn|getn|deln|get_int32n|set_int32n|set_strn}Yaniv Kaul2018-09-091-22/+36
| | | | | | | | | | | | | | | | | | | | | In a previous patch (https://review.gluster.org/20769) we've added the key length to be passed to dict_* funcs, to remove the need to strlen() it. This patch moves some xlators to use it. - It also adds dict_get_int32n which was missing. - It also reduces the size of some key variables. They were set to 1024b or PATH_MAX, where sometimes 64 bytes were really enough. Please review carefully: 1. That I did not reduce some the size of the key variables too much. 2. That I did not mix up some keys. Compile-tested only! Change-Id: Ic729baf179f40e8d02bc2350491d4bb9b6934266 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* All: run codespell on the code and fix issues.Yaniv Kaul2018-07-221-1/+1
| | | | | | | | | | | | Please review, it's not always just the comments that were fixed. I've had to revert of course all calls to creat() that were changed to create() ... Only compile-tested! Change-Id: I7d02e82d9766e272a7fd9cc68e51901d69e5aab5 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* glusterd: enable self-heal in daemonsRavishankar N2018-05-041-4/+0
| | | | | | | | | ..like rebalance, quota and tier because that seems to be the consensus (see BZ). Change-Id: I912336a12f4e33ea4ec55f804df403fab0dc89fc BUG: 1536024 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* glusterd: honour localtime-logging for all the daemonsAtin Mukherjee2018-04-031-0/+6
| | | | | | Change-Id: I97a70d29365b0a454241ac5f5cae56d93eefd73a Fixes: bz#1563334 Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* Coverity Issue: PW.INCLUDE_RECURSION in several filesGirjesh Rajoria2017-11-091-1/+0
| | | | | | | | | | | | | | Coverity ID: 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 423, 424, 425, 426, 427, 428, 429, 436, 437, 438, 439, 440, 441, 442, 443 Issue: Event include_recursion Removed redundant, recursive includes from the files. Change-Id: I920776b1fa089a2d4917ca722d0075a9239911a7 BUG: 789278 Signed-off-by: Girjesh Rajoria <grajoria@redhat.com>
* xlators/mgmt/glusterd/: Coverity Issue MISSING_BREAK in ↵Girjesh Rajoria2017-11-071-1/+2
| | | | | | | | | | | | | | glusterd_op_stage_rebalance Coverity ID: 297, 298 Issue: Event unterminated_case Added Fall through in comment. Change-Id: Ied742e64d9741cc974906921dcca80f0b738b7d6 BUG: 789278 Signed-off-by: Girjesh Rajoria <grajoria@redhat.com>
* glusterd: disable rpc_clnt_t after relalance process disconnectionMilind Changire2017-08-241-1/+1
| | | | | | | | | | | | | | | | | | | | | Problem: glusterd continues to connect to rebalance process even after the socket connection has disconnected. Solution: rpc_clnt_disable() disables the rpc_clnt_t object and disarms all relevant timers and drops refs to the rpc_clnt_t object and the transport as well. Change-Id: I981d6f1cc0087037f1927062c2770a4d5026a619 BUG: 1484225 Signed-off-by: Milind Changire <mchangir@redhat.com> Reviewed-on: https://review.gluster.org/18114 Reviewed-by: MOHIT AGRAWAL <moagrawa@redhat.com> Tested-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* Infra to indentify processhari gowtham2017-08-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: currently we can't identify which process is running and how many instances of it are available. Fix: name the process when its spawned and send it to the server and save it in the client_t The processes that abide by this change from this patch are: 1) fuse mount, 2) rebalance, 3) selfheal, 4) tier, 5) quota, 6) snapshot, 7) brick. 8) gfapi (by default. gfapi.<processname> if processname is found) Note: fuse gets a process name as native-fuse-client by default. If the user gives a name for the fuse and spawns it, it will be of this type --process-name native-fuse-client.<name_specified>. This can be made use by the process like aux mount done by quota, geo-rep, etc by adding another option in the aux mount " -o process-name=gsync_mount" Updates: #178 Signed-off-by: hari gowtham <hgowtham@redhat.com> Change-Id: Ie4d02257216839338043737691753bab9a974d5e Reviewed-on: https://review.gluster.org/17957 Smoke: Gluster Build System <jenkins@build.gluster.org> Tested-by: hari gowtham <hari.gowtham005@gmail.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Aravinda VK <avishwan@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* Revert "glusterd: disallow rebalance & remove-brick on a sharded volume"Krutika Dhananjay2017-06-131-9/+0
| | | | | | | | | | | | | | | | | This reverts commit 8375b3d70d5c6268c6770b42a18b2e1bc09e411e. Now that some of the users have confirmed rebalance works fine without causing corruption of VMs, time to revert the CLI restriction. Change-Id: I45493fcbb1f25fd0fff27b2b3526c42642ccb464 BUG: 1460585 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: https://review.gluster.org/17506 Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* glusterd: socketfile & pidfile related fixes for brick multiplexing featureMohit Agrawal2017-05-091-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: While brick-muliplexing is on after restarting glusterd, CLI is not showing pid of all brick processes in all volumes. Solution: While brick-mux is on all local brick process communicated through one UNIX socket but as per current code (glusterd_brick_start) it is trying to communicate with separate UNIX socket for each volume which is populated based on brick-name and vol-name.Because of multiplexing design only one UNIX socket is opened so it is throwing poller error and not able to fetch correct status of brick process through cli process. To resolve the problem write a new function glusterd_set_socket_filepath_for_mux that will call by glusterd_brick_start to validate about the existence of socketpath. To avoid the continuous EPOLLERR erros in logs update socket_connect code. Test: To reproduce the issue followed below steps 1) Create two distributed volumes(dist1 and dist2) 2) Set cluster.brick-multiplex is on 3) kill glusterd 4) run command gluster v status After apply the patch it shows correct pid for all volumes BUG: 1444596 Change-Id: I5d10af69dea0d0ca19511f43870f34295a54a4d2 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Reviewed-on: https://review.gluster.org/17101 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Prashanth Pai <ppai@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd: disallow rebalance & remove-brick on a sharded volumeAtin Mukherjee2017-05-041-0/+9
| | | | | | | | | | | | | Change-Id: Idfbdbc61ca18054fdbf7556f74e195a63cd8a554 BUG: 1447630 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: https://review.gluster.org/17160 Reviewed-by: Raghavendra Talur <rtalur@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* xlator: do not call dlclose() when debuggingNiels de Vos2017-04-071-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Valgrind can not show the symbols if a .so after calling dlclose(). The unhelpful ??? in the output gets resolved properly with this change: ==25170== 344 bytes in 1 blocks are definitely lost in loss record 233 of 324 ==25170== at 0x4C29975: calloc (vg_replace_malloc.c:711) ==25170== by 0x52C7C0B: __gf_calloc (mem-pool.c:117) ==25170== by 0x12B0638A: ??? ==25170== by 0x528FCE6: __xlator_init (xlator.c:472) ==25170== by 0x528FE16: xlator_init (xlator.c:498) ==25170== by 0x52DA8D6: glusterfs_graph_init (graph.c:321) ==25170== by 0x52DB587: glusterfs_graph_activate (graph.c:695) ==25170== by 0x5046407: glfs_process_volfp (glfs-mgmt.c:79) ==25170== by 0x5043B9E: glfs_volumes_init (glfs.c:281) ==25170== by 0x5044FEC: glfs_init_common (glfs.c:986) ==25170== by 0x50451A7: glfs_init@@GFAPI_3.4.0 (glfs.c:1031) By not calling dlclose(), the dynamically loaded .so is still available upon program exit, and Valgrind is able to resolve the symbols. This will add an additional leak, so dlclose() is called for normal builds, but skipped when configuring with "./configure --enable-valgrind" or passing the "run-with-valgrind" xlator option. URL: http://valgrind.org/docs/manual/faq.html#faq.unhelpful Change-Id: I2044e21b1b8fcce32ad1a817fdd795218f967731 BUG: 1425623 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: https://review.gluster.org/16809 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Samikshan Bairagya <samikshan@gmail.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* glusterd : log improvements in glusterd_op_stage_rebalance ()Atin Mukherjee2017-02-241-4/+7
| | | | | | | | | | | | | | Include volume names in the respective staging failure error logs in rebalance staging Change-Id: Iaaab12a552930dd5274fbecec78f5735f883ab6b BUG: 1426509 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: https://review.gluster.org/16746 Reviewed-by: Gaurav Yadav <gyadav@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* core: run many bricks within one glusterfsd processJeff Darcy2017-01-301-30/+21
| | | | | | | | | | | | | | | | | | | | | | | This patch adds support for multiple brick translator stacks running in a single brick server process. This reduces our per-brick memory usage by approximately 3x, and our appetite for TCP ports even more. It also creates potential to avoid process/thread thrashing, and to improve QoS by scheduling more carefully across the bricks, but realizing that potential will require further work. Multiplexing is controlled by the "cluster.brick-multiplex" global option. By default it's off, and bricks are started in separate processes as before. If multiplexing is enabled, then *compatible* bricks (mostly those with the same transport options) will be started in the same process. Change-Id: I45059454e51d6f4cbb29a4953359c09a408695cb BUG: 1385758 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: https://review.gluster.org/14763 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* dht/rebalance Estimate time to complete rebalanceN Balachandran2017-01-191-1/+1
| | | | | | | | | | | | | | The estimates will be logged to the rebalance log on running gluster v rebalance <vol> status Change-Id: I9d51b139cd4c8dfde1ff2c2050720ae606c13fc6 BUG: 1396004 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/15893 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* tier : Tier as a servicehari gowtham2017-01-161-10/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tierd is implemented by separating from rebalance process. The commands affected: 1) Attach tier will trigger this process instead of old one 2) tier start and tier start force will also trigger this process. 3) volume status [tier] will show tier daemon as a process instead of task and normal tier status and tier detach status works. 4) tier stop implemented. 5) detach tier implemented separately along with new detach tier status 6) volume tier volname status will work using the changes. 7) volume set works This patch has separated the tier translator from the legacy DHT rebalance code. It now sends the RPCs from the CLI to glusterd separate to the DHT rebalance code. The daemon is now a service, similar to the snapshot daemon, and can be viewed using the volume status command. The code for the validation and commit phase are the same as the earlier tier validation code in DHT rebalance. The “brickop” phase has been changed so that the status command can use this framework. The service management framework is now used. DHT rebalance does not use this framework. This service framework takes care of : *) spawning the daemon, killing it and other such processes. *) volume set options , which are written on the volfile. *) restart and reconfigure functions. Restart is to restart the daemon at two points 1)after gluster goes down and comes up. 2) to stop detach tier. *) reconfigure is used to make immediate volfile changes. By doing this, we don’t restart the daemon. it has the code to rewrite the volfile for topological changes too (which comes into place during add and remove brick). With this patch the log, pid, and volfile are separated and put into respective directories. Change-Id: I3681d0d66894714b55aa02ca2a30ac000362a399 BUG: 1313838 Signed-off-by: hari gowtham <hgowtham@redhat.com> Reviewed-on: http://review.gluster.org/13365 Smoke: Gluster Build System <jenkins@build.gluster.org> Tested-by: hari gowtham <hari.gowtham005@gmail.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd: Revert 368e26f454fe35477e46dc698fa6b8c3c608ea8dN Balachandran2016-09-261-39/+0
| | | | | | | | | | | | | | | | The earlier fix prevented rebalance from starting even if only a single brick of a replica set was down. Change-Id: I8c9a7d4c1fe33f7749568357462f29796e1b832a BUG: 1376671 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/15517 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd (rebalance): fix unused variable warnings/errorsKaleb S. KEITHLEY2016-08-291-2/+0
| | | | | | | | | | | | | | | | | http://review.gluster.org/14085 fixes a/the "leak" - via the generated rpc/xdr headers - of pragmas that mask these warnings. However 14085 won't pass the smoke test until all the warnings are fixed. Change-Id: I136cf4c1d318f98933f3bfed81e2de3da0f56abc BUG: 1369124 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/15273 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Prashanth Pai <ppai@redhat.com>
* glusterd: add async events (part 2)Atin Mukherjee2016-08-231-1/+1
| | | | | | | | | | | Change-Id: I7a5687143713c283f0051aac2383f780e3e43646 BUG: 1360809 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: http://review.gluster.org/15153 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Samikshan Bairagya <samikshan@gmail.com>
* glusterd: Add async eventsAtin Mukherjee2016-08-221-1/+4
| | | | | | | | | | | | | | | | | | | As the eventing framework is already in the code, this patch targets to capture all the async glusterd events which are important to be notified to the higher layers which consume the eventing framework. I plan to break this work into two different patches where this patch set covers the first set of events. Change-Id: Ie1bd4f6fa84117b26ccb4c75bc4dc68e6ef19134 BUG: 1360809 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: http://review.gluster.org/15015 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Rohan Kanade <rkanade@redhat.com> Reviewed-by: Samikshan Bairagya <samikshan@gmail.com>
* changelog/rpc: Fix rpc_clnt_t mem leaksKotresh HR2016-07-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PROBLEM: 1. Freeing up rpc_clnt object might lead to crashes. Well, it was not a necessity to free rpc-clnt object till now because all the existing use cases needs to reconnect back on disconnects. Hence timer code was not taking ref on rpc-clnt object. Glusterd had some use-cases that led to crash due to ping-timer and they fixed only those code paths that involve ping-timer. Now, since changelog has an use-case where rpc-clnt need to be freed up, we need to fix timer code to take refs 2. In changelog, because of issue 1, only mydata was being freed which is incorrect. And there are races where rpc-clnt object would access the freed mydata which would lead to crashes. Since changelog xlator resides on brick side and is long living process, if multiple libgfchangelog consumers register to changelog and disconnect/reconnect mulitple times, it would result in leak of 'rpc-clnt' object for every connect/disconnect. SOLUTION: 1. Handle ref/unref of 'rpc_clnt' structure in timer functions properly. 2. In changelog, unref 'rpc_clnt' in RPC_CLNT_DISCONNECT after disabling timers and free mydata on RPC_CLNT_DESTROY. RPC SETUP IN CHANGELOG: 1. changelog xlator initiates rpc server say 'changelog_rpc_server' 2. libgfchangelog initiates one rpc server say 'libgfchangelog_rpc_server' 3. libgfchangelog initiates rpc client and connects to 'changelog_rpc_server' 4. In return changelog_rpc_server initiates a rpc client and connects back to 'libgfchangelog_rpc_server' REF/UNREF HANDLING IN TIMER FUNCTIONS: Let's say rpc clnt refcount = 1 1. Take the ref before reigstering callback to timer queue >>>> rpc_clnt_ref (say ref count becomes = 2) 2. Register a callback to timer say 'callback1' 3. If register fails: >>>> rpc_clnt_unref (ref count = 1) 4. On timer expiration, 'callback1' gets called. So unref rpc clnt at the end in 'callback1'. This is corresponding to ref taken in step 1 >>>> rpc_clnt_unref (ref count = 1) 5. The cycle from step-1 to step-4 continues....until timer cancel event happens 6. timer cancel of say 'callback1' If timer cancel fails: Do nothing, Step-4 would have unrefd If timer cancel succeeds: >>>> rpc_clnt_unref (ref count = 1) Change-Id: I91389bc511b8b1a17824941970ee8d2c29a74a09 BUG: 1316178 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/13658 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* Glusterd: printing the node details on error message of rebalancehari2016-05-251-5/+7
| | | | | | | | | | | | | | | | | | Problem: on the rebalance start with one of the glusterd being down among the volume, the error message says only about the brick path. Fix: adding the node details Change-Id: I5827d3a9a15b0461c9ce3a51c0b16246ca58f335 BUG: 1337899 Signed-off-by: hari <hgowtham@redhat.com> Reviewed-on: http://review.gluster.org/14495 Tested-by: hari gowtham <hari.gowtham005@gmail.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* core: assorted typos and spelling mistakes reported by Debian lintianKaleb S KEITHLEY2016-05-181-1/+1
| | | | | | | | | | | | | | | Also missing bang (!) in #!/bin/bash in shell scripts. Change-Id: I567a4be8f0f31f6285550f243fe802895f6bc43b BUG: 1336793 Reported-by: Patrick Matthäi <pmatthaei@debian.org> Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/14398 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kotresh HR <khiremat@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* glusterd: copy real_path from older brickinfo during brick importAtin Mukherjee2016-05-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | In glusterd_import_new_brick () new_brickinfo->real_path will not be populated for the first time and hence if the underlying file system is bad for the same brick, import will fail resulting in inconsistent configuration data. Fix is to populate real_path from old brickinfo object. Also there were many cases where we were unnecessarily calling realpath() and that may cause in failure. For eg - if a remove brick is executed with a brick whoose underlying file system has crashed, remove-brick fails since realpath() call fails. We'd need to call realpath() here as the value is of no use.Hence passing construct_realpath as _gf_false in glusterd_volume_brickinfo_get_by_brick () is a must in such cases. Change-Id: I7ec93871dc9e616f5d565ad5e540b2f1cacaf9dc BUG: 1335531 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: http://review.gluster.org/14306 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaushal M <kaushal@redhat.com>
* glusterd: populate brickinfo->real_path conditionallyAtin Mukherjee2016-04-111-1/+2
| | | | | | | | | | | | | | | | | | | | glusterd_brickinfo_new_from_brick () is called from multiple places and one of them is glusterd_brick_rpc_notify where its very well possible that an underlying brick's file system has crashed and a disconnect event has been received. In this case glusterd tries to build the brickinfo from the brickid in the RPC request, however the same fails as glusterd_brickinfo_new_from_brick () fails from realpath. Fix is to skip populating real_path if its a disconnect event. Change-Id: I9d9149c64a9cf2247abb731f219c1b1eef037960 BUG: 1325841 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: http://review.gluster.org/13965 Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* glusterd: check if glusterd is started on all nodes and allSakshi2016-02-281-0/+37
| | | | | | | | | | | | | bricks are started before performing rebalance Change-Id: I458ea9cd86cf35bdb7d758be55f951ae9f3e66f0 BUG: 1224857 Signed-off-by: Sakshi <sabansal@redhat.com> Reviewed-on: http://review.gluster.org/10906 Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd/rebalance: initialize defrag variable after glusterd restartMohammed Rafi KC2016-02-221-0/+34
| | | | | | | | | | | | | | | | | During reblance restart after glusterd restarted, we are not connecting to rebalance process from glusterd, because the defrag variable in volinfo will be null. Initializing the variable will connect the rpc Change-Id: Id820cad6a3634a9fc976427fbe1c45844d3d4b9b BUG: 1303028 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/13319 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Dan Lambright <dlambrig@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* tier/glusterd: tier daemon not updating the statusMohammed Rafi KC2016-01-031-2/+1
| | | | | | | | | | | | | Tier process is not updating the status when the process killed mnually. Change-Id: Ia5ea903af78ff3582da2242e6058f11c71923fab BUG: 1294600 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/13107 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* Tier: "tier start force" command implementationhari gowtham2015-12-221-32/+79
| | | | | | | | | | | | | | | | The start command doesnt restart the tier deamon if the deamon is running at one node. hence to bring up the tierd on the nodes where the deamon is down, the force command is implemented. It skips the check for tierd running. Change-Id: I0037d3e5ecfe56637d0da201a97903c435d26436 BUG: 1292112 Signed-off-by: hari gowtham <hgowtham@redhat.com> Reviewed-on: http://review.gluster.org/12983 Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* tier/glusterd : making new tier detach command throw warninghari gowtham2015-12-151-20/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | For detach tier, the validation was done using the string "detach-tier" but the new commands used has the string "tier". Making the string use "tier" to compare, creates problem as the tier status and tier detach have the keyword "tier". So tier detach and tier status were separated. and strtok was used to prevent the condition from passing when the volume name has a substring of "tier". (only the second word from the string is got and checked if the feature is tier) Problem: new detach tier command doesnt throw warnings like "not a tier volume" or " detach tier not started" respectively instead it prints empty output. Fix: while validate the volume is checked if its a tiered volume if yes it is checked if the detach tier is started, else a warning is thrown respectively. Change-Id: I94246d53b18ab0e9406beaf459eaddb7c5b766c2 BUG: 1288517 Signed-off-by: hari gowtham <hgowtham@redhat.com> Reviewed-on: http://review.gluster.org/12883 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* tier/glusterd: Check before starting tier daemon during volume startMohammed Rafi KC2015-12-081-0/+9
| | | | | | | | | | | | | | | | | We start tier daemon when volume is started without looking into the previous status. The problem with that if detach-tier is started and then volume force start is actually starting tier daemon. This is also fixes a problem where tier daemon is not starting after detach stop. Change-Id: I15b56a711e12f0e24f5ab123561258bd448621f7 BUG: 1286974 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/12833 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* Tiering: change of error message for "v tier <vname> detach status"hari gowtham2015-11-261-0/+8
| | | | | | | | | | | | | | If the volume was not a tiered volume then empty status was being printed instead of an error message. Change-Id: I13ccb16e1562966976a48d9365ced4c8a124de59 BUG: 1284357 Signed-off-by: hari gowtham <hgowtham@redhat.com> Reviewed-on: http://review.gluster.org/12713 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* Tiering: Adding space between error meassge for detach-tierhari gowtham2015-11-191-1/+1
| | | | | | | | | | | Change-Id: I730cf7fa6fbfb3842d337cd3d7b8394b9c3876d8 BUG: 1283488 Signed-off-by: hari gowtham <hgowtham@redhat.com> Reviewed-on: http://review.gluster.org/12657 Reviewed-by: mohammed rafi kc <rkavunga@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* cluster/tier: Changed tier xattr-name valueN Balachandran2015-10-191-1/+1
| | | | | | | | | | | | | | | Each tier layer (for future stacking implementations) must have a unique xattr name. We are currently using the name of the tier subvolume excluding the volume name. Change-Id: Id4adea61dc1c8473fb1d4d7364d1940278c6e129 BUG: 1259298 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/12350 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* tiering: Error message change for detach-tier non tier volumeHari Gowtham2015-07-291-5/+13
| | | | | | | | | | | Change-Id: Ib350b201df14b105e475426d2ec20ff5da39a8a1 BUG: 1245935 Signed-off-by: Hari Gowtham <hgowtham@redhat.com> Reviewed-on: http://review.gluster.org/11745 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: mohammed rafi kc <rkavunga@redhat.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* glusterd: rebalance support for cluster.rc frameworkanand2015-07-161-1/+20
| | | | | | | | | | | | | | | | | | | Issue:Rebalance is failing in cluster framework (any simulated cluster environment in same node ). RCA: 1. we are passing always "localhost" as volfile server for rebalance xlator . 2. Rebalance daemons are overwriting unix socket and log files each other. (All rebalance processes are creating socket with same name) . Fix: set vol_file_server, unix socket and log files properly. Change-Id: I6654461e00c2a164b2f1f1db24a316c4180dd8d5 BUG: 1231437 Signed-off-by: anand <anekkunt@redhat.com> Reviewed-on: http://review.gluster.org/11210 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd: Porting left out log messages to new frameworkNandaja Varma2015-06-261-3/+8
| | | | | | | | | | | Change-Id: I70d40ae3b5f49a21e1b93f82885cd58fa2723647 BUG: 1235538 Signed-off-by: Nandaja Varma <nandaja.varma@gmail.com> Reviewed-on: http://review.gluster.org/11388 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Anand Nekkunti <anekkunt@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaushal M <kaushal@redhat.com>
* glusterd/tier: configure tier daemon during volume restartMohammed Rafi KC2015-06-191-4/+9
| | | | | | | | | | | | | | rebalance daemon will be running on every tier volume for promoting/demoting the files. When volume/glusterd is restarted, then we need to configure the daemon. Change-Id: Ib565240a70edea2ec8bc1601c52b40c0783491d3 BUG: 1225330 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/10933 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>