summaryrefslogtreecommitdiffstats
path: root/rpc/rpc-transport/socket/src/socket.c
Commit message (Collapse)AuthorAgeFilesLines
* gluster: IPv6 single stack supportKevin Vigor2017-10-241-0/+13
| | | | | | | | | | | | | | | | | | | | | | Summary: - This diff changes all locations in the code to prefer inet6 family instead of inet. This will allow change GlusterFS to operate via IPv6 instead of IPv4 for all internal operations while still being able to serve (FUSE or NFS) clients via IPv4. - The changes apply to NFS as well. - This diff ports D1892990, D1897341 & D1896522 to the 3.8 branch. Test Plan: Prove tests! Reviewers: dph, rwareing Signed-off-by: Shreyas Siravara <sshreyas@fb.com> Change-Id: I34fdaaeb33c194782255625e00616faf75d60c33 BUG: 1406898 Reviewed-on-3.8-fb: http://review.gluster.org/16059 Reviewed-by: Shreyas Siravara <sshreyas@fb.com> Tested-by: Shreyas Siravara <sshreyas@fb.com>
* rpc-transport/socket: tab -> spaces cleanupKaleb S. KEITHLEY2017-09-151-446/+446
| | | | | | | | | | | tired of review comments about use of tabs when the whole file uses tabs Change-Id: I4f822a53f47886da04282f9c3fb84d81a7b3f8d0 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: https://review.gluster.org/18286 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* rpc: TLSv1_2_method() is deprecated in OpenSSL-1.1Kaleb S. KEITHLEY2017-09-131-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fedora 26 has OpenSSL-1.1. Compile-time warnings indicate that TLSv1_2_method() is now deprecated. As per the SSL man page: TLS_method(), TLS_server_method(), TLS_client_method() These are the general-purpose version-flexible SSL/TLS methods. The actual protocol version used will be negotiated to the highest version mutually supported by the client and the server. The supported protocols are SSLv3, TLSv1, TLSv1.1 and TLSv1.2. Applications should use these methods, and avoid the version- specific methods described below. ... TLSv1_2_method(), ... ... Note that OpenSSL-1.1 is the version of OpenSSL; Fedora 25 and RHEL 7.3 and other distributions (still) have OpenSSL-1.0. TLS versions are orthogonal to the OpenSSL version. TLS_method() is the new — in OpenSSL-1.1 — version flexible function intended to replace the TLSv1_2_method() function in OpenSSL-1.0 and the older (?), insecure TLSv23_method(). (OpenSSL-1.0 does not have TLS_method()) Change-Id: I190363ccffe7c25606ea2cf30a6b9ff1ec186057 BUG: 1491025 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: https://review.gluster.org/18268 Reviewed-by: Emmanuel Dreyfus <manu@netbsd.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* socket: Use granular mutex locks during pollin and pollout event processingKrutika Dhananjay2017-09-081-34/+56
| | | | | | | | | | | | | | | | | | | ... instead of one global lock. This is because pollin and pollout processing code operate on mutually exclusive members of socket_private_t. Keeping this in mind, this patch introduces the more granular priv->in_lock and priv->out_lock locks. For pollerr, which modifies both priv->incoming and priv->ioq, both locks need to be taken. Change-Id: Id7aeb608dc7755551b6b404470d5d80709c81960 BUG: 1467614 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: https://review.gluster.org/17687 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: mohammed rafi kc <rkavunga@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* Do not declare the variable timeout_ms if TCP_USER_TIMEOUT is not definedMichael Scherer2017-09-071-0/+2
| | | | | | | | | | | | | Trying to fix the build on FreeBSD, -Wunused-variable trigger this warning. Change-Id: I318121829ac6a609fe9c606aead257827c7748b1 BUG: 1488829 Signed-off-by: Michael Scherer <misc@redhat.com> Reviewed-on: https://review.gluster.org/18215 Smoke: Gluster Build System <jenkins@build.gluster.org> Tested-by: Michael Scherer <misc@fedoraproject.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* libglusterfs: Name threads on creationRaghavendra Talur2017-07-191-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Set names to threads on creation for easier debugging. Output of top -H -p <PID-OF-GLUSTERFSD> Before: 19773 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19774 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19775 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19776 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19777 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19778 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19779 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19780 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19781 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19782 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19783 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19784 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19785 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd 19786 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd 19787 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd 19789 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19790 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 25178 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 5398 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 7881 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd After: 19773 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19774 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustertimer 19775 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd 19776 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustermemsweep 19777 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustersproc0 19778 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustersproc1 19779 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll0 19780 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusteridxwrker 19781 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusteriotwr0 19782 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterbrssign 19783 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterbrswrker 19784 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterclogecon 19785 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd0 19786 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd1 19787 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd2 19789 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixjan 19790 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixfsy 25178 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll1 5398 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll2 7881 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixhc Change-Id: Id5f333755c1ba168a2ffaa4fce6e71c375e10703 BUG: 1254002 Updates: #271 Signed-off-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-on: https://review.gluster.org/11926 Reviewed-by: Prashanth Pai <ppai@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* socket: call init_openssl_mt() in init() and add cleanupPrashanth Pai2017-07-171-25/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | init_openssl_mt() wasn't explicitly invoked and was run implicitly before dlopen() returned as it was tagged as __attribute__ ((constructor)). This function used to call GF_CALLOC() which wasn't available or initialized when socket.so is dlopen()ed by an external program. The program used to crash with SIGSEGV as follows: 0x00007ffff5efe1ad in __gf_calloc (nmemb=41, size=40, type=158, typestr=0x7ffff63eb3d6 "gf_sock_mt_lock_array") at mem-pool.c:109 0x00007ffff63e6acf in init_openssl_mt () at socket.c:4016 0x00007ffff7de90da in call_init.part () from /lib64/ld-linux-x86-64.so.2 0x00007ffff7de91eb in _dl_init () from /lib64/ld-linux-x86-64.so.2 0x00007ffff7dedde1 in dl_open_worker () from /lib64/ld-linux-x86-64.so.2 0x00007ffff7de8f84 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2 0x00007ffff7ded339 in _dl_open () from /lib64/ld-linux-x86-64.so.2 This change moves call to init_openssl_mt() from being a constructor function to the init() function and introduces fini_openssl_mt() which cleans up resources (called in destructor). BUG: 1193929 Change-Id: Iab690897ec34e24c33f6b43f8d8d9f8fd75ac607 Signed-off-by: Prashanth Pai <ppai@redhat.com> Reviewed-on: https://review.gluster.org/17753 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* protocol/server: make listen backlog value as configurableMohammed Rafi KC2017-06-081-10/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | problem: When we call listen from protocol/server, we are giving a hard coded valie of 10 if it is not manually given. With multiplexing, especially when glusterd restarts all clients may try to connect to the server at a time. Which will result in overflowing the queue, and kernel will complain about the errors. Solution: This patch will introduce a volume set command to make backlog value as a configurable. This patch also changes the default values for backlog from 10 to 128. This changes is only applicable for sockets listening from protocol. Example: gluster volume set <volname> transport.listen-backlog 1024 Note: 1 Brick has to be restarted to get this value in effect 2 This changes won't be reflected in glusterd, or other xlators which calls listen. If you need, you have to add this option to the volfile. Change-Id: I0c5a2bbf28b5db612f9979e7560e05dd82b41477 BUG: 1456405 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: https://review.gluster.org/17411 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* socket/reconfigure: reconfigure should be done on new dictMohammed Rafi KC2017-05-291-8/+8
| | | | | | | | | | | | | | In socket reconfigure, reconfigurations are doing with old dict values. It should be with new reconfigured dict values Change-Id: Iac5ad4382fe630806af14c99bb7950a288756a87 BUG: 1456405 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: https://review.gluster.org/17412 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* socket: Avoid flooding of error message in case of SSLMohit Agrawal2017-05-151-1/+10
| | | | | | | | | | | | | | | | | Problem: socket poller is throwing Input/Output messages during volume operation Solution: Update the code in socket_connect function before call socket_spwan. BUG: 1450559 Change-Id: I5f275fe7a4b730b16d7b0a0407e76288b07ceaef Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Reviewed-on: https://review.gluster.org/17280 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Zhou Zhengping <johnzzpcrystal@gmail.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* event/epoll: Add back socket for polling of events immediately afterRaghavendra G2017-05-121-19/+77
| | | | | | | | | | | | | | | | | | | | | | | | | reading the entire rpc message from the wire Currently socket is added back for future events after higher layers (rpc, xlators etc) have processed the message. If message processing involves signficant delay (as in writev replies processed by Erasure Coding), performance takes hit. Hence this patch modifies transport/socket to add back the socket for polling of events immediately after reading the entire rpc message, but before notification to higher layers. credits: Thanks to "Kotresh Hiremath Ravishankar" <khiremat@redhat.com> for assitance in fixing a regression in bitrot caused by this patch. Change-Id: I04b6b9d0b51a1cfb86ecac3c3d87a5f388cf5800 BUG: 1448364 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: https://review.gluster.org/15036 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com>
* rpc: Remove accidental IPV6 changesKaushal M2017-05-051-14/+0
| | | | | | | | | | | | | | They snuck in with the HALO patch (07cc8679c) Change-Id: I8ced6cbb0b49554fc9d348c453d4d5da00f981f6 BUG: 1447953 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: https://review.gluster.org/17174 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com>
* rpc: fix transport add/remove race on port probingMilind Changire2017-05-031-165/+196
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Spurious __gf_free() assertion failures seen all over the place with header->magic being overwritten when running port probing tests with 'nmap' Solution: Fix sequence of: 1. add accept()ed socket connection fd to epoll set 2. add newly created rpc_transport_t object in RPCSVC service list Correct sequence is #2 followed by #1. Reason: Adding new fd returned by accept() to epoll set causes an epoll_wait() to return immediately with a POLLIN event. This races ahead to a readv() which returms with errno:104 (Connection reset by peer) during port probing using 'nmap'. The error is then handled by POLLERR code to remove the new transport object from RPCSVC service list and later unref and destroy the rpc transport object. socket_server_event_handler() then catches up with registering the unref'd/destroyed rpc transport object. This is later manifest as assertion failures in __gf_free() with the header->magic field botched due to invalid address references. All this does not result in a Segmentation Fault since the address space continues to be mapped into the process and pages still being referenced elsewhere. As a further note: This race happens only in accept() codepath. Only in this codepath, the notify will be referring to two transports: 1, listener transport and 2. newly accepted transport All other notify refer to only one transport i.e., the transport/socket on which the event is received. Since epoll is ONE_SHOT another event won't arrive on the same socket till the current event is processed. However, in the accept() codepath, the current event - ACCEPT - and the new event - POLLIN/POLLER - arrive on two different sockets: 1. ACCEPT on listener socket and 2. POLLIN/POLLERR on newly registered socket. Also, note that these two events are handled different thread contexts. Cleanup: Critical section in socket_server_event_handler() has been removed. Instead, an additional ref on new_trans has been used to avoid ref/unref race when notifying RPCSVC. Change-Id: I4417924bc9e6277d24bd1a1c5bcb7445bcb226a3 BUG: 1438966 Signed-off-by: Milind Changire <mchangir@redhat.com> Reviewed-on: https://review.gluster.org/17139 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Oleksandr Natalenko <oleksandr@natalenko.name> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* Halo Replication feature for AFR translatorKevin Vigor2017-05-021-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Halo Geo-replication is a feature which allows Gluster or NFS clients to write locally to their region (as defined by a latency "halo" or threshold if you like), and have their writes asynchronously propagate from their origin to the rest of the cluster. Clients can also write synchronously to the cluster simply by specifying a halo-latency which is very large (e.g. 10seconds) which will include all bricks. In other words, it allows clients to decide at mount time if they desire synchronous or asynchronous IO into a cluster and the cluster can support both of these modes to any number of clients simultaneously. There are a few new volume options due to this feature: halo-shd-latency: The threshold below which self-heal daemons will consider children (bricks) connected. halo-nfsd-latency: The threshold below which NFS daemons will consider children (bricks) connected. halo-latency: The threshold below which all other clients will consider children (bricks) connected. halo-min-replicas: The minimum number of replicas which are to be enforced regardless of latency specified in the above 3 options. If the number of children falls below this threshold the next best (chosen by latency) shall be swapped in. New FUSE mount options: halo-latency & halo-min-replicas: As descripted above. This feature combined with multi-threaded SHD support (D1271745) results in some pretty cool geo-replication possibilities. Operational Notes: - Global consistency is gaurenteed for synchronous clients, this is provided by the existing entry-locking mechanism. - Asynchronous clients on the other hand and merely consistent to their region. Writes & deletes will be protected via entry-locks as usual preventing concurrent writes into files which are undergoing replication. Read operations on the other hand should never block. - Writes are allowed from _any_ region and propagated from the origin to all other regions. The take away from this is care should be taken to ensure multiple writers do not write the same files resulting in a gfid split-brain which will require resolution via split-brain policies (majority, mtime & size). Recommended method for preventing this is using the nfs-auth feature to define which region for each share has RW permissions, tiers not in the origin region should have RO perms. TODO: - Synchronous clients (including the SHD) should choose clients from their own region as preferred sources for reads. Most of the plumbing is in place for this via the child_latency array. - Better GFID split brain handling & better dent type split brain handling (i.e. create a trash can and move the offending files into it). - Tagging in addition to latency as a means of defining which children you wish to synchronously write to Test Plan: - The usual suspects, clang, gcc w/ address sanitizer & valgrind - Prove tests Reviewers: jackl, dph, cjh, meyering Reviewed By: meyering Subscribers: ethanr Differential Revision: https://phabricator.fb.com/D1272053 Tasks: 4117827 Change-Id: I694a9ab429722da538da171ec528406e77b5e6d1 BUG: 1428061 Signed-off-by: Kevin Vigor <kvigor@fb.com> Reviewed-on: http://review.gluster.org/16099 Reviewed-on: https://review.gluster.org/16177 Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* rpc: add options to manage socket keepalive lifespanMilind Changire2017-04-121-42/+93
| | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Default values for handling socket timeouts for brick responses are insufficient for aggressive applications such as databases. Solution: Add 1:1 gluster options for keepalive, keepalive-idle, keepalive-interval and keepalive-timeout as per the socket level options available as per tcp(7) man page. Default values for options are NOT agressive and continue to be values which result in default timeout when only the keep alive option is turned on. These options are Linux specific and will not be applicable to the *BSDs. Change-Id: I2a08ecd949ca8ceb3e090d336ad634341e2dbf14 BUG: 1426059 Signed-off-by: Milind Changire <mchangir@redhat.com> Reviewed-on: https://review.gluster.org/16731 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* rpc: avoid logging success on failureMilind Changire2017-03-071-0/+2
| | | | | | | | | | | | | | | | Avoid logging Success in the event of failure especially when errno has no meaningful value w.r.t. the failure. In this case the errno is set to zero when there's indeed a failure at the RPC level. Change-Id: If2cc81aa1e590023ed22892dacbef7cac213e591 BUG: 1426032 Signed-off-by: Milind Changire <mchangir@redhat.com> Reviewed-on: https://review.gluster.org/16730 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* rpc: log more about socket disconnectsMilind Changire2017-03-011-3/+15
| | | | | | | | | | | | | | | | | Log more about the different paths leading to socket disconnect for ease of debugging. Log via gf_log_callingfn() in __socket_disconnect() at loglevel TRACE if socket connection is being torn down. Change-Id: I1e551c2d685784b5ec747f481179f64d524c0461 BUG: 1426125 Signed-off-by: Milind Changire <mchangir@redhat.com> Reviewed-on: https://review.gluster.org/16732 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* socket: Avoid flooding of SSL messages in case of failure/successMohit Agrawal2017-02-271-7/+7
| | | | | | | | | | | | | | | | | | Problem: Avoid flodding of SSL messages in case of failure/success Solution: 1) Ideally ssl_setup_connection should be call after getting success on connect so update the condition before call socket_spawn in socket_connect. 2) Change the message type to debug in case of success. BUG: 1427018 Change-Id: Icb6101e49304d5fe539609b4afacfb1b50b62f84 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Reviewed-on: https://review.gluster.org/16767 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* Free iobuf after using it, not beforeMichael Scherer2017-02-261-2/+2
| | | | | | | | | | | | | | | Coverity warn of use after free here. I assume that under pressure, this might crash the whole process. Change-Id: I15fb5cfc9b509705e96e4156b739988d816bbef5 BUG: 789278 Signed-off-by: Michael Scherer <misc@redhat.com> Reviewed-on: https://review.gluster.org/16719 Smoke: Gluster Build System <jenkins@build.gluster.org> Tested-by: Michael Scherer <misc@fedoraproject.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* socket: GF_REF_PUT should be called outside lockRajesh Joseph2017-02-061-2/+4
| | | | | | | | | | | | | | | GF_REF_PUT was called inside lock which can call socket_poller_mayday which inturn tries to take the same lock. This can lead to deadlock scenario. BUG: 1410701 Change-Id: Ib3b161bcfeac810bd3593dc04c10ef984f996b17 Signed-off-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-on: https://review.gluster.org/16343 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* socket: retry connect immediately if it failsJeff Darcy2017-02-021-2/+36
| | | | | | | | | | | | | | | | | | | | | | Previously we relied on a complex dance of setting flags, shutting down the socket, tearing stuff down, getting an event, tearing more stuff down, and waiting for a higher-level retry. What we really need, in the case where we're just trying to connect prematurely e.g. to a brick that hasn't fully come up yet, is a simple retry of the connect(2) call. This was discovered by observing failures in ec-new-entry.t with multiplexing enabled, but probably fixes other random failures as well. Change-Id: Ibedb8942060bccc96b02272a333c3002c9b77d4c BUG: 1385758 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: https://review.gluster.org/16510 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* rpc/socket.c : Bonnie++ hangs during rewrites in ganesha + SSLMohit Agrawal2017-02-011-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Bonnie++ rewrite operation hangs in ganesha + SSL environment Solution: Bonnie++ hangs during execution of rewrite operation in ganesha + SSL environment.It was hanged due to blocking on poll call in ssl_do because no POLLOUT event was getting on socket. Socket is not getting POLLOUT event because all other threads are waiting to get lock and lock is not released ssl_do because it is not getting any event on poll.To correct it update the condition in ssl_do as same in getting error SSL_ERROR_WANT_READ. Test: To test the patch followed below procedure 1) Setup 2X2 Ganesha + SSL environment. 2) Run bonnie from 3 nfs client parallely 3) After run "Rewwrite operation" by bonnie it is hanged. 4) After apply the patch it is not hanged. BUG: 1418213 Change-Id: I5985cbbc4cfdac5d287268d791e31c274abc3c8d Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Reviewed-on: https://review.gluster.org/16501 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* refcount: typecast function for calling on freeNiels de Vos2017-01-311-4/+2
| | | | | | | | | | | | | | | | | | | All of the functions called to free the refcounted structure are doing a typecast from (void*) to their own type taht is being free'd. This really is not needed and the refcount interface is made a little simpler without the requirement of typecasting. With this small improvement in the API, all callers are updated too. Change-Id: I32473b6d1799f62861d4b2d78ea30c09e6c80ab1 BUG: 1416889 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: https://review.gluster.org/16471 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* core: run many bricks within one glusterfsd processJeff Darcy2017-01-301-2/+0
| | | | | | | | | | | | | | | | | | | | | | | This patch adds support for multiple brick translator stacks running in a single brick server process. This reduces our per-brick memory usage by approximately 3x, and our appetite for TCP ports even more. It also creates potential to avoid process/thread thrashing, and to improve QoS by scheduling more carefully across the bricks, but realizing that potential will require further work. Multiplexing is controlled by the "cluster.brick-multiplex" global option. By default it's off, and bricks are started in separate processes as before. If multiplexing is enabled, then *compatible* bricks (mostly those with the same transport options) will be started in the same process. Change-Id: I45059454e51d6f4cbb29a4953359c09a408695cb BUG: 1385758 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: https://review.gluster.org/14763 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* socket: socket disconnect should wait for poller thread exitRajesh Joseph2016-12-211-17/+81
| | | | | | | | | | | | | | | | | | | | | When SSL is enabled or if "transport.socket.own-thread" option is set then socket_poller is run as different thread. Currently during disconnect or PARENT_DOWN scenario we don't wait for this thread to terminate. PARENT_DOWN will disconnect the socket layer and cleanup resources used by socket_poller. Therefore before disconnect we should wait for poller thread to exit. Change-Id: I71f984b47d260ffd979102f180a99a0bed29f0d6 BUG: 1404181 Signed-off-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-on: http://review.gluster.org/16141 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kaushal M <kaushal@redhat.com> Reviewed-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* socket: Keepalives should happen on IPv6 as well as IPv4Shreyas Siravara2016-12-161-1/+1
| | | | | | | | | | | | | | | | | Summary: - Check for AF_INET *and* AF_INET6. - This is a cherry-pick of D3057373 to 3.8 Signed-off-by: Shreyas Siravara <sshreyas@fb.com> Change-Id: I53eb79284eddfee6e13821c6570809f575b96769 BUG: 1405478 Reviewed-on: http://review.gluster.org/16167 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Jeff Darcy <jdarcy@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* rpc/socket.c : Modify socket_poller code in case of ENODATA error code.Mohit Agrawal2016-10-231-1/+1
| | | | | | | | | | | | | | | | | | Problem: Continuous warning message(ENODATA) are coming in socket_rwv while SSL is enabled. Solution: To avoid the warning message update one condition in socket_poller loop code before break from loop in case of error returned by poll functions. BUG: 1386450 Change-Id: I19b3a92d4c3ba380738379f5679c1c354f0ab9b1 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Reviewed-on: http://review.gluster.org/15677 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* rpc/socket: Close pipe on disconnectionKaushal M2016-10-231-1/+8
| | | | | | | | | | | | | | | | Encrypted connections create a pipe, which isn't closed when the connection disconnects. This leaks fds, and gluster eventually ends up in a situation with fd starvation which leads to operation failures. Change-Id: I144e1f767cec8c6fc1aa46b00cd234129d2a4adc BUG: 1336371 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/14356 Tested-by: MOHIT AGRAWAL <moagrawa@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* rpc/socket.c : Modify gf_log message in socket_poller code in case of errorMohit Agrawal2016-10-121-3/+6
| | | | | | | | | | | | | | | | | | | Problem: In case of SSL after stopping the volume if client(mount point) is still trying to write the data on socket then it will throw an EIO error on that socket and given this log message is captured at every attempt this would flood the log file. Solution: To reduce the frequency of stored log message use GF_LOG_OCCASIONALLY instead of gf_log. BUG: 1381115 Change-Id: I66151d153c2cbfb017b3ebc4c52162278c0f537c Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Reviewed-on: http://review.gluster.org/15605 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* socket: log the client identifier in ssl connectMohit Agrawal2016-10-051-2/+2
| | | | | | | | | | | | | | | | | | Problem: client identifier is not logged in message in ssl_setup_connection Solutuion: In ssl_setup_connection xl_private is not available in rpc_transport so changed to this peerinfo.identifier. BUG: 1380275 Change-Id: I05006a3d63e46de8c388298c22faa9a3329eb6f3 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Reviewed-on: http://review.gluster.org/15596 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* socket: pollerr event shouldn't trigger socket_connnect_finishAtin Mukherjee2016-09-191-1/+41
| | | | | | | | | | | | | | | | | | | | | | | | If connect fails with any other error than EINPROGRESS we cannot get the error status using getsockopt (... SO_ERROR ... ). Hence we need to remember the state of connect and take appropriate action in the event_handler for the same. As an added note, a event can come where poll_err is HUP and we have poll_in as well (i.e some status was written to the socket), so for such cases we need to finish the connect, process the data and then the poll_err as is the case in the current code. Special thanks to Kaushal M & Raghavendra G for figuring out the issue. Change-Id: Ic45ad59ff8ab1d0a9d2cab2c924ad940b9d38528 BUG: 1372356 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Signed-off-by: Shyam <srangana@redhat.com> Reviewed-on: http://review.gluster.org/15440 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* rpc: fix unused variable warnings/errorsKaleb S. KEITHLEY2016-08-291-2/+0
| | | | | | | | | | | | | | | | http://review.gluster.org/14085 fixes a/the "leak" - via the generated rpc/xdr headers - of pragmas that mask these warnings. However 14085 won't pass the smoke test until all the warnings are fixed. Change-Id: I20d91091bee0bf8f198a307ebba4b284bc3817ff BUG: 1369124 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/15240 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* socket: SSL_shutdown should be called before socket shutdownRajesh Joseph2016-08-161-13/+28
| | | | | | | | | | | | | | SSL_shutdown shuts down an active SSL connection. But we are calling this after underlying socket is disconnected. Change-Id: Ia943179d23395f42b942450dbcf26336d4dfc813 BUG: 1362602 Signed-off-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-on: http://review.gluster.org/15072 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* rpc/socket.c : Modify socket code to avoid fd leak in case of socket_poller ↵Mohit Agrawal2016-07-201-4/+3
| | | | | | | | | | | | | | | thread Update socket code to avoid fd leak in case of socket_poller thread. Change-Id: Ifa9718fbaf065988a299cda7ba0282dfd6b10a32 BUG: 1356888 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Reviewed-on: http://review.gluster.org/14929 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* rpc: invalid argument when function setsockopt sets option TCP_USER_TIMEOUTZhou Zhengping2016-07-101-0/+2
| | | | | | | | | | | | | | | If option "transport.tcp-user-timeout" hasn't been setted, glusterd's priv->timeout will be -1, which will cause invalid argument when set TCP_USER_TIMEOUT. Change-Id: Ibc16264ceac0e69ab4a217ffa27c549b9fa21df9 BUG: 1349657 Signed-off-by: Zhou Zhengping <johnzzpcrystal@gmail.com> Reviewed-on: http://review.gluster.org/14785 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* rpc/socket: pthread resources are not cleaned upN Balachandran2016-07-081-4/+6
| | | | | | | | | | | | | | | | | | A socket_connect failure creates a new pthread which is not a detached thread. As no pthread_join is called, the thread resources are not cleaned up causing a memory leak. Now, socket_connect creates a detached thread to handle failure. Change-Id: Idbf25d312f91464ae20c97d501b628bfdec7cf0c BUG: 1343374 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/14875 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* rpc/socket.c : Modify socket_poller code in case of ENODATA error code.Mohit Agrawal2016-07-081-1/+1
| | | | | | | | | | | | | | | | | | Problem: Polling failure errors are coming till volume is not come while SSL is enabled. Solution: To avoid the message update one condition in socket_poller code It will not exit from thread in case of received ENODATA from ssl_do function. Change-Id: Ia514e99b279b07b372ee950f4368ac0d9c702d82 BUG: 1349709 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Reviewed-on: http://review.gluster.org/14786 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* socket: log the client identifier in ssl connectRaghavendra Bhat2016-06-301-1/+6
| | | | | | | | | | | | Change-Id: I4b463ecafb66de16cbe7ed23fae800bb1204f829 BUG: 1333912 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/14242 Tested-by: Vijay Bellur <vbellur@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org>
* rpc/socket.c: Modify approach to cleanup threads of socket_poller in ↵Mohit Agrawal2016-06-241-141/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | socket_spawn. Problem: Current approach to cleanup threads of socket_poller is not appropriate. Solution: Enable detach flag at the time of thread creation in socket_spawn. Fix: Write a new wrapper(gf_create_detach_thread) to create detachable thread instead of store thread ids in a queue. Test: Fix is verfied on gluster process, To test the patch followed below procedure Enable the client.ssl and server.ssl option on the volume Start the volume and count anon segment in pmap output for glusterd process pmap -x <glusterd-pid> | grep "\[ anon \]" | wc -l Stop the volume and check again count of anon segment it should not increase. Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Change-Id: Ib8f7ec7504ec8f6f74b45ce6719b6fb47f9fdc37 BUG: 1336508 Reviewed-on: http://review.gluster.org/14694 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* socket: Fix incorrect handling of partial readsXavier Hernandez2016-05-111-3/+5
| | | | | | | | | | | | | | | | | | The usage of function local variables in the protocol state machine caused an incorrect behaviour when a partial read from the socket forced the function to return and restart later when more data was available. At this point the local variables contained incorrect data. Change-Id: I4db1f4ef5c46a3d2d7f7c5328e906188c3af49e6 BUG: 1334285 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/14270 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Raghavendra G <rgowdapp@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* glusterd: add defence mechanism to avoid brick port clashesPrasanna Kumar Kalever2016-05-041-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Intro: Currently glusterd maintain the portmap registry which contains ports that are free to use between 49152 - 65535, this registry is initialized once, and updated accordingly as an then when glusterd sees they are been used. Glusterd first checks for a port within the portmap registry and gets a FREE port marked in it, then checks if that port is currently free using a connect() function then passes it to brick process which have to bind on it. Problem: We see that there is a time gap between glusterd checking the port with connect() and brick process actually binding on it. In this time gap it could be so possible that any process would have occupied this port because of which brick will fail to bind and exit. Case 1: To avoid the gluster client process occupying the port supplied by glusterd : we have separated the client port map range with brick port map range more @ http://review.gluster.org/#/c/13998/ Case 2: (Handled by this patch) To avoid the other foreign process occupying the port supplied by glusterd : To handle above situation this patch implements a mechanism to return EADDRINUSE error code to glusterd, upon which a new port is allocated and try to restart the brick process with the newly allocated port. Note: Incase of glusterd restarts i.e. runner_run_nowait() there is no way to handle Case 2, becuase runner_run_nowait() will not wait to get the return/exit code of the executed command (brick process). Hence as of now in such case, we cannot know with what error the brick has failed to connect. This patch also fix the runner_end() to perform some cleanup w.r.t return values. Change-Id: Iec52e7f5d87ce938d173f8ef16aa77fd573f2c5e BUG: 1322805 Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com> Reviewed-on: http://review.gluster.org/14043 Tested-by: Prasanna Kumar Kalever <pkalever@redhat.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* socket: Reap own-threadsKaushal M2016-04-291-0/+121
| | | | | | | | | | | | | | | | Dead own-threads are reaped periodically (currently every minute). This helps avoid memory being leaked, and should help prevent memory starvation issues with GlusterD. Change-Id: Ifb3442a91891b164655bb2aa72210b13cee31599 BUG: 1331289 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/14101 Smoke: Gluster Build System <jenkins@build.gluster.com> Tested-by: Jeff Darcy <jdarcy@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* socket: Don't cleanup encrypted transport in socket_connect()Kaushal M2016-04-071-12/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ..instead cleanup only in socket_poller() With commit d117466 socket_poller() wasn't launched from socket_connect (for encrypted connections), if connect() failed. This was done to prevent the socket private data from being double unreffed, from the cleanups in both socket_poller() and socket_connect(). This allowed future reconnects to happen successfully. If a socket reconnects is sort of decided by the rpc notify function registered. The above change worked with glusterd, as the glusterd rpc notify function (glusterd_peer_rpc_notify()) continuously allowed reconnects on failure. mgmt_rpc_notify(), the rpc notify function in glusterfsd, behaves differently. For a DISCONNECT event, if more volfile servers are available or if more addresses are available in the dns cache, it allows reconnects. If not it terminates the program. For a CONNECT event, it attempts to do a volfile fetch rpc request. If sending this rpc fails, it immediately terminates the program. One side effect of commit d117466, was that the encrypted socket was registered with epoll, unintentionally, on a connect failure. A weird thing happens because of this. The epoll notifier notifies mgmt_rpc_notify() of a CONNECT event, instead of a DISCONNECT as expected. This causes mgmt_rpc_notify() to attempt an unsuccessful volfile fetch rpc request, and terminate. (I still don't know why the epoll raises the CONNECT event) Commit 46bd29e fixed some issues with IPv6 in GlusterFS. This caused address resolution in GlusterFS to also request of IPv6 addresses (AF_UNSPEC) instead of just IPv4. On most systems, this causes the IPv6 addresses to be returned first. GlusterD listens on 0.0.0.0:24007 by default. While this attaches to all interfaces, it only listens on IPv4 addresses. GlusterFS daemons and bricks are given 'localhost' as the volfile server. This resolves to '::1' as the first address. When using management encryption, the above reasons cause the daemon processes to fail to fetch volfiles and terminate. Solution -------- The solution to this is simple. Instead of cleaning up the encrypted socket in socket_connect(), launch socket_poller() and let it cleanup the socket instead. This prevents the unintentional registration with epoll, and socket_poller() sends the correct events to the rpc notify functions, which allows proper reconnects to happen. Change-Id: Idb0c0a828743cccca51cfdd1aa6458cfa0a9d100 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/13926 Smoke: Gluster Build System <jenkins@build.gluster.com> Tested-by: Jeff Darcy <jdarcy@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* socket: Launch socket_poller only if connect succeededKaushal M2016-03-031-1/+1
| | | | | | | | | | | | | | | | | | | | For an encrypted connection, sockect_connect() used to launch socket_poller() in it's own thread (ON by default), even if the connect failed. This would cause two unrefs to be done on the transport, once in socket_poller() and once in socket_connect(), causing the transport to be freed and cleaned up. This would cause further reconnect attempts from failing as the transport wouldn't be available. By starting socket_poller() only if connect succeeded, this is avoided. Change-Id: Ie22090dbb1833bdd0f636a76cb3935d766711917 BUG: 1313206 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/13554 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* socket: reduce rate of readv failure logs due to disconnectKrishnan Parthasarathi2016-02-221-5/+6
| | | | | | | | | | | | | | ... by using GF_LOG_OCCASIONALLY Change-Id: I779ff32ead13c8bb446a57b5baccf068ae992df1 BUG: 1114847 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/8210 Tested-by: Atin Mukherjee <amukherj@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* all: fixes for clang compile warningsKaleb S KEITHLEY2016-02-151-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cli/src/cli-cmd-parser.c (chenk) cli/src/cli-xml-output.c (spandit) cli/src/cli.c (chenk) libglusterfs/src/common-utils.c (vmallika) libglusterfs/src/gfdb/gfdb_sqlite3.c (jfernand +1) rpc/rpc-transport/socket/src/socket.c (?) xlators/cluster/afr/src/afr-transaction.c (?) xlators/cluster/dht/src/dht-common.h (srangana +2) xlators/cluster/dht/src/dht-selfheal.c (srangana +2) xlators/debug/io-stats/src/io-stats.c (R. Wareing) xlators/features/barrier/src/barrier.c (vshastry) xlators/features/bit-rot/src/bitd/bit-rot-scrub.h (vshankar +1) xlators/features/shard/src/shard.c (kdhananj +1) xlators/mgmt/glusterd/src/glusterd-ganesha.c (skoduri) xlators/mgmt/glusterd/src/glusterd-handler.c (atinmu) xlators/mgmt/glusterd/src/glusterd-op-sm.h (atinmu) xlators/mgmt/glusterd/src/glusterd-snapshot.c (spandit) xlators/mgmt/glusterd/src/glusterd-syncop.c (atinmu) xlators/mgmt/glusterd/src/glusterd-volgen.c (atinmu) xlators/protocol/client/src/client-messages.h (mselvaga +1) xlators/storage/bd/src/bd-helper.c (M. Mohan Kumar) xlators/storage/bd/src/bd.c (M. Mohan Kumar) xlators/storage/posix/src/posix.c (nbalacha +1) Change-Id: I85934fbcaf485932136ef3acd206f6ebecde61dd BUG: 1293133 Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/13031 CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* build: fix ecdh.h and dh.h depsMilind Changire2015-11-161-7/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | openssl.ecdh.h and openssl/dh.h are not available on all platforms. This patch adds check to autoconf and updates relevant source files. Add missing #include "config.h" to socket.c to make HAVE_OPENSSL_DH_H and HAVE_OPENSSL_ECDH_H macros available. Definitions for UTIME_OMIT and UTIME_NOW in contrib/qemu/util/oslib-posix.c have been selected from /usr/include/bits/stat.h on Fedora 22 SSL context options SSL_OP_NO_TICKET and SSL_OP_NO_COMPRESSION are now conditionally set by testing their presence. glusterfs.spec.in file now adds CFLAGS=-DUSE_INSECURE_OPENSSL for RHEL < 6 in the %build section. Change-Id: Ie32a950dad77bb0f09b4ba53edb3e1f3147056f3 BUG: 1258883 Signed-off-by: Milind Changire <mchangir@redhat.com> Reviewed-on: http://review.gluster.org/12517 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* core: use syscall wrappers instead of direct syscalls - miscellaneousKaleb S. KEITHLEY2015-10-281-22/+23
| | | | | | | | | | | | | | | various xlators and other components are invoking system calls directly instead of using the libglusterfs/syscall.[ch] wrappers. If not using the system call wrappers there should be a comment in the source explaining why the wrapper isn't used. Change-Id: I1f47820534c890a00b452fa61f7438eb2b3f667c BUG: 1267967 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/12276 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* all: reduce "inline" usageJeff Darcy2015-09-011-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | There are three kinds of inline functions: plain inline, extern inline, and static inline. All three have been removed from .c files, except those in "contrib" which aren't our problem. Inlines in .h files, which are overwhelmingly "static inline" already, have generally been left alone. Over time we should be able to "lower" these into .c files, but that has to be done in a case-by-case fashion requiring more manual effort. This part was easy to do automatically without (as far as I can tell) any ill effect. In the process, several pieces of dead code were flagged by the compiler, and were removed. Change-Id: I56a5e614735c9e0a6ee420dab949eac22e25c155 BUG: 1245331 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/11769 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* SSL improvements: do not fail if certificate purpose is setEmmanuel Dreyfus2015-08-231-0/+8
| | | | | | | | | | | | | | | Since glusterfs shares the same settings for client-side and server-side of SSL, we need to ignore any certificate usage specification (SSL client vs SSL server), otherwise SSL connexions will fail with 'unsupported cerritifcate" BUG: 1247152 Change-Id: I7ef60271718d2d894176515aa530ff106127bceb Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/11840 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>