path: root/rpc
Commit message | Author | Age | Files | Lines
* posix: Implement a janitor thread to close fd | Mohit Agrawal | 2020-08-20 | 1 | -6/+0

    Problem: Commit fb20713b380e1df8d7f9e9df96563be2f9144fd6 switched to a
    synctask to close fds, but that patch was found to reduce performance.

    Solution: Use a janitor thread to close fds. Save the pfd ctx into the
    ctx janitor list, and also save the posix_xlator into the pfd object to
    avoid a race condition during cleanup in a brick_mux environment.

    Change-Id: Ifb3d18a854b267333a3a9e39845bfefb83fbc092
    Fixes: #1396
    Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* glusterd: dump SSL error stack on disconnect | Leonid Ishimnikov | 2020-08-13 | 1 | -0/+7

    Problem: When a non-SSL connection is attempted on an SSL-enabled
    management port, unrelated peers are subsequently disconnected from
    the node with a misleading error message.

    Cause: A non-SSL client causes OpenSSL to push a wrong version error
    into its thread-local error stack. This error is never cleared, so it
    lingers in the stack until the thread is used by another SSL session
    and some condition requires the error stack to be examined. At that
    point the old error is discovered and the connection is terminated.

    Solution: Log and clear the error stack upon terminating the
    connection.

    Change-Id: I82f3a723285df24dafc88850ae4fca65b69f6ae4
    Fixes: #1418
    Signed-off-by: Leonid Ishimnikov <lishim@fastmail.com>
* contrib: remove contrib/sunrpc/xdr_sizeof.c | Yaniv Kaul | 2020-07-23 | 1 | -1/+1

    It's not needed, and it has a license that Fedora is not very happy
    with. Just removed that file.

    Change-Id: Ia753f0058c8a7c6482aca40c3b3dc8f6aa4a266d
    Fixes: #1383
    Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* NetBSD build fixes | Emmanuel Dreyfus | 2020-06-30 | 1 | -0/+4

    - Make sure -largp is used at link time
    - PTHREAD_MUTEX_ADAPTIVE_NP is not available, use
      PTHREAD_MUTEX_DEFAULT instead
    - Avoid non-POSIX [[ ]] in scripts
    - Do not check if lock.spinlock is NULL since it is not a pointer
      (it is not a pointer on Linux either)

    Change-Id: I5e04a7c552d24f8a473c2b837828d1bddfa7e128
    Fixes: #1347
    Type: Bug
    Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
* rpc: fix undefined behaviour in __builtin_ctz | Dmitry Antipov | 2020-06-17 | 2 | -41/+15

    Found with GCC UBsan:

    rpcsvc.c:102:36: runtime error: passing zero to ctz(), which is not a valid argument
        #0 0x7fcd1ff6faa4 in rpcsvc_get_free_queue_index /path/to/glusterfs/rpc/rpc-lib/src/rpcsvc.c:102
        #1 0x7fcd1ff81e12 in rpcsvc_handle_rpc_call /path/to/glusterfs/rpc/rpc-lib/src/rpcsvc.c:837
        #2 0x7fcd1ff833ad in rpcsvc_notify /path/to/glusterfs/rpc/rpc-lib/src/rpcsvc.c:1000
        #3 0x7fcd1ff8829d in rpc_transport_notify /path/to/glusterfs/rpc/rpc-lib/src/rpc-transport.c:520
        #4 0x7fcd0dd72f16 in socket_event_poll_in_async /path/to/glusterfs/rpc/rpc-transport/socket/src/socket.c:2502
        #5 0x7fcd0dd8986a in gf_async ../../../../libglusterfs/src/glusterfs/async.h:189
        #6 0x7fcd0dd8986a in socket_event_poll_in /path/to/glusterfs/rpc/rpc-transport/socket/src/socket.c:2543
        #7 0x7fcd0dd8986a in socket_event_handler /path/to/glusterfs/rpc/rpc-transport/socket/src/socket.c:2934
        #8 0x7fcd0dd8986a in socket_event_handler /path/to/glusterfs/rpc/rpc-transport/socket/src/socket.c:2854
        #9 0x7fcd2048aff7 in event_dispatch_epoll_handler /path/to/glusterfs/libglusterfs/src/event-epoll.c:640
        #10 0x7fcd2048aff7 in event_dispatch_epoll_worker /path/to/glusterfs/libglusterfs/src/event-epoll.c:751
        ...

    Fix, simplify, and prefer 'unsigned long' as underlying bitmap type.

    Change-Id: If3f24dfe7bef8bc7a11a679366e219a73caeb9e4
    Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
    Fixes: #1283
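Illustration (not taken from the commit): the UBsan report above comes from calling a count-trailing-zeros builtin with a zero argument, which GCC and clang leave undefined. A minimal sketch of a guarded free-slot lookup over an 'unsigned long' bitmap could look like the following. The helper names first_free_index() and mark_used() are hypothetical and are not the functions in rpcsvc.c.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper, not the rpcsvc.c code: return the index of the
 * lowest clear bit in the bitmap, or -1 when every bit is already set. */
static int
first_free_index(unsigned long bitmap)
{
    unsigned long free_bits = ~bitmap;

    if (free_bits == 0)
        return -1; /* __builtin_ctzl(0) is undefined, so never call it */

    return __builtin_ctzl(free_bits);
}

/* Hypothetical helper: mark slot idx as used. */
static void
mark_used(unsigned long *bitmap, int idx)
{
    assert(idx >= 0 && idx < (int)(sizeof(*bitmap) * CHAR_BIT));
    *bitmap |= 1UL << idx;
}
```

The only point of the sketch is the explicit zero check before the builtin; how the real code reports a full bitmap is not shown in this log.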
* Indicate timezone offsets in timestamps | Csaba Henk | 2020-06-15 | 1 | -14/+5

    Logs and other output carrying timestamps will now have timezone
    offsets indicated, e.g.:

        [2020-03-12 07:01:05.584482 +0000] I [MSGID: 106143] [glusterd-pmap.c:388:pmap_registry_remove] 0-pmap: removing brick (null) on port 49153

    To this end,

    - gf_time_fmt() now inserts the timezone offset via the %z
      strftime(3) template.
    - A new utility function has been added, gf_time_fmt_tv(), that takes
      a struct timeval pointer (*tv) instead of a time_t value to specify
      the time. If tv->tv_usec is negative,

          gf_time_fmt_tv(... tv ...)

      is equivalent to

          gf_time_fmt(... tv->tv_sec ...)

      Otherwise it also inserts tv->tv_usec into the formatted string.
    - Building timestamps of usec precision has been converted to
      gf_time_fmt_tv, which is necessary because the method of appending
      a period and the usec value to the end of the timestamp does not
      work if the timestamp has a zone offset, but it's also beneficial
      in terms of eliminating repetition.
    - The buffer passed to gf_time_fmt/gf_time_fmt_tv has been unified to
      be of GF_TIMESTR_SIZE size (256). We need slightly larger buffer
      space to accommodate the zone offset and it's preferable to use a
      buffer which is undisputedly large enough.

    This change does *not* do the following:

    - Retaining a method of timestamp creation without timezone offset.
      As to my understanding we don't need such backward compatibility
      as the code just emits timestamps to logs and other diagnostic
      texts, and doesn't do any later processing on them that would rely
      on their format. An exception to this, i.e. a case where a
      timestamp is built for internal use, is graph.c:fill_uuid(). As
      far as I can see, what matters in that case is the uniqueness of
      the produced string, not the format.
    - Implementing a single-token (space free) timestamp format. While
      some timestamp formats used to be single-token, now all of them
      will include a space preceding the offset indicator. Again, I did
      not see a use case where this could be significant in terms of
      representation.
    - Moving the codebase to a single unified timestamp format and
      dropping the fmt argument of gf_time_fmt/gf_time_fmt_tv. While the
      gf_timefmt_FT format is almost ubiquitous, there are a few cases
      where different formats are used. I'm not convinced there is any
      reason to not use gf_timefmt_FT in those cases too, but I did not
      want to make a decision in this regard.

    Change-Id: I0af73ab5d490cca7ed8d07a2ce7ac22a6df2920a
    Updates: #837
    Signed-off-by: Csaba Henk <csaba@redhat.com>
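Illustration (not taken from the commit): the message explains why microseconds cannot simply be appended once the timestamp ends in a zone offset. A minimal sketch of the idea is to format the date and time, then the optional microseconds, then the %z offset. The function name format_tv_utc(), its signature and the fixed UTC zone are assumptions of this sketch; it is not the gf_time_fmt_tv() implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical stand-in for the idea behind gf_time_fmt_tv().
 * Formats tv in UTC (an assumption, chosen to keep output deterministic).
 * Returns the formatted length, or 0 if the buffer is too small. */
static size_t
format_tv_utc(char *buf, size_t len, const struct timeval *tv)
{
    struct tm tm;
    size_t pos;
    size_t n;

    assert(buf != NULL && tv != NULL);

    if (gmtime_r(&tv->tv_sec, &tm) == NULL)
        return 0;

    /* Date and time first */
    pos = strftime(buf, len, "%Y-%m-%d %H:%M:%S", &tm);
    if (pos == 0)
        return 0;

    /* Microseconds go between the seconds and the offset; a negative
     * tv_usec means "no sub-second part", as described in the message */
    if (tv->tv_usec >= 0) {
        int w = snprintf(buf + pos, len - pos, ".%06ld", (long)tv->tv_usec);
        if (w < 0 || (size_t)w >= len - pos)
            return 0;
        pos += (size_t)w;
    }

    /* Zone offset last, via the %z strftime(3) template */
    n = strftime(buf + pos, len - pos, " %z", &tm);
    if (n == 0)
        return 0;

    return pos + n;
}
```

Splitting the formatting into three steps is what lets the microseconds land before the offset rather than after it.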
* rpc, gf_attach: add minimal proper synchronization | Dmitry Antipov | 2020-06-03 | 2 | -0/+4

    Implement minimal proper synchronization between gf_attach and
    underlying RPC layer using convenient POSIX primitives.

    Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
    Fixes: #1260
    Change-Id: Ib5130b586a8b65ed5cf5f9156c111b161570224b
* socket: Use AES128 cipher in SSL if AES is supported by CPU | Mohit Agrawal | 2020-05-26 | 1 | -0/+32

    SSL performance improves when the AES128 cipher is configured, so use
    AES128 as the default cipher on CPUs that have the AES bits enabled;
    otherwise SSL uses the AES256 cipher.

    Change-Id: I91c50fe987cbb22ed76f8012094730c592c63506
    Fixes: #1050
    Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* socket: Resolve ssl_ctx leak for a brick while only mgmt SSL is enabled | Mohit Agrawal | 2020-04-28 | 1 | -2/+2

    Problem: When only mgmt SSL is enabled, the use_ssl flag is false for
    a brick process, and the socket APIs clean up ssl_ctx only when both
    use_ssl and ssl_ctx are valid.

    Solution: To avoid a leak, check only ssl_ctx; if it is valid, clean
    up ssl_ctx.

    Fixes: #1196
    Change-Id: I2f4295478f4149dcb7d608ea78ee5104f28812c3
    Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* test: tests/bugs/rpc/bug-847624.t is crashed | Mohit Agrawal | 2020-04-15 | 2 | -7/+8

    Problem: glusterfs (GNFS) crashes in rpcsvc_drc_client_unref while
    handling a Pollerr event. GNFS crashes because the ref was 0 at the
    time of the unref, while a ref was being taken by a Pollin event that
    was handled successfully.

    Solution: Convert the drc_client ref to an atomic ref to avoid the
    crash.

    Change-Id: Ia4c054f2f388032a5cd99597d0cfa18b003ca690
    Fixes: #1038
    Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
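Illustration (not taken from the commit): a reference counter that is updated by concurrent event handlers needs atomic increments and decrements, and exactly one caller must see the count reach zero. A minimal sketch using C11 atomics follows. The struct and function names are hypothetical; the real drc_client structure and the glusterfs atomic wrappers are not shown in this log.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object standing in for a ref-counted client. */
struct client {
    atomic_int ref;
};

/* The creator holds the first reference. */
static void
client_init(struct client *c)
{
    atomic_init(&c->ref, 1);
}

static void
client_ref(struct client *c)
{
    int old = atomic_fetch_add(&c->ref, 1);

    assert(old > 0); /* taking a ref on a dead object is a bug */
    (void)old;
}

/* Returns true when the caller dropped the last reference and is the
 * one responsible for freeing the object. */
static bool
client_unref(struct client *c)
{
    int old = atomic_fetch_sub(&c->ref, 1);

    assert(old > 0);
    return old == 1;
}
```

Because atomic_fetch_sub returns the previous value, only one thread can observe the transition from 1 to 0, which is the property a plain integer counter lacks.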
* rpc: Make ssl log more useful | Mohit Agrawal | 2020-04-02 | 1 | -17/+22

    Currently, ssl_setup_connection_params emits 4 messages for every rpc
    connection, which irritates users reading the logs. The same
    information can be printed in a single log message together with
    peerinfo, which makes it more useful.

    ssl_setup_connection_params also tries to load dh_param even if the
    user has not configured it, and if a dh_param file is not available
    it emits a failure message. To avoid that message, load dh_param only
    when the user has configured it.

    Change-Id: I9ddb57f86a3fa3e519180cb5d88828e59fe0e487
    Fixes: #1141
    Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* Posix: Use simple approach to close fd | Mohit Agrawal | 2020-03-20 | 1 | -0/+6

    Problem: The posix_release(dir) functions add fds to
    ctx->janitor_fds and the janitor thread closes them. In a brick_mux
    environment it is difficult to handle race conditions in the janitor
    thread because the brick process spawns a single janitor thread for
    all bricks.

    Solution: Use a synctask to execute the posix_release(dir) functions
    instead of using a background thread to close fds.

    Credits: Pranith Karampuri <pkarampu@redhat.com>
    Change-Id: Iffb031f0695a7da83d5a2f6bac8863dad225317e
    Fixes: bz#1811631
    Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* feature/changelog: Avoid thread creation if xlator is not enabled | Mohit Agrawal | 2020-02-09 | 1 | -11/+15

    Problem: Changelog creates threads even if the changelog is not
    enabled.

    Background: The changelog xlator broadly does two things:
    1. Journalling - Consumers are geo-rep and glusterfind
    2. Event notification for registered events (open, release, etc.) -
       Consumers are bitrot and geo-rep

    The existing option "changelog.changelog" controls journalling.
    There is no option to control event notification, which is enabled
    by default. So when bitrot/geo-rep is not enabled on the volume, the
    threads and resources (rpc and rbuf) related to event notification
    consume resources and CPU cycles unnecessarily.

    Solution: The solution is to have two different options as below.
    1. changelog-notification : Event notifications
    2. changelog : Journalling

    This patch introduces the option "changelog-notification", which is
    not exposed to the user. When either bitrot or changelog
    (journalling) is enabled, it internally enables
    'changelog-notification'. But once 'changelog-notification' is
    enabled, it will not be disabled for the lifetime of the brick
    process, even after bitrot and changelog are disabled. As of now,
    rpc resource cleanup has a lot of races and is difficult to do
    cleanly. If allowed, it leads to memory leaks and crashes on
    enable/disable of bitrot or changelog (journal) in a loop. Hence, to
    be safer, event notification is not disabled within the lifetime of
    the process once enabled.

    Change-Id: Ifd00286e0966049e8eb9f21567fe407cf11bb02a
    Updates: #475
    Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* bitrot: Make number of signer threads configurable | Kotresh HR | 2020-02-07 | 1 | -0/+1

    The number of signing process threads (glfs_brpobj) is set to 4 by
    default. The recommendation is to set it to the number of cores
    available. This patch makes it configurable as follows:

        gluster vol bitrot <volname> signer-threads <count>

    fixes: bz#1797869
    Change-Id: Ia883b3e5e34e0bc8d095243508d320c9c9c58adc
    Signed-off-by: Kotresh HR <khiremat@redhat.com>
* name.c: fix Coverity issues 1412332/3 - strcat into uninitialized value | Yaniv Kaul | 2020-01-19 | 1 | -2/+3

    Check limit to 108 bytes before strcpy().

    fixes: CID#1412332
    updates: bz#1193929
    Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
    Change-Id: I8b26b1e1d2daca98ff36db531539bec0a405769c
* socket.c/name.c: minor changes | Yaniv Kaul | 2020-01-13 | 3 | -238/+136

    - Move functions to static
    - Remove redundant checks
    - Use dict_get_...sizen() where applicable
    - Remove unused variables.
    - Moved some code to be executed only if relevant.

    ~3% object size reduction.

    Change-Id: Id9b8414e0a17442f1dac10ba77014d565756c935
    updates: bz#1193929
    Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* rpc-transport: minor changes | Yaniv Kaul | 2020-01-13 | 2 | -61/+32

    - Removed dead code
    - Remove redundant checks
    - Changed dict functions to use dict_..._sizen() functions.

    Change-Id: If00aaa90eef4078effd5b7fed2294f872e001b0a
    updates: bz#1193929
    Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* transport/socket: destroy notify mutex and condition variable | Dmitry Antipov | 2019-12-31 | 1 | -0/+5

    Change-Id: Id74f829dc5c6a30d19e3c3ef42bcb938afc0d8e4
    Updates: bz#1430623
    Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
* socket: fix typos and drop unused members/options | Dmitry Antipov | 2019-12-27 | 2 | -11/+7

    Consistently fix 'configued' -> 'configured' typo, remove useless
    members from 'socket_private_t' and unused
    'transport.socket.lowlat' option. Adjust tests as well.

    Change-Id: I285be196457763aec16b184acd26b90623074dec
    Updates: bz#1193929
    Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
* rpc: define xdr_sizeof regardless of whether IPv6 is the default | Andrew Miloradovsky | 2019-12-13 | 1 | -2/+0

    Change-Id: I4f20f376d82b28e1c572c0fd0b6cd38e97b133da
    Fixes: bz#1780260
* socket: fix error handling | Xavi Hernandez | 2019-12-12 | 1 | -84/+91

    When __socket_proto_state_machine() detected a problem in the size
    of the request or it couldn't allocate an iobuf of the requested
    size, it returned -ENOMEM (-12). However the caller was expecting
    only -1 in case of error. For this reason the error passes
    undetected initially, adding back the socket to the epoll object. On
    further processing, however, the error is finally detected and the
    connection terminated. Meanwhile, another thread could receive a
    poll_in event from the same connection, which could cause races with
    the connection destruction. When this happened, the process crashed.

    To fix this, all error detection conditions have been hardened to be
    more strict on what is valid and what not. Also, we don't return
    -ENOMEM anymore. We always return -1 in case of error.

    An additional change has been done to prevent destruction of the
    transport object while it may still be needed.

    Change-Id: I6e59cd81cbf670f7adfdde942625d4e6c3fbc82d
    Fixes: bz#1782495
    Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* pcsvc: fix subnet_mask_v4 check | Amar Tumballi | 2019-11-27 | 1 | -4/+7

    The subnet mask validation did not perform its checks in the proper
    sequence. The fix corrects the order in which `inet_pton()` is
    called.

    Fixes: #765
    Change-Id: I5d31468eb917aa94cbb85f573b37c60023e9daf3
    Signed-off-by: Amar Tumballi <amar@kadalu.io>
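Illustration (not taken from the commit): the log does not show the corrected code, so the following is only a sketch of one reasonable ordering for validating an IPv4 subnet string. It assumes the "a.b.c.d/N" form with a prefix length of 0 to 32, splits off the address part, validates it with inet_pton() first, and only then parses the prefix. The function name is_valid_subnet_v4() is hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical validator: accepts "a.b.c.d/N" with 0 <= N <= 32. */
static bool
is_valid_subnet_v4(const char *spec)
{
    char addr[INET_ADDRSTRLEN];
    struct in_addr parsed;
    const char *slash;
    char *end = NULL;
    size_t addr_len;
    long prefix;

    assert(spec != NULL);

    slash = strchr(spec, '/');
    if (slash == NULL)
        return false;

    /* Copy only the address part so inet_pton() never sees "/N" */
    addr_len = (size_t)(slash - spec);
    if (addr_len == 0 || addr_len >= sizeof(addr))
        return false;
    memcpy(addr, spec, addr_len);
    addr[addr_len] = '\0';

    /* inet_pton() returns 1 only for a well-formed address */
    if (inet_pton(AF_INET, addr, &parsed) != 1)
        return false;

    /* The prefix must start with a digit and contain nothing else */
    if (!isdigit((unsigned char)slash[1]))
        return false;
    prefix = strtol(slash + 1, &end, 10);
    if (*end != '\0' || prefix < 0 || prefix > 32)
        return false;

    return true;
}
```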
* socket.c: minor changes | Yaniv Kaul | 2019-11-19 | 2 | -105/+77

    1. Remove dead code and declarations
    2. Move some dict functions to use more efficient ones.
    3. Use more constants, where possible.
    4. Align messages - easier to grep the code for them.
    5. Aligned structures and adding padding where needed.

    Change-Id: Ifc2639afe65a935fab5238d3e4a121b662836d3d
    updates: bz#1193929
    Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* rpc: Cleanup SSL specific data at the time of freeing rpc object | l17zhou | 2019-11-08 | 1 | -2/+20

    Problem: When the rpc object is cleaned up, SSL-specific data is not
    freed, so it leaks.

    Solution: To avoid the leak, clean up SSL-specific data when the rpc
    object is cleaned up.

    Credits: l17zhou <cynthia.zhou@nokia-sbell.com.cn>
    Fixes: bz#1768407
    Change-Id: I37f598673ae2d7a33c75f39eb8843ccc6dffaaf0
* rpc: align structs | Yaniv Kaul | 2019-10-17 | 5 | -45/+46

    squash tens of warnings on padding of structs in afr structures.
    The warnings were found by manually added '-Wpadded' to the GCC
    command line.

    Also made relevant structs and definitions static, where it was
    applicable.

    Change-Id: Ib71a7e9c6179378f072d796d11172d086c343e53
    updates: bz#1193929
    Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* rpc: fix missing unref on reconnect | Zhang Huan | 2019-10-02 | 1 | -6/+10

    When a protocol client connects to a brick, it first contacts
    glusterd to get the port, then reconnects to glusterfsd. The
    reconnect cancels the reconnect timer and starts a new one. However,
    cancelling the timer does not unref the rpc that was ref-ed for it,
    which leads to a refcount leak.

    Fix this issue by unref-ing the rpc if the reconnect timer is
    canceled.

    Change-Id: Ice89dcd93cb283a0c7250c369cc8961d52fb2022
    Fixes: bz#1538900
    BUG: 1538900
    Signed-off-by: Zhang Huan <zhanghuan@open-fs.com>
* glusterd, rpc, glusterfsd: fix coverity defects and put required annotations | Atin Mukherjee | 2019-09-10 | 1 | -0/+1

    1404965 - Null pointer dereference
    1404316 - Program hangs
    1401715 - Program hangs
    1401713 - Program hangs

    Updates: bz#789278
    Change-Id: I6e6575daafcb067bc910445f82a9d564f43b75a2
    Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* rpc/xdr: fixes in Makefile | Amar Tumballi | 2019-09-05 | 1 | -1/+6

    There is no need to clean up the .x files.

    Fixes: bz#1743094
    Change-Id: I89d8deb3939c83069709c701cb8f1972e3746168
    Signed-off-by: Amar Tumballi <amarts@gmail.com>
* graph/cleanup: Fix race in graph cleanup | Mohammed Rafi KC | 2019-09-05 | 2 | -5/+5

    We were unconditionally cleaning up the graph when we got child_down
    followed by parent_down. But this is prone to a race condition when
    some of the bricks are already disconnected. In this case, we might
    have freed the graph even before the last child_down is executed in
    the client xlator code, because the child_down event has already
    been received.

    To fix this race, we have introduced a check to see if all client
    xlators have cleared their reconnect chain and called child_down for
    the last time.

    Change-Id: I7d02813bc366dac733a836e0cd7b14a6fac52042
    fixes: bz#1727329
    Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
* rpc: Update address family if it is not provide in cmd-line arguments | Mohit Agrawal | 2019-09-02 | 1 | -1/+12

    Problem: After setting transport-type to inet6 and passing an IPv6
    transport.socket.bind-address in glusterd.vol, clients are not
    started.

    Solution: Update address-family based on remote-address for all
    gluster client processes.

    Change-Id: Iaa3588cd87cebc45231bfd675745c1a457dc9b31
    Fixes: bz#1747746
    Credits: Amgad Saleh <amgad.saleh@nokia.com>
    Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* rpc: glusterd start is failed and throwing an error Address already in use | Mohit Agrawal | 2019-08-18 | 1 | -7/+37

    Problem: Some of the .t tests fail because bind returns the error
    EADDRINUSE.

    Solution: After killing all gluster processes, a .t test tries to
    start glusterd, but if the kernel has not yet cleaned up the
    resources (socket), glusterd startup fails because the bind system
    call fails. To avoid the issue, retry the bind call up to 10 times
    until the system call succeeds.

    Change-Id: Ia5fd6b788f7b211c1508c1b7304fc08a32266629
    Fixes: bz#1743020
    Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
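Illustration (not taken from the commit): a bounded retry around bind() that only retries on EADDRINUSE could be sketched as follows. The wrapper name bind_with_retry(), the one-second pause and the choice to fail immediately on any other errno are assumptions of this sketch; the log only states that bind is retried 10 times.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical wrapper: retry bind() while it fails with EADDRINUSE.
 * Returns 0 on success, -1 with errno set otherwise. */
static int
bind_with_retry(int fd, const struct sockaddr *addr, socklen_t len,
                int max_tries)
{
    /* Assumed pause between attempts; the commit message gives none */
    const struct timespec delay = {.tv_sec = 1, .tv_nsec = 0};
    int i;

    assert(max_tries > 0);

    for (i = 0; i < max_tries; i++) {
        if (bind(fd, addr, len) == 0)
            return 0;

        /* Any other failure will not fix itself, so give up at once */
        if (errno != EADDRINUSE)
            return -1;

        if (i + 1 < max_tries)
            nanosleep(&delay, NULL);
    }

    errno = EADDRINUSE; /* nanosleep() may have changed errno */
    return -1;
}
```

A caller following the commit message would pass 10 as max_tries, for example bind_with_retry(sock, (struct sockaddr *)&sa, sizeof(sa), 10).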
* libglusterfs: remove dependency of rpc | Amar Tumballi | 2019-08-16 | 3 | -66/+44

    Goal: 'libglusterfs' files shouldn't have any dependency outside of
    the tree, specially the header files, shouldn't have '#include' from
    outside the tree.

    Fixes:

    * Had to introduce libglusterd so, methods and structures required
      for only mgmt/glusterd, and cli/ are separated from
      'libglusterfs/'
    * Remove rpc/xdr/gen from build, which was used mainly so dependency
      for libglusterfs could be properly satisfied.
    * Move rpcsvc_auth_data to client_t.h, so all dependencies could be
      handled.

    Updates: bz#1636297
    Change-Id: I0e80243a5a3f4615e6fac6e1b947ad08a9363fce
    Signed-off-by: Amar Tumballi <amarts@redhat.com>
* rpc/transport: have default listen-port | Atin Mukherjee | 2019-08-06 | 1 | -0/+2

    With release-6, the transport.socket.listen-port parameter is
    configurable in glusterd.vol. However, the default value wasn't
    defined in the code, and this breaks backward compatibility: if one
    has a modified glusterd.vol file, then post upgrade the same file
    will be retained and the new changes introduced as part of the
    release wouldn't be available in glusterd.vol. So it's important
    that backward compatibility is guaranteed for each new option
    introduced in the glusterd.vol file.

    Fixes: bz#1737676
    Change-Id: I776b28bff786320cda299fe673d824024dc9803e
    Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* event: rename event_XXX with gf_ prefixed | Xiubo Li | 2019-07-29 | 2 | -26/+26

    I hit a crash when using libgfapi.

    libgfapi calls glfs_poller() --> event_dispatch() in file
    api/src/glfs.c:721, and event_dispatch() is defined locally by
    libglusterfs. The problem is that the name event_dispatch() is
    exactly the same as the one from the libevent package from the OS.

    For example, if an executable program Foo uses and links both
    libevent and libgfapi at the same time, I can hit the crash, like:

    kernel: glfs_glfspoll[68486]: segfault at 1c0 ip 00007fef006fd2b8 sp 00007feeeaffce30 error 4 in libevent-2.0.so.5.1.9[7fef006ed000+46000]

    The link for Foo is:

        lib_foo_LADD = -levent $(GFAPI_LIBS)

    It will crash. This is because glfs_poller() is calling
    event_dispatch() from libevent, not from libglusterfs.

    The gfapi link info:

        GFAPI_LIBS = -lacl -lgfapi -lglusterfs -lgfrpc -lgfxdr -luuid

    If I link Foo like:

        lib_foo_LADD = $(GFAPI_LIBS) -levent

    it works well without any problem.

    And if Foo calls a private lib, such as handler_glfs.so, which links
    GFAPI_LIBS directly while Foo itself does not and instead does
    dlopen(handler_glfs.so), then the crash is hit every time. The link
    info will be:

        foo_LADD = -levent
        libhandler_glfs_LIBADD = $(GFAPI_LIBS)

    I can avoid the crash temporarily by linking GFAPI_LIBS in Foo too,
    like:

        foo_LADD = $(GFAPI_LIBS) -levent
        libhandler_glfs_LIBADD = $(GFAPI_LIBS)

    But this is ugly since Foo won't use any APIs from GFAPI_LIBS. And
    in some cases, when the --as-needed link option is added (on many
    distributions it is added by default), the crash is back again and
    the above workaround won't work.

    Fixes: #699
    Change-Id: I38f0200b941bd1cff4bf3066fca2fc1f9a5263aa
    Signed-off-by: Xiubo Li <xiubli@redhat.com>
* ctime: Set mdata xattr on legacy files | Kotresh HR | 2019-07-22 | 3 | -1/+73

    Problem: Files which were created before ctime was enabled do not
    have the "trusted.glusterfs.mdata" xattr (which stores time
    attributes). Upon fops which modify either ctime or mtime, the xattr
    gets created with the latest ctime, mtime and atime, which is
    incorrect. It should update only the corresponding time attribute
    and take the rest from the backend.

    Solution: Creating the xattr with values from the brick is not
    possible, as each brick of a replica set would have different times.
    So create the xattr upon successful lookup if the xattr is not
    created.

    Note To Reviewers: The time attributes used to set the xattr are
    obtained from a successful lookup. Instead of sending the whole iatt
    over the wire via setxattr, a structure called mdata_iatt is sent.
    The mdata_iatt contains only time attributes.

    Change-Id: I5e535631ddef04195361ae0364336410a2895dd4
    fixes: bz#1593542
    Signed-off-by: Kotresh HR <khiremat@redhat.com>
* ibverbs/rdma: remove from build | Amar Tumballi | 2019-07-13 | 8 | -6126/+1

    We proposed this a year ago, and with the recent smoke failures it
    looks like the right time to make the call.

    ref: https://lists.gluster.org/pipermail/gluster-users/2018-July/034400.html

    With this, glusterfs-8.0 won't have the rdma feature, which makes
    some modularity changes possible in the rpc layer (as we would have
    just 1 transport).

    Updates: bz#1635688
    Change-Id: Ia277dca4d4b1f0cffae20819024a52b075b775e5
    Signed-off-by: Amar Tumballi <amarts@redhat.com>
* rpc/xdr: include nfs specific files in build only if gNFS is enabled | Amar Tumballi | 2019-07-10 | 2 | -10/+23

    updates: bz#1193929
    Change-Id: I2b85fd0a04c77815a154f445ec8fb4da37dcbe40
    Signed-off-by: Amar Tumballi <amarts@redhat.com>
* glusterd/svc: update pid of mux volumes from the shd process | Mohammed Rafi KC | 2019-07-09 | 1 | -0/+1

    For a normal volume, we update the pid from the process itself while
    daemonizing, or at the end of init if it is in no-daemon mode. Along
    with updating the pid we also lock the file, to make sure that the
    process is running fine.

    With brick mux, we were updating the pidfile from glusterd after an
    attach/detach request.

    There are two problems with this approach.
    1) We are not holding a pidlock for any file other than the parent
       process.
    2) There is a chance of race conditions with attach/detach. For
       example, shd start and a volume stop could race. Let's say we are
       starting an shd and it is attached to a volume. While we are
       trying to link the pid file to the running process, it could
       already have been deleted by the thread doing the volume stop.

    Change-Id: I29a00352102877ce09ea3f376ca52affceb5cf1a
    Updates: bz#1722541
    Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
* glusterfs-fops: fix the modularity | Amar Tumballi | 2019-07-02 | 7 | -261/+6

    glusterfs-fops.h was moved to rpc/xdr to support compound fops.
    (ref: https://review.gluster.org/14032, 2f945b86d3)

    This was fine as long as all these header files were in single
    include directory after 'install'. With the move to separate out
    glusterfs specific header files into another directory inside
    /usr/include (ref: https://review.gluster.org/21746, 20ef211cfa),
    glusterfs-fops.h file was not in the proper path when an external .c
    file tried to include any of glusterfs specific .h file (like
    xlator.h).

    Now, we have removed compound-fops, with that, none of the enums
    declared in glusterfs-fops.h are actually getting used on wire
    anymore. Hence, it makes sense to get this to libglusterfs/src as a
    single point of definition. With this change, the external programs
    can use glusterfs header files.

    Also remove some enum definitions which are not used in code
    anymore.

    Updates: bz#1636297
    Change-Id: I423c44d3dbe2efc777299c544ece3cb172fc7e44
    Signed-off-by: Amar Tumballi <amarts@redhat.com>
* core: fedora 30 compiler warnings | SheetalPamecha | 2019-06-18 | 1 | -1/+1

    warning: ‘%s’ directive argument is null [-Wformat-overflow=]

    Change-Id: I69b8d47f0002c58b00d1cc947fac6f1c64e0b295
    updates: bz#1193929
    Signed-off-by: SheetalPamecha <spamecha@redhat.com>
* multiple files: another attempt to remove includes | Yaniv Kaul | 2019-06-14 | 17 | -45/+1

    There are many include statements that are not needed. A previous,
    more ambitious attempt failed because of the *BSD platform (see
    https://review.gluster.org/#/c/glusterfs/+/21929/ )

    Now trying a more conservative reduction. It does not solve all
    circular deps that we have, but it does reduce some of them. There
    is just too much to handle reasonably (dht-common.h includes
    dht-lock.h which includes dht-common.h ...), but it does reduce the
    overall number of include lines we need to look at in the future to
    understand and fix the mess later on.

    Change-Id: I550cd001bdefb8be0fe67632f783c0ef6bee3f9f
    updates: bz#1193929
    Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* across: clang-scan: fix NULL dereferencing warnings | Amar Tumballi | 2019-06-04 | 1 | -5/+3

    All these checks are done after analyzing clang-scan report produced
    by the CI job @ https://build.gluster.org/job/clang-scan

    updates: bz#1622665
    Change-Id: I590305af4ceb779be952974b2a36066ffc4865ca
    Signed-off-by: Amar Tumballi <amarts@redhat.com>
* If bind-address is IPv6 return it successfully | Amgad Saleh | 2019-05-28 | 1 | -6/+11

    Change-Id: Ibd37b6ea82b781a1a266b95f7596874134f30079
    fixes: bz#1713730
    Signed-off-by: Amgad Saleh <amgad.saleh@nokia.com>
* Fix some "Null pointer dereference" coverity issues | Xavi Hernandez | 2019-05-26 | 1 | -0/+4

    This patch fixes the following CID's:

    * 1124829
    * 1274075
    * 1274083
    * 1274128
    * 1274135
    * 1274141
    * 1274143
    * 1274197
    * 1274205
    * 1274210
    * 1274211
    * 1288801
    * 1398629

    Change-Id: Ia7c86cfab3245b20777ffa296e1a59748040f558
    Updates: bz#789278
    Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* Revert "rpc: implement reconnect back-off strategy" | Amar Tumballi | 2019-05-21 | 2 | -18/+16

    This reverts commit 59841f7e1ff0511b04884015441a181a56d07bea.

    This revert is done as a 'possible' fix for frequent regression
    failures, which are also random in nature (i.e., different tests
    fail in different runs).

    Why exactly this patch? Because it seemed the most probable
    candidate among those merged in the last 15 days, after which
    regressions have been failing more often.

    Updates: bz#1711827
    Change-Id: I35333162fcd4064f9609525ca93c666053c6d959
* rpc: implement reconnect back-off strategy | Xavier Hernandez | 2019-05-11 | 2 | -16/+18

    When a connection failure happens, gluster tries to reconnect every
    3 seconds. In some cases the failure is spurious, so a delay of 3
    seconds could be unnecessarily long.

    This patch implements a back-off strategy that tries a reconnect as
    soon as 1 tenth of a second. If this fails, the time is doubled
    until it's around 3 seconds. After that, the reconnect is attempted
    every 3 seconds as before.

    Change-Id: Icb3fbe20d618f50cbbb599dce542b4e871c22149
    Updates: bz#1193929
    Signed-off-by: Xavier Hernandez <xhernandez@redhat.com>
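Illustration (not taken from the commit): the back-off described above starts at a tenth of a second and doubles until it reaches about 3 seconds. A minimal sketch of the delay calculation follows. The function name, the millisecond unit and the hard cap at exactly 3000 ms are assumptions of this sketch; the commit only says "around 3 seconds".

```c
#include <assert.h>

#define RECONNECT_MIN_MS 100U  /* "1 tenth of a second" */
#define RECONNECT_MAX_MS 3000U /* assumed cap for "around 3 seconds" */

/* Hypothetical helper: given the delay used for the attempt that just
 * failed, return the delay to wait before the next attempt. */
static unsigned int
next_reconnect_delay_ms(unsigned int current_ms)
{
    unsigned int next;

    assert(current_ms <= RECONNECT_MAX_MS);

    /* First failure: start from the minimum delay */
    if (current_ms < RECONNECT_MIN_MS)
        return RECONNECT_MIN_MS;

    /* Double the delay, but never exceed the cap */
    next = current_ms * 2;
    return next > RECONNECT_MAX_MS ? RECONNECT_MAX_MS : next;
}
```

Starting from 0, repeated calls yield 100, 200, 400, 800, 1600 and then 3000 ms for every later attempt.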
* protocol: remove compound fop | Amar Tumballi | 2019-04-29 | 2 | -240/+0

    Compound fops were kept on the wire for backward compatibility with
    older AFR modules. The AFR modules used in releases beyond 4.x do
    not use compound fops. Hence remove the compound fop from the
    protocol code.

    Note that compound-fops was already an 'option' in AFR, and has been
    completely removed since the 4.1.x releases.

    So, the point to note is that with this change we have 2 ways to
    upgrade when clients of the 3.x series are present.

    i) Set the 'use-compound-fops' option to 'false' on any volume of
       replica type, and then upgrade the servers.

    ii) Do a two-step upgrade: first from the current version (which
        will already be EOL if it's using compound) to a 4.1..6.x
        version, and then an upgrade to 7.x.

    Considering that the amount of code we are removing for this option
    is quite large, I believe it is worth it.

    updates: bz#1693692
    Signed-off-by: Amar Tumballi <amarts@redhat.com>
    Change-Id: I0a8876d0367a15e1410ec845f251d5d3097ee593
* rpclib: slow floating point math and libm | Kaleb S. KEITHLEY | 2019-04-03 | 2 | -9/+3

    In release-6 rpc/rpc-lib (libgfrpc) added the function
    get_rightmost_set_bit() which calls log2(3), a call that takes a
    floating point parameter.

    It's used thusly:

        right_most_unset_bit = get_rightmost_set_bit(...);

    (So is it really the right-most unset bit, or the right-most set
    bit?)

    It's unclear to me whether this is in the data path or not. If it
    is, it's rather scary to think about integer-to-float conversions
    and slow calls to libm functions in the data path.

    gcc and clang have __builtin_ctz() which returns the same result as
    get_rightmost_set_bit(), and does it substantially faster. Approx
    20M iterations of get_rightmost_set_bit() took ~33sec of wall clock
    time on my devel machine, while 20M iterations of __builtin_ctz()
    took < 9sec; get_rightmost_set_bit() is 3x slower than
    __builtin_ctz().

    And as a side benefit, we can again eliminate the need to link
    libgfrpc with libm.

    Change-Id: If9e7e80874577c52223f8125b385fc930de20699
    updates: bz#1193929
    Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
* transport/socket: log shutdown msg occasionally | Raghavendra G | 2019-04-03 | 2 | -2/+3

    Change-Id: If3fc0884e7e2f45de2d278b98693b7a473220a5f
    Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
    Fixes: bz#1691616
* mgmt/shd: Implement multiplexing in self heal daemon | Mohammed Rafi KC | 2019-04-01 | 1 | -0/+2

    Problem:

    The shd daemon is per node, which means it creates a graph with all
    volumes on it. While this is great for utilizing resources, it is
    not so good in terms of performance and manageability.

    Self-heal daemons don't have the capability to automatically
    reconfigure their graphs. So each time any configuration change
    happens to the volumes (replicate/disperse), we need to restart shd
    to bring the changes into the graph.

    Because of this, all ongoing heals for all other volumes have to be
    stopped in the middle and restarted all over again.

    Solution:

    This change makes shd a per-volume daemon, so that a graph is
    generated for each volume.

    When we want to start/reconfigure shd for a volume, we first search
    for an existing shd running on the node. If there is none, we start
    a new process. If a daemon is already running for shd, then we
    simply detach the graph for the volume and reattach the updated
    graph for the volume. This won't touch any of the ongoing operations
    for any other volumes on the shd daemon.

    Example of an shd graph when it is per volume

                           graph
                   -----------------------
                   |     debug-iostat    |
                   -----------------------
                    /        |         \
                   /         |          \
              ---------   ---------   ---------
              | AFR-1 |   | AFR-2 |   | AFR-3 |
              ---------   ---------   ---------

    A running shd daemon with 3 volumes will be like-->

                           graph
                   -----------------------
                   |     debug-iostat    |
                   -----------------------
                    /        |         \
                   /         |          \
          ------------   ------------   ------------
          | volume-1 |   | volume-2 |   | volume-3 |
          ------------   ------------   ------------

    Change-Id: Idcb2698be3eeb95beaac47125565c93370afbd99
    fixes: bz#1659708
    Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>