glusterfs.git -

	Commit message	Author	Age	Files	Lines
*	build: when building with tirpc, link with libtirpc	Kaleb S. KEITHLEY	2020-09-24	1	-0/+1
	Uncovered on Ubuntu Groovy (20.10, Ubuntu's bleeding-edge development distribution), which now seems to have stricter link semantics than it did when we last built 8.1 and 7.5. Many xlators actually do have direct calls to xdr_sizeof(), so strictly speaking they should be linked with libtirpc. Change-Id: Iee1fd3528fde19db397c4eae6978d9b9a2c3e17f Updates: #1002 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
*	glusterfsd, libglusterfs, rpc: prefer libglusterfs time API	Dmitry Antipov	2020-09-07	1	-2/+2
	Use timespec_now_realtime() rather than clock_gettime(). Change-Id: I8fa00b7c0f7b388305c7d19574be3b409db68558 Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Updates: #1002
*	libglusterfs: add functions to calculate time difference	Dmitry Antipov	2020-08-21	1	-2/+2
	Add gf_tvdiff() and gf_tsdiff() to calculate the difference between 'struct timeval' and 'struct timespec' values, and use them where appropriate. Change-Id: I172be06ee84e99a1da76847c15e5ea3fbc059338 Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Updates: #1002
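The kind of helper this commit describes can be sketched as follows. This is a minimal illustration, not the libglusterfs implementation: the names `sketch_tvdiff_us` and `sketch_tsdiff_ns`, the signed 64-bit return type, and the choice of microsecond and nanosecond units are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical helper (name and units are assumptions): microseconds
 * elapsed from *start to *end. The result is negative if end precedes
 * start. */
static inline int64_t
sketch_tvdiff_us(const struct timeval *start, const struct timeval *end)
{
    assert(start != NULL && end != NULL);
    int64_t sec = (int64_t)end->tv_sec - (int64_t)start->tv_sec;
    int64_t usec = (int64_t)end->tv_usec - (int64_t)start->tv_usec;
    return sec * 1000000LL + usec;
}

/* Hypothetical helper (name and units are assumptions): nanoseconds
 * elapsed from *start to *end. */
static inline int64_t
sketch_tsdiff_ns(const struct timespec *start, const struct timespec *end)
{
    assert(start != NULL && end != NULL);
    int64_t sec = (int64_t)end->tv_sec - (int64_t)start->tv_sec;
    int64_t nsec = (int64_t)end->tv_nsec - (int64_t)start->tv_nsec;
    return sec * 1000000000LL + nsec;
}
```

Computing the seconds and sub-second parts separately in a wide signed type avoids the borrow handling that open-coded subtractions tend to repeat at each call site, which is presumably the duplication such helpers remove.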
*	posix: Implement a janitor thread to close fd	Mohit Agrawal	2020-08-20	1	-0/+4
	Problem: In commit fb20713b380e1df8d7f9e9df96563be2f9144fd6 we used a synctask to close fds, but we found that the patch reduces performance. Solution: Use a janitor thread to close fds, save the pfd ctx into the ctx janitor list, and also save the posix_xlator into the pfd object to avoid a race condition during cleanup in a brick_mux environment. Change-Id: Ifb3d18a854b267333a3a9e39845bfefb83fbc092 Fixes: #1396 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
*	libglusterfs: add library wrapper for time()	Dmitry Antipov	2020-08-17	1	-2/+2
	Add a thin convenience library wrapper, gf_time(), and adjust related users and comments as well. Change-Id: If8969af2f45ee69c30c3406bce5baa8305fb7f80 Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Updates: #1002
*	glusterfsd - fixing a coverity issue	Barak Sason Rofman	2020-07-06	1	-0/+1
	Fixing a resource leak issue. Change-Id: I6c75de02887ddd59f7edfd65ebeca9d9629c6f1f CID: 1430129 updates: #1202 Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
*	build: Pass $(LIB_DL) using prog_LDADD or lib_LIBADD	Anoop C S	2020-07-02	1	-2/+2
	The "Program and Library Variables" section of the Automake manual suggests the following: . . . _LDADD and _LIBADD are inappropriate for passing program-specific linker flags (except for -l, -L, -dlopen and -dlpreopen). Use the _LDFLAGS variable for this purpose. . . . Therefore it is reasonable to move the $(LIB_DL) addition from _LDFLAGS to the _LDADD and _LIBADD variables for program and library respectively. Change-Id: Id8b4734c207ab28a08bcce683d316cdc7acb0bcd Updates: #1000 Signed-off-by: Anoop C S <anoopcs@redhat.com>
*	Indicate timezone offsets in timestamps	Csaba Henk	2020-06-15	1	-1/+1
	Logs and other output carrying timestamps will now have timezone offsets indicated, e.g.:
		[2020-03-12 07:01:05.584482 +0000] I [MSGID: 106143] [glusterd-pmap.c:388:pmap_registry_remove] 0-pmap: removing brick (null) on port 49153
	To this end:
	- gf_time_fmt() now inserts the timezone offset via the %z strftime(3) template.
	- A new utility function has been added, gf_time_fmt_tv(), that takes a struct timeval pointer (tv) instead of a time_t value to specify the time. If tv->tv_usec is negative, gf_time_fmt_tv(... tv ...) is equivalent to gf_time_fmt(... tv->tv_sec ...). Otherwise it also inserts tv->tv_usec into the formatted string.
	- Building timestamps of usec precision has been converted to gf_time_fmt_tv, which is necessary because the method of appending a period and the usec value to the end of the timestamp does not work if the timestamp has a zone offset, but it's also beneficial in terms of eliminating repetition.
	- The buffer passed to gf_time_fmt/gf_time_fmt_tv has been unified to be of GF_TIMESTR_SIZE size (256). We need slightly larger buffer space to accommodate the zone offset, and it's preferable to use a buffer which is undisputedly large enough.
	This change does *not* do the following:
	- Retaining a method of timestamp creation without timezone offset. As to my understanding we don't need such backward compatibility, as the code just emits timestamps to logs and other diagnostic texts, and doesn't do any later processing on them that would rely on their format. An exception to this, i.e. a case where a timestamp is built for internal use, is graph.c:fill_uuid(). As far as I can see, what matters in that case is the uniqueness of the produced string, not the format.
	- Implementing a single-token (space-free) timestamp format. While some timestamp formats used to be single-token, now all of them will include a space preceding the offset indicator. Again, I did not see a use case where this could be significant in terms of representation.
	- Moving the codebase to a single unified timestamp format and dropping the fmt argument of gf_time_fmt/gf_time_fmt_tv. While the gf_timefmt_FT format is almost ubiquitous, there are a few cases where different formats are used. I'm not convinced there is any reason not to use gf_timefmt_FT in those cases too, but I did not want to make a decision in this regard.
	Change-Id: I0af73ab5d490cca7ed8d07a2ce7ac22a6df2920a Updates: #837 Signed-off-by: Csaba Henk <csaba@redhat.com>
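The behaviour described above (microseconds placed before the zone offset, and omitted when tv_usec is negative) can be sketched roughly as below. This is not the actual gf_time_fmt_tv(): the function name `sketch_time_fmt_tv`, its signature, the fixed `%Y-%m-%d %H:%M:%S` layout, and the `SKETCH_TIMESTR_SIZE` macro are assumptions for illustration; only the 256-byte size and the use of `%z` come from the commit message.

```c
#define _POSIX_C_SOURCE 200809L /* for localtime_r() */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* Assumed name; the value mirrors the 256-byte size quoted in the message. */
#define SKETCH_TIMESTR_SIZE 256

/* Hypothetical stand-in for the described behaviour: format *tv as
 * "YYYY-MM-DD HH:MM:SS[.uuuuuu] +ZZZZ" in local time. The fractional
 * part is emitted only when tv->tv_usec is non-negative. On any failure
 * the buffer holds whatever was safely formatted so far. */
static inline char *
sketch_time_fmt_tv(char *dst, size_t dst_len, const struct timeval *tv)
{
    struct tm tm;
    time_t sec;
    size_t n;

    assert(dst != NULL && tv != NULL && dst_len > 0);
    dst[0] = '\0';

    sec = tv->tv_sec;
    if (localtime_r(&sec, &tm) == NULL)
        return dst;

    n = strftime(dst, dst_len, "%Y-%m-%d %H:%M:%S", &tm);
    if (n == 0) {
        dst[0] = '\0';
        return dst;
    }

    if (tv->tv_usec >= 0) {
        /* The usec value must go before the offset, which is why simply
         * appending ".usec" to a finished timestamp no longer works. */
        int w = snprintf(dst + n, dst_len - n, ".%06ld", (long)tv->tv_usec);
        if (w < 0 || (size_t)w >= dst_len - n)
            return dst;
        n += (size_t)w;
    }

    /* %z appends the numeric offset from UTC, e.g. +0000. */
    if (strftime(dst + n, dst_len - n, " %z", &tm) == 0)
        dst[n] = '\0';
    return dst;
}
```

A caller would declare `char buf[SKETCH_TIMESTR_SIZE];` and pass `sizeof(buf)`; using one generously sized buffer everywhere matches the rationale the message gives for unifying on a single size.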
*	rpc, gf_attach: add minimal proper synchronization	Dmitry Antipov	2020-06-03	1	-8/+37
	Implement minimal proper synchronization between gf_attach and underlying RPC layer using convenient POSIX primitives. Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Fixes: #1260 Change-Id: Ib5130b586a8b65ed5cf5f9156c111b161570224b
*	core, cli, quota: cleanup malloc debugging and stats	Dmitry Antipov	2020-05-04	1	-6/+0
	1. Since mcheck()/mprobe() etc. features are no longer used, mcheck.h isn't required to be included. 2. Since mallinfo() is used to obtain malloc statistics, it should be detected instead of malloc_stats(). Change-Id: I54c7d2ee568e06ab29938efc01d1a2153c5bd5db Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Fixes: #1172
*	glusterfsd: structure logging	yatipadia	2020-02-06	3	-124/+132
	Convert gf_msg() to gf_smsg(). Change-Id: I1cd6a5ac6f4361195d5d925efb2cc194045d0bba Updates: #657 Signed-off-by: yatip <ypadia@redhat.com>
*	multiple xlators: reduce key length	Yaniv Kaul	2020-01-14	1	-5/+6
	In many cases, we were freely allocating long keys with no need. Smaller char arrays are just fine almost anywhere, so I went ahead and looked for places where we can use smaller ones. In some cases, I annotated the functions as static and the prefixes passed as const, as it was easier to read and understand. Where relevant, converted the dict functions to use a known key length. Change-Id: I882ab33ea20d90b63278336cd1370c09ffdab7f2 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
*	mgmt/brick-mux: Avoid sending two response when attach is failed.	Mohammed Rafi KC	2019-12-31	1	-1/+9
	We were sending two responses back to glusterd when an attach failed: one from the handler function glusterfs_handle_attach and another from rpcsvc_check_and_reply_error. This was causing problems such as ref leaks and transport disconnects. Change-Id: I3bb5b59959530760b568d52becb519499b3dcd2b updates: bz#1785143 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	glusterd: refactoring long method	Barak Sason Rofman	2019-12-19	3	-325/+94
	- Refactored set_fuse_mount_options(...) in order to shorten it. - Removed dead code and moved some methods to their appropriate locations. - Converted logging in set_fuse_mount_options(...) to structured logs. fixes: bz#1768896 Change-Id: If865833d4c60d517da202871978691ef21235fe4 Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
*	[RFC]#ifdef gNFS related code if we are not compiling gNFS	Yaniv Kaul	2019-12-18	1	-3/+5
	If we are not compiling gNFS (--enable-gnfs is not given in the ./configure script params), there is little point in compiling code that is related to it. This patch tries to eliminate it. My hope (and it's not clear from the code) is that I did not break the NFS Ganesha support as well. Other than that, I tried to compile with and without, and it looks sane. Change-Id: I8d6c98066b9fceab4ec10fc6f5e81ab069e853bd updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
*	glusterfsd.c: remove sys_lstat() call	Yaniv Kaul	2019-11-27	1	-9/+0
	get_volfp() in glfs.c doesn't use it, so get_volfp() in glusterfsd.c can just open the file without the stat call as well, IMHO. Change-Id: I3cb5bf12a09b5be42aa2ee4f432f8d351eee5b9e updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
*	glusterfsd-mgmt.c: move INFO log outside a LOCK	Yaniv Kaul	2019-11-19	1	-28/+27
	In glusterfs_handle_attach() we can: 1. Move an INFO level log to be executed before the LOCK. 2. Skip the LOCK altogether, if there's no active graph. I hope it's safe - I've seen that in other functions you could look at ctx->active outside of a lock. Change-Id: I3e1ec5b1430d5fddee46883d468ff4f5bd6ca54b updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
*	[WIP]gluster-volgen.c: remove more of JBR and FDL xlators	Yaniv Kaul	2019-11-13	1	-5/+0
	The JBR and FDL experimental xlators were apparently removed. Removed additional leftovers scattered in the code. Change-Id: I78b6fa5fd9044dc48cdcb1fb094b8c267c2d1323 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
*	gluster: check ctx->active	Xie Changlong	2019-11-12	1	-0/+10
	Avoid processing "TRANSLATOR INFO" and "BARRIER" requests if the graph is not ready; also see commit ee630e25. Updates: bz#1769712 Signed-off-by: Xie Changlong <xiechanglong@cmss.chinamobile.com> Change-Id: Ibd446a35962206d3689667cda7e6712d72e4ec2f
*	glusterd: Client Handling of Elastic Clusters	Mohit Agrawal	2019-11-12	1	-2/+4
	Configure the list of gluster servers in the key GLUSTERD_BRICK_SERVERS at the time of the GETSPEC RPC call, and access the value on the client side to update the volfile server list, so that the client is able to connect to the next volfile server if the current volfile server is down. Updates #741 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Change-Id: I23f36ddb92982bb02ffd83937a8bd8a2c97e8104
*	glusterfsd-mgmt: unify read and write tests	Yaniv Kaul	2019-11-07	2	-148/+63
	1. Both read and write tests required writing first: either just writing (write test) or writing and then reading (read test). So the code is now unified. 2. There's no reason to read zeros from /dev/zero. Just use a CALLOC'ed buffer. I don't think we should read and write zeros, but I did not change the code yet (I think compression and/or dedup will offset results). It appears neither read-perf nor write-perf was tested, so added basic tests for them. Change-Id: I24b1f249fa0335ed652a8982e99c0687d940230e updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
*	rpc: align structs	Yaniv Kaul	2019-10-17	1	-44/+42
	Squash tens of warnings on padding of structs in afr structures. The warnings were found by manually adding '-Wpadded' to the GCC command line. Also made relevant structs and definitions static, where applicable. Change-Id: Ib71a7e9c6179378f072d796d11172d086c343e53 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
*	glusterfsd: fix unused value coverity issues	yatipadia	2019-10-15	1	-11/+0
	This patch addresses CID-1398624 and CID-1398631, and removes the unused variable brick_name. Change-Id: I4f40bd76cb4c94b28589c2333e29d4623da339d0 Updates: bz#789278 Signed-off-by: yati <ypadia@redhat.com>
*	glusterfs/fuse: Reduce the default lru-limit value	N Balachandran	2019-09-24	1	-1/+1
	The current lru-limit value still uses memory for up to 128K inodes. Reduce the default value of lru-limit to 64K. Change-Id: Ica2dd4f8f5fde45cb5180d8f02c3d86114ac52b3 Fixes: bz#1753880 Signed-off-by: N Balachandran <nbalacha@redhat.com>
*	glusterd, rpc, glusterfsd: fix coverity defects and put required annotations	Atin Mukherjee	2019-09-10	1	-0/+1
	1404965 - Null pointer dereference 1404316 - Program hangs 1401715 - Program hangs 1401713 - Program hangs Updates: bz#789278 Change-Id: I6e6575daafcb067bc910445f82a9d564f43b75a2 Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
*	mount.glusterfs: make fcache-keep-open option take a value	Philip Spencer	2019-08-16	1	-1/+1
	Fixes: bz#1158130 Change-Id: Ifdeaed7c9fbe85f7ce421f7c89cbe7265e45f77c Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	fuse: Set limit on invalidate queue size	N Balachandran	2019-08-14	2	-0/+22
	If the glusterfs fuse client process is unable to process the invalidate requests quickly enough, the number of such requests quickly grows large enough to use a significant amount of memory. We are now introducing another option to set an upper limit on these to prevent runaway memory usage. Change-Id: Iddfff1ee2de1466223e6717f7abd4b28ed947788 Fixes: bz#1732717 Signed-off-by: N Balachandran <nbalacha@redhat.com>
*	fuse: rate limit reading from fuse device upon receiving EPERM	Csaba Henk	2019-08-08	2	-1/+30
	Fixes: bz#1644322 Change-Id: I53e8fa362cd8c7d04fb1c4abb606a9abb642c592 Signed-off-by: Csaba Henk <csaba@redhat.com>
*	graph/shd: attach volfile even if ctx->active is NULL	Mohammed Rafi KC	2019-08-05	1	-11/+4
	When we received a graph attach request while ctx->active was not set, we used to fail, assuming that the initialization had not yet completed for the process start. Since the management connection is established, the process will receive attach requests even when ctx->active is NULL. Change-Id: Ied4d1ac63e6d4ced4a9405a78e1ce39f81dfd437 fixes: bz#1727256 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	event: rename event_XXX with gf_ prefixed	Xiubo Li	2019-07-29	1	-3/+3
	I hit a crash when using libgfapi. libgfapi calls glfs_poller() --> event_dispatch() in file api/src/glfs.c:721, and event_dispatch() is defined locally by libgluster. The problem is that the name event_dispatch() is exactly the same as the one from the libevent package from the OS.
	For example, if an executable program Foo uses and links both libevent and libgfapi at the same time, I can hit the crash, like:
		kernel: glfs_glfspoll[68486]: segfault at 1c0 ip 00007fef006fd2b8 sp 00007feeeaffce30 error 4 in libevent-2.0.so.5.1.9[7fef006ed000+46000]
	The link for Foo is:
		lib_foo_LADD = -levent $(GFAPI_LIBS)
	It will crash. This is because glfs_poller() is calling the event_dispatch() from libevent, not the one from libgluster.
	The gfapi link info:
		GFAPI_LIBS = -lacl -lgfapi -lglusterfs -lgfrpc -lgfxdr -luuid
	If I link Foo like:
		lib_foo_LADD = $(GFAPI_LIBS) -levent
	it works well without any problem.
	And if Foo calls a private lib, such as handler_glfs.so, where handler_glfs.so links GFAPI_LIBS directly while Foo does not and instead does dlopen(handler_glfs.so), then the crash will be hit every time. The link info will be:
		foo_LADD = -levent
		libhandler_glfs_LIBADD = $(GFAPI_LIBS)
	I can avoid the crash temporarily by linking GFAPI_LIBS in Foo too, like:
		foo_LADD = $(GFAPI_LIBS) -levent
		libhandler_glfs_LIBADD = $(GFAPI_LIBS)
	But this is ugly since Foo won't use any APIs from GFAPI_LIBS. And in some cases, when the --as-needed link option is added (on many dists it is added by default), the crash is back again and the above workaround won't work.
	Fixes: #699 Change-Id: I38f0200b941bd1cff4bf3066fca2fc1f9a5263aa Signed-off-by: Xiubo Li <xiubli@redhat.com>
*	cli: defer create_frame() (and dict creation) to later stages.	Yaniv Kaul	2019-07-16	1	-4/+6
	Where possible, defer create_frame(): for example, until after command line verification. Change-Id: Id6606e90e7ea6190f30b225c4733b229c519bb2f updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
*	Fix spelling errors	Aravinda VK	2019-07-14	1	-1/+1
	Fixes: bz#1728554 Change-Id: I88357aed7c14988a12616035c3738c32c09a8f9a Signed-off-by: Patrick Matthäi <pmatthaei@debian.org> Signed-off-by: Aravinda VK <avishwan@redhat.com>
*	glusterd/svc: update pid of mux volumes from the shd process	Mohammed Rafi KC	2019-07-09	2	-9/+59
	For a normal volume, we update the pid from the process itself while we daemonize, or at the end of init if it is in no-daemon mode. Along with updating the pid we also lock the file, to make sure that the process is running fine. With brick mux, we were updating the pidfile from glusterd after an attach/detach request. There are two problems with this approach. 1) We are not holding a pidlock for any file other than that of the parent process. 2) There is a chance of race conditions with attach/detach. For example, an shd start and a volume stop could race. Let's say we are starting an shd and it is attached to a volume. While we are trying to link the pid file to the running process, the file could have been deleted by the thread that is doing the volume stop. Change-Id: I29a00352102877ce09ea3f376ca52affceb5cf1a Updates: bz#1722541 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	core: fedora 30 compiler warnings	SheetalPamecha	2019-06-18	1	-3/+1
	warning: ‘%s’ directive argument is null [-Wformat-overflow=] Change-Id: I69b8d47f0002c58b00d1cc947fac6f1c64e0b295 updates: bz#1193929 Signed-off-by: SheetalPamecha <spamecha@redhat.com>
*	multiple files: another attempt to remove includes	Yaniv Kaul	2019-06-14	1	-2/+0
	There are many include statements that are not needed. A previous, more ambitious attempt failed because of the *BSD platform (see https://review.gluster.org/#/c/glusterfs/+/21929/ ). Now trying a more conservative reduction. It does not solve all circular deps that we have, but it does reduce some of them. There is just too much to handle reasonably (dht-common.h includes dht-lock.h which includes dht-common.h ...), but it does reduce the overall number of lines of include we need to look at in the future to understand and fix the mess later on. Change-Id: I550cd001bdefb8be0fe67632f783c0ef6bee3f9f updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
*	tests: add tests for different signal handling	Amar Tumballi	2019-05-30	1	-4/+2
	Also some cleanup: * old-protocol.t was actually added to make sure we have line-coverage * first-test.t should have been removed as per the comment. It doesn't do anything. * add statvfs to rpc-coverage so we can cover statvfs in few xlators. updates: bz#1693692 Change-Id: Ie8651ce007de484c4abced16b4de765aa5e517be Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	Fix some "Null pointer dereference" coverity issues	Xavi Hernandez	2019-05-26	1	-11/+13
	This patch fixes the following CIDs: * 1124829 * 1274075 * 1274083 * 1274128 * 1274135 * 1274141 * 1274143 * 1274197 * 1274205 * 1274210 * 1274211 * 1288801 * 1398629 Change-Id: Ia7c86cfab3245b20777ffa296e1a59748040f558 Updates: bz#789278 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
*	core: avoid dynamic TLS allocation when possible	Xavi Hernandez	2019-04-24	1	-3/+1
	Some interdependencies between logging and memory management functions make it impossible to use the logging framework before initializing the memory subsystem, because they both depend on Thread Local Storage allocated through pthread_key_create() during initialization. This causes a crash when we try to log something very early in the initialization phase. To prevent this, several dynamically allocated TLS structures have been replaced by static TLS reserved at compile time using the '__thread' keyword. This also reduces the number of error sources, making initialization simpler. Updates: bz#1193929 Change-Id: I8ea2e072411e30790d50084b6b7e909c7bb01d50 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
*	core: Log level changes do not effect on running client process	Mohit Agrawal	2019-04-15	3	-18/+29
	Problem: commit c34e4161f3cb6539ec83a9020f3d27eb4759a975 set log-level per xlator during reconfigure only for a brick process, not for the client process. Solution: 1) Change per xlator log-level only if brick_mux is enabled. To make sure about brick multiplexing, introduce a flag brick_mux at ctx->cmd_args. Note: There are two other changes done with this patch: 1) Ignore the client-log-level option when attaching a brick to an already running brick if brick_mux is enabled. 2) Add a log to print the pid of the running process to make debugging easier. Change-Id: I39e85de778e150d0685cd9a79425ce8b4783f9c9 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com> Fixes: bz#1696046
*	mgmt/shd: Implement multiplexing in self heal daemon	Mohammed Rafi KC	2019-04-01	3	-23/+236
	Problem: The shd daemon is per node, which means it creates a graph with all volumes on it. While this is great for utilizing resources, it is not so good in terms of performance and manageability, because self-heal daemons don't have the capability to automatically reconfigure their graphs. So each time any configuration change happens to the volumes (replicate/disperse), we need to restart shd to bring the changes into the graph. Because of this, all ongoing heals for all other volumes have to be stopped in the middle and restarted all over again.
	Solution: This change makes shd a per-volume daemon, so that the graph will be generated for each volume. When we want to start/reconfigure shd for a volume, we first search for an existing shd running on the node; if there is none, we will start a new process. If a daemon is already running for shd, then we will simply detach the graph for the volume and reattach the updated graph for the volume. This won't touch any of the ongoing operations for any other volumes on the shd daemon.
	Example of an shd graph when it is a per-volume graph:
		           -----------------------
		           |     debug-iostat    |
		           -----------------------
		            /        |         \
		           /         |          \
		   ---------     ---------     ---------
		   | AFR-1 |     | AFR-2 |     | AFR-3 |
		   ---------     ---------     ---------
	A running shd daemon with 3 volumes will look like this:
		                 graph
		           -----------------------
		           |     debug-iostat    |
		           -----------------------
		            /        |         \
		           /         |          \
		------------   ------------   ------------
		| volume-1 |   | volume-2 |   | volume-3 |
		------------   ------------   ------------
	Change-Id: Idcb2698be3eeb95beaac47125565c93370afbd99 fixes: bz#1659708 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	rpc/transport: Missing a ref on dict while creating transport object	Mohammed Rafi KC	2019-03-20	1	-3/+15
	While creating the rpc_transport object, we store a dictionary without taking a ref on the dict, but an unref is done during the cleanup of the transport object. So the rpc layer expects the caller to take a ref on the dictionary before passing the dict to the rpc layer. This leads to a lot of confusion across the code base and leads to ref leaks. Semantically, this is not correct. It is the rpc layer's responsibility to take a ref when storing the dict, and to release it during cleanup. I'm listing the issues or leaks across the code base caused by this confusion. These issues are currently present in upstream master. 1) changelog_rpc_client_init 2) quota_enforcer_init 3) rpcsvc_create_listeners: when there are two transports, like tcp,rdma. 4) quotad_aggregator_init 5) glusterd: init 6) nfs3_init_state 7) server: init 8) client: init This patch does the cleanup according to the semantics. Change-Id: I46373af9630373eb375ee6de0e6f2bbe2a677425 updates: bz#1659708 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
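The ownership rule argued for here (whoever stores a reference-counted object takes its own ref and drops it on cleanup) can be illustrated with a toy sketch. All types and functions below (`sketch_dict_t`, `sketch_transport_t`, and their helpers) are hypothetical and are not the glusterfs dict or rpc_transport APIs; the counter is also not thread-safe, which real code would need to address.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object standing in for a dictionary. */
typedef struct sketch_dict {
    int refcount;
} sketch_dict_t;

static inline sketch_dict_t *
sketch_dict_new(void)
{
    sketch_dict_t *d = calloc(1, sizeof(*d));
    if (d != NULL)
        d->refcount = 1; /* the creator holds the first reference */
    return d;
}

static inline sketch_dict_t *
sketch_dict_ref(sketch_dict_t *d)
{
    assert(d != NULL && d->refcount > 0);
    d->refcount++;
    return d;
}

static inline void
sketch_dict_unref(sketch_dict_t *d)
{
    assert(d != NULL && d->refcount > 0);
    if (--d->refcount == 0)
        free(d);
}

/* Hypothetical transport object that keeps the options dictionary. */
typedef struct sketch_transport {
    sketch_dict_t *options;
} sketch_transport_t;

static inline sketch_transport_t *
sketch_transport_new(sketch_dict_t *options)
{
    sketch_transport_t *t = calloc(1, sizeof(*t));
    if (t == NULL)
        return NULL;
    /* The layer that stores the pointer takes its own reference, so the
     * caller's reference is unaffected by the transport's lifetime. */
    t->options = sketch_dict_ref(options);
    return t;
}

static inline void
sketch_transport_destroy(sketch_transport_t *t)
{
    /* Release only the reference taken in sketch_transport_new(). */
    sketch_dict_unref(t->options);
    free(t);
}
```

With this pairing, a caller can create the dictionary, hand it to the transport, and unref its own reference whenever it is done, without needing to know whether the transport still exists.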
*	glusterfsd: Multiple shd processes are spawned on brick_mux environment	Mohit Agrawal	2019-03-12	1	-6/+16
	Problem: Multiple shd processes are spawned while starting volumes in a loop in a brick_mux environment. glusterd spawns a process based on a pidfile, and the shd daemon takes some time to update the pid in the pidfile, so glusterd is not able to get the shd pid. Solution: Commit cd249f4cb783f8d79e79468c455732669e835a4f changed the code to update the pidfile in the parent for any gluster daemon after getting the status of the forked child in the parent. To resolve this, correct the condition so that the pidfile is updated in the parent only for glusterd; for the rest of the daemons the pidfile is updated in the child. Change-Id: Ifd14797fa949562594a285ec82d58384ad717e81 fixes: bz#1684404 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	glusterfsd: Do not process PROFILE_NFS_INFO if graph is not ready	hujianfei	2019-02-19	1	-0/+5
	Otherwise, gnfs will crash in the following situation. Also see commit 2f9e555f. Reproducible steps: 1. kill gnfs process 2. service glusterd restart;gluster volume profile [vol] info nfs Dump trace info: /lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7fcf5cb6a872] /lib64/libglusterfs.so.0(gf_print_trace+0x324)[0x7fcf5cb743a4] /lib64/libc.so.6(+0x35670)[0x7fcf5b1d5670] /usr/sbin/glusterfs(glusterfs_handle_nfs_profile+0x114)[0x7fcf5d066474] /lib64/libglusterfs.so.0(synctask_wrap+0x12)[0x7fcf5cba1502] /lib64/libc.so.6(+0x47110)[0x7fcf5b1e7110] Fixes: bz#1677559 Change-Id: Id68edb3e4646c39544e0b4c90b5e0a9083b37b0d Signed-off-by: hujianfei <hujianfei@cmss.chinamobile.com>
*	glusterd: adding a comment for code readability	Sanju Rakonde	2019-02-19	1	-0/+10
	Adding a comment in the source code, so that anyone reading the code will understand the changes done by d4fa29 better. fixes: bz#1654270 Change-Id: I75aff4243420c434c47d69a4b310f77bf161bb29 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
*	core: implement a global thread pool	Xavi Hernandez	2019-02-18	2	-1/+37
	This patch implements a thread pool that is wait-free for adding jobs to the queue and uses a very small locked region to get jobs. This makes it possible to decrease contention drastically. It's based on the wfcqueue structure provided by the urcu library.
	It automatically enables more threads when load demands it, and stops them when not needed. There's a maximum number of threads that can be used. This value can be configured.
	Depending on the workload, the maximum number of threads plays an important role, so it needs to be configured for optimal performance. Currently the thread pool doesn't self-adjust the maximum for the workload, so this configuration needs to be changed manually.
	For this reason, the global thread pool has been made optional, so that volumes can still use the thread pool provided by io-threads.
	To enable it for bricks, the following option needs to be set:
		config.global-threading = on
	This option has no effect if bricks are already running. A restart is required to activate it. It's recommended to also enable the following option when running bricks with the global thread pool:
		performance.iot-pass-through = on
	To enable it for a FUSE mount point, the option '--global-threading' must be added to the mount command. To change it, an umount and remount is needed. It's recommended to disable the following option when using global threading on a mount point:
		performance.client-io-threads = off
	To enable it for services managed by glusterd, glusterd needs to be started with the option '--global-threading'. In this case all daemons, like self-heal, will be using the global thread pool.
	Currently it can only be enabled for bricks, FUSE mounts and glusterd services.
	The maximum number of threads for clients and bricks can be configured using the following options:
		config.client-threads
		config.brick-threads
	These options can be applied online and their effect is immediate most of the time. If one of them is set to 0, the maximum number of threads will be calculated as #cores * 2.
	Some distributions use a very old userspace-rcu library (version 0.7). For this reason, some header files from version 0.10 have been copied into contrib/userspace-rcu and are used if the detected version is 0.7 or older.
	An additional change has been made to io-threads to prevent threads from being started when iot-pass-through is set.
	Change-Id: I09d19e246b9e6d53c6247b29dfca6af6ee00a24b updates: #532 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
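The "0 means #cores * 2" rule stated in the message can be sketched as a small pure function. This is an illustration only: `sketch_max_threads` and `sketch_online_cores` are hypothetical names, and using `sysconf(_SC_NPROCESSORS_ONLN)` to count cores is an assumption about how such a value might be obtained, not a description of the patch.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: resolve the effective thread cap from a
 * configured value, where 0 means "derive it from the core count". */
static inline long
sketch_max_threads(long configured, long online_cores)
{
    assert(configured >= 0);
    if (configured != 0)
        return configured;
    if (online_cores < 1)
        online_cores = 1; /* fall back if the core count is unknown */
    return online_cores * 2;
}

/* Assumption: the number of online processors is a reasonable stand-in
 * for "#cores". _SC_NPROCESSORS_ONLN is widely available but is not
 * required by POSIX. */
static inline long
sketch_online_cores(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}
```

Keeping the rule separate from the system query makes it easy to reason about: `sketch_max_threads(0, 4)` resolves to 8, while any non-zero configured value is used as given.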
*	fuse: reflect the actual default for lru-limit option	Amar Tumballi	2019-02-11	1	-1/+1
	In both `--help` text and man page. updates: bz#1193929 Change-Id: I9aa9367c6863ac8e2403255280697c9e6be26cf0 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	mount/fuse: expose auto-invalidation as a mount option	Raghavendra Gowdappa	2019-02-02	2	-0/+25
	Auto invalidation is necessary when the same (meta)data is shared/accessed across multiple mounts. However, if (meta)data is not shared, all relevant I/O goes through the cache of a single mount and hence is always coherent with (meta)data on bricks. So, fuse-auto-invalidation can be disabled for this case, which gives a huge performance boost for workloads that write data and then immediately read the data they just wrote. From glusterfs --help, <snip> --auto-invalidation[=BOOL] controls whether fuse-kernel can auto-invalidate attribute, dentry and page-cache. Disable this only if same files/directories are not accessed across two different mounts concurrently [default: "on"] </snip> Details on how disabling auto-invalidation helped to reduce pgbench init times can be found at [1]. Time taken for pgbench init of scale 8000 was 8340s. That is an improvement of 86% (59280s vs 8340s) with auto-invalidations turned off along with other optimizations. Just disabling auto-invalidation contributed a 56% improvement by reducing the total time taken by 33260s. [1] https://www.spinics.net/lists/gluster-devel/msg25907.html Change-Id: I0ed730dba9064bd9c576ad1800170a21e100e1ce Signed-off-by: Raghavendra Gowdappa <rgowdapp@redhat.com> updates: bz#1664934
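The two percentages quoted in this message follow from the stated timings, assuming both are measured relative to the 59280s baseline (the message does not spell out the denominator):

```latex
\[
\frac{59280\,\mathrm{s} - 8340\,\mathrm{s}}{59280\,\mathrm{s}}
  = \frac{50940}{59280} \approx 0.859 \approx 86\%,
\qquad
\frac{33260\,\mathrm{s}}{59280\,\mathrm{s}} \approx 0.561 \approx 56\%.
\]
```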
*	Multiple files: reduce work while under lock.	Yaniv Kaul	2019-01-29	1	-7/+10
	Mostly, unlock before logging. In some cases, moved different code that was not needed to be under lock (for example, taking time, or malloc'ing) to be executed before taking the lock. Note: logging might be slightly less accurate in order, since it may not be done now under the lock, so order of logs is racy. I think it's a reasonable compromise. Compile-tested only! updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I2438710016afc9f4f62a176ef1a0d3ed793b4f89
*	rpc: Fix double free	Poornima G	2019-01-22	1	-2/+0
	The value rsp.xdata.xdata_val was being freed twice. It was assigned to dict->extra_stdfree, so dict_destroy would free it, and there was also an explicit free. Getting rid of the explicit free in this patch. Change-Id: Ia9c73454bec3970b33f154fa754398bf3b045645 fixes: bz#1668268 Signed-off-by: Poornima G <pgurusid@redhat.com>
*	rpc: use address-family option from vol file	Milind Changire	2019-01-22	1	-1/+4
	This patch helps enable IPv6 connections in the cluster. If this option is not set explicitly, the default address-family is IPv4. When address-family is set to "inet6" in the /etc/glusterfs/glusterd.vol file, the mount command-line also needs to have -o xlator-option="transport.address-family=inet6" added to it. This option also gets added to the brick command-line. Snapshot and gfapi use-cases should also use this option to pass in the inet6 address-family. Change-Id: I97db91021af27bacb6d7578e33ea4817f66d7270 fixes: bz#1635863 Signed-off-by: Milind Changire <mchangir@redhat.com>