| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
quotad and ganesha.nfsd prints many logs as,
[rpc-clnt.c:1739:rpc_clnt_submit ] 0-<VOLUME_NAME>-quota: error returned while attempting to connect to host: (null), port 0
Change-Id: Ic0c815400619e4a87a772a51b19822920228c1ef
Updates: bz#1596787
Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
libglusterfs devel package headers are referenced in code using
include semantics for a program, this while it works can be better
especially when dealing with out of tree xlator builds or in
general out of tree devel package usage.
Towards this, the following changes are done,
- moved all devel headers under a glusterfs directory
- Included these headers using system header notation <> in all
code outside of libglusterfs
- Included these headers using own program notation "" within
libglusterfs
This change although big, is just moving around the headers and
making it correct when including these headers from other sources.
This helps us correctly include libglusterfs includes without
namespace conflicts.
Change-Id: Id2a98854e671a7ee5d73be44da5ba1a74252423b
Updates: bz#1193929
Signed-off-by: ShyamsundarR <srangana@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A single global per program queue is contended by all request handler
threads and event threads. This can lead to high contention. So,
reduce the contention by providing each request handler thread its own
private queue.
Thanks to "Manoj Pillai"<mpillai@redhat.com> for the idea of pairing a
single queue with a fixed request-handler-thread and event-thread,
which brought down the performance regression due to overhead of
queuing significantly.
Thanks to "Xavi Hernandez"<xhernandez@redhat.com> for discussion on
how to communicate the event-thread death to request-handler-thread.
Thanks to "Karan Sandha"<ksandha@redhat.com> for voluntarily running
the perf benchmarks to qualify that performance regression introduced
by ping-timer-fixes is fixed with this patch and patiently running
many iterations of regression tests while RCAing the issue.
Thanks to "Milind Changire"<mchangir@redhat.com> for patiently running
the many iterations of perf benchmarking tests while RCAing the
regression caused by ping-timer-expiry fixes.
Change-Id: I578c3fc67713f4234bd3abbec5d3fbba19059ea5
Fixes: bz#1644629
Signed-off-by: Raghavendra Gowdappa <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since gcc-8.2.x (fedora-28 or so) gcc has been emitting warnings
about buggy use of strncpy.
Most uses that gcc warns about in our sources are exactly backwards;
the 'limit' or len is the strlen/size of the _source param_, giving
exactly zero protection against overruns. (Which was, after all, one
of the points of using strncpy in the first place.)
IOW, many warnings are about uses that look approximately like this:
...
char dest[8];
char src[] = "this is a string longer than eight chars";
...
strncpy (dest, src, sizeof(src)); /* boom */
...
The len/limit should be sizeof(dest).
Note: the above example has a definite over-run. In our source the
overrun is typically only theoretical (but possibly exploitable.)
Also strncpy doesn't null-terminate on truncation; snprintf does; prefer
snprintf over strncpy.
Mildly surprising that coverity doesn't warn/isn't warning about this.
Change-Id: I022d5c6346a751e181ad44d9a099531c1172626e
updates: bz#1193929
Signed-off-by: Kaleb S. KEITHLE <kkeithle@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we did not go to unlock the mutex if we failed
to connect. This patch fixes it.
Compile-tested only!
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
Change-Id: I0fcca066a2601dba6bc3e9eb8b3c9fc757ffe4db
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Assorted code refactoring to reduce lock contention.
Also, took the opportunity to reorder structs more properly.
Removed dead code.
Hopefully, no functional changes.
Compile-tested only!
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
Change-Id: I5de6124ad071fd5e2c31832364d602b5f6d6fe28
|
|
|
|
|
|
|
|
|
|
|
| |
trav->saved_at.tv_sec is not initialized.
Calling "list_empty" function before initializing "trav".
Updates: bz#1622665
Change-Id: Ib5c2703a07a9c56ccd115001aca500f7a23c4a2e
Signed-off-by: Harpreet Lalwani <hlalwani@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the case glfs_fini is ongoing, some cache xlators like readdir-ahead,
continues to submit requests. Current rpc submit code ignores
connection status and queues these internally generated requests. These
requests then got cleaned up after inode table has been destroyed,
causing crash.
Change-Id: Ife6b17d8592a054f7a7f310c79d07af005087017
updates: bz#1626313
Signed-off-by: Zhang Huan <zhanghuan@open-fs.com>
|
|
|
|
|
| |
Change-Id: Ia84cc24c8924e6d22d02ac15f611c10e26db99b4
Signed-off-by: Nigel Babu <nigelb@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
xlators/cluster/stripe/src/stripe-helpers.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
xlators/cluster/dht/src/tier.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
xlators/cluster/dht/src/dht-layout.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
xlators/cluster/dht/src/dht-helper.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
xlators/cluster/dht/src/dht-common.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
xlators/cluster/afr/src/afr.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
xlators/cluster/afr/src/afr-inode-read.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
tests/bugs/replicate/bug-1250170-fsync.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
tests/basic/gfapi/gfapi-async-calls-test.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
tests/basic/ec/ec-fast-fgetxattr.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
rpc/xdr/src/glusterfs3.h: Move to GF_MALLOC() instead of GF_CALLOC() when possible
rpc/rpc-transport/socket/src/socket.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
rpc/rpc-lib/src/rpc-clnt.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
extras/geo-rep/gsync-sync-gfid.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
cli/src/cli-xml-output.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
cli/src/cli-rpc-ops.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
cli/src/cli-cmd-volume.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
cli/src/cli-cmd-system.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
cli/src/cli-cmd-snapshot.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
cli/src/cli-cmd-peer.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
cli/src/cli-cmd-global.c: Move to GF_MALLOC() instead of GF_CALLOC() when possible
It doesn't make sense to calloc (allocate and clear) memory
when the code right away fills that memory with data.
It may be optimized by the compiler, or have a microscopic
performance improvement.
In some cases, also changed allocation size to be sizeof some
struct or type instead of a pointer - easier to read.
In some cases, removed redundant strlen() calls by saving the result
into a variable.
1. Only done for the straightforward cases. There's room for improvement.
2. Please review carefully, especially for string allocation, with the
terminating NULL string.
Only compile-tested!
updates: bz#1193929
Original-Author: Yaniv Kaul <ykaul@redhat.com>
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
Signed-off-by: Amar Tumballi <amarts@redhat.com>
Change-Id: I16274dca4078a1d06ae09a0daf027d734b631ac2
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
strncpy may not be very efficient for short strings copied into
a large buffer: If the length of src is less than n,
strncpy() writes additional null bytes to dest to ensure
that a total of n bytes are written.
Instead, use snprintf(). Check for truncated output
where applicable.
Also:
- save the result of strlen() and re-use it when possible.
- move from strlen to SLEN (sizeof() ) for const strings.
Compile-tested only!
Change-Id: I54e80d4f4a80e98d3775e376efe05c51af0b29eb
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
for better traceability between fuse requests and gluster requests a
mapping needs to be established in the logs between the two IDs
BUG: 1623408
Change-Id: I0ef82fe69c1ad7d0ce9e3ac4f35cd82aa6e9bca9
fixes: bz#1623408
Signed-off-by: Milind Changire <mchangir@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: gfapi client is getting crashed in rpc_clnt_connection_cleanup
at the time of destroying saved_frames
Solution: gfapi client is getting crashed because saved_frame ptr is
already freed in rpc_clnt_destroy.To avoid the same update
code in rpc_clnt_destroy
Change-Id: Id8cce102b49f26cfd86ef88257032ed98f43192b
fixes: bz#1607783
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The state management of "connected" in rpc is ad-hoc as far as the
responsibility goes. Note that there is nothing wrong with
functionality itself. rpc layer manages this state in disconnect
codepath and has exposed an api to manage this one from
consumers. Note that rpc layer never sets "connected" to true by
itself, which forces the consumers to use this api to get a working
rpc connection. The situation is best captured from a comment in code
from Jeff Darcy in glusterfsd/src/gf-attach.c:
-/*
- * In a sane world, the generic RPC layer would be capable of tracking
- * connection status by itself, with no help from us. It might invoke our
- * callback if we had registered one, but only to provide information. Sadly,
- * we don't live in that world. Instead, the callback *must* exist and *must*
- * call rpc_clnt_{set,unset}_connected, because that's the only way those
- * fields get set (with RPC both above and below us on the stack). If we don't
- * do that, then rpc_clnt_submit doesn't think we're connected even when we
- * are. It calls the socket code to reconnect, but the socket code tracks this
- * stuff in a sane way so it knows we're connected and returns EINPROGRESS.
- * Then we're stuck, connected but unable to use the connection. To make it
- * work, we define and register this trivial callback.
- */
Also, consumers of rpc know about state of connection only through the
notifications sent by rpc-clnt. So, consumers don't have any extra
information to manage the state and hence letting them manage the
state is counter intuitive. This patch cleans that up and instead
moves the responsibility of state management of rpc layer into
itself.
Change-Id: I31e641a60795fc480ca753917f4b2579f1e05094
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Fixes: bz#1585585
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The auth_value was being reset to AUTH_GLUSTERFS_v2
during rpc disconnect. It shoud not be reset. The
disconnect during portmap request can race with
handshake. If handshake happens first and
disconnect later, auth_value would set to default
value and it never sets back to actual auth_value
fixes: bz#1579276
Change-Id: Ib46c9e01a97f6defb3fd1e0423fdb4b899b4a361
Signed-off-by: Kotresh HR <khiremat@redhat.com>
|
|
|
|
|
|
|
|
|
| |
When a saved frame is to be forced unwind, there is no need to pass an
empty iovector without any data pointed to.
Change-Id: I6e858fb38644326e22239b83272b15db656035e5
BUG: 1523122
Signed-off-by: Zhang Huan <zhanghuan@open-fs.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce another authentication header which can now send more data.
This is useful because this data can be common for all the fops, and
we don't need to change all the signatures.
As part of this, made rpc-clnt.c little more modular to support multiple
authentication structures.
stack.h changes are placeholder for the ctime etc, can be moved later
based on need.
updates #384
Change-Id: I6111c13cfd2ec92e2b4e9295896bf62a8a33b2c7
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
If the transport object is freed in rpc_transport_unref,
a notify of RPC_TRANSPORT_CLEANUP is push to rpc_clnt_notify,
where the rpc_clnt(contains conn) is freed.
After that, using of conn after rpc_transport_unref is use after freed.
Change-Id: I5cac8a8e7ced7c1079930080a12abf02d46667d5
Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: In glusterfs code base we call mutex_lock/unlock to take
reference/dereference for a object.Sometime it could be
reason for lock contention also.
Solution: There is no need to use mutex to increase/decrease ref
counter, instead of using mutex use gcc builtin ATOMIC
operation.
Test: I have not observed yet how much performance gain after apply
this patch specific to glusterfs but i have tested same
with below small program(mutex and atomic both) and
get good difference.
static int numOuterLoops;
static void *
threadFunc(void *arg)
{
int j;
for (j = 0; j < numOuterLoops; j++) {
__atomic_add_fetch (&glob, 1,__ATOMIC_ACQ_REL);
}
return NULL;
}
int
main(int argc, char *argv[])
{
int opt, s, j;
int numThreads;
pthread_t *thread;
int verbose;
int64_t n = 0;
if (argc < 2 ) {
printf(" Please provide 2 args Num of threads && Outer Loop\n");
exit (-1);
}
numThreads = atoi(argv[1]);
numOuterLoops = atoi (argv[2]);
if (1) {
printf("\tthreads: %d; outer loops: %d;\n",
numThreads, numOuterLoops);
}
thread = calloc(numThreads, sizeof(pthread_t));
if (thread == NULL) {
printf ("calloc error so exit\n");
exit (-1);
}
__atomic_store (&glob, &n, __ATOMIC_RELEASE);
for (j = 0; j < numThreads; j++) {
s = pthread_create(&thread[j], NULL, threadFunc, NULL);
if (s != 0) {
printf ("pthread_create failed so exit\n");
exit (-1);
}
}
for (j = 0; j < numThreads; j++) {
s = pthread_join(thread[j], NULL);
if (s != 0) {
printf ("pthread_join failed so exit\n");
exit (-1);
}
}
printf("glob value is %ld\n",__atomic_load_n (&glob,__ATOMIC_RELAXED));
exit(0);
}
time ./thr_count 800 800000
threads: 800; outer loops: 800000;
glob value is 640000000
real 1m10.288s
user 0m57.269s
sys 3m31.565s
time ./thr_count_atomic 800 800000
threads: 800; outer loops: 800000;
glob value is 640000000
real 0m20.313s
user 1m20.558s
sys 0m0.028
Change-Id: Ie5030a52ea264875e002e108dd4b207b15ab7cc7
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
rpc_clnt_submit() acquires conn->lock before call to
rpc_transport_submit_request() and subsequent queuing of frame into
saved_frames list. However, as part of handling RPC_TRANSPORT_MSG_RECEIVED
and RPC_TRANSPORT_MSG_SENT notifications in rpc_clnt_notify(), conn->lock
is again used to atomically update conn->last_received and conn->last_sent
event timestamps.
So when conn->lock is acquired as part of submitting a request,
a parallel POLLIN notification gets blocked at rpc layer until the request
submission completes and the lock is released.
To get around this, this patch call clock_gettime (instead to call gettimeofday)
to update event timestamps in conn->last_received and conn->last_sent and to
call clock_gettime don't need to call mutex_lock because it (clock_gettime)
is thread safe call.
Note: Run fio on vm after apply the patch, iops is improved after apply
the patch.
Change-Id: I347b5031d61c426b276bc5e07136a7172645d763
BUG: 1467614
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Scan URL:
https://download.gluster.org/pub/gluster/glusterfs/static-analysis/master/glusterfs-coverity/2017-11-10-0f524f07/html/
ID: 9 (BAD_SHIFT)
ID: 58 (CHECKED_RETURN)
ID: 98 (DEAD_CODE)
ID: 249, 250, 251, 252 (MIXED_ENUMS)
ID: 289, 297 (NULL_RETURNS)
ID: 609, 613, 622, 644, 653, 655 (UNUSED_VALUE)
ID: 432 (RESOURCE_LEAK)
Change-Id: I2349877214dd38b789e08b74be05539f09b751b9
BUG: 789278
Signed-off-by: Milind Changire <mchangir@redhat.com>
|
|
|
|
|
|
| |
Change-Id: Idf99908aa48718a7faf7f0bbc647a679ec548282
BUG: 1443145
Signed-off-by: Niels de Vos <ndevos@redhat.com>
|
|
|
|
|
|
| |
Change-Id: I8c6f6b642f025d1faf74015b8f7aaecd7ebfd4d5
BUG: 1443145
Signed-off-by: Niels de Vos <ndevos@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 5b14c11d3cae38bc66006b02217ede485ae30dea.
BUG: 1484225
Change-Id: I3269d3fc64de3f3cc6f670ea564a87d7725e10fd
Signed-off-by: Milind Changire <mchangir@redhat.com>
Reviewed-on: https://review.gluster.org/18113
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
Tested-by: Atin Mukherjee <amukherj@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
glusterd rpc code path attempts to reconnect to rebalance process
via the reconnect timer even after the rebalance process disconnection
Solution:
Set the clnt->disabled flag to 1 to avoid reconnection and cause
the clnt object to be unref'd
Change-Id: I4e38eaef45d2fdea86d25e9dff9f1af0cd29cf66
BUG: 1484225
Signed-off-by: Milind Changire <mchangir@redhat.com>
Reviewed-on: https://review.gluster.org/18093
Smoke: Gluster Build System <jenkins@build.gluster.org>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
frames which time out at current second are missed out
Solution:
change test to include frames timing out at current second
i.e. timeout <= current.tv_sec
instead of
timeout < current.tv_sec
Change-Id: I459d47856ade2b657a0289e49f7f63da29186d6e
BUG: 1468433
Signed-off-by: Milind Changire <mchangir@redhat.com>
Reviewed-on: https://review.gluster.org/17722
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I57ad970411db1ccd3d2c56c504c7da9cc221051f
BUG: 1448692
Signed-off-by: Zhou Zhengping <johnzzpcrystal@gmail.com>
Reviewed-on: https://review.gluster.org/17198
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 086436a introduced generation number (cleanup_gen) to ensure that
rpc layer doesn't end up cleaning up the connection object if
application layer has already destroyed it. Bumping up cleanup_gen was
done only in rpc_clnt_connection_cleanup (). However the same is needed
in rpc_clnt_reconnect_cleanup () too as with out it if the object gets destroyed
through the reconnect event in the application layer, rpc layer will
still end up in trying to delete the object resulting into double free
and crash.
Peer probing an invalid host/IP was the basic test to catch this issue.
Change-Id: Id5332f3239cb324cead34eb51cf73d426733bd46
BUG: 1433578
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
Reviewed-on: https://review.gluster.org/16914
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Milind Changire <mchangir@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Locking during notify was introduced as part of commit
aa22f24f5db7659387704998ae01520708869873 [1]. The fix was introduced
to fix out-of-order CONNECT/DISCONNECT events from rpc-clnt to parent
xlators [2]. However as part of handling DISCONNECT protocol/client
does unwind saved frames (with failure) waiting for responses. This
saved_frames_unwind can be a costly operation and hence ideally
shouldn't be included in the critical section of notifylock, as it
unnecessarily delays the reconnection to same brick. Also, its not a
good practise to pass control to other xlators holding a lock as it
can lead to deadlocks. So, this patch removes locking in rpc-clnt
while notifying parent xlators.
To fix [2], two changes are present in this patch:
* notify DISCONNECT before cleaning up rpc connection (same as commit
a6b63e11b7758cf1bfcb6798, patch [3]).
* protocol/client uses rpc_clnt_cleanup_and_start, which cleans up rpc
connection and does a start while handling a DISCONNECT event from
rpc. Note that patch [3] was reverted as rpc_clnt_start called in
quick_reconnect path of protocol/client didn't invoke connect on
transport as the connection was not cleaned up _yet_ (as cleanup was
moved post notification in rpc-clnt). This resulted in clients never
attempting connect to bricks.
Note that one of the neater ways to fix [2] (without using locks) is
to introduce generation numbers to map CONNECT and DISCONNECTS across
epochs and ignore DISCONNECT events if they don't belong to current
epoch. However, this approach is a bit complex to implement and
requires time. So, current patch is a hacky stop-gap fix till we come
up with a more cleaner solution.
[1] http://review.gluster.org/15916
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1386626
[3] http://review.gluster.org/15681
Change-Id: I62daeee8bb1430004e28558f6eb133efd4ccf418
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
BUG: 1427012
Reviewed-on: https://review.gluster.org/16784
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Milind Changire <mchangir@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I003e38b238704d3345d46688355bcf3702455ba1
BUG: 1399593
Signed-off-by: Mateusz Slupny <mateusz.slupny@appeartv.com>
[ndevos: rebased after I8ff5d1a32 moved the code around]
Reviewed-on: https://review.gluster.org/15969
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Tested-by: Niels de Vos <ndevos@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Prashanth Pai <ppai@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When SSL is enabled or if "transport.socket.own-thread" option is set
then socket_poller is run as different thread. Currently during
disconnect or PARENT_DOWN scenario we don't wait for this thread
to terminate. PARENT_DOWN will disconnect the socket layer and
cleanup resources used by socket_poller.
Therefore before disconnect we should wait for poller thread to exit.
Change-Id: I71f984b47d260ffd979102f180a99a0bed29f0d6
BUG: 1404181
Signed-off-by: Rajesh Joseph <rjoseph@redhat.com>
Reviewed-on: http://review.gluster.org/16141
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Kaushal M <kaushal@redhat.com>
Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It is possible that the notification thread which notifies
protocol/client layer about the disconnection is put to sleep
and meanwhile, a fuse thread or a timer thread initiates and
completes reconnection to the brick. The notification thread
is then woken up and protocol/client layer updates its flags
to indicate that network is disconnected. No reconnection is
initiated because reconnection is rpc-lib layer's responsibility
and its flags indicate that connection is connected.
Fix: Serialize connect and disconnect notify
Credit: Raghavendra Talur <rtalur@redhat.com>
Change-Id: I8ff5d1a3283b47f5c26848a42016a40bc34ffc1d
BUG: 1386626
Signed-off-by: Rajesh Joseph <rjoseph@redhat.com>
Reviewed-on: http://review.gluster.org/15916
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit a6b63e11b7758cf1bfcb67985e25ec02845f0995.
Nithya and Rajesh found that the mount fails sometimes after this patch
was merged so reverting it.
BUG: 1386626
Change-Id: I959a5b6c7da61368cf4c67c98193c6e8fdd1755d
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/15838
Reviewed-by: N Balachandran <nbalacha@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
There was a hang because unlock on an entry failed with
ENOTCONN.
Client thinks the connection is down where as server thinks
the connection is up.
This is the race we are seeing:
1) Connection from client to the brick disconnects.
2) Saved frames unwind is called which unwinds all
frames that were wound before disconnect.
3) connection from client to the brick happens and
setvolume.
4) Disconnect notification for the connection in 1)
comes now and calls client_rpc_notify() which
marks the connection to be offline even when the
connection is up.
This is happening because I/O can retrigger connection
before disconnect notification is sent to the higher
layers in rpc.
Fix:
Notify the higher layers that a disconnect happened and then
go ahead with reconnect logic.
For the logs which point to the information above check:
https://bugzilla.redhat.com/show_bug.cgi?id=1386626#c1
Thanks to Raghavendra G for suggesting the correct fix.
BUG: 1386626
Change-Id: I3c84ba1f17010bd69049fa88ec5f0ae431f8cda9
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/15681
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
http://review.gluster.org/14085 fixes a/the "leak" - via the
generated rpc/xdr headers - of pragmas that mask these warnings.
However 14085 won't pass the smoke test until all the warnings are
fixed.
Change-Id: I20d91091bee0bf8f198a307ebba4b284bc3817ff
BUG: 1369124
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/15240
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PROBLEM:
1. Freeing up rpc_clnt object might lead to crashes. Well,
it was not a necessity to free rpc-clnt object till now
because all the existing use cases needs to reconnect
back on disconnects. Hence timer code was not taking
ref on rpc-clnt object.
Glusterd had some use-cases that led to crash due to
ping-timer and they fixed only those code paths that
involve ping-timer.
Now, since changelog has an use-case where rpc-clnt
need to be freed up, we need to fix timer code to take
refs
2. In changelog, because of issue 1, only mydata was being
freed which is incorrect. And there are races where
rpc-clnt object would access the freed mydata which
would lead to crashes.
Since changelog xlator resides on brick side and is long
living process, if multiple libgfchangelog consumers
register to changelog and disconnect/reconnect mulitple
times, it would result in leak of 'rpc-clnt' object
for every connect/disconnect.
SOLUTION:
1. Handle ref/unref of 'rpc_clnt' structure in timer
functions properly.
2. In changelog, unref 'rpc_clnt' in RPC_CLNT_DISCONNECT
after disabling timers and free mydata on RPC_CLNT_DESTROY.
RPC SETUP IN CHANGELOG:
1. changelog xlator initiates rpc server say 'changelog_rpc_server'
2. libgfchangelog initiates one rpc server say 'libgfchangelog_rpc_server'
3. libgfchangelog initiates rpc client and connects to 'changelog_rpc_server'
4. In return changelog_rpc_server initiates a rpc client and connects back
to 'libgfchangelog_rpc_server'
REF/UNREF HANDLING IN TIMER FUNCTIONS:
Let's say rpc clnt refcount = 1
1. Take the ref before reigstering callback to timer queue
>>>> rpc_clnt_ref (say ref count becomes = 2)
2. Register a callback to timer say 'callback1'
3. If register fails:
>>>> rpc_clnt_unref (ref count = 1)
4. On timer expiration, 'callback1' gets called. So unref rpc clnt at the end
in 'callback1'. This is corresponding to ref taken in step 1
>>>> rpc_clnt_unref (ref count = 1)
5. The cycle from step-1 to step-4 continues....until timer cancel event happens
6. timer cancel of say 'callback1'
If timer cancel fails:
Do nothing, Step-4 would have unrefd
If timer cancel succeeds:
>>>> rpc_clnt_unref (ref count = 1)
Change-Id: I91389bc511b8b1a17824941970ee8d2c29a74a09
BUG: 1316178
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Reviewed-on: http://review.gluster.org/13658
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a peer rpc disconnect event has been already processed, skip the furthers as
processing them are overheads and sometimes may lead to a crash like due to a
double free
Change-Id: Iec589ce85daf28fd5b267cb6fc82a4238e0e8adc
BUG: 1318546
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
Reviewed-on: http://review.gluster.org/13790
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is to fix regression caused by below patch -
http://review.gluster.org/#/c/13456/
As discussed over http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/14284,
patch #13456 caused a regression where in if there are any pending
rpc invocations, we end up accessing freed object. This patch
fixes it by allowing reconnect during rpc submit only if rpc
is not disabled.
Change-Id: I4ef4dd52bd42368bb89129f98bc973e46c6a39f4
BUG: 1295107
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Reviewed-on: http://review.gluster.org/13592
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The transport object needs to get unref'ed when the rpc clnt
object is getting destroyed. But currently in rpc_clnt_disable()
we set conn->trans to NULL before it gets unref'ed leading to
transport object leak.
This change is to fix it by setting conn-tran to NULL only when
it is being unref'ed.
Change-Id: I79ba34e28ae19eb616035f36bbed1c2f47875b94
BUG: 1295107
Signed-off-by: Soumya Koduri <skoduri@redhat.com>
Reviewed-on: http://review.gluster.org/13456
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I0b96b83ad8d06de9b2f5fc14073b94777885a775
BUG: 1261927
Signed-off-by: Anoop C S <anoopcs@redhat.com>
Reviewed-on: http://review.gluster.org/12153
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are three kinds of inline functions: plain inline, extern inline,
and static inline. All three have been removed from .c files, except
those in "contrib" which aren't our problem. Inlines in .h files, which
are overwhelmingly "static inline" already, have generally been left
alone. Over time we should be able to "lower" these into .c files, but
that has to be done in a case-by-case fashion requiring more manual
effort. This part was easy to do automatically without (as far as I can
tell) any ill effect.
In the process, several pieces of dead code were flagged by the
compiler, and were removed.
Change-Id: I56a5e614735c9e0a6ee420dab949eac22e25c155
BUG: 1245331
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/11769
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The @owner argument tells RPC layer the xlator that owns
the connection and to which xlator THIS needs be set during
network notifications like CONNECT and DISCONNECT.
Code paths that originate from the head of a (volume) graph and use
STACK_WIND ensure that the RPC local endpoint has the right xlator saved
in the frame of the call (callback pair). This guarantees that the
callback is executed in the right xlator context.
The client handshake process which includes fetching of brick ports from
glusterd, setting lk-version on the brick for the session, don't have
the correct xlator set in their frames. The problem lies with RPC
notifications. It doesn't have the provision to set THIS with the xlator
that is registered with the corresponding RPC programs. e.g,
RPC_CLNT_CONNECT event received by protocol/client doesn't have THIS set
to its xlator. This implies, call(-callbacks) originating from this
thread don't have the right xlator set too.
The fix would be to save the xlator registered with the RPC connection
during rpc_clnt_new. e.g, protocol/client's xlator would be saved with
the RPC connection that it 'owns'. RPC notifications such as CONNECT,
DISCONNECT, etc inherit THIS from the RPC connection's xlator.
Change-Id: I9dea2c35378c511d800ef58f7fa2ea5552f2c409
BUG: 1235582
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/11436
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: Idd94adb0457aaffce7330f56f98cebafa2c4dae8
BUG: 1249499
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/11818
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
| |
See http://review.gluster.org/9613 for more details.
Change-Id: I05ac0267b8c6f4e9b354acbbdf5469835455fb10
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/10821
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
BUG: 1227583
Change-Id: Ifac4dd8c633081483e4eba9d7e5a89837b2a453a
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/11041
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of including config.h in each file, and have the additional
config.h included from the compiler commandline (-include option).
When a .c file tests for a certain #define, and config.h was not
included, incorrect assumtions were made. With this change, it can not
happen again.
BUG: 1222319
Change-Id: I4f9097b8740b81ecfe8b218d52ca50361f74cb64
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: http://review.gluster.org/10808
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
... and disable reconnect timer on rpc_clnt_disconnect.
Root Cause
----------
gluster-NFS service wouldn't be started if there are no
started volumes that have nfs service enabled for them.
Before this fix we would initiate a connect even when
the gluster-NFS service wasn't (re)started. Compounding
that glusterd_conn_disconnect doesn't disable reconnect
timer. So, it is possible that the reconnect timer was
in execution when the timer event was attempted to be
removed.
Change-Id: Iadcb5cff9eafefa95eaf3a1a9413eeb682d3aaac
BUG: 1222378
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/10830
Tested-by: NetBSD Build System
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
Reviewed-by: Gaurav Kumar Garg <ggarg@redhat.com>
Reviewed-by: Kaushal M <kaushal@redhat.com>
Tested-by: Kaushal M <kaushal@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Coverity CIDs:
1210973
1124887
1124888
1124682
1124849
1124503
Change-Id: I012f6cf9d14753f572ab94aae6d442d1ef8df79a
BUG: 789278
Signed-off-by: Nandaja Varma <nandaja.varma@gmail.com>
Reviewed-on: http://review.gluster.org/9600
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch ports nfs, shd, quotad & snapd with the approach suggested in
http://www.gluster.org/pipermail/gluster-devel/2014-December/043180.html
Change-Id: I4ea5b38793f87fc85cc9d2cf873727351dedffd2
BUG: 1191486
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/9428
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Nekkunti <anekkunt@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
rpc_clnt_disable() sets rpc->conn->trans to NULL, hence we should not
call rpc_transport_unref() afterwards. I moved it before the
rpc_clnt_disable() call, but I am not sure it should be called at all,
perhaps it should just go away.
BUG: 764655
Change-Id: I488d0207494e3a3fad52e64e67b2e740b236b864
Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
Reviewed-on: http://review.gluster.org/8393
Reviewed-by: Harshavardhana <harsha@harshavardhana.net>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
|