summaryrefslogtreecommitdiffstats
path: root/xlators/protocol/client/src/client-handshake.c
Commit message (Collapse)AuthorAgeFilesLines
* server: Mount fails after reboot 1/3 gluster nodesMohit Agrawal2020-02-101-2/+1
| | | | | | | | | | | | | | | | | | | | | | Problem: At the time of coming up one server node(1x3) after reboot client is unmounted.The client is unmounted because a client is getting AUTH_FAILED event and client call fini for the graph.The client is getting AUTH_FAILED because brick is not attached with a graph at that moment Solution: To avoid the unmounting the client graph throw ENOENT error from server in case if brick is not attached with server at the time of authenticate clients. > Credits: Xavi Hernandez <xhernandez@redhat.com> > Change-Id: Ie6fbd73cbcf23a35d8db8841b3b6036e87682f5e > Fixes: bz#1793852 > Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> > (cherry picked from commit > f6421dff22a6ddaf14134f6894deae219948c89d) Change-Id: Ie6fbd73cbcf23a35d8db8841b3b6036e87682f5e Fixes: bz#1794019 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* protocol: add an option to force using old-protocolAmar Tumballi2019-04-101-3/+10
| | | | | | | | | | | | | | As protocol implements every fop, and in general a large part of the codebase. Considering our regression is run mostly in 1 machine, there was no way of forcing the client to use old protocol (while new one is available). With this patch, a new 'testing' option is provided which forces client to use old protocol if found. This should help increase the code coverage by at least 10k lines overall. updates: bz#1693692 Change-Id: Ie45256f7dea250671b689c72b4b6f25037cef948 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* client-rpc: Fix the payload being sent on the wirePoornima G2019-03-291-16/+13
| | | | | | | | | | | | | | | | | | | The fops allocate 3 kind of payload(buffer) in the client xlator: - fop payload, this is the buffer allocated by the write and put fop - rsphdr paylod, this is the buffer required by the reply cbk of some fops like lookup, readdir. - rsp_paylod, this is the buffer required by the reply cbk of fops like readv etc. Currently, in the lookup and readdir fop the rsphdr is sent as payload, hence the allocated rsphdr buffer is also sent on the wire, increasing the bandwidth consumption on the wire. With this patch, the issue is fixed. Fixes: bz#1692093 Change-Id: Ie8158921f4db319e60ad5f52d851fa5c9d4a269b Signed-off-by: Poornima G <pgurusid@redhat.com>
* libglusterfs: Move devel headers under glusterfs directoryShyamsundarR2018-12-051-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | libglusterfs devel package headers are referenced in code using include semantics for a program, this while it works can be better especially when dealing with out of tree xlator builds or in general out of tree devel package usage. Towards this, the following changes are done, - moved all devel headers under a glusterfs directory - Included these headers using system header notation <> in all code outside of libglusterfs - Included these headers using own program notation "" within libglusterfs This change although big, is just moving around the headers and making it correct when including these headers from other sources. This helps us correctly include libglusterfs includes without namespace conflicts. Change-Id: Id2a98854e671a7ee5d73be44da5ba1a74252423b Updates: bz#1193929 Signed-off-by: ShyamsundarR <srangana@redhat.com>
* protocol: remove the option 'verify-volfile-checksum'Amar Tumballi2018-11-051-84/+1
| | | | | | | | | | | | | | | | 'getspec' operation is not used between 'client' and 'server' ever since we have off-loaded volfile management to glusterd, ie, at least 7 years. No reason to keep the dead code! The removed option had no meaning, as glusterd didn't provide a way to set (or unset) this option. So, no regression should be observed from any of the existing glusterfs deployment, supported or unsupported. Updates: CVE-2018-14653 Updates: bz#1644756 Change-Id: I4a2e0f673c5bcd4644976a61dbd2d37003a428eb Signed-off-by: Amar Tumballi <amarts@redhat.com>
* Land part 2 of clang-format changesGluster Ant2018-09-121-1395/+1414
| | | | | Change-Id: Ia84cc24c8924e6d22d02ac15f611c10e26db99b4 Signed-off-by: Nigel Babu <nigelb@redhat.com>
* All: run codespell on the code and fix issues.Yaniv Kaul2018-07-221-1/+1
| | | | | | | | | | | | Please review, it's not always just the comments that were fixed. I've had to revert of course all calls to creat() that were changed to create() ... Only compile-tested! Change-Id: I7d02e82d9766e272a7fd9cc68e51901d69e5aab5 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* client: remove the "connecting" state - it's not usedMichael Adam2018-06-211-3/+0
| | | | | | | | | The "connecting" state is not used anywhere really. It's only being set and printed. So remove it. Change-Id: I11fc8b0bdcda5a812d065543aa447d39957d3b38 fixes: bz#1583583 Signed-off-by: Michael Adam <obnox@samba.org>
* rpc/clnt: Don't let consumers manage "connected" stateRaghavendra G2018-06-041-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The state management of "connected" in rpc is ad-hoc as far as the responsibility goes. Note that there is nothing wrong with functionality itself. rpc layer manages this state in disconnect codepath and has exposed an api to manage this one from consumers. Note that rpc layer never sets "connected" to true by itself, which forces the consumers to use this api to get a working rpc connection. The situation is best captured from a comment in code from Jeff Darcy in glusterfsd/src/gf-attach.c: -/* - * In a sane world, the generic RPC layer would be capable of tracking - * connection status by itself, with no help from us. It might invoke our - * callback if we had registered one, but only to provide information. Sadly, - * we don't live in that world. Instead, the callback *must* exist and *must* - * call rpc_clnt_{set,unset}_connected, because that's the only way those - * fields get set (with RPC both above and below us on the stack). If we don't - * do that, then rpc_clnt_submit doesn't think we're connected even when we - * are. It calls the socket code to reconnect, but the socket code tracks this - * stuff in a sane way so it knows we're connected and returns EINPROGRESS. - * Then we're stuck, connected but unable to use the connection. To make it - * work, we define and register this trivial callback. - */ Also, consumers of rpc know about state of connection only through the notifications sent by rpc-clnt. So, consumers don't have any extra information to manage the state and hence letting them manage the state is counter intuitive. This patch cleans that up and instead moves the responsibility of state management of rpc layer into itself. Change-Id: I31e641a60795fc480ca753917f4b2579f1e05094 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Fixes: bz#1585585
* protocol/client: fix memory corruptionXavi Hernandez2018-03-091-3/+6
| | | | | | | | | | | | | There was an issue when some accesses to saved_fds list were protected by the wrong mutex (lock instead of fd_lock). Additionally, the retrieval of fdctx from fd's context and any checks done on it have also been protected by fd_lock to avoid fdctx to become outdated just after retrieving it. Change-Id: If2910508bcb7d1ff23debb30291391f00903a6fe BUG: 1553129 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* protcol/client: Insert dummy clnt-lk-version to avoid upgrade failureAnoop C S2018-02-141-0/+9
| | | | | | | | | | | | | | | | | With https://review.gluster.org/#/c/12363/ being merged, we no longer send client's lk-version to server side and the corresponding check on server is also removed. But when clients are upgraded prior to servers, the check for lk-version at server side fails and is reported back to clients resulting in disconnection. Since we don't have lock-recovery (lk-version and grace-timeout) logic anymore in code base our best bet would be to add client's default lk-version i.e, 1, into the dictionary just to make server side check pass and continue with remaining SETVOLUME operations. Change-Id: I441b67bd271d1e9ba9a7c08703e651c7a6bd945b BUG: 1544699 Signed-off-by: Anoop C S <anoopcs@redhat.com>
* protocol: utilize the version 4 xdrAmar Tumballi2018-02-011-3/+261
| | | | | | | updates #384 Change-Id: Id80bf470988dbecc69779de9eb64088559cb1f6a Signed-off-by: Amar Tumballi <amarts@redhat.com>
* protocol: Remove lock recovery logic from client and serverAnoop C S2018-01-291-354/+11
| | | | | | Change-Id: I27f5e1e34fe3eac96c7dd88e90753fb5d3d14550 BUG: 1272030 Signed-off-by: Anoop C S <anoopcs@redhat.com>
* rpc/*: auth-header changesAmar Tumballi2018-01-171-10/+15
| | | | | | | | | | | | | | | | | Introduce another authentication header which can now send more data. This is useful because this data can be common for all the fops, and we don't need to change all the signatures. As part of this, made rpc-clnt.c little more modular to support multiple authentication structures. stack.h changes are placeholder for the ctime etc, can be moved later based on need. updates #384 Change-Id: I6111c13cfd2ec92e2b4e9295896bf62a8a33b2c7 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* protocol/client: reduce lock contentionZhang Huan2017-12-261-15/+16
| | | | | | | | | | | | Current use of a per-client mutex to protect fdctx introduces lock contentions when there are dozens of file operations active. Use finer grain spinlock to reduce contention, and put retrieving fdctx out of lock. Change-Id: Iea3e2eb481e76a5d73a582ba81529180c5b88248 BUG: 1519598 Signed-off-by: Zhang Huan <zhanghuan@open-fs.com>
* glusterfs: Use gcc builtin ATOMIC operator to increase/decreate refcount.Mohit Agrawal2017-12-121-12/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: In glusterfs code base we call mutex_lock/unlock to take reference/dereference for a object.Sometime it could be reason for lock contention also. Solution: There is no need to use mutex to increase/decrease ref counter, instead of using mutex use gcc builtin ATOMIC operation. Test: I have not observed yet how much performance gain after apply this patch specific to glusterfs but i have tested same with below small program(mutex and atomic both) and get good difference. static int numOuterLoops; static void * threadFunc(void *arg) { int j; for (j = 0; j < numOuterLoops; j++) { __atomic_add_fetch (&glob, 1,__ATOMIC_ACQ_REL); } return NULL; } int main(int argc, char *argv[]) { int opt, s, j; int numThreads; pthread_t *thread; int verbose; int64_t n = 0; if (argc < 2 ) { printf(" Please provide 2 args Num of threads && Outer Loop\n"); exit (-1); } numThreads = atoi(argv[1]); numOuterLoops = atoi (argv[2]); if (1) { printf("\tthreads: %d; outer loops: %d;\n", numThreads, numOuterLoops); } thread = calloc(numThreads, sizeof(pthread_t)); if (thread == NULL) { printf ("calloc error so exit\n"); exit (-1); } __atomic_store (&glob, &n, __ATOMIC_RELEASE); for (j = 0; j < numThreads; j++) { s = pthread_create(&thread[j], NULL, threadFunc, NULL); if (s != 0) { printf ("pthread_create failed so exit\n"); exit (-1); } } for (j = 0; j < numThreads; j++) { s = pthread_join(thread[j], NULL); if (s != 0) { printf ("pthread_join failed so exit\n"); exit (-1); } } printf("glob value is %ld\n",__atomic_load_n (&glob,__ATOMIC_RELAXED)); exit(0); } time ./thr_count 800 800000 threads: 800; outer loops: 800000; glob value is 640000000 real 1m10.288s user 0m57.269s sys 3m31.565s time ./thr_count_atomic 800 800000 threads: 800; outer loops: 800000; glob value is 640000000 real 0m20.313s user 1m20.558s sys 0m0.028 Change-Id: Ie5030a52ea264875e002e108dd4b207b15ab7cc7 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* rpc : Change the way client uuid is builtPoornima G2017-11-201-3/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Today the main users of client uuid are protocol layers, locks, leases. Protocol layers requires each client uuid to be unique, even across connects and disconnects. Locks and leases on the server side also use the same client uid which changes across file migrations. Which makes the graph switch and file migration tedious for locks and leases. file migration across bricks becomes difficult as client uuid for the same client, is different on the other brick. The exact set of issues exists for leases as well. Solution would be to introduce a constant in the client-uid string which the locks and leases can use to identify the owner client across bricks. Client uuid currently: %s(ctx uuid)-%s(protocol client name)-%d(graph id)%s(setvolume count/reconnect count) Proposed Client uuid: "CTX_ID:%s-GRAPH_ID:%d-PID:%d-HOST:%s-PC_NAME:%s-RECON_NO:%s" - CTX_ID: This is will be constant per client. - GRAPH_ID, PID, HOST, PC_NAME(protocol client name), RECON_NO(setvolume count) remains the same. Change-Id: Ia81d57a9693207cd325d7b26aee4593fcbd6482c BUG: 1369028 Signed-off-by: Susant Palai <spalai@redhat.com>
* rpc: bring a new protocol versionAmar Tumballi2017-11-071-0/+14
| | | | | | | | | | * xdr: add gfid to on wire format for fsetattr/rchecksum * as it is change in on wire XDR format, needed backward compatible RPC programs. Signed-off-by: Amar Tumballi <amarts@redhat.com> BUG: 827334 Change-Id: Id0a2da3632516dc1a5560dde2b151b2e5f0be8e5
* core: make gf_boolean_t a C99 bool instead of an enumJeff Darcy2017-11-031-1/+4
| | | | | | | | | | | | This reduces the space used from four bytes to one, and allows new code to use familiar C99 types/values interoperably with our old cruft. It does *not* change current declarations or code; that will be left for a separate - much larger - patch. Updates: #80 Change-Id: I5baedd17d3fb05b38f0d8b8bb9dd62824475842e Signed-off-by: Jeff Darcy <jdarcy@fb.com>
* protocol/client: handle the subdir handshake properly for add-brickAmar Tumballi2017-10-291-1/+9
| | | | | | | | | There should be different way we handle handshake in case of subdir mount for the first time, and in case of subsequent graph changes. Change-Id: I2a7ba836433bb0a0f4a861809e2bb0d7fbc4da54 BUG: 1505323 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* Infra to indentify processhari gowtham2017-08-161-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: currently we can't identify which process is running and how many instances of it are available. Fix: name the process when its spawned and send it to the server and save it in the client_t The processes that abide by this change from this patch are: 1) fuse mount, 2) rebalance, 3) selfheal, 4) tier, 5) quota, 6) snapshot, 7) brick. 8) gfapi (by default. gfapi.<processname> if processname is found) Note: fuse gets a process name as native-fuse-client by default. If the user gives a name for the fuse and spawns it, it will be of this type --process-name native-fuse-client.<name_specified>. This can be made use by the process like aux mount done by quota, geo-rep, etc by adding another option in the aux mount " -o process-name=gsync_mount" Updates: #178 Signed-off-by: hari gowtham <hgowtham@redhat.com> Change-Id: Ie4d02257216839338043737691753bab9a974d5e Reviewed-on: https://review.gluster.org/17957 Smoke: Gluster Build System <jenkins@build.gluster.org> Tested-by: hari gowtham <hari.gowtham005@gmail.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Aravinda VK <avishwan@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* glusterfsd: allow subdir mountAmar Tumballi2017-08-041-1/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes: 1. Take subdir mount option in client (mount.gluster / glusterfsd) 2. Pass the subdir mount to server-handshake (from client-handshake) 3. Handle subdir-mount dir's lookup in server-first-lookup and handle all fops resolution accordingly with proper gfid of subdir 4. Change the auth/addr module to handle the multiple subdir entries in option, and valid parsing. How to use the feature: `# mount -t glusterfs $hostname:/$volname/$subdir /$mount_point` Or `# mount -t glusterfs $hostname:/$volname -osubdir_mount=$subdir /$mount_point` Option can be set like: `# gluster volume set <volname> auth.allow "/subdir1(192.168.1.*),/(192.168.10.*),/subdir2(192.168.8.*)"` Updates #175 Change-Id: I7ea57f76ddbe6c3862cfe02e13f89e8a39719e11 Signed-off-by: Amar Tumballi <amarts@redhat.com> Reviewed-on: https://review.gluster.org/17141 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* Halo Replication feature for AFR translatorKevin Vigor2017-05-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Halo Geo-replication is a feature which allows Gluster or NFS clients to write locally to their region (as defined by a latency "halo" or threshold if you like), and have their writes asynchronously propagate from their origin to the rest of the cluster. Clients can also write synchronously to the cluster simply by specifying a halo-latency which is very large (e.g. 10seconds) which will include all bricks. In other words, it allows clients to decide at mount time if they desire synchronous or asynchronous IO into a cluster and the cluster can support both of these modes to any number of clients simultaneously. There are a few new volume options due to this feature: halo-shd-latency: The threshold below which self-heal daemons will consider children (bricks) connected. halo-nfsd-latency: The threshold below which NFS daemons will consider children (bricks) connected. halo-latency: The threshold below which all other clients will consider children (bricks) connected. halo-min-replicas: The minimum number of replicas which are to be enforced regardless of latency specified in the above 3 options. If the number of children falls below this threshold the next best (chosen by latency) shall be swapped in. New FUSE mount options: halo-latency & halo-min-replicas: As descripted above. This feature combined with multi-threaded SHD support (D1271745) results in some pretty cool geo-replication possibilities. Operational Notes: - Global consistency is gaurenteed for synchronous clients, this is provided by the existing entry-locking mechanism. - Asynchronous clients on the other hand and merely consistent to their region. Writes & deletes will be protected via entry-locks as usual preventing concurrent writes into files which are undergoing replication. Read operations on the other hand should never block. - Writes are allowed from _any_ region and propagated from the origin to all other regions. The take away from this is care should be taken to ensure multiple writers do not write the same files resulting in a gfid split-brain which will require resolution via split-brain policies (majority, mtime & size). Recommended method for preventing this is using the nfs-auth feature to define which region for each share has RW permissions, tiers not in the origin region should have RO perms. TODO: - Synchronous clients (including the SHD) should choose clients from their own region as preferred sources for reads. Most of the plumbing is in place for this via the child_latency array. - Better GFID split brain handling & better dent type split brain handling (i.e. create a trash can and move the offending files into it). - Tagging in addition to latency as a means of defining which children you wish to synchronously write to Test Plan: - The usual suspects, clang, gcc w/ address sanitizer & valgrind - Prove tests Reviewers: jackl, dph, cjh, meyering Reviewed By: meyering Subscribers: ethanr Differential Revision: https://phabricator.fb.com/D1272053 Tasks: 4117827 Change-Id: I694a9ab429722da538da171ec528406e77b5e6d1 BUG: 1428061 Signed-off-by: Kevin Vigor <kvigor@fb.com> Reviewed-on: http://review.gluster.org/16099 Reviewed-on: https://review.gluster.org/16177 Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* protocol/client: Fix double free of client fdctx destroyRavishankar N2017-02-131-22/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes the race between fd re-open code and fd release code, both of which free the fd context due to a race in certain variable checks as explained below: 1. client process (shd in the case of this BZ) sends an opendir to its children (client xlators) which send the fop to the bricks to get a valid fd. 2. Client xlator loses connection to the brick. fdctx->remotefd is -1 3. Client re-establishes connection. After handshake, it reopens the dir and sets fdctx->remotefd to a valid fd in client3_3_reopendir_cbk(). 4. Meanwhile, shd sends a fd unref after it is done with the opendir. This triggers a releasedir (since fd->refcount becomes 0). 5. client3_3_releasedir() sees that fdctx-->remotefd is a valid number (i.e not -1), sets fdctx->released=1 and calls client_fdctx_destroy() 6. As a continuation of step3, client_reopen_done() is called by client3_3_reopendir_cbk(), which sees that fdctx->released==1 and again calls client_fdctx_destroy(). Depending on when step-5 does GF_FREE(fdctx), we may crash at any place in step-6 in client3_3_reopendir_cbk() when it tries to access fdctx->{whatever}. Change-Id: Ia50873d11763e084e41d2a1f4d53715438e5e947 BUG: 1418629 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: https://review.gluster.org/16521 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* core: run many bricks within one glusterfsd processJeff Darcy2017-01-301-0/+5
| | | | | | | | | | | | | | | | | | | | | | | This patch adds support for multiple brick translator stacks running in a single brick server process. This reduces our per-brick memory usage by approximately 3x, and our appetite for TCP ports even more. It also creates potential to avoid process/thread thrashing, and to improve QoS by scheduling more carefully across the bricks, but realizing that potential will require further work. Multiplexing is controlled by the "cluster.brick-multiplex" global option. By default it's off, and bricks are started in separate processes as before. If multiplexing is enabled, then *compatible* bricks (mostly those with the same transport options) will be started in the same process. Change-Id: I45059454e51d6f4cbb29a4953359c09a408695cb BUG: 1385758 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: https://review.gluster.org/14763 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Add info on op-version for clients in vol status outputSamikshan Bairagya2017-01-121-0/+6
| | | | | | | | | | | | | | | | | | | | | Currently the `gluster volume status <VOLNAME|all> clients` command gives us the following information on clients: 1. Brick name 2. Client count for each brick 3. hostname:port for each client 4. Bytes read and written for each client There is no information regarding op-version for each client. This patch adds that to the output. Change-Id: Ib2ece93ab00c234162bb92b7c67a7d86f3350a8d BUG: 1409078 Signed-off-by: Samikshan Bairagya <samikshan@gmail.com> Reviewed-on: http://review.gluster.org/16303 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* socket: socket disconnect should wait for poller thread exitRajesh Joseph2016-12-211-2/+2
| | | | | | | | | | | | | | | | | | | | | When SSL is enabled or if "transport.socket.own-thread" option is set then socket_poller is run as different thread. Currently during disconnect or PARENT_DOWN scenario we don't wait for this thread to terminate. PARENT_DOWN will disconnect the socket layer and cleanup resources used by socket_poller. Therefore before disconnect we should wait for poller thread to exit. Change-Id: I71f984b47d260ffd979102f180a99a0bed29f0d6 BUG: 1404181 Signed-off-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-on: http://review.gluster.org/16141 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kaushal M <kaushal@redhat.com> Reviewed-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* protocol/client: Fix potential mem-leaksKrutika Dhananjay2016-12-161-0/+1
| | | | | | | | | | | | | | | Commit 93eaeb9c93be3232f24e840044d560f9f0e66f71 introduces leaks in INODELK callback where a dict is unserialized twice, leading to dict leaks. Change-Id: I219ccb2279f237ebc2e4fc366af4775a461929b8 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/16156 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* protocol/client (no 2): fix unused variable warnings/errorsKaleb S. KEITHLEY2016-09-051-2/+0
| | | | | | | | | | | | | | | | | | http://review.gluster.org/14085 fixes a/the "leak" - via the generated rpc/xdr headers - of pragmas that mask these warnings. However 14085 won't pass the smoke test until all the warnings are fixed. BUG: 1369124 Change-Id: I54055b3b1038374b4e21432da48fdaeca2938289 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/15339 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Anuradha Talur <atalur@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* protocol/server: Fix client/server compatibilityAvra Sengupta2016-06-281-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | The 3.8 client expects a child_up key from the server indicating the status of the server translators. This key is not being sent by the servers running older versions, thereby breaking compatibility. With this patch we are treating the absence of the said key as an indication that the server trying to connect to this client is running an older version and hence in such a case we are setting conf->child_up as _gf_true explicitly. This should suffice in emulating the older behavior. Due to the nature of this bug, requiring two version to be reproducible, there are no testcases added for the same. Change-Id: I29e0a5c63b55380dc9db8e42852d7e95b64a2b2e BUG: 1350327 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/14811 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.org>
* protocol client/server: Fix client-server handshakeAvra Sengupta2016-03-101-5/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Currently on a successful connection between protocol server and client, the protocol client initiates a CHILD_UP event in the client stack. At this point in time, only the connection between server and client is established, and there is no guarantee that the server side stack is ready to serve requests. It works fine now, as most server side translators are not dependent on any other factors, before being able to serve requests today and hence they are up by the time the client stack translators receive the CHILD_UP (initiated by client handshake). The gap here is exposed when certain server side translators like NSR-Server for example, have a couple of protocol clients as their child(connecting them to other bricks), and they can't really serve requests till a quorum of their children are up. Hence these translators should defer sending CHILD_UP till they have enough children up, and the same needs to be propagated to the client stack translators. Fix: Maintain a child_up variable in both the protocol client and protocol server translators. The protocol server should update this value based on the CHILD_UP and CHILD_DOWN events it receives from the translators below it. On receiving such an event it should forward that event to the client. The protocol client on receiving such an event should forward it up the client stack, thereby letting the client translators correctly know that the server is up and ready to serve. The clients connecting later(long after a server has initialized and processed it's CHILD_UP events), will receive a child_up status as part of the handshake, and based on the status of the server's child_up, can either propagate a CHILD_UP event or defer it. Change-Id: I0807141e62118d8de9d9cde57a53a607be44a0e0 BUG: 1312845 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/13549 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* protocol/client: Remove dead code from client_rpc_notifyAnoop C S2015-09-281-1/+1
| | | | | | | | | | | | | | | | | | | | Normally GF_EVENT_CHILD_UP is dispatched after client handshake. But we have some dead code in client_rpc_notify which is assumed to do the same on receiving RPC_CLNT_CONNECT. This dispatch is based on a condition whether "disable-handshake" is enabled or not. Since we require client-handshake everytime we have a connect this check for "disable-handshake" is invalid and no longer required. Moreover this option is never handled in any of the translators. Change-Id: Ic862d6ac08cd3b18cf231f50140cd00e84e52ca0 BUG: 1227667 Signed-off-by: Anoop C S <anoopcs@redhat.com> Reviewed-on: http://review.gluster.org/12170 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* protocol/client: Properly handle return value in clnt_release_reopen_fdAnoop C S2015-06-251-1/+0
| | | | | | | | | | | | | | On account of a lock reacquire failure [in clnt_release_reopen_fd()] the return value, on submitting the client request for release of reopened fd, is not honoured correctly. Change-Id: Iff11523b2cc6f284e806855f32a13d8c4432f1c6 BUG: 1227667 Signed-off-by: Anoop C S <achiraya@redhat.com> Reviewed-on: http://review.gluster.org/11088 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* protocol/client: Remove unused function clnt_mark_fd_bad()Anoop C S2015-06-221-10/+0
| | | | | | | | | | | | | clnt_mark_fd_bad() is no longer used to mark the fd bad. Instead we make use of client_mark_fd_bad() to do the same. Change-Id: I09af892d8c0c5d1cf853ff020e8596c53d9539c0 BUG: 1227667 Signed-off-by: Anoop C S <achiraya@redhat.com> Reviewed-on: http://review.gluster.org/11063 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
* protocol/client : removing duplicate printing in gf_msgManikandan Selvaganesh2015-06-201-12/+11
| | | | | | | | | | | | | | Since the 3rd and 5th argument of gf_msg framework prints the error string in case of strerror(), 5th argument is removed. Change-Id: Ib1794ea2d4cb5c46a39311f0afcfd7e494540506 BUG: 1194640 Signed-off-by: Manikandan Selvaganesh <mselvaga@redhat.com> Reviewed-on: http://review.gluster.org/11280 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* protocol/client : porting log messages to new frameworkManikandan Selvaganesh2015-06-151-141/+189
| | | | | | | | | | Change-Id: I9bf2ca08fef969e566a64475d0f7a16d37e66eeb BUG: 1194640 Signed-off-by: Manikandan Selvaganesh <mselvaga@redhat.com> Reviewed-on: http://review.gluster.org/10042 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* build: do not #include "config.h" in each fileNiels de Vos2015-05-291-5/+0
| | | | | | | | | | | | | | | | | | Instead of including config.h in each file, and have the additional config.h included from the compiler commandline (-include option). When a .c file tests for a certain #define, and config.h was not included, incorrect assumtions were made. With this change, it can not happen again. BUG: 1222319 Change-Id: I4f9097b8740b81ecfe8b218d52ca50361f74cb64 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/10808 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* Avoid conflict between contrib/uuid and system uuidEmmanuel Dreyfus2015-04-041-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | glusterfs relies on Linux uuid implementation, which API is incompatible with most other systems's uuid. As a result, libglusterfs has to embed contrib/uuid, which is the Linux implementation, on non Linux systems. This implementation is incompatible with systtem's built in, but the symbols have the same names. Usually this is not a problem because when we link with -lglusterfs, libc's symbols are trumped. However there is a problem when a program not linked with -lglusterfs will dlopen() glusterfs component. In such a case, libc's uuid implementation is already loaded in the calling program, and it will be used instead of libglusterfs's implementation, causing crashes. A possible workaround is to use pre-load libglusterfs in the calling program (using LD_PRELOAD on NetBSD for instance), but such a mechanism is not portable, nor is it flexible. A much better approach is to rename libglusterfs's uuid_* functions to gf_uuid_* to avoid any possible conflict. This is what this change attempts. BUG: 1206587 Change-Id: I9ccd3e13afed1c7fc18508e92c7beb0f5d49f31a Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/10017 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* Xlators : Fixed typosManikandan Selvaganesh2015-04-021-1/+1
| | | | | | | | | | | Change-Id: I948f85cb369206ee8ce8b8cd5e48cae9adb971c9 BUG: 1075417 Signed-off-by: Manikandan Selvaganesh <mselvaga@redhat.com> Reviewed-on: http://review.gluster.org/9529 Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Humble Devassy Chirammal <humble.devassy@gmail.com>
* protocol-client: Removal of Dead Codearao2015-03-301-9/+0
| | | | | | | | | | | | | CID: 1124448 CID: 1124449 Removal of the dead code in the 'out' label. Change-Id: Ibdd05cbb6e2204f6aefdf442698225883c2d7734 BUG: 789278 Signed-off-by: arao <arao@redhat.com> Reviewed-on: http://review.gluster.org/9676 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: Change the subvolume encoding in d_off to be a "global"Dan Lambright2015-03-181-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | position in the graph rather than relative (local) to a particular translator. Encoding the volume in this way allows a single translator to manage which brick is currently being scanned for directory entries. Using a single translator minimizes allocated bits in the d_off. It also allows multiple DHT translators in the same graph to have a common frame of reference (the graph position) for which brick is being read. Multiple DHT translators are needed for the Tiering feature. The fix builds off a previous change (9332) which removed subvolume encoding from AFR. The fix makes an equivalent change to the EC translator. More background can be found in fix 9332 and gluster-dev discussions [1]. DHT and AFR/EC are responsibile (as before) for choosing which brick to enumerate directory entries in over the readdir lifecycle. The client translator receiving the readdir fop encodes the dht_t. It is referred to as the "leaf node" in the graph and corresponds to the brick being scanned. When DHT decodes the d_off, it translates the leaf node to a local subvolume, which represents the next node in the graph leading to the brick. Tracking of leaf nodes is done in common utility functions. Leaf nodes counts and positional information are updated on a graph switch. [1] www.gluster.org/pipermail/gluster-devel/2015-January/043592.html Change-Id: Iaf0ea86d7046b1ceadbad69d88707b243077ebc8 BUG: 1190734 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/9688 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* protocol/client: sequence CHILD_UP, CHILD_DOWN etc notificationsKrishnan Parthasarathi2015-02-071-8/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... from all bricks in the volume This patch is important in the context of MT epoll. With MT epoll, notification events from client xlators could reach cluster xlators like afr, dht, ec, stripe etc. in different orders. For e.g, In a distributed replicate volume of 2 bricks, namely Brick1 and Brick2, the following network events are observed by a mount process. - connection to Brick1 is broken. - connection to Brick1 has been restored. - connection to Brick2 is broken. - connection to Brick2 has been restored. Without establishing a total ordering of events, we can't guarantee that cluster xlators like afr, dht perceive them in the same order. While we would expect afr (say) to perceive it as only one of Brick1 and Brick2 going down at any given time, it is possible for the notification of Brick2 going offline to race with the notification of Brick1 coming back online. Change-Id: I78f5a52bfb05593335d0e9ad53ebfff98995593d BUG: 1104462 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/9591 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* rdma: client connection establishment takes more timeMohammed Rafi KC2014-11-181-16/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For rdma type only volume client connection establishment with server takes more than three seconds. Because for tcp,rdma type volume, will have 2 ports one for tcp and one for rdma, tcp port is stored with brickname and rdma port is stored as "brickname.rdma" during pamap_sighin. During the handshake when trying to get the brick port for rdma clients, since we are not aware of server transport type, we will append '.rdma' with brick name. So for tcp,rdma volume there will be an entry with '.rdma', but it will fail for rdma type only volume. So we will try again, this time without appending '.rdma' using a flag variable need_different_port, and it will succeed, but the reconnection happens only after 3 seconds. In this patch for rdma only type volume we will append '.rdma' during the pmap_signin. So during the handshake we will get the correct port for first try itself. Since we don't need to retry , we can remove the need_different_port flag variable. Change-Id: Ie8e3a7f532d4104829dbe995e99b35e95571466c BUG: 1153569 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/8934 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* rdma:rdma fuse mount hangs for tcp,rdma volumes if brick is down.Mohammed Rafi KC2014-11-171-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | When we try to mount a tcp,rdma volume as rdma transport using FUSE protocol, then mount will hang if the brick is down. When we kill a process, signal will be received in glusterfsd process and it will call pmap_signout with port listening on tcp only. In case of the tcp,rdma there will be two ports, and port which is listening for rdma will not called for sign out. So the mount process will try to connect to a port which is not open and it will keep trying to connect. This patch will call pmap_signout for rdma port also, So when mount tries to get the brick port,it will fail. Change-Id: I23676f65f96eb90b69b76478f7a21412a6aba70f BUG: 1143886 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/8762 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* glusterd: Ping timer implmentationKrishnan Parthasarathi2014-04-291-262/+0
| | | | | | | | | | | | | | | | | | | | | | | | This patch refactors the existing client ping timer implementation, and makes use of the common code for implementing both client ping timer and the glusterd ping timer. A new gluster rpc program for ping is introduced. The ping timer is only started for peers that have this new program. The deafult glusterd ping timeout is 30 seconds. It is configurable by setting the option 'ping-timeout' in glusterd.vol . Also, this patch introduces changes in the glusterd-handshake path. The client programs for a peer are now set in the callback of dump_versions, for both the older handshake and the newer op-version handshake. This is the only place in the handshake process where we know what programs a peer supports. Change-Id: I035815ac13449ca47080ecc3253c0a9afbe9016a BUG: 1038261 Signed-off-by: Vijaikumar M <vmallika@redhat.com> Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/5202 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* rpc: transport may be destroyed while rpc isn'tKrishnan Parthasarathi2014-03-051-1/+1
| | | | | | | | | | | | | | | | rpc_clnt object is destroyed after the corresponding transport object is destroyed. But rpc_clnt_reconnect, a timer driven function, refers to the transport object beyond its 'life'. Instead, using the embedded connection object prevents use after free problem wrt transport object. Also, access transport object under conn->lock. Change-Id: Iae28e8a657d02689963c510114ad7cb7e6764e62 BUG: 962619 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/6751 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* protocol/client: conn-id should be unique when lk-heal is offPranith Kumar K2014-02-171-8/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: It was observed that in some cases client disconnects and re-connects before server xlator could detect that a disconnect happened. So it still uses previous fdtable and ltable. But it can so happen that in between disconnect and re-connect an 'unlock' fop may fail because the fds are marked 'bad' in client xlator upon disconnect. Due to this stale locks remain on the brick which lead to hangs/self-heals not happening etc. For the exact bug RCA please look at https://bugzilla.redhat.com/show_bug.cgi?id=1049932#c0 Fix: When lk-heal is not enabled make sure connection-id is different for every setvolume. This will make sure that a previous connection's resources are not re-used in server xlator. Change-Id: Id844aaa76dfcf2740db72533bca53c23b2fe5549 BUG: 1049932 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/6669 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* protocol/client: handle network disconnect/reconnect properlyAnand Avati2013-12-031-0/+1
| | | | | | | | | | | | | if client/server state versions match, we still need to notify parent xlators of reconnection (CHILD_UP) because they were notified of CHILD_DOWN at the time of disconnection. Change-Id: I36c4bde6d8c3db9cb0c48eeb10663b56897c932e BUG: 1037267 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6396 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
* libglusterfs: Add monotonic clocking counter for timer threadHarshavardhana2013-10-151-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gettimeofday() returns the current wall clock time and timezone. Using these functions in order to measure the passage of time (how long an operation took) therefore seems like a no-brainer. This time suffer's from some limitations: a. They have a low resolution: “High-performance” timing by definition, requires clock resolutions into the microseconds or better. b. They can jump forwards and backwards in time: Computer clocks all tick at slightly different rates, which causes the time to drift. Most systems have NTP enabled which periodically adjusts the system clock to keep them in sync with “actual” time. The adjustment can cause the clock to suddenly jump forward (artificially inflating your timing numbers) or jump backwards (causing your timing calculations to go negative or hugely positive). In such cases timer thread could go into an infinite loop. From 'man gettimeofday': ---------- .. .. The time returned by gettimeofday() is affected by discontinuous jumps in the system time (e.g., if the system administrator manually changes the system time). If you need a monotonically increasing clock, see clock_gettime(2). .. .. ---------- Rationale: For calculating interval timing for Timer thread, all that’s needed should be clock as a simple counter that increments at a stable rate. This is necessary to avoid the jumps which are caused by using "wall time", this counter must be monotonic that can never “tick” backwards, ever. Change-Id: I701d31e71a85a73d21a6c5cd15583e7a5a645eeb BUG: 1017993 Signed-off-by: Harshavardhana <harsha@harshavardhana.net> Reviewed-on: http://review.gluster.org/6070 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* protocol/client: Prevent excessive logging of client's "disconnect" messages.Venkatesh Somyajulu2013-05-281-0/+1
| | | | | | | | | | | | | | | | | | | | | Problem: Currently when gluster volume start force is executed, client process will talk to glusterd to get the port of the brick. But if brick's path is not available it cannot return brick's port. So client process will keep connecting and disconnecting from glusterd for port-query which is ultimately responsible for execssive logging of disconnect messages. Fix: Message will be logged just once at INFO level after the first disconnect from glusterd. Afterwards "disconnect" messages will be logged in DEBUG mode. Change-Id: I2b787f3820b5da45e090c562e5698fcfe24a02cd BUG: 959969 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/4953 Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>