path: root/libglusterfs/src/xlator.h
Commit message | Author | Age | Files | Lines
* server: Resolve memory leak path in server_init | Mohit Agrawal | 2018-12-03 | 1 | -0/+2
  Problem:
  1) server_init does not clean up allocated resources when it fails before returning an error
  2) dict leak at the time of graph destroy
  Solution:
  1) Free resources when server_init fails
  2) Take a dict_ref on the graph xlator before destroying the graph to avoid the leak
  Change-Id: I9e31e156b9ed6bebe622745a8be0e470774e3d15
  fixes: bz#1654917
  Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* xlator: add generic option parsing framework | Amar Tumballi | 2018-11-02 | 1 | -0/+4
  As an example, and also as an enhancement, added 'log-level' as a default option to every translator (glusterfs already supports the infrastructure to handle xl->loglevel).
  The corresponding infrastructure to add a per-xlator log-level is not present in glusterd volume-set. The plan is to get it sorted out in later patches or in GD2.
  * Why is this needed?
    - Mainly because we need to set a different log-level on only some xlators to debug a few things in a production system, without changing the overall log-level. This helps debuggability.
  Updates: bz#1193929
  Change-Id: Ia4098ce39197cd423345b3d31fe8315481681ab8
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
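A minimal sketch of the per-translator log-level idea described in this commit, assuming an override that falls back to the process-wide level when unset. All `demo_` names and the level values are hypothetical; only `xl->loglevel` is mentioned in the commit message, and this is not the actual glusterfs logging code.

```c
#include <stddef.h>

/* Hypothetical levels; DEMO_LOG_UNSET means "no per-translator override". */
enum demo_loglevel {
    DEMO_LOG_UNSET = 0,
    DEMO_LOG_ERROR,
    DEMO_LOG_WARNING,
    DEMO_LOG_INFO,
    DEMO_LOG_DEBUG
};

/* Hypothetical stand-in for a translator carrying its own log level. */
struct demo_xlator {
    const char *name;
    enum demo_loglevel loglevel;
};

/* Use the translator's own level when set, otherwise the global level. */
enum demo_loglevel
demo_effective_level(const struct demo_xlator *xl, enum demo_loglevel global)
{
    if (xl != NULL && xl->loglevel != DEMO_LOG_UNSET)
        return xl->loglevel;
    return global;
}

/* A message is emitted when its level is at or below the effective level. */
int
demo_should_log(const struct demo_xlator *xl, enum demo_loglevel global,
                enum demo_loglevel msg_level)
{
    return msg_level <= demo_effective_level(xl, global);
}
```

With this shape, raising one translator to debug level leaves every other translator at the global level, which is the production-debugging case the commit message motivates.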
* core: glusterfsd keeping fd open in index xlator | Mohit Agrawal | 2018-10-08 | 1 | -0/+9
  Problem: The current resource cleanup sequence is not perfect while brick mux is enabled.
  Solution:
  1) Destroy xprt after cleaning up all fds associated with a client
  2) Before calling fini for brick xlators, ensure no stub is running on the brick
  Change-Id: I86195785e428f57d3ef0da3e4061021fafacd435
  fixes: bz#1631357
  Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* Land clang-format changes | Gluster Ant | 2018-09-12 | 1 | -992/+807
  Change-Id: I6f5d8140a06f3c1b2d196849299f8d483028d33b
* classification: provide infra to start labelling features/components | Amar Tumballi | 2018-09-04 | 1 | -1/+5
  `doc/xlator-classification.md` talks about the reasoning and expectations.
  Reviewers are expected to check the 'category' of any new option / translator added to the codebase, and make sure the flag is always properly set. It helps to keep the 'expectation' proper on the codebase.
  updates: #430
  Change-Id: I2bfc9934a5f6eed77fcc3e20364046242decc82c
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
* fuse: add support for kernel writeback cache | Csaba Henk | 2018-05-04 | 1 | -0/+1
  - Added kernel-writeback-cache command line and xlator option for requesting utilisation of the writeback cache of the kernel in FUSE_INIT (see [1]).
  - Added attr-times-granularity command line and xlator option via which granularity of the {a,m,c}time in stat (attr) data that we support can be indicated to kernel. This is a means to avoid divergence of the attr times between kernel and userspace that could occur with writeback-cache, while still maintaining maximum time precision the FUSE server is capable of (see [2]).
  - Handling FATTR_CTIME flag in FUSE_SETATTR that indicates presence of ctime in setattr payload. Currently we cannot associate arbitrary ctimes to files on backend, so we just touch them to update their ctimes to current time. Having ctimes in setattr payload is also a side effect of writeback cache (see [3] and [4]).
  [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4d99ff8, "fuse: Turn writeback cache on"
  [2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e27c9d3, "fuse: fuse: add time_gran to INIT_OUT"
  [3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1e18bda, "fuse: add .write_inode"
  [4]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ab9e13f, "fuse: allow ctime flushing to userspace"
  Updates: #435
  Change-Id: Id174c8e0c815c4456c35f8c53e41a6a507d91855
  Signed-off-by: Csaba Henk <csaba@redhat.com>
* server: fix unresolved symbols by moving them to libglusterfs | Mohit Agrawal | 2018-04-20 | 1 | -0/+3
  Problem: The glusterd2 build fails due to undefined symbols (xlator_mem_cleanup, glusterfsd_ctx) in server.so.
  Solution: Made the two changes below to resolve this:
  1) Move the xlator_mem_cleanup code from glusterfsd-mgmt.c to xlator.c so that it is part of libglusterfs.so
  2) Replace glusterfsd_ctx with this->ctx, because the symbol glusterfsd_ctx is not part of server.so
  BUG: 1544090
  Change-Id: Ie5e6fba9ed458931d08eb0948d450aa962424ae5
  fixes: bz#1544090
  Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* gluster: Sometimes Brick process is crashed at the time of stopping brick | Mohit Agrawal | 2018-04-19 | 1 | -1/+10
  Problem: Sometimes the brick process crashes when a brick is stopped while brick mux is enabled.
  Solution: The brick process was crashing because the rpc connection was not being cleaned up properly while brick mux is enabled. In this patch, after sending the GF_EVENT_CLEANUP notification to the xlator (server), it waits for all rpc client connections of that specific xlator to be destroyed. Once the rpc connections of all clients associated with that brick are destroyed in server_rpc_notify, xlator_mem_cleanup is called for the brick xlator as well as all child xlators. To avoid races at the time of cleanup, two new flags are introduced in each xlator: cleanup_starting and call_cleanup.
  BUG: 1544090
  Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
  Note: Ran all test cases in a separate build (https://review.gluster.org/#/c/19700/) with the same patch after forcefully enabling brick mux; all test cases passed.
  Change-Id: Ic4ab9c128df282d146cf1135640281fcb31997bf
  updates: bz#1544090
* glusterd: volume inode/fd status broken with brick mux | hari gowtham | 2018-04-19 | 1 | -1/+2
  Problem: The values for inode/fd were populated from the ctx received from the server xlator. Without brickmux, every brick from a volume belonged to a single brick from the volume. So searching the server and populating it worked.
  With brickmux, a number of bricks can be confined to a single process. These bricks can be from different volumes too (if we use the max-bricks-per-process option). If they are from different volumes, using the server xlator to populate causes problems.
  Fix: Use the brick to validate and populate the inode/fd status.
  Signed-off-by: hari gowtham <hgowtham@redhat.com>
  Change-Id: I2543fa5397ea095f8338b518460037bba3dfdbfd
  fixes: bz#1566067
* cluster/afr: Make sure latency-arg is passed to afr | Pranith Kumar K | 2018-04-18 | 1 | -1/+1
  xlator_notify doesn't pass the extra arguments that come in the input function, so the XLATOR_NOTIFY macro should be used instead to pass the extra arguments to the function.
  BUG: 1567881
  fixes bz#1567881
  Change-Id: Ic15b6c446638cbacf3149693147a754219037c47
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cleanup: xlator_t structure's 'client_latency' variable is not used | Sven Fischer | 2018-03-19 | 1 | -8/+7
  - Removed unused struct member and its one-time usage.
  - Cleaned up wrong white space.
  The member 'client_latency' was not used otherwise since it was added by:
    commit 07cc8679cdf3b29680f4f105d0222da168d8bfc1
    Author: Kevin Vigor <kvigor@fb.com>
    Date: Tue Mar 21 08:23:25 2017 -0700
    Halo Replication feature for AFR translator
  Change-Id: Ibb0ea828d4090bbe8897f6af326b317884162a00
  BUG: 1495153
  Signed-off-by: Sven Fischer <sven@fischer-abc.de>
* core: provide infra to make any xlator pass-through | Amar Tumballi | 2018-03-09 | 1 | -0/+9
  updates: #304
  Change-Id: If6a13d2e56b195390a386d720103a882e077f66c
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
* libglusterfs: Fix volume_options_t struct | Kaushal M | 2018-03-02 | 1 | -0/+11
  The volume_options_t struct was modified and a new member was introduced in the middle of the struct. This caused GD2 to crash when it tried to read the volume options.
  The new member has been moved to the end of the struct to correct this. A note has also been added to notify developers on how to modify this struct and the xlator_api_t struct.
  Updates: gluster/glusterfs#302
  Change-Id: I2e9899ec10516be29c7e9d574da53be8ec17a99e
  Signed-off-by: Kaushal M <kaushal@redhat.com>
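The layout problem this commit describes can be illustrated with `offsetof`. This is a sketch under assumptions: the three structs below are hypothetical and much smaller than the real volume_options_t, and `tags` is an invented name for "the new member". An external reader built against the old layout keeps working only if the offsets of the existing members do not move.

```c
#include <stddef.h>

/* Layout an external consumer was compiled against (hypothetical). */
struct demo_opt_old {
    const char *key;
    const char *default_value;
    const char *description;
};

/* New member inserted in the middle: later members shift. */
struct demo_opt_inserted {
    const char *key;
    const char *tags; /* hypothetical new member */
    const char *default_value;
    const char *description;
};

/* New member appended at the end: existing members keep their offsets. */
struct demo_opt_appended {
    const char *key;
    const char *default_value;
    const char *description;
    const char *tags; /* hypothetical new member */
};

/* Returns 1 when the old members sit at the same offsets, 0 otherwise. */
int
demo_prefix_stable_inserted(void)
{
    return offsetof(struct demo_opt_inserted, default_value) ==
               offsetof(struct demo_opt_old, default_value) &&
           offsetof(struct demo_opt_inserted, description) ==
               offsetof(struct demo_opt_old, description);
}

int
demo_prefix_stable_appended(void)
{
    return offsetof(struct demo_opt_appended, key) ==
               offsetof(struct demo_opt_old, key) &&
           offsetof(struct demo_opt_appended, default_value) ==
               offsetof(struct demo_opt_old, default_value) &&
           offsetof(struct demo_opt_appended, description) ==
               offsetof(struct demo_opt_old, description);
}
```

A consumer that reads `default_value` at the old offset would read `tags` from the inserted layout, which is the kind of mismatch that can crash a reader such as GD2.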
* xlator.h: move options and other variables to the top of structure | Amar Tumballi | 2017-12-22 | 1 | -22/+22
  This helps external applications that want to consume xlator_api to read only the fields (and not the functions) using dlopen(): they can write smaller structures/objects and still achieve their requirements. One such example is the GD2 project.
  Updates #168
  Change-Id: I8737939c8c72f6572ee1514201e9f9f8e4f37b40
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
* metrics: provide options to dump metrics from xlators | Amar Tumballi | 2017-12-06 | 1 | -0/+5
  * Introduce xlator methods to allow dumping of metrics
  * Separate options to get the metrics dumped in a path
  Updates #168
  Change-Id: I7df80df33b71d6f449f03c2332665b4a45f6ddf2
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
* rio/everywhere: add icreate/namelink fop | Susant Palai | 2017-12-05 | 1 | -0/+20
  icreate creates the inode, while namelink links the basename to its parent gfid. For now mkdir is the primary user of these fops.
  Better distribution is achieved by creating the inode on (say) mds1 and linking the basename to its parent gfid on mds2. The inode serves readdirp, stat etc.
  More details about the fops are present at: https://review.gluster.org/#/c/13395/3/design/DHT2/DHT2_Icreate_Namelink_Notes.md
  This is a backport of three patches from the experimental branch.
  1- https://review.gluster.org/#/c/18085/
  2- https://review.gluster.org/#/c/18086/
  3- https://review.gluster.org/#/c/18094/
  Updates gluster/glusterfs#243
  Change-Id: I1bd3d5a441a3cfab1acfeb52f15c6c867d362592
  Signed-off-by: Susant Palai <spalai@redhat.com>
* libglusterfs: Add put fop | Poornima G | 2017-12-05 | 1 | -0/+14
  Problem: It has been a long-standing request to implement a put fop in gluster. The put fop in gluster may not have the exact semantics of HTTP PUT, but can be easily extended to do so. The subsequent patches will contain more semantics on the put fop and its guarantees.
  Why is the compound fop framework not used for put? The compound fop framework currently doesn't allow compounding of entry fops and inode fops, i.e. fops on multiple inodes cannot be combined in a compound fop.
  Updates #353
  Change-Id: Idb7891b3e056d46d570bb7e31bad1b6a28656ada
  Signed-off-by: Poornima G <pgurusid@redhat.com>
* xlator: provide a xlator_api_t structure to include all exported options | Amar Tumballi | 2017-11-30 | 1 | -0/+74
  Each translator from now on can have just 1 symbol exported, called 'xlator_api', which has all the required fields in it.
  Updates: #164
  Change-Id: I48d54f5ec59fee842b1d55877e3ac5e9ec9b6bdd
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
* Coverity Issue: PW.INCLUDE_RECURSION in several files | Girjesh Rajoria | 2017-11-09 | 1 | -2/+0
  Coverity ID: 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 423, 424, 425, 426, 427, 428, 429, 436, 437, 438, 439, 440, 441, 442, 443
  Issue: Event include_recursion
  Removed redundant, recursive includes from the files.
  Change-Id: I920776b1fa089a2d4917ca722d0075a9239911a7
  BUG: 789278
  Signed-off-by: Girjesh Rajoria <grajoria@redhat.com>
* xlator: add more metrics per fops | Amar Tumballi | 2017-11-08 | 1 | -0/+16
  Make sure to handle these counters in the STACK_WIND/UNWIND macros, and keep the counters as part of the xlator_t structure itself, to provide infra for monitoring.
  Updates #137
  Change-Id: Ib54d45e2321c2b095dac5810c37e6cdffe1f71b7
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
* stack: change gettimeofday() to clock_gettime() | Amar Tumballi | 2017-11-06 | 1 | -42/+62
  To achieve the above, the changes below were needed too.
  * more sanity in how 'frame->op' is assigned.
  * infra to have 'stats' as a separate section in the 'xlator_t' structure
  Updates #137
  Change-Id: I36679bf9577f3ed00a695b4e7d92870dcb3db8e1
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
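For readers unfamiliar with the API named in the subject, here is a small sketch of interval timing with POSIX `clock_gettime()`. The commit message does not say which clock id the patch uses; `CLOCK_MONOTONIC` is an assumption here, and the `demo_` helpers are hypothetical, not code from the patch.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdint.h>
#include <time.h>

/* Nanoseconds from *start to *end (end assumed not earlier than start). */
int64_t
demo_elapsed_ns(const struct timespec *start, const struct timespec *end)
{
    return (int64_t)(end->tv_sec - start->tv_sec) * 1000000000LL +
           (end->tv_nsec - start->tv_nsec);
}

/* Trivial function to time in the usage example. */
void
demo_noop(void)
{
}

/* Time one call of fn(); returns -1 if the clock cannot be read. */
int64_t
demo_time_call(void (*fn)(void))
{
    struct timespec begin, end;

    if (clock_gettime(CLOCK_MONOTONIC, &begin) != 0)
        return -1;
    fn();
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1;
    return demo_elapsed_ns(&begin, &end);
}
```

Unlike `gettimeofday()`, which reports wall-clock time in microseconds, `clock_gettime()` reports nanoseconds and, with a monotonic clock, is not affected by wall-clock adjustments between the two samples.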
* mgtm/core : use sha hash function for volfile check | Mohammed Rafi KC | 2017-07-10 | 1 | -2/+9
  We are storing the entire volfile and using this to check for volfile changes. With brick multiplexing there will be a lot of graphs per process, which will increase the memory footprint of the process. So instead of storing the entire graph we could use sha256, and compare the hash to see whether a volfile change happened or not.
  Also, with brick multiplexing, the direct comparison of the volfile is not correct. There are two problems.
  Problem 1: We are currently storing one single graph (the last updated volfile) whereas what we need is the entire graph with all attached bricks. If we fix this issue, we have a second problem.
  Problem 2: With multiplexing we have a graph that contains multiple bricks. But what we are checking as part of the reconfigure is comparing the entire graph with one single graph, which will always fail.
  Solution: We create a list in glusterfs_ctx_t that stores the sha256 hash of individual brick graphs. When a graph change happens we compare the stored hash and the current hash. If the hashes match, then there is no need for a reconfigure. Otherwise we first do the reconfigure and then update the hash.
  For now, gfapi has not been changed this way. Meaning when a gfapi volfile fetch or reconfigure happens, we still store the entire graph and compare it in memory. This is fine, because libgfapi will not load brick graphs. But changing libgfapi would make the code similar in both glusterfsd-mgmt and api. It would also help to reduce some memory.
  Change-Id: I9df917a771a52b95622ab8f63af34ec390163a77
  BUG: 1467986
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  Reviewed-on: https://review.gluster.org/17709
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
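The per-brick digest check described in the Solution paragraph can be sketched as follows. To stay self-contained this sketch swaps in a 64-bit FNV-1a hash where the commit uses sha256, and uses a fixed array where the commit uses a list in glusterfs_ctx_t; every `demo_` name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_BRICKS 8

/* One stored digest per brick graph (stand-in for the list in the ctx). */
struct demo_volfile_entry {
    char brick[64];
    uint64_t digest;
};

struct demo_ctx {
    struct demo_volfile_entry entries[DEMO_MAX_BRICKS];
    size_t count;
};

/* FNV-1a, used here only as a stand-in for sha256. */
uint64_t
demo_digest(const char *volfile)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    const unsigned char *p;

    for (p = (const unsigned char *)volfile; *p != '\0'; p++) {
        h ^= *p;
        h *= 0x100000001b3ULL;
    }
    return h;
}

/*
 * Returns 0 when the stored digest matches (no reconfigure needed),
 * 1 when the brick is new or its volfile changed (the caller would
 * reconfigure; the digest is recorded here), -1 when the table is full.
 */
int
demo_volfile_changed(struct demo_ctx *ctx, const char *brick,
                     const char *volfile)
{
    uint64_t d = demo_digest(volfile);
    size_t i;

    for (i = 0; i < ctx->count; i++) {
        if (strcmp(ctx->entries[i].brick, brick) == 0) {
            if (ctx->entries[i].digest == d)
                return 0;
            ctx->entries[i].digest = d;
            return 1;
        }
    }
    if (ctx->count == DEMO_MAX_BRICKS)
        return -1;
    strncpy(ctx->entries[ctx->count].brick, brick,
            sizeof(ctx->entries[ctx->count].brick) - 1);
    ctx->entries[ctx->count].brick[sizeof(ctx->entries[0].brick) - 1] = '\0';
    ctx->entries[ctx->count].digest = d;
    ctx->count++;
    return 1;
}
```

Storing a fixed-size digest per brick instead of the full volfile text is what keeps the memory cost flat as more bricks are attached to one process.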
* multiple: fix struct/typedef inconsistencies | Jeff Darcy | 2017-06-30 | 1 | -4/+4
  The most common pattern, both in our code and elsewhere, is this:
    struct _xyz { ... };
    typedef struct _xyz xyz_t;
  These exceptions - especially call_frame/call_stack - have been slowing down code navigation for years. By converging on a single pattern, navigating from xyz_t in code to the actual definition of struct _xyz (i.e. without having to visit the typedef first) might even be automatable.
  Change-Id: I0e5dd1f51f98e000173c62ef4ddc5b21d9ec44ed
  Signed-off-by: Jeff Darcy <jdarcy@fb.com>
  Reviewed-on: https://review.gluster.org/17650
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  Tested-by: Jeff Darcy <jeff@pl.atyp.us>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Reviewed-by: Niels de Vos <ndevos@redhat.com>
* nl-cache: In case of nameless operations do not cache | Poornima G | 2017-05-22 | 1 | -0/+1
  Issue: In nameless lookup/other fops, the parent inode will be NULL. When we try to add the cache to the NULL inode, it causes a crash. Hence handle the scenario of nameless fops, and do not cache/serve the nameless fops.
  Change-Id: I3b90f882ac89e6aaf3419db89e6f890797f37700
  BUG: 1451588
  Signed-off-by: Poornima G <pgurusid@redhat.com>
  Reviewed-on: https://review.gluster.org/17316
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* Halo Replication feature for AFR translator | Kevin Vigor | 2017-05-02 | 1 | -0/+1
  Summary: Halo Geo-replication is a feature which allows Gluster or NFS clients to write locally to their region (as defined by a latency "halo" or threshold if you like), and have their writes asynchronously propagate from their origin to the rest of the cluster. Clients can also write synchronously to the cluster simply by specifying a halo-latency which is very large (e.g. 10 seconds), which will include all bricks.
  In other words, it allows clients to decide at mount time if they desire synchronous or asynchronous IO into a cluster, and the cluster can support both of these modes to any number of clients simultaneously.
  There are a few new volume options due to this feature:
    halo-shd-latency: The threshold below which self-heal daemons will consider children (bricks) connected.
    halo-nfsd-latency: The threshold below which NFS daemons will consider children (bricks) connected.
    halo-latency: The threshold below which all other clients will consider children (bricks) connected.
    halo-min-replicas: The minimum number of replicas which are to be enforced regardless of latency specified in the above 3 options. If the number of children falls below this threshold the next best (chosen by latency) shall be swapped in.
  New FUSE mount options:
    halo-latency & halo-min-replicas: As described above.
  This feature combined with multi-threaded SHD support (D1271745) results in some pretty cool geo-replication possibilities.
  Operational Notes:
  - Global consistency is guaranteed for synchronous clients; this is provided by the existing entry-locking mechanism.
  - Asynchronous clients, on the other hand, are merely consistent to their region. Writes & deletes will be protected via entry-locks as usual, preventing concurrent writes into files which are undergoing replication. Read operations, on the other hand, should never block.
  - Writes are allowed from _any_ region and propagated from the origin to all other regions. The takeaway from this is that care should be taken to ensure multiple writers do not write the same files, resulting in a gfid split-brain which will require resolution via split-brain policies (majority, mtime & size). The recommended method for preventing this is using the nfs-auth feature to define which region for each share has RW permissions; tiers not in the origin region should have RO perms.
  TODO:
  - Synchronous clients (including the SHD) should choose clients from their own region as preferred sources for reads. Most of the plumbing is in place for this via the child_latency array.
  - Better GFID split brain handling & better dent type split brain handling (i.e. create a trash can and move the offending files into it).
  - Tagging in addition to latency as a means of defining which children you wish to synchronously write to
  Test Plan:
  - The usual suspects, clang, gcc w/ address sanitizer & valgrind
  - Prove tests
  Reviewers: jackl, dph, cjh, meyering
  Reviewed By: meyering
  Subscribers: ethanr
  Differential Revision: https://phabricator.fb.com/D1272053
  Tasks: 4117827
  Change-Id: I694a9ab429722da538da171ec528406e77b5e6d1
  BUG: 1428061
  Signed-off-by: Kevin Vigor <kvigor@fb.com>
  Reviewed-on: http://review.gluster.org/16099
  Reviewed-on: https://review.gluster.org/16177
  Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* libglusterfs: Fix a crash due to race between inode_ctx_set and inode_ref | Poornima G | 2017-02-19 | 1 | -0/+3
  Issue: Currently the inode ref count is guarded by inode_table->lock, and inode_ctx is guarded by inode->lock. With the new patch [1], inode_ref was modified to change the inode_ctx to track the ref count per xlator. Thus inode_ref, performed under inode_table->lock, is modifying inode_ctx, which has to be modified only under inode->lock.
  Solution: When an inode is created, an inode_ctx holder is allocated for all the xlators. Hence in the case of inode_ctx_set, instead of using the first free index in the inode ctx holder, we can have a predecided index for every xlator in the graph.
  Credits Pranith K <pkarampu@redhat.com>
  [1] http://review.gluster.org/13736
  Change-Id: I1bfe111c211fcc4fcd761bba01dc87c4c69b5170
  BUG: 1423373
  Signed-off-by: Poornima G <pgurusid@redhat.com>
  Reviewed-on: https://review.gluster.org/16622
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Niels de Vos <ndevos@redhat.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
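A rough sketch of the "predecided index" idea from the Solution paragraph, assuming each translator is given a fixed slot number when the graph is built. The struct layouts, the `xl_id` field name, and the `demo_` functions are hypothetical, and the locking the commit discusses is left out to keep the sketch short.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical translator with a slot number fixed at graph build time. */
struct demo_xlator {
    const char *name;
    int xl_id;
};

/* Hypothetical inode with one pre-allocated ctx slot per translator. */
struct demo_inode {
    int nslots;
    uint64_t *values;
    unsigned char *is_set;
};

struct demo_inode *
demo_inode_new(int xl_count)
{
    struct demo_inode *inode;

    if (xl_count <= 0)
        return NULL;
    inode = calloc(1, sizeof(*inode));
    if (inode == NULL)
        return NULL;
    inode->values = calloc((size_t)xl_count, sizeof(*inode->values));
    inode->is_set = calloc((size_t)xl_count, sizeof(*inode->is_set));
    if (inode->values == NULL || inode->is_set == NULL) {
        free(inode->values);
        free(inode->is_set);
        free(inode);
        return NULL;
    }
    inode->nslots = xl_count;
    return inode;
}

void
demo_inode_free(struct demo_inode *inode)
{
    if (inode == NULL)
        return;
    free(inode->values);
    free(inode->is_set);
    free(inode);
}

/* Store a value in the translator's own slot; no search for a free index. */
int
demo_inode_ctx_set(struct demo_inode *inode, const struct demo_xlator *xl,
                   uint64_t value)
{
    if (xl->xl_id < 0 || xl->xl_id >= inode->nslots)
        return -1;
    inode->values[xl->xl_id] = value;
    inode->is_set[xl->xl_id] = 1;
    return 0;
}

/* Fetch the value from the translator's own slot; -1 if never set. */
int
demo_inode_ctx_get(const struct demo_inode *inode,
                   const struct demo_xlator *xl, uint64_t *value)
{
    if (xl->xl_id < 0 || xl->xl_id >= inode->nslots ||
        !inode->is_set[xl->xl_id])
        return -1;
    *value = inode->values[xl->xl_id];
    return 0;
}
```

Because every translator always touches the same slot, two code paths no longer compete for "the first free index", which is the allocation step the race in the Issue paragraph depended on.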
* core: run many bricks within one glusterfsd process | Jeff Darcy | 2017-01-30 | 1 | -0/+7
  This patch adds support for multiple brick translator stacks running in a single brick server process. This reduces our per-brick memory usage by approximately 3x, and our appetite for TCP ports even more. It also creates potential to avoid process/thread thrashing, and to improve QoS by scheduling more carefully across the bricks, but realizing that potential will require further work.
  Multiplexing is controlled by the "cluster.brick-multiplex" global option. By default it's off, and bricks are started in separate processes as before. If multiplexing is enabled, then *compatible* bricks (mostly those with the same transport options) will be started in the same process.
  Change-Id: I45059454e51d6f4cbb29a4953359c09a408695cb
  BUG: 1385758
  Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-on: https://review.gluster.org/14763
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* libglusterfs: serialize init/reconfigure calls | Jeff Darcy | 2017-01-05 | 1 | -0/+3
  These functions do not generally "expect" to be called more than once in parallel, and many are likely to misbehave in that case (one case in DHT already). Such parallel calls have not generally happened because there are only a few places where we call these functions, and those have been implicitly serialized until recently. However, recent changes in the epoll layer change that, as does brick multiplexing. Therefore, the serialization is now explicit at the init/reconfigure level.
  It would be sufficient to serialize calls to a particular translator's init and reconfigure functions, but that would require per-translator locks and a bit more complexity in maintaining/using them. Since there's no clear reason why we would need or want to support a higher level of parallelism, the simpler approach of a global lock should suffice.
  Change-Id: I26296c2826e91dc00b7f0c2061bcc2964ef90c4c
  BUG: 1399134
  Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-on: http://review.gluster.org/16030
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
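The "simpler approach of a global lock" might look roughly like the sketch below, assuming one process-wide pthread mutex taken around every init and reconfigure call. The wrapper names, the struct, and the sample callbacks are hypothetical; the commit message does not show the actual implementation.

```c
#include <pthread.h>

/* One process-wide lock shared by all translators (hypothetical name). */
static pthread_mutex_t demo_init_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical translator with init/reconfigure callbacks. */
struct demo_xlator {
    int (*init)(struct demo_xlator *xl);
    int (*reconfigure)(struct demo_xlator *xl, const char *options);
    int init_calls;
    int reconfigure_calls;
};

/* Call xl->init() with the global lock held, so inits never overlap. */
int
demo_xlator_init(struct demo_xlator *xl)
{
    int ret;

    pthread_mutex_lock(&demo_init_lock);
    ret = xl->init(xl);
    pthread_mutex_unlock(&demo_init_lock);
    return ret;
}

/* Same lock for reconfigure, so it cannot overlap any init either. */
int
demo_xlator_reconfigure(struct demo_xlator *xl, const char *options)
{
    int ret;

    pthread_mutex_lock(&demo_init_lock);
    ret = xl->reconfigure(xl, options);
    pthread_mutex_unlock(&demo_init_lock);
    return ret;
}

/* Sample callbacks that are not safe to run in parallel on their own. */
int
demo_sample_init(struct demo_xlator *xl)
{
    xl->init_calls++;
    return 0;
}

int
demo_sample_reconfigure(struct demo_xlator *xl, const char *options)
{
    (void)options;
    xl->reconfigure_calls++;
    return 0;
}
```

The trade-off is the one the commit states: a single lock serializes unrelated translators too, but avoids per-translator lock bookkeeping.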
* performance/readdir-ahead: limit cache size | Raghavendra G | 2016-12-22 | 1 | -0/+6
  This patch introduces a new option called "rda-cache-limit", which is the maximum size the entire readdir-ahead cache can grow to. Since readdir-ahead holds a reference to the inode through dentries, this patch also accounts for memory stored by various xlators in inode contexts.
  Change-Id: I84cc0ca812f35e0f9041f8cc71effae53a9e7f99
  BUG: 1356960
  Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
  Reviewed-on: http://review.gluster.org/16137
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Poornima G <pgurusid@redhat.com>
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* features/shard: Fill loc.pargfid too for named lookups on individual shards | Krutika Dhananjay | 2016-11-08 | 1 | -0/+1
  On a sharded volume, when a brick is replaced while IO is going on, named lookup on individual shards as part of read/write was failing with ENOENT on the replaced brick, and as a result AFR initiated name heal in the lookup callback. But since pargfid was empty (which is what this patch attempts to fix), the resolution of the shards by protocol/server used to fail and the following pattern of logs was seen:
  Brick-logs:
  [2016-11-08 07:41:49.387127] W [MSGID: 115009] [server-resolve.c:566:server_resolve] 0-rep-server: no resolution type for (null) (LOOKUP)
  [2016-11-08 07:41:49.387157] E [MSGID: 115050] [server-rpc-fops.c:156:server_lookup_cbk] 0-rep-server: 91833: LOOKUP(null) (00000000-0000-0000-0000-000000000000/16d47463-ece5-4b33-9c93-470be918c0f6.82) ==> (Invalid argument) [Invalid argument]
  Client-logs:
  [2016-11-08 07:41:27.497687] W [MSGID: 114031] [client-rpc-fops.c:2930:client3_3_lookup_cbk] 2-rep-client-0: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
  [2016-11-08 07:41:27.497755] W [MSGID: 114031] [client-rpc-fops.c:2930:client3_3_lookup_cbk] 2-rep-client-1: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
  [2016-11-08 07:41:27.498500] W [MSGID: 114031] [client-rpc-fops.c:2930:client3_3_lookup_cbk] 2-rep-client-2: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
  [2016-11-08 07:41:27.499680] E [MSGID: 133010]
  Also, this patch makes AFR by itself choose a non-NULL pargfid even if its ancestors fail to initialize all pargfid placeholders.
  Change-Id: I5f85b303ede135baaf92e87ec8e09941f5ded6c1
  BUG: 1392445
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
  Reviewed-on: http://review.gluster.org/15788
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Ravishankar N <ravishankar@redhat.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Smoke: Gluster Build System <jenkins@build.gluster.org>
* core: add setactivelk () fop | Susant Palai | 2016-05-01 | 1 | -0/+11
  Change-Id: Ic2ba77a1fdd27801a6e579e04e6c0dd93cd7127b
  BUG: 1326085
  Signed-off-by: Susant Palai <spalai@redhat.com>
  Reviewed-on: http://review.gluster.org/14011
  Reviewed-by: Niels de Vos <ndevos@redhat.com>
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* core: add getactivelk () fop | Susant Palai | 2016-05-01 | 1 | -0/+12
  Change-Id: Ifd0ff278dcf43da064021f5c25e5dcd34347fcde
  BUG: 1326085
  Signed-off-by: Susant Palai <spalai@redhat.com>
  Reviewed-on: http://review.gluster.org/13970
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Niels de Vos <ndevos@redhat.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* performance/decompounder: Introducing decompounder xlator | Anuradha Talur | 2016-04-25 | 1 | -0/+8
  This xlator decompounds the compound fops received, and executes them serially.
  Change-Id: Ieddcec3c2983dd9ca7919ba9d7ecaa5192a5f489
  BUG: 1303829
  Signed-off-by: Anuradha Talur <atalur@redhat.com>
  Reviewed-on: http://review.gluster.org/13577
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* core: add lease fop | Poornima G | 2016-04-21 | 1 | -0/+10
  Change-Id: Ia27d66b1061b0377857827515590eb89b18515c9
  BUG: 1319992
  Signed-off-by: Poornima G <pgurusid@redhat.com>
  Reviewed-on: http://review.gluster.org/11596
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
  Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* core: add seek() FOP | Niels de Vos | 2016-01-31 | 1 | -0/+11
  Minimal infrastructure changes for the seek() FOP. This will provide SEEK_HOLE and SEEK_DATA functionalities.
  BUG: 1220173
  Change-Id: I4b74fce8b0bad2f45291fd2c2b9e243c4f4a1aa9
  Signed-off-by: Niels de Vos <ndevos@redhat.com>
  Reviewed-on: http://review.gluster.org/11480
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* xlators: add JSON FOP statistics dumps every N seconds | Richard Wareing | 2015-10-08 | 1 | -0/+1
  Summary:
  - Adds a thread to the io-stats translator which dumps out statistics every N seconds, where N is configurable by an option called "diagnostics.stats-dump-interval"
  - Thread cleanly starts/stops when the translator is unloaded
  - Updates macros to use "Atomic Builtins" (e.g. intel CPU extensions), which use memory barriers to update counters instead of locks. This should reduce overhead and prevent any deadlock bugs due to lock contention.
  Test Plan:
  - Test on development machine
  - Run prove -v tests/basic/stats-dump.t
  Change-Id: If071239d8fdc185e4e8fd527363cc042447a245d
  BUG: 1266476
  Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-on: http://review.gluster.org/12209
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Avra Sengupta <asengupt@redhat.com>
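As an illustration of updating counters with atomic builtins instead of a lock, here is a small sketch using the GCC/Clang `__sync_fetch_and_add` builtin. Which builtins the patch's macros actually use is not stated in the commit message, so that choice is an assumption, and the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical per-fop counters, updated without taking a lock. */
struct demo_fop_stats {
    uint64_t count;
    uint64_t bytes;
};

/* Record one operation; each counter is bumped atomically on its own. */
void
demo_stats_add(struct demo_fop_stats *stats, uint64_t nbytes)
{
    __sync_fetch_and_add(&stats->count, 1);
    __sync_fetch_and_add(&stats->bytes, nbytes);
}

/* Atomic reads: adding zero returns the current value. */
uint64_t
demo_stats_count(struct demo_fop_stats *stats)
{
    return __sync_fetch_and_add(&stats->count, 0);
}

uint64_t
demo_stats_bytes(struct demo_fop_stats *stats)
{
    return __sync_fetch_and_add(&stats->bytes, 0);
}
```

Note the limit of this approach: each counter is individually consistent, but a reader can observe `count` already incremented while `bytes` is not yet, which is usually acceptable for periodic statistics dumps.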
* build: do not #include "config.h" in each file | Niels de Vos | 2015-05-29 | 1 | -5/+0
  Instead of including config.h in each file, have it included from the compiler command line (-include option).
  When a .c file tests for a certain #define, and config.h was not included, incorrect assumptions were made. With this change, it cannot happen again.
  BUG: 1222319
  Change-Id: I4f9097b8740b81ecfe8b218d52ca50361f74cb64
  Signed-off-by: Niels de Vos <ndevos@redhat.com>
  Reviewed-on: http://review.gluster.org/10808
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: NetBSD Build System
  Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* core: use reference counting for mem_acct structures | Jeff Darcy | 2015-05-09 | 1 | -1/+1
  When freeing memory, our memory-accounting code expects to be able to dereference from the (previously) allocated block to its owning translator. However, as we have already found once in option validation and twice in logging, that translator might itself have been freed and the dereference attempt causes one of our daemons to crash with SIGSEGV. This patch attempts to fix that as follows:
  * We no longer embed a struct mem_acct directly in a struct xlator, but instead allocate it separately.
  * Allocated memory blocks now contain a pointer to the mem_acct instead of the xlator.
  * The mem_acct structure contains a reference count, manipulated in both the normal and translator allocate/free code using atomic increments and decrements.
  * Because it's now a separate structure, we can defer freeing the mem_acct until its reference count reaches zero (either way).
  * Some unit tests were disabled, because they embedded their own copies of the implementation for what they were supposedly testing. Life's too short to spend time fixing tests that seem designed to impede progress by requiring a certain implementation as well as behavior.
  Change-Id: Id929b11387927136f78626901729296b6c0d0fd7
  BUG: 1211749
  Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-on: http://review.gluster.org/10417
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Reviewed-by: Niels de Vos <ndevos@redhat.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
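The bullet points above can be sketched as a separately allocated, reference-counted accounting record that each allocated block points to. This is a simplified illustration with hypothetical `demo_` names and layouts, using GCC/Clang `__sync` builtins for the atomic increments and decrements; it is not the glusterfs mem_acct code.

```c
#include <stddef.h>
#include <stdlib.h>

/* Separately allocated accounting record (hypothetical layout). */
struct demo_mem_acct {
    int refcnt;
};

/* Header stored in front of every block, pointing at the record. */
struct demo_mem_header {
    struct demo_mem_acct *acct;
    size_t size;
};

/* Create a record holding one reference for its owner (the translator). */
struct demo_mem_acct *
demo_acct_new(void)
{
    struct demo_mem_acct *acct = calloc(1, sizeof(*acct));

    if (acct != NULL)
        acct->refcnt = 1;
    return acct;
}

/* Drop one reference; returns 1 if this call freed the record. */
int
demo_acct_unref(struct demo_mem_acct *acct)
{
    if (__sync_sub_and_fetch(&acct->refcnt, 1) == 0) {
        free(acct);
        return 1;
    }
    return 0;
}

/* Allocate a block; the block takes its own reference on the record. */
void *
demo_alloc(struct demo_mem_acct *acct, size_t size)
{
    struct demo_mem_header *hdr = malloc(sizeof(*hdr) + size);

    if (hdr == NULL)
        return NULL;
    hdr->acct = acct;
    hdr->size = size;
    __sync_add_and_fetch(&acct->refcnt, 1);
    return hdr + 1;
}

/* Free a block; returns 1 if the record was freed along with it. */
int
demo_free(void *ptr)
{
    struct demo_mem_header *hdr = (struct demo_mem_header *)ptr - 1;
    struct demo_mem_acct *acct = hdr->acct;

    free(hdr);
    return demo_acct_unref(acct);
}
```

In this sketch the owner can drop its reference first (as when a translator is destroyed) and a block freed later still finds a valid record, which is the "either way" ordering the fourth bullet refers to.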
* build: make contrib/uuid dependency optional (Niels de Vos, 2015-04-10, 1 file, -0/+1)
| | | | | | | | | | | | | | | | | | | On Linux systems we should use the libuuid from the distribution and not bundle and statically link the contrib/uuid/ bits. libglusterfs/src/compat-uuid.h has been introduced and should become an abstraction layer for different UUID APIs. Non-Linux operating systems should implement their compatibility layer there. Once all operating systems have an implementation in compat-uuid.h, we can remove contrib/uuid/ from the repository completely. Change-Id: I345e5357644be2521685e00358bb8c83c4ea0577 BUG: 1206587 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/10129 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/bit-rot: Implementation of bit-rot xlator (Venky Shankar, 2015-03-24, 1 file, -0/+3)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the "Signer" -- responsible for signing files with their checksums upon last file descriptor close (last release()). The event notification facility provided by the changelog xlator is made use of. Moreover, checksums are, as of now, the SHA256 hash of the object data, which is the only available hash at this point in time. Therefore, there is no special "what hash to use" type check, although it does not take much to add various hashing algorithms to sign objects with. Signatures are stored in extended attributes of the objects along with the type of hashing used to calculate the signature. This makes things future-proof when other hash types are added. The signature infrastructure is provided by bitrot stub: a little piece of code that sits over the POSIX xlator, providing interfaces to "get or set" an object's signature and its staleness. Since objects are signed upon receiving release() notification, pre-existing data which are "never" modified would never be signed. To counter this, an initial crawler thread is spawned. The crawler scans the entire brick for objects that are unsigned or "missed" signing due to the server going offline (node reboots, crashes, etc.) and triggers an explicit sign. This would also sign objects when bit-rot is enabled for a volume and/or after upgrade. Change-Id: I1d9a98bee6cad1c39c35c53c8fb0fc4bad2bf67b BUG: 1170075 Original-Author: Raghavendra Bhat <raghavendra@redhat.com> Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/9711 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* core: Add inode context merge callback (Venky Shankar, 2015-03-24, 1 file, -0/+4)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Certain translators may need to update the inode context of an already linked inode before unwinding the call to the client. Normally, such a case is encountered during parallel operations when a fresh inode is chosen at call (wind) time. In the callback path, one of the inodes is successfully linked in the inode table, and the other inodes are thrown away (with the inode pointers for these calls being pointed to the linked inode). Translators which have a strict dependency on the correct value in the inode context would get stale values in the inode context. This patch introduces a new callback which gives translators an opportunity to "patch" their respective inode contexts. Note that, as of now, this callback is only invoked during create()'s unwind path. Although this might need to be done for all dentry fops and lookup, let that be done as and when required (bitrot stub requires this *only* for create()). Change-Id: I6cd91c2af473c44d1511208060d3978e580c67a6 BUG: 1170075 Original-Author: Raghavendra Bhat <rabhat@redhat.com> Original-Author: Venky Shankar <vshankar@redhat.com> Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/9913 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: Change the subvolume encoding in d_off to be a "global" (Dan Lambright, 2015-03-18, 1 file, -0/+7)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | position in the graph rather than relative (local) to a particular translator. Encoding the volume in this way allows a single translator to manage which brick is currently being scanned for directory entries. Using a single translator minimizes allocated bits in the d_off. It also allows multiple DHT translators in the same graph to have a common frame of reference (the graph position) for which brick is being read. Multiple DHT translators are needed for the Tiering feature. The fix builds off a previous change (9332) which removed subvolume encoding from AFR. The fix makes an equivalent change to the EC translator. More background can be found in fix 9332 and gluster-dev discussions [1]. DHT and AFR/EC are responsible (as before) for choosing which brick to enumerate directory entries in over the readdir lifecycle. The client translator receiving the readdir fop encodes the dht_t. It is referred to as the "leaf node" in the graph and corresponds to the brick being scanned. When DHT decodes the d_off, it translates the leaf node to a local subvolume, which represents the next node in the graph leading to the brick. Tracking of leaf nodes is done in common utility functions. Leaf node counts and positional information are updated on a graph switch. [1] www.gluster.org/pipermail/gluster-devel/2015-January/043592.html Change-Id: Iaf0ea86d7046b1ceadbad69d88707b243077ebc8 BUG: 1190734 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/9688 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
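To make the idea of packing a global leaf position into d_off concrete, here is a rough sketch. The bit layout is an assumption made purely for illustration (leaf id in the low bits, sized from the leaf count), and `leaf_bits`, `doff_encode` and `doff_decode` are hypothetical helpers, not the utility functions the commit adds. It only shows why one encoding step, sized by the number of leaves in the graph, keeps the number of bits taken from the offset small.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bits needed to represent leaf ids 0 .. leaf_count-1. */
static unsigned leaf_bits(unsigned leaf_count)
{
    unsigned bits = 0;

    assert(leaf_count > 0);
    while (bits < 32 && (1u << bits) < leaf_count)
        bits++;
    return bits;
}

/* Pack a brick-local offset and a graph-wide leaf id into one 64-bit
 * value. Returns 0 on success, -1 if the id is out of range or the
 * offset does not fit in the remaining bits. */
int doff_encode(uint64_t off, unsigned leaf_id, unsigned leaf_count,
                uint64_t *out)
{
    unsigned bits = leaf_bits(leaf_count);

    if (leaf_id >= leaf_count)
        return -1;
    if (bits > 0 && (off >> (64 - bits)) != 0)
        return -1; /* the top bits are needed for the shift */
    *out = (off << bits) | leaf_id;
    return 0;
}

/* Split an encoded value back into the offset and the leaf id. */
void doff_decode(uint64_t enc, unsigned leaf_count, uint64_t *off,
                 unsigned *leaf_id)
{
    unsigned bits = leaf_bits(leaf_count);
    uint64_t mask = ((uint64_t)1 << bits) - 1;

    *leaf_id = (unsigned)(enc & mask);
    *off = enc >> bits;
}
```

In this sketch a translator that decodes the value would still need its own mapping from the leaf id to the local subvolume that leads to that brick, which is the translation step the commit message attributes to DHT.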
* Quota/marker : Support for inode quota (vmallika, 2015-03-17, 1 file, -0/+1)
| | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the only way to retrieve the number of files/objects in a directory or volume is to do a crawl of the entire directory/volume. This is expensive and is not scalable. The new mechanism proposes to store the count of objects/files as part of an extended attribute of a directory. Each directory's extended attribute value will indicate the number of files/objects present in a tree with the directory being considered as the root of the tree. Currently file usage is accounted in marker by doing multiple FOPs like setting and getting xattrs. Doing this with STACK WIND and UNWIND can be harder to debug as it involves multiple callbacks. In this code we are replacing the current mechanism with a syncop approach, as syncop code is much simpler to follow and helps us implement inode quota in an organized way. Change-Id: Ibf366fbe07037284e89a241ddaff7750fc8771b4 BUG: 1188636 Signed-off-by: vmallika <vmallika@redhat.com> Signed-off-by: Sachin Pandit <spandit@redhat.com> Reviewed-on: http://review.gluster.org/9567 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
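The per-directory object count described in the inode quota commit can be pictured with a small in-memory sketch. It is only an illustration of the accounting idea: `struct dir`, `dir_init`, `dir_add_object` and `dir_remove_object` are hypothetical, the `obj_count` field stands in for the extended attribute value, and nothing here reflects how marker or syncop actually perform the update.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory directory; obj_count stands in for the
 * per-directory extended attribute holding the subtree object count. */
struct dir {
    struct dir *parent; /* NULL for the volume root */
    uint64_t obj_count; /* objects in the tree rooted at this directory */
};

void dir_init(struct dir *d, struct dir *parent)
{
    assert(d != NULL);
    d->parent = parent;
    d->obj_count = 0;
}

/* Account for one new object created directly inside d: every ancestor
 * up to the root sees its subtree count grow by one. */
void dir_add_object(struct dir *d)
{
    for (; d != NULL; d = d->parent)
        d->obj_count++;
}

/* Reverse of dir_add_object, for an object removed from d. */
void dir_remove_object(struct dir *d)
{
    for (; d != NULL; d = d->parent) {
        assert(d->obj_count > 0);
        d->obj_count--;
    }
}
```

With counts maintained this way, reading the number of objects under any directory is a single lookup of that directory's value rather than a crawl of its subtree.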
* every/where: add GF_FOP_IPC for inter-translator communication (Jeff Darcy, 2015-03-17, 1 file, -0/+10)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Several features - e.g. encryption, erasure codes, or NSR - involve multiple cooperating translators which sometimes need a "private" means of communication amongst themselves. Historically we've used virtual or synthetic xattrs, but that's not very elegant and clutters up the getxattr/setxattr path which must also handle real xattr requests. This new fop should address that. The only argument is an int32_t "op" which should be recognized by the target translator. It is recommended that translators using this feature follow some convention regarding the ops that they define, to avoid conflicts. Using a hash of the target translator's type string as a base for a series of ops would probably be a good start. Any other information can be passed in both directions using xdata. The default behavior for this fop, as with any other, is to pass through to FIRST_CHILD. That makes use of this fop "transparent" to other translators that were written before it existed, but it also means that it only really works with pass-through translators. If a routing translator (such as DHT) or a fan-out translator (such as AFR) is involved, the IPC might not reach its intended destination unless those translators are modified to forward IPC fops along all paths. If an IPC gets all the way to storage/posix it is considered an error, much like an uncaught exception. We don't actually *do* anything in that case, but we do log it and send back an EOPNOTSUPP error. This makes the "unrecognized opcode" condition distinguishable from the "no IPC support" condition (which would yield an RPC error instead) so clients can probe for the presence of a handler for their own favorite opcode and either use that or use old-school xattrs depending on the result.
BUG: 1158628 Signed-off-by: Venky Shankar <vshankar@redhat.com> Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Change-Id: I84af1b17babe5b30ec03ecf027ae37d09b873968 Reviewed-on: http://review.gluster.org/8812 Reviewed-by: Vijay Bellur <vbellur@redhat.com>
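The op-numbering convention suggested in the GF_FOP_IPC commit (a hash of the target translator's type string as the base of a series of ops) and the fall-through to EOPNOTSUPP can be sketched as follows. Everything here is an assumption for illustration: the FNV-1a hash, the 256-op window (`IPC_OP_SLOTS`), `struct xl`, its `handles_ipc` flag, and the `ipc_op_base`, `ipc_op` and `ipc_dispatch` helpers are hypothetical and are not part of the xlator API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define IPC_OP_SLOTS 256 /* hypothetical: ops reserved per translator type */

/* Derive a base op code from a translator type string using FNV-1a.
 * The sign bit and the low 8 bits are cleared so that base + index
 * stays a non-negative int32_t. */
int32_t ipc_op_base(const char *type)
{
    uint32_t h = 2166136261u;

    assert(type != NULL);
    for (; *type != '\0'; type++) {
        h ^= (unsigned char)*type;
        h *= 16777619u;
    }
    return (int32_t)(h & 0x7fffff00u);
}

/* The idx-th op in the series belonging to a translator type. */
int32_t ipc_op(const char *type, unsigned idx)
{
    assert(idx < IPC_OP_SLOTS);
    return ipc_op_base(type) + (int32_t)idx;
}

/* Hypothetical, heavily simplified translator node. */
struct xl {
    const char *type;
    int handles_ipc;        /* non-zero if this node implements IPC ops */
    const struct xl *child; /* first (and only) child, NULL at the bottom */
};

/* Pass the op down the chain; a node consumes it only if the op lies in
 * its own window. Falling off the bottom yields -EOPNOTSUPP. */
int ipc_dispatch(const struct xl *xl, int32_t op)
{
    for (; xl != NULL; xl = xl->child) {
        int32_t base = ipc_op_base(xl->type);

        if (xl->handles_ipc && op >= base && op - base < IPC_OP_SLOTS)
            return 0;
    }
    return -EOPNOTSUPP;
}
```

Two type strings can still hash to the same base, so a convention like this reduces conflicts rather than ruling them out. A caller probing for support would treat `-EOPNOTSUPP` as "no handler for this op" and fall back to xattrs, as the commit message describes.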
* Use common loc-touchup in fuse/server/gfapi (Pranith Kumar K, 2015-03-08, 1 file, -0/+3)
| | | | | | | | | | | Change-Id: Id41fb29480bb6d22c34469339163da05b98c1a98 BUG: 1115907 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/8226 Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* libglusterfs: Add functions for xlator and graph cleanup. (Poornima G, 2015-03-02, 1 file, -1/+2)
| | | | | | | | | | Change-Id: If341e3c0a559aa5bbca9c1263a241c6592c59706 BUG: 1093594 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/9696 Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* protocol/server: reflect lru limit in inode table also (Raghavendra Bhat, 2014-06-13, 1 file, -0/+3)
| | | | | | | | | | | | | | | | | | | | | Upon reconfigure, when lru limit of the inode table is changed, the new value was just saved in the private structure of the protocol/server xlator and the inode table used to have the older values still. A brick start was required for the changes to get reflected. To handle it, traverse through the xlator tree and check whether a xlator is a bound_xl or not (if it is a bound_xl it would have its itable pointer set). If a xlator is a bound_xl, then get the inode table of that bound_xl and set its lru limit to new value given via cli. Also prune the inode table so that extra inodes are purged from the inode table. Change-Id: I6909be028c116adaa1d1a5108470015b5fc6f09d BUG: 1103756 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/7957 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
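The "set the new limit, then prune" behaviour described in the protocol/server commit can be sketched with a toy LRU list. This is not the inode table code: `struct itable`, `struct lru_node` and the `itable_*` functions are hypothetical, and treating a limit of 0 as "unlimited" is an assumption of this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, minimal LRU list standing in for an inode table. */
struct lru_node {
    struct lru_node *next;
};

struct itable {
    struct lru_node *oldest; /* next entry to be purged */
    struct lru_node *newest; /* most recently added entry */
    unsigned lru_size;
    unsigned lru_limit;      /* 0 is treated as "no limit" in this sketch */
};

void itable_init(struct itable *t, unsigned limit)
{
    assert(t != NULL);
    t->oldest = NULL;
    t->newest = NULL;
    t->lru_size = 0;
    t->lru_limit = limit;
}

/* Purge the oldest entries until the list fits the limit again.
 * Returns the number of entries purged. */
unsigned itable_prune(struct itable *t)
{
    unsigned purged = 0;

    while (t->lru_limit != 0 && t->lru_size > t->lru_limit) {
        struct lru_node *n = t->oldest;

        assert(n != NULL);
        t->oldest = n->next;
        if (t->oldest == NULL)
            t->newest = NULL;
        free(n);
        t->lru_size--;
        purged++;
    }
    return purged;
}

/* Append a new entry and enforce the current limit.
 * Returns 0 on success, -1 if allocation fails. */
int itable_lru_add(struct itable *t)
{
    struct lru_node *n = malloc(sizeof(*n));

    if (n == NULL)
        return -1;
    n->next = NULL;
    if (t->newest != NULL)
        t->newest->next = n;
    else
        t->oldest = n;
    t->newest = n;
    t->lru_size++;
    itable_prune(t);
    return 0;
}

/* Reconfigure path: store the new limit in the table itself and prune
 * immediately, so the change takes effect without a restart. */
unsigned itable_set_lru_limit(struct itable *t, unsigned limit)
{
    t->lru_limit = limit;
    return itable_prune(t);
}
```

The design point mirrored from the commit message is that the limit lives in the table and is enforced at the moment it changes, rather than being kept only in the server translator's private structure.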
* user servicable snapshots (Raghavendra Bhat, 2014-05-29, 1 file, -0/+1)
| | | | | | | | | | Change-Id: Idbf27dbe088e646a8ab81cedc5818413795895ea Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Signed-off-by: Anand Subramanian <anands@redhat.com> Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/7700 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/gfid-access: fix lookup on .gfid/<parent>/bname (Venky Shankar, 2014-01-27, 1 file, -0/+2)
| | | | | | | | | | | | | | In the gfid translator, lookup was not handling the case when the lookup is sent on .gfid/<parent>/bname. In this case, we swap the fake inode of the parent with the real inode in loc and send it downwards. Change-Id: I639ff1dce10ffc045da419e333d455e208b6a0f0 BUG: 1057881 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/6795 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* zerofill: Change the type of len argument of glfs_zerofill() to off_t (Bharata B Rao, 2013-11-14, 1 file, -1/+1)
| | | | | | | | | | | | | | glfs_zerofill() can potentially be called to zero out an entire file, and hence should allow a bigger value for the length parameter. Change-Id: I75f1d11af298915049a3f3a7cb3890a2d72fca63 BUG: 1028673 Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-on: http://review.gluster.org/6266 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: M. Mohan Kumar <mohan@in.ibm.com> Tested-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-by: Anand Avati <avati@redhat.com>