summaryrefslogtreecommitdiffstats
path: root/libglusterfs/src/xlator.c
Commit message (Collapse)AuthorAgeFilesLines
* graph/cleanup: Fix race in graph cleanupMohammed Rafi KC2019-09-051-0/+23
| | | | | | | | | | | | | | | | | We were unconditionally cleaning up the grap when we get child_down followed by parent_down. But this is prone to race condition when some of the bricks are already disconnected. In this case, even before the last child down is executed in the client xlator code,we might have freed the graph. Because the child_down event is alreadt recevied. To fix this race, we have introduced a check to see if all client xlator have cleared thier reconnect chain, and called the child_down for last time. Change-Id: I7d02813bc366dac733a836e0cd7b14a6fac52042 fixes: bz#1727329 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
* graph/shd: Use glusterfs_graph_deactivate to free the xl recMohammed Rafi KC2019-06-271-1/+8
| | | | | | | | | | | | We were using glusterfs_graph_fini to free the xl rec from glusterfs_process_volfp as well as glusterfs_graph_cleanup. Instead we can use glusterfs_graph_deactivate, which is does fini as well as other common rec free. Change-Id: Ie4a5f2771e5254aa5ed9f00c3672a6d2cc8e4bc1 Updates: bz#1716695 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
* xlator/log: Add more logging in xlator_is_cleanup_startingMohammed Rafi KC2019-06-081-3/+9
| | | | | | | | This patch will add two extra logs for invalid argument Change-Id: I3950b4f4b9d88b1f1e788ef93d8f09d4bd8d4d8b updates: bz#1703948 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
* ec/fini: Fix race with ec_fini and ec_notifyMohammed Rafi KC2019-05-211-0/+21
| | | | | | | | | | | | | | | During a graph cleanup, we first sent a PARENT_DOWN and wait for a child down to ultimately free the xlator and the graph. In the ec xlator, we cleanup the threads when we get a PARENT_DOWN event. But a racing event like CHILD_UP or event xl_op may trigger healing threads after threads cleanup. So there is a chance that the threads might access a freed private variabe Change-Id: I252d10181bb67b95900c903d479de707a8489532 fixes: bz#1703948 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
* core: handle memory accounting correctlyXavi Hernandez2019-04-221-8/+15
| | | | | | | | | | | | | | | | | | | | When a translator stops, memory accounting for that translator is not destroyed (because there could remain memory allocated that references it), but mutexes that coordinate updates of memory accounting were destroyed. This caused incorrect memory accounting and even crashes in debug mode. This patch also fixes some other things: * Reduce the number of atomic operations needed to manage memory accounting. * Correctly account memory when realloc() is used. * Merge two critical sections into one. * Cleaned the code a bit. Change-Id: Id5eaee7338729b9bc52c931815ca3ff1e5a7dcc8 Updates: bz#1659334 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* mgmt/shd: Implement multiplexing in self heal daemonMohammed Rafi KC2019-04-011-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Shd daemon is per node, which means they create a graph with all volumes on it. While this is a great for utilizing resources, it is so good in terms of performance and managebility. Because self-heal daemons doesn't have capability to automatically reconfigure their graphs. So each time when any configurations changes happens to the volumes(replicate/disperse), we need to restart shd to bring the changes into the graph. Because of this all on going heal for all other volumes has to be stopped in the middle, and need to restart all over again. Solution: This changes makes shd as a per volume daemon, so that the graph will be generated for each volumes. When we want to start/reconfigure shd for a volume, we first search for an existing shd running on the node, if there is none, we will start a new process. If already a daemon is running for shd, then we will simply detach a graph for a volume and reatach the updated graph for the volume. This won't touch any of the on going operations for any other volumes on the shd daemon. Example of an shd graph when it is per volume graph ----------------------- | debug-iostat | ----------------------- / | \ / | \ --------- --------- ---------- | AFR-1 | | AFR-2 | | AFR-3 | -------- --------- ---------- A running shd daemon with 3 volumes will be like--> graph ----------------------- | debug-iostat | ----------------------- / | \ / | \ ------------ ------------ ------------ | volume-1 | | volume-2 | | volume-3 | ------------ ------------ ------------ Change-Id: Idcb2698be3eeb95beaac47125565c93370afbd99 fixes: bz#1659708 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
* fuse: correctly handle setxattr valuesXavi Hernandez2019-02-071-3/+25
| | | | | | | | | | The setxattr function receives a pointer to raw data, which may not be null-terminated. When this data needs to be interpreted as a string, an explicit null termination needs to be added before using the value. Change-Id: Id110f9b215b22786da5782adec9449ce38d0d563 updates: bz#1193929 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* core: heketi-cli is throwing error "target is busy"Mohit Agrawal2019-01-311-1/+0
| | | | | | | | | | | | | | | | | | | | Problem: When rpc-transport-disconnect happens, server_connection_cleanup_flush_cbk() is supposed to call rpc_transport_unref() after open-files on that transport are flushed per transport.But open-fd-count is maintained in bound_xl->fd_count, which can be incremented/decremented cumulatively in server_connection_cleanup() by all transport disconnect paths. So instead of rpc_transport_unref() happening per transport, it ends up doing it only once after all the files on all the transports for the brick are flushed leading to rpc-leaks. Solution: To avoid races maintain fd_cnt at client instead of maintaining on brick Credits: Pranith Kumar Karampuri Change-Id: I6e8ea37a61f82d9aefb227c5b3ab57a7a36850e6 fixes: bz#1668190 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* core: Resolve dict_leak at the time of destroying graphMohit Agrawal2019-01-141-1/+0
| | | | | | | | | | | | Problem: In gluster code some of the places it call's get_new_dict to create a dictionary without taking reference so at the time of dict_unref it has become a leak Solution: To resolve the same call dict_new instead of get_new_dict updates bz#1650403 Change-Id: I3ccbbf5af07079a4fa09aad2cd0458c8625b2f06 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* xlator: make 'xlator_api' mandatoryAmar Tumballi2018-12-131-138/+16
| | | | | | | | | | | | | | * Remove the options to load old symbol. * keep only 'xlator_api' symbol from being exported using xlator.sym * add xlator_api to all the xlators where its missing NOTE: This covers all the xlators which has at least a test case to validate its loading. If there is a translator, which doesn't have any test, then we should probably remove that from codebase. fixes: #164 Change-Id: Ibcdc8c9844cda6b4463d907a15813745d14c1ebb Signed-off-by: Amar Tumballi <amarts@redhat.com>
* [geo-rep]: Worker still ACTIVE after killing bricksMohit Agrawal2018-12-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | Problem: In changelog xlator after destroying listener it call's unlink to delete changelog socket file but socket file reference is not cleaned up from process memory Solution: 1) To cleanup reference completely from process memory serialize transport cleanup for changelog and then unlink socket file 2) Brick xlator will notify GF_EVENT_PARENT_DOWN to next xlator only after cleanup all xprts Test: To test the same run below steps 1) Setup some volume and enable brick mux 2) kill anyone brick with gf_attach 3) check changelog socket for specific to killed brick in lsof, it should cleanup completely fixes: bz#1600145 Change-Id: Iba06cbf77d8a87b34a60fce50f6d8c0d427fa491 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* copy_file_range support in GlusterFSRaghavendra Bhat2018-12-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * libglusterfs changes to add new fop * Fuse changes: - Changes in fuse bridge xlator to receive and send responses * posix changes to perform the op on the backend filesystem * protocol and rpc changes for sending and receiving the fop * gfapi changes for performing the fop * tools: glfs-copy-file-range tool for testing copy_file_range fop - Although, copy_file_range support has been added to the upstream fuse kernel module, no release has been made yet of a kernel which contains the support. It is expected to come in the upcoming release of linux-4.20 So, as of now, executing copy_file_range fop on a fused based filesystem results in fuse kernel module sending read on the source fd and write on the destination fd. Therefore a small gfapi based tool has been written to be able test the copy_file_range fop. This tool is similar (in functionality) to the example program given in copy_file_range man page. So, running regular copy_file_range on a fuse mount point and running gfapi based glfs-copy-file-range tool gives some idea about how fast, the copy_file_range (or reflink) can be. On the local machine this was the result obtained. mount -t glusterfs workstation:new /mnt/glusterfs [root@workstation ~]# cd /mnt/glusterfs/ [root@workstation glusterfs]# ls file [root@workstation glusterfs]# cd [root@workstation ~]# time /tmp/a.out /mnt/glusterfs/file /mnt/glusterfs/new real 0m6.495s user 0m0.000s sys 0m1.439s [root@workstation ~]# time glfs-copy-file-range $(hostname) new /tmp/glfs.log /file /rrr OPEN_SRC: opening /file is success OPEN_DST: opening /rrr is success FSTAT_SRC: fstat on /rrr is success copy_file_range successful real 0m0.309s user 0m0.039s sys 0m0.017s This tool needs following arguments 1) hostname 2) volume name 3) log file path 4) source file path (relative to the gluster volume root) 5) destination file path (relative to the gluster volume root) "glfs-copy-file-range <hostname> <volume> <log file path> <source> <destination>" - Added a testcase as well to run glfs-copy-file-range tool * io-stats changes to capture the fop for profiling * NOTE: - Added conditional check to see whether the copy_file_range syscall is available or not. If not, then return ENOSYS. - Added conditional check for kernel minor version in fuse_kernel.h and fuse-bridge while referring to copy_file_range. And the kernel minor version is kept as it is. i.e. 24. Increment it in future when there is a kernel release which contains the support for copy_file_range fop in fuse kernel module. * The document which contains a writeup on this enhancement can be found at https://docs.google.com/document/d/1BSILbXr_knynNwxSyyu503JoTz5QFM_4suNIh2WwrSc/edit Change-Id: I280069c814dd21ce6ec3be00a884fc24ab692367 updates: #536 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
* libglusterfs: Move devel headers under glusterfs directoryShyamsundarR2018-12-051-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | libglusterfs devel package headers are referenced in code using include semantics for a program, this while it works can be better especially when dealing with out of tree xlator builds or in general out of tree devel package usage. Towards this, the following changes are done, - moved all devel headers under a glusterfs directory - Included these headers using system header notation <> in all code outside of libglusterfs - Included these headers using own program notation "" within libglusterfs This change although big, is just moving around the headers and making it correct when including these headers from other sources. This helps us correctly include libglusterfs includes without namespace conflicts. Change-Id: Id2a98854e671a7ee5d73be44da5ba1a74252423b Updates: bz#1193929 Signed-off-by: ShyamsundarR <srangana@redhat.com>
* server: Resolve memory leak path in server_initMohit Agrawal2018-12-031-0/+31
| | | | | | | | | | | | | | Problem: 1) server_init does not cleanup allocate resources while it is failed before return error 2) dict leak at the time of graph destroying Solution: 1) free resources in case of server_init is failed 2) Take dict_ref of graph xlator before destroying the graph to avoid leak Change-Id: I9e31e156b9ed6bebe622745a8be0e470774e3d15 fixes: bz#1654917 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* xlator: add generic option parsing frameworkAmar Tumballi2018-11-021-15/+49
| | | | | | | | | | | | | | | | | | | | As an example, and also as an enhancement, added 'log-level' as a default option to every translator (glusterfs already support infrastructure to handle xl->loglevel). Corresponding infrastructure to add per xlator log-level is not present in glusterd volume-set. Plan is to get it sorted out in later patches or in GD2. * Why this is needed? - Mainly because we need to only add different log-level to some xlator to debug few things in a production system, while not changing overall log-level. This helps in better debug-ability. Updates: bz#1193929 Change-Id: Ia4098ce39197cd423345b3d31fe8315481681ab8 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* core: glusterfsd keeping fd open in index xlatorMohit Agrawal2018-10-121-13/+1
| | | | | | | | | | | | | | | | | | | | | | Problem: At the time of processing GF_EVENT_PARENT_DOWN at brick xlator, it forwards the event to next xlator only while xlator ensures no stub is in progress. At io-thread xlator it decreases stub_cnt before the process a stub and notify EVENT to next xlator Solution: Introduce a new counter to save stub_cnt and decrease the counter after process the stub completely at io-thread xlator. To avoid brick crash at the time of call xlator_mem_cleanup move only brick xlator if detach brick name has found in the graph Note: Thanks to pranith for sharing a simple reproducer to reproduce the same fixes bz#1637934 Change-Id: I1a694a001f7a5417e8771e3adf92c518969b6baa Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* core: glusterfsd keeping fd open in index xlatorMohit Agrawal2018-10-081-0/+2
| | | | | | | | | | | | | | Problem: Current resource cleanup sequence is not perfect while brick mux is enabled Solution: 1) Destroying xprt after cleanup all fd associated with a client 2) Before call fini for brick xlators ensure no stub should be running on a brick Change-Id: I86195785e428f57d3ef0da3e4061021fafacd435 fixes: bz#1631357 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* Land part 2 of clang-format changesGluster Ant2018-09-121-1142/+1128
| | | | | Change-Id: Ia84cc24c8924e6d22d02ac15f611c10e26db99b4 Signed-off-by: Nigel Babu <nigelb@redhat.com>
* classification: provide infra to start labelling features/componentsAmar Tumballi2018-09-041-0/+2
| | | | | | | | | | | | | `doc/xlator-classification.md` talks about the reasoning and expectations Reviewers are expected to check the 'category' of new option / translator added in the codebase, and make sure the flag is always properly set. It helps to keep the 'expectation' proper on the codebase. updates: #430 Change-Id: I2bfc9934a5f6eed77fcc3e20364046242decc82c Signed-off-by: Amar Tumballi <amarts@redhat.com>
* multiple files: remove unndeeded memset()Yaniv Kaul2018-08-291-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a squash of multiple commits: contrib/fuse-lib/misc.c: remove unneeded memset() All flock variables are properly set, no need to memset it. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I8e0512c5a88daadb0e587f545fdb9b32ca8858a2 libglusterfs/src/{client_t|fd|inode|stack}.c: remove some memset() I don't think there's a need for any of them. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I2be9ccc3a5cb5da51a92af73488cdabd1c527f59 libglusterfs/src/xlator.c: remove unneeded memset() All xl->mem_acct members are properly set, no need to memset it. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I7f264cd47e7a06255a3f3943c583de77ae8e3147 xlators/cluster/afr/src/afr-self-heal-common.c: remove unneeded memset() Since we are going over the whole array anyway, initialize it properly, to either 1 or 0. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: Ied4210388976b6a7a2e91cc3de334534d6fef201 xlators/cluster/dht/src/dht-common.c: remove unneeded memset() Since we are going over the whole array anyway it is initialized properly. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: Idc436d2bd0563b6582908d7cbebf9dbc66a42c9a xlators/cluster/ec/src/ec-helpers.c: remove unneeded memset() Since we are going over the whole array anyway it is initialized properly. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I81bf971f7fcecb4599e807d37f426f55711978fa xlators/mgmt/glusterd/src/glusterd-volgen.c: remove some memset() I don't think there's a need for any of them. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I476ea59ba53546b5153c269692cd5383da81ce2d xlators/mgmt/glusterd/src/glusterd-geo-rep.c: read() in 4K blocks The current 1K seems small. 4K is usually better (in Linux). Also remove a memset() that I don't think is needed between reads. Only compile-tested! Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I5fb7950c92d282948376db14919ad12e589eac2b xlators/storage/posix/src/posix-{gfid-path|inode-fd-ops}.c: remove memset() before sys_*xattr() functions. I don't see a reason to memset the array sent to the functions sys_llistxattr(), sys_lgetxattr(), sys_lgetxattr(), sys_flistxattr(), sys_fgetxattr(). (Note: it's unclear to me why we are calling sys_*txattr() functions with XATTR_VAL_BUF_SIZE-1 size instead of XATTR_VAL_BUF_SIZE ). Only compile-tested! Change-Id: Ief2103b56ba6c71e40ed343a93684eef6b771346 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* multiple files: move from strlen() to sizeof()Yaniv Kaul2018-08-291-1/+1
| | | | | | | | | | | | | | | {glusterfsd|glusterfsd-mgmt|quota-common-utils|xlator|tier|stripe}.c tools/setgfid2path/src/main.c xlators/cluster/afr/src/afr-inode-read.c {glusterfs-acl|glusterfs}.h For const strings, just do compile time size calc instead of runtime. Compile-tested only! Change-Id: I303684b1ff29b05c10126fb1057f507e404ced07 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* core: Update condition in get_xlator_by_name_or_typeMohit Agrawal2018-08-101-1/+1
| | | | | | | | | | | | | | | | Problem: Sometimes client connection is failed after throwing error "cleanup flag is set for xlator Try again later". The situation comes only after get a detach request but the brick stack is not completely detached and at the same time the client initiates a connection with brick Solution: To resolve the same check cleanup_starting flag in get xlator_by_name_or_type, this function call by server_setvolume to attach a client with brick. Change-Id: I3720e42642fe495dd05211e2ed2cb976db9e231b fixes: bz#1614124 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* server: fix unresolved symbols by moving them to libglusterfsMohit Agrawal2018-04-201-0/+101
| | | | | | | | | | | | | | | | Problem: glusterd2 build is failed due to undefined symbol (xlator_mem_cleanup , glusterfsd_ctx) in server.so Solution: To resolve the same done below two changes 1) Move xlator_mem_cleanup code from glusterfsd-mgmt.c to xlator.c to be part of libglusterfs.so 2) replace glusterfsd_ctx to this->ctx because symbol glusterfsd_ctx is not part of server.so BUG: 1544090 Change-Id: Ie5e6fba9ed458931d08eb0948d450aa962424ae5 fixes: bz#1544090 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* gluster: Sometimes Brick process is crashed at the time of stopping brickMohit Agrawal2018-04-191-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | Problem: Sometimes brick process is getting crashed at the time of stop brick while brick mux is enabled. Solution: Brick process was getting crashed because of rpc connection was not cleaning properly while brick mux is enabled.In this patch after sending GF_EVENT_CLEANUP notification to xlator(server) waits for all rpc client connection destroy for specific xlator.Once rpc connections are destroyed in server_rpc_notify for all associated client for that brick then call xlator_mem_cleanup for for brick xlator as well as all child xlators.To avoid races at the time of cleanup introduce two new flags at each xlator cleanup_starting, call_cleanup. BUG: 1544090 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Note: Run all test-cases in separate build (https://review.gluster.org/#/c/19700/) with same patch after enable brick mux forcefully, all test cases are passed. Change-Id: Ic4ab9c128df282d146cf1135640281fcb31997bf updates: bz#1544090
* core: provide infra to make any xlator pass-throughAmar Tumballi2018-03-091-3/+15
| | | | | | | updates: #304 Change-Id: If6a13d2e56b195390a386d720103a882e077f66c Signed-off-by: Amar Tumballi <amarts@redhat.com>
* metrics: set latency min value during xlator initAmar Tumballi2018-02-161-0/+7
| | | | | | | | | | | otherwise, the very first metrics will have all the min as 0. also no need to print pending-fops if it is 0. Updates #168 Change-Id: I233de6c92b1a73977bb468ba211ac6ec3c05298f Signed-off-by: Amar Tumballi <amarts@redhat.com>
* Use RTLD_LOCAL for symbol resolutionPrashanth Pai2017-12-271-2/+2
| | | | | | | | | | | | | RTLD_LOCAL is the default value for symbol visibility flag of dlopen() in Linux and NetBSD. Using it avoids conflicts during symbol resolution. This also allows us to detect xlators that have not been explicitly linked with libraries that they use. This used to go unnoticed when RTLD_GLOBAL was being used. BUG: 1193929 Change-Id: I50db6ea14ffdee96596060c4d6bf71cd3c432f7b Signed-off-by: Prashanth Pai <ppai@redhat.com>
* core/memacct: save allocs in mem_acct_rec listN Balachandran2017-12-071-0/+3
| | | | | | | | | | | | | | | With configure --enable-debug, add all object allocations to a list in the corresponding mem_acct_rec. This allows us to see all objects of a particular type and allows for additional debugging in case of memory leaks. This is not compiled in by default and must be explicitly enabled. It is intended to be used by developers. Change-Id: I7cf2dbeadecf994423d7e7591e85f18d2575cce8 BUG: 1522662 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* libglusterfs: specify ctx in gf_log_set_loglevelZhang Huan2017-12-061-1/+1
| | | | | | | | | specify ctx in gf_log_set_loglevel, instead of getting it from a thread specific variable. Change-Id: I498f826e8e32231235a6b0005026a27c327727fd BUG: 1521213 Signed-off-by: Zhang Huan <zhanghuan@open-fs.com>
* metrics: provide options to dump metrics from xlatorsAmar Tumballi2017-12-061-0/+6
| | | | | | | | | | * Introduce xlator methods to allow dumping of metrics * Separate options to get the metrics dumped in a path Updates #168 Change-Id: I7df80df33b71d6f449f03c2332665b4a45f6ddf2 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* rio/everywhere: add icreate/namelink fopSusant Palai2017-12-051-0/+2
| | | | | | | | | | | | | | | | | | | | | icreate creates inode, while namelink links the basename to it's parent gfid. For now mkdir is the primary user of these fops. Better distribution is acheived by creating the inode on ,(say) mds1 and linking the basename to it's parent gfid on mds2. The inode serves readdirp, stat etc. More details about the fops are present at: https://review.gluster.org/#/c/13395/3/design/DHT2/DHT2_Icreate_Namelink_Notes.md This backport of three patches from experimental branch. 1- https://review.gluster.org/#/c/18085/ 2- https://review.gluster.org/#/c/18086/ 3- https://review.gluster.org/#/c/18094/ Updates gluster/glusterfs#243 Change-Id: I1bd3d5a441a3cfab1acfeb52f15c6c867d362592 Signed-off-by: Susant Palai <spalai@redhat.com>
* libglusterfs: Add put fopPoornima G2017-12-051-0/+1
| | | | | | | | | | | | | | | | | Problem: It had been a longtime request to implement put fop in gluster. put fop in gluster may not have the exact sementics of HTTP PUT, but can be easily extended to do so. The subsequent patches, will contain more semantics on the put fop and its guarentees. Why compound fop framework is not used for put? Compound fop framework currently doesn't allow compounding of entry fop and inode fops, i.e. fops on multiple inodes cannot be combined in compound fop. Updates #353 Change-Id: Idb7891b3e056d46d570bb7e31bad1b6a28656ada Signed-off-by: Poornima G <pgurusid@redhat.com>
* xlator: provide a xlator_api_t structure to include all exported optionsAmar Tumballi2017-11-301-37/+183
| | | | | | | | | | each translator from now on can have just 1 symbol exported called 'xlator_api', which has all the required fields in it. Updates: #164 Change-Id: I48d54f5ec59fee842b1d55877e3ac5e9ec9b6bdd Signed-off-by: Amar Tumballi <amarts@redhat.com>
* xlator: add more metrics per fopsAmar Tumballi2017-11-081-0/+12
| | | | | | | | | | | Make sure to handle these counters in STACK_WIND/UNWIND macro, and keep the counters as part of xlator_t structure itself, to provide infra to monitoring. Updates #137 Change-Id: Ib54d45e2321c2b095dac5810c37e6cdffe1f71b7 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* global options: add a sample option to handleAmar Tumballi2017-11-031-1/+14
| | | | | | | Fixes #303 Change-Id: Icdaa804711c43c65b9684f2649437aae1b5c1ed5 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* Add framework for global xlator in graphAmar Tumballi2017-11-031-1/+19
| | | | | | | Updates #303 Change-Id: Id0b9050c93ea87532dc80b4fda650c5663d285bd Signed-off-by: Amar Tumballi <amarts@redhat.com>
* mgtm/core : use sha hash function for volfile checkMohammed Rafi KC2017-07-101-0/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We are storing the entire volfile and using this to check volfile change. With brick multiplexing there will be lot of graphs per process which will increase the memory foot print of the process. So instead of storing the entire graph we could use sha256 and we can compare the hash to see whether volfile change happened or not. Also with Brick multiplexing, the direct comparison of vol file is not correct. There are two problems. Problem 1: We are currently storing one single graph (the last updated volfile) whereas, what we need is the entire graph with all atttached bricks. If we fix this issue, we have second problem Problem 2: With multiplexing we have a graph that contains multiple bricks. But what we are checking as part of the reconfigure is, comparing the entire graph with one single graph, which will always fail. Solution: We create list in glusterfs_ctx_t that stores sha256 hash of individual brick graphs. When a graph changes happens we compare the stored hash and the current hash. If the hash matches, then no need for reconfigure. Otherwise we first do the reconfigure and then update the hash. For now, gfapi has not changed this way. Meaning when gfapi volfile fetch or reconfigure happens, we still store the entire graph and compare, each memory. This is fine, because libgfapi will not load brick graphs. But changing the libgfapi will make the code similar in both glusterfsd-mgmt and api. Also it helps to reduce some memory. Change-Id: I9df917a771a52b95622ab8f63af34ec390163a77 BUG: 1467986 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: https://review.gluster.org/17709 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Amar Tumballi <amarts@redhat.com>
* nl-cache: In case of nameless operations do not cachePoornima G2017-05-221-0/+15
| | | | | | | | | | | | | | | | | | Issue: In nameless lookup/other fops, parent inode will be NULL, when we try to add the cache to the NULL inode, it causes a crash. Hence handle the scenario of nameless fops, and do not cache/serve the nameless fops. Change-Id: I3b90f882ac89e6aaf3419db89e6f890797f37700 BUG: 1451588 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: https://review.gluster.org/17316 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* dht: send lookup on old name inside rename with bname and pargfidSusant Palai2017-04-291-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | Inside rename, a lookup is done on the source name to make sure that the file is there. But we used to do a gfid based lookup and hence, even if the source name was renamed to a new name from some other client, lookup will be successful as server3_3_lookup will fetch the new path based on the gfid. So even if the source file does not exist any more rename will carry on, and as server3_3_link(destination is hashed to a different brick other than source cached scenario) also does gfid based resolve, it wont detect that the source name does not exist and hardlink creation will be successful (since gfid based resolve will get the new dentry). To solve this problem, do a name based lookup inside rename. So that rename will fail right away if the source does not exist. Change-Id: Ieba8bdd6675088dbf18de90ed4622df043d163bd BUG: 1412135 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: https://review.gluster.org/16375 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* xlator: do not call dlclose() when debuggingNiels de Vos2017-04-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Valgrind can not show the symbols if a .so after calling dlclose(). The unhelpful ??? in the output gets resolved properly with this change: ==25170== 344 bytes in 1 blocks are definitely lost in loss record 233 of 324 ==25170== at 0x4C29975: calloc (vg_replace_malloc.c:711) ==25170== by 0x52C7C0B: __gf_calloc (mem-pool.c:117) ==25170== by 0x12B0638A: ??? ==25170== by 0x528FCE6: __xlator_init (xlator.c:472) ==25170== by 0x528FE16: xlator_init (xlator.c:498) ==25170== by 0x52DA8D6: glusterfs_graph_init (graph.c:321) ==25170== by 0x52DB587: glusterfs_graph_activate (graph.c:695) ==25170== by 0x5046407: glfs_process_volfp (glfs-mgmt.c:79) ==25170== by 0x5043B9E: glfs_volumes_init (glfs.c:281) ==25170== by 0x5044FEC: glfs_init_common (glfs.c:986) ==25170== by 0x50451A7: glfs_init@@GFAPI_3.4.0 (glfs.c:1031) By not calling dlclose(), the dynamically loaded .so is still available upon program exit, and Valgrind is able to resolve the symbols. This will add an additional leak, so dlclose() is called for normal builds, but skipped when configuring with "./configure --enable-valgrind" or passing the "run-with-valgrind" xlator option. URL: http://valgrind.org/docs/manual/faq.html#faq.unhelpful Change-Id: I2044e21b1b8fcce32ad1a817fdd795218f967731 BUG: 1425623 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: https://review.gluster.org/16809 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Samikshan Bairagya <samikshan@gmail.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* libglusterfs: provide standardized atomic operationsNiels de Vos2017-04-051-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The current macros ATOMIC_INCREMENT() and ATOMIC_DECREMENT() expect a lock as first argument. There are at least two issues with this approach: 1. this lock is unused on architectures that have atomic operations 2. some structures use a single lock for multiple variables By defining a gf_atomic_t type, the unused lock can be removed, saving a few bytes on modern architectures. Because the gf_atomic_t type locates the lock for the variable (in case of older architectures), each variable is protected the same on all architectures. This makes the behaviour across all architectures more equal (per variable locking, by a gf_lock_t or compiler optimization). BUG: 1437037 Change-Id: Ic164892b06ea676e6a9566f8a98b7faf0efe76d6 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: https://review.gluster.org/16963 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* libglusterfs: fix serious leak of xlator_t structuresJeff Darcy2017-02-091-0/+1
| | | | | | | | | | | | | | | There's a lot of logic (and some long comments) around how to free these structures safely, but then we didn't do it. Now we do. Change-Id: I9731ae75c60e99cc43d33d0813a86912db97fd96 BUG: 1420571 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: https://review.gluster.org/16570 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Poornima G <pgurusid@redhat.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* core: run many bricks within one glusterfsd processJeff Darcy2017-01-301-0/+72
| | | | | | | | | | | | | | | | | | | | | | | This patch adds support for multiple brick translator stacks running in a single brick server process. This reduces our per-brick memory usage by approximately 3x, and our appetite for TCP ports even more. It also creates potential to avoid process/thread thrashing, and to improve QoS by scheduling more carefully across the bricks, but realizing that potential will require further work. Multiplexing is controlled by the "cluster.brick-multiplex" global option. By default it's off, and bricks are started in separate processes as before. If multiplexing is enabled, then *compatible* bricks (mostly those with the same transport options) will be started in the same process. Change-Id: I45059454e51d6f4cbb29a4953359c09a408695cb BUG: 1385758 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: https://review.gluster.org/14763 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* libglusterfs: serialize init/reconfigure callsJeff Darcy2017-01-051-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | These functions do not generally "expect" to be called more than once in parallel, and many are likely to misbehave in that case (one case in DHT already). Such parallel calls have not generally happened because there are only a few places where we call these functions, and those have been implicitly serialized until recently. However, recent changes in the epoll layer change that, as does brick multiplexing. Therefore, the serialization is now explicit at the init/reconfigure level. It would be sufficient to serialize calls to a particular translator's init and reconfigure functions, but that would require per-translator locks and a bit more complexity in maintaining/using them. Since there's no clear reason why we would need or want to support a higher level of parallelism, the simpler approach of a global lock should suffice. Change-Id: I26296c2826e91dc00b7f0c2061bcc2964ef90c4c BUG: 1399134 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/16030 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* features/shard: Fill loc.pargfid too for named lookups on individual shardsKrutika Dhananjay2016-11-081-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On a sharded volume when a brick is replaced while IO is going on, named lookup on individual shards as part of read/write was failing with ENOENT on the replaced brick, and as a result AFR initiated name heal in lookup callback. But since pargfid was empty (which is what this patch attempts to fix), the resolution of the shards by protocol/server used to fail and the following pattern of logs was seen: Brick-logs: [2016-11-08 07:41:49.387127] W [MSGID: 115009] [server-resolve.c:566:server_resolve] 0-rep-server: no resolution type for (null) (LOOKUP) [2016-11-08 07:41:49.387157] E [MSGID: 115050] [server-rpc-fops.c:156:server_lookup_cbk] 0-rep-server: 91833: LOOKUP(null) (00000000-0000-0000-0000-000000000000/16d47463-ece5-4b33-9c93-470be918c0f6.82) ==> (Invalid argument) [Invalid argument] Client-logs: [2016-11-08 07:41:27.497687] W [MSGID: 114031] [client-rpc-fops.c:2930:client3_3_lookup_cbk] 2-rep-client-0: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument] [2016-11-08 07:41:27.497755] W [MSGID: 114031] [client-rpc-fops.c:2930:client3_3_lookup_cbk] 2-rep-client-1: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument] [2016-11-08 07:41:27.498500] W [MSGID: 114031] [client-rpc-fops.c:2930:client3_3_lookup_cbk] 2-rep-client-2: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument] [2016-11-08 07:41:27.499680] E [MSGID: 133010] Also, this patch makes AFR by itself choose a non-NULL pargfid even if its ancestors fail to initialize all pargfid placeholders. Change-Id: I5f85b303ede135baaf92e87ec8e09941f5ded6c1 BUG: 1392445 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/15788 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org>
* core: add setactivelk () fopSusant Palai2016-05-011-0/+1
| | | | | | | | | | | Change-Id: Ic2ba77a1fdd27801a6e579e04e6c0dd93cd7127b BUG: 1326085 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/14011 Reviewed-by: Niels de Vos <ndevos@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* core: add getactivelk () fopSusant Palai2016-05-011-0/+1
| | | | | | | | | | | Change-Id: Ifd0ff278dcf43da064021f5c25e5dcd34347fcde BUG: 1326085 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/13970 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* core: add lease fopPoornima G2016-04-211-0/+1
| | | | | | | | | | | | | Change-Id: Ia27d66b1061b0377857827515590eb89b18515c9 BUG: 1319992 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/11596 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* core: add seek() FOPNiels de Vos2016-01-311-0/+1
| | | | | | | | | | | | | | Minimal infrastructure changes for the seek() FOP. This will provide SEEK_HOLE and SEEK_DATA functionalities. BUG: 1220173 Change-Id: I4b74fce8b0bad2f45291fd2c2b9e243c4f4a1aa9 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/11480 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* core : Use correct path in dlopen for socket.soAtin Mukherjee2015-11-191-1/+6
| | | | | | | | | | | | | | This patch fixes the path for socket.so file while loading the so dynamically. Also for config.memory-accounting & config.transport voltype is changed to glusterd to fix the warning message coming from xlator_volopt_dynload Change-Id: I0f7964814586f2018d4922b23c683f4e1eb3098e BUG: 1283485 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: http://review.gluster.org/12656 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>