summaryrefslogtreecommitdiffstats
path: root/libglusterfs/src/glusterfs
Commit message (Collapse)AuthorAgeFilesLines
* libglusterfs: fix dict leakRavishankar N2020-09-071-0/+2
| | | | | | | | | | | | | | | Problem: gf_rev_dns_lookup_cached() allocated struct dnscache->dict if it was null but the freeing was left to the caller. Fix: Moved dict allocation and freeing into corresponding init and fini routines so that its easier for the caller to avoid such leaks. Updates: #1000 Change-Id: I90d6a6f85ca2dd4fe0ab461177aaa9ac9c1fbcf9 Signed-off-by: Ravishankar N <ravishankar@redhat.com> (cherry picked from commit 079f7a7d8a2bd85070c1da4dde2452ca82a1cdbb)
* posix: Implement a janitor thread to close fdMohit Agrawal2020-08-211-0/+7
| | | | | | | | | | | | | | | Problem: In the commit fb20713b380e1df8d7f9e9df96563be2f9144fd6 we use syntask to close fd but we have found the patch is reducing the performance Solution: Use janitor thread to close fd's and save the pfd ctx into ctx janitor list and also save the posix_xlator into pfd object to avoid the race condition during cleanup in brick_mux environment Change-Id: Ifb3d18a854b267333a3a9e39845bfefb83fbc092 Fixes: #1396 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> (cherry picked from commit 41b9616435cbdf671805856e487e373060c9455b)
* open-behind: rewrite of internal logicXavi Hernandez2020-06-292-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There was a critical flaw in the previous implementation of open-behind. When an open is done in the background, it's necessary to take a reference on the fd_t object because once we "fake" the open answer, the fd could be destroyed. However as long as there's a reference, the release function won't be called. So, if the application closes the file descriptor without having actually opened it, there will always remain at least 1 reference, causing a leak. To avoid this problem, the previous implementation didn't take a reference on the fd_t, so there were races where the fd could be destroyed while it was still in use. To fix this, I've implemented a new xlator cbk that gets called from fuse when the application closes a file descriptor. The whole logic of handling background opens have been simplified and it's more efficient now. Only if the fop needs to be delayed until an open completes, a stub is created. Otherwise no memory allocations are needed. Correctly handling the close request while the open is still pending has added a bit of complexity, but overall normal operation is simpler. Change-Id: I6376a5491368e0e1c283cc452849032636261592 Fixes: #1225 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* syncop: improve scaling and implement more toolsXavi Hernandez2020-05-301-9/+44
| | | | | | | | | | | | | | | | | | | | The current scaling of the syncop thread pool is not working properly and can leave some tasks in the run queue more time than necessary when the maximum number of threads is not reached. This patch provides a better scaling condition to react faster to pending work. Condition variables and sleep in the context of a synctask have also been implemented. Their purpose is to replace regular condition variables and sleeps that block synctask threads and prevent other tasks to be executed. The new features have been applied to several places in glusterd. Change-Id: Ic50b7c73c104f9e41f08101a357d30b95efccfbf Fixes: #1116 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* Posix: Optimize posix code to improve file creationMohit Agrawal2020-04-061-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Before executing a fop in POSIX xlator it builds an internal path based on GFID.To validate the path it call's (l)stat system call and while .glusterfs is heavily loaded kernel takes time to lookup inode and due to that performance drops Solution: In this patch we followed two ways to improve the performance. 1) Keep open fd specific to first level directory(gfid[0]) in .glusterfs, it would force to kernel keep the inodes from all those files in cache. In case of memory pressure kernel won't uncache first level inodes. We need to open 256 fd's per brick to access the entry faster. 2) Use at based call's to access relative path to reduce path based lookup time. Note: To verify the patch we have executed kernel untar 100 times on 6 different clients after enabling metadata group-cache and some other option.We were getting more than 20 percent improvement in kenel untar after applying the patch. Credits: Xavi Hernandez <xhernandez@redhat.com> Change-Id: I1643e6b01ed669b2bb148d02f4e6a8e08da45343 updates: #891 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* Posix: Use simple approach to close fdMohit Agrawal2020-03-202-11/+2
| | | | | | | | | | | | | | | Problem: posix_release(dir) functions add the fd's into a ctx->janitor_fds and janitor thread closes the fd's.In brick_mux environment it is difficult to handle race condition in janitor threads because brick spawns a single janitor thread for all bricks. Solution: Use synctask to execute posix_release(dir) functions instead of using background a thread to close fds. Credits: Pranith Karampuri <pkarampu@redhat.com> Change-Id: Iffb031f0695a7da83d5a2f6bac8863dad225317e Fixes: bz#1811631 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* utime: resolve an issue of permission denied logsAmar Tumballi2020-03-131-1/+2
| | | | | | | | | | | | In case where uid is not set to be 0, there are possible errors from acl xlator. So, set `uid = 0;` with pid indicating this is set from UTIME activity. The message "E [MSGID: 148002] [utime.c:146:gf_utime_set_mdata_setxattr_cbk] 0-dev_SNIP_data-utime: dict set of key for set-ctime-mdata failed [Permission denied]" repeated 2 times between [2019-12-19 21:27:55.042634] and [2019-12-19 21:27:55.047887] Change-Id: Ieadf329835a40a13ac0bf908dac776e66954466c Fixes: #832 Signed-off-by: Amar Tumballi <amar@kadalu.io>
* core: fix memory pool management racesXavi Hernandez2020-02-182-13/+18
| | | | | | | | | | | | | | | | | | | | | | Objects allocated from a per-thread memory pool keep a reference to it to be able to return the object to the pool when not used anymore. The object holding this reference can have a long life cycle that could survive a glfs_fini() call. This means that it's unsafe to destroy memory pools from glfs_fini(). Another side effect of destroying memory pools from glfs_fini() is that the TLS variable that points to one of those pools cannot be reset for all alive threads. This means that any attempt to allocate memory from those threads will access already free'd memory, which is very dangerous. To fix these issues, mem_pools_fini() doesn't destroy pool lists anymore. Only at process termination the pools are destroyed. Change-Id: Ib189a5510ab6bdac78983c6c65a022e9634b0965 Fixes: bz#1801684 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* libglusterfs-xlator: structure loggingyatipadia2020-02-141-1/+34
| | | | | | | | convert all gf_msg() to gf_smsg() Change-Id: Id542e05faadb8041b472a2298c71fe62730e65fc Updates: #657 Signed-off-by: yatipadia <ypadia@redhat.com>
* volgen: make thin-arbiter name unique in 'pending-xattr' optionAmar Tumballi2020-02-121-0/+1
| | | | | | | | | | Thin-arbiter module makes use of 'pending-xattr' name for the translator as the filename which gets created in thin-arbiter node. By making this unique, we can host single thin-arbiter node for multiple clusters. Updates: #763 Change-Id: Ib3c732e7e04e6dba229e71ae3e64f1f3cb6d794d Signed-off-by: Amar Tumballi <amar@kadalu.io>
* libglusterfs-options: structure loggingyatipadia2020-02-091-1/+15
| | | | | | | | convert all gf_msg() to gf_smsg() Change-Id: I8f1ff462b9c8012ed676c51450930a65ac403bf3 Updates: #657 Signed-off-by: yatipadia <ypadia@redhat.com>
* bitrot: Make number of signer threads configurableKotresh HR2020-02-071-0/+1
| | | | | | | | | | | | | The number of signing process threads (glfs_brpobj) is set to 4 by default. The recommendation is to set it to number of cores available. This patch makes it configurable as follows gluster vol bitrot <volname> signer-threads <count> fixes: bz#1797869 Change-Id: Ia883b3e5e34e0bc8d095243508d320c9c9c58adc Signed-off-by: Kotresh HR <khiremat@redhat.com>
* libglusterfs-event-epoll: structure loggingyatipadia2020-02-061-1/+36
| | | | | | | | convert gf_msg() to gf_smsg() Change-Id: Idf5bfc826b0c9f1a2674eea2a2e6164f30806b00 Updates: #657 Signed-off-by: yatipadia <ypadia@redhat.com>
* glusterfsd: structure loggingyatipadia2020-02-061-0/+9
| | | | | | | | convert gf_msg() to gf_smsg() Change-Id: I1cd6a5ac6f4361195d5d925efb2cc194045d0bba Updates: #657 Signed-off-by: yatip <ypadia@redhat.com>
* libglusterfs/common-utils: structure loggingyatipadia2020-02-041-1/+38
| | | | | | | | changes all the gf_msg() to gf_smgs() Change-Id: I3524bbd8f8070df2f2c888ea9b0fb7db1ee44453 Updates: #657 Signed-off-by: yatipadia <ypadia@redhat.com>
* dictionary: remove the 'extra_free' parameterYaniv Kaul2020-01-211-1/+0
| | | | | | | | | | | | | | | This parameter may have been used in the past, but is no longer needed. Removing it and the few locations it was actually referenced. This allows to remove an extra memdup as well, that was not needed in the 1st place in server_setvolume() and unserialize_rsp_direntp() functions. A followup separate patch will remove extra_stdfree parmeter from the dictionary structure. Change-Id: Ica0ff0a330672373aaa60e808b7e76ec489a0fe3 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* dict: use the free_pair's key presence or NULL as a sign of use.Yaniv Kaul2020-01-131-1/+0
| | | | | | | | | | | | | Instead of using a boolean parameter, we can use the key variable. If it's NULL, the pair is not used and can be used. Otherwise, it's in use - don't use. It saves this annoying boolean, which causes the compiler (or us explicitly) to pad with additional bytes the dict struct. Change-Id: I89f52db57f35b3ef8acf57b7de2cee37f5d18e06 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* afr: expose cluster.optimistic-change-log to CLI.Ravishankar N2020-01-071-0/+1
| | | | | | | | | | | This volume option was not made avaialble to `gluster volume set` CLI. Reported-by: epolakis(https://github.com/kinsu) in https://github.com/gluster/glusterfs/issues/781 fixes: bz#1787554 Change-Id: I7141bdd4e53ee99e22b354edde8d023bfc0b2cd7 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* posix-entry-ops.c: remove some tier related codeYaniv Kaul2020-01-011-1/+0
| | | | | | | | | | | Remove TIER_LINKFILE_GFID related code from posix Tier xlator was removed, but there are some code related to it scattered around in DHT and Posix xlators. Remove some of it. Change-Id: I3a878b31ed4a045ed419f936aa1d567ded1a273f updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* afr: make heal info locklessRavishankar N2019-12-121-0/+2
| | | | | | | | | | | | | | | | | | | Changes in locks xlator: Added support for per-domain inodelk count requests. Caller needs to set GLUSTERFS_MULTIPLE_DOM_LK_CNT_REQUESTS key in the dict and then set each key with name 'GLUSTERFS_INODELK_DOM_PREFIX:<domain name>'. In the response dict, the xlator will send the per domain count as values for each of these keys. Changes in AFR: Replaced afr_selfheal_locked_inspect() with afr_lockless_inspect(). Logic has been added to make the latter behave same as the former, thus not breaking the current heal info output behaviour. fixes: bz#1774011 Change-Id: Ie9e83c162aa77f44a39c2ba7115de558120ada4d Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* ctime: Fix ctime inconsisteny with utimensatKotresh HR2019-12-101-0/+2
| | | | | | | | | | | | | | | | | | | | Problem: When touch is used to create a file, the ctime is not matching atime and mtime which ideally should match. There is a difference in nano seconds. Cause: When touch is used modify atime or mtime to current time (UTIME_NOW), the current time is taken from kernel. The ctime gets updated to current time when atime or mtime is updated. But the current time to update ctime is taken from utime xlator. Hence the difference in nano seconds. Fix: When utimesat uses UTIME_NOW, use the current time from kernel. fixes: bz#1773530 Change-Id: I9ccfa47dcd39df23396852b4216f1773c49250ce Signed-off-by: Kotresh HR <khiremat@redhat.com>
* debug/io-stats: add an option to set volume-idAmar Tumballi2019-11-291-0/+1
| | | | | | | | | | | 'volume-id' is good to have for a graph for uniquely identifying it. Add it to graph->volume_id while generating volfile itself. This can be further used in many other places. Updates: #763 Change-Id: I80516d62d28a284e8ff4707841570ced97a37e73 Signed-off-by: Amar Tumballi <amar@kadalu.io>
* Revert "afr: make heal info lockless"Ravishankar N2019-11-281-2/+0
| | | | | | | | This reverts commit fce5f68bc72d448490a0d41be494ac54a9181b3c. I merged the wrong patch by mistake! Hence reverting it. updates: bz#1774011 Change-Id: Id7d6ed1d727efc02467c8a9aea3374331261ebd5
* afr: make heal info locklessRavishankar N2019-11-281-0/+2
| | | | | | | | | | | | | | | | | | | Changes in locks xlator: Added support for per-domain inodelk count requests. Caller needs to set GLUSTERFS_MULTIPLE_DOM_LK_CNT_REQUESTS key in the dict and then set each key with name 'GLUSTERFS_INODELK_DOM_PREFIX:<domain name>'. In the response dict, the xlator will send the per domain count as values for each of these keys. Changes in AFR: Replaced afr_selfheal_locked_inspect() with afr_lockless_inspect(). Logic has been added to make the latter behave same as the former, thus not breaking the current heal info output behaviour. fixes: bz#1774011 Change-Id: I9ae08ce768b39aeb6ee230207b5b7fa744176952 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* common-utils.c: add gf_strn2boolean() functionYaniv Kaul2019-11-271-0/+2
| | | | | | | | | | | | | | | | | | | | | The function takes a string and its length and based on it returns if it's a boolean. It's identical in functionality to gf_string2boolean only with far less string comparisons since it takes into account the length of the string. dict_get_str_boolean() has been converted to use it. Other cases of gf_string2boolean() across the code base can be converted as well, but more importantly, they should be converted from dict_get_str() and then calling to gf_string2boolean to simply call dict_get_str_boolean(), which would take care of this for them. This is therefore a first step in the conversion. Change-Id: I9ee93abfc676f6e123a3919d8df8c25e8848b799 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* store.c/glusterd-store.c: remove sys_stat callsYaniv Kaul2019-11-271-2/+2
| | | | | | | | | | | | | | | Instead of querying for the file size and allocating a char array according to its size, let's just use a fixed size. Those calls are not really needed, and are either expensive or cached anyway. Since we do dynamic allocation/free, let's just use a fixed array instead. I'll see if there are other sys_stat() calls that are not really useful and try to eliminate them in separate patches. Change-Id: I76b40e78a52ab38f613fc0cdef4be60e6253bf20 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* libglusterfs: remove unused gfdb specific files from repoAmar Tumballi2019-11-181-5/+0
| | | | | | Updates: bz#1193929 Change-Id: Idb98394c51917e9b132aeb32facccd112effe672 Signed-off-by: Amar Tumballi <amarts@gmail.com>
* glusterd: Client Handling of Elastic ClustersMohit Agrawal2019-11-121-1/+1
| | | | | | | | | | | | | | Configure the list of gluster servers in the key GLUSTERD_BRICK_SERVERS at the time of GETSPEC RPC CALL and access the value in client side to update volfile serve list so that client would be able to connect next volfile server in case of current volfile server is down Updates #741 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Change-Id: I23f36ddb92982bb02ffd83937a8bd8a2c97e8104
* iobuf.c, h: minor fixesYaniv Kaul2019-11-071-6/+6
| | | | | | | | | | | | | | - Align structures - gf_iobuf_get_pagesize() will now also return the index, reducing the need for an additional very similar call. - Removal of an inefficient loop I've inadvertently added previously. It was harmless, but just inefficient. - New pool initialization does not need to be done under lock - no one can touch that pool yet, so no need to protect it. Change-Id: I61c50f2f14fa79edc131e515e9615a9928ee2dca updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* posix-helpers.c (and others): minor changes to posix_xattr_fill()Yaniv Kaul2019-11-061-3/+0
| | | | | | | | | | | | | | | | posix_xattr_fill() is called from several POSIX functions. Made minor changes to it and the functions called from it: 1. Dict functions to use known lengths (dict_getn() instead of dict_get(), etc.) 2. Re-ordered some static char[] arrays, to account (hopefully) to the frequency of the xattrs usage (based on grep in the code...) 3. Before strcmp(), check if the strings lengths match. 4. Removed some dead code. Hopefully, no functional changes. Change-Id: I510c0d2785e54ffe0f82c4c449782f2302d63a32 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* dict.{c,h}: remove the lock from the _data structYaniv Kaul2019-10-111-1/+0
| | | | | | | | | I'm not sure why it was there and I did not see any use for it. In the hope I did not miss anything, I removed it. Change-Id: I02fa2e8e2a598b488fddbff4c7168dc4a41929b2 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* protocol/handshake: pass volume-id for extra checkAmar Tumballi2019-09-301-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With added check of volume-id during handshake, we can be sure to not connect with a brick if this gets re-used in another volume. This prevents any accidental issues which can happen with a stale client process lurking along. Also added test case for testing same volume name which would fetch a different volfile (ie, different bricks, different type), and a different volume name, but same brick. For reference: Currently a client<->server handshake happens in glusterfs through protocol/client translator (setvolume) to protocol/server using a dictionary which containes many keys. Rejection happens in server side if some of the required keys are missing in handshake dictionary. Till now, there was no single unique identifier to validate for a client to tell server if it is actually talking to a corresponding server. All we look in protocol/client is a key called 'remote-subvolume', which should match with a subvolume name in server volume file, and for any volume with same brick name (can be present in same cluster due to recreate), it would be same. This could cause major issue, when a client was connected to a given brick, in one volume would be connected to another volume's brick if its re-created/re-used. To prevent this behavior, we are now passing along 'volume-id' in handshake, which would be preserved for the life of client process, which can prevent this accidental connections. NOTE: This behavior wouldn't be applicable for user-snapshot enabled volumes, as snapshotted volume's would have different volume-id. Fixes: bz#1620580 Change-Id: Ie98286e94ce95ae09c2135fd6ec7d7c2ca1e8095 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* core/syncop: Bail out if frame creation failsSoumya Koduri2019-09-121-0/+6
| | | | | | | | | | There could be cases (either due to insufficient memory or corrupted mem-pool) due to which frame creation fails. Bail out with error in such cases. Change-Id: I8cc0a5852f6f04d2bac991e4eb79ecb42577da11 Fixes: bz#1748448 Signed-off-by: Soumya Koduri <skoduri@redhat.com>
* cluster/ec: quorum-count implementationPranith Kumar K2019-09-081-1/+3
| | | | | | fixes: #721 Change-Id: I5333540e3c635ccf441cf1f4696e4c8986e38ea8 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* graph/cleanup: Fix race in graph cleanupMohammed Rafi KC2019-09-052-0/+3
| | | | | | | | | | | | | | | | | We were unconditionally cleaning up the grap when we get child_down followed by parent_down. But this is prone to race condition when some of the bricks are already disconnected. In this case, even before the last child down is executed in the client xlator code,we might have freed the graph. Because the child_down event is alreadt recevied. To fix this race, we have introduced a check to see if all client xlator have cleared thier reconnect chain, and called the child_down for last time. Change-Id: I7d02813bc366dac733a836e0cd7b14a6fac52042 fixes: bz#1727329 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
* ctime: Fix incorrect realtime passed to frame->root->ctimeKotresh HR2019-08-221-0/+2
| | | | | | | | | | | | On systems that don't support "timespec_get"(e.g., centos6), it was using "clock_gettime" with "CLOCK_MONOTONIC" to get unix epoch time which is incorrect. This patch introduces "timespec_now_realtime" which uses "clock_gettime" with "CLOCK_REALTIME" which fixes the issue. Change-Id: I57be35ce442d7e05319e82112b687eb4f28d7612 Signed-off-by: Kotresh HR <khiremat@redhat.com> fixes: bz#1743652
* logging: Structured logging reference PRAravinda VK2019-08-201-10/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To convert the existing `gf_msg` to `gf_smsg`: - Define `_STR` of respective Message ID as below(In `*-messages.h`) #define PC_MSG_REMOTE_OP_FAILED_STR "remote operation failed." - Change `gf_msg` to use `gf_smsg`. Convert values into fields and add any missing fields. Note: `errno` and `error` fields will be added automatically to log message in case errnum is specified. Example: gf_smsg( this->name, // Name or log domain GF_LOG_WARNING, // Log Level rsp.op_errno, // Error number PC_MSG_REMOTE_OP_FAILED, // Message ID "path=%s", local->loc.path, // Key Value 1 "gfid=%s", loc_gfid_utoa(&local->loc), // Key Value 2 NULL // Log End ); Key value pairs formatting Help: gf_slog( this->name, // Name or log domain GF_LOG_WARNING, // Log Level rsp.op_errno, // Error number PC_MSG_REMOTE_OP_FAILED, // Message ID "op=CREATE", // Static Key and Value "path=%s", local->loc.path, // Format for Value "brick-%d-status=%s", brkidx, brkstatus, // Use format for key and val NULL // Log End ); Before: [2019-07-03 08:16:18.226819] W [MSGID: 114031] [client-rpc-fops_v2.c \ :2633:client4_0_lookup_cbk] 0-gv3-client-0: remote operation failed. \ Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint \ is not connected] After: [2019-07-29 07:50:15.773765] W [MSGID: 114031] \ [client-rpc-fops_v2.c:2633:client4_0_lookup_cbk] 0-gv1-client-0: \ remote operation failed. [{path=/f1}, \ {gfid=00000000-0000-0000-0000-000000000000}, \ {errno=107}, {error=Transport endpoint is not connected}] To add new `gf_smsg`, Add a Message ID in respective `*-messages.h` file and the follow the steps mentioned above. Change-Id: I4e7d37f27f106ab398e991d931ba2ac7841a44b1 Updates: #657 Signed-off-by: Aravinda VK <avishwan@redhat.com>
* posix: In brick_mux brick is crashed while start/stop volume in loopMohit Agrawal2019-08-201-0/+3
| | | | | | | | | | | | | | | Problem: In brick_mux environment sometime brick is crashed while volume stop/start in a loop.Brick is crashed in janitor task at the time of accessing priv.If posix priv is cleaned up before call janitor task then janitor task is crashed. Solution: To avoid the crash in brick_mux environment introduce a new flag janitor_task_stop in posix_private and before send CHILD_DOWN event wait for update the flag by janitor_task_done Change-Id: Id9fa5d183a463b2b682774ab5cb9868357d139a4 fixes: bz#1730409 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* client_t.c: removal of dead code.Yaniv Kaul2019-08-201-3/+0
| | | | | | Change-Id: Id9f5f448db305f3135a1fdca61b1d7ec898c63a4 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* libglusterfs: remove dependency of rpcAmar Tumballi2019-08-162-11/+15
| | | | | | | | | | | | | | | | | | Goal: 'libglusterfs' files shouldn't have any dependency outside of the tree, specially the header files, shouldn't have '#include' from outside the tree. Fixes: * Had to introduce libglusterd so, methods and structures required for only mgmt/glusterd, and cli/ are separated from 'libglusterfs/' * Remove rpc/xdr/gen from build, which was used mainly so dependency for libglusterfs could be properly satisfied. * Move rpcsvc_auth_data to client_t.h, so all dependencies could be handled. Updates: bz#1636297 Change-Id: I0e80243a5a3f4615e6fac6e1b947ad08a9363fce Signed-off-by: Amar Tumballi <amarts@redhat.com>
* fuse: Set limit on invalidate queue sizeN Balachandran2019-08-142-0/+2
| | | | | | | | | | | | | If the glusterfs fuse client process is unable to process the invalidate requests quickly enough, the number of such requests quickly grows large enough to use a significant amount of memory. We are now introducing another option to set an upper limit on these to prevent runaway memory usage. Change-Id: Iddfff1ee2de1466223e6717f7abd4b28ed947788 Fixes: bz#1732717 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* fuse: rate limit reading from fuse device upon receiving EPERMCsaba Henk2019-08-081-0/+2
| | | | | | Fixes: bz#1644322 Change-Id: I53e8fa362cd8c7d04fb1c4abb606a9abb642c592 Signed-off-by: Csaba Henk <csaba@redhat.com>
* event: rename event_XXX with gf_ prefixedXiubo Li2019-07-291-12/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I hit one crash issue when using the libgfapi. In the libgfapi it will call glfs_poller() --> event_dispatch() in file api/src/glfs.c:721, and the event_dispatch() is defined by libgluster locally, the problem is the name of event_dispatch() is the extremly the same with the one from libevent package form the OS. For example, if a executable program Foo, which will also use and link the libevent and the libgfapi at the same time, I can hit the crash, like: kernel: glfs_glfspoll[68486]: segfault at 1c0 ip 00007fef006fd2b8 sp 00007feeeaffce30 error 4 in libevent-2.0.so.5.1.9[7fef006ed000+46000] The link for Foo is: lib_foo_LADD = -levent $(GFAPI_LIBS) It will crash. This is because the glfs_poller() is calling the event_dispatch() from the libevent, not the libglsuter. The gfapi link info : GFAPI_LIBS = -lacl -lgfapi -lglusterfs -lgfrpc -lgfxdr -luuid If I link Foo like: lib_foo_LADD = $(GFAPI_LIBS) -levent It will works well without any problem. And if Foo call one private lib, such as handler_glfs.so, and the handler_glfs.so will link the GFAPI_LIBS directly, while the Foo won't and it will dlopen(handler_glfs.so), then the crash will be hit everytime. The link info will be: foo_LADD = -levent libhandler_glfs_LIBADD = $(GFAPI_LIBS) I can avoid the crash temporarily by linking the GFAPI_LIBS in Foo too like: foo_LADD = $(GFAPI_LIBS) -levent libhandler_glfs_LIBADD = $(GFAPI_LIBS) But this is ugly since the Foo won't use any APIs from the GFAPI_LIBS. And in some cases when the --as-needed link option is added(on many dists it is added as default), then the crash is back again, the above workaround won't work. Fixes: #699 Change-Id: I38f0200b941bd1cff4bf3066fca2fc1f9a5263aa Signed-off-by: Xiubo Li <xiubli@redhat.com>
* ctime: Set mdata xattr on legacy filesKotresh HR2019-07-224-1/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | Problem: The files which were created before ctime enabled would not have "trusted.glusterfs.mdata"(stores time attributes) xattr. Upon fops which modifies either ctime or mtime, the xattr gets created with latest ctime, mtime and atime, which is incorrect. It should update only the corresponding time attribute and rest from backend Solution: Creating xattr with values from brick is not possible as each brick of replica set would have different times. So create the xattr upon successful lookup if the xattr is not created Note To Reviewers: The time attributes used to set xattr is got from successful lookup. Instead of sending the whole iatt over the wire via setxattr, a structure called mdata_iatt is sent. The mdata_iatt contains only time attributes. Change-Id: I5e535631ddef04195361ae0364336410a2895dd4 fixes: bz#1593542 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* Replace usleep() with nanosleep()Vijay Bellur2019-07-111-0/+6
| | | | | | | | | | | | | | | | | | | As usleep has been obsoleted, changed all invocations of usleep to nanosleep. From man 3 usleep: "4.3BSD, POSIX.1-2001. POSIX.1-2001 declares this function obsolete; use nanosleep(2) instead. POSIX.1-2008 removes the specification of usleep()." Added a helper function gf_nanosleep() to have a single place for handling edge cases that might arise from the conversion of usleep to nanosleep and allow the sleep to resume with right remaining value upon being interrupted. Fixes: bz#1721686 Change-Id: Ia39ab82c9e0f4669d2c00d4cdf25e38d94ef9f62 Signed-off-by: Vijay Bellur <vbellur@redhat.com>
* Remove hadoop related code from the codebaseVijay Bellur2019-07-091-1/+0
| | | | | | | | | As Hadoop is no longer supported, dropping code for handling Hadoop access. Fixes: bz#1728417 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Change-Id: I8fcf4faacb364f1c9a8abb0c48faec337087f845
* glusterd/svc: update pid of mux volumes from the shd processMohammed Rafi KC2019-07-092-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | For a normal volume, we are updating the pid from a the process while we do a daemonization or at the end of the init if it is no-daemon mode. Along with updating the pid we also lock the file, to make sure that the process is running fine. With brick mux, we were updating the pidfile from gluterd after an attach/detach request. There are two problems with this approach. 1) We are not holding a pidlock for any file other than parent process. 2) There is a chance for possible race conditions with attach/detach. For example, shd start and a volume stop could race. Let's say we are starting an shd and it is attached to a volume. While we trying to link the pid file to the running process, this would have deleted by the thread that doing a volume stop. Change-Id: I29a00352102877ce09ea3f376ca52affceb5cf1a Updates: bz#1722541 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
* glusterfs-fops: fix the modularityAmar Tumballi2019-07-024-22/+248
| | | | | | | | | | | | | | | | | | | | | | | | | | glusterfs-fops.h was moved to rpc/xdr to support compound fops. (ref: https://review.gluster.org/14032, 2f945b86d3) This was fine as long as all these header files were in single include directory after 'install'. With the move to separate out glusterfs specific header files into another directory inside /usr/include (ref: https://review.gluster.org/21746, 20ef211cfa), glusterfs-fops.h file was not in the proper path when an external .c file tried to include any of glusterfs specific .h file (like xlator.h). Now, we have removed compound-fops, with that, none of the enums declared in glusterfs-fops.h are actually getting used on wire anymore. Hence, it makes sense to get this to libglusterfs/src as a single point of definition. With this change, the external programs can use glusterfs header files. also remove some enum definitions which are not used in code anymore. Updates: bz#1636297 Change-Id: I423c44d3dbe2efc777299c544ece3cb172fc7e44 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* core: fix memory allocation issuesXavi Hernandez2019-06-261-1/+4
| | | | | | | | | | | | | | | | | | | | | | Two problems have been identified that caused that gluster's memory usage were twice higher than required. 1. An off by 1 error caused that all objects allocated from the memory pools were taken from a pool bigger than required. Since each pool corresponds to a size equal to a power of two, this was wasting half of the available memory. 2. The header information used for accounting on each memory object was not taken into consideration when searching for a suitable memory pool. It was added later when each individual block was allocated. This made this space "invisible" to memory accounting. Credits: Thanks to Nithya Balachandran for identifying this problem and testing this patch. Fixes: bz#1722802 Change-Id: I90e27ad795fe51ca11c13080f62207451f6c138c Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* * core: do not assert in inode_unref if the inode table cleanup has startedRaghavendra Bhat2019-06-251-0/+4
| | | | | | | | | | | | | | | | | | | | | | There is a good chance that, the inode on which unref came has already been zero refed and added to the purge list. This can happen when inode table is being destroyed (glfs_fini is something which destroys the inode table). Consider a directory 'a' which has a file 'b'. Now as part of inode table destruction zero refing of inodes does not happen from leaf to the root. It happens in the order inodes are present in the list. So, in this example, the dentry of 'b' would have its parent set to the inode of 'a'. So if 'a' gets zero refed first (as part of inode table cleanup) and then 'b' has to zero refed, then dentry_unset is called on the dentry of 'b' and it further goes on to call inode_unref on b's parent which is 'a'. In this situation, GF_ASSERT would be called as the refcount of 'a' has been already set to zero. So, return the inode (in the function inode_unref without doing anything) if the inode table cleanup has already started and inode's refcount is zero. Change-Id: I91e0a807d5c9ce0daae5a611c38da379fd11076e fixes: bz#1722546 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>