summaryrefslogtreecommitdiffstats
path: root/xlators
Commit message (Collapse)AuthorAgeFilesLines
* features/shard: Use fd lookup post file openVinayakswami Hariharmath2020-06-111-43/+76
| | | | | | | | | | | | | | | Issue: When a process has the open fd and the same file is unlinked in middle of the operations, then file based lookup fails with ENOENT or stale file Solution: When the file already open and fd is available, use fstat to get the file attributes Change-Id: I0e83aee9f11b616dcfe13769ebfcda6742e4e0f4 Fixes: #1281 Signed-off-by: Vinayakswami Hariharmath <vharihar@redhat.com>
* glusterd: destroy all volume info locks and mutexesDmitry Antipov2020-06-102-2/+5
| | | | | | | | | | Add destroy calls for 'store_volinfo_lock' and 'lock' of volume info. Move initialization of 'store_volinfo_lock' from glusterd_op_create_volume() to common place, which is glusterd_volinfo_new() indeed. Change-Id: I5fae4469f28eab80c4fa6f5947646528e6aedad7 Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Fixes: #1291
* test: Test case brick-mux-validation-in-cluster.t is failing on RHEL-8Mohit Agrawal2020-06-092-2/+8
| | | | | | | | | | | | | | | | | | | | | Brick process are not properly attached on any cluster node while some volume options are changed on peer node and glusterd is down on that specific node. Solution: At the time of restart glusterd it got a friend update request from a peer node if peer node having some changes on volume.If the brick process is started before received a friend update request in that case brick_mux behavior is not workingproperly. All bricks are attached to the same process even volumes options are not the same. To avoid the issue introduce an atomic flag volpeerupdate and update the value while glusterd has received a friend update request from peer for a specific volume.If volpeerupdate flag is 1 volume is started by glusterd_import_friend_volume synctask Change-Id: I4c026f1e7807ded249153670e6967a2be8d22cb7 Credit: Sanju Rakaonde <srakonde@redhat.com> fixes: #1290 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* glusterd: To do full heal in different online node when do ec/afr full healyinkui2020-06-091-3/+36
| | | | | | | | | | | | | | | | | | | For example: We have 3 nodes and create ec 3*(2+1) volume for test-disperse-0/test-disperse-1/test-disperse-2 when we do 'gluster v heal test full' in node-1 that can in node-1/ node-2/node-3 glustershd's get op=GF_EVENT_TRANSLATOR_OP and then do full heal in different disperse group. Let us say we have 2X(2+1) disperse with each brick from different machine m0, m1, m2, m3, m4, m5. and candidate_max is m5. and do full heal so '*index' is 3 and !gf_uuid_compare(MY_UUID, brickinfo->uuid) will be true in m3,and then m3's glustershd will be the heal-xlator. Id: I5c6762e6cfb375aed32d3fc11fe5eae3ee41aab4 Signed-off-by: yinkui <13965432176@163.com> Change-Id: Ic7ef3ddfd30b5f4714ba99b4e7b708c927d68764 fixes: bz#1724948
* When creating new file don't set xatrr "trusted.glusterfs.dht"Tamar Shacked2020-06-091-2/+4
| | | | | | | | | | The curr call to delete the xattr from the dict fails to find the key: dict_del_sizen(xdata, xattr_name); This is beacuse keysize is calculated as sizeof of xattr_name which is a pointer, this lead to wrong size -> hash. Fix: call to dict_deln which get keysize using strlen. fixes: #1282 Change-Id: I23ce1f8f7928e9daa43bc3a9fa8d3611e81bbc36 Signed-off-by: Tamar Shacked <tshacked@redhat.com>
* cluster/afr: Delay post-op for fsyncPranith Kumar K2020-06-085-9/+51
| | | | | | | | | | | | | | | | | | Problem: AFR doesn't delay post-op for fsync fop. For fsync heavy workloads this leads to un-necessary fxattrop/finodelk for every fsync leading to bad performance. Fix: Have delayed post-op for fsync. Add special flag in xdata to indicate that afr shouldn't delay post-op in cases where either the process will terminate or graph-switch would happen. Otherwise it leads to un-necessary heals when the graph-switch/process-termination happens before delayed-post-op completes. Fixes: #1253 Change-Id: I531940d13269a111c49e0510d49514dc169f4577 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* glusterd: check for same node while adding bricks in disperse volumeSheetal Pamecha2020-06-064-238/+248
| | | | | | | | | | | | | | | The optimal way for configuring disperse and replicate volumes is to have all bricks in different nodes. During create operation it fails saying it is not optimal, user must use force to over-ride this behavior. Implementing same during add-brick operation to avoid situation where all the added bricks end up from same host. Operation will error out accordingly. and this can be over-ridden by using force same as create. fixes: #1047 Change-Id: I3ee9c97c1a14b73f4532893bc00187ef9355238b Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
* glusterd: fix memory leak in glusterd_store_retrieve_bricks()Dmitry Antipov2020-06-051-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Found with GCC's address sanitizer: ==67190==ERROR: LeakSanitizer: detected memory leaks Direct leak of 24624 byte(s) in 6 object(s) allocated from: #0 0x7f62535c0837 in __interceptor_calloc (/usr/lib64/libasan.so.6+0xb0837) #1 0x7f62532a1690 in __gf_default_calloc glusterfs/mem-pool.h:122 #2 0x7f62532a20ca in __gf_calloc /path/to/glusterfs/libglusterfs/src/mem-pool.c:144 #3 0x7f62532c8128 in gf_store_iter_new /path/to/glusterfs/libglusterfs/src/store.c:511 #4 0x7f623e2f9ed7 in glusterd_store_retrieve_bricks /path/to/glusterfs/xlators/mgmt/glusterd/src/glusterd-store.c:2389 Direct leak of 8208 byte(s) in 2 object(s) allocated from: #0 0x7f62535c0837 in __interceptor_calloc (/usr/lib64/libasan.so.6+0xb0837) #1 0x7f62532a1690 in __gf_default_calloc glusterfs/mem-pool.h:122 #2 0x7f62532a20ca in __gf_calloc /path/to/glusterfs/libglusterfs/src/mem-pool.c:144 #3 0x7f62532c8128 in gf_store_iter_new /path/to/glusterfs/libglusterfs/src/store.c:511 #4 0x7f623e2f9cf0 in glusterd_store_retrieve_bricks /path/to/glusterfs/xlators/mgmt/glusterd/src/glusterd-store.c:2363 #5 0x7fff5cb70bcf ([stack]+0x15bcf) #6 0x7f623e309113 in glusterd_store_retrieve_volumes /path/to/glusterfs/xlators/mgmt/glusterd/src/glusterd-store.c:3505 #7 0xfffeb96e61d (<unknown module>) #8 0x7f623e4586d7 (/usr/lib64/glusterfs/9dev/xlator/mgmt/glusterd.so+0x2f86d7) Change-Id: I9b2a543dc095f4fa739cd664fd4d608bf8c87d60 Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Fixes: #1263
* cluster/afr: Prioritize ENOSPC over other errorskarthik-us2020-06-053-48/+6
| | | | | | | | | | | | | | | | | | | | | | Problem: In a replicate/arbiter volume if file creations or writes fails on quorum number of bricks and on one brick it is due to ENOSPC and on other brick it fails for a different reason, it may fail with errors other than ENOSPC in some cases. Fix: Prioritize ENOSPC over other lesser priority errors and do not set op_errno in posix_gfid_set if op_ret is 0 to avoid receiving any error_no which can be misinterpreted by __afr_dir_write_finalize(). Also removing the function afr_has_arbiter_fop_cbk_quorum() which might consider a successful reply form a single brick as quorum success in some cases, whereas we always need fop to be successful on quorum number of bricks in arbiter configuration. Change-Id: I106e267f8b9451f681022f1cccb410d9bc824c08 Fixes: #1254 Signed-off-by: karthik-us <ksubrahm@redhat.com>
* open-behind: rewrite of internal logicXavi Hernandez2020-06-043-823/+487
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There was a critical flaw in the previous implementation of open-behind. When an open is done in the background, it's necessary to take a reference on the fd_t object because once we "fake" the open answer, the fd could be destroyed. However as long as there's a reference, the release function won't be called. So, if the application closes the file descriptor without having actually opened it, there will always remain at least 1 reference, causing a leak. To avoid this problem, the previous implementation didn't take a reference on the fd_t, so there were races where the fd could be destroyed while it was still in use. To fix this, I've implemented a new xlator cbk that gets called from fuse when the application closes a file descriptor. The whole logic of handling background opens have been simplified and it's more efficient now. Only if the fop needs to be delayed until an open completes, a stub is created. Otherwise no memory allocations are needed. Correctly handling the close request while the open is still pending has added a bit of complexity, but overall normal operation is simpler. Change-Id: I6376a5491368e0e1c283cc452849032636261592 Fixes: #1225 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* meta: indicate writability of tunables in file modeCsaba Henk2020-06-031-4/+5
| | | | | | | | | | | | | | | | | | | | | | All files under .meta (synthetic subtree facilitating reflection for glusterfs clients, implemented by meta xlator) were shown as writable, but the vast majority of them are usable only for querying parameters or stats, not for setting them. (The exceptions are loglevel and measure_latency.) However, one could only find out about this only by trial and error, or reading the code. With this change we align file permissions with tunability, stripping the writable bits for those nodes which are only for querying. Also strip writable bits from directory permissions. updates: #1000 Change-Id: I82954e165ffc31cdf7307f4d990ef60b8154a2e2 Signed-off-by: Csaba Henk <csaba@redhat.com>
* fuse - remove unnecessary code blockBarak Sason Rofman2020-06-022-27/+0
| | | | | | | | | Per a "TODO" comment in the code, a code block can be removed as it's no longer required Change-Id: I60e064ece985ff2ea2a686bbd2f0e6cc850899e9 updates: #1000 Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
* io-cache,quick-read: deprecate volume options with flawed semantics or namingCsaba Henk2020-06-021-3/+27
| | | | | | | | | | | | | | | | | | | | | | | | - performance.cache-size has a flawed semantics, as it's dispatched on two independent translators, io-cache and quick-read. - performance.qr-cache-timeout has a confusing name, as other options affecting quick-read have an unabbreviated "quick-read-..." prefix in their names. We keep these options with unchanged operation, but in the help output we indicate their deprecation. The following better alternatives are introduced: - performance.io-cache-size to tune cache-size option of io-cache - performance.quick-read-cache-size to tune cache-size option of quick-read - performance.quick-read-cache-timeout as a preferred synonym for performance.qr-cache-timeout Fixes: #952 Change-Id: Ibd04fb638de8cac450ba992ad8a415154f9f4281 Signed-off-by: Csaba Henk <csaba@redhat.com>
* afr: fix memory leak in afr_priv_destroy()Dmitry Antipov2020-06-011-0/+2
| | | | | | | | | | | | | | | | Found with GCC ASan: Direct leak of 202 byte(s) in 2 object(s) allocated from: #0 0x7fc6c6ef0667 in __interceptor_malloc (/usr/lib64/libasan.so.6+0xb0667) #1 0x7fc6c6bd145b in __gf_malloc /path/to/glusterfs/libglusterfs/src/mem-pool.c:175 #2 0x7fc6c6bd17a3 in gf_vasprintf /path/to/glusterfs/libglusterfs/src/mem-pool.c:223 #3 0x7fc6c6bd1993 in gf_asprintf /path/to/glusterfs/libglusterfs/src/mem-pool.c:243 #4 0x7fc6b0dc92f6 in init /path/to/glusterfs/xlators/cluster/afr/src/afr.c:590 ... Change-Id: I29feb1d30a045fb70472758e6ed4e195888090b2 Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Fixes: #1278
* dht - sparse files rebalance enhancementsBarak Sason Rofman2020-06-011-111/+102
| | | | | | | | | | | | | | | Currently data migration in rebalance reads sparse file sequentially, disregarding which segments are holes and which are data. This can lead to extremely long migration time for large sparse file. Data migration mechanism needs to be enhanced so only data segments are read and migrated. This can be achieved using lseek to seek for holes and data in the file. This enhancement is a consequence of https://bugzilla.redhat.com/show_bug.cgi?id=1823703 fixes: #1222 Change-Id: If5f448a0c532926464e1f34f504c5c94749b08c3 Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
* features/shard: Aggregate file size, block-count before unwinding removexattrKrutika Dhananjay2020-05-262-70/+196
| | | | | | | | | | | | | Posix translator returns pre and postbufs in the dict in {F}REMOVEXATTR fops. These iatts are further cached at layers like md-cache. Shard translator, in its current state, simply returns these values without updating the aggregated file size and block-count. This patch fixes this problem. Change-Id: I4b2dd41ede472c5829af80a67401ec5a6376d872 Fixes: #1243 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* dht: add null check in gf_defrag_free_dir_dfmetaSusant Palai2020-05-261-1/+2
| | | | | | fixes: #1258 Change-Id: I9d1fb512072bcc540d21d47da5b15ae1b79cf2b8 Signed-off-by: Susant Palai <spalai@redhat.com>
* afr/changelog: fix NULL dereferences and error handlingAshish Pandey2020-05-262-10/+11
| | | | | | | | | This patch includes the following CID from Coverity Scan: *1419116 *1420206 Change-Id: Id92fd6a78c8a00726a61aa4697b5c126ced8ed4d Updates: #1202
* glusterd/snapshot: Improve log message during snapshot cloneMohammed Rafi KC2020-05-221-0/+10
| | | | | | | | | | | While taking a snapshot clone, if the snapshot is not activated, th cli was returning that the bricks are down. This patch clearly print tha the error is due to the snapshot state. Change-Id: Ia840e6e071342e061ad38bf15e2e2ff2b0dacdfa Fixes: #1255 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
* features/shard: Aggregate size, block-count in iatt before unwinding setxattrKrutika Dhananjay2020-05-211-17/+191
| | | | | | | | | | | | | Posix translator returns pre and postbufs in the dict in {F}SETXATTR fops. These iatts are further cached at layers like md-cache. Shard translator, in its current state, simply returns these values without updating the aggregated file size and block-count. This patch fixes this problem. Change-Id: I4da0eceb4235b91546df79270bcc0af8cd64e9ea Fixes: #1243 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* fuse: occasional logging for fuse device 'weird' write errorsCsaba Henk2020-05-212-1/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change is a followup to I510158843e4b1d482bdc496c2e97b1860dc1ba93. In referred change we pushed log messages about 'weird' write errors to fuse device out of sight, by reporting them at Debug loglevel instead of Error (where 'weird' means errno is not POSIX compliant but having meaningful semantics for FUSE protocol). This solved the issue of spurious error reporting. And so far so good: these messages don't indicate an error condition by themselves. However, when they come in high repetitions, that indicates a suboptimal condition which should be reported.[1] Therefore now we shall emit a Warning if a certain errno occurs a certain number of times[2] as the outcome of a write to the fuse device. ___ [1] typically ENOENTs and ENOTDIRs accumulate when glusterfs' inode invalidation lags behind the kernel's internal inode garbage collection (in this case above errnos mean that the inode which we requested to be invalidated is not found in kernel). This can be mitigated with the invalidate-limit command line / mount option, cf. bz#1732717. [2] 256, as of the current implementation. Change-Id: I8cc7fe104da43a88875f93b0db49d5677cc16045 Updates: #1000 Signed-off-by: Csaba Henk <csaba@redhat.com>
* glusterd: add missing synccond_broadcast()Xavi Hernandez2020-05-211-1/+3
| | | | | | | | | | After the changes in commit 3da22f8cb08b05562a4c6bd2694f2f19199cff7f, there was a place where synccond_broadcast() was missing. It could cause a hang if another synctask was waiting on the condition variable. Change-Id: I92bfe4e15c5c3591e4854a64aa9e1566d50dd204 Fixes: #1116 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* storage/posix: fix thread name to comply with common conventionDmitry Antipov2020-05-181-1/+1
| | | | | | | | | Rename disk space checking thread to comply with common convention, adjust related docs as well. Change-Id: I36d642cf09773a28abd95bbe337ce29134ad96a4 Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Fixes: #1248
* features/bit-rot: invalid snprintf() buffer sizeDmitry Antipov2020-05-181-2/+2
| | | | | | | | | | | | | | | | Found with clang-10 -Wfortify-source: bit-rot-scrub.c:1802:15: warning: 'snprintf' size argument is too large; destination buffer has size 32, but size argument is 4096 [-Wfortify-source] len = snprintf(key, PATH_MAX, "quarantine-%d", j); ^ bit-rot-scrub.c:1813:9: warning: 'snprintf' size argument is too large; destination buffer has size 32, but size argument is 4096 [-Wfortify-source] snprintf(main_key, PATH_MAX, "quarantine-%d", tmp_count); Change-Id: I9b9c09ef2223ed181d81215154345de976b82f13 Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Fixes: #1221
* mgmt/glusterd: Stop old shd before increasing replica countPranith Kumar K2020-05-161-0/+18
| | | | | | | | | | | | | | | | | | Problem: In add-brick that increases replica count SHD was restarted after pending xattrs are set on the new bricks and adding bricks. But before restarting SHD there is a possibility that old SHD would do a scan on root-directory see no heal is needed and delete index for root-dir leading to no heals until lookup is executed on the mount Fix: Stop shd, perform pending-xattr setting/adding new bricks and then restart shd Fixes: #1240 Change-Id: I94fd7c6c909211b597185dfe097a559db6c0d00f Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* syncop: improve scaling and implement more toolsXavi Hernandez2020-05-139-33/+40
| | | | | | | | | | | | | | | | | | | | The current scaling of the syncop thread pool is not working properly and can leave some tasks in the run queue more time than necessary when the maximum number of threads is not reached. This patch provides a better scaling condition to react faster to pending work. Condition variables and sleep in the context of a synctask have also been implemented. Their purpose is to replace regular condition variables and sleeps that block synctask threads and prevent other tasks to be executed. The new features have been applied to several places in glusterd. Change-Id: Ic50b7c73c104f9e41f08101a357d30b95efccfbf Fixes: #1116 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* dht: Do opendir selectively in gf_defrag_process_dirSusant Palai2020-05-122-22/+55
| | | | | | | | | | | | | | | | Currently opendir is done from the cluster view. Hence, even if one opendir is successful, the opendir operation as a whole is considered successful. But since in gf_defrag_get_entry we fetch entries selectively from local_subvols, we need to opendir individually on those local subvols and keep track of fds separately. Otherwise it is possible that opendir failed on one of the subvol and we wind readdirp call on the fd to the corresponding subvol, which will ultimately result in EINVAL error. fixes: #1218 Change-Id: I50dd88b9597852a15579f4ee325918979417f570 Signed-off-by: Susant Palai <spalai@redhat.com>
* debug/error-gen: Don't return EBADF for loc fopsPranith Kumar K2020-05-091-22/+20
| | | | | | | | | | | | | | | | | | | | | | | | The following crash happens when EBADF is returned for some of the loc operations. EBADF should be returned only for fd based fops (data=0x7ff1700606e8) at /home/jenkins/root/workspace/centos7-regression/xlators/cluster/dht/src/dht-helper.c:502 loc = {path = 0x0, name = 0x0, inode = 0x0, parent = 0x0, gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>} ret = -1 frame = 0x7ff1700606e8 local = 0x7ff170049208 fd = 0x0 this = 0x7ff17c01d720 subvol = 0x7ff17c0197f0 __FUNCTION__ = "dht_check_and_open_fd_on_subvol_task" /home/jenkins/root/workspace/centos7-regression/libglusterfs/src/syncop.c:279 task = 0x7ff17c0b4920 Fixes: #1230 Change-Id: I24c11a8820a3c8b4070127e69c520e2c61e70930 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/ec: Return correct error code and log messageAshish Pandey2020-05-081-2/+9
| | | | | | | | | | | | In case of readdir was send with an FD on which opendir was failed, this FD will be useless and we return it with error. For now, we are returning it with EINVAL without logging any message in log file. Return a correct error code and also log the message to improve thing to debug. fixes: #1220 Change-Id: Iaf035254b9c5aa52fa43ace72d328be622b06169
* tests: skip tests on absence of reflink in xfsPranith Kumar K2020-05-061-2/+1
| | | | | | Fixes: #1223 Change-Id: I36cb72d920ffd77405051546615c5262c392daef Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* core, cli, quota: cleanup malloc debugging and statsDmitry Antipov2020-05-041-6/+0
| | | | | | | | | | | | 1. Since mcheck()/mprobe() etc. features are no longer used, mcheck.h isn't required to be included. 2. Since mallinfo() is used to obtain malloc statistics, it should be detected instead of malloc_stats(). Change-Id: I54c7d2ee568e06ab29938efc01d1a2153c5bd5db Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Fixes: #1172
* cluster/dht: Don't access local after STACK_DESTROYPranith Kumar K2020-05-031-1/+3
| | | | | | | | | | | | | There is a possibility that 'frame' could have been destroyed in dht_selfheal_dir_setattr() which can lead to local->mds_heal_fresh_lookup showing junk non-zero number. That will lead to double STACK_DESTROY. Remembered the value of the variable before the call to fix the access. Fixes: #1214 Change-Id: I37d1657798bfb549bb3887e260484d58fff42c91 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/ec: Fix varargs warning reported with clang-10Dmitry Antipov2020-05-011-10/+9
| | | | | | | | | | | | | | | | | | | | | Found with clang-10 -Wvarargs: xlators/cluster/ec/src/ec-combine.c:360:20: warning: passing an object that undergoes default argument promotion to 'va_start' has undefined behavior [-Wvarargs] va_start(args, global); ^ xlators/cluster/ec/src/ec-combine.c:348:34: note: parameter of type 'bool' is declared here gf_boolean_t global, ...) According to The C11 Standard, 7.16.1.4p4: If the parameter parmN is declared with the register storage class, with a function or array type, or with a type that is not compatible with the type that results after application of the default argument promotions, the behavior is undefined. Fixes: #1207 Change-Id: I527527845b2d574000d736c278be87cf19504761 Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
* storage/posix: fix initialization warning reported with clang-10Dmitry Antipov2020-04-301-8/+4
| | | | | | | | | | | | | | | | | xlators/storage/posix/src/posix-common.c:1440:18: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides] .validate = GF_OPT_VALIDATE_MAX, ^~~~~~~~~~~~~~~~~~~ xlators/storage/posix/src/posix-common.c:1439:18: note: previous initialization is here .validate = GF_OPT_VALIDATE_MIN, ^~~~~~~~~~~~~~~~~~~ [4 times] Use GF_OPT_VALIDATE_BOTH for min/max-bounded values. Fixes: #1208 Change-Id: I073a27d23176f3b4a126f2eb50c079374a11418d Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
* dht xlator: integer handling issuenik-redhat2020-04-294-11/+19
| | | | | | | | | | | | | | Issue: The ret value is passed to the function instead of the proper errno value Fix: Passing the errno generated to the log function CID: 1415824 : Improper use of negative value CID: 1420205 : Improper use of negative value Change-Id: Iaa7407ebd03eda46a2c027695e6bf0f598b371b2 Updates: #1060 Signed-off-by: nik-redhat <nladha@redhat.com>
* dht: Handle setxattr and rm race for directory in rebalanceSusant Palai2020-04-283-0/+33
| | | | | | | | | | | | | | Problem: Selfheal as part of directory does not return an error if the layout setxattr fails. This is because the actual lookup fop must have been successful to proceed for layout heal. Hence, we could not tell if fix-layout failed in rebalance. Solution: We can check this information in the layout structure that whether all the xlators have returned error. fixes: #1200 Change-Id: I3e5f2a36c0d934c21476a73a9a5473d8e490cde7 Signed-off-by: Susant Palai <spalai@redhat.com>
* afr: event gen changesRavishankar N2020-04-243-78/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The general idea of the changes is to prevent resetting event generation to zero in the inode ctx, since event gen is something that should follow 'causal order'. Change #1: For a read txn, in inode refresh cbk, if event_generation is found zero, we are failing the read fop. This is not needed because change in event gen is only a marker for the next inode refresh to happen and should not be taken into account by the current read txn. Change #2: The event gen being zero above can happen if there is a racing lookup, which resets even get (in afr_lookup_done) if there are non zero afr xattrs. The resetting is done only to trigger an inode refresh and a possible client side heal on the next lookup. That can be acheived by setting the need_refresh flag in the inode ctx. So replaced all occurences of resetting even gen to zero with a call to afr_inode_need_refresh_set(). Change #3: In both lookup and discover path, we are doing an inode refresh which is not required since all 3 essentially do the same thing- update the inode ctx with the good/bad copies from the brick replies. Inode refresh also triggers background heals, but I think it is okay to do it when we call refresh during the read and write txns and not in the lookup path. The .ts which relied on inode refresh in lookup path to trigger heals are now changed to do read txn so that inode refresh and the heal happens. Change-Id: Iebf39a9be6ffd7ffd6e4046c96b0fa78ade6c5ec Fixes: #1179 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reported-by: Erik Jacobson <erik.jacobson at hpe.com>
* dht - fixing rebalance failures for files with holesBarak Sason Rofman2020-04-241-11/+10
| | | | | | | | | | | Rebalance process handling of files which contains holes casued rebalance to fail with "No space left on device" errors. This patch modifies the code-flow in such a way that files with holes will be rebalanced correctly. fixes: #1187 Change-Id: I89bc3d4ea7f074db7213d759c49307f379543932 Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
* md-cache: fix several NULL dereferencesXavi Hernandez2020-04-231-66/+129
| | | | | | | | | | | | | | This patch includes the following CID from Coverity Scan: * 1425196 * 1425197 * 1425198 * 1425199 * 1525200 Change-Id: Iddcfea449d3dd56d4dfcc39f4c3c608518e611e4 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com> Updates: #1060
* dht/rebalance - fixing recursive failure issueBarak Sason Rofman2020-04-211-1/+2
| | | | | | | | | | | If rebalance process is failing, recursive failures appear in the log file, which is distracting from the root cause. In order to avoid recursive failure, error handling mechanism has been modified. fixes: #1072 Change-Id: Iae19430323630acd97c2c8d35685626d8da747a7 Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
* dht - Remove "tier" code (part 1)v9devBarak Sason Rofman2020-04-173-476/+19
| | | | | | | | | | | | | This patch is removing some of the "tier" code in dht xlator, as it is no longer being used. Not all of the not-needed code is removed at once, so reviewing is easier. Follow up patches removing additional unused code will follow. This is based in the work done in https://review.gluster.org/#/c/glusterfs/+/23935/ Change-Id: I3cb6a0c5d8f14afcd87cf021ef8f74b91c0f908a updates: #1097 Signed-off-by: Barak Sason Rofman <bsaonro@redhat.com>
* md-cache: avoid clearing cache when not necessaryXavi Hernandez2020-04-161-72/+93
| | | | | | | | | | | | mdc_inode_xatt_set() blindly cleared current cache when dict was not NULL, even if there was no xattr requested. This patch fixes this by only calling mdc_inode_xatt_set() when we have explicitly requested something to cache. Change-Id: Idc91a4693f1ff39f7059acde26682ccc361b947d Fixes: #1140 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* posix: fix GF_VALIDATE_OR_GOTO(this->name, this, out)Sanju Rakonde2020-04-161-3/+1
| | | | | | | | | | | | | | | Remove GF_VALIDATE_OR_GOTO(this->name, this, out) when this is passed as an argument and is checked for NULL in the caller itself. GF_VALIDATE_OR_GOTO(this->name, this, out) is modified to use xlator name instead of this->name as we are still verifying whether this is NULL. updates: #1000 Change-Id: Ide3180da29d0d4a35b2c5b9a7604fdf2ff4a9ffb Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* mgmt/glusterd: use stat() syscall wrapperDmitry Antipov2020-04-131-2/+3
| | | | | | | | | | | | | | | Found with 0-symbol-check.t: ./tests/basic/0symbol-check.t .. 1..2 ./xlators/mgmt/glusterd/src/.libs/glusterd_la-glusterd-volume-set.o should call sys_stat, not stat ok 1 [ 40/ 41011] < 40> 'find . -name *.o -exec ./tests/basic/symbol-check.sh {} \;' not ok 2 [ 11/ 1] < 42> '[ ! -e ./.symbol-check-errors ]' -> '' Failed 1/2 subtests Change-Id: I8962f487cd88738a1f7a962049d513712687088c Fixes: #1160 Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
* protocol/server: Fix coverity issue RESOURCE_LEAKSheetal Pamecha2020-04-101-0/+1
| | | | | | | | | Handle case of arg not freed CID: 1422174 Updates: #1060 Change-Id: Ibd03908a3ea8369035c2b7f6e024b3e5be48f436 Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
* posix: Avoid dict_del logs in posix_is_layout_stale while key is NULLMohit Agrawal2020-04-091-2/+3
| | | | | | | | | | | | Problem: The key "GF_PREOP_PARENT_KEY" has been populated by dht and for non-distribute volume like 1x3 key is not populated so posix_is_layout stale throw a message while a file is created Solution: To avoid a log put a condition before delete a key Change-Id: I813ee7960633e7f9f5e9ad2f42f288053d9eb71f Fixes: #1150 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* mount/fuse: Wait for 'mount' child to exit before dyingPranith Kumar K2020-04-091-0/+27
| | | | | | | | | | | | | | | | Problem: tests/bugs/protocol/bug-1433815-auth-allow.t fails sometimes because of stale mount. This stale mount comes into picture when parent process dies without waiting for the child process which mounts fuse fs to die Fix: Wait for mounting child process to die before dying. Fixes: #1152 Change-Id: I8baee8720e88614fdb762ea822d5877973eef8dc Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* dht - fixing a permission update issueBarak Sason Rofman2020-04-083-18/+39
| | | | | | | | | | | | | | | | | | When bringing back a downed brick and performing lookup from the client side, the permission on said brick aren't updated on the first lookup, but only on the second. This patch modifies permission update logic so the first lookup will trigger a permission update on the downed brick. LIMITATIONS OF THE PATCH: As the choice of source depends on whether the directory has layout or not. Even the directories on the newly added brick will have layout xattr[zeroed], but the same is not true for a root directory. Hence, in case in the entire cluster only the newly added bricks are up [and others are down], then any change in permission during this time will be overwritten by the older permissions when the cluster is restarted. fixes: #999 Change-Id: Ieb70246d41e59f9cae9f70bc203627a433dfbd33 Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
* cluster/afr: Removing unsupported options from code base to improve coveragekarthik-us2020-04-071-9/+0
| | | | | | | | | | | | | | | | Support for gluster volume heal <volname> info healed/heal-failed was removed by commit bb02cfb56ae08f56df4452c2b948fa962ae1212b in release-3.6. cli parser will display the usage message in all the supported versions whenever these clis are run, leading to some dead code in the latest branches. Since support for these clis were removed long back, this should not give any backward compatibility issues as well. Hence removing the dead code from the code base which will lead to better code coverage by the regression runs as well. Updates: #1052 Change-Id: I0c2b061469caf233c06d9699b0d159ce48e240b9 Signed-off-by: karthik-us <ksubrahm@redhat.com>
* Posix: Optimize posix code to improve file creationMohit Agrawal2020-04-065-74/+220
| | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Before executing a fop in POSIX xlator it builds an internal path based on GFID.To validate the path it call's (l)stat system call and while .glusterfs is heavily loaded kernel takes time to lookup inode and due to that performance drops Solution: In this patch we followed two ways to improve the performance. 1) Keep open fd specific to first level directory(gfid[0]) in .glusterfs, it would force to kernel keep the inodes from all those files in cache. In case of memory pressure kernel won't uncache first level inodes. We need to open 256 fd's per brick to access the entry faster. 2) Use at based call's to access relative path to reduce path based lookup time. Note: To verify the patch we have executed kernel untar 100 times on 6 different clients after enabling metadata group-cache and some other option.We were getting more than 20 percent improvement in kenel untar after applying the patch. Credits: Xavi Hernandez <xhernandez@redhat.com> Change-Id: I1643e6b01ed669b2bb148d02f4e6a8e08da45343 updates: #891 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>