Commit message | Author | Age | Files | Lines
* doc: Added release 7.7 notes (tag: v7.7) | Rinku Kothiya | 2020-07-20 | 1 | -0/+31
  Fixes: #1315
  Change-Id: I722f540428058f1ac2f3a476d0c374551d8bc283
  Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
* features/shard: Convert shard block indices to uint64 | Krutika Dhananjay | 2020-07-13 | 2 | -7/+10
  This patch fixes a crash in FOPs that operate on very large sharded files, where the number of participating shards could sometimes exceed the signed int32 maximum.

  The patch also adds GF_ASSERTs to ensure that the number of participating shards is always greater than 0 for files that have more than one shard.

  Change-Id: I354de58796f350eb1aa42fcdf8092ca2e69ccbb6
  Fixes: #1348
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
  (cherry picked from commit cdf01cc47eb2efb427b5855732d9607eec2abc8a)
* fuse: occasional logging for fuse device 'weird' write errors | Csaba Henk | 2020-07-13 | 2 | -1/+53
  This change is a followup to I510158843e4b1d482bdc496c2e97b1860dc1ba93.

  In the referred change we pushed log messages about 'weird' write errors to the fuse device out of sight, by reporting them at Debug loglevel instead of Error (where 'weird' means the errno is not POSIX compliant but has meaningful semantics in the FUSE protocol).

  This solved the issue of spurious error reporting. So far so good: these messages don't indicate an error condition by themselves. However, when they occur in high repetition, that indicates a suboptimal condition which should be reported.[1]

  Therefore we now emit a Warning if a certain errno occurs a certain number of times[2] as the outcome of a write to the fuse device.

  ___
  [1] Typically ENOENTs and ENOTDIRs accumulate when glusterfs' inode invalidation lags behind the kernel's internal inode garbage collection (in this case the above errnos mean that the inode which we requested to be invalidated is not found in the kernel). This can be mitigated with the invalidate-limit command line / mount option, cf. bz#1732717.
  [2] 256, as of the current implementation.

  Change-Id: I8cc7fe104da43a88875f93b0db49d5677cc16045
  Updates: #1000
  Signed-off-by: Csaba Henk <csaba@redhat.com>
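The counting scheme described in this commit can be sketched as follows. This is a minimal Python model, not the glusterfs C code: the class name `FuseWriteErrorTracker` and the choice to warn on every multiple of the threshold (rather than only the first time it is reached) are assumptions made for illustration. Only the threshold of 256 comes from the commit message.

```python
import errno
import logging
from collections import Counter

WARN_EVERY = 256  # threshold quoted in the commit message footnote [2]


class FuseWriteErrorTracker:
    """Hypothetical helper: counts 'weird' errnos seen on fuse device writes."""

    def __init__(self, warn_every=WARN_EVERY):
        self.warn_every = warn_every
        self.counts = Counter()

    def record(self, err):
        """Count one failed write; return True if a warning was emitted."""
        self.counts[err] += 1
        # Assumption: warn each time the count reaches a multiple of the threshold.
        if self.counts[err] % self.warn_every == 0:
            name = errno.errorcode.get(err, str(err))
            logging.warning("write to fuse device failed with %s %d times",
                            name, self.counts[err])
            return True
        return False
```

Individual failures stay quiet here; only the accumulated count triggers a message, which matches the intent of reporting a suboptimal condition rather than every single errno.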
* features/shard: Aggregate file size, block-count before unwinding removexattr | Krutika Dhananjay | 2020-07-13 | 3 | -70/+208
  Posix translator returns pre and postbufs in the dict in {F}REMOVEXATTR fops. These iatts are further cached at layers like md-cache. Shard translator, in its current state, simply returns these values without updating the aggregated file size and block-count.

  This patch fixes this problem.

  Change-Id: I4b2dd41ede472c5829af80a67401ec5a6376d872
  Fixes: #1243
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
  (cherry picked from commit 32519525108a2ac6bcc64ad931dc8048d33d64de)
* cluster/afr: Prioritize ENOSPC over other errors | karthik-us | 2020-06-22 | 4 | -48/+86
  Problem: In a replicate/arbiter volume, if file creations or writes fail on a quorum number of bricks, and on one brick the failure is due to ENOSPC while on another brick it fails for a different reason, the fop may fail with errors other than ENOSPC in some cases.

  Fix: Prioritize ENOSPC over other lesser priority errors and do not set op_errno in posix_gfid_set if op_ret is 0, to avoid receiving any error_no which can be misinterpreted by __afr_dir_write_finalize().

  Also remove the function afr_has_arbiter_fop_cbk_quorum(), which might consider a successful reply from a single brick as quorum success in some cases, whereas we always need the fop to be successful on a quorum number of bricks in an arbiter configuration.

  Change-Id: I106e267f8b9451f681022f1cccb410d9bc824c08
  Fixes: #1254
  Signed-off-by: karthik-us <ksubrahm@redhat.com>
  (cherry picked from commit fa63b45ca5edf172b1b89b28b5db3c5129cc57b6)
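The prioritization idea in this commit can be illustrated with a small Python sketch. The function name `pick_errno` and the rule of falling back to the first failure are hypothetical; the commit only states that ENOSPC takes priority over lesser priority errors, and the real logic lives in AFR's C code.

```python
import errno


def pick_errno(brick_errnos):
    """Choose the errno to report for a fop that failed on several bricks.

    brick_errnos holds one value per brick; 0 means that brick succeeded.
    Assumption: ENOSPC wins whenever any brick reported it, otherwise the
    first failure seen is reported.
    """
    failures = [e for e in brick_errnos if e != 0]
    if not failures:
        return 0
    if errno.ENOSPC in failures:
        return errno.ENOSPC
    return failures[0]
```

With replies such as `[errno.EIO, errno.ENOSPC, 0]` the caller sees ENOSPC, so the application learns that the volume ran out of space instead of receiving a less informative error.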
* afr: more quorum checks in lookup and new entry marking | Ravishankar N | 2020-06-18 | 4 | -13/+25
  Problem: See github issue for details.

  Fix:
  - In lookup, if the entry exists in 2 out of 3 bricks, don't fail the lookup with ENOENT just because there is an entrylk on the parent. Consider quorum before deciding.
  - If entry FOP does not succeed on quorum no. of bricks, do not perform new entry mark.

  Fixes: #1303
  Change-Id: I56df8c89ad53b29fa450c7930a7b7ccec9f4a6c5
  Signed-off-by: Ravishankar N <ravishankar@redhat.com>
  (cherry picked from commit c4a6748f25d2c1ab3ebcf89952278ebf94c8d371)
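The two quorum rules in this commit can be modelled roughly as below. This is a simplified Python sketch, assuming a plain majority quorum; the function names are hypothetical and the real AFR lookup path considers more state than a success count.

```python
import errno


def has_quorum(successes, brick_count):
    """Assumption: quorum means a strict majority of the replica's bricks."""
    return successes > brick_count // 2


def lookup_errno(found_on, brick_count):
    """Report success if the entry was found on a quorum of bricks."""
    return 0 if has_quorum(found_on, brick_count) else errno.ENOENT


def should_mark_new_entry(fop_successes, brick_count):
    """Only perform the new entry mark if the entry fop reached quorum."""
    return has_quorum(fop_successes, brick_count)
```

For a 3-brick replica, `lookup_errno(2, 3)` returns 0, which is the "2 out of 3" case the commit message calls out.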
* features/shard: Aggregate size, block-count in iatt before unwinding setxattr | Krutika Dhananjay | 2020-06-15 | 2 | -17/+222
  Posix translator returns pre and postbufs in the dict in {F}SETXATTR fops. These iatts are further cached at layers like md-cache. Shard translator, in its current state, simply returns these values without updating the aggregated file size and block-count.

  This patch fixes this problem.

  Change-Id: I4da0eceb4235b91546df79270bcc0af8cd64e9ea
  Fixes: #1243
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
  (cherry picked from commit 29ec66c6ab77e2d6893c6e213a3d1fb148702c99)
* open-behind: rewrite of internal logic | Xavi Hernandez | 2020-06-15 | 12 | -823/+1393
  There was a critical flaw in the previous implementation of open-behind.

  When an open is done in the background, it's necessary to take a reference on the fd_t object because once we "fake" the open answer, the fd could be destroyed. However, as long as there's a reference, the release function won't be called. So, if the application closes the file descriptor without having actually opened it, there will always remain at least 1 reference, causing a leak.

  To avoid this problem, the previous implementation didn't take a reference on the fd_t, so there were races where the fd could be destroyed while it was still in use.

  To fix this, I've implemented a new xlator cbk that gets called from fuse when the application closes a file descriptor.

  The whole logic of handling background opens has been simplified and it's more efficient now. A stub is created only if the fop needs to be delayed until an open completes. Otherwise no memory allocations are needed.

  Correctly handling the close request while the open is still pending has added a bit of complexity, but overall normal operation is simpler.

  Change-Id: I6376a5491368e0e1c283cc452849032636261592
  Fixes: #1225
  Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* cluster/ec: Return correct error code and log message | Ashish Pandey | 2020-06-15 | 1 | -2/+9
  If readdir is sent with an FD on which opendir had failed, this FD is useless and we return it with an error. For now, we are returning it with EINVAL without logging any message in the log file.

  Return a correct error code and also log the message to make debugging easier.

  fixes: #1220
  Change-Id: Iaf035254b9c5aa52fa43ace72d328be622b06169
  (cherry picked from commit af70cb5eedd80207cd184e69f2a4fb252b72d070)
* performance/open-behind: seek fop should open_and_resume | Pranith Kumar K | 2020-06-10 | 1 | -0/+27
  Backport of:
  > fixes: bz#1760187
  > Change-Id: I4c6ad13194d4fc5c7705e35bf9a27fce504b51f9
  > Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>

  Fixes: #1296
  Change-Id: I4c6ad13194d4fc5c7705e35bf9a27fce504b51f9
  Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* tests: skip tests on absence of reflink in xfs | Pranith Kumar K | 2020-06-10 | 3 | -10/+12
  Fixes: #1223
  Change-Id: I36cb72d920ffd77405051546615c5262c392daef
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* doc: Added release 7.6 notes (tag: v7.6) | Rinku Kothiya | 2020-05-18 | 1 | -0/+27
  Fixes: #1247
  Change-Id: Ic9571066aa83fc2142e3583ddf1f1b885e83470c
  Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
* md-cache: fix several NULL dereferences | Xavi Hernandez | 2020-05-09 | 1 | -66/+129
  This patch includes the following CID from Coverity Scan:
  * 1425196
  * 1425197
  * 1425198
  * 1425199
  * 1525200

  Change-Id: Iddcfea449d3dd56d4dfcc39f4c3c608518e611e4
  Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
  Updates: #1060
* afr: event gen changes | Ravishankar N | 2020-05-09 | 4 | -82/+29
  The general idea of the changes is to prevent resetting event generation to zero in the inode ctx, since event gen is something that should follow 'causal order'.

  Change #1: For a read txn, in inode refresh cbk, if event_generation is found to be zero, we are failing the read fop. This is not needed because a change in event gen is only a marker for the next inode refresh to happen and should not be taken into account by the current read txn.

  Change #2: The event gen being zero above can happen if there is a racing lookup, which resets event gen (in afr_lookup_done) if there are non-zero afr xattrs. The resetting is done only to trigger an inode refresh and a possible client side heal on the next lookup. That can be achieved by setting the need_refresh flag in the inode ctx. So replaced all occurrences of resetting event gen to zero with a call to afr_inode_need_refresh_set().

  Change #3: In both lookup and discover path, we are doing an inode refresh which is not required since all 3 essentially do the same thing: update the inode ctx with the good/bad copies from the brick replies. Inode refresh also triggers background heals, but I think it is okay to do it when we call refresh during the read and write txns and not in the lookup path.

  The .t tests which relied on inode refresh in lookup path to trigger heals are now changed to do a read txn so that inode refresh and the heal happen.

  Change-Id: Iebf39a9be6ffd7ffd6e4046c96b0fa78ade6c5ec
  Fixes: #1179
  Signed-off-by: Ravishankar N <ravishankar@redhat.com>
  Reported-by: Erik Jacobson <erik.jacobson at hpe.com>
  (cherry picked from commit f0fcd909ad4535b60c9208d4804ebe6afe421a09)
* md-cache: avoid clearing cache when not necessary | Xavi Hernandez | 2020-05-09 | 1 | -72/+93
  mdc_inode_xatt_set() blindly cleared current cache when dict was not NULL, even if there was no xattr requested.

  This patch fixes this by only calling mdc_inode_xatt_set() when we have explicitly requested something to cache.

  Change-Id: Idc91a4693f1ff39f7059acde26682ccc361b947d
  Fixes: #1140
  Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* gfapi: Suspend synctasks instead of blocking them | Soumya Koduri | 2020-05-09 | 3 | -2/+50
  There are certain conditions that block the current execution thread (like waiting on a mutex lock, a condition variable, or an I/O response). In such cases, if it is a synctask thread, we should suspend the task instead of blocking it (as done in SYNCOP using synctask_yield).

  This is to avoid a deadlock like the one described below:

  1) synctaskA sets fs->migration_in_progress to 1 and does I/O (LOOKUP)
  2) Other synctask threads wait for fs->migration_in_progress to be reset to 0 by synctaskA and are hence blocked
  3) but synctaskA cannot resume as all synctask threads are blocked on (2).

  Note: this same approach is already used by a few other components like syncbarrier etc.

  Change-Id: If90f870d663bb242c702a5b86ac52eeda67c6f0d
  Fixes: #1146
  Signed-off-by: Soumya Koduri <skoduri@redhat.com>
  (cherry picked from commit 55914f968d907ed747774da15285b42653afda61)
* fuse: degrade logging of write failure to fuse device | Csaba Henk | 2020-05-05 | 2 | -7/+80
  Problem: FUSE uses failures of communicating with /dev/fuse with various errnos to indicate in-kernel conditions to userspace. Some of these shouldn't be handled as an application error. Also, the standard POSIX errno descriptions should not be shown, as they are misleading in this context.

  Solution: When writing to the fuse device, the caller of the respective convenience routine can mask those errnos which don't qualify as an error for the application in that context, so that those are reported at DEBUG level.

  The possible non-standard errnos are reported with their POSIX name instead of their description to avoid confusion. (E.g. for ENOENT we don't log "no such file or directory", we log the literal "ENOENT".)

  Change-Id: I510158843e4b1d482bdc496c2e97b1860dc1ba93
  > updates: bz#1193929
  updates: #1000
  Signed-off-by: Csaba Henk <csaba@redhat.com>
  (cherry picked from commit 1166df1920dd9b2bd5fce53ab49d27117db40238)
* doc: Added release 7.5 notes (tag: v7.5) | Rinku Kothiya | 2020-04-16 | 1 | -0/+28
  Fixes: #1174
  Change-Id: Idca6d5dd2d069435df9a6882a6bef32fb6916bb9
  Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
* posix: log aio_error return codes in posix_fs_health_check | Mohit Agrawal | 2020-04-14 | 1 | -3/+2
  Problem: Sometimes a brick goes down because the health check thread fails, without logging the error codes returned by the aio system calls. As per the aio_error man page, it returns a positive error number if the asynchronous I/O operation failed.

  Solution: log aio_error return codes in the error message.

  > Change-Id: I2496b1bc16e602b0fd3ad53e211de11ec8c641ef
  > (Cherry picked from commit 032862fa3944fc7152140aaa13cdc474ae594a51)
  > (Reviewed on upstream link https://review.gluster.org/#/c/glusterfs/+/23284/)
  > Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>

  Change-Id: I2496b1bc16e602b0fd3ad53e211de11ec8c641ef
  Fixes: #1168
  Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* mount/fuse: Wait for 'mount' child to exit before dying | Pranith Kumar K | 2020-04-14 | 1 | -0/+27
  Problem: tests/bugs/protocol/bug-1433815-auth-allow.t fails sometimes because of a stale mount. This stale mount appears when the parent process dies without waiting for the child process which mounts the fuse fs to die.

  Fix: Wait for the mounting child process to die before dying.

  Fixes: #1152
  Change-Id: I8baee8720e88614fdb762ea822d5877973eef8dc
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* snap_scheduler: python3 compatibility and new test case | Sunny Kumar | 2020-04-14 | 2 | -1/+50
  Problem: "snap_scheduler.py init" command failing with the below traceback:

    [root@dhcp43-104 ~]# snap_scheduler.py init
    Traceback (most recent call last):
      File "/usr/sbin/snap_scheduler.py", line 941, in <module>
        sys.exit(main(sys.argv[1:]))
      File "/usr/sbin/snap_scheduler.py", line 851, in main
        initLogger()
      File "/usr/sbin/snap_scheduler.py", line 153, in initLogger
        logfile = os.path.join(process.stdout.read()[:-1], SCRIPT_NAME + ".log")
      File "/usr/lib64/python3.6/posixpath.py", line 94, in join
        genericpath._check_arg_types('join', a, *p)
      File "/usr/lib64/python3.6/genericpath.py", line 151, in _check_arg_types
        raise TypeError("Can't mix strings and bytes in path components") from None
    TypeError: Can't mix strings and bytes in path components

  Solution: Added the 'universal_newlines' flag to Popen to support backward compatibility.

  Added a basic test for snapshot scheduler.

  Backport of:
  > Upstream patch:
  > https://review.gluster.org/#/c/glusterfs/+/24257/
  > Change-Id: I78e8fabd866fd96638747ecd21d292f5ca074a4e
  > Fixes: #1134
  > Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
  > (cherry picked from commit a7d7ec066e56ac03bf252c26beb20fdc2c3b6772)

  Change-Id: I78e8fabd866fd96638747ecd21d292f5ca074a4e
  Fixes: #1134
  Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
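The failure and the fix can be reproduced in isolation with the sketch below. It is not the snap_scheduler.py code: the child process is a stand-in that prints a hypothetical log directory, and the helper name `read_log_dir` is invented for the example. Only `SCRIPT_NAME`, the `[:-1]` slicing and the `universal_newlines` flag mirror the traceback and the commit message.

```python
import os
import subprocess
import sys

SCRIPT_NAME = "snap_scheduler"


def read_log_dir(universal_newlines):
    """Run a stand-in child that prints a directory and return its output.

    Without universal_newlines, Python 3 returns bytes from the pipe;
    with it, the output is decoded to str.
    """
    process = subprocess.Popen(
        [sys.executable, "-c", "print('/var/log/glusterfs')"],
        stdout=subprocess.PIPE,
        universal_newlines=universal_newlines,
    )
    out, _ = process.communicate()
    return out[:-1]  # drop the trailing newline, as the original code does


# str output joins cleanly with the str file name.
logfile = os.path.join(read_log_dir(True), SCRIPT_NAME + ".log")
```

Calling `os.path.join(read_log_dir(False), SCRIPT_NAME + ".log")` on Python 3 raises the same TypeError as in the traceback, because a bytes path is mixed with a str component.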
* features/utime: Don't access frame after stack-wind | Pranith Kumar K | 2020-04-07 | 2 | -15/+52
  Problem: frame is accessed after stack-wind. This can lead to a crash if the cbk frees the frame.

  Fix: Use a new frame for the wind instead.

  Fixes: #832
  Change-Id: I64754609f1114b0bbd4d1336fa81a56f2cca6e03
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* write-behind: fix data corruption | Xavi Hernandez | 2020-04-07 | 3 | -2/+309
  There was a bug in write-behind that allowed a previous completed write to overwrite the overlapping region of data from a future write.

  Suppose we want to send three writes (W1, W2 and W3). W1 and W2 are sequential, and W3 writes at the same offset of W2:

      W2.offset = W3.offset = W1.offset + W1.size

  Both W1 and W2 are sent in parallel. W3 is only sent after W2 completes. So W3 should *always* overwrite the overlapping part of W2.

  Suppose write-behind processes the requests from 2 concurrent threads:

      Thread 1                    Thread 2

      <received W1>               <received W2>
      wb_enqueue_tempted(W1)
      /* W1 is assigned gen X */
                                  wb_enqueue_tempted(W2)
                                  /* W2 is assigned gen X */

                                  wb_process_queue()
                                    __wb_preprocess_winds()
                                      /* W1 and W2 are sequential and all
                                       * other requisites are met to merge
                                       * both requests. */
                                      __wb_collapse_small_writes(W1, W2)
                                      __wb_fulfill_request(W2)

                                    __wb_pick_unwinds() -> W2
                                    /* In this case, since the request is
                                     * already fulfilled, wb_inode->gen
                                     * is not updated. */

                                    wb_do_unwinds()
                                      STACK_UNWIND(W2)

                                  /* The application has received the
                                   * result of W2, so it can send W3. */
                                  <received W3>

                                  wb_enqueue_tempted(W3)
                                  /* W3 is assigned gen X */

                                  wb_process_queue()
                                    /* Here we have W1 (which contains
                                     * the conflicting W2) and W3 with
                                     * same gen, so they are interpreted
                                     * as concurrent writes that do not
                                     * conflict. */
                                    __wb_pick_winds() -> W3

                                    wb_do_winds()
                                      STACK_WIND(W3)

      wb_process_queue()
        /* Eventually W1 will be
         * ready to be sent */
        __wb_pick_winds() -> W1
        __wb_pick_unwinds() -> W1
          /* Here wb_inode->gen is
           * incremented. */

        wb_do_unwinds()
          STACK_UNWIND(W1)

        wb_do_winds()
          STACK_WIND(W1)

  So, as we can see, W3 is sent before W1, which shouldn't happen.

  The problem is that wb_inode->gen is only incremented for requests that have not been fulfilled but, after a merge, the request is marked as fulfilled even though it has not been sent to the brick. This allows future requests to be assigned to the same generation, so they could be internally reordered.

  Solution: Increment wb_inode->gen before any unwind, even if it's for a fulfilled request.

  Special thanks to Stefan Ring for writing a reproducer that has been crucial to identify the issue.

  Change-Id: Id4ab0f294a09aca9a863ecaeef8856474662ab45
  Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
  Fixes: #884
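The generation rule at the heart of this fix can be captured in a toy Python model. Nothing here is the write-behind code: `WbInode`, `enqueue`, `unwind` and the dict-based requests are hypothetical stand-ins, and the model assumes (as the commit message describes) that two queued requests with the same generation are treated as concurrent and may be reordered.

```python
class WbInode:
    """Toy model of the per-inode generation counter."""

    def __init__(self, bump_on_every_unwind):
        self.gen = 0
        # True models the fixed behaviour, False the buggy one.
        self.bump_on_every_unwind = bump_on_every_unwind

    def enqueue(self, name):
        """A new request is stamped with the current generation."""
        return {"name": name, "gen": self.gen, "fulfilled": False}

    def unwind(self, req):
        """Buggy code skipped the increment for already fulfilled requests."""
        if self.bump_on_every_unwind or not req["fulfilled"]:
            self.gen += 1


def w3_may_overtake_w1(bump_on_every_unwind):
    inode = WbInode(bump_on_every_unwind)
    w1 = inode.enqueue("W1")
    w2 = inode.enqueue("W2")
    w2["fulfilled"] = True   # W2 was merged into W1, not yet sent to the brick
    inode.unwind(w2)         # the application now believes W2 completed
    w3 = inode.enqueue("W3")
    # Same generation means the two are considered concurrent.
    return w3["gen"] == w1["gen"]
```

In this model `w3_may_overtake_w1(False)` reports that W3 shares W1's generation, while `w3_may_overtake_w1(True)` places W3 in a later generation so it cannot be sent ahead of the merged W1.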
* utime: resolve an issue of permission denied logs | Amar Tumballi | 2020-04-07 | 2 | -1/+12
  In case where uid is not set to be 0, there are possible errors from acl xlator. So, set `uid = 0;` with pid indicating this is set from UTIME activity.

  The message "E [MSGID: 148002] [utime.c:146:gf_utime_set_mdata_setxattr_cbk] 0-dev_SNIP_data-utime: dict set of key for set-ctime-mdata failed [Permission denied]" repeated 2 times between [2019-12-19 21:27:55.042634] and [2019-12-19 21:27:55.047887]

  Change-Id: Ieadf329835a40a13ac0bf908dac776e66954466c
  Updates: #832
  Signed-off-by: Amar Tumballi <amar@kadalu.io>
  (cherry picked from commit eb916c057036db8289b41265797e5dce066d1512)
* features/shard: Fix crash during shards cleanup in error cases | Krutika Dhananjay | 2020-04-07 | 1 | -2/+9
  A crash is seen during a reattempt to clean up shards in background upon remount. And this happens even on remount (which means a remount is no workaround for the crash).

  In such a situation, the in-memory base inode object will not be existent (new process, non-existent base shard). So local->resolver_base_inode will be NULL.

  In the event of an error (in this case, of space running out), the process would crash at the time of logging the error in the following line:

      gf_msg(this->name, GF_LOG_ERROR, local->op_errno, SHARD_MSG_FOP_FAILED,
             "failed to delete shards of %s",
             uuid_utoa(local->resolver_base_inode->gfid));

  Fixed that by using local->base_gfid as the source of gfid when local->resolver_base_inode is NULL.

  Change-Id: I0b49f2b58becd0d8874b3d4b14ff8d92a89d02d5
  Fixes: #1127
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
  (cherry picked from commit cc43ac8651de9aa508b01cb259b43c02d89b2afc)
* afr: mark pending xattrs as a part of metadata heal | Ravishankar N | 2020-04-07 | 2 | -1/+120
  ...if pending xattrs are zero for all children.

  Problem: If there are no pending xattrs and a metadata heal needs to be performed, it is possible that we end up with xattrs inadvertently deleted from all bricks, as explained in the BZ.

  Fix: After picking one among the sources as the good copy, mark pending xattrs on all sources to blame the sinks. Now even if this metadata heal fails midway, a subsequent heal will still choose one of the valid sources that it picked previously.

  Updates: #1067
  Change-Id: If1b050b70b0ad911e162c04db4d89b263e2b8d7b
  Signed-off-by: Ravishankar N <ravishankar@redhat.com>
  (cherry picked from commit 2d5ba449e9200b16184b1e7fc84cabd015f1f779)
* open-behind: fix missing fd reference | Xavi Hernandez | 2020-04-07 | 1 | -11/+16
  Open behind was not keeping any reference on fds pending to be opened. This made it possible for a concurrent close and an entry fop (unlink, rename, ...) to cause destruction of the fd while it was still being used.

  Change-Id: Ie9e992902cf2cd7be4af1f8b4e57af9bd6afd8e9
  Fixes: #1028
  Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* doc: Added release 7.4 notes (tag: v7.4) | Rinku Kothiya | 2020-03-18 | 1 | -0/+31
  Fixes: #1124
  Change-Id: I02fbb97ea7ab7617086ebcece07213e67cb1aa67
  Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
* cluster/ec: Change handling of heal failure to avoid crash | Ashish Pandey | 2020-03-17 | 2 | -13/+13
  Problem: ec_getxattr_heal_cbk was called with NULL as its second argument when heal was failing. This function was dereferencing the "cookie" argument, which caused a crash.

  Solution: Cookie is changed to carry the value that was supposed to be stored in fop->data, so even when fop is NULL in the error case, there won't be any NULL dereference.

  Thanks to Xavi for the suggestion about the fix.

  Change-Id: I0798000d5cadb17c3c2fbfa1baf77033ffc2bb8c
  updates: #1061
* glusterd: Brick process fails to come up with brickmux on | Vishal Pandey | 2020-03-17 | 2 | -7/+68
  Issue:
  1- In a cluster of 3 nodes N1, N2, N3, create 3 volumes vol1, vol2, vol3 with 3 bricks (one from each node)
  2- Set cluster.brick-multiplex on
  3- Start all 3 volumes
  4- Check if all bricks on a node are running on the same port
  5- Kill N1
  6- Set performance.readdir-ahead for volumes vol1, vol2, vol3
  7- Bring N1 up and check volume status
  8- All brick processes are not running on N1.

  Root Cause - Since there is a diff in volfile versions in N1 as compared to N2 and N3, glusterd_import_friend_volume() is called. glusterd_import_friend_volume() copies the new_volinfo and deletes old_volinfo and then calls glusterd_start_bricks(). glusterd_start_bricks() looks for the volfiles and sends an rpc request to glusterfs_handle_attach(). Now, since the volinfo has been deleted by glusterd_delete_stale_volume() from the priv->volumes list before glusterd_start_bricks(), and glusterd_create_volfiles_and_notify_services() and glusterd_list_add_order() are called after glusterd_start_bricks(), the attach RPC req gets an empty volfile path and that causes the brick to crash.

  Fix - Call glusterd_list_add_order() and glusterd_create_volfiles_and_notify_services() before the glusterd_start_bricks() call is made in glusterd_import_friend_volume().

  > Change-Id: Idfe0e8710f7eb77ca3ddfa1cabeb45b2987f41aa
  > Bug: bz#1773856
  > Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>

  (cherry picked from commit 45e81aae791da9d013aba2286af44826227c05ec)

  Change-Id: Idfe0e8710f7eb77ca3ddfa1cabeb45b2987f41aa
  fixes: bz#1808964
  Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* events: fix IPv6 memory corruption | Xavi Hernandez | 2020-03-17 | 1 | -41/+15
  When an event was generated and the target host was resolved to an IPv6 address, there was a memory overflow when that address was copied to a fixed IPv4 structure (IPv6 addresses are longer than IPv4 ones).

  This fix correctly handles IPv4 and IPv6 addresses returned by getaddrinfo().

  Backport of:
  > Change-Id: I5864a0c6e6f1b405bd85988529570140cf23b250
  > Fixes: bz#1790870
  > Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>

  Change-Id: I5864a0c6e6f1b405bd85988529570140cf23b250
  Fixes: #1030
  Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* glusterd: stop stale bricks during handshaking in brick mux mode | Atin Mukherjee | 2020-03-16 | 4 | -9/+55
  This patch addresses two problems:

  1. During friend handshaking, if a volume is imported due to a change in the version, the old bricks were not stopped, which would lead to a situation where bricks run with old volfiles.

  2. As part of attaching the shd service in glusterd_attach_svc, the volume for which we're attempting to attach a shd service might become stale and be in the process of deletion. Hence, on every retry (if the rpc connection isn't ready), check for the existence of the volume and only then attempt the further attach request.

  patch on master: https://review.gluster.org/#/c/glusterfs/+/23042/

  > Bug: bz#1733425
  > Change-Id: I6bac6b871f7e31cb5bf277db979289dec196a03e
  > Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
  > Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>

  fixes: bz#1812849
  Change-Id: I6bac6b871f7e31cb5bf277db979289dec196a03e
  Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* multiple: fix bad type cast | Xavi Hernandez | 2020-03-16 | 6 | -21/+42
  When using inode_ctx_get() or inode_ctx_set(), a 'uint64_t *' is expected. In many cases, the value to retrieve or store is a pointer, which will be of smaller size on some architectures (for example 32-bit ones). In this case, directly passing the address of the pointer cast to a 'uint64_t *' is wrong and can cause memory corruption.

  Backport of:
  > Change-Id: Iae616da9dda528df6743fa2f65ae5cff5ad23258
  > Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
  > Fixes: bz#1785611

  Change-Id: Iae616da9dda528df6743fa2f65ae5cff5ad23258
  Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
  Fixes: bz#1785323
* cluster/afr: fix race when bricks come up | Xavi Hernandez | 2020-03-16 | 3 | -6/+9
  There was a problem when self-heal was sending lookups at the same time that one of the bricks was coming up. In this case there was a chance that the number of 'up' bricks changed in the middle of sending the requests to subvolumes, which caused a discrepancy between the expected number of replies and the actual number of sent requests.

  Because of this discrepancy, AFR continued executing requests before all requests were complete. Eventually, the frame of the pending request was destroyed when the operation terminated, causing a use-after-free issue when the answer was finally received.

  In theory the same thing could happen in the reverse way, i.e. AFR tries to wait for more replies than sent requests, causing a hang.

  Backport of:
  > Change-Id: I7ed6108554ca379d532efb1a29b2de8085410b70
  > Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
  > Fixes: bz#1808875

  Change-Id: I7ed6108554ca379d532efb1a29b2de8085410b70
  Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
  Fixes: bz#1809438
* eventsapi: Set IPv4/IPv6 family based on input IP | Aravinda VK | 2020-03-16 | 1 | -1/+4
  server.sin_family was set to AF_INET while creating the socket connection; this failed if the input address was IPv6 (`::1`). With this patch, sin_family is set by reading the ai_family of the `getaddrinfo` result.

  Backport of:
  > Fixes: bz#1752330
  > Change-Id: I499f957b432842fa989c698f6e5b25b7016084eb
  > Signed-off-by: Aravinda VK <avishwan@redhat.com>

  Fixes: bz#1807785
  Change-Id: I499f957b432842fa989c698f6e5b25b7016084eb
  Signed-off-by: Aravinda VK <avishwan@redhat.com>
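The same idea, taking the address family from the resolver result instead of hard-coding AF_INET, looks like this with Python's `socket` module. This is an illustrative sketch rather than the eventsapi C code: the helper names, the datagram socket type and the port number are assumptions made for the example.

```python
import socket

EXAMPLE_PORT = 9999  # hypothetical port, used only for this illustration


def resolve(host, port=EXAMPLE_PORT):
    """Resolve a numeric address and return (family, sockaddr).

    AI_NUMERICHOST keeps the lookup local (no DNS query is made).
    """
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_DGRAM,
                               flags=socket.AI_NUMERICHOST)
    family, _socktype, _proto, _canonname, sockaddr = infos[0]
    return family, sockaddr


def open_socket(host, port=EXAMPLE_PORT):
    """Create a socket whose family matches the resolved address."""
    family, sockaddr = resolve(host, port)
    return socket.socket(family, socket.SOCK_DGRAM), sockaddr
```

`resolve("127.0.0.1")` yields `socket.AF_INET` and `resolve("::1")` yields `socket.AF_INET6`, so the socket created afterwards always matches the address it will talk to.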
* cluster/ec: skip updating ctx->loc again when ec_fix_open/opendir | Kinglong Mee | 2020-03-16 | 2 | -10/+14
  The ec_manager_open/opendir memsets ctx->loc, which causes a memory/inode leak, and ec_fheal uses ctx->loc outside of fd->lock, so loc_copy may copy bad data while it is being memset.

  This patch skips updating ctx->loc when it is already initialized. With it, ctx->loc is filled once and never updated.

  Change-Id: I3bf5ffce4caf4c1c667f7acaa14b451d37a3550a
  fixes: bz#1806843
  Signed-off-by: Kinglong Mee <mijinlong@horiscale.com>
* afr: prevent spurious entry heals leading to gfid split-brain | Ravishankar N | 2020-02-25 | 7 | -29/+69
  Problem: In a hyperconverged setup with granular-entry-heal enabled, if a file is recreated while one of the bricks is down, and an index heal is triggered (with the brick still down), entry-self heal was doing a spurious heal with just the 2 good bricks. It was doing a post-op leading to removal of the filename from .glusterfs/indices/entry-changes as well as erroneous setting of afr xattrs on the parent. When the brick came up, the xattrs were cleared, resulting in the renamed file not getting healed and leading to gfid split-brain and EIO on the mount.

  Fix: Proceed with entry heal only when shd can connect to all bricks of the replica, just like in data and metadata heal.

  fixes: bz#1804591
  Change-Id: I916ae26ad1fabf259bc6362da52d433b7223b17e
  Signed-off-by: Ravishankar N <ravishankar@redhat.com>
  (cherry picked from commit 06453d77d056fbaa393a137ca277a20e38d2f67e)
* core: fix memory pool management races | Xavi Hernandez | 2020-02-25 | 5 | -105/+137
  Objects allocated from a per-thread memory pool keep a reference to it to be able to return the object to the pool when not used anymore. The object holding this reference can have a long life cycle that could survive a glfs_fini() call.

  This means that it's unsafe to destroy memory pools from glfs_fini().

  Another side effect of destroying memory pools from glfs_fini() is that the TLS variable that points to one of those pools cannot be reset for all alive threads. This means that any attempt to allocate memory from those threads will access already free'd memory, which is very dangerous.

  To fix these issues, mem_pools_fini() doesn't destroy pool lists anymore. They should be destroyed when the library is unloaded or the process is terminated, but this cannot be done right now because gluster doesn't stop other threads before calling exit(), which could cause some races.

  This patch is the backport of 2 master patches:
  > Change-Id: Ib189a5510ab6bdac78983c6c65a022e9634b0965
  > Fixes: bz#1801684
  > Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
  >
  > Change-Id: Id7cfb4407fcf208e28f03a7c3cdc3ef9c1f3bf9b
  > Fixes: bz#1801684
  > Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>

  Change-Id: Id7cfb4407fcf208e28f03a7c3cdc3ef9c1f3bf9b
  Fixes: bz#1805668
  Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* cluster/thin-arbiter: Wait for TA connection before ta-file lookup | Ashish Pandey | 2020-02-18 | 1 | -19/+21
  Problem: When we mount a ta volume, as soon as 2 data bricks are connected we consider that the mount is done and then send a lookup/create on the ta file on the ta node. However, the connection with the ta node might not have been completed yet. Due to this delay, the ta replica id file will not be created and we will see an ENOTCONN error in the log file if we do a lookup.

  Solution: As we know that the ta node could have a higher latency, we should wait a reasonable time for the connection to happen before sending lookup/create on the replica id file.

  fixes: bz#1804058
  Change-Id: I36f90865afe617e4e84cee57fec832a16f5dd6cc
  (cherry picked from commit a7fa54ddea3fe429f143b37e4de06a93b49d776a)
* doc: Added release 7.3 notes (tag: v7.3) | Rinku Kothiya | 2020-02-17 | 1 | -0/+34
  Fixes: bz#1803713
  Change-Id: I0ff6c3152624b8c8f8f76057a1948aed2dc3cec0
  Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
* volgen: make thin-arbiter name unique in 'pending-xattr' option | Amar Tumballi | 2020-02-17 | 2 | -2/+13
  The thin-arbiter module uses the translator's 'pending-xattr' name as the name of the file that gets created on the thin-arbiter node. By making this unique, we can host a single thin-arbiter node for multiple clusters.

  Updates: #763
  Change-Id: Ib3c732e7e04e6dba229e71ae3e64f1f3cb6d794d
  Signed-off-by: Amar Tumballi <amar@kadalu.io>
  (cherry picked from commit 8db8202f716fd24c8c52f8ee5f66e169310dc9b1)
* tests: Fix spurious self-heald.t failure  Pranith Kumar K  2020-02-13  2  -34/+21
|
| Problem:
| heal-info code assumes that all indices in the xattrop directory
| definitely need heal. There is one corner case: the very first xattrop
| on a file adds the gfid to the 'xattrop' index in the fop path, and the
| gfid is removed again in the _cbk path because, in the success case,
| the fop is a zero-xattr xattrop. These gfids could be read by heal-info
| in between and shown as needing heal.
|
| Fix:
| Check the pending flag to see whether the file definitely needs heal,
| instead of relying on which index is being crawled at the moment.
|
| fixes: bz#1802449
| Change-Id: I79f00dc7366fedbbb25ec4bec838dba3b34c7ad5
| Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
| (cherry picked from commit d27df94016b5526c18ee964d4a47508326329dda)
* server: Mount fails after reboot 1/3 gluster nodes  Mohit Agrawal  2020-02-10  3  -16/+29
|
| Problem: When one server node (1x3) comes up after a reboot, the client
|          is unmounted. The client is unmounted because it gets an
|          AUTH_FAILED event and calls fini for the graph. The client
|          gets AUTH_FAILED because the brick is not attached to a graph
|          at that moment.
|
| Solution: To avoid unmounting the client graph, return an ENOENT error
|           from the server if the brick is not attached to the server
|           when clients are authenticated.
|
| > Credits: Xavi Hernandez <xhernandez@redhat.com>
| > Change-Id: Ie6fbd73cbcf23a35d8db8841b3b6036e87682f5e
| > Fixes: bz#1793852
| > Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
| > (cherry picked from commit
| > f6421dff22a6ddaf14134f6894deae219948c89d)
|
| Change-Id: Ie6fbd73cbcf23a35d8db8841b3b6036e87682f5e
| Fixes: bz#1794019
| Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* rpc: Cleanup SSL specific data at the time of freeing rpc object  l17zhou  2020-02-10  2  -5/+40
|
| Problem: When an rpc object is cleaned up, its SSL specific data is not
|          freed, so the data is leaked.
|
| Solution: To avoid the leak, clean up the SSL specific data when the
|           rpc object is cleaned up.
|
| > Credits: l17zhou <cynthia.zhou@nokia-sbell.com.cn>
| > Fixes: bz#1768407
| > Change-Id: I37f598673ae2d7a33c75f39eb8843ccc6dffaaf0
| > (cherry picked from commit
| > > 54ed71dba174385ab0d8fa415e09262f6250430c)
|
| Change-Id: I37f598673ae2d7a33c75f39eb8843ccc6dffaaf0
| Fixes: bz#1795540
| Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* Fix possible resource leaks.  Xi Jinyu  2020-02-10  1  -0/+3
|
| xlators/features/quota/src/quota.c, quota_log_usage function:
| the quota_log_helper() function allocates memory for path through
| inode_path(); this memory should be released with GF_FREE().
|
| Upstream Patch: https://review.gluster.org/#/c/glusterfs/+/24018/
|
| Backport of:
|
| > fixes: bz#1792707
| > Change-Id: I33143bdf272bf10837061df4a1b7b2fc146162d5
| > Signed-off-by: Xi Jinyu <xijinyu@cmss.chinamobile.com>
| > (cherry picked from commit 18549de12bcfafe4ac30fc2e11ad7a3f3c216b38)
|
| fixes: bz#1791154
| Change-Id: I33143bdf272bf10837061df4a1b7b2fc146162d5
| Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
* gf-event: Handle unix volfile-servers  Pranith Kumar K  2020-02-10  1  -1/+10
|
| Problem:
| The glfsheal program uses a unix-socket-based volfile server. In this
| case the volfile server is the path to the socket, but gf_event expects
| it to be a hostname in all cases. getaddrinfo therefore fails on the
| unix-socket path and events are not sent.
|
| Fix:
| In case of unix sockets, default to localhost.
|
| fixes: bz#1793085
| Change-Id: I60d27608792c29d83fb82beb5fde5ef4754bece8
| Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
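
As a rough illustration of the fix described above, the following Python sketch picks the host that events would be sent to. It is not the gf_event code itself, which is C; the helper name `event_host`, the `transport` argument and the socket path in the usage note are hypothetical and only mirror the behaviour stated in the commit message.

```python
def event_host(volfile_server, transport):
    """Return the host name to resolve when sending an event.

    For a unix-socket volfile server, `volfile_server` is a filesystem
    path, which a resolver such as getaddrinfo cannot look up, so fall
    back to localhost. For any other transport, use the server name.
    """
    if transport == "unix":
        return "localhost"
    return volfile_server
```

For example, `event_host("/path/to/glusterd.socket", "unix")` yields `"localhost"`, while `event_host("server1", "tcp")` yields `"server1"`.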
* geo-rep: Fix ssh-port validation  Sunny Kumar  2020-02-10  3  -1/+32
|
| If a non-standard ssh port is used, Geo-rep can be configured to use it
| through the ssh-port config option. The value should be non-negative
| and within the allowed port range. At present the option accepts
| negative values and values outside the allowed port range, which is
| incorrect.
|
| Many Linux kernels use the port range 32768 to 61000. IANA suggests
| it should be in the range 1 to 2^16 - 1, so keeping the same.
|
| $ gluster volume geo-replication master 127.0.0.1::slave config ssh-port -22
| geo-replication config updated successfully
| $ gluster volume geo-replication master 127.0.0.1::slave config ssh-port 22222222
| geo-replication config updated successfully
|
| This patch fixes the above issue and adds a few validations around it
| in the test cases.
|
| Upstream Patch: https://review.gluster.org/#/c/glusterfs/+/24035/
|
| Backport of:
|
| > Change-Id: I9875ab3f00d7257370fbac6f5ed4356d2fed3f3c
| > Fixes: bz#1792276
| > Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
| > (cherry picked from commit 485212e858bddd97573a3b2b811357b0d822005a)
|
| Change-Id: I9875ab3f00d7257370fbac6f5ed4356d2fed3f3c
| Fixes: bz#1793412
| Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
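
The range check described in the commit above (1 to 2^16 - 1) can be sketched in a few lines of Python. This is only an illustration of the rule, not the validation code added by the patch; the function name `validate_ssh_port` and the error messages are assumptions.

```python
def validate_ssh_port(value):
    """Return the port as an int, or raise ValueError if it is invalid.

    Accepts only integers in the range 1 to 2**16 - 1 (65535), the range
    the commit message attributes to IANA.
    """
    try:
        port = int(value)
    except (TypeError, ValueError):
        raise ValueError("ssh-port must be an integer: %r" % (value,))
    if not 1 <= port <= 2**16 - 1:
        raise ValueError("ssh-port out of range (1-65535): %d" % port)
    return port
```

Under this check, both values from the commit message, `-22` and `22222222`, are rejected, while `22` is accepted.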
* cli: duplicate defns of cli_default_conn_timeout and cli_ten_minutes_timeout  Kaleb S. KEITHLEY  2020-01-21  2  -2/+5
|
| Winter is coming. So is gcc-10.
|
| Compiling with gcc-10-20191219 snapshot reveals dupe defns of
| cli_default_conn_timeout and cli_ten_minutes_timeout in
| .../cli/src/cli.[ch] due to missing extern decl.
|
| There are many changes coming in gcc-10 described in
| https://gcc.gnu.org/gcc-10/changes.html
|
| compiling cli.c with gcc-9 we see:
|   ...
|         .quad   .LC88
|         .comm   cli_ten_minutes_timeout,4,4
|         .comm   cli_default_conn_timeout,4,4
|         .text
|   .Letext0:
|   ...
|
| and with gcc-10:
|   ...
|         .quad   .LC88
|         .globl  cli_ten_minutes_timeout
|         .bss
|         .align 4
|         .type   cli_ten_minutes_timeout, @object
|         .size   cli_ten_minutes_timeout, 4
|   cli_ten_minutes_timeout:
|         .zero   4
|         .globl  cli_default_conn_timeout
|         .align 4
|         .type   cli_default_conn_timeout, @object
|         .size   cli_default_conn_timeout, 4
|   cli_default_conn_timeout:
|         .zero   4
|         .text
|   .Letext0:
|   ...
|
| which is reflected in the .o file as (gcc-9):
|   ...
|   0000000000000004 C cli_ten_minutes_timeout
|   0000000000000004 C cli_default_conn_timeout
|   ...
|
| and (gcc-10):
|   ...
|   0000000000000020 B cli_ten_minutes_timeout
|   0000000000000024 B cli_default_conn_timeout
|   ...
|
| See nm(1) and ld(1) for a description of C (common) and B (BSS) and
| how they are treated by the linker.
|
| Note: gcc-10 will land in Fedora-32!
|
| Change-Id: I54ea485736a4910254eeb21222ad263721cdef3c
| Fixes: bz#1793492
| Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
* doc: Added release 7.2 notes  v7.2  Rinku Kothiya  2020-01-15  1  -0/+30
|
| Fixes: bz#1791177
| Change-Id: I0d732f82217ee4fecf1d32df4be4b7492022c0ca
| Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
* tools/glusterfind: Remove an extra argument  Shwetha K Acharya  2020-01-14  1  -1/+1
|
| Backport of:
|
| > Upstream Patch: https://review.gluster.org/#/c/glusterfs/+/24011/
| > fixes: bz#1790748
| > Change-Id: I1cb12c975142794139456d0f8e99fbdbb03c53a1
| > Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
| > (cherry picked from commit d73872e764214f8071c8915536a75bdac1e5e685)
|
| fixes: bz#1790846
| Change-Id: I1cb12c975142794139456d0f8e99fbdbb03c53a1
| Signed-off-by: Sunny Kumar <sunkumar@redhat.com>