summaryrefslogtreecommitdiffstats
path: root/xlators/cluster
Commit message (Collapse)AuthorAgeFilesLines
* dht-hashfn.c: remove a strlen()Yaniv Kaul2020-01-141-16/+19
| | | | | | | | | | | We already have the length of the name, or when we munge it, we can return the length of it instead of strlen() again. Also, reduce a bit the code under the lock. Change-Id: I0141b0725ed1a4134d8d9f81ed1187b551b038b5 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* multiple xlators: reduce key lengthYaniv Kaul2020-01-141-3/+3
| | | | | | | | | | | | | | | In many cases, we were freely allocating long keys with no need. Smaller char arrays are just fine almost anywhere, so just went ahead and looked where they we can use smaller ones. In some cases, annotated the functions as static and the prefixes passed as const as it was easier to read and understand. Where relevant, converted the dict functions to use known key length. Change-Id: I882ab33ea20d90b63278336cd1370c09ffdab7f2 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* dht-rename.c: fix Coverity issues 1397018/7 - strcat into uninitialized valueYaniv Kaul2020-01-101-0/+4
| | | | | | | | | initialize both src and dst if they were not initialized already. fixes: CID#1397018 Change-Id: Ic91954423953e8bf24eaa11fc2798c554f304d28 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* afr: expose cluster.optimistic-change-log to CLI.Ravishankar N2020-01-071-0/+2
| | | | | | | | | | | This volume option was not made avaialble to `gluster volume set` CLI. Reported-by: epolakis(https://github.com/kinsu) in https://github.com/gluster/glusterfs/issues/781 fixes: bz#1787554 Change-Id: I7141bdd4e53ee99e22b354edde8d023bfc0b2cd7 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* afr: simplify afr_has_quorum()Yaniv Kaul2020-01-023-29/+10
| | | | | | | | | | | | | 1. Perform AFR_COUNT() once, in afr_has_quorum() and pass the result to afr_lookup_has_quorum() 2. Simplify afr_lookup_has_quorum() - pass less parameters to it. (Via the change in item 1 above). 3. Make afr_is_add_replica_mount_lookup_on_root() static function. 4. Remove dead code - afr_decide_heal_info() which was not used. Change-Id: If9168cd01e22788a0e60b91e315787d2aa60e97b updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* Avoid buffer overwrite due to uuid_utoa() misuseDmitry Antipov2019-12-271-2/+3
| | | | | | | | | | | | | | | Code like: f(..., uuid_utoa(x), uuid_utoa(y)); is not valid (causes undefined behaviour) because uuid_utoa() uses the only static thread-local buffer which will be overwritten by the subsequent call. All such cases should be converted to use uuid_utoa_r() with explicitly specified buffer. Change-Id: I5e72bab806d96a9dd1707c28ed69ca033b9c8d6c Updates: bz#1193929 Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
* afr_inode/xlator: structure loggingyatipadia2019-12-203-57/+159
| | | | | | | | convert gf_msg() into gf_smsg() Change-Id: I8f5b7bbb9caa78902b06f67257502b67adab7405 Updates: #657 Signed-off-by: yatipadia <ypadia@redhat.com>
* DHT - Reduce methods scope (dht-common.c)Barak Sason Rofman2019-12-173-804/+708
| | | | | | | | | | | Methods that should have been static were defined as global, and the other way around. This patch fixes the issue in order to enforce encapsulation. updates: bz#1776757 Change-Id: I3eb5781849c5e597c1dd347e03f356c00db62a39 Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
* dht-common.h/dht-helper.c: exctract the LOCK() from DHT_UPDATE_TIMEYaniv Kaul2019-12-172-20/+20
| | | | | | | | | | | | | Currently, the code (and only place) that is using this macro is in dht_inode_ctx_time_update() where it is called 3 times in a row, which is essentially 3 cycles of LOCK/UNLOCK on the same lock. Instead, extract the LOCK()/UNLOCK() part of the macro and wrap those calls with it. Change-Id: I6312b985e3d97517857b55f342440accc4063db6 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* afr: make heal info locklessRavishankar N2019-12-123-203/+211
| | | | | | | | | | | | | | | | | | | Changes in locks xlator: Added support for per-domain inodelk count requests. Caller needs to set GLUSTERFS_MULTIPLE_DOM_LK_CNT_REQUESTS key in the dict and then set each key with name 'GLUSTERFS_INODELK_DOM_PREFIX:<domain name>'. In the response dict, the xlator will send the per domain count as values for each of these keys. Changes in AFR: Replaced afr_selfheal_locked_inspect() with afr_lockless_inspect(). Logic has been added to make the latter behave same as the former, thus not breaking the current heal info output behaviour. fixes: bz#1774011 Change-Id: Ie9e83c162aa77f44a39c2ba7115de558120ada4d Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* afr/self-heald - Missing error logsBarak Sason Rofman2019-12-102-24/+77
| | | | | | | | | | As a follow up on https://review.gluster.org/#/c/glusterfs/+/23749/, adding error logging for the entire method. In addition, converted logging to structured logging in the method. Fixes: bz#1778457 Change-Id: I1f412159e6849d6f6ddbde53ec4a85ad709bbdf4 Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
* cluster/dht: Add comments to codeN Balachandran2019-12-102-2/+14
| | | | | | Change-Id: Ieb7531af19ae89fb8a8387e81663c7f157b10c02 Updates: bz#1765421 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/afr - coverity issue fixBarak Sason Rofman2019-11-281-0/+5
| | | | | | | | | | | Added a log for a failure in order to avoid "unused variable" coverity issue. fixes: CID#1274209 Change-Id: Ibc6b0ab4bdff482096e42e88fd4c8c7eadfeeadb Updates: bz#789278 Signed-off-by: Barak Sason Rofman <bsasonro@redhat.com>
* Revert "afr: make heal info lockless"Ravishankar N2019-11-283-178/+68
| | | | | | | | This reverts commit fce5f68bc72d448490a0d41be494ac54a9181b3c. I merged the wrong patch by mistake! Hence reverting it. updates: bz#1774011 Change-Id: Id7d6ed1d727efc02467c8a9aea3374331261ebd5
* afr: make heal info locklessRavishankar N2019-11-283-68/+178
| | | | | | | | | | | | | | | | | | | Changes in locks xlator: Added support for per-domain inodelk count requests. Caller needs to set GLUSTERFS_MULTIPLE_DOM_LK_CNT_REQUESTS key in the dict and then set each key with name 'GLUSTERFS_INODELK_DOM_PREFIX:<domain name>'. In the response dict, the xlator will send the per domain count as values for each of these keys. Changes in AFR: Replaced afr_selfheal_locked_inspect() with afr_lockless_inspect(). Logic has been added to make the latter behave same as the former, thus not breaking the current heal info output behaviour. fixes: bz#1774011 Change-Id: I9ae08ce768b39aeb6ee230207b5b7fa744176952 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* cluster/dht: Add comments to the codeN Balachandran2019-11-161-12/+75
| | | | | | | | | Add comments to the code to explain what is being done and why. Change-Id: I50831d7bd4bb73e75f6cda05fafaeb5a8619baae Updates: bz#1765421 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* afr: fix log floodingRavishankar N2019-11-162-1/+4
| | | | | | | | | | | | | | | | | | | | | | | Commit "ccf33e789 - dict.c: remove redundant checks" removed some NULL checks in certain dict functions. This caused flooding of fuse mount logs when I/O was done on the mount on a replica volume: Message: W [dict.c:1478:dict_get_with_refn] (-->/usr/local/lib/libglusterfs.so.0(dict_get_uint32+0x4d) [0x7ff9121ec963] -->/usr/local/lib/libglusterfs.so.0(dict_get_with_ref+0x90) [0x7ff9121eb93f] -->/usr/local/lib/libglusterfs.so.0(+0x229be) [0x7ff9121eb9be] ) 0-dict: dict OR key (glusterfs.lk.lkmode) is NULL [Invalid argument] Fix: In the relevant AFR functions, check that dict is not NULL before trying to perform operations on it. See bug description for more details. fixes: bz#1772006 Change-Id: I30c89c0b5d6c80cc86a6047aae70127769412120 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* cluster/dht: Don't skip entries with invalid statN Balachandran2019-11-131-5/+4
| | | | | | | | | Don't strip out entries with invalid stats in dht_readdirp_cbk. Change-Id: I136ab342762d020a3c0f43e51e0090aed2af4120 Fixes: bz#1769754 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/ec: Change handling of heal failure to avoid crashAshish Pandey2019-11-042-13/+13
| | | | | | | | | | | | | | | | | Problem: ec_getxattr_heal_cbk was called with NULL as second argument in case heal was failing. This function was dereferencing "cookie" argument which caused crash. Solution: Cookie is changed to carry the value that was supposed to be stored in fop->data, so even in the case when fop is NULL in error case, there won't be any NULL dereference. Thanks to Xavi for the suggestion about the fix. Change-Id: I0798000d5cadb17c3c2fbfa1baf77033ffc2bb8c fixes: bz#1729085
* afr: lock healing changesRavishankar N2019-10-307-36/+849
| | | | | | | | | | | | | | | | | | | Implements lock healing for gluster-block fencing use case. If mandatory lock is enabled: - Add domain lock/unlock to afr_lk fop. - Maintain a list of locks to be healed in afr_private_t. - Add lock to the list if afr_lk(F_SETLK or F_SETLKW) was sucessful. - Remove it from the list during afr_lk(F_UNLCK). - On child_down, mark lock as needing heal on that child. If lock is lost on quorum no. of bricks, remove it from the list and mark fd bad. - For fds marked as bad, fail the subsequent fd based fops. - On parent up, traverse the list and heal the locks IFF the client is the lk owner and has quorum. (shd does not heal any locks). updates: #613 Change-Id: I03c46ceaea30f5e6236d5ec13f71d843d827f1bc Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* cluster/afr: Take a copy of xattr-reqPranith Kumar K2019-10-301-2/+7
| | | | | | | | | | Afr adds its own xattrs to the req, so it should take a copy of the dictionary to prevent parent xlator re-using the modified xattr-req to another subvolume fixes: bz#1765155 Change-Id: I268e2dbd1b12323135d369e90a22a8bdde2cf7c2 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* dht: Rebalance causing IO Error - File descriptor in bad stateMohit Agrawal2019-10-155-17/+116
| | | | | | | | | | | | | | | | Problem : When a file is migrated, dht attempts to re-open all open fds on the new cached subvol. Earlier, if dht had not opened the fd, the client xlator would be unable to find the remote fd and would fall back to using an anon fd for the fop. That behavior changed with https://review.gluster.org/#/c/glusterfs/+/15804, causing fops to fail with EBADFD if the fd was not available on the cached subvol. The client xlator returns EBADFD if the remote fd is not found but dht only checks for EBADF before re-opening fds on the new cached subvol. Solution: Handle EBADFD at dht code path to avoid the issue Change-Id: I43c51995cdd48d05b12e4b2889c8dbe2bb2a72d8 Fixes: bz#1758579
* cluster/afr: Add afr_seek to fops tablePranith Kumar K2019-10-142-0/+4
| | | | | | fixes: bz#1760189 Change-Id: Iffbf8d6f4c50b8e2de8364658697bdbe96549f5d Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* Multiple files: make root gfid a static variableYaniv Kaul2019-10-144-15/+4
| | | | | | | | | | | | | In many places we use it, compare to it, etc. It could be a static variable, as it really doesn't change. I think it's better than initializing to 0 and then doing gfid[15] = 1 or other tricks. I think there are additional oppportunuties to make more variables static. This is an attempt at an easy one. Change-Id: I7f23a30a94056d8f043645371ab841cbd0f90d19 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* afr: align structsYaniv Kaul2019-10-142-138/+138
| | | | | | | | | | squash >50 warnings on padding of structs in afr structures. The warnings were found by manually added '-Wpadded' to the GCC command line. Change-Id: I961fbdeb33715cedf3dd10db8e4f8ef40cd3e867 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* cluster/afr: Heal entries when there is a source & no healed_sinkskarthik-us2019-10-091-0/+15
| | | | | | | | | | | | | | | | | | | | Problem: In a situation where B1 blames B2, B2 blames B1 and B3 doesn't blame anything for entry heal, heal will not complete even though we have clear source and sinks. This will happen because while doing afr_selfheal_find_direction() only the bricks which are blamed by non-accused bricks are considered as sinks. Later in __afr_selfheal_entry_finalize_source() when it tries to mark all the non-sources as sinks it fails to do so because there won't be any healed_sinks marked, no witness present and there will be a source. Fix: If there is a source and no healed_sinks, then reset all the locked sources to 0 and healed sinks to 1 to do conservative merge. Change-Id: If40d8bc95d52a52b2730f55bdcf135109b421548 Fixes: bz#1749322 Signed-off-by: karthik-us <ksubrahm@redhat.com>
* afr: support split-brain CLI for replica 3Ravishankar N2019-10-091-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | Ever since we added quorum checks for lookups in afr via commit bd44d59741bb8c0f5d7a62c5b1094179dd0ce8a4, the split-brain resolution commands would not work for replica 3 because there would be no readables for the lookup fop. The argument was that split-brains do not occur in replica 3 but we do see (data/metadata) split-brain cases once in a while which indicate that there are a few bugs/corner cases yet to be discovered and fixed. Fortunately, commit 8016d51a3bbd410b0b927ed66be50a09574b7982 added GF_CLIENT_PID_GLFS_HEALD as the pid for all fops made by glfsheal. If we leverage this and allow lookups in afr when pid is GF_CLIENT_PID_GLFS_HEALD, split-brain resolution commands will work for replica 3 volumes too. Likewise, the check is added in shard_lookup as well to permit resolving split-brains by specifying "/.shard/shard-file.xx" as the file name (which previously used to fail with EPERM). Change-Id: I3c543dea79caf7cfbc1633e9089cb1cdd2538ba9 Fixes: bz#1756938 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* afr: replace afr_frame_return() when possible with direct callYaniv Kaul2019-10-075-15/+9
| | | | | | | | | | | | If you are already under lock, just decrement the call count directly instead of removing the lock, re-taking the lock and decrementing. Implements https://github.com/gluster/glusterfs/issues/728 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Change-Id: I3fa20b4651fbdb826655c5a03baeed46e99b5487
* cluster/dht: Correct fd processing loopN Balachandran2019-10-021-22/+62
| | | | | | | | | | | | The fd processing loops in the dht_migration_complete_check_task and the dht_rebalance_inprogress_task functions were unsafe and could cause an open to be sent on an already freed fd. This has been fixed. Change-Id: I0a3c7d2fba314089e03dfd704f9dceb134749540 Fixes: bz#1757399 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/ec: Implement read-mask featurePranith Kumar K2019-09-273-0/+82
| | | | | | fixes: #725 Change-Id: Iaaefe6f49c8193c476b987b92df6bab3e2f62601 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/ec: prevent filling shd log with "table not found" messagesXavi Hernandez2019-09-261-2/+13
| | | | | | | | | | | | | | | When self-heal daemon receives an inodelk contention notification, it tries to locate the related inode using inode_find() and the inode table owned by top-most xlator, which in this case doesn't have any inode table. This causes many messages to be logged by inode_find() function because the inode table passed is NULL. This patch prevents this by making sure the inode table is not NULL before calling inode_find(). Change-Id: I8d001bd180aaaf1521ba40a536b097fcf70c991f Fixes: bz#1755344 Signed-off-by: Xavi Hernandez <jahernan@redhat.com>
* afr-common.c, afr-self-heal.h: calloc/alloca0 -> malloc/allocaYaniv Kaul2019-09-202-5/+4
| | | | | | | | | | In 3 cases, there was a memory allocation and zeroing, followed directly by populating it with content. Replaced with memory allocation that did not zero the memory. Change-Id: I4fbb5c924fb3a144e415d2368126b784dde760ea updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* cluster/dht: Handle file truncates during migrationN Balachandran2019-09-171-26/+34
| | | | | | | | | File truncate operations during a migration were not handled properly. This has been fixed. Change-Id: Ic642d257e893641236a4a21ab69fcc7a569dd70a Fixes: bz#1745967 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* ctime/rebalance: Heal ctime xattr on directory during rebalanceKotresh HR2019-09-163-4/+7
| | | | | | | | | | | | | | | | After add-brick and rebalance, the ctime xattr is not present on rebalanced directories on new brick. This patch fixes the same. Note that ctime still doesn't support consistent time across distribute sub-volume. This patch also fixes the in-memory inconsistency of time attributes when metadata is self healed. Change-Id: Ia20506f1839021bf61d4753191e7dc34b31bb2df fixes: bz#1734026 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* cluster/ec: Mark release only when it is acquiredPranith Kumar K2019-09-122-2/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Mount-1 Mount-2 1)Tries to acquire lock on 'dir1' 1)Tries to acquire lock on 'dir1' 2)Lock is granted on brick-0 2)Lock gets EAGAIN on brick-0 and leads to blocking lock on brick-0 3)Gets a lock-contention 3) Doesn't matter what happens on mount-2 notification, marks lock->release from here on. to true. 4)New fop comes on 'dir1' which will be put in frozen list as lock->release is set to true. 5) Lock acquisition from step-2 fails because 3 bricks went down in 4+2 setup. Fop on mount-1 which is put in frozen list will hang because no codepath will move it from frozen list to any other list and the lock will not be retried. Fix: Don't set lock->release to true if lock is not acquired at the time of lock-contention-notification fixes: bz#1743573 Change-Id: Ie6630db8735ccf372cc54b873a3a3aed7a6082b7 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/ec: quorum-count implementationPranith Kumar K2019-09-086-59/+110
| | | | | | fixes: #721 Change-Id: I5333540e3c635ccf441cf1f4696e4c8986e38ea8 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/ec: Fix coverity issuesPranith Kumar K2019-09-071-12/+16
| | | | | | | | | | | | | | | | | Fixed the following coverity issue in both flush/fsync >>> CID 1404964: Null pointer dereferences (REVERSE_INULL) >>> Null-checking "fd" suggests that it may be null, but it has already been dereferenced on all paths leading to the check. >>> if (fd != NULL) { >>> fop->fd = fd_ref(fd); >>> if (fop->fd == NULL) { >>> gf_msg(this->name, GF_LOG_ERROR, 0, >>> "Failed to reference a " >>> "file descriptor."); fixes bz#1748836 Change-Id: I19c05d585e23f8fbfbc195d1f3775ec528eed671 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/ec: Fail fsync/flush for files on update size/version failurePranith Kumar K2019-09-065-1/+80
| | | | | | | | | | | | | | | | | Problem: If update size/version is not successful on the file, updates on the same stripe could lead to data corruptions if the earlier un-aligned write is not successful on all the bricks. Application won't have any knowledge of this because update size/version happens in the background. Fix: Fail fsync/flush on fds that are opened before update-size-version went bad. fixes: bz#1748836 Change-Id: I9d323eddcda703bd27d55f340c4079d76e06e492 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* afr/lookup: Pass xattr_req in while doing a selfheal in lookupMohammed Rafi KC2019-09-053-5/+16
| | | | | | | | | | We were not passing xattr_req when doing a name self heal as well as a meta data heal. Because of this, some xdata was missing which causes i/o errors Change-Id: Ibfb1205a7eb0195632dc3820116ffbbb8043545f Fixes: bz#1728770 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
* afr: wake up index healer threadsRavishankar N2019-08-305-11/+25
| | | | | | | | | | | | | ...whenever shd is re-enabled after disabling or there is a change in `cluster.heal-timeout`, without needing to restart shd or waiting for the current `cluster.heal-timeout` seconds to expire. See BZ 1743988 for more details. Change-Id: Ia5ebd7c8e9f5b54cba3199c141fdd1af2f9b9bfe fixes: bz#1744548 Reported-by: Glen Kiessling <glenk1973@hotmail.com> Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* cluster/afr - Unused variablesBarak Sason2019-08-242-6/+9
| | | | | | | | | | | -Minor change to if-else structure to avoid code duplication. -Added logging in case method calls fails CID: 1394654 Updates: bz#789278 Change-Id: Ibef4450dc89ddd3bf951303d5b87f503924fd250 Signed-off-by: Barak Sason <bsasonro@redhat.com>
* afr: restore timestamp of parent dir during entry-healRavishankar N2019-08-141-0/+2
| | | | | | Fixes: bz#1734370 Change-Id: I29e338bac62104233a6f80212df8d0fb016affda Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* cluster/ec: Fix coverity issue.Ashish Pandey2019-08-131-1/+1
| | | | | | Change-Id: I727287784a15d89441865de7f438002e4a370250 fixes: bz#1738763 Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* cluster/ec: Update lock->good_mask on parent fop failurePranith Kumar K2019-08-072-0/+4
| | | | | | | | | | When discard/truncate performs write fop, it should do so after updating lock->good_mask to make sure readv happens on the correct mask fixes bz#1727081 Change-Id: Idfef0bbcca8860d53707094722e6ba3f81c583b7 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/dht: Log hashes in hexN Balachandran2019-08-064-15/+13
| | | | | | | | | Log layout hash ranges in hex to make it easier to compare them to the on disk xattrs. Change-Id: Ib75c2508bf8e0ab7f5ae26d0443ef02b792b7307 Fixes: bz#1697293 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* Multiple files: get trivial stuff done before lockYaniv Kaul2019-08-012-7/+5
| | | | | | | | | Initialize a dictionary for example seems to be prefectly fine to be done before taking a lock. Change-Id: Ib29516c4efa8f0e2b526d512beab488fcd16d2e7 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* cluster/ec: Create heal task with heal process idAshish Pandey2019-07-301-1/+19
| | | | | | | | | | | | | | | | | | | Problem: ec_data_undo_pending calls syncop_fxattrop->SYNCOP without a frame. In this case SYNCOP gets the frame of the task. However, when we create a synctask for heal we provide frame as NULL. Now, if the read-only feature is ON, it will receive the process ID of the shd as 0 and will consider that it as not an internal process. This will prevent healing of a file with "Read-only file system" error message log. Solution: While launching heal, create a synctask using frame and set process id of the SHD which is -6. Change-Id: I37195399c85de322cbcac75633888922c4e3db4a Fixes: bz#1734252
* cluster/ec: Fix reopen flags to avoid misbehaviorPranith Kumar K2019-07-302-3/+8
| | | | | | | | | | | | | | | | | | | | | | | Problem: when a file needs to be re-opened O_APPEND and O_EXCL flags are not filtered in EC. - O_APPEND should be filtered because EC doesn't send O_APPEND below EC for open to make sure writes happen on the individual fragments instead of at the end of the file. - O_EXCL should be filtered because shd could have created the file so even when file exists open should succeed - O_CREAT should be filtered because open happens with gfid as parameter. So open fop will create just the gfid which will lead to problems. Fix: Filter out these two flags in reopen. Change-Id: Ia280470fcb5188a09caa07bf665a2a94bce23bc4 Fixes: bz#1733935 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/ec: Always read from good-maskPranith Kumar K2019-07-262-5/+25
| | | | | | | | | | There are cases where fop->mask may have fop->healing added and readv shouldn't be wound on fop->healing. To avoid this always wind readv to lock->good_mask fixes bz#1727081 Change-Id: I2226ef0229daf5ff315d51e868b980ee48060b87 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/ec: fix EIO error for concurrent writes on sparse filesXavi Hernandez2019-07-241-9/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | EC doesn't allow concurrent writes on overlapping areas, they are serialized. However non-overlapping writes are serviced in parallel. When a write is not aligned, EC first needs to read the entire chunk from disk, apply the modified fragment and write it again. The problem appears on sparse files because a write to an offset implicitly creates data on offsets below it (so, in some way, they are overlapping). For example, if a file is empty and we read 10 bytes from offset 10, read() will return 0 bytes. Now, if we write one byte at offset 1M and retry the same read, the system call will return 10 bytes (all containing 0's). So if we have two writes, the first one at offset 10 and the second one at offset 1M, EC will send both in parallel because they do not overlap. However, the first one will try to read missing data from the first chunk (i.e. offsets 0 to 9) to recombine the entire chunk and do the final write. This read will happen in parallel with the write to 1M. What could happen is that half of the bricks process the write before the read, and the half do the read before the write. Some bricks will return 10 bytes of data while the otherw will return 0 bytes (because the file on the brick has not been expanded yet). When EC tries to recombine the answers from the bricks, it can't, because it needs more than half consistent answers to recover the data. So this read fails with EIO error. This error is propagated to the parent write, which is aborted and EIO is returned to the application. The issue happened because EC assumed that a write to a given offset implies that offsets below it exist. This fix prevents the read of the chunk from bricks if the current size of the file is smaller than the read chunk offset. This size is correctly tracked, so this fixes the issue. Also modifying ec-stripe.t file for Test #13 within it. In this patch, if a file size is less than the offset we are writing, we fill zeros in head and tail and do not consider it strip cache miss. That actually make sense as we know what data that part holds and there is no need of reading it from bricks. Change-Id: Ic342e8c35c555b8534109e9314c9a0710b6225d6 Fixes: bz#1730715 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>