glusterfs.git -

	Commit message (Collapse)	Author	Age	Files	Lines
*	glusterd: rebalance start should fail when quorum is not met	Sanju Rakonde	2019-10-10	1	-0/+2
\| \| \| \| \| \| \| \| \| \|	Rebalance start should not succeed if quorum is not met. This patch adds a condition to check whether quorum is met in the pre-validation stage. fixes: bz#1760467 Change-Id: Ic7d0d08f69e4bc6d5e7abae713ec1881531c8ad4 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
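A minimal sketch of the kind of pre-validation check described above. It assumes a server-side quorum rule where strictly more than a configured ratio (50% here) of peers must be active. The function names and the exception type are hypothetical and do not mirror the glusterd C code.

```python
def server_quorum_met(active_peers: int, total_peers: int, ratio: float = 0.5) -> bool:
    """Hypothetical rule: quorum holds when strictly more than `ratio` of peers are up."""
    if total_peers <= 0:
        return False
    return active_peers / total_peers > ratio


def prevalidate_rebalance_start(active_peers: int, total_peers: int) -> None:
    """Reject 'rebalance start' in the pre-validation stage when quorum is lost."""
    if not server_quorum_met(active_peers, total_peers):
        # Hypothetical error; glusterd reports the failure through its own op_errstr.
        raise PermissionError("rebalance start rejected: server quorum not met")
```

Doing the check before the commit phase means no peer starts a rebalance process that would later have to be torn down.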
*	Fix spurious failure in bug-1744548-heal-timeout.t	Pranith Kumar K	2019-10-09	1	-6/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Script was assuming that the heal would have triggered by the time test was executed, which may not be the case. It can lead to following failures when the race happens: ... 18:29:45 not ok 14 [ 85/ 1] < 26> '[ 331 == 333 ]' -> '' ... 18:29:45 not ok 16 [ 10097/ 1] < 33> '[ 668 == 666 ]' -> '' Heal on 3rd brick didn't start completely first time the command was executed. So the extra count got added to the next profile info. Fixed it by depending on cumulative stats and waiting until the count is satisfied using EXPECT_WITHIN fixes: bz#1759002 Change-Id: I3b410671c902d6b1458a757fa245613cb29d967d Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	cluster/afr: Heal entries when there is a source & no healed_sinks	karthik-us	2019-10-09	1	-0/+89
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: In a situation where B1 blames B2, B2 blames B1 and B3 doesn't blame anything for entry heal, heal will not complete even though we have clear source and sinks. This will happen because while doing afr_selfheal_find_direction() only the bricks which are blamed by non-accused bricks are considered as sinks. Later in __afr_selfheal_entry_finalize_source() when it tries to mark all the non-sources as sinks it fails to do so because there won't be any healed_sinks marked, no witness present and there will be a source. Fix: If there is a source and no healed_sinks, then reset all the locked sources to 0 and healed sinks to 1 to do conservative merge. Change-Id: If40d8bc95d52a52b2730f55bdcf135109b421548 Fixes: bz#1749322 Signed-off-by: karthik-us <ksubrahm@redhat.com>
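The fix can be modelled with per-brick 0/1 flag arrays. This is a simplified sketch assuming one entry per brick. The function is a hypothetical stand-in, not the actual __afr_selfheal_entry_finalize_source() code.

```python
from typing import List, Tuple


def finalize_entry_heal(sources: List[int], healed_sinks: List[int],
                        locked_on: List[int]) -> Tuple[List[int], List[int]]:
    """Per-brick 0/1 flags indexed by brick; returns the adjusted (sources, healed_sinks)."""
    sources, healed_sinks = list(sources), list(healed_sinks)
    if any(sources) and not any(healed_sinks):
        # A source exists but no sink was selected: fall back to a conservative
        # merge across every brick on which we hold the lock.
        for i, locked in enumerate(locked_on):
            if locked:
                sources[i] = 0
                healed_sinks[i] = 1
    return sources, healed_sinks


# B1 and B2 blame each other and B3 blames nobody: B3 is the only source, and
# since no non-accused brick blames B1 or B2, no sink was selected.
print(finalize_entry_heal([0, 0, 1], [0, 0, 0], [1, 1, 1]))  # ([0, 0, 0], [1, 1, 1])
```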
*	tests: Fix spurious failure in bug-1134691-afr-lookup-metadata-heal.t	Ravishankar N	2019-10-09	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: The .t was examining the sink brick's iatt value before the launched client-side metadata heal got a chance to complete. Fix: Wait for heal completion. Fixes: bz#1759081 Change-Id: I4dd4e3a1cccf35fd18e8cdfea6aa76a726a4763b Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	afr: support split-brain CLI for replica 3	Ravishankar N	2019-10-09	1	-0/+111
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Ever since we added quorum checks for lookups in afr via commit bd44d59741bb8c0f5d7a62c5b1094179dd0ce8a4, the split-brain resolution commands would not work for replica 3 because there would be no readables for the lookup fop. The argument was that split-brains do not occur in replica 3 but we do see (data/metadata) split-brain cases once in a while which indicate that there are a few bugs/corner cases yet to be discovered and fixed. Fortunately, commit 8016d51a3bbd410b0b927ed66be50a09574b7982 added GF_CLIENT_PID_GLFS_HEALD as the pid for all fops made by glfsheal. If we leverage this and allow lookups in afr when pid is GF_CLIENT_PID_GLFS_HEALD, split-brain resolution commands will work for replica 3 volumes too. Likewise, the check is added in shard_lookup as well to permit resolving split-brains by specifying "/.shard/shard-file.xx" as the file name (which previously used to fail with EPERM). Change-Id: I3c543dea79caf7cfbc1633e9089cb1cdd2538ba9 Fixes: bz#1756938 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	tests: add a pending test case	Amar Tumballi	2019-10-03	1	-1/+17
\| \| \| \| \| \| \| \| \|	While merging the protocol handshake fixes (bz#1620580), there was a case which was left out. Adding it separately now. Change-Id: I52133d5fe160b4567400a65e60aac8f7bc20697f Updates: bz#1193929 Signed-off-by: Amar Tumballi <amarts@gmail.com>
*	ssl: fix RHEL8 regression failure	Sanju Rakonde	2019-10-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	This test is failing with "SSL routines:SSL_CTX_use_certificate:ee key too small" in RHEL8. This change is made according to https://access.redhat.com/solutions/4157431 updates: bz#1756900 Change-Id: Ib436372c3bd94bcf7324976337add7da4088b3d5 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
*	protocol/handshake: pass volume-id for extra check	Amar Tumballi	2019-09-30	1	-0/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With an added check of volume-id during the handshake, we can be sure not to connect to a brick if it gets re-used in another volume. This prevents accidental issues that can happen with a stale client process lurking around. Also added test cases for the same volume name fetching a different volfile (i.e., different bricks, different type), and for a different volume name with the same brick. For reference: Currently a client<->server handshake happens in glusterfs through the protocol/client translator (setvolume) to protocol/server using a dictionary which contains many keys. Rejection happens on the server side if some of the required keys are missing from the handshake dictionary. Until now, there was no single unique identifier a client could use to confirm that it is actually talking to the corresponding server. All we look at in protocol/client is a key called 'remote-subvolume', which should match a subvolume name in the server volume file, and for any volume with the same brick name (which can be present in the same cluster due to re-creation), it would be the same. This could cause a major issue: a client connected to a given brick in one volume could end up connected to another volume's brick if that brick is re-created/re-used. To prevent this behavior, we now pass along 'volume-id' in the handshake, which is preserved for the life of the client process and prevents these accidental connections. NOTE: This behavior wouldn't be applicable for user-snapshot enabled volumes, as snapshotted volumes would have a different volume-id. Fixes: bz#1620580 Change-Id: Ie98286e94ce95ae09c2135fd6ec7d7c2ca1e8095 Signed-off-by: Amar Tumballi <amarts@redhat.com>
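A sketch of the server-side decision. It assumes the handshake dictionary is a plain string map carrying the 'remote-subvolume' and 'volume-id' keys named above. The function name is hypothetical, and so is the choice to still accept clients that omit 'volume-id'.

```python
from typing import Dict, Tuple


def validate_setvolume(handshake: Dict[str, str],
                       served: Dict[str, str]) -> Tuple[bool, str]:
    """`served` maps each exported subvolume (brick) name to its volume's volume-id."""
    name = handshake.get("remote-subvolume")
    if name is None or name not in served:
        return False, "unknown remote-subvolume"
    client_vid = handshake.get("volume-id")
    # Assumption: clients that do not send volume-id (older builds) are still accepted.
    if client_vid is not None and client_vid != served[name]:
        return False, "volume-id mismatch: brick now belongs to a different volume"
    return True, "accepted"
```

A stale client holding the old volume-id is rejected even though 'remote-subvolume' still matches the re-used brick name.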
*	tests/shard: Remove dependence on distributed cache	Pranith Kumar K	2019-09-27	1	-3/+3
\| \| \| \| \| \|	fixes: bz#1756211 Change-Id: Iee5b37af89ab624c16a45df364806003238280e5 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	cluster/ec: Implement read-mask feature	Pranith Kumar K	2019-09-27	1	-0/+114
\| \| \| \| \| \|	fixes: #725 Change-Id: Iaaefe6f49c8193c476b987b92df6bab3e2f62601 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	read-ahead/io-cache: turn off by default	Raghavendra Gowdappa	2019-09-26	2	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We've found that the perf xlators io-cache and read-ahead do not add any performance improvement. At best read-ahead is redundant due to kernel read-ahead, and at worst io-cache degrades performance for workloads that don't involve re-reads. Given that VFS already has both these functionalities, this patch turns these two translators off by default for native fuse mounts. For non-native fuse mounts like gfapi (NFS-ganesha/samba) we can have these xlators on by having custom profiles. Change-Id: Ie7535788909d4c741844473696f001274dc0bb60 Signed-off-by: Raghavendra Gowdappa <rgowdapp@redhat.com> fixes: bz#1676479
*	gfapi: 'glfs_h_creat_open' - new API to create handle and open fd	Soumya Koduri	2019-09-25	2	-0/+145
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Right now we have two separate APIs: 'glfs_h_creat_handle' to create a handle and 'glfs_h_open' to create a glfd to return to the application. Having two separate routines can result in access errors while trying to create and write into a read-only file. Since an fd is opened even during file/directory creation, this introduces a new API that makes these two operations atomic, i.e., one that creates both the handle and the fd and passes them to the application. Change-Id: Ibf513fcfcdad175f4d7eb6fa7a61b8feec6d33b5 Fixes: bz#1753569 Signed-off-by: Soumya Koduri <skoduri@redhat.com>
*	tests : test case for non-root geo-rep setup	Sunny Kumar	2019-09-25	1	-0/+251
\| \| \| \| \| \| \| \|	Added test case for non-root geo-rep setup. Change-Id: Ib6ebee79949a9f61bdc5c7b5e11b51b262750e98 fixes: bz#1717827 Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
*	ctime/rebalance: Heal ctime xattr on directory during rebalance	Kotresh HR	2019-09-16	8	-1/+497
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After add-brick and rebalance, the ctime xattr is not present on rebalanced directories on new brick. This patch fixes the same. Note that ctime still doesn't support consistent time across distribute sub-volume. This patch also fixes the in-memory inconsistency of time attributes when metadata is self healed. Change-Id: Ia20506f1839021bf61d4753191e7dc34b31bb2df fixes: bz#1734026 Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	tests/shd: Mark "tests/basic/volume-scale-shd-mux.t" as bad	Mohammed Rafi KC	2019-09-16	1	-0/+2
\| \| \| \| \| \| \| \| \|	This test case is failing in upstream. Marking this test as bad for now. Change-Id: I014c67628c14683c32a3c1dd770b10aaf35ad4cc Updates: bz#1752331 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	protocol/client: don't reopen fds on which POSIX locks are held after a reconnect	Raghavendra G	2019-09-12	1	-0/+63
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Bricks clean up any granted locks after a client disconnects, and currently these locks are not healed after a reconnect. This means that after a reconnect a competing process could be granted a lock even though the first process, which was granted locks, has not unlocked. By not re-opening fds, subsequent operations on such fds will fail, forcing the application to close the current fd and reopen a new one. This way we prevent any silent corruption. A new option "client.strict-locks" is introduced to control this behaviour. This option is set to "off" by default. Change-Id: Ieed545efea466cb5e8f5a36199aa26380c301b9e Signed-off-by: Raghavendra G <rgowdapp@redhat.com> updates: bz#1694920
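A sketch of the reopen decision after a reconnect, assuming each client-side fd tracks how many POSIX locks it holds. The `ClientFd` class and `fds_to_reopen()` are hypothetical; only the `client.strict-locks` option name comes from the commit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClientFd:
    path: str
    posix_locks_held: int = 0
    usable: bool = True


def fds_to_reopen(fds: List[ClientFd], strict_locks: bool) -> List[ClientFd]:
    """Return the fds to re-open on the brick after a reconnect."""
    reopen = []
    for fd in fds:
        if strict_locks and fd.posix_locks_held > 0:
            # The brick dropped this fd's locks on disconnect. Leave the fd closed so
            # later fops fail instead of silently running without the lock.
            fd.usable = False
        else:
            reopen.append(fd)
    return reopen
```

With the option off (the default), every fd is re-opened, which matches the behaviour before this change.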
*	libgfapi: return correct errno on invalid volume name	Sheetal Pamecha	2019-09-12	3	-8/+95
\| \| \| \| \| \| \| \| \| \| \| \| \|	glfs_init when called with volume name prefixed by '/' sets errno to 0. Setting errno to EINVAL to resolve the issue. Also volname is a parameter to glfs_new. Thus, validating volname in glfs_new itself and returning EINVAL from that function fixes: bz#1507896 Change-Id: I0d4d2423e26cc07644d50ec8cce788ecc639203d Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
*	api: Cleanup of executable not done	Sheetal	2019-09-12	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	In test tests/bugs/gfapi/bug-1447266/bug-1447266.t actual file created is - tests/bugs/gfapi/bug-1447266/bug-1447266 which is not cleaned up later fixes: bz#1750618 Change-Id: I93120418e54b95018a7213d106a1f1c990766281 Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
*	tests: revive back volume-scale-shd-mux.t	Atin Mukherjee	2019-09-12	5	-30/+28
\| \| \| \| \| \| \|	Fixes: bz#1708929 Change-Id: I9cc81a9047ff874df752ca5552e00bf033485bd8 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	tests: Fix spurious failure	Pranith Kumar K	2019-09-11	1	-2/+20
\| \| \| \| \| \| \| \| \| \|	If heal from next brick starts after the first brick completes heal, then opendir on the brick can change atime leading to failure of the test. When ctime is disabled it is better to just check mtime to be same after heal. fixes: bz#1751134 Change-Id: Ia03e30fd547e6bbe85c1e299845ffa122f3a2692 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	cluster/ec: quorum-count implementation	Pranith Kumar K	2019-09-08	3	-0/+224
\| \| \| \| \| \|	fixes: #721 Change-Id: I5333540e3c635ccf441cf1f4696e4c8986e38ea8 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	cluster/ec: Fail fsync/flush for files on update size/version failure	Pranith Kumar K	2019-09-06	2	-0/+150
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: If update size/version is not successful on the file, updates on the same stripe could lead to data corruptions if the earlier un-aligned write is not successful on all the bricks. Application won't have any knowledge of this because update size/version happens in the background. Fix: Fail fsync/flush on fds that are opened before update-size-version went bad. fixes: bz#1748836 Change-Id: I9d323eddcda703bd27d55f340c4079d76e06e492 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
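A sketch of "fail fsync/flush on fds opened before update-size-version went bad". It assumes a monotonically increasing event counter that records when each fd was opened and when the last update failed. All names are hypothetical, and returning EIO is an assumption, since the commit does not name the errno.

```python
import errno
import itertools

_seq = itertools.count()  # monotonically increasing event counter


class Inode:
    def __init__(self) -> None:
        self.update_failed_at = None  # sequence number of the last failed size/version update

    def mark_update_size_version_failure(self) -> None:
        self.update_failed_at = next(_seq)


class Fd:
    def __init__(self, inode: Inode) -> None:
        self.inode = inode
        self.opened_at = next(_seq)


def ec_fsync(fd: Fd) -> int:
    """Return 0 on success or a negative errno; flush would apply the same rule."""
    failed_at = fd.inode.update_failed_at
    if failed_at is not None and fd.opened_at < failed_at:
        # Writes through this fd may sit on a stripe whose size/version was never updated.
        return -errno.EIO
    return 0
```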
*	graph/cleanup: Fix race in graph cleanup	Mohammed Rafi KC	2019-09-05	2	-3/+68
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We were unconditionally cleaning up the graph when we got child_down followed by parent_down. But this is prone to a race condition when some of the bricks are already disconnected. In this case, even before the last child_down is executed in the client xlator code, we might have freed the graph, because the child_down event has already been received. To fix this race, we have introduced a check to see whether all client xlators have cleared their reconnect chain and called child_down for the last time. Change-Id: I7d02813bc366dac733a836e0cd7b14a6fac52042 fixes: bz#1727329 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	afr/lookup: Pass xattr_req in while doing a selfheal in lookup	Mohammed Rafi KC	2019-09-05	2	-0/+53
\| \| \| \| \| \| \| \| \| \|	We were not passing xattr_req when doing a name self-heal as well as a metadata heal. Because of this, some xdata was missing, which caused I/O errors. Change-Id: Ibfb1205a7eb0195632dc3820116ffbbb8043545f Fixes: bz#1728770 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	tests: fix spurious failure of bug-1402841.t-mt-dir-scan-race.t	Ravishankar N	2019-09-04	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Since commit 600ba94183333c4af9b4a09616690994fd528478, shd starts healing as soon as it is toggled from disabled to enabled. This was causing the following line in the .t to fail on a 'fast' machine (always on my laptop and sometimes on the jenkins slaves). EXPECT_NOT "^0$" get_pending_heal_count $V0 because by the time shd was disabled, the heal was already completed. Fix: Increase the no. of files to be healed and make it a variable called FILE_COUNT, should we need to bump it up further because the machines become even faster. Also created pending metadata heals to increase the time taken to heal a file. fixes: bz#1748744 Change-Id: I5a26b08e45b8c19bce3c01ce67bdcc28ed48198d Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	afr: wake up index healer threads	Ravishankar N	2019-08-30	1	-0/+42
\| \| \| \| \| \| \| \| \| \| \| \| \|	...whenever shd is re-enabled after disabling or there is a change in `cluster.heal-timeout`, without needing to restart shd or waiting for the current `cluster.heal-timeout` seconds to expire. See BZ 1743988 for more details. Change-Id: Ia5ebd7c8e9f5b54cba3199c141fdd1af2f9b9bfe fixes: bz#1744548 Reported-by: Glen Kiessling <glenk1973@hotmail.com> Signed-off-by: Ravishankar N <ravishankar@redhat.com>
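The wake-up behaviour can be modelled with a condition variable. The healer sleeps for `cluster.heal-timeout` seconds, but a reconfigure that changes the timeout, or re-enables shd, notifies it so that it crawls immediately. This is a Python `threading.Condition` sketch; the class and method names are hypothetical, and shd's C implementation differs in detail.

```python
import threading


class IndexHealer:
    """Hypothetical model of one shd index-healer thread."""

    def __init__(self, heal_timeout: float) -> None:
        self.cond = threading.Condition()
        self.heal_timeout = heal_timeout
        self.enabled = True
        self.rerun = False
        self.stop = False
        self.crawls = 0

    def run(self) -> None:
        with self.cond:
            while not self.stop:
                if self.enabled:
                    self.crawls += 1          # stand-in for crawling the index
                    self.cond.notify_all()    # let observers see progress
                self.rerun = False
                # Sleep until heal-timeout expires OR reconfigure() wakes us early.
                self.cond.wait_for(lambda: self.rerun or self.stop, timeout=self.heal_timeout)

    def reconfigure(self, heal_timeout=None, enabled=None) -> None:
        with self.cond:
            wake = False
            if heal_timeout is not None and heal_timeout != self.heal_timeout:
                self.heal_timeout = heal_timeout
                wake = True
            if enabled is not None:
                wake = wake or (enabled and not self.enabled)  # disabled -> enabled
                self.enabled = enabled
            if wake:
                self.rerun = True
                self.cond.notify_all()

    def shutdown(self) -> None:
        with self.cond:
            self.stop = True
            self.cond.notify_all()
```

Because the wait is keyed on a predicate rather than a plain sleep, neither restarting shd nor waiting out the old timeout is needed for the new setting to take effect.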
*	performance/md-cache: Do not skip caching of null character xattr values	Anoop C S	2019-08-20	1	-0/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A null character string is a valid xattr value in a file system. But for the xattrs processed by md-cache, it does not update its entries if the value is null ('\0'). This results in ENODATA when those xattrs are queried afterwards via getxattr(), causing failures in basic operations like create, copy etc. in a specially configured Samba setup for Mac OS clients. On the other side, snapview-server internally sets an empty string ("") as the value for xattrs received as part of listxattr(), and these are not intended to be cached. Therefore we try to maintain that behaviour using an additional dictionary key to prevent updating entries in the getxattr() and fgetxattr() callbacks in md-cache. Credits: Poornima G <pgurusid@redhat.com> Change-Id: I7859cbad0a06ca6d788420c2a495e658699c6ff7 Fixes: bz#1726205 Signed-off-by: Anoop C S <anoopcs@redhat.com>
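A sketch of the resulting caching rule. Null or empty values are cached like any other value, unless the reply carries a marker key asking md-cache not to cache it. The marker key name and the class are hypothetical; the commit does not name the key.

```python
import errno
from typing import Dict

NO_CACHE_MARKER = "hypothetical.mdc.skip-cache"  # stands in for the extra dictionary key


class MdCacheModel:
    def __init__(self) -> None:
        self.xattrs: Dict[str, bytes] = {}

    def getxattr_cbk(self, reply: Dict[str, bytes]) -> None:
        if NO_CACHE_MARKER in reply:
            return  # e.g. snapview-server placeholder values: do not cache
        for key, value in reply.items():
            # b"\0" is a legitimate value and must be cached too.
            self.xattrs[key] = value

    def getxattr(self, key: str) -> bytes:
        if key not in self.xattrs:
            raise OSError(errno.ENODATA, "No data available", key)
        return self.xattrs[key]
```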
*	glusterd: ./tests/bugs/glusterd/bug-1595320.t is failing	Mohit Agrawal	2019-08-19	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Sometimes ./tests/bugs/glusterd/bug-1595320.t fails while checking brick_process after sending a kill signal to the brick process. Solution: Wait for some time after sending a kill signal to the brick process to make sure the brick process is stopped. Change-Id: Iee9e91284618abfc62a550d47e4f9117785def58 Fixes: bz#1743200 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	tests/dht: Add a test file for file renames	N Balachandran	2019-08-19	1	-0/+1021
\| \| \| \| \| \| \| \| \|	Test the various combinations of hashed and cached subvols for the src and dst. Change-Id: I41416f9e5f2b7ea1c880d1913fdd6576da1ee868 fixes: bz#1626543 Signed-off-by: N Balachandran <nbalacha@redhat.com>
*	tests: mark bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t as BRICK_MUX_BAD_TEST	Atin Mukherjee	2019-08-19	1	-0/+1
\| \| \| \| \| \| \| \|	Updates: bz#1743069 Change-Id: I1eea0186ca0c1b1226f4b3d0d7c0e41fc7821cbd Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
*	afr: restore timestamp of parent dir during entry-heal	Ravishankar N	2019-08-14	1	-0/+78
\| \| \| \| \| \|	Fixes: bz#1734370 Change-Id: I29e338bac62104233a6f80212df8d0fb016affda Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	glusterd: create separate logdirs for cluster.rc instances	N Balachandran	2019-08-14	1	-5/+6
\| \| \| \| \| \| \| \| \| \|	Create a separate logdir for each host instance created by cluster.rc. This makes it easier to determine the files belonging to a particular instance. Change-Id: Ic8321f83f98995412b7d5f095b3d3f0391767a8b Fixes: bz#1733042 Signed-off-by: N Balachandran <nbalacha@redhat.com>
*	features/shard: Send correct size when reads are sent beyond file size	Krutika Dhananjay	2019-08-12	1	-0/+29
\| \| \| \| \| \|	Change-Id: I0cebaaf55c09eb1fb77a274268ff564e871b743b fixes bz#1738419 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
*	tests: fix bug-880898.t crash	Ravishankar N	2019-08-12	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: https://build.gluster.org/job/centos7-regression/7337/consoleFull indicates the shd crashing for this .t. On looking at the core, I see the crash is at the time of shd init and glusterfs context is null: (gdb) bt (gdb) p ctx $2 = (glusterfs_ctx_t *) 0xf00000000 The .t is killing all gluster processes immediately after volume start, so it looks like a race between shd coming up and it being killed. Fix: Kill gluster processes only after they are up and running. Fixes: bz#1740017 Change-Id: I7cf589201669bd9f535e968d147015dc99e9a4b6 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	tests/shd: Break down shd mux tests into multiple .t file	Mohammed Rafi KC	2019-08-05	4	-149/+191
\| \| \| \| \| \| \| \| \| \|	Test file tests/basic/shd-mux.t was taking longer than 200 seconds in some iterations, so this patch breaks the test case into three files. Change-Id: I1430f58798f876edf6368d6f4b8b5a75f0114c31 Updates: bz#1708929 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	xdr: add code so we have more xdr functions covered	Amar Tumballi	2019-08-04	2	-0/+11
\| \| \| \| \| \|	Updates: bz#1693692 Change-Id: Ia10ccca5e1fed6c4269842ebb4d507662ca0f6a6 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	lcov: check for zerofill/discard fops on arbiter	Amar Tumballi	2019-08-01	1	-0/+32
\| \| \| \| \| \|	Updates: bz#1693692 Change-Id: I145df4c0d0b0ce738f1d34b02341ec606e38522e Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	gfapi: increase function-coverage	Amar Tumballi	2019-07-31	2	-25/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add few more mgmt functions to the coverage * While testing mgmt function, found an issue, where if the 'glfs_set_volfile_server()' is not called before calling 'glfs_unset_volfile_server()', unset would cause a crash. Null check of few variables fixes the issue, which is handled in this patch itself. * Added a test for volfile API Updates: bz#1693692 Change-Id: Iba151f8da1b64107e2f436ddbfef9da45b1c1588 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	tests: heal-info add --xml option for more coverage	Amar Tumballi	2019-07-30	1	-0/+43
\| \| \| \| \| \|	Updates: bz#1693692 Change-Id: I13de7cf4c380b9663e6258e2fd62ce1f180591b4 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	error-gen: increase coverage by reducing error-rate	Amar Tumballi	2019-07-30	1	-1/+1
\| \| \| \| \| \|	Updates: bz#1693692 Change-Id: I07371ea1e2613fed6dd9ea67c40cbb3ebcb9387e Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	trace: add more coverage by testing it with glfs-coverage too.	Amar Tumballi	2019-07-30	1	-0/+22
\| \| \| \| \| \| \| \| \| \|	make sure to provide 'log-file' option, so we can see the logs. This test does test volgen inserting the trace xlator in server graph. Updates: bz#1693692 Change-Id: I26c736b04376674b4c094d48060660421e6c983c Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	cluster/ec: fix EIO error for concurrent writes on sparse files	Xavi Hernandez	2019-07-24	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	EC doesn't allow concurrent writes on overlapping areas; they are serialized. However, non-overlapping writes are serviced in parallel. When a write is not aligned, EC first needs to read the entire chunk from disk, apply the modified fragment and write it again. The problem appears on sparse files because a write to an offset implicitly creates data on offsets below it (so, in some way, they are overlapping). For example, if a file is empty and we read 10 bytes from offset 10, read() will return 0 bytes. Now, if we write one byte at offset 1M and retry the same read, the system call will return 10 bytes (all containing 0's). So if we have two writes, the first one at offset 10 and the second one at offset 1M, EC will send both in parallel because they do not overlap. However, the first one will try to read missing data from the first chunk (i.e. offsets 0 to 9) to recombine the entire chunk and do the final write. This read will happen in parallel with the write to 1M. What could happen is that half of the bricks process the write before the read, and the other half do the read before the write. Some bricks will return 10 bytes of data while the others will return 0 bytes (because the file on the brick has not been expanded yet). When EC tries to recombine the answers from the bricks, it can't, because it needs more than half consistent answers to recover the data. So this read fails with an EIO error. This error is propagated to the parent write, which is aborted, and EIO is returned to the application. The issue happened because EC assumed that a write to a given offset implies that offsets below it exist. This fix prevents the read of the chunk from bricks if the current size of the file is smaller than the read chunk offset. This size is correctly tracked, so this fixes the issue. Also modified the ec-stripe.t file for Test #13 within it. 
In this patch, if the file size is less than the offset we are writing, we fill zeros in the head and tail and do not consider it a stripe cache miss. That actually makes sense, as we know what data that part holds and there is no need to read it from the bricks. Change-Id: Ic342e8c35c555b8534109e9314c9a0710b6225d6 Fixes: bz#1730715 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
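A simplified single-buffer model of the fix. For an unaligned write, only the partially overwritten head and tail stripes need their old contents. A stripe that starts at or beyond the current file size is known to be zeros, so it is zero-filled instead of being read. `read_stripe` is a hypothetical callback standing in for reading a stripe from the bricks. Real EC encodes fragments across bricks, which this sketch ignores.

```python
from typing import Callable


def build_stripe_write(file_size: int, offset: int, data: bytes, stripe: int,
                       read_stripe: Callable[[int], bytes]) -> bytes:
    """Return the stripe-aligned buffer for an unaligned write at `offset`."""
    start = offset - offset % stripe              # first stripe touched
    end_data = offset + len(data)
    end = -(-end_data // stripe) * stripe         # round the end up to a stripe boundary
    buf = bytearray(end - start)                  # head/tail default to zeros
    partial = set()
    if offset != start:
        partial.add(start)                        # head stripe only partly overwritten
    if end_data != end:
        partial.add(end - stripe)                 # tail stripe only partly overwritten
    for s in sorted(partial):
        if s < file_size:                         # stripe holds real data: must read it
            chunk = read_stripe(s)[:stripe]
            buf[s - start:s - start + len(chunk)] = chunk
        # else: the stripe lies wholly beyond EOF, so it is zeros; no read, no cache miss
    buf[offset - start:end_data - start] = data
    return bytes(buf)
```

In the scenario above, the write at offset 10 on an empty file no longer issues a read, so it cannot race with the concurrent write at 1M.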
*	ctime: Set mdata xattr on legacy files	Kotresh HR	2019-07-22	1	-0/+83
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Files which were created before ctime was enabled would not have the "trusted.glusterfs.mdata" xattr (which stores time attributes). Upon fops which modify either ctime or mtime, the xattr gets created with the latest ctime, mtime and atime, which is incorrect. It should update only the corresponding time attribute and take the rest from the backend. Solution: Creating the xattr with values from the brick is not possible, as each brick of a replica set would have different times. So create the xattr upon successful lookup if the xattr has not been created. Note To Reviewers: The time attributes used to set the xattr are obtained from a successful lookup. Instead of sending the whole iatt over the wire via setxattr, a structure called mdata_iatt is sent. The mdata_iatt contains only time attributes. Change-Id: I5e535631ddef04195361ae0364336410a2895dd4 fixes: bz#1593542 Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	geo-rep: Test case for upgrading config file	Shwetha K Acharya	2019-07-22	1	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \|	Added a test case for the patch https://review.gluster.org/#/c/glusterfs/+/22894/4 Also updated the if-else structure in gsyncdconfig.py to avoid repeated occurrence of values in the new config file. fixes: bz#1707731 Change-Id: If97e1d37ac52dbd17d47be6cb659fc5a3ccab6d7 Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
*	tests: Fix bug-1717819-metadata-split-brain-detection.t failure	karthik-us	2019-07-15	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: tests/bugs/replicate/bug-1717819-metadata-split-brain-detection.t fails intermittently in test cases #49 & #50, which compare the values of the user-set xattrs after enabling the heal. We are not waiting for the heal to complete before comparing those values, which might cause those tests to fail. Fix: Wait till the HEAL-TIMEOUT before comparing the xattr values. Also check for the shd to come up and the bricks to connect to the shd process in another case. Change-Id: I0e245b328da9df23ce70c5300278fad1c1d9f7ff Fixes: bz#1729847 Signed-off-by: karthik-us <ksubrahm@redhat.com>
*	glusterd: do not mark skip_locking as true for geo-rep operations	Sanju Rakonde	2019-07-14	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We need to send the commit req to peers in case of geo-rep operations even though it is a no volname operation. In commit phase peers try to set the txn_opinfo which will fail because it is a no volname operation where we don't require a commit phase. We mark skip_locking as true for no volname operations, but we have to give an exception to geo-rep operations, so that they can set txn_opinfo in commit phase. Please refer to detailed RCA at the bug: 1729463 fixes: bz#1729463 Change-Id: I9f2478b12a281f6e052035c0563c40543493a3fc Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
*	cluster/afr: Fix incorrect reporting of gfid & type mismatch	karthik-us	2019-07-12	1	-0/+116
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problems: 1. When checking for type and gfid mismatch, if the type or gfid is unknown because of missing gfid handle and the gfid xattr it will be reported as type or gfid mismatch and the heal will not complete. 2. If the source selected during entry heal has null gfid the same will be sent to afr_lookup_and_heal_gfid(). In this function when we try to assign the gfid on the bricks where it does not exist, we are considering the same gfid and try to assign that on those bricks. This will fail in posix_gfid_set() since the gfid sent is null. Fix: If the gfid sent to afr_lookup_and_heal_gfid() is null choose a valid gfid before proceeding to assign the gfid on the bricks where it is missing. In afr_selfheal_detect_gfid_and_type_mismatch(), do not report type/gfid mismatch if the type/gfid is unknown or not set. Change-Id: Ia06552e4dc4a9f89cb7f5302833604bd21bbf7da fixes: bz#1722507 Signed-off-by: karthik-us <ksubrahm@redhat.com>
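Both parts of the fix reduce to treating a missing or null gfid as unknown rather than as a distinct value. This is a sketch using Python's `uuid` module; the helper names are hypothetical stand-ins for afr_selfheal_detect_gfid_and_type_mismatch() and afr_lookup_and_heal_gfid().

```python
import uuid
from typing import Iterable, Optional

NULL_GFID = uuid.UUID(int=0)


def known(gfid: Optional[uuid.UUID]) -> bool:
    return gfid is not None and gfid != NULL_GFID


def gfid_mismatch(gfids: Iterable[Optional[uuid.UUID]]) -> bool:
    """Report a mismatch only between gfids that are actually known."""
    return len({g for g in gfids if known(g)}) > 1


def gfid_to_assign(source_gfid: Optional[uuid.UUID],
                   replies: Iterable[Optional[uuid.UUID]]) -> Optional[uuid.UUID]:
    """Pick a valid gfid to set on bricks where it is missing, never the null gfid."""
    if known(source_gfid):
        return source_gfid
    for g in replies:
        if known(g):
            return g
    return None  # nothing valid to assign
```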
*	quick-read: rename cache-invalidation key to avoid redundant keys	Atin Mukherjee	2019-07-08	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With the group-metadata-cache group profile settings, the performance.cache-invalidation option, when turned on, enables the cache-invalidation feature of both the md-cache and quick-read xlators. While the intent of group-metadata-cache is to set the md-cache xlator's cache-invalidation feature, the quick-read xlator also gets affected. The md-cache feature and its profile have existed since release-3.9, but quick-read cache-invalidation was introduced in release-4. Because of this op-version mismatch, applying this group profile on any cluster which is >= glusterfs-4 breaks backward compatibility with old clients. The proposed fix here is to rename the key in quick-read to 'quick-read-cache-invalidation' so that the two features have distinct identification. This by itself brings in a backward compatibility challenge: if this feature is enabled in an existing cluster and the cluster is upgraded to a version containing this change, the old key will be unidentified. As a workaround, we can always ask users upgrading to release-7 to turn off this option, upgrade the cluster and turn it back on with the new key. This needs to be documented once the patch is accepted. Fixes: bz#1698042 Change-Id: I30422ba6496208e21191a8d78ad29b2e21078664 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
*	glusterd/thin-arbiter: Thin-arbiter integration with GD1	Vishal Pandey	2019-06-28	2	-0/+70
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	gluster volume create <VOLNAME> replica 2 thin-arbiter 1 <host1>:<brick1> <host2>:<brick2> <thin-arbiter-host>:<path-to-store-replica-id-file> [force] The changes have been made in a way that the last brick in the bricks list will be treated as the thin-arbiter. GD1 will be manipulated to treat the replica count as 2 and continue creating the volume like any other replica 2 volume, but since thin-arbiter volumes need ta-brick client xlator entries for each subvolume in the fuse volfile, volfile generation is modified to inject these entries separately in the volfile for every subvolume. A few more additions: 1- Save the volinfo with new fields ta_bricks list and thin_arbiter_count. 2- Introduce a new option client.ta-brick-port to add remote-port to the ta-brick xlator entry in fuse volfiles. The option can be set using the following CLI syntax - gluster volume set <VOLNAME> client.ta-brick-port <PORTNO.> 3- Volume Info will contain a Thin-Arbiter-path entry to distinguish it from other replicate volumes. Change-Id: Ib434e2313b29716f32476c6c211d282c4ef39406 Updates #687 Signed-off-by: Vishal Pandey <vpandey@redhat.com>
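A sketch of how the brick list could be split: the last brick is the thin-arbiter, the rest form replica-2 subvolumes, and each subvolume gets its own ta-brick entry. The function name and the dictionary layout are hypothetical. Letting several subvolumes share the single thin-arbiter brick is an assumption drawn from the per-subvolume injection described above.

```python
from typing import Dict, List, Union


def plan_thin_arbiter_volume(bricks: List[str], replica: int = 2
                             ) -> List[Dict[str, Union[List[str], str]]]:
    """Split '<host>:<path>' bricks into subvolumes, treating the last one as thin-arbiter."""
    if len(bricks) < replica + 1 or (len(bricks) - 1) % replica != 0:
        raise ValueError("need N*replica data bricks followed by one thin-arbiter brick")
    *data, ta = bricks
    return [{"bricks": data[i:i + replica], "ta-brick": ta}
            for i in range(0, len(data), replica)]
```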
*	posix: modify storage.reserve option to take size and percent	Sheetal Pamecha	2019-06-26	1	-17/+12
\| \| \| \| \| \| \| \| \| \| \|	* reverting changes made in https://review.gluster.org/#/c/glusterfs/+/21686/ * Now storage.reserve can take value in percent or bytes fixes: bz#1651445 Change-Id: Id4826210ec27991c55b17d1fecd90356bff3e036 Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
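A sketch of accepting `storage.reserve` as either a percentage or a byte size. The suffix set (B/KB/MB/GB/TB as powers of 1024) and the helper names are assumptions for illustration, not the exact parser glusterd uses.

```python
from typing import Tuple

_UNITS = {"": 1, "B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3, "TB": 1024 ** 4}


def parse_reserve(value: str) -> Tuple[str, float]:
    """Return ("percent", p) or ("bytes", n) for values like '10%' or '2GB'."""
    v = value.strip().upper()
    if v.endswith("%"):
        pct = float(v[:-1])
        if not 0 <= pct <= 100:
            raise ValueError(f"percentage out of range: {value}")
        return ("percent", pct)
    number = v.rstrip("KMGTB")
    unit = v[len(number):]
    if not number or unit not in _UNITS:
        raise ValueError(f"bad size: {value}")
    return ("bytes", float(number) * _UNITS[unit])


def reserved_bytes(setting: Tuple[str, float], fs_total_bytes: int) -> int:
    """Bytes to keep free on the brick's filesystem for a parsed setting."""
    kind, amount = setting
    if kind == "percent":
        return int(fs_total_bytes * amount / 100)
    return int(amount)
```

Keeping the unit in the parsed result lets a percentage track the brick's actual filesystem size, while a byte value stays fixed.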