path: root/tests
Commit log (newest first). Each entry shows the commit message, author, date, files changed, and lines removed/added.
* glusterd: bulkvoldict thread is not handling all volumes
  Mohit Agrawal, 2019-05-27, 1 file changed, -6/+10 lines

  Problem: In commit ac70f66c5805e10b3a1072bd467918730c0aeeb4 I missed one condition to populate the volume dictionary in multiple threads while brick_multiplex is enabled. Due to that, glusterd is not sending the volume dictionary for all volumes to peers.

  Solution: Update the condition in the code and update the test case to avoid the issue.

  Change-Id: I06522dbdfee4f7e995d9cc7b7098fdf35340dc52
  fixes: bz#1711250
  Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* tests: Add changelog api tests
  Kotresh HR, 2019-05-27, 2 files changed, -0/+135 lines

  updates: bz#1193929
  Change-Id: Iee9aab8140882069165621189741f189fb2cc884
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
* glusterd/tier: remove tier related code from glusterd
  Hari Gowtham, 2019-05-27, 4 files changed, -179/+0 lines

  The handler functions are pointed to dummy functions. The switch-case handling for tier has also been changed to point to the default case, to avoid issues if tier is reintroduced. The tier changes in DHT still remain as such.

  updates: bz#1693692
  Change-Id: I80d80c9a3eb862b4440a36b31ae82b2e9d92e4dc
  Signed-off-by: Hari Gowtham <hgowtham@redhat.com>
* tests: Add history api tests
  Kotresh HR, 2019-05-27, 5 files changed, -0/+171 lines

  updates: bz#1193929
  Change-Id: Ic26ab5277f720c734f083150c1c541763dfa64aa
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
* gfapi: add missing api to increase code coverage
  Sheetal Pamecha, 2019-05-26, 1 file changed, -18/+340 lines

  Add tests for async read/write combinations:
  glfs_read_async/write_async
  glfs_pread_async/pwrite_async
  glfs_readv_async/writev_async
  glfs_preadv_async/pwritev_async
  ftruncate/ftruncate_async
  fsync/fsync_async
  fdatasync/fdatasync_async

  Updates: #655
  Change-Id: I12beb97029fd60bce79650a376d8fcd8d383ef16
  Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
* api/glfsxmp.c: minor fixes
  Sheetal Pamecha, 2019-05-26, 1 file changed, -0/+30 lines

  * add more fops: f{get,set,list,remove}xattr(), access(), fstat(), fsetattr(), getxattr(), lgetxattr(), llistxattr(), lsetxattr(), fgetxattr()
  * handle some error cases (like volume not found)

  Updates: #655
  Change-Id: I3334bdf3090eafd83a54e1be12036ea01b181089
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
  Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
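For readers unfamiliar with gfapi, the "volume not found" class of errors mentioned above surfaces from glfs_init(). Below is a minimal sketch of the bootstrap-with-error-checks pattern; the volume name "testvol", the host, and the port are placeholders, not values taken from the commit.

    /* Minimal gfapi bootstrap with error handling (build with -lgfapi).
     * "testvol" and "localhost" are placeholder values, not from the commit. */
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <glusterfs/api/glfs.h>

    int main(void)
    {
        glfs_t *fs = glfs_new("testvol");
        if (!fs) {
            fprintf(stderr, "glfs_new: returned NULL (%s)\n", strerror(errno));
            return EXIT_FAILURE;
        }

        if (glfs_set_volfile_server(fs, "tcp", "localhost", 24007) < 0) {
            fprintf(stderr, "glfs_set_volfile_server: %s\n", strerror(errno));
            glfs_fini(fs);
            return EXIT_FAILURE;
        }

        /* glfs_init() is where "volume not found" style failures show up:
         * it returns a negative value and sets errno. */
        if (glfs_init(fs) < 0) {
            fprintf(stderr, "glfs_init: %s\n", strerror(errno));
            glfs_fini(fs);
            return EXIT_FAILURE;
        }

        glfs_fini(fs);
        return EXIT_SUCCESS;
    }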
* cluster/ec: honor contention notifications for partially acquired locks
  Xavi Hernandez, 2019-05-25, 1 file changed, -0/+54 lines

  EC was ignoring lock contention notifications received while a lock was being acquired. When a lock is partially acquired (some bricks have granted the lock but some others have not yet), we can receive notifications from the acquired bricks, which should be honored, since we may not receive more notifications after that.

  Since EC was ignoring them, once the lock was acquired it was not released until the eager-lock timeout, causing unnecessary delays on other clients.

  This fix takes into consideration the notifications received before the full lock acquisition has completed. After that, the lock will be released as soon as possible.

  Fixes: bz#1708156
  Change-Id: I2a306dbdb29fb557dcab7788a258bd75d826cc12
  Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* tests: Fix spurious failures in ta-write-on-bad-brick.t
  Pranith Kumar K, 2019-05-24, 5 files changed, -17/+36 lines

  Problem: afr_child_up_status_meta works only when LOOKUP on $M0 is successful. There are cases where quorum is not met and LOOKUP fails on $M0, which leads to failures similar to:
  grep: /mnt/glusterfs/0/.meta/graphs/active/patchy-replicate-0/private: Transport endpoint is not connected
  This was happening once in a while, depending on attribute-timeout and md-cache not serving the lookup.

  Fix: Find the child-up status based on statedump instead. Also changed the mount options to include --entry-timeout=0 and --attribute-timeout=0.

  updates bz#1193929
  Change-Id: Ic0de72c3006d7399a5feb3e4d10d4748949b2ab3
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* tests: Test openfd heal doesn't truncate files
  Pranith Kumar K, 2019-05-24, 2 files changed, -0/+218 lines

  fixes bz#1706603
  Change-Id: I0bfd30f787f157b7a54f71088f767ccfd7621208
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* tests/quick-read-with-upcall.t: increase the timeout
  Amar Tumballi, 2019-05-21, 1 file changed, -1/+5 lines

  Running with a 2 second sleep at this place caused failures like:
  `not ok 14 [ 2014/ 7] < 41> 'test-message1 cat /mnt/glusterfs/1/test.txt' -> 'Got "test-message0" instead of "test-message1"'`
  in a few runs out of 100 iterations. With the sleep increased to more than 3 seconds, no failures were seen in 100 runs.

  While I don't know the exact reasons for the behavior yet, it looks like this increase in wait helps the regression pass without failures.

  updates: bz#1693692
  Change-Id: I0610b79bea53e36de3eea6c11234b7fc9dfd6232
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
* tests: change usleep() to sleep()
  Sanju Rakonde, 2019-05-16, 2 files changed, -3/+3 lines

  While running a test case, the following warning messages are seen on the display. To avoid such warnings, change usleep() to sleep().

  warning: usleep is deprecated, and will be removed in near future!
  warning: use "sleep 0.25" instead...

  updates: bz#1193929
  Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
  Change-Id: I48b79ede1c70b101f654635dd4cc83e50ea55b73
* features/shard: Fix crash during background shard deletion in a specific case
  Krutika Dhananjay, 2019-05-16, 3 files changed, -1/+155 lines

  Consider the following case:
  1. A file gets FALLOCATE'd such that more than "shard-lru-limit" number of shards are created.
  2. The file is then deleted.

  The unique thing about FALLOCATE is that, unlike WRITE, all of the participant shards are resolved, created and fallocated in a single batch. This means that, after the first "shard-lru-limit" number of shards are resolved and added to the lru list, some of the existing shards in the lru list need to be evicted as part of resolving the remaining shards. These evicted shards are inode_unlink()d as part of eviction. Once the fop gets to the actual FALLOCATE stage, the lru'd-out shards get added to the fsync list. Two things to note at this point:
  i. the lru'd-out shards are only part of the fsync list, so each holds 1 ref on the base shard
  ii. the more recently used shards are part of both the fsync and lru lists, so each of these shards holds 2 refs on the base inode: one for being part of the fsync list, and the other for being part of the lru list.

  FALLOCATE completes successfully, then this very file is deleted and background shard deletion is launched. Here's where the ref counts get mismatched. First, during the inode_resolve()s done as part of the deletion, the lru'd-out inodes return NULL, because they have been inode_unlink()'d by now. So these inodes need to be freshly looked up. But as part of linking them in lookup_cbk (precisely in shard_link_block_inode()), inode_link() returns the lru'd-out inode object, and its inode ctx is still valid, with ctx->base_inode valid from the last time it was added to the list.

  But shard_common_lookup_shards_cbk() passes NULL in place of the base pointer to __shard_update_shards_inode_list(). This means that, as part of adding the lru'd-out inode back to the lru list, the base inode is not ref'd, since it is NULL. Whereas after unlinking this shard, during shard_unlink_block_inode(), ctx->base_inode is accessible and is unref'd because the shard was found to be part of the LRU list, although the matching ref never happened. At some point this leads to the base_inode refcount becoming 0; it gets destroyed and released while some of its associated shards are still being unlinked in parallel, and the client crashes whenever it is accessed next.

  Fix is to pass the base shard correctly, if available, in shard_link_block_inode(). Also, the patch fixes the ret value check in tests/bugs/shard/shard-fallocate.c.

  Change-Id: Ibd0bc4c6952367608e10701473cbad3947d7559f
  Updates: bz#1696136
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* geo-rep: Fix sync hang with tarssh
  Kotresh HR, 2019-05-13, 2 files changed, -0/+145 lines

  Problem: Geo-rep sync hangs when tarssh is used as the sync engine under heavy workload.

  Analysis and root cause: It was found that the tar process had hung. On debugging further, it turned out that the stderr buffer of the tar process on the master was full, i.e. 64k. When the buffer was copied to a file from /proc/pid/fd/2, the hang was resolved. This can happen when files picked by the tar process to sync no longer exist on the master. If this count reaches around 1k, the stderr buffer fills up.

  Fix: The tar process is executed using Popen with stderr as PIPE. The final execution is something like below.

  tar | ssh <args> root@slave tar --overwrite -xf - -C <path>

  It was waiting on the ssh process first using communicate() and then on tar. Note that communicate() reads stdout and stderr. So when the stderr of the tar process is filled up, there is no one to read it until the untar via ssh is completed. That can't happen, and it leads to a deadlock. Hence we should wait on both processes in parallel, so that stderr is read for both processes.

  Change-Id: I609c7cc5c07e210c504771115b4d551a2e891adf
  fixes: bz#1707728
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
* cli: Validate invalid slave url
  Kotresh HR, 2019-05-11, 1 file changed, -0/+4 lines

  This patch validates invalid slave URLs in the CLI itself and throws an appropriate error.

  fixes: bz#1098991
  Change-Id: I278e2a04a4d619d2c2d1db0dd56ab5bdf7e7f469
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
* glusterd: Add gluster volume stop operation to glusterd_validate_quorum()
  Vishal Pandey, 2019-05-11, 1 file changed, -1/+3 lines

  Issue: gluster volume stop succeeds even if quorum is not met.

  Fix: Add GD_OP_STOP_VOLUME to gluster_validate_quorum in glusterd_mgmt_v3_pre_validate(). Since the volume stop command has been ported from synctask to mgmt_v3, the quorum check was missed.

  Change-Id: I7a634ad89ec2e286ea262d7952061efad5360042
  fixes: bz#1690753
  Signed-off-by: Vishal Pandey <vpandey@redhat.com>
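The shape of the fix is simply adding the stop-volume operation to the set of operations gated by server quorum. The sketch below is a hypothetical, self-contained illustration of that idea; apart from GD_OP_STOP_VOLUME, which the commit message names, the enum values and function are illustrative, not glusterd's actual code.

    /* Hypothetical illustration only: an allow-list of operations that must
     * pass server-quorum validation, with the stop operation newly added. */
    #include <stdbool.h>

    enum gd_op {
        GD_OP_CREATE_VOLUME,
        GD_OP_START_VOLUME,
        GD_OP_STOP_VOLUME,
        GD_OP_DELETE_VOLUME,
    };

    static bool op_requires_server_quorum(enum gd_op op)
    {
        switch (op) {
        case GD_OP_START_VOLUME:
        case GD_OP_STOP_VOLUME:   /* previously missing: stop succeeded without quorum */
        case GD_OP_DELETE_VOLUME:
            return true;
        default:
            return false;
        }
    }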
* tests: fix bug-1319374.c compile warnings.
  Ravishankar N, 2019-05-10, 1 file changed, -0/+1 lines

  I was looking at a downstream failure of bug-1319374-THIS-crash.t when I saw the compiler throwing a warning while running the test:

  tests/bugs/gfapi/bug-1319374.c:17:61: warning: implicit declaration of function ‘strerror’; did you mean ‘perror’? [-Wimplicit-function-declaration]
      fprintf(stderr, "\nglfs_new: returned NULL (%s)\n", strerror(errno));
                                                          ^~~~~~~~
                                                          perror

  So I compiled the .c with -Wall and saw many more warnings, all due to a missing header. This patch fixes it.

  fixes: bz#1708163
  Change-Id: I8b6dd8e1404178a3d99b2d92d01f4575f5203e58
  Signed-off-by: Ravishankar N <ravishankar@redhat.com>
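The missing header is most likely <string.h>, since that is where strerror() is declared. The sketch below reconstructs the kind of one-line fix implied by the -0/+1 diffstat, without claiming to reproduce the file exactly.

    /* strerror() is declared in <string.h>; without it the compiler assumes an
     * implicit declaration and warns. Adding the include silences the warning. */
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>                 /* the previously missing header */
    #include <glusterfs/api/glfs.h>

    void report_glfs_new_failure(void)
    {
        fprintf(stderr, "\nglfs_new: returned NULL (%s)\n", strerror(errno));
    }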
* shd/glusterd: Serialize shd manager to prevent race condition
  Mohammed Rafi KC, 2019-05-10, 1 file changed, -0/+54 lines

  At the time of a glusterd restart, while doing a handshake, there is a possibility that multiple shd managers get executed. Because of this, there is a chance that multiple shd processes get spawned during a glusterd restart.

  Change-Id: Ie20798441e07d7d7a93b7d38dfb924cea178a920
  fixes: bz#1707081
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
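A common way to serialize a manager like this, so that concurrent callers cannot spawn the daemon twice, is a lock plus an "already running" flag. The sketch below is a generic illustration of that approach, not the actual glusterd/shd code.

    /* Generic serialization pattern: only the first concurrent caller spawns
     * the managed daemon; later callers see the flag and return. */
    #include <pthread.h>
    #include <stdbool.h>

    static pthread_mutex_t mgr_lock = PTHREAD_MUTEX_INITIALIZER;
    static bool svc_running = false;

    void manage_daemon(void (*spawn)(void))
    {
        pthread_mutex_lock(&mgr_lock);
        if (!svc_running) {
            spawn();        /* executed at most once, even under a handshake storm */
            svc_running = true;
        }
        pthread_mutex_unlock(&mgr_lock);
    }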
* tests: improve and fix some test scripts
  Xavier Hernandez, 2019-05-09, 18 files changed, -69/+162 lines

  Change-Id: Iceefe22af754096c599dc570d4894d14fce4deae
  Updates: bz#1193929
  Signed-off-by: Xavier Hernandez <xhernandez@redhat.com>
* geo-rep: Fix sync-method config
  Kotresh HR, 2019-05-09, 2 files changed, -2/+2 lines

  Problem: When 'use_tarssh' is set to true, the command exits with a success message but the default 'rsync' is still used as the sync engine. The new config 'sync-method' is not allowed to be set from the CLI.

  Analysis and fix: The 'use_tarssh' config is deprecated with the new config framework, and 'sync-method' is the new config to choose the sync method, i.e. tarssh or rsync. This patch fixes the 'sync-method' config. The allowed values are tarssh and rsync.

  Change-Id: I0edb0319cad0455b29e49f2f08a64ce324735e84
  fixes: bz#1707686
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
* tests/geo-rep: Fix arequal checksum comparison
  Kotresh HR, 2019-05-09, 5 files changed, -9/+10 lines

  The arequal checksum comparison was always reported as successful, even when it was not. Fixed the same.

  Change-Id: I5083da25c0954126e452d06311d2d376f8540555
  fixes: bz#1707742
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
* tests: enhance the auth.allow test to validate all failures of 'login' module
  Amar Tumballi, 2019-05-08, 1 file changed, -4/+49 lines

  The enhanced test now covers most of the code in the auth.login and auth.addr modules.

  updates: bz#1693692
  Change-Id: I1f43c7dc414e2e4d443a93e9a37051359fd46ea4
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
* dht: Custom xattrs are not healed in case of add-brick
  root, 2019-05-08, 1 file changed, -0/+67 lines

  Problem: If any custom xattrs are set on a directory before adding a brick, the xattrs are not healed on the directory after the brick is added.

  Solution: The xattrs are not healed because dht_selfheal_dir_mkdir_lookup_cbk checks the value of MDS, and if the MDS value is not negative the selfheal code path does not take a reference on the MDS xattrs. Change the condition to take a reference on the MDS xattr so that custom xattrs are populated on the newly added brick.

  Updates: bz#1702299
  Change-Id: Id14beedb98cce6928055f294e1594b22132e811c
  Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* tests: delete the snapshots and the volume after the tests
  Raghavendra Bhat, 2019-05-06, 1 file changed, -0/+22 lines

  In uss.t multiple snapshots are taken, and after all the tests they are left for the cleanup() function to remove. Instead of that, delete the snapshots and the volume once all the tests are over, so that the cleanup operation becomes relatively light.

  Change-Id: I2342740bbb185cd6c9a450eb3b4f5cbbba78974c
  fixes: bz#1704888
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
* tests: validate volfile grammar - strings in volfile
  Amar Tumballi, 2019-05-06, 1 file changed, -0/+73 lines

  * libglusterfs/graph-print: remove unused code

  updates: bz#1693692
  Change-Id: Iae81bb6a3af5911c3da07ab8f1d8f58f27e06905
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
* tests/cli: add .t file to increase line coverage in cli
  Sanju Rakonde, 2019-05-02, 1 file changed, -0/+21 lines

  updates: bz#1693692
  Change-Id: Ib188c5fddea8c762e89ff15aa83b08c35cdb21e1
  Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* tests: add .t files to increase cli code coverage
  rishubhjain, 2019-05-02, 2 files changed, -2/+3 lines

  Different volume profile sub-options are added in the test.

  Change-Id: I93100c37f51afc10870e60b91fcd86e7859e734a
  updates: bz#1693692
  Signed-off-by: rishubhjain <rishubhjain47@gmail.com>
* tests: Add changelog snapshot testcase
  Kotresh HR, 2019-05-02, 1 file changed, -0/+60 lines

  Add a testcase to test snapshot creation while I/O is happening with changelog enabled.

  updates: bz#1193929
  Change-Id: Ice4cb596286c583ed7308484d65902007a48396c
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
* nl-cache: add test to increase code coverage
  Sheetal Pamecha, 2019-04-29, 1 file changed, -0/+30 lines

  Change-Id: Ie0a5c522dfa0123ca45f9decf5015d39b92cb0f3
  updates: bz#1693692
  Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
* performance/decompounder: remove the translator as the feature is not used anymore
  Amar Tumballi, 2019-04-29, 1 file changed, -6/+1 lines

  updates: bz#1693692
  Change-Id: Id5932b11e115ca6da1c2bfff7ae1460787109e06
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
* glusterd: define dumpops in the xlator_api of glusterd
  Sanju Rakonde, 2019-04-27, 1 file changed, -0/+13 lines

  Problem: statedump is not capturing information related to glusterd.

  Solution: statedump is not capturing glusterd info because trav->dumpops is NULL in gf_proc_dump_single_xlator_info(), where trav is the glusterd xlator object. trav->dumpops is NULL because we missed defining dumpops in the xlator_api of glusterd. Defining dumpops in the xlator_api of glusterd fixes the issue.

  fixes: bz#1703629
  Change-Id: If85429ecb1ef580aced8d5b88d09fc15258bfc4c
  Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
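For context, every xlator publishes an xlator_api_t table, and statedump walks each xlator's dumpops. The sketch below shows the rough shape of such a table with .dumpops filled in; the member list and values are illustrative and assume the usual xlator boilerplate (init, fini, fops, cbks, options, dumpops defined elsewhere in the same file), not the commit's exact hunk.

    /* Illustrative only: the gist of the fix is that glusterd's exported
     * xlator_api_t now fills in .dumpops, so statedump stops skipping it. */
    xlator_api_t xlator_api = {
        .init = init,
        .fini = fini,
        .fops = &fops,
        .cbks = &cbks,
        .options = options,
        .dumpops = &dumpops,   /* previously unset, so trav->dumpops was NULL */
        .identifier = "glusterd",
    };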
* geo-rep: Fix rename with existing destination with same gfid
  Sunny Kumar, 2019-04-26, 5 files changed, -0/+56 lines

  Problem: Geo-rep fails to sync a rename properly if the destination exists. It results in the source remaining on the slave, causing a larger number of files on the slave. Heavy rename workloads like logrotate also caused a lot of ESTALE errors.

  Cause: Geo-rep fails to sync a rename when the destination exists if the creation of the source file also falls into the single batch of changelogs being processed. This is because, after fixing problematic gfids by verifying from the master, while re-processing the original entries, the CREATE was also re-processed, causing extra files on the slave and the rename to fail.

  Solution: Entries need to be removed from the retrial list after fixing problematic gfids on the slave so that they are not re-created again on the slave. Also treat ESTALE as EEXIST so that the error is properly handled by verifying the op on the master volume.

  Change-Id: I50cf289e06b997adddff0552bf2466d9201dd1f9
  fixes: bz#1694820
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
  Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
* features/bit-rot: Unconditionally sign the files during oneshot crawl
  Raghavendra Bhat, 2019-04-25, 1 file changed, -0/+87 lines

  Currently the bit-rot feature has an issue with disabling and re-enabling it on the same volume. Consider enabling bit-rot detection, which goes on to crawl and sign all the files present in the volume. Then some files are modified, and the bit-rot daemon signs the modified files with the correct signature. Now, disable the bit-rot feature. While signing and scrubbing are not happening, the previous checksums of the files continue to exist as extended attributes. If some files with checksum xattrs now get modified, they are not signed with a new signature because the feature is off.

  At this point, if the feature is enabled again, the bit-rot daemon will go and sign those files which do not have any bit-rot specific xattrs (i.e. the files which were created after bit-rot was disabled), whereas the files with bit-rot xattrs won't get signed with a proper new checksum. If the scrubber runs at this point, it finds the on-disk checksum and the actual checksum of the file to be different (because the file was modified) and marks the file as corrupted.

  FIX: The fix is to unconditionally sign the files when the bit-rot daemon comes up (instead of skipping the files with bit-rot xattrs).

  Change-Id: Iadfb47dd39f7e2e77f22d549a4a07a385284f4f5
  fixes: bz#1700078
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
* tests/geo-rep: Add pause and resume test case for geo-rep
  Shwetha K Acharya, 2019-04-24, 2 files changed, -0/+12 lines

  Added a pause and resume test case for geo-rep.

  fixes: bz#1696077
  Change-Id: Ib6fcc1926c3be1263bca1235194f737b895c8333
  Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
* tests: add .t files to increase cli code coverage
  rishubhjain, 2019-04-24, 1 file changed, -0/+62 lines

  Tests added for gluster volume top and profile, with and without xml output.

  Change-Id: I66aa6390b53ca448014059a3d27dc72e405216d2
  updates: bz#1693692
  Signed-off-by: rishubhjain <rishubhjain47@gmail.com>
* tests: add .t file to increase cli code coverage
  Sanju Rakonde, 2019-04-24, 3 files changed, -1/+97 lines

  updates: bz#1693692
  Change-Id: I848e622d7b8562e864f0e208aafdc21d9cb757d3
  Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* cluster/ec: fix fd reopen
  Xavi Hernandez, 2019-04-23, 2 files changed, -11/+49 lines

  Currently EC tries to reopen fd's that have been opened while a brick was down. This is done as part of regular write operations, just after having acquired the locks, and it's sent as a sub-fop of the main write fop.

  There were two problems:
  1. The reopen was attempted on all UP bricks, even if a previous lock didn't succeed. This is incorrect because most probably the open will fail.
  2. If a reopen is sent and fails, the error is propagated to the main operation, causing it to fail when it shouldn't.

  To fix this, we only attempt reopens on bricks where the current fop owns a lock, and we prevent any error from being propagated to the main fop.

  To implement this behaviour, an argument used to indicate the minimum number of required answers has been overloaded to also include some flags. To make the change consistent, it has been necessary to rename the argument, which means that a lot of files have been changed. However there are no functional changes.

  This change has also uncovered a problem in the discard code, which didn't correctly process requests of small sizes because no real discard fop was being processed, only a write of 0's on some region. In this case some fields of the fop remained uninitialized or with incorrect values. To fix this, a new function has been created to simulate success on a fop, and it's used in the discard case.

  Thanks to Pranith for providing a test script that has also detected an issue in this patch. This patch includes a small modification of this script to force data to be written into bricks before stopping them.

  Change-Id: If272343873369186c2fb8f43c1d9c52c3ea304ec
  Fixes: bz#1699866
  Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
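The "overloaded argument" mentioned above is a standard bit-packing idea: keep the answer count in the low bits and reserve higher bits for flags. The generic, self-contained sketch below shows only that idea; the mask and flag names are made up for illustration and are not EC's actual macros.

    /* Generic illustration of packing flags into a "minimum answers" argument.
     * Names and layout are illustrative, not ec's real encoding. */
    #include <stdint.h>

    #define FOP_MINIMUM_MASK  0x000000ffu  /* low byte: required answer count */
    #define FOP_NO_PROPAGATE  0x00000100u  /* flag: don't bubble errors up to the main fop */

    static inline uint32_t fop_pack(uint32_t minimum, uint32_t flags)
    {
        return (minimum & FOP_MINIMUM_MASK) | flags;
    }

    static inline uint32_t fop_minimum(uint32_t packed)
    {
        return packed & FOP_MINIMUM_MASK;
    }

    static inline int fop_should_propagate(uint32_t packed)
    {
        return !(packed & FOP_NO_PROPAGATE);
    }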
* extras/hooks: syntactical errors in SELinux hooks, script logic improved
  Milan Zink, 2019-04-18, 1 file changed, -1/+3 lines

  Fixes: bz#1542072
  Change-Id: Ia5fa1df81bbaec3a84653d136a331c76b457f42c
  Signed-off-by: Milan Zink <zeten30@gmail.com>
* tests: Heal should fail when read/write fails
  Pranith Kumar K, 2019-04-16, 1 file changed, -0/+65 lines

  updates: bz#1699866
  Change-Id: I7ccd1fc5fc134eeb6d443c755962a20819320d48
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* glusterd: Optimize glusterd handshaking code path
  Mohit Agrawal, 2019-04-15, 1 file changed, -0/+69 lines

  Problem: At the time of handshaking, glusterd populates volume data in a dictionary. When more than 1500 volumes are configured, glusterd takes more than 10 minutes to generate the data. Because this takes so long, the RPC request times out and RPC starts bailing out call frames.

  Solution: To optimize the code, the following changes were made (see the sketch after this entry for the fan-out pattern in point 1):
  1) Spawn multiple threads to populate volume data in bulk into separate dictionaries, and introduce an option glusterd.brick-dict-thread-count to configure the number of threads used to populate volume data.
  2) Populate tier data only when the volume type is tier.
  3) Compare snap data only when snap_count is non-zero.

  Fixes: bz#1699339
  Change-Id: I38dc71970c049217f9d1a06fc0aaf4c26eab18f5
  Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
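The fan-out in point 1 is a plain partition-and-merge: split the volume list into chunks, let each worker thread fill its own container, then merge the results. The self-contained pthread sketch below shows only that pattern; the names, the fixed thread count, and the string "dictionary" are placeholders, not glusterd's dict_t code.

    /* Generic sketch of populating per-thread containers in parallel and
     * merging them afterwards (build with -pthread). Not glusterd code. */
    #include <pthread.h>
    #include <stdio.h>

    #define NVOLS    8
    #define NTHREADS 4   /* cf. the new glusterd.brick-dict-thread-count option */

    struct chunk {
        const char **vols;
        int start, end;
        char filled[NVOLS][64];   /* stand-in for a per-thread dictionary */
    };

    static void *fill_chunk(void *arg)
    {
        struct chunk *c = arg;
        for (int i = c->start; i < c->end; i++)
            snprintf(c->filled[i], sizeof(c->filled[i]), "volume%d.name=%s", i, c->vols[i]);
        return NULL;
    }

    int main(void)
    {
        const char *vols[NVOLS] = {"v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7"};
        pthread_t tid[NTHREADS];
        struct chunk chunks[NTHREADS];
        int per = NVOLS / NTHREADS;

        for (int t = 0; t < NTHREADS; t++) {
            chunks[t] = (struct chunk){
                .vols = vols,
                .start = t * per,
                .end = (t == NTHREADS - 1) ? NVOLS : (t + 1) * per,
            };
            pthread_create(&tid[t], NULL, fill_chunk, &chunks[t]);
        }
        for (int t = 0; t < NTHREADS; t++) {
            pthread_join(tid[t], NULL);
            for (int i = chunks[t].start; i < chunks[t].end; i++)
                printf("%s\n", chunks[t].filled[i]);   /* merge step into the reply */
        }
        return 0;
    }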
* cluster/afr: Remove local from owners_list on failure of lock-acquisition
  Pranith Kumar K, 2019-04-15, 1 file changed, -0/+47 lines

  When eager-lock lock acquisition fails because of, say, network failures, the local is not removed from owners_list. This leads to an accumulation of waiting frames, and the application hangs because the waiting frames assume that another transaction is in the process of acquiring the lock, since the owners list is not empty. Handled this case as well in this patch. Added asserts to make it easier to find these problems in the future.

  fixes bz#1696599
  Change-Id: I3101393265e9827755725b1f2d94a93d8709e923
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* core: Log level changes do not take effect on a running client process
  Mohit Agrawal, 2019-04-15, 1 file changed, -0/+113 lines

  Problem: Commit c34e4161f3cb6539ec83a9020f3d27eb4759a975 set the per-xlator log level during reconfigure only for a brick process, not for the client process.

  Solution: 1) Change the per-xlator log level only if brick_mux is enabled. To know whether brick multiplexing is on, introduce a flag brick_mux at ctx->cmd_args.

  Note: There are two other changes done with this patch:
  1) Ignore the client-log-level option when attaching a brick to an already running brick process if brick_mux is enabled.
  2) Add a log message that prints the pid of the running process to make debugging easier.

  Change-Id: I39e85de778e150d0685cd9a79425ce8b4783f9c9
  Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
  Fixes: bz#1696046
* posix/ctime: Fix stat(time attributes) inconsistency during readdirp
  Kotresh HR, 2019-04-15, 2 files changed, -0/+79 lines

  Problem: Creating a tar file on a gluster volume throws the warning 'file changed as we read it'.

  Cause: During readdirp, for a few of the files whose inode was not present, time attributes were served from the backend. This caused the ctime of a few files to be different before and after the readdir done by tar.

  Solution: If the ctime feature is enabled and the inode is not present, don't serve the time attributes from the backend file; serve them from the xattr.

  fixes: bz#1698078
  Change-Id: I427ef865f97399475faf5aa6ca495f7e317603ae
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
* core: Brick is not able to detach successfully in brick_mux environment
  Mohit Agrawal, 2019-04-14, 1 file changed, -0/+33 lines

  Problem: In a brick_mux environment, while volumes are stopped in a loop, bricks are not detached successfully. Bricks are not detached because xprtrefcnt has not become 0 for the detached brick. At the time of initiating the brick detach process, server_notify saves xprtrefcnt for the detached brick, and once the counter becomes 0, server_rpc_notify spawns a server_graph_janitor_threads to clean up brick resources. xprtrefcnt has not become 0 because the socket framework is not working, due to 0 being assigned as an fd for a socket. In commit dc25d2c1eeace91669052e3cecc083896e7329b2 there was a change in changelog fini to close htime_fd if htime_fd is not negative; by default htime_fd is 0, so it closed fd 0 as well.

  Solution: Initialize htime_fd to -1 just after allocating changelog_priv with GF_CALLOC.

  Fixes: bz#1699025
  Change-Id: I5f7ca62a0eb1c0510c3e9b880d6ab8af8d736a25
  Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
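The root cause is a classic one: GF_CALLOC, like calloc, zero-fills the allocation, and 0 is a valid file descriptor, so a cleanup path that closes any non-negative fd ends up closing fd 0. The self-contained sketch below shows the safe pattern; the struct and function names are generic stand-ins, not the changelog code.

    /* Generic illustration: a zero-filled struct leaves fd fields at 0, which
     * is a real descriptor (stdin). Initializing them to -1 right after
     * allocation lets cleanup safely use "close only if fd >= 0". */
    #include <stdlib.h>
    #include <unistd.h>

    struct priv {
        int htime_fd;   /* field name borrowed from the commit message */
    };

    struct priv *priv_new(void)
    {
        struct priv *p = calloc(1, sizeof(*p));  /* htime_fd == 0 here: dangerous */
        if (!p)
            return NULL;
        p->htime_fd = -1;                        /* the fix: mark "not open" explicitly */
        return p;
    }

    void priv_fini(struct priv *p)
    {
        if (!p)
            return;
        if (p->htime_fd >= 0)                    /* never closes fd 0 by accident now */
            close(p->htime_fd);
        free(p);
    }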
* tests/dht: Test that lookups are sent post brick up
  N Balachandran, 2019-04-12, 1 file changed, -0/+83 lines

  Change-Id: I3556793c5e9d58cc6a08644b41dc5740fab2610b
  updates: bz#1628194
  Signed-off-by: N Balachandran <nbalacha@redhat.com>
* test: Change glustershd_pid update in .t file
  Mohit Agrawal, 2019-04-12, 2 files changed, -3/+4 lines

  Problem: bug-1650403.t and bug-858215.t throw an error at the time of accessing the glustershd pidfile.

  Solution: Use the ps command to find out the glustershd pid.

  Change-Id: I3477345b6220aa039e012e674cba21d741e9abab
  fixes: bz#1697486
  Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* tests: make sure to traverse all of meta dir
  Amar Tumballi, 2019-04-12, 1 file changed, -0/+27 lines

  Just to make sure all files get listed, which means we have maximum code coverage.

  updates: bz#1693692
  Change-Id: I11d36ac2f4d6d4fb91223aacd423ad23242eb454
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
* tests: correctly check open fd's when gfid is missing
  Xavi Hernandez, 2019-04-10, 1 file changed, -0/+3 lines

  The helper function get_fd_count() returns how many open fd's a given gfid has on a brick. It can happen that the brick doesn't have information about that inode because it has not been previously accessed.

  Before this patch, the function returned "" when the inode was not present. This caused basic/ec/ec-fix-openfd.t to fail because it was expecting '0' as the result. This patch forces get_fd_count() to return '0' when the gfid is not present in the state dump.

  Change-Id: I848b57744e96656bf81fbb7b126a5faf44e535eb
  updates: bz#1193929
  Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* mgmt/glusterd: Make changes related to cloudsync xlator
  Anuradha Talur, 2019-04-10, 2 files changed, -0/+69 lines

  1) The placement of the cloudsync xlator has been changed to make it the shard xlator's child. If cloudsync has to work with shard in the graph, it needs to be a child of shard.

  Change-Id: Ib55424fdcb7ce8edae9f19b8a6e3d3ba86c1f0c4
  fixes: bz#1642168
  Signed-off-by: Anuradha Talur <atalur@commvault.com>
* protocol: add an option to force using old-protocol
  Amar Tumballi, 2019-04-10, 1 file changed, -0/+31 lines

  The protocol xlators implement every fop and, in general, a large part of the codebase. Considering that our regression runs mostly on one machine, there was no way of forcing the client to use the old protocol while the new one is available. With this patch, a new 'testing' option is provided which forces the client to use the old protocol if found. This should help increase the code coverage by at least 10k lines overall.

  updates: bz#1693692
  Change-Id: Ie45256f7dea250671b689c72b4b6f25037cef948
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
* ec: increase line coverage of ec
  Xavi Hernandez, 2019-04-10, 1 file changed, -1/+2 lines

  Test ec-cpu-extensions.t has been modified so that it uses a bigger matrix. This makes use of more functions from ec-code-c.c. Changing read-policy to round-robin increases the set of functions used even more, reaching 100% line and function coverage for this file.

  Change-Id: I26e4d33269cbd67f5d76d862f4cf1e69285e85e1
  updates: bz#1193929
  Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>