* gnfs: support inode dump (Xie Changlong, 2019-06-14; 1 file changed, +16/-0)

  So we will get more debug info.

  fixes: #679
  Change-Id: I3588e204ad25c20b69271c1a4ee17d0d158bd794
  Signed-off-by: Xie Changlong <xiechanglong@cmss.chinamobile.com>
* upcall: Avoid sending notifications for invalid inodes (Soumya Koduri, 2019-06-14; 1 file changed, +18/-1)

  For nameless LOOKUPs, the server creates a new inode which shall remain
  invalid until the fop is successfully processed, post which it is
  linked into the inode table. But in case there is an already linked
  inode for that entry, the server discards the newly created inode,
  which results in an upcall notification. This may result in the client
  being bombarded with unnecessary upcalls, affecting performance if the
  data set is huge.

  This issue can be avoided by looking up and storing the upcall context
  in the original linked inode (if it exists), thus saving up on those
  extra callbacks.

  Change-Id: I044a1737819bb40d1a049d2f53c0566e746d2a17
  fixes: bz#1718338
  Signed-off-by: Soumya Koduri <skoduri@redhat.com>
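  The core idea reduces to preferring the already-linked inode as the
  holder of the upcall context. A minimal sketch under that reading, with
  a stand-in inode type and a hypothetical helper name (not the actual
  upcall xlator code):

  ```c
  typedef struct inode inode_t; /* stand-in for the GlusterFS inode type */

  /* Prefer the inode already linked into the inode table. The fresh
   * inode from a nameless LOOKUP gets discarded when a linked one
   * exists, so storing the upcall context on it would only trigger a
   * spurious notification later. Hypothetical helper, for illustration. */
  inode_t *
  upcall_ctx_inode(inode_t *newly_created, inode_t *already_linked)
  {
      return already_linked ? already_linked : newly_created;
  }
  ```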
* gfapi: fix incorrect initialization of upcall syncop arguments (Soumya Koduri, 2019-06-14; 1 file changed, +72/-37)

  While sending upcall notifications via synctasks, the argument used to
  carry relevant data for these tasks was not initialized properly. This
  patch fixes that.

  Change-Id: I9fa8f841e71d3c37d3819fbd430382928c07176c
  fixes: bz#1718316
  Signed-off-by: Soumya Koduri <skoduri@redhat.com>
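  The failure class here is dispatching a synctask with a partially
  initialized argument struct. A hedged sketch of the safe pattern
  (struct and field names are invented for illustration, not the actual
  gfapi types):

  ```c
  #include <string.h>

  struct upcall_syncop_args_sketch {
      void *fs;          /* owning handle */
      void *upcall_data; /* payload handed to the synctask */
  };

  /* Zero the whole struct first, then set only the fields needed, so
   * the task never reads stale garbage from members left unset. */
  void
  prepare_upcall_args(struct upcall_syncop_args_sketch *args,
                      void *fs, void *data)
  {
      memset(args, 0, sizeof(*args));
      args->fs = fs;
      args->upcall_data = data;
  }
  ```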
* cli: Remove-brick warning seems unnecessary (Shwetha K Acharya, 2019-06-12; 1 file changed, +9/-8)

  As the force-migration option is disabled by default, the warning seems
  unnecessary. Rephrased the warning so it makes more sense.

  fixes: bz#1712668
  Change-Id: Ia18c3c5e7b3fec808fce2194ca0504a837708822
  Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
* extras/hooks: Add SELinux label on new bricks during add-brick (Anoop C S, 2019-06-12; 1 file changed, +100/-0)

  Change-Id: Ifd8ae5eeb91b968cc1a9a9b5d15844c5233d56db
  fixes: bz#1717953
  Signed-off-by: Anoop C S <anoopcs@redhat.com>
* geo-rep: fix mountbroker setup (Sunny Kumar, 2019-06-12; 1 file changed, +1/-1)

  Problem: Unable to set up the mountbroker root directory while creating
  a geo-replication session for a non-root user.

  Cause: With patch [1], which defines the max-port for glusterd, one
  extra space got added in the 'option max-port' field.

  [1] https://review.gluster.org/#/c/glusterfs/+/21872/

  In geo-rep, splitting of key-value pairs from the vol file was done on
  the basis of a single space, so this additional space caused
  "ValueError: too many values to unpack".

  Solution: Use split() so that it treats consecutive whitespace as a
  single separator.

  Fixes: bz#1709248
  Change-Id: Ia22070a43f95d66d84cb35487f23f9ee58b68c73
  Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
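  The fix itself lives in geo-rep's Python parser (calling str.split()
  with no separator, which collapses runs of whitespace). The same
  tolerant behavior can be illustrated in C with strtok(), which also
  treats consecutive delimiters as one:

  ```c
  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
      /* doubled space between key and value, like the one the max-port
       * patch introduced; the port number is just an example value */
      char line[] = "option max-port  60999";

      /* strtok() skips runs of delimiter characters, so the doubled
       * space never yields an empty token the way splitting on a
       * single space would */
      for (char *tok = strtok(line, " \t"); tok != NULL;
           tok = strtok(NULL, " \t"))
          printf("[%s]\n", tok);

      return 0; /* prints: [option] [max-port] [60999] */
  }
  ```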
* libglusterfs: cleanup iovec functions (Xavi Hernandez, 2019-06-11; 7 files changed, +177/-114)

  This patch cleans some iovec code and creates two additional helper
  functions to simplify management of iovec structures.

    iov_range_copy(struct iovec *dst, uint32_t dst_count,
                   uint32_t dst_offset,
                   struct iovec *src, uint32_t src_count,
                   uint32_t src_offset,
                   uint32_t size);

  This function copies up to 'size' bytes from 'src' at offset
  'src_offset' to 'dst' at 'dst_offset'. It returns the number of bytes
  copied.

    iov_skip(struct iovec *iovec, uint32_t count, uint32_t size);

  This function removes the initial 'size' bytes from 'iovec' and returns
  the updated number of iovec vectors remaining.

  The signature of iov_subset() has also been modified to make it safer
  and easier to use. The new signature is:

    iov_subset(struct iovec *src, int src_count, uint32_t start,
               uint32_t size, struct iovec **dst, int32_t dst_count);

  This function creates a new iovec array containing the subset of the
  'src' vector starting at 'start' with size 'size'. The resulting array
  is allocated if '*dst' is NULL, or copied to '*dst' if it fits (based
  on 'dst_count'). It returns the number of iovec vectors used.

  A new set of functions to iterate through an iovec array has been
  created. They can be used to simplify the implementation of other
  iovec-based helper functions.

  Change-Id: Ia5fe57e388e23392a8d6cdab17670e337cadd587
  Updates: bz#1193929
  Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
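  A standalone sketch of the iov_skip() contract described above, written
  against the documented semantics rather than copied from libglusterfs:

  ```c
  #include <stdint.h>
  #include <stdio.h>
  #include <sys/uio.h>

  /* Drop the initial 'size' bytes from 'iov' and return the number of
   * vectors remaining, per the contract in the commit message. */
  static uint32_t
  iov_skip_sketch(struct iovec *iov, uint32_t count, uint32_t size)
  {
      uint32_t i = 0;

      /* consume whole vectors first */
      while (i < count && size >= iov[i].iov_len) {
          size -= iov[i].iov_len;
          i++;
      }
      /* then trim the partially consumed one */
      if (i < count && size > 0) {
          iov[i].iov_base = (char *)iov[i].iov_base + size;
          iov[i].iov_len -= size;
      }
      /* compact the survivors to the front of the array */
      for (uint32_t j = 0; i + j < count; j++)
          iov[j] = iov[i + j];

      return count - i;
  }

  int main(void)
  {
      char a[] = "abcd", b[] = "efgh";
      struct iovec v[2] = { { a, 4 }, { b, 4 } };

      uint32_t n = iov_skip_sketch(v, 2, 6); /* skip "abcdef" */
      printf("%u vector(s) left, first byte '%c'\n",
             n, *(char *)v[0].iov_base);
      return 0; /* 1 vector(s) left, first byte 'g' */
  }
  ```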
* tests: keep glfsxmp in tests directory (Amar Tumballi, 2019-06-11; 10 files changed, +1821/-17)

  This is critical so that all the tests are contained in the same
  directory, and one can just 'cp -a tests/ <any-location>/' and run
  glusterfs tests. Only 'glfsxmp.c' was an exception, as it was just
  copied from the api examples directory. It has now been moved to tests.

  updates: bz#1193929
  Change-Id: I00359d64be580bffc5b3c3a090968d86c2c6952a
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
* tests: Fix split-brain-favorite-child-policy.t failure (karthik-us, 2019-06-10; 1 file changed, +4/-3)

  Problem: The test case is failing to heal the volume within
  $HEAL_TIMEOUT @195. This is happening because, as part of split-brain
  resolution, the file gets expunged from the sink, and the new-entry
  mark for that file is done on the source bricks as part of impunging.
  Since the source bricks' shd threads failed to get the heal-domain
  lock, they will wait for the heal-timeout of 10 minutes, which is
  greater than $HEAL_TIMEOUT.

  Fix: Set cluster.heal-timeout to 5 seconds to trigger the heal so that
  one of the source bricks heals the file within $HEAL_TIMEOUT.

  Change-Id: Ie73c578cc5361c0d617a48ccc86026734d20ba8c
  fixes: bz#1718998
  Signed-off-by: karthik-us <ksubrahm@redhat.com>
* Cluster/afr: Don't treat all bricks having metadata pending as split-brain (karthik-us, 2019-06-10; 4 files changed, +133/-67)

  Problem: We currently don't have a roll-back/undoing of post-ops if
  quorum is not met. Though the FOP is still unwound with failure, the
  xattrs remain on the disk. Due to these partial post-ops and partial
  heals (healing only when 2 bricks are up), we can end up in metadata
  split-brain purely from the afr-xattrs point of view, i.e. each brick
  is blamed by at least one of the others for metadata. These scenarios
  are hit when there is frequent connect/disconnect of the client/shd to
  the bricks.

  Fix: Pick a source based on the xattr values. If 2 bricks blame one,
  the blamed one must be treated as a sink. If there is no majority, all
  are sources. Once we pick a source, self-heal will then do the heal
  instead of erroring out due to split-brain.

  This patch also adds the restriction that all bricks must be up to
  perform metadata heal, to avoid any metadata loss.

  Removed the test case
  tests/bugs/replicate/bug-1468279-source-not-blaming-sinks.t as it was
  doing metadata heal even when only 2 of 3 bricks were up.

  Change-Id: I07a9d62f84ceda329dcab1f02a33aeed258dcb09
  fixes: bz#1717819
  Signed-off-by: karthik-us <ksubrahm@redhat.com>
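  A hedged sketch of the source-picking heuristic described in the Fix,
  for a 3-brick replica with a simplified blame matrix (the real logic
  operates on afr's pending xattrs):

  ```c
  #include <stdio.h>

  #define BRICKS 3

  /* blame[i][j] != 0 means brick i blames brick j for metadata
   * (a simplified stand-in for the afr pending xattrs). */
  static void
  pick_metadata_sources(const int blame[BRICKS][BRICKS],
                        int is_source[BRICKS])
  {
      for (int j = 0; j < BRICKS; j++) {
          int blamed_by = 0;
          for (int i = 0; i < BRICKS; i++)
              if (i != j && blame[i][j])
                  blamed_by++;
          /* blamed by a majority of peers => sink; otherwise a source.
           * If no brick reaches a majority, all bricks stay sources. */
          is_source[j] = (blamed_by < 2);
      }
  }

  int main(void)
  {
      /* bricks 0 and 1 both blame brick 2 */
      const int blame[BRICKS][BRICKS] = {
          { 0, 0, 1 },
          { 0, 0, 1 },
          { 0, 0, 0 },
      };
      int is_source[BRICKS];
      pick_metadata_sources(blame, is_source);
      for (int j = 0; j < BRICKS; j++)
          printf("brick %d: %s\n", j, is_source[j] ? "source" : "sink");
      return 0;
  }
  ```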
* tests: added cleanup for lock files (Sunny Kumar, 2019-06-10; 1 file changed, +35/-0)

  Problem: useradd fails with:

    Cannot allocate memory
    useradd: cannot lock /etc/passwd; try again later.

  Solution: Lock files should get automatically removed once the
  "useradd" or "groupadd" command finishes. But sometimes we encounter
  situations (bugs) where some of these files may not get properly
  unlocked after the execution of the command. In that case, when we
  execute useradd the next time, it may show the error "cannot lock
  /etc/passwd" or "unable to lock group file". So, to avoid any such
  errors, check for any lock files under /etc and remove them.

  updates: bz#1193929
  Change-Id: If6456a271c2bc0717f768d7101a40ce44a9af3d7
  Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
* features/shard: Fix extra unref when inode object is lru'd out and added back (Krutika Dhananjay, 2019-06-09; 2 files changed, +36/-4)

  Long tale of a double unref! But do read...

  In cases where a shard base inode is evicted from the lru list while
  still being part of the fsync list, but added back soon before its
  unlink, there could be an extra inode_unref() leading to premature
  inode destruction, leading to a crash.

  One such specific case is the following. Consider
  features.shard-deletion-rate = features.shard-lru-limit = 2. This is an
  oversimplified example but explains the problem clearly.

  First, a file is FALLOCATE'd to a size such that the number of shards
  under /.shard = 3 > lru-limit. Shards 1, 2 and 3 need to be resolved;
  1 and 2 are resolved first.

    Resultant lru list:  1 -----> 2
    refs on base inode:  (1) + (1) = 2

  3 needs to be resolved. So 1 is lru'd out.

    Resultant lru list:  2 -----> 3
    refs on base inode:  (1) + (1) = 2

  Note that 1 is inode_unlink()d but not destroyed, because there are
  non-zero refs on it since it is still participating in this ongoing
  FALLOCATE operation.

  FALLOCATE is sent on all participant shards. In the cbk, all of them
  are added to the fsync list.

    Resulting fsync list:  1 -----> 2 -----> 3 (order doesn't matter)
    refs on base inode:    (1) + (1) + (1) = 3

  Total refs = 3 + 2 = 5.

  Now an attempt is made to unlink this file. Background deletion is
  triggered. The first $shard-deletion-rate shards need to be unlinked in
  the first batch, so shards 1 and 2 need to be resolved. inode_resolve
  fails on 1 but succeeds on 2, so 2 is moved to the tail of the list.

    lru list now:  3 -----> 2

  No change in refs.

  Shard 1 is looked up. In lookup_cbk, it's linked and added back to the
  lru list at the cost of evicting shard 3.

    lru list now:        2 -----> 1
    refs on base inode:  (1) + (1) = 2

    fsync list now:      1 -----> 2 (again, order doesn't matter)
    refs on base inode:  (1) + (1) = 2

  Total refs = 2 + 2 = 4.

  After eviction, it is found that 3 needs fsync. So fsync is wound, yet
  to be ack'd, so it is still inode_link()d.

  Now deletion of shards 1 and 2 completes. The lru list is empty. The
  base inode is unref'd and destroyed.

  In the next batched deletion, 3 needs to be deleted. It is
  inode_resolve()able. It is added back to the lru list, but the base
  inode passed to __shard_update_shards_inode_list() is NULL since the
  inode is destroyed. But its ctx->inode still contains the base inode
  ptr from the first addition to the lru list, with no additional ref on
  it.

    lru list now:        3
    refs on base inode:  (0)

  Total refs on base inode = 0.

  Unlink is sent on 3. It completes. Now, since the ctx contains a ptr to
  the base inode and the shard is part of the lru list, the base inode is
  unref'd, leading to a crash.

  FIX: When a shard is re-added to the lru list, copy the base inode
  pointer as-is into its inode ctx, even if it is NULL. This is needed to
  prevent double unrefs at the time of deleting it.

  Change-Id: I99a44039da2e10a1aad183e84f644d63ca552462
  Updates: bz#1696136
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
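  A hedged fragment showing the shape of the FIX (type and function names
  are stand-ins, not the actual shard xlator code):

  ```c
  typedef struct inode inode_t; /* stand-in for the GlusterFS inode type */

  struct shard_inode_ctx_sketch {
      inode_t *base_inode; /* may legitimately be NULL */
  };

  /* On re-adding a shard to the lru list, mirror the caller's base
   * inode pointer into the ctx verbatim, even when it is NULL. A stale
   * non-NULL pointer left over from the first addition is what caused
   * the extra unref on an already-destroyed inode. */
  void
  shard_ctx_set_base_inode(struct shard_inode_ctx_sketch *ctx,
                           inode_t *base_inode)
  {
      ctx->base_inode = base_inode; /* no "if (base_inode)" guard */
  }
  ```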
* ec/fini: Fix race between xlator cleanup and ongoing async fop (Mohammed Rafi KC, 2019-06-08; 6 files changed, +56/-15)

  Problem: While we process a cleanup, there is a chance of a race with
  async operations, for example ec_launch_replace_heal. This can lead to
  invalid memory access.

  Solution: Just like we track ongoing heal fops, we can also track fops
  like ec_launch_replace_heal, so that we can decide when to send a
  PARENT_DOWN request.

  Change-Id: I055391c5c6c34d58aef7336847f3b570cb831298
  fixes: bz#1703948
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
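  A hedged sketch of the tracking idea with C11 atomics; the real ec
  xlator keeps this state in its private structure under its own locks,
  so this only shows the shape of the solution:

  ```c
  #include <stdatomic.h>
  #include <stdbool.h>

  static _Atomic int async_fops = 0;  /* e.g. replace-heal launches */
  static _Atomic bool cleanup_requested = false;

  /* called when an async operation is launched */
  void async_fop_begin(void) { atomic_fetch_add(&async_fops, 1); }

  /* called on completion; returns true when the caller may now send
   * PARENT_DOWN because cleanup is pending and no async fop remains */
  bool async_fop_end(void)
  {
      int remaining = atomic_fetch_sub(&async_fops, 1) - 1;
      return atomic_load(&cleanup_requested) && remaining == 0;
  }

  /* called from the cleanup path; returns true if PARENT_DOWN can be
   * sent immediately, false if it must wait for the last async_fop_end */
  bool cleanup_request(void)
  {
      atomic_store(&cleanup_requested, true);
      return atomic_load(&async_fops) == 0;
  }
  ```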
* xlator/log: Add more logging in xlator_is_cleanup_starting (Mohammed Rafi KC, 2019-06-08; 1 file changed, +9/-3)

  This patch adds two extra logs for invalid arguments.

  Change-Id: I3950b4f4b9d88b1f1e788ef93d8f09d4bd8d4d8b
  updates: bz#1703948
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
* Fixing formatting errors in markdown files (kshithijiyer, 2019-06-08; 19 files changed, +474/-468)

  There are a lot of formatting errors in the markdown files present
  under the /doc directory of the project. This patch fixes those
  formatting errors.

  Fixes: bz#1718273
  Change-Id: I08f938088bbaaafddf634f73616ea0dbfe7aedf3
  Signed-off-by: kshithijiyer <kshithij.ki@gmail.com>
* uss: Ensure that snapshot is deleted before creating a new snapshot (Raghavendra Bhat, 2019-06-08; 4 files changed, +29/-3)

  * Also some logging enhancements in snapview-server

  Change-Id: I6a7646771cedf4bd1c62806eea69d720bbaf0c83
  fixes: bz#1715921
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
* cluster/replicate: Modify command in unit file to assign port correctly (Ashish Pandey, 2019-06-08; 2 files changed, +4/-3)

  Problem: In the unit file of the TA process we have been using ta-vol
  as the volume id, and also
  ta-vol-server.transport.socket.listen-port=24007. In the volume file
  for the TA process we only consider the volname as "ta" and not
  "ta-vol". That's why it was not able to assign this port number to the
  ta process, as in the volume file it will try to find the server xlator
  as ta-vol-server:

    volume ta-server               <<<<<<<<< not ta-vol-server
        type protocol/server
        option transport.listen-backlog 10
        option transport.socket.keepalive-count 9
        option transport.socket.keepalive-interval 2
        option transport.socket.keepalive-time 20
        option transport.tcp-user-timeout 0
        option transport.socket.keepalive 1
        option auth.addr./mnt/thin-arbiter.allow *
        option auth-path /mnt/thin-arbiter
        option transport.address-family inet
        option transport-type tcp
        subvolumes ta-io-stats
    end-volume

  Solution: Provide "ta" as the vol id for the command which the unit
  file is going to execute. Also, made changes in setup-thin-arbiter.sh
  to correctly identify the directory of the unit file irrespective of
  the location from which the script is executed.

  Change-Id: Ia7bbccdc0304e7dfaaa732bebb726fba731d1d33
  fixes: bz#1716766
  Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* glusterd: store fips-mode-rchecksum option in the info file (Ravishankar N, 2019-06-08; 1 file changed, +11/-0)

  Commit 146e4b45d0ce906ae50fd6941a1efafd133897ea enabled the
  storage.fips-mode-rchecksum option for all new volumes with op-version
  >= GD_OP_VERSION_7_0, but `gluster vol get $volname
  storage.fips-mode-rchecksum` was displaying it as 'off'. This patch
  fixes it.

  fixes: bz#1717782
  Change-Id: Ie09f89838893c5776a3f60569dfe8d409d1494dd
  Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* glusterd: Fix typos (Anoop C S, 2019-06-07; 1 file changed, +1/-1)

  Change-Id: I8cf0a153f84ef2d162e6dd03261441d211c07d40
  updates: bz#1193929
  Signed-off-by: Anoop C S <anoopcs@redhat.com>
* tests/utils: Fix py2/py3 util python scripts (Kotresh HR, 2019-06-07; 7 files changed, +258/-30)

  The following files are fixed:

    tests/bugs/distribute/overlap.py
    tests/utils/changelogparser.py
    tests/utils/create-files.py
    tests/utils/gfid-access.py
    tests/utils/libcxattr.py

  Change-Id: I3db857cc19e19163d368d913eaec1269fbc37140
  updates: bz#1193929
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
* tests/quick-read-upcall: mark it bad (Amar Tumballi, 2019-06-07; 1 file changed, +5/-0)

  Frequent intermittent failures observed:

  ```
  08:59:24 ok 11 [ 10/ 3] < 36> 'write_to /mnt/glusterfs/0/test.txt test-message1'
  08:59:24 ok 12 [ 10/ 6] < 37> 'test-message1 cat /mnt/glusterfs/0/test.txt'
  08:59:24 ok 13 [ 10/ 4] < 38> 'test-message0 cat /mnt/glusterfs/1/test.txt'
  08:59:24 not ok 14 [ 3715/ 6] < 45> 'test-message1 cat /mnt/glusterfs/1/test.txt' -> 'Got "test-message0" instead of "test-message1"'
  08:59:24 ok 15 [ 10/ 162] < 47> 'gluster --mode=script --wignore volume set patchy features.cache-invalidation on'
  08:59:24 ok 16 [ 10/ 148] < 48> 'gluster --mode=script --wignore volume set patchy performance.qr-cache-timeout 15'
  ```

  updates: bz#1718191
  Change-Id: Ieb9e5a9a428995ff178f77bc4a5155b8298d3fa0
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
* tests/volume-scale-shd-mux: mark as bad test (Amar Tumballi, 2019-06-07; 1 file changed, +3/-0)

  The test is giving frequent failures in regression. The error seen is
  normally like below:

    09:09:24 not ok 58 [ 14/ 80343] < 104> '^3$ number_healer_threads_shd patchy_distribute1 __afr_shd_healer_wait' -> 'Got "1" instead of "^3$"'

  updates: bz#1708929
  Change-Id: I240bdcfb76b1f953d75937a53c5dfabba134f282
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
* tests/shd: Add test coverage for shd mux (Mohammed Rafi KC, 2019-06-06; 4 files changed, +372/-0)

  This patch adds more test cases for shd mux. The test cases include:

  1) Creating multiple volumes to check the attach and detach of self
     heal daemon requests.
  2) Make sure the healing happens in all scenarios.
  3) After a volume detach, make sure the threads of the detached volume
     are all cleaned.
  4) Repeat all the above tests for EC volumes.
  5) Node reboot case.
  6) glusterd restart cases.
  7) Add-brick/remove-brick.
  8) Convert a distributed volume to a disperse volume.
  9) Convert a replicated volume to a distributed volume.

  Change-Id: I7c317ef9d23a45ffd831157e4890d7c83a8fce7b
  fixes: bz#1708929
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
* tests/geo-rep: Add geo-rep cli testcases (Kotresh HR, 2019-06-06; 2 files changed, +19/-0)

  Change-Id: Icf93b90bcac022a355d4718220698987dbc91ecf
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
  updates: bz#1693692
* cluster/dht: Fix directory perms during selfheal (N Balachandran, 2019-06-05; 1 file changed, +5/-3)

  Fixed a bug in the revalidate code path that wiped out directory
  permissions if no mds subvol was found.

  Change-Id: I8b4239ffee7001493c59d4032a2d3062586ea115
  fixes: bz#1716830
  Signed-off-by: N Balachandran <nbalacha@redhat.com>
* across: clang-scan: fix NULL dereferencing warnings (Amar Tumballi, 2019-06-04; 10 files changed, +39/-26)

  All these checks are done after analyzing the clang-scan report
  produced by the CI job at https://build.gluster.org/job/clang-scan

  updates: bz#1622665
  Change-Id: I590305af4ceb779be952974b2a36066ffc4865ca
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
* lcov: run more fops on translators (Amar Tumballi, 2019-06-04; 6 files changed, +159/-2)

  Translators covered:

  * playground/template
  * debug/delay-gen
  * debug/error-gen
  * features/namespace
  * features/quiesce
  * meta

  updates: bz#1693692
  Change-Id: Ic8fde8efcb309ea492d8e819241f786f7ff467a1
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
* features/shard: Fix block-count accounting upon truncate to lower size (Krutika Dhananjay, 2019-06-04; 5 files changed, +92/-13)

  The way delta_blocks is computed in shard is incorrect when a file is
  truncated to a lower size. The accounting only considers the change in
  size of the last of the truncated shards.

  FIX: Get the block count of each shard just before an unlink at posix
  in xdata. Their summation, plus the change in size of the last shard
  (from the actual truncate), is used to compute delta_blocks, which is
  used in the xattrop for the size update.

  Change-Id: I9128a192e9bf8c3c3a959e96b7400879d03d7c53
  fixes: bz#1705884
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
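  A hedged sketch of the accounting described in the FIX (plain C, not
  the actual shard xlator code); for a truncate to a lower size the
  result is negative:

  ```c
  #include <stddef.h>
  #include <stdint.h>
  #include <stdio.h>

  /* delta_blocks = change in the last (truncated) shard, minus the
   * blocks of every shard that gets fully unlinked. The per-shard block
   * counts are what posix would report in xdata just before unlink. */
  static int64_t
  compute_delta_blocks(const int64_t *unlinked_shard_blocks,
                       size_t n_unlinked,
                       int64_t last_shard_blocks_before,
                       int64_t last_shard_blocks_after)
  {
      int64_t delta = last_shard_blocks_after - last_shard_blocks_before;
      for (size_t i = 0; i < n_unlinked; i++)
          delta -= unlinked_shard_blocks[i];
      return delta;
  }

  int main(void)
  {
      /* two shards of 8 blocks each removed, last shard shrinks 8 -> 3 */
      const int64_t removed[] = { 8, 8 };
      printf("%lld\n",
             (long long)compute_delta_blocks(removed, 2, 8, 3)); /* -21 */
      return 0;
  }
  ```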
* tests/geo-rep: Add geo-rep glusterd test cases (Kotresh HR, 2019-06-04; 3 files changed, +223/-0)

  1. Add geo-rep fanout test case
  2. Add glusterd geo-rep negative test cases
  3. Add glusterd geo-rep config test cases

  Change-Id: I856c087eb3216d8f0ffd1f266deac88e9a4effec
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
  updates: bz#1693692
* tests/geo-rep: Remove a rename test case on EC volume (Kotresh HR, 2019-06-04; 2 files changed, +5/-5)

  The rename-with-existing-name test case is occasionally failing on EC
  volumes. Hence commenting it out until it is analysed.

  Change-Id: Icb2ad189b9e4d12101e8f5abcb8a033181360386
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
  updates: bz#1193929
* glusterd: coverity fix (Mohit Agrawal, 2019-06-04; 1 file changed, +11/-5)

  1401716: Resource leak
  1401714: Dereference before null check

  updates: bz#789278
  Change-Id: I8fb0b143a1d4b37ee6be7d880d9b5b84ba00bf36
  Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* glusterd/tier: gluster upgrade broken because of tier (hari gowtham, 2019-06-03; 1 file changed, +9/-0)

  Problem: While the tier code was removed, the is_tier_enabled option
  related to tier wasn't handled for upgrade. As this option was missing
  in the info file, a checksum mismatch happens during upgrade. This
  results in peer rejections.

  Fix: Use the op_version check and always note down is_tier_enabled.
  This way it will be a dummy key, but future upgrades will work fine.

  NOTE: Just having the key from 3.10 to 7 will cause issues when
  upgraded from 5 to 8 or any such upgrade which skips the version where
  we handle it.

  Change-Id: I9951e2b74f16e58e884e746c34dcf53e559c7143
  fixes: bz#1714973
  Signed-off-by: hari gowtham <hgowtham@redhat.com>
* lcov: improve line coverage (Amar Tumballi, 2019-06-03; 3 files changed, +55/-117)

  upcall: remove extra variable assignment and use just one
  initialization.

  open-behind: reduce the overall number of lines, in functions not
  frequently called.

  selinux: reduce some lines in init failure cases.

  updates: bz#1693692
  Change-Id: I7c1de94f2ec76a5bfe1f48a9632879b18e5fbb95
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
* across: coverity fixes (Amar Tumballi, 2019-06-03; 5 files changed, +11/-5)

  * locks/posix.c: key was not freed in one of the cases.
  * locks/common.c: lock was being free'd out of context.
  * nfs/exports: handle case of missing free.
  * protocol/client: handle case of entry not freed.
  * storage/posix: handle possible case of double free.

  CID: 1398628, 1400731, 1400732, 1400756, 1124796, 1325526

  updates: bz#789278
  Change-Id: Ieeaca890288bc4686355f6565f853dc8911344e8
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
  Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
* posix: add storage.reserve-size option (Sheetal Pamecha, 2019-06-03; 6 files changed, +138/-13)

  The storage.reserve-size option takes a size as input instead of a
  percentage. If set, storage.reserve-size is given priority over
  storage.reserve. The default value of this option is 0.

  fixes: bz#1651445
  Change-Id: I7a7342c68e436e8bf65bd39c567512ee04abbcea
  Signed-off-by: Sheetal Pamecha <sheetal.pamecha08@gmail.com>
* glusterd: remove trivial conditions (Sanju Rakonde, 2019-06-01; 1 file changed, +2/-4)

  updates: bz#1193929
  Change-Id: Ieb5e35d454498bc389972f9f15fe46b640f1b97d
  Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* glusterd: Optimize code to copy dictionary in handshake code path (Mohit Agrawal, 2019-05-31; 7 files changed, +187/-41)

  Problem: When a high number of volumes (around 2000) is configured,
  glusterd has a bottleneck during handshake at the time of copying the
  dictionary.

  Solution: To avoid the bottleneck, serialize the dictionary instead of
  copying it key-value pair by key-value pair.

  Change-Id: I9fb332f432e4f915bc3af8dcab38bed26bda2b9a
  fixes: bz#1711297
  Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* tests/geo-rep: Add tests to cover glusterd geo-rep (Kotresh HR, 2019-05-31; 1 file changed, +3/-0)

  Change-Id: Ide59a3fde11b23f654b1ec03d72b4ec53b36a03b
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
  updates: bz#1693692
* glusterd/shd: Optimize the glustershd manager to send reconfigure (Mohammed Rafi KC, 2019-05-31; 2 files changed, +5/-5)

  Traditionally, every svc manager executes a process stop followed by a
  start each time it is called. But that is not required for shd, because
  the attach request implemented in the shd multiplex has the
  intelligence to check whether a detach is required prior to attaching
  the graph. So there is no need to send an explicit detach request if we
  are sure that the next call is an attach request.

  Change-Id: I9157c8dcaffdac038f73286bcf5646a3f1d3d8ec
  fixes: bz#1710054
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
* glusterfsd/cleanup: Protect graph object under a lock (Mohammed Rafi KC, 2019-05-31; 3 files changed, +50/-28)

  While processing the cleanup_and_exit function, we are accessing a
  graph object that has not been protected by a lock, even though a
  parallel cleanup of the graph is quite possible, which might lead to an
  invalid memory access.

  Change-Id: Id05ca70d5b57e172b0401d07b6a1f5386c044e79
  fixes: bz#1708926
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
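  The general pattern, as a hedged sketch with a plain pthread mutex (the
  actual patch works on the glusterfs ctx/graph structures; the names
  here are illustrative):

  ```c
  #include <pthread.h>
  #include <stdlib.h>

  typedef struct graph graph_t; /* stand-in for the xlator graph type */

  static pthread_mutex_t graph_mutex = PTHREAD_MUTEX_INITIALIZER;
  static graph_t *active_graph;

  /* Both the cleanup path and any concurrent graph teardown must take
   * the same lock before touching the graph, so neither sees a
   * half-destroyed object. */
  void
  cleanup_and_exit_sketch(int signum)
  {
      pthread_mutex_lock(&graph_mutex);
      if (active_graph) {
          /* ... walk and tear down the graph ... */
          active_graph = NULL;
      }
      pthread_mutex_unlock(&graph_mutex);
      exit(signum);
  }
  ```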
* tests/geo-rep: Add EC volume test case (Shwetha K Acharya, 2019-05-31; 2 files changed, +447/-0)

  Added geo-rep regression tests with EC volume.

  fixes: bz#1650095
  Change-Id: Ifb6e68e0a6103a98fced7f84d3088b8edf33d52f
  Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
* lcov: more coverage to shard, old-protocol, sdfs (Amar Tumballi, 2019-05-31; 6 files changed, +58/-6)

  updates: bz#1693692
  Change-Id: If4c30572d4501d169bb4b0871c677d974515867c
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
* glusterd/svc: Stop stale process using glusterd_proc_stop (Mohammed Rafi KC, 2019-05-31; 1 file changed, +3/-3)

  While restarting a glusterd process, when we have a stale pid we were
  doing a simple kill. Instead, we can use glusterd_proc_stop, because it
  has more logging plus a force kill in case there is any problem with
  kill signal handling.

  Change-Id: I4a2dadc210a7a65762dd714e809899510622b7ec
  updates: bz#1710054
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
* glusterd/svc: glusterd_svcs_stop should call individual wrapper function (Mohammed Rafi KC, 2019-05-31; 2 files changed, +15/-7)

  glusterd_svcs_stop should call the individual wrapper function to stop
  a daemon rather than calling glusterd_svc_stop. For example, for shd it
  should call glusterd_shdsvc_stop instead of the basic API function,
  because the individual wrapper functions for each daemon could be doing
  some daemon-specific operation.

  Change-Id: Ie6d40590251ad470ef3901d1141ab7b22c3498f5
  fixes: bz#1712741
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
* glusterd: add an op-version check (Sanju Rakonde, 2019-05-31; 1 file changed, +5/-1)

  Problem: "gluster v status" hangs in a heterogeneous cluster when
  issued from a non-upgraded node.

  Cause: Commit 34e010d64 fixes the txn-opinfo memory leak in the op-sm
  framework by not setting the txn-opinfo if some conditions are true.
  When vol status is issued from a non-upgraded node, the command hangs
  on its upgraded peer, as the upgraded node sets the txn-opinfo based on
  the new conditions whereas non-upgraded nodes follow different
  conditions.

  Fix: Add an op-version check, so that all the nodes follow the same set
  of conditions to set txn-opinfo.

  fixes: bz#1710159
  Change-Id: Ie1f353212c5931ddd1b728d2e6949dfe6225c4ab
  Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
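  The gating pattern, as a hedged sketch (the constant name and value are
  hypothetical; glusterd compares against its GD_OP_VERSION_* macros and
  the cluster's running op-version):

  ```c
  #include <stdbool.h>
  #include <stdint.h>

  /* hypothetical threshold; the real code uses a GD_OP_VERSION_* macro */
  #define OP_VERSION_TXN_OPINFO_FIX 70000

  /* Only apply the new txn-opinfo conditions once the whole cluster
   * runs at an op-version that understands them; otherwise fall back to
   * the old behavior so upgraded and non-upgraded peers stay in
   * lockstep. */
  bool
  use_new_txn_opinfo_rules(uint32_t cluster_op_version)
  {
      return cluster_op_version >= OP_VERSION_TXN_OPINFO_FIX;
  }
  ```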
* scripts: Find hung frames given a directory with statedumps (Pranith Kumar K, 2019-05-30; 1 file changed, +53/-0)

  Given a directory with statedumps captured at different times, if there
  are any stacks that appear in multiple statedumps, it prints them.

  Sample output:

    glusterdump.25425.dump repeats=5 stack=0x7f53642cb968 pid=0 unique=0 lk-owner=
    glusterdump.25427.dump repeats=5 stack=0x7f85002cb968 pid=0 unique=0 lk-owner=
    glusterdump.25428.dump repeats=5 stack=0x7f962c2cb968 pid=0 unique=0 lk-owner=
    glusterdump.25428.dump repeats=2 stack=0x7f962c329f18 pid=60830 unique=0 lk-owner=88f50620967f0000
    glusterdump.25429.dump repeats=5 stack=0x7f20782cb968 pid=0 unique=0 lk-owner=
    glusterdump.25472.dump repeats=5 stack=0x7f27ac2cb968 pid=0 unique=0 lk-owner=
    glusterdump.25473.dump repeats=5 stack=0x7f4fbc2cb9d8 pid=0 unique=0 lk-owner=

  NOTE: stacks with lk-owner=""/lk-owner=0000000000000000/unique=0 may
  not be hung frames and need further inspection.

  fixes: bz#1714415
  Change-Id: Ib64a3fca63f49df2fafedcd4baa57e9b25411b08
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* stack: Make sure to have unique call-stacks in all cases (Pranith Kumar K, 2019-05-30; 5 files changed, +10/-14)

  At the moment, a new stack doesn't populate frame->root->unique in all
  cases. This makes it difficult to debug hung frames by examining
  successive state dumps. Fuse and server xlators populate it whenever
  they can, but other xlators won't be able to assign a 'unique' when
  they need to create a new frame/stack, because they don't know what
  'unique' the fuse/server xlators already used. What we need is for
  unique to be correct: if a stack with the same unique is present in
  successive statedumps, that means the same operation is still in
  progress. This makes the 'finding hung frames' part of debugging hung
  frames easier.

  fixes: bz#1714098
  Change-Id: I3e9a8f6b4111e260106c48a2ac3a41ef29361b9e
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
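  One straightforward way to guarantee process-wide unique ids is a
  monotonically increasing atomic counter consulted at stack creation; a
  hedged sketch (the function name is illustrative, not the actual
  stack.h code):

  ```c
  #include <inttypes.h>
  #include <stdatomic.h>
  #include <stdint.h>
  #include <stdio.h>

  static _Atomic uint64_t next_unique = 1;

  /* Every new call stack takes the next id; if the same id shows up in
   * successive statedumps, the same operation is still in flight, which
   * is exactly the "hung frame" signal the debugger needs. */
  static uint64_t
  stack_new_unique(void)
  {
      return atomic_fetch_add(&next_unique, 1);
  }

  int main(void)
  {
      uint64_t first = stack_new_unique();
      uint64_t second = stack_new_unique();
      printf("first=%" PRIu64 " second=%" PRIu64 "\n", first, second);
      return 0; /* first=1 second=2 */
  }
  ```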
* glusterd: coverity fix (Sanju Rakonde, 2019-05-30; 1 file changed, +0/-2)

  1401590: Deadcode

  updates: bz#789278
  Change-Id: I3aa1d3aa9769e6990f74b6a53e288e788173c5e0
  Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* marker: remove some unused functions (Amar Tumballi, 2019-05-30; 7 files changed, +8/-148)

  After basic analysis, we found that these methods were not being used
  at all.

  updates: bz#1693692
  Change-Id: If9cfa1ab189e6e7b56230c4e1d8e11f9694a9a65
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
* tests: add tests for different signal handling (Amar Tumballi, 2019-05-30; 7 files changed, +80/-15)

  Also some cleanup:

  * old-protocol.t was actually added to make sure we have line coverage.
  * first-test.t should have been removed as per the comment; it doesn't
    do anything.
  * add statvfs to rpc-coverage so we can cover statvfs in a few xlators.

  updates: bz#1693692
  Change-Id: Ie8651ce007de484c4abced16b4de765aa5e517be
  Signed-off-by: Amar Tumballi <amarts@redhat.com>