path: root/glustolibs-gluster
Commit message | Author | Age | Files | Lines
* [LibFix] Check to ignore peer validation in case of single node server | Arthy Loganathan | 12 days | 1 | -0/+5
Change-Id: I5ad55be92b3acaa605e66de246ce7d40bcec6d5b
Signed-off-by: Arthy Loganathan <aloganat@redhat.com>
* [LibFix] Add dirname support to form_bricks_list() | Milind | 14 days | 1 | -5/+17
Adding arg dirname as a gluster brick directory.
Change-Id: I1bb69b4d719bad4cbac3a0e6a497fdae386c6004
Signed-off-by: Milind <mwaykole@redhat.com>
* [LibFix] Optimizing setup_volume api. | srijan-sivakumar | 2020-12-18 | 1 | -2/+2
Currently the setup_volume API calls get_volume_info to check whether the given volume already exists. Internally, get_volume_info has to parse the complete XML dump received for volume info. Invoking get_volume_list instead reduces the parsing effort, since the volume name is the only thing this check inside setup_volume needs.
Change-Id: I024d42fe471bf26ac85dd3108d6f123cd56a0766
Signed-off-by: srijan-sivakumar <ssivakum@redhat.com>
* [Test+Lib] Add test to check rebalance impact on acl | kshithijiyer | 2020-12-18 | 1 | -0/+76
Test case:
1. Create a volume, start it and mount it to a client.
2. Create 10 files on the mount point and set acls on the files.
3. Check the acl value and collect arequal-checksum.
4. Add bricks to the volume and start rebalance.
5. Check the value of acl (it should be the same as in step 3), then collect arequal-checksum and compare it with the one collected in step 3.

Additional functions added:
a. set_acl(): Set acl rule on a specific file
b. get_acl(): Get all acl rules set to a file
c. delete_acl(): Delete a specific or all acl rules set on a file

Change-Id: Ia420cbcc8daea272cd4a282ae27d24f13b4991fe
Signed-off-by: kshithijiyer <kshithij.ki@gmail.com>
* [LibFix] Adding retry for start_glusterd | srijan-sivakumar | 2020-11-27 | 1 | -4/+11
Issue:
Glusterd start fails after repeated starts and stops (due to the cap of a maximum of 6 starts of the service within an hour).

Fix:
Add a retry option, similar to that of restart_glusterd, so as to run `systemctl reset-failed glusterd` on the servers.

Change-Id: Ic0378934623dfa6dc5ab265246c746269f6995bc
Signed-off-by: srijan-sivakumar <ssivakum@redhat.com>
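The retry idea in this fix can be sketched roughly as follows. This is only an illustration, not the library's code: `start_with_retry`, the `run` callable, and `FakeRunner` are hypothetical names, and `run` is assumed to take a shell command string and return its exit code.

```python
def start_with_retry(run, service="glusterd"):
    """Start `service`; on failure, clear systemd's failed state and retry once.

    `run` is a hypothetical callable: it takes a shell command string and
    returns the command's exit code (0 on success).
    """
    start_cmd = "systemctl start %s" % service
    if run(start_cmd) == 0:
        return True
    # The start limit may have been hit: reset the failed state, then retry.
    if run("systemctl reset-failed %s" % service) != 0:
        return False
    return run(start_cmd) == 0


class FakeRunner(object):
    """Stand-in for a remote runner: only the very first command fails."""

    def __init__(self):
        self.calls = []

    def __call__(self, cmd):
        self.calls.append(cmd)
        return 1 if len(self.calls) == 1 else 0


runner = FakeRunner()
started = start_with_retry(runner)
```

With the fake runner, the first start fails, the failed state is reset, and the second start succeeds, so three commands are issued in total.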
* [Lib] Add get_usable_size_per_disk() to library | kshithijiyer | 2020-10-29 | 1 | -0/+22
Changes done in this patch:
1. Adding get_usable_size_per_disk() to lib_utils.py.
2. Removing the redundant code from dht/test_rename_with_brick_min_free_limit_crossed.py.

Change-Id: I80c1d6124b7f0ce562d8608565f7c46fd8612d0d
Signed-off-by: kshithijiyer <kshithij.ki@gmail.com>
* [Test] Add 2 memory leak tests and fix library issues | kshithijiyer | 2020-10-21 | 1 | -27/+48
Scenarios added:
----------------
Test case:
1. Create a volume, start it and mount it.
2. Start I/O from mount point.
3. Check if there are any memory leaks and OOM killers.

Test case:
1. Create a volume, start it and mount it.
2. Set features.cache-invalidation to ON.
3. Start I/O from mount point.
4. Run gluster volume heal command in a loop.
5. Check if there are any memory leaks and OOM killers on servers.

Design change:
--------------
- self.id() is moved into the test class as it was hitting bound errors in the original logic.
- Logic changed for checking fuse leaks.
- Fixed breakage in methods wherever needed.

Change-Id: Icb600d833d0c08636b6002abb489342ea1f946d7
Signed-off-by: kshithijiyer <kshithij.ki@gmail.com>
* [Lib] Add memory and cpu leak testing framework | kshithijiyer | 2020-09-11 | 1 | -0/+210
Summary:
========
Currently we don't have a memory and CPU leak testing framework, which blocks automation development of all testcases around memory and CPU leaks. To solve this problem we need libraries for:
1. Logging memory and CPU utilization
2. Checking for OOM killer on gluster processes
3. Checking for memory leaks
4. Checking for CPU usage spikes
5. Wrapper functions in base class to make development easy
6. Computing statistics of usage

Detailed description:
=====================
We have already added a script to log CPU and memory usage through patch [1]. This patch builds on patch [1] and adds logic to process the CSV files generated by the script, which helps us achieve the following:
1. Checking if there are memory leaks or CPU spikes
2. Computing statistics of memory and CPU usage

Function sets added:
~~~~~~~~~~~~~~~~~~~~
Set 1 - Functions to perform logging using script
-------------------------------------------------
Public functions:
1. check_upload_memory_and_cpu_logger_script()
2. log_memory_and_cpu_usage_on_servers()
3. log_memory_and_cpu_usage_on_clients()
4. log_memory_and_cpu_usage_on_cluster()
5. wait_for_logging_processes_to_stop()
6. kill_all_logging_processes()
Private functions to support public functions:
1. _start_logging_processes()
2. _process_wait_flag_append()

Set 2 - Functions to check for OOM killers
------------------------------------------
Public functions:
1. check_for_oom_killers_on_servers()
2. check_for_oom_killers_on_clients()
Private functions to support public functions:
1. _check_for_oom_killers()

Set 3 - Functions to check for memory leaks
-------------------------------------------
Public functions:
1. check_for_memory_leaks_in_glusterd()
2. check_for_memory_leaks_in_glusterfs()
3. check_for_memory_leaks_in_glusterfsd()
4. check_for_memory_leaks_in_glusterfs_fuse()
Private functions to support public functions:
1. _perform_three_point_check_for_memory_leak()

Set 4 - Functions to check for cpu usage spikes
-----------------------------------------------
Public functions:
1. check_for_cpu_usage_spikes_on_glusterd()
2. check_for_cpu_usage_spikes_on_glusterfs()
3. check_for_cpu_usage_spikes_on_glusterfsd()
4. check_for_cpu_usage_spikes_on_glusterfs_fuse()
Private functions to support public functions:
1. _check_for_cpu_usage_spikes()

Set 5 - Functions to calculate stats
------------------------------------
Public functions:
1. compute_data_usage_stats_on_servers()
2. compute_data_usage_stats_on_clients()
Private functions to support public functions:
1. _get_min_max_mean_median()
2. _compute_min_max_mean_median()

Set 6 - Wrapper functions added to base class
---------------------------------------------
1. start_memory_and_cpu_usage_logging()
2. compute_and_print_usage_stats()
3. check_for_memory_leaks_and_oom_kills_on_servers()
4. check_for_memory_leaks_and_oom_kills_on_clients()
5. check_for_cpu_usage_spikes_on_servers()
6. check_for_cpu_spikes_on_clients()

Set 7 - Other generic functions
-------------------------------
Public functions:
1. create_dataframe_from_csv()

Third party libraries added to glusto-tests through patch:
1. Numpy (it is being used in file_dir_ops.py, but it's installed on clients and not on the management node.)
2. Pandas
3. Statistics

How do I use it in my testcase?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For example, if we take the testcase that checks for server side memory leaks with a fuse mount, we would need to perform the below steps:
1. Create Disperse volumes
2. Fuse mount on the client
3. Start creating files in 1000's, in parallel with creating directories in 1000's
4. Do linux untar and wait for completion
5. Watch the memory usage on the server side and check for the OOM killers.

Steps 1-4 are as usual. For step 5, start logging memory usage right after step 2 with the below code:
```
proc_dict = cls.start_memory_and_cpu_usage_logging()
assertIsNotNone(proc_dict, <Error message>)
```
Once step 4 is complete, wait for the logging processes to stop with the below code:
```
ret = wait_for_logging_processes_to_stop(proc_dict, cluster=True)
assertTrue(ret, <Error message>)
```
And lastly, to check for memory leaks and OOM killers, use the below code:
```
ret = cls.check_for_memory_leaks_and_oom_kills_on_servers()
assertTrue(ret, 'Memory leaks or OOM killer found')
```
NOTE: The interval and count of start_memory_and_cpu_usage_logging() and the gain of check_for_memory_leaks_and_oom_kills_on_servers() need tweaking on a case by case basis.

Links:
======
[1] https://review.gluster.org/#/c/glusto-tests/+/24659/

Change-Id: Ib617fae102b8280723e54d0a38f77791558f5658
Signed-off-by: kshithijiyer <kshithij.ki@gmail.com>
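The statistics step of this framework (min, max, mean and median of logged usage) could look something like the sketch below. It assumes Python 3's standard `statistics` module and a plain list of samples; `min_max_mean_median` and the sample values are hypothetical and are not the private helpers named in the commit.

```python
import statistics


def min_max_mean_median(samples):
    """Summarise a sequence of usage samples (e.g. memory in MB)."""
    values = list(samples)
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


# Made-up memory readings for one process, in MB.
memory_mb = [100, 120, 110, 130]
stats = min_max_mean_median(memory_mb)
```

In the real framework the samples come from the CSV files written by the logging script; here they are hard-coded to keep the example self-contained.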
* [Libfix] Fix python3 break in c_uint32() | kshithijiyer | 2020-09-08 | 1 | -2/+8
Problem:
In python3, c_uint32() needs an explicit declaration of encoding. Because of this, the hash calculated by compute_hash.py was wrong, causing 8 failures in the latest CI runs on newer platforms like CentOS 8.

Solution:
Add logic to specify the encoding based on the version of python.

Change-Id: I8907d8d266ac20d29d730a5ed948cf4da30f01b8
Signed-off-by: kshithijiyer <kshithij.ki@gmail.com>
* [Libfix] Change logic to restart glusterd | kshithijiyer | 2020-08-31 | 1 | -3/+3
Problem:
The bring_bricks_online() method directly invokes `service glusterd restart` at line 212. However, due to the change in glusterd behaviour introduced by patches [1] and [2], if glusterd is restarted more than 6 times in an hour it goes into a failed state, which has to be reset using the `systemctl reset-failed` command. This causes random failures as shown below:
```
2020-08-25 18:00:51,600 INFO (run) root@rhs-vm24.blr.com (cp): service glusterd restart
2020-08-25 18:00:51,601 DEBUG (_get_ssh_connection) Retrieved connection from cache: root@rhs-vm24.blr.com
2020-08-25 18:00:51,830 INFO (_log_results) ^[[34;1mRETCODE (root@rhs-vm24.blr.com): 1^[[0m
2020-08-25 18:00:51,830 INFO (_log_results) ^[[31;1mSTDERR (root@rhs-vm24.blr.com)...
Redirecting to /bin/systemctl restart glusterd.service
Job for glusterd.service failed. See "systemctl status glusterd.service" and "journalctl -xe" for details.
^[[0m
2020-08-25 18:00:51,830 ERROR (bring_bricks_online) Unable to restart glusterd on node rhs-vm24.blr.com
```

Fix:
Change the code to use restart_glusterd() from gluster_init.

Links:
[1] https://review.gluster.org/#/c/glusterfs/+/23751/
[2] https://review.gluster.org/#/c/glusterfs/+/23970/

Change-Id: Ibe44463ac1d444f3d2155c9ae11680c9ffd8dab9
Signed-off-by: kshithijiyer <kshithij.ki@gmail.com>
* [Libfix] Add interval_check for monitor_heal_completion | Bala Konda Reddy M | 2020-08-21 | 1 | -3/+6
Adding a parameter `interval_check` lets the user reduce the waiting time for heal in some scenarios. By default, ls -l <brickpath> | grep -ve "xattrop-" | wc -l is checked every 2 minutes.

Problem:
Suppose 100 files need to be healed, and after 2 minutes only 2 or 3 files are left. With the existing approach the next check waits for the whole 2 minutes, even though those files would have healed within 10 seconds of the previous check.

Solution:
Give the user an option to choose the interval at which to check for files that still need to be healed, so that unnecessary waiting time is reduced. It won't affect the existing cases, as interval_check defaults to 120.

Change-Id: Ib288c75549644b6f6c94b5288f1c07cce7933915
Signed-off-by: Bala Konda Reddy M <bala12352@gmail.com>
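A configurable polling interval of this kind can be sketched as below. This is not the body of monitor_heal_completion: `wait_until`, the `check` callable, the injectable `sleep`, and the 1200-second default timeout are all assumptions made for illustration; only the `interval_check` default of 120 comes from the commit message.

```python
import time


def wait_until(check, timeout=1200, interval_check=120, sleep=time.sleep):
    """Poll `check()` every `interval_check` seconds.

    Returns True as soon as `check()` is true, or False once roughly
    `timeout` seconds have been spent sleeping.
    """
    waited = 0
    while True:
        if check():
            return True
        if waited >= timeout:
            return False
        sleep(interval_check)
        waited += interval_check


# Illustration with a fake sleep so the example finishes immediately.
pending = [100, 3, 0]   # hypothetical counts of entries still to be healed
slept = []
healed = wait_until(lambda: pending.pop(0) == 0,
                    timeout=60, interval_check=10, sleep=slept.append)
```

With a 10-second interval the loop above only waits twice before the pending count reaches zero; with the 120-second default the same heal would have cost two full 2-minute waits.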
* [LibFix] Fix code to avoid failure | sayaleeraut | 2020-08-19 | 1 | -1/+2
Description:
Earlier, a test script that called get_volume_type() produced the following failure message in the glusto logs, even though the brick-path was present in the volume:
"INFO (get_volume_type) Failed to find brick-path 10.70.47.44:/bricks/brick2/testvol_distributed_brick0// for volume testvol_distributed"

Checked by directly calling the function as well:
>>> from glusto.core import Glusto as g
>>> from glustolibs.gluster.volume_libs import get_volume_type
>>> ret = get_volume_type('10.70.47.44:/bricks/brick2/vol1-b1/')
>>> print ret
Distribute
>>> ret = get_volume_type('10.70.47.44:/bricks/brick2/vol1-b1//')
>>> print ret
None

The issue occurs if an extra "/" is present at the end of the brickdir_path(str) parameter passed to the function. Hence a check for this has been added.

Change-Id: I01fe2d05b7f206d7767c83e57e714053358dc42c
Signed-off-by: sayaleeraut <saraut@redhat.com>
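One way to guard against the extra trailing "/" is to normalise the brick path before comparing it. The sketch below shows the general idea only; `normalize_brick_path` is a hypothetical helper and is not the check that was actually added to get_volume_type().

```python
def normalize_brick_path(brickdir_path):
    """Strip trailing slashes from a 'host:/path' brick string."""
    host, _, path = brickdir_path.partition(":")
    # Keep a lone "/" if the path consisted only of slashes.
    return "%s:%s" % (host, path.rstrip("/") or "/")


single = normalize_brick_path("10.70.47.44:/bricks/brick2/vol1-b1/")
double = normalize_brick_path("10.70.47.44:/bricks/brick2/vol1-b1//")
```

After normalisation both spellings compare equal, so a lookup keyed on the brick path no longer depends on how many slashes the caller appended.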
* [LibFix] Fix wrong indentation in nfs_ganesha_ops | Pranav | 2020-08-18 | 1 | -10/+10
Cluster authentication between the nodes is done after setting a password for the user 'hacluster' on all the nodes.
Change-Id: Ic8b8838ef9490d2776172467c177d61cb615720f
Signed-off-by: Pranav <prprakas@redhat.com>
* [Libfix] Fix python3 getfattr() issues | kshithijiyer | 2020-08-17 | 1 | -3/+5
Problem:
Patch [1], which was sent for issue #24, causes a large number of testcases to fail or get stuck in the latest DHT run.

Solution:
Make changes so that the getfattr command sends back the output in text wherever needed.

Links:
[1] https://review.gluster.org/#/c/glusto-tests/+/24841/

Change-Id: I6390e38130b0699ceae652dee8c3b2db2ef3f379
Signed-off-by: kshithijiyer <kshithij.ki@gmail.com>
* [Libfix] Fix for Default version change to vers=4.1 for Ganesha | Manisha Saini | 2020-08-13 | 1 | -2/+2
Change-Id: I19e7e5e4338fce0d77e42dae716cc5eb5f814a17
Signed-off-by: Manisha Saini <msaini@redhat.com>
* [Libfix] Modify Client section of NFS-Ganesha to support RHEL8 | Manisha Saini | 2020-08-12 | 2 | -20/+25
Change-Id: I62ff8409a2170becdebdaf8274a1032e63db40ea
Signed-off-by: Manisha Saini <msaini@redhat.com>
* [LibFix] Add encoding type to get_fattr() | sayaleeraut | 2020-08-12 | 1 | -5/+7
Currently the get_fattr() function returns the xattr value in hex encoding. This patch adds an option to specify other encoding types, i.e. text and base64, with hex remaining the default.

Reason: When xattrs are custom set through the mountpoint, it is easier to test whether the value is correct by using the "text" encoding.

Example:
>>> ret = get_fattr(host, fqpath, fattr)
>>> print ret
0x414243
>>> ret = get_fattr(host, fqpath, fattr, encode="text")
>>> print ret
"ABC"

The value "ABC" is easily readable, as opposed to 0x414243, when performing a test.

Change-Id: Ie2377b924816ebab0a2af116d82600e01f03d61f
Signed-off-by: sayaleeraut <saraut@redhat.com>
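The relationship between the three encodings can be reproduced in plain Python. The sketch assumes the `0x` and `0s` prefixes that the getfattr manual documents for hex and base64 output; it does not call get_fattr() and uses only the standard library.

```python
import base64

raw = b"ABC"   # the xattr value as stored on disk

as_hex = "0x" + raw.hex()                                   # hex form
as_text = raw.decode("utf-8")                               # text form
as_base64 = "0s" + base64.b64encode(raw).decode("ascii")    # base64 form

# Going the other way: recover the text from the hex form.
round_trip = bytes.fromhex(as_hex[2:]).decode("utf-8")
```

This is why the hex value 0x414243 in the example above and the text value "ABC" describe the same xattr.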
* [Libfix] Fix UnicodeDecodeError in get_fattr() | kshithijiyer | 2020-08-11 | 1 | -3/+4
Problem:
Some testcases fail with UnicodeDecodeError when the framework is run using python3. This happens because the get_fattr() command returns non-unicode output, which causes the data.decode() used in subprocess.Popen to fail. This isn't the case in python2, as it doesn't bother about encoding and dumps whatever the output is back to the management node.
```
gfid = get_fattr(brick_tuple[0], brick_path + '/' + direc, 'trusted.gfid')
/root/glusto-tests/tests/functional/dht/test_dht_create_dir.py:127:
/usr/local/lib/python3.8/site-packages/glustolibs_gluster-0.22-py3.8.egg/glustolibs/gluster/glusterfile.py:113: in get_fattr
    rcode, rout, rerr = g.run(host, command)
/usr/local/lib/python3.8/site-packages/glusto-0.72-py3.8.egg/glusto/connectible.py:132: in run
    stdout, stderr = proc.communicate()
/usr/lib64/python3.8/subprocess.py:1024: in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
/usr/lib64/python3.8/subprocess.py:1904: in _communicate
    stdout = self._translate_newlines(stdout,
self = <subprocess.Popen object at 0x7f22b4e2f490>, data = b'\xber\t\nO\xebO\xee\xa4\x9c\xc4L\xac\x1cj\xd5', encoding = 'UTF-8', errors = 'strict'
    def _translate_newlines(self, data, encoding, errors):
        data = data.decode(encoding, errors)
E   UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbe in position 0: invalid start byte
/usr/lib64/python3.8/subprocess.py:901: UnicodeDecodeError
```

Solution:
Change the get_fattr() command to return the xattr value in hex, to avoid the UnicodeDecodeError from Popen.

Fixes: https://github.com/gluster/glusto-tests/issues/24
Change-Id: I8c4786c882adf6079404b97eca2c399535db068f
Signed-off-by: kshithijiyer <kshithij.ki@gmail.com>
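The failure and the reason hex output avoids it can be reproduced with the byte string from the traceback. This is a standalone illustration using only built-in bytes and str methods; it does not involve Popen or get_fattr().

```python
# The 16 raw gfid bytes quoted in the traceback above.
raw_gfid = b'\xber\t\nO\xebO\xee\xa4\x9c\xc4L\xac\x1cj\xd5'

try:
    raw_gfid.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    # 0xbe cannot start a UTF-8 sequence, so strict decoding fails.
    decode_failed = True

# A hex rendering is pure ASCII, so decoding it as UTF-8 always works.
hex_value = "0x" + raw_gfid.hex()
decoded_hex = hex_value.encode("ascii").decode("utf-8")
```

Arbitrary binary xattr values such as a gfid are not valid UTF-8 in general, whereas their hex form only ever contains the characters 0-9, a-f and x.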
* [Testfix] Add logic to log more info | kshithijiyer | 2020-08-06 | 1 | -3/+10
Adding code to get the dir tree and dump all xattrs in hex before remove-brick for Bug 1810901. Also adding logic to set log-level to debug.
Change-Id: I9c9c970c4de7d313832f6f189cdca8428a073b1e
Signed-off-by: kshithijiyer <kshithij.ki@gmail.com>
* [Libfix] Move NFS Ganesha support to GlusterBaseClass | Pranav | 2020-07-30 | 2 | -162/+143
Problem:
NFS-Ganesha tests inherit 'NfsGaneshaClusterSetupClass' whereas the other tests inherit 'GlusterBaseClass'. This causes a cyclic dependency when trying to run other modules with Nfs-Ganesha.

Fix:
1. Move the Nfs-Ganesha dependencies to GlusterBaseClass
2. Modify the Nfs-Ganesha tests to inherit from GlusterBaseClass
3. Remove the setup_nfs_ganesha method call from existing Ganesha tests, as it's invoked by default from GlusterBaseClass.SetUpClass

Change-Id: I1e382fdb2b29585c097dfd0fea0b45edafb6442b
Signed-off-by: Pranav <prprakas@redhat.com>
* [Libfix] Remove get_gluster_version() checks | kshithijiyer | 2020-07-27 | 4 | -66/+33
Problem:
When run against nightly gluster builds, dht cases fail with the below traceback:
```
    def get_gluster_version(host):
        """Checks the gluster version on the nodes

        Args:
            host(str): IP of the host whose gluster version has to be checked.

        Returns:
            (float): The gluster version value.
        """
        command = 'gluster --version'
        _, out, _ = g.run(host, command)
        g.log.info("The Gluster verion of the cluster under test is %s", out)
>       return float(out.split(' ')[1])
E       ValueError: could not convert string to float: '20200719.9334a8d\nRepository
```
This is due to nightly builds returning the below output:
```
$ gluster --version
glusterfs 20200708.cdf01cc
Repository revision: git://git.gluster.org/glusterfs.git
Copyright (c) 2006-2016 Red Hat, Inc. <https://www.gluster.org/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser General Public License, version 3 or any later version (LGPLv3 or later), or the GNU General Public License, version 2 (GPLv2), in all cases as published by the Free Software Foundation.
```
Instead of:
```
$ gluster --version
glusterfs 6.0
```
This happens because the nightly build uses `VERSION="${GIT_DATE}.${GIT_HASH}"`, which makes the version API return the new output.

Solution:
Remove the checks and modify the function to return a string instead of a float.

Fixes: https://github.com/gluster/glusto-tests/issues/22
Change-Id: I2e889bd0354a1aa75de25aedf8b14eb5ff5ecbe6
Signed-off-by: kshithijiyer <kshithij.ki@gmail.com>
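Returning the version as a string sidesteps the float conversion entirely. A minimal sketch, assuming the version token is the second word of the first output line as in both samples above; `parse_gluster_version` is a hypothetical name and works on captured output rather than running `gluster --version`.

```python
def parse_gluster_version(output):
    """Return the version token from `gluster --version` output as a string."""
    first_line = output.splitlines()[0]
    return first_line.split(" ")[1]


release = parse_gluster_version("glusterfs 6.0\n")
nightly = parse_gluster_version(
    "glusterfs 20200708.cdf01cc\n"
    "Repository revision: git://git.gluster.org/glusterfs.git\n"
)

# The nightly token is not a number, which is what broke the float() call.
try:
    float(nightly)
    nightly_is_float = True
except ValueError:
    nightly_is_float = False
```

Callers that previously compared the float against a threshold need a different strategy for nightly builds, which is presumably why the commit removes those checks instead of adapting them.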
* [Libfix] Change warn to info | kshithijiyer | 2020-07-23 | 1 | -2/+2
Fixes: https://github.com/gluster/glusto-tests/issues/21
Change-Id: I08115a2c11d657cdcb0ab0cc4fe9be697c947a8f
Signed-off-by: kshithijiyer <kshithij.ki@gmail.com>
* [Lib] Library to setup ctdb for samba. | vdas-redhat | 2020-07-20 | 1 | -0/+142
CTDB works as an HA for samba. Configuring ctdb is mandatory for samba.
Change-Id: I5ee28afb86dbc5853e5d54ad2b4460d37c8bfcef
Signed-off-by: vdas-redhat <vdas@redhat.com>
* [Libfix] Fix rpyc dependency for NFS-Ganesha libs | Pranav | 2020-07-10 | 2 | -45/+51
Problem:
The rpyc connection fails in envs where the python versions are different, resulting in test failures.

Fix:
Replace rpyc with the standard ssh approach to overcome this issue.

Change-Id: Iee4bb968b8b94a6ab3e0fe0d16babacad914a92d
Signed-off-by: Pranav <prprakas@redhat.com>
* [lib] CTDB library operations. | vdas-redhat | 2020-07-08 | 1 | -0/+478
Change-Id: Ia323aa80efdf5331d58c57be1f087b012fc94e1a
Signed-off-by: vdas-redhat <vdas@redhat.com>
* [Libfix] Remove tier libraries from glusto-tests | Bala Konda Reddy M | 2020-07-08 | 5 | -2138/+161
Tier libraries are not used across test cases, and because of the tier checks in brick_libs.py and volume_libs.py, the performance of regular test cases (time taken for execution) is degraded. Another reason to remove the Tier libraries from glusto-tests is that the functionality is deprecated.
Change-Id: Ie56955800515b2ff5bb3b55debaad0fd88b5ab5e
Signed-off-by: Bala Konda Reddy M <bala12352@gmail.com>
* [Libfix] Skip gluster_shared_storage deletion | kshithijiyer | 2020-07-02 | 1 | -9/+24
Problem:
In the present logic gluster_shared_storage gets deleted in the force cleanup, which causes nfs-ganesha testcases to fail.

Fix:
Add logic to check if shared_storage is enabled; if it is, skip:
1. Peer cleanup and peer probe
2. Deleting gluster_shared_storage vol files

Change-Id: I5219491e081bd36dd40342262eaba540ccf00f51
Signed-off-by: kshithijiyer <kshithij.ki@gmail.com>
* [Libfix] Fix is_linkto_file() compatible with newer platforms | Pranav | 2020-06-30 | 1 | -5/+2
Problem:
The 'file <file_path>' command output differs on newer platforms. An additional ',' is present in the latest builds of the packages, causing tests which use this method to fail on newer platforms.

Fix:
Modify the method to handle the latest package output as well.

Change-Id: I3e59a69b09b960e3a38131a3e76d664b34799ab1
Signed-off-by: Pranav <prprakas@redhat.com>
* [LibFix] Monitor heal only on specific bricks | Leela Venkaiah G | 2020-06-24 | 1 | -3/+12
- Add an optional argument (bricks) to monitor_heal_completion
- If provided, heal will be monitored on this set of bricks
- Useful when dealing with EC volumes

Change-Id: I1c3b137e98966e21c52e0e212efc493aca9c5da0
Signed-off-by: Leela Venkaiah G <lgangava@redhat.com>
* [Test] Start/Stop of snapd daemon on the cloned volume | srivickynesh | 2020-06-24 | 1 | -1/+27
Test cases in this module test the USS functionality of snapshots (snapd) on a cloned volume. They validate that snapshots are present inside the .snaps directory by terminating snapd on the nodes one by one and checking that the .snaps directory is still accessible.
Change-Id: I98d48268e7c5c5952a7f0f544960203d8634b7ac
Signed-off-by: Sri Vignesh <sselvan@redhat.com>
* [Lib] Add parse_vol_file method | Pranav | 2020-06-24 | 1 | -0/+59
This method parses the given .vol file and returns the content as a dictionary.
Change-Id: I6d57366ddf4d4c0249fff6faaca2ed005cd89e7d
Signed-off-by: Pranav <prprakas@redhat.com>
* [Libfix] Add retry logic to restart_glusterd() | kshithijiyer | 2020-06-17 | 2 | -19/+14
Problem:
Patches [1] and [2] sent to glusterfs change glusterd.service.in so that glusterd cannot be restarted more than 6 times within an hour. Because of this, glusterd restarts present in testcases may fail, as there is no way to figure out when the 6 restart limit is reached.

Fix:
Add code to check if the glusterd restart has failed; if so, call reset_failed_glusterd() and redo the restart.

Links:
[1] https://review.gluster.org/#/c/glusterfs/+/23751/
[2] https://review.gluster.org/#/c/glusterfs/+/23970/

Change-Id: I041a019f9a8757d8fead00302e6bbcd6563dc74e
Signed-off-by: kshithijiyer <kshithij.ki@gmail.com>
* [Lib] Add library for reset-failed | kshithijiyer | 2020-06-16 | 2 | -4/+39
Adding library function reset_failed_glusterd() and modifying scratch_cleanup() to use reset_failed_glusterd(). This is needed because patches [1] and [2] sent to glusterfs change glusterd.service.in so that glusterd cannot be restarted more than 6 times within an hour.

Links:
[1] https://review.gluster.org/#/c/glusterfs/+/23751/
[2] https://review.gluster.org/#/c/glusterfs/+/23970/

Change-Id: I25f982517420f20f11a610e8a68afc71f3b7f2a9
Signed-off-by: kshithijiyer <kshithij.ki@gmail.com>
* [Libfix] Fix find_specific_hashed method | Pranav | 2020-06-10 | 1 | -2/+6
Problem:
There are scenarios where multiple files are to be renamed so that they hash to a particular subvol. The existing method returns the same name every time, as the loop always starts from 1.

Fix:
Add an optional argument, existing_names, which contains names already hashed to the subvol. An additional check ensures the name found is not already used.

Change-Id: I453ee290c8462322194cebb42c40e8fbc7c373ed
Signed-off-by: Pranav <prprakas@redhat.com>
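The effect of the existing_names argument can be shown with a toy search loop. Everything here is hypothetical: `toy_subvol_for` uses CRC32 purely as a stand-in and is not the gluster DHT hash, and `find_name_for_subvol` is not the library method, only the same search pattern.

```python
import zlib


def toy_subvol_for(name, subvol_count):
    """Stand-in placement function. NOT the real gluster DHT hash."""
    return zlib.crc32(name.encode("utf-8")) % subvol_count


def find_name_for_subvol(target, subvol_count, existing_names=None, limit=1000):
    """Return the first candidate name that maps to `target` and is unused."""
    existing = set(existing_names or [])
    for i in range(1, limit + 1):
        candidate = str(i)
        if candidate in existing:
            continue   # already taken by an earlier rename
        if toy_subvol_for(candidate, subvol_count) == target:
            return candidate
    return None


first = find_name_for_subvol(0, 2)
# Without existing_names the same search would return `first` again.
second = find_name_for_subvol(0, 2, existing_names=[first])
```

Passing the names found so far is what lets repeated calls produce distinct names that all land on the same subvol.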
* [Lib] Add method set_rebalance_throttle() | sayaleeraut | 2020-06-09 | 1 | -0/+30
The method takes mnode, volname and throttle-type as parameters. It sets the rebal-throttle for the volume as per the mentioned throttle-type.
Change-Id: I9eb14e39f87158c9ae7581636c2cad1333fd573c
Signed-off-by: sayaleeraut <saraut@redhat.com>
* [Libfix] Change sequence of option set & start op | Pranav | 2020-06-03 | 1 | -8/+24
As SSL cannot be set after volume start op, moving set_volume_option prior to volume start.
Change-Id: I14e1dc42deb0c0c28736f03e07cf25f3adb48349
Signed-off-by: Pranav <prprakas@redhat.com>
* [Libfix] Fix get_bricks_to_bring_offline_from_replicated_volume | Pranav | 2020-06-03 | 1 | -2/+2
Finding the offline brick limit using ceil returns an incorrect value. E.g., for replica count 3, ceil(3/2) returns 2, and the subsequent method uses this value to bring down 2 out of 3 available bricks, resulting in IO and many other failures.

Fix:
Change ceil to floor. Also change the '/' operator to '//' for py2/3 compatibility.

Change-Id: I3ee10647bb037a3efe95d1b04e0864cf61e2499e
Signed-off-by: Pranav <prprakas@redhat.com>
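The arithmetic behind this fix is easy to check in isolation. The sketch below only demonstrates the ceil versus floor difference; `max_bricks_offline` is a hypothetical name, and the float literal 2.0 is used so the ceil result is the same under Python 2 and Python 3.

```python
import math


def max_bricks_offline(replica_count):
    """Floor of half the replica count, via integer division."""
    return replica_count // 2


ceil_limit = int(math.ceil(3 / 2.0))   # rounds 1.5 up
floor_limit = max_bricks_offline(3)    # rounds 1.5 down
```

For replica count 3, rounding up allows two bricks to go offline, leaving a single brick, whereas rounding down keeps two of the three bricks online.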
* [Lib] Add find_specific_hashed method | Pranav | 2020-06-02 | 1 | -0/+34
This method helps in rename scenarios where the new filename has to be hashed to a specific subvol.
Change-Id: Ia36ea8e3d279ddf130f3a8a940dbe1fcb1910974
Signed-off-by: Pranav <prprakas@redhat.com>
* [Libfix] Fetch all entries under a directory in recursive fashion | nchilaka | 2020-05-22 | 1 | -5/+12
This method fetches all entries under a directory in recursive fashion.
Change-Id: I4fc066ccf7a3a4730d568f96d926e46dea7b20a1
Signed-off-by: nchilaka <nchilaka@redhat.com>
* [Libfix] Add parameter for volume create only | Bala Konda Reddy M | 2020-05-18 | 2 | -7/+30
Problem:
Currently setup_volume in volume_libs.py and gluster_base_class.py creates a volume and starts it. There are tests where only volume create is required and the test has to run on all volume types. A contributor then has to repeat in the test all the validations that are already implemented in setup_volume and in the setup_volume classmethod of gluster_base_class.

Solution:
Added a parameter "create_only" to the setup_volume() function. It defaults to False, and unless it is specified setup_volume works as before. Similarly, added a parameter "only_volume_create" to the setup_volume classmethod in gluster_base_class.py, which also defaults to False unless specified.

Note: Calling "setup_volume() -> volume_stop" is not the same as just "volume_create()" in the actual test.

Change-Id: I76cde1b668b3afcac41dd882c2a376cb6fac88a3
Signed-off-by: Bala Konda Reddy M <bala12352@gmail.com>
* [Libfix] Assign correct atime, ctime, mtime values | nchilaka | 2020-05-15 | 1 | -2/+2
Changed the get_file_stat function to assign the correct key-value pairs for atime, mtime and ctime respectively. Previously, all timestamp keys were assigned the atime value.
Change-Id: I471ec341d1a213395a89f6c01315f3d0f2e976af
Signed-off-by: nchilaka <nchilaka@redhat.com>
* [Libfix] Fixing the pkill command | Bala Konda Reddy M | 2020-05-15 | 1 | -1/+1
Problem:
The command 'pkill pidof glusterd' is not right, as it does not get the pid of glusterd. E.g.:
cmd = "pkill pidof glusterd"
ret, out ,err = g.run("10.20.30.40", cmd, "root")
>>> ret, out, err
(2, '', "pkill: only one pattern can be provided\n Try `pkill --help' for more information.\n")
Here the command fails.

Solution:
Added `pidof glusterd` in backticks, which gets the proper glusterd pid so that the stale pid is killed after glusterd stop has failed.
cmd = "pkill `pidof glusterd`"
ret, out ,err = g.run("10.20.30.40", cmd, "root")
>>> ret, out, err
(1, '', '')

Note: The ret value is 1, as it was tried on a machine where glusterd is running. The purpose of the fix is to get the proper pid.

Change-Id: Iacba3712852b9d16546ced9a4c071c62182fe385
Signed-off-by: Bala Konda Reddy M <bala12352@gmail.com>
* [Libfix] Kill stale bricks in scratch_cleanup | Bala Konda Reddy M | 2020-05-15 | 1 | -0/+9
Problem:
While performing scratch cleanup, posix health checker warnings were observed once glusterd was started, as shown below:
[2020-05-05 12:19:10.633623] M [MSGID: 113075] [posix-helpers.c:2194:posix_health_check_thread_proc] 0-testvol_distributed-dispersed-posix: health-check failed, going down

Solution:
In scratch cleanup, once glusterd is stopped and the runtime socket file for the glusterd daemon is removed, stale glusterfsd processes are still present on a few of the machines. Added a step to get the glusterfsd processes, if any, and kill the stale ones using the kill_process method before continuing with the existing procedure. Once glusterd is started, the posix health checker warnings are no longer seen.

Change-Id: Ib3e9492ec029b5c9efd1c07b4badc779375a66d6
Signed-off-by: Bala Konda Reddy M <bala12352@gmail.com>
* [Libfix] Added atime, ctime and mtime for files | Bala Konda Reddy M | 2020-05-14 | 1 | -2/+10
The get_file_stat function didn't return the access time, modified time and change time of a file or directory. Added the respective parameters for getting these values into the dictionary.

Changed the separator from ':' to '$' to avoid a tuple unpacking error with timestamps such as:
2020-04-02 19:27:45.962477021
If ":" is used as the separator, we hit a "ValueError: too many values to unpack" error. '$' is used as the separator because it is not used in filenames in glusto-tests and is not part of the stat output.

Change-Id: I40b0c1fd08a5175d3730c1cf8478d5ad8df6e8dd
Signed-off-by: Bala Konda Reddy M <bala12352@gmail.com>
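The unpacking failure is straightforward to reproduce with the timestamp quoted above. The `key<sep>value` line layout below is an assumption for illustration; the actual format string used by get_file_stat is not shown in the commit message.

```python
timestamp = "2020-04-02 19:27:45.962477021"

# With ':' as separator the timestamp's own colons create extra fields.
colon_line = "atime:" + timestamp
try:
    key, value = colon_line.split(":")
    colon_ok = True
except ValueError:
    colon_ok = False

# With '$' as separator there is exactly one split point.
dollar_line = "atime$" + timestamp
key, value = dollar_line.split("$")
```

Splitting the colon-separated line yields four fields instead of two, which is the "too many values to unpack" error mentioned in the commit.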
* [Libfix] Remove ssl_ops.py library | kshithijiyer | 2020-05-12 | 1 | -226/+0
Problem:
Ideally, the operations done in ssl_ops.py should be performed on a gluster cluster even before peer probing the nodes. This makes the library useless, as we can't run any library in glusto-tests without peer probing.

Solution:
Enable SSL on the gluster cluster before passing it to the tests present in glusto-tests.

Change-Id: If803179c67d5b3271b70c1578269350444aa3cf6
Signed-off-by: kshithijiyer <kshithij.ki@gmail.com>
* [Libfix] Fix georep_config_get method | Pranav | 2020-05-12 | 1 | -1/+1
The command creation with a specific user had six substitutions, but had only 5 placeholders.
Change-Id: I2c9f63213f78e5cec9e5bd30cac8d75eb8dbd6ce
Signed-off-by: Pranav <prprakas@redhat.com>
* [Lib] Add create_link_file() to glusterfile.py | kshithijiyer | 2020-05-04 | 1 | -0/+39
Adding function create_link_file() to create soft and hard links for an existing file.
Change-Id: I6be313ded1a640beb450425fbd29374df51fbfa3
Signed-off-by: kshithijiyer <kshithij.ki@gmail.com>
* [Test] functional/disperse: Verify remove brick operation | Sri Vignesh | 2020-04-29 | 1 | -1/+44
This test verifies remove brick operations on disperse volume.
Change-Id: If4be3ffc39a8b58e4296d58b288e3843a218c468
Co-authored-by: Sunil Kumar Acharya <sheggodu@redhat.com>
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
Signed-off-by: Sri Vignesh <sselvan@redhat.com>
* [Test+Lib] No fresh lookups on directory | Bala Konda Reddy M | 2020-04-28 | 1 | -0/+33
Test Steps:
1. Create a volume, set the volume option 'diagnostics.client-log-level' to DEBUG, and mount the volume on one client.
2. Create a directory.
3. Validate the number of lookups for the directory creation from the log file.
4. Perform a new lookup of the directory.
5. No new lookups should have happened on the directory; validate from the log file.
6. Bring down one subvol of the volume and repeat steps 4 and 5.
7. Bring down one brick from the online bricks and repeat steps 4 and 5.
8. Start the volume with force and wait for all processes to be online.

Change-Id: I162766837fd7e61625238a669c4050c2ec9c8a8b
Signed-off-by: Bala Konda Reddy M <bmekala@redhat.com>
* [Libfix] Fix check_brick_pid_matches_glusterfsd_pid() to use pgrep | kshithijiyer | 2020-04-28 | 1 | -2/+2
Problem:
On the latest platforms the pidof command returns multiple pids, as shown below:
27190 27078 26854
This is because it returns the glusterd and glusterfs processes as well as glusterfsd. The cause is that /usr/sbin/glusterd is a link to glusterfsd. 'pidof' has a new feature whereby it searches for the pattern in /proc/PID/cmdline, /proc/PID/stat and finally /proc/PID/exe. Hence pidof matches the realpath of /proc/<pid_of_glusterd>/exe as /usr/sbin/glusterfsd, and the glusterd, glusterfs and glusterfsd pids are all returned in the output.

Fix:
Use pgrep instead of pidof to get glusterfsd pids, and change the split logic accordingly.

Change-Id: I729e05c3f4cacf7bf826592da965a94a49bb6f33
Signed-off-by: kshithijiyer <kshithij.ki@gmail.com>