summaryrefslogtreecommitdiffstats
path: root/glustolibs-gluster
diff options
context:
space:
mode:
authorkshithijiyer <kshithij.ki@gmail.com>2020-09-03 22:19:11 +0530
committerArthy Loganathan <aloganat@redhat.com>2020-09-11 09:09:44 +0000
commitd8eae4bf67b789a95822512f27fdf9f4f349d7d5 (patch)
treeacdaf8bbf1812c5c8fe1a9236e28725944d29980 /glustolibs-gluster
parentc79453974f5602830f475fd5f133d5771ce81f23 (diff)
[Lib] Add memory and cpu leak testing framework
Summary: ======== Currently we don't have a memory and cpu leak testing framework which blocks automation development of all testcases around memory and CPU leaks. To solve this problem we need the libraries for: 1. Logging memory and CPU utilization 2. Checking for OOM killer on gluster processes 3. Checking for memory leaks 4. Checking for cpu usage spikes 5. Wrapper functions in base class to make development easy 6. Compute statistics of usage Detailed description: ===================== We have already added script to log CPU and memory usage through patch [1]. In this patch we would be using patch [1] and building logic to process the CSV files generated by the script which would help us to achieve the following: 1. Checking if there are memory leaks or CPU spikes 2. Computing statistics of memory and CPU usage Function sets added: ~~~~~~~~~~~~~~~~~~~~ Set 1 - Functions to perfrom logging using script ------------------------------------------------- Public functions: 1. check_upload_memory_and_cpu_logger_script() 2. log_memory_and_cpu_usage_on_servers() 3. log_memory_and_cpu_usage_on_clients() 4. log_memory_and_cpu_usage_on_cluster() 5. wait_for_logging_processes_to_stop() 6. kill_all_logging_processes() Private functions to support public functions: 1. _start_logging_processes() 2. _process_wait_flag_append() Set 2 - Functions to check for OOM killers ------------------------------------------ Public functions: 1. check_for_oom_killers_on_servers() 2. check_for_oom_killers_on_clients() Private functions to support public functions: 1. _check_for_oom_killers() Set 3 - Functions to check for memory leaks ------------------------------------------- Public functions: 1. check_for_memory_leaks_in_glusterd() 2. check_for_memory_leaks_in_glusterfs() 3. check_for_memory_leaks_in_glusterfsd() 4. check_for_memory_leaks_in_glusterfs_fuse() Private functions to support public functions: 1. _perform_three_point_check_for_memory_leak() Set 4 - Functions to check for cpu usage spikes ----------------------------------------------- Public functions: 1. check_for_cpu_usage_spikes_on_glusterd() 2. check_for_cpu_usage_spikes_on_glusterfs() 3. check_for_cpu_usage_spikes_on_glusterfsd() 4. check_for_cpu_usage_spikes_on_glusterfs_fuse() Private functions to support public functions: 1. _check_for_cpu_usage_spikes() Set 7 - Functions to calculate stats ------------------------------------ Public functions: 1. compute_data_usage_stats_on_servers() 2. compute_data_usage_stats_on_clients() Private functions to support public functions: 1. _get_min_max_mean_median() 2. _compute_min_max_mean_median() Set 6 - Wrapper functions added to base class --------------------------------------------- 1. start_memory_and_cpu_usage_logging() 2. compute_and_print_usage_stats() 3. check_for_memory_leaks_and_oom_kills_on_servers() 4. check_for_memory_leaks_and_oom_kills_on_clients() 5. check_for_cpu_usage_spikes_on_servers() 6. check_for_cpu_spikes_on_clients() Set 7 - Other generic functions ------------------------------- Public functions: 1. create_dataframe_from_csv() Third party libraries added to glusto-tests through patch: 1. Numpy(It is being used in file_dir_ops.py but it's installed on clients and not on management node.) 2. Pandas 3. Statistics How do I use it in my testcase? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ For example if we take the testcase to Check the server side memory leak with fuse mount, we would need to perfrom the below steps: 1. Create Disperse volumes 2. Fuse mount on the client 3. Start creating files in 1000's in parallel with 3. Create directories in 1000's in parallel 4. Do linux untar and wait for completion 5. Watch the memory usage on the server side and check for the OOM killers. Here steps 1-4 would be as usual, for step five what we need to do would be after step 2 we would need to start logging memory usage with the below code: ``` proc_dict = cls.start_memory_and_cpu_usage_logging() assertIsNotNone(proc_dict, <Error message>) ``` Once step 4 is complete we would need to wait for the logging process to stop with the below code: ``` ret = wait_for_logging_processes_to_stop(proc_dict, cluster=True) assertTrue(ret, <Error message>) ``` And lastly to check for memory leaks and OOM killers we would need use the below code: ``` ret = cls.check_for_memory_leaks_and_oom_kills_on_servers() assertTrue(ret, 'Memory leaks or OOM killer found') ``` NOTE: Interval and count of function start_memory_and_cpu_usage_logging() and gain of check_for_memory_leaks_and_oom_kills_on_servers() would need tweaking on a case by case scenario. Links: ====== [1] https://review.gluster.org/#/c/glusto-tests/+/24659/ Change-Id: Ib617fae102b8280723e54d0a38f77791558f5658 Signed-off-by: kshithijiyer <kshithij.ki@gmail.com>
Diffstat (limited to 'glustolibs-gluster')
-rwxr-xr-xglustolibs-gluster/glustolibs/gluster/gluster_base_class.py210
1 files changed, 210 insertions, 0 deletions
diff --git a/glustolibs-gluster/glustolibs/gluster/gluster_base_class.py b/glustolibs-gluster/glustolibs/gluster/gluster_base_class.py
index b43318fe4..baec1be8a 100755
--- a/glustolibs-gluster/glustolibs/gluster/gluster_base_class.py
+++ b/glustolibs-gluster/glustolibs/gluster/gluster_base_class.py
@@ -1105,3 +1105,213 @@ class GlusterBaseClass(TestCase):
raise ExecutionError("Force cleanup of nfs-ganesha "
"cluster failed")
g.log.info("Teardown nfs ganesha cluster succeeded")
+
+ @classmethod
+ def start_memory_and_cpu_usage_logging(cls, interval=60, count=100):
+ """Upload logger script and start logging usage on cluster
+
+ Kawrgs:
+ interval(int): Time interval after which logs are to be collected
+ (Default: 60)
+ count(int): Number of samples to be collected(Default: 100)
+
+ Returns:
+ proc_dict(dict):Dictionary of logging processes
+ """
+ # imports are added inside function to make it them
+ # optional and not cause breakage on installation
+ # which don't use the resource leak library
+ from glustolibs.io.memory_and_cpu_utils import (
+ check_upload_memory_and_cpu_logger_script,
+ log_memory_and_cpu_usage_on_cluster)
+
+ # Checking if script is present on servers or not if not then
+ # upload it to servers.
+ if not check_upload_memory_and_cpu_logger_script(cls.servers):
+ return None
+
+ # Checking if script is present on clients or not if not then
+ # upload it to clients.
+ if not check_upload_memory_and_cpu_logger_script(cls.clients):
+ return None
+
+ # Start logging on servers and clients
+ proc_dict = log_memory_and_cpu_usage_on_cluster(
+ cls.servers, cls.clients, cls.id(), interval, count)
+
+ return proc_dict
+
+ @classmethod
+ def compute_and_print_usage_stats(cls, proc_dict, kill_proc=False):
+ """Compute and print CPU and memory usage statistics
+
+ Args:
+ proc_dict(dict):Dictionary of logging processes
+
+ Kwargs:
+ kill_proc(bool): Kill logging process if true else wait
+ for process to complete execution
+ """
+ # imports are added inside function to make it them
+ # optional and not cause breakage on installation
+ # which don't use the resource leak library
+ from glustolibs.io.memory_and_cpu_utils import (
+ wait_for_logging_processes_to_stop, kill_all_logging_processes,
+ compute_data_usage_stats_on_servers,
+ compute_data_usage_stats_on_clients)
+
+ # Wait or kill running logging process
+ if kill_proc:
+ nodes = cls.servers + cls.clients
+ ret = kill_all_logging_processes(proc_dict, nodes, cluster=True)
+ if not ret:
+ g.log.error("Unable to stop logging processes.")
+ else:
+ ret = wait_for_logging_processes_to_stop(proc_dict, cluster=True)
+ if not ret:
+ g.log.error("Processes didn't complete still running.")
+
+ # Compute and print stats for servers
+ ret = compute_data_usage_stats_on_servers(cls.servers, cls.id())
+ g.log.info('*' * 50)
+ g.log.info(ret) # TODO: Make logged message more structured
+ g.log.info('*' * 50)
+
+ # Compute and print stats for clients
+ ret = compute_data_usage_stats_on_clients(cls.clients, cls.id())
+ g.log.info('*' * 50)
+ g.log.info(ret) # TODO: Make logged message more structured
+ g.log.info('*' * 50)
+
+ @classmethod
+ def check_for_memory_leaks_and_oom_kills_on_servers(cls, gain=30.0):
+ """Check for memory leaks and OOM kills on servers
+
+ Kwargs:
+ gain(float): Accepted amount of leak for a given testcase in MB
+ (Default:30)
+
+ Returns:
+ bool: True if memory leaks or OOM kills are observed else false
+ """
+ # imports are added inside function to make it them
+ # optional and not cause breakage on installation
+ # which don't use the resource leak library
+ from glustolibs.io.memory_and_cpu_utils import (
+ check_for_memory_leaks_in_glusterd,
+ check_for_memory_leaks_in_glusterfs,
+ check_for_memory_leaks_in_glusterfsd,
+ check_for_oom_killers_on_servers)
+
+ # Check for memory leaks on glusterd
+ if check_for_memory_leaks_in_glusterd(cls.servers, cls.id(), gain):
+ g.log.error("Memory leak on glusterd.")
+ return True
+
+ # Check for memory leaks on shd
+ if check_for_memory_leaks_in_glusterfs(cls.servers, cls.id(), gain):
+ g.log.error("Memory leak on shd.")
+ return True
+
+ # Check for memory leaks on brick processes
+ if check_for_memory_leaks_in_glusterfsd(cls.servers, cls.id(), gain):
+ g.log.error("Memory leak on brick process.")
+ return True
+
+ # Check OOM kills on servers for all gluster server processes
+ ret = check_for_oom_killers_on_servers(cls.servers)
+ if not ret:
+ g.log.error('OOM kills present on servers.')
+ return True
+ return False
+
+ @classmethod
+ def check_for_memory_leaks_and_oom_kills_on_clients(cls, gain=30):
+ """Check for memory leaks and OOM kills on clients
+
+ Kwargs:
+ gain(float): Accepted amount of leak for a given testcase in MB
+ (Default:30)
+
+ Returns:
+ bool: True if memory leaks or OOM kills are observed else false
+ """
+ # imports are added inside function to make it them
+ # optional and not cause breakage on installation
+ # which don't use the resource leak library
+ from glustolibs.io.memory_and_cpu_utils import (
+ check_for_memory_leaks_in_glusterfs_fuse,
+ check_for_oom_killers_on_clients)
+
+ # Check for memory leak on glusterfs fuse process
+ if check_for_memory_leaks_in_glusterfs_fuse(cls.clients, cls.id(),
+ gain):
+ g.log.error("Memory leaks observed on FUSE clients.")
+ return True
+
+ # Check for oom kills on clients
+ if check_for_oom_killers_on_clients(cls.clients):
+ g.log.error("OOM kills present on clients.")
+ return True
+ return False
+
+ @classmethod
+ def check_for_cpu_usage_spikes_on_servers(cls, threshold=3):
+ """Check for CPU usage spikes on servers
+
+ Kwargs:
+ threshold(int): Accepted amount of instances of 100% CPU usage
+ (Default:3)
+ Returns:
+ bool: True if CPU spikes are more than threshold else False
+ """
+ # imports are added inside function to make it them
+ # optional and not cause breakage on installation
+ # which don't use the resource leak library
+ from glustolibs.io.memory_and_cpu_utils import (
+ check_for_cpu_usage_spikes_on_glusterd,
+ check_for_cpu_usage_spikes_on_glusterfs,
+ check_for_cpu_usage_spikes_on_glusterfsd)
+
+ # Check for CPU usage spikes on glusterd
+ if check_for_cpu_usage_spikes_on_glusterd(cls.servers, cls.id(),
+ threshold):
+ g.log.error("CPU usage spikes observed more than threshold "
+ "on glusterd.")
+ return True
+
+ # Check for CPU usage spikes on shd
+ if check_for_cpu_usage_spikes_on_glusterfs(cls.servers, cls.id(),
+ threshold):
+ g.log.error("CPU usage spikes observed more than threshold "
+ "on shd.")
+ return True
+
+ # Check for CPU usage spikes on brick processes
+ if check_for_cpu_usage_spikes_on_glusterfsd(cls.servers, cls.id(),
+ threshold):
+ g.log.error("CPU usage spikes observed more than threshold "
+ "on shd.")
+ return True
+ return False
+
+ @classmethod
+ def check_for_cpu_spikes_on_clients(cls, threshold=3):
+ """Check for CPU usage spikes on clients
+
+ Kwargs:
+ threshold(int): Accepted amount of instances of 100% CPU usage
+ (Default:3)
+ Returns:
+ bool: True if CPU spikes are more than threshold else False
+ """
+ # imports are added inside function to make it them
+ # optional and not cause breakage on installation
+ # which don't use the resource leak library
+ from glustolibs.io.memory_and_cpu_utils import (
+ check_for_cpu_usage_spikes_on_glusterfs_fuse)
+
+ ret = check_for_cpu_usage_spikes_on_glusterfs_fuse(cls.clients,
+ cls.id(),
+ threshold)
+ return ret