Metric testing for scaling on Kubernetes.
This folder contains tools to aid in measuring the scaling capabilities of Kubernetes clusters.
The tools tend to take one of two forms:
- Tools to take measurements
- Tools to analyse results
For more details, see individual sub-folders. A brief summary of available tools is below:
| Tool | Description |
|---|---|
| lib | General library helper functions for forming and launching workloads, and storing results in a uniform manner to aid later analysis |
| scaling | Tests to measure scaling, such as linear or parallel launching of pods |
| report | Rmarkdown based report generator, used to produce a PDF comparison report of 1 or more sets of results |
Results storage and analysis
The tools generate JSON formatted results files via the lib/json.bash functions. The metrics_json_save()
function in that file has the ability to also curl or socat the JSON results to a database defined
by environment variables (see the file source for details). This method has been used to store results in
Elasticsearch and InfluxDB databases for instance, but should be adaptable to use with any REST API that accepts
JSON input.
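As a rough illustration of the idea of pushing a JSON results file to a REST endpoint, the sketch below wraps a plain curl POST. The names post_results and DRY_RUN are hypothetical and are not the functions or environment variables used by lib/json.bash; consult that file for the real ones. The URL in the usage note is likewise only a placeholder.

```bash
# Sketch only: post_results and DRY_RUN are hypothetical names, not part
# of lib/json.bash.

# post_results <json-file> <url>
# Sends the file as the body of an HTTP POST request.
# With DRY_RUN=1 the curl command is printed instead of executed.
post_results() {
    local file="$1"
    local url="$2"

    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of contacting the server.
        echo "curl --silent --show-error --fail -X POST -H 'Content-Type: application/json' --data-binary @$file $url"
    else
        curl --silent --show-error --fail -X POST \
            -H "Content-Type: application/json" \
            --data-binary "@$file" "$url"
    fi
}
```

A dry run such as `DRY_RUN=1 post_results results/k8s-scaling.json http://localhost:9200/metrics/_doc` only prints the command, which is a cheap way to check the target before sending any data.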
Scaling execution
This section describes a complete, step-by-step scaling execution, up to results reporting, using the scaling/k8s_scale.sh tool, which launches a series of workloads and takes memory metric measurements after each launch.
Requirements
- A Kubernetes cluster up and running (tested on v1.15.3).
- bc and jq packages.
- Docker (only for report generation).
The steps to execute a run of the scaling framework are listed below. They need to be executed on the master node of the Kubernetes cluster to avoid network issues:
- Clone the cloud-native-setup repository into a preferred directory and change directory to cloud-native-setup/metrics:

      $ git clone https://github.com/clearlinux/cloud-native-setup.git
      $ cd cloud-native-setup/metrics

- Launch the execution by:

      $ ./scaling/k8s_scale.sh
      INFO: Initialising
      command: bc: yes
      command: jq: yes
      INFO: Checking Kubernetes accessible
      INFO: 1 Kubernetes nodes in 'Ready' state found
      starting kubectl proxy
      Starting to serve on 127.0.0.1:8090
      daemonset.apps/stats created
      Waiting for daemon set "stats" rollout to finish: 0 of 1 updated pods are available...
      daemon set "stats" successfully rolled out
      INFO: Running test
      INFO: And grab some stats
      INFO: idle [98.49] free [29031100] launch [0] node [clr-30f01b5149ba4ab8b05a7ee03b6812a5] inodes_free [31103039]
      INFO: Testing replicas 1 of 20
      INFO: Content of runtime_command=:/@RUNTIMECLASS@/d
      ...

The above execution might take about 4 minutes, because by default it launches up to 20 pods and takes measurements of CPU utilization, memory utilization and pod boot time. When it finishes, it generates a k8s-scaling.json result file in the results directory.
Note: to test the launch of pods concurrently, k8s_parallel.sh may be used. For quicker testing, k8s_scale_rapid.sh can be used in place of k8s_scale.sh. The rest of the launch instructions remain consistent other than script name.
Note: by default the scaling framework calls the Kubernetes API directly, so if you face connectivity issues, verify that the kubelet service's proxy settings and the no_proxy environment variable are properly set up.
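One quick way to check the no_proxy side of this is to test whether the address the framework talks to is listed there. The helper below is a minimal sketch under stated assumptions: host_in_no_proxy is a hypothetical name, and it only does exact matching on comma-separated entries, whereas real HTTP clients may also honour domain suffixes or CIDR ranges.

```bash
# Sketch only: host_in_no_proxy is a hypothetical helper, not part of
# the scaling framework.

# host_in_no_proxy <host> [list]
# Succeeds if <host> is listed verbatim in the comma-separated list.
# The list defaults to $no_proxy, then $NO_PROXY.
host_in_no_proxy() {
    local host="$1"
    local list="${2:-${no_proxy:-${NO_PROXY:-}}}"

    # Drop spaces so "a, b" and "a,b" behave the same.
    list=$(printf '%s' "$list" | tr -d ' ')

    # Wrap both sides in commas so only whole entries can match.
    case ",${list}," in
        *",${host},"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For instance, since the sample log shows kubectl proxy serving on 127.0.0.1:8090, `host_in_no_proxy 127.0.0.1 || echo "127.0.0.1 is missing from no_proxy"` flags the most likely omission.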
Note: the scaling framework uses default values for all its required variables. These can be checked with scaling/k8s_scale.sh -h and overridden when launching the execution, e.g.:
$ ./scaling/k8s_scale.sh -h
Usage: ./scaling/k8s_scale.sh [-h] [options]
Description:
Launch a series of workloads and take memory metric measurements after
each launch.
Options:
-h, Help page.
Environment variables:
Name (default)
Description
TEST_NAME (k8s scaling)
Can be set to over-ride the default JSON results filename
NUM_PODS (20)
Number of pods to launch
STEP (1)
Number of pods to launch per cycle
wait_time (30)
Seconds to wait for pods to become ready
delete_wait_time (600)
Seconds to wait for all pods to be deleted
settle_time (5)
Seconds to wait after pods ready before taking measurements
use_api (yes)
specify yes or no to use the API to launch pods
grace (30)
specify the grace period in seconds for workload pod termination
$ use_api=no ./scaling/k8s_scale.sh
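The override shown above relies on the usual shell pattern of a variable falling back to a default when it is not set in the environment. The sketch below illustrates that pattern with a few of the documented variables and their documented defaults. The function effective_settings is hypothetical, and this is not necessarily how k8s_scale.sh itself is implemented.

```bash
# Sketch only: effective_settings is a hypothetical helper. It prints the
# value each variable would take, using the defaults from the help text.
effective_settings() {
    # ${VAR:-default} expands to the default when VAR is unset or empty.
    echo "NUM_PODS=${NUM_PODS:-20}"
    echo "STEP=${STEP:-1}"
    echo "wait_time=${wait_time:-30}"
    echo "use_api=${use_api:-yes}"
}
```

Prefixing the call with assignments, as in `NUM_PODS=50 STEP=5 effective_settings`, changes only the named values. The same prefix form can be used with the script and any of the variables listed in its help page.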
The steps to generate the result report are listed below:
- With the results/k8s-scaling.json result file in place, create a subdirectory in the results directory with a preferred name and copy the k8s-scaling.json file into it, so that the file layout looks like:

      $ tree results/
      results/
      └── scaling
          └── k8s-scaling.json

Note: if k8s_scale_rapid.sh was run instead of k8s_scale.sh, the <node_name>.tar.gz files that appear in the results directory also need to be copied into the newly created subdirectory. In that case the results file is named k8s-rapid.json rather than k8s-scaling.json.
If k8s_parallel.sh was run, the results file is named k8s-parallel.json rather than k8s-scaling.json.
- Launch the report generation by:

      $ ./report/makereport.sh

  Note: the first time you launch the report generation it will build a docker container to generate the reports and this process can take several minutes. Subsequent runs will be much faster.

  The above execution will generate a report/output directory with the final reports, such as:

      $ tree report/output/
      report/output/
      ├── dut-1.png
      ├── metrics_report.pdf
      ├── scaling-1.png
      ├── scaling-2.png
      ├── scaling-3.png
      └── scaling-4.png

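The staging step described earlier (copying the results file, plus any <node_name>.tar.gz archives, into a named subdirectory before generating the report) can be sketched as a small helper. This is only an illustration: stage_results is a hypothetical name, and the k8s-*.json glob is an assumption chosen to cover the k8s-scaling.json, k8s-rapid.json and k8s-parallel.json file names mentioned above.

```bash
# Sketch only: stage_results is a hypothetical helper, not part of the
# repository.

# stage_results <results-dir> <subdir-name>
# Copies k8s-*.json and *.tar.gz files from <results-dir> into
# <results-dir>/<subdir-name>. Fails if nothing was copied.
stage_results() {
    local results_dir="$1"
    local name="$2"
    local dest="${results_dir}/${name}"
    local f
    local copied=0

    mkdir -p "$dest" || return 1

    for f in "$results_dir"/k8s-*.json "$results_dir"/*.tar.gz; do
        # Unmatched globs stay literal, so skip anything that is not a file.
        [ -f "$f" ] || continue
        cp "$f" "$dest/" || return 1
        copied=$((copied + 1))
    done

    [ "$copied" -gt 0 ]
}
```

Running `stage_results results scaling` would reproduce the results/scaling layout shown in the steps above, leaving the original files in place.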
More details about result reporting can be found in the report directory.