mirror of https://github.com/clearlinux/cloud-native-setup.git synced 2026-04-28 11:03:40 +00:00

Antti Kervinen 696861ce66 metrics: change collectd output to host /opt/collectd/run

Currently we lose collectd data from a node when scaling ends in a
system failure on the node, yet this data can be very helpful in finding
the root cause of the failure. This patch changes the collectd
configuration so that the output is continuously written to the host
filesystem instead of the collectd container overlay, which is lost
unless scaling reaches a graceful exit.

Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>

2020-05-19 19:56:54 +01:00

README.md

- Metric testing for scaling on Kubernetes.
  - Results storage and analysis
- Developers
  - Metrics gathering
    - collectd statistics
    - privileged statistics pods
  - Configuring constant 'loads'

Metric testing for scaling on Kubernetes.

This folder contains tools to aid in measuring the scaling capabilities of Kubernetes clusters.

These tools were designed primarily to measure the scaling of a large number of pods on a single node, but the code is structured to handle multiple nodes and may also be useful in that scenario.

The tools tend to take one of two forms:

- Tools to launch jobs and take measurements
- Tools to analyse results

For more details, see individual sub-folders. A brief summary of available tools is below:

| Folder | Description |
| --- | --- |
| collectd | `collectd` based statistics/metrics gathering daemonset code |
| lib | General library helper functions for forming and launching workloads, and storing results in a uniform manner to aid later analysis |
| lib/cpu-load* | Helper functions to enable CPU load generation on a cluster whilst under test |
| report | Rmarkdown based report generator, used to produce a PDF comparison report of one or more sets of results |
| scaling | Tests to measure scaling, such as linear or parallel launching of pods |

Results storage and analysis

The tools generate JSON-formatted results files via the functions in lib/json.bash. The metrics_json_save() function can also send the JSON results, using curl or socat, to a database defined by environment variables (see the file source for details). This method has been used to store results in Elasticsearch and InfluxDB databases, for instance, and should be adaptable to any REST API that accepts JSON input.
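The save-then-forward pattern described above could be sketched as follows. This is a minimal illustration, not the actual lib/json.bash code: the function names `wrap_result` and `save_result`, and the variables `RESULTS_DIR` and `EXAMPLE_DB_URL`, are hypothetical and were chosen only for this example. The real environment variable names are defined in the file source.

```bash
#!/usr/bin/env bash
# Hypothetical sketch only; names below are NOT taken from lib/json.bash.

# Wrap a JSON fragment in a top-level object keyed by the test name.
wrap_result() {
    local name="$1" body="$2"
    printf '{"%s": %s}\n' "$name" "$body"
}

# Save the result to a local file and, if a URL is configured,
# also POST the same file to a REST endpoint that accepts JSON.
save_result() {
    local name="$1" body="$2"
    local dir="${RESULTS_DIR:-./results}"   # assumed variable name
    mkdir -p "$dir"
    wrap_result "$name" "$body" > "${dir}/${name}.json"

    # EXAMPLE_DB_URL is an assumed variable name for illustration.
    if [ -n "${EXAMPLE_DB_URL:-}" ]; then
        curl -sS -X POST -H 'Content-Type: application/json' \
            --data-binary "@${dir}/${name}.json" "$EXAMPLE_DB_URL"
    fi
}
```

With `EXAMPLE_DB_URL` unset, `save_result demo '{"pods": 3}'` only writes `demo.json` into the results directory. Keeping the local file as the primary output means a failed upload does not lose the measurement.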

Prerequisites

The following are required to run the tests and process the results:

- A Kubernetes cluster up and running (tested on v1.15.3).
- bc and jq packages.
- Docker (only for report generation).
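A quick way to confirm that the command-line tools are present before starting a run is sketched below. The helper name `missing_tools` is made up for this example, and checking for `kubectl` is an assumption about how the cluster is accessed. The check does not verify that the cluster itself is reachable.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of the repository.

# Print each command from the argument list that is not found on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# docker is only needed for report generation.
missing="$(missing_tools kubectl bc jq docker)"
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing >&2
fi
```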

Developers

Below are some architecture and internal details of how the code is structured and configured. This will be helpful for improving, modifying or submitting fixes to the code base.

Metrics gathering

Metrics can be gathered in one of two ways: with a daemonset deployment of privileged pods that gather statistics directly from the nodes using a combination of mpstat, free and df, or with a daemonset deployment based around collectd. The general recommendation is to use the collectd-based collection if possible. It is more efficient because the system does not have to poll and wait for results, so the test cycle executes faster. The collectd results are collected asynchronously, and the report generator code later aligns them with the pod execution timeline.
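To illustrate what aligning asynchronous samples with the pod timeline means, the sketch below looks up the most recent sample taken at or before a given pod launch time. It is not the report generator's method (that code is Rmarkdown based). It assumes a CSV file of `epoch,value` rows sorted by ascending time, with an optional header line, and the function name `sample_at` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; assumes "epoch,value" rows in ascending time order.

# Print the value of the latest sample taken at or before a given time.
#   $1: time of interest, in epoch seconds (e.g. when a pod was launched)
#   $2: path to the CSV file of samples
# Prints nothing if no sample is old enough.
sample_at() {
    awk -F, -v t="$1" '
        # A non-numeric header evaluates to 0 and is skipped.
        $1 + 0 > 0 && $1 + 0 <= t + 0 { v = $2; found = 1 }
        END { if (found) print v }
    ' "$2"
}
```

For example, given samples at times 100, 110 and 120, `sample_at 115 samples.csv` prints the value recorded at time 110.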

`collectd` statistics

The collectd-based code can be found in the collectd subdirectory. It uses the collectd configuration found in the collectd.conf file to gather statistics and store the results on the nodes themselves whilst tests are running. At the end of the test, the results are copied from the nodes and stored in the results directory for later processing.

The collectd statistics are only configured and gathered if the environment variable SMF_USE_COLLECTD is set to non-empty by the test code (that is, it is only enabled upon request).
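A "set to non-empty" check of this kind is commonly written as shown below. This is a sketch of the pattern, not a copy of the test code, and the helper name `collectd_enabled` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper; only the variable name comes from the text above.

# Succeeds when SMF_USE_COLLECTD is set to any non-empty value.
collectd_enabled() {
    [ -n "${SMF_USE_COLLECTD:-}" ]
}

if collectd_enabled; then
    echo "collectd statistics: enabled"
else
    echo "collectd statistics: disabled"
fi
```

Because any non-empty value counts, `SMF_USE_COLLECTD=true` and `SMF_USE_COLLECTD=0` would both enable collection under this pattern, while leaving the variable unset or empty keeps it disabled.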

privileged statistics pods

The privileged statistics pods YAML can be found in the scaling/stats.yaml file. An example of how to invoke and use this daemonset to extract statistics can be found in the scaling/k8s_scale.sh file.
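As a rough idea of how a value might be pulled out of the tool output gathered from such a pod, the sketch below extracts the used-memory figure from `free` output. It is an assumption-laden illustration rather than the method used in scaling/k8s_scale.sh: the function name `mem_used` is hypothetical, and it relies on the usual `free` layout in which the `Mem:` row lists total, used and free in that order.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; not taken from scaling/k8s_scale.sh.

# Read `free` output on stdin and print the "used" column of the Mem: row,
# in whatever units `free` was asked to report.
mem_used() {
    awk '$1 == "Mem:" { print $3 }'
}
```

With a stats pod name held in a hypothetical `$pod` variable, this could be combined with `kubectl exec "$pod" -- free | mem_used`.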

Configuring constant 'loads'

The framework includes some tooling to assist in setting up constant pre-defined 'loads' across the cluster to aid evaluation of their impacts on the scaling metrics. See the cpu-load documentation for more information.
