Kubernetes 1.25 changed the key for the NoSchedule taint to
'control-plane'.
Fix the script to handle versions both before and after 1.25.
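As a rough sketch of handling both cases, the taint key can be chosen from the cluster version. The helper name `taint_key_for_version` is hypothetical and is not taken from the script; the two keys are the standard kubeadm ones, `node-role.kubernetes.io/master` and `node-role.kubernetes.io/control-plane`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the control-plane taint key for a Kubernetes version.
# Accepts strings such as "v1.24.3", "1.25.0" or "1.25".
taint_key_for_version() {
    local version="${1#v}"        # strip an optional leading "v"
    local major="${version%%.*}"  # text before the first dot
    local rest="${version#*.}"    # text after the first dot
    local minor="${rest%%.*}"     # text before the next dot
    minor="${minor%%[!0-9]*}"     # drop any non-numeric suffix

    if [ "${major}" -gt 1 ] || { [ "${major}" -eq 1 ] && [ "${minor}" -ge 25 ]; }; then
        echo "node-role.kubernetes.io/control-plane"
    else
        echo "node-role.kubernetes.io/master"
    fi
}
```

The result could then be passed to something like `kubectl taint nodes --all "<key>:NoSchedule-"`. Removing both keys and ignoring "not found" errors is an equally reasonable alternative.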
Signed-off-by: Peter W. Morreale <pwmorreale@gmail.com>
* Remove static CPU pinning because of the confusion and complications it causes when disabling/enabling cores.
* The static CPU policy by default pins a process to CPUs if the Kubernetes pod limits are set to whole CPUs. While this is a neat feature, it also causes issues when setaffinity is attempted on a particular CPU. While we would still like to set limits, pinning a process to a core is something we will try to avoid in the near future.
Per https://github.com/AntonioMeireles/ClearLinux-packer/issues/24, the
default size of the ClearLinux libvirt boxes was lowered from the previous
40G to a more manageable 5G, with the user able to increase that value to
whatever is adequate at box instantiation.
To avoid surprises on this side, the Vagrantfile was modified to hardcode
the previous default value.
This change is forward and backward compatible: it will just be ignored
when using older boxes (since the 'new' root volume size is the same as
the original one) and will resize the box to the expected 40G in newer
ones.
Signed-off-by: António Meireles <antonio.meireles@reformi.st>
The previously used IP range seems to be allocated to NASA, so it is
probably not a good idea to use it here. Switch the private network to a
range from the IPs allocated to private networks.
Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>
* Update multi-net components to latest releases
Multus CNI 3.4.2
SR-IOV CNI 2.3
SR-IOV DP 3.2
Tested as per the README. Works fine.
Signed-off-by: Saikrishna Edupuganti <saikrishna.edupuganti@intel.com>
* Provide dpdk-stable 3 LTS and 1 recent release
Add testpmd manifests for the last 3 LTS releases and the latest release
from the stable repo, to help test an actual DPDK app instead of sleep.
Currently 17.11 and 19.11 are the only ones that function without
privileged mode.
```
NAME        READY   STATUS    RESTARTS   AGE
dpdk-1711   1/1     Running   0          3m5s
dpdk-1811   0/1     Error     0          3m5s
dpdk-1911   1/1     Running   0          3m5s
dpdk-2002   0/1     Error     0          3m5s
```
```
EAL: PCI device 0000:07:06.4 on NUMA socket 0
EAL: probe driver: 8086:154c net_i40e_vf
EAL: Getting a vfio_dev_fd for 0000:07:06.4 failed
EAL: Requested device 0000:07:06.4 cannot be used
…
testpmd: No probed ethernet devices
EAL: Error - exiting with code: 1
Cause: Invalid port 1
```
Signed-off-by: Saikrishna Edupuganti <saikrishna.edupuganti@intel.com>
Currently the system.conf.d/proxy.conf file is saved under /usr/lib,
which is the vendor path, so it can be removed by
"swupd repair --picky --force".
This commit creates the local administrator directory
/etc/systemd/system.conf.d and stores proxy.conf inside it.
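A minimal sketch of the idea, assuming the proxy settings come from the usual `http_proxy`/`https_proxy`/`no_proxy` environment variables and are applied through the `DefaultEnvironment=` setting in the `[Manager]` section. The function name and the optional root prefix argument are hypothetical, added only so the sketch can be tried against a scratch directory:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a systemd manager drop-in carrying proxy settings.
# $1 is an optional root prefix (empty means the real /etc).
write_proxy_conf() {
    local root="${1:-}"
    local conf_dir="${root}/etc/systemd/system.conf.d"

    # /etc is the local administrator path, so the OS updater leaves it alone.
    mkdir -p "${conf_dir}"
    {
        echo "[Manager]"
        printf 'DefaultEnvironment="HTTP_PROXY=%s" "HTTPS_PROXY=%s" "NO_PROXY=%s"\n' \
            "${http_proxy:-}" "${https_proxy:-}" "${no_proxy:-}"
    } > "${conf_dir}/proxy.conf"
}
```

Called with no argument (and sufficient privileges) it writes to /etc/systemd/system.conf.d/proxy.conf.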
Currently we lose collectd data from a node when scaling ends in a
system failure on the node, yet this data can be very helpful in
root-causing the failure. This patch changes the collectd configuration so
that the output is continuously written to the host filesystem instead of
the collectd container overlay, which is lost unless scaling
reaches a graceful exit.
Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
Clean up the rest of the report R files to allow them to quit
cleanly when they find an error or missing data, so that the
final PDF report gives meaningful errors such as 'No data found',
rather than cryptic R errors.
Signed-off-by: Graham Whaley <graham.whaley@intel.com>
When there are no files to process, we tend to quit with a loud
and not helpful error. Improve that by spotting the obvious error
cases (such as no files to process for a specific test), and quit
with a nicer error/warning message that ends up in the rendered
report.
Start with the tidy_scaling test. The only clean way to quit a
fragment of Rmarkdown R looks to be to place it inside a function
so we can 'return'. Otherwise, all other forms of 'quit', quit the
whole Rmarkdown render pipeline, which is not what we want - we
want to carry on and try to process the rest of the fragments for
the rest of the tests.
Signed-off-by: Graham Whaley <graham.whaley@intel.com>
Fix some indentation that had gone rogue.
Note, there are other whitespace fixes that can be done in this file,
as it appears to have a mix of tabs and spaces.
Signed-off-by: Graham Whaley <graham.whaley@intel.com>
Improve and expand the documents across the metrics subsystem.
Clarify and re-order some documents. Add some more details around
each individual test.
Note that only the 'rapid' test is currently actively used, and the
other tests may need some nurturing if they are found to be useful.
Signed-off-by: Graham Whaley <graham.whaley@intel.com>
This PR uses a specific version of rocker/tidyverse, as the latest version
does not have the latex-xcolor package, which makes it impossible to create
the metrics report.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
The metrics-server package has moved out of the kubernetes incubator
github location and is now in the sigs location.
Signed-off-by: Craig Sterrett <craig.Sterrett@intel.com>
Modified setup_system.sh to use a systemd-level proxy instead of a
per-service proxy.
Tested create_stack all with defaults and ran Sonobuoy successfully against the change.
Closes issue #307
Signed-off-by: Craig Sterrett <craig.Sterrett@intel.com>
* Fix rook single node setup
Modified the rook installation to support both multinode kubernetes
clusters and standalone kubernetes clusters. Multinode installations
will occur as before, with changes for standalone installations.
I found that v0.8.3 is not currently working and is too old to continue
to support; I will submit a PR to remove it. Also made a couple of minor
spacing changes to the YAML, as detected by yamllint.
closes issue 306 https://github.com/clearlinux/cloud-native-setup/issues/306
closes issue 311 https://github.com/clearlinux/cloud-native-setup/issues/311
Signed-off-by: Craig Sterrett <craig.Sterrett@intel.com>
Running setup_system.sh with the OS version set (to keep setup_system from upgrading the OS) causes an error because a package has been removed. Add the --force flag to the
sudo swupd repair -m "${CLR_VER}" --picky command.
Closes issue #297
Some CNIs take longer for their related deployments to become ready, which
is why `proc_wait_time` needs to be customizable. `proc_wait_time` can now
be set at execution time, and also has a default value, for the time to pod
network test harness.
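The override-with-default pattern can be sketched as below. The default of 20 seconds and the `wait_for` helper are assumptions made for illustration; neither is taken from the test harness itself.

```shell
#!/usr/bin/env bash
# Use the caller's value if one is exported, otherwise fall back to a default.
# The value 20 is an assumed placeholder, not the harness's real default.
proc_wait_time="${proc_wait_time:-20}"

# Hypothetical helper: retry a command once per second until it succeeds,
# giving up after proc_wait_time seconds.
wait_for() {
    local waited=0
    until "$@"; do
        if [ "${waited}" -ge "${proc_wait_time}" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

With this shape, a slower CNI could be accommodated by running the test as `proc_wait_time=120 ./<test script>` without editing the script.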
Signed-off-by: Morales Quispe, Marcela <marcela.morales.quispe@intel.com>
To measure the time to pod network, a deployment that uses the agnhost
image is created and exposed as a net server that replies to curl
calls. The test measures this reply time and saves it for later reporting.
Afterwards, only the exposed net service is deleted.
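One way to time such a reply is sketched below. The `elapsed_ms` helper is hypothetical and is not the harness's own code; it relies on GNU `date` supporting `%N` for nanoseconds.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command and print how long it took in milliseconds.
# Returns non-zero (and prints nothing) if the command itself fails.
elapsed_ms() {
    local start end
    start=$(date +%s%N)                 # nanoseconds since the epoch (GNU date)
    "$@" > /dev/null 2>&1 || return 1
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}
```

It could be wrapped around the curl call against the exposed service, for example `elapsed_ms curl -s <service address>`, with the printed value saved for the report.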
Signed-off-by: Morales Quispe, Marcela <marcela.morales.quispe@intel.com>
Move the legends to the bottom (underneath) for the tidy scaling graphs
to make them wider on the page, and thus easier to read with more
resolution.
Signed-off-by: Graham Whaley <graham.whaley@intel.com>
The bootdata assignments were outside the 'valid file' check loop,
which meant in the case there was a data directory which did not
contain a valid scaling file, we would fail the assignment (as the
`local_bootdata` would be empty).
Fix by moving the assignments into the loop, thus only assigning when
we know we have valid data.
Signed-off-by: Graham Whaley <graham.whaley@intel.com>
Most of the time we have 0 interface errors or drops, so we pin the y
scale to '1', so we don't hit 'infinity' errors. That left us with a
strange y-axis label anomaly - as the axis was automatically divided
into 5 labels, and we got for some reason the sequence '0,0,0,1,1'.
That just plain looked wrong and confusing.
Fix it by using `pretty_breaks()` for the error/drop y axis, whilst
maintaining the `comma` count for the pod count y axis.
Signed-off-by: Graham Whaley <graham.whaley@intel.com>
The pdf output by default has large page margins, which wastes a lot of
page space, and reduces our 'resolution'. Shrink the margins to a pretty
minimal 1cm to increase the graph resolution. The document itself then
does not look as 'pretty', but we can see more data visually.
Signed-off-by: Graham Whaley <graham.whaley@intel.com>
Move the legends under the graphs to give more width, and thus
resolution, to the final pictures.
This works well for the collectd graphs as they are spread out
into sets of single column graphs per page.
Signed-off-by: Graham Whaley <graham.whaley@intel.com>
This avoids tty-related issues in our CI systems, which by default do
not support a tty. With this change we avoid the following failure of the
report generation `docker run` command:
```
the input device is not a TTY
```
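For scripts that still need an interactive terminal when run by hand, one common pattern is to request a tty only when one is attached. This is a sketch of that general technique under that assumption, not necessarily what this change does; the `docker_tty_flags` name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the docker run flags suited to the current session.
# "-t" is only added when both stdin and stdout are terminals.
docker_tty_flags() {
    if [ -t 0 ] && [ -t 1 ]; then
        echo "-it"
    else
        echo "-i"
    fi
}
```

A caller would then use `docker run $(docker_tty_flags) ...`, which yields `-i` alone under CI, where no tty is attached.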
Signed-off-by: Obed N Munoz <obed.n.munoz@intel.com>