272 Commits

Author SHA1 Message Date
Robert Dower
2b6c3ec3cf Update README.md 2025-08-07 13:34:33 -07:00
Peter W. Morreale
9e3697308e Fix create_stack for version 1.25
kubernetes 1.25 changed the key for the NoSchedule taint to
'control-plane'.

Fix the script to handle both pre and post version 1.25

Signed-off-by: Peter W. Morreale <pwmorreale@gmail.com>
v1.25.1
2022-11-15 14:39:32 -07:00
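The version handling described in commit 9e3697308e could be sketched roughly as follows. This is a minimal illustration, not the actual `create_stack` logic: the function names are hypothetical, and it assumes the script removes the taint with `kubectl taint`. The full key names are the upstream Kubernetes ones.

```python
def noschedule_taint_key(k8s_version: str) -> str:
    """Return the control-plane NoSchedule taint key for a Kubernetes version."""
    # Accept "1.25.1" as well as "v1.25.1"; only major.minor matters here.
    major, minor = (int(part) for part in k8s_version.lstrip("v").split(".")[:2])
    if (major, minor) >= (1, 25):
        return "node-role.kubernetes.io/control-plane"
    return "node-role.kubernetes.io/master"


def untaint_command(k8s_version: str) -> list:
    """Build a kubectl command that removes the taint from all nodes (assumed use)."""
    # A trailing "-" after the key tells kubectl to remove the taint.
    return ["kubectl", "taint", "nodes", "--all", noschedule_taint_key(k8s_version) + "-"]
```

Keying on the version keeps one code path for both old and new clusters; a script could equally try both keys and ignore the one that is absent.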
Ganesh Maharaj Mahalingam
af1171a6af Add storage back to all function (#346)
Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>
2022-08-31 09:37:52 -07:00
Ganesh Maharaj Mahalingam
17ee9d5040 Upgrades to components in preparation of k8s 1.25 (#345)
Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>
v1.25
2022-08-29 12:17:33 -07:00
Ganesh Maharaj Mahalingam
ae85c01b63 Revert check for prometheus deployment and update nginx
Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>
2022-08-16 17:18:48 -06:00
Ganesh Maharaj Mahalingam
8f2f1d7422 Fix bash command to check prometheus crds
Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>
2022-08-15 17:00:30 -06:00
Ganesh Maharaj Mahalingam
32f421a2f9 fix canal download URL
Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>
v1.22
2022-05-18 10:57:13 -06:00
Justin Scott
6aff0ac601 Remove static cpu from kubeadm.yaml
* Remove static CPU pinning because of the confusion and complications it causes when disabling and enabling cores.
* The static CPU policy by default pins a process to CPUs when a Kubernetes pod's limits are set to whole CPUs. While this is a neat feature, it also causes issues when setaffinity is attempted on a particular CPU. We would still like to set limits, but pinning a process to a core is something we will try to avoid in the near future.
2022-04-04 14:31:15 -07:00
Ganesh Maharaj Mahalingam
0d1d72e7a1 Update rook, kata, metrics-server and kube-prometheus (#340)
Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>
2022-03-29 16:17:07 -07:00
Ganesh Maharaj Mahalingam
dc39af186f Update CNIs to versions that work well with 1.20.X of Kubernetes
Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>
v1.21
2022-03-04 12:00:27 -07:00
Ganesh Maharaj Mahalingam
6b56a905a6 Update CNIs
Canal: v3.18
Cilium: v1.9
Flannel: 0.14.0-rc1

Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>
2021-05-18 09:08:55 -06:00
Ganesh Maharaj Mahalingam
f9b260088d Update README to take note of k8s-migration package (#334)
Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>
v1.19
2021-03-17 13:26:12 -07:00
Mark Horn
5ca64545b1 Disable swap before OS Update (#333)
Mask all swap targets.

Signed-off-by: Mark D Horn <mark.d.horn@intel.com>
2020-12-17 10:38:43 -08:00
Hyunsun Moon
e74b3ca892 Add support of CNI version 0.3.1 to vfioveth
Tested with K8S v1.18.6 and the latest multus stable image.
2020-09-10 15:14:19 -07:00
António Meireles
55b2aa2d19 accommodate upstream libvirt boxes changes in a forward and backward way (#330)
Per https://github.com/AntonioMeireles/ClearLinux-packer/issues/24, the
default size of the ClearLinux libvirt boxes was lowered from the previous
40G to a more manageable 5G, and the user can now increase that value, at
box instantiation, to whatever is adequate.

To avoid surprises on this side, the Vagrantfile was modified to
hardcode the previous default value.

This change is forward and backward compatible: it is simply
ignored when using older boxes (since the 'new' root volume size is
the same as the original one) and resizes the box to the expected 40G on
newer ones.

Signed-off-by: António Meireles <antonio.meireles@reformi.st>
2020-08-19 14:23:51 -07:00
Ganesh Maharaj Mahalingam
aad050f944 Switch private network for Vagrant VMs (#324)
The previously used IP range seems to be allocated to NASA, so it is
probably not a good idea to use it here. Switch the private network to a
range that is reserved for private networks.

Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>
2020-06-15 09:40:22 -07:00
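A check like the one motivating commit aad050f944 can be sketched with Python's standard `ipaddress` module. This assumes "IPs allocated to private networks" means the three RFC 1918 blocks; the function name and the sample addresses in the usage note are illustrative only and are not taken from the Vagrantfile.

```python
import ipaddress

# The three IPv4 blocks that RFC 1918 reserves for private networks.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """Return True if the address falls inside one of the RFC 1918 blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)
```

For example, `is_rfc1918("192.168.10.5")` is true while `is_rfc1918("8.8.8.8")` is false, so a VM address that fails the check is a candidate for the kind of switch this commit makes.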
David Lyle
a872ccca6c adding support for HA clusters, using HAProxy (#317)
* adding support for HA clusters, using HAProxy

Signed-off-by: David Lyle <dklyle0@gmail.com>

* fixing typo

* fixing load balancer port default value
2020-06-15 09:38:34 -07:00
Saikrishna Edupuganti
1525407bd4 Update multi-net components to latest releases (#329)
* Update multi-net components to latest releases

Multus CNI 3.4.2
SR-IOV CNI 2.3
SR-IOV DP  3.2

Tested as per the README. Works fine.

Signed-off-by: Saikrishna Edupuganti <saikrishna.edupuganti@intel.com>

* Provide dpdk-stable 3 LTS and 1 recent release

Testpmd manifests of the last 3 LTS and 1 latest release from stable
repo to help test actual DPDK app instead of sleep.

Currently 17.11 and 19.11 are the only functioning ones without
privileged.

```
NAME              READY   STATUS    RESTARTS   AGE
dpdk-1711         1/1     Running   0          3m5s
dpdk-1811         0/1     Error     0          3m5s
dpdk-1911         1/1     Running   0          3m5s
dpdk-2002         0/1     Error     0          3m5s
```

```
EAL: PCI device 0000:07:06.4 on NUMA socket 0
EAL:   probe driver: 8086:154c net_i40e_vf
EAL: Getting a vfio_dev_fd for 0000:07:06.4 failed
EAL: Requested device 0000:07:06.4 cannot be used
…
testpmd: No probed ethernet devices
EAL: Error - exiting with code: 1
  Cause: Invalid port 1
```

Signed-off-by: Saikrishna Edupuganti <saikrishna.edupuganti@intel.com>
2020-06-05 09:06:08 -07:00
Miguel Bernal Marin
9b4f0a8582 setup_system: use local admin path for proxy.conf (#328)
Currently the system.conf.d/proxy.conf file is saved at /usr/lib
which is the vendor path, and can be dropped by
"swupd repair --picky --force".

This commit creates the local administrator /etc/systemd/system.conf.d
directory and stores proxy.conf inside it.
2020-05-26 10:45:25 -07:00
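The idea in commit 9b4f0a8582 can be sketched as below: write the drop-in under the administrator path rather than the vendor path. The helper names and the exact file contents are assumptions; the sketch only relies on systemd reading `DefaultEnvironment=` from the `[Manager]` section of `system.conf.d` drop-ins.

```python
from pathlib import Path


def render_proxy_conf(proxy_env: dict) -> str:
    """Render a systemd manager drop-in that exports proxy variables."""
    # Each assignment is quoted so values containing spaces stay intact.
    pairs = " ".join(f'"{key}={value}"' for key, value in proxy_env.items())
    return f"[Manager]\nDefaultEnvironment={pairs}\n"


def write_proxy_conf(proxy_env: dict, conf_dir: str = "/etc/systemd/system.conf.d") -> Path:
    """Create the admin drop-in directory if needed and write proxy.conf into it."""
    directory = Path(conf_dir)
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / "proxy.conf"
    target.write_text(render_proxy_conf(proxy_env))
    return target
```

Keeping the directory as a parameter makes the helper easy to try against a scratch directory before touching /etc.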
Antti Kervinen
696861ce66 metrics: change collectd output to host /opt/collectd/run
Currently we lose collectd data from a node when scaling ends in a
system failure on the node, yet this data can be very helpful in root
causing the failure. This patch changes the collectd configuration so that
the output is continuously written to the host filesystem instead of
the collectd container overlay, which is lost unless scaling
reaches a graceful exit.

Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
2020-05-19 19:56:54 +01:00
Graham Whaley
07fd8412da metrics: report: Error more cleanly
Clean up the rest of the report R files to allow them to quit
cleanly when they find an error or missing data, so that the
final PDF report gives meaningful errors such as 'No data found',
rather than cryptic R errors.

Signed-off-by: Graham Whaley <graham.whaley@intel.com>
2020-05-19 19:53:23 +01:00
Graham Whaley
058e1753ae metrics: report: quit cleanly on tidy_scaling failure
When there are no files to process, we tend to quit with a loud
and not helpful error. Improve that by spotting the obvious error
cases (such as no files to process for a specific test), and quit
with a nicer error/warning message that ends up in the rendered
report.

Start with the tidy_scaling test. The only clean way to quit a
fragment of Rmarkdown R looks to be to place it inside a function
so we can 'return'. Otherwise, all other forms of 'quit', quit the
whole Rmarkdown render pipeline, which is not what we want - we
want to carry on and try to process the rest of the fragments for
the rest of the tests.

Signed-off-by: Graham Whaley <graham.whaley@intel.com>
2020-05-19 19:53:23 +01:00
Graham Whaley
091e76c3d8 metrics: k8s_scale_net: whitespace fixes
Fix some indentation that had gone rogue.
Note that there are other whitespace fixes that can be done in this file;
it appears to have a mix of tabs and spaces.

Signed-off-by: Graham Whaley <graham.whaley@intel.com>
2020-05-15 09:11:37 -06:00
Graham Whaley
7121418dd3 metrics: Improve documentation
Improve and expand the documents across the metrics subsystem.
Clarify and re-order some documents. Add some more details around
each individual test.
Note that only the 'rapid' test is currently actively used, and the
other tests may need some nurturing if they are found to be useful.

Signed-off-by: Graham Whaley <graham.whaley@intel.com>
2020-05-15 09:11:37 -06:00
Gabriela Cervantes
e10260e99c metrics: Use a specific version of rocker/tidyverse
This PR uses a specific version of rocker/tidyverse, as the latest version
does not have the latex-xcolor package, which makes it impossible to create
the metrics report.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2020-05-13 15:18:05 +01:00
CraigSterrett
e732cc693b Updated metrics github location (#323)
The metrics-server package has moved out of the kubernetes incubator
github location and is now in the sigs location.

Signed-off-by: Craig Sterrett <craig.Sterrett@intel.com>
2020-05-06 13:00:07 -07:00
Hyunsun Moon
cf9b85cf03 Update vfioveth CNI to add VF device id as alias 2020-04-15 11:20:27 -07:00
CraigSterrett
a82b9d9601 Modified setup_system to use Systemd level proxy (#316)
Modified setup_system.sh to use Systemd level proxy instead of per
service proxy.
Tested create_stack all with defaults and ran Sonobuoy successfully against change.

Closes issue #307

Signed-off-by: Craig Sterrett <craig.Sterrett@intel.com>
2020-04-03 09:09:30 -07:00
CraigSterrett
96978c5228 Rook updated to v1.2.6 (#315)
Rook updated to V1.2.6

Closes issue #312
https://github.com/clearlinux/cloud-native-setup/issues/312

Signed-off-by: Craig Sterrett <craig.Sterrett@intel.com>
2020-03-27 11:22:17 -07:00
CraigSterrett
8c17c4b47c removed v0.8.3 of rook (#314)
Rook v0.8.3 was found to be not working and is too old to continue
supporting.

Signed-off-by: Craig Sterrett <craig.Sterrett@intel.com>
2020-03-26 10:25:13 -07:00
CraigSterrett
514efd6592 Fix rook single node setup (#313)
* Fix rook single node setup

Modified the rook installation to support both multinode kubernetes
clusters and standalone kubernetes clusters. Multinode installations
will occur as before, with changes for standalone installations.
I found that v0.8.3 is not currently working and is too old to continue
to support; I will submit a PR to remove it. Also made a couple of minor
spacing changes to the YAML, as detected by yamllint.

closes issue 306 https://github.com/clearlinux/cloud-native-setup/issues/306
closes issue 311 https://github.com/clearlinux/cloud-native-setup/issues/311

Signed-off-by: Craig Sterrett <craig.Sterrett@intel.com>

2020-03-26 10:07:18 -07:00
Julio Rivera
603c42703f Add initial Jenkinsfile (#301)
Signed-off-by: Rivera Gonzalez, Julio C <julio.c.rivera.gonzalez@intel.com>
2020-02-12 09:44:18 -08:00
Khanak Nangia
52d1a8406b Updating flannel (#299) v1.17 v1.9 2020-01-11 01:04:53 -08:00
CraigSterrett
61b8702472 Added --force flag to swupd repair command (#298)
Running setup_system.sh with the OS version pinned, to keep setup_system from upgrading the OS, causes an error because a package has been removed. The --force flag needs to be added to the
sudo swupd repair -m "${CLR_VER}" --picky command

Closes issue #297
2020-01-09 14:22:39 -08:00
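The command change in commit 61b8702472 amounts to appending one flag. A minimal sketch of building that command line follows; the function name is hypothetical, and the version value in the usage note is a placeholder rather than a release the repository pins.

```python
def swupd_repair_cmd(clr_version, force=True):
    """Build the swupd repair command for a pinned Clear Linux version."""
    # Mirrors: sudo swupd repair -m "${CLR_VER}" --picky [--force]
    cmd = ["sudo", "swupd", "repair", "-m", str(clr_version), "--picky"]
    if force:
        cmd.append("--force")
    return cmd
```

Calling `swupd_repair_cmd("31890")` yields a list ending in `--picky`, `--force`, which can be handed to `subprocess.run` without shell quoting concerns.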
Justin Scott
00c1d60470 Update kubeadm.yaml to 1.17 version (#296)
Closes #295

Signed-off-by: Justin Scott <justin.a.scott@intel.com>
2020-01-08 12:43:57 -08:00
Khanak Nangia
07c2231e62 Updating ingress-nginx to v0.26.1 (#285)
* Updating ingress-nginx to v0.26.1

* removing extra line
2019-12-09 10:44:25 -08:00
Morales Quispe, Marcela
4ef8d34671 Make and rename net server process variables configurable
Some CNIs take longer for their related deployments to become ready, which
is why `proc_wait_time` needs to be customizable. `proc_wait_time` can now
be set at execution time, and it also has a default value, for the time to
pod network test harness.

Signed-off-by: Morales Quispe, Marcela <marcela.morales.quispe@intel.com>
2019-12-09 14:56:29 +00:00
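The "set at execution time, with a default" pattern from commit 4ef8d34671 might look like this in outline. Only the variable name `proc_wait_time` comes from the commit; the default of 20 and the assumption that the unit is seconds are placeholders, since the log does not state them.

```python
import os

# Hypothetical default; the real harness default is not given in the log.
DEFAULT_PROC_WAIT_TIME = 20


def get_proc_wait_time(environ=None):
    """Read proc_wait_time from the environment, falling back to the default."""
    env = os.environ if environ is None else environ
    raw = env.get("proc_wait_time", "")
    # An unset or blank value means "use the default".
    return int(raw) if raw.strip() else DEFAULT_PROC_WAIT_TIME
```

A slower CNI could then be accommodated by exporting `proc_wait_time=90` before running the test, with no change to the harness itself.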
Khanak Nangia
bc0f257176 Updating metrics to v0.3.6 (#286) 2019-12-05 14:47:46 -08:00
Khanak Nangia
e70e32d36e Updating MetalLB to v0.8.3 (#284) 2019-12-05 14:40:11 -08:00
Khanak Nangia
518fa87f27 Updating cilium to v1.6.4 (#289) 2019-12-05 14:39:46 -08:00
Khanak Nangia
6cd87d74be Updating rook to v1.1.7 (#283) 2019-12-05 12:36:42 -08:00
Morales Quispe, Marcela
927ceddc9c Add time to pod network metric.
To measure the time to pod network, a deployment that uses the agnhost
image is exposed as a net server that replies to curl calls. The test
measures this reply time and saves it for further reporting.
Afterwards, only the exposed net service is deleted.

Signed-off-by: Morales Quispe, Marcela <marcela.morales.quispe@intel.com>
2019-12-05 11:00:41 +00:00
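The measurement described in commit 927ceddc9c can be approximated by polling an endpoint until it answers and recording the elapsed time. This sketch swaps the harness's curl calls for Python's `urllib`; the function names, timeout, and polling interval are all assumptions.

```python
import time
import urllib.request


def http_probe(url):
    """Return a callable that reports whether url currently answers a request."""
    def probe():
        try:
            with urllib.request.urlopen(url, timeout=1.0):
                return True
        except OSError:
            # Connection refused, timeout, or HTTP error: not ready yet.
            return False
    return probe


def time_until_ready(probe, timeout_s=60.0, interval_s=0.1):
    """Poll probe() until it succeeds; return elapsed seconds, or None on timeout."""
    start = time.monotonic()
    while True:
        if probe():
            return time.monotonic() - start
        if time.monotonic() - start >= timeout_s:
            return None
        time.sleep(interval_s)
```

Something like `time_until_ready(http_probe(service_url))` would give the time to first reply, where `service_url` stands for whatever address the exposed net service ends up with.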
Graham Whaley
6298cf2054 metrics: tidy: widen the graphs
Move the legends to the bottom (underneath) for the tidy scaling graphs
to make them wider on the page, and thus easier to read with more
resolution.

Signed-off-by: Graham Whaley <graham.whaley@intel.com>
2019-12-04 10:02:36 +00:00
Graham Whaley
acf5a95177 metrics: tidy: move local assign inside loop
The bootdata assignments were outside the 'valid file' check loop,
which meant in the case there was a data directory which did not
contain a valid scaling file, we would fail the assignment (as the
`local_bootdata` would be empty).

Fix by moving the assignments into the loop, thus only assigning when
we know we have valid data.

Signed-off-by: Graham Whaley <graham.whaley@intel.com>
2019-12-04 10:02:36 +00:00
Graham Whaley
68a62f50bc metrics: report: improve interface y axis divs
Most of the time we have 0 interface errors or drops, so we pin the y
scale to '1', so we don't hit 'infinity' errors. That left us with a
strange y-axis label anomaly - as the axis was automatically divided
into 5 labels, and we got for some reason the sequence '0,0,0,1,1'.
That just plain looked wrong and confusing.
Fix it by using `pretty_breaks()` for the error/drop y axis, whilst
maintaining the `comma` count for the pod count y axis.

Signed-off-by: Graham Whaley <graham.whaley@intel.com>
2019-12-04 10:02:36 +00:00
Graham Whaley
9431dd9f38 metrics: report: shrink page margins for more resolution
The pdf output by default has large page margins, which wastes a lot of
page space, and reduces our 'resolution'. Shrink the margins to a pretty
minimal 1cm to increase the graph resolution. The document itself then
does not look as 'pretty', but we can see more data visually.

Signed-off-by: Graham Whaley <graham.whaley@intel.com>
2019-12-04 10:02:36 +00:00
Graham Whaley
9b8c7c093f metrics: collectd: move legends under graphs
Move the legends under the graphs to give more width, and thus
resolution, to the final pictures.
This works well for the collectd graphs as they are spread out
into sets of single column graphs per page.

Signed-off-by: Graham Whaley <graham.whaley@intel.com>
2019-12-04 10:02:36 +00:00
Syed Ahsan
bdcb4fb5b7 Update Kata to v1.9.1 (#282)
This patch updates Kata to use v 1.9.1 and adds the kustomization.

Signed-off: Syed Ahsan <syed.ahsan.shamim.zaidi@intel.com>
2019-12-02 16:34:10 -08:00
Eric Ernst
a0ca2a2017 set the snapshotter to devmapper in setup script
Signed-off-by: Eric Ernst <eric.ernst@intel.com>
2019-11-25 13:29:13 -08:00
Obed N Munoz
46b3f230ee scaling: Remove tty parameter in report's generation cmd
This avoids tty-related issues in our CI systems, which by default
do not support a tty. With this change we avoid the following failure
of the report generation `docker run` command.
```
the input device is not a TTY
```

Signed-off-by: Obed N Munoz <obed.n.munoz@intel.com>
2019-11-22 09:17:03 +00:00
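Commit 46b3f230ee removes the tty parameter outright. An alternative, shown here only as a sketch and not as what the commit does, is to request a tty only when stdin actually is one. The helper name and the image name in the usage note are hypothetical.

```python
import sys


def docker_run_cmd(image, args=(), stdin_is_tty=None):
    """Build a docker run command, adding -t only when a terminal is attached."""
    if stdin_is_tty is None:
        # Detect at call time; CI runners typically have no terminal on stdin.
        stdin_is_tty = sys.stdin.isatty()
    cmd = ["docker", "run", "--rm"]
    if stdin_is_tty:
        cmd.append("-t")
    cmd.append(image)
    cmd.extend(args)
    return cmd
```

With `stdin_is_tty=False`, `docker_run_cmd("metrics-report")` produces a command without `-t`, which is the form that avoids the "the input device is not a TTY" error in CI.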