
[infra,k8s] Upgrade Toolforge Kubernetes to version 1.27
Closed, Resolved, Public

Description

K8s

https://v1-27.docs.kubernetes.io/blog/2023/04/11/kubernetes-v1-27-release/

Working etherpad: https://etherpad.wikimedia.org/p/k8s-1.26-to-1.27-upgrade
Persistent wiki page: https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes/Upgrading_Kubernetes/1.26_to_1.27_notes

1.26 -> 1.27 retrospective/post-mortem: https://docs.google.com/document/d/1_Mudr7sOxT-tTw1Dj76FtI7EXtJMpYsih8pxliuQbU8/edit

Workgroup page: https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Ongoing_Efforts/Toolforge_Upgrade_Workgroup/Upgrades_Overview

Components

Pre-k8s upgrade

should be upgraded (potentially not blocking; tests pass without upgrading them)
can be upgraded

Post-k8s upgrade

need upgrading
can be upgraded

Related Objects

Status | Subtype | Assigned
In Progress | - | Raymond_Ndibe
Resolved | - | Raymond_Ndibe
Resolved | - | Slst2020
Resolved | - | aborrero
Resolved | - | aborrero
Resolved | - | taavi
Resolved | - | taavi
Resolved | - | taavi
Resolved | - | taavi
Resolved | - | Bstorm
Resolved | - | taavi
Resolved | - | taavi
Resolved | - | taavi
Resolved | - | taavi
Resolved | - | rook
Resolved | - | taavi
Resolved | - | taavi
Resolved | - | rook
Resolved | - | taavi
Resolved | - | rook
Resolved | - | taavi
Resolved | - | taavi
Resolved | - | taavi
Resolved | - | taavi
Resolved | - | taavi
Resolved | - | taavi
Resolved | - | Bstorm
Resolved | - | taavi
Resolved | - | taavi
Resolved | - | taavi
Resolved | - | taavi
Resolved | - | Bstorm
Declined | - | None
Resolved | - | rook
Resolved | - | taavi
Resolved | BUG REPORT | None
Duplicate | - | None
Resolved | - | taavi
Resolved | - | taavi
Open | - | None
Resolved | - | rook
Resolved | - | taavi
Resolved | BUG REPORT | taavi
Resolved | - | taavi
Resolved | - | taavi
Resolved | - | taavi
Resolved | - | taavi
Resolved | - | taavi
Resolved | - | taavi
Resolved | - | taavi
Resolved | - | taavi
Resolved | - | aborrero
Stalled | - | dcaro
Stalled | - | dcaro
Resolved | - | komla
Declined | - | None
Resolved | - | taavi
Resolved | - | Andrew
Resolved | - | Andrew
Resolved | - | taavi
Resolved | - | taavi
Resolved | - | taavi
Stalled | - | dcaro
Stalled | - | Slst2020
In Progress | - | Slst2020
Resolved | - | Slst2020
Resolved | - | Slst2020
Resolved | - | aborrero
Resolved | - | Slst2020
Open | - | None
Resolved | BUG REPORT | dcaro
Resolved | BUG REPORT | dcaro
Open | BUG REPORT | bd808
Resolved | - | dcaro
Resolved | - | aborrero
Resolved | - | aborrero
Resolved | - | aborrero
Resolved | - | aborrero
Resolved | - | aborrero
Resolved | - | aborrero
Resolved | - | aborrero
Resolved | - | aborrero
Resolved | - | aborrero
Resolved | - | aborrero
Resolved | - | aborrero
Resolved | - | aborrero
Open | - | aborrero
Resolved | - | aborrero
Resolved | - | aborrero
Resolved | - | aborrero
Resolved | - | aborrero
Resolved | - | aborrero
Resolved | - | aborrero
Resolved | - | aborrero
Resolved | - | Andrew
Duplicate | - | dcaro
Resolved | - | aborrero
Resolved | - | aborrero
Resolved | - | aborrero
Resolved | - | aborrero
Declined | - | aborrero
Declined | - | aborrero
Declined | - | aborrero
Resolved | - | aborrero
Resolved | - | aborrero
Resolved | - | aborrero
Open | - | aborrero
Duplicate | - | None
Resolved | - | aborrero
Resolved | - | Slst2020
Resolved | - | aborrero
Resolved | - | aborrero
Resolved | - | aborrero
Resolved | - | aborrero
Resolved | - | Slst2020
Resolved | - | Slst2020
Resolved | - | aborrero
Resolved | - | aborrero
Resolved | - | aborrero
Resolved | - | aborrero
Resolved | - | aborrero
Resolved | - | Slst2020
Resolved | - | Slst2020
Resolved | - | Slst2020
Resolved | - | Slst2020
Resolved | - | Slst2020
Resolved | - | Slst2020
Resolved | - | Andrew
Resolved | - | dcaro

Event Timeline

There are a very large number of changes, so older changes are hidden.

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-17T08:32:42Z] <wmbot~dcaro@urcuchillay> START - Cookbook wmcs.vps.refresh_puppet_certs on tools-k8s-worker-nfs-75.tools.eqiad1.wikimedia.cloud (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-17T08:35:25Z] <wmbot~dcaro@urcuchillay> END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-k8s-worker-nfs-75.tools.eqiad1.wikimedia.cloud (T359641)

Running the refresh certs cookbook did the trick:

dcaro@urcuchillay$ cookbooks wmcs.vps.refresh_puppet_certs --fqdn tools-k8s-worker-nfs-75.tools.eqiad1.wikimedia.cloud --task-id T359641

It was, though, missing a lot of setup, so I'm checking whether the worker is behaving as expected.

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-17T08:40:20Z] <wmbot~dcaro@urcuchillay> START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-75 (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-17T08:40:49Z] <wmbot~dcaro@urcuchillay> START - Cookbook wmcs.vps.refresh_puppet_certs on tools-k8s-worker-nfs-70.tools.eqiad1.wikimedia.cloud (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-17T08:41:24Z] <wmbot~dcaro@urcuchillay> END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-75 (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-17T08:43:32Z] <wmbot~dcaro@urcuchillay> END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-k8s-worker-nfs-70.tools.eqiad1.wikimedia.cloud (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-17T08:43:40Z] <wmbot~dcaro@urcuchillay> START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-70 (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-17T08:46:35Z] <wmbot~dcaro@urcuchillay> END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-70 (T359641)

I found that tools-k8s-worker-nfs-75 and tools-k8s-worker-nfs-70 were missing a full puppet run (I'm guessing since their creation?). They had a bunch of pods stuck in 'ContainerCreating' status. I fixed the certs, ran puppet again (which made a lot of changes), and rebooted them, and now they are back online.
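Pods stuck in 'ContainerCreating' on a given worker, as on these two, can be spotted in the JSON output of `kubectl get pods -A -o json`. The following is a minimal Python sketch, not something that was used in this task: `stuck_pods_by_node` is a hypothetical helper that reads only standard Pod fields (`spec.nodeName` and the `state.waiting.reason` of each entry in `status.containerStatuses`) and groups the affected pods by node.

```python
from collections import defaultdict


def stuck_pods_by_node(pod_list: dict) -> dict[str, list[str]]:
    """Group "namespace/name" of pods with a container waiting in
    ContainerCreating by the node they are scheduled on.

    `pod_list` is the parsed JSON of `kubectl get pods -A -o json`,
    i.e. a v1 List whose `items` are Pod objects (hypothetical helper).
    """
    stuck: dict[str, list[str]] = defaultdict(list)
    for pod in pod_list.get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses", [])
        # Collect the waiting reason of every container (None if not waiting).
        reasons = {cs.get("state", {}).get("waiting", {}).get("reason") for cs in statuses}
        if "ContainerCreating" in reasons:
            node = pod.get("spec", {}).get("nodeName", "<unscheduled>")
            meta = pod.get("metadata", {})
            stuck[node].append(f'{meta.get("namespace")}/{meta.get("name")}')
    return dict(stuck)
```

For example, a small wrapper can call `stuck_pods_by_node(json.load(sys.stdin))` on piped kubectl output; adding `--field-selector spec.nodeName=<worker>` to the kubectl call narrows it to a single worker.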

It also seems that when the old workers were removed, their certs were not cleaned up properly (maybe they were removed by hand?), so I ran this on the puppetserver:

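root@tools-puppetserver-01:~# for host in $(grep -o 'cert_name="[^"]*' /var/lib/prometheus/node.d/openstack_stale_puppet_certs.prom | cut -d'"' -f2); do
    # Leave the certificate alone for any host that still answers a ping.
    ping -c1 -w1 "$host" && { echo "SKIPPING: $host is alive"; continue; }
    # Otherwise revoke and delete the stale certificate on the puppetserver CA.
    puppetserver ca clean --certname "$host"
done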
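Before running a loop like this, a dry run that only lists which certificates would be cleaned can be a useful safety net. Below is a hypothetical Python sketch of the same selection logic: it pulls the `cert_name` label values out of the Prometheus textfile with a regex that mirrors the `grep`/`cut` pipeline, and skips hosts that a caller-supplied `is_alive` check reports as reachable. The function names are illustrative, and the exact metric lines in that `.prom` file are an assumption; only the `cert_name` label comes from the command above.

```python
import re
import subprocess
from typing import Callable, Iterable

# Mirrors `grep -o 'cert_name="[^"]*' | cut -d'"' -f2`: each cert_name label value.
CERT_NAME_RE = re.compile(r'cert_name="([^"]*)')


def cert_names(prom_text: str) -> list[str]:
    """Return every cert_name label value, in file order."""
    return CERT_NAME_RE.findall(prom_text)


def ping_alive(host: str) -> bool:
    """True if `host` answers a single ping within one second (ping -c1 -w1)."""
    result = subprocess.run(["ping", "-c1", "-w1", host], capture_output=True)
    return result.returncode == 0


def plan_cleanup(
    prom_text: str, is_alive: Callable[[str], bool] = ping_alive
) -> tuple[list[str], list[str]]:
    """Dry run: split stale cert names into (to_clean, skipped_alive); changes nothing."""
    to_clean: list[str] = []
    skipped_alive: list[str] = []
    for host in cert_names(prom_text):
        (skipped_alive if is_alive(host) else to_clean).append(host)
    return to_clean, skipped_alive


def clean_commands(hosts: Iterable[str]) -> list[str]:
    """The puppetserver commands an operator would run after reviewing the plan."""
    return [f'puppetserver ca clean --certname "{host}"' for host in hosts]
```

Injecting `is_alive` keeps the selection logic testable without network access; printing `clean_commands(plan_cleanup(text)[0])` gives a reviewable list before anything is revoked.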

Anything left to do here? Or can this task be closed?


We were mostly waiting to see whether it was stable, as it was a bumpy upgrade. Maybe @Raymond_Ndibe still wants to upgrade something else on the list, though that can also be done in a different task.

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T22:35:37Z] <wmbot~raymondndibe@wmf3402> START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T22:35:43Z] <wmbot~raymondndibe@wmf3402> Updating container image docker-registry.tools.wmflabs.org/docker-registry.tools.wmflabs.org/metrics-server:v0.7.1 (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T22:35:47Z] <wmbot~raymondndibe@wmf3402> END (ERROR) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=97) (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T22:36:09Z] <wmbot~raymondndibe@wmf3402> START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T22:36:11Z] <wmbot~raymondndibe@wmf3402> Updating container image docker-registry.tools.wmflabs.org/metrics-server:v0.7.1 (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T22:36:20Z] <wmbot~raymondndibe@wmf3402> END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T22:37:30Z] <wmbot~raymondndibe@wmf3402> START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T22:37:32Z] <wmbot~raymondndibe@wmf3402> Updating container image docker-registry.tools.wmflabs.org/metrics-server:v0.7.1 (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T22:38:08Z] <wmbot~raymondndibe@wmf3402> END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T23:11:49Z] <wmbot~raymondndibe@wmf3402> START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T23:11:53Z] <wmbot~raymondndibe@wmf3402> Updating container image docker-registry.tools.wmflabs.org/kube-state-metrics:v2.10.1 (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T23:12:25Z] <wmbot~raymondndibe@wmf3402> END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T23:17:02Z] <wmbot~raymondndibe@wmf3402> START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T23:17:06Z] <wmbot~raymondndibe@wmf3402> Updating container image docker-registry.tools.wmflabs.org/metrics-server:v0.7.10 (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T23:17:08Z] <wmbot~raymondndibe@wmf3402> END (ERROR) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=97) (T359641)
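Judging only from the SAL lines, the first copy_to_registry attempt at 22:35 was updating an image path with the registry host repeated (docker-registry.tools.wmflabs.org/docker-registry.tools.wmflabs.org/metrics-server:v0.7.1) and ended in ERROR; the cookbook's input format and exit codes are not shown here. A pre-flight check for that kind of typo could look like this hypothetical Python sketch. `split_image_ref` and `has_doubled_registry` are illustrative names, not part of the cookbook, and the parsing follows the common Docker convention that the first path component is a registry host only if it contains '.' or ':' or is 'localhost'.

```python
def split_image_ref(ref: str) -> tuple[str, str, str]:
    """Split an image reference into (registry, repository, tag).

    Digests (@sha256:...) are not handled in this sketch.
    """
    first, sep, rest = ref.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    else:
        registry, remainder = "", ref
    # The tag is whatever follows the last ':' in the last path component.
    repo, colon, tag = remainder.rpartition(":")
    if not colon or "/" in tag:
        repo, tag = remainder, ""
    return registry, repo, tag


def has_doubled_registry(ref: str) -> bool:
    """True if the repository path starts with the registry host again."""
    registry, repo, _ = split_image_ref(ref)
    return bool(registry) and repo.split("/", 1)[0] == registry
```

Running such a check on the image argument before invoking the cookbook would flag the doubled host; it cannot tell whether a given tag actually exists upstream.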

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-20T00:25:14Z] <wmbot~raymondndibe@wmf3402> START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-20T00:30:52Z] <wmbot~raymondndibe@wmf3402> END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-20T00:32:30Z] <wmbot~raymondndibe@wmf3402> START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-20T00:39:29Z] <wmbot~raymondndibe@wmf3402> END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-20T16:54:38Z] <wmbot~raymondndibe@wmf3402> START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-20T16:54:42Z] <wmbot~raymondndibe@wmf3402> Updating container image docker-registry.tools.wmflabs.org/calico/cni:v3.28.2 (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-20T16:56:37Z] <raymond-ndibe@cloudcumin1001> START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-20T16:56:39Z] <raymond-ndibe@cloudcumin1001> Updating container image docker-registry.tools.wmflabs.org/calico/cni:v3.28.2 (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-20T16:57:03Z] <raymond-ndibe@cloudcumin1001> END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-20T16:59:34Z] <raymond-ndibe@cloudcumin1001> START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-20T16:59:36Z] <raymond-ndibe@cloudcumin1001> Updating container image docker-registry.tools.wmflabs.org/calico/ctl:v3.28.2 (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-20T16:59:46Z] <raymond-ndibe@cloudcumin1001> END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-20T17:02:18Z] <raymond-ndibe@cloudcumin1001> START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-20T17:02:20Z] <raymond-ndibe@cloudcumin1001> Updating container image docker-registry.tools.wmflabs.org/calico/kube-controllers:v3.28.2 (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-20T17:02:32Z] <raymond-ndibe@cloudcumin1001> END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-20T17:03:08Z] <raymond-ndibe@cloudcumin1001> START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-20T17:03:10Z] <raymond-ndibe@cloudcumin1001> Updating container image docker-registry.tools.wmflabs.org/calico/node:v3.28.2 (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-20T17:04:02Z] <raymond-ndibe@cloudcumin1001> END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-20T17:04:38Z] <raymond-ndibe@cloudcumin1001> START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-20T17:04:41Z] <raymond-ndibe@cloudcumin1001> Updating container image docker-registry.tools.wmflabs.org/calico/typha:v3.28.2 (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-20T17:04:51Z] <raymond-ndibe@cloudcumin1001> END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-20T17:05:57Z] <raymond-ndibe@cloudcumin1001> START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-20T17:05:59Z] <raymond-ndibe@cloudcumin1001> Updating container image docker-registry.tools.wmflabs.org/calico/pod2daemon-flexvol:v3.28.2 (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-20T17:06:02Z] <raymond-ndibe@cloudcumin1001> END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) (T359641)

project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/524

calico: bump to 0.0.10-20240920171811-afd7f481

project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/525

calico: bump to 0.0.11-20240920172624-a95e9e60

project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/526

calico: bump to 0.0.12-20240920190545-a342cb41

project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/527

calico: bump to 0.0.14-20240920191533-f94f2f8d
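The bump messages from the toolforge-deploy bot appear to use versions of the form `<major>.<minor>.<patch>-<YYYYMMDDHHMMSS>-<short commit>` (e.g. `0.0.10-20240920171811-afd7f481`). That layout is inferred from the strings in this task, not from any documentation. Under that assumption, a hypothetical helper to split such versions and order them numerically might look like this:

```python
import re
from datetime import datetime
from typing import NamedTuple

# Assumed layout, inferred from messages like "calico: bump to 0.0.10-20240920171811-afd7f481".
BUMP_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d{14})-([0-9a-f]+)$")


class ChartVersion(NamedTuple):
    semver: tuple[int, int, int]
    built: datetime
    commit: str


def parse_bump_version(version: str) -> ChartVersion:
    """Parse '<semver>-<timestamp>-<commit>' into its parts (hypothetical helper)."""
    match = BUMP_RE.match(version)
    if match is None:
        raise ValueError(f"unexpected version format: {version!r}")
    major, minor, patch, stamp, commit = match.groups()
    return ChartVersion(
        (int(major), int(minor), int(patch)),
        datetime.strptime(stamp, "%Y%m%d%H%M%S"),
        commit,
    )
```

Because `ChartVersion` compares field by field starting with the integer `semver` tuple, `sorted(versions, key=parse_bump_version)` places 0.0.9 before 0.0.10, which a plain string sort would not.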

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-24T21:35:51Z] <wmbot~raymondndibe@wmf3402> START - Cookbook wmcs.toolforge.component.deploy for component kyverno (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-24T21:41:01Z] <wmbot~raymondndibe@wmf3402> END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component kyverno (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-24T21:41:48Z] <wmbot~raymondndibe@wmf3402> START - Cookbook wmcs.toolforge.component.deploy for component kyverno (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-24T21:48:25Z] <wmbot~raymondndibe@wmf3402> END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component kyverno (T359641)

project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/533

volume-admission: bump to 0.0.56-20240926151825-d311e795

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-26T15:24:56Z] <dcaro@cloudcumin1001> START - Cookbook wmcs.toolforge.component.deploy for component volume-admission (T359641)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-26T15:29:56Z] <dcaro@cloudcumin1001> END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission (T359641)

project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/534

registry-admission: bump to 0.0.51-20240926154338-be8dc0fd

project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/535

ingress-admission: bump to 0.0.51-20240926154329-19cfe59e
