# kind [ipv6?] CI jobs failing sometimes on network not ready #131948
#131883 was pretty recent on the kubernetes/kubernetes side of things, but it doesn't appear to have flaked on this: https://prow.k8s.io/pr-history/?org=kubernetes&repo=kubernetes&pr=131883. Other commits don't stand out. Not sure about the infra yet, but that sounds more likely at the moment.
Nothing obvious in https://github.com/kubernetes/test-infra/commits/master/ either?
Or in https://github.com/kubernetes/k8s.io/commits/main/. Maybe the cluster itself: I don't think we've had other changes to that cluster lately, but it could have auto-upgraded.
Upgrade logs might align:

```json
{
  "insertId": "1m0nhfte82p86",
  "jsonPayload": {
    "operation": "operation-1747999572322-1bde5da1-4f3f-4340-9c30-ad42e3f6cdf2",
    "@type": "type.googleapis.com/google.container.v1beta1.UpgradeEvent",
    "resource": "projects/k8s-infra-prow-build/locations/us-central1/clusters/prow-build/nodePools/pool5-20210928124956061000000001",
    "currentVersion": "1.32.2-gke.1297002",
    "operationStartTime": "2025-05-23T11:26:12.322248554Z",
    "resourceType": "NODE_POOL",
    "targetVersion": "1.32.3-gke.1785003"
  },
  "resource": {
    "type": "gke_nodepool",
    "labels": {
      "location": "us-central1",
      "nodepool_name": "pool5-20210928124956061000000001",
      "cluster_name": "prow-build",
      "project_id": "k8s-infra-prow-build"
    }
  },
  "timestamp": "2025-05-23T11:26:25.010880914Z",
  "severity": "NOTICE",
  "logName": "projects/k8s-infra-prow-build/logs/container.googleapis.com%2Fnotifications",
  "receiveTimestamp": "2025-05-23T11:26:25.026034909Z"
}
```
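To cross-reference the upgrade window against the failure spike, a minimal sketch (field names taken straight from the log entry above; nothing beyond that entry is assumed) that pulls out the node pool, versions, and operation start time:

```python
import json
from datetime import datetime, timezone

# A trimmed copy of the Cloud Logging entry above (only the fields used here).
entry = json.loads("""
{
  "jsonPayload": {
    "currentVersion": "1.32.2-gke.1297002",
    "targetVersion": "1.32.3-gke.1785003",
    "operationStartTime": "2025-05-23T11:26:12.322248554Z"
  },
  "resource": {"labels": {"nodepool_name": "pool5-20210928124956061000000001"}}
}
""")

payload = entry["jsonPayload"]
# Drop the nanosecond fraction; strptime only handles whole seconds portably.
start = datetime.strptime(
    payload["operationStartTime"].split(".")[0], "%Y-%m-%dT%H:%M:%S"
).replace(tzinfo=timezone.utc)

print(entry["resource"]["labels"]["nodepool_name"])
print(payload["currentVersion"], "->", payload["targetVersion"], "at", start.isoformat())
```

From here you can compare `start` against the timestamps of the first failing runs in triage.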
I think this is a kernel issue with the node pool. SIG K8s Infra is looking to migrate to COS + cgroup v2 + C4 VMs (this pool is running Ubuntu + cgroup v1 + N1), but we were still testing that. Looking into a CI node pool downgrade/upgrade.
/triage accepted
Specifically, there appear to be netfilter UDP bug(s) that affected Ubuntu and COS (and others; it's an upstream kernel issue): https://bugzilla.netfilter.org/show_bug.cgi?id=1795

EDIT: There's a workaround available for the known impact to GKE clusters with intranode visibility.
This is impacting ~all kind e2e jobs. If you see it fail at
NOTE: we're heading into a 3-day weekend here in the US. I think this might be IPv6 only; digging through more of the failures. The kernel version upgraded from 6.8.0-1019 to 6.8.0-1022.
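When scanning node logs for which kernel a run landed on, a quick way to compare two Ubuntu kernel strings like the ones above (the `major.minor.patch-ABI` format is assumed from the versions quoted in this comment):

```python
def parse_ubuntu_kernel(v: str) -> tuple:
    # "6.8.0-1022" -> (6, 8, 0, 1022); the number after the dash is the ABI.
    base, _, abi = v.partition("-")
    return tuple(int(x) for x in base.split(".")) + (int(abi or 0),)

old, new = "6.8.0-1019", "6.8.0-1022"
# Tuple comparison orders versions component-by-component.
assert parse_ubuntu_kernel(new) > parse_ubuntu_kernel(old)
print(parse_ubuntu_kernel(new))  # (6, 8, 0, 1022)
```

String comparison would also happen to work for these two, but tuple comparison stays correct when a component crosses a digit boundary (e.g. `-999` vs `-1000`).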
Tentatively, the COS + C4 + cgroup v2 nodepool (pool6....) is good: https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/126563/pull-kubernetes-e2e-kind-ipv6/1926052312141271040 To migrate a job:
The operation to roll back the main (pool5...) nodepool upgrade is still pending; capacity issues are slowing it down, so the pool is only partially rolled back. This can be checked like:
The job updated above (pull-kubernetes-e2e-kind-ipv6) seems to be working reliably.
Which jobs are failing?
pull-kubernetes-e2e-kind-ipv6
possibly others…
Which tests are failing?
cluster creation, network is unready
Since when has it been failing?
Looks like this is failing a lot more in the past day: https://go.k8s.io/triage?pr=1&job=kind&test=SynchronizedBeforeSuite
Testgrid link
No response
Reason for failure (if possible)
and similar network unready errors (visible in e.g. coredns logs)
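For bulk triage of build logs, a hypothetical helper along these lines can flag the relevant lines; the patterns are illustrative guesses at the kubelet/CoreDNS messages involved, not an exhaustive or confirmed list:

```python
import re

# Illustrative patterns for "network unready" symptoms; extend as needed.
PATTERNS = [
    re.compile(r"network is not ready", re.IGNORECASE),
    re.compile(r"cni plugin not initialized", re.IGNORECASE),
]

def unready_lines(log_text: str) -> list:
    """Return the log lines matching any network-unready pattern."""
    return [
        line for line in log_text.splitlines()
        if any(p.search(line) for p in PATTERNS)
    ]

sample = (
    "I0523 kubelet.go: network is not ready: container runtime network not ready\n"
    "I0523 coredns ready\n"
)
print(unready_lines(sample))
```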
Anything else we need to know?
containerd 2.1.1 was adopted a few days ago: kubernetes-sigs/kind@31a79fd
That doesn't align with the failure spike, though.
Further back, we updated other dependencies recently-ish, but again, that doesn't align.
We haven't merged anything in kind since the 20th, but there's a failure spike in the past day or so. So I suspect either the CI infra or kubernetes/kubernetes changes.
Relevant SIG(s)
/sig testing