4.7 -> 4.8 CI has been red for a long time now [1]. One leading contributor is "[sig-network] pods should successfully create sandboxes by other", with "[sig-network] pods should successfully create sandboxes by getting pod" also contributing. There are also API-server alert issues, but those are probably orthogonal. I had been expecting bug 1908378 to be the underlying issue, but it has been VERIFIED for a while now, the update issues persist, and Elana has clearly scoped it to static pods [2]. Picking on a recent job [3]:

[sig-network] pods should successfully create sandboxes by other
0s
4 failures to create the sandbox

ns/openshift-multus pod/network-metrics-daemon-qdd59 node/ip-10-0-176-46.us-west-2.compute.internal - never deleted - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_network-metrics-daemon-qdd59_openshift-multus_3cd19768-6f58-45bb-860c-7ca5f89e72bd_0(50d13a74e1d486564f00b2fc458a18e5e502846a1b3528e421e94280a2ad2238): [openshift-multus/network-metrics-daemon-qdd59:openshift-sdn]: error adding container to network "openshift-sdn": failed to find plugin "openshift-sdn" in path [/opt/multus/bin /var/lib/cni/bin /usr/libexec/cni]

ns/openshift-network-diagnostics pod/network-check-target-lc8sm node/ip-10-0-156-130.us-west-2.compute.internal - never deleted - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_network-check-target-lc8sm_openshift-network-diagnostics_e78b6504-e78e-4588-8d46-36fe1aa65ded_0(e77da0277e4fbd1a3d46eaf8ec4895c64aaad45a2e23400f933fb4eae28b0396): [openshift-network-diagnostics/network-check-target-lc8sm:openshift-sdn]: error adding container to network "openshift-sdn": CNI request failed with status 400: 'Get "https://api-int.ci-op-zbk23sg5-7dd68.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1/namespaces/openshift-network-diagnostics/pods/network-check-target-lc8sm": dial tcp 10.0.200.16:6443: connect: connection refused '

ns/openshift-network-diagnostics pod/network-check-target-lc8sm node/ip-10-0-156-130.us-west-2.compute.internal - never deleted - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_network-check-target-lc8sm_openshift-network-diagnostics_e78b6504-e78e-4588-8d46-36fe1aa65ded_0(d880c73f032d9cd54421dddd271a359501dc5d78183ff0fe42ca903a6960ecb0): Multus: [openshift-network-diagnostics/network-check-target-lc8sm]: error getting pod: Get "https://[api-int.ci-op-zbk23sg5-7dd68.origin-ci-int-aws.dev.rhcloud.com]:6443/api/v1/namespaces/openshift-network-diagnostics/pods/network-check-target-lc8sm?timeout=1m0s": dial tcp 10.0.200.16:6443: connect: connection refused

ns/e2e-k8s-sig-apps-daemonset-upgrade-480 pod/ds1-52nj5 node/ip-10-0-146-52.us-west-2.compute.internal - never deleted - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_ds1-52nj5_e2e-k8s-sig-apps-daemonset-upgrade-480_16b8af98-5804-4d28-9d51-83a3e85b1885_0(d8ed34679aba0696456ee3f58888d28a5c8dfc143d1323f8e147d87fc7c6d9e7): Multus: [e2e-k8s-sig-apps-daemonset-upgrade-480/ds1-52nj5]: error getting pod: Get "https://[api-int.ci-op-zbk23sg5-7dd68.origin-ci-int-aws.dev.rhcloud.com]:6443/api/v1/namespaces/e2e-k8s-sig-apps-daemonset-upgrade-480/pods/ds1-52nj5?timeout=1m0s": dial tcp 10.0.147.9:6443: connect: connection refused

and:

[sig-network] pods should successfully create sandboxes by getting pod
0s
1 failures to create the sandbox

ns/openshift-network-diagnostics pod/network-check-target-vnj96 node/ip-10-0-204-38.us-west-2.compute.internal - never deleted - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox
k8s_network-check-target-vnj96_openshift-network-diagnostics_bce7e215-af5e-4a2a-9106-584a15015f10_0(80916235c75a63456346be7c2a297a7ab7ad86157d404f269cda41b3fde360fd): Multus: [openshift-network-diagnostics/network-check-target-vnj96]: error getting pod: pods "network-check-target-vnj96" is forbidden: User "system:serviceaccount:openshift-multus:multus" cannot get resource "pods" in API group "" in the namespace "openshift-network-diagnostics": RBAC: [clusterrole.rbac.authorization.k8s.io "multus" not found, clusterrole.rbac.authorization.k8s.io "system:basic-user" not found, clusterrole.rbac.authorization.k8s.io "system:build-strategy-jenkinspipeline" not found, clusterrole.rbac.authorization.k8s.io "system:oauth-token-deleter" not found, clusterrole.rbac.authorization.k8s.io "system:build-strategy-docker" not found, clusterrole.rbac.authorization.k8s.io "system:service-account-issuer-discovery" not found, clusterrole.rbac.authorization.k8s.io "self-access-reviewer" not found, clusterrole.rbac.authorization.k8s.io "system:scope-impersonation" not found, clusterrole.rbac.authorization.k8s.io "system:openshift:public-info-viewer" not found, clusterrole.rbac.authorization.k8s.io "helm-chartrepos-viewer" not found, clusterrole.rbac.authorization.k8s.io "whereabouts-cni" not found, clusterrole.rbac.authorization.k8s.io "system:openshift:discovery" not found, clusterrole.rbac.authorization.k8s.io "basic-user" not found, clusterrole.rbac.authorization.k8s.io "cluster-status" not found, clusterrole.rbac.authorization.k8s.io "system:webhook" not found, clusterrole.rbac.authorization.k8s.io "system:discovery" not found, clusterrole.rbac.authorization.k8s.io "system:public-info-viewer" not found, clusterrole.rbac.authorization.k8s.io "multus-admission-controller-webhook" not found, clusterrole.rbac.authorization.k8s.io "system:build-strategy-source" not found, clusterrole.rbac.authorization.k8s.io "console-extensions-reader" not found] Setting high severity, because green 4.7 
-> 4.8 updates are important, and something that we want very solid by the time 4.8 GAs.

[1]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.8-informing#periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1908378#c30
[3]: https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade/1380591032978116608
*** This bug has been marked as a duplicate of bug 1927264 ***
Reopening. Bug 1927264 is now VERIFIED, with the referenced PR landing in master 11 days ago [1]. But "pods should successfully create sandboxes by other" is still wildly popular in CI, so whatever bug 1927264 fixed, there's certainly still more to go:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=24h&type=junit&search=pods+should+successfully+create+sandboxes+by+other' | grep '4\.[89].*failures match' | sort
periodic-ci-openshift-release-master-ci-4.8-e2e-aws-upgrade (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
periodic-ci-openshift-release-master-ci-4.8-e2e-aws-upgrade-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-e2e-azure-compact (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-e2e-azure-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-e2e-azure-upgrade-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade (all) - 8 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade-rollback (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade (all) - 2 runs, 50% failed, 100% of failures match = 50% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-azure-ovn-upgrade (all) - 4 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-azure-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-ovn-upgrade (all) - 4 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade (all) - 2 runs, 50% failed, 100% of failures match = 50% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-ovirt-upgrade (all) - 3 runs, 67% failed, 100% of failures match = 67% impact
periodic-ci-openshift-release-master-ci-4.9-e2e-azure-upgrade-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.9-e2e-gcp-upgrade (all) - 13 runs, 69% failed, 44% of failures match = 31% impact
periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-aws-compact-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-aws-ovn-upgrade (all) - 14 runs, 100% failed, 57% of failures match = 57% impact
periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-aws-upgrade (all) - 13 runs, 92% failed, 75% of failures match = 69% impact
periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-azure-ovn-upgrade (all) - 4 runs, 50% failed, 100% of failures match = 50% impact
periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-gcp-ovn-upgrade (all) - 4 runs, 75% failed, 67% of failures match = 50% impact
periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-ovirt-upgrade (all) - 3 runs, 67% failed, 100% of failures match = 67% impact
periodic-ci-openshift-release-master-nightly-4.9-e2e-aws-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-nightly-4.9-e2e-gcp-csi (all) - 2 runs, 50% failed, 100% of failures match = 50% impact
periodic-ci-openshift-release-master-nightly-4.9-e2e-metal-ipi-upgrade (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-nightly-4.9-upgrade-from-stable-4.8-e2e-metal-ipi-upgrade (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-okd-4.8-e2e-vsphere (all) - 6 runs, 100% failed, 50% of failures match = 50% impact
pull-ci-openshift-machine-api-operator-release-4.8-e2e-aws-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
pull-ci-openshift-machine-api-operator-release-4.8-e2e-metal-ipi-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
pull-ci-openshift-origin-release-4.8-e2e-aws-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
pull-ci-openshift-ovn-kubernetes-master-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade (all) - 7 runs, 100% failed, 29% of failures match = 29% impact
rehearse-15939-periodic-ci-openshift-release-master-ci-4.9-e2e-azure-upgrade-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
rehearse-15939-periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-ovirt-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
rehearse-15939-periodic-ci-openshift-release-master-nightly-4.9-upgrade-from-stable-4.7-e2e-aws-upgrade-paused (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
rehearse-15939-periodic-ci-openshift-release-master-stable-4.8-upgrade-from-stable-4.6-e2e-aws-upgrade-paused (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
rehearse-17730-pull-ci-openshift-installer-release-4.9-e2e-aws-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
rehearse-17730-pull-ci-openshift-installer-release-4.9-e2e-gcp-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
rehearse-18877-periodic-ci-openshift-release-master-okd-4.8-upgrade-from-4.7-e2e-upgrade-gcp (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
rehearse-19228-periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-metal-ipi-upgrade (all) - 3 runs, 67% failed, 50% of failures match = 33% impact
rehearse-19239-periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-single-node (all) - 9 runs, 44% failed, 25% of failures match = 11% impact
rehearse-19285-periodic-ci-openshift-release-master-ci-4.9-e2e-aws-upgrade-single-node (all) - 3 runs, 33% failed, 100% of failures match = 33% impact
rehearse-19285-periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-aws-ovn-upgrade (all) - 3 runs, 33% failed, 100% of failures match = 33% impact
rehearse-19285-pull-ci-openshift-installer-release-4.9-e2e-aws-upgrade (all) - 4 runs, 25% failed, 100% of failures match = 25% impact
release-openshift-ocp-installer-e2e-azure-ovn-4.9 (all) - 3 runs, 67% failed, 50% of failures match = 33% impact
release-openshift-ocp-installer-upgrade-remote-libvirt-ppc64le-4.7-to-4.8 (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
release-openshift-ocp-installer-upgrade-remote-libvirt-s390x-4.7-to-4.8 (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
release-openshift-origin-installer-old-rhcos-e2e-aws-4.9 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact

[1]: https://github.com/openshift/multus-cni/pull/101#event-4844582487
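For reference, the "impact" figure in these search.ci.openshift.org summary lines is just the failure rate multiplied by the match rate. A small Python sketch (a hypothetical helper, not part of any CI tooling) recomputes it from a summary line:

```python
import re

def impact(line):
    """Recompute the impact percentage from a search.ci.openshift.org
    summary line of the form
    '<job> (all) - N runs, F% failed, M% of failures match = I% impact',
    where I = F * M / 100."""
    m = re.search(r"(\d+) runs, (\d+)% failed, (\d+)% of failures match", line)
    runs, failed, match = map(int, m.groups())
    return round(failed * match / 100)

line = ("periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-"
        "e2e-aws-ovn-upgrade (all) - 14 runs, 100% failed, "
        "57% of failures match = 57% impact")
print(impact(line))  # 57
```

The same arithmetic checks out against the other rows above, e.g. 92% failed times 75% matching gives the reported 69% impact.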
A reasonable number of those from recent releases seem to look like [1]:

ns/openshift-multus pod/network-metrics-daemon-zx2pz node/ci-op-bwcbtfmb-25656-9n58p-master-1 - never deleted - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_network-metrics-daemon-zx2pz_openshift-multus_968336b3-1fef-4098-8e2d-f37b3cbee8f7_0(6ea40a13af26babba135f17a209ba100ffcb534ff174da10eb569f8a045c36ac): Multus: [openshift-multus/network-metrics-daemon-zx2pz]: error getting pod: Unauthorized

ns/openshift-multus pod/network-metrics-daemon-rjhfv node/ci-op-bwcbtfmb-25656-9n58p-worker-eastus3-8ssnx - never deleted - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_network-metrics-daemon-rjhfv_openshift-multus_7342244f-1581-4bd3-b6f1-25d013cc4e34_0(fb86f0f60c86921d1dda1dc977336fbc6a93eec6c03da3e3ee59c6c4a2a991a5): Multus: [openshift-multus/network-metrics-daemon-rjhfv]: error getting pod: Unauthorized

Searching:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=24h&type=junit&search=network-metrics-daemon.*never+deleted.*reason/FailedCreatePodSandBox.*failed+to+create+pod+network+sandbox.*error+getting+pod:+Unauthorized' | grep 'failures match' | grep -v 'pull-ci-\|rehearse-' | sort
periodic-ci-openshift-release-master-ci-4.8-e2e-aws-upgrade (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
periodic-ci-openshift-release-master-ci-4.8-e2e-aws-upgrade-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-e2e-azure-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-e2e-azure-upgrade-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.9-e2e-azure-upgrade-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.9-e2e-gcp-upgrade (all) - 13 runs, 69% failed, 22% of failures match = 15% impact
periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-aws-ovn-upgrade (all) - 14 runs, 100% failed, 43% of failures match = 43% impact
periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-aws-upgrade (all) - 13 runs, 92% failed, 58% of failures match = 54% impact
periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-azure-ovn-upgrade (all) - 4 runs, 50% failed, 100% of failures match = 50% impact
periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-ovirt-upgrade (all) - 3 runs, 67% failed, 100% of failures match = 67% impact
periodic-ci-openshift-release-master-nightly-4.9-e2e-aws-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-nightly-4.9-e2e-metal-ipi-upgrade (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
release-openshift-okd-installer-e2e-aws-upgrade (all) - 7 runs, 100% failed, 29% of failures match = 29% impact
release-openshift-origin-installer-old-rhcos-e2e-aws-4.9 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact

But I guess that's not 4.7 -> 4.8, so I'll spin it off into a new bug.

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-e2e-azure-upgrade/1404639671656386560
Perhaps the same as https://bugzilla.redhat.com/show_bug.cgi?id=1972167 ?
Bug 1972167 seems to manifest as "error getting pod: Unauthorized", and there are a few other bugs in that space, including bug 1972490. But checking on the 4.7 -> 4.8 updates:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=24h&type=junit&search=pods+should+successfully+create+sandboxes+by+other' | grep '4.8-upgrade-from.*4.7.*failures match' | sort
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade (all) - 8 runs, 100% failed, 88% of failures match = 88% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade (all) - 3 runs, 67% failed, 50% of failures match = 33% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-azure-ovn-upgrade (all) - 4 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-ovn-upgrade (all) - 5 runs, 100% failed, 80% of failures match = 80% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade (all) - 3 runs, 67% failed, 100% of failures match = 67% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-ovirt-upgrade (all) - 3 runs, 100% failed, 67% of failures match = 67% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-vsphere-upgrade (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade (all) - 4 runs, 75% failed, 33% of failures match = 25% impact
pull-ci-openshift-ovn-kubernetes-master-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact

Finding a job:

$ curl -s 'https://search.ci.openshift.org/search?maxAge=24h&type=junit&search=pods+should+successfully+create+sandboxes+by+other&name=periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade' | jq -r 'keys[]'
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade/1405188888481239040

That job has:

: [sig-network] pods should successfully create sandboxes by other
0s
1 failures to create the sandbox

ns/openshift-network-diagnostics pod/network-check-target-87lq2 node/ip-10-0-222-37.ec2.internal - never deleted - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = error creating pod sandbox with name "k8s_network-check-target-87lq2_openshift-network-diagnostics_dac0da1d-6e7a-40c9-bd1c-92d2e4076d02_0": error locating item named "manifest-sha256:fa0f2cad0e8d907a10bf91b2fe234659495a694235a9e2ef7015eb450ce9f1ba" for image with ID "c8420102ec4009c486f7a4085fb574c2cc68b6047e871c1206b29b775d6c0a34": file does not exist

Checking to see how common that symptom is:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=24h&type=junit&search=FailedCreatePodSandBox.*file+does+not+exist' | grep 'failures match' | sort
periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-aws-ovn-upgrade (all) - 14 runs, 93% failed, 8% of failures match = 7% impact
periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade (all) - 5 runs, 80% failed, 25% of failures match = 20% impact
periodic-ci-openshift-release-master-nightly-4.9-e2e-aws-upgrade (all) - 8 runs, 75% failed, 17% of failures match = 13% impact

So not everything.
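These search.ci.openshift.org queries are plain GET requests, so rather than hand-escaping the regex it can be handy to build the URL programmatically. A minimal Python sketch, using the same parameter names as the curl/w3m invocations above:

```python
from urllib.parse import urlencode

# Query parameters mirroring the invocations above; the regex is
# URL-encoded for us (spaces become '+', etc.).
params = {
    "maxAge": "24h",
    "type": "junit",
    "search": "FailedCreatePodSandBox.*file does not exist",
}
url = "https://search.ci.openshift.org/search?" + urlencode(params)
print(url)
```

Fetching that URL with curl and piping through jq then behaves exactly as in the commands above.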
Let's move over to GCP:

$ curl -s 'https://search.ci.openshift.org/search?maxAge=24h&type=junit&search=pods+should+successfully+create+sandboxes+by+other&name=periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade' | jq -r 'keys[]'
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade/1404922286883999744
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade/1405215035801735168

The first of those has five like:

ns/openshift-dns pod/dns-default-z4hp4 node/ci-op-xf1lrxzf-3b3f8-n9rlk-worker-b-6t5sq - never deleted - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_dns-default-z4hp4_openshift-dns_09f88700-2071-4378-90a1-7c76ef21c3a7_0(19158e53eeb638a87c0ffdf885c4693970d820d669f9e3f3788254340d4d03a4): Multus: [openshift-dns/dns-default-z4hp4]: have you checked that your default network is ready? still waiting for readinessindicatorfile @ /var/run/multus/cni/net.d/80-openshift-network.conf. pollimmediate error: timed out waiting for the condition

The second has a single:

ns/openshift-kube-apiserver pod/revision-pruner-9-ci-op-zgwmkdxz-3b3f8-qzwml-master-1 node/ci-op-zgwmkdxz-3b3f8-qzwml-master-1 - never deleted - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: time="2021-06-16T19:04:58Z" level=error msg="container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: process_linux.go:458: setting cgroup config for procHooks process caused: Unit crio-861e1151e82ecb87a527315b9c027c2130772a4405ed83824a2bb88ced40d277.scope not found."

So I'm not sure there's a consistent pattern. But we want 4.7 -> 4.8 to be reliably green.
Maybe just keep this open as an umbrella tracker, and circle back once we've fixed the other bugs around this test-case to see what's left?
Possibly helpful query for ranking the last bits of these error messages:

$ curl -s 'https://search.ci.openshift.org/search?maxAge=24h&type=junit&context=0&search=reason/FailedCreatePodSandBox&name=4.8-upgrade-from-.*4.7' | jq -r 'to_entries[].value | to_entries[].value[].context[]' | sed 's/.*://' | sort | uniq -c | sort -n
1 1 '[openshift-multus/network-metrics-daemon-bs9q8 2d5fc65c6b3329c9b87ed040562ad539d8e155c18d9938fe70998fba25f7fc07] [openshift-multus/network-metrics-daemon-bs9q8 2d5fc65c6b3329c9b87ed040562ad539d8e155c18d9938fe70998fba25f7fc07] timed out waiting for annotations
1 EOF
1 Unit crio-861e1151e82ecb87a527315b9c027c2130772a4405ed83824a2bb88ced40d277.scope not found."
1 client connection lost
1 connection refused
1 failed to find plugin "openshift-sdn" in path [/opt/multus/bin /var/lib/cni/bin /usr/libexec/cni]
1 file does not exist
1 pod "installer-7-ci-op-y4f9imhp-8929c-hh85g-master-0" not found
1 request timed out
1 image-puller" not found
1 scope-impersonation" not found, clusterrole.rbac.authorization.k8s.io "whereabouts-cni" not found, clusterrole.rbac.authorization.k8s.io "self-access-reviewer" not found]
3 timed out waiting for annotations
10 timed out waiting for the condition

And ranking over all of CI, without limiting to 4.7 -> 4.8:

$ curl -s 'https://search.ci.openshift.org/search?maxAge=24h&type=junit&context=0&search=reason/FailedCreatePodSandBox' | jq -r 'to_entries[].value | to_entries[].value[].context[]' | sed 's/.*://;s/\(pods\?\) "[^"]*" not found/\1 "..." not found/' | sort | uniq -c | sort -n | tail
8 connection refused
13 timed out waiting for annotations
15 15 'pods "..." not found
16 timed out waiting for OVS flows
21 timed out waiting for the condition
51 EOF
81 i/o timeout
126 pods "..." not found
154 Unauthorized
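The shell pipeline above keeps only the text after the last colon of each message (sed 's/.*://') and then counts duplicates. The same ranking can be sketched in Python; the sample messages here are invented for illustration, while the real ones come from the search.ci.openshift.org JSON that the jq expression extracts:

```python
from collections import Counter

# Invented sample of FailedCreatePodSandBox messages for illustration.
messages = [
    "Failed to create pod sandbox: rpc error: error getting pod: Unauthorized",
    "Failed to create pod sandbox: rpc error: error getting pod: Unauthorized",
    "dial tcp 10.0.200.16:6443: connect: connection refused",
]

# Equivalent of sed 's/.*://': keep only the text after the LAST colon
# (the greedy .* in sed swallows everything up to the final colon).
tails = [m.rsplit(":", 1)[-1].strip() for m in messages]

# Equivalent of sort | uniq -c | sort -n: count identical tails, rarest first.
for tail, count in sorted(Counter(tails).items(), key=lambda kv: kv[1]):
    print(count, tail)
# prints:
# 1 connection refused
# 2 Unauthorized
```

This makes it easy to see which failure modes dominate once the per-pod noise (names, hashes) before the last colon is stripped away.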
We currently believe this is a dupe of bug 1972167; it should be verified as such, or reopened if determined otherwise.

*** This bug has been marked as a duplicate of bug 1972167 ***
4.7.19 -> 4.8.0-rc.2 failed on [1]:

: [sig-network] pods should successfully create sandboxes by getting pod
0s
2 failures to create the sandbox

ns/openshift-controller-manager pod/controller-manager-twwfk node/ip-10-0-161-230.ec2.internal - never deleted - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_controller-manager-twwfk_openshift-controller-manager_877f87bd-92a5-4599-b880-c15779979c7a_0(cac21b592d2559456968829fc58964a8e62994926b123126a9eab8bc8fb566b9): error adding pod openshift-controller-manager_controller-manager-twwfk to CNI network "multus-cni-network": Multus: [openshift-controller-manager/controller-manager-twwfk]: error getting pod: pods "controller-manager-twwfk" is forbidden: User "system:serviceaccount:openshift-multus:multus" cannot get resource "pods" in API group "" in the namespace "openshift-controller-manager": RBAC: [clusterrole.rbac.authorization.k8s.io "system:build-strategy-source" not found, clusterrole.rbac.authorization.k8s.io "system:scope-impersonation" not found, clusterrole.rbac.authorization.k8s.io "system:build-strategy-docker" not found, clusterrole.rbac.authorization.k8s.io "system:openshift:discovery" not found, clusterrole.rbac.authorization.k8s.io "system:build-strategy-jenkinspipeline" not found, clusterrole.rbac.authorization.k8s.io "console-extensions-reader" not found, clusterrole.rbac.authorization.k8s.io "whereabouts-cni" not found, clusterrole.rbac.authorization.k8s.io "system:openshift:public-info-viewer" not found, clusterrole.rbac.authorization.k8s.io "system:public-info-viewer" not found, clusterrole.rbac.authorization.k8s.io "multus" not found, clusterrole.rbac.authorization.k8s.io "system:oauth-token-deleter" not found, clusterrole.rbac.authorization.k8s.io "system:basic-user" not found, clusterrole.rbac.authorization.k8s.io "cluster-status" not found, clusterrole.rbac.authorization.k8s.io
"multus-admission-controller-webhook" not found, clusterrole.rbac.authorization.k8s.io "system:discovery" not found, clusterrole.rbac.authorization.k8s.io "self-access-reviewer" not found, clusterrole.rbac.authorization.k8s.io "system:webhook" not found, clusterrole.rbac.authorization.k8s.io "helm-chartrepos-viewer" not found, clusterrole.rbac.authorization.k8s.io "basic-user" not found, clusterrole.rbac.authorization.k8s.io "system:service-account-issuer-discovery" not found] ns/openshift-network-diagnostics pod/network-check-target-27zht node/ip-10-0-161-230.ec2.internal - never deleted - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_network-check-target-27zht_openshift-network-diagnostics_ba96ba9c-a453-41ee-a262-162eb5284cab_0(b793f8042104005964acc38a8805a10842eb5fbc90d43efd041dc39a2fef82f3): error adding pod openshift-network-diagnostics_network-check-target-27zht to CNI network "multus-cni-network": Multus: [openshift-network-diagnostics/network-check-target-27zht]: error getting pod: pods "network-check-target-27zht" is forbidden: User "system:serviceaccount:openshift-multus:multus" cannot get resource "pods" in API group "" in the namespace "openshift-network-diagnostics": RBAC: [clusterrole.rbac.authorization.k8s.io "system:service-account-issuer-discovery" not found, clusterrole.rbac.authorization.k8s.io "system:scope-impersonation" not found, clusterrole.rbac.authorization.k8s.io "system:build-strategy-source" not found, clusterrole.rbac.authorization.k8s.io "system:build-strategy-docker" not found, clusterrole.rbac.authorization.k8s.io "system:openshift:discovery" not found, clusterrole.rbac.authorization.k8s.io "system:build-strategy-jenkinspipeline" not found, clusterrole.rbac.authorization.k8s.io "console-extensions-reader" not found, clusterrole.rbac.authorization.k8s.io "whereabouts-cni" not found, clusterrole.rbac.authorization.k8s.io 
"system:openshift:public-info-viewer" not found, clusterrole.rbac.authorization.k8s.io "system:public-info-viewer" not found, clusterrole.rbac.authorization.k8s.io "multus" not found, clusterrole.rbac.authorization.k8s.io "system:oauth-token-deleter" not found, clusterrole.rbac.authorization.k8s.io "system:basic-user" not found, clusterrole.rbac.authorization.k8s.io "cluster-status" not found, clusterrole.rbac.authorization.k8s.io "multus-admission-controller-webhook" not found, clusterrole.rbac.authorization.k8s.io "system:discovery" not found, clusterrole.rbac.authorization.k8s.io "self-access-reviewer" not found, clusterrole.rbac.authorization.k8s.io "system:webhook" not found, clusterrole.rbac.authorization.k8s.io "helm-chartrepos-viewer" not found, clusterrole.rbac.authorization.k8s.io "basic-user" not found] Which is very similar to my c0 here, despite bug 1972167 being VERIFIED in 4.8 for a while. I'm moving this over to claim it as a dup of bug 1972490. [1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/1410786292060393472 *** This bug has been marked as a duplicate of bug 1972490 ***