Bug 1908378

Summary: [sig-network] pods should successfully create sandboxes by getting pod - Static Pod Failures

| Field | Value |
|---|---|
| Product | OpenShift Container Platform |
| Component | Node |
| Sub component | Kubelet |
| Reporter | Tim Rozet <trozet> |
| Assignee | Elana Hashman <ehashman> |
| QA Contact | Weinan Liu <weinliu> |
| Status | CLOSED ERRATA |
| Severity | urgent |
| Priority | high |
| Version | 4.7 |
| Target Release | 4.8.0 |
| Keywords | Reopened |
| Flags | ehashman: needinfo- |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | Bug Fix |
| Clone Of | 1886922 |
| Cloned to | 1929674 (view as bug list) |
| Bug Depends On | 1886922 |
| Bug Blocks | 1919069 |
| Last Closed | 2021-07-27 22:35:07 UTC |
| CC | anbhat, anusaxen, aos-bugs, bbennett, deads, ecordell, ehashman, emoss, ffranz, fpaoline, gzaidman, jcallen, jchaloup, jerzhang, juzhao, kir, nagrawal, openshift-bugzilla-robot, rphillips, schoudha, tsweeney, weinliu, wking, xiyuan |

Environment (affected tests):

[sig-network] pods should successfully create sandboxes by getting pod
[sig-network] pods should successfully create sandboxes by other
[sig-network] pods should successfully create sandboxes by reading container
[sig-network] pods should successfully create sandboxes by writing network status

Doc Text:

Cause: Sometimes when pods are created and deleted rapidly, the pod sandbox does not have a chance to finish creation before the pod starts deletion.
Consequence: "CreatePodSandbox" returns an error.
Fix: We ignore this error if the pod is terminating.
Result: Pod termination no longer fails when CreatePodSandbox did not complete successfully during pod deletion.
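The Doc Text above describes the fix at a high level: a failed CreatePodSandbox is no longer treated as fatal when the pod is already terminating. The following is a minimal, hedged sketch of that decision in Go. It is not the actual kubelet change (see the linked PRs for that); the function and variable names are hypothetical.

```go
// Hedged sketch only: illustrates the behaviour described in the Doc Text,
// not the real kubelet code path. Names below are hypothetical.
package main

import (
	"errors"
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// errCreatePodSandbox stands in for the sandbox-creation failure surfaced by
// the container runtime (e.g. a CNI "pods ... not found" error).
var errCreatePodSandbox = errors.New("CreatePodSandbox failed")

// handleSandboxError decides whether a sandbox-creation failure should fail
// the pod sync. If the pod has a deletion timestamp (it is terminating), the
// error is swallowed so termination can proceed.
func handleSandboxError(pod *v1.Pod, err error) error {
	if err == nil {
		return nil
	}
	if pod.DeletionTimestamp != nil {
		// The pod is being deleted; a half-created sandbox will be torn
		// down by normal termination anyway, so do not surface the error.
		fmt.Printf("ignoring sandbox error for terminating pod %s/%s: %v\n",
			pod.Namespace, pod.Name, err)
		return nil
	}
	return err
}

func main() {
	now := metav1.Now()
	terminating := &v1.Pod{ObjectMeta: metav1.ObjectMeta{
		Namespace:         "e2e-deployment-5578",
		Name:              "webserver-9569696c8-tgks7",
		DeletionTimestamp: &now,
	}}
	fmt.Println(handleSandboxError(terminating, errCreatePodSandbox)) // prints <nil>
}
```

The key signal in this sketch is the pod's deletionTimestamp: once it is set, surfacing the creation error only produces noisy FailedCreatePodSandBox events like the ones quoted in the comments below.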
Description

Tim Rozet, 2020-12-16 14:55:23 UTC

Seems to be cropping up again:

https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ocp-4.7-e2e-vsphere-upi/1347487017306427392
https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/promote-release-openshift-machine-os-content-e2e-aws-4.7/1347586669322178560
https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-fips-4.7/1347592202766782464

Build watcher: This bug is linked to the top four failing jobs for release 4.7, https://sippy.ci.openshift.org/?release=4.7#TopFailingTestsWithABug

On oVirt, the "[sig-network] pods should successfully create ..." tests are always failing; at least one of them fails each run. I know they are ignored in terms of whether the job passes, but it is still annoying. I see this is the only bug that is open, but do we know why? Is it a real problem that needs to be fixed, or just a problem in the test? For reference: https://search.ci.openshift.org/?search=pods+should+successfully+create+sandboxes+&maxAge=336h&context=1&type=junit&name=release-openshift-ocp-installer-e2e-ovirt-4.7&maxMatches=5&maxBytes=20971520&groupBy=job

Some of those errors from CI in the last comment look like this:

    ns/e2e-pods-4806 pod/pod-submit-status-2-0 node/ovirt12-xw2l5-worker-0-9z6pb - 21.03 seconds after deletion - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = error reading container (probably exited) json message: EOF

Tracking that at https://github.com/kubernetes/kubernetes/issues/98142 / https://bugzilla.redhat.com/show_bug.cgi?id=1915085. That is different than the errors linked in the initial report, fwiw, which look like:

    ns/e2e-deployment-5578 pod/webserver-9569696c8-tgks7 node/ci-op-2wxd6sh9-56515-7jz4t-worker-6xxnz - 21.99 seconds after deletion - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_webserver-9569696c8-tgks7_e2e-deployment-5578_dbb1018e-5a39-4710-94bb-b193ae9718ec_0(c407d51fd9da8c86c345bd42c067a222ccdd2513ac48ff9b03309e26bd743204): [e2e-deployment-5578/webserver-9569696c8-tgks7:openshift-sdn]: error adding container to network "openshift-sdn": CNI request failed with status 400: 'pods "webserver-9569696c8-tgks7" not found'

This is another test that involves a lot of pods being created and then deleted quickly. I suspect this is a manifestation of the race condition in bug 1915085. Will track there.

*** This bug has been marked as a duplicate of bug 1915085 ***

I see this was closed as a duplicate, but https://bugzilla.redhat.com/show_bug.cgi?id=1915085 is verified and CI is still failing (see for example https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-azure-ovn-4.7/1361939592273465344). Reopening the bz.

Following comment #24, creating and deleting a static pod prior to start works without errors on 4.8.0-0.nightly-2021-03-22-104536. Should this be closed now?
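The failure pattern described above, many pods created and then deleted almost immediately, can be approximated outside the e2e suite. The following is a hedged, standalone client-go sketch and not the upstream test code; the namespace, pod count, sleep interval, and pause image are arbitrary choices for illustration.

```go
// Hedged reproducer sketch: rapidly create and delete pods, mimicking the
// e2e tests that trigger the sandbox-creation race. Not the upstream tests.
package main

import (
	"context"
	"fmt"
	"time"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	ctx := context.Background()

	for i := 0; i < 20; i++ {
		name := fmt.Sprintf("sandbox-race-%d", i)
		pod := &v1.Pod{
			ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: "default"},
			Spec: v1.PodSpec{
				Containers: []v1.Container{{Name: "pause", Image: "registry.k8s.io/pause:3.9"}},
			},
		}
		if _, err := client.CoreV1().Pods("default").Create(ctx, pod, metav1.CreateOptions{}); err != nil {
			panic(err)
		}
		// Delete almost immediately, before the sandbox has a chance to come up.
		time.Sleep(100 * time.Millisecond)
		if err := client.CoreV1().Pods("default").Delete(ctx, name, metav1.DeleteOptions{}); err != nil {
			panic(err)
		}
	}
}
```

With the race present, node-level events for these pods show FailedCreatePodSandBox errors like the ones quoted above, because the pod object is gone from the API server before the CNI plugin finishes wiring up the sandbox.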
I see in the linked gcp-rt job that there is a long delay before the kube runtime tries to create the sandbox:

    Mar 30 09:14:49.322899 ci-op-6y6zi1g1-134e7-dc6xh-worker-d-whddz hyperkube[1554]: I0330 09:14:49.322891 1554 kubelet.go:1920] SyncLoop (ADD, "api"): "webserver-fc8b59899-w2tvg_e2e-deployment-427(d5e58a4b-c3ac-4bf4-b6b0-13e8cb4142bf)"
    Mar 30 09:15:09.297827 ci-op-6y6zi1g1-134e7-dc6xh-worker-d-whddz crio[1524]: time="2021-03-30 09:15:09.297415658Z" level=info msg="Running pod sandbox: e2e-deployment-427/webserver-fc8b59899-w2tvg/POD" id=2d15b70d-461a-46ec-8a99-ee0e9d9a6249
    Mar 30 09:15:09.574818 ci-op-6y6zi1g1-134e7-dc6xh-worker-d-whddz crio[1524]: time="2021-03-30 09:15:09.574201273Z" level=error msg="Error adding network: Multus: [e2e-deployment-427/webserver-fc8b59899-w2tvg]: error getting pod: pods \"webserver-fc8b59899-w2tvg\" not found"

The question is why it takes the kube runtime so long to start the sandbox. (A sketch of the failing pod lookup from the last log line appears at the end of this report.)

Ryan took a look at this with me and we think there is some type of network outage trying to reach the API service. Either way, the gcp-rt job looks pretty unstable with a lot of failures, and on the other gcp nightly job I don't see the failure. I think it's OK to close the bug for now and reopen if we see it again.

*** Bug 1927102 has been marked as a duplicate of this bug. ***

Checking some upgrade jobs between 4.7 and 4.8 today, I see a lot of failures of "[sig-network] pods should successfully create sandboxes by other", which links to this BZ. Recent examples:

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-metal-ipi-upgrade/1377201785415929856
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade/1377172439389179904
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade/1377172439347236864
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade/1377218187266887680
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade/1377257801742553088
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade/1377233268348293120

Reopening with higher priority. Is this the same issue being observed?

Hi Yu Qi Zhang, I just looked at the CI runs linked and none of the pods affected are static pods. Updating the title to clarify.

Hi, thanks for clarifying. I see you've also moved this to verified. Is that associated with any PRs merging or a bump in upstream kube?

Hi Yu Qi, yes, see the linked PRs. This BZ was previously in VERIFIED when you moved it back to NEW; I returned it to its previous status. I've spun out a new bug 1948066 for Jerry's 4.7->4.8 update issue from comment 29.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
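For reference, the Multus error in the log excerpt above ("error getting pod: pods ... not found") is what a pod lookup returns once the pod has been deleted while its sandbox is still being set up. Below is a hedged illustration of that lookup pattern in Go using client-go; it is not Multus or openshift-sdn source code, and the function name is hypothetical.

```go
// Hedged sketch (not Multus/openshift-sdn source): shows why sandbox network
// setup fails with NotFound when the pod is deleted before setup completes.
package main

import (
	"context"
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// lookupPodForSandbox mimics the "fetch the pod before wiring its network"
// step that a CNI plugin performs during sandbox creation.
func lookupPodForSandbox(ctx context.Context, client kubernetes.Interface, ns, name string) error {
	_, err := client.CoreV1().Pods(ns).Get(ctx, name, metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		// This is the situation in the logs above: the pod was deleted
		// between scheduling and sandbox setup, so setup cannot proceed.
		return fmt.Errorf("pod %s/%s deleted before sandbox setup finished: %w", ns, name, err)
	}
	return err
}

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	fmt.Println(lookupPodForSandbox(context.Background(), client, "e2e-deployment-427", "webserver-fc8b59899-w2tvg"))
}
```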