Description of problem:
We get into this state when CNI times out on GCP. There are a couple of other CNI-related fixes going in.

Apr 21 05:10:24.645254 ci-op-prmz6-m-2.c.openshift-gce-devel-ci.internal crio[1325]: time="2020-04-21 05:10:24.644532275Z" level=info msg="Removed container 2dc97bea64ed280d94189461528946dfa0366d5cfae5ff5f3aa5ca65bf54b835: openshift-apiserver/apiserver-5b9677479-z84j7/fix-audit-permissions" id=24bde45f-e983-4b0a-9009-e7d29e25f782
Apr 21 05:10:24.647175 ci-op-prmz6-m-2.c.openshift-gce-devel-ci.internal crio[1325]: time="2020-04-21 05:10:24.647072805Z" level=info msg="attempting to run pod sandbox with infra container: openshift-apiserver/apiserver-5b9677479-z84j7/POD" id=01e8cbfe-7ec5-4385-9186-f0f9831ab297
Apr 21 05:10:24.913845 ci-op-prmz6-m-2.c.openshift-gce-devel-ci.internal hyperkube[1424]: I0421 05:10:24.913201 1424 manager.go:950] Added container: "/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod8d047e85_f52d_42e9_bd7d_4d2626bd52c8.slice/crio-aeb8e6443733adfa7bf944bfe5b777915a1cbb86d290c71704f3beac5c2f84a9.scope" (aliases: [k8s_POD_apiserver-5b9677479-z84j7_openshift-apiserver_8d047e85-f52d-42e9-bd7d-4d2626bd52c8_0 aeb8e6443733adfa7bf944bfe5b777915a1cbb86d290c71704f3beac5c2f84a9], namespace: "crio")
Apr 21 05:10:25.410752 ci-op-prmz6-m-2.c.openshift-gce-devel-ci.internal hyperkube[1424]: E0421 05:10:25.410720 1424 kuberuntime_manager.go:955] getPodContainerStatuses for pod "apiserver-5b9677479-z84j7_openshift-apiserver(8d047e85-f52d-42e9-bd7d-4d2626bd52c8)" failed: rpc error: code = Unknown desc = container with ID starting with 2dc97bea64ed280d94189461528946dfa0366d5cfae5ff5f3aa5ca65bf54b835 not found: ID does not exist

Version-Release number of selected component (if applicable):
4.5

How reproducible:
related BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1826329
CI job: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade-4.4/285

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
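A minimal sketch of how to confirm the mismatch on an affected node, assuming shell access there (the container ID below is the one from the kubelet error above):

# ID from the "ID does not exist" error above
CID=2dc97bea64ed280d94189461528946dfa0366d5cfae5ff5f3aa5ca65bf54b835
# cri-o already removed this container, so inspect should fail with "not found"
crictl inspect "$CID"
# confirm cri-o logged the removal while the kubelet keeps asking for the same ID
journalctl -u crio.service | grep "$CID"
journalctl -u kubelet.service | grep "$CID"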
I'm seeing an issue where crictl shows that a pod has exited but the kubelet hasn't realised and OpenShift still thinks it's running. Will attach logs.

The pod in question is oauth-openshift-7fd76bfc9-vv47t. It lived around 2020/04/23 04:43:26 - 04:43:34 UTC. At 04:43:27 the kubelet shows SyncLoop (PLEG) ContainerStarted *and* ContainerDied messages. Is some error in cri-o causing it to signal ContainerDied too early?

Apr 23 04:43:27 aro-master-1 hyperkube[3632442]: I0423 04:43:27.659181 3632442 kubelet.go:1953] SyncLoop (PLEG): "oauth-openshift-7fd76bfc9-vv47t_openshift-authentication(f9b3ca89-25c9-4852-af2b-4f7abf890a04)", event: &pleg.PodLifecycleEvent{ID:"f9b3ca89-25c9-4852-af2b-4f7abf890a04", Type:"ContainerDied", Data:"cf93a4b04ab3796b028640cca6c39baaf294152387f1bf75de13995b969e8906"}
Apr 23 04:43:27 aro-master-1 hyperkube[3632442]: I0423 04:43:27.659472 3632442 kubelet.go:1953] SyncLoop (PLEG): "oauth-openshift-7fd76bfc9-vv47t_openshift-authentication(f9b3ca89-25c9-4852-af2b-4f7abf890a04)", event: &pleg.PodLifecycleEvent{ID:"f9b3ca89-25c9-4852-af2b-4f7abf890a04", Type:"ContainerStarted", Data:"2d805ae593298f6ab7af88544db5fa1b35bf6aba05f3d6d415203ca9b866ef6c"}

crictl inspect 2d805ae593298 | grep finishedAt
    "finishedAt": "2020-04-23T04:43:33.401990767Z",

The cri-o logs don't seem to show much.
^ the above on 4.3.10.
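A minimal sketch for comparing cri-o's view against the kubelet's on a node in this state, assuming node shell access plus a kubeconfig (the pod and container IDs are the ones from the comment above):

# cri-o's view: the container has already exited
crictl ps -a | grep oauth-openshift
crictl inspect 2d805ae593298 | grep finishedAt
# the kubelet's/API server's view: the pod is still reported as Running
oc -n openshift-authentication get pod oauth-openshift-7fd76bfc9-vv47t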
Created attachment 1681191 [details] output of crictl inspect
Created attachment 1681192 [details] crio logs from journal
Created attachment 1681193 [details] kubelet log snippet
[root@aro-master-1 ~]# rpm -qa | grep ^cri-
cri-o-1.16.4-1.dev.rhaos4.3.git9238eee.el8.x86_64
cri-tools-1.14.0-2.rhaos4.2.el8.x86_64
Is this a dup of bug 1819906?
Yes. Thanks Trevor! *** This bug has been marked as a duplicate of bug 1819906 ***