Bug 1965059

Summary: container stuck in "container creating", failed install
Product: OpenShift Container Platform
Reporter: Ben Parees <bparees>
Component: Node
Sub component: Kubelet
Assignee: Ryan Phillips <rphillips>
QA Contact: MinLi <minmli>
Status: CLOSED DUPLICATE
Severity: low
Priority: low
CC: aos-bugs, wking
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Last Closed: 2021-08-06 13:11:55 UTC
Type: Bug

Description Ben Parees 2021-05-26 17:17:19 UTC
Description of problem:
The install in this job:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-canary/1397380367442251776

failed because the kube-scheduler pod's container couldn't start on at least one node:

Operator degraded (InstallerPodContainerWaiting_ContainerCreating): InstallerPodContainerWaitingDegraded: Pod "installer-4-ip-10-0-145-161.us-west-2.compute.internal" on node "ip-10-0-145-161.us-west-2.compute.internal" container "installer" is waiting since 2021-05-26 02:51:52 +0000 UTC because ContainerCreating



Version-Release number of selected component (if applicable):
registry.ci.openshift.org/ocp/release:4.8.0-0.nightly-2021-05-25-223219


How reproducible:
rare

Actual results:
pod has a container stuck in ContainerCreating

Expected results:
container starts successfully


Additional info:
Presumably this is an issue with the particular node this pod landed on, since pods elsewhere started OK.

I did not dig into whether any other pods started successfully on this particular node.

Comment 1 Peter Hunt 2021-05-27 19:33:01 UTC
Hm, this looks like a quick API add/delete race:

```
02:51:52.159376    1324 kubelet.go:1944] "SyncLoop ADD" source="api" pods=[openshift-kube-scheduler/installer-4-ip-10-0-145-161.us-west-2.compute.internal]
...
02:51:56.761526    1324 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-kube-scheduler/installer-4-ip-10-0-145-161.us-west-2.compute.internal]
...
02:51:59.450534    1324 kuberuntime_sandbox.go:68] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = error reading container (probably exited) json message: EOF" pod="openshift-kube-scheduler/installer-4-ip-10-0-145-161.us-west-2.compute.internal"
```
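A quick way to spot this kind of add/delete race in a saved kubelet journal is to grep for the SyncLoop events for the pod and compare their timestamps. This is a minimal sketch against a synthetic excerpt (the `/tmp` path and the exact line format are assumptions based on the lines quoted above, not the real collected journal):

```shell
# Synthetic excerpt mimicking the kubelet log lines quoted above (assumption).
cat > /tmp/kubelet-excerpt.log <<'EOF'
02:51:52.159376    1324 kubelet.go:1944] "SyncLoop ADD" source="api" pods=[openshift-kube-scheduler/installer-4-ip-10-0-145-161.us-west-2.compute.internal]
02:51:56.761526    1324 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-kube-scheduler/installer-4-ip-10-0-145-161.us-west-2.compute.internal]
02:51:59.450534    1324 kuberuntime_sandbox.go:68] "Failed to create sandbox for pod" err="..." pod="openshift-kube-scheduler/installer-4-ip-10-0-145-161.us-west-2.compute.internal"
EOF

# Pull out the SyncLoop ADD/DELETE events for the installer pod so the
# timestamps can be compared side by side (here: deleted ~4.6s after add).
grep -E 'SyncLoop (ADD|DELETE)' /tmp/kubelet-excerpt.log | grep 'installer-4'
```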

Another thing of note is that the kubelet then forever attempts to open the pod's logs and fails:
```
grep -r 'Unable to fetch pod log stats.*installer-4' | wc -l
1185
```
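As a self-contained illustration of what that count measures (synthetic log lines, not the real journal; the `/tmp` path and the repeated message text are assumptions modeled on the error quoted above), `grep -c` gives the same match count in one step:

```shell
# Synthetic journal with repeated log-stats failures (assumption: the message
# text matches the "Unable to fetch pod log stats" kubelet line quoted above).
for i in 1 2 3; do
  echo 'stats_provider.go: "Unable to fetch pod log stats" err="..." pod="openshift-kube-scheduler/installer-4-..."'
done > /tmp/journal.log

# grep -c counts matching lines directly, equivalent to grep ... | wc -l.
grep -c 'Unable to fetch pod log stats.*installer-4' /tmp/journal.log
# → 3
```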

Comment 3 W. Trevor King 2021-06-12 20:35:15 UTC
Should this be folded into bug 1952224?  My bug 1960772 was also about "unable to fetch pod log stats", and was closed as a dup of bug 1952224.

Comment 4 Ryan Phillips 2021-08-06 13:11:55 UTC

*** This bug has been marked as a duplicate of bug 1952224 ***