1965059 – container stuck in "container creating", failed install

Bug 1965059 - container stuck in "container creating", failed install

Keywords:
Status:	CLOSED DUPLICATE of bug 1952224
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Node
Sub Component:
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	low
Severity:	low
Target Milestone:	---
Target Release:	---
Assignee:	Ryan Phillips
QA Contact:	MinLi
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-05-26 17:17 UTC by Ben Parees
Modified:	2021-08-06 13:11 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-08-06 13:11:55 UTC
Target Upstream Version:
Embargoed:

Description Ben Parees 2021-05-26 17:17:19 UTC

Description of problem:
The install in this job:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-canary/1397380367442251776

failed because the kube-scheduler installer pod's container could not start on at least one node:

Operator degraded (InstallerPodContainerWaiting_ContainerCreating): InstallerPodContainerWaitingDegraded: Pod "installer-4-ip-10-0-145-161.us-west-2.compute.internal" on node "ip-10-0-145-161.us-west-2.compute.internal" container "installer" is waiting since 2021-05-26 02:51:52 +0000 UTC because ContainerCreating



Version-Release number of selected component (if applicable):
registry.ci.openshift.org/ocp/release:4.8.0-0.nightly-2021-05-25-223219


How reproducible:
rare

Actual results:
The pod has a container stuck in ContainerCreating.

Expected results:
container starts successfully


Additional info:
Presumably this is an issue with the particular node this pod landed on, since pods elsewhere started ok.

I did not dig into whether any other pods started successfully on this particular node.

Comment 1 Peter Hunt 2021-05-27 19:33:01 UTC

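Hm, this looks like a quick API add/delete: the pod was added, deleted about 4.6 seconds later, and the sandbox creation then failed roughly 2.7 seconds after the delete: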

```
02:51:52.159376    1324 kubelet.go:1944] "SyncLoop ADD" source="api" pods=[openshift-kube-scheduler/installer-4-ip-10-0-145-161.us-west-2.compute.internal]
...
02:51:56.761526    1324 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-kube-scheduler/installer-4-ip-10-0-145-161.us-west-2.compute.internal]
...
02:51:59.450534    1324 kuberuntime_sandbox.go:68] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = error reading container (probably exited) json message: EOF" pod="openshift-kube-scheduler/installer-4-ip-10-0-145-161.us-west-2.compute.internal"
```
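
A minimal Python sketch of how the add/delete gap could be pulled out of kubelet log lines like these. It assumes the abbreviated line shape quoted above (time, PID, source location, quoted message, key=value fields), that lines appear in chronological order, and that the time-only timestamps fall on the same day. The helper names `parse_events` and `summarize` are hypothetical, not part of any kubelet tooling.

```python
import re
from datetime import datetime

# Assumed line shape (as abbreviated in the excerpt above):
#   HH:MM:SS.ffffff  PID  file.go:LINE] "message" key=value ...
LINE_RE = re.compile(
    r'^(?P<ts>\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+\s+\S+\]\s+"(?P<msg>[^"]+)"(?P<rest>.*)$'
)
# Matches both pods=[ns/name] and pod="ns/name".
POD_RE = re.compile(r'\bpods?=\[?"?(?P<pod>[^\]"\s]+)')

# The three log lines quoted in this comment.
SAMPLE = [
    '02:51:52.159376    1324 kubelet.go:1944] "SyncLoop ADD" source="api" pods=[openshift-kube-scheduler/installer-4-ip-10-0-145-161.us-west-2.compute.internal]',
    '02:51:56.761526    1324 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-kube-scheduler/installer-4-ip-10-0-145-161.us-west-2.compute.internal]',
    '02:51:59.450534    1324 kuberuntime_sandbox.go:68] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = error reading container (probably exited) json message: EOF" pod="openshift-kube-scheduler/installer-4-ip-10-0-145-161.us-west-2.compute.internal"',
]


def parse_events(lines):
    """Yield (timestamp, message, pod) for each line that names a pod."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip elided ("...") or unrelated lines
        pod = POD_RE.search(m.group("rest"))
        if pod:
            ts = datetime.strptime(m.group("ts"), "%H:%M:%S.%f")
            yield ts, m.group("msg"), pod.group("pod")


def summarize(events):
    """Map each pod that was added then deleted to its add-to-delete gap
    and whether a sandbox creation failure was logged after the delete."""
    added, deleted, out = {}, {}, {}
    for ts, msg, pod in events:
        if msg == "SyncLoop ADD":
            added[pod] = ts
        elif msg == "SyncLoop DELETE" and pod in added:
            deleted[pod] = ts
            out[pod] = {
                "add_to_delete_s": (ts - added[pod]).total_seconds(),
                "sandbox_failed_after_delete": False,
            }
        elif msg == "Failed to create sandbox for pod" and pod in deleted:
            out[pod]["sandbox_failed_after_delete"] = True
    return out


if __name__ == "__main__":
    for pod, info in summarize(parse_events(SAMPLE)).items():
        print(pod, info)
```

Feeding a full kubelet journal through `parse_events` would likely need a looser `LINE_RE`, since real journal lines carry a date and host prefix that the excerpt omits.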

Also of note: after that, the kubelet repeatedly attempts to open the pod's logs and fails each time:
```
grep -r 'Unable to fetch pod log stats.*installer-4' | wc -l
1185
```

Comment 3 W. Trevor King 2021-06-12 20:35:15 UTC

Should this be folded into bug 1952224?  My bug 1960772 was also about "unable to fetch pod log stats", and was closed as a dup of bug 1952224.

Comment 4 Ryan Phillips 2021-08-06 13:11:55 UTC


*** This bug has been marked as a duplicate of bug 1952224 ***
