Bug 1965059 - container stuck in "container creating", failed install
Summary: container stuck in "container creating", failed install
Keywords:
Status: CLOSED DUPLICATE of bug 1952224
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Ryan Phillips
QA Contact: MinLi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-05-26 17:17 UTC by Ben Parees
Modified: 2021-08-06 13:11 UTC (History)
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-06 13:11:55 UTC
Target Upstream Version:
Embargoed:


Description Ben Parees 2021-05-26 17:17:19 UTC
Description of problem:
The install in this job:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-canary/1397380367442251776

failed because the kube-scheduler installer pod's container couldn't start on at least one node:

Operator degraded (InstallerPodContainerWaiting_ContainerCreating): InstallerPodContainerWaitingDegraded: Pod "installer-4-ip-10-0-145-161.us-west-2.compute.internal" on node "ip-10-0-145-161.us-west-2.compute.internal" container "installer" is waiting since 2021-05-26 02:51:52 +0000 UTC because ContainerCreating



Version-Release number of selected component (if applicable):
registry.ci.openshift.org/ocp/release:4.8.0-0.nightly-2021-05-25-223219


How reproducible:
rare

Actual results:
The pod has a container stuck in ContainerCreating.

Expected results:
The container starts successfully.


Additional info:
Presumably this is an issue with the particular node this pod landed on, since pods elsewhere started OK.

I did not dig into whether any other pods started successfully on this particular node.

Comment 1 Peter Hunt 2021-05-27 19:33:01 UTC
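Hm, this looks like a quick API add/delete: the pod was deleted through the API about 4.6 seconds after it was added, and sandbox creation failed shortly afterwards: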

```
02:51:52.159376    1324 kubelet.go:1944] "SyncLoop ADD" source="api" pods=[openshift-kube-scheduler/installer-4-ip-10-0-145-161.us-west-2.compute.internal]
...
02:51:56.761526    1324 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-kube-scheduler/installer-4-ip-10-0-145-161.us-west-2.compute.internal]
...
02:51:59.450534    1324 kuberuntime_sandbox.go:68] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = error reading container (probably exited) json message: EOF" pod="openshift-kube-scheduler/installer-4-ip-10-0-145-161.us-west-2.compute.internal"
```
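
For anyone checking other kubelet logs for the same pattern, here is a rough sketch that flags pods whose "SyncLoop DELETE" follows their "SyncLoop ADD" within a short window. It assumes lines shaped like the excerpt above. The function name `quick_add_delete`, the 10-second default window, and the way the `pods=[...]` list is split are my own assumptions, not anything the kubelet defines.

```python
import re
from datetime import timedelta

# Time of day plus the SyncLoop event, as they appear in the excerpt above.
LINE_RE = re.compile(
    r'(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2})\.(?P<us>\d{6}).*'
    r'"SyncLoop (?P<op>ADD|DELETE)" source="api" pods=\[(?P<pods>[^\]]*)\]'
)


def quick_add_delete(lines, window_seconds=10.0):
    """Return {pod: seconds from ADD to DELETE} for pods deleted within the window.

    Only the time of day is parsed, so a log that crosses midnight is not handled.
    """
    added = {}  # pod -> time of the most recent ADD
    hits = {}   # pod -> ADD-to-DELETE delay in seconds
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        t = timedelta(
            hours=int(m["h"]),
            minutes=int(m["m"]),
            seconds=int(m["s"]),
            microseconds=int(m["us"]),
        )
        # Assumption: multiple pods are separated by commas and/or whitespace.
        pods = [p for p in re.split(r"[,\s]+", m["pods"]) if p]
        for pod in pods:
            if m["op"] == "ADD":
                added[pod] = t
            elif pod in added:
                delay = (t - added.pop(pod)).total_seconds()
                if 0 <= delay <= window_seconds:
                    hits[pod] = delay
    return hits


if __name__ == "__main__":
    # The two SyncLoop lines from the excerpt above.
    sample = [
        '02:51:52.159376    1324 kubelet.go:1944] "SyncLoop ADD" source="api" '
        'pods=[openshift-kube-scheduler/installer-4-ip-10-0-145-161.us-west-2.compute.internal]',
        '02:51:56.761526    1324 kubelet.go:1960] "SyncLoop DELETE" source="api" '
        'pods=[openshift-kube-scheduler/installer-4-ip-10-0-145-161.us-west-2.compute.internal]',
    ]
    for pod, delay in quick_add_delete(sample).items():
        print(f"{pod}: deleted {delay:.3f}s after add")
```

Run against the two SyncLoop lines above, it reports the installer pod with a delay of roughly 4.6 seconds. A match only shows the timing pattern; it does not by itself explain the sandbox failure.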

Also of note: the kubelet then keeps trying to fetch the pod's log stats and fails every time:
```
grep -r 'Unable to fetch pod log stats.*installer-4' | wc -l
1185
```

Comment 3 W. Trevor King 2021-06-12 20:35:15 UTC
Should this be folded into bug 1952224?  My bug 1960772 was also about "unable to fetch pod log stats", and was closed as a dup of bug 1952224.

Comment 4 Ryan Phillips 2021-08-06 13:11:55 UTC

*** This bug has been marked as a duplicate of bug 1952224 ***
