Bug 1965059

Summary: container stuck in "container creating", failed install
Product: OpenShift Container Platform
Reporter: Ben Parees <bparees>
Component: Node
Sub component: Kubelet
Assignee: Ryan Phillips <rphillips>
QA Contact: MinLi <minmli>
Status: CLOSED DUPLICATE
Severity: low
Priority: low
CC: aos-bugs, wking
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Last Closed: 2021-08-06 13:11:55 UTC
Type: Bug

Description Ben Parees 2021-05-26 17:17:19 UTC
Description of problem:
The install in this job:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-canary/1397380367442251776

failed because the kube-scheduler pod's container couldn't start on at least one node:

Operator degraded (InstallerPodContainerWaiting_ContainerCreating): InstallerPodContainerWaitingDegraded: Pod "installer-4-ip-10-0-145-161.us-west-2.compute.internal" on node "ip-10-0-145-161.us-west-2.compute.internal" container "installer" is waiting since 2021-05-26 02:51:52 +0000 UTC because ContainerCreating



Version-Release number of selected component (if applicable):
registry.ci.openshift.org/ocp/release:4.8.0-0.nightly-2021-05-25-223219


How reproducible:
rare

Actual results:
pod has a container stuck in ContainerCreating

Expected results:
container starts successfully


Additional info:
Presumably this is an issue with the particular node this pod landed on, since pods elsewhere started OK.

I did not dig into whether any other pods started successfully on this particular node.

Comment 1 Peter Hunt 2021-05-27 19:33:01 UTC
Hm, this looks like a quick API add/delete race:

```
02:51:52.159376    1324 kubelet.go:1944] "SyncLoop ADD" source="api" pods=[openshift-kube-scheduler/installer-4-ip-10-0-145-161.us-west-2.compute.internal]
...
02:51:56.761526    1324 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-kube-scheduler/installer-4-ip-10-0-145-161.us-west-2.compute.internal]
...
02:51:59.450534    1324 kuberuntime_sandbox.go:68] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = error reading container (probably exited) json message: EOF" pod="openshift-kube-scheduler/installer-4-ip-10-0-145-161.us-west-2.compute.internal"
```
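A quick way to spot this kind of add/delete race in a saved kubelet journal is to grep for the SyncLoop events for the pod and compare their timestamps. This is a minimal sketch against a synthetic excerpt (the `/tmp` path and the exact line format are assumptions based on the lines quoted above, not the real collected journal):

```shell
# Synthetic excerpt mimicking the kubelet log lines quoted above (assumption).
cat > /tmp/kubelet-excerpt.log <<'EOF'
02:51:52.159376    1324 kubelet.go:1944] "SyncLoop ADD" source="api" pods=[openshift-kube-scheduler/installer-4-ip-10-0-145-161.us-west-2.compute.internal]
02:51:56.761526    1324 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-kube-scheduler/installer-4-ip-10-0-145-161.us-west-2.compute.internal]
02:51:59.450534    1324 kuberuntime_sandbox.go:68] "Failed to create sandbox for pod" err="..." pod="openshift-kube-scheduler/installer-4-ip-10-0-145-161.us-west-2.compute.internal"
EOF

# Pull out the SyncLoop ADD/DELETE events for the installer pod so the
# timestamps can be compared side by side (here: deleted ~4.6s after add).
grep -E 'SyncLoop (ADD|DELETE)' /tmp/kubelet-excerpt.log | grep 'installer-4'
```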

Another thing of note is that the kubelet then forever attempts to open the pod's logs and fails:
```
grep -r 'Unable to fetch pod log stats.*installer-4' | wc -l
1185
```
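As a self-contained illustration of what that count measures (synthetic log lines, not the real journal; the `/tmp` path and the repeated message text are assumptions modeled on the error quoted above), `grep -c` gives the same match count in one step:

```shell
# Synthetic journal with repeated log-stats failures (assumption: the message
# text matches the "Unable to fetch pod log stats" kubelet line quoted above).
for i in 1 2 3; do
  echo 'stats_provider.go: "Unable to fetch pod log stats" err="..." pod="openshift-kube-scheduler/installer-4-..."'
done > /tmp/journal.log

# grep -c counts matching lines directly, equivalent to grep ... | wc -l.
grep -c 'Unable to fetch pod log stats.*installer-4' /tmp/journal.log
# → 3
```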

Comment 3 W. Trevor King 2021-06-12 20:35:15 UTC
Should this be folded into bug 1952224?  My bug 1960772 was also about "unable to fetch pod log stats", and was closed as a dup of bug 1952224.

Comment 4 Ryan Phillips 2021-08-06 13:11:55 UTC

*** This bug has been marked as a duplicate of bug 1952224 ***