Bug 2009646
| Summary: | kube-scheduler goes to degraded state with error “InstallerPodContainerWaitingDegraded: Pod "installer-5" on node "master-2.c.openshift-qe.internal" container "installer" is waiting since 2021-09-30 09:23:43 +0000 UTC because ContainerCreating” | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | RamaKasturi <knarra> |
| Component: | Node | Assignee: | Ryan Phillips <rphillips> |
| Node sub component: | Kubelet | QA Contact: | Sunil Choudhary <schoudha> |
| Status: | CLOSED DUPLICATE | Docs Contact: | |
| Severity: | medium | | |
| Priority: | medium | CC: | aos-bugs, jchaloup, mfojtik |
| Version: | 4.8 | | |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-11-04 21:27:27 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
|
Description
RamaKasturi
2021-10-01 07:01:05 UTC
Log for must-gather link: http://virt-openshift-05.lab.eng.nay.redhat.com/pdhamdhe/must-gather.tar.gz

Thank you for the must-gather. From must-gather.local.5320996955247938316/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-d34d8f0bd94598f3d245fe274b79fca3926996e3b371d48b72f524c1908b034f/host_service_logs/masters/kubelet_service.log:
```
Sep 30 09:23:43.275498 pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal hyperkube[1845]: I0930 09:23:43.275437 1845 kubelet.go:1944] "SyncLoop ADD" source="api" pods=[openshift-kube-scheduler/installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal]
Sep 30 09:23:43.442155 pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal hyperkube[1845]: I0930 09:23:43.442109 1845 reconciler.go:224] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kubelet-dir\" (UniqueName: \"kubernetes.io/host-path/1a3dd6f4-9504-4096-bf4a-e9d8be863f1e-kubelet-dir\") pod \"installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal\" (UID: \"1a3dd6f4-9504-4096-bf4a-e9d8be863f1e\") "
Sep 30 09:23:43.442451 pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal hyperkube[1845]: I0930 09:23:43.442212 1845 reconciler.go:224] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-api-access\" (UniqueName: \"kubernetes.io/projected/1a3dd6f4-9504-4096-bf4a-e9d8be863f1e-kube-api-access\") pod \"installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal\" (UID: \"1a3dd6f4-9504-4096-bf4a-e9d8be863f1e\") "
Sep 30 09:23:43.543297 pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal hyperkube[1845]: I0930 09:23:43.543119 1845 reconciler.go:269] "operationExecutor.MountVolume started for volume \"kubelet-dir\" (UniqueName: \"kubernetes.io/host-path/1a3dd6f4-9504-4096-bf4a-e9d8be863f1e-kubelet-dir\") pod \"installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal\" (UID: \"1a3dd6f4-9504-4096-bf4a-e9d8be863f1e\") "
Sep 30 09:23:43.543297 pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal hyperkube[1845]: I0930 09:23:43.543239 1845 reconciler.go:269] "operationExecutor.MountVolume started for volume \"kube-api-access\" (UniqueName: \"kubernetes.io/projected/1a3dd6f4-9504-4096-bf4a-e9d8be863f1e-kube-api-access\") pod \"installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal\" (UID: \"1a3dd6f4-9504-4096-bf4a-e9d8be863f1e\") "
Sep 30 09:23:43.543690 pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal hyperkube[1845]: I0930 09:23:43.543636 1845 operation_generator.go:698] MountVolume.SetUp succeeded for volume "kubelet-dir" (UniqueName: "kubernetes.io/host-path/1a3dd6f4-9504-4096-bf4a-e9d8be863f1e-kubelet-dir") pod "installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal" (UID: "1a3dd6f4-9504-4096-bf4a-e9d8be863f1e")
Sep 30 09:23:43.631979 pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal hyperkube[1845]: I0930 09:23:43.631922 1845 kubelet.go:1951] "SyncLoop UPDATE" source="api" pods=[openshift-kube-scheduler/installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal]
Sep 30 09:23:44.240195 pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal hyperkube[1845]: I0930 09:23:44.240153 1845 operation_generator.go:698] MountVolume.SetUp succeeded for volume "kube-api-access" (UniqueName: "kubernetes.io/projected/1a3dd6f4-9504-4096-bf4a-e9d8be863f1e-kube-api-access") pod "installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal" (UID: "1a3dd6f4-9504-4096-bf4a-e9d8be863f1e")
Sep 30 09:23:44.506328 pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal hyperkube[1845]: I0930 09:23:44.506173 1845 kuberuntime_manager.go:460] "No sandbox for pod can be found. Need to start a new one" pod="openshift-kube-scheduler/installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal"
Sep 30 09:23:44.507180 pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal hyperkube[1845]: E0930 09:23:44.507159 1845 kubelet_pods.go:412] "Hostname for pod was too long, truncated it" podName="installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal" hostnameMaxLen=63 truncatedHostname="installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.intern"
Sep 30 09:23:45.106450 pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal hyperkube[1845]: I0930 09:23:45.102496 1845 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-kube-scheduler/installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal]
Sep 30 09:23:46.324453 pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal hyperkube[1845]: I0930 09:23:46.320768 1845 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-kube-scheduler/installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal]
Sep 30 09:23:47.280550 pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal hyperkube[1845]: E0930 09:23:47.280373 1845 kuberuntime_sandbox.go:68] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = error reading container (probably exited) json message: EOF" pod="openshift-kube-scheduler/installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal"
Sep 30 09:24:25.232409 pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal hyperkube[1845]: E0930 09:24:25.230819 1845 cadvisor_stats_provider.go:147] "Unable to fetch pod log stats" err="open /var/log/pods/openshift-kube-scheduler_installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal_1a3dd6f4-9504-4096-bf4a-e9d8be863f1e: no such file or directory" pod="openshift-kube-scheduler/installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal"
Sep 30 09:24:37.912549 pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal hyperkube[1845]: E0930 09:24:37.912480 1845 cadvisor_stats_provider.go:147] "Unable to fetch pod log stats" err="open /var/log/pods/openshift-kube-scheduler_installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal_1a3dd6f4-9504-4096-bf4a-e9d8be863f1e: no such file or directory" pod="openshift-kube-scheduler/installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal"
Sep 30 09:24:38.526327 pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal hyperkube[1845]: E0930 09:24:38.526235 1845 kubelet_pods.go:412] "Hostname for pod was too long, truncated it" podName="installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal" hostnameMaxLen=63 truncatedHostname="installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.intern"
Sep 30 09:24:40.729439 pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal hyperkube[1845]: I0930 09:24:40.729244 1845 kubelet_pods.go:1285] "Killing unwanted pod" podName="installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal"
Sep 30 09:24:41.573186 pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal hyperkube[1845]: E0930 09:24:41.572940 1845 kubelet_pods.go:412] "Hostname for pod was too long, truncated it" podName="installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal" hostnameMaxLen=63 truncatedHostname="installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.intern"
```
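The repeated "Hostname for pod was too long, truncated it" messages above come from the kubelet cutting pod hostnames to 63 characters, the DNS-1123 label limit. A minimal sketch of that behavior, with an illustrative function name (the real logic lives in the kubelet's kubelet_pods.go; the trimming of a trailing '-' or '.' after the cut is an assumption based on upstream kubelet behavior, not something the log itself shows):

```go
package main

import (
	"fmt"
	"strings"
)

// truncateHostname sketches the kubelet's hostname truncation: names longer
// than 63 characters (the DNS-1123 label limit) are cut at 63, and any
// trailing '-' or '.' left by the cut is trimmed (assumed upstream behavior).
func truncateHostname(name string) string {
	const hostnameMaxLen = 63
	if len(name) <= hostnameMaxLen {
		return name
	}
	return strings.TrimRight(name[:hostnameMaxLen], "-.")
}

func main() {
	// The 65-character pod name from the log loses its trailing "al".
	full := "installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal"
	fmt.Println(truncateHostname(full))
	// → installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.intern
}
```

The output matches the `truncatedHostname` value in the kubelet log lines above; the truncation itself is cosmetic and unrelated to the sandbox-creation failure.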
From must-gather.local.5320996955247938316/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-d34d8f0bd94598f3d245fe274b79fca3926996e3b371d48b72f524c1908b034f/host_service_logs/masters/crio_service.log:
```
Sep 30 09:23:44.510415 pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal crio[1810]: time="2021-09-30 09:23:44.508106562Z" level=info msg="Running pod sandbox: openshift-kube-scheduler/installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal/POD" id=40757908-0b8b-44d9-bdaf-2227962f574b name=/runtime.v1alpha2.RuntimeService/RunPodSandbox
Sep 30 09:23:44.658591 pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal crio[1810]: time="2021-09-30 09:23:44.658509784Z" level=info msg="Got pod network &{Name:installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal Namespace:openshift-kube-scheduler ID:d384431413d02e1dfc7faad276117201805c860e4c8e36069f4c15fcf886211c UID:1a3dd6f4-9504-4096-bf4a-e9d8be863f1e NetNS:/var/run/netns/89501363-5641-4645-82d7-9c526292efea Networks:[] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[]}] Aliases:map[]}"
Sep 30 09:23:44.658591 pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal crio[1810]: time="2021-09-30 09:23:44.658595967Z" level=info msg="Adding pod openshift-kube-scheduler_installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal to CNI network \"multus-cni-network\" (type=multus)"
Sep 30 09:23:46.301252 pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal crio[1810]: 2021-09-30T09:23:46Z [verbose] Add: openshift-kube-scheduler:installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal:1a3dd6f4-9504-4096-bf4a-e9d8be863f1e:ovn-kubernetes(ovn-kubernetes):eth0 {"cniVersion":"0.4.0","interfaces":[{"name":"d384431413d02e1","mac":"1a:b2:0a:f4:3d:39"},{"name":"eth0","mac":"0a:58:0a:81:00:13","sandbox":"/var/run/netns/89501363-5641-4645-82d7-9c526292efea"}],"ips":[{"version":"4","interface":1,"address":"10.129.0.19/23","gateway":"10.129.0.1"}],"dns":{}}
Sep 30 09:23:46.301252 pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal crio[1810]: I0930 09:23:46.274036 38712 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-kube-scheduler", Name:"installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal", UID:"1a3dd6f4-9504-4096-bf4a-e9d8be863f1e", APIVersion:"v1", ResourceVersion:"19377", FieldPath:""}): type: 'Normal' reason: 'AddedInterface' Add eth0 [10.129.0.19/23] from ovn-kubernetes
Sep 30 09:23:46.301921 pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal crio[1810]: time="2021-09-30 09:23:46.301849626Z" level=info msg="Got pod network &{Name:installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal Namespace:openshift-kube-scheduler ID:d384431413d02e1dfc7faad276117201805c860e4c8e36069f4c15fcf886211c UID:1a3dd6f4-9504-4096-bf4a-e9d8be863f1e NetNS:/var/run/netns/89501363-5641-4645-82d7-9c526292efea Networks:[] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[]}] Aliases:map[]}"
Sep 30 09:23:46.302136 pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal crio[1810]: time="2021-09-30 09:23:46.302099602Z" level=info msg="Checking pod openshift-kube-scheduler_installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal for CNI network multus-cni-network (type=multus)"
Sep 30 09:23:46.794221 pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal crio[1810]: time="2021-09-30 09:23:46.793570590Z" level=info msg="Got pod network &{Name:installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal Namespace:openshift-kube-scheduler ID:d384431413d02e1dfc7faad276117201805c860e4c8e36069f4c15fcf886211c UID:1a3dd6f4-9504-4096-bf4a-e9d8be863f1e NetNS:/var/run/netns/89501363-5641-4645-82d7-9c526292efea Networks:[{Name:multus-cni-network Ifname:eth0}] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[]}] Aliases:map[]}"
Sep 30 09:23:46.794221 pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal crio[1810]: time="2021-09-30 09:23:46.793888445Z" level=info msg="Deleting pod openshift-kube-scheduler_installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal from CNI network \"multus-cni-network\" (type=multus)"
Sep 30 09:23:47.245188 pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal crio[1810]: 2021-09-30T09:23:46Z [verbose] Del: openshift-kube-scheduler:installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal:1a3dd6f4-9504-4096-bf4a-e9d8be863f1e:ovn-kubernetes:eth0 {"cniVersion":"0.4.0","dns":{},"ipam":{},"logFile":"/var/log/ovn-kubernetes/ovn-k8s-cni-overlay.log","logLevel":"4","logfile-maxage":5,"logfile-maxbackups":5,"logfile-maxsize":100,"name":"ovn-kubernetes","type":"ovn-k8s-cni-overlay"}
```
installer_state_controller.go examines all installer pods, independently of the revision number. If at least one of them has been in the Pending state for at least 5 minutes, the InstallerPodContainerWaitingDegraded condition is set to True. Whenever a new revision's installer pod is created, its predecessor is asked to be deleted. Checking the timestamps from the kubelet and crio logs:
```
Sep 30 09:23:45.106450 pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal hyperkube[1845]: I0930 09:23:45.102496 1845 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-kube-scheduler/installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal]
Sep 30 09:23:46.324453 pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal hyperkube[1845]: I0930 09:23:46.320768 1845 kubelet.go:1960] "SyncLoop DELETE" source="api" pods=[openshift-kube-scheduler/installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal]
Sep 30 09:23:46.794221 pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal crio[1810]: time="2021-09-30 09:23:46.793888445Z" level=info msg="Deleting pod openshift-kube-scheduler_installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal from CNI network \"multus-cni-network\" (type=multus)"
Sep 30 09:23:47.245188 pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal crio[1810]: 2021-09-30T09:23:46Z [verbose] Del: openshift-kube-scheduler:installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal:1a3dd6f4-9504-4096-bf4a-e9d8be863f1e:ovn-kubernetes:eth0 {"cniVersion":"0.4.0","dns":{},"ipam":{},"logFile":"/var/log/ovn-kubernetes/ovn-k8s-cni-overlay.log","logLevel":"4","logfile-maxage":5,"logfile-maxbackups":5,"logfile-maxsize":100,"name":"ovn-kubernetes","type":"ovn-k8s-cni-overlay"}
Sep 30 09:24:40.729439 pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal hyperkube[1845]: I0930 09:24:40.729244 1845 kubelet_pods.go:1285] "Killing unwanted pod" podName="installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal"
```
Compared to the reported data:
```
09-30 15:16:53.743 Last Transition Time: 2021-09-30T09:30:50Z
09-30 15:16:53.743 Message: InstallerPodContainerWaitingDegraded: Pod "installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal" on node "pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal" container "installer" is waiting since 2021-09-30 09:23:43 +0000 UTC because ContainerCreating
```
The installer-5-pdhamdhe-30gcp-7hmff-master-2.c.openshift-qe.internal pod was asked to be deleted, and the kubelet reported killing it, by 09:24:40. Yet the condition reported at 2021-09-30T09:30:50Z still claims the pod is there. Also, the kubelet/crio logs say nothing about the pod being successfully (or unsuccessfully) deleted, removed, or killed.
Two outcomes from the investigation:
- The installer state controller could be updated to take only the latest revision's installer pods into account. On the other hand, that would make the reported scenario harder to detect.
- The container is clearly not being removed; this may be a known container-runtime issue that is already fixed in the latest 4.8 release.
Re-assigning to the Node team for further examination.
FYI:
> The installer controller pod could be updated to only take into account the latest installer pods. On the other hand it will be harder to detect the reported scenario.

Already reported in https://bugzilla.redhat.com/show_bug.cgi?id=1986385

*** This bug has been marked as a duplicate of bug 1986003 ***