Bug 1995911
Summary: | "co/kube-apiserver is still in Progressing" is seen in upgrade CI | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Rahul Gangwar <rgangwar>
Component: | Node | Assignee: | Harshal Patil <harpatil>
Node sub component: | Kubelet | QA Contact: | Sunil Choudhary <schoudha>
Status: | CLOSED WONTFIX | Docs Contact: |
Severity: | high | |
Priority: | high | CC: | aos-bugs, cblecker, danw, harpatil, jokerman, jupierce, kewang, mfojtik, nagrawal, rgangwar, sanchezl, schoudha, sttts, travi, vpickard, wking, xxia
Version: | 4.7 | Keywords: | Reopened
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | 1909600 | Environment: |
Last Closed: | 2021-09-16 05:02:43 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Rahul Gangwar
2021-08-20 06:50:28 UTC
The installer pod for revision 17 failed because the kubelet was not ready:

    NodeInstallerDegraded: 1 nodes are failing on revision 17:
    NodeInstallerDegraded: no detailed termination message, see `oc get -n "openshift-kube-apiserver" pods/"installer-17-leap02054839-tf2qv-control-plane-0" -oyaml`
    NodeControllerDegraded: The master nodes not ready: node "leap02054839-tf2qv-control-plane-2" not ready since 2021-08-02 05:07:52 +0000 UTC because KubeletNotReady ([container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful])

In 4.8 we have an installer retry. In 4.7 we don't, and the backport is not feasible. Moving to Node to understand why the pod failed. This is the 4.8 change: https://github.com/openshift/library-go/pull/979

@Harshal: Can you please check this log and see whether it is useful?

    find . -name kubelet_service.log
    ./quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-040f48c020420ff93b227216469f6c2971cf10fac2b0b52ea9853e88ec1964a6/host_service_logs/masters/kubelet_service.log

@Harshal: We are re-running the upgrade job. If we are able to replicate the error we can share the journal/system logs; until then, please don't close the bug.

Please find the attached journal logs: https://drive.google.com/file/d/1iGLAIp-0LRjXhkzQdbSZ81C7ivgK6ZRZ/view?usp=sharing

@harshal: Any update on this? It has been pending for a long time.

The network is unavailable because the sdn pod failed to start, because kubelet or cri-o was apparently unable to set it up correctly:

    Aug 27 01:32:16 rgangwar27114550-xlbt5-control-plane-1 hyperkube[2577]: E0827 05:32:16.921366 2577 pod_workers.go:190] Error syncing pod ab763d18-06eb-11ec-a51c-fa163e167285 ("sdn-q6v8g_openshift-sdn(ab763d18-06eb-11ec-a51c-fa163e167285)"), skipping: failed to "StartContainer" for "sdn" with CreateContainerError: "container create failed:
    time="2021-08-27T05:32:16Z" level=warning msg="exit status 1"
    time="2021-08-27T05:32:16Z" level=error msg="container_linux.go:349: starting container process caused
      process_linux.go:449: container init caused
      rootfs_linux.go:58: mounting "/var/lib/kubelet/pods/ab763d18-06eb-11ec-a51c-fa163e167285/volumes/kubernetes.io~configmap/config" to rootfs "/var/lib/containers/storage/overlay/d2349e5ab6f0c58bbb439f29a0bd42d2e9003486a865d23a33fdc8940ba07c0f/merged" at "/config" caused
      stat /var/lib/kubelet/pods/ab763d18-06eb-11ec-a51c-fa163e167285/volumes/kubernetes.io~configmap/config: no such file or directory"

AFAICT there is no must-gather corresponding to this run. In the earlier must-gather all of the sdn pods are fine, so (assuming it was even the same problem both times) the problem is not with Networking. Back to Node. (But is there really a diagnosable bug here at this point?)
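For reference, the state behind the NodeInstallerDegraded/NodeControllerDegraded conditions can be inspected with standard `oc` commands. This is a minimal sketch, assuming cluster-admin access to the affected 4.7 cluster while it is still up; the pod and node names are copied from the degraded-condition message above and would differ on another run.

```shell
# Installer pod status and termination message referenced by the
# NodeInstallerDegraded condition (pod name from the message above).
oc get -n openshift-kube-apiserver \
  pods/installer-17-leap02054839-tf2qv-control-plane-0 -o yaml

# Which masters are NotReady, and since when.
oc get nodes -l node-role.kubernetes.io/master

# PLEG / container-runtime readiness messages in the kubelet journal on the
# NotReady master (node name from the NodeControllerDegraded message).
oc debug node/leap02054839-tf2qv-control-plane-2 -- chroot /host \
  journalctl -u kubelet --no-pager | grep -iE 'PLEG|KubeletNotReady|runtime'
```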
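The CreateContainerError quoted above points at a configmap volume directory that cri-o could not stat. A rough sketch of how one might cross-check whether the kubelet ever materialized that volume, assuming the cluster from the 2021-08-27 run is still reachable; the node name and pod UID are taken from the kubelet log line above.

```shell
# Does the configmap volume directory from the error actually exist on the node?
oc debug node/rgangwar27114550-xlbt5-control-plane-1 -- chroot /host \
  ls -l /var/lib/kubelet/pods/ab763d18-06eb-11ec-a51c-fa163e167285/volumes/kubernetes.io~configmap/

# Compare with the volumes the sdn pod is expected to mount.
oc get pod sdn-q6v8g -n openshift-sdn -o yaml

# Recent container-creation failures reported by cri-o on the same node.
oc debug node/rgangwar27114550-xlbt5-control-plane-1 -- chroot /host \
  journalctl -u crio --no-pager | grep -i 'container create failed' | tail -n 20
```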