Bug 1917133 - kubelet might fail to start kube-apiserver
Summary: kubelet might fail to start kube-apiserver
Keywords:
Status: CLOSED DUPLICATE of bug 1882750
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.8
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
: ---
Assignee: Harshal Patil
QA Contact: MinLi
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-01-17 12:36 UTC by Eran Cohen
Modified: 2021-01-20 09:57 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-01-20 09:57:58 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Eran Cohen 2021-01-17 12:36:40 UTC
Description of problem:
Once the kubeapiserver-operator rollout a new static pod configuration, kubelet take down the current kube-apiserver and not starting a new one, according to the log it seems that it's still trying to delete the mirror of the previous pod

 
Version-Release number of selected component (if applicable):
Release image: quay.io/eranco74/ocp-release:bootstrap-in-place-poc
  
cluster-kube-apiserver-operator                sha256:e09665ccb52085b7b729b862c8a3db849c22a4544923dad54e2460d80af9b1ce

Steps to Reproduce:
Install single node according to this guide:
https://docs.google.com/document/d/1jWprFXCVjV0PaEXtIsRVkvFE4sgdhsMKRSLh8h3wMLg/edit?usp=sharing


Actual results:
kubelet log:
Dec 31 14:55:40 test-infra-cluster-66afed19-master-0 hyperkube[22082]: I1231 14:55:40.861203   22082 kubelet_getters.go:176] "Pod status updated" pod="kube-system/bootstrap-kube-apiserver-test-infra-cluster-66afed19-master-0" status=Running
Dec 31 14:56:14 test-infra-cluster-66afed19-master-0 hyperkube[22082]: I1231 14:56:14.516732   22082 kubelet.go:1902] SyncLoop (REMOVE, "file"): "bootstrap-kube-apiserver-test-infra-cluster-66afed19-master-0_kube-system(ea69bb2238629f286e60aaf48f585626)"
Dec 31 14:56:14 test-infra-cluster-66afed19-master-0 hyperkube[22082]: I1231 14:56:14.519480   22082 kubelet_pods.go:1253] Killing unwanted pod "bootstrap-kube-apiserver-test-infra-cluster-66afed19-master-0"
Dec 31 14:56:15 test-infra-cluster-66afed19-master-0 hyperkube[22082]: I1231 14:56:15.625181   22082 logs.go:317] Finish parsing log file "/var/log/pods/kube-system_bootstrap-kube-apiserver-test-infra-cluster-66afed19-master-0_ea69bb2238629f286e60aaf48f585626/kube-apiserver-insecure-readyz/0.log"
Dec 31 14:56:15 test-infra-cluster-66afed19-master-0 hyperkube[22082]: I1231 14:56:15.626226   22082 logs.go:317] Finish parsing log file "/var/log/pods/kube-system_bootstrap-kube-apiserver-test-infra-cluster-66afed19-master-0_ea69bb2238629f286e60aaf48f585626/kube-apiserver/0.log"
Dec 31 14:56:19 test-infra-cluster-66afed19-master-0 hyperkube[22082]: I1231 14:56:19.884165   22082 prober.go:117] Readiness probe for "bootstrap-kube-apiserver-test-infra-cluster-66afed19-master-0_kube-system(ea69bb2238629f286e60aaf48f585626):kube-apiserver" failed (failure): Get "https://192.168.126.10:6443/healthz": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Dec 31 14:56:22 test-infra-cluster-66afed19-master-0 hyperkube[22082]: I1231 14:56:22.488496   22082 prober.go:117] Liveness probe for "bootstrap-kube-apiserver-test-infra-cluster-66afed19-master-0_kube-system(ea69bb2238629f286e60aaf48f585626):kube-apiserver" failed (failure): Get "https://192.168.126.10:6443/healthz": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Dec 31 14:56:41 test-infra-cluster-66afed19-master-0 hyperkube[22082]: E1231 14:56:41.672634   22082 fsHandler.go:114] failed to collect filesystem stats - rootDiskErr: <nil>, extraDiskErr: could not stat "/var/log/pods/kube-system_bootstrap-kube-apiserver-test-infra-cluster-66afed19-master-0_ea69bb2238629f286e60aaf48f585626/setup/0.log" to get inode usage: stat /var/log/pods/kube-system_bootstrap-kube-apiserver-test-infra-cluster-66afed19-master-0_ea69bb2238629f286e60aaf48f585626/setup/0.log: no such file or directory
Dec 31 14:56:41 test-infra-cluster-66afed19-master-0 hyperkube[22082]: I1231 14:56:41.894493   22082 logs.go:317] Finish parsing log file "/var/log/pods/kube-system_bootstrap-kube-apiserver-test-infra-cluster-66afed19-master-0_ea69bb2238629f286e60aaf48f585626/kube-apiserver-insecure-readyz/0.log"
Dec 31 14:56:41 test-infra-cluster-66afed19-master-0 hyperkube[22082]: I1231 14:56:41.895048   22082 logs.go:317] Finish parsing log file "/var/log/pods/kube-system_bootstrap-kube-apiserver-test-infra-cluster-66afed19-master-0_ea69bb2238629f286e60aaf48f585626/kube-apiserver/0.log"
Dec 31 14:56:59 test-infra-cluster-66afed19-master-0 hyperkube[22082]: I1231 14:56:59.856755   22082 logs.go:317] Finish parsing log file "/var/log/pods/kube-system_bootstrap-kube-apiserver-test-infra-cluster-66afed19-master-0_ea69bb2238629f286e60aaf48f585626/kube-apiserver-insecure-readyz/0.log"
Dec 31 14:57:41 test-infra-cluster-66afed19-master-0 hyperkube[22082]: E1231 14:57:41.417246   22082 fsHandler.go:114] failed to collect filesystem stats - rootDiskErr: <nil>, extraDiskErr: could not stat "/var/log/pods/kube-system_bootstrap-kube-apiserver-test-infra-cluster-66afed19-master-0_ea69bb2238629f286e60aaf48f585626/kube-apiserver/1.log" to get inode usage: stat /var/log/pods/kube-system_bootstrap-kube-apiserver-test-infra-cluster-66afed19-master-0_ea69bb2238629f286e60aaf48f585626/kube-apiserver/1.log: no such file or directory
Dec 31 14:57:41 test-infra-cluster-66afed19-master-0 hyperkube[22082]: E1231 14:57:41.621241   22082 fsHandler.go:114] failed to collect filesystem stats - rootDiskErr: <nil>, extraDiskErr: could not stat "/var/log/pods/kube-system_bootstrap-kube-apiserver-test-infra-cluster-66afed19-master-0_ea69bb2238629f286e60aaf48f585626/85184bea1b44e09e074dfe08fd869366528e8d718b367ef99cf9680cee639180.log" to get inode usage: stat /var/log/pods/kube-system_bootstrap-kube-apiserver-test-infra-cluster-66afed19-master-0_ea69bb2238629f286e60aaf48f585626/85184bea1b44e09e074dfe08fd869366528e8d718b367ef99cf9680cee639180.log: no such file or directory
Dec 31 14:58:29 test-infra-cluster-66afed19-master-0 hyperkube[22082]: E1231 14:58:29.799802   22082 container_log_manager.go:243] Container "546cfbf54edfcdee043c503cb04ae4a5ad0fde8b0727effd71b2aad9bbfcda8c" log "/var/log/pods/kube-system_bootstrap-kube-apiserver-test-infra-cluster-66afed19-master-0_ea69bb2238629f286e60aaf48f585626/kube-apiserver/1.log" doesn't exist, reopen container log failed: rpc error: code = Unknown desc = container is not created or running
Dec 31 14:58:32 test-infra-cluster-66afed19-master-0 hyperkube[22082]: I1231 14:58:32.918911   22082 mirror_client.go:125] Deleting a mirror pod "bootstrap-kube-apiserver-test-infra-cluster-66afed19-master-0_kube-system" (uid (*types.UID)(nil))
Dec 31 14:58:32 test-infra-cluster-66afed19-master-0 hyperkube[22082]: E1231 14:58:32.919786   22082 mirror_client.go:132] Failed deleting a mirror pod "bootstrap-kube-apiserver-test-infra-cluster-66afed19-master-0_kube-system": Delete "https://api-int.test-infra-cluster-66afed19.redhat.com:6443/api/v1/namespaces/kube-system/pods/bootstrap-kube-apiserver-test-infra-cluster-66afed19-master-0": dial tcp 192.168.126.10:6443: connect: connection refused
Dec 31 14:58:34 test-infra-cluster-66afed19-master-0 hyperkube[22082]: I1231 14:58:34.923932   22082 mirror_client.go:125] Deleting a mirror pod "bootstrap-kube-apiserver-test-infra-cluster-66afed19-master-0_kube-system" (uid (*types.UID)(nil))





Dec 31 15:04:10 test-infra-cluster-66afed19-master-0 hyperkube[22082]: E1231 15:04:10.936590   22082 mirror_client.go:132] Failed deleting a mirror pod "bootstrap-kube-apiserver-test-infra-cluster-66afed19-master-0_kube-system": Delete "https://api-int.test-infra-cluster-66afed19.redhat.com:6443/api/v1/namespaces/kube-system/pods/bootstrap-kube-apiserver-test-infra-cluster-66afed19-master-0": dial tcp 192.168.126.10:6443: connect: connection refused
Dec 31 15:04:12 test-infra-cluster-66afed19-master-0 hyperkube[22082]: I1231 15:04:12.924387   22082 mirror_client.go:125] Deleting a mirror pod "bootstrap-kube-apiserver-test-infra-cluster-66afed19-master-0_kube-system" (uid (*types.UID)(nil))
Dec 31 15:04:12 test-infra-cluster-66afed19-master-0 hyperkube[22082]: E1231 15:04:12.925367   22082 mirror_client.go:132] Failed deleting a mirror pod "bootstrap-kube-apiserver-test-infra-cluster-66afed19-master-0_kube-system": Delete "https://api-int.test-infra-cluster-66afed19.redhat.com:6443/api/v1/namespaces/kube-system/pods/bootstrap-kube-apiserver-test-infra-cluster-66afed19-master-0": dial tcp 192.168.126.10:6443: connect: connection refused
Dec 31 15:04:14 test-infra-cluster-66afed19-master-0 hyperkube[22082]: I1231 15:04:14.928346   22082 mirror_client.go:125] Deleting a mirror pod "bootstrap-kube-apiserver-test-infra-cluster-66afed19-master-0_kube-system" (uid (*types.UID)(nil))
Dec 31 15:04:14 test-infra-cluster-66afed19-master-0 hyperkube[22082]: E1231 15:04:14.929837   22082 mirror_client.go:132] Failed deleting a mirror pod "bootstrap-kube-apiserver-test-infra-cluster-66afed19-master-0_kube-system": Delete "https://api-int.test-infra-cluster-66afed19.redhat.com:6443/api/v1/namespaces/kube-system/pods/bootstrap-kube-apiserver-test-infra-cluster-66afed19-master-0": dial tcp 192.168.126.10:6443: connect: connection refused
Dec 31 15:04:16 test-infra-cluster-66afed19-master-0 hyperkube[22082]: I1231 15:04:16.930712   22082 mirror_client.go:125] Deleting a mirror pod "bootstrap-kube-apiserver-test-infra-cluster-66afed19-master-0_kube-system" (uid (*types.UID)(nil))
Dec 31 15:04:16 test-infra-cluster-66afed19-master-0 hyperkube[22082]: E1231 15:04:16.931902   22082 mirror_client.go:132] Failed deleting a mirror pod "bootstrap-kube-apiserver-test-infra-cluster-66afed19-master-0_kube-system": Delete "https://api-int.test-infra-cluster-66afed19.redhat.com:6443/api/v1/namespaces/kube-system/pods/bootstrap-kube-apiserver-test-infra-cluster-66afed19-master-0": dial tcp 192.168.126.10:6443: connect: connection refused


Expected results:
New kube-apiserver pod started

Additional info:

Comment 1 Stefan Schimanski 2021-01-18 09:24:21 UTC
This pretty much looks like a dead-lock in the kubelet state machien for static pods (in the case the pod is the only kube-apiserver).

Moving over.

Comment 2 Harshal Patil 2021-01-20 09:57:58 UTC

*** This bug has been marked as a duplicate of bug 1882750 ***


Note You need to log in before you can comment on or make changes to this bug.