Bug 2005350
| Summary: | kube-scheduler operator went to degraded state with error StaticPodsDegraded: pod/openshift-kube-scheduler container "kube-scheduler" is terminated: Completed | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | RamaKasturi <knarra> |
| Component: | Node | Assignee: | Harshal Patil <harpatil> |
| Node sub component: | Kubelet | QA Contact: | Sunil Choudhary <schoudha> |
| Status: | CLOSED DUPLICATE | Docs Contact: | |
| Severity: | medium | | |
| Priority: | medium | CC: | aos-bugs, fkrepins, harpatil, mfojtik |
| Version: | 4.9 | | |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-11-25 06:00:25 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
RamaKasturi
2021-09-17 13:30:23 UTC
Link to must-gather: http://virt-openshift-05.lab.eng.nay.redhat.com/knarra/2005350/must-gather.local.4429227838798661642/

The symptoms are very similar to https://bugzilla.redhat.com/show_bug.cgi?id=2005225.

> 2. Now enable LocalStorageCapacityIsolation=False via /etc/kubernetes/manifests/kube-scheduler-pod.yaml file
Can you elaborate on this step? Do you change /etc/kubernetes/manifests/kube-scheduler-pod.yaml directly on one of the master nodes, or do you use a different mechanism for enabling the LocalStorageCapacityIsolation feature? Are the steps to enable the feature documented?
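For context, a minimal sketch of what such a direct manifest edit could look like, under the assumption that the gate was set by editing the file in place with GNU sed (a scratch file stands in for the real /etc/kubernetes/manifests/kube-scheduler-pod.yaml; editing static pod manifests directly is not a supported mechanism, and the operator's installer pods will eventually overwrite the file):

```shell
# Hypothetical sketch: toggling a feature gate in a kube-scheduler static
# pod manifest. A temp copy stands in for the real file on a master node.
MANIFEST=$(mktemp)
cat > "$MANIFEST" <<'EOF'
spec:
  containers:
  - name: kube-scheduler
    command:
    - hyperkube
    - kube-scheduler
EOF

# Append the feature gate to the scheduler arguments (GNU sed; \n in the
# replacement inserts a new argument line after "- kube-scheduler").
sed -i 's|- kube-scheduler$|- kube-scheduler\n    - --feature-gates=LocalStorageCapacityIsolation=false|' "$MANIFEST"

# The kubelet watches the manifests directory; a change like this triggers
# the SyncLoop REMOVE/ADD cycle seen later in the kubelet logs.
grep -- '--feature-gates' "$MANIFEST"   # prints the inserted argument line
```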
```
2021-09-17T07:51:35.744864602Z I0917 07:51:35.744825 1 status_controller.go:211] clusteroperator/kube-scheduler diff {"status":{"conditions":[{"lastTransitionTime":"2021-09-17T07:49:33Z","message":"StaticPodsDegraded: pods \"openshift-kube-scheduler-ip-10-0-185-82.cn-north-1.compute.internal\" not found","reason":"StaticPods_Error","status":"True","type":"Degraded"},{"lastTransitionTime":"2021-09-17T07:47:32Z","message":"NodeInstallerProgressing: 1 nodes are at revision 0; 1 nodes are at revision 4; 1 nodes are at revision 5; 0 nodes have achieved new revision 6","reason":"NodeInstaller","status":"True","type":"Progressing"},{"lastTransitionTime":"2021-09-17T07:49:24Z","message":"StaticPodsAvailable: 2 nodes are active; 1 nodes are at revision 0; 1 nodes are at revision 4; 1 nodes are at revision 5; 0 nodes have achieved new revision 6","reason":"AsExpected","status":"True","type":"Available"},{"lastTransitionTime":"2021-09-17T07:47:27Z","message":"All is well","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}
...
2021-09-17T07:51:45.411972239Z I0917 07:51:45.411911 1 status_controller.go:211] clusteroperator/kube-scheduler diff {"status":{"conditions":[{"lastTransitionTime":"2021-09-17T07:51:45Z","message":"NodeControllerDegraded: All master nodes are ready","reason":"AsExpected","status":"False","type":"Degraded"},{"lastTransitionTime":"2021-09-17T07:47:32Z","message":"NodeInstallerProgressing: 1 nodes are at revision 0; 1 nodes are at revision 4; 1 nodes are at revision 5; 0 nodes have achieved new revision 6","reason":"NodeInstaller","status":"True","type":"Progressing"},{"lastTransitionTime":"2021-09-17T07:49:24Z","message":"StaticPodsAvailable: 2 nodes are active; 1 nodes are at revision 0; 1 nodes are at revision 4; 1 nodes are at revision 5; 0 nodes have achieved new revision 6","reason":"AsExpected","status":"True","type":"Available"},{"lastTransitionTime":"2021-09-17T07:47:27Z","message":"All is well","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}
```
The first log line shows the Degraded=True condition appeared at 2021-09-17T07:49:33Z and, based on the second log line, lasted until 2021-09-17T07:51:45Z.
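The length of that window can be computed from the two lastTransitionTime values, for example with GNU date (assumed available on the masters):

```shell
# Length of the first Degraded=True window, taken from the
# lastTransitionTime values in the status_controller diffs above.
start='2021-09-17T07:49:33Z'
end='2021-09-17T07:51:45Z'
echo "$(( $(date -u -d "$end" +%s) - $(date -u -d "$start" +%s) ))s"   # 132s
```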
```
2021-09-17T09:59:54.373055582Z I0917 09:59:54.372673 1 status_controller.go:211] clusteroperator/kube-scheduler diff {"status":{"conditions":[{"lastTransitionTime":"2021-09-17T09:35:47Z","message":"StaticPodsDegraded: pods \"openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal\" not found","reason":"StaticPods_Error","status":"True","type":"Degraded"},{"lastTransitionTime":"2021-09-17T09:09:32Z","message":"NodeInstallerProgressing: 3 nodes are at revision 9","reason":"AsExpected","status":"False","type":"Progressing"},{"lastTransitionTime":"2021-09-17T07:49:24Z","message":"StaticPodsAvailable: 3 nodes are active; 3 nodes are at revision 9","reason":"AsExpected","status":"True","type":"Available"},{"lastTransitionTime":"2021-09-17T07:47:27Z","message":"All is well","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}
```
This logs another occurrence of the Degraded condition turning True, lasting from 2021-09-17T09:35:47Z until at least 09:59:54.
The first Degraded=True occurrence corresponds to the bootstrapping process: for some reason the third scheduler instance (openshift-kube-scheduler-ip-10-0-185-82.cn-north-1.compute.internal) had not yet been created. Given that your listing of scheduler instances shows openshift-kube-scheduler-ip-10-0-185-82.cn-north-1.compute.internal running, we can ignore the first log.
The second time, openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal is missing, which correlates with the list of instances you provided:
Reverting the changes back to normal did not bring back the pod at all and the operator remained in degraded state.
```
installer-9-ip-10-0-203-82.cn-north-1.compute.internal 0/1 Completed 0 4h11m
openshift-kube-scheduler-ip-10-0-185-82.cn-north-1.compute.internal 3/3 Running 0 4h9m
openshift-kube-scheduler-ip-10-0-203-82.cn-north-1.compute.internal 3/3 Running 0 4h11m
revision-pruner-4-ip-10-0-203-82.cn-north-1.compute.internal 0/1 Completed 0 5h28m
```
Kubelet logs for openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal from around 09:35:
```
Sep 17 09:33:23.307175 ip-10-0-154-184 hyperkube[1354]: I0917 09:33:23.307139 1354 kubelet.go:2086] "SyncLoop REMOVE" source="file" pods=[openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal]
Sep 17 09:33:23.309005 ip-10-0-154-184 hyperkube[1354]: I0917 09:33:23.308944 1354 kubelet.go:2076] "SyncLoop ADD" source="file" pods=[openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal]
Sep 17 09:33:23.315002 ip-10-0-154-184 hyperkube[1354]: I0917 09:33:23.313279 1354 kubelet.go:2083] "SyncLoop UPDATE" source="file" pods=[openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal]
Sep 17 09:33:23.318856 ip-10-0-154-184 hyperkube[1354]: I0917 09:33:23.318827 1354 kubelet.go:2086] "SyncLoop REMOVE" source="file" pods=[openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal]
Sep 17 09:33:24.124054 ip-10-0-154-184 hyperkube[1354]: I0917 09:33:24.123952 1354 prober.go:116] "Probe failed" probeType="Liveness" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" podUID=7965b44b-c369-487b-952a-b2138da7b3c4 containerName="kube-scheduler" probeResult=failure output="Get \"https://10.0.154.184:10259/healthz\": dial tcp 10.0.154.184:10259: connect: connection refused"
Sep 17 09:33:24.124318 ip-10-0-154-184 hyperkube[1354]: I0917 09:33:24.124276 1354 prober.go:116] "Probe failed" probeType="Readiness" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" podUID=7965b44b-c369-487b-952a-b2138da7b3c4 containerName="kube-scheduler" probeResult=failure output="Get \"https://10.0.154.184:10259/healthz\": dial tcp 10.0.154.184:10259: connect: connection refused"
```
This shows the kubelet registered that the static pod file on disk changed.
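Churn like this is easy to surface from a saved kubelet journal. A sketch, with a few sample lines standing in for real `journalctl -u kubelet` output:

```shell
# Count static-pod sync events (ADD/REMOVE/UPDATE) in a kubelet log.
log=$(mktemp)
cat > "$log" <<'EOF'
I0917 09:33:23.307139 1354 kubelet.go:2086] "SyncLoop REMOVE" source="file" pods=[openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal]
I0917 09:33:23.308944 1354 kubelet.go:2076] "SyncLoop ADD" source="file" pods=[openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal]
I0917 09:33:23.313279 1354 kubelet.go:2083] "SyncLoop UPDATE" source="file" pods=[openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal]
EOF
grep -oE '"SyncLoop (ADD|REMOVE|UPDATE)"' "$log" | sort | uniq -c
```

On a live master the same pipeline would be fed from `journalctl -u kubelet`; a burst of REMOVE/ADD pairs for one static pod points at the manifest file being rewritten.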
```
Sep 17 09:28:04.224578 ip-10-0-154-184 hyperkube[1354]: I0917 09:28:04.224543 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:29:04.224914 ip-10-0-154-184 hyperkube[1354]: I0917 09:29:04.224884 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:30:04.225713 ip-10-0-154-184 hyperkube[1354]: I0917 09:30:04.225632 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:31:04.227250 ip-10-0-154-184 hyperkube[1354]: I0917 09:31:04.226783 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:32:04.227866 ip-10-0-154-184 hyperkube[1354]: I0917 09:32:04.227829 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:33:04.229055 ip-10-0-154-184 hyperkube[1354]: I0917 09:33:04.228948 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:34:04.229222 ip-10-0-154-184 hyperkube[1354]: I0917 09:34:04.229229 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:35:04.229893 ip-10-0-154-184 hyperkube[1354]: I0917 09:35:04.229767 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:36:04.230128 ip-10-0-154-184 hyperkube[1354]: I0917 09:36:04.230040 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:37:04.230410 ip-10-0-154-184 hyperkube[1354]: I0917 09:37:04.230380 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:38:04.231844 ip-10-0-154-184 hyperkube[1354]: I0917 09:38:04.231469 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:39:04.233008 ip-10-0-154-184 hyperkube[1354]: I0917 09:39:04.231737 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:40:04.232948 ip-10-0-154-184 hyperkube[1354]: I0917 09:40:04.232890 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:41:04.234377 ip-10-0-154-184 hyperkube[1354]: I0917 09:41:04.233868 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:42:04.234716 ip-10-0-154-184 hyperkube[1354]: I0917 09:42:04.234616 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:43:04.235042 ip-10-0-154-184 hyperkube[1354]: I0917 09:43:04.235055 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:44:04.236589 ip-10-0-154-184 hyperkube[1354]: I0917 09:44:04.236238 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:45:04.236831 ip-10-0-154-184 hyperkube[1354]: I0917 09:45:04.236700 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:46:04.237611 ip-10-0-154-184 hyperkube[1354]: I0917 09:46:04.237621 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:47:04.237811 ip-10-0-154-184 hyperkube[1354]: I0917 09:47:04.237816 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:48:04.238839 ip-10-0-154-184 hyperkube[1354]: I0917 09:48:04.238745 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:49:04.239495 ip-10-0-154-184 hyperkube[1354]: I0917 09:49:04.239459 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:50:04.240366 ip-10-0-154-184 hyperkube[1354]: I0917 09:50:04.240253 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:51:04.240455 ip-10-0-154-184 hyperkube[1354]: I0917 09:51:04.240422 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:52:04.241090 ip-10-0-154-184 hyperkube[1354]: I0917 09:52:04.241058 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:53:04.242282 ip-10-0-154-184 hyperkube[1354]: I0917 09:53:04.242248 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:54:04.243539 ip-10-0-154-184 hyperkube[1354]: I0917 09:54:04.243445 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:55:04.243916 ip-10-0-154-184 hyperkube[1354]: I0917 09:55:04.243801 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:56:04.244825 ip-10-0-154-184 hyperkube[1354]: I0917 09:56:04.244792 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:57:04.246895 ip-10-0-154-184 hyperkube[1354]: I0917 09:57:04.246887 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:58:04.247939 ip-10-0-154-184 hyperkube[1354]: I0917 09:58:04.247912 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:59:04.248206 ip-10-0-154-184 hyperkube[1354]: I0917 09:59:04.248164 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:59:53.959059 ip-10-0-154-184 hyperkube[1354]: I0917 09:59:53.958953 1354 kubelet.go:2086] "SyncLoop REMOVE" source="file" pods=[openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal]
Sep 17 09:59:53.959630 ip-10-0-154-184 hyperkube[1354]: I0917 09:59:53.959573 1354 kubelet.go:2076] "SyncLoop ADD" source="file" pods=[openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal]
Sep 17 09:59:53.959668 ip-10-0-154-184 hyperkube[1354]: I0917 09:59:53.959657 1354 kubelet.go:2186] "SyncLoop (probe)" probe="liveness" status="unhealthy" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal"
```
These entries say openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal is running. Unfortunately, http://virt-openshift-05.lab.eng.nay.redhat.com/knarra/2005350/must-gather.local.4429227838798661642/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-514b5bd41b36c20014cd237c8aa7eec548e02df0b5d289ae15385eb2a8c6eebf/namespaces/openshift-kube-scheduler/pods/ does not contain the logs for the pod.
http://virt-openshift-05.lab.eng.nay.redhat.com/knarra/2005350/must-gather.local.4429227838798661642/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-514b5bd41b36c20014cd237c8aa7eec548e02df0b5d289ae15385eb2a8c6eebf/host_service_logs/masters/crio_service.log shows:
```
Sep 17 09:07:43.755407 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:07:43.755366931Z" level=info msg="Created container 5f02ec8ea775555f0293c64e8785baa7f8b3a7f381893a6dace10583f33eec57: openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler" id=d08b19bc-2ff7-4529-92b3-284fdc81f764 name=/runtime.v1alpha2.RuntimeService/CreateContainer
Sep 17 09:07:43.756022 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:07:43.755984608Z" level=info msg="Starting container: 5f02ec8ea775555f0293c64e8785baa7f8b3a7f381893a6dace10583f33eec57" id=808fa82a-bd49-46c2-92ee-3da3efebd548 name=/runtime.v1alpha2.RuntimeService/StartContainer
Sep 17 09:07:43.774435 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:07:43.774380005Z" level=info msg="Started container" PID=119747 containerID=5f02ec8ea775555f0293c64e8785baa7f8b3a7f381893a6dace10583f33eec57 description=openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler id=808fa82a-bd49-46c2-92ee-3da3efebd548 name=/runtime.v1alpha2.RuntimeService/StartContainer sandboxID=a3715e882740169c8d864a233805a4ca202d3aa0351e230fd123f280c0a2b427
...
Sep 17 09:07:44.230872 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:07:44.230827571Z" level=info msg="Created container a9e6202b81500de0528891e938c56af2b9a8b4837de01bab6c1ddbfe9c71de0e: openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler-recovery-controller" id=facf3fdb-60c7-4a96-b25a-84763c83251f name=/runtime.v1alpha2.RuntimeService/CreateContainer
Sep 17 09:07:44.231364 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:07:44.231318791Z" level=info msg="Starting container: a9e6202b81500de0528891e938c56af2b9a8b4837de01bab6c1ddbfe9c71de0e" id=7b73ea19-d925-4ad6-9b14-f8801d9f4e7a name=/runtime.v1alpha2.RuntimeService/StartContainer
Sep 17 09:07:44.249862 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:07:44.249231641Z" level=info msg="Started container" PID=119873 containerID=a9e6202b81500de0528891e938c56af2b9a8b4837de01bab6c1ddbfe9c71de0e description=openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler-recovery-controller id=7b73ea19-d925-4ad6-9b14-f8801d9f4e7a name=/runtime.v1alpha2.RuntimeService/StartContainer sandboxID=a3715e882740169c8d864a233805a4ca202d3aa0351e230fd123f280c0a2b427
...
Sep 17 09:07:43.986771 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:07:43.986725074Z" level=info msg="Created container 7871bf271902a208af526b4199b841d06d2d1c9ba18f1927999095a9d442ac91: openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler-cert-syncer" id=9a20a376-52bc-46ee-9656-ffc651d6a989 name=/runtime.v1alpha2.RuntimeService/CreateContainer
Sep 17 09:07:43.987452 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:07:43.987419608Z" level=info msg="Starting container: 7871bf271902a208af526b4199b841d06d2d1c9ba18f1927999095a9d442ac91" id=6891ae48-a6ba-4c43-b9a6-bcbbdd590017 name=/runtime.v1alpha2.RuntimeService/StartContainer
Sep 17 09:07:44.001663 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:07:44.001614185Z" level=info msg="Started container" PID=119825 containerID=7871bf271902a208af526b4199b841d06d2d1c9ba18f1927999095a9d442ac91 description=openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler-cert-syncer id=6891ae48-a6ba-4c43-b9a6-bcbbdd590017 name=/runtime.v1alpha2.RuntimeService/StartContainer sandboxID=a3715e882740169c8d864a233805a4ca202d3aa0351e230fd123f280c0a2b427
...
Sep 17 09:33:23.498782 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:33:23.498737761Z" level=info msg="Stopped container 5f02ec8ea775555f0293c64e8785baa7f8b3a7f381893a6dace10583f33eec57: openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler" id=f0b425bc-7788-4110-b534-9e8ccba2cb71 name=/runtime.v1alpha2.RuntimeService/StopContainer
Sep 17 09:33:23.515016 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:33:23.514964942Z" level=info msg="Stopped container a9e6202b81500de0528891e938c56af2b9a8b4837de01bab6c1ddbfe9c71de0e: openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler-recovery-controller" id=3ce1dcfb-530a-406d-b2f1-bcef3b814d7d name=/runtime.v1alpha2.RuntimeService/StopContainer
Sep 17 09:33:23.533013 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:33:23.532962967Z" level=info msg="Stopped container 7871bf271902a208af526b4199b841d06d2d1c9ba18f1927999095a9d442ac91: openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler-cert-syncer" id=2ea6122e-fd6d-4140-b2d8-5916a65862bc name=/runtime.v1alpha2.RuntimeService/StopContainer
...
Sep 17 10:00:15.841950 ip-10-0-154-184 crio[1323]: time="2021-09-17 10:00:15.841917034Z" level=info msg="Removed container a9e6202b81500de0528891e938c56af2b9a8b4837de01bab6c1ddbfe9c71de0e: openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler-recovery-controller" id=907ac0df-37a6-49fb-b745-95d9cfde5c78 name=/runtime.v1alpha2.RuntimeService/RemoveContainer
Sep 17 10:00:15.842934 ip-10-0-154-184 crio[1323]: time="2021-09-17 10:00:15.842906113Z" level=info msg="Removing container: 5f02ec8ea775555f0293c64e8785baa7f8b3a7f381893a6dace10583f33eec57" id=bd4a13d4-8907-4f5d-b56f-c89657a3aafe name=/runtime.v1alpha2.RuntimeService/RemoveContainer
Sep 17 10:00:15.860676 ip-10-0-154-184 crio[1323]: time="2021-09-17 10:00:15.860646998Z" level=info msg="Removed container 5f02ec8ea775555f0293c64e8785baa7f8b3a7f381893a6dace10583f33eec57: openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler" id=bd4a13d4-8907-4f5d-b56f-c89657a3aafe name=/runtime.v1alpha2.RuntimeService/RemoveContainer
Sep 17 10:00:15.861689 ip-10-0-154-184 crio[1323]: time="2021-09-17 10:00:15.861660152Z" level=info msg="Removing container: 52fdc3baf7f94f1511ed3a0413f6f3e7511ddbc7620c26f7272ba1f5c595725d" id=41f54833-90df-4f5c-a5f8-a1691aafd039 name=/runtime.v1alpha2.RuntimeService/RemoveContainer
Sep 17 10:00:15.909234 ip-10-0-154-184 crio[1323]: time="2021-09-17 10:00:15.909203533Z" level=info msg="Removed container 52fdc3baf7f94f1511ed3a0413f6f3e7511ddbc7620c26f7272ba1f5c595725d: openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/wait-for-host-port" id=41f54833-90df-4f5c-a5f8-a1691aafd039 name=/runtime.v1alpha2.RuntimeService/RemoveContainer
Sep 17 10:00:15.910528 ip-10-0-154-184 crio[1323]: time="2021-09-17 10:00:15.910499929Z" level=info msg="Removing container: 7871bf271902a208af526b4199b841d06d2d1c9ba18f1927999095a9d442ac91" id=04e0d12e-49cd-4a73-8ebe-747d2419f1f6 name=/runtime.v1alpha2.RuntimeService/RemoveContainer
Sep 17 10:00:15.929278 ip-10-0-154-184 crio[1323]: time="2021-09-17 10:00:15.929250492Z" level=info msg="Removed container 7871bf271902a208af526b4199b841d06d2d1c9ba18f1927999095a9d442ac91: openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler-cert-syncer" id=04e0d12e-49cd-4a73-8ebe-747d2419f1f6 name=/runtime.v1alpha2.RuntimeService/RemoveContainer
```
All three openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal containers were stopped at 09:33:23. Yet the kubelet logs keep reporting the pod as Running long after the crio logs say all of its containers were stopped, so the kubelet incorrectly reports the pod status.
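The discrepancy can be surfaced by comparing the last kubelet "Running" report with the first crio "Stopped container" entry. A sketch over abbreviated sample lines (a real analysis would run against the full kubelet and crio logs from the must-gather):

```shell
# Compare what the kubelet last claimed with what crio actually did.
kubelet_log=$(mktemp); crio_log=$(mktemp)
cat > "$kubelet_log" <<'EOF'
Sep 17 09:59:04.248206 ip-10-0-154-184 hyperkube[1354]: "Pod status updated" status=Running
EOF
cat > "$crio_log" <<'EOF'
Sep 17 09:33:23.498782 ip-10-0-154-184 crio[1323]: msg="Stopped container 5f02ec8e"
EOF
echo "kubelet last Running: $(grep 'status=Running' "$kubelet_log" | tail -n1 | awk '{print $1, $2, $3}')"
echo "crio first stop:      $(grep 'Stopped container' "$crio_log" | head -n1 | awk '{print $1, $2, $3}')"
```

Here the kubelet still claims Running at 09:59:04, roughly 26 minutes after crio stopped the containers at 09:33:23.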
Hello Harshal,
I see that the test failed only once, on one profile (AWS, China region), and I did not hit the issue again after that. I can run the test on the same profile and see whether it reproduces. Is there anything else you would like me to look at in particular?
Thanks
kasturi
Hello Harshal,
I ran this test with the latest CI builds on a profile different from the one where the bug was originally reported, and I do not see the issue there. We have a problem bringing up the original profile; once it is up, I will run the case there as well and provide the information.
```
[knarra@knarra cucushift]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-11-20-024310   True        False         9h      Cluster version is 4.9.0-0.nightly-2021-11-20-024310
```
Thanks
kasturi
@Harshal Patil, I think the bug should be moved to ON_QA rather than CLOSED -> CURRENT RELEASE, as there is a fix which has gone into the payload. WDYT?