Bug 2005350 - kube-scheduler operator went to degraded state with error StaticPodsDegraded: pod/openshift-kube-scheduler container "kube-scheduler" is terminated: Completed
Summary: kube-scheduler operator went to degraded state with error StaticPodsDegraded:...
Keywords:
Status: CLOSED DUPLICATE of bug 2023866
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Harshal Patil
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-09-17 13:30 UTC by RamaKasturi
Modified: 2021-11-25 11:31 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-11-25 06:00:25 UTC
Target Upstream Version:
Embargoed:



Description RamaKasturi 2021-09-17 13:30:23 UTC
Description of problem:
When the feature gate LocalStorageCapacityIsolation=False is enabled via /etc/kubernetes/manifests/kube-scheduler-pod.yaml, the kube-scheduler operator goes to a degraded state with the error message below, and the pod ends up Completed instead of Running.

kube-scheduler                             4.9.0-0.nightly-2021-09-16-215330   True        False         True       116m    StaticPodsDegraded: pod/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal container "kube-scheduler" is terminated: Completed: ...
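For reference, a degraded status line like the one above is what the cluster operator listing typically shows; a sketch of commands to pull it (the exact invocation used here is not recorded in this report):

```
# Show the kube-scheduler cluster operator (DEGRADED column) and the Degraded message
oc get clusteroperator kube-scheduler
oc get clusteroperator kube-scheduler \
  -o jsonpath='{.status.conditions[?(@.type=="Degraded")].message}{"\n"}'
```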

Version-Release number of selected component (if applicable):
[knarra@knarra OpenShift]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-09-16-215330   True        False         5h1m    Error while reconciling 4.9.0-0.nightly-2021-09-16-215330: the cluster operator kube-scheduler is degraded


How reproducible:
Hit it once

Steps to Reproduce:
1. Install a 4.9 cluster with the latest bits.
2. Enable LocalStorageCapacityIsolation=False via the /etc/kubernetes/manifests/kube-scheduler-pod.yaml file (see the sketch after these steps).
3. Wait for the pods to get restarted.
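A minimal sketch of step 2, assuming the gate is passed to kube-scheduler through its standard --feature-gates flag by editing the static pod manifest directly on a master node (the operator-rendered manifest structure differs, and the node name is a placeholder):

```
# Find where the kube-scheduler arguments live in the static pod manifest
oc debug node/<master-node> -- chroot /host \
  grep -n 'kube-scheduler' /etc/kubernetes/manifests/kube-scheduler-pod.yaml
# Then append to the kube-scheduler container's arguments in that file:
#   --feature-gates=LocalStorageCapacityIsolation=false
# The kubelet watches /etc/kubernetes/manifests and restarts the static pod
# when the file changes.
```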

Actual results:
The pod on the node where the feature gate was enabled does not come back to the Running state; it stays in the Completed state and the kube-scheduler operator is degraded.

Expected results:
kube-scheduler operator should not be degraded.

Additional info:
Reverting the changes did not bring back the pod at all, and the operator remained in a degraded state.
installer-9-ip-10-0-203-82.cn-north-1.compute.internal                0/1     Completed   0          4h11m
openshift-kube-scheduler-ip-10-0-185-82.cn-north-1.compute.internal   3/3     Running     0          4h9m
openshift-kube-scheduler-ip-10-0-203-82.cn-north-1.compute.internal   3/3     Running     0          4h11m
revision-pruner-4-ip-10-0-203-82.cn-north-1.compute.internal          0/1     Completed   0          5h28m
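The listing above can be re-checked at any time with something like the following (a sketch; -o wide also shows which node each pod runs on):

```
# List the scheduler static pods, installer and revision-pruner pods per node
oc -n openshift-kube-scheduler get pods -o wide
```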

Comment 2 Filip Krepinsky 2021-09-21 21:34:59 UTC
The symptoms are very similar to https://bugzilla.redhat.com/show_bug.cgi?id=2005225

Comment 3 Jan Chaloupka 2021-09-22 18:10:49 UTC
> 2. Now enable LocalStorageCapacityIsolation=False via /etc/kubernetes/manifests/kube-scheduler-pod.yaml file

Can you elaborate on this step? Do you change /etc/kubernetes/manifests/kube-scheduler-pod.yaml directly on one of the master nodes, or do you use a different mechanism for enabling the LocalStorageCapacityIsolation feature? Are the steps to enable the feature documented?

Comment 5 Jan Chaloupka 2021-11-16 13:57:54 UTC
```
2021-09-17T07:51:35.744864602Z I0917 07:51:35.744825       1 status_controller.go:211] clusteroperator/kube-scheduler diff {"status":{"conditions":[{"lastTransitionTime":"2021-09-17T07:49:33Z","message":"StaticPodsDegraded: pods \"openshift-kube-scheduler-ip-10-0-185-82.cn-north-1.compute.internal\" not found","reason":"StaticPods_Error","status":"True","type":"Degraded"},{"lastTransitionTime":"2021-09-17T07:47:32Z","message":"NodeInstallerProgressing: 1 nodes are at revision 0; 1 nodes are at revision 4; 1 nodes are at revision 5; 0 nodes have achieved new revision 6","reason":"NodeInstaller","status":"True","type":"Progressing"},{"lastTransitionTime":"2021-09-17T07:49:24Z","message":"StaticPodsAvailable: 2 nodes are active; 1 nodes are at revision 0; 1 nodes are at revision 4; 1 nodes are at revision 5; 0 nodes have achieved new revision 6","reason":"AsExpected","status":"True","type":"Available"},{"lastTransitionTime":"2021-09-17T07:47:27Z","message":"All is well","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}
...
2021-09-17T07:51:45.411972239Z I0917 07:51:45.411911       1 status_controller.go:211] clusteroperator/kube-scheduler diff {"status":{"conditions":[{"lastTransitionTime":"2021-09-17T07:51:45Z","message":"NodeControllerDegraded: All master nodes are ready","reason":"AsExpected","status":"False","type":"Degraded"},{"lastTransitionTime":"2021-09-17T07:47:32Z","message":"NodeInstallerProgressing: 1 nodes are at revision 0; 1 nodes are at revision 4; 1 nodes are at revision 5; 0 nodes have achieved new revision 6","reason":"NodeInstaller","status":"True","type":"Progressing"},{"lastTransitionTime":"2021-09-17T07:49:24Z","message":"StaticPodsAvailable: 2 nodes are active; 1 nodes are at revision 0; 1 nodes are at revision 4; 1 nodes are at revision 5; 0 nodes have achieved new revision 6","reason":"AsExpected","status":"True","type":"Available"},{"lastTransitionTime":"2021-09-17T07:47:27Z","message":"All is well","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}
```

The first log line says the status Degraded=True occurred at 2021-09-17T07:49:33Z and lasted until 2021-09-17T07:51:45Z (based on the second log line).
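The same transitions can also be read straight from the clusteroperator object instead of the operator logs, for example (a sketch):

```
# Print the full Degraded condition, including its lastTransitionTime
oc get clusteroperator kube-scheduler \
  -o jsonpath='{.status.conditions[?(@.type=="Degraded")]}{"\n"}'
```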

```
2021-09-17T09:59:54.373055582Z I0917 09:59:54.372673       1 status_controller.go:211] clusteroperator/kube-scheduler diff {"status":{"conditions":[{"lastTransitionTime":"2021-09-17T09:35:47Z","message":"StaticPodsDegraded: pods \"openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal\" not found","reason":"StaticPods_Error","status":"True","type":"Degraded"},{"lastTransitionTime":"2021-09-17T09:09:32Z","message":"NodeInstallerProgressing: 3 nodes are at revision 9","reason":"AsExpected","status":"False","type":"Progressing"},{"lastTransitionTime":"2021-09-17T07:49:24Z","message":"StaticPodsAvailable: 3 nodes are active; 3 nodes are at revision 9","reason":"AsExpected","status":"True","type":"Available"},{"lastTransitionTime":"2021-09-17T07:47:27Z","message":"All is well","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}
```

logs another occurrence of Degraded changing to True, lasting from 2021-09-17T09:35:47Z until at least 09:59:54 UTC.

The first time the Degraded=True condition appeared corresponds with the bootstrapping process, as (for some reason) the third scheduler instance (openshift-kube-scheduler-ip-10-0-185-82.cn-north-1.compute.internal) had not been created yet. Given that your listing of scheduler instances shows openshift-kube-scheduler-ip-10-0-185-82.cn-north-1.compute.internal running, we can ignore the first log.

The second time, openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal is missing, which correlates with the list of instances you provided:
```
Reverting the changes back to normal did not bring back the pod at all and the operator remained in degraded state.
installer-9-ip-10-0-203-82.cn-north-1.compute.internal                0/1     Completed   0          4h11m
openshift-kube-scheduler-ip-10-0-185-82.cn-north-1.compute.internal   3/3     Running     0          4h9m
openshift-kube-scheduler-ip-10-0-203-82.cn-north-1.compute.internal   3/3     Running     0          4h11m
revision-pruner-4-ip-10-0-203-82.cn-north-1.compute.internal          0/1     Completed   0          5h28m
```

Kubelet logs for openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal from around 09:35:
```
Sep 17 09:33:23.307175 ip-10-0-154-184 hyperkube[1354]: I0917 09:33:23.307139    1354 kubelet.go:2086] "SyncLoop REMOVE" source="file" pods=[openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal]
Sep 17 09:33:23.309005 ip-10-0-154-184 hyperkube[1354]: I0917 09:33:23.308944    1354 kubelet.go:2076] "SyncLoop ADD" source="file" pods=[openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal]
Sep 17 09:33:23.315002 ip-10-0-154-184 hyperkube[1354]: I0917 09:33:23.313279    1354 kubelet.go:2083] "SyncLoop UPDATE" source="file" pods=[openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal]
Sep 17 09:33:23.318856 ip-10-0-154-184 hyperkube[1354]: I0917 09:33:23.318827    1354 kubelet.go:2086] "SyncLoop REMOVE" source="file" pods=[openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal]
Sep 17 09:33:24.124054 ip-10-0-154-184 hyperkube[1354]: I0917 09:33:24.123952    1354 prober.go:116] "Probe failed" probeType="Liveness" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" podUID=7965b44b-c369-487b-952a-b2138da7b3c4 containerName="kube-scheduler" probeResult=failure output="Get \"https://10.0.154.184:10259/healthz\": dial tcp 10.0.154.184:10259: connect: connection refused"
Sep 17 09:33:24.124318 ip-10-0-154-184 hyperkube[1354]: I0917 09:33:24.124276    1354 prober.go:116] "Probe failed" probeType="Readiness" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" podUID=7965b44b-c369-487b-952a-b2138da7b3c4 containerName="kube-scheduler" probeResult=failure output="Get \"https://10.0.154.184:10259/healthz\": dial tcp 10.0.154.184:10259: connect: connection refused"
```

This shows that the kubelet registered that the file on disk had changed.
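Kubelet SyncLoop entries like the ones above can be pulled from the node with something along these lines (a sketch):

```
# Fetch the kubelet journal from the node and filter for static pod sync events
oc adm node-logs ip-10-0-154-184.cn-north-1.compute.internal -u kubelet | grep 'SyncLoop'
```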

```
Sep 17 09:28:04.224578 ip-10-0-154-184 hyperkube[1354]: I0917 09:28:04.224543    1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:29:04.224914 ip-10-0-154-184 hyperkube[1354]: I0917 09:29:04.224884    1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:30:04.225713 ip-10-0-154-184 hyperkube[1354]: I0917 09:30:04.225632    1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:31:04.227250 ip-10-0-154-184 hyperkube[1354]: I0917 09:31:04.226783    1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:32:04.227866 ip-10-0-154-184 hyperkube[1354]: I0917 09:32:04.227829    1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:33:04.229055 ip-10-0-154-184 hyperkube[1354]: I0917 09:33:04.228948    1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:34:04.229222 ip-10-0-154-184 hyperkube[1354]: I0917 09:34:04.229229    1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:35:04.229893 ip-10-0-154-184 hyperkube[1354]: I0917 09:35:04.229767    1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:36:04.230128 ip-10-0-154-184 hyperkube[1354]: I0917 09:36:04.230040    1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:37:04.230410 ip-10-0-154-184 hyperkube[1354]: I0917 09:37:04.230380    1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:38:04.231844 ip-10-0-154-184 hyperkube[1354]: I0917 09:38:04.231469    1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:39:04.233008 ip-10-0-154-184 hyperkube[1354]: I0917 09:39:04.231737    1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:40:04.232948 ip-10-0-154-184 hyperkube[1354]: I0917 09:40:04.232890    1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:41:04.234377 ip-10-0-154-184 hyperkube[1354]: I0917 09:41:04.233868    1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:42:04.234716 ip-10-0-154-184 hyperkube[1354]: I0917 09:42:04.234616    1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:43:04.235042 ip-10-0-154-184 hyperkube[1354]: I0917 09:43:04.235055    1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:44:04.236589 ip-10-0-154-184 hyperkube[1354]: I0917 09:44:04.236238    1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:45:04.236831 ip-10-0-154-184 hyperkube[1354]: I0917 09:45:04.236700    1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:46:04.237611 ip-10-0-154-184 hyperkube[1354]: I0917 09:46:04.237621    1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:47:04.237811 ip-10-0-154-184 hyperkube[1354]: I0917 09:47:04.237816    1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:48:04.238839 ip-10-0-154-184 hyperkube[1354]: I0917 09:48:04.238745    1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:49:04.239495 ip-10-0-154-184 hyperkube[1354]: I0917 09:49:04.239459    1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:50:04.240366 ip-10-0-154-184 hyperkube[1354]: I0917 09:50:04.240253    1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:51:04.240455 ip-10-0-154-184 hyperkube[1354]: I0917 09:51:04.240422    1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:52:04.241090 ip-10-0-154-184 hyperkube[1354]: I0917 09:52:04.241058    1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:53:04.242282 ip-10-0-154-184 hyperkube[1354]: I0917 09:53:04.242248    1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:54:04.243539 ip-10-0-154-184 hyperkube[1354]: I0917 09:54:04.243445    1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:55:04.243916 ip-10-0-154-184 hyperkube[1354]: I0917 09:55:04.243801    1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:56:04.244825 ip-10-0-154-184 hyperkube[1354]: I0917 09:56:04.244792    1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:57:04.246895 ip-10-0-154-184 hyperkube[1354]: I0917 09:57:04.246887    1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:58:04.247939 ip-10-0-154-184 hyperkube[1354]: I0917 09:58:04.247912    1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:59:04.248206 ip-10-0-154-184 hyperkube[1354]: I0917 09:59:04.248164    1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:59:53.959059 ip-10-0-154-184 hyperkube[1354]: I0917 09:59:53.958953    1354 kubelet.go:2086] "SyncLoop REMOVE" source="file" pods=[openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal]
Sep 17 09:59:53.959630 ip-10-0-154-184 hyperkube[1354]: I0917 09:59:53.959573    1354 kubelet.go:2076] "SyncLoop ADD" source="file" pods=[openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal]
Sep 17 09:59:53.959668 ip-10-0-154-184 hyperkube[1354]: I0917 09:59:53.959657    1354 kubelet.go:2186] "SyncLoop (probe)" probe="liveness" status="unhealthy" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal"
```

This says openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal is running. Unfortunately, http://virt-openshift-05.lab.eng.nay.redhat.com/knarra/2005350/must-gather.local.4429227838798661642/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-514b5bd41b36c20014cd237c8aa7eec548e02df0b5d289ae15385eb2a8c6eebf/namespaces/openshift-kube-scheduler/pods/ does not have the logs for the pod.

http://virt-openshift-05.lab.eng.nay.redhat.com/knarra/2005350/must-gather.local.4429227838798661642/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-514b5bd41b36c20014cd237c8aa7eec548e02df0b5d289ae15385eb2a8c6eebf/host_service_logs/masters/crio_service.log shows:
```
Sep 17 09:07:43.755407 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:07:43.755366931Z" level=info msg="Created container 5f02ec8ea775555f0293c64e8785baa7f8b3a7f381893a6dace10583f33eec57: openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler" id=d08b19bc-2ff7-4529-92b3-284fdc81f764 name=/runtime.v1alpha2.RuntimeService/CreateContainer
Sep 17 09:07:43.756022 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:07:43.755984608Z" level=info msg="Starting container: 5f02ec8ea775555f0293c64e8785baa7f8b3a7f381893a6dace10583f33eec57" id=808fa82a-bd49-46c2-92ee-3da3efebd548 name=/runtime.v1alpha2.RuntimeService/StartContainer
Sep 17 09:07:43.774435 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:07:43.774380005Z" level=info msg="Started container" PID=119747 containerID=5f02ec8ea775555f0293c64e8785baa7f8b3a7f381893a6dace10583f33eec57 description=openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler id=808fa82a-bd49-46c2-92ee-3da3efebd548 name=/runtime.v1alpha2.RuntimeService/StartContainer sandboxID=a3715e882740169c8d864a233805a4ca202d3aa0351e230fd123f280c0a2b427
...
Sep 17 09:07:44.230872 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:07:44.230827571Z" level=info msg="Created container a9e6202b81500de0528891e938c56af2b9a8b4837de01bab6c1ddbfe9c71de0e: openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler-recovery-controller" id=facf3fdb-60c7-4a96-b25a-84763c83251f name=/runtime.v1alpha2.RuntimeService/CreateContainer
Sep 17 09:07:44.231364 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:07:44.231318791Z" level=info msg="Starting container: a9e6202b81500de0528891e938c56af2b9a8b4837de01bab6c1ddbfe9c71de0e" id=7b73ea19-d925-4ad6-9b14-f8801d9f4e7a name=/runtime.v1alpha2.RuntimeService/StartContainer
Sep 17 09:07:44.249862 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:07:44.249231641Z" level=info msg="Started container" PID=119873 containerID=a9e6202b81500de0528891e938c56af2b9a8b4837de01bab6c1ddbfe9c71de0e description=openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler-recovery-controller id=7b73ea19-d925-4ad6-9b14-f8801d9f4e7a name=/runtime.v1alpha2.RuntimeService/StartContainer sandboxID=a3715e882740169c8d864a233805a4ca202d3aa0351e230fd123f280c0a2b427
...
Sep 17 09:07:43.986771 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:07:43.986725074Z" level=info msg="Created container 7871bf271902a208af526b4199b841d06d2d1c9ba18f1927999095a9d442ac91: openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler-cert-syncer" id=9a20a376-52bc-46ee-9656-ffc651d6a989 name=/runtime.v1alpha2.RuntimeService/CreateContainer
Sep 17 09:07:43.987452 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:07:43.987419608Z" level=info msg="Starting container: 7871bf271902a208af526b4199b841d06d2d1c9ba18f1927999095a9d442ac91" id=6891ae48-a6ba-4c43-b9a6-bcbbdd590017 name=/runtime.v1alpha2.RuntimeService/StartContainer
Sep 17 09:07:44.001663 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:07:44.001614185Z" level=info msg="Started container" PID=119825 containerID=7871bf271902a208af526b4199b841d06d2d1c9ba18f1927999095a9d442ac91 description=openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler-cert-syncer id=6891ae48-a6ba-4c43-b9a6-bcbbdd590017 name=/runtime.v1alpha2.RuntimeService/StartContainer sandboxID=a3715e882740169c8d864a233805a4ca202d3aa0351e230fd123f280c0a2b427
...
Sep 17 09:33:23.498782 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:33:23.498737761Z" level=info msg="Stopped container 5f02ec8ea775555f0293c64e8785baa7f8b3a7f381893a6dace10583f33eec57: openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler" id=f0b425bc-7788-4110-b534-9e8ccba2cb71 name=/runtime.v1alpha2.RuntimeService/StopContainer
Sep 17 09:33:23.515016 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:33:23.514964942Z" level=info msg="Stopped container a9e6202b81500de0528891e938c56af2b9a8b4837de01bab6c1ddbfe9c71de0e: openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler-recovery-controller" id=3ce1dcfb-530a-406d-b2f1-bcef3b814d7d name=/runtime.v1alpha2.RuntimeService/StopContainer
Sep 17 09:33:23.533013 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:33:23.532962967Z" level=info msg="Stopped container 7871bf271902a208af526b4199b841d06d2d1c9ba18f1927999095a9d442ac91: openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler-cert-syncer" id=2ea6122e-fd6d-4140-b2d8-5916a65862bc name=/runtime.v1alpha2.RuntimeService/StopContainer
...
Sep 17 10:00:15.841950 ip-10-0-154-184 crio[1323]: time="2021-09-17 10:00:15.841917034Z" level=info msg="Removed container a9e6202b81500de0528891e938c56af2b9a8b4837de01bab6c1ddbfe9c71de0e: openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler-recovery-controller" id=907ac0df-37a6-49fb-b745-95d9cfde5c78 name=/runtime.v1alpha2.RuntimeService/RemoveContainer
Sep 17 10:00:15.842934 ip-10-0-154-184 crio[1323]: time="2021-09-17 10:00:15.842906113Z" level=info msg="Removing container: 5f02ec8ea775555f0293c64e8785baa7f8b3a7f381893a6dace10583f33eec57" id=bd4a13d4-8907-4f5d-b56f-c89657a3aafe name=/runtime.v1alpha2.RuntimeService/RemoveContainer
Sep 17 10:00:15.860676 ip-10-0-154-184 crio[1323]: time="2021-09-17 10:00:15.860646998Z" level=info msg="Removed container 5f02ec8ea775555f0293c64e8785baa7f8b3a7f381893a6dace10583f33eec57: openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler" id=bd4a13d4-8907-4f5d-b56f-c89657a3aafe name=/runtime.v1alpha2.RuntimeService/RemoveContainer
Sep 17 10:00:15.861689 ip-10-0-154-184 crio[1323]: time="2021-09-17 10:00:15.861660152Z" level=info msg="Removing container: 52fdc3baf7f94f1511ed3a0413f6f3e7511ddbc7620c26f7272ba1f5c595725d" id=41f54833-90df-4f5c-a5f8-a1691aafd039 name=/runtime.v1alpha2.RuntimeService/RemoveContainer
Sep 17 10:00:15.909234 ip-10-0-154-184 crio[1323]: time="2021-09-17 10:00:15.909203533Z" level=info msg="Removed container 52fdc3baf7f94f1511ed3a0413f6f3e7511ddbc7620c26f7272ba1f5c595725d: openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/wait-for-host-port" id=41f54833-90df-4f5c-a5f8-a1691aafd039 name=/runtime.v1alpha2.RuntimeService/RemoveContainer
Sep 17 10:00:15.910528 ip-10-0-154-184 crio[1323]: time="2021-09-17 10:00:15.910499929Z" level=info msg="Removing container: 7871bf271902a208af526b4199b841d06d2d1c9ba18f1927999095a9d442ac91" id=04e0d12e-49cd-4a73-8ebe-747d2419f1f6 name=/runtime.v1alpha2.RuntimeService/RemoveContainer
Sep 17 10:00:15.929278 ip-10-0-154-184 crio[1323]: time="2021-09-17 10:00:15.929250492Z" level=info msg="Removed container 7871bf271902a208af526b4199b841d06d2d1c9ba18f1927999095a9d442ac91: openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler-cert-syncer" id=04e0d12e-49cd-4a73-8ebe-747d2419f1f6 name=/runtime.v1alpha2.RuntimeService/RemoveContainer
```

All three openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal containers were stopped at 09:33:23. Correlating this with the kubelet logs, which clearly report the pod as Running even though the CRI-O logs say all of its containers are stopped, the kubelet is incorrectly reporting the pod status.
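To confirm such a mismatch on a live cluster, one way (a sketch; pod and node names taken from this report) is to compare what the API server reports for the static pod with what CRI-O actually has on the node:

```
# Pod phase as reported via the kubelet/API server
oc -n openshift-kube-scheduler get pod \
  openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal \
  -o jsonpath='{.status.phase}{"\n"}'
# Containers as CRI-O actually sees them on the node
oc debug node/ip-10-0-154-184.cn-north-1.compute.internal -- chroot /host \
  crictl ps -a --name kube-scheduler
# CRI-O journal from the node, filtered for the scheduler containers
oc adm node-logs ip-10-0-154-184.cn-north-1.compute.internal -u crio | grep kube-scheduler
```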

Comment 7 RamaKasturi 2021-11-18 11:41:09 UTC
Hello Harshal,

    I see that the test has only failed once, on one profile (AWS in the China region), and after that I did not hit the issue again. But I can run the test on the same profile and see if that works. Is there anything else you would want me to look at in particular?

Thanks
kasturi

Comment 9 RamaKasturi 2021-11-22 17:50:04 UTC
Hello Harshal,

    I tried to run this test with the latest CI builds on a profile different from the one where the bug was actually reported, and I do not see the issue happening there. We have a problem bringing up that profile; once it is up I will run the case there as well and provide the information.

[knarra@knarra cucushift]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-11-20-024310   True        False         9h      Cluster version is 4.9.0-0.nightly-2021-11-20-024310


Thanks
kasturi

Comment 11 RamaKasturi 2021-11-25 11:18:32 UTC
@Harshal Patil, I think the bug should be moved to ON_QA rather than CLOSED -> CURRENT RELEASE, as there is a fix which has gone into the payload. WDYT?

