Description of problem:
When setting the feature gate LocalStorageCapacityIsolation=False via /etc/kubernetes/manifests/kube-scheduler-pod.yaml, the kube-scheduler operator goes to a degraded state with the error message below, and the pod ends up Completed instead of Running.

kube-scheduler   4.9.0-0.nightly-2021-09-16-215330   True   False   True   116m
StaticPodsDegraded: pod/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal container "kube-scheduler" is terminated: Completed: ...

Version-Release number of selected component (if applicable):
[knarra@knarra OpenShift]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-09-16-215330   True        False         5h1m    Error while reconciling 4.9.0-0.nightly-2021-09-16-215330: the cluster operator kube-scheduler is degraded

How reproducible:
Hit it once

Steps to Reproduce:
1. Install a 4.9 cluster with the latest bits.
2. Set LocalStorageCapacityIsolation=False via the /etc/kubernetes/manifests/kube-scheduler-pod.yaml file (see the illustrative sketch of this edit below).
3. Wait for the pods to get restarted.

Actual results:
The pod on the node where the feature gate was changed does not come back to the Running state; it stays in the Completed state and the kube-scheduler operator is degraded.

Expected results:
The kube-scheduler operator should not be degraded.

Additional info:
Reverting the changes back to normal did not bring the pod back at all, and the operator remained in a degraded state.

installer-9-ip-10-0-203-82.cn-north-1.compute.internal                 0/1   Completed   0   4h11m
openshift-kube-scheduler-ip-10-0-185-82.cn-north-1.compute.internal    3/3   Running     0   4h9m
openshift-kube-scheduler-ip-10-0-203-82.cn-north-1.compute.internal    3/3   Running     0   4h11m
revision-pruner-4-ip-10-0-203-82.cn-north-1.compute.internal           0/1   Completed   0   5h28m
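For clarity, step 2 refers to editing the static pod manifest directly on the node. A minimal, hypothetical sketch of that edit is below; the real kube-scheduler-pod.yaml is managed by the operator and its contents differ per cluster and revision, so the flag shown is only an illustration:

```
# Hypothetical illustration of step 2 (run on one master node).
$ sudo vi /etc/kubernetes/manifests/kube-scheduler-pod.yaml
# ...in the kube-scheduler container's command/args, add something like:
#     - --feature-gates=LocalStorageCapacityIsolation=false
# The kubelet watches /etc/kubernetes/manifests and recreates the static pod
# when this file changes (the SyncLoop REMOVE/ADD events analyzed below).
```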
Link to must-gather: http://virt-openshift-05.lab.eng.nay.redhat.com/knarra/2005350/must-gather.local.4429227838798661642/
The symptoms are very similar to https://bugzilla.redhat.com/show_bug.cgi?id=2005225
> 2. Now enable LocalStorageCapacityIsolation=False via /etc/kubernetes/manifests/kube-scheduler-pod.yaml file

Can you elaborate on this step? Do you change /etc/kubernetes/manifests/kube-scheduler-pod.yaml directly on one of the master nodes, or do you use a different mechanism for enabling the LocalStorageCapacityIsolation feature? Are the steps to enable the feature documented?
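For reference, the supported way to toggle feature gates cluster-wide in OpenShift is the FeatureGate custom resource rather than hand-editing static pod manifests. A sketch, assuming the CustomNoUpgrade feature set is acceptable for this test cluster (setting it cannot be undone and makes the cluster non-upgradable):

```
# Sketch only; CustomNoUpgrade is irreversible and blocks upgrades.
$ oc patch featuregate cluster --type merge \
    -p '{"spec":{"featureSet":"CustomNoUpgrade","customNoUpgrade":{"disabled":["LocalStorageCapacityIsolation"]}}}'
# The operators then roll out new static pod revisions with the gate applied.
```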
```
2021-09-17T07:51:35.744864602Z I0917 07:51:35.744825 1 status_controller.go:211] clusteroperator/kube-scheduler diff {"status":{"conditions":[{"lastTransitionTime":"2021-09-17T07:49:33Z","message":"StaticPodsDegraded: pods \"openshift-kube-scheduler-ip-10-0-185-82.cn-north-1.compute.internal\" not found","reason":"StaticPods_Error","status":"True","type":"Degraded"},{"lastTransitionTime":"2021-09-17T07:47:32Z","message":"NodeInstallerProgressing: 1 nodes are at revision 0; 1 nodes are at revision 4; 1 nodes are at revision 5; 0 nodes have achieved new revision 6","reason":"NodeInstaller","status":"True","type":"Progressing"},{"lastTransitionTime":"2021-09-17T07:49:24Z","message":"StaticPodsAvailable: 2 nodes are active; 1 nodes are at revision 0; 1 nodes are at revision 4; 1 nodes are at revision 5; 0 nodes have achieved new revision 6","reason":"AsExpected","status":"True","type":"Available"},{"lastTransitionTime":"2021-09-17T07:47:27Z","message":"All is well","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}
...
2021-09-17T07:51:45.411972239Z I0917 07:51:45.411911 1 status_controller.go:211] clusteroperator/kube-scheduler diff {"status":{"conditions":[{"lastTransitionTime":"2021-09-17T07:51:45Z","message":"NodeControllerDegraded: All master nodes are ready","reason":"AsExpected","status":"False","type":"Degraded"},{"lastTransitionTime":"2021-09-17T07:47:32Z","message":"NodeInstallerProgressing: 1 nodes are at revision 0; 1 nodes are at revision 4; 1 nodes are at revision 5; 0 nodes have achieved new revision 6","reason":"NodeInstaller","status":"True","type":"Progressing"},{"lastTransitionTime":"2021-09-17T07:49:24Z","message":"StaticPodsAvailable: 2 nodes are active; 1 nodes are at revision 0; 1 nodes are at revision 4; 1 nodes are at revision 5; 0 nodes have achieved new revision 6","reason":"AsExpected","status":"True","type":"Available"},{"lastTransitionTime":"2021-09-17T07:47:27Z","message":"All is well","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}
```

The first log line says the Degraded=True condition appeared at 2021-09-17T07:49:33Z and lasted until 2021-09-17T07:51:45Z (based on the second log line).

```
2021-09-17T09:59:54.373055582Z I0917 09:59:54.372673 1 status_controller.go:211] clusteroperator/kube-scheduler diff {"status":{"conditions":[{"lastTransitionTime":"2021-09-17T09:35:47Z","message":"StaticPodsDegraded: pods \"openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal\" not found","reason":"StaticPods_Error","status":"True","type":"Degraded"},{"lastTransitionTime":"2021-09-17T09:09:32Z","message":"NodeInstallerProgressing: 3 nodes are at revision 9","reason":"AsExpected","status":"False","type":"Progressing"},{"lastTransitionTime":"2021-09-17T07:49:24Z","message":"StaticPodsAvailable: 3 nodes are active; 3 nodes are at revision 9","reason":"AsExpected","status":"True","type":"Available"},{"lastTransitionTime":"2021-09-17T07:47:27Z","message":"All is well","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}
```

This logs another occurrence of Degraded changing to True, lasting from 2021-09-17T09:35:47Z until at least 09:59:54.372673. The first time the Degraded=True condition appeared corresponds with the bootstrapping process, as (for some reason) the third scheduler instance (openshift-kube-scheduler-ip-10-0-185-82.cn-north-1.compute.internal) had not yet been created.
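To see the same condition transitions on a live cluster without digging through the operator logs, the clusteroperator status can be queried directly; an illustrative command (plain oc jsonpath, nothing specific to this bug):

```
$ oc get clusteroperator kube-scheduler \
    -o jsonpath='{.status.conditions[?(@.type=="Degraded")]}{"\n"}'
```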
Given that your listing of scheduler instances has openshift-kube-scheduler-ip-10-0-185-82.cn-north-1.compute.internal running, we can ignore the first log. The second time, openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal is missing, which correlates with the list of instances you provided:

```
Reverting the changes back to normal did not bring back the pod at all and the operator remained in degraded state.
installer-9-ip-10-0-203-82.cn-north-1.compute.internal                 0/1   Completed   0   4h11m
openshift-kube-scheduler-ip-10-0-185-82.cn-north-1.compute.internal    3/3   Running     0   4h9m
openshift-kube-scheduler-ip-10-0-203-82.cn-north-1.compute.internal    3/3   Running     0   4h11m
revision-pruner-4-ip-10-0-203-82.cn-north-1.compute.internal           0/1   Completed   0   5h28m
```

Kubelet logs for openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal from around 09:35:

```
Sep 17 09:33:23.307175 ip-10-0-154-184 hyperkube[1354]: I0917 09:33:23.307139 1354 kubelet.go:2086] "SyncLoop REMOVE" source="file" pods=[openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal]
Sep 17 09:33:23.309005 ip-10-0-154-184 hyperkube[1354]: I0917 09:33:23.308944 1354 kubelet.go:2076] "SyncLoop ADD" source="file" pods=[openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal]
Sep 17 09:33:23.315002 ip-10-0-154-184 hyperkube[1354]: I0917 09:33:23.313279 1354 kubelet.go:2083] "SyncLoop UPDATE" source="file" pods=[openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal]
Sep 17 09:33:23.318856 ip-10-0-154-184 hyperkube[1354]: I0917 09:33:23.318827 1354 kubelet.go:2086] "SyncLoop REMOVE" source="file" pods=[openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal]
Sep 17 09:33:24.124054 ip-10-0-154-184 hyperkube[1354]: I0917 09:33:24.123952 1354 prober.go:116] "Probe failed" probeType="Liveness" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" podUID=7965b44b-c369-487b-952a-b2138da7b3c4 containerName="kube-scheduler" probeResult=failure output="Get \"https://10.0.154.184:10259/healthz\": dial tcp 10.0.154.184:10259: connect: connection refused"
Sep 17 09:33:24.124318 ip-10-0-154-184 hyperkube[1354]: I0917 09:33:24.124276 1354 prober.go:116] "Probe failed" probeType="Readiness" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" podUID=7965b44b-c369-487b-952a-b2138da7b3c4 containerName="kube-scheduler" probeResult=failure output="Get \"https://10.0.154.184:10259/healthz\": dial tcp 10.0.154.184:10259: connect: connection refused"
```

This shows the kubelet registered that the file on disk changed.
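The kubelet entries above come from the node journal; an illustrative way to pull the same SyncLoop events from the affected master (node name taken from this report, time window adjusted as needed):

```
$ oc debug node/ip-10-0-154-184.cn-north-1.compute.internal -- chroot /host \
    journalctl -u kubelet --since "2021-09-17 09:30" --until "2021-09-17 09:40" \
  | grep -E 'SyncLoop (ADD|REMOVE|UPDATE)'
```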
```
Sep 17 09:28:04.224578 ip-10-0-154-184 hyperkube[1354]: I0917 09:28:04.224543 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:29:04.224914 ip-10-0-154-184 hyperkube[1354]: I0917 09:29:04.224884 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:30:04.225713 ip-10-0-154-184 hyperkube[1354]: I0917 09:30:04.225632 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:31:04.227250 ip-10-0-154-184 hyperkube[1354]: I0917 09:31:04.226783 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:32:04.227866 ip-10-0-154-184 hyperkube[1354]: I0917 09:32:04.227829 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:33:04.229055 ip-10-0-154-184 hyperkube[1354]: I0917 09:33:04.228948 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:34:04.229222 ip-10-0-154-184 hyperkube[1354]: I0917 09:34:04.229229 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:35:04.229893 ip-10-0-154-184 hyperkube[1354]: I0917 09:35:04.229767 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:36:04.230128 ip-10-0-154-184 hyperkube[1354]: I0917 09:36:04.230040 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:37:04.230410 ip-10-0-154-184 hyperkube[1354]: I0917 09:37:04.230380 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:38:04.231844 ip-10-0-154-184 hyperkube[1354]: I0917 09:38:04.231469 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:39:04.233008 ip-10-0-154-184 hyperkube[1354]: I0917 09:39:04.231737 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:40:04.232948 ip-10-0-154-184 hyperkube[1354]: I0917 09:40:04.232890 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:41:04.234377 ip-10-0-154-184 hyperkube[1354]: I0917 09:41:04.233868 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:42:04.234716 ip-10-0-154-184 hyperkube[1354]: I0917 09:42:04.234616 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:43:04.235042 ip-10-0-154-184 hyperkube[1354]: I0917 09:43:04.235055 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:44:04.236589 ip-10-0-154-184 hyperkube[1354]: I0917 09:44:04.236238 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:45:04.236831 ip-10-0-154-184 hyperkube[1354]: I0917 09:45:04.236700 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:46:04.237611 ip-10-0-154-184 hyperkube[1354]: I0917 09:46:04.237621 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:47:04.237811 ip-10-0-154-184 hyperkube[1354]: I0917 09:47:04.237816 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:48:04.238839 ip-10-0-154-184 hyperkube[1354]: I0917 09:48:04.238745 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:49:04.239495 ip-10-0-154-184 hyperkube[1354]: I0917 09:49:04.239459 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:50:04.240366 ip-10-0-154-184 hyperkube[1354]: I0917 09:50:04.240253 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:51:04.240455 ip-10-0-154-184 hyperkube[1354]: I0917 09:51:04.240422 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:52:04.241090 ip-10-0-154-184 hyperkube[1354]: I0917 09:52:04.241058 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:53:04.242282 ip-10-0-154-184 hyperkube[1354]: I0917 09:53:04.242248 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:54:04.243539 ip-10-0-154-184 hyperkube[1354]: I0917 09:54:04.243445 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:55:04.243916 ip-10-0-154-184 hyperkube[1354]: I0917 09:55:04.243801 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:56:04.244825 ip-10-0-154-184 hyperkube[1354]: I0917 09:56:04.244792 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:57:04.246895 ip-10-0-154-184 hyperkube[1354]: I0917 09:57:04.246887 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:58:04.247939 ip-10-0-154-184 hyperkube[1354]: I0917 09:58:04.247912 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:59:04.248206 ip-10-0-154-184 hyperkube[1354]: I0917 09:59:04.248164 1354 kubelet_getters.go:176] "Pod status updated" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal" status=Running
Sep 17 09:59:53.959059 ip-10-0-154-184 hyperkube[1354]: I0917 09:59:53.958953 1354 kubelet.go:2086] "SyncLoop REMOVE" source="file" pods=[openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal]
Sep 17 09:59:53.959630 ip-10-0-154-184 hyperkube[1354]: I0917 09:59:53.959573 1354 kubelet.go:2076] "SyncLoop ADD" source="file" pods=[openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal]
Sep 17 09:59:53.959668 ip-10-0-154-184 hyperkube[1354]: I0917 09:59:53.959657 1354 kubelet.go:2186] "SyncLoop (probe)" probe="liveness" status="unhealthy" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal"
```

This says openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal is running. Unfortunately, http://virt-openshift-05.lab.eng.nay.redhat.com/knarra/2005350/must-gather.local.4429227838798661642/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-514b5bd41b36c20014cd237c8aa7eec548e02df0b5d289ae15385eb2a8c6eebf/namespaces/openshift-kube-scheduler/pods/ does not have the logs for the pod. http://virt-openshift-05.lab.eng.nay.redhat.com/knarra/2005350/must-gather.local.4429227838798661642/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-514b5bd41b36c20014cd237c8aa7eec548e02df0b5d289ae15385eb2a8c6eebf/host_service_logs/masters/crio_service.log shows:

```
Sep 17 09:07:43.755407 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:07:43.755366931Z" level=info msg="Created container 5f02ec8ea775555f0293c64e8785baa7f8b3a7f381893a6dace10583f33eec57: openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler" id=d08b19bc-2ff7-4529-92b3-284fdc81f764 name=/runtime.v1alpha2.RuntimeService/CreateContainer
Sep 17 09:07:43.756022 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:07:43.755984608Z" level=info msg="Starting container: 5f02ec8ea775555f0293c64e8785baa7f8b3a7f381893a6dace10583f33eec57" id=808fa82a-bd49-46c2-92ee-3da3efebd548 name=/runtime.v1alpha2.RuntimeService/StartContainer
Sep 17 09:07:43.774435 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:07:43.774380005Z" level=info msg="Started container" PID=119747 containerID=5f02ec8ea775555f0293c64e8785baa7f8b3a7f381893a6dace10583f33eec57 description=openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler id=808fa82a-bd49-46c2-92ee-3da3efebd548 name=/runtime.v1alpha2.RuntimeService/StartContainer sandboxID=a3715e882740169c8d864a233805a4ca202d3aa0351e230fd123f280c0a2b427
...
Sep 17 09:07:44.230872 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:07:44.230827571Z" level=info msg="Created container a9e6202b81500de0528891e938c56af2b9a8b4837de01bab6c1ddbfe9c71de0e: openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler-recovery-controller" id=facf3fdb-60c7-4a96-b25a-84763c83251f name=/runtime.v1alpha2.RuntimeService/CreateContainer
Sep 17 09:07:44.231364 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:07:44.231318791Z" level=info msg="Starting container: a9e6202b81500de0528891e938c56af2b9a8b4837de01bab6c1ddbfe9c71de0e" id=7b73ea19-d925-4ad6-9b14-f8801d9f4e7a name=/runtime.v1alpha2.RuntimeService/StartContainer
Sep 17 09:07:44.249862 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:07:44.249231641Z" level=info msg="Started container" PID=119873 containerID=a9e6202b81500de0528891e938c56af2b9a8b4837de01bab6c1ddbfe9c71de0e description=openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler-recovery-controller id=7b73ea19-d925-4ad6-9b14-f8801d9f4e7a name=/runtime.v1alpha2.RuntimeService/StartContainer sandboxID=a3715e882740169c8d864a233805a4ca202d3aa0351e230fd123f280c0a2b427
...
Sep 17 09:07:43.986771 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:07:43.986725074Z" level=info msg="Created container 7871bf271902a208af526b4199b841d06d2d1c9ba18f1927999095a9d442ac91: openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler-cert-syncer" id=9a20a376-52bc-46ee-9656-ffc651d6a989 name=/runtime.v1alpha2.RuntimeService/CreateContainer
Sep 17 09:07:43.987452 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:07:43.987419608Z" level=info msg="Starting container: 7871bf271902a208af526b4199b841d06d2d1c9ba18f1927999095a9d442ac91" id=6891ae48-a6ba-4c43-b9a6-bcbbdd590017 name=/runtime.v1alpha2.RuntimeService/StartContainer
Sep 17 09:07:44.001663 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:07:44.001614185Z" level=info msg="Started container" PID=119825 containerID=7871bf271902a208af526b4199b841d06d2d1c9ba18f1927999095a9d442ac91 description=openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler-cert-syncer id=6891ae48-a6ba-4c43-b9a6-bcbbdd590017 name=/runtime.v1alpha2.RuntimeService/StartContainer sandboxID=a3715e882740169c8d864a233805a4ca202d3aa0351e230fd123f280c0a2b427
...
Sep 17 09:33:23.498782 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:33:23.498737761Z" level=info msg="Stopped container 5f02ec8ea775555f0293c64e8785baa7f8b3a7f381893a6dace10583f33eec57: openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler" id=f0b425bc-7788-4110-b534-9e8ccba2cb71 name=/runtime.v1alpha2.RuntimeService/StopContainer
Sep 17 09:33:23.515016 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:33:23.514964942Z" level=info msg="Stopped container a9e6202b81500de0528891e938c56af2b9a8b4837de01bab6c1ddbfe9c71de0e: openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler-recovery-controller" id=3ce1dcfb-530a-406d-b2f1-bcef3b814d7d name=/runtime.v1alpha2.RuntimeService/StopContainer
Sep 17 09:33:23.533013 ip-10-0-154-184 crio[1323]: time="2021-09-17 09:33:23.532962967Z" level=info msg="Stopped container 7871bf271902a208af526b4199b841d06d2d1c9ba18f1927999095a9d442ac91: openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler-cert-syncer" id=2ea6122e-fd6d-4140-b2d8-5916a65862bc name=/runtime.v1alpha2.RuntimeService/StopContainer
...
Sep 17 10:00:15.841950 ip-10-0-154-184 crio[1323]: time="2021-09-17 10:00:15.841917034Z" level=info msg="Removed container a9e6202b81500de0528891e938c56af2b9a8b4837de01bab6c1ddbfe9c71de0e: openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler-recovery-controller" id=907ac0df-37a6-49fb-b745-95d9cfde5c78 name=/runtime.v1alpha2.RuntimeService/RemoveContainer
Sep 17 10:00:15.842934 ip-10-0-154-184 crio[1323]: time="2021-09-17 10:00:15.842906113Z" level=info msg="Removing container: 5f02ec8ea775555f0293c64e8785baa7f8b3a7f381893a6dace10583f33eec57" id=bd4a13d4-8907-4f5d-b56f-c89657a3aafe name=/runtime.v1alpha2.RuntimeService/RemoveContainer
Sep 17 10:00:15.860676 ip-10-0-154-184 crio[1323]: time="2021-09-17 10:00:15.860646998Z" level=info msg="Removed container 5f02ec8ea775555f0293c64e8785baa7f8b3a7f381893a6dace10583f33eec57: openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler" id=bd4a13d4-8907-4f5d-b56f-c89657a3aafe name=/runtime.v1alpha2.RuntimeService/RemoveContainer
Sep 17 10:00:15.861689 ip-10-0-154-184 crio[1323]: time="2021-09-17 10:00:15.861660152Z" level=info msg="Removing container: 52fdc3baf7f94f1511ed3a0413f6f3e7511ddbc7620c26f7272ba1f5c595725d" id=41f54833-90df-4f5c-a5f8-a1691aafd039 name=/runtime.v1alpha2.RuntimeService/RemoveContainer
Sep 17 10:00:15.909234 ip-10-0-154-184 crio[1323]: time="2021-09-17 10:00:15.909203533Z" level=info msg="Removed container 52fdc3baf7f94f1511ed3a0413f6f3e7511ddbc7620c26f7272ba1f5c595725d: openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/wait-for-host-port" id=41f54833-90df-4f5c-a5f8-a1691aafd039 name=/runtime.v1alpha2.RuntimeService/RemoveContainer
Sep 17 10:00:15.910528 ip-10-0-154-184 crio[1323]: time="2021-09-17 10:00:15.910499929Z" level=info msg="Removing container: 7871bf271902a208af526b4199b841d06d2d1c9ba18f1927999095a9d442ac91" id=04e0d12e-49cd-4a73-8ebe-747d2419f1f6 name=/runtime.v1alpha2.RuntimeService/RemoveContainer
Sep 17 10:00:15.929278 ip-10-0-154-184 crio[1323]: time="2021-09-17 10:00:15.929250492Z" level=info msg="Removed container 7871bf271902a208af526b4199b841d06d2d1c9ba18f1927999095a9d442ac91: openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal/kube-scheduler-cert-syncer" id=04e0d12e-49cd-4a73-8ebe-747d2419f1f6 name=/runtime.v1alpha2.RuntimeService/RemoveContainer
```

All three openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal containers were stopped at 09:33:23. Correlating this with the kubelet logs, which clearly keep saying openshift-kube-scheduler-ip-10-0-154-184.cn-north-1.compute.internal is running even though crio_service.log says all of its containers were stopped, the kubelet incorrectly reports the pod status.
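Based on that conclusion, one way to cross-check the kubelet's view against CRI-O's on the affected master (illustrative commands; the first is run on the node, e.g. via oc debug node/...):

```
# Container states as CRI-O actually sees them on the node:
$ crictl ps -a | grep openshift-kube-scheduler
# Pod phase as the kubelet reported it to the API server:
$ oc get pods -n openshift-kube-scheduler -o wide
```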
Hello Harshal,

I see that the test has only failed once, on one profile (AWS in the China region), and after that I did not hit the issue again. But I can run the test on the same profile and see if that works. Is there anything else in particular you would want me to look at?

Thanks,
kasturi
Hello Harshal,

I tried to run this test with the latest CI builds on a profile different from the one where the bug was actually reported, and I do not see the issue happening there. We have a problem bringing up that profile; once it is up I will run the case there as well and provide the information.

[knarra@knarra cucushift]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-11-20-024310   True        False         9h      Cluster version is 4.9.0-0.nightly-2021-11-20-024310

Thanks,
kasturi
@Harshal Patil, I think the bug should be moved to ON_QA rather than CLOSED -> CURRENT RELEASE, as there is a fix which has gone into the payload. WDYT?