+++ This bug was initially created as a clone of Bug #2028595 +++

Spotted in CI run: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-aws-serial/1466323246101041152

Test failure:

```
operator conditions kube-scheduler
Operator progressing (NodeInstaller): NodeInstallerProgressing: 1 nodes are at revision 0; 1 nodes are at revision 4; 1 nodes are at revision 6; 0 nodes have achieved new revision 7
```

Digging in, we find that the kube-scheduler-operator is reporting:

```
2021-12-02T08:50:48.957393046Z I1202 08:50:48.957356 1 installer_controller.go:512] "ip-10-0-129-204.us-east-2.compute.internal" is in transition to 7, but has not made progress because waiting for static pod of revision 7, found 6
```

This repeats for 20 minutes until:

```
2021-12-02T09:10:11.339203742Z I1202 09:10:11.339165 1 installer_controller.go:512] "ip-10-0-129-204.us-east-2.compute.internal" is in transition to 7, but has not made progress because waiting for static pod of revision 7, found 6
```

At which point the test has failed.

In the relevant must-gather: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-aws-serial/1466323246101041152/artifacts/e2e-aws-serial/gather-must-gather/artifacts/must-gather.tar

The installer pod claims to have written the revision 7 static pod manifest at:

```
2021-12-02T08:50:26.659271543Z I1202 08:50:26.659252 1 cmd.go:489] Writing static pod manifest "/etc/kubernetes/manifests/kube-scheduler-pod.yaml" ...
```

But in the must-gather we see that the running pod is still revision 6 well over 20 minutes later:

```
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubectl.kubernetes.io/default-logs-container: kube-scheduler
    kubernetes.io/config.hash: cce59ead72804895d31dba4208b395aa
    kubernetes.io/config.mirror: cce59ead72804895d31dba4208b395aa
    kubernetes.io/config.seen: "2021-12-02T08:50:47.930740314Z"
    kubernetes.io/config.source: file
    target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
  creationTimestamp: "2021-12-02T08:50:47Z"
  labels:
    app: openshift-kube-scheduler
    revision: "6"
    scheduler: "true"
  ...
```

In the kubelet logs for the relevant master we see:

```
Dec 02 08:50:47.935456 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:47.935427 1371 status_manager.go:611] "Pod was deleted and then recreated, skipping status update" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-129-204.us-east-2.compute.internal" oldPodUID=f11e314508a00e4ea9bf37d76b82b162 podUID=cce59ead72804895d31dba4208b395aa
```

It appears the kubelet is not respecting the new static pod definition.

Looking at CI as a whole, and assuming this test failure is typically the same failure, it appears to be involved in roughly 3.5% of all failed 4.10 jobs: https://search.ci.openshift.org/?search=nodes+have+achieved+new+revision&maxAge=48h&context=1&type=junit&name=4.10&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

--- Additional comment from Jan Chaloupka on 2021-12-02 19:41:55 CET ---

Pulling the must-gather and checking kubelet_service.log/crio_service.log from registry-ci-openshift-org-ocp-4-10-2021-12-02-081913-sha256-e78391c0bf790fea89fc8267f6e4a4b0497430642a2c329f2268d8eae23f465c/host_service_logs/masters.
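The core symptom is a mismatch between the revision label of the running mirror pod ("6", as in the YAML above) and the revision the freshly written manifest should carry ("7"). A quick way to surface that mismatch when triaging a must-gather is sketched below. This is illustrative only, not the operator's actual check; the regex-based extraction is a stand-in to avoid a YAML-parser dependency.

```python
import re

def manifest_revision(manifest_text):
    """Extract the integer 'revision' label from a static pod manifest's
    YAML text; returns None if no label is present. A plain regex is
    enough for this diagnostic and avoids depending on PyYAML."""
    m = re.search(r'^\s*revision:\s*"?(\d+)"?\s*$', manifest_text, re.MULTILINE)
    return int(m.group(1)) if m else None

# The condition the operator keeps reporting: manifest on disk says 7,
# running mirror pod still says 6.
disk = 'labels:\n  app: openshift-kube-scheduler\n  revision: "7"\n'
running = 'labels:\n  app: openshift-kube-scheduler\n  revision: "6"\n'
print(manifest_revision(disk), manifest_revision(running))  # -> 7 6
```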
kubelet_service.log for installer-6-ip-10-0-129-204.us-east-2.compute.internal:
```
Dec 02 08:50:14.597970 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:14.597891 1371 kubelet.go:2076] "SyncLoop ADD" source="api" pods=[openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal]
Dec 02 08:50:14.915700 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:14.915568 1371 kuberuntime_manager.go:483] "No sandbox for pod can be found. Need to start a new one" pod="openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal"
Dec 02 08:50:20.195393 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:20.195360 1371 kubelet.go:2092] "SyncLoop DELETE" source="api" pods=[openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal]
Dec 02 08:50:46.897346 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:46.897307 1371 kubelet.go:2092] "SyncLoop DELETE" source="api" pods=[openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal]
Dec 02 08:50:47.074528 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:47.074499 1371 kubelet.go:2114] "SyncLoop (PLEG): event for pod" pod="openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal" event=&{ID:e613e2cb-e2e5-4a33-bac5-3d046411b324 Type:ContainerStarted Data:922f082d193b55071f363a1d3d387946a955bea3e8614e1577658fd621498d14}
Dec 02 08:50:48.086749 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:48.086714 1371 kubelet.go:2114] "SyncLoop (PLEG): event for pod" pod="openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal" event=&{ID:e613e2cb-e2e5-4a33-bac5-3d046411b324 Type:ContainerStarted Data:0376b6c22d4639e39aac9a93f2e1895a517a1ef09dc781fd3f3ee1f70d6f8bf5}
Dec 02 08:50:48.086904 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:48.086878 1371 kuberuntime_container.go:720] "Killing container with a grace period" pod="openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal" podUID=e613e2cb-e2e5-4a33-bac5-3d046411b324 containerName="installer" containerID="cri-o://0376b6c22d4639e39aac9a93f2e1895a517a1ef09dc781fd3f3ee1f70d6f8bf5" gracePeriod=30
Dec 02 08:50:49.093406 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:49.093345 1371 kubelet.go:2114] "SyncLoop (PLEG): event for pod" pod="openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal" event=&{ID:e613e2cb-e2e5-4a33-bac5-3d046411b324 Type:ContainerDied Data:0376b6c22d4639e39aac9a93f2e1895a517a1ef09dc781fd3f3ee1f70d6f8bf5}
Dec 02 08:50:49.093406 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:49.093367 1371 kubelet.go:2114] "SyncLoop (PLEG): event for pod" pod="openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal" event=&{ID:e613e2cb-e2e5-4a33-bac5-3d046411b324 Type:ContainerDied Data:922f082d193b55071f363a1d3d387946a955bea3e8614e1577658fd621498d14}
Dec 02 08:50:49.152892 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:49.152862 1371 kubelet.go:2092] "SyncLoop DELETE" source="api" pods=[openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal]
Dec 02 08:50:49.154890 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:49.154519 1371 kubelet.go:2086] "SyncLoop REMOVE" source="api" pods=[openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal]
```

kubelet_service.log for installer-7-ip-10-0-129-204.us-east-2.compute.internal:
```
Dec 02 08:50:24.001183 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:24.001152 1371 kubelet.go:2076] "SyncLoop ADD" source="api" pods=[openshift-kube-scheduler/installer-7-ip-10-0-129-204.us-east-2.compute.internal]
Dec 02 08:50:24.328780 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:24.328746 1371 kuberuntime_manager.go:483] "No sandbox for pod can be found. Need to start a new one" pod="openshift-kube-scheduler/installer-7-ip-10-0-129-204.us-east-2.compute.internal"
Dec 02 08:50:25.539685 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:25.530416 1371 kubelet.go:2083] "SyncLoop UPDATE" source="api" pods=[openshift-kube-scheduler/installer-7-ip-10-0-129-204.us-east-2.compute.internal]
Dec 02 08:50:26.000898 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:26.000826 1371 kubelet.go:2114] "SyncLoop (PLEG): event for pod" pod="openshift-kube-scheduler/installer-7-ip-10-0-129-204.us-east-2.compute.internal" event=&{ID:d2d64d26-6c1f-43f5-a080-e224dccbcb1c Type:ContainerStarted Data:2e5a71fe0e6f9c14aad0bb4fec8c6afe94638337b0d94a50af70c019af1c5f81}
Dec 02 08:50:27.006226 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:27.006015 1371 kubelet.go:2114] "SyncLoop (PLEG): event for pod" pod="openshift-kube-scheduler/installer-7-ip-10-0-129-204.us-east-2.compute.internal" event=&{ID:d2d64d26-6c1f-43f5-a080-e224dccbcb1c Type:ContainerDied Data:18aecb076f180ca3220d5c90a4c6b8fc75e00bae87cae450430f5e4075a8e0e4}
Dec 02 08:50:28.014048 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:28.014008 1371 kubelet.go:2114] "SyncLoop (PLEG): event for pod" pod="openshift-kube-scheduler/installer-7-ip-10-0-129-204.us-east-2.compute.internal" event=&{ID:d2d64d26-6c1f-43f5-a080-e224dccbcb1c Type:ContainerDied Data:2e5a71fe0e6f9c14aad0bb4fec8c6afe94638337b0d94a50af70c019af1c5f81}
Dec 02 08:50:28.014456 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:28.014171 1371 kuberuntime_manager.go:502] "No ready sandbox for pod can be found. Need to start a new one" pod="openshift-kube-scheduler/installer-7-ip-10-0-129-204.us-east-2.compute.internal"
Dec 02 09:10:59.764265 ip-10-0-129-204 hyperkube[1371]: I1202 09:10:59.764236 1371 logs.go:319] "Finished parsing log file" path="/var/log/pods/openshift-kube-scheduler_installer-7-ip-10-0-129-204.us-east-2.compute.internal_d2d64d26-6c1f-43f5-a080-e224dccbcb1c/installer/0.log"
```

crio_service.log for installer-6-ip-10-0-129-204.us-east-2.compute.internal:
```
Dec 02 08:50:14.916182 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:14.916118428Z" level=info msg="Running pod sandbox: openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal/POD" id=83fe7ed6-c9b7-4f00-9afc-760212ac3aed name=/runtime.v1alpha2.RuntimeService/RunPodSandbox
Dec 02 08:50:14.937758 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:14.937725376Z" level=info msg="Got pod network &{Name:installer-6-ip-10-0-129-204.us-east-2.compute.internal Namespace:openshift-kube-scheduler ID:922f082d193b55071f363a1d3d387946a955bea3e8614e1577658fd621498d14 UID:e613e2cb-e2e5-4a33-bac5-3d046411b324 NetNS:/var/run/netns/426e0e6e-e178-4a28-819c-3665ec20eba2 Networks:[] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[]}] Aliases:map[]}"
Dec 02 08:50:14.937758 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:14.937753811Z" level=info msg="Adding pod openshift-kube-scheduler_installer-6-ip-10-0-129-204.us-east-2.compute.internal to CNI network \"multus-cni-network\" (type=multus)"
Dec 02 08:50:46.898928 ip-10-0-129-204 crio[1340]: 2021-12-02T08:50:46Z [verbose] Add: openshift-kube-scheduler:installer-6-ip-10-0-129-204.us-east-2.compute.internal:e613e2cb-e2e5-4a33-bac5-3d046411b324:openshift-sdn(openshift-sdn):eth0 {"cniVersion":"0.3.1","interfaces":[{"name":"eth0","sandbox":"/var/run/netns/426e0e6e-e178-4a28-819c-3665ec20eba2"}],"ips":[{"version":"4","interface":0,"address":"10.129.0.51/23"}],"routes":[{"dst":"0.0.0.0/0","gw":"10.129.0.1"},{"dst":"224.0.0.0/4"},{"dst":"10.128.0.0/14"}],"dns":{}}
Dec 02 08:50:46.898928 ip-10-0-129-204 crio[1340]: I1202 08:50:46.873425 22770 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-kube-scheduler", Name:"installer-6-ip-10-0-129-204.us-east-2.compute.internal", UID:"e613e2cb-e2e5-4a33-bac5-3d046411b324", APIVersion:"v1", ResourceVersion:"15668", FieldPath:""}): type: 'Normal' reason: 'AddedInterface' Add eth0 [10.129.0.51/23] from openshift-sdn
Dec 02 08:50:46.899435 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:46.899395831Z" level=info msg="Got pod network &{Name:installer-6-ip-10-0-129-204.us-east-2.compute.internal Namespace:openshift-kube-scheduler ID:922f082d193b55071f363a1d3d387946a955bea3e8614e1577658fd621498d14 UID:e613e2cb-e2e5-4a33-bac5-3d046411b324 NetNS:/var/run/netns/426e0e6e-e178-4a28-819c-3665ec20eba2 Networks:[] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[]}] Aliases:map[]}"
Dec 02 08:50:46.899528 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:46.899512888Z" level=info msg="Checking pod openshift-kube-scheduler_installer-6-ip-10-0-129-204.us-east-2.compute.internal for CNI network multus-cni-network (type=multus)"
Dec 02 08:50:46.907676 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:46.907616868Z" level=info msg="Ran pod sandbox 922f082d193b55071f363a1d3d387946a955bea3e8614e1577658fd621498d14 with infra container: openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal/POD" id=83fe7ed6-c9b7-4f00-9afc-760212ac3aed name=/runtime.v1alpha2.RuntimeService/RunPodSandbox
Dec 02 08:50:46.912470 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:46.912447861Z" level=info msg="Creating container: openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal/installer" id=07941e6b-9315-4f7c-b9a2-7e55166f53c4 name=/runtime.v1alpha2.RuntimeService/CreateContainer
Dec 02 08:50:47.232347 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:47.232302366Z" level=info msg="Created container 0376b6c22d4639e39aac9a93f2e1895a517a1ef09dc781fd3f3ee1f70d6f8bf5: openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal/installer" id=07941e6b-9315-4f7c-b9a2-7e55166f53c4 name=/runtime.v1alpha2.RuntimeService/CreateContainer
Dec 02 08:50:47.250378 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:47.250319005Z" level=info msg="Started container" PID=24486 containerID=0376b6c22d4639e39aac9a93f2e1895a517a1ef09dc781fd3f3ee1f70d6f8bf5 description=openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal/installer id=1b2bf1bc-90e7-4917-8640-61b6a79dee4b name=/runtime.v1alpha2.RuntimeService/StartContainer sandboxID=922f082d193b55071f363a1d3d387946a955bea3e8614e1577658fd621498d14
Dec 02 08:50:48.265838 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:48.261799038Z" level=info msg="Stopped container 0376b6c22d4639e39aac9a93f2e1895a517a1ef09dc781fd3f3ee1f70d6f8bf5: openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal/installer" id=793729ab-ee6b-4c7a-bc34-dc3876e1f157 name=/runtime.v1alpha2.RuntimeService/StopContainer
Dec 02 08:50:48.270138 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:48.270108504Z" level=info msg="Got pod network &{Name:installer-6-ip-10-0-129-204.us-east-2.compute.internal Namespace:openshift-kube-scheduler ID:922f082d193b55071f363a1d3d387946a955bea3e8614e1577658fd621498d14 UID:e613e2cb-e2e5-4a33-bac5-3d046411b324 NetNS:/var/run/netns/426e0e6e-e178-4a28-819c-3665ec20eba2 Networks:[{Name:multus-cni-network Ifname:eth0}] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[]}] Aliases:map[]}"
Dec 02 08:50:48.270328 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:48.270288839Z" level=info msg="Deleting pod openshift-kube-scheduler_installer-6-ip-10-0-129-204.us-east-2.compute.internal from CNI network \"multus-cni-network\" (type=multus)"
Dec 02 08:50:48.361546 ip-10-0-129-204 crio[1340]: 2021-12-02T08:50:48Z [verbose] Del: openshift-kube-scheduler:installer-6-ip-10-0-129-204.us-east-2.compute.internal:e613e2cb-e2e5-4a33-bac5-3d046411b324:openshift-sdn:eth0 {"cniVersion":"0.3.1","name":"openshift-sdn","type":"openshift-sdn"}
Dec 02 08:50:49.131160 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:49.130557683Z" level=info msg="Removed container 0376b6c22d4639e39aac9a93f2e1895a517a1ef09dc781fd3f3ee1f70d6f8bf5: openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal/installer" id=7409cfa9-6dda-46be-8531-50548021deb8 name=/runtime.v1alpha2.RuntimeService/RemoveContainer
```

crio_service.log for installer-7-ip-10-0-129-204.us-east-2.compute.internal:
```
Dec 02 08:50:24.329304 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:24.329240449Z" level=info msg="Running pod sandbox: openshift-kube-scheduler/installer-7-ip-10-0-129-204.us-east-2.compute.internal/POD" id=6d3f070e-3405-40fd-88db-1dcb3de2d0b3 name=/runtime.v1alpha2.RuntimeService/RunPodSandbox
Dec 02 08:50:24.349843 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:24.349805929Z" level=info msg="Got pod network &{Name:installer-7-ip-10-0-129-204.us-east-2.compute.internal Namespace:openshift-kube-scheduler ID:2e5a71fe0e6f9c14aad0bb4fec8c6afe94638337b0d94a50af70c019af1c5f81 UID:d2d64d26-6c1f-43f5-a080-e224dccbcb1c NetNS:/var/run/netns/8fb53701-b7c0-47c5-b9ac-cc066ab157d9 Networks:[] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[]}] Aliases:map[]}"
Dec 02 08:50:24.349843 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:24.349843781Z" level=info msg="Adding pod openshift-kube-scheduler_installer-7-ip-10-0-129-204.us-east-2.compute.internal to CNI network \"multus-cni-network\" (type=multus)"
Dec 02 08:50:25.523921 ip-10-0-129-204 crio[1340]: 2021-12-02T08:50:25Z [verbose] Add: openshift-kube-scheduler:installer-7-ip-10-0-129-204.us-east-2.compute.internal:d2d64d26-6c1f-43f5-a080-e224dccbcb1c:openshift-sdn(openshift-sdn):eth0 {"cniVersion":"0.3.1","interfaces":[{"name":"eth0","sandbox":"/var/run/netns/8fb53701-b7c0-47c5-b9ac-cc066ab157d9"}],"ips":[{"version":"4","interface":0,"address":"10.129.0.50/23"}],"routes":[{"dst":"0.0.0.0/0","gw":"10.129.0.1"},{"dst":"224.0.0.0/4"},{"dst":"10.128.0.0/14"}],"dns":{}}
Dec 02 08:50:25.523921 ip-10-0-129-204 crio[1340]: I1202 08:50:25.508458 23113 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-kube-scheduler", Name:"installer-7-ip-10-0-129-204.us-east-2.compute.internal", UID:"d2d64d26-6c1f-43f5-a080-e224dccbcb1c", APIVersion:"v1", ResourceVersion:"15767", FieldPath:""}): type: 'Normal' reason: 'AddedInterface' Add eth0 [10.129.0.50/23] from openshift-sdn
Dec 02 08:50:25.526438 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:25.526398521Z" level=info msg="Got pod network &{Name:installer-7-ip-10-0-129-204.us-east-2.compute.internal Namespace:openshift-kube-scheduler ID:2e5a71fe0e6f9c14aad0bb4fec8c6afe94638337b0d94a50af70c019af1c5f81 UID:d2d64d26-6c1f-43f5-a080-e224dccbcb1c NetNS:/var/run/netns/8fb53701-b7c0-47c5-b9ac-cc066ab157d9 Networks:[] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[]}] Aliases:map[]}"
Dec 02 08:50:25.526722 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:25.526702613Z" level=info msg="Checking pod openshift-kube-scheduler_installer-7-ip-10-0-129-204.us-east-2.compute.internal for CNI network multus-cni-network (type=multus)"
Dec 02 08:50:25.549754 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:25.549721898Z" level=info msg="Ran pod sandbox 2e5a71fe0e6f9c14aad0bb4fec8c6afe94638337b0d94a50af70c019af1c5f81 with infra container: openshift-kube-scheduler/installer-7-ip-10-0-129-204.us-east-2.compute.internal/POD" id=6d3f070e-3405-40fd-88db-1dcb3de2d0b3 name=/runtime.v1alpha2.RuntimeService/RunPodSandbox
Dec 02 08:50:25.555095 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:25.555065782Z" level=info msg="Creating container: openshift-kube-scheduler/installer-7-ip-10-0-129-204.us-east-2.compute.internal/installer" id=2bb0c487-9c13-4682-ab52-3923421b3926 name=/runtime.v1alpha2.RuntimeService/CreateContainer
Dec 02 08:50:25.964434 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:25.964384878Z" level=info msg="Created container 18aecb076f180ca3220d5c90a4c6b8fc75e00bae87cae450430f5e4075a8e0e4: openshift-kube-scheduler/installer-7-ip-10-0-129-204.us-east-2.compute.internal/installer" id=2bb0c487-9c13-4682-ab52-3923421b3926 name=/runtime.v1alpha2.RuntimeService/CreateContainer
Dec 02 08:50:25.997697 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:25.990253651Z" level=info msg="Started container" PID=23202 containerID=18aecb076f180ca3220d5c90a4c6b8fc75e00bae87cae450430f5e4075a8e0e4 description=openshift-kube-scheduler/installer-7-ip-10-0-129-204.us-east-2.compute.internal/installer id=9412e0dc-aae4-409d-9de8-c6844b5dd043 name=/runtime.v1alpha2.RuntimeService/StartContainer sandboxID=2e5a71fe0e6f9c14aad0bb4fec8c6afe94638337b0d94a50af70c019af1c5f81
Dec 02 08:50:27.013027 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:27.012998220Z" level=info msg="Got pod network &{Name:installer-7-ip-10-0-129-204.us-east-2.compute.internal Namespace:openshift-kube-scheduler ID:2e5a71fe0e6f9c14aad0bb4fec8c6afe94638337b0d94a50af70c019af1c5f81 UID:d2d64d26-6c1f-43f5-a080-e224dccbcb1c NetNS:/var/run/netns/8fb53701-b7c0-47c5-b9ac-cc066ab157d9 Networks:[{Name:multus-cni-network Ifname:eth0}] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[]}] Aliases:map[]}"
Dec 02 08:50:27.013251 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:27.013221890Z" level=info msg="Deleting pod openshift-kube-scheduler_installer-7-ip-10-0-129-204.us-east-2.compute.internal from CNI network \"multus-cni-network\" (type=multus)"
Dec 02 08:50:27.251672 ip-10-0-129-204 crio[1340]: 2021-12-02T08:50:27Z [verbose] Del: openshift-kube-scheduler:installer-7-ip-10-0-129-204.us-east-2.compute.internal:d2d64d26-6c1f-43f5-a080-e224dccbcb1c:openshift-sdn:eth0 {"cniVersion":"0.3.1","name":"openshift-sdn","type":"openshift-sdn"}
```

Notice the following lines and the time they were logged:
```
Dec 02 08:50:14.597970 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:14.597891 1371 kubelet.go:2076] "SyncLoop ADD" source="api" pods=[openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal]
Dec 02 08:50:14.915700 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:14.915568 1371 kuberuntime_manager.go:483] "No sandbox for pod can be found. Need to start a new one" pod="openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal"
Dec 02 08:50:20.195393 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:20.195360 1371 kubelet.go:2092] "SyncLoop DELETE" source="api" pods=[openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal]
Dec 02 08:50:46.897346 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:46.897307 1371 kubelet.go:2092] "SyncLoop DELETE" source="api" pods=[openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal]
Dec 02 08:50:47.250378 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:47.250319005Z" level=info msg="Started container" PID=24486 containerID=0376b6c22d4639e39aac9a93f2e1895a517a1ef09dc781fd3f3ee1f70d6f8bf5 description=openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal/installer id=1b2bf1bc-90e7-4917-8640-61b6a79dee4b name=/runtime.v1alpha2.RuntimeService/StartContainer sandboxID=922f082d193b55071f363a1d3d387946a955bea3e8614e1577658fd621498d14
Dec 02 08:50:48.265838 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:48.261799038Z" level=info msg="Stopped container 0376b6c22d4639e39aac9a93f2e1895a517a1ef09dc781fd3f3ee1f70d6f8bf5: openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal/installer" id=793729ab-ee6b-4c7a-bc34-dc3876e1f157 name=/runtime.v1alpha2.RuntimeService/StopContainer
```

In chronological order:

08:50:14.597891 - installer-6-ip-10-0-129-204.us-east-2.compute.internal created
08:50:14.915568 - installer-6-ip-10-0-129-204.us-east-2.compute.internal waiting for its sandbox
08:50:20.195360 - installer-6-ip-10-0-129-204.us-east-2.compute.internal requested to be deleted
08:50:24.001152 - installer-7-ip-10-0-129-204.us-east-2.compute.internal created
08:50:25.997697 - installer-7-ip-10-0-129-204.us-east-2.compute.internal installer container started to run
08:50:27.013251 - installer-7-ip-10-0-129-204.us-east-2.compute.internal installer container stopped running/got deleted
08:50:28.014171 - installer-7-ip-10-0-129-204.us-east-2.compute.internal waiting for its sandbox
08:50:47.250319 - installer-6-ip-10-0-129-204.us-east-2.compute.internal installer container started to run
08:50:48.265838 - installer-6-ip-10-0-129-204.us-east-2.compute.internal installer container stopped running

Based on the timing, the revision 7 installer ran and created the static pod manifests (confirmed by its container logs). However, the revision 6 installer ran afterwards and rewrote the manifests back to revision 6. That leaves the node in a state where a revision 7 static pod is expected but will never be created, even though the installer controller sees the revision 7 installer as having completed successfully.

--- Additional comment from Devan Goodwin on 2021-12-06 14:39:15 CET ---

Great info, thanks Jan; this looks like a pretty serious scheduler bug. Moving to kube-scheduler, as my kubelet guess was apparently incorrect.
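The sequence above is a last-writer-wins race on the manifest file: each installer writes its own revision's manifest unconditionally, so whichever installer happens to run last determines what the kubelet sees. The toy simulation below illustrates the mechanism only; it is not the actual installer code, and the file name, JSON layout, and helper names are stand-ins.

```python
import json
import os
import tempfile

def run_installer(revision, manifest_path):
    # Stand-in for an installer pod: unconditionally write this revision's
    # manifest, as in the "Writing static pod manifest ..." installer log line.
    with open(manifest_path, "w") as f:
        json.dump({"metadata": {"labels": {"revision": str(revision)}}}, f)

def read_revision(manifest_path):
    # Stand-in for the kubelet reading the static pod manifest from disk.
    with open(manifest_path) as f:
        return json.load(f)["metadata"]["labels"]["revision"]

# Reproduce the observed ordering: installer-7 finishes first (~08:50:27),
# then the delayed installer-6 finally runs (~08:50:47) and overwrites it.
path = os.path.join(tempfile.mkdtemp(), "kube-scheduler-pod.json")
run_installer(7, path)  # revision 7 manifest written
run_installer(6, path)  # stale installer-6 runs late and clobbers it
print(read_revision(path))  # -> 6: the node converges on revision 6, not 7
```

The operator then waits forever for a revision 7 static pod that no remaining process will ever write, which matches the 20 minutes of "waiting for static pod of revision 7, found 6" in the installer controller logs.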
Due to higher priority tasks I have not been able to resolve this issue in time. Moving to the next sprint.
*** Bug 2042579 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069