Bug 2029468 - wait for installer n to be really deleted before n+1 is installed WAS: kube-scheduler-operator test failure: Node's not achieving new revision [NEEDINFO]
Summary: wait for installer n to be really deleted before n+1 is installed WAS: kube-scheduler-operator test failure: Node's not achieving new revision
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-scheduler
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Jan Chaloupka
QA Contact: RamaKasturi
URL:
Whiteboard: LifecycleStale
Depends On: 2028595 2029470
Blocks:
Reported: 2021-12-06 15:20 UTC by Stefan Schimanski
Modified: 2023-01-16 09:33 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2028595
Environment:
Last Closed: 2023-01-16 09:33:18 UTC
Target Upstream Version:
Embargoed:
mfojtik: needinfo?



Description Stefan Schimanski 2021-12-06 15:20:16 UTC
+++ This bug was initially created as a clone of Bug #2028595 +++

Spotted in CI run: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-aws-serial/1466323246101041152

Test failure:

: operator conditions kube-scheduler 	0s
Operator progressing (NodeInstaller): NodeInstallerProgressing: 1 nodes are at revision 0; 1 nodes are at revision 4; 1 nodes are at revision 6; 0 nodes have achieved new revision 7

Digging in, we find that the kube-scheduler-operator is reporting:

2021-12-02T08:50:48.957393046Z I1202 08:50:48.957356       1 installer_controller.go:512] "ip-10-0-129-204.us-east-2.compute.internal" is in transition to 7, but has not made progress because waiting for static pod of revision 7, found 6

This repeats for 20 minutes until:

2021-12-02T09:10:11.339203742Z I1202 09:10:11.339165       1 installer_controller.go:512] "ip-10-0-129-204.us-east-2.compute.internal" is in transition to 7, but has not made progress because waiting for static pod of revision 7, found 6

At which point the test has failed.
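
For orientation: that installer_controller.go message reflects a comparison between the node's target revision and the `revision` label on the operand's running static (mirror) pod, the same label visible in the pod YAML from the must-gather below. A minimal sketch of that check using client-go (function names are illustrative, not the actual library-go code):

```go
package installersketch

import (
	"context"
	"fmt"
	"strconv"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// staticPodRevision reads the revision recorded on the operand's mirror
// pod, e.g. the `revision: "6"` label shown in the must-gather YAML below.
func staticPodRevision(ctx context.Context, client kubernetes.Interface, namespace, podName string) (int, error) {
	pod, err := client.CoreV1().Pods(namespace).Get(ctx, podName, metav1.GetOptions{})
	if err != nil {
		return 0, err
	}
	return strconv.Atoi(pod.Labels["revision"])
}

// reachedTargetRevision reproduces the gist of the controller's progress
// check: until the running static pod carries the target revision, the
// node keeps being reported as "waiting for static pod of revision N, found M".
func reachedTargetRevision(ctx context.Context, client kubernetes.Interface, namespace, podName string, target int) (bool, error) {
	found, err := staticPodRevision(ctx, client, namespace, podName)
	if err != nil {
		return false, err
	}
	if found < target {
		fmt.Printf("waiting for static pod of revision %d, found %d\n", target, found)
		return false, nil
	}
	return true, nil
}
```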

In the relevant must-gather: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-aws-serial/1466323246101041152/artifacts/e2e-aws-serial/gather-must-gather/artifacts/must-gather.tar

The installer pod claims to have written the revision 7 static pod manifest at:

2021-12-02T08:50:26.659271543Z I1202 08:50:26.659252       1 cmd.go:489] Writing static pod manifest "/etc/kubernetes/manifests/kube-scheduler-pod.yaml" ...

But in the must-gather we see that the running pod is still revision 6 well over 20 minutes later:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubectl.kubernetes.io/default-logs-container: kube-scheduler
    kubernetes.io/config.hash: cce59ead72804895d31dba4208b395aa
    kubernetes.io/config.mirror: cce59ead72804895d31dba4208b395aa
    kubernetes.io/config.seen: "2021-12-02T08:50:47.930740314Z"
    kubernetes.io/config.source: file
    target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
  creationTimestamp: "2021-12-02T08:50:47Z"
  labels:
    app: openshift-kube-scheduler
    revision: "6"
    scheduler: "true"
...


In the kubelet logs for the relevant master we see:

 Dec 02 08:50:47.935456 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:47.935427    1371 status_manager.go:611] "Pod was deleted and then recreated, skipping status update" pod="openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-129-204.us-east-2.compute.internal" oldPodUID=f11e314508a00e4ea9bf37d76b82b162 podUID=cce59ead72804895d31dba4208b395aa

It appears the kubelet is not respecting the new static pod definition.
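
For background on why a stale installer can undo a newer one: each installer simply renders its revision's manifest into the kubelet's static pod directory (the cmd.go log line above), and the kubelet runs whatever file is currently on disk. A minimal sketch of that write step, with a hypothetical helper name, not the actual installer code:

```go
package installersketch

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeStaticPodManifest mirrors the installer's "Writing static pod
// manifest ..." step seen in the log above: the rendered pod YAML for a
// given revision is written into kubelet's static manifest directory.
// Kubelet runs whatever file is on disk, so if two installers overlap,
// the last writer decides which revision actually runs.
func writeStaticPodManifest(manifestDir string, podYAML []byte) error {
	target := filepath.Join(manifestDir, "kube-scheduler-pod.yaml")
	fmt.Printf("Writing static pod manifest %q ...\n", target)
	return os.WriteFile(target, podYAML, 0644)
}
```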



Looking at CI as a whole, and assuming failures of this test are typically this same failure, it appears to be involved in roughly 3.5% of all failed 4.10 jobs: https://search.ci.openshift.org/?search=nodes+have+achieved+new+revision&maxAge=48h&context=1&type=junit&name=4.10&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

--- Additional comment from Jan Chaloupka on 2021-12-02 19:41:55 CET ---

Pulling the must-gather and checking kubelet_service.log/crio_service.log from registry-ci-openshift-org-ocp-4-10-2021-12-02-081913-sha256-e78391c0bf790fea89fc8267f6e4a4b0497430642a2c329f2268d8eae23f465c/host_service_logs/masters.

kubelet_service.log for installer-6-ip-10-0-129-204.us-east-2.compute.internal:
```
Dec 02 08:50:14.597970 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:14.597891    1371 kubelet.go:2076] "SyncLoop ADD" source="api" pods=[openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal]
Dec 02 08:50:14.915700 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:14.915568    1371 kuberuntime_manager.go:483] "No sandbox for pod can be found. Need to start a new one" pod="openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal"
Dec 02 08:50:20.195393 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:20.195360    1371 kubelet.go:2092] "SyncLoop DELETE" source="api" pods=[openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal]
Dec 02 08:50:46.897346 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:46.897307    1371 kubelet.go:2092] "SyncLoop DELETE" source="api" pods=[openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal]
Dec 02 08:50:47.074528 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:47.074499    1371 kubelet.go:2114] "SyncLoop (PLEG): event for pod" pod="openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal" event=&{ID:e613e2cb-e2e5-4a33-bac5-3d046411b324 Type:ContainerStarted Data:922f082d193b55071f363a1d3d387946a955bea3e8614e1577658fd621498d14}
Dec 02 08:50:48.086749 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:48.086714    1371 kubelet.go:2114] "SyncLoop (PLEG): event for pod" pod="openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal" event=&{ID:e613e2cb-e2e5-4a33-bac5-3d046411b324 Type:ContainerStarted Data:0376b6c22d4639e39aac9a93f2e1895a517a1ef09dc781fd3f3ee1f70d6f8bf5}
Dec 02 08:50:48.086904 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:48.086878    1371 kuberuntime_container.go:720] "Killing container with a grace period" pod="openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal" podUID=e613e2cb-e2e5-4a33-bac5-3d046411b324 containerName="installer" containerID="cri-o://0376b6c22d4639e39aac9a93f2e1895a517a1ef09dc781fd3f3ee1f70d6f8bf5" gracePeriod=30
Dec 02 08:50:49.093406 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:49.093345    1371 kubelet.go:2114] "SyncLoop (PLEG): event for pod" pod="openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal" event=&{ID:e613e2cb-e2e5-4a33-bac5-3d046411b324 Type:ContainerDied Data:0376b6c22d4639e39aac9a93f2e1895a517a1ef09dc781fd3f3ee1f70d6f8bf5}
Dec 02 08:50:49.093406 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:49.093367    1371 kubelet.go:2114] "SyncLoop (PLEG): event for pod" pod="openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal" event=&{ID:e613e2cb-e2e5-4a33-bac5-3d046411b324 Type:ContainerDied Data:922f082d193b55071f363a1d3d387946a955bea3e8614e1577658fd621498d14}
Dec 02 08:50:49.152892 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:49.152862    1371 kubelet.go:2092] "SyncLoop DELETE" source="api" pods=[openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal]
Dec 02 08:50:49.154890 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:49.154519    1371 kubelet.go:2086] "SyncLoop REMOVE" source="api" pods=[openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal]
```

kubelet_service.log for installer-7-ip-10-0-129-204.us-east-2.compute.internal:
```
Dec 02 08:50:24.001183 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:24.001152    1371 kubelet.go:2076] "SyncLoop ADD" source="api" pods=[openshift-kube-scheduler/installer-7-ip-10-0-129-204.us-east-2.compute.internal]
Dec 02 08:50:24.328780 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:24.328746    1371 kuberuntime_manager.go:483] "No sandbox for pod can be found. Need to start a new one" pod="openshift-kube-scheduler/installer-7-ip-10-0-129-204.us-east-2.compute.internal"
Dec 02 08:50:25.539685 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:25.530416    1371 kubelet.go:2083] "SyncLoop UPDATE" source="api" pods=[openshift-kube-scheduler/installer-7-ip-10-0-129-204.us-east-2.compute.internal]
Dec 02 08:50:26.000898 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:26.000826    1371 kubelet.go:2114] "SyncLoop (PLEG): event for pod" pod="openshift-kube-scheduler/installer-7-ip-10-0-129-204.us-east-2.compute.internal" event=&{ID:d2d64d26-6c1f-43f5-a080-e224dccbcb1c Type:ContainerStarted Data:2e5a71fe0e6f9c14aad0bb4fec8c6afe94638337b0d94a50af70c019af1c5f81}
Dec 02 08:50:27.006226 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:27.006015    1371 kubelet.go:2114] "SyncLoop (PLEG): event for pod" pod="openshift-kube-scheduler/installer-7-ip-10-0-129-204.us-east-2.compute.internal" event=&{ID:d2d64d26-6c1f-43f5-a080-e224dccbcb1c Type:ContainerDied Data:18aecb076f180ca3220d5c90a4c6b8fc75e00bae87cae450430f5e4075a8e0e4}
Dec 02 08:50:28.014048 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:28.014008    1371 kubelet.go:2114] "SyncLoop (PLEG): event for pod" pod="openshift-kube-scheduler/installer-7-ip-10-0-129-204.us-east-2.compute.internal" event=&{ID:d2d64d26-6c1f-43f5-a080-e224dccbcb1c Type:ContainerDied Data:2e5a71fe0e6f9c14aad0bb4fec8c6afe94638337b0d94a50af70c019af1c5f81}
Dec 02 08:50:28.014456 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:28.014171    1371 kuberuntime_manager.go:502] "No ready sandbox for pod can be found. Need to start a new one" pod="openshift-kube-scheduler/installer-7-ip-10-0-129-204.us-east-2.compute.internal"
Dec 02 09:10:59.764265 ip-10-0-129-204 hyperkube[1371]: I1202 09:10:59.764236    1371 logs.go:319] "Finished parsing log file" path="/var/log/pods/openshift-kube-scheduler_installer-7-ip-10-0-129-204.us-east-2.compute.internal_d2d64d26-6c1f-43f5-a080-e224dccbcb1c/installer/0.log"
```

crio_service.log for installer-6-ip-10-0-129-204.us-east-2.compute.internal:
```
Dec 02 08:50:14.916182 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:14.916118428Z" level=info msg="Running pod sandbox: openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal/POD" id=83fe7ed6-c9b7-4f00-9afc-760212ac3aed name=/runtime.v1alpha2.RuntimeService/RunPodSandbox
Dec 02 08:50:14.937758 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:14.937725376Z" level=info msg="Got pod network &{Name:installer-6-ip-10-0-129-204.us-east-2.compute.internal Namespace:openshift-kube-scheduler ID:922f082d193b55071f363a1d3d387946a955bea3e8614e1577658fd621498d14 UID:e613e2cb-e2e5-4a33-bac5-3d046411b324 NetNS:/var/run/netns/426e0e6e-e178-4a28-819c-3665ec20eba2 Networks:[] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[]}] Aliases:map[]}"
Dec 02 08:50:14.937758 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:14.937753811Z" level=info msg="Adding pod openshift-kube-scheduler_installer-6-ip-10-0-129-204.us-east-2.compute.internal to CNI network \"multus-cni-network\" (type=multus)"
Dec 02 08:50:46.898928 ip-10-0-129-204 crio[1340]: 2021-12-02T08:50:46Z [verbose] Add: openshift-kube-scheduler:installer-6-ip-10-0-129-204.us-east-2.compute.internal:e613e2cb-e2e5-4a33-bac5-3d046411b324:openshift-sdn(openshift-sdn):eth0 {"cniVersion":"0.3.1","interfaces":[{"name":"eth0","sandbox":"/var/run/netns/426e0e6e-e178-4a28-819c-3665ec20eba2"}],"ips":[{"version":"4","interface":0,"address":"10.129.0.51/23"}],"routes":[{"dst":"0.0.0.0/0","gw":"10.129.0.1"},{"dst":"224.0.0.0/4"},{"dst":"10.128.0.0/14"}],"dns":{}}
Dec 02 08:50:46.898928 ip-10-0-129-204 crio[1340]: I1202 08:50:46.873425   22770 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-kube-scheduler", Name:"installer-6-ip-10-0-129-204.us-east-2.compute.internal", UID:"e613e2cb-e2e5-4a33-bac5-3d046411b324", APIVersion:"v1", ResourceVersion:"15668", FieldPath:""}): type: 'Normal' reason: 'AddedInterface' Add eth0 [10.129.0.51/23] from openshift-sdn
Dec 02 08:50:46.899435 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:46.899395831Z" level=info msg="Got pod network &{Name:installer-6-ip-10-0-129-204.us-east-2.compute.internal Namespace:openshift-kube-scheduler ID:922f082d193b55071f363a1d3d387946a955bea3e8614e1577658fd621498d14 UID:e613e2cb-e2e5-4a33-bac5-3d046411b324 NetNS:/var/run/netns/426e0e6e-e178-4a28-819c-3665ec20eba2 Networks:[] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[]}] Aliases:map[]}"
Dec 02 08:50:46.899528 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:46.899512888Z" level=info msg="Checking pod openshift-kube-scheduler_installer-6-ip-10-0-129-204.us-east-2.compute.internal for CNI network multus-cni-network (type=multus)"
Dec 02 08:50:46.907676 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:46.907616868Z" level=info msg="Ran pod sandbox 922f082d193b55071f363a1d3d387946a955bea3e8614e1577658fd621498d14 with infra container: openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal/POD" id=83fe7ed6-c9b7-4f00-9afc-760212ac3aed name=/runtime.v1alpha2.RuntimeService/RunPodSandbox
Dec 02 08:50:46.912470 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:46.912447861Z" level=info msg="Creating container: openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal/installer" id=07941e6b-9315-4f7c-b9a2-7e55166f53c4 name=/runtime.v1alpha2.RuntimeService/CreateContainer
Dec 02 08:50:47.232347 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:47.232302366Z" level=info msg="Created container 0376b6c22d4639e39aac9a93f2e1895a517a1ef09dc781fd3f3ee1f70d6f8bf5: openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal/installer" id=07941e6b-9315-4f7c-b9a2-7e55166f53c4 name=/runtime.v1alpha2.RuntimeService/CreateContainer
Dec 02 08:50:47.250378 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:47.250319005Z" level=info msg="Started container" PID=24486 containerID=0376b6c22d4639e39aac9a93f2e1895a517a1ef09dc781fd3f3ee1f70d6f8bf5 description=openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal/installer id=1b2bf1bc-90e7-4917-8640-61b6a79dee4b name=/runtime.v1alpha2.RuntimeService/StartContainer sandboxID=922f082d193b55071f363a1d3d387946a955bea3e8614e1577658fd621498d14
Dec 02 08:50:48.265838 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:48.261799038Z" level=info msg="Stopped container 0376b6c22d4639e39aac9a93f2e1895a517a1ef09dc781fd3f3ee1f70d6f8bf5: openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal/installer" id=793729ab-ee6b-4c7a-bc34-dc3876e1f157 name=/runtime.v1alpha2.RuntimeService/StopContainer
Dec 02 08:50:48.270138 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:48.270108504Z" level=info msg="Got pod network &{Name:installer-6-ip-10-0-129-204.us-east-2.compute.internal Namespace:openshift-kube-scheduler ID:922f082d193b55071f363a1d3d387946a955bea3e8614e1577658fd621498d14 UID:e613e2cb-e2e5-4a33-bac5-3d046411b324 NetNS:/var/run/netns/426e0e6e-e178-4a28-819c-3665ec20eba2 Networks:[{Name:multus-cni-network Ifname:eth0}] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[]}] Aliases:map[]}"
Dec 02 08:50:48.270328 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:48.270288839Z" level=info msg="Deleting pod openshift-kube-scheduler_installer-6-ip-10-0-129-204.us-east-2.compute.internal from CNI network \"multus-cni-network\" (type=multus)"
Dec 02 08:50:48.361546 ip-10-0-129-204 crio[1340]: 2021-12-02T08:50:48Z [verbose] Del: openshift-kube-scheduler:installer-6-ip-10-0-129-204.us-east-2.compute.internal:e613e2cb-e2e5-4a33-bac5-3d046411b324:openshift-sdn:eth0 {"cniVersion":"0.3.1","name":"openshift-sdn","type":"openshift-sdn"}
Dec 02 08:50:49.131160 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:49.130557683Z" level=info msg="Removed container 0376b6c22d4639e39aac9a93f2e1895a517a1ef09dc781fd3f3ee1f70d6f8bf5: openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal/installer" id=7409cfa9-6dda-46be-8531-50548021deb8 name=/runtime.v1alpha2.RuntimeService/RemoveContainer
```

crio_service.log for installer-7-ip-10-0-129-204.us-east-2.compute.internal:
```
Dec 02 08:50:24.329304 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:24.329240449Z" level=info msg="Running pod sandbox: openshift-kube-scheduler/installer-7-ip-10-0-129-204.us-east-2.compute.internal/POD" id=6d3f070e-3405-40fd-88db-1dcb3de2d0b3 name=/runtime.v1alpha2.RuntimeService/RunPodSandbox
Dec 02 08:50:24.349843 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:24.349805929Z" level=info msg="Got pod network &{Name:installer-7-ip-10-0-129-204.us-east-2.compute.internal Namespace:openshift-kube-scheduler ID:2e5a71fe0e6f9c14aad0bb4fec8c6afe94638337b0d94a50af70c019af1c5f81 UID:d2d64d26-6c1f-43f5-a080-e224dccbcb1c NetNS:/var/run/netns/8fb53701-b7c0-47c5-b9ac-cc066ab157d9 Networks:[] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[]}] Aliases:map[]}"
Dec 02 08:50:24.349843 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:24.349843781Z" level=info msg="Adding pod openshift-kube-scheduler_installer-7-ip-10-0-129-204.us-east-2.compute.internal to CNI network \"multus-cni-network\" (type=multus)"
Dec 02 08:50:25.523921 ip-10-0-129-204 crio[1340]: 2021-12-02T08:50:25Z [verbose] Add: openshift-kube-scheduler:installer-7-ip-10-0-129-204.us-east-2.compute.internal:d2d64d26-6c1f-43f5-a080-e224dccbcb1c:openshift-sdn(openshift-sdn):eth0 {"cniVersion":"0.3.1","interfaces":[{"name":"eth0","sandbox":"/var/run/netns/8fb53701-b7c0-47c5-b9ac-cc066ab157d9"}],"ips":[{"version":"4","interface":0,"address":"10.129.0.50/23"}],"routes":[{"dst":"0.0.0.0/0","gw":"10.129.0.1"},{"dst":"224.0.0.0/4"},{"dst":"10.128.0.0/14"}],"dns":{}}
Dec 02 08:50:25.523921 ip-10-0-129-204 crio[1340]: I1202 08:50:25.508458   23113 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-kube-scheduler", Name:"installer-7-ip-10-0-129-204.us-east-2.compute.internal", UID:"d2d64d26-6c1f-43f5-a080-e224dccbcb1c", APIVersion:"v1", ResourceVersion:"15767", FieldPath:""}): type: 'Normal' reason: 'AddedInterface' Add eth0 [10.129.0.50/23] from openshift-sdn
Dec 02 08:50:25.526438 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:25.526398521Z" level=info msg="Got pod network &{Name:installer-7-ip-10-0-129-204.us-east-2.compute.internal Namespace:openshift-kube-scheduler ID:2e5a71fe0e6f9c14aad0bb4fec8c6afe94638337b0d94a50af70c019af1c5f81 UID:d2d64d26-6c1f-43f5-a080-e224dccbcb1c NetNS:/var/run/netns/8fb53701-b7c0-47c5-b9ac-cc066ab157d9 Networks:[] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[]}] Aliases:map[]}"
Dec 02 08:50:25.526722 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:25.526702613Z" level=info msg="Checking pod openshift-kube-scheduler_installer-7-ip-10-0-129-204.us-east-2.compute.internal for CNI network multus-cni-network (type=multus)"
Dec 02 08:50:25.549754 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:25.549721898Z" level=info msg="Ran pod sandbox 2e5a71fe0e6f9c14aad0bb4fec8c6afe94638337b0d94a50af70c019af1c5f81 with infra container: openshift-kube-scheduler/installer-7-ip-10-0-129-204.us-east-2.compute.internal/POD" id=6d3f070e-3405-40fd-88db-1dcb3de2d0b3 name=/runtime.v1alpha2.RuntimeService/RunPodSandbox
Dec 02 08:50:25.555095 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:25.555065782Z" level=info msg="Creating container: openshift-kube-scheduler/installer-7-ip-10-0-129-204.us-east-2.compute.internal/installer" id=2bb0c487-9c13-4682-ab52-3923421b3926 name=/runtime.v1alpha2.RuntimeService/CreateContainer
Dec 02 08:50:25.964434 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:25.964384878Z" level=info msg="Created container 18aecb076f180ca3220d5c90a4c6b8fc75e00bae87cae450430f5e4075a8e0e4: openshift-kube-scheduler/installer-7-ip-10-0-129-204.us-east-2.compute.internal/installer" id=2bb0c487-9c13-4682-ab52-3923421b3926 name=/runtime.v1alpha2.RuntimeService/CreateContainer
Dec 02 08:50:25.997697 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:25.990253651Z" level=info msg="Started container" PID=23202 containerID=18aecb076f180ca3220d5c90a4c6b8fc75e00bae87cae450430f5e4075a8e0e4 description=openshift-kube-scheduler/installer-7-ip-10-0-129-204.us-east-2.compute.internal/installer id=9412e0dc-aae4-409d-9de8-c6844b5dd043 name=/runtime.v1alpha2.RuntimeService/StartContainer sandboxID=2e5a71fe0e6f9c14aad0bb4fec8c6afe94638337b0d94a50af70c019af1c5f81
Dec 02 08:50:27.013027 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:27.012998220Z" level=info msg="Got pod network &{Name:installer-7-ip-10-0-129-204.us-east-2.compute.internal Namespace:openshift-kube-scheduler ID:2e5a71fe0e6f9c14aad0bb4fec8c6afe94638337b0d94a50af70c019af1c5f81 UID:d2d64d26-6c1f-43f5-a080-e224dccbcb1c NetNS:/var/run/netns/8fb53701-b7c0-47c5-b9ac-cc066ab157d9 Networks:[{Name:multus-cni-network Ifname:eth0}] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[]}] Aliases:map[]}"
Dec 02 08:50:27.013251 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:27.013221890Z" level=info msg="Deleting pod openshift-kube-scheduler_installer-7-ip-10-0-129-204.us-east-2.compute.internal from CNI network \"multus-cni-network\" (type=multus)"
Dec 02 08:50:27.251672 ip-10-0-129-204 crio[1340]: 2021-12-02T08:50:27Z [verbose] Del: openshift-kube-scheduler:installer-7-ip-10-0-129-204.us-east-2.compute.internal:d2d64d26-6c1f-43f5-a080-e224dccbcb1c:openshift-sdn:eth0 {"cniVersion":"0.3.1","name":"openshift-sdn","type":"openshift-sdn"}
```

Notice the following lines and the times at which they were logged:

```
Dec 02 08:50:14.597970 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:14.597891    1371 kubelet.go:2076] "SyncLoop ADD" source="api" pods=[openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal]
Dec 02 08:50:14.915700 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:14.915568    1371 kuberuntime_manager.go:483] "No sandbox for pod can be found. Need to start a new one" pod="openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal"
Dec 02 08:50:20.195393 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:20.195360    1371 kubelet.go:2092] "SyncLoop DELETE" source="api" pods=[openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal]
Dec 02 08:50:46.897346 ip-10-0-129-204 hyperkube[1371]: I1202 08:50:46.897307    1371 kubelet.go:2092] "SyncLoop DELETE" source="api" pods=[openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal]
Dec 02 08:50:47.250378 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:47.250319005Z" level=info msg="Started container" PID=24486 containerID=0376b6c22d4639e39aac9a93f2e1895a517a1ef09dc781fd3f3ee1f70d6f8bf5 description=openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal/installer id=1b2bf1bc-90e7-4917-8640-61b6a79dee4b name=/runtime.v1alpha2.RuntimeService/StartContainer sandboxID=922f082d193b55071f363a1d3d387946a955bea3e8614e1577658fd621498d14
Dec 02 08:50:48.265838 ip-10-0-129-204 crio[1340]: time="2021-12-02 08:50:48.261799038Z" level=info msg="Stopped container 0376b6c22d4639e39aac9a93f2e1895a517a1ef09dc781fd3f3ee1f70d6f8bf5: openshift-kube-scheduler/installer-6-ip-10-0-129-204.us-east-2.compute.internal/installer" id=793729ab-ee6b-4c7a-bc34-dc3876e1f157 name=/runtime.v1alpha2.RuntimeService/StopContainer
```

08:50:14.597891 - installer-6-ip-10-0-129-204.us-east-2.compute.internal created
08:50:14.915568 - installer-6-ip-10-0-129-204.us-east-2.compute.internal waiting for its sandbox
08:50:20.195360 - installer-6-ip-10-0-129-204.us-east-2.compute.internal requested to be deleted
08:50:24.001152 - installer-7-ip-10-0-129-204.us-east-2.compute.internal created
08:50:25.997697 - installer-7-ip-10-0-129-204.us-east-2.compute.internal installer container started to run
08:50:27.013251 - installer-7-ip-10-0-129-204.us-east-2.compute.internal installer container stopped running/got deleted
08:50:28.014171 - installer-7-ip-10-0-129-204.us-east-2.compute.internal kubelet reports no ready sandbox after the containers exited
08:50:47.250319 - installer-6-ip-10-0-129-204.us-east-2.compute.internal installer container started to run
08:50:48.265838 - installer-6-ip-10-0-129-204.us-east-2.compute.internal installer container stopped running

Based on the timing, the revision 7 installer ran and created the static pod manifests (per its container logs). However, it was followed by the revision 6 installer running, which rewrote the manifests back to revision 6. We thus ended up in a state where a revision 7 static pod is expected but will never be created, even though the installer controller sees the revision 7 installer as having completed successfully.
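
The summary line of this bug describes the corresponding fix: installer n+1 must not be created until installer n's pod is actually gone, not merely marked for deletion. A minimal sketch of such a gate using client-go (`waitForInstallerGone` is a hypothetical helper, not the actual library-go change):

```go
package installersketch

import (
	"context"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForInstallerGone blocks until the previous installer pod is truly
// gone (NotFound), not merely marked for deletion. In the trace above,
// installer-6 received its DELETE at 08:50:20 but its container still
// started at 08:50:47 and overwrote installer-7's manifests; gating the
// next installer on NotFound would close that window.
func waitForInstallerGone(ctx context.Context, client kubernetes.Interface, namespace, podName string) error {
	return wait.PollImmediate(time.Second, 5*time.Minute, func() (bool, error) {
		_, err := client.CoreV1().Pods(namespace).Get(ctx, podName, metav1.GetOptions{})
		if apierrors.IsNotFound(err) {
			return true, nil // pod object fully removed; safe to create installer n+1
		}
		if err != nil {
			return false, err // unexpected error: abort the wait
		}
		return false, nil // pod still exists (possibly terminating); keep waiting
	})
}
```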

--- Additional comment from Devan Goodwin on 2021-12-06 14:39:15 CET ---

Great info, thanks Jan; this looks like a pretty serious scheduler bug. Moving to kube-scheduler, as my kubelet guess was apparently incorrect.

Comment 1 Jan Chaloupka 2022-01-07 12:19:02 UTC
Due to higher priority tasks I have not been able to resolve this issue in time. Moving to the next sprint.

Comment 2 Michal Fojtik 2022-02-07 09:23:18 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Whiteboard if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

Comment 5 Jan Chaloupka 2022-02-18 11:20:30 UTC
Due to higher priority tasks I have not been able to resolve this issue in time. Moving to the next sprint.

Comment 6 Jan Chaloupka 2022-03-11 14:24:43 UTC
Due to higher priority tasks I have not been able to resolve this issue in time. Moving to the next sprint.

Comment 7 Jan Chaloupka 2022-04-25 13:37:50 UTC
Due to higher priority tasks I have not been able to resolve this issue in time. Moving to the next sprint.

