While trying to bring up a bare metal dual-stack cluster using dev-scripts, I found that the MCO is degraded:

    - lastTransitionTime: "2020-09-21T11:36:03Z"
      message: 'Unable to apply 4.6.0-0.ci.test-2020-09-21-103241-ci-ln-2mpl212: timed out waiting for the condition during syncRequiredMachineConfigPools: pool master has not progressed to latest configuration: configuration status for pool master is empty: pool is degraded because nodes fail with "3 nodes are reporting degraded status on sync": "Node master-0 is reporting: \"machineconfig.machineconfiguration.openshift.io \\\"rendered-master-4ffdaac60fdcb29578a2a0029f7dc5b5\\\" not found\", Node master-2 is reporting: \"machineconfig.machineconfiguration.openshift.io \\\"rendered-master-4ffdaac60fdcb29578a2a0029f7dc5b5\\\" not found\", Node master-1 is reporting: \"machineconfig.machineconfiguration.openshift.io \\\"rendered-master-4ffdaac60fdcb29578a2a0029f7dc5b5\\\" not found\"", retrying'
      reason: RequiredPoolsFailed
      status: "True"
      type: Degraded

The referenced MachineConfig does not actually exist. MCD logs show:

    I0921 11:20:29.043729 11596 node.go:45] Setting initial node config: rendered-master-4ffdaac60fdcb29578a2a0029f7dc5b5
    I0921 11:20:29.061157 11596 daemon.go:781] In bootstrap mode
    E0921 11:20:29.061191 11596 writer.go:135] Marking Degraded due to: machineconfig.machineconfiguration.openshift.io "rendered-master-4ffdaac60fdcb29578a2a0029f7dc5b5" not found
    I0921 11:20:31.060836 11596 daemon.go:781] In bootstrap mode
    E0921 11:20:31.060885 11596 writer.go:135] Marking Degraded due to: machineconfig.machineconfiguration.openshift.io "rendered-master-4ffdaac60fdcb29578a2a0029f7dc5b5" not found
    ...
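A minimal sketch for confirming this symptom from the command line, assuming `oc` is on PATH and logged in with cluster-admin rights. `check_rendered_mc` is a hypothetical helper, not part of `oc` or the MCO; it only wraps `oc get machineconfig` to report whether the config the nodes are waiting for exists and, if not, which rendered configs the controller did produce for that pool.

```shell
# Hypothetical helper: reports whether a rendered MachineConfig exists and,
# if not, lists the rendered configs that do exist for the given pool.
# Assumes `oc` is on PATH and logged in with cluster-admin rights.
check_rendered_mc() {
  local name="$1" pool="$2"
  if oc get machineconfig "$name" >/dev/null 2>&1; then
    echo "found: $name"
    return 0
  fi
  echo "missing: $name"
  # Rendered configs that actually exist for this pool, for comparison.
  oc get machineconfig -o name | grep "rendered-${pool}-" || true
  return 1
}
```

For example, `check_rendered_mc rendered-master-4ffdaac60fdcb29578a2a0029f7dc5b5 master` would print `missing: ...` followed by whatever `rendered-master-*` configs the in-cluster controller generated.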
Based on a Slack discussion, one possible culprit is the fact that we add a FeatureGate object to the install manifests:

    kind: FeatureGate
    metadata:
      name: cluster
    spec:
      featureSet: IPv6DualStackNoUpgrade

MCC saw this:

    I0921 11:26:04.574593 1 kubelet_config_features.go:152] Applied FeatureSet cluster on MachineConfigPool master

But the MachineConfig currently in use on the masters does not reflect it:

    sh-5.0# more /etc/kubernetes/kubelet.conf
    ...
    featureGates:
      APIPriorityAndFairness: true
      LegacyNodeRoleBehavior: false
      NodeDisruptionExclusion: true
      RotateKubeletServerCertificate: true
      SCTPSupport: true
      ServiceNodeExclusion: true
      SupportPodPidsLimit: true
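To check a node for a specific gate rather than reading the whole file, a small filter over the `featureGates` block can help. This is a sketch under the assumption that `kubelet.conf` uses the YAML layout in the excerpt above (`featureGates:` as a top-level key, one `Gate: value` entry per indented line); a JSON-formatted kubelet config would need a different parser. `kubelet_has_gate` is a hypothetical helper, not an MCO or `oc` command.

```shell
# Hypothetical helper: succeeds if GATE is set to true under the top-level
# featureGates key of a YAML kubelet config (layout as in the excerpt above).
kubelet_has_gate() {
  local file="$1" gate="$2"
  awk -v gate="$gate" '
    /^featureGates:/ { in_fg = 1; next }
    # Any new top-level (unindented) key ends the featureGates block.
    in_fg && /^[^ \t]/ { in_fg = 0 }
    in_fg && $1 == (gate ":") && $2 == "true" { found = 1 }
    END { exit found ? 0 : 1 }
  ' "$file"
}
```

From a debug shell on a master (`oc debug node/<node>`, then `chroot /host`), `kubelet_has_gate /etc/kubernetes/kubelet.conf IPv6DualStack` would be expected to succeed only if the FeatureSet had been rendered into the kubelet config; on the masters above it would fail.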
So yeah, it seems like FeatureGates are only processed post-bootstrap. At bootstrap time, the MC components generate their configs ignoring the FeatureGate. Then, when the non-bootstrap components come up, they process everything _with_ the FeatureGate and generate a different MachineConfig than the one the nodes are expecting, and the nodes can't recover. This is probably not _easily_ fixable. However, kubelet doesn't do anything useful with the `IPv6DualStack` feature gate in 1.19 anyway, so patching MCO to ignore that gate when generating the kubelet config might be enough to solve the problem for 4.6.
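Why a different MachineConfig means the nodes are stuck rather than just updated: assuming the MCO names rendered configs `rendered-<pool>-<hash>`, with the hash computed from the merged config contents (the 32-hex-digit suffixes above are consistent with an MD5 digest), any difference between the bootstrap render and the in-cluster render yields a different name, so the name written to the nodes at bootstrap is never created in the cluster. The sketch below only illustrates that effect: `rendered_name` is hypothetical and hashes toy JSON strings with GNU `md5sum`, not the real MachineConfig serialization, so its output will not match real cluster names.

```shell
# Hypothetical illustration of content-derived config names. The real MCO
# hashes the full serialized MachineConfig; here we hash a toy JSON string.
# Assumes GNU coreutils `md5sum`.
rendered_name() {
  local pool="$1" content="$2" digest
  digest=$(printf '%s' "$content" | md5sum | cut -d' ' -f1)
  printf 'rendered-%s-%s\n' "$pool" "$digest"
}

# Toy stand-in for what the bootstrap renderer saw (FeatureGate ignored).
bootstrap='{"featureGates":{"APIPriorityAndFairness":true}}'
# Toy stand-in for what the in-cluster controller saw (gate applied).
in_cluster='{"featureGates":{"APIPriorityAndFairness":true,"IPv6DualStack":true}}'

rendered_name master "$bootstrap"    # name the nodes were told to expect
rendered_name master "$in_cluster"   # name that actually gets created
```

The two printed names differ even though only one gate changed, which is the mismatch the MCD keeps reporting as "not found".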
@Johnny is this something that your team could verify?
Currently, IPv6 is only supported on IPI bare metal, so perhaps the Edge QE team can help with that.
@Ariel could someone from your team verify this BZ?
@Luksa @Lucie I'm looking for assistance getting this BZ verified. Would someone from your team be able to check this?
I got access to a dual-stack bare metal environment that was installed using 4.6.0-fc.8. The cluster looks healthy as far as I can tell, so I'm going to mark this one verified as SanityOnly.

```
[kni@r640-u09 ~]$ oc get clusterversion
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-fc.8   True        False         25m     Cluster version is 4.6.0-fc.8
[kni@r640-u09 ~]$ oc get nodes
NAME                                                STATUS   ROLES    AGE   VERSION
openshift-master-0.qe3.kni.lab.eng.bos.redhat.com   Ready    master   64m   v1.19.0+8a39924
openshift-master-1.qe3.kni.lab.eng.bos.redhat.com   Ready    master   62m   v1.19.0+8a39924
openshift-master-2.qe3.kni.lab.eng.bos.redhat.com   Ready    master   63m   v1.19.0+8a39924
openshift-worker-0.qe3.kni.lab.eng.bos.redhat.com   Ready    worker   35m   v1.19.0+8a39924
openshift-worker-1.qe3.kni.lab.eng.bos.redhat.com   Ready    worker   35m   v1.19.0+8a39924
[kni@r640-u09 ~]$ oc describe node/openshift-master-0.qe3.kni.lab.eng.bos.redhat.com
Name:               openshift-master-0.qe3.kni.lab.eng.bos.redhat.com
Roles:              master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=openshift-master-0.qe3.kni.lab.eng.bos.redhat.com
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/master=
                    node.openshift.io/os_id=rhcos
Annotations:        k8s.ovn.org/l3-gateway-config:
                      {"default":{"mode":"shared","interface-id":"br-ex_openshift-master-0.qe3.kni.lab.eng.bos.redhat.com","mac-address":"98:03:9b:61:71:79","ip...
                    k8s.ovn.org/node-chassis-id: b049fb50-575c-402a-829b-cb66e0253ba5
                    k8s.ovn.org/node-join-subnets: {"default":"fd98:0:0:1::/64"}
                    k8s.ovn.org/node-local-nat-ip: {"default":["fd99::a184"]}
                    k8s.ovn.org/node-mgmt-port-mac-address: 72:b4:85:98:19:32
                    k8s.ovn.org/node-primary-ifaddr: {"ipv6":"2620:52:0:1386::91/128"}
                    k8s.ovn.org/node-subnets: {"default":"fd01:0:0:2::/64"}
                    machine.openshift.io/machine: openshift-machine-api/qe3-rxhxg-master-0
                    machineconfiguration.openshift.io/currentConfig: rendered-master-9d089e6353d1eb00acc37c491547b42c
                    machineconfiguration.openshift.io/desiredConfig: rendered-master-9d089e6353d1eb00acc37c491547b42c
                    machineconfiguration.openshift.io/reason:
                    machineconfiguration.openshift.io/state: Done
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 01 Oct 2020 12:48:10 -0400
Taints:             node-role.kubernetes.io/master:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  openshift-master-0.qe3.kni.lab.eng.bos.redhat.com
  AcquireTime:     <unset>
  RenewTime:       Thu, 01 Oct 2020 13:52:46 -0400
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Thu, 01 Oct 2020 13:48:08 -0400   Thu, 01 Oct 2020 12:48:11 -0400   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Thu, 01 Oct 2020 13:48:08 -0400   Thu, 01 Oct 2020 12:48:11 -0400   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Thu, 01 Oct 2020 13:48:08 -0400   Thu, 01 Oct 2020 12:48:11 -0400   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Thu, 01 Oct 2020 13:48:08 -0400   Thu, 01 Oct 2020 12:51:22 -0400   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  2620:52:0:1386::91
  Hostname:    openshift-master-0.qe3.kni.lab.eng.bos.redhat.com
...
[kni@r640-u09 ~]$ oc -n openshift-machine-config-operator logs po/machine-config-daemon-lmh58 machine-config-daemon
I1001 16:52:00.818712 16424 start.go:108] Version: v4.6.0-202009240159.p0-dirty (a3c9532c8e8f2efe9b0f739fbd761b32cc0bfa2b)
I1001 16:52:00.821428 16424 start.go:121] Calling chroot("/rootfs")
I1001 16:52:00.821486 16424 rpm-ostree.go:261] Running captured: rpm-ostree status --json
I1001 16:52:00.964096 16424 daemon.go:226] Booted osImageURL: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b973b2f9e432b12388874a9c8d191e699106bcbf12d962c729b4e16307dbd83f (46.82.202009222340-0)
I1001 16:52:00.997093 16424 daemon.go:233] Installed Ignition binary version: 2.6.0
I1001 16:52:01.017542 16424 start.go:97] Copied self to /run/bin/machine-config-daemon on host
I1001 16:52:01.019779 16424 metrics.go:106] Registering Prometheus metrics
I1001 16:52:01.019883 16424 metrics.go:111] Starting metrics listener on 127.0.0.1:8797
I1001 16:52:01.021413 16424 update.go:1565] Starting to manage node: openshift-master-0.qe3.kni.lab.eng.bos.redhat.com
I1001 16:52:01.024892 16424 rpm-ostree.go:261] Running captured: rpm-ostree status
I1001 16:52:01.056146 16424 daemon.go:863] State: idle
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b973b2f9e432b12388874a9c8d191e699106bcbf12d962c729b4e16307dbd83f
              CustomOrigin: Managed by machine-config-operator
                   Version: 46.82.202009222340-0 (2020-09-22T23:44:32Z)

  ostree://5d65bddfb072101a84501cd87b8abc650beb8dc0aa2bfeff022fc750cde52f1d
                   Version: 46.82.202009222340-0 (2020-09-22T23:44:32Z)
I1001 16:52:01.056167 16424 rpm-ostree.go:261] Running captured: journalctl --list-boots
I1001 16:52:01.061298 16424 daemon.go:870] journalctl --list-boots:
-1 2e4395008e0544118d9ade9c7a9c73e3 Thu 2020-10-01 16:44:33 UTC—Thu 2020-10-01 16:46:06 UTC
 0 bd1cb9b851854871922c5b3e4a35a6fe Thu 2020-10-01 16:47:29 UTC—Thu 2020-10-01 16:52:01 UTC
I1001 16:52:01.061322 16424 rpm-ostree.go:261] Running captured: systemctl list-units --state=failed --no-legend
I1001 16:52:01.067939 16424 daemon.go:885] systemd service state: OK
I1001 16:52:01.067953 16424 daemon.go:617] Starting MachineConfigDaemon
I1001 16:52:01.068077 16424 daemon.go:624] Enabling Kubelet Healthz Monitor
I1001 16:52:02.031362 16424 daemon.go:397] Node openshift-master-0.qe3.kni.lab.eng.bos.redhat.com is part of the control plane
I1001 16:52:02.062519 16424 controlplane.go:52] Set root blockdev /sys/devices/pci0000:17/0000:17:00.0/0000:18:00.0/host1/target1:2:0/1:2:0:0/block/sdb to use scheduler bfq
I1001 16:52:03.610075 16424 node.go:24] No machineconfiguration.openshift.io/currentConfig annotation on node openshift-master-0.qe3.kni.lab.eng.bos.redhat.com: map[k8s.ovn.org/l3-gateway-config:{"default":{"mode":"shared","interface-id":"br-ex_openshift-master-0.qe3.kni.lab.eng.bos.redhat.com","mac-address":"98:03:9b:61:71:79","ip-addresses":["2620:52:0:1386::91/128"],"ip-address":"2620:52:0:1386::91/128","next-hops":["fe80::22d8:b00:c909:1e41"],"next-hop":"fe80::22d8:b00:c909:1e41","node-port-enable":"true","vlan-id":"0"}} k8s.ovn.org/node-chassis-id:b049fb50-575c-402a-829b-cb66e0253ba5 k8s.ovn.org/node-join-subnets:{"default":"fd98:0:0:1::/64"} k8s.ovn.org/node-local-nat-ip:{"default":["fd99::a184"]} k8s.ovn.org/node-mgmt-port-mac-address:72:b4:85:98:19:32 k8s.ovn.org/node-primary-ifaddr:{"ipv6":"2620:52:0:1386::91/128"} k8s.ovn.org/node-subnets:{"default":"fd01:0:0:2::/64"} volumes.kubernetes.io/controller-managed-attach-detach:true], in cluster bootstrap, loading initial node annotation from /etc/machine-config-daemon/node-annotations.json
I1001 16:52:03.610415 16424 node.go:45] Setting initial node config: rendered-master-9d089e6353d1eb00acc37c491547b42c
I1001 16:52:03.623167 16424 daemon.go:781] In bootstrap mode
E1001 16:52:03.623199 16424 writer.go:135] Marking Degraded due to: machineconfig.machineconfiguration.openshift.io "rendered-master-9d089e6353d1eb00acc37c491547b42c" not found
I1001 16:52:05.623901 16424 daemon.go:781] In bootstrap mode
E1001 16:52:05.623925 16424 writer.go:135] Marking Degraded due to: machineconfig.machineconfiguration.openshift.io "rendered-master-9d089e6353d1eb00acc37c491547b42c" not found
I1001 16:52:21.633843 16424 daemon.go:781] In bootstrap mode
I1001 16:52:21.633862 16424 daemon.go:809] Current+desired config: rendered-master-9d089e6353d1eb00acc37c491547b42c
I1001 16:52:21.637473 16424 daemon.go:1045] No bootstrap pivot required; unlinking bootstrap node annotations
I1001 16:52:21.637501 16424 daemon.go:1083] Validating against pending config rendered-master-9d089e6353d1eb00acc37c491547b42c
I1001 16:52:21.648520 16424 daemon.go:1094] Validated on-disk state
I1001 16:52:21.669143 16424 daemon.go:1135] Completing pending config rendered-master-9d089e6353d1eb00acc37c491547b42c
I1001 16:52:21.669161 16424 update.go:1565] completed update for config rendered-master-9d089e6353d1eb00acc37c491547b42c
I1001 16:52:21.677168 16424 daemon.go:1151] In desired config rendered-master-9d089e6353d1eb00acc37c491547b42c
```
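A sanity check like the one above can be complemented by asking every MachineConfigPool directly whether it has converged, since the original failure surfaced as a degraded pool. This is a minimal sketch, assuming `oc` is logged in to the cluster; `pools_healthy` is a hypothetical helper that only wraps `oc get machineconfigpool` with a JSONPath query over the pools' `Updated` and `Degraded` conditions.

```shell
# Hypothetical helper: succeeds only if every MachineConfigPool reports
# Updated=True and Degraded=False. Assumes `oc` is logged in to the cluster.
pools_healthy() {
  local bad
  # One tab-separated line per pool: name, Updated status, Degraded status.
  bad=$(oc get machineconfigpool -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Updated")].status}{"\t"}{.status.conditions[?(@.type=="Degraded")].status}{"\n"}{end}' \
    | awk -F'\t' '$2 != "True" || $3 != "False" { print $1 }')
  if [ -n "$bad" ]; then
    # Unquoted on purpose: printf repeats the format once per pool name.
    printf 'unhealthy pool: %s\n' $bad
    return 1
  fi
  echo "all pools updated and not degraded"
}
```

Treating a missing condition as unhealthy (empty status fails both comparisons) is deliberate, so a pool that has not reported yet is not mistaken for a converged one.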
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196