Seen in a recent CI job, pull-ci-openshift-installer-master-e2e-metal-ipi-ovn-ipv6 (https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_installer/5061/pull-ci-openshift-installer-master-e2e-metal-ipi-ovn-ipv6/1417782011476578304).

The BMO logs show inspection timing out:

{"level":"info","ts":1626867680.6176746,"logger":"provisioner.ironic","msg":"current provision state","host":"openshift-machine-api~ostest-worker-1","lastError":"timeout reached while inspecting the node","current":"inspect failed","target":"manageable"}

The inspection times out because dnsmasq isn't running; the dnsmasq container is waiting for the interface to be configured:

"Waiting for enp1s0 interface to be configured"

But the static-ip containers have applied the IP address to br-ex instead:

+ '[' -z fd00:1101::3/64 ']'
+ '[' -z '' ']'
++ echo fd00:1101::3/64
++ cut -d/ -f1
+ IP_ONLY=fd00:1101::3
++ ip -j addr
++ jq -r -c '.[].addr_info[] | select(.local == "fd00:1101::3") | .label'
+ PROVISIONING_INTERFACE=
+ '[' -z '' ']'
++ ip -j route get fd00:1101::3
++ jq -r '.[] | select(.dev != "lo") | .dev'
+ PROVISIONING_INTERFACE=br-ex
+ '[' -z br-ex ']'
+ /usr/sbin/ip addr add fd00:1101::3/64 dev br-ex valid_lft 300 preferred_lft 300

The reason for this is that PROVISIONING_INTERFACE is not being passed to the metal3 pod (see the attached YAML).
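The trace above boils down to a three-step fallback. Below is a minimal POSIX shell sketch of that order, reconstructed only from the trace and not copied from the real set-static-ip script. The function name choose_provisioning_interface and its argument convention (the results of the `ip -j addr` and `ip -j route get` lookups are passed in rather than computed) are hypothetical, chosen so the logic can be exercised without a live host.

```shell
#!/bin/sh
# Hypothetical sketch of the interface-selection order visible in the
# set -x trace; NOT the actual set-static-ip implementation.
#   $1  value of PROVISIONING_INTERFACE (empty when not passed to the pod)
#   $2  interface already holding the provisioning IP, if any
#       (what the `ip -j addr | jq ... .label` lookup returns)
#   $3  interface that `ip -j route get <ip>` reports for the IP
#       (br-ex when only the default route matches)
choose_provisioning_interface() {
    configured=$1
    addr_owner=$2
    route_dev=$3

    # 1. An explicitly configured interface always wins.
    if [ -n "$configured" ]; then
        printf '%s\n' "$configured"
        return 0
    fi
    # 2. Otherwise reuse whichever interface already has the address.
    if [ -n "$addr_owner" ]; then
        printf '%s\n' "$addr_owner"
        return 0
    fi
    # 3. Last resort: the interface the kernel would route the IP through.
    if [ -n "$route_dev" ]; then
        printf '%s\n' "$route_dev"
        return 0
    fi
    # Nothing usable was found.
    return 1
}
```

With the values from this job, `choose_provisioning_interface "" "" br-ex` prints `br-ex`, which is why the address lands on br-ex while dnsmasq keeps waiting for enp1s0; passing `enp1s0` as the first argument short-circuits the fallback.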
Created attachment 1804200: metal3 pod yaml
So, looking at the pod YAML, we see this in the initContainers:

- command:
  - /set-static-ip
  env:
  - name: PROVISIONING_IP
    value: fd00:1101::3/64
  - name: PROVISIONING_INTERFACE

We then also see:

- command:
  - /refresh-static-ip
  env:
  - name: PROVISIONING_IP
    value: fd00:1101::3/64
  - name: PROVISIONING_INTERFACE
  - name: PROVISIONING_MACS
    value: 00:89:aa:c9:9f:77,00:89:aa:c9:9f:7b,00:89:aa:c9:9f:7f

So it looks like we missed adding the new PROVISIONING_MACS variable to the initContainer in https://github.com/openshift/cluster-baremetal-operator/pull/149

However, looking at the script, it also doesn't yet seem to support PROVISIONING_MACS: https://github.com/openshift/ironic-static-ip-manager/blob/master/refresh-static-ip

So I think we'll need changes similar to https://github.com/metal3-io/ironic-image/pull/272 so that the interface can be derived from the list of MACs.

That said, I'm not entirely clear why PROVISIONING_INTERFACE is not being passed; I think it should still be getting added to the install-config in CI. The 05_create_install_config logs show:

2021-07-21 10:09:48 ++(network.sh:128): source(): export CLUSTER_PRO_IF=enp1s0
...
2021-07-21 10:09:50 ++(ocp_install_env.sh:94): baremetal_network_configuration(): [[ 4.9 == \4\.\3 ]]
2021-07-21 10:09:50 ++(ocp_install_env.sh:98): baremetal_network_configuration(): [[ Managed == \D\i\s\a\b\l\e\d ]]
2021-07-21 10:09:50 ++(ocp_install_env.sh:109): baremetal_network_configuration(): cat

So provisioningNetworkInterface should be set here: https://github.com/openshift-metal3/dev-scripts/blob/master/ocp_install_env.sh#L110

Unfortunately, we don't currently get the Provisioning CR in the must-gather, so a local reproducer may be needed to spot why/where that data is getting lost.
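One possible shape for deriving the interface from PROVISIONING_MACS is sketched below in POSIX shell. It is an illustration under stated assumptions, not the code from ironic-image pull/272: the function name interface_from_macs and the optional sysfs-root argument (added so it can be tested against a fake directory) are hypothetical. It relies on the standard Linux layout where each interface's MAC address is readable from /sys/class/net/<iface>/address, compares case-insensitively, and prints the first interface whose address appears in the comma-separated list.

```shell
#!/bin/sh
# Hypothetical helper: print the first network interface whose MAC address
# appears in a comma-separated list (the format PROVISIONING_MACS uses).
#   $1  comma-separated MAC list, e.g. "00:89:aa:c9:9f:77,00:89:aa:c9:9f:7b"
#   $2  optional sysfs root (defaults to /sys/class/net; overridable for tests)
interface_from_macs() {
    wanted=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    sysfs=${2:-/sys/class/net}

    for dir in "$sysfs"/*; do
        # Skip anything that is not an interface with a readable MAC.
        [ -r "$dir/address" ] || continue
        addr=$(tr '[:upper:]' '[:lower:]' < "$dir/address")
        # Split the wanted list on commas, restoring IFS afterwards.
        old_ifs=$IFS
        IFS=','
        for mac in $wanted; do
            if [ "$addr" = "$mac" ]; then
                IFS=$old_ifs
                basename "$dir"
                return 0
            fi
        done
        IFS=$old_ifs
    done
    # No interface matched any MAC in the list.
    return 1
}
```

Because bonds and VLANs can share a MAC with their parent interface, a real implementation would need a rule for such ties; this sketch simply takes the first match in glob order.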
The contents of the install-config are in the must-gather in namespaces/kube-system/core/configmaps.yaml; it contains:

provisioningBridge: ostestpr
provisioningDHCPRange: fd00:1101::a,fd00:1101::ffff:ffff:ffff:fffe
provisioningNetwork: Managed
provisioningNetworkCIDR: fd00:1101::/64
provisioningNetworkInterface: enp1s0

cluster-scoped-resources/metal3.io/provisionings/provisioning-configuration.yaml (attached) is also in the must-gather, and it is missing the provisioning interface:

spec:
  provisioningDHCPRange: fd00:1101::a,fd00:1101::ffff:ffff:ffff:fffe
  provisioningIP: fd00:1101::3
  provisioningNetwork: Managed
  provisioningNetworkCIDR: fd00:1101::/64
  provisioningOSDownloadURL: http://[fd2e:6f44:5dd8:c956::1]/images/rhcos-48.84.202105190318-0-openstack.x86_64.qcow2.gz?sha256=37a156f9f2b0efded45cb3cd5688aa2d42c26873a534951484e96f546a6b2c84
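To spot this mismatch quickly in a must-gather, a small check along these lines could be run against the Provisioning CR dump. It is a sketch under stated assumptions: the helper name is hypothetical, it greps the YAML rather than parsing it, and it assumes the Provisioning CR stores the interface under spec.provisioningInterface (the field that `oc describe` renders as "Provisioning Interface"), whereas the install-config uses provisioningNetworkInterface.

```shell
#!/bin/sh
# Hypothetical must-gather check: succeed only if a Provisioning CR manifest
# sets a non-empty provisioningInterface. This is a line-oriented grep, not a
# YAML parser, so it only suits simple dumps like the one in this bug.
#   $1  path to provisioning-configuration.yaml
cr_has_provisioning_interface() {
    grep -Eq '^[[:space:]]+provisioningInterface:[[:space:]]*[^[:space:]]' "$1"
}

# On a live cluster the same field can be read directly, e.g.:
#   oc get provisioning provisioning-configuration \
#       -o jsonpath='{.spec.provisioningInterface}'
```

Run against the spec shown above, the check fails, which matches the observation that the interface never made it from the install-config into the CR.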
Ah thanks Derek - I missed those :) Perhaps this is a regression caused by https://github.com/openshift/installer/pull/5015 ?
So yes, it seems we unconditionally removed provisioningNetworkInterface from the template: https://github.com/openshift/installer/pull/5015/files#diff-045c11e083110c266ab5b7fda7a845114ba29fd476f864802d2ac640b49422b2 It probably needs to be either added back conditionally or always included (in which case we'd pass an empty string when nothing is specified in the install-config; AFAICS that should work with the logic added in https://github.com/metal3-io/ironic-image/pull/272).
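The "always include it, possibly empty" option relies on the consumer treating an empty value the same as an absent one. A minimal POSIX shell sketch of that precedence follows; it assumes (hypothetically) that the MAC-derived interface has already been computed elsewhere, and the function name is illustrative rather than taken from ironic-image pull/272.

```shell
#!/bin/sh
# Hypothetical precedence check: an explicit interface wins, and an empty
# string (what the template would emit when install-config omits the field)
# falls through to the MAC-derived interface exactly as if it were unset.
#   $1  PROVISIONING_INTERFACE value, possibly ""
#   $2  interface derived from PROVISIONING_MACS, possibly ""
resolve_provisioning_interface() {
    if [ -n "${1:-}" ]; then
        printf '%s\n' "$1"
    elif [ -n "${2:-}" ]; then
        printf '%s\n' "$2"
    else
        # Neither source produced an interface.
        return 1
    fi
}
```

In the pod YAML earlier in this bug, `- name: PROVISIONING_INTERFACE` with no value yields an environment variable set to the empty string, so under this precedence it behaves the same as if the variable were missing entirely.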
*** Bug 1984860 has been marked as a duplicate of this bug. ***
Verified on 4.9.0-0.nightly-2021-07-25-125326: deployed a few times and the problem did not reproduce.
I don't think this can be ON_QA without cbo pull/177.
(In reply to Angus Salkeld from comment #10)
> I don't think this can be ON_QA without cbo pull/177

pull/177 has been closed with a message saying it was replaced by cbo pull/182, and installer pull/5100 has also merged, so we should be OK to move this back to at least MODIFIED.

I think installer pull/5100 alone would have been enough to VERIFY this BZ, as it fixed the regression in CI (which the bug was opened for).

Do you know if there is another bug covering the new support for passing in MACs? If so, we can set this back to VERIFIED.
(In reply to Derek Higgins from comment #11)
> (In reply to Angus Salkeld from comment #10)
> > I don't think this can be ON_QA without cbo pull/177
>
> pull/177 has been closed with a message saying it was replaced by cbo pull/182,
> and installer pull/5100 has also merged, so we should be OK to move this back
> to at least MODIFIED.
>
> I think installer pull/5100 alone would have been enough to VERIFY this BZ,
> as it fixed the regression in CI (which the bug was opened for).
>
> Do you know if there is another bug covering the new support for passing in
> MACs? If so, we can set this back to VERIFIED.

I think we are all good to test now.
Verified on 4.9.0-0.nightly-2021-08-16-154237; deployment passed.

$ oc describe provisioning provisioning-configuration
.....
Provisioning Interface: enp0s3
.....
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759