Bug 1984576 - PROVISIONING_INTERFACE missing from metal3 pod
Summary: PROVISIONING_INTERFACE missing from metal3 pod
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.9.0
Assignee: Steven Hardy
QA Contact: Lubov
URL:
Whiteboard:
Duplicates: 1984860
Depends On:
Blocks:
Reported: 2021-07-21 16:23 UTC by Derek Higgins
Modified: 2021-10-18 17:40 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-18 17:40:27 UTC
Target Upstream Version:


Attachments
metal3 pod yaml (28.28 KB, text/plain)
2021-07-21 16:25 UTC, Derek Higgins


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-baremetal-operator pull 177 | 0 | None | open | Pass MACs to set-static-ip initContainer | 2021-07-25 23:26:09 UTC
Github | openshift cluster-baremetal-operator pull 182 | 0 | None | open | Bug 1984576: Rebase of pull/177 (Pass MACs to set-static-ip initContainer) + unit tests | 2021-07-26 21:50:46 UTC
Github | openshift installer pull 5100 | 0 | None | open | baremetal: reinstate provisioningInterface for provisioning CR | 2021-07-22 10:55:17 UTC
Red Hat Product Errata | RHSA-2021:3759 | 0 | None | None | None | 2021-10-18 17:40:57 UTC

Description Derek Higgins 2021-07-21 16:23:02 UTC
Seen in a recent CI job pull-ci-openshift-installer-master-e2e-metal-ipi-ovn-ipv6 (https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_installer/5061/pull-ci-openshift-installer-master-e2e-metal-ipi-ovn-ipv6/1417782011476578304)


The BMO logs show inspection timing out:

{"level":"info","ts":1626867680.6176746,"logger":"provisioner.ironic","msg":"current provision state","host":"openshift-machine-api~ostest-worker-1","lastError":"timeout reached while inspecting the node","current":"inspect failed","target":"manageable"}


The inspection is timing out because dnsmasq isn't running;
the dnsmasq container is waiting for the interface to be configured:

"Waiting for enp1s0 interface to be configured"

But the static-ip containers have applied the IP address to br-ex instead:

+ '[' -z fd00:1101::3/64 ']'
+ '[' -z '' ']'
++ echo fd00:1101::3/64
++ cut -d/ -f1
+ IP_ONLY=fd00:1101::3
++ ip -j addr
++ jq -r -c '.[].addr_info[] | select(.local == "fd00:1101::3") | .label'
+ PROVISIONING_INTERFACE=
+ '[' -z '' ']'
++ ip -j route get fd00:1101::3
++ jq -r '.[] | select(.dev != "lo") | .dev'
+ PROVISIONING_INTERFACE=br-ex
+ '[' -z br-ex ']'
+ /usr/sbin/ip addr add fd00:1101::3/64 dev br-ex valid_lft 300 preferred_lft 300



The reason for this is that PROVISIONING_INTERFACE is not being passed to the metal3 pod (see the attached yaml).
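
To make the trace above easier to follow, here is a minimal Python sketch of the selection order it appears to show: use PROVISIONING_INTERFACE if it is set, otherwise look for a label on the provisioning address, otherwise fall back to the device reported by `ip route get`. The function name `pick_interface` and the sample data are hypothetical stand-ins for parsed `ip -j` output; this is not the actual set-static-ip script.

```python
def pick_interface(provisioning_ip, addr_info, route_info, configured=""):
    """Approximate the fallback order seen in the trace (illustrative only)."""
    # An explicitly configured interface always wins.
    if configured:
        return configured
    ip_only = provisioning_ip.split("/")[0]
    # First fallback: an interface that already carries the address with a label.
    for link in addr_info:
        for info in link.get("addr_info", []):
            if info.get("local") == ip_only and info.get("label"):
                return info["label"]
    # Second fallback: whichever non-loopback device routes to the address.
    for route in route_info:
        if route.get("dev") != "lo":
            return route["dev"]
    return ""


# Hypothetical parsed output of `ip -j addr` and `ip -j route get fd00:1101::3`.
# The br-ex address is made up for illustration.
SAMPLE_ADDR = [
    {"ifname": "lo", "addr_info": [{"local": "::1"}]},
    {"ifname": "enp1s0", "addr_info": []},
    {"ifname": "br-ex", "addr_info": [{"local": "fd2e:6f44:5dd8:c956::14"}]},
]
SAMPLE_ROUTE = [{"dst": "fd00:1101::3", "dev": "br-ex"}]

print(pick_interface("fd00:1101::3/64", SAMPLE_ADDR, SAMPLE_ROUTE))
print(pick_interface("fd00:1101::3/64", SAMPLE_ADDR, SAMPLE_ROUTE, configured="enp1s0"))
```

With the variable empty, the route lookup decides and the address lands on br-ex; with it set, enp1s0 is used directly, which is what dnsmasq is waiting for.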

Comment 1 Derek Higgins 2021-07-21 16:25:12 UTC
Created attachment 1804200 [details]
metal3 pod yaml

Comment 2 Steven Hardy 2021-07-22 09:48:41 UTC
So looking at the pod yaml we see this in the initContainers:

  - command:
    - /set-static-ip
    env:
    - name: PROVISIONING_IP
      value: fd00:1101::3/64
    - name: PROVISIONING_INTERFACE

We then also see:

  - command:
    - /refresh-static-ip
    env:
    - name: PROVISIONING_IP
      value: fd00:1101::3/64
    - name: PROVISIONING_INTERFACE
    - name: PROVISIONING_MACS
      value: 00:89:aa:c9:9f:77,00:89:aa:c9:9f:7b,00:89:aa:c9:9f:7f

So it looks like we missed adding the new PROVISIONING_MACS variable to the initContainer in https://github.com/openshift/cluster-baremetal-operator/pull/149
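
As a rough illustration of the two observations above (an env entry with a name but no value, and a variable present in one container but not the other), here is a small Python sketch. The dictionaries mirror the yaml excerpts quoted in this comment; the helper names `env_without_value` and `env_names` are hypothetical and not part of any operator code.

```python
def env_names(container):
    """Return the set of env variable names declared on a container."""
    return {entry["name"] for entry in container.get("env", [])}


def env_without_value(container):
    """Return env names that have neither a literal value nor a valueFrom source."""
    return [
        entry["name"]
        for entry in container.get("env", [])
        if not entry.get("value") and "valueFrom" not in entry
    ]


# Container specs transcribed from the pod yaml excerpts above.
set_static_ip = {
    "command": ["/set-static-ip"],
    "env": [
        {"name": "PROVISIONING_IP", "value": "fd00:1101::3/64"},
        {"name": "PROVISIONING_INTERFACE"},
    ],
}
refresh_static_ip = {
    "command": ["/refresh-static-ip"],
    "env": [
        {"name": "PROVISIONING_IP", "value": "fd00:1101::3/64"},
        {"name": "PROVISIONING_INTERFACE"},
        {"name": "PROVISIONING_MACS",
         "value": "00:89:aa:c9:9f:77,00:89:aa:c9:9f:7b,00:89:aa:c9:9f:7f"},
    ],
}

print(env_without_value(set_static_ip))
print(env_names(refresh_static_ip) - env_names(set_static_ip))
```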

However, looking at the script, it doesn't yet seem to support PROVISIONING_MACS either:

https://github.com/openshift/ironic-static-ip-manager/blob/master/refresh-static-ip

So I think we'll need some changes similar to https://github.com/metal3-io/ironic-image/pull/272 to enable the interface to be derived from the list of macs.
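
For illustration, deriving the interface from the list of MACs could look roughly like the Python sketch below. It assumes link data shaped like parsed `ip -j link` output (entries with `ifname` and `address`); the function name `interface_from_macs`, the sample links, and the pairing of a MAC with enp1s0 are hypothetical, and this is not the code from the ironic-image pull request.

```python
def interface_from_macs(provisioning_macs, links):
    """Return the first interface whose MAC appears in the comma-separated list."""
    wanted = {mac.strip().lower() for mac in provisioning_macs.split(",") if mac.strip()}
    for link in links:
        if link.get("address", "").lower() in wanted:
            return link["ifname"]
    # No local NIC matches any listed MAC.
    return ""


# Hypothetical link list for one host; only one listed MAC is expected to be local.
SAMPLE_LINKS = [
    {"ifname": "lo", "address": "00:00:00:00:00:00"},
    {"ifname": "enp1s0", "address": "00:89:AA:C9:9F:77"},
    {"ifname": "enp2s0", "address": "00:89:aa:c9:9f:01"},
]
MACS = "00:89:aa:c9:9f:77,00:89:aa:c9:9f:7b,00:89:aa:c9:9f:7f"

print(interface_from_macs(MACS, SAMPLE_LINKS))
```

The list carries one MAC per candidate host, so each host would match at most one of its own NICs; the comparison is case-insensitive because MAC formatting may vary between sources.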

That said, I'm not entirely clear why PROVISIONING_INTERFACE is not being passed. It should still be getting added to the install-config in CI, I think; the 05_create_install_config logs show:

  2021-07-21 10:09:48 ++(network.sh:128): source(): export CLUSTER_PRO_IF=enp1s0
  ...
  2021-07-21 10:09:50 ++(ocp_install_env.sh:94): baremetal_network_configuration(): [[ 4.9 == \4\.\3 ]]
  2021-07-21 10:09:50 ++(ocp_install_env.sh:98): baremetal_network_configuration(): [[ Managed == \D\i\s\a\b\l\e\d ]]
  2021-07-21 10:09:50 ++(ocp_install_env.sh:109): baremetal_network_configuration(): cat

So we should set the provisioningNetworkInterface here https://github.com/openshift-metal3/dev-scripts/blob/master/ocp_install_env.sh#L110

Unfortunately we don't currently get the provisioning CR in the must-gather, so a local reproduction may be required to spot why/where that data is getting lost.

Comment 4 Derek Higgins 2021-07-22 10:19:04 UTC
The contents of the install-config are in the must-gather, in namespaces/kube-system/core/configmaps.yaml. It contains:
          provisioningBridge: ostestpr
          provisioningDHCPRange: fd00:1101::a,fd00:1101::ffff:ffff:ffff:fffe
          provisioningNetwork: Managed
          provisioningNetworkCIDR: fd00:1101::/64
          provisioningNetworkInterface: enp1s0


cluster-scoped-resources/metal3.io/provisionings/provisioning-configuration.yaml (attached) is also in the must-gather, and it is missing the provisioningNetworkInterface:

spec:
  provisioningDHCPRange: fd00:1101::a,fd00:1101::ffff:ffff:ffff:fffe
  provisioningIP: fd00:1101::3
  provisioningNetwork: Managed
  provisioningNetworkCIDR: fd00:1101::/64
  provisioningOSDownloadURL: http://[fd2e:6f44:5dd8:c956::1]/images/rhcos-48.84.202105190318-0-openstack.x86_64.qcow2.gz?sha256=37a156f9f2b0efded45cb3cd5688aa2d42c26873a534951484e96f546a6b2c84
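
The comparison above can be sketched in Python as a check for install-config settings that did not make it into the Provisioning CR. The values are copied from the two excerpts; the key mapping (in particular that `provisioningNetworkInterface` corresponds to a CR field named `provisioningInterface`, as suggested by the installer pull request title) and the helper name `dropped_fields` are assumptions for illustration.

```python
# Assumed mapping from install-config platform keys to Provisioning CR spec keys.
INSTALL_CONFIG_TO_CR = {
    "provisioningNetwork": "provisioningNetwork",
    "provisioningNetworkCIDR": "provisioningNetworkCIDR",
    "provisioningDHCPRange": "provisioningDHCPRange",
    "provisioningNetworkInterface": "provisioningInterface",
}


def dropped_fields(install_config, cr_spec, mapping=INSTALL_CONFIG_TO_CR):
    """List install-config keys that are set but have no value in the CR spec."""
    return [
        key
        for key, cr_key in mapping.items()
        if install_config.get(key) and not cr_spec.get(cr_key)
    ]


install_config = {
    "provisioningBridge": "ostestpr",
    "provisioningDHCPRange": "fd00:1101::a,fd00:1101::ffff:ffff:ffff:fffe",
    "provisioningNetwork": "Managed",
    "provisioningNetworkCIDR": "fd00:1101::/64",
    "provisioningNetworkInterface": "enp1s0",
}
cr_spec = {
    "provisioningDHCPRange": "fd00:1101::a,fd00:1101::ffff:ffff:ffff:fffe",
    "provisioningIP": "fd00:1101::3",
    "provisioningNetwork": "Managed",
    "provisioningNetworkCIDR": "fd00:1101::/64",
}

print(dropped_fields(install_config, cr_spec))
```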

Comment 5 Steven Hardy 2021-07-22 10:23:11 UTC
Ah thanks Derek - I missed those :)

Perhaps this is a regression caused by https://github.com/openshift/installer/pull/5015 ?

Comment 6 Steven Hardy 2021-07-22 10:34:53 UTC
So yes it seems we unconditionally removed the provisioningNetworkInterface from the template:

https://github.com/openshift/installer/pull/5015/files#diff-045c11e083110c266ab5b7fda7a845114ba29fd476f864802d2ac640b49422b2

That probably needs to either be added conditionally, or just always included (and we'll pass an empty string when nothing is specified in the install-config, which AFAICS should work with the logic added in https://github.com/metal3-io/ironic-image/pull/272)

Comment 7 Derek Higgins 2021-07-22 11:09:48 UTC
*** Bug 1984860 has been marked as a duplicate of this bug. ***

Comment 9 Lubov 2021-07-26 10:54:17 UTC
Verified on 4.9.0-0.nightly-2021-07-25-125326: deployed a few times, and the problem was not reproduced.

Comment 10 Angus Salkeld 2021-07-26 21:50:06 UTC
I don't think this can be on QA without cbo pull/177

Comment 11 Derek Higgins 2021-08-04 08:36:40 UTC
(In reply to Angus Salkeld from comment #10)
> I don't think this can be on QA without cbo pull/177

pull/177 has been closed with a message about being replaced by cbo pull/182
installer pull/5100 also merged

so we should be ok to move this back to at least MODIFIED

installer pull/5100 alone, I think, would have been enough to VERIFY this bz,
as it fixed the regression in CI (which the bug was opened for).

Do you know if there is another bug covering the new support for passing in MACs?
If so, we can set this back to VERIFIED

Comment 12 Angus Salkeld 2021-08-17 00:20:23 UTC
(In reply to Derek Higgins from comment #11)
> (In reply to Angus Salkeld from comment #10)
> > I don't think this can be on QA without cbo pull/177
> 
> pull/177 has been closed with a message about being replaced by cbo pull/182
> installer pull/5100 also merged
> 
> so we should be ok to move this back to at least MODIFIED
> 
> installer pull/5100 alone, I think, would have been enough to VERIFY this bz,
> as it fixed the regression in CI (which the bug was opened for).
> 
> Do you know if there is another bug covering the new support for passing in
> MACs?
> If so, we can set this back to VERIFIED

I think we are all good to test now.

Comment 14 Lubov 2021-08-17 09:25:59 UTC
Verified on 4.9.0-0.nightly-2021-08-16-154237

deployment passed

$ oc describe provisioning provisioning-configuration
.....
  Provisioning Interface:        enp0s3
.....

Comment 17 errata-xmlrpc 2021-10-18 17:40:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

