Bug 2081734 - metal3-dnsmasq: workers are not provisioned during the cluster installation when BootMacAddress is not provided lower-case
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Bare Metal Hardware Provisioning
Version: 4.11
Hardware: All
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.12.0
Assignee: Tudor Domnescu
QA Contact: wang lin
URL:
Whiteboard:
Depends On:
Blocks: 2110407
Reported: 2022-05-04 13:49 UTC by aleskandro
Modified: 2023-01-17 19:48 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2110407 (view as bug list)
Environment:
Last Closed: 2023-01-17 19:48:18 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2022:7399 | 0 | None | None | None | 2023-01-17 19:48:38 UTC

Description aleskandro 2022-05-04 13:49:25 UTC
Description of problem:

When provisioning an IPI cluster on bare metal with a provisioning network, the workers are powered on via IPMI once the bootstrap phase has completed, and the DHCP and PXE services should then be handled by the metal3 operator scheduled on one of the masters.

If the MAC address of the provisioning interface (bootMACAddress) is not given in lower case in the install-config.yaml file, the DHCP service is unavailable.

Moreover, the metal3-dnsmasq container is reported as "ready" even though it is not.


Version-Release number of selected component (if applicable):

Tested on 4.11.0-0.nightly-arm64-2022-05-04-091042 with OVN Network

How reproducible: IPI on Bare Metal with Managed Provisioning Network

Steps to Reproduce:
1. Instantiate an IPI on Bare Metal with Managed Provisioning Network and set the bootMacAddress to NOT be lower-case
2. Wait for the bootstrap to conclude
3. The workers won't boot
4. The installation fails

Actual results:

The workers won't get installed as they cannot boot by PXE, and the installation fails

oc get co

NAME                                       VERSION                                    AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
baremetal                                  4.11.0-0.nightly-arm64-2022-05-04-091042   True        False         False      59m    

oc project openshift-machine-api
oc get pods

metal3-fffcfb6bd-255n9                        7/7     Running   0          60m

oc logs metal3-fffcfb6bd-255n9 -c metal3-dnsmasq

+++ get_provisioning_interface
+++ '[' -n '' ']'
+++ local interface=provisioning
+++ for mac in ${PROVISIONING_MACS//,/ }
+++ ip -br link show up
+++ grep -q 00:1b:21:E4:37:EC
+++ for mac in ${PROVISIONING_MACS//,/ }
+++ ip -br link show up
+++ grep -q 00:1B:21:E4:3A:B1
+++ for mac in ${PROVISIONING_MACS//,/ }
+++ ip -br link show up
+++ grep -q A0:36:9F:30:03:EE
+++ echo provisioning
++ export PROVISIONING_INTERFACE=provisioning
++ PROVISIONING_INTERFACE=provisioning
++ export LISTEN_ALL_INTERFACES=true
++ LISTEN_ALL_INTERFACES=true
++ export IRONIC_PRIVATE_PORT=6388
++ IRONIC_PRIVATE_PORT=6388
++ export IRONIC_INSPECTOR_PRIVATE_PORT=5049
++ IRONIC_INSPECTOR_PRIVATE_PORT=5049
+ export HTTP_PORT=6180
+ HTTP_PORT=6180
+ export DNSMASQ_EXCEPT_INTERFACE=lo
+ DNSMASQ_EXCEPT_INTERFACE=lo
+ wait_for_interface_or_ip
+ '[' '!' -z '' ']'
+ '[' '!' -z '' ']'
+ echo 'Waiting for provisioning interface to be configured'
Waiting for provisioning interface to be configured
++ ip -br add show scope global up dev provisioning
++ awk '{print $3}'
++ sed -e 's%/.*%%'
++ head -n 1
Device "provisioning" does not exist.
+ export IRONIC_IP=
... Repeated forever ...

ip -br link show up
lo               UNKNOWN        00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP> 
enP1p4s0         UP             00:1b:21:e4:3a:b1 <BROADCAST,MULTICAST,UP,LOWER_UP> 
enP4p3s0u2u3c2   UNKNOWN        46:6e:f9:36:e3:12 <BROADCAST,MULTICAST,UP,LOWER_UP> 
enP2p2s0f0       UP             28:c1:3c:8a:a2:8f <BROADCAST,MULTICAST,UP,LOWER_UP> 
enP2p2s0f1       UP             28:c1:3c:8a:a2:90 <BROADCAST,MULTICAST,UP,LOWER_UP> 

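The provisioning interface is enP1p4s0, which has the IP set to 172.22.0.3. Its MAC address, 00:1b:21:e4:3a:b1, differs from the PROVISIONING_MACS entry 00:1B:21:E4:3A:B1 only in case, so the case-sensitive grep in the log above does not match it.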

Expected results:

0. The workers are provisioned
1. The interface lookup by MAC address should be case-insensitive
2. Neither the cluster-baremetal-operator nor the metal3-dnsmasq container in the metal3 pod should be reported as ready/available in this state.
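
Expected result 2 could be approached with a readiness check that fails while the provisioning interface is missing. The following is only a sketch under assumptions: the function name provisioning_interface_ready and its optional sysfs-root argument are hypothetical, and nothing here reflects how the probes of the metal3 pod are actually defined.

```shell
# Hypothetical readiness helper (not part of ironic-image): succeeds only
# if the named network interface exists according to sysfs.
# $1: interface name
# $2: sysfs directory to inspect (assumed default: /sys/class/net)
provisioning_interface_ready() {
  local iface="${1:?interface name required}"
  local sysfs_root="${2:-/sys/class/net}"
  [ -e "${sysfs_root}/${iface}" ]
}
```

In the failing case logged above, the device "provisioning" does not exist, so a check of this kind called with that name would return non-zero, and a probe built on it would keep the container out of the ready state.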

Additional info:

This seems to happen because /bin/ironic-common.sh in the metal3-dnsmasq container looks up the provisioning interface by MAC address using a case-sensitive match.

(within the dnsmasq container)
cat /bin/ironic-common.sh 

#!/usr/bin/bash

set -euxo pipefail

function get_provisioning_interface() {
  if [ -n "${PROVISIONING_INTERFACE:-}" ]; then
    # don't override the PROVISIONING_INTERFACE if one is provided
    echo ${PROVISIONING_INTERFACE}
    return
  fi

  local interface="provisioning"
  for mac in ${PROVISIONING_MACS//,/ } ; do
    if ip -br link show up | grep -q "$mac"; then
      interface=$(ip -br link show up | grep "$mac" | cut -f 1 -d ' ')
      break
    fi
  done
  echo $interface
}
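
A case-insensitive variant of the lookup could look like the sketch below. It is an illustration, not the merged patch: the use of grep -i is consistent with the grep -qi calls visible in the later verification log, while the list_up_links wrapper is a hypothetical addition that only exists so the link listing can be stubbed outside the container. The MAC list is split with tr instead of the bash-only substitution used above, purely to keep the sketch portable.

```shell
# Hypothetical wrapper (not in ironic-common.sh): runs the same command the
# original script uses, but can be redefined when testing without `ip`.
list_up_links() {
  ip -br link show up
}

get_provisioning_interface() {
  if [ -n "${PROVISIONING_INTERFACE:-}" ]; then
    # don't override the PROVISIONING_INTERFACE if one is provided
    echo "${PROVISIONING_INTERFACE}"
    return
  fi

  local interface="provisioning"
  local links mac
  links=$(list_up_links)
  for mac in $(printf '%s' "${PROVISIONING_MACS:-}" | tr ',' ' '); do
    # -i makes the match case-insensitive, so 00:1B:... matches 00:1b:...
    if printf '%s\n' "$links" | grep -qi "$mac"; then
      interface=$(printf '%s\n' "$links" | grep -i "$mac" | cut -f 1 -d ' ')
      break
    fi
  done
  echo "$interface"
}
```

With the link listing shown earlier and PROVISIONING_MACS containing 00:1B:21:E4:3A:B1, this version resolves the interface to enP1p4s0 instead of falling back to "provisioning".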

Comment 2 Tomas Sedovic 2022-07-12 12:11:48 UTC
Marking as not a blocker given the existence of a workaround: setting the bootMacAddress value to all lower case.
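
The workaround amounts to lower-casing each MAC address before it goes into install-config.yaml. A minimal sketch, assuming a hypothetical helper named lowercase_mac (it is not part of any OpenShift tooling):

```shell
# Hypothetical helper: print the given MAC address in lower case.
lowercase_mac() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

# Example with one of the addresses from the log in the description.
lowercase_mac 'A0:36:9F:30:03:EE'   # prints a0:36:9f:30:03:ee
```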

Comment 3 Tudor Domnescu 2022-07-15 09:16:39 UTC
A fix was merged upstream: https://github.com/metal3-io/ironic-image/pull/374

Comment 4 Riccardo Pittau 2022-07-20 15:14:29 UTC
https://github.com/openshift/ironic-image/pull/283 created for downstream sync

Comment 5 wang lin 2022-08-10 12:41:23 UTC
Verified on payload registry.ci.openshift.org/ocp-arm64/release-arm64:4.12.0-0.nightly-arm64-2022-08-09-214103

steps:
1. Set bootMACAddress values with mixed case, for example:
    - name: worker-00
      role: worker
      bmc:
        address: ipmi://openshift-qe-039.mgmt.arm.eng.rdu2.redhat.com/
        disableCertificateVerification: true
        username: *** HIDDEN ***
        password: *** HIDDEN ***
      bootMACAddress: a0:36:9F:30:04:B4
      rootDeviceHints:
        deviceName: "/dev/nvme0n1"
      networkConfig:
        interfaces:
        ......
    - name: worker-01
      role: worker
      bmc:
        address: ipmi://openshift-qe-040.mgmt.arm.eng.rdu2.redhat.com/
        disableCertificateVerification: true
        username: *** HIDDEN ***
        password: *** HIDDEN ***
      bootMACAddress: A0:36:9F:30:04:C4
      rootDeviceHints:
        deviceName: "/dev/nvme0n1"
      networkConfig:
        interfaces:

2. Launch an IPI on Bare Metal with Managed Provisioning Network

3. Wait and check that the worker nodes are created
oc get nodes
NAME                                                          STATUS   ROLES                  AGE    VERSION
worker-00.lwanbug2081734.qeclusters.arm.eng.rdu2.redhat.com   Ready    worker                 36m    v1.24.0+a9d6306
worker-01.lwanbug2081734.qeclusters.arm.eng.rdu2.redhat.com   Ready    worker                 36m    v1.24.0+a9d6306

4. Check the logs of the metal3-dnsmasq container; it now matches an upper-case bootMACAddress
oc logs metal3-6bb77d9df6-xgjjc -c metal3-dnsmasq
+++ get_provisioning_interface
+++ '[' -n '' ']'
+++ local interface=provisioning
+++ for mac in ${PROVISIONING_MACS//,/ }
+++ ip -br link show up
+++ grep -qi 00:1B:21:E4:63:30
+++ for mac in ${PROVISIONING_MACS//,/ }
+++ ip -br link show up
+++ grep -qi 00:1B:21:E4:37:A7
++++ ip -br link show up
++++ grep -i 00:1B:21:E4:37:A7
++++ cut -f 1 -d ' '
+++ interface=enP1p4s0
+++ break
+++ echo enP1p4s0
++ export PROVISIONING_INTERFACE=enP1p4s0
++ PROVISIONING_INTERFACE=enP1p4s0
++ export LISTEN_ALL_INTERFACES=true
++ LISTEN_ALL_INTERFACES=true
++ export IRONIC_PRIVATE_PORT=6388
++ IRONIC_PRIVATE_PORT=6388
++ export IRONIC_INSPECTOR_PRIVATE_PORT=5049
++ IRONIC_INSPECTOR_PRIVATE_PORT=5049
+ export HTTP_PORT=6180

Comment 8 errata-xmlrpc 2023-01-17 19:48:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399

