Description of problem:

Version-Release number of selected component (if applicable):
OCP 4.9
quay.io/openshift-release-dev/ocp-release:4.9.0-fc.0-x86_64

How reproducible:
1/1

Steps to Reproduce:
1. First build the 4.9 with Disabled provisioning

2. Add the provisioning parameters without specifying the provisioningInterface in the provisioning CR.

    [kni@provisionhost-0-0 ~]$ oc get provisioning -o yaml >> to_managed.yaml
    [kni@provisionhost-0-0 ~]$ vi to_managed.yaml

   Change from:

    provisioningNetwork: Disabled

   To:

   Managed.yaml for IPv4:

    provisioningDHCPRange: 172.22.0.10,172.22.0.254
    provisioningIP: 172.22.0.3
    provisioningNetwork: Managed
    provisioningNetworkCIDR: 172.22.0.0/24

   Managed.yaml for IPv6:

    provisioningDHCPRange: fd00:1101:0:1::a,fd00:1101:0:1:ffff:ffff:ffff:fffe
    provisioningIP: fd00:1101:0:1::3
    provisioningNetwork: Managed
    provisioningNetworkCIDR: fd00:1101:0:1::/64

    [kni@provisionhost-0-0 ~]$ oc apply -f to_managed.yaml

3. Verify the provisioning yaml

    [kni@provisionhost-0-0 ~]$ oc get provisioning -o yaml

4. This should restart the Metal3 pod

    [kni@provisionhost-0-0 ~]$ oc get pods -w -n openshift-machine-api

Actual results:
The metal3 pod was restarted, but didn't return to the Running state

    metal3-5778b485d8-vvqzb   0/10   Init:CrashLoopBackOff   524 (3m4s ago)   44h

Expected results:
The metal3 pod was restarted and running

Additional info:
See must gather in the next comment
The broken pod looks like this:

    initContainers:
    - command:
      - /set-static-ip
      env:
      - name: PROVISIONING_IP
        value: fd00:1101:0:1::3/64
      - name: PROVISIONING_INTERFACE
      - name: PROVISIONING_MACS
        value: 52:54:00:a2:33:ff,52:54:00:3e:fa:8e,52:54:00:d2:f6:16
      image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:04c1c932eaa96137b5ca575e232a8181ae46aaa5a146d59cca7e21cc114fe398

Since it has hostNetworking, I logged into a master and ran the script directly on the host:

    ./test.sh
    + export PROVISIONING_IP=fd00:1101:0:1::3/64
    + PROVISIONING_IP=fd00:1101:0:1::3/64
    + export PROVISIONING_INTERFACE=
    + PROVISIONING_INTERFACE=
    + export PROVISIONING_MACS=52:54:00:a2:33:ff,52:54:00:3e:fa:8e,52:54:00:d2:f6:16
    + PROVISIONING_MACS=52:54:00:a2:33:ff,52:54:00:3e:fa:8e,52:54:00:d2:f6:16
    + '[' -z fd00:1101:0:1::3/64 ']'
    + '[' -z '' ']'
    + '[' -n 52:54:00:a2:33:ff,52:54:00:3e:fa:8e,52:54:00:d2:f6:16 ']'
    + for mac in ${PROVISIONING_MACS//,/ }
    + ip -br link show up
    + grep -q 52:54:00:a2:33:ff
    + for mac in ${PROVISIONING_MACS//,/ }
    + ip -br link show up
    + grep -q 52:54:00:3e:fa:8e
    + for mac in ${PROVISIONING_MACS//,/ }
    + ip -br link show up
    + grep -q 52:54:00:d2:f6:16
    + '[' -n '' ']'
    + echo 'ERROR: Could not find suitable interface for "fd00:1101:0:1::3/64"'
    ERROR: Could not find suitable interface for "fd00:1101:0:1::3/64"
    + exit 1

Basically, none of the real MAC addresses match:

    ip -br link show up
    lo           UNKNOWN  00:00:00:00:00:00  <LOOPBACK,UP,LOWER_UP>
    eth0         UP       52:54:00:56:07:da  <BROADCAST,MULTICAST,UP,LOWER_UP>
    eth1         UP       52:54:00:ca:42:fc  <BROADCAST,MULTICAST,UP,LOWER_UP>
    eth2         UP       52:54:00:06:00:58  <BROADCAST,MULTICAST,UP,LOWER_UP>
    virbr0       DOWN     52:54:00:4b:36:2d  <NO-CARRIER,BROADCAST,MULTICAST,UP>
    baremetal-0  UP       52:54:00:ca:42:fc  <BROADCAST,MULTICAST,UP,LOWER_UP>

What cbo thinks they should be:

    + export PROVISIONING_MACS=52:54:00:a2:33:ff,52:54:00:3e:fa:8e,52:54:00:d2:f6:16

Note: cbo only reads these once off here:
https://github.com/openshift/cluster-baremetal-operator/blob/master/controllers/provisioning_controller.go#L217-L222

Does bmo see this change and reboot the machines with different macs?
Ignore the comment above, I was on the wrong host :facepalm:

The issue seems to be that there are 2 interfaces that match the same MAC:

    ip -br link show up | grep 52:54:00:3e:fa:8e |cut -f 1 -d ' '
    enp0s4
    br-ex

so the script finds the wrong one.

    ip link show up
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
        link/ether 52:54:00:06:0e:08 brd ff:ff:ff:ff:ff:ff
    3: enp0s4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master ovs-system state UP mode DEFAULT group default qlen 1000
        link/ether 52:54:00:3e:fa:8e brd ff:ff:ff:ff:ff:ff
    5: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
        link/ether 52:54:00:3e:fa:8e brd ff:ff:ff:ff:ff:ff
    etc..

Changing:

    ip -br link show up | grep 52:54:00:3e:fa:8e |cut -f 1 -d ' '

to:

    ip -br link show up | grep 52:54:00:3e:fa:8e |grep -v UNKNOWN |cut -f 1 -d ' '

could work.
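A minimal sketch of the filter suggested above, assuming the `ip -br link show up` column layout seen in this thread (name, state, MAC, flags). The function name `iface_for_mac` and the `sample_links` variable are hypothetical and not part of the actual /set-static-ip script. The helper reads the link listing on stdin so it can be tried against captured output without touching a real host.

```shell
#!/bin/sh
# Hypothetical helper, not the shipped /set-static-ip logic.
# Reads `ip -br link show up` style lines on stdin and prints the names of
# interfaces whose MAC equals $1, skipping links whose state is UNKNOWN
# (which is how br-ex shows up in the output in this comment).
iface_for_mac() {
    awk -v mac="$1" '$3 == mac && $2 != "UNKNOWN" { print $1 }'
}

# Sample listing built from the `ip link show up` output in this comment:
# enp0s4 and br-ex share a MAC, but only br-ex is in state UNKNOWN.
sample_links='lo      UNKNOWN  00:00:00:00:00:00  <LOOPBACK,UP,LOWER_UP>
enp0s3  UP       52:54:00:06:0e:08  <BROADCAST,MULTICAST,UP,LOWER_UP>
enp0s4  UP       52:54:00:3e:fa:8e  <BROADCAST,MULTICAST,UP,LOWER_UP>
br-ex   UNKNOWN  52:54:00:3e:fa:8e  <BROADCAST,MULTICAST,UP,LOWER_UP>'

# Prints: enp0s4
printf '%s\n' "$sample_links" | iface_for_mac 52:54:00:3e:fa:8e
```

On a live host the same function could be fed with `ip -br link show up | iface_for_mac <mac>`. Filtering on the UNKNOWN state is only a heuristic: other device types can report UNKNOWN as well, so this may hide a legitimate interface on some setups.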
@amalykhi please note that the merged PR adds a new field to the Provisioning CRD:

    + // ProvisioningMacAddresses is a list of mac addresses of network interfaces
    + // on a baremetal server to the provisioning network.
    + // Use this instead of ProvisioningInterface to allow interfaces of different
    + // names. If not provided it will be populated by the BMH.Spec.BootMacAddress
    + // of each master.
    + ProvisioningMacAddresses []string `json:"provisioningMacAddresses,omitempty"`

So when you are in the Disabled state, go onto each of the master machines and get the MAC of the available NIC. Then, when you update the CR to Managed, also add these MACs.

We decided that it was less error-prone for the user to do this than to have a script attempt to choose an unused NIC.
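As a rough illustration of "also add these macs", the sketch below turns a comma-separated MAC list into a JSON merge patch. The helper name `build_provisioning_patch` is hypothetical, and placing both fields under `spec` is an assumption based on the `json:"provisioningMacAddresses"` tag above and the provisioningNetwork field in the reproduction steps. The MACs in the call are just the values from the earlier PROVISIONING_MACS trace; in practice they would be the MACs collected from each master.

```shell
#!/bin/sh
# Hypothetical helper: build a JSON merge patch that switches the
# Provisioning CR to Managed and sets provisioningMacAddresses.
# Assumption: both fields live under .spec of the Provisioning CR.
# The body runs in a subshell so the IFS change and variables stay local.
build_provisioning_patch() (
    list=""
    IFS=,
    # $1 is intentionally unquoted so it splits on commas.
    for mac in $1; do
        list="${list}${list:+,}\"${mac}\""
    done
    printf '{"spec":{"provisioningNetwork":"Managed","provisioningMacAddresses":[%s]}}\n' "$list"
)

build_provisioning_patch "52:54:00:a2:33:ff,52:54:00:3e:fa:8e,52:54:00:d2:f6:16"
```

The resulting string could be passed to something like `oc patch provisioning <name> --type merge -p "$(build_provisioning_patch ...)"`, with `<name>` left as a placeholder for the CR in your cluster. The remaining Managed settings from the reproduction steps (provisioningIP, provisioningDHCPRange, provisioningNetworkCIDR) still need to be supplied; they are omitted here to keep the sketch short.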
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056