Bug 2000081
Summary: | [IPI baremetal] The metal3 pod failed to restart when switching from Disabled to Managed provisioning without specifying provisioningInterface parameter | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Aleksandra Malykhin <amalykhi> | |
Component: | Bare Metal Hardware Provisioning | Assignee: | Steven Hardy <shardy> | |
Bare Metal Hardware Provisioning sub component: | cluster-baremetal-operator | QA Contact: | Aleksandra Malykhin <amalykhi> | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | medium | |||
Priority: | medium | CC: | aos-bugs, hpokorny, shardy | |
Version: | 4.9 | Keywords: | Triaged | |
Target Milestone: | --- | |||
Target Release: | 4.10.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: |
Cause:
When modifying the provisioning.metal3.io provisioning-configuration resource to move provisioningNetwork from Disabled to Managed, the metal3 pod did not restart as expected unless provisioningInterface was also specified.
Consequence:
Users who wanted to specify an interface by MAC address were unable to move provisioningNetwork from Disabled to Managed, because in this case the provisioningInterface name is not always consistent between control plane hosts.
Fix:
Added a provisioningMacAddresses field to the provisioning.metal3.io CRD, so that provisioning interfaces for control plane hosts can be specified by MAC address, not only by name.
Result:
It is now possible to specify provisioning interface devices by MAC address, and in this case it is now possible to move the provisioningNetwork from Disabled to Managed state without specifying the provisioningInterface field.
|
Story Points: | --- | |
Clone Of: | ||||
: | 2012684 | Environment: | |
Last Closed: | 2022-03-10 16:06:37 UTC | Type: | Bug | |
Bug Blocks: | 2012684 |
Description
Aleksandra Malykhin
2021-09-01 10:49:37 UTC
The broken pod looks like this:

    initContainers:
    - command:
      - /set-static-ip
      env:
      - name: PROVISIONING_IP
        value: fd00:1101:0:1::3/64
      - name: PROVISIONING_INTERFACE
      - name: PROVISIONING_MACS
        value: 52:54:00:a2:33:ff,52:54:00:3e:fa:8e,52:54:00:d2:f6:16
      image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:04c1c932eaa96137b5ca575e232a8181ae46aaa5a146d59cca7e21cc114fe398

Since it has hostNetworking, I logged into a master and ran the script directly on the host:

    ./test.sh
    + export PROVISIONING_IP=fd00:1101:0:1::3/64
    + PROVISIONING_IP=fd00:1101:0:1::3/64
    + export PROVISIONING_INTERFACE=
    + PROVISIONING_INTERFACE=
    + export PROVISIONING_MACS=52:54:00:a2:33:ff,52:54:00:3e:fa:8e,52:54:00:d2:f6:16
    + PROVISIONING_MACS=52:54:00:a2:33:ff,52:54:00:3e:fa:8e,52:54:00:d2:f6:16
    + '[' -z fd00:1101:0:1::3/64 ']'
    + '[' -z '' ']'
    + '[' -n 52:54:00:a2:33:ff,52:54:00:3e:fa:8e,52:54:00:d2:f6:16 ']'
    + for mac in ${PROVISIONING_MACS//,/ }
    + ip -br link show up
    + grep -q 52:54:00:a2:33:ff
    + for mac in ${PROVISIONING_MACS//,/ }
    + ip -br link show up
    + grep -q 52:54:00:3e:fa:8e
    + for mac in ${PROVISIONING_MACS//,/ }
    + ip -br link show up
    + grep -q 52:54:00:d2:f6:16
    + '[' -n '' ']'
    + echo 'ERROR: Could not find suitable interface for "fd00:1101:0:1::3/64"'
    ERROR: Could not find suitable interface for "fd00:1101:0:1::3/64"
    + exit 1

Basically, none of the real MAC addresses match:

    ip -br link show up
    lo           UNKNOWN  00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>
    eth0         UP       52:54:00:56:07:da <BROADCAST,MULTICAST,UP,LOWER_UP>
    eth1         UP       52:54:00:ca:42:fc <BROADCAST,MULTICAST,UP,LOWER_UP>
    eth2         UP       52:54:00:06:00:58 <BROADCAST,MULTICAST,UP,LOWER_UP>
    virbr0       DOWN     52:54:00:4b:36:2d <NO-CARRIER,BROADCAST,MULTICAST,UP>
    baremetal-0  UP       52:54:00:ca:42:fc <BROADCAST,MULTICAST,UP,LOWER_UP>

What cbo thinks they should be:

    + export PROVISIONING_MACS=52:54:00:a2:33:ff,52:54:00:3e:fa:8e,52:54:00:d2:f6:16

Note: cbo only reads these once, here:
https://github.com/openshift/cluster-baremetal-operator/blob/master/controllers/provisioning_controller.go#L217-L222

Does bmo see this change and reboot the machines with different MACs?

Ignore the comment above, I was on the wrong host :facepalm:

The issue seems to be that there are 2 interfaces that match the same MAC:

    ip -br link show up | grep 52:54:00:3e:fa:8e |cut -f 1 -d ' '
    enp0s4
    br-ex

so it finds the wrong thing.

    ip link show up
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
        link/ether 52:54:00:06:0e:08 brd ff:ff:ff:ff:ff:ff
    3: enp0s4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master ovs-system state UP mode DEFAULT group default qlen 1000
        link/ether 52:54:00:3e:fa:8e brd ff:ff:ff:ff:ff:ff
    5: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
        link/ether 52:54:00:3e:fa:8e brd ff:ff:ff:ff:ff:ff
    etc..

Changing:

    ip -br link show up | grep 52:54:00:3e:fa:8e |cut -f 1 -d ' '

to:

    ip -br link show up | grep 52:54:00:3e:fa:8e |grep -v UNKNOWN |cut -f 1 -d ' '

could work.

@amalykhi please note that the merged PR adds a new field to the Provisioning CRD:

    + // ProvisioningMacAddresses is a list of mac addresses of network interfaces
    + // on a baremetal server to the provisioning network.
    + // Use this instead of ProvisioningInterface to allow interfaces of different
    + // names. If not provided it will be populated by the BMH.Spec.BootMacAddress
    + // of each master.
    + ProvisioningMacAddresses []string `json:"provisioningMacAddresses,omitempty"`

So when you are in the Disabled state, go onto each of the master machines and get the MAC of the available NIC. Then, when you update the CR to Managed, also add these MACs.
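The update described above could be sketched roughly as follows. This is only an illustration: the helper name build_provisioning_patch and the MAC values are hypothetical placeholders, and any other fields your cluster needs for the Managed state are not shown. The field names provisioningNetwork and provisioningMacAddresses and the resource name provisioning-configuration are taken from this bug report.

```shell
# Hypothetical helper: build a JSON merge patch that switches
# provisioningNetwork to Managed and lists one MAC per master.
build_provisioning_patch() {
  local macs_json="" mac
  for mac in "$@"; do
    # Append each MAC as a quoted JSON string, comma-separated.
    macs_json="${macs_json}${macs_json:+,}\"${mac}\""
  done
  printf '{"spec":{"provisioningNetwork":"Managed","provisioningMacAddresses":[%s]}}\n' "$macs_json"
}

# Placeholder MACs; replace with the MAC of the free NIC on each master.
patch="$(build_provisioning_patch 52:54:00:00:00:01 52:54:00:00:00:02 52:54:00:00:00:03)"
echo "$patch"

# Applying the patch needs cluster access, so it is left commented out:
#   oc patch provisioning provisioning-configuration --type merge -p "$patch"
```

Building the patch separately from applying it makes it easy to review the MAC list before the CR is changed.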
We decided that it was less error-prone for the user to do this than to have a script attempt to choose an unused NIC.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056
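For reference, the duplicate-MAC ambiguity discussed in the comments can be reproduced without a cluster. The sketch below is not the merged fix. It only replays the two pipelines quoted in the thread against sample data. The sample lines are an assumption: they are rewritten into `ip -br link` brief format from the `ip link show up` output above, and the function names are hypothetical.

```shell
# Sample data in `ip -br link show up` format (name, state, MAC, flags),
# reconstructed by hand from the longer output quoted in the thread.
SAMPLE_LINKS='lo UNKNOWN 00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>
enp0s3 UP 52:54:00:06:0e:08 <BROADCAST,MULTICAST,UP,LOWER_UP>
enp0s4 UP 52:54:00:3e:fa:8e <BROADCAST,MULTICAST,UP,LOWER_UP>
br-ex UNKNOWN 52:54:00:3e:fa:8e <BROADCAST,MULTICAST,UP,LOWER_UP>'

# Original pipeline: every interface carrying the MAC is returned.
match_all() {
  grep -F -- "$1" | cut -d ' ' -f 1
}

# Variant suggested in the thread: drop interfaces in state UNKNOWN.
match_skip_unknown() {
  grep -F -- "$1" | grep -v UNKNOWN | cut -d ' ' -f 1
}

mac=52:54:00:3e:fa:8e
echo "all matches:"
printf '%s\n' "$SAMPLE_LINKS" | match_all "$mac"
echo "skipping UNKNOWN:"
printf '%s\n' "$SAMPLE_LINKS" | match_skip_unknown "$mac"
```

With this sample, the first pipeline returns both enp0s4 and br-ex, while the filtered one returns only enp0s4. Filtering on the UNKNOWN state is a heuristic and could also exclude a physical NIC that reports that state, which is consistent with the decision above to have the user supply the MAC addresses explicitly.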