Description of problem:
Network configurations added to a BareMetalHost on day 1 via spec.preprovisioningNetworkDataName do not survive scaling the MachineSet replicas down and back up again.
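For context, a day-1 worker BMH wired to a network-data Secret looks roughly like the following. This is a sketch only: the resource name worker-0-2 is taken from the verification comment below, but the MAC, BMC address, and Secret names are illustrative, not from this report.

```yaml
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: worker-0-2
  namespace: openshift-machine-api
spec:
  online: true
  bootMACAddress: 52:54:00:00:00:11   # illustrative MAC
  bmc:
    address: redfish://192.168.123.1:8000/redfish/v1/Systems/1   # illustrative BMC endpoint
    credentialsName: worker-0-2-bmc-secret                       # illustrative Secret name
  # The Secret referenced here carries the custom network configuration
  # that this bug reports being lost across scale down/up:
  preprovisioningNetworkDataName: worker-0-2-network-config-secret
```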
Version-Release number of selected component (if applicable):
The best way to reproduce is a 3-node cluster with schedulable masters plus one added worker; then scale the worker MachineSet replicas down and back up again.
Steps to Reproduce:
1. Add a new worker BMH using spec.preprovisioningNetworkDataName
2. Scale up the worker machineset replicas so the new BMH gets provisioned
3. Scale down the worker machineset replicas so the new BMH gets deprovisioned
4. Scale up the worker machineset replicas so the new BMH gets provisioned (again)
Actual results:
The custom network configuration is lost and the BMH can no longer be reached.

Expected results:
The network configuration should be preserved.
Just bumping this one a little to hopefully get eyes on it. Let us know if there is any information that can be procured from the customer or elsewhere to get it moving.
Confirmed this happens on 4.10, it doesn't appear to be the case for 4.11, I'm trying to track down the difference.
(In reply to Derek Higgins from comment #2)
> Confirmed this happens on 4.10, it doesn't appear to be the case for 4.11,
> I'm trying to track down the difference.
The BareMetalHostSpec in cluster-api-provider-baremetal doesn't contain
> PreprovisioningNetworkDataName string `json:"preprovisioningNetworkDataName,omitempty"`
It contains an outdated version of the vendored baremetal-operator; as a result,
preprovisioningNetworkDataName gets lost when the BMH is used. Adding this field fixes the
problem, but before we update the vendored BMO we first need to make sure we fix
bz#2097695, as fixing this bug triggers that one.
Any update? It seems the issue has been identified and the blocker BZ is also solved.
Also, is there a fix or workaround that can be applied without upgrading to the z-stream release where this fix will be shipped?
Any update on this?
@rauferna the PR has now merged; you should be able to try it on the next nightly.
Workaround for this issue:
* Scale down MachineSet to only existing nodes
* Create new BMH with preprovisioningNetworkDataName (or at least add that field back in, wait a while, then reboot the host)
* *Wait* for inspection to complete
* Scale up MachineSet
[kni@provisionhost-0-0 ~]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-08-26-140250   True        False         154m    Cluster version is 4.10.0-0.nightly-2022-08-26-140250
Deployed a normal cluster with 3 masters / 2 workers provisioned.
Scaled up a worker using the following configuration file:
[kni@provisionhost-0-0 ~]$ cat worker-0-2.yaml
- destination: 0.0.0.0/0
- name: enp0s4
- ip: 192.168.123.11
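The file above appears to be truncated; judging by the surviving fragments (interface enp0s4, address 192.168.123.11, a default route), the network data was likely an nmstate-style document carried in a Secret, along these lines. This is a reconstruction sketch only: the Secret name and key, prefix length, and gateway are assumptions, not taken from the report.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: worker-0-2-network-config-secret   # hypothetical name
  namespace: openshift-machine-api
type: Opaque
stringData:
  nmstate: |                               # assumed Secret key
    interfaces:
      - name: enp0s4
        type: ethernet
        state: up
        ipv4:
          enabled: true
          dhcp: false
          address:
            - ip: 192.168.123.11
              prefix-length: 24            # assumed
    routes:
      config:
        - destination: 0.0.0.0/0
          next-hop-interface: enp0s4
          next-hop-address: 192.168.123.1  # assumed gateway
```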
After multiple scale-ups and scale-downs, worker-0-2 was available and reachable over SSH at the assigned IP address.
Tested in a normal DHCP environment and in a fully DHCP-free environment where all initial nodes were deployed with static IPs.
In both cases there were no issues.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.10.30 bug fix and security update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
The needinfo request[s] on this closed bug have been removed, as they have been unresolved for 365 days.