Description of problem:
Network configurations added to a BareMetalHost on day 1 via spec.preprovisioningNetworkDataName do not survive scaling the MachineSet replicas down and back up again.
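For context, a day-1 worker BMH wired to a network-data Secret looks roughly like the following. This is a sketch only: the resource name worker-0-2 is taken from the verification comment below, but the MAC, BMC address, and Secret names are illustrative, not from this report.

```yaml
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: worker-0-2
  namespace: openshift-machine-api
spec:
  online: true
  bootMACAddress: 52:54:00:00:00:11   # illustrative MAC
  bmc:
    address: redfish://192.168.123.1:8000/redfish/v1/Systems/1   # illustrative BMC endpoint
    credentialsName: worker-0-2-bmc-secret                       # illustrative Secret name
  # The Secret referenced here carries the custom network configuration
  # that this bug reports being lost across scale down/up:
  preprovisioningNetworkDataName: worker-0-2-network-config-secret
```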
Version-Release number of selected component (if applicable):
The best way to reproduce is a 3-node cluster with schedulable masters plus one added worker; then scale the worker MachineSet replicas down and back up again.
Steps to Reproduce:
1. Add a new worker BMH using spec.preprovisioningNetworkDataName
2. Scale up the worker machineset replicas so the new BMH gets provisioned
3. Scale down the worker machineset replicas so the new BMH gets deprovisioned
4. Scale up the worker machineset replicas so the new BMH gets provisioned (again)
Actual results:
The custom network configuration is lost and the BMH can no longer be reached.

Expected results:
The network configuration should be preserved.
Just bumping this one a little to hopefully get eyes on it. Let us know if there is any information that can be procured from the customer or elsewhere to get it moving.
Confirmed this happens on 4.10, it doesn't appear to be the case for 4.11, I'm trying to track down the difference.
(In reply to Derek Higgins from comment #2)
> Confirmed this happens on 4.10, it doesn't appear to be the case for 4.11,
> I'm trying to track down the difference.
The BareMetalHostSpec in cluster-api-provider-baremetal doesn't contain
> PreprovisioningNetworkDataName string `json:"preprovisioningNetworkDataName,omitempty"`
It contains an outdated version of the vendored baremetal-operator; as a result,
preprovisioningNetworkDataName gets lost when the BMH is used. Adding this field fixes the
problem, but before we update the vendored BMO we first need to make sure we fix
bz#2097695, as fixing this bug triggers that one.
Any update? It seems the issue has been identified and the blocker BZ is also solved.
Also, is there a fix or workaround that can be applied without upgrading to the z-stream release where this fix will be shipped?
Any update on this?
@rauferna the PR has now merged; you should be able to try it on the next nightly.
Workaround for this issue:
* Scale down MachineSet to only existing nodes
* Create new BMH with preprovisioningNetworkDataName (or at least add that field back in, wait a while, then reboot the host)
* *Wait* for inspection to complete
* Scale up MachineSet
[kni@provisionhost-0-0 ~]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-08-26-140250   True        False         154m    Cluster version is 4.10.0-0.nightly-2022-08-26-140250
Deployed a normal cluster with 3 masters / 2 workers provisioned.
Scaled up a worker using the following configuration file:
[kni@provisionhost-0-0 ~]$ cat worker-0-2.yaml
- destination: 0.0.0.0/0
- name: enp0s4
- ip: 192.168.123.11
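The file above appears to be truncated; judging by the surviving fragments (interface enp0s4, address 192.168.123.11, a default route), the network data was likely an nmstate-style document carried in a Secret, along these lines. This is a reconstruction sketch only: the Secret name and key, prefix length, and gateway are assumptions, not taken from the report.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: worker-0-2-network-config-secret   # hypothetical name
  namespace: openshift-machine-api
type: Opaque
stringData:
  nmstate: |                               # assumed Secret key
    interfaces:
      - name: enp0s4
        type: ethernet
        state: up
        ipv4:
          enabled: true
          dhcp: false
          address:
            - ip: 192.168.123.11
              prefix-length: 24            # assumed
    routes:
      config:
        - destination: 0.0.0.0/0
          next-hop-interface: enp0s4
          next-hop-address: 192.168.123.1  # assumed gateway
```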
After multiple scale-ups and scale-downs, worker-0-2 was available and reachable over SSH at the assigned IP address.
Tested in a normal DHCP environment and in a fully DHCP-free environment where all initial nodes were deployed with static IPs.
In both cases there were no issues.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.10.30 bug fix and security update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
The needinfo request[s] on this closed bug have been removed, as they have been unresolved for 365 days.