Bug 1909083

Summary: BMHs stuck in RegistrationError when BMCs are not reachable during the initial cluster installation
Product: OpenShift Container Platform
Reporter: kseremet
Component: Bare Metal Hardware Provisioning
Sub Component: baremetal-operator
Assignee: Steven Hardy <shardy>
QA Contact: Amit Ugol <augol>
Status: CLOSED NEXTRELEASE
Severity: unspecified
Priority: unspecified
CC: aos-bugs, bfournie
Version: 4.6
Last Closed: 2021-04-06 16:10:47 UTC
Type: Bug

Description kseremet 2020-12-18 10:13:25 UTC
Description of problem:

BMHs remain stuck in RegistrationError when the BMCs are not reachable during the initial cluster installation.

Version-Release number of selected component (if applicable):
Tested on 4.6.x

How reproducible:
Always

Steps to Reproduce:

1. Start installation of a new OCP 4.6 cluster using the baremetal IPI method

2. After the bootstrap machine has provisioned all the master nodes (the bootstrap/provisioner host can reach the BMCs), the bootstrap is torn down and BMO starts running on the new cluster

3. If the master nodes themselves cannot reach the BMCs (for example because of an incorrect VLAN configuration or a blocking firewall), the BMHs get stuck in RegistrationError when they are created, and they stay in that state even after the network access issue between the master nodes and the BMCs is fixed
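To confirm the network condition in step 3 before and after fixing it, a quick reachability probe can be run from a master node. The following is only a minimal sketch: the function name `bmc_reachable` is hypothetical, and the default port 443 assumes the BMC exposes an HTTPS (for example Redfish) endpoint. A BMC managed over IPMI uses UDP port 623, which this TCP check does not cover.

```python
import socket


def bmc_reachable(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Port 443 is an assumption (HTTPS-based BMC endpoint); adjust it to
    whatever the BMC address in the BMH spec actually uses.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unroutable addresses.
        return False
```

Calling `bmc_reachable("<bmc-address>")` from each master node should return `False` while the VLAN or firewall problem exists and `True` once it is fixed; the BMHs in this report stayed in RegistrationError even after that point.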

Actual results:

BMHs are stuck in RegistrationError; fixing the network access issue and restarting the BMO does not help

Expected results:

The registration process is retried once the nodes can reach the BMCs
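The expected behavior amounts to retrying a failed registration instead of treating the first failure as final. As an illustration only, it can be sketched as a retry loop with exponential backoff. This is not the baremetal-operator's actual code; `register_with_retry`, its parameters, and the `register` callable are hypothetical names introduced for this sketch.

```python
import time
from typing import Callable


def register_with_retry(
    register: Callable[[], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call register() until it succeeds or max_attempts is exhausted.

    register is a hypothetical callable that returns True when the host
    was registered (for example, once the BMC became reachable).
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        if register():
            return True
        if attempt < max_attempts:
            # Wait before the next attempt, doubling the delay up to max_delay.
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return False
```

With a retry of this kind, a host whose BMC is unreachable at creation time would be registered on a later attempt once the network problem is fixed, rather than staying in RegistrationError.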

Additional info:

This problem appears to have already been addressed upstream. The related GitHub pull request and issues are: https://github.com/metal3-io/baremetal-operator/pull/388 , https://github.com/metal3-io/baremetal-operator/issues/708 , https://github.com/metal3-io/baremetal-operator/issues/739

Comment 1 Bob Fournier 2021-04-06 16:10:47 UTC
The fix is in 4.7; there are no plans to backport it to 4.6.