Bug 1936534

Summary: Worker created with an already-used MAC address gets stuck in the registering state
Product: OpenShift Container Platform Reporter: Honza Pokorny <hpokorny>
Component: Bare Metal Hardware Provisioning Assignee: Honza Pokorny <hpokorny>
Bare Metal Hardware Provisioning sub component: baremetal-operator QA Contact: Polina Rabinovich <prabinov>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium Keywords: Triaged
Version: 4.7   
Target Milestone: ---   
Target Release: 4.8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-07-27 22:51:42 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1910352    

Description Honza Pokorny 2021-03-08 17:27:53 UTC
Version:

$ openshift-install version
Client Version: 4.7.0-0.nightly-2020-12-21-131655
Server Version: 4.7.0-0.nightly-2020-12-21-131655
Kubernetes Version: v1.20.0+87544c5
---------------------------------------------------------------

Platform:

#Please specify the platform type: aws, libvirt, openstack or baremetal etc.
libvirt
---------------------------------------------------------------

What happened?
We created a worker that uses the MAC address of an existing deployed worker. We expected to get an error indicating that there is already a worker with the same MAC address. Instead, the new worker got stuck in the registering state.

[kni@provisionhost-0-0 ~]$ oc get bmh -A
NAMESPACE               NAME                   STATUS   PROVISIONING STATUS      CONSUMER                                  BMC                                                                                    HARDWARE PROFILE   ONLINE   ERROR
default                 openshift-worker-0-2                                                                               redfish://192.168.123.1:8000/redfish/v1/Systems/84c713cf-2bc4-43c5-8a00-86c8c2ba8d25                      true     
openshift-machine-api   openshift-master-0-0   OK       externally provisioned   ocp-edge-cluster-0-mnn2d-master-0         redfish://192.168.123.1:8000/redfish/v1/Systems/20b39e3d-58c3-4bc4-94af-975200ae63b4                      true     
openshift-machine-api   openshift-master-0-1   OK       externally provisioned   ocp-edge-cluster-0-mnn2d-master-1         redfish://192.168.123.1:8000/redfish/v1/Systems/ce2645ae-08e3-4f8a-9622-fb9fd788b8ea                      true     
openshift-machine-api   openshift-master-0-2   OK       externally provisioned   ocp-edge-cluster-0-mnn2d-master-2         redfish://192.168.123.1:8000/redfish/v1/Systems/3094a111-e3aa-4ca3-a5b7-f87e6c916aa7                      true     
openshift-machine-api   openshift-worker-0-0   OK       provisioned              ocp-edge-cluster-0-mnn2d-worker-0-bz8wr   redfish://192.168.123.1:8000/redfish/v1/Systems/1817896d-ecc8-4cb6-aae8-aa0b8d43a0e1   unknown            true     
openshift-machine-api   openshift-worker-0-1   OK       provisioned              ocp-edge-cluster-0-mnn2d-worker-0-nbgkc   redfish://192.168.123.1:8000/redfish/v1/Systems/b4283036-875b-4bcc-aa4d-8d350c53f11d   unknown            true     
openshift-machine-api   openshift-worker-0-2            registering                                                        redfish://192.168.123.1:8000/redfish/v1/Systems/84c713cf-2bc4-43c5-8a00-86c8c2ba8d25                      true     

 
[kni@provisionhost-0-0 ~]$ cat worker-0-2.yaml 
apiVersion: v1
kind: Secret
metadata:
  name: openshift-worker-0-2-bmc-secret
type: Opaque
data:
  username: YWRtaW4K
  password: cGFzc3dvcmQK
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: openshift-worker-0-2
spec:
  online: true
  bmc:
    address: redfish://192.168.123.1:8000/redfish/v1/Systems/84c713cf-2bc4-43c5-8a00-86c8c2ba8d25
    credentialsName: openshift-worker-0-2-bmc-secret
    disableCertificateVerification: True
    username: admin
    password: password
  bootMACAddress: 52:54:00:1e:43:06
  hardwareProfile: unknown
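
A side note on the Secret in the manifest above: decoding its two base64 values shows that each ends with a newline, which is what `echo` without `-n` typically produces. The sketch below only decodes the values from this report and shows how to encode without the newline. Whether the trailing newline matters depends on the BMC, and nothing here suggests it is related to this bug; the helper name `encode_secret_value` is hypothetical.

```python
import base64

# Values copied from the Secret in worker-0-2.yaml above.
secret_data = {"username": "YWRtaW4K", "password": "cGFzc3dvcmQK"}

# Decode each value to raw bytes so that hidden characters become visible.
decoded = {key: base64.b64decode(value) for key, value in secret_data.items()}
for key, raw in decoded.items():
    note = "has trailing newline" if raw.endswith(b"\n") else "clean"
    print(f"{key}: {raw!r} ({note})")


def encode_secret_value(text: str) -> str:
    """Hypothetical helper: base64-encode text exactly as given, no newline added."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


print(encode_secret_value("admin"))
```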
---------------------------------------------------------------
What did you expect to happen?
We expected either to see an error about the already-used MAC address, or for the host to reach the ready state and then fail with an error after this step ($ oc scale machineset -n openshift-machine-api ocp-edge-cluster-0-worker-0 --replicas=N+1).
---------------------------------------------------------------
How to reproduce it (as minimally and precisely as possible)?
1. $ ssh kni@provisionhost-0-0
2. Create a file for the new bmh we want to deploy:
   $ vi new-nodeX.yaml
Inside the file, use the same MAC address and IP address as an existing deployed node.

apiVersion: v1
kind: Secret
metadata:
  name: openshift-worker-0-X-bmc-secret
type: Opaque
data:
  username: <YWRtaW4K>
  password: <cGFzc3dvcmQK>
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: openshift-worker-0-X
spec:
  online: true
  bmc:
    address: <redfish://192.168.123.1:8000/redfish/v1/Systems/e2e8a52d-1012-4eec-a22b-dfd57f0df50b>
    credentialsName: openshift-worker-0-X-bmc-secret
    disableCertificateVerification: True
    username: admin
    password: password
  bootMACAddress: <52:54:00:e4:d1:13>
  rootDeviceHints:
    deviceName: /dev/sda

3. Add the new BMH:
   $ oc create -f new-nodeX.yaml -n openshift-machine-api

4. Result: An error message will indicate that there is already a BMH with the same MAC and IP address.
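
The kind of check step 4 relies on can be sketched as below. This is an illustration only, not the baremetal-operator's actual implementation (which is written in Go and may detect the conflict differently); the function names `normalize_mac` and `find_mac_conflict` are hypothetical, and the message format simply mirrors the errorMessage quoted in comment 4.

```python
from typing import Iterable, Mapping, Optional


def normalize_mac(mac: str) -> str:
    """Lower-case the MAC and unify '-' separators to ':' for comparison."""
    return mac.strip().lower().replace("-", ":")


def find_mac_conflict(new_host: Mapping, existing_hosts: Iterable[Mapping]) -> Optional[str]:
    """Return an error message if another host already uses the boot MAC, else None."""
    new_name = new_host["metadata"]["name"]
    new_mac = new_host["spec"].get("bootMACAddress")
    if not new_mac:
        return None
    for host in existing_hosts:
        name = host["metadata"]["name"]
        mac = host["spec"].get("bootMACAddress")
        # Skip the host itself; compare MACs case-insensitively.
        if name != new_name and mac and normalize_mac(mac) == normalize_mac(new_mac):
            return f"MAC address {new_mac} conflicts with existing node {name}"
    return None


# Sample data shaped like BareMetalHost objects, using values from this report.
existing = [
    {"metadata": {"name": "openshift-worker-0-1"},
     "spec": {"bootMACAddress": "52:54:00:af:6e:ab"}},
]
new = {"metadata": {"name": "openshift-worker-0-2"},
       "spec": {"bootMACAddress": "52:54:00:AF:6E:AB"}}

conflict_message = find_mac_conflict(new, existing)
print(conflict_message)
```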

--------------------------------------------
must-gather - https://drive.google.com/drive/folders/1oVhbl0oXEu1LWAuSxs3sunOpPCHEE7tS?usp=sharing

Comment 2 Polina Rabinovich 2021-03-09 10:17:39 UTC
I used a MAC address and IP address similar to those of an existing deployed node. I didn't get an error message indicating that there is already a BMH with the same MAC and IP address; I only got a registration error. Is this enough for me to move the bug to "verified"?

[kni@provisionhost-0-0 ~]$ oc get bmh -n openshift-machine-api
NAME                   STATE                    CONSUMER                                  ONLINE   ERROR
openshift-master-0-0   externally provisioned   ocp-edge-cluster-0-jmht5-master-0         true     
openshift-master-0-1   externally provisioned   ocp-edge-cluster-0-jmht5-master-1         true     
openshift-master-0-2   externally provisioned   ocp-edge-cluster-0-jmht5-master-2         true     
openshift-worker-0-0   provisioned              ocp-edge-cluster-0-jmht5-worker-0-lqnwr   true     
openshift-worker-0-1   provisioned              ocp-edge-cluster-0-jmht5-worker-0-6795b   true     
openshift-worker-0-2   registering                                                        true     registration error

Comment 3 Honza Pokorny 2021-03-09 12:38:05 UTC
The error should read something like:

"MAC Address 00:e2:b4:d9:0a:f1 conflicts with existing host ostest-worker-0"

What do you mean by the MAC address being similar?

Could you attach the yaml output of the above command?

Comment 4 Polina Rabinovich 2021-03-09 13:22:55 UTC
(In reply to Honza Pokorny from comment #3)
> The error should read something like:
> 
> "MAC Address 00:e2:b4:d9:0a:f1 conflicts with existing host ostest-worker-0"
> 
> What do you mean by the MAC address being similar?
> 
> Could you attach the yaml output of the above command?

1. For the new node (worker-0-2), I used the same MAC address and IP address as an existing deployed node (worker-0-1).
2. I can see the error message in the YAML output:
    name: openshift-worker-0-2
    namespace: openshift-machine-api
    resourceVersion: "69903"
    uid: 63484031-70e8-428e-8c05-0e0332b03ded
  spec:
    bmc:
      address: redfish://192.168.123.1:8000/redfish/v1/Systems/a832161c-fe24-422e-93f8-4ae721b872b5
      credentialsName: openshift-worker-0-2-bmc-secret
      disableCertificateVerification: true
    bootMACAddress: 52:54:00:af:6e:ab
    hardwareProfile: unknown
    online: true
  status:
    errorCount: 7
    errorMessage: MAC address 52:54:00:af:6e:ab conflicts with existing node openshift-worker-0-1
    errorType: registration error
So is it enough that I can see the error only in the YAML output?
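
Since the ERROR column only shows the error type, the full message has to be read from the host status. A minimal sketch of pulling it out programmatically, assuming the input is the JSON list produced by something like `oc get bmh -n openshift-machine-api -o json`; the function name `hosts_with_errors` is hypothetical and the sample data is copied from the output above.

```python
import json
from typing import List, Tuple


def hosts_with_errors(bmh_list_json: str) -> List[Tuple[str, str, str]]:
    """Return (name, errorType, errorMessage) for every host reporting an error."""
    doc = json.loads(bmh_list_json)
    results = []
    for item in doc.get("items", []):
        status = item.get("status", {})
        message = status.get("errorMessage", "")
        if message:
            results.append((item["metadata"]["name"], status.get("errorType", ""), message))
    return results


# Sample input mirroring the status fields shown in this comment.
sample = json.dumps({
    "items": [
        {"metadata": {"name": "openshift-worker-0-1"}, "status": {}},
        {"metadata": {"name": "openshift-worker-0-2"},
         "status": {
             "errorCount": 7,
             "errorType": "registration error",
             "errorMessage": "MAC address 52:54:00:af:6e:ab conflicts with existing node openshift-worker-0-1",
         }},
    ]
})

errors = hosts_with_errors(sample)
for name, error_type, message in errors:
    print(f"{name}: {error_type}: {message}")
```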

Comment 5 Honza Pokorny 2021-03-09 14:16:32 UTC
Yes, I think this is sufficient. Thanks

Comment 8 errata-xmlrpc 2021-07-27 22:51:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2
bug fix and security update), and where to find the updated files, follow the
link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438