Bug 1910352 - When creating a worker with a used mac-address stuck on registering [NEEDINFO]
Summary: When creating a worker with a used mac-address stuck on registering
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Bare Metal Hardware Provisioning
Version: 4.7
Hardware: Unspecified
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.7.z
Assignee: Honza Pokorny
QA Contact: Polina Rabinovich
URL:
Whiteboard:
Depends On: 1936534
Blocks:
Reported: 2020-12-23 14:52 UTC by Polina Rabinovich
Modified: 2021-03-30 04:47 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-30 04:46:29 UTC
Target Upstream Version:
prabinov: needinfo? (hpokorny)



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift baremetal-operator pull 131 | 0 | None | open | Bug 1910352: Fail registration when boot MAC address conflicts | 2021-03-08 17:17:19 UTC
Red Hat Product Errata | RHSA-2021:0957 | 0 | None | None | None | 2021-03-30 04:47:52 UTC

Description Polina Rabinovich 2020-12-23 14:52:10 UTC
Version:

$ openshift-install version
Client Version: 4.7.0-0.nightly-2020-12-21-131655
Server Version: 4.7.0-0.nightly-2020-12-21-131655
Kubernetes Version: v1.20.0+87544c5
---------------------------------------------------------------

Platform:

#Please specify the platform type: aws, libvirt, openstack or baremetal etc.
libvirt
---------------------------------------------------------------

What happened?
We created a worker that uses the MAC address of an existing, already deployed worker. We expected an error indicating that a worker with the same MAC address already exists. Instead, the new worker stayed stuck in the registering state.

[kni@provisionhost-0-0 ~]$ oc get bmh -A
NAMESPACE               NAME                   STATUS   PROVISIONING STATUS      CONSUMER                                  BMC                                                                                    HARDWARE PROFILE   ONLINE   ERROR
default                 openshift-worker-0-2                                                                               redfish://192.168.123.1:8000/redfish/v1/Systems/84c713cf-2bc4-43c5-8a00-86c8c2ba8d25                      true     
openshift-machine-api   openshift-master-0-0   OK       externally provisioned   ocp-edge-cluster-0-mnn2d-master-0         redfish://192.168.123.1:8000/redfish/v1/Systems/20b39e3d-58c3-4bc4-94af-975200ae63b4                      true     
openshift-machine-api   openshift-master-0-1   OK       externally provisioned   ocp-edge-cluster-0-mnn2d-master-1         redfish://192.168.123.1:8000/redfish/v1/Systems/ce2645ae-08e3-4f8a-9622-fb9fd788b8ea                      true     
openshift-machine-api   openshift-master-0-2   OK       externally provisioned   ocp-edge-cluster-0-mnn2d-master-2         redfish://192.168.123.1:8000/redfish/v1/Systems/3094a111-e3aa-4ca3-a5b7-f87e6c916aa7                      true     
openshift-machine-api   openshift-worker-0-0   OK       provisioned              ocp-edge-cluster-0-mnn2d-worker-0-bz8wr   redfish://192.168.123.1:8000/redfish/v1/Systems/1817896d-ecc8-4cb6-aae8-aa0b8d43a0e1   unknown            true     
openshift-machine-api   openshift-worker-0-1   OK       provisioned              ocp-edge-cluster-0-mnn2d-worker-0-nbgkc   redfish://192.168.123.1:8000/redfish/v1/Systems/b4283036-875b-4bcc-aa4d-8d350c53f11d   unknown            true     
openshift-machine-api   openshift-worker-0-2            registering                                                        redfish://192.168.123.1:8000/redfish/v1/Systems/84c713cf-2bc4-43c5-8a00-86c8c2ba8d25                      true     

 
[kni@provisionhost-0-0 ~]$ cat worker-0-2.yaml 
apiVersion: v1
kind: Secret
metadata:
  name: openshift-worker-0-2-bmc-secret
type: Opaque
data:
  username: YWRtaW4K
  password: cGFzc3dvcmQK
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: openshift-worker-0-2
spec:
  online: true
  bmc:
    address: redfish://192.168.123.1:8000/redfish/v1/Systems/84c713cf-2bc4-43c5-8a00-86c8c2ba8d25
    credentialsName: openshift-worker-0-2-bmc-secret
    disableCertificateVerification: True
    username: admin
    password: password
  bootMACAddress: 52:54:00:1e:43:06
  hardwareProfile: unknown
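
A side note on the Secret in this manifest: the two base64 values decode to strings that end in a newline, which is what `echo` without `-n` produces. The report does not say whether this affected the BMC login in this environment, so treat the following as a quick check rather than a diagnosis. It is a minimal Python sketch using only the standard library.

```python
import base64

# Values copied from the Secret in worker-0-2.yaml above.
secret_data = {"username": "YWRtaW4K", "password": "cGFzc3dvcmQK"}

# Decode each value and show the raw bytes, including any trailing newline.
decoded = {key: base64.b64decode(value) for key, value in secret_data.items()}
for key, raw in decoded.items():
    print(key, repr(raw))

# For comparison: the same credentials encoded without a trailing newline.
clean = {key: base64.b64encode(raw.rstrip(b"\n")).decode() for key, raw in decoded.items()}
print(clean)
```

If the newline is unwanted, the values in `clean` are the ones to put in the Secret.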
---------------------------------------------------------------
What did you expect to happen?
We expected to see an error about the already-used MAC address. Alternatively, if the host reached the ready state, we expected an error after this step ($ oc scale machineset -n openshift-machine-api ocp-edge-cluster-0-worker-0 --replicas=N+1).
---------------------------------------------------------------
How to reproduce it (as minimally and precisely as possible)?
1. $ ssh kni@provisionhost-0-0
2. Create a file for the new bmh we want to deploy:
   $ vi new-nodeX.yaml
Inside the file, use the same MAC address and IP address as an existing deployed node.

apiVersion: v1
kind: Secret
metadata:
  name: openshift-worker-0-X-bmc-secret
type: Opaque
data:
  username: <YWRtaW4K>
  password: <cGFzc3dvcmQK>
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: openshift-worker-0-X
spec:
  online: true
  bmc:
    address: <redfish://192.168.123.1:8000/redfish/v1/Systems/e2e8a52d-1012-4eec-a22b-dfd57f0df50b>
    credentialsName: openshift-worker-0-X-bmc-secret
    disableCertificateVerification: True
    username: admin
    password: password
  bootMACAddress: <52:54:00:e4:d1:13>
  rootDeviceHints:
    deviceName: /dev/sda

3. Add the new BMH:
   $ oc create -f new-nodeX.yaml -n openshift-machine-api

4. Expected result: an error message indicates that a BMH with the same MAC and IP address already exists.
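
The check that step 4 expects can be sketched as follows. This is an illustration only, not the baremetal-operator's actual logic: the function names are hypothetical, the host named `existing-worker` is made up, and the input is a plain list of dicts shaped like BareMetalHost manifests.

```python
from collections import defaultdict


def normalize_mac(mac: str) -> str:
    """Lower-case a MAC address and use ':' separators so equal addresses compare equal."""
    return mac.strip().lower().replace("-", ":")


def find_mac_conflicts(hosts):
    """Return {mac: [host names]} for every boot MAC claimed by more than one host.

    Only metadata.name and spec.bootMACAddress are read from each host dict.
    """
    by_mac = defaultdict(list)
    for host in hosts:
        mac = host.get("spec", {}).get("bootMACAddress")
        if mac:
            by_mac[normalize_mac(mac)].append(host["metadata"]["name"])
    return {mac: names for mac, names in by_mac.items() if len(names) > 1}


# Hypothetical inventory: the new host reuses the boot MAC of an existing one.
hosts = [
    {"metadata": {"name": "existing-worker"}, "spec": {"bootMACAddress": "52:54:00:e4:d1:13"}},
    {"metadata": {"name": "openshift-worker-0-X"}, "spec": {"bootMACAddress": "52:54:00:E4:D1:13"}},
    {"metadata": {"name": "other-worker"}, "spec": {"bootMACAddress": "52:54:00:1e:43:06"}},
]
conflicts = find_mac_conflicts(hosts)
print(conflicts)
```

Normalizing case before comparing matters because the same address may be written in upper or lower case in different manifests.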

--------------------------------------------
must-gather - https://drive.google.com/drive/folders/1oVhbl0oXEu1LWAuSxs3sunOpPCHEE7tS?usp=sharing

Comment 4 sdasu 2021-01-19 18:11:16 UTC
Honza, this might be related to some work [https://github.com/metal3-io/baremetal-operator/pull/581] you have been doing on the BMO to check if the MAC address in the BMH is already in use. Please feel free to re-assign if you are no longer working on it.

Comment 6 Honza Pokorny 2021-02-02 15:20:40 UTC
This has been fixed upstream:

https://github.com/metal3-io/baremetal-operator/pull/776

https://github.com/metal3-io/baremetal-operator/pull/780

But not yet backported downstream; setting back to assigned until that happens

Comment 10 Polina Rabinovich 2021-03-22 14:25:39 UTC
It's not fixed; I checked in 4.7.3.
The worker is still stuck in the registering state, with no error:

[kni@provisionhost-0-0 ~]$ oc get bmh -n openshift-machine-api
NAME                   STATUS   PROVISIONING STATUS      CONSUMER                                  BMC                                                                                    HARDWARE PROFILE   ONLINE   ERROR
openshift-master-0-0   OK       externally provisioned   ocp-edge-cluster-0-s7q9d-master-0         redfish://192.168.123.1:8000/redfish/v1/Systems/a3d533e4-6c95-40b0-b280-d9778a8acd09                      true     
openshift-master-0-1   OK       externally provisioned   ocp-edge-cluster-0-s7q9d-master-1         redfish://192.168.123.1:8000/redfish/v1/Systems/7be4f29c-26b8-48a9-9376-89d8ce5891c0                      true     
openshift-master-0-2   OK       externally provisioned   ocp-edge-cluster-0-s7q9d-master-2         redfish://192.168.123.1:8000/redfish/v1/Systems/577c051c-423f-41f9-9ecd-e1c618599cda                      true     
openshift-worker-0-0   OK       provisioned              ocp-edge-cluster-0-s7q9d-worker-0-bbj2f   redfish://192.168.123.1:8000/redfish/v1/Systems/9cb149bf-609d-4a6d-8e50-2251c43b2a66   unknown            true     
openshift-worker-0-1            registering                                                        redfish://192.168.123.1:8000/redfish/v1/Systems/9cb149bf-609d-4a6d-8e50-2251c43b2a66                      true     


From the yaml file:

name: openshift-worker-0-1
    namespace: openshift-machine-api
    resourceVersion: "1381496"
    selfLink: /apis/metal3.io/v1alpha1/namespaces/openshift-machine-api/baremetalhosts/openshift-worker-0-1
    uid: b81dd001-c3b9-41e8-9159-babdf67e2327
  spec:
    bmc:
      address: redfish://192.168.123.1:8000/redfish/v1/Systems/9cb149bf-609d-4a6d-8e50-2251c43b2a66
      credentialsName: openshift-worker-0-1-bmc-secret
      disableCertificateVerification: true
    bootMACAddress: 52:54:00:52:e5:4a
    hardwareProfile: unknown
    online: true
  status:
    errorCount: 0
    errorMessage: ""
    goodCredentials: {}
    hardwareProfile: ""

Comment 11 Polina Rabinovich 2021-03-26 16:18:14 UTC
Verified in:
[kni@provisionhost-0-0 ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-03-26-090502   True        False         13m     Cluster version is 4.7.0-0.nightly-2021-03-26-090502
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[kni@provisionhost-0-0 ~]$ oc get bmh -A
NAMESPACE               NAME                   STATUS   PROVISIONING STATUS      CONSUMER                                  BMC                                                                                    HARDWARE PROFILE   ONLINE   ERROR
openshift-machine-api   openshift-master-0-0   OK       externally provisioned   ocp-edge-cluster-0-lw7k6-master-0         redfish://192.168.123.1:8000/redfish/v1/Systems/f39229e7-8a2d-4b5c-bf6a-2fe7669b422e                      true     
openshift-machine-api   openshift-master-0-1   OK       externally provisioned   ocp-edge-cluster-0-lw7k6-master-1         redfish://192.168.123.1:8000/redfish/v1/Systems/d2dc9287-99ad-4e81-837b-b435250f1cda                      true     
openshift-machine-api   openshift-master-0-2   OK       externally provisioned   ocp-edge-cluster-0-lw7k6-master-2         redfish://192.168.123.1:8000/redfish/v1/Systems/587047ff-7951-40dd-b263-b6be34d8450d                      true     
openshift-machine-api   openshift-worker-0-0   OK       provisioned              ocp-edge-cluster-0-lw7k6-worker-0-mctlz   redfish://192.168.123.1:8000/redfish/v1/Systems/82f92ca2-6235-42df-8820-b1522b44fed9   unknown            true     
openshift-machine-api   openshift-worker-0-1   OK       provisioned              ocp-edge-cluster-0-lw7k6-worker-0-s8rz4   redfish://192.168.123.1:8000/redfish/v1/Systems/2df0b238-3885-464b-a442-c358f6717733   unknown            false    
openshift-machine-api   openshift-worker-0-2   error    registering                                                        redfish://192.168.123.1:8000/redfish/v1/Systems/2df0b238-3885-464b-a442-c358f6717733                      true     MAC address 52:54:00:ee:5e:a2 conflicts with existing node openshift-worker-0-1

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

name: openshift-worker-0-2
    namespace: openshift-machine-api
    resourceVersion: "38677"
    selfLink: /apis/metal3.io/v1alpha1/namespaces/openshift-machine-api/baremetalhosts/openshift-worker-0-2
    uid: 0a158267-398b-4802-9957-a588b4080880
  spec:
    bmc:
      address: redfish://192.168.123.1:8000/redfish/v1/Systems/2df0b238-3885-464b-a442-c358f6717733
      credentialsName: openshift-worker-0-2-bmc-secret
      disableCertificateVerification: true
    bootMACAddress: 52:54:00:ee:5e:a2
    hardwareProfile: unknown
    online: true
  status:
    errorCount: 3
    errorMessage: MAC address 52:54:00:ee:5e:a2 conflicts with existing node openshift-worker-0-1
    errorType: registration error
    goodCredentials: {}
    hardwareProfile: ""
    lastUpdated: "2021-03-26T16:13:46Z"
    operationHistory:
      deprovision:
        end: null
        start: null
      inspect:
        end: null
        start: null
      provision:
        end: null
        start: null
      register:
        end: null
        start: "2021-03-26T16:12:46Z"
    operationalStatus: error
    poweredOn: false
    provisioning:
      ID: ""
      image:
        checksum: ""
        url: ""
      state: registering
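
The difference between the unfixed behavior (Comment 10) and the verified behavior above can be told apart from the status fields alone. The sketch below is a hypothetical helper, not part of any OpenShift tooling. It takes a dict shaped like the `status` shown in the dumps. The `provisioning.state` value for the Comment 10 case is taken from the PROVISIONING STATUS column of the table, since that YAML excerpt is cut off before the field.

```python
def describe_registration(bmh: dict) -> str:
    """Summarize whether a host in the registering state has reported an error."""
    status = bmh.get("status", {})
    state = status.get("provisioning", {}).get("state", "")
    message = status.get("errorMessage", "")
    if state != "registering":
        return "not registering"
    if status.get("operationalStatus") == "error" or message:
        return f"registration failed: {message}"
    return "registering with no error reported"


# Status as seen on 4.7.3 (Comment 10): stuck, nothing reported.
before_fix = {
    "status": {
        "errorCount": 0,
        "errorMessage": "",
        "provisioning": {"state": "registering"},
    }
}

# Status as seen on the verified nightly (Comment 11).
after_fix = {
    "status": {
        "errorCount": 3,
        "errorMessage": "MAC address 52:54:00:ee:5e:a2 conflicts with existing node openshift-worker-0-1",
        "errorType": "registration error",
        "operationalStatus": "error",
        "provisioning": {"state": "registering"},
    }
}

print(describe_registration(before_fix))
print(describe_registration(after_fix))
```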

Comment 14 errata-xmlrpc 2021-03-30 04:46:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.4 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0957

