Bug 1893832

Summary: ErrorCount field is missing in baremetalhosts.metal3.io CRD
Product: OpenShift Container Platform
Reporter: Shelly Miron <smiron>
Component: Installer
Assignee: Andrea Fasano <afasano>
Installer sub component: OpenShift on Bare Metal IPI
QA Contact: Shelly Miron <smiron>
Status: CLOSED ERRATA
Docs Contact:
Severity: medium
Priority: low
CC: kiran, prabinov, rbartal, zbitter
Version: 4.7
Keywords: OtherQA, Triaged, UpcomingSprint
Target Milestone: ---
Target Release: 4.7.0
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-02-24 15:29:19 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description Flags
baremetalhosts.metal3.io_crd none
baremetalhosts.metal3.io_crd-new none
get_bmh-new.txt none

Description Shelly Miron 2020-11-02 18:18:33 UTC
Created attachment 1725940
baremetalhosts.metal3.io_crd

Version:
-------------------------
Client Version: 4.7.0-0.nightly-2020-10-27-051128
Server Version: 4.7.0-0.nightly-2020-10-27-051128
Kubernetes Version: v1.19.0+e67f5dc

Setup:
-------------------------
Provisioning_net_IPv6, Baremetal_net_IPv4, disconnected install


Platform:
-------------------------
libvirt
IPI (automated install with `openshift-baremetal-install`) 


What happened?
-------------------------

After successfully deploying a cluster, we expected `oc get bmh -A -o yaml` to show the 'ErrorCount' field, as defined here:

https://github.com/openshift/baremetal-operator/blob/01e0c1e89144deb1b05f689f29b45a749c660a3d/config/crd/bases/metal3.io_baremetalhosts.yaml#L252

After a scale-down failed (a known bug in 4.7), I expected the ErrorCount field to appear and hold a value. However, it is present neither in the bmh output nor in the baremetalhosts.metal3.io CRD (attached to the bug).

$ oc get bmh -A -o yaml

.......
...................

  spec:
    bmc:
      address: redfish://192.168.123.1:8000/redfish/v1/Systems/b7a2977b-0375-47ce-aa23-4d14968f15fd
      credentialsName: openshift-worker-0-1-bmc-secret
      disableCertificateVerification: true
    bootMACAddress: 52:54:00:d6:06:21
    consumerRef:
      apiVersion: machine.openshift.io/v1beta1
      kind: Machine
      name: ocp-edge-cluster-0-dh4qg-worker-0-5x94p
      namespace: openshift-machine-api
    hardwareProfile: unknown
    online: false
    rootDeviceHints:
      deviceName: /dev/sda
  status:
    errorMessage: ""
    goodCredentials:
      credentials:
        name: openshift-worker-0-1-bmc-secret
        namespace: openshift-machine-api
      credentialsVersion: "18225"
    hardware:
    ...
    .....


It seems that this field is also missing from the machine-api-operator CRD, as shown here:

https://github.com/openshift/machine-api-operator/blob/ed7858da22dec8c5d5d3302252a259e3cd743b6a/install/0000_30_machine-api-operator_08_baremetalhost.crd.yaml#L309
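To make the CRD comparison repeatable, here is a minimal sketch that lists which CRD versions do not declare a given status property. It assumes the CRD was dumped as JSON (for example with `oc get crd baremetalhosts.metal3.io -o json`) and that the serialized key is `errorCount`; the function names are hypothetical and not part of any operator code.

```python
from typing import Dict, List, Set


def status_fields_by_version(crd: dict) -> Dict[str, Set[str]]:
    """Map each CRD version name to the property names declared under status."""
    spec = crd.get("spec") or {}
    result: Dict[str, Set[str]] = {}
    for version in spec.get("versions") or []:
        # apiextensions.k8s.io/v1 keeps one schema per entry in spec.versions.
        schema = (version.get("schema") or {}).get("openAPIV3Schema")
        if schema is None:
            # apiextensions.k8s.io/v1beta1 may use one shared schema instead.
            schema = (spec.get("validation") or {}).get("openAPIV3Schema") or {}
        status = (schema.get("properties") or {}).get("status") or {}
        result[version.get("name", "")] = set(status.get("properties") or {})
    return result


def versions_missing(crd: dict, field: str = "errorCount") -> List[str]:
    """Return the sorted names of versions whose status schema lacks `field`.

    The default key name `errorCount` is an assumption about the JSON tag.
    """
    fields = status_fields_by_version(crd)
    return sorted(name for name, props in fields.items() if field not in props)
```

An empty list from `versions_missing(json.load(open("crd.json")))` would suggest the deployed CRD already declares the field; `crd.json` is a placeholder file name.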



What did you expect to happen?
--------------------------------
We expect to see the 'ErrorCount' field in the metal3.io_baremetalhosts.yaml CRD.



How to reproduce it
--------------------------------
1. Deploy a disconnected environment with OCP 4.7, an IPv6 provisioning network, and an IPv4 baremetal network.
2. Run `oc get bmh -A -o yaml` and search for the 'ErrorCount' field.
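Step 2 can also be checked programmatically instead of by eye. The sketch below assumes the hosts were dumped with `oc get bmh -A -o json` (a list object with an `items` array) and that the field is serialized as `errorCount`; the function name is hypothetical. A missing key is only a hint, since a field can be absent from the output for other reasons.

```python
from typing import List


def hosts_missing_status_field(host_list: dict, field: str = "errorCount") -> List[str]:
    """Return "namespace/name" for every host whose status lacks `field`.

    `host_list` is the parsed JSON list object; the default key name
    `errorCount` is an assumption about how the field is serialized.
    """
    missing = []
    for host in host_list.get("items") or []:
        meta = host.get("metadata") or {}
        status = host.get("status") or {}
        if field not in status:
            missing.append(f"{meta.get('namespace', '')}/{meta.get('name', '')}")
    return missing
```

For example, the JSON could be piped in and passed as `hosts_missing_status_field(json.load(sys.stdin))`; any host names printed are the ones that need a closer look.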



must-gather: 
---------------------------------
https://drive.google.com/drive/folders/1r9WyF4hdrE43Me68J7vLwGhbTQqO9ygY?usp=sharing

Comment 1 Andrea Fasano 2020-11-04 12:21:06 UTC
*** Bug 1892243 has been marked as a duplicate of this bug. ***

Comment 3 Shelly Miron 2020-11-09 10:15:38 UTC
It seems the issue still exists (no ErrorCount field):

  status:
    errorMessage: ""
    goodCredentials:
      credentials:
        name: openshift-worker-0-1-bmc-secret
        namespace: openshift-machine-api
      credentialsVersion: "21138"
    hardware:
      cpu:

I have attached the updated CRD to the bug.

Comment 4 Shelly Miron 2020-11-09 10:16:45 UTC
Created attachment 1727729
baremetalhosts.metal3.io_crd-new

Comment 5 Shelly Miron 2020-11-09 10:17:27 UTC
Created attachment 1727730
get_bmh-new.txt

Comment 6 Andrea Fasano 2020-11-09 10:36:36 UTC
In 4.7 the BMH CRD will be managed by CBO (and no longer by MAO). CBO is currently under development, so the current fix cannot be tested until the CVO integration is enabled (see https://github.com/openshift/cluster-baremetal-operator/blob/bc2f94fb67f989cf13525fa9f843bd3c59159e0e/Dockerfile#L12).

Comment 7 Andrea Fasano 2020-11-09 13:18:56 UTC
The current fix depends on the completion of the epic https://issues.redhat.com/browse/KNIDEPLOY-2171. I've tested it locally and it works fine; once CBO is completed, it can also be tested as part of the epic.

Comment 8 Zane Bitter 2020-11-09 19:16:09 UTC
Does it hurt to update the CRD in MAO while we wait? It doesn't seem helpful to have the baremetal-operator controller and its CRD out of sync, even if we think it will be temporary.

Comment 9 Andrea Fasano 2020-11-10 12:16:15 UTC
It looks like the CBO migration will take a while to complete, so it could be a good idea to temporarily cover MAO as well. /cc @sadasu

Comment 12 errata-xmlrpc 2021-02-24 15:29:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633