Bug 1915262

Summary: When deploying with assisted install the CBO operator is installed and enabled without metal3 pod
Product: OpenShift Container Platform Reporter: Raviv Bar-Tal <rbartal>
Component: assisted-installer Assignee: Angus Salkeld <asalkeld>
assisted-installer sub component: Installer QA Contact: Udi Kalifon <ukalifon>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: urgent CC: alazar, angus, aos-bugs, asalkeld, hpokorny, ohochman, rfreiman, ssmolyak, stbenjam, steve.benjamin
Version: 4.7 Keywords: TestBlocker
Target Milestone: ---   
Target Release: 4.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Enhancement
Doc Text:
Feature: For OpenShift 4.7 and later, the Provisioning CR is no longer deleted and provisioningNetwork is set to Disabled. BareMetalHost CRs are also set to externallyProvisioned. Reason: This and dependent work was required to have the baremetal operators enabled with assisted-installer. Result: Node linking is improved and all the baremetal services are running.
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-02-24 15:52:05 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Raviv Bar-Tal 2021-01-12 11:05:47 UTC
Version:

# oc get clusterversion
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-fc.0   True        False         4d21h   Cluster version is 4.7.0-fc.0

Platform:
Assisted installer 


What happened?
After deployment we have the CBO (cluster baremetal operator),
but we do not have the metal3 pod in openshift-machine-api, and there is no Provisioning resource.

What did you expect to happen?
The current state falls between IPI, where the CBO and metal3 are fully installed,
and cloud, where the CBO is disabled.

How to reproduce it (as minimally and precisely as possible)?
Use assisted installer to deploy the cluster,
or use Jenkins job:
https://jenkins-fci-continuous-productization.cloud.paas.psi.redhat.com/view/Assisted-Installer/job/ocp-assisted/

[root@titan60 ~]# oc get clusteroperators
NAME                                       VERSION      AVAILABLE   PROGRESSING   DEGRADED   SINCE                                                                                           
authentication                             4.7.0-fc.0   True        False         False      8h                                                                                              
baremetal                                  4.7.0-fc.0   True        False         False      4d21h 

----------------------
[root@titan60 ~]# oc get pods -n openshift-machine-api
NAME                                           READY   STATUS    RESTARTS   AGE
cluster-autoscaler-operator-84c7c78f7c-nlxvc   2/2     Running   0          4d21h
cluster-baremetal-operator-77cf96f46b-ph5fb    1/1     Running   0          4d20h
machine-api-controllers-f4f84d4c9-jxjkk        7/7     Running   0          4d20h
machine-api-operator-5c7456b446-k2qgh          2/2     Running   0          4d20h
[root@titan60 ~]# 

--------------------------
[root@titan60 ~]# oc get provisioning -n openshift-machine-api
No resources found

--------------------------------
[root@titan60 ~]# oc get secrets
NAME                       TYPE                                  DATA   AGE
builder-dockercfg-7w8sj    kubernetes.io/dockercfg               1      4d21h
builder-token-h4tbv        kubernetes.io/service-account-token   4      4d21h
builder-token-sgzdf        kubernetes.io/service-account-token   4      4d21h
default-dockercfg-sn2mf    kubernetes.io/dockercfg               1      4d21h
default-token-9m4gn        kubernetes.io/service-account-token   4      4d22h
default-token-q5m5x        kubernetes.io/service-account-token   4      4d21h
deployer-dockercfg-zmr8t   kubernetes.io/dockercfg               1      4d21h
deployer-token-ftf8m       kubernetes.io/service-account-token   4      4d21h
deployer-token-wzstq       kubernetes.io/service-account-token   4      4d21h

Anything else we need to know?


Comment 1 Sasha Smolyak 2021-01-12 11:39:20 UTC
The overall state looks as if the provisioning CR was installed and then deleted, and we don't know whether that is the expected behaviour.

Comment 2 Stephen Benjamin 2021-01-12 17:29:11 UTC
This looks correct to me. We do start CBO, because platform=baremetal when the cluster is installed by assisted. However, the assisted installer does not install the provisioning CR yet, so the result is platform=baremetal without our provisioning configuration. That means CBO starts without the Metal3 pod, which is a valid configuration.

There's work in progress to turn on the Metal3 pod (https://github.com/openshift/assisted-service/pull/709).

Anyway, I don't think there's a bug here but you can have the assisted folks look to make sure what I said is correct.
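
The three states discussed above (CBO with Metal3, CBO without Metal3, no CBO) can be told apart from the pod list alone. The following is a minimal sketch, assuming the input is the text output of `oc get pods -n openshift-machine-api` as pasted in the description. The function names and the state labels are hypothetical and are not part of any OpenShift tooling.

```python
from typing import List

# Pod listing copied from the description of this bug.
SAMPLE_PODS = """\
NAME                                           READY   STATUS    RESTARTS   AGE
cluster-autoscaler-operator-84c7c78f7c-nlxvc   2/2     Running   0          4d21h
cluster-baremetal-operator-77cf96f46b-ph5fb    1/1     Running   0          4d20h
machine-api-controllers-f4f84d4c9-jxjkk        7/7     Running   0          4d20h
machine-api-operator-5c7456b446-k2qgh          2/2     Running   0          4d20h
"""


def pod_names(oc_get_pods_output: str) -> List[str]:
    """Return the NAME column from `oc get pods` text output."""
    names = []
    for line in oc_get_pods_output.strip().splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and pasted shell prompts.
        if not fields or fields[0] == "NAME" or fields[0].startswith("["):
            continue
        names.append(fields[0])
    return names


def classify_baremetal_state(names: List[str]) -> str:
    """Classify the cluster by which baremetal pods are present.

    The labels returned here are illustrative only (an assumption).
    """
    has_cbo = any(n.startswith("cluster-baremetal-operator-") for n in names)
    # The image-cache pods share the metal3- prefix, so exclude them when
    # looking for the main metal3 pod.
    has_metal3 = any(
        n.startswith("metal3-") and not n.startswith("metal3-image-cache-")
        for n in names
    )
    if not has_cbo:
        return "no-cbo"
    return "cbo-with-metal3" if has_metal3 else "cbo-without-metal3"


if __name__ == "__main__":
    print(classify_baremetal_state(pod_names(SAMPLE_PODS)))
```

For the listing in the description this reports the CBO-without-Metal3 state, which matches the configuration described in this comment.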

Comment 3 Rom Freiman 2021-01-12 19:32:40 UTC
@stbenjam is right. It's just not implemented yet.
@asalkeld is working on it.

Comment 4 Michael Filanov 2021-01-13 09:24:55 UTC
@rfreiman @alazar what do you think?

Comment 5 Angus Salkeld 2021-01-20 10:22:04 UTC
This is really https://issues.redhat.com/browse/MGMT-2662
The only remaining PR to be merged is https://github.com/openshift/assisted-service/pull/709

Comment 6 Angus Salkeld 2021-02-02 06:41:05 UTC
@rbartal the required code is now merged. Can you retest please?

Comment 7 Sasha Smolyak 2021-02-04 09:23:55 UTC
> oc get secrets -n openshift-machine-api
...
metal3-ironic-inspector-password                  Opaque                                4      22h
metal3-ironic-password                            Opaque                                4      22h
metal3-ironic-rpc-password                        Opaque                                4      22h
metal3-mariadb-password                           Opaque                                1      22h
worker-user-data                                  Opaque                                2      22h

> oc get provisioning
NAME                         AGE
provisioning-configuration   22h

> oc get pods -n openshift-machine-api
NAME                                           READY   STATUS    RESTARTS   AGE
...
metal3-7468bd47d8-vgt2f                        8/8     Running   0          22h
metal3-image-cache-628hb                       1/1     Running   0          22h
metal3-image-cache-bs5t2                       1/1     Running   0          22h
metal3-image-cache-fv9pf                       1/1     Running   0          22h

> oc get clusteroperators
NAME                                       VERSION      AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.7.0-fc.0   True        False         False      22h
baremetal                                  4.7.0-fc.0   True        False         False      22h


All the resources are found; closing the bug.

Comment 8 Rom Freiman 2021-02-04 11:31:01 UTC
What's the status of the relevant CRs for workers and masters?

Comment 9 Sasha Smolyak 2021-02-08 09:22:05 UTC
NAMESPACE               NAME                   STATUS   PROVISIONING STATUS      CONSUMER                                  BMC                                                                                                 HARDWARE PROFILE   ONLINE   ERROR
openshift-machine-api   openshift-master-0-0   OK       externally provisioned   ocp-edge-cluster-0-76psx-master-0         redfish-virtualmedia://192.168.123.1:8000/redfish/v1/Systems/d2d541d9-33d8-43eb-b8ed-1126cfbb6908                      true     
openshift-machine-api   openshift-master-0-1   OK       externally provisioned   ocp-edge-cluster-0-76psx-master-1         redfish-virtualmedia://192.168.123.1:8000/redfish/v1/Systems/16c5d388-5d07-4cff-81c8-11c692ecf62d                      true     
openshift-machine-api   openshift-master-0-2   OK       externally provisioned   ocp-edge-cluster-0-76psx-master-2         redfish-virtualmedia://192.168.123.1:8000/redfish/v1/Systems/9c1b9008-a968-4f14-9ea5-d44cd8b5555a                      true     
openshift-machine-api   openshift-worker-0-0   OK       provisioned              ocp-edge-cluster-0-76psx-worker-0-zq7j4   redfish-virtualmedia://192.168.123.1:8000/redfish/v1/Systems/afdf9b87-0af4-4c4e-b36f-80c2605c1ef5   unknown            true     
openshift-machine-api   openshift-worker-0-1   OK       provisioned              ocp-edge-cluster-0-76psx-worker-0-zm65s   redfish-virtualmedia://192.168.123.1:8000/redfish/v1/Systems/495dcd3b-c22c-4507-902c-53b7a30a0e55   unknown            true     

Masters are externally provisioned.
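
The same check can be scripted against the host table. This is a minimal sketch, assuming the table is available as plain text with columns separated by two or more spaces, as in the output pasted above. The sample is trimmed to the first five columns and two rows for brevity, and the function name is hypothetical.

```python
import re
from typing import Dict

# Rows copied from the table above, trimmed to the first five columns.
SAMPLE_BMH = """\
NAMESPACE               NAME                   STATUS   PROVISIONING STATUS      CONSUMER
openshift-machine-api   openshift-master-0-0   OK       externally provisioned   ocp-edge-cluster-0-76psx-master-0
openshift-machine-api   openshift-worker-0-0   OK       provisioned              ocp-edge-cluster-0-76psx-worker-0-zq7j4
"""


def provisioning_states(table: str) -> Dict[str, str]:
    """Map each host name to its PROVISIONING STATUS value."""
    states = {}
    for line in table.strip().splitlines():
        # Split on runs of two or more spaces so that multi-word values
        # such as "externally provisioned" stay in one column.
        cols = re.split(r"\s{2,}", line.strip())
        if len(cols) < 4 or cols[0] == "NAMESPACE":
            continue
        states[cols[1]] = cols[3]
    return states


if __name__ == "__main__":
    for host, state in provisioning_states(SAMPLE_BMH).items():
        print(host, "->", state)
```

Only the first four columns are read by position, so the empty HARDWARE PROFILE cells on the master rows do not affect the result.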

Comment 12 errata-xmlrpc 2021-02-24 15:52:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0
security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633