Bug 1966947 - [4.9.0] v1beta1.Machine is not registered in scheme, causing bmh_agent_controller reconcileSpokeBMH to fail
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: assisted-installer
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Target Release: 4.9.0
Assignee: Flavio Percoco
QA Contact: Trey West
URL:
Whiteboard: AI-Team-Platform
Depends On: 1965007
Blocks:
 
Reported: 2021-06-02 08:21 UTC by Flavio Percoco
Modified: 2021-10-26 17:22 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1965007
Environment:
Last Closed: 2021-10-26 17:22:37 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift assisted-service pull 1944 0 None open [ocm-2.3] Bug 1966947: Unify client Scheme registration 2021-06-07 12:27:52 UTC
Red Hat Product Errata RHBA-2021:3935 0 None None None 2021-10-26 17:22:48 UTC

Description Flavio Percoco 2021-06-02 08:21:06 UTC
+++ This bug was initially created as a clone of Bug #1965007 +++

Description of problem:

When a remote worker is added to the spoke cluster, the bmh_agent_controller reconcile fails with:

time="2021-05-26T14:07:47Z" level=error msg="failed to create or update spoke Machine" func="github.com/openshift/assisted-service/internal/controller/controllers.(*BMACReconciler).reconcileSpokeBMH" file="/go/src/github.com/openshift/origin/internal/controller/controllers/bmh_agent_controller.go:522" error="no kind is registered for the type v1beta1.Machine in scheme \"pkg/runtime/scheme.go:100\""


Version-Release number of selected component (if applicable):


How reproducible:

always

Steps to Reproduce:

1. Install a dev-scripts cluster (3 masters + 1 worker) with 4 extra nodes for the spoke cluster.
2. make assisted_deployment
3. Create pull secret, cluster-ssh-key, ClusterImageSet, InfraEnv, ClusterDeployment. 
4. Apply dev-scripts/ocp/ostest/extra_host_manifests.yaml for the first 3 BMHs. Add the following to the BMH definitions:

  annotations:
    # BMAC will add this annotation if not present
    inspect.metal3.io: disabled
  labels:
    infraenvs.agent-install.openshift.io: "bmac-test"
  spec:
    automatedCleaningMode: disabled

5. After agents are discovered, approve them:
kubectl -n assisted-installer patch agents.agent-install.openshift.io 132fb56c-3d7b-4c00-8944-26d8fc6ac8ca -p '{"spec":{"approved":true}}' --type merge

6. Wait for the spoke cluster deployment to complete installation.

7. Create a worker node using the 4th BMH definition in extra_host_manifests.yaml.

Actual results:

bmh_agent_controller reconcileSpokeBMH fails.

Expected results:

bmh_agent_controller reconcileSpokeBMH succeeds.

Additional info:

--- Additional comment from Flavio Percoco on 2021-06-01 06:20:58 UTC ---


> time="2021-05-26T14:07:47Z" level=error msg="failed to create or update spoke Machine" func="github.com/openshift/assisted-service/internal/controller/controllers.(*BMACReconciler).reconcileSpokeBMH" file="/go/src/github.com/openshift/origin/internal/controller/controllers/bmh_agent_controller.go:522" error="no kind is registered for the type v1beta1.Machine in scheme \"pkg/runtime/scheme.go:100\""


Checked Richard's environment and the CRDs exist in the spoke cluster:

machinesets.machine.openshift.io
machines.machine.openshift.io

This suggests the issue may be a simple Manager/Client instantiation problem, since the Machine types are currently not added to the runtime Scheme: https://github.com/openshift/assisted-service/blob/846f2dc89d10b74ed95cb99a6a8888902fb11497/cmd/main.go#L693-L710
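For reference, a minimal sketch of the kind of registration the linked main.go would need (my own illustration, not the actual change in PR 1944; the helper name newSpokeScheme and the exact package set are assumptions):

package main

import (
	machinev1beta1 "github.com/openshift/api/machine/v1beta1"
	"k8s.io/apimachinery/pkg/runtime"
	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
	clientgoscheme "k8s.io/client-go/kubernetes/scheme"
)

// newSpokeScheme is a hypothetical helper: build the Scheme used for the
// spoke-cluster client with the Machine API types added up front.
func newSpokeScheme() *runtime.Scheme {
	scheme := runtime.NewScheme()
	utilruntime.Must(clientgoscheme.AddToScheme(scheme)) // core Kubernetes types
	utilruntime.Must(machinev1beta1.AddToScheme(scheme)) // Machine, MachineSet
	return scheme
}

Any client or Manager built for the spoke cluster would then be handed this Scheme, so Create/Update calls on Machine objects in reconcileSpokeBMH can resolve their GVK.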

Comment 3 Trey West 2021-06-18 15:05:00 UTC
@fpercoco 

I am trying to verify this with ACM 2.3. I currently don't see any logs from assisted-service regarding reconcileSpokeBMH. These are the only logs I see:

time="2021-06-18T14:54:47Z" level=error msg="failed to register host <276f3a07-fdca-46af-bed4-8b7f58426c20> to cluster d4afb783-c079-4d3c-bce0-321d0aa6d203 due to: Cannot add host to a cluster that is already installed, please use the day2 cluster option" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).RegisterHost" file="/remote-source/app/internal/bminventory/inventory.go:2494" cluster_id=d4afb783-c079-4d3c-bce0-321d0aa6d203 error="Cannot add host to a cluster that is already installed, please use the day2 cluster option" go-id=460011 pkg=Inventory request_id=9c271813-2bca-4c96-90ed-1c9bf0b05bc1
time="2021-06-18T14:54:47Z" level=error msg="RegisterHost failed" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).RegisterHost.func1" file="/remote-source/app/internal/bminventory/inventory.go:2469" cluster_id=d4afb783-c079-4d3c-bce0-321d0aa6d203 go-id=460011 pkg=Inventory request_id=9c271813-2bca-4c96-90ed-1c9bf0b05bc1

Comment 4 Flavio Percoco 2021-06-21 05:41:55 UTC
(In reply to Trey West from comment #3)
> @fpercoco 
> 
> I am trying to verify this with ACM 2.3. I currently don't see any logs from
> assisted-service regarding reconcileSpokeBMH. These are the only logs I see:
> 
> time="2021-06-18T14:54:47Z" level=error msg="failed to register host
> <276f3a07-fdca-46af-bed4-8b7f58426c20> to cluster
> d4afb783-c079-4d3c-bce0-321d0aa6d203 due to: Cannot add host to a cluster
> that is already installed, please use the day2 cluster option"
> func="github.com/openshift/assisted-service/internal/bminventory.
> (*bareMetalInventory).RegisterHost"
> file="/remote-source/app/internal/bminventory/inventory.go:2494"
> cluster_id=d4afb783-c079-4d3c-bce0-321d0aa6d203 error="Cannot add host to a
> cluster that is already installed, please use the day2 cluster option"
> go-id=460011 pkg=Inventory request_id=9c271813-2bca-4c96-90ed-1c9bf0b05bc1
> time="2021-06-18T14:54:47Z" level=error msg="RegisterHost failed"
> func="github.com/openshift/assisted-service/internal/bminventory.
> (*bareMetalInventory).RegisterHost.func1"
> file="/remote-source/app/internal/bminventory/inventory.go:2469"
> cluster_id=d4afb783-c079-4d3c-bce0-321d0aa6d203 go-id=460011 pkg=Inventory
> request_id=9c271813-2bca-4c96-90ed-1c9bf0b05bc1

I don't think this needs further verification, TBH. This was more a programmatic issue than a deployment one. It can be better verified when deploying a day 2 cluster, which is not a priority right now.

Feel free to switch it to VERIFIED if the regular 4.8 flow works.

Comment 5 Trey West 2021-06-22 17:55:02 UTC
Hi @fpercoco, let's wait to move this to VERIFIED until the Day2 flow works. I know there are other open bugs currently blocking it, for example: https://bugzilla.redhat.com/show_bug.cgi?id=1959869

Once that one and any others blocking the day2 installation are moved to ON_QA we can verify them all at the same time.

Comment 6 Trey West 2021-06-28 13:41:44 UTC
@fpercoco, since this is Day2, can we move the target release to 4.9?

Comment 8 bjacot 2021-08-02 13:58:04 UTC
Moving to 4.9 for the Day 2 flow.

Comment 13 Trey West 2021-10-20 13:00:12 UTC
VERIFIED on 2.4.0-DOWNSTREAM-2021-10-15-19-58-05

Comment 15 errata-xmlrpc 2021-10-26 17:22:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.4 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3935

