Bug 2096445

Summary: Assisted service POD keeps crashing after a bare metal host is created
Product: Red Hat Advanced Cluster Management for Kubernetes
Reporter: Eran Cohen <ercohen>
Component: Infrastructure Operator
Assignee: Eran Cohen <ercohen>
Status: CLOSED ERRATA
QA Contact: Chad Crum <ccrum>
Severity: high
Docs Contact: Derek <dcadzow>
Priority: unspecified
Version: rhacm-2.6
CC: cbynum, ccrum, trwest, yfirst
Target Milestone: ---
Flags: cbynum: rhacm-2.6+, cbynum: rhacm-2.6.z+
Target Release: rhacm-2.6
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-09-06 22:30:54 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Eran Cohen 2022-06-13 20:41:47 UTC
Description of the problem:
Unable to deploy a spoke cluster on an OCP 4.11 hub: the spoke BMHs are stuck in the inspecting state and the assisted-service pod is in CrashLoopBackOff.


ernal/controller/controllers/preprovisioningimage_controller.go:78" go-id=733 preprovisioning_image=ostest-extraworker-0 preprovisioning_image_namespace=openshift-machine-api request_id=0921967b-ab58-4c39-a618-95fb8ba57d02
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x273619f]

goroutine 733 [running]:
github.com/openshift/assisted-service/internal/controller/controllers.(*PreprovisioningImageReconciler).AddIronicAgentToInfraEnv(0xc001d08900, {0x36c75d0, 0xc0006b5c50}, {0x375b410, 0xc0011c9b90}, 0xc001168b40)
	/go/src/github.com/openshift/origin/internal/controller/controllers/preprovisioningimage_controller.go:292 +0x17f
github.com/openshift/assisted-service/internal/controller/controllers.(*PreprovisioningImageReconciler).Reconcile(0xc001d08900, {0x36c75d0, 0xc0006b5c20}, {{{0xc001150f30, 0x2ee5940}, {0xc001150f18, 0x30}}})
	/go/src/github.com/openshift/origin/internal/controller/controllers/preprovisioningimage_controller.go:102 +0xb8b
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0xc00080e210, {0x36c75d0, 0xc0006b5bc0}, {{{0xc001150f30, 0x2ee5940}, {0xc001150f18, 0x415694}}})
	/go/src/github.com/openshift/origin/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:114 +0x26f
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc00080e210, {0x36c7528, 0xc000ca9fc0}, {0x2cc5420, 0xc0008d3ea0})
	/go/src/github.com/openshift/origin/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311 +0x33e
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00080e210, {0x36c7528, 0xc000ca9fc0})
	/go/src/github.com/openshift/origin/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266 +0x205
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
	/go/src/github.com/openshift/origin/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
	/go/src/github.com/openshift/origin/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:223 +0x357
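
The trace above shows a nil pointer dereference inside AddIronicAgentToInfraEnv, and the reproduction steps use an infraEnv without a clusterRef. The report does not say which pointer was nil, so the following is only a sketch of the general failure mode and a guard against it, assuming the unset clusterRef was the value being dereferenced. The types InfraEnv and ClusterReference and the helper clusterKey are simplified, hypothetical stand-ins, not the actual assisted-service code or its fix.

```go
package main

import (
	"errors"
	"fmt"
)

// ClusterReference and InfraEnv are hypothetical, stripped-down stand-ins
// for the real API types; only the fields needed for the example exist.
type ClusterReference struct {
	Name      string
	Namespace string
}

type InfraEnv struct {
	Name       string
	ClusterRef *ClusterReference // nil when the infraEnv is created without a clusterRef
}

var errNoClusterRef = errors.New("infraEnv has no clusterRef")

// clusterKey (hypothetical helper) returns "namespace/name" of the referenced
// cluster. It checks for nil first; dereferencing ClusterRef unconditionally
// would panic with the same "invalid memory address or nil pointer
// dereference" runtime error seen in the trace.
func clusterKey(infraEnv *InfraEnv) (string, error) {
	if infraEnv == nil || infraEnv.ClusterRef == nil {
		return "", errNoClusterRef
	}
	return infraEnv.ClusterRef.Namespace + "/" + infraEnv.ClusterRef.Name, nil
}

func main() {
	// An infraEnv without a clusterRef, as in step 1 of the reproduction.
	unbound := &InfraEnv{Name: "unbound"}
	if _, err := clusterKey(unbound); err != nil {
		fmt.Println("skipping cluster lookup:", err)
	}

	// An infraEnv that does reference a cluster.
	bound := &InfraEnv{
		Name:       "bound",
		ClusterRef: &ClusterReference{Name: "spoke", Namespace: "spoke-ns"},
	}
	if key, err := clusterKey(bound); err == nil {
		fmt.Println("cluster:", key)
	}
}
```

Returning an error from a reconcile path lets the controller log and requeue or skip the request, whereas an unguarded dereference takes down the whole process and leads to the CrashLoopBackOff described here.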

Release version:
- Latest upstream assisted-service-operator
- OCP 4.11 on hub (4.11.0-0.nightly-2022-05-25-193227)

Operator snapshot version:

OCP version:

Browser Info:

Steps to reproduce:
1. Create an infraEnv (without clusterRef)
2. Create BMH

Actual results:

Expected results:

Additional info:

Comment 1 Chad Crum 2022-07-19 17:16:36 UTC
QE is no longer seeing the assisted-service pod crash or BMHs stuck in inspecting with recent builds. The spoke deploys properly end to end with the converged flow enabled.

Comment 4 errata-xmlrpc 2022-09-06 22:30:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Advanced Cluster Management 2.6.0 security updates and bug fixes), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6370