2117577 – Assisted Installer fails to deploy spoke compact cluster

Keywords:
Status:	CLOSED UPSTREAM
Alias:	None
Product:	Red Hat Advanced Cluster Management for Kubernetes
Classification:	Red Hat
Component:	Infrastructure Operator
Sub Component:
Version:	rhacm-2.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Michael Filanov
QA Contact:	Chad Crum
Docs Contact:	Derek
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2022-08-11 11:21 UTC by Gurenko Alex
Modified:	2022-08-17 14:57 UTC
CC List:	4 users
Fixed In Version:
Doc Type:
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-08-11 15:49:06 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	stolostron backlog issues 25051	None	None	None	2022-08-11 11:54:07 UTC
Red Hat Issue Tracker	MGMT-11570	None	None	None	2022-08-11 11:21:15 UTC
Red Hat Issue Tracker	MGMTBUGSM-518	None	None	None	2022-08-11 11:24:45 UTC

Description Gurenko Alex 2022-08-11 11:21:16 UTC

Description of the problem:

As part of the Altiostar deployment, deploying the spoke compact cluster (3 masters only) fails with a timeout. Investigation showed that the hive-operator pod is crashing and the nodes are reported as unreachable. If the agents are manually restarted and the hive-operator pod is re-created, a new deployment attempt starts and runs until hive-operator crashes again.

Release version:

Operator snapshot version: 2.4.6

OCP version: 4.8.43

Browser Info:

Steps to reproduce:
1. Start spoke compact cluster deployment
2. Wait for agents to register

Actual results:

The deployment does not proceed and reports:

The cluster has hosts that are not ready to install.

hive-operator is crashing with the following in its log:

time="2022-08-10T11:29:30Z" level=info msg="reconcile complete" controller=hive elapsedMillis=620 elapsedMillisGT=0 outcome=unspecified
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x123e5fc]

NAME                                READY   STATUS             RESTARTS   AGE
assisted-service-56d7794d8c-kgslh   2/2     Running            14         13h
hive-operator-66dd64b6b7-qfkbk      0/1     CrashLoopBackOff   108        13h
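For anyone triaging similar listings, here is a minimal sketch of flagging crash-looping pods from this kind of output. It assumes the five whitespace-separated columns shown in this report (NAME READY STATUS RESTARTS AGE). The names podRow, parsePodListing and crashLooping are hypothetical helpers, not part of any Kubernetes or Hive tooling.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// podRow holds the columns of one row of a pod listing (hypothetical type).
type podRow struct {
	Name     string
	Ready    string
	Status   string
	Restarts int
}

// parsePodListing reads whitespace-separated rows (NAME READY STATUS RESTARTS AGE).
// It skips the header, blank lines, and rows whose RESTARTS field is not a number.
func parsePodListing(listing string) []podRow {
	var rows []podRow
	for _, line := range strings.Split(listing, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 5 || fields[0] == "NAME" {
			continue
		}
		restarts, err := strconv.Atoi(fields[3])
		if err != nil {
			continue
		}
		rows = append(rows, podRow{
			Name:     fields[0],
			Ready:    fields[1],
			Status:   fields[2],
			Restarts: restarts,
		})
	}
	return rows
}

// crashLooping returns the names of pods whose STATUS is CrashLoopBackOff.
func crashLooping(rows []podRow) []string {
	var names []string
	for _, r := range rows {
		if r.Status == "CrashLoopBackOff" {
			names = append(names, r.Name)
		}
	}
	return names
}

func main() {
	// Listing copied from this report.
	listing := `NAME READY STATUS RESTARTS AGE
assisted-service-56d7794d8c-kgslh 2/2 Running 14 13h
hive-operator-66dd64b6b7-qfkbk 0/1 CrashLoopBackOff 108 13h`

	for _, name := range crashLooping(parsePodListing(listing)) {
		fmt.Println("crash-looping:", name)
	}
}
```

Run against the listing in this report, the sketch flags only the hive-operator pod.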

Expected results:

Deployment continues

Additional info:

With the previous OCP 4.8.34 I saw this issue once or twice, and a re-deployment solved it. With 4.8.43 I am now completely stuck: it reproduced in 3 out of 3 complete redeployments.

Comment 1 Michael Filanov 2022-08-11 12:30:41 UTC

Can you please attach the cluster deployment and agent cluster install resources?

Comment 3 Eric Fried 2022-08-11 15:04:53 UTC

This seems to be caused by building with github.com/modern-go/reflect2 < v1.0.2 under go1.18. That's why this just showed up despite there being no code changes in hive's ocm-2.4 branch in six months: ACM's build recently switched to using go1.18.

Hive is upgrading the dependency via https://issues.redhat.com/browse/HIVE-1997

After that, ACM will need to pick up the change and respin its build.

This will need to be done for 2.3 as well.
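To illustrate the condition described in this comment, here is a rough sketch of how a binary could check itself for that combination using the standard library's runtime/debug build info. It is not what Hive or ACM does. The helpers parseVersion, less and affected are hypothetical, and treating every toolchain at or above go1.18 as affected is an assumption extrapolated from "under go1.18". Module replace directives are ignored, and unparsable toolchain strings (such as devel builds) are treated as version 0.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
	"strconv"
	"strings"
)

// parseVersion extracts up to three leading numeric components from strings
// such as "v1.0.1", "go1.18" or "go1.18.3". Missing or unparsable components
// are treated as 0 (an assumption made for this sketch).
func parseVersion(v string) [3]int {
	v = strings.TrimPrefix(v, "go")
	v = strings.TrimPrefix(v, "v")
	var out [3]int
	for i, part := range strings.SplitN(v, ".", 3) {
		end := 0
		for end < len(part) && part[end] >= '0' && part[end] <= '9' {
			end++
		}
		n, _ := strconv.Atoi(part[:end])
		out[i] = n
		if end < len(part) {
			break // stop at suffixes such as "rc1" or "-beta"
		}
	}
	return out
}

// less reports whether version a sorts strictly before version b.
func less(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// affected reports whether the toolchain is go1.18 or newer while
// reflect2 is older than v1.0.2, the combination named in this comment.
func affected(goVersion, reflect2Version string) bool {
	return !less(parseVersion(goVersion), [3]int{1, 18, 0}) &&
		less(parseVersion(reflect2Version), [3]int{1, 0, 2})
}

func main() {
	goVersion := runtime.Version()
	info, ok := debug.ReadBuildInfo()
	if !ok {
		fmt.Println("no build info embedded in this binary")
		return
	}
	for _, dep := range info.Deps {
		if dep.Path != "github.com/modern-go/reflect2" {
			continue
		}
		fmt.Printf("%s %s built with %s, affected=%v\n",
			dep.Path, dep.Version, goVersion, affected(goVersion, dep.Version))
		return
	}
	fmt.Println("reflect2 is not linked into this binary")
}
```

Built on its own, this program does not depend on reflect2, so it only reports that the module is not linked. The check is meaningful when the same logic is compiled into a binary that pulls in the dependency.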

Comment 4 Eric Fried 2022-08-16 15:16:48 UTC

Hive side is all done here.

Comment 6 Michael Filanov 2022-08-17 14:57:31 UTC

Yes, this one is irrelevant; please follow up in the Jira ticket.
