Description of the problem:
As part of an Altiostar deployment, deploying the spoke compact cluster (3 masters only) fails (times out). Investigation showed that the hive-operator pod is crashing and the nodes are reported as unreachable. If the agents are manually restarted and the hive-operator pod is re-created, there is a new attempt to deploy the cluster until hive-operator crashes again.

Release version:

Operator snapshot version: 2.4.6

OCP version: 4.8.43

Browser Info:

Steps to reproduce:
1. Start spoke compact cluster deployment
2. Wait for agents to register

Actual results:
Deployment does not proceed, with the message "The cluster has hosts that are not ready to install." hive-operator is crashing with the following in its log:

time="2022-08-10T11:29:30Z" level=info msg="reconcile complete" controller=hive elapsedMillis=620 elapsedMillisGT=0 outcome=unspecified
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x123e5fc]

NAME                                READY   STATUS             RESTARTS   AGE
assisted-service-56d7794d8c-kgslh   2/2     Running            14         13h
hive-operator-66dd64b6b7-qfkbk      0/1     CrashLoopBackOff   108        13h

Expected results:
Deployment continues

Additional info:
With the previous OCP 4.8.34 I saw this issue once or twice, and re-deployment solved it. With 4.8.43 I am now stuck, as it reproduced in 3 out of 3 complete redeployments.
Can you please attach the ClusterDeployment and AgentClusterInstall resources?
This seems to be caused by building with github.com/modern-go/reflect2 < v1.0.2 under go1.18. That's why this just showed up despite no code changes in hive's ocm-2.4 branch in six months: ACM's build recently switched to using go1.18. Hive is upgrading the dependency via https://issues.redhat.com/browse/HIVE-1997. After that, ACM will need to pick up the change and respin its build. This will need to be done for 2.3 as well.
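For anyone who wants to check another branch for the same exposure, here is a minimal sketch that scans the text of a go.mod file for the reflect2 requirement and flags versions below v1.0.2. The function names are hypothetical and not part of hive or ACM tooling. It assumes the dependency is listed in go.mod in the usual `module vX.Y.Z` form, and it does not inspect go.sum, vendored copies, or replace directives.

```python
import re

# Module path and minimum version are taken from the comment above:
# reflect2 below v1.0.2 is reported to panic when built under go1.18.
MODULE = "github.com/modern-go/reflect2"
MIN_SAFE = (1, 0, 2)

# Matches both single-line "require <module> vX.Y.Z" and entries
# inside a "require ( ... )" block.
_REQUIRE_RE = re.compile(
    r"^\s*(?:require\s+)?" + re.escape(MODULE) + r"\s+v(\d+)\.(\d+)\.(\d+)",
    re.MULTILINE,
)


def reflect2_version(go_mod_text):
    """Return the required reflect2 version as a tuple, or None if absent."""
    match = _REQUIRE_RE.search(go_mod_text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def needs_reflect2_bump(go_mod_text):
    """True when go.mod pins reflect2 below the version named in the thread."""
    version = reflect2_version(go_mod_text)
    return version is not None and version < MIN_SAFE
```

A typical use would be to read go.mod from the branch in question (for example ocm-2.4 or the 2.3 branch) and call `needs_reflect2_bump` on its contents before respinning a build with go1.18.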
Hive side is all done here.
Yes, this one is irrelevant; please follow up on the Jira ticket.