Bug 2113926

Summary: hypershift cluster deployment hang due to nil pointer dereference for hostedControlPlane.Spec.Etcd.Managed
Product: OpenShift Container Platform
Reporter: Shelly Miron <smiron>
Component: HyperShift
Assignee: Alberto <agarcial>
Status: CLOSED ERRATA
QA Contact: Shelly Miron <smiron>
Severity: high
Priority: unspecified
Version: 4.11
CC: aaleman, cewong, mifiedle, sjenning
Target Milestone: ---
Keywords: TestBlocker
Target Release: 4.11.z
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2022-08-23 15:09:42 UTC
Type: Bug
Attachments:
  Description: hypershift deployment hang
  Flags: none

Description Shelly Miron 2022-08-02 10:47:38 UTC
Created attachment 1902833
hypershift deployment hang

Description of problem:
-----------------------------------------
The HyperShift cluster deployment hangs due to a nil pointer dereference on hostedControlPlane.Spec.Etcd.Managed.

Looking at the control plane logs, we see the following error:

{"level":"info","ts":"2022-08-02T09:24:30Z","msg":"Reconciling etcd cluster status for managed strategy","controller":"hostedcontrolplane","controllerGroup":"hypershift.openshift.io","controllerKind":"HostedControlPlane","hostedControlPlane":{"name":"hypercluster","namespace":"hypercluster-hypercluster"},"namespace":"hypercluster-hypercluster","name":"hypercluster","reconcileID":"1ff13b29-1a7e-48fa-8269-c13c63e7a7f3"}
{"level":"info","ts":"2022-08-02T09:24:30Z","msg":"Observed a panic in reconciler: runtime error: invalid memory address or nil pointer


The HCP (HostedControlPlane) spec shows Etcd with Management Type set to Managed and nothing else under it:

Spec:
  Autoscaling:
  Cluster ID:                      76e4928a-e9a4-41b4-b713-1c2e5a4e17fc
  Controller Availability Policy:  SingleReplica
  Dns:
    Base Domain:  qe.lab.redhat.com
  Etcd:
    Management Type:                   Managed
  Fips:                                false
  Infra ID:                            hypercluster
  Infrastructure Availability Policy:  SingleReplica
  Issuer URL:                          https://kubernetes.default.svc
  Machine CIDR:                        192.168.128.0/24
  Network Type:                        OVNKubernetes
  Olm Catalog Placement:               management
  Platform:
    Agent:
      Agent Namespace:  infraenv-0
    Type:               None
  Pod CIDR:             10.132.0.0/14
  Pull Secret:
    Name:         pull-secret
  Release Image:  quay.io/openshift-release-dev/ocp-release:4.11.0-rc.6-x86_64
  Service CIDR:   172.31.0.0/16
  Services:
    Service:  APIServer
    Service Publishing Strategy:
      Type:   LoadBalancer
    Service:  OAuthServer
    Service Publishing Strategy:
      Type:   Route
    Service:  OIDC
    Service Publishing Strategy:
      Type:   Route
    Service:  Konnectivity
    Service Publishing Strategy:
      Type:   Route
    Service:  Ignition
    Service Publishing Strategy:
      Type:  Route
  Ssh Key:
    Name:  ssh-key


Version-Release number of selected component (if applicable):
-----------------------------------------

$ oc get clusterversion

NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-rc.6   True        False         13h     Cluster version is 4.11.0-rc.6

How reproducible:
-----------------------------------------
100%


Steps to Reproduce:
-----------------------------------------
1. From the OCP UI console, install the MCE operator and the HyperShift operator.
2. Create an InfraEnv with 3 hosts.
3. Start the HyperShift cluster creation.

Actual results:
-----------------------------------------

The HyperShift cluster deployment hangs due to a nil pointer dereference on hostedControlPlane.Spec.Etcd.Managed.



Expected results:
-----------------------------------------

The HyperShift deployment finishes successfully.



Additional info:
-----------------------------------------
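For illustration only, here is a minimal Go sketch of the failure mode described above: an optional pointer field (Managed) is nil even though the management type is Managed, and reading through it panics unless it is guarded. The types, the etcdStorageType helper, and the default value are simplified, hypothetical stand-ins, not the actual HyperShift API or the actual fix.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, simplified stand-ins for the HostedControlPlane etcd types.
type ManagedEtcdSpec struct {
	StorageType string
}

type EtcdSpec struct {
	ManagementType string
	Managed        *ManagedEtcdSpec // nil when the Managed block is omitted
}

// Assumed default, used only for this sketch.
const defaultStorageType = "PersistentVolume"

// etcdStorageType reads a value from the optional Managed block.
// Accessing etcd.Managed.StorageType without the nil check would panic
// with "invalid memory address or nil pointer dereference".
func etcdStorageType(etcd EtcdSpec) (string, error) {
	if etcd.ManagementType != "Managed" {
		return "", errors.New("etcd is not managed")
	}
	if etcd.Managed == nil {
		// Fall back to a default instead of dereferencing nil.
		return defaultStorageType, nil
	}
	return etcd.Managed.StorageType, nil
}

func main() {
	// Mirrors the spec in this report: Management Type is set, Managed is absent.
	spec := EtcdSpec{ManagementType: "Managed"}
	storage, err := etcdStorageType(spec)
	fmt.Println(storage, err) // prints: PersistentVolume <nil>
}
```

Whether a missing Managed block should be defaulted (as in this sketch) or rejected with an error is a design choice; the sketch only shows that the pointer has to be checked before use.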

Comment 2 Seth Jennings 2022-08-02 13:07:53 UTC
Would be easier to know for sure if the HC yaml was included in the bz, but I think I see where this is happening
https://github.com/openshift/hypershift/blob/964f03f7fc4ba0ed862669529b09235b5c5c30e0/control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go#L277

Will open a PR.

Comment 3 Shelly Miron 2022-08-02 13:09:53 UTC
Thanks! There is already a PR on the way: https://github.com/openshift/hypershift/pull/1638

Comment 4 Alberto 2022-08-03 14:18:19 UTC
The 4.11 backport is here, in case we want it for MCE: https://github.com/openshift/hypershift/pull/1644

Comment 6 Shelly Miron 2022-08-15 08:40:10 UTC
Verified with:

MCE version: 2.1.0-DOWNANDBACK-2022-08-11-23-42-45

$ oc version

Client Version: 4.11.0-rc.7
Kustomize Version: v4.5.4
Server Version: 4.11.0-rc.7
Kubernetes Version: v1.24.0+9546431

Comment 9 errata-xmlrpc 2022-08-23 15:09:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.11.1 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6103