Bug 2113926 - hypershift cluster deployment hang due to nil pointer dereference for hostedControlPlane.Spec.Etcd.Managed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: HyperShift
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.11.z
Assignee: Alberto
QA Contact: Shelly Miron
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-08-02 10:47 UTC by Shelly Miron
Modified: 2022-08-23 15:10 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-23 15:09:42 UTC
Target Upstream Version:
Embargoed:


Attachments
hypershift deployment hang (89.28 KB, image/png)
2022-08-02 10:47 UTC, Shelly Miron


Links
GitHub openshift/hypershift pull 1638 (open): Fix nil pointer dereference for hostedControlPlane.Spec.Etcd.Managed (last updated 2022-08-02 13:15:01 UTC)
Red Hat Product Errata RHSA-2022:6103 (last updated 2022-08-23 15:10:24 UTC)

Description Shelly Miron 2022-08-02 10:47:38 UTC
Created attachment 1902833
hypershift deployment hang

Description of problem:
-----------------------------------------
The HyperShift cluster deployment hangs due to a nil pointer dereference on hostedControlPlane.Spec.Etcd.Managed.

Looking at the control plane logs, we see the following error:

{"level":"info","ts":"2022-08-02T09:24:30Z","msg":"Reconciling etcd cluster status for managed strategy","controller":"hostedcontrolplane","controllerGroup":"hypershift.openshift.io","controllerKind":"HostedControlPlane","hostedControlPlane":{"name":"hypercluster","namespace":"hypercluster-hypercluster"},"namespace":"hypercluster-hypercluster","name":"hypercluster","reconcileID":"1ff13b29-1a7e-48fa-8269-c13c63e7a7f3"}
{"level":"info","ts":"2022-08-02T09:24:30Z","msg":"Observed a panic in reconciler: runtime error: invalid memory address or nil pointer


And the HostedControlPlane (hcp) spec shows:

Spec:
  Autoscaling:
  Cluster ID:                      76e4928a-e9a4-41b4-b713-1c2e5a4e17fc
  Controller Availability Policy:  SingleReplica
  Dns:
    Base Domain:  qe.lab.redhat.com
  Etcd:
    Management Type:                   Managed
  Fips:                                false
  Infra ID:                            hypercluster
  Infrastructure Availability Policy:  SingleReplica
  Issuer URL:                          https://kubernetes.default.svc
  Machine CIDR:                        192.168.128.0/24
  Network Type:                        OVNKubernetes
  Olm Catalog Placement:               management
  Platform:
    Agent:
      Agent Namespace:  infraenv-0
    Type:               None
  Pod CIDR:             10.132.0.0/14
  Pull Secret:
    Name:         pull-secret
  Release Image:  quay.io/openshift-release-dev/ocp-release:4.11.0-rc.6-x86_64
  Service CIDR:   172.31.0.0/16
  Services:
    Service:  APIServer
    Service Publishing Strategy:
      Type:   LoadBalancer
    Service:  OAuthServer
    Service Publishing Strategy:
      Type:   Route
    Service:  OIDC
    Service Publishing Strategy:
      Type:   Route
    Service:  Konnectivity
    Service Publishing Strategy:
      Type:   Route
    Service:  Ignition
    Service Publishing Strategy:
      Type:  Route
  Ssh Key:
    Name:  ssh-key
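In the spec above, Etcd has "Management Type: Managed" but no "Managed:" sub-block, so Spec.Etcd.Managed is presumably a nil pointer. The Go sketch below illustrates how reading any field through that pointer produces exactly the runtime error seen in the log. It assumes simplified, hypothetical types (EtcdSpec, ManagedEtcdSpec, storageTypeUnchecked, tryReconcile) that only mirror the shape of the spec; they are not copied from the HyperShift API or controller.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the HostedControlPlane etcd types.
// Field names mirror the spec dump, not the real HyperShift source.
type EtcdManagementType string

const (
	EtcdManaged   EtcdManagementType = "Managed"
	EtcdUnmanaged EtcdManagementType = "Unmanaged"
)

type ManagedEtcdStorageSpec struct {
	Type string
}

type ManagedEtcdSpec struct {
	Storage ManagedEtcdStorageSpec
}

type EtcdSpec struct {
	ManagementType EtcdManagementType
	Managed        *ManagedEtcdSpec // stays nil when the "Managed:" block is absent
}

// storageTypeUnchecked reads through Managed without a nil check,
// which panics when Managed is nil.
func storageTypeUnchecked(etcd EtcdSpec) string {
	return etcd.Managed.Storage.Type
}

// tryReconcile runs fn and turns a panic into an error, roughly the way a
// reconciler framework reports "Observed a panic in reconciler".
func tryReconcile(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("observed a panic: %v", r)
		}
	}()
	fn()
	return nil
}

func main() {
	// Same shape as the reported spec: management type set, Managed block missing.
	etcd := EtcdSpec{ManagementType: EtcdManaged}
	err := tryReconcile(func() { _ = storageTypeUnchecked(etcd) })
	fmt.Println(err)
}
```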


Version-Release number of selected component (if applicable):
-----------------------------------------

$ oc get clusterversion

NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-rc.6   True        False         13h     Cluster version is 4.11.0-rc.6

How reproducible:
-----------------------------------------
100%


Steps to Reproduce:
-----------------------------------------
1. From the OCP web console, install the MCE operator and the HyperShift operator.
2. Create an InfraEnv with 3 hosts.
3. Start the HyperShift cluster creation.

Actual results:
-----------------------------------------

The HyperShift cluster deployment hangs due to a nil pointer dereference on hostedControlPlane.Spec.Etcd.Managed.



Expected results:
-----------------------------------------

The HyperShift deployment finishes successfully.



Additional info:
-----------------------------------------

Comment 2 Seth Jennings 2022-08-02 13:07:53 UTC
It would be easier to know for sure if the HC (HostedCluster) YAML were included in the BZ, but I think I see where this is happening:
https://github.com/openshift/hypershift/blob/964f03f7fc4ba0ed862669529b09235b5c5c30e0/control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go#L277

Will open a PR.

Comment 3 Shelly Miron 2022-08-02 13:09:53 UTC
Thanks! There is already a PR on the way: https://github.com/openshift/hypershift/pull/1638

Comment 4 Alberto 2022-08-03 14:18:19 UTC
The 4.11 backport is here, in case we want it for MCE: https://github.com/openshift/hypershift/pull/1644

Comment 6 Shelly Miron 2022-08-15 08:40:10 UTC
Verified with:

MCE version: 2.1.0-DOWNANDBACK-2022-08-11-23-42-45

$ oc version

Client Version: 4.11.0-rc.7
Kustomize Version: v4.5.4
Server Version: 4.11.0-rc.7
Kubernetes Version: v1.24.0+9546431

Comment 9 errata-xmlrpc 2022-08-23 15:09:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.11.1 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6103

