Created attachment 1902833
hypershift deployment hang

Description of problem:
-----------------------------------------
The HyperShift cluster deployment hangs due to a nil pointer dereference of hostedControlPlane.Spec.Etcd.Managed.

Looking into the control-plane logs, we see the following error:

{"level":"info","ts":"2022-08-02T09:24:30Z","msg":"Reconciling etcd cluster status for managed strategy","controller":"hostedcontrolplane","controllerGroup":"hypershift.openshift.io","controllerKind":"HostedControlPlane","hostedControlPlane":{"name":"hypercluster","namespace":"hypercluster-hypercluster"},"namespace":"hypercluster-hypercluster","name":"hypercluster","reconcileID":"1ff13b29-1a7e-48fa-8269-c13c63e7a7f3"}
{"level":"info","ts":"2022-08-02T09:24:30Z","msg":"Observed a panic in reconciler: runtime error: invalid memory address or nil pointer

And looking into the HCP spec:

Spec:
  Autoscaling:
  Cluster ID:  76e4928a-e9a4-41b4-b713-1c2e5a4e17fc
  Controller Availability Policy:  SingleReplica
  Dns:
    Base Domain:  qe.lab.redhat.com
  Etcd:
    Management Type:  Managed
  Fips:  false
  Infra ID:  hypercluster
  Infrastructure Availability Policy:  SingleReplica
  Issuer URL:  https://kubernetes.default.svc
  Machine CIDR:  192.168.128.0/24
  Network Type:  OVNKubernetes
  Olm Catalog Placement:  management
  Platform:
    Agent:
      Agent Namespace:  infraenv-0
    Type:  None
  Pod CIDR:  10.132.0.0/14
  Pull Secret:
    Name:  pull-secret
  Release Image:  quay.io/openshift-release-dev/ocp-release:4.11.0-rc.6-x86_64
  Service CIDR:  172.31.0.0/16
  Services:
    Service:  APIServer
    Service Publishing Strategy:
      Type:  LoadBalancer
    Service:  OAuthServer
    Service Publishing Strategy:
      Type:  Route
    Service:  OIDC
    Service Publishing Strategy:
      Type:  Route
    Service:  Konnectivity
    Service Publishing Strategy:
      Type:  Route
    Service:  Ignition
    Service Publishing Strategy:
      Type:  Route
  Ssh Key:
    Name:  ssh-key

Version-Release number of selected component (if applicable):
-----------------------------------------
$ oc get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-rc.6   True        False         13h     Cluster version is 4.11.0-rc.6

How reproducible:
-----------------------------------------
100%

Steps to Reproduce:
-----------------------------------------
1. From the OCP UI console, install the MCE operator and the HyperShift operator.
2. Create an InfraEnv with 3 hosts.
3. Start the HyperShift cluster creation.

Actual results:
-----------------------------------------
The HyperShift cluster deployment hangs due to a nil pointer dereference of hostedControlPlane.Spec.Etcd.Managed.

Expected results:
-----------------------------------------
The HyperShift deployment finishes successfully.

Additional info:
-----------------------------------------
It would be easier to know for sure if the HostedCluster YAML were included in the BZ, but I think I see where this is happening:
https://github.com/openshift/hypershift/blob/964f03f7fc4ba0ed862669529b09235b5c5c30e0/control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go#L277
I will open a PR.
Thanks! There is already a PR on the way: https://github.com/openshift/hypershift/pull/1638
The 4.11 backport is here, in case we want it for MCE: https://github.com/openshift/hypershift/pull/1644
Verified with:
MCE version: 2.1.0-DOWNANDBACK-2022-08-11-23-42-45

$ oc version
Client Version: 4.11.0-rc.7
Kustomize Version: v4.5.4
Server Version: 4.11.0-rc.7
Kubernetes Version: v1.24.0+9546431
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.11.1 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:6103