Bug 2093503 - Assisted service POD keeps crashing after a bare metal host is created
Summary: Assisted service POD keeps crashing after a bare metal host is created
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Advanced Cluster Management for Kubernetes
Classification: Red Hat
Component: Infrastructure Operator
Version: rhacm-2.6
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: rhacm-2.6
Assignee: Eran Cohen
QA Contact: Chad Crum
Docs Contact: Derek
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-06-03 22:28 UTC by tali@redhat.com
Modified: 2022-09-06 22:31 UTC

Fixed In Version: AI 2.5.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-09-06 22:30:54 UTC
Target Upstream Version:
Embargoed:
cbynum: rhacm-2.6+
cbynum: rhacm-2.6.z+


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift assisted-service pull 3897 | 0 | None | open | Bug 2093503: Assisted service POD keeps crashing after a bare metal host is created | 2022-06-06 13:01:09 UTC
Github | stolostron backlog issues 22976 | 0 | None | None | None | 2022-06-04 02:12:13 UTC
Red Hat Issue Tracker | MGMTBUGSM-413 | 0 | None | None | None | 2022-06-06 08:31:37 UTC
Red Hat Product Errata | RHSA-2022:6370 | 0 | None | None | None | 2022-09-06 22:31:11 UTC

Description tali@redhat.com 2022-06-03 22:28:11 UTC
Description of the problem:
Unable to deploy a spoke cluster on an OCP 4.11 hub: the spoke BMHs are stuck in the inspecting state and the assisted-service pod is in CrashLoopBackOff.

oc get pods -n assisted-installer
NAME                                       READY   STATUS             RESTARTS        AGE
agentinstalladmission-66475fdfc8-ml7xf     1/1     Running            0               27h
agentinstalladmission-66475fdfc8-r4p44     1/1     Running            0               27h
assisted-image-service-0                   1/1     Running            0               27h
assisted-service-7dd9cbffc9-xbdbx          1/2     CrashLoopBackOff   86 (2m7s ago)   7h47m
infrastructure-operator-844bdf9474-r8sn6   1/1     Running            0               27h

time="2022-06-03T14:23:19Z" level=info msg="PreprovisioningImage Reconcile ended" func="github.com/openshift/assisted-service/internal/controller/controllers.(*PreprovisioningImageReconciler).Reconcile.func1" file="/go/src/github.com/openshift/origin/internal/controller/controllers/preprovisioningimage_controller.go:78" go-id=416 preprovisioning_image=cnfde14.ptp.lab.eng.bos.redhat.com preprovisioning_image_namespace=cnfde14 request_id=67bc3092-3df7-4893-929f-982de264365b
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x50 pc=0x27338b8]

goroutine 416 [running]:
github.com/openshift/assisted-service/internal/controller/controllers.(*PreprovisioningImageReconciler).getIronicAgentImage(_, {_, _}, {{{0xc00225e3f0, 0xe}, {0xc0018a8330, 0x24}, {0xc00225e3ea, 0x6}, 0xc001664258, ...}, ...})
	/go/src/github.com/openshift/origin/internal/controller/controllers/preprovisioningimage_controller.go:322 +0x38
github.com/openshift/assisted-service/internal/controller/controllers.(*PreprovisioningImageReconciler).AddIronicAgentToInfraEnv(0xc001fcf3b0, {0x36c4730, 0xc0010860f0}, {0x37583f0, 0xc000822460}, 0xc001896900)
	/go/src/github.com/openshift/origin/internal/controller/controllers/preprovisioningimage_controller.go:290 +0x130
github.com/openshift/assisted-service/internal/controller/controllers.(*PreprovisioningImageReconciler).Reconcile(0xc001fcf3b0, {0x36c4730, 0xc0010860c0}, {{{0xc001328756, 0x2ee31e0}, {0xc001908990, 0x30}}})
	/go/src/github.com/openshift/origin/internal/controller/controllers/preprovisioningimage_controller.go:102 +0xb8b
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0xc00198a9a0, {0x36c4730, 0xc001086030}, {{{0xc001328756, 0x2ee31e0}, {0xc001908990, 0x415694}}})
	/go/src/github.com/openshift/origin/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:114 +0x26f
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc00198a9a0, {0x36c4688, 0xc000c7b540}, {0x2cc2be0, 0xc0018b0120})
	/go/src/github.com/openshift/origin/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311 +0x33e
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00198a9a0, {0x36c4688, 0xc000c7b540})
	/go/src/github.com/openshift/origin/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266 +0x205
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
	/go/src/github.com/openshift/origin/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
	/go/src/github.com/openshift/origin/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:223 +0x357
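
The trace shows a nil pointer dereference inside getIronicAgentImage (preprovisioningimage_controller.go:322). An unrecovered panic in a reconcile goroutine takes down the whole process, which is consistent with the CrashLoopBackOff above. Below is a minimal sketch of the general guard against this class of failure. Every type and name in it (infraEnv, osImage, agentImageFor, the "ironic-agent:" prefix) is hypothetical and is not taken from assisted-service. It also assumes, based only on the "Additional info" section of this report, that the missing value is the OpenShift version; the actual fix in pull 3897 may look different.

```go
package main

import (
	"errors"
	"fmt"
)

// osImage is a hypothetical stand-in for the record that carries the
// version the reconciler needs. It is not the assisted-service type.
type osImage struct {
	OpenshiftVersion string
}

// infraEnv is a hypothetical stand-in. Image stays nil until a version
// has been reconciled onto the InfraEnv (an assumption for this sketch).
type infraEnv struct {
	Name  string
	Image *osImage
}

var errVersionNotReady = errors.New("openshift version not reconciled yet")

// agentImageFor returns an error instead of dereferencing a nil pointer,
// so the caller can requeue the request rather than crash the process.
func agentImageFor(env *infraEnv) (string, error) {
	if env == nil || env.Image == nil {
		return "", errVersionNotReady
	}
	return "ironic-agent:" + env.Image.OpenshiftVersion, nil
}

func main() {
	envs := []*infraEnv{
		{Name: "cnfde14"}, // no version set, mirroring the assumed failing case
		{Name: "ready", Image: &osImage{OpenshiftVersion: "4.10"}},
	}
	for _, env := range envs {
		image, err := agentImageFor(env)
		if err != nil {
			fmt.Printf("%s: requeue: %v\n", env.Name, err)
			continue
		}
		fmt.Printf("%s: %s\n", env.Name, image)
	}
}
```

Returning an error keeps the failure local to one request: a controller can log it and retry later, while the pod stays up and continues serving other resources.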

Release version:
- Latest upstream assisted-service-operator
- OCP 4.11 on hub (4.11.0-0.nightly-2022-05-25-193227)
- 4.10 spoke


Steps to reproduce:
1. Deploy OCP 4.11 hub with upstream assisted-service-operator
2. Try to deploy spoke using manually created CRs


Actual results:
- Assisted-service pod crashed after creating a BMH
- BMH stuck "inspecting"


Expected results:
The spoke is deployed as expected

Additional info:
The OpenshiftVersion has not been reconciled to the InfraEnv:
oc describe InfraEnv -n cnfde14 cnfde14
Name:         cnfde14
Namespace:    cnfde14
Labels:       <none>
Annotations:  argocd.argoproj.io/sync-wave: 1
              ran.openshift.io/ztp-gitops-generated: {}
API Version:  agent-install.openshift.io/v1beta1
Kind:         InfraEnv
Metadata:
  Creation Timestamp:  2022-06-03T14:19:26Z
  Finalizers:
    infraenv.agent-install.openshift.io/ai-deprovision
  Generation:  1
  Managed Fields:
    API Version:  agent-install.openshift.io/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"infraenv.agent-install.openshift.io/ai-deprovision":
    Manager:      assisted-service
    Operation:    Update
    Time:         2022-06-03T14:19:26Z
    API Version:  agent-install.openshift.io/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:agentLabelSelector:
          .:
          f:matchLabels:
            .:
            f:infraenvs.agent-install.openshift.io:
        f:bootArtifacts:
          .:
          f:initrd:
          f:ipxeScript:
          f:kernel:
          f:rootfs:
        f:conditions:
        f:createdTime:
        f:debugInfo:
          .:
          f:eventsURL:
        f:isoDownloadURL:
    Manager:      assisted-service
    Operation:    Update
    Subresource:  status
    Time:         2022-06-03T14:19:26Z
    API Version:  agent-install.openshift.io/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:argocd.argoproj.io/sync-wave:
          f:kubectl.kubernetes.io/last-applied-configuration:
          f:ran.openshift.io/ztp-gitops-generated:
      f:spec:
        .:
        f:additionalNTPSources:
        f:clusterRef:
          .:
          f:name:
          f:namespace:
        f:cpuArchitecture:
        f:nmStateConfigLabelSelector:
          .:
          f:matchLabels:
            .:
            f:nmstate-label:
        f:pullSecretRef:
          .:
          f:name:
        f:sshAuthorizedKey:
    Manager:         kubectl-client-side-apply
    Operation:       Update
    Time:            2022-06-03T14:19:26Z
  Resource Version:  1098847
  UID:               805103d1-9e73-4cc4-9cdc-8191774c81f2
Spec:
  Additional NTP Sources:
    2.pool.ntp.org
  Cluster Ref:
    Name:            cnfde14
    Namespace:       cnfde14
  Cpu Architecture:  x86_64
  Nm State Config Label Selector:
    Match Labels:
      Nmstate - Label:  cnfde14
  Pull Secret Ref:
    Name:              pull-secret
  Ssh Authorized Key:  ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDWWipbX819wFCM75f1J1Gr0IDZm1A5rPxFemibbGVboUwA2La/Msf9Oa9hLSnq1PEpT2VB+KysjCggA4semrXR85VctExW8mBPDgG6uLCnmQ9f/Vtkg0GEfR4mkuCxBVxPV8OVTeU4Lv2kAEcBGJEwStQei0/1u24Y4b9njCe0kYY1rJvn1XNAab1avDMAr1AVrV4jx+ChrZtZsoN/CQxWfeFFYEgyzg00wmq5QKanOcraKNZAg93sRK48QiW21EHCO0iJXiArZonhwSNUoiprqreK1666xRDCCzM9mu8/gQl7XV9za713KkNMeKdGKRR9nGl3BhZshwd4+dPNQboUPGwZXwU9IoecFVSjfBZkhYupdcQXHfWQ58ZqX9i90b4Xq+INqVk4rOIyplf3FWMb/xXnl+mNX3f7T7SPETdVVhlHKCDslmffGC4PDuZF8UIlKahnt5T2WWE4NRzCHsNQyxqVJi4at7/9U+nYa4oeha/QB4s+07N/q+LD1PVLOws= tali@Taos-MacBook-Pro
Status:
  Agent Label Selector:
    Match Labels:
      infraenvs.agent-install.openshift.io:  cnfde14
  Boot Artifacts:
    Initrd:       https://assisted-image-service-assisted-installer.apps.ocp-edge-cluster-assisted-0.qe.lab.redhat.com/images/de05147a-b1c8-488d-9fd2-9b37dbbfea63/pxe-initrd?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbmZyYV9lbnZfaWQiOiJkZTA1MTQ3YS1iMWM4LTQ4OGQtOWZkMi05YjM3ZGJiZmVhNjMifQ.Gyg8-sXdNPbzhZ62C1wmXHrcgZgNdEvkXhGcC_yEYhKeHC13fpNtcUe3D_EhZtcGytnrtlbgYXThdGHWjrAVCA&arch=x86_64&version=4.10
    Ipxe Script:  https://assisted-service-assisted-installer.apps.ocp-edge-cluster-assisted-0.qe.lab.redhat.com/api/assisted-install/v2/infra-envs/de05147a-b1c8-488d-9fd2-9b37dbbfea63/downloads/files?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbmZyYV9lbnZfaWQiOiJkZTA1MTQ3YS1iMWM4LTQ4OGQtOWZkMi05YjM3ZGJiZmVhNjMifQ.Oux_zuzwk66b9BzBEOYHJVE71tRXeB3y6C4s5BylkGg4aO9q6g44rbE9pip4xDgOnuEc9IMj0VhIyWKKsM1GkQ&file_name=ipxe-script
    Kernel:       https://assisted-image-service-assisted-installer.apps.ocp-edge-cluster-assisted-0.qe.lab.redhat.com/boot-artifacts/kernel?arch=x86_64&version=4.10
    Rootfs:       https://assisted-image-service-assisted-installer.apps.ocp-edge-cluster-assisted-0.qe.lab.redhat.com/boot-artifacts/rootfs?arch=x86_64&version=4.10
  Conditions:
    Last Transition Time:  2022-06-03T14:19:26Z
    Message:               Image has been created
    Reason:                ImageCreated
    Status:                True
    Type:                  ImageCreated
  Created Time:            2022-06-03T14:19:26Z
  Debug Info:
    Events URL:      https://assisted-service-assisted-installer.apps.ocp-edge-cluster-assisted-0.qe.lab.redhat.com/api/assisted-install/v2/events?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbmZyYV9lbnZfaWQiOiJkZTA1MTQ3YS1iMWM4LTQ4OGQtOWZkMi05YjM3ZGJiZmVhNjMifQ.fvqqDvw4jMK0htX3JlraPYTdbj0954avjt-Zb5vjKINHBaDa806myO7LvFqW6QBJ2STuBKyorRJx34ELQiDxZw&infra_env_id=de05147a-b1c8-488d-9fd2-9b37dbbfea63
  Iso Download URL:  https://assisted-image-service-assisted-installer.apps.ocp-edge-cluster-assisted-0.qe.lab.redhat.com/images/de05147a-b1c8-488d-9fd2-9b37dbbfea63?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbmZyYV9lbnZfaWQiOiJkZTA1MTQ3YS1iMWM4LTQ4OGQtOWZkMi05YjM3ZGJiZmVhNjMifQ.cIPxpwBT70gJWKXMc-mbiU2kKnny6TmftpMhff7fXW2nafEpoldrt9IZQTflvWNgfhYWoMksmUvnp1EgaoaQqw&arch=x86_64&type=minimal-iso&version=4.10
Events:              <none>

Comment 1 tali@redhat.com 2022-06-08 21:57:56 UTC
I tested the latest upstream assisted-service-operator and this issue has been fixed.

Comment 2 Chad Crum 2022-07-19 17:15:22 UTC
QE is also no longer seeing this with recent builds.

Comment 5 errata-xmlrpc 2022-09-06 22:30:54 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Advanced Cluster Management 2.6.0 security updates and bug fixes), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6370

