Bug 2096460 - Spoke BMH stuck "inspecting" when deployed via the converged workflow
Summary: Spoke BMH stuck "inspecting" when deployed via the converged workflow
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Advanced Cluster Management for Kubernetes
Classification: Red Hat
Component: Infrastructure Operator
Version: rhacm-2.6
Hardware: x86_64
OS: Linux
high
high
Target Milestone: ---
: rhacm-2.6
Assignee: Eran Cohen
QA Contact: Chad Crum
Derek
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2022-06-13 22:05 UTC by tali@redhat.com
Modified: 2022-09-06 22:31 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-09-06 22:31:08 UTC
Target Upstream Version:
Embargoed:
cbynum: rhacm-2.6+
cbynum: rhacm-2.6.z+


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github openshift assisted-service pull 3938 0 None Merged BZ-2096460: Spoke BMH stuck inspecting when deployed via the converged workflow 2022-06-30 05:40:39 UTC
Github stolostron backlog issues 23453 0 None None None 2022-06-14 18:43:08 UTC
Red Hat Issue Tracker MGMTBUGSM-434 0 None None None 2022-06-14 12:34:08 UTC

Description tali@redhat.com 2022-06-13 22:05:51 UTC
Description of problem:
Unable to deploy a spoke cluster via the converged workflow on an OCP 4.11 hub with the upstream assisted-service. The BMHs are stuck in “inspecting” state and eventually Introspection timeout.

oc get bmh -A
NAMESPACE               NAME                                 STATE                    CONSUMER                       ONLINE   ERROR              AGE
cnfde11                 cnfde11.ptp.lab.eng.bos.redhat.com   inspecting                                              true     inspection error   130m
cnfde14                 cnfde14.ptp.lab.eng.bos.redhat.com   inspecting                                              true     inspection error   4h1m
openshift-machine-api   cnfdt08-master-0                     externally provisioned   cnfdt08-2qtb2-master-0         true                        45h
openshift-machine-api   cnfdt08-master-1                     externally provisioned   cnfdt08-2qtb2-master-1         true                        45h
openshift-machine-api   cnfdt08-master-2                     externally provisioned   cnfdt08-2qtb2-master-2         true                        45h
openshift-machine-api   cnfdt08-worker-0                     provisioned              cnfdt08-2qtb2-worker-0-f65kw   true                        45h
openshift-machine-api   cnfdt08-worker-1                     provisioned              cnfdt08-2qtb2-worker-0-krpw6   true                        45h

The ironic-agent.service on the spoke terminated without any logs.
journalctl | grep agent 
Jun 13 18:01:54 localhost ignition[1677]: INFO     : files: createFilesystemsFiles: createFiles: op(4): [started]  writing file "/sysroot/etc/ironic-python-agent.conf"
Jun 13 18:01:54 localhost ignition[1677]: INFO     : files: createFilesystemsFiles: createFiles: op(4): [finished] writing file "/sysroot/etc/ironic-python-agent.conf"
Jun 13 18:01:54 localhost ignition[1677]: INFO     : files: createFilesystemsFiles: createFiles: op(a): [started]  writing file "/sysroot/var/usrlocal/bin/agent-fix-bz1964591"
Jun 13 18:01:54 localhost ignition[1677]: INFO     : files: createFilesystemsFiles: createFiles: op(a): [finished] writing file "/sysroot/var/usrlocal/bin/agent-fix-bz1964591"
Jun 13 18:01:55 localhost ignition[1677]: INFO     : files: op(b): [started]  processing unit "agent.service"
Jun 13 18:01:55 localhost ignition[1677]: INFO     : files: op(b): op(c): [started]  writing unit "agent.service" at "/sysroot/etc/systemd/system/agent.service"
Jun 13 18:01:55 localhost ignition[1677]: INFO     : files: op(b): op(c): [finished] writing unit "agent.service" at "/sysroot/etc/systemd/system/agent.service"
Jun 13 18:01:55 localhost ignition[1677]: INFO     : files: op(b): [finished] processing unit "agent.service"
Jun 13 18:01:55 localhost ignition[1677]: INFO     : files: op(f): [started]  processing unit "ironic-agent.service"
Jun 13 18:01:55 localhost ignition[1677]: INFO     : files: op(f): op(10): [started]  writing unit "ironic-agent.service" at "/sysroot/etc/systemd/system/ironic-agent.service"
Jun 13 18:01:55 localhost ignition[1677]: INFO     : files: op(f): op(10): [finished] writing unit "ironic-agent.service" at "/sysroot/etc/systemd/system/ironic-agent.service"
Jun 13 18:01:55 localhost ignition[1677]: INFO     : files: op(f): [finished] processing unit "ironic-agent.service"
Jun 13 18:01:55 localhost ignition[1677]: INFO     : files: op(11): [started]  setting preset to disabled for "agent.service"
Jun 13 18:01:55 localhost ignition[1677]: INFO     : files: op(11): [finished] setting preset to disabled for "agent.service"
Jun 13 18:01:55 localhost ignition[1677]: INFO     : files: op(13): [started]  setting preset to enabled for "ironic-agent.service"
Jun 13 18:01:55 localhost ignition[1677]: INFO     : files: op(13): [finished] setting preset to enabled for "ironic-agent.service"
Jun 13 18:02:01 localhost ignition[1677]: INFO     : files: createFilesystemsFiles: createFiles: op(4): [started]  writing file "/sysroot/etc/ironic-python-agent.conf"
Jun 13 18:02:01 localhost ignition[1677]: INFO     : files: createFilesystemsFiles: createFiles: op(4): [finished] writing file "/sysroot/etc/ironic-python-agent.conf"
Jun 13 18:02:01 localhost ignition[1677]: INFO     : files: createFilesystemsFiles: createFiles: op(a): [started]  writing file "/sysroot/var/usrlocal/bin/agent-fix-bz1964591"
Jun 13 18:02:01 localhost ignition[1677]: INFO     : files: createFilesystemsFiles: createFiles: op(a): [finished] writing file "/sysroot/var/usrlocal/bin/agent-fix-bz1964591"
Jun 13 18:02:01 localhost ignition[1677]: INFO     : files: op(b): [started]  processing unit "agent.service"
Jun 13 18:02:01 localhost ignition[1677]: INFO     : files: op(b): op(c): [started]  writing unit "agent.service" at "/sysroot/etc/systemd/system/agent.service"
Jun 13 18:02:01 localhost ignition[1677]: INFO     : files: op(b): op(c): [finished] writing unit "agent.service" at "/sysroot/etc/systemd/system/agent.service"
Jun 13 18:02:01 localhost ignition[1677]: INFO     : files: op(b): [finished] processing unit "agent.service"
Jun 13 18:02:01 localhost ignition[1677]: INFO     : files: op(f): [started]  processing unit "ironic-agent.service"
Jun 13 18:02:01 localhost ignition[1677]: INFO     : files: op(f): op(10): [started]  writing unit "ironic-agent.service" at "/sysroot/etc/systemd/system/ironic-agent.service"
Jun 13 18:02:01 localhost ignition[1677]: INFO     : files: op(f): op(10): [finished] writing unit "ironic-agent.service" at "/sysroot/etc/systemd/system/ironic-agent.service"
Jun 13 18:02:01 localhost ignition[1677]: INFO     : files: op(f): [finished] processing unit "ironic-agent.service"
Jun 13 18:02:01 localhost ignition[1677]: INFO     : files: op(11): [started]  setting preset to disabled for "agent.service"
Jun 13 18:02:01 localhost ignition[1677]: INFO     : files: op(11): [finished] setting preset to disabled for "agent.service"
Jun 13 18:02:01 localhost ignition[1677]: INFO     : files: op(13): [started]  setting preset to enabled for "ironic-agent.service"
Jun 13 18:02:01 localhost ignition[1677]: INFO     : files: op(13): [finished] setting preset to enabled for "ironic-agent.service"
Jun 13 18:03:26 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: ironic-agent.service: Succeeded.
Jun 13 19:43:47 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[2871]: Listening on GnuPG cryptographic agent and passphrase cache (access for web browsers).
Jun 13 19:43:47 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[2871]: Listening on GnuPG cryptographic agent (ssh-agent emulation).
Jun 13 19:43:47 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[2871]: Listening on GnuPG cryptographic agent and passphrase cache.
Jun 13 19:43:47 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[2871]: Listening on GnuPG cryptographic agent and passphrase cache (restricted).

journalctl -u ironic-agent.service
-- Logs begin at Mon 2022-06-13 18:01:00 UTC, end at Mon 2022-06-13 21:57:20 UTC. --
Jun 13 18:03:18 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: Starting Ironic Agent...
Jun 13 18:03:18 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2519]: Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1a1913094671837099e85eec4392a51bfa5fece675f479ba13e83e63d14013d3...
Jun 13 18:03:19 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2519]: Getting image source signatures
Jun 13 18:03:19 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2519]: Copying blob sha256:f70d60810c69edad990aaf0977a87c6d2bcc9cd52904fa6825f08507a9b6e7bc
Jun 13 18:03:19 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2519]: Copying blob sha256:545277d800059b32cf03377a9301094e9ac8aa4bb42d809766d7355ca9aa8652
Jun 13 18:03:19 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2519]: Copying blob sha256:de516cc59493b713e0b33a4954f7eb500383e59642e2897d02e63992d4576720
Jun 13 18:03:19 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2519]: Copying blob sha256:7d61a723adecf7221e001b9b245c2e08616f071d92ebaab84c2a13fab58108d3
Jun 13 18:03:21 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2519]: Copying config sha256:ff8a671fa9ccb6b31e1b020d43ee88e6a72aef9f64c6b1ce31a1b5d3449d1571
Jun 13 18:03:21 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2519]: Writing manifest to image destination
Jun 13 18:03:21 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2519]: Storing signatures
Jun 13 18:03:25 api.cnfde14.ptp.lab.eng.bos.redhat.com podman[2519]: ff8a671fa9ccb6b31e1b020d43ee88e6a72aef9f64c6b1ce31a1b5d3449d1571
Jun 13 18:03:25 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: Started Ironic Agent.
Jun 13 18:03:26 api.cnfde14.ptp.lab.eng.bos.redhat.com systemd[1]: ironic-agent.service: Succeeded.

sudo podman ps --all
CONTAINER ID  IMAGE                                                                                                                   COMMAND     CREATED      STATUS                  PORTS       NAMES
20fa60228d1f  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1a1913094671837099e85eec4392a51bfa5fece675f479ba13e83e63d14013d3  /bin/bash   4 hours ago  Exited (0) 4 hours ago              ironic-agent

sudo podman images  
REPOSITORY                                      TAG         IMAGE ID      CREATED      SIZE
quay.io/openshift-release-dev/ocp-v4.0-art-dev  <none>      ff8a671fa9cc  4 weeks ago  544 MB


Version-Release number of selected component (if applicable):
- Latest upstream assisted-service-operator (with converged flow enabled)
- OCP 4.11 on hub (4.11.0-0.nightly-2022-06-06-201913)
- OCP 4.10 spokes


How reproducible:
100%

Steps to Reproduce:
1. Deploy a 4.11 hub cluster with 3 masters and 2 workers
2. Deploy the assisted service operator bundle with converged flow enabled
3. Try to deploy spoke using ztp crd flow with infrastructure operator

Actual results:
The BMHs stuck in "inspecting" state

Expected results:
The spoke clusters are deployed as expected.

Additional info:

Comment 1 tali@redhat.com 2022-06-29 21:59:13 UTC
I re-tested the ZTP deploy with the converged flow and this issue has been resolved in the latest assisted service operator.

Comment 2 Chad Crum 2022-07-19 17:14:12 UTC
This was validated by QE as well - the normal ztp flow with converged enable works.

Comment 5 errata-xmlrpc 2022-09-06 22:31:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Advanced Cluster Management 2.6.0 security updates and bug fixes), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6370


Note You need to log in before you can comment on or make changes to this bug.