Bug 1944246 - Ironic fails to inspect and moves node to "manageable" but 'oc get bmh' remains in "inspecting"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Bare Metal Hardware Provisioning
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.8.0
Assignee: Zane Bitter
QA Contact: Raviv Bar-Tal
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-03-29 15:21 UTC by Raviv Bar-Tal
Modified: 2021-07-27 22:56 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 22:56:24 UTC
Target Upstream Version:
Embargoed:


Attachments
conductor.log (17.16 MB, text/plain)
2021-03-29 15:21 UTC, Raviv Bar-Tal


Links
System ID Private Priority Status Summary Last Updated
Github metal3-io baremetal-operator pull 840 0 None closed Don't immediately retry on Inspect fail 2021-05-11 15:48:26 UTC
Github openshift baremetal-operator pull 142 0 None closed Merge upstream 2021-04-06 2021-05-11 15:47:47 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 22:56:50 UTC

Description Raviv Bar-Tal 2021-03-29 15:21:38 UTC
Created attachment 1767392 [details]
conductor.log

Description of problem:
In this case I tried to provision a node using redfish-virtualmedia,
but the node failed to mount the boot ISO.
The Ironic conductor moved the node from "inspecting" to "inspect failed", with target state "manageable" (see the log for the full error message):
"... moved to provision state "inspect failed" from state "inspecting"; target provision state is "manageable":"

But the 'oc get bmh' command shows the node's status as "inspecting",
and this status does not change.

NAME                         STATE                    CONSUMER                         ONLINE   ERROR
openshift-dworker-titan112   inspecting                                                true     

Version-Release number of selected component (if applicable):
4.8.0-0.nightly-2021-03-15-105049

How reproducible:


Steps to Reproduce:
1. Try to add a node with redfish-virtualmedia that cannot mount the ISO.
2. (In our lab, try to add seal or titan.)

Actual results:


Expected results:
The BMH changes status to manageable or error and shows an error message.

Additional info:
The Ironic conductor log is attached.

Comment 1 Zane Bitter 2021-03-30 16:14:43 UTC
The provisioning state is as expected, but we should be reporting an error message and are not.

I noticed a bug here: https://github.com/metal3-io/baremetal-operator/pull/817#issuecomment-809824437 that means we are not reporting errors correctly in introspection. I need to check the logs, but there's a good chance this could be the cause.

Comment 2 Zane Bitter 2021-03-30 18:29:09 UTC
Without the baremetal-operator log it's impossible to say what's going on. However, the issue mentioned in my previous comment would have caused the baremetal-operator to begin another attempt at inspection immediately after detecting failure. While the error would be reported, it would also be cleared again once the retry was detected to be in progress. That's not inconsistent with what we see in the conductor log, so the linked fix should cause the reported error to remain around for longer as the retries get further apart.
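To illustrate why spacing out the retries keeps the error visible for longer, here is a minimal sketch of an exponential backoff keyed on the number of consecutive failures. It is not the baremetal-operator's code: the function name `retry_delay`, the `error_count` argument, and the 10-second base and 1-hour cap are all assumptions chosen for illustration.

```python
def retry_delay(error_count: int,
                base_seconds: float = 10.0,
                max_seconds: float = 3600.0) -> float:
    """Return how long to wait before the next inspection attempt.

    error_count is a hypothetical counter of consecutive failed attempts.
    The base and cap values are illustrative, not taken from the operator.
    """
    if error_count <= 0:
        # No failure recorded yet, so nothing to wait for.
        return 0.0
    # Double the delay for each additional failure, up to the cap.
    return min(base_seconds * 2 ** (error_count - 1), max_seconds)


if __name__ == "__main__":
    for failures in range(0, 6):
        print(failures, retry_delay(failures))
```

With an immediate retry (a delay of zero after every failure), the host goes straight back into inspection and the error is cleared almost as soon as it is set. With a growing delay, the error stays on the host for the whole wait before the next attempt.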

According to the logs, the error is:

unable to start inspection: HTTP POST https://10.35.77.3/redfish/v1/Managers/iDRAC.Embedded.1/VirtualMedia/CD/Actions/VirtualMedia.InsertMedia returned code 400.

If that's unexpected then let's open a separate bug to address it.

Comment 3 Zane Bitter 2021-05-11 15:47:51 UTC
Reporting fix is available in 4.8 branch.

Comment 7 errata-xmlrpc 2021-07-27 22:56:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

