Bug 1944246
Summary: | Ironic fails to inspect and move node to "manageable" but get bmh remains in "inspecting" | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Raviv Bar-Tal <rbartal>
Component: | Bare Metal Hardware Provisioning | Assignee: | Zane Bitter <zbitter>
Bare Metal Hardware Provisioning sub component: | baremetal-operator | QA Contact: | Raviv Bar-Tal <rbartal>
Status: | CLOSED ERRATA | CC: | aos-bugs, kiran, lshilin
Severity: | medium | Priority: | medium
Version: | 4.8 | Keywords: | Triaged
Target Release: | 4.8.0 | Doc Type: | No Doc Update
Hardware: | Unspecified | OS: | Unspecified
Last Closed: | 2021-07-27 22:56:24 UTC | Type: | Bug
The provisioning state is as expected, but we should be reporting an error message and are not. I noticed a bug here: https://github.com/metal3-io/baremetal-operator/pull/817#issuecomment-809824437 that means we are not reporting errors correctly in introspection. I need to check the logs, but there's a good chance this could be the cause.

Without the baremetal-operator log it's impossible to say what's going on. However, the issue mentioned in my previous comment would have caused the baremetal-operator to begin another attempt at inspection immediately after detecting failure. While the error would be reported, it would also be cleared again once the retry was detected to be in progress. That's not inconsistent with what we see in the conductor log, so the linked fix should cause the reported error to remain around for longer as the retries get further apart.

According to the logs, the error is:

unable to start inspection: HTTP POST https://10.35.77.3/redfish/v1/Managers/iDRAC.Embedded.1/VirtualMedia/CD/Actions/VirtualMedia.InsertMedia returned code 400.

If that's unexpected then let's open a separate bug to address it.

Reporting fix is available in 4.8 branch.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
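The retry behaviour discussed in the comments above (an error is reported on failure, cleared again once the next attempt is in progress, and retries are spaced further apart so the error stays visible for longer) can be illustrated with a minimal sketch. This is not the baremetal-operator code: `HostStatus`, `retry_delay`, `record_failure`, `start_retry`, the 10-second base delay, the one-hour cap and the doubling policy are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class HostStatus:
    """Hypothetical stand-in for the error fields tracked on a host."""
    error_message: str = ""
    error_count: int = 0


def retry_delay(error_count: int, base: float = 10.0, cap: float = 3600.0) -> float:
    """Seconds to wait before the next attempt.

    Assumed policy: the delay doubles with each consecutive failure and is
    capped. The base and cap values are illustrative only.
    """
    if error_count <= 0:
        return 0.0
    return min(cap, base * 2 ** (error_count - 1))


def record_failure(status: HostStatus, message: str) -> float:
    """Store the error, bump the failure count, and return the wait time."""
    status.error_message = message
    status.error_count += 1
    return retry_delay(status.error_count)


def start_retry(status: HostStatus) -> None:
    """Clear the visible error once a new attempt is in progress.

    The failure count is kept so that later failures wait longer.
    """
    status.error_message = ""
```

With an immediate retry (a delay of zero), `start_retry` runs right after `record_failure`, so the error is visible only briefly. With a growing delay, the error message stays in place for the whole wait before the next attempt.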
Created attachment 1767392 [details] conductor.log

Description of problem:
In this case I tried to provision a node using redfish-virtualmedia, but the node failed to mount the boot ISO. The Ironic conductor moved the node from "inspecting" to "inspect failed", with "manageable" as the target state (see the attached log for the full error message):

"moved to provision state "inspect failed" from state "inspecting"; target provision state is "manageable":"

But the 'oc get bmh' command shows the node's state as "inspecting", and this state does not change:

    NAME                         STATE        CONSUMER   ONLINE   ERROR
    openshift-dworker-titan112   inspecting              true

Version-Release number of selected component (if applicable):
4.8.0-0.nightly-2021-03-15-105049

How reproducible:

Steps to Reproduce:
1. Try to add a node with redfish-virtualmedia that cannot mount the ISO
2. (In our lab, try to add seal or titan)

Actual results:

Expected results:
The BMH changes its status to manageable or error and shows an error message

Additional info:
Ironic conductor log is attached
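The symptom described above (Ironic reports "inspect failed" while the BareMetalHost still shows "inspecting" with an empty ERROR column) can be checked with a small helper. This is only a sketch: `missing_error_report` is a hypothetical function, and it assumes the host has been dumped to JSON (for example with `oc get bmh <name> -o json`) and that the state and error live under `status.provisioning.state` and `status.errorMessage`. The Ironic provision state has to be supplied separately, for example read from the conductor log.

```python
import json


def missing_error_report(bmh: dict, ironic_provision_state: str) -> bool:
    """Return True when Ironic reports a failed inspection but the
    BareMetalHost still shows "inspecting" with no error message.

    `bmh` is the parsed JSON of one BareMetalHost (assumed layout).
    """
    status = bmh.get("status", {})
    bmh_state = status.get("provisioning", {}).get("state", "")
    error_message = status.get("errorMessage", "")
    return (
        ironic_provision_state == "inspect failed"
        and bmh_state == "inspecting"
        and not error_message
    )


# Example input mirroring the host from this report (fields trimmed).
example = json.loads("""
{
  "metadata": {"name": "openshift-dworker-titan112"},
  "status": {"provisioning": {"state": "inspecting"}, "errorMessage": ""}
}
""")
stuck = missing_error_report(example, "inspect failed")
```

A True result only indicates that the two views disagree at the moment of the check. During a retry the same combination can appear briefly, so the check is more meaningful when repeated over time.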