Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2051492

Summary: Unable to start inspection due to virtual media session existence
Product: OpenShift Container Platform
Component: Bare Metal Hardware Provisioning
Sub component: ironic
Reporter: Michael Gourin <mgourin>
Assignee: Jacob Anders <janders>
QA Contact: Amit Ugol <augol>
Status: CLOSED CANTFIX
Severity: medium
Priority: medium
CC: bfournie, janders, mgourin, ykashtan, yliu1
Version: 4.10
Keywords: Triaged
Target Release: 4.11.0
Hardware: x86_64
OS: Linux
Last Closed: 2022-04-06 05:18:25 UTC
Type: Bug

Description Michael Gourin 2022-02-07 11:20:26 UTC
Description of problem:
During an IPI 4.10 baremetal deployment, the baremetal worker isn't properly provisioned due to an error stating that the virtual media has existing sessions.
A BMC reset seemingly solves the issue.


Version-Release number of selected component (if applicable):
4.10

How reproducible:
4.10 IPI baremetal deployment using Dell servers.

Steps to Reproduce:
1. Run 4.10 IPI baremetal deployment

Actual results:
Baremetal worker fails to provision and outputs the following error:
 Normal  InspectionError    14m   metal3-baremetal-controller  Failed to inspect hardware. Reason: unable to start inspection: HTTP POST https://10.19.28.43/redfish/v1/Managers/iDRAC.Embedded.1/VirtualMedia/CD/Actions/VirtualMedia.InsertMedia returned code 500. Base.1.7.GeneralError: A general error has occurred. See ExtendedInfo for more information Extended information: [{'Message': 'Virtual Media is detached or Virtual Media devices are already in use.', 'MessageArgs': [], 'MessageArgs@odata.count': 0, 'MessageId': 'IDRAC.2.4.VRM0021', 'RelatedProperties': [], 'RelatedProperties@odata.count': 0, 'Resolution': 'Change the Virtual Media Attach Mode to Attach or Auto-Attach.Stop existing Virtual Media sessions and retry the operation.', 'Severity': 'Informational'}]
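
For triage, this failure can be recognised by its iDRAC message ID rather than by the full message text. The following is only a minimal sketch, assuming the error string has the shape shown above. The function name is_vmedia_in_use_error is hypothetical and is not part of Ironic or metal3, and the assumption that the registry version prefix (2.4 here) may differ between firmware releases is not confirmed by this report.

```python
import re

# Matches iDRAC message IDs such as 'IDRAC.2.4.VRM0021'.
# Assumption: the numeric registry prefix may vary, so it is not hard-coded.
_VRM0021 = re.compile(r"IDRAC\.\d+\.\d+\.VRM0021")


def is_vmedia_in_use_error(error_text: str) -> bool:
    """Return True if the text carries the VRM0021 message ID (hypothetical helper)."""
    return bool(_VRM0021.search(error_text))


if __name__ == "__main__":
    sample = "Extended information: [{'MessageId': 'IDRAC.2.4.VRM0021'}]"
    print(is_vmedia_in_use_error(sample))
```

Matching on the message ID avoids depending on the human-readable message wording.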


Expected results:
Proper provisioning of the baremetal worker.

Additional info:
Similar BZ - https://bugzilla.redhat.com/show_bug.cgi?id=1861025

Comment 1 Jacob Anders 2022-02-07 11:47:20 UTC
I've performed troubleshooting on this machine. The above error would occur despite no active jobs being listed in the Lifecycle Controller. The server had a vMedia image configured, but its "Inserted" attribute was set to false. While it was in this state, it wasn't possible to configure another vMedia image, or to successfully eject the already-configured vMedia image. The only action that I found that would fix the issue was resetting iDRAC and attempting to configure vMedia again.
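
The state described above (image configured, "Inserted" false) and the reset workaround could be sketched roughly as follows. This is an illustration under assumptions, not what Ironic does: the helper names are hypothetical, the "Image" and "Inserted" keys follow the Redfish VirtualMedia schema, and the Manager.Reset action path and the "GracefulRestart" reset type are assumed to be accepted by the iDRAC in question. Authentication and TLS handling are omitted, and the request is only built, never sent.

```python
import json
import urllib.request


def is_stale_vmedia(resource: dict) -> bool:
    """True when an image is configured but the media is not inserted.

    `resource` is the decoded JSON of a Redfish VirtualMedia resource.
    """
    return bool(resource.get("Image")) and resource.get("Inserted") is False


def build_idrac_reset_request(bmc_address: str) -> urllib.request.Request:
    """Build (but do not send) a Redfish Manager.Reset request.

    Assumption: the manager ID is iDRAC.Embedded.1, as in the URL from
    the error message in this report.
    """
    url = (f"https://{bmc_address}/redfish/v1/Managers/"
           "iDRAC.Embedded.1/Actions/Manager.Reset")
    body = json.dumps({"ResetType": "GracefulRestart"}).encode()
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"})


if __name__ == "__main__":
    # Hypothetical payload mirroring the state observed on the affected server.
    observed = {"Image": "http://example.invalid/boot.iso", "Inserted": False}
    if is_stale_vmedia(observed):
        req = build_idrac_reset_request("10.19.28.43")
        print(req.get_method(), req.full_url)
```

Keeping the detection separate from the reset makes it possible to log the stale state without automatically resetting the BMC.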

Comment 3 Jacob Anders 2022-02-07 12:02:24 UTC
Note that we've addressed a similar (but not identical) issue in this BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1988879

The above BZ added the ability to run Lifecycle Controller reset as well as iDRAC reset on node enrollment.

We only enabled LC reset by default. Depending on the conclusions here, we may consider enabling iDRAC reset as well.

I will try to discuss this with our colleagues from Dell first.

Comment 4 Jacob Anders 2022-02-07 12:05:59 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1861025 displayed similar symptoms, although it may have been a different problem. Posting here for completeness/tracking.

Comment 6 Jacob Anders 2022-03-01 10:46:15 UTC
This is a confirmed iDRAC firmware issue which will be resolved in a future release. Given firmware development timeframes, this may take several months.

Comment 7 Jacob Anders 2022-03-11 05:09:31 UTC
I'll leave this open for a discussion on whether we can do anything on our side to reduce the impact of this issue. If not, I will close it as CANTFIX, as it is caused by a firmware bug in iDRAC.

Comment 8 Jacob Anders 2022-04-06 05:18:25 UTC
Given that this is a firmware issue acknowledged by Dell, and that the team is unlikely to have cycles to discuss potential additional workarounds in the interim, closing this as CANTFIX.