Bug 1893546

Summary: Deploy using virtual media fails on node cleaning step
Product: OpenShift Container Platform Reporter: Lubov <lshilin>
Component: InstallerAssignee: Derek Higgins <derekh>
Installer sub component: OpenShift on Bare Metal IPI QA Contact: Lubov <lshilin>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: urgent CC: augol, derekh, kiran, kquinn, pablo.iranzo, rpittau, tsedovic, yjoseph
Version: 4.7Keywords: Regression, Triaged, UpcomingSprint
Target Milestone: ---   
Target Release: 4.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Previously when virtual-media was used, fast-track mode would not work as expected as nodes were rebooted between operations. This issue has been fixed.
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-02-24 15:29:19 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1831748, 1862608, 1884672    
Attachments:
Description Flags
install-config.yaml
none
ironic-conductor.log from bootstrap none

Description Lubov 2020-11-01 18:13:08 UTC
Created attachment 1725575 [details]
install-config.yaml

Version:
$ ./openshift-baremetal-install version
./openshift-baremetal-install 4.7.0-0.nightly-2020-10-27-051128
built from commit de62b01d451857d942ac41dcfdad9385856fde9e
release image registry.svc.ci.openshift.org/ocp/release@sha256:2945b52c8bf58f18a2770071acb8cece1fad38c105f28c98dbfecfe675e2f758

Platform:
IPI baremetal

What happened?
Deploy using redfish-virtualmedia fails on bootstrap
In ironic-conductor log reported errors:
ERROR ironic.conductor.task_manager [req-1e55a547-689e-451e-bd13-1c85698bd311 - - - - -] Node 96c9c3cf-db58-4692-b202-85d74b2a26be moved to provision state "clean failed" from state "clean wait"; target provision state is "available"
...
 ERROR ironic.conductor.utils [-] Timeout reached while cleaning the node. Please check if the ramdisk responsible for the cleaning is running on the node. Failed on step {}.


What did you expect to happen?
Deploy finishes successfully as it was done in OCP 4.6

How to reproduce it (as minimally and precisely as possible)?
Deploy OCP 4.7 using redfish-virtualmedia (see attached install-config.yaml)

Comment 1 Lubov 2020-11-01 18:14:37 UTC
Created attachment 1725576 [details]
ironic-conductor.log from bootstrap

Comment 2 Dmitry Tantsur 2020-11-04 13:06:42 UTC
https://review.opendev.org/#/c/760586/ is a likely fix

Comment 3 Derek Higgins 2020-11-04 14:35:26 UTC
(In reply to Dmitry Tantsur from comment #2)
> https://review.opendev.org/#/c/760586/ is a likely fix

Yup, tested, looks like it fixes the problem,

Comment 10 Lubov 2020-11-25 19:43:50 UTC
Verified on
4.7.0-0.nightly-2020-11-25-055236

Comment 13 errata-xmlrpc 2021-02-24 15:29:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633