Bug 1924804

Summary: Deployment on iDrac with latest firmware 04.40.00.00 fails on one of masters nodes
Product: OpenShift Container Platform Reporter: Lubov <lshilin>
Component: Bare Metal Hardware ProvisioningAssignee: Tomas Sedovic <tsedovic>
Bare Metal Hardware Provisioning sub component: ironic QA Contact: Amit Ugol <augol>
Status: CLOSED NOTABUG Docs Contact:
Severity: high    
Priority: unspecified    
Version: 4.7   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-02-03 19:15:57 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
ironic-conductor.log for the second attempt
none
ironic-conductor.log for the third attempt none

Description Lubov 2021-02-03 16:32:52 UTC
Created attachment 1754815 [details]
ironic-conductor.log for the second attempt

Description of problem:
After upgrading all Dell machines to FW 04.40 on the first attempt to deploy the cluster using idrac-virtualmedia all 3 masters were deployed 

On the second attempt the deploy failed on one of masters. Error reported for the node in ironic-conductor log:
ERROR ironic.conductor.utils [-] Cleaning for node a249ad4b-ba33-44c3-815f-f1ff46a66926 failed. Timeout reached while cleaning the node. Please check if the ramdisk responsible for the cleaning is running on the node. Failed on step {} 

The host failing to boot. After bootstrap VM was powered off, the host booted

On the third attempt deploy failed on the same master but with different error in conductor log:
ERROR ironic.conductor.manager [req-d9b7c700-e6f3-4c1a-ac4b-9d8107efefef - - - - -] During sync_power_state, max retries exceeded for node 62970a50-706b-4713-a62b-4ebd93cdb960, node state power off does not match expected state 'power on'. Updating DB state to 'power off' Switching node to maintenance mode


Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2021-02-02-223803


Steps to Reproduce:
Run OCP deployment usinf idrac-virtualmedia on Dell machine with the latest FW (04.40) 

Actual results:
Deployment fails

Expected results:
Deployment passed

Additional info:
attaching ironic-conductor logs for the second and third attempts

Comment 1 Lubov 2021-02-03 16:36:24 UTC
Created attachment 1754816 [details]
ironic-conductor.log for the third attempt

Comment 2 Lubov 2021-02-03 19:15:57 UTC
It looks there is a problem with the host

Closing the bz. If after the host is fix the reported problem is back, will re-open