Bug 2006101

Summary: Power off fails for drivers that don't support Soft power off
Product: OpenShift Container Platform
Component: Bare Metal Hardware Provisioning
Sub component: baremetal-operator
Reporter: Zane Bitter <zbitter>
Assignee: Zane Bitter <zbitter>
QA Contact: Fujitsu container team <fj-lsoft-rh-cnt>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
CC: lshilin, mmizuma
Version: 4.9
Keywords: Triaged
Target Release: 4.10.0
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Last Closed: 2022-03-10 16:12:09 UTC
Type: Bug
Bug Blocks: 2009850

Description Zane Bitter 2021-09-20 21:04:42 UTC
When an ironic driver doesn't support soft power off, we are supposed to fall back to doing a hard power off. This was inadvertently broken in OpenShift 4.9. Now when the driver doesn't support soft power off we end up returning a 'transient' error and retrying in an infinite loop.

The Fujitsu driver is known to not support soft power off when its agent is not available on the host.

Comment 2 Lubov 2021-10-04 10:44:15 UTC
Is this specific to Fujitsu only?

Comment 3 Zane Bitter 2021-10-04 13:37:40 UTC
(In reply to Lubov from comment #2)
> Is this specific to Fujitsu only?

Possibly not, but that is the only example I know of for sure (Fujitsu requires an agent running on the host to do soft power off, so if it is not present it fails in this way).

Comment 4 Lubov 2021-10-05 05:47:58 UTC
Assigning to @fj-lsoft-ofuku.fujitsu.com to clear our backlog

Comment 5 Fujitsu container team 2021-10-06 07:21:34 UTC
Hi Zane, Lubov,

For Fujitsu servers (iRMC driver), there are two power interfaces:
- ipmitool: supports soft power off.
- irmc: supports soft power off only if the ServerView agent is installed on the system.
(https://docs.openstack.org/ironic/latest/admin/drivers/irmc.html#supported-platforms)

For OpenShift, ipmitool is hard-coded in both Metal3 and OpenShift:
- https://github.com/metal3-io/baremetal-operator/blob/master/pkg/bmc/irmc.go#L86
- https://github.com/openshift/baremetal-operator/blob/master/pkg/bmc/irmc.go#L86

So soft power off is always supported in OpenShift.
In conclusion, I don't think Fujitsu servers are affected by this problem.
But I will verify this fix (https://github.com/metal3-io/baremetal-operator/pull/985) against a Fujitsu server just in case.

Best Regards,
Yasuhiro Futakawa
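
The reasoning in the comment above can be summarised in a short sketch: the power interface decides whether soft power off is available, and because the iRMC access details always select ipmitool, soft power off is always available. The names below (`irmcAccessDetails`, `SupportsSoftPowerOff`) are hypothetical and only mirror the comment. They are not copied from `pkg/bmc/irmc.go`.

```go
package main

// irmcAccessDetails is a hypothetical stand-in for the access details of a
// Fujitsu iRMC host.
type irmcAccessDetails struct{}

// Driver returns the ironic driver name for the host.
func (irmcAccessDetails) Driver() string { return "irmc" }

// PowerInterface returns the power interface to use. Per the comment above,
// this is hard-coded to ipmitool rather than the irmc power interface.
func (irmcAccessDetails) PowerInterface() string { return "ipmitool" }

// SupportsSoftPowerOff encodes the rule from the comment: ipmitool can always
// do a soft power off, while the irmc power interface needs the ServerView
// agent on the host. Other interfaces are outside the scope of this sketch
// and are conservatively reported as unsupported.
func SupportsSoftPowerOff(powerInterface string, agentInstalled bool) bool {
	switch powerInterface {
	case "ipmitool":
		return true
	case "irmc":
		return agentInstalled
	default:
		return false
	}
}
```

With the hard-coded interface, `SupportsSoftPowerOff(irmcAccessDetails{}.PowerInterface(), false)` is true even without the agent, which matches the conclusion that Fujitsu servers should not hit this bug in OpenShift.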

Comment 7 Fujitsu container team 2021-10-07 10:36:56 UTC
Hi Zane, Lubov,

Fujitsu verified the latest nightly build, which includes the following patch, and confirmed that soft power off works correctly.
https://github.com/openshift/baremetal-operator/pull/180

Best Regards,
Yasuhiro Futakawa

Comment 8 Lubov 2021-10-07 13:38:18 UTC
(In reply to Fujitsu container team from comment #7)
> Hi Zane, Lubov,
> 
> Fujitsu verified the latest nightly build which includes the following
> patches, and confirmed soft power off worked correctly.
> https://github.com/openshift/baremetal-operator/pull/180
> 
> Best Regards,
> Yasuhiro Futakawa

Many thanks, closing as verified!

Comment 11 errata-xmlrpc 2022-03-10 16:12:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056