Bug 2006101 - Power off fails for drivers that don't support Soft power off
Summary: Power off fails for drivers that don't support Soft power off
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Bare Metal Hardware Provisioning
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: Zane Bitter
QA Contact: Fujitsu container team
URL:
Whiteboard:
Depends On:
Blocks: 2009850
 
Reported: 2021-09-20 21:04 UTC by Zane Bitter
Modified: 2022-03-10 16:12 UTC
CC List: 2 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 2009850
Environment:
Last Closed: 2022-03-10 16:12:09 UTC
Target Upstream Version:
Embargoed:



Links
- Github metal3-io/baremetal-operator issue 984 (open): Power off fails for drivers that don't support Soft power off (last updated 2021-09-20 21:04:41 UTC)
- Github openshift/baremetal-operator pull 180 (open): Merge upstream 2021-10-01 (last updated 2021-10-01 18:46:51 UTC)
- Red Hat Product Errata RHSA-2022:0056 (last updated 2022-03-10 16:12:37 UTC)

Description Zane Bitter 2021-09-20 21:04:42 UTC
When an ironic driver doesn't support soft power off, we are supposed to fall back to doing a hard power off. This was inadvertently broken in OpenShift 4.9. Now when the driver doesn't support soft power off we end up returning a 'transient' error and retrying in an infinite loop.

The Fujitsu driver is known to not support soft power off when its agent is not available on the host.
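The intended fallback, and how the 4.9 regression turned an unsupported soft power off into an endless retry, can be modelled with a small sketch. Everything in it is hypothetical: `FakeNode`, `SoftPowerOffUnsupported`, `power_off_broken`, `power_off_fixed` and `reconcile` are illustrative names, not the baremetal-operator or Ironic API, and Python is used only for illustration (the baremetal-operator itself is written in Go). The retry cap in `reconcile` exists only so the example terminates; the report describes the real loop as infinite.

```python
class SoftPowerOffUnsupported(Exception):
    """Hypothetical error: the driver cannot perform a soft power off."""


class FakeNode:
    """Hypothetical stand-in for a node managed through an Ironic driver."""

    def __init__(self, supports_soft_power_off):
        self.supports_soft_power_off = supports_soft_power_off
        self.powered_on = True
        self.calls = []  # records which power-off operations were attempted

    def soft_power_off(self):
        self.calls.append("soft")
        if not self.supports_soft_power_off:
            raise SoftPowerOffUnsupported("soft power off not supported")
        self.powered_on = False

    def hard_power_off(self):
        self.calls.append("hard")
        self.powered_on = False


def power_off_broken(node):
    """Models the regression: 'unsupported' is reported as a transient error."""
    try:
        node.soft_power_off()
        return True
    except SoftPowerOffUnsupported:
        # Transient error: the caller requeues and tries the same soft power off again.
        return False


def power_off_fixed(node):
    """Models the intended behaviour: fall back to a hard power off."""
    try:
        node.soft_power_off()
    except SoftPowerOffUnsupported:
        node.hard_power_off()
    return True


def reconcile(node, power_off, max_attempts=5):
    """Retry power_off until it succeeds; capped only so this demo terminates."""
    for attempt in range(1, max_attempts + 1):
        if power_off(node):
            return attempt
    return None


broken = FakeNode(supports_soft_power_off=False)
print(reconcile(broken, power_off_broken))  # None: every attempt fails the same way

fixed = FakeNode(supports_soft_power_off=False)
print(reconcile(fixed, power_off_fixed))  # 1: soft fails, hard power off succeeds
```

The broken variant never makes progress because retrying cannot change the driver's capabilities; treating "unsupported" as a permanent condition and switching to a hard power off is what ends the loop.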

Comment 2 Lubov 2021-10-04 10:44:15 UTC
Is this specific to Fujitsu?

Comment 3 Zane Bitter 2021-10-04 13:37:40 UTC
(In reply to Lubov from comment #2)
> Is this specific to Fujitsu?

Possibly not, but that is the only example I know of for sure (Fujitsu requires an agent running on the host to do soft power off, so if it is not present it fails in this way).

Comment 4 Lubov 2021-10-05 05:47:58 UTC
Assigning to @fj-lsoft-ofuku.fujitsu.com to clear our backlog

Comment 5 Fujitsu container team 2021-10-06 07:21:34 UTC
Hi Zane, Lubov,

For Fujitsu servers (iRMC driver), there are two power interfaces:
- ipmitool: soft power off is supported.
- irmc: soft power off is supported only if the ServerView agent is installed on the host.
(https://docs.openstack.org/ironic/latest/admin/drivers/irmc.html#supported-platforms)

In OpenShift, ipmitool is hard-coded as the power interface in both the Metal3 and the OpenShift baremetal-operator:
- https://github.com/metal3-io/baremetal-operator/blob/master/pkg/bmc/irmc.go#L86
- https://github.com/openshift/baremetal-operator/blob/master/pkg/bmc/irmc.go#L86

So soft power off is always supported on Fujitsu servers in OpenShift.
In conclusion, I don't think Fujitsu servers are affected by this problem.
However, I will verify this change (https://github.com/metal3-io/baremetal-operator/pull/985) against a Fujitsu server just in case.

Best Regards,
Yasuhiro Futakawa
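The reasoning in comment 5 about which power interface supports soft power off can be summarised as a small decision function. This is a sketch that encodes only what the comment states; the function name and the `BMO_IRMC_POWER_INTERFACE` constant are hypothetical, and the value "ipmitool" is taken from the comment rather than checked against the linked irmc.go source.

```python
def soft_power_off_supported(power_interface: str, serverview_agent_installed: bool) -> bool:
    """Hypothetical helper: can a Fujitsu (iRMC) node perform a soft power off?

    Encodes only what comment 5 states:
      - ipmitool power interface: soft power off works.
      - irmc power interface: soft power off needs the ServerView agent on the host.
    """
    if power_interface == "ipmitool":
        return True
    if power_interface == "irmc":
        return serverview_agent_installed
    raise ValueError(f"unknown power interface: {power_interface!r}")


# Per comment 5, baremetal-operator's iRMC BMC driver uses ipmitool as the
# power interface. The constant name is hypothetical.
BMO_IRMC_POWER_INTERFACE = "ipmitool"

# With ipmitool, whether the agent is installed does not matter.
print(soft_power_off_supported(BMO_IRMC_POWER_INTERFACE, serverview_agent_installed=False))  # True

# A hypothetical node using the irmc interface without the agent cannot soft power off.
print(soft_power_off_supported("irmc", serverview_agent_installed=False))  # False
```

This is why the Fujitsu team expected OpenShift not to be affected: only the irmc interface without the agent lacks soft power off, and that combination is the one that depends on the hard power off fallback described in the original report.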

Comment 7 Fujitsu container team 2021-10-07 10:36:56 UTC
Hi Zane, Lubov,

Fujitsu verified the latest nightly build which includes the following patches, and confirmed soft power off worked correctly.
https://github.com/openshift/baremetal-operator/pull/180

Best Regards,
Yasuhiro Futakawa

Comment 8 Lubov 2021-10-07 13:38:18 UTC
(In reply to Fujitsu container team from comment #7)
> Hi Zane, Lubov,
> 
> Fujitsu verified the latest nightly build which includes the following
> patches, and confirmed soft power off worked correctly.
> https://github.com/openshift/baremetal-operator/pull/180
> 
> Best Regards,
> Yasuhiro Futakawa

Many thanks, closing as verified!

Comment 11 errata-xmlrpc 2022-03-10 16:12:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

