2036006 – [BIOS setting values] Attempt to set Integer parameter results in preparation error

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Bare Metal Hardware Provisioning
Sub Component:
Version:	4.10
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	high
Target Milestone:	---
Target Release:	4.10.0
Assignee:	Bob Fournier
QA Contact:	Lubov
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-12-29 08:51 UTC by Lubov
Modified:	2022-03-10 16:37 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-03-10 16:36:47 UTC
Target Upstream Version:
Embargoed:

Attachments
hfs ans schema (72.01 KB, text/plain) 2021-12-29 08:51 UTC, Lubov	no flags
boot is stuck on iov6 (16.85 KB, image/png) 2022-01-06 12:34 UTC, Lubov	no flags

Links
System	ID	Priority	Status	Summary	Last Updated
Github	metal3-io baremetal-operator pull 1064	None	open	Use integer value in clean steps for HostFirmwareSettings Integer type	2022-01-06 23:32:17 UTC
Github	openshift baremetal-operator pull 199	None	open	Bug 2036006: Use integer value in clean steps for HostFirmwareSettings Integer type	2022-01-17 18:50:02 UTC
Red Hat Product Errata	RHSA-2022:0056	None	None	None	2022-03-10 16:36:59 UTC

Description Lubov 2021-12-29 08:51:25 UTC

Created attachment 1848178 [details]
hfs ans schema

Description of problem:
On an HPE setup, I tried to set EnabledCoresPerProc to 1. When the machineset is scaled down, the node and machine are deleted, but the BMH is stuck in preparation error:
$ oc get bmh
NAME                 STATE                    CONSUMER                        ONLINE   ERROR               AGE
openshift-master-0   externally provisioned   ocp-edge-6mt5w-master-0         true                         19h
openshift-master-1   externally provisioned   ocp-edge-6mt5w-master-1         true                         19h
openshift-master-2   externally provisioned   ocp-edge-6mt5w-master-2         true                         19h
openshift-worker-0   preparing                                                false    preparation error   19h
openshift-worker-1   provisioned              ocp-edge-6mt5w-worker-0-tz669   true                         19h
openshift-worker-2   provisioned              ocp-edge-6mt5w-worker-0-tqhrf   true                         19h

$ oc describe bmh openshift-worker-0
Events:
  Normal                          101m  metal3-baremetal-controller  Node 9356bb77-2ce7-471d-8cd5-37c3446d7cd7 failed step {'args': {'settings': [{'name': 'EnabledCoresPerProc', 'value': '1'}]}, 'interface': 'bios', 'step': 'apply_configuration', 'abortable': False, 'priority': 0}: Redfish exception occurred. Error: Redfish BIOS apply configuration failed for node 9356bb77-2ce7-471d-8cd5-37c3446d7cd7. Error: HTTP PATCH https://10.46.61.19:443/redfish/v1/systems/1/bios/settings/ returned code 400. unknown error Extended information: none


Version-Release number of selected component (if applicable):
4.10.0-0.nightly-2021-12-23-153012

How reproducible:


Steps to Reproduce:
0. $ oc project openshift-machine-api
1. Set the parameter as described above in the HFS CR of one of the workers:
$ oc edit hfs openshift-worker-0
2. Annotate the worker for deletion and scale down the machineset
$ oc annotate machine ocp-edge-5jqsr-worker-0-mk5m4 machine.openshift.io/cluster-api-delete-machine=yes
$ oc scale --replicas=1 machineset ocp-edge-5jqsr-worker-0

Actual results:
BMH stuck in preparation error

Expected results:
The BMH becomes available after the preparation step finishes successfully.

Additional info:

Comment 1 Lubov 2021-12-29 08:56:33 UTC

must-gather: http://rhos-compute-node-10.lab.eng.rdu2.redhat.com/logs/BZ2036006-must-gather.tar.gz

Comment 2 Bob Fournier 2022-01-03 02:31:50 UTC

It looks like the HPE node doesn't allow the changing of EnabledCoresPerProc to 1. Some HPE documentation https://support.hpe.com/hpesc/public/docDisplay?docId=emr_na-c03503454 shows that this value can only be changed to a multiple of 4.

If you are mainly testing that an integer value can be changed, I would recommend trying with a different setting to see if that works. We could follow up with HPE as to the proper values that can be set for EnabledCoresPerProc.

Comment 3 Lubov 2022-01-04 06:51:52 UTC

As we discussed, setting it to a multiple of 4 didn't help. I tried another parameter that was working before and got the same problem:
 Normal                          6m57s  metal3-baremetal-controller  Node b30848d5-b264-486e-899e-65ac29ee5782 failed step {'args': {'settings': [{'name': 'NetworkBootRetryCount', 'value': '5'}]}, 'interface': 'bios', 'step': 'apply_configuration', 'abortable': False, 'priority': 0}: Redfish exception occurred. Error: Redfish BIOS apply configuration failed for node b30848d5-b264-486e-899e-65ac29ee5782. Error: HTTP PATCH https://10.46.61.26:443/redfish/v1/systems/1/bios/settings/ returned code 400. unknown error Extended information: none

Adding must-gather: http://rhos-compute-node-10.lab.eng.rdu2.redhat.com/logs/BZ2036006_2_must-gather.tar.gz
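
The failed step printed in the event above is a Python-style dict literal, so it can be inspected programmatically. The following is a minimal sketch, not part of the operator or Ironic: the helper name `quoted_integer_settings` is hypothetical, and it only flags values that are strings made up of digits, which may also be legitimate for String or Enumeration settings.

```python
import ast
import re


def quoted_integer_settings(step_repr):
    """Return names of settings whose value is a string that looks like an integer.

    step_repr is the dict literal printed after "failed step" in the event.
    (Hypothetical helper, for illustration only.)
    """
    step = ast.literal_eval(step_repr)  # parses a Python literal without executing code
    flagged = []
    for setting in step["args"]["settings"]:
        value = setting["value"]
        # A str such as '5' is flagged; a real int such as 5 is not.
        if isinstance(value, str) and re.fullmatch(r"-?\d+", value):
            flagged.append(setting["name"])
    return flagged


step = (
    "{'args': {'settings': [{'name': 'NetworkBootRetryCount', 'value': '5'}]}, "
    "'interface': 'bios', 'step': 'apply_configuration', "
    "'abortable': False, 'priority': 0}"
)
print(quoted_integer_settings(step))  # ['NetworkBootRetryCount']
```

A flagged name is only a hint: whether the setting is really an Integer has to be checked against the schema for that host.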

Comment 4 Bob Fournier 2022-01-05 18:39:41 UTC

When setting Integer types manually via Redfish, both HPE and Dell require that an Integer be used since a string conversion isn't done.

On HPE:
curl -v --data '{"Attributes": {"NetworkBootRetryCount": "7"}}' -H "content-type: application/json" -k --user XX:XX -X PATCH https://10.19.109.247/redfish/v1/Systems/1/Bios/settings/

resulted in:
{"error":{"code":"iLO.0.10.ExtendedInfo","message":"See @Message.ExtendedInfo for more information.","@Message.ExtendedInfo":[{"MessageArgs":["\"7\"","NetworkBootRetryCount"],"MessageId":"Base.1.4.PropertyValueTypeError"}]}}
HTTP/1.1 400 Bad Request

While:
curl -v --data '{"Attributes": {"NetworkBootRetryCount": 7}}' -H "content-type: application/json" -k --user XX:XX -X PATCH https://10.19.109.247/redfish/v1/Systems/1/Bios/settings/

worked:
HTTP/1.1 200 OK


Same on Dell:
curl -v --data '{"Attributes": {"IscsiDev1Con1Retry": "4"}}' -H "content-type: application/json" -k --user XX:XX -X PATCH https://10.9.103.30/redfish/v1/Systems/System.Embedded.1/Bios/Settings

resulted in:
{"error":{"@Message.ExtendedInfo":[{"Message":"The value integer for the property IscsiDev1Con1Retry is of a different type than the property can accept.","MessageArgs":["integer","IscsiDev1Con1Retry"],"MessageArgs":2,"MessageId":"Base.1.2.PropertyValueTypeError",

While:
curl -v --data '{"Attributes": {"IscsiDev1Con1Retry": 4}}' -H "content-type: application/json" -k --user XX:XX -X PATCH https://10.9.103.30/redfish/v1/Systems/System.Embedded.1/Bios/Settings

Worked
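
The curl results above suggest that the value has to be sent as a JSON number when the attribute is an Integer. A minimal sketch of that conversion follows. It is not the operator's actual code (the fix itself is in the linked baremetal-operator pull requests); the function name `build_bios_patch` and the `attribute_types` mapping are assumptions for illustration, and the type string "Integer" follows the wording used in this bug.

```python
import json


def build_bios_patch(settings, attribute_types):
    """Build a Redfish BIOS PATCH body from string-valued settings.

    settings:        {"name": "value"} with all values given as strings
    attribute_types: {"name": "Integer" | other type} (assumed to come from the schema)
    """
    attributes = {}
    for name, value in settings.items():
        if attribute_types.get(name) == "Integer":
            # Send a JSON number (7), not a JSON string ("7").
            attributes[name] = int(value)
        else:
            attributes[name] = value
    return json.dumps({"Attributes": attributes})


body = build_bios_patch(
    {"NetworkBootRetryCount": "7"},
    {"NetworkBootRetryCount": "Integer"},
)
print(body)  # {"Attributes": {"NetworkBootRetryCount": 7}}
```

The printed body matches the payload of the PATCH request that was accepted above.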

Comment 5 Lubov 2022-01-06 12:34:03 UTC

As we've seen, setting the parameter to an integer causes the host to get stuck on boot in the preparing or provisioning state. The deployment is IPv4, but for some reason the boot tries to use IPv6 (see the attached picture).

Comment 6 Lubov 2022-01-06 12:34:58 UTC

Created attachment 1849252 [details]
boot is stuck on iov6

Comment 11 Bob Fournier 2022-01-26 23:54:50 UTC

Lubov - were you able to get it so that 5 is being set, instead of "5"?

Comment 15 errata-xmlrpc 2022-03-10 16:36:47 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056
