Bug 2036006

Summary: [BIOS setting values] Attempt to set Integer parameter results in preparation error
Product: OpenShift Container Platform Reporter: Lubov <lshilin>
Component: Bare Metal Hardware Provisioning Assignee: Bob Fournier <bfournie>
Bare Metal Hardware Provisioning sub component: baremetal-operator QA Contact: Lubov <lshilin>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: medium CC: eglottma
Version: 4.10 Keywords: Triaged
Target Milestone: ---   
Target Release: 4.10.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-03-10 16:36:47 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description              Flags
hfs ans schema           none
boot is stuck on iov6    none

Description Lubov 2021-12-29 08:51:25 UTC
Created attachment 1848178 [details]
hfs ans schema

Description of problem:
On an HPE setup, I am trying to set EnabledCoresPerProc to 1. On scale-down of the machineset, the node and machine are deleted, but the BMH is stuck in preparation error:
$ oc get bmh
NAME                 STATE                    CONSUMER                        ONLINE   ERROR               AGE
openshift-master-0   externally provisioned   ocp-edge-6mt5w-master-0         true                         19h
openshift-master-1   externally provisioned   ocp-edge-6mt5w-master-1         true                         19h
openshift-master-2   externally provisioned   ocp-edge-6mt5w-master-2         true                         19h
openshift-worker-0   preparing                                                false    preparation error   19h
openshift-worker-1   provisioned              ocp-edge-6mt5w-worker-0-tz669   true                         19h
openshift-worker-2   provisioned              ocp-edge-6mt5w-worker-0-tqhrf   true                         19h

$ oc describe bmh openshift-worker-0
Events:
  Normal                          101m  metal3-baremetal-controller  Node 9356bb77-2ce7-471d-8cd5-37c3446d7cd7 failed step {'args': {'settings': [{'name': 'EnabledCoresPerProc', 'value': '1'}]}, 'interface': 'bios', 'step': 'apply_configuration', 'abortable': False, 'priority': 0}: Redfish exception occurred. Error: Redfish BIOS apply configuration failed for node 9356bb77-2ce7-471d-8cd5-37c3446d7cd7. Error: HTTP PATCH https://10.46.61.19:443/redfish/v1/systems/1/bios/settings/ returned code 400. unknown error Extended information: none
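Note that the failed step in the event carries the value as the string '1', not the integer 1. A minimal Python sketch for checking this from an event message follows; `extract_step` is a hypothetical helper written for this illustration, and it assumes the step is printed as a Python literal, as it is in the event above.

```python
import ast

# Event text copied from the `oc describe bmh` output above (cut after the step).
EVENT = (
    "Node 9356bb77-2ce7-471d-8cd5-37c3446d7cd7 failed step "
    "{'args': {'settings': [{'name': 'EnabledCoresPerProc', 'value': '1'}]}, "
    "'interface': 'bios', 'step': 'apply_configuration', "
    "'abortable': False, 'priority': 0}: Redfish exception occurred."
)


def extract_step(message):
    """Return the step dict embedded in an event message (hypothetical helper)."""
    start = message.index("{")
    depth = 0
    for pos in range(start, len(message)):
        if message[pos] == "{":
            depth += 1
        elif message[pos] == "}":
            depth -= 1
            if depth == 0:
                # The step is a Python literal, so literal_eval can parse it
                # without executing any code.
                return ast.literal_eval(message[start:pos + 1])
    raise ValueError("no balanced step dict found")


step = extract_step(EVENT)
for setting in step["args"]["settings"]:
    # Print each setting name with the Python type of its value.
    print(setting["name"], type(setting["value"]).__name__)
```

For the event above this reports the value of EnabledCoresPerProc as a `str`.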


Version-Release number of selected component (if applicable):
4.10.0-0.nightly-2021-12-23-153012

How reproducible:


Steps to Reproduce:
0. $ oc project openshift-machine-api
1. Set the parameter as described above in the HFS CR of one of the workers:
$ oc edit hfs openshift-worker-0
2. Annotate the worker for deletion and scale down the machineset
$ oc annotate machine ocp-edge-5jqsr-worker-0-mk5m4 machine.openshift.io/cluster-api-delete-machine=yes
$ oc scale --replicas=1 machineset ocp-edge-5jqsr-worker-0

Actual results:
BMH stuck in preparation error

Expected results:
The BMH becomes available after the preparation step finishes successfully

Additional info:

Comment 2 Bob Fournier 2022-01-03 02:31:50 UTC
It looks like the HPE node doesn't allow the changing of EnabledCoresPerProc to 1. Some HPE documentation https://support.hpe.com/hpesc/public/docDisplay?docId=emr_na-c03503454 shows that this value can only be changed to a multiple of 4.

If you are mainly testing that an integer value can be changed, I would recommend trying with a different setting to see if that works. We could follow up with HPE as to the proper values that can be set for EnabledCoresPerProc.

Comment 3 Lubov 2022-01-04 06:51:52 UTC
As we discussed, setting the value to a multiple of 4 didn't help. I tried another parameter that had been working and got the same problem:
 Normal                          6m57s  metal3-baremetal-controller  Node b30848d5-b264-486e-899e-65ac29ee5782 failed step {'args': {'settings': [{'name': 'NetworkBootRetryCount', 'value': '5'}]}, 'interface': 'bios', 'step': 'apply_configuration', 'abortable': False, 'priority': 0}: Redfish exception occurred. Error: Redfish BIOS apply configuration failed for node b30848d5-b264-486e-899e-65ac29ee5782. Error: HTTP PATCH https://10.46.61.26:443/redfish/v1/systems/1/bios/settings/ returned code 400. unknown error Extended information: none

Adding must-gather: http://rhos-compute-node-10.lab.eng.rdu2.redhat.com/logs/BZ2036006_2_must-gather.tar.gz

Comment 4 Bob Fournier 2022-01-05 18:39:41 UTC
When setting Integer types manually via Redfish, both HPE and Dell require that an Integer be used since a string conversion isn't done.

On HPE:
curl -v --data '{"Attributes": {"NetworkBootRetryCount": "7"}}' -H "content-type: application/json" -k --user XX:XX -X PATCH https://10.19.109.247/redfish/v1/Systems/1/Bios/settings/

resulted in:
{"error":{"code":"iLO.0.10.ExtendedInfo","message":"See @Message.ExtendedInfo for more information.","@Message.ExtendedInfo":[{"MessageArgs":["\"7\"","NetworkBootRetryCount"],"MessageId":"Base.1.4.PropertyValueTypeError"}]}}HTTP/1.1 400 Bad Request

While:
curl -v --data '{"Attributes": {"NetworkBootRetryCount": 7}}' -H "content-type: application/json" -k --user XX:XX -X PATCH https://10.19.109.247/redfish/v1/Systems/1/Bios/settings/

worked:
HTTP/1.1 200 OK
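
The 400 response above identifies the cause in its extended info. As a rough sketch (not code from Ironic or sushy), the rejected property can be pulled out of such a body as shown below; `type_errors` is a hypothetical helper, and the argument order (value first, then property) is taken from the HPE response above.

```python
import json

# Response body copied from the failed HPE PATCH above.
RESPONSE = r'{"error":{"code":"iLO.0.10.ExtendedInfo","message":"See @Message.ExtendedInfo for more information.","@Message.ExtendedInfo":[{"MessageArgs":["\"7\"","NetworkBootRetryCount"],"MessageId":"Base.1.4.PropertyValueTypeError"}]}}'


def type_errors(body):
    """Yield (property, rejected_value) for each PropertyValueTypeError entry."""
    info = json.loads(body).get("error", {}).get("@Message.ExtendedInfo", [])
    for entry in info:
        # MessageId looks like "Base.1.4.PropertyValueTypeError"; the registry
        # version differs between vendors, so compare only the last component.
        if entry.get("MessageId", "").rsplit(".", 1)[-1] == "PropertyValueTypeError":
            value, prop = entry["MessageArgs"][:2]
            yield prop, value


errors = list(type_errors(RESPONSE))
print(errors)
```

Here the rejected value comes back with its quotes included, which shows that the BMC received a string rather than a number.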


Same on Dell:
curl -v --data '{"Attributes": {"IscsiDev1Con1Retry": "4"}}' -H "content-type: application/json" -k --user XX:XX -X PATCH https://10.9.103.30/redfish/v1/Systems/System.Embedded.1/Bios/Settings

resulted in:
{"error":{"@Message.ExtendedInfo":[{"Message":"The value integer for the property IscsiDev1Con1Retry is of a different type than the property can accept.","MessageArgs":["integer","IscsiDev1Con1Retry"],"MessageArgs":2,"MessageId":"Base.1.2.PropertyValueTypeError",

While:
curl -v --data '{"Attributes": {"IscsiDev1Con1Retry": 4}}' -H "content-type: application/json" -k --user XX:XX -X PATCH https://10.9.103.30/redfish/v1/Systems/System.Embedded.1/Bios/Settings

Worked
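
To illustrate the conversion that is needed before the PATCH is sent, here is a minimal Python sketch. It is not the actual baremetal-operator or Ironic change: `SCHEMA_TYPES` and `build_bios_patch` are hypothetical names, and the attribute types are assumed from the vendor responses above rather than read from a real attribute registry.

```python
import json

# Hypothetical attribute-type map (assumption); a real implementation would
# read the types from the BIOS attribute registry instead of hard-coding them.
SCHEMA_TYPES = {
    "EnabledCoresPerProc": "Integer",
    "NetworkBootRetryCount": "Integer",
    "IscsiDev1Con1Retry": "Integer",
}


def build_bios_patch(settings):
    """Return a Redfish BIOS PATCH body with Integer attributes sent as ints."""
    attributes = {}
    for name, value in settings.items():
        if SCHEMA_TYPES.get(name) == "Integer" and isinstance(value, str):
            # int() raises ValueError on non-numeric input, which surfaces
            # the problem before the BMC rejects the request with a 400.
            value = int(value)
        attributes[name] = value
    return json.dumps({"Attributes": attributes})


body = build_bios_patch({"NetworkBootRetryCount": "7"})
print(body)
```

The resulting body carries 7 as a JSON number, matching the curl requests that worked above.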

Comment 5 Lubov 2022-01-06 12:34:03 UTC
As we've seen, setting the parameter to an integer causes the host to get stuck on boot in the preparing or provisioning state. The deployment is IPv4, but for some reason the boot tries to use IPv6 (see attached picture).

Comment 6 Lubov 2022-01-06 12:34:58 UTC
Created attachment 1849252 [details]
boot is stuck on iov6

Comment 11 Bob Fournier 2022-01-26 23:54:50 UTC
Lubov - were you able to get it so that 5 is being set, instead of "5"?

Comment 15 errata-xmlrpc 2022-03-10 16:36:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056