Created attachment 1848178
hfs and schema

Description of problem:
On an HPE setup, trying to set EnabledCoresPerProc to 1. On scale-down of the machineset the node and machine are deleted, but the BMH is stuck in preparation error.

$ oc get bmh
NAME                 STATE                    CONSUMER                        ONLINE   ERROR               AGE
openshift-master-0   externally provisioned   ocp-edge-6mt5w-master-0         true                         19h
openshift-master-1   externally provisioned   ocp-edge-6mt5w-master-1         true                         19h
openshift-master-2   externally provisioned   ocp-edge-6mt5w-master-2         true                         19h
openshift-worker-0   preparing                                                false    preparation error   19h
openshift-worker-1   provisioned              ocp-edge-6mt5w-worker-0-tz669   true                         19h
openshift-worker-2   provisioned              ocp-edge-6mt5w-worker-0-tqhrf   true                         19h

$ oc describe bmh openshift-worker-0
Events:
  Normal  101m  metal3-baremetal-controller  Node 9356bb77-2ce7-471d-8cd5-37c3446d7cd7 failed step {'args': {'settings': [{'name': 'EnabledCoresPerProc', 'value': '1'}]}, 'interface': 'bios', 'step': 'apply_configuration', 'abortable': False, 'priority': 0}: Redfish exception occurred. Error: Redfish BIOS apply configuration failed for node 9356bb77-2ce7-471d-8cd5-37c3446d7cd7. Error: HTTP PATCH https://10.46.61.19:443/redfish/v1/systems/1/bios/settings/ returned code 400. unknown error Extended information: none

Version-Release number of selected component (if applicable):
4.10.0-0.nightly-2021-12-23-153012

How reproducible:

Steps to Reproduce:
0. $ oc project openshift-machine-api
1. Set parameters as described above in one of the workers' HFS CRs:
   $ oc edit hfs openshift-worker-0
2. Annotate the worker for deletion and scale down the machineset:
   $ oc annotate machine ocp-edge-5jqsr-worker-0-mk5m4 machine.openshift.io/cluster-api-delete-machine=yes
   $ oc scale --replicas=1 machineset ocp-edge-5jqsr-worker-0

Actual results:
BMH stuck in preparation error

Expected results:
The BMH becomes available after the preparation step finishes successfully

Additional info:
must-gather: http://rhos-compute-node-10.lab.eng.rdu2.redhat.com/logs/BZ2036006-must-gather.tar.gz
It looks like the HPE node doesn't allow the changing of EnabledCoresPerProc to 1. Some HPE documentation https://support.hpe.com/hpesc/public/docDisplay?docId=emr_na-c03503454 shows that this value can only be changed to a multiple of 4. If you are mainly testing that an integer value can be changed, I would recommend trying with a different setting to see if that works. We could follow up with HPE as to the proper values that can be set for EnabledCoresPerProc.
As we discussed, setting the value to a multiple of 4 didn't help. I tried another parameter that had been working and got the same problem:

  Normal  6m57s  metal3-baremetal-controller  Node b30848d5-b264-486e-899e-65ac29ee5782 failed step {'args': {'settings': [{'name': 'NetworkBootRetryCount', 'value': '5'}]}, 'interface': 'bios', 'step': 'apply_configuration', 'abortable': False, 'priority': 0}: Redfish exception occurred. Error: Redfish BIOS apply configuration failed for node b30848d5-b264-486e-899e-65ac29ee5782. Error: HTTP PATCH https://10.46.61.26:443/redfish/v1/systems/1/bios/settings/ returned code 400. unknown error Extended information: none

Adding must-gather: http://rhos-compute-node-10.lab.eng.rdu2.redhat.com/logs/BZ2036006_2_must-gather.tar.gz
When setting Integer types manually via Redfish, both HPE and Dell require that an actual integer be sent, since no string-to-integer conversion is done.

On HPE:

curl -v --data '{"Attributes": {"NetworkBootRetryCount": "7"}}' -H "content-type: application/json" -k --user XX:XX -X PATCH https://10.19.109.247/redfish/v1/Systems/1/Bios/settings/

resulted in:

{"error":{"code":"iLO.0.10.ExtendedInfo","message":"See @Message.ExtendedInfo for more information.","@Message.ExtendedInfo":[{"MessageArgs":["\"7\"","NetworkBootRetryCount"],"MessageId":"Base.1.4.PropertyValueTypeError"}]}}
HTTP/1.1 400 Bad Request

While:

curl -v --data '{"Attributes": {"NetworkBootRetryCount": 7}}' -H "content-type: application/json" -k --user XX:XX -X PATCH https://10.19.109.247/redfish/v1/Systems/1/Bios/settings/

worked:

HTTP/1.1 200 OK

Same on Dell:

curl -v --data '{"Attributes": {"IscsiDev1Con1Retry": "4"}}' -H "content-type: application/json" -k --user XX:XX -X PATCH https://10.9.103.30/redfish/v1/Systems/System.Embedded.1/Bios/Settings

resulted in:

{"error":{"@Message.ExtendedInfo":[{"Message":"The value integer for the property IscsiDev1Con1Retry is of a different type than the property can accept.","MessageArgs":["integer","IscsiDev1Con1Retry"],"MessageArgs":2,"MessageId":"Base.1.2.PropertyValueTypeError",

While:

curl -v --data '{"Attributes": {"IscsiDev1Con1Retry": 4}}' -H "content-type: application/json" -k --user XX:XX -X PATCH https://10.9.103.30/redfish/v1/Systems/System.Embedded.1/Bios/Settings

worked.
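To illustrate the kind of conversion that seems to be missing, here is a minimal Python sketch. It is not the actual Ironic or baremetal-operator code. The function name coerce_bios_settings and the attribute_types mapping (attribute name to a type string such as "Integer") are hypothetical; I am assuming the type information could come from the firmware schema or the BIOS attribute registry. It takes settings in the same shape as the failed step above and builds a PATCH body with integers where the type calls for them.

```python
import json


def coerce_bios_settings(settings, attribute_types):
    """Build a Redfish BIOS PATCH body from a list of name/value settings.

    settings:        list of {"name": ..., "value": ...}, as in the failed step
    attribute_types: hypothetical mapping of attribute name -> type string;
                     where this comes from is an assumption of this sketch
    """
    attributes = {}
    for item in settings:
        name, value = item["name"], item["value"]
        # Only convert strings for attributes declared as Integer;
        # everything else is passed through unchanged.
        if attribute_types.get(name) == "Integer" and isinstance(value, str):
            value = int(value)
        attributes[name] = value
    return {"Attributes": attributes}


settings = [{"name": "NetworkBootRetryCount", "value": "5"}]
attribute_types = {"NetworkBootRetryCount": "Integer"}
payload = coerce_bios_settings(settings, attribute_types)
print(json.dumps(payload))  # {"Attributes": {"NetworkBootRetryCount": 5}}
```

With the string "5" converted to 5, the body matches the form that returned 200 OK in the manual curl test above.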
As we've seen, setting the parameter to an integer causes the host to get stuck on boot in the preparing or provisioning state. The deployment is IPv4, but for some reason the boot tries to use IPv6 (see attached picture).
Created attachment 1849252
boot is stuck on ipv6
Lubov - were you able to get it so that 5 is being set, instead of "5"?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056