Bug 2022426 - [OCP 4.9] UEFISecureBoot fails intermitently - lifecycle controller job queue
Summary: [OCP 4.9] UEFISecureBoot fails intermitently - lifecycle controller job queue
Keywords:
Status: CLOSED DUPLICATE of bug 1988879
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Bare Metal Hardware Provisioning
Version: 4.9
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: ---
: 4.10.0
Assignee: Jacob Anders
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-11-11 15:40 UTC by davis phillips
Modified: 2021-11-30 00:48 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-11-30 00:48:35 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description davis phillips 2021-11-11 15:40:07 UTC
Description of problem:
If secureBoot is already configured in the BIOS it seems the bmh gets into a loop with submitting the job or there are any other lifecycle jobs pending.

  Normal  Registered              11m    metal3-baremetal-controller  Registered new host
  Normal  InspectionSkipped       11m    metal3-baremetal-controller  disabled by annotation
  Normal  BMCAccessValidated      11m    metal3-baremetal-controller  Verified access to BMC
  Normal  ProfileSet              11m    metal3-baremetal-controller  Hardware profile set: unknown
  Normal  PowerOn                 11m    metal3-baremetal-controller  Host powered on
  Normal  Registered              9m53s  metal3-baremetal-controller  Registered new host
  Normal  ProvisioningStarted     9m32s  metal3-baremetal-controller  Image provisioning started for https://assisted-image-service-open-cluster-management.apps.clus3a.t5g.lab.eng.bos.redhat.com/images/e38d94fd-1e57-4da8-bd6b-0903e856d8a3?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbmZyYV9lbnZfaWQiOiJlMzhkOTRmZC0xZTU3LTRkYTgtYmQ2Yi0wOTAzZTg1NmQ4YTMifQ.Y43XSDaUahJdRTmImzSwjKSyJIHy-ryswIcPVcSkGx1sx8qkT5SbjZrq_Zl12DgsGtcDnhPGn-40WWwfD6Rf-g&arch=x86_64&type=minimal-iso&version=4.9
  Normal  ProvisioningError       9m11s  metal3-baremetal-controller  Image provisioning failed: Failed to deploy. Exception: HTTP POST https://10.19.28.55/redfish/v1/Managers/iDRAC.Embedded.1/VirtualMedia/CD/Actions/VirtualMedia.InsertMedia returned code 400. Base.1.7.GeneralError: A general error has occurred. See ExtendedInfo for more information Extended information: [{'Message': 'Unable to mount remote share http://10.19.32.197:6180/redfish/boot-dfa94054-d7a0-4b92-858e-2b76e77ffeff.iso?filename=tmp_uxe1rp4.iso.', 'MessageArgs': ['http://10.19.32.197:6180/redfish/boot-dfa94054-d7a0-4b92-858e-2b76e77ffeff.iso?filename=tmp_uxe1rp4.iso'], 'MessageArgs': 1, 'MessageId': 'IDRAC.2.4.RAC0720', 'RelatedProperties': ['#/Image'], 'RelatedProperties': 1, 'Resolution': 'Retry the operation.', 'Severity': 'Informational'}]
  Normal  DeprovisioningStarted   7m52s  metal3-baremetal-controller  Image deprovisioning started
  Normal  DeprovisioningComplete  7m42s  metal3-baremetal-controller  Image deprovisioning completed
  Normal  ProvisioningStarted     7m41s  metal3-baremetal-controller  Image provisioning started for https://assisted-image-service-open-cluster-management.apps.clus3a.t5g.lab.eng.bos.redhat.com/images/e38d94fd-1e57-4da8-bd6b-0903e856d8a3?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbmZyYV9lbnZfaWQiOiJlMzhkOTRmZC0xZTU3LTRkYTgtYmQ2Yi0wOTAzZTg1NmQ4YTMifQ.Y43XSDaUahJdRTmImzSwjKSyJIHy-ryswIcPVcSkGx1sx8qkT5SbjZrq_Zl12DgsGtcDnhPGn-40WWwfD6Rf-g&arch=x86_64&type=minimal-iso&version=4.9
  Normal  ProvisioningError       7m31s  metal3-baremetal-controller  Image provisioning failed: Deploy step deploy.deploy failed: Redfish exception occurred. Error: Failed to set secure boot state on node dfa94054-d7a0-4b92-858e-2b76e77ffeff to True: HTTP PATCH https://10.19.28.55/redfish/v1/Systems/System.Embedded.1/SecureBoot returned code 400. Base.1.7.GeneralError: Pending configuration values are already committed, unable to perform another set operation. Extended information: [{'Message': 'Pending configuration values are already committed, unable to perform another set operation.', 'MessageArgs': ['SecureBootEnable'], 'MessageArgs': 1, 'MessageId': 'IDRAC.2.4.SYS011', 'RelatedProperties': ['#/SecureBootEnable'], 'RelatedProperties': 1, 'Resolution': 'Wait for the scheduled job to complete or delete the configuration jobs before attempting more set attribute operations.', 'Severity': 'Warning'}].
  Normal  DeprovisioningStarted   6m4s   metal3-baremetal-controller  Image deprovisioning started
  Normal  DeprovisioningComplete  5m54s  metal3-baremetal-controller  Image deprovisioning completed
  Normal  ProvisioningStarted     5m53s  metal3-baremetal-controller  Image provisioning started for https://assisted-image-service-open-cluster-management.apps.clus3a.t5g.lab.eng.bos.redhat.com/images/e38d94fd-1e57-4da8-bd6b-0903e856d8a3?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbmZyYV9lbnZfaWQiOiJlMzhkOTRmZC0xZTU3LTRkYTgtYmQ2Yi0wOTAzZTg1NmQ4YTMifQ.Y43XSDaUahJdRTmImzSwjKSyJIHy-ryswIcPVcSkGx1sx8qkT5SbjZrq_Zl12DgsGtcDnhPGn-40WWwfD6Rf-g&arch=x86_64&type=minimal-iso&version=4.9
  Normal  ProvisioningError       5m43s  metal3-baremetal-controller  Image provisioning failed: Deploy step deploy.deploy failed: Redfish exception occurred. Error: Failed to set secure boot state on node dfa94054-d7a0-4b92-858e-2b76e77ffeff to True: HTTP PATCH https://10.19.28.55/redfish/v1/Systems/System.Embedded.1/SecureBoot returned code 400. Base.1.7.GeneralError: Pending configuration values are already committed, unable to perform another set operation. Extended information: [{'Message': 'Pending configuration values are already committed, unable to perform another set operation.', 'MessageArgs': ['SecureBootEnable'], 'MessageArgs': 1, 'MessageId': 'IDRAC.2.4.SYS011', 'RelatedProperties': ['#/SecureBootEnable'], 'RelatedProperties': 1, 'Resolution': 'Wait for the scheduled job to complete or delete the configuration jobs before attempting more set attribute operations.', 'Severity': 'Warning'}].
  Normal  DeprovisioningStarted   3m50s  metal3-baremetal-controller  Image deprovisioning started
  Normal  DeprovisioningComplete  3m40s  metal3-baremetal-controller  Image deprovisioning completed
  Normal  ProvisioningStarted     3m39s  metal3-baremetal-controller  Image provisioning started for https://assisted-image-service-open-cluster-management.apps.clus3a.t5g.lab.eng.bos.redhat.com/images/e38d94fd-1e57-4da8-bd6b-0903e856d8a3?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbmZyYV9lbnZfaWQiOiJlMzhkOTRmZC0xZTU3LTRkYTgtYmQ2Yi0wOTAzZTg1NmQ4YTMifQ.Y43XSDaUahJdRTmImzSwjKSyJIHy-ryswIcPVcSkGx1sx8qkT5SbjZrq_Zl12DgsGtcDnhPGn-40WWwfD6Rf-g&arch=x86_64&type=minimal-iso&version=4.9
  Normal  ProvisioningError       3m29s  metal3-baremetal-controller  Image provisioning failed: Deploy step deploy.deploy failed: Redfish exception occurred. Error: Failed to set secure boot state on node dfa94054-d7a0-4b92-858e-2b76e77ffeff to True: HTTP PATCH https://10.19.28.55/redfish/v1/Systems/System.Embedded.1/SecureBoot returned code 400. Base.1.7.GeneralError: Pending configuration values are already committed, unable to perform another set operation. Extended information: [{'Message': 'Pending configuration values are already committed, unable to perform another set operation.', 'MessageArgs': ['SecureBootEnable'], 'MessageArgs': 1, 'MessageId': 'IDRAC.2.4.SYS011', 'RelatedProperties': ['#/SecureBootEnable'], 'RelatedProperties': 1, 'Resolution': 'Wait for the scheduled job to complete or delete the configuration jobs before attempting more set attribute operations.', 'Severity': 'Warning'}].
  Normal  Registered              112s   metal3-baremetal-controller  Registered new host
  Normal  ProvisioningStarted     91s    metal3-baremetal-controller  Image provisioning started for https://assisted-image-service-open-cluster-management.apps.clus3a.t5g.lab.eng.bos.redhat.com/images/e38d94fd-1e57-4da8-bd6b-0903e856d8a3?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbmZyYV9lbnZfaWQiOiJlMzhkOTRmZC0xZTU3LTRkYTgtYmQ2Yi0wOTAzZTg1NmQ4YTMifQ.Y43XSDaUahJdRTmImzSwjKSyJIHy-ryswIcPVcSkGx1sx8qkT5SbjZrq_Zl12DgsGtcDnhPGn-40WWwfD6Rf-g&arch=x86_64&type=minimal-iso&version=4.9
  Normal  ProvisioningError       81s    metal3-baremetal-controller  Image provisioning failed: Failed to deploy. Exception: HTTP POST https://10.19.28.55/redfish/v1/Managers/iDRAC.Embedded.1/VirtualMedia/CD/Actions/VirtualMedia.InsertMedia returned code 400. Base.1.7.GeneralError: A general error has occurred. See ExtendedInfo for more information Extended information: [{'Message': 'Unable to mount remote share http://10.19.32.196:6180/redfish/boot-42afd08e-43d1-4605-a80e-b2c8e5f72c7f.iso?filename=tmpw39ywyf7.iso.', 'MessageArgs': ['http://10.19.32.196:6180/redfish/boot-42afd08e-43d1-4605-a80e-b2c8e5f72c7f.iso?filename=tmpw39ywyf7.iso'], 'MessageArgs': 1, 'MessageId': 'IDRAC.2.4.RAC0720', 'RelatedProperties': ['#/Image'], 'RelatedProperties': 1, 'Resolution': 'Retry the operation.', 'Severity': 'Informational'}]


Version-Release number of selected component (if applicable):
Dell MX740c blade
OpenShift 4.9.5 Hub cluster

How reproducible:

Steps to Reproduce:
1. Enable secure boot in the BIOS manually. 
2. Create the CR to deploy the node with secure boot enabled.

spec:
  online: true
  automatedCleaningMode: disabled
  bootMACAddress: xx:xx:xx:xx:xx:xx
  bootMode: UEFISecureBoot
  bmc:
    address: idrac-virtualmedia+https://10.19.x.y/redfish/v1/Systems/System.Embedded.1
    credentialsName: sno-worker-2-worker
    disableCertificateVerification: true

3. Machine fails to provision.

2021-11-11 15:13:23.132 1 DEBUG sushy.resources.base [req-6a13ffb3-51d3-415d-b996-5adc9d32a3d0 ironic-user - - - -] Received representation of SecureBoot /redfish/v1/Systems/System.Embedded.1/SecureBoot: {'_actions': {'reset_keys': {'allowed_values': ['ResetAllKeysToDefault', 'DeleteAllKeys', 'DeletePK', 'ResetPK', 'ResetKEK', 'ResetDB', 'ResetDBX'], 'operation_apply_time_support': None, 'target_uri': '/redfish/v1/Systems/System.Embedded.1/SecureBoot/Actions/SecureBoot.ResetKeys'}}, '_oem_vendors': ['Dell'], 'current_boot': 'Disabled', 'description': 'UEFI Secure Boot', 'enabled': False, 'identity': 'SecureBoot', 'links': None, 'mode': 'DeployedMode', 'name': 'UEFI Secure Boot'} refresh /usr/lib/python3.6/site-packages/sushy/resources/base.py:636
2021-11-11 15:13:23.150 1 INFO ironic.conductor.utils [req-6a13ffb3-51d3-415d-b996-5adc9d32a3d0 ironic-user - - - -] Updated boot_mode uefi, secure_boot Falsefor node f12f4617-7423-45cf-8d79-087c7cfad18f
2021-11-11 15:13:23.150 1 DEBUG ironic.common.states [req-6a13ffb3-51d3-415d-b996-5adc9d32a3d0 ironic-user - - - -] Exiting old state 'verifying' in response to event 'done' on_exit /usr/lib/python3.6/site-packages/ironic/common/states.py:307
2021-11-11 15:13:23.151 1 DEBUG ironic.common.states [req-6a13ffb3-51d3-415d-b996-5adc9d32a3d0 ironic-user - - - -] Entering new state 'manageable' in response to event 'done' on_enter /usr/lib/python3.6/site-packages/ironic/common/states.py:313
/usr/lib/python3.6/site-packages/urllib3/connectionpool.py:847: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning)
2021-11-11 15:13:23.167 1 INFO ironic.conductor.task_manager [req-6a13ffb3-51d3-415d-b996-5adc9d32a3d0 ironic-user - - - -] Node f12f4617-7423-45cf-8d79-087c7cfad18f moved to provision state "manageable" from state "verifying"; target provision state is "None"
2021-11-11 15:13:23.171 1 DEBUG ironic.conductor.task_manager [req-6a13ffb3-51d3-415d-b996-5adc9d32a3d0 ironic-user - - - -] Successfully released exclusive lock for caching boot_mode or secure_boot state on node f12f4617-7423-45cf-8d79-087c7cfad18f (lock was held 11.45 sec) release_resources /usr/lib/python3.6/site-packages/ironic/conductor/task_manager.py:447
2021-11-11 15:13:23.274 1 DEBUG sushy.connector [req-cbde89cd-cb2e-4cba-baa6-49b5239d0617 ironic-user - - - -] HTTP response for GET https://10.19.28.52/redfish/v1/Systems/System.Embedded.1/SecureBoot: status code: 200 _op /usr/lib/python3.6/site-packages/sushy/connector.py:185
2021-11-11 15:13:23.275 1 DEBUG sushy.resources.base [req-cbde89cd-cb2e-4cba-baa6-49b5239d0617 ironic-user - - - -] Received representation of SecureBoot /redfish/v1/Systems/System.Embedded.1/SecureBoot: {'_actions': {'reset_keys': {'allowed_values': ['ResetAllKeysToDefault', 'DeleteAllKeys', 'DeletePK', 'ResetPK', 'ResetKEK', 'ResetDB', 'ResetDBX'], 'operation_apply_time_support': None, 'target_uri': '/redfish/v1/Systems/System.Embedded.1/SecureBoot/Actions/SecureBoot.ResetKeys'}}, '_oem_vendors': ['Dell'], 'current_boot': 'Disabled', 'description': 'UEFI Secure Boot', 'enabled': False, 'identity': 'SecureBoot', 'links': None, 'mode': 'DeployedMode', 'name': 'UEFI Secure Boot'} refresh /usr/lib/python3.6/site-packages/sushy/resources/base.py:636
2021-11-11 15:13:23.287 1 INFO ironic.conductor.utils [req-cbde89cd-cb2e-4cba-baa6-49b5239d0617 ironic-user - - - -] Updated boot_mode uefi, secure_boot Falsefor node 05b54b51-4d32-4dd2-a0af-7afd1feb5217
2021-11-11 15:13:23.287 1 DEBUG ironic.common.states [req-cbde89cd-cb2e-4cba-baa6-49b5239d0617 ironic-user - - - -] Exiting old state 'verifying' in response to event 'done' on_exit /usr/lib/python3.6/site-packages/ironic/common/states.py:307
2021-11-11 15:13:23.288 1 DEBUG ironic.common.states [req-cbde89cd-cb2e-4cba-baa6-49b5239d0617 ironic-user - - - -] Entering new state 'manageable' in response to event 'done' on_enter /usr/lib/python3.6/site-packages/ironic/common/states.py:313
2021-11-11 15:13:23.299 1 INFO ironic.conductor.task_manager [req-cbde89cd-cb2e-4cba-baa6-49b5239d0617 ironic-user - - - -] Node 05b54b51-4d32-4dd2-a0af-7afd1feb5217 moved to provision state "manageable" from state "verifying"; target provision state is "None"
2021-11-11 15:13:23.302 1 DEBUG sushy.connector [req-9e3ff094-a501-4fb6-a3e4-3b2e8a93b5b7 ironic-user - - - -] HTTP response for GET https://10.19.28.50/redfish/v1/Systems/System.Embedded.1: status code: 200 _op /usr/lib/python3.6/site-packages/sushy/connector.py:185


Expected results:
Provisioned host with SecureBoot enabled.

Workaround:
1. Remove bmh CR
2. Clear lifecycle controller job queue.
3. Add bmh CR again

The BMO could clear the job queue or power on the server to apply any pending jobs.

Comment 1 Jacob Anders 2021-11-12 04:43:51 UTC
Thank you for the report Davis.

We recently fixed a very similar problem in https://bugzilla.redhat.com/show_bug.cgi?id=1988879.

Would you be able to test a 4.10 build and see if the issue is gone, or still persists?

Thank you,
Jacob

Comment 2 Jacob Anders 2021-11-15 23:36:34 UTC
I realised I didn't set needinfo while asking the question, adding it now.

Comment 3 davis phillips 2021-11-16 16:24:51 UTC
Hey Jacob! Thanks for the quick follow up. This fix appears to have resolved the issue for me.

Comment 4 Jacob Anders 2021-11-18 23:00:12 UTC
That's great, thank you Davis.

Would you be happy for me to close this bug as a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1988879?

Backporting this fix to 4.9 may be difficult, as it relies on a new Ironic feature (verify steps) so I'd be hesitant to do it unless there is a strong need. If that is the case please let me know and we'll raise it with the Leadership Team.

Comment 5 Jacob Anders 2021-11-28 22:58:54 UTC
Hi Davis,

Do you still require assistance with this?

Thank you,
Jacob

Comment 6 davis phillips 2021-11-29 14:41:21 UTC
Apologies for the delay in response. The 4.10 release should suffice. Please close this issue.

Comment 7 Jacob Anders 2021-11-30 00:48:35 UTC
Thank you Davis. As per your response, closing this BZ as a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1988879.

*** This bug has been marked as a duplicate of bug 1988879 ***


Note You need to log in before you can comment on or make changes to this bug.