Description of problem: During ZTP deployment, the bmh image provisioning failed with following error. Normal ProvisioningStarted 65s metal3-baremetal-controller Image provisioning started for https://assisted-image-service-multicluster-engine.apps.kni-qe-17.lab.eng.rdu2.redhat.com/images/685cefb6-52b7-4aaa-96fa-a458f30991e2?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbmZyYV9lbnZfaWQiOiI2ODVjZWZiNi01MmI3LTRhYWEtOTZmYS1hNDU4ZjMwOTkxZTIifQ.-vZQpCAmItXqrhWvjlWpT0EmgjYY4Gr7pRreMkBFCyElEplsQRlIQeVAaIarEcM7p0J1-m1Qf2cX2sRlOpAOmg&arch=x86_64&type=minimal-iso&version=4.9 Normal ProvisioningError 45s metal3-baremetal-controller Image provisioning failed: Deploy step deploy.deploy failed with HTTPError: HTTP PATCH https://10.16.230.34/redfish/v1/Managers/1/VirtualMedia/EXT1 returned code 412. Base.1.8.GeneralError: A general error has occurred. See ExtendedInfo for more information. Extended information: [{'MessageSeverity': 'Critical', 'MessageArgs': [], 'MessageId': 'Base.1.8.PreconditionFailed', 'Resolution': 'Try the operation again using the appropriate ETag.', '@odata.type': '#Message.v1_1_0.Message', 'Message': 'The ETag supplied did not match the ETag required to change this resource.'}]. Version-Release number of selected component (if applicable): 4.10.20 How reproducible: 100% in my env Steps to Reproduce: 1. Start spoke cluster deployment via ZTP gitops. siteconfig used is here: http://registry.kni-qe-0.lab.eng.rdu2.redhat.com:3000/kni-qe/ztp-site-configs/src/242339a7e61248cfc76eea6fe8b77d6a943d6aa8/siteconfig/se450.yaml 2. Wait for iso to be created in infraenv 3. wait for provisioned state in bmh Actual results: 3. bmh image provisioning started and failed with following error: Normal ProvisioningStarted 65s metal3-baremetal-controller Image provisioning started for https://assisted-image-service-multicluster-engine.apps.kni-qe-17.lab.eng.rdu2.redhat.com/images/685cefb6-52b7-4aaa-96fa-a458f30991e2?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbmZyYV9lbnZfaWQiOiI2ODVjZWZiNi01MmI3LTRhYWEtOTZmYS1hNDU4ZjMwOTkxZTIifQ.-vZQpCAmItXqrhWvjlWpT0EmgjYY4Gr7pRreMkBFCyElEplsQRlIQeVAaIarEcM7p0J1-m1Qf2cX2sRlOpAOmg&arch=x86_64&type=minimal-iso&version=4.9 Normal ProvisioningError 45s metal3-baremetal-controller Image provisioning failed: Deploy step deploy.deploy failed with HTTPError: HTTP PATCH https://10.16.230.34/redfish/v1/Managers/1/VirtualMedia/EXT1 returned code 412. Base.1.8.GeneralError: A general error has occurred. See ExtendedInfo for more information. Extended information: [{'MessageSeverity': 'Critical', 'MessageArgs': [], 'MessageId': 'Base.1.8.PreconditionFailed', 'Resolution': 'Try the operation again using the appropriate ETag.', '@odata.type': '#Message.v1_1_0.Message', 'Message': 'The ETag supplied did not match the ETag required to change this resource.'}]. Expected results: bmh is in provisioned state. Additional info: Manually attaching an iso image via BMC GUI works fine for this server.
Moving to Jacob since was looking at the issue when Rhys mentioned on slack (he is already looking into alternatives)
Thank you for the report, Yang. Can I please ask you to confirm the exact Lenovo model in question? We've had at least one similar report so I'm trying to determine if these issues are linked to the same server model. Thank you, Jacob
@janders model is this: ThinkEdge SE450. And I can give you access to my server if needed.
running into the same issue on the same lenovo model. it was working fine for Hubcluster OCP 4.9 , ACM 2.4.x and ztp 4.10 but now failing for OCP 4.10.16, ACM 2.5.1. BMH is showing 412 error code. provisioning failed.
Yang, Rhys, can I please confirm the BIOS/firmware versions on your machines? We're having a discussion about this with the upstream Ironic community, this would be a very useful reference to compare results between different operators. Thank you!
Thank you Yang. So it's 1.10 BMC firmware revision. Rhys: I dug out a similar screenshot from our slack thread and it looks like you're on 1.10 as well - please correct me if I am wrong.
Added an upstream patch proposed by Julia which may help resolve this issue.
(In reply to Jacob Anders from comment #10) > Thank you Yang. So it's 1.10 BMC firmware revision. > > Rhys: I dug out a similar screenshot from our slack thread and it looks like > you're on 1.10 as well - please correct me if I am wrong. Firmware as below... BMC (Primary) Active 1.10 USX304W 2022-04-20 BMC (Backup) Inactive 1.10 USX304W 2022-04-20 UEFI Active 1.01 CME104I 2022-03-17
I tested the above linked patch (https://review.opendev.org/c/openstack/sushy/+/850547) and can seemingly now make requests against Redfish without eTag issues, but I'm having a problem proceeding further. I'm not sure if it's a side effect from the patch though. Image provisioning failed: Deploy step deploy.deploy failed: Redfish exception occurred. Error: Redfish set boot device failed for node a7573fbd-15c5-4cb6-b8c9-567b378b8b6f. Error: HTTP PATCH https://192.168.1.1/redfish/v1/Systems/1/Pending returned code 400. Base.1.8.GeneralError: A general error has occurred. See ExtendedInfo for more information. Extended information: [{'@odata.type': '#Message.v1_1_0.Message', 'Resolution': 'Remove the property from the request body and resubmit the request if the operation failed.', 'MessageArgs': ['BootSourceOverrideTarget'], 'MessageSeverity': 'Warning', 'MessageId': 'Base.1.8.PropertyNotWritable', 'Message': 'The property BootSourceOverrideTarget is a read only property and cannot be assigned a value.'}]. It seems to be trying to navigate to /Pending, is that to be expected?
Just chiming in since I haven't had a chance to cycle back as of yet. No, it is not expected AFAIK. We're clearly seeing varying/differing behavior here, which I think is going to require some careful re-reading of spec documents before we'll make progress. FWIW, there is a community/operator and customer who has observed the exact same behavior with their brand new lenovo server when they also grabbed the patch to give it a try. I *believe*, based upon comments they are making and questions they are asking, that they might have found themselves getting further along with some additional changes/logic, but I've not gotten details nor have they posted a patch yet.
no patch found yet? we are waiting for some workaround to get these servers certified on OCP 4.9.x.
So, the navigation to /Pending doesn't seem to be in the DMTF specification. I *suspect* it is a result of the implementor's decisions around error handling. BootSourceOverrideTarget is also explicitly listed as a read-write field, so I'm suspecting that machine is in the territory of some sort of bug. Or it could be in some sort of unrelated state, which is temporarily treating the field as read-only. Fixing etag behavior likely wouldn't change that behavior. In regards to the upstream Etag patch, we've requested reviews from representitives from multiple hardware vendors to validate that they do not foresee any issues.
(In reply to mmorency from comment #15) > no patch found yet? we are waiting for some workaround to get these servers > certified on OCP 4.9.x. Unfortunately there is no patch or workaround at the moment. As discussed on Slack I'm intending to have a closer look at this in few weeks time (will be off next week).
Also confirming that Julia's further patch set - https://review.opendev.org/c/openstack/sushy/+/850547 - also resolves the eTag issue on the ThinkEdge SE450, but we're still left with the strange "/Pending" redirect for virtual media attachment.
The /Pending redirect is actually coming from the System itself: "@Redfish.Settings": { "SupportedApplyTimes": [ "OnReset" ], "Messages": [], "@odata.type": "#Settings.v1_3_0.Settings", "SettingsObject": { "@odata.id": "/redfish/v1/Systems/1/Pending" }, "Time": null }, Why it does not accept the update is beyond my understanding.
Early patch is up for review: https://review.opendev.org/c/openstack/sushy/+/856123 I hope to test it on SE450 later this week (Marc kindly provided me with access to one). Will keep you posted.
I've done some specific testing on Lenovo SE450 (big thanks to Marc and team for providing me access). I again ran into the previously reported problem with the /Pending/ URL (as described in https://bugzilla.redhat.com/show_bug.cgi?id=2103710#c19) I investigated the problem in detail and here's where sushy/ironic are hitting the problem: https://opendev.org/openstack/sushy/src/branch/master/sushy/resources/system/system.py#L286 If a BMC provides a SettingsURL, we attempt to use it. For some reason, the SE450 provides us a non-functional Settings URL yet if we ignore it and issue a PATCH request against the System, that works. This looks like a BMC firmware bug that should be addressed in firmware. I will raise this specifically in a discussion with the Vendor (it's been mentioned alongside eTag issues but I think we need a dedicated discussion). I will also investigate possible workarounds.
Added a gerrit link to a sushy patch with a possible workaround for consideration
Customer hardware: Lenovo / ThinkSystem SR650 V2
{ 'ProductName': 'ThinkSystem SR650 V2', 'SystemId': '7Z73CTO1WW', 'FirmwareVersion': 'AFBT20N' } 'firmware_version': 'AFBT20N // 1.80 // 2022-04-19' highlights from Ironic logs: ~~~ 2022-09-15 11:10:38.416 1 DEBUG sushy.connector [req-e2a2a204-d345-4630-9800-cc8123146889 - - - - -] HTTP request: PATCH https://<server_ip>/redfish/v1/Managers/1/VirtualMedia/EXT1; headers: {'If-Match': 'W/"47ec134fd3872967abd","47ec134fd3872967abd"', 'Content-Type': 'application/json', 'OData-Version': '4.0'}; body: {'Image': 'http://10.1.54.8:80/redfish/boot-1bfdc2e3-a486-4d0c-80a0-d043c2ff7b14.iso', 'Inserted': True, 'WriteProtected': True}; blocking: False; timeout: 60; session arguments: {}; _op /usr/lib/python3.6/site-packages/sushy/connector.py:111 2022-09-15 11:10:38.579 1 DEBUG sushy.exceptions [req-e2a2a204-d345-4630-9800-cc8123146889 - - - - -] HTTP response for PATCH https://<server_ip>/redfish/v1/Managers/1/VirtualMedia/EXT1: status code: 412, error: Base.1.11.GeneralError: A general error has occurred. See ExtendedInfo for more information., extended: [{'@odata.type': '#Message.v1_1_2.Message', 'Resolution': 'Try the operation again using the appropriate ETag.', 'Message': 'The ETag supplied did not match the ETag required to change this resource.', 'MessageArgs': [], 'MessageSeverity': 'Critical', 'MessageId': 'Base.1.11.PreconditionFailed'}] __init__ /usr/lib/python3.6/site-packages/sushy/exceptions.py:122 2022-09-15 11:10:38.580 1 ERROR ironic.drivers.modules.inspector [req-e2a2a204-d345-4630-9800-cc8123146889 - - - - -] Unable to start managed inspection for node 1bfdc2e3-a486-4d0c-80a0-d043c2ff7b14: HTTP PATCH https://10.1.82.23/redfish/v1/Managers/1/VirtualMedia/EXT1 returned code 412. Base.1.11.GeneralError: A general error has occurred. See ExtendedInfo for more information. Extended information: [{'@odata.type': '#Message.v1_1_2.Message', 'Resolution': 'Try the operation again using the appropriate ETag.', 'Message': 'The ETag supplied did not match the ETag required to change this resource.', 'MessageArgs': [], 'MessageSeverity': 'Critical', 'MessageId': 'Base.1.11.PreconditionFailed'}]: sushy.exceptions.HTTPError: HTTP PATCH https://<server_ip>/redfish/v1/Managers/1/VirtualMedia/EXT1 returned code 412. Base.1.11.GeneralError: A general error has occurred. See ExtendedInfo for more information. Extended information: [{'@odata.type': '#Message.v1_1_2.Message', 'Resolution': 'Try the operation again using the appropriate ETag.', 'Message': 'The ETag supplied did not match the ETag required to change this resource.', 'MessageArgs': [], 'MessageSeverity': 'Critical', 'MessageId': 'Base.1.11.PreconditionFailed'}] 2022-09-15 11:10:38.580 1 ERROR ironic.drivers.modules.inspector Traceback (most recent call last): 2022-09-15 11:10:38.580 1 ERROR ironic.drivers.modules.inspector File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/inspector.py", line 210, in _start_managed_inspection 2022-09-15 11:10:38.580 1 ERROR ironic.drivers.modules.inspector task.driver.boot.prepare_ramdisk(task, ramdisk_params=params) 2022-09-15 11:10:38.580 1 ERROR ironic.drivers.modules.inspector File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/redfish/boot.py", line 558, in prepare_ramdisk 2022-09-15 11:10:38.580 1 ERROR ironic.drivers.modules.inspector _insert_vmedia(task, managers, iso_ref, sushy.VIRTUAL_MEDIA_CD) 2022-09-15 11:10:38.580 1 ERROR ironic.drivers.modules.inspector File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 329, in wrapped_f 2022-09-15 11:10:38.580 1 ERROR ironic.drivers.modules.inspector return self.call(f, *args, **kw) 2022-09-15 11:10:38.580 1 ERROR ironic.drivers.modules.inspector File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 409, in call 2022-09-15 11:10:38.580 1 ERROR ironic.drivers.modules.inspector do = self.iter(retry_state=retry_state) 2022-09-15 11:10:38.580 1 ERROR ironic.drivers.modules.inspector File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 356, in iter 2022-09-15 11:10:38.580 1 ERROR ironic.drivers.modules.inspector return fut.result() 2022-09-15 11:10:38.580 1 ERROR ironic.drivers.modules.inspector File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 425, in result 2022-09-15 11:10:38.580 1 ERROR ironic.drivers.modules.inspector return self.__get_result() 2022-09-15 11:10:38.580 1 ERROR ironic.drivers.modules.inspector File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 384, in __get_result 2022-09-15 11:10:38.580 1 ERROR ironic.drivers.modules.inspector raise self._exception 2022-09-15 11:10:38.580 1 ERROR ironic.drivers.modules.inspector File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 412, in call 2022-09-15 11:10:38.580 1 ERROR ironic.drivers.modules.inspector result = fn(*args, **kwargs) 2022-09-15 11:10:38.580 1 ERROR ironic.drivers.modules.inspector File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/redfish/boot.py", line 200, in _insert_vmedia 2022-09-15 11:10:38.580 1 ERROR ironic.drivers.modules.inspector write_protected=True) 2022-09-15 11:10:38.580 1 ERROR ironic.drivers.modules.inspector File "/usr/lib/python3.6/site-packages/sushy/resources/manager/virtual_media.py", line 167, in insert_media 2022-09-15 11:10:38.580 1 ERROR ironic.drivers.modules.inspector self._conn.patch(target_uri, data=payload, headers=headers) 2022-09-15 11:10:38.580 1 ERROR ironic.drivers.modules.inspector File "/usr/lib/python3.6/site-packages/sushy/connector.py", line 247, in patch 2022-09-15 11:10:38.580 1 ERROR ironic.drivers.modules.inspector **extra_session_req_kwargs) 2022-09-15 11:10:38.580 1 ERROR ironic.drivers.modules.inspector File "/usr/lib/python3.6/site-packages/sushy/connector.py", line 127, in _op 2022-09-15 11:10:38.580 1 ERROR ironic.drivers.modules.inspector exceptions.raise_for_response(method, url, response) 2022-09-15 11:10:38.580 1 ERROR ironic.drivers.modules.inspector File "/usr/lib/python3.6/site-packages/sushy/exceptions.py", line 172, in raise_for_response 2022-09-15 11:10:38.580 1 ERROR ironic.drivers.modules.inspector raise HTTPError(method, url, response) 2022-09-15 11:10:38.580 1 ERROR ironic.drivers.modules.inspector sushy.exceptions.HTTPError: HTTP PATCH https://<server_ip>/redfish/v1/Managers/1/VirtualMedia/EXT1 returned code 412. Base.1.11.GeneralError: A general error has occurred. See ExtendedInfo for more information. Extended information: [{'@odata.type': '#Message.v1_1_2.Message', 'Resolution': 'Try the operation again using the appropriate ETag.', 'Message': 'The ETag supplied did not match the ETag required to change this resource.', 'MessageArgs': [], 'MessageSeverity': 'Critical', 'MessageId': 'Base.1.11.PreconditionFailed'}] 2022-09-15 11:10:38.580 1 ERROR ironic.drivers.modules.inspector
Hi Pavitra and Emre, There's been some confusion around this BZ due to the fact we're trying to cover two separate issues in it: 1) HTTP 412 error as per the initial description ( https://bugzilla.redhat.com/show_bug.cgi?id=2103710#c0 ) 2) Base.1.8.PropertyNotWritable error against /Pending/ URL ( https://bugzilla.redhat.com/show_bug.cgi?id=2103710#c13 ) In hindsight, I should have requested a second BZ to be opened against 2) but I was too focused on troubleshooting and we just kept going with the second issue in the same BZ. I will split it out later this week for the benefit of more clarity while tracking these issues going forward. At this point it is not clear whether issue 2) is specific to Lenovo SE450 only or whether it may impact other Lenovo models and possibly non-Lenovo machines. So far we have only observed the issue 2) on the SE450 model. Issue 1) should be resolved by this upstream patch: https://review.opendev.org/c/openstack/sushy/+/850547/ which should be included in OpenShift 4.10.34 z-stream. Would your customer be in position to upgrade to the latest 4.10 z-stream, try again and report back whether the fix to issue 1) has been successful? This would greatly assist in our troubleshooting and issue tracking. Setting status of this BZ to MODIFIED to reflect the real state of the fix - it is already available in the 4.10 codebase. Thank you, Jacob
NOTE: a new bug focusing on the /Pending/ URL and read only BootSourceOverrideEnabled: https://issues.redhat.com/browse/OCPBUGS-2052. I will continue working on this second issue in the new bug, at this stage development work on this bug is complete (verification of the fix is required).
Pavira, Emre Just following up on my comment: https://bugzilla.redhat.com/show_bug.cgi?id=2103710#c27. Would you be able to have your customers verify that the issue is fixed in the latest 4.10.z release? Thank you, Jacob
Hello Team, Any update on this? Thanks, Vinu K
The two issues affecting Lenovo SE450 that this bug touches on have been fixed in 4.13 / master and are awaiting verification on the specific hardware to be completed: https://issues.redhat.com/browse/OCPBUGS-3480 https://issues.redhat.com/browse/OCPBUGS-4225 Once this is done, we will commence backporting to 4.12, 4.11.z and 4.10.z. Given these have been fixed in master (and backports will be tracked in JIRA) - and I haven't heard further on my questions in https://bugzilla.redhat.com/show_bug.cgi?id=2103710#c32 I will proceed to close this bug. I would normally close this as duplicate of https://issues.redhat.com/browse/OCPBUGS-3480 but given I can't mark duplicates across Bugzilla and JIRA, closing with CANTFIX instead.