Description of problem:
While troubleshooting bz1776929, the node remained stuck in the iPXE shell during POST. After the 20-minute introspection timeout, ironic tried to call set_boot_device on the node, but the call failed because the BMC returned UnableToModifyDuringSystemPOST. This message was well hidden, though: I had to add custom debug logging to see it, and I opened bz1820689 to address that.

I'm wondering whether inspector should shut down the node before sending BootSourceOverrideTarget. Should this be handled in ironic-inspector or in redfish?

Version-Release number of selected component (if applicable):
master

How reproducible:
All the time

Steps to Reproduce:
1. Launch introspection
2. Fail to load the iPXE image, so the node remains in the iPXE shell
3. Wait 20 minutes

Actual results:
On the second try, inspector fails with this traceback [1]

Expected results:
This shouldn't prevent inspector from doing a second try.

Additional info:
[1]
~~~
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server During handling of the above exception, another exception occurred:
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/ilo/management.py", line 279, in set_boot_device
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server     ilo_object.set_one_time_boot(boot_device)
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/proliantutils/ilo/client.py", line 459, in set_one_time_boot
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server     return self._call_method('set_one_time_boot', value)
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/proliantutils/ilo/client.py", line 341, in _call_method
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server     return method(*args, **kwargs)
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/proliantutils/redfish/redfish.py", line 610, in set_one_time_boot
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server     raise exception.IloError(msg)
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server proliantutils.exception.IloError: [iLO xxx] The Redfish controller failed to set one time boot device NETWORK. Error: HTTP PATCH https://xxx/redfish/v1/Systems/1 returned code 400. iLO.0.10.ExtendedInfo: See @Message.ExtendedInfo for more information.
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server During handling of the above exception, another exception occurred:
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 274, in dispatch
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/ironic_lib/metrics.py", line 60, in wrapped
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server     result = f(*args, **kwargs)
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 235, in inner
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server     return func(*args, **kwargs)
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/ironic/conductor/manager.py", line 3034, in set_boot_device
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server     persistent=persistent)
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/ironic_lib/metrics.py", line 60, in wrapped
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server     result = f(*args, **kwargs)
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/ironic/conductor/task_manager.py", line 148, in wrapper
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server     return f(*args, **kwargs)
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/ilo/management.py", line 286, in set_boot_device
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server     error=ilo_exception)
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server ironic.common.exception.IloOperationError: Setting pxe as boot device failed, error: [iLO xxx] The Redfish controller failed to set one time boot device NETWORK. Error: HTTP PATCH https://xxx/redfish/v1/Systems/1 returned code 400. iLO.0.10.ExtendedInfo: See @Message.ExtendedInfo for more information.
ironic/ironic-conductor.log:2020-04-01 17:38:06.494 7 ERROR oslo_messaging.rpc.server
~~~
As Ilya noted, isn't it up to proliantutils or the iLO driver to make that decision? Redfish itself does not require any specific power state when changing boot options.
I think the managed introspection functionality that merged during the last upstream development cycle (Ussuri) should effectively solve this, because ironic alone then manages the power and boot mode settings within a single workflow, at least as long as [inspector]require_managed_boot is set to True. The only realistic way to prevent this is for inspector to force the power state off before it runs, or for the driver to assert power off before changing the boot device.

I guess the machine was already powered on when inspection was triggered? Depending on the code path, it looks like the call goes to inspector, which then asks ironic to set the boot device to network and to reboot the node. My disconnect is: why is the node on even before this step?
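To make the "power off before changing the boot device" ordering concrete, here is a minimal sketch. It is not ironic, ironic-inspector, or proliantutils code: `FakeNode`, `BmcBusyError`, and `prepare_for_inspection` are hypothetical names, and the fake BMC only imitates the behaviour reported in this bug (rejecting a boot device change while the system is powered on and in POST).

```python
class BmcBusyError(Exception):
    """Hypothetical stand-in for the BMC rejecting a change during POST."""


class FakeNode:
    """Hypothetical in-memory node; not an ironic or proliantutils class."""

    def __init__(self, power_state="power on", in_post=True):
        self.power_state = power_state
        self.in_post = in_post
        self.boot_device = None
        self.calls = []  # records the order of operations

    def get_power_state(self):
        return self.power_state

    def set_power_state(self, state):
        self.calls.append(("set_power_state", state))
        self.power_state = state
        if state == "power off":
            # A powered-off system is no longer in POST.
            self.in_post = False

    def set_boot_device(self, device):
        self.calls.append(("set_boot_device", device))
        if self.power_state == "power on" and self.in_post:
            # Imitates the failure seen in this bug report.
            raise BmcBusyError("UnableToModifyDuringSystemPOST")
        self.boot_device = device


def prepare_for_inspection(node, device="pxe"):
    """Power the node off (if needed), set the boot device, then power on."""
    if node.get_power_state() != "power off":
        node.set_power_state("power off")
    node.set_boot_device(device)
    node.set_power_state("power on")


if __name__ == "__main__":
    stuck = FakeNode(power_state="power on", in_post=True)
    prepare_for_inspection(stuck)
    print(stuck.calls)
```

With this ordering, a node left sitting in the iPXE shell after a failed first attempt no longer blocks the second one, because the boot device is only changed while the node is off.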
Patch uploaded upstream to address this. The actual process in this case is driven by ironic-inspector. The earlier focus on proliantutils was misplaced: it is failing legitimately, just without much clarity, and patches have been proposed upstream to improve its error reporting.
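As an illustration of the clarity problem, the sketch below pulls the specific MessageId entries out of a Redfish error body instead of stopping at the generic "See @Message.ExtendedInfo" text seen in the traceback. It is not the proposed upstream patch: `describe_redfish_error` is a hypothetical helper, and the sample body, including the `iLO.2.8` registry prefix, is an assumption modelled on the standard Redfish error format rather than captured from the affected system.

```python
import json


def describe_redfish_error(body):
    """Return the MessageId values from a Redfish error body, if present.

    Falls back to the generic error message when no extended info exists.
    """
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        return "unparseable error body"

    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return "unknown error"

    infos = error.get("@Message.ExtendedInfo") or []
    message_ids = [
        info["MessageId"]
        for info in infos
        if isinstance(info, dict) and "MessageId" in info
    ]
    if message_ids:
        return ", ".join(message_ids)
    return error.get("message", "unknown error")


# Assumed sample body; the registry prefix "iLO.2.8" is illustrative only.
SAMPLE_BODY = json.dumps({
    "error": {
        "code": "iLO.0.10.ExtendedInfo",
        "message": "See @Message.ExtendedInfo for more information.",
        "@Message.ExtendedInfo": [
            {"MessageId": "iLO.2.8.UnableToModifyDuringSystemPOST"}
        ],
    }
})

if __name__ == "__main__":
    print(describe_redfish_error(SAMPLE_BODY))
```

Logging this string alongside the HTTP 400 would have exposed UnableToModifyDuringSystemPOST without any custom debugging.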
Low priority, no progress in the last year; closing as WONTFIX. If this needs to be reconsidered, please re-open.
I noticed that the referenced patch will be in OSP17, so I am linking it appropriately and moving this bug to the MODIFIED state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2022:6543