This bug was initially created as a copy of Bug #2027544

I am copying this bug because:

Description of problem:

A customer is finding that the power status within Ironic of some of their baremetal nodes is being set to None after a period of time (typically 24 hours or more). The customer is using Redfish to manage Dell servers via iDRACs.

After enabling debug within Ironic, the logs show the following sequence when the issue occurs:

1. Ironic with Redfish attempts a GET against https://x.x.x.x/redfish/v1/Systems/System.Embedded.1, which fails because the authentication credentials are missing or invalid (likely an expired session).

2. Ironic then performs a GET against https://x.x.x.x/redfish/v1/SessionService, but a transitory network issue occurs while this request is being processed and the connection fails. The error message in the Ironic debug logs includes the text "Error <....> while attempting to establish a session. Falling back to basic authentication", which maps to _do_authenticate(..) in Sushy's auth.py.

3. All subsequent Ironic / Redfish operations against that specific iDRAC now fail with invalid credentials, suggesting that the iDRAC likely does not support Redfish with Basic Authentication.

Earlier entries in the logs show that session renewal works fine when there are no networking interruptions. Also, restarting Ironic brings all the nodes that were marked with power state None back to their correct power status, presumably because Sushy has returned to authenticating through SessionService rather than Basic Authentication.

This is a significant issue for the customer as they are losing Ironic's node power status and manageability in production. The behaviour suggests that Sushy's fallback to Basic Auth assumes the target Redfish device supports that mode, which is not necessarily the case.
Version-Release number of selected component (if applicable):
OSP 16.1

How reproducible:
Occurring in multiple production and lab OpenStack regions at the customer. Requires transitory network issues to observe.

Steps to Reproduce:
1. Deploy OSP 16.1 using Redfish with Ironic for management
2. Induce transitory network issues that cause connection failures during Redfish operations
3. Observe Ironic node power state marked as None over time

Actual results:
Ironic node power states are None

Expected results:
Ironic node power state should reflect the actual power state of the node

Additional info:
Customer's Ironic debug logs are available from the associated support case.
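To make the suspected failure mode concrete, here is a small self-contained simulation of the sequence described above. It does not use Sushy or Ironic code: FakeBMC, Client, simulate and the retry_session_after_fallback flag are all hypothetical names invented for this sketch, and the "retry" behaviour is only one possible mitigation, not necessarily what any patch implements. It assumes a BMC that accepts session tokens only and rejects Basic Auth.

```python
class ConnectionFailed(Exception):
    """Stands in for a transient network error."""


class AccessDenied(Exception):
    """Stands in for an HTTP 401 from the BMC."""


class FakeBMC:
    """Hypothetical BMC that accepts session tokens only (no Basic Auth)."""

    def __init__(self):
        self.session_service_up = True
        self._valid_token = None
        self._counter = 0

    def create_session(self):
        if not self.session_service_up:
            raise ConnectionFailed("transient network error")
        self._counter += 1
        self._valid_token = f"token-{self._counter}"
        return self._valid_token

    def expire_session(self):
        self._valid_token = None

    def get_power_state(self, auth):
        mode, token = auth
        if mode != "session" or token is None or token != self._valid_token:
            raise AccessDenied("invalid credentials")
        return "On"


class Client:
    """Hypothetical client; NOT Sushy's implementation."""

    def __init__(self, bmc, retry_session_after_fallback):
        self.bmc = bmc
        self.retry = retry_session_after_fallback
        self.auth = ("session", bmc.create_session())

    def _authenticate(self):
        try:
            self.auth = ("session", self.bmc.create_session())
        except ConnectionFailed:
            # Fallback that assumes the BMC supports Basic Auth.
            self.auth = ("basic", None)

    def power_state(self):
        try:
            return self.bmc.get_power_state(self.auth)
        except AccessDenied:
            if self.auth[0] == "basic" and not self.retry:
                return None  # stuck on Basic Auth, never retries a session
            self._authenticate()
        try:
            return self.bmc.get_power_state(self.auth)
        except AccessDenied:
            return None


def simulate(retry_session_after_fallback):
    """Return power state seen [during the outage, after recovery]."""
    bmc = FakeBMC()
    client = Client(bmc, retry_session_after_fallback)
    bmc.expire_session()            # step 1: session expires
    bmc.session_service_up = False  # step 2: network blip on re-auth
    during = client.power_state()
    bmc.session_service_up = True   # network recovers
    after = client.power_state()    # step 3: still failing if stuck
    return [during, after]


print(simulate(False))  # sticky fallback
print(simulate(True))   # retries session auth after a fallback
```

With the sticky fallback the simulated node stays at None even after the network recovers, which matches the observed behaviour until Ironic is restarted. When the client retries session authentication after a fallback, the power state recovers on the next poll.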
Patches have merged and RPMs are built.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.3 (Train)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:4793