Bug 2222048
| Summary: | ironic keeps using old ilo password after update | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Eric Nothen <enothen> |
| Component: | python-proliantutils | Assignee: | Julia Kreger <jkreger> |
| Status: | MODIFIED --- | QA Contact: | James E. LaBarre <jlabarre> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 16.2 (Train) | CC: | bfournie, eduen, hbrock, jkreger, jslagle, mburns, sbaker |
| Target Milestone: | z6 | Keywords: | Triaged |
| Target Release: | 16.2 (Train on RHEL 8.4) | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | python-proliantutils-2.9.4-17.1.20230620223248.5bc7569.el9ost | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Eric Nothen
2023-07-11 16:53:54 UTC
Steve Baker (comment #1):

Could you please provide a full baremetal node show:

openstack baremetal node show -fyaml hpdl380.local162.lab

It is not clear which power-interface is being used on this node, but if it is ipmi then I may see the issue. There is an update function to sync ipmi_password with the current value of ilo_password, which may not be triggered on baremetal node set. If this is the case, you'll need to sync the password values manually:

openstack baremetal node set --driver-info ilo_password=newpassword --driver-info ipmi_password=newpassword hpdl380.local162.lab

Eric Nothen:

(In reply to Steve Baker from comment #1)
> Could you please provide a full baremetal node show:
>
> openstack baremetal node show -fyaml hpdl380.local162.lab

~~~
(undercloud) [stack.lab ~]$ openstack baremetal node show -fyaml hpdl380.local162.lab
allocation_uuid: null
automated_clean: null
bios_interface: ilo
boot_interface: ilo-pxe
chassis_uuid: null
clean_step: {}
conductor: director.local162.lab
conductor_group: ''
console_enabled: false
console_interface: ilo
created_at: '2023-07-11T15:38:19+00:00'
deploy_interface: iscsi
deploy_step: {}
description: null
driver: ilo
driver_info:
  deploy_kernel: file:///var/lib/ironic/httpboot/agent.kernel
  deploy_ramdisk: file:///var/lib/ironic/httpboot/agent.ramdisk
  ilo_address: 10.0.0.100
  ilo_password: '******'
  ilo_username: root
  rescue_kernel: file:///var/lib/ironic/httpboot/agent.kernel
  rescue_ramdisk: file:///var/lib/ironic/httpboot/agent.ramdisk
driver_internal_info: {}
extra: {}
fault: power failure
inspect_interface: inspector
inspection_finished_at: null
inspection_started_at: null
instance_info: {}
instance_uuid: null
last_error: 'During sync_power_state, max retries exceeded for node 658b192e-abe6-48c8-ab07-5577ea3dc3b3, node state None does not match expected state ''power on''. Updating DB state to ''None'' Switching node to maintenance mode. Error: iLO get_power_status failed, error: Login failed.'
maintenance: true
maintenance_reason: 'During sync_power_state, max retries exceeded for node 658b192e-abe6-48c8-ab07-5577ea3dc3b3, node state None does not match expected state ''power on''. Updating DB state to ''None'' Switching node to maintenance mode. Error: iLO get_power_status failed, error: Login failed.'
management_interface: ilo
name: hpdl380.local162.lab
network_interface: flat
owner: null
power_interface: ilo
power_state: null
properties:
  cpu_arch: x86_64
protected: false
protected_reason: null
provision_state: manageable
provision_updated_at: '2023-07-11T15:38:25+00:00'
raid_config: {}
raid_interface: no-raid
rescue_interface: agent
reservation: null
resource_class: baremetal
storage_interface: noop
target_power_state: null
target_provision_state: null
target_raid_config: {}
traits: []
updated_at: '2023-07-11T16:32:47+00:00'
uuid: 658b192e-abe6-48c8-ab07-5577ea3dc3b3
vendor_interface: no-vendor
(undercloud) [stack.lab ~]$
~~~

> It is not clear which power-interface is being used on this node,

It's an iLO (see for example step #5 for the ilo driver and step #6 for the ilo.power error in the log). This comes from 'pm_type: "ilo"' in the nodes.yaml.

> but if it is ipmi then I may see the issue.
> There is an update function to sync
> ipmi_password with the current value of ilo_password, which may not be
> triggered on baremetal node set.

No, it's the other way around for me. As much as I tried, I could not reproduce the problem when using the ipmi driver. That is: when using ipmi, ironic retries with the **new** password after I do "baremetal node set --driver-info ipmi_password=...". It just takes some minutes, but it resolves on its own without a container restart. When using ilo, however, it stays broken, retrying continuously with the old password until ironic_conductor is restarted, or until I put the old password back both in the iLO and in ironic.

> If this is the case, you'll need to sync
> the password values manually:
>
> openstack baremetal node set --driver-info ilo_password=newpassword
> --driver-info ipmi_password=newpassword hpdl380.local162.lab

It's not the same case, so I don't think this command applies. I don't have ipmi information in driver_info, only ilo_*.
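For reference, the workaround described above can be summarized as the following sketch. It assumes a director (TripleO) undercloud where the conductor runs in a podman container named ironic_conductor, as referenced in the comment; the maintenance-unset step is an added assumption to clear the flag left by the failed power-state syncs, not something stated in this bug:

~~~
# Sketch of the workaround discussed above; the container runtime (podman)
# and the maintenance-unset step are assumptions for an OSP 16.2 undercloud.

# 1. Make sure ironic holds the new iLO credential.
openstack baremetal node set --driver-info ilo_password=<newpassword> hpdl380.local162.lab

# 2. Restart the conductor so the stale cached iLO session is dropped.
sudo podman restart ironic_conductor

# 3. Clear the maintenance flag set by the failed power-state syncs.
openstack baremetal node maintenance unset hpdl380.local162.lab
~~~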
Julia Kreger:

We've run into a similar issue with the redfish hardware type interfaces. Actually, I need to add a patch to detect some iLOs and reject the ilo driver on them and instead indicate "use redfish". Anyhow! A glance at the ironic code path doesn't appear to use caching, and we always read from the database when launching new tasks. The challenge is that the proliantutils library internally has caching, and that is what is breaking things here. It appears, at least looking at the Wallaby branch upstream and newer, they have mirrored the logic to spawn a new cached client and check the username, password, and address. However, in the Train release of proliantutils there was no such guard; it relied only upon the address of the BMC. It appears they fixed this in proliantutils 2.9.2, in upstream change number 694448. Looks like we can just cherry-pick the patch into place.

Eric Nothen:

Awesome, thanks for the update Julia. There's a workaround in place at the moment, so we can wait for the fix.
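To illustrate the root cause Julia describes, here is a minimal Python sketch. It is not the actual proliantutils code and the names are made up; it only models the behaviour described above: the Train-era library reused its cached client whenever the BMC address matched, so a password updated in ironic's driver_info never reached the already-cached session, while the upstream 2.9.2 fix also compares username and password before reusing the cached client.

~~~
# Illustrative sketch only -- not the real proliantutils implementation.
# It models the client-caching behaviour described in the comment above.

_client_cache = {}


class IloClient:
    def __init__(self, address, username, password):
        # Credentials are captured once, when the client is created.
        self.address = address
        self.username = username
        self.password = password


def get_client_train(address, username, password):
    """Old behaviour: reuse any cached client for this BMC address."""
    client = _client_cache.get(address)
    if client is None:
        client = IloClient(address, username, password)
        _client_cache[address] = client
    return client  # may still carry the old password


def get_client_fixed(address, username, password):
    """Fixed behaviour: reuse only if address *and* credentials match."""
    client = _client_cache.get(address)
    if (client is None or client.username != username
            or client.password != password):
        client = IloClient(address, username, password)
        _client_cache[address] = client
    return client
~~~

With the second variant, updating ilo_password on the node causes the next task to build a fresh client instead of reusing the stale one; this also explains why restarting ironic_conductor (which empties the in-process cache) clears the condition.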