Bug 2033746 - Ironic baremetal node goes to maintenance after minor update
Summary: Ironic baremetal node goes to maintenance after minor update
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ironic
Version: 16.1 (Train)
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: z8
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: OSP Team
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-12-17 18:21 UTC by Rafael Urena
Modified: 2022-11-28 16:35 UTC
CC: 2 users

Fixed In Version: openstack-ironic-13.0.7-1.20220105043354.3d77e61.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-24 11:02:18 UTC
Target Upstream Version:
Embargoed:


Attachments: none


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-11879 0 None None None 2021-12-17 18:21:51 UTC
Red Hat Product Errata RHBA-2022:0986 0 None None None 2022-03-24 11:02:39 UTC

Description Rafael Urena 2021-12-17 18:21:23 UTC
Description of problem:
Customer updated from 16.1.6 to 16.1.7. After the update they noticed that all of the baremetal nodes had gone into maintenance mode with the following error:

| last_error | During sync_power_state, max retries exceeded for node 9271e535-a7d0-4e60-849e-02b0e83f2769, node state None does not match expected state 'None'. Updating DB state to 'None' Switching node to maintenance mode. Error: An exclusive lock is required, but the current context has a shared lock. |
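
For context, a node is moved to maintenance after sync_power_state has failed [conductor]power_state_sync_max_retries times in a row (the default is 3). A quick way to check the effective value on the undercloud; the path below is the usual tripleo puppet-generated location on OSP 16 and may differ on other deployments:

~~~
# Adjust the path if your deployment stores ironic.conf elsewhere.
sudo grep power_state_sync_max_retries \
    /var/lib/config-data/puppet-generated/ironic/etc/ironic/ironic.conf
~~~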

Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform 16.1.7 (minor update from 16.1.6)

How reproducible:
The customer has not attempted to reproduce the issue.

Steps to Reproduce:
1. Deploy OSP 16.1.6.
2. Perform the minor update to OSP 16.1.7.
3. Verify the baremetal node status.

Actual results:
All nodes go to maintenance:

~~~
$ openstack baremetal node list
+--------------------------------------+-------------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name              | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+-------------------+--------------------------------------+-------------+--------------------+-------------+
| 9271e535-a7d0-4e60-849e-02b0e83f2769 | controller-0      | f85c29ae-c0a8-409c-b282-0ec85ae35852 | None        | active             | True        |
| 5821e92e-fc8e-453f-9241-cec86b461d25 | controller-1      | 9a390b75-a888-4b6d-80e8-25abcdb6fb09 | None        | active             | True        |
| 5b56aa9e-e7e5-43f7-b19d-2968ce2716e4 | controller-2      | d1b4a31c-3564-42ad-a91e-342adccd8dac | None        | active             | True        |
| 8a0e788e-ec84-43ab-9250-bbc052294e33 | computeDell6152-0 | 4da51add-42bf-4f12-a48f-1bbf2558d2e6 | None        | active             | True        |
| e6ee252c-b202-43f2-a888-a11edbf2c44b | computeDell6152-1 | None                                 | None        | available          | True        |
| ce6c0e9e-c7d8-4de1-8b28-ca973566c045 | storage-0         | 3350c31f-2f15-449e-8b40-b13db30e0e77 | None        | active             | True        |
| cad62674-65cd-4636-ac85-7d2661e32d6f | storage-1         | df425cce-3066-4e9c-a6b5-bf0a81d05aca | None        | active             | True        |
| 40060dad-4e9f-469d-b5cd-50937248bc7b | storage-2         | 49157863-ef71-4677-a3e3-5446b92edbad | None        | active             | True        |
| f2c99b78-feb7-4e9a-996f-a5325f7a5329 | computeSriov-0    | 6bf99fe0-4e93-4fc4-ac21-0cdc6fa70ff0 | None        | active             | True        |
| 879f2998-714c-4966-9de5-db07ba8f2073 | computeSriov-1    | None                                 | None        | available          | True        |
| 13bcf2a1-7d9b-4f1b-b8bd-020cf2dffcdc | computeDell6230-0 | 4d4866fa-7484-4cc9-930d-b8710f43fbcf | None        | active             | True        |
| cc56446d-1985-4ea3-b6a2-ea82dff38c3c | computeDell6230-1 | None                                 | None        | available          | True        |
+--------------------------------------+-------------------+--------------------------------------+-------------+--------------------+-------------+
~~~

Expected results:
Nodes report a valid power state and do not enter maintenance mode.

Additional info:
We performed the following on the undercloud to see if the state could be restored:
~~~
# systemctl restart tripleo_ironic_conductor.service
# systemctl restart tripleo_ironic_inspector.service
~~~

This allowed the maintenance state to be reset:

~~~
$ openstack baremetal node list
+--------------------------------------+-------------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name              | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+-------------------+--------------------------------------+-------------+--------------------+-------------+
| 9271e535-a7d0-4e60-849e-02b0e83f2769 | controller-0      | f85c29ae-c0a8-409c-b282-0ec85ae35852 | None        | active             | False       |
| 5821e92e-fc8e-453f-9241-cec86b461d25 | controller-1      | 9a390b75-a888-4b6d-80e8-25abcdb6fb09 | None        | active             | False       |
| 5b56aa9e-e7e5-43f7-b19d-2968ce2716e4 | controller-2      | d1b4a31c-3564-42ad-a91e-342adccd8dac | None        | active             | False       |
| 8a0e788e-ec84-43ab-9250-bbc052294e33 | computeDell6152-0 | 4da51add-42bf-4f12-a48f-1bbf2558d2e6 | None        | active             | False       |
| e6ee252c-b202-43f2-a888-a11edbf2c44b | computeDell6152-1 | None                                 | None        | available          | False       |
| ce6c0e9e-c7d8-4de1-8b28-ca973566c045 | storage-0         | 3350c31f-2f15-449e-8b40-b13db30e0e77 | None        | active             | False       |
| cad62674-65cd-4636-ac85-7d2661e32d6f | storage-1         | df425cce-3066-4e9c-a6b5-bf0a81d05aca | None        | active             | False       |
| 40060dad-4e9f-469d-b5cd-50937248bc7b | storage-2         | 49157863-ef71-4677-a3e3-5446b92edbad | None        | active             | False       |
| f2c99b78-feb7-4e9a-996f-a5325f7a5329 | computeSriov-0    | 6bf99fe0-4e93-4fc4-ac21-0cdc6fa70ff0 | None        | active             | False       |
| 879f2998-714c-4966-9de5-db07ba8f2073 | computeSriov-1    | None                                 | None        | available          | False       |
| 13bcf2a1-7d9b-4f1b-b8bd-020cf2dffcdc | computeDell6230-0 | 4d4866fa-7484-4cc9-930d-b8710f43fbcf | None        | active             | False       |
| cc56446d-1985-4ea3-b6a2-ea82dff38c3c | computeDell6230-1 | None                                 | None        | available          | False       |
+--------------------------------------+-------------------+--------------------------------------+-------------+--------------------+-------------+
~~~
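
To confirm whether the nodes hold that state, a periodic listing is enough; a minimal sketch (the 30-second interval is arbitrary):

~~~
watch -n 30 "openstack baremetal node list -c UUID -c 'Power State' -c Maintenance"
~~~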

But they all went back into maintenance. The ironic-conductor debug log shows why:

~~~
2021-12-15 14:45:06.467 8 DEBUG ironic.conductor.task_manager [req-d3c43a10-e156-4abe-8b90-450fbaf74d16 - - - - -] Successfully released shared lock for power failure recovery on node 9271e535-a7d0-4e60-849e-02b0e83f2769 (lock was held 0.03 sec) release_resources /usr/lib/python3.6/site-packages/ironic/conductor/task_manager.py:360
2021-12-15 14:45:06.484 8 DEBUG ironic.conductor.task_manager [req-d3c43a10-e156-4abe-8b90-450fbaf74d16 - - - - -] Attempting to get shared lock on node 5821e92e-fc8e-453f-9241-cec86b461d25 (for power failure recovery) __init__ /usr/lib/python3.6/site-packages/ironic/conductor/task_manager.py:222
2021-12-15 14:45:06.523 8 DEBUG ironic.conductor.manager [req-d3c43a10-e156-4abe-8b90-450fbaf74d16 - - - - -] During power_failure_recovery, could not get power state for node 5821e92e-fc8e-453f-9241-cec86b461d25, Error: An exclusive lock is required, but the current context has a shared lock.. _power_failure_recovery /usr/lib/python3.6/site-packages/ironic/conductor/manager.py:1932 
~~~
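
The message points at the power_failure_recovery periodic task: it takes only a shared lock on each node, but the recovery step needs to write to the node, which requires an exclusive lock, so the attempt fails and the node is pushed back into maintenance. Until the fixed package is installed, maintenance can be cleared per node as a stopgap; note the failing periodic task may simply set it again on the next cycle:

~~~
# Stopgap only: clear the maintenance flag on every node. Until the
# fix lands, the failing periodic task may re-set it shortly after.
for uuid in $(openstack baremetal node list -f value -c UUID); do
    openstack baremetal node maintenance unset "$uuid"
done
~~~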

Comment 2 Steve Baker 2022-01-04 21:12:12 UTC
Here is the proposed backport for 16.1.x; the same fix is included in 16.2.2.

Comment 3 Steve Baker 2022-02-08 21:19:00 UTC
The fix is in compose RHOS-16.1-RHEL-8-20220121.n.1
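
Once a deployment picks up that compose (or the 16.1.8 update), the fixed package can be verified inside the conductor container; a sketch, assuming the default tripleo container name ironic_conductor:

~~~
# Expect openstack-ironic-13.0.7-1.20220105043354.3d77e61.el8ost or later.
sudo podman exec ironic_conductor rpm -q openstack-ironic
~~~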

Comment 12 errata-xmlrpc 2022-03-24 11:02:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.8 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0986

