Bug 2208237 - [FFU] Controller Nodes in MAINTENANCE state after Overcloud Ctlplane System Upgrade
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-release
Version: 17.1 (Wallaby)
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ga
Target Release: ---
Assignee: Juan Badia Payno
QA Contact: Arik Chernetsky
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-05-18 10:41 UTC by Ricardo Diaz
Modified: 2023-08-07 08:10 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-06-07 12:29:15 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-25175 0 None None None 2023-05-18 10:42:43 UTC

Description Ricardo Diaz 2023-05-18 10:41:46 UTC
Description of problem:
After the Overcloud Ctlplane System Upgrade stage of the FFU to OSP 17 completes without errors, the controller nodes are in MAINTENANCE state:
~~~
(undercloud) [stack@undercloud-0 ~]$ metalsmith list
+--------------------------------------+--------------+--------------------------------------+--------------------+-------------+----------------------+
| UUID                                 | Node Name    | Allocation UUID                      | Hostname           | State       | IP Addresses         |
+--------------------------------------+--------------+--------------------------------------+--------------------+-------------+----------------------+
| 6dca4b5f-ac03-4a04-9516-cb88fc148012 | compute-0    | b897d9a0-0ef7-4a3f-9c39-71eff5b9673d | computedpdksriov-0 | ACTIVE      | ctlplane=192.0.70.17 |
| 32199f18-b156-4e07-9570-56f6e90eb64c | compute-1    | 94c7c4e7-8c8d-4ea9-9bbe-017f43c4d134 | computedpdksriov-1 | ACTIVE      | ctlplane=192.0.70.14 |
| 46c84e87-4549-4d96-beea-ababb43e2236 | controller-0 | 7ff671a4-f665-477a-b733-c9e9a827ffa1 | controller-0       | MAINTENANCE | ctlplane=192.0.70.15 |
| f90624be-e487-4ad7-8a47-4649dd545c81 | controller-1 | a129feb1-52d0-41d1-9c96-d85f7e4559a2 | controller-1       | MAINTENANCE | ctlplane=192.0.70.9  |
| e296f959-b31a-436f-a87d-fc5febaac5b0 | controller-2 | 623fe14e-2ffb-488e-a7ca-e1f2bc79e007 | controller-2       | MAINTENANCE | ctlplane=192.0.70.6  |
+--------------------------------------+--------------+--------------------------------------+--------------------+-------------+----------------------+
~~~


Version-Release number of selected component (if applicable):
FFU 16.2 -> 17.1

How reproducible:
100%

Steps to Reproduce:
1. Run Overcloud Ctlplane System Upgrade FFU OSP17

Actual results:


Expected results:
Controller nodes must be in ACTIVE state

Additional info:

Comment 1 Ricardo Diaz 2023-05-18 12:31:12 UTC
Unsetting maintenance on a controller appears to work without problems:

(undercloud) [stack@undercloud-0 ~]$ metalsmith list
+--------------------------------------+--------------+--------------------------------------+--------------------+-------------+----------------------+
| UUID                                 | Node Name    | Allocation UUID                      | Hostname           | State       | IP Addresses         |
+--------------------------------------+--------------+--------------------------------------+--------------------+-------------+----------------------+
| 6dca4b5f-ac03-4a04-9516-cb88fc148012 | compute-0    | b897d9a0-0ef7-4a3f-9c39-71eff5b9673d | computedpdksriov-0 | ACTIVE      | ctlplane=192.0.70.17 |
| 32199f18-b156-4e07-9570-56f6e90eb64c | compute-1    | 94c7c4e7-8c8d-4ea9-9bbe-017f43c4d134 | computedpdksriov-1 | ACTIVE      | ctlplane=192.0.70.14 |
| 46c84e87-4549-4d96-beea-ababb43e2236 | controller-0 | 7ff671a4-f665-477a-b733-c9e9a827ffa1 | controller-0       | MAINTENANCE | ctlplane=192.0.70.15 |
| f90624be-e487-4ad7-8a47-4649dd545c81 | controller-1 | a129feb1-52d0-41d1-9c96-d85f7e4559a2 | controller-1       | MAINTENANCE | ctlplane=192.0.70.9  |
| e296f959-b31a-436f-a87d-fc5febaac5b0 | controller-2 | 623fe14e-2ffb-488e-a7ca-e1f2bc79e007 | controller-2       | MAINTENANCE | ctlplane=192.0.70.6  |
+--------------------------------------+--------------+--------------------------------------+--------------------+-------------+----------------------+

(undercloud) [stack@undercloud-0 ~]$ openstack baremetal node maintenance unset controller-0

(undercloud) [stack@undercloud-0 ~]$ metalsmith list
+--------------------------------------+--------------+--------------------------------------+--------------------+-------------+----------------------+
| UUID                                 | Node Name    | Allocation UUID                      | Hostname           | State       | IP Addresses         |
+--------------------------------------+--------------+--------------------------------------+--------------------+-------------+----------------------+
| 6dca4b5f-ac03-4a04-9516-cb88fc148012 | compute-0    | b897d9a0-0ef7-4a3f-9c39-71eff5b9673d | computedpdksriov-0 | ACTIVE      | ctlplane=192.0.70.17 |
| 32199f18-b156-4e07-9570-56f6e90eb64c | compute-1    | 94c7c4e7-8c8d-4ea9-9bbe-017f43c4d134 | computedpdksriov-1 | ACTIVE      | ctlplane=192.0.70.14 |
| 46c84e87-4549-4d96-beea-ababb43e2236 | controller-0 | 7ff671a4-f665-477a-b733-c9e9a827ffa1 | controller-0       | ACTIVE      | ctlplane=192.0.70.15 |
| f90624be-e487-4ad7-8a47-4649dd545c81 | controller-1 | a129feb1-52d0-41d1-9c96-d85f7e4559a2 | controller-1       | MAINTENANCE | ctlplane=192.0.70.9  |
| e296f959-b31a-436f-a87d-fc5febaac5b0 | controller-2 | 623fe14e-2ffb-488e-a7ca-e1f2bc79e007 | controller-2       | MAINTENANCE | ctlplane=192.0.70.6  |
+--------------------------------------+--------------+--------------------------------------+--------------------+-------------+----------------------+
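
The single-node command above could be applied to every affected node with a small loop. This is only a sketch: `maintenance_nodes` and `unset_maintenance_all` are hypothetical helper names, and the parsing assumes the six-column table layout that `metalsmith list` prints in this report.

```shell
# Print the "Node Name" of every row whose "State" column is MAINTENANCE.
# Reads the table printed by `metalsmith list` on standard input.
# Assumes the column order shown in this report (State is the 5th column).
maintenance_nodes() {
    awk -F'|' '
        NF >= 7 {
            name = $3; state = $6
            gsub(/^ +| +$/, "", name)    # trim padding around the cell
            gsub(/^ +| +$/, "", state)
            if (state == "MAINTENANCE") print name
        }'
}

# Hypothetical helper: clear the maintenance flag on each such node.
unset_maintenance_all() {
    metalsmith list | maintenance_nodes | while read -r node; do
        openstack baremetal node maintenance unset "$node"
    done
}
```

Calling `unset_maintenance_all` on the undercloud would run the same `openstack baremetal node maintenance unset` command once per listed node. It does not address whatever set the flag in the first place.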

Comment 2 Ricardo Diaz 2023-05-18 17:06:48 UTC
It looks like the controller goes back to MAINTENANCE state after a few minutes.
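
To see why a node returns to maintenance, it may help to look at the reason recorded on the node. The sketch below assumes the `maintenance_reason` and `last_error` fields are populated, which is not confirmed anywhere in this report; `show_maintenance_reason` is a hypothetical helper name, and `DRY_RUN` is a convention of this sketch, not an `openstack` option.

```shell
# Show the node fields that usually explain a maintenance flag.
# With DRY_RUN=1 the command is printed instead of executed.
show_maintenance_reason() {
    node=$1
    set -- openstack baremetal node show "$node" \
        --fields maintenance maintenance_reason power_state last_error
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "+ $*"; else "$@"; fi
}
```

For example, `show_maintenance_reason controller-0` on the undercloud. A reason that mentions a power state sync failure would point at the BMC endpoint, not at the node itself.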

Comment 3 Juan Badia Payno 2023-05-19 09:48:20 UTC
The issue with metalsmith on VMs is that IPMI is simulated with vbmc, and everything is installed on RHEL 8.4 in a virtualenv (Python 3.6).
Once the undercloud OS is upgraded to RHEL 9.2, vbmc no longer works. It needs to be reinstalled and restarted.
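
A rough sketch of that reinstall and restart follows. The virtualenv path and the domain names are assumptions that depend on how the CI job set up vbmc, and the sketch also assumes the domains are still registered in the existing vbmc configuration. `run` and `reinstall_vbmc` are hypothetical helper names.

```shell
# run CMD...: execute the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# reinstall_vbmc VENV_DIR DOMAIN...
# Rebuild the virtualenv against the new system Python, reinstall
# VirtualBMC, start its daemon, then start each named domain.
reinstall_vbmc() {
    venv=$1; shift
    run python3 -m venv --clear "$venv"
    run "$venv/bin/pip" install virtualbmc
    run "$venv/bin/vbmcd"
    for domain in "$@"; do
        run "$venv/bin/vbmc" start "$domain"
    done
}
```

Running `DRY_RUN=1 reinstall_vbmc /path/to/venv controller-0 controller-1 controller-2` first prints the commands without executing them, so the paths and domain names can be checked.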

Comment 5 Jesse Pretorius 2023-06-07 12:29:15 UTC
This is an issue in CI automation which would need to be solved in Infrared or some other CI automation changes. The issue is not in OSP.

