Bug 1850929 - Overcloud nodes transition to ERROR state after Undercloud Upgrade - 13 -> 16.1 Beta
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: documentation
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Dan Macpherson
QA Contact: RHOS Documentation Team
URL:
Whiteboard:
Duplicates: 1882757
Depends On:
Blocks:
Reported: 2020-06-25 08:22 UTC by Sadique Puthen
Modified: 2020-10-23 05:16 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-23 05:16:15 UTC
Target Upstream Version:
Embargoed:



Description Sadique Puthen 2020-06-25 08:22:54 UTC
Description of problem:

I am trying to upgrade OSP-13 to OSP-16.1 Beta and hit a failure during overcloud upgrade prepare. Here are the steps:

1- Before the upgrade, all overcloud nodes are in ACTIVE state: 3 Controller, 2 Compute, and 3 Ceph nodes.

2- The undercloud upgrade finishes successfully [1]. Afterwards, all overcloud nodes move to ERROR state on their own.

$ openstack server list
+--------------------------------------+--------------+--------+----------------------+----------------+---------+
| ID                                   | Name         | Status | Networks             | Image          | Flavor  |
+--------------------------------------+--------------+--------+----------------------+----------------+---------+
| 33103ecd-b252-43fb-94d1-30655f9185da | ceph-3       | ERROR  | ctlplane=172.16.0.73 | overcloud-full | ceph    |
| 051e8a66-8d0e-43eb-9424-a986fb52f48e | controller-1 | ERROR  | ctlplane=172.16.0.51 | overcloud-full | control |
| ada9bb5f-2279-464d-a90c-7004d6d85702 | controller-3 | ERROR  | ctlplane=172.16.0.53 | overcloud-full | control |
| 0c141306-89a2-4856-a566-4dd7620d9249 | ceph-1       | ERROR  | ctlplane=172.16.0.71 | overcloud-full | ceph    |
| bacd7b1a-09fa-4544-8534-4baacd6c524a | ceph-2       | ERROR  | ctlplane=172.16.0.72 | overcloud-full | ceph    |
| b410e89c-af29-4f70-a714-91f033982fa1 | controller-2 | ERROR  | ctlplane=172.16.0.52 | overcloud-full | control |
| 98dcbb4b-2216-4a1d-baeb-dec70166743f | compute-1    | ERROR  | ctlplane=172.16.0.61 | overcloud-full | compute |
| 544f401e-1160-4f89-a819-ea8495293aff | compute-2    | ERROR  | ctlplane=172.16.0.62 | overcloud-full | compute |
+--------------------------------------+--------------+--------+----------------------+----------------+---------+

3- Running upgrade prepare [2] then fails with the error below.

2020-06-24 12:49:57Z [overcloud-ControllerServiceChain-cw7ao4jlg3u4.ServiceChain]: DELETE_COMPLETE  state changed
2020-06-24 12:49:57Z [overcloud-ControllerServiceChain-cw7ao4jlg3u4]: UPDATE_COMPLETE  Stack UPDATE completed successfully
2020-06-24 12:49:58Z [overcloud.ControllerServiceChain]: UPDATE_COMPLETE  state changed

 Stack overcloud/72b3afd7-e2b5-4476-9a49-83d3b89d0f58 UPDATE_FAILED 

overcloud.Compute.1.NovaCompute:
  resource_type: OS::TripleO::ComputeServer
  physical_resource_id: 544f401e-1160-4f89-a819-ea8495293aff
  status: UPDATE_FAILED
  status_reason: |
    Conflict: resources.NovaCompute: Cannot 'update metadata' instance 544f401e-1160-4f89-a819-ea8495293aff while it is in vm_state error (HTTP 409) (Request-ID: req-65030cee-9273-4087-b717-6ea4afc7088c)
overcloud.Compute.0.NovaCompute:
  resource_type: OS::TripleO::ComputeServer
  physical_resource_id: 98dcbb4b-2216-4a1d-baeb-dec70166743f
  status: UPDATE_FAILED
  status_reason: |
    Conflict: resources.NovaCompute: Cannot 'update metadata' instance 98dcbb4b-2216-4a1d-baeb-dec70166743f while it is in vm_state error (HTTP 409) (Request-ID: req-f63a86e7-d19f-43a9-9eef-f0068fc5b868)

The failure is probably caused by the overcloud nodes being in ERROR state. Can anyone help me fix this? Although the nodes show ERROR after the undercloud upgrade, the actual OSP-13 overcloud nodes are running without any issues.


[1] https://gitlab.cee.redhat.com/sputhenp/ospkvm/-/blob/master/templates/osp-13/upgrade/undercloud-upgrade-13-16.yaml
[2] https://gitlab.cee.redhat.com/sputhenp/ospkvm/-/blob/master/templates/osp-13/upgrade/overcloud-upgrade-prepare-tls-everywhere.sh

Comment 1 Lukas Bezdicka 2020-06-29 10:12:54 UTC
# For every bare-metal node: switch to the ipmi hardware type, then set the
# deploy interface while the node is in maintenance mode.
for uuid in $(openstack baremetal node list -f value -c UUID); do
  openstack baremetal node set "$uuid" --driver ipmi
  openstack baremetal node maintenance set "$uuid" --reason "Changing driver and/or hardware interfaces"
  openstack baremetal node set "$uuid" --driver ipmi --deploy-interface iscsi
  openstack baremetal node maintenance unset "$uuid"
done

# Reset every overcloud server back to ACTIVE in the Nova database.
for uuid in $(openstack server list -f value -c ID); do
  nova reset-state --active "$uuid"
done
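
A narrower variant of the loop above, as a minimal sketch: it resets only the servers that Nova reports as ERROR, using `openstack server set --state active` in place of `nova reset-state --active`. The function name `reset_error_servers` is hypothetical and not part of any shipped tooling, and the sketch assumes admin credentials for the undercloud are already sourced. Resetting the state only changes the record in Nova; it does not touch the node itself.

```shell
#!/usr/bin/env bash
# Sketch only: reset_error_servers is a hypothetical helper name.
# It resets only the servers that Nova currently reports as ERROR,
# leaving servers in any other state untouched.
reset_error_servers() {
  openstack server list --status ERROR -f value -c ID |
    while read -r id; do
      echo "Resetting server $id to active"
      # </dev/null keeps the CLI from consuming the ID list on stdin.
      openstack server set --state active "$id" </dev/null
    done
}
```

Calling `reset_error_servers` and then running `openstack server list` again should show whether any server is still in ERROR before upgrade prepare is retried.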


 Stack overcloud/72b3afd7-e2b5-4476-9a49-83d3b89d0f58 UPDATE_COMPLETE

Comment 2 Sadique Puthen 2020-07-02 06:15:42 UTC
Can we add a validation here that warns the user to move away from deprecated ironic drivers that may have been removed or no longer work in 16.1?
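
One possible shape for such a check, as a rough sketch and not an existing validation: a filter that reads "UUID DRIVER" pairs and warns about drivers that look like classic drivers. The function name `warn_classic_drivers` is hypothetical, and the name patterns (`pxe_*`, `agent_*`, as in `pxe_ipmitool`) are an assumption about how classic drivers are named, so the list may need extending.

```shell
#!/usr/bin/env bash
# Sketch only: warn_classic_drivers is a hypothetical helper name.
# Reads "UUID DRIVER" pairs on stdin, one node per line.
# Assumption: classic driver names start with pxe_ or agent_.
warn_classic_drivers() {
  local uuid driver status=0
  while read -r uuid driver; do
    case "$driver" in
      pxe_*|agent_*)
        echo "WARNING: node $uuid uses classic driver $driver"
        status=1
        ;;
    esac
  done
  # Non-zero exit status if at least one node needs converting.
  return "$status"
}
```

It could be fed from the undercloud before the upgrade with `openstack baremetal node list --fields uuid driver -f value | warn_classic_drivers`, so that a non-zero exit status blocks the upgrade until the nodes are converted.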

Comment 3 Lukas Bezdicka 2020-10-21 10:58:56 UTC
*** Bug 1882757 has been marked as a duplicate of this bug. ***

Comment 4 Dan Macpherson 2020-10-23 05:16:15 UTC
I have added content on how to convert to the next-generation drivers:

https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/framework_for_upgrades_13_to_16.1/index#converting-to-next-generation-power-management-drivers

This is the content we used for the OSP 13 to 14 upgrade process, and it should still be valid.

