Description of problem:

I am trying to upgrade OSP-13 to OSP-16.1 Beta and hit a failure during the overcloud upgrade prepare. Here are the steps.

1- Before I start the upgrade, all overcloud nodes are in ACTIVE state: 3 Controllers, 2 Computes and 3 Ceph nodes.

2- The undercloud upgrade finishes successfully [1]. All overcloud nodes go to ERROR state automatically after the upgrade finishes.

$ openstack server list
+--------------------------------------+--------------+--------+----------------------+----------------+---------+
| ID                                   | Name         | Status | Networks             | Image          | Flavor  |
+--------------------------------------+--------------+--------+----------------------+----------------+---------+
| 33103ecd-b252-43fb-94d1-30655f9185da | ceph-3       | ERROR  | ctlplane=172.16.0.73 | overcloud-full | ceph    |
| 051e8a66-8d0e-43eb-9424-a986fb52f48e | controller-1 | ERROR  | ctlplane=172.16.0.51 | overcloud-full | control |
| ada9bb5f-2279-464d-a90c-7004d6d85702 | controller-3 | ERROR  | ctlplane=172.16.0.53 | overcloud-full | control |
| 0c141306-89a2-4856-a566-4dd7620d9249 | ceph-1       | ERROR  | ctlplane=172.16.0.71 | overcloud-full | ceph    |
| bacd7b1a-09fa-4544-8534-4baacd6c524a | ceph-2       | ERROR  | ctlplane=172.16.0.72 | overcloud-full | ceph    |
| b410e89c-af29-4f70-a714-91f033982fa1 | controller-2 | ERROR  | ctlplane=172.16.0.52 | overcloud-full | control |
| 98dcbb4b-2216-4a1d-baeb-dec70166743f | compute-1    | ERROR  | ctlplane=172.16.0.61 | overcloud-full | compute |
| 544f401e-1160-4f89-a819-ea8495293aff | compute-2    | ERROR  | ctlplane=172.16.0.62 | overcloud-full | compute |
+--------------------------------------+--------------+--------+----------------------+----------------+---------+

3- Running upgrade prepare [2] then fails with the error below.
2020-06-24 12:49:57Z [overcloud-ControllerServiceChain-cw7ao4jlg3u4.ServiceChain]: DELETE_COMPLETE state changed
2020-06-24 12:49:57Z [overcloud-ControllerServiceChain-cw7ao4jlg3u4]: UPDATE_COMPLETE Stack UPDATE completed successfully
2020-06-24 12:49:58Z [overcloud.ControllerServiceChain]: UPDATE_COMPLETE state changed

Stack overcloud/72b3afd7-e2b5-4476-9a49-83d3b89d0f58 UPDATE_FAILED

overcloud.Compute.1.NovaCompute:
  resource_type: OS::TripleO::ComputeServer
  physical_resource_id: 544f401e-1160-4f89-a819-ea8495293aff
  status: UPDATE_FAILED
  status_reason: |
    Conflict: resources.NovaCompute: Cannot 'update metadata' instance 544f401e-1160-4f89-a819-ea8495293aff while it is in vm_state error (HTTP 409) (Request-ID: req-65030cee-9273-4087-b717-6ea4afc7088c)
overcloud.Compute.0.NovaCompute:
  resource_type: OS::TripleO::ComputeServer
  physical_resource_id: 98dcbb4b-2216-4a1d-baeb-dec70166743f
  status: UPDATE_FAILED
  status_reason: |
    Conflict: resources.NovaCompute: Cannot 'update metadata' instance 98dcbb4b-2216-4a1d-baeb-dec70166743f while it is in vm_state error (HTTP 409) (Request-ID: req-f63a86e7-d19f-43a9-9eef-f0068fc5b868)

The error is probably caused by the overcloud nodes being in ERROR state. Can anyone help me fix this? Although the nodes show ERROR state after the undercloud upgrade, the actual OSP-13 overcloud nodes are running without any issues.

[1] https://gitlab.cee.redhat.com/sputhenp/ospkvm/-/blob/master/templates/osp-13/upgrade/undercloud-upgrade-13-16.yaml
[2] https://gitlab.cee.redhat.com/sputhenp/ospkvm/-/blob/master/templates/osp-13/upgrade/overcloud-upgrade-prepare-tls-everywhere.sh
# Switch every bare metal node to the ipmi driver; the deploy interface
# is changed while the node is in maintenance mode.
for uuid in $(openstack baremetal node list -f value -c UUID); do
    openstack baremetal node set "$uuid" --driver ipmi
    openstack baremetal node maintenance set "$uuid" --reason "Changing driver and/or hardware interfaces"
    openstack baremetal node set "$uuid" --driver ipmi --deploy-interface iscsi
    openstack baremetal node maintenance unset "$uuid"
done

# Reset every overcloud server from ERROR back to ACTIVE.
for uuid in $(openstack server list -f value -c ID); do
    nova reset-state --active "$uuid"
done

Stack overcloud/72b3afd7-e2b5-4476-9a49-83d3b89d0f58 UPDATE_COMPLETE
Can we add a validation here that warns the user to move away from deprecated ironic drivers that might have been removed or might not work in 16.1?
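One way such a validation could look is sketched below. It is only an illustration, not an existing TripleO validation: the function names suggest_hardware_type and flag_classic_drivers are hypothetical, and the classic-driver-to-hardware-type mapping is an assumption that should be checked against the upgrade documentation for the target release. The function reads "UUID DRIVER" pairs on stdin, so it can be tried without an undercloud.

```shell
#!/usr/bin/env bash
# Sketch only: warn about nodes that still use a classic ironic driver.
# The mapping below is an assumption for illustration, not an
# authoritative list; verify it against the upgrade documentation.

# Print a suggested hardware type for a classic driver name,
# or an empty line if the name is not in the (assumed) mapping.
suggest_hardware_type() {
    case "$1" in
        pxe_ipmitool) echo "ipmi" ;;
        pxe_drac)     echo "idrac" ;;
        pxe_ilo)      echo "ilo" ;;
        pxe_irmc)     echo "irmc" ;;
        fake_pxe)     echo "fake-hardware" ;;
        *)            echo "" ;;
    esac
}

# Read "UUID DRIVER" lines from stdin and print one warning per node
# whose driver appears in the mapping. Returns 1 if any warning was
# printed, 0 otherwise.
flag_classic_drivers() {
    local uuid driver replacement found=0
    while read -r uuid driver; do
        [ -z "$uuid" ] && continue
        replacement=$(suggest_hardware_type "$driver")
        if [ -n "$replacement" ]; then
            echo "WARNING: node $uuid uses classic driver '$driver'; consider '$replacement'"
            found=1
        fi
    done
    return "$found"
}
```

On an undercloud, the input could come from something like `openstack baremetal node list --fields uuid driver -f value | flag_classic_drivers`, assuming that command prints one UUID and driver per line. A non-zero exit status would then signal that at least one node should be converted before running the upgrade.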
*** Bug 1882757 has been marked as a duplicate of this bug. ***
I have added content on how to convert to the next-generation drivers:

https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/framework_for_upgrades_13_to_16.1/index#converting-to-next-generation-power-management-drivers

This is the content we used for the OSP 13 to 14 upgrade process, and it should still be valid.