Description of problem:
Live migration reportedly fails, but the VM is actually moved and the nova DB is never updated to reflect the new host. Somewhat similar to this issue: https://bugzilla.redhat.com/show_bug.cgi?id=1636102

Version-Release number of selected component (if applicable):
openstack-nova-compute-17.0.13-2.el7ost.noarch
OSP13

How reproducible:
Difficult to reproduce. I believe the migration needs to fail in some specific way to end up in this situation.

Steps to Reproduce:
1. Try to live-migrate a VM.
2. It reports failure but actually moves the VM.
3. The nova DB now reports the wrong hypervisor.

Actual results:
'openstack server show' reports a different hypervisor from the one where virsh on the source and destination computes shows the guest.

Expected results:
The nova DB should reflect the actual location of the VM.

Additional info:
There is a solution article for updating the DB: https://access.redhat.com/solutions/2070503
In this case, however, we also have attached volumes, so I think it's best not to modify the DB manually until we confirm with the Nova engineering team, to ensure we don't cause further issues.
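A minimal sketch for confirming the mismatch described above: it compares where Nova thinks the server is (the admin-only OS-EXT-SRV-ATTR fields of 'openstack server show <server uuid> -f json') with which compute's libvirt defines the guest ('virsh list --all --name'). Assumptions: admin credentials, passwordless ssh to each compute with virsh runnable there (adjust the virsh invocation for your deployment), and ssh host names that match Nova's 'host' field. All function names are hypothetical; this is a diagnostic aid, not a supported tool.

```python
import json
import subprocess


def run(cmd):
    """Run a command and return its stdout (raises CalledProcessError on failure)."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def server_location(server_json):
    """Where Nova thinks the server is, from 'openstack server show <uuid> -f json'.

    The OS-EXT-SRV-ATTR:* fields are only returned to admin users.
    """
    info = json.loads(server_json)
    return {
        "host": info["OS-EXT-SRV-ATTR:host"],
        "node": info["OS-EXT-SRV-ATTR:hypervisor_hostname"],
        "instance_name": info["OS-EXT-SRV-ATTR:instance_name"],
    }


def parse_virsh_names(text):
    """Domain names from 'virsh list --all --name' (one per line, blanks ignored)."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def diagnose(nova_host, instance_name, domains_by_host):
    """Compare Nova's host with the hosts whose libvirt defines the guest."""
    found = sorted(h for h, names in domains_by_host.items() if instance_name in names)
    if not found:
        return "guest not found on any checked host"
    if len(found) > 1:
        # --all also lists shut-off definitions, e.g. a stale one left on the source
        return "guest defined on several hosts: " + ", ".join(found)
    if found[0] != nova_host:
        return "mismatch: nova says %s, libvirt has it on %s" % (nova_host, found[0])
    return "consistent"


def main(server_uuid, source, destination):
    """Hypothetical driver: assumes admin credentials and ssh access to both computes."""
    loc = server_location(run(["openstack", "server", "show", server_uuid, "-f", "json"]))
    domains = {
        host: parse_virsh_names(run(["ssh", host, "virsh", "list", "--all", "--name"]))
        for host in (source, destination)
    }
    print(loc)
    print(diagnose(loc["host"], loc["instance_name"], domains))
```

In the situation reported here, main(<server uuid>, <source host>, <destination host>) would be expected to print a "mismatch" line naming the destination.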
Indeed, recovering or fixing a server that has failed in the middle of a migration should be done with care. Please read this entire reply before doing anything.

You will want to verify whether the ports and volumes that are supposed to be attached to the server are working properly. From what you wrote in comment 0, it sounds like the server has moved to the destination and you see it there with virsh. Do you also see a virt guest on the source host for the same server?

Assuming there is only one virt guest and it is on the destination host, we want to check the ports and volumes and correct them if necessary. Examples of basic checks: can you ssh into the guest via all of its IP addresses? Can you access its attached volumes without any problems? If the answer to both is yes, the ports and volumes are probably connected to the correct host, but I would still check the data in OpenStack to verify.

First, check which host the ports and volumes are attached to.

1. Get the IP address(es) and volume IDs for the server so you can look up the ports and the volumes. The IP addresses will be in the 'addresses' field and the volume IDs will be in the 'volumes_attached' field.
   $ openstack server show <server uuid>
2. Get the port for each IP address.
   $ openstack port list --fixed-ip ip-address=<server ip address>
3. Get the details for the port.
   $ openstack port show <port uuid from step 2>
4. Look at 'binding_host_id'. It will be either the source compute host or the destination compute host.
5. If the port shows as bound to the source host, I suggest detaching it from the source (we will attach it again in a few steps). If it shows as bound to the destination, skip removal.
   $ openstack server remove port <server uuid> <port uuid>
6. Get the details for the volume(s).
   $ openstack volume show <volume uuid from step 1>
7. Look at the 'host_name' values in the 'attachments' field.
   Are these the source compute host or the destination?
8. If the volume(s) show as attached to the source host, I suggest detaching them from the source (we will attach them again in a few steps). If they show as attached to the destination, skip removal.
   $ openstack server remove volume <server uuid> <volume uuid from step 1>
9. Now that everything is detached from the source, update the 'host' field in the nova.instances table for the instance to reflect the destination. This controls which compute host the following commands are sent to. The same row also has a 'node' column (reported by 'openstack server show' as 'OS-EXT-SRV-ATTR:hypervisor_hostname'); if it still names the source, it needs to point at the destination too. If you did not remove/detach any ports or volumes from the server, skip to step 12.
10. Re-attach the port(s).
   $ openstack server add port <server uuid> <port uuid from step 2>
11. Re-attach the volume(s).
   $ openstack server add volume <server uuid> <volume uuid from step 1>
12. Check the server's resource allocations in placement and make sure they are on the destination host. (You will need to install the osc-placement package if you do not already have it.)
   $ openstack resource provider allocation show <server uuid>
   $ openstack resource provider list --uuid <resource provider uuid from the previous command>
13. Look at the 'name' field in the resource provider list. Is it the source compute host or the destination compute host?
14. If it is the source compute host, add the allocations to the destination compute host. Note that this is a full-replacement command: to add allocations, you have to get the server's current allocations first (from step 12), then set them back with the new destination allocation added to the list. First, note the destination's resource provider uuid:
   $ openstack resource provider list --name <destination compute host>
   Then set the allocations:
   $ openstack resource provider allocation set <server uuid> --allocation <rp=resource-provider-id,resource-class-name=amount-of-resource-used> --allocation <rp=resource-provider-id,resource-class-name=amount-of-resource-used> ... [1]
15. If you had to add the allocations to the destination host, delete them from the source host.
   Similarly, this is a full replacement, so to remove allocations you have to get the current allocations, then set them back with the source allocation removed from the list.
   $ openstack resource provider allocation show <server uuid>
   $ openstack resource provider allocation set <server uuid> --allocation <rp=resource-provider-id,resource-class-name=amount-of-resource-used> ... [1]
16. Once the ports, volumes, placement resource allocations, and 'host' in the nova.instances table all correctly point at the destination compute host, you should be done.

[1] https://docs.openstack.org/osc-placement/queens/cli/index.html#resource-provider-allocation-set
[2] https://docs.openstack.org/python-openstackclient/queens/cli/command-list.html
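The checks in steps 2 to 8 amount to classifying each port and volume by the host it is bound to. The sketch below (hypothetical helper names, Python 3.7+) reads 'binding_host_id' from 'openstack port show <port> -f json' and the 'host_name' entries in 'attachments' from 'openstack volume show <volume> -f json', then reports which IDs still point at the source. It only reports; the remove/add commands stay manual. Assumption: depending on the client version, 'attachments' may come back as a list or as a string, so both forms are accepted.

```python
import ast
import json
import subprocess


def osc_json(*args):
    """Run 'openstack <args> -f json' and return the parsed output."""
    out = subprocess.run(["openstack", *args, "-f", "json"],
                         check=True, capture_output=True, text=True).stdout
    return json.loads(out)


def as_list(value):
    """Accept a list, or a JSON/Python-literal string of one (client versions differ)."""
    if not value:
        return []
    if isinstance(value, str):
        try:
            return json.loads(value)
        except ValueError:
            return ast.literal_eval(value)
    return list(value)


def classify(host, source, destination):
    """Label a host name as 'source', 'destination' or 'other' (None/unexpected)."""
    if host == source:
        return "source"
    if host == destination:
        return "destination"
    return "other"


def plan(ports, volumes, source, destination):
    """ports/volumes map ID -> 'show -f json' output; report what still points at the source."""
    report = {"ports_on_source": [], "volumes_on_source": [], "check_by_hand": []}
    for port_id, info in ports.items():
        where = classify(info.get("binding_host_id"), source, destination)
        if where == "source":
            report["ports_on_source"].append(port_id)  # candidates for step 5
        elif where == "other":
            report["check_by_hand"].append(port_id)
    for volume_id, info in volumes.items():
        sides = {classify(a.get("host_name"), source, destination)
                 for a in as_list(info.get("attachments"))}
        if sides == {"source"}:
            report["volumes_on_source"].append(volume_id)  # candidates for step 8
        elif sides != {"destination"}:
            # no attachments, attachments on both hosts, or an unexpected host
            report["check_by_hand"].append(volume_id)
    return report


def collect(port_ids, volume_ids):
    """Fetch details for the port and volume IDs found in steps 1 and 2."""
    ports = {p: osc_json("port", "show", p) for p in port_ids}
    volumes = {v: osc_json("volume", "show", v) for v in volume_ids}
    return ports, volumes
```

Typical use would be plan(*collect(port_ids, volume_ids), "<source host>", "<destination host>"); anything under "check_by_hand" deserves a closer look before detaching.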
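Because 'openstack resource provider allocation set' replaces all of a consumer's allocations at once, steps 14 and 15 are easy to get wrong by hand. This sketch builds both commands from the output of 'openstack resource provider allocation show <server uuid> -f json' and prints them for review rather than running them. Assumptions: the destination needs the same amounts as the source (same flavor), and the 'resources' column comes back as a JSON object (a string form is also accepted). Helper names are hypothetical.

```python
import ast
import json
import shlex
import subprocess


def as_dict(value):
    """Accept a dict, or a JSON/Python-literal string of one (client versions differ)."""
    if isinstance(value, str):
        try:
            return json.loads(value)
        except ValueError:
            return ast.literal_eval(value)
    return dict(value)


def parse_allocations(json_text):
    """Map resource provider uuid -> {resource class: amount} from
    'openstack resource provider allocation show <server uuid> -f json'."""
    return {row["resource_provider"]: as_dict(row["resources"])
            for row in json.loads(json_text)}


def load_allocations(server_uuid):
    """Fetch the server's current allocations (requires osc-placement)."""
    out = subprocess.run(
        ["openstack", "resource", "provider", "allocation", "show", server_uuid, "-f", "json"],
        check=True, capture_output=True, text=True).stdout
    return parse_allocations(out)


def with_destination(current, source_rp, dest_rp):
    """Step 14: keep every existing allocation and copy the source's onto the destination."""
    new = {rp: dict(res) for rp, res in current.items()}
    new.setdefault(dest_rp, dict(current[source_rp]))
    return new


def without_source(current, source_rp):
    """Step 15: keep every allocation except the one against the source."""
    return {rp: dict(res) for rp, res in current.items() if rp != source_rp}


def set_command(server_uuid, allocations):
    """Build (not run) the full-replacement 'allocation set' command for review."""
    cmd = ["openstack", "resource", "provider", "allocation", "set", server_uuid]
    for rp, resources in allocations.items():
        parts = ["rp=" + rp] + ["%s=%s" % (rc, amount) for rc, amount in sorted(resources.items())]
        cmd += ["--allocation", ",".join(parts)]
    return " ".join(shlex.quote(part) for part in cmd)
```

For example, set_command(uuid, with_destination(current, src_rp, dst_rp)) yields the step 14 command, and set_command(uuid, without_source(<allocations after step 14>, src_rp)) the step 15 one; the provider uuids come from 'openstack resource provider list --name <host>'. Depending on the placement microversion you request, 'allocation set' may also need project and user IDs; check [1] before running it.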