Bug 1742436

Summary: Update instance host and task state when post live migration fails
Product: Red Hat OpenStack
Component: openstack-nova
Reporter: David Hill <dhill>
Assignee: Amit Uniyal <auniyal>
Status: CLOSED MIGRATED
QA Contact: OSP DFG:Compute <osp-dfg-compute>
Severity: high
Priority: low
Version: 13.0 (Queens)
Target Milestone: ga
Target Release: 18.0
Hardware: x86_64
OS: Linux
Keywords: Triaged, ZStream
Fixed In Version: openstack-nova-27.1.1-18.0.20230801141713.252e660.el9ost
Last Closed: 2024-01-11 14:55:04 UTC
CC: alifshit, aruffin, astupnik, auniyal, broose, cmuresan, dasmith, eglynn, fboboc, gkadam, jbeaudoi, jhakimra, jhardee, kchamart, lyarwood, mircea.vutcovici, msecaur, mwitt, nbourgeo, pveiga, rdiwakar, sbauza, sgordon, smooney, vcojot, vromanso

Description David Hill 2019-08-16 19:05:18 UTC
This bug was initially created as a copy of Bug #1636102

Update instance host and task state when post live migration fails

If a live migration fails during the post processing it can lead to
the instance being shutdown on the source node and left in a migrating
task state. The instance is now running on the target node so the
instance host and task state should be updated.
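The general shape of the fix described above can be sketched as follows. This is a minimal illustration with simplified stand-in objects, not the actual nova.compute.manager code: the `Instance` class and function names here are hypothetical, and the real fix operates on Nova's instance objects and RPC layer.

```python
class Instance:
    """Simplified stand-in for a Nova instance record."""
    def __init__(self, host, task_state, vm_state):
        self.host = host
        self.task_state = task_state
        self.vm_state = vm_state


def finish_live_migration(instance, dest_host, post_at_destination):
    """Run post-migration processing; keep the DB in sync even on failure.

    The guest is already running on the destination by the time
    post-processing runs, so on failure the record must still point at
    the destination host, and the stale 'migrating' task state must be
    cleared, with the instance flagged for operator attention.
    """
    try:
        post_at_destination(instance)
    except Exception:
        instance.host = dest_host
        instance.task_state = None
        instance.vm_state = "error"
        raise
    instance.host = dest_host
    instance.task_state = None
    instance.vm_state = "active"
```

Without the `except` branch, a failure such as the Neutron credential error below would leave `host` pointing at the source node and `task_state` stuck at "migrating", which is the behavior this bug reports.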

Comment 2 David Hill 2019-08-16 19:11:00 UTC
The VM was migrated to remote_compute.localdomain, but in the database it remained on source_compute.localdomain. The following error was seen in the compute logs:

2019-08-14 12:55:18.671 1 ERROR nova.compute.manager [req-f0905fc1-ec77-47ec-a4ce-4324718cec3a 6828abf50d114f7cad181a7402571511 cdfe5b92c11d49fc86d242448d99f8cc - default default] [instance: 229f9217-67ed-4ef5-bcd8-796fa89f3901] Post live migration at destination remote_compute.localdomain failed: NeutronAdminCredentialConfigurationInvalid_Remote: Networking client is experiencing an unauthorized exception.
2019-08-14 12:55:18.671 1 ERROR nova.compute.manager [instance: 229f9217-67ed-4ef5-bcd8-796fa89f3901] Traceback (most recent call last):
2019-08-14 12:55:18.671 1 ERROR nova.compute.manager [instance: 229f9217-67ed-4ef5-bcd8-796fa89f3901]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6384, in _post_live_migration
2019-08-14 12:55:18.671 1 ERROR nova.compute.manager [instance: 229f9217-67ed-4ef5-bcd8-796fa89f3901]     instance, block_migration, dest)
2019-08-14 12:55:18.671 1 ERROR nova.compute.manager [instance: 229f9217-67ed-4ef5-bcd8-796fa89f3901]   File "/usr/lib/python2.7/site-packages/nova/compute/rpcapi.py", line 783, in post_live_migration_at_destination
2019-08-14 12:55:18.671 1 ERROR nova.compute.manager [instance: 229f9217-67ed-4ef5-bcd8-796fa89f3901]     instance=instance, block_migration=block_migration)
2019-08-14 12:55:18.671 1 ERROR nova.compute.manager [instance: 229f9217-67ed-4ef5-bcd8-796fa89f3901]   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 174, in call
2019-08-14 12:55:18.671 1 ERROR nova.compute.manager [instance: 229f9217-67ed-4ef5-bcd8-796fa89f3901]     retry=self.retry)
2019-08-14 12:55:18.671 1 ERROR nova.compute.manager [instance: 229f9217-67ed-4ef5-bcd8-796fa89f3901]   File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 131, in _send
2019-08-14 12:55:18.671 1 ERROR nova.compute.manager [instance: 229f9217-67ed-4ef5-bcd8-796fa89f3901]     timeout=timeout, retry=retry)
2019-08-14 12:55:18.671 1 ERROR nova.compute.manager [instance: 229f9217-67ed-4ef5-bcd8-796fa89f3901]   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 559, in send
2019-08-14 12:55:18.671 1 ERROR nova.compute.manager [instance: 229f9217-67ed-4ef5-bcd8-796fa89f3901]     retry=retry)
2019-08-14 12:55:18.671 1 ERROR nova.compute.manager [instance: 229f9217-67ed-4ef5-bcd8-796fa89f3901]   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 550, in _send

Comment 7 Cristian Muresanu 2020-05-26 22:09:04 UTC
We have observed that if we specify the destination node when triggering the live migration it works, but if we trigger the live migration and let the scheduler decide on the destination node it fails.

Comment 8 jhardee 2020-08-03 15:05:46 UTC
We wanted to check whether there is any update on this.

Thanks team

Comment 13 Matthew Secaur 2022-03-10 21:42:42 UTC
It has been 2.5 years and this BZ has made no progress. In the meantime, I have two customers who are waiting on this to be fixed in OSP 16.1 and, as far as I can tell, OSP 16.2 as well.

What can we do to get more attention on this issue?

Thanks!

Comment 14 aruffin@redhat.com 2022-06-03 19:39:23 UTC
Hello,

Is there any progress on this bug?

Comment 15 Alex Stupnikov 2022-06-06 06:59:28 UTC
Hello. We are backporting the fix from upstream and plan to release the RHOSP 16.2 fix in one of the upcoming minor releases.

Comment 16 smooney 2022-06-13 12:37:51 UTC
The fix has not been merged upstream and was deprioritized, so I'm not currently working on this.

When it lands we can likely backport it to 16.2, but I don't plan to backport it to 16.1 or 13.

Comment 20 aruffin@redhat.com 2022-11-03 20:14:52 UTC
Hello,

As this was merged upstream, the customer is asking when this will be backported to 16.2 and what more is needed to do so.

Andre

Comment 25 Red Hat Bugzilla 2024-05-11 04:25:02 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days