Bug 2182132 - [OSP 16.1] Live migration only succeeds if destination hypervisor is specified
Summary: [OSP 16.1] Live migration only succeeds if destination hypervisor is specified
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: OSP DFG:Compute
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-03-27 16:08 UTC by Matsvei Hauryliuk
Modified: 2023-07-31 08:39 UTC
CC List: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-07-31 08:39:42 UTC
Target Upstream Version:
Embargoed:
mhauryli: needinfo+


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | OSP-23767 | 0 | None | None | None | 2023-03-27 18:12:06 UTC

Description Matsvei Hauryliuk 2023-03-27 16:08:55 UTC
Description of problem:
The client has a custom non-director deployment with hundreds of compute nodes split across different host aggregates. When attempting a live migration of an instance without specifying a destination host:

# openstack server migrate <instance01> --live-migration

the operation fails with an error message. However, when the destination hypervisor is specified explicitly:

# openstack server migrate <instance01> --live-migration --host <host01> --os-compute-api-version 2.30

the operation is successful.

Inside the nova-conductor.log:
2023-02-20 14:36:01.585 32 WARNING nova.scheduler.utils |..| Setting instance to ACTIVE state.: nova.exception_Remote.InvalidSharedStorage_Remote: <host01> is not on shared storage: Shared storage live-migration requires either shared storage or boot-from-volume with no local disks.
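
The condition behind this message can be sketched roughly as follows. This is a simplified illustration, not Nova's actual code: the function name, its parameters, and the stand-in exception class are hypothetical, and the real check lives in the compute driver and handles more cases (for example, it also rejects block migration when shared storage is present, which is omitted here).

```python
class InvalidSharedStorage(Exception):
    """Hypothetical stand-in for the Nova exception seen in the log above."""


def resolve_block_migration(block_migration, shared_instance_path,
                            shared_block_storage):
    """Return the effective block-migration flag, or raise.

    block_migration: True, False, or None (None models the 'auto' value).
    shared_instance_path: instance directory is on shared storage.
    shared_block_storage: instance is boot-from-volume with no local disks.
    """
    on_shared = shared_instance_path or shared_block_storage

    # 'auto': use block migration only when nothing is shared.
    if block_migration is None:
        return not on_shared

    # Explicit False on non-shared storage is the failing case in the log.
    if block_migration is False and not on_shared:
        raise InvalidSharedStorage(
            "Shared storage live-migration requires either shared storage "
            "or boot-from-volume with no local disks.")

    return block_migration
```

Under these assumptions, a request that carries an explicit False for an instance with local disks on non-shared storage fails, while the same instance migrates once the flag is left to auto-detection.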

Version-Release number of selected component (if applicable):
RHOSP 16.1.6

How reproducible:
Live-migrate an instance.

Steps to Reproduce:
1. # openstack server migrate <instance01> --live-migration

Actual results:
Migration failed.

Expected results:
Successful migration to an available host, chosen according to the actual workload on the respective hypervisors.

Additional info:
We've collected the outputs from:
# openstack server show <instance01>
# openstack compute service list

As well as Sosreports from controller nodes and computes.

Comment 2 Artom Lifshitz 2023-03-27 18:05:45 UTC
I suspect this is a side effect of using the newer microversion rather than of specifying the destination. In microversions before 2.25, the block_migration parameter is a boolean, either True or False. If it is set to False but the instance is not on shared storage, live migration fails, because block migration is necessary to migrate the instance's disk. Starting with 2.25, block_migration also accepts the string value 'auto', which makes Nova auto-detect the storage situation. See [1] for full details.

If you retry the migration as follows:

# openstack server migrate <instance01> --live-migration --os-compute-api-version 2.30

I expect it to work.

[1] https://docs.openstack.org/api-ref/compute/?expanded=live-migrate-server-os-migratelive-action-detail#live-migrate-server-os-migratelive-action
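
To illustrate the difference, here is a minimal sketch of how the os-migrateLive request body changes across the 2.25 boundary, based on the API reference in [1]. The helper names are hypothetical, the defaults are assumptions made for illustration rather than what any particular client sends, and the `force` field introduced in 2.30 is left out for brevity.

```python
def parse_microversion(version):
    """Turn a string such as '2.30' into a comparable tuple (2, 30)."""
    major, minor = version.split(".")
    return int(major), int(minor)


def build_live_migrate_body(microversion, host=None, block_migration=None,
                            disk_over_commit=False):
    """Build the JSON body for the os-migrateLive server action.

    block_migration=None means "not specified by the caller".
    """
    action = {"host": host}
    if parse_microversion(microversion) < (2, 25):
        # Before 2.25: block_migration must be a boolean and
        # disk_over_commit is part of the request.
        action["block_migration"] = bool(block_migration)
        action["disk_over_commit"] = disk_over_commit
    else:
        # 2.25 and later: 'auto' is accepted and disk_over_commit is gone.
        action["block_migration"] = (
            "auto" if block_migration is None else bool(block_migration))
    return {"os-migrateLive": action}


old_body = build_live_migrate_body("2.1")
new_body = build_live_migrate_body("2.30", host="host01")
```

Here `old_body` carries a hard False for block_migration, whereas `new_body` carries 'auto', which is the behaviour the retry above relies on.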

Comment 17 Artom Lifshitz 2023-06-12 22:35:53 UTC
I have closed this bug as it has been waiting for more info for at least 4 weeks (see my comment #14). We only do this to ensure that we don't accumulate stale bugs which can't be addressed. If you are able to provide the requested information, please feel free to re-open this bug.

Comment 30 Sylvain Bauza 2023-07-31 08:39:42 UTC
I created a functional test to check whether force_hosts/nodes and the scheduler hint '_nova_check_type' are persisted after rebuilding an instance.

As you can see in https://review.opendev.org/c/openstack/nova/+/889843, we *don't persist* the values of these RequestSpec fields, so any other move operation after a rebuild shouldn't have any problem.
So, again, all I can say is that this is not the root cause of this BZ.

Given that we're not able to reproduce the problem, that we can't really determine the root cause of the live migration issue, and that the customer now appears to be happy, I'm closing this bug report as NOTABUG.

