Description of problem: The client has a custom non-director deployment with hundreds of compute nodes separated between different host aggregates. When trying live migration of an instance without specifying destination host: # openstack server migrate <instance01> --live-migration an error message is triggered. However when specifying the destination hypervisor explicitly: # openstack server migrate <instance01> --live-migration --host <host01> --os-compute-api-version 2.30 the operation is successful. Inside the nova-conductor.log: 2023-02-20 14:36:01.585 32 WARNING nova.scheduler.utils |..| Setting instance to ACTIVE state.: nova.exception_Remote.InvalidSharedStorage_Remote: <host01> is not on shared storage: Shared storage live-migration requires either shared storage or boot-from-volume with no local disks. Version-Release number of selected component (if applicable): RHOSP 16.1.6 How reproducible: Live-migrate an instance. Steps to Reproduce: 1. # openstack server migrate <instance01> --live-migration Actual results: Migration failed. Expected results: Successful migration to an available host per actual workload on respective hypervisors. Additional info: We've collected the outputs from: # openstack server show <instance01> # openstack compute service list As well as Sosreports from controller nodes and computes.
I suspect this is a side effect of the use of the new microversion rather than the specifying of the destination. In microversions before 2.24, the block_migration parameter was a boolean that was either True or False. If set to False but with the instance not on shared storage, live migration fails as block migration is necessary to migrate the instance's disk. Starting with 2.25, block_migration is a string that can accept the 'auto' value, which makes Nova auto-detect the storage situation. See [1] for full details. If you retry the migration as follows: # openstack server migrate <instance01> --live-migration --os-compute-api-version 2.30 I expect it to work. [1] https://docs.openstack.org/api-ref/compute/?expanded=live-migrate-server-os-migratelive-action-detail#live-migrate-server-os-migratelive-action
I have closed this bug as it has been waiting for more info for at least 4 weeks (see my comment #14). We only do this to ensure that we don't accumulate stale bugs which can't be addressed. If you are able to provide the requested information, please feel free to re-open this bug.
So, I created a functional test for trying to see whether force_hosts/nodes and the scheduler hint '_nova_check_type' were persisted after rebuilding an instance. As you can see in https://review.opendev.org/c/openstack/nova/+/889843, no, we *don't persist* these RequestSpec fields' values so any other move operation after rebuild shouldn't have any problem. Here, then, again, I can't say anything but the fact that it's not the root cause of this BZ. Given that we're not able to reproduce the problem, that we also don't really be able to know the root cause of the live migration issue, and given it looks the customer is now happy, I'm preferring to close this bug report as NOTABUG.