Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1806405

Summary: During migration of a VM, a hypervisor with insufficient HugePages is chosen
Product: Red Hat OpenStack
Reporter: NaveenRaj Navaratna Raj <nnavarat>
Component: openstack-nova
Assignee: Kashyap Chamarthy <kchamart>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: OSP DFG:Compute <osp-dfg-compute>
Severity: high
Docs Contact:
Priority: unspecified
Version: 13.0 (Queens)
CC: dasmith, eglynn, gkadam, jhakimra, kchamart, rcernin, sbauza, sgordon, smooney, vkoul, vromanso
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Flags: vkoul: needinfo-
       vkoul: needinfo-
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-04-30 19:58:19 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Comment 8 smooney 2020-03-12 22:14:44 UTC
Live migration of instances with a NUMA topology is not fully supported in OSP prior to 16,
and all instances with hugepage allocations have a NUMA topology.
This is a known limitation rather than a bug: live migration of such instances was never
fully supported when hugepage support was introduced into nova.

Before OSP 16, the nova scheduler checks that the destination host is able to spawn a new instance with
the required NUMA topology and hugepage/CPU-pinning constraints; however, the NUMA topology filter does not
know that it is evaluating a live migration. Before OSP 16, the libvirt virt driver does not support regenerating the
NUMA/hugepage elements of the domain XML during a migration, so checking that the destination can spawn a new VM with the
same constraints is not sufficient. Since the XML will not be modified, it is not enough to assert that there are enough
free hugepages on any NUMA node; you have to assert that there are enough free hugepages on the destination host on the same
NUMA nodes the VM was assigned to on the source host. Because this additional limitation is not accounted for by the NUMA
topology filter during a live migration, the scheduler can select a host where the free hugepages are on different
NUMA nodes. In that case the reported QEMU error is observed and the migration fails, leaving the VM in the ACTIVE state.
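The distinction above between "enough free hugepages somewhere" and "enough free hugepages on the same NUMA nodes" can be sketched as follows. This is an illustrative toy check, not actual nova code; the function name and data shapes are assumptions:

```python
# Illustrative sketch (NOT nova code): why a host-wide free-hugepage total is
# not a sufficient check for pre-OSP-16 live migration. Because the domain XML
# is not regenerated, the instance's hugepage allocation stays pinned to the
# same host NUMA node IDs, so each pinned node on the destination must
# individually have enough free pages.

def can_live_migrate(instance_pages_per_node, dest_free_pages_per_node):
    """instance_pages_per_node: {numa_node_id: hugepages the guest uses there}
    dest_free_pages_per_node: {numa_node_id: free hugepages on destination}
    Returns True only if every pinned node has enough free pages."""
    return all(
        dest_free_pages_per_node.get(node, 0) >= needed
        for node, needed in instance_pages_per_node.items()
    )

# Guest pinned to node 0 needing 1024 pages. The destination has 2048 free
# pages in total, but all on node 1: a naive total-free check would pass,
# while the per-node check correctly rejects the host.
print(can_live_migrate({0: 1024}, {0: 0, 1: 2048}))   # False
print(can_live_migrate({0: 1024}, {0: 1024, 1: 0}))   # True
```

A scheduler filter that only sums free pages across nodes would accept the first host, which is exactly the failure mode described above: QEMU then fails to allocate on the pinned node and the migration aborts.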

This behaviour is expected prior to OSP 16, and we have backported a check to OSP versions below 16 to prevent live migration of instances
with NUMA topologies. In the case above, to hit this error the customer has either re-enabled NUMA live migration by setting
[workarounds]/enable_numa_live_migration=true
https://github.com/openstack/nova/blob/stable/queens/nova/conf/workarounds.py#L181-L206
or has not upgraded to the OSP 13 z-stream that contains
https://code.engineering.redhat.com/gerrit/gitweb?p=nova.git;a=commit;h=9999bce00f5bea5f3e90ab9e16625d4237504bcb
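For reference, re-enabling the workaround named above is a nova.conf change on the compute nodes. This fragment is illustrative only (the comment text is mine, not nova's), and given the failure mode described in this bug it is not a recommended setting:

```ini
[workarounds]
# Re-enables live migration of instances with a NUMA topology despite the
# known pre-OSP-16 limitation; disabled by default after the backported check.
enable_numa_live_migration = true
```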

The upstream release note is available here: https://github.com/openstack/nova/blob/master/releasenotes/notes/disable-live-migration-with-numa-bc710a1bcde25957.yaml
As this was backported upstream and then imported into OSP 13, it is not called out in the OSP release notes, but the documentation was updated in
https://bugzilla.redhat.com/show_bug.cgi?id=1805828

With all of the above in mind, I am closing this as a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1660853

*** This bug has been marked as a duplicate of bug 1660853 ***

Comment 16 Red Hat Bugzilla 2024-01-06 04:28:11 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.