Description of problem:
Live migration fails when migrating an instance with CPU pinning and huge pages, e.g. with the following flavor:

- name: nfv_qe_base_flavor
  ram: 8192
  disk: 20
  vcpus: 6
  extra_specs:
    "hw:mem_page_size": "large"
    "hw:cpu_policy": "dedicated"
    "hw:emulator_threads_policy": "share"

When live migration is attempted, nova-compute throws the following ERROR:

2021-07-26 10:37:15.563 7 ERROR nova.virt.libvirt.driver [-] [instance: 5bd7e565-8744-4965-85f9-61f1e2bb6b8d] Live Migration failure: internal error: unable to execute QEMU command 'migrate-set-capabilities': Postcopy is not supported: libvirt.libvirtError: internal error: unable to execute QEMU command 'migrate-set-capabilities': Postcopy is not supported

Recent job with failed results:
https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/nfv/view/ml2-ovn/job/DFG-nfv-16.2-director-3cont-2comp-ipv4-geneve-ovn-hci-dpdk-sriov-ctlplane-dataplane-bonding-hybrid/7/

Compute logs with the failure for instance 5bd7e565-8744-4965-85f9-61f1e2bb6b8d:
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-nfv-16.2-director-3cont-2comp-ipv4-geneve-ovn-hci-dpdk-sriov-ctlplane-dataplane-bonding-hybrid/7/computehciovndpdksriov-0/var/log/containers/nova/nova-compute.log.gz

Version-Release number of selected component (if applicable):
16.2

How reproducible:
100%

Steps to Reproduce:
1. Spawn an instance with huge pages and a dedicated CPU policy
2. Live migrate the instance

Actual results:
Live migration fails

Expected results:
Live migration succeeds

Additional info:
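The reproduction steps above can be sketched with the OpenStack CLI. This is a hedged sketch, not the exact commands from the failing job: the flavor name matches the report, but the image name, network name, and server name are hypothetical placeholders.

```shell
# Create a flavor matching the extra_specs from the report
openstack flavor create nfv_qe_base_flavor --ram 8192 --disk 20 --vcpus 6
openstack flavor set nfv_qe_base_flavor \
  --property hw:mem_page_size=large \
  --property hw:cpu_policy=dedicated \
  --property hw:emulator_threads_policy=share

# Boot an instance with it and attempt a live migration
# ("rhel8", "dpdk-net", and "vm1" are hypothetical names)
openstack server create --flavor nfv_qe_base_flavor \
  --image rhel8 --network dpdk-net --wait vm1
openstack server migrate --live-migration vm1
```

With post-copy enabled in nova but vhost-postcopy-support missing in OVS, the migration fails with the "Postcopy is not supported" error shown above.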
The issue described here is very similar to what was reported in Bug #1710687. This was flagged while validating Bug #1967130.
I spoke to the virt team about this. CPU pinning and huge pages are supported with post-copy, but in a DPDK environment we also need to enable support in OVS for vhost-user interfaces. The error in the QEMU log is "vhost-user backend not capable of postcopy", as seen in http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-nfv-16.2-director-3cont-2comp-ipv4-geneve-ovn-hci-dpdk-sriov-ctlplane-dataplane-bonding-hybrid/7/computehciovndpdksriov-1/var/log/libvirt/qemu/instance-00000032.log.gz

To enable post-copy with vhost-user in OVS-DPDK you need to set:

ovs-vsctl set Open_vSwitch . other_config:vhost-postcopy-support=true

https://github.com/Mellanox/OVS/blob/master/Documentation/topics/dpdk/vhost-user.rst#vhost-user-client-post-copy-live-migration-support-experimental

I think we were previously missing that step in our docs and in TripleO. We may or may not also need to modify /proc/sys/vm/unprivileged_userfaultfd based on one suggestion, but I don't think that is required. https://lwn.net/Articles/782745/

I think the short-term path forward is to just not enable post-copy by default for OVS-DPDK until we can properly test and automate setting "ovs-vsctl set Open_vSwitch . other_config:vhost-postcopy-support=true" and, if required, /proc/sys/vm/unprivileged_userfaultfd (I don't think it is).
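For reference, the two host-side settings mentioned above would look like this on a compute node. This is a sketch of host configuration, not the TripleO automation; the sysctl step is the one flagged above as probably unnecessary.

```shell
# Enable post-copy support for vhost-user ports in OVS-DPDK
# (marked experimental upstream; requires an ovs-vswitchd restart
# to take effect on existing ports)
ovs-vsctl set Open_vSwitch . other_config:vhost-postcopy-support=true

# Verify the setting was stored in the OVS database
ovs-vsctl get Open_vSwitch . other_config:vhost-postcopy-support

# Only if userfaultfd is restricted on the host (see the LWN article
# linked above) -- likely not required:
# sysctl -w vm.unprivileged_userfaultfd=1
```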
dgilbert found this related BZ, https://bugzilla.redhat.com/show_bug.cgi?id=1565952, in FDP 20.1, so yes, this just looks like we missed enabling the feature in OVS.
So we have three possible solutions here:

- Disable postcopy when DPDK is enabled for a host [1], on Train only.
- Land these two patches [2][3] to enable vhost-postcopy-support automatically when postcopy is enabled in nova.
- Document that operators should either set the OVS config option vhost-postcopy-support to true prior to updating, OR disable postcopy in nova after the update to fix live migration.

If we can't land [2][3] in z0, we might then opt to just land [1] and never optimize live migration for Train, because [2][3] would require an OVS restart to be applied (thus impacting tenants). We can expect an OVS restart in z0 because of the RHEL 8.2 > 8.4 update, but we can't expect a restart for other minor updates.

It would be great if QE had cycles to test [2][3].

[1] https://review.opendev.org/c/openstack/tripleo-heat-templates/+/802764
[2] https://review.opendev.org/c/openstack/tripleo-heat-templates/+/802760
[3] https://review.opendev.org/c/openstack/tripleo-ansible/+/802742
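For context, the first option amounts to turning off post-copy in nova's libvirt driver on DPDK hosts; the minimal nova.conf fragment below is a sketch of that effect (the actual change in [1] is done through a TripleO parameter, not by editing the file directly).

```ini
[libvirt]
# Do not permit post-copy live migration; nova falls back to pre-copy,
# which does not require vhost-user post-copy support in OVS.
live_migration_permit_post_copy = false
```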
After discussion, we opted to automatically enable vhost-postcopy-support in OVS instead of reverting the original automation in nova-libvirt. The benefits of automatically enabling postcopy when available are important, and if we can't automatically enable it before 16.2 launches, we will end up in a state where live migration might be unstable and/or broken, most importantly for instances with huge pages, because we're going to miss the reboot window of the RHEL 8.2 > 8.4 upgrade. Enabling vhost-postcopy-support requires an OVS restart to take effect.

I partly tested the linked patches [1][2] (one for tripleo-heat-templates and one for tripleo-ansible), so it would be great if someone at QE could run a full 16.2 NFV deployment with DPDK and a full tempest run ASAP.

[1] https://review.opendev.org/c/openstack/tripleo-heat-templates/+/802760
[2] https://review.opendev.org/c/openstack/tripleo-ansible/+/802742
Train patches: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/803017 https://review.opendev.org/c/openstack/tripleo-ansible/+/803018
It should be OK to run OVS without mlockall, but you need to bear in mind the reason why OVS locks its memory. The original issue was that during large VM migrations memory pressure built up on a server, so OVS memory got swapped out, leading to a network outage on the host and the inability to do anything.

For the OVS with DPDK case it's probably not that critical, because permanent hugepages cannot be swapped, but OVS still uses a fair amount of regular malloc'ed memory that can be swapped (if not locked), leading to network issues, especially if OVS handles the main host network. All in all, it's OK to run without memory locking as long as you have enough RAM and your memory is never going to be swapped.

OVS just calls mlockall(MCL_CURRENT | MCL_FUTURE), meaning that all current and future memory allocations will be locked. Why is it done this way? Simply because it's not feasible to lock every single chunk of memory.

Why does the guest memory get locked/populated? Because in the case of vhost-user, the guest's memory is shared between OVS and the VM. So, at the moment the vhost library maps the required chunk of the guest's memory, that memory gets locked due to MCL_FUTURE.
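The mlockall effect described above can be observed from the kernel's per-process accounting: the VmLck field in /proc/<pid>/status reports how much of a process's memory is locked, and for a process running under mlockall(MCL_CURRENT | MCL_FUTURE) it tracks the resident set (VmRSS). A quick check on a compute node (assuming ovs-vswitchd is running and pidof is available):

```shell
# Show locked vs. resident memory for ovs-vswitchd; with mlockall in
# effect VmLck grows along with VmRSS, including vhost-user guest
# memory mapped by the vhost library.
pid=$(pidof ovs-vswitchd)
grep -E '^(VmLck|VmRSS)' "/proc/${pid}/status"
```

If OVS is started without mlockall (e.g. --no-mlockall), VmLck stays at 0 kB.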
This BZ is verified with tripleo-ansible-0.7.1-2.20210603175839.el8ost.7.noarch

$ cat ~/core_puddle_version
RHOS-16.2-RHEL-8-20210811.n.1
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform (RHOSP) 16.2 enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2021:3483
*** Bug 2033430 has been marked as a duplicate of this bug. ***
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days