Description (Martin Schuppert, 2021-08-11 09:05:40 UTC)
+++ This bug was initially created as a clone of Bug #1986567 +++
Description of problem:
Live migration fails when migrating a VM with CPU pinning and huge pages, e.g.:
- name: nfv_qe_base_flavor
ram: 8192
disk: 20
vcpus: 6
extra_specs:
"hw:mem_page_size": "large"
"hw:cpu_policy": "dedicated"
"hw:emulator_threads_policy": "share"
When live migration is attempted, nova-compute throws the following ERROR:
2021-07-26 10:37:15.563 7 ERROR nova.virt.libvirt.driver [-] [instance: 5bd7e565-8744-4965-85f9-61f1e2bb6b8d] Live Migration failure: internal error: unable to execute QEMU command 'migrate-set-capabilities': Postcopy is not supported: libvirt.libvirtError: internal error: unable to execute QEMU command 'migrate-set-capabilities': Postcopy is not supported
Recent job with failed results:
https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/nfv/view/ml2-ovn/job/DFG-nfv-16.2-director-3cont-2comp-ipv4-geneve-ovn-hci-dpdk-sriov-ctlplane-dataplane-bonding-hybrid/7/
Compute Logs with Failure for instance 5bd7e565-8744-4965-85f9-61f1e2bb6b8d:
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-nfv-16.2-director-3cont-2comp-ipv4-geneve-ovn-hci-dpdk-sriov-ctlplane-dataplane-bonding-hybrid/7/computehciovndpdksriov-0/var/log/containers/nova/nova-compute.log.gz
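The error can be located in the compressed compute log with, for example:
~~~
# Search the rotated nova-compute log for the live migration failure
zgrep 'Postcopy is not supported' nova-compute.log.gz
~~~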
Version-Release number of selected component (if applicable):
16.2
How reproducible:
100%
Steps to Reproduce:
1. Spawn an instance with huge pages and dedicated cpu policy
2. Live migrate the instance
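Roughly, these steps map to the following commands run against the overcloud (the image and network names below are placeholders, not taken from the failed job):
~~~
# 1. Spawn an instance with huge pages and a dedicated CPU policy (flavor from the description)
openstack server create --flavor nfv_qe_base_flavor \
    --image rhel-guest-image --network dpdk-mgmt --wait test-vm

# 2. Live migrate the instance; with the bug present, nova-compute logs the Postcopy error
openstack server migrate --live-migration --block-migration test-vm
openstack server show test-vm -c status -c OS-EXT-SRV-ATTR:host
~~~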
Actual results:
Live Migration Fails
Expected results:
Live Migration succeeds
Additional info:
--- Additional comment from Martin Schuppert on 2021-08-05 08:58:26 UTC ---
from ovs log:
2021-08-04T12:18:10.737Z|00011|dpdk|WARN|vhost-postcopy-support and mlockall are not compatible.
2021-08-04T12:18:10.737Z|00012|dpdk|INFO|POSTCOPY support for vhost-user-client disabled.
[root@computeovndpdksriov-0 ~]# ps -ef |grep ovs |grep mlock
openvsw+ 6555 1 99 Aug04 ? 1-16:15:50 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
Live migration works when --mlockall=no is set in /etc/sysconfig/openvswitch:
~~~
# Pass or not --mlockall option to ovs-vswitchd.
# This option should be set to "yes" or "no". The default is "yes".
# Enabling this option can avoid networking interruptions due to
# system memory pressure in extraordinary situations, such as multiple
# concurrent VM import operations.
# --mlockall=yes
~~~
OPTIONS="--mlockall=no"
With that change in place, live migrating all 3 types of instances works:
(overcloud) [stack@undercloud-0 ~]$ openstack server list --long
+--------------------------------------+---------+--------+------------+-------------+------------------------+---------------------------------------+--------------------------------------+-------------+-----------+-------------------+-----------------------------------+------------+
| ID | Name | Status | Task State | Power State | Networks | Image Name | Image ID | Flavor Name | Flavor ID | Availability Zone | Host | Properties |
+--------------------------------------+---------+--------+------------+-------------+------------------------+---------------------------------------+--------------------------------------+-------------+-----------+-------------------+-----------------------------------+------------+
| c82a8a4c-1060-4e35-a956-66346f5a5edc | t3 | ACTIVE | None | Running | dpdk-mgmt=10.10.10.121 | rhel-guest-image-7-6-210-x86-64-qcow2 | 58640e0c-2b39-4bc5-86ee-5de9cc220513 | | | nova | computeovndpdksriov-0.localdomain | |
| 95a17512-3f8f-4fef-82f5-20188e5e53f5 | t2 | ACTIVE | None | Running | dpdk-mgmt=10.10.10.159 | rhel-guest-image-7-6-210-x86-64-qcow2 | 58640e0c-2b39-4bc5-86ee-5de9cc220513 | | | nova | computeovndpdksriov-0.localdomain | |
| da1c6ed7-1aae-425d-af16-228b6b4cc930 | test-vm | ACTIVE | None | Running | dpdk-mgmt=10.10.10.198 | rhel-guest-image-7-6-210-x86-64-qcow2 | 58640e0c-2b39-4bc5-86ee-5de9cc220513 | | | nova | computeovndpdksriov-0.localdomain | |
+--------------------------------------+---------+--------+------------+-------------+------------------------+---------------------------------------+--------------------------------------+-------------+-----------+-------------------+-----------------------------------+------------+
(overcloud) [stack@undercloud-0 ~]$ openstack server migrate --live-migration --block-migration t3
(overcloud) [stack@undercloud-0 ~]$ openstack server migrate --live-migration --block-migration t2
(overcloud) [stack@undercloud-0 ~]$ openstack server migrate --live-migration --block-migration test-vm
(overcloud) [stack@undercloud-0 ~]$ openstack server list --long
+--------------------------------------+---------+--------+------------+-------------+------------------------+---------------------------------------+--------------------------------------+-------------+-----------+-------------------+-----------------------------------+------------+
| ID | Name | Status | Task State | Power State | Networks | Image Name | Image ID | Flavor Name | Flavor ID | Availability Zone | Host | Properties |
+--------------------------------------+---------+--------+------------+-------------+------------------------+---------------------------------------+--------------------------------------+-------------+-----------+-------------------+-----------------------------------+------------+
| c82a8a4c-1060-4e35-a956-66346f5a5edc | t3 | ACTIVE | None | Running | dpdk-mgmt=10.10.10.121 | rhel-guest-image-7-6-210-x86-64-qcow2 | 58640e0c-2b39-4bc5-86ee-5de9cc220513 | | | nova | computeovndpdksriov-1.localdomain | |
| 95a17512-3f8f-4fef-82f5-20188e5e53f5 | t2 | ACTIVE | None | Running | dpdk-mgmt=10.10.10.159 | rhel-guest-image-7-6-210-x86-64-qcow2 | 58640e0c-2b39-4bc5-86ee-5de9cc220513 | | | nova | computeovndpdksriov-1.localdomain | |
| da1c6ed7-1aae-425d-af16-228b6b4cc930 | test-vm | ACTIVE | None | Running | dpdk-mgmt=10.10.10.198 | rhel-guest-image-7-6-210-x86-64-qcow2 | 58640e0c-2b39-4bc5-86ee-5de9cc220513 | | | nova | computeovndpdksriov-1.localdomain | |
+--------------------------------------+---------+--------+------------+-------------+------------------------+---------------------------------------+--------------------------------------+-------------+-----------+-------------------+-----------------------------------+------------+
--- Additional comment from Martin Schuppert on 2021-08-05 09:47:15 UTC ---
LM fails because OVS is started by default with --mlockall=yes. As a result, post-copy gets disabled:
2021-08-04T12:18:10.737Z|00011|dpdk|WARN|vhost-postcopy-support and mlockall are not compatible.
2021-08-04T12:18:10.737Z|00012|dpdk|INFO|POSTCOPY support for vhost-user-client disabled.
When OPTIONS="--mlockall=no" is set in /etc/sysconfig/openvswitch and OVS is restarted, LM works.
As I am really not an OVS expert, the question is: do we want to
a) set OPTIONS="--mlockall=no" by default:
# Enabling this option can avoid networking interruptions due to
# system memory pressure in extraordinary situations, such as multiple
# concurrent VM import operations.
b) or set it only when post-copy is enabled? (A hypothetical sketch of option b follows below.)
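If option (b) were preferred, the deployment tooling would have to key the OVS setting off whether Nova permits post-copy. A purely hypothetical shell sketch of that logic follows; the nova.conf path and the use of the live_migration_permit_post_copy option are assumptions about where that information would come from, not the actual implementation:
~~~
# Hypothetical: enable --mlockall only when nova does NOT permit post-copy live migration
NOVA_CONF=/var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf

if crudini --get "$NOVA_CONF" libvirt live_migration_permit_post_copy 2>/dev/null | grep -qi true; then
    OPTS='--mlockall=no'     # post-copy requested: mlockall would break it
else
    OPTS='--mlockall=yes'    # no post-copy: keep the memory-locking default
fi

echo "OPTIONS=\"$OPTS\"" > /etc/sysconfig/openvswitch   # sketch only: overwrites the file
systemctl restart openvswitch
~~~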
--- Additional comment from on 2021-08-05 13:19:35 UTC ---
OVS should never be mlocking the guest memory.
The guest memory allocation is under the control of Nova and should not be modified by an external source.
I would have thought that mlocking DPDK's hugepage memory was not required either:
the hugepages are not going to be swapped or migrate between NUMA nodes, so mlock won't really help much in that case.
mlocking OVS's normal memory allocations, on the other hand, may impact the determinism of the ovs-vswitchd process, but
I'm not sure how that correlates to locking the guest memory.
This is the warning in the docs:
"DPDK Post-copy feature requires avoiding to populate the guest memory (application must not call mlock* syscall). So enabling mlockall is incompatible with post-copy feature."
The DPDK requirement seems to be on the guest memory, which OVS and DPDK should not be locking. If locking is required, it should be done by Nova and Nova alone.
We require hugepages to be used with vhost-user in Nova so we don't allow oversubscription, but technically you can use DPDK and vhost-user with non-hugepage-backed
guests. If we had support for that in Nova, OVS locking the guest memory would be a high-severity OVS bug, as it would break our memory accounting.
If we are being conservative, I would say we either always disable post-copy when using DPDK, or we set --mlockall=no when post-copy is enabled.
Although personally I'm inclined to say we should be setting --mlockall=no always, and if you need the guest memory to be locked you should set hw:cpu_realtime.
We had also discussed adding hw:mlock in the past to allow this independently of using the realtime extra spec, but the current situation is highly questionable.
I think we should be filing a separate bug to remove the use of --mlockall from OVS independently of what we decide for post-copy, or at least modifying it so it
is only mlocking OVS memory, not guest memory, so that post-copy works.
--- Additional comment from on 2021-08-05 13:25:45 UTC ---
By the way, if a third-party integration mlocked the guest memory outside of Nova, we would not support that installation.
We would treat that as if you had modified the libvirt XML, which forfeits all support for that VM.
If I have misunderstood the error message and the statement in the docs, please correct me.
If mlockall is only locking OVS memory, and that by itself is enough to break post-copy, then that is more reasonable.
At which point it just becomes a judgment call: is mlockall required on non-realtime systems?
I would assert that no, it's not required on non-realtime systems. Since post-copy will not be used for realtime systems,
I think we would be safe to disable mlockall when post-copy is enabled and enable mlockall when post-copy is not enabled.
That gives the determinism when it's required (realtime hosts), and when the requirements are less strict
we benefit from the much more effective live migration support.
--- Additional comment from Ilya Maximets on 2021-08-10 11:28:13 UTC ---
It should be OK to run OVS without mlockall. But you need to bear in mind
the reason why OVS locks the memory. The original issue was that during
large VM migrations there was memory pressure created on a server, so OVS
memory got swapped leading to the network outage on the host and inability
to do anything. For the OVS with DPDK case it's probably not that critical,
because permanent hugepages cannot be swapped, but OVS still uses a fair
amount of regular malloc'd memory that can be swapped (if not locked),
leading to network issues, especially if OVS handles the main host network.
All in all, it's OK to run without memory locking as long as you have enough
RAM and your memory is never going to be swapped.
OVS just calls mlockall(MCL_CURRENT | MCL_FUTURE) meaning that all the current
and future memory allocations will be locked.
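The effect of that mlockall() call can be observed directly on a compute node via the kernel's locked-memory counter (a quick check, not part of the original report):
~~~
# With --mlockall=yes, VmLck roughly tracks the daemon's resident memory;
# with --mlockall=no it stays at (or near) 0 kB.
grep VmLck /proc/$(pidof ovs-vswitchd)/status
~~~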
Why is it done this way?
Simply because it's not feasible to lock every single chunk of memory.
Why does the guest memory get locked/populated?
Because in the case of vhost-user, the guest's memory is shared between OVS
and the VM. So, at the moment the vhost library maps the required chunk of
the guest's memory, it gets locked due to MCL_FUTURE.
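In other words, whether post-copy survives depends on the combination of the OVS vhost-postcopy-support knob and the mlockall flag. Both can be checked on a compute node roughly like this (a sketch, assuming other_config:vhost-postcopy-support is the knob behind the log messages quoted above):
~~~
# Was vhost post-copy support requested in the OVS database?
ovs-vsctl --no-wait get Open_vSwitch . other_config:vhost-postcopy-support

# Is ovs-vswitchd running with --mlockall?
ps -ef | grep [o]vs-vswitchd | grep -c -- --mlockall

# The ovs-vswitchd log records how the two were reconciled at startup
grep -i postcopy /var/log/openvswitch/ovs-vswitchd.log
~~~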
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (Moderate: Red Hat OpenStack Platform 16.2 (openstack-tripleo-heat-templates) security update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2022:0995