Bug 2120925
Summary: | Failed to migrate vm with error - unable to execute QEMU command 'migrate-set-capabilities': Postcopy is not supported | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | chhu |
Component: | openstack-nova | Assignee: | OSP DFG:Compute <osp-dfg-compute> |
Status: | CLOSED DUPLICATE | QA Contact: | OSP DFG:Compute <osp-dfg-compute> |
Severity: | urgent | Docs Contact: | |
Priority: | unspecified | ||
Version: | 17.0 (Wallaby) | CC: | bdobreli, dasmith, eglynn, fjin, jdenemar, jhakimra, kchamart, mprivozn, sbauza, sgordon, vromanso |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | libvirt_OSP_INT | ||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2022-09-02 12:20:13 UTC | Type: | Bug |
Description
chhu
2022-08-24 03:21:36 UTC
Hi Jiri, I have a test environment that you can use if needed. Could you please check whether libvirt needs a code change? Many thanks!

> Libvirt enable unprivileged access to userfaultfd before starting post-copy
> migration, it sets the sysctl knob in runtime once post-copy migration is
> requested.
The first version of the libvirt patch was implemented this way, but the final
patch, which was actually pushed and is part of RHEL-9, works differently.
Libvirt just installs the /usr/lib/sysctl.d/60-qemu-postcopy-migration.conf file,
which systemd is supposed to apply when the system boots. Can you check that the
file exists and contains "vm.unprivileged_userfaultfd = 1"? Also, the setting might
be overridden by something else in /usr/lib/sysctl.d/, /run/sysctl.d/, or
/etc/sysctl.d/. Can you check that vm.unprivileged_userfaultfd is not set there by
anything but libvirt's conf file? Also, did you reboot the hosts after
installing libvirt? I believe sysctl files are only applied on boot.
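
The check described above could be scripted roughly as follows. This is only a sketch: the function name find_sysctl_setters is hypothetical, it only looks at *.conf drop-ins written in the dotted "key = value" form, and it does not cover /etc/sysctl.conf or keys written with "/" separators.

```shell
#!/bin/sh
# Hypothetical helper (not part of libvirt or OpenStack): print every
# "file:line" where a sysctl.d drop-in assigns the given key.
# Usage: find_sysctl_setters KEY DIR...
find_sysctl_setters() {
    key=$1
    shift
    # Escape the dots so the key is matched literally by grep -E.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        # /dev/null as an extra operand forces grep to print file names
        # even when only one .conf file exists in the directory.
        # Commented-out lines (# or ;) do not match the anchored pattern.
        grep -E "^[[:space:]]*-?${pattern}[[:space:]]*=" \
            "$dir"/*.conf /dev/null 2>/dev/null || true
    done
    return 0
}

# Directories named in the comment above.
find_sysctl_setters vm.unprivileged_userfaultfd \
    /usr/lib/sysctl.d /run/sysctl.d /etc/sysctl.d
```

If this prints anything other than the libvirt conf file, another drop-in may be overriding the setting. The value currently in effect can be read on the host with "sysctl vm.unprivileged_userfaultfd".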
Oh, libvirt runs in a container here. I believe the sysctl knob should be set on the host itself rather than in a container. I guess libvirt (and the sysctl conf file) is only installed in the container, which means OpenStack would need to make sure the host is properly set up by itself.

Thanks, Jiri! I deployed a new environment with the job below, using the latest OSP build:
RHOS-17.0-RHEL-9-20220823.n.2
custom-17.0_compact-director-rhel-9.0-virthost-3cont_2comp_3ceph-ipv4-geneve-ceph #35

I reran the steps in the Description and the error no longer occurs. This bug is fixed in the latest OSP build.

The openstack packages:
openstack-tripleo-heat-templates-14.3.1-0.20220719171722.feca772.el9ost.noarch - no error, compute node: "vm.unprivileged_userfaultfd = 1"
openstack-tripleo-heat-templates-14.3.1-0.20220719171711.feca772.el9ost.noarch - with the error, compute node: "vm.unprivileged_userfaultfd = 0"

Check vm.unprivileged_userfaultfd on compute-0, outside of the nova_virtqemud container:
[heat-admin@compute-0 ~]$ sudo sysctl -a|grep vm.unprivileged_userfaultfd
vm.unprivileged_userfaultfd = 1

More details:

- Step 1-3: Create the VM on compute-0.
Check vm.unprivileged_userfaultfd on compute-0, outside of the nova_virtqemud container:
[heat-admin@compute-0 ~]$ sudo sysctl -a|grep vm.unprivileged_userfaultfd
vm.unprivileged_userfaultfd = 1
[heat-admin@compute-0 ~]$ ls /usr/lib/sysctl.d/
10-default-yama-scope.conf 50-coredump.conf 50-default.conf 50-libkcapi-optmem_max.conf 50-pid-max.conf 50-redhat.conf README

- Step 4: Live migrate the VM. The migration succeeds and the VM is migrated to compute-1.
(overcloud) [stack@undercloud-0 ~]$ openstack server migrate --live-migration vm-r9 --wait
The --disk-overcommit and --no-disk-overcommit options are only supported by --os-compute-api-version 2.24 or below; this will be an error in a future release
Complete
(overcloud) [stack@undercloud-0 ~]$ openstack server show vm-r9
+-------------------------------------+--------------------------------------------------------------------------------------+
| Field                               | Value                                                                                |
+-------------------------------------+--------------------------------------------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                                                               |
| OS-EXT-AZ:availability_zone         | nova                                                                                 |
| OS-EXT-SRV-ATTR:host                | compute-1.redhat.local                                                               |
| OS-EXT-SRV-ATTR:hostname            | vm-r9                                                                                |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-1.redhat.local                                                               |

- Step 5: Live block migrate the VM. The VM is still running on the source compute node, and the expected error appears in "/var/log/containers/nova/nova-compute.log":
"default default] Exception during message handling: nova.exception.InvalidLocalStorage: compute-1.redhat.local is not on local storage: Block migration can not be used with shared storage."
(overcloud) [stack@undercloud-0 ~]$ openstack server migrate --live-migration --block-migration vm-r9 --wait
The --disk-overcommit and --no-disk-overcommit options are only supported by --os-compute-api-version 2.24 or below; this will be an error in a future release
Complete
(overcloud) [stack@undercloud-0 ~]$ openstack server show vm-r9
+-------------------------------------+--------------------------------------------------------------------------------------+
| Field                               | Value                                                                                |
+-------------------------------------+--------------------------------------------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                                                               |
| OS-EXT-AZ:availability_zone         | nova                                                                                 |
| OS-EXT-SRV-ATTR:host                | compute-1.redhat.local                                                               |
| OS-EXT-SRV-ATTR:hostname            | vm-r9                                                                                |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-1.redhat.local                                                               |
| OS-EXT-SRV-ATTR:instance_name       | instance-000001ac                                                                    |
| OS-EXT-SRV-ATTR:kernel_id           |                                                                                      |
| OS-EXT-SRV-ATTR:launch_index        | 0                                                                                    |
| OS-EXT-SRV-ATTR:ramdisk_id          |                                                                                      |
| OS-EXT-SRV-ATTR:reservation_id      | r-uzg0xobd                                                                           |
| OS-EXT-SRV-ATTR:root_device_name    | /dev/vda                                                                             |
| OS-EXT-SRV-ATTR:user_data           | None                                                                                 |
| OS-EXT-STS:power_state              | Running                                                                              |
| OS-EXT-STS:task_state               | None                                                                                 |
| OS-EXT-STS:vm_state                 | active                                                                               |

*** This bug has been marked as a duplicate of bug 2110556 ***