Bug 2120925 - Failed to migrate vm with error - unable to execute QEMU command 'migrate-set-capabilities': Postcopy is not supported
Status: CLOSED DUPLICATE of bug 2110556
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 17.0 (Wallaby)
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Assignee: OSP DFG:Compute
QA Contact: OSP DFG:Compute
Whiteboard: libvirt_OSP_INT
 
Reported: 2022-08-24 03:21 UTC by chhu
Modified: 2023-03-21 19:57 UTC (History)
Last Closed: 2022-09-02 12:20:13 UTC

Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-18349 0 None None None 2022-08-24 03:25:47 UTC

Description chhu 2022-08-24 03:21:36 UTC
Description of problem:
Failed to migrate vm with error - unable to execute QEMU command 'migrate-set-capabilities': Postcopy is not supported

Version-Release number of selected component (if applicable):
openstack-nova-compute-23.2.2-0.20220705171705.7074ac0.el9ost.noarch
libvirt-daemon-driver-qemu-8.0.0-8.1.el9_0.x86_64
qemu-kvm-6.2.0-11.el9_0.3.x86_64
kernel: 5.14.0-70.17.1.el9_0.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Installed OSP 17.0 (RHEL 9.0) with the job below; the environment uses local storage.
custom-17.0_compact-director-rhel-9.0-virthost-3cont_2comp-ipv4-gre-lvm #35

2. Created the image, network and flavor
(overcloud) [stack@undercloud-0 ~]$ openstack image create r9-qcow2 --disk-format qcow2 --container-format bare --file RHEL-9.0.0-20220429.1-x86_64.qcow2
(overcloud) [stack@undercloud-0 ~]$ openstack image list| grep r9-qcow2
| de713510-def9-46b6-a8e6-0eecb434f644 | r9-qcow2                         | active |

(overcloud) [stack@undercloud-0 ~]$ openstack network create private
(overcloud) [stack@undercloud-0 ~]$ openstack subnet create --network private private_subnet --allocation-pool start=192.168.32.2,end=192.168.32.245 --dhcp --gateway=192.168.32.1 --subnet-range 192.168.32.0/24
(overcloud) [stack@undercloud-0 ~]$ openstack network list| grep private
| b1333c24-e41c-46a4-98ed-d185d5df6d2f | private                                            | 25793595-e580-487d-bc78-5295d2250033 |

(overcloud) [stack@undercloud-0 ~]$ openstack flavor create m1.small --ram 512 --disk 10 --vcpus 1

3. Created the VM from image successfully and it was running on compute-1. 
(overcloud) [stack@undercloud-0 ~]$ openstack server create --flavor m1.small --image r9-qcow2 --nic net-id=b1333c24-e41c-46a4-98ed-d185d5df6d2f vm-r9
(overcloud) [stack@undercloud-0 ~]$ openstack server list
+--------------------------------------+-------------+--------+-----------------------------------+----------+----------+
| ID                                   | Name        | Status | Networks                          | Image    | Flavor   |
+--------------------------------------+-------------+--------+-----------------------------------+----------+----------+
| b202094d-218b-4679-9850-1f072b582cd7 | vm-r9       | ACTIVE | private=192.168.32.72             | r9-qcow2 | m1.small |
(overcloud) [stack@undercloud-0 ~]$ openstack server show vm-r9
+-------------------------------------+--------------------------------------------------------------------------------------+
| Field                               | Value                                                                                |
+-------------------------------------+--------------------------------------------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                                                               |
| OS-EXT-AZ:availability_zone         | nova                                                                                 |
| OS-EXT-SRV-ATTR:host                | compute-1.redhat.local                                                               |
| OS-EXT-SRV-ATTR:hostname            | vm-r9                                                                                |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-1.redhat.local                                                               |
| OS-EXT-SRV-ATTR:instance_name       | instance-000001c1                                                                    |
......|
| OS-EXT-STS:power_state              | Running                                                                              |

4. Tried to live migrate the VM. The command returned "Complete", but the VM was still on compute-1. The virtqemud.log on compute-1 shows this error:
"error : virNetClientProgramDispatchError:172 : internal error: unable to execute QEMU command 'migrate-set-capabilities': Postcopy is not supported".
More logs in files: live-migrate-source.log, live-migrate-target.log

(overcloud) [stack@undercloud-0 ~]$ openstack server migrate --live-migration vm-r9 --wait
The --disk-overcommit and --no-disk-overcommit options are only supported by --os-compute-api-version 2.24 or below; this will be an error in a future release
Complete

5. Tried to live block migrate the VM. The command returned "Complete", but the VM was still on compute-1. The virtqemud.log on compute-1 shows this error:
"error : virNetClientProgramDispatchError:172 : internal error: unable to execute QEMU command 'migrate-set-capabilities': Postcopy is not supported"
More logs in files: live-block-migrate-source.log, live-block-migrate-target.log

(overcloud) [stack@undercloud-0 ~]$ openstack server migrate --live-migration --block-migration vm-r9 --wait
The --disk-overcommit and --no-disk-overcommit options are only supported by --os-compute-api-version 2.24 or below; this will be an error in a future release
Complete
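
Because the client prints "Complete" even when the VM stays on the source host, the result of steps 4 and 5 can be verified by comparing the host before and after the migration. This is a minimal sketch: the helper names current_host and check_migrated are made up for illustration, and it assumes only the openstack client's standard "-f value -c COLUMN" output options.

```shell
# current_host SERVER: print the compute host the server currently runs on.
current_host() {
    openstack server show "$1" -f value -c OS-EXT-SRV-ATTR:host
}

# check_migrated BEFORE AFTER: succeed only if AFTER is non-empty and differs from BEFORE.
check_migrated() {
    if [ -n "$2" ] && [ "$1" != "$2" ]; then
        echo "migrated: $1 -> $2"
    else
        echo "NOT migrated: still on $1"
        return 1
    fi
}

# Usage (on the undercloud, with the overcloud credentials loaded):
#   before=$(current_host vm-r9)
#   openstack server migrate --live-migration vm-r9 --wait
#   check_migrated "$before" "$(current_host vm-r9)"
```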

6. Check the sysctl vm.unprivileged_userfaultfd settings on compute-1 and compute-0
[root@compute-1 /]# sysctl -a|grep vm.unprivileged_userfaultfd
vm.unprivileged_userfaultfd = 0
[root@compute-0 /]# sysctl -a|grep vm.unprivileged_userfaultfd
vm.unprivileged_userfaultfd = 0
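
The same check can be made scriptable by reading the procfs file behind the sysctl key, so that a value other than 1 gives a non-zero exit status. This is a sketch only: knob_enabled is a hypothetical helper name, and the optional file argument exists just to make it easy to try against a test file.

```shell
# knob_enabled [FILE]: print the current value and succeed only when it is 1.
# FILE defaults to the procfs path behind vm.unprivileged_userfaultfd.
knob_enabled() {
    knob_file=${1:-/proc/sys/vm/unprivileged_userfaultfd}
    [ -r "$knob_file" ] || { echo "cannot read $knob_file" >&2; return 2; }
    knob_value=$(cat "$knob_file")
    echo "vm.unprivileged_userfaultfd = $knob_value"
    [ "$knob_value" = "1" ]
}

# Usage on each compute host:
#   knob_enabled || echo "userfaultfd knob is not enabled on $(hostname)"
```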

7. Set vm.unprivileged_userfaultfd = 1 on compute-1 and compute-0
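
Step 7 does not record how the knob was set. One possible way, as a sketch: write a drop-in under /etc/sysctl.d/ and load it with "sysctl -p", which changes the running value and keeps it across reboots. The file name 99-userfaultfd.conf and the helper names are assumptions for illustration, not what the OSP deployment itself uses.

```shell
KEY=vm.unprivileged_userfaultfd
# Hypothetical drop-in path; override it by exporting CONF before calling the helper.
CONF=${CONF:-/etc/sysctl.d/99-userfaultfd.conf}

# render_conf KEY VALUE: print one sysctl.d assignment line.
render_conf() {
    printf '%s = %s\n' "$1" "$2"
}

# enable_userfaultfd: write the drop-in, then load it into the running kernel (needs root).
enable_userfaultfd() {
    render_conf "$KEY" 1 > "$CONF" && sysctl -p "$CONF"
}

# Usage, as root on each compute host (on the host itself, not inside a container):
#   enable_userfaultfd
```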

8. Live block migrated the VM again. The command returned "Complete" and the VM was migrated from compute-1 to compute-0 successfully.

9. Postcopy requires trapping page faults from kernel code, so in RHEL 9 vm.unprivileged_userfaultfd must be set to 1 during the postcopy phase.
Libvirt enables unprivileged access to userfaultfd before starting post-copy migration; it sets the sysctl knob at runtime once post-copy migration is requested.
- Bug 1945420 - [RHEL9] Setup vm.unprivileged_userfaultfd for postcopy: since libvirt-8.0.0-0rc1.1.el9
- [libvirt PATCH] qemu: Enable unprivileged userfaultfd for post-copy migration

Actual results:
In steps 4 and 5, the error below appears in virtqemud.log and the VM is not migrated to the target compute node.
"error : virNetClientProgramDispatchError:172 : internal error: unable to execute QEMU command 'migrate-set-capabilities': Postcopy is not supported"

Expected results:
In step4 and step5: No error in virtqemud.log
In step5: VM is migrated to target compute node

Additional info:
Log files from the source and target compute nodes, collected while running steps 4 and 5:
- live-migrate-source.log, live-migrate-target.log
- live-block-migrate-source.log, live-block-migrate-target.log

Comment 5 chhu 2022-08-24 05:52:36 UTC
Hi, Jiri

I have a testing environment that you can use if needed. Could you please help check whether libvirt needs a code change? Many thanks!

Comment 6 Jiri Denemark 2022-08-24 08:06:31 UTC
> Libvirt enable unprivileged access to userfaultfd before starting post-copy
> migration, it sets the sysctl knob in runtime once post-copy migration is
> requested.

The first version of the libvirt patch was implemented this way, but the final
patch which was actually pushed and is part of RHEL-9 works differently.
Libvirt just installs the /usr/lib/sysctl.d/60-qemu-postcopy-migration.conf file,
which systemd is supposed to apply when the system boots. Can you check the file
exists and contains "vm.unprivileged_userfaultfd = 1"? Also the setting might
be overridden by something else in /usr/lib/sysctl.d/, /run/sysctl.d/, or
/etc/sysctl.d/. Can you check vm.unprivileged_userfaultfd is not set there by
anything but libvirt's conf file? Also did you reboot the hosts after
installing libvirt? I believe sysctl files are only applied on boot.
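
The override check can be scripted. This is a minimal sketch, with find_sysctl_settings as a made-up helper name; it lists every *.conf line that assigns the key in the directories passed to it. As far as I know, systemd-sysctl sorts files by name across all three directories and the last assignment wins, so any hit in a file that sorts after 60-qemu-postcopy-migration.conf deserves a look.

```shell
# find_sysctl_settings KEY DIR...
# Print "file:line" for each *.conf line in the given directories that assigns KEY.
find_sysctl_settings() {
    fss_key=$(printf '%s' "$1" | sed 's/\./\\./g')   # escape dots for the regex
    shift
    for fss_dir in "$@"; do
        [ -d "$fss_dir" ] || continue                # skip directories that do not exist
        grep -H -E "^[[:space:]]*${fss_key}[[:space:]]*=" "$fss_dir"/*.conf 2>/dev/null || true
    done
    return 0
}

# Usage on the compute host:
#   find_sysctl_settings vm.unprivileged_userfaultfd \
#       /usr/lib/sysctl.d /run/sysctl.d /etc/sysctl.d
```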

Comment 8 Jiri Denemark 2022-08-25 12:34:46 UTC
Oh, libvirt runs in a container here. I believe the sysctl knob should be set
in the host itself rather than in a container. I guess libvirt (and the sysctl
conf file) is only installed in the container, which means openstack would
need to make sure the host is properly set up by itself.

Comment 9 chhu 2022-08-26 09:32:39 UTC
Thanks Jiri ! 

Deployed a new environment with the job below, using the latest OSP build: RHOS-17.0-RHEL-9-20220823.n.2
custom-17.0_compact-director-rhel-9.0-virthost-3cont_2comp_3ceph-ipv4-geneve-ceph #35

Reran the steps from the Description; the error no longer occurs.
This bug is fixed in the latest OSP build.

The openstack packages:
openstack-tripleo-heat-templates-14.3.1-0.20220719171722.feca772.el9ost.noarch  - no error: compute node: "vm.unprivileged_userfaultfd = 1"
openstack-tripleo-heat-templates-14.3.1-0.20220719171711.feca772.el9ost.noarch - with the error, compute node: "vm.unprivileged_userfaultfd = 0"

Check the vm.unprivileged_userfaultfd on compute-0, outside of the nova_virtqemud container:
[heat-admin@compute-0 ~]$ sudo sysctl -a|grep vm.unprivileged_userfaultfd
vm.unprivileged_userfaultfd = 1

More details:
- Steps 1-3: created the VM on compute-0

[heat-admin@compute-0 ~]$ ls /usr/lib/sysctl.d/
10-default-yama-scope.conf  50-coredump.conf  50-default.conf  50-libkcapi-optmem_max.conf  50-pid-max.conf  50-redhat.conf  README

- Step 4: Live migration succeeded; the VM was migrated to compute-1
(overcloud) [stack@undercloud-0 ~]$ openstack server migrate --live-migration vm-r9 --wait
The --disk-overcommit and --no-disk-overcommit options are only supported by --os-compute-api-version 2.24 or below; this will be an error in a future release
Complete
(overcloud) [stack@undercloud-0 ~]$ openstack server show vm-r9
+-------------------------------------+--------------------------------------------------------------------------------------+
| Field                               | Value                                                                                |
+-------------------------------------+--------------------------------------------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                                                               |
| OS-EXT-AZ:availability_zone         | nova                                                                                 |
| OS-EXT-SRV-ATTR:host                | compute-1.redhat.local                                                               |
| OS-EXT-SRV-ATTR:hostname            | vm-r9                                                                                |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-1.redhat.local                                                               |

- Step 5: Tried to live block migrate the VM. The VM is still running on the source compute node, and the expected error appears in "/var/log/containers/nova/nova-compute.log":
"default default] Exception during message handling: nova.exception.InvalidLocalStorage: compute-1.redhat.local is not on local storage: Block migration can not be used with shared storage."

(overcloud) [stack@undercloud-0 ~]$ openstack server migrate --live-migration --block-migration vm-r9 --wait
The --disk-overcommit and --no-disk-overcommit options are only supported by --os-compute-api-version 2.24 or below; this will be an error in a future release
Complete
(overcloud) [stack@undercloud-0 ~]$ openstack server show vm-r9
+-------------------------------------+--------------------------------------------------------------------------------------+
| Field                               | Value                                                                                |
+-------------------------------------+--------------------------------------------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                                                               |
| OS-EXT-AZ:availability_zone         | nova                                                                                 |
| OS-EXT-SRV-ATTR:host                | compute-1.redhat.local                                                               |
| OS-EXT-SRV-ATTR:hostname            | vm-r9                                                                                |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-1.redhat.local                                                               |
| OS-EXT-SRV-ATTR:instance_name       | instance-000001ac                                                                    |
| OS-EXT-SRV-ATTR:kernel_id           |                                                                                      |
| OS-EXT-SRV-ATTR:launch_index        | 0                                                                                    |
| OS-EXT-SRV-ATTR:ramdisk_id          |                                                                                      |
| OS-EXT-SRV-ATTR:reservation_id      | r-uzg0xobd                                                                           |
| OS-EXT-SRV-ATTR:root_device_name    | /dev/vda                                                                             |
| OS-EXT-SRV-ATTR:user_data           | None                                                                                 |
| OS-EXT-STS:power_state              | Running                                                                              |
| OS-EXT-STS:task_state               | None                                                                                 |
| OS-EXT-STS:vm_state                 | active                                                                               |

Comment 10 Bogdan Dobrelya 2022-09-02 12:20:13 UTC

*** This bug has been marked as a duplicate of bug 2110556 ***

