Bug 2158704 - RFE: Prefer /dev/userfaultfd over userfaultfd(2) syscall
Summary: RFE: Prefer /dev/userfaultfd over userfaultfd(2) syscall
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: qemu-kvm
Version: 9.2
Hardware: Unspecified
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Peter Xu
QA Contact: Li Xiaohui
URL:
Whiteboard:
Depends On: 2158706
Blocks: 2158705
 
Reported: 2023-01-06 08:10 UTC by Michal Privoznik
Modified: 2023-05-09 07:56 UTC (History)
CC List: 10 users

Fixed In Version: qemu-kvm-7.2.0-9.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2158705 2158706 (view as bug list)
Environment:
Last Closed: 2023-05-09 07:23:43 UTC
Type: Feature Request
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Gitlab | redhat/centos-stream/src qemu-kvm merge_requests 149 | 0 | None | opened | Support /dev/userfaultfd | 2023-02-13 23:25:51 UTC
Red Hat Issue Tracker | RHELPLAN-144088 | 0 | None | None | None | 2023-01-06 08:11:34 UTC
Red Hat Product Errata | RHSA-2023:2162 | 0 | None | None | None | 2023-05-09 07:24:35 UTC

Description Michal Privoznik 2023-01-06 08:10:20 UTC
Description of problem:
So far, postcopy migration has used the userfaultfd(2) syscall. This has a couple of drawbacks (summarized nicely in kernel commit [1]). To resolve them, the kernel introduced the /dev/userfaultfd device, and this is a request to switch to it.

Please note that some environments in which QEMU runs may disallow the userfaultfd(2) syscall because it is viewed as too powerful, for instance KubeVirt [2].


1: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2d5de004e009add27db76c5cdc9f1f7f7dc087e7

2: https://issues.redhat.com/browse/OCPBUGS-5031
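
As a rough illustration of the preference this RFE asks for, the shell sketch below reports which userfaultfd entry point the current user could plausibly use: the device node first, then the syscall. The function name `uffd_entry_point` and its optional path arguments are made up for this example. It only looks at file permissions and the vm.unprivileged_userfaultfd sysctl, so it ignores capabilities, seccomp filters, and anything else that may allow or block either path.

```shell
#!/bin/sh
# Hypothetical helper: guess which userfaultfd entry point is usable.
# Both arguments are optional and default to the real kernel paths:
#   $1  device node  (default /dev/userfaultfd)
#   $2  sysctl file  (default /proc/sys/vm/unprivileged_userfaultfd)
# Prints "device", "syscall", or "none". This is a heuristic only: it does
# not account for root/capabilities or for seccomp profiles.
uffd_entry_point() {
    dev=${1:-/dev/userfaultfd}
    sysctl_file=${2:-/proc/sys/vm/unprivileged_userfaultfd}

    # Prefer the device node when this user can open it read/write.
    if [ -r "$dev" ] && [ -w "$dev" ]; then
        echo "device"
        return 0
    fi

    # Otherwise the syscall is only expected to work for unprivileged
    # users when the sysctl is set to 1.
    if [ -r "$sysctl_file" ] && [ "$(cat "$sysctl_file")" = "1" ]; then
        echo "syscall"
        return 0
    fi

    echo "none"
    return 1
}
```

Calling `uffd_entry_point` with no arguments checks the real paths on the running host.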

Comment 2 John Ferlan 2023-01-09 20:31:40 UTC
I know Nitesh is taking over Live Migration shortly, but we need to consider this sooner rather than later, especially if OpenShift goes through with the plan to alter the default seccomp profile. There is a "work-around" of sorts in the plan for KubeVirt (https://github.com/kubevirt/kubevirt/pull/8917).

Comment 6 Li Xiaohui 2023-02-14 02:33:02 UTC
Hi Peter,
For verifying this bug, I think running the postcopy tests is enough. What do you think?

Comment 7 Peter Xu 2023-02-14 14:09:56 UTC
Xiaohui,

Thanks for raising this question.  Yes that should be enough.

To make sure you're using the new /dev/userfaultfd descriptor, you can first disable the userfaultfd syscall for QEMU:

[NOTE: this does not disable the userfaultfd syscall entirely, only unprivileged userfaultfd that handles kernel faults. That is already enough to stop QEMU from using the syscall, because QEMU needs the privileged uffd to handle kernel faults.]
# echo 0 > /proc/sys/vm/unprivileged_userfaultfd

With the above, the dest QEMU should already fail to start with postcopy enabled, like this:

[note: run this as a non-root user; as root it won't fail]
$ ./qemu-system-x86_64 -incoming defer -global migration.x-postcopy-ram=on
qemu-system-x86_64: postcopy_ram_supported_by_host: userfaultfd not available: Operation not permitted
qemu-system-x86_64: Postcopy is not supported

Or, if you enable postcopy via QMP, I think the QMP command to enable postcopy should simply fail.

Then, with the new kernel and /dev/userfaultfd present with the right permissions:

# chmod 0666 /dev/userfaultfd

One should then be able to start the dest QEMU successfully, like:

[note: again, don't run as root here, so the result can be compared with the above]
$ ./qemu-system-x86_64 -incoming defer -global migration.x-postcopy-ram=on
qemu-system-x86_64: postcopy_ram_supported_by_host: userfaultfd not available: Operation not permitted
qemu-system-x86_64: Postcopy is not supported

With that, the simplest round of postcopy would suffice.

Thanks.
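
The preconditions for the second half of this test can be checked with something like the sketch below. It is only an illustration: the function name `uffd_test_ready`, its arguments, and its messages are hypothetical, and it inspects nothing beyond the sysctl value, the user id, and the access bits on the device node.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the /dev/userfaultfd test.
# All arguments are optional; they exist so the check can be tried
# against scratch files instead of the real paths:
#   $1  device node  (default /dev/userfaultfd)
#   $2  sysctl file  (default /proc/sys/vm/unprivileged_userfaultfd)
#   $3  numeric uid  (default: uid of the current user)
uffd_test_ready() {
    dev=${1:-/dev/userfaultfd}
    sysctl_file=${2:-/proc/sys/vm/unprivileged_userfaultfd}
    uid=${3:-$(id -u)}

    # As root the syscall still works, so the test would prove nothing.
    if [ "$uid" -eq 0 ]; then
        echo "running as root: the syscall path is not blocked"
        return 1
    fi
    # The syscall path must be closed for unprivileged users.
    if [ "$(cat "$sysctl_file" 2>/dev/null)" != "0" ]; then
        echo "unprivileged_userfaultfd is not 0"
        return 1
    fi
    # The device node must be usable by this user.
    if ! [ -r "$dev" ] || ! [ -w "$dev" ]; then
        echo "device node is not readable and writable by this user"
        return 1
    fi
    echo "ready"
}
```

If `uffd_test_ready` prints "ready" for the non-root test user, a successful start of the dest QEMU with postcopy enabled suggests the device node was used rather than the syscall.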

Comment 8 Peter Xu 2023-02-14 14:12:31 UTC
(In reply to Peter Xu from comment #7)
> One should be able to start dest QEMU successfully, like:
> 
> [note: here we don't need root privilege too to compare with above]
> $ ./qemu-system-x86_64 -incoming defer -global migration.x-postcopy-ram=on
> qemu-system-x86_64: postcopy_ram_supported_by_host: userfaultfd not
> available: Operation not permitted
> qemu-system-x86_64: Postcopy is not supported

Sorry, that's a copy-paste error. It should just succeed and continue here.

Comment 9 Li Xiaohui 2023-02-16 11:07:20 UTC
(In reply to Peter Xu from comment #7)
> Xiaohui,
> 
> Thanks for raising this question.  Yes that should be enough.
> 
> To make sure you're using the new /dev/userfaultfd descriptor, you can do
> this to disable the userfaultfd syscall first for qemu:
> 
> [NOTE: this will not disable the whole userfaultfd syscall, but only the
> unprivileged kernel userfaultfd, which will stop QEMU from using it already
> because qemu will need that privileged uffd for handle kernel faults]
> # echo 0 > /proc/sys/vm/unprivileged_userfaultfd
> 
> With above, we should already fail to boot the dest QEMU with postcopy
> enabled, like this:
> 
> [note: here we don't need root privilege or it won't fail]
> $ ./qemu-system-x86_64 -incoming defer -global migration.x-postcopy-ram=on
> qemu-system-x86_64: postcopy_ram_supported_by_host: userfaultfd not
> available: Operation not permitted
> qemu-system-x86_64: Postcopy is not supported
> 
> Or if you enable postcopy via QMP I think that should just fail the QMP
> command to enable postcopy.
> 
> Then, with the new kernel and have /dev/userfaultfd being there with the
> right permissions:

Here, do we still need to disable the userfaultfd syscall?

> 
> # chmod 0666 /dev/userfaultfd

I have verified the related kernel bug 2158706 on kernel-5.14.0-270.el9.x86_64. In that bug, I can see that the default permissions aren't 0666:
https://bugzilla.redhat.com/show_bug.cgi?id=2158706#c16

[root@dell-per7525-25 bz2158706]# ls -lt /dev/userfaultfd 
crw-------. 1 root root 10, 126 Feb 15 08:01 /dev/userfaultfd

So we must give 666 permissions to /dev/userfaultfd for postcopy migration? If not, will postcopy fail to start?
If so, why don't we keep 666 as the default for /dev/userfaultfd?

> 
> One should be able to start dest QEMU successfully, like:
> 
> [note: here we don't need root privilege too to compare with above]
> $ ./qemu-system-x86_64 -incoming defer -global migration.x-postcopy-ram=on
> 
> It should just succeed and continue here
> 
> With that, a simplest round of postcopy would suffice.
> 
> Thanks.

Thank you for providing the test steps.

Comment 10 Peter Xu 2023-02-16 14:50:30 UTC
(In reply to Li Xiaohui from comment #9)
> So we must give 666 permissions to /dev/userfaultfd for postcopy migration?

Not really. Here I just wanted to make sure we have permission to access the new devfile so we can test it.

> If not, will postcopy fail to start?

Yes.

> If so, why don't we keep 666 as the default for /dev/userfaultfd?

The permission here isn't important to me. It should be managed by system admins in the future no matter what the default values are (not only permissions, but also owner, group, etc.). E.g., in production QEMU can be put into a group that always has permission to access /dev/userfaultfd; the permission can then be 0660, which stops arbitrary processes from using kernel traps freely while still letting QEMU through.

So IMHO we don't need to worry about the default values here (which I think should follow the system-wide defaults for any devfile node), only about whether it works for us as long as the permission is valid.

Thanks.
Peter
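
The group-based setup described above could look roughly like the following sketch. The helper name `restrict_uffd_to_group` is hypothetical, and the group to use (for example "qemu") is an assumption that depends on how QEMU is run on the host. On the real device node this needs root, and a change made this way under /dev is not expected to survive a reboot; a udev rule would be the usual way to make it persistent.

```shell
#!/bin/sh
# Hypothetical helper: hand a device node to one group and set mode 0660,
# so members of that group can open it and other users cannot.
#   $1  group name or gid (e.g. "qemu"; an assumption, adjust per host)
#   $2  device node, default /dev/userfaultfd
restrict_uffd_to_group() {
    group=$1
    dev=${2:-/dev/userfaultfd}

    chgrp "$group" "$dev" || return 1
    chmod 0660 "$dev" || return 1

    # Print the resulting mode, owner, and group for a quick visual check.
    stat -c '%a %U %G' "$dev"
}
```

For example, `restrict_uffd_to_group qemu` run as root would apply this to /dev/userfaultfd, after which QEMU processes in that group should be able to open the node while other unprivileged users are refused.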

Comment 12 Yanan Fu 2023-02-20 12:45:56 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 15 Li Xiaohui 2023-02-23 03:40:46 UTC
1 ) kernel-5.14.0-270.el9.x86_64 && qemu-kvm-7.2.0-8.el9.x86_64, qemu user
[qemu@dell-per7525-26 /]$ cat /proc/sys/vm/unprivileged_userfaultfd 
0
[qemu@dell-per7525-26 /]$ /usr/libexec/qemu-kvm -cpu EPYC-Milan -monitor stdio -machine q35 -incoming defer 
(qemu) migrate_set_capability postcopy-ram on
postcopy_ram_supported_by_host: userfaultfd not available: Operation not permitted
Error: Postcopy is not supported


2 ) kernel-5.14.0-270.el9.x86_64 && qemu-kvm-7.2.0-10.el9.x86_64, qemu user
[root@dell-per7525-26 qemu-kvm-latest]# cat /proc/sys/vm/unprivileged_userfaultfd
0
[root@dell-per7525-26 qemu-kvm-latest]# ls -lt /dev/userfaultfd 
crw-rw-rw-. 1 root root 10, 126 Feb 22 08:56 /dev/userfaultfd
[qemu@dell-per7525-26 /]$ /usr/libexec/qemu-kvm -cpu EPYC-Milan -monitor stdio -machine q35 -incoming defer
(qemu) migrate_set_capability postcopy-ram on
(qemu) info migrate_capabilities 
...
postcopy-ram: on
...

3 ) kernel-5.14.0-270.el9.x86_64 && qemu-kvm-7.2.0-10.el9.x86_64, root user. Ran all postcopy cases and the tier 1 test loop; all pass.
[root@dell-per7525-26 ~]# ls -lt /dev/userfaultfd 
crw-------. 1 root root 10, 126 Feb 22 08:56 /dev/userfaultfd
[root@dell-per7525-25 ipa]# python3 Start2Run.py --test_requirement=VIRT_49060_x86_q35_blockdev --src_host_ip=10.73.2.80 --dst_host_ip=10.73.2.82 --share_images_dir=/mnt/xiaohli --sys_image_name=rhel920-64-virtio-scsi.qcow2 --guest_os_type=linux --firmware=ovmf --cpu_model=EPYC-Milan,x2apic=on,tsc-deadline=on,hypervisor=on,tsc-adjust=on,vaes=on,vpclmulqdq=on,spec-ctrl=on,stibp=on,arch-capabilities=on,ssbd=on,cmp-legacy=on,virt-ssbd=on,rdctl-no=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,erms=off,fsrm=off
========================= Test Requirement: VIRT-49060-X86-Q35-BLOCKDEV(Migration - x86) =========================
--> Running case(1/11): BASE-TEST-POSTCOPY-Migration basic precopy test without setting downtime and speed (4 min 36 sec)--- PASS.
--> Running case(2/11): VIRT-49062-[postcopy] Migration finishes only with postcopy under high stress (rhel only) (14 min 33 sec)--- PASS.
--> Running case(3/11): VIRT-58670-[postcopy] Cancel migration during the precopy phase (1 min 16 sec)--- PASS.
--> Running case(4/11): VIRT-58672-[postcopy] Source should recovers when fail the destination during the precopy phase (1 min 16 sec)--- PASS.
--> Running case(5/11): VIRT-85702-[postcopy] Post-copy migration with XBZRLE compression (2 min 56 sec)--- PASS.
--> Running case(6/11): VIRT-86251-[postcopy] live migration post-copy support file-backed memory (3 min 24 sec)--- PASS.
--> Running case(7/11): VIRT-93722-[postcopy]Postcopy migration with Numa pinned and Hugepage pinned guest--file backend (3 min 40 sec)--- PASS.
--> Running case(8/11): VIRT-294886-[migration] Postcopy migration recover after migrate-pause (2 min 36 sec)--- PASS.
--> Running case(9/11): RHEL-150076-[postcopy] Set postcopy migration speed(max-postcopy-bandwidth) (4 min 40 sec)--- PASS.
--> Running case(10/11): RHEL-186017-[postcopy] Basic postcopy migration (3 min 12 sec)--- PASS.
--> Running case(11/11): RHEL-189930-[postcopy] Post-copy migration with enabling auto-converge (3 min 32 sec)--- PASS.

[root@dell-per7525-25 ipa]# python3 Start2Run.py --test_requirement=tier1_q35_blockdev --src_host_ip=10.73.2.80 --dst_host_ip=10.73.2.82 --share_images_dir=/mnt/xiaohli --sys_image_name=rhel920-64-virtio-scsi.qcow2 --guest_os_type=linux --firmware=ovmf --cpu_model=EPYC-Milan,x2apic=on,tsc-deadline=on,hypervisor=on,tsc-adjust=on,vaes=on,vpclmulqdq=on,spec-ctrl=on,stibp=on,arch-capabilities=on,ssbd=on,cmp-legacy=on,virt-ssbd=on,rdctl-no=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,erms=off,fsrm=off
========================= Test Requirement: TIER1-Q35-BLOCKDEV(Migration - x86) =========================
--> Running case(1/10): RHEL-178709-[migration] Basic migration test (4 min 44 sec)--- PASS.
--> Running case(2/10): VIRT-10022-[migration] Migrate guest via a compressed file (4 min 24 sec)--- PASS.
--> Running case(3/10): VIRT-10061-[migration] Cancel a migration process with "migration_cancel" command (7 min 16 sec)--- PASS.
--> Running case(4/10): VIRT-10067-[migration] Set migration downtime (3 min 4 sec)--- PASS.
--> Running case(5/10): RHEL-186017-[postcopy] Basic postcopy migration (2 min 40 sec)--- PASS.
--> Running case(6/10): VIRT-10081-[migration][page delta compression] Check live migration statistics for xbzrle specific options (3 min 40 sec)--- PASS.
--> Running case(7/10): VIRT-48421-[auto converge] Live migration with auto converge- dynamic cpu throttling (3 min 4 sec)--- PASS.
--> Running case(8/10): VIRT-85868-[TLS]TLS encryption migration via ipv4 addr(3 min 0 sec)--- PASS.
--> Running case(9/10): VIRT-109869-[Multiple-fds] Live migration with multifd on (10 min 44 sec)--- PASS.
--> Running case(10/10): VIRT-296185-[zero copy] Zero copy migration (1 min 52 sec)--- PASS.
**********************************************************************************************

Comment 16 Li Xiaohui 2023-02-23 03:57:37 UTC
Per Comment 15 above, marking this bug verified.

BTW, I don't think we need to add extra test cases for this change. Continuing to test the postcopy feature is enough. Peter, what do you think?

Comment 17 Peter Xu 2023-02-23 15:19:46 UTC
(In reply to Li Xiaohui from comment #16)
> BTW, I think we don't need to add extra cases for this bug's change. Keeping
> test postcopy feature is enough. Peter, what do you think?

Agreed.

Comment 21 errata-xmlrpc 2023-05-09 07:23:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: qemu-kvm security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:2162

