Bug 2158704
| Summary: | RFE: Prefer /dev/userfaultfd over userfaultfd(2) syscall | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Michal Privoznik <mprivozn> | |
| Component: | qemu-kvm | Assignee: | Peter Xu <peterx> | |
| qemu-kvm sub component: | Live Migration | QA Contact: | Li Xiaohui <xiaohli> | |
| Status: | CLOSED ERRATA | Docs Contact: | ||
| Severity: | medium | |||
| Priority: | medium | CC: | alexander.lougovski, chayang, coli, jinzhao, juzhang, leobras, nilal, peterx, quintela, virt-maint | |
| Version: | 9.2 | Keywords: | FutureFeature, Triaged | |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ | |
| Target Release: | --- | |||
| Hardware: | Unspecified | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | qemu-kvm-7.2.0-9.el9 | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 2158705 2158706 (view as bug list) | Environment: | ||
| Last Closed: | 2023-05-09 07:23:43 UTC | Type: | Feature Request | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | 2158706 | |||
| Bug Blocks: | 2158705 | |||
|
Description
Michal Privoznik
2023-01-06 08:10:20 UTC
I know Nitesh is taking over Live Migration shortly, but we need to consider this sooner rather than later, especially if OpenShift goes through with the plan to alter the default seccomp profile. There is a "work-around" of sorts in the plan for kubevirt (https://github.com/kubevirt/kubevirt/pull/8917).

Hi Peter,

About the verification of this bug, I think running postcopy test is ok, what do you think?

Xiaohui,

Thanks for raising this question. Yes that should be enough.

To make sure you're using the new /dev/userfaultfd descriptor, you can do this to disable the userfaultfd syscall first for qemu:

[NOTE: this will not disable the whole userfaultfd syscall, but only the unprivileged kernel userfaultfd, which will stop QEMU from using it already because qemu will need that privileged uffd to handle kernel faults]

# echo 0 > /proc/sys/vm/unprivileged_userfaultfd

With above, we should already fail to boot the dest QEMU with postcopy enabled, like this:

[note: here we don't need root privilege or it won't fail]

$ ./qemu-system-x86_64 -incoming defer -global migration.x-postcopy-ram=on
qemu-system-x86_64: postcopy_ram_supported_by_host: userfaultfd not available: Operation not permitted
qemu-system-x86_64: Postcopy is not supported

Or if you enable postcopy via QMP I think that should just fail the QMP command to enable postcopy.

Then, with the new kernel and have /dev/userfaultfd being there with the right permissions:

# chmod 0666 /dev/userfaultfd

One should be able to start dest QEMU successfully, like:

[note: here we don't need root privilege too to compare with above]

$ ./qemu-system-x86_64 -incoming defer -global migration.x-postcopy-ram=on
qemu-system-x86_64: postcopy_ram_supported_by_host: userfaultfd not available: Operation not permitted
qemu-system-x86_64: Postcopy is not supported

With that, a simplest round of postcopy would suffice.

Thanks.
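The ordering this RFE asks for can be sketched as follows: try the device node first, and fall back to userfaultfd(2) only when the node cannot be opened. This is a minimal illustration, not QEMU's actual code. The function names, the `fallback` parameter and the returned source label are hypothetical, and the ioctl and syscall numbers are assumed to be the x86_64 values from the kernel headers.

```python
import ctypes
import fcntl
import os

DEV_PATH = "/dev/userfaultfd"
# USERFAULTFD_IOC_NEW is _IO(0xAA, 0x00) in <linux/userfaultfd.h>.
# Assumption: x86_64 ioctl encoding, where _IO() carries no direction bits.
USERFAULTFD_IOC_NEW = 0xAA00
# Assumption: x86_64 syscall number for userfaultfd(2); other arches differ.
SYS_USERFAULTFD = 323


def uffd_syscall(flags):
    """Create a userfaultfd with the raw syscall; raise OSError on failure."""
    libc = ctypes.CDLL(None, use_errno=True)
    fd = libc.syscall(SYS_USERFAULTFD, flags)
    if fd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return fd


def uffd_open(flags, dev_path=DEV_PATH, fallback=uffd_syscall):
    """Return (fd, source), preferring the device node over the syscall.

    `fallback` is a hypothetical hook that makes the ordering easy to
    exercise without a kernel that provides either interface.
    """
    try:
        dev_fd = os.open(dev_path, os.O_RDWR | os.O_CLOEXEC)
    except OSError:
        # Node missing or not accessible: use the syscall path instead.
        return fallback(flags), "syscall"
    try:
        # The ioctl returns a new userfaultfd; an error here is raised
        # as OSError and is deliberately not masked by the fallback.
        return fcntl.ioctl(dev_fd, USERFAULTFD_IOC_NEW, flags), "devfile"
    finally:
        os.close(dev_fd)
```

A caller would typically pass `os.O_CLOEXEC | os.O_NONBLOCK` as `flags`. With `unprivileged_userfaultfd` set to 0 and no access to the device node, an unprivileged caller of this sketch would be expected to get the same "Operation not permitted" error shown in the steps above.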
(In reply to Peter Xu from comment #7)
> One should be able to start dest QEMU successfully, like:
>
> [note: here we don't need root privilege too to compare with above]
> $ ./qemu-system-x86_64 -incoming defer -global migration.x-postcopy-ram=on
> qemu-system-x86_64: postcopy_ram_supported_by_host: userfaultfd not available: Operation not permitted
> qemu-system-x86_64: Postcopy is not supported

Sorry, it's a copy-paste error. It should just succeed and continue here.

(In reply to Peter Xu from comment #7)
> Xiaohui,
>
> Thanks for raising this question. Yes that should be enough.
>
> To make sure you're using the new /dev/userfaultfd descriptor, you can do
> this to disable the userfaultfd syscall first for qemu:
>
> [NOTE: this will not disable the whole userfaultfd syscall, but only the
> unprivileged kernel userfaultfd, which will stop QEMU from using it already
> because qemu will need that privileged uffd to handle kernel faults]
> # echo 0 > /proc/sys/vm/unprivileged_userfaultfd
>
> With above, we should already fail to boot the dest QEMU with postcopy
> enabled, like this:
>
> [note: here we don't need root privilege or it won't fail]
> $ ./qemu-system-x86_64 -incoming defer -global migration.x-postcopy-ram=on
> qemu-system-x86_64: postcopy_ram_supported_by_host: userfaultfd not available: Operation not permitted
> qemu-system-x86_64: Postcopy is not supported
>
> Or if you enable postcopy via QMP I think that should just fail the QMP
> command to enable postcopy.
>
> Then, with the new kernel and have /dev/userfaultfd being there with the
> right permissions:

Do we still need to disable the userfaultfd syscall here?

>
> # chmod 0666 /dev/userfaultfd

I have verified the relevant kernel bug 2158706 on kernel-5.14.0-270.el9.x86_64. In that bug, I can see the default permissions aren't 0666:
https://bugzilla.redhat.com/show_bug.cgi?id=2158706#c16
[root@dell-per7525-25 bz2158706]# ls -lt /dev/userfaultfd
crw-------. 1 root root 10, 126 Feb 15 08:01 /dev/userfaultfd

So we must give 666 permissions to /dev/userfaultfd for postcopy migration? If not, will it fail to start postcopy?
If so, why don't we keep 666 as the default for /dev/userfaultfd?

>
> One should be able to start dest QEMU successfully, like:
>
> [note: here we don't need root privilege too to compare with above]
> $ ./qemu-system-x86_64 -incoming defer -global migration.x-postcopy-ram=on
>
> It should just succeed and continue here
>
> With that, a simplest round of postcopy would suffice.
>
> Thanks.

Thank you for helping provide the test steps.

(In reply to Li Xiaohui from comment #9)
> So we must give 666 permissions to /dev/userfaultfd for postcopy migration?

Not really. Here I just wanted to make sure we have permission to access the new devfile so we can test it.

> If not, will fail to start postcopy?

Yes.

> If so, why don't we keep 666 as the default for /dev/userfaultfd?

The permission here isn't important to me - that should be managed by system admins in the future no matter what the default values are (not only permissions, but owner, group, etc.). E.g., in production QEMU can be put into a group that always has permission to access /dev/userfaultfd, then the permission can be 0660, disallowing any process from using kernel traps freely while still letting QEMU pass.

So IMHO here we don't need to worry about the default values (which I think should follow the whole system for any default devfile node), but whether it'll work for us as long as the permission is validated.

Thanks.
Peter

QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.
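Peter's point that access should be governed by owner, group and mode rather than by a blanket 0666 can be illustrated with the classic Unix permission check. This is only a sketch under stated assumptions: the function name and the uid/gid values used in the examples are hypothetical, and the check ignores ACLs, capabilities other than root's override, and SELinux (the trailing "." in the `ls` output above indicates an SELinux context is present).

```python
import stat


def may_open_rw(mode, owner_uid, owner_gid, uid, gids):
    """Classic owner/group/other check for read+write access to a node.

    mode       -- permission bits of the node, e.g. 0o600 or 0o660
    owner_uid  -- uid that owns the node
    owner_gid  -- gid that owns the node
    uid, gids  -- identity and group memberships of the calling process
    """
    if uid == 0:
        # Root normally bypasses mode bits (assumption: no LSM denial).
        return True
    if uid == owner_uid:
        need = stat.S_IRUSR | stat.S_IWUSR
    elif owner_gid in gids:
        need = stat.S_IRGRP | stat.S_IWGRP
    else:
        need = stat.S_IROTH | stat.S_IWOTH
    return (mode & need) == need
```

For example, with a hypothetical qemu uid of 107, `may_open_rw(0o600, 0, 0, 107, {107})` is false (the default `crw-------` case), `may_open_rw(0o666, 0, 0, 107, {107})` is true (the `chmod 0666` test setup), and a 0660 node owned by a dedicated group admits only members of that group.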
1 ) kernel-5.14.0-270.el9.x86_64 && qemu-kvm-7.2.0-8.el9.x86_64, qemu user
[qemu@dell-per7525-26 /]$ cat /proc/sys/vm/unprivileged_userfaultfd
0
[qemu@dell-per7525-26 /]$ /usr/libexec/qemu-kvm -cpu EPYC-Milan -monitor stdio -machine q35 -incoming defer
(qemu) migrate_set_capability postcopy-ram on
postcopy_ram_supported_by_host: userfaultfd not available: Operation not permitted
Error: Postcopy is not supported

2 ) kernel-5.14.0-270.el9.x86_64 && qemu-kvm-7.2.0-10.el9.x86_64, qemu user
[root@dell-per7525-26 qemu-kvm-latest]# cat /proc/sys/vm/unprivileged_userfaultfd
0
[root@dell-per7525-26 qemu-kvm-latest]# ls -lt /dev/userfaultfd
crw-rw-rw-. 1 root root 10, 126 Feb 22 08:56 /dev/userfaultfd
[qemu@dell-per7525-26 /]$ /usr/libexec/qemu-kvm -cpu EPYC-Milan -monitor stdio -machine q35 -incoming defer
(qemu) migrate_set_capability postcopy-ram on
(qemu) info migrate_capabilities
...
postcopy-ram: on
...

3 ) kernel-5.14.0-270.el9.x86_64 && qemu-kvm-7.2.0-10.el9.x86_64, root user. Run postcopy all cases and tier 1 test loop, all pass.
[root@dell-per7525-26 ~]# ls -lt /dev/userfaultfd
crw-------. 1 root root 10, 126 Feb 22 08:56 /dev/userfaultfd
[root@dell-per7525-25 ipa]# python3 Start2Run.py --test_requirement=VIRT_49060_x86_q35_blockdev --src_host_ip=10.73.2.80 --dst_host_ip=10.73.2.82 --share_images_dir=/mnt/xiaohli --sys_image_name=rhel920-64-virtio-scsi.qcow2 --guest_os_type=linux --firmware=ovmf --cpu_model=EPYC-Milan,x2apic=on,tsc-deadline=on,hypervisor=on,tsc-adjust=on,vaes=on,vpclmulqdq=on,spec-ctrl=on,stibp=on,arch-capabilities=on,ssbd=on,cmp-legacy=on,virt-ssbd=on,rdctl-no=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,erms=off,fsrm=off
=========================
Test Requirement: VIRT-49060-X86-Q35-BLOCKDEV(Migration - x86)
=========================
--> Running case(1/11): BASE-TEST-POSTCOPY-Migration basic precopy test without setting downtime and speed (4 min 36 sec)--- PASS.
--> Running case(2/11): VIRT-49062-[postcopy] Migration finishes only with postcopy under high stress (rhel only) (14 min 33 sec)--- PASS.
--> Running case(3/11): VIRT-58670-[postcopy] Cancel migration during the precopy phase (1 min 16 sec)--- PASS.
--> Running case(4/11): VIRT-58672-[postcopy] Source should recovers when fail the destination during the precopy phase (1 min 16 sec)--- PASS.
--> Running case(5/11): VIRT-85702-[postcopy] Post-copy migration with XBZRLE compression (2 min 56 sec)--- PASS.
--> Running case(6/11): VIRT-86251-[postcopy] live migration post-copy support file-backed memory (3 min 24 sec)--- PASS.
--> Running case(7/11): VIRT-93722-[postcopy]Postcopy migration with Numa pinned and Hugepage pinned guest--file backend (3 min 40 sec)--- PASS.
--> Running case(8/11): VIRT-294886-[migration] Postcopy migration recover after migrate-pause (2 min 36 sec)--- PASS.
--> Running case(9/11): RHEL-150076-[postcopy] Set postcopy migration speed(max-postcopy-bandwidth) (4 min 40 sec)--- PASS.
--> Running case(10/11): RHEL-186017-[postcopy] Basic postcopy migration (3 min 12 sec)--- PASS.
--> Running case(11/11): RHEL-189930-[postcopy] Post-copy migration with enabling auto-converge (3 min 32 sec)--- PASS.

[root@dell-per7525-25 ipa]# python3 Start2Run.py --test_requirement=tier1_q35_blockdev --src_host_ip=10.73.2.80 --dst_host_ip=10.73.2.82 --share_images_dir=/mnt/xiaohli --sys_image_name=rhel920-64-virtio-scsi.qcow2 --guest_os_type=linux --firmware=ovmf --cpu_model=EPYC-Milan,x2apic=on,tsc-deadline=on,hypervisor=on,tsc-adjust=on,vaes=on,vpclmulqdq=on,spec-ctrl=on,stibp=on,arch-capabilities=on,ssbd=on,cmp-legacy=on,virt-ssbd=on,rdctl-no=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,erms=off,fsrm=off
=========================
Test Requirement: TIER1-Q35-BLOCKDEV(Migration - x86)
=========================
--> Running case(1/10): RHEL-178709-[migration] Basic migration test (4 min 44 sec)--- PASS.
--> Running case(2/10): VIRT-10022-[migration] Migrate guest via a compressed file (4 min 24 sec)--- PASS.
--> Running case(3/10): VIRT-10061-[migration] Cancel a migration process with "migration_cancel" command (7 min 16 sec)--- PASS.
--> Running case(4/10): VIRT-10067-[migration] Set migration downtime (3 min 4 sec)--- PASS.
--> Running case(5/10): RHEL-186017-[postcopy] Basic postcopy migration (2 min 40 sec)--- PASS.
--> Running case(6/10): VIRT-10081-[migration][page delta compression] Check live migration statistics for xbzrle specific options (3 min 40 sec)--- PASS.
--> Running case(7/10): VIRT-48421-[auto converge] Live migration with auto converge- dynamic cpu throttling (3 min 4 sec)--- PASS.
--> Running case(8/10): VIRT-85868-[TLS]TLS encryption migration via ipv4 addr(3 min 0 sec)--- PASS.
--> Running case(9/10): VIRT-109869-[Multiple-fds] Live migration with multifd on (10 min 44 sec)--- PASS.
--> Running case(10/10): VIRT-296185-[zero copy] Zero copy migration (1 min 52 sec)--- PASS.
**********************************************************************************************

Per above Comment 15, mark this bug verified.

BTW, I think we don't need to add extra cases for this bug's change. Keeping test postcopy feature is enough. Peter, what do you think?

(In reply to Li Xiaohui from comment #16)
> BTW, I think we don't need to add extra cases for this bug's change. Keeping
> test postcopy feature is enough. Peter, what do you think?

Agreed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: qemu-kvm security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:2162
|