Bug 1945420
| Summary: | [RHEL9] Setup vm.unprivileged_userfaultfd for postcopy | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Peter Xu <peterx> |
| Component: | libvirt | Assignee: | Jiri Denemark <jdenemar> |
| libvirt sub component: | General | QA Contact: | Fangge Jin <fjin> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | unspecified | | |
| Priority: | unspecified | CC: | aarcange, ailan, chayang, dgilbert, fjin, jdenemar, jsuchane, lcheng, omosnace, peterx, pezhang, smitterl, tstaudt, virt-maint, virt-qe-z, xuzhang, zpytela |
| Version: | 9.0 | Keywords: | Regression |
| Target Milestone: | beta | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | libvirt-8.0.0-0rc1.1.el9 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-05-17 12:45:04 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | 8.0.0 |
| Embargoed: | | | |
Description
Peter Xu
2021-03-31 21:53:58 UTC
I could reproduce this bz with libvirt, but postcopy migration works well on the qemu side on the latest rhel9:
kernel-5.11.0-2.el9.x86_64 & qemu-kvm-5.2.0-11.el9.x86_64 & libvirt-7.0.0-4.el9.x86_64

I have checked vm.unprivileged_userfaultfd and /etc/sysctl.d on both the src and dst hosts; they are the same, as below:

[root@hp-dl385g10-13 sysctl.d]# cat 99-sysctl.conf
# sysctl settings are defined through files in
# /usr/lib/sysctl.d/, /run/sysctl.d/, and /etc/sysctl.d/.
#
# Vendors settings live in /usr/lib/sysctl.d/.
# To override a whole file, create a new file with the same in
# /etc/sysctl.d/ and put new settings there. To override
# only specific settings, add a file with a lexically later
# name in /etc/sysctl.d/ and put new settings there.
#
# For more information, see sysctl.conf(5) and sysctl.d(5).
[root@hp-dl385g10-13 sysctl.d]# cat /proc/sys/vm/unprivileged_userfaultfd
0

Test steps:
1. Postcopy migration via libvirt:
[root@hp-dl385g10-13 home]# virsh migrate rhel8.4 qemu+ssh://10.73.130.69/system --live --verbose --p2p --postcopy --postcopy-after-precopy
error: internal error: unable to execute QEMU command 'migrate-set-capabilities': Postcopy is not supported

2. Postcopy migration via qemu:
Notes: here I only show the qmp info from the src host, but we can see that postcopy migration succeeds:
{"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"xbzrle","state":false},{"capability":"auto-converge","state":false},{"capability":"rdma-pin-all","state":false},{"capability":"postcopy-ram","state":true},{"capability":"compress","state":false},{"capability":"pause-before-switchover","state":true},{"capability":"late-block-activate","state":false},{"capability":"multifd","state":false},{"capability":"dirty-bitmaps","state":false}]},"id":"libvirt-386"}
{"return": {}, "id": "libvirt-386"}
{"execute": "migrate","arguments":{"uri": "tcp:$dst_host_ip:1234"}}
{"return": {}}
{"execute":"migrate-start-postcopy","id":"libvirt-453"}
{"return": {}, "id": "libvirt-453"}
{"timestamp": {"seconds": 1617717149, "microseconds": 815910}, "event": "STOP"}
{"execute":"migrate-continue","arguments":{"state":"pre-switchover"},"id":"libvirt-455"}
{"return": {}, "id": "libvirt-455"}
{"execute":"query-migrate"}
{"return": {"status": "completed", "setup-time": 3, "downtime": 52068, "total-time": 129868, "ram": {"total": 4312604672, "postcopy-requests": 439, "dirty-sync-count": 25, "multifd-bytes": 0, "pages-per-second": 33584, "page-size": 4096, "remaining": 0, "mbps": 562.830863, "transferred": 9136168867, "duplicate": 651263, "dirty-pages-rate": 0, "skipped": 0, "normal-bytes": 9112498176, "normal": 2224731}}}

Peter, shall we update the Component from qemu to libvirt per above contents?

(In reply to Li Xiaohui from comment #1)
> Peter, shall we update the Component from qemu to libvirt per above contents?

Hi, Xiaohui, did you use root when testing with QEMU? Root will not be affected because root is privileged.

Indeed, in the end we might need to fix this in libvirt, but let's temporarily keep the component untouched until we settle how to fix it, so as to avoid bouncing.

It looks like there was selinux support added in the kernel:
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/MPP3RFFCSSN34TGM2YAD55PX5DKOW6RL/
so perhaps the best answer here is to wire up selinux instead; and make sure RHEL9's kernel has those selinux patches.

Adding Ondrej in.

oh, and this would also affect dpdk when doing a postcopy migration with qemu, so the dpdk process would also need the perms if done with selinux.

(In reply to Peter Xu from comment #2)
> (In reply to Li Xiaohui from comment #1)
> > Peter, shall we update the Component from qemu to libvirt per above contents?
>
> Hi, Xiaohui, did you use root when testing with QEMU? Root will not be
> affected because root is privileged.

You're right. I used the root user to do postcopy migration from the QEMU side, so it succeeded.
I checked again by booting the vm via libvirt and found that it runs as the qemu user, and the same user is used for postcopy migration.

>
> Indeed at last we might need to fix this in libvirt, but let's temporarily
> keep the component untouched until we settle how to fix it, so as to avoid
> bouncing.

Got it. thanks

Hm... I don't think SELinux would be a solution to this. Note that under the (default) targeted policy, both privileged and unprivileged user sessions run as unconfined_t, so SELinux wouldn't mitigate the ability of unprivileged users to use userfaultfd with kernel memory... In the default configuration it mainly confines system services (which are often run as root), so it provides security at a different layer than DAC. The sysadmin would have to deliberately enable confined users across the system for SELinux to have an effect on user sessions. But this is tricky to get working well, so only very few users/customers do this.

TL;DR: Adding the SELinux support wouldn't fully mitigate setting vm.unprivileged_userfaultfd to 1. It would be nice to have (and we're working on it ;), as it would prevent most system daemons from using the userfaultfd(2) syscall for evil if they get compromised somehow, but kind of orthogonal to this issue.

(In reply to Ondrej Mosnacek from comment #6)
> Hm... I don't think SELinux would be a solution to this. Note that under the
> (default) targeted policy, both privileged and unprivileged user sessions
> run as unconfined_t, so SELinux wouldn't mitigate the ability of
> unprivileged users to use userfaultfd with kernel memory... In the default
> configuration it mainly confines system services (which are often run as
> root), so it provides security at a different layer than DAC. The sysadmin
> would have to deliberately enable confined users across the system for
> SELinux to have an effect on user sessions. But this is tricky to get working
> well, so only very few users/customers do this.
>
> TL;DR: Adding the SELinux support wouldn't fully mitigate setting
> vm.unprivileged_userfaultfd to 1. It would be nice to have (and we're
> working on it ;), as it would prevent most system daemons from using the
> userfaultfd(2) syscall for evil if they get compromised somehow, but kind of
> orthogonal to this issue.

Hmm I'm not sure I completely follow that yet; but you might find it needs fixing in libvirt's svirt support, since it crafts an selinux policy for each VM; so you'd also want to do a live migration with postcopy via libvirt on a domain with svirt enabled.
(Probably see Daniel Berrange for libvirt questions)

It'd be nice to have a more fine-grained way to give unprivileged access to qemu only (be it SELinux or some new logic), but for the time being can we change libvirt to unconditionally issue a "echo 1 >/proc/sys/vm/unprivileged_userfaultfd" just before issuing the qemu postcopy monitor command?

We only support postcopy through libvirt, so the above will fix the practical issues with a simple change, and it'll be re-entrant, so it will still work even if there are multiple libvirt daemons on the same host (i.e. CNV).

It's preferable to only set it to 1 if postcopy is actively used on the system; it doesn't need to be set to 1 before that.

Thanks,
Andrea

(In reply to Dr. David Alan Gilbert from comment #7)
> Hmm I'm not sure I completely follow that yet; but you might find it needs
> fixing in libvirt's svirt support, since it crafts
> an selinux policy for each VM; so you'd also want to do a live migration
> with postcopy via libvirt
> on a domain with svirt enabled.
> (Probably see Daniel Berrange for libvirt questions)

I feel like the app needs to pass both the unprivileged_userfaultfd check and also the selinux one (if there's a rule for it) to be allowed to get a userfaultfd. However, if no uffd-specific selinux policy is specified in svirt, will it pass the check by default?
(In reply to Andrea Arcangeli from comment #8)
> It'd be nice to have a more fine-grained way to give unprivileged access to
> qemu only (be it SELinux or some new logic), but for the time being can we
> change libvirt to unconditionally issue a "echo 1
> >/proc/sys/vm/unprivileged_userfaultfd" just before issuing the qemu
> postcopy monitor command?

I tend to agree. If someday we prefer finer-grained permission control we can either start to use selinux or extend yet another interface to /dev/userfaultfd so that it works similarly to /dev/kvm, but so far the "echo 1" approach looks like the simplest working solution.

Assigned to Amnon for next level triage to ensure the issue/concern isn't lost.

Reproduce bz on qemu side via some changes (steps 1-3) on both src & dst hosts:
1. # usermod -s /bin/bash qemu
2. Set the qemu user to be the owner of the shared vm disk
# chown -R qemu /mnt/nfs
3. Set tap
(1) Install tunctl rpm:
# yum install http://fr2.rpmfind.net/linux/opensuse/tumbleweed/repo/oss/x86_64/tunctl-1.5-26.20.x86_64.rpm
(2) # sudo tunctl -t tap0 -u qemu
4. Boot a vm under the qemu user with cmd on src host:
-device virtio-net-pci,netdev=hostnet0,id=net0,vectors=4,mac=52:54:00:6c:83:be,bus=pci.1,addr=0x0 \
-netdev tap,ifname=tap0,id=hostnet0,vhost=on,script=no,downscript=no \
5. Boot a vm under the qemu user with the same cmds from src but with '-incoming defer' appended, on dst host
6. Then migrate with postcopy enabled:

Test result:
1. Setting 'postcopy-ram on' succeeds on src host:
(qemu) migrate_set_capability postcopy-ram on
(qemu) info migrate_capabilities
xbzrle: off
rdma-pin-all: off
auto-converge: off
zero-blocks: off
compress: off
events: off
postcopy-ram: on
2. Setting 'postcopy-ram on' fails on dst host, because userfaultfd is needed only on dst when postcopy is enabled:
(qemu) migrate_set_capability postcopy-ram on
postcopy_ram_supported_by_host: userfaultfd not available: Operation not permitted
Error: Postcopy is not supported

Notes: vm.unprivileged_userfaultfd is 0 both on src&dst host under
qemu user
[qemu@xxx home]$ # cat /proc/sys/vm/unprivileged_userfaultfd
0

Jiri, please have a look. Thanks.

Pushed upstream as

commit d804408ef9044aeb0d73b2e83fc044c5fff3c86d
Refs: v7.10.0-206-gd804408ef9
Author:     Jiri Denemark <jdenemar>
AuthorDate: Thu Dec 2 15:43:27 2021 +0100
Commit:     Jiri Denemark <jdenemar>
CommitDate: Fri Dec 10 17:53:11 2021 +0100

    qemu: Enable unprivileged userfaultfd for post-copy migration

    Userfaultfd is by default allowed only for privileged processes. Since
    libvirt runs QEMU unprivileged, we need to enable unprivileged access to
    userfaultfd to enable post-copy migration.

    https://bugzilla.redhat.com/show_bug.cgi?id=1945420

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Daniel P. Berrangé <berrange>

Verified with libvirt-8.0.0-1.el9.x86_64

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (new packages: libvirt), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2390
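
For anyone re-checking a host by hand, the manual "cat /proc/sys/vm/unprivileged_userfaultfd" step used throughout this thread can be sketched as a small script. This is only an illustration, not part of the libvirt fix: the function name unprivileged_userfaultfd_enabled is hypothetical, and the reading of the values (1 means unprivileged processes such as QEMU under the qemu user may use userfaultfd, 0 means they may not) is taken from the behaviour reported in the comments above.

```python
from pathlib import Path

# Path read in the comments above; the function name below is hypothetical.
SYSCTL_PATH = Path("/proc/sys/vm/unprivileged_userfaultfd")


def unprivileged_userfaultfd_enabled(path=SYSCTL_PATH):
    """Return True if the file holds "1", False if it holds "0",
    and None if it is missing, unreadable, or holds anything else."""
    try:
        value = Path(path).read_text().strip()
    except OSError:
        return None
    if value == "1":
        return True
    if value == "0":
        return False
    return None


if __name__ == "__main__":
    state = unprivileged_userfaultfd_enabled()
    if state is True:
        print("vm.unprivileged_userfaultfd = 1: unprivileged userfaultfd allowed")
    elif state is False:
        print("vm.unprivileged_userfaultfd = 0: expect 'Postcopy is not supported' for an unprivileged QEMU")
    else:
        print("vm.unprivileged_userfaultfd could not be read on this host")
```

Run it on both the src and dst hosts; per the reproduction above, it is the dst host where a value of 0 makes 'postcopy-ram on' fail.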