Bug 1945420 - [RHEL9] Setup vm.unprivileged_userfaultfd for postcopy
Summary: [RHEL9] Setup vm.unprivileged_userfaultfd for postcopy
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: libvirt
Version: 9.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: beta
Target Release: ---
Assignee: Jiri Denemark
QA Contact: Fangge Jin
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-03-31 21:53 UTC by Peter Xu
Modified: 2022-05-17 13:02 UTC
CC List: 17 users

Fixed In Version: libvirt-8.0.0-0rc1.1.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-17 12:45:04 UTC
Type: Bug
Target Upstream Version: 8.0.0
Embargoed:



Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2022:2390  0        None      None    None     2022-05-17 12:45:18 UTC

Description Peter Xu 2021-03-31 21:53:58 UTC
RHEL9 has introduced d0d4730ac2e4 ("userfaultfd: add user-mode only option to unprivileged_userfaultfd sysctl knob", 2020-12-15), so postcopy no longer works by default: postcopy requires trapping page faults from kernel code too, e.g., when the faults come from KVM, vhost, etc.
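To make the effect of d0d4730ac2e4 concrete, here is a minimal Python model of the permission check as I understand it from the kernel's userfaultfd(2) entry point: a caller without CAP_SYS_PTRACE gets EPERM unless vm.unprivileged_userfaultfd is 1 or it passes UFFD_USER_MODE_ONLY. Postcopy cannot use that flag, because it must also handle faults raised from kernel mode (KVM, vhost). The function name and the boolean capability argument are illustrative only; this is not kernel or QEMU code, and it ignores other failure modes.

```python
import errno

# Value of UFFD_USER_MODE_ONLY in <linux/userfaultfd.h>.
UFFD_USER_MODE_ONLY = 1


def uffd_check(unprivileged_userfaultfd: int, flags: int, has_cap_sys_ptrace: bool) -> int:
    """Return 0 if userfaultfd(flags) would be allowed, else -EPERM.

    Simplified model of the check added by d0d4730ac2e4; it ignores
    invalid flags, SELinux, seccomp and other failure modes.
    """
    if (unprivileged_userfaultfd == 0
            and not (flags & UFFD_USER_MODE_ONLY)
            and not has_cap_sys_ptrace):
        return -errno.EPERM
    return 0


# Postcopy needs kernel-mode faults, so it cannot pass UFFD_USER_MODE_ONLY.
postcopy_flags = 0
print(uffd_check(0, postcopy_flags, has_cap_sys_ptrace=False))  # -1, i.e. -EPERM (qemu user)
print(uffd_check(0, postcopy_flags, has_cap_sys_ptrace=True))   # 0 (root)
print(uffd_check(1, postcopy_flags, has_cap_sys_ptrace=False))  # 0 (knob set to 1)
```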

We need to set vm.unprivileged_userfaultfd to 1 somehow, either only during the postcopy phase or globally by adding a new config file under /etc/sysctl.d, so that postcopy works as before.
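The two options map onto the same knob: the runtime path under /proc/sys (used by the `cat` and `echo` commands in this bug) and a persistent sysctl.d drop-in. A small sketch of both, assuming a drop-in file name of the admin's choosing (the name used below is hypothetical):

```python
from pathlib import Path

KNOB = "vm.unprivileged_userfaultfd"


def knob_to_proc_path(name: str) -> Path:
    """Map a sysctl name such as vm.unprivileged_userfaultfd to its /proc/sys path."""
    return Path("/proc/sys") / name.replace(".", "/")


def render_dropin(value: int = 1) -> str:
    """Text for a sysctl.d drop-in that sets the knob persistently."""
    return f"# Allow unprivileged userfaultfd for postcopy migration\n{KNOB} = {value}\n"


print(knob_to_proc_path(KNOB))  # /proc/sys/vm/unprivileged_userfaultfd
print(render_dropin(), end="")
```

Writing that text to, say, /etc/sysctl.d/90-postcopy.conf (hypothetical name) and running `sysctl --system` would apply it globally, which is the broader of the two options: it relaxes the check for every unprivileged process, not only QEMU.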

I'm assigning this to qemu for now for simplicity, since postcopy is a QEMU feature; however, this is likely to change once we decide how to add the knob.

Comment 1 Li Xiaohui 2021-04-06 14:02:55 UTC
I can reproduce this bz via libvirt, but postcopy migration works well on the QEMU side with the latest RHEL9:
kernel-5.11.0-2.el9.x86_64 & qemu-kvm-5.2.0-11.el9.x86_64 & libvirt-7.0.0-4.el9.x86_64


I have checked vm.unprivileged_userfaultfd and /etc/sysctl.d on both the src and dst hosts; they are the same on both, as below:
[root@hp-dl385g10-13 sysctl.d]# cat 99-sysctl.conf 
# sysctl settings are defined through files in
# /usr/lib/sysctl.d/, /run/sysctl.d/, and /etc/sysctl.d/.
#
# Vendors settings live in /usr/lib/sysctl.d/.
# To override a whole file, create a new file with the same in
# /etc/sysctl.d/ and put new settings there. To override
# only specific settings, add a file with a lexically later
# name in /etc/sysctl.d/ and put new settings there.
#
# For more information, see sysctl.conf(5) and sysctl.d(5).
[root@hp-dl385g10-13 sysctl.d]# cat /proc/sys/vm/unprivileged_userfaultfd 
0


Test steps:
1. Postcopy migration via libvirt:
[root@hp-dl385g10-13 home]# virsh migrate rhel8.4 qemu+ssh://10.73.130.69/system --live --verbose  --p2p --postcopy --postcopy-after-precopy
error: internal error: unable to execute QEMU command 'migrate-set-capabilities': Postcopy is not supported

2. Postcopy migration via QEMU:
Note: only the QMP exchange on the src host is shown here, but it shows that postcopy migration succeeded:
{"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"xbzrle","state":false},{"capability":"auto-converge","state":false},{"capability":"rdma-pin-all","state":false},{"capability":"postcopy-ram","state":true},{"capability":"compress","state":false},{"capability":"pause-before-switchover","state":true},{"capability":"late-block-activate","state":false},{"capability":"multifd","state":false},{"capability":"dirty-bitmaps","state":false}]},"id":"libvirt-386"}
{"return": {}, "id": "libvirt-386"}
{"execute": "migrate","arguments":{"uri": "tcp:$dst_host_ip:1234"}}
{"return": {}}
{"execute":"migrate-start-postcopy","id":"libvirt-453"}
{"return": {}, "id": "libvirt-453"}
{"timestamp": {"seconds": 1617717149, "microseconds": 815910}, "event": "STOP"}
{"execute":"migrate-continue","arguments":{"state":"pre-switchover"},"id":"libvirt-455"}
{"return": {}, "id": "libvirt-455"}
{"execute":"query-migrate"}
{"return": {"status": "completed", "setup-time": 3, "downtime": 52068, "total-time": 129868, "ram": {"total": 4312604672, "postcopy-requests": 439, "dirty-sync-count": 25, "multifd-bytes": 0, "pages-per-second": 33584, "page-size": 4096, "remaining": 0, "mbps": 562.830863, "transferred": 9136168867, "duplicate": 651263, "dirty-pages-rate": 0, "skipped": 0, "normal-bytes": 9112498176, "normal": 2224731}}}


Peter, shall we update the Component from qemu to libvirt per above contents?

Comment 2 Peter Xu 2021-04-06 14:19:57 UTC
(In reply to Li Xiaohui from comment #1)
> Peter, shall we update the Component from qemu to libvirt per above contents?

Hi Xiaohui, did you use root when testing with QEMU?  Root will not be affected because root is privileged.

Indeed, in the end we might need to fix this in libvirt, but let's temporarily keep the component untouched until we settle on how to fix it, to avoid bouncing.
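"Privileged" here means holding CAP_SYS_PTRACE, which root normally has and the qemu user that libvirt runs QEMU as normally does not. One way to check a process is to look at the CapEff mask in /proc/<pid>/status; the helper below is a hypothetical sketch of that check, using CAP_SYS_PTRACE = 19 from <linux/capability.h>.

```python
# Bit number of CAP_SYS_PTRACE in <linux/capability.h>.
CAP_SYS_PTRACE = 19


def has_cap_sys_ptrace(status_text: str) -> bool:
    """Return True if the CapEff mask in /proc/<pid>/status includes CAP_SYS_PTRACE."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            mask = int(line.split(":", 1)[1].strip(), 16)
            return bool((mask >> CAP_SYS_PTRACE) & 1)
    raise ValueError("no CapEff line found")


# Usage on Linux, e.g. for the current process:
#   has_cap_sys_ptrace(open("/proc/self/status").read())
```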

Comment 3 Dr. David Alan Gilbert 2021-04-07 10:41:45 UTC
It looks like there was selinux support added in the kernel:

https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/MPP3RFFCSSN34TGM2YAD55PX5DKOW6RL/

so perhaps the best answer here is to wire up SELinux instead, and make sure RHEL9's kernel has those SELinux patches.
Adding Ondrej in.

Comment 4 Dr. David Alan Gilbert 2021-04-07 10:44:01 UTC
Oh, and this would also affect DPDK when doing a postcopy migration with QEMU, so the DPDK process would also need the permissions if this is done with SELinux.

Comment 5 Li Xiaohui 2021-04-07 10:55:04 UTC
(In reply to Peter Xu from comment #2)
> (In reply to Li Xiaohui from comment #1)
> > Peter, shall we update the Component from qemu to libvirt per above contents?
> 
> Hi Xiaohui, did you use root when testing with QEMU?  Root will not be
> affected because root is privileged.

You're right. I used the root user for the postcopy migration on the QEMU side, which is why it succeeded.
I checked again by booting the VM via libvirt and found that it runs as the qemu user, and the same user is used during postcopy migration.

> 
> Indeed, in the end we might need to fix this in libvirt, but let's temporarily
> keep the component untouched until we settle on how to fix it, to avoid
> bouncing.

Got it, thanks.

Comment 6 Ondrej Mosnacek 2021-04-07 11:02:40 UTC
Hm... I don't think SELinux would be a solution to this. Note that under the (default) targeted policy, both privileged and unprivileged user sessions run as unconfined_t, so SELinux wouldn't mitigate the ability of unprivileged users to use userfaultfd with kernel memory... In the default configuration it mainly confines system services (which often run as root), so it provides security at a different layer than DAC. The sysadmin would have to deliberately enable confined users across the system for SELinux to have an effect on user sessions. But this is tricky to get working well, so very few users/customers do this.

TL;DR: Adding the SELinux support wouldn't fully mitigate setting vm.unprivileged_userfaultfd to 1. It would be nice to have (and we're working on it ;), as it would prevent most system daemons from using the userfaultfd(2) syscall for evil if they get compromised somehow, but it's kind of orthogonal to this issue.

Comment 7 Dr. David Alan Gilbert 2021-04-07 19:21:37 UTC
(In reply to Ondrej Mosnacek from comment #6)
> Hm... I don't think SELinux would be a solution to this. Note that under the
> (default) targeted policy, both privileged and unprivileged user sessions
> run as unconfined_t, so SELinux wouldn't mitigate the ability of
> unprivileged users to use userfaultfd with kernel memory... In the default
> configuration it mainly confines system services (which are often run as
> root), so it provides security at a different layer than DAC. The sysadmin
> would have to deliberately enable confined users across the system for
> SELinux to have an effect on user sessions. But this is tricky to get working
> well, so only very few users/customers do this.
> 
> TL;DR: Adding the SELinux support wouldn't fully mitigate setting
> vm.unprivileged_userfaultfd to 1. It would be nice to have (and we're
> working on it ;), as it would prevent most system daemons from using the
> userfaultfd(2) syscall for evil if they get compromised somehow, but kind of
> orthogonal to this issue.

Hmm, I'm not sure I completely follow that yet, but you might find it needs fixing in libvirt's svirt support, since it crafts an SELinux policy for each VM; so you'd also want to try a live migration with postcopy via libvirt on a domain with svirt enabled.
(Probably see Daniel Berrange for libvirt questions.)

Comment 8 Andrea Arcangeli 2021-04-07 20:43:42 UTC
It'd be nice to have a more fine-grained way to give unprivileged access to qemu only (be it SELinux or some new logic), but for the time being can we change libvirt to unconditionally issue an "echo 1 >/proc/sys/vm/unprivileged_userfaultfd" just before issuing the qemu postcopy monitor command?

We only support postcopy through libvirt, so the above will fix the practical issues with a simple change, and it'll be re-entrant, so it will still work even if there are multiple libvirt daemons on the same host (e.g. CNV).

It's preferable to only set it to 1 if postcopy is actively used on the system and it doesn't need to be set to 1 before that.

Thanks,
Andrea
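The "echo 1 just before postcopy" approach can be sketched as a helper that writes the knob only when it is not already 1. This is Python rather than libvirt's C, and `ensure_unprivileged_userfaultfd` is a hypothetical name; writing the real /proc/sys file requires root. It stays re-entrant because every caller only ever writes "1", so concurrent callers cannot undo each other.

```python
from pathlib import Path

# Same file as in `echo 1 >/proc/sys/vm/unprivileged_userfaultfd`.
PROC_KNOB = Path("/proc/sys/vm/unprivileged_userfaultfd")


def ensure_unprivileged_userfaultfd(path: Path = PROC_KNOB) -> bool:
    """Set the knob to 1 unless it already is; return True if a write happened.

    Meant to be called right before enabling postcopy, so the knob stays 0
    on hosts that never use postcopy. Two racing callers may both write,
    which is harmless since both write the same value.
    """
    if path.read_text().strip() == "1":
        return False
    path.write_text("1\n")
    return True
```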

Comment 9 Peter Xu 2021-04-08 18:52:13 UTC
(In reply to Dr. David Alan Gilbert from comment #7)
> Hmm I'm not sure I completely follow that yet; but you might find it needs
> fixing in libvirt's svirt support, since it crafts
> an selinux policy for each VM; so you'd also want to do a live migration
> with postcopy via libvirt
> on a domain with svirt enabled.
> (Probably see Daniel Berrange for libvirt questions)

I feel like the app needs to pass both the unprivileged_userfaultfd check and the SELinux one (if there's a rule for it) to be allowed to get a userfaultfd.  However, if svirt specifies no uffd-specific SELinux policy, it'll pass the check by default?

(In reply to Andrea Arcangeli from comment #8)
> It'd be nice to have a more fine-grained way to give unprivileged access to
> qemu only (be it SELinux or some new logic), but for the time being can we
> change libvirt to unconditionally issue an "echo 1
> >/proc/sys/vm/unprivileged_userfaultfd" just before issuing the qemu
> postcopy monitor command?

I tend to agree.  If someday we prefer finer-grained permission control, we can either start to use SELinux or add yet another interface, e.g. a /dev/userfaultfd that works similarly to /dev/kvm, but so far "echo 1 >" looks like the simplest working solution.

Comment 10 John Ferlan 2021-07-08 15:38:37 UTC
Assigned to Amnon for next level triage to ensure the issue/concern isn't lost.

Comment 13 Li Xiaohui 2021-07-19 14:58:03 UTC
Reproduced the bz on the QEMU side, with some changes (steps 1-3) on both the src and dst hosts:
1. # usermod -s /bin/bash qemu
2. Make the qemu user the owner of the shared VM disk:
# chown -R qemu /mnt/nfs
3. Set up a tap device:
(1) Install the tunctl rpm:
# yum install http://fr2.rpmfind.net/linux/opensuse/tumbleweed/repo/oss/x86_64/tunctl-1.5-26.20.x86_64.rpm
(2) # sudo tunctl -t tap0 -u qemu
4. Boot a VM as the qemu user on the src host, with a command line including:
-device virtio-net-pci,netdev=hostnet0,id=net0,vectors=4,mac=52:54:00:6c:83:be,bus=pci.1,addr=0x0 \
-netdev tap,ifname=tap0,id=hostnet0,vhost=on,script=no,downscript=no \
5. Boot a VM as the qemu user on the dst host with the same command line as on the src host, with '-incoming defer' appended
6. Then migrate with postcopy enabled:


Test result:
1. Setting 'postcopy-ram on' succeeds on the src host:
(qemu) migrate_set_capability postcopy-ram on  
(qemu) info migrate_capabilities 
xbzrle: off
rdma-pin-all: off
auto-converge: off
zero-blocks: off
compress: off
events: off
postcopy-ram: on
2. Setting 'postcopy-ram on' fails on the dst host, because userfaultfd is only needed on the dst side when postcopy is enabled:
(qemu) migrate_set_capability postcopy-ram on
postcopy_ram_supported_by_host: userfaultfd not available: Operation not permitted
Error: Postcopy is not supported


Note: vm.unprivileged_userfaultfd is 0 on both the src and dst hosts (checked as the qemu user):
[qemu@xxx home]$ cat /proc/sys/vm/unprivileged_userfaultfd 
0
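To check from a given account whether a host would hit the same EPERM that QEMU reports as "userfaultfd not available: Operation not permitted", a small probe can call userfaultfd(2) directly. This is a Python/ctypes sketch, not QEMU's postcopy_ram_supported_by_host(); it hard-codes the x86_64 (323) and aarch64 (282) syscall numbers and reports ENOSYS anywhere else. Run it as the qemu user; note that seccomp filters (for example in containers) can also return EPERM. Passing `extra_flags=UFFD_USER_MODE_ONLY` should succeed for an unprivileged user on kernels with d0d4730ac2e4, but such an fd cannot serve postcopy.

```python
import ctypes
import errno
import os
import platform
import sys

# __NR_userfaultfd for the two architectures this sketch knows about.
_NR_USERFAULTFD = {"x86_64": 323, "aarch64": 282}
UFFD_USER_MODE_ONLY = 1  # from <linux/userfaultfd.h>, kernel 5.11+


def probe_userfaultfd(extra_flags: int = 0) -> int:
    """Try to create a userfaultfd; return 0 on success, otherwise the errno."""
    nr = _NR_USERFAULTFD.get(platform.machine())
    if not sys.platform.startswith("linux") or nr is None:
        return errno.ENOSYS  # this sketch cannot probe here
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    flags = os.O_CLOEXEC | os.O_NONBLOCK | extra_flags
    fd = libc.syscall(ctypes.c_long(nr), ctypes.c_int(flags))
    if fd < 0:
        return ctypes.get_errno()
    os.close(fd)  # only probing; release the descriptor
    return 0


err = probe_userfaultfd()
print("userfaultfd:", "ok" if err == 0 else os.strerror(err))
```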

Comment 16 Jaroslav Suchanek 2021-08-24 12:13:54 UTC
Jiri, please have a look. Thanks.

Comment 17 Jiri Denemark 2021-12-10 16:57:55 UTC
Pushed upstream as

commit d804408ef9044aeb0d73b2e83fc044c5fff3c86d
Refs: v7.10.0-206-gd804408ef9
Author:     Jiri Denemark <jdenemar>
AuthorDate: Thu Dec 2 15:43:27 2021 +0100
Commit:     Jiri Denemark <jdenemar>
CommitDate: Fri Dec 10 17:53:11 2021 +0100

    qemu: Enable unprivileged userfaultfd for post-copy migration

    Userfaultfd is by default allowed only for privileged processes. Since
    libvirt runs QEMU unprivileged, we need to enable unprivileged access to
    userfaultfd to enable post-copy migration.

    https://bugzilla.redhat.com/show_bug.cgi?id=1945420

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Daniel P. Berrangé <berrange>

Comment 20 Fangge Jin 2022-01-24 12:10:10 UTC
Verified with libvirt-8.0.0-1.el9.x86_64

Comment 22 errata-xmlrpc 2022-05-17 12:45:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (new packages: libvirt), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2390

