Bug 2121441
| Summary: | NVME disk hot-plug fails due to the denial from selinux | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Han Han <hhan> |
| Component: | libvirt | Assignee: | Michal Privoznik <mprivozn> |
| libvirt sub component: | Storage | QA Contact: | Han Han <hhan> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | ||
| Priority: | medium | CC: | berrange, jdenemar, jsuchane, kanderso, lmen, lvrabec, meili, mmalik, mprivozn, ssekidde, virt-maint, xuzhang, ymankad |
| Version: | 9.1 | Keywords: | Regression, TestBlocker, Triaged, Upstream |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | libvirt-8.5.0-7.el9_1 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-11-15 10:04:47 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
The device file has the incorrect SELinux context device_t, but I cannot reproduce the issue. In the policy the label is set correctly:

rhel91# matchpathcon /dev/vfio/54
/dev/vfio/54 system_u:object_r:vfio_device_t:s0

rhel91# ll -Za /dev/vfio
total 0
drwxr-xr-x. 2 root root system_u:object_r:device_t:s0 60 Sep 19 05:04 .
drwxr-xr-x. 20 root root system_u:object_r:device_t:s0 3280 Sep 19 05:04 ..
crw-rw-rw-. 1 root root system_u:object_r:vfio_device_t:s0 10, 196 Sep 19 05:04 vfio

although there is a file transition only for vfio.

What else is needed?

(In reply to Zdenek Pytela from comment #2)
> The device file has incorrect SELinux context device_t, but I cannot
> reproduce the issue. In the policy the label is set correctly:
>
> rhel91# matchpathcon /dev/vfio/54
> /dev/vfio/54 system_u:object_r:vfio_device_t:s0
>
> rhel91# ll -Za /dev/vfio
> total 0
> drwxr-xr-x. 2 root root system_u:object_r:device_t:s0 60 Sep 19 05:04 .
> drwxr-xr-x. 20 root root system_u:object_r:device_t:s0 3280 Sep 19 05:04 ..
> crw-rw-rw-. 1 root root system_u:object_r:vfio_device_t:s0 10, 196 Sep 19 05:04 vfio
>
> although there is a file transition only for vfio.
>
> What else is needed?

Nothing else I think. I also tried to attach it to vfio_pci manually with vfio_adm (https://gitlab.com/maximlevitsky/misc_tools/-/raw/master/utils/vfio_adm). The label of /dev/vfio/54 is the expected label vfio_device_t:

➜ ~ ./vfio_adm --attach --device 0000:87:00.0 --driver vfio_pci
Attaching device 0000:87:00.0 to VFIO/UIO
➜ ~ ls /dev/vfio/54 -Z
system_u:object_r:vfio_device_t:s0 /dev/vfio/54

Maybe the issue is caused by libvirt? Michal, could you help to have a look at it?

Yeah, this is a libvirt bug. What we are seeing here is a race condition. By default, libvirt starts qemu in its own mount namespace so that it can create and manage a private /dev for it.
And what's happening in this particular case is:

1) libvirt attaches the NVMe disk to the vfio-pci driver,
2) the kernel creates the /dev/vfio/X node, but with the default SELinux device_t label,
3) libvirt creates a copy of /dev/vfio/X in QEMU's private /dev (part of that is copying the SELinux label),
4) SELinux wakes up, sees the new node and sets the correct label (vfio_device_t), but does so only in the top level namespace; the change is not propagated into the private /dev,
5) libvirt tells QEMU to attach the NVMe disk, which fails because of the incorrect label.

Now, the problem here is that libvirt does not set the SELinux label between steps 4) and 5) (it does so for PCI assignment, which is basically the same operation in this specific case).

I could easily verify that this is a race condition by attaching gdb to virtqemud, setting a breakpoint at qemuDomainNamespaceSetupDisk() and letting the program continue once the breakpoint was hit. This gave SELinux enough time to execute step 4), and thus libvirt created the /dev/vfio/X node with the proper label.

At any rate, this is a libvirt bug, not a SELinux one (although one could make a strong argument that SELinux should have stepped in while the kernel was creating the /dev/vfio/X node in step 2).

Patch posted upstream:
https://listman.redhat.com/archives/libvir-list/2022-September/234413.html

commit 68e93e3180ad4e51bf9f86850dc86d8f528d6564 (HEAD -> master, origin/master, origin/HEAD)
Author: Michal Prívozník <mprivozn>
AuthorDate: Wed Sep 21 15:56:13 2022 +0200
Commit: Michal Prívozník <mprivozn>
CommitDate: Thu Sep 22 16:24:05 2022 +0200
security_selinux: Don't ignore NVMe disks when setting image label
For NVMe disks we skip setting SELinux label on corresponding
VFIO group (/dev/vfio/X). This bug is only visible with
namespaces and goes as follows:
1) libvirt assigns NVMe disk to vfio-pci driver,
2) kernel creates /dev/vfio/X node with generic device_t SELinux
label,
3) our namespace code creates the exact copy of the node in
domain's private /dev,
4) SELinux policy kicks in and changes the label on the node to
vfio_device_t (in the top most namespace),
5) libvirt tells QEMU to attach the NVMe disk, which is denied by
SELinux policy.
While one can argue that kernel should have created the
/dev/vfio/X node with the correct SELinux label from the
beginning (step 2), libvirt can't rely on that and needs to set
label on its own.
Surprisingly, I already wrote the code that aims at this specific
case (v6.0.0-rc1~241), but because of a shortcut we take earlier
it is never run. The reason is that
virStorageSourceIsLocalStorage() considers NVMe disks as
non-local because their source is not accessible via src->path
(or even if it is, it's not a local path).
Therefore, do not exit early for NVMe disks and let the function
continue.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2121441
Fixes: 284a12bae0e4cf93ea72797965d6c12e3a103f40
Signed-off-by: Michal Privoznik <mprivozn>
Reviewed-by: Peter Krempa <pkrempa>
v8.7.0-129-g68e93e3180
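
The label mismatch described in steps 2) to 4) of the commit message can be observed by comparing the node's context in the host namespace with the one seen from QEMU's mount namespace. The following is a minimal sketch, not part of libvirt: the helper names selinux_type, vfio_group_node and compare_labels are hypothetical, and it assumes root privileges, nsenter from util-linux, and a GNU stat that supports %C for the SELinux context. The group number is derived from the device's iommu_group symlink in sysfs.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical, not libvirt or SELinux tools.

# Print the type field (third colon-separated field) of an SELinux context,
# e.g. system_u:object_r:vfio_device_t:s0 -> vfio_device_t.
selinux_type() {
    printf '%s\n' "$1" | cut -d: -f3
}

# Map a PCI address to its VFIO group node using the iommu_group symlink.
# The optional second argument overrides the sysfs root (handy for testing).
vfio_group_node() {
    local pci=$1 sysfs=${2:-/sys}
    local link
    link=$(readlink "$sysfs/bus/pci/devices/$pci/iommu_group") || return 1
    echo "/dev/vfio/${link##*/}"
}

# Compare the label of a node as seen by the host and by a QEMU process.
# Returns success only when both namespaces report the same type.
compare_labels() {
    local node=$1 qemu_pid=$2
    local host_ctx qemu_ctx
    host_ctx=$(stat -c %C "$node") || return 1
    # -m enters only the mount namespace of the target process.
    qemu_ctx=$(nsenter -t "$qemu_pid" -m stat -c %C "$node") || return 1
    echo "host: $(selinux_type "$host_ctx")"
    echo "qemu: $(selinux_type "$qemu_ctx")"
    [ "$(selinux_type "$host_ctx")" = "$(selinux_type "$qemu_ctx")" ]
}
```

For example, compare_labels "$(vfio_group_node 0000:87:00.0)" QEMU_PID, with QEMU_PID replaced by the PID of the domain's qemu-kvm process, would be expected to print device_t for the qemu side on an affected build when the race is lost.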
Installed the build in comment6. Tested nvme attach/detach 50 times. All passed.

#!/bin/bash
DOM=rhel-9.2
DISK_XML=nvme.xml
# Attach and detach the NVMe disk 50 times, stopping at the first failure.
for i in {1..50}; do
    virsh attach-device "$DOM" "$DISK_XML"
    if [ $? -ne 0 ]; then
        echo "attach fails"
        break
    fi
    sleep 1
    virsh detach-device "$DOM" "$DISK_XML"
    if [ $? -ne 0 ]; then
        echo "detach fails"
        break
    fi
    sleep 1
done

Results:
Device attached successfully

Device detached successfully

...

Tested on libvirt-8.5.0-7.el9_1.x86_64 qemu-kvm-7.0.0-13.el9.x86_64 as comment10. PASS

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Low: libvirt security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:8003
Description of problem:
As subject

Version-Release number of selected component (if applicable):
kernel-5.14.0-152.el9.x86_64
selinux-policy-34.1.40-1.el9.noarch
libvirt-8.5.0-5.el9.x86_64
qemu-kvm-7.0.0-11.el9.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare a host with an NVMe disk
2. Attach the disk to the running VM
➜ ~ virsh list
 Id   Name   State
----------------------
 3    rhel   running

➜ ~ cat nvme.xml
<disk type='nvme' device='disk'>
  <driver name='qemu' type='raw'/>
  <source type='pci' managed='yes' namespace='1'>
    <address domain='0x0000' bus='0x87' slot='0x00' function='0x0'/>
  </source>
  <target dev='vdb' bus='virtio'/>
</disk>

➜ ~ virsh attach-device rhel nvme.xml
error: Failed to attach device from nvme.xml
error: internal error: unable to execute QEMU command 'blockdev-add': Failed to open VFIO group file: /dev/vfio/54: Permission denied

The selinux denial message:
Aug 25 05:49:36 dell-per730-37 setroubleshoot[2455]: SELinux is preventing /usr/libexec/qemu-kvm from 'read, write' accesses on the chr_file 54. For complete SELinux messages run: sealert -l 34557985-d3f4-42e4-a4f6-8e1e2a3611ce
Aug 25 05:49:36 dell-per730-37 setroubleshoot[2455]: SELinux is preventing /usr/libexec/qemu-kvm from 'read, write' accesses on the chr_file 54.#012#012***** Plugin device (91.4 confidence) suggests ****************************#012#012If you want to allow qemu-kvm to have read write access on the 54 chr_file#012Then you need to change the label on 54 to a type of a similar device.#012Do#012# semanage fcontext -a -t SIMILAR_TYPE '54'#012# restorecon -v '54'#012#012***** Plugin catchall (9.59 confidence) suggests **************************#012#012If you believe that qemu-kvm should be allowed read write access on the 54 chr_file by default.#012Then you should report this as a bug.#012You can generate a local policy module to allow this access.#012Do#012allow this access for now by executing:#012# ausearch -c 'qemu-kvm' --raw | audit2allow -M my-qemukvm#012# semodule -X 300 -i my-qemukvm.pp#012

Actual results:
As above

Expected results:
No selinux denial

Additional info:
1. The test passes when selinux is permissive.
2. The VM can start with an NVMe disk
3. The details from sealert:
➜ ~ sealert -l 34557985-d3f4-42e4-a4f6-8e1e2a3611ce
SELinux is preventing /usr/libexec/qemu-kvm from 'read, write' accesses on the chr_file 54.

***** Plugin device (91.4 confidence) suggests ****************************

If you want to allow qemu-kvm to have read write access on the 54 chr_file
Then you need to change the label on 54 to a type of a similar device.
Do
# semanage fcontext -a -t SIMILAR_TYPE '54'
# restorecon -v '54'

***** Plugin catchall (9.59 confidence) suggests **************************

If you believe that qemu-kvm should be allowed read write access on the 54 chr_file by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# ausearch -c 'qemu-kvm' --raw | audit2allow -M my-qemukvm
# semodule -X 300 -i my-qemukvm.pp

Additional Information:
Source Context                system_u:system_r:svirt_t:s0:c5,c897
Target Context                system_u:object_r:device_t:s0
Target Objects                54 [ chr_file ]
Source                        qemu-kvm
Source Path                   /usr/libexec/qemu-kvm
Port                          <Unknown>
Host                          dell-per730-37.lab.eng.pek2.redhat.com
Source RPM Packages           qemu-kvm-core-7.0.0-11.el9.x86_64
Target RPM Packages
SELinux Policy RPM            selinux-policy-targeted-34.1.40-1.el9.noarch
Local Policy RPM              selinux-policy-targeted-34.1.40-1.el9.noarch
Selinux Enabled               True
Policy Type                   targeted
Enforcing Mode                Enforcing
Host Name                     XXXX
Platform                      Linux XXXX 5.14.0-152.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Aug 22 20:01:02 EDT 2022 x86_64 x86_64
Alert Count                   1
First Seen                    2022-08-25 05:49:32 EDT
Last Seen                     2022-08-25 05:49:32 EDT
Local ID                      34557985-d3f4-42e4-a4f6-8e1e2a3611ce

Raw Audit Messages
type=AVC msg=audit(1661420972.879:176): avc: denied { read write } for pid=2382 comm="qemu-kvm" name="54" dev="tmpfs" ino=15 scontext=system_u:system_r:svirt_t:s0:c5,c897 tcontext=system_u:object_r:device_t:s0 tclass=chr_file permissive=0

type=SYSCALL msg=audit(1661420972.879:176): arch=x86_64 syscall=openat success=no exit=EACCES a0=ffffff9c a1=55ee511f8960 a2=2 a3=0 items=0 ppid=1 pid=2382 auid=4294967295 uid=107 gid=107 euid=107 suid=107 fsuid=107 egid=107 sgid=107 fsgid=107 tty=(none) ses=4294967295 comm=qemu-kvm exe=/usr/libexec/qemu-kvm subj=system_u:system_r:svirt_t:s0:c5,c897 key=(null)

Hash: qemu-kvm,svirt_t,device_t,chr_file,read,write
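
The key facts in the raw AVC record are the source type (svirt_t), the target type (device_t) and the object class (chr_file). As a rough sketch for spotting the same pattern in other audit records, the helpers below pull individual key=value fields out of a record. The names avc_field and looks_like_unlabelled_dev are hypothetical and are not part of the audit or setroubleshoot tools; only standard tr, sed, head and cut are assumed.

```shell
#!/bin/bash
# Sketch only: these helpers are hypothetical, not audit tooling.

# Read one audit record on stdin and print the value of the given key,
# with surrounding double quotes removed. Only the first match is printed.
avc_field() {
    local key=$1
    tr ' ' '\n' | sed -n "s/^${key}=//p" | head -n1 | tr -d '"'
}

# Succeed when the record matches the pattern seen in this report:
# QEMU (svirt_t) denied access to a character device labelled device_t.
looks_like_unlabelled_dev() {
    local rec=$1 stype ttype tclass
    stype=$(printf '%s\n' "$rec" | avc_field scontext | cut -d: -f3)
    ttype=$(printf '%s\n' "$rec" | avc_field tcontext | cut -d: -f3)
    tclass=$(printf '%s\n' "$rec" | avc_field tclass)
    [ "$stype" = "svirt_t" ] && [ "$ttype" = "device_t" ] &&
        [ "$tclass" = "chr_file" ]
}
```

Records could be fed in one at a time from the output of ausearch -c 'qemu-kvm' --raw, the same command the sealert suggestion above uses. A match only indicates a device node that kept the generic device_t label; it does not by itself prove the race described in this bug.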