Bug 2121441 - NVME disk hot-plug fails due to the denial from selinux
Summary: NVME disk hot-plug fails due to the denial from selinux
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: libvirt
Version: 9.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Assignee: Michal Privoznik
QA Contact: Han Han
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-08-25 13:13 UTC by Han Han
Modified: 2022-11-15 10:41 UTC (History)

Fixed In Version: libvirt-8.5.0-7.el9_1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-11-15 10:04:47 UTC
Type: Bug
Target Upstream Version:
Embargoed:



Links
System                  ID               Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker   RHELPLAN-132336  0        None      None    None     2022-08-25 13:24:24 UTC
Red Hat Product Errata  RHSA-2022:8003   0        None      None    None     2022-11-15 10:04:59 UTC

Description Han Han 2022-08-25 13:13:16 UTC
Description of problem:
As subject

Version-Release number of selected component (if applicable):
kernel-5.14.0-152.el9.x86_64
selinux-policy-34.1.40-1.el9.noarch
libvirt-8.5.0-5.el9.x86_64
qemu-kvm-7.0.0-11.el9.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare a host with an NVMe disk.
2. Attach the disk to the running VM:
➜  ~ virsh list 
 Id   Name   State
----------------------
 3    rhel   running

➜  ~ cat nvme.xml 
  <disk type='nvme' device='disk'>
    <driver name='qemu' type='raw'/>
    <source type='pci' managed='yes' namespace='1'>
      <address domain='0x0000' bus='0x87' slot='0x00' function='0x0'/>
    </source>
    <target dev='vdb' bus='virtio'/>
  </disk>


➜  ~ virsh attach-device rhel nvme.xml
error: Failed to attach device from nvme.xml
error: internal error: unable to execute QEMU command 'blockdev-add': Failed to open VFIO group file: /dev/vfio/54: Permission denied

The selinux denial message:
Aug 25 05:49:36 dell-per730-37 setroubleshoot[2455]: SELinux is preventing /usr/libexec/qemu-kvm from 'read, write' accesses on the chr_file 54. For complete SELinux messages run: sealert -l 34557985-d3f4-42e4-a4f6-8e1e2a3611ce
Aug 25 05:49:36 dell-per730-37 setroubleshoot[2455]: SELinux is preventing /usr/libexec/qemu-kvm from 'read, write' accesses on the chr_file 54.#012#012*****  Plugin device (91.4 confidence) suggests   ****************************#012#012If you want to allow qemu-kvm to have read write access on the 54 chr_file#012Then you need to change the label on 54 to a type of a similar device.#012Do#012# semanage fcontext -a -t SIMILAR_TYPE '54'#012# restorecon -v '54'#012#012*****  Plugin catchall (9.59 confidence) suggests   **************************#012#012If you believe that qemu-kvm should be allowed read write access on the 54 chr_file by default.#012Then you should report this as a bug.#012You can generate a local policy module to allow this access.#012Do#012allow this access for now by executing:#012# ausearch -c 'qemu-kvm' --raw | audit2allow -M my-qemukvm#012# semodule -X 300 -i my-qemukvm.pp#012

Actual results:
As above

Expected results:
No selinux denial

Additional info:
1. The test passes when SELinux is in permissive mode.
2. A VM can be started with an NVMe disk.
3. The details from sealert:
➜  ~ sealert -l 34557985-d3f4-42e4-a4f6-8e1e2a3611ce                
SELinux is preventing /usr/libexec/qemu-kvm from 'read, write' accesses on the chr_file 54.
                                                                               
*****  Plugin device (91.4 confidence) suggests   ****************************
                                       
If you want to allow qemu-kvm to have read write access on the 54 chr_file
Then you need to change the label on 54 to a type of a similar device.
Do                                                                             
# semanage fcontext -a -t SIMILAR_TYPE '54'
# restorecon -v '54'

*****  Plugin catchall (9.59 confidence) suggests   **************************

If you believe that qemu-kvm should be allowed read write access on the 54 chr_file by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# ausearch -c 'qemu-kvm' --raw | audit2allow -M my-qemukvm
# semodule -X 300 -i my-qemukvm.pp
                                       

Additional Information:
Source Context                system_u:system_r:svirt_t:s0:c5,c897
Target Context                system_u:object_r:device_t:s0
Target Objects                54 [ chr_file ]
Source                        qemu-kvm
Source Path                   /usr/libexec/qemu-kvm
Port                          <Unknown> 
Host                          dell-per730-37.lab.eng.pek2.redhat.com
Source RPM Packages           qemu-kvm-core-7.0.0-11.el9.x86_64
Target RPM Packages           
SELinux Policy RPM            selinux-policy-targeted-34.1.40-1.el9.noarch
Local Policy RPM              selinux-policy-targeted-34.1.40-1.el9.noarch
Selinux Enabled               True
Policy Type                   targeted
Enforcing Mode                Enforcing 
Host Name                     XXXX
Platform                      Linux XXXX
                              5.14.0-152.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Mon
                              Aug 22 20:01:02 EDT 2022 x86_64 x86_64
Alert Count                   1
First Seen                    2022-08-25 05:49:32 EDT
Last Seen                     2022-08-25 05:49:32 EDT
Local ID                      34557985-d3f4-42e4-a4f6-8e1e2a3611ce

Raw Audit Messages
type=AVC msg=audit(1661420972.879:176): avc:  denied  { read write } for  pid=2382 comm="qemu-kvm" name="54" dev="tmpfs" ino=15 scontext=system_u:system_r:svirt_t:s0:c5,c897 tcontext=system_u:object_r:device_t:s0 tclass=chr_file permissive=0

type=SYSCALL msg=audit(1661420972.879:176): arch=x86_64 syscall=openat success=no exit=EACCES a0=ffffff9c a1=55ee511f8960 a2=2 a3=0 items=0 ppid=1 pid=2382 auid=4294967295 uid=107 gid=107 euid=107 suid=107 fsuid=107 egid=107 sgid=107 fsgid=107 tty=(none) ses=4294967295 comm=qemu-kvm exe=/usr/libexec/qemu-kvm subj=system_u:system_r:svirt_t:s0:c5,c897 key=(null)

Hash: qemu-kvm,svirt_t,device_t,chr_file,read,write
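
For readers who want to pull the relevant fields out of the AVC record above, a minimal bash sketch follows. The helper name avc_field is hypothetical and not part of any audit tool; it only splits the record on whitespace and matches key=value words, so it assumes the record is on a single line.

```shell
#!/bin/bash
# Print the value of one key=value field from an audit record.
# avc_field is a hypothetical helper, not part of the audit tooling.
avc_field() {
  local key=$1 word
  local -a words
  # Split the record on whitespace without triggering glob expansion.
  read -ra words <<< "$2"
  for word in "${words[@]}"; do
    case $word in
      "$key"=*) printf '%s\n' "${word#*=}"; return 0 ;;
    esac
  done
  return 1   # field not present
}

# The AVC record from this report, joined onto a single line.
record='type=AVC msg=audit(1661420972.879:176): avc:  denied  { read write } for  pid=2382 comm="qemu-kvm" name="54" dev="tmpfs" ino=15 scontext=system_u:system_r:svirt_t:s0:c5,c897 tcontext=system_u:object_r:device_t:s0 tclass=chr_file permissive=0'

echo "source: $(avc_field scontext "$record")"
echo "target: $(avc_field tcontext "$record")"
echo "class:  $(avc_field tclass "$record")"
```

The target context is the interesting part here: the node is labeled device_t, while the svirt_t source domain is denied read and write access to it.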

Comment 2 Zdenek Pytela 2022-09-19 14:44:57 UTC
The device file has incorrect SELinux context device_t, but I cannot reproduce the issue. In the policy the label is set correctly:

rhel91# matchpathcon /dev/vfio/54
/dev/vfio/54    system_u:object_r:vfio_device_t:s0

rhel91# ll -Za /dev/vfio
total 0
drwxr-xr-x.  2 root root system_u:object_r:device_t:s0           60 Sep 19 05:04 .
drwxr-xr-x. 20 root root system_u:object_r:device_t:s0         3280 Sep 19 05:04 ..
crw-rw-rw-.  1 root root system_u:object_r:vfio_device_t:s0 10, 196 Sep 19 05:04 vfio

although there is a file transition only for vfio.

What else is needed?
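
The comparison above (the label the policy expects versus the label reported in the denial) can be sketched in bash as follows. The helper names context_type and labels_match are hypothetical, and the two context strings are copied from this report rather than queried from a live system.

```shell
#!/bin/bash
# Extract the type (third colon-separated field) from an SELinux context.
# context_type and labels_match are hypothetical helper names.
context_type() {
  local seuser serole setype rest
  IFS=: read -r seuser serole setype rest <<< "$1"
  printf '%s\n' "$setype"
}

# Succeed only if both contexts carry the same SELinux type.
labels_match() {
  [ "$(context_type "$1")" = "$(context_type "$2")" ]
}

expected='system_u:object_r:vfio_device_t:s0'   # matchpathcon result quoted above
actual='system_u:object_r:device_t:s0'          # tcontext from the denial

if labels_match "$expected" "$actual"; then
  echo "label OK"
else
  echo "label mismatch: expected $(context_type "$expected"), got $(context_type "$actual")"
fi
```

On a live host the two inputs could presumably be taken from `matchpathcon -n /dev/vfio/54` and `stat -c %C /dev/vfio/54`; that wiring is an assumption and was not part of the original test.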

Comment 3 Han Han 2022-09-20 11:07:35 UTC
(In reply to Zdenek Pytela from comment #2)
> The device file has incorrect SELinux context device_t, but I cannot
> reproduce the issue. In the policy the label is set correctly:
> 
> rhel91# matchpathcon /dev/vfio/54
> /dev/vfio/54    system_u:object_r:vfio_device_t:s0
> 
> rhel91# ll -Za /dev/vfio
> total 0
> drwxr-xr-x.  2 root root system_u:object_r:device_t:s0           60 Sep 19
> 05:04 .
> drwxr-xr-x. 20 root root system_u:object_r:device_t:s0         3280 Sep 19
> 05:04 ..
> crw-rw-rw-.  1 root root system_u:object_r:vfio_device_t:s0 10, 196 Sep 19
> 05:04 vfio
> 
> although there is a file transition only for vfio.
> 
> What else is needed?

Nothing else, I think. I also tried attaching it to vfio_pci manually with vfio_adm (https://gitlab.com/maximlevitsky/misc_tools/-/raw/master/utils/vfio_adm).
The label of /dev/vfio/54 is the expected label vfio_device_t:
➜  ~ ./vfio_adm --attach --device 0000:87:00.0 --driver vfio_pci
Attaching device 0000:87:00.0 to VFIO/UIO

➜  ~ ls /dev/vfio/54 -Z
system_u:object_r:vfio_device_t:s0 /dev/vfio/54

Maybe the issue is caused by libvirt?
Michal, could you take a look at it?

Comment 4 Michal Privoznik 2022-09-21 13:30:53 UTC
Yeah, this is a libvirt bug. What we are seeing here is a race condition. By default, libvirt starts qemu in its own mount namespace so that it can create and manage a private /dev for it. And what's happening in this particular case is:

1) libvirt attaches NVMe disk to vfio-pci driver,
2) kernel creates /dev/vfio/X node, but with default SELinux device_t label,
3) libvirt creates a copy of /dev/vfio/X in the QEMU's private /dev (part of that is copying SELinux label),
4) SELinux wakes up, sees the new node and sets correct label (vfio_device_t), but does so only in the top level namespace, it's not propagated into the private /dev,
5) libvirt tells QEMU to attach NVMe disk, which fails because of incorrect label.

Now, the problem here is that libvirt does not set the SELinux label between steps 4) and 5) (it does so for PCI assignment, which is basically the same operation in this specific case).

I could easily verify that this is a race condition by attaching gdb to virtqemud and setting a breakpoint at qemuDomainNamespaceSetupDisk() and letting the program continue once the breakpoint was hit. This gave enough time for SELinux to execute step 4) and thus libvirt created the /dev/vfio/X node with proper label. At any rate, this is a libvirt bug not a SELinux one (although one could make a strong argument that SELinux should have stepped in while the kernel was creating /dev/vfio/X node in step 2).
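
The ordering problem in steps 1) to 5) can be illustrated with a toy bash model. This is only a sketch under loud assumptions: two temporary directories stand in for the host /dev and QEMU's private /dev, a file's content stands in for its SELinux label, and the function name simulate_hotplug is hypothetical. Nothing here touches real devices, SELinux, or libvirt.

```shell
#!/bin/bash
# Toy model of the race: directories play the two /dev trees and
# file content plays the SELinux label. simulate_hotplug is hypothetical.
simulate_hotplug() {
  local relabel_before_copy=$1 host private
  host=$(mktemp -d) || return 1
  private=$(mktemp -d) || return 1

  echo device_t > "$host/54"                # step 2: node appears with the default label
  if [ "$relabel_before_copy" = yes ]; then
    echo vfio_device_t > "$host/54"         # relabel wins the race (as in the gdb experiment)
  fi
  cp "$host/54" "$private/54"               # step 3: node and label copied into the private /dev
  echo vfio_device_t > "$host/54"           # step 4: relabel reaches the host /dev only

  cat "$private/54"                         # step 5: the label QEMU would see
  rm -rf "$host" "$private"
}

echo "copy first:    $(simulate_hotplug no)"
echo "relabel first: $(simulate_hotplug yes)"
```

When the copy happens first, the private node keeps device_t even though the host node is later corrected; when the relabel happens first, the copy carries vfio_device_t. That matches the observation that pausing in gdb made the hot-plug succeed.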

Comment 5 Michal Privoznik 2022-09-22 11:41:44 UTC
Patch posted upstream:

https://listman.redhat.com/archives/libvir-list/2022-September/234413.html

Comment 7 Michal Privoznik 2022-09-22 14:28:11 UTC
commit 68e93e3180ad4e51bf9f86850dc86d8f528d6564 (HEAD -> master, origin/master, origin/HEAD)
Author:     Michal Prívozník <mprivozn>
AuthorDate: Wed Sep 21 15:56:13 2022 +0200
Commit:     Michal Prívozník <mprivozn>
CommitDate: Thu Sep 22 16:24:05 2022 +0200

    security_selinux: Don't ignore NVMe disks when setting image label
    
    For NVMe disks we skip setting SELinux label on corresponding
    VFIO group (/dev/vfio/X). This bug is only visible with
    namespaces and goes as follows:
    
    1) libvirt assigns NVMe disk to vfio-pci driver,
    2) kernel creates /dev/vfio/X node with generic device_t SELinux
       label,
    3) our namespace code creates the exact copy of the node in
       domain's private /dev,
    4) SELinux policy kicks in and changes the label on the node to
       vfio_device_t (in the top most namespace),
    5) libvirt tells QEMU to attach the NVMe disk, which is denied by
       SELinux policy.
    
    While one can argue that kernel should have created the
    /dev/vfio/X node with the correct SELinux label from the
    beginning (step 2), libvirt can't rely on that and needs to set
    label on its own.
    
    Surprisingly, I already wrote the code that aims on this specific
    case (v6.0.0-rc1~241), but because of a shortcut we take earlier
    it is never ran. The reason is that
    virStorageSourceIsLocalStorage() considers NVMe disks as
    non-local because their source is not accessible via src->path
    (or even if it is, it's not a local path).
    
    Therefore, do not exit early for NVMe disks and let the function
    continue.
    
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2121441
    Fixes: 284a12bae0e4cf93ea72797965d6c12e3a103f40
    Signed-off-by: Michal Privoznik <mprivozn>
    Reviewed-by: Peter Krempa <pkrempa>

v8.7.0-129-g68e93e3180

Comment 10 Han Han 2022-09-23 05:42:11 UTC
Installed the build from comment 6 and tested NVMe attach/detach 50 times. All passed.
#!/bin/bash
# Attach and detach the NVMe disk 50 times; stop at the first failure.
DOM=rhel-9.2
DISK_XML=nvme.xml
for i in {1..50}; do
  if ! virsh attach-device "$DOM" "$DISK_XML"; then
    echo "attach fails"
    break
  fi
  sleep 1
  if ! virsh detach-device "$DOM" "$DISK_XML"; then
    echo "detach fails"
    break
  fi
  sleep 1
done

Results:
Device attached successfully
Device detached successfully
...

Comment 17 Han Han 2022-09-27 03:01:02 UTC
Tested on libvirt-8.5.0-7.el9_1.x86_64 and qemu-kvm-7.0.0-13.el9.x86_64 using the steps from comment 10. PASS

Comment 19 errata-xmlrpc 2022-11-15 10:04:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Low: libvirt security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:8003

