Bug 1628892

Summary: Permission denied when starting a guest with egl-headless display
Product: Red Hat Enterprise Linux 7
Reporter: yafu <yafu>
Component: libvirt
Assignee: Erik Skultety <eskultet>
Status: CLOSED ERRATA
QA Contact: yafu <yafu>
Severity: unspecified
Priority: unspecified
Version: 7.6
CC: dyuan, eskultet, fjin, jdenemar, lmen, redhat-bz, xuzhang
Target Milestone: rc
Target Release: 7.7
Hardware: Unspecified
OS: Unspecified
Fixed In Version: libvirt-4.5.0-14.el7
Clones: 1644567
Last Closed: 2019-08-06 13:13:56 UTC
Type: Bug
Bug Depends On: 1648236, 1652871, 1671594
Bug Blocks: 1644567

Description yafu 2018-09-14 09:29:16 UTC
Description of problem:
Permission denied when starting a guest with an egl-headless display.

Version-Release number of selected component (if applicable):
libvirt-4.4.0-9.el7.x86_64
qemu-kvm-rhev-2.12.0-14.el7.x86_64
selinux-policy-3.13.1-224.el7.noarch
kernel-3.10.0-940.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Add '/dev/dri/renderD128' to cgroup_device_acl in qemu.conf and restart the libvirtd service:
# cat /etc/libvirt/qemu.conf
cgroup_device_acl = [
    "/dev/null", "/dev/full", "/dev/zero",
    "/dev/random", "/dev/urandom",
    "/dev/ptmx", "/dev/kvm", "/dev/kqemu",
    "/dev/rtc","/dev/hpet", "/dev/sev",
    "/dev/dri/renderD128"
]

# systemctl restart libvirtd

2. Create an mdev device.

3. Define a guest with an egl-headless display:
    <graphics type='egl-headless'/>
    <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='on'>
      <source>
        <address uuid='d5b66366-625c-4632-ab15-6ef3a54666a8'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </hostdev>

4. Start the guest:
error: Failed to start domain rhel7.6
error: internal error: process exited while connecting to monitor: 2018-09-06T06:46:38.062062Z qemu-kvm: egl: no drm render node available
2018-09-06T06:46:38.062102Z qemu-kvm: egl: render node init failed

5. Change the owner and group of '/dev/dri/renderD128' to 'qemu:qemu':
# chown qemu:qemu /dev/dri/renderD128

6. Start the guest again:
# virsh start rhel7.6
Domain rhel7.6 started

Actual results:
The guest fails to start because the QEMU process has no DAC permission on /dev/dri/renderD128.

Expected results:
The guest should start successfully with an egl-headless display.

Additional info:

Comment 2 Erik Skultety 2018-11-19 13:03:52 UTC
QEMU patches adding the 'rendernode' option to the command line are in place as of:
commit 91e61947eb2be21b00091d34f5692f89cef41376
Author:     Erik Skultety <eskultet>
AuthorDate: Fri Nov 16 11:14:43 2018 +0100
Commit:     Gerd Hoffmann <kraxel>
CommitDate: Fri Nov 16 11:44:22 2018 +0100

    ui: Allow specifying 'rendernode' display option for egl-headless
    
    As libvirt can't predict which rendernode QEMU would pick, it
    won't adjust the permissions on the device, hence QEMU getting
    "Permission denied" when opening the DRI device. Therefore, enable
    'rendernode' option for egl-headless display type.
    
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1648236
    
    Signed-off-by: Erik Skultety <eskultet>
    Message-id: 27f4617f19aa1072114f10f1aa9dd199735ef982.1542362949.git.eskultet
    Signed-off-by: Gerd Hoffmann <kraxel>

However, it doesn't seem possible to introspect the corresponding data structure within the QAPI schema, so any libvirt patches fixing this depend on resolving that issue first.
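The QEMU commit above explains the root cause: if QEMU picks a render node by itself, libvirt cannot know which device to grant permissions on. The remedy is for the management layer to choose the node and pass it explicitly. A minimal sketch of one deterministic selection policy follows, assuming the lowest-numbered renderD<N> entry is preferred and that the directory entry names were already collected (e.g. with readdir(), whose order is unspecified); pick_render_node is a hypothetical name, not libvirt's actual code.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: given the entry names of a directory such as
 * /dev/dri, write "<dir>/renderD<N>" for the lowest N into out.
 * Returns 0 on success, -1 if no render node exists or out is too small. */
static int
pick_render_node(const char *dir, const char *const *names, size_t n,
                 char *out, size_t outlen)
{
    const char *prefix = "renderD";
    size_t plen = strlen(prefix);
    long best = -1;

    for (size_t i = 0; i < n; i++) {
        char *end;
        long num;

        /* skip non-render entries such as card0 or by-path */
        if (strncmp(names[i], prefix, plen) != 0)
            continue;
        /* require renderD followed only by digits */
        if (!isdigit((unsigned char)names[i][plen]))
            continue;
        num = strtol(names[i] + plen, &end, 10);
        if (*end != '\0')
            continue;
        if (best < 0 || num < best)
            best = num;
    }

    if (best < 0)
        return -1;

    /* snprintf reports the length it needed; treat truncation as failure */
    int len = snprintf(out, outlen, "%s/%s%ld", dir, prefix, best);
    if (len < 0 || (size_t)len >= outlen)
        return -1;
    return 0;
}
```

Sorting by the numeric suffix rather than trusting directory order means the same node is chosen on every start, so the node that gets DAC, cgroup and SELinux adjustments is the one QEMU is told to open.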

Comment 3 Erik Skultety 2018-12-03 14:10:52 UTC
Upstream fix:
commit 3163de7d0e7ae92b9f3e06479c8cd46e43ac8058
Refs: v4.10.0-17-g3163de7d0e
Author:     Erik Skultety <eskultet>
AuthorDate: Thu Nov 15 11:38:00 2018 +0100
Commit:     Erik Skultety <eskultet>
CommitDate: Mon Dec 3 14:58:31 2018 +0100

    qemu: command: gfx: egl-headless: Add 'rendernode' option to the cmdline

    Depending on whether QEMU actually supports the option, we can put the
    'rendernode' on the '-display egl-headless' cmdline.

    https://bugzilla.redhat.com/show_bug.cgi?id=1628892

    Signed-off-by: Erik Skultety <eskultet>
    Reviewed-by: Ján Tomko <jtomko>
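The commit above makes the 'rendernode' suboption conditional on QEMU support. A minimal sketch of that decision, assuming a capability flag has already been probed from QEMU; format_egl_headless_display and qemu_has_rendernode are hypothetical names, not libvirt's API.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats the value passed to QEMU's -display option
 * for an egl-headless display. The render node is appended only when QEMU
 * supports the 'rendernode' option and a non-empty node path is known.
 * Returns 0 on success, -1 if buf is too small. */
static int
format_egl_headless_display(char *buf, size_t buflen,
                            bool qemu_has_rendernode,
                            const char *rendernode)
{
    int n;

    if (qemu_has_rendernode && rendernode && strlen(rendernode) > 0)
        n = snprintf(buf, buflen, "egl-headless,rendernode=%s", rendernode);
    else
        n = snprintf(buf, buflen, "egl-headless");

    /* snprintf returns the length it wanted to write; treat truncation as an error */
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return 0;
}
```

With the flag set, the result matches the `-display egl-headless,rendernode=/dev/dri/renderD128` form seen later in the verified QEMU command line; without it, the plain `egl-headless` form is kept so that QEMU binaries lacking the option still start.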

Comment 4 Erik Skultety 2018-12-07 09:16:47 UTC
The upstream fix above introduced another bug: libvirt tries to pick a default DRI render node for SPICE even if OpenGL is not enabled in the XML with <gl enable='yes'/>. On a headless server without DRI, this makes a domain that previously started just fine fail to start.

The following scenario needs to be verified:
1. Prepare a VM with the following in the XML:
  <graphics type='spice'>
    <listen type='none'/>
  </graphics>
2. Rename all /dev/dri/renderDX devices (skip if your host doesn't have DRI)
3. Start the VM using virsh

Expected result: the domain starts successfully

Comment 5 Erik Skultety 2018-12-11 14:18:36 UTC
The issue mentioned in comment 4 is fixed upstream by:

commit 1215195fd882efac47c07c16bfff0ad9a33c45a3
Refs: v4.10.0-43-g1215195fd8
Author:     Erik Skultety <eskultet>
AuthorDate: Thu Dec 6 16:12:14 2018 +0100
Commit:     Erik Skultety <eskultet>
CommitDate: Tue Dec 11 15:15:17 2018 +0100

    domain: conf: graphics: Fix picking DRI renderer automatically for SPICE

    Commit 255e0732 introduced a few graphics-related helpers. The problem
    is that virDomainGraphicsNeedsAutoRenderNode returns true if it gets
    NULL as a response from virDomainGraphicsNeedsAutoRenderNode. That's
    okay for egl-headless because that one always needs a DRM render node,
    the same is not true for SPICE though, and unless the XML specifies
    <gl enable='yes'> for SPICE, there's no need for any renderer.

    Signed-off-by: Erik Skultety <eskultet>
    Reviewed-by: Ján Tomko <jtomko>
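The corrected rule from this commit can be modelled as a small predicate. This is a simplified sketch, assuming a hypothetical gfx_def structure in place of libvirt's real domain definition: egl-headless always needs a render node, SPICE needs one only when <gl enable='yes'/> is set, and an explicitly configured rendernode means nothing has to be picked automatically.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a <graphics> element. */
typedef enum { GFX_SPICE, GFX_EGL_HEADLESS, GFX_VNC } gfx_type;

typedef struct {
    gfx_type type;
    bool gl_enabled;        /* <gl enable='yes'/> (relevant for SPICE) */
    const char *rendernode; /* <gl rendernode='...'/>, NULL if absent */
} gfx_def;

/* Returns true when a DRM render node must be picked on the user's behalf. */
static bool
gfx_needs_auto_render_node(const gfx_def *gfx)
{
    switch (gfx->type) {
    case GFX_EGL_HEADLESS:
        /* egl-headless always needs a render node */
        return gfx->rendernode == NULL;
    case GFX_SPICE:
        /* only when OpenGL is explicitly enabled for SPICE */
        return gfx->gl_enabled && gfx->rendernode == NULL;
    default:
        /* other display types never use a render node */
        return false;
    }
}
```

The regression described in comment 4 corresponds to returning true for the plain SPICE case (no <gl> element), which is exactly what the comment 4 scenario exercises.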

Comment 6 yafu 2019-01-16 08:08:44 UTC
Hi, Erik,

Currently the DRI render device has to be added to "cgroup_device_acl" in qemu.conf. Should libvirt add the cgroup ACL for the DRI render node automatically?

Comment 7 Erik Skultety 2019-01-16 11:42:23 UTC
(In reply to yafu from comment #6)
> Hi, Erik,
> 
> It needs to add DRI renderer device in the "cgroup_device_acl" in qemu.conf
> now. Does libvirt need to add  cgroup acl for DRI renderer automatically?

I'm not sure I understand the question, libvirt already adds the DRI render node to the cgroup automatically. What issues are you experiencing?

Comment 8 yafu 2019-01-17 02:25:37 UTC
(In reply to Erik Skultety from comment #7)
> I'm not sure I understand the question, libvirt already adds the DRI render
> node to the cgroup automatically. What issues are you experiencing?

For now, we need to add the DRI render device to 'cgroup_device_acl' in qemu.conf as follows:
1. Add '/dev/dri/renderD128' to cgroup_device_acl in qemu.conf and restart the libvirtd service:
# cat /etc/libvirt/qemu.conf
cgroup_device_acl = [
    "/dev/null", "/dev/full", "/dev/zero",
    "/dev/random", "/dev/urandom",
    "/dev/ptmx", "/dev/kvm", "/dev/kqemu",
    "/dev/rtc","/dev/hpet", "/dev/sev",
    "/dev/dri/renderD128"
]

# systemctl restart libvirtd

Otherwise, a guest with an egl-headless display fails to start with "Permission denied".

Comment 9 Erik Skultety 2019-01-17 07:54:35 UTC
(In reply to yafu from comment #8)
> For now, we need to add DRI renderer device in the 'cgroup device acl' in
> qemu.conf as follows:
> 1.Add '/dev/dri/renderD128' in cgroup_device_acl in qemu.conf and restart
> libvirtd service:
> #cat /etc/libvirt/qemu.conf
> cgroup_device_acl = [
>     "/dev/null", "/dev/full", "/dev/zero",
>     "/dev/random", "/dev/urandom",
>     "/dev/ptmx", "/dev/kvm", "/dev/kqemu",
>     "/dev/rtc","/dev/hpet", "/dev/sev",
>     "/dev/dri/renderD128"
> ]
> 
> #systemctl restart libvirtd
> 
> otherwise, the guest with egl-headless display will fail to start since
> permission denied.

You need libvirt 5.0.0 and QEMU 3.1.0.

Comment 15 yafu 2019-04-26 02:06:55 UTC
Reproduced with:
libvirt-4.4.0-9.el7.x86_64
qemu-kvm-rhev-2.12.0-14.el7.x86_64
selinux-policy-3.13.1-224.el7.noarch
kernel-3.10.0-940.el7.x86_64

Verified with:
libvirt-4.5.0-15.el7.x86_64
qemu-kvm-rhev-2.12.0-26.el7.x86_64
selinux-policy-3.13.1-245.el7.noarch
kernel-3.10.0-1034.el7.x86_64

Test steps:
Scenario 1: Start guest with egl-headless display
1. Create an mdev device.

2. Define a guest with an egl-headless display:
    <graphics type='egl-headless'>
      <gl rendernode='/dev/dri/renderD128'/>
    </graphics>
    <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='on'>
      <source>
        <address uuid='d5b66366-625c-4632-ab15-6ef3a54666a8'/>
      </source>
    </hostdev>

3. Start the guest:
# virsh start q35
Domain q35 started

4. Check the QEMU command line:
# ps aux | grep -i "egl-headless"
...-display egl-headless,rendernode=/dev/dri/renderD128 ... -device vfio-pci,id=hostdev0,sysfsdev=/sys/bus/mdev/devices/cba683dd-c572-475d-bad4-62c499da6360,display=on,bus=pcie.0,addr=0x3 -
...

Scenario 2: Test scenario in https://bugzilla.redhat.com/show_bug.cgi?id=1628892#c4:
1. Prepare a VM with the following in the XML:
  <graphics type='spice'>
    <listen type='none'/>
  </graphics>

2. Rename all /dev/dri/renderDX devices (skip if your host doesn't have DRI)

3. Start the guest:
# virsh start q35
Domain q35 started

Comment 17 errata-xmlrpc 2019-08-06 13:13:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2294