Bug 1659484 - Failed to create /var/lib/libvirt/.cache for shader cache (Permission denied)
Summary: Failed to create /var/lib/libvirt/.cache for shader cache (Permission denied)
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libvirt
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Erik Skultety
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-12-14 14:07 UTC by Laurent Bigonville
Modified: 2019-03-31 21:06 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-03-15 15:51:33 UTC



Description Laurent Bigonville 2018-12-14 14:07:04 UTC
Hi,

When starting a domain with virgl support enabled, qemu complains that
/var/lib/libvirt/.cache doesn't exist. (On Debian, /var/lib/libvirt/ is the home directory of the libvirt user.)

AFAICS, mesa is using this directory to cache the generated shaders.

Mesa tries this location because it is the XDG location for the shader cache
files.

IMVHO, this should be moved to /var/cache/libvirt/ when running a
system domain by setting XDG_CACHE_HOME to that value.

Comment 1 Laurent Bigonville 2018-12-14 14:24:11 UTC
Looking at the Mesa code, MESA_GLSL_CACHE_DIR could be set as well.

But I'm starting to wonder whether this is a Mesa bug, because the error message seems to suggest that it's disabling the feature instead of failing completely.

Comment 2 Daniel Berrangé 2018-12-14 18:04:41 UTC
I'm wondering why no one else has reported that problem yet, because on Fedora the 'qemu' user's home directory is "/", so QEMU should get the same permission denied error if it attempts to use $HOME/.cache.
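
With a home directory of "/", the fallback cache path comes out as "//.cache". A minimal Python sketch of the lookup, assuming Mesa checks MESA_GLSL_CACHE_DIR first, then XDG_CACHE_HOME, and only then falls back to the home directory plus ".cache". The function name and the exact order are illustrative assumptions, not copied from Mesa's source:

```python
def shader_cache_base(env, home_dir):
    """Return the directory a Mesa-like lookup would pick for its cache.

    Assumed order (illustrative, not copied from Mesa):
    MESA_GLSL_CACHE_DIR, then XDG_CACHE_HOME, then <home>/.cache.
    """
    for var in ("MESA_GLSL_CACHE_DIR", "XDG_CACHE_HOME"):
        value = env.get(var)
        if value:
            return value
    # Plain string concatenation, so a home directory of "/" gives "//.cache".
    return home_dir + "/.cache"


if __name__ == "__main__":
    # Home of the 'qemu' user on Fedora, per the comment above.
    print(shader_cache_base({}, "/"))
    # Home of the libvirt user on Debian, per the description.
    print(shader_cache_base({}, "/var/lib/libvirt"))
    # Either variable, if set, takes precedence over the home directory.
    print(shader_cache_base({"XDG_CACHE_HOME": "/var/cache/libvirt"}, "/"))
```

If the real lookup works roughly like this, setting either variable for the QEMU process would sidestep the unwritable home directory.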

Comment 3 Laurent Bigonville 2018-12-20 07:36:19 UTC
Could be related to mesa version? Debian unstable has 18.2.6

Comment 4 Benjamin Xiao 2019-02-08 20:45:28 UTC
I am running into this issue on ArchLinux as well when I am trying to create a Fedora 29 VM.

Unable to complete install: 'internal error: qemu unexpectedly closed the monitor: Failed to create //.cache for shader cache (Permission denied)---disabling.'

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 75, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/create.py", line 2119, in _do_async_install
    guest.installer_instance.start_install(guest, meter=meter)
  File "/usr/share/virt-manager/virtinst/installer.py", line 419, in start_install
    doboot, transient)
  File "/usr/share/virt-manager/virtinst/installer.py", line 362, in _create_guest
    domain = self.conn.createXML(install_xml or final_xml, 0)
  File "/usr/lib/python3.7/site-packages/libvirt.py", line 3726, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirt.libvirtError: internal error: qemu unexpectedly closed the monitor: Failed to create //.cache for shader cache (Permission denied)---disabling.

Comment 5 thedarkcornerinyourbrain 2019-02-17 22:57:59 UTC
I can confirm this behaviour on Arch Linux when trying to create VMs of any distro, but only when I enable "OpenGL" on the Virtio-GPU.
When I start the qcow2 image file with QEMU directly from a shell, it works with Virtio-GPU and OpenGL enabled!

Here is another user (in Arch Forum) with the same exact issue: https://bbs.archlinux.org/viewtopic.php?id=243555

**Here is the error message when trying to start the VM in virt-manager with OpenGL enabled:** 

Error starting domain: internal error: qemu unexpectedly closed the monitor: Failed to create //.cache for shader cache (Permission denied)---disabling.

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 75, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 111, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/libvirtobject.py", line 66, in newfn
    ret = fn(self, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1420, in startup
    self._backend.create()
  File "/usr/lib/python3.7/site-packages/libvirt.py", line 1080, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirt.libvirtError: internal error: qemu unexpectedly closed the monitor: Failed to create //.cache for shader cache (Permission denied)---disabling.

**And when starting the same qcow2 VM directly with QEMU in shell it works with OpenGL with this command:**

sudo qemu-system-x86_64 -enable-kvm -M q35 -smp 2 -m 2G -hda test.qcow2 -net nic,model=virtio -net user,hostfwd=tcp::2222-:22 -vga virtio -display sdl,gl=on

This is my very first Linux bug report, so please just tell me if I missed something or did something wrong!
Thank you for your great work on virt-manager, and thanks in advance for any help with this issue, whether it is a real bug or not!

I tried to include all info that could be relevant:

**Here is the config of Virt-Manager for this VM:**
IMGUR: https://imgur.com/a/QbFvmy6

**Here are all 3 different settings I tried on the display page to enable OpenGL and their corresponding errors:**
IMGUR: https://imgur.com/a/oNM6Jsj

**Here are my settings as text (the pictures above show more detail):**
GUEST OS: Antergos KDE, Budgie, Mate
vCPU: 2 - Copy from Host
RAM: 2/3GB
CHIPSET: Q35
DISK: Virtio 20GB - Default Settings
GPU: Virtio w/ 3D-acceleration
DISPLAY: Spice w/ OpenGL
NETWORK: Virtio - Standard NAT
CDROM: SATA - Only for Setup
USB: 3.0
OTHERS: Defaults
PASSTHROUGH: None

**INFOS - SYSTEM - SOFTWARE - DESKTOP:**
OS: Arch
DE: KDE - Modded with Latte Dock + Extensions. No extra themes ATM. Suru Icon pack. 
BOOTLOADER: systemD UEFI boot
SOFTWARE: Mix of all types, KDE+GTK+Java+Electron, you name it. Around some hundred user installed packages.
GPU-DRIVER: AMD open source
UEFI OR BIOS: Latest UEFI
VIRTUALIZATION ENABLED IN UEFI: Yes
KERNEL: Latest stable Arch releases
MESA: Latest stable Arch releases

**INFOS - SYSTEM - HARDWARE - DESKTOP:**
 MAINBOARD: AMD AM4 - ASUS ROG Strix B450-F Gaming
 CPU: AMD Ryzen 7 1700 (8C/16T + NO IGP)
 GPU: Sapphire Radeon RX 570 Nitro 8GB
 RAM: 16GB = 2X8GB DDR4 non-ECC 2666MHZ Crucial Ballistix Sport LT
 LAN: 1Gb/S Intel (Luckily NO Realtek)
 SYSTEM-DISK: Samsung EVO 960 M2 NVME 250GB - LUKS
 HOME DIR DISK: Samsung EVO 850 SATA 500GB - LUKS
 HDDS: 4 X 8TB Seagate Ironwolf NAS HDDs 7200 RPM (Only attached to power when needed for Backup from my FreeNAS. LVM RAID5)
 DISPLAY: LG 4K HDR WebOS TV @ HDMI 2.0
 INPUT: USB-wired Mouse + Keyboard

Comment 6 Erik Skultety 2019-02-18 12:25:19 UTC
(In reply to Laurent Bigonville from comment #0)
> Hi,
> 
> When starting a domain with virgl support enabled, qemu complains that
> /var/lib/libvirt/.cache doesn't exist. (On Debian, /var/lib/libvirt/ is the
> home directory of the libvirt user.)
> 
> AFAICS, mesa is using this directory to cache the generated shaders.
> 
> Mesa tries this location because it is the XDG location for the shader cache
> files.
> 
> IMVHO, this should be moved to /var/cache/libvirt/ when running a

I think we could override XDG_CACHE_HOME to point to /var/lib/libvirt/qemu/<domain_name>/.cache (possibly overriding more of the XDG envs).

Comment 7 Erik Skultety 2019-02-18 13:16:51 UTC
(In reply to Laurent Bigonville from comment #3)
> Could be related to mesa version? Debian unstable has 18.2.6

I'm running Fedora 29, and with both Mesa 18.2.8 and 18.3.3 I can launch a VM with a virtio GPU. However, I don't see the "mesa_shader_cache" directory being created anywhere, which I suppose means that the on-disk cache is disabled, either by default on my distro or because something explicitly defines MESA_GLSL_CACHE_DISABLE, although I can't find it defined anywhere on my system. Gerd, do you have any idea what I'm missing here?
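
One way to check which of these variables a running QEMU process actually inherited is to read its environment from /proc. A minimal Python sketch, assuming a Linux host and enough privileges to read the target process. The helper names are hypothetical, and the variable prefixes are simply the ones mentioned in this bug:

```python
def relevant_env(environ_bytes):
    """Parse /proc/<pid>/environ content and keep cache-related variables."""
    result = {}
    # Entries in /proc/<pid>/environ are separated by NUL bytes.
    for entry in environ_bytes.split(b"\0"):
        if not entry or b"=" not in entry:
            continue
        key, _, value = entry.partition(b"=")
        name = key.decode(errors="replace")
        if name == "HOME" or name.startswith(("XDG_", "MESA_")):
            result[name] = value.decode(errors="replace")
    return result


def read_process_env(pid):
    """Read and filter the environment of a running process (Linux only)."""
    # Reading another user's process usually requires root.
    with open(f"/proc/{pid}/environ", "rb") as handle:
        return relevant_env(handle.read())
```

Calling `read_process_env(pid)` with the PID of the qemu process would show whether HOME, XDG_CACHE_HOME or any MESA_* variable is set for that guest.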

Comment 8 Gerd Hoffmann 2019-02-19 12:52:33 UTC
(In reply to Erik Skultety from comment #7)
> (In reply to Laurent Bigonville from comment #3)
> > Could be related to mesa version? Debian unstable has 18.2.6
> 
> I'm running fedora 29 and with both mesa 18.2.8 and 18.3.3 I can launch a VM
> with a virtio GPU. However, I don't see the "mesa_shader_cache" directory
> being created anywhere which I suppose means that the on-disk cache is
> disabled, either by default on my distro or something explicitly defines
> MESA_GLSL_CACHE_DISABLE which I don't see anywhere to be defining on my
> system, Gerd do you have any idea what I'm missing here?

No clue, never seen that. Possibly only some Mesa drivers use this cache, so it could depend on the host GPU hardware whether you run into this or not.

Intel graphics here.
Comment 9 says radeon (so ati/amd).

Comment 9 gobbledegeek 2019-02-21 06:23:31 UTC
This problem suddenly popped-up for me today on Fedora 29 4.20.8-200.fc29.x86_64 after months of trouble free usage. Ryzen 2400G.

G

Comment 10 Erik Skultety 2019-02-21 07:20:04 UTC
(In reply to gobbledegeek from comment #9)
> This problem suddenly popped-up for me today on Fedora 29
> 4.20.8-200.fc29.x86_64 after months of trouble free usage. Ryzen 2400G.
> 
> G

Thanks for the input. However, I don't have access to any HW with AMD/ATI graphics at the moment, so I need to look around our machine reservation system to reproduce this.

Comment 11 gobbledegeek 2019-02-23 04:11:21 UTC
My bad: looking closer, the errors that popped up for me are a little different:

Error starting domain: internal error: qemu unexpectedly closed the monitor: Failed to create //.cache/mesa_shader_cache for shader cache (Permission denied)---disabling.

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 75, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 111, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/libvirtobject.py", line 66, in newfn
    ret = fn(self, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1420, in startup
    self._backend.create()
  File "/usr/lib64/python3.7/site-packages/libvirt.py", line 1080, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirt.libvirtError: internal error: qemu unexpectedly closed the monitor: Failed to create //.cache/mesa_shader_cache for shader cache (Permission denied)---disabling.

Does this need a separate CR?

G

Comment 12 Dr. David Alan Gilbert 2019-02-25 10:02:31 UTC
I've just triggered this on my home machine: 'Failed to create //.cache for shader cache (Permission denied)--disabling.'
This is f29 with a Radeon RX550 using the default/open/radv mesa drivers.

Comment 13 Daniel Berrangé 2019-02-25 10:33:55 UTC
(In reply to Erik Skultety from comment #6)
> I think we could override XDG_CACHE_HOME to point to
> /var/lib/libvirt/qemu/<domain_name>/.cache (possibly overriding more of the
> XDG envs).

Yes, we should definitely set XDG_CACHE_HOME, even for the unprivileged libvirtd, since we need to ensure that each guest gets its own distinct cache directory. We can't have a single shared cache, as SELinux will block that.
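
The per-guest layout can be sketched in Python as follows. This is only an illustration of the idea, using the path proposed in comment 6. The function name and the name validation are assumptions, and libvirt itself is written in C:

```python
import os


def per_domain_cache_env(domain_name, base="/var/lib/libvirt/qemu"):
    """Build an XDG_CACHE_HOME value that is unique to one guest."""
    # Reject names that would escape the per-domain directory.
    if not domain_name or "/" in domain_name or domain_name in (".", ".."):
        raise ValueError(f"unsafe domain name: {domain_name!r}")
    return {"XDG_CACHE_HOME": os.path.join(base, domain_name, ".cache")}


if __name__ == "__main__":
    print(per_domain_cache_env("fedora29"))
    print(per_domain_cache_env("debian-sid"))
```

Because each guest gets its own directory, each one can carry its own SELinux label, which a single shared cache could not.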

Comment 14 John Galt 2019-03-03 09:59:57 UTC
KVM/QEMU/Virtual Machine Manager:

Error starting domain: internal error: process exited while connecting to monitor: Failed to create //.cache for shader cache (Permission denied)---disabling.

This Fedora 29 VM was working properly on a Fedora 29 host until I ran dnf update on 27 Feb (which updated the host). The VM still runs with virtio video, but I must disable OpenGL in the Spice Server settings, which reduces performance significantly.

I'm a noob, but I can follow instructions to retrieve logs, etc.

My system details can be found here, if that is of any help:
https://linux-hardware.org/?probe=e6dca4d6fd

Sincere Thanks, 

Adam

Comment 15 Erik Skultety 2019-03-04 13:59:10 UTC
I don't have AMD HW personally, so I'm trying to find some to test reliably. I can see the error with Intel too, except that with Intel, Mesa disables the cache if it fails to create it, so the VM still starts. Anyhow, I prepared a simple patch for libvirt, but it then crashes in Mesa's Intel driver (i965_dri), so I can't verify whether the fix will suffice. If anyone can give my patch a try and verify that it works with AMD HW, I can proceed with proposing it upstream and file a bug against Mesa for the Intel driver.

See my GitHub branch for the patch: https://github.com/eskultety/libvirt/commits/xdg-vars

Comment 16 Dr. David Alan Gilbert 2019-03-04 15:34:10 UTC
Tested on my home box; with that libvirt it gets further and falls over on Mesa trying to do a pthread_setaffinity (which I saw a different report of).
If I turn off sandboxing in qemu.conf (which is of course a bad thing), the VM starts; I have not tried a modern guest yet.

Comment 17 Dr. David Alan Gilbert 2019-03-04 16:07:28 UTC
There are some other bugs somewhere as well; with an f29 boot iso:
  a) The bios boot just shows as mush
  b) and the main OS still shows mush.

Comment 18 Erik Skultety 2019-03-06 15:49:53 UTC
Proposed the libvirt changes:
https://www.redhat.com/archives/libvir-list/2019-March/msg00323.html

Comment 19 Erik Skultety 2019-03-15 15:51:33 UTC
The patches are now upstream:

commit 2d69af29073aa3dc2dc5b79afcecaa703b81125a
Refs: v5.1.0-248-g2d69af2907
Author:     Erik Skultety <eskultet@redhat.com>
AuthorDate: Mon Mar 4 12:47:08 2019 +0100
Commit:     Erik Skultety <eskultet@redhat.com>
CommitDate: Fri Mar 15 16:41:26 2019 +0100

    util: command: Introduce virCommandAddEnvXDG helper

    Some modules/libraries within QEMU could make use of the XDG_ vars when
    writing their data to the disk. Define the most common XDG variables
    and point them to the specific driver's libDir, i.e.

    XDG_CACHE_HOME -> /var/lib/libvirt/<driver>/.cache
    XDG_DATA_HOME -> /var/lib/libvirt/<driver>/.local/share
    XDG_CONFIG_HOME -> /var/lib/libvirt/<driver>/.config

    Signed-off-by: Erik Skultety <eskultet@redhat.com>
    Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


commit 7e73137495334c9995543e87897237e8a88f6c9b
Refs: v5.1.0-249-g7e73137495
Author:     Erik Skultety <eskultet@redhat.com>
AuthorDate: Fri Mar 8 12:15:07 2019 +0100
Commit:     Erik Skultety <eskultet@redhat.com>
CommitDate: Fri Mar 15 16:41:26 2019 +0100

    qemu: command: Enforce setting XDG variables for system QEMU

    For session mode, only XDG_CACHE_HOME is set, because we want to remain
    integrating with services in user session, but for system mode, this
    would have become reading/writing to '/' which carries the obvious issue
    with permissions (also, '/' is the wrong location in 99.9% cases anyway).

    Signed-off-by: Erik Skultety <eskultet@redhat.com>
    Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

commit 75a916988100b8ecdd72b4715ff3b20fa6e99f53
Refs: v5.1.0-250-g75a9169881
Author:     Erik Skultety <eskultet@redhat.com>
AuthorDate: Wed Mar 6 13:29:01 2019 +0100
Commit:     Erik Skultety <eskultet@redhat.com>
CommitDate: Fri Mar 15 16:41:26 2019 +0100

    qemu: command: Override HOME variable for system QEMU

    By default, qemu user's home dir points to '/' which shouldn't be used
    at all. We therefore pass the HOME variable from the current variable
    iff not running as SUID, which means that for systemd we never set it.
    This patch makes sure, that for system QEMU this is always set to
    libDir/<driver>, session mode is left untouched.

    Signed-off-by: Erik Skultety <eskultet@redhat.com>
    Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

Beware, seccomp-sandboxing still needs to be turned off as mentioned in comment 16 in order for this to work properly.
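
The effect of the three commits on the QEMU process environment can be summarised with a small Python sketch. It only mirrors what the commit messages above describe and is not the actual C implementation. The function name is hypothetical, and `lib_dir` stands for the driver's libDir (for example /var/lib/libvirt/qemu), which the caller is assumed to supply:

```python
import os


def qemu_env_overrides(lib_dir, privileged):
    """Environment overrides as described in the three commit messages."""
    # XDG_CACHE_HOME is set in both session and system mode.
    env = {"XDG_CACHE_HOME": os.path.join(lib_dir, ".cache")}
    if privileged:
        # System mode: also redirect data, config and HOME away from "/".
        env["XDG_DATA_HOME"] = os.path.join(lib_dir, ".local", "share")
        env["XDG_CONFIG_HOME"] = os.path.join(lib_dir, ".config")
        env["HOME"] = lib_dir
    return env


if __name__ == "__main__":
    for name, value in sorted(qemu_env_overrides("/var/lib/libvirt/qemu", True).items()):
        print(f"{name}={value}")
```

In session mode the sketch leaves HOME and the other XDG variables alone, matching the stated goal of staying integrated with services in the user session.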

Comment 20 gobbledegeek 2019-03-27 09:00:37 UTC
Erik,

>>Beware, seccomp-sandboxing still needs to be turned off as mentioned in comment 16 in order for this to work properly.

Thanks for your work on this, but IMHO we do not have a complete resolution yet if security has to be compromised in order for libvirtd to run.
Is there a case for opening a different CR to remove the need to disable qemu sandboxing?


Thanks
G

Comment 21 Daniel Berrangé 2019-03-27 09:14:57 UTC
There is a patch pending in QEMU to resolve the seccomp problem by returning EPERM from the syscall instead of killing QEMU:

https://lists.gnu.org/archive/html/qemu-devel/2019-03/msg04413.html
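
Why an EPERM return is survivable while a kill is not can be shown with a short Python sketch. It assumes a caller that treats a denied affinity request as optional. The function name is hypothetical, and this is not how Mesa or QEMU implement it:

```python
import errno


def try_pin_thread(set_affinity, cpus):
    """Call set_affinity(cpus); treat EPERM as 'not allowed, carry on'."""
    try:
        set_affinity(cpus)
        return True
    except OSError as exc:
        if exc.errno == errno.EPERM:
            # Denied by the sandbox: continue without pinning.
            return False
        # Any other failure is unexpected and is passed on to the caller.
        raise
```

On Linux this could be called as `try_pin_thread(lambda cpus: os.sched_setaffinity(0, cpus), {0})` after `import os`. A filter that kills the process on the same syscall gives the caller no chance to take the fallback branch.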

Comment 22 Erik Skultety 2019-03-27 09:19:22 UTC
(In reply to gobbledegeek from comment #20)
> Erik,
> 
> >>Beware, seccomp-sandboxing still needs to be turned off as mentioned in comment 16 in order for this to work properly.
> 
> Thanks for your work on this. but IMHO we do not have a complete resolution
> to this as yet - if security has to be compromised in order for libvirtd to
> run.
> Is there a case for a different CR to be opened in order to plug the need to
> disable qemu sandboxing?

Not sure I understand your thoughts. From libvirt's POV, I don't think we could do more here. One thing I forgot to do above was to link the Ubuntu issue filed for QEMU mentioned by Dave:
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1815889

Apparently, Mesa reverted setting the thread affinity which means the qemu process should not get killed anymore, so you might want to give that Mesa build a try with seccomp enabled in the qemu config and see whether it works for you.

