Created attachment 1453897 [details]
xml dump

Description of problem:

This is on Arch Linux so the bug could well be in Arch configuration. Would appreciate any pointers on how to debug.

. virgl works with plain qemu (-vga virtio -display gtk,gl=on)
. virgl works with spice / remote-viewer
  (-device virtio-vga,virgl=on \
   -spice gl=on,unix,addr=/run/user/1000/spice.sock,disable-ticketing)

However, when using virt-manager / virsh, the VM does not start and I am instead left with an apparently crashed qemu process:

# ps aux | grep qemu
nobody 7675 0.0 0.0 0 0 ? Zl 15:35 0:00 [qemu-system-x86] <defunct>

(Arch build uses --with-qemu-user=nobody --with-qemu-group=kvm)

Latest released versions of everything:

libvirt-4.4.0
qemu-2.12.0
virglrenderer-0.6.0
mesa-18.1.2
linux-4.17.2

xml dump and log file attached.

Thanks
Created attachment 1453898 [details] log file
PS. The permission denied thing at the end of log file is a red herring. I had previously added env var MESA_GLSL_CACHE_DISABLE=true to the xml which got rid of the error but didn't solve the problem.
Created attachment 1454087 [details]
debug log file with 1:qemu

I added some logging to libvirtd (log_filters="1:qemu") and it seems the process chokes when libvirtd attaches to the QEMU monitor and executes "qmp_capabilities".

New debug log file attached.
Also happens with qemu:///session

To be clear, this is the trigger:

--- Original XML
+++ New XML
@@ -84,7 +84,7 @@
     <graphics type="spice">
       <listen type="none"/>
       <image compression="off"/>
-      <gl enable="no"/>
+      <gl enable="yes"/>
     </graphics>
     <video>
       <model type="virtio" heads="1" primary="yes">

Tried the following to no avail:
- an earlier version of qemu
- <listen type='socket'/> instead of <listen type='none'/>

Outside of libvirt, I also manually set up spice GL with a QMP monitor port, connected to it via telnet and executed "qmp_capabilities", which worked fine.

The closest related thing I can find is here:
https://forums.gentoo.org/viewtopic-t-954294-start-0.html
which seems to indicate that libvirt's use of "-daemonize" can lead to qemu zombies.

My hardware is Ryzen 7 2700X + Radeon RX 550 + NVMe storage, i.e. quick. I'm thinking I might be hitting a timing-related bug in libvirt.

Would be grateful for any input on where to go from here.

(re-titling bug report to be more accurate)
Found the culprit! It's the -sandbox arg to qemu, in particular the "resourcecontrol" param. Changing it as follows allows the qemu binary to run:

-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=allow

seccomp related apparently. Any thoughts?
Thanks for such a thorough examination. This is an interesting issue that I'd like to investigate further. I'll start looking at it, but I probably won't get to it until next week (if you happen to fix it by then, perfect; patches are always welcome). This might affect downstream too, so we may eventually want to move this BZ to a related product in order to get more attention and priority.
Commenting out this line in qemu-seccomp.c works around it:

    /* { SCMP_SYS(sched_setscheduler), QEMU_SECCOMP_SET_RESOURCECTL }, */

I suppose we should allow changing the scheduling to a lower priority.

A second issue is finding out why libvirt doesn't receive a HUP on the monitor socket when qemu dies with the seccomp rule...
Simple reproducer:

qemu-system-x86_64 -sandbox on,resourcecontrol=deny -spice gl=on
Here is a simple reproducer for the hang (SIGSYS not received). Link with -lseccomp -lpthread.

    #include <unistd.h>
    #include <seccomp.h>
    #include <pthread.h>
    #include <sched.h>

    /* Second thread that stays alive after the main thread is killed. */
    static void *thread_fn(void *args)
    {
        while (1)
            sleep(1);
    }

    int main(void)
    {
        {
            pthread_attr_t attr;
            pthread_t tid;

            pthread_attr_init(&attr);
            pthread_create(&tid, &attr, thread_fn, 0);
        }

        {
            /* Allow everything except sched_getscheduler, which kills the caller. */
            scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_ALLOW);
            seccomp_rule_add(ctx, SCMP_ACT_KILL, SCMP_SYS(sched_getscheduler), 0);
            seccomp_load(ctx);
        }

        /* Only the calling thread is killed; the process lingers. */
        sched_getscheduler(0);
        return 0;
    }
We need SECCOMP_RET_KILL_PROCESS so that the whole process is killed rather than only the offending thread; see https://github.com/seccomp/libseccomp/issues/96
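To illustrate the difference, here is a minimal sketch that compares the two kill actions on a two-thread process. It is not the QEMU fix itself. It installs a raw BPF filter through prctl() rather than libseccomp so that it needs no extra library, and it assumes Linux 4.14 or later for SECCOMP_RET_KILL_PROCESS. The names helper_thread, install_filter, run_with_action and HELPER_EXIT_CODE are made up for this example.

```c
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fallbacks for kernel headers older than Linux 4.14. */
#ifndef SECCOMP_RET_KILL_PROCESS
#define SECCOMP_RET_KILL_PROCESS 0x80000000U
#endif
#ifndef SECCOMP_RET_KILL_THREAD
#define SECCOMP_RET_KILL_THREAD 0x00000000U
#endif

/* Hypothetical marker: exit code used by the surviving helper thread. */
#define HELPER_EXIT_CODE 42

/* Unfiltered thread: if the process is still alive after the main thread
 * hit the filter, it ends the process with HELPER_EXIT_CODE. */
static void *helper_thread(void *arg)
{
    (void)arg;
    usleep(200 * 1000);
    _exit(HELPER_EXIT_CODE);
}

/* Install a filter on the calling thread that applies `action` to
 * sched_getscheduler and allows everything else. The architecture check
 * a production filter should have is omitted for brevity. */
static int install_filter(uint32_t action)
{
    struct sock_filter insns[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_sched_getscheduler, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, action),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(insns) / sizeof(insns[0])),
        .filter = insns,
    };

    /* Required to install a filter without CAP_SYS_ADMIN. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Fork a child that starts a helper thread, installs the filter and calls
 * the filtered syscall. Returns the child's wait status, or -1 on error. */
static int run_with_action(uint32_t action)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        pthread_t tid;
        if (pthread_create(&tid, NULL, helper_thread, NULL) != 0)
            _exit(1);
        if (install_filter(action) != 0)
            _exit(2);
        syscall(__NR_sched_getscheduler, 0); /* triggers the filter */
        _exit(3); /* reached only if the filter did not kill us */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}
```

With SECCOMP_RET_KILL_THREAD, only the main thread should die and the helper thread keeps the process (and its open file descriptors, such as a monitor socket) alive until it exits on its own, so the parent sees a normal exit with HELPER_EXIT_CODE. With SECCOMP_RET_KILL_PROCESS, the parent should instead see the child terminated by SIGSYS right away. In the reproducer above the second thread never exits, which matches the lingering "Zl" process from the original report.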
The seccomp problem was fixed upstream (commits 6f2231e9b0931e1998d9ed0c509adf7aedc02db2 and bda08a5764d470f101fa38635d30b41179a313e1) and backported to various RHEL versions. No change identified in libvirt. Closing.