Bug 1692323 - qemu crashes with virgl enabled on some GPUs
Summary: qemu crashes with virgl enabled on some GPUs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: qemu
Version: 30
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Fedora Virtualization Maintainers
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: AcceptedBlocker
Depends On:
Blocks: F30BetaBlocker
Reported: 2019-03-25 11:05 UTC by František Zatloukal
Modified: 2019-03-28 17:41 UTC (History)

Fixed In Version: qemu-3.1.0-6.fc30
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-03-28 17:41:14 UTC
Type: Bug
Embargoed:


Attachments
qemu bt (10.43 KB, text/plain)
2019-03-25 14:24 UTC, František Zatloukal
qemu bt - all threads (4.61 KB, text/plain)
2019-03-25 14:38 UTC, František Zatloukal

Description František Zatloukal 2019-03-25 11:05:29 UTC
Description of problem:
QEMU crashes when launching a VM with 3D acceleration enabled on some GPUs.

Version-Release number of selected component (if applicable):
qemu-3.1.0-5.fc30.x86_64

How reproducible:
Always (on affected hardware)

Steps to Reproduce:
1. Create a new F29 Live VM in F30 GNOME Boxes or virt-manager with virtio 3d/virgl
2. Wait and see the VM crash

Actual results:
QEMU might fail to run the VM with virgl on some GPUs, with the following in the journal:

Mar 25 11:10:36 fanys-laptop systemd-coredump[17145]: Process 17104 (qemu-system-x86) of user 1000 dumped core.


Expected results:
QEMU shouldn't fail to run the VM with virgl enabled.

Additional info:
The same can be achieved in virt-manager by switching Video to virtio 3d and enabling OpenGL support on the Display tab.

GNOME Boxes doesn't show the tip to disable 3D acceleration if the VM crashes this way right away.

This seems to be hardware specific. It works just fine on:
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09)

but is broken on:
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 620 (rev 02)

Comment 1 Fedora Blocker Bugs Application 2019-03-25 11:47:12 UTC
Proposed as a Blocker for 30-final by Fedora user frantisekz using the blocker tracking app because:

This breaks the "The release must be able host virtual guest instances of the same release." criterion, as GNOME Boxes enables virgl by default since F30.

As it seems to be HW specific, I am proposing this as a Final Blocker instead of Beta Blocker (which it would have been if it wasn't HW specific.)

Comment 2 Daniel Berrangé 2019-03-25 11:53:34 UTC
This will need more info beyond "it crashed", especially since it's hardware-specific. To start with, please acquire a stack trace of all threads in QEMU, with all relevant -debuginfo RPMs present.

Comment 3 František Zatloukal 2019-03-25 14:24:55 UTC
Created attachment 1547721 [details]
qemu bt

Comment 4 František Zatloukal 2019-03-25 14:26:17 UTC
If I change the virgl and gl qemu options to off, it doesn't crash. With them enabled, it crashes with:


Thread 3 "qemu-system-x86" received signal SIGSYS, Bad system call.
[Switching to Thread 0x7fffeec09700 (LWP 7321)]
0x00007ffff6e141a3 in __pthread_setaffinity_new (th=<optimized out>, cpusetsize=128, cpuset=0x7fffeec086c0) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34
34	  res = INTERNAL_SYSCALL (sched_setaffinity, err, 3, pd->tid, cpusetsize

Comment 5 Daniel Berrangé 2019-03-25 14:27:59 UTC
That log file doesn't contain any stack trace, just one single stack frame from one thread.

Please capture a full stack trace, e.g. "thread apply all backtrace" in the GDB prompt.
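
For anyone collecting the same data non-interactively, here is a minimal sketch that builds a gdb invocation running the command quoted above in batch mode. The helper name `build_gdb_command` and both file paths are hypothetical placeholders, not taken from this report; substitute the real binary and core file.

```python
import shlex


def build_gdb_command(executable, core_file):
    """Return an argv list that prints a backtrace of every thread."""
    return [
        "gdb",
        "--batch",  # run the given commands, then exit
        "-ex", "thread apply all backtrace",
        executable,
        core_file,
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cmd = build_gdb_command("/usr/bin/qemu-system-x86_64", "core.qemu")
    # Print a shell-safe command line that can be copied into a terminal.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The printed line can be pasted into a shell; the backtrace is only useful once the matching -debuginfo packages are installed.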

Comment 6 František Zatloukal 2019-03-25 14:38:30 UTC
Created attachment 1547724 [details]
qemu bt - all threads

Comment 7 Daniel Berrangé 2019-03-25 14:44:48 UTC
This looks like it is caused by Mesa trying to set CPU affinity which is blocked by QEMU's seccomp filters, probably fixed upstream by

https://lists.gnu.org/archive/html/qemu-devel/2019-03/msg06006.html
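
To illustrate the mechanism described above, here is a rough sketch, not QEMU or Mesa code. It runs a child process that re-applies its own CPU affinity mask through `os.sched_setaffinity` (Python's wrapper for the same syscall, Linux only) and reports whether the child was killed by SIGSYS, the signal seen in the gdb output in comment 4. The names `PROBE`, `describe_exit` and `run_probe` are made up for this example.

```python
import os
import signal
import subprocess
import sys

# Child program: set the CPU affinity to the mask the process already has.
# This is harmless, but it still issues the sched_setaffinity syscall.
PROBE = "import os; os.sched_setaffinity(0, os.sched_getaffinity(0))"


def describe_exit(returncode):
    """Turn a subprocess return code into a short verdict string."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # subprocess reports death by signal as the negated signal number.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "signal %d" % -returncode
        if -returncode == signal.SIGSYS:
            return name + " (bad system call, e.g. denied by a seccomp filter)"
        return name
    return "exit status %d" % returncode


def run_probe():
    """Run the affinity probe in a child process and describe how it ended."""
    result = subprocess.run([sys.executable, "-c", PROBE])
    return describe_exit(result.returncode)


if __name__ == "__main__":
    # On an unrestricted host the child is expected to exit normally; under a
    # filter that kills on sched_setaffinity it would die with SIGSYS instead.
    print(run_probe())
```

Running the probe in a child process keeps the parent alive to report the result even when the filter's action is to kill the offending process.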

Comment 8 František Zatloukal 2019-03-25 16:22:53 UTC
So, I tried to do a test build with the patch, but it doesn't compile when applied against either f30 or f31, so I assume it should be used against master. I can try that later, thanks for pointing to the patch Daniel!

/usr/bin/ld: ../qemu-seccomp.o: in function `seccomp_start':
/builddir/build/BUILD/qemu-4.0.0-rc0/qemu-seccomp.c:181: undefined reference to `qemu_seccomp_get_kill_action'
collect2: error: ld returned 1 exit status

Comment 9 Daniel Berrangé 2019-03-25 16:27:18 UTC
Sigh, the maintainer broke my patch. Try the original one: https://lists.gnu.org/archive/html/qemu-devel/2019-03/msg04413.html

Comment 10 František Zatloukal 2019-03-25 17:43:36 UTC
Okay, with that patch applied, the issue is fixed (tried against f30 branch in dist-git). Thanks!

Comment 11 František Zatloukal 2019-03-25 17:56:35 UTC
I've fired off a scratch build with the fix if anyone else is interested: https://koji.fedoraproject.org/koji/taskinfo?taskID=33766636

Daniel, should I create a PR for the qemu package, or would you rather handle it yourself?

Comment 12 Geoffrey Marr 2019-03-25 18:49:25 UTC
Discussed during the 2019-03-25 blocker review meeting: [1]

The decision to classify this bug as an "AcceptedBlocker" (Beta) was made as it violates the following criterion:

"The release must be able host virtual guest instances of the same release", given that it affects the default config of the default virt app on the default desktop.

[1] https://meetbot.fedoraproject.org/fedora-blocker-review/2019-03-25/f30-blocker-review.2019-03-25-16.01.txt

Comment 13 Adam Williamson 2019-03-25 20:15:39 UTC
As this was accepted as a blocker, and none of the qemu maintainers seemed to be around, I'm doing the fix for this. The build is currently running:

https://koji.fedoraproject.org/koji/taskinfo?taskID=33767403

and I will edit the existing pending qemu update (which includes a bunch of CVE fixes, so probably good things to pull in) to include it when it's done, and we will fire a new compose.

Comment 14 Fedora Update System 2019-03-25 23:24:37 UTC
qemu-3.1.0-6.fc30 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-0664c7724d

Comment 15 Fedora Update System 2019-03-25 23:24:41 UTC
qemu-3.1.0-6.fc30 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-0664c7724d

Comment 16 Kevin Fenzi 2019-03-26 00:55:06 UTC
Can those testing this confirm it's working ok when they have libseccomp-2.3.3-5.fc30 installed (what's stable now), and not libseccomp-2.4.0-0.fc30 (updates-testing)?

Our f29 builders were unable to launch f30/f31 guests with the older libseccomp.

Comment 17 Julen Landa Alustiza 2019-03-26 07:15:16 UTC
Tried the Beta 1.7 live ISO with Boxes on a Ryzen 5 1600X + NVIDIA 1060 and can't reproduce it in normal usage of gnome-boxes.

Comment 18 František Zatloukal 2019-03-26 08:33:16 UTC
(In reply to Kevin Fenzi from comment #16)
> Can those testing this confirm it's working ok when they have
> libseccomp-2.3.3-5.fc30 installed (what's stable now), and not
> libseccomp-2.4.0-0.fc30 (updates-testing)?
> 
> Our f29 builders were unable to launch f30/f31 guests with the older
> libseccomp.

It works with both libseccomp-2.3.3-5.fc30.x86_64 and libseccomp-2.4.0-0.fc30.

Comment 19 Fedora Update System 2019-03-27 00:44:57 UTC
qemu-3.1.0-6.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-0664c7724d

Comment 20 Adam Williamson 2019-03-27 14:33:10 UTC
Frantisek: can you confirm that this works OK with the qemu update in Beta-1.8 (just to make sure I didn't screw up the patch or anything)? Thanks!

Comment 21 František Zatloukal 2019-03-27 15:20:44 UTC
Yeah, it's working just fine!

Comment 22 Fedora Update System 2019-03-28 17:41:14 UTC
qemu-3.1.0-6.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.

