Bug 1674324
Summary: | With <graphics type='spice'><gl enable='on'/>, qemu either refuses to start completely or spice-server crashes afterwards | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | David Jaša <djasa> |
Component: | qemu-kvm | Assignee: | Marc-Andre Lureau <marcandre.lureau> |
qemu-kvm sub component: | General | QA Contact: | Guo, Zhiyi <zhguo> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | unspecified | ||
Priority: | unspecified | CC: | areis, cfergeau, chayang, coli, ddepaula, jinzhao, juzhang, knoel, marcandre.lureau, mtessun, rbalakri, tpelka, virt-maint, zhguo |
Version: | 8.0 | Flags: | zhguo: needinfo- |
Target Milestone: | rc | ||
Target Release: | 8.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | qemu-kvm-2.12.0-95.module+el8.2.0+5354+b7ebf7be | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-04-28 15:32:11 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
David Jaša
2019-02-11 00:04:59 UTC
(In reply to David Jaša from comment #0)
> Additional info:
> F29 versions don't suffer any of these
> el7 with RHV qemu behaves the same

"behaves the same" as f29 or as el8? In my testing, el7 is working, so I assume you meant "behaves the same as f29 (working)"?

virgl is disabled in qemu-kvm-2.12.0-60.module+el8+2725+0ab65287.x86_64. Without virgl (i.e. no <acceleration accel3d='yes'/> support), virtio + spice + gl='on' is not really interesting, so in my opinion this is not an 8.0.0 blocker.

This seems to be a bug in qemu. The backtrace of the crash is below (don't trust the line numbers too much; this was tested with a local build):

(gdb) bt
#0 0x00007fea1e59a53f in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007fea1e584895 in __GI_abort () at abort.c:79
#2 0x00007fea1eb56d62 in spice_logv (log_domain=0x7fea1ec031d2 "Spice", log_level=G_LOG_LEVEL_CRITICAL, strloc=0x7fea1ebf4884 "../server/red-qxl.c:790", function=0x7fea1ebf4b00 <__func__.48991> "spice_qxl_gl_scanout", format=0x7fea1ebf4822 "condition `%s' failed", args=0x7fff10882628) at ../subprojects/spice-common/common/log.c:187
#3 0x00007fea1eb56e20 in spice_log (log_level=G_LOG_LEVEL_CRITICAL, strloc=0x7fea1ebf4884 "../server/red-qxl.c:790", function=0x7fea1ebf4b00 <__func__.48991> "spice_qxl_gl_scanout", format=0x7fea1ebf4822 "condition `%s' failed") at ../subprojects/spice-common/common/log.c:200
#4 0x00007fea1eb0ca4f in spice_qxl_gl_scanout (qxl=0x561a439981d8, fd=137, width=1024, height=768, stride=4096, format=875708993, y_0_top=0) at ../server/red-qxl.c:790
#5 0x0000561a3fe19bee in spice_gl_switch (dcl=0x561a43998198, new_surface=<optimized out>) at /home/teuf/redhat/qemu/include/ui/console.h:332
#6 0x0000561a3fe120aa in dpy_gfx_replace_surface (con=0x561a42d2c800, surface=0x561a43dfa2e0) at ui/console.c:1585
#7 0x0000561a3fb80e5f in virtio_gpu_set_scanout (cmd=0x561a4308c260, g=0x561a43c8d690) at /home/teuf/redhat/qemu/hw/display/virtio-gpu.c:677
#8 0x0000561a3fb80e5f in virtio_gpu_simple_process_cmd (cmd=0x561a4308c260, g=0x561a43c8d690) at /home/teuf/redhat/qemu/hw/display/virtio-gpu.c:855
#9 0x0000561a3fb80e5f in virtio_gpu_process_cmdq (g=<optimized out>) at /home/teuf/redhat/qemu/hw/display/virtio-gpu.c:893
#10 0x0000561a3ff498ce in aio_bh_call (bh=0x561a43df90a0) at util/async.c:118
#11 0x0000561a3ff498ce in aio_bh_poll (ctx=ctx@entry=0x561a427b6960) at util/async.c:118
#12 0x0000561a3ff4ce80 in aio_dispatch (ctx=0x561a427b6960) at util/aio-posix.c:460
#13 0x0000561a3ff497ae in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:261
#14 0x00007fea2130406d in g_main_dispatch (context=0x561a427b6d20) at gmain.c:3182
#15 0x00007fea2130406d in g_main_context_dispatch (context=context@entry=0x561a427b6d20) at gmain.c:3847
#16 0x0000561a3ff4c098 in glib_pollfds_poll () at util/main-loop.c:215
#17 0x0000561a3ff4c098 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:238
#18 0x0000561a3ff4c098 in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:514
#19 0x0000561a3fc55e29 in main_loop () at vl.c:1923
#20 0x0000561a3fad65c1 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4584

I've reproduced this bug with git master, configured with:

./configure '--extra-ldflags=-Wl,--build-id -Wl,-z,relro -Wl,-z,now' '--extra-cflags=-O2 -g -pipe -fexceptions -fstack-protector-strong -Wno-error' --enable-spice --enable-opengl --target-list=x86_64-softmmu --enable-kvm --disable-virglrenderer --disable-xen --enable-debug --disable-strip

Then I boot a Fedora 29 live CD with the config djasa described in the initial comment, connect to it with virt-viewer -a, click on the 'Activities' button in the upper right corner, and click on the Firefox icon. More often than not, this triggers the assertion failure and crashes QEMU.
This happens because we receive a spice_qxl_gl_scanout call in the middle of a spice_qxl_gl_draw_async call (before red_qxl_gl_draw_async_complete/QXLInterface::async_complete gets called back), which is not valid. qemu_spice_gl_block/qemu_spice_gl_unblock seem to be meant to avoid this kind of situation; however, the virtio-gpu implementation is:

const GraphicHwOps virtio_gpu_ops = {
#ifdef CONFIG_VIRGL
    .gl_block = virtio_gpu_gl_block,
#endif

and virgl is disabled in this build, so blocking while the draw command is in flight is not functional.

Thanks Christophe for the analysis, are you working on a patch? I think I have done a related fix in QEMU in the past; I would have to dig in the archives though.

Marc-André, I was not planning to, then dug into the history and found this commit, which seems related: https://git.qemu.org/?p=qemu.git;a=commit;h=c19f4fbce1c2293b7a9bddadddd7a1b69953f534

Now, reverting that patch and doing something similar to https://git.qemu.org/?p=qemu.git;a=blob;f=hw/display/virtio-gpu-3d.c#l406 in virtio-gpu.c is awfully tempting... In short, I'll experiment with that and see if it helps.

I found this pending patch: https://github.com/elmarco/qemu/commit/22c94823d741dca97d912f5d737561da12538f75

Looks very similar to what I came up with; cmd->waiting can be removed after this change. I'll test a bit more, but this was fixing the crash for me.

Ah, and you also need:

diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 74f203c727..c1b46e0686 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1054,9 +1054,7 @@ const GraphicHwOps virtio_gpu_ops = {
     .gfx_update = virtio_gpu_update_display,
     .text_update = virtio_gpu_text_update,
     .ui_info = virtio_gpu_ui_info,
-#ifdef CONFIG_VIRGL
     .gl_block = virtio_gpu_gl_block,
-#endif
 };
 
 static const VMStateDescription vmstate_virtio_gpu_scanout = {

djasa, this scratch build should have Marc-André's patch + the change from comment #10 if you want to give it a try.
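The race analysed above can be illustrated with a small, self-contained C model. It is a sketch under stated assumptions, not QEMU or spice-server code: every type and function in it (struct display, hw_gl_block, gl_scanout, process_cmdq, device_gl_block, gl_draw_async, async_complete, run_scenario and the two ops tables) is a hypothetical stand-in. It assumes only what the analysis states: a scanout is invalid while a GL draw is in flight, the display side asks the device to block until the draw completes, and that request does nothing when the device's gl_block hook is compiled out. It additionally assumes that the caller skips a NULL hook and that the device resumes its command queue once the block count drops to zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of the race analysed above.
 * None of these names are the real QEMU or spice-server API. */

#define COOKIE_INVALID 0u /* plays the role of GL_DRAW_COOKIE_INVALID */

struct display;

/* Stand-in for GraphicHwOps: the gl_block hook is optional (may be NULL). */
struct hw_ops {
    void (*gl_block)(struct display *d, bool block);
};

struct display {
    const struct hw_ops *ops;
    unsigned draw_cookie; /* != COOKIE_INVALID while a GL draw is in flight */
    int blocked;          /* device-side block counter */
    int pending_scanouts; /* queued guest SET_SCANOUT commands */
    int failed_scanouts;  /* how often the scanout precondition was violated */
};

/* Calls the hook only if the device provides one, so a NULL hook turns
 * blocking into a silent no-op. */
static void hw_gl_block(struct display *d, bool block)
{
    if (d->ops->gl_block) {
        d->ops->gl_block(d, block);
    }
}

/* Scanout precondition: no draw may be in flight. The real server logs
 * Spice-CRITICAL and aborts (see the backtraces); this model only counts it. */
static void gl_scanout(struct display *d)
{
    if (d->draw_cookie != COOKIE_INVALID) {
        d->failed_scanouts++;
    }
}

/* Device command queue: stops processing while the device is blocked. */
static void process_cmdq(struct display *d)
{
    while (d->pending_scanouts > 0 && d->blocked == 0) {
        d->pending_scanouts--;
        gl_scanout(d);
    }
}

/* Device gl_block hook: count nested blocks, resume the queue at zero. */
static void device_gl_block(struct display *d, bool block)
{
    d->blocked += block ? 1 : -1;
    assert(d->blocked >= 0);
    if (d->blocked == 0) {
        process_cmdq(d);
    }
}

/* Display side: block the device while an async draw is in flight. */
static void gl_draw_async(struct display *d, unsigned cookie)
{
    d->draw_cookie = cookie;
    hw_gl_block(d, true);
}

/* Display side: the draw finished, so unblock the device again. */
static void async_complete(struct display *d)
{
    d->draw_cookie = COOKIE_INVALID;
    hw_gl_block(d, false);
}

/* Hook compiled out, as with "#ifdef CONFIG_VIRGL" and virgl disabled. */
const struct hw_ops ops_hook_compiled_out = { .gl_block = NULL };
/* Hook always present, as after removing the #ifdef. */
const struct hw_ops ops_hook_present = { .gl_block = device_gl_block };

/* A guest scanout arrives while a draw is still in flight.
 * Returns the number of precondition violations. */
int run_scenario(const struct hw_ops *ops)
{
    struct display d = { .ops = ops };

    gl_draw_async(&d, 1);
    d.pending_scanouts = 1; /* guest queues SET_SCANOUT */
    process_cmdq(&d);       /* queue runs before the draw completes */
    async_complete(&d);
    return d.failed_scanouts;
}
```

In this model, run_scenario(&ops_hook_compiled_out) reports one violation (the analogue of the Spice-CRITICAL abort), while run_scenario(&ops_hook_present) reports none, because the queued scanout is deferred until async_complete() drops the block count to zero. That deferral is what the diff above restores by dropping the #ifdef CONFIG_VIRGL guard; the model deliberately ignores the real bottom-half scheduling and the client round-trip that leads to async_complete.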
Not sure where the link went: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=20194251

(In reply to Christophe Fergeau from comment #9)
> Looks very similar to what I came up with, cmd->waiting can be removed after
> this change. I'll test a bit more but this was fixing the crash for me.

Yep, the following patch in the series: "virtio-gpu: remove useless 'waiting' field". I'll resend soon. I need to push Gerd to review my changes :)

I tried the scratch build on two el8 machines with these results:
- on one machine, qemu with -spice gl=on fails to start with no error message given (even with G_MESSAGES_DEBUG=all)
- on the other machine, qemu starts but guests don't utilize virgl renderer (although they are run from the very same images that do render using VirGL on F29 host)

I didn't encounter the crash any more.

(In reply to David Jaša from comment #18)
> I tried the scratch build on two el8 machines with these results:
> - on one machine, qemu with -spice gl=on fails to start with no error
> message given (even with G_MESSAGES_DEBUG=all)
> - on the other machine, qemu starts but guests don't utilize virgl renderer
> (although they are run from the very same images that do render using VirGL
> on F29 host)
>
> I didn't encounter the crash any more.

RHEL qemu builds are compiled without virgl support, so I would not expect the guests to be able to use it.

(In reply to Christophe Fergeau from comment #19)
> (In reply to David Jaša from comment #18)
> > I tried the scratch build on two el8 machines with these results:
> > - on one machine, qemu with -spice gl=on fails to start with no error
> > message given (even with G_MESSAGES_DEBUG=all)
> > - on the other machine, qemu starts but guests don't utilize virgl renderer
> > (although they are run from the very same images that do render using VirGL
> > on F29 host)
> >
> > I didn't encounter the crash any more.
>
> RHEL qemu builds are compiled without virgl support, so I would not expect
> the guests to be able to use it.

<graphics type='spice'><gl enable='yes'> isn't enabling virgl, however it enables qemu gl rendering.

David, what made the first machine different from the second? The GPU? Could you give access to the first machine?

thanks (sorry for the delay)

(In reply to Marc-Andre Lureau from comment #20)
...
> <graphics type='spice'><gl enable='yes'> isn't enabling virgl, however it
> enables qemu gl rendering.
>
> David, what made the first machine different from the second? The GPU? Could
> you give access to the first machine?
>
> thanks (sorry for the delay)

I'm delayed even more, also sorry. I managed to reproduce both behaviours on the same machine; the behaviour depends on the libvirt session:
- in the system session, the VM doesn't start
- in the user session, the VM starts

As you say, the acceleration isn't available, so:
- when you specify <video><model type="virtio"><acceleration accel3d='yes'/>, the VM is not started
- without it in the XML, or with it disabled, the VM starts and acceleration is reported as unavailable within the VM.

The qemu crashes are also pretty frequent:

(process:21091): Spice-CRITICAL **: 16:01:30.123: red-qxl.c:708:spice_qxl_gl_scanout: condition `qxl_state->gl_draw_cookie == GL_DRAW_COOKIE_INVALID' failed
2019-07-12 14:01:30.679+0000: shutting down, reason=crashed

Christophe, do you still have the GL_DRAW_COOKIE_INVALID backport handy? Can you submit it to rhvirt?

This BZ lost the ITR flag. I'm setting it back, but the patch needs acks. Also requesting exception+.

QE: can you please grant QA_ACK?

QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ.
Thanks

Hmm, I'm not able to reproduce this bug against qemu-kvm-2.12.0-63.module+el8+2833+c7d6d092.x86_64 and qemu-kvm-2.12.0-88.module+el8.1.0+4233+bc44be3f.x86_64.

The qemu command line I used:
/usr/libexec/qemu-kvm -device virtio-vga -spice addr=/tmp/spice.sock,unix,disable-ticketing,gl=on -monitor stdio -cdrom Fedora-Workstation-Live-x86_64-31-1.9.iso -m 4G -machine q35

With this command, I always get a blank screen. After removing gl=on, I can always see the Fedora welcome page.

On the latest qemu-kvm-2.12.0-98.module+el8.2.0+5698+10a84757.x86_64, I see the same behavior.

Hi Marc-Andre,

Could you help check comments 31 and 32? Thanks!

BR/
Zhiyi

Do you get only a black screen when connecting the spice client, and no qemu error such as:

(process:21091): Spice-CRITICAL **: 16:01:30.123: red-qxl.c:708:spice_qxl_gl_scanout: condition `qxl_state->gl_draw_cookie == GL_DRAW_COOKIE_INVALID' failed ?

Black screen may be due to incompatible GPU, what's the host gpu?

I don't know if local GL/spice is supported in RHEL8 tbh, we would have to ask the Spice team.

(In reply to Marc-Andre Lureau from comment #34)
> Do you get only a black screen when connecting the spice client, and no qemu
> error such as:
>
> (process:21091): Spice-CRITICAL **: 16:01:30.123:
> red-qxl.c:708:spice_qxl_gl_scanout: condition `qxl_state->gl_draw_cookie ==
> GL_DRAW_COOKIE_INVALID' failed ?

Nothing like this is printed.

>
> Black screen may be due to incompatible GPU, what's the host gpu?
>
> I don't know if local GL/spice is supported in RHEL8 tbh, we would have to
> ask the Spice team.

I have checked on different GPUs, but the results are the same; only open-source drivers were used.
GPUs I used:
04:00.0 VGA compatible controller: NVIDIA Corporation GP106GL [Quadro P2000]
00:02.0 VGA compatible controller: Intel Corporation Iris Plus Graphics 650
21:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4]

Setting needinfo on David and Christophe for help.

Reproduced this issue against qemu-kvm-2.12.0-94.module+el8.2.0+5297+222a20af.x86_64.

Steps:
# gdb /usr/libexec/qemu-kvm
(gdb) run -device virtio-vga -spice addr=/tmp/spice.sock,unix,disable-ticketing,gl=on -monitor stdio -cdrom Fedora-Workstation-Live-x86_64-31-1.9.iso -m 4G -machine q35
Then try to interact with the guest UI.

Result: qemu dumps core with:
(process:561): Spice-CRITICAL **: 11:22:11.464: red-qxl.c:708:spice_qxl_gl_scanout: condition `qxl_state->gl_draw_cookie == GL_DRAW_COOKIE_INVALID' failed
[Detaching after fork from child process 779]

Thread 1 "qemu-kvm" received signal SIGABRT, Aborted.
0x00007ffff2c1270f in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x00007ffff2c1270f in raise () at /lib64/libc.so.6
#1 0x00007ffff2bfcb25 in abort () at /lib64/libc.so.6
#2 0x00007ffff439f948 in () at /lib64/libspice-server.so.1
#3 0x00007ffff436ff31 in spice_qxl_gl_scanout () at /lib64/libspice-server.so.1
#4 0x0000555555a92b60 in spice_gl_switch (dcl=0x5555573cd7f8, new_surface=<optimized out>) at /usr/src/debug/qemu-kvm-2.12.0-94.module+el8.2.0+5297+222a20af.x86_64/include/ui/console.h:342
#5 0x0000555555a8bc9a in dpy_gfx_replace_surface (con=0x555556445200, surface=0x55555785a090) at ui/console.c:1597
#6 0x00005555558b9ee3 in virtio_gpu_set_scanout (cmd=0x5555575355f0, g=0x5555575f5820) at /usr/src/debug/qemu-kvm-2.12.0-94.module+el8.2.0+5297+222a20af.x86_64/hw/display/virtio-gpu.c:676
#7 0x00005555558b9ee3 in virtio_gpu_simple_process_cmd (cmd=0x5555575355f0, g=0x5555575f5820) at /usr/src/debug/qemu-kvm-2.12.0-94.module+el8.2.0+5297+222a20af.x86_64/hw/display/virtio-gpu.c:854
#8 0x00005555558b9ee3 in virtio_gpu_process_cmdq (g=<optimized out>) at /usr/src/debug/qemu-kvm-2.12.0-94.module+el8.2.0+5297+222a20af.x86_64/hw/display/virtio-gpu.c:892
#9 0x0000555555b6dcb6 in aio_bh_call (bh=0x555557731850) at util/async.c:118
#10 0x0000555555b6dcb6 in aio_bh_poll (ctx=ctx@entry=0x5555564ad1c0) at util/async.c:118
#11 0x0000555555b70e34 in aio_dispatch (ctx=0x5555564ad1c0) at util/aio-posix.c:440
#12 0x0000555555b6db92 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:261
#13 0x00007ffff76ac67d in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#14 0x0000555555b700b0 in glib_pollfds_poll () at util/main-loop.c:215
#15 0x0000555555b700b0 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:238
#16 0x0000555555b700b0 in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497
#17 0x0000555555837a27 in main_loop () at vl.c:1981
#18 0x0000555555837a27 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4820

Verified this issue against qemu-kvm-2.12.0-98.module+el8.2.0+5698+10a84757.x86_64: after some interaction with the VM desktop, no crash happened.

Verified per comment 37.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:1587