Bug 1462381 - Systems with qxl/SPICE and graphical boot enabled fail to boot with kernel 4.12
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 27
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: AcceptedFreezeException
Duplicates: 1464780 1465148 1487879 1488043 1490895 1491408
Depends On:
Blocks: F27BetaFreezeException
Reported: 2017-06-16 23:31 UTC by Adam Williamson
Modified: 2017-10-18 14:32 UTC (History)

Last Closed: 2017-10-02 12:52:52 UTC


Links
System            ID       Priority  Status  Summary                                                   Last Updated
Linux Kernel      196777   None      None    None                                                      2019-04-24 06:51:47 UTC
Red Hat Bugzilla  1490895  None      CLOSED  kernel crash when trying upgrade VM to F27                2019-08-21 21:10:11 UTC
Red Hat Bugzilla  1491320  None      CLOSED  heavy screen flicker with latest kernels in qxl+spice VM  2019-08-21 21:10:11 UTC

Internal Links: 1490895 1491320

Description Adam Williamson 2017-06-16 23:31:17 UTC
If you install from the Rawhide Workstation ostree installer image to a VM using qxl/SPICE graphics, then attempt to boot the installed system as usual, it seems to hang during boot, with the bootsplash partly filled in.

If you switch to vga/VNC graphics, the system will boot fine. The system *also* boots fine if you stick with qxl/SPICE, but edit the kernel params and take out 'rhgb'.
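
Taking 'rhgb' out of the kernel params amounts to dropping one token from the kernel command line. A minimal sketch of that edit as a pure string operation follows. The helper name drop_kernel_args and the sample command line are hypothetical, and the code does not touch any bootloader configuration:

```python
def drop_kernel_args(cmdline, unwanted):
    """Return cmdline with every argument named in `unwanted` removed.

    An argument matches on its bare name ("rhgb") or on the part
    before "=" ("console" matches "console=ttyS0").
    """
    kept = []
    for arg in cmdline.split():
        name = arg.split("=", 1)[0]
        if name not in unwanted:
            kept.append(arg)
    return " ".join(kept)


# Hypothetical command line, for illustration only.
sample = "ro root=/dev/mapper/fedora-root rhgb quiet"
print(drop_kernel_args(sample, {"rhgb"}))
```

Matching on the argument name rather than on a substring avoids accidentally stripping part of an unrelated parameter that happens to contain "rhgb".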

This does not seem to affect other Workstation installs, only the ostree installer, at least so far.

I'm using the 'ostree' component so far to file Workstation ostree image-specific bugs as I'm not sure there's a more appropriate component; please advise me if there is.

Comment 1 Adam Williamson 2017-06-16 23:31:41 UTC
CCing halfline as this seems plymouth-related.

Comment 2 Adam Williamson 2017-06-19 23:36:34 UTC
In fact this *does* affect non-ostree installs also, so changing component to plymouth.

Comment 3 John Ellson 2017-07-06 15:38:59 UTC
Possible dup of: #1465148 and #1464780

Comment 4 John Ellson 2017-07-06 15:56:47 UTC
Possible dup of: #1450725

Comment 5 Joachim Frieben 2017-07-09 17:44:23 UTC
*** Bug 1464780 has been marked as a duplicate of this bug. ***

Comment 6 Alessio 2017-07-18 07:31:30 UTC
Same here.
Using Virtio or VMVGA as display driver, instead of QXL, the issue disappears.
As a side note: using VGA, the login screen appears, but the mouse doesn't work; using Cirrus, the boot goes on until the bootsplash takes the form of the Fedora logo, then it hangs.

Comment 7 David H. Gutteridge 2017-07-21 02:20:43 UTC
I've also reported this upstream on the dri-devel@freedesktop.org mailing list, as requested by the developer at Canonical who made a bunch of changes to the QXL driver in the 4.12 kernel.

https://lists.freedesktop.org/archives/dri-devel/2017-July/147766.html

Comment 8 David H. Gutteridge 2017-07-21 02:24:57 UTC
Relevant kernel log output:

Jul 19 00:56:46 arcus-v5.nonus-porta.net kernel: ------------[ cut here ]------------
Jul 19 00:56:46 arcus-v5.nonus-porta.net kernel: kernel BUG at drivers/gpu/drm/ttm/ttm_bo_util.c:589!
Jul 19 00:56:46 arcus-v5.nonus-porta.net kernel: invalid opcode: 0000 [#1] SMP
Jul 19 00:56:46 arcus-v5.nonus-porta.net kernel: Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core tpm_tis snd_hwdep tpm_tis_core snd_seq tpm snd_seq_device snd_pcm snd_timer virtio_balloon ppdev parport_pc snd pcspkr parport i2c_piix4 qemu_fw_cfg floppy soundcore qxl drm_kms_helper virtio_console syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm 8139too crc32c_intel virtio_pci serio_raw virtio_ring virtio 8139cp ata_generic mii pata_acpi
Jul 19 00:56:46 arcus-v5.nonus-porta.net kernel: CPU: 2 PID: 326 Comm: plymouthd Not tainted 4.12.0 #1
Jul 19 00:56:46 arcus-v5.nonus-porta.net kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Jul 19 00:56:46 arcus-v5.nonus-porta.net kernel: task: f4c290c0 task.stack: f4d12000
Jul 19 00:56:46 arcus-v5.nonus-porta.net kernel: EIP: ttm_bo_kmap+0x120/0x220 [ttm]
Jul 19 00:56:46 arcus-v5.nonus-porta.net kernel: EFLAGS: 00010202 CPU: 2
Jul 19 00:56:46 arcus-v5.nonus-porta.net kernel: EAX: f4a01c3c EBX: f1ff13b0 ECX: 00000300 EDX: 00000000
Jul 19 00:56:46 arcus-v5.nonus-porta.net kernel: ESI: f1e33750 EDI: f4a01cb4 EBP: f4d13d58 ESP: f4d13d3c
Jul 19 00:56:46 arcus-v5.nonus-porta.net kernel:  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Jul 19 00:56:46 arcus-v5.nonus-porta.net kernel: CR0: 80050033 CR2: fec00000 CR3: 34cec000 CR4: 000006d0
Jul 19 00:56:46 arcus-v5.nonus-porta.net kernel: Call Trace:
Jul 19 00:56:46 arcus-v5.nonus-porta.net kernel:  ? qxl_bo_kunmap_atomic_page+0x6c/0x70 [qxl]
Jul 19 00:56:46 arcus-v5.nonus-porta.net kernel:  qxl_bo_kmap+0x45/0x70 [qxl]
Jul 19 00:56:46 arcus-v5.nonus-porta.net kernel:  qxl_draw_dirty_fb+0x19e/0x3c0 [qxl]
Jul 19 00:56:46 arcus-v5.nonus-porta.net kernel:  ? drm_modeset_lock+0x65/0x100 [drm]
Jul 19 00:56:46 arcus-v5.nonus-porta.net kernel:  qxl_framebuffer_surface_dirty+0x86/0xe0 [qxl]
Jul 19 00:56:46 arcus-v5.nonus-porta.net kernel:  ? qxl_plane_cleanup_fb+0x30/0x30 [qxl]
Jul 19 00:56:47 arcus-v5.nonus-porta.net kernel:  drm_mode_dirtyfb_ioctl+0x141/0x180 [drm]
Jul 19 00:56:47 arcus-v5.nonus-porta.net kernel:  ? qxl_plane_cleanup_fb+0x30/0x30 [qxl]
Jul 19 00:56:47 arcus-v5.nonus-porta.net kernel:  ? drm_mode_getfb+0x100/0x100 [drm]
Jul 19 00:56:47 arcus-v5.nonus-porta.net kernel:  drm_ioctl+0x1f8/0x430 [drm]
Jul 19 00:56:47 arcus-v5.nonus-porta.net kernel:  ? drm_mode_getfb+0x100/0x100 [drm]
Jul 19 00:56:47 arcus-v5.nonus-porta.net kernel:  ? drm_version+0x80/0x80 [drm]
Jul 19 00:56:47 arcus-v5.nonus-porta.net kernel:  do_vfs_ioctl+0x91/0x670
Jul 19 00:56:47 arcus-v5.nonus-porta.net kernel:  ? syscall_trace_enter+0x20b/0x260
Jul 19 00:56:47 arcus-v5.nonus-porta.net kernel:  SyS_ioctl+0x5e/0x70
Jul 19 00:56:47 arcus-v5.nonus-porta.net kernel:  do_fast_syscall_32+0x6c/0x130
Jul 19 00:56:47 arcus-v5.nonus-porta.net kernel:  entry_SYSENTER_32+0x4e/0x7c
Jul 19 00:56:47 arcus-v5.nonus-porta.net kernel: EIP: 0xb7739cc9
Jul 19 00:56:47 arcus-v5.nonus-porta.net kernel: EFLAGS: 00000282 CPU: 2
Jul 19 00:56:47 arcus-v5.nonus-porta.net kernel: EAX: ffffffda EBX: 00000009 ECX: c01864b1 EDX: bfd95404
Jul 19 00:56:47 arcus-v5.nonus-porta.net kernel: ESI: b721088c EDI: c01864b1 EBP: 00000009 ESP: bfd953a8
Jul 19 00:56:47 arcus-v5.nonus-porta.net kernel:  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
Jul 19 00:56:47 arcus-v5.nonus-porta.net kernel: Code: 08 c7 41 08 02 00 00 00 8b 4d e8 8b 53 0c 50 8d 1c 8a 8b 55 e4 31 c9 89 d8 e8 dd f1 20 d1 8b 5d 08 89 03 5a eb 90 90 8d 74 26 00 <0f> 0b 8d b6 00 00 00 00 8b 47 38 eb bc 8d 76 00 8b 45 f0 8d 44
Jul 19 00:56:47 arcus-v5.nonus-porta.net kernel: EIP: ttm_bo_kmap+0x120/0x220 [ttm] SS:ESP: 0068:f4d13d3c
Jul 19 00:56:47 arcus-v5.nonus-porta.net kernel: ---[ end trace cc7ef630b2758ee6 ]---
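
For comparing traces like this one across reports, it can help to reduce the log to the BUG location and the reliable call-trace frames. The following is a rough sketch only: the function name summarize_oops and both regular expressions are assumptions written against the journal format shown above, not a general oops parser.

```python
import re

# "kernel BUG at <file>:<line>!"
BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")
# "kernel:  [? ]symbol+0xoff/0xsize [module]"
FRAME_RE = re.compile(
    r"kernel:\s+(?P<guess>\? )?(?P<sym>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<mod>\w+)\])?"
)


def summarize_oops(lines):
    """Return ((file, line) or None, [(symbol, module or None), ...])."""
    bug = None
    frames = []
    in_trace = False
    for line in lines:
        m = BUG_RE.search(line)
        if m:
            bug = (m.group("file"), int(m.group("line")))
            continue
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        m = FRAME_RE.search(line)
        if m is None:
            in_trace = False  # first non-frame line ends the trace
            continue
        if m.group("guess") is None:  # skip "? " entries (unreliable frames)
            frames.append((m.group("sym"), m.group("mod")))
    return bug, frames


log = [
    "Jul 19 00:56:46 arcus-v5.nonus-porta.net kernel: kernel BUG at drivers/gpu/drm/ttm/ttm_bo_util.c:589!",
    "Jul 19 00:56:46 arcus-v5.nonus-porta.net kernel: Call Trace:",
    "Jul 19 00:56:46 arcus-v5.nonus-porta.net kernel:  ? qxl_bo_kunmap_atomic_page+0x6c/0x70 [qxl]",
    "Jul 19 00:56:46 arcus-v5.nonus-porta.net kernel:  qxl_bo_kmap+0x45/0x70 [qxl]",
    "Jul 19 00:56:47 arcus-v5.nonus-porta.net kernel:  do_vfs_ioctl+0x91/0x670",
    "Jul 19 00:56:47 arcus-v5.nonus-porta.net kernel: EIP: 0xb7739cc9",
]
print(summarize_oops(log))
```

Frames prefixed with "? " are dropped because the kernel marks them as possibly stale stack entries.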

Comment 9 David H. Gutteridge 2017-07-21 02:45:09 UTC
(That is, Gabriel Krisman Bertazi at Collabora, not Canonical. Ahem.)

Comment 10 Joachim Frieben 2017-07-30 17:34:18 UTC
Since kernel-4.12.4-300.fc26, this issue appears to affect Fedora 26, too; see https://bodhi.fedoraproject.org/updates/FEDORA-2017-14ad2c5d17.

Comment 11 Fedora Update System 2017-08-07 21:06:19 UTC
kernel-4.12.5-300.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-adc7d95627

Comment 12 David H. Gutteridge 2017-08-11 13:37:14 UTC
With kernel-4.12.5-200.fc25, a graphical boot succeeds for me, though there are subsequent (less severe, so far) QXL errors in the kernel log.

Comment 13 Fedora Update System 2017-08-13 04:02:14 UTC
kernel-4.12.5-300.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-adc7d95627

Comment 14 David H. Gutteridge 2017-08-13 22:18:26 UTC
Correction, with kernel-4.12.5-200.fc25, graphical boots intermittently fail completely. (I can't comment on the Fedora 26 update, but there's been feedback provided in Bodhi it's still an issue there too.)

Comment 15 Fedora Update System 2017-08-14 21:51:20 UTC
kernel-4.12.5-300.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.

Comment 16 Paul DeStefano 2017-08-15 04:17:43 UTC
I get the exact same behavior as on rawhide.  After upgrade to kernel-4.12.5-300.fc26, VM will not boot, hangs on rhgb.  I don't understand how this is closed?  Is that automatic or something?

Comment 17 Adam Williamson 2017-08-15 04:55:29 UTC
Yes, bugs are automatically closed when an update that's marked as fixing them is pushed stable (update submitters can configure updates *not* to do this, but the default is to do it).

Comment 18 Jan Kurik 2017-08-15 08:31:57 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 27 development cycle.
Changing version to '27'.

Comment 19 Joachim Frieben 2017-08-15 09:32:46 UTC
(In reply to Adam Williamson from comment #0)
The relevant component rather seems to be the kernel package since when removing boot option "rhgb", the boot process hangs after the system attempts to launch the graphical login manager.

Comment 20 John Ellson 2017-08-15 13:50:26 UTC
Re Comment #19: Isn't it the other way around?   The boot process hangs after the system attempts to launch the graphical login manager,  *unless* the "rhgb" option is removed.

Re Comments #13, #14, #15 : I never saw the issue on fc25 or fc26,   only on rawhide/fc27.   Is there some other fc26 bug getting mixed in here?

Re Comment #7 : reported the issue upstream to freedesktop.org.  Has there been any response yet?   I see there is a comment about the same bug being seen on Debian.

Comment 21 Adam Williamson 2017-08-15 14:04:30 UTC
John: kernels get pushed down the line of stable releases; if this is a kernel bug, it's reasonable that it showed up in F25/F26 when kernel 4.12 reached them.

Comment 22 Joachim Frieben 2017-08-15 14:29:54 UTC
(In reply to John Ellson from comment #20)
When boot option "rhgb" is removed, the boot procedure proceeds much farther up to the point when the graphical boot manager is launched. Therefore, this issue is unlikely to be a Plymouth issue. This conclusion is also supported by comment 10 according to which this issue also started affecting Fedora 26 with the adoption of kernels of the 4.12.x series.

Comment 23 David H. Gutteridge 2017-08-16 00:40:15 UTC
(In reply to John Ellson from comment #20)
> Re Comments #13, #14, #15 : I never saw the issue on fc25 or fc26,   only on
> rawhide/fc27.   Is there some other fc26 bug getting mixed in here?

It's the very same bug in F25 and F26, as far as I know.

> Re Comment #7 : reported the issue upstream to freedesktop.org.  Has there
> been any response yet?   I see there is a comment about the same bug being
> seen on Debian.

No, there hasn't been any response I've seen.

Comment 24 John Ellson 2017-08-16 13:28:08 UTC
Re comment #23

The upstream bug:

     https://bugs.freedesktop.org/show_bug.cgi?id=100725

has been closed based on a comment from you.

But I still see the bug on Fedora 27 with kernel-4.13.0-0.rc4.git4.1

Comment 25 John Ellson 2017-08-16 13:34:00 UTC
I don't think that upstream's 100725 is the same bug at all.

Comment 26 David H. Gutteridge 2017-08-16 13:36:47 UTC
(In reply to John Ellson from comment #24)
> Re comment #23
> 
> The upstream bug:
> 
>      https://bugs.freedesktop.org/show_bug.cgi?id=100725
> 
> has been closed based on a comment from you.
> 
> But I still see the bug on Fedora 27 with kernel-4.13.0-0.rc4.git4.1

No, that's not the relevant upstream bug, that was unrelated. (If you look at the initial comments, you'll see it was an i915 issue.) I happened to mention this issue in it, since I hit it while I was testing for the other matter. There is no upstream bug in Freedesktop.org for this issue, the developer asked I send an email to their mailing list.

Comment 27 John Ellson 2017-08-16 13:53:34 UTC
Re: Comment #21.

Yes, it is indeed a reasonable expectation ... except that the first kernel-4.12 release (4.12.5-300) only reached my Fedora-26 yesterday, August 15th. (I do daily updates to these systems, but not from updates-testing.)

And checking today,  I see you are absolutely correct that the bug now exists in Fedora-26.

Comment 28 Adam Williamson 2017-08-30 12:12:18 UTC
We really should document this in commonbugs, especially since it's now hitting stable releases.

Also, I'm at flock with the entire Fedora kernel team (all both of them), so will ask if they can take another look at this today. Don't think halfline is here, though.

Comment 29 fednuc 2017-09-05 10:27:16 UTC
This seems to also affect other VMs with kernel 4.12 on the guest side (e.g. currently openSUSE Tumbleweed, Ubuntu 17.10 daily builds).

Comment 30 Adam Williamson 2017-09-05 17:02:15 UTC
Yes, it turns out to be an upstream kernel bug, nothing Fedora-specific. We just ran into it first. Our kernel folks are currently working on bisecting it, I think, but it's a slow process.

Comment 31 Michal Schmidt 2017-09-07 14:40:08 UTC
*** Bug 1465148 has been marked as a duplicate of this bug. ***

Comment 32 David H. Gutteridge 2017-09-12 01:44:42 UTC
I noticed the logs for the 4.12.12 kernel builds in Koji note "QXL Fixes", so I test booted kernel-4.12.12-200.fc25 in the applicable VM I have, and multiple graphical boots have been successful. Thanks Fedora kernel team!

Comment 33 Adam Williamson 2017-09-12 19:14:52 UTC
The fix that was applied to the 4.12.12 builds is:

https://www.spinics.net/lists/dri-devel/msg151958.html

Note that it is *not* (yet) applied to f27 and rawhide kernels, so f27 and rawhide are likely still subject to this bug.

Comment 34 Debarshi Ray 2017-09-13 15:46:29 UTC
I am not convinced that this is fixed in kernel-4.12.12-300. My F26 VM still gets stuck in Plymouth while booting in Boxes like it was doing before. Sadly, "rpm -Uvh ..." directly from Koji got rid of my good kernel, so I can't quickly check what the backtrace is.

Anyway, this was the backtrace I was getting previously. Slightly different than the one here, but probably the same bug:

kernel: ------------[ cut here ]------------
kernel: kernel BUG at drivers/gpu/drm/ttm/ttm_bo_util.c:589!
kernel: invalid opcode: 0000 [#1] SMP
kernel: Modules linked in: virtio_console qxl 8139too drm_kms_helper crc32c_intel ttm seri
kernel: CPU: 2 PID: 363 Comm: plymouthd Not tainted 4.12.11-300.fc26.x86_64 #1
kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/201
kernel: task: ffff97d8f6320000 task.stack: ffffa6ff4073c000
kernel: RIP: 0010:ttm_bo_kmap+0x1b5/0x260 [ttm]
kernel: RSP: 0018:ffffa6ff4073fb90 EFLAGS: 00010283
kernel: RAX: ffff97d938d26190 RBX: ffff97d8f620b400 RCX: ffff97d8f620b690
kernel: RDX: 0000000000000300 RSI: 0000000000000000 RDI: ffff97d8f620b458
kernel: RBP: ffffa6ff4073fbd0 R08: ffff97d8f620b528 R09: 0000000000000400
kernel: R10: 0000000000000008 R11: 00000000000015b4 R12: ffff97d93bb066b0
kernel: R13: 0000000000000000 R14: ffff97d93ca2f598 R15: 0000000000000000
kernel: FS:  00007ff339e72d00(0000) GS:ffff97d93fd00000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 000055f82097d7b0 CR3: 0000000036134000 CR4: 00000000000406e0
kernel: Call Trace:
kernel:  ? qxl_bo_kunmap_atomic_page+0x85/0x90 [qxl]
kernel:  qxl_bo_kmap+0x42/0x70 [qxl]
kernel:  qxl_draw_dirty_fb+0x1f5/0x420 [qxl]
kernel:  qxl_framebuffer_surface_dirty+0xa0/0xf0 [qxl]
kernel:  ? __kmalloc+0x1d1/0x210
kernel:  drm_mode_dirtyfb_ioctl+0x17e/0x1c0 [drm]
kernel:  drm_ioctl+0x213/0x4d0 [drm]
kernel:  ? drm_mode_getfb+0x110/0x110 [drm]
kernel:  do_vfs_ioctl+0xa5/0x600
kernel:  ? security_file_ioctl+0x43/0x60
kernel:  SyS_ioctl+0x79/0x90
kernel:  do_syscall_64+0x67/0x140
kernel:  entry_SYSCALL64_slow_path+0x25/0x25
kernel: RIP: 0033:0x7ff338e5d5e7
kernel: RSP: 002b:00007fffc0177ba8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
kernel: RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007ff338e5d5e7
kernel: RDX: 00007fffc0177be0 RSI: 00000000c01864b1 RDI: 0000000000000009
kernel: RBP: 00007fffc0177be0 R08: 00007ff336d7f77c R09: 0000000000000010
kernel: R10: 000000000000000a R11: 0000000000000246 R12: 00000000c01864b1
kernel: R13: 0000000000000009 R14: 000056381d249330 R15: 00007ff3378dc78c
kernel: Code: d0 49 8b be 80 00 00 00 48 c1 e6 0c 41 f6 46 62 04 74 4a 49 03 7e 70 4c 01 e
kernel: RIP: ttm_bo_kmap+0x1b5/0x260 [ttm] RSP: ffffa6ff4073fb90
kernel: ---[ end trace 322249015120732b ]---

Comment 35 David H. Gutteridge 2017-09-13 16:10:01 UTC
(In reply to Debarshi Ray from comment #34)
> I am not convinced that this is fixed in kernel-4.12.12-300. My F26 VM still
> gets stuck in Plymouth while booting in Boxes like it was doing before.
> Sadly, "rpm -Uvh ..." directly from Koji got rid of my good kernel, so I
> can't quickly check what the backtrace is.

The upstream bug (https://bugzilla.kernel.org/show_bug.cgi?id=196777) also indicates the fix applied isn't effective. I don't know why it works for me, but it does.

In my case, I'm running XFCE via virt-manager + QEMU. (The host OS is Debian 9.1.) My kernel is i686+PAE rather than x86_64. (Not that it matters, but one of the purposes of my VM is to build and test i686 kernels. I happened to run into this bug while testing the test kernels.)

Comment 36 Adam Williamson 2017-09-13 16:19:52 UTC
debarshi: "Sadly, "rpm -Uvh ..." directly from Koji got rid of my good kernel, so I can't quickly check what the backtrace is." - for the future, use 'dnf update *.rpm' or similar, dnf still knows what to install and what to update when using local RPMs. You can also use 'rpm -ivh' for the packages that need to be installed and 'rpm -Uvh' for ones that need to be updated, but just letting dnf do it is easier.

Comment 37 Joachim Frieben 2017-09-13 16:56:52 UTC
Bug 1483327 addresses the present issue for Fedora 26. In that case with kernel-4.12.12-300.fc26, the system still hangs at the graphical boot screen. After removing kernel option "rhgb", the system now launches the graphical login manager successfully, and GNOME on Wayland runs as expected. However, the screen exhibits a strong flicker.

Comment 38 Debarshi Ray 2017-09-13 18:02:49 UTC
(In reply to Adam Williamson from comment #36)
> debarshi: "Sadly, "rpm -Uvh ..." directly from Koji got rid of my good
> kernel, so I can't quickly check what the backtrace is." - for the future,
> use 'dnf update *.rpm' or similar, dnf still knows what to install and what
> to update when using local RPMs. You can also use 'rpm -ivh' for the
> packages that need to be installed and 'rpm -Uvh' for ones that need to be
> updated, but just letting dnf do it is easier.

Yeah, so I learnt in #fedora-desktop. :)

Luckily Boxes automatically snapshots the freshly installed VM. So it's not as dire as a completely useless VM. I just need a bit more time to play with this. I ran out of budget today.

Comment 39 Debarshi Ray 2017-09-13 18:03:48 UTC
(In reply to Joachim Frieben from comment #37)
> Bug 1483327 addresses the present issue for Fedora 26. In that case with
> kernel-4.12.12-300.fc26, the system still hangs at the graphical boot
> screen. After removing kernel option "rhgb", the system now launches the
> graphical login manager successfully, and GNOME on Wayland runs as expected.
> However, the screen exhibits a strong flicker.

I see. That's good to know. Thank you.

Comment 40 Kamil Páral 2017-09-15 13:12:05 UTC
*** Bug 1490895 has been marked as a duplicate of this bug. ***

Comment 41 Justin M. Forbes 2017-09-15 18:35:59 UTC
There is a test kernel I am building right now (I am at plumbers). When it completes, I would love to see people testing https://koji.fedoraproject.org/koji/taskinfo?taskID=21888236 and let me know if it addresses the QXL issues.

Comment 42 Norbert Jurkeit 2017-09-17 14:31:16 UTC
(In reply to Justin M. Forbes from comment #41)

My VM was installed from the 64 bit LXDE spin and has also suffered from hanging boots since kernel 4.12 if parameter rhgb is set. With kernel 4.12.13-301.fc26.x86_64 from koji this problem is fixed for me.

However there is still the minor issue that the bootsplash is only displayed during the first startup of the VM while the screen stays black between grub menu and login screen during subsequent boots. The bootsplash only reappears after rebooting the *host*. Looks as if some memory is not initialized properly.

I observed missing bootsplashes also with older 4.12 kernels. Probably their absence caused subsequent boots to succeed even with rhgb specified after an initial startup without rhgb.

Comment 43 Dominic P Geevarghese 2017-09-18 17:19:57 UTC
*** Bug 1491408 has been marked as a duplicate of this bug. ***

Comment 44 Kamil Páral 2017-09-18 17:22:03 UTC
Discussed at blocker review meeting [1]:

AcceptedFreezeException - Because all default VMs are affected, it would be great to have this solved in Beta.

[1] https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2017-09-18

Comment 45 Kamil Páral 2017-09-19 09:03:20 UTC
On F27, I'm still seeing this with kernel-4.13.2-300.fc27.x86_64, so proposing as a Final blocker:
"The release must be able host virtual guest instances of the same release. "
https://fedoraproject.org/wiki/Fedora_27_Beta_Release_Criteria#Virtualization_requirements
(This is a Beta criterion, but the bug was already rejected as a Beta blocker for being QXL-specific, so I am proposing it as a Final blocker.)

The problem doesn't always occur, but it does quite frequently during boot (the system hangs, and there's a traceback as in comment 34). I also had to set SELinux to permissive to avoid many other issues.

On F26 I tried kernel-4.12.13-301.fc26 from comment 41 and it doesn't help. It did not hang during boot, but screen is heavily blinking and unusable in GNOME (that's bug 1491320).

Comment 46 Chuck Ebbert 2017-09-19 13:23:55 UTC
I can get it to work with Justin's test kernel from comment 41, but the graphics flicker very badly (using QXL + Spice). It's pretty much unusable.

I see the same thing with VGA graphics + VNC to access the guest display. I've been forced to run all my guests in console mode.

Comment 47 Justin M. Forbes 2017-09-19 14:39:36 UTC
https://koji.fedoraproject.org/koji/taskinfo?taskID=21970731 is a new scratch build that I think will be better overall. It should fix the fail to boot, and not introduce the flicker.  Please test when it is finished.

Comment 48 Kamil Páral 2017-09-19 15:59:54 UTC
(In reply to Justin M. Forbes from comment #47)
> https://koji.fedoraproject.org/koji/taskinfo?taskID=21970731 is a new
> scratch build that I think will be better overall. It should fix the fail to
> boot, and not introduce the flicker.  Please test when it is finished.

This seems to work fine for my F26 VM. The system boots and doesn't flicker, at least in my 5 boot attempts.

Comment 49 Chuck Ebbert 2017-09-19 16:58:26 UTC
Well, it *almost* works perfectly now. If I leave a user's desktop at the default 1024x768 resolution it's all fine. I can even change the resolution to 1280x720 in the Settings panel and that works too.

However, if I reboot and log back in as that same user (after changing the resolution) the desktop never appears. System is not locked up -- the VMM can still shut it down.

Three seconds after login, I get this in the log:

Sep 19 12:33:18 f26-1 gsd-color[1455]: failed to get edid: unable to get EDID for output
Sep 19 12:33:18 f26-1 gsd-color[1007]: unable to get EDID for xrandr-Virtual-1: unable to get EDID for output
Sep 19 12:33:18 f26-1 gsd-color[1455]: unable to get EDID for xrandr-Virtual-1: unable to get EDID for output

Comment 50 David H. Gutteridge 2017-09-20 05:18:45 UTC
(In reply to Justin M. Forbes from comment #47)
> https://koji.fedoraproject.org/koji/taskinfo?taskID=21970731 is a new
> scratch build that I think will be better overall. It should fix the fail to
> boot, and not introduce the flicker.  Please test when it is finished.

This one works fine for me (as did the previous), with XFCE via virt-manager + QEMU. Thanks!

Comment 51 Joachim Frieben 2017-09-20 05:59:38 UTC
(In reply to Justin M. Forbes from comment #47)
I cannot confirm the positive feedback of previous reporters for scratch build kernel-4.12.13-300.fc26 on a fully updated Fedora 26 virtual guest.
1. When booting with the graphical boot screen, then the system hangs after displaying the Fedora logo whereas before it would hang at an early stage of the progress indicator.
2. When removing kernel boot options "rhgb quiet", then the system hangs after issuing the message "Started User Manager for UID 42."
In both cases, it is however still possible to switch to a virtual console. The system boots correctly for kernel-4.11.8-300.fc26 or after adding kernel boot option "nomodeset".

Comment 52 Justin M. Forbes 2017-09-20 12:33:31 UTC
So that test build removed the patch that introduced the flicker, but this is the comment I got from upstream in regards to that:

"Workaround #1: turn off wayland.
Workaround #2: use virtio-vga instead. wayland doesn't use qxl 2d accel anyway.

Fundamental problem here is that the qxl virtual hardware simply doesn't support pageflip, we have to destroy + re-create the primary surface instead.  This is where the flicker comes from.

Commit "058e9f5c82 drm/qxl: simple crtc page flipping emulated using buffer copy" handles the issue with a pretty gross hack, blitting one framebuffer over the other instead of a proper primary surface update.  With atomic modesetting that doesn't work any more.

We could possibly decouple the primary surface from the drm framebuffers, so the drm framebuffers effectively become shadow framebuffers, and every display update becomes a drm framebuffer -> primary surface blit.  Not sure whenever that scheme can work properly with xorg though.  Also has a high chance to cause xorg performance regressions."
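
As a rough illustration of the scheme described in the last quoted paragraph (every display update becoming a framebuffer -> primary surface blit), here is a toy model. It is not driver code: the class ToyDisplay, its present method, and the 4-byte "framebuffers" are all made up for the sketch, and it only shows that the visible surface is a separate buffer that receives a copy on each update.

```python
class ToyDisplay:
    """Toy stand-in for a device with one fixed primary surface."""

    def __init__(self, size):
        self.primary = bytearray(size)  # what the "screen" shows
        self.blits = 0                  # number of copies performed

    def present(self, framebuffer):
        """Emulate a page flip by copying into the primary surface."""
        if len(framebuffer) != len(self.primary):
            raise ValueError("framebuffer size does not match primary surface")
        self.primary[:] = framebuffer
        self.blits += 1


display = ToyDisplay(4)
front = bytearray(b"\x01\x01\x01\x01")
back = bytearray(b"\x02\x02\x02\x02")

display.present(front)  # first frame
display.present(back)   # "flip": another full copy, no surface re-creation
print(bytes(display.primary), display.blits)
```

In this model each update costs a full copy, which is in line with the performance concern raised in the quoted reply.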

Comment 53 Kamil Páral 2017-09-20 12:55:26 UTC
*** Bug 1488043 has been marked as a duplicate of this bug. ***

Comment 54 Chuck Ebbert 2017-09-20 14:55:59 UTC
(In reply to Justin M. Forbes from comment #52)
> Workaround #2: use virtio-vga instead. wayland doesn't use qxl 2d accel
> anyway.

I assume virtio-vga is the driver simply labeled "Virtio" in the VMM settings?
All my hosts are running CentOS 7.4; that's not supported. You can configure it, but trying to start the VM gets an error saying this version of QEMU doesn't support it.

Comment 55 Joachim Frieben 2017-09-22 03:20:19 UTC
(In reply to Justin M. Forbes from comment #52)
Please keep in mind that this issue presents a -regression- compared to earlier correct behaviour: Wayland on QXL was perfectly usable prior to modifications in the 4.12 series kernels whose usefulness appears questionable.

Comment 56 Justin M. Forbes 2017-09-22 11:09:54 UTC
(In reply to Joachim Frieben from comment #55)
> (In reply to Justin M. Forbes from comment #52)
> Please keep in mind that this issue presents a -regression- compared to
> earlier correct behaviour: Wayland on QXL was perfectly usable prior to
> modifications in the 4.12 series kernels whose usefulness appears
> questionable.

Right, and the developer explained why the "pretty gross hack" that made that work does not work anymore.

Comment 57 sumantro 2017-09-24 12:25:11 UTC
This still persists on nightly 923 and Beta1.2/922.

Comment 58 Justin M. Forbes 2017-09-29 12:23:22 UTC
(In reply to sumantro from comment #57)
> This still persists on nightly 923 and Beta1.2/922.

The failure to boot still exists?

Comment 59 Joachim Frieben 2017-09-29 15:56:43 UTC
(In reply to Justin M. Forbes from comment #58)
Booting from live image Fedora-Workstation-Live-x86_64-27_Beta-1.5 including package kernel-4.13.3-301.fc27 with (additional) boot option "rhgb", graphical boot is working and the graphical login manager is eventually launched successfully. After logging in, the strong flicker reported in comment 37 is observed. Maybe it is time to remember bug 1266484 which addressed that same issue.

Comment 60 Kamil Páral 2017-10-02 12:52:52 UTC
I can confirm this problem is fixed with kernel 4.13.3-301.fc27, the system no longer hangs on boot with QXL VM. I also see it fixed with 4.12.14-300.fc26 on F26 system, although I see no plymouth splash on it for some reason.

I'm closing this bug as fixed. Please reopen if you still see a hang on boot with the latest kernels. For the flicker, please follow up in bug 1491320.
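
To check whether a given guest kernel is at least as new as the builds reported fixed above (4.13.3-301.fc27 on F27, 4.12.14-300.fc26 on F26), a small comparison helper may be handy. This is a sketch under assumptions: the function names are hypothetical, and only the upstream x.y.z version is compared, so the package release and dist tag are ignored.

```python
import re


def kernel_version_tuple(nvr):
    """Extract (major, minor, patch) from strings like 'kernel-4.13.3-301.fc27'."""
    m = re.search(r"(\d+)\.(\d+)\.(\d+)", nvr)
    if m is None:
        raise ValueError(f"no x.y.z version found in {nvr!r}")
    return tuple(int(part) for part in m.groups())


def at_least(nvr, minimum):
    """True if the x.y.z version in `nvr` is >= the one in `minimum`."""
    return kernel_version_tuple(nvr) >= kernel_version_tuple(minimum)


for candidate in ("kernel-4.12.12-300.fc26", "kernel-4.13.4-300.fc26"):
    print(candidate, at_least(candidate, "4.12.14"))
```

Comparing integer tuples avoids the string-comparison trap where "4.12.5" would sort after "4.12.14".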

Comment 61 Dr. David Alan Gilbert 2017-10-02 13:03:18 UTC
Does 1491320 need to gain the block on 1396704?  It's still unusable from the video that's shown there.

Comment 62 Debarshi Ray 2017-10-05 13:46:45 UTC
My F26 VM now boots with kernel-4.13.4-300.fc26.

Comment 63 Cole Robinson 2017-10-18 14:32:46 UTC
*** Bug 1487879 has been marked as a duplicate of this bug. ***

