Bug 2020438 - Graphical environment startup fails on VMs with virtio graphics with kernel-5.16.0-0.rc0.20211103gitdcd68326d29b.2.fc36
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: All
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: openqa
Depends On:
Blocks:
Reported: 2021-11-04 21:19 UTC by Adam Williamson
Modified: 2021-11-10 00:03 UTC

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-11-10 00:03:47 UTC
Type: Bug
Embargoed:



Description Adam Williamson 2021-11-04 21:19:04 UTC
In today's Rawhide, no image boots successfully to a graphical environment in a VM with virtio graphics. This affects the Workstation live, KDE live, and installer images, on both x86_64 and aarch64, and it causes most openQA tests to fail.

I can reproduce this locally in virt-manager as well. If I change the VM to use qxl graphics instead, it works fine.

The symptom is that whenever the system tries to start the graphical environment, it fails and becomes mostly unresponsive. I think the serial console still works, but I haven't yet set things up to be able to do anything over it. The graphical console just shows a mostly blank screen with a stuck text cursor at the top left, and it's not possible to switch to any other tty.

The obvious suspect that changed in today's Rawhide (yesterday's was fine) is the kernel, which went from kernel-5.15.0-60.fc36 to kernel-5.16.0-0.rc0.20211103gitdcd68326d29b.2.fc36.

Comment 1 Adam Williamson 2021-11-04 23:01:00 UTC
OK, using an installed system and sending the journal to the serial console, I caught a kernel NULL pointer dereference:

[   21.482904] BUG: kernel NULL pointer dereference, address: 0000000000000018
[   21.485778] #PF: supervisor read access in kernel mode
[   21.488392] #PF: error_code(0x0000) - not-present page
[   21.490598] PGD 0 P4D 0 
[   21.491726] Oops: 0000 [#1] PREEMPT SMP PTI
[   21.494359] CPU: 0 PID: 1 Comm: systemd Not tainted 5.16.0-0.rc0.20211103gitdcd68326d29b.2.fc36.x86_64 #1
[   21.498325] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-6.fc36 04/01/2014
[   21.501792] RIP: 0010:virtio_gpu_poll+0x21/0x100 [virtio_gpu]
[   21.504560] Code: ef 5d e9 52 72 69 d4 66 90 0f 1f 44 00 00 41 55 41 54 55 53 48 8b 9f d0 01 00 00 48 8b 43 78 48 8b 68 10 48 8b 83 d8 01 00 00 <48> 83 78 18 00 0f 84 bc 00 00 00 48 85 f6 74 23 48 8b 06 4c 8d 83
[   21.513185] RSP: 0018:ffff9de3c0017e28 EFLAGS: 00010286
[   21.513983] RAX: 0000000000000000 RBX: ffff8f4222cb3000 RCX: ffff8f4206c99500
[   21.515609] RDX: 0000000000000001 RSI: ffff9de3c0017ea0 RDI: ffff8f42217e6a00
[   21.518706] RBP: ffff8f42091c4000 R08: ffff8f42064c3301 R09: 0000000000000000
[   21.520017] R10: ffff8f4204a1d600 R11: 0000000000000000 R12: ffff8f4221436080
[   21.520848] R13: ffff8f42217e6a00 R14: 0000000000000000 R15: ffff8f4200264000
[   21.521659] FS:  00007fdf7332ab40(0000) GS:ffff8f427b600000(0000) knlGS:0000000000000000
[   21.526776] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   21.531365] CR2: 0000000000000018 CR3: 0000000105674003 CR4: 0000000000370ef0
[   21.532365] Call Trace:
[   21.532718]  <TASK>
[   21.533024]  ep_item_poll+0x2d/0x50
[   21.533517]  do_epoll_ctl+0x950/0x1020
[   21.534046]  ? ep_loop_check_proc+0xf0/0xf0
[   21.534634]  ? __x64_sys_epoll_ctl+0x51/0x70
[   21.535230]  __x64_sys_epoll_ctl+0x51/0x70
[   21.535843]  do_syscall_64+0x3b/0x90
[   21.536373]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   21.537081] RIP: 0033:0x7fdf73fb071e
[   21.537588] Code: 48 8b 0d 0d 77 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 e9 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d da 76 0e 00 f7 d8 64 89 01 48
[   21.540150] RSP: 002b:00007ffe363385b8 EFLAGS: 00000213 ORIG_RAX: 00000000000000e9
[   21.541190] RAX: ffffffffffffffda RBX: 000055ea5cde8950 RCX: 00007fdf73fb071e
[   21.542172] RDX: 000000000000008d RSI: 0000000000000001 RDI: 0000000000000004
[   21.543156] RBP: 000055ea5cde8950 R08: 000055ea5b18b120 R09: 000055ea5cc81df0
[   21.544016] R10: 00007ffe363385cc R11: 0000000000000213 R12: 0000000000000000
[   21.544871] R13: 000000000000008d R14: 000055ea5cc81e08 R15: 000055ea5cc81df0
[   21.545727]  </TASK>
[   21.546020] Modules linked in: nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink qrtr intel_rapl_msr intel_rapl_common kvm_intel sunrpc kvm iTCO_wdt intel_pmc_bxt iTCO_vendor_support snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg irqbypass snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device rapl snd_pcm joydev i2c_i801 pcspkr i2c_smbus snd_timer snd virtio_balloon soundcore lpc_ich virtio_gpu virtio_dma_buf zram ip_tables crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel serio_raw sym53c8xx virtio_console scsi_transport_spi virtio_blk virtio_net net_failover failover qemu_fw_cfg ipmi_devintf ipmi_msghandler fuse
[   21.556580] CR2: 0000000000000018
[   21.557058] ---[ end trace 43525d6d683691a9 ]---
[   21.557705] RIP: 0010:virtio_gpu_poll+0x21/0x100 [virtio_gpu]
[   21.558513] Code: ef 5d e9 52 72 69 d4 66 90 0f 1f 44 00 00 41 55 41 54 55 53 48 8b 9f d0 01 00 00 48 8b 43 78 48 8b 68 10 48 8b 83 d8 01 00 00 <48> 83 78 18 00 0f 84 bc 00 00 00 48 85 f6 74 23 48 8b 06 4c 8d 83
[   21.561537] RSP: 0018:ffff9de3c0017e28 EFLAGS: 00010286
[   21.562271] RAX: 0000000000000000 RBX: ffff8f4222cb3000 RCX: ffff8f4206c99500
[   21.563260] RDX: 0000000000000001 RSI: ffff9de3c0017ea0 RDI: ffff8f42217e6a00
[   21.564249] RBP: ffff8f42091c4000 R08: ffff8f42064c3301 R09: 0000000000000000
[   21.565242] R10: ffff8f4204a1d600 R11: 0000000000000000 R12: ffff8f4221436080
[   21.566256] R13: ffff8f42217e6a00 R14: 0000000000000000 R15: ffff8f4200264000
[   21.567243] FS:  00007fdf7332ab40(0000) GS:ffff8f427b600000(0000) knlGS:0000000000000000
[   21.568360] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   21.569156] CR2: 0000000000000018 CR3: 0000000105674003 CR4: 0000000000370ef0
[   21.570149] BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:49
[   21.571366] in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1, name: systemd
[   21.572559] preempt_count: 0, expected: 0
[   21.573407] RCU nest depth: 0, expected: 0
[   21.574284] INFO: lockdep is turned off.
[   21.575055] irq event stamp: 2542180
[   21.575712] hardirqs last  enabled at (2542179): [<ffffffff94e4c561>] syscall_enter_from_user_mode+0x21/0x70
[   21.577246] hardirqs last disabled at (2542180): [<ffffffff94e4b7e8>] exc_page_fault+0x38/0x2e0
[   21.578492] softirqs last  enabled at (2542176): [<ffffffff940f22f7>] __irq_exit_rcu+0x107/0x170
[   21.579705] softirqs last disabled at (2542169): [<ffffffff940f22f7>] __irq_exit_rcu+0x107/0x170
[   21.580913] CPU: 0 PID: 1 Comm: systemd Tainted: G      D          --------- ---  5.16.0-0.rc0.20211103gitdcd68326d29b.2.fc36.x86_64 #1
[   21.582578] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-6.fc36 04/01/2014
[   21.583751] Call Trace:
[   21.584099]  <TASK>
[   21.584398]  dump_stack_lvl+0x59/0x73
[   21.584913]  __might_resched.cold+0x101/0x13c
[   21.585446]  exit_signals+0x1a/0x330
[   21.585883]  do_exit+0xb8/0xc30
[   21.586289]  ? __x64_sys_epoll_ctl+0x51/0x70
[   21.586810]  rewind_stack_do_exit+0x17/0x17
[   21.587315] RIP: 0033:0x7fdf73fb071e
[   21.587752] Code: 48 8b 0d 0d 77 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 e9 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d da 76 0e 00 f7 d8 64 89 01 48
[   21.590151] RSP: 002b:00007ffe363385b8 EFLAGS: 00000213 ORIG_RAX: 00000000000000e9
[   21.591197] RAX: ffffffffffffffda RBX: 000055ea5cde8950 RCX: 00007fdf73fb071e
[   21.592225] RDX: 000000000000008d RSI: 0000000000000001 RDI: 0000000000000004
[   21.593223] RBP: 000055ea5cde8950 R08: 000055ea5b18b120 R09: 000055ea5cc81df0
[   21.594850] R10: 00007ffe363385cc R11: 0000000000000213 R12: 0000000000000000
[   21.595988] R13: 000000000000008d R14: 000055ea5cc81e08 R15: 000055ea5cc81df0
[   21.597027]  </TASK>
[   21.597394] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[   21.608200] Kernel Offset: 0x13000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[   21.609763] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 ]---

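This looks like the problem: systemd (PID 1) hits a NULL pointer dereference in virtio_gpu_poll during an epoll_ctl call, and the kernel then panics because init was killed.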
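
For anyone reading the trace, here is a minimal sketch of how the fault address can be cross-checked against the register dump and the "Code:" bytes from the oops above. It is only an illustration: the helper names (parse_oops, decode_cmp_disp8_rax) are made up for this example, the decoder recognises just the one instruction encoding that appears at the faulting position here (REX.W 83 /7 with an [rax+disp8] operand, i.e. a 64-bit compare of memory at rax+disp8 with an immediate), and it makes no claim about which structure member in the virtio_gpu source the NULL pointer corresponds to.

```python
import re

# Excerpt of the oops above, trimmed to the three lines this sketch needs.
OOPS = """\
[   21.482904] BUG: kernel NULL pointer dereference, address: 0000000000000018
[   21.504560] Code: ef 5d e9 52 72 69 d4 66 90 0f 1f 44 00 00 41 55 41 54 55 53 48 8b 9f d0 01 00 00 48 8b 43 78 48 8b 68 10 48 8b 83 d8 01 00 00 <48> 83 78 18 00 0f 84 bc 00 00 00 48 85 f6 74 23 48 8b 06 4c 8d 83
[   21.513983] RAX: 0000000000000000 RBX: ffff8f4222cb3000 RCX: ffff8f4206c99500
"""


def parse_oops(text):
    """Return (fault_address, rax, faulting_bytes) from an x86_64 oops excerpt."""
    fault = int(re.search(r"address: ([0-9a-f]+)", text).group(1), 16)
    rax = int(re.search(r"RAX: ([0-9a-f]+)", text).group(1), 16)
    code = re.search(r"Code: (.*)", text).group(1).split()
    # The kernel wraps the first byte of the faulting instruction in <..>.
    start = next(i for i, b in enumerate(code) if b.startswith("<"))
    insn = [int(b.strip("<>"), 16) for b in code[start:]]
    return fault, rax, insn


def decode_cmp_disp8_rax(insn):
    """Decode only the encoding seen here: REX.W 83 /7 with an [rax+disp8] operand.

    Returns the disp8 value, or None if the bytes are some other instruction.
    """
    rex, opcode, modrm, disp8 = insn[:4]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if rex == 0x48 and opcode == 0x83 and mod == 1 and reg == 7 and rm == 0:
        return disp8
    return None


fault, rax, insn = parse_oops(OOPS)
disp = decode_cmp_disp8_rax(insn)
print(f"rax={rax:#x} disp={disp:#x} fault={fault:#x}")
```

With the values from this trace, RAX is 0 and the displacement is 0x18, which matches the reported fault address (CR2) of 0x18. In other words, the driver read a field at offset 0x18 through a pointer that was NULL.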

Comment 3 Adam Williamson 2021-11-10 00:03:47 UTC
Fixed in latest Rawhide kernel, thanks.

