Bug 1593028

Summary: No boot messages / plymouth / decryption prompt shown on display of aarch64 VM
Product: [Fedora] Fedora
Reporter: Adam Williamson <awilliam>
Component: dracut
Assignee: dracut-maint-list
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Docs Contact:
Priority: unspecified
Version: 29
CC: airlied, berrange, bskeggs, crobinso, dracut-maint-list, ewk, hdegoede, ichavero, itamar, jarodwilson, jglisse, john.j5live, jonathan, josef, kernel-maint, kraxel, linville, mchehab, mjg59, pbonzini, pbrobinson, pjones, pwhalen, rstrode, steved, zbyszek
Target Milestone: ---   
Target Release: ---   
Hardware: aarch64   
OS: Linux   
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-12-17 22:25:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Attachments:
log with plymouth, kernel and drm messages as verbose as i have (Flags: none)

Description Adam Williamson 2018-06-19 21:57:20 UTC
I'm not 100% sure where to ascribe this, but halfline and airlied seem to think it comes down to qemu / edk2, so filing here for now.

It seems that when running an aarch64 UEFI VM via qemu with Fedora as both the host and the guest, boot messages are never displayed on the VM's 'screen'. This is the case whether plymouth is enabled or not (tested with 'plymouth.enable=0' kernel arg). With an installed system, on boot, we see the grub menu, select an entry, then we see a screen with these messages:

EFI stub: Booting Linux Kernel...
EFI stub: Using DTB from configuration table
EFI stub: Exiting boot services and installing virtual address map...

and then nothing more. If no interaction is required during boot, the next time the screen changes, we see the login prompt. If interaction is required - e.g. to enter a decryption passphrase for an encrypted storage device - we never see the prompt and cannot decrypt the storage.

Boot messages and disk decryption prompts *are* sent to the serial console, if the VM has one. This, I believe, is due to this kernel config option in the Fedora aarch64 kernel config:


which basically means 'console=ttyAMA0' is always on the cmdline for aarch64 kernel boots. However, it is expected - according to halfline - that at least if we add 'console=tty0' or similar to the cmdline, boot messages should *also* appear on the active VT during boot...but they don't.
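
To see which consoles a given boot actually asked for, the console= arguments can be pulled out of the kernel command line. This is only a minimal sketch: cmdline_consoles and its optional file argument are made-up names for illustration, and a console chosen by the kernel configuration or the firmware may not show up in the file at all.

```shell
# Print each console= value found on a kernel command line, one per line.
# cmdline_consoles is a hypothetical helper; the optional argument lets it
# read a saved command line instead of /proc/cmdline.
cmdline_consoles() (
    file="${1:-/proc/cmdline}"
    [ -r "$file" ] || exit 1
    set -f    # do not glob-expand words taken from the command line
    # Split the command line on whitespace and keep only console= words.
    for word in $(cat "$file"); do
        case "$word" in
            console=*) printf '%s\n' "${word#console=}" ;;
        esac
    done
)
```

Running cmdline_consoles with no argument on the guest reads /proc/cmdline; passing a file makes it easy to check a command line copied out of a log.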

If we boot with plymouth debugging enabled, we can see from the plymouth debugging messages that it notices the active console on tty0 (which is tty1, of course) and tries to use it, but this just doesn't appear to work:

[ply-device-manager.c:766]                 create_devices_from_terminals:checking for consoles
[ply-device-manager.c:544]                        add_consoles_from_file:opening /sys/class/tty/console/active
[ply-device-manager.c:552]                        add_consoles_from_file:reading file
[ply-device-manager.c:590]                        add_consoles_from_file:console /dev/ttyAMA0 found!
[ply-device-manager.c:590]                        add_consoles_from_file:console /dev/tty1 found!

[ply-terminal.c:603]                             ply_terminal_open:trying to open terminal '/dev/tty1'
[ply-terminal.c:396]                 ply_terminal_refresh_geometry:looking up terminal text geometry
[ply-terminal.c:410]                 ply_terminal_refresh_geometry:terminal is now 80x25 text cells
[ply-terminal.c:447]                                 get_active_vt:Remembering that initial vt is 1
[ply-device-manager.c:652]             create_text_displays_for_terminal:adding text display for terminal /dev/tty1

[ply-boot-splash.c:174]              ply_boot_splash_add_text_display:adding 80x25 text display
[ply-terminal.c:599]                             ply_terminal_open:terminal /dev/tty1 is already open
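
The console list plymouth reads above comes from /sys/class/tty/console/active, so the same information can be checked by hand. A rough sketch follows; active_consoles is a hypothetical helper name, and unlike plymouth it does not try to resolve tty0 to the VT that is currently active.

```shell
# Print one /dev path per console named in the kernel's active-console file.
# active_consoles is a hypothetical helper; the optional argument exists so
# it can be pointed at a copy of the file instead of the live sysfs entry.
active_consoles() (
    file="${1:-/sys/class/tty/console/active}"
    [ -r "$file" ] || exit 1
    set -f    # console names are plain words, never glob patterns
    for name in $(cat "$file"); do
        printf '/dev/%s\n' "$name"
    done
)
```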

halfline pointed out that the framebuffer init messages from the kernel make an interesting comparison to the above:

[   20.319649] [drm] number of scanouts: 1
[   20.320362] [drm] number of cap sets: 0
[   20.349056] Console: switching to colour frame buffer device 128x48
[   20.360528] virtio_gpu virtio0: fb0: virtiodrmfb frame buffer device
[   20.375724] [drm] Initialized virtio_gpu 0.0.1 0 for virtio0 on minor 0

note that the resolution reported there is 128x48, but plymouth reports setting up at 80x25. "<halfline> so seems like the kernel is confused".
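
For comparing the two numbers on other boots, the fbcon geometry can be extracted from the kernel log with a small filter. This is just a sketch; fbcon_geometry is a hypothetical name and it assumes the message has exactly the wording shown in the log above.

```shell
# Read kernel log text on stdin and print the COLSxROWS geometry from every
# "Console: switching to colour frame buffer device" line.
# fbcon_geometry is a hypothetical helper name.
fbcon_geometry() {
    sed -n 's/.*Console: switching to colour frame buffer device \([0-9][0-9]*x[0-9][0-9]*\).*/\1/p'
}
```

Something like `dmesg | fbcon_geometry` gives the kernel's view, and `stty size < /dev/tty1` gives the terminal's view (printed as rows then columns, so an 80x25 terminal shows as "25 80").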

Here's some further discussion between halfline, pjones and airlied:

<halfline> airlied: so is it supposed to print "fb: switching to virtiodrmfb from EFI VGA" or something ?
<airlied> nope
<airlied> not if it never loads efifb
<airlied> and I don't see any mention of efifb anywhere
<halfline> ah okay
<airlied> it's likely the boot_vga detection stuff won't work, but there should only be one framebuffer device in the system
<halfline> it's weird that /dev/tty0 shows up as 80x25.  does vgacon get loaded on aarch64 ?
<halfline> i thought it was an x86 thing
<halfline> the thing that i wonder is, this device is clearly using efi, so why isn't it using efifb ...
<airlied> that's the dummy console
<halfline> okay, so if nothing is mapped to the console then it just lies and says 80x25
<halfline> but anyway if there's no efifb and only one fb device, this fbcon=map:1 is a waste of time..
<halfline> is there a reason efifb wouldn't get used on an efi system when CONFIG_FB_EFI=y ?
<halfline> pjones: do you know ^ ?
<pjones> halfline: presumably the firmware isn't setting it up
<halfline> firmware is provided by qemu i guess
<halfline> is there some gotcha for doing that set up with qemu ?
<halfline> are the oddly windows-looking messages at the top of https://openqa.stg.fedoraproject.org/tests/317344/file/serial0.txt useful for seeing what the firmware has set up?
<pjones> is this aarch64?  If so it's the aavmf build from edk2-aarch64
<pjones> I wonder if it knows how to do it with some video configurations but not others
<halfline> oh you're saying the firmware might be confused because it's using the spiffy new virtio gpu stuff ?
<pjones> right
<pjones> somebody might need to actually write support for it
<halfline> well shit, that's now how it's supposed to work
<halfline> *not
<halfline> code falls from the sky like mana from heaven
<pjones> c64688f36a8b3 (<kraxel>             2018-06-13 178)   INF OvmfPkg/QemuRamfbDxe/QemuRamfbDxe.inf
<pjones> 92f200c2d63c5 (<lersek>             2016-08-16 179)   INF OvmfPkg/VirtioGpuDxe/VirtioGpu.inf
<pjones> hrm, looks like upstream has that code
<halfline> yummy mana
<halfline> adamw: maybe your stuff is just too old or something
<pjones> the firmware in f28 is edk2-aarch64-20180529gitee3198e672e2-1.fc28.noarch , so it should be new enough
<pjones> I wonder if OvmfPkg/QemuVideoDxe is somehow keeping VirtioGpuDxe from taking over - can you see the firmware menus at all?
<pjones>     VgpuGop->GopModeInfo.PixelFormat = PixelBltOnly;
<halfline> yea it showed some firmware messages in the screenshots
<pjones> I don't see any other pixel format set there, so what's going on is the VirtioGpu driver only sets up a Blt() surface, not a linear framebuffer, so efifb can't support it because the Blt() interface goes away after ExitBootServices()
<halfline> EFI stub: messages
<pjones> (the driver in the firmware)
<halfline> so what you're saying is the mana is undercooked
<pjones> so that's why airlied has noticed that efifb isn't involved - because it isn't, and can't be.
<halfline> i thought it was a little chewy
<halfline> okay but the next question is
<pjones> I think that's what I'm saying, yes.  I don't know how practical making it support exposing a linear framebuffer and doing switcharoo later is or isn't.
<halfline> does efifb need to be involved
<halfline> seems like it shouldn't be
<halfline> needed
<pjones> I doubt it
<pjones> since if it is, the difference is going to be that you'll get graphics for half of boot and then whatever driver isn't working right now won't work later in the boot process.
<pjones> I'm just helping you understand why this isn't my problem ;)
<halfline> adamw: shoot kraxel a mail and see if he'll debug it
<halfline> i guess he probably has familiarity with most of the relevant code
<halfline> well unless airlied is stealthily debugging it right now
<pjones> er, probably laszlo instead?
<pjones> since it's the other driver git blame says kraxel committed
<halfline> ohh
<halfline> well that's inconvenient
<airlied> yeah kraxel is definitely the better person here :P
<pjones> okay
* airlied can't handle qemu internals for long
<pjones> not qemu, edk2
<pjones> well, possibly
<pjones> I don't know.
<pjones> Cc them both ;)

For the record, the host at present is running Fedora 28, with exactly the edk2-aarch64 version discussed there, edk2-aarch64-20180529gitee3198e672e2-1.fc28.noarch . The guest images being tested are Rawhide. This bug appears to have existed at least since March 2018 (which is when we started running these tests on Fedora staging openQA).

The qemu command used for these tests usually looks like this:

/usr/bin/qemu-system-aarch64 -serial file:serial0 -soundhw ac97 -device virtio-gpu-pci -vga std -global isa-fdc.driveA= -m 3072 -machine virt -cpu host -device virtio-rng-pci -netdev user,id=qanet0 -device virtio-net,netdev=qanet0,mac=52:54:00:12:34:56 -device virtio-scsi-pci,id=scsi0 -device virtio-blk,drive=hd1,serial=1 -drive file=raid/l1,cache=unsafe,if=none,id=hd1,format=qcow2 -drive media=cdrom,if=none,id=cd0,format=raw,file=/var/lib/openqa/share/factory/iso/Fedora-Server-dvd-aarch64-Rawhide-20180615.n.0.iso,snapshot=on -device scsi-cd,drive=cd0,bus=scsi0.0 -bios /usr/share/edk2/aarch64/QEMU_EFI-pflash.raw -device nec-usb-xhci -device usb-tablet -device usb-kbd -smp 2 -enable-kvm -no-shutdown -vnc :91,share=force-shared -qmp unix:qmp_socket,server,nowait -monitor unix:hmp_socket,server,nowait -S -monitor telnet:,server,nowait

I do note that "-device virtio-gpu-pci -vga std" seems somewhat contradictory, but in practice the VM winds up with virtio graphics. I have tried all kinds of variations on the above, none of which make things better.

Just '-vga std' results in the VM never successfully initializing graphics, which seems to be expected, according to this mail from Cole Robinson: https://www.redhat.com/archives/libvir-list/2017-June/msg01277.html - "arm/aarch64 -M virt on KVM doesn't and will never work with standard VGA card emulation."

Just passing "-device virtio-gpu-pci" (with no "-vga std") results in exactly the behaviour described in this bug, i.e. dropping "-vga std" makes no difference at all.

With the stock Fedora qemu package, "-vga virtio" does not in fact work on aarch64 - qemu startup fails with an error "qemu-system-aarch64: Virtio VGA not available". I tried compiling qemu with a patch to enable the CONFIG_VIRTIO_VGA option on aarch64; this succeeds, but doesn't help at all. Running qemu with just '-vga virtio' (no '-device virtio-gpu-pci') now works, but results in a boot where just "parallel0 console" is displayed on the screen (IIRC). Running with '-vga virtio -device virtio-gpu-pci' results in exactly the same behaviour as '-device virtio-gpu-pci' alone or '-device virtio-gpu-pci -vga std', the bug described here.

Basically, all combinations I've tried either result in the bug described here or graphics in the VM not really working at all.

This bug currently breaks all openQA tests that use encrypted storage on aarch64, so it'd be great if we could get it fixed.

Comment 1 Adam Williamson 2018-06-19 22:01:20 UTC
pwhalen, can you confirm or deny that an aarch64 VM run manually behaves the same? I don't have a local aarch64 box handy to test myself.

Comment 2 Adam Williamson 2018-06-19 22:54:52 UTC
Created attachment 1453045 [details]
log with plymouth, kernel and drm messages as verbose as i have

Comment 3 Gerd Hoffmann 2018-06-20 06:47:22 UTC
Maybe we simply have an initialization order issue here?
I see ...

  Started Forward Password Requests to Plymouth Directory Watch.

... comes before fbcon setup ...

  Console: switching to colour frame buffer device 128x48
  virtio_gpu virtio0: fb0: virtiodrmfb frame buffer device

... in the log.

Try building the aarch64 kernel with CONFIG_DRM_VIRTIO_GPU=y (which is probably a good idea anyway given we have no firmware framebuffer to use) so the framebuffer is already there when plymouth starts.
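
Whether a given kernel has the driver built in or as a module can be read from its config file. A minimal sketch, assuming the config is installed as /boot/config-$(uname -r) as on Fedora; kconfig_state is a hypothetical helper name.

```shell
# Print y, m or n for a kernel config option.
# kconfig_state is a hypothetical helper; the second argument defaults to
# the running kernel's config file under /boot (an assumption about layout).
kconfig_state() (
    option="$1"
    file="${2:-/boot/config-$(uname -r)}"
    [ -r "$file" ] || exit 1
    if grep -q "^${option}=y\$" "$file"; then
        echo y
    elif grep -q "^${option}=m\$" "$file"; then
        echo m
    else
        # Covers both "# CONFIG_X is not set" and an option that is absent.
        echo n
    fi
)
```

For example, `kconfig_state CONFIG_DRM_VIRTIO_GPU` printing m would mean the framebuffer only appears once the module is loaded.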

Comment 4 Gerd Hoffmann 2018-06-20 06:50:58 UTC
> <airlied> it's likely the boot_vga detection stuff won't work, but there
> should only be one framebuffer device in the system

Note that virtio-gpu-pci is PCI_CLASS_DISPLAY_OTHER not PCI_CLASS_DISPLAY_VGA.  Possibly this is the reason plymouth doesn't pick up the device?
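
The class of each PCI device is exposed in sysfs, so the distinction can be checked from inside the guest. The sketch below maps a class value to a short label, using 0x0300 for PCI_CLASS_DISPLAY_VGA and 0x0380 for PCI_CLASS_DISPLAY_OTHER; display_class_name and its labels are made up for illustration.

```shell
# Map a PCI class value (as found in /sys/bus/pci/devices/*/class, e.g.
# 0x038000) to a short label. display_class_name is a hypothetical helper.
display_class_name() {
    case "$1" in
        0x0300*) echo vga ;;          # PCI_CLASS_DISPLAY_VGA
        0x0380*) echo other ;;        # PCI_CLASS_DISPLAY_OTHER
        0x03*)   echo display ;;      # any other display-class device
        *)       echo non-display ;;
    esac
}
```

Looping over /sys/bus/pci/devices/* and calling `display_class_name "$(cat "$d/class")"` for each entry would show how the guest sees the virtio GPU.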

Comment 5 Gerd Hoffmann 2018-06-20 06:57:40 UTC
> Just passing "-device virtio-gpu-pci" (with no "-vga std") results in
> exactly the behaviour described in this bug, i.e. dropping "-vga std" makes
> no difference at all.

You should use "-vga none -device virtio-gpu-pci".  The -vga switch is ignored on qemu-system-aarch64 but when trying to reproduce this on x86_64 with ovmf you need it.

Comment 6 Gerd Hoffmann 2018-06-20 07:09:07 UTC
> <pjones> c64688f36a8b3 (<kraxel>             2018-06-13 178)  
> INF OvmfPkg/QemuRamfbDxe/QemuRamfbDxe.inf

If you wanna play with that you need cutting edge edk2 and qemu (git master as of today) and a kernel with https://www.spinics.net/lists/linux-efi/msg14247.html

Then use "qemu -vga none -device ramfb".

Comment 7 Adam Williamson 2018-06-20 22:48:26 UTC
So...IIUC you're basically saying the problem and the difference with x86_64 here could be that there's *no* firmware framebuffer on aarch64 (well, maybe there is as of like today with all bleeding-edge bits) but there is on x86_64, with edk2?

Will try with a CONFIG_DRM_VIRTIO_GPU=y kernel, thanks.

Comment 8 Paul Whalen 2018-06-20 23:53:08 UTC
(In reply to Adam Williamson from comment #1)
> pwhalen, can you confirm or deny that an aarch64 VM run manually behaves the
> same? I don't have a local aarch64 box handy to test myself.

Confirm, we use the serial console (ttyAMA0) for output.

Comment 9 Gerd Hoffmann 2018-06-21 06:02:17 UTC
(In reply to Adam Williamson from comment #7)
> So...IIUC you're basically saying the problem and the difference with x86_64
> here could be that there's *no* firmware framebuffer on aarch64 (well, maybe
> there is as of like today with all bleeding-edge bits) but there is on
> x86_64, with edk2?

To be exact, there is a firmware framebuffer with "virtio-vga" (which uses the vga compatibility bits of the device), but there isn't one with "virtio-gpu-pci" (which is the same device without vga compatibility).  virtio-vga is not available on aarch64.  x86_64 should behave like aarch64 when using virtio-gpu-pci (see comment 5).

Comment 10 Adam Williamson 2018-06-22 06:25:33 UTC
OK, so I've been testing this today, and I'm pretty sure you're right. I built a scratch kernel with CONFIG_DRM_VIRTIO_GPU=y - https://koji.fedoraproject.org/koji/taskinfo?taskID=27769483 - and hacked up the openQA test so it installs that kernel in the installed system chroot after install is complete, before rebooting. With that change, we *do* get messages during boot, including the decryption prompt.

So, re-assigning to kernel. I guess the recommendation is to make this change in the official kernel builds at least until we have the firmware framebuffer working on aarch64 with packaged edk2, kernel and qemu?

Thanks a lot!

Comment 11 Peter Robinson 2018-06-22 16:12:26 UTC
> So, re-assigning to kernel. I guess the recommendation is to make this
> change in the official kernel builds at least until we have the firmware
> framebuffer working on aarch64 with packaged edk2, kernel and qemu?

What change is being recommended here?

Comment 12 Adam Williamson 2018-06-22 16:16:42 UTC
Changing CONFIG_DRM_VIRTIO_GPU=m to CONFIG_DRM_VIRTIO_GPU=y for aarch64, as there is currently no good way to get a firmware framebuffer in aarch64 VMs, so this is the only way to get early boot messages (inc. decryption prompts and the like) to show up on a VT.

Note this requires some other changes, the diff I wound up with is:

--- a/kernel-aarch64.config
+++ b/kernel-aarch64.config
@@ -1377,6 +1377,7 @@ CONFIG_DRM_AST=m
 # CONFIG_DRM_CDNS_DSI is not set
+# CONFIG_DRM_DEBUG_MM is not set
@@ -1402,7 +1403,7 @@ CONFIG_DRM_I2C_SIL164=m
 # CONFIG_DRM_LEGACY is not set
@@ -1480,7 +1481,7 @@ CONFIG_DRM_VC4_HDMI_CEC=y
 # CONFIG_DRM_XEN is not set
 # CONFIG_DS1682 is not set
 # CONFIG_DS1803 is not set
@@ -2166,7 +2167,7 @@ CONFIG_HZ_100=y
 # CONFIG_HZ_300 is not set
 # CONFIG_HZ_500 is not set
 # CONFIG_HZ_PERIODIC is not set
 # CONFIG_I2C_ALI1535 is not set

I suppose another possibility might be to keep building it as a module, but get dracut to load it during the initramfs phase...haven't tried that yet.

Comment 13 Adam Williamson 2018-06-22 19:47:47 UTC
Update: I think I've found a further wrinkle here, investigating the dracut angle.

I think dracut actually already tries to pull DRM modules into the initramfs. In generic mode it pulls in the whole of drivers/gpu/drm , which should ensure virtio-gpu is included...and I think it actually is, and is loaded during the initramfs phase of boot, because when booting an installer image (which has a generic initramfs) we *do* get messages from about halfway through the initramfs phase.

However, I think there may be a bug in what it does in hostonly mode. It does this:

        for i in /sys/bus/{pci/devices,soc/devices/soc?}/*/modalias; do
            [[ -e $i ]] || continue
            if hostonly="" dracut_instmods --silent -s "drm_crtc_init" -S "iw_handler_get_spy" $(<$i); then
                if strstr "$(modinfo -F filename $(<$i) 2>/dev/null)" radeon.ko; then
                    hostonly='' instmods amdkfd

I think the bug is that we're missing a bit in that 'for' loop. *It doesn't look in /sys/bus/virtio/devices*...which is exactly where the entry for virtio-gpu will be.

So, I'm currently testing a patch to add virtio/devices to that bit. If I'm right, that should cause virtio-gpu to be pulled into the host-only initramfs generated during install, and then on the installed system, we should get a framebuffer (and hence boot messages) from about halfway through the initramfs phase...which should be early enough for us to see decryption prompts.

If that's the case, we can fix this just by fixing dracut, without touching the kernel.
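
To illustrate the idea (this is not the actual patch), the sketch below walks the same modalias files as the loop quoted above, plus the virtio bus. list_modaliases and its optional sysfs-root argument are hypothetical and exist only so the lookup can be tried against a test directory.

```shell
# Print the modalias of every device on the pci, virtio and soc buses.
# list_modaliases is a hypothetical helper, not part of dracut; the optional
# argument replaces /sys so the function can run against a fake tree.
list_modaliases() (
    root="${1:-/sys}"
    for f in "$root"/bus/pci/devices/*/modalias \
             "$root"/bus/virtio/devices/*/modalias \
             "$root"/bus/soc/devices/soc?/*/modalias; do
        # An unmatched glob stays as a literal pattern; skip it.
        [ -e "$f" ] || continue
        cat "$f"
    done
)
```

On an installed system, comparing this output with what `lsinitrd` lists for the initramfs would be one way to see whether the virtio GPU module made it into the host-only image; the exact module file name to look for is an assumption I have not checked.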

Comment 14 Adam Williamson 2018-06-22 20:16:27 UTC
So my test appears to confirm that this fixes things, so I've gone ahead and submitted the patch upstream - https://github.com/dracutdevs/dracut/pull/418 - and done a build for Rawhide with the patch included - https://koji.fedoraproject.org/koji/taskinfo?taskID=27791964 . Should hopefully be fixed with the next compose that includes that build.

Comment 15 Jan Kurik 2018-08-14 11:21:28 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 29 development cycle.
Changing version to '29'.

Comment 16 Adam Williamson 2018-12-17 22:25:21 UTC
This ought to be fixed by now.

Comment 17 Adam Williamson 2018-12-20 17:08:37 UTC
This seems to be back in current Rawhide; encrypted install tests almost always fail on the decrypt prompt not showing up. Oddly it seems to work just very occasionally, but almost always fails. Am looking into it now.

Comment 18 Adam Williamson 2018-12-20 17:12:25 UTC
I suspect plymouth for the new issue. This seems to have been working reliably up until about late October, then broke some time between then and mid-November (hard to be more precise than that). In that timeframe I do see a new dracut, but without any obviously incriminating changes. There was also, however, a new plymouth - 0.9.4 - which comes with some very related changes (to the drm code), so that looks like a good suspect. Will file a new bug.

Comment 19 Zbigniew Jędrzejewski-Szmek 2018-12-20 17:18:01 UTC
There's a general bug/problem that prompts from systemd-ask-password are not visible on the console because they get overwritten by kernel messages. A user can always press ^L to reprint the prompt, and that works fine. Unfortunately we don't have an easy way to say "keep this line at the bottom". It's a race condition, so it might be broken by code changes even if they are not "wrong".

Comment 20 Adam Williamson 2018-12-20 17:33:03 UTC
Nah, that's not the problem: it's not that the prompt gets scrolled off the screen by other content, *nothing* is displayed. I think it's plymouth DRM mode init stuff. See https://bugzilla.redhat.com/show_bug.cgi?id=1661288 .

(Note: I think the problem you mention doesn't happen when Plymouth is active, as plymouth really *does* display password prompts 'over the top of' boot messages. I think that problem only happens when Plymouth is disabled. IMBW though!)