Bug 1661288 - Boot message display, and initial console display, sent to serial console not tty0 on aarch64 qemu VMs (due to ACPI SPCR table)
Summary: Boot message display, and initial console display, sent to serial console not...
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: qemu
Version: rawhide
Hardware: aarch64
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Fedora Virtualization Maintainers
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-12-20 17:29 UTC by Adam Williamson
Modified: 2022-11-08 14:56 UTC
CC List: 38 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Embargoed:


Attachments
Debug patch (682 bytes, patch)
2019-04-02 17:24 UTC, Hans de Goede
no flags

Description Adam Williamson 2018-12-20 17:29:41 UTC
Since plymouth 0.9.4 showed up in Rawhide - I *think* - some of the openQA aarch64 tests are commonly failing. They seem to be the tests that rely on something showing up on the display during the boot process before any graphical login manager is started or a manual tty switch (ctrl-alt-Fx) is performed. Particularly, the anaconda text install test and all the tests that use disk encryption (and so need plymouth to display the decryption prompt) seem to fail often.

When the tests fail, after the bootloader, instead of anything showing up on the screen, we get the qemu 'Guest disabled display.' screen:

https://openqa.stg.fedoraproject.org/tests/442526#step/_boot_to_anaconda/4

When the test passes, it seems this screen is displayed for a short time after the bootloader, and then after a few seconds the screen 'comes to life' with boot messages displayed.

When openQA tests fail, openQA is set to switch to tty2 and do some diagnostics and stuff. This always seems to work - when the test hits ctrl-alt-f2, the 'Guest disabled display.' message disappears, and the tty appears.

The test history is a bit messy with failures from other causes and failed composes and stuff, so I can't pin down with complete certainty when this started to happen. But it *seems* to be around late October / early November, which puts plymouth 0.9.4 in the frame, as that showed up on 2018-11-05. And it definitely had a bunch of drm-related changes.

I will try to get some debug logs and stuff.

Comment 1 Adam Williamson 2018-12-20 17:30:36 UTC
CCing Hans, as he's behind a lot of the DRM changes. I note that a lot of changes landed *after* 0.9.4 too; I'm not yet sure whether any of those commits seem likely to help with this.

Comment 2 Hans de Goede 2018-12-20 18:46:26 UTC
Hi,

In the original description you also mention anaconda text mode installs failing; that sort of rules out plymouth, since plymouth is not used in that scenario.

Since you also mentioned seeing a 'Guest disabled display.' message, I'm leaning towards problems with the qxl (kernel) driver. Is it possible to change the display type to something other than qxl for the tests? (You can still use spice; spice also works with the regular cirrus logic graphics emulation.)

Something else which might be worth trying is adding "fbcon=nodefer" to the kernel command line, although we have had fbcon deferral in place for quite a while now, basically since approximately halfway through the F29 cycle.

Regards,

Hans

Comment 3 Hans de Goede 2018-12-20 18:48:27 UTC
Ah, I just saw that the summary says this is with virtio-gpu graphics; it still might be worth trying another type of virtual GPU.

Comment 4 Adam Williamson 2018-12-20 22:47:02 UTC
Hmm, I thought plymouth *was* in the installer environment - rhgb does not seem to be on the command line at present, so it won't be fully 'enabled', but it does do some stuff even without that. I'll look into it. I didn't think to check for kernel changes around the approximate time this started breaking, but indeed kernel is another obvious potential suspect.

I can test fbcon=nodefer easily enough, so I'll do that. Thanks!

Comment 5 Adam Williamson 2019-03-29 15:18:48 UTC
Hum, I seem not to have followed up on this, but for the record the bug *is* still happening on current F30 and Rawhide openQA tests. I'll try and check some of the stuff we discussed above.

Comment 6 Adam Williamson 2019-03-29 15:24:54 UTC
On the graphics device thing: the only choices openQA thinks are possible for aarch64 are virtio-gpu and 'VGA'. I just tried 'VGA' and it doesn't work at all; the bootloader doesn't get displayed.

Comment 7 Adam Williamson 2019-03-29 16:36:15 UTC
With fbcon=nodefer , it behaves *differently*, but still not correctly. I don't get 'Guest disabled display' any more - instead just a black screen with a blinking cursor, but nothing else (no boot messages, no decryption prompt) is ever displayed. See video at https://openqa.stg.fedoraproject.org/tests/507667/file/video.ogv , skip to around 1:00 to see the end of the install process and the subsequent boot with 'fbcon=nodefer'.

Comment 8 Hans de Goede 2019-03-29 19:58:57 UTC
I still have the feeling that this is an issue with the virtio-gpu driver on arm64. Can you try passing nomodeset on the kernel command line and/or adding modprobe.blacklist=virtio-gpu?

And have you tried removing rhgb from the cmdline of the non installer test-cases?

Comment 9 Adam Williamson 2019-03-29 21:18:06 UTC
trying those now (removing rhgb is the hardest to do, for...reasons...but I'll figure it out). To clear up one question from earlier, plymouth is definitely included in the installer environment, though 'rhgb' is not on the command line (so I don't expect removing it on the installed system to do anything, but I will try it).

I'm also going to try 'plymouth.enable=0'...

Comment 10 Adam Williamson 2019-03-29 21:44:00 UTC
So:

1. 'fbcon=nodefer nomodeset' fails to the blinking cursor - https://openqa.stg.fedoraproject.org/tests/507910
2. just 'nomodeset' fails to 'Guest disabled display.' - https://openqa.stg.fedoraproject.org/tests/507911
3. 'plymouth.enable=0' fails to 'Guest disabled display.' - https://openqa.stg.fedoraproject.org/tests/507912
4. 'fbcon=nodefer modprobe.blacklist=virtio-gpu' fails to the blinking cursor - https://openqa.stg.fedoraproject.org/tests/507928

Also - the installed system does not actually have 'rhgb' on the cmdline. You can see this in the bootloader screenshots from the above tests, e.g. https://openqa.stg.fedoraproject.org/tests/507911#step/disk_guided_encrypted_postinstall/2 . I think this is because it's a minimal install and plymouth isn't included in minimal; anaconda is set not to add 'rhgb' in that case.

so, not getting anywhere fast :/

Comment 11 Adam Williamson 2019-03-29 21:58:27 UTC
Hum, here is a thing. I got logs from a test that passed (didn't need anything to be visible during boot), and the same test on x86_64. On aarch64, kernel logs this:

Mar 26 08:42:09 localhost.localdomain kernel: printk: console [ttyAMA0] enabled

on x86_64, it logs this:

Mar 29 08:28:33 localhost.localdomain kernel: printk: console [tty0] enabled

which looks rather like, on aarch64, the kernel is deciding to log to the serial console and not tty0!

Hah. In fact, now I've found that, this is definitely what's going on...openQA logs the serial console messages. If you look at the log from one of our aarch64 tests:

https://openqa.stg.fedoraproject.org/tests/507911/file/serial0.txt

all the messages are there, and indeed the log ends with:

[    3.350236] [drm] Initialized virtio_gpu 0.1.0 0 for virtio0 on minor 0
[  OK  ] Found device /dev/disk/by-…d-caef-4492-b947-130837f25c0a.

         Starting Cryptography Setu…caef-4492-b947-130837f25c0a...


Please enter passphrase for disk luks-c63f486d-caef-4492-b947-130837f25c0a::

so, that's definitely the problem here. The question, I guess, is why does it do that on aarch64 but not x86_64? The fact that the serial console is ttyAMA0 on aarch64 but ttyS0 on x86_64 might have something to do with it, I guess?

Re-assigning to the kernel for now...

Comment 12 Adam Williamson 2019-03-29 22:15:46 UTC
........hah.

So I have found the evil culprit who introduced this problem! The notorious, dastardly no-gooder...

...all I had to do was look in a mirror.

We had actually noticed something very much like this before, and there was a bug for it (https://bugzilla.redhat.com/show_bug.cgi?id=1594402), and I had workarounds in the openQA tests.

The bug figured this was happening because of an explicit kernel config option that used to be set for aarch64 - 'CONFIG_CMDLINE="console=ttyAMA0"'. That got taken out, and the bug was closed. Then back in December 2018 - as you'll notice, right when this bug was filed...I removed the openQA workaround:

https://pagure.io/fedora-qa/os-autoinst-distri-fedora/c/f40599ee156c2e1f5c44851f96ecd15c9ebe6f75?branch=master

only somehow, despite what the commit message said, when it turned out that this broke things again I completely forgot about taking the workaround out, and just filed this bug...

So, it seems that we wind up with a console *only* on ttyAMA0 on aarch64 for some reason, even *without* the 'CONFIG_CMDLINE="console=ttyAMA0"' kernel option. I don't know why; if kernel / aarch64 folks could look into that, it'd be very much appreciated. In the meantime, I'll put the workaround back into the openQA tests...

Comment 13 Hans de Goede 2019-03-30 11:37:36 UTC
Is the qemu-based VM on which openQA is running perhaps using devicetree rather than UEFI? In devicetree it is possible to indicate the preferred/default console= value by setting stdout-path in the /chosen node:

https://github.com/torvalds/linux/blob/master/Documentation/devicetree/bindings/chosen.txt

Doing a "grep -R stdout-path arch/arm64/boot/dts" in the kernel tree shows that most
dts files define stdout-path and point this to the serial0 or serial1 alias.

Comment 14 Adam Williamson 2019-03-31 00:36:12 UTC
No, it's UEFI.

Comment 15 Hans de Goede 2019-03-31 07:43:44 UTC
(In reply to Adam Williamson from comment #14)
> No, it's UEFI.

Well, it could still be something similar: ACPI tables can embed devicetree data, and AFAIK this is sometimes used on aarch64. Maybe UEFI also has something UEFI-specific which is similar?

Comment 18 Adam Williamson 2019-04-01 15:05:37 UTC
Gahh.

Comment 19 Adam Williamson 2019-04-01 15:07:24 UTC
OK, here is the real aarch64 command (sorry, posted a ppc64 one before):

/usr/bin/qemu-system-aarch64 -device virtio-gpu-pci -only-migratable -chardev ringbuf,id=serial0,logfile=serial0,logappend=on -serial chardev:serial0 -soundhw ac97 -m 3072 -machine virt -cpu host -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -netdev user,id=qanet0 -device virtio-net,netdev=qanet0,mac=52:54:00:12:34:56 -device nec-usb-xhci -device usb-tablet -device usb-kbd -smp 2 -enable-kvm -no-shutdown -vnc :92,share=force-shared -device virtio-serial -chardev socket,path=virtio_console,server,nowait,id=virtio_console,logfile=virtio_console.log,logappend=on -device virtconsole,chardev=virtio_console,name=org.openqa.console.virtio_console -chardev socket,path=qmp_socket,server,nowait,id=qmp_socket,logfile=qmp_socket.log,logappend=on -qmp chardev:qmp_socket -S -device virtio-scsi-pci,id=scsi0 -blockdev driver=file,node-name=hd0-file,filename=/var/lib/openqa/pool/2/raid/hd0,cache.no-flush=on -blockdev driver=qcow2,node-name=hd0,file=hd0-file,cache.no-flush=on -device virtio-blk,id=hd0-device,drive=hd0,serial=hd0 -blockdev driver=file,node-name=cd0-overlay0-file,filename=/var/lib/openqa/pool/2/raid/cd0-overlay0,cache.no-flush=on -blockdev driver=qcow2,node-name=cd0-overlay0,file=cd0-overlay0-file,cache.no-flush=on -device scsi-cd,id=cd0-device,drive=cd0-overlay0,serial=cd0 -drive id=pflash-code-overlay0,if=pflash,file=/var/lib/openqa/pool/2/raid/pflash-code-overlay0,unit=0,readonly=on -drive id=pflash-vars-overlay0,if=pflash,file=/var/lib/openqa/pool/2/raid/pflash-vars-overlay0,unit=1

Comment 20 Adam Williamson 2019-04-01 15:14:17 UTC
and here is an x86_64 *UEFI* one (sorry, I am awful at stuff this morning):

/usr/bin/qemu-system-x86_64 -vga std -only-migratable -chardev ringbuf,id=serial0,logfile=serial0,logappend=on -serial chardev:serial0 -soundhw ac97 -global isa-fdc.driveA= -m 2048 -cpu Nehalem -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -netdev user,id=qanet0 -device virtio-net,netdev=qanet0,mac=52:54:00:12:34:56 -device usb-ehci -device usb-tablet -smp 2 -enable-kvm -no-shutdown -vnc :99,share=force-shared -device virtio-serial -chardev socket,path=virtio_console,server,nowait,id=virtio_console,logfile=virtio_console.log,logappend=on -device virtconsole,chardev=virtio_console,name=org.openqa.console.virtio_console -chardev socket,path=qmp_socket,server,nowait,id=qmp_socket,logfile=qmp_socket.log,logappend=on -qmp chardev:qmp_socket -S -device virtio-scsi-pci,id=scsi0 -blockdev driver=file,node-name=hd0-file,filename=/var/lib/openqa/pool/9/raid/hd0,cache.no-flush=on -blockdev driver=qcow2,node-name=hd0,file=hd0-file,cache.no-flush=on -device virtio-blk,id=hd0-device,drive=hd0,serial=hd0 -blockdev driver=file,node-name=cd0-overlay0-file,filename=/var/lib/openqa/pool/9/raid/cd0-overlay0,cache.no-flush=on -blockdev driver=qcow2,node-name=cd0-overlay0,file=cd0-overlay0-file,cache.no-flush=on -device scsi-cd,id=cd0-device,drive=cd0-overlay0,serial=cd0 -drive id=pflash-code-overlay0,if=pflash,file=/var/lib/openqa/pool/9/raid/pflash-code-overlay0,unit=0,readonly=on -drive id=pflash-vars-overlay0,if=pflash,file=/var/lib/openqa/pool/9/raid/pflash-vars-overlay0,unit=1

Comment 21 Hans de Goede 2019-04-01 18:19:53 UTC
Ok, so I notice that the x86_64 UEFI VM is using "-vga std", whereas the aarch64 one is using -device virtio-gpu-pci.

I wonder whether the UEFI firmware on aarch64 is providing an EFI framebuffer on top of the virtio-gpu device. If not, the kernel will not have any video output / "graphical" (non-serial) console, and it will likely default to serial output because of that!

Adam, can you see if grub gets shown on the video output of the virtio-gpu on aarch64? Something else to look at is the output of "dmesg | grep efifb"; on x86_64 (real hardware) this gives:

[    0.564527] pci 0000:00:02.0: BAR 2: assigned to efifb
[    1.019222] efifb: probing for efifb
[    1.019234] efifb: showing boot graphics
[    1.019980] efifb: framebuffer at 0xc0000000, using 8100k, total 8100k
[    1.019980] efifb: mode is 1920x1080x32, linelength=7680, pages=1
[    1.019981] efifb: scrolling: redraw
[    1.019982] efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0

I have a hunch that the kernel is putting its console on serial, simply because there is no video output the kernel recognizes until much later on when the virtio-gpu kernel module gets loaded from the initrd.

Comment 22 Adam Williamson 2019-04-01 18:43:59 UTC
"Adam, can you see if grub gets shown on the video-output of the virtio-GPU on aarch64 ?"

Yes it does. Otherwise the test would fail at other points (and changing the kernel options for testing would not be possible, as we do this interactively in the boot menu).

Also note that the 'workaround' I mentioned for this in openQA is simply to pass an explicit 'console=tty0' on the cmdline, which works fine; the messages show up on the console just great when we do that.

Comment 23 Adam Williamson 2019-04-01 18:45:44 UTC
er, to be clear, i mean the messages show up on the *screen* when we do that.

Comment 24 Hans de Goede 2019-04-02 07:33:16 UTC
(In reply to Adam Williamson from comment #22)
> "Adam, can you see if grub gets shown on the video-output of the virtio-GPU
> on aarch64 ?"
> 
> Yes it does. Otherwise the test would fail at other points (and changing the
> kernel options for testing would not be possible, as we do this
> interactively in the boot menu).

Ok, so grub has a way to show text, so at least the text-interface part of the
UEFI GOP (video BIOS) is working, but there could still be an issue with the
graphical mode.

(In reply to Adam Williamson from comment #23)
> er, to be clear, i mean the messages show up on the *screen* when we do that.

Right, but the question is if any messages show up *before* the initrd loads the virtio-gpu drm driver.

Can you grab a dmesg from one of the openQA aarch64 vms and attach it here?

> Also note that the 'workaround' I mentioned for this in openQA is simply to
> pass an explicit 'console=tty0' on the cmdline, which works fine; the
> messages show up on the console just great when we do that.

That forces the kernel to use the dummy-console as primary console when no
other video-output is used. I suspect that no video-output is detected during
early boot and this causes the kernel to default to the serial console when no
console= argument is specified. Note this is just a theory; if you can attach
a dmesg, then I can try to verify it.

Comment 25 Peter Robinson 2019-04-02 11:28:56 UTC
(In reply to Adam Williamson from comment #14)
> No, it's UEFI.

To be clear UEFI and DeviceTree in this context are not mutually exclusive. In this case it's both UEFI and device tree.

Comment 26 Peter Robinson 2019-04-02 11:40:05 UTC
What is the kernel command line, as shown by 'dmesg | grep "Kernel command line:"', when you don't explicitly add tty0?

Also, the DT that qemu creates probably has a serial console specified based on some of the qemu command line options. There seem to be a few serial options, so I'm not sure what impact they would have:
-chardev ringbuf,id=serial0,logfile=serial0,logappend=on -serial chardev:serial0

-device virtio-serial -chardev socket,path=virtio_console,server,nowait,id=virtio_console,logfile=virtio_console.log,logappend=on

-device virtconsole,chardev=virtio_console,name=org.openqa.console.virtio_console -chardev socket,path=qmp_socket,server,nowait,id=qmp_socket,logfile=qmp_socket.log,logappend=on -qmp chardev:qmp_socket

Also, for graphics you need a USB input device. I'm not sure if "-device nec-usb-xhci" is a valid aarch64 USB device; I think it should be something generic like "(devices, 'controller', None, {'type': 'usb', 'index': '0'})"

Comment 27 Adam Williamson 2019-04-02 14:51:34 UTC
peter: hans: https://openqa.stg.fedoraproject.org/tests/510455/file/disk_custom_btrfs_postinstall-dmesg.log should answer both your questions. That is dmesg from a test run with the workaround disabled.

Kernel command line: BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.1.0-0.rc3.git0.1.fc31.aarch64 root=UUID=52b9e6ba-5010-4462-a8af-de827becd1e7 ro rootflags=subvol=root

Comment 28 Hans de Goede 2019-04-02 15:14:04 UTC
Right, so this confirms part of my thinking. Even though aarch64 kernels are built with efifb support, the efifb driver is not loading/binding. On x86_64 I get:

[hans@shalem linux]$ dmesg | grep efifb
[    0.568619] pci 0000:00:02.0: BAR 2: assigned to efifb
[    1.048239] efifb: probing for efifb
[    1.048251] efifb: showing boot graphics
[    1.048996] efifb: framebuffer at 0xc0000000, using 8100k, total 8100k
[    1.048997] efifb: mode is 1920x1080x32, linelength=7680, pages=1
[    1.048997] efifb: scrolling: redraw
[    1.048998] efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0

In the dmesg you link to, there is none of the above. Hmm, I'm not sure that
is the cause though, because this message gets printed earlier (on my x86_64 system):

[hans@shalem linux]$ dmesg | grep console
[    0.479376] printk: console [tty0] enabled

Where as the dmesg from the aarch64 vm has:

[    0.201141] printk: console [ttyAMA0] enabled

This all suggests that the tty0 vs ttyAMA0 choice is made before the efifb gets enumerated and thus the lack of efifb support on aarch64 is not the cause of this.

Still, the lack of efifb support on aarch64 is a problem: it means that if the initrd somehow does not have the virtio-gpu driver (or fails to load it) and also fails to find the rootfs, the user will just see a black screen, as the kernel relies on efifb support to show errors before a DRM driver is loaded. But that is a different issue, which probably should have its own bug to track it. This is probably caused by screen_info.orig_video_isVGA == VIDEO_TYPE_EFI not getting set in the aarch64 VM.

As for what is the cause of this, chances are Peter is right and qemu is injecting a linux,stdout property into the dtb embedded in the ACPI tables.

Comment 29 Adam Williamson 2019-04-02 15:34:22 UTC
Right, I've been following the 'printk: console [foo] enabled' thing too. It's actually interesting that we only get *one* such message in either case, because AFAICS there's nothing in the relevant function to 'enforce' that you can only register one real console. You can't register a boot console after a real console has been registered, but AFAICS you can register as many 'real' consoles as you like, so it seems slightly interesting that in fact exactly one console gets registered in both cases, but it's a different one...

The function is register_console in kernel/printk/printk.c , btw. *Lots* of things call it, I haven't yet dug out the flow of exactly how and why and when it gets called in different situations.

BTW, here's a directly comparable dmesg.log, from running the same test on the same compose on x86_64 UEFI: https://openqa.stg.fedoraproject.org/tests/510954/file/disk_custom_btrfs_postinstall-dmesg.log

Comment 30 Adam Williamson 2019-04-02 15:47:41 UTC
So here is an interesting thing. On x86_64, we get this:

[    0.214612] Console: colour dummy device 80x25
[    0.214618] printk: console [tty0] enabled
[    0.214653] ACPI: Core revision 20190215

but on aarch64, we get only this:

[    0.000148] Console: colour dummy device 80x25
[    0.000234] ACPI: Core revision 20190215

this comes *before* ttyAMA0 is registered on aarch64, note, so it's not that tty0 isn't getting registered because ttyAMA0 got registered earlier.

Going by the text of those messages, I think that is coming from drivers/tty/vt/vt.c `con_init`, which has this at the end:

	pr_info("Console: %s %s %dx%d\n",
		vc->vc_can_do_color ? "colour" : "mono",
		display_desc, vc->vc_cols, vc->vc_rows);
	printable = 1;

	console_unlock();

#ifdef CONFIG_VT_CONSOLE
	register_console(&vt_console_driver);
#endif
	return 0;

the first part of that is what prints the "Console: colour dummy device 80x25" message, and then I think that `register_console(&vt_console_driver);` call is where we get tty0 enabled on x86_64.

So the obvious suspicion would be that CONFIG_VT_CONSOLE isn't defined on aarch64, but...it is. So it *seems* that for some reason we are hitting that `register_console()` call on both aarch64 and x86_64, but it's not actually enabling the console on aarch64...

Comment 31 Adam Williamson 2019-04-02 15:52:26 UTC
My best guess is that, somehow, on aarch64 that register_console call is bailing out in this block:

	/*
	 *	See if this console matches one we selected on
	 *	the command line.
	 */
	for (i = 0, c = console_cmdline;
	     i < MAX_CMDLINECONSOLES && c->name[0];
	     i++, c++) {
		if (!newcon->match ||
		    newcon->match(newcon, c->name, c->index, c->options) != 0) {
			/* default matching */
			BUILD_BUG_ON(sizeof(c->name) != sizeof(newcon->name));
			if (strcmp(c->name, newcon->name) != 0)
				continue;
			if (newcon->index >= 0 &&
			    newcon->index != c->index)
				continue;
			if (newcon->index < 0)
				newcon->index = c->index;

			if (_braille_register_console(newcon, c))
				return;

			if (newcon->setup &&
			    newcon->setup(newcon, c->options) != 0)
				break;
		}

		newcon->flags |= CON_ENABLED;
		if (i == preferred_console) {
			newcon->flags |= CON_CONSDEV;
			has_preferred = true;
		}
		break;
	}

	if (!(newcon->flags & CON_ENABLED))
		return;

but on x86_64 it isn't. I'm not sure why that would be yet, though.

Comment 32 Adam Williamson 2019-04-02 15:55:58 UTC
by 'bailing out' I mean, it winds up without the CON_ENABLED flag and returns on that 'return;' at the end. That's the only branch I can see where `register_console()` returns before printing the "console [foo] enabled" message but without explicitly logging why it's doing that (which seems to be what's happening here).

Comment 33 Hans de Goede 2019-04-02 17:22:02 UTC
On x86_64 (and others) that loop you quote is a no-op (all entries have c->name[0] == 0) unless you specify a console= argument on the kernel cmdline.

Instead this block is triggered on x86_64 (without a console= argument)

        if (!has_preferred) {
                if (newcon->index < 0)
                        newcon->index = 0;
                if (newcon->setup == NULL ||
                    newcon->setup(newcon, NULL) == 0) {
                        newcon->flags |= CON_ENABLED;
                        if (newcon->device) {
                                newcon->flags |= CON_CONSDEV;
                                has_preferred = true;
                        }
                }
        }

I believe that on aarch64 something calls add_preferred_console() (defined in
the same file), which simulates a console= argument being present. That makes
the above if condition false, so the loop you quoted actually becomes
functional and matches against the simulated console= argument.
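The interaction can be illustrated with a deliberately simplified user-space model of the two functions. This is a sketch, not kernel code: `sim_add_preferred_console()`, `sim_register_console()` and `sim_boot()` are hypothetical names, the `match_alias` field is an assumed stand-in for the PL011 driver's `->match()` callback accepting the name "pl011", and boot consoles, braille, `->setup()` and locking are ignored. The VT console registers first and the UART console second, which is the order visible in the aarch64 dmesg.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define SIM_MAX_CMDLINE  8
#define SIM_CON_ENABLED  0x1
#define SIM_CON_CONSDEV  0x2   /* console backs /dev/console */
#define SIM_VT_ENABLED   0x1   /* sim_boot() result bits */
#define SIM_UART_ENABLED 0x2

/* One console=NAME<index> entry, as recorded by add_preferred_console(). */
struct sim_cmdline_entry {
	char name[16];
	int index;
};

/* Minimal stand-in for struct console. */
struct sim_console {
	const char *name;        /* "tty", "ttyAMA", ... */
	const char *match_alias; /* assumed stand-in for ->match(); may be NULL */
	int index;               /* -1 until assigned */
	bool has_device;         /* models ->device being set */
	int flags;
};

static struct sim_cmdline_entry sim_cmdline[SIM_MAX_CMDLINE];
static int sim_preferred = -1;
static bool sim_has_preferred;

/* Model of add_preferred_console(): record an entry and make it preferred. */
static int sim_add_preferred_console(const char *name, int index)
{
	int i;

	assert(name != NULL);
	for (i = 0; i < SIM_MAX_CMDLINE && sim_cmdline[i].name[0]; i++)
		if (strcmp(sim_cmdline[i].name, name) == 0 && sim_cmdline[i].index == index)
			break; /* already recorded: just make it preferred again */
	if (i == SIM_MAX_CMDLINE)
		return -1;
	snprintf(sim_cmdline[i].name, sizeof(sim_cmdline[i].name), "%s", name);
	sim_cmdline[i].index = index;
	sim_preferred = i;
	return 0;
}

/* Model of register_console(): returns true if the console was enabled. */
static bool sim_register_console(struct sim_console *con)
{
	if (!sim_has_preferred)
		sim_has_preferred = sim_preferred >= 0;

	/* No (real or simulated) console=: first console with a device wins. */
	if (!sim_has_preferred) {
		if (con->index < 0)
			con->index = 0;
		con->flags |= SIM_CON_ENABLED;
		if (con->has_device) {
			con->flags |= SIM_CON_CONSDEV;
			sim_has_preferred = true;
		}
	}

	/* Otherwise only a console matching a recorded entry is enabled. */
	for (int i = 0; i < SIM_MAX_CMDLINE && sim_cmdline[i].name[0]; i++) {
		const struct sim_cmdline_entry *c = &sim_cmdline[i];

		if (!(con->match_alias && strcmp(con->match_alias, c->name) == 0)) {
			if (strcmp(c->name, con->name) != 0)
				continue;
			if (con->index >= 0 && con->index != c->index)
				continue;
		}
		if (con->index < 0)
			con->index = c->index;
		con->flags |= SIM_CON_ENABLED;
		if (i == sim_preferred)
			con->flags |= SIM_CON_CONSDEV;
		break;
	}
	return (con->flags & SIM_CON_ENABLED) != 0;
}

/* Register the VT console, then the UART console; report which got enabled. */
static int sim_boot(bool spcr_present)
{
	struct sim_console vt = { "tty", NULL, -1, true, 0 };
	struct sim_console uart = { "ttyAMA", "pl011", -1, true, 0 };
	int enabled = 0;

	memset(sim_cmdline, 0, sizeof(sim_cmdline));
	sim_preferred = -1;
	sim_has_preferred = false;
	if (spcr_present) /* models an SPCR-driven add_preferred_console() */
		sim_add_preferred_console("pl011", 0);

	if (sim_register_console(&vt))
		enabled |= SIM_VT_ENABLED;
	if (sim_register_console(&uart))
		enabled |= SIM_UART_ENABLED;
	return enabled;
}
```

Under these assumptions, `sim_boot(false)` returns `SIM_VT_ENABLED` (only the VT console, as on x86_64) and `sim_boot(true)` returns `SIM_UART_ENABLED` (only the UART console), which matches the aarch64 log showing `printk: console [ttyAMA0] enabled` and no `tty0` line. In this model, the `sim_has_preferred` flag is also why exactly one console is enabled in each case: once it is set, later consoles are only enabled if they match a recorded entry.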

Likely the culprit is: drivers/of/base.c: of_console_check()

/**
 * of_console_check() - Test and setup console for DT setup
 * @dn - Pointer to device node
 * @name - Name to use for preferred console without index. ex. "ttyS"
 * @index - Index to use for preferred console.
 *
 * Check if the given device node matches the stdout-path property in the
 * /chosen node. If it does then register it as the preferred console and return
 * TRUE. Otherwise return FALSE.
 */
bool of_console_check(struct device_node *dn, char *name, int index)
{
        if (!dn || dn != of_stdout || console_set_on_cmdline)
                return false;

        /*
         * XXX: cast `options' to char pointer to suppress complication
         * warnings: printk, UART and console drivers expect char pointer.
         */
        return !add_preferred_console(name, index, (char *)of_stdout_options);
}
EXPORT_SYMBOL_GPL(of_console_check);

Which gets called from:

drivers/tty/serial/serial_core.c: uart_add_one_port()

Comment 34 Hans de Goede 2019-04-02 17:24:49 UTC
Created attachment 1551099
Debug patch

If you can do an aarch64 kernel scratch build with this patch added and then boot one of the openQA VMs with the resulting kernel, then we can verify my theory that something is calling add_preferred_console() and causing this problem.

Comment 35 Hans de Goede 2019-04-02 17:31:41 UTC
Never mind that scratch build: grepping the entire kernel tree for "add_preferred_console", I found what seems to be the ACPI equivalent of linux.stdout in devicetree, specifically for ACPI on ARM:

drivers/acpi/spcr.c:

int __init acpi_parse_spcr(bool enable_earlycon, bool enable_console)
{
       ...

        if (enable_console)
                err = add_preferred_console(uart, 0, opts + strlen(uart) + 1);
        else
                err = 0;
done:
        acpi_put_table((struct acpi_table_header *)table);
        return err;
}

This gets called from:

arch/arm64/kernel/acpi.c

With the enable_console variable being hard-coded to 'true'.

And from the dmesg of the aarch64 openqa vm:

[    0.000000] ACPI: SPCR: console: pl011,mmio,0x9000000,9600

So that is your culprit.

At which point the question becomes how to get qemu not to fill in / add an SPCR table to the ACPI tables it passes to Linux.
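To make the effect of that `add_preferred_console()` call concrete, here is a small sketch of the string handling. It is not the kernel function: `spcr_console_args()` is a hypothetical helper, the format string is an assumption reconstructed from the printed `ACPI: SPCR: console:` line, and only the `opts + strlen(uart) + 1` offset is taken directly from the excerpt above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "<uart>,<iotype>,0x<address>,<baud>" as the SPCR
 * log line is printed (assumed format), then split it into the console name
 * and the options passed as add_preferred_console(uart, 0, opts + strlen(uart) + 1).
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int spcr_console_args(const char *uart, const char *iotype,
			     unsigned long long address, int baud,
			     char *opts, size_t opts_len,
			     const char **name, const char **options)
{
	int len;

	assert(uart && iotype && opts && name && options);
	len = snprintf(opts, opts_len, "%s,%s,0x%llx,%d",
		       uart, iotype, address, baud);
	if (len < 0 || (size_t)len >= opts_len)
		return -1;                      /* truncated: the offset would be unsafe */
	*name = uart;                           /* e.g. "pl011" */
	*options = opts + strlen(uart) + 1;     /* skip "<uart>," */
	return 0;
}
```

With the values from the dmesg line, this yields `opts` = "pl011,mmio,0x9000000,9600", the name "pl011" and the options "mmio,0x9000000,9600". Since the recorded name is "pl011" rather than "tty", the VT console's default name match in register_console() cannot succeed once this entry exists.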

Comment 36 Adam Williamson 2019-04-02 20:54:50 UTC
So, it looks to me rather like qemu just unconditionally includes an SPCR table in the ACPI tables for ARM, ever since this commit:

https://git.kraxel.org/cgit/qemu/commit/?id=f264d51d8ad939d7fb339d61a8cf680ed0cb21a2

That code is basically unchanged since: there is still `build_spcr`, which creates the table, and it is unconditionally called by `virt_acpi_build`.

qemu folks, any thoughts on this?

Comment 37 Adam Williamson 2019-04-02 20:58:42 UTC
The upstream merge request was https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg03934.html , but it's similarly uninformative on *why* this was done.

Comment 38 Adam Williamson 2019-04-02 21:03:32 UTC
Hum, there's some interesting context in the thread for the kernel merge request for *parsing* the SPCR:

http://lkml.iu.edu/hypermail/linux/kernel/1603.3/04619.html

"'ARM Server Base Boot Requirements' [1] mentions SPCR (Serial Port Console Redirection Table) [2] as a mandatory ACPI table that specifies the configuration of serial console."

(the [1] points to http://infocenter.arm.com/help/topic/com.arm.doc.den0044c/Server_Base_Boot_Requirements_v1_1_Arm_DEN_0044C.pdf )

So it suggests this SPCR is *mandatory* on ARM. Given that, is it really correct that the kernel's behaviour on encountering it is apparently to always use only it as the console, unless something else is specified on the cmdline?

Comment 39 Adam Williamson 2019-04-02 21:36:10 UTC
The SPCR itself was invented by Microsoft, it seems:

https://docs.microsoft.com/en-us/windows-hardware/drivers/serports/serial-port-console-redirection-table

There's some interesting text in the explanation there:

"The table provides information about the configuration and use of the serial port or non-legacy UART interface. On a system where the BIOS or system firmware uses the serial port for console input/output, this table should be used to convey information about the settings, to ensure a seamless transition between the firmware console output and Windows EMS output."

That - especially "on a system where the BIOS or system firmware uses the serial port for console input/output" - seems somewhat to imply that the table should *only* be present when "the BIOS or system firmware uses the serial port for console input/output", i.e. its presence implies a specific configuration of the system. But then the "ARM Server Base Boot Requirements" seem to mandate its presence in all circumstances.

This seems like an unfortunate conflict: under the original (Microsoft) design it seems reasonable to assume that if an SPCR is present, console output probably should be directed to that console; but under the ARM SBBR, since the SPCR's presence is *mandatory*, it can't really be read as having any definite meaning about where the administrator of the system in question actually wants console output to go.

Comment 40 Peter Robinson 2019-04-03 01:30:06 UTC
(In reply to Adam Williamson from comment #39)
> The SPCR itself was invented by Microsoft, it seems:
> 
> https://docs.microsoft.com/en-us/windows-hardware/drivers/serports/serial-
> port-console-redirection-table
> 
> There's some interesting text in the explanation there:

That's fascinating, except SPCR is part of the ACPI standard, so it is likely completely irrelevant in this situation: I don't believe that qemu generates ACPI tables but rather a Device Tree, although it's possible it's doing ACPI.

Comment 41 Adam Williamson 2019-04-03 01:33:48 UTC
I'm pretty sure it does, we found the specific part of qemu where it happens. See https://git.kraxel.org/cgit/qemu/commit/?id=f264d51d8ad939d7fb339d61a8cf680ed0cb21a2 .

Comment 42 Adam Williamson 2019-04-03 01:36:06 UTC
also note the log line Hans pointed out in #c35:

[    0.000000] ACPI: SPCR: console: pl011,mmio,0x9000000,9600

Comment 43 ardb 2019-04-03 03:22:16 UTC
Note that on bare metal ARM platforms in edk2-platforms, we typically include the ConsolePrefDxe driver, which permits the user to configure whether the graphical or serial console is the preferred console when both are available. This code removes the SPCR table (and the /chosen/stdout-path DT property) if an EFI framebuffer is available and the preference is set to 'graphical'.

It seems to me that we need something similar for QEMU as well.
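A minimal sketch of the ConsolePrefDxe policy described above, written in C only to make the decision explicit. It assumes the firmware already knows whether an EFI framebuffer is present and has already read the user's preference; the names `console_pref`, `console_tables` and `console_policy` are hypothetical, and this is not the edk2 source.

```c
#include <stdbool.h>

/* Hypothetical model of the user's console preference. In edk2 this is
 * kept in an NV variable; here it is just an enum. */
enum console_pref { PREF_GRAPHICAL, PREF_SERIAL };

/* The serial-console hints the firmware hands to the OS. */
struct console_tables {
    bool spcr;        /* publish the ACPI SPCR table */
    bool stdout_path; /* keep the /chosen/stdout-path DT property */
};

/* Drop both hints only when a framebuffer exists AND the user prefers
 * graphics; in every other case the serial port stays the preferred
 * console. */
static struct console_tables console_policy(bool have_framebuffer,
                                            enum console_pref pref)
{
    bool serial_preferred = !(have_framebuffer && pref == PREF_GRAPHICAL);
    struct console_tables t = {
        .spcr = serial_preferred,
        .stdout_path = serial_preferred,
    };
    return t;
}
```

Per comment 56, the UART remains described in the DSDT either way, so a getty can still run on it; only the "prefer this console" hints are removed.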

As for the memory-mapped EFI framebuffer (as opposed to the Blt()-only one): we now have Gerd's ramfb code supported as well, so if we enable virtio-gpu with the ramfb extension, we will have our early framebuffer back under Linux (and note that we now have support for earlycon=efifb in Linux as well, both for x86 and ARM).

Comment 44 Eric Auger 2019-04-03 09:43:07 UTC
Yes, the SPCR table is currently unconditionally added to the RSDT table. So if I understand correctly, you would need us to hide this table if there are virtio-gpu-pci + ramfb devices instantiated in the qemu CLI. Is that correct? Indeed the SBBR mentions the SPCR as a mandatory ACPI table, but the description also says it is for headless operations. If we make sure all operations can use a display, I guess we can remove it?

Comment 45 Leif Lindholm 2019-04-03 09:48:49 UTC
(In reply to Eric Auger from comment #44)
> Indeed the SBBR mentions the SPCR as mandatory ACPI tables but the
> description also says it is for headless operations. If we make sure all
> operations can use a display, I guess we can remove it?

My interpretation was always that it was mandatory for the firmware to *be able to* expose an SPCR, not that it always had to do so.

Comment 46 ardb 2019-04-03 10:37:07 UTC
(In reply to Eric Auger from comment #44)
> Yes the SPCR table is currently unconditionnally added in the RSDT table. So
> if I understand correctly you would need us to hide this table if there is a
> virtio-gpu-pci + ramfb devices instantiated in the qemu CLI. Is that
> correct?

I think we should be able to handle that entirely in the guest firmware, although simply incorporating the ConsolePrefDxe driver may be undesirable since it keeps its configuration in an NV variable that QEMU cannot manipulate directly.

>  Indeed the SBBR mentions the SPCR as mandatory ACPI tables but the
> description also says it is for headless operations. If we make sure all
> operations can use a display, I guess we can remove it?

Yes, if the language needs to be clarified we should do so, but it makes no sense for the SPCR to exist if the serial port is not in fact the preferred console.

Comment 47 Adam Williamson 2019-04-03 14:49:53 UTC
The language in the SBBR is slightly unclear for being so concise, I guess. What it specifically says is "The following tables are mandatory for all compliant systems." It doesn't define exactly what "are mandatory" means. To me the most simple reading is "must be included", but Leif's reading that it means the firmware "must be able to include it" does also seem supportable, so to that extent it does seem ambiguous. A clarification would probably be a good idea.

BTW, one thing I'm curious about here - what is the purpose of this 'preferred console' mechanism which makes the anointed console the Chosen One? As a dumb QA guy it seems to me that it'd make sense for the kernel to just shove the messages at *both* the serial console specified in an SPCR *and* at tty0, just in case. If it turns out nothing is listening to either one, where's the harm?

Comment 48 Laszlo Ersek 2019-04-03 15:13:09 UTC
The SPCR definition at <docs.microsoft.com> implies that the SPCR table
is used to direct kernel ("Windows EMS") messages to the same serial
port where the firmware wrote its own messages. (And the ACPI 6.2 spec
simply defers to the Microsoft definition, via <https://uefi.org/acpi>.)

SPCR says "Console Redirection" in the name. If we don't want to
redirect the kernel console like that, we should not produce the table.

Therefore I agree with Ard's comment 46 -- SPCR should not be marked
"mandatory" in the SBBR, and QEMU should produce it only conditionally.


... What about DBG2? Both the SBBR and Microsoft call DBG2 mandatory
(and ACPI defers to Microsoft again, via <https://uefi.org/acpi>). This
requirement looks sane. QEMU doesn't produce a DBG2 table however. That
looks like a bug (or feature gap anyway). I think DBG2 doesn't imply a
console preference, so I'd be curious how the guest kernel handled DBG2.

How about:

(1) graphical console configured on the QEMU command line:
    - produce DBG2 (pointing to the PL011 that is known to the kernel as
      ttyAMA0),
    - don't produce SPCR

(2) graphical console not configured on the QEMU command line:
    - produce both DBG2 and SPCR (pointing to the same PL011 UART,
      "ttyAMA0")

From "Documentation/admin-guide/serial-console.rst":

> You can specify multiple console= options on the kernel command line.
> Output will appear on all of them. The last device will be used when
> you open ``/dev/console``. So, for example::
>
>         console=ttyS1,9600 console=tty0
>
> defines that opening ``/dev/console`` will get you the current
> foreground virtual console, and kernel messages will appear on both
> the VGA console and the 2nd serial port (ttyS1 or COM2) at 9600 baud.

So case (1) should translate to:

  console=ttyAMA0 console=tty0
          ^               ^
          |               from the absence of SPCR and the availability
          |               of a graphical console
          from DBG2

and case (2) should translate to:

  console=ttyAMA0
          ^
          from the SPCR
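The two cases above can be written down as a small decision function. This is a sketch of the proposal, not QEMU code: `tables_for` and `outcome_for` are hypothetical names, and the outcome assumes the guest kernel would treat DBG2 as an extra, non-preferred console, which comment 51 suggests Linux does not currently do.

```c
#include <stdbool.h>

enum con { CON_TTY0, CON_TTYAMA0 };

/* Which ACPI tables QEMU would generate for the PL011. */
struct acpi_console_tables {
    bool dbg2; /* describes the PL011 as a debug port; no preference */
    bool spcr; /* "redirect the console to this PL011" */
};

/* Where the guest should end up: which devices get kernel messages,
 * and which device backs /dev/console. */
struct console_outcome {
    bool msgs_on_ttyAMA0;
    bool msgs_on_tty0;
    enum con dev_console;
};

/* Case (1): graphical console on the QEMU command line -> DBG2 only.
 * Case (2): no graphical console -> DBG2 and SPCR, same PL011. */
static struct acpi_console_tables tables_for(bool graphical_console)
{
    struct acpi_console_tables t = {
        .dbg2 = true,
        .spcr = !graphical_console,
    };
    return t;
}

/* Intended guest behaviour, i.e. the console= lines above:
 * case (1) ~ "console=ttyAMA0 console=tty0", case (2) ~ "console=ttyAMA0". */
static struct console_outcome outcome_for(bool graphical_console)
{
    struct acpi_console_tables t = tables_for(graphical_console);
    struct console_outcome o = {
        .msgs_on_ttyAMA0 = t.dbg2 || t.spcr,
        .msgs_on_tty0 = !t.spcr, /* SPCR redirects away from tty0 */
        .dev_console = t.spcr ? CON_TTYAMA0 : CON_TTY0,
    };
    return o;
}
```

The design point is that only SPCR expresses a preference; DBG2 is always produced and merely adds the UART as a sink.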

Comment 49 Laszlo Ersek 2019-04-03 15:22:09 UTC
(In reply to Adam Williamson from comment #47)

> just shove the messages at *both* the serial console [specified in an
> SPCR] *and* at tty0

Indeed this is the most sensible behavior (on x86_64 too!), with one
tweak:

the SPCR specifically says that it *re*directs. So if the SPCR is
present, you can't have the "both" semantics; the kernel must send
whatever would go to tty0 to ttyAMA0.

However, if you replace SPCR in the quote above with DBG2, then I agree
100%. (See my comment 48.) DBG2 doesn't seem to express a preference, it
just says "log to this port too, and pick the preferred console like you
would anyway".

Comment 50 Eric Auger 2019-04-03 16:16:18 UTC
(In reply to Laszlo Ersek from comment #48)

> (1) graphical console configured on the QEMU command line:
>     - produce DBG2 (pointing to the PL011 that is known to the kernel as
>       ttyAMA0),
>     - don't produce SPCR
> 
> (2) graphical console not configured on the QEMU command line:
>     - produce both DBG2 and SPCR (pointing to the same PL011 UART,
>       "ttyAMA0")
> 
If there is a consensus about this implementation, I can work on pushing this upstream. Ard seemed to indicate he would prefer to get things implemented at the FW level, though; did I misunderstand anything?

Comment 51 Adam Williamson 2019-04-03 16:24:07 UTC
From a quick poke through the Linux kernel I don't think it parses DBG2 tables at present, but I am certainly not an expert, so I may be wrong.

Comment 52 Laszlo Ersek 2019-04-03 16:31:58 UTC
(In reply to Eric Auger from comment #50)
> (In reply to Laszlo Ersek from comment #48)
>
> > (1) graphical console configured on the QEMU command line:
> >     - produce DBG2 (pointing to the PL011 that is known to the
> >       kernel as ttyAMA0),
> >     - don't produce SPCR
> >
> > (2) graphical console not configured on the QEMU command line:
> >     - produce both DBG2 and SPCR (pointing to the same PL011 UART,
> >       "ttyAMA0")
> >
> If there is a consensus about this implementation, I can work on pushing
> this upstream.

It needs to play together with the guest kernel though, and I have no
clue how the kernel actually handles DBG2.

> Ard seemed to indicate he would prefer to get things implemented at FW
> level though; did I misunderstand anything?

You probably refer to the beginning of comment 46,

> I think we should be able to handle that entirely in the guest
> firmware, although simply incorporating the ConsolePrefDxe driver may
> be undesirable since it keeps its configuration in an NV variable that
> QEMU cannot manipulate directly.

but there's a worse problem than just the NV variable -- in the
environment where ConsolePrefDxe currently runs, the firmware *owns*
SPCR (see comment 43). But on QEMU, SPCR is owned (generated) by QEMU,
and the firmware only installs it, blindly. Keeping the firmware in the
dark about specific ACPI artifacts generated by QEMU was a design goal
-- if not *the* design goal -- of the ACPI linker/loader. Thus the
firmware shouldn't attempt to hide SPCR if QEMU generates it.

If the conditions under (1) and (2) are not appropriate, you could always
introduce machine properties, for influencing SPCR generation more
directly on the QEMU command line.

Comment 53 Al Stone 2019-04-24 01:13:59 UTC
Let's be a little careful with the ACPI tables here.

Yes, the SPCR is mandated by the SBBR for all ARM-based *servers* -- if you
want to run RHEL on it, it must have an SPCR, just as you must use UEFI and
ACPI.  The reason is that not all ARM servers are consistent in how the hardware
provides a serial console; some use a pl011, some use the same UART as x86, 
some are just weird, but the SPCR makes sure that we actually have a serial
console of some sort.  Since the SBBR also assumes that servers are in general
headless, a console seemed kind of important.

Note that this also allows us to not use console= on the command line, just like
on x86 -- something other QA and provisioning systems found horribly confusing.

Further, the SPCR is indeed owned by Microsoft.  It is one of a special class
of tables in the ACPI spec that are not defined in the spec, but are used by
all who need them -- if they have permission to do so.  For the SPCR, we spent
a great deal of time and effort to secure that permission for Linux in general
and the spec in particular.  Hence, we can freely use the definition and provide
an implementation.

The DBG2 is not as clear cut, and is also owned by Microsoft.  It is not used
by ARM because (1) the licensing is not clear, and (2) it does not carry all
the console information needed; the SPCR does have clear licensing and does have
all the info needed.  There are references to DBG2 in the kernel due to ACPICA,
the upstream reference implementation for ACPI, and for all I know we have uses
of it in drivers; I would recommend against doing that.

The other fun bit is that x86_64 *also* supports the SPCR now (added about 4.18,
maybe?).  In my mind, the most sensible thing to do is provide an SPCR for everyone
that uses ACPI and avoid having to supply things on the command line completely.
It might also avoid all the differing assumptions the architectures make about 
what console is used when given in SPCR, versus on the command line, versus not
provided at all.

At any rate, I would argue that an SPCR *must* be provided on arm64 if it is SBBR
compliant.  That was my intent, at least, when I used the word "mandatory".

On a side note, don't read too much into ACPI table names; Serial Port Console
Redirection (SPCR) was chosen for a particular set of circumstances in Windows
many years ago, just as DBG2 was ('2' is only because it is the successor to the
DBGP table).  The spec is a bit over 20 years old and some of the terminology
has gotten a little blurred over time.

Comment 54 Adam Williamson 2019-04-24 01:35:08 UTC
"In my mind, the most sensible thing to do is provide an SPCR for everyone
that uses ACPI and avoid having to supply things on the command line completely."

The problem is that the way the kernel currently behaves, providing an SPCR only "avoid[s] having to supply things on the command line completely" *IF YOU WANT OUTPUT TO GO TO THAT SERIAL CONSOLE*. If you want the output to go to an actual display, then providing an SPCR does the *opposite*: it means you have to pass console= whereas without one, you don't.

Comment 55 Al Stone 2019-04-24 03:14:35 UTC
(In reply to Adam Williamson from comment #54)
> "In my mind, the most sensible thing to do is provide an SPCR for everyone
> that uses ACPI and avoid having to supply things on the command line
> completely."
> 
> The problem is that the way the kernel currently behaves, providing an SPCR
> only "avoid[s] having to supply things on the command line completely" *IF
> YOU WANT OUTPUT TO GO TO THAT SERIAL CONSOLE*. If you want the output to go
> to an actual display, then providing an SPCR does the *opposite*: it means
> you have to pass console= whereas without one, you don't.

Ah, my bad.  I misunderstood and got it backwards.

Comment 56 ardb 2019-04-24 07:26:46 UTC
On Windows, DBG2 is used for the windows debugger, not for a console, so I agree that implementing DBG2 is rather pointless on a Linux system, and not only for legal reasons.

The SPCR, however, is used by Windows to identify the serial console, allowing you to interact with the system if the graphical UI is not accessible for some reason. So it is like an administrator serial rather than a console in the way we usually think about it, and the ordinary graphical UI is always enabled in parallel.

In order to mirror that behavior, and the behavior we have on Linux/x86, the SPCR should be treated by the OS as a secondary console rather than supersede all others so that tty0 still is the default console if it is available (assuming that the dummy console is not identified as one that is suitable for hosting tty[0...n]). Changing that now might break some systems though ...

So this is why I introduced the console preference DXE in Tianocore - by default, it removes the SPCR when exposing a GOP, so that the graphical console takes precedence. Note that the serial is still described in DSDT so you can still run a getty on it.

Comment 57 Laszlo Ersek 2019-04-24 11:04:54 UTC
(Re: comments 53 through 56)

The problem with the SPCR is that Linux seems to interpret it as, "you *must* redirect the console to this particular serial port", rather than, "in case you want a serial console, here's the serial port details for you". In other words, the bug appears to be Linux's mis-interpretation of the SPCR.

Can we fix that? How about a new kernel parameter that requests "standards conformant SPCR handling". The default (buggy) behavior doesn't change, so new kernels that don't receive the parameter keep working the same. Old kernels that don't recognize the parameter shrug it off if the parameter is specified. New kernels can exhibit the right behavior when the parameter is specified. In Fedora & RHEL, we'd make the new parameter default (similar to "rhgb" and/or "quiet"), in the boot loader config.

--*--

If we can't fix Linux, where should we work around it?

(1) In QEMU: QEMU owns the hardware (board) definition and the ACPI tables too.

(2) In the guest firmware: the guest firmware could grow smarts about the interaction of the SPCR, the hardware (= availability of a graphical display), and the Linux ACPI bug.

On physical hardware, only option (2) exists for working around the Linux problem (just drop "guest" from the description). Plus, the firmware is kept in tight sync with the board particulars, by the board vendor.

In a virtual machine, however, I totally vote for (1) -- assuming we don't want to (or can't) fix Linux in the first place. Because the "board vendor" is QEMU, and eliminating the "tight sync" with the firmware, wrt. ACPI, has been a design goal for years.

Comment 58 Ben Cotton 2019-08-13 17:03:02 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle.
Changing version to '31'.

Comment 59 Ben Cotton 2019-08-13 19:06:54 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle.
Changing version to 31.

Comment 60 Alper Nebi Yasak 2020-03-31 12:52:49 UTC
I've recently been annoyed by this (but in my case it was the stdout-path property setting /dev/console to a serial console) and tried to fix it on the kernel side. I don't know if the way I did it is even appropriate, I'd appreciate some input on it if anyone has time to look at it:

https://lore.kernel.org/lkml/44156595-0eee-58da-4376-fd25b634d21b@gmail.com/T/

(with an s/#elif/#else/ correction on the last patch)

Comment 61 Ben Cotton 2020-11-03 16:51:26 UTC
This message is a reminder that Fedora 31 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '31'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 31 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version prior to this bug being closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 62 Ben Cotton 2021-02-09 16:10:43 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 34 development cycle.
Changing version to 34.

Comment 63 Adam Williamson 2021-03-17 15:58:01 UTC
So I just ran into this area again. I think I'm seeing something very similar to Alper at this point. We recently removed plymouth from the default package set for non-graphical installs due to https://bugzilla.redhat.com/show_bug.cgi?id=1933378 , but I've just noticed that's causing issues here again on aarch64 :(

Without plymouth installed, we don't seem to get any boot messages on the VT during boot *even though we're explicitly booting with console=tty0* (the workaround I put in for this bug back in 2019 and have not yet removed). The decryption prompt is winding up on the serial console.

Alper, did you ever get any traction with your patch? I don't see any replies.

Comment 64 Adam Williamson 2021-03-17 17:42:27 UTC
I've filed https://bugzilla.redhat.com/show_bug.cgi?id=1940163 for that specific problem. Will reference Alper's patch there.

Comment 65 Alper Nebi Yasak 2021-03-17 18:08:18 UTC
> Alper, did you ever get any traction with your patch? I don't see any replies.

I had sent a v2 [1] which got some attention, but I didn't work on it since then.

[1] https://lore.kernel.org/linux-serial/20200430161438.17640-1-alpernebiyasak@gmail.com/T/

Comment 66 Eric Auger 2021-08-08 05:40:22 UTC
FYI, we plan to generate a DBG2 ACPI table from QEMU (https://bugzilla.redhat.com/show_bug.cgi?id=1990552), since the lack of a DBG2 table is detected by some kernel firmware test suite tests. If someone objects to that, for instance because of legal issues, please let us know. From what I understand it won't fix this BZ (which is more about whether we should generate SPCR or not; generating DBG2 won't actually change anything here).

Comment 67 Ben Cotton 2022-05-12 16:40:24 UTC
This message is a reminder that Fedora Linux 34 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 34 on 2022-06-07.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '34'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 34 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 68 Adam Williamson 2022-05-14 00:39:04 UTC
I don't think anything's been done about this. At least, I don't see any indication of it upstream. I'll try and find time to confirm it's still a problem in openQA if I remove workarounds next week.

Comment 69 Peter Robinson 2022-05-14 07:31:59 UTC
(In reply to Adam Williamson from comment #68)
> I don't think anything's been done about this. At least, I don't see any
> indication of it upstream. I'll try and find time to confirm it's still a
> problem in openQA if I remove workarounds next week.

Not sure if this would be upstream fixes or Fedora settings/build options.

Comment 70 Adam Williamson 2022-05-17 00:40:28 UTC
So yeah, I just hacked out the workaround we have for this in one of the aarch64 tests (rescue mode), ran the test, and indeed we get nothing on the screen. So the workaround is still needed, the issue is still there.

Comment 71 Ray Strode [halfline] 2022-10-19 21:21:50 UTC
(In reply to Adam Williamson from comment #63)
> Without plymouth installed, we don't seem to get any boot messages on the VT
> during boot *even though we're explicitly booting with console=tty0* (the
> workaround I put in for this bug back in 2019 and have not yet removed). The
> decryption prompt is winding up on the serial console.

this bug is pretty long and I haven't read all of it (and my train is about to stop so I don't have time to do a full read through), so apologies if I'm just adding noise, but I want to point out that the kernel only sends console messages to the *last* console= line specified on the kernel command line (or the default console if no console= lines are specified).

One feature of plymouth is that it broadcasts the console to all consoles specified. If you disable plymouth you lose that syndication feature. So maybe you had console=tty0 first instead of last?

Comment 72 Peter Robinson 2022-10-19 21:38:40 UTC
> this bug is pretty long and I haven't read all of it (and my train is about
> to stop so I don't have time to do a full read through), so apologies if I'm
> just adding noise, but I want to point out the kernel only sends console
> messages to the *last* console= line specified on the kernel command line
> (or the default console if no console= lines are specified).
> 
> One feature of plymouth is that it broadcasts the console to all consoles
> specified. If you disable plymouth you lose that syndication feature. So
> maybe you had console=tty0 first instead of last?

That's not my experience with the Fedora minimal image (which doesn't ship with plymouth at all). Booting on a Raspberry Pi, or any other arm device, you'll get output on both an HDMI output and a serial port at the same time. It's always been that way, and if you follow the upstream discussion around printk changes, that doesn't align with your statement that "If you disable plymouth you lose that syndication feature".

Comment 73 Ray Strode [halfline] 2022-10-20 12:58:44 UTC
I think, perhaps, we're miscommunicating (maybe in part because I fired off a quick message from my phone on the train). When I said "console messages" I meant "/dev/console messages", and I think when you say "output" you mean "printk output".

Comment 74 Ray Strode [halfline] 2022-10-20 12:59:51 UTC
(but realize if "quiet" is on the command line, the user should, in theory, only see /dev/console messages anyway)

Comment 75 Ray Strode [halfline] 2022-10-20 18:00:17 UTC
Just for further clarity, let me explain how I believe it's currently designed to work:

1. dmesg output goes to all consoles in /sys/class/tty/console/active
2. dmesg output is muted for the most part if quiet is on the kernel command line (quiet is default on fedora)
3. /dev/console output goes to the last console on the kernel command line.
4. disk password requests go to /dev/console (and replies get read from /dev/console)
5. if plymouth is used, /dev/console gets redirected to plymouth, and gets syndicated to all consoles in /sys/class/tty/console/active
6. if plymouth is used, plymouth handles asking for the password directly.

so if adamw has e.g. 

console=tty0 console=ttyAMA0

on the kernel command line, and no plymouth installed, then to me it's expected the password only gets asked for on ttyAMA0.

if adamw has e.g.

console=ttyAMA0 console=tty0
or just
console=tty0

on the kernel command line, and no plymouth installed, then to me it's expected the password gets asked on the hdmi output.

Could be the current behavior isn't matching those expectations (not sure, adamw, can you chime in?), but that is how I believe it's designed to work.
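A simplified C model of points 1 and 3 above, assuming only console= options matter (SPCR, /chosen/stdout-path and the real contents of /sys/class/tty/console/active are ignored). `parse_consoles` and `dev_console` are hypothetical helpers: every console= name becomes a message sink, and the last one is the device behind /dev/console.

```c
#include <stddef.h>
#include <string.h>

#define MAX_CONSOLES 8
#define NAME_LEN 32

/* Consoles named by console= options, in command-line order. */
struct console_list {
    char name[MAX_CONSOLES][NAME_LEN];
    size_t count;
};

/* Collect every console=NAME[,options] token from a kernel command line.
 * Options such as ",115200" are dropped; only the device name is kept. */
static struct console_list parse_consoles(const char *cmdline)
{
    struct console_list l = { .count = 0 };
    const char *p = cmdline;

    while (*p != '\0') {
        p += strspn(p, " ");          /* skip separators */
        size_t tok = strcspn(p, " "); /* length of this token */
        if (tok > 8 && strncmp(p, "console=", 8) == 0 &&
            l.count < MAX_CONSOLES) {
            size_t n = strcspn(p + 8, ", "); /* name ends at ',' or ' ' */
            if (n >= NAME_LEN)
                n = NAME_LEN - 1;
            memcpy(l.name[l.count], p + 8, n);
            l.name[l.count][n] = '\0';
            l.count++;
        }
        p += tok;
    }
    return l;
}

/* Point 3: /dev/console is the last console= entry. Returns NULL when
 * no console= was given, i.e. the kernel picks its default console. */
static const char *dev_console(const struct console_list *l)
{
    return l->count ? l->name[l->count - 1] : NULL;
}
```

For the first command line above, `parse_consoles("console=tty0 console=ttyAMA0")` yields two consoles and `dev_console` returns "ttyAMA0", which is where a plymouth-less password prompt would appear.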

Comment 76 Adam Williamson 2022-10-20 18:35:58 UTC
It's on my list to check back on your points, but it's below a lot of Fedora 37-related fires at the moment :P

Comment 77 Ray Strode [halfline] 2022-11-04 15:18:06 UTC
I did some digging on a partner private bug and it seems the difference is on arm we do:

                acpi_parse_spcr(earlycon_acpi_spcr_enable, true);

and on x86 we do:

        acpi_parse_spcr(earlycon_acpi_spcr_enable, false);

Using true means the serial console gets "preferred" status, and tty0 won't get used unless console=tty0 is put on the kernel command line.
I don't think the true is necessary. The pl011 driver adds a serial console regardless via uart_add_one_port in pl011_register_port. Maybe
ttyAMA0 should be an early bootconsole by default, though. Just from browsing the code, it currently seems to only get set if earlycon is put on
the command line.
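To make the difference concrete, here is a deliberately simplified model of which device /dev/console lands on, based only on what comments 77 and 78 describe. It is not the kernel's register_console() logic; `boot_env` and `default_dev_console` are hypothetical names, and the `spcr_preferred` flag stands in for the second argument to acpi_parse_spcr().

```c
#include <stdbool.h>

/* Which device /dev/console ends up on in this toy model. */
enum boot_console { BOOT_CON_TTY0, BOOT_CON_TTYAMA0 };

struct boot_env {
    bool have_spcr;      /* firmware published an SPCR for the PL011 */
    bool spcr_preferred; /* stands in for acpi_parse_spcr()'s 2nd argument:
                            true on arm64, false on x86 (comment 77) */
    bool cmdline_tty0;   /* console=tty0 given as the last console= entry */
};

/* An explicit console=tty0 wins; otherwise an SPCR console registered as
 * preferred takes /dev/console; otherwise the VT (fbcon) is used. */
static enum boot_console default_dev_console(struct boot_env e)
{
    if (e.cmdline_tty0)
        return BOOT_CON_TTY0;
    if (e.have_spcr && e.spcr_preferred)
        return BOOT_CON_TTYAMA0;
    return BOOT_CON_TTY0;
}
```

In this model, passing false on arm64 as well (the change comment 78 argues for) makes the SPCR case behave like x86, while the UART can still be chosen explicitly with console=ttyAMA0.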

Comment 78 Ray Strode [halfline] 2022-11-04 15:23:15 UTC
s/the difference/a difference/  

What I'm saying is, I don't see a reason to make arm and x86 act differently. They both have fbcon, so they both should let tty0 be the preferred console, IMO.

This doesn't prevent the user from adding console=ttyS0 if they want to explicitly use the serial console for /dev/console messages.

