Bug 2241632 - Netinstall ISO renders a black screen when using kickstart install (bare metal and VM)
Summary: Netinstall ISO renders a black screen when using kickstart install (bare metal and VM)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: mutter
Version: 39
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Florian Müllner
QA Contact: Fedora Extras Quality Assurance
URL: https://ngompa.fedorapeople.org/bugs/...
Whiteboard: AcceptedBlocker
Depends On:
Blocks: 2184978 F39FinalBlocker
 
Reported: 2023-10-01 14:03 UTC by Neal Gompa
Modified: 2023-10-30 14:04 UTC
CC List: 34 users

Fixed In Version: mutter-45.0-11.fc39
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-10-26 23:38:45 UTC
Type: ---
Embargoed:


Attachments
anaconda.log (14.50 KB, text/plain), 2023-10-02 12:54 UTC, Kamil Páral
dbus.log (3.69 KB, text/plain), 2023-10-02 12:54 UTC, Kamil Páral
hawkey.log (60 bytes, text/plain), 2023-10-02 12:54 UTC, Kamil Páral
journal.txt (561.08 KB, text/plain), 2023-10-02 12:54 UTC, Kamil Páral
packaging.log (36.93 KB, text/plain), 2023-10-02 12:55 UTC, Kamil Páral
program.log (2.15 KB, text/plain), 2023-10-02 12:55 UTC, Kamil Páral
storage.log (73.36 KB, text/plain), 2023-10-02 12:55 UTC, Kamil Páral
syslog (502.22 KB, text/plain), 2023-10-02 12:55 UTC, Kamil Páral
X.log (27.05 KB, text/plain), 2023-10-02 12:55 UTC, Kamil Páral


Links
GNOME Gitlab: GNOME mutter merge_requests 3329 (status: opened; last updated 2023-10-17 15:30:58 UTC)
  Summary: Draft: x11: Manage windows after initial redirection

Description Neal Gompa 2023-10-01 14:03:17 UTC
When trying to do a kickstart install on real hardware (especially a partial kickstart where some spokes are not configured), the installer UI does not get drawn. Instead, it boots to a black screen with only the cursor. In a fully automatic install, this is merely annoying, as the installation still runs through and reboots. In a partially configured install, it means you cannot see the spokes you need to configure before starting the installation. The UI is there, just not drawn, so you can still click on things and navigate it if you know the layout by memory (which is probably not something we expect people to do).

Reproducible: Always

Steps to Reproduce:
1. Download the F39 "Everything boot" ISO from nightly.fedoraproject.org
2. Boot it on a real system, adding 'inst.dhcp inst.ks="https://ngompa.fedorapeople.org/binoc-test-fc39.ks"' to the installer's boot arguments

Actual Results:  
Anaconda loads, but nothing is drawn. If you know the layout of the installer by memory, you can still navigate, since all the buttons and widgets are there; they just aren't drawn.

Expected Results:  
Anaconda loads and the UI is drawn properly. You can navigate the UI properly without having to work from memory.

This is only reproducible on real hardware. When using KVM or VMware, it works perfectly fine.

I have been able to trigger this with every compose going back to when we branched from Rawhide. I suspect this has been around for a while... :(

Comment 1 Neal Gompa 2023-10-01 14:05:50 UTC
This is also reproducible with the Server netinstall ISO (since it's the same thing just with server branding instead...).

Comment 2 Fedora Blocker Bugs Application 2023-10-01 14:08:31 UTC
Proposed as a Blocker for 39-final by Fedora user ngompa using the blocker tracking app because:

 This violates the criterion "The installer must be able to complete an installation using all supported interfaces." as the user cannot reasonably complete installation using the supported kickstart+graphical interface.

Comment 3 Kamil Páral 2023-10-02 12:53:51 UTC
I can confirm this problem even in a libvirt VM (both BIOS and UEFI). It happened to me in 6/6 attempts. I can also reproduce it using https://fedorapeople.org/groups/qa/kickstarts/example-minimal.ks , so it's not specific to Neal's kickstart. I tested with Fedora-Everything-netinst-x86_64-39-20231002.n.0.iso. I'm attaching logs below.

Comment 4 Kamil Páral 2023-10-02 12:54:48 UTC
Created attachment 1991608 [details]
anaconda.log

Comment 5 Kamil Páral 2023-10-02 12:54:52 UTC
Created attachment 1991609 [details]
dbus.log

Comment 6 Kamil Páral 2023-10-02 12:54:55 UTC
Created attachment 1991610 [details]
hawkey.log

Comment 7 Kamil Páral 2023-10-02 12:54:59 UTC
Created attachment 1991611 [details]
journal.txt

Comment 8 Kamil Páral 2023-10-02 12:55:03 UTC
Created attachment 1991612 [details]
packaging.log

Comment 9 Kamil Páral 2023-10-02 12:55:06 UTC
Created attachment 1991613 [details]
program.log

Comment 10 Kamil Páral 2023-10-02 12:55:10 UTC
Created attachment 1991614 [details]
storage.log

Comment 11 Kamil Páral 2023-10-02 12:55:14 UTC
Created attachment 1991615 [details]
syslog

Comment 12 Kamil Páral 2023-10-02 12:55:18 UTC
Created attachment 1991616 [details]
X.log

Comment 13 František Zatloukal 2023-10-02 16:37:54 UTC
Discussed during the 2023-10-02 blocker review meeting: [1]

The decision to classify this bug as an AcceptedBlocker (Final) was made:

"This is accepted as a violation of the following criterion: "The installer must be able to complete an installation using all supported interfaces." as the user cannot reasonably complete installation using the supported kickstart+graphical interface."

[1] https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2023-10-02/f39-blocker-review.2023-10-02-16.01.log.txt

Comment 14 Neal Gompa 2023-10-02 20:39:36 UTC
So you're telling me this actually got *worse* over the course of F39? Because at the beginning, I'm pretty sure it worked in my VM environments, just not in real hardware. I'm glad it's reproducible everywhere now, though. It means we can add a test to keep this from happening again without telling people to buy a rando mini PC to run the tests. :)

Comment 15 Adam Williamson 2023-10-03 15:42:20 UTC
it may depend on the hw the VM is emulating.

Comment 16 Vladimír Slávik 2023-10-03 16:50:33 UTC
Some notes from reviewing the logs.

X.log has the following lines:
> (EE) Failed to load module "fbdev" (module does not exist, 0)
> (EE) Failed to load module "vesa" (module does not exist, 0)
> (EE) modeset(0): glamor initialization failed
These same errors appear on Rawhide, where the UI does show.

journal has:
> display: failed to call GetCurrentState from mutter over DBUS
Ditto: the same message appears on Rawhide, where the UI shows fine, and also on RHEL 9.

Comment 17 Adam Williamson 2023-10-03 22:15:01 UTC
I don't think any of those messages matter.

Comment 18 Vladimír Slávik 2023-10-05 17:22:50 UTC
The deciding factor seems to be the presence of the "keyboard" command in the kickstart. With it, I get either the black screen or a gnome-kiosk crash.

I tried some combinations of --vckeymap and --xlayouts, as well as multiple languages. None of that seems to matter.

Comment 19 Adam Williamson 2023-10-10 15:27:36 UTC
This was happening at least as far back as Fedora-39-20230910.n.0 - that's the oldest run the openQA video is still present for (videos from earlier runs have been garbage collected unfortunately).

Comment 20 Adam Williamson 2023-10-10 17:15:51 UTC
Based on some debugging vslavik did, we think this is actually a systemd-localed issue. The bug can be avoided by disabling every point on this codepath where anaconda makes a dbus call to ask localed to load an X layout - lines 150, 164 and 166 of pyanaconda/modules/localization/runtime.py . That code has not changed in anaconda recently.

Separately, I tested random images I had lying around and found that this broke between Fedora-Everything-netinst-x86_64-Rawhide-20230713.n.1.iso and Fedora-Everything-netinst-x86_64-39-20230828.n.0.iso . That's the window during which systemd 254 landed. So I kinda suspect something in systemd 254 broke this.

Comment 21 Adam Williamson 2023-10-12 03:43:30 UTC
hum. well. I tried Rawhide images with both systemd 253.12 and 253.1 and...they still do this. So...maybe it's not systemd? But if not, honestly, I'm not sure what *else* it might be. Hum.

Comment 22 Adam Williamson 2023-10-13 00:09:41 UTC
Also still happens on a Rawhide image with anaconda-39.23-3 (same version that was in Fedora-Everything-netinst-x86_64-Rawhide-20230713.n.1.iso ). So I'm a bit stuck now. Have to do some thinking about what else could possibly be causing this. dbus?

Comment 23 Adam Williamson 2023-10-13 01:07:47 UTC
So I at least managed to narrow the delta a bit, after realizing that Silverblue installers are affected too. I happen to have a couple of those in a narrower range:

Fedora-Silverblue-ostree-x86_64-Rawhide-20230728.n.0.iso - GOOD
Fedora-Silverblue-ostree-x86_64-39-20230815.n.0.iso - BAD

so now we're down to "something that changed between July 28 and August 15".

Comment 24 Adam Williamson 2023-10-13 01:36:34 UTC
Oooh! My latest wild shot in the dark seems to be a hit.

I built a current Rawhide image with old gnome-kiosk and mutter - gnome-kiosk-44.0-2.fc39.x86_64 and mutter-44.2-2.fc39.x86_64 . That image does not have the bug. So this appears to be a bug there, probably mutter (I had to downgrade both because gnome-kiosk is built against mutter and the soname changed).

Comment 25 Adam Williamson 2023-10-13 01:43:44 UTC
I guess I should clarify (since vslavik said he can't reproduce the bug on Rawhide) that for me the bug reproduces every single try (in a VM) with both current Rawhide and F39 images. So the fact that it works OK with a Rawhide image with downgraded mutter/gnome-kiosk clearly implicates one of those packages.

Comment 26 Adam Williamson 2023-10-13 01:54:47 UTC
Brief summary for Workstation folks: kickstart installs using a kickstart with a 'keyboard' directive often do not display the anaconda UI, they just show a blank screen (though if the kickstart is fully complete, the install will run to completion).

For me this is 100% reproducible in a VM booting with inst.ks=https://fedorapeople.org/groups/qa/kickstarts/base-net.ks .

We can prevent the bug happening by disabling all anaconda's calls to systemd-localed (via dbus) to set X keyboard layout based on the kickstart contents (this is obviously not a fix, but a significant diagnostic fact). These calls all happen (I believe, and per the logs) before the X server is started.

The bug does not happen with mutter and gnome-kiosk downgraded as per comment #24.

Comment 27 Radek Vykydal 2023-10-13 08:25:35 UTC
(In reply to Adam Williamson from comment #24)
> Oooh! My latest wild shot in the dark seems to be a hit.
> 
> I built a current Rawhide image with old gnome-kiosk and mutter -
> gnome-kiosk-44.0-2.fc39.x86_64 and mutter-44.2-2.fc39.x86_64 . That image
> does not have the bug. So this appears to be a bug there, probably mutter (I
> had to downgrade both because gnome-kiosk is built against mutter and the
> soname changed).

This seems to be in exact agreement with what we were seeing in our kickstart tests: https://github.com/rhinstaller/kickstart-tests/issues/997#issuecomment-1676845348
Sorry for being so late here. I should have followed up on the issue back then, but we had other pressing priorities at the time. I guess we also saw the issue as quite a rare flake, because in most cases the installation just went on with the black screen, which kickstart tests can't detect.

Comment 28 Adam Williamson 2023-10-13 23:01:25 UTC
OK, so I bisected this. This is the tightest I can bisect it - builds of any of these commits cause gnome-kiosk to crash on startup:

The first bad commit could be any of:
0f88f0931c11431354556b1ffaae082048e98777
3e95609073b3a455693e19e58b365688b7f877ba
a27b9d9707b0c5ccfd6aec3e5f335937c1796429
02a436d607481492a37ad15fcc401abf6385eeff
761a254e6f8b8643ce6530e85daf041f25edc683
15b25568b29ec0e082f6a18fef550078102aaca1
We cannot bisect more!

all those commits were part of https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/2445 .
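The "We cannot bisect more!" result above is what git bisect reports when every commit left near the boundary has been skipped as untestable (here, because those builds made gnome-kiosk crash). The following is a simplified sketch of that logic on a linear history. It assumes the first commit is known good and the last is known bad. `bisect_with_skips` and `Result` are hypothetical names, and the sketch does not reproduce git's actual commit-selection algorithm.

```python
from enum import Enum
from typing import Callable, List, Sequence


class Result(Enum):
    GOOD = "good"
    BAD = "bad"
    SKIP = "skip"  # build unusable, e.g. the session crashes before the bug can show


def bisect_with_skips(commits: Sequence[str],
                      test: Callable[[str], Result]) -> List[str]:
    """Return the candidate first-bad commits on a linear history.

    Assumes commits[0] is known good and commits[-1] is known bad.
    A single-element result means the first bad commit was pinned down;
    more than one means skipped commits hide the exact boundary.
    """
    good, bad = 0, len(commits) - 1
    skipped = set()
    while bad - good > 1:
        # Commits strictly between the known-good and known-bad bounds
        # that have not already been skipped.
        untested = [i for i in range(good + 1, bad) if i not in skipped]
        if not untested:
            break  # only skipped commits remain: "We cannot bisect more!"
        mid = untested[len(untested) // 2]
        result = test(commits[mid])
        if result is Result.GOOD:
            good = mid
        elif result is Result.BAD:
            bad = mid
        else:
            skipped.add(mid)
    # Everything after the last good commit, up to and including the first bad one.
    return list(commits[good + 1: bad + 1])
```

For example, with ten commits c0 to c9 where c6 is the first bad one and c4 to c6 cannot be tested, the sketch returns ['c4', 'c5', 'c6', 'c7']. This has the same shape as the six-commit list above: the skipped commits plus the earliest commit actually confirmed bad.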

Comment 29 Adam Williamson 2023-10-13 23:58:22 UTC
One more bit of data: if I hack up anaconda to wipe /etc/X11/xorg.conf.d/00-keyboard.conf - that's the file localed writes when you ask it to set an X keyboard layout - before it starts X, the bug also goes away. So it seems like the definition of the bug is more or less:


from mutter 15b25568b29ec0e082f6a18fef550078102aaca1 onwards, gnome-kiosk on X.org as launched by anaconda displays a blank screen if an /etc/X11/xorg.conf.d/00-keyboard.conf with contents like this is present:

# Written by systemd-localed(8), read by systemd-localed and Xorg. It's
# probably wise not to edit this file manually. Use localectl(1) to
# instruct systemd-localed to update it.
Section "InputClass"
        Identifier "system-keyboard"
        MatchIsKeyboard "on"
        Option "XkbLayout" "us"
        Option "XkbModel" "pc105"
EndSection
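To recreate the trigger condition outside anaconda, or to apply the diagnostic hack described above (removing the file before X starts), a small helper can be used. The following is a minimal sketch that assumes the file shape quoted above. `render_keyboard_conf` and `remove_keyboard_conf` are hypothetical names that imitate the observed output. They are not localed's or anaconda's code, and removing the file only avoids the bug; it does not fix it.

```python
from pathlib import Path
from typing import Optional

# Path named in this bug; Xorg reads InputClass snippets from this directory.
KEYBOARD_CONF = Path("/etc/X11/xorg.conf.d/00-keyboard.conf")

HEADER = (
    "# Written by systemd-localed(8), read by systemd-localed and Xorg. It's\n"
    "# probably wise not to edit this file manually. Use localectl(1) to\n"
    "# instruct systemd-localed to update it.\n"
)


def render_keyboard_conf(layout: str, model: str = "pc105",
                         variant: Optional[str] = None,
                         options: Optional[str] = None) -> str:
    """Render an InputClass section shaped like the file quoted above.

    Hypothetical helper: it imitates the observed file, not localed's source.
    """
    lines = [
        'Section "InputClass"',
        '        Identifier "system-keyboard"',
        '        MatchIsKeyboard "on"',
        f'        Option "XkbLayout" "{layout}"',
        f'        Option "XkbModel" "{model}"',
    ]
    # Assumption: optional keys are only written when they have a value.
    if variant:
        lines.append(f'        Option "XkbVariant" "{variant}"')
    if options:
        lines.append(f'        Option "XkbOptions" "{options}"')
    lines.append("EndSection")
    return HEADER + "\n".join(lines) + "\n"


def remove_keyboard_conf(path: Path = KEYBOARD_CONF) -> bool:
    """Delete the snippet before X starts (diagnostic workaround, not a fix).

    Returns True if a file was removed, False if none was present.
    """
    try:
        path.unlink()
    except FileNotFoundError:
        return False
    return True
```

`render_keyboard_conf("us")` yields the same text as the file quoted above. Writing that text into a test image's xorg.conf.d should, per the comments above, be enough to trigger the black screen with an affected mutter.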

Comment 30 Adam Williamson 2023-10-14 00:54:42 UTC
Bit more progress - https://gitlab.gnome.org/GNOME/mutter/-/issues/3089#note_1868806

Comment 31 Vladimír Slávik 2023-10-16 07:38:29 UTC
> since vslavik said he can't reproduce the bug on Rawhide

Sorry, that was probably a bit misleading. I was talking about graphics-related log messages, which I ruled out because a successful start showed the same ones. I could reproduce the bug on Rawhide too.

Comment 32 Adam Williamson 2023-10-17 17:13:13 UTC
sadly the posted patch does not appear to fix the bug.

Comment 33 Fedora Update System 2023-10-20 17:45:48 UTC
FEDORA-2023-16d9c333e4 has been submitted as an update to Fedora 39. https://bodhi.fedoraproject.org/updates/FEDORA-2023-16d9c333e4

Comment 34 Brian Morrison 2023-10-20 18:37:17 UTC
Just tried this new mutter update mutter-45.0-10.fc39.x86_64.rpm. I use cqrprop https://github.com/ok2cqr/cqrprop/releases which displays a window on the desktop with solar data in it.

With this updated mutter the window does not appear on the desktop although the window outline is shown on my bottom panel's workspace view.

Downgrading to mutter-45.0-9.fc39.x86_64.rpm makes cqrprop work normally again.

I don't know what this backport actually changed.

Comment 35 Brian Morrison 2023-10-20 18:41:25 UTC
Should have said, using GNOME Wayland desktop with all available F39 rpm updates from updates-testing.

Comment 36 Adam Williamson 2023-10-20 19:19:09 UTC
Thanks for the feedback. When it's done, can you try https://koji.fedoraproject.org/koji/taskinfo?taskID=107839077 ? That's a build that reduces the change in -10 to (hopefully) the smallest needed to fix the blocker bug we're trying to fix. It'd be good to know if it avoids the problem you saw.

Comment 37 Brian Morrison 2023-10-20 19:55:35 UTC
Tried the mutter-45.0-11 build, unfortunately I still see the same problem.

It could be something about the way cqrprop is coded, but it never did this before these last two mutter packages.

Comment 38 Adam Williamson 2023-10-20 23:08:21 UTC
as the proposed fix breaks at least two other things, one of which is obviously release-blocking (you can't see anaconda on the Workstation live image), it's no good. setting back to ASSIGNED.

Comment 39 Fedora Update System 2023-10-21 02:27:51 UTC
FEDORA-2023-16d9c333e4 has been pushed to the Fedora 39 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-16d9c333e4`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-16d9c333e4

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 40 Adam Williamson 2023-10-21 05:27:10 UTC
Brian: new build to try - can you try https://koji.fedoraproject.org/koji/taskinfo?taskID=107860827 ? it adds another change that might solve the problem.

Comment 41 Brian Morrison 2023-10-21 11:56:08 UTC
OK, so I have installed the second build of the mutter-45.0-11 package and I now see the solar data window on the desktop with the correct contents.

Here's hoping this also fixes the anaconda and black screen problems.

Comment 42 Martin Jackson 2023-10-21 14:34:21 UTC
I too have installed mutter-45.0-11 from Koji and found that Chrome, VSCode, and Discord all render properly with it under Wayland.

Is it useful information that 45.0-10 was able to render Chrome and VSCode under an Xorg session but not under a Wayland one?

Comment 43 Adam Williamson 2023-10-21 18:12:34 UTC
It helps us confirm the issue, yes. Thanks.

Comment 44 Neal Gompa 2023-10-21 21:30:42 UTC
Could we switch Anaconda to start as an XWayland app instead of an X11 app? I would think that would work around this issue.

Comment 45 Adam Williamson 2023-10-21 23:23:40 UTC
that's way too much change. we actually have a working fix upstream now, anyway.

Comment 46 Neal Gompa 2023-10-23 16:02:08 UTC
So where's the update for the fix?

Comment 47 Adam Williamson 2023-10-23 16:21:04 UTC
I was waiting for jadahl to turn it into a proper MR that would get some review. It's not particularly urgent as we still have ARM blockers. But if there's no movement soon I'll backport it.

The working fix is https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/3329#note_1874837 , an additional change on top of the changes in that MR. Scratch build at https://koji.fedoraproject.org/koji/taskinfo?taskID=107860827 (I think that's the right one).

Comment 48 Martin Jackson 2023-10-23 20:26:22 UTC
The link goes to mutter-45.0-11.fc39, which I am happy to report seems to work as intended (I am writing this from a Chrome window in a Wayland session). I'll be happy to provide karma to the update and/or retest when it hits Bodhi.

Comment 49 Adam Williamson 2023-10-24 18:54:07 UTC
The new build is now in the update, please re-test and re-karma.

Comment 50 Edgar Hoch 2023-10-24 19:59:12 UTC
Would it be possible to include the update in a nightly Fedora 39 ISO image (Server?), so that I can update my kickstart environment and test whether the installation screen still goes black?

Comment 51 Adam Williamson 2023-10-24 20:46:35 UTC
Not before it's pushed stable, but I've already verified that part of the fix several times.

Comment 52 Fedora Update System 2023-10-25 02:52:30 UTC
FEDORA-2023-16d9c333e4 has been pushed to the Fedora 39 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-16d9c333e4`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-16d9c333e4

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 53 Adam Williamson 2023-10-26 21:27:20 UTC
Fix confirmed in RC-1.2.

Comment 54 Fedora Update System 2023-10-26 23:38:45 UTC
FEDORA-2023-16d9c333e4 has been pushed to the Fedora 39 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 55 Jonathan Billings 2023-10-27 19:42:31 UTC
Any chance we can have a new netinstall ISO published with this fix?

Comment 56 Adam Williamson 2023-10-27 23:22:56 UTC
It's already in Final RC-1.2, and nightlies from today onwards - https://openqa.fedoraproject.org/nightlies.html

Comment 57 Jonathan Billings 2023-10-30 14:04:35 UTC
Thank you! I can confirm that I see the graphical anaconda install screen when using Fedora-Everything-netinst-x86_64-39-20231030.n.0.iso

