Bug 2269385
| Field | Value |
|---|---|
| Summary | rhgb breaks custom/minimal install on most filesystem layouts |
| Product | [Fedora] Fedora |
| Component | plymouth |
| Version | 40 |
| Status | CLOSED ERRATA |
| Severity | unspecified |
| Priority | unspecified |
| Hardware | Unspecified |
| OS | Unspecified |
| Reporter | Kamil Páral <kparal> |
| Assignee | Ray Strode [halfline] <rstrode> |
| QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| CC | awilliam, gnome-sig, hdegoede, kevin, lruzicka, robatino, rstrode, spaceboy60, zbyszek |
| Flags | kparal: needinfo?(rstrode) |
| Whiteboard | AcceptedBlocker |
| Fixed In Version | plymouth-24.004.60-4.fc40 |
| Last Closed | 2024-03-19 04:23:06 UTC |
| Type | Bug |
| Bug Blocks | 2187792 |

Description
Kamil Páral
2024-03-13 15:04:47 UTC
Created attachment 2021429 [details]
journal (broken boot)
This is a journal from a broken boot (with rhgb). The system is operational, just the screen never updates. The system was rebooted with Ctrl+Alt+Del after a while.
Created attachment 2021430 [details]
journal (ok boot)
This is a journal from an OK boot (rhgb removed). Screen updates as expected.
Created attachment 2021431 [details]
list of rpms installed
Everything netinst is a release blocking deliverable, proposing for a blocker discussion.

Oh jeez, I actually saw this on my test system when verifying the firmware RAID bug fix, but I was in a hurry and didn't think much of it, figured it was just a weird blip...

Can you produce 'success' by doing an install from Everything boot iso, but going through custom partitioning and setting the filesystem to XFS (but otherwise letting it create the partitions for you)?

Uhhhh, very nice! The difference is really in the partition layout! It doesn't matter whether you use Everything or Server netinst (I checked), it only depends on the target layout:

WORKS:
- /boot ext4, / lvm -> xfs
- /boot xfs, / lvm -> xfs
- /boot ext4, / lvm -> ext4

DOESN'T WORK:
- /boot ext4, / btrfs
- /boot xfs, / btrfs
- /boot ext4, / ext4
- /boot ext4, / xfs
- /boot xfs, / xfs

Eh, it looks like it depends on the / partition, not /boot (as I assumed), and it only works if the / partition is inside LVM!

Also, once you install the full Server package set, any partition layout (most probably, haven't checked everything) works. The bug only affects Custom and Minimal sets.

I suspect that the file system is not a direct cause, but instead it just changes timing and exposes the issue in some other component. One thing that clearly fails is this:

Mar 13 15:55:09 fedora systemd[1]: Starting systemd-vconsole-setup.service - Virtual Console Setup...
Mar 13 15:55:09 fedora systemd[1]: Mounted sys-kernel-config.mount - Kernel Configuration File System.
Mar 13 15:55:09 fedora systemd-vconsole-setup[537]: setfont: ERROR kdfontop.c:183 put_font_kdfontop: Unable to load such font with such kernel version
Mar 13 15:55:09 fedora systemd-vconsole-setup[534]: /usr/bin/setfont failed with a "system error" (EX_OSERR), ignoring.
Mar 13 15:55:09 fedora systemd-vconsole-setup[534]: Setting source virtual console failed, ignoring remaining ones.
Mar 13 15:55:09 fedora systemd[1]: Finished systemd-vconsole-setup.service - Virtual Console Setup.

But this should only cause the text console not to get the right fonts; it shouldn't interfere with getting a text console. There were a few patches in systemd after v255 to handle this better, and so far we didn't backport them because it didn't seem important enough. But if you don't figure out a different reason, we can try, at least to see if it makes a difference.

+4 in https://pagure.io/fedora-qa/blocker-review/issue/1521 , marking accepted.

I have a couple of interesting findings.
First, changing which plymouth theme is active doesn't have any effect.
Second, installing plymouth-graphics-libs resolves the problem, the bootsplash appears and the login prompt is usable.
Third, while this was true yesterday, it's not true today:
> Even installing the whole server-product-environment group on an affected system doesn't resolve the problem
Today, when I install the server group, bootsplash appears, login prompt works. When I uninstall it, it's back to the broken state. I tried to bisect whether I find a package that flips the behavior, and it looks like if I install nfs-utils and certain iwlwifi-*-firmware packages together, it flips to a working state. But it's weird and inconsistent.
From all these bits, I have a feeling that this is really a race condition, as Zbigniew suggested. And having different filesystems, or different processes/services running during boot, or having the boot files large enough (take longer to load) changes the timing, which changes whether the race condition occurs.
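
To make the layout findings from earlier in this bug easier to reason about, here is a minimal sketch that tabulates the eight reported results and checks which simple rule matches all of them. The names (`OBSERVATIONS`, `consistent_with`, and the three rule functions) are made up for illustration. This is only a consistency check on the reported data; it says nothing about the cause, which looks like a timing issue.

```python
# Results copied from the layout tests reported earlier in this bug.
# Each entry: (/boot filesystem, / filesystem, / inside LVM?, boots OK with rhgb?)
OBSERVATIONS = [
    ("ext4", "xfs", True, True),
    ("xfs", "xfs", True, True),
    ("ext4", "ext4", True, True),
    ("ext4", "btrfs", False, False),
    ("xfs", "btrfs", False, False),
    ("ext4", "ext4", False, False),
    ("ext4", "xfs", False, False),
    ("xfs", "xfs", False, False),
]


def consistent_with(rule):
    """Return True if rule(boot_fs, root_fs, root_on_lvm) predicts every observed result."""
    return all(
        rule(boot_fs, root_fs, on_lvm) == works
        for boot_fs, root_fs, on_lvm, works in OBSERVATIONS
    )


# Hypothetical candidate rules, named only for this sketch.
def root_on_lvm(boot_fs, root_fs, on_lvm):
    return on_lvm


def boot_is_ext4(boot_fs, root_fs, on_lvm):
    return boot_fs == "ext4"


def root_is_xfs(boot_fs, root_fs, on_lvm):
    return root_fs == "xfs"


for rule in (root_on_lvm, boot_is_ext4, root_is_xfs):
    print(rule.__name__, consistent_with(rule))
```

Only the "/ is inside LVM" rule agrees with every row, which fits the observation that the /boot filesystem and the / filesystem type on their own do not predict the outcome.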
OK, this is really bugging me now because it sounds *super* familiar - I swear I remember the name plymouth-graphics-libs in the context of a very similar bug before. But I can't find it. I'll keep looking.

ooh, okay, so I kinda suspect the changes from https://src.fedoraproject.org/rpms/plymouth/c/e08eb228aef455106511b0eb6155e17e09aced29?branch=rawhide (they were rolled into the next major version release, so they no longer exist as patches in the package, but they are in the upstream). We should probably try reverting those selectively...

So, first I checked F39 Everything netinst, just to be sure - it works just fine, as expected.

Now, on F40, I tried downgrading plymouth. It changes things! So it really seems to be a regression in plymouth.

plymouth-22.02.122-6.fc40 [1] is the last plymouth that works.
plymouth-23.358.4-6.fc40 [2] is the first plymouth that doesn't work.

So it broke somewhere between those versions. I'll see if I can narrow it down more. But at this point I believe we need Ray to start looking into it.

[1] https://koji.fedoraproject.org/koji/buildinfo?buildID=2322964
[2] https://koji.fedoraproject.org/koji/buildinfo?buildID=2337638

So even more precise is that this commit works:
https://src.fedoraproject.org/rpms/plymouth/c/6534ca93a154ef3c49bbfe7406a63aac5120d2cf?branch=f40
(that's plymouth-22.02.122-6.fc40)

And this commit doesn't:
https://src.fedoraproject.org/rpms/plymouth/c/9c15b6a28ab0a8ede11b24cfa8486e534a0aa492?branch=f40
(that's plymouth-23.356.9-4.fc40)

There are no further actionable commits between those two. So in order to dig further, I'd have to try git bisect on the upstream source code.

Yeah, that would be my next step, set up the spec file to build git snapshots then just bisect it. I will probably do this over the weekend or on Monday if nobody else gets to it first.

Bisected to https://gitlab.freedesktop.org/plymouth/plymouth/-/commit/48881ba2ef3d25fd27fd150d4d5957d4df9868e0 . Will see if that reverts cleanly.

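
For readers unfamiliar with bisecting, here is a minimal sketch of the search it performs: given a known-good and a known-bad endpoint, it halves the range until the first bad commit is found. The commit labels and the `is_bad` check are hypothetical stand-ins. In this bug, each "test" meant building a plymouth snapshot, installing it, and rebooting an affected system. The sketch assumes what `git bisect` assumes for a linear history, namely that every commit after the first bad one is also bad.

```python
def first_bad(commits, is_bad):
    """Return the index of the first bad commit in an ordered list.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    everything after the first bad commit is also bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # regression is at mid or earlier
        else:
            good = mid  # regression is after mid
    return bad


# Hypothetical history: ten commits, regression introduced in "c6".
commits = [f"c{i}" for i in range(10)]


def is_bad(commit):
    # Stand-in for "build this snapshot, boot with rhgb, check the screen".
    return int(commit[1:]) >= 6


index = first_bad(commits, is_bad)
print("first bad commit:", commits[index])
```

The number of builds to test grows with the logarithm of the range size, which is why bisecting the upstream history was practical even though every test required a rebuild and a reboot.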
FEDORA-2024-adf0027989 (plymouth-24.004.60-4.fc40) has been submitted as an update to Fedora 40.
https://bodhi.fedoraproject.org/updates/FEDORA-2024-adf0027989

FEDORA-2024-adf0027989 has been pushed to the Fedora 40 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-adf0027989`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-adf0027989
See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

(In reply to Fedora Update System from comment #16)
> FEDORA-2024-adf0027989 (plymouth-24.004.60-4.fc40) has been submitted as an
> update to Fedora 40.
> https://bodhi.fedoraproject.org/updates/FEDORA-2024-adf0027989

This fixes the problem on my hardware.

Reported upstream: https://gitlab.freedesktop.org/plymouth/plymouth/-/issues/249

Yeah, works for me, too.

FEDORA-2024-adf0027989 (plymouth-24.004.60-4.fc40) has been pushed to the Fedora 40 stable repository.
If the problem still persists, please make note of it in this bug report.

Still not working here.

What is not working? Did you test with a fresh image? If you only updated the system, you also need to run `dracut -f` and reboot to see the fix.

It boots to the login screen and then crashes to a blank blinking screen. From there, all you can do is get to the command line with Ctrl+Alt+F2. It was an updated system, not a fresh image. I will run `dracut -f` later to see if it fixes the issue.

That does not sound like this bug. With this bug, you saw *nothing at all* on the screen. No login prompt, no blinking cursor.

`dracut -f` didn't help. Maybe need to file a new bug.

> `dracut -f` didn't help. Maybe need to file a new bug.

Yeah, from your description I would say so.
spaceboy60, please link to your new bug here, and also include information on whether downgrading the plymouth* packages to some older version (and running `sudo dracut -f`) resolves the problem for you. We'll discuss there. Thanks!