Bug 1794333 - Booting stops with plymouth displaying pixel garbage on the left side of the screen
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: plymouth
Version: 31
Hardware: x86_64
OS: Linux
unspecified
low
Target Milestone: ---
Assignee: Ray Strode [halfline]
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 1794331
Depends On:
Blocks:
Reported: 2020-01-23 10:27 UTC by nomaad.mailbox
Modified: 2020-04-05 03:03 UTC
CC List: 4 users

Fixed In Version: plymouth-0.9.4-14.20200325gite31c81f.fc31
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-25 10:27:19 UTC
Type: Bug
Embargoed:


Attachments
Black boot screen with pixel garbage band on the left side of the screen (1.18 MB, image/jpeg)
2020-01-23 10:27 UTC, nomaad.mailbox

Description nomaad.mailbox 2020-01-23 10:27:27 UTC
Created attachment 1654806
Black boot screen with pixel garbage band on the left side of the screen

Description of problem:
About 2 out of 10 boots of my laptop stop with plymouth displaying a black screen with pixel garbage on the left (see attached photo), at the point in the boot where I would normally enter the password to unlock the encrypted LUKS root disks. Nothing else is possible at this point except rebooting with Ctrl+Alt+Del.

I attempted to capture something useful by turning on plymouth debugging, adding plymouth:debug to the kernel line of the boot entry, but I wasn't able to identify anything useful in the /var/log/plymouth-debug.log output.
With the boot messages shown on screen, the following are the last messages before the halt:
--- BEGINNING OF COPIED MESSAGES ---

[    6.632566] audit: type=1701 audit(1579158981.429:9): auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=460 comm="plymouthd" exe="/usr/sbin/plymouthd" sig=11 res=1
[    6.649566] audit: type=1131 audit(1579158981.446:10): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=plymouth-start comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'

--- END OF COPIED MESSAGES ---
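The first record above is a kernel audit ANOM_ABEND event (type=1701) reporting that plymouthd was killed by signal 11, i.e. a segmentation fault; the second (type=1131, SERVICE_STOP) records systemd stopping the plymouth-start unit with res=failed. As a minimal sketch, assuming records in exactly the key=value layout copied above, the crashed process and signal can be pulled out as follows. The helpers `parse_audit_record` and `describe_crash` are hypothetical illustrations, not part of any audit tooling.

```python
import re
import signal

# Matches key=value pairs; values may be double-quoted, single-quoted,
# or bare tokens (e.g. comm="plymouthd", sig=11, msg='unit=...').
_PAIR = re.compile(r"""(\w+)=("[^"]*"|'[^']*'|\S+)""")


def parse_audit_record(line):
    """Return the top-level key=value fields of one audit line as a dict.

    Hypothetical helper: assumes the format of the lines quoted in this
    report. Fields nested inside msg='...' are kept as one string.
    """
    fields = {}
    for key, value in _PAIR.findall(line):
        # Strip surrounding quotes; keep the first occurrence of each key.
        fields.setdefault(key, value.strip("\"'"))
    return fields


def describe_crash(line):
    """Summarise an ANOM_ABEND (type=1701) record as 'comm: SIGNAME'."""
    fields = parse_audit_record(line)
    sig = int(fields["sig"])
    # signal.Signals maps the numeric signal to its name (11 -> SIGSEGV on Linux).
    return f"{fields['comm']}: {signal.Signals(sig).name}"
```

Applied to the first record above, `describe_crash` yields "plymouthd: SIGSEGV", which matches the later observation in this thread that the hang appears to be a plymouth crash.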

Version-Release number of selected component (if applicable):
Name        : plymouth
Version     : 0.9.4
Release     : 11.20191022git32c097c.fc31
Architecture: x86_64
Install Date: Tue 12 Nov 2019 16:17:47 CET
Group       : Unspecified
Size        : 315059
License     : GPLv2+
Signature   : RSA/SHA256, Wed 23 Oct 2019 11:28:17 CEST, Key ID 50cb390b3c3359c4
Source RPM  : plymouth-0.9.4-11.20191022git32c097c.fc31.src.rpm
Build Date  : Wed 23 Oct 2019 01:46:51 CEST
Build Host  : buildvm-04.phx2.fedoraproject.org


How reproducible:
Boot Fedora normally about a dozen times.

Steps to Reproduce:
1. Boot Fedora normally.
2. Boot Fedora normally.
3. Boot Fedora normally.

Actual results:
About 2 out of 10 boots stop with a black screen with pixel garbage on the left side of the screen.

Expected results:
Booting continues normally.

Additional info:
The hardware is a MacBook Air 7,2 (Early 2015) with an i5-5250U CPU and an Intel HD Graphics 6000 (i915) graphics controller. An external DELL U2718Q display is attached to the Thunderbolt port and mirrors the built-in screen.

Comment 1 nomaad.mailbox 2020-01-24 13:21:54 UTC
*** Bug 1794331 has been marked as a duplicate of this bug. ***

Comment 2 Hans de Goede 2020-01-25 14:07:15 UTC
I suspect that what you are seeing is not a plymouth bug, but a problem with the kernel video (i915) drivers.

Perhaps this problem has started with the recent switch to the 5.4.x kernels hitting Fedora 31 updates. Can you try switching back to a 5.3.x kernel and see if that helps?

Comment 3 nomaad.mailbox 2020-01-27 08:47:10 UTC
I installed kernel version 5.3.7-301.fc31.x86_64, and the issue happened the first time I booted it, with a slight difference: the pixel garbage looked as if it were made up of horizontally stretched pixels.

Comment 4 nomaad.mailbox 2020-01-28 07:55:37 UTC
I thought videos would probably be more helpful, so I recorded a normal boot and a hung boot on my laptop.
You can find these here:
Normal boot: https://drive.google.com/file/d/1-fokKQW5ORf62mJts0hxJnHfPunfFKgR/view?usp=sharing
Hung boot: https://drive.google.com/file/d/1ohNOZwcHKcZDib505t2vhXhdiGRnz8oD/view?usp=sharing

The videos show that the pixel garbage is actually a normal part of the boot process, so I suppose the difference is that during a hung boot the process halts at the point where the pixel garbage is on screen.

Comment 5 nomaad.mailbox 2020-01-28 08:17:00 UTC
I also recorded a hung boot in non-quiet mode to see more of what happens:
https://drive.google.com/file/d/18rzSmn-IedurFl_gtGTW4-GVtAa3F0Ep/view?usp=sharing

Comment 6 nomaad.mailbox 2020-02-12 15:40:11 UTC
I suspect this issue is related to the external display being connected to my laptop during boot. This is supported by the fact that I used the laptop for a week without the external display connected, and although the machine was booted at least once a day, the issue did not happen a single time.

Comment 7 Hans de Goede 2020-02-12 16:03:04 UTC
Ah, yes that sounds familiar.

I recently got a Thunderbolt dock to test with myself, and I sometimes see a similar problem; I guess this is related to Thunderbolt-connected displays. Looking into this problem (which appears to be a plymouth crash) is on my to-do list.

I will get back to you when I have looked into this in more detail.

Comment 8 Hans de Goede 2020-02-19 18:06:17 UTC
OK, I have a set of patches which should fix this.

I've done a scratch-build of plymouth with these patches added, please give it a try:
https://koji.fedoraproject.org/koji/taskinfo?taskID=41654795

To install this do the following:

1) Run "rpm -qa | grep plymouth | sort"
2) Download the rpms from the link for the plymouth (sub)packages which you have installed
3) In a folder with the downloaded rpms run: "sudo rpm -Uvh plymouth*"
4) Run "sudo dracut -f" to regenerate the initrd for your currently running kernel so that the new plymouth gets added to it
5) Run "uname -a" note the kernel version
6) reboot with the external monitor connected
7) Run "uname -a" again; if the kernel version is different, the kernel was upgraded since you last rebooted, so go to step 4.

After doing the above steps, check if you can still reproduce the problem.
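Step 2 requires knowing which plymouth (sub)package names are installed, since the scratch build provides one RPM per subpackage. As a minimal sketch, assuming the name-version-release.arch lines that `rpm -qa` prints on Fedora 31, and relying on the RPM rule that version and release may not contain a hyphen, the package names can be extracted from the step 1 output. `split_nvra` and `package_names` are hypothetical helpers, not part of rpm.

```python
def split_nvra(nvra):
    """Split 'name-version-release.arch' into its four parts."""
    # The architecture follows the last dot (e.g. x86_64, noarch).
    nvr, arch = nvra.rsplit(".", 1)
    # Version and release may not contain '-', so the last two hyphens
    # separate them from the (possibly hyphenated) package name.
    name, version, release = nvr.rsplit("-", 2)
    return name, version, release, arch


def package_names(rpm_qa_output):
    """Return the sorted package names from `rpm -qa | grep plymouth` output."""
    names = set()
    for line in rpm_qa_output.splitlines():
        line = line.strip()
        if line:  # skip blank lines
            names.add(split_nvra(line)[0])
    return sorted(names)
```

Matching these names against the RPM file names on the scratch-build page tells you which files to download before running "sudo rpm -Uvh plymouth*".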

Comment 9 nomaad.mailbox 2020-02-27 12:42:22 UTC
I got an error because I had missed downloading the main plymouth binary rpm, which I have now done.
I will give an update once I have a dozen reboots behind me.

Comment 10 nomaad.mailbox 2020-02-28 12:42:48 UTC
Regrettably I have to report that this morning, when I turned on the laptop with plymouth-0.9.4-13.20191224gitd7c737d.fc31.x86_64 and with the external display connected to the Thunderbolt 2 port, the issue re-appeared.

Comment 11 Hans de Goede 2020-02-29 19:17:41 UTC
(In reply to nomaad.mailbox from comment #10)
> Regrettably I have to report that today morning when I turned on the laptop
> with plymouth-0.9.4-13.20191224gitd7c737d.fc31.x86_64 and with the external
> display connected to the Thunderbolt 2 port the issue re-appeared.

That is unfortunate; for me that update does fix some issues I was seeing when plugging Thunderbolt-connected monitors in and out while plymouth is running. I'm afraid I do not have any other ideas on how to fix this.

Comment 12 nomaad.mailbox 2020-03-03 07:48:34 UTC
I hope you don't mind me asking, but I was wondering whether there is some way to get a meaningful log or debug output from Plymouth, or at least one related to this issue. I don't think I fully understood how Plymouth's logging is supposed to work, even though I could get it to create its log, and I didn't find the documentation very helpful in this regard.

Thank you in advance.

Comment 13 Hans de Goede 2020-03-03 09:04:07 UTC
(In reply to nomaad.mailbox from comment #12)
> I hope you don't mind me asking, but I was pondering if there is some way
> for getting meaningful log or debug from Plymouth or at least related to
> this issue. I think I wasn't able to grasp it completely how Plymouth's
> logging should work, even if I could have it create its log. I didn't found
> th documentation too much helpful in this regards.
> 
> Thank you in advance.

I normally add the following to the kernel commandline for debugging: "plymouth.debug=stream:/dev/null", this will cause plymouth to write log messages to /dev/null while running (so it does not write to the console and behaves as it would normally) and then on quit it will write a copy of the log (which it maintains in RAM) to /var/log/plymouth-debug.log.

Now this is only useful if a. the machine fully boots, and b. plymouth misbehaves but does not crash. You can work around problem b. by changing the kernel commandline option to "plymouth.debug=stream:/run/plymouth.log"; then everything up to the crash will be in /run/plymouth.log, but if plymouth crashes inside the initrd and then gets restarted after switching root, the restart will overwrite the log.

If the machine does not fully boot, which I think is the case here, things become trickier. What I do there is:

a. Create a new partition on your main disk. Often there is still some free space at the end, since the end is aligned to 1 MiB; this partition only needs to be about 0.3 MB. In my case this partition is nvme0n1p7.

b. Edit /lib/systemd/system/plymouth-start.service adding an ExecStartPre line to the [Service] section like this:

ExecStartPre=-/bin/sh -c 'sleep 5'

This is purely to delay the start of plymouth past the point where the main disk of the system is detected. Note that with Thunderbolt this alone might be enough to avoid the issue you are seeing, if the issue is timing related :|

c. Add the following to the kernel commandline (adjusted for the partition which you created):

"plymouth.debug=stream:/dev/nvme0n1p7"

d. Before rebooting, do (adjusted for the partition which you created):

sudo dd if=/dev/zero of=/dev/nvme0n1p7

WARNING: doing this on the wrong partition will cause data loss!

e. Reboot until the problem reproduces

f. On the (re)boot after the problem has reproduced, edit the kernel commandline in grub before booting and remove "plymouth.debug=stream:/dev/nvme0n1p7", so that plymouth will not write there during this boot and the logs from the (previous) boot with the issue are preserved.

g. sudo dd if=/dev/nvme0n1p7 of=log.txt

And now you have a log.txt with plymouth logging from the troublesome boot + a bunch of 0 bytes at the end.
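Because the partition was zeroed in step d, everything after the end of the log is NUL bytes. As a minimal sketch, assuming the log text itself does not end in NUL bytes, the padding can be stripped from the log.txt produced in step g. The function names and the output path are illustrative only and not part of plymouth.

```python
from pathlib import Path


def trim_zero_padding(raw):
    """Drop the trailing NUL (0x00) bytes left over from zeroing the partition."""
    return raw.rstrip(b"\x00")


def extract_log(dump_path, out_path):
    """Read a raw dump (e.g. log.txt from step g) and write only the log text."""
    data = Path(dump_path).read_bytes()
    log = trim_zero_padding(data)
    Path(out_path).write_bytes(log)
    return len(log)  # number of log bytes kept
```

For example, extract_log("log.txt", "plymouth-boot.log") leaves a file that can be attached here without the zero padding.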

You can repeat steps d - g as often as you want to get more logs.

If you manage to gather any useful logs this way, please attach them here.

Comment 14 nomaad.mailbox 2020-03-03 10:57:16 UTC
Thank you, this method is cool, and I got it working. Capturing the Plymouth debug output to the tiny partition works fine.

I restarted the system about 20 times but the issue didn't happen once. 
Today kernel 5.5.7-200 was installed on my system and I did 8-10 reboots with it without the issue happening. So I thought the new kernel might have resolved the issue, and I did some reboots with 5.5.6-201 and some even with 5.5.5-200, but all of those boots went fine.
I even tried booting without the ExecStartPre line in plymouth-start.service to see whether the delay was what provided a workaround, but I couldn't reproduce the issue.

I'll come back when I catch it, unless it doesn't recur for several weeks. :)

Comment 15 Hans de Goede 2020-03-03 11:02:58 UTC
Note: do not forget to remove the plymouth.debug=... option from the kernel commandline on the next boot after it hangs at boot again!

Comment 16 nomaad.mailbox 2020-03-25 08:09:57 UTC
I'm happy to report that the issue has not recurred even once since then.
From my perspective it's fine to consider this issue to be solved.

Thank you very much for your efforts with it and for your support!
Andras

Comment 17 Hans de Goede 2020-03-25 10:27:19 UTC
Thank you for your feedback. I'll close this bug then.

Comment 18 Fedora Update System 2020-03-26 09:25:16 UTC
FEDORA-2020-164659331d has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2020-164659331d

Comment 19 Fedora Update System 2020-03-27 19:12:06 UTC
FEDORA-2020-164659331d has been pushed to the Fedora 31 testing repository.
Shortly you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-164659331d`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-164659331d

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 20 Fedora Update System 2020-04-05 03:03:22 UTC
FEDORA-2020-164659331d has been pushed to the Fedora 31 stable repository.
If problem still persists, please make note of it in this bug report.

