Red Hat Bugzilla – Bug 470471
Plymouth doesn't show errors during boot
Last modified: 2009-12-18 01:46:32 EST
Description of problem:
Sometimes my powernow_k8 module fails to start (caught signal 11) and Plymouth stops animating, but I have no idea that it happened; the only clue that something is wrong is the stopped animation.
Version-Release number of selected component (if applicable):
Always when there's an error during boot
Steps to Reproduce:
1. It's hard to tell how to reproduce. My powernow_k8 module mostly fails if I shut down the notebook by holding the power button instead of doing a normal power-off; after that, powernow_k8 always fails to load.
Plymouth stops animating, the progress bar stops, and the user has no idea what's going on. I can press Esc and see what happened, but I think the user should be notified of this.
Notify the user that something went wrong during boot
Happens on Fedora 10 Preview x86_64 on an HP Pavilion tx2500 tablet PC.
I don't know much about powernow_k8.
Is it a kernel module that's oopsing? You say it caught signal 11, which makes me think it's a userspace program. When it crashes, does the boot stop? Is it loaded in the initrd?
Can you tell me what it says when you press escape?
Yeah, it's a kernel module for managing CPU power consumption (I guess). Now I realize I made a small mistake in the initial problem description (I was submitting two Plymouth bugs and swapped some details). Actually, the animation does not stop; it goes on forever. The progress bar reaches its end and stops there, but the animation keeps running.
Anyway, when it crashes, the boot doesn't continue; only the Plymouth animation does. I guess it's loaded by the initrd, although I'm not sure (how can I tell?).
It says exactly this (it may be a little bit off, as I use the Czech localization and I'm translating the output from Czech):
Beginning non-interactive startup
/etc/rc5.d/S06cpuspeed: line 112: 1838 Unauthorized memory access (SIGSEGV) /sbin/modprobe powernow-k8 2>/dev/null
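The line above shows the init script reporting that `/sbin/modprobe` died from SIGSEGV (signal 11); a POSIX shell reports such a child's exit status as 128 + 11 = 139. As a minimal sketch of how a boot front end could notice this kind of failure and report it instead of animating on, the hypothetical `run_boot_step` helper below (not part of Plymouth, upstart, or initscripts) runs one step with Python's `subprocess`, which reports death by signal N as a return code of -N, and turns that into a readable message:

```python
import signal
import subprocess
import sys


def run_boot_step(argv):
    """Run one boot step; return a user-facing error string, or None on success.

    Hypothetical helper for illustration only.
    """
    result = subprocess.run(argv)
    if result.returncode < 0:
        # On POSIX, subprocess reports "killed by signal N" as returncode -N.
        signum = -result.returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"{argv[0]} was killed by {name}"
    if result.returncode != 0:
        return f"{argv[0]} exited with status {result.returncode}"
    return None


if __name__ == "__main__":
    # Simulate a helper that crashes the way modprobe did (SIGSEGV, signal 11).
    crash = [sys.executable, "-c",
             "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
    print(run_boot_step(crash))
```

A splash front end could show the returned string (or switch to the text view) whenever it is not `None`, rather than leaving the progress bar parked at its end.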
The interesting thing is that when it happens and I press Escape to see it, pressing Ctrl+Alt+Delete goes straight to the KDM screen, but neither the keyboard nor the mouse works, so I can only shut it down manually with the power button.
Seems like upstart is getting hosed somehow. (Ctrl-Alt-Delete is causing upstart to wake up from its trance.)
So, what's likely to be happening here is:
modprobe segfaults/oopses. It's entirely possible at this point the init script is hung in the kernel.
Ctrl-Alt-Del wakes up upstart. Since the first thing it does is stop the currently running runlevel start job, prefdm may start. However, it's also switching to runlevel 6 to reboot.
Since you've oopsed, it's unlikely you'll be able to reboot cleanly. So, you have to hard poweroff.
In the meantime, assigning to the kernel for the crash. If you can reproduce the error, I'd remove 'rhgb quiet' from the boot parameters, and see if you can get actual output from the kernel.
(In reply to comment #4)
> In the meantime, assigning to the kernel for the crash. If you can reproduce
> the error, I'd remove 'rhgb quiet' from the boot parameters, and see if you can
> get actual output from the kernel.
No need; this kernel bug is reported separately (see bug 470551). I reported this one because Plymouth doesn't show that something has happened and keeps animating forever, which isn't good.
Unfortunately, there's not really a good mechanism for plymouth to detect 'stuck', if it's truly hanging forever.
rhgb used to revert to the detailed view if it hadn't detected some sort of activity in a certain period of time (and would go back to graphical if the user didn't interact).
Yeah, we had to disable that feature in rhgb. There's no timeout that's reasonable for all systems.
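To make the trade-off concrete, here is a minimal sketch of the rhgb-style fallback described above: switch to the detailed view after `timeout` seconds without boot progress, and return to the graphical view after `grace` seconds if the user hasn't touched the keyboard. The class, its method names, and both parameters are hypothetical; this is not rhgb's or Plymouth's actual code, and the time source is passed in explicitly so the logic can be tested.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InactivityWatchdog:
    """Hypothetical rhgb-style 'no progress' fallback.

    timeout: seconds without progress before showing the detailed view.
    grace:   seconds the detailed view stays up without user input before
             returning to the graphical view.
    """
    timeout: float
    grace: float
    last_progress: float = 0.0
    verbose_since: Optional[float] = None
    user_interacted: bool = False

    def on_progress(self, now: float) -> None:
        # Any boot progress (a service started, the bar moved) resets the timer.
        self.last_progress = now

    def on_key(self) -> None:
        # Once the user interacts, stay in the detailed view.
        self.user_interacted = True

    def view(self, now: float) -> str:
        if self.verbose_since is None:
            if now - self.last_progress >= self.timeout:
                self.verbose_since = now
                return "detailed"
            return "graphical"
        if not self.user_interacted and now - self.verbose_since >= self.grace:
            # Nobody looked: go back to graphical and restart the timer.
            self.verbose_since = None
            self.last_progress = now
            return "graphical"
        return "detailed"
```

The whole behavior hinges on the single `timeout` value: a long but healthy step (such as a large file system check) trips it too early on slow machines, while a value generous enough for those hides real hangs for minutes on fast ones.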
Hmm, thinking about this more, we might be able to detect oopses. We probably won't be able to detect panics, but that should get solved by the kernel once it sends panics over the framebuffer.
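One way to read "detect oopses" is to watch kernel log lines for the text that typically opens an oops report. The sketch below does only that; the marker list is an illustrative, non-exhaustive assumption (common x86 oops headers), and where the log lines come from is left open. It is not how Plymouth implements anything.

```python
import re
from typing import Iterable, List

# Illustrative, non-exhaustive markers that commonly open an x86 oops report.
# This list is an assumption for the sketch, not Plymouth's logic.
OOPS_MARKERS = re.compile(
    r"(BUG: unable to handle kernel|Oops: [0-9a-f]{4}|general protection fault)"
)


def find_oopses(kernel_log_lines: Iterable[str]) -> List[str]:
    """Return the kernel log lines that look like the start of an oops."""
    return [line for line in kernel_log_lines if OOPS_MARKERS.search(line)]
```

If `find_oopses` returns anything, the splash could drop to the text view so the user sees the failure; a panic, as noted above, would still need the kernel's own help because userspace may never run again.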
*** Bug 470468 has been marked as a duplicate of this bug. ***
Okay, I'm adding the info from the bug marked as a duplicate of this one, as they are considered related. The other bug is about X crashing and Plymouth not reacting to it. When there's an error in the xorg.conf file that makes X crash, Plymouth just keeps running, although the animations stop. I think Plymouth should show some information about the failed X start, maybe by showing the X output (the log) on the screen, but it definitely shouldn't keep the Plymouth screen up as if nothing happened. And I'm sure X tried to start, because a log was created from the crash.
This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle.
Changing version to '10'.
More information and the reason for this action are here:
Even non-errors are a problem.
When a file system check is needed, it can take a very long time on today's large drives. When a boot that normally takes under a minute starts taking 5 or 10 minutes, the assumption is that it is hung; there is no clue that something "normal" is happening. Pressing the reset button results in the same behavior. Selecting another kernel version results in the same behavior, until one waits long enough for the check to complete, which might take a very long time on a partition that is a significant fraction of a terabyte.
Not only is some error notification needed, simply uncommon events need notification.
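As an illustration of notifying the user about a slow but normal event, the sketch below turns file system check progress into a status line a splash could display. It assumes progress lines of the form `pass current max device`, which is what e2fsck writes when started with its `-C fd` completion option; the two helper functions and the message wording are hypothetical.

```python
from typing import Optional, Tuple


def parse_fsck_progress(line: str) -> Optional[Tuple[int, float, str]]:
    """Parse one progress line into (pass, percent of that pass, device).

    Assumes the 'pass current max device' format; returns None otherwise.
    """
    parts = line.split(None, 3)
    if len(parts) != 4:
        return None
    try:
        pass_no, current, maximum = int(parts[0]), int(parts[1]), int(parts[2])
    except ValueError:
        return None
    if maximum <= 0:
        return None
    # The device name is the remainder of the line; drop the trailing newline.
    return pass_no, 100.0 * current / maximum, parts[3].strip()


def status_message(line: str) -> Optional[str]:
    """Build a human-readable message, or None if the line isn't progress."""
    parsed = parse_fsck_progress(line)
    if parsed is None:
        return None
    pass_no, percent, device = parsed
    return f"Checking {device} (pass {pass_no}): {percent:.0f}% - this may take a while"
```

Even a rough per-pass percentage tells the user the machine is busy rather than hung, which addresses the reset-button scenario above.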
This message is a reminder that Fedora 10 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 10. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '10'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 10's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 10 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora please change the 'version' of this
bug to the applicable version. If you are unable to change the version,
please add a comment here and someone will do it for you.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
The process we are following is described here:
Fedora 10 changed to end-of-life (EOL) status on 2009-12-17. Fedora 10 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.
Thank you for reporting this bug and we are sorry it could not be fixed.