Bug 470471

Summary: Plymouth doesn't show errors during boot
Product: [Fedora] Fedora
Reporter: Martin Klapetek <martin.klapetek>
Component: plymouth
Assignee: Ray Strode [halfline] <rstrode>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: medium
Version: 10
CC: cdahlin, jrb, kernel-maint, notting, rstrode, wb8rcr
Keywords: Reopened, Triaged
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-12-18 06:46:32 UTC

Description Martin Klapetek 2008-11-07 11:27:21 UTC
Description of problem:
Sometimes my powernow_k8 module fails to start (caught signal 11) and plymouth stops animating. Nothing tells me that this happened; the only clue that something is wrong is the stopped animation.


Version-Release number of selected component (if applicable):
plymouth-0.6.0-0.2008.10.30.4.fc10.x86_64

How reproducible:
Always when there's an error during boot

Steps to Reproduce:
1. Hard to tell how to reproduce. My powernow_k8 module mostly fails if I shut down the notebook by holding the power button instead of doing a normal power-off. After that, powernow_k8 always fails to load.

  
Actual results:
Plymouth stops animating, the progress bar stops, and the user has no idea what's going on. I can press Esc and see what happened, but I think the user should be notified of this.

Expected results:
Notify the user that something went wrong during boot

Additional info:
Happens on Fedora 10 Preview x86_64 on an HP Pavilion tx2500 tablet PC.

Comment 1 Ray Strode [halfline] 2008-11-07 15:03:18 UTC
I don't know much about powernow_k8.

Is it a kernel module that's oopsing?  You say it caught signal 11, which makes me think it's a userspace program.  When it crashes boot up doesn't continue?  Is it loaded in the initrd?

Can you tell me what it says when you press escape?

Comment 2 Martin Klapetek 2008-11-07 16:31:46 UTC
Yeah, it's a kernel module for managing CPU power consumption (I guess). I now realize I made a small mistake in the initial problem description (I was submitting two plymouth bugs and swapped some details). Actually the animation does not stop, it goes on forever: the progress bar reaches its end and stops there, but the animation still continues.

Anyway, when it crashes, the boot doesn't continue. Only the plymouth animation continues. I guess it's loaded by the initrd although I'm not sure (how can I tell?).

It says exactly this (it may be a little bit messed up, as I have Czech l10n and I'm translating the output from Czech...):

Beginning non-interactive startup
/etc/rc5.d/S06cpuspeed: line 112: 1838 Unauthorized memory access (SIGSEGV)    /sbin/modprobe powernow-k8 2>/dev/null

The interesting thing is that when it happens and I press Escape to see it, pressing Ctrl+Alt+Delete then goes straight to the kdm screen, but neither the keyboard nor the mouse works, so I can only shut it down manually using the power button.

Comment 3 Ray Strode [halfline] 2008-11-10 20:24:13 UTC
seems like upstart is getting hosed somehow.  (ctrl-alt-delete is causing upstart to wake up from its trance)

Comment 4 Bill Nottingham 2008-11-10 20:35:09 UTC
So, what's likely to be happening here is:

modprobe segfaults/oopses. It's entirely possible at this point the init script is hung in the kernel.

ctrl-alt-del wakes up upstart; since the first thing it does is 'stop' the currently running runlevel start, prefdm may start. However, it's also switching to runlevel 6 to reboot.

Since you've oopsed, it's unlikely you'll be able to reboot cleanly. So, you have to hard poweroff.

In the meantime, assigning to the kernel for the crash. If you can reproduce the error, I'd remove 'rhgb quiet' from the boot parameters, and see if you can get actual output from the kernel.
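
As a rough illustration of the suggestion above, removing 'rhgb quiet' from the boot parameters amounts to filtering those two tokens out of the kernel command line. The sketch below is Python; the function name and the sample command line are made up for illustration and do not come from this report.

```python
def strip_boot_params(cmdline, unwanted=("rhgb", "quiet")):
    """Return the kernel command line with the unwanted tokens removed.

    Only whole tokens are dropped, so an option such as 'quietly=1'
    (hypothetical) would be left alone.
    """
    kept = [token for token in cmdline.split() if token not in unwanted]
    return " ".join(kept)


if __name__ == "__main__":
    # Hypothetical kernel line, for illustration only.
    example = "ro root=/dev/sda1 rhgb quiet"
    print(strip_boot_params(example))
```

In practice the same edit can be made by hand at the boot loader prompt for a single boot; the code only shows which tokens to drop.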

Comment 5 Martin Klapetek 2008-11-10 20:46:07 UTC
(In reply to comment #4)
> In the meantime, assigning to the kernel for the crash. If you can reproduce
> the error, I'd remove 'rhgb quiet' from the boot parameters, and see if you can
> get actual output from the kernel.

No need to, this kernel bug is reported separately (see #470551). I reported this one because plymouth doesn't show that something has happened and keeps animating forever, which isn't good.

Comment 6 Bill Nottingham 2008-11-10 20:52:45 UTC
Unfortunately, there's not really a good mechanism for plymouth to detect 'stuck', if it's truly hanging forever.

Comment 7 Casey Dahlin 2008-11-10 20:57:34 UTC
rhgb used to revert to the detailed view if it hadn't detected some sort of activity in a certain period of time (and would go back to graphical if the user didn't interact).

Comment 8 Ray Strode [halfline] 2008-11-10 21:21:06 UTC
yea, we had to disable that feature in rhgb.  There's no timeout that's reasonable for all systems.
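
For reference, the rhgb fallback discussed in comments 7 and 8 can be sketched as a simple inactivity watchdog: record the time of the last boot activity and switch to the detailed view once nothing has happened for a given period. This is a minimal Python sketch, not rhgb's or plymouth's actual code; the class and method names are hypothetical, and the timeout is left as a parameter precisely because no single value suits all systems.

```python
import time


class InactivityWatchdog:
    """Tracks the time since the last observed boot activity."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout      # seconds of silence before falling back
        self.clock = clock          # injectable so the logic can be tested
        self.last_activity = clock()

    def note_activity(self):
        """Call whenever the boot process reports progress."""
        self.last_activity = self.clock()

    def should_show_details(self):
        """True once no activity has been seen for `timeout` seconds."""
        return self.clock() - self.last_activity >= self.timeout
```

A splash loop would call note_activity() on each progress update and poll should_show_details() periodically. A slow but healthy step, such as a long file system check, would still trip it, which is the weakness comment 8 points out.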

Comment 9 Ray Strode [halfline] 2008-11-10 21:24:50 UTC
hmm, thinking about this more, we might be able to detect oopses...  Probably won't be able to detect panics, but that should get solved by the kernel when it sends panics over the framebuffer.
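
One way to picture oops detection is to scan kernel log lines for the text an oops usually prints. The Python sketch below assumes a small set of marker strings; real kernel messages vary by version and architecture, and the function name is hypothetical. It does not cover panics, as noted above.

```python
import re

# Assumed markers only; this is not an exhaustive or authoritative list.
OOPS_PATTERN = re.compile(
    r"Oops:|BUG: unable to handle kernel|general protection fault"
)


def find_oops_lines(log_lines):
    """Return the log lines that look like the start of a kernel oops."""
    return [line for line in log_lines if OOPS_PATTERN.search(line)]
```

A splash screen could run such a check over new kernel messages and drop to the text view on the first match.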

Comment 10 Ray Strode [halfline] 2008-11-11 16:12:20 UTC
*** Bug 470468 has been marked as a duplicate of this bug. ***

Comment 11 Martin Klapetek 2008-11-11 16:32:59 UTC
Okay, I'm just adding the info from the bug marked as a dupe of this one, as they are considered related. The other bug is about X crashing and plymouth not reacting to it. When there's an error in the xorg.conf file that forces X to crash, plymouth just keeps running, although the animations are stopped. I guess plymouth should show some info about the unsuccessful X start, maybe show the X output (the log) on the screen or something, but definitely not keep the plymouth screen up as if nothing happened. And I'm sure X tried to start, because there's a log created from the crash.

Comment 12 Bug Zapper 2008-11-26 04:57:49 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle.
Changing version to '10'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 13 John J. McDonough 2009-07-05 12:57:09 UTC
Even non-errors are a problem.

When a file system check is needed, on today's large drives it can take a very long time.  When the normal under-a-minute boot starts to take 5 or 10 minutes, the assumption is that it is hung.  There is no clue that there is something "normal" happening.  Pressing the reset button results in the same behavior.  Selecting another kernel version results in the same behavior, until one waits long enough for the check to complete, which might take a very long time on a partition that is a significant fraction of a terabyte.

Not only is error notification needed; even merely uncommon events need notification.

Comment 14 Bug Zapper 2009-11-18 08:47:00 UTC
This message is a reminder that Fedora 10 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 10.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '10'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 10's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 10 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 15 Bug Zapper 2009-12-18 06:46:32 UTC
Fedora 10 changed to end-of-life (EOL) status on 2009-12-17. Fedora 10 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.