Bug 441765 - [mdraid] [patch] Boot hang in all recent Fedora kernels
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: All
OS: Linux
Priority: low
Severity: low
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Blocks: F10Blocker/F10FinalBlocker F9KernelBlocker
Reported: 2008-04-09 17:24 EDT by Nicolas Mailhot
Modified: 2008-05-02 06:58 EDT

Doc Type: Bug Fix
Last Closed: 2008-05-02 06:58:10 EDT


Attachments
Screen capture (235.48 KB, image/jpeg)
2008-04-09 17:28 EDT, Nicolas Mailhot
successful dmesg (37.18 KB, application/octet-stream)
2008-04-09 17:32 EDT, Nicolas Mailhot
Screen capture (125.30 KB, image/jpeg)
2008-04-11 16:00 EDT, Nicolas Mailhot
Blurry but complete screen capture (102.04 KB, image/jpeg)
2008-04-18 18:17 EDT, Nicolas Mailhot
Oops complete screen capture (127.50 KB, image/jpeg)
2008-04-19 08:58 EDT, Nicolas Mailhot
patch (765 bytes, text/plain)
2008-04-29 15:17 EDT, Chuck Ebbert


External Trackers:
Linux Kernel 10484

Description Nicolas Mailhot 2008-04-09 17:24:53 EDT
Description of problem:

I normally do not reboot my always-on rawhide system very often, unless I'm
building my own testing -mm kernels, which has not been the case for quite a
while. Recently, however, after the death of the home set-top DVD player, and
on a rainy winter day, I remembered my old Windows gaming partition and
rebooted into it (I also changed the system's graphics card).

Getting back into Linux, however, proved a challenge. The system would oops on
every recent rawhide kernel 9 times out of 10. Strangely enough, my old -mm
kernel with its associated old initrd would always boot.

I've now captured a partial oops in a picture (very difficult, since it scrolls
off the screen fast). I hope it's enough to point the investigation in some
direction. I don't know if it's a new bug or something triggered by unrelated
recent rawhide changes. The problem always occurs at udev start time; the
system then quickly gets stuck and needs a reset.

Version-Release number of selected component (if applicable):

Couldn't find a recent Fedora kernel without the problem.

How reproducible:

Almost always, from cold or hot boot. Sometimes the boot sequence succeeds, but
I haven't found a reliable way to boot so far. The old -mm kernel always boots
fine.

Comment 1 Nicolas Mailhot 2008-04-09 17:28:10 EDT
Created attachment 301898 [details]
Screen capture

The best screen capture I could produce so far. It's blurry because the screen
is scrolling and the camera captured afterimages of earlier lines.
Comment 2 Nicolas Mailhot 2008-04-09 17:32:33 EDT
Created attachment 301899 [details]
successful dmesg

The same kernel booted successfully on the next attempt. Here is the associated
dmesg. Since getting a recent Fedora kernel to boot can easily take ~1 hour of
attempts, I counted myself lucky and stopped trying to capture the oops on
camera.
Comment 3 Chuck Ebbert 2008-04-10 19:03:10 EDT
Can you use the boot_delay parameter to slow down the scrolling and get a clear
picture of the oops?

Try boot_delay=100 to start with.
Comment 4 Nicolas Mailhot 2008-04-11 16:00:47 EDT
Created attachment 302165 [details]
Screen capture

I'm afraid the boot option only results in a blank screen

While I was at it, however, I took another series of pictures, and this one is
a bit better, I think.
Comment 5 Nicolas Mailhot 2008-04-11 16:13:21 EDT
clearing NEEDINFO
Comment 6 Chuck Ebbert 2008-04-11 18:35:56 EDT
That was good enough but too much had scrolled off the screen.
Comment 7 Nicolas Mailhot 2008-04-12 04:24:02 EDT
I'm afraid that without a reliable way to slow the scrolling I can't do any
better. The previous lines just scroll too fast; they always show up as a lot
of superimposed lines in pictures (much worse than my first shot). The
scrolling slows down a little at that point, which is why I could take the
picture.
Comment 8 Dave Jones 2008-04-14 12:51:47 EDT
Even after trying higher values for boot_delay?
Comment 9 Nicolas Mailhot 2008-04-14 13:57:56 EDT
boot_delay didn't result in a slower boot; it resulted in a blank screen and no
boot.
Comment 10 Nicolas Mailhot 2008-04-18 18:15:32 EDT
Did a new run of tests with 2.6.25-1.fc9.x86_64. It turns out that:
1. Pressing Shift+PageUp like mad is a somewhat reliable way to avoid the hang.
2. It's an "unable to handle null pointer dereference" bug, and I managed to get
a somewhat blurry but readable picture of the start of the error message.
Comment 11 Nicolas Mailhot 2008-04-18 18:17:13 EDT
Created attachment 302953 [details]
Blurry but complete screen capture
Comment 12 Nicolas Mailhot 2008-04-19 08:58:19 EDT
Created attachment 302996 [details]
Oops complete screen capture

This one should be as complete and clear as it could be
Comment 13 Chuck Ebbert 2008-04-22 11:24:12 EDT
I added my analysis of the failure to the upstream bug -- thank you for filing that.
Comment 14 Nicolas Mailhot 2008-04-29 13:44:39 EDT
A fix was posted in upstream's bugzilla. Please integrate it to the Fedora
kernel before F9 release.
Comment 15 Chuck Ebbert 2008-04-29 15:17:56 EDT
Created attachment 304150 [details]
patch
Comment 16 Chuck Ebbert 2008-04-30 22:03:06 EDT
Patch in 2.6.25-13
Comment 17 Nicolas Mailhot 2008-05-01 09:21:52 EDT
I confirm 2.6.25-13 fixes the issue. I hope the fix is not restricted to F9
updates; I'd hate to have a boot crasher in the initial F9 kernel.
Comment 18 Nicolas Mailhot 2008-05-01 09:31:41 EDT
Thank you for working on it
Comment 19 Chuck Ebbert 2008-05-02 06:58:10 EDT
2.6.25-14 tagged for F9-final
