Description of problem:
I normally do not reboot my always-on rawhide system very often, unless I'm
building my own testing mm kernels. This has not been the case for quite a
while. Recently, however, following the death of the home set-top DVD player
and a rainy winter day, I remembered my old Windows gaming partition and
rebooted into it (I also changed the system graphics card).
Getting back into Linux, however, proved a challenge. The system would oops on
every recent rawhide kernel 9 times out of 10. Strangely enough, my old mm kernel
with the associated old initrd would always boot.
I've now captured a partial oops in a picture (very difficult, as it scrolls off
the screen fast). I hope it's sufficient to point investigations in some
direction. I don't know if it's a new bug or something triggered by recent
unrelated rawhide changes. The problem always occurs at udev start time, then
the system quickly gets stuck and needs a reset.
Version-Release number of selected component (if applicable):
Couldn't find a recent fedora kernel without the problem
How reproducible:
Almost always, from cold or hot boot. Sometimes the boot sequence succeeds, but I
haven't found a reliable way to boot so far. The old mm kernel always boots fine.
Steps to Reproduce:
Created attachment 301898 [details]
The best screen capture I could produce so far. It's blurry because the screen
is scrolling and the camera captured some remanence
Created attachment 301899 [details]
The same kernel booted successfully the next iteration. Here is the associated
dmesg. Since getting a recent Fedora kernel to boot can easily take ~1h of
trials, I counted myself lucky and stopped the attempts to capture the oops on
Can you use the boot_delay parameter to slow down the scrolling and get a clear
picture of the oops?
Try boot_delay=100 to start with.
Created attachment 302165 [details]
I'm afraid the boot option only results in a blank screen.
While I was at it, however, I retried a picture series, and this one is a bit
better, I think.
That was good enough but too much had scrolled off the screen.
I'm afraid that without a reliable way to slow scrolling I can't do any better.
The previous lines just scroll too fast - they always show up as a lot of
superimposed lines in pictures (much worse than my first shot). The scrolling
slows down a little there, which is why I could take the picture.
Even after trying higher values for boot_delay?
boot_delay didn't result in a slower boot; it resulted in a blank screen and no boot.
Did a new run of tests with 2.6.25-1.fc9.x86_64. Turns out:
1. Pressing shift+page-up like mad is a somewhat reliable way to avoid the hang
2. It's an "unable to handle null pointer dereference" bug, and I managed to get a
somewhat blurry but readable picture of the start of the error message
Created attachment 302953 [details]
Blurry but complete screen capture
Created attachment 302996 [details]
Oops complete screen capture
This one should be as complete and clear as it could be
I added my analysis of the failure to the upstream bug -- thank you for filing that.
A fix was posted in upstream's bugzilla. Please integrate it into the Fedora
kernel before the F9 release.
Created attachment 304150 [details]
Patch in 2.6.25-13
I confirm 2.6.25-13 fixes the issue. I hope it is not restricted to F9 updates.
I'd hate to have a boot crasher in the initial F9 kernel.
Thank you for working on it.
2.6.25-14 tagged for F9-final