Description of problem:
I normally do not reboot my always-on rawhide system very often, unless I'm building my own testing mm kernels. This has not been the case for quite a while. Recently, however, following the death of the home set-top DVD player, and on a rainy winter day, I remembered my old Windows gaming partition and rebooted into it (I also changed the system graphics card). Getting back into Linux, however, proved a challenge. The system would oops on every recent rawhide kernel 9 times out of 10. Strangely enough, my old mm kernel with its associated old initrd would always boot.

I've now captured a partial oops in a picture (very difficult, since it scrolls off the screen fast). I hope it's sufficient to point investigations in some direction. I don't know if it's a new bug or something triggered by recent unrelated rawhide changes. The problem always occurs at udev start time; the system then quickly gets stuck and needs a reset.

Version-Release number of selected component (if applicable):
Couldn't find a recent Fedora kernel without the problem.

How reproducible:
Almost always, from cold or hot boot. Sometimes the boot sequence succeeds, but I haven't found a reliable way to boot so far. The old mm kernel always boots fine.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Created attachment 301898 [details] Screen capture The best screen capture I could produce so far. It's blurry because the screen is scrolling and the camera captured some ghosting from the previous lines.
Created attachment 301899 [details] successful dmesg The same kernel booted successfully on the next attempt. Here is the associated dmesg. Since getting a recent Fedora kernel to boot can easily take about an hour of trials, I counted myself lucky and stopped trying to capture the oops on camera.
Can you use the boot_delay parameter to slow down the scrolling and get a clear picture of the oops? Try boot_delay=100 to start with.
Created attachment 302165 [details] Screen capture I'm afraid the boot option only results in a blank screen. While I was at it, however, I retried a picture series, and this one is a bit better, I think.
clearing NEEDINFO
That was good enough but too much had scrolled off the screen.
I'm afraid that without a reliable way to slow the scrolling I can't do any better. The previous lines just scroll too fast: they always show up as a lot of superimposed lines in pictures (much worse than my first shot). The scrolling slows down a little at that point, which is why I could take the picture.
Even after trying higher values for boot_delay?
boot_delay didn't result in a slower boot; it resulted in a blank screen and no boot.
Did a new run of tests with 2.6.25-1.fc9.x86_64. Turns out: 1. Pressing shift+page-up like mad is a somewhat reliable way to avoid the hang. 2. It's an "unable to handle null pointer dereference" bug, and I managed to get a somewhat blurry but readable picture of the start of the error message.
Created attachment 302953 [details] Blurry but complete screen capture
Created attachment 302996 [details] Oops complete screen capture This one should be as complete and clear as it can be.
I added my analysis of the failure to the upstream bug -- thank you for filing that.
A fix was posted in upstream's bugzilla. Please integrate it into the Fedora kernel before the F9 release.
Created attachment 304150 [details] patch
Patch in 2.6.25-13
I confirm 2.6.25-13 fixes the issue. I hope the fix is not restricted to F9 updates. I'd hate to have a boot crasher in the initial F9 kernel.
Thank you for working on it.
2.6.25-14 tagged for F9-final