Bug 173296 - Kernel panic upon boot
Status: CLOSED INSUFFICIENT_DATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 5
Hardware: i686
OS: Linux
Priority: medium
Severity: high
Assigned To: Dave Jones
QA Contact: Brian Brock
MassClosed
Depends On:
Blocks:
Reported: 2005-11-15 20:02 EST by Jean-Baptiste Michaud
Modified: 2015-01-04 17:23 EST
CC: 4 users

Doc Type: Bug Fix
Last Closed: 2008-01-19 23:37:28 EST


Attachments
requested file (393 bytes, text/plain)
2005-11-15 23:03 EST, Jean-Baptiste Michaud
loader config (1.11 KB, text/plain)
2005-11-15 23:13 EST, Jean-Baptiste Michaud
filesystem config (1013 bytes, text/plain)
2005-11-15 23:13 EST, Jean-Baptiste Michaud
partition table (791 bytes, text/plain)
2005-11-15 23:18 EST, Jean-Baptiste Michaud
dmesg.1169 (19.64 KB, text/plain)
2005-11-16 13:10 EST, Jean-Baptiste Michaud
the output of kernel 1833 (1.11 MB, application/x-zip-compressed)
2006-03-08 00:06 EST, Jean-Baptiste Michaud
a quick pci overview (1.84 KB, text/plain)
2006-03-08 15:23 EST, Jean-Baptiste Michaud
lspci verbose dump (19.27 KB, text/plain)
2006-03-08 15:24 EST, Jean-Baptiste Michaud

Description Jean-Baptiste Michaud 2005-11-15 20:02:20 EST
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 1.0.3705; InfoPath.1)

Description of problem:
Kernel panic (caused by a failure to mount the disk)

Version-Release number of selected component (if applicable):
All 2.6.14.1637 kernels (UP and SMP)

How reproducible:
Always

Steps to Reproduce:
1. Boot with kernel 2.6.14.1637 (UP or SMP).

Actual Results:  mkrootdev: label / not found
mount: error 2 mounting ext3
error opening /dev/console: 2
Error dup2ing fd of 0 to 0
Error dup2ing fd of 0 to 1
Error dup2ing fd of 0 to 2
switchroot: mount failed 22
Kernel panic - not syncing: Attempted to kill init!

Expected Results:  boot properly :-)

Additional info:

My hard disk is on a MegaRAID adapter. Are the megaraid drivers included by
default in that release?
Kernels 2.6.11.1169 and 2.6.13.1532 boot fine.
How can I pass arguments to the kernel from GRUB (and which ones) so that the
boot is logged?
Comment 1 Dave Jones 2005-11-15 22:12:19 EST
Please paste the contents of /etc/modprobe.conf.
Comment 2 Jean-Baptiste Michaud 2005-11-15 23:03:53 EST
Created attachment 121106
requested file

requested info
Comment 3 Jean-Baptiste Michaud 2005-11-15 23:13:12 EST
Created attachment 121107
loader config
Comment 4 Jean-Baptiste Michaud 2005-11-15 23:13:44 EST
Created attachment 121108
filesystem config
Comment 5 Jean-Baptiste Michaud 2005-11-15 23:18:08 EST
Created attachment 121109
partition table

sda1: win 2k3 i386
sda2: linux /boot i686
sda3: win 2k3 x64
sda4: container
sda5: linux /boot x86_64 
sda6: linux / i686
sda7: linux / x86_64
sda8: linux swap
sda9: storage (all systems)
Comment 6 Jean-Baptiste Michaud 2005-11-15 23:19:03 EST
Comment on attachment 121107
loader config

64-bit OSes removed for clarity
Comment 7 Jean-Baptiste Michaud 2005-11-15 23:25:04 EST
Dave

I have attached supplemental filesystem config info in case it helps.
I removed the 64-bit OSes for clarity because the bug does not seem to depend
on the 32-bit/64-bit version: it occurs with all 2.6.14.1637 kernels, including
the 64-bit ones.

As I said, however complex the config, it works fine with all earlier
kernels.

Good night,

jbm
Comment 8 Dave Jones 2005-11-16 01:01:08 EST
Puzzling. My first thought was that this was due to the megaraid module being
split into separate megaraid.ko and megaraid_mbox.ko files, but that happened
a while back (before the kernels that you reported as working).

Does the kernel at http://people.redhat.com/davej/kernels/Fedora/FC4/ fare any
better?
Comment 9 Jean-Baptiste Michaud 2005-11-16 13:10:59 EST
Created attachment 121140
dmesg.1169

I have attached the boot log from the good 2.6.11.1169 kernel, just in case you
see anything blatant that you know won't work in 2.6.14.

Apart from that, 2.6.14.1640 does not even print "Red Hat 4.2.15
initializing"; it reboots immediately!?
Comment 10 Jean-Baptiste Michaud 2006-01-19 18:56:32 EST
Kernel 1656 prints an extra line of info:

PCI:failed to allocate mem resource #6: 20000@d0000000 for 0000:02:00.0

BTW, my RAID controller is in slot 6 of the motherboard, but I doubt Linux
would enumerate that as #6.
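The message above packs several fields into one line. Below is a minimal Python sketch that pulls them apart. It assumes the `size@start` hex layout that the kernel's PCI resource-assignment code printed at the time. The helper name `describe_pci_alloc_failure` is hypothetical.

In Linux, resource indices 0 to 5 are a device's standard BARs and index 6 is its expansion ROM (`PCI_ROM_RESOURCE`). So `#6` refers to a resource on device 0000:02:00.0 (domain:bus:device.function), not to a motherboard slot.

```python
import re

# Hypothetical helper. Assumes the message layout
# "failed to allocate <kind> resource #<n>:<size>@<start> for <pci address>",
# with size and start printed in hex, as in the line quoted in this bug.
_ALLOC_FAIL = re.compile(
    r"failed to allocate (?P<kind>mem|I/O) resource #(?P<index>\d+):\s*"
    r"(?P<size>[0-9a-f]+)@(?P<start>[0-9a-f]+) for "
    r"(?P<domain>[0-9a-f]{4}):(?P<bus>[0-9a-f]{2}):(?P<dev>[0-9a-f]{2})\.(?P<func>[0-7])",
    re.IGNORECASE,
)

PCI_ROM_RESOURCE = 6  # Linux resource index of the expansion ROM (0-5 are BARs)


def describe_pci_alloc_failure(line):
    """Return the fields of a PCI allocation-failure message, or None if it doesn't match."""
    m = _ALLOC_FAIL.search(line)
    if m is None:
        return None
    index = int(m.group("index"))
    return {
        "kind": m.group("kind"),
        "resource": "expansion ROM" if index == PCI_ROM_RESOURCE else f"BAR {index}",
        "size_bytes": int(m.group("size"), 16),
        "start": int(m.group("start"), 16),
        "device": "{}:{}:{}.{}".format(*m.group("domain", "bus", "dev", "func")),
        "bus": int(m.group("bus"), 16),
    }


if __name__ == "__main__":
    msg = "PCI:failed to allocate mem resource #6: 20000@d0000000 for 0000:02:00.0"
    info = describe_pci_alloc_failure(msg)
    # prints: expansion ROM 0x20000 0xd0000000 0000:02:00.0
    print(info["resource"], hex(info["size_bytes"]), hex(info["start"]), info["device"])
```

Read this way, the line reports a 0x20000-byte (128 KiB) expansion ROM at 0xd0000000 on device 0000:02:00.0. That is the same address later reported as the boot video device.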
Comment 11 Dave Jones 2006-02-03 00:13:56 EST
This is a mass-update to all currently open kernel bugs.

A new kernel update has been released (Version: 2.6.15-1.1830_FC4)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

Thank you.
Comment 12 Jean-Baptiste Michaud 2006-02-03 02:48:24 EST
Release 1830 changes nothing as far as this bug is concerned.
Comment 13 Jean-Baptiste Michaud 2006-02-03 03:00:30 EST
Well, well.
On closer inspection of the dmesg output, I stumbled upon this:

Boot video device is 0000:02:00.0

So it seems that the kernel fails to allocate memory for the video card. Could
that somehow affect its ability to mount disks?
Comment 14 Jean-Baptiste Michaud 2006-02-10 16:16:40 EST
So...

The graphics card that failed allocation was a PCI Express NVIDIA 7800 GTX, so I
swapped in an old plain-PCI S3 Trio V64, and the same dup2 error occurred.

So I believe the error is not related to the video hardware but most likely to
the chipset, i.e. an nForce 410/430 (nForce4 Professional products)
initialization problem.

When I have time, I will carefully inspect the kernel config files to see if I
can spot something blatant. Dave, maybe you know of specific modules or
options that this chipset requires.

Also, with the debug switch enabled at boot, I get an extra useful error
message near the end: "unable to open /dev/console". Does that make sense: the
disk not mounting because there is no console, because there is no graphics
card?

Since nothing can be written to disk at boot, please let me know if you know of
any means of inspecting the complete output of the debug switch, because my
non-bionic eye is too slow for the humongous amount of data that switch
produces. Maybe there is something useful around the 'failure to allocate
resources' message.

Regards.
Comment 15 Dave Jones 2006-02-21 00:44:01 EST
For the next kernel update, I'm adding a boot option: add boot_delay=1000 to
make the kernel pause 1000 ms after each printk. That may make it easier to
spot problems that happen very early in boot and scroll past.

Comment 16 Matthew Miller 2006-03-07 22:06:03 EST
I'm seeing a similar problem with an IBM xSeries 336. I'm installing two
machines, one in 32-bit mode and one in 64-bit mode. The 32-bit one works fine.
The 64-bit one works fine with the 2.6.11 kernel installed initially, but every
2.6.15 update kernel in the updates area (1.1830_FC4, 1.1831_FC4, and
1.1833_FC4) causes a can't-mount-/ crash very similar to the one described above.

I'm pretty sure the two systems are configured identically (other than which
arch of FC4 I've installed). I haven't tried swapping which machine gets which
arch, though.

The hardware is rather different from that described above: an LSI Logic SCSI
controller using mptbase/mptspi, and an ATI Radeon 7000 video card. I'm not
sure about video card messages, though; as described above, everything scrolls
off the screen really fast.

This system is actually scheduled to be hooked up to a serial console, so I can
probably capture the output that way if it would help. I only just learned
about the boot_delay option and will try it tomorrow.
Comment 17 Matthew Miller 2006-03-07 23:29:28 EST
Oh, hey: the last comment (#16) in closed bug #169691 looks like it might be
related to my problem. It points to this:

http://forums.fedoraforum.org/showthread.php?p=387546

I'm not sure why everything works fine on 32-bit, though.

I'll try the suggestions there tomorrow and see if they help.
Comment 18 Jean-Baptiste Michaud 2006-03-08 00:06:09 EST
Created attachment 125783
the output of kernel 1833

I tried the new boot_delay option today. The attached file contains snapshots
taken with a digital camera. Apologies: some pages are harder to read, but I
believe they do not contain much valuable information.

The "fail to allocate" error is in image 0010.

There is only one other error message, in image 0011,
"pcie_portdrv_probe -> Dev[005d:10de] has invalid IRQ"
Comment 19 Matthew Miller 2006-03-08 11:45:27 EST
Okay, yeah, the problem in my case was missing SCSI modules; forcing the
appropriate ones into the initrd fixed it. Sheesh, initrd-is-missing-SCSI
problems are _so_ 1998. :)
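Comment 1 asked for /etc/modprobe.conf. This sketch assumes the FC4-era convention that mkinitrd reads `alias scsi_hostadapter...` lines from that file to decide which SCSI drivers go into the initrd. Under that assumption, the Python below lists configured host drivers that are missing from a given set of initrd modules.

The function names and the sample file contents are hypothetical. How you obtain the list of modules actually inside an initrd image is left open.

```python
import re

# Hypothetical helpers. Assumes the FC4-era mkinitrd convention of reading
# "alias scsi_hostadapter[N] <module>" lines from /etc/modprobe.conf.
_HOSTADAPTER = re.compile(r"scsi_hostadapter\d*")


def scsi_hostadapter_modules(conf_text):
    """Return the driver names bound to scsi_hostadapter aliases, in file order."""
    modules = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments; blank lines become ""
        fields = line.split()
        if len(fields) == 3 and fields[0] == "alias" and _HOSTADAPTER.fullmatch(fields[1]):
            modules.append(fields[2])
    return modules


def missing_scsi_drivers(conf_text, initrd_modules):
    """List configured SCSI host drivers that are absent from initrd_modules."""
    present = set(initrd_modules)
    return [m for m in scsi_hostadapter_modules(conf_text) if m not in present]


if __name__ == "__main__":
    # Hypothetical modprobe.conf contents, for illustration only.
    sample = """
alias eth0 e1000
alias scsi_hostadapter megaraid_mbox
alias scsi_hostadapter1 mptspi   # added by hand
# alias scsi_hostadapter2 aic7xxx
"""
    # Hypothetical module list taken from an initrd image.
    print(missing_scsi_drivers(sample, ["megaraid_mm", "megaraid_mbox", "sd_mod"]))
    # prints: ['mptspi']
```

A non-empty result would point to a driver that has to be forced into the initrd, as in Comment 19. For example, this could be done with mkinitrd's `--with=<module>` option.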
Comment 20 Jean-Baptiste Michaud 2006-03-08 15:21:50 EST
IMO, it looks more and more like a chipset configuration problem...

I did a quick Google search on the error found last night, and it appears to be
related to the nForce4 PCI bridge. See for instance (although unresolved):
http://linux.derkeiler.com/Mailing-Lists/Kernel/2005-06/5829.html or
http://lists.zerezo.com/linux-kernel/msg13171.html
I do not know whether it is PCI Express related, though, since a video card in
a regular PCI slot behind the same chipset also causes crashes with kernel 1637
and newer.

What is interesting is that this error is also produced by my working 1369
kernel (see the dmesg already attached).

Do you know which modules are responsible for NVIDIA nForce4 (CK804) support,
and whether they changed between kernels 1532 and 1637?
Comment 21 Jean-Baptiste Michaud 2006-03-08 15:23:20 EST
Created attachment 125833
a quick pci overview

Since it is going this way, here is the output of lspci in short (1) and
verbose (2) forms.
Comment 22 Jean-Baptiste Michaud 2006-03-08 15:24:15 EST
Created attachment 125834
lspci verbose dump
Comment 23 Dave Jones 2006-09-16 21:39:42 EDT
[This comment added as part of a mass-update to all open FC4 kernel bugs]

FC4 has now transitioned to the Fedora legacy project, which will continue to
release security related updates for the kernel.  As this bug is not security
related, it is unlikely to be fixed in an update for FC4, and has been migrated
to FC5.

Please retest with Fedora Core 5.

Thank you.
Comment 24 Dave Jones 2006-10-16 13:34:09 EDT
A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem
please check you only have one version of device-mapper & lvm2
installed.  See bug 207474 for further details.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.

Thank you.
Comment 25 Jon Stanley 2008-01-19 23:37:28 EST
(this is a mass-close to kernel bugs in NEEDINFO state)

As indicated previously there has been no update on the progress of this bug
therefore I am closing it as INSUFFICIENT_DATA. Please re-open if the issue
still occurs for you and I will try to assist in its resolution. Thank you for
taking the time to report the initial bug.

If you believe that this bug was closed in error, please feel free to reopen
this bug.
