Bug 501441

Summary: Kernel will not boot (kernel panic)
Product: Red Hat Enterprise Linux 5 Reporter: David Dreggors <dadreggors>
Component: kernel Assignee: Prarit Bhargava <prarit>
Status: CLOSED NOTABUG QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high Docs Contact:
Priority: low    
Version: 5.3 CC: ajb, anton, dzickus, prarit
Target Milestone: rc   
Target Release: ---   
Hardware: i686   
OS: Linux   
URL: https://bugzilla.redhat.com/show_bug.cgi?id=499999
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-05-22 11:20:02 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
RHEL5 RPM with fix from BZ 501178 none
This is the kernel panic I get when booting. none

Description David Dreggors 2009-05-19 07:10:16 UTC
Description of problem:

When working on Bug #499999, I tried the patched test kernel "kernel-2.6.18-148.el5.bz499999test.i686.rpm" as posted by Michal Schmidt.

The kernel would not boot on my Compaq CQ60-206US (nForce MCP78S chipset). Michal Schmidt then asked me to file a new bug report.



Version-Release number of selected component (if applicable):
kernel-2.6.18-148.el5.bz499999test.i686.rpm


How reproducible:
Every boot

Steps to Reproduce:
1. Start laptop

  
Actual results:
kernel panic referencing cpu_freq_governor


Expected results:
Kernel loads and system boots

Additional info:
CPU: Athlon X2 64 Bit
Mem: 2 GB
Chipset: nVidia MCP78S
Drive Type: SATA


Here are the results I found (as posted at https://bugzilla.redhat.com/show_bug.cgi?id=499999):


I just installed and tried the i686 rpm, but I cannot get this kernel to boot. I get
a kernel panic at CPU scaling, immediately after the "Checking for new hardware" line.

The kernel panic mentions cpu_freq_governor, cpu_set_policy, etc... in the call
trace, then:

Kernel panic - not syncing: Fatal exception


I tried "pci=nomsi" (for my sata drives) and "nomce"
(since I have a Compaq).


OLD WORKING KERNEL:
On the currently running kernel, I must have both options on the kernel
line or my laptop will not boot. Without "pci=nomsi" on the kernel line, the
running kernel gives this same kernel panic. Without "nomce", I get a
"Machine Check Exception".

NEW TEST KERNEL:
I tried them separately, then together, and then with neither. The system either
hangs with no error or gives a kernel panic, but always at the same place in boot.
It almost seems that the "pci=nomsi" option is not taking effect, as I get the same
error on the old kernel without it.

Comment 1 David Dreggors 2009-05-19 07:15:14 UTC
To clarify, the "old working kernel" above is "2.6.18-128.1.10.el5" and it runs fine as long as I have "pci=nomsi" and "nomce" on the boot line.

Only the test patch kernel is failing to boot.

Comment 2 Don Zickus 2009-05-19 20:33:56 UTC
Can you try booting your machine with 'nmi_watchdog=2' on the kernel command line and with the 'quiet' option removed from that line?  Hopefully the kernel will panic after a while and the stack will match the one in bugzilla 501178.
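
For anyone repeating this test, the requested change to the kernel line can be sketched as below. This is only an illustration: the helper name `edit_kernel_cmdline` and the `root=` value are made up, and the sample line merely assumes the "pci=nomsi" and "nomce" options mentioned in the description. It edits a string and does not touch grub.conf.

```python
def edit_kernel_cmdline(cmdline, add=(), remove=()):
    """Return cmdline with `remove` options dropped and `add` options appended.

    Options are matched on the name before '=', so adding
    'nmi_watchdog=2' also replaces any existing 'nmi_watchdog=...' value.
    """
    add_names = {opt.split("=", 1)[0] for opt in add}
    drop = set(remove) | add_names
    kept = [tok for tok in cmdline.split() if tok.split("=", 1)[0] not in drop]
    return " ".join(kept + list(add))


# Hypothetical kernel line; the root= value is an assumption for illustration.
line = "ro root=/dev/VolGroup00/LogVol00 quiet pci=nomsi nomce"
print(edit_kernel_cmdline(line, add=["nmi_watchdog=2"], remove=["quiet"]))
```

The resulting string would then be pasted onto the kernel line at the boot menu or in the boot loader configuration by hand.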

Comment 3 David Dreggors 2009-05-20 03:21:57 UTC
Yes, it does panic with a stack trace, but the trace is completely different.

The stack in bug #501178 mentions clock ticks and scheduler. Mine has many references to cpu_freq_governor, cpu_freq_<this>, cpu_freq_<that> etc...

How can I grab all that in a file so that I can copy and paste after reboot?

Comment 4 Don Zickus 2009-05-20 13:58:13 UTC
You can't really, unless you have a serial console port attached to your laptop.

Ok, your problem may still be similar.  Prarit is working on creating an rpm with a patch that we would like you to try.  Hopefully he will have something in the next couple of hours.

Comment 5 Prarit Bhargava 2009-05-20 15:01:13 UTC
Created attachment 344822 [details]
RHEL5 RPM with fix from BZ 501178

Please test with this RPM.  It only contains a fix for this issue.

Comment 7 Prarit Bhargava 2009-05-20 15:11:15 UTC
(In reply to comment #3)
> Yes, it does panic with a stack trace, but the trace is completely different.
> 
> The stack in bug #501178 mentions clock ticks and scheduler. Mine has many
> references to cpu_freq_governor, cpu_freq_<this>, cpu_freq_<that> etc...
> 
> How can I grab all that in a file so that I can copy and paste after reboot?  

... boot with "vga=791".

Pull out your cellphone.  Take a picture.  Attach it to this BZ.

:)

P.

Comment 8 David Dreggors 2009-05-20 20:26:42 UTC
Actually I did take a picture with my cell phone the first time. Unfortunately, my cell phone has no way to upload to a computer (it does not have a USB port) and I do not have a data plan, so I cannot email the pictures either :(

Comment 9 Prarit Bhargava 2009-05-21 13:15:55 UTC
(In reply to comment #8)
> Actually I did take a picture with my cell phone the first time. Unfortunately,
> my cell phone has no way to upload to a computer (it does not have a USB port)
> and I do not have a data plan, so I cannot email the pictures either :(  

:)  Okay, then can you at least type it out so we can see where the panic is?

P.

Comment 10 David Dreggors 2009-05-22 02:29:31 UTC
I have a digital camera now that has a USB cable. The problem is that, as I said before, you cannot be sure when it will hang (always right after "Checking for hardware changes") or when it will decide to give the kernel panic message.


I have rebooted 4 or 5 times already and every boot into that kernel so far tonight has just hung the system. I will keep on this until I get the panic again I guess.

I will post the image back later (as soon as I can get a picture).

Comment 11 David Dreggors 2009-05-22 03:14:02 UTC
Created attachment 345051 [details]
This is the kernel panic I get when booting.

I get this kernel panic randomly. Usually with this kernel it just hangs after "Checking for hardware changes". Once in a while (with no change to kernel options) it gives this kernel panic message.

Comment 12 David Dreggors 2009-05-22 04:53:16 UTC
In bug #499999 I tried a new kernel (kernel-2.6.18-150.el5) and this kernel boots properly.

Comment 13 Prarit Bhargava 2009-05-22 11:20:02 UTC
(In reply to comment #12)
> In bug #499999 I tried a new kernel (kernel-2.6.18-150.el5) and this kernel
> boots properly.  

Okay -- CLOSED as NOTABUG.

P.

Comment 14 Don Zickus 2009-05-22 11:39:44 UTC
Just for the record: Prarit investigated a similar problem on a laptop in our office.  It was determined that the patches from bz 297731 were causing the problems.  In -150.el5 those patches were reverted, which is probably why your issue was fixed.