Bug 394981 - hpet_alloc Kernel Panic
Summary: hpet_alloc Kernel Panic
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 8
Hardware: i386
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 383551
Depends On:
Blocks: K12LTSP
Reported: 2007-11-21 22:54 UTC by John Williams
Modified: 2008-02-07 20:56 UTC (History)

Fixed In Version: 2.6.23.14-115.fc8
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-02-07 20:56:29 UTC
Type: ---
Embargoed:


Attachments
Screen showing panic message. (594.62 KB, image/jpeg)
2007-11-21 22:54 UTC, John Williams
Panic message using vga=1 (547.46 KB, image/jpeg)
2007-11-27 13:42 UTC, John Williams
Warren's oops message using vga=1 (852.55 KB, image/jpeg)
2007-11-27 22:28 UTC, Chuck Ebbert
hpet-panic-seven-0x90.jpg (166.07 KB, image/jpeg)
2007-11-28 18:29 UTC, Warren Togami
hpet_paravirt-noreplace.jpg (167.44 KB, image/jpeg)
2007-11-29 22:56 UTC, Warren Togami
hpet_no_paravirt.jpg (160.96 KB, image/jpeg)
2007-11-29 22:57 UTC, Warren Togami
T170 thin client (28.29 KB, text/plain)
2007-12-05 05:41 UTC, Warren Togami
dmesg from Ubuntu's 2.6.22-based kernel (15.51 KB, text/plain)
2007-12-05 23:34 UTC, Warren Togami
Full config for the above Ubuntu kernel (73.62 KB, text/plain)
2007-12-06 21:55 UTC, Warren Togami

Description John Williams 2007-11-21 22:54:12 UTC
Description of problem:
Hi,
I've done a fresh install of Fedora 8 on three computers. Only one had a problem
- a PCChips M811 Motherboard with Athlon XP 1800MHz Processor. The Motherboard
doesn't power off the machine at shutdown so I have been running it with the
BIOS set to ACPI disabled and booting with acpi=off. There were no problems with
Fedora 6 and 7.
With Fedora 8 I often get:
'Kernel Panic Not syncing - attempted to kill init'
This happens straight after the line 
'Uncompressing Linux...OK, booting the Kernel'
Occasionally it will boot normally. If the Kernel Panic occurs, pressing the
RESET button usually lets the system boot normally.
Attached is a photo of the Panic screen.

If I change the boot parameter to pci=noacpi instead of acpi=off the panics are
fewer. I get 
'Uncompressing Linux...OK, booting the Kernel'
'ACPI Unable to load the system description tables.'
and then the system boots normally.

Occasionally I still get the Kernel Panic straight after the line: 
'Uncompressing Linux...OK, booting the Kernel'

Version-Release number of selected component (if applicable):
2.6.23.1-49.fc8

How reproducible:
Occasionally at Random

Steps to Reproduce:
1. Boot or restart the computer
  
Actual results:
Kernel Panic - See attachment


Expected results: Normal boot


Additional info:
This has been happening with the first F8 kernel and the updated kernels.

Comment 1 John Williams 2007-11-21 22:54:12 UTC
Created attachment 266371 [details]
Screen showing panic message.

Comment 2 John Williams 2007-11-22 09:09:52 UTC
Correction: Apologies, the processor is an AMD Duron 1800MHz.
John.

Comment 3 John Williams 2007-11-22 14:45:51 UTC
I don't know whether this is relevant but I've just noticed that Fedora 8
doesn't detect the floppy drive on this computer although the BIOS does. There
is no list of devices beginning fd0 in /dev and there is no floppy icon in the
'Computer' folder on the desktop.
John.

Comment 4 Chuck Ebbert 2007-11-27 00:28:17 UTC
Is there any way you can capture the missing line or so at the top of the oops?
A serial console might be necessary...


Comment 5 Chuck Ebbert 2007-11-27 00:35:05 UTC
   0:   89 54 24 1c             mov    %edx,0x1c(%esp)
   4:   8b 7c 24 30             mov    0x30(%esp),%edi
   8:   43                      inc    %ebx
   9:   8b 44 24 34             mov    0x34(%esp),%eax
   d:   8b 97 f0 00 00 00       mov    0xf0(%edi),%edx
  13:   8b 7c 24 28             mov    0x28(%esp),%edi
  17:   01 d0                   add    %edx,%eax
  19:   03 47 18                add    0x18(%edi),%eax
  1c:   89 46 08                mov    %eax,0x8(%esi)
  1f:   2b 54 24 1c             sub    0x1c(%esp),%edx
  23:   39 ea                   cmp    %ebp,%edx
  25:   72 dd                   jb     0x4
  27:   89 c8                   mov    %ecx,%eax
  29:   50                      push   %eax
  2a:   9d                      popf
=>2b:   8d 04 05 00 00 00 00    lea    0x0(,%eax,1),%eax

Code was replaced by the paravirt code-patching functions.


Comment 6 Chuck Ebbert 2007-11-27 01:03:15 UTC
Warren's oopsing instruction was:

8d b4 26 00 00 00 00    lea    0x0(%esi),%esi

Which is the generic 7-byte NOP instead of the K7 optimized one.
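
For readers comparing the two oopsing instructions, here is a minimal Python sketch that labels a 7-byte sequence copied from an oops. The generic bytes are the ones quoted in comment 6 and the other pattern is the instruction marked with => in comment 5. The macro name K7_NOP7 and the helper names `KNOWN_NOP7` and `classify_nop7` are assumptions made for illustration, not something stated in this report.

```python
# Hypothetical helper: label a 7-byte sequence taken from an oops "Code:" line.
KNOWN_NOP7 = {
    # Bytes quoted in comment 6 (lea 0x0(%esi),%esi).
    bytes.fromhex("8db42600000000"): "GENERIC_NOP7",
    # Bytes at the => mark in comment 5 (lea 0x0(,%eax,1),%eax);
    # the name K7_NOP7 is an assumption.
    bytes.fromhex("8d040500000000"): "K7_NOP7",
}


def classify_nop7(code: bytes) -> str:
    """Return the label of a known 7-byte NOP, or 'unknown'."""
    return KNOWN_NOP7.get(bytes(code), "unknown")


# bytes.fromhex ignores the spaces between byte pairs.
print(classify_nop7(bytes.fromhex("8d b4 26 00 00 00 00")))
print(classify_nop7(bytes.fromhex("8d 04 05 00 00 00 00")))
```

Both patterns are padding that does no work, so a fault reported on either one points away from the instruction itself.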


Comment 7 Warren Togami 2007-11-27 04:31:49 UTC
http://people.redhat.com/wtogami/temp/hpet_vga1.jpg
I managed to make it panic with vga=1.  I hope that this is valuable.

Comment 8 Warren Togami 2007-11-27 05:36:11 UTC
http://www.disklessworkstations.com/cgi-bin/web/200029.html
VIA CLE266 chipset "T170" thin client is affected by this as well.

Comment 9 John Williams 2007-11-27 13:42:11 UTC
Created attachment 269721 [details]
Panic message using vga=1

Same panic message with extra lines at the top

Comment 10 Chuck Ebbert 2007-11-27 22:28:19 UTC
Created attachment 270441 [details]
Warren's oops message using vga=1

Comment 11 Warren Togami 2007-11-28 18:29:54 UTC
Created attachment 271591 [details]
hpet-panic-seven-0x90.jpg

--- ./include/asm-i386/processor.h.orig 2007-11-28 12:10:17.000000000 -0500
+++ ./include/asm-i386/processor.h	2007-11-28 12:10:50.000000000 -0500
@@ -656,7 +656,7 @@
 #define GENERIC_NOP4	     ".byte 0x8d,0x74,0x26,0x00\n"
 #define GENERIC_NOP5	     GENERIC_NOP1 GENERIC_NOP4
 #define GENERIC_NOP6	".byte 0x8d,0xb6,0x00,0x00,0x00,0x00\n"
-#define GENERIC_NOP7	".byte 0x8d,0xb4,0x26,0x00,0x00,0x00,0x00\n"
+#define GENERIC_NOP7	".byte 0x90,0x90,0x90,0x90,0x90,0x90,0x90\n"
 #define GENERIC_NOP8	GENERIC_NOP1 GENERIC_NOP7
 
 /* Opteron nops */

Tried replacing the seven-byte NOP with seven single-byte NOPs.  It still
panics, this time on the second 0x90 NOP.  "divide error" on a NOP?

Comment 12 Chuck Ebbert 2007-11-28 20:23:46 UTC
*** Bug 383551 has been marked as a duplicate of this bug. ***

Comment 13 Warren Togami 2007-11-29 22:56:14 UTC
Created attachment 273401 [details]
hpet_paravirt-noreplace.jpg

Tried "paravirt-noreplace" as a boot parameter so it won't replace the paravirt
calls with NOPs.  decodecode on the code shows the divide error happening on a
pop... which also isn't supposed to be possible.

Comment 14 Warren Togami 2007-11-29 22:57:28 UTC
Created attachment 273411 [details]
hpet_no_paravirt.jpg

Rebuilt this kernel with CONFIG_PARAVIRT disabled.  It *still* crashes.  I
think we've established that this self-modifying code and paravirt stuff has
nothing to do with this problem.

But where does that leave us? =)

Comment 15 Warren Togami 2007-12-05 05:41:34 UTC
Created attachment 277621 [details]
T170 thin client

lspci -vv
lspci -vvn
dmesg
/proc/cpuinfo

This should be enough to blacklist hpet for this buggy motherboard?

Comment 16 Chuck Ebbert 2007-12-05 23:15:28 UTC
The system gets a divide error immediately after enabling interrupts. No divide
instruction is anywhere around the oopsing one. It's almost like the hardware
interrupt 0 is causing CPU interrupt 0 instead of 32.

Vendor says we should blacklist the HPET on this chipset but there doesn't seem
to be any infrastructure for that upstream.
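
The vector numbers mentioned above can be illustrated with a small Python sketch. It assumes the usual 32-bit x86 layout, in which vectors 0-31 are reserved for CPU exceptions and the legacy IRQs start at vector 0x20. The function and table names are hypothetical and this is an illustration, not kernel code.

```python
# Sketch of the legacy vector layout on 32-bit x86 Linux (illustration only).
# Vectors 0-31 are reserved for CPU exceptions, so the 16 legacy IRQs are
# placed starting at vector 0x20 (decimal 32).
FIRST_EXTERNAL_VECTOR = 0x20
LEGACY_IRQS = 16

# A few architecturally defined exception vectors.
CPU_EXCEPTIONS = {
    0: "divide error (#DE)",
    6: "invalid opcode (#UD)",
    13: "general protection fault (#GP)",
    14: "page fault (#PF)",
}


def irq_to_vector(irq: int) -> int:
    """Map a legacy IRQ number (0-15) to its CPU vector."""
    if not 0 <= irq < LEGACY_IRQS:
        raise ValueError(f"not a legacy IRQ: {irq}")
    return FIRST_EXTERNAL_VECTOR + irq


def describe_vector(vector: int) -> str:
    """Describe what a CPU vector number normally means."""
    if vector in CPU_EXCEPTIONS:
        return CPU_EXCEPTIONS[vector]
    if vector < FIRST_EXTERNAL_VECTOR:
        return "other CPU exception or reserved vector"
    if vector < FIRST_EXTERNAL_VECTOR + LEGACY_IRQS:
        return f"legacy IRQ {vector - FIRST_EXTERNAL_VECTOR}"
    return "other vector"


# The timer interrupt (IRQ 0) should arrive on vector 32 ...
print(irq_to_vector(0), "->", describe_vector(irq_to_vector(0)))
# ... whereas vector 0 is what the CPU reports as a divide error.
print(0, "->", describe_vector(0))
```

If an interrupt were delivered on vector 0 instead of 32, the kernel would report a divide error at whatever instruction happened to be executing, which matches the symptom of a "divide error" on a NOP or a pop.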

Comment 17 Warren Togami 2007-12-05 23:34:07 UTC
Created attachment 278901 [details]
dmesg from Ubuntu's 2.6.22-based kernel

hpet kernel option settings for Ubuntu's kernel:

CONFIG_HPET=y
CONFIG_HPET_MMAP=y
# CONFIG_HPET_RTC_IRQ is not set
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y

No mention of hpet anywhere in the boot, however it does initialize
clocksources tsc and acpi_pm successfully.  Perhaps it never crashed prior to
2.6.23 because it somehow didn't attempt to initialize hpet at all?
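
To compare these settings against another kernel's config file, a minimal Python sketch along the following lines could be used. The helper name `parse_kconfig` is hypothetical; the sample text is the option list quoted above.

```python
import re

# The HPET options quoted above for Ubuntu's kernel.
UBUNTU_HPET_CONFIG = """\
CONFIG_HPET=y
CONFIG_HPET_MMAP=y
# CONFIG_HPET_RTC_IRQ is not set
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
"""


def parse_kconfig(text: str) -> dict:
    """Parse kernel .config text into {option: value}; unset options map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        unset = re.fullmatch(r"# (CONFIG_\w+) is not set", line)
        if unset:
            options[unset.group(1)] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


for name, value in sorted(parse_kconfig(UBUNTU_HPET_CONFIG).items()):
    print(name, value)
```

Running the same function over a Fedora config and diffing the two dictionaries would show whether the HPET build options differ at all, or whether the difference lies only in runtime behaviour.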

Comment 18 Warren Togami 2007-12-06 21:55:28 UTC
Created attachment 280221 [details]
Full config for the above Ubuntu kernel

Comment 19 Thomas Gleixner 2007-12-10 06:32:12 UTC
(In reply to comment #17)
> No mention of hpet anywhere in the boot, however it does initialize
> clocksources tsc and acpi_pm successfully.  Perhaps it never crashed prior to
> 2.6.23 because it somehow didn't attempt to initialize hpet at all?

Right.

Does the problem still exist with 2.6.24-rc4-latest-git ?




Comment 20 Thomas Gleixner 2007-12-10 06:35:26 UTC
Also can you reliably boot with "hpet=disable" on the kernel command line ?

Comment 21 John Williams 2007-12-10 10:23:34 UTC
In the case of the M811 Motherboard, acpi=off hpet=disable seems to be working
fine. There have been no panics yet. There is a line in dmesg:-
Force enabled HPET at base address 0xfed00000

Comment 22 Thomas Gleixner 2007-12-10 11:03:15 UTC
(In reply to comment #21)
> In the case of the M811 Motherboard, acpi=off hpet=disable seems to be working
> fine. There have been no panics yet. There is a line in dmesg:-
> Force enabled HPET at base address 0xfed00000

Ah, that gives a clue. The force enable of HPET on this chipset is causing the
trouble. 

What happens if you omit "acpi=off" and only put "hpet=disable" on the command
line ?
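
For anyone checking other machines for the same symptom, here is a minimal Python sketch that searches saved dmesg output for the message quoted in comment 21. The function name `find_forced_hpet` is hypothetical, and the regular expression assumes the message wording is exactly as quoted.

```python
import re

# Matches the message quoted in comment 21 and captures the base address.
FORCE_HPET_RE = re.compile(r"Force enabled HPET at base address (0x[0-9a-fA-F]+)")


def find_forced_hpet(dmesg_text: str):
    """Return the forced HPET base address as an int, or None if not found."""
    match = FORCE_HPET_RE.search(dmesg_text)
    return int(match.group(1), 16) if match else None


sample = "Force enabled HPET at base address 0xfed00000\n"
address = find_forced_hpet(sample)
print(hex(address) if address is not None else "HPET was not force enabled")
```

The text can come from a file saved with `dmesg > dmesg.txt`; a non-None result means the kernel enabled the HPET itself rather than finding it already enabled.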


Comment 23 Thomas Gleixner 2007-12-10 11:38:24 UTC
Chuck, Warren,

We have a patch in mainline that prevents the HPET from being force-enabled
automatically on undocumented chipsets.

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b17530bda22e7ffbf08f7a8a50743256b1672f6a

The chipset support for the 8237/39 has a check for the hpet=force command line
option.

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b196884e2f5d45fb505b46011e41ca95e0859e34

This is probably missing in the f8 2.6.23-based kernels. I probably never ported
that back to the -hrt patches.

We added this because we did not trust the undocumented features. Sorry that I
forgot to backport it. I can do this tomorrow when I'm back from India. I have no
access to my main devel box right now.

Thanks,

     tglx


Comment 24 John Williams 2007-12-10 12:07:26 UTC
It works equally well with just "hpet=disable" (I normally have acpi=off because
ACPI is disabled in BIOS)
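
To confirm which workaround parameters a running system was booted with, a short Python sketch can parse the kernel command line. The helper names `parse_cmdline` and `hpet_disabled` are hypothetical, and the sample string is an assumed example, not taken from the reporter's machine.

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def hpet_disabled(cmdline: str) -> bool:
    """True if the command line carries hpet=disable."""
    return parse_cmdline(cmdline).get("hpet") == "disable"


# Assumed example; on a live Linux system, read /proc/cmdline instead.
sample = "ro root=/dev/VolGroup00/LogVol00 acpi=off hpet=disable"
print(hpet_disabled(sample))
```

On a live system the string can be read with `open("/proc/cmdline").read()`.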

Comment 25 Thomas Gleixner 2007-12-10 12:56:54 UTC
Chuck,

I uploaded an untested 2.6.23-hrt4 to 
http://www.kernel.org/pub/linux/kernel/people/tglx/hrtimers/2.6.23/
Has not propagated yet to the public servers, but should show up soon.

It has the backport of the above checks and the hrtimer-prevent-overflow one.

Thanks,

    tglx


Comment 26 Chuck Ebbert 2007-12-10 19:05:25 UTC
-hrt4 is in 2.6.23.9-87

Comment 27 Fedora Update System 2008-01-15 22:55:54 UTC
kernel-2.6.23.13-105.fc8 has been pushed to the Fedora 8 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update kernel'

Comment 28 Fedora Update System 2008-01-24 22:01:20 UTC
kernel-2.6.23.14-115.fc8 has been pushed to the Fedora 8 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update kernel'

Comment 29 Fedora Update System 2008-02-07 20:56:19 UTC
kernel-2.6.23.14-115.fc8 has been pushed to the Fedora 8 stable repository.  If problems still persist, please make note of it in this bug report.

