Bug 137129 - kernel panic on Athlon64 Nvidia CK804 board
Status: CLOSED INSUFFICIENT_DATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 4
Hardware: x86_64 Linux
Priority: medium
Severity: high
Assigned To: Dave Jones
Reported: 2004-10-25 18:06 EDT by Tom Duffy
Modified: 2015-01-04 17:11 EST
Doc Type: Bug Fix
Last Closed: 2006-05-04 09:06:45 EDT

Attachments
Console output (1.37 MB, image/jpeg)
2005-09-23 22:25 EDT, Thomas Schwanhäuser

Description Tom Duffy 2004-10-25 18:06:45 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; rv:1.7.3)
Gecko/20041020 Firefox/0.10.1

Description of problem:
Unable to handle kernel paging request at fffffeff8014384d RIP:
[<fffffeff8014384d>]
PML4 0
Oops: 0010 [1]
CPU 0
Modules linked in: md5 ipv6 parport_pc lp parport autofs4 ds
yenta_socket pcmcia_core sunrpc ext3 jbd dm_mod button battery ac
ohci_hcd ehci_hcd forcedeth snd_intel8x0 snd_ac97_codec snd_pcm_oss
snd_mixer_oss snd_pcm snd_timer snd_page_alloc gameport
snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore e1000 floppy
xfs sata_nv libata qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
Pid: 0, comm: swapper Not tainted 2.6.9-1.640
RIP: 0010:[<fffffeff8014384d>] [<fffffeff8014384d>]
RSP: 0018:ffffffff80499170  EFLAGS: 00010246
RAX: 0000000000000001 RBX: ffffffff80429120 RCX: 000001003d1e7ec8
RDX: ffffffff80499188 RSI: ffffffff80429bf0 RDI: 000001003db26e60
RBP: 000001003db26e60 R08: ffffffff80499188 R09: ffffffff804fc180
R10: ffffffff804fc180 R11: ffffffff804fc180 R12: fffffeff8014384d
R13: ffffffff80499188 R14: 0000000000000000 R15: 0000000000000000
FS:  0000002a959b2ea0(0000) GS:ffffffff80504100(0000)
knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: fffffeff8014384d CR3: 0000000000101000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffffffff80506000, task
ffffffff8041a380)
Stack: ffffffff80143108 ffffffff80504100 0000000000000246 ffffffff80499188
       ffffffff80499188 0000000000000000 ffffffffffffffef ffffffff8013ef79
       0000000000000001 ffffffff804c49b0
Call Trace:<IRQ> <ffffffff80143108>{run_timer_softirq+663}
<ffffffff8013ef79>{__do_softirq+65}
       <ffffffff8013ef84>{__do_softirq+76}
<ffffffff8013f00b>{do_softirq+49}
       <ffffffff80113907>{do_IRQ+756} <ffffffff80110dcf>{ret_from_intr+0}
        <EOI> <ffffffff8024cf8a>{acpi_processor_idle+334}
       <ffffffff8010e6d7>{cpu_idle+26}
<ffffffff805096fc>{start_kernel+641}
       <ffffffff805091ab>{_sinittext+427}

Code:  Bad RIP value.
RIP [<fffffeff8014384d>] RSP <ffffffff80499170>
CR2: fffffeff8014384d
 <3>Debug: sleeping function called from invalid context at
include/linux/rwsem.h:43
in_atomic():1[expected: 0], irqs_disabled():0

Call Trace:<IRQ> <ffffffff8013347f>{__might_sleep+173}
<ffffffff80139f81>{profile_task_exit+33}
       <ffffffff8013baae>{do_exit+34} <ffffffff80111e06>{oops_end+159}
       <ffffffff80124888>{do_page_fault+1263}
<ffffffff80132da1>{recalc_task_prio+337}
       <ffffffff80132da1>{recalc_task_prio+337}
<ffffffff802a2502>{__ide_end_request+274}
       <ffffffff80132da1>{recalc_task_prio+337}
<ffffffff801110fd>{error_exit+0}
       <ffffffff80143108>{run_timer_softirq+663}
<ffffffff8013ef79>{__do_softirq+65}
       <ffffffff8013ef84>{__do_softirq+76}
<ffffffff8013f00b>{do_softirq+49}
       <ffffffff80113907>{do_IRQ+756} <ffffffff80110dcf>{ret_from_intr+0}
        <EOI> <ffffffff8024cf8a>{acpi_processor_idle+334}
       <ffffffff8010e6d7>{cpu_idle+26}
<ffffffff805096fc>{start_kernel+641}
       <ffffffff805091ab>{_sinittext+427}
Kernel panic - not syncing: Aiee, killing interrupt handler!


Version-Release number of selected component (if applicable):
kernel-2.6.9-1.640

How reproducible:
Sometimes

Steps to Reproduce:
1. This happens after the machine has been up for a while on my 1P x86_64
(Athlon64) NVIDIA CK804 system


Additional info:
Comment 1 Dave Jones 2004-10-26 23:11:52 EDT
Can you run memtest on this box? On first impression, fffffeff8014384d
looks like a random bit flip (the 'e' should be an 'f'), which means
either faulty RAM or something corrupting memory.
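
The reasoning above can be checked with a little arithmetic. The sketch below XORs the faulting address against the address it presumably should have been. The "expected" value is an assumption taken from the remark that the 'e' should be an 'f', and the helper name is made up for illustration.

```python
def flipped_bits(observed: int, expected: int) -> list[int]:
    """Return the bit positions (0 = least significant) where the two values differ."""
    diff = observed ^ expected
    return [bit for bit in range(64) if (diff >> bit) & 1]

observed = 0xfffffeff8014384d  # CR2 / RIP value from the oops
expected = 0xffffffff8014384d  # assumption: same address with the 'e' restored to 'f'

print(flipped_bits(observed, expected))  # [40]
```

A single differing bit is what a one-bit memory error or a stray single-bit write would produce, which is why a memory test is a sensible first step.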
Comment 2 Tom Duffy 2004-10-28 16:05:31 EDT
I am currently running memtest86+ 1.27.

Right before I started it, my machine panicked again, this time with
2.6.9-1.643:

Unable to handle kernel paging request at fffffeff8014384d RIP:
[<fffffeff8014384d>]
PML4 0
Oops: 0010 [1]
CPU 0
Modules linked in: md5 ipv6 parport_pc lp parport autofs4 i2c_dev
i2c_core ds yenta_socket pcmcia_core sunrpc ext3 jbd dm_mod button
battery ac ohci_hcd ehci_hcd forcedeth snd_intel8x0 snd_ac97_codec
snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc gameport
snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore e1000 floppy
xfs sata_nv libata qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
Pid: 0, comm: swapper Not tainted 2.6.9-1.643
RIP: 0010:[<fffffeff8014384d>] [<fffffeff8014384d>]
RSP: 0018:ffffffff80499170  EFLAGS: 00010246
RAX: 0000000000000001 RBX: ffffffff80429120 RCX: 000001003d427e48
RDX: ffffffff80499188 RSI: ffffffff80429200 RDI: 000001003d6e03b0
RBP: 000001003d6e03b0 R08: ffffffff80499188 R09: ffffffff804fc180
R10: ffffffff804fc180 R11: ffffffff804fc180 R12: fffffeff8014384d
R13: ffffffff80499188 R14: 0000000000000000 R15: 0000000000000000
FS:  0000002a959a0e60(0000) GS:ffffffff80504100(0000)
knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: fffffeff8014384d CR3: 0000000000101000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffffffff80506000, task
ffffffff8041a380)
Stack: ffffffff80143108 ffffffff80504100 0000000000000246 ffffffff80499188
       ffffffff80499188 0000000000000000 ffffffffffffffef ffffffff8013ef79
       0000000000000001 ffffffff804c49b0
Call Trace:<IRQ> <ffffffff80143108>{run_timer_softirq+663}
<ffffffff8013ef79>{__do_softirq+65}
       <ffffffff8013ef84>{__do_softirq+76}
<ffffffff8013f00b>{do_softirq+49}
       <ffffffff80113907>{do_IRQ+756} <ffffffff80110dcf>{ret_from_intr+0}
        <EOI> <ffffffff8024cf8a>{acpi_processor_idle+334}
       <ffffffff8010e6d7>{cpu_idle+26}
<ffffffff805096fc>{start_kernel+641}
       <ffffffff805091ab>{_sinittext+427}

Code:  Bad RIP value.
RIP [<fffffeff8014384d>] RSP <ffffffff80499170>
CR2: fffffeff8014384d
 <3>Debug: sleeping function called from invalid context at
include/linux/rwsem.h:43
in_atomic():1[expected: 0], irqs_disabled():0

Call Trace:<IRQ> <ffffffff8013347f>{__might_sleep+173}
<ffffffff80139f81>{profile_task_exit+33}
       <ffffffff8013baae>{do_exit+34} <ffffffff80111e06>{oops_end+159}
       <ffffffff80124888>{do_page_fault+1263}
<ffffffff80132da1>{recalc_task_prio+337}
       <ffffffff802a250e>{__ide_end_request+274}
<ffffffff801110fd>{error_exit+0}
       <ffffffff80143108>{run_timer_softirq+663}
<ffffffff8013ef79>{__do_softirq+65}
       <ffffffff8013ef84>{__do_softirq+76}
<ffffffff8013f00b>{do_softirq+49}
       <ffffffff80113907>{do_IRQ+756} <ffffffff80110dcf>{ret_from_intr+0}
        <EOI> <ffffffff8024cf8a>{acpi_processor_idle+334}
       <ffffffff8010e6d7>{cpu_idle+26}
<ffffffff805096fc>{start_kernel+641}
       <ffffffff805091ab>{_sinittext+427}
Kernel panic - not syncing: Aiee, killing interrupt handler!
Comment 3 Tom Duffy 2004-10-29 13:03:21 EDT
How long would you like me to run it?

      Memtest86+ v1.27      | Pass 93% ####################################
Athlon 64 (0.09) 2210 Mhz   | Test 39% ###############
L1 Cache:  128K 18114MB/s   | Test #11 [Moving inv, 32 bit pattern, no cache]
L2 Cache:  512K  4500MB/s   | Testing:   96K - 1024M 1024M
Memory  : 1024M  2214MB/s   | Pattern:   ffffefff
Chipset :


 WallTime   Cached  RsvdMem   MemMap   Cache  ECC  Test  Pass  Errors  ECC Errs
 ---------  ------  -------  --------  -----  ---  ----  ----  ------  --------
  21:03:54   1024M     257M  e820-Std   off   off   All     4       0         0
 -----------------------------------------------------------------------------
Comment 4 Tom Duffy 2004-12-01 18:56:54 EST
It happened again today with 2.6.9-1.681_FC3:

Unable to handle kernel paging request at fffffeff80143619 RIP:
[<fffffeff80143619>]
PML4 0
Oops: 0010 [1]
CPU 0
Modules linked in: md5 ipv6 parport_pc lp parport autofs4 i2c_dev
i2c_core ds yenta_socket pcmcia_core sunrpc ext3 jbd dm_mod button
battery ac ohci_hcd ehci_hcd snd_intel8x0 snd_ac97_codec snd_pcm_oss
snd_mixer_oss snd_pcm snd_timer snd_page_alloc gameport
snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore e1000
forcedeth floppy xfs sata_nv libata qla2300 qla2xxx scsi_transport_fc
sd_mod scsi_mod
Pid: 0, comm: swapper Not tainted 2.6.9-1.681_FC3
RIP: 0010:[<fffffeff80143619>] [<fffffeff80143619>]
RSP: 0018:ffffffff804a03f0  EFLAGS: 00010246
RAX: 0000000000000001 RBX: ffffffff8042ee00 RCX: 000001003f60fe88
RDX: ffffffff804a0408 RSI: ffffffff8042fd30 RDI: 000001003f76ab20
RBP: 000001003f76ab20 R08: ffffffff804a0408 R09: 0000000000000000
R10: 0000000014b3fd00 R11: ffffffff80509a00 R12: fffffeff80143619
R13: ffffffff804a0408 R14: 0000000000000000 R15: 0000000000000000
FS:  0000002a95566b00(0000) GS:ffffffff80511980(0000)
knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: fffffeff80143619 CR3: 0000000000101000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffffffff80514000, task
ffffffff80420000)
Stack: ffffffff80142ed4 0000000000000349 0000000000000246 ffffffff804a0408
       ffffffff804a0408 ffffffff80514000 000001003dcd5b80 0000000004000001
       0000000000000001 ffffffff804cbc30
Call Trace:<IRQ> <ffffffff80142ed4>{run_timer_softirq+663}
<ffffffff8013ed50>{__do_softirq+76}
       <ffffffff8013edd7>{do_softirq+49} <ffffffff801138a3>{do_IRQ+756}
       <ffffffff80110d6b>{ret_from_intr+0}  <EOI>
<ffffffff8010e647>{default_idle+0}
       <ffffffff8010e667>{default_idle+32} <ffffffff8010e6d7>{cpu_idle+26}
       <ffffffff805176fc>{start_kernel+641}
<ffffffff805171ab>{_sinittext+427}


Code:  Bad RIP value.
RIP [<fffffeff80143619>] RSP <ffffffff804a03f0>
CR2: fffffeff80143619
                 <3>Debug: sleeping function called from invalid
context at include/linux/rwsem.h:43
base  in_atomic():1[expected: 0], irqs_disabled():0
    : ##########
Call Trace:########        <IRQ>                
<ffffffff80133173>{__might_sleebase      :
#<ffffffff80139d4d>{profile_task_exit+33} ################
       #               <ffffffff8013b87a>{do_exit+34}                
  1054/285base<ffffffff80111da2>{oops_end+159}      : ########
       ##########      <ffffffff801245e6>{do_page_fault+1155}        
        <fbase      :7ae4>{timer_interrupt+874}           1055/ 2852
        ###############<ffffffff80113176>{handle_IRQ_event+40}###    
           <ffffffffa01d1ae9>{:snd_intel8x0:snd_intel8x0_interrupt+77}
   10ba 2852
se      : ######       ############   
<ffffffff80111099>{error_exit+0} base       :
#############fffffff80142ed4>{run_timer_softirq+663}7/2852
#####                                 
<ffffffff8013ed50>{__do_softirq+76}
<ffffffff8013edd7>{do_softirq+49}base      : #### ##############
base   <ffffffff80110d6b>{ret_from_intr+0}   : ########### #######   
     852
                        <EOI>        
1060/285<ffffffff8010e647>{default_idle+0}base      :
##<ffffffff8010e667>{default_idle+32}################ 
base <ffffffff805176fc>{start_kernel+641} cpu_idle+26}  1061/2852
            : #########<ffffffff805171ab>{_sinittext+427}######### 
          1062/2Kernel panic - not syncing: Aiee, killing interrupt
handler!
base      : <4>

Then it spins out of control, printing these messages forever
until I reboot:

atkbd.c: Spurious ACK on isa0060/serio0. Some program, like XFree86,
might be trying access hardware directly.
atkbd.c: Spurious ACK on isa0060/serio0. Some program, like XFree86,
might be trying access hardware directly.
atkbd.c: Spurious ACK on isa0060/serio0. Some program, like XFree86,
might be trying access hardware directly.
atkbd.c: Spurious ACK on isa0060/serio0. Some program, like XFree86,
might be trying access hardware directly.
atkbd.c: Spurious ACK on isa0060/serio0. Some program, like XFree86,
might be trying access hardware directly.
Comment 5 Tom Duffy 2004-12-21 16:01:35 EST
Again on 2.6.9-1.715_FC3:

Unable to handle kernel paging request at fffffeff8014278f RIP:
[<fffffeff8014278f>]
PML4 0
Oops: 0010 [1]
CPU 0
Modules linked in: md5 ipv6 parport_pc lp parport autofs4 i2c_dev
i2c_core ds yenta_socket pcmcia_core sunrpc ext3 jbd sr_mod dm_mod
usb_storage button battery ac joydev ohci_hcd snd_intel8x0
snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer
snd_page_alloc gameport snd_mpu401_uart snd_rawmidi snd_seq_device snd
soundcore e1000 forcedeth floppy xfs sata_nv libata qla2300 qla2xxx
scsi_transport_fc sd_mod scsi_mod
Pid: 0, comm: swapper Not tainted 2.6.9-1.715_FC3
RIP: 0010:[<fffffeff8014278f>] [<fffffeff8014278f>]
RSP: 0018:ffffffff8048be80  EFLAGS: 00010246
RAX: 0000000000000001 RBX: ffffffff8041a4e0 RCX: 000001003f585e88
RDX: ffffffff8048be98 RSI: fffffeff8014278f RDI: 000001003f602b60
RBP: ffffffff8048be98 R08: ffffffff8048be98 R09: 000001003ec8f240
R10: 000001003ec8f240 R11: ffffffff80501f18 R12: 000000000000007a
R13: ffffffff80501f18 R14: 0000000000000000 R15: 0000000000000000
FS:  0000002a95566b00(0000) GS:ffffffff804fd700(0000)
knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: fffffeff8014278f CR3: 0000000000101000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffffffff80500000, task
ffffffff8040b680)
Stack: ffffffff8014205d ffffffff80501f18 0000000000000246 ffffffff8048be98
       ffffffff8048be98 0000000000000000 ffffffff802b621c 0000000000000001
       ffffffff804b7930 000000000000000a
Call Trace:<IRQ> <ffffffff8014205d>{run_timer_softirq+591}
<ffffffff802b621c>{usb_hcd_irq+41}
       <ffffffff8013e1dc>{__do_softirq+76}
<ffffffff8013e263>{do_softirq+49}
       <ffffffff8011379f>{do_IRQ+664} <ffffffff80110d0f>{ret_from_intr+0}
        <EOI> <ffffffff8010e647>{default_idle+0}
<ffffffff8010e667>{default_idle+32}
       <ffffffff8010e6d7>{cpu_idle+26}
<ffffffff805036f3>{start_kernel+632}
       <ffffffff805031ab>{_sinittext+427}

Code:  Bad RIP value.
RIP [<fffffeff8014278f>] RSP <ffffffff8048be80>
CR2: fffffeff8014278f
 <3>Debug: sleeping function called from invalid context at
include/linux/rwsem.h:43
in_atomic():1[expected: 0], irqs_disabled():0

Call Trace:<IRQ> <ffffffff8013311b>{__might_sleep+173}
<ffffffff8013968d>{profile_task_exit+33}
       <ffffffff8013b0c6>{do_exit+34} <ffffffff80111d46>{oops_end+159}
       <ffffffff8012423a>{do_page_fault+1155}
<ffffffff8013378b>{__wake_up_common+67}
       <ffffffff8011103d>{error_exit+0}
<ffffffff8014205d>{run_timer_softirq+591}
       <ffffffff802b621c>{usb_hcd_irq+41}
<ffffffff8013e1dc>{__do_softirq+76}
       <ffffffff8013e263>{do_softirq+49} <ffffffff8011379f>{do_IRQ+664}
       <ffffffff80110d0f>{ret_from_intr+0}  <EOI>
<ffffffff8010e647>{default_idle+0}
       <ffffffff8010e667>{default_idle+32} <ffffffff8010e6d7>{cpu_idle+26}
       <ffffffff805036f3>{start_kernel+632}
<ffffffff805031ab>{_sinittext+427}

Kernel panic - not syncing: Aiee, killing interrupt handler!
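
For anyone comparing the oopses in this report, here is a rough sketch that lists which upper address bits are cleared in each faulting address. It assumes that valid kernel text addresses on these kernels start with ffffffff, as the call trace entries do. The dictionary and function names are invented for illustration, and the addresses are copied from the CR2 lines in the description and comments 2, 4 and 5.

```python
# Assumption: kernel text addresses in these traces have all upper 32 bits set.
KERNEL_HIGH_MASK = 0xffffffff00000000

# Faulting addresses (CR2) as reported for each kernel version in this bug.
reported_faults = {
    "2.6.9-1.640": 0xfffffeff8014384d,
    "2.6.9-1.643": 0xfffffeff8014384d,
    "2.6.9-1.681_FC3": 0xfffffeff80143619,
    "2.6.9-1.715_FC3": 0xfffffeff8014278f,
}

def cleared_high_bits(addr: int) -> list[int]:
    """Return upper-half bit positions that are 0 in addr but 1 in the mask."""
    missing = KERNEL_HIGH_MASK & ~addr
    return [bit for bit in range(32, 64) if (missing >> bit) & 1]

for kernel, addr in reported_faults.items():
    print(kernel, cleared_high_bits(addr))
```

Under that assumption, the same bit position comes out for every report. That is an observation from the posted numbers only and does not by itself distinguish bad hardware from software corruption.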
Comment 6 Mathias Retzlaff 2005-01-11 13:24:38 EST
Me too. (What is the status of this bug? Does the newest kernel, 737,
fix it?)

Yesterday we had a crash that appears to be the same as the one
described here. Unfortunately I couldn't see the complete error
message: as above, the server had a kernel panic and did not respond
to any keyboard input (such as scrolling up), and no message appeared
in the log file on the server (/var/log/messages) after the reboot.
So I only saw the last screen of the message, and it looked much like
the end of the messages above.

After this crash the hard disk had many corruptions and much data was
lost.

This is a production server and we need a solution for this
urgently. Workarounds welcome!

Kernel running when the problem occurred: 2.6.9-1.681_FC3smp
Comment 7 Mathias Retzlaff 2005-01-11 13:30:14 EST
Forgotten config:

Hardware used:
CPU: Dual Opteron 250
RAM: 8GB DDR 400
MoBo: TYAN Thunder K8S Pro (S2882)
SCSI: Adaptec 39160 
Comment 8 Mathias Retzlaff 2005-02-05 11:59:39 EST
Happened again, with kernel 2.6.10-1.741_FC3smp.

What else can I do to help find the problem?
Comment 9 Lonni J Friedman 2005-08-31 21:43:01 EDT
Mathias,
What do you typically have running on this system?  If you upgrade to the latest
FC3 kernel do the crashes continue?

-Lonni
Comment 10 Dan Carpenter 2005-09-03 01:24:18 EDT
Mathias Retzlaff,
The last part of Tom's bug really isn't important.  In the first section the
kernel is unable to handle a paging request in an atomic section of code and so
it crashes later on when it calls a sleeping function in the last section.  It's
only the first part that's important.
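
As a rough illustration of that point, the sketch below pulls the frames out of the first call trace only and ignores everything after the "Code:" line, where the follow-on "sleeping function" trace begins. The regular expression is based on the `<address>{symbol+offset}` format seen in the traces posted here; the function name and the shortened sample log are hypothetical.

```python
import re

# Matches entries such as <ffffffff80143108>{run_timer_softirq+663}
FRAME = re.compile(r"<([0-9a-f]{16})>\{([^+}]+)\+(\d+)\}")

def first_oops_frames(log: str) -> list[tuple[int, str, int]]:
    """Return (address, symbol, offset) for frames in the first call trace only."""
    start = log.find("Call Trace:")
    if start == -1:
        return []
    # In these reports the first trace is followed by a "Code:" line;
    # anything after it belongs to the secondary trace.
    end = log.find("Code:", start)
    section = log[start:] if end == -1 else log[start:end]
    return [(int(addr, 16), sym, int(off)) for addr, sym, off in FRAME.findall(section)]

# Shortened sample taken from the first oops in this bug.
sample = """\
Call Trace:<IRQ> <ffffffff80143108>{run_timer_softirq+663}
<ffffffff8013ef79>{__do_softirq+65}
Code:  Bad RIP value.
Call Trace:<IRQ> <ffffffff8013347f>{__might_sleep+173}
"""

frames = first_oops_frames(sample)
print([sym for _, sym, _ in frames])  # ['run_timer_softirq', '__do_softirq']
```

The `__might_sleep` frame from the second trace is deliberately left out, since it is a consequence of the first fault rather than its cause.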

One thing you could try is to upgrade to the 3.03e BIOS and limit the memory to
166MHz in the BIOS.

Tom,
Is there any way you could post the entire dmesg?

Comment 11 Thomas Schwanhäuser 2005-09-23 22:23:08 EDT
>sleeping function called from invalid context at include/linux/rwsem.h:43

Hi, is there any solution to this problem? We are experiencing it on a regular
basis on a dual Opteron system with 4GB RAM, running "2.6.12-1.1447_FC4smp #1 SMP
Fri Aug 26 21:03:12 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux".

Attached is a picture of the last kernel panic. It was not saved to the log file.
Comment 12 Thomas Schwanhäuser 2005-09-23 22:25:05 EDT
Created attachment 119218
Console output

The messages don't get saved to disc, so here is a "screenshot".
Comment 13 Dave Jones 2006-01-16 17:08:55 EST
This is a mass-update to all currently open Fedora Core 3 kernel bugs.

Fedora Core 3 support has transitioned to the Fedora Legacy project.
Due to the limited resources of this project, typically only
updates for new security issues are released.

As this bug isn't security related, it has been migrated to a
Fedora Core 4 bug.  Please upgrade to this newer release, and
test if this bug is still present there.

This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

Thank you.
Comment 14 Dave Jones 2006-02-03 00:08:33 EST
This is a mass-update to all currently open kernel bugs.

A new kernel update has been released (Version: 2.6.15-1.1830_FC4)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

Thank you.
Comment 15 John Thacker 2006-05-04 09:06:45 EDT
Closing per previous comment.
