Bug 155448 - <0>Kernel panic - not syncing: Fatal exception in interrupt
Summary: <0>Kernel panic - not syncing: Fatal exception in interrupt
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 3
Hardware: i686
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Dave Jones
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2005-04-20 13:27 UTC by Andrzej Szymanski
Modified: 2015-01-04 22:18 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-08-04 06:55:19 UTC
Type: ---
Embargoed:



Description Andrzej Szymanski 2005-04-20 13:27:48 UTC
Description of problem:
Kernel panic during heavy network + i/o load.

Version-Release number of selected component (if applicable):
2.6.11-1.14_FC3smp i686

How reproducible:
I cannot reproduce the problem manually, but it seems to occur at the same time
each day. At the time of failure, two tape streamers are testing tapes and other
machines are starting to send their daily backups to the affected server over
Samba.

Steps to Reproduce:
1.
2.
3.
  
Actual results:
Dump from serial console:

do_IRQ: stack overflow: 480
 [<c01061b7>] do_IRQ+0x87/0x89<1>Unable to handle kernel NULL pointer
dereference at virtual address 0000006c
 printing eip:
c0119815
*pde = 2beac001
Oops: 0000 [#1]
SMP 
Modules linked in: i8xx_tco nfsd lockd md5 ipv6 parport_pc lp parport autofs4
w83627hf eeprom lm75 i2c_sensor i2c_isa sunrpc ipt_mac ipt_state ip_conntrack
iptable_filter ip_tables ext3 jbd video button battery ac uhci_hcd ehci_hcd
hw_random i2c_i801 i2c_core snd_intel8x0 snd_ac97_codec snd_pcm_oss
snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc sk98lin floppy st
xfs exportfs raid5 xor raid1 raid0 dm_mod sata_promise libata aic7xxx sd_mod
scsi_mod
CPU:    11
EIP:    0060:[<c0119815>]    Not tainted VLI
EFLAGS: 00010082   (2.6.11-1.14_FC3smp) 
EIP is at do_page_fault+0x96/0x605
eax: f7c87000   ebx: c2066a60   ecx: f7c8704c   edx: f7c87100
esi: 00000000   edi: c011977f   ebp: 00000000   esp: f7c8702c
ds: 007b   es: 007b   ss: 0068
Process íA (pid: 4096, threadinfo=f7c86000 task=f7c84000)
Stack: 000cffff 00000000 00000000 0000006c 00000000 f7c87100 f7cdee00 f7c85c1c 
       f7c87100 c0317291 00000000 0000000e 0000000b 00000000 00000000 00000000 
       00000000 00000000 00030001 2934e2e0 426506cd 2934e2e0 426506cd 2934e2e0 
Call Trace:
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0104a2b>] error_code+0x2b/0x30
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0119815>] do_page_fault+0x96/0x605
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0104a2b>] error_code+0x2b/0x30
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0119815>] do_page_fault+0x96/0x605
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0104a2b>] error_code+0x2b/0x30
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0119815>] do_page_fault+0x96/0x605
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0104a2b>] error_code+0x2b/0x30
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0119815>] do_page_fault+0x96/0x605
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0104a2b>] error_code+0x2b/0x30
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0119815>] do_page_fault+0x96/0x605
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0104a2b>] error_code+0x2b/0x30
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0119815>] do_page_fault+0x96/0x605
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0104a2b>] error_code+0x2b/0x30
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0119815>] do_page_fault+0x96/0x605
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0104a2b>] error_code+0x2b/0x30
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0119815>] do_page_fault+0x96/0x605
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0104a2b>] error_code+0x2b/0x30
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0119815>] do_page_fault+0x96/0x605
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0104a2b>] error_code+0x2b/0x30
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0119815>] do_page_fault+0x96/0x605
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0104a2b>] error_code+0x2b/0x30
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0119815>] do_page_fault+0x96/0x605
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0104a2b>] error_code+0x2b/0x30
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0119815>] do_page_fault+0x96/0x605
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0104a2b>] error_code+0x2b/0x30
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0119815>] do_page_fault+0x96/0x605
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0104a2b>] error_code+0x2b/0x30
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0119815>] do_page_fault+0x96/0x605
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0104a2b>] error_code+0x2b/0x30
 [<c012960c>] internal_add_timer+0x55/0xa9
 [<c0129758>] __mod_timer+0xf8/0x159
 [<c021dc98>] poke_blanked_console+0x6f/0xbd
 [<c0219b43>] set_cursor+0x5a/0x6e
 [<c021cdd2>] vt_console_print+0x22d/0x304
 [<c01cd835>] __delay+0x9/0xa
Code: 85 e7 01 00 00 b8 00 f0 ff ff 21 e0 81 7c 24 0c ff ff ff bf 8b 28 c7 44 24
48 01 00 03 00 0f 87 18 04 00 00 f7 40 14 ff ff ff ef <8b> 5d 6c 0f 85 cf 01 00
00 85 db 0f 84 c7 01 00 00 8d 73 30 8b 
 <1>Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
c01469e1
*pde = 2beac001
Oops: 0000 [#2]
SMP 
Modules linked in: i8xx_tco nfsd lockd md5 ipv6 parport_pc lp parport autofs4
w83627hf eeprom lm75 i2c_sensor i2c_isa sunrpc ipt_mac ipt_state ip_conntrack
iptable_filter ip_tables ext3 jbd video button battery ac uhci_hcd ehci_hcd
hw_random i2c_i801 i2c_core snd_intel8x0 snd_ac97_codec snd_pcm_oss
snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc sk98lin floppy st
xfs exportfs raid5 xor raid1 raid0 dm_mod sata_promise libata aic7xxx sd_mod
scsi_mod
CPU:    11
EIP:    0060:[<c01469e1>]    Not tainted VLI
EFLAGS: 00010086   (2.6.11-1.14_FC3smp) 
EIP is at do_drain+0x22/0x3c
eax: f7400f40   ebx: f7400e80   ecx: f7c86000   edx: 00000010
esi: 00000000   edi: f7400f40   ebp: c03172f3   esp: f7c86e90
ds: 007b   es: 007b   ss: 0068
Process íA (pid: 4096, threadinfo=f7c86000 task=f7c84000)
Stack: c01469bf 00000001 00000000 c011605f f7c86000 f7c86ff8 c0104960 f7c86000 
       00000000 c0355ecc f7c86ff8 00000000 c03172f3 00000000 c032007b f7c8007b 
       fffffffb c01050ec 00000060 00000246 c0327f02 c03172f3 00000000 00000001 
Call Trace:
 [<c01469bf>] do_drain+0x0/0x3c
 [<c011605f>] smp_call_function_interrupt+0x3a/0x57
 [<c0104960>] call_function_interrupt+0x1c/0x24
 [<c01050ec>] die+0x11a/0x18e
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0119b31>] do_page_fault+0x3b2/0x605
 [<c01d52fb>] pci_mmap_resource+0x0/0x31
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0104a2b>] error_code+0x2b/0x30
Code: 9a 31 c0 83 c4 04 5b 5e c3 57 56 53 89 c3 b8 00 f0 ff ff 8d bb c0 00 00 00
21 e0 8b 40 10 8b 34 83 89 f8 e8 30 a4 1b 00 8d 56 10 <8b> 0e 89 d8 e8 cd 06 00
00 89 f8 e8 7f a4 1b 00 c7 06 00 00 00 
 <1>Unable to handle kernel NULL pointer dereference at virtual address 00000034
 printing eip:
c0106170
*pde = 2beac001
Oops: 0002 [#3]
SMP 
Modules linked in: i8xx_tco nfsd lockd md5 ipv6 parport_pc lp parport autofs4
w83627hf eeprom lm75 i2c_sensor i2c_isa sunrpc ipt_mac ipt_state ip_conntrack
iptable_filter ip_tables ext3 jbd video button battery ac uhci_hcd ehci_hcd
hw_random i2c_i801 i2c_core snd_intel8x0 snd_ac97_codec snd_pcm_oss
snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc sk98lin floppy st
xfs exportfs raid5 xor raid1 raid0 dm_mod sata_promise libata aic7xxx sd_mod
scsi_mod
CPU:    11
EIP:    0060:[<c0106170>]    Not tainted VLI
EFLAGS: 00010086   (2.6.11-1.14_FC3smp) 
EIP is at do_IRQ+0x40/0x89
eax: f7c84000   ebx: 00001000   ecx: 00000000   edx: 00000000
esi: f7c86d10   edi: 0000000f   ebp: f7c86000   esp: f7c86cf4
ds: 007b   es: 007b   ss: 0068
Process íA (pid: 4096, threadinfo=f7c86000 task=f7c84000)
Stack: c03f5484 c01168ad f7c86000 f7c86e5c 00000000 c03172f3 c01048f6 f7c86000 
       00000000 c0355ecc f7c86e5c 00000000 c03172f3 00000000 c032007b f7c8007b 
       ffffff0f c01050ec 00000060 00000246 c0327f02 c03172f3 00000000 00000002 
Call Trace:
 [<c01168ad>] smp_apic_timer_interrupt+0xb7/0xc0
 [<c01048f6>] common_interrupt+0x1a/0x20
 [<c01050ec>] die+0x11a/0x18e
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0119b31>] do_page_fault+0x3b2/0x605
 [<c011b71a>] recalc_task_prio+0xe0/0x150
 [<c011b814>] activate_task+0x8a/0x99
 [<c011bce3>] try_to_wake_up+0x238/0x270
 [<c0134208>] autoremove_wake_function+0x15/0x37
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0104a2b>] error_code+0x2b/0x30
 [<c01469e1>] do_drain+0x22/0x3c
 [<c01469bf>] do_drain+0x0/0x3c
 [<c011605f>] smp_call_function_interrupt+0x3a/0x57
 [<c0104960>] call_function_interrupt+0x1c/0x24
 [<c01050ec>] die+0x11a/0x18e
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0119b31>] do_page_fault+0x3b2/0x605
 [<c01d52fb>] pci_mmap_resource+0x0/0x31
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0104a2b>] error_code+0x2b/0x30
Code: 45 14 00 00 01 00 b8 ff 0f 00 00 21 e0 3d 37 02 00 00 76 46 8b 45 10 8b 14
85 20 d0 3f c0 39 d5 74 2d 8b 45 00 8d 9a 00 10 00 00 <89> 62 34 89 02 89 f8 89
f2 87 dc e8 81 81 03 00 89 dc e8 96 01 
 <0>Kernel panic - not syncing: Fatal exception in interrupt
 

Expected results:
The system remains stable under this load.

Additional info:
It seems that the problem was also present in previous kernel versions. I was
using the "noapic" option, which gave better stability but led to memory leaks.
With the current kernel, the noapic option does not change much in terms of
stability.
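
For readers triaging a dump like the one above, the repeated frames in the first
call trace are easier to see once they are tallied. The following is a minimal
Python sketch and not part of the original report: it extracts symbol+offset
frames from oops text and counts them. The names FRAME_RE, parse_frames and
summarize are illustrative, and SAMPLE is a short excerpt copied from the first
trace.

```python
import re
from collections import Counter

# Matches call-trace lines such as " [<c011977f>] do_page_fault+0x0/0x605".
FRAME_RE = re.compile(r"\[<[0-9a-f]{8}>\]\s+(\w+)\+0x([0-9a-f]+)/0x[0-9a-f]+")

# A few lines copied from the first call trace in the dump.
SAMPLE = """\
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0104a2b>] error_code+0x2b/0x30
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0119815>] do_page_fault+0x96/0x605
 [<c011977f>] do_page_fault+0x0/0x605
 [<c0104a2b>] error_code+0x2b/0x30
 [<c012960c>] internal_add_timer+0x55/0xa9
"""


def parse_frames(text):
    """Return a (symbol, offset) tuple for every frame found in the text."""
    return [(m.group(1), int(m.group(2), 16)) for m in FRAME_RE.finditer(text)]


def summarize(text):
    """Count how often each frame occurs, keeping first-seen order."""
    return Counter(parse_frames(text))


if __name__ == "__main__":
    for (symbol, offset), count in summarize(SAMPLE).items():
        print(f"{count:3d} x {symbol}+{offset:#x}")
```

Fed the full first trace instead of the excerpt, a tally like this makes the
do_page_fault / error_code cycle stand out. That pattern is consistent with a
fault handler that keeps faulting again while it runs, although the tally alone
does not say why.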

Comment 1 Dave Jones 2005-04-20 17:21:30 UTC
The only similar bug of this nature I've seen recently also used XFS, which is
known to have problems with stack usage. I'm inclined to believe that's where
the problems begin here.

This has been reported to the upstream XFS developers on a few occasions, yet it
doesn't seem important enough to them to fix.
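
The "do_IRQ: stack overflow: 480" line at the top of the dump fits the
stack-usage reading. A rough Python sketch of the arithmetic behind that message
follows, under assumptions that are not stated anywhere in this report: 4 KiB
kernel stacks, a 56-byte thread_info at the bottom of the stack, and a warning
threshold of one eighth of the stack. The constants and the helper names
stack_headroom and would_warn are illustrative, and the first esp value in the
usage lines is hypothetical, chosen only to reproduce the figure 480.

```python
THREAD_SIZE = 4096             # assumption: 4 KiB kernel stacks on this build
THREAD_INFO_SIZE = 56          # assumption: sizeof(struct thread_info)
STACK_WARN = THREAD_SIZE // 8  # assumption: warn below 512 bytes of headroom


def stack_headroom(esp):
    """Bytes left between esp and the thread_info at the stack's low end."""
    return (esp & (THREAD_SIZE - 1)) - THREAD_INFO_SIZE


def would_warn(esp):
    """True if the remaining stack is below the assumed warning threshold."""
    return (esp & (THREAD_SIZE - 1)) < THREAD_INFO_SIZE + STACK_WARN


if __name__ == "__main__":
    hypothetical_esp = 0xF7C86218  # hypothetical value, not taken from the dump
    print(stack_headroom(hypothetical_esp), would_warn(hypothetical_esp))

    esp_oops3 = 0xF7C86CF4         # esp printed in oops #3 of the dump
    print(stack_headroom(esp_oops3), would_warn(esp_oops3))
```

Under these assumptions a reported value of 480 would mean that fewer than 500
bytes of stack remained when the interrupt arrived, so any further nesting
could run past the end of the stack.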


Comment 2 Andrzej Szymanski 2005-04-20 18:37:30 UTC
The affected system uses XFS on almost all of its filesystems.


Comment 3 Dave Jones 2005-07-15 19:20:06 UTC
An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which
may contain a fix for your problem.   Please update to this new kernel, and
report whether or not it fixes your problem.

If you have updated to Fedora Core 4 since this bug was opened, and the problem
still occurs with the latest updates for that release, please change the version
field of this bug to 'fc4'.

Thank you.

Comment 4 Andrzej Szymanski 2005-07-29 05:21:48 UTC
Unfortunately, after a disk crash the system was restored from backup using
ext3, so I no longer use XFS and I'm not able to verify whether the new kernel
release fixes the problem. The problem disappeared right after switching to ext3.

Sorry.

