Description of problem:
Kernel panic during heavy network + I/O load.

Version-Release number of selected component (if applicable):
2.6.11-1.14_FC3smp i686

How reproducible:
I cannot reproduce the problem on demand, but it seems to occur at the same time of day. At that time, two tape streamers are running tape tests while other machines start sending their daily backups to the affected server over Samba.

Steps to Reproduce:
1.
2.
3.

Actual results:
Dump from serial console:

do_IRQ: stack overflow: 480
 [<c01061b7>] do_IRQ+0x87/0x89
<1>Unable to handle kernel NULL pointer dereference at virtual address 0000006c
printing eip:
c0119815
*pde = 2beac001
Oops: 0000 [#1]
SMP
Modules linked in: i8xx_tco nfsd lockd md5 ipv6 parport_pc lp parport autofs4 w83627hf eeprom lm75 i2c_sensor i2c_isa sunrpc ipt_mac ipt_state ip_conntrack iptable_filter ip_tables ext3 jbd video button battery ac uhci_hcd ehci_hcd hw_random i2c_i801 i2c_core snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc sk98lin floppy st xfs exportfs raid5 xor raid1 raid0 dm_mod sata_promise libata aic7xxx sd_mod scsi_mod
CPU: 11
EIP: 0060:[<c0119815>] Not tainted VLI
EFLAGS: 00010082 (2.6.11-1.14_FC3smp)
EIP is at do_page_fault+0x96/0x605
eax: f7c87000   ebx: c2066a60   ecx: f7c8704c   edx: f7c87100
esi: 00000000   edi: c011977f   ebp: 00000000   esp: f7c8702c
ds: 007b   es: 007b   ss: 0068
Process ÃA (pid: 4096, threadinfo=f7c86000 task=f7c84000)
Stack: 000cffff 00000000 00000000 0000006c 00000000 f7c87100 f7cdee00 f7c85c1c
       f7c87100 c0317291 00000000 0000000e 0000000b 00000000 00000000 00000000
       00000000 00000000 00030001 2934e2e0 426506cd 2934e2e0 426506cd 2934e2e0
Call Trace: [<c011977f>] do_page_fault+0x0/0x605 [<c0104a2b>] error_code+0x2b/0x30 [<c011977f>] do_page_fault+0x0/0x605 [<c0119815>] do_page_fault+0x96/0x605 [<c011977f>] do_page_fault+0x0/0x605 [<c0104a2b>] error_code+0x2b/0x30 [<c011977f>] do_page_fault+0x0/0x605 [<c0119815>] do_page_fault+0x96/0x605 [<c011977f>]
do_page_fault+0x0/0x605 [<c0104a2b>] error_code+0x2b/0x30 [<c011977f>] do_page_fault+0x0/0x605 [<c0119815>] do_page_fault+0x96/0x605 [<c011977f>] do_page_fault+0x0/0x605 [<c0104a2b>] error_code+0x2b/0x30 [<c011977f>] do_page_fault+0x0/0x605 [<c0119815>] do_page_fault+0x96/0x605 [<c011977f>] do_page_fault+0x0/0x605 [<c0104a2b>] error_code+0x2b/0x30 [<c011977f>] do_page_fault+0x0/0x605 [<c0119815>] do_page_fault+0x96/0x605 [<c011977f>] do_page_fault+0x0/0x605 [<c0104a2b>] error_code+0x2b/0x30 [<c011977f>] do_page_fault+0x0/0x605 [<c0119815>] do_page_fault+0x96/0x605 [<c011977f>] do_page_fault+0x0/0x605 [<c0104a2b>] error_code+0x2b/0x30 [<c011977f>] do_page_fault+0x0/0x605 [<c0119815>] do_page_fault+0x96/0x605 [<c011977f>] do_page_fault+0x0/0x605 [<c0104a2b>] error_code+0x2b/0x30 [<c011977f>] do_page_fault+0x0/0x605 [<c0119815>] do_page_fault+0x96/0x605 [<c011977f>] do_page_fault+0x0/0x605 [<c0104a2b>] error_code+0x2b/0x30 [<c011977f>] do_page_fault+0x0/0x605 [<c0119815>] do_page_fault+0x96/0x605 [<c011977f>] do_page_fault+0x0/0x605 [<c0104a2b>] error_code+0x2b/0x30 [<c011977f>] do_page_fault+0x0/0x605 [<c0119815>] do_page_fault+0x96/0x605 [<c011977f>] do_page_fault+0x0/0x605 [<c0104a2b>] error_code+0x2b/0x30 [<c011977f>] do_page_fault+0x0/0x605 [<c0119815>] do_page_fault+0x96/0x605 [<c011977f>] do_page_fault+0x0/0x605 [<c0104a2b>] error_code+0x2b/0x30 [<c011977f>] do_page_fault+0x0/0x605 [<c0119815>] do_page_fault+0x96/0x605 [<c011977f>] do_page_fault+0x0/0x605 [<c0104a2b>] error_code+0x2b/0x30 [<c011977f>] do_page_fault+0x0/0x605 [<c0119815>] do_page_fault+0x96/0x605 [<c011977f>] do_page_fault+0x0/0x605 [<c0104a2b>] error_code+0x2b/0x30 [<c011977f>] do_page_fault+0x0/0x605 [<c0119815>] do_page_fault+0x96/0x605 [<c011977f>] do_page_fault+0x0/0x605 [<c0104a2b>] error_code+0x2b/0x30 [<c012960c>] internal_add_timer+0x55/0xa9 [<c0129758>] __mod_timer+0xf8/0x159 [<c021dc98>] poke_blanked_console+0x6f/0xbd [<c0219b43>] set_cursor+0x5a/0x6e [<c021cdd2>] 
vt_console_print+0x22d/0x304 [<c01cd835>] __delay+0x9/0xa
Code: 85 e7 01 00 00 b8 00 f0 ff ff 21 e0 81 7c 24 0c ff ff ff bf 8b 28 c7 44 24 48 01 00 03 00 0f 87 18 04 00 00 f7 40 14 ff ff ff ef <8b> 5d 6c 0f 85 cf 01 00 00 85 db 0f 84 c7 01 00 00 8d 73 30 8b
<1>Unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip:
c01469e1
*pde = 2beac001
Oops: 0000 [#2]
SMP
Modules linked in: i8xx_tco nfsd lockd md5 ipv6 parport_pc lp parport autofs4 w83627hf eeprom lm75 i2c_sensor i2c_isa sunrpc ipt_mac ipt_state ip_conntrack iptable_filter ip_tables ext3 jbd video button battery ac uhci_hcd ehci_hcd hw_random i2c_i801 i2c_core snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc sk98lin floppy st xfs exportfs raid5 xor raid1 raid0 dm_mod sata_promise libata aic7xxx sd_mod scsi_mod
CPU: 11
EIP: 0060:[<c01469e1>] Not tainted VLI
EFLAGS: 00010086 (2.6.11-1.14_FC3smp)
EIP is at do_drain+0x22/0x3c
eax: f7400f40   ebx: f7400e80   ecx: f7c86000   edx: 00000010
esi: 00000000   edi: f7400f40   ebp: c03172f3   esp: f7c86e90
ds: 007b   es: 007b   ss: 0068
Process ÃA (pid: 4096, threadinfo=f7c86000 task=f7c84000)
Stack: c01469bf 00000001 00000000 c011605f f7c86000 f7c86ff8 c0104960 f7c86000
       00000000 c0355ecc f7c86ff8 00000000 c03172f3 00000000 c032007b f7c8007b
       fffffffb c01050ec 00000060 00000246 c0327f02 c03172f3 00000000 00000001
Call Trace: [<c01469bf>] do_drain+0x0/0x3c [<c011605f>] smp_call_function_interrupt+0x3a/0x57 [<c0104960>] call_function_interrupt+0x1c/0x24 [<c01050ec>] die+0x11a/0x18e [<c011977f>] do_page_fault+0x0/0x605 [<c0119b31>] do_page_fault+0x3b2/0x605 [<c01d52fb>] pci_mmap_resource+0x0/0x31 [<c011977f>] do_page_fault+0x0/0x605 [<c0104a2b>] error_code+0x2b/0x30
Code: 9a 31 c0 83 c4 04 5b 5e c3 57 56 53 89 c3 b8 00 f0 ff ff 8d bb c0 00 00 00 21 e0 8b 40 10 8b 34 83 89 f8 e8 30 a4 1b 00 8d 56 10 <8b> 0e 89 d8 e8 cd 06 00 00 89 f8 e8 7f a4 1b 00 c7 06 00 00 00
<1>Unable to handle kernel NULL pointer dereference at virtual address 00000034
printing eip:
c0106170
*pde = 2beac001
Oops: 0002 [#3]
SMP
Modules linked in: i8xx_tco nfsd lockd md5 ipv6 parport_pc lp parport autofs4 w83627hf eeprom lm75 i2c_sensor i2c_isa sunrpc ipt_mac ipt_state ip_conntrack iptable_filter ip_tables ext3 jbd video button battery ac uhci_hcd ehci_hcd hw_random i2c_i801 i2c_core snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc sk98lin floppy st xfs exportfs raid5 xor raid1 raid0 dm_mod sata_promise libata aic7xxx sd_mod scsi_mod
CPU: 11
EIP: 0060:[<c0106170>] Not tainted VLI
EFLAGS: 00010086 (2.6.11-1.14_FC3smp)
EIP is at do_IRQ+0x40/0x89
eax: f7c84000   ebx: 00001000   ecx: 00000000   edx: 00000000
esi: f7c86d10   edi: 0000000f   ebp: f7c86000   esp: f7c86cf4
ds: 007b   es: 007b   ss: 0068
Process ÃA (pid: 4096, threadinfo=f7c86000 task=f7c84000)
Stack: c03f5484 c01168ad f7c86000 f7c86e5c 00000000 c03172f3 c01048f6 f7c86000
       00000000 c0355ecc f7c86e5c 00000000 c03172f3 00000000 c032007b f7c8007b
       ffffff0f c01050ec 00000060 00000246 c0327f02 c03172f3 00000000 00000002
Call Trace: [<c01168ad>] smp_apic_timer_interrupt+0xb7/0xc0 [<c01048f6>] common_interrupt+0x1a/0x20 [<c01050ec>] die+0x11a/0x18e [<c011977f>] do_page_fault+0x0/0x605 [<c0119b31>] do_page_fault+0x3b2/0x605 [<c011b71a>] recalc_task_prio+0xe0/0x150 [<c011b814>] activate_task+0x8a/0x99 [<c011bce3>] try_to_wake_up+0x238/0x270 [<c0134208>] autoremove_wake_function+0x15/0x37 [<c011977f>] do_page_fault+0x0/0x605 [<c0104a2b>] error_code+0x2b/0x30 [<c01469e1>] do_drain+0x22/0x3c [<c01469bf>] do_drain+0x0/0x3c [<c011605f>] smp_call_function_interrupt+0x3a/0x57 [<c0104960>] call_function_interrupt+0x1c/0x24 [<c01050ec>] die+0x11a/0x18e [<c011977f>] do_page_fault+0x0/0x605 [<c0119b31>] do_page_fault+0x3b2/0x605 [<c01d52fb>] pci_mmap_resource+0x0/0x31 [<c011977f>] do_page_fault+0x0/0x605 [<c0104a2b>] error_code+0x2b/0x30
Code: 45 14 00 00 01 00 b8 ff 0f 00 00 21 e0 3d 37 02 00 00 76 46 8b 45 10 8b 14 85 20 d0 3f c0 39 d5 74 2d 8b 45 00 8d 9a 00 10 00 00 <89> 62 34 89 02 89 f8 89 f2 87 dc e8 81 81 03 00 89 dc e8 96 01
<0>Kernel panic - not syncing: Fatal exception in interrupt

Expected results:
The system runs stably.

Additional info:
The problem seems to have been present in previous kernel versions as well. With those, I was using the "noapic" boot option, which gave better stability but led to memory leaks. With the current kernel, the noapic option makes little difference to stability.
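The three oopses can be cross-checked against their register dumps. The bytes from the `<..>` marker in each `Code:` line are the faulting instruction. Below is a minimal Python sketch that assumes the standard i386 page-fault error-code bits (the `Oops: XXXX` value) and decodes only the simple MOV forms that appear in this report. The helper names `describe_error_code` and `decode_mov` are hypothetical, written for illustration only.

```python
# Decode the "Oops: XXXX" page-fault error code and the faulting MOV
# instruction (the bytes from the <..> marker in each "Code:" line).
# Sketch only: handles just the MOV forms that appear in this report.

# i386 page-fault error-code bits.
PF_PROT = 0x1   # 0 = page not present, 1 = protection violation
PF_WRITE = 0x2  # 0 = read access, 1 = write access
PF_USER = 0x4   # 0 = kernel mode, 1 = user mode

# 32-bit register order used by the ModRM reg/rm fields.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def describe_error_code(code):
    mode = "user" if code & PF_USER else "kernel"
    access = "write" if code & PF_WRITE else "read"
    cause = "protection violation" if code & PF_PROT else "page not present"
    return f"{mode}-mode {access}, {cause}"


def decode_mov(code_bytes, regs):
    """Return (AT&T disassembly, effective address) for a simple MOV.

    Only opcode 0x8b (load: mov r/m32 -> r32) and 0x89 (store:
    mov r32 -> r/m32) with a single base register are handled:
    mod == 00 without displacement, or mod == 01 with an 8-bit one.
    """
    opcode, modrm = code_bytes[0], code_bytes[1]
    if opcode not in (0x8B, 0x89):
        raise ValueError("only MOV 0x8b/0x89 is handled")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod not in (0, 1) or rm == 4 or (mod == 0 and rm == 5):
        raise ValueError("SIB, disp32 and register forms are not handled")
    disp = 0
    if mod == 1:
        disp = code_bytes[2]
        if disp >= 0x80:  # disp8 is sign-extended
            disp -= 0x100
    base = REGS[rm]
    mem = f"{disp:#x}(%{base})" if mod == 1 else f"(%{base})"
    reg_name = "%" + REGS[reg]
    if opcode == 0x8B:
        text = f"mov {mem},{reg_name}"
    else:
        text = f"mov {reg_name},{mem}"
    return text, (regs[base] + disp) & 0xFFFFFFFF


# Error codes, instruction bytes and base registers copied from the dumps.
oopses = [
    (0x0000, [0x8B, 0x5D, 0x6C], {"ebp": 0x00000000}),  # Oops #1
    (0x0000, [0x8B, 0x0E], {"esi": 0x00000000}),        # Oops #2
    (0x0002, [0x89, 0x62, 0x34], {"edx": 0x00000000}),  # Oops #3
]

for err, code, regs in oopses:
    text, addr = decode_mov(code, regs)
    print(f"{text:24} -> {addr:08x} ({describe_error_code(err)})")
```

In all three oopses, the access uses a base register that holds zero (ebp, esi and edx, respectively). That yields exactly the virtual addresses the kernel reported (0000006c, 00000000 and 00000034). The first two faults are kernel-mode reads and the third is a kernel-mode write. This is consistent with kernel state, such as the thread_info/task pointers, being corrupted rather than with a single bad pointer in one driver.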
The only similar bug I've seen recently also involved XFS, which is known to have problems with stack usage. I'm inclined to believe that's where the problems begin here, and the "do_IRQ: stack overflow: 480" warning at the top of the dump points the same way. This has been reported to the upstream XFS developers on a few occasions, yet it doesn't seem important enough to them to fix.
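As I understand the 2.6-era i386 code, the "do_IRQ: stack overflow" line comes from a CONFIG_DEBUG_STACKOVERFLOW check in do_IRQ(). That check masks esp with THREAD_SIZE - 1 and warns when fewer than STACK_WARN bytes remain above struct thread_info, which sits at the bottom of the stack. Here is a minimal Python sketch of that check. It assumes 4K stacks (THREAD_SIZE = 4096) and STACK_WARN = THREAD_SIZE/8. THREAD_INFO_SIZE, the function names and the sample esp values are hypothetical.

```python
# Sketch of the CONFIG_DEBUG_STACKOVERFLOW check in 2.6-era i386 do_IRQ().
# Assumptions: 4K stacks, STACK_WARN = THREAD_SIZE / 8, and a
# hypothetical sizeof(struct thread_info); the esp values are made up.
THREAD_SIZE = 4096             # assumed CONFIG_4KSTACKS
STACK_WARN = THREAD_SIZE // 8  # 512 bytes
THREAD_INFO_SIZE = 56          # hypothetical sizeof(struct thread_info)


def stack_bytes_free(esp):
    """Bytes between esp and the end of thread_info at the stack bottom."""
    offset = esp & (THREAD_SIZE - 1)  # position of esp inside its stack
    return offset - THREAD_INFO_SIZE


def irq_stack_check(esp):
    """Return the warning do_IRQ would print, or None if enough stack is left."""
    free = stack_bytes_free(esp)
    if free < STACK_WARN:
        return f"do_IRQ: stack overflow: {free}"
    return None


stack_base = 0xF7C86000  # threadinfo address from the dump, used as an example
print(irq_stack_check(stack_base + THREAD_INFO_SIZE + 480))
print(irq_stack_check(stack_base + 0xF00))
```

Under these assumptions, the reported value of 480 means that less than 512 bytes of a 4096-byte stack were left when the interrupt arrived. A deep XFS call chain, combined with network and block-layer interrupt work, could plausibly use up that remaining margin.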
The affected system uses XFS on almost all of its filesystems.
An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which may contain a fix for your problem. Please update to this new kernel, and report whether or not it fixes your problem. If you have updated to Fedora Core 4 since this bug was opened, and the problem still occurs with the latest updates for that release, please change the version field of this bug to 'fc4'. Thank you.
Unfortunately, after a disk crash the system was restored from backup onto ext3, so I no longer use XFS and cannot verify whether the new kernel release fixes the problem. The problem disappeared right after switching to ext3. Sorry.