Description of problem:
IO load (smb+rsync) causes all kernels since 2.6.10-1.770_FC3 to crash. The stack trace is:

do_IRQ: stack overflow: 452
 [<c0105962>] do_IRQ+0x83/0x85
 [<c0103b0a>] common_interrupt+0x1a/0x20
 [<c0291e32>] cfq_set_request+0x1b5/0x500
 [<c013e9b9>] autoremove_wake_function+0x0/0x37
 [<c0291c7d>] cfq_set_request+0x0/0x500
 [<c028452e>] elv_set_request+0x20/0x23
 [<c02872bd>] get_request+0x219/0x582
 [<c029017f>] cfq_find_rq_rb+0x2e/0x96
 [<c029030c>] cfq_merge+0x0/0xd1
 [<c02903a7>] cfq_merge+0x9b/0xd1
 [<c02883d0>] __make_request+0x165/0x628
 [<c028418a>] __elv_add_request+0x74/0x9d
 [<c0288fd4>] generic_make_request+0x19b/0x276
 [<c013e9b9>] autoremove_wake_function+0x0/0x37
 [<c013e9b9>] autoremove_wake_function+0x0/0x37
 [<e006414a>] handle_stripe+0xfa2/0x16c4 [raid5]
 [<e0061f99>] raid5_build_block+0x20/0x75 [raid5]
 [<e00613de>] get_active_stripe+0x96/0x566 [raid5]
 [<e0061fe3>] raid5_build_block+0x6a/0x75 [raid5]
 [<e0064f01>] make_request+0x34a/0x53a [raid5]
 [<c013e9b9>] autoremove_wake_function+0x0/0x37
 [<c013e9b9>] autoremove_wake_function+0x0/0x37
 [<c013e9b9>] autoremove_wake_function+0x0/0x37
 [<c0288fd4>] generic_make_request+0x19b/0x276
 [<c013e9b9>] autoremove_wake_function+0x0/0x37
 [<c013e9b9>] autoremove_wake_function+0x0/0x37
 [<c013e9b9>] autoremove_wake_function+0x0/0x37
 [<c017da39>] bio_clone+0xad/0xb2
 [<e00432f7>] __map_bio+0x30/0xc8 [dm_mod]
 [<e0043531>] __clone_and_map+0xcd/0x309 [dm_mod]
 [<c02b0b61>] ide_dma_exec_cmd+0x1f/0x23
 [<c02b0b86>] ide_dma_start+0x21/0x2d
 [<e004380a>] __split_bio+0x9d/0x10b [dm_mod]
 [<c02a0000>] ide_timing_merge+0xc2/0xc8
 [<e00438d7>] dm_request+0x5f/0x88 [dm_mod]
 [<c0288fd4>] generic_make_request+0x19b/0x276
 [<c0155507>] buffered_rmqueue+0x154/0x2e2
 [<c013e9b9>] autoremove_wake_function+0x0/0x37
 [<c013e9b9>] autoremove_wake_function+0x0/0x37
 [<c013e9b9>] autoremove_wake_function+0x0/0x37
 [<c02890fa>] submit_bio+0x4b/0xc5
 [<c013e9b9>] autoremove_wake_function+0x0/0x37
 [<c017d776>] bio_alloc_bioset+0x154/0x1c5
 [<c017cfa2>] submit_bh+0x133/0x17f
 [<c017d06f>] ll_rw_block+0x81/0x83
 [<e014759d>] search_by_key+0x113/0xd8b [reiserfs]
 [<c028405c>] elv_merged_request+0x15/0x1a
 [<c028867f>] __make_request+0x414/0x628
 [<c0103b0a>] common_interrupt+0x1a/0x20
 [<c0288fd4>] generic_make_request+0x19b/0x276
 [<e014825d>] search_for_position_by_key+0x48/0x358 [reiserfs]
 [<c013e9b9>] autoremove_wake_function+0x0/0x37
 [<e0133022>] make_cpu_key+0x42/0x49 [reiserfs]
 [<e01332b4>] _get_block_create_0+0xcd/0x680 [reiserfs]
 [<e0064f08>] make_request+0x351/0x53a [raid5]
 [<e0134601>] reiserfs_get_block+0xb87/0x11b7 [reiserfs]
 [<c013e9b9>] autoremove_wake_function+0x0/0x37
 [<c0288fd4>] generic_make_request+0x19b/0x276
 [<c013e9b9>] autoremove_wake_function+0x0/0x37
 [<c013e9b9>] autoremove_wake_function+0x0/0x37
 [<c013e9b9>] autoremove_wake_function+0x0/0x37
 [<c017da39>] bio_clone+0xad/0xb2
 [<e00432f7>] __map_bio+0x30/0xc8 [dm_mod]
 [<e0043531>] __clone_and_map+0xcd/0x309 [dm_mod]
 [<e0043855>] __split_bio+0xe8/0x10b [dm_mod]
 [<c013e9b9>] autoremove_wake_function+0x0/0x37
 [<e00438d7>] dm_request+0x5f/0x88 [dm_mod]
 [<c0288fd4>] generic_make_request+0x19b/0x276
 [<c013e9b9>] autoremove_wake_function+0x0/0x37
 [<c0153c3a>] mempool_alloc+0x72/0x250
 [<c013e9b9>] autoremove_wake_function+0x0/0x37
 [<c013e9b9>] autoremove_wake_function+0x0/0x37
 [<c01a7be0>] mpage_end_io_read+0x0/0x6f
 [<c01a7be0>] mpage_end_io_read+0x0/0x6f
 [<c01a7f15>] do_mpage_readpage+0x151/0x417
 [<c01a7be0>] mpage_end_io_read+0x0/0x6f
 [<c02890fa>] submit_bio+0x4b/0xc5
 [<c015bb5e>] __pagevec_lru_add+0x133/0x287
 [<c0206b4d>] radix_tree_insert+0x74/0x10b
 [<c01a8270>] mpage_readpages+0x95/0x111
 [<e0133a7a>] reiserfs_get_block+0x0/0x11b7 [reiserfs]
 [<e0134c31>] reiserfs_readpages+0x0/0x15 [reiserfs]
 [<c0157e2b>] read_pages+0xf5/0x105
 [<e0133a7a>] reiserfs_get_block+0x0/0x11b7 [reiserfs]
 [<c015589d>] __alloc_pages+0x169/0x3cb
 [<c0157f28>] __do_page_cache_readahead+0xed/0xf9
 [<c0158032>] blockable_page_cache_readahead+0x41/0xa2
 [<c0158103>] make_ahead_window+0x70/0xa4
 [<c01581bf>] page_cache_readahead+0x88/0x161
 [<c0150fd3>] do_generic_mapping_read+0x524/0x6ce
 [<c01513ea>] __generic_file_aio_read+0x18a/0x1f0
 [<c015117d>] file_read_actor+0x0/0xe3
 [<c015153b>] generic_file_read+0x9c/0xbe
 [<c013e9b9>] autoremove_wake_function+0x0/0x37
 [<c0177546>] vfs_read+0xad/0x108
 [<c01777e1>] sys_read+0x41/0x6a
 [<c010394d>] syscall_call+0x7/0xb
 =======================

Version-Release number of selected component (if applicable):
2.6.12-1.1376_FC3

How reproducible:
Generate a moderate amount of IO load (smb and an rsync is usually enough).

Additional info:
grub boot parameters:

title Fedora Core (2.6.12-1.1376_FC3)
        root (hd0,0)
        kernel /vmlinuz-2.6.12-1.1376_FC3 ro root=/dev/md2 rhgb quiet console=ttyS0,38400 console=tty0 noapic
        initrd /initrd-2.6.12-1.1376_FC3.img

gem:/tmp # cat /proc/partitions
major minor  #blocks  name

   3     0  156290904 hda
   3     1     104391 hda1
   3     2     987997 hda2
   3     4          1 hda4
   3     5  155195901 hda5
   3    64  156290904 hdb
   3    65     104391 hdb1
   3    66     987997 hdb2
   3    68          1 hdb4
   3    69  155195901 hdb5
  22     0  156290904 hdc
  22     1     104391 hdc1
  22     2     987997 hdc2
  22     4          1 hdc4
  22     5  155195901 hdc5
  22    64  156290904 hdd
  22    65     104391 hdd1
  22    66     987997 hdd2
  22    68          1 hdd4
  22    69  155195901 hdd5
   9     0     104320 md0
   9     3  465587328 md3
   9     2     987904 md2
   9     1     987904 md1
 253     0    2097152 dm-0
 253     1   10485760 dm-1
 253     2   10485760 dm-2
 253     3  442499072 dm-3

gem:/tmp # lspci
00:00.0 Host bridge: ATI Technologies Inc Radeon 9100 IGP Host Bridge (rev 02)
00:01.0 PCI bridge: ATI Technologies Inc Radeon 9100 IGP AGP Bridge
00:13.0 USB Controller: ATI Technologies Inc OHCI USB Controller #1 (rev 01)
00:13.1 USB Controller: ATI Technologies Inc OHCI USB Controller #2 (rev 01)
00:13.2 USB Controller: ATI Technologies Inc EHCI USB Controller (rev 01)
00:14.0 SMBus: ATI Technologies Inc ATI SMBus (rev 18)
00:14.1 IDE interface: ATI Technologies Inc: Unknown device 4349
00:14.3 ISA bridge: ATI Technologies Inc: Unknown device 434c
00:14.4 PCI bridge: ATI Technologies Inc: Unknown device 4342
01:05.0 VGA compatible controller: ATI Technologies Inc Radeon 9100 IGP
02:06.0 RAID bus controller: Integrated Technology Express, Inc. IT/ITE8212 Dual channel ATA RAID controller (PCI version seems to be IT8212, embedded seems (rev 11)
02:09.0 Ethernet controller: National Semiconductor Corporation DP83815 (MacPhyter) Ethernet Controller
A possible fix has been merged into CVS for the next update.
I got a similar crash with kernel-smp-2.6.13-1.1526_FC4:

Oct 5 03:25:43 alexandria kernel: do_IRQ: stack overflow: 496
Oct 5 03:25:43 alexandria kernel: [<c0105f44>] do_IRQ+0x84/0x86
Oct 5 03:25:43 alexandria kernel: Unable to handle kernel paging request at virtual address bec93e70
Oct 5 03:25:43 alexandria kernel: printing eip:
Oct 5 03:25:43 alexandria kernel: c012180d
Oct 5 03:25:43 alexandria kernel: *pde = 00000000
Oct 5 03:25:43 alexandria kernel: Oops: 0000 [#1]
Oct 5 03:25:43 alexandria kernel: SMP
Oct 5 03:25:43 alexandria kernel: Modules linked in: nfs nfsd exportfs lockd nfs_acl ipv6 autofs4 w83627hf w83781d adm1021 i2c_sensor i2c_isa sunrpc jfs video button battery ac uhci_hcd hw_random i2c_i801 i2c_core shpchp eepro100 mii e1000 dm_snapshot dm_zero dm_mirror ext3 jbd raid5 xor raid1 dm_mod mv_sata(U) sd_mod scsi_mod
Oct 5 03:25:43 alexandria kernel: CPU: -195358332
Oct 5 03:25:43 alexandria kernel: EIP: 0060:[<c012180d>] Not tainted VLI
Oct 5 03:25:43 alexandria kernel: EFLAGS: 00010086 (2.6.13-1.1526_FC4smp)
Oct 5 03:25:43 alexandria kernel: EIP is at vprintk+0x1a7/0x2aa
Oct 5 03:25:43 alexandria kernel: eax: f45b109c ebx: 00000000 ecx: 00020000 edx: c0466841
Oct 5 03:25:43 alexandria kernel: esi: 00000001 edi: 00000082 ebp: 00000010 esp: f45b1184
Oct 5 03:25:43 alexandria kernel: ds: 007b es: 007b ss: 0068
Oct 5 03:25:43 alexandria kernel: Process (pid: -195358164, threadinfo=f45b1000 task=f45b1184)
Oct 5 03:25:43 alexandria kernel: Stack: f45b120c c0121a42 00000000 c0466840 0000001f 00000086 00000000 c012185a
Oct 5 03:25:43 alexandria kernel: c0466841 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Oct 5 03:25:43 alexandria kernel: c046685c 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Oct 5 03:25:43 alexandria kernel: Call Trace:
Oct 5 03:25:43 alexandria kernel: [<c0121a42>] release_console_sem+0xad/0xb5
Oct 5 03:25:43 alexandria kernel: [<c012185a>] vprintk+0x1f4/0x2aa
Oct 5 03:25:43 alexandria kernel: [<c0105f44>] do_IRQ+0x84/0x86
Oct 5 03:25:43 alexandria kernel: [<c0121662>] printk+0x1b/0x1f
Oct 5 03:25:43 alexandria kernel: [<c01047b5>] show_trace+0x56/0x78
Oct 5 03:25:43 alexandria kernel: [<c0105f44>] do_IRQ+0x84/0x86
Oct 5 03:25:43 alexandria kernel: [<c01048b2>] dump_stack+0x13/0x17
Oct 5 03:25:43 alexandria kernel: [<c0105f44>] do_IRQ+0x84/0x86
Oct 5 03:25:43 alexandria kernel: [<c0104392>] common_interrupt+0x1a/0x20
Oct 5 03:25:43 alexandria kernel: [<c024e3b0>] get_io_context+0xc/0xd
Oct 5 03:25:43 alexandria kernel: [<c0255e24>] cfq_get_io_context+0x18/0xcd
Oct 5 03:25:43 alexandria kernel: [<c02566ff>] cfq_set_request+0x69/0x225
Oct 5 03:25:43 alexandria kernel: [<c0256696>] cfq_set_request+0x0/0x225
Oct 5 03:25:43 alexandria kernel: [<c024a2cf>] elv_set_request+0x1e/0x33
Oct 5 03:25:43 alexandria kernel: [<c024c86f>] get_request+0xfd/0x2af
Oct 5 03:25:43 alexandria kernel: [<c024ca3a>] get_request_wait+0x19/0xfb
Oct 5 03:25:43 alexandria kernel: [<f889f6f5>] commandsQueueAddTail+0x71/0x80 [mv_sata]
Oct 5 03:25:43 alexandria kernel: [<c024d31f>] __make_request+0xa7/0x4c3
Oct 5 03:25:43 alexandria kernel: [<f88a0000>] _doDevErrorRecovery+0x2e/0x46 [mv_sata]
Oct 5 03:25:43 alexandria kernel: [<c024da21>] generic_make_request+0x9a/0x24b
Oct 5 03:25:43 alexandria kernel: [<f881ae58>] compute_blocknr+0xe5/0x16e [raid5]
Oct 5 03:25:43 alexandria kernel: [<c01347c2>] autoremove_wake_function+0x0/0x37
Oct 5 03:25:43 alexandria kernel: [<f881c0d2>] handle_stripe+0x721/0x1079 [raid5]
Oct 5 03:25:43 alexandria kernel: [<f881ab41>] raid5_build_block+0x66/0x70 [raid5]
Oct 5 03:25:43 alexandria kernel: [<f881a3ff>] get_active_stripe+0x1a0/0x393 [raid5]
Oct 5 03:25:43 alexandria kernel: [<f881ced4>] make_request+0x2cf/0x300 [raid5]
Oct 5 03:25:43 alexandria kernel: [<c01347c2>] autoremove_wake_function+0x0/0x37
Oct 5 03:25:43 alexandria kernel: [<c024da21>] generic_make_request+0x9a/0x24b
Oct 5 03:25:43 alexandria kernel: [<c0169012>] bio_clone+0xa5/0xb6
Oct 5 03:25:43 alexandria kernel: [<c01347c2>] autoremove_wake_function+0x0/0x37
Oct 5 03:25:43 alexandria kernel: [<f886854d>] __clone_and_map+0xb3/0x328 [dm_mod]
Oct 5 03:25:43 alexandria kernel: [<c0148ce1>] mempool_alloc+0x26/0xe7
Oct 5 03:25:43 alexandria kernel: [<f8868894>] __split_bio+0xd2/0x114 [dm_mod]
Oct 5 03:25:43 alexandria kernel: [<f8868954>] dm_request+0x7e/0x94 [dm_mod]
Oct 5 03:25:43 alexandria kernel: [<c024da21>] generic_make_request+0x9a/0x24b
Oct 5 03:25:43 alexandria kernel: [<c01347d7>] autoremove_wake_function+0x15/0x37
Oct 5 03:25:43 alexandria kernel: [<c01347c2>] autoremove_wake_function+0x0/0x37
Oct 5 03:25:43 alexandria kernel: [<c024dc17>] submit_bio+0x45/0xcb
Oct 5 03:25:43 alexandria kernel: [<c0148ce1>] mempool_alloc+0x26/0xe7
Oct 5 03:25:43 alexandria kernel: [<c0149f39>] buffered_rmqueue+0xc6/0x228
Oct 5 03:25:43 alexandria kernel: [<c01691f6>] bio_add_page+0x26/0x2c
Oct 5 03:25:43 alexandria kernel: [<f8b72a58>] metapage_readpage+0x186/0x1c5 [jfs]
Oct 5 03:25:43 alexandria kernel: [<c0147291>] read_cache_page+0x88/0x137
Oct 5 03:25:43 alexandria kernel: [<f8b728d2>] metapage_readpage+0x0/0x1c5 [jfs]
Oct 5 03:25:43 alexandria kernel: [<f8b72c5e>] __get_metapage+0x112/0x425 [jfs]
Oct 5 03:25:43 alexandria kernel: [<c024da21>] generic_make_request+0x9a/0x24b
Oct 5 03:25:43 alexandria kernel: [<f8b5d459>] xtSearch+0x3ef/0x739 [jfs]
Oct 5 03:25:43 alexandria kernel: [<c0169012>] bio_clone+0xa5/0xb6
Oct 5 03:25:43 alexandria kernel: [<f8b5c644>] xtLookup+0xc4/0x236 [jfs]
Oct 5 03:25:43 alexandria kernel: [<f8868954>] dm_request+0x7e/0x94 [dm_mod]
Oct 5 03:25:43 alexandria kernel: [<c024da21>] generic_make_request+0x9a/0x24b
Oct 5 03:25:43 alexandria kernel: [<f886854d>] __clone_and_map+0xb3/0x328 [dm_mod]
Oct 5 03:25:43 alexandria kernel: [<f8b7252e>] metapage_get_blocks+0xa3/0xe4 [jfs]
Oct 5 03:25:43 alexandria kernel: [<f8b72798>] metapage_writepage+0xa8/0x1e2 [jfs]
Oct 5 03:25:43 alexandria kernel: [<c0186900>] mpage_writepages+0x227/0x3ee
Oct 5 03:25:44 alexandria kernel: [<f8b726f0>] metapage_writepage+0x0/0x1e2 [jfs]
Oct 5 03:25:44 alexandria kernel: [<c01456a8>] __filemap_fdatawrite_range+0x66/0x72
Oct 5 03:25:44 alexandria kernel: [<c0145725>] filemap_flush+0x23/0x27
Oct 5 03:25:44 alexandria kernel: [<f8b740c9>] lmLogSync+0x15d/0x1ed [jfs]
Oct 5 03:25:44 alexandria kernel: [<f8b7358a>] lmLog+0x7a/0x194 [jfs]
Oct 5 03:25:44 alexandria kernel: [<f8b77799>] diLog+0xf1/0x103 [jfs]
Oct 5 03:25:44 alexandria kernel: [<f8b7764a>] txLog+0xb1/0x10f [jfs]
Oct 5 03:25:44 alexandria kernel: [<f8b774ab>] txCommit+0x1fa/0x2e8 [jfs]
Oct 5 03:25:44 alexandria kernel: [<f8b58d84>] jfs_commit_inode+0x109/0x11c [jfs]
Oct 5 03:25:44 alexandria kernel: [<f8b71efe>] extAlloc+0x3ae/0x471 [jfs]
Oct 5 03:25:44 alexandria kernel: [<f8b59198>] jfs_get_blocks+0x274/0x2cd [jfs]
Oct 5 03:25:44 alexandria kernel: [<f8b59211>] jfs_get_block+0x20/0x25 [jfs]
Oct 5 03:25:44 alexandria kernel: [<c0167cb8>] nobh_prepare_write+0x13d/0x3f4
Oct 5 03:25:44 alexandria kernel: [<c014a22a>] __alloc_pages+0xfe/0x44e
Oct 5 03:25:44 alexandria kernel: [<c0145b01>] add_to_page_cache+0x4e/0xaf
Oct 5 03:25:44 alexandria kernel: [<f8b5924d>] jfs_prepare_write+0x0/0x15 [jfs]
Oct 5 03:25:44 alexandria kernel: [<c0147ada>] generic_file_buffered_write+0x298/0x642
Oct 5 03:25:44 alexandria kernel: [<f8b591f1>] jfs_get_block+0x0/0x25 [jfs]
Oct 5 03:25:44 alexandria kernel: [<c0126089>] current_fs_time+0x5a/0x75
Oct 5 03:25:44 alexandria kernel: [<c017d3a4>] inode_update_time+0x2d/0x9b
Oct 5 03:25:44 alexandria kernel: [<c0148128>] __generic_file_aio_write_nolock+0x2a4/0x4d2
Oct 5 03:25:44 alexandria kernel: [<c02b35f4>] sock_common_recvmsg+0x41/0x57
Oct 5 03:25:44 alexandria kernel: [<c0148471>] __generic_file_write_nolock+0x89/0xa3
Oct 5 03:25:44 alexandria kernel: [<f9302813>] svc_expkey_lookup+0x371/0x3ef [nfsd]
Oct 5 03:25:44 alexandria kernel: [<c01347c2>] autoremove_wake_function+0x0/0x37
Oct 5 03:25:44 alexandria kernel: [<c01487be>] generic_file_writev+0x49/0xb3
Oct 5 03:25:44 alexandria kernel: [<c0148775>] generic_file_writev+0x0/0xb3
Oct 5 03:25:44 alexandria kernel: [<c01647c0>] do_readv_writev+0x1f4/0x271
Oct 5 03:25:44 alexandria kernel: [<c014860d>] generic_file_write+0x0/0xc5
Oct 5 03:25:44 alexandria kernel: [<f8b58ae9>] jfs_open+0xd/0x87 [jfs]
Oct 5 03:25:44 alexandria kernel: [<c01635dc>] dentry_open+0x16f/0x1e8
Oct 5 03:25:44 alexandria kernel: [<c01648cb>] vfs_writev+0x3d/0x53
Oct 5 03:25:44 alexandria kernel: [<f92ff9db>] nfsd_write+0x31a/0x72c [nfsd]
Oct 5 03:25:44 alexandria kernel: [<c031657b>] schedule+0x53b/0xb8e
Oct 5 03:25:44 alexandria kernel: [<c016cc09>] vfs_getattr+0x52/0xa2
Oct 5 03:25:44 alexandria kernel: [<f9308b94>] nfs3svc_decode_writeargs+0x0/0x17d [nfsd]
Oct 5 03:25:44 alexandria kernel: [<f930712c>] nfsd3_proc_write+0xf9/0x121 [nfsd]
Oct 5 03:25:44 alexandria kernel: [<f9308b94>] nfs3svc_decode_writeargs+0x0/0x17d [nfsd]
Oct 5 03:25:44 alexandria kernel: [<f92fb5e4>] nfsd_dispatch+0x76/0x1c2 [nfsd]
Oct 5 03:25:44 alexandria kernel: [<f924d047>] svc_authenticate+0x97/0xae [sunrpc]
Oct 5 03:25:44 alexandria kernel: [<f924a7c3>] svc_process+0x3b4/0x671 [sunrpc]
Oct 5 03:25:44 alexandria kernel: [<f92fb3ab>] nfsd+0x184/0x347 [nfsd]
Oct 5 03:25:44 alexandria kernel: [<f92fb227>] nfsd+0x0/0x347 [nfsd]
Oct 5 03:25:44 alexandria kernel: [<c0101ca1>] kernel_thread_helper+0x5/0xb
Oct 5 03:25:44 alexandria kernel: =======================
Oct 5 03:25:44 alexandria kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000001
Oct 5 03:25:44 alexandria kernel: printing eip:
Oct 5 03:25:44 alexandria kernel: c010477d
Oct 5 03:25:44 alexandria kernel: *pde = 01b42001
kernel-2.6.12-1.1380_FC3 has been pushed for FC3, which should resolve this issue. If these problems are still present in this version, please make a note of it in this bug report.
Is there a fix for the FC4 kernel?
This kernel hangs (without crashing) during boot while trying to remount the root file system.
Upgrading on FC3 from 1378 to 1380 and rebooting hangs the machine during or immediately after the file system check on the MD device. Rolling back to 1378 allows the machine to boot properly.
1381 fixes the hang, but it does so by reverting the change that I was hoping would fix this bug.
Yes, I can confirm that 1381smp still has the bug. I don't have a serial console, so I can't provide screen dumps, but I also get a stack overflow. I have an SMP dual-Xeon system built on a Supermicro X5DAL-G, with two 3ware 7506-4 RAID cards in JBOD mode supporting eight 400GB drives in JFS-on-LVM-on-software-RAID. The system normally runs fine as a media file server, but when I stress it (e.g., copying a file to it over NFS, as opposed to pulling from it) there's a fair chance it will randomly hang. Right now I'm using 2.6.9-1.667smp; reading the above gives me hope it will do until the bug gets fixed.
I can now confirm that kernel-smp-2.6.9-1.667 also has the issue. Identical symptoms: high load (BitTorrent and a RAID 5 rebuild, with a large diff-over-NFS acting as the feather that broke the camel's back) leads to a stack overflow on an otherwise stable system. Can someone confirm that switching to Fedora Core 4 or RHEL gets rid of this issue?
Every release of every distro has this problem right now. It's still being worked out upstream (a patch appeared this afternoon which may solve the issue). I'll build an FC3 kernel with it soon for testing.
Using kernel 1381, I was working with a mail file over 1.3MB and got this error:

fseek: Invalid argument
panic: temporary file seek
Aborted

After rolling back to 1378, the problem did not occur.
(In reply to comment #10)
> every release of every distro has this problem right now. It's still being
> worked out upstream, (A patch appeared this afternoon which may solve the issue).
> I'll build an FC3 kernel with it soon for testing.

Glad to hear it. Meanwhile, I've rolled 1381 from source with the 4K stack turned off; hopefully 8K will be enough to keep the system stable. If not, I'll try a 1381-with-16K-stack-patch variant that Linuxant has made available.
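Why turning off 4K stacks can help: to the best of my understanding, the "do_IRQ: stack overflow: N" message comes from a debugging check (CONFIG_DEBUG_STACKOVERFLOW) in the i386 do_IRQ. It masks the stack pointer with THREAD_SIZE - 1 and warns when fewer than STACK_WARN = THREAD_SIZE/8 bytes remain above struct thread_info, printing that remaining byte count. The values reported in this bug (452, 496, 500) are all below 512, the threshold for a 4K stack. The Python model below is a sketch under those assumptions, not kernel code; THREAD_INFO_SIZE and STACK_BASE are hypothetical values. It shows that stack usage that leaves 452 bytes free on a 4K stack leaves 4548 bytes free on an 8K stack, well above that stack's 1024-byte warning threshold.

```python
# Simplified model of the i386 CONFIG_DEBUG_STACKOVERFLOW check in do_IRQ.
# Assumptions: the warning threshold is STACK_WARN = THREAD_SIZE // 8, and
# struct thread_info sits at the base of the THREAD_SIZE-aligned stack.
from typing import Optional

THREAD_INFO_SIZE = 64     # hypothetical; the real size depends on the build
STACK_BASE = 0xF45B0000   # hypothetical stack base, aligned to 64 KiB


def free_bytes(esp: int, thread_size: int) -> int:
    """Bytes left between the stack pointer and the top of thread_info."""
    # Masking esp gives its offset from the aligned stack base; the stack
    # grows downward toward thread_info.
    return (esp & (thread_size - 1)) - THREAD_INFO_SIZE


def overflow_warning(esp: int, thread_size: int) -> Optional[str]:
    """Message the check would print, or None if there is enough headroom."""
    remaining = free_bytes(esp, thread_size)
    if remaining < thread_size // 8:
        return f"do_IRQ: stack overflow: {remaining}"
    return None


def esp_after(used: int, thread_size: int) -> int:
    """Stack pointer after `used` bytes were pushed (0 < used < thread_size)."""
    return STACK_BASE + thread_size - used


# Stack usage that leaves 452 bytes free on a 4K stack (first report above).
used = 4096 - THREAD_INFO_SIZE - 452

print(overflow_warning(esp_after(used, 4096), 4096))  # do_IRQ: stack overflow: 452
print(overflow_warning(esp_after(used, 8192), 8192))  # None
```

The model only captures headroom, not correctness: doubling the stack hides the overflow for this call depth but does not reduce how much stack the stacked filesystem, LVM and RAID layers consume.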
Any ETA on a fix? FC3 will be moved to legacy shortly and I would love to upgrade to FC4 or FC5, but am unwilling to move from my current working kernel.
As mentioned above, I've been running 1381 with the 4K stack turned off for the past six weeks (never had to try the Conexant kernels, but I'm sure they'd do just as well) and I am happy to report that it has worked out well; no more stack overflows! Yay!
This is a mass-update to all currently open Fedora Core 3 kernel bugs. Fedora Core 3 support has transitioned to the Fedora Legacy project. Due to the limited resources of this project, typically only updates for new security issues are released. As this bug isn't security related, it has been migrated to a Fedora Core 4 bug. Please upgrade to this newer release, and test if this bug is still present there. This bug has been placed in NEEDINFO_REPORTER state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug. Thank you.
This is a mass-update to all currently open kernel bugs. A new kernel update has been released (Version: 2.6.15-1.1830_FC4) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem. This bug has been placed in NEEDINFO_REPORTER state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug. If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613. Thank you.
Received the mass email regarding the potential closing of this bug. Unfortunately, due to the current requirements of my server, I am unable to upgrade from FC3 and am not willing to change from my kernel 2.6.10-1.770_FC3. I don't see any specific information indicating that this problem has been fixed. Hopefully I will be able to upgrade in about a month's time (when the server isn't required to be 100% available) and will then be able to test any new kernels.
The important line from the changelog regarding this bug is:

- Reduce block layer stack usage.

Let me know how it works out when you get a chance.
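The traces in this bug show each stacked layer (filesystem, device-mapper, md raid5, the disk driver) passing a request down through another nested generic_make_request call, so stack usage grows with every stacked device. One general way to bound that growth is to make submission iterative: if a submission is already in progress on the current task, new requests are queued, and the outermost call drains the queue. Later upstream kernels use a per-task list along these lines in generic_make_request, but the Python sketch below is only a simplified model of the idea. It is not kernel code and not necessarily what the Fedora "Reduce block layer stack usage" change does; LAYERS, FANOUT and Submitter are hypothetical.

```python
# Simplified model of recursive vs. iterative submission through stacked
# block devices. Not kernel code: LAYERS, FANOUT and Submitter are hypothetical.
from collections import deque
from typing import Deque, List, Optional

# Top to bottom, loosely following the traces above: dm (LVM) -> md raid5 -> disk.
LAYERS = ["dm", "raid5", "disk"]
# How many requests each layer issues to the layer below (hypothetical).
FANOUT = {"dm": 1, "raid5": 2, "disk": 0}


class Submitter:
    def __init__(self, iterative: bool) -> None:
        self.iterative = iterative
        self.depth = 0                # current nesting of generic_make_request
        self.max_depth = 0            # deepest nesting seen (proxy for stack use)
        self.pending: Optional[Deque[int]] = None  # per-"task" queue (iterative only)
        self.handled: List[str] = []  # order in which layers processed requests

    def make_request(self, layer: int) -> None:
        """Stand-in for a driver's make_request: remap and pass requests down."""
        name = LAYERS[layer]
        self.handled.append(name)
        for _ in range(FANOUT[name]):
            self.generic_make_request(layer + 1)

    def generic_make_request(self, layer: int) -> None:
        if self.iterative and self.pending is not None:
            # A submission is already active on this "task": queue, don't recurse.
            self.pending.append(layer)
            return
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        if self.iterative:
            self.pending = deque([layer])
            while self.pending:
                self.make_request(self.pending.popleft())
            self.pending = None
        else:
            # Each layer calls straight into the next one: nesting grows per layer.
            self.make_request(layer)
        self.depth -= 1


recursive = Submitter(iterative=False)
recursive.generic_make_request(0)
iterative = Submitter(iterative=True)
iterative.generic_make_request(0)

print(recursive.max_depth, iterative.max_depth)  # 3 1
print(recursive.handled == iterative.handled)    # True
```

In the recursive model max_depth grows by one for every stacked layer, while the iterative model stays at 1 regardless of how many layers are added. In this particular example both process the same requests in the same order; with other fan-outs the iterative version processes them breadth-first instead of depth-first.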
OK, I finally upgraded from FC3 to FC4, and my testing of the 2.6.15-1.2054_FC5 kernel is very positive. I have been putting a heavy IO load on the box (I copied off all my data, added a disk to my software RAID array, rebuilt the server and copied all ~400GB of data back on) and it hasn't crashed. This would certainly have caused the FC3 kernels a lot of trouble. The bug appears squashed. Thanks.
[This comment added as part of a mass-update to all open FC4 kernel bugs] FC4 has now transitioned to the Fedora legacy project, which will continue to release security related updates for the kernel. As this bug is not security related, it is unlikely to be fixed in an update for FC4, and has been migrated to FC5. Please retest with Fedora Core 5. Thank you.
I can confirm this bug exists in the FC5 kernel 2.6.15-1.2054_FC5smp. It still occurs under high disk IO load. I'm using a dual Xeon 2.8GHz with an Areca ARC 1120 SATA RAID controller.
2.6.15 is pretty ancient now, try with the 2.6.18 update that went out today.
A new kernel update has been released (Version: 2.6.18-1.2200.fc5) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem. This bug has been placed in NEEDINFO state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug. In the last few updates, some users upgrading from FC4->FC5 have reported that installing a kernel update has left their systems unbootable. If you have been affected by this problem please check you only have one version of device-mapper & lvm2 installed. See bug 207474 for further details. If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613. If this bug has been fixed, but you are now experiencing a different problem, please file a separate bug for the new problem. Thank you.
Looks like this bug is alive and well in all kernels up to and including 2.6.18-1.2239.fc5smp. I've been experiencing crashes under heavy I/O on my MythTV backend box whenever it records two HD shows and one standard-definition show at the same time. This has been going on since ~2.6.16, but I was finally able to get something to dump to netconsole last night. Here's what I got from the first crash:

Nov 12 20:26:37 chef BUG: sleeping function called from invalid context at kernel/sched.c:4509
Nov 12 20:26:37 chef in_atomic():1, irqs_disabled():0
Nov 12 20:26:37 chef [<c04050ef>] dump_trace+0x69/0x1af
Nov 12 20:26:37 chef [<c040524d>] show_trace_log_lvl+0x18/0x2c
Nov 12 20:26:37 chef [<c0405800>] show_trace+0xf/0x11
Nov 12 20:26:37 chef [<c04058fa>] dump_stack+0x15/0x17
Nov 12 20:26:37 chef [<c0420c06>] __cond_resched+0x12/0x3c
Nov 12 20:26:37 chef [<c060e4bc>] cond_resched+0x2a/0x31
Nov 12 20:26:37 chef BUG: unable to handle kernel paging request at virtual address ffa3d283
Nov 12 20:26:37 chef printing eip:
Nov 12 20:45:57 chef do_IRQ: stack overflow: 500

and from the second crash:

Nov 13 21:56:46 chef do_IRQ: stack overflow: 500
Nov 13 21:56:46 chef [<c04050ef>]
Nov 13 21:56:46 chef dump_trace+0x69/0x1af

I don't have a free serial port, so I can't use a serial console, and this seems to be all I can get from netconsole. Is there anything else I can do to get a good dump?
Adding PCI and module info:

[root@chef ~]# lsmod
Module                  Size  Used by
netconsole              7649  0
nfsd                  221169  17
exportfs               10177  1 nfsd
lockd                  66505  2 nfsd
nfs_acl                 8001  1 nfsd
autofs4                25669  1
sunrpc                158589  12 nfsd,lockd,nfs_acl
ext3                  136137  1
jbd                    63593  1 ext3
raid1                  27201  1
video                  21317  0
sbs                    20353  0
i2c_ec                  9537  1 sbs
button                 11217  0
battery                14533  0
asus_acpi              20825  0
ac                      9669  0
ipv6                  267361  26
lp                     17161  0
parport_pc             31461  1
parport                41097  2 lp,parport_pc
wm8775                 10317  0
cx25840                28113  0
lirc_atiusb            19360  1
lirc_dev               17044  1 lirc_atiusb
cx88_blackbird         22853  0
tuner                  63221  0
cx88_dvb               19941  1
nvidia               4537876  20
cx8800                 38349  1 cx88_blackbird
cx8802                 17221  2 cx88_blackbird,cx88_dvb
snd_hda_intel          20760  0
cx88xx                 65637  4 cx88_blackbird,cx88_dvb,cx8800,cx8802
snd_hda_codec         163328  1 snd_hda_intel
cx88_vp3054_i2c         8897  1 cx88_dvb
ivtv                  170128  0
snd_seq_dummy           7428  0
b2c2_flexcop_pci       13145  0
b2c2_flexcop           32469  1 b2c2_flexcop_pci
ir_common              32325  1 cx88xx
snd_seq_oss            36736  0
or51132                14277  1 cx88_dvb
video_buf_dvb          11077  1 cx88_dvb
mt352                  10949  2 cx88_dvb,b2c2_flexcop
compat_ioctl32          5697  1 cx8800
i2c_algo_bit           13001  3 cx88xx,cx88_vp3054_i2c,ivtv
snd_seq_midi_event     11136  1 snd_seq_oss
mt312                  12356  1 b2c2_flexcop
cx2341x                15429  2 cx88_blackbird,ivtv
snd_seq                54128  5 snd_seq_dummy,snd_seq_oss,snd_seq_midi_event
video_buf              29253  6 cx88_blackbird,cx88_dvb,cx8800,cx8802,cx88xx,video_buf_dvb
btcx_risc               9289  3 cx8800,cx8802,cx88xx
bcm3510                14021  1 b2c2_flexcop
snd_seq_device         11788  3 snd_seq_dummy,snd_seq_oss,snd_seq
dvb_pll                18885  2 cx88_dvb,b2c2_flexcop
tveeprom               18769  2 cx88xx,ivtv
snd_pcm_oss            44416  0
stv0299                14921  1 b2c2_flexcop
isl6421                 6721  1 cx88_dvb
zl10353                 9797  1 cx88_dvb
dvb_core               83689  3 b2c2_flexcop,video_buf_dvb,stv0299
stv0297                12361  1 b2c2_flexcop
videodev               27201  4 cx88_blackbird,cx8800,cx88xx,ivtv
snd_mixer_oss          19840  1 snd_pcm_oss
cx24123                16329  1 cx88_dvb
snd_pcm                77956  3 snd_hda_intel,snd_hda_codec,snd_pcm_oss
nxt200x                17733  2 cx88_dvb,b2c2_flexcop
cx22702                10565  1 cx88_dvb
lgdt330x               12509  2 cx88_dvb,b2c2_flexcop
v4l1_compat            16581  3 cx8800,ivtv,videodev
snd_timer              23684  2 snd_seq,snd_pcm
v4l2_common            26433  7 cx25840,cx88_blackbird,tuner,cx8800,ivtv,cx2341x,videodev
snd                    53380  10 snd_hda_intel,snd_hda_codec,snd_seq_dummy,snd_seq_oss,snd_seq,snd_seq_device,snd_pcm_oss,snd_mixer_oss,snd_pcm,snd_timer
r8169                  34761  0
ide_cd                 42593  2
sg                     38621  0
uhci_hcd               28109  0
serio_raw              11461  0
ehci_hcd               36173  0
soundcore              14241  1 snd
i2c_i801               11981  0
i2c_core               25793  25 i2c_ec,wm8775,cx25840,tuner,cx88_dvb,nvidia,cx88xx,ivtv,b2c2_flexcop,or51132,mt352,i2c_algo_bit,mt312,bcm3510,dvb_pll,tveeprom,stv0299,isl6421,zl10353,stv0297,cx24123,nxt200x,cx22702,lgdt330x,i2c_i801
snd_page_alloc         12168  2 snd_hda_intel,snd_pcm
cdrom                  38881  1 ide_cd
pcspkr                  7489  0
dm_snapshot            21357  0
dm_zero                 6337  0
dm_mirror              32913  0
dm_mod                 61273  16 dm_snapshot,dm_zero,dm_mirror
raid0                  12225  1
xfs                   526853  2
ata_piix               18121  4
sata_sil               15945  0
libata                103001  2 ata_piix,sata_sil
sd_mod                 24897  16
scsi_mod              138601  3 sg,libata,sd_mod

[root@chef ~]# lspci
00:00.0 Host bridge: Intel Corporation 945G/P Memory Controller Hub (rev 81)
00:01.0 PCI bridge: Intel Corporation 945G/P PCI Express Graphics Port (rev 81)
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 (rev 01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01)
00:1f.2 IDE interface: Intel Corporation 82801GB/GR/GH (ICH7 Family) Serial ATA Storage Controllers cc=IDE (rev 01)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
01:00.0 VGA compatible controller: nVidia Corporation GeForce 6200 TurboCache(TM) (rev a1)
02:01.0 Multimedia video controller: Conexant CX23880/1/2/3 PCI Video and Audio Decoder (rev 05)
02:01.2 Multimedia controller: Conexant CX23880/1/2/3 PCI Video and Audio Decoder [MPEG Port] (rev 05)
02:02.0 Multimedia video controller: Internext Compression Inc iTVC16 (CX23416) MPEG-2 Encoder (rev 01)
02:03.0 Network controller: Techsan Electronics Co Ltd B2C2 FlexCopII DVB chip / Technisat SkyStar2 DVB card (rev 02)
02:05.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
You have a mixture of proprietary and out-of-tree modules loaded that complicate the situation. There's nothing that can be fixed in the Fedora kernel related to these, and we can't rule out whether they're involved. In the absence of follow-up from the original reporter, I'm closing this out. If you can reproduce it on a current kernel using just Fedora kernel modules, please open a new bug.
As the original reporter of the bug, I have not had a problem with recent kernels. I now have an even greater number of disks attached and hit the server harder than when I reported the bug, and it appears stable. That doesn't mean, though, that there isn't a bug :)