Bug 217858

Summary: repeated stack overflow on fc5 2.6.18-based kernels
Product: [Fedora] Fedora
Reporter: Jeff Layton <jlayton>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED CURRENTRELEASE
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 5
CC: rhbz001, steved, wtogami
Hardware: All
OS: Linux
Fixed In Version: 2.6.20-1.2933.fc6xen
Doc Type: Bug Fix
Last Closed: 2007-04-04 19:07:03 UTC

Description Jeff Layton 2006-11-30 14:01:11 UTC
I have a machine at home that has been happily running FC5 since close to its
release. Since the kernel was upgraded to 2.6.18, I seem to be getting stack
overflow messages and crashes within about 24 hours of uptime. Here's my most
recent one:

do_IRQ: stack overflow: 488

 [<c0403f10>] dump_trace+0x69/0x1af
 [<c040406e>] show_trace_log_lvl+0x18/0x2c
 [<c04045e9>] show_trace+0xf/0x11
 [<c0404673>] dump_stack+0x15/0x17
 [<c040533c>] do_IRQ+0x69/0xb8
 [<c04037ca>] common_interrupt+0x1a/0x20
DWARF2 unwinder stuck at common_interrupt+0x1a/0x20
Leftover inexact backtrace:
 [<c045d75c>] kmem_cache_alloc+0x47/0x4f
 [<c0448c4a>] mempool_alloc+0x37/0xd3
 [<c04d74e0>] __delay+0x6/0x7
 [<f88c087e>] ata_tf_load+0x1cc/0x237 [libata]
 [<c04d1f3d>] cfq_set_request+0x21f/0x343
 [<f88bcd88>] ata_qc_issue_prot+0xf0/0x238 [libata]
 [<c04d1d1e>] cfq_set_request+0x0/0x343
 [<c04c6e1b>] elv_set_request+0x1e/0x2d
 [<c04c970a>] get_request+0x161/0x31a
 [<c04d1479>] cfq_insert_request+0x42/0x4df
 [<c04c9ec7>] get_request_wait+0x1b/0x15e
 [<c04c769c>] elv_insert+0x12d/0x1d6
 [<c04ca0fa>] blk_plug_device+0x6e/0xb5
 [<c04cb6b1>] __make_request+0x2e2/0x389
 [<c04037ca>] common_interrupt+0x1a/0x20
 [<c04c908f>] generic_make_request+0x29b/0x2ab
 [<c04c6d87>] elv_merged_request+0x16/0x1c
 [<c05fbc9b>] _spin_unlock_irq+0x5/0x7
 [<c04cb72f>] __make_request+0x360/0x389
 [<c0465c2e>] __bio_clone+0x6f/0x8a
 [<f889af99>] make_request+0x15c/0x510 [raid1]
 [<c04d1373>] cfq_add_crq_rb+0xba/0xc3
 [<c04d1479>] cfq_insert_request+0x42/0x4df
 [<c04c908f>] generic_make_request+0x29b/0x2ab
 [<c04c769c>] elv_insert+0x12d/0x1d6
 [<c04037ca>] common_interrupt+0x1a/0x20
 [<f88e482d>] linear_map+0xd/0x2b [dm_mod]
 [<f88e2425>] __map_bio+0xd0/0xfd [dm_mod]
 [<c0465c2e>] __bio_clone+0x6f/0x8a
 [<f88e21ff>] clone_bio+0x24/0x4d [dm_mod]
 [<f88e2c62>] __split_bio+0x17f/0x41d [dm_mod]
 [<c04ca0fa>] blk_plug_device+0x6e/0xb5
 [<c040537f>] do_IRQ+0xac/0xb8
 [<f88e347a>] dm_request+0xb0/0xbd [dm_mod]
 [<c04c908f>] generic_make_request+0x29b/0x2ab
 [<c040537f>] do_IRQ+0xac/0xb8
 [<c05fbca2>] _read_unlock_irq+0x5/0x7
 [<c0462893>] __find_get_block+0xd1/0x13c
 [<c04628d3>] __find_get_block+0x111/0x13c
 [<c04cad7c>] submit_bio+0xae/0xb5
 [<c0448c4a>] mempool_alloc+0x37/0xd3
 [<c04629e7>] __getblk+0xe9/0x2a1
 [<c0465352>] bio_alloc_bioset+0x9b/0xf3
 [<c04622e0>] submit_bh+0xe1/0xff
 [<c0464099>] __bread+0x67/0xa3
 [<f89091bb>] read_block_bitmap+0x2f/0x61 [ext3]
 [<f890a0f7>] ext3_new_blocks+0x23f/0x5e4 [ext3]
 [<f890c5ae>] ext3_mark_iloc_dirty+0x2ea/0x345 [ext3]
 [<f890d0a4>] ext3_get_blocks_handle+0x3bc/0x938 [ext3]
 [<f88d2c12>] do_get_write_access+0x4d3/0x500 [jbd]
 [<f890d967>] ext3_get_block+0xbd/0xd3 [ext3]
 [<c046347f>] __block_prepare_write+0x1b4/0x450
 [<f88d31aa>] journal_start+0xb9/0xea [jbd]
 [<c046373d>] block_prepare_write+0x22/0x2f
 [<f890d8aa>] ext3_get_block+0x0/0xd3 [ext3]
 [<f890ee02>] ext3_prepare_write+0x96/0x155 [ext3]
 [<f890d8aa>] ext3_get_block+0x0/0xd3 [ext3]
 [<f890ed6c>] ext3_prepare_write+0x0/0x155 [ext3]
 [<c0447545>] generic_file_buffered_write+0x246/0x5eb
 [<c041fe40>] current_fs_time+0x45/0x51
 [<c0447c7c>] __generic_file_aio_write_nolock+0x392/0x3dc
 [<c0447e0c>] __generic_file_write_nolock+0x8b/0x9e
 [<c04760cc>] iput+0x3d/0x66
 [<c042c7dc>] autoremove_wake_function+0x0/0x35
 [<c05fade6>] mutex_lock+0x1a/0x27
 [<c0447e5a>] generic_file_writev+0x3b/0xa2
 [<c0447e1f>] generic_file_writev+0x0/0xa2
 [<c046089a>] do_sync_write+0x0/0xfb
 [<c0461033>] do_readv_writev+0x16e/0x287
 [<c0461189>] vfs_writev+0x3d/0x48
 [<f8ecd3f6>] nfsd_vfs_write+0xd1/0x2ad [nfsd]
 [<c04037ca>] common_interrupt+0x1a/0x20
 [<f8ecdba4>] nfsd_write+0x6f/0xe5 [nfsd]
 [<f8ed920c>] nfsd4_proc_compound+0x14b0/0x165c [nfsd]
 [<c04d7cf2>] copy_to_user+0x40/0x56
 [<c05d5b28>] tcp_v4_do_rcv+0x28/0x2ae
 [<c05a32b4>] skb_copy_datagram_iovec+0x53/0x1dc
 [<c04037ca>] common_interrupt+0x1a/0x20
 [<c059e88f>] release_sock+0x63/0xa3
 [<c04201ce>] local_bh_enable_ip+0x25/0x30
 [<c05ca94e>] tcp_recvmsg+0x88e/0x996
 [<c05e08ec>] inet_sendmsg+0x3b/0x45
 [<c059e209>] sock_common_recvmsg+0x3e/0x54
 [<c059c1ab>] sock_recvmsg+0xef/0x10a
 [<c042c7dc>] autoremove_wake_function+0x0/0x35
 [<c05fbc9b>] _spin_unlock_irq+0x5/0x7
 [<c0418400>] enqueue_task+0x29/0x39
 [<c041851a>] __activate_task+0x1c/0x29
 [<c04189e5>] try_to_wake_up+0xdf/0xea
 [<c0417e83>] __wake_up_common+0x2f/0x53
 [<c04182a8>] __wake_up+0x2a/0x3d
 [<f8e045e5>] svc_sock_enqueue+0x1ee/0x230 [sunrpc]
 [<c05fbc03>] _spin_unlock_bh+0x5/0xd
 [<f8e05bef>] svc_tcp_recvfrom+0x709/0x77a [sunrpc]
 [<c040537f>] do_IRQ+0xac/0xb8
 [<c04037ca>] common_interrupt+0x1a/0x20
 [<f8edb0de>] nfsd4_decode_compound+0x326/0xdb8 [nfsd]
 [<f8e09567>] sunrpc_cache_lookup+0x4b/0xf9 [sunrpc]
 [<f8eca0d5>] nfsd_dispatch+0xc5/0x180 [nfsd]
 [<f8e03b32>] svc_process+0x3bd/0x62f [sunrpc]
 [<c04037ca>] common_interrupt+0x1a/0x20
 [<f8eca5cd>] nfsd+0x197/0x2de [nfsd]
 [<f8eca436>] nfsd+0x0/0x2de [nfsd]
 [<c0403ac7>] kernel_thread_helper+0x7/0x10
BUG: unable to handle kernel paging request at virtual address 3178302f
 printing eip:
c0404001
*pde = 00000000
Oops: 0000 [#1]
last sysfs file: /block/sda/sda1/size
Modules linked in: md5 nfsd exportfs lockd nfs_acl ipv6 autofs4 hidp l2cap
bluetooth rpcsec_gss_krb5 auth_rpcgss des sunrpc loop video sbs i2c_ec container
button usblp battery asus_acpi ac usb_storage lp parport_pc parport ohci1394
ehci_hcd ieee1394 uhci_hcd st floppy snd_via82xx gameport snd_ac97_codec
snd_ac97_bus snd_seq_dummy snd_seq_oss sg snd_seq_midi_event snd_seq snd_pcm_oss
serio_raw snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_mpu401_uart
snd_rawmidi snd_seq_device i2c_viapro snd pcspkr i2c_core soundcore ide_cd cdrom
r8169 dm_snapshot dm_zero dm_mirror dm_mod raid1 ext3 jbd sata_via libata
sym53c8xx scsi_transport_spi sd_mod scsi_mod
CPU:    0
EIP:    0060:[<c0404001>]    Not tainted VLI
EFLAGS: 00010083   (2.6.18-1.2239.fc5 #1) 
EIP is at dump_trace+0x15a/0x1af
eax: 31783ffd   ebx: 00000000   ecx: 3178302f   edx: c063b1fe
esi: 3178302f   edi: 31783000   ebp: c061dd25   esp: f5c50198
ds: 007b   es: 007b   ss: 0068
BUG: unable to handle kernel NULL pointer dereference at virtual address 0000010a
 printing eip:
c05fc9d1
*pde = 52dca067
Oops: 0000 [#2]
last sysfs file: /block/sda/sda1/size
Modules linked in: md5 nfsd exportfs lockd nfs_acl ipv6 autofs4 hidp l2cap
bluetooth rpcsec_gss_krb5 auth_rpcgss des sunrpc loop video sbs i2c_ec container
button usblp battery asus_acpi ac usb_storage lp parport_pc parport ohci1394
ehci_hcd ieee1394 uhci_hcd st floppy snd_via82xx gameport snd_ac97_codec
snd_ac97_bus snd_seq_dummy snd_seq_oss sg snd_seq_midi_event snd_seq snd_pcm_oss
serio_raw snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_mpu401_uart
snd_rawmidi snd_seq_device i2c_viapro snd pcspkr i2c_core soundcore ide_cd cdrom
r8169 dm_snapshot dm_zero dm_mirror dm_mod raid1 ext3 jbd sata_via libata
sym53c8xx scsi_transport_spi sd_mod scsi_mod
CPU:    0
EIP:    0060:[<c05fc9d1>]    Not tainted VLI
EFLAGS: 00010046   (2.6.18-1.2239.fc5 #1) 
EIP is at do_page_fault+0x11f/0x4db
eax: f5c50070   ebx: 00000002   ecx: f5c50048   edx: 0000000d
esi: 0000008a   edi: 00000086   ebp: 00000000   esp: f5c50030
ds: 007b   es: 007b   ss: 0068
Process psb_ho[unprintable bytes] (pid: -292779072, ti=f5c4f000 task=e452f040 task.ti=f5c4f000)
Stack: 00000022 00000086 f5c50070 c0768dec 00000086 00000000 f5c50070 c061b47d 
       00000000 0000000e 0000000b 00000002 00010083 c05fc8b2 f5c50198 c04038a1 
       00000002 f5c50000 00000086 00010083 f5c50164 f5c50198 00000022 0000007b 
Call Trace:


...at which point I hit the reset switch on the box. The machine is very stable
on 2.6.17 kernels.

I've seen these crashes on all current 2.6.18-based Fedora kernels (regular and
xen0). The one above was from:

kernel-2.6.18-1.2239.fc5

...by contrast, I've had weeks of uptime on this 2.6.17-based kernel:

kernel-xen0-2.6.17-1.2187_FC5
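
For anyone triaging a trace like the one above, a rough way to see which layers are stacked on top of each other is to count frames per module. The following is only a sketch: the regular expression, the `frames_by_module` helper, and the shortened `SAMPLE` excerpt are illustrative and not part of any kernel tool. Because the "Leftover inexact backtrace" also lists stale return addresses left on the stack, the counts are a hint about call depth, not a measurement of stack usage.

```python
import re
from collections import Counter

# Matches oops frames such as:
#   " [<f88c087e>] ata_tf_load+0x1cc/0x237 [libata]"
# The trailing "[module]" is absent for symbols built into the core kernel.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)


def frames_by_module(trace_text):
    """Count backtrace frames per module (hypothetical helper)."""
    counts = Counter()
    for line in trace_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            counts[match.group("mod") or "core kernel"] += 1
    return counts


# A few frames copied from the trace in this report, one per layer.
SAMPLE = """\
 [<c0448c4a>] mempool_alloc+0x37/0xd3
 [<f88c087e>] ata_tf_load+0x1cc/0x237 [libata]
 [<f889af99>] make_request+0x15c/0x510 [raid1]
 [<f88e2c62>] __split_bio+0x17f/0x41d [dm_mod]
 [<f890a0f7>] ext3_new_blocks+0x23f/0x5e4 [ext3]
 [<f8ecd3f6>] nfsd_vfs_write+0xd1/0x2ad [nfsd]
 [<f8e03b32>] svc_process+0x3bd/0x62f [sunrpc]
"""

if __name__ == "__main__":
    for module, n in frames_by_module(SAMPLE).most_common():
        print(f"{module}: {n}")
```

Run against the full trace, a tally like this makes the nesting visible: sunrpc and nfsd at the bottom, then ext3/jbd, dm_mod, raid1, and libata on top, all in a single kernel thread's stack.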

Comment 1 Jeff Layton 2007-04-04 19:07:03 UTC
I've since updated to FC6 and haven't seen this since.