Bug 513350
Summary: | Kernel BUG at net/core/skbuff.c:94 running connectathon on PriorityHardware | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Jan Tluka <jtluka> |
Component: | kernel | Assignee: | Andy Gospodarek <agospoda> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | ||
Version: | 5.4 | CC: | dfeng, dhoward, nhorman, peterm, rnickel, tgraf |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Fixed In Version: | Doc Type: | Bug Fix | |
Last Closed: | 2011-08-29 13:52:09 UTC | Type: | --- |
Bug Blocks: | 525215, 533192 |
Description
Jan Tluka
2009-07-23 08:42:51 UTC
From this error it looks like the sk_buff is only 1662 bytes long rather than the expected 2 KB. Could this be because e1000_alloc_rx_buffers, when it finds an skb, just presumes it to be good regardless of its size? That of course wouldn't completely explain why we are seeing a buffer that is too small, but it could be a start.

Just as an update: someone installed 2.6.30-rc4 on this box a while ago and it fails the connectathon test. I installed the latest Intel SourceForge driver and it does as well. So either e1000e is broken in a lot of places or this really isn't an e1000e problem.

-157.el5 with e1000e v1.0.2.5 from sf.net. Almost identical behaviour to upstream, iirc.

nfs: server sol10-nfs not responding, still trying
0000:00:19.0: eth0: Detected Tx Unit Hang: TDH <ea> TDT <ed> next_to_use <ed> next_to_clean <e9>
buffer_info[next_to_clean]: time_stamp <10010f2f4> next_to_watch <ea> jiffies <10010fe03> next_to_watch.status <0>
0000:00:19.0: eth0: Detected Tx Unit Hang: TDH <ea> TDT <ed> next_to_use <ed> next_to_clean <e9>
buffer_info[next_to_clean]: time_stamp <10010f2f4> next_to_watch <ea> jiffies <1001105d4> next_to_watch.status <0>
0000:00:19.0: eth0: Detected Tx Unit Hang: TDH <ea> TDT <ed> next_to_use <ed> next_to_clean <e9>
buffer_info[next_to_clean]: time_stamp <10010f2f4> next_to_watch <ea> jiffies <100110da5> next_to_watch.status <0>
0000:00:19.0: eth0: Detected Tx Unit Hang: TDH <ea> TDT <ed> next_to_use <ed> next_to_clean <e9>
buffer_info[next_to_clean]: time_stamp <10010f2f4> next_to_watch <ea> jiffies <1001115da> next_to_watch.status <0>
0000:00:19.0: eth0: Detected Tx Unit Hang: TDH <ea> TDT <ed> next_to_use <ed> next_to_clean <e9>
buffer_info[next_to_clean]: time_stamp <10010f2f4> next_to_watch <ea> jiffies <100111dab> next_to_watch.status <0>
Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
[<ffffffff8000d041>] put_page+0x0/0x2e
PGD 0
Oops: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:19.0/irq
CPU 0
Modules linked in: nfs fscache nfs_acl e1000e(U) autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc ipv6 xfrm_nalgo crypto_api cpufreq_ondemand acpi_cpufreq freq_table dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac lp joydev snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc sr_mod snd_hwdep cdrom i2c_i801 snd i2c_core parport_serial shpchp parport_pc soundcore parport sg pcspkr dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 14, comm: events/0 Tainted: G 2.6.18-157.el5 #1
RIP: 0010:[<ffffffff8000d041>] [<ffffffff8000d041>] put_page+0x0/0x2e
RSP: 0018:ffff810137f11d98 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8100a86457c0 RCX: 0000000000000002
RDX: ffff81008ef57e80 RSI: 000000008ef58000 RDI: 0000000000000000
RBP: 0000000000000001 R08: ffff81008ef59000 R09: 0000000000000000
R10: ffff81012a2f2000 R11: 0000000000000100 R12: ffffc200000bf000
R13: ffff81013787c800 R14: ffff81012aad4940 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffffffff803c1000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 000000012a8e2000 CR4: 00000000000006e0
Process events/0 (pid: 14, threadinfo ffff810137f10000, task ffff810137f067a0)
Stack: ffffffff80224d8f ffff810121bf6bc0 ffff8100a86457c0 ffff8101283ac500 ffffffff80028ffe ffff8101283ac000 ffffffff885cc070 ffff8101283ac500 ffff8101283ac000 ffff8101283ac500 ffff8101283ac000 0000000000000282
Call Trace:
[<ffffffff80224d8f>] skb_release_data+0x5f/0x99
[<ffffffff80028ffe>] __kfree_skb+0x11/0x1a
[<ffffffff885cc070>] :e1000e:e1000_clean_rx_ring+0x110/0x1f0
[<ffffffff885cc520>] :e1000e:e1000_reset_task+0x0/0x10
[<ffffffff885cc4c0>] :e1000e:e1000e_down+0x100/0x110
[<ffffffff885cc509>] :e1000e:e1000e_reinit_locked+0x39/0x50
[<ffffffff8004de1c>] run_workqueue+0x94/0xe4
[<ffffffff8004a680>] worker_thread+0x0/0x122
[<ffffffff8004a770>] worker_thread+0xf0/0x122
[<ffffffff8008cde1>] default_wake_function+0x0/0xe
[<ffffffff80033282>] kthread+0xfe/0x132
[<ffffffff8005efb1>] child_rip+0xa/0x11
[<ffffffff80033184>] kthread+0x0/0x132
[<ffffffff8005efa7>] child_rip+0x0/0x11
Code: 8b 07 f6 c4 40 74 05 e9 05 1e 02 00 8b 47 08 85 c0 75 0a 0f
RIP [<ffffffff8000d041>] put_page+0x0/0x2e RSP <ffff810137f11d98>
CR2: 0000000000000000
<0>Kernel panic - not syncing: Fatal exception
Message frombad magic number for tty struct (136:0) in tty_write syslogd@ at Monbad magic number for tty struct (136:0) in release_dev

Failure on -159debug:

***** Summary for server 'rhel3-nfs': '0' tests failed *****
NFS version Type Test Return code
nfsvers=2 udp -b:base 0
nfsvers=2 udp -g:general 0
nfsvers=2 udp -s:special 0
nfsvers=2 udp -l:lock 0
nfsvers=2 tcp -b:base 0
nfsvers=2 tcp -g:general 0
nfsvers=2 tcp -s:special 0
nfsvers=2 tcp -l:lock 0
nfsvers=3 udp -b:base 0
nfsvers=3 udp -g:general 0
nfsvers=3 udp -s:special 0
nfsvers=3 udp -l:lock 0
nfsvers=3 tcp -b:base 0
nfsvers=3 tcp -g:general 0
nfsvers=3 tcp -s:special 0
nfsvers=3 tcp -l:lock 0
Total time: 216
/kernel/filesystems/nfs/connectathon/rhel3-nfs result: PASS metric: 216 Log: /tmp/tmp.Bn3166
/kernel/filesystems/nfs/connectathon/sol10-nfs/nfsvers=2_udp/base result: PASS metric: 0 Log: /tmp/tmp.Bn3166 DMesg: /tmp/dmesg.log
log moved to: '/tmp/tmp.A12626'
/kernel/filesystems/nfs/connectathon/sol10-nfs/nfsvers=2_udp/general result: PASS metric: 0 Log: /tmp/tmp.Bn3166 DMesg: /tmp/dmesg.log
log moved to: '/tmp/tmp.m13160'
eth0: Detected Tx Unit Hang: TDH <cb> TDT <cc> next_to_use <cc> next_to_clean <ca>
buffer_info[next_to_clean]: time_stamp <1000eb8b4> next_to_watch <cb> jiffies <1000ec031> next_to_watch.status <0>
eth0: Detected Tx Unit Hang: TDH <cb> TDT <cc> next_to_use <cc> next_to_clean <ca>
buffer_info[next_to_clean]: time_stamp <1000eb8b4> next_to_watch <cb> jiffies <1000ec865> next_to_watch.status <0>
nfs: server sol10-nfs not responding, still trying
eth0: Detected Tx Unit Hang: TDH <cb> TDT <cc> next_to_use <cc> next_to_clean <ca>
buffer_info[next_to_clean]: time_stamp <1000eb8b4> next_to_watch <cb> jiffies <1000ed035> next_to_watch.status <0>
eth0: Detected Tx Unit Hang: TDH <cb> TDT <cc> next_to_use <cc> next_to_clean <ca>
buffer_info[next_to_clean]: time_stamp <1000eb8b4> next_to_watch <cb> jiffies <1000ed7a1> next_to_watch.status <0>
eth0: Hardware Error
slab error in verify_redzone_free(): cache `size-2048': memory outside object was overwritten
Call Trace:
[<ffffffff80033336>] cache_free_debugcheck+0x106/0x217
[<ffffffff8000b936>] kfree+0xcc/0x25d
[<ffffffff8002a034>] __kfree_skb+0x11/0x1a
[<ffffffff88202cdd>] :e1000e:e1000_clean_rx_ring+0xde/0x1b6
[<ffffffff88204cab>] :e1000e:e1000_reset_task+0x0/0xc
[<ffffffff882030dc>] :e1000e:e1000e_reinit_locked+0x3d/0x50
[<ffffffff80050091>] run_workqueue+0x9a/0xf4
[<ffffffff8004c883>] worker_thread+0x0/0x122
[<ffffffff8004c973>] worker_thread+0xf0/0x122
[<ffffffff80090692>] default_wake_function+0x0/0xe
[<ffffffff80034a48>] kthread+0xfe/0x132
[<ffffffff80067fb9>] trace_hardirqs_on_thunk+0x35/0x37
[<ffffffff80061079>] child_rip+0xa/0x11
[<ffffffff800688bd>] _spin_unlock_irq+0x24/0x27
[<ffffffff800606a8>] restore_args+0x0/0x30
[<ffffffff8003494a>] kthread+0x0/0x132
[<ffffffff8006106f>] child_rip+0x0/0x11
ffff81012e424918: redzone 1:0x170ff601, redzone 2:0x170fc2a5.
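
As a reading aid for the redzone line just above, here is a minimal sketch of the kind of comparison that could produce that message. It assumes the 32-bit slab debug magic values `RED_ACTIVE = 0x170FC2A5` and `RED_INACTIVE = 0x5A2CF071`; these constants are recalled from kernels of this era and are not stated anywhere in this report, and `classify_redzones` and `differing_bytes` are hypothetical helper names.

```python
# Assumed magic values for slab debug redzones (NOT taken from this report).
RED_ACTIVE = 0x170FC2A5    # pattern expected around an allocated object
RED_INACTIVE = 0x5A2CF071  # pattern expected around a freed object


def classify_redzones(redzone1, redzone2):
    """Classify a redzone pair the way a free-time check might."""
    if redzone1 == RED_ACTIVE and redzone2 == RED_ACTIVE:
        return "intact"
    if redzone1 == RED_INACTIVE and redzone2 == RED_INACTIVE:
        return "double free"
    return "memory outside object was overwritten"


def differing_bytes(value, expected=RED_ACTIVE):
    """Return byte positions (0 = least significant) where value != expected."""
    diff = value ^ expected
    return [i for i in range(4) if (diff >> (8 * i)) & 0xFF]


# Values copied from the report: redzone 1:0x170ff601, redzone 2:0x170fc2a5
rz1, rz2 = 0x170FF601, 0x170FC2A5
print(classify_redzones(rz1, rz2))
print("redzone 1 bytes changed:", differing_bytes(rz1))
print("redzone 2 bytes changed:", differing_bytes(rz2))
```

Under these assumptions, redzone 2 still holds the active pattern and only the two low-order bytes of redzone 1 differ from it. That narrows where the stray write landed but does not by itself identify the writer.
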
slab error in verify_redzone_free(): cache `size-2048': memory outside object was overwritten
Call Trace:
[<ffffffff80033336>] cache_free_debugcheck+0x106/0x217
[<ffffffff8000b936>] kfree+0xcc/0x25d
[<ffffffff8002a034>] __kfree_skb+0x11/0x1a
[<ffffffff88202cdd>] :e1000e:e1000_clean_rx_ring+0xde/0x1b6
[<ffffffff88204cab>] :e1000e:e1000_reset_task+0x0/0xc
[<ffffffff882030dc>] :e1000e:e1000e_reinit_locked+0x3d/0x50
[<ffffffff80050091>] run_workqueue+0x9a/0xf4
[<ffffffff8004c883>] worker_thread+0x0/0x122
[<ffffffff8004c973>] worker_thread+0xf0/0x122
[<ffffffff80090692>] default_wake_function+0x0/0xe
[<ffffffff80034a48>] kthread+0xfe/0x132
[<ffffffff80067fb9>] trace_hardirqs_on_thunk+0x35/0x37
[<ffffffff80061079>] child_rip+0xa/0x11
[<ffffffff800688bd>] _spin_unlock_irq+0x24/0x27
[<ffffffff800606a8>] restore_args+0x0/0x30
[<ffffffff8003494a>] kthread+0x0/0x132
[<ffffffff8006106f>] child_rip+0x0/0x11
ffff81008d750a98: redzone 1:0x170f0000, redzone 2:0x170fc2a5.
Unable to handle kernel paging request at 0000000002000000 RIP:
[<ffffffff80233f97>] skb_drop_list+0xb/0x22
PGD 12b0a8067 PUD 12aca8067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
CPU 0
Modules linked in: nfs fscache nfs_acl autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc ipv6 xfrm_nalgo crypto_api cpufreq_ondemand acpi_cpufreq freq_table dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac lp joydev snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device sr_mod snd_pcm_oss cdrom snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_hwdep sg parport_serial snd e1000e i2c_i801 i2c_core pcspkr shpchp parport_pc parport soundcore dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 14, comm: events/0 Not tainted 2.6.18-159.el5debug #1
RIP: 0010:[<ffffffff80233f97>] [<ffffffff80233f97>] skb_drop_list+0xb/0x22
RSP: 0018:ffff810137897db0 EFLAGS: 00010206
RAX: 0000000002000000 RBX: ffff810134a3c4f0 RCX: 0000000000000002
RDX: ffff810131237800 RSI: 000000011ebab000 RDI: ffff81011ebaab18
RBP: 0000000000000000 R08: ffff81011ebac000 R09: 0000000000000000
R10: ffff810137a368a0 R11: 00000000000000f8 R12: ffff8101338b2a60
R13: ffff810137a368a0 R14: ffff8101342e0680 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffffffff80433000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000002000000 CR3: 000000012ad54000 CR4: 00000000000006e0
Process events/0 (pid: 14, threadinfo ffff810137896000, task ffff810137894440)
Stack: ffff810134a3c4f0 ffffffff80234033 ffff810131237878 ffff810134a3c4f0 ffffc200000bfa50 ffffffff8002a034 ffff8101326cfd48 ffffffff88202cdd 0000000000000042 ffff8101342e0680 ffff8101342e0718 ffff8101375a4c78
Call Trace:
[<ffffffff80234033>] skb_release_data+0x85/0x99
[<ffffffff8002a034>] __kfree_skb+0x11/0x1a
[<ffffffff88202cdd>] :e1000e:e1000_clean_rx_ring+0xde/0x1b6
[<ffffffff88204cab>] :e1000e:e1000_reset_task+0x0/0xc
[<ffffffff882030dc>] :e1000e:e1000e_reinit_locked+0x3d/0x50
[<ffffffff80050091>] run_workqueue+0x9a/0xf4
[<ffffffff8004c883>] worker_thread+0x0/0x122
[<ffffffff8004c973>] worker_thread+0xf0/0x122
[<ffffffff80090692>] default_wake_function+0x0/0xe
[<ffffffff80034a48>] kthread+0xfe/0x132
[<ffffffff80067fb9>] trace_hardirqs_on_thunk+0x35/0x37
[<ffffffff80061079>] child_rip+0xa/0x11
[<ffffffff800688bd>] _spin_unlock_irq+0x24/0x27
[<ffffffff800606a8>] restore_args+0x0/0x30
[<ffffffff8003494a>] kthread+0x0/0x132
[<ffffffff8006106f>] child_rip+0x0/0x11
Code: 48 8b 18 48 89 c7 e8 65 ff ff ff 48 85 db 74 05 48 89 d8 eb
RIP [<ffffffff80233f97>] skb_drop_list+0xb/0x22 RSP <ffff810137897db0>
CR2: 0000000002000000
<0>Kernel panic - not syncing: Fatal exception

Moving this entry to RHEL 5.5. Both the SF e1000e driver and the 2.6.30-rc4 kernel exhibit the same problem. When the problem is fixed upstream we can review a backport for inclusion in RHEL 5.5.

Failure on 2.6.31-rc4:

log moved to: '/tmp/tmp.W17147'
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8108eaa0>] put_page+0x4/0xca
PGD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:ff/0000:ff:02.3/irq
CPU 1
Modules linked in: nfs nfs_acl auth_rpcgss autofs4 hidp rfcomm l2cap bluetooth rfkill lockd sunrpc ipv6 cpufreq_ondemand acpi_cpufreq dm_multipath sbs sbshc battery acpi_memhotplug ac lp snd_hda_codec_intelhdmi snd_hda_codec_realtek snd_hda_intel sg snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device sr_mod snd_pcm_oss cdrom video snd_mixer_oss rtc_cmos output rtc_core snd_pcm rtc_lib snd_timer button snd parport_serial parport_pc parport i2c_i801 e1000e soundcore i2c_core snd_page_alloc shpchp pcspkr dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
Pid: 0, comm: swapper Not tainted 2.6.31-rc4 #6 To be filled by O.E.M.
RIP: 0010:[<ffffffff8108eaa0>] [<ffffffff8108eaa0>] put_page+0x4/0xca
RSP: 0018:ffff880028069d60 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88012a83e480 RCX: 0000000000000011
RDX: ffff8800af130640 RSI: 00000000640000e0 RDI: 0000000000000000
RBP: 0000000000000001 R08: ffffffff81b3ce00 R09: 0000000000000002
R10: 0000000000000000 R11: ffffffff81272de9 R12: ffffc90014717af0
R13: ffff88012a83e480 R14: ffff88012f916460 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff880028066000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000001001000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff880137b1a000, task ffff880137b19560)
Stack:
ffff88012a83e480 0000000000000001 ffffc90014717af0 ffffffff81249720
<0> ffff88012a83e480 ffff88012a83e480 ffff88012a83e480 ffffffff81249474
<0> ffff8800af130030 ffffffff81272ad4 0000000000000000 000000000000018f
Call Trace:
<IRQ> [<ffffffff81249720>] ? skb_release_data+0x65/0xaa
[<ffffffff81249474>] ? __kfree_skb+0x9/0x6f
[<ffffffff81272ad4>] ? ip_rcv_finish+0x3a0/0x3b0
[<ffffffffa0152ab3>] ? e1000_clean_rx_irq+0x22e/0x2cd [e1000e]
[<ffffffffa01518cf>] ? e1000_clean+0x6e/0x21d [e1000e]
[<ffffffff81253778>] ? net_rx_action+0xa9/0x17d
[<ffffffff8104153e>] ? __do_softirq+0xc5/0x182
[<ffffffff8100ca3c>] ? call_softirq+0x1c/0x28
[<ffffffff8100ddd2>] ? do_softirq+0x2c/0x68
[<ffffffff8100d452>] ? do_IRQ+0xa0/0xb6
[<ffffffff8100c2d3>] ? ret_from_intr+0x0/0xa
<EOI> [<ffffffff8100c42e>] ? apic_timer_interrupt+0xe/0x20
[<ffffffff811a852f>] ? acpi_safe_halt+0x27/0x39
[<ffffffff811a861f>] ? acpi_idle_enter_c1+0x6f/0xc7
[<ffffffff81231396>] ? ladder_select_state+0x2b/0x135
[<ffffffff812309e5>] ? cpuidle_idle_call+0x7f/0xbe
[<ffffffff8100aa1d>] ? cpu_idle+0x40/0x5e

Closing as CURRENTRELEASE as RHEL5.7 seems to be working just fine.
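
The repeated "Detected Tx Unit Hang" reports earlier in this bug can be compared mechanically. The following is a minimal sketch, not part of the original report: `parse_hangs`, `jiffies_deltas` and `ring_is_stuck` are hypothetical helpers, and the sample text is rebuilt from the five reports in the -157 log with e1000e v1.0.2.5. It checks whether the descriptor indices and `time_stamp` ever change, and how far `jiffies` advances between reports. Converting jiffies to seconds depends on the kernel's HZ setting, which the report does not state.

```python
import re

# The five reports from the -157 log in this bug; only the jiffies value differs.
JIFFIES = ["10010fe03", "1001105d4", "100110da5", "1001115da", "100111dab"]
SAMPLE = "\n".join(
    "0000:00:19.0: eth0: Detected Tx Unit Hang: TDH <ea> TDT <ed> "
    "next_to_use <ed> next_to_clean <e9> buffer_info[next_to_clean]: "
    f"time_stamp <10010f2f4> next_to_watch <ea> jiffies <{j}> "
    "next_to_watch.status <0>"
    for j in JIFFIES
)

# Matches "name <hex>" pairs such as "TDH <ea>" or "jiffies <10010fe03>".
FIELD_RE = re.compile(r"([A-Za-z_.]+) <([0-9a-f]+)>")


def parse_hangs(text):
    """Return one dict of field name -> int per hang report in text."""
    reports = []
    for chunk in text.split("Detected Tx Unit Hang:")[1:]:
        reports.append({k: int(v, 16) for k, v in FIELD_RE.findall(chunk)})
    return reports


def jiffies_deltas(reports):
    """Jiffies elapsed between consecutive reports."""
    values = [r["jiffies"] for r in reports]
    return [b - a for a, b in zip(values, values[1:])]


def ring_is_stuck(reports):
    """True if the ring indices and time_stamp are identical in every report."""
    keys = ("TDH", "TDT", "next_to_use", "next_to_clean", "time_stamp")
    return all(all(r[k] == reports[0][k] for k in keys) for r in reports)


hangs = parse_hangs(SAMPLE)
print(len(hangs), "reports; indices unchanged:", ring_is_stuck(hangs))
print("jiffies between reports:", jiffies_deltas(hangs))
print("age of stuck buffer at first report:",
      hangs[0]["jiffies"] - hangs[0]["time_stamp"])
```

On this sample the indices and `time_stamp` never move while `jiffies` keeps advancing, which is consistent with a transmit ring that has stopped making progress rather than one that is merely slow.
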