Reported by a customer; Escalated by GSS.

list_del corruption. next->prev should be ffff8102bcd2c0d0, but was ffff8102bcabe0d0
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at lib/list_debug.c:70
invalid opcode: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:1c.0/0000:0e:00.1/irq
CPU 2
Modules linked in: md5 sctp ipv6 xfrm_nalgo crypto_api autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc cpufreq_ondemand acpi_cpufreq freq_table loop dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport sr_mod cdrom igb 8021q i2c_i801 e1000e dca i2c_core sg pcspkr dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix libata shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 0, comm: swapper Not tainted 2.6.18-194.26.1.el5 #1
RIP: 0010:[<ffffffff80154438>] [<ffffffff80154438>] list_del+0x48/0x71
RSP: 0018:ffff81010631bb20 EFLAGS: 00010286
RAX: 0000000000000058 RBX: ffff8102bcd2c0d0 RCX: ffffffff80311da8
RDX: ffffffff80311da8 RSI: 0000000000000000 RDI: ffffffff80311da0
RBP: ffff810330f60080 R08: ffffffff80311da8 R09: 0000000000000032
R10: ffff81010631b7c0 R11: 0000000000000000 R12: ffff810330f60080
R13: ffff81033f56f200 R14: 0000000000000000 R15: ffff81033f56f200
FS: 0000000000000000(0000) GS:ffff8101062991c0(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000f5d54c90 CR3: 00000002cf1ae000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffff810106314000, task ffff8101c55640c0)
Stack: ffff8102bcd2c000 ffffffff885283b8 ffff81010631bbd0 ffff8102bcd2c000
 ffff810330f60080 ffffffff885256ae ffff81033fac1c00 00000002800547f1
 0000000300000001 0000000000000004 0000000000000001 0000000000000003
Call Trace:
 <IRQ>
 [<ffffffff885283b8>] :sctp:sctp_association_free+0x17/0x118
 [<ffffffff885256ae>] :sctp:sctp_do_sm+0x11d/0xd19
 [<ffffffff88534e99>] :sctp:sctp_icmp_proto_unreachable+0x2e/0x33
 [<ffffffff88535054>] :sctp:sctp_v4_err+0x11f/0x182
 [<ffffffff80260fda>] icmp_rcv+0x135/0x163
 [<ffffffff800347f5>] ip_local_deliver+0x19d/0x263
 [<ffffffff80035963>] ip_rcv+0x539/0x57c
 [<ffffffff80020aca>] netif_receive_skb+0x470/0x49f
 [<ffffffff88243425>] :e1000e:e1000_receive_skb+0x1b5/0x1d6
 [<ffffffff88247c08>] :e1000e:e1000_clean_rx_irq+0x27a/0x321
 [<ffffffff88245c1d>] :e1000e:e1000_clean+0x7c/0x29a
 [<ffffffff8000c91b>] net_rx_action+0xac/0x1e0
 [<ffffffff88245aad>] :e1000e:e1000_intr_msi+0xd6/0xe0
 [<ffffffff80012443>] __do_softirq+0x89/0x133
 [<ffffffff8005e2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006cb8a>] do_softirq+0x2c/0x85
 [<ffffffff8006ca12>] do_IRQ+0xec/0xf5
 [<ffffffff8005d615>] ret_from_intr+0x0/0xa
 <EOI>
 [<ffffffff8019d6d4>] acpi_processor_idle_simple+0x17d/0x30e
 [<ffffffff8019d5c3>] acpi_processor_idle_simple+0x6c/0x30e
 [<ffffffff8019d557>] acpi_processor_idle_simple+0x0/0x30e
 [<ffffffff8019d557>] acpi_processor_idle_simple+0x0/0x30e
 [<ffffffff8004923a>] cpu_idle+0x95/0xb8
 [<ffffffff80077991>] start_secondary+0x498/0x4a7

Code: 0f 0b 68 93 bb 2b 80 c2 46 00 48 8b 13 48 8b 43 08 48 89 42
RIP [<ffffffff80154438>] list_del+0x48/0x71
 RSP <ffff81010631bb20>

We crashed in a network interrupt. Interested to know if the network up/down messages are recent (like seconds before the crash) or if they happened a while ago. Will need /var/log/messages to determine.

The problem is list corruption. The error checker that ensures the next element points back to this element failed, hence the panic.
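For readers unfamiliar with the check that fired, here is a minimal user-space sketch of the kind of back-pointer validation a debug list_del performs. It is an approximation written for illustration, not the code in lib/list_debug.c: the names list_init, list_add_node and checked_list_del are hypothetical, and the real kernel routine calls BUG() rather than returning a status.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Circular doubly linked list node, in the style of the kernel's list_head. */
struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

/* An empty list is a head that points at itself in both directions. */
static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert node immediately after head. */
static void list_add_node(struct list_head *node, struct list_head *head)
{
    node->next = head->next;
    node->prev = head;
    head->next->prev = node;
    head->next = node;
}

/*
 * Unlink entry only if both neighbours still point back at it.
 * Returns true on success, false if corruption was detected
 * (the kernel would report the mismatch and BUG() at this point).
 */
static bool checked_list_del(struct list_head *entry)
{
    if (entry->prev->next != entry) {
        fprintf(stderr,
                "list_del corruption. prev->next should be %p, but was %p\n",
                (void *)entry, (void *)entry->prev->next);
        return false;
    }
    if (entry->next->prev != entry) {
        fprintf(stderr,
                "list_del corruption. next->prev should be %p, but was %p\n",
                (void *)entry, (void *)entry->next->prev);
        return false;
    }
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
    entry->next = NULL;
    entry->prev = NULL;
    return true;
}
```

In the oops above, the second check is the one that failed: the node being removed was ffff8102bcd2c0d0 (also in RBX), but its successor's prev pointer held ffff8102bcabe0d0.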
Upstream commit:
http://git.kernel.org/linus/50b5d6ad63821cea324a5a7a19854d4de1a0a819

commit 50b5d6ad63821cea324a5a7a19854d4de1a0a819
Author: Vlad Yasevich <vladislav.yasevich>
Date:   Thu May 6 00:56:07 2010 -0700

    sctp: Fix a race between ICMP protocol unreachable and connect()

    ICMP protocol unreachable handling completely disregarded
    the fact that the user may have locked the socket. It proceeded
    to destroy the association, even though the user may have
    held the lock and had a ref on the association.

    [...]

    This was because the sctp_wait_for_connect() would aqcure the socket
    lock and then proceed to release the last reference count on the
    association, thus cause the fully destruction path to finish freeing
    the socket.

    The simplest solution is to start a very short timer in case the socket
    is owned by user. When the timer expires, we can do some verification
    and be able to do the release properly.

    Signed-off-by: Vlad Yasevich <vladislav.yasevich>
    Signed-off-by: David S. Miller <davem>
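The deferral described in the commit message can be modelled in user space roughly as follows. This is a simplified sketch under stated assumptions, not the upstream patch: struct assoc_model, on_proto_unreachable and on_timer_expired are hypothetical names, the timer is reduced to a boolean flag, and re-arming the timer when the socket is still owned at expiry is an assumption of this model.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the association plus its socket state. */
struct assoc_model {
    int  refcnt;             /* references held on the association */
    bool sock_owned_by_user; /* user context currently holds the socket lock */
    bool timer_pending;      /* short "protocol unreachable" timer is armed */
    bool torn_down;          /* state machine has processed the ICMP error */
};

enum unreach_action {
    UNREACH_HANDLED_NOW,     /* socket was free, handled immediately */
    UNREACH_DEFERRED,        /* socket busy, timer armed */
    UNREACH_ALREADY_PENDING  /* socket busy, timer was already armed */
};

/* ICMP protocol unreachable arrives in softirq context. */
static enum unreach_action on_proto_unreachable(struct assoc_model *a)
{
    if (a->sock_owned_by_user) {
        if (a->timer_pending)
            return UNREACH_ALREADY_PENDING;
        a->timer_pending = true;
        a->refcnt++;          /* the pending timer keeps the association alive */
        return UNREACH_DEFERRED;
    }
    a->torn_down = true;      /* safe: nobody else holds the socket */
    return UNREACH_HANDLED_NOW;
}

/* The short timer fires. */
static void on_timer_expired(struct assoc_model *a)
{
    if (a->sock_owned_by_user)
        return;               /* assumption: stay armed and retry later */
    a->timer_pending = false;
    a->torn_down = true;      /* now run the teardown that was deferred */
    a->refcnt--;              /* drop the reference taken when arming */
}
```

The point of the extra reference is that the association cannot be freed underneath the user while the ICMP handling is postponed, which is the situation that led to the list corruption above.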
Statement:

The Linux kernel as shipped with Red Hat Enterprise Linux 4 did not include upstream commit history:5aabd1fe268e850c2e93048a5ccc5eb6970ac49c, and is therefore not affected by this issue. This issue has been addressed in Red Hat Enterprise Linux 5, 6 and Red Hat Enterprise MRG via http://rhn.redhat.com/errata/RHSA-2011-0163.html, https://rhn.redhat.com/errata/RHSA-2011-0421.html and https://rhn.redhat.com/errata/RHSA-2011-1253.html.
This issue has been addressed in the following products:

  Red Hat Enterprise Linux 5.6.Z

Via RHSA-2011:0163 https://rhn.redhat.com/errata/RHSA-2011-0163.html

This issue has been addressed in the following products:

  Red Hat Enterprise Linux 6

Via RHSA-2011:0421 https://rhn.redhat.com/errata/RHSA-2011-0421.html

This issue has been addressed in the following products:

  MRG for RHEL-6 v.2

Via RHSA-2011:1253 https://rhn.redhat.com/errata/RHSA-2011-1253.html