Using the reproducer for CVE-2010-4158, the following soft lockup is triggered:

BUG: soft lockup - CPU#2 stuck for 60s! [a.out:4362]
CPU 2:
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 xfrm_nalgo crypto_api loop dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec i2c_core dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport joydev ixgbe floppy bnx2 8021q ide_cd sr_mod i5000_edac serio_raw dca edac_mc tpm_tis cdrom tpm tpm_bios sg pcspkr dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage ata_piix libata shpchp megaraid_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 4362, comm: a.out Not tainted 2.6.18-231.el5 #1
RIP: 0010:[<ffffffff80064bbf>]  [<ffffffff80064bbf>] .text.lock.spinlock+0x5/0x30
RSP: 0018:ffff81010ebebd90  EFLAGS: 00000286
RAX: 0000000000000000 RBX: ffff81042bf57e80 RCX: 0000000000000002
RDX: 0000000000000036 RSI: 0000000000000003 RDI: ffff81042b56b0c0
RBP: ffff81010ebebd10 R08: ffff81010ebebc78 R09: 0000000000000000
R10: ffff81010ebebcf8 R11: 0000000000000048 R12: ffffffff8005dc8e
R13: ffff81042b56b080 R14: ffffffff80078f66 R15: ffff81010ebebd10
FS:  00002b0ab82d06e0(0000) GS:ffff81010eb9ee40(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002b0ab804ce30 CR3: 0000000424dce000 CR4: 00000000000006e0

Call Trace:
 <IRQ>  [<ffffffff8022db6a>] sock_queue_rcv_skb+0x56/0x16b
 [<ffffffff80261fd5>] __udp_queue_rcv_skb+0x9/0x3b
 [<ffffffff8005313f>] udp_queue_rcv_skb+0x283/0x2d6
 [<ffffffff80052d74>] udp_rcv+0x3e5/0x52d
 [<ffffffff80046fbc>] try_to_wake_up+0x472/0x484
 [<ffffffff80034950>] ip_local_deliver+0x19d/0x263
 [<ffffffff80035aac>] ip_rcv+0x539/0x57c
 [<ffffffff80020b0f>] netif_receive_skb+0x470/0x49f
 [<ffffffff800a486c>] hrtimer_wakeup+0x1d/0x22
 [<ffffffff800307d3>] process_backlog+0x89/0xe7
 [<ffffffff8000c979>] net_rx_action+0xac/0x1b3
 [<ffffffff8001245b>] __do_softirq+0x89/0x133
 [<ffffffff8005e2fc>] call_softirq+0x1c/0x28
 <EOI>  [<ffffffff8006d5f5>] do_softirq+0x2c/0x7d
 [<ffffffff8002baf1>] local_bh_enable+0x88/0x99
 [<ffffffff8002fa6c>] dev_queue_xmit+0x27e/0x2a2
 [<ffffffff8003205b>] ip_output+0x2ae/0x2dd
 [<ffffffff80251dad>] ip_push_pending_frames+0x37d/0x45d
 [<ffffffff80262264>] udp_push_pending_frames+0x21e/0x243
 [<ffffffff8005287d>] udp_sendmsg+0x4d8/0x5ea
 [<ffffffff80055229>] sock_sendmsg+0xf8/0x14a
 [<ffffffff800a2896>] autoremove_wake_function+0x0/0x2e
 [<ffffffff80008d56>] __handle_mm_fault+0x5f3/0x1039
 [<ffffffff80062ff0>] thread_return+0x62/0xfe
 [<ffffffff8022aeac>] sys_sendto+0x11c/0x14f
 [<ffffffff8005a4c5>] hrtimer_cancel+0xc/0x16
 [<ffffffff80063ce5>] do_nanosleep+0x47/0x70
 [<ffffffff8005a3b2>] hrtimer_nanosleep+0x58/0x118
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

The deadlock occurs as follows:

udp_queue_rcv_skb
 |_ bh_lock_sock(sk)
 |_ __udp_queue_rcv_skb
     |_ sock_queue_rcv_skb
         |_ sk_filter(sk, skb, 1)    -> 1 means needlock
             |_ bh_lock_sock(sk)     -> deadlock

This is a regression introduced by RHEL5 commit 6865201191. Upstream is unaffected because sk_filter was converted to use RCU protection, but RHEL5 has not taken those sk_filter changes, so taking commit 6865201191 on its own results in a deadlock: the socket lock is already held when sk_filter tries to acquire it again.

Found with Dan's reproducer for CVE-2010-4158.

Acknowledgements: Red Hat would like to thank Dan Rosenberg for reporting this issue.
Public mention for this: http://www.spinics.net/lists/netdev/msg146404.html
Statement: This issue did not affect the version of the Linux kernel as shipped with Red Hat Enterprise Linux 4, as it did not backport the upstream commit 93821778 that introduced this. It did not affect the versions of the Linux kernel as shipped with Red Hat Enterprise Linux 6 and Red Hat Enterprise MRG, as they have backported the upstream commit fda9ef5d that addressed this. A future kernel update in Red Hat Enterprise Linux 5 may address this flaw.
This issue has been addressed in the following products:

  Red Hat Enterprise Linux 5

Via RHSA-2011:0004 https://rhn.redhat.com/errata/RHSA-2011-0004.html