Description of problem: Version-Release number of selected component (if applicable): How reproducible: Steps to Reproduce: 1. wc -l /proc/net/ip_conntrack Actual results: Sep 17 09:22:21 server014 kernel: ip_conntrack: falling back to vmalloc. Sep 17 09:23:37 server014 kernel: NETDEV WATCHDOG: eth0: transmit timed out Sep 17 09:23:44 server014 kernel: NETDEV WATCHDOG: eth0: transmit timed out Sep 17 09:23:50 server014 kernel: BUG: soft lockup - CPU#3 stuck for 10s! [ksoftirqd/3:12] Sep 17 09:23:50 server014 kernel: Sep 17 09:23:50 server014 kernel: Pid: 12, comm: ksoftirqd/3 Sep 17 09:23:50 server014 kernel: EIP: 0060:[<c060d5a9>] CPU: 3 Sep 17 09:23:50 server014 kernel: EIP is at __write_lock_failed+0x9/0x20 Sep 17 09:23:50 server014 kernel: EFLAGS: 00000287 Tainted: G (2.6.18-128.7.1.el5PAE #1) Sep 17 09:23:50 server014 kernel: EAX: f8a2f4c0 EBX: f8a2f4c0 ECX: eae38200 EDX: 00000003 Sep 17 09:23:50 server014 kernel: ESI: f574e69c EDI: c072cf00 EBP: 0000ea60 DS: 007b ES: 007b Sep 17 09:23:50 server014 kernel: CR0: 8005003b CR2: 09676918 CR3: 379c4600 CR4: 000006f0 Sep 17 09:23:50 server014 kernel: [<c060f7c6>] _write_lock_bh+0x18/0x1a Sep 17 09:23:50 server014 kernel: [<f8a24be1>] __ip_ct_refresh_acct+0x26/0x123 [ip_conntrack] Sep 17 09:23:50 server014 kernel: [<f8a2726d>] tcp_packet+0x977/0x9b2 [ip_conntrack] Sep 17 09:23:50 server014 kernel: [<c05b5b91>] dev_queue_xmit+0x22e/0x24d Sep 17 09:23:50 server014 kernel: [<f8a2476e>] __ip_conntrack_find+0xd/0xb8 [ip_conntrack] Sep 17 09:23:50 server014 kernel: [<f8a26145>] ip_conntrack_in+0x340/0x424 [ip_conntrack] Sep 17 09:23:50 server014 kernel: [<c05cbb01>] ip_route_input+0xff/0xc59 Sep 17 09:23:50 server014 kernel: [<c05c9084>] nf_iterate+0x30/0x61 Sep 17 09:23:50 server014 kernel: [<c05d05f0>] dst_output+0x0/0x7 Sep 17 09:23:50 server014 kernel: [<c05c91aa>] nf_hook_slow+0x3a/0x90 Sep 17 09:23:50 server014 kernel: [<c05d05f0>] dst_output+0x0/0x7 Sep 17 09:23:50 server014 kernel: [<c05d2b85>] ip_build_and_send_pkt+0x1b7/0x1db Sep 17 09:23:50 server014 kernel: [<c05d05f0>] dst_output+0x0/0x7 Sep 17 09:23:50 server014 kernel: [<c05e3ceb>] tcp_v4_send_synack+0x8f/0xdd Sep 17 09:23:50 server014 kernel: [<c05e5cb3>] tcp_v4_conn_request+0x395/0x3e0 Sep 17 09:23:50 server014 kernel: [<c042c004>] do_timer+0x5cd/0x8b5 Sep 17 09:23:50 server014 kernel: [<c05de435>] tcp_rcv_state_process+0x5d/0xcb0 Sep 17 09:23:50 server014 kernel: [<c05e4957>] tcp_v4_do_rcv+0x26e/0x2b6 Sep 17 09:23:50 server014 kernel: [<c05e6bd5>] tcp_v4_rcv+0x8ba/0x90d Sep 17 09:23:50 server014 kernel: [<c05cdf60>] ip_local_deliver_finish+0x0/0x197 Sep 17 09:23:50 server014 kernel: [<c05ce7b1>] ip_local_deliver+0x159/0x204 Sep 17 09:23:50 server014 kernel: [<c05ce61e>] ip_rcv+0x46f/0x4a9 Sep 17 09:23:50 server014 kernel: [<c05b3f27>] netif_receive_skb+0x31a/0x33e Sep 17 09:23:50 server014 kernel: [<f8a42096>] bnx2_poll+0xd37/0xf73 [bnx2] Sep 17 09:23:50 server014 kernel: [<c05b8082>] neigh_connected_output+0x73/0x86 Sep 17 09:23:50 server014 kernel: [<c05e04c3>] tcp_transmit_skb+0x5c7/0x5f5 Sep 17 09:23:50 server014 kernel: [<c046d801>] free_block+0xc4/0xe3 Sep 17 09:23:50 server014 kernel: [<c04e90c6>] memmove+0xe/0x22 Sep 17 09:23:50 server014 kernel: [<c05b5880>] net_rx_action+0x92/0x175 Sep 17 09:23:50 server014 kernel: [<c0429047>] __do_softirq+0x87/0x114 Sep 17 09:23:50 server014 kernel: [<c04073d7>] do_softirq+0x52/0x9c Sep 17 09:23:50 server014 kernel: [<c0428ee7>] ksoftirqd+0x0/0xd9 Sep 17 09:23:50 server014 kernel: [<c0428f56>] ksoftirqd+0x6f/0xd9 Sep 17 09:23:50 server014 kernel: [<c04349e1>] kthread+0xc0/0xeb Sep 17 09:23:50 server014 kernel: [<c0434921>] kthread+0x0/0xeb Sep 17 09:23:50 server014 kernel: [<c0405c53>] kernel_thread_helper+0x7/0x10 Sep 17 09:23:50 server014 kernel: ======================= Sep 17 09:24:00 server014 kernel: BUG: soft lockup - CPU#3 stuck for 10s! [ksoftirqd/3:12] Sep 17 09:24:00 server014 kernel: Pid: 12, comm: ksoftirqd/3 Sep 17 09:24:00 server014 kernel: EIP: 0060:[<c060d5a9>] CPU: 3 Sep 17 09:24:00 server014 kernel: EIP is at __write_lock_failed+0x9/0x20 Sep 17 09:24:00 server014 kernel: EFLAGS: 00000287 Tainted: G (2.6.18-128.7.1.el5PAE #1) Sep 17 09:24:00 server014 kernel: EAX: f8a2f4c0 EBX: f8a2f4c0 ECX: ee762954 EDX: c0736000 Sep 17 09:24:00 server014 kernel: ESI: c99da000 EDI: f8a24e24 EBP: 00000007 DS: 007b ES: 007b Sep 17 09:24:00 server014 kernel: CR0: 8005003b CR2: 09676918 CR3: 379c4600 CR4: 000006f0 Sep 17 09:24:00 server014 kernel: [<c060f7c6>] _write_lock_bh+0x18/0x1a Sep 17 09:24:00 server014 kernel: [<f8a24e31>] death_by_timeout+0xd/0x63 [ip_conntrack] Sep 17 09:24:00 server014 kernel: [<c042c5b1>] run_timer_softirq+0xfb/0x151 Sep 17 09:24:00 server014 kernel: [<c0429047>] __do_softirq+0x87/0x114 Sep 17 09:24:00 server014 kernel: [<c04073d7>] do_softirq+0x52/0x9c Sep 17 09:24:00 server014 kernel: [<c0428ee7>] ksoftirqd+0x0/0xd9 Sep 17 09:24:00 server014 kernel: [<c0428f56>] ksoftirqd+0x6f/0xd9 Sep 17 09:24:00 server014 kernel: [<c04349e1>] kthread+0xc0/0xeb Sep 17 09:24:00 server014 kernel: [<c0434921>] kthread+0x0/0xeb Sep 17 09:24:00 server014 kernel: [<c0405c53>] kernel_thread_helper+0x7/0x10 Sep 17 09:24:00 server014 kernel: ======================= Sep 17 09:24:01 server014 kernel: NETDEV WATCHDOG: eth0: transmit timed out Sep 17 09:24:08 server014 kernel: NETDEV WATCHDOG: eth0: transmit timed out Sep 17 09:24:10 server014 kernel: BUG: soft lockup - CPU#3 stuck for 10s! [ksoftirqd/3:12] Sep 17 09:24:10 server014 kernel: Sep 17 09:24:10 server014 kernel: Pid: 12, comm: ksoftirqd/3 Sep 17 09:24:10 server014 kernel: EIP: 0060:[<c060d5a9>] CPU: 3 Sep 17 09:24:10 server014 kernel: EIP is at __write_lock_failed+0x9/0x20 Sep 17 09:24:10 server014 kernel: EFLAGS: 00000287 Tainted: G (2.6.18-128.7.1.el5PAE #1) Sep 17 09:24:10 server014 kernel: EAX: f8a2f4c0 EBX: f8a2f4c0 ECX: ed403ce4 EDX: c0736000 Sep 17 09:24:10 server014 kernel: ESI: c99da000 EDI: f8a24e24 EBP: 00000007 DS: 007b ES: 007b Sep 17 09:24:10 server014 kernel: CR0: 8005003b CR2: 09676918 CR3: 379c4600 CR4: 000006f0 Sep 17 09:24:10 server014 kernel: [<c060f7c6>] _write_lock_bh+0x18/0x1a Sep 17 09:24:10 server014 kernel: [<f8a24e31>] death_by_timeout+0xd/0x63 [ip_conntrack] Sep 17 09:24:10 server014 kernel: [<c042c5b1>] run_timer_softirq+0xfb/0x151 Sep 17 09:24:10 server014 kernel: [<c0429047>] __do_softirq+0x87/0x114 Sep 17 09:24:10 server014 kernel: [<c04073d7>] do_softirq+0x52/0x9c Sep 17 09:24:10 server014 kernel: [<c0428ee7>] ksoftirqd+0x0/0xd9 Sep 17 09:24:10 server014 kernel: [<c0428f56>] ksoftirqd+0x6f/0xd9 Sep 17 09:24:10 server014 kernel: [<c04349e1>] kthread+0xc0/0xeb Sep 17 09:24:10 server014 kernel: [<c0434921>] kthread+0x0/0xeb Sep 17 09:24:10 server014 kernel: [<c0405c53>] kernel_thread_helper+0x7/0x10 Sep 17 09:24:10 server014 kernel: ======================= Sep 17 09:24:19 server014 kernel: NETDEV WATCHDOG: eth0: transmit timed out Sep 17 09:24:21 server014 kernel: bnx2: eth0 NIC Copper Link is Down Sep 17 09:24:21 server014 kernel: bonding: bond0: link status definitely down for interface eth0, disabling it Sep 17 09:24:21 server014 kernel: bonding: bond0: now running without any active interface ! Sep 17 09:24:25 server014 kernel: bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex Sep 17 09:24:25 server014 kernel: bonding: bond0: link status definitely up for interface eth0. Sep 17 09:24:25 server014 kernel: bonding: bond0: making interface eth0 the new active one. Sep 17 09:24:25 server014 kernel: bonding: bond0: first active interface up! Expected results: Additional info: Linux server014 2.6.18-128.7.1.el5PAE #1 SMP Mon Aug 24 08:59:38 EDT 2009 i686 i686 i386 GNU/Linux
Sorry this has taken so long to look at. Can you answer these three questions for me? (1) Is this still a problem with the latest kernels (I suspect it still is)? (2) Also, what did you type before typing: # wc -l /proc/net/ip_conntrack My guess is that you may have performed a 'service iptables [re]start' on another console (and it may not have comepleted) as the message right before it: Sep 17 09:22:21 server014 kernel: ip_conntrack: falling back to vmalloc. is only called when initializing the ipv4 netfilter modules. Another interesting point is this message: Sep 17 09:22:21 server014 kernel: ip_conntrack: falling back to vmalloc. will only be printed when a call to get_free_pages fails (so this system is clearly under memory pressure). If vmalloc is also blocked waiting for memory to become available, this softlockup is understandable since the hash-table allocation holds the same lock used by the code where the softlockups appear to be happening. (3) What do your iptables rules look like. Can you attach them?