Bug 523925 - listing of ip_conntrack table ends up with BUG: soft lockup - CPU#3 stuck for 10s! [ksoftirqd/3:12] and broken network connection
Summary: listing of ip_conntrack table ends up with BUG: soft lockup - CPU#3 stuck for...
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: i686
OS: Linux
low
medium
Target Milestone: rc
: ---
Assignee: Andy Gospodarek
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2009-09-17 08:49 UTC by Tomas Brandysky
Modified: 2014-06-29 23:01 UTC (History)
1 user (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-08-19 02:30:21 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Tomas Brandysky 2009-09-17 08:49:42 UTC
Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:

1. wc -l /proc/net/ip_conntrack

  
Actual results:

Sep 17 09:22:21 server014 kernel: ip_conntrack: falling back to vmalloc.
Sep 17 09:23:37 server014 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Sep 17 09:23:44 server014 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Sep 17 09:23:50 server014 kernel: BUG: soft lockup - CPU#3 stuck for 10s! [ksoftirqd/3:12]
Sep 17 09:23:50 server014 kernel:
Sep 17 09:23:50 server014 kernel: Pid: 12, comm:          ksoftirqd/3
Sep 17 09:23:50 server014 kernel: EIP: 0060:[<c060d5a9>] CPU: 3
Sep 17 09:23:50 server014 kernel: EIP is at __write_lock_failed+0x9/0x20
Sep 17 09:23:50 server014 kernel:  EFLAGS: 00000287    Tainted: G       (2.6.18-128.7.1.el5PAE #1)
Sep 17 09:23:50 server014 kernel: EAX: f8a2f4c0 EBX: f8a2f4c0 ECX: eae38200 EDX: 00000003
Sep 17 09:23:50 server014 kernel: ESI: f574e69c EDI: c072cf00 EBP: 0000ea60 DS: 007b ES: 007b
Sep 17 09:23:50 server014 kernel: CR0: 8005003b CR2: 09676918 CR3: 379c4600 CR4: 000006f0
Sep 17 09:23:50 server014 kernel:  [<c060f7c6>] _write_lock_bh+0x18/0x1a
Sep 17 09:23:50 server014 kernel:  [<f8a24be1>] __ip_ct_refresh_acct+0x26/0x123 [ip_conntrack]
Sep 17 09:23:50 server014 kernel:  [<f8a2726d>] tcp_packet+0x977/0x9b2 [ip_conntrack]
Sep 17 09:23:50 server014 kernel:  [<c05b5b91>] dev_queue_xmit+0x22e/0x24d
Sep 17 09:23:50 server014 kernel:  [<f8a2476e>] __ip_conntrack_find+0xd/0xb8 [ip_conntrack]
Sep 17 09:23:50 server014 kernel:  [<f8a26145>] ip_conntrack_in+0x340/0x424 [ip_conntrack]
Sep 17 09:23:50 server014 kernel:  [<c05cbb01>] ip_route_input+0xff/0xc59
Sep 17 09:23:50 server014 kernel:  [<c05c9084>] nf_iterate+0x30/0x61
Sep 17 09:23:50 server014 kernel:  [<c05d05f0>] dst_output+0x0/0x7
Sep 17 09:23:50 server014 kernel:  [<c05c91aa>] nf_hook_slow+0x3a/0x90
Sep 17 09:23:50 server014 kernel:  [<c05d05f0>] dst_output+0x0/0x7
Sep 17 09:23:50 server014 kernel:  [<c05d2b85>] ip_build_and_send_pkt+0x1b7/0x1db
Sep 17 09:23:50 server014 kernel:  [<c05d05f0>] dst_output+0x0/0x7
Sep 17 09:23:50 server014 kernel:  [<c05e3ceb>] tcp_v4_send_synack+0x8f/0xdd
Sep 17 09:23:50 server014 kernel:  [<c05e5cb3>] tcp_v4_conn_request+0x395/0x3e0
Sep 17 09:23:50 server014 kernel:  [<c042c004>] do_timer+0x5cd/0x8b5
Sep 17 09:23:50 server014 kernel:  [<c05de435>] tcp_rcv_state_process+0x5d/0xcb0
Sep 17 09:23:50 server014 kernel:  [<c05e4957>] tcp_v4_do_rcv+0x26e/0x2b6
Sep 17 09:23:50 server014 kernel:  [<c05e6bd5>] tcp_v4_rcv+0x8ba/0x90d
Sep 17 09:23:50 server014 kernel:  [<c05cdf60>] ip_local_deliver_finish+0x0/0x197
Sep 17 09:23:50 server014 kernel:  [<c05ce7b1>] ip_local_deliver+0x159/0x204
Sep 17 09:23:50 server014 kernel:  [<c05ce61e>] ip_rcv+0x46f/0x4a9
Sep 17 09:23:50 server014 kernel:  [<c05b3f27>] netif_receive_skb+0x31a/0x33e
Sep 17 09:23:50 server014 kernel:  [<f8a42096>] bnx2_poll+0xd37/0xf73 [bnx2]
Sep 17 09:23:50 server014 kernel:  [<c05b8082>] neigh_connected_output+0x73/0x86
Sep 17 09:23:50 server014 kernel:  [<c05e04c3>] tcp_transmit_skb+0x5c7/0x5f5
Sep 17 09:23:50 server014 kernel:  [<c046d801>] free_block+0xc4/0xe3
Sep 17 09:23:50 server014 kernel:  [<c04e90c6>] memmove+0xe/0x22
Sep 17 09:23:50 server014 kernel:  [<c05b5880>] net_rx_action+0x92/0x175
Sep 17 09:23:50 server014 kernel:  [<c0429047>] __do_softirq+0x87/0x114
Sep 17 09:23:50 server014 kernel:  [<c04073d7>] do_softirq+0x52/0x9c
Sep 17 09:23:50 server014 kernel:  [<c0428ee7>] ksoftirqd+0x0/0xd9
Sep 17 09:23:50 server014 kernel:  [<c0428f56>] ksoftirqd+0x6f/0xd9
Sep 17 09:23:50 server014 kernel:  [<c04349e1>] kthread+0xc0/0xeb
Sep 17 09:23:50 server014 kernel:  [<c0434921>] kthread+0x0/0xeb
Sep 17 09:23:50 server014 kernel:  [<c0405c53>] kernel_thread_helper+0x7/0x10
Sep 17 09:23:50 server014 kernel:  =======================
Sep 17 09:24:00 server014 kernel: BUG: soft lockup - CPU#3 stuck for 10s! [ksoftirqd/3:12]
Sep 17 09:24:00 server014 kernel: Pid: 12, comm:          ksoftirqd/3
Sep 17 09:24:00 server014 kernel: EIP: 0060:[<c060d5a9>] CPU: 3
Sep 17 09:24:00 server014 kernel: EIP is at __write_lock_failed+0x9/0x20
Sep 17 09:24:00 server014 kernel:  EFLAGS: 00000287    Tainted: G       (2.6.18-128.7.1.el5PAE #1)
Sep 17 09:24:00 server014 kernel: EAX: f8a2f4c0 EBX: f8a2f4c0 ECX: ee762954 EDX: c0736000
Sep 17 09:24:00 server014 kernel: ESI: c99da000 EDI: f8a24e24 EBP: 00000007 DS: 007b ES: 007b
Sep 17 09:24:00 server014 kernel: CR0: 8005003b CR2: 09676918 CR3: 379c4600 CR4: 000006f0
Sep 17 09:24:00 server014 kernel:  [<c060f7c6>] _write_lock_bh+0x18/0x1a
Sep 17 09:24:00 server014 kernel:  [<f8a24e31>] death_by_timeout+0xd/0x63 [ip_conntrack]
Sep 17 09:24:00 server014 kernel:  [<c042c5b1>] run_timer_softirq+0xfb/0x151
Sep 17 09:24:00 server014 kernel:  [<c0429047>] __do_softirq+0x87/0x114
Sep 17 09:24:00 server014 kernel:  [<c04073d7>] do_softirq+0x52/0x9c
Sep 17 09:24:00 server014 kernel:  [<c0428ee7>] ksoftirqd+0x0/0xd9
Sep 17 09:24:00 server014 kernel:  [<c0428f56>] ksoftirqd+0x6f/0xd9
Sep 17 09:24:00 server014 kernel:  [<c04349e1>] kthread+0xc0/0xeb
Sep 17 09:24:00 server014 kernel:  [<c0434921>] kthread+0x0/0xeb
Sep 17 09:24:00 server014 kernel:  [<c0405c53>] kernel_thread_helper+0x7/0x10
Sep 17 09:24:00 server014 kernel:  =======================
Sep 17 09:24:01 server014 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Sep 17 09:24:08 server014 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Sep 17 09:24:10 server014 kernel: BUG: soft lockup - CPU#3 stuck for 10s! [ksoftirqd/3:12]
Sep 17 09:24:10 server014 kernel:
Sep 17 09:24:10 server014 kernel: Pid: 12, comm:          ksoftirqd/3
Sep 17 09:24:10 server014 kernel: EIP: 0060:[<c060d5a9>] CPU: 3
Sep 17 09:24:10 server014 kernel: EIP is at __write_lock_failed+0x9/0x20
Sep 17 09:24:10 server014 kernel:  EFLAGS: 00000287    Tainted: G       (2.6.18-128.7.1.el5PAE #1)
Sep 17 09:24:10 server014 kernel: EAX: f8a2f4c0 EBX: f8a2f4c0 ECX: ed403ce4 EDX: c0736000
Sep 17 09:24:10 server014 kernel: ESI: c99da000 EDI: f8a24e24 EBP: 00000007 DS: 007b ES: 007b
Sep 17 09:24:10 server014 kernel: CR0: 8005003b CR2: 09676918 CR3: 379c4600 CR4: 000006f0
Sep 17 09:24:10 server014 kernel:  [<c060f7c6>] _write_lock_bh+0x18/0x1a
Sep 17 09:24:10 server014 kernel:  [<f8a24e31>] death_by_timeout+0xd/0x63 [ip_conntrack]
Sep 17 09:24:10 server014 kernel:  [<c042c5b1>] run_timer_softirq+0xfb/0x151
Sep 17 09:24:10 server014 kernel:  [<c0429047>] __do_softirq+0x87/0x114
Sep 17 09:24:10 server014 kernel:  [<c04073d7>] do_softirq+0x52/0x9c
Sep 17 09:24:10 server014 kernel:  [<c0428ee7>] ksoftirqd+0x0/0xd9
Sep 17 09:24:10 server014 kernel:  [<c0428f56>] ksoftirqd+0x6f/0xd9
Sep 17 09:24:10 server014 kernel:  [<c04349e1>] kthread+0xc0/0xeb
Sep 17 09:24:10 server014 kernel:  [<c0434921>] kthread+0x0/0xeb
Sep 17 09:24:10 server014 kernel:  [<c0405c53>] kernel_thread_helper+0x7/0x10
Sep 17 09:24:10 server014 kernel:  =======================
Sep 17 09:24:19 server014 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Sep 17 09:24:21 server014 kernel: bnx2: eth0 NIC Copper Link is Down
Sep 17 09:24:21 server014 kernel: bonding: bond0: link status definitely down for interface eth0, disabling it
Sep 17 09:24:21 server014 kernel: bonding: bond0: now running without any active interface !
Sep 17 09:24:25 server014 kernel: bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex
Sep 17 09:24:25 server014 kernel: bonding: bond0: link status definitely up for interface eth0.
Sep 17 09:24:25 server014 kernel: bonding: bond0: making interface eth0 the new active one.
Sep 17 09:24:25 server014 kernel: bonding: bond0: first active interface up!



Expected results:


Additional info:

Linux server014 2.6.18-128.7.1.el5PAE #1 SMP Mon Aug 24 08:59:38 EDT 2009 i686 i686 i386 GNU/Linux

Comment 1 Andy Gospodarek 2010-05-25 16:53:00 UTC
Sorry this has taken so long to look at.  Can you answer these three questions for me?

(1) Is this still a problem with the latest kernels (I suspect it still is)?

(2) Also, what did you type before typing:

# wc -l /proc/net/ip_conntrack

My guess is that you may have performed a 'service iptables [re]start' on another console (and it may not have comepleted) as the message right before it:

Sep 17 09:22:21 server014 kernel: ip_conntrack: falling back to vmalloc.

is only called when initializing the ipv4 netfilter modules.

Another interesting point is this message:

Sep 17 09:22:21 server014 kernel: ip_conntrack: falling back to vmalloc.

will only be printed when a call to get_free_pages fails (so this system is clearly under memory pressure).  If vmalloc is also blocked waiting for memory to become available, this softlockup is understandable since the hash-table allocation holds the same lock used by the code where the softlockups appear to be happening.

(3) What do your iptables rules look like.  Can you attach them?


Note You need to log in before you can comment on or make changes to this bug.