Red Hat Bugzilla – Bug 82416
OOPS - Frequently system lockup/crash under some load
Last modified: 2015-01-07 19:03:03 EST
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
Description of problem:
For months I have been trying to debug a problem on my RedHat 7.3 machine using
The following post and some other bugreports finally brought me to the
conclusion that nss_ldap could be the cause:
I have exactly the same problem the user describes.
After some uptime and under some load the system locks up totally.
All ports and interfaces remain open, pings are possible but no kind of login
or connect is possible.
I don't run nscd, which seems to have caused some bugs lately.
Nearly the same issue was addressed several times in Red Hat's and PADL's Bugzilla,
but these problems are all marked as solved and shouldn't occur in nss_ldap-189-
I already exchanged my hardware. I have 2 similarly set up machines running in
a cluster. The error occurs on both.
Maybe an issue:
I had no crashes for some weeks after I had removed the LDAP maps from my postfix
setup. Now these lookups are active and since then the problem seems to occur
again. Before, the only app which looked up LDAP was PAM.
Thanks in advance
Version-Release number of selected component (if applicable):
I was able to reproduce the crash by doing rsyncs over the 1000 Mbit NIC.
I have now got an oops and have also posted it to the kernel list.
I tried the rawhide kernel 2.4.20 and the problem is the same.
2.4.20-2.25smp from RawHide
Doing a rsync from the crashing host _to_ another host over a 1000 Mbit 3com
The rsynced files include larger files of about 1.5 GB.
Below is the oops output.
NMI Watchdog detected LOCKUP on CPU0, eip c02499ac, registers:
via686a eeprom lm80 i2c-proc i2c-isa i2c-viapro i2c-core tg3 eepro100 mii
ipt_LOG ipt_limit ipt_state ipt_REJECT iptable_nat ip_cona
EIP: 0060:[<c02499ac>] Not tainted
EIP is at .text.lock.tcp_ipv4 [kernel] 0x182 (2.4.20-2.25smp)
eax: 00000001 ebx: d400010a ecx: 00000000 edx: f78837d8
esi: f6f22ae0 edi: c3d3ad40 ebp: f74939f4 esp: f1335d8c
ds: 0068 es: 0068 ss: 0068
Process rsync (pid: 3151, stackpage=f1335000)
Stack: c3d3ad40 f3121f38 00000001 f1335e28 00000000 03ff0202 00000004
00000000 00000006 c3d3ad40 f74939e0 c022d67e c3d3ad40 f1335e28
00000000 00000006 00000000 00000001 00000000 c022d530 c021ce67
Call Trace: [<c022d67e>] ip_local_deliver_finish [kernel] 0x14e
[<c022d530>] ip_local_deliver_finish [kernel] 0x0 (0xf1335de0))
[<c021ce67>] nf_hook_slow [kernel] 0x107 (0xf1335de4))
[<c022d530>] ip_local_deliver_finish [kernel] 0x0 (0xf1335e00))
[<c022d2b3>] ip_local_deliver [kernel] 0x53 (0xf1335e1c))
[<c022d530>] ip_local_deliver_finish [kernel] 0x0 (0xf1335e34))
[<c022d8b9>] ip_rcv_finish [kernel] 0x219 (0xf1335e38))
[<c022d6a0>] ip_rcv_finish [kernel] 0x0 (0xf1335e5c))
[<c022d6a0>] ip_rcv_finish [kernel] 0x0 (0xf1335e6c))
[<c021ce67>] nf_hook_slow [kernel] 0x107 (0xf1335e70))
[<c022d6a0>] ip_rcv_finish [kernel] 0x0 (0xf1335e8c))
[<c022d480>] ip_rcv [kernel] 0x1a0 (0xf1335ea8))
[<c022d6a0>] ip_rcv_finish [kernel] 0x0 (0xf1335ec0))
[<c021566e>] netif_receive_skb [kernel] 0x14e (0xf1335ed8))
[<f89d2c7c>] tg3_rx [tg3] 0x27c (0xf1335ef8))
[<f89d2e71>] tg3_poll [tg3] 0x81 (0xf1335f38))
[<c0215917>] net_rx_action [kernel] 0xa7 (0xf1335f58))
[<c01289f9>] do_softirq [kernel] 0xd9 (0xf1335f80))
[<c010b81b>] do_IRQ [kernel] 0xfb (0xf1335f9c))
[<c010e7c8>] call_do_IRQ [kernel] 0x5 (0xf1335fc0))
Code: 7e f8 e9 68 e5 ff ff e8 2c ed eb ff e9 c3 ee ff ff e8 22 ed
console shuts up ...
NMI Watchdog detected LOCKUP on CPU1, eip f89d9f3b, registers:
I have now got rid of the failures by replacing the tg3 driver with the latest
bcm5700 driver from 3com.
It seems that there is a bug in tg3 with the BCM5701 Gigabit Ethernet card.
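
For anyone comparing traces, here is a minimal Python sketch that extracts the frames from a call trace in the format posted above and lists which modules appear in it. The helper names (parse_frames, real_calls) and the SAMPLE excerpt are only illustrative and are not part of any kernel tool. Treating entries with offset 0x0 as saved function pointers rather than return addresses is an assumption on my side, not something the oops itself states.

```python
import re

# Matches frames such as "[<f89d2c7c>] tg3_rx [tg3] 0x27c (0xf1335ef8))".
# The trailing "(0x...))" stack-slot field is optional and ignored here.
FRAME_RE = re.compile(r"\[<([0-9a-f]{8})>\]\s+(\S+)\s+\[(\S+)\]\s+(0x[0-9a-f]+)")


def parse_frames(text):
    """Return one dict per frame found in an oops call trace."""
    frames = []
    for match in FRAME_RE.finditer(text):
        addr, symbol, module, offset = match.groups()
        frames.append({
            "addr": int(addr, 16),
            "symbol": symbol,
            "module": module,
            "offset": int(offset, 16),
        })
    return frames


def real_calls(frames):
    """Drop offset-0 entries (assumed to be function pointers on the stack)."""
    return [f for f in frames if f["offset"] != 0]


# Excerpt of the trace from this report, copied unchanged.
SAMPLE = """\
Call Trace: [<c022d67e>] ip_local_deliver_finish [kernel] 0x14e
[<c022d530>] ip_local_deliver_finish [kernel] 0x0 (0xf1335de0))
[<c021566e>] netif_receive_skb [kernel] 0x14e (0xf1335ed8))
[<f89d2c7c>] tg3_rx [tg3] 0x27c (0xf1335ef8))
[<f89d2e71>] tg3_poll [tg3] 0x81 (0xf1335f38))
[<c0215917>] net_rx_action [kernel] 0xa7 (0xf1335f58))
"""

frames = parse_frames(SAMPLE)
calls = real_calls(frames)
modules = sorted({f["module"] for f in calls})

for f in calls:
    print(f'{f["module"]:8} {f["symbol"]}+{f["offset"]:#x}')
print("modules in trace:", ", ".join(modules))
```

Run against the full oops, this makes it easier to see that the receive path enters through tg3 (tg3_poll, tg3_rx) before reaching the IP and TCP code where the lockup was detected.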