From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

Description of problem:
Hello,

for months I have been trying to debug a problem on my RedHat 7.3 machine using nss_ldap. The following post and some other bug reports finally led me to conclude that nss_ldap could be the cause:
http://www.netsys.com/nssldap/2002/09/msg00014.html

I have exactly the same problem the user describes. After some uptime and under some load, the system locks up completely. All ports and interfaces remain open and pings are answered, but no login or connection of any kind is possible. I don't run nscd, which seems to have caused some bugs lately.

Nearly the same issue has been reported several times in Red Hat's and PADL's Bugzilla, but those reports are all marked as solved and the problem shouldn't occur in nss_ldap-189-4.

I have already replaced my hardware. I have 2 similarly set up machines running in a cluster, and the error occurs on both.

Possibly relevant: I had no crashes for some weeks after I removed the LDAP maps from my postfix setup. Now these lookups are active again, and since then the problem seems to have returned. Before, the only application doing LDAP lookups was pam.

Please help.
Thanks in advance,
Daniel Khan

Version-Release number of selected component (if applicable):

How reproducible:
Couldn't Reproduce

Additional info:
I was able to reproduce the crash by running rsyncs over the 1000 Mbit NIC. I now have an oops, and I have also posted it to the kernel list. I tried the Rawhide kernel 2.4.20 and the problem is the same.

Scenario:
2.4.20-2.25smp from RawHide
Doing an rsync from the crashing host _to_ another host over a 1000 Mbit 3com (TG3).
The rsynced files include large files of about 1.5 GB.
Heartbeat runs.

The oops output follows.

<------------------------CUT---------------------------->
NMI Watchdog detected LOCKUP on CPU0, eip c02499ac, registers:
via686a eeprom lm80 i2c-proc i2c-isa i2c-viapro i2c-core tg3 eepro100 mii ipt_LOG ipt_limit ipt_state ipt_REJECT iptable_nat ip_cona
CPU:    0
EIP:    0060:[<c02499ac>]    Not tainted
EFLAGS: 00000086

EIP is at .text.lock.tcp_ipv4 [kernel] 0x182 (2.4.20-2.25smp)
eax: 00000001   ebx: d400010a   ecx: 00000000   edx: f78837d8
esi: f6f22ae0   edi: c3d3ad40   ebp: f74939f4   esp: f1335d8c
ds: 0068   es: 0068   ss: 0068
Process rsync (pid: 3151, stackpage=f1335000)
Stack: c3d3ad40 f3121f38 00000001 f1335e28 00000000 03ff0202 00000004 000003ff
       00000000 00000006 c3d3ad40 f74939e0 c022d67e c3d3ad40 f1335e28 c3d5a000
       00000000 00000006 00000000 00000001 00000000 c022d530 c021ce67 c3d3ad40
Call Trace:
[<c022d67e>] ip_local_deliver_finish [kernel] 0x14e (0xf1335dbc))
[<c022d530>] ip_local_deliver_finish [kernel] 0x0 (0xf1335de0))
[<c021ce67>] nf_hook_slow [kernel] 0x107 (0xf1335de4))
[<c022d530>] ip_local_deliver_finish [kernel] 0x0 (0xf1335e00))
[<c022d2b3>] ip_local_deliver [kernel] 0x53 (0xf1335e1c))
[<c022d530>] ip_local_deliver_finish [kernel] 0x0 (0xf1335e34))
[<c022d8b9>] ip_rcv_finish [kernel] 0x219 (0xf1335e38))
[<c022d6a0>] ip_rcv_finish [kernel] 0x0 (0xf1335e5c))
[<c022d6a0>] ip_rcv_finish [kernel] 0x0 (0xf1335e6c))
[<c021ce67>] nf_hook_slow [kernel] 0x107 (0xf1335e70))
[<c022d6a0>] ip_rcv_finish [kernel] 0x0 (0xf1335e8c))
[<c022d480>] ip_rcv [kernel] 0x1a0 (0xf1335ea8))
[<c022d6a0>] ip_rcv_finish [kernel] 0x0 (0xf1335ec0))
[<c021566e>] netif_receive_skb [kernel] 0x14e (0xf1335ed8))
[<f89d2c7c>] tg3_rx [tg3] 0x27c (0xf1335ef8))
[<f89d2e71>] tg3_poll [tg3] 0x81 (0xf1335f38))
[<c0215917>] net_rx_action [kernel] 0xa7 (0xf1335f58))
[<c01289f9>] do_softirq [kernel] 0xd9 (0xf1335f80))
[<c010b81b>] do_IRQ [kernel] 0xfb (0xf1335f9c))
[<c010e7c8>] call_do_IRQ [kernel] 0x5 (0xf1335fc0))

Code: 7e f8 e9 68 e5 ff ff e8 2c ed eb ff e9 c3 ee ff ff e8 22 ed

console shuts up ...

NMI Watchdog detected LOCKUP on CPU1, eip f89d9f3b, registers:
<------------------------CUT---------------------------->
I have now eliminated the failures by replacing the tg3 driver with the latest bcm5700 driver from 3com. There seems to be a bug in tg3 with the BCM5701 Gigabit Ethernet card.

Regards,
Daniel Khan