Description of problem:

During testing of the RHEL4 kernel, the host system hangs and /kernel/filesystems/nfs/connectathon/EXTERNALWATCHDOG triggers.

The following issue is recorded:
=======================================================
lockd: couldn't shutdown host module!
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
  Tx Queue             <0>
  TDH                  <fa>
  TDT                  <61>
  next_to_use          <61>
  next_to_clean        <f8>
buffer_info[next_to_clean]
  time_stamp           <ffffd752>
  next_to_watch        <fb>
  jiffies              <ffffe263>
  next_to_watch.status <0>
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
  Tx Queue             <0>
  TDH                  <fa>
  TDT                  <61>
  next_to_use          <61>
  next_to_clean        <f8>
buffer_info[next_to_clean]
  time_stamp           <ffffd752>
  next_to_watch        <fb>
  jiffies              <ffffea33>
  next_to_watch.status <0>
=======================================================

Version-Release number of selected component (if applicable):

I have reproduced this issue testing Cthon (connectathon) with the 2.6.9-89.0.25.EL smp kernel on sonata-02.lab.bos.redhat.com:
Job#150379 Recipe-387071
http://rhts.redhat.com/testlogs/2010/04/150379/387071/3127911/console.txt

I have reproduced this issue testing Cthon (connectathon) with the 2.6.9-89.0.24.EL smp kernel on sonata-02.lab.bos.redhat.com:
Job#150608 Recipe-387719
http://rhts.redhat.com/testlogs/2010/04/150608/387719/3132119/console.txt

I have reproduced this issue testing Cthon (connectathon) with the 2.6.9-89.0.23.EL smp kernel on sonata-02.lab.bos.redhat.com:
Job#150720 Recipe-387952
http://rhts.redhat.com/testlogs/2010/04/150720/387952/3133816/console.txt

I have reproduced this issue testing Cthon (connectathon) with the 2.6.9-89.24.EL smp kernel:
Job#150810 - issue reproduced with Vivek's release kernel.
http://rhts.redhat.com/testlogs/2010/04/150810/388356/3135931/console.txt

Further data:

A later test that I had kicked off as a clone of the original test, testing Cthon (connectathon) with the 2.6.9-89.0.25.EL smp kernel on sonata-02.lab.bos.redhat.com, produced a kernel panic. The data is similar:
Job#150617 Recipe-387728
http://rhts.redhat.com/testlogs/2010/04/150617/387728/3132147/console.txt

The following issue is recorded in this instance:
=======================================================
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
  Tx Queue             <0>
  TDH                  <4>
  TDT                  <5>
  next_to_use          <5>
  next_to_clean        <3>
buffer_info[next_to_clean]
  time_stamp           <1007baf95>
  next_to_watch        <4>
  jiffies              <1007bd190>
  next_to_watch.status <0>
nfs: server rhel5-nfs not responding, still trying
stack segment: 0000 [1] SMP
CPU 0
Modules linked in: nfs lockd nfs_acl md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc ds yenta_socket pcmcia_core powernow_k8 cpufreq_powersave loop joydev button battery ac ehci_hcd ohci_hcd snd_hda_intel snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_hwdep snd soundcore r8169 mii e1000 floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod ahci libata sd_mod scsi_mod
Pid: 20074, comm: dumpcap Not tainted 2.6.9-89.0.25.ELsmp
RIP: 0010:[<ffffffff802b6145>] <ffffffff802b6145>{skb_copy_datagram_iovec+259}
RSP: 0018:0000010212dc9c38 EFLAGS: 00010202
RAX: 0000010212dc9fd8 RBX: 00000102239e2698 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000035 RDI: ffffffff8032e2b0
RBP: 9eb003a0fceb023e R08: fd2f100aee9e1101 R09: 0000000000000006
R10: 0000010212dc9c48 R11: 0000000000000048 R12: 00000000000005c8
R13: 00000000000005c8 R14: 0000000000000022 R15: 0000000000000022
FS: 0000002a96ebd500(0000) GS:ffffffff80504e00(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000005b731c CR3: 0000000000101000 CR4: 00000000000006e0
Process dumpcap (pid: 20074, threadinfo 0000010212dc8000, task 0000010219a1c030)
Stack: 0000000000005f3e 0000010212dc9f38 00000101e5aced00 000000000000ffff
       00000101e5aced00 0000010212dc9ef8 00000000000005ea 000001020f3ffd00
       0000000000000020 ffffffff803129b9
Call Trace:<ffffffff803129b9>{packet_recvmsg+138} <ffffffff802b02c0>{sock_recvmsg+284}
       <ffffffff8018e9e3>{poll_freewait+64} <ffffffff8018ee76>{do_select+978}
       <ffffffff801361fe>{autoremove_wake_function+0} <ffffffff802afeab>{sockfd_lookup+16}
       <ffffffff802b173c>{sys_recvfrom+182} <ffffffff8018e9e9>{__pollwait+0}
       <ffffffff8018f315>{sys_select+1147} <ffffffff80197205>{dnotify_parent+34}
       <ffffffff801102de>{system_call+126}

Code: 48 0f b6 45 03 48 8b 7c 24 08 48 8b 14 c5 80 7a 4a 80 48 b8
RIP <ffffffff802b6145>{skb_copy_datagram_iovec+259} RSP <0000010212dc9c38>
 <0>Kernel panic - not syncing: Oops
[-- MARK -- Wed Apr 21 10:45:00 2010]
=======================================================

How reproducible:

Reserve the host sonata-02.lab.bos.redhat.com and run the /kernel/filesystems/nfs/connectathon test with any of the kernels noted above.

Expected results:

This test should pass without issue.

Additional info:

I ran this test on several other hosts with e1000, using both the noted z-stream and y-stream kernels, without issue.

Test Cthon 2.6.9-89.0.25.EL smp kernel on other systems with e1000 network driver:
Job#150618 - no issues. nec-em3.rhts.eng.bos.redhat.com
Job#150619 - no issues. sun-x4600m2-01.rhts.eng.bos.redhat.com
Job#150620 - no issues. nec-em8.rhts.eng.bos.redhat.com
Job#150621 - no issues. nec-em11.rhts.eng.bos.redhat.com
Job#150623 - no issues. nec-em24-11.rhts.eng.bos.redhat.com

Test Cthon 2.6.9-89.24.EL smp kernel on other systems with e1000 network driver:
Job#150775 - no issue. nec-em3.rhts.eng.bos.redhat.com
Job#150776 - no issues. nec-em11.rhts.eng.bos.redhat.com
Job#150777 - no issues. nec-em11.rhts.eng.bos.redhat.com
Job#150778 - no issues. sun-x4200-01.rhts.eng.bos.redhat.com
Job#150779 - no issues. dell-pe2850-01.rhts.eng.bos.redhat.com

Best,
-pbunyan
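
For anyone comparing the Tx Unit Hang reports above, the useful number is how long the oldest unreclaimed buffer had been waiting when the message was logged, i.e. jiffies minus time_stamp. The following is a minimal sketch, not part of the original report, that pulls those two hex fields out of the message text and subtracts them. The helper name tx_hang_age is hypothetical, the strings are trimmed copies of the three reports, and counter wraparound is not handled.

```python
import re

# Matches the two hex fields of interest in an e1000 "Detected Tx Unit Hang" report.
_FIELD = re.compile(r"(time_stamp|jiffies)\s+<([0-9a-fA-F]+)>")


def tx_hang_age(message):
    """Return jiffies elapsed between the buffer's time_stamp and the log entry.

    Hypothetical helper for reading the logs in this report; it assumes the
    jiffies counter did not wrap between the two values.
    """
    fields = {name: int(value, 16) for name, value in _FIELD.findall(message)}
    return fields["jiffies"] - fields["time_stamp"]


# Trimmed copies of the three reports quoted in this bug.
reports = [
    "time_stamp <ffffd752> next_to_watch <fb> jiffies <ffffe263>",
    "time_stamp <ffffd752> next_to_watch <fb> jiffies <ffffea33>",
    "time_stamp <1007baf95> next_to_watch <4> jiffies <1007bd190>",
]
ages = [tx_hang_age(r) for r in reports]
print(ages)  # [2833, 4833, 8699]
```

The first two reports share the same time_stamp and are 2000 jiffies apart, so the same buffer was still stuck when the hang was detected the second time. Converting to seconds requires the kernel's HZ value, which the report does not state; if HZ were 1000 (an assumption), the ages would be roughly 2.8 s, 4.8 s and 8.7 s.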
This looks a LOT like bug 558809, but I don't see how the 82541 could be affected by that problem, since it doesn't do packet-split. That is unfortunate.
This is a year old, and problems that happen on only one system are often due to a BIOS/firmware problem on that system. I'm going to close this as NOTABUG.