Bug 584532 - Kernel Panic with RHEL4 kernel while running connectathon test on specific host
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.7.z
Hardware: All Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Andy Gospodarek
QA Contact: Red Hat Kernel QE team
Depends On:
Blocks:
Reported: 2010-04-21 15:45 EDT by PaulB
Modified: 2014-06-29 19:02 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-08-18 21:19:07 EDT
Attachments: None

Description PaulB 2010-04-21 15:45:41 EDT
Description of problem:
While testing the RHEL4 kernel, the host system hangs and /kernel/filesystems/nfs/connectathon/EXTERNALWATCHDOG triggers.

The following issue is recorded:
=======================================================
lockd: couldn't shutdown host module!
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
  Tx Queue             <0>
  TDH                  <fa>
  TDT                  <61>
  next_to_use          <61>
  next_to_clean        <f8>
buffer_info[next_to_clean]
  time_stamp           <ffffd752>
  next_to_watch        <fb>
  jiffies              <ffffe263>
  next_to_watch.status <0>
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
  Tx Queue             <0>
  TDH                  <fa>
  TDT                  <61>
  next_to_use          <61>
  next_to_clean        <f8>
buffer_info[next_to_clean]
  time_stamp           <ffffd752>
  next_to_watch        <fb>
  jiffies              <ffffea33>
  next_to_watch.status <0>
=======================================================
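
As a reading aid for the dump above, here is a minimal sketch that parses one "Detected Tx Unit Hang" block and derives how long the oldest descriptor has been waiting and how many descriptors are outstanding. The timer frequency (HZ = 1000) and the Tx ring size (256 descriptors) are assumptions and are not stated anywhere in this report; the helper names are hypothetical.

```python
import re

HZ = 1000        # assumption: kernel timer frequency, not stated in this report
RING_SIZE = 256  # assumption: Tx descriptor ring size, not stated in this report

# Matches lines such as "  TDH                  <fa>" (name, then a hex value).
FIELD = re.compile(r"^\s*(.+?)\s+<([0-9a-fA-F]+)>\s*$")

def parse_tx_hang(text):
    """Return the hex fields of one Tx Unit Hang dump as a dict of ints."""
    fields = {}
    for line in text.splitlines():
        m = FIELD.match(line)
        if m:
            fields[m.group(1)] = int(m.group(2), 16)
    return fields

def summarize(fields, hz=HZ, ring_size=RING_SIZE):
    """Derive stall time and outstanding descriptor counts from the fields."""
    stalled = fields["jiffies"] - fields["time_stamp"]
    return {
        "stalled_jiffies": stalled,
        "stalled_seconds": stalled / hz,
        # Descriptors handed to the NIC (tail) but not yet fetched (head).
        "pending_hw": (fields["TDT"] - fields["TDH"]) % ring_size,
        # Descriptors queued by the driver but not yet reclaimed.
        "unreclaimed_sw": (fields["next_to_use"] - fields["next_to_clean"]) % ring_size,
    }

SAMPLE = """\
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
  Tx Queue             <0>
  TDH                  <fa>
  TDT                  <61>
  next_to_use          <61>
  next_to_clean        <f8>
buffer_info[next_to_clean]
  time_stamp           <ffffd752>
  next_to_watch        <fb>
  jiffies              <ffffe263>
  next_to_watch.status <0>
"""

if __name__ == "__main__":
    print(summarize(parse_tx_hang(SAMPLE)))
```

For the first dump this gives a stall of 2833 jiffies, and the second dump (jiffies ffffea33) gives 4833. The difference is because both dumps report the same stuck descriptor, with the same time_stamp, 2000 jiffies apart.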

Version-Release number of selected component (if applicable):

I have reproduced this issue testing Cthon (connectathon) with the  
2.6.9-89.0.25.EL smp kernel on sonata-02.lab.bos.redhat.com:
Job#150379 Recipe-387071
http://rhts.redhat.com/testlogs/2010/04/150379/387071/3127911/console.txt

I have reproduced this issue testing Cthon (connectathon) with the  
2.6.9-89.0.24.EL smp kernel on sonata-02.lab.bos.redhat.com:
Job#150608 Recipe-387719
http://rhts.redhat.com/testlogs/2010/04/150608/387719/3132119/console.txt

I have reproduced this issue testing Cthon (connectathon) with the
2.6.9-89.0.23.EL smp kernel on sonata-02.lab.bos.redhat.com:
Job#150720 Recipe-387952
http://rhts.redhat.com/testlogs/2010/04/150720/387952/3133816/console.txt

I have reproduced this issue testing Cthon (connectathon) with the
2.6.9-89.24.EL smp kernel
Job#150810 - issue reproduced with Vivek's release kernel.
http://rhts.redhat.com/testlogs/2010/04/150810/388356/3135931/console.txt

Further data:
A later test that I kicked off as a clone of the original test, running Cthon (connectathon) with the
2.6.9-89.0.25.EL smp kernel on sonata-02.lab.bos.redhat.com, produced a kernel panic.
The data is similar:
Job#150617 Recipe-387728
http://rhts.redhat.com/testlogs/2010/04/150617/387728/3132147/console.txt
The following issue is recorded in this instance:
=======================================================
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
  Tx Queue             <0>
  TDH                  <4>
  TDT                  <5>
  next_to_use          <5>
  next_to_clean        <3>
buffer_info[next_to_clean]
  time_stamp           <1007baf95>
  next_to_watch        <4>
  jiffies              <1007bd190>
  next_to_watch.status <0>
nfs: server rhel5-nfs not responding, still trying
stack segment: 0000 [1] SMP 
CPU 0 
Modules linked in: nfs lockd nfs_acl md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc ds yenta_socket pcmcia_core powernow_k8 cpufreq_powersave loop joydev button battery ac ehci_hcd ohci_hcd snd_hda_intel snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_hwdep snd soundcore r8169 mii e1000 floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod ahci libata sd_mod scsi_mod
Pid: 20074, comm: dumpcap Not tainted 2.6.9-89.0.25.ELsmp
RIP: 0010:[<ffffffff802b6145>] <ffffffff802b6145>{skb_copy_datagram_iovec+259}
RSP: 0018:0000010212dc9c38  EFLAGS: 00010202
RAX: 0000010212dc9fd8 RBX: 00000102239e2698 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000035 RDI: ffffffff8032e2b0
RBP: 9eb003a0fceb023e R08: fd2f100aee9e1101 R09: 0000000000000006
R10: 0000010212dc9c48 R11: 0000000000000048 R12: 00000000000005c8
R13: 00000000000005c8 R14: 0000000000000022 R15: 0000000000000022
FS:  0000002a96ebd500(0000) GS:ffffffff80504e00(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000005b731c CR3: 0000000000101000 CR4: 00000000000006e0
Process dumpcap (pid: 20074, threadinfo 0000010212dc8000, task 0000010219a1c030)
Stack: 0000000000005f3e 0000010212dc9f38 00000101e5aced00 000000000000ffff 
       00000101e5aced00 0000010212dc9ef8 00000000000005ea 000001020f3ffd00 
       0000000000000020 ffffffff803129b9 
Call Trace:<ffffffff803129b9>{packet_recvmsg+138} <ffffffff802b02c0>{sock_recvmsg+284} 
       <ffffffff8018e9e3>{poll_freewait+64} <ffffffff8018ee76>{do_select+978} 
       <ffffffff801361fe>{autoremove_wake_function+0} <ffffffff802afeab>{sockfd_lookup+16} 
       <ffffffff802b173c>{sys_recvfrom+182} <ffffffff8018e9e9>{__pollwait+0} 
       <ffffffff8018f315>{sys_select+1147} <ffffffff80197205>{dnotify_parent+34} 
       <ffffffff801102de>{system_call+126} 

Code: 48 0f b6 45 03 48 8b 7c 24 08 48 8b 14 c5 80 7a 4a 80 48 b8 
RIP <ffffffff802b6145>{skb_copy_datagram_iovec+259} RSP <0000010212dc9c38>
 <0>Kernel panic - not syncing: Oops
 [-- MARK -- Wed Apr 21 10:45:00 2010]
=======================================================
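
To make the oops above easier to compare against other reports, here is a minimal sketch that pulls the `<address>{symbol+offset}` entries out of the console text. The function name and regular expression are illustrative assumptions, not part of any kernel tooling. Note also that entries in a trace like this are not necessarily all live frames, so the list is a starting point rather than an exact call chain.

```python
import re

# Matches entries such as "<ffffffff803129b9>{packet_recvmsg+138}".
FRAME = re.compile(r"<([0-9a-f]{16})>\{([A-Za-z0-9_.]+)\+(\d+)\}")

def extract_frames(oops_text):
    """Return (symbol, offset, address) tuples in the order they appear."""
    return [(sym, int(off), int(addr, 16))
            for addr, sym, off in FRAME.findall(oops_text)]

OOPS_SAMPLE = """\
RIP: 0010:[<ffffffff802b6145>] <ffffffff802b6145>{skb_copy_datagram_iovec+259}
Call Trace:<ffffffff803129b9>{packet_recvmsg+138} <ffffffff802b02c0>{sock_recvmsg+284}
       <ffffffff802b173c>{sys_recvfrom+182} <ffffffff8018e9e9>{__pollwait+0}
"""

if __name__ == "__main__":
    for sym, off, addr in extract_frames(OOPS_SAMPLE):
        print(f"{addr:016x}  {sym}+{off}")
```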

How reproducible:
Reserve the host sonata-02.lab.bos.redhat.com and run the /kernel/filesystems/nfs/connectathon test with any of the above noted kernels.

Expected results:
This test should pass without issue.

Additional info:
I ran this test on several other hosts with e1000, using both the noted z-stream and y-stream kernels without issue:
Test Cthon 2.6.9-89.0.25.EL smp kernel on other systems with e1000 network driver:
  Job#150618 - no issues.    nec-em3.rhts.eng.bos.redhat.com
  Job#150619 - no issues.    sun-x4600m2-01.rhts.eng.bos.redhat.com
  Job#150620 - no issues.    nec-em8.rhts.eng.bos.redhat.com
  Job#150621 - no issues.    nec-em11.rhts.eng.bos.redhat.com
  Job#150623 - no issues.    nec-em24-11.rhts.eng.bos.redhat.com

Test Cthon 2.6.9-89.24.EL smp kernel on other systems with e1000 network driver:
  Job#150775 - no issues.    nec-em3.rhts.eng.bos.redhat.com
  Job#150776 - no issues.    nec-em11.rhts.eng.bos.redhat.com
  Job#150777 - no issues.    nec-em11.rhts.eng.bos.redhat.com
  Job#150778 - no issues.    sun-x4200-01.rhts.eng.bos.redhat.com
  Job#150779 - no issues.    dell-pe2850-01.rhts.eng.bos.redhat.com

Best,
-pbunyan
Comment 1 Andy Gospodarek 2010-08-26 17:20:29 EDT
This looks a LOT like bug 558809, but it doesn't appear to me that the 82541 could be affected by that problem, since it doesn't do packet-split.  That is unfortunate.
Comment 2 Andy Gospodarek 2011-08-18 21:19:07 EDT
This is a year old, and problems that happen on only one system are often due to a BIOS/firmware problem on that system.  I'm going to close this as NOTABUG.
