Bug 158030

Summary: e1000 network driver sleeps when it should not...
Product: [Fedora] Fedora
Reporter: Tom Mitchell <mitch48>
Component: kernel
Assignee: John W. Linville <linville>
Status: CLOSED UPSTREAM
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 3
CC: bfox, davej, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2005-05-19 17:44:46 UTC
Attachments:
  Full console dump... (flags: none)

Description Tom Mitchell 2005-05-17 22:57:15 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.7) Gecko/20050416 Fedora/1.0.3-1.3.1 Firefox/1.0.3

Description of problem:
The e1000 network driver sleeps in a context where it must not,
and later the nmi_watchdog kicks things over.

A stack trace looks like --

       <ffffffff8010fe68>{oops_end+40} <ffffffff8010fe61>{oops_end+33}
       <ffffffff80122afb>{do_page_fault+1963} <ffffffff80211910>{vgacon_cursor+0}
       <ffffffff80138a8d>{release_console_sem+333} <ffffffff80138ac9>{release_console_sem+393}
       <ffffffff80138d30>{vprintk+528} <ffffffff8010f041>{error_exit+0}
       <ffffffff80211910>{vgacon_cursor+0} <ffffffff80131ab4>{dequeue_task+4}
       <ffffffff80131e05>{deactivate_task+21} <ffffffff803460d0>{schedule+512}
       <ffffffff80112d8a>{timer_interrupt+1066} <ffffffff8015bd3c>{handle_IRQ_event+44}
       <ffffffff8015bebc>{__do_IRQ+332} <ffffffff801419cd>{__mod_timer+317}
       <ffffffff803477ad>{schedule_timeout+253} <ffffffff801425a0>{process_timeout+0}
       <ffffffff8014260d>{msleep+93} <ffffffff880b7cc8>{:e1000:e1000_config_dsp_after_link_change+744}
       <ffffffff880b9cc1>{:e1000:e1000_check_for_link+273}
       <ffffffff880b4d2a>{:e1000:e1000_watchdog+42} <ffffffff801419cd>{__mod_timer+317}
       <ffffffff880b4d00>{:e1000:e1000_watchdog+0} <ffffffff80141e7e>{run_timer_softirq+398}
       <ffffffff8013daf1>{__do_softirq+113} <ffffffff8013dba5>{do_softirq+53}
       <ffffffff8010eea5>{apic_timer_interrupt+133}  <EOI> <ffffffff8010c720>{default_idle+0}
       <ffffffff8010c740>{default_idle+32} <ffffffff8010c88f>{cpu_idle+63}


The trick to reproducing this is to take a network link connector that
is not up to snuff and wiggle it. The link goes down as expected,
but the driver does an unsafe sleep and the watchdog cries wolf
(as it should)... Does a wolf sound like "Aiee, Aiee, Aiee" in the night?

Version-Release number of selected component (if applicable):
 2.6.11-1.14_FC3smp 

How reproducible:
Always

Steps to Reproduce:
1.activate ethN on top of the e1000 driver
2. plug/ unplug the connector
3. system panics Oops...
  

Actual Results:   <3>Debug: sleeping function called from invalid context at include/linux/rwsem.h:43
in_atomic():1, irqs_disabled():0

Call Trace:<ffffffff801327cf>{__might_sleep+191} <ffffffff80139349>{profile_task_exit+41}
<ffffffff8013ac72>{do_exit+34} <ffffffff8010fe68>{oops_end+40}
<ffffffff8011005d>{die_nmi+173} <ffffffff8011b26c>{nmi_watchdog_tick+220}
<ffffffff80110ab2>{default_do_nmi+130} <ffffffff8011b346>{do_nmi+134}
<ffffffff8010f423>{paranoid_exit+0} <ffffffff80348369>{.text.lock.spinlock+2}
 <EOE> <ffffffff8013198a>{task_rq_lock+74} <ffffffff80131fcb>{try_to_wake_up+43}
       <ffffffff80133ce0>{__wake_up_common+64} <ffffffff80133d53>{__wake_up+67}
       <ffffffff802dd574>{sock_def_readable+68} <ffffffff803418b7>{unix_stream_sendmsg+711}
       <ffffffff802d9de9>{sock_sendmsg+297} <ffffffff8015d5fc>{find_get_page+92}
       <ffffffff8015e4dc>{filemap_nopage+396} <ffffffff8016ecd2>{handle_mm_fault+418}
       <ffffffff8014eec0>{autoremove_wake_function+0} <ffffffff802d9b00>{sockfd_lookup+32}
       <ffffffff802db6b9>{sys_sendto+233} <ffffffff8019494b>{do_ioctl+123}
       <ffffffff80194cab>{vfs_ioctl+827} <ffffffff80194d3a>{sys_ioctl+106}
       <ffffffff8010e51a>{system_call+126}
Kernel panic - not syncing: Aiee, killing interrupt handler!


Expected Results: The driver should take the link down,
and bring it back up when the connection is restored.

Additional info:

Dual-processor AMD Opteron; the kernel is 64-bit.

Comment 1 Tom Mitchell 2005-05-17 23:00:23 UTC
Created attachment 114489 [details]
Full console dump...

Just in case I pruned the text in the original post too much,
here is the full console listing of the Oops.

Comment 2 John W. Linville 2005-05-18 15:24:09 UTC
I believe this issue is fixed in the test kernels here: 
 
   http://people.redhat.com/linville/kernels/fc3/ 
 
Wanna give them a try to confirm?  Thanks! 

Comment 3 Tom Mitchell 2005-05-18 23:05:40 UTC
John,
I have 2.6.11-1.21_FC3.jwltest.9smp installed and running now.
I will poke and prod and try to reproduce the Oops.

thanks,
mitch

Comment 4 Tom Mitchell 2005-05-19 01:54:14 UTC
Uptime is about 4 hours now and the cable/connector is clearly bad.

While networking was worthless with this link up,
I was able to debug it, bring it down, and bring up the other
interface from the console.

# grep e1000_watchdog_task /var/log/messages | head -1
May 18 14:56:26 box-12 kernel: e1000: eth0: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex
# grep e1000_watchdog_task /var/log/messages | tail -1
May 18 16:51:30 box-12 kernel: e1000: eth1: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex

How many times in these two hours, you ask? ;-)
# grep e1000_watchdog_task /var/log/messages | wc
    617    8648   56814
Some are Up messages and some are Down messages, so divide by two.
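To count Up and Down events separately instead of halving the `wc` line count, a small helper along these lines should work. This is a sketch under stated assumptions: the function name `count_link_events` is hypothetical, and it assumes the Down messages use the same `e1000_watchdog_task: NIC Link is Down` wording as the Up lines quoted above (only Up lines appear in this report).

```shell
#!/bin/sh
# Hypothetical helper: count e1000 link Up/Down messages in a syslog file.
# Assumption: messages read "e1000_watchdog_task: NIC Link is Up ..." or
# "e1000_watchdog_task: NIC Link is Down" (Down wording not shown in this report).
count_link_events() {
    log="${1:-/var/log/messages}"
    # grep -c prints the number of matching lines, including 0 when none match.
    up=$(grep -c 'e1000_watchdog_task: NIC Link is Up' "$log")
    down=$(grep -c 'e1000_watchdog_task: NIC Link is Down' "$log")
    echo "up=$up down=$down"
}
```

Usage: `count_link_events /var/log/messages`. If Up and Down strictly alternate, the two counts should differ by at most one.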

So it appears that my issue has been addressed.

Thanks,
mitch

Comment 5 John W. Linville 2005-05-19 17:44:46 UTC
Excellent!  Now, get yourself another cable and you should be set... :-)