Bug 158030 - e1000 network drivers sleeps when it should not....
Product: Fedora
Classification: Fedora
Component: kernel
Hardware: x86_64 Linux
Priority: medium, Severity: high
Assigned To: John W. Linville
Brian Brock
Reported: 2005-05-17 18:57 EDT by Tom Mitchell
Modified: 2007-11-30 17:11 EST (History)
Doc Type: Bug Fix
Last Closed: 2005-05-19 13:44:46 EDT

Attachments
Full console dump... (9.88 KB, text/plain)
2005-05-17 19:00 EDT, Tom Mitchell

Description Tom Mitchell 2005-05-17 18:57:15 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.7) Gecko/20050416 Fedora/1.0.3-1.3.1 Firefox/1.0.3

Description of problem:
The e1000 network driver goes to sleep when it should not,
and later the nmi_watchdog kicks things over.

A stack trace looks like --

       <ffffffff8010fe68>{oops_end+40} <ffffffff8010fe61>{oops_end+33}
       <ffffffff80122afb>{do_page_fault+1963} <ffffffff80211910>{vgacon_cursor+0}
       <ffffffff80138a8d>{release_console_sem+333} <ffffffff80138ac9>{release_console_sem+393}
       <ffffffff80138d30>{vprintk+528} <ffffffff8010f041>{error_exit+0}
       <ffffffff80211910>{vgacon_cursor+0} <ffffffff80131ab4>{dequeue_task+4}
       <ffffffff80131e05>{deactivate_task+21} <ffffffff803460d0>{schedule+512}
       <ffffffff80112d8a>{timer_interrupt+1066} <ffffffff8015bd3c>{handle_IRQ_event+44}
       <ffffffff8015bebc>{__do_IRQ+332} <ffffffff801419cd>{__mod_timer+317}
       <ffffffff803477ad>{schedule_timeout+253} <ffffffff801425a0>{process_timeout+0}
       <ffffffff8014260d>{msleep+93} <ffffffff880b7cc8>{:e1000:e1000_config_dsp_after_link_change+744}
       <ffffffff880b4d2a>{:e1000:e1000_watchdog+42} <ffffffff801419cd>{__mod_timer+317}
       <ffffffff880b4d00>{:e1000:e1000_watchdog+0} <ffffffff80141e7e>{run_timer_softirq+398}
       <ffffffff8013daf1>{__do_softirq+113} <ffffffff8013dba5>{do_softirq+53}
       <ffffffff8010eea5>{apic_timer_interrupt+133}  <EOI> <ffffffff8010c720>{default_idle+0}
       <ffffffff8010c740>{default_idle+32} <ffffffff8010c88f>{cpu_idle+63}

The trick to reproducing this is to take a network link connector that
is not up to snuff and wiggle it. The link goes down as expected,
but the driver does an unsafe sleep and the watchdog cries wolf
(as it should).... Does a wolf sound like "Aiee, Aiee, Aiee" in the night?
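
In the trace above, msleep is reached from e1000_watchdog while run_timer_softirq is on the stack, which is the unsafe sleep in question. As a rough sketch only, a saved console dump can be checked for the same pattern with grep. The function name trace_sleeps_in_softirq and the file name console-dump.txt are hypothetical, and the check is a heuristic: it only tests that both kinds of frame appear in the file, not their order.

```shell
# Hypothetical helper: succeed if a saved trace contains a sleeping call
# (msleep or schedule_timeout) as well as a timer softirq frame.
# Heuristic only: it does not verify the order of the frames.
trace_sleeps_in_softirq() {
    tracefile=$1
    grep -Eq 'msleep|schedule_timeout' "$tracefile" &&
        grep -q 'run_timer_softirq' "$tracefile"
}

# Usage (file name is an assumption):
#   trace_sleeps_in_softirq console-dump.txt && echo "sleep under timer softirq"
```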

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Activate ethN on top of the e1000 driver.
2. Plug/unplug the connector.
3. System panics with an Oops.

Actual Results:   <3>Debug: sleeping function called from invalid context at include/linux/rwsem.h:43
in_atomic():1, irqs_disabled():0

Call Trace:<ffffffff801327cf>{__might_sleep+191} <ffffffff80139349>{profile_task_exit+41}
<ffffffff8013ac72>{do_exit+34} <ffffffff8010fe68>{oops_end+40}
<ffffffff8011005d>{die_nmi+173} <ffffffff8011b26c>{nmi_watchdog_tick+220}
<ffffffff80110ab2>{default_do_nmi+130} <ffffffff8011b346>{do_nmi+134}
<ffffffff8010f423>{paranoid_exit+0} <ffffffff80348369>{.text.lock.spinlock+2}
 <EOE> <ffffffff8013198a>{task_rq_lock+74} <ffffffff80131fcb>{try_to_wake_up+43} 
       <ffffffff80133ce0>{__wake_up_common+64} <ffffffff80133d53>{__wake_up+67}
       <ffffffff802dd574>{sock_def_readable+68} <ffffffff803418b7>{unix_stream_sendmsg+711}
       <ffffffff802d9de9>{sock_sendmsg+297} <ffffffff8015d5fc>{find_get_page+92}
       <ffffffff8015e4dc>{filemap_nopage+396} <ffffffff8016ecd2>{handle_mm_fault+418}
       <ffffffff8014eec0>{autoremove_wake_function+0} <ffffffff802d9b00>{sockfd_lookup+32}
       <ffffffff802db6b9>{sys_sendto+233} <ffffffff8019494b>{do_ioctl+123}
       <ffffffff80194cab>{vfs_ioctl+827} <ffffffff80194d3a>{sys_ioctl+106}
Kernel panic - not syncing: Aiee, killing interrupt handler!

Expected Results:  The link should go down,
and come back up when the connection is restored.

Additional info:

Dual processor, AMD Opteron, Kernel is 64 bit...
Comment 1 Tom Mitchell 2005-05-17 19:00:23 EDT
Created attachment 114489 [details]
Full console dump...

Just in case I pruned the text in the original post too much,
here is the full console listing of the Oops.
Comment 2 John W. Linville 2005-05-18 11:24:09 EDT
I believe this issue is fixed in the test kernels here: 
Wanna give them a try to confirm?  Thanks! 
Comment 3 Tom Mitchell 2005-05-18 19:05:40 EDT
I have 2.6.11-1.21_FC3.jwltest.9smp installed and running now.
I will poke and prod and try to reproduce the Oops.

Comment 4 Tom Mitchell 2005-05-18 21:54:14 EDT
Uptime is about 4 hours now and the cable/connector is clearly bad.

While networking was worthless with this link up,
I was able to debug it, bring it down, and bring up the other
link from the console......

# grep e1000_watchdog_task /var/log/messages | head -1
May 18 14:56:26 box-12 kernel: e1000: eth0: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex
# grep e1000_watchdog_task /var/log/messages | tail  -1
May 18 16:51:30 box-12 kernel: e1000: eth1: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex

How many times in these two hours, you ask....  ;-)
# grep e1000_watchdog_task /var/log/messages | wc
    617    8648   56814
Some are up and some are down messages, so divide by two.
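
Rather than halving the total, the up and down messages can be counted separately. This is a minimal sketch, not what was run at the time: count_link_events is a hypothetical helper name, and it assumes the driver logs link loss as "NIC Link is Down" in the same e1000_watchdog_task format as the "NIC Link is Up" lines shown above.

```shell
# Hypothetical helper: count link-up and link-down messages logged by
# e1000_watchdog_task in the given log file.
# Assumption: the link-down message text is "NIC Link is Down".
count_link_events() {
    logfile=$1
    up=$(grep 'e1000_watchdog_task' "$logfile" | grep -c 'NIC Link is Up')
    down=$(grep 'e1000_watchdog_task' "$logfile" | grep -c 'NIC Link is Down')
    echo "up=$up down=$down"
}

# Usage:
#   count_link_events /var/log/messages
```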

So it appears that my issue has been addressed.

Comment 5 John W. Linville 2005-05-19 13:44:46 EDT
Excellent!  Now, get yourself another cable and you should be set... :-) 
