From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.7) Gecko/20050416 Fedora/1.0.3-1.3.1 Firefox/1.0.3

Description of problem:
The e1000 network driver goes to sleep when it should not, and later the nmi_watchdog kicks things over. A stack trace looks like --

<ffffffff8010fe68>{oops_end+40}
<ffffffff8010fe61>{oops_end+33}
<ffffffff80122afb>{do_page_fault+1963}
<ffffffff80211910>{vgacon_cursor+0}
<ffffffff80138a8d>{release_console_sem+333}
<ffffffff80138ac9>{release_console_sem+393}
<ffffffff80138d30>{vprintk+528}
<ffffffff8010f041>{error_exit+0}
<ffffffff80211910>{vgacon_cursor+0}
<ffffffff80131ab4>{dequeue_task+4}
<ffffffff80131e05>{deactivate_task+21}
<ffffffff803460d0>{schedule+512}
<ffffffff80112d8a>{timer_interrupt+1066}
<ffffffff8015bd3c>{handle_IRQ_event+44}
<ffffffff8015bebc>{__do_IRQ+332}
<ffffffff801419cd>{__mod_timer+317}
<ffffffff803477ad>{schedule_timeout+253}
<ffffffff801425a0>{process_timeout+0}
<ffffffff8014260d>{msleep+93}
<ffffffff880b7cc8>{:e1000:e1000_config_dsp_after_link_change+744}
<ffffffff880b9cc1>{:e1000:e1000_check_for_link+273}
<ffffffff880b4d2a>{:e1000:e1000_watchdog+42}
<ffffffff801419cd>{__mod_timer+317}
<ffffffff880b4d00>{:e1000:e1000_watchdog+0}
<ffffffff80141e7e>{run_timer_softirq+398}
<ffffffff8013daf1>{__do_softirq+113}
<ffffffff8013dba5>{do_softirq+53}
<ffffffff8010eea5>{apic_timer_interrupt+133}
<EOI>
<ffffffff8010c720>{default_idle+0}
<ffffffff8010c740>{default_idle+32}
<ffffffff8010c88f>{cpu_idle+63}

The trick to reproduce this is to take a network link connector that is not up to snuff and wiggle it. The link goes down as expected, but the driver does an unsafe sleep (msleep called from the e1000_watchdog timer, which runs in softirq context) and the watchdog cries wolf (as it should).... Does a wolf sound like "Aiee, Aiee, Aiee" in the night?

Version-Release number of selected component (if applicable):
2.6.11-1.14_FC3smp

How reproducible:
Always

Steps to Reproduce:
1. Activate ethN on top of the e1000 driver
2. Plug/unplug the connector
3. System panics. Oops...
Actual Results:
<3>Debug: sleeping function called from invalid context at include/linux/rwsem.h:43
in_atomic():1, irqs_disabled():0

Call Trace:
<ffffffff801327cf>{__might_sleep+191}
<ffffffff80139349>{profile_task_exit+41}
<ffffffff8013ac72>{do_exit+34}
<ffffffff8010fe68>{oops_end+40}
<ffffffff8011005d>{die_nmi+173}
<ffffffff8011b26c>{nmi_watchdog_tick+220}
<ffffffff80110ab2>{default_do_nmi+130}
<ffffffff8011b346>{do_nmi+134}
<ffffffff8010f423>{paranoid_exit+0}
<ffffffff80348369>{.text.lock.spinlock+2}
<EOE>
<ffffffff8013198a>{task_rq_lock+74}
<ffffffff80131fcb>{try_to_wake_up+43}
<ffffffff80133ce0>{__wake_up_common+64}
<ffffffff80133d53>{__wake_up+67}
<ffffffff802dd574>{sock_def_readable+68}
<ffffffff803418b7>{unix_stream_sendmsg+711}
<ffffffff802d9de9>{sock_sendmsg+297}
<ffffffff8015d5fc>{find_get_page+92}
<ffffffff8015e4dc>{filemap_nopage+396}
<ffffffff8016ecd2>{handle_mm_fault+418}
<ffffffff8014eec0>{autoremove_wake_function+0}
<ffffffff802d9b00>{sockfd_lookup+32}
<ffffffff802db6b9>{sys_sendto+233}
<ffffffff8019494b>{do_ioctl+123}
<ffffffff80194cab>{vfs_ioctl+827}
<ffffffff80194d3a>{sys_ioctl+106}
<ffffffff8010e51a>{system_call+126}

Kernel panic - not syncing: Aiee, killing interrupt handler!

Expected Results:
Should down the link.... and up the link when restored.

Additional info:
Dual processor, AMD Opteron, Kernel is 64 bit...
Created attachment 114489
Full console dump...

Just in case I pruned the text in the original post too much, here is the full console listing of the Oops.
I believe this issue is fixed in the test kernels here: http://people.redhat.com/linville/kernels/fc3/ Wanna give them a try to confirm? Thanks!
John, I have 2.6.11-1.21_FC3.jwltest.9smp installed and running now. I will poke and prod and try to reproduce the Oops. thanks, mitch
Uptime is about 4 hours now and the cable/connector is clearly bad. While networking was worthless when I had this link up, I was able to debug it, bring it down, and bring up the other from the console......

# grep e1000_watchdog_task /var/log/messages | head -1
May 18 14:56:26 box-12 kernel: e1000: eth0: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex
# grep e1000_watchdog_task /var/log/messages | tail -1
May 18 16:51:30 box-12 kernel: e1000: eth1: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex

How many times in these two hours, you ask.... ;-)

# grep e1000_watchdog_task /var/log/messages | wc
617 8648 56814

Some are up and some are down messages, so divide in half.

So it appears that my issue has been addressed.

Thanks,
mitch
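For anyone who wants an exact per-interface split instead of dividing the wc total in half, here is a minimal Python sketch. It assumes the log lines follow the "NIC Link is Up" format quoted above; the "NIC Link is Down" wording and the helper name count_link_events are assumptions for illustration, not something taken from this report.

```python
import re
from collections import Counter

# Pattern modelled on the "NIC Link is Up" lines quoted in this report.
# The "Down" wording is an assumption; adjust it to match the real log.
LINK_EVENT = re.compile(
    r"e1000: (?P<iface>\S+): e1000_watchdog_task: NIC Link is (?P<state>Up|Down)"
)


def count_link_events(lines):
    """Count e1000 link messages, keyed by (interface, state).

    `lines` is any iterable of log lines, for example an open file.
    Lines that do not match the pattern are ignored.
    """
    counts = Counter()
    for line in lines:
        match = LINK_EVENT.search(line)
        if match:
            counts[(match.group("iface"), match.group("state"))] += 1
    return counts
```

To use it, pass an open log file, e.g. count_link_events(open("/var/log/messages", errors="replace")), and print the resulting counts per interface and state.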
Excellent! Now, get yourself another cable and you should be set... :-)