Bug 79103 - [tg3] eth0: Error, poll already scheduled
Summary: [tg3] eth0: Error, poll already scheduled
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.2
Hardware: i686
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Jeff Garzik
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2002-12-05 18:32 UTC by Rick Gaudette
Modified: 2013-07-03 02:08 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2003-01-20 21:03:27 UTC
Embargoed:



Description Rick Gaudette 2002-12-05 18:32:11 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003

Description of problem:
We just installed kernel-2.4.18-18.7.x and we are now seeing thousands of
"kernel: tg3: eth0: Error, poll already scheduled" messages.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Boot system
2. Read over network interface

Actual Results:  /var/log/messages fills up with:
Dec  5 09:38:48 simba kernel: tg3: eth0: Error, poll already scheduled
Dec  5 09:39:19 simba last message repeated 52 times
Dec  5 09:40:23 simba last message repeated 140 times
Dec  5 09:41:38 simba last message repeated 101 times
Dec  5 09:42:39 simba last message repeated 230 times
Dec  5 09:43:43 simba last message repeated 161 times

Expected Results:  No Error message

Additional info:
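
To put a number on how often the message fires, the log excerpt above can be tallied with a short script. This is only a sketch: the function name and the sample text are illustrative, and it assumes the syslog convention that a "last message repeated N times" line refers to the message logged immediately before it.

```python
import re

MESSAGE = "tg3: eth0: Error, poll already scheduled"
REPEAT_RE = re.compile(r"last message repeated (\d+) times")


def count_occurrences(lines, message=MESSAGE):
    """Count a message, including syslog 'last message repeated N times' lines."""
    total = 0
    previous_matched = False
    for line in lines:
        repeat = REPEAT_RE.search(line)
        if repeat:
            # A repeat line only counts if the message before it was ours.
            if previous_matched:
                total += int(repeat.group(1))
            continue  # the "last message" is unchanged by a repeat line
        previous_matched = message in line
        if previous_matched:
            total += 1
    return total


# Sample taken from the excerpt in this report (wrapped line rejoined).
SAMPLE = """\
Dec  5 09:38:48 simba kernel: tg3: eth0: Error, poll already scheduled
Dec  5 09:39:19 simba last message repeated 52 times
Dec  5 09:40:23 simba last message repeated 140 times
Dec  5 09:41:38 simba last message repeated 101 times
Dec  5 09:42:39 simba last message repeated 230 times
Dec  5 09:43:43 simba last message repeated 161 times
"""

if __name__ == "__main__":
    print(count_occurrences(SAMPLE.splitlines()))  # prints 685
```

For the five minutes shown above this gives 685 occurrences, i.e. a little over two per second.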

Comment 1 Rick Gaudette 2002-12-09 04:12:44 UTC
Some more info:
After repeating this error message for 3 days (several thousand instances) the
machine stopped serving NFS, its only purpose.  The message that looks most
unusual in the log is

Dec  8 18:11:34 simba rpc.statd[586]: Can't callback simba.colorado.edu (100021,4), giving up.

I attempted to reboot the machine remotely (using shutdown), which only seemed
to cause the system load to rise to 8 and stay there.  I could still log in and
attempted to reboot using reboot, but the machine did not reboot and needed to
be power-cycled.

Hope this helps.

Comment 2 Rick Gaudette 2002-12-10 00:53:43 UTC
Some more info that might help:
We only observe this error on a Dell PowerEdge 4600.  Two other P4 machines that
use the 3Com 996BT NIC do not report this error at all.

This is the kernel ID line from the machine that keeps reporting this error:
Dec  8 20:49:40 simba kernel: eth0: Tigon3 [partno(BCM95700A6) rev 7104 PHY(5401)] (PCI:66MHz:64-bit) 10/100/1000BaseT Ethernet 00:06:5b:88:dc:7f

This is the kernel ID line from one of the machines that do not crash:
Dec  8 18:06:44 monalisa kernel: eth0: Tigon3 [partno(3C996B-T) rev 0105 PHY(5701)] (PCI:33MHz:32-bit) 10/100/1000BaseT Ethernet 00:04:76:f1:0a:86



Comment 3 Rick Gaudette 2002-12-10 16:55:02 UTC
I spoke too soon.  All three machines have now crashed with the 2.4.18-18.7.x
kernel.

Could this be related to bug 69920?


Comment 4 Jeff Garzik 2002-12-10 17:00:08 UTC
Yes, the kernel 2.4.18-18.7.x tg3 crash is the one solved by driver version 1.2,
mentioned in bug 69920 :)


Comment 5 Jeff Garzik 2002-12-31 22:38:10 UTC
Should be fixed in experiment #1 rpms, below.



[snip comment from other bug report]
To all still experiencing problems,

1) please boot with "noapic" on the kernel command line.  You can run "cat
/proc/cmdline" to check for sure.

2) I have posted some new rpms for testing, based on the latest errata:

latest production tg3 release, 1.2a, built into unofficial rpms:
http://people.redhat.com/jgarzik/tg3/tg3-1.2a/rpms/

but I would like people to test my experiment which should provide additional
stability:
http://people.redhat.com/jgarzik/tg3/tg3-1.2a/exp1-rpms/

...and if that doesn't work for people, fall back to experiment 2:
http://people.redhat.com/jgarzik/tg3/tg3-1.2a/exp2-rpms/
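
For anyone who wants to script the check from step 1, a minimal sketch follows. It does the same thing as reading "cat /proc/cmdline" by eye; the helper names are illustrative and not part of any existing tool.

```python
def has_boot_param(cmdline, param="noapic"):
    """Return True if `param` appears as a whole word on the kernel command line."""
    return param in cmdline.split()


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    try:
        cmdline = read_cmdline()
    except OSError as exc:
        print(f"could not read /proc/cmdline: {exc}")
    else:
        print("noapic present" if has_boot_param(cmdline) else "noapic missing")
```

Matching on whole words avoids a false positive from a longer option that merely contains the string "noapic".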

Feedback requested!  On several systems, there is evidence that the lock-ups are
not directly related to the driver but more to the system board.  So please make
sure to attach 'dmesg' and 'lspci -vvv' output in future bug reports.
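
One way to gather the two requested outputs into files ready for attaching is sketched below. The output file names are arbitrary choices, and the script assumes dmesg and lspci are installed and on the PATH; lspci may need root to show full -vvv detail.

```python
import os
import subprocess

# Output file name -> command to run. The file names are hypothetical.
COMMANDS = {
    "dmesg.txt": ["dmesg"],
    "lspci.txt": ["lspci", "-vvv"],
}


def capture(argv):
    """Run a command and return its stdout, or a note if it cannot be run."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True, check=False)
    except OSError as exc:
        return f"[could not run {' '.join(argv)}: {exc}]\n"
    return result.stdout


def collect(commands=COMMANDS, runner=capture):
    """Return a dict mapping each output file name to the captured text."""
    return {name: runner(argv) for name, argv in commands.items()}


def write_reports(directory="."):
    """Write one file per command into `directory` and return the paths."""
    paths = []
    for name, text in collect().items():
        path = os.path.join(directory, name)
        with open(path, "w") as f:
            f.write(text)
        paths.append(path)
    return paths
```

Calling write_reports() on the affected machine leaves dmesg.txt and lspci.txt in the current directory.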


Comment 6 Rick Gaudette 2002-12-31 23:15:19 UTC
Hi Jeff,

Are these additional changes to the 2.4.18-19.7.x kernel?  We just put that on
one of our P4 machines with a 996BT NIC and we'll see how that runs for a few days.

Comment 7 Jeff Garzik 2003-01-20 21:03:27 UTC
This is fixed in the current 8.1 beta kernels, and also in the unofficial
errata kernel version "aragorn2" that I have posted at
http://people.redhat.com/jgarzik/pub/
The "aragorn2" kernels are based on Red Hat's latest official errata kernel,
2.4.18-19.8.0, with the addition of an updated e1000 and a bugfixed tg3 driver.


