Bug 79103

Summary:	[tg3] eth0: Error, poll already scheduled
Product:	[Retired] Red Hat Linux	Reporter:	Rick Gaudette <rickg>
Component:	kernel	Assignee:	Jeff Garzik <jgarzik>
Status:	CLOSED RAWHIDE	QA Contact:	Brian Brock <bbrock>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	7.2	CC:	peterm
Target Milestone:	---
Target Release:	---
Hardware:	i686
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2003-01-20 21:03:27 UTC	Type:	---

Description Rick Gaudette 2002-12-05 18:32:11 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003

Description of problem:
We just installed kernel-2.4.18-18.7.x and we are now seeing thousands of
"kernel: tg3: eth0: Error, poll already scheduled" messages.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Boot system
2. Read over network interface

Actual Results:  /var/log/messages fills up with:
Dec  5 09:38:48 simba kernel: tg3: eth0: Error, poll already scheduled
Dec  5 09:39:19 simba last message repeated 52 times
Dec  5 09:40:23 simba last message repeated 140 times
Dec  5 09:41:38 simba last message repeated 101 times
Dec  5 09:42:39 simba last message repeated 230 times
Dec  5 09:43:43 simba last message repeated 161 times
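
To put a number on the excerpt above, here is a minimal sketch that tallies how often a message was logged, expanding the "last message repeated N times" lines. It assumes each repeat line refers to the most recent ordinary message and counts N additional occurrences; the name count_occurrences is made up for illustration and is not part of any syslog tool.

```python
import re

# Matches the compression line syslogd writes for consecutive duplicates.
REPEAT_RE = re.compile(r"last message repeated (\d+) times?")


def count_occurrences(lines, needle):
    """Count log entries containing `needle`, expanding repeat lines.

    Assumption: a "last message repeated N times" line refers to the most
    recent ordinary message and stands for N additional occurrences of it.
    """
    total = 0
    previous_matched = False  # did the last ordinary message contain needle?
    for line in lines:
        repeat = REPEAT_RE.search(line)
        if repeat:
            if previous_matched:
                total += int(repeat.group(1))
            continue
        previous_matched = needle in line
        if previous_matched:
            total += 1
    return total


# The excerpt quoted in this report.
sample = [
    "Dec  5 09:38:48 simba kernel: tg3: eth0: Error, poll already scheduled",
    "Dec  5 09:39:19 simba last message repeated 52 times",
    "Dec  5 09:40:23 simba last message repeated 140 times",
    "Dec  5 09:41:38 simba last message repeated 101 times",
    "Dec  5 09:42:39 simba last message repeated 230 times",
    "Dec  5 09:43:43 simba last message repeated 161 times",
]

print(count_occurrences(sample, "poll already scheduled"))  # prints 685
```

Under that assumption, the excerpt corresponds to 685 occurrences in roughly five minutes.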

Expected Results:  No Error message

Additional info:

Comment 1 Rick Gaudette 2002-12-09 04:12:44 UTC

Some more info:
After repeating this error message for 3 days (several thousand instances) the
machine stopped serving NFS, its only purpose.  The message that looks most
unusual in the log is

Dec  8 18:11:34 simba rpc.statd[586]: Can't callback simba.colorado.edu (100021,4), giving up.

I attempted to reboot the machine remotely (using shutdown), which only seemed to
cause the system load to rise to 8 and stay there.  I could still log in and
attempted to reboot using reboot, but the machine did not reboot and needed to be
power-cycled.

Hope this helps.

Comment 2 Rick Gaudette 2002-12-10 00:53:43 UTC

Some more info that might help:
We only observe this error on a Dell PowerEdge 4600.  Two other P4 machines that
use the 3Com 996BT NIC do not report this error at all.

This is the kernel ID line from the machine that keeps reporting this error:
Dec  8 20:49:40 simba kernel: eth0: Tigon3 [partno(BCM95700A6) rev 7104 PHY(5401)] (PCI:66MHz:64-bit) 10/100/1000BaseT Ethernet 00:06:5b:88:dc:7f

This is the kernel ID line from one of the machines that do not crash:
Dec  8 18:06:44 monalisa kernel: eth0: Tigon3 [partno(3C996B-T) rev 0105 PHY(5701)] (PCI:33MHz:32-bit) 10/100/1000BaseT Ethernet 00:04:76:f1:0a:86

Comment 3 Rick Gaudette 2002-12-10 16:55:02 UTC

I spoke too soon.  All three machines have now crashed with the 2.4.18-18.7.x
kernel.

Could this be related to bug 69920?

Comment 4 Jeff Garzik 2002-12-10 17:00:08 UTC

Yes, the kernel 2.4.18-18.7.x tg3 crash is the one solved by driver version 1.2,
mentioned in bug 69920 :)

Comment 5 Jeff Garzik 2002-12-31 22:38:10 UTC

Should be fixed in experiment #1 rpms, below.



[snip comment from other bug report]
To all still experiencing problems,

1) please boot with "noapic" on the kernel command line.  You can run "cat
/proc/cmdline" to check for sure.

2) I have posted some new rpms for testing, based on the latest errata:

latest production tg3 release, 1.2a, built into unofficial rpms:
http://people.redhat.com/jgarzik/tg3/tg3-1.2a/rpms/

but I would like people to test my experiment which should provide additional
stability:
http://people.redhat.com/jgarzik/tg3/tg3-1.2a/exp1-rpms/

...and if that doesn't work for people, fall back to experiment 2:
http://people.redhat.com/jgarzik/tg3/tg3-1.2a/exp2-rpms/

Feedback requested!  On several systems, there is evidence that the lock-ups are
not directly related to the driver but more to the system board.  So please make
sure to attach 'dmesg' and 'lspci -vvv' output in future bug reports.
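
For anyone scripting the "noapic" check from step 1, a minimal sketch follows. It assumes the kernel command line is a whitespace-separated list of parameters, as /proc/cmdline presents it; the helper names has_boot_flag and read_cmdline, and the example command lines in the comments, are made up for illustration.

```python
def has_boot_flag(cmdline, flag):
    """Return True if `flag` appears as a standalone kernel parameter.

    Splitting on whitespace avoids false positives from substrings,
    e.g. a hypothetical "foo_noapic=1" does not count as "noapic".
    """
    return flag in cmdline.split()


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line (Linux only)."""
    with open(path) as f:
        return f.read().strip()


# Hypothetical command lines, for illustration only.
print(has_boot_flag("ro root=/dev/sda1 noapic", "noapic"))  # prints True
print(has_boot_flag("ro root=/dev/sda1", "noapic"))         # prints False
```

On the affected machine, has_boot_flag(read_cmdline(), "noapic") should return True after rebooting with the option set.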

Comment 6 Rick Gaudette 2002-12-31 23:15:19 UTC

Hi Jeff,

Are these additional changes to the 2.4.18-19.7.x kernel?  We just put that on
one of our P4 machines with a 996BT NIC and we'll see how that runs for a few days.

Comment 7 Jeff Garzik 2003-01-20 21:03:27 UTC

This is fixed in the current 8.1 beta kernels, and also in the
unofficial-errata-kernel version "aragorn2" that I have posted at
http://people.redhat.com/jgarzik/pub/

The "aragorn2" kernels are based on Red Hat's latest official errata kernel,
2.4.18-19.8.0, with the addition of an updated e1000 driver and a bugfixed tg3
driver.