Bug 79103

Summary: [tg3] eth0: Error, poll already scheduled
Product: [Retired] Red Hat Linux Reporter: Rick Gaudette <rickg>
Component: kernel    Assignee: Jeff Garzik <jgarzik>
Status: CLOSED RAWHIDE QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.2    CC: peterm
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2003-01-20 21:03:27 UTC Type: ---

Description Rick Gaudette 2002-12-05 18:32:11 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003

Description of problem:
We just installed kernel-2.4.18-18.7.x and are now seeing thousands of
"kernel: tg3: eth0: Error, poll already scheduled" messages.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Boot system
2. Read over network interface

Actual Results:  /var/log/messages fills up with:
Dec  5 09:38:48 simba kernel: tg3: eth0: Error, poll already scheduled
Dec  5 09:39:19 simba last message repeated 52 times
Dec  5 09:40:23 simba last message repeated 140 times
Dec  5 09:41:38 simba last message repeated 101 times
Dec  5 09:42:39 simba last message repeated 230 times
Dec  5 09:43:43 simba last message repeated 161 times
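The "last message repeated N times" lines are syslogd collapsing consecutive identical messages, so the number of lines in /var/log/messages understates the real error count. A rough Python sketch that expands the repeats, assuming the classic syslogd line format shown above (count_poll_errors is a hypothetical helper, not part of any tool):

```python
import re

# The tg3 error text as it appears in the log excerpt above.
ERROR_RE = re.compile(r"tg3: eth0: Error, poll already scheduled")
# syslogd's compression line, e.g. "last message repeated 52 times".
REPEAT_RE = re.compile(r"last message repeated (\d+) times")

EXCERPT = """\
Dec  5 09:38:48 simba kernel: tg3: eth0: Error, poll already scheduled
Dec  5 09:39:19 simba last message repeated 52 times
Dec  5 09:40:23 simba last message repeated 140 times
Dec  5 09:41:38 simba last message repeated 101 times
Dec  5 09:42:39 simba last message repeated 230 times
Dec  5 09:43:43 simba last message repeated 161 times
""".splitlines()


def count_poll_errors(lines):
    """Count tg3 poll errors, expanding syslogd "repeated" lines.

    A repeat line is only counted when it directly follows the tg3 error
    (or another repeat of it); repeats of unrelated messages are ignored.
    """
    total = 0
    counting = False  # True while repeat lines still refer to the tg3 error
    for line in lines:
        if ERROR_RE.search(line):
            total += 1
            counting = True
            continue
        match = REPEAT_RE.search(line)
        if match and counting:
            total += int(match.group(1))
        else:
            counting = False
    return total


if __name__ == "__main__":
    print(count_poll_errors(EXCERPT))  # 685 for this excerpt
```

For the five minutes covered by this excerpt that comes to 685 messages (1 + 52 + 140 + 101 + 230 + 161), in line with the "thousands" reported above.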

Expected Results:  No error messages

Additional info:

Comment 1 Rick Gaudette 2002-12-09 04:12:44 UTC
Some more info:
After this error message had repeated for 3 days (several thousand instances), the
machine stopped serving NFS, its only purpose.  The most unusual message in the
log is:

Dec  8 18:11:34 simba rpc.statd[586]: Can't callback simba.colorado.edu (100021,4), giving up.

I attempted to reboot the machine remotely (using shutdown), which only seemed to
cause the system load to rise to 8 and stay there.  I could still log in, so I
tried to reboot using reboot; the machine did not reboot and had to be
power-cycled.

Hope this helps.

Comment 2 Rick Gaudette 2002-12-10 00:53:43 UTC
Some more info that might help:
We only observe this error on a Dell PowerEdge 4600.  Two other P4 machines that
use the 3Com 996BT NIC do not report this error at all.

This is the kernel ID line from the machine that keeps reporting this error:
Dec  8 20:49:40 simba kernel: eth0: Tigon3 [partno(BCM95700A6) rev 7104 PHY(5401)] (PCI:66MHz:64-bit) 10/100/1000BaseT Ethernet 00:06:5b:88:dc:7f

This is the kernel ID line from one of the machines that do not crash:
Dec  8 18:06:44 monalisa kernel: eth0: Tigon3 [partno(3C996B-T) rev 0105 PHY(5701)] (PCI:33MHz:32-bit) 10/100/1000BaseT Ethernet 00:04:76:f1:0a:86
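To compare the two probe messages field by field, a small parser helps. This is only a sketch: the regular expression is written against the two lines quoted above and may not match the probe message printed by other tg3 versions, and parse_probe / diff_probes are hypothetical names.

```python
import re

# Pattern derived from the two Tigon3 probe lines above; other driver
# versions may format this message differently.
PROBE_RE = re.compile(
    r"(?P<iface>\w+): Tigon3 "
    r"\[partno\((?P<partno>[^)]+)\) rev (?P<rev>[0-9a-fA-F]+) "
    r"PHY\((?P<phy>[^)]+)\)\] "
    r"\(PCI:(?P<clock>\d+MHz):(?P<width>\d+-bit)\)"
)

SIMBA = ("Dec  8 20:49:40 simba kernel: eth0: Tigon3 [partno(BCM95700A6) rev 7104 "
         "PHY(5401)] (PCI:66MHz:64-bit) 10/100/1000BaseT Ethernet 00:06:5b:88:dc:7f")
MONALISA = ("Dec  8 18:06:44 monalisa kernel: eth0: Tigon3 [partno(3C996B-T) rev 0105 "
            "PHY(5701)] (PCI:33MHz:32-bit) 10/100/1000BaseT Ethernet 00:04:76:f1:0a:86")


def parse_probe(line):
    """Return the probe fields as a dict, or None if the line does not match."""
    match = PROBE_RE.search(line)
    return match.groupdict() if match else None


def diff_probes(a, b):
    """Return {field: (a_value, b_value)} for every field that differs."""
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}


if __name__ == "__main__":
    differences = diff_probes(parse_probe(SIMBA), parse_probe(MONALISA))
    for field, (left, right) in differences.items():
        print(f"{field}: simba={left} monalisa={right}")
```

For these two lines every field except the interface name differs: the part number, chip revision, PHY, and the PCI bus (66MHz/64-bit on simba versus 33MHz/32-bit on monalisa).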



Comment 3 Rick Gaudette 2002-12-10 16:55:02 UTC
I spoke too soon.  All three machines have now crashed with the 2.4.18-18.7.x
kernel.

Could this be related to bug 69920?


Comment 4 Jeff Garzik 2002-12-10 17:00:08 UTC
Yes, the kernel 2.4.18-18.7.x tg3 crash is the one solved by driver version 1.2,
mentioned in bug 69920 :)


Comment 5 Jeff Garzik 2002-12-31 22:38:10 UTC
Should be fixed in experiment #1 rpms, below.



[snip comment from other bug report]
To all still experiencing problems,

1) Please boot with "noapic" on the kernel command line.  You can run "cat
/proc/cmdline" to confirm that it is actually set.
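The same check can be scripted, for example when verifying several machines. A minimal Python sketch; has_kernel_option is a hypothetical helper, and it splits on whitespace so that "noapic" is matched as a whole option rather than as a substring of some longer option:

```python
def has_kernel_option(cmdline, option):
    """Return True if `option` appears on the kernel command line.

    Matches a bare flag ("noapic") or a key=value form ("option=..."),
    but not a longer option that merely starts with the same letters.
    """
    for token in cmdline.split():
        if token == option or token.startswith(option + "="):
            return True
    return False


def main(path="/proc/cmdline"):
    # /proc/cmdline holds the parameters the running kernel was booted with.
    try:
        with open(path) as f:
            cmdline = f.read()
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
        return
    if has_kernel_option(cmdline, "noapic"):
        print("noapic is set")
    else:
        print("noapic is NOT set on the running kernel")


if __name__ == "__main__":
    main()
```

Adding noapic to the boot loader configuration only takes effect at the next boot, which is why reading /proc/cmdline on the running system is the reliable check.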

2) I have posted some new rpms for testing, based on the latest errata:

latest production tg3 release, 1.2a, built into unofficial rpms:
http://people.redhat.com/jgarzik/tg3/tg3-1.2a/rpms/

but I would like people to test my experiment which should provide additional
stability:
http://people.redhat.com/jgarzik/tg3/tg3-1.2a/exp1-rpms/

...and if that doesn't work for people, fall back to experiment 2:
http://people.redhat.com/jgarzik/tg3/tg3-1.2a/exp2-rpms/

Feedback requested!  On several systems, there is evidence that the lock-ups are
not directly related to the driver but rather to the system board.  So please make
sure to attach 'dmesg' and 'lspci -vvv' output to future bug reports.
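Both outputs can be gathered into a single attachment. A sketch in Python, assuming dmesg and lspci are on the PATH; collect_report and the script name collect.py are placeholders. lspci shows some configuration-space details only to root, so running it as root gives the most complete -vvv output.

```python
import subprocess

# The two commands requested above.
COMMANDS = [["dmesg"], ["lspci", "-vvv"]]


def collect_report(commands=COMMANDS):
    """Run each command and return all output as one text block with headers."""
    sections = []
    for cmd in commands:
        try:
            result = subprocess.run(cmd, capture_output=True, text=True)
            output = result.stdout + result.stderr
        except OSError as exc:  # e.g. the command is not installed
            output = f"{cmd[0]}: {exc}\n"
        sections.append(f"===== {' '.join(cmd)} =====\n{output}")
    return "\n".join(sections)


if __name__ == "__main__":
    # Redirect to a file and attach it, e.g.: python3 collect.py > tg3-report.txt
    print(collect_report())
```

Error output (such as a permission message from dmesg) is kept in the report rather than dropped, so the attachment also shows when a command could not run.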


Comment 6 Rick Gaudette 2002-12-31 23:15:19 UTC
Hi Jeff,

Are these changes on top of the 2.4.18-19.7.x kernel?  We just put that on
one of our P4 machines with a 996BT NIC, and we'll see how it runs for a few days.

Comment 7 Jeff Garzik 2003-01-20 21:03:27 UTC
This is fixed in the current 8.1 beta kernels, and also in the
unofficial-errata-kernel version "aragorn2" that I have posted at
http://people.redhat.com/jgarzik/pub/
The "aragorn2" kernels are based on Red Hat's latest official errata kernel,
2.4.18-19.8.0, with the addition of an updated e1000 driver and a bug-fixed
tg3 driver.