Bug 79103
Summary: | [tg3] eth0: Error, poll already scheduled | ||
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | Rick Gaudette <rickg> |
Component: | kernel | Assignee: | Jeff Garzik <jgarzik> |
Status: | CLOSED RAWHIDE | QA Contact: | Brian Brock <bbrock> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 7.2 | CC: | peterm |
Hardware: | i686 | ||
OS: | Linux | ||
Doc Type: | Bug Fix | ||
Last Closed: | 2003-01-20 21:03:27 UTC | Type: | --- |
Description
Rick Gaudette
2002-12-05 18:32:11 UTC
Some more info: After repeating this error message for 3 days (several thousand instances), the machine stopped serving NFS, its only purpose. The message that looks most unusual in the log is:

Dec 8 18:11:34 simba rpc.statd[586]: Can't callback simba.colorado.edu (100021,4), giving up.

I attempted to reboot the machine remotely (using shutdown), which only seemed to cause the system load to rise to 8 and stay there. I could still log in and attempted to reboot using reboot; the machine did not reboot and needed to be power-cycled. Hope this helps.

Some more info that might help: We only observe this error on a Dell PowerEdge 4600. Two other P4 machines that use the 3Com 996BT NIC do not report this error at all. This is the kernel ID line from the machine that keeps reporting this error:

Dec 8 20:49:40 simba kernel: eth0: Tigon3 [partno(BCM95700A6) rev 7104 PHY(5401)] (PCI:66MHz:64-bit) 10/100/1000BaseT Ethernet 00:06:5b:88:dc:7f

This is the kernel ID line from one of the machines that do not crash:

Dec 8 18:06:44 monalisa kernel: eth0: Tigon3 [partno(3C996B-T) rev 0105 PHY(5701)] (PCI:33MHz:32-bit) 10/100/1000BaseT Ethernet 00:04:76:f1:0a:86

I spoke too soon. All three machines have now crashed with the 2.4.18-18.7.x kernel. Could this be related to 69920?

Yes, the kernel 2.4.18-18.7.x tg3 crash is the one solved by driver version 1.2, mentioned in bug 69920 :) Should be fixed in experiment #1 rpms, below.

[snip comment from other bug report]

To all still experiencing problems,

1) please boot with "noapic" on the kernel command line. You can run "cat /proc/cmdline" to check for sure.

2) I have posted some new rpms for testing, based on the latest errata:

latest production tg3 release, 1.2a, built into unofficial rpms:
http://people.redhat.com/jgarzik/tg3/tg3-1.2a/rpms/

but I would like people to test my experiment, which should provide additional stability:
http://people.redhat.com/jgarzik/tg3/tg3-1.2a/exp1-rpms/

...and if that doesn't work for people, fall back to experiment 2:
http://people.redhat.com/jgarzik/tg3/tg3-1.2a/exp2-rpms/

Feedback requested!

On several systems, there is evidence that the lock-ups are not directly related to the driver but more to the system board. So please make sure to attach 'dmesg' and 'lspci -vvv' output in future bug reports.

Hi Jeff, Are these additional changes to the 2.4.18-19.7.x kernel? We just put that on one of our P4s with a 996BT NIC and we'll see how that runs for a few days.

This is fixed in the current 8.1 beta kernels, and also in the unofficial-errata-kernel version "aragorn2" that I have posted at http://people.redhat.com/jgarzik/pub/

The "aragorn2" kernels are based off of Red Hat's latest official errata kernel, 2.4.18-19.8.0, with the addition of an updated e1000 and bugfixed tg3 driver.
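
For anyone following the "noapic" advice in this report on more than one machine, the /proc/cmdline check can be scripted. This is only a minimal sketch: the helper names `read_cmdline` and `has_boot_flag` are made up for illustration and are not part of any Red Hat tool. It assumes a Linux system where /proc/cmdline is readable.

```python
from pathlib import Path


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Return the kernel command line the running system was booted with."""
    return Path(path).read_text().strip()


def has_boot_flag(cmdline: str, flag: str = "noapic") -> bool:
    """Return True if `flag` appears as a whole, space-separated token.

    Matching whole tokens avoids confusing "noapic" with similar
    parameters such as "nolapic".
    """
    return flag in cmdline.split()
```

On the affected machine, `has_boot_flag(read_cmdline())` should return True once the system has been rebooted with "noapic" added to the boot loader's kernel line.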