Red Hat Bugzilla – Bug 408891
[PATCH] tg3: system re-ordering mem-mapped io causes eth link to go down
Last modified: 2009-01-09 02:29:55 EST
Description of problem:
I got this message in /var/log/messages:
Nov 28 13:35:22 prophecy kernel: tg3: eth0: The system may be
re-ordering memory-mapped I/O cycles to the network device, attempting
to recover. Please report the problem to the driver maintainer and
include system chipset information.
Nov 28 13:35:22 prophecy kernel: tg3: eth0: Link is down.
Nov 28 13:35:24 prophecy kernel: tg3: eth0: Link is up at 100 Mbps, full duplex.
Nov 28 13:35:24 prophecy kernel: tg3: eth0: Flow control is on for TX and
on for RX.
After this I lost my network connection, but it recovered almost immediately. I
know it dropped because my VPN connection went down.
Version-Release number of selected component (if applicable):
I'm running Fedora 8 on a brand-new Dell dual-core box.
$ uname -a
Linux prophecy 126.96.36.199-49.fc8 #1 SMP Thu Nov 8 21:41:26 EST 2007 i686 i686 i386
With this network card (built into the motherboard) and no other cards
installed in the system:
eth0: Tigon3 [partno(BCM95754) rev b002 PHY(5787)] (PCI Express)
10/100/1000Base-T Ethernet 00:1a:a0:cc:6b:cd
I've seen it twice in a couple of weeks of running Fedora 8.
Steps to Reproduce:
Sorry, I'm not sure what the root cause is. I did nothing more than install
Fedora 8 on a new machine and use it to write some code. I don't think the
particular network services I was using have any bearing on this bug.
Actual results:
See the messages included above. The Ethernet link went down for a moment.
Expected results:
Network connectivity should not drop intermittently.
Additional info:
See attachment. I reported this to the tg3 maintainer (Michael Chan), and he
recommended that I file a bug here.
Created attachment 275921
tarball of various commands that may provide insight
Created attachment 276391
I have received similar reports, and my suspicion is that
skb_shinfo(skb)->nr_frags gets changed between ->hard_start_xmit() and TX
completion, when the driver frees the skb. The driver relies on nr_frags to
find the packet boundaries in the TX ring.
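To illustrate why a changed nr_frags breaks the completion walk, here is a minimal, self-contained C model. It is not tg3 code: fake_skb, tx_slot, queue_packet() and complete_tx() are hypothetical stand-ins. The model assumes, following the comment above, that each packet occupies one descriptor for its head plus one per fragment, and that completion re-reads nr_frags to decide how many descriptors belong to the packet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a TX ring, not actual tg3 code. */
#define RING_SIZE 16

struct fake_skb {
    int nr_frags;              /* stands in for skb_shinfo(skb)->nr_frags */
};

struct tx_slot {
    struct fake_skb *skb;      /* set only on a packet's first descriptor */
};

static struct tx_slot ring[RING_SIZE];

/* Transmit side: one descriptor for the head plus one per fragment.
 * Returns the new producer index. */
static int queue_packet(int prod, struct fake_skb *skb)
{
    int i;

    assert(skb != NULL);
    ring[prod % RING_SIZE].skb = skb;
    for (i = 1; i <= skb->nr_frags; i++)
        ring[(prod + i) % RING_SIZE].skb = NULL;
    return prod + 1 + skb->nr_frags;
}

/* Completion side: walk from cons to hw_cons, re-reading nr_frags now
 * to find where each packet ends. Returns the number of packets freed,
 * or -1 once the packet boundaries no longer line up. */
static int complete_tx(int cons, int hw_cons)
{
    int freed = 0;

    while (cons != hw_cons) {
        struct fake_skb *skb = ring[cons % RING_SIZE].skb;
        int i, bug = 0;

        if (skb == NULL)
            return -1;         /* landed on a fragment descriptor */
        ring[cons % RING_SIZE].skb = NULL;
        cons++;
        for (i = 0; i < skb->nr_frags; i++) {
            /* ran past the hardware index or into the next packet */
            if (cons == hw_cons || ring[cons % RING_SIZE].skb != NULL)
                bug = 1;
            cons++;
        }
        if (bug)
            return -1;
        freed++;
    }
    return freed;
}
```

If nr_frags shrinks while the packet is queued, the walk lands on a fragment descriptor with no skb; if it grows, the walk runs into the next packet's head. Either way the model returns -1, which is the kind of lost-boundary condition this bug ties to the "re-ordering" message.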
If this is easily reproducible, please try the attached debug patch in comment
#2. It should prevent the problem and will print a warning whenever it detects
that the nr_frags field is corrupted.
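One plausible way a debug patch could both prevent the problem and report it is to remember nr_frags in the ring at transmit time, use that saved count at completion, and warn when the live value differs. The sketch below models only that idea; the attached patch itself is not reproduced here, and the saved_nr_frags field, the corruptions counter and both functions are hypothetical. The warning text is the "skb frags corrupted" message the maintainer asks the reporter to look for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of the debug idea, not the attached tg3 patch. */
#define RING_SIZE 16

struct fake_skb {
    int nr_frags;              /* stands in for skb_shinfo(skb)->nr_frags */
};

struct tx_slot {
    struct fake_skb *skb;      /* set only on a packet's first descriptor */
    int saved_nr_frags;        /* hypothetical: nr_frags as seen at xmit time */
};

static struct tx_slot ring[RING_SIZE];
static int corruptions;        /* how many mismatches were reported */

/* Transmit side: remember how many fragment descriptors were used. */
static int queue_packet(int prod, struct fake_skb *skb)
{
    int i;

    assert(skb != NULL);
    ring[prod % RING_SIZE].skb = skb;
    ring[prod % RING_SIZE].saved_nr_frags = skb->nr_frags;
    for (i = 1; i <= skb->nr_frags; i++)
        ring[(prod + i) % RING_SIZE].skb = NULL;
    return prod + 1 + skb->nr_frags;
}

/* Completion side: skip descriptors using the saved count, and only
 * warn if the live nr_frags has changed since transmit. */
static int complete_tx(int cons, int hw_cons)
{
    int freed = 0;

    while (cons < hw_cons) {
        struct tx_slot *slot = &ring[cons % RING_SIZE];
        struct fake_skb *skb = slot->skb;

        if (skb == NULL)
            return -1;
        if (skb->nr_frags != slot->saved_nr_frags) {
            printf("skb frags corrupted: orig: %d now: %d\n",
                   slot->saved_nr_frags, skb->nr_frags);
            corruptions++;
        }
        slot->skb = NULL;
        cons += 1 + slot->saved_nr_frags;
        freed++;
    }
    return freed;
}
```

Because the walk uses the count recorded at transmit time, it stays aligned with the ring even when the skb is modified while queued; the live value is consulted only to decide whether to print the warning.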
I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the Fedora kernel.
I am CC'ing myself to this bug and will try and assist you in resolving it if I can.
Have you been able to test the above patch?
I haven't built a kernel since before 1.0. I'm going to figure out how to do
that, then apply and test the suggested patch. I don't have tg3.c on my box,
which probably means I don't have a kernel source RPM installed. I'll update
this bug when I have more.
I followed the instructions on this page:
However, after following the instructions for installing the kernel on the
running system (e.g. rpm -ivh), I'm not sure whether I'm running my kernel.
$ uname -a
Linux prophecy 188.8.131.52-92.fc8 #1 SMP Wed May 7 16:50:09 EDT 2008 i686 i686 i386
Given the date in uname, it seems this is not my kernel. I tried rebooting
and didn't see my kernel in the boot loader's list. What am I
To answer my own question: those instructions do work, but you have to reboot,
of course. I've been running with the patch for a day now, and so far so good.
I'll update this bug in a week or so if I don't experience any further crashes.
The debug patch will print some debug information when it sees that the skb
frags are corrupted. Please provide the dmesg output as well.
Please check for this message in the dmesg log:
"skb frags corrupted: orig: %d now: %d\n"
Created attachment 307231
I grepped for "skb frags" but didn't see it in the dmesg output. I realize
now that I should have changed the date or version string of the tg3 driver to
make sure I'm really running your patch.
I'd also like to amend the Actual Results portion of the original bug. I've had
this machine freeze up a few times which is what got me digging around and made
me remember this bug. I haven't been able to attribute the freezes to anything
other than this problem. The freeze appears to be a total software freeze of
the box: it doesn't respond to ping, the mouse and keyboard are completely
frozen, I can't switch to another virtual console, and Ctrl-Alt-Delete doesn't
reboot it. I pretty much have to hold down the power button to reboot.
I'm not 100% sure that the tg3 issue is the cause of those freezes, but right
now it is my leading suspect.
If you used to see this in your dmesg before the patch:
tg3: eth0: The system may be re-ordering memory-mapped I/O cycles to the
network device, attempting to recover.
and the system then eventually crashed, the patch is meant to confirm that both
the message above and the eventual crash were caused by skb corruption.
With the patch, you'll see "skb frags corrupted" instead of the "re-ordering"
message and the crash.
I've been running the patch since 5/29/08 and have yet to see "skb frags
corrupted" in /var/log/messages. I still get intermittent hangs with nothing
in the logs. I'm starting to think that, despite this being a new machine, I
may have some bad RAM. So I'm not sure what to tell you about this patch. It
may well have made things better for me, but this is still the least stable
Linux box I've ever had.
I think the issue causing the original symptoms in this BZ has been found.
See this thread discussing the same issue on the BNX2 driver.
In a nutshell, the TSO code can change an skb while it is still queued in the
driver, causing the BNX2 driver to crash and the TG3 driver to first print out
the "re-ordering" message and then crash. The fix is going to be in the
This message is a reminder that Fedora 8 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 8. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '8'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 8's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 8 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora please change the 'version' of this
bug to the applicable version. If you are unable to change the version,
please add a comment here and someone will do it for you.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
The process we are following is described here:
Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.
Thank you for reporting this bug and we are sorry it could not be fixed.