Bug 408891
Summary: [PATCH] tg3: system re-ordering mem-mapped io causes eth link to go down

| Field | Value |
|---|---|
| Product | [Fedora] Fedora |
| Component | kernel |
| Version | 8 |
| Hardware | i386 |
| OS | Linux |
| Status | CLOSED WONTFIX |
| Severity | medium |
| Priority | low |
| Reporter | Steven Samorodin <samorodin> |
| Assignee | Michael Chan <mchan> |
| QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| CC | benlu, chris.brown, mcarlson |
| Doc Type | Bug Fix |
| Last Closed | 2009-01-09 07:29:55 UTC |

Description
Steven Samorodin
2007-12-03 17:21:09 UTC
Created attachment 275921: tarball of various commands that may provide insight

Created attachment 276391: debug patch

I have received similar reports, and my suspicion is that shinfo(skb)->nr_frags gets changed between ->hard_start_xmit() and tx completion, when the driver is freeing the skb. The driver relies on nr_frags to find the packet boundaries in the tx ring. If this is easily reproducible, please try the attached debug patch in comment #2. It should prevent the problem and will print a warning whenever it detects that the nr_frags field is corrupted.

Hello, I'm reviewing this bug as part of the kernel bug triage project, an attempt to isolate current bugs in the Fedora kernel. http://fedoraproject.org/wiki/KernelBugTriage I am CC'ing myself to this bug and will try to assist you in resolving it if I can. Have you been able to test the above patch?

I haven't built a kernel since before 1.0. I'm going to figure out how to do that, then apply and test the suggested patch. I don't have tg3.c on my box, which probably means I don't have a source RPM for the kernel installed. I'll update this bug when I have more.

I followed the instructions on this page: http://fedoraproject.org/wiki/Docs/CustomKernel However, after following the instructions for supposedly installing the kernel on the running system, e.g.

rpm -ivh ~/rpmbuild/RPMS/i686/kernel-2.6.24.7-92.fc8.i686.rpm

I'm not sure if I'm running my kernel.

$ uname -a
Linux prophecy 2.6.24.7-92.fc8 #1 SMP Wed May 7 16:50:09 EDT 2008 i686 i686 i386 GNU/Linux

Given the date in uname, it seems like this is not my kernel. I tried rebooting and didn't see my kernel in the list on the boot loader. What am I missing?

To answer my own question: those instructions do work, but you have to reboot, of course. I've been running with the patch for a day now and so far so good. I'll update this bug in a week or so if I don't experience any further crashes.

The debug patch will print some debug information when it sees that the skb frags are corrupted. Please provide us the dmesg as well.

Please check for this message in the dmesg log: "skb frags corrupted: orig: %d now: %d\n"

Created attachment 307231: dmesg

I grepped for "skb frags" but didn't see it in the output of dmesg. I realize
now that I should have updated the date or version of the tg3 driver just to
ensure that I'm really running your patch.

I'd also like to amend the Actual Results portion of the original bug. I've had this machine freeze up a few times, which is what got me digging around and made me remember this bug. I haven't been able to attribute the freezes to anything other than this problem. The freeze appears to be a total software freeze of the box. It is not ping'able, mouse/keyboard are entirely frozen out, I can't switch to another virtual console, and I can't ctrl-alt-delete to reboot. I pretty much have to hold down the power button and reboot. I'm not 100% sure that the tg3 issue is the cause of those freezes, but right now it is my leading suspect.

If, before the patch, you used to see this in your dmesg:

tg3: eth0: The system may be re-ordering memory-mapped I/O cycles to the network device, attempting to recover.

followed eventually by a crash, then the patch serves to confirm that the message above and the eventual crash were caused by SKB corruption. With the patch, you'll see "skb frags corrupted" instead of the "re-ordering" message and the crash.

I've been running the patch since 5/29/08 and I've yet to see the "skb frags corrupted" message in /var/log/messages. I still have intermittent hangs of my system with nothing in the logs. I'm starting to think that maybe, despite this being a new machine, I have some bad RAM. So I'm not sure what to tell you concerning this patch. It may well have made things better for me, but this is still the least stable Linux box I've ever had.

I think the issue causing the original symptoms in this BZ has been found. See this thread discussing the same issue on the BNX2 driver. http://marc.info/?t=121362387400001&r=1&w=2 In a nutshell, the TSO code can change an skb while it is still queued in the driver, causing the BNX2 driver to crash and the TG3 driver to first print out the "re-ordering" message and then crash. The fix is going to be in the netstack.

This message is a reminder that Fedora 8 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 8. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '8'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 8's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 8 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.
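
For readers following the nr_frags discussion in this thread, the idea behind the debug check can be illustrated with a small userspace sketch. This is not the attached debug patch and not tg3 code: every name here (toy_skb, toy_tx_ring, toy_xmit, toy_tx_complete, RING_SIZE) is hypothetical, and the ring is a toy with no fullness check. It only assumes what the thread states: the driver uses the fragment count to find packet boundaries in the tx ring, and the count may change while the packet is queued. The sketch snapshots the count at transmit time, compares it at completion, and advances the ring using the snapshot; the warning text is the format string quoted in the thread.

```c
#include <assert.h>
#include <stdio.h>

#define RING_SIZE 8 /* hypothetical ring size; the toy ring has no fullness check */

/* Toy stand-in for an skb: only the fragment count matters here. */
struct toy_skb {
    int nr_frags;
};

/* One tx ring slot. saved_frags is the snapshot taken at transmit time. */
struct toy_tx_slot {
    struct toy_skb *skb; /* set only on the first slot of a packet */
    int saved_frags;
};

struct toy_tx_ring {
    struct toy_tx_slot slot[RING_SIZE];
    int prod; /* next slot to fill */
    int cons; /* next slot to complete */
};

/* Queue a packet: one slot for the linear head plus one per fragment. */
static void toy_xmit(struct toy_tx_ring *r, struct toy_skb *skb)
{
    r->slot[r->prod].skb = skb;
    r->slot[r->prod].saved_frags = skb->nr_frags;
    r->prod = (r->prod + 1 + skb->nr_frags) % RING_SIZE;
}

/* Complete one packet. Returns 1 if the frag count changed while queued. */
static int toy_tx_complete(struct toy_tx_ring *r)
{
    struct toy_tx_slot *s = &r->slot[r->cons];
    int corrupted;

    assert(s->skb != NULL); /* cons must point at the first slot of a packet */

    corrupted = (s->skb->nr_frags != s->saved_frags);
    if (corrupted)
        printf("skb frags corrupted: orig: %d now: %d\n",
               s->saved_frags, s->skb->nr_frags);

    /* Advance by the snapshot, not the live value, so the next
     * packet boundary is still found even after a mutation. */
    r->cons = (r->cons + 1 + s->saved_frags) % RING_SIZE;
    s->skb = NULL;
    return corrupted;
}
```

To exercise the sketch, queue two packets with toy_xmit(), change nr_frags on the first while it is still queued, then call toy_tx_complete() twice: the first call reports the mismatch and the second still starts at the correct slot. Advancing by the live value instead would leave cons pointing into the middle of a packet, which is the kind of boundary confusion the thread attributes to the corrupted field.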