Red Hat Bugzilla – Bug 502833
Intel 4965 card stops working under high load
Last modified: 2010-01-07 05:22:25 EST
Created attachment 345602
detailed messages from before and after the microcode crash
Description of problem:
I have two identical Lenovo X61 laptops running up-to-date Rawhide (which will be F11 in five days). On one of them, the Intel 4965 wireless card stops working when a large amount of data (between 200MB and 1GB) is transmitted over the wireless network. I've enabled debugging in the iwlagn driver (debug=0x43ff) and am attaching the logs. It seems as if it might be temperature-related, though the working laptop has no problems with reported temperatures of up to 65C.
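For reference, the debug value above is a bitmask. A minimal sketch of which bit positions 0x43ff enables follows. It deliberately does not name the individual debug flags, since their meanings depend on the driver version, and the helper name `set_bits` is hypothetical:

```python
from typing import List


def set_bits(mask: int) -> List[int]:
    """Return the positions of the bits set in mask, lowest first."""
    return [pos for pos in range(mask.bit_length()) if mask & (1 << pos)]


if __name__ == "__main__":
    # 0x43ff = 0x4000 + 0x03ff, i.e. bit 14 plus bits 0 through 9.
    print(set_bits(0x43FF))
```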
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Try to copy a large file over the network
After 200MB to 1GB, data stops being transferred and a "Microcode SW error" appears in the logs
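The failure can be spotted in a saved log by searching for the error string. A minimal sketch, assuming the kernel log has been written to a text file; the function name `find_microcode_errors` and the sample lines are made up for illustration and are not real driver output:

```python
from typing import Iterable, List, Tuple

ERROR_MARKER = "Microcode SW error"  # error string quoted in this report


def find_microcode_errors(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, line without trailing newline) per match."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if ERROR_MARKER in line
    ]


if __name__ == "__main__":
    # Hypothetical sample lines, not copied from the attached log.
    sample = [
        "placeholder: transfer in progress\n",
        "placeholder: Microcode SW error detected\n",
        "placeholder: transfer stalled\n",
    ]
    for number, text in find_microcode_errors(sample):
        print(f"line {number}: {text}")
```

To scan a real log, pass an open file object instead of the sample list; any iterable of lines works.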
Both laptops have 4965AG-MM1 cards and I've really stress-tested the working laptop, but cannot get its wireless to stop working. It almost seems as if it's hardware, but both laptops are brand new and were running fine under Windows Vista (though a bit sluggishly).
This bug was recently fixed upstream; please backport the fix:
I'm not convinced it's the same error; the error messages in the log are slightly different from yours. I've opened an upstream bug:
On a side note, the working laptop's wireless finally died with the same error message. Manually shutting down the wireless using the hardware switch and turning it on again fixed the problem.
It would seem to me that there's some kind of hardware weakness that is somehow exposed by the Linux driver and not the Windows driver. A minor difference in the cards' thermostats is perhaps the reason that one rarely thinks it is overheating, and when it does, recovers quickly, while the other often thinks it is overheating and only recovers for a minute or two before dying again.
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle.
Changing version to '11'.
More information and the reason for this action are here:
Ok, that's weird. I turned on power management for the non-working laptop, and, even though the wifi card temperature didn't go above 54C, the wireless *still* died after ten minutes.
(In reply to comment #2)
> I'm not convinced it's the same error; the error messages in the log are
> slightly different from yours. I've opened an upstream bug:
So far no response from reporter to a request (on 6/16) for information in this bug report.
Sorry for the delay. I've replied upstream. My wife has been using a USB wifi stick and I lost track of the bug report.
This bug is a hardware fault, so upstream has closed it as INVALID.
Commit e4da4beafa550c289fa91e06c0909c003351f148 in kernel-PAE-2.6.31-0.62.rc2.git4.fc12.i686.rpm has automatic recovery from hardware failure (see http://www.intellinuxwireless.org/bugzilla/show_bug.cgi?id=2013#c9). Any chance that this commit could get pushed to 2.6.29 or 2.6.30 for F11?
I'm sorry, I have no time to backport this. Fedora 11 will soon switch to 2.6.32. As F12 is already using 2.6.32, I'm closing this bug with the NEXT RELEASE resolution.
No problem, I've already upgraded to F12 and replaced my wife's wifi adapter.