Bug 434655 - System hangs with latest mac80211/iwlwifi (Intel 4965AGN card)
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: kernel
Version: 9
Hardware: i686 Linux
Priority: low
Severity: high
Assigned To: John W. Linville
QA Contact: Fedora Extras Quality Assurance
Reported: 2008-02-23 20:15 EST by Kelly Stephens
Modified: 2008-07-02 16:39 EDT
Fixed In Version: kernel-2.6.25.9-76.fc9
Doc Type: Bug Fix
Last Closed: 2008-07-02 16:39:45 EDT

Attachments:
Tarfile containing /var/log/messages and dmesg text (171.41 KB, application/x-compressed-tar)
2008-03-13 15:57 EDT, Kelly Stephens
/var/log/messages for system hang with kernel 2.6.24.3-38 (35.12 KB, application/x-gzip)
2008-03-17 19:16 EDT, Kelly Stephens
/var/log/messages for F8 and F9 (24.74 KB, application/x-compressed-tar)
2008-04-09 10:00 EDT, Kelly Stephens

Description Kelly Stephens 2008-02-23 20:15:32 EST
Description of problem:

I have a Dell D630 with the Intel 4965AGN card.  I was disappointed in my
original distribution (F8).  The behavior was unreliable and I couldn't get far
from my access point.  I downloaded from intellinuxwireless.org the latest
mac80211 and iwlwifi packages and rebuilt the kernel.  Running under the patched
kernel, the system would work much better for a short period of time and then
hang: a complete system lockup with blinking caps- and scroll-lock LEDs.  The
only recourse was to power down and cold-boot.

I then thought maybe I was messing up the kernel compile, so I installed rawhide.
Same thing: the system will hang after some usable time.  Since the computer hangs,
I have no trace information.

That night I had a Eureka moment.  I have Siemens 2.4 GHz handsets and a base
station in the house.  Maybe they were interfering.  Sure enough, after powering
down the phones, things looked better.  Rawhide would still lose the connection
from time to time but would recover.  dmesg produced the following info:

wlan0: RX deauthentication from 00:c0:02:38:31:68 (reason=4)
wlan0: deauthenticated
wlan0: authenticate with AP 00:c0:02:38:31:68
wlan0: RX authentication from 00:c0:02:38:31:68 (alg=0 transaction=2 status=0)
wlan0: authenticated
wlan0: associate with AP 00:c0:02:38:31:68
wlan0: RX ReassocResp from 00:c0:02:38:31:68 (capab=0x411 status=0 aid=1)
wlan0: associated
wlan0: RX deauthentication from 00:c0:02:38:31:68 (reason=4)
wlan0: deauthenticated
wlan0: authenticate with AP 00:c0:02:38:31:68
wlan0: RX authentication from 00:c0:02:38:31:68 (alg=0 transaction=2 status=0)
wlan0: authenticated
wlan0: associate with AP 00:c0:02:38:31:68
wlan0: RX ReassocResp from 00:c0:02:38:31:68 (capab=0x411 status=0 aid=1)
wlan0: associated
wlan0: RX deauthentication from 00:c0:02:38:31:68 (reason=4)
wlan0: deauthenticated
wlan0: authenticate with AP 00:c0:02:38:31:68
wlan0: RX authentication from 00:c0:02:38:31:68 (alg=0 transaction=2 status=0)
wlan0: authenticated
wlan0: associate with AP 00:c0:02:38:31:68
wlan0: RX ReassocResp from 00:c0:02:38:31:68 (capab=0x411 status=0 aid=1)
wlan0: associated
wlan0: RX deauthentication from 00:c0:02:38:31:68 (reason=2)
wlan0: deauthenticated
wlan0: authenticate with AP 00:c0:02:38:31:68
wlan0: RX authentication from 00:c0:02:38:31:68 (alg=0 transaction=2 status=0)
wlan0: authenticated
wlan0: associate with AP 00:c0:02:38:31:68
wlan0: RX ReassocResp from 00:c0:02:38:31:68 (capab=0x411 status=0 aid=1)
wlan0: associated

Rebooting F8 with latest stock kernel and with the phones down the network
seemed stable, but I don't know that the draft-n is working.

With the phones up in F8, I get a different error but the network does not
recover in this case:

Feb 22 17:35:18 dwarf ntpd[2621]: synchronized to 128.255.70.89, stratum 2
Feb 22 17:35:18 dwarf ntpd[2621]: time reset +0.362900 s
Feb 22 17:35:18 dwarf ntpd[2621]: kernel time sync status change 0001
Feb 22 17:40:45 dwarf ntpd[2621]: synchronized to 208.113.193.10, stratum 2
Feb 22 17:44:22 dwarf kernel: iwl4965: Microcode SW error detected.  Restarting 0x2000000.

I don't think ntpd has anything to do with it as there is enough time between
the log times, but it seems the microcode message always appears shortly
afterwards.  This is probably because ntpd syncs once the link is up and then
the link dies shortly thereafter.
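
The timing argument above can be checked mechanically.  The following is a
minimal sketch in Python (not part of the original report) that measures the
gap between the most recent ntpd "synchronized" line and each iwl4965 microcode
error.  It assumes the classic syslog line layout shown above, that the wrapped
kernel line has been joined back into a single line, and that the year is 2008
(syslog timestamps omit it).  The function names are hypothetical.

```python
import re
from datetime import datetime

# "Mon DD HH:MM:SS host tag: message" as seen in the excerpt above.
LOG_LINE = re.compile(
    r"^(\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s(\S+)\s([^:]+):\s(.*)$"
)

# Sample lines copied from the report (kernel line joined onto one line).
SAMPLE = [
    "Feb 22 17:35:18 dwarf ntpd[2621]: synchronized to 128.255.70.89, stratum 2",
    "Feb 22 17:35:18 dwarf ntpd[2621]: time reset +0.362900 s",
    "Feb 22 17:35:18 dwarf ntpd[2621]: kernel time sync status change 0001",
    "Feb 22 17:40:45 dwarf ntpd[2621]: synchronized to 208.113.193.10, stratum 2",
    "Feb 22 17:44:22 dwarf kernel: iwl4965: Microcode SW error detected.  "
    "Restarting 0x2000000.",
]


def parse_line(line, year=2008):
    """Return (timestamp, tag, message), or None if the line does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    stamp, _host, tag, message = match.groups()
    # The year is an assumption: syslog does not record it.
    when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return when, tag, message


def seconds_since_last_sync(lines):
    """For each microcode error, seconds elapsed since the last ntpd sync."""
    last_sync = None
    gaps = []
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        when, tag, message = record
        if tag.startswith("ntpd") and message.startswith("synchronized"):
            last_sync = when
        elif "Microcode SW error" in message and last_sync is not None:
            gaps.append((when - last_sync).total_seconds())
    return gaps


if __name__ == "__main__":
    print(seconds_since_last_sync(SAMPLE))  # [217.0]
```

For the excerpt above the gap is 217 seconds, which fits the reading that ntpd
merely syncs once the link is up and is not itself the trigger.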

It is with mixed emotions that I report M$ WinXP works fine.  On the one hand
there is hope for equal performance, on the other I don't like Linux to have to
play catch-up.

Version-Release number of selected component (if applicable):
F8: fully updated stock kernel 2.6.23.15-137.fc8
Rawhide: 2.6.25-0.40.rc1.git2.fc9


How reproducible:
Regularly at intermittent intervals.


Steps to Reproduce:
1.  Connect to AP
2.
3.
  
Actual results:


Expected results:


Additional info:
Comment 1 Nicolas Chauvet (kwizart) 2008-02-23 20:50:05 EST
Re-assigned to the right component

But you talk about the Rawhide kernel and FC-8?  Did you try the kernel in
updates-testing?  kernel-2.6.24.2-7.fc8 has a lot of patches related to
wifi.  It might be more "stable" than the Rawhide kernel...

Comment 2 John W. Linville 2008-02-25 10:02:53 EST
FWIW, the deauthentication messages from your rawhide kernel originate with 
your AP -- "reason=4" means "Disassociated due to inactivity", and 
the "reason=2" means "Previous authentication no longer valid".  These come 
from the AP, and our only option is to comply.  The logs also seem to 
indicate that subsequent authentication and association steps are successful, 
so I don't see a problem in the rawhide logs.

I concur with kwizart that you are likely to have much better results with 
kernel-2.6.24.2-7.fc8 than you had with 2.6.23.15-137.fc8.  Please give that a 
try and report the results here...thanks!
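
As an illustration of the reason codes discussed in this comment, here is a
minimal Python sketch (an editorial addition, not something posted in the
thread) that tallies the deauthentication lines from a dmesg dump by AP and
reason.  Only the two meanings quoted above are included in the table; any
other code is reported as unlisted rather than guessed.  The function and
variable names are hypothetical.

```python
import re
from collections import Counter

# Only the two meanings quoted in this thread; other codes are left unlisted.
REASONS = {
    2: "Previous authentication no longer valid",
    4: "Disassociated due to inactivity",
}

DEAUTH = re.compile(
    r"RX deauthentication from ([0-9A-Fa-f:]{17}) \(reason=(\d+)\)"
)

# Lines copied from the dmesg output in the description.
SAMPLE = [
    "wlan0: RX deauthentication from 00:c0:02:38:31:68 (reason=4)",
    "wlan0: deauthenticated",
    "wlan0: authenticate with AP 00:c0:02:38:31:68",
    "wlan0: RX deauthentication from 00:c0:02:38:31:68 (reason=4)",
    "wlan0: RX deauthentication from 00:c0:02:38:31:68 (reason=4)",
    "wlan0: RX deauthentication from 00:c0:02:38:31:68 (reason=2)",
    "wlan0: associated",
]


def describe(code):
    """Human-readable text for a reason code, if the thread gives one."""
    return REASONS.get(code, f"reason {code} (not listed in this thread)")


def count_deauths(lines):
    """Count deauthentication events per (AP address, reason code)."""
    counts = Counter()
    for line in lines:
        match = DEAUTH.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


if __name__ == "__main__":
    for (ap, code), n in sorted(count_deauths(SAMPLE).items()):
        print(f"{ap}: {n} x {describe(code)}")
```

Run against the log in the description, this reports three inactivity
disassociations and one invalidated authentication, all from the same AP.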
Comment 3 Kelly Stephens 2008-02-25 12:20:52 EST
I tried again last night, and generally performance was better across the board
for all kernel configurations, but I was testing in a different location.  I had
hoped to go back to the original location for further experimentation but I
broke something when I updated to the latest rawhide.

Last night I only experienced the deauth messages which were recovered. 
However, prior to these messages, the connection would freeze for some time. 
Pinging the AP would not work.  After a while I would then get the deauth and
re-auth messages.  All the while the network manager icon would report varying
signal strength so it appeared to be tracking the connection.

In fact, I don't know why lack of activity would be the cause.  The
deauth/re-auth cycles occurred several times during downloading the rawhide
updates so there was plenty of activity.  It appears that the download would
progress, then the linux side gets confused and cannot maintain connectivity
(pinging no longer works) and the connection remains hung until reset by the AP
deauth due to inactivity as the linux drivers are "locked up".

Last night, the released F8 *.23 kernel worked best.  I will continue testing.

Keep in mind for rawhide, one of the failure modes is a system crash from which
I have no logs.  Again, I will try to get rawhide back up for further testing.
Comment 4 Kelly Stephens 2008-02-28 16:57:09 EST
Going back to the original location (a nice comfy chair 3 feet from phone base
station and 40 feet from the AP) the F8 .24 kernel from updates-testing crashed
almost immediately after connection.

On reboot, system did not crash but connection jammed as described earlier.  NM
icon/iwconfig apparently still see the AP as they show changing signal strength
(25-60%).  iwconfig shows 0kB/s bitrate.  No network traffic appears to be
occurring.  Also from this position, the de-auth reset was not occurring as
before and network remained locked up.  The connection was initiated correctly
as the DHCP transaction occurred.

F8 .23 kernel appears to work from this location.
Comment 5 Kelly Stephens 2008-03-01 11:53:29 EST
Wrapping up my testing, the F8 .23 kernel does hang from time to time.  No
messages, but I can reset the connection with NM.  It is the most reliable but
with the worst performance (speed and range).

With 2.6.24 and newer kernels, system crashes sometimes occur.  These kernels
do appear to have better performance but worse reliability.

Note, M$ WinXP has the best of both worlds: better range, speeds up to 144 Mb/s,
and no reliability problems.  So there is hope.
Comment 6 John W. Linville 2008-03-13 11:21:45 EDT
Can you replicate this issue with current F-8 kernels?

   http://koji.fedoraproject.org/koji/buildinfo?buildID=42735
Comment 7 Kelly Stephens 2008-03-13 15:57:51 EDT
Created attachment 297980
Tarfile containing /var/log/messages and dmesg text

Error messages for the latest F8 kernel
Comment 8 Kelly Stephens 2008-03-13 16:00:29 EDT
The F8 2.6.24.3-12 kernel worked fine for the first couple days, but today it
locked up during web browsing and even after rebooting I cannot establish a
connection.  Many new error messages have been attached...
Comment 9 John W. Linville 2008-03-14 10:00:19 EDT
Ah, sorry about that.  Some people reporting similar problems have found the 
2.6.24.3-34 kernel to be quite a bit better:

   http://koji.fedoraproject.org/koji/buildinfo?buildID=42735

Give that a try instead?
Comment 10 Kelly Stephens 2008-03-17 19:16:30 EDT
Created attachment 298318
/var/log/messages for system hang with kernel 2.6.24.3-38

Using 2.6.24.3-38 I still get a system hang.  It appears that the system will
hang within a few seconds after the connection is established, if at all.  If it
doesn't hang within those first few seconds, then the system hang doesn't seem
to occur.

Attached is a message log for three attempts.  The first two failed with system
hangs.
Comment 11 John W. Linville 2008-03-18 08:34:34 EDT
Dan, do you see any similarity between this and bug 437903?
Comment 12 Dan Williams 2008-03-18 14:22:33 EDT
kelly: is it a panic, ie the caps-lock light is flashing?  or does the system
just hang?

john: I don't see any apparent errors in the latest log from -38 that Kelly
posted; I think notting's bug is different.  We'll need more information about
what process the driver is going through in notting's case.  Here, driver logs
might also be interesting, just to see what the driver is doing after connect.
Comment 13 Kelly Stephens 2008-03-18 15:24:06 EDT
The caps- and scroll-lock keys flash.  I only experience system hangs with
2.6.24 or later.  I've never had a system hang with a stock kernel 2.6.23 or
earlier.

When the link stays up, performance seems erratic.  I often have to wait on 
network communication.  I'll have several web pages waiting on their servers,
then all at once they'll all get a large chunk of data and complete.  This is a
separate issue and should probably be addressed elsewhere.  What is the proper
forum for this performance issue?
Comment 14 Mark Richards 2008-03-31 22:11:35 EDT
Just want to comment that I am also seeing a system hang with this driver.  iwl
4965 card, Kernel 2.6.24.3-50.fc8 x86_64.

For me the problem is:
Connect to a WPA/WPA2 802.11n network -> hang (sometimes a blinking capslock)

But if I connect first to a neighbour's unsecured 802.11b network, and THEN
connect to my WPA/WPA2 802.11n network, things usually work.  Every time I
suspend or resume I have to do the same thing.  It's workable for now but I hate
to think what I'd do if my neighbour shut off or secured his wifi :)

Unrelated to the hang: I am seeing the same performance issue that Kelly Stephens
mentioned, not to mention that I only connect to my N access point at 60 Mbps.
Comment 15 John W. Linville 2008-04-07 10:00:35 EDT
Well I hate to keep playing the "try the latest kernel" game, but...

   http://koji.fedoraproject.org/koji/buildinfo?buildID=44648

Several have had success using iwl3945 with that kernel.  Does it help you?
Comment 16 Kelly Stephens 2008-04-09 10:00:33 EDT
Created attachment 301807
/var/log/messages for F8 and F9

I maintain a separate partition for both F8 and F9.  Both have halted in the
past day.  Attached are the logs.  F9 ran for 10 minutes before crashing, but
the suspicious iwl4965 message occurred just before the crash.  In F8, the
connection was established, my mail headers downloaded, and then the system
crashed.  Again just after the iwl4965 message.
Comment 17 John W. Linville 2008-04-09 11:11:31 EDT
Are you in charge of your wireless access point?  If so, can you disable 
802.11n on it?  If so, could you try that and see if that prevents the crash?

Thanks!
Comment 18 Kelly Stephens 2008-04-10 12:36:52 EDT
When I disable 802.11n I no longer see the crash but my range is reduced.  From
the location where I usually experience the crashes the network comes up
initially and is able to transfer some data, but a few seconds later the network
goes down.  Subsequent attempts to reconnect all fail.  Closer to the AP, the
network appears reliable.
Comment 19 John W. Linville 2008-04-17 15:02:10 EDT
There is a patch called "iwlwifi: fix n-band association problem" in the 
kernels here:

   http://koji.fedoraproject.org/koji/buildinfo?buildID=46436

Give those a try?
Comment 20 Kelly Stephens 2008-05-08 07:40:42 EDT
I haven't had a system hang for some time with these kernels for fc8 or fc9. 
But the connection has continued to be unreliable.  This now appears to be fixed
in fc8 as of 2.6.24.6-90.  Please port to fc9 as I am hooked on the updated
features of NetworkManager.

Thanks so much...
Comment 21 Kelly Stephens 2008-05-10 20:21:35 EDT
Spoke too soon.  Still no system hangs, but the connection remains unreliable in
fc8.
Comment 22 Bug Zapper 2008-05-14 01:36:03 EDT
Changing version to '9' as part of upcoming Fedora 9 GA.
More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 23 Kelly Stephens 2008-05-15 19:57:56 EDT
I just experienced a system hang with the latest fc9 kernel: 2.6.25.3-18.
Comment 24 Stefan Neufeind 2008-05-16 10:35:09 EDT
I also have a D630 with 4965AGN, without problems on either F8 (only the latest
few kernels released) or F9.  However, I don't have an 802.11n access point
available; I am only using 802.11g.
Comment 25 John W. Linville 2008-05-20 15:08:46 EDT
Can you recreate this issue with the test kernels here?

   http://koji.fedoraproject.org/koji/buildinfo?buildID=49743
Comment 26 Kelly Stephens 2008-05-26 18:58:28 EDT
With 2.6.25.4-30 I am experiencing a different kind of system hang.  Rather than
an immediate hang with blinking caps and scroll lights, the system slowly dies.
After logging in, everything works fine, but if I come back later and try to
start Mozilla no connection is made.  If I then try to start a terminal, it
will also fail.  Things deteriorate quickly with the window manager misbehaving
then the mouse eventually freezing.  The only option is to power down.
Comment 27 John W. Linville 2008-05-29 16:39:23 EDT
Hmmm...well that doesn't necessarily sound like a wireless problem.  Do you get 
different behavior if you do not use the network?
Comment 28 Kelly Stephens 2008-06-10 08:33:43 EDT
I've been keeping up with the kernel releases.  I'm now at 2.6.25.4-42.  It
appeared to be pretty stable until I went on vacation.  When I try to connect to
an AP with WEP security the notebook crashes again with the blinking lights.  If
I change the security on the AP to WPA2, I do not crash but successfully
negotiating the connection is iffy.
Comment 29 analyzer 2008-06-12 15:10:15 EDT
(In reply to comment #28)
> I've been keeping up with the kernel releases.  I'm now at 2.6.25.4-42.  It
> appeared to be pretty stable until I went on vacation.  When I try to connect to
> an AP with WEP security the notebook crashes again with the blinking lights.  If
> I change the security on the AP to WPA2, I do not crash but successfully
> negotiating the connection is iffy.

I have a kernel version that is working pretty well with the 4965AGN.  Do you
want to give it a try if I upload the kernel + kmod-nvidia for you?
(2.6.25-0.121.rc5.git4.fc8 + patch from intellinuxwireless)
Comment 30 analyzer 2008-06-12 17:58:38 EDT
(btw my computer hangs sometimes too when I'm trying to connect to a WEP 
network)

Another thing is that if I run "ifdown wlan0", I get no output and the command 
seems to hang indefinitely (until I press Ctrl-C).  Is there perhaps something
I could try in order to get you some debug info?

NB: I tried the last kernel that John W. Linville gave me in another thread for 
the 4965AGN; it was even worse (I never succeeded in making a WEP connection 
without hanging the whole system).
Comment 31 John W. Linville 2008-07-02 14:37:23 EDT
Is this still occurring with kernel-2.6.25.9-76.fc9?
Comment 32 Dan Williams 2008-07-02 16:05:07 EDT
worksforme with 4965 and -76.fc9; confirmed broken with -55.fc9 of course
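
For readers who want to check whether a Fedora 9 machine is already at or past
the build named in "Fixed In Version", the following is a rough Python sketch
(an editorial addition).  It compares only the dotted version and the leading
release number, which is a simplification and not RPM's real version
comparison, and it is meaningful only for fc9 kernel builds.  The function
names are hypothetical.

```python
import platform
import re

# Taken from the "Fixed In Version" field of this bug.
FIXED_IN = "kernel-2.6.25.9-76.fc9"

# Dotted version, a dash, then the leading integer of the release field.
_RELEASE = re.compile(r"^(\d+(?:\.\d+)*)-(\d+)")


def parse_release(text):
    """Turn '2.6.25.9-76.fc9.i686' into ((2, 6, 25, 9), 76)."""
    if text.startswith("kernel-"):
        text = text[len("kernel-"):]
    match = _RELEASE.match(text)
    if match is None:
        raise ValueError(f"unrecognized kernel release string: {text!r}")
    version = tuple(int(part) for part in match.group(1).split("."))
    return version, int(match.group(2))


def has_fix(running, fixed=FIXED_IN):
    """True if `running` is at or after `fixed` under this simplified ordering."""
    return parse_release(running) >= parse_release(fixed)


if __name__ == "__main__":
    running = platform.release()  # e.g. what `uname -r` prints
    try:
        print(running, "has fix:", has_fix(running))
    except ValueError as err:
        print(err)
```

Under this ordering the kernels reported as still hanging earlier in the
thread, 2.6.25.3-18 and 2.6.25.4-42, both compare as older than the fixed build.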
