Bug 666646

Summary: iwlagn Hard-Lock
Product: [Fedora] Fedora Reporter: James Cape <jamescape777>
Component: kernel Assignee: Stanislaw Gruszka <sgruszka>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent Docs Contact:
Priority: low    
Version: 14 CC: alex, gansalmon, itamar, jonathan, kari.hautio, kernel-maint, madhu.chinakonda, mathieu-acct, sgruszka
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: kernel-2.6.35.14-95.fc14 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 714547 Environment:
Last Closed: 2011-05-19 05:10:57 UTC Type: ---
Bug Depends On:    
Bug Blocks: 714547    

Description James Cape 2011-01-01 20:48:16 UTC
Description of problem:

Roughly every 5-20 minutes or so (give or take) my Dell Adamo 13 (black) will hard lock.

The last messages in the syslog before the reboot (consistently) are:

Jan  1 14:03:33 emma kernel: [  350.187460] iwlagn 0000:04:00.0: BA scd_flow 0 does not match txq_id 10
Jan  1 14:03:35 emma kernel: [  352.917520] iwlagn 0000:04:00.0: low ack count detected, restart firmware
Jan  1 14:03:35 emma kernel: [  352.917531] iwlagn 0000:04:00.0: On demand firmware reload
Jan  1 14:03:35 emma kernel: [  352.972646] iwlagn 0000:04:00.0: Stopping AGG while state not ON or starting
Jan  1 14:03:35 emma kernel: [  352.972656] iwlagn 0000:04:00.0: queue number out of range: 0, must be 10 to 19
Jan  1 14:04:42 emma kernel: imklog 4.6.3, log source = /proc/kmsg started.
[boot continues as normal]
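
To check longer logs or other machines for the same signature (iwlagn messages as the last kernel lines before syslog restarts), a short script along these lines could be used. This is only a sketch under stated assumptions: the function name `find_lockups` is made up for illustration, the imklog "started" line is assumed to mark a fresh boot, and the five-line context window is an arbitrary choice.

```python
import re

# Assumption: rsyslog's imklog "started" line marks a fresh boot.
BOOT_MARKER = re.compile(r"kernel: imklog .* started\.")
# Kernel lines from the iwlagn driver, e.g. "kernel: [  352.917520] iwlagn 0000:04:00.0: ..."
IWLAGN_LINE = re.compile(r"kernel: \[\s*\d+\.\d+\] iwlagn ")


def find_lockups(lines, context=5):
    """Return one list per boot marker holding the iwlagn lines found in the
    `context` lines just before it. Boots with no iwlagn line nearby are skipped."""
    lines = [line.rstrip("\n") for line in lines]
    hits = []
    for i, line in enumerate(lines):
        if BOOT_MARKER.search(line):
            window = lines[max(0, i - context):i]
            iwl = [entry for entry in window if IWLAGN_LINE.search(entry)]
            if iwl:
                hits.append(iwl)
    return hits
```

For example, `find_lockups(open("/var/log/messages", errors="replace"))` would return one entry per suspected lockup (the log path is an assumption and may differ on your system). A clean reboot that happens shortly after an unrelated iwlagn message would also be reported, so the output still needs a manual look.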

Version-Release number of selected component (if applicable):

2.6.35.10-74.fc14.x86_64

How reproducible:

Inconsistent. I believe, without better evidence, that it's a faulty response to some kind of change in the wifi environment. When it happens, though, it always has the same or similar log messages (sometimes the "AGG" line is cut off halfway).

I'm willing to run a kdump kernel until I have a legit trace for this (having to randomly reboot my primary machine every 5 minutes makes this a priority for me), but will need instructions.


Steps to Reproduce:
1. Boot System
2. Wait.
3. Randomly lose the code you were working on for the last couple minutes.


Actual results:

Crashy fun time.


Expected results:

Boring work.


Additional info:

I'm a Leo.

Wait, did you mean additional info about the problem? In that case, if audio is playing it replays the last couple seconds of buffer on a loop until I button-for-7s the machine---audio is not always playing, however.

Comment 1 Stanislaw Gruszka 2011-01-03 09:43:08 UTC
There are two patches that may help:
https://bugzilla.kernel.org/attachment.cgi?id=38502
http://marc.info/?l=linux-wireless&m=129310430012942&w=2

I will prepare a test kernel with them ...

Comment 2 Stanislaw Gruszka 2011-01-03 15:33:57 UTC
Please test both these kernels and share your impression:
http://koji.fedoraproject.org/koji/taskinfo?taskID=2697824
http://koji.fedoraproject.org/koji/taskinfo?taskID=2698228

Comment 3 Kari Hautio 2011-01-05 11:27:44 UTC
I'm affected by the same problem, installing kernel from http://koji.fedoraproject.org/koji/taskinfo?taskID=2698228 now.

Comment 4 Kari Hautio 2011-01-05 11:42:39 UTC
This kernel fixes the problem for me (didn't test the other build).

[khautio@kha ~]$ uname -a
Linux kha 2.6.35.10-75.irq.fc14.i686 #1 SMP Mon Jan 3 14:49:06 UTC 2011 i686 i686 i386 GNU/Linux

Comment 5 James Cape 2011-01-05 12:17:53 UTC
The irq build didn't fix my issue; I'm trying the low_ack kernel now.

Jan  4 16:01:07 emma kernel: [    0.000000] Linux version 2.6.35.10-75.irq.fc14.x86_64 (mockbuild.fedoraproject.org) (gcc version 4.5.1 20100924 (Red Hat 4.5.1-4) (GCC) ) #1 SMP Mon Jan 3 14:34:56 UTC 2011
[...]
Jan  4 17:37:37 emma kernel: [ 5810.255040] iwlagn 0000:04:00.0: iwlagn_tx_agg_start on ra = 00:25:9c:d2:4d:a0 tid = 0
Jan  4 17:37:44 emma kernel: [ 5816.994056] iwlagn 0000:04:00.0: BA scd_flow 0 does not match txq_id 10
Jan  4 17:37:45 emma kernel: [ 5818.095818] iwlagn 0000:04:00.0: BA scd_flow 0 does not match txq_id 10
Jan  4 17:37:46 emma kernel: [ 5819.064689] iwlagn 0000:04:00.0: low ack count detected, restart firmware
Jan  4 17:37:46 emma kernel: [ 5819.064701] iwlagn 0000:04:00.0: On demand firmware reload
Jan  4 17:37:46 emma kernel: [ 5819.119974] iwlagn 0000:04:00.0: Stopping AGG while state not ON or starting
Jan  4 17:37:46 emma kernel: [ 5819.119986] iwlagn 0000:04:00.0: queue number out of range: 0, must be 10 to 19
Jan  4 17:38:01 emma kernel: [ 5834.658906] iwlagn 0000:04:00.0: iwlagn_tx_agg_start on ra = 00:25:9c:d2:4d:a0 tid = 0
Jan  4 17:38:04 emma kernel: [ 5837.085846] iwlagn 0000:04:00.0: low ack count detected, restart firmware
Jan  4 17:38:04 emma kernel: [ 5837.085853] iwlagn 0000:04:00.0: On demand firmware reload
Jan  4 17:38:04 emma kernel: [ 5837.136528] iwlagn 0000:04:00.0: Stopping AGG while state not ON or starting
Jan  4 17:38:04 emma kernel: [ 5837.136535] iwlagn 0000:04:00.0: queue number out of range: 0, must be 10 to 19
Jan  4 17:42:51 emma kernel: imklog 4.6.3, log source = /proc/kmsg started.

Comment 6 Stanislaw Gruszka 2011-01-07 09:50:00 UTC
There is another similar report, bug 667459, which indicates that the hard lock problem is in the mac80211 layer. Please test this kernel and report back:
http://koji.fedoraproject.org/koji/taskinfo?taskID=2704610

Comment 7 Stanislaw Gruszka 2011-01-12 12:47:58 UTC
James, any news on comment 6? You can also try the official build with the patch if you wish:
http://koji.fedoraproject.org/koji/buildinfo?buildID=213595

Comment 8 James Cape 2011-01-12 13:16:29 UTC
I haven't tested the comment 6 build yet---it will be another week before I can, unfortunately---but I did see the same problem with the low_ack kernel.

Comment 9 Stanislaw Gruszka 2011-01-12 13:48:53 UTC
Please save the packages you need from comment 6, as koji can remove these files automatically. With the low_ack kernel, at least the "low ack count detected, restart firmware" message should be gone.

Comment 10 James Cape 2011-01-12 13:55:43 UTC
Got it, thanks.

Comment 11 Kari Hautio 2011-01-12 14:01:32 UTC
I'm going to test comment 7 build now

Comment 12 Stanislaw Gruszka 2011-02-11 13:52:49 UTC
Kari and/or James, could you test the upstream driver with some of my patches:
https://bugzilla.redhat.com/show_bug.cgi?id=648732#c21

Comment 13 Stanislaw Gruszka 2011-05-09 19:28:53 UTC
Posted to fedora and stable.

http://lists.fedoraproject.org/pipermail/kernel/2011-May/003091.html

Comment 14 Fedora Update System 2011-05-16 04:46:11 UTC
kernel-2.6.38.6-27.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/kernel-2.6.38.6-27.fc15

Comment 15 Fedora Update System 2011-05-17 05:36:54 UTC
Package kernel-2.6.38.6-27.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-2.6.38.6-27.fc15'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/kernel-2.6.38.6-27.fc15
then log in and leave karma (feedback).

Comment 16 Fedora Update System 2011-05-19 05:10:45 UTC
kernel-2.6.38.6-27.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 17 Fedora Update System 2011-08-17 17:38:29 UTC
kernel-2.6.35.14-95.fc14 has been submitted as an update for Fedora 14.
https://admin.fedoraproject.org/updates/kernel-2.6.35.14-95.fc14

Comment 18 Fedora Update System 2011-08-23 04:36:47 UTC
kernel-2.6.35.14-95.fc14 has been pushed to the Fedora 14 stable repository.  If problems still persist, please make note of it in this bug report.
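
For anyone checking whether a Fedora 14 machine is already running a kernel at or past the "Fixed In Version" listed in this report, the release string can be compared roughly as below. This is a sketch, not how rpm itself orders versions: it compares only the leading numeric fields of the version and release, and the helper names `numeric_key` and `has_fix` are made up for illustration.

```python
FIXED_F14 = "2.6.35.14-95.fc14"  # from "Fixed In Version" in this report


def numeric_key(kernel_release):
    """Turn '2.6.35.10-74.fc14.x86_64' into ((2, 6, 35, 10), (74,)).

    Simplification: only the dot-separated numeric fields of the version and
    the leading numeric fields of the release are used; suffixes such as
    'fc14' or 'x86_64' are ignored.
    """
    version, _, release = kernel_release.partition("-")
    ver = tuple(int(part) for part in version.split("."))
    rel = []
    for part in release.split("."):
        if not part.isdigit():
            break
        rel.append(int(part))
    return ver, tuple(rel)


def has_fix(kernel_release, fixed=FIXED_F14):
    """True if kernel_release sorts at or after the fixed Fedora 14 build."""
    return numeric_key(kernel_release) >= numeric_key(fixed)
```

On the affected machine, `has_fix(platform.release())` (after `import platform`) would give the answer for the running kernel. The comparison is only meaningful for Fedora 14 kernels; a Fedora 15 kernel older than 2.6.38.6-27 would still compare as newer than the Fedora 14 fix even though it lacks it.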