Bug 832927

Summary: Kernel bug in driver for Atheros Communications Inc. AR9485 Wireless Network Adapter
Product: [Fedora] Fedora
Reporter: Herald van der Breggen <fedora>
Component: kernel
Assignee: John W. Linville <linville>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 17
CC: cdemills, gansalmon, gianluca.busiello, itamar, jayguerette, jeffrey.selk, jonathan, kernel-maint, maassd, madhu.chinakonda, shafi.wireless, tohiko.looka, williambader, wphampton
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-07-07 21:54:09 UTC
Type: Bug
Attachments:
screenshot of console with crash report (flags: none)

Description Herald van der Breggen 2012-06-18 08:29:24 UTC
Created attachment 592560
screenshot of console with crash report

Description of problem:
Kernel bug in driver for "Atheros Communications Inc. AR9485 Wireless Network Adapter (rev 01)". The wireless connection looks healthy/operational until the crash.

Version-Release number of selected component (if applicable):
3.4.0-1.fc17.x86_64

How reproducible:
Use the wireless connection. A kernel crash can occur after as little as 10 seconds or only after an hour; the timing is unpredictable.

Steps to Reproduce:
1. use wireless connection with the Atheros AR9485
  
Actual results:
Kernel crash; a black console screen appears.
I have seen about 10 of them, and each is slightly different. The clearest one is the one I attached, which starts with:

Kernel bug at drivers/net/wireless/ath/ath9k/recv.c:671!
invalid opcode: 0000 [#1] SMP
CPU 0
...

Expected results:


Additional info:

Comment 1 John W. Linville 2012-06-18 19:06:21 UTC
Line number matches BUG_ON in ath_edma_get_buffers...

        bf = SKB_CB_ATHBUF(skb);
        BUG_ON(!bf);

Comment 2 Mohammed Shafi 2012-06-21 06:13:04 UTC
(In reply to comment #1)
> Line number matches BUG_ON in ath_edma_get_buffers...
> 
>         bf = SKB_CB_ATHBUF(skb);
>         BUG_ON(!bf);

Thanks John, we will try to reproduce it and analyze the code.

Comment 3 Gianluca Busiello 2012-06-25 00:33:59 UTC
I have an Asus Zenbook UX21 and I have exactly the same problem. I'm running kernel 3.4.3-1 and I get a kernel panic a few minutes after the wireless is turned on.
Curiously, when running the live version of Fedora 17 with kernel 3.3.4-5, the kernel panic doesn't happen.

I've recompiled the kernel and activated all the debug options for the ath9k driver. If I can help diagnose this, please let me know.
Thanks.

Comment 4 Dirk Maaß 2012-06-25 18:55:43 UTC
I experience the same behavior with my new Asus G55V and kernel 3.4.3-1.fc17.x86_64. Let me know how I can help track down the problem.

I can reproduce it easily because the kernel panic happens after a few seconds of traffic on the wireless connection.

Thanks

Comment 5 Jeffrey Selk 2012-06-27 03:19:31 UTC
I can confirm this behavior on Fedora 17 x64 3.4.3-1 running on an Asus UX31E.

crash output:

     KERNEL: /usr/lib/debug/lib/modules/3.4.3-1.fc17.x86_64/vmlinux
    DUMPFILE: /var/spool/abrt/vmcore-26.06.12-19:49:38/vmcore  [PARTIAL DUMP]
        CPUS: 4
        DATE: Tue Jun 26 19:49:35 2012
      UPTIME: 00:03:51
LOAD AVERAGE: 0.65, 0.32, 0.13
       TASKS: 276
    NODENAME: asuslaptop
     RELEASE: 3.4.3-1.fc17.x86_64
     VERSION: #1 SMP Mon Jun 18 19:53:17 UTC 2012
     MACHINE: x86_64  (1696 Mhz)
      MEMORY: 3.9 GB
       PANIC: "[  230.851508] kernel BUG at drivers/net/wireless/ath/ath9k/recv.c:671!"
         PID: 0
     COMMAND: "swapper/0"
        TASK: ffffffff81c13020  (1 of 4)  [THREAD_INFO: ffffffff81c00000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

I can provide the vmcore (44 MB) upon request.

Comment 6 Jeffrey Selk 2012-06-27 03:23:26 UTC
Also, the bug does not appear to occur on 3.3.4-5.fc17.x86_64. I can consistently reproduce this bug on any kernel >3.4.x by kicking off a task with heavy network traffic (e.g. transferring media over LAN or WAN; the crash usually occurs after 100 MB or so).

Comment 7 Mohammed Shafi 2012-06-28 05:05:55 UTC
Hi John and all,

Here is a recent fix for a panic in the rx path:
http://permalink.gmane.org/gmane.linux.kernel.wireless.general/93723

Comment 8 John W. Linville 2012-06-28 17:13:33 UTC
Test kernels w/ the above mentioned patch are building here:

http://koji.fedoraproject.org/koji/taskinfo?taskID=4206016

When they finish building, please give them a try and post the results here...thanks!

Comment 9 Herald van der Breggen 2012-06-28 18:53:15 UTC
I just rebooted into the new kernel and copied a 500 MB file around a few times to different locations and over different protocols (ssh, nfs) via wifi. No problems found, so to me this looks like the solution. Many thanks!

Comment 10 Herald van der Breggen 2012-06-28 18:55:53 UTC
(I was talking about 3.4.4-3.bz832927.1.fc17.x86_64)

Comment 11 Dirk Maaß 2012-06-28 20:55:05 UTC
I can confirm that with 3.4.4-3.bz832927.1.fc17.x86_64 everything seems OK. I just copied two larger files (1.5 GB, 2 GB) and afterwards copied them concurrently (in/out) using sftp. Thank you very much!!

Comment 12 John W. Linville 2012-06-29 14:40:20 UTC
*** Bug 835213 has been marked as a duplicate of this bug. ***

Comment 13 John W. Linville 2012-06-29 14:57:44 UTC
I have added that patch to the f17 builds.

Comment 14 Stanislaw Gruszka 2012-07-02 07:08:26 UTC
*** Bug 835785 has been marked as a duplicate of this bug. ***

Comment 15 Pascal Dupuis 2012-07-02 21:19:19 UTC
Damnit, got an occurrence with kernel vmlinuz-3.4.4-3.fc17.x86_64

Call trace:
ath9k_ioread32+0x34/0x90
ath9k_tasklet+0xdx/0x160
__do_softirq
native_sched_clock
call_softirq
do_softirq
irq_exit
do_IRQ
common_interrupt
<EOI>
sysret_audit
ath_rx_tasklet/0x165/0x1b00 => unable to handle NULL pointer dereference

The routines above are listed in order of increasing time, the first being ath9k_ioread32 and the last being ath_rx_tasklet.

It occurred after around two hours of use, playing music on youtube and other web browsing.

I used kernel vmlinuz-3.4.4-3.bz832927.1.fc17.x86_64 for a longer time without this issue.

Comment 16 John W. Linville 2012-07-02 21:29:44 UTC
3.4.4-3.fc17 does _not_ have the fix...

Comment 17 William Bader 2012-07-03 23:47:51 UTC
I get this panic on a Sony Vaio laptop with 3.4.4-3.fc17.x86_64.
lspci shows "02:00.0 Network controller: Atheros Communications Inc. AR9485 Wireless Network Adapter (rev 01)" and the text mode panic screen mentions ath_rx_tasklet.
It looks like the fix was made on June 28.
When will it be published where "yum update" can find it?
William

Comment 18 Wesley Hampton 2012-07-04 13:09:22 UTC
I also used kernel-3.4.4-3.bz832927.1.fc17.x86_64.rpm and have not had an ath9k kernel panic since.  The wireless card I have installed is a TP-Link TL-WDN4800 and the lspci output for me is: "03:00.0 Network controller: Atheros Communications Inc. AR9300 Wireless LAN adaptor (rev 01)" on an Intel DG45ID Motherboard.

Previously, I was on kernel 3.4.4-3.fc17.x86_64 and was crashing within seconds of firing up a browser, Skype or YUM Updates.

Thank you very much for patching this problem, it is MUCH appreciated!

Out of the same curiosity as William Bader above, when do special patches like this find their way into mainstream YUM Updates?

Thanks again!

Wes

Comment 19 Fedora Update System 2012-07-05 21:55:34 UTC
kernel-3.4.4-5.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/kernel-3.4.4-5.fc17

Comment 20 Fedora Update System 2012-07-05 23:49:57 UTC
kernel-3.4.4-4.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/kernel-3.4.4-4.fc16

Comment 21 Fedora Update System 2012-07-06 21:23:26 UTC
Package kernel-3.4.4-4.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.4.4-4.fc16'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-10319/kernel-3.4.4-4.fc16
then log in and leave karma (feedback).

Comment 22 Dirk Maaß 2012-07-06 21:37:43 UTC
kernel-3.4.4-5.fc17 works like a charm.
imho, this bug seems fixed.
many thanks!
dirk

Comment 23 Fedora Update System 2012-07-07 21:54:09 UTC
kernel-3.4.4-5.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 24 Fedora Update System 2012-07-08 20:51:09 UTC
kernel-3.4.4-4.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 25 William Bader 2012-07-08 23:25:10 UTC
Thanks! I have been running 3.4.4-5.fc17.x86_64 for several hours without a crash.

Comment 26 Jay Guerette 2012-07-11 00:55:58 UTC
Still getting panics. I have to convince my wife not to simply reboot so I can capture a call trace. Should I open a new bug, or can this one be re-opened?

Comment 27 John W. Linville 2012-07-11 14:04:17 UTC
To avoid confusion, please open a new bug with the backtrace information.