Bug 835213 - kernel panic in ath9k driver [Acer Aspire One D722]
Keywords:
Status: CLOSED DUPLICATE of bug 832927
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 17
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: John W. Linville
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-06-25 19:31 UTC by Pascal Dupuis
Modified: 2012-06-29 14:40 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2012-06-29 14:40:20 UTC
Type: Bug
Embargoed:


Attachments
picture taken with mobile phone, quality so-so (93.69 KB, image/jpeg)
2012-06-25 19:31 UTC, Pascal Dupuis
another picture (109.20 KB, image/jpeg)
2012-06-26 21:08 UTC, Pascal Dupuis
same occurrence (73.79 KB, image/jpeg)
2012-06-26 21:10 UTC, Pascal Dupuis
page fault in ath_rx_tasklet (89.95 KB, image/jpeg)
2012-06-27 21:23 UTC, Pascal Dupuis

Description Pascal Dupuis 2012-06-25 19:31:39 UTC
Created attachment 594262
picture taken with mobile phone, quality so-so

Description of problem: the system freezes and switches back to the console. The Alt-PrtScr keys no longer work. Power cycling is the only way to recover.

Version-Release number of selected component (if applicable):
Linux tatooine.example.org 3.4.3-1.fc17.x86_64 #1 SMP Mon Jun 18 19:53:17 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Network connectivity is through Wi-Fi only; no Ethernet cable is connected.

How reproducible: once every few days, but highly problematic


Steps to Reproduce:
1. Generate moderate network activity: open a web site, launch a chat client, run "yum update", ...
  
Actual results: 
Total freeze


Expected results:
Normal working


Additional info:
After some searching on the topic, I set the BIOS boot order to "network first" and created /etc/modprobe.d/ath9k.conf with the following content:
# see https://bugs.launchpad.net/ubuntu/+source/linux/+bug/951709

options ath9k nohwcrypt=1

But this does not solve the problem. The Wi-Fi adapter appears in lspci as:
07:00.0 Network controller: Atheros Communications Inc. AR9485 Wireless Network Adapter (rev 01)
	Subsystem: Lite-On Communications Inc Device 6617
	Flags: bus master, fast devsel, latency 0, IRQ 19
	Memory at f0100000 (64-bit, non-prefetchable) [size=512K]
	Expansion ROM at f0500000 [disabled] [size=64K]
	Capabilities: [40] Power Management version 2
	Capabilities: [50] MSI: Enable- Count=1/4 Maskable+ 64bit+
	Capabilities: [70] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Virtual Channel
	Capabilities: [160] Device Serial Number 00-00-00-00-00-00-00-00
	Kernel driver in use: ath9k
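
One way to confirm that the nohwcrypt option above was actually picked up after a reboot is to read the parameter back from sysfs. The following is a minimal sketch in C, assuming the parameter is exposed at /sys/module/ath9k/parameters/nohwcrypt (the usual location for readable module parameters); the helper names read_param and param_enabled are hypothetical and not part of any driver or library.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of a module parameter file into `out`, without the
 * trailing newline. Returns 0 on success, -1 if the file cannot be read.
 */
int read_param(const char *path, char *out, size_t outlen)
{
    FILE *fp = fopen(path, "r");

    if (!fp)
        return -1;
    if (!fgets(out, (int)outlen, fp)) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    out[strcspn(out, "\n")] = '\0';
    return 0;
}

/*
 * Returns 1 if the parameter holds anything other than "0", 0 if it holds
 * "0", and -1 if it is unreadable (for example, the module is not loaded).
 */
int param_enabled(const char *path)
{
    char val[32];

    if (read_param(path, val, sizeof val) != 0)
        return -1;
    return strcmp(val, "0") != 0;
}
```

Calling param_enabled("/sys/module/ath9k/parameters/nohwcrypt") on the affected machine should return 1 if the modprobe.d file took effect, which helps rule out a misapplied workaround when the panic still occurs.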

A picture of the screen during a hang is attached. The interesting entry in the trace is ath9k_ioread32 (the machine uses an x86_64 kernel).

Comment 1 Pascal Dupuis 2012-06-26 21:08:37 UTC
Created attachment 594598
another picture

Comment 2 Pascal Dupuis 2012-06-26 21:10:41 UTC
Created attachment 594599
same occurrence

Comment 3 Pascal Dupuis 2012-06-26 21:12:25 UTC
The message I've seen today is:

kernel bug at drivers/net/wireless/ath/ath9k/recv.c: 671
invalid opcode: 000 [#1] SMP

Comment 4 Pascal Dupuis 2012-06-27 21:23:20 UTC
Created attachment 594865
page fault in ath_rx_tasklet

Comment 5 Pascal Dupuis 2012-06-27 21:25:35 UTC
The hang of the day: the system worked flawlessly for around two hours, then I got a page fault. The kernel trace can be seen in attachment https://bugzilla.redhat.com/attachment.cgi?id=594865

Interesting point:

ath_rx_tasklet +0x165/0x1b00
followed by page_fault

Comment 6 Pascal Dupuis 2012-06-28 07:38:55 UTC
A new thought: I'm using the laptop in a residential area in France. Running "iwlist scan" reveals between 45 and 65 cells. Most of them come from "boxes", i.e. Internet access points fed by telephone cable or optical fibre; the link to the user's computer, laptop or smartphone is still over Wi-Fi. TV channels are also available through those boxes, so you can imagine the bandwidth in use.

Could the number of beacons or the link quality be something that is not handled properly?

Comment 8 John W. Linville 2012-06-28 15:53:23 UTC
Pascal, none of the pictures you are posting are useful.  Please zoom out enough to show the entire screen.

Alex, on what basis do you believe that fix to apply to this problem?

Comment 9 John W. Linville 2012-06-28 17:25:56 UTC
In any case...test kernels w/ the above mentioned patch are building here:

http://koji.fedoraproject.org/koji/taskinfo?taskID=4206016

When they finish building, please give them a try and post the results here...thanks!

Comment 10 Alex Andilevko 2012-06-28 19:02:41 UTC
 uname -r
3.4.4-3.bz832927.1.fc17.x86_64

Testing the build by copying a large file via scp.

Comment 11 Alex Andilevko 2012-06-28 19:15:00 UTC
So far so good. No panic. screenshot: http://storage6.static.itmages.ru/i/12/0628/h_1340910789_9343523_b77e49aa36.png

Comment 12 Alex Andilevko 2012-06-28 20:21:29 UTC
It works fine! The kernel no longer panics.

Comment 13 Pascal Dupuis 2012-06-28 21:45:44 UTC
Installed the new kernel from koji. Removed all the workarounds, rebooted and ... not a single problem in two hours, even though I played music from YouTube, stress-tested the machine by processing a 10 GB compressed archive, and so on.

No trouble at all. Congrats on killing this bug.

Comment 14 Pascal Dupuis 2012-06-29 07:42:50 UTC
Wait a minute. The behaviour I observed with the previous kernel looked like a leak. If you look around line 685 of recv.c in the latest kernel:
 685       if (ret == -EINVAL) {
 686                /* corrupt descriptor, skip this one and the following one */
 687                list_add_tail(&bf->list, &sc->rx.rxbuf);
 688                ath_rx_edma_buf_link(sc, qtype);
 689
 690                skb = skb_peek(&rx_edma->rx_fifo);
 691                if (skb) {
 692                        bf = SKB_CB_ATHBUF(skb);
 693                        BUG_ON(!bf);
 694
 695                        __skb_unlink(skb, &rx_edma->rx_fifo);
 696                        list_add_tail(&bf->list, &sc->rx.rxbuf);
 697                        ath_rx_edma_buf_link(sc, qtype);
 698                } else {
 699                        bf = NULL;
 700                }
 701        }

The idea is to remove the "else {" and the matching "}", so that bf is always set to NULL. According to the code, two descriptors are skipped. Is there dynamic memory allocated through those descriptors? Is this memory freed before bf is set to NULL?

Regards

Pascal
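
To make the question in Comment 14 concrete, here is a small user-space model in C of the corrupt-descriptor path. It is only a sketch under stated assumptions, not the driver code: struct buf, struct fifo, requeue() and skip_corrupt() are hypothetical stand-ins, and the model assumes that the value left in bf is what the surrounding function hands back to its caller (that part is not visible in the quoted snippet). In the quoted code both buffers go back onto the sc->rx.rxbuf list with list_add_tail(), so they appear to be recycled rather than freed; the model tracks that with a requeued flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NBUF 4

/* Hypothetical stand-in for a receive buffer; NOT the kernel's struct ath_buf. */
struct buf {
    int id;
    bool requeued;   /* set once the buffer has been handed back for reuse */
};

/* Tiny ring buffer standing in for the FIFO of pending receive buffers. */
struct fifo {
    struct buf *slot[NBUF];
    size_t head;
    size_t count;
};

void fifo_push(struct fifo *f, struct buf *b)
{
    assert(f->count < NBUF);
    f->slot[(f->head + f->count) % NBUF] = b;
    f->count++;
}

struct buf *fifo_pop(struct fifo *f)
{
    struct buf *b;

    if (f->count == 0)
        return NULL;
    b = f->slot[f->head];
    f->head = (f->head + 1) % NBUF;
    f->count--;
    return b;
}

/* Models list_add_tail() plus relinking: the buffer is recycled, not freed. */
void requeue(struct buf *b)
{
    b->requeued = true;
}

/*
 * Corrupt-descriptor path. `bf` is the corrupt buffer, already removed from
 * the FIFO. With drop_else == false this mirrors the quoted code; with
 * drop_else == true it mirrors the proposed change (bf always ends up NULL).
 * The return value models what the caller would receive (an assumption).
 */
struct buf *skip_corrupt(struct fifo *f, struct buf *bf, bool drop_else)
{
    struct buf *next;

    requeue(bf);                 /* skip the corrupt buffer */
    next = fifo_pop(f);
    if (next) {
        requeue(next);           /* skip the following one as well */
        bf = next;
    } else {
        bf = NULL;
    }
    if (drop_else)
        bf = NULL;               /* proposed: never return a requeued buffer */
    return bf;
}
```

In this model, with drop_else set to false and a second buffer waiting in the FIFO, skip_corrupt() returns a buffer whose requeued flag is already set, i.e. one that has been given back for reuse while the caller still holds a pointer to it. With drop_else set to true the caller always gets NULL. Nothing is freed on either path, so the model points at a buffer being used after it was requeued rather than at a leak, though whether the real driver behaves this way depends on code outside the quoted lines.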

Comment 15 John W. Linville 2012-06-29 14:40:20 UTC

*** This bug has been marked as a duplicate of bug 832927 ***

