Bug 536713 - Oops in ath9k
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: kernel
Version: 12
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2009-11-10 20:52 EST by Matthew Galgoci
Modified: 2010-06-18 17:14 EDT
CC: 10 users

Doc Type: Bug Fix
Last Closed: 2010-06-18 17:14:06 EDT


Attachments
ath9k oops (45.03 KB, text/plain)
2009-11-10 20:53 EST, Matthew Galgoci
oops while running 2.6.31.6 with ath9k debugging enabled (42.54 KB, text/plain)
2009-11-11 20:42 EST, Matthew Galgoci

Description Matthew Galgoci 2009-11-10 20:52:22 EST
Description of problem:

Oops in the ath9k driver under light load, followed by loss of networking. A reboot is required to make the system usable again.

This bug appears to be fixed in at least 2.6.32-rc6-git3.

Version-Release number of selected component (if applicable):

2.6.31.5-127 PAE

How reproducible:

100% of the time

Steps to Reproduce:

1) rm -rf /var/cache/yum/*

2) yum -y update

Obtaining the compressed repo data is sufficient to trigger the oops.

Actual results:

yum -y update stalls and networking stops working.

Expected results:

I would expect yum -y update to complete successfully.

Additional info:

Oops is attached.
Comment 1 Matthew Galgoci 2009-11-10 20:53:19 EST
Created attachment 368980
ath9k oops
Comment 2 Matthew Galgoci 2009-11-10 21:22:39 EST
I am currently building 2.6.31.6 with the F12 config and debugging enabled in the ath9k driver and will test that as soon as it is done.
Comment 3 Matthew Galgoci 2009-11-11 08:48:58 EST
This does not reproduce with 2.6.31.6 using the F12 kernel-PAE .config plus ath9k debugging enabled.
Comment 4 John W. Linville 2009-11-11 09:04:24 EST
I don't see any patches in 2.6.31.5..2.6.31.6 that would seem to relate -- no ath9k ones and only a couple of mac80211 that don't seem related to the oops path.

Maybe it's a Heisenbug? Could you try your 2.6.31.6 build using the stock F12-PAE config (i.e., without turning on ath9k debugging)?
Comment 5 Luis R. Rodriguez 2009-11-11 09:47:09 EST
John, although 2.6.31.6 did not include any new ath9k- or mac80211-specific patches, it did include the PCI memory rounding fix, which also fixed an issue on another netbook. As far as we know, that issue only affected some Aspires (if the sky2 issue was also on an Aspire).

I cannot be sure that this patch fixed the issue, but it would be easy to test by simply reverting it.

Matthew, if you have spare time can you try reverting this patch to see if it did indeed fix the issue:

Author: Yinghai Lu <yinghai@kernel.org>

   pci: increase alignment to make more space for hidden code

So far, the other issues are:

ath9k load issue:

http://bugzilla.kernel.org/show_bug.cgi?id=14402

iwlagn + sky2 combo issues:

http://bugzilla.kernel.org/show_bug.cgi?id=13940

These issues are related to driver loading, though, so it would be surprising if the patch fixed an issue unrelated to loading. The patch in question changes the start address of PCI memory.

It is the only relevant change I can think of between 2.6.31.5 and 2.6.31.6, but perhaps it isn't the cause; only testing will tell.
Comment 6 Matthew Galgoci 2009-11-11 20:40:45 EST
I came home today after work and the laptop had oopsed while running 2.6.31.6 with debugging enabled in the ath9k driver. It looks like the same oops signature. I am attaching it.
Comment 7 Matthew Galgoci 2009-11-11 20:42:03 EST
Created attachment 369122
oops while running 2.6.31.6 with ath9k debugging enabled
Comment 8 Pierre 2009-11-20 22:11:36 EST
I have exactly the same problem, and disabling kernel modesetting seems to correct it. To reproduce it, I do the same as you: I boot Linux with kernel 2.6.31.6-134.fc12.i686 and launch an application that uses the network (such as yum or Firefox). Right after that, networking stops working and I cannot shut down properly.

I have this Wi-Fi card:
03:00.0 Network controller: Atheros Communications Inc. AR928X Wireless Network Adapter (PCI-Express) (rev 01)
        Subsystem: Foxconn International, Inc. Device e01f
        Kernel driver in use: ath9k
        Kernel modules: ath9k

It is in an Acer laptop, and I get the same message as the one at the end of your attachment (the kernel oops about a NULL pointer dereference, with the same last sysfs file).

I was getting desperate, seeing no one else with this problem xD
Comment 9 Pierre 2009-11-20 22:19:56 EST
Here is what my report says, in case it is useful:

Nov 18 12:47:55 localhost NetworkManager: <info>  (wlan0): supplicant connection state:  disconnected -> scanning
Nov 18 12:47:55 localhost kernel: BUG: unable to handle kernel NULL pointer dereference at 00000001
Nov 18 12:47:55 localhost kernel: IP: [<c049b517>] put_page+0xe/0x76
Nov 18 12:47:55 localhost kernel: *pde = bf730067 
Nov 18 12:47:55 localhost kernel: Oops: 0000 [#1] SMP 
Nov 18 12:47:55 localhost kernel: last sysfs file: /sys/devices/LNXSYSTM:00/device:00/PNP0A08:00/device:0e/PNP0C09:00/PNP0C0A:00/power_supply/BAT0/charge_full
Nov 18 12:47:55 localhost kernel: Modules linked in: vboxnetadp vboxnetflt vboxdrv sunrpc ipv6 cpufreq_ondemand acpi_cpufreq fuse dm_multipath uinput arc4 ecb snd_hda_codec_atihdmi snd_hda_codec_realtek snd_hda_intel ath9k snd_hda_codec mac80211 snd_hwdep snd_seq snd_seq_device uvcvideo ath snd_pcm snd_timer videodev cfg80211 snd acer_wmi tg3 rfkill v4l1_compat i2c_i801 soundcore serio_raw iTCO_wdt snd_page_alloc iTCO_vendor_support joydev wmi usb_storage video output radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
Nov 18 12:47:55 localhost kernel:
Nov 18 12:47:55 localhost kernel: Pid: 681, comm: phy0 Not tainted (2.6.31.5-127.fc12.i686 #1) Aspire 5738                    
Nov 18 12:47:55 localhost kernel: EIP: 0060:[<c049b517>] EFLAGS: 00010282 CPU: 0
Nov 18 12:47:55 localhost kernel: EIP is at put_page+0xe/0x76
Nov 18 12:47:55 localhost kernel: EAX: 00000001 EBX: f1887840 ECX: f6e84188 EDX: 00000000
Nov 18 12:47:55 localhost kernel: ESI: 00000001 EDI: f1887860 EBP: f41c7e44 ESP: f41c7e34
Nov 18 12:47:55 localhost kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Nov 18 12:47:55 localhost kernel: Process phy0 (pid: 681, ti=f41c6000 task=f6332640 task.ti=f41c6000)
Nov 18 12:47:55 localhost kernel: Stack:
Nov 18 12:47:55 localhost kernel: f6c003ac f1887840 00000001 f1887860 f41c7e54 c06d81d8 f1887840 f6e8c1a0
Nov 18 12:47:55 localhost kernel: <0> f41c7e60 c06d7e20 f1887840 f41c7e68 c06d7eaa f41c7e94 f8be3c3a f6e8d148
Nov 18 12:47:55 localhost kernel: <0> f41c7ea8 00000009 f19544aa f1887860 f19544ae f6e8c9f4 f1887840 f41c7ee4
Nov 18 12:47:55 localhost kernel: Call Trace:
Nov 18 12:47:55 localhost kernel: [<c06d81d8>] ? skb_release_data+0x56/0x96
Nov 18 12:47:55 localhost kernel: [<c06d7e20>] ? __kfree_skb+0x17/0x72
Nov 18 12:47:55 localhost kernel: [<c06d7eaa>] ? consume_skb+0x2f/0x31
Nov 18 12:47:55 localhost kernel: [<f8be3c3a>] ? ieee80211_tx_status+0x367/0x36f [mac80211]
Nov 18 12:47:55 localhost kernel: [<f7d484e5>] ? ath_tx_complete_buf+0x111/0x166 [ath9k]
Nov 18 12:47:55 localhost kernel: [<f7d4934d>] ? ath_draintxq+0x129/0x1b9 [ath9k]
Nov 18 12:47:55 localhost kernel: [<f7d49eec>] ? ath_drain_all_txq+0xd8/0xe6 [ath9k]
Nov 18 12:47:55 localhost kernel: [<f7d45da0>] ? ath_set_channel+0x4d/0xe3 [ath9k]
Nov 18 12:47:55 localhost kernel: [<f7d45fbc>] ? ath9k_config+0x186/0x1bd [ath9k]
Nov 18 12:47:55 localhost kernel: [<f8be381c>] ? ieee80211_hw_config+0x91/0x99 [mac80211]
Nov 18 12:47:55 localhost kernel: [<f8be744f>] ? ieee80211_scan_work+0xeb/0x178 [mac80211]
Nov 18 12:47:55 localhost kernel: [<c0446238>] ? worker_thread+0x13c/0x1bc
Nov 18 12:47:55 localhost kernel: [<f8be7364>] ? ieee80211_scan_work+0x0/0x178 [mac80211]
Nov 18 12:47:55 localhost kernel: [<c0449be1>] ? autoremove_wake_function+0x0/0x34
Nov 18 12:47:55 localhost kernel: [<c04460fc>] ? worker_thread+0x0/0x1bc
Nov 18 12:47:55 localhost kernel: [<c0449937>] ? kthread+0x70/0x75
Nov 18 12:47:55 localhost kernel: [<c04498c7>] ? kthread+0x0/0x75
Nov 18 12:47:55 localhost kernel: [<c04041a7>] ? kernel_thread_helper+0x7/0x10
Nov 18 12:47:55 localhost kernel: Code: 10 89 4c 90 08 42 83 fa 0e 89 10 75 05 e8 1b fe ff ff 89 d8 50 9d 8d 74 26 00 5b 5d c3 55 89 e5 57 56 53 83 ec 04 0f 1f 44 00 00 <66> f7 00 00 c0 89 c3 74 07 e8 48 f9 ff ff eb 52 f0 ff 48 04 0f 
Nov 18 12:47:55 localhost kernel: EIP: [<c049b517>] put_page+0xe/0x76 SS:ESP 0068:f41c7e34
Nov 18 12:47:55 localhost kernel: CR2: 0000000000000001
Nov 18 12:47:55 localhost kernel: ---[ end trace c53995cfa490624c ]---
Comment 10 DennyHalim.com 2010-01-07 03:41:03 EST
Same problem on an Acer Aspire 4540.

I 'upgraded' to the Rawhide kernel, and everything seems OK now.
Comment 11 Stanislaw Gruszka 2010-06-18 17:14:06 EDT
F-12 now uses a 2.6.32-based kernel, which, according to comment 0, has this bug fixed.
