Bug 536713

Summary: Oops in ath9k
Product: [Fedora] Fedora
Reporter: Matthew Galgoci <mgalgoci>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: medium
Version: 12
CC: dougsland, gansalmon, itamar, kernel-maint, linville, mail2dny, mcgrof, sassmann, sgruszka, xpierro
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2010-06-18 21:14:06 UTC
Attachments:
  Description                                               Flags
  ath9k oops                                                none
  oops while running 2.6.31.6 with ath9k debugging enabled  none

Description Matthew Galgoci 2009-11-11 01:52:22 UTC
Description of problem:

Oops in the ath9k driver under light load, followed by loss of networking. A reboot is required to make the system usable again.

This bug seems to be fixed in at least 2.6.32-rc6-git3.

Version-Release number of selected component (if applicable):

2.6.31.5-127 PAE

How reproducible:

100% of the time

Steps to Reproduce:

1) rm -rf /var/cache/yum/*

2) yum -y update

Obtaining the compressed repo data is sufficient to trigger the oops.
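
The steps above could be scripted roughly as follows. This is only a sketch: the timeout value and the use of dmesg to look for the oops are assumptions, not part of the original report, and the marker text is taken from the log quoted in comment 9. The commands are destructive to the yum cache and need root, so nothing is executed unless run_repro() is called explicitly.

```python
import subprocess

# Steps 1 and 2 from the report. The glob in step 1 needs a shell to expand it.
REPRO_COMMANDS = [
    ["sh", "-c", "rm -rf /var/cache/yum/*"],
    ["yum", "-y", "update"],
]

# Marker text taken from the oops log quoted in comment 9.
OOPS_MARKER = "BUG: unable to handle kernel NULL pointer dereference"


def has_oops(kernel_log):
    """Return True if the kernel log text contains the oops marker."""
    return OOPS_MARKER in kernel_log


def run_repro(timeout=600):
    """Run the steps as root, then report whether dmesg shows the oops.

    The timeout is an arbitrary assumption: the reported symptom is that
    yum stalls, so the script must not wait on it forever.
    """
    for argv in REPRO_COMMANDS:
        try:
            subprocess.run(argv, timeout=timeout, check=False)
        except subprocess.TimeoutExpired:
            break
    log = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    return has_oops(log)
```

Calling run_repro() on an affected kernel would be expected to return True if the oops was logged, though dmesg itself may be unreliable once the machine is in the state described under "Actual results".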

Actual results:

yum -y update stalls and networking stops working.

Expected results:

I would expect the yum -y update to complete successfully.

Additional info:

Oops is attached.

Comment 1 Matthew Galgoci 2009-11-11 01:53:19 UTC
Created attachment 368980 [details]
ath9k oops

Comment 2 Matthew Galgoci 2009-11-11 02:22:39 UTC
I am currently building 2.6.31.6 with the F12 config and debugging enabled in the ath9k driver and will test that as soon as it is done.

Comment 3 Matthew Galgoci 2009-11-11 13:48:58 UTC
This does not reproduce with 2.6.31.6 using the F12 kernel-PAE .config plus ath9k debugging enabled.

Comment 4 John W. Linville 2009-11-11 14:04:24 UTC
I don't see any patches in 2.6.31.5..2.6.31.6 that seem to relate -- no ath9k ones, and only a couple of mac80211 ones that don't seem related to the oops path.

Maybe it's a Heisenbug?  Could you try your 2.6.31.6 build using the stock F12-PAE config (i.e. without turning on ath9k debugging)?
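
For anyone wanting to repeat the 2.6.31.5..2.6.31.6 comparison, a query along these lines might do. It is a sketch only: the tag names and the two source paths are assumptions about how the stable tree is laid out, and list_commits() needs a local kernel git checkout.

```python
import subprocess


def stable_log_args(old, new, paths):
    """Build a `git log` argv listing commits in old..new that touch `paths`."""
    return ["git", "log", "--oneline", f"{old}..{new}", "--", *paths]


# Assumed tag names and source paths; adjust to the tree actually checked out.
ARGS = stable_log_args(
    "v2.6.31.5",
    "v2.6.31.6",
    ["drivers/net/wireless/ath/ath9k", "net/mac80211"],
)


def list_commits(repo_dir):
    """Run the query inside a kernel git checkout; one output line per commit."""
    out = subprocess.run(
        ARGS, cwd=repo_dir, capture_output=True, text=True, check=True
    )
    return out.stdout.splitlines()
```

Dropping the path list from the call would show every commit in the range, which is how a change outside the wireless code would turn up.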

Comment 5 Luis R. Rodriguez 2009-11-11 14:47:09 UTC
John, although 2.6.31.6 did not have any new ath9k or mac80211 specific patch, there was the PCI memory rounding fix that also fixed an issue with another netbook. As far as we know, this only affected some Aspires (if the sky2 issue was with an Aspire as well).

I cannot be sure that the same patch fixed this issue, but it would be easy to test by just reverting it.

Matthew, if you have spare time can you try reverting this patch to see if it did indeed fix the issue:

Author: Yinghai Lu <yinghai>

   pci: increase alignment to make more space for hidden code

The other issues known so far are:

ath9k load issue:

http://bugzilla.kernel.org/show_bug.cgi?id=14402

iwlagn + sky2 combo issues:

http://bugzilla.kernel.org/show_bug.cgi?id=13940

Those issues are related to driver loading, though, so it would be surprising to see the patch fix an issue unrelated to loading. The patch in question would change where PCI memory addresses start.

It is the only thing I can think of between 2.6.31.5 and 2.6.31.6, but perhaps this wasn't it; only testing will tell.

Comment 6 Matthew Galgoci 2009-11-12 01:40:45 UTC
I came home today after work and the laptop had oopsed while running 2.6.31.6 with debugging enabled in the ath9k driver. It looks like the same oops signature. I am attaching it.

Comment 7 Matthew Galgoci 2009-11-12 01:42:03 UTC
Created attachment 369122 [details]
oops while running 2.6.31.6 with ath9k debugging enabled

Comment 8 Pierre 2009-11-21 03:11:36 UTC
I have exactly the same problem, and disabling kernel modesetting seems to correct it. To reproduce it, I do the same as you: boot Linux on kernel 2.6.31.6-134.fc12.i686 and launch a web application (like yum or firefox). Right after that, networking stops working, and I can't shut down properly.

I have this wifi card:
03:00.0 Network controller: Atheros Communications Inc. AR928X Wireless Network Adapter (PCI-Express) (rev 01)
        Subsystem: Foxconn International, Inc. Device e01f
        Kernel driver in use: ath9k
        Kernel modules: ath9k

It is in an Acer laptop, and I get the same message as at the end of your attachment (the kernel oops about a NULL pointer dereference, and the same last sysfs file used).

I was desperate seeing no one else with this problem xD

Comment 9 Pierre 2009-11-21 03:19:56 UTC
Here is what my report says, in case it is useful:

Nov 18 12:47:55 localhost NetworkManager: <info>  (wlan0): supplicant connection state:  disconnected -> scanning
Nov 18 12:47:55 localhost kernel: BUG: unable to handle kernel NULL pointer dereference at 00000001
Nov 18 12:47:55 localhost kernel: IP: [<c049b517>] put_page+0xe/0x76
Nov 18 12:47:55 localhost kernel: *pde = bf730067 
Nov 18 12:47:55 localhost kernel: Oops: 0000 [#1] SMP 
Nov 18 12:47:55 localhost kernel: last sysfs file: /sys/devices/LNXSYSTM:00/device:00/PNP0A08:00/device:0e/PNP0C09:00/PNP0C0A:00/power_supply/BAT0/charge_full
Nov 18 12:47:55 localhost kernel: Modules linked in: vboxnetadp vboxnetflt vboxdrv sunrpc ipv6 cpufreq_ondemand acpi_cpufreq fuse dm_multipath uinput arc4 ecb snd_hda_codec_atihdmi snd_hda_codec_realtek snd_hda_intel ath9k snd_hda_codec mac80211 snd_hwdep snd_seq snd_seq_device uvcvideo ath snd_pcm snd_timer videodev cfg80211 snd acer_wmi tg3 rfkill v4l1_compat i2c_i801 soundcore serio_raw iTCO_wdt snd_page_alloc iTCO_vendor_support joydev wmi usb_storage video output radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
Nov 18 12:47:55 localhost kernel:
Nov 18 12:47:55 localhost kernel: Pid: 681, comm: phy0 Not tainted (2.6.31.5-127.fc12.i686 #1) Aspire 5738                    
Nov 18 12:47:55 localhost kernel: EIP: 0060:[<c049b517>] EFLAGS: 00010282 CPU: 0
Nov 18 12:47:55 localhost kernel: EIP is at put_page+0xe/0x76
Nov 18 12:47:55 localhost kernel: EAX: 00000001 EBX: f1887840 ECX: f6e84188 EDX: 00000000
Nov 18 12:47:55 localhost kernel: ESI: 00000001 EDI: f1887860 EBP: f41c7e44 ESP: f41c7e34
Nov 18 12:47:55 localhost kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Nov 18 12:47:55 localhost kernel: Process phy0 (pid: 681, ti=f41c6000 task=f6332640 task.ti=f41c6000)
Nov 18 12:47:55 localhost kernel: Stack:
Nov 18 12:47:55 localhost kernel: f6c003ac f1887840 00000001 f1887860 f41c7e54 c06d81d8 f1887840 f6e8c1a0
Nov 18 12:47:55 localhost kernel: <0> f41c7e60 c06d7e20 f1887840 f41c7e68 c06d7eaa f41c7e94 f8be3c3a f6e8d148
Nov 18 12:47:55 localhost kernel: <0> f41c7ea8 00000009 f19544aa f1887860 f19544ae f6e8c9f4 f1887840 f41c7ee4
Nov 18 12:47:55 localhost kernel: Call Trace:
Nov 18 12:47:55 localhost kernel: [<c06d81d8>] ? skb_release_data+0x56/0x96
Nov 18 12:47:55 localhost kernel: [<c06d7e20>] ? __kfree_skb+0x17/0x72
Nov 18 12:47:55 localhost kernel: [<c06d7eaa>] ? consume_skb+0x2f/0x31
Nov 18 12:47:55 localhost kernel: [<f8be3c3a>] ? ieee80211_tx_status+0x367/0x36f [mac80211]
Nov 18 12:47:55 localhost kernel: [<f7d484e5>] ? ath_tx_complete_buf+0x111/0x166 [ath9k]
Nov 18 12:47:55 localhost kernel: [<f7d4934d>] ? ath_draintxq+0x129/0x1b9 [ath9k]
Nov 18 12:47:55 localhost kernel: [<f7d49eec>] ? ath_drain_all_txq+0xd8/0xe6 [ath9k]
Nov 18 12:47:55 localhost kernel: [<f7d45da0>] ? ath_set_channel+0x4d/0xe3 [ath9k]
Nov 18 12:47:55 localhost kernel: [<f7d45fbc>] ? ath9k_config+0x186/0x1bd [ath9k]
Nov 18 12:47:55 localhost kernel: [<f8be381c>] ? ieee80211_hw_config+0x91/0x99 [mac80211]
Nov 18 12:47:55 localhost kernel: [<f8be744f>] ? ieee80211_scan_work+0xeb/0x178 [mac80211]
Nov 18 12:47:55 localhost kernel: [<c0446238>] ? worker_thread+0x13c/0x1bc
Nov 18 12:47:55 localhost kernel: [<f8be7364>] ? ieee80211_scan_work+0x0/0x178 [mac80211]
Nov 18 12:47:55 localhost kernel: [<c0449be1>] ? autoremove_wake_function+0x0/0x34
Nov 18 12:47:55 localhost kernel: [<c04460fc>] ? worker_thread+0x0/0x1bc
Nov 18 12:47:55 localhost kernel: [<c0449937>] ? kthread+0x70/0x75
Nov 18 12:47:55 localhost kernel: [<c04498c7>] ? kthread+0x0/0x75
Nov 18 12:47:55 localhost kernel: [<c04041a7>] ? kernel_thread_helper+0x7/0x10
Nov 18 12:47:55 localhost kernel: Code: 10 89 4c 90 08 42 83 fa 0e 89 10 75 05 e8 1b fe ff ff 89 d8 50 9d 8d 74 26 00 5b 5d c3 55 89 e5 57 56 53 83 ec 04 0f 1f 44 00 00 <66> f7 00 00 c0 89 c3 74 07 e8 48 f9 ff ff eb 52 f0 ff 48 04 0f 
Nov 18 12:47:55 localhost kernel: EIP: [<c049b517>] put_page+0xe/0x76 SS:ESP 0068:f41c7e34
Nov 18 12:47:55 localhost kernel: CR2: 0000000000000001
Nov 18 12:47:55 localhost kernel: ---[ end trace c53995cfa490624c ]---
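
To compare oops signatures between reports, the frames in a log like the one above can be pulled out mechanically. The following is a minimal sketch, not an established tool: the regular expression is my own guess at the frame format, based only on the lines shown here, and the sample lines are copied from this log with the syslog prefix stripped.

```python
import re

# Sample lines copied from the log above (syslog prefix stripped for brevity).
OOPS_LINES = """\
BUG: unable to handle kernel NULL pointer dereference at 00000001
IP: [<c049b517>] put_page+0xe/0x76
Call Trace:
[<c06d81d8>] ? skb_release_data+0x56/0x96
[<c06d7e20>] ? __kfree_skb+0x17/0x72
[<c06d7eaa>] ? consume_skb+0x2f/0x31
[<f8be3c3a>] ? ieee80211_tx_status+0x367/0x36f [mac80211]
[<f7d484e5>] ? ath_tx_complete_buf+0x111/0x166 [ath9k]
[<f7d4934d>] ? ath_draintxq+0x129/0x1b9 [ath9k]
""".splitlines()

# Assumed frame shape: "[<addr>] ? symbol+0xoffset/0xsize [module]"
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?:\?\s+)?"
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<mod>[\w-]+)\])?"
)


def parse_frames(lines):
    """Return one dict per line that contains a symbol+offset/size frame."""
    frames = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m:
            frames.append({
                "addr": int(m.group("addr"), 16),
                "symbol": m.group("sym"),
                "offset": int(m.group("off"), 16),  # bytes into the function
                "size": int(m.group("size"), 16),   # function length in bytes
                "module": m.group("mod"),           # None for core-kernel symbols
            })
    return frames


def signature(lines):
    """Reduce an oops to its ordered symbol names, ignoring addresses."""
    return tuple(f["symbol"] for f in parse_frames(lines))
```

Two oopses with equal signature() values share the same faulting function and call path even when the load addresses differ. In the log above the fault address 00000001 matches the EAX and CR2 values, which would be consistent with put_page() being handed a bogus page pointer on the skb free path; that reading is an inference from the dump, not something confirmed in this report.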

Comment 10 DennyHalim.com 2010-01-07 08:41:03 UTC
Same problem on an Acer Aspire 4540.

I 'upgraded' to the rawhide kernel and everything seems OK now.

Comment 11 Stanislaw Gruszka 2010-06-18 21:14:06 UTC
F-12 now uses a 2.6.32-based kernel, which according to comment 0 has this bug fixed.