Bug 749276

Summary: AR9285 wireless card makes the system freeze
Product: [Fedora] Fedora
Reporter: Flos Lonicerae <lonicerae>
Component: kernel
Assignee: Stanislaw Gruszka <sgruszka>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 16
CC: gansalmon, itamar, jonathan, jwboyer, kernel-maint, madhu.chinakonda, sgruszka
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2012-05-22 02:21:54 UTC
Attachments:
- cmdline
- dmesg
- iomem
- installed kernel debugging packages
- kexec-tools version
- lsmod result
- 'uname -a' result
- first crash test on Fedora 16
- last test on Fedora 16, photo 1
- last test on Fedora 16, photo 2
- kdump.conf in /etc
- atl1c_net_next_update-3.3.patch
- atl1c_net_next_update-3.4.patch

Description Flos Lonicerae 2011-10-26 15:46:35 UTC
An older report describes a bug in which the AR9285 wireless card freezes the system as soon as the card is enabled. Please see:
https://bugzilla.redhat.com/show_bug.cgi?id=697157

I was able to install Fedora 14 on my Lenovo G475 after Stanislaw Gruszka fixed that bug. Recently I tried to install Fedora 16 on the same notebook and found that the system freeze is back. I tried all the old methods from the rhbz697157 thread, but none of them helped.

Stanislaw Gruszka kindly told me how to get a kernel core dump for debugging, but it was difficult for me to work out how to do it, so I spent a few days studying kdump. Below are all the problems I found along the way:

I. Using the system-config-kdump package
This tool sometimes runs correctly on RHEL6, but it does NOT run on Fedora 16 at all! When I start it on Fedora 16, it pops up a lot of error windows and finally exits WITHOUT triggering the debug logging daemon, 'abrtd', so I cannot file a bug against the tool.

I also have RHEL6 on this notebook, which I use for my RHCE6 lessons. Since system-config-kdump runs on RHEL6, I came up with an ugly workaround. First, I compiled the latest Fedora 16 kernel and its dependencies on RHEL6, which finished successfully. Second, I booted kernel-3.1.0-rc10 on RHEL6 and ran system-config-kdump to configure kdump; the initrd-kdump image then appeared in /boot. Then I appended "crashkernel=512M" to the kernel boot parameters, rebooted, and checked for a "Crash kernel" entry in /proc/iomem. It was there. Then I tested with:
echo "1"> /proc/sys/kernel/sysrq  # this step ok
echo c > /proc/sysrq-trigger
The kernel crashed, but it reported a bug that prevented it from getting back to a bash shell, and then it panicked without dumping any core file.

I also tried the 2.6 kernel shipped with RHEL6 itself. Following the same steps, it does produce a vmcore in my /var/crash/<crash time> directory.
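
The two checks described above (whether the crashkernel= reservation took effect, and the SysRq test crash) can be sketched as a small shell helper. This is only an illustration: the function name has_crash_kernel is made up here, and the optional file argument exists just so the check can be tried on a saved copy of /proc/iomem, such as the attached one.

```shell
#!/bin/sh
# Hypothetical helper (name not from the original report): succeed if the
# given iomem listing contains a "Crash kernel" reservation.
# Defaults to /proc/iomem; pass a saved copy to inspect it offline.
has_crash_kernel() {
    grep -qi 'crash kernel' "${1:-/proc/iomem}"
}

# Intended use on the affected machine, as root. The last command panics
# the kernel on purpose, so everything is left commented out here:
#   has_crash_kernel || echo "no crashkernel reservation, kdump cannot work"
#   echo 1 > /proc/sys/kernel/sysrq
#   echo c > /proc/sysrq-trigger
```

If the helper fails, the test crash is pointless, because there is no reserved memory for the kdump kernel to boot into.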

II. Using kexec-tools directly
After reading some tutorials on the web, I decided to configure kdump manually on Fedora 16. First I disabled my wireless card, updated Fedora 16 to the latest kernel, and installed all the kernel debuginfo packages and kexec-tools. Here I found that kexec-tools does not install correctly! The kdump.service unit is broken: it has no [Install] section, so systemd cannot enable it. All the binaries in kexec-tools were fine, though.

So I edited kdump.conf, appended the boot parameter 'crashkernel=512M' in /boot/grub2/grub.cfg, rebooted, and ran 'kdumpctl restart' manually. That created 'initramfs-3.1.0-1.fc16.i686.debugkdump.img' in /boot. 'service kdump status' then reported 'Kdump is operational', and /proc/iomem showed that the crash kernel reservation was there. I tested it:
echo "1"> /proc/sys/kernel/sysrq  # this step ok
echo c > /proc/sysrq-trigger
The kernel crashed, but it did *NOT* produce a vmcore either, and it left my keyboard lights blinking.
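
The packaging problem mentioned above (a kdump.service unit with no [Install] section) is easy to confirm before trying to enable the service. The following is a minimal sketch; the function name is hypothetical, and the unit file path has to be supplied by the caller because the report does not say where the package installed it.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a systemd unit file has an [Install]
# section, which "systemctl enable" needs in order to create its symlinks.
unit_has_install_section() {
    grep -q '^\[Install\]' "$1"
}

# Example (the path is an assumption, adjust to the installed unit file):
#   unit_has_install_section /lib/systemd/system/kdump.service ||
#       echo "kdump.service cannot be enabled: no [Install] section"
```

A unit without that section can still be started by hand, which matches the report: 'kdumpctl restart' worked even though enabling the service did not.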

The attachments show the environment and configuration of the last test on my Fedora 16. The pictures are the results of my last two tests.

thanks!

Flos

Comment 1 Flos Lonicerae 2011-10-26 15:54:05 UTC
Created attachment 530310
cmdline

Comment 2 Flos Lonicerae 2011-10-26 15:54:29 UTC
Created attachment 530311
dmesg

Comment 3 Flos Lonicerae 2011-10-26 15:55:01 UTC
Created attachment 530312
iomem

Comment 4 Flos Lonicerae 2011-10-26 15:55:52 UTC
Created attachment 530313
installed kernel debugging packages

Comment 5 Flos Lonicerae 2011-10-26 15:56:24 UTC
Created attachment 530314
kexec-tools version

Comment 6 Flos Lonicerae 2011-10-26 15:56:50 UTC
Created attachment 530316
lsmod result

Comment 7 Flos Lonicerae 2011-10-26 15:58:01 UTC
Created attachment 530317
'uname -a' result

Comment 8 Flos Lonicerae 2011-10-26 15:59:20 UTC
Created attachment 530318
first crash test on Fedora 16

Comment 9 Flos Lonicerae 2011-10-26 16:00:29 UTC
Created attachment 530319
last test on Fedora 16, photo 1

Comment 10 Flos Lonicerae 2011-10-26 16:01:28 UTC
Created attachment 530320
last test on Fedora 16, photo 2

Comment 11 Flos Lonicerae 2011-10-26 16:09:10 UTC
BTW, when I did the crash test, I first disabled the network service and rebooted, then enabled the wireless card in the BIOS, saved the setting, and rebooted again. This way the wireless driver is loaded but the network service is stopped, so I can run the crash test. Otherwise the system freezes as soon as the wireless card starts making a connection.

Comment 12 Flos Lonicerae 2011-10-26 16:16:31 UTC
Created attachment 530322
kdump.conf in /etc

Comment 13 Stanislaw Gruszka 2011-10-29 12:51:25 UTC
I'm sad to hear that kdump does not work.

> the kernel crashed, but it said that there is a bug so that it can not regain a
> bash shell. then a kernel panic without any core files dumped...

So that attempt (RHEL6 + 3.1-rc kernel) was almost successful: the secondary (kdump) kernel booted but did not make a dump? Let's try to fix that so we can catch the ath9k bug. If you modify /etc/kdump.conf to have "default shell" instead of "default reboot", you will be dropped into a small shell that lets you find out why the dump fails (e.g. whether a mount fails or there is a problem reading /proc/vmcore). BTW: does your custom 3.1-rc kernel have CONFIG_PROC_KCORE and CONFIG_PROC_VMCORE compiled in?
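
The question about CONFIG_PROC_KCORE and CONFIG_PROC_VMCORE can be answered from the kernel's config file. The sketch below is only an illustration: the function names are hypothetical, and the default path /boot/config-$(uname -r) is an assumption that holds for packaged Fedora and RHEL kernels but may not hold for a custom build.

```shell
#!/bin/sh
# Hypothetical helper: succeed if OPTION is built in (=y) in a kernel
# config file. The default path is an assumption; a custom build may keep
# its .config elsewhere, in which case pass the path as second argument.
kconfig_enabled() {
    grep -q "^$1=y\$" "${2:-/boot/config-$(uname -r)}"
}

# Report the two options asked about in the comment above.
report_vmcore_options() {
    for opt in CONFIG_PROC_KCORE CONFIG_PROC_VMCORE; do
        if kconfig_enabled "$opt" "$1"; then
            echo "$opt=y"
        else
            echo "$opt is not set"
        fi
    done
}
```

Running report_vmcore_options with no argument checks the running kernel; passing a path checks the config of another build, for example the custom 3.1-rc kernel.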

If there is no way to make kdump work, perhaps it would be possible to take a photo of the crash:

- first blacklist the ath9k module in /etc/modprobe.d/blacklist.conf
- enable wireless in the BIOS
- boot the system and switch to a virtual terminal (Alt+Ctrl+F2)
- log in as root
- run modprobe ath9k
- this should trigger a crash, which should show a call trace
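
The first and fifth steps of the list above can be sketched in shell. The helper name blacklist_module is hypothetical; it only appends the line if it is not already present, so it is safe to run more than once.

```shell
#!/bin/sh
# Hypothetical helper: append "blacklist MODULE" to a modprobe.d file
# unless an identical line is already there.
blacklist_module() {
    module=$1
    conf=${2:-/etc/modprobe.d/blacklist.conf}
    grep -qx "blacklist $module" "$conf" 2>/dev/null ||
        echo "blacklist $module" >> "$conf"
}

# On the affected laptop, as root:
#   blacklist_module ath9k
#   (reboot with wireless enabled in the BIOS, switch to a text console)
#   modprobe ath9k
```

A blacklist entry only stops the module from being loaded automatically through its aliases; an explicit modprobe by name still loads it, which is what makes the manual trigger in the last steps possible.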

Comment 14 Stanislaw Gruszka 2011-10-29 13:07:06 UTC
One more thing. Kdump needs its own blacklisting of modules. To keep the kdump kernel from crashing in the ath9k driver, you have to add the line

blacklist ath9k

to /etc/kdump.conf (and restart the kdump service).

You can also blacklist other modules that are not needed, e.g. uvcvideo, snd_*, ... The more modules are blacklisted, the better the chance that the kdump kernel boots properly and the more memory is left for a successful dump. Note that the radeon module is needed for proper display initialization. If it is blacklisted, the best case is only that you will not see the kdump kernel booting, but in the worst case the kdump kernel could fail to start.

Comment 15 Stanislaw Gruszka 2012-02-26 12:14:28 UTC
Flos, various ath9k fixes have been committed since 10/2011. Does the problem still occur in the latest Fedora 16 kernel, 3.2?

Comment 16 Flos Lonicerae 2012-02-27 16:12:30 UTC
Hi Stanislaw,

The problem still occurs in the latest Fedora 16 kernel, 3.2, but I think I have found the cause. When both my wireless and my wired network adapters are enabled, the system always freezes! But if I blacklist the atl1c module in /etc/modprobe.d/blacklist.conf and regenerate the initramfs, the problem disappears completely, although my wired network adapter is then unusable because the atl1c module is not loaded.

That is all I can do. Without the wired network adapter module, the wireless card works without any problem. Likewise, if I disable the wireless card in the BIOS settings or blacklist the ath and ath9k modules, the wired network adapter works smoothly! What a strange problem.
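
Since the freeze only shows up when both drivers are active, a quick way to tell whether a given boot is at risk is to look for both modules in the loaded-module list. This is a minimal sketch with a hypothetical function name; it reads /proc/modules by default, or any file in the same format.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if both ath9k and atl1c appear in a
# /proc/modules-style listing (module name in the first column).
both_nics_loaded() {
    modules=${1:-/proc/modules}
    grep -q '^ath9k ' "$modules" && grep -q '^atl1c ' "$modules"
}

# Example:
#   both_nics_loaded && echo "ath9k and atl1c are both loaded"
```

The workaround itself is the blacklist entry for atl1c plus a rebuilt initramfs. The report does not say which command was used for the rebuild; on Fedora, 'dracut -f' run as root is one common way to do it.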

Since I have set up a Wi-Fi AP at home, I am now working with the atl1c module blacklisted.

I still wonder whether the problem can finally be resolved. ;)

Flos

Comment 17 Stanislaw Gruszka 2012-02-29 12:46:48 UTC
Good catch. atl1c applies some ASPM quirks to PCI bridges, the same as ath9k. Apparently neither of them should do this; they should either rely on the system settings or make the PCI core aware of the changes they make.

> i always wonder if the problem can be finally resolved. ;)

I plan to work on that, but not sure when.
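
For anyone following the ASPM remark in this comment, the system-wide ASPM policy that the drivers are said to override can be read from sysfs. The sketch below assumes the kernel exposes /sys/module/pcie_aspm/parameters/policy; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the active PCIe ASPM policy. The kernel lists
# all policies on one line and puts the active one in square brackets,
# e.g. "[default] performance powersave".
active_aspm_policy() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' \
        "${1:-/sys/module/pcie_aspm/parameters/policy}"
}

# Example:
#   echo "ASPM policy: $(active_aspm_policy)"
```

This shows only the global policy. The per-device link state, which is what the driver quirks actually change, appears in the LnkCtl lines of 'lspci -vv' output.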

Comment 18 Flos Lonicerae 2012-02-29 13:15:34 UTC
Thanks! Waiting for your good news!

Comment 19 Dave Jones 2012-03-22 16:55:51 UTC
[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 22 Stanislaw Gruszka 2012-03-22 20:02:35 UTC
This is not fixed...

Comment 23 Stanislaw Gruszka 2012-05-03 08:19:43 UTC
I have not worked on it yet, but I saw that an Atheros developer posted atl1c patches to net-next that change various register programming code in atl1c, including ASPM. The kernel build below includes the atl1c driver update from net-next. Please check whether it solves the problem:

http://koji.fedoraproject.org/koji/taskinfo?taskID=4045082

Comment 24 Flos Lonicerae 2012-05-03 15:55:16 UTC
(In reply to comment #23)
> I did not worked on it yet, but saw that atheros developer post atl1c patches
> to net-next that changes various register programming code on alt1c, including
> ASPM. Below kernel build include atl1c driver update from net-next. Please
> check if that solve the problem:
> 
> http://koji.fedoraproject.org/koji/taskinfo?taskID=4045082

Hi Stanislaw,

I downloaded and installed the kernel you just built. It seems to have resolved the problem!!! :D
I have to go to bed now; I'll post the details of what I did later. Have a nice day!

Flos

Comment 25 Stanislaw Gruszka 2012-05-15 11:09:42 UTC
Created attachment 584639
atl1c_net_next_update-3.3.patch

atl1c update from net-next for 3.3 kernel

Comment 26 Stanislaw Gruszka 2012-05-15 11:11:54 UTC
Created attachment 584640
atl1c_net_next_update-3.4.patch

atl1c update from net-next for 3.4 kernel

Comment 27 Stanislaw Gruszka 2012-05-15 11:16:38 UTC
Josh, please apply the 3.3 patch above as the fix for this bug. I have also attached a patch for the 3.4 kernel in case Fedora updates to that kernel (it applies cleanly on 3.4-rc7).

This update consists mostly of register programming fixes from Atheros.

Comment 28 Josh Boyer 2012-05-15 12:05:15 UTC
Applied on all branches.  Thanks Stanislaw!

Comment 29 Fedora Update System 2012-05-17 13:45:30 UTC
kernel-3.3.6-3.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/kernel-3.3.6-3.fc16

Comment 30 Fedora Update System 2012-05-17 13:47:03 UTC
kernel-3.3.6-3.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/kernel-3.3.6-3.fc17

Comment 31 Fedora Update System 2012-05-17 22:56:13 UTC
Package kernel-3.3.6-3.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.3.6-3.fc17'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-7974/kernel-3.3.6-3.fc17
then log in and leave karma (feedback).

Comment 32 Fedora Update System 2012-05-22 02:21:54 UTC
kernel-3.3.6-3.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 33 Fedora Update System 2012-05-26 08:07:39 UTC
kernel-3.3.6-3.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.