Bug 538603

Summary: System locks up when connecting iwl3945 client to access point
Product: [Fedora] Fedora Reporter: Mark Rosenstand <rosenstand>
Component: kernelAssignee: Stanislaw Gruszka <sgruszka>
Status: CLOSED CURRENTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: low    
Version: 12CC: dougsland, gansalmon, itamar, kernel-maint, sgruszka
Target Milestone: ---Keywords: Triaged
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-01-05 03:26:06 EST Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:

Description Mark Rosenstand 2009-11-18 17:07:01 EST
Description of problem:
When connecting to a Linksys WRT160NL, the system immediately locks up hard with the caps lock key blinking.

Version-Release number of selected component (if applicable):
2.6.31.5-127.fc12.i686

How reproducible:
Every time :-)

Steps to Reproduce:
1. Choose ESSID in NetworkManager
2. Enter wpa-psk/wpa-psk2 password
3. No profit :-(

Actual results:
Caps lock blinks. Hard reset required. On subsequent boots NM will automagically try to connect to the network, meaning crash is guaranteed every time you boot unless you activate the kill switch or turn off your access point.

Expected results:
Wireless working or at least not crashing the system.

Additional info:
Relevant log from the access point:

Nov 17 04:29:22 yggdrasil daemon.info hostapd: wlan0: STA 00:19:d2:63:72:51
IEEE 802.11: authenticated
Nov 17 04:29:22 yggdrasil daemon.info hostapd: wlan0: STA 00:19:d2:63:72:51
IEEE 802.11: authenticated
Nov 17 04:29:22 yggdrasil daemon.info hostapd: wlan0: STA 00:19:d2:63:72:51
IEEE 802.11: authenticated
Nov 17 04:29:22 yggdrasil daemon.info hostapd: wlan0: STA 00:19:d2:63:72:51
IEEE 802.11: associated (aid 1)
Nov 17 04:29:22 yggdrasil daemon.info hostapd: wlan0: STA 00:19:d2:63:72:51
IEEE 802.11: associated (aid 1)
Nov 17 04:29:22 yggdrasil daemon.info hostapd: wlan0: STA 00:19:d2:63:72:51
IEEE 802.11: associated (aid 1)
Nov 17 04:29:22 yggdrasil daemon.info hostapd: wlan0: STA 00:19:d2:63:72:51
WPA: received EAPOL-Key 2/4 Pairwise with unexpected replay counter
Nov 17 04:29:22 yggdrasil daemon.info hostapd: wlan0: STA 00:19:d2:63:72:51
WPA: received EAPOL-Key 2/4 Pairwise with unexpected replay counter
Nov 17 04:29:22 yggdrasil daemon.info hostapd: wlan0: STA 00:19:d2:63:72:51
WPA: pairwise key handshake completed (RSN)
Nov 17 04:29:26 yggdrasil daemon.info dnsmasq-dhcp[1491]: DHCPDISCOVER(br-lan)
00:19:d2:63:72:51 
Nov 17 04:29:26 yggdrasil daemon.info dnsmasq-dhcp[1491]: DHCPOFFER(br-lan)
192.168.1.245 00:19:d2:63:72:51 
Nov 17 04:29:26 yggdrasil daemon.info dnsmasq-dhcp[1491]: DHCPDISCOVER(br-lan)
00:19:d2:63:72:51 
Nov 17 04:29:26 yggdrasil daemon.info dnsmasq-dhcp[1491]: DHCPOFFER(br-lan)
192.168.1.245 00:19:d2:63:72:51 
Nov 17 04:33:46 yggdrasil daemon.info hostapd: wlan0: STA 00:19:d2:63:72:51
IEEE 802.11: deauthenticated due to local deauth request

This works with the stock blob-rich firmware on said router, and while ath9k is experimental and there is a chance its behaviour is triggering this bug, I'm not sure it should ever be able to crash the system. My friend's iPhone is able to use the access point though.

There's also another guy in the OpenWrt forums who seem to experience this behaviour with iwl3945: https://forum.openwrt.org/viewtopic.php?pid=97484#p97484 and his post on the Fedora forums: http://forums.fedoraforum.org/showthread.php?t=233917
Comment 1 Stanislaw Gruszka 2009-11-19 03:46:24 EST
Mark,

I going to reproduce this but currently have no OpenWrt router (I really need to get one, because I'm working on other issues where logs from AP are needed, but as far am too busy setup OpenWrt).

Since I will not be able to reproduce, could you please run kdump and provide memory dump of kernel crash (by email or provide to download somewhere). Kdump setup can be done using system-config-kdump program (there is package with the same name). Here is more information for RHEL which should be also applicable for Fedora:

http://kbase.redhat.com/faq/docs/DOC-6039

To make image not so big, you can also use mem= (e.g. mem=1G) kernel boot parameter in /boot/grub/grub.conf. 

Thanks.
Comment 2 Mark Rosenstand 2009-11-21 21:40:29 EST
The kdump service asks me to supply the crashkernel parameter to the kernel, even though /proc/cmdline shows crashkernel=128M@16M.
Comment 3 Stanislaw Gruszka 2009-11-23 08:32:47 EST
system-config-kdump not work on F12 (bug  538850), what a disappointment . 

I setup kdump on my laptops using few tricks.

/etc/grub/grub.conf

title Fedora 12 (2.6.31.1-56.fc12.i686.PAE)
        root (hd0,0)
        kernel /vmlinuz-2.6.31.5-127.fc12.i686 ro nomodeset selinux=0 root=/dev/sda7 rhgb quiet crashkernel=256M
        initrd /initramfs-2.6.31.5-127.fc12.i686.img

nomodeset param is crucial as graphics card driver can crash kdump kernel :/

/etc/kdump.conf 

# Omited stuff
#raw /dev/sda5
#ext3 /dev/sda3
#ext3 LABEL=/boot
#ext3 UUID=03138356-5e61-4ab3-b58e-27507ac41937
#net my.server.com:/export/tmp
#net user@my.server.com
#core_collector makedumpfile -c
#link_delay 60
#kdump_post /var/crash/scripts/kdump-post.sh
#extra_bins /usr/bin/lftp
#extra_modules gfs2
path /var/crash
default shell

Setup kdump:

[root@greenpear ~]# /etc/init.d/kdump stop
Stopping kdump:                                            [  OK  ]
[root@greenpear ~]# /etc/init.d/kdump start
Detected /etc/kdump.conf or /boot/vmlinuz-2.6.31.5-127.fc12.i686 change
Rebuilding /boot/initrd-2.6.31.5-127.fc12.i686kdump.img
Warning!  Lack of dump target specification means default option is ignored!
Starting kdump:                                            [  OK  ]

Check if kdump kernel is installed:

[root@greenpear ~]# cat /sys/kernel/kexec_crash_loaded
1

Now another disappointment - iwl3945 driver have another bug which also crash kdump kernel, to workaround we need to blacklist module before triggering crash.
 
# echo "blacklist iwl3945" > /etc/modprobe.d/iwl3945.conf

This privent to load iwl3945 module automaticly, but you can still load it by hand by "modproble iwl3945" .

I hope this allow you to get crash dump. Please let me know results. Thank you for your cooperation.

[ I'm going to investigate iwl3945 crash on kdump, but that would be rather hard to fix :-{  ]
Comment 4 Mark Rosenstand 2009-11-24 05:05:16 EST
I'll test again later today.

I had a friend over yesterday who also owns a iwl3945 laptop, using some version of Ubuntu (IIRC it was 2.6.29-based) which also crashed in exactly the same way. However, later we installed Debian stable on his laptop (which is using a 2.6.26 kernel) which worked. So I'm pretty sure it's a kernel regression somewhere between .26 and .29.
Comment 5 Stanislaw Gruszka 2009-12-21 13:55:58 EST
Hi Mark 

I have WRT160NL on my desk. However I'm a bit scared to copy there a firmware. It is safe to copy openwrt image through firmware upgrade web interface provided by original Linksys software?

Is 
http://downloads.openwrt.org/snapshots/trunk/ar71xx/openwrt-ar71xx-wrt160nl-squashfs.bin
right image ? 

Thanks in advance.
Comment 6 Stanislaw Gruszka 2009-12-22 07:34:16 EST
Done it. I'm able to reproduce :)
Comment 7 Stanislaw Gruszka 2009-12-22 08:15:14 EST
Note to myself:

kernel BUG at net/mac80211/agg-rx.c:90!                                                                                                                              
invalid opcode: 0000 [#1] SMP                                                                                                                                        
last sysfs file: /sys/devices/virtual/backlight/acpi_video0/brightness                                                                                               
Modules linked in: sunrpc ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 cpufreq_ondemand acpi_cpufreq dm_multipath arc4 snd_hda_codec_analog ecb iwl3945 snd_hda_intel snd_hda_codec snd_hwdep thinkpad_acpi e1000e i2c_i801 iTCO_wdt snd_seq nsc_ircc iwlcore iTCO_vendor_support btusb snd_seq_device irda snd_pcm crc_ccitt snd_timer snd soundcore snd_page_alloc bluetooth mac80211 cfg80211 joydev rfkill yenta_socket rsrc_nonstatic i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: microcode]

Pid: 0, comm: swapper Not tainted (2.6.31.5-127.fc12.i686 #1) 6369CTO
EIP: 0060:[<f8175343>] EFLAGS: 00010246 CPU: 0
EIP is at ieee80211_sta_stop_rx_ba_session+0x1f/0x44 [mac80211]
EAX: f63001a0 EBX: 00000001 ECX: f84f552c EDX: f399dbc4
ESI: 00000000 EDI: 00000000 EBP: c0913d04 ESP: c0913cf8
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process swapper (pid: 0, ti=c0912000 task=c0940ba0 task.ti=c0912000)
Stack:
 f399da00 00000000 0000001e c0913d28 f8174515 00000001 00000000 c0913d2c
<0> f63001a0 f385b01c f6685340 0000001e c0913d50 f817d928 0000001e c06bc9ea
<0> f63305f0 c0913dd0 f63001a0 c0913dd0 f3828300 f385b01c c0913dbc f817e9fa
Call Trace:
 [<f8174515>] ? ieee80211_process_delba+0x39/0x71 [mac80211]
 [<f817d928>] ? ieee80211_rx_h_action+0x10c/0x37d [mac80211]
 [<c06bc9ea>] ? led_trigger_event+0x42/0x59
 [<f817e9fa>] ? ieee80211_invoke_rx_handlers+0xe61/0x10e7 [mac80211]
 [<c0450405>] ? getnstimeofday+0x57/0xe0
 [<c041d364>] ? hpet_legacy_next_event+0xf/0x11
 [<c045338e>] ? clockevents_program_event+0xbf/0xcd
 [<f817eebb>] ? __ieee80211_rx_handle_packet+0x23b/0x24d [mac80211]
 [<f817f5eb>] ? __ieee80211_rx+0x4c2/0x513 [mac80211]
 [<f83624d8>] ? iwl_enable_interrupts+0x74/0x7d [iwlcore]
 [<f8170cab>] ? ieee80211_tasklet_handler+0x69/0xf1 [mac80211]
 [<c043b095>] ? tasklet_action+0x68/0xad
 [<c043c042>] ? __do_softirq+0xb1/0x157
 [<c043c11e>] ? do_softirq+0x36/0x41
 [<c043c210>] ? irq_exit+0x2e/0x61
 [<c0404ed1>] ? do_IRQ+0x86/0x9a
 [<c0403c50>] ? common_interrupt+0x30/0x38
 [<c04400d8>] ? del_timer_sync+0x4/0x1d
 [<c0764792>] ? schedule+0x45/0x7b8
 [<c044e2f0>] ? sched_clock_idle_wakeup_event+0x16/0x18
 [<c04548bc>] ? tick_nohz_stop_idle+0x54/0x59
 [<c0454d3e>] ? tick_nohz_restart_sched_tick+0x31/0x11f
 [<c0402716>] ? cpu_idle+0xad/0xaf
 [<c0753f98>] ? rest_init+0x58/0x5a
 [<c09898b2>] ? start_kernel+0x32b/0x330
 [<c0989070>] ? i386_start_kernel+0x70/0x77
Code: 04 12 5f c8 8d 65 f4 5b 5e 5f 5d c3 55 89 e5 57 56 53 0f 1f 44 00 00 8b 40 74 8b 5d 08 8b 7d 0c 89 ce 8b 48 50 83 79 60 00 75 04 <0f> 0b eb fe e8 89 d1 ff ff 85 c0 74 10 0f b7 ff 0f b7 cb 0f b7
EIP: [<f8175343>] ieee80211_sta_stop_rx_ba_session+0x1f/0x44 [mac80211] SS:ESP 0068:c0913cf8
Comment 8 Stanislaw Gruszka 2009-12-22 08:21:40 EST
Bug is most likely fixed by:

commit 4253119acf412fd686ef4bd8749b5a4d70ea3a51
Author: Johannes Berg <johannes@sipsolutions.net>
Date:   Fri Nov 20 09:15:51 2009 +0100

    mac80211: fix two remote exploits

I'm going to check.
Comment 9 Stanislaw Gruszka 2009-12-23 10:29:54 EST
Confirmed patch fix the problem. Fix goes to vanilla 2.6.31.7 kernel, but this one is not used by fedora yet and I'm not sure when it start to be used. For now, can give rawhide kernel a try: 'yum --enablerepo="rawhide" update kernel'. Cheers.
Comment 10 Mark Rosenstand 2009-12-23 11:38:20 EST
Sorry for being unresponsive, I forgot about this and have been away for some days, so didn't get the notifications till now.

Looks like Christmas has come early this year :-)
Comment 11 Stanislaw Gruszka 2010-01-05 03:26:06 EST
Fedora 12 update to 2.6.31.9 kernel which include fix for that bug. Please reopen if somehow bug still occurs.