Description of problem:
Using iwlagn on my Thinkpad T61p I can reliably freeze the OS while using wifi. I never got anything in /var/log/messages even after multiple crashes, so I set up kdump and was able to capture multiple traces; they all look similar.

Version-Release number of selected component (if applicable):
2.6.35.10-74.fc14.x86_64

Steps to Reproduce:
1. Use wifi
2. Wait some random amount of time (usually less than an hour, sometimes it doesn't even take 5 minutes)
3. Profit, I mean crash

Actual results:
Kernel is dead frozen, the only "fix" is to reboot

Additional info:
Here's the first trace (this is dmesg saved by kdump):

<4>[ 1635.418366] iwlagn 0000:03:00.0: iwlagn_tx_agg_start on ra = c0:3f:0e:7a:90:34 tid = 0
<4>[ 1663.439932] iwlagn 0000:03:00.0: iwlagn_tx_agg_start on ra = c0:3f:0e:7a:90:34 tid = 0
<6>[ 1678.690234] SysRq : Trigger a crash
<1>[ 1678.690274] BUG: unable to handle kernel NULL pointer dereference at (null)
<1>[ 1678.690283] IP: [<ffffffff812ba89b>] sysrq_handle_crash+0x16/0x20
<4>[ 1678.690300] PGD 0
<0>[ 1678.690307] Oops: 0002 [#1] SMP
<0>[ 1678.690314] last sysfs file: /sys/devices/pci0000:00/0000:00:1c.1/0000:03:00.0/net/wlan0/statistics/collisions
<4>[ 1678.690323] CPU 0
<4>[ 1678.690327] Modules linked in: tcp_lp nfs lockd fscache nfs_acl auth_rpcgss fuse rfcomm sco bnep l2cap cryptd aes_x86_64 aes_generic sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf xt_physdev ipt_MASQUERADE iptable_nat nf_nat ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 kvm_intel kvm uinput arc4 ecb snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep iwlagn iwlcore mac80211 cfg80211 thinkpad_acpi snd_seq snd_seq_device btusb snd_pcm snd_timer e1000e snd i2c_i801 r852 sm_common nand nand_ids nand_ecc bluetooth mtd soundcore iTCO_wdt iTCO_vendor_support snd_page_alloc rfkill wmi microcode btrfs zlib_deflate libcrc32c sdhci_pci sdhci mmc_core firewire_ohci yenta_socket firewire_core crc_itu_t nouveau ttm drm_kms_helper drm i2c_algo_bit video output i2c_core [last unloaded: scsi_wait_scan]
<4>[ 1678.690487]
<4>[ 1678.690494] Pid: 0, comm: swapper Not tainted 2.6.35.10-74.fc14.x86_64 #1 6458V5C/6458V5C
<4>[ 1678.690501] RIP: 0010:[<ffffffff812ba89b>] [<ffffffff812ba89b>] sysrq_handle_crash+0x16/0x20
<4>[ 1678.690515] RSP: 0018:ffff88000a203950 EFLAGS: 00010082
<4>[ 1678.690521] RAX: 0000000000000010 RBX: 0000000000000063 RCX: 0000000000003f7b
<4>[ 1678.690528] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000063
<4>[ 1678.690534] RBP: ffff88000a203950 R08: 0000000000000001 R09: ffffffffffffffff
<4>[ 1678.690541] R10: ffff88000a203850 R11: 0000000000000000 R12: 0000000000000000
<4>[ 1678.690547] R13: ffffffff81a8b880 R14: 0000000000000007 R15: 0000000000000086
<4>[ 1678.690555] FS: 0000000000000000(0000) GS:ffff88000a200000(0000) knlGS:0000000000000000
<4>[ 1678.690563] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
<4>[ 1678.690569] CR2: 0000000000000000 CR3: 0000000001a42000 CR4: 00000000000006f0
<4>[ 1678.690576] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[ 1678.690583] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[ 1678.690591] Process swapper (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a4a020)
<0>[ 1678.690596] Stack:
<4>[ 1678.690600] ffff88000a2039a0 ffffffff812bae18 ffff88000a203970 ffffffff00000001
<4>[ 1678.690610] <0> ffff88000a2039c0 ffff8800378a2a80 0000000000000000 000000000000002e
<4>[ 1678.690621] <0> 0000000000000001 0000000000000001 ffff88000a2039b0 ffffffff812bafa5
<0>[ 1678.690634] Call Trace:
<0>[ 1678.690638] <IRQ>
<4>[ 1678.690650] [<ffffffff812bae18>] __handle_sysrq+0xab/0x14a
<4>[ 1678.690660] [<ffffffff812bafa5>] sysrq_filter+0x94/0x9c
<4>[ 1678.690670] [<ffffffff8135b7a8>] input_pass_event+0x8a/0xbd
<4>[ 1678.690680] [<ffffffff8135d763>] input_handle_event+0x3c6/0x3d5
<4>[ 1678.690689] [<ffffffff8135d864>] input_event+0x69/0x87
<4>[ 1678.690700] [<ffffffff81363c2b>] atkbd_interrupt+0x543/0x645
<4>[ 1678.690712] [<ffffffff8101057c>] ? native_sched_clock+0x35/0x37
<4>[ 1678.690723] [<ffffffff8106847b>] ? run_posix_cpu_timers+0x2a/0x5bb
<4>[ 1678.690734] [<ffffffff81052b08>] ? __raw_local_irq_save+0x1b/0x21
<4>[ 1678.690745] [<ffffffff81358f02>] serio_interrupt+0x45/0x7f
<4>[ 1678.690754] [<ffffffff81359c1d>] i8042_interrupt+0x288/0x29a
<4>[ 1678.690764] [<ffffffff8106dc85>] ? timekeeping_get_ns+0x1b/0x3d
<4>[ 1678.690774] [<ffffffff810a5ac9>] handle_IRQ_event+0x5a/0x11f
<4>[ 1678.690785] [<ffffffff810235d8>] ? ack_APIC_irq+0x15/0x17
<4>[ 1678.690794] [<ffffffff810a7d2b>] handle_edge_irq+0xe2/0x12a
<4>[ 1678.690803] [<ffffffff8100c2ea>] handle_irq+0x88/0x90
<4>[ 1678.690813] [<ffffffff8146fb44>] do_IRQ+0x5c/0xb4
<4>[ 1678.690823] [<ffffffff8146a093>] ret_from_intr+0x0/0x11
<4>[ 1678.690833] [<ffffffff8107802b>] ? raw_local_irq_restore+0xb/0x12
<4>[ 1678.690843] [<ffffffff81469c5f>] ? _raw_spin_unlock_irqrestore+0x17/0x19
<4>[ 1678.690854] [<ffffffff81059fcc>] ? try_to_del_timer_sync+0x77/0x85
<4>[ 1678.690863] [<ffffffff81059ff3>] ? del_timer_sync+0x19/0x26
<4>[ 1678.690892] [<ffffffffa0475e9b>] ? ___ieee80211_stop_tx_ba_session+0x3f/0xc9 [mac80211]
<4>[ 1678.690918] [<ffffffffa0475f76>] ? sta_addba_resp_timer_expired+0x51/0x62 [mac80211]
<4>[ 1678.690929] [<ffffffff81059e28>] ? run_timer_softirq+0x1d6/0x2a3
<4>[ 1678.690938] [<ffffffff81071690>] ? clockevents_program_event+0x8e/0x90
<4>[ 1678.690963] [<ffffffffa0475f25>] ? sta_addba_resp_timer_expired+0x0/0x62 [mac80211]
<4>[ 1678.690975] [<ffffffff81053a39>] ? __do_softirq+0xdd/0x199
<4>[ 1678.690984] [<ffffffff8100ca3a>] ? timer_interrupt+0x1e/0x25
<4>[ 1678.690994] [<ffffffff8100abdc>] ? call_softirq+0x1c/0x30
<4>[ 1678.691002] [<ffffffff8100c338>] ? do_softirq+0x46/0x82
<4>[ 1678.691006] [<ffffffff81053b99>] ? irq_exit+0x3b/0x7d
<4>[ 1678.691006] [<ffffffff8146fb85>] ? do_IRQ+0x9d/0xb4
<4>[ 1678.691006] [<ffffffff8146a093>] ? ret_from_intr+0x0/0x11
<0>[ 1678.691006] <EOI>
<4>[ 1678.691006] [<ffffffff8128f900>] ? raw_local_irq_enable+0x10/0x12
<4>[ 1678.691006] [<ffffffff8106b5d8>] ? sched_clock_idle_wakeup_event+0x17/0x1b
<4>[ 1678.691006] [<ffffffff8129076c>] ? acpi_idle_enter_bm+0x228/0x260
<4>[ 1678.691006] [<ffffffff81394201>] ? cpuidle_idle_call+0x8b/0xe9
<4>[ 1678.691006] [<ffffffff81008325>] ? cpu_idle+0xaa/0xcc
<4>[ 1678.691006] [<ffffffff81451906>] ? rest_init+0x8a/0x8c
<4>[ 1678.691006] [<ffffffff81ba1c49>] ? start_kernel+0x40b/0x416
<4>[ 1678.691006] [<ffffffff81ba12c6>] ? x86_64_start_reservations+0xb1/0xb5
<4>[ 1678.691006] [<ffffffff81ba13c2>] ? x86_64_start_kernel+0xf8/0x107
<0>[ 1678.691006] Code: e0 81 83 e2 03 8a 41 03 c1 e2 04 83 e0 cf 09 d0 88 41 03 c9 c3 55 48 89 e5 0f 1f 44 00 00 c7 05 34 0f a3 00 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 c9 c3 55 48 89 e5 0f 1f 44 00 00 8d 47
<1>[ 1678.691006] RIP [<ffffffff812ba89b>] sysrq_handle_crash+0x16/0x20
<4>[ 1678.691006] RSP <ffff88000a203950>
<0>[ 1678.691006] CR2: 0000000000000000

Here's a diff of the call trace of my second dump:

- [<ffffffffa0475e9b>] ? ___ieee80211_stop_tx_ba_session+0x3f/0xc9 [mac80211]
- [<ffffffffa0475f76>] ? sta_addba_resp_timer_expired+0x51/0x62 [mac80211]
+ [<ffffffffa039ce9b>] ? ___ieee80211_stop_tx_ba_session+0x3f/0xc9 [mac80211]
+ [<ffffffffa039cf76>] ? sta_addba_resp_timer_expired+0x51/0x62 [mac80211]
  [<ffffffff81059e28>] ? run_timer_softirq+0x1d6/0x2a3
  [<ffffffff81071690>] ? clockevents_program_event+0x8e/0x90
- [<ffffffffa0475f25>] ? sta_addba_resp_timer_expired+0x0/0x62 [mac80211]
+ [<ffffffffa039cf25>] ? sta_addba_resp_timer_expired+0x0/0x62 [mac80211]

Looking at past kernel bugs, I guess this could be it:

commit 44271488b91c9eecf249e075a1805dd887e222d2
Author: Johannes Berg <johannes.berg>
Date: Tue Oct 5 21:40:33 2010 +0200

    mac80211: delete AddBA response timer

    We never delete the addBA response timer, which is typically fine, but if the station it belongs to is deleted very quickly after starting the BA session, before the peer had a chance to reply, the timer may fire after the station struct has been freed already. Therefore, we need to delete the timer in a suitable spot -- best when the session is being stopped (which will happen even then) in which case the delete will be a no-op most of the time.

    I've reproduced the scenario and tested the fix.

    This fixes the crash reported at http://mid.gmane.org/4CAB6F96.6090701@candelatech.com

    Cc: stable
    Reported-by: Ben Greear <greearb>
    Signed-off-by: Johannes Berg <johannes.berg>
    Signed-off-by: John W. Linville <linville>
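The call-trace diff earlier in this comment differs only in the raw addresses of the mac80211 frames; the symbols and offsets are identical, presumably because the module was loaded at a different address on each boot. A minimal Python sketch of how two saved traces could be compared while ignoring those addresses follows. The helper names `normalize_frame` and `same_trace` are hypothetical, not part of kdump or any kernel tool, and the regular expressions assume the line layout shown in this report.

```python
import re

# Matches the "[<ffffffffa0475e9b>]" address field of a call-trace line.
ADDR_RE = re.compile(r"\[<[0-9a-f]+>\]\s*")
# Matches the "<4>[ 1678.690892]" log-level and timestamp prefix seen in the saved dmesg.
PREFIX_RE = re.compile(r"^<\d>\[\s*\d+\.\d+\]\s*")


def normalize_frame(line):
    """Drop the log prefix and raw address, keeping 'symbol+offset/length [module]'."""
    line = PREFIX_RE.sub("", line.strip())
    return ADDR_RE.sub("", line).strip()


def same_trace(a_lines, b_lines):
    """Return True when both traces list the same frames in the same order."""
    return [normalize_frame(l) for l in a_lines] == [normalize_frame(l) for l in b_lines]


# Frames copied from the two dumps in this report.
first = [
    "[<ffffffffa0475e9b>] ? ___ieee80211_stop_tx_ba_session+0x3f/0xc9 [mac80211]",
    "[<ffffffffa0475f76>] ? sta_addba_resp_timer_expired+0x51/0x62 [mac80211]",
]
second = [
    "[<ffffffffa039ce9b>] ? ___ieee80211_stop_tx_ba_session+0x3f/0xc9 [mac80211]",
    "[<ffffffffa039cf76>] ? sta_addba_resp_timer_expired+0x51/0x62 [mac80211]",
]

print(same_trace(first, second))
```

Comparing on symbol and offset rather than on address makes it easier to tell whether several dumps hit the same code path.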
Created attachment 472059 0001-mac80211-fix-addba_resp_timer-hard-lockup.patch

Thank you for the good bug report. Here is the proposed patch; let me know if it fixes the problem. A kernel build with the patch is here: http://koji.fedoraproject.org/koji/taskinfo?taskID=2704610
Any news on the above?
Hello, I've downloaded the kernel and installed it, but because I'm not in my usual wifi-crowded environment I can't say for sure that it works (though I think it will)... I should be able to make sure on Sunday night.
OK, it looks like we have a case of "it works for me": my laptop doesn't hang anymore. Thanks for the quick turnaround!
(In reply to comment #4)
> It looks like we have a case of "it works for me": my laptop doesn't hang
> anymore.

As you are the bug reporter, that means the patch fixes the bug :-)
Applied in the Fedora kernel: http://koji.fedoraproject.org/koji/buildinfo?buildID=213595
I also had many hard freezes on a Dell e4300 with the iwlagn driver for the Intel 5300 wlan card. I did a fresh install, but after 1 day the freeze reoccurred. I'm now testing the new 77 kernel; it looks good so far, and I will give further feedback.
kernel-2.6.35.11-83.fc14 has been submitted as an update for Fedora 14. https://admin.fedoraproject.org/updates/kernel-2.6.35.11-83.fc14
kernel-2.6.35.11-83.fc14 has been pushed to the Fedora 14 stable repository. If problems still persist, please make note of it in this bug report.
*** Bug 676196 has been marked as a duplicate of this bug. ***