Bug 592011

Summary: iwlagn hangs kernel since Fedora 13
Product: [Fedora] Fedora
Reporter: Andrew <asavva>
Component: kernel
Assignee: John W. Linville <linville>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: 13
CC: anton, bojan, dougsland, gansalmon, itamar, jonathan, kernel-maint, linville, reinette.chatre
Hardware: All
OS: Linux
Fixed In Version: kernel-2.6.33.5-112.fc13
Doc Type: Bug Fix
Clones: 595846
Last Closed: 2010-08-07 05:57:06 UTC
Bug Blocks: 595846

Description Andrew 2010-05-13 17:09:58 UTC
Description of problem:

I'm still seeing a lot of kernel oopses on my Fedora 13 machine with the latest
kernel (i.e. if I turn on wireless and wait less than 30 minutes, I'm pretty
much guaranteed to get a crash):

Linux loso 2.6.33.3-85.fc13.x86_64 #1 SMP Thu May 6 18:09:49 UTC 2010 x86_64
x86_64 x86_64 GNU/Linux

WARNING: at drivers/net/wireless/iwlwifi/iwl-scan.c:658
iwl_fill_probe_req+0x75/0x99 [iwlcore]()
Hardware name: VGN-SZ691N
Modules linked in: snd_seq_dummy vboxnetadp vboxnetflt vboxdrv aes_x86_64
aes_generic fuse rfcomm sco bridge stp llc bnep l2cap autofs4 coretemp sunrpc
cpufreq_ondemand acpi_cpufreq freq_table nf_conntrack_ipv6 ip6t_ipv6header
ip6t_REJECT ip6table_filter ip6_tables ipv6 uinput nvidia(P) snd_hda_codec_idt
snd_hda_intel arc4 snd_hda_codec ecb snd_hwdep uvcvideo snd_seq iwlagn
snd_seq_device iwlcore sony_laptop videodev snd_pcm btusb v4l1_compat snd_timer
v4l2_compat_ioctl32 bluetooth mac80211 iTCO_wdt tifm_7xx1 snd
iTCO_vendor_support tifm_core i2c_i801 joydev cfg80211 soundcore snd_page_alloc
rfkill sky2 microcode usb_storage firewire_ohci firewire_core crc_itu_t
yenta_socket rsrc_nonstatic nouveau ttm drm_kms_helper drm i2c_algo_bit video
output i2c_core [last unloaded: vboxdrv]
Pid: 880, comm: iwlagn Tainted: P        W  2.6.33.3-85.fc13.x86_64 #1
Call Trace:
[<ffffffff8104b558>] warn_slowpath_common+0x77/0x8f
[<ffffffff8104b57f>] warn_slowpath_null+0xf/0x11
[<ffffffffa0239690>] iwl_fill_probe_req+0x75/0x99 [iwlcore]
[<ffffffffa023a721>] iwl_bg_request_scan+0x97a/0x1081 [iwlcore]
[<ffffffffa02227aa>] ? iwl_set_tx_power+0xe2/0x11d [iwlcore]
[<ffffffff81060d3d>] worker_thread+0x1a4/0x232
[<ffffffffa0239da7>] ? iwl_bg_request_scan+0x0/0x1081 [iwlcore]
[<ffffffff81064817>] ? autoremove_wake_function+0x0/0x34
[<ffffffff81060b99>] ? worker_thread+0x0/0x232
[<ffffffff810643c7>] kthread+0x7a/0x82
[<ffffffff8100a924>] kernel_thread_helper+0x4/0x10
[<ffffffff8106434d>] ? kthread+0x0/0x82
[<ffffffff8100a920>] ? kernel_thread_helper+0x0/0x10

Here is another report I got:
general protection fault: 0000 [#1] SMP 
last sysfs file: /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:38/PNP0C09:00/PNP0C0A:00/power_supply/BAT1/energy_full
CPU 1 
Pid: 884, comm: iwlagn Tainted: P           2.6.33.3-85.fc13.x86_64 #1 VAIO                            /VGN-SZ691N
RIP: 0010:[<ffffffffa0220a12>]  [<ffffffffa0220a12>] iwl_bg_request_scan+0xc6b/0x1081 [iwlcore]
RSP: 0018:ffff880139885d30  EFLAGS: 00010293
RAX: ffff8800afed4400 RBX: ffff8801381b92c0 RCX: 00000000000000c0
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000000c1
RBP: ffff880139885e30 R08: ffff8801386f1b16 R09: 00000000ffffffff
R10: 000000008ce81300 R11: 0000000000000000 R12: ffff8801381b11e0
R13: 0074006500440065 R14: ffff8801386f1b16 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff880005900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fef0b531000 CR3: 0000000001a3b000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process iwlagn (pid: 884, threadinfo ffff880139884000, task ffff880139970000)
Stack:
0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000200 000000000000000a 0000000000000035
 ffff8801386f1800 000000c1000000c0 ffff880139885fd8 0000000000000000
Call Trace:
[<ffffffff81060d3d>] worker_thread+0x1a4/0x232
[<ffffffffa021fda7>] ? iwl_bg_request_scan+0x0/0x1081 [iwlcore]
[<ffffffff81064817>] ? autoremove_wake_function+0x0/0x34
[<ffffffff81060b99>] ? worker_thread+0x0/0x232
[<ffffffff810643c7>] kthread+0x7a/0x82
[<ffffffff8100a924>] kernel_thread_helper+0x4/0x10
[<ffffffff8106434d>] ? kthread+0x0/0x82
[<ffffffff8100a920>] ? kernel_thread_helper+0x0/0x10
Code: 65 48 8b 04 25 08 cc 00 00 89 8d 48 ff ff ff 48 89 85 68 ff ff ff 48 89 85 50 ff ff ff e9 36 02 00 00 48 63 55 a0 4c 8b 6c d0 38 <45> 39 7d 00 0f 85 20 02 00 00 41 0f b7 7d 04 e8 a2 fb ef ff 44 
RIP  [<ffffffffa0220a12>] iwl_bg_request_scan+0xc6b/0x1081 [iwlcore]
RSP <ffff880139885d30>

Both seem to originate in the iwl_bg_request_scan() function.

I've noticed on some other threads that using a Cisco router seems to
trigger the bug. I am using a Cisco router, and I haven't had the crash when
I've been using other modems.

http://www.gossamer-threads.com/lists/linux/kernel/1221699

This previously worked perfectly on Fedora 10, 11 and 12.

Version-Release number of selected component (if applicable):
Fedora 13
Kernel: 2.6.33.3-85.fc13.x86_64

How reproducible:
Reboot the machine and wait 10-15 minutes while using a Cisco router. I've never seen the crash on other wireless routers with the same computer.

Steps to Reproduce:
1. Reboot
2. Use the internet (light or heavy usage, it doesn't matter)
3. Network stops
4. Disconnect network in NetworkManager
5. Reconnect network in NetworkManager
6. Internet comes back for a while
7. Kernel oops report generated
8. A few minutes later the machine either hangs or goes through steps 3-6 again and then hangs (Caps Lock and Num Lock flashing together, machine completely unresponsive).
  
Actual results:
Kernel oops / hang

Expected results:
The internet connection should keep working

Additional info:

Comment 1 John W. Linville 2010-05-13 17:43:50 UTC
iwl_fill_probe_req:

        if (WARN_ON(left < ie_len))
                return len;

But still, it is a WARN_ON -- it shouldn't hang the box.  Maybe it is indicative of some other failure?  Hopefully the Intel team can shed some light.
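The check quoted above follows a common bounds-checking pattern: before appending optional data to a frame, compare the space left against the data's length, and if it does not fit, warn and return the length unchanged instead of overrunning the buffer. A minimal userspace C sketch of that pattern follows. `append_extra_ies()` and the `WARN_ON_SKETCH()` macro are hypothetical stand-ins, not the iwlcore code, and the real function builds a complete 802.11 probe request rather than copying into a bare byte buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace stand-in for the kernel's WARN_ON(): print a
 * warning and evaluate to the (true/false) condition, but keep running. */
#define WARN_ON_SKETCH(cond)                                              \
    ((cond) ? (fprintf(stderr, "WARNING: %s:%d: %s\n", __FILE__,          \
                       __LINE__, #cond), 1)                               \
            : 0)

/*
 * Append ie_len bytes of extra information elements (IEs) to a probe
 * request being built in buf.
 *   len  - bytes of buf already used by the frame so far
 *   left - bytes of buf still free
 * Returns the new frame length. If the IEs do not fit, warn and return len
 * unchanged, so the frame goes out without them instead of overrunning buf.
 */
static size_t append_extra_ies(unsigned char *buf, size_t len, size_t left,
                               const unsigned char *ie, size_t ie_len)
{
    assert(buf != NULL && ie != NULL);

    if (WARN_ON_SKETCH(left < ie_len))
        return len;

    memcpy(buf + len, ie, ie_len);
    return len + ie_len;
}
```

For a 16-byte buffer with 12 bytes already used, appending a 6-byte IE warns and returns 12 without touching the buffer; with only 4 bytes used it fits and returns 10. In this sketch the warning path only skips the extra data, which matches the point above that the WARN_ON by itself should not hang the machine.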

Comment 2 reinette chatre 2010-05-13 22:59:13 UTC
What hardware is this?

This kernel seems significantly different from 2.6.33.3. Could you please guide me on how to obtain the sources of this kernel? I think I asked for this help before but forgot how to do it. I'm sorry, and I will make sure to post what you send next somewhere where I will always find it.

Comment 3 Bojan Smojver 2010-05-14 04:14:05 UTC
(In reply to comment #2)
> Could you please guide
> me on how to obtain the sources of this kernel?

You can get source RPM from here:

http://koji.fedoraproject.org/koji/buildinfo?buildID=172010

All Fedora kernels are here:

http://koji.fedoraproject.org/koji/packageinfo?packageID=8

P.S. I was just passing by and saw your question.

Comment 4 Andrew 2010-05-14 06:02:30 UTC
(In reply to comment #2)
> What hardware is this?

It's a Sony Vaio VGN-SZ691N. Here are the iwlagn specific log entries:

iwlagn: Intel(R) Wireless WiFi Link AGN driver for Linux, 2.6.33.3-85.fc13.x86_64-kds
iwlagn: Copyright(c) 2003-2009 Intel Corporation
iwlagn 0000:06:00.0: power state changed by ACPI to D0
iwlagn 0000:06:00.0: power state changed by ACPI to D0
iwlagn 0000:06:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
iwlagn 0000:06:00.0: setting latency timer to 64
iwlagn 0000:06:00.0: Detected Intel Wireless WiFi Link 4965AGN REV=0x4
iwlagn 0000:06:00.0: Tunable channels: 11 802.11bg, 13 802.11a channels
iwlagn 0000:06:00.0: irq 30 for MSI/MSI-X
iwlagn 0000:06:00.0: firmware: requesting iwlwifi-4965-2.ucode
iwlagn 0000:06:00.0: loaded firmware version 228.61.2.24
iwlagn 0000:06:00.0: iwl_tx_agg_start on ra = 00:22:6b:f8:63:46 tid = 0
iwlagn 0000:06:00.0: iwl_tx_agg_start on ra = 00:22:6b:f8:63:46 tid = 0
iwlagn 0000:06:00.0: power state changed by ACPI to D3
iwlagn 0000:06:00.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
iwlagn 0000:06:00.0: restoring config space at offset 0x4 (was 0x4, writing 0xf8000004)
iwlagn 0000:06:00.0: restoring config space at offset 0x3 (was 0x0, writing 0x10)
iwlagn 0000:06:00.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100006)
iwlagn 0000:06:00.0: power state changed by ACPI to D0
iwlagn 0000:06:00.0: power state changed by ACPI to D0
iwlagn 0000:06:00.0: power state changed by ACPI to D0
iwlagn 0000:06:00.0: power state changed by ACPI to D0

> 
> This kernel seems significantly different from 2.6.33.3. Could you please guide
> me on how to obtain the sources of this kernel? I think I asked for this help
> before but forgot how to do it, I'm sorry and will make sure to post what you
> send next somewhere where I will always get it.

Comment 5 John W. Linville 2010-05-17 15:03:33 UTC
The info from comment 3 is correct.  FWIW, this kernel has the patches which were recently discussed on the stable list.

Comment 6 reinette chatre 2010-05-18 04:42:15 UTC
(In reply to comment #5)
> The info from comment 3 is correct.  FWIW, this kernel has the patches which
> were recently discussed on the stable list.    

Any chance there are instructions out there to get the kernel source if you are not running Fedora?

In the meantime, since this kernel has the RF reset code, I think we are looking at another incarnation of an internal scan race here. If you support internal scanning (RF reset), then you really need the recent scan race fixes that Johannes and I sent upstream.
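A scan race of this kind can occur when two paths, for example scans requested through mac80211 and the driver's own internal scans for RF reset, can each decide independently that no scan is running and then both start one. One common way to close such a race is an atomic test-and-set of a "scanning" bit, so only one requester wins. The C11 sketch below illustrates that general idea only. It is not the content of either upstream patch mentioned here, and the `sketch_*` names, `SKETCH_SCANNING` bit and `scan_kind` enum are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical status bit, loosely modelled on a driver's status word. */
#define SKETCH_SCANNING 0x1u

enum scan_kind { SCAN_EXTERNAL, SCAN_INTERNAL };

struct sketch_dev {
    atomic_uint status;        /* bit set of SKETCH_* flags */
    unsigned external_scans;   /* scans started on behalf of the stack */
    unsigned internal_scans;   /* scans started by the driver itself */
};

static void sketch_dev_init(struct sketch_dev *dev)
{
    atomic_init(&dev->status, 0u);
    dev->external_scans = 0;
    dev->internal_scans = 0;
}

/*
 * Try to start a scan. atomic_fetch_or() sets the bit and returns the old
 * value in one step, so two requesters can never both see "idle" and both
 * start. Returns true only for the caller that now owns the scan.
 */
static bool sketch_start_scan(struct sketch_dev *dev, enum scan_kind kind)
{
    unsigned old = atomic_fetch_or(&dev->status, SKETCH_SCANNING);

    if (old & SKETCH_SCANNING)
        return false;   /* a scan is already running: back off */

    if (kind == SCAN_INTERNAL)
        dev->internal_scans++;
    else
        dev->external_scans++;
    return true;
}

/* Called by the scan owner when its scan completes or is aborted. */
static void sketch_scan_done(struct sketch_dev *dev)
{
    unsigned old = atomic_fetch_and(&dev->status, ~SKETCH_SCANNING);

    assert(old & SKETCH_SCANNING);  /* finishing a scan nobody started is a bug */
    (void)old;                      /* silence unused warning under NDEBUG */
}
```

Here an internal (RF reset) request made while an external scan is in flight simply returns false. A real driver would also have to decide whether to retry, queue, or drop that request, which this sketch deliberately leaves out.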

Johannes's patch made it to linux-2.6:

commit 88be026490ed89c2ffead81a52531fbac5507e01
Author: Johannes Berg <johannes.berg>
Date:   Wed Apr 7 00:21:36 2010 -0700

    iwlwifi: fix scan races


Mine didn't; it can be found on iwlwifi-2.6's wireless-2.6 branch and was submitted at: http://thread.gmane.org/gmane.linux.kernel.wireless.general/50897/focus=50899

Since you need these two anyway, is there any chance you could build a kernel with them and retest?

Comment 7 John W. Linville 2010-05-18 18:30:45 UTC
http://koji.fedoraproject.org/koji/taskinfo?taskID=2194306

Test kernel above has the patches Reinette recommended in comment 6.  Please give them a try and post the results here -- thanks!

Comment 8 John W. Linville 2010-05-18 18:34:19 UTC
Reinette, as for getting the sources w/o running Fedora...that could be difficult.  Perhaps your local distro has rpm available?  If so, then rpmbuild may still be the right tool to use.  Otherwise, rpm2cpio (piped to cpio) can extract everything.  But you would still need to unpack the tarball and apply any patches in the proper order.  Maybe a virtual host running a Fedora image would be easier? :-)

Comment 9 Andrew 2010-05-18 19:58:54 UTC
(In reply to comment #7)
> http://koji.fedoraproject.org/koji/taskinfo?taskID=2194306
> 
> Test kernel above has the patches Reinette recommended in comment 6.  Please
> give them a try and post the results here -- thanks!    

Cheers, I've installed and am running the test kernel now. I'm going to try downloading a large file (fedora ISO probably) and see how that fares. I'll report back tomorrow morning to say how it's going.

Thanks!

Comment 10 Andrew 2010-05-19 17:19:25 UTC
Yes, I can confirm the patch seems to be holding. I've been using it for over 24 hours with no crash.

Many thanks!