Bug 642031 - rt2500 module crashes
Summary: rt2500 module crashes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 13
Hardware: x86_64
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Stanislaw Gruszka
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-10-11 20:11 UTC by Ian Malone
Modified: 2011-02-05 21:01 UTC

Fixed In Version: kernel-2.6.34.7-62.fc13
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-02-05 21:01:25 UTC
Type: ---
Embargoed:



Description Ian Malone 2010-10-11 20:11:05 UTC
Description of problem:

After running for a while (possibly after periods of inactivity), the wireless connection is lost and NetworkManager repeatedly prompts for the encryption password. After stopping NetworkManager and reloading the rt2500pci module (modprobe -r, then modprobe), this crash occurs. Further attempts to remove or load the module hang, as do attempts to restart NM or run ifconfig wlan0 down. Only rebooting seems to bring the connection back, and the shutdown itself hangs, requiring the reset button.

Version-Release number of selected component (if applicable):
kernel-2.6.34.7-56.fc13.x86_64

How reproducible:
Seems to inevitably happen

Steps to Reproduce:
1. Boot, login, connect to wireless (54g, WPA).
2. Wait until connection is lost.
3. Steps as above (remove module, re-load).
  
Actual results:
Driver blows up.

Expected results:
Driver doesn't blow up.

Additional info:
kerneloops uploaded, but I don't know how to link to or include it here.
Package:    	kernel
Latest Crash:	Sat 09 Oct 2010 00:59:47 
Command:    	not_applicable
Reason:     	BUG: unable to handle kernel NULL pointer dereference at (null)
Comment:    	Sometimes this results in the network coming back up, sometimes it results in this crash.
Bug Reports:	Kernel oops report was uploaded

Comment 1 Stanislaw Gruszka 2010-10-12 18:22:43 UTC
(In reply to comment #0)
> Additional info:
> kerneloops uploaded, but I don't know how to link to or include it here.

Please attach dmesg or part of /var/log/messages where the problem is seen.

Comment 2 Ian Malone 2010-10-13 00:02:54 UTC
Hi,

It's hard to capture the module failure in dmesg as it gets swamped by later messages. Here's what it looks like after connection is lost (NM trying to resume):
phy0 -> rt2500pci_set_device_state: Error - Device failed to enter state 3 (-16).
<!--lots of those ^ -->
phy0 -> rt2500pci_set_device_state: Error - Device failed to enter state 3 (-16).
No probe response from AP 00:1d:68:e7:7a:05 after 500ms, disconnecting.
cfg80211: Calling CRDA to update world regulatory domain
cfg80211: Calling CRDA for country: GB
cfg80211: Regulatory domain changed to country: GB
    (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
    (2402000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm)
    (5170000 KHz - 5250000 KHz @ 40000 KHz), (N/A, 2000 mBm)
    (5250000 KHz - 5330000 KHz @ 40000 KHz), (N/A, 2000 mBm)
    (5490000 KHz - 5710000 KHz @ 40000 KHz), (N/A, 2700 mBm)
wlan0: authenticate with 00:1d:68:e7:7a:05 (try 1)
wlan0: authenticate with 00:1d:68:e7:7a:05 (try 2)
wlan0: authenticate with 00:1d:68:e7:7a:05 (try 3)
wlan0: authentication with 00:1d:68:e7:7a:05 timed out

Stop NM and bring the module down with modprobe -r rt2500pci:
rt2500pci 0000:02:08.0: PCI INT A disabled

Bring it back up with modprobe rt2500pci:
cfg80211: Calling CRDA to update world regulatory domain
cfg80211: World regulatory domain updated:
    (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
    (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
    (2457000 KHz - 2482000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
    (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
    (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
    (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
rt2500pci 0000:02:08.0: PCI INT A -> Link[APC1] -> GSI 16 (level, low) -> IRQ 16
phy0: Selected rate control algorithm 'minstrel'
Registered led device: rt2500pci-phy0::radio
Registered led device: rt2500pci-phy0::quality
cfg80211: Calling CRDA for country: GB
cfg80211: Regulatory domain changed to country: GB
    (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
    (2402000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm)
    (5170000 KHz - 5250000 KHz @ 40000 KHz), (N/A, 2000 mBm)
    (5250000 KHz - 5330000 KHz @ 40000 KHz), (N/A, 2000 mBm)
    (5490000 KHz - 5710000 KHz @ 40000 KHz), (N/A, 2700 mBm)

Start NM:
phy0 -> rt2500pci_set_device_state: Error - Device failed to enter state 3 (-16).
phy0 -> rt2500pci_set_device_state: Error - Device failed to enter state 4 (-5).
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffa024fb0a>] rt2x00queue_init_queues+0x37/0x85 [rt2x00lib]
PGD 79f8c067 PUD 79e5a067 PMD 0 
Oops: 0002 [#1] SMP 
last sysfs file: /sys/devices/pci0000:00/0000:00:14.0/net/eth0/ifindex
CPU 0 
Modules linked in: rt2500pci rt2x00pci rt2x00lib mac80211 cfg80211 eeprom_93cx6 snd_seq_dummy nls_utf8 fuse sunrpc cpufreq_ondemand powernow_k8 freq_table ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 vfat fat uinput nouveau snd_ice1724 ttm snd_rawmidi arc4 snd_ice17xx_ak4xxx drm_kms_helper snd_ac97_codec ecb drm ac97_bus snd_ak4xxx_adda snd_ak4114 snd_pt2258 snd_i2c snd_ak4113 snd_seq snd_seq_device i2c_nforce2 snd_pcm i2c_algo_bit video snd_timer rfkill snd output i2c_core soundcore edac_core edac_mce_amd forcedeth snd_page_alloc k8temp ppdev parport_pc parport asus_atk0110 joydev xpad microcode ata_generic pata_acpi pata_amd sata_nv [last unloaded: eeprom_93cx6]

Pid: 1459, comm: wpa_supplicant Not tainted 2.6.34.7-56.fc13.x86_64 #1 M2NPV-MX/System Product Name
RIP: 0010:[<ffffffffa024fb0a>]  [<ffffffffa024fb0a>] rt2x00queue_init_queues+0x37/0x85 [rt2x00lib]
RSP: 0018:ffff880037fd1ba8  EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff8800377d2c00 RCX: 0000000000000000
RDX: ffff8800377d2c14 RSI: 0000000000000246 RDI: 0000000000000000
RBP: ffff880037fd1bc8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000113 R11: 0000000000001000 R12: ffff88007a119340
R13: 0000000000000001 R14: ffff8800377d2c38 R15: ffff880037fd1c58
FS:  00007f85cdd657c0(0000) GS:ffff880002000000(0000) knlGS:00000000f7761860
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000007a254000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process wpa_supplicant (pid: 1459, threadinfo ffff880037fd0000, task ffff880037a78000)
Stack:
 ffff88007a119340 0000000000000000 ffff880037b00000 0000000000000000
<0> ffff880037fd1be8 ffffffffa024e7c6 ffff88007a119340 ffff880037b006c0
<0> ffff880037fd1c18 ffffffffa024e8cb ffff880037fd1c38 ffffffffa0457548
Call Trace:
 [<ffffffffa024e7c6>] rt2x00lib_stop+0x68/0xd5 [rt2x00lib]
 [<ffffffffa024e8cb>] rt2x00lib_start+0x98/0xbb [rt2x00lib]
 [<ffffffffa0457548>] ? cfg80211_netdev_notifier_call+0x3f9/0x412 [cfg80211]
 [<ffffffffa024f363>] rt2x00mac_start+0x1d/0x1f [rt2x00lib]
 [<ffffffffa0486b34>] ieee80211_open+0x288/0x5fa [mac80211]
 [<ffffffff813a7146>] __dev_open+0x8e/0xbc
 [<ffffffff813a4ee3>] __dev_change_flags+0xbe/0x141
 [<ffffffff813a7082>] dev_change_flags+0x21/0x57
 [<ffffffff813fd05c>] devinet_ioctl+0x29a/0x54a
 [<ffffffff813a89fb>] ? dev_ioctl+0x4d8/0x67a
 [<ffffffff813fe32d>] inet_ioctl+0x8f/0xa7
 [<ffffffff813947e4>] sock_do_ioctl+0x29/0x48
 [<ffffffff81394c27>] sock_ioctl+0x20d/0x21c
 [<ffffffff8111aabf>] vfs_ioctl+0x32/0xa6
 [<ffffffff8111b032>] do_vfs_ioctl+0x483/0x4c9
 [<ffffffff8111b0ce>] sys_ioctl+0x56/0x79
 [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
Code: 1f 44 00 00 48 8b 9f 50 04 00 00 49 89 fc eb 3f 48 89 df 45 31 ed e8 8c ff ff ff eb 26 44 89 ef 48 8b 43 08 41 ff c5 48 6b ff 28 <48> c7 04 38 00 00 00 00 49 8b 44 24 08 48 03 7b 08 48 8b 40 40 
RIP  [<ffffffffa024fb0a>] rt2x00queue_init_queues+0x37/0x85 [rt2x00lib]
 RSP <ffff880037fd1ba8>
CR2: 0000000000000000
---[ end trace cd1a0743111a592e ]---

I've tried removing more modules: all the rt2x and mac80211 (but not cfg80211 yet) before resuming and I think the result was the same then, but would have to test again.
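
The numbers in the log above can be decoded mechanically. What follows is only a minimal sketch, assuming the standard Linux errno values and the x86 page-fault error code layout; the names decode_oops_code and errno_name are made up for illustration and are not part of the driver. Under those assumptions, (-16) is -EBUSY, (-5) is -EIO, and "Oops: 0002" describes a kernel-mode write to a page that is not present, which fits CR2 being 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* x86 page-fault error code bits, as printed after "Oops:". */
#define PF_PROT  0x1u /* 0: page not present, 1: protection violation */
#define PF_WRITE 0x2u /* 0: read access,      1: write access */
#define PF_USER  0x4u /* 0: kernel mode,      1: user mode */

/* Hypothetical helper type: the three low bits split into flags. */
struct oops_info {
    int page_present;
    int is_write;
    int from_user;
};

/* Split the hexadecimal value printed after "Oops:" into its flags. */
struct oops_info decode_oops_code(unsigned int code)
{
    struct oops_info info;

    info.page_present = (code & PF_PROT) != 0;
    info.is_write = (code & PF_WRITE) != 0;
    info.from_user = (code & PF_USER) != 0;
    return info;
}

/* Map the negative value in the driver message to an errno name.
 * Only the two values seen in this report are handled. */
const char *errno_name(int neg_errno)
{
    assert(neg_errno <= 0);

    switch (-neg_errno) {
    case EBUSY:
        return "EBUSY";
    case EIO:
        return "EIO";
    default:
        return "unknown";
    }
}

/* Print a one-line summary of an oops error code. */
void print_oops_code(unsigned int code)
{
    struct oops_info info = decode_oops_code(code);

    printf("%s, %s, %s mode\n",
           info.page_present ? "protection violation" : "page not present",
           info.is_write ? "write" : "read",
           info.from_user ? "user" : "kernel");
}
```

Calling print_oops_code(0x0002) and errno_name(-16) from a small main() is enough to try it out.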

Comment 3 Stanislaw Gruszka 2010-10-13 09:05:51 UTC
(In reply to comment #2)
> messages. Here's what it looks like after connection is lost (NM trying to
> resume):
> phy0 -> rt2500pci_set_device_state: Error - Device failed to enter state 3
> (-16).
> <!--lots of those ^ -->
> phy0 -> rt2500pci_set_device_state: Error - Device failed to enter state 3
> (-16).

These errors should go away when you use the command below:
iwconfig wlan0 power off
You can add that command to startup scripts or udev rules.

> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: [<ffffffffa024fb0a>] rt2x00queue_init_queues+0x37/0x85 [rt2x00lib]
Ehh, nasty ...

Comment 4 Stanislaw Gruszka 2010-10-13 10:31:55 UTC
Reading symbols from /usr/lib/debug/lib/modules/2.6.34.7-56.fc13.x86_64/kernel/drivers/net/wireless/rt2x00/rt2x00lib.ko.debug...done.
(gdb) l *(rt2x00queue_init_queues+0x37)
0x1b2e is in rt2x00queue_init_queues (drivers/net/wireless/rt2x00/rt2x00queue.c:720).
715	
716		queue_for_each(rt2x00dev, queue) {
717			rt2x00queue_reset(queue);
718	
719			for (i = 0; i < queue->limit; i++) {
720				queue->entries[i].flags = 0;
721	
722				rt2x00dev->ops->lib->clear_entry(&queue->entries[i]);
723			}
724		}
(gdb)
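
For readers following along: in the oops from comment 2, RAX and RDI are both 0 and CR2 is 0. My reading of the Code: bytes is that the faulting instruction stores zero at RAX + RDI, so the assignment on line 720 appears to have run with queue->entries == NULL and i == 0. The user-space sketch below only models that situation. The struct and function names are hypothetical stand-ins, not the driver's real definitions, and the added check is an illustration of the failure mode, not the fix that was applied for this bug.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a queue entry; only the flags field matters. */
struct entry_model {
    unsigned long flags;
};

/* Hypothetical stand-in for the queue. */
struct queue_model {
    unsigned int limit;          /* number of entries the queue should hold */
    struct entry_model *entries; /* NULL in the reported oops (RAX = 0) */
};

/*
 * Mirrors the inner loop at rt2x00queue.c:719-723, with one added check.
 * Returns 0 on success, -1 if the entries array is missing.
 */
int init_queue_checked(struct queue_model *queue)
{
    unsigned int i;

    assert(queue != NULL);

    /* Without this check the first iteration writes to address 0. */
    if (queue->entries == NULL && queue->limit > 0)
        return -1;

    for (i = 0; i < queue->limit; i++)
        queue->entries[i].flags = 0;

    return 0;
}
```

A caller that receives -1 here knows the queue was never allocated (or was already freed) and can bail out instead of touching the entries.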

Comment 5 Stanislaw Gruszka 2010-10-13 11:13:10 UTC
commit 9655a6ec19ca656af246fb80817aa337892aefbf
Author: Gertjan van Wingerde <gwingerde>
Date:   Thu May 13 21:16:03 2010 +0200

    rt2x00: Fix failed SLEEP->AWAKE and AWAKE->SLEEP transitions.

should resolve this problem (see: http://marc.info/?l=linux-wireless&m=126997977510309&w=2). I'm going to prepare a test kernel.

Comment 6 Stanislaw Gruszka 2010-10-13 11:31:26 UTC
Please check whether this kernel (currently compiling) fixes the problem:
http://koji.fedoraproject.org/koji/taskinfo?taskID=2532153

Comment 7 Ian Malone 2010-10-13 22:29:41 UTC
Thanks for the quick response. Unfortunately the problem seems to persist; the crash looks similar to before:

phy0 -> rt2500pci_set_device_state: Error - Device failed to enter state 3 (-16).
phy0 -> rt2500pci_set_device_state: Error - Device failed to enter state 4 (-5).
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffa035fb0a>] rt2x00queue_init_queues+0x37/0x85 [rt2x00lib]
PGD 79d5c067 PUD 79d5d067 PMD 0 
Oops: 0002 [#1] SMP 
last sysfs file: /sys/devices/pci0000:00/0000:00:14.0/net/eth0/uevent
CPU 0 
Modules linked in: rt2500pci rt2x00pci rt2x00lib mac80211 cfg80211 eeprom_93cx6 nls_utf8 fuse sunrpc cpufreq_ondemand powernow_k8 freq_table ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 vfat fat uinput snd_ice1724 snd_rawmidi snd_ice17xx_ak4xxx snd_ac97_codec arc4 ecb ac97_bus snd_ak4xxx_adda snd_ak4114 snd_pt2258 snd_i2c snd_ak4113 snd_seq snd_seq_device snd_pcm nouveau ttm drm_kms_helper edac_core drm i2c_algo_bit snd_timer video snd soundcore rfkill joydev k8temp asus_atk0110 edac_mce_amd microcode i2c_nforce2 xpad snd_page_alloc output ppdev parport_pc forcedeth i2c_core parport ata_generic pata_acpi pata_amd sata_nv [last unloaded: eeprom_93cx6]

Pid: 1430, comm: wpa_supplicant Not tainted 2.6.34.7-59.bz642031.fc13.x86_64 #1 M2NPV-MX/System Product Name
RIP: 0010:[<ffffffffa035fb0a>]  [<ffffffffa035fb0a>] rt2x00queue_init_queues+0x37/0x85 [rt2x00lib]
RSP: 0018:ffff880037bd9ba8  EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff8800251b7200 RCX: 0000000000000000
RDX: ffff8800251b7214 RSI: 0000000000000246 RDI: 0000000000000000
RBP: ffff880037bd9bc8 R08: 0000000000000000 R09: 0000000000000000
R10: 000000000000010f R11: 0000000000001000 R12: ffff88007c25f340
R13: 0000000000000001 R14: ffff8800251b7238 R15: ffff880037bd9c58
FS:  00007f1b709327c0(0000) GS:ffff880002000000(0000) knlGS:00000000f77ae860
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000007c0d7000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process wpa_supplicant (pid: 1430, threadinfo ffff880037bd8000, task ffff8800379b5dc0)
Stack:
 ffff88007c25f340 0000000000000000 ffff88003e35b000 0000000000000000
<0> ffff880037bd9be8 ffffffffa035e7c6 ffff88007c25f340 ffff88003e35b6c0
<0> ffff880037bd9c18 ffffffffa035e8cb ffff880037bd9c38 ffffffffa04b2548
Call Trace:
 [<ffffffffa035e7c6>] rt2x00lib_stop+0x68/0xd5 [rt2x00lib]
 [<ffffffffa035e8cb>] rt2x00lib_start+0x98/0xbb [rt2x00lib]
 [<ffffffffa04b2548>] ? cfg80211_netdev_notifier_call+0x3f9/0x412 [cfg80211]
 [<ffffffffa035f363>] rt2x00mac_start+0x1d/0x1f [rt2x00lib]
 [<ffffffffa04e1b34>] ieee80211_open+0x288/0x5fa [mac80211]
 [<ffffffff813a7146>] __dev_open+0x8e/0xbc
 [<ffffffff813a4ee3>] __dev_change_flags+0xbe/0x141
 [<ffffffff813a7082>] dev_change_flags+0x21/0x57
 [<ffffffff813fd05c>] devinet_ioctl+0x29a/0x54a
 [<ffffffff813a89fb>] ? dev_ioctl+0x4d8/0x67a
 [<ffffffff813fe32d>] inet_ioctl+0x8f/0xa7
 [<ffffffff813947e4>] sock_do_ioctl+0x29/0x48
 [<ffffffff81394c27>] sock_ioctl+0x20d/0x21c
 [<ffffffff8111aabf>] vfs_ioctl+0x32/0xa6
 [<ffffffff8111b032>] do_vfs_ioctl+0x483/0x4c9
 [<ffffffff8111b0ce>] sys_ioctl+0x56/0x79
 [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
Code: 1f 44 00 00 48 8b 9f 50 04 00 00 49 89 fc eb 3f 48 89 df 45 31 ed e8 8c ff ff ff eb 26 44 89 ef 48 8b 43 08 41 ff c5 48 6b ff 28 <48> c7 04 38 00 00 00 00 49 8b 44 24 08 48 03 7b 08 48 8b 40 40 
RIP  [<ffffffffa035fb0a>] rt2x00queue_init_queues+0x37/0x85 [rt2x00lib]
 RSP <ffff880037bd9ba8>
CR2: 0000000000000000
---[ end trace 7d958c1505d89e82 ]---

Comment 8 Stanislaw Gruszka 2010-10-14 12:16:08 UTC
Here is another kernel to try:
http://koji.fedoraproject.org/koji/taskinfo?taskID=2534613

It includes
commit 5731858d0047cad309d334c4cd6ccb6199bf28fe
Author: Gertjan van Wingerde <gwingerde>
Date:   Tue Mar 30 23:50:23 2010 +0200

    rt2x00: Disable auto wakeup before waking up device.

If it does not help, please check whether the "iwconfig wlan0 power off" command helps, and if not, please check whether F-14 also has this problem (for example, you can try the F-14 live CD).

Comment 9 Ian Malone 2010-10-19 10:05:06 UTC
Hi, sorry for the delay. It takes a while before this crash occurs and I wanted to be sure about what I was reporting.

Thanks for the new kernel, it appears to fix the problem (as well as removing the 'failed to enter state 3 (-16)' errors). In addition, the previous koji kernel, and I think kernel-2.6.34.7-56.fc13.x86_64 as well, didn't crash with 'iwconfig wlan0 power off' set.

I also had 'rate 54M' set in my rc.local, which appeared to interact with this bug somehow and trigger it faster, though it does not cause it. (Under previous Fedora releases rate auto would always choose 1M; this seems to be fixed in F13.)

Thanks for your help, let me know if you need any more testing to close this.

Ian

Comment 10 Stanislaw Gruszka 2010-10-27 10:55:34 UTC
The patches were included in the F-13 kernel:
http://koji.fedoraproject.org/koji/buildinfo?buildID=201533

Comment 11 Fedora Update System 2010-12-03 15:34:38 UTC
kernel-2.6.34.7-63.fc13 has been submitted as an update for Fedora 13.
https://admin.fedoraproject.org/updates/kernel-2.6.34.7-63.fc13

Comment 12 Fedora Update System 2010-12-07 20:07:34 UTC
kernel-2.6.34.7-63.fc13 has been pushed to the Fedora 13 stable repository.  If problems still persist, please make note of it in this bug report.

