Bug 716988 - Ralink RT2573 USB dongle randomly overheats
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 16
Hardware: Unspecified
OS: Unspecified
Assignee: Stanislaw Gruszka
QA Contact: Fedora Extras Quality Assurance
Reported: 2011-06-27 16:47 UTC by Tom Horsley
Modified: 2012-04-25 08:41 UTC

Doc Type: Bug Fix
Last Closed: 2012-04-25 08:41:33 UTC


Attachments
Script I use to start the access point running. (205 bytes, text/plain)
2011-07-13 12:04 UTC, Tom Horsley
And here's the script I use to shut down the AP (131 bytes, text/plain)
2011-07-13 12:06 UTC, Tom Horsley

Description Tom Horsley 2011-06-27 16:47:12 UTC
Description of problem:

I have a Belkin USB dongle I use as an access point. It contains a Ralink
chipset that shows up as this in lsusb:

Bus 001 Device 004: ID 050d:705a Belkin Components F5D7050 Wireless G Adapter v3000 [Ralink RT2573]

On Fedora 14, it worked first time every time when I would plug it in and
start hostapd to run it as an access point.

On Fedora 15, it sometimes works and sometimes does not. I have to try
plugging and unplugging, restarting hostapd, etc. several times before
I can connect.

Whenever I have this problem connecting, I eventually notice that even though
I seem to be connected and working OK, the USB dongle is physically overheating
to the point where it can be painful to touch (at which point I pull it out
of the machine).

The randomness of this behavior leads me to believe there may be some
un-initialized data being used somewhere in the drivers that sometimes cranks
the power up too high (or something like that which would cause the
overheating).

Version-Release number of selected component (if applicable):
kernel-2.6.38.8-32.fc15.x86_64

How reproducible:
It seems to have the overheating problem just about half the times I
plug it in.


Steps to Reproduce:
1. See the description above.
  
Actual results:
Sometimes does not allow me to connect to the access point and overheats
if I do manage to connect. Sometimes works fine with no overheating.

Expected results:
Always works fine with no overheating.

Additional info:

Here's the lsusb -v details for the dongle:

Bus 001 Device 004: ID 050d:705a Belkin Components F5D7050 Wireless G Adapter v3000 [Ralink RT2573]
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            0 (Defined at Interface level)
  bDeviceSubClass         0 
  bDeviceProtocol         0 
  bMaxPacketSize0        64
  idVendor           0x050d Belkin Components
  idProduct          0x705a F5D7050 Wireless G Adapter v3000 [Ralink RT2573]
  bcdDevice            0.01
  iManufacturer           1 Belkin
  iProduct                2 Belkin 54g USB Network Adapter
  iSerial                 0 
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength           32
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0 
    bmAttributes         0x80
      (Bus Powered)
    MaxPower              300mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           2
      bInterfaceClass       255 Vendor Specific Class
      bInterfaceSubClass    255 Vendor Specific Subclass
      bInterfaceProtocol    255 Vendor Specific Protocol
      iInterface              0 
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x01  EP 1 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
Device Qualifier (for other device speed):
  bLength                10
  bDescriptorType         6
  bcdUSB               2.00
  bDeviceClass            0 (Defined at Interface level)
  bDeviceSubClass         0 
  bDeviceProtocol         0 
  bMaxPacketSize0        64
  bNumConfigurations      1
Device Status:     0x0000
  (Bus Powered)

Comment 1 Tom Horsley 2011-06-27 16:56:10 UTC
Here's what shows up in dmesg when I plug in the dongle and start up hostapd:

[16092.774040] usb 1-3: new high speed USB device using ehci_hcd and address 4
[16093.062017] usb 1-3: New USB device found, idVendor=050d, idProduct=705a
[16093.062023] usb 1-3: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[16093.062029] usb 1-3: Product: Belkin 54g USB Network Adapter
[16093.062032] usb 1-3: Manufacturer: Belkin
[16093.382765] cfg80211: Calling CRDA to update world regulatory domain
[16093.487212] cfg80211: World regulatory domain updated:
[16093.487216] cfg80211:     (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
[16093.487220] cfg80211:     (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[16093.487224] cfg80211:     (2457000 KHz - 2482000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
[16093.487228] cfg80211:     (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
[16093.487231] cfg80211:     (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[16093.487235] cfg80211:     (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[16093.606375] phy0 -> rt2500usb_init_eeprom: Error - Invalid RT chipset detected.
[16093.606383] phy0 -> rt2x00lib_probe_dev: Error - Failed to allocate device.
[16093.606458] usbcore: registered new interface driver rt2500usb
[16093.913604] ieee80211 phy1: Selected rate control algorithm 'minstrel_ht'
[16093.914046] Registered led device: rt73usb-phy1::radio
[16093.914078] Registered led device: rt73usb-phy1::assoc
[16093.914112] Registered led device: rt73usb-phy1::quality
[16093.914732] usbcore: registered new interface driver rt73usb
[16094.001796] cfg80211: Calling CRDA for country: US
[16094.007785] cfg80211: Regulatory domain changed to country: US
[16094.007789] cfg80211:     (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
[16094.007793] cfg80211:     (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2700 mBm)
[16094.007797] cfg80211:     (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 1700 mBm)
[16094.007800] cfg80211:     (5250000 KHz - 5330000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[16094.007804] cfg80211:     (5490000 KHz - 5600000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[16094.007807] cfg80211:     (5650000 KHz - 5710000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[16094.007811] cfg80211:     (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 3000 mBm)
[16172.764945] ADDRCONF(NETDEV_UP): wlan0: link is not ready
[16185.714011] wlan0: no IPv6 routers present

This happens to be an example where the dongle worked fine and did not overheat.

Comment 2 Tom Horsley 2011-07-06 16:56:30 UTC
Here's a dmesg from a time I plugged it in and it didn't work:

[794854.750037] usb 1-3: new high speed USB device using ehci_hcd and address 7
[794855.038016] usb 1-3: New USB device found, idVendor=050d, idProduct=705a
[794855.038022] usb 1-3: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[794855.038028] usb 1-3: Product: Belkin 54g USB Network Adapter
[794855.038031] usb 1-3: Manufacturer: Belkin
[794855.148507] phy6 -> rt2500usb_init_eeprom: Error - Invalid RT chipset detected.
[794855.148516] phy6 -> rt2x00lib_probe_dev: Error - Failed to allocate device.
[794855.411672] ieee80211 phy7: Selected rate control algorithm 'minstrel_ht'
[794855.412225] Registered led device: rt73usb-phy7::radio
[794855.412275] Registered led device: rt73usb-phy7::assoc
[794855.412328] Registered led device: rt73usb-phy7::quality
[794875.700404] ADDRCONF(NETDEV_UP): wlan0: link is not ready
[794889.170013] wlan0: no IPv6 routers present

All the cfg80211 messages seem to be missing. Timing issue maybe? Something
not being recognized the same way every time?
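
One way to compare the two logs mechanically is to look for lines that appear in the working log from comment 1 but not in the failing one. The sketch below is only an illustration: the marker strings are copied from the logs in this report, and the function name `missing_markers` is made up for this example. Missing lines do not by themselves prove a failure; they may also be absent simply because the modules were already loaded from an earlier plug-in.

```python
# Markers taken from the working log in comment 1. Treating them as a
# health check is an assumption, not something the drivers guarantee.
EXPECTED_MARKERS = (
    "cfg80211: Calling CRDA",
    "registered new interface driver rt73usb",
)


def missing_markers(dmesg_text, markers=EXPECTED_MARKERS):
    """Return the markers that do not occur anywhere in dmesg_text."""
    return [m for m in markers if m not in dmesg_text]


# Two lines from the working log (comment 1).
working = """\
[16093.382765] cfg80211: Calling CRDA to update world regulatory domain
[16093.914732] usbcore: registered new interface driver rt73usb
"""

# Two lines from the failing log (comment 2).
failing = """\
[794855.411672] ieee80211 phy7: Selected rate control algorithm 'minstrel_ht'
[794855.412328] Registered led device: rt73usb-phy7::quality
"""

print(missing_markers(working))
print(missing_markers(failing))
```

In practice the text would come from a saved `dmesg` capture rather than inline strings.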

Comment 3 John W. Linville 2011-07-12 17:59:46 UTC
You are the "lucky" owner of one of the Ralink-based devices that shares a USB ID with a different set of Ralink devices.  The problem is that in Linux the two sets of devices have different drivers, and there is no way to know which driver to load other than to simply try one and see whether it works.  Unfortunately, the load order between the two drivers can be a bit non-deterministic.

Your device seems to need rt73usb.  So please add a line like this to the file /etc/modprobe.d/blacklist.conf:

   blacklist rt2500usb

Then reboot.  Does that improve things?

I'm not sure I have an explanation for the overheating.  I suspect that this device (which I'm guessing is several years old) may be getting a little flaky, and the USB subsystem may not be able to properly enumerate it.  For example, there is no "usbcore: registered new interface driver rt73usb" line in comment 2.  A hypothesis might be that this enumeration failure may have resulted in improper power being applied to the USB port in question, leading to the overheating.  But honestly, I really don't know.  My hope is that configuring to only load the correct driver will avoid this confusion for you.
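
To confirm that the blacklist entry suggested above is actually present, the configuration file can be scanned for `blacklist` lines. This is a minimal sketch under stated assumptions: the helper name `blacklisted_modules` is hypothetical, the sample text stands in for the real file contents, and anything after a `#` is treated as a comment.

```python
def blacklisted_modules(conf_text):
    """Collect module names from 'blacklist <module>' lines.

    Text after '#' is dropped, blank lines are skipped, and other
    directives (alias, options, install, ...) are ignored.
    """
    modules = set()
    for raw in conf_text.splitlines():
        fields = raw.split("#", 1)[0].split()
        if len(fields) == 2 and fields[0] == "blacklist":
            modules.add(fields[1])
    return modules


# Hypothetical contents of /etc/modprobe.d/blacklist.conf.
sample = """\
# keep the wrong Ralink driver from claiming the dongle
blacklist rt2500usb
"""

print("rt2500usb" in blacklisted_modules(sample))
print("rt73usb" in blacklisted_modules(sample))
```

A blacklist entry only stops automatic loading by alias, so an already loaded rt2500usb module would still need to be unloaded or the machine rebooted, as suggested above.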

Comment 4 Tom Horsley 2011-07-13 12:02:47 UTC
Well, the errors from trying rt2500usb did go away, but the randomness
still persists. Here's the latest dmesg:

[  361.814039] usb 1-3: new high speed USB device using ehci_hcd and address 4
[  362.101868] usb 1-3: New USB device found, idVendor=050d, idProduct=705a
[  362.101875] usb 1-3: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[  362.101880] usb 1-3: Product: Belkin 54g USB Network Adapter
[  362.101884] usb 1-3: Manufacturer: Belkin
[  362.409342] cfg80211: Calling CRDA to update world regulatory domain
[  362.531610] cfg80211: World regulatory domain updated:
[  362.531614] cfg80211:     (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
[  362.531618] cfg80211:     (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[  362.531622] cfg80211:     (2457000 KHz - 2482000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
[  362.531625] cfg80211:     (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
[  362.531629] cfg80211:     (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[  362.531632] cfg80211:     (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[  362.843073] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
[  362.843615] Registered led device: rt73usb-phy0::radio
[  362.843675] Registered led device: rt73usb-phy0::assoc
[  362.843725] Registered led device: rt73usb-phy0::quality
[  362.844341] usbcore: registered new interface driver rt73usb
[  362.921229] cfg80211: Calling CRDA for country: US
[  362.927720] cfg80211: Regulatory domain changed to country: US
[  362.927724] cfg80211:     (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
[  362.927729] cfg80211:     (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2700 mBm)
[  362.927732] cfg80211:     (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 1700 mBm)
[  362.927736] cfg80211:     (5250000 KHz - 5330000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[  362.927739] cfg80211:     (5490000 KHz - 5600000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[  362.927743] cfg80211:     (5650000 KHz - 5710000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[  362.927746] cfg80211:     (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 3000 mBm)

My Samsung Intercept phone even thinks it is connected, but nothing actually
happens when I try to access the internet. That is one of the ways it
typically fails.

After a few minutes, it decided it was no longer connected, though I didn't
turn anything off. This showed up in /var/log/messages:

Jul 13 07:46:25 tomh hostapd: wlan0: STA 60:a1:0a:c7:af:01 IEEE 802.11: deauthenticated due to local deauth request

By the way, I don't think it is several years old: I just bought
it via amazon.com. It doesn't have a manufacture date on it anywhere, but
the CD that came with it had Windows 7 drivers, so I suspect it is fairly
new.

I ran the script I use to disable the AP, then re-ran the enable script,
and the 2nd time it seemed to connect, but now the overheating is starting
to happen. (That always seems to be the way it goes - if it works perfectly
the first time, there is no overheating - if I have to try more than once,
the overheating starts).

Comment 5 Tom Horsley 2011-07-13 12:04:55 UTC
Created attachment 512644 [details]
Script I use to start the access point running.

Maybe I'm doing something silly when I start it? Here's the startup script
I use (runs as root).

Comment 6 Tom Horsley 2011-07-13 12:06:33 UTC
Created attachment 512645 [details]
And here's the script I use to shut down the AP

Comment 7 Stanislaw Gruszka 2011-07-13 13:22:23 UTC
I'm not sure if

Comment 8 Tom Horsley 2011-07-21 12:59:58 UTC
I don't know if this is useful info, but I think I have found a way to work
around the overheating:

If the initial connection attempt does not work, merely running the stop
then start script over a few times will get me connected eventually, but
produce the overheating problem.

If I completely start from scratch, not only running the stop script, but
also unplugging the dongle, then plugging it back in and running the start
script, when I finally get a connection to work, I no longer get the
overheating, so I at least have a way to work around this (but it would sure
be more convenient if it always worked the first time as it seemed to
do when I was running Fedora 14).

Comment 9 Stanislaw Gruszka 2011-07-22 14:16:47 UTC
I posted a patch upstream that does a USB reset every time an rt2x00 device is probed, and a second one that fixes some random memory corruption. I'm not sure whether these will help here, but they are worth a try. You can use upstream drivers on Fedora via compat-wireless; try the compat-wireless-next package from:
http://people.redhat.com/sgruszka/compact_wireless.html

Comment 10 Tom Horsley 2011-07-25 11:42:00 UTC
I'm trying the compat-wireless-next package now, and I don't see any change (unless maybe it is worse :-). I got connected for a few seconds once, but then it stopped talking again. I've tried about 5 times now to connect and haven't gotten a reliable connection yet.

Comment 11 Stanislaw Gruszka 2011-07-25 12:24:55 UTC
So the issue is still present upstream ...

If John has no better idea how to solve this problem, I propose bisection, but it is quite time-consuming and requires installing an upstream kernel; see:
http://kernel.org/pub/software/scm/git/docs/git-bisect.html

Alternatively, this could be bisected using compat-wireless tarballs, which should save compilation time. Let me know if you want to do this and want more help or instructions.

Comment 12 Tom Horsley 2011-07-28 16:12:40 UTC
Just as an experiment before trying drastic stuff, I rebooted to Fedora 14,
verified that I could connect with no problems, then installed the
compat-wireless-next package from the above repo, rebooted f14 again and
verified that the random connection problem does indeed exist, so it looks
like the problem is definitely located somewhere in the modules provided
by compat-wireless, and not in some other part of the kernel.

I've only reproduced this on my main machine at work, which I don't really
have time to fiddle with extensively. I'll try to remember to take the
dongle home and see if I can reproduce the same problem on a system I
can reboot furiously some weekend (actually if it is just kernel modules
involved, I suppose I might be able to just unload and load modules).

Comment 13 Stanislaw Gruszka 2011-07-29 11:24:32 UTC
Interesting, that would suggest a problem in the USB core. But perhaps the reason is more prosaic, such as the compat-wireless modules not loading properly [there was a bug in compat-wireless-next-2011-07-20 that prevented the modules from loading, which should now (2011-07-24) be fixed].

You can check whether the modules are loaded using lsmod: there should be a "compat" module, and the rt2x00lib module should depend on it. Please check that on F-14, and then again on F-15.
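
The lsmod check can be done by reading the "Used by" column of the `compat` row. The following is a rough sketch, not a definitive tool: the helper name `users_of` is made up, and the sample table (module sizes and use counts in particular) is invented for illustration rather than captured from the affected machine.

```python
def users_of(lsmod_text, module):
    """Return the modules listed in the 'Used by' column for `module`.

    Returns None when the module does not appear in the listing at all.
    """
    for line in lsmod_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if fields and fields[0] == module:
            # fields: name, size, use count, optional comma-separated users
            if len(fields) > 3:
                return [u for u in fields[3].split(",") if u]
            return []
    return None


# Invented lsmod output; sizes and counts are placeholders.
sample = """\
Module                  Size  Used by
rt73usb                31000  0
rt2x00usb              20000  1 rt73usb
rt2x00lib              49000  2 rt73usb,rt2x00usb
compat                 18000  1 rt2x00lib
"""

users = users_of(sample, "compat")
print(users is not None and "rt2x00lib" in users)
```

If `users_of` returns None for `compat`, the stock kernel modules are probably in use instead of the compat-wireless ones.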

Comment 14 Stanislaw Gruszka 2011-08-22 13:47:08 UTC
I'm sorry, I completely misunderstood comment 12.

Anyway, I don't know how to fix this problem. I think the best way would be to bisect using compat-wireless on Fedora 14. Tarballs can be downloaded from

http://linuxwireless.org/download/compat-wireless-2.6/

Bisection could be performed like this. Let's say you first install 2010.06.01 and confirm it works, then 2010.12.30 and confirm it does not work. The next step would be to test 2010.09.01 (the middle of the two previous dates); if it works, follow with 2010.10.15, if not, with 2010.07.15, and so on, always continuing with the middle of the latest good and bad versions. Note that tarballs are not available for every date; if the requested date is not available or does not compile, try the nearest one. When you finish you will have two nearby dates, e.g. 2010.08.05 (good) and 2010.08.09 (bad). We can then generate a diff between these versions and possibly tell where the problem is, or alternatively provide further patches for debugging.

Assuming you will be testing releases spanning 6 months, the process should take about log2(6*30) ~ 8 steps.

At each step you need to compile and install the modules. Not all modules from the compat-wireless package need to be compiled; you can use "./scripts/driver-select rt2x00" to build only rt2x00 and the modules it depends on. Restarting the system should not be needed either; you can use "./scripts/unload.sh ; modprobe rt73usb". See the README in the compat-wireless package for details.

Please double-check that the modules from compat-wireless are in use: lsmod should show the compat module as a dependency of some rt2x00 module.
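
The date-based bisection described above can be sketched as a plain binary search over whatever snapshot dates are actually available. Everything named here is an assumption made for illustration: the snapshot list is invented, `bisect_snapshots` is a hypothetical helper, and `pretend_test` stands in for the manual "build, load the modules, try to connect" step.

```python
import math
from datetime import date


def bisect_snapshots(snapshots, is_good):
    """Find the adjacent (last good, first bad) pair in a sorted list.

    Assumes snapshots[0] is known good, snapshots[-1] is known bad, and
    the behaviour flips exactly once somewhere in between.
    """
    lo, hi = 0, len(snapshots) - 1
    steps = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        steps += 1
        if is_good(snapshots[mid]):
            lo = mid  # regression was introduced later
        else:
            hi = mid  # regression was introduced at or before mid
    return snapshots[lo], snapshots[hi], steps


# Invented snapshot dates; the real list is whatever the download
# directory offers.
snapshots = [
    date(2010, 6, 1), date(2010, 7, 15), date(2010, 8, 5),
    date(2010, 8, 9), date(2010, 9, 1), date(2010, 10, 15),
    date(2010, 12, 30),
]


def pretend_test(snapshot):
    """Stand-in for the manual test; pretends 2010-08-05 is the last good one."""
    return snapshot <= date(2010, 8, 5)


good, bad, steps = bisect_snapshots(snapshots, pretend_test)
print(good, bad, steps)

# Step estimate for roughly six months of daily snapshots.
print(math.ceil(math.log2(6 * 30)))
```

Because the failure reported here is intermittent, each step would probably need several plug-in attempts before a snapshot can be called good.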

Comment 15 Tom Horsley 2011-08-22 16:28:44 UTC
Yea. I've been planning to try that, but I need a chunk of time to dedicate to it and I'm not sure when I'll get one. I did note this morning at work that the problem does still exist in the new 2.6.40.3-0.fc15.x86_64 kernel I just installed (I didn't expect it to go away, but it is always worth checking in a new kernel :-).

Comment 16 Josh Boyer 2011-11-30 19:59:26 UTC
Was there any further progress on this?

Comment 17 Tom Horsley 2011-11-30 20:23:03 UTC
I'm afraid not. The prospect of binary searching through changes is so
daunting compared to just unplugging and plugging it in again till it
initializes correctly (usually at most 3 tries), that I've never developed
enough energy to try and find what is really going on. Fedora 16 and
kernel-3.1.2-1.fc16.x86_64 still exhibit the same random problems though.

Comment 18 Stanislaw Gruszka 2011-12-01 08:21:01 UTC
http://marc.info/?t=132173491500028&r=1&w=2 indicates that the heating problems can be caused by not packing frames into one USB buffer but instead sending lots of small buffers, and by the USB traffic now being asynchronous where before it was synchronous (one USB packet at a time). I'm able to reproduce the performance problems and am going to investigate. Perhaps fixing that would also make the heating problem go away.

Comment 19 Tom Horsley 2011-12-01 14:49:40 UTC
I suppose that could be it, but I'm not sure it fits the symptoms I see when
it overheats: If I plug it in and initialize wlan0 and hostapd it either works or
doesn't work. The only time it overheats is if I leave it plugged in and
re-initialize wlan0 and hostapd until it does start to work. Once I can
actually access the network it starts overheating.

If, on the other hand, I pull the dongle out of the system in between the
different attempts to re-initialize it, I never see overheating. (And
eventually one of the initialization attempts works - usually takes 2 or 3
tries).

That seems more like something is being done wrong during initialization, not
when it is actually running the network (maybe transmit power gets cranked up
too high or something - not that I know if that is even possible :-).

Heck, I suppose it could even be something like uninitialized data creeping
into something hostapd or ifconfig sends to the dongle, and the bug isn't
actually in the kernel (unless failing to fully check the user provided
data for errors is considered a bug :-).

Comment 20 lionghostshop 2012-02-05 02:58:23 UTC
I ran the Fedora 15 live CD. The kernel is 2.6.38.
PEAP still does not work.
Where can I download Fedora 14?

Comment 21 Stanislaw Gruszka 2012-02-06 07:42:14 UTC
The last question was regarding bug 746744.

Comment 22 Josh Boyer 2012-02-28 19:19:00 UTC
(In reply to comment #18)
> In http://marc.info/?t=132173491500028&r=1&w=2 is indication that heating
> problems can be caused by not packing frames into one usb buffer, but send lots
> of small buffers. And that the usb traffic is asynchronous, before it was
> synchronous - one usb packet at time. I'm able to reproduce performance
> problems, and going to investigate that. Perhaps fixing that would also make
> heating problem gone.

Stanislaw, I don't see anything upstream about packing the frames for the driver.  Did you ever look into doing that?

Comment 23 Stanislaw Gruszka 2012-02-29 06:56:00 UTC
I have that on my TODO list. Please keep the bug open, or alternatively close it with an upstream resolution.

Comment 24 Dave Jones 2012-03-22 17:03:15 UTC
[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 25 Dave Jones 2012-03-22 17:06:29 UTC
[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 26 Dave Jones 2012-03-22 17:17:33 UTC
[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 27 Tom Horsley 2012-03-22 18:05:27 UTC
I see the exact same behavior on the new 3.3.0-4.fc16.x86_64 kernel. I have to try starting several times before I can get a solid connection, and if I don't
unplug and plug the dongle in between the tries, it starts overheating.

Comment 28 Tom Horsley 2012-04-17 13:37:04 UTC
New data: I'm now on kernel-3.3.1-5.fc16.x86_64 and have twice connected to the dongle as access point on the first try with no errors. This is a small sample size, but maybe whatever was going on has been fixed. I guess I'll find out over the next several days if I consistently connect with no retries required.

Comment 29 Tom Horsley 2012-04-24 18:50:17 UTC
I'm now on kernel 3.3.2-6.fc16.x86_64 and this USB dongle has connected first time every time for several days now. I have no idea what change might have fixed this, but it sure does seem to be fixed now. No overheating, no connection problems any longer on recent kernels.

Comment 30 Stanislaw Gruszka 2012-04-25 08:41:33 UTC
I'm glad to hear that. We have a dozen new rt2x00 patches in 3.3; it seems one of them helped here. I'm closing the bug report. If this starts happening again (I hope not), please reopen the bug.

