Bug 658451 - [rt2800pci] the Wireless goes very slow
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 16
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Stanislaw Gruszka
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-11-30 12:15 UTC by Amir Hedayaty
Modified: 2012-02-07 07:42 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-02-06 16:03:54 UTC
Type: ---
Embargoed:


Attachments
rt2800pci_irq_none_workaround.patch (511 bytes, text/plain), 2011-10-29 12:13 UTC, Stanislaw Gruszka
helmut.patch (6.33 KB, text/plain), 2012-01-05 15:05 UTC, Stanislaw Gruszka
rt2800pci_zero_interrupts.patch (2.53 KB, text/plain), 2012-01-10 17:00 UTC, Stanislaw Gruszka
rt2800pci_zero_interrupts_3.1.patch (2.55 KB, text/plain), 2012-01-11 15:12 UTC, Stanislaw Gruszka

Description Amir Hedayaty 2010-11-30 12:15:54 UTC
Description of problem:
After using the computer for a while, the wireless speed drops to something like 1-2k.
I could not fix it any way other than a system restart (taking the network connection or NetworkManager down and up does not help).

The device model is "RaLink RT2860"
It is a pcix wireless card; I guess there is a problem in the kernel module for this hardware!


I am sure there is no problem with the network connection or Firefox. I monitor the speed using the netspeed applet, and my laptop, which also runs Fedora 14, works fine while the PC has this low bandwidth.

(This is the healthy state)
# iwconfig wlan0 
wlan0     IEEE 802.11bgn  ESSID:"CC251"  
          Mode:Managed  Frequency:2.462 GHz  Access Point: 00:05:B4:06:C6:20   
          Bit Rate=1 Mb/s   Tx-Power=9 dBm   
          Retry  long limit:7   RTS thr:off   Fragment thr:off
          Encryption key:off
          Power Management:off
          Link Quality=70/70  Signal level=15 dBm  
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:0  Invalid misc:0   Missed beacon:0




Version-Release number of selected component (if applicable):
Hardware is: RaLink RT2860
kernel-2.6.35.6-48.fc14.x86_64
NetworkManager-0.8.1-10.git20100831.fc14.x86_64

How reproducible:
1. Turn on the computer
2. Use it for a while

Steps to Reproduce:
Nothing
  
Actual results:
Wireless speed reaches almost zero

Expected results:


Additional info:

Comment 1 Stanislaw Gruszka 2010-12-01 09:59:13 UTC
We have a few bug reports about slow wireless on F-14 with the 2.6.35 kernel, mostly on iwl3945 and iwl4965 but also with rt73. So I guess there is something wrong with mac80211, probably with the rate scaling code.

Comment 2 Amir Hedayaty 2010-12-05 05:43:10 UTC
Today it happened again. Here is something I found that might be useful.
This is the output of dmesg:
[  622.711358] wlan0: associated
[  622.717196] ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
[  622.717377] cfg80211: Calling CRDA for country: JP
[  623.018100] Intel AES-NI instructions are not detected.
[  623.044561] padlock: VIA PadLock not detected.
[  632.850029] wlan0: no IPv6 routers present
[ 2052.114334] irq 17: nobody cared (try booting with the "irqpoll" option)
[ 2052.114339] Pid: 0, comm: swapper Tainted: P            2.6.35.6-48.fc14.x86_64 #1
[ 2052.114341] Call Trace:
[ 2052.114342]  <IRQ>  [<ffffffff810a6e2b>] __report_bad_irq.clone.1+0x3d/0x8b
[ 2052.114349]  [<ffffffff810a6f93>] note_interrupt+0x11a/0x17f
[ 2052.114352]  [<ffffffff810a7a73>] handle_fasteoi_irq+0xa8/0xce
[ 2052.114355]  [<ffffffff8100c2ea>] handle_irq+0x88/0x90
[ 2052.114357]  [<ffffffff8146f034>] do_IRQ+0x5c/0xb4
[ 2052.114360]  [<ffffffff81469593>] ret_from_intr+0x0/0x11
[ 2052.114361]  <EOI>  [<ffffffff8102b7f9>] ? native_safe_halt+0xb/0xd
[ 2052.114366]  [<ffffffff81010f03>] ? need_resched+0x23/0x2d
[ 2052.114367]  [<ffffffff8101102a>] default_idle+0x34/0x4f
[ 2052.114370]  [<ffffffff81008325>] cpu_idle+0xaa/0xcc
[ 2052.114373]  [<ffffffff81461f2a>] start_secondary+0x24d/0x28e
[ 2052.114374] handlers:
[ 2052.114375] [<ffffffff81332944>] (usb_hcd_irq+0x0/0x7c)
[ 2052.114378] [<ffffffffa00697da>] (rt2800pci_interrupt+0x0/0x18d [rt2800pci])
[ 2052.114384] Disabling IRQ #17

I removed the module from the kernel, loaded it again, and it worked fine.
No need to restart that way.
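
For anyone checking their own logs for the same symptom, here is a minimal Python sketch that pulls the affected IRQ number and the handlers registered on it out of dmesg text. It is only an illustration: the function name summarize_bad_irqs is made up, and the regular expressions assume the exact line format of the trace above (other kernel versions print the handler list differently).

```python
import re

# Patterns assume the dmesg format shown in this comment (2.6.35 era).
NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")
HANDLER = re.compile(r"\[<[0-9a-f]+>\] \((\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[\w+\])?\)")
DISABLING = re.compile(r"Disabling IRQ #(\d+)")


def summarize_bad_irqs(dmesg_text):
    """Return {irq_number: [handler names]} for each 'nobody cared' report."""
    reports = {}
    current = None        # IRQ number of the report being read, if any
    in_handlers = False   # True once the 'handlers:' line has been seen
    for line in dmesg_text.splitlines():
        started = NOBODY_CARED.search(line)
        if started:
            current = int(started.group(1))
            reports[current] = []
            in_handlers = False
        elif current is None:
            continue
        elif line.rstrip().endswith("handlers:"):
            in_handlers = True
        elif DISABLING.search(line):
            current = None
            in_handlers = False
        elif in_handlers:
            handler = HANDLER.search(line)
            if handler:
                reports[current].append(handler.group(1))
    return reports


# A few lines copied from the trace above.
SAMPLE_DMESG = """\
[ 2052.114334] irq 17: nobody cared (try booting with the "irqpoll" option)
[ 2052.114349]  [<ffffffff810a6f93>] note_interrupt+0x11a/0x17f
[ 2052.114374] handlers:
[ 2052.114375] [<ffffffff81332944>] (usb_hcd_irq+0x0/0x7c)
[ 2052.114378] [<ffffffffa00697da>] (rt2800pci_interrupt+0x0/0x18d [rt2800pci])
[ 2052.114384] Disabling IRQ #17
"""

print(summarize_bad_irqs(SAMPLE_DMESG))
```

On the sample it reports IRQ 17 with two handlers, which shows that the wireless card shares its interrupt line with a USB host controller.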

Comment 3 Stanislaw Gruszka 2011-02-23 14:21:16 UTC
Anything better with current upstream drivers?
http://people.redhat.com/sgruszka/compat_wireless.html

Comment 4 Amir Hedayaty 2011-05-19 18:38:35 UTC
The problem was much more severe in FC15; the network is usually too slow, something like 32k-100k instead of 1M-2M (I mean the internet speed).

After a while I found out that kmod-2860 (for my model) and similar packages exist on rpmfusion, and they kind of cut the mustard. I wonder what the issue with them is that keeps them from being packaged in Fedora?

Comment 5 Amir Hedayaty 2011-10-29 09:43:47 UTC
The problem still exists on Fedora 16:
again, after "Disabling IRQ #17", the wireless goes extremely slow.

The rt2860 drivers on FC15 partially solved the problem (I mean they were OK),
but they are not available for FC16.

Comment 6 Stanislaw Gruszka 2011-10-29 12:13:26 UTC
Created attachment 530778 [details]
rt2800pci_irq_none_workaround.patch

Can you check whether this simple workaround makes the issue go away? Probably the best way to test the patch is to compile compat-wireless (http://linuxwireless.org/download/compat-wireless-2.6/compat-wireless-2.6.tar.bz2) from source with the patch on top; however, I'm not sure whether the current compat-wireless compiles on F-16. Otherwise the kernel needs to be rebuilt. If you are not able to do this, let me know and I will prepare a kernel build in koji.

Comment 7 Amir Hedayaty 2011-10-29 20:08:34 UTC
After selecting the driver model, the package compiled (the rt28xx driver compiles but others may not). After modprobe there was a crash:

[16639.312400] rt2800pci 0000:02:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
[16639.312424] rt2800pci 0000:02:00.0: setting latency timer to 64
[16639.322186] ------------[ cut here ]------------
[16639.322215] WARNING: at net/wireless/core.c:562 wiphy_register+0x5f/0x3d8 [cfg80211]()
[16639.322222] Hardware name: MS-7599
[16639.322226] Modules linked in: rt2800pci(+) rt2800lib rt2x00pci rt2x00lib tcp_lp ppdev parport_pc lp parport fuse 8021q garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack snd_hda_codec_hdmi arc4 snd_hda_codec_via crc_ccitt mac80211 snd_hda_intel snd_hda_codec snd_seq sp5100_tco virtio_net cfg80211 snd_usb_audio rfkill snd_hwdep kvm_amd kvm snd_pcm uvcvideo videodev snd_usbmidi_lib media microcode atl1c eeprom_93cx6 v4l2_compat_ioctl32 i2c_piix4 edac_core snd_rawmidi snd_seq_device snd_timer snd snd_page_alloc k10temp edac_mce_amd soundcore serio_raw binfmt_misc uinput joydev pata_acpi ata_generic pata_atiixp wmi radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: rt2x00lib]
[16639.322336] Pid: 17223, comm: work_for_cpu Not tainted 3.1.0-1.fc16.x86_64 #1
[16639.322342] Call Trace:
[16639.322359]  [<ffffffff81057a56>] warn_slowpath_common+0x83/0x9b
[16639.322369]  [<ffffffff81057a88>] warn_slowpath_null+0x1a/0x1c
[16639.322390]  [<ffffffffa02b3e15>] wiphy_register+0x5f/0x3d8 [cfg80211]
[16639.322402]  [<ffffffff81119370>] ? __kmalloc+0xf0/0x102
[16639.322425]  [<ffffffffa0365239>] ? ieee80211_register_hw+0xd4/0x55e [mac80211]
[16639.322447]  [<ffffffffa0365495>] ieee80211_register_hw+0x330/0x55e [mac80211]
[16639.322462]  [<ffffffffa018809b>] rt2x00lib_probe_dev+0x4b0/0x581 [rt2x00lib]
[16639.322474]  [<ffffffffa0049892>] rt2x00pci_probe+0x236/0x27c [rt2x00pci]
[16639.322484]  [<ffffffff8106d313>] ? move_linked_works+0x6e/0x6e
[16639.322497]  [<ffffffffa0288311>] rt2800pci_probe+0x15/0x17 [rt2800pci]
[16639.322508]  [<ffffffff8123f6e7>] local_pci_probe+0x44/0x75
[16639.322516]  [<ffffffff8106d329>] do_work_for_cpu+0x16/0x28
[16639.322525]  [<ffffffff81072d1f>] kthread+0x84/0x8c
[16639.322536]  [<ffffffff814be234>] kernel_thread_helper+0x4/0x10
[16639.322546]  [<ffffffff81072c9b>] ? kthread_worker_fn+0x148/0x148
[16639.322555]  [<ffffffff814be230>] ? gs_change+0x13/0x13
[16639.322561] ---[ end trace f083997c3eb512dd ]---
[16639.322569] (null) -> rt2x00lib_probe_dev: Error - Failed to initialize hw.
[16639.322622] rt2800pci 0000:02:00.0: PCI INT A disabled
[16639.322678] rt2800pci: probe of 0000:02:00.0 failed with error -22
[16641.632375] audit_printk_skb: 15 callbacks suppressed


I was using compat-wireless-2011-10-29

Comment 8 Stanislaw Gruszka 2011-10-30 14:11:21 UTC
Did you install the compat-wireless modules with "make install" and restart the system? The crash above looks like the new rt2800pci driver is being used with the kernel's mac80211/cfg80211 modules, whereas the modules from compat-wireless should be used. If you do

modprobe -r rt2800pci
modprobe -v -n rt2800pci

it will show which modules would be loaded; all wireless modules, e.g. mac80211, should be taken from /lib/modules/`uname -r`/updates/ instead of /lib/modules/`uname -r`/kernel/ .
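
The check can also be scripted. The following Python sketch splits the "insmod <path>" lines printed by the dry run into modules that come from updates/ and modules that come from anywhere else. The function name split_by_origin and the sample paths are hypothetical; the real paths depend on what is installed.

```python
def split_by_origin(dry_run_output, release):
    """Split 'insmod <path>' lines into (from updates/, from elsewhere)."""
    updates_prefix = f"/lib/modules/{release}/updates/"
    from_updates, from_elsewhere = [], []
    for line in dry_run_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] != "insmod":
            continue  # ignore anything that is not an insmod line
        path = fields[1]
        if path.startswith(updates_prefix):
            from_updates.append(path)
        else:
            from_elsewhere.append(path)
    return from_updates, from_elsewhere


# Hypothetical dry-run output; paths are made up for illustration.
SAMPLE_RELEASE = "3.1.0-1.fc16.x86_64"
SAMPLE_OUTPUT = """\
insmod /lib/modules/3.1.0-1.fc16.x86_64/kernel/net/wireless/cfg80211.ko
insmod /lib/modules/3.1.0-1.fc16.x86_64/updates/drivers/net/wireless/rt2x00/rt2x00lib.ko
insmod /lib/modules/3.1.0-1.fc16.x86_64/updates/drivers/net/wireless/rt2x00/rt2800pci.ko
"""

good, suspect = split_by_origin(SAMPLE_OUTPUT, SAMPLE_RELEASE)
print("from updates/:", len(good))
print("from elsewhere:", suspect)
```

To use it on a real system, pass in the text printed by the modprobe -v -n command above and the output of uname -r. Any wireless module listed under "from elsewhere" points to the kind of driver/stack mix suspected here.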

Comment 9 Amir Hedayaty 2011-10-30 19:20:59 UTC
After removing the modules, I tried inserting them as a regular user, and the error indicated that the modules were being loaded from the updates folder.

I managed to compile the rt-2860 drivers from FC15. They are working much better; they are not ideal drivers, and problems occur about once a week.

I would prefer using open-source drivers over these drivers.

Comment 10 Stanislaw Gruszka 2011-10-31 17:07:06 UTC
OK, please try this kernel (once it finishes compiling); it includes the patch from comment 6:
http://koji.fedoraproject.org/koji/taskinfo?taskID=3475656

Comment 11 Stanislaw Gruszka 2011-10-31 19:24:00 UTC
Err, it failed to compile, I'll fix that tomorrow.

Comment 12 Stanislaw Gruszka 2011-11-01 14:47:20 UTC
Hopefully this one will compile:
http://koji.fedoraproject.org/koji/taskinfo?taskID=3477950

Comment 13 Stanislaw Gruszka 2011-11-02 19:35:28 UTC
Let us know how the above kernel works. The patch included in the kernel is not a fix but a workaround, but if it works, we will more or less know how to fix the problem.

Comment 14 Amir Hedayaty 2011-11-11 11:05:45 UTC
Sorry for the delay! I was busy for a few days.

How do I download or compile that kernel? I even tried installing koji on my system, but it did not help!

Comment 15 Stanislaw Gruszka 2011-11-11 12:08:25 UTC
The kernel build I did is now deleted; these scratch builds are kept for only a week or so. I'll rebuild it.

Comment 16 Stanislaw Gruszka 2011-11-11 13:44:09 UTC
Here is the new build:
http://koji.fedoraproject.org/koji/taskinfo?taskID=3507412
It's a 2.6.41-rc1 kernel, but I hope it will work fine.

Comment 17 Amir Hedayaty 2011-11-12 00:25:14 UTC
Thanks this kernel is working great, so far  better than non open source driver.
I guess I need to wait a few more days to see if it is really stable.

BTW is there any place I can vote for this?

Comment 18 Stanislaw Gruszka 2011-11-14 12:35:47 UTC
(In reply to comment #17)
> Thanks this kernel is working great, so far  better than non open source
> driver.
This kernel contains a patch, which is only a workaround. We can not apply it as it would break other devices that could possibly share interrupt line with rt2800pci device. We need to find out how to read interrupt status on your hardware. I tried to read sources of driver from ralink site, and could not find such information. Ivo, Gertjan, do you have any hints?

> BTW is there any place I can vote for this?
Not sure what you mean?

Comment 19 Amir Hedayaty 2011-11-14 12:49:57 UTC
> 
> > BTW is there any place I can vote for this?
> Not sure what you mean?
When there is an update for fedora, first it is pushed to updates-testing and people usually vote if they are happy with the update. I was asking if there is any similar thing for patches?! Now I guess not!

Well, this does not completely fix the issue, but it reduces the problem significantly. With the ra-driver the issue happens once every 2-3 days, with the normal kernel every 10 mins, and with this patch once after 3 days of heavy usage.
Which is satisfactory for me!

Comment 20 Stanislaw Gruszka 2011-11-16 09:34:36 UTC
(In reply to comment #18)
> This kernel contains a patch, which is only a workaround. We can not apply it
> as it would break other devices that could possibly share interrupt line with
> rt2800pci device.
Actually we can apply it. I thought the patch could break shared interrupts, but that's not true: we can return IRQ_HANDLED, and the interrupt routines of other devices that share the irq line will still be called.

> We need to find out how to read interrupt status on your
> hardware. I tried to read sources of driver from ralink site, and could not
> find such information. Ivo, Gertjan, do you have any hints?
I found that the vendor driver just returns IRQ_HANDLED, so we should too.
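
The reasoning about shared interrupt lines can be illustrated with a toy user-space model in Python. This is not kernel code. The class and function names are invented, and the thresholds (more than 99,900 unhandled interrupts out of 100,000) are an assumption based on the spurious interrupt detection in kernel/irq/spurious.c. The model shows two things: every handler on the line is called no matter what the others return, and the line is only disabled when no handler claims the interrupts.

```python
IRQ_NONE, IRQ_HANDLED = 0, 1


class SharedIrqLine:
    """Toy model of one shared interrupt line (not kernel code)."""

    def __init__(self, handlers, window=100_000, unhandled_limit=99_900):
        self.handlers = handlers
        self.window = window                    # interrupts per accounting window
        self.unhandled_limit = unhandled_limit  # assumed kernel threshold
        self.count = 0
        self.unhandled = 0
        self.disabled = False

    def raise_irq(self):
        if self.disabled:
            return
        # Every registered handler runs, whatever the others return.
        results = [handler() for handler in self.handlers]
        if IRQ_HANDLED not in results:
            self.unhandled += 1
        self.count += 1
        if self.count == self.window:
            if self.unhandled > self.unhandled_limit:
                self.disabled = True  # corresponds to "Disabling IRQ #17"
            self.count = 0
            self.unhandled = 0


def simulate(wifi_returns, interrupts=100_000):
    """Raise interrupts on a line shared by a USB and a wireless handler."""
    usb_calls = 0

    def usb_handler():
        nonlocal usb_calls
        usb_calls += 1
        return IRQ_NONE  # the interrupt is not for the USB controller

    def wifi_handler():
        # Models the wireless handler when its status register reads zero.
        return wifi_returns

    line = SharedIrqLine([usb_handler, wifi_handler])
    for _ in range(interrupts):
        line.raise_irq()
    return line.disabled, usb_calls


print("wifi returns IRQ_NONE:   ", simulate(IRQ_NONE))
print("wifi returns IRQ_HANDLED:", simulate(IRQ_HANDLED))
```

In the first run the line ends up disabled, as in comment 2. In the second run it stays enabled, and the USB handler is still called for every interrupt, so the other device on the line is not starved.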

Comment 21 Stanislaw Gruszka 2011-11-23 11:40:46 UTC
The patch was refused as an incorrect fix, so moving back to assigned.

Comment 22 Amir Hedayaty 2011-12-04 12:06:52 UTC
Can you please build that kernel once more?
I am getting a feeling that something else in that version (not this patch), or maybe a combination of this patch and the previous kernel, fixed the issue.

For the last few days I have been using a 3.1 kernel with that patch applied, and the problem is not fixed on this kernel.

Comment 23 Amir Hedayaty 2011-12-06 11:40:08 UTC
I faced some new errors on the 3.1.4 kernel:

[ 1802.202425] rt2800pci 0000:02:00.0: PCI INT A disabled
[ 1807.421891] rt2800pci 0000:02:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
[ 1807.421901] rt2800pci 0000:02:00.0: setting latency timer to 64
[ 1807.429593] ieee80211 phy2: Selected rate control algorithm 'minstrel_ht'
[ 1807.429762] Registered led device: rt2800pci-phy2::radio
[ 1807.429775] Registered led device: rt2800pci-phy2::assoc
[ 1807.429793] Registered led device: rt2800pci-phy2::quality
[ 1808.553043] phy2 -> rt2800_wait_wpdma_ready: Error - WPDMA TX/RX busy, aborting.
[ 1808.553058] phy2 -> rt2800pci_set_device_state: Error - Device failed to enter state 4 (-5).


After these errors the device was completely down and could not connect by any means.

Comment 24 Stanislaw Gruszka 2012-01-05 15:05:26 UTC
Created attachment 550937 [details]
helmut.patch

Here is a kernel with Helmut's patch (still compiling):
http://koji.fedoraproject.org/koji/taskinfo?taskID=3622159

Comment 25 Stanislaw Gruszka 2012-01-05 15:06:59 UTC
Amir, does it fix the problem, or does it maybe make things worse?

Comment 26 Amir Hedayaty 2012-01-09 04:12:30 UTC
Maybe worse! It does not fix it, at least.

Comment 27 Stanislaw Gruszka 2012-01-10 17:00:45 UTC
Created attachment 551891 [details]
rt2800pci_zero_interrupts.patch

Another patch to test ...

Comment 28 Amir Hedayaty 2012-01-11 13:39:50 UTC
Could you build a kernel on koji with this patch please?

Comment 29 Stanislaw Gruszka 2012-01-11 15:10:26 UTC
Uhh, forgot that; launched a build here:
http://koji.fedoraproject.org/koji/taskinfo?taskID=3639199

Comment 30 Stanislaw Gruszka 2012-01-11 15:12:06 UTC
Created attachment 552146 [details]
rt2800pci_zero_interrupts_3.1.patch

Previous patch backported to the 3.1 kernel.

Comment 31 Stanislaw Gruszka 2012-01-11 15:14:52 UTC
Amir, while you're testing, please check whether there is a warning similar to the one in comment 2.

Comment 32 Amir Hedayaty 2012-01-12 09:28:05 UTC
Again speed hardly rose above  200k! There is no error message in dmesg.

Just this is the output from /var/log/messages, it seems to be loading time message.

Jan 11 21:51:46 amir-client kernel: [   17.367348] rt2800pci 0000:02:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17

If you could make it verbose that would be nice, also I noticed the firmware from kernel-firmware package differs from the RaLink own firmware (I tested them both). That driver also messes up a lot and every time that happens this error is generated (the numbers change each time). I thought these might give a hint on what is going on.

Jan 12 01:22:29 amir-client kernel: [ 7164.989117] Rcv Wcid(1) AddBAReq
Jan 12 01:22:29 amir-client kernel: [ 7164.989125] Start Seq = 00000076
Jan 12 01:22:29 amir-client kernel: [ 7165.465124] Rcv Wcid(1) AddBAReq
Jan 12 01:22:29 amir-client kernel: [ 7165.465131] Start Seq = 000005d9
Jan 12 01:22:30 amir-client kernel: [ 7166.141133] Rcv Wcid(1) AddBAReq
Jan 12 01:22:30 amir-client kernel: [ 7166.141140] Start Seq = 000009a4
Jan 12 01:22:30 amir-client kernel: [ 7166.647587] Rcv Wcid(1) AddBAReq
Jan 12 01:22:30 amir-client kernel: [ 7166.647594] Start Seq = 00000ecc
Jan 12 01:23:30 amir-client kernel: [ 7226.303458] Rcv Wcid(1) AddBAReq
Jan 12 01:23:30 amir-client kernel: [ 7226.303466] Start Seq = 000000cc
Jan 12 01:23:30 amir-client kernel: [ 7226.452453] Rcv Wcid(1) AddBAReq
Jan 12 01:23:30 amir-client kernel: [ 7226.452461] Start Seq = 00000111
Jan 12 01:24:28 amir-client kernel: [ 7284.284007] Rcv Wcid(1) AddBAReq
Jan 12 01:24:28 amir-client kernel: [ 7284.284015] Start Seq = 000009e6
Jan 12 01:24:28 amir-client kernel: [ 7284.366924] Rcv Wcid(1) AddBAReq
Jan 12 01:24:28 amir-client kernel: [ 7284.366927] Start Seq = 0000005b
Jan 12 01:24:29 amir-client kernel: [ 7285.100099] ===>rt_ioctl_giwscan. 5(5) BSS returned, data->length = 1023

Comment 33 Stanislaw Gruszka 2012-01-12 14:35:34 UTC
(In reply to comment #32)
> Again speed hardly rose above  200k! There is no error message in dmesg.
OK, so the patch fixes the spurious interrupt problem.

> Just this is the output from /var/log/messages, it seems to be loading time
> message.
> 
> Jan 11 21:51:46 amir-client kernel: [   17.367348] rt2800pci 0000:02:00.0: PCI
> INT A -> GSI 17 (level, low) -> IRQ 17
Yes, this is a normal message telling which interrupt line is assigned to the device.
 
> If you could make it verbose that would be nice, 
Hmm, I do not understand; what should I do?

> also I noticed the firmware
> from kernel-firmware package differs from the RaLink own firmware (I tested
> them both). 
We will need to update the firmware at some point.

> That driver also messes up a lot and every time that happens this
> error is generated (the numbers change each time). I thought these might give a
> hint on what is going on.
> 
> Jan 12 01:22:29 amir-client kernel: [ 7164.989117] Rcv Wcid(1) AddBAReq
> Jan 12 01:22:29 amir-client kernel: [ 7164.989125] Start Seq = 00000076
These messages mean that the peer (i.e. the AP) wants to start a Block Ack session: send frames without individual ACKs, and then ACK all of them at once. I do not see any problem here.
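
As a rough illustration of "ACK all of them at once", the Python sketch below builds the kind of bitmap a Block Ack carries: one bit per frame, counted from the starting sequence number that the log prints as "Start Seq". It assumes the 12-bit sequence number space (0 to 4095) and the 64-frame compressed bitmap of 802.11n. The function name block_ack_bitmap is made up, and the received sequence numbers are invented for the example.

```python
SEQ_MODULO = 4096   # 802.11 sequence numbers are 12 bits wide
WINDOW = 64         # frames covered by one compressed Block Ack bitmap


def block_ack_bitmap(start_seq, received):
    """Return an integer bitmap: bit n is set if frame start_seq+n arrived."""
    bitmap = 0
    for seq in received:
        # Distance from the window start, wrapping around at 4096.
        offset = (seq - start_seq) % SEQ_MODULO
        if offset < WINDOW:
            bitmap |= 1 << offset
    return bitmap


# Start Seq = 00000ecc from the log; frame 0xece is assumed lost.
print(bin(block_ack_bitmap(0xECC, [0xECC, 0xECD, 0xECF])))

# Sequence numbers wrap from 4095 back to 0 inside one window.
print(bin(block_ack_bitmap(4094, [4094, 4095, 0, 1])))
```

A cleared bit tells the sender which single frame to retransmit, instead of waiting for a separate ACK after every frame.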

Anyway, I assume the patch fixes the problem originally reported here. I will test it on my rt2800pci hardware and post it soon. For any other problems please open a separate bug report.

Comment 34 Amir Hedayaty 2012-02-02 03:00:19 UTC
After upgrading to 3.2.2-1.fc16.x86_64 the module has stopped working entirely!

Here is the message log:


[  118.884618] rt2800pci 0000:02:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
[  118.884639] rt2800pci 0000:02:00.0: setting latency timer to 64
[  118.894133] phy1 -> rt2800_init_eeprom: Error - Invalid RF chipset 0x0 detected.
[  118.894142] phy1 -> rt2x00lib_probe_dev: Error - Failed to allocate device.
[  118.894190] rt2800pci 0000:02:00.0: PCI INT A disabled

Comment 35 Gertjan van Wingerde 2012-02-02 06:59:07 UTC
As per the message on the linux-wireless mailing list, this is an issue with the Fedora kernel build.

See http://marc.info/?l=linux-wireless&m=132794937324423&w=2 for details.

Comment 36 Stanislaw Gruszka 2012-02-02 09:20:34 UTC
Yes, bug 785393. This should be fixed in the latest builds:
http://koji.fedoraproject.org/koji/taskinfo?taskID=3749626
If you get MCU request failures, try this one:
http://koji.fedoraproject.org/koji/taskinfo?taskID=3751893

Comment 37 Stanislaw Gruszka 2012-02-06 16:03:54 UTC
kernel-3.2.3-2.fc16 includes the spurious interrupt fix and also the fix for bug 785393; closing this bug report.

Comment 38 Amir Hedayaty 2012-02-06 23:20:55 UTC
The issue has been greatly reduced; maybe the remaining problems are not specific to this driver!

I see this error while rebooting. It seems to be a firmware bug, so it may be good to have a look at it:

[Firmware Bug] CPU 1 try to use APLC500(LVT offset 0) for vector 0x400, ther register is already in use in vector 0xfg on an other CPU

(the CPU number changes)

Comment 39 Stanislaw Gruszka 2012-02-07 07:42:15 UTC
Yes, this is not rt2x00 related for sure. 

According to the message this is a firmware issue, so if it causes trouble for you, ask the BIOS vendor for support, or maybe try updating the BIOS first.

