Bug 950735 - [RT5390] Wireless unstable in 3.8.X
Summary: [RT5390] Wireless unstable in 3.8.X
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 18
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Stanislaw Gruszka
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-04-10 18:50 UTC by Mike Romberg
Modified: 2013-06-29 18:34 UTC (History)

Fixed In Version: kernel-3.9.6-301.fc19
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-06-20 02:29:36 UTC
Type: Bug
Embargoed:


Attachments
Patch to reverse the undesired behavior starting in 3.8 (987 bytes, patch)
2013-05-26 09:03 UTC, Mike Romberg
rt5390_check_for_tssi_available.patch (968 bytes, text/plain)
2013-06-08 17:42 UTC, Stanislaw Gruszka
0001-rt2800-fix-RT5390-RT3290-tx-power-settings-regressio.patch (1.75 KB, text/plain)
2013-06-10 05:51 UTC, Stanislaw Gruszka

Description Mike Romberg 2013-04-10 18:50:40 UTC
Description of problem:  
  The driver seems to lower the connection speed until the connection is finally
lost.  Then something (I assume NetworkManager) reconnects.  The cycle repeats until the
connection can no longer be made.

  The exact same hardware connected to the exact same AP works fine with Windows 7.  It also works fine with the exact same hardware and version of Fedora 18
if I simply use either the original kernel (3.6.10-4) or a vanilla 3.7.10 (using the Fedora 3.7.8 .config file).

  Hardware is:

02:00.0 Network controller: Ralink corp. RT5390 Wireless 802.11n 1T/1R PCIe

  using the rt2800pci driver.

Version-Release number of selected component (if applicable):

Any 3.8.X kernel (up to 3.8.5)

How reproducible:


Steps to Reproduce:
1.  Not sure if it is hardware/driver dependent.  But if so, get the above hardware and kernel.
2.  Connect to an AP using either 802.11 g or n.
3.  Wait about 5 minutes.  The drop/reconnect cycle should start.
  
Actual results:

  Connection rate is lowered until the connection is dropped.  Then it reconnects
and repeats the cycle.

Expected results:

  Connection remains stable.

Additional info:

  If needed I could narrow this down to the exact kernel where things start to break.  I suspect it is 3.8.0.  It takes a while to build 'em.  Let me know if this would help.

Comment 1 Stanislaw Gruszka 2013-04-11 08:41:24 UTC
Yeah, this probably breaks on 3.8.0. I'll look at the changelog to see which commits could have caused the breakage.

Comment 2 Stanislaw Gruszka 2013-05-02 07:56:03 UTC
There are not many changes between 3.7 and 3.8 in the rt2x00 driver. I created a kernel which includes reverts of most of them (I omitted irrelevant commits such as USB ID additions):

bc18a9d481a39213106bc23db7fdcc2b50b419d6 Revert "rt2800: use BBP_R1 for setting tx power"
1fdfc6abe2853b07ac3896c08ef7becb492bebbb Revert "rt2800: limit TX_PWR_CFG_ values to 0xc"
16254e5d4ea7a3e53167efbc5e7f6d0802dabd75 Revert "rt2800: compensate tx power also for non 11b rates on 2GHz"
965a0e700edac2f8dead3c391bdaa49ca7a3647d Revert "rt2800: use eeprom OFDM 6M TX power as criterion"
233fb59d83a23ab163eed2c0e233d4ee90a2924b Revert "rt2800: pass channel pointer to rt2800_config_txpower"
080560207f27eeec6c535e409691ea9e8af77c5b Revert "rt2800: allow to reduce tx power on devices not exporting power limit"
a9764031f3c8f89ed5b1660c8cfb0c33336ab93b Revert "rt2800: comment tx power settings"
3b9f7d0f07a6e326f1491171aecd4abcb9039a90 Revert "rt2x00: Use addr_mask to disallow invalid MAC addresses in mutli-bssid mode"
a236cbbf33374e103df0d571e5971324ba542f95 Revert "rt2x00: rt2800lib: fix indentation of some rt2x00_rt calls"
5beac44fe39a0b79a1b80fbe11e57a0f67f3170c Revert "rt2x00: rt2800lib: fix indentation in rt2800_init_rfcsr"
601e8b08ea7b368f8aa184271d7a1c434e68ffde Revert "rt2x00: rt2800lib: remove trailing semicolons from RFCSR3_* defines"
4b4336217a38097d1f37fa53f5e0c7c4e90e7408 Revert "rt2x00: rt2800lib: introduce RFCSR3_VCOCAL_EN"
054f3c09ff03d79f7c1edc3979992a91f7c2fa2b Revert "Revert: "rt2x00: Don't let mac80211 send a BAR when an AMPDU subframe fails""
99c072304ceae5571a7e68c22010d19482834573 Revert "rt2x00: Only specify interface combinations if more then one interface is possible"
e2b03abbdfc152d6c15c23a082d7f86a92096b15 Revert "rt2x00: zero-out rx_status"
49ba0a6f5019990713bcb98dcff345183c86813d Revert "rt2x00: error in configurations with mesh support disabled"
e2822940a6000c30e95759463cf4cb81471c0090 Revert "rt2x00: rt2x00pci_regbusy_read() - only print register access failure once"

Kernel build is here:
http://koji.fedoraproject.org/koji/taskinfo?taskID=5323752

Please test whether it fixes the problem. If it does, the issue is caused by one of the reverted commits; if not, the issue is caused by some other component, e.g. the common mac80211 module.

Comment 3 Mike Romberg 2013-05-02 21:59:01 UTC
  Reverting these changes seems to fix the problem.  I have not had a disconnect yet.  I will run this kernel for a few days and report back if it does behave badly.  Otherwise I think it is safe to assume the problem is/was one of those commits.

  If needed I am willing to test further kernels to narrow down which specific commit.

Comment 4 Stanislaw Gruszka 2013-05-09 10:11:13 UTC
Yes, please narrow this down a bit more. I have two kernels:
http://koji.fedoraproject.org/koji/taskinfo?taskID=5352733
http://koji.fedoraproject.org/koji/taskinfo?taskID=5352795

p1 reverts the first 5 commits
p2 reverts the first 10 commits

Please test them and let us know which works and which does not.

Comment 5 Stanislaw Gruszka 2013-05-09 13:12:50 UTC
The p2 build is broken and I have to figure out why. For now, please test only p1.

Comment 6 Mike Romberg 2013-05-09 19:34:33 UTC
  Tested p1 for a bit.  It exhibits bad behaviour.  NetworkManager does not
disconnect/reconnect, but the network pretty much fails: DNS lookups fail, Firefox goes into offline mode, etc.  NetworkManager has had some updates since this all started with 3.8.0, which may be why it no longer disconnects and reconnects.  So, I think that the problem is with one of the changes backed out in p2.

  I did not test p2 yet, on your advice :).

Comment 7 Stanislaw Gruszka 2013-05-10 12:32:23 UTC
We now have a new 3.9 kernel:
http://koji.fedoraproject.org/koji/buildinfo?buildID=417595
It would be good to try it. There is a chance that some of the rt2x00 patches in it will fix the problem.

If not, here is the correct p2 build:
http://koji.fedoraproject.org/koji/taskinfo?taskID=5360225
Please test it then.

Comment 8 Mike Romberg 2013-05-10 21:42:45 UTC
  All three of these kernels perform more or less identically as far as this
network card is concerned.  They show low unsteady connection rates and then
network failures (DNS lookups, etc):


  - vmlinuz-3.8.11-200.p1.fc18.x86_64
  - vmlinuz-3.8.11-200.p2.fc18.x86_64
  - vmlinuz-3.9.1-200.fc18.x86_64

  Vanilla 3.7.10 continues to work fine.   I guess the problem must be in one of the changes not backed out in p1 or p2.
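
The prefix-revert reasoning in comments 4 to 8 can be sketched as a small interval-narrowing routine. This is only an illustration in Python, not something used in this report: the function name `narrow_culprit` is hypothetical, positions are counted in the order the reverts were applied (the report does not say which end of the list in comment 2 counts as "first"), and it assumes that exactly one revert is responsible for the fix.

```python
def narrow_culprit(total, prefix_results):
    """Return the 1-based revert positions that may still hold the culprit.

    prefix_results maps "number of reverts applied" to True when that
    kernel behaved well and False when it still showed the problem.
    Assumption: exactly one revert is responsible for the fix.
    """
    lo, hi = 1, total  # the culprit position lies somewhere in lo..hi
    for count, works in sorted(prefix_results.items()):
        if works:
            # A kernel with the first `count` reverts is fine,
            # so the culprit is among those reverts.
            hi = min(hi, count)
        else:
            # Still broken, so the culprit is not among the first `count`.
            lo = max(lo, count + 1)
    return list(range(lo, hi + 1))


# Results reported in this thread: all 17 reverts good, p1 (5) and p2 (10) bad.
print(narrow_culprit(17, {17: True, 5: False, 10: False}))
```

With the results reported so far, the sketch leaves positions 11 through 17 as candidates, which matches the guess that the problem is in a change not backed out in p1 or p2.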

Comment 9 Mike Romberg 2013-05-26 09:03:26 UTC
Created attachment 753274 [details]
Patch to reverse the undesired behavior starting in 3.8

  A very short look at the differences between 3.7.10 and 3.8 shows that something
is being done with the transmission power.  Whatever it is, it causes this bug.  This patch reverses two sections of rt2800lib.c.  The result is a driver that works with no link degradation.

  This patch is sorta brute force.  But I think it puts things back more or less how they were in 3.7.10 while keeping all other changes.

Comment 10 Mike Romberg 2013-05-26 09:07:16 UTC
  I will test this patch with the 3.9 kernels in the next few days.  I'm guessing that it will have a positive result there as well.

Comment 11 Mike Romberg 2013-05-26 20:06:30 UTC
  Yep.  The patch works great with 3.9.4.

Comment 12 Stanislaw Gruszka 2013-06-08 17:42:54 UTC
Created attachment 758600 [details]
rt5390_check_for_tssi_available.patch

I checked the vendor driver; it uses BBP_R1 for TX power tuning on RT5390. But some other TX power code differs, and that has to be changed in rt2x00 as well. There are a bunch of changes, but perhaps this little patch will also help with this regression without removing BBP_R1 tuning. Please test.

Comment 13 Mike Romberg 2013-06-09 14:53:23 UTC
  I tested the tssi patch with vanilla 3.9.4 and it does seem to have a positive effect.  The connection rate jumped around between 9 and 24 Mb/s, mostly hanging around at 24.  Without the patch the rate dips down to 1 Mb/s or less.  Normally this is a rock steady 65 Mb/s connection with 3.7.10 (or any later kernel with my patch).

  If this patch just meant that things would "work" at 1/2 the throughput, then I'd call it good.  But.... (yep here it comes) halfway through a 33 GB test transfer, the network stalled and no traffic of any kind went through.

  So, the final verdict is that this patch alone does not really fix the instability.
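
One way to record rate swings like the ones described in comment 13 is to sample the output of `iw dev <interface> link` and pull out the `tx bitrate` line. The following is a minimal sketch, not a tool used in this report: the helper names, the sample text, and the "less than half of nominal" threshold are assumptions, and the 65 Mb/s nominal rate is simply the figure quoted above.

```python
import re

# Matches the "tx bitrate: 24.0 MBit/s" line printed by `iw dev <if> link`.
_TX_RATE = re.compile(r"tx bitrate:\s*([0-9.]+)\s*MBit/s")


def parse_tx_bitrate(link_output):
    """Return the TX bitrate in Mb/s, or None if no rate line is present."""
    match = _TX_RATE.search(link_output)
    return float(match.group(1)) if match else None


def is_degraded(rate, nominal=65.0, fraction=0.5):
    """Treat a missing rate or one below nominal * fraction as degraded."""
    return rate is None or rate < nominal * fraction


# Hypothetical sample text in the shape `iw` prints for a connected link.
SAMPLE = """Connected to 00:11:22:33:44:55 (on wlan0)
\tSSID: example
\tfreq: 2437
\tsignal: -52 dBm
\ttx bitrate: 24.0 MBit/s
"""

rate = parse_tx_bitrate(SAMPLE)
print(rate, is_degraded(rate))
```

On a live system the text could come from `subprocess.run(["iw", "dev", "wlan0", "link"], capture_output=True, text=True).stdout`, where `wlan0` stands in for the actual interface name.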

Comment 14 Stanislaw Gruszka 2013-06-10 05:46:10 UTC
Ok, thanks for testing. This needs more work. For now, to fix the regression, I will just remove the BBP_R1 settings for the RT5390 chip (and for RT3290, since it requires the same new TX power setting changes).
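
The approach described in comment 14 can be pictured as a per-chip guard around a coarse power step. The following is a toy model in Python, not the attached patch and not the driver's C code: the two chip names come from this report, while the function name, the -12/-6/0 dB step values, and the `RT3070` example chip are assumptions made for illustration.

```python
# Chips named in comment 14 for which BBP_R1 tuning is skipped.
CHIPS_WITHOUT_BBP_R1_TUNING = {"RT5390", "RT3290"}


def split_power_delta(chip, delta_db):
    """Split a requested TX power change into (coarse_step_db, remaining_db).

    Assumed model: a coarse step of -12, -6 or 0 dB is applied first and
    whatever is left is handled by the finer per-rate settings.
    """
    if chip in CHIPS_WITHOUT_BBP_R1_TUNING:
        # Leave the coarse control alone; the whole delta stays with
        # the finer settings, as before the regression.
        return 0, delta_db
    if delta_db <= -12:
        return -12, delta_db + 12
    if delta_db <= -6:
        return -6, delta_db + 6
    return 0, delta_db


print(split_power_delta("RT5390", -8))  # coarse step skipped
print(split_power_delta("RT3070", -8))  # coarse step applied
```

The point of the guard is that chips needing different TX power handling keep their earlier behaviour until that handling is written, while other chips keep the coarse step.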

Comment 15 Stanislaw Gruszka 2013-06-10 05:51:25 UTC
Created attachment 759040 [details]
0001-rt2800-fix-RT5390-RT3290-tx-power-settings-regressio.patch

Mike, this patch differs from yours (but the principle is the same), so please check whether it fixes the problem. Once you confirm that, I'll post it.

Comment 16 Mike Romberg 2013-06-10 18:25:30 UTC
  Don't go with that patch.  It does not do exactly the same thing as the one I submitted.  There is still a significant performance problem.  I think one of the differences between your patch and mine causes this.  I'll narrow it down and post a revised patch based on yours.

Comment 17 Mike Romberg 2013-06-10 20:21:26 UTC
  Disregard my last comment.  I was testing the wrong kernel.  The above patch fixes this issue:

0001-rt2800-fix-RT5390-RT3290-tx-power-settings-regressio.patch

Comment 18 Stanislaw Gruszka 2013-06-12 06:54:40 UTC
I posted the patch here:
http://marc.info/?l=linux-wireless&m=137096920914837&w=2
Josh, please apply it as the fix for this bug.

Comment 19 Josh Boyer 2013-06-12 11:35:00 UTC
Applied on all branches.  Thanks!

Comment 20 Fedora Update System 2013-06-17 17:22:10 UTC
kernel-3.9.6-301.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/kernel-3.9.6-301.fc19

Comment 21 Fedora Update System 2013-06-18 14:29:49 UTC
kernel-3.9.6-200.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/kernel-3.9.6-200.fc18

Comment 22 Fedora Update System 2013-06-18 19:39:18 UTC
Package kernel-3.9.6-301.fc19:
* should fix your issue,
* was pushed to the Fedora 19 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.9.6-301.fc19'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-11140/kernel-3.9.6-301.fc19
then log in and leave karma (feedback).

Comment 23 Fedora Update System 2013-06-20 02:29:36 UTC
kernel-3.9.6-200.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 24 Fedora Update System 2013-06-29 18:34:45 UTC
kernel-3.9.6-301.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.
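
To check whether a given Fedora kernel release string is at or past the fixed builds named in comments 20 to 24, a simple numeric comparison is usually enough. This is a rough sketch with a hypothetical helper name; it compares only the numeric fields and is not a full implementation of RPM version ordering.

```python
import re

# Fixed builds named in this report, keyed by Fedora dist tag.
FIXED_BUILDS = {"fc18": (3, 9, 6, 200), "fc19": (3, 9, 6, 301)}

# version.patch.sub-release, optional extra tags (e.g. "p1"), then dist tag.
_RELEASE = re.compile(r"(\d+)\.(\d+)\.(\d+)-(\d+)\..*?(fc\d+)")


def has_fix(release):
    """Return True/False for fc18 and fc19 kernels, None for other dists."""
    match = _RELEASE.match(release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    build = tuple(int(part) for part in match.group(1, 2, 3, 4))
    fixed = FIXED_BUILDS.get(match.group(5))
    if fixed is None:
        return None  # this report names no fixed build for that dist
    return build >= fixed


print(has_fix("3.8.11-200.p1.fc18.x86_64"))
print(has_fix("3.9.6-200.fc18.x86_64"))
```

The release string of the running kernel can be obtained with `platform.release()` from the Python standard library.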

