Bug 1015932 - Qualcomm Atheros AR8131 Gigabit Ethernet running @ only 100mbps
Status: CLOSED INSUFFICIENT_DATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 19
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Assigned To: fedora-kernel-ethernet-ath
QA Contact: Fedora Extras Quality Assurance
Keywords: Triaged
Reported: 2013-10-06 19:35 EDT by Bill Gradwohl
Modified: 2013-10-14 11:34 EDT
CC List: 9 users

Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-10-14 11:34:43 EDT


Attachments
Comment (79.75 KB, text/plain), 2013-10-08 15:47 EDT, Bill Gradwohl

Description Bill Gradwohl 2013-10-06 19:35:56 EDT
Description of problem:
Qualcomm Atheros AR8131 NIC only runs at low speed (100 Mb/s instead of 1000 Mb/s).

Asus ROG laptop with the following:

lspci reports:
06:00.0 Ethernet controller: Qualcomm Atheros AR8131 Gigabit Ethernet (rev c0)

root@billlaptop ~# lsmod | grep ath
ath9k 141923 0 
ath9k_common 13503 1 ath9k
ath9k_hw 443174 2 ath9k_common,ath9k
ath 23142 3 ath9k_common,ath9k,ath9k_hw
mac80211 564808 1 ath9k
cfg80211 460310 3 ath,ath9k

How reproducible:
Plug the NIC into the switch. The NIC always comes up at low speed, and testing confirms the speed is snail slow.

root@billlaptop ~# ethtool p5p1:
Settings for p5p1::
	Supported ports: [ TP ]
	Supported link modes:   10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Full 
	Supported pause frame use: No
	Supports auto-negotiation: Yes
	Advertised link modes:  Not reported
	Advertised pause frame use: No
	Advertised auto-negotiation: Yes
	Speed: 100Mb/s
	Duplex: Full
	Port: Twisted Pair
	PHYAD: 0
	Transceiver: internal
	Auto-negotiation: on
	MDI-X: Unknown
	Supports Wake-on: pg
	Wake-on: d
	Current message level: 0x0000003f (63)
			       drv probe link timer ifdown ifup
	Link detected: yes
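The check being done by eye here (negotiated Speed versus the Supported link modes list) can be scripted. A minimal Python sketch, assuming the plain-text layout ethtool printed in this report; the function names are hypothetical, and on a live system the text could come from subprocess.run(["ethtool", "p5p1"], capture_output=True, text=True).stdout.

```python
import re

# Excerpt of the `ethtool p5p1:` output from the original report.
REPORTED_OUTPUT = """Settings for p5p1::
\tSupported ports: [ TP ]
\tSupported link modes:   10baseT/Half 10baseT/Full
\t                        100baseT/Half 100baseT/Full
\t                        1000baseT/Full
\tSupported pause frame use: No
\tAdvertised link modes:  Not reported
\tSpeed: 100Mb/s
\tDuplex: Full
"""


def parse_ethtool(text):
    """Return the negotiated speed and the fastest supported mode, both in Mb/s."""
    # "Speed: 100Mb/s" -> 100; absent (e.g. link down) -> None
    speed = re.search(r"Speed:\s*(\d+)Mb/s", text)
    # The supported-modes list wraps onto continuation lines, so take everything
    # between its label and the next field ("Supported pause frame use:").
    _, _, rest = text.partition("Supported link modes:")
    section, _, _ = rest.partition("Supported pause frame use:")
    modes = [int(m) for m in re.findall(r"(\d+)baseT/(?:Half|Full)", section)]
    return {
        "speed_mbps": int(speed.group(1)) if speed else None,
        "max_supported_mbps": max(modes) if modes else None,
    }


def below_capability(info):
    """True when the link came up slower than the fastest mode the NIC supports."""
    speed, best = info["speed_mbps"], info["max_supported_mbps"]
    return speed is not None and best is not None and speed < best


if __name__ == "__main__":
    info = parse_ethtool(REPORTED_OUTPUT)
    print(info, "below capability:", below_capability(info))
```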


root@billlaptop ~# ethtool -s p5p1 speed 1000
Cannot advertise speed 1000


root@billlaptop ~# lshw -C network
...
  *-network
       description: Ethernet interface
       product: AR8131 Gigabit Ethernet
       vendor: Qualcomm Atheros
       physical id: 0
       bus info: pci@0000:06:00.0
       logical name: p5p1
       version: c0
       serial: bc:ae:c5:13:b3:09
       size: 100Mbit/s
       capacity: 1Gbit/s
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vpd bus_master cap_list ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
       configuration: autonegotiation=on broadcast=yes driver=atl1c driverversion=1.0.1.1-NAPI duplex=full ip=192.168.10.168 latency=0 link=yes multicast=yes port=twisted pair speed=100Mbit/s
       resources: irq:55 memory:d6200000-d623ffff ioport:8000(size=128)


root@billlaptop ~# ethtool -s p5p1 speed 1000 autoneg off
This hangs the NIC and I lose connectivity. Using systemctl to restart network.service does not revive the NIC; I have to bounce the box to regain the slow connection.
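Part of why the forced setting misbehaves may be that 1000BASE-T relies on auto-negotiation (it is also how the two ends settle their master/slave clocking roles), so `speed 1000 autoneg off` is not generally expected to bring up a gigabit copper link. A gentler experiment is to leave auto-negotiation on and limit what the NIC advertises with ethtool's `advertise N` option, whose bit values are listed in the ethtool(8) man page. The Python sketch below only builds that command from the mode names ethtool prints; the helper names are hypothetical, and whether atl1c honours a restricted advertisement on this laptop was never tested in this report.

```python
# Bit values accepted by `ethtool -s <iface> advertise N`, as listed in ethtool(8);
# they correspond to the ADVERTISED_* flags in linux/ethtool.h.
ADVERTISE_BITS = {
    "10baseT/Half": 0x001,
    "10baseT/Full": 0x002,
    "100baseT/Half": 0x004,
    "100baseT/Full": 0x008,
    "1000baseT/Half": 0x010,
    "1000baseT/Full": 0x020,
}


def advertise_mask(modes):
    """OR together the bits for the given link modes (KeyError on an unknown name)."""
    mask = 0
    for mode in modes:
        mask |= ADVERTISE_BITS[mode]
    return mask


def ethtool_advertise_args(iface, modes):
    """argv that keeps auto-negotiation on but advertises only `modes`."""
    return ["ethtool", "-s", iface, "autoneg", "on",
            "advertise", format(advertise_mask(modes), "#05x")]


if __name__ == "__main__":
    # Printed, not executed: changing link settings needs root and real hardware.
    print(" ".join(ethtool_advertise_args("p5p1", ["1000baseT/Full"])))
```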
Comment 1 John Greene 2013-10-07 11:03:22 EDT
What kind of switch? Have you tried on a different switch?
Comment 2 Bill Gradwohl 2013-10-07 11:11:17 EDT
I only have 1 switch - Trendnet Gigabit TEG-S80g.

I swapped cables and tried different switch ports all to no effect.

I've noted that without a switch, when two machines are connected with a crossover cable, the first machine that comes up does so on low speed as there is no one to negotiate with, and the second also connects at low speed because the first is already there. The switch is what allows a machine coming up to "see" another active device and negotiate a speed. Therefore, I don't think connecting my box to another via a crossover cable is a valid test of anything. 

Am I wrong?
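For reference, auto-negotiation runs between the two ends of a link each time the link comes up: each side advertises the modes it supports, and both then settle on the highest-priority mode they have in common. A toy Python model of that final resolution step, assuming the mode names ethtool prints and a simplified form of the IEEE 802.3 priority order; it is not the PHY's actual state machine, and `resolve` is a hypothetical name.

```python
# Simplified priority order for the twisted-pair modes ethtool reports,
# highest first (after IEEE 802.3 Annex 28B; 100BASE-T4/T2 omitted).
PRIORITY = [
    "1000baseT/Full", "1000baseT/Half",
    "100baseT/Full", "100baseT/Half",
    "10baseT/Full", "10baseT/Half",
]


def resolve(local_modes, partner_modes):
    """Pick the highest-priority mode both link partners advertise, or None."""
    common = set(local_modes) & set(partner_modes)
    for mode in PRIORITY:
        if mode in common:
            return mode
    return None


# Modes the AR8131 reported as supported in the description above.
AR8131_MODES = ["10baseT/Half", "10baseT/Full", "100baseT/Half",
                "100baseT/Full", "1000baseT/Full"]

if __name__ == "__main__":
    print(resolve(AR8131_MODES, PRIORITY))                           # gigabit partner
    print(resolve(AR8131_MODES, ["100baseT/Full", "10baseT/Full"]))  # 100 Mb/s-only partner
```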
Comment 3 John Greene 2013-10-08 15:39:35 EDT
Yeah, probably true. Fail on my part. Let's try again.

Can you upload the dmesg output from when you connect the card to the switch? I'll take a look.

If there is no helpful stuff as is, we might need to turn on driver debugging.
Comment 4 Bill Gradwohl 2013-10-08 15:47:49 EDT
Created attachment 915777 [details]
Comment

(This comment was longer than 65,535 characters and has been moved to an attachment by Red Hat Bugzilla).
Comment 5 John Greene 2013-10-08 16:15:37 EDT
yup, not much help, need some additional debug info enabled in the driver.  Do you have the ability / comfort level to build your own kernels?
Comment 6 Bill Gradwohl 2013-10-08 16:37:35 EDT
I can follow instructions. I built kernels 9 years ago and not one since. I think things have probably changed, and I've slept since then.


I'm game if you are. Want to use email for communications?
Comment 7 John Greene 2013-10-09 11:04:52 EDT
Ok, cool. I can give you a couple items to get you started.

Would do email, but I sleep sometimes too..and, as you can see, when I sleep too little I fail. ;)

So I'd prefer to leave it here to benefit all that come later and provide me a history to help others.  And get you fixed asap..

A couple things first may preclude the need to build a kernel: I'm just that much lazy.

Check the mount table to see whether debugfs is mounted; it should be by default.
# mount | grep  debugfs
none on /sys/kernel/debug type debugfs (rw)

Here on f19 it's /sys/kernel/debug, on by default.  Confirm you have that first.

Next, we need to enable the driver debug: as root do this:
cat <path to debugfs>/dynamic_debug/control | grep -i ath1c

If your kernel has dynamic debug enabled and the driver is loaded, you'll get a bunch of output listing the debug messages from the files below. If not, we'll have to build you a kernel, somehow.

<path>/drivers/net/ethernet/atheros/atl1c/atl1c_ethtool.c
<path>/drivers/net/ethernet/atheros/atl1c/atl1c_hw.c
<path>/drivers/net/ethernet/atheros/atl1c/atl1c_main.c

Grab the list generated and post it to me and we'll march on.
Comment 8 Bill Gradwohl 2013-10-09 12:04:34 EDT
Probably not what you were hoping for:

BTW I changed the grep to atl1c instead of ath1c as ath1c produced nothing.

root@billlaptop dynamic_debug# cat control | grep -i atl1c
drivers/net/ethernet/atheros/atl1c/atl1c_main.c:616 [atl1c]atl1c_mii_ioctl =_ "<atl1c_mii_ioctl> write %x %x"
drivers/net/ethernet/atheros/atl1c/atl1c_main.c:2640 [atl1c]atl1c_probe =_ "mac address : %pM\012"
drivers/net/ethernet/atheros/atl1c/atl1c_main.c:2443 [atl1c]atl1c_suspend =_ "phy power saving failed"
drivers/net/ethernet/atheros/atl1c/atl1c_main.c:2300 [atl1c]atl1c_request_irq =_ "atl1c_request_irq OK\012"
drivers/net/ethernet/atheros/atl1c/atl1c_main.c:437 [atl1c]atl1c_vlan_mode =_ "atl1c_vlan_mode\012"
drivers/net/ethernet/atheros/atl1c/atl1c_main.c:451 [atl1c]atl1c_restore_vlan =_ "atl1c_restore_vlan\012"
drivers/net/ethernet/atheros/atl1c/atl1c_hw.c:821 [atl1c]atl1c_power_saving =_ "%s: suspend MAC=%x,MASTER=%x,PHY=0x%x,WOL=%x\012"
drivers/net/ethernet/atheros/atl1c/atl1c_hw.c:814 [atl1c]atl1c_power_saving =_ "%s: write phy MII_IER failed.\012"
drivers/net/ethernet/atheros/atl1c/atl1c_hw.c:741 [atl1c]atl1c_phy_to_ps_link =_ "get speed and duplex failed\012"
drivers/net/ethernet/atheros/atl1c/atl1c_hw.c:727 [atl1c]atl1c_phy_to_ps_link =_ "phy autoneg failed\012"
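The two checks from comment 7 (where debugfs is mounted, and which dynamic-debug call sites the driver exposes) can also be done from Python. A minimal sketch, assuming the debugfs mount point is read from /proc/mounts (the same information `mount | grep debugfs` shows) and that control entries carry the `[module]` tag seen in the output above; the function names are hypothetical. On a live system, as root, the control text would come from the file dynamic_debug/control under the returned mount point.

```python
def debugfs_mountpoint(mounts_text):
    """Return where debugfs is mounted, given the contents of /proc/mounts."""
    for line in mounts_text.splitlines():
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "debugfs":
            return fields[1]
    return None


def module_entries(control_text, module):
    """Lines of dynamic_debug/control that belong to `module` (tagged "[module]")."""
    tag = f"[{module}]"
    return [line for line in control_text.splitlines() if tag in line]


def entry_flags(line):
    """Flags field of a control entry: "_" means none set, "p" means printing is on."""
    return line.split(" =", 1)[1].split(" ", 1)[0]


# Two entries copied from the grep output above (raw string keeps "\012" literal).
SAMPLE_CONTROL = r'''drivers/net/ethernet/atheros/atl1c/atl1c_main.c:616 [atl1c]atl1c_mii_ioctl =_ "<atl1c_mii_ioctl> write %x %x"
drivers/net/ethernet/atheros/atl1c/atl1c_hw.c:727 [atl1c]atl1c_phy_to_ps_link =_ "phy autoneg failed\012"'''

if __name__ == "__main__":
    for entry in module_entries(SAMPLE_CONTROL, "atl1c"):
        print(entry_flags(entry), entry.split()[0])
```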
Comment 9 John Greene 2013-10-09 15:19:08 EDT
Good catch..Actually looks great: We should be able to get some debug turned on without a kernel build.

A couple more commands to enable the prints in the driver:
echo 'file atl1c_main.c +p' > /sys/kernel/debug/dynamic_debug/control
echo 'file atl1c_hw.c +p' > /sys/kernel/debug/dynamic_debug/control

Then run your connect test again, maybe a couple of times, to capture both a connect and a disconnect.

Then turn the prints off again to keep your logs from getting clogged:
echo 'file atl1c_main.c -p' > /sys/kernel/debug/dynamic_debug/control
echo 'file atl1c_hw.c -p' > /sys/kernel/debug/dynamic_debug/control

Then send me the dmesg log.  It works a little better to attach the file rather than paste it unless it's pretty small.

Hope we catch something to help debug the negotiation issue.
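The enable/test/disable sequence in comment 9 could be wrapped in a small script so the prints are reliably switched off again. A minimal Python sketch, assuming debugfs is mounted at /sys/kernel/debug (as comment 7 shows) and that it runs as root; `dyndbg_queries`, `apply_queries` and the constants are hypothetical names. A typical run would be apply_queries(CONTROL, dyndbg_queries(ATL1C_FILES, True)), reproduce the link problem, make the same call with False, then save dmesg.

```python
# Hypothetical names; the path assumes debugfs is mounted at /sys/kernel/debug.
CONTROL = "/sys/kernel/debug/dynamic_debug/control"
ATL1C_FILES = ["atl1c_main.c", "atl1c_hw.c"]


def dyndbg_queries(files, enable):
    """One 'file <name> +p' (or '-p') query per source file."""
    op = "+p" if enable else "-p"
    return [f"file {name} {op}" for name in files]


def apply_queries(control_path, queries):
    """Send each query in its own write, like a separate `echo '...' > control`."""
    for query in queries:
        # Mode "w" mirrors the shell's `>`; on the real control file a write is
        # parsed as a command, so it does not erase the other entries (comment 11).
        with open(control_path, "w") as fh:
            fh.write(query + "\n")


if __name__ == "__main__":
    # Dry run: print the queries instead of writing them (writing needs root).
    for q in dyndbg_queries(ATL1C_FILES, enable=True):
        print(q)
```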
Comment 10 Bill Gradwohl 2013-10-09 18:46:58 EDT
Hang on!

That control file is rather large and your two commands would normally wipe it out and only leave one line in it. 

I assume you meant >> instead of > - correct?

I tried >> and did a tail after the first echo to see what happened - nothing happened. File length is 0 yet it has content, but the echo didn't touch it.

Then I tried the simple > and again - no change.

???
Comment 11 John Greene 2013-10-10 10:05:58 EDT
Great valid question, but nope .. > is right.  

For a great background: http://lwn.net/Articles/434856/

debugfs is a virtual file system, and the dynamic_debug control file in it is used to control the debug prints. A write to it is read as a command; it does not replace the file's contents. So..you aren't clobbering anything.
In this case, cat the control list before and after and you'll see the flags on the matching entries change from "=_" to "=p". The dynamic debug print support then logs those messages without building a new driver, HOPEFULLY.


No worries about clobbering anything permanently..
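John's point can be illustrated with a toy model: a write to the control file is interpreted as a command that edits the flags of matching call sites, and the listing is generated fresh on each read, which is why `>` does not clobber it. The class below is a hypothetical Python model of that behaviour for the single query form used in this thread; it is not how the kernel implements dynamic debug.

```python
class ToyDynamicDebugControl:
    """Toy model of dynamic_debug/control: each write is a command that edits
    per-call-site flags, so `>` never replaces the list itself. Not kernel code."""

    def __init__(self, callsites):
        # callsites: (path, line, function) tuples; every site starts with no flags
        self.flags = {site: "" for site in callsites}

    def write(self, query):
        # Only the "file <name> +p" / "file <name> -p" form used in this thread is modeled.
        words = query.split()
        if len(words) != 3 or words[0] != "file" or words[2] not in ("+p", "-p"):
            raise ValueError(f"unsupported query: {query!r}")
        name, op = words[1], words[2]
        for site in self.flags:
            path = site[0]
            if path == name or path.endswith("/" + name):
                self.flags[site] = "p" if op == "+p" else ""

    def read(self):
        # Render entries the way the real file shows flags: "=_" for none, "=p" when on.
        return "\n".join(
            f"{path}:{line} {func} ={flags or '_'}"
            for (path, line, func), flags in self.flags.items()
        )


if __name__ == "__main__":
    ctl = ToyDynamicDebugControl([
        ("drivers/net/ethernet/atheros/atl1c/atl1c_main.c", 616, "atl1c_mii_ioctl"),
        ("drivers/net/ethernet/atheros/atl1c/atl1c_hw.c", 727, "atl1c_phy_to_ps_link"),
    ])
    ctl.write("file atl1c_main.c +p")
    print(ctl.read())
```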
Comment 12 Bill Gradwohl 2013-10-10 11:14:45 EDT
Stop the presses!

Yesterday, I connected my laptop to a wireless network I set up on another box using hostapd. 

During my testing, the gateway setting I had on the wired NIC was getting in my way, so I turned the wired NIC off via NetworkManager to only have the wireless NIC available. After completing my wireless testing, I turned the wireless NIC off and turned the wired NIC back on. I kept using the laptop for a few more hours and then powered it down.

This morning, I turned my laptop on and noted that the switch had the 1000Mbps light lit for my box.

bill@billlaptop ~$ ethtool p5p1:
Settings for p5p1::
	Supported ports: [ TP ]
	Supported link modes:   10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Full 
	Supported pause frame use: No
	Supports auto-negotiation: Yes
	Advertised link modes:  Not reported
	Advertised pause frame use: No
	Advertised auto-negotiation: Yes
	Speed: 1000Mb/s
	Duplex: Full
	Port: Twisted Pair
	PHYAD: 0
	Transceiver: internal
	Auto-negotiation: on
	MDI-X: Unknown
Cannot get wake-on-lan settings: Operation not permitted
	Current message level: 0x0000003f (63)
			       drv probe link timer ifdown ifup
	Link detected: yes

Note the difference in this output versus the first one I uploaded. I executed it several times to see if it ever changed, but it's always the same.

The only thing I know I changed was turning the NIC off and then back on. How could doing that have given me the 1000Mbps capability?
Comment 13 John Greene 2013-10-10 15:01:12 EDT
I don't know..Strange things happen like that at times.  Are you able to reproduce the original problem now at all?

What is the wireless NIC?
Comment 14 Bill Gradwohl 2013-10-10 15:28:41 EDT
I've booted the box several times after power downs and can't reproduce the issue. I even powered off the switch, other boxes, etc and now I get 1000Mbps by default. I have no idea what changed.

The wireless NIC is :
bill@billlaptop ~$ lspci -k | grep -A 3 -i "network"
03:00.0 Network controller: Qualcomm Atheros AR9285 Wireless Network Adapter (PCI-Express) (rev 01)
	Subsystem: AzureWave AW-NE785 / AW-NE785H 802.11bgn Wireless Full or Half-size Mini PCIe Card
	Kernel driver in use: ath9k

At end of day, I always powered down all my test gear by turning off a UPS. Now doing the same morning startup sequence no longer gives me a slow NIC. I'm stumped.
Comment 15 John Greene 2013-10-11 08:18:45 EDT
Frustrating! I hate bugs that hide when to turn lights on.

Check your logs and see if we got anything on the negotiation at all.  Sounds doubtful, but maybe..  If we can't reproduce, it will be hard to do much. 
I checked this driver upstream just now; I don't see any fixes later than 3.11-rc1, which you have. So, barring anything new showing up, there don't appear to be any fixes imminent.
If that is where we are, I'll leave this open a few days for you to reproduce; if we can't do so in a couple of weeks, say, we can close it as CANT REPRODUCE.

Hope you can reproduce..will help you if you do!
Comment 16 Bill Gradwohl 2013-10-11 10:59:01 EDT
"I hate bugs that hide when to turn lights on."

That's a very good line. I like it.

The original problem has been there for a long time. I ignored the speed issue as 100Mbps was fine for most things. Only recently I needed to move several hundred GIG of data back and forth between machines and that literally took days. At 11MBps I did the math and knew I had to get the speed bumped up. That's why I started this thread.

Now I have the 1000Mbps and transfers are going at about 34MBps. Not the 10 times as fast as I was hoping for, but certainly better.
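These numbers can be checked with a little arithmetic. A 100 Mb/s link tops out at 12.5 MB/s before protocol overhead, so 11 MB/s was already close to line rate; a 1000 Mb/s link tops out at 125 MB/s, so 34 MB/s is roughly 3 times the old rate but well short of the link's ceiling. A minimal Python sketch, using a hypothetical 300 GB in place of "several hundred GIG" and decimal units throughout:

```python
def line_rate_mb_per_s(link_mbit_per_s):
    """Raw link rate in MB/s: 8 bits per byte, before Ethernet/IP/TCP overhead."""
    return link_mbit_per_s / 8


def transfer_hours(gigabytes, mb_per_s):
    """Hours to copy `gigabytes` (decimal GB) at a sustained `mb_per_s` (decimal MB/s)."""
    return gigabytes * 1000 / mb_per_s / 3600


if __name__ == "__main__":
    print(line_rate_mb_per_s(100), line_rate_mb_per_s(1000))  # ceilings for the two link speeds
    # 300 GB is a hypothetical stand-in for "several hundred GIG", one direction only.
    for observed in (11, 34):
        print(observed, "MB/s:", round(transfer_hours(300, observed), 1), "hours")
```

Copying "back and forth" multiplies these figures accordingly.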

I've booted this box numerous times to see if I can get it to go back to 100Mbps but no such (bad) luck. How turning the NIC off and back on again fixed the issue is a mystery, and frankly I don't believe that did it. 

This box has one other issue I'm investigating but I don't think it has anything to do with the speed. In the morning, when I power up, the first boot gives me the Fedora balloon and I never get a login screen. I have to go to a terminal, login and issue a shutdown -r now. The second boot works.

This problem is accelerating. This used to happen once a week, now it's every day. dmesg comparisons of the boots don't highlight a difference. I'm still looking into it.
Comment 17 John Greene 2013-10-14 11:34:43 EDT
(In reply to Bill Gradwohl from comment #16)
> "I hate bugs that hide when to turn lights on."
> 
> That's a very good line. I like it.

Thanks, but sticking with day job a while longer..

> 
> The original problem has been there for a long time. I ignored the speed
> issue as 100Mbps was fine for most things. Only recently I needed to move
> several hundred GIG of data back and forth between machines and that
> literally took days. At 11MBps I did the math and knew I had to get the
> speed bumped up. That's why I started this thread.
> 
> Now I have the 1000Mbps and transfers are going at about 34MBps. Not the 10
> times as fast as I was hoping for, but certainly better.
> 
> I've booted this box numerous times to see if I can get it to go back to
> 100Mbps but no such (bad) luck. How turning the NIC off and back on again
> fixed the issue is a mystery, and frankly I don't believe that did it. 
> 
Many times this is par: throughput seldom reaches the media speed. It depends on a lot of things: CPU, memory, the tool used to copy the data..

I looked up your switch; it has no firmware upgrade capability. Price was good I bet though.. lol

The switch manual says it supports the 802.3az (Energy Efficient Ethernet) DRAFT, but the chip hardware supports the "latest standard". I wonder if there might be a mismatch?

You might try booting with 'pcie_aspm=off' added to the kernel command line and see if this helps. Beyond that..we need to reproduce and get a log to continue or await a fix upstream.


> This box has one other issue I'm investigating but I don't think it has
> anything to do with the speed. In the morning, when I power up, the first
> boot gives me the Fedora balloon and I never get a login screen. I have to
> go to a terminal, login and issue a shutdown -r now. The second boot works.
> 
> This problem is accelerating. This used to happen once a week, now it's every
> day. dmesg comparisons of the boots don't highlight a difference. I'm still
> looking into it.

Sounds like a real issue. Please put this into a new bug; it helps us keep things straight, if you don't mind. I suggest you add the logs you have, try disabling wifi and/or bluetooth (if you have it), and see if you can correlate that much with the boot issue. Please post those results in the new BZ. I'm not sure it's a wifi issue; it's just a tip to help you get started. Certainly Red Hat will help out there too. As you enter the new BZ, you'll be presented with similar issues already out there; maybe it's fixed.

Let me know about the pcie_aspm=off (it goes on the kernel command line in the grub file). For now, though, let's close this as insufficient data. Feel free to reopen if you can reproduce the low speed issue!
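On Fedora 19 the usual place for a persistent kernel parameter is the GRUB_CMDLINE_LINUX line in /etc/default/grub, after which grub.cfg is regenerated (on BIOS systems, grub2-mkconfig -o /boot/grub2/grub.cfg); after a reboot, /proc/cmdline should show pcie_aspm=off. A minimal Python sketch of the edit and the check, assuming that file layout; the helper names and the sample file contents are hypothetical.

```python
import re


def add_kernel_param(default_grub, param):
    """Append `param` to GRUB_CMDLINE_LINUX="..." in /etc/default/grub text (idempotent)."""
    def _append(match):
        args = match.group(1).split()
        if param not in args:
            args.append(param)
        return 'GRUB_CMDLINE_LINUX="' + " ".join(args) + '"'

    return re.sub(r'^GRUB_CMDLINE_LINUX="([^"]*)"', _append, default_grub,
                  count=1, flags=re.MULTILINE)


def has_kernel_param(cmdline, param):
    """True if /proc/cmdline contains `param` as a whole token."""
    return param in cmdline.split()


if __name__ == "__main__":
    # Hypothetical /etc/default/grub excerpt; on a real system read the file as root,
    # write the result back, then run: grub2-mkconfig -o /boot/grub2/grub.cfg
    sample = 'GRUB_TIMEOUT=5\nGRUB_CMDLINE_LINUX="rhgb quiet"\n'
    print(add_kernel_param(sample, "pcie_aspm=off"))
```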
