Bug 819440 - [Intel FCoE] got "unknow" speed from /sys/class/fc_host/host7/speed
Status: CLOSED DUPLICATE of bug 880471
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.3
Hardware: All
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: rc
Target Release: ---
Assigned To: Neil Horman
QA Contact: Gris Ge
Keywords: Reopened
Depends On:
Blocks:
Reported: 2012-05-07 05:19 EDT by Gris Ge
Modified: 2015-10-12 22:17 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-01-14 14:19:43 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Description Gris Ge 2012-05-07 05:19:56 EDT
Description of problem:

Intel software FCoE does not report the correct FC speed in "/sys/class/fc_host/host7/speed", which causes "fcoeadm -i" to show an "Unknown" link speed.

Version-Release number of selected component (if applicable):
kernel -269

How reproducible:
100%

Steps to Reproduce:
1. Set up Intel Soft FCoE:
===
service fcoe start
service lldpad start
ifconfig eth0 up
dcbtool sc eth0 dcb on
dcbtool sc eth0 pfc e:1 a:1 w:1
dcbtool sc eth0 app:fcoe e:1 a:1 w:1
===

2. Check speed of FCoE host by command:
===
fcoeadm -i
===
  
Actual results:
Speed:             Unknown

Expected results:
Speed:             10 Gbit

Additional info:
Comment 1 Neil Horman 2012-05-10 15:32:38 EDT
A few questions:

1) Do you have the /var/log/messages file from the system in question after you ran the above reproducer?  I'd like to make sure no prior errors occurred when starting fcoe.

2) What hardware were you using as your fcoe NIC?  What the link speed gets set to is heavily dependent on what ethtool returns in terms of its speed support.
Comment 2 Neil Horman 2012-05-10 15:33:48 EDT
Also, what does ethtool eth0 indicate regarding the negotiated speed of this link?  In fact, if you could provide system info for me to poke about on this system, that would be great.
Comment 3 Gris Ge 2012-05-10 22:37:50 EDT
It's Intel soft FCoE over ixgbe.

Here is the output you asked for:
====
[root@storageqe-13 ~]# ethtool -i eth1
driver: ixgbe
version: 3.6.7-k
firmware-version: 0x18f10001
bus-info: 0000:07:00.1
[root@storageqe-13 ~]# ethtool  eth1
Settings for eth1:
        Supported ports: [ FIBRE ]
        Supported link modes:   1000baseT/Full
                                10000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  1000baseT/Full
                                10000baseT/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Speed: 10000Mb/s
        Duplex: Full
        Port: FIBRE
        PHYAD: 0
        Transceiver: external
        Auto-negotiation: on
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000007 (7)
        Link detected: yes
[root@storageqe-13 ~]# cat /sys/class/fc_host/host7/speed
unknown
[root@storageqe-13 ~]# cat /sys/class/fc_host/host7/supported_speeds
1 Gbit, 10 Gbit
====

No issue found in /var/log/messages.
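
The mismatch in the output above (ethtool reports 10000Mb/s while the fc_host speed file reads "unknown") can be checked mechanically. The following is a minimal sketch, assuming the "Speed: <N>Mb/s" line format printed by ethtool above and the "<N> Gbit" format shown by the sysfs file; the function names are hypothetical and not part of any existing tool.

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool <iface>` output on stdin and print the
# negotiated speed in Mb/s. Prints nothing if no "Speed: <N>Mb/s" line exists.
ethtool_speed_mbps() {
    sed -n 's/^[[:space:]]*Speed:[[:space:]]*\([0-9][0-9]*\)Mb\/s.*$/\1/p' | head -n 1
}

# Convert Mb/s into the "<N> Gbit" form seen in fc_host/speed in this report.
# Assumption: only whole-gigabit speeds are of interest; anything else maps
# to "unknown".
mbps_to_fc_speed() {
    mbps=$1
    case $mbps in
        ''|*[!0-9]*) echo "unknown"; return 0 ;;
    esac
    if [ "$mbps" -ge 1000 ] && [ $((mbps % 1000)) -eq 0 ]; then
        echo "$((mbps / 1000)) Gbit"
    else
        echo "unknown"
    fi
}

# speeds_agree <file-with-ethtool-output> <fc_host-speed-file>
# Succeeds only if ethtool reported a usable speed and the fc_host file matches.
speeds_agree() {
    expected=$(mbps_to_fc_speed "$(ethtool_speed_mbps < "$1")")
    actual=$(cat "$2")
    [ "$expected" != "unknown" ] && [ "$expected" = "$actual" ]
}
```

For example, after saving `ethtool eth1` output to a file, `speeds_agree ethtool.txt /sys/class/fc_host/host7/speed` would fail on the system shown here.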
Comment 5 Neil Horman 2012-05-11 13:47:04 EDT
Thanks.

Unfortunately, this seems to be working fine with the -270 kernel. 

I'm on storageqe-13, and have fcoe interfaces created, which are showing me the proper speed:

==========================================================================
[root@storageqe-13 fc_host]# ethtool -i eth1
driver: ixgbe
version: 3.6.7-k
firmware-version: 0x18f10001
bus-info: 0000:07:00.1

[root@storageqe-13 fc_host]# ethtool eth1
Settings for eth1:
        Supported ports: [ FIBRE ]
        Supported link modes:   1000baseT/Full 
                                10000baseT/Full 
        Supports auto-negotiation: Yes
        Advertised link modes:  1000baseT/Full 
                                10000baseT/Full 
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Speed: 10000Mb/s
        Duplex: Full
        Port: FIBRE
        PHYAD: 0
        Transceiver: external
        Auto-negotiation: on
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000007 (7)
        Link detected: yes

[root@storageqe-13 fc_host]# fcoeadm -i
    Description:      82599EB 10-Gigabit SFI/SFP+ Network Connection
    Revision:         01
    Manufacturer:     Intel Corporation
    Serial Number:    001B21591234
    Driver:           ixgbe 3.6.7-k
    Number of Ports:  1

        Symbolic Name:     fcoe v0.1 over eth0.802-fcoe
        OS Device Name:    host5
        Node Name:         0x1000001B21591236
        Port Name:         0x2000001B21591236
        FabricName:        0x2322000573B27F01
        Speed:             10 Gbit               <========== HERE
        Supported Speed:   1 Gbit, 10 Gbit
        MaxFrameSize:      2112
        FC-ID (Port ID):   0x750005
        State:             Online

        Symbolic Name:     fcoe v0.1 over eth1.802-fcoe
        OS Device Name:    host6
        Node Name:         0x1000001B21591237
        Port Name:         0x2000001B21591237
        FabricName:        0x2322000573B27F01
        Speed:             10 Gbit                 <========== HERE
        Supported Speed:   1 Gbit, 10 Gbit 
        MaxFrameSize:      2112
        FC-ID (Port ID):   0x750004
        State:             Online

=====================================================================


I repeated the setup with eth0, just to be sure, and received the same results.  Interestingly, I did note that only fc_host5 and fc_host6 appeared, so I'm not sure if there's some discrepancy there, but currently on the -270 kernel, this seems to be working.
Comment 6 Gris Ge 2012-05-13 21:51:29 EDT
I reinstalled the OS, set up fcoe from scratch, and found that the first FC host gets an unknown speed. Please check the output below.

After destroying fcoe and setting it up again, everything works correctly.

===============
[root@storageqe-13 ~]# fcoeadm -i
    Description:      82599EB 10-Gigabit SFI/SFP+ Network Connection
    Revision:         01
    Manufacturer:     Intel Corporation
    Serial Number:    001B21591234
    Driver:           ixgbe 3.6.7-k
    Number of Ports:  1

        Symbolic Name:     fcoe v0.1 over eth0.802-fcoe
        OS Device Name:    host5
        Node Name:         0x1000001B21591236
        Port Name:         0x2000001B21591236
        FabricName:        0x2322000573B27F01
        Speed:             Unknown
        Supported Speed:   1 Gbit, 10 Gbit
        MaxFrameSize:      2112
        FC-ID (Port ID):   0x750005
        State:             Online

        Symbolic Name:     fcoe v0.1 over eth1.802-fcoe
        OS Device Name:    host6
        Node Name:         0x1000001B21591237
        Port Name:         0x2000001B21591237
        FabricName:        0x2322000573B27F01
        Speed:             10 Gbit
        Supported Speed:   1 Gbit, 10 Gbit
        MaxFrameSize:      2112
        FC-ID (Port ID):   0x750004
        State:             Online

[root@storageqe-13 ~]# uname -a
Linux storageqe-13.rhts.eng.bos.redhat.com 2.6.32-270.el6.x86_64 #1 SMP Tue May 8 21:44:43 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
[root@storageqe-13 ~]# find /sys/class/fc_host/host*/speed -exec echo {} \; -exec cat {} \;
/sys/class/fc_host/host5/speed
unknown
/sys/class/fc_host/host6/speed
10 Gbit
===============
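
The find one-liner above prints every host. To list only the hosts whose speed is stuck at "unknown", something like the following could be used. This is a sketch only: the function name is hypothetical, and the directory argument exists so the check can be pointed at a copy of the sysfs tree (it defaults to /sys/class/fc_host).

```shell
#!/bin/sh
# list_unknown_speed_hosts [fc_host_dir]
# Prints the name (for example "host5") of every FC host whose speed
# attribute currently reads "unknown".
list_unknown_speed_hosts() {
    dir=${1:-/sys/class/fc_host}
    for f in "$dir"/host*/speed; do
        # Skip the unexpanded glob when no host directories exist.
        [ -r "$f" ] || continue
        speed=$(cat "$f")
        case $speed in
            [Uu]nknown) basename "$(dirname "$f")" ;;
        esac
    done
}
```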
Comment 7 Neil Horman 2012-05-14 06:51:11 EDT
Well, you'll have to set it up again, because I just hopped on to your system, and it's working fine (although someone moved it back to the 6.2.z kernel):


====================================================================


[root@storageqe-13 ~]# uname -a
Linux storageqe-13.rhts.eng.bos.redhat.com 2.6.32-220.17.1.el6.x86_64 #1 SMP Thu Apr 26 13:37:13 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
[root@storageqe-13 ~]# ethtool eth1
Settings for eth1:
        Supported ports: [ FIBRE ]
        Supported link modes:   1000baseT/Full
                                10000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  1000baseT/Full
                                10000baseT/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Speed: 10000Mb/s
        Duplex: Full
        Port: FIBRE
        PHYAD: 0
        Transceiver: external
        Auto-negotiation: on
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000007 (7)
        Link detected: yes
[root@storageqe-13 ~]# fcoeadm -i
    Description:      82599EB 10-Gigabit SFI/SFP+ Network Connection
    Revision:         01
    Manufacturer:     Intel Corporation
    Serial Number:    001B21591234
    Driver:           ixgbe 3.4.8-k
    Number of Ports:  1

        Symbolic Name:     fcoe v0.1 over eth0.802-fcoe
        OS Device Name:    host5
        Node Name:         0x1000001B21591236
        Port Name:         0x2000001B21591236
        FabricName:        0x0000000000000000
        Speed:             10 Gbit
        Supported Speed:   1 Gbit, 10 Gbit
        MaxFrameSize:      2112
        FC-ID (Port ID):   0x000000
        State:             Linkdown

        Symbolic Name:     fcoe v0.1 over eth1.802-fcoe
        OS Device Name:    host6
        Node Name:         0x1000001B21591237
        Port Name:         0x2000001B21591237
        FabricName:        0x2322000573B27F01
        Speed:             10 Gbit
        Supported Speed:   1 Gbit, 10 Gbit
        MaxFrameSize:      2112
        FC-ID (Port ID):   0x750004
        State:             Online

[root@storageqe-13 ~]# 
======================================================================

It would seem that whatever you're observing is a transient event.  Is it possible that you're just not waiting for link speed to finish getting negotiated on these links?
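
One way to tell a transient "unknown" from a stuck one is to poll the speed attribute for a bounded time after setup. The following is a minimal sketch, not an existing utility: the function name and its defaults (30 seconds, 1-second interval) are assumptions, and both numbers must be whole seconds.

```shell
#!/bin/sh
# wait_for_known_speed <speed-file> [timeout-seconds] [interval-seconds]
# Polls the file until it contains something other than "unknown" (or nothing).
# Prints the speed and returns 0 on success; returns 1 if the timeout expires.
wait_for_known_speed() {
    file=$1
    timeout=${2:-30}
    interval=${3:-1}
    elapsed=0
    while :; do
        speed=$(cat "$file" 2>/dev/null)
        case $speed in
            ''|[Uu]nknown) ;;                  # not settled yet, keep waiting
            *) echo "$speed"; return 0 ;;
        esac
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
}
```

For example, `wait_for_known_speed /sys/class/fc_host/host5/speed 60` would report whether host5 settles within a minute of bringing up the interface.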
Comment 9 Neil Horman 2012-05-17 06:42:44 EDT
ping Gris, response to comment #7 please?
Comment 10 Gris Ge 2012-05-18 02:19:27 EDT
Neil,

Thanks for the info.

It just takes a while for the speed file to update.

Closing as not a bug.
Comment 11 Gris Ge 2012-12-21 01:29:39 EST
Reopening this bug: I waited for 1 hour, but the speed is still unknown:
========
[root@storageqe-13 ~]# fcoeadm -i
    Description:      82599EB 10-Gigabit SFI/SFP+ Network Connection
    Revision:         01
    Manufacturer:     Intel Corporation
    Serial Number:    001B21591234
    Driver:           ixgbe 3.9.15-k
    Number of Ports:  1

        Symbolic Name:     fcoe v0.1 over eth0.802-fcoe
        OS Device Name:    host5
        Node Name:         0x1000001B21591236
        Port Name:         0x2000001B21591236
        FabricName:        0x2322000573B27F01
        Speed:             Unknown
        Supported Speed:   1 Gbit, 10 Gbit
        MaxFrameSize:      2112
        FC-ID (Port ID):   0x750005
        State:             Online

        Symbolic Name:     fcoe v0.1 over eth1.802-fcoe
        OS Device Name:    host6
        Node Name:         0x1000001B21591237
        Port Name:         0x2000001B21591237
        FabricName:        0x2322000573B27F01
        Speed:             10 Gbit
        Supported Speed:   1 Gbit, 10 Gbit
        MaxFrameSize:      2112
        FC-ID (Port ID):   0x750004
        State:             Online

[root@storageqe-13 ~]# uname -a
Linux storageqe-13.rhts.eng.bos.redhat.com 2.6.32-349.el6.x86_64 #1 SMP Mon Dec 17 15:45:03 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
[root@storageqe-13 ~]# uptime
 01:28:32 up  1:37,  2 users,  load average: 0.00, 0.00, 0.00
[root@storageqe-13 ~]# rpm -q fcoe-utils
fcoe-utils-1.0.24-2.el6.x86_64
========
Comment 12 Neil Horman 2012-12-21 09:18:04 EST
Well, this is a 6.5 bug at this point.

Can I have access to storageqe-13 so that I can poke around with this?

Do you have any updated reproducer notes that I need to be aware of?
Comment 13 Gris Ge 2012-12-22 22:19:12 EST
No new reproducer. Just set up FCoE; about 50% of the time you will get an unknown speed.

Feel free to ssh storageqe-13.rhts.eng.bos.redhat.com with the default password.
Comment 14 Neil Horman 2013-01-02 15:36:32 EST
Was messing with this today, and found that it's already fixed in -350.el6 as part of bz875271.

*** This bug has been marked as a duplicate of bug 875271 ***
Comment 15 Gris Ge 2013-01-04 22:43:04 EST
Still reproducible on kernel -351. Check below:
====
[root@storageqe-13 ~]# fcoeadm -i
    Description:      82599EB 10-Gigabit SFI/SFP+ Network Connection
    Revision:         01
    Manufacturer:     Intel Corporation
    Serial Number:    001B21591234
    Driver:           ixgbe 3.9.15-k
    Number of Ports:  1

        Symbolic Name:     fcoe v0.1 over eth1.802-fcoe
        OS Device Name:    host5
        Node Name:         0x1000001B21591237
        Port Name:         0x2000001B21591237
        FabricName:        0x2322000573B27F01
        Speed:             Unknown
        Supported Speed:   1 Gbit, 10 Gbit
        MaxFrameSize:      2112
        FC-ID (Port ID):   0x750004
        State:             Online

        Symbolic Name:     fcoe v0.1 over eth0.802-fcoe
        OS Device Name:    host6
        Node Name:         0x1000001B21591236
        Port Name:         0x2000001B21591236
        FabricName:        0x2322000573B27F01
        Speed:             Unknown
        Supported Speed:   1 Gbit, 10 Gbit
        MaxFrameSize:      2112
        FC-ID (Port ID):   0x750005
        State:             Online

[root@storageqe-13 ~]# uname -a
Linux storageqe-13.rhts.eng.bos.redhat.com 2.6.32-351.el6.x86_64 #1 SMP Thu Dec 20 16:13:16 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
=====
Comment 16 Neil Horman 2013-01-05 16:46:03 EST
OK, this is ridiculous.  I could reproduce this regularly with -349 on dell-pec6100-01, but not at all with -350, after more than a dozen attempts.  What's the password to storageqe-13?  I know you said that it's the 'standard' one, but none of the standard passwords I know work on that system, nor can I contact it regularly.  Please set it up so that I can access it, and email me the password.
Comment 18 Neil Horman 2013-01-07 16:54:12 EST
hmm, odd, it appears that for host6 (which is on eth0) we're getting notifications of state changes for the physical interface but not the vlan interface on it, which is why the speed isn't getting updated.  investigating....
Comment 19 Neil Horman 2013-01-08 13:52:33 EST
Gris, could you please investigate the storageqe-13 system, or more specifically the fibre switch it's connected to?  All of a sudden I'm unable to connect to any fcoe luns, either on the original kernel or my test kernels.  There is no clear error as to why; we just seem to be lacking fcoe communication to the switch.
Comment 20 Gris Ge 2013-01-08 21:00:14 EST
Fixed by disabling and re-enabling the Cisco switch ports.

It's Bug #880471. Quite annoying, but it seems no one cares.
Comment 21 Neil Horman 2013-01-09 10:03:04 EST
understood, thank you!
Comment 22 Neil Horman 2013-01-09 14:23:59 EST
I'm sorry Gris, could you please reset the switch ports again, it did the same thing.  

I think I'm pretty close to a solution here, I just need to do some testing on it. If you can provide me access and instructions on resetting the switch, I can do it myself as well.
Comment 23 Gris Ge 2013-01-09 23:31:02 EST
Ports have been reset.

Login information has also been emailed to you.
Comment 24 Neil Horman 2013-01-10 09:40:41 EST
Thank you!
Comment 25 Neil Horman 2013-01-11 16:44:58 EST
I'm beginning to think that this problem is entirely caused by the port disable problem you're having in bz 880471.  The root of the problem seems to be that this particular ixgbe adapter reports link_up with an unknown speed.  That really shouldn't happen, and I'm guessing it is due to this link negotiation problem.  It seems that, even when the ports work, there is a delay between the time link up is reported and the time a valid speed is reported.  The NETDEV_CHANGE events which trigger lport link speed updates in the fcoe stack are generated only on link up->down and down->up transitions.  Changes in link speed do not generate this event, so if the speed is unknown at the time we get a link up, we won't be informed of the new speed later.

I have found a minor possible race in the lport link checking code, which I'll be fixing upstream that can cause this problem sometimes, but I would find it really helpful if you could test this out on another switch that doesn't have this problem if at all possible.
Comment 26 Andy Gospodarek 2013-01-14 09:19:07 EST
I would tend to agree with Neil regarding the delay between link-detection and link-speed.  I have not fully checked the source of the data in /sys/class/fc_host/host7/speed, but I think there is a good chance that link can go up for a brief time before the link-speed is known and this is likely why the link-speed is shown as unknown.

It would be interesting to check and see if the speed displayed in /sys/class/fc_host/host7/speed was static data or was something that could be polled with ethtool each time it is checked (by userspace or kernel).
Comment 27 Neil Horman 2013-01-14 09:42:26 EST
Andy, the problem is a bit (though not much) deeper than that.  To answer your question above, the link_speed value exposed in /sys/class/fc_host/hostX/speed is stored in the fc_lport data structure.  The underlying device driver updates the value using whatever mechanisms the hardware provides.  In the problem case here, that would be the software fcoe driver, which tracks the speed of the network card beneath it.

The problem begins with the fact that the sysfs interface (like many sysfs interfaces) simply reads lport->link_speed when we check the sysfs attribute; there is no mechanism to call down to the underlying driver.  The device-agnostic libfc library in the kernel, which maintains this interface, assumes that the hardware driver (in this case the aforementioned soft fcoe driver) will update the speed as needed.  For real hardware fibre channel cards that's not a big deal, as the fibre channel driver gets a physical interrupt when phy status changes.  For software fcoe, it's a bit more complex.  The software fcoe driver relies on netdevice events to determine link status changes (i.e. it registers a netdevice_notifier and listens for NETDEV_UP and NETDEV_CHANGE events).  This works pretty well, but there's an underlying assumption to it: namely, that link speed will be known when link state changes from down to up.

What we've found in Gris' environment here is that this doesn't always appear to be the case.  There are cases in which the ixgbe adapter will get an interrupt indicating that the link is up, and as a result will call netif_carrier_on (triggering a NETDEV_UP|CHANGE event), but the MAC, when queried on reception of the event, will still report an unknown link speed.  Compounding this, when the link speed is finally known (some arbitrary time later), the linkwatch code will not generate another NETDEV_CHANGE event (because the linkwatch RFC really only relates to link up/down status, not speed).  As such, the fcoe driver never knows to update the speed again in fc_lport->link_speed, and the sysfs attribute reads 'unknown' indefinitely.

This is further compounded by the fact that Gris' environment is suffering from another bug, one related to the firmware in the Cisco fcoe switch we are connected to, which causes odd link failures between the two devices.  That makes me suspect that this is a one-off situation, and that we would never see any real discrepancy between link up/down status and link_speed in the wild were it not for this firmware bug.
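
The sequence described above can be illustrated with a toy model. This is not kernel code: it is a shell sketch under the assumption that the cached speed is sampled only when a link-up notification arrives, as the comment describes; the function name and the step syntax are invented for illustration.

```shell
#!/bin/sh
# simulate_cached_speed <step>...
# Each step is one of:
#   up:<speed>     link goes down->up; a notification fires and the speed the
#                  NIC reports at that instant is cached
#   speed:<speed>  the NIC learns its real speed later; no notification fires
# Prints the speed that would be cached after all steps.
simulate_cached_speed() {
    cached="unknown"
    for step in "$@"; do
        case $step in
            up:*)
                cached=${step#up:}    # notifier runs and samples the speed
                ;;
            speed:*)
                :                     # no event, so the cache stays stale
                ;;
            *)
                echo "unknown step: $step" >&2
                return 2
                ;;
        esac
    done
    echo "$cached"
}
```

In this model, `simulate_cached_speed up:unknown speed:10000` prints "unknown", while a later second link-up (`... up:10000`) refreshes the value, which matches the observation that tearing down and re-creating the FCoE interface cleared the symptom.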


So, I'm trying to figure out what to do about this.  My options as I see them are:

1) Modify the libfc code so that on each query of hostX/speed, we poll down to the underlying driver, giving fcoe and other drivers a chance to query the hardware directly.  I'm not fond of this approach, as it seems to only be needed by this one attribute and only for software fcoe, and libfc is supposed to be hardware agnostic

2) Modify the fcoe driver so that, upon detecting an unknown link_speed, we queue some work to periodically check the link_speed until it is known.  This works, and I have it coded up, but it feels rather hackish.

3) Modify the netdev event code so that we send an event on speed changes.  This seems like an ok approach, but I'm sure it will get lots of pushback, especially if hardware isn't 'supposed' to have a lag between link up/down state and valid speed reporting.

4) Determine that the ixgbe hardware shouldn't be behaving this way, get the cisco/intel firmware bug fixed, and carry on, as we'll never see this again.

So, the question really boils down to  - is it legitimate for ixgbe hardware to report that a link is up, but running at an unknown speed?  If the answer is yes, then I think option 2 is the solution (or possibly 3 if we can identify more cards that fit this behavior).  If the answer is no, I'm inclined to close this bug as a dup of 880471, and get that firmware bug fixed.
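
To make option 2 concrete, here is a toy model of the "recheck until known" idea. It is a shell sketch of the control flow only, not the patch mentioned above: the function name, the step syntax, and the notion of a "tick" standing in for one run of the queued work are all assumptions made for illustration.

```shell
#!/bin/sh
# simulate_option2 <step>...
# Steps:
#   up:<speed>     link-up notification; cache the speed the NIC reports now
#   speed:<speed>  the NIC's real speed changes; no notification fires
#   tick           one run of the delayed recheck, if one is pending
# Prints the cached speed after all steps.
simulate_option2() {
    hw="unknown"       # what the NIC would report if queried right now
    cached="unknown"   # what the FC host has stored
    pending=0          # 1 while a recheck is scheduled
    for step in "$@"; do
        case $step in
            up:*)
                hw=${step#up:}
                cached=$hw
                # Schedule rechecks only when link-up arrived without a speed.
                if [ "$cached" = "unknown" ]; then pending=1; else pending=0; fi
                ;;
            speed:*)
                hw=${step#speed:}
                ;;
            tick)
                if [ "$pending" -eq 1 ]; then
                    cached=$hw
                    # Stop rechecking once a real speed has been seen.
                    if [ "$cached" != "unknown" ]; then pending=0; fi
                fi
                ;;
            *)
                echo "unknown step: $step" >&2
                return 2
                ;;
        esac
    done
    echo "$cached"
}
```

Here `simulate_option2 up:unknown tick speed:10000 tick` prints "10000". Note that in this model the recheck runs only while the speed is unknown, so it does not track later speed changes on a link that came up with a valid speed.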
Comment 28 Neil Horman 2013-01-14 14:19:43 EST
So I've spoken with Rob Love and Jesse B. (indirectly) at Intel about this, and given the root cause analysis above, the consensus is that the hardware isn't behaving properly here.  The NIC should never report a positive link-up bit while leaving the speed bits set to their reserved value (all zeros).  The problems seen here all result from that bad behavior.

There's a strong chance that whatever is going on here is also causing some of the other problems that have been filed (like bz 880471).  The consensus between Intel and myself is that the path of least resistance here is to swap the ixgbe NIC in the storageqe-13 system for another ixgbe card and see if the problems clear up (our suspicion is that it will fix this along with all the other port problems).  If you need to, Gris, you're welcome to temporarily use the ixgbe card out of my fcoe system in dell-pec6110-01.lab.bos.redhat.com.

*** This bug has been marked as a duplicate of bug 880471 ***
