Bug 505653

Summary: [RHEL5.4] ixgbe fixups for version 2.0.8-k2 specifically the 82599
Product: Red Hat Enterprise Linux 5 Reporter: Andy Gospodarek <agospoda>
Component: kernel Assignee: Andy Gospodarek <agospoda>
Status: CLOSED ERRATA QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high Docs Contact:
Priority: high    
Version: 5.4 CC: abdulkh, akent, andriusb, aparanja, bugproxy, bzeranski, cward, dzickus, jane.lv, jburke, jesse.brandeburg, jjarvis, john.ronciak, jvillalo, keve.a.gabbert, kzhang, luyu, martin.wilck, mbrodeur, mgahagan, mwagner, noboru.obata.ar, pbog, pbunyan, peterm, peter.p.waskiewicz.jr, rpacheco, scofeldm, syeghiay
Target Milestone: rc Keywords: FutureFeature, HardwareEnablement, OtherQA
Target Release: 5.4   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of: 472547 Environment:
Last Closed: 2009-09-02 08:15:00 UTC Type: ---
Bug Depends On:    
Bug Blocks: 472547, 510435, 511206    

Comment 1 Andy Gospodarek 2009-06-12 19:13:10 UTC
A new bug was needed to address any final ixgbe issues for RHEL 5.4.  My latest test kernels -- located here:

http://people.redhat.com/agospoda/#rhel5

contain what I hope is the final patch for ixgbe for RHEL 5.4.  I have verified that the problems we know about have been resolved, so if anyone cc'd on this bug is aware of any problems, please test the above kernels first and report back here with your results.

Thanks!

Comment 2 Andy Gospodarek 2009-06-12 19:19:19 UTC
Latest update from Intel (copied from bug 472547):

(In reply to comment #57)
> PJ, my test kernels have been updated with the code that we plan to ship for
> 5.4.  Could you or someone else help verify them?  We realize that, based on
> the list in comment #45, there are still some outstanding issues:
> 1. ethtool test failed 
> but I'm not sure if these have been resolved or if we can live with these or
> any additional problems.

The ethtool test failing is fine.  We've never had ethtool test support in
ixgbe until very recently in Dave Miller's net-next-2.6.  So not having it here
is not a problem.

> 4. unsupported SFP+ detection on Niantic does not seem to work (Intel only
> supports a limited set of SFP modules)

We can let this one go.  The good thing is SFP+ modules are functional at this
point; I'd be worried if supported modules weren't working.

> 5. 82598 LOM (aka SFP+ 82598 LOM - also the Sun Mezz adapter) doesn't get link.

We've retested this and it's not that it won't get link, it just takes a long
time to get link.  It's intermittent though, but link will always come up. 
This is a non-issue at this point.

> Thanks for all the help on this.  

You bet.  We should be in good shape from here.  Your 74 kernel was tested and
given the green light yesterday.

Comment 4 Andy Gospodarek 2009-06-12 19:55:14 UTC
*** Bug 504365 has been marked as a duplicate of this bug. ***

Comment 6 Don Zickus 2009-06-18 14:52:25 UTC
in kernel-2.6.18-154.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.

Comment 8 John Ronciak 2009-06-19 15:56:54 UTC
Testing has started with the driver passing BAT.  It is missing the following device ID that is in the upstream driver: 10FC - XAUI_LOM.

Stress tests were started and we'll have results later today when they finish.

Comment 9 John Ronciak 2009-06-19 23:02:21 UTC
Stress tests have also passed.  One other issue was seen: FC is disabled by default, but it should be enabled by default to match the upstream driver.

Comment 10 Andy Gospodarek 2009-06-22 14:52:55 UTC
I guess I'm not too surprised that the XAUI_LOM isn't supported, given that it was added upstream pretty recently (with the set of patches that updated the driver to version 2.0.24-k2) here:

commit 1fcf03e65650ed888543d33b018bec8dcd95c8e2
Author: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr>
Date:   Sun May 17 20:58:04 2009 +0000

    ixgbe: Add generic XAUI support to 82599


It is odd that FC is disabled by default.  2.6.18-154 contained this hunk that I thought would have addressed this:

@@ -2848,7 +2893,7 @@ static int __devinit ixgbe_sw_init(struct ixgbe_adapter *adapter)
                adapter->max_msix_q_vectors = MAX_MSIX_Q_VECTORS_82599;
 
        /* default flow control settings */
-       hw->fc.requested_mode = ixgbe_fc_none;
+       hw->fc.requested_mode = ixgbe_fc_full;
        hw->fc.high_water = IXGBE_DEFAULT_FCRTH;
        hw->fc.low_water = IXGBE_DEFAULT_FCRTL;
        hw->fc.pause_time = IXGBE_DEFAULT_FCPAUSE;

Comment 11 John Ronciak 2009-06-22 22:20:32 UTC
For the FC problem, we can't get FC to be enabled ever.  So something is really wrong with it.  PJ was wondering if the driver is missing all the fc.requested_mode vs. fc.current_mode pieces.  It was a pretty large change that was submitted for this.  Let us know if you can't find the patch and we'll locate it.  In the meantime, we will check the driver for it.
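
The requested_mode versus current_mode split mentioned above can be sketched roughly as follows. This is a simplified illustration, not the ixgbe code: the enum, the struct, and the fc_init/fc_apply helpers are hypothetical stand-ins. The idea is that requested_mode records what the driver default or the administrator asked for, while current_mode records what has actually been applied to the link. If the apply step never runs, current_mode stays at "none" regardless of the request, which is one way (among others) that a symptom like "FC can never be enabled" could arise.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; not the real ixgbe definitions. */
enum fc_mode { FC_NONE = 0, FC_RX_PAUSE, FC_TX_PAUSE, FC_FULL };

struct fc_info {
    enum fc_mode requested_mode; /* what the default or the admin asks for */
    enum fc_mode current_mode;   /* what has actually been applied */
};

/* Driver-style defaults: request full flow control, nothing applied yet. */
void fc_init(struct fc_info *fc)
{
    fc->requested_mode = FC_FULL;
    fc->current_mode = FC_NONE;
}

/*
 * Apply the request once the link is up. In this toy model the request
 * takes effect only if the link partner supports pause frames; otherwise
 * flow control stays off.
 */
void fc_apply(struct fc_info *fc, bool partner_supports_pause)
{
    fc->current_mode = partner_supports_pause ? fc->requested_mode : FC_NONE;
}
```

In this model, calling fc_init() alone leaves current_mode at FC_NONE even though requested_mode is FC_FULL; only fc_apply() brings the two into agreement.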

Comment 12 Andy Gospodarek 2009-06-30 18:44:45 UTC
John, I've looked around at the driver and the intent was that we included all patches up through the one that created version 2.0.8-k2 of the driver.  We then pulled the following patches:

        commit 8be0e4671d6355b2d905cb8fd051393b2cbf9510
        Author: PJ Waskiewicz <peter.p.waskiewicz.jr>
        Date:   Tue Mar 31 21:34:05 2009 +0000
    
            ixgbe: Fix 82598 MSI-X allocation on systems with more than 8 CPU 

        commit cd7664f69fe1f3f75b664503ae3e11a2971a4865
        Author: Don Skidmore <donald.c.skidmore>
        Date:   Tue Mar 31 21:33:44 2009 +0000
    
            ixgbe: feature - driver to default with FC on.
  
        commit aa5aec888585fedcda7cfffc20f75240ad1cb42d
        Author: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr>
        Date:   Tue May 19 09:18:34 2009 +0000
    
            ixgbe: Add semaphore access for PHY initialization for 82599
    
        commit 1479ad4fbfbc801898dce1ac2d4d44f0c774ecc5
        Author: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr>
        Date:   Thu Jun 4 11:10:17 2009 +0000
    
            ixgbe: Change the 82599 PHY DSP restart logic

based on recommendations.  One problem that I noticed is that there is some flow control initialization inside '#ifdef CONFIG_DCB' blocks.  Is that a cause for concern?

Can you elaborate on "we can't get FC to be enabled ever"?  Do you mean that flow control cannot be enabled properly via ethtool, or that flow control appears to be enabled but pause frames are never emitted when they should be?

Comment 13 John Ronciak 2009-07-01 17:29:32 UTC
We are looking at the CONFIG_DCB blocks for the FC things.  

According to our testers FC cannot be configured to work at all.  We use ethtool to configure it but it still isn't being configured.

Comment 14 Andy Gospodarek 2009-07-02 19:07:00 UTC
Thanks, John.  Any hints you guys have would obviously be helpful.

Comment 15 Martin Wilck 2009-07-03 09:15:33 UTC
I am sorry for this stupid question - our Intel contact person just told me that we need support for PCI ID 10F8. That ID is not in the EL5.4 beta driver. Do you know anything about it?

Comment 16 PJ Waskiewicz 2009-07-03 18:24:34 UTC
(In reply to comment #15)
> I am sorry for this stupid question - our Intel contact person just told me
> that we need support for PCI ID 10F8. That ID is not in the EL5.4 beta driver.
> Do you know anything about it?  

0x10f8 is the EEPROM-less device ID for 82599.  I just removed it from the upstream driver, since 82599 needs an EEPROM to function correctly.  Therefore we don't want the device coming online without an EEPROM and only half-working.  In other words, we do not want 0x10f8 in this driver.

Comment 17 Chris Ward 2009-07-03 18:45:51 UTC
~~ Attention - RHEL 5.4 Beta Released! ~~

RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner!

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value.

Questions can be posted to this bug or your customer or partner representative.

Comment 18 Chris Ward 2009-07-10 19:14:21 UTC
~~ Attention Partners - RHEL 5.4 Snapshot 1 Released! ~~

RHEL 5.4 Snapshot 1 has been released on partners.redhat.com. If you have already reported your test results, you can safely ignore this request. Otherwise, please notice that there should be a fix available now that addresses this particular request. Please test and report back your results here, at your earliest convenience. The RHEL 5.4 exception freeze is quickly approaching.

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Do not flip the bug status to VERIFIED. Instead, please set your Partner ID in the Verified field above if you have successfully verified the resolution of this issue. 

Further questions can be directed to your Red Hat Partner Manager or other appropriate customer representative.

Comment 19 Martin Wilck 2009-07-13 09:05:00 UTC
(In reply to comment #16)

>  In other words, we do not want 0x10f8 in this driver.  

We need support for PCI ID 8086:10f8. Intel has told us officially that this will be the PCI ID for our on-board Niantic chips (see below). Unfortunately I couldn't tell you earlier because Intel had not told us.

*Please re-add 8086:10f8!*

Apart from the PCI ID, is there any extra code that is needed for this device?


From: Fodor, Zoltan [zoltan.fodor] 
Sent: Monday, July 13, 2009 10:47 AM
To: Uhe, Rudolf; Twito, Ofer
Subject: RE: PY Eth Mezz Card 10Gb 2 Port V2 (Niantic)

Hi Rudi,

I can OFFICIALLY confirm that 0x10F8 will be the Niantic KR Device ID.

Thank you,
Zoli

Comment 20 Andy Gospodarek 2009-07-13 13:36:08 UTC
(In reply to comment #19)
> (In reply to comment #16)
> 
> >  In other words, we do not want 0x10f8 in this driver.  
> 
> We need support for PCI ID 8086:10f8. Intel has told us officially that this
> will be the PCI ID for our on-board Niantic chips (see below). Unfortunately I
> haven't been able to tell earlier because we haven't been told so by Intel.
> 
> *Please re-add 8086:10f8!*
> 
> Apart from the PCI ID, is there any extra code that is needed for this device?
> 
> 
> From: Fodor, Zoltan [zoltan.fodor] 
> Sent: Monday, July 13, 2009 10:47 AM
> To: Uhe, Rudolf; Twito, Ofer
> Subject: RE: PY Eth Mezz Card 10Gb 2 Port V2 (Niantic)
> 
> Hi Rudi,
> 
> I can OFFICIALLY confirm that 0x10F8 will be the Niantic KR Device ID.
> 
> Thank you,
> Zoli  

Martin, the latest kernels in RHEL and those from Intel upstream do not support 0x10F8, so we have no way to add support for this right now.

Is there any way that Zoli meant 0x10FB instead?

Comment 21 John Ronciak 2009-07-13 18:25:52 UTC
The 0x10F8 device ID is not supported yet.  It is planned for release in WW39 (Sept. 25).  This has to do with the testing of the boards, basically validation.  So this ID should not be supported in RHEL5.4, as it misses the code cut-off by a long way.
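
For readers wondering why a missing device ID matters so much: a PCI driver is bound only to devices whose vendor/device pair appears in the driver's ID table, so a board reporting an unlisted ID is simply never claimed. A rough sketch of that lookup follows. The table contents and the names pci_id, supported_ids, and id_is_supported are illustrative assumptions, not the real ixgbe table; 0x10FB is listed only because comment 20 suggests it is an ID the driver already handles.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct pci_id {
    uint16_t vendor;
    uint16_t device;
};

/* Illustrative table only; a real driver's table is longer. */
static const struct pci_id supported_ids[] = {
    { 0x8086, 0x10FB }, /* assumed to be supported, per comment 20 */
};

/* Return true if the vendor/device pair is listed in the table. */
bool id_is_supported(uint16_t vendor, uint16_t device)
{
    size_t i;

    for (i = 0; i < sizeof(supported_ids) / sizeof(supported_ids[0]); i++) {
        if (supported_ids[i].vendor == vendor &&
            supported_ids[i].device == device)
            return true;
    }
    return false;
}
```

With this table, id_is_supported(0x8086, 0x10F8) is false, so a device reporting 8086:10f8 would not be claimed until an entry for it (and any code it needs) is added.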

Comment 22 Ronald Pacheco 2009-07-13 18:58:07 UTC
Martin,

Please open up a new feature request for RHEL 5.5.

Comment 23 Martin Wilck 2009-07-14 08:25:18 UTC
> Is there any way that Zoli meant 0x10FB instead?

Unfortunately, no.

Comment 24 Martin Wilck 2009-07-14 08:36:05 UTC
Opened bug #511206 for tracking 8086:10f8.

Comment 25 Zhang Kexin 2009-07-24 07:08:25 UTC
Hi Andy,

How is it going with this bug? From comment #13, Intel is investigating an FC issue. If this issue cannot be solved, would it affect shipping all of the code? For now, can I say that the bug has been tested by Intel and the result is acceptable except for the FC issue?

thanks.

Comment 26 Andy Gospodarek 2009-07-27 14:53:54 UTC
The FC issue has not been resolved, but we plan to ship the new update without the FC fix if we do not have one.

Comment 27 Zhang Kexin 2009-07-27 15:40:53 UTC
Tested by partner; we only did a code review. The patch is in.

Comment 28 John Ronciak 2009-07-27 21:31:40 UTC
For #26, are you looking for something from Intel on this?  I believe upstream is working correctly.

Comment 29 Andy Gospodarek 2009-07-28 01:01:02 UTC
John, if you guys are able to help me figure out if there is a register that isn't programmed correctly or something it would be helpful.  Fixing this is not the hottest issue I have, so it will be a little while before I get to it.  That doesn't mean I won't get to it if you don't, but help is always appreciated. :-)

Comment 30 Chris Ward 2009-08-03 15:47:09 UTC
~~ Attention Partners - RHEL 5.4 Snapshot 5 Released! ~~

RHEL 5.4 Snapshot 5 is the FINAL snapshot to be released before RC. It has been
released on partners.redhat.com. If you have already reported your test results, 
you can safely ignore this request. Otherwise, please notice that there should be 
a fix available now that addresses this particular issue. Please test and report 
back your results here, at your earliest convenience.

If you encounter any issues while testing Beta, please describe the 
issues you have encountered and set the bug into NEED_INFO. If you 
encounter new issues, please clone this bug to open a new issue and 
request it be reviewed for inclusion in RHEL 5.4 or a later update, if it 
is not of urgent severity. If it is urgent, escalate the issue to your partner manager as soon as possible. There is /very/ little time left to get additional code into 5.4 before GA.

Partners, after you have verified, do not flip the bug status to VERIFIED. Instead, please set your Partner ID in the Verified field above if you have successfully verified the resolution of this issue. 

Further questions can be directed to your Red Hat Partner Manager or other 
appropriate customer representative.

Comment 33 errata-xmlrpc 2009-09-02 08:15:00 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html