Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
For bugs related to the Red Hat Enterprise Linux 5 product line. The current stable release is 5.10. For Red Hat Enterprise Linux 6 and above, please visit Red Hat JIRA https://issues.redhat.com/secure/CreateIssue!default.jspa?pid=12332745 to report new issues.

Bug 460349

Summary: igb network driver takes too much time to establish link status
Product: Red Hat Enterprise Linux 5 Reporter: Flavio Leitner <fleitner>
Component: kernel Assignee: Andy Gospodarek <agospoda>
Status: CLOSED DUPLICATE QA Contact: Martin Jenner <mjenner>
Severity: high Docs Contact:
Priority: high    
Version: 5.2 CC: agibson2, agospoda, alexander.h.duyck, andriusb, dmair, james.brown, jesse.brandeburg, peterm, ranshalit, rdoty, rpacheco, tao
Target Milestone: rc Keywords: OtherQA, Regression
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2008-11-10 18:00:59 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 391501, 441885    
Attachments:
igb-52-fix.patch (Flags: none)

Description Flavio Leitner 2008-08-27 17:49:00 UTC
Description of problem:

The igb network driver takes too long to establish link status. This happens
when the module is reloaded or during installation, and as a result the system
fails to acquire a DHCP address.

Below are some messages from console during installation:
...
ADDRCONF(NETDEV_UP): eth0: link is not ready
igb 0000:04:00.0 NIC Link is Up 1000Mbps Full Duplex, Flow Control:RX
ADDRCONF(NETDEV_UP): eth0: link is not ready
igb 0000:04:00.0 NIC Link is Up 1000Mbps Full Duplex, Flow Control:RX
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
ADDRCONF(NETDEV_UP): eth0: link is not ready
ADDRCONF(NETDEV_UP): eth0: link is not ready
...


The problem seems to be related to MSI interrupt handling, because passing
'pci=nomsi' to the kernel works around it.
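When comparing test runs it helps to confirm whether the workaround is actually in effect. A minimal sketch, assuming a Linux system where /proc/cmdline holds the running kernel's boot parameters; the helper name has_pci_nomsi is hypothetical. The kernel's pci= parameter accepts a comma-separated option list, so each option is checked separately.

```python
def has_pci_nomsi(cmdline: str) -> bool:
    """Return True if the kernel command line contains pci=nomsi.

    pci= takes a comma-separated option list (e.g. "pci=noaer,nomsi"),
    so every option after "pci=" is checked individually.
    """
    for param in cmdline.split():
        key, sep, value = param.partition("=")
        if key == "pci" and sep and "nomsi" in value.split(","):
            return True
    return False


if __name__ == "__main__":
    try:
        # Boot parameters of the currently running kernel (Linux only).
        with open("/proc/cmdline") as f:
            print("pci=nomsi active:", has_pci_nomsi(f.read()))
    except OSError:
        print("/proc/cmdline not available on this system")
```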

Version-Release number of selected component (if applicable):
RHEL5.2

How reproducible:
Very frequently, but not 100% of the time.

Steps to Reproduce:
Install RHEL 5.2 on a system with an igb device, or reload the module
(anaconda appears to reload the module).

  
Actual results:
The driver takes too long to establish link status, and as a result it fails
to acquire a DHCP address.

Comment 8 Andy Gospodarek 2008-09-08 17:59:34 UTC
Created attachment 316109
igb-52-fix.patch

Good news!  I have a small fix to the 5.2 igb driver that fixes this.  I took a look at what was being done on systems that use MSI-X and found that we were never properly starting the receive queues.

I've been testing this fix for a while and it seems to be exactly what we need.  It isn't a large driver update that happens to make the problem go away; it's an actual fix.

My test kernels will only contain the updated patch for 5.3, so if this patch needs testing on 5.2 (the -92 kernel), someone else will probably need to build it.

Comment 9 Andy Gospodarek 2008-09-09 03:12:10 UTC
My test kernels have been updated to include a patch for this bugzilla.

http://people.redhat.com/agospoda/#rhel5

Please test them and report back your results.  Without immediate
feedback there is a good chance this or any other fix for this driver
will not be included in the upcoming update.

Comment 10 Jesse Brandeburg 2008-09-11 17:31:42 UTC
I examined this issue, and it is one caused by Red Hat having to backport kernel.org drivers to RHEL5.  I'm not sure whether we should start tracking this kind of issue resolution separately.

I concur that Andy's patch appears correct.  Booting with pci=nomsi caused the driver to only use 1 tx and rx queue due to only having one interrupt, which caused things to (mostly) work.
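The explanation above can be illustrated with a toy model of how the interrupt mode determines the queue count. This is a hypothetical sketch, not the igb driver's code: the names QueueConfig and choose_queues and the max_queues parameter are assumptions. It only shows why pci=nomsi sidesteps the MSI-X receive-queue bug: with a single interrupt the driver falls back to one TX and one RX queue, so the multi-queue MSI-X path is never taken.

```python
from dataclasses import dataclass


@dataclass
class QueueConfig:
    interrupt_mode: str
    num_tx_queues: int
    num_rx_queues: int


def choose_queues(msix_available: bool, max_queues: int) -> QueueConfig:
    """Toy model (hypothetical): pick queue counts from the interrupt mode.

    With MSI-X, each queue can have its own vector, so several queues are used.
    Without MSI/MSI-X (e.g. booting with pci=nomsi) there is a single
    interrupt, so the model falls back to one TX and one RX queue.
    """
    if msix_available:
        return QueueConfig("MSI-X", max_queues, max_queues)
    return QueueConfig("single legacy interrupt", 1, 1)


print(choose_queues(True, 4))
print(choose_queues(False, 4))
```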

Comment 16 Issue Tracker 2008-10-13 16:57:48 UTC
File uploaded: ebet-32.png

This event sent from IssueTracker by jwest 
 issue 189601
it_file 163951

Comment 17 Issue Tracker 2008-10-13 16:57:54 UTC
File uploaded: rh52-64.png

This event sent from IssueTracker by jwest 
 issue 189601
it_file 163952

Comment 20 Andy Gospodarek 2008-11-10 18:00:59 UTC
The fix in comment #8 was already included in the big igb update for 5.3 for bug 436040.

*** This bug has been marked as a duplicate of bug 436040 ***

Comment 21 static 2008-12-23 19:55:56 UTC
Any chance of backporting this fix to 5.2?  I have a similar issue with the igb driver on Dell R200 hardware: no traffic is seen and ethtool reports 'Link detected: no', although the rest of the link status is correct (duplex, etc.).  Running 'ethtool -r eth#' makes 'Link detected' go to 'yes' and the interface starts working, but sometimes when I unplug the cable and plug it back in, the link state gets stuck at 'no' again even though the cable is connected.
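The check and workaround described above can be scripted. A minimal sketch, assuming ethtool is installed and the script runs with enough privileges; the helper names are hypothetical. It parses the 'Link detected:' field from 'ethtool <iface>' output and, if needed, runs 'ethtool -r <iface>' to restart autonegotiation. This only automates the manual workaround; it does not fix the driver.

```python
import subprocess
from typing import Optional


def parse_link_detected(ethtool_output: str) -> Optional[bool]:
    """Return the 'Link detected' value from `ethtool <iface>` output, or None."""
    for line in ethtool_output.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "Link detected":
            return value.strip() == "yes"
    return None  # field not present in the output


def link_detected(iface: str) -> Optional[bool]:
    """Run `ethtool <iface>` and parse the link state (ethtool must be installed)."""
    out = subprocess.run(["ethtool", iface], capture_output=True, text=True).stdout
    return parse_link_detected(out)


def restart_autoneg(iface: str) -> int:
    """Run `ethtool -r <iface>` to restart autonegotiation; returns the exit code."""
    return subprocess.run(["ethtool", "-r", iface]).returncode
```

For example, a periodic check could call restart_autoneg("eth0") whenever link_detected("eth0") returns False.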

I cannot access bug 436040, so I cannot put this comment on that bug report; I am placing it here instead.

Comment 22 static 2008-12-23 20:01:43 UTC
I forgot to mention that pci=nomsi does fix the link detection issue for me, but after reading a comment above saying that things (mostly) work with pci=nomsi, I worry that this workaround might not be a complete fix.

"I concur that Andy's patch appears correct.  booting with pci=nomsi caused the
driver to only use 1 tx and rx queue due to only having one interrupt, which
caused things to (mostly) work."

Comment 23 Andy Gospodarek 2008-12-23 20:30:46 UTC
Adam, with the release of 5.3 pending (in a matter of weeks), I'm not sure this will be able to slip into the 5.2 stream in time to be worth anything.  Will you be able to update to at least the newer kernel when it comes out?

And don't worry, you aren't missing much by not being able to access bug 436040 (seriously).

Comment 24 static 2008-12-23 23:24:53 UTC
Waiting is not such a big deal if 5.3 is expected in weeks.  I guess the workaround will do until I upgrade these machines.  I was just worried that the workaround might not be good enough, based on the comments above.  I am deploying a few firewalls, so networking is pretty important.  Testing so far shows that pci=nomsi is working fine.

Hopefully this question isn't too far off topic, but what are the downsides of using pci=nomsi?  I did a Google search but didn't really turn up anything.  I assume that since all ports share an IRQ there will be more overhead, but I have a feeling it won't amount to much with today's processor speeds.  Any ideas?
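One way to see whether ports actually end up sharing an IRQ after booting with pci=nomsi is to inspect /proc/interrupts. A rough sketch, assuming the older (2.6.18-era) layout "IRQ: <per-CPU counts> <chip type> dev1, dev2"; newer kernels print extra columns, so the parsing is an assumption, and the helper name shared_irqs is hypothetical. The sample data in the usage example is illustrative only.

```python
from typing import Dict, List


def shared_irqs(proc_interrupts: str) -> Dict[str, List[str]]:
    """Map IRQ number -> device names for IRQ lines used by more than one device.

    Assumes the older layout "IRQ: <per-CPU counts> <chip type> dev1, dev2".
    """
    result: Dict[str, List[str]] = {}
    for line in proc_interrupts.splitlines():
        irq, sep, rest = line.partition(":")
        irq = irq.strip()
        if not sep or not irq.isdigit():
            continue  # header row or NMI/LOC/ERR summary rows
        fields = rest.split()
        i = 0
        while i < len(fields) and fields[i].isdigit():
            i += 1  # skip the per-CPU interrupt counters
        devices = " ".join(fields[i + 1:])  # skip the chip type column
        names = [d.strip() for d in devices.split(",") if d.strip()]
        if len(names) > 1:
            result[irq] = names
    return result


sample = """           CPU0       CPU1
  0:    1234567          0    IO-APIC-edge  timer
 16:      54321      12345   IO-APIC-level  eth0, eth1
 58:        100          0   PCI-MSI-edge  eth2
NMI:          0          0
"""
print(shared_irqs(sample))
```

On a live system, the same function can be applied to the contents of /proc/interrupts.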

Comment 25 ranran 2018-04-08 15:08:25 UTC
Hello,

I have similar issue, but I don't have "eth0: link is not ready" messages.
I only see that it takes a long time: 6 seconds in total, 3 seconds from loading the driver until the link is up, and another 3 seconds from link up until ping works.
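To compare timings like these across kernels or driver versions, the time to link up can be measured by polling the interface's carrier state. A minimal sketch, assuming a Linux sysfs that exposes /sys/class/net/<iface>/carrier (reading it fails while the interface is down); the helper names are hypothetical. It covers only the first interval (driver load to link up), not the time until ping succeeds.

```python
import time
from typing import Callable, Optional


def read_carrier(iface: str) -> bool:
    """Return True if /sys/class/net/<iface>/carrier reads "1"; False on any error."""
    try:
        with open(f"/sys/class/net/{iface}/carrier") as f:
            return f.read().strip() == "1"
    except OSError:
        return False  # interface missing or administratively down


def wait_for_carrier(check: Callable[[], bool], timeout: float = 30.0,
                     interval: float = 0.1) -> Optional[float]:
    """Poll `check` until it returns True; return elapsed seconds, or None on timeout."""
    start = time.monotonic()
    while True:
        elapsed = time.monotonic() - start
        if check():
            return elapsed
        if elapsed >= timeout:
            return None
        time.sleep(interval)
```

For example, call wait_for_carrier(lambda: read_carrier("eth0")) immediately after loading the driver and bringing the interface up.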

Is it the same issue (I see it in buildroot)?
Should I try the patch ?

Regards,
ranran