244587 – e1000 introduces latency on 965Q chipset

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Andy Gospodarek
QA Contact:	Martin Jenner
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2007-06-17 19:27 UTC by Rod Nayfield
Modified:	2014-06-29 22:58 UTC
CC List:	2 users
Fixed In Version:	5.1
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-09-10 18:01:41 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
dmidecode (9.28 KB, text/plain) 2007-06-17 19:28 UTC, Rod Nayfield
lsmod (2.56 KB, text/plain) 2007-06-17 19:28 UTC, Rod Nayfield
lspci -v -v (22.99 KB, text/plain) 2007-06-17 19:28 UTC, Rod Nayfield

Description Rod Nayfield 2007-06-17 19:27:11 UTC

Description of problem:

When using a DQ965GF (965Q chipset) motherboard, applications exhibit regular
"stutters" lasting many ms.  The simplest way to see this is to play back video.

Version-Release number of selected component (if applicable):
RHEL5 kernels through 2.6.18-8.1.6
RHEL5-rt kernel kernel-rt-2.6.21-23.el5rt (significantly less pronounced)
F7 kernels through 2.6.21-1.3228.fc7 

How reproducible:
Always

Steps to Reproduce:
1. Boot system
2. Play video, or run glxgears
  
Actual results:
Video stutters once every 5-10 seconds.  

Expected results:
Clean playback like on old system

Additional info:

If you run "service network stop" and "rmmod e1000", the stutter clears up.

Interestingly enough, loading the module again with modprobe and restarting the
network does not seem to bring the issue back.  I have not confirmed this with
extensive testing (it worked once, could be a red herring, etc).  If this is the
case, it might be an issue with driver load order.
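
One rough way to put numbers on the stutter without relying on video playback is to sleep in short steps and log every wake-up that arrives much later than requested. The sketch below is only an illustration: the function names, the 1 ms sleep interval, and the 5 ms reporting threshold are assumptions, not anything taken from this report, and ordinary timer granularity will add some baseline lateness.

```python
import time


def find_stalls(timestamps, interval, threshold):
    """Return (index, overshoot_seconds) for each wake-up that was late.

    timestamps: monotonic clock readings taken once per loop pass.
    A pass counts as a stall when it took longer than interval + threshold.
    """
    stalls = []
    for i in range(1, len(timestamps)):
        overshoot = (timestamps[i] - timestamps[i - 1]) - interval
        if overshoot > threshold:
            stalls.append((i, overshoot))
    return stalls


def sample(duration=30.0, interval=0.001):
    """Sleep in short steps and record when each wake-up actually happened."""
    stamps = [time.monotonic()]
    end = stamps[0] + duration
    while stamps[-1] < end:
        time.sleep(interval)
        stamps.append(time.monotonic())
    return stamps


def report(duration=30.0, interval=0.001, threshold=0.005):
    """Print one line per late wake-up seen during the sampling window."""
    stamps = sample(duration, interval)
    for index, overshoot in find_stalls(stamps, interval, threshold):
        offset = stamps[index] - stamps[0]
        print(f"t={offset:7.3f}s  woke {overshoot * 1000:.1f} ms late")
```

Calling report() once with e1000 loaded and once after rmmod e1000 would give two lists to compare; a cluster of late wake-ups every few seconds in only the first run would be consistent with the symptom described above.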

Comment 1 Rod Nayfield 2007-06-17 19:28:07 UTC

Created attachment 157229
dmidecode

Comment 2 Rod Nayfield 2007-06-17 19:28:30 UTC

Created attachment 157230
lsmod

Comment 3 Rod Nayfield 2007-06-17 19:28:54 UTC

Created attachment 157231
lspci -v -v

Comment 4 Rod Nayfield 2007-06-17 19:30:18 UTC

Attached dmidecode, lsmod, and lspci from the system.  
(Note - it is running F7 kernel right now)

Comment 6 Andy Gospodarek 2007-06-21 20:48:44 UTC

Rod, 

Anything show up in the logs like watchdog timeouts or anything like that?  

What about disabling TSO?  See any differences when turning it off?  I ask mostly
because you are seeing bursty traffic and since TSO does stuff in chunks it
*could* be a culprit.  Normally I'm not sure I'd even suggest that anymore since
TSO seems pretty stable, but the jitter makes me wonder.

You could also try my rhel5 test kernels, but it's unlikely they will make a
difference since f7 seems hosed as well.

http://people.redhat.com/agospoda/#rhel5

One last suggestion...are you running NetworkManager on this system by any
chance?  It looks like there may be some sort of workaround for the 82566 that
could cause some delays, but those delays will really only happen if doing a
bunch of ethtool operations.  NetworkManager might be a good candidate for
someone calling the ethtool ioctl a bunch.  Here's the function I'm talking about:

/******************************************************************************
* Work-around for 82566 Kumeran PCS lock loss:
* On link status change (i.e. PCI reset, speed change) and link is up and
* speed is gigabit-
* 0) if workaround is optionally disabled do nothing
* 1) wait 1ms for Kumeran link to come up
* 2) check Kumeran Diagnostic register PCS lock loss bit
* 3) if not set the link is locked (all is good), otherwise...
* 4) reset the PHY
* 5) repeat up to 10 times
* Note: this is only called for IGP3 copper when speed is 1gb.
*
* hw - struct containing variables accessed by shared code
******************************************************************************/
static int32_t
e1000_kumeran_lock_loss_workaround(struct e1000_hw *hw)
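
To make the possible delay concrete, the steps listed in that comment block can be sketched as a simple retry loop. This is a Python illustration of the comment text only, not the driver's C implementation: read_pcs_lock_lost and reset_phy are hypothetical callbacks standing in for the register read and PHY reset, and the True/False return values are an assumption.

```python
import time


def lock_loss_workaround(read_pcs_lock_lost, reset_phy, enabled=True,
                         max_attempts=10, settle_seconds=0.001,
                         sleep=time.sleep):
    """Follow steps 0-5 of the comment; return True once the link is locked.

    read_pcs_lock_lost and reset_phy are hypothetical stand-ins for the
    hardware accesses; the boolean return value is an assumption.
    """
    if not enabled:                     # step 0: workaround disabled
        return True
    for _ in range(max_attempts):       # step 5: repeat up to 10 times
        sleep(settle_seconds)           # step 1: wait 1 ms for the link
        if not read_pcs_lock_lost():    # steps 2-3: bit clear means locked
            return True
        reset_phy()                     # step 4: lock lost, reset the PHY
    return False                        # never locked within the attempts
```

Read this way, one call that never achieves lock spends at least 10 ms waiting on top of ten PHY resets, which is why repeated triggering of this path is worth ruling out.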

Comment 7 Rod Nayfield 2007-06-22 17:13:44 UTC

NetworkManager is off, and there is hardly any Ethernet traffic: just an idle
ssh session.

Again, the issue is that video is jittery, regularly pausing for a few ms every
few seconds.  Not network traffic.  

Example - I can boot the system and it exhibits the problem with just about no
network traffic whatsoever.  I remove and reinsert the e1000 module and the
problem is gone, even if I am scping 2 gigabyte files to another machine.

This issue can be seen using intel driver for the built in graphics (and via the
pci express ADD2 SDVO port) but it also is found when using a cheap PCI video
card using the nv driver.


Any quick way to monitor pci resets or activity?  (e.g. let's say the issue is
related to the problem that NM tries to workaround in the above function, not
the solution)

Comment 8 Rod Nayfield 2007-06-23 21:17:05 UTC

So I have seen the issue go away without removing the module.  

Interestingly enough, I am thinking that it is related to speed....  here is an
annotated log excerpt:

Jun 23 16:51:47 tv2 kernel: e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
Jun 23 16:51:47 tv2 kernel: e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
**system has come up at gigabit**
**video issues observed**

**service network stop**
Jun 23 16:58:42 tv2 kernel: e1000: eth0: e1000_reset: Hardware Error
**video issues go away**

**service network start**
Jun 23 16:59:13 tv2 kernel: e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
Jun 23 16:59:13 tv2 kernel: e1000: eth0: e1000_watchdog: 10/100 speed: disabling TSO

**Hmm it didn't come up at 1000**
**ethtool -s eth0 speed 1000**
Jun 23 17:03:17 tv2 kernel: e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Jun 23 17:03:20 tv2 kernel: e1000: eth0: e1000_watchdog: NIC Link is Down
Jun 23 17:03:23 tv2 kernel: e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
Jun 23 17:03:23 tv2 kernel: e1000: eth0: e1000_watchdog: 10/100 speed: disabling TSO


Several more times I proceeded to try to set 1000 and the link automatically
went down to 100.
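
For repeated attempts like these, the link transitions can be pulled out of the kernel log mechanically instead of by eye. The following is a minimal sketch, assuming the e1000_watchdog message format seen in the excerpt above; link_events and SAMPLE are names made up for this illustration, and whitespace is flattened first so hard-wrapped log lines still match.

```python
import re

# Matches the e1000_watchdog link messages as they appear in the excerpt.
LINK_RE = re.compile(
    r"e1000: (?P<iface>\w+): e1000_watchdog: NIC Link is "
    r"(?:Up (?P<speed>\d+) Mbps|(?P<down>Down))"
)


def link_events(log_text):
    """Return (interface, speed_mbps) per link message; 0 means link down."""
    # Collapse all whitespace so a message split across lines is rejoined.
    flat = " ".join(log_text.split())
    events = []
    for match in LINK_RE.finditer(flat):
        speed = 0 if match.group("down") else int(match.group("speed"))
        events.append((match.group("iface"), speed))
    return events


# Lines copied from the excerpt, hard-wrapped as a pasted log might be.
SAMPLE = """\
Jun 23 17:03:17 tv2 kernel: e1000: eth0: e1000_watchdog: NIC Link is Up 1000
Mbps Full Duplex, Flow Control: RX/TX
Jun 23 17:03:20 tv2 kernel: e1000: eth0: e1000_watchdog: NIC Link is Down
Jun 23 17:03:23 tv2 kernel: e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps
Full Duplex, Flow Control: RX/TX
"""

print(link_events(SAMPLE))
```

A 1000 entry followed by 0 and then 100 is the pattern described here: the link briefly comes up at gigabit, drops, and renegotiates at 100 Mbps.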

Comment 9 Rod Nayfield 2007-06-23 21:23:58 UTC

Also, removing and re-probing the e1000 module does not change the fact that it
can't do gigabit.

So the problem seems to be that:

e1000 can't get into gigabit mode except at boot
when in gigabit mode from boot, latency is seen

Comment 10 Andy Gospodarek 2007-08-22 20:07:34 UTC

Rod, It seems that your system has TSO disabled when linked up at 100Mbps.  Can
you reproduce the problem and then try 

# ethtool -K eth0 tso off

and see if you still have the extra jitter in your video stream?

You can verify that TSO is off by doing a 

# ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off

Thanks!
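
If the TSO check has to be repeated across reboots, the ethtool -k output can be parsed rather than read by hand. This is a small sketch that assumes the "name: on/off" layout shown above; parse_offloads and tso_enabled are hypothetical helper names, and the hyphen normalization and the tolerance for a trailing note after on/off are assumptions about other ethtool versions.

```python
def parse_offloads(ethtool_output):
    """Map each 'name: on|off' line of ethtool -k output to a bool."""
    settings = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        words = value.split()
        # Skip the header line and anything that is not an on/off setting.
        if sep and words and words[0] in ("on", "off"):
            key = name.strip().replace(" ", "-")
            settings[key] = words[0] == "on"
    return settings


def tso_enabled(ethtool_output):
    """Return True/False for TSO, or None if the line is missing."""
    return parse_offloads(ethtool_output).get("tcp-segmentation-offload")


# Output copied from the comment above.
SAMPLE = """\
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off
"""

print(tso_enabled(SAMPLE))
```

The text passed in could come from something like subprocess.run(["ethtool", "-k", "eth0"], capture_output=True, text=True).stdout on the affected machine.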

Comment 11 Rod Nayfield 2007-09-09 13:05:46 UTC

The system no longer negotiates gigabit on boot, it just ends up at 100.  


Thus I can't reproduce the issue anymore.

Comment 12 Andy Gospodarek 2007-09-10 18:01:41 UTC

Rod, Since I'm guessing that your device should never have negotiated to 1Gbps
anyway (since it now only does 100Mbps), I'm going to close this as resolved in
5.1.  Please reopen if needed.
