Bug 157147

Summary: e1000 corrupts data with large transfers
Product: Red Hat Enterprise Linux 4 Reporter: Rod Nayfield <nayfield>
Component: kernel Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED DUPLICATE QA Contact: Brian Brock <bbrock>
Severity: high Docs Contact:
Priority: medium    
Version: 4.0   
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2005-05-07 16:13:34 UTC Type: ---

Description Rod Nayfield 2005-05-07 15:34:46 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.7) Gecko/20050416 Red Hat/1.0.3-1.4.1 Firefox/1.0.3

Description of problem:
On an IBM T41 laptop, when using the e1000 (wired) ethernet, data corruption occurs.  Using the airo (wireless) built-in, the corruption does not occur.

This causes SILENT DATA CORRUPTION in some cases.

The bug appears to only occur when large amounts of data are sent through the interface.  For example, regular email use (small 1-3k replies, etc) works fine.  The issue manifests itself with email attachments.

Unfortunately, in that use case (email attachments) the corruption is not detected on the outbound SMTP connection.  Thus, every attachment sent is silently corrupted.  

If the MUA then saves a copy of the message to a folder over IMAPS (e.g. Sent), the corruption is detected.  I believe this is because the corruption causes the SSL integrity check to fail.

This also manifests when using SSH (specifically, SCP of files).

I would assume the same characteristics apply, and that an unencrypted (or less thoroughly checksummed) protocol such as FTP may exhibit the silent data corruption issue.




Version-Release number of selected component (if applicable):
kernel-2.6.9-5.0.5.EL

How reproducible:
Always

Steps to Reproduce:
1. Use e1000 driver
2. scp file outbound
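
For protocols that do not abort on corruption, one way to confirm the problem is to compare digests of the original file and the copy that arrived on the other end (for example after fetching it back over the interface that works). A minimal sketch, assuming `md5sum` from coreutils is installed; the helper name `same_digest` is made up for illustration and is not part of any tool mentioned in this report:

```shell
#!/bin/sh
# same_digest FILE_A FILE_B   (hypothetical helper)
# Exit 0 if both files have the same MD5 digest, 1 if they differ,
# 2 if either file cannot be read.
same_digest() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    # Read via stdin so md5sum prints "<digest>  -"; keep the digest only.
    a=$(md5sum < "$1" | cut -d' ' -f1)
    b=$(md5sum < "$2" | cut -d' ' -f1)
    [ "$a" = "$b" ]
}
```

Usage would be along the lines of `same_digest test.tgz test2-fetched-back.tgz && echo identical || echo DIFFERENT`, where the second file name is only an example.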


Additional info:

[rod@nayfield tmp]$ scp test.tgz nayfield.com:~/test1.tgz
nayfield.com's password:
test.tgz                                      100%   13MB  55.2KB/s   03:56
[rod@nayfield tmp]$ sudo ifdown eth0 #airo
Password:
[rod@nayfield tmp]$ sudo ifup eth1 #e1000

Determining IP information for eth1... done.
[rod@nayfield tmp]$ scp test.tgz nayfield.com:~/test2.tgz
nayfield.com's password:
test.tgz                                        1%  132KB 132.0KB/s   01:37 ETA
Received disconnect from x.x.x.x: 2: Corrupted MAC on input.
lost connection

Comment 1 Rod Nayfield 2005-05-07 15:39:14 UTC
Silent data loss example:

1. Use e1000 driver
2. Use mail client and don't copy sent to ssl imap
3. Send large file

4. Go to recipient maildir
5. Look at file

After about 700 lines, characters outside the base64 alphabet appear in the
stream (spaces, unprintable characters, garbage).  It looks similar to line
noise on a 1200 bps modem.  Obviously the file is corrupt.
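
Rather than inspecting the file by eye, the first damaged line can be located mechanically. A rough sketch, assuming the base64 body of the attachment has already been saved to a file of its own (mail headers and MIME boundaries would otherwise be flagged too) and that GNU grep is available for the `-a` option; the function name `bad_base64_lines` is invented for this example:

```shell
#!/bin/sh
# bad_base64_lines FILE   (hypothetical helper)
# Print "lineno:line" for every line containing a byte outside the
# base64 alphabet (A-Z, a-z, 0-9, +, /, and = padding).
bad_base64_lines() {
    # Strip carriage returns first so CRLF line endings are not flagged.
    # -a makes grep treat binary garbage as text instead of printing
    # "Binary file matches"; LC_ALL=C keeps the ranges byte-based.
    tr -d '\r' < "$1" | LC_ALL=C grep -a -n '[^A-Za-z0-9+/=]'
}
```

Running `bad_base64_lines body.b64 | head -1` (file name illustrative) would show where the corruption starts.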


It is interesting that these issues are not caught by TCP checksums.



Comment 2 Rod Nayfield 2005-05-07 16:13:34 UTC
Issue with cisco zero-copy 

Turning off all offloading seems to fix it.
# ethtool -K eth1 tx off
# ethtool -K eth1 rx off
# ethtool -K eth1 sg off
# ethtool -K eth1 tso off
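
The four commands above can also be generated for any interface. This is only a convenience sketch: the function name `offload_off_cmds` is hypothetical, and it prints the commands instead of running them so they can be reviewed first.

```shell
#!/bin/sh
# offload_off_cmds IFACE   (hypothetical helper)
# Print the ethtool commands that disable the four offload features
# listed in this comment (tx, rx, sg, tso) for the given interface.
# Nothing is executed here; the output is plain text.
offload_off_cmds() {
    for feature in tx rx sg tso; do
        printf 'ethtool -K %s %s off\n' "$1" "$feature"
    done
}
```

To apply the settings, the output can be piped to a root shell, e.g. `offload_off_cmds eth1 | sudo sh`; `ethtool -k eth1` (lowercase `-k`) then shows the resulting offload state.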


*** This bug has been marked as a duplicate of 126869 ***