Escalated to Bugzilla from IssueTracker
Event posted on 12-08-2009 02:37pm EST by woodard

From: Trent D'Hooge <tdhooge>
Subject: mlx4_en driver issue
Date: December 8, 2009 1:22:54 PM CST
To: Ben Coyote Woodard <woodard>, woodard9, Ira Weiny <weiny2>

Sending this to you first before opening a ticket so that we are on the same page. Then we should open a ticket with RH.

RHEL5.3 used mtnic; RHEL5.4 uses the mlx4_en driver. The Mellanox firmware version on the 10GigE card is 2.7.0. When using the mtnic driver we were at firmware version 2.5.914.

First problem seen: the mlx4_en driver seems to be losing enough packets to cause a number of TCP connections to fail, time out, and then eventually reconnect. Lustre does not like this and gets upset. (Even if Lustre were not upset, this could cause major performance issues...)

First problem found by Ira: the 10GigE card was not using MSI interrupts. He fixed this, but we are still having problems.

From the e-mails going around:

First, our conclusion is that the unified driver is BROKEN... As I say at the bottom of this email, the only thing which has changed is the software. We are using the unified driver from RHEL 5.4; the only modification has been the patch I just applied to get MSI to work.

Now for the gory details...

After enabling MSI we still see connections getting into SYN_RECV and causing problems, yet ifconfig and ethtool show only a few errors on the RX side. I don't know how running these nodes back to back is going to reproduce the problem; right now, running 2 nodes against each other through the switch results in no errors. I believe there is something more complex going on because of the large number of TCP connections which Lustre establishes.

We still see a large number of retransmissions in TCP:

# hype139 /sys/module/mlx4_core/parameters > netstat -s | grep retrans
    1059776 segments retransmited
    569233 fast retransmits
    467332 forward retransmits
    2536 retransmits in slow start
    628 sack retransmits failed

# hype139 /sys/module/mlx4_core/parameters > ifconfig eth2
eth2      Link encap:Ethernet  HWaddr 00:02:C9:04:6E:88
          inet addr:172.16.1.201  Bcast:172.16.7.255  Mask:255.255.248.0
          inet6 addr: fe80::202:c9ff:fe04:6e88/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:7500  Metric:1
          RX packets:102570804 errors:20 dropped:23 overruns:23 frame:43
          TX packets:211462289 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:83809240941 (78.0 GiB)  TX bytes:1162296790558 (1.0 TiB)

# hype139 /sys/module/mlx4_core/parameters > ifconfig eth3
eth3      Link encap:Ethernet  HWaddr 00:02:C9:04:6E:89
          inet addr:172.16.9.203  Bcast:172.16.15.255  Mask:255.255.248.0
          inet6 addr: fe80::202:c9ff:fe04:6e89/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:7500  Metric:1
          RX packets:114022111 errors:0 dropped:29 overruns:29 frame:29
          TX packets:241317415 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:90188106711 (83.9 GiB)  TX bytes:1345718307382 (1.2 TiB)

I attempted to turn on debugging in the driver, but nothing is being printed to the console:

# hype139 /sys/module/mlx4_core/parameters > cat /sys/module/mlx4_core/parameters/debug_level
1

Here are the core settings:
/sys/module/mlx4_core/parameters/block_loopback      1
/sys/module/mlx4_core/parameters/debug_level         1
/sys/module/mlx4_core/parameters/enable_qos          N
/sys/module/mlx4_core/parameters/internal_err_reset  1
/sys/module/mlx4_core/parameters/log_mtts_per_seg    3
/sys/module/mlx4_core/parameters/log_num_cq          0
/sys/module/mlx4_core/parameters/log_num_mac         2
/sys/module/mlx4_core/parameters/log_num_mcg         0
/sys/module/mlx4_core/parameters/log_num_mpt         0
/sys/module/mlx4_core/parameters/log_num_mtt         0
/sys/module/mlx4_core/parameters/log_num_qp          0
/sys/module/mlx4_core/parameters/log_num_srq         0
/sys/module/mlx4_core/parameters/log_num_vlan        0
/sys/module/mlx4_core/parameters/log_rdmarc_per_qp   0
/sys/module/mlx4_core/parameters/msi_x               1
/sys/module/mlx4_core/parameters/panic_on_catas      0
/sys/module/mlx4_core/parameters/set_4k_mtu          0
/sys/module/mlx4_core/parameters/use_prio            N

And the ethernet driver settings:

# hype139 /sys/module/mlx4_core/parameters > for file in /sys/module/mlx4_en/parameters/*; do echo $file; cat $file; done
/sys/module/mlx4_en/parameters/inline_thold  104
/sys/module/mlx4_en/parameters/ip_reasm      1
/sys/module/mlx4_en/parameters/num_lro       0
/sys/module/mlx4_en/parameters/pfcrx         0
/sys/module/mlx4_en/parameters/pfctx         0
/sys/module/mlx4_en/parameters/rss_mask      5
/sys/module/mlx4_en/parameters/rss_xor       0

We are still looking for errors anywhere else in the system (i.e. on the switches or other network cards), but we have NOT FOUND ANY. We ran for 3 days with Myricom cards over the weekend without any issues. The mtnic driver we were using previously worked (after much pain!). So we are highly suspicious of the new unified driver. Once again, ONLY THE SOFTWARE HAS CHANGED here... :-(

Is there perhaps a FW upgrade which needs to be done with the unified driver?

# hype139 /sys/module/mlx4_core/parameters > mstflint -d 02:00.0 q
Image type:      ConnectX
FW Version:      2.7.0
Device ID:       26428
Chip Revision:   A0
Description:     Node             Port1            Port2            Sys image
GUIDs:           0002c9030004e948 0002c9030004e949 0002c9030004e94a 0002c9030004e94b
MACs:                             000000000000     000000000001
Board ID:        (MT_0C40110009)
VSD:
PSID:            MT_0C40110009

# hype139 /sys/module/mlx4_core/parameters > mstflint -d 85:00.0 q
Image type:      ConnectX
FW Version:      2.7.0
Device ID:       25448
Chip Revision:   A0
Description:     Port1            Port2
MACs:            0002c9046e88     0002c9046e89
Board ID:        (MT_0BD0110004)
VSD:
PSID:            MT_0BD0110004

# hype139 /sys/module/mlx4_core/parameters > mstflint -d 86:00.0 q
Image type:      ConnectX
FW Version:      2.7.0
Device ID:       26428
Chip Revision:   A0
Description:     Node             Port1            Port2            Sys image
GUIDs:           0002c9030004e928 0002c9030004e929 0002c9030004e92a 0002c9030004e92b
MACs:                             000000000000     000000000001
Board ID:        (MT_0C40110009)
VSD:
PSID:            MT_0C40110009

I don't know what else to try. We will continue to look for a smaller-scale reproducer, but nothing we have done so far is working.

Ira

Begin forwarded message:

Date: Tue, 8 Dec 2009 09:00:53 -0800
From: Jim Garlick <garlick>
To: weiny2, behlendorf1, morrone2
Subject: SYN_RECV connections are back on hype

Uh oh, looks like the old problem is back.
Jim

ehype139: Active Internet connections (w/o servers)
ehype139: Proto Recv-Q Send-Q Local Address            Foreign Address          State
ehype139: tcp        0      0 hype139-lnet0:lustresvc  strauss2-eth2:1023       SYN_RECV
ehype139: tcp        0      0 hype139-lnet0:lustresvc  pigs7-lnet0:edvrpftpd    SYN_RECV
ehype139: tcp        0      0 hype139-lnet0:lustresvc  levi3-eth2:edvrpftpd     SYN_RECV
ehype139: tcp        0      0 hype139-lnet0:lustresvc  strauss10-eth2:1023      SYN_RECV
ehype139: tcp        0      0 hype139-lnet0:lustresvc  momus12-eth2:1021        SYN_RECV
ehype139: tcp        0      0 hype139-lnet0:lustresvc  tycho12-lnet0:1021       SYN_RECV
ehype139: tcp        0      0 hype139-lnet0:lustresvc  momus2-eth2:1023         SYN_RECV
ehype139: tcp        0      0 hype139-lnet0:lustresvc  strauss13-eth2:1020      SYN_RECV
ehype139: tcp        0      0 hype139-lnet0:lustresvc  levi4-eth2:1021          SYN_RECV
ehype139: tcp        0      0 hype139-lnet0:lustresvc  momus5-eth2:1020         SYN_RECV
ehype139: tcp        0      0 hype139-lnet0:lustresvc  pigs4-lnet0:1021         SYN_RECV
ehype139: tcp        0      0 hype139-lnet0:lustresvc  strauss12-eth2:1021      SYN_RECV
ehype139: tcp        0      0 hype139-lnet0:lustresvc  pigs2-lnet0:1023         SYN_RECV
ehype139: tcp        0      0 hype139-lnet0:lustresvc  tycho4-lnet0:1021        SYN_RECV
ehype139: tcp        0      0 hype139-lnet0:lustresvc  strauss1-eth2:1020       SYN_RECV
ehype139: tcp        0      0 hype139-lnet0:lustresvc  momus14-eth2:1023        SYN_RECV
ehype139: tcp        0      0 hype139-lnet0:lustresvc  tycho7-lnet0:edvrpftpd   SYN_RECV
ehype139: tcp        0      0 hype139-lnet0:lustresvc  momus9-eth2:1020         SYN_RECV
ehype139: tcp        0      0 hype139-lnet0:lustresvc  strauss6-eth2:1023       SYN_RECV
ehype139: tcp        0      0 hype139-lnet0:lustresvc  pigs14-lnet0:1023        SYN_RECV
ehype139: tcp        0      0 hype139-lnet0:lustresvc  tycho10-lnet0:1023       SYN_RECV
ehype139: tcp        0      0 hype139-lnet0:lustresvc  momus6-eth2:1023         SYN_RECV
ehype139: tcp        0      0 hype139-lnet0:lustresvc  levi6-eth2:1023          SYN_RECV
ehype139: tcp        0      0 hype139-lnet0:lustresvc  pigs15-lnet0:edvrpftpd   SYN_RECV
ehype139: tcp        0      0 hype139-lnet0:lustresvc  tycho6-lnet0:1023        SYN_RECV
ehype139: tcp        0      0 hype139-lnet0:lustresvc  momus10-eth2:1023        SYN_RECV

This event sent from IssueTracker by kbaxley [LLNL (HPC)]
issue 373976
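A note on the state shown in the listing above: SYN_RECV means the server received a client's SYN and answered with a SYN-ACK, but the final ACK of the three-way handshake never arrived, which fits the packet-loss theory. A simple way to watch the problem appear and clear would be a generic one-liner like the following (not something actually run in this thread):

# watch -n 5 'netstat -tn | grep -c SYN_RECV'   # count half-open connections every 5 seconds

A count that stays non-zero across many different peers, as above, suggests systematic loss on this host rather than one misbehaving client.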
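Regarding the debug_level setting that produced no console output earlier in this comment: mlx4, like most kernel drivers, emits its debug messages via printk at KERN_DEBUG priority, which the default console loglevel suppresses; the messages still land in the kernel ring buffer. A minimal sketch of how one would normally surface them, assuming standard kernel tooling:

# dmesg | grep mlx4                  # debug messages accumulate in the ring buffer
# echo 8 > /proc/sys/kernel/printk   # raise the console loglevel so KERN_DEBUG lines reach the console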
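It may also be worth double-checking that the MSI fix mentioned at the top of this comment actually took effect. A hedged sketch, reusing the PCI address 85:00.0 from the mstflint output above (vector names vary by system):

# grep -i mlx4 /proc/interrupts          # MSI-X vectors show up as PCI-MSI-X entries
# lspci -vv -s 85:00.0 | grep -i msi-x   # "MSI-X: Enable+" means MSI-X is active

If lspci reports "MSI-X: Enable-", the driver has fallen back to legacy interrupts despite msi_x=1.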
Event posted on 2009-12-16 13:23 PST by woodard

Thanks Doug,

Regarding the FW version: Mellanox did not change much. They sent me "2.7.0" builds which reduced the number of outstanding PCI transactions on the bus from 16 down to 12, 8, and 4, and I tried the 12, 8, and 4 versions. They thought there was evidence of a PCI issue, but none of these helped. From our point of view we did not think this was the issue, but we tried the FW just to make sure.

Ira

This event sent from IssueTracker by woodard
issue 373976
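For reference, trial firmware images like the ones described above would typically be burned and verified with mstflint; the image filename below is a placeholder, and the device address is taken from the earlier query output:

# mstflint -d 85:00.0 -i fw-trial-2.7.0.bin burn   # burn the trial image (filename is hypothetical)
# mstflint -d 85:00.0 q                            # re-query to confirm the running image

The new firmware does not take effect until the adapter is reset or the machine is rebooted.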
Did you try the driver Doug attached? Did it behave any differently than the 5.4 one?
I've just filed a bug of our own (relating to our IBM HPC gear at VLSCI at the University of Melbourne) where we are seeing packets arriving on the physical eth1 10Gb/s interface being delivered incorrectly by the driver to eth0:

https://bugzilla.redhat.com/show_bug.cgi?id=649623

We have replicated this using 3 cards (Mellanox ConnectX2 MT26448) in 2 different servers, so we're pretty confident it's not a hardware problem. This is with RHEL 5.5.

We've found that using the mlx4_en driver from the Mellanox site does seem to fix it, though, so it might be worth investigating yourselves.
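One way to demonstrate that kind of cross-port misdelivery is to capture on the interface that should not be receiving the traffic, filtering on the other port's link-layer or IP address. A sketch with placeholder addresses (substitute eth1's actual MAC and IP):

# tcpdump -i eth0 -e -n ether dst 00:02:c9:00:00:01   # placeholder MAC: eth1's hardware address
# tcpdump -i eth0 -n host 192.0.2.10                  # placeholder IP: eth1's address

Any matching frames captured on eth0 confirm that traffic addressed to eth1 is being handed to the wrong netdev.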
Red Hat have told us this bug won't get fixed in 5.7, but they will look at whether or not they will fix it in 5.8. :-(

It does appear that RHEL 6.1 might have the newer version of the driver without this problem, though.
Closing this as not a bug. The original customer report was closed indicating that the problem was due to faulty hardware. If you disagree with this, please open a support case with Red Hat support at access.redhat.com.