Bug 729352 - Qlogic NetXen Driver Not Working Correctly netxen_nic module
Summary: Qlogic NetXen Driver Not Working Correctly netxen_nic module
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.7
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Chad Dupuis (Cavium)
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-08-09 16:17 UTC by Dave Sullivan
Modified: 2018-11-29 20:34 UTC
CC List: 12 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-09-21 14:06:37 UTC
Target Upstream Version:
Embargoed:


Attachments
src rpm for the HP provided Qlogic Network Card Driver/Module (2.11 MB, application/x-rpm)
2011-08-09 16:17 UTC, Dave Sullivan
Phantom core to take fw dump (662.33 KB, application/octet-stream)
2011-08-29 07:37 UTC, rajesh.borundia
Firmware Dump eth0 netxen_nic (2.85 MB, application/x-gzip)
2011-09-01 20:17 UTC, Dave Sullivan

Description Dave Sullivan 2011-08-09 16:17:13 UTC
Created attachment 517444
src rpm for the HP provided Qlogic Network Card Driver/Module

Description of problem:

The netxen_nic driver does not seem to work correctly. When a network cable is unplugged from a quad-port NIC and then plugged back in, the link lights do not come back on and the eth port stays down.

Firmware was updated from HP's support driver site to:

firmware-version: 4.0.556


Version-Release number of selected component (if applicable):

HP DL380 G6 Server

This problem is seen on both of the following RHEL 5 kernels:

Linux ustchscaeflx09 2.6.18-274.el5 #1 SMP Fri Jul 8 17:36:59 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

Linux ustchscbeflx09 2.6.18-238.12.1.el5 #1 SMP Sat May 7 20:18:50 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux


firmware-version: 4.0.556

0a:00.0 Ethernet controller: NetXen Incorporated NX3031 Multifunction 1/10-Gigabit Server Adapter (rev 42)
        Subsystem: Hewlett-Packard Company NC375T PCI Express Quad Port Gigabit Server Adapter

How reproducible:


Steps to Reproduce:
1.  Install the quad-port card
2.  Cable it up
3.  Reboot
4.  Pull a network cable from the quad-port NIC and plug it back in
  
Actual results:

The link light does not come back on and the eth port stays down.

Expected results:

The link light should come back on and the eth port should come back up.


Additional info:

I downloaded the driver/module for this network card from HP's support driver site, built two RPMs and installed them, which was required to flash the firmware up to 4.0.556. Then I removed the RPMs to try the default netxen_nic driver with the updated firmware. Same issue.

I resolved the issue by reinstalling the two RPMs provided by HP, which blacklist the netxen_nic and netxen_xport modules and use the nx_nic module instead. Once I did this, when I pull a cable on this card and plug it back in, the link light comes back on as expected. We have been trying to use only drivers provided by the native Red Hat kernel, but this issue will force us to use the one provided by HP.

# These were built from source (from HP's support driver site) and installed to make things work
rpm -ivh hp-nx_nic-tools-4.0.556-2.x86_64.rpm kmod-hp-nx_nic-4.0.556-2.x86_64.rpm
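When switching back and forth between HP's nx_nic and the in-box netxen_nic, it helps to confirm whether netxen_nic is currently blacklisted. The sketch below assumes bash and the standard modprobe `blacklist <module>` syntax; which file the HP RPMs actually write is not stated in this report, so the hypothetical helper is_blacklisted simply scans whatever files it is given.

```shell
#!/bin/bash
# is_blacklisted MODULE FILE...
# Hypothetical helper: succeeds (exit 0) if any of the given modprobe config
# files contains an active "blacklist MODULE" line. Commented-out lines are
# ignored because the line must start with optional whitespace + "blacklist".
is_blacklisted() {
    local module=$1
    shift
    if [ "$#" -eq 0 ]; then
        echo "usage: is_blacklisted MODULE FILE..." >&2
        return 2
    fi
    # -q: no output, -s: ignore missing/unreadable files, -E: extended regex
    grep -qsE "^[[:space:]]*blacklist[[:space:]]+${module}[[:space:]]*\$" "$@"
}
```

For example, `is_blacklisted netxen_nic /etc/modprobe.conf /etc/modprobe.d/* && echo "netxen_nic is blacklisted"` should report the HP setup before the RPMs are removed and stay silent afterwards.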

Comment 1 Tony Camuso 2011-08-12 14:44:37 UTC
ACK

Have notified the HP networking group.

Comment 2 rajesh.borundia 2011-08-29 07:37:55 UTC
Created attachment 520304
Phantom core to take fw dump

Hi Dave,

I tried the test on a "Hewlett-Packard Company NC375T PCI Express Quad Port Gigabit
Server Adapter" with 4.0.556 firmware and the RHEL 5.7 inbox netxen_nic driver (2.6.18-274.el5). For me, the link came up properly after the test (unplugging and replugging the cable).

Before running the test, load the module with auto_fw_reset disabled:
modprobe netxen_nic auto_fw_reset=0
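To confirm that the running driver actually picked up the option, the value can usually be read back from sysfs. This is a sketch under the assumption that netxen_nic exports auto_fw_reset under /sys/module/netxen_nic/parameters/, which is only the case if the driver registers the parameter as readable; show_module_param is a hypothetical helper name.

```shell
#!/bin/bash
# show_module_param MODULE PARAM [SYSFS_ROOT]
# Hypothetical helper: prints the value a loaded module is using for PARAM.
# Assumption: the driver exports PARAM at
#   <SYSFS_ROOT>/module/<MODULE>/parameters/<PARAM>
# which only happens for parameters registered as readable.
show_module_param() {
    local module=$1 param=$2 root=${3:-/sys}
    local f="$root/module/$module/parameters/$param"
    if [ ! -r "$f" ]; then
        echo "$module: $param not visible under $root/module" >&2
        return 1
    fi
    cat "$f"
}
```

Reloading the module (`modprobe -r netxen_nic; modprobe netxen_nic auto_fw_reset=0`) takes every netxen_nic port down, so it is best done from the console; afterwards `show_module_param netxen_nic auto_fw_reset` should print 0 if the parameter is exported.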

I am attaching a phantomcore_p3 binary to take a firmware dump.
After you hit the issue, run:
./phantomcore_p3 -i <interface>

Also send the output of the following commands before and after the test:
a. ethtool -i <interface>
b. ethtool <interface>
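To make the before/after comparison easier, the two commands above (plus dmesg) could be wrapped in a small helper. This is a hedged sketch: the function name collect_diag, the default output directory /tmp/netxen-diag and the file naming are assumptions for illustration, not part of any Red Hat or QLogic tooling.

```shell
#!/bin/bash
# collect_diag IFACE LABEL [OUTDIR]
# Hypothetical helper: saves `ethtool -i IFACE`, `ethtool IFACE` and dmesg
# output into OUTDIR/IFACE-LABEL-*.txt (OUTDIR defaults to /tmp/netxen-diag)
# and prints OUTDIR. Run it once with LABEL=before and once with LABEL=after.
collect_diag() {
    local iface=$1 label=$2 outdir=${3:-/tmp/netxen-diag}
    if [ -z "$iface" ] || [ -z "$label" ]; then
        echo "usage: collect_diag IFACE LABEL [OUTDIR]" >&2
        return 2
    fi
    mkdir -p "$outdir" || return 1
    # a. driver, driver version, firmware version, PCI bus address
    ethtool -i "$iface" > "$outdir/$iface-$label-ethtool-i.txt" 2>&1
    # b. speed, duplex, autonegotiation and "Link detected"
    ethtool "$iface" > "$outdir/$iface-$label-ethtool.txt" 2>&1
    # kernel messages, e.g. link up/down or firmware hang reports
    dmesg > "$outdir/$iface-$label-dmesg.txt" 2>&1
    echo "$outdir"
}
```

Typical use: run `collect_diag eth0 before`, pull and replug the cable, wait, run `collect_diag eth0 after`, then diff the two sets of files.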

Comment 3 rajesh.borundia 2011-08-31 18:06:05 UTC
Hi Dave,

Can you send the dmesg output for the failure and success cases?
One after loading netxen_nic and one after loading nx_nic.

Is the interface connected to a switch?
If so, can you send the details of the switch (type, name, etc.)?

Also send the ethtool -i output in both cases.

Comment 4 Simon Reber 2011-09-01 13:20:06 UTC
Hi all,

I have/had similar problems:

Card Information:
Ethernet controller: NetXen Incorporated NX3031 Multifunction 1/10-Gigabit Server Adapter (rev 42)
Subsystem: Hewlett-Packard Company NC375i Integrated Quad Port Multifunction Gigabit Server Adapter

# ethtool -i eth0
driver: netxen_nic
version: 4.0.74
firmware-version: 4.0.544
bus-info: 0000:04:00.0

Driver netxen_nic version 4.0.74 wasn't working with firmware 4.0.544:
 - couldn't set speed, duplex or autoneg through ethtool
 - I also wasn't able to bring up the interface

I then upgraded to RHEL 5.7, which ships netxen_nic driver version 4.0.75:
 - I was able to set speed, duplex and autoneg through ethtool
 - link is detected and the interface comes up
 - setting the interface to autoneg isn't working! I need to set ETHTOOL_OPTS
   in ifcfg-eth0
 - network performance is poor compared with what we had in the
   past running driver version 4.0.74 and firmware version 4.0.534!

It's probably worth mentioning that our setup is a bit special, since we are connected to a 100 Mb/s switch rather than a 1000 Mb/s one, but I don't see why this should cause such massive problems.

Comment 5 Dave Sullivan 2011-09-01 18:20:33 UTC
(In reply to comment #3)
> Hi Dave,
> 
> Can you send the dmesg output of failure and success case ?
> One after loading netxen_nic and one after loading nx_nic.
> 
> Is the interface connected to a switch ?
> If yes then can you send the details of switch, why type of switch
> and its name etc.
> 
> Also send ethtool -i output in both cases.

I think it's kind of strange that we would ship a driver that requires this option to function correctly:

modprobe netxen_nic auto_fw_reset=0

I added options netxen_nic auto_fw_reset=0 to modprobe.conf and rebuilt the initrd for good measure.  Rebooted, and now when I pull a cable and plug back in the link light stays on as expected.

However, when I ran ./phantomcore_p3 -i eth0 to take a firmware dump, it hung the system. I logged in to the remote console, restarted the network (service network restart), and saw some nasty errors.

I rebooted the system, thinking the firmware dump might have caused this, and now when I restart the network (service network restart) all is good. So I think we are good to go.

FYI, I'm not too worried about these messages, since they were caused by the firmware dump:

Sep  1 13:30:54 ustchscaeflx09 kernel: eth6: firmware hang detected
Sep  1 13:41:07 ustchscaeflx09 kernel: netxen_nic: card response timeout

Here's the requested ethtool output:

[root@ustchscaeflx09 ~]# !984
for i in 0 1 2 3 4 5 6 7; do echo "eth$i"; ethtool -i eth$i; done
eth0
driver: netxen_nic
version: 4.0.75
firmware-version: 4.0.556
bus-info: 0000:0a:00.0
eth1
driver: netxen_nic
version: 4.0.75
firmware-version: 4.0.556
bus-info: 0000:0a:00.1
eth2
driver: bnx2
version: 2.0.21
firmware-version: bc 4.6.4 NCSI 1.0.3
bus-info: 0000:02:00.0
eth3
driver: bnx2
version: 2.0.21
firmware-version: bc 4.6.4 NCSI 1.0.3
bus-info: 0000:02:00.1
eth4
driver: netxen_nic
version: 4.0.75
firmware-version: 4.0.556
bus-info: 0000:0a:00.2
eth5
driver: bnx2
version: 2.0.21
firmware-version: bc 4.6.4
bus-info: 0000:03:00.1
eth6
driver: netxen_nic
version: 4.0.75
firmware-version: 4.0.556
bus-info: 0000:0a:00.3
eth7
driver: bnx2
version: 2.0.21
firmware-version: bc 4.6.4
bus-info: 0000:03:00.0
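The per-interface ethtool -i listing above is easier to scan when each interface is collapsed onto one line. A minimal sketch (bash plus awk; summarize_ethtool_i is a hypothetical name) that reads ethtool -i output on stdin and prints tab-separated interface, driver, version, firmware and bus-info. Splitting on ": " keeps values such as "bc 4.6.4 NCSI 1.0.3" and "0000:0a:00.3" intact.

```shell
#!/bin/bash
# summarize_ethtool_i IFACE
# Hypothetical helper: reads `ethtool -i IFACE` output on stdin and prints
# one tab-separated line: IFACE, driver, version, firmware-version, bus-info.
summarize_ethtool_i() {
    awk -v ifname="$1" -F': ' '
        $1 == "driver"           { drv = $2 }
        $1 == "version"          { ver = $2 }
        $1 == "firmware-version" { fw  = $2 }
        $1 == "bus-info"         { bus = $2 }
        END { printf "%s\t%s\t%s\t%s\t%s\n", ifname, drv, ver, fw, bus }'
}
# Example:
#   for dev in /sys/class/net/eth*; do
#       i=${dev##*/}; ethtool -i "$i" | summarize_ethtool_i "$i"
#   done
```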

One issue I see is that loading the updated firmware from HP required the nx_nic module to be loaded. To do this, I installed the following RPMs:

hp-nx_nic-tools-4.0.556-2.x86_64.rpm
kmod-hp-nx_nic-4.0.556-2.x86_64.rpm

(compiled from the source RPM pulled from the HP support site)

I then got the required module (nx_xport.ko) loaded to run the firmware update.

./CP015529.scexe -s

Then I removed the two RPMs above with yum. This is a somewhat painful process. If Red Hat's and HP's drivers both come from the same upstream QLogic driver, why do they have different module names? It would be nice for the two to get in sync.
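Before removing the HP RPMs again, it is worth confirming that the card really reports the new firmware. A sketch in bash, assuming purely numeric dotted versions such as 4.0.556 (bnx2-style strings like "bc 4.6.4" are not handled); version_ge and fw_at_least are hypothetical helpers.

```shell
#!/bin/bash
# version_ge A B
# Hypothetical helper: succeeds if dotted numeric version A >= B
# (e.g. 4.0.556 >= 4.0.544). Missing fields count as 0.
version_ge() {
    local IFS=.
    local -a a b
    local i n x y
    read -r -a a <<< "$1"
    read -r -a b <<< "$2"
    n=${#a[@]}
    if [ "${#b[@]}" -gt "$n" ]; then n=${#b[@]}; fi
    for ((i = 0; i < n; i++)); do
        x=${a[i]:-0}
        y=${b[i]:-0}
        # 10# forces base 10 so fields with leading zeros are not read as octal
        if ((10#$x > 10#$y)); then return 0; fi
        if ((10#$x < 10#$y)); then return 1; fi
    done
    return 0
}

# fw_at_least MIN
# Hypothetical helper: reads `ethtool -i IFACE` output on stdin and
# succeeds if its firmware-version is >= MIN.
fw_at_least() {
    local min=$1 fw
    fw=$(awk -F': ' '$1 == "firmware-version" { print $2 }')
    [ -n "$fw" ] && version_ge "$fw" "$min"
}
```

For example, `ethtool -i eth0 | fw_at_least 4.0.556 && echo "eth0 firmware is at least 4.0.556"`.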

Simon, we run at 100 Mb/s, so you shouldn't have a problem, assuming you update the firmware to 4.0.556.
[root@ustchscaeflx09 nicswap]# cat /etc/sysconfig/network-scripts/ifcfg-eth0 
DEVICE=eth0
ETHTOOL_OPTS="autoneg off speed 100 duplex full"
ONBOOT=yes
MASTER=bond1
SLAVE=yes
USERCTL=no
BOOTPROTO=none
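Because autonegotiation was not reliable here, the forced settings live in ETHTOOL_OPTS, which the RHEL network initscripts pass to ethtool when the interface is brought up. A hedged bash sketch for setting that line idempotently; set_ethtool_opts is a hypothetical helper and it rewrites the file in place, so keep a backup.

```shell
#!/bin/bash
# set_ethtool_opts IFCFG_FILE OPTIONS
# Hypothetical helper: replaces any existing ETHTOOL_OPTS line in a RHEL
# ifcfg file with ETHTOOL_OPTS="OPTIONS" (appended at the end) and leaves
# all other lines alone.
set_ethtool_opts() {
    local file=$1 opts=$2 tmp
    if [ ! -f "$file" ]; then
        echo "set_ethtool_opts: no such file: $file" >&2
        return 1
    fi
    tmp=$(mktemp) || return 1
    # grep -v exits 1 when every line matched; that is not an error here.
    { grep -v '^ETHTOOL_OPTS=' "$file" || true; } > "$tmp"
    printf 'ETHTOOL_OPTS="%s"\n' "$opts" >> "$tmp"
    # Copy back with cat so the original file's owner and mode are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
# Example (values from Dave's ifcfg-eth0):
#   set_ethtool_opts /etc/sysconfig/network-scripts/ifcfg-eth0 \
#       "autoneg off speed 100 duplex full"
```

The new value only takes effect the next time the interface is brought up, for example with ifdown eth0; ifup eth0.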

Comment 6 Dave Sullivan 2011-09-01 20:17:13 UTC
Created attachment 521095
Firmware Dump eth0 netxen_nic

Comment 7 David Aquilina 2011-09-01 22:48:56 UTC
Perhaps this HP CA is related? 

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&taskId=110&prodSeriesId=3913537&prodTypeId=329290&objectID=c02964542

Chad, do you know what changes would have been made to HP's nx_nic driver for the updated firmware the CA references, and whether the same changes are in our netxen_nic driver?

Comment 8 rajesh.borundia 2011-09-02 05:41:56 UTC
(In reply to comment #5)
> I think it's kind of strange that we would ship a driver that requires this
> option to function correctly:
> 
> modprobe netxen_nic auto_fw_reset=0
> 
> I added options netxen_nic auto_fw_reset=0 to modprobe.conf and rebuilt the
> initrd for good measure.  Rebooted, and now when I pull a cable and plug back
> in the link light stays on as expected.

The auto_fw_reset=0 option only disables firmware recovery in case of a firmware hang, so do not set this option for normal operation. phantomcore_p3 takes a dump of the firmware, which shuts the firmware down, so in this case automatic recovery has to be disabled. That is why you saw the nasty messages in dmesg.

If you are getting link-up now, you should also get it without the
auto_fw_reset=0 option. Please try again without setting this option; if you still see the problem, send me the dmesg, /var/log/messages, ethtool <interface> and ethtool -i <interface> output.
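Since auto_fw_reset=0 is meant only for taking a firmware dump, a quick check that it has not been left active in the module configuration can help. A sketch in bash using grep -E; warn_auto_fw_reset is a hypothetical helper that prints matching lines with file name and line number, and succeeds only if an uncommented `options netxen_nic ... auto_fw_reset=0` line is found.

```shell
#!/bin/bash
# warn_auto_fw_reset FILE...
# Hypothetical helper: prints every active (uncommented)
# "options netxen_nic ... auto_fw_reset=0" line as FILE:LINE:TEXT.
# Exit status follows grep: 0 = found, 1 = not found.
warn_auto_fw_reset() {
    grep -HnsE \
        '^[[:space:]]*options[[:space:]]+netxen_nic[[:space:]].*auto_fw_reset=0([[:space:]]|$)' \
        "$@"
}
```

For example, `warn_auto_fw_reset /etc/modprobe.conf /etc/modprobe.d/*` would flag the line Dave added in Comment 5; once it is removed, rebuilding the initrd (as he did) keeps an initrd-loaded driver in sync with the config.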

Comment 9 Simon Reber 2011-09-02 08:00:17 UTC
(In reply to comment #5)
> Simon we do 100Mb/s so you shouldn't have a problem assuming you update the
> firmware to 4.0.556.  
> [root@ustchscaeflx09 nicswap]# cat /etc/sysconfig/network-scripts/ifcfg-eth0 
> DEVICE=eth0
> ETHTOOL_OPTS="autoneg off speed 100 duplex full"
> ONBOOT=yes
> MASTER=bond1
> SLAVE=yes
> USERCTL=no
> BOOTPROTO=none
Dave,

The problem is not that it does not work. The problem is that it doesn't perform as expected, i.e. as it did before the firmware and OS upgrade (note: the OS upgrade was necessary, since driver version 4.0.74 was not able to work with firmware version 4.0.544).



Just to give you an idea: we have a second server that runs the same hardware, but with the old firmware/drivers.
Both are attached to the same switch and are running at 100 Mb/s full duplex.

Legend:

external Server: server1
server running RHEL 5.6 firmware version 4.0.534: serverA
server running RHEL 5.7 firmware version 4.0.544: serverX


For testing purposes, I created the following file and copied it over the network to an external server:
dd if=/dev/zero of=/home/~/test.file2 bs=1024 count=100000


----------


[ Server running RHEL 5.7, which is having performance issues ]
server1:~ $ time scp serverX:~/test.file .
test.file                                                                                                                                            100%   98MB   4.9MB/s   00:20

real    0m20.821s
user    0m3.278s
sys     0m0.854s
server1:~ $

server1:~ $ tracepath serverX
 1:  server1 (XX.XX.XX.218)             0.072ms pmtu 1500
 1:  SECRET (XX.XX.XX.254)      0.823ms
 2:  SECRET (XX.XX.XX.102)        1.049ms
 3:  SECRET (XX.XX.XX.5)   1.230ms
 4:  SECRET (XX.XX.XX.14)   1.669ms
 5:  serverX (XX.XX.XX.53)              2.389ms reached
     Resume: pmtu 1500 hops 5 back 5


---------


[ Server running RHEL 5.6 and does not show any network issue ]
server1:~ $ time scp serverA:~/test.file2 .
test.file2                                                                                                                                           100%   98MB  12.2MB/s   00:08

real    0m9.482s
user    0m3.417s
sys     0m0.864s
server1:~ $

server1:~ $ tracepath serverA
 1:  server1 (XX.XX.XX.218)             0.072ms pmtu 1500
 1:  SECRET (XX.XX.XX.254)      0.745ms
 2:  SECRET (XX.XX.XX.102)        1.214ms
 3:  SECRET (XX.XX.XX.5)   1.032ms
 4:  SECRET (XX.XX.XX.14)   1.039ms
 5:  serverA (XX.XX.XX.49)              0.600ms reached
     Resume: pmtu 1500 hops 5 back 5
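To put the two scp runs in context against the 100 Mb/s link, the transfer size from the dd command (1024 * 100000 = 102,400,000 bytes) can be divided by the elapsed time. A small bash/awk sketch; mib_per_s and mbit_per_s are hypothetical names. Note that the "real" time also includes SSH connection setup, so it slightly understates the wire rate.

```shell
#!/bin/bash
# Hypothetical throughput helpers for comparing file transfers.
# mib_per_s BYTES SECONDS  -> MiB/s (1 MiB = 1048576 bytes)
# mbit_per_s BYTES SECONDS -> megabits per second, comparable to a 100 Mb/s link
mib_per_s() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.2f\n", b / 1048576 / s }'
}
mbit_per_s() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.2f\n", b * 8 / 1000000 / s }'
}
```

With the real times above, mbit_per_s 102400000 20.821 comes to roughly 39 Mb/s for serverX, while mbit_per_s 102400000 9.482 comes to roughly 86 Mb/s for serverA.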

Comment 10 Dave Sullivan 2011-09-02 14:47:55 UTC
(In reply to comment #8)
> The option auto_fw_reset=0 only disables firmware recovery in case of firmware
> hang. So do not set this option for normal operations. phantomcore_p3 takes a
> dump of firmware so it shuts down the firmware therefore in this case we have
> to disable auto recovery. Therefore you were having nasty messages in dmesg.
> 
> If you are getting linkup now then you should get it even after not setting 
> auto_fw_reset=0 option. Please try now without setting this option, if you
> still see the problem send me the dmesg, /var/log/messages/, ethtool
> <interface>, ethtool -i <interface> output.

Rajesh, I had already run that test without this option, and that's when I noticed that unplugging the cable and plugging it back in didn't bring the link light back on. I can re-run the test today to make sure. I'm working with application folks who are using this machine, so I have to get permission to run the test.

Simon, this is why I was saying that it didn't work. What I notice with netxen_nic is that when I unplug and replug the cable, the link light doesn't come back right away, although it does after some time. I haven't had the opportunity to performance test this NIC/driver combination. Apparently the auto_fw_reset=0 option corrects the link light issue, but according to Rajesh we shouldn't run in this mode. There obviously have to be some differences between the code bases of HP's nx_nic and Red Hat's netxen_nic; I'm not sure why, since both come from the same upstream QLogic driver, but I assume this happens from time to time.

Comment 11 Dave Sullivan 2011-09-02 14:53:42 UTC
I already attached the HP nx_nic src rpm; it should be fairly trivial to compare that with what's in the 2.6.18-274 kernel src rpm.

Comment 12 Dave Sullivan 2011-09-02 16:16:00 UTC
(In reply to comment #10)

Hmm, ok, we just ran the test (pulling cable/replugging) again and it seems to work just fine.  So now I am a bit confused.  This test was without the auto_fw_reset=0 setting in modprobe.conf.

Maybe I somehow got the kernel versions and firmware versions mixed up. So let me run some netperf tests on this NIC.

[root@ustchscaeflx09 nicswap]# for i in 0 1 2 3 4 5 6 7; do echo "eth$i"; ethtool -i eth$i; done
eth0
driver: netxen_nic
version: 4.0.75
firmware-version: 4.0.556
bus-info: 0000:0a:00.0
eth1
driver: netxen_nic
version: 4.0.75
firmware-version: 4.0.556
bus-info: 0000:0a:00.1
eth2
driver: bnx2
version: 2.0.21
firmware-version: bc 4.6.4 NCSI 1.0.3
bus-info: 0000:02:00.0
eth3
driver: bnx2
version: 2.0.21
firmware-version: bc 4.6.4 NCSI 1.0.3
bus-info: 0000:02:00.1
eth4
driver: netxen_nic
version: 4.0.75
firmware-version: 4.0.556
bus-info: 0000:0a:00.2
eth5
driver: bnx2
version: 2.0.21
firmware-version: bc 4.6.4
bus-info: 0000:03:00.1
eth6
driver: netxen_nic
version: 4.0.75
firmware-version: 4.0.556
bus-info: 0000:0a:00.3
eth7
driver: bnx2
version: 2.0.21
firmware-version: bc 4.6.4
bus-info: 0000:03:00.0

[root@ustchscaeflx09 nicswap]# cat /etc/modprobe.conf
alias scsi_hostadapter cciss
alias scsi_hostadapter1 ata_piix
alias scsi_hostadapter2 lpfc
alias scsi_hostadapter3 usb-storage
options lpfc lpfc_lun_queue_depth=16 lpfc_devloss_tmo=10 lpfc_discovery_threads=32
#options netxen_nic auto_fw_reset=0
alias eth0 netxen_nic
alias eth1 netxen_nic
alias eth2 bnx2
alias eth3 bnx2
alias eth4 netxen_nic
alias eth5 bnx2
alias eth6 netxen_nic
alias eth7 bnx2
alias bond0 bonding mode=1 miimon=100
alias bond1 bonding mode=1 miimon=100

#new initrd 
-rw------- 1 root root 3768901 Sep  2 11:10 initrd-2.6.18-274.el5.img

Comment 13 Marvell Linux NIC Driver 2011-09-06 14:41:23 UTC
Are any of the failing configurations using half-duplex settings? Since the hardware does not support half duplex, there is a workaround in the 4.0.556 firmware that declares link down to the host when it detects half duplex.
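Given that workaround, a half-duplex negotiation would look exactly like "link stays down", so it is worth checking what ethtool reports. A sketch in bash plus awk; link_report is a hypothetical helper, and it assumes the usual indented "Speed:", "Duplex:" and "Link detected:" lines in `ethtool IFACE` output.

```shell
#!/bin/bash
# link_report
# Hypothetical helper: reads `ethtool IFACE` output on stdin and prints a
# one-line verdict. Half duplex is flagged because, per Comment 13,
# firmware 4.0.556 declares link down to the host when it detects it.
link_report() {
    awk -F': ' '
        { sub(/^[ \t]+/, "", $1) }            # ethtool indents each field
        $1 == "Speed"         { speed  = $2 }
        $1 == "Duplex"        { duplex = $2 }
        $1 == "Link detected" { link   = $2 }
        END {
            if (duplex == "Half")
                printf "WARNING: half duplex at %s; 4.0.556 firmware may report link down\n", speed
            else
                printf "link=%s speed=%s duplex=%s\n", link, speed, duplex
        }'
}
```

For example, `ethtool eth0 | link_report` on the forced 100/full configuration should print a "link=yes speed=100Mb/s duplex=Full" style line.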

Comment 14 Dave Sullivan 2011-09-06 15:26:49 UTC
(In reply to comment #13)
> Is any of the failing configurations using half duplex settings? Since the
> hardware does not support half duplex, there is a workaround in the 4.0.556
> firmware to declare link down to the host, when it detects half duplex.

From my perspective all is good with the 4.0.556 firmware update.

I transferred a 1 GB file twice, once over bnx2 and once over netxen_nic, and I'm seeing about the same performance.

I also ran netperf tests over the weekend across the netxen_nic NIC/driver, and all seems OK.

[root@deruescaeflx05 tmp]# #bnx2 nic
[root@deruescaeflx05 tmp]# time scp ./1gb.dat xxx.xx.xx.xx:/tmp/
Warning: Permanently added '121.74.251.7' (RSA) to the list of known hosts.

Subject to applicable law, anyone using the Network expressly consents to:   

1)  having his/her network activity monitored and recorded; and, 

2)  using the Network only in accordance with the terms of the applicable 
    Acceptable Use Practices (www.NetworkAUP.com). 

Your work  product created, transmitted  or stored  on GM networks  or systems,
including your name or other personally identifiable information, may be shared
with  other GM  entities, suppliers  and third  parties around  the globe  when
required for business or legal purposes.

BE ADVISED,  that improper usage  of the  network and/or computing  systems and
equipment may result in disciplinary action, up to and including termination of
employment. If  possible criminal activity  is detected, system records  may be
provided to law enforcement officials.

1gb.dat                                       100% 1024MB  51.2MB/s   00:20    

real	0m20.606s
user	0m19.453s
sys	0m1.592s
[root@deruescaeflx05 tmp]# #netxen_nic
[root@deruescaeflx05 tmp]# time scp ./1gb.dat xx.xx.xx.xx:/tmp/
Warning: Permanently added '10.22.5.2' (RSA) to the list of known hosts.

1gb.dat                                       100% 1024MB  48.8MB/s   00:21    

real	0m21.078s
user	0m19.353s
sys	0m1.422s

Comment 15 Dave Sullivan 2011-09-06 18:50:39 UTC
My conclusion:

If you are deploying on 5.7 (2.6.18-274 kernel), go with the netxen_nic driver provided by Red Hat. However, if you have to deploy on the 5.6 (2.6.18-238.XX) kernel, use the nx_nic driver.

We built updated nx_nic driver RPMs for that specific kernel. The netxen_nic driver on the 2.6.18-238 kernel has negotiation issues, whereas nx_nic does not.
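
The recommendation above can be expressed as a small lookup on the kernel release string. This is a minimal sketch that assumes every RHEL 5 kernel build at or above 274 behaves like the 2.6.18-274 kernel tested here (the thread only compares -274 and -238.12.1). `kernel_build` and `recommended_driver` are hypothetical helpers, not part of any Red Hat or QLogic tooling.

```python
import re

# Hypothetical helper encoding the recommendation above. Assumption: every
# RHEL5 kernel build >= 274 behaves like 2.6.18-274 (only -274 and
# -238.12.1 were actually tested in this thread).
FIXED_BUILD = 274


def kernel_build(release: str) -> int:
    """Return the RHEL5 build number, e.g. 238 for '2.6.18-238.12.1.el5'."""
    match = re.match(r"2\.6\.18-(\d+)", release)
    if match is None:
        raise ValueError(f"not a RHEL5 2.6.18 kernel release: {release!r}")
    return int(match.group(1))


def recommended_driver(release: str) -> str:
    """netxen_nic on 5.7 (2.6.18-274) and later, nx_nic on 5.6 (2.6.18-238.x)."""
    return "netxen_nic" if kernel_build(release) >= FIXED_BUILD else "nx_nic"
```

On a RHEL 5 host, `recommended_driver(platform.release())` would apply it to the running kernel, since `platform.release()` returns the same string as `uname -r`.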


nx_nic testing 2.6.18-274 .....................................................................................................................................

[root@ustchscaeflx09 nicswap]# time scp /tmp/1gb.dat 192.168.109.2:/tmp/

1gb.dat                                    100% 1024MB  11.1MB/s   01:32

real    1m31.707s
user    0m19.856s
sys    0m2.036s
[root@ustchscaeflx09 nicswap]# ethtool -i eth6
driver: nx_nic
version: 4.0.556
firmware-version: 4.0.556
bus-info: 0000:0a:00.3
[root@ustchscaeflx09 nicswap]# ifdown eth6
[root@ustchscaeflx09 nicswap]# ifup eth6
[root@ustchscaeflx09 nicswap]# uname -a
Linux ustchscaeflx09 2.6.18-274.el5 #1 SMP Fri Jul 8 17:36:59 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
You have new mail in /var/spool/mail/root

netxen_nic testing 2.6.18-274 ....................................................................................................................

[root@ustchscaeflx09 nicswap]# time scp /tmp/1gb.dat 192.168.109.2:/tmp/

1gb.dat                                    100% 1024MB  11.1MB/s   01:32

real    1m31.843s
user    0m20.007s
sys    0m1.966s
You have new mail in /var/spool/mail/root
[root@ustchscaeflx09 nicswap]# ethtool -i eth6
driver: netxen_nic
version: 4.0.75
firmware-version: 4.0.556
bus-info: 0000:0a:00.3
[root@ustchscaeflx09 nicswap]# uname -a
Linux ustchscaeflx09 2.6.18-274.el5 #1 SMP Fri Jul 8 17:36:59 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux 

nx_nic 2.6.18-238.xx ...........................................................................................................................

[root@ustchscbeflx09 nicswap]# ethtool -i eth6
driver: nx_nic
version: 4.0.550
firmware-version: 4.0.556
bus-info: 0000:0a:00.3
[root@ustchscbeflx09 nicswap]# ifdown eth6
[root@ustchscbeflx09 nicswap]# ifup eth6
[root@ustchscbeflx09 nicswap]# uname -a
Linux ustchscbeflx09 2.6.18-238.12.1.el5 #1 SMP Sat May 7 20:18:50 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
[root@ustchscbeflx09 nicswap]# history | grep time
  694  time cp zsa_tables01.dbf /var/crash
  696  time scp zsa_tables01.dbf nodea:/var/crash
 1055  time scp /tmp/1gb.dat 192.168.109.1:/tmp/
 1068  history | grep time
 1069  time scp /tmp/1gb.dat 192.168.109.1:/tmp/
 1189  history | grep time
[root@ustchscbeflx09 nicswap]# time scp /tmp/1gb.dat 192.168.109.1:/tmp/

1gb.dat                                                                                        100% 1024MB  11.3MB/s   01:31

real    1m31.914s
user    0m19.518s
sys    0m2.141s

netxen_nic 2.6.18-238.xx .......................................................................................................

[root@ustchscbeflx09 nicswap]# ethtool -i eth6
driver: netxen_nic
version: 4.0.74
firmware-version: 4.0.556
bus-info: 0000:0a:00.3
[root@ustchscbeflx09 nicswap]# ifdown eth6
[root@ustchscbeflx09 nicswap]# ifup eth6
Cannot set new settings: Input/output error
  not setting speed
  not setting duplex
  not setting autoneg
[root@ustchscbeflx09 nicswap]# uname -a
Linux ustchscbeflx09 2.6.18-238.12.1.el5 #1 SMP Sat May 7 20:18:50 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

[root@ustchscbeflx09 nicswap]# time scp /tmp/1gb.dat 192.168.109.1:/tmp/

1gb.dat                                                                                        100% 1024MB  11.3MB/s   01:31

real    1m31.695s
user    0m19.778s
sys    0m2.235s

nx_nic 2.6.18-238.xx ...........................................................................................................................................

[root@ustchscbeflx09 nicswap]# time scp /tmp/1gb.dat 192.168.109.1:/tmp/

1gb.dat                                                                                        100% 1024MB  11.1MB/s   01:32

real    1m32.078s
user    0m19.623s
sys    0m1.815s
You have new mail in /var/spool/mail/root
[root@ustchscbeflx09 nicswap]# ethtool -i eth6
driver: nx_nic
version: 4.0.556
firmware-version: 4.0.530
bus-info: 0000:0a:00.3
[root@ustchscbeflx09 nicswap]# ifdown eth6
[root@ustchscbeflx09 nicswap]# ifup eth6
[root@ustchscbeflx09 nicswap]# uname -a
Linux ustchscbeflx09 2.6.18-238.12.1.el5 #1 SMP Sat May 7 20:18:50 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

netxen_nic 2.6.18-238.xx ...........................................................................................................................................

[root@ustchscbeflx09 nicswap]# time scp /tmp/1gb.dat 192.168.109.1:/tmp/

1gb.dat                                                                                        100% 1024MB  11.1MB/s   01:32

real    1m31.828s
user    0m19.584s
sys    0m1.923s
You have new mail in /var/spool/mail/root
[root@ustchscbeflx09 nicswap]# ifdown eth6
[root@ustchscbeflx09 nicswap]# ifup eth6
Cannot set new settings: Input/output error
  not setting speed
  not setting duplex
  not setting autoneg
[root@ustchscbeflx09 nicswap]# ethtool -i eth6
driver: netxen_nic
version: 4.0.74
firmware-version: 4.0.530
bus-info: 0000:0a:00.3
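
Comparing these runs mostly comes down to reading `ethtool -i` output, which is a list of "key: value" lines. The sketch below parses that output into a dictionary and flags the one driver/version pair that, in this thread, failed with "Cannot set new settings: Input/output error" on ifup. `parse_ethtool_i`, `seen_failing_here`, and `PROBLEM_DRIVERS` are hypothetical names, and the flag reflects only what was observed here, not a general compatibility rule.

```python
# Parse `ethtool -i` output ("key: value" lines) and flag the driver/version
# pair that failed to apply link settings in this thread. This is only an
# observation from this bug, not a general compatibility rule.

# Observed here: netxen_nic 4.0.74 on 2.6.18-238.12.1.el5 printed
# "Cannot set new settings: Input/output error" on ifup.
PROBLEM_DRIVERS = {("netxen_nic", "4.0.74")}


def parse_ethtool_i(text: str) -> dict[str, str]:
    """Split each line at the first ':' so 'bus-info: 0000:0a:00.3' stays intact."""
    info: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def seen_failing_here(info: dict[str, str]) -> bool:
    return (info.get("driver"), info.get("version")) in PROBLEM_DRIVERS


# Copied from the last `ethtool -i eth6` output above.
sample = """driver: netxen_nic
version: 4.0.74
firmware-version: 4.0.530
bus-info: 0000:0a:00.3"""

info = parse_ethtool_i(sample)
print(info["driver"], info["version"], info["firmware-version"], seen_failing_here(info))
```

On a live system the text could come from `subprocess.run(["ethtool", "-i", "eth6"], capture_output=True, text=True).stdout`.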

Comment 16 Simon Reber 2011-09-07 08:28:23 UTC
If I understood you correctly, are you suggesting that we update the NIC firmware to version 4.0.556 but keep using the netxen_nic module?

If so, can someone please outline the changes made between firmware versions 4.0.544 and 4.0.556? (I can't find any helpful information on www.hp.com.)

Comment 17 Dave Sullivan 2011-09-07 13:03:54 UTC
(In reply to comment #16)
> If I understood you correctly, are you suggesting that we update the NIC
> firmware to version 4.0.556 but keep using the netxen_nic module?
> 
> If so, can someone please outline the changes made between firmware
> versions 4.0.544 and 4.0.556? (I can't find any helpful information on www.hp.com.)

I'm saying: upgrade the firmware to 4.0.556 and use netxen_nic on 5.7 (2.6.18-274).

I still see an issue with firmware 4.0.556 and netxen_nic on 5.6 (2.6.18-238.xx) when negotiating network settings, although performance is the same.

If you have to remain on 5.6 (2.6.18-238.XX), I would still recommend upgrading the firmware to 4.0.556, but you will have to pull the HP nx_nic driver and compile it for your kernel version, as I noted in my first post.

This is ultimately the same upstream driver (provided by QLogic); I'm not sure why HP and Red Hat don't use the same name.

Comment 18 Chad Dupuis (Cavium) 2011-09-12 14:59:52 UTC
(In reply to comment #7)
> Perhaps this HP CA is related? 
> 
> http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&taskId=110&prodSeriesId=3913537&prodTypeId=329290&objectID=c02964542
> 
> Chad, do you know what changes would have been made to hp's nx_nic driver for
> the updated firmware the CA references and if the same changes are in our
> netxen_nic driver?

The change in the drivers is simply to check for the minimum firmware revision in the HP advisory.
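
A minimum-firmware check of this kind comes down to comparing dotted version strings numerically. The Python below only illustrates that idea; it is not the driver's code (the real check lives in the C driver and its details are not shown in this thread). `MIN_FIRMWARE` is set to 4.0.556 as an assumption, because that is the version this bug was eventually closed with; the HP advisory's exact minimum is not quoted here.

```python
# Sketch of a minimum-firmware check. Assumption: MIN_FIRMWARE = 4.0.556
# (the version this bug was closed with), not a value taken from the driver.
MIN_FIRMWARE = "4.0.556"


def version_tuple(version: str) -> tuple[int, ...]:
    """'4.0.556' -> (4, 0, 556), so comparison is numeric per component."""
    return tuple(int(part) for part in version.split("."))


def firmware_ok(firmware: str, minimum: str = MIN_FIRMWARE) -> bool:
    return version_tuple(firmware) >= version_tuple(minimum)


for fw in ("4.0.530", "4.0.544", "4.0.556"):  # firmware versions mentioned in this thread
    print(fw, "ok" if firmware_ok(fw) else "below minimum")
```

Comparing integer tuples rather than raw strings matters: as plain strings, "4.0.1000" would sort before "4.0.556".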

Comment 21 rajesh.borundia 2011-09-16 17:51:52 UTC
The duplex-setting changes/fixes went upstream after the RHEL 5.6 submission.
That is why it works with the RHEL 5.7 inbox driver and not with RHEL 5.6.

Can we close this bug now?

Comment 22 Dave Sullivan 2011-09-16 18:08:31 UTC
Rajesh, from my perspective I would say yes.

But Simon mentioned some performance problems.  However I didn't notice any performance issues with my testing on 5.7 and the updated firmware.

-Dave

Comment 23 Simon Reber 2011-09-21 11:39:09 UTC
(In reply to comment #22)
> Rajesh, from my perspective I would say yes.
> 
> But Simon mentioned some performance problems.  However I didn't notice any
> performance issues with my testing on 5.7 and the updated firmware.
> 
> -Dave
Indeed, I do have some performance issues, which were not resolved by upgrading to firmware version 4.0.556.

I've now changed both the cable and the switch port, and the performance is still very bad.

Red Hat Professional Support suggests updating the kernel to version 2.6.18-274.3.1, since driver updates went into that version. Let's see if this helps.

The strange thing is that I see lots of retransmitted packets as well as lost segments, which usually indicates cable, MTU, or other physical issues. But as mentioned, I changed everything, and the issue still isn't resolved!

Anyway, I think the main issue (for which the case was created) has been resolved, and it's OK to close the case (I also have an open case with Red Hat Professional Support).
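
One way to put a number on "lots of retransmitted packets" is to read the TCP counters that Linux exposes in `/proc/net/snmp`, where a `Tcp:` line naming the counters (including `OutSegs` and `RetransSegs`) is followed by a `Tcp:` line of values. The counters are cumulative since boot, so this sketch diffs two snapshots taken around a test transfer. The helper names are hypothetical, and the snapshot values in the example are made up for illustration; they are not measurements from this system.

```python
# Measure TCP retransmissions over an interval using /proc/net/snmp.
# Counters are cumulative since boot, so two snapshots are diffed.


def tcp_counters(snmp_text: str) -> dict[str, int]:
    """Map the Tcp: counter names in /proc/net/snmp text to integer values."""
    rows = [line.split()[1:] for line in snmp_text.splitlines() if line.startswith("Tcp:")]
    names, values = rows[0], rows[1]  # first Tcp: line = names, second = values
    return dict(zip(names, (int(v) for v in values)))


def retransmit_ratio(before: dict[str, int], after: dict[str, int]) -> float:
    """RetransSegs relative to OutSegs over the interval between two snapshots."""
    sent = after["OutSegs"] - before["OutSegs"]
    retransmitted = after["RetransSegs"] - before["RetransSegs"]
    return retransmitted / sent if sent else 0.0


def snapshot(path: str = "/proc/net/snmp") -> dict[str, int]:
    with open(path) as f:
        return tcp_counters(f.read())


# Illustrative, made-up snapshot text (not data from this bug).
HEADER = ("Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens "
          "AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts")
before = tcp_counters(HEADER + "\nTcp: 1 200 120000 -1 10 5 0 0 2 10000 8000 40 0 3\n")
after = tcp_counters(HEADER + "\nTcp: 1 200 120000 -1 12 5 0 0 2 60000 48000 2040 0 3\n")
print(f"{retransmit_ratio(before, after):.1%}")  # prints 5.0%
```

On the affected host you would call `snapshot()` before and after the scp test and pass both results to `retransmit_ratio`.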

Simon

Comment 24 Chad Dupuis (Cavium) 2011-09-21 14:06:37 UTC
Closing.  Resolved with RHEL 5.7 inbox driver and netxen firmware version 4.0.556.

