Bug 429008

Summary: sky2 network driver hangs under heavy load.
Product: Red Hat Enterprise Linux 5
Reporter: giuseppe bonacci <giuseppe.bonacci>
Component: kernel
Assignee: Neil Horman <nhorman>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: low
Priority: low
Version: 5.1
CC: ad2clark, dbunt, herbert.xu, jolsa, nhorman, tgraf, uwe
Target Milestone: rc
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-11-03 18:58:03 UTC

Description giuseppe bonacci 2008-01-16 19:16:48 UTC
Description of problem:

On an Acer Veriton 7800, equipped with a Marvell Yukon ethernet adapter, the
"sky2" driver hangs after a few minutes under heavy network traffic.  The
interface is effectively disabled until the module is unloaded from the kernel
and reloaded again.  

I'm reporting the bug because I think I found a workaround that might be useful
to other people: loading the module with "modprobe sky2 disable_msi=1" (or
configuring the option in modprobe.conf) apparently solves the problem.
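
The workaround above can be sketched as two small shell helpers. This is a minimal sketch, not a tested procedure: the function names persist_sky2_workaround and reload_sky2 are hypothetical, and the default path /etc/modprobe.conf is assumed from the RHEL 5 convention mentioned in the report. Both need root, and reloading the module drops the link briefly.

```shell
# Hypothetical helper: add the option line to modprobe.conf once.
# $1 is the config file; the default assumes the RHEL 5 location.
persist_sky2_workaround() {
    conf="${1:-/etc/modprobe.conf}"
    line="options sky2 disable_msi=1"
    # Append only if the exact line is not already present.
    grep -qxF "$line" "$conf" 2>/dev/null || echo "$line" >> "$conf"
}

# Hypothetical helper: unload and reload the driver with MSI disabled.
# Interfaces using sky2 must be down before the module can be removed.
reload_sky2() {
    modprobe -r sky2 && modprobe sky2 disable_msi=1
}
```

Run persist_sky2_workaround followed by reload_sky2 as root. Afterwards, the sky2 line in /proc/interrupts should no longer show an MSI interrupt type if the option took effect.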

Version-Release number of selected component (if applicable):
kernel version 2.6.18-53.1.4.el5 { sky2 version 1.14 }

Additional info:
output from lspci -vn -s 02:00.0

02:00.0 0200: 11ab:4360 (rev 10)
        Subsystem: 1462:6300
        Flags: bus master, fast devsel, latency 0, IRQ 177
        Memory at fddfc000 (64-bit, non-prefetchable) [size=16K]
        I/O ports at ee00 [size=256]
        [virtual] Expansion ROM at fdc00000 [disabled] [size=128K]
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data
        Capabilities: [5c] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable-
        Capabilities: [e0] Express Legacy Endpoint IRQ 0
        Capabilities: [100] Advanced Error Reporting

Comment 1 David Bunt 2009-07-23 14:58:29 UTC
This problem is still present in 5.3. I'm using an ASUS P6T Deluxe V2 (LGA 1366, Intel X58) with 2 onboard sky2 ethernet adapters. The disable_msi=1 option *seems* to have worked, but I would much rather see a real fix than a kernel panic message in my log file.

Comment 2 David Bunt 2009-07-24 14:16:34 UTC
uname:
2.6.18-128.2.1.el5 #1 SMP Tue Jul 14 06:36:37 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

lspci:
04:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 12)

04:00.0 0200: 11ab:4364 (rev 12)
	Subsystem: 1043:81f8
	Flags: bus master, fast devsel, latency 0, IRQ 169
	Memory at fbbfc000 (64-bit, non-prefetchable) [size=16K]
	I/O ports at b800 [size=256]
	Expansion ROM at fbbc0000 [disabled] [size=128K]
	Capabilities: [48] Power Management version 3
	Capabilities: [50] Vital Product Data
	Capabilities: [5c] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
	Capabilities: [e0] Express Legacy Endpoint IRQ 0



dmesg:
sky2 eth0: enabling interface
sky2 eth0: ram buffer 0K
ADDRCONF(NETDEV_UP): eth0: link is not ready
sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
eth0: no IPv6 routers present
eth0: hw csum failure.


Call Trace:
 <IRQ>  [<ffffffff8004cfca>] __skb_checksum_complete+0x4a/0x62
 [<ffffffff88f529b1>] :ip_conntrack:udp_error+0x122/0x162
 [<ffffffff88f5123d>] :ip_conntrack:ip_conntrack_in+0x91/0x46a
 [<ffffffff80036dab>] ip_route_input+0x4e3/0xceb
 [<ffffffff80033bcc>] nf_iterate+0x41/0x7d
 [<ffffffff80237283>] ip_rcv_finish+0x0/0x2f7
 [<ffffffff800564d7>] nf_hook_slow+0x58/0xbc
 [<ffffffff80237283>] ip_rcv_finish+0x0/0x2f7
 [<ffffffff80035146>] ip_rcv+0x25b/0x57d
 [<ffffffff8002044a>] netif_receive_skb+0x37f/0x3ab
 [<ffffffff8822bf06>] :sky2:sky2_poll+0x82a/0xb27
 [<ffffffff8000c5c6>] net_rx_action+0xa4/0x1a4
 [<ffffffff80011fc3>] __do_softirq+0x89/0x133
 [<ffffffff8005e2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006cada>] do_softirq+0x2c/0x85
 [<ffffffff8006c962>] do_IRQ+0xec/0xf5
 [<ffffffff80056bac>] mwait_idle+0x0/0x4a
 [<ffffffff8005d615>] ret_from_intr+0x0/0xa
 <EOI>  [<ffffffff80056be2>] mwait_idle+0x36/0x4a
 [<ffffffff80048d9e>] cpu_idle+0x95/0xb8
 [<ffffffff80076c3f>] start_secondary+0x45a/0x469

eth0: hw csum failure.

Call Trace:
 <IRQ>  [<ffffffff8004cfca>] __skb_checksum_complete+0x4a/0x62
 [<ffffffff88f529b1>] :ip_conntrack:udp_error+0x122/0x162
 [<ffffffff88f5123d>] :ip_conntrack:ip_conntrack_in+0x91/0x46a
 [<ffffffff80036dab>] ip_route_input+0x4e3/0xceb
 [<ffffffff80033bcc>] nf_iterate+0x41/0x7d
 [<ffffffff80237283>] ip_rcv_finish+0x0/0x2f7
 [<ffffffff800564d7>] nf_hook_slow+0x58/0xbc
 [<ffffffff80237283>] ip_rcv_finish+0x0/0x2f7
 [<ffffffff80035146>] ip_rcv+0x25b/0x57d
 [<ffffffff8002044a>] netif_receive_skb+0x37f/0x3ab
 [<ffffffff8822bf06>] :sky2:sky2_poll+0x82a/0xb27
 [<ffffffff8000c5c6>] net_rx_action+0xa4/0x1a4
 [<ffffffff80011fc3>] __do_softirq+0x89/0x133
 [<ffffffff8005e2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006cada>] do_softirq+0x2c/0x85
 [<ffffffff8006c962>] do_IRQ+0xec/0xf5
 [<ffffffff80056bac>] mwait_idle+0x0/0x4a
 [<ffffffff8005d615>] ret_from_intr+0x0/0xa
 <EOI>  [<ffffffff80056be2>] mwait_idle+0x36/0x4a
 [<ffffffff80048d9e>] cpu_idle+0x95/0xb8
 [<ffffffff80076c3f>] start_secondary+0x45a/0x469

Comment 3 Neil Horman 2010-12-16 13:31:29 UTC
Would you mind trying again with the 5.5 update (or preferably 5.6, which is due out soon)?  I ask because several sky2 bugs have been fixed, including a wholesale update of the entire driver from upstream which, among other things, fixed several odd hangs.  Anything after 2.6.18-236.el5 should do fine.
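
A quick way to check whether a machine already runs a kernel at or past that build is sketched below. The function name kernel_has_sky2_update is hypothetical, and treating build 236 itself as new enough is an assumption, since the comment only says "after 2.6.18-236.el5". It only understands RHEL 5 style 2.6.18-N release strings.

```shell
# Hypothetical helper: succeed if the kernel release is 2.6.18-236 or later.
# $1 is a release string; the default is the running kernel (uname -r).
kernel_has_sky2_update() {
    rel="${1:-$(uname -r)}"
    build="${rel#2.6.18-}"               # "2.6.18-128.2.1.el5" -> "128.2.1.el5"
    [ "$build" != "$rel" ] || return 2   # not a 2.6.18-based kernel
    build="${build%%.*}"                 # "128.2.1.el5" -> "128"
    [ "$build" -ge 236 ]                 # assumption: 236 itself counts
}
```

For example, `kernel_has_sky2_update && echo "updated sky2 driver expected"` tests the running kernel, while passing a release string as the first argument tests that string instead.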

Comment 4 giuseppe bonacci 2010-12-16 13:44:38 UTC
I no longer have that hardware available, so I'm not able to help any further.  Perhaps David can?

Comment 5 Neil Horman 2011-11-03 18:58:03 UTC
Closing due to extreme inactivity.