Bug 180063

Summary: network device (sky2 driver) stalls
Product: [Fedora] Fedora
Reporter: Charles Lopes <tjarls>
Component: kernel
Assignee: John W. Linville <linville>
Status: CLOSED UPSTREAM
QA Contact: Brian Brock <bbrock>
Severity: medium
Docs Contact:
Priority: medium
Version: 5
CC: davej, kyrsjo, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2006-03-31 14:31:42 UTC

Description Charles Lopes 2006-02-05 10:12:59 UTC
After a period of time that can range from a couple of hours to a couple of
days, the device stops receiving and sending packets. No oops or any other
kernel message is generated. tcpdump on the interface still shows local
packets going to this interface, but "ip -s link" shows no change in RX or TX.
The only way I have found to restore network connectivity is to unload and
reload the kernel module.
The network device is built into an Asus A8V-E motherboard. "lspci" gives this
information:
05:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit
Ethernet Controller (rev 15)
05:00.0 0200: 11ab:4362 (rev 15)
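
The symptom described above (RX and TX counters that stop changing while
traffic is still being queued) could be checked automatically. The following is
only a minimal sketch, not something used in this report: it assumes the
standard Linux /proc/net/dev layout, and the function names
parse_proc_net_dev and is_stalled are hypothetical.

```python
from typing import Dict, Tuple


def parse_proc_net_dev(text: str) -> Dict[str, Tuple[int, int]]:
    """Map interface name -> (rx_packets, tx_packets) from /proc/net/dev text.

    Assumes the usual layout: two header lines without a colon, then one
    line per interface with 8 receive fields followed by 8 transmit fields.
    """
    counters: Dict[str, Tuple[int, int]] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # header lines carry no "name:" prefix
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 10:
            continue  # unexpected format, skip rather than guess
        # fields[1] is RX packets, fields[9] is TX packets
        counters[name.strip()] = (int(fields[1]), int(fields[9]))
    return counters


def is_stalled(before: Tuple[int, int], after: Tuple[int, int]) -> bool:
    """True if neither the RX nor the TX packet counter moved between samples."""
    return before == after
```

To use it, read /proc/net/dev twice a few seconds apart, parse both samples,
and compare the tuples for the interface in question. An idle interface also
shows unchanged counters, so the check only makes sense while traffic is being
generated (for example, a running ping).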


Version-Release number of selected component (if applicable):
All development kernels with the sky2 driver up to kernel-2.6.15-1.1884_FC5. I'm
trying out 1907 at the moment. I observed the same problem with versions 0.9 and
0.10 of sky2 applied to rawhide 2.6.14 kernels. I didn't report it before
because I thought the problem could have been due to my patching. I haven't
tried it with a vanilla kernel yet.

Comment 1 Alexandre Oliva 2006-02-13 12:23:41 UTC
Is this an SMP box?  I'm experiencing very similar problems with my A8V Deluxe
motherboard, with Athlon64X2 processor.  Rawhide uses skge, FC4 uses sk98lin,
and both present the same problem.  Booting with maxcpus=1 appears to work
around it, but it's a shame to not be able to use the second core :-(

Comment 2 Charles Lopes 2006-02-13 12:55:43 UTC
Alexandre, if I'm not mistaken you are using the older SysKonnect Yukon chip,
which is quite different. The Yukon II chip doesn't work with the skge driver.
There is a version of sk98lin that supports it, but it has not been accepted
into the kernel because the new chip was felt to be too different to be handled
by the same driver. To answer your question, my box is uniprocessor, so your
workaround will not work here. In any case I'm using an entirely different
driver, sky2, so the issues are not the same.


Comment 3 Alexandre Oliva 2006-02-13 16:52:00 UTC
Ok, thanks, I've filed a separate report now, bug 181347.

Comment 4 Charles Lopes 2006-02-14 08:12:16 UTC
I have found a discussion on the netdev mailing list that could explain the
issue I'm seeing, as I'm using both netfilter and PPPoE.

http://www.mail-archive.com/netdev@vger.kernel.org/msg06567.html

I'll try that patch in the next couple of days to see if that solves my problem.

Comment 5 John W. Linville 2006-02-20 18:40:15 UTC
FWIW, the above patch is already in rawhide... 

Comment 6 Charles Lopes 2006-02-20 21:40:08 UTC
Yes, and the problem seems to be somewhere else. I've been trying sky2 0.16 with
the latest rawhide kernel over the weekend and it still hangs. I've collected
some data and sent it to Stephen Hemminger. I'll keep this bug updated.

Comment 7 Charles Lopes 2006-02-20 22:24:16 UTC
I'm now testing a pre-release version of sky2 1.0. I'll report the results here
as well.

Comment 8 Charles Lopes 2006-02-24 19:56:29 UTC
Version 1.0-rc1 seems to fix my problem. 80 hours without a stall so far.
Hopefully it'll make it into -rc5 or 2.6.16.



Comment 9 John W. Linville 2006-03-03 03:38:51 UTC
Test kernels w/ sky2 version 1.0-pre1 available here: 
 
   http://people.redhat.com/linville/kernels/fc5/ 
 
Please give those a try and post the results here...thanks! 

Comment 10 Charles Lopes 2006-03-09 21:54:19 UTC
I've been running kernel-2.6.15-1.2009.2.1_FC5.jwltest.12 for over 5 days now
without any problem. So that version fixes my problem too. Thanks.


Comment 11 Kyrre Ness Sjøbæk 2006-03-23 18:07:24 UTC
I have a similar problem with the sky2 driver: the internet connection works
fine, but if I try to connect to a host on the LAN (ssh, nfs...), the connection
hangs within seconds.

The kernel shipped with fc5-t3(64-bit) was fine, but the one shipped with fc5
(32&64-bit) has the problem, and so has the kernel named 2.6.16-1.2070_FC5smp.

Running an Intel P4 system with hyperthreading.

Comment 12 Kyrre Ness Sjøbæk 2006-03-23 19:56:02 UTC
It seems to work just fine with the jwltest kernel. I have been transmitting a
large amount of data (concurrent transfers, just to be evil) over NFS, using the
Internet, and using evolution through X-over-ssh, creating as much stress as I
can for the network card. It works, or at least it works better. I have seen a
few hiccups, but they went away after a few seconds (instead of minutes as with
other kernels).

Comment 13 Kyrre Ness Sjøbæk 2006-03-25 10:59:27 UTC
Just a small question: Will this patch be in the next update kernel?

Comment 14 Kyrre Ness Sjøbæk 2006-03-31 11:44:49 UTC
Bug is still present in (uname -a):
Linux storeulv 2.6.16-1.2080_FC5smp #1 SMP Tue Mar 28 03:55:15 EST 2006 i686
i686 i386 GNU/Linux

Will it be in the next update kernel then? It's quite obvious that the patch is
doing good: it replaces a totally dysfunctional driver with a working one.

Comment 15 John W. Linville 2006-03-31 14:31:42 UTC
The patch will filter into the Fedora kernel from upstream.  Please be 
patient...thanks! 

Comment 16 Kyrre Ness Sjøbæk 2006-04-21 21:57:16 UTC
Hi.
I just updated to
Linux storeulv 2.6.16-1.2096_FC5smp #1 SMP Wed Apr 19 05:31:55 EDT 2006 i686
i686 i386 GNU/Linux
And the bug is still here. Any ETA for when it will hit the Fedora stable kernels?