Bug 46291

Summary: (NET TULIP) 2.4.3-12 kills my network
Product: [Retired] Red Hat Linux Reporter: Michal Jaegermann <michal>
Component: kernel Assignee: Jeff Garzik <jgarzik>
Status: CLOSED CURRENTRELEASE QA Contact: Brock Organ <borgan>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.1 CC: len, patricio_zuniga, peterm
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2004-09-30 15:39:03 UTC Type: ---
Attachments:
Description                                                      Flags
tulip-diag output - de4x5 loaded and working card                none
tulip-diag output - tulip reloaded after de4x5 and network dead  none
tulip-diag output - tulip loaded after boot, network dead        none

Description Michal Jaegermann 2001-06-27 17:37:12 UTC
(See possibly also #40394 and #44158)

After I updated an Athlon machine to the 7.1 distribution with the 2.4.3-12
kernel, my networking, which had worked reliably for over a year with the tulip
driver, died.  It can be restored, apart from occasional hiccups, by
switching to the de4x5 module.

Attached are outputs from 'tulip-diag -aa -mm -ee -f' for my card
with the 2.4.3-12 kernel in the following situations:

tdiag.with.de4x5        - when de4x5 module is loaded
tdiag.tulip.after.de4x5 - tulip loaded after de4x5 was running
tdiag.with.tulip        - tulip loaded freshly after reboot

In the last case the output varied between attempts, for example (a diff
between two different tries):

--- tdiag.with.tulip	Wed Jun 27 09:39:51 2001
+++ tdiag.with.tulip.too	Wed Jun 27 09:46:58 2001
@@ -2,7 +2,7 @@
  http://www.scyld.com/diag/index.html
 Index #1: Found a Digital DS21143 Tulip adapter at 0xc400.
 Digital DS21143 Tulip chip registers at 0xc400:
- 0x00: f8a08000 ffffffff ffffffff 070b7000 070b7200 f0000000 b2420200 fbfffbff
+ 0x00: f8a08000 ffffffff ffffffff 05bdd000 05bdd200 f0000000 b2420200 fbfffbff
  0x40: e0000000 fff483ff ffffffff 00000000 000060ca ffff0001 fffbffff 8ffd0000
  Port selection is 10mpbs-serial, full-duplex.
  Transmit stopped, Receive stopped, full-duplex.
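To read the register rows in the diff above, a minimal Python sketch can decode the two words behind tulip-diag's status line. It assumes the DEC 21143 CSR layout as used by the Linux tulip driver: registers sit 8 bytes apart, so the 0x00 row holds CSR0 through CSR7; CSR6 bit 1 starts the receiver, bit 9 selects full duplex, bit 13 starts the transmitter and bit 18 selects the port; CSR5 bits 22:20 and 19:17 hold the transmit and receive process states, with 0 meaning stopped. The helpers `parse_row` and `decode_csr` are hypothetical, not part of tulip-diag.

```python
"""Decode a few DEC 21143 (Tulip) CSR fields from one tulip-diag register row.

Bit positions are assumptions based on the 21143 CSR layout as used by the
Linux tulip driver; only the fields behind tulip-diag's status line are decoded.
"""

# CSR6 (operation mode register) bits
CSR6_SR = 1 << 1    # start/stop receive
CSR6_FD = 1 << 9    # full-duplex mode
CSR6_ST = 1 << 13   # start/stop transmit
CSR6_PS = 1 << 18   # port select: 0 = 10 Mbps serial port, 1 = MII/SYM port


def parse_row(line):
    """Turn ' 0x00: f8a08000 ffffffff ...' into a list of 32-bit register values."""
    _, _, words = line.partition(":")
    return [int(w, 16) for w in words.split()]


def decode_csr(csr5, csr6):
    """Summarise Tx/Rx state, duplex and port from CSR5 (status) and CSR6 (mode)."""
    return {
        "tx_started": bool(csr6 & CSR6_ST),
        "rx_started": bool(csr6 & CSR6_SR),
        "full_duplex": bool(csr6 & CSR6_FD),
        "port": "MII/SYM" if csr6 & CSR6_PS else "10mbps-serial",
        # CSR5 process-state fields; 0 means 'Stopped'
        "tx_state": (csr5 >> 20) & 0x7,
        "rx_state": (csr5 >> 17) & 0x7,
    }


if __name__ == "__main__":
    # The 0x00 row holds CSR0..CSR7, so CSR5 is index 5 and CSR6 is index 6.
    first = parse_row(" 0x00: f8a08000 ffffffff ffffffff 070b7000 070b7200 f0000000 b2420200 fbfffbff")
    second = parse_row(" 0x00: f8a08000 ffffffff ffffffff 05bdd000 05bdd200 f0000000 b2420200 fbfffbff")
    print(decode_csr(first[5], first[6]))
    # Which CSRs changed between the two tries?
    print([f"CSR{i}" for i in range(len(first)) if first[i] != second[i]])
```

Fed the first row, the sketch reports both processes stopped and full duplex on the 10 Mbps serial port, which agrees with tulip-diag. Only CSR3 and CSR4 differ between the two tries; under the same layout assumption these are the receive and transmit descriptor list base addresses, which would be expected to move from one driver load to the next.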

So far the _only_ tulip driver version for the 2.4 series that works reliably
for me is 0.9.14d, and it has not given me any trouble.

  Michal
  michal

Comment 1 Michal Jaegermann 2001-06-27 17:38:23 UTC
Created attachment 21967 [details]
tulip-diag output - de4x5 loaded and working card

Comment 2 Michal Jaegermann 2001-06-27 17:40:04 UTC
Created attachment 21968 [details]
tulip-diag output - tulip reloaded after de4x5 and network dead

Comment 3 Michal Jaegermann 2001-06-27 17:54:45 UTC
Created attachment 21969 [details]
tulip-diag output - tulip loaded after boot, network dead

Comment 4 Arjan van de Ven 2001-06-28 08:47:41 UTC
The interesting bit is that 2.4.3-12 has exactly the same tulip version as before,
i.e. 0.9.14..., so why it stopped working is beyond me.

Comment 5 len 2001-07-01 20:14:28 UTC
My VA Linux VarStation 27 has a Digital 21143-chipset 10/100
fast Ethernet card.  It worked fine under Red Hat 7.1 with the 2.4.2-2
kernel.  I upgraded the kernel to 2.4.3-12, and it stopped
working.  I'm not yet sure of the details.  If I
go back to the previous kernel it works again.  I compared the
tulip.o modules for the two kernels and they are exactly the same.

In my case there is an added complication.  With the 2.4.2 kernel,
the USB controller and the network card share IRQ 9, apparently
happily.  I don't know whether this is the problem under 2.4.3.

I will try the de4x5 module to see if that works.


Comment 6 Michal Jaegermann 2001-07-01 21:46:20 UTC
> The interesting bit is that 2.4.3-12 has the exact same tulip version as 
> before ...

Before what?  I have reported on previous occasions that the new tulip
driver is broken (see, for example, #44158, some postings on the
linux-kernel list, and a few other places as well).  This does not
seem to generate much interest.

Once again: of the tulip variants for 2.4 kernels I have tried (including
those from SourceForge), only version 0.9.14d is usable _for me_.
I have seen more independent reports of that kind, so I am far from being
unique.


Comment 7 Leonard Evens 2001-07-02 01:01:16 UTC
I think the comment "same as before" probably meant that the tulip.o files
in /lib/modules/2.4.2-2 and /lib/modules/2.4.3-12
are exactly the same.  I checked with cmp and they are identical.

So whatever the problem is, it appears not to be the module itself,
but of course its interaction with other parts of the kernel could
cause the problem.
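The module comparison described above can also be scripted. A minimal sketch using Python's standard `filecmp` and `hashlib` modules: `same_module` does a byte-for-byte comparison like `cmp`, and `sha1_of` prints a digest that is easy to paste into a bug comment. Both helper names are hypothetical, and the module paths are taken from the command line because the exact location of tulip.o under /lib/modules is not given here.

```python
import filecmp
import hashlib
import sys


def sha1_of(path):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def same_module(a, b):
    """Byte-for-byte comparison, equivalent to `cmp a b` exiting with status 0."""
    return filecmp.cmp(a, b, shallow=False)


if __name__ == "__main__":
    # Usage (paths are placeholders): python samemod.py OLD/tulip.o NEW/tulip.o
    if len(sys.argv) == 3:
        old, new = sys.argv[1], sys.argv[2]
        print("identical" if same_module(old, new) else "different")
        for path in (old, new):
            print(sha1_of(path), path)
    else:
        print("usage: samemod.py OLD_MODULE NEW_MODULE")
```

As the comment itself notes, identical module files only rule out a change in the driver object; the surrounding kernel can still behave differently.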

I tried using the de4x5 module instead, and it appears to work.
On further checking I find my current ethernet card is a DE500 PCI
10/100 fast ethernet, so that is not surprising.  However, the tulip
module is supposed to work with this card and it is what was
chosen automatically by kudzu under the 2.4.2 kernel when I
put that ethernet card in (replacing a 3Com card I had there
before).

I think there is something definitely wrong here.

Comment 8 Leonard Evens 2001-07-02 14:23:59 UTC
Here are some additional facts I left out previously.

My machine has a DEC DE500 10/100 fast ethernet card, and it is connected
to a Linksys DSL/Cable router.   It worked fine under RH7.1 with the
kernel 2.4.2-2.

The DSL/Cable router acts as a DHCP server.  With eth0 aliased to tulip
under 2.4.3-12, /var/log/messages shows the same messages as under
2.4.2, but ifup apparently fails.  During this process the lights on
the DSL/Cable router blink on and off, and they keep doing so even after
the attempt to start eth0 times out.  Because of that, I conjecture that
with the 2.4.3 kernel the tulip module is not establishing a proper link
to the DSL/Cable router, so ifup naturally fails.  Also,
/proc/interrupts shows no IRQ assigned to the network card, whereas before
it always showed it sharing IRQ 9 with the USB controller.  If
I alias eth0 to de4x5 instead, everything works properly, except that
according to the lights on the DSL/Cable router, it runs only in half duplex.
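The /proc/interrupts observation can be checked the same way on each kernel. A minimal sketch, assuming the usual layout where each line is an IRQ number, per-CPU counts, the controller type, and then a comma-separated list of the device names that registered a handler; `irqs_for` is a hypothetical helper. One caveat: the Linux tulip driver requests its IRQ when the interface is opened and frees it on close, so if ifup takes eth0 down again after the DHCP timeout, a missing entry may be expected rather than a separate IRQ fault.

```python
import sys


def irqs_for(device, text):
    """Return the IRQ labels whose device list in /proc/interrupts names `device`."""
    found = []
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # the header line (CPU0 CPU1 ...) has no colon
        # Device names follow the controller type and are comma-separated,
        # e.g. "  9:   4321   XT-PIC  usb-uhci, eth0"
        names = [tok.strip(",") for tok in rest.split()]
        if device in names:
            found.append(label.strip())
    return found


if __name__ == "__main__":
    dev = sys.argv[1] if len(sys.argv) > 1 else "eth0"
    try:
        with open("/proc/interrupts") as f:
            irqs = irqs_for(dev, f.read())
    except OSError:
        print("/proc/interrupts is not readable on this system")
    else:
        print(f"{dev}: IRQ {', '.join(irqs)}" if irqs else f"{dev}: no IRQ registered")
```

Running it once under 2.4.2-2 and once under 2.4.3-12, while the interface is supposed to be up, would show whether eth0 is still sharing IRQ 9 with the USB controller.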

Of course, the problem could instead lie with pump or the
network startup scripts.

Comment 9 Michal Jaegermann 2001-07-02 14:52:19 UTC
Are you sure that 'ifconfig' fails for you?  In all the cases I have seen
so far, 'ifconfig' succeeds and the card is up, but left in a "Transmitter stopped"
state.

You can collect more information about your card status with 'tulip-diag'
which you can find on http://sourceforge.net/projects/tulip/ in tulip
driver sources or on http://www.scyld.com (slightly different version).

Comment 10 Leonard Evens 2001-07-02 16:04:27 UTC
I got tulip-diag and ran it, but I'm not sure what it is telling me.
Here is part of it when run under 2.4.3, with the network down since it never
started.
Index #1: Found a Digital DS21143 Tulip adapter at 0x1400.
 Port selection is 10mpbs-serial, full-duplex.
 Transmit stopped, Receive stopped, full-duplex.
  The Rx process state is 'Stopped'.
  The Tx process state is 'Stopped'.
  The transmit threshold is 72.
  The NWay status register is 45e192ce.
.
.
.
Internal autonegotiation state is 'Transmit disabled'.

When run under 2.4.2, here is what I get

Index #1: Found a Digital DS21143 Tulip adapter at 0x1400.
 Port selection is 100mbps-SYM/PCS 100baseTx scrambler, full-duplex.
 Transmit started, Receive started, full-duplex.
  The Rx process state is 'Waiting for packets'.
  The Tx process state is 'Idle'.
  The transmit threshold is 128.
  The NWay status register is 45e1d2cc.
  Internal autonegotiation state is 'Negotiation complete'.

It should be noted that under 2.4.3 the full-duplex and 100 Mbps lights
on the DSL/cable router keep blinking on and off in unison, but according
to tulip-diag the Ethernet card is set to 10 Mbps.

What do I make of all this, and is there anything else to try to get it
to work?





Comment 11 Michal Jaegermann 2001-07-02 16:44:17 UTC
>  Transmit stopped, Receive stopped, full-duplex.

Yes, this is the same failure I see elsewhere, with no Linksys in sight.

The output from tulip-diag does not tell you very much but, hopefully, is
more meaningful to somebody with knowledge of tulip internals.
You can get more with extra flags to tulip-diag, but it will likely be
similar to what I already filed as attachments.


Comment 12 Michal Jaegermann 2001-09-05 02:30:39 UTC
The problem is still the same with the 2.4.7-6 kernel and the betas of the upcoming
7.2 distribution.  To make it even more exciting, after replacing the broken
tulip with the "alternative" (not quite, but say...) de4x5 driver, one sees
the following:

# ifup eth0
RTNETLINK answers: Illegal seek

but the network seems to be operational regardless.

Just a reminder that 'tulip' did work for a long time until it got "improved".


Comment 13 Arjan van de Ven 2001-09-05 07:16:42 UTC
2.4.7-6 has the original 7.1 tulip driver available as tulip_old.o

Comment 14 Bugzilla owner 2004-09-30 15:39:03 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/