Bug 72472

Summary:	kernel tries to load 8139cp instead of 8139too
Product:	Red Hat Enterprise Linux 2.1	Reporter:	Jos van den Oever <oever>
Component:	kernel	Assignee:	Jeff Garzik <jgarzik>
Status:	CLOSED WONTFIX	QA Contact:	Brian Brock <bbrock>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	2.1	CC:	maspotts, peterm, trevor
Target Milestone:	---
Target Release:	---
Hardware:	i686
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2006-10-01 05:17:03 UTC	Type:	---

Description Jos van den Oever 2002-08-24 09:32:04 UTC

Description of Problem:
On my Compaq Presario 1926, I have a PCMCIA network card. This card needs the
8139too kernel module to work. Unfortunately, the kernel wants to load 8139cp.
This is not the correct driver, and it knows it. This is what it says:

8139too Fast Ethernet driver 0.9.25
Linux Kernel Card Services 3.1.22
  options:  [pci] [cardbus] [pm]
PCI: Found IRQ 9 for device 00:08.0
PCI: Sharing IRQ 9 with 00:08.1
PCI: Sharing IRQ 9 with 01:00.0
PCI: Found IRQ 9 for device 00:08.1
PCI: Sharing IRQ 9 with 00:08.0
PCI: Sharing IRQ 9 with 01:00.0
Yenta IRQ list 08b0, PCI irq9
Socket status: 30000020
Yenta IRQ list 08b0, PCI irq9
Socket status: 30000006
cs: cb_alloc(bus 2): vendor 0x10ec, device 0x8139
PCI: Enabling device 02:00.0 (0000 -> 0003)
8139cp 10/100 PCI Ethernet driver v0.0.7 (Feb 27, 2002)
8139cp: pci dev 02:00.0 (id 10ec:8139 rev 10) is not an 8139C+ compatible chip
8139cp: Try the "8139too" driver instead.
cs: IO port probe 0x0c00-0x0cff: clean.
cs: IO port probe 0x0100-0x04ff: excluding 0x378-0x37f 0x4d0-0x4d7
cs: IO port probe 0x0a00-0x0aff: clean.


Version-Release number of selected component (if applicable):
(null)

How Reproducible:
Problem at boot time. Removing and reinserting the card after bringing up the
network gives no problems.

Actual Results:
The network is not initialised on boot

Expected Results:


Additional Information:
The card does work with 8139too.
SuSE 8.0 has the same problem.

Comment 1 Bill Nottingham 2002-08-26 05:11:26 UTC

This is because they're both listed in the hotplug map...
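
To make the overlap concrete, here is a minimal sketch that lists PCI IDs claimed by more than one module. It assumes the usual modules.pcimap column layout (module, vendor, device, ...); the sample text and the function names are hypothetical, not taken from a real system.

```python
from collections import defaultdict

# Hypothetical excerpt, assuming the column layout:
# module vendor device subvendor subdevice class class_mask driver_data
SAMPLE_PCIMAP = """\
# pci module  vendor     device     subvendor  subdevice  class      class_mask driver_data
8139cp        0x000010ec 0x00008139 0xffffffff 0xffffffff 0x00000000 0x00000000 0x0
8139too       0x000010ec 0x00008139 0xffffffff 0xffffffff 0x00000000 0x00000000 0x0
ne2k-pci      0x000010ec 0x00008029 0xffffffff 0xffffffff 0x00000000 0x00000000 0x0
"""


def modules_by_device(pcimap_text):
    """Map (vendor, device) to the list of modules that claim that ID."""
    claims = defaultdict(list)
    for line in pcimap_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comment
        fields = line.split()
        vendor, device = int(fields[1], 16), int(fields[2], 16)
        claims[(vendor, device)].append(fields[0])
    return claims


def conflicts(pcimap_text):
    """Return only the IDs that more than one module claims."""
    return {ids: mods
            for ids, mods in modules_by_device(pcimap_text).items()
            if len(mods) > 1}


if __name__ == "__main__":
    for (vendor, device), mods in conflicts(SAMPLE_PCIMAP).items():
        print(f"{vendor:04x}:{device:04x} claimed by {', '.join(mods)}")
```

With the sample above, only 10ec:8139 is reported, because both 8139cp and 8139too list it.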

Comment 2 Mike Potts 2002-10-18 17:50:40 UTC

This has just started happening to me after installing the new errata
kernel 2.4.18-17-7.x on my Sony VAIO PCG-XG19.  A work-around is to
run 'modprobe 8139too' after booting.  But for some reason if I add
'alias eth0 8139too' to /etc/modules.conf it doesn't work.  Is there
a correct way to specify a different driver in modules.conf?

Comment 3 Arjan van de Ven 2002-10-18 17:55:20 UTC

Jeff... your backyard ;)

Comment 4 Jeff Garzik 2002-10-18 18:36:54 UTC

I need to extend the kernel's PCI map to include PCI revision id, and patch
modutils to support it.
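
A rough sketch of what revision-aware matching might look like, written in Python purely for illustration. The 0x20 threshold for 8139C+ chips is an assumption, not something stated in this report; the log above only shows that the card reporting "rev 10" (assumed here to be hexadecimal) is rejected by 8139cp. The function name is hypothetical.

```python
# Assumption: chips with a PCI revision of 0x20 or higher are 8139C+ parts.
# This threshold is illustrative and is not confirmed anywhere in this report.
CPLUS_MIN_REVISION = 0x20

REALTEK_VENDOR = 0x10EC
RTL8139_DEVICE = 0x8139


def pick_driver(vendor, device, revision):
    """Choose a module for a 10ec:8139 device using the revision ID as well."""
    if (vendor, device) != (REALTEK_VENDOR, RTL8139_DEVICE):
        return None  # not a device either driver claims
    if revision >= CPLUS_MIN_REVISION:
        return "8139cp"
    return "8139too"
```

A map that carried the revision would let the loader pick a single module up front instead of trying both.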

Comment 5 Mike Potts 2002-11-18 03:27:43 UTC

Any idea when that modification will make it into a new errata
kernel for RedHat 7.1?

Comment 6 Peter Surda 2003-01-11 23:58:49 UTC

I have a similar problem, I have RH7.2 and a "Surecom EP-428X" Cardbus card.
There is a very simple workaround, edit
/lib/modules/2.4.18-18.7.x/modules.pcimap and delete the line that begins with
"8139cp". BTW it worked without problems with older kernels (e.g. 2.4.9-31).

This is an annoyance. I recommend keeping the line deleted until a proper
solution can be found.

MfG shurdeek
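
The workaround above can be sketched as a small filter that drops every map line whose first field is 8139cp. This is only an illustration: the function name and the shortened sample text are hypothetical, and the sketch works on a string rather than editing the real file in place.

```python
def drop_module_lines(pcimap_text, module="8139cp"):
    """Return pcimap_text without the lines whose first field is `module`."""
    kept = []
    for line in pcimap_text.splitlines(keepends=True):
        fields = line.split()
        if fields and fields[0] == module:
            continue  # this is the entry the workaround deletes
        kept.append(line)
    return "".join(kept)


if __name__ == "__main__":
    # Hypothetical, shortened map lines (the real file has more columns).
    sample = (
        "# pci module vendor device\n"
        "8139cp 0x000010ec 0x00008139\n"
        "8139too 0x000010ec 0x00008139\n"
    )
    print(drop_module_lines(sample), end="")
```

Matching on the whole first field, rather than on a line prefix, avoids touching any other module whose name merely starts with the same characters.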

Comment 7 Trevor Cordes 2006-10-01 04:32:36 UTC

This bug exists (reintroduced?) in FC5 2.6.17-1.2187_FC5smp!  I just upgraded a
box from FC3 to FC4 to FC5 and hit this exact bug.  Took me forever to figure
out as the symptoms were very strange.

This bug did NOT exist in FC3, at least on this system, which was running FC3
for over 1.5 years.  It may have existed in FC4 for the brief time (hours) I was
in transition through FC4, but I can't be certain.

The solution of removing the 8139cp line in pcimap works perfectly.  I tested
with over a dozen boots with and without the line and can confirm that the
entire bug hinges on the presence of that line.  Note, that if the line is NOT
deleted, sometimes the bug does NOT show up.  I'd say 1 out of 5 boots will work
even with the line intact.  Take out the line, and 100% of boots will work.  I
do believe there's some race condition in the order of module loading.

My NIC config: ne2k-pci as eth0; 8139too as eth1.

Here's a dmesg snippet from a "good" boot (pcimap line deleted):
ne2k-pci.c:v1.03 9/22/2003 D. Becker/P. Gortmaker
  http://www.scyld.com/network/ne2k-pci.html
eth0: RealTek RTL-8029 found at 0xa800, IRQ 137, 52:54:05:F6:1B:D0.
gameport: EMU10K1 is pci0000:00:09.1/gameport0, io 0xb400, speed 1169kHz
piix4_smbus 0000:00:04.3: Found 0000:00:04.3 device
8139too Fast Ethernet driver 0.9.27
eth1: RealTek RTL8139 at 0xd08fc000, 00:40:f4:15:2d:1c, IRQ 145
eth1:  Identified 8139 chip type 'RTL-8139C'
8139cp: 10/100 PCI Ethernet driver v1.2 (Mar 22, 2004)

Note, ne2k loads first and grabs eth0, as it should.

Snippet from a buggy boot (pcimap line intact):
8139too Fast Ethernet driver 0.9.27
eth0: RealTek RTL8139 at 0xd0836000, 00:40:f4:15:2d:1c, IRQ 145
eth0:  Identified 8139 chip type 'RTL-8139C'
gameport: EMU10K1 is pci0000:00:09.1/gameport0, io 0xb400, speed 1193kHz
8139cp: 10/100 PCI Ethernet driver v1.2 (Mar 22, 2004)
Linux video capture interface: v1.00
piix4_smbus 0000:00:04.3: Found 0000:00:04.3 device
SCSI subsystem initialized
ne2k-pci.c:v1.03 9/22/2003 D. Becker/P. Gortmaker
  http://www.scyld.com/network/ne2k-pci.html
eth1: RealTek RTL-8029 found at 0xa800, IRQ 137, 52:54:05:F6:1B:D0.

Note that 8139too captures eth0 in defiance of "alias eth0 ne2k-pci" in modprobe.conf.

The symptom I see on the buggy boots is that neither network interface will
function.  eth0 won't get its WAN DHCP lease.  The LAN eth1 will show its static
IP but cannot send or receive any traffic except for rx broadcasts.  ifdown'ing
and ifup'ing till the cows come home has no effect.

I can fix the problem on a buggy boot by ifdown'ing and rmmod'ing both NICs and
ifup'ing the ne2k eth0 first.

Should this bug be switched to FC5 to indicate it's not a stale RHEL2 bug?

Comment 8 Trevor Cordes 2006-10-01 04:42:36 UTC

In case it wasn't clear, unlike the original poster, my system is a standard
desktop with PCI cards, not a laptop with PCMCIA cards.

Comment 9 Jeff Garzik 2006-10-01 05:17:03 UTC

This is not a kernel problem.  The kernel is not the entity that chooses to load
one driver or another.

Comment 10 Trevor Cordes 2006-10-01 20:47:30 UTC

Huh?  Instead of closing the bug, why not then reassign it to the correct
component?  If not the kernel, then what?

This is a serious bug that causes total network failure and leaves a machine
remotely inaccessible.  It must be a "bug" because on unchanged/untouched
hardware that ran perfectly in FC1 and FC3, all of a sudden it fails in FC5.

Deleting a line in modules.pcimap every time I yum update the kernel hardly
seems ideal.

Comment 11 Trevor Cordes 2006-10-01 22:25:55 UTC

Perhaps my bug is actually a new bug that just happens to share the solution to
the original bug?

I looked in rc.sysinit to see how FC5 loads these modules vs FC3.  It appears to
be quite different and hints that the problem could be in udev (/sbin/start_udev).

Any hints on how to proceed are appreciated.

Comment 12 Trevor Cordes 2006-10-01 22:32:42 UTC

Perhaps related (but not identical to) bug 178165.

Comment 13 Trevor Cordes 2006-10-01 23:05:13 UTC

Perhaps related: I just checked the problem system vs another FC5 system and was
surprised to notice that the modules.pcimap has the 8139xx entries in a different
order.  And both these systems are running the exact same kernel!  I suppose
that pcimap file is generated by something else.  I'm wondering if the order of
8139too vs 8139cp may impact whether this bug gets hit or not.

Comment 14 Trevor Cordes 2006-10-01 23:22:34 UTC

As per bug 178165, adding HWADDR fields to the ifcfg-ethX files does indeed
appear to solve the problem.  I just did so and tested it and the modules are
loaded in the correct order (ne2k then 8139too) and eth0/eth1 are assigned
correctly.
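
For anyone wanting to reproduce this, a minimal sketch of the resulting ifcfg contents, using the MAC addresses from the dmesg snippets in comment 7. The helper name is hypothetical, and the BOOTPROTO and ONBOOT lines are assumptions about the rest of the configuration, not a copy of the real files.

```python
def render_ifcfg(device, hwaddr, extra=None):
    """Build ifcfg-ethX text that pins `device` to the NIC with MAC `hwaddr`."""
    lines = [f"DEVICE={device}", f"HWADDR={hwaddr}"]
    for key, value in (extra or {}).items():
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # MAC addresses as printed in the dmesg snippets from comment 7.
    # ne2k-pci (RTL-8029) should be eth0, 8139too (RTL-8139C) should be eth1.
    print(render_ifcfg("eth0", "52:54:05:F6:1B:D0",
                       {"BOOTPROTO": "dhcp", "ONBOOT": "yes"}))
    print(render_ifcfg("eth1", "00:40:f4:15:2d:1c", {"ONBOOT": "yes"}))
```

With HWADDR present, the interface name follows the MAC address rather than whichever module happens to load first.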

I think the "bug" here is the change in human interface from using modprobe.conf
to specify ethX aliases to using HWADDR.  modprobe.conf is not only largely
ignored (by udev), but it still seems to be used by ifup/ifdown/etc. and thus
really messes things up by creating conflicts.

Comment 15 Adam Thompson 2006-10-02 03:31:42 UTC

So what *does* control module loading order?  In FC5, it appears to be udev
(/sbin/start_udev, to be precise).
Obviously there are two "correct" solutions:
 1) fix the pcimap code to [optionally] account for the Revision field, as
suggested by Jeff three years ago, though I can't find any indication that this
has happened yet, and
 2) merge 8139too and 8139cp modules, as there logically should NOT be two
different but NOT equal drivers for the exact same [logically speaking] piece of
hardware.  I believe situations such as aic_7xxx are different because the old
and new drivers provided essentially the same functionality for the same
hardware - the new driver added PCI IDs and dropped other PCI IDs, but for the
ones that were dropped the pcimap did not indicate the new driver to be
authoritative.

How do you copy an entire bug in Bugzilla, so this can be reopened under FC5,
component UDEV ?