Bug 112341 - traceroute breaks when used with 3c2000
Summary: traceroute breaks when used with 3c2000
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 9
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact: Mike McLean
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2003-12-18 00:23 UTC by Gabriel Schulhof
Modified: 2007-04-18 17:00 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-01-14 16:50:48 UTC
Embargoed:


Attachments
Simple UDP sendto/recvfrom utility (2.37 KB, application/x-tar)
2003-12-18 00:43 UTC, Gabriel Schulhof

Description Gabriel Schulhof 2003-12-18 00:23:40 UTC
Description of problem:
This problem applies to both RedHat 9 and Fedora Core 1.  The
computers involved have an Asus P4C800 Deluxe motherboard with
on-board Gigabit Ethernet (3c2000).  I downloaded the 3c2000 drivers
from 3Com and compiled them against 2.4.20-24.9 on RH9 and
2.4.22-1.2129.nptl on FC1.  I configured the interfaces using
redhat-config-network, and they work well indeed (ping, ssh, NIS+, NFS
all work).  Unfortunately, for some strange reason, traceroute doesn't
work!

IOW:  We have 4 nearly identical boxen based on the above-mentioned
motherboard connected to one another via a Linksys Gigabit Ethernet
switch.  They are set up identically, except that one does firewalling
and NAT-ing to the outside world.  When they are all up and running,
they get IPs as follows:

192.168.2.2
192.168.2.3
192.168.2.4 <-- NAT/fw/dhcp machine, and default gw for all others
192.168.2.5

Thus, ping 192.168.2.3 from 192.168.2.5 does work, but traceroute
192.168.2.3 from 192.168.2.5 doesn't.
Version-Release number of selected component (if applicable):
     RedHat 9: traceroute-1.4a12-9
               kernel-smp-2.4.20-24.9
Fedora Core 1: traceroute-1.4a12-20.1
               kernel-2.4.22-1.2129.nptl
How reproducible:
Always

Steps to Reproduce:
1. Set up 2 machines using the hardware described and connect them via
   either cross-linked CAT6 or via a Gigabit Ethernet switch.

2. Install either Fedora Core 1 or RedHat 9.

3. Download the 3c2000 drivers from 3Com.

4. Compile them against any RedHat 9 or Fedora Core 1 kernel.

5. insmod 3c2000.o

6. Bring up the 2 interfaces with, say

   ifconfig eth0 192.168.1.1 netmask 255.255.255.0 up

   and

   ifconfig eth0 192.168.1.2 netmask 255.255.255.0 up

7. On, say, 192.168.1.1, type

   traceroute -n 192.168.1.2

8. Watch it hang.

Actual results:
Traceroute hangs when attempted across two computers connected via
3c2000 cards.

Expected results:
Traceroute works properly whenever ping, ssh, NFS, and NIS+ all work
properly.

Additional information:
I have another system running Gentoo with a vanilla kernel (2.4.23)
and traceroute 1.4a12.  I created a boot cd using a minimal kernel and
a copy of the traceroute binary, plus copies of all other binaries
necessary for a minimal system having only bash, route, ifconfig,
traceroute, insmod, etc., and when I booted two of the computers and
brought up the interfaces as described above, traceroute worked just fine.

Comment 1 Gabriel Schulhof 2003-12-18 00:37:58 UTC
P.S:
The problem seems to be UDP-related.  When performing traceroute -I
(that is, using ICMP ECHO packets), traceroute works fine.
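
For context: a default traceroute run sends UDP datagrams with increasing
TTLs to high-numbered ports and waits for ICMP replies, while -I sends
ICMP ECHO requests instead.  The sending half of one UDP probe might be
sketched in Python roughly as follows.  This is an illustration only, not
traceroute's code: the name send_udp_probe is made up, 33434 is
traceroute's documented default base port, and listening for the ICMP
replies (which needs a raw socket and root) is left out.

```python
import socket

def send_udp_probe(dest: str, port: int = 33434, ttl: int = 1,
                   payload: bytes = b"\x00" * 12) -> int:
    """Send one traceroute-style UDP probe; return the number of bytes sent.

    Hypothetical helper for illustration only. It does not wait for the
    ICMP "time exceeded" / "port unreachable" reply.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Limit how many hops the datagram may cross before it is dropped.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
        return sock.sendto(payload, (dest, port))
    finally:
        sock.close()

if __name__ == "__main__":
    # Loopback target so the sketch runs anywhere; substitute the peer's IP.
    print(send_udp_probe("127.0.0.1", ttl=1))
```

If a probe like this leaves one 3c2000 box but never shows up in tcpdump
on the other, that would point at the same UDP path that traceroute
depends on.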

Additionally, I have written a simple sendto/recvfrom utility which
uses UDP to send single messages between two computers.  Irrespective
of what port I run the program on, I cannot send messages between two
computers connected via 3c2000 cards, whereas it works fine between
any other computers.

The source (225 lines - a no-brainer, really) is attached.
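
For readers without the attachment, the kind of test it performs can be
approximated with a few lines of Python.  This is not the attached
utility, just a minimal sketch under stated assumptions: the function
name udp_round_trip is invented, and both ends run in one process over
loopback so that it is runnable anywhere.  For the real test the
receiving socket would be bound on one 3c2000 machine and the sender run
on the other.

```python
import socket

def udp_round_trip(message: bytes, host: str = "127.0.0.1",
                   timeout: float = 2.0) -> bytes:
    """Send one datagram to a local receiver socket and return what arrived."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind((host, 0))       # port 0: let the kernel pick a free port
        receiver.settimeout(timeout)   # raise socket.timeout instead of hanging
        port = receiver.getsockname()[1]
        sender.sendto(message, (host, port))
        data, _addr = receiver.recvfrom(4096)
        return data
    finally:
        sender.close()
        receiver.close()

if __name__ == "__main__":
    print(udp_round_trip(b"hello over UDP"))
```

The timeout matters here: with the failure described above, recvfrom
would otherwise block forever, which looks just like traceroute hanging.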

Comment 2 Gabriel Schulhof 2003-12-18 00:43:14 UTC
Created attachment 96596 [details]
Simple UDP sendto/recvfrom utility

Comment 3 Gabriel Schulhof 2003-12-18 02:45:54 UTC
I believe that, for some reason, kernel-level UDP is possible (because
NFS works), but user-level UDP is not possible (because my tool
doesn't work).

To further support this observation, I have set up one of the boxen
with gdm such that it accepts XDMCP connections.  I also set up (for a
very short time ;o) ) an Internet-connected computer with XDMCP. 
Then, from the firewall machine, I ran

X -query 192.168.2.3

This didn't work.

Then, I ran

X -query <internet_ip>

and it worked immediately.

Comment 4 Aaron VanDevender 2004-01-13 21:15:49 UTC
I have an Asus A7V600, which has the same 3c2000 onboard Gigabit
Ethernet controller. Rather than using the drivers from the 3com
website, I used the drivers that came with the CD for the motherboard,
which can also be downloaded from the Asus website. After that it was
a simple:

export CC=gcc32
make
make install
modprobe 3c2000

and they are up and running, UDP packets and all, on Fedora Core 1.
(The gcc32 part is to match the gcc version that was used to compile
the Fedora Core kernels.)

Comment 5 Gabriel Schulhof 2004-01-14 04:00:36 UTC
Is this what you downloaded ?

3Com Gigabit LOM (3C940) Driver V1.00.00.0046 for Linux

I downloaded this and compiled it, and the resulting module, compiled
against RH9 2.4.20-24.9, is no different from the one I had before.
Literally.  I ran diff on the module generated from the ASUS code and
the one from the 3Com code, and there was no difference; that is, diff
did not say "binary files differ".
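
The same byte-for-byte check can be sketched in Python with the
standard-library filecmp module.  The helper name and the two paths in
the usage comment are placeholders, not the actual module locations.

```python
import filecmp

def modules_identical(path_a: str, path_b: str) -> bool:
    """Return True if the two files have identical contents."""
    # shallow=False forces a content comparison instead of trusting
    # os.stat() metadata (size and modification time).
    return filecmp.cmp(path_a, path_b, shallow=False)

# Example with placeholder paths:
# modules_identical("asus/3c2000.o", "3com/3c2000.o")
```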

Can you please tell me which Fedora core kernel you're running ?

Comment 6 Aaron VanDevender 2004-01-14 05:12:19 UTC
I have run every Fedora kernel that has come down the pipe with
success, but currently I am running kernel-2.4.22-1.2140.nptl.athlon.
It could be that there is a slight revision change in the controller
that you have on the P4C800 vs the one on the ASUS A7V600 that I have.
It could also be that differences in the chipsets that we have (VIA
KT600 vs Intel 875P) are causing it.

What happens when you drop the gentoo kernel you mentioned into your
Fedora machine and boot off of that? This is different than booting
from a gentoo live cd, as it would use the same modtools and libraries
that it was using with Fedora, and only the kernel would change. If
that works I'd try running a Fedora kernel built from the fedora
kernel-source RPM.

Another test I'd be interested to see is if you have 2 machines where
machine A has the 3c2000 and machine B doesn't, what happens if you
traceroute machine B from machine A? What about traceroute machine A
from machine B? Running tests with 3c2000 cards on both ends makes it
harder to sort out what's going on.

Comment 7 Phil Knirsch 2004-01-14 09:20:20 UTC
This sounds very much like a kernel driver issue, so I'm changing the
component to kernel (where it belongs :-).

Read ya, Phil

Comment 8 Dave Jones 2004-01-14 16:50:48 UTC
Third-party driver issue.


Comment 9 acount closed by user 2004-01-15 17:40:49 UTC
Instead of 3c2000, try the _kernel_ driver sk98lin.

Comment 10 Gabriel Schulhof 2004-01-15 17:56:24 UTC
Doesn't work ... "No such device".  It may be because it's a
built-in card, rather than a PCI card.

Comment 11 acount closed by user 2004-01-16 18:07:45 UTC
Then try the latest sk98lin from:

- 2.4.25-pre6
- or the SysKonnect web site
http://www.syskonnect.com/syskonnect/support/driver/zip/linux/
- or the latest Fedora Core 1 kernel errata

