10888 – (NET) Impossible to ping card and network

Summary: (NET) Impossible to ping card and network

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	kernel
Sub Component:
Version:	6.2
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Jeff Garzik
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2000-04-18 04:22 UTC by Stanley Suan
Modified:	2013-07-03 02:04 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2004-08-24 13:46:03 UTC
Embargoed:

Description Stanley Suan 2000-04-18 04:22:09 UTC

I am seeing a very strange bug with nic2003, nic2005 and 3Com plug-and-play
cards.

First, the cards are not detected correctly by autodetection at boot time (the
wrong driver is chosen for the card). I teach a Linux course, and three teams
in the laboratory get the same error: it is impossible to ping the PC's own
network card. (Each PC has one nic2003 or nic2005 card, which uses the ne.o
driver, and one 3Com card, which uses the 3c59x.o driver.) So I configured
things manually: ifconfig eth0 down, rmmod the modules, install the modules
again, change modules, ifconfig eth0 192.168.1.5. The output of ifconfig and of
netstat -rn is correct, but it is still impossible to ping. Hmm! Changing the
IP address to 192.168.2.5 makes it work, which is incredible. Maybe there is
some procedure that checks for a duplicate IP address? I cross-connected two
stations and got the same result. I reproduced this error with two teams.
I don't know what is happening. I have the old ne.o and 3c59x.o modules on my
Red Hat 5.2 server, but the code has changed so much (libraries...).

Comment 1 Anonymous 2000-04-26 17:17:59 UTC

I have found a similar problem. It doesn't matter whether I use an NE2000,
Ne2k-pci or DEC 21041 NIC with the appropriate drivers; I get exactly the same
results. Symptoms: I can ping the main server, but I can't ping anybody else on
the subnet. Everyone can ping the machine. I can telnet, ftp, whatever; I just
can't ping anybody but one machine. I have put a sniffer on the wire and can
follow the conversation between the various machines, and ping is working,
except on the higher layers... Any help?

Comment 2 keith.moore 2000-08-17 11:56:24 UTC

I think the two problems here are not the same. For the second one, try
mounting an NFS drive. If that fails, it's probably a reverse routing issue
(TCP works, but not UDP/ICMP).

For the first one: on the same machines, can you get other cards to work properly?

-- Keith Moore

Comment 3 Jay R. Ashworth, http://baylink.pitas.com 2000-10-19 14:47:40 UTC

This seems similar to the problem I'm having; I'll post it here rather than on a new bug for the moment:

RH6.2, Compaq Armada 7730MT laptop, 3c589. It's been working fine, but a couple of times now, networking's crapped its drawers, for no apparent
reason. The symptoms are these: I can ping the laptop successfully from other machines on the subnet. If I ping out from the laptop, tcpdump in
another window can see the outbound ICMP ping and inbound ICMP pong packets... but they never get back up to userland. The MAC addresses of the
remote machines, however, *do* get added to the arp table, if they weren't already there.

Removing the default route makes the machine again behave properly WRT the local net. Adding static routes to remote machines works fine, too.

One other point of note: the ifup script, called from the PCMCIA stuff, seems to add *two* default routes, one with metric 0, and one with metric 1.
When everything is otherwise working ok, this causes networking not to work until one of them is deleted. Right now, both need to be removed.
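
The duplicate default routes described above can be spotted mechanically. What follows is a minimal Python sketch, assuming the column layout printed by `route -n` from net-tools (Destination, Gateway, Genmask, Flags, Metric, Ref, Use, Iface). The sample table, the gateway address 192.168.1.1 and the function name are hypothetical and were not taken from the affected laptop.

```python
def default_routes(route_n_output):
    """Return (gateway, metric, iface) for every default route in `route -n` text."""
    found = []
    for line in route_n_output.splitlines():
        fields = line.split()
        # Data rows have 8 columns; this also skips the headers and blank lines.
        if len(fields) != 8 or fields[0] != "0.0.0.0":
            continue
        destination, gateway, genmask, flags, metric, ref, use, iface = fields
        # A default route has destination 0.0.0.0 and netmask 0.0.0.0.
        if genmask == "0.0.0.0":
            found.append((gateway, int(metric), iface))
    return found


# Hypothetical sample shaped like the situation in the comment:
# two default routes, one with metric 0 and one with metric 1.
SAMPLE = """\
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.1.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 lo
0.0.0.0         192.168.1.1     0.0.0.0         UG    0      0        0 eth0
0.0.0.0         192.168.1.1     0.0.0.0         UG    1      0        0 eth0
"""

routes = default_routes(SAMPLE)
if len(routes) > 1:
    print("more than one default route:", routes)
```

On a live system the text would come from running `route -n`; the sketch only parses text and does not change the routing table.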

This behavior survives the ejection of the 3c589, manual down and up on the if, a reboot, and a power off.

But it *did* crop up once before, and then go away. I thought the replacement of the 3c589 was what fixed it, but I see I'm wrong.

The original was a 3c589c, the current card's a TPO. The cable is, obviously, 10BaseT.

This appears to be a bug in the routing kernel, but I can't pin it down any further than that, just yet.

I'm all over my mailbox on this one; any help will be greatly appreciated; I'll be glad to compile kernels, or swap them, etc, if necessary (I'm a unix guy of
15 years, and my first Linux box ran .99pl12f for almost 2 years; I've got a bit o' experience on this stuff).

Heeellpppp! :-)

Comment 4 Jay R. Ashworth, http://baylink.pitas.com 2000-10-19 14:49:50 UTC

Oh: when the machine is misbehaving this way, I can ping localhost, but cannot even ping my card's address.

This behavior cropped up with, so far as I can tell, no networking reconfigs.  (See: "But I didn't change anything!" :-)

Comment 5 Jay R. Ashworth, http://baylink.pitas.com 2000-10-31 16:30:15 UTC

I've now seen this problem in at least 3 environments.  It appears to have something possibly to do with the 2.2.16 kernel sending out bad ARP packets, 
because rebooting the gateway on the network makes the workstation come back to life.  This looks like it might be a fairly serious bug; is anyone 
paying attention?

Comment 6 Unstable Boy 2001-01-26 19:28:46 UTC

I appear to be having the same problem. I have noticed some other things on my machine: 1: ftp and telnet to the machine are extremely slow. 2: If I add the remote machine to the /etc/hosts file, it seems to clear up for that machine only. I am running 6.2 on a Umax laptop w/ 3Com NIC.

Comment 7 Jay R. Ashworth, http://baylink.pitas.com 2002-08-20 01:34:38 UTC

Ok.

I've gotten this characterized quite a bit more tightly in the *24 months* 
since this boog was opened.

What appears to be happening is that *if the resolv.conf points to a broken or 
missing server*, and there is a default route, then packets are accepted in by 
the IP kernel, but not handed up to the upper layers.
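
Since the trigger appears to be a nameserver entry in resolv.conf combined with a default route, it can help to list which configured nameservers lie outside the local network and so would be reached through the default route. This is a minimal Python sketch. The local network 192.168.1.0/24, the sample file contents and the function names are assumptions for illustration only.

```python
import ipaddress


def nameservers(resolv_conf_text):
    """Extract the addresses from 'nameserver' lines of resolv.conf text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Commented-out lines start with '#' or ';' and so never match here.
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def off_link(servers, local_net):
    """Return the servers that are not inside the local network."""
    net = ipaddress.ip_network(local_net)
    return [s for s in servers if ipaddress.ip_address(s) not in net]


# Hypothetical resolv.conf contents.
SAMPLE_RESOLV = """\
search example.com
nameserver 192.168.1.1
# nameserver 10.9.9.9
nameserver 10.0.0.53
"""

servers = nameservers(SAMPLE_RESOLV)
print("configured:", servers)
print("reached via default route:", off_link(servers, "192.168.1.0/24"))
```

The sketch does not test whether a server actually answers; it only shows which entries depend on the default route.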

I can ping to such a machine with no trouble.

If I try to ping out *from* such a machine, I can see the echo reply packets 
coming in on the interface in tcpdump in a separate window, but they aren't 
handed upstairs to ping for a *long* time -- the delay on the packets is 
repeatable and looks like: 
74,148,222,296,370,444,518,592,999,740,814,888,962,1036,1110,1184,1258.

Those are in *seconds*.

The sequence numbers are in order.
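
The delays listed above are almost exactly successive multiples of 74 seconds. A quick check in Python (a sketch; only the list of numbers comes from the comment) compares each reported value with n × 74 for the n-th reply and prints the ones that do not fit:

```python
# Delays in seconds, copied from the comment above.
delays = [74, 148, 222, 296, 370, 444, 518, 592, 999,
          740, 814, 888, 962, 1036, 1110, 1184, 1258]

step = delays[0]  # 74 seconds

# Compare each reported delay with n * step for the n-th reply (n starts at 1).
mismatches = [(n, d, n * step)
              for n, d in enumerate(delays, start=1)
              if d != n * step]

for n, reported, expected in mismatches:
    print(f"reply {n}: reported {reported} s, pattern predicts {expected} s")
```

Sixteen of the seventeen values fit the pattern; only the ninth (reported as 999, where the pattern gives 666) does not.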

Obviously, as long as you can either dump the default route, or modify the 
resolv.conf, you can get around this, but neither's a great answer, and if 
you're trying to use Linux on, say, a laptop, as a diagnostic tool, it's 
miserable.

Any chance someone might look into this further?

jra

Comment 8 Arjan van de Ven 2002-08-20 17:12:30 UTC

Maybe a stupid question, but have you tried ping -n yet? (e.g. to avoid ping
doing DNS lookups)

Comment 9 Jay R. Ashworth, http://baylink.pitas.com 2002-08-21 00:31:17 UTC

Yeah, this happens both ways.  Because, let's face it, ping is only gonna do 
those lookups once, at the beginning, right?

And leaving DNS broken but removing the default route fixes the problem.

I can even construct a *specific* route to a remote network via my router and 
everything works fine.  So it's not packets with remote addresses.

Sorry; what I'm pinging for test is always *on* the local network, which, to 
date, has been 192.168.n/24.
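
For reference, the expectation behind this comment is that a destination on the local network matches the directly connected route and never needs the default route, because the most specific matching prefix wins. Below is a minimal Python sketch of that longest-prefix selection. It is not the kernel's implementation, and the network 192.168.1.0/24, the gateway 192.168.1.1 and the helper name are hypothetical.

```python
import ipaddress

# (destination network, next hop); None means directly connected.
ROUTES = [
    (ipaddress.ip_network("192.168.1.0/24"), None),
    (ipaddress.ip_network("0.0.0.0/0"), "192.168.1.1"),  # default route
]


def lookup(destination, routes=ROUTES):
    """Return the matching route with the longest prefix, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [r for r in routes if addr in r[0]]
    if not matches:
        return None
    return max(matches, key=lambda r: r[0].prefixlen)


print(lookup("192.168.1.20"))  # matches the connected /24 route
print(lookup("10.1.2.3"))      # only the default route matches
```

Under this model, removing the default route should make no difference to a ping on the local /24, which is why the observed behavior looks like a bug rather than a configuration error.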

(nice to know we woke someone up; thanks. :-)

Comment 10 Jay R. Ashworth, http://baylink.pitas.com 2003-06-13 22:48:02 UTC

But I see we didn't wake anyone up all that far. 
 
Is anyone still seeing this?  I'm about to load yet a third new laptop -- with 7.3, cause I'm not 
happy with 9.0 as a production release yet.  More then.  But, really, is *anyone* with a 
redhat.com address listening? 
 
Cause, y'know, if you're not... close the damn bug or shut down the bz.

Comment 11 Jay R. Ashworth, http://baylink.pitas.com 2003-06-14 18:45:26 UTC

FOLO: a small amount of free time having cropped up, I tried to reproduce on RH9, which I 
have on a test box. 
 
Here it doesn't happen: placing an invalid nameserver address in resolv.conf does not make 
pings to the local network take forever. 
 
And one further comment: when I was seeing this problem on 7.2, it didn't appear to affect 
telnet -- I could telnet to local hosts just fine.  It was *just* ICMP, so far as I could determine.

Comment 12 Jay R. Ashworth, http://baylink.pitas.com 2003-06-14 18:46:41 UTC

Clarification: I originally noted the problem on 6.2; it survived an upgrade to 7.3.
