Bug 15223 - ping gives invalid timings
ping gives invalid timings
Status: CLOSED RAWHIDE
Product: Red Hat Linux
Classification: Retired
Component: iputils (Show other bugs)
7.0
i386 Linux
high Severity medium
: ---
: ---
Assigned To: Crutcher Dunnavant
Winston gold
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2000-08-03 08:20 EDT by Neil Darlow
Modified: 2008-05-01 11:37 EDT (History)
3 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2000-08-05 13:40:50 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
Here is a patch that works around the problem. (768 bytes, patch)
2000-08-05 07:44 EDT, Tim Waugh
no flags Details | Diff

  None (edit)
Description Neil Darlow 2000-08-03 08:20:49 EDT
Re-built kernel-2.2.16-17 with changed options:
  CPU - i486
  Optimize as router not host

Changed networking parameters:
  IP forwarding
  IP always defragment

Network card:
  NE2000 clone
  io=0x300, irq=10

Pinging a Red Hat 6.2 system from a Pinstripe system gives:

PING ideal.darlow.co.uk (172.16.0.1) from 172.16.0.2 : 56(84) bytes of
data.
64 bytes from ideal.darlow.co.uk (172.16.0.1): icmp_seq=0 ttl=255
time=-1273 usec
64 bytes from ideal.darlow.co.uk (172.16.0.1): icmp_seq=1 ttl=255 time=-282
usec
64 bytes from ideal.darlow.co.uk (172.16.0.1): icmp_seq=2 ttl=255 time=-195
usec
64 bytes from ideal.darlow.co.uk (172.16.0.1): icmp_seq=3 ttl=255 time=-241
usec
64 bytes from ideal.darlow.co.uk (172.16.0.1): icmp_seq=4 ttl=255 time=-284
usec
64 bytes from ideal.darlow.co.uk (172.16.0.1): icmp_seq=5 ttl=255 time=-312
usec
64 bytes from ideal.darlow.co.uk (172.16.0.1): icmp_seq=6 ttl=255 time=-355
usec
64 bytes from ideal.darlow.co.uk (172.16.0.1): icmp_seq=7 ttl=255 time=-202
usec
64 bytes from ideal.darlow.co.uk (172.16.0.1): icmp_seq=8 ttl=255 time=-204
usec
64 bytes from ideal.darlow.co.uk (172.16.0.1): icmp_seq=9 ttl=255 time=-302
usec
--- ideal.darlow.co.uk ping statistics ---
10 packets transmitted, 10 packets received, 0% packet loss
round-trip min/avg/max/mdev = -1.-273/429496.364/0.000/687288.484 ms

These look like formatting problems.
Performing the same operation in reverse gives consistent 1.1ms timings.
Comment 1 Pekka Savola 2000-08-04 07:26:27 EDT
I can't reproduce this here, but I bet the problem could be in ping.c lines
with:

 delta = delta*1000 + (now.tv_usec - prev.tv_usec)/1000;

Comment 2 Neil Darlow 2000-08-04 09:42:17 EDT
iputils-20000121-1 from Beta1 works ok.
iputils-20000418-1 from Beta2 and later display the fault.
Comment 3 Glen Foster 2000-08-04 11:24:29 EDT
This defect is considered MUST-FIX for Winston Gold-release
... output is misleading enough this should be fixed
Comment 4 Jeff Johnson 2000-08-04 16:05:26 EDT
I can't reproduce this problem, so there's more than just the information given
above
that's gonna be needed. Suggestions?
Comment 5 Pekka Savola 2000-08-04 16:48:05 EDT
I was able to reproduce this on a RH6.2 w/ latest iputils package rebuilt.  I
pinged localhost.

The key was that it has to be 486.  Pentiums and later work fine; 386 is
probably also affected.

This happens with and without Redhat patches.
Comment 6 Pekka Savola 2000-08-04 16:56:27 EDT
[the "normal" response on 486/66 without patches.. see jump to 2300 usec below]

[root@linux SOURCES]# ping localhost
PING localhost (127.0.0.1) from 127.0.0.1 : 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=0 ttl=255 time=-1623 usec
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=255 time=-405 usec
64 bytes from localhost (127.0.0.1): icmp_seq=2 ttl=255 time=-440 usec
64 bytes from localhost (127.0.0.1): icmp_seq=3 ttl=255 time=-361 usec


With small investigatory patch as follows, I could see some interesting results:

--- iputils/ping.c      Tue Apr 18 20:42:50 2000
+++ ip/ping.c   Fri Aug  4 23:45:47 2000
@@ -765,11 +765,15 @@
        struct timeval now;
        long delta = 0;
        int waittime;
+
+       long diff=0;

        gettimeofday(&now, NULL);

        if (signo) {
                delta = now.tv_sec - prev.tv_sec;
+               diff=now.tv_usec - prev.tv_usec;
+printf("\nDiff: %lu, now: %lu, prev: %lu\n", diff, now.tv_usec, prev.tv_usec);
                delta = delta*1000 + (now.tv_usec - prev.tv_usec)/1000;
                if (delta >= (interval-slop))
                        signo = 0;
-----

With that, interesting things seem to be happening.

Also note that latency increased from 330 usec to 2.3 msec.  I guess the
algorithm isn't too reliable for low end systems..
----
[root@linux iputils]# ./ping localhost
PING localhost (127.0.0.1) from 127.0.0.1 : 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=0 ttl=255 time=-5518 usec

Diff: 4294962156, now: 122552, prev: 127692
64 bytes from 127.0.0.1: icmp_seq=1 ttl=255 time=-2327 usec

Diff: 31, now: 122583, prev: 122552
64 bytes from 127.0.0.1: icmp_seq=2 ttl=255 time=-2401 usec

Diff: 4294967267, now: 122554, prev: 122583
64 bytes from 127.0.0.1: icmp_seq=3 ttl=255 time=-2243 usec
----

While on K6-2/300 the output is:
----
PING localhost (127.0.0.1) from 127.0.0.1 : 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=0 ttl=255 time=137 usec

Diff: 4294960853, now: 882392, prev: 888835
64 bytes from 127.0.0.1: icmp_seq=1 ttl=255 time=92 usec

Diff: 4294967287, now: 882383, prev: 882392
64 bytes from 127.0.0.1: icmp_seq=2 ttl=255 time=90 usec
----

Comment 7 Pekka Savola 2000-08-04 16:59:01 EDT
Err, mixups with long/unsigned long, but that should still be of some help :)

Comment 8 Jeff Johnson 2000-08-04 17:04:21 EDT
This smells like a kernel problem, but could be a gcc problem. Opinions?
Comment 9 Pekka Savola 2000-08-04 17:36:00 EDT
Hard to tell.

I tried

i386 kernel 2.2.16-17 on P2, works fine (Pinstripe)
i586 kernel 2.2.16 (stock) on K6-2, works fine (RH6.2)
i386 kernel 2.2.16-3 on 486/66, failure (RH6.2)

HTH.
Comment 10 Alan Cox 2000-08-04 22:16:38 EDT
You said 486 works. Does your 486 have TSC or not (cat /proc/cpuinfo)
Could be related to timing accuracy and behaviour of tsc versus usec counter
accuracy
Comment 11 Pekka Savola 2000-08-05 02:47:56 EDT
Err, 486 does not specifically _work_.  It was just the only platform I could
reproduce this on.  I don't have access to any other 486's or 386's, or
other oldish systems.

/proc/cpuinfo doesn't tell much (no flags, lots of unknown :)..:
-----
[root@linux /root]# more /proc/cpuinfo
processor       : 0
vendor_id       : unknown
cpu family      : 4
model           : 0
model name      : unknown
stepping        : unknown
fdiv_bug        : no
hlt_bug         : no
sep_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : -1
wp              : yes
flags           :
bogomips        : 33.08
-----
Linux version 2.2.16-3 (root@porky.devel.redhat.com) (gcc version egcs-2.91.66
19990314/Linux (egcs-1.1.2 release)) #1 Mon Jun 19 18:10:14 EDT 2000


Comment 12 Neil Darlow 2000-08-05 05:36:15 EDT
Do code changes between iputils-20000121-1 and iputils-20000418-1 give
any clues?

The contents of my /proc/cpuinfo are:

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 4
model           : 9
model name      : 486 DX/4-WB
stepping        : 4
fdiv_bug        : no
hlt_bug         : no
sep_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu
bogomips        : 60.01

I'll also take a look at the source code.
Comment 13 Tim Waugh 2000-08-05 07:43:30 EDT
I've tracked this down to SIOCGSTAMP returning wrong information.
Comment 14 Tim Waugh 2000-08-05 07:44:30 EDT
Created attachment 2094 [details]
Here is a patch that works around the problem.
Comment 15 Tim Waugh 2000-08-05 07:52:15 EDT
SIOCGSTAMP was returning a time that was slightly behind the time that
gettimeofday returned before sendmsg.  This is the _real_ bug.  I don't know why
it happens; I can reproduce it 100% of the time on my 486:

$ grep flags /proc/cpuinfo
flags : fpu vme

Seems like a kernel bug to me.
Comment 16 Tim Waugh 2000-08-05 08:02:31 EDT
I take it back.  Looking at the kernel code I don't see a way to fix it in the
kernel.

Jeff, what do you think of the patch?  Too icky?
Comment 17 Alan Cox 2000-08-05 09:39:34 EDT
SIOCGSTAMP is low precision. If you want to use it with gettimeofday use
gettimeofday just
on the receive as well.

This would also explain why only a non TSC chip sees the problem
Comment 18 Tim Waugh 2000-08-05 13:40:48 EDT
(There is a -U option to turn on this [older] behaviour, which the above patch
turns off if it detets thsi situation.)
Comment 19 Tim Waugh 2000-08-08 06:44:13 EDT
Fixed in iputils-20000418-5.

Note You need to log in before you can comment on or make changes to this bug.