Bug 15223

Summary: ping gives invalid timings
Product: [Retired] Red Hat Linux Reporter: Neil Darlow <neil>
Component: iputilsAssignee: Crutcher Dunnavant <crutcher>
Status: CLOSED RAWHIDE QA Contact:
Severity: medium Docs Contact:
Priority: high    
Version: 7.0CC: alan, pekkas, twaugh
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard: Winston gold
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2000-08-05 17:40:50 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Here is a patch that works around the problem. none

Description Neil Darlow 2000-08-03 12:20:49 UTC
Re-built kernel-2.2.16-17 with changed options:
  CPU - i486
  Optimize as router not host

Changed networking parameters:
  IP forwarding
  IP always defragment

Network card:
  NE2000 clone
  io=0x300, irq=10

Pinging a Red Hat 6.2 system from a Pinstripe system gives:

PING ideal.darlow.co.uk (172.16.0.1) from 172.16.0.2 : 56(84) bytes of
data.
64 bytes from ideal.darlow.co.uk (172.16.0.1): icmp_seq=0 ttl=255
time=-1273 usec
64 bytes from ideal.darlow.co.uk (172.16.0.1): icmp_seq=1 ttl=255 time=-282
usec
64 bytes from ideal.darlow.co.uk (172.16.0.1): icmp_seq=2 ttl=255 time=-195
usec
64 bytes from ideal.darlow.co.uk (172.16.0.1): icmp_seq=3 ttl=255 time=-241
usec
64 bytes from ideal.darlow.co.uk (172.16.0.1): icmp_seq=4 ttl=255 time=-284
usec
64 bytes from ideal.darlow.co.uk (172.16.0.1): icmp_seq=5 ttl=255 time=-312
usec
64 bytes from ideal.darlow.co.uk (172.16.0.1): icmp_seq=6 ttl=255 time=-355
usec
64 bytes from ideal.darlow.co.uk (172.16.0.1): icmp_seq=7 ttl=255 time=-202
usec
64 bytes from ideal.darlow.co.uk (172.16.0.1): icmp_seq=8 ttl=255 time=-204
usec
64 bytes from ideal.darlow.co.uk (172.16.0.1): icmp_seq=9 ttl=255 time=-302
usec
--- ideal.darlow.co.uk ping statistics ---
10 packets transmitted, 10 packets received, 0% packet loss
round-trip min/avg/max/mdev = -1.-273/429496.364/0.000/687288.484 ms

These look like formatting problems.
Performing the same operation in reverse gives consistent 1.1ms timings.

Comment 1 Pekka Savola 2000-08-04 11:26:27 UTC
I can't reproduce this here, but I bet the problem could be in ping.c lines
with:

 delta = delta*1000 + (now.tv_usec - prev.tv_usec)/1000;



Comment 2 Neil Darlow 2000-08-04 13:42:17 UTC
iputils-20000121-1 from Beta1 works ok.
iputils-20000418-1 from Beta2 and later display the fault.


Comment 3 Glen Foster 2000-08-04 15:24:29 UTC
This defect is considered MUST-FIX for Winston Gold-release
... output is misleading enough this should be fixed

Comment 4 Jeff Johnson 2000-08-04 20:05:26 UTC
I can't reproduce this problem, so there's more than just the information given
above
that's gonna be needed. Suggestions?

Comment 5 Pekka Savola 2000-08-04 20:48:05 UTC
I was able to reproduce this on a RH6.2 w/ latest iputils package rebuilt.  I
pinged localhost.

The key was that it has to be 486.  Pentiums and later work fine; 386 is
probably also affected.

This happens with and without Redhat patches.


Comment 6 Pekka Savola 2000-08-04 20:56:27 UTC
[the "normal" response on 486/66 without patches.. see jump to 2300 usec below]

[root@linux SOURCES]# ping localhost
PING localhost (127.0.0.1) from 127.0.0.1 : 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=0 ttl=255 time=-1623 usec
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=255 time=-405 usec
64 bytes from localhost (127.0.0.1): icmp_seq=2 ttl=255 time=-440 usec
64 bytes from localhost (127.0.0.1): icmp_seq=3 ttl=255 time=-361 usec


With small investigatory patch as follows, I could see some interesting results:

--- iputils/ping.c      Tue Apr 18 20:42:50 2000
+++ ip/ping.c   Fri Aug  4 23:45:47 2000
@@ -765,11 +765,15 @@
        struct timeval now;
        long delta = 0;
        int waittime;
+
+       long diff=0;

        gettimeofday(&now, NULL);

        if (signo) {
                delta = now.tv_sec - prev.tv_sec;
+               diff=now.tv_usec - prev.tv_usec;
+printf("\nDiff: %lu, now: %lu, prev: %lu\n", diff, now.tv_usec, prev.tv_usec);
                delta = delta*1000 + (now.tv_usec - prev.tv_usec)/1000;
                if (delta >= (interval-slop))
                        signo = 0;
-----

With that, interesting things seem to be happening.

Also note that latency increased from 330 usec to 2.3 msec.  I guess the
algorithm isn't too reliable for low end systems..
----
[root@linux iputils]# ./ping localhost
PING localhost (127.0.0.1) from 127.0.0.1 : 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=0 ttl=255 time=-5518 usec

Diff: 4294962156, now: 122552, prev: 127692
64 bytes from 127.0.0.1: icmp_seq=1 ttl=255 time=-2327 usec

Diff: 31, now: 122583, prev: 122552
64 bytes from 127.0.0.1: icmp_seq=2 ttl=255 time=-2401 usec

Diff: 4294967267, now: 122554, prev: 122583
64 bytes from 127.0.0.1: icmp_seq=3 ttl=255 time=-2243 usec
----

While on K6-2/300 the output is:
----
PING localhost (127.0.0.1) from 127.0.0.1 : 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=0 ttl=255 time=137 usec

Diff: 4294960853, now: 882392, prev: 888835
64 bytes from 127.0.0.1: icmp_seq=1 ttl=255 time=92 usec

Diff: 4294967287, now: 882383, prev: 882392
64 bytes from 127.0.0.1: icmp_seq=2 ttl=255 time=90 usec
----



Comment 7 Pekka Savola 2000-08-04 20:59:01 UTC
Err, mixups with long/unsigned long, but that should still be of some help :)



Comment 8 Jeff Johnson 2000-08-04 21:04:21 UTC
This smells like a kernel problem, but could be a gcc problem. Opinions?

Comment 9 Pekka Savola 2000-08-04 21:36:00 UTC
Hard to tell.

I tried

i386 kernel 2.2.16-17 on P2, works fine (Pinstripe)
i586 kernel 2.2.16 (stock) on K6-2, works fine (RH6.2)
i386 kernel 2.2.16-3 on 486/66, failure (RH6.2)

HTH.


Comment 10 Alan Cox 2000-08-05 02:16:38 UTC
You said 486 works. Does your 486 have TSC or not (cat /proc/cpuinfo)
Could be related to timing accuracy and behaviour of tsc versus usec counter
accuracy


Comment 11 Pekka Savola 2000-08-05 06:47:56 UTC
Err, 486 does not specifically _work_.  It was just the only platform I could
reproduce this on.  I don't have access to any other 486's or 386's, or
other oldish systems.

/proc/cpuinfo doesn't tell much (no flags, lots of unknown :)..:
-----
[root@linux /root]# more /proc/cpuinfo
processor       : 0
vendor_id       : unknown
cpu family      : 4
model           : 0
model name      : unknown
stepping        : unknown
fdiv_bug        : no
hlt_bug         : no
sep_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : -1
wp              : yes
flags           :
bogomips        : 33.08
-----
Linux version 2.2.16-3 (root.redhat.com) (gcc version egcs-2.91.66
19990314/Linux (egcs-1.1.2 release)) #1 Mon Jun 19 18:10:14 EDT 2000




Comment 12 Neil Darlow 2000-08-05 09:36:15 UTC
Do code changes between iputils-20000121-1 and iputils-20000418-1 give
any clues?

The contents of my /proc/cpuinfo are:

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 4
model           : 9
model name      : 486 DX/4-WB
stepping        : 4
fdiv_bug        : no
hlt_bug         : no
sep_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu
bogomips        : 60.01

I'll also take a look at the source code.


Comment 13 Tim Waugh 2000-08-05 11:43:30 UTC
I've tracked this down to SIOCGSTAMP returning wrong information.

Comment 14 Tim Waugh 2000-08-05 11:44:30 UTC
Created attachment 2094 [details]
Here is a patch that works around the problem.

Comment 15 Tim Waugh 2000-08-05 11:52:15 UTC
SIOCGSTAMP was returning a time that was slightly behind the time that
gettimeofday returned before sendmsg.  This is the _real_ bug.  I don't know why
it happens; I can reproduce it 100% of the time on my 486:

$ grep flags /proc/cpuinfo
flags : fpu vme

Seems like a kernel bug to me.

Comment 16 Tim Waugh 2000-08-05 12:02:31 UTC
I take it back.  Looking at the kernel code I don't see a way to fix it in the
kernel.

Jeff, what do you think of the patch?  Too icky?

Comment 17 Alan Cox 2000-08-05 13:39:34 UTC
SIOCGSTAMP is low precision. If you want to use it with gettimeofday use
gettimeofday just
on the receive as well.

This would also explain why only a non TSC chip sees the problem


Comment 18 Tim Waugh 2000-08-05 17:40:48 UTC
(There is a -U option to turn on this [older] behaviour, which the above patch
turns off if it detets thsi situation.)

Comment 19 Tim Waugh 2000-08-08 10:44:13 UTC
Fixed in iputils-20000418-5.