Bug 234450 - Every (ave) 3000th ping fails "segmentation fault"
Status: CLOSED NOTABUG
Product: Fedora
Classification: Fedora
Component: iputils
Version: 6
Hardware: i386 Linux
Priority: medium, Severity: high
Assigned To: Martin Bacovsky
Reported: 2007-03-29 08:19 EDT by Mike Yates
Modified: 2007-11-30 17:12 EST (History)
Last Closed: 2007-05-10 07:59:46 EDT

Description Mike Yates 2007-03-29 08:19:23 EDT
Description of problem:
A fail-over clustering bash script pings the virtual server on the other box up
to 4 times per second (usually much less often when the mirrored app data is not
idle). The ping fails with "Segmentation fault" at random intervals (every 30 to
30000 pings). Since the segv failures are indistinguishable from "server down"
failures, the script works around the problem with a "second try" ping, which is
far from satisfactory!
Both physical servers in the cluster show this behaviour. Their hardware is very
different, but both have identical FC6 installs.
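
One way the script could tell a crashed ping from an unreachable host is to look at the exit status: a shell reports a command killed by signal N as status 128+N, so a segfault (SIGSEGV, signal 11) shows up as 139. A minimal sketch, assuming nothing about the real cluster script; classify_status is a hypothetical helper name, not part of iputils:

```shell
#!/bin/bash
# Sketch only: classify the exit status of the previous ping.
# classify_status is a hypothetical helper, not an iputils feature.
classify_status() {
    local status=$1
    if [ "$status" -eq 0 ]; then
        echo "ok"
    elif [ "$status" -gt 128 ]; then
        # The shell reports death by signal N as 128+N (SIGSEGV is 11).
        echo "killed by signal $(( status - 128 ))"
    else
        echo "ping failed (status $status)"
    fi
}

# Possible use inside the polling loop (address taken from this report):
#   ping 10.0.0.1 -c 1 -w 1 > /dev/null 2>&1
#   classify_status $?
```

With this, only the "ping failed" case would need to be treated as a possible "server down" event; a "killed by signal" result could simply be retried.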

Version-Release number of selected component (if applicable):

I'm using the latest ping (ss020927) for Fedora 6, from
iputils-20020927-41.fc6.rpm

I am currently running the same test on a Fedora 4 server and on a "ZenWalk Linux"
VMware box within that server. Neither has thrown a segmentation fault after over
30,000 pings.

How reproducible:
See script below

Steps to Reproduce:
1. Install FC6 and bring up-to-date
2. Run script below 

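#!/bin/bash
# Ping the target about 5 times a second and report every failed ping.
COUNT=0
echo "$(date +%T) Starting testping"
while true
do
	# One echo request (-c 1), give up after 1 second (-w 1), discard output.
	if ! ping 10.0.0.1 -c 1 -w 1 &> /dev/null
	then
		echo "$(date +%T) Ping failed after $COUNT operations"
	fi
	COUNT=$(( COUNT + 1 ))
	# Running counter, rewritten in place on one line.
	echo -n -e "$COUNT \\r"
	sleep 0.2
done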
  
Actual results:

[root@mysvr5a cluster]# ./testping
11:08:31 Starting testping
./testping: line 12: 21716 Segmentation fault      ping 10.0.0.1 -c 1 -w 1
>&/dev/null
11:24:12 Ping failed after 4028 operations
./testping: line 12:  5216 Segmentation fault      ping 10.0.0.1 -c 1 -w 1
>&/dev/null
11:40:17 Ping failed after 8304 operations
./testping: line 12: 20132 Segmentation fault      ping 10.0.0.1 -c 1 -w 1
>&/dev/null
11:52:51 Ping failed after 11401 operations
./testping: line 12: 29971 Segmentation fault      ping 10.0.0.1 -c 1 -w 1
>&/dev/null
12:00:32 Ping failed after 13392 operations
./testping: line 12: 11050 Segmentation fault      ping 10.0.0.1 -c 1 -w 1
>&/dev/null
12:12:36 Ping failed after 16479 operations
./testping: line 12: 14748 Segmentation fault      ping 10.0.0.1 -c 1 -w 1
>&/dev/null
12:15:30 Ping failed after 17167 operations
./testping: line 12: 29969 Segmentation fault      ping 10.0.0.1 -c 1 -w 1
>&/dev/null
12:22:38 Ping failed after 18896 operations
./testping: line 12: 18937 Segmentation fault      ping 10.0.0.1 -c 1 -w 1
>&/dev/null
12:31:43 Ping failed after 21047 operations
./testping: line 12:  6129 Segmentation fault      ping 10.0.0.1 -c 1 -w 1
>&/dev/null
12:49:29 Ping failed after 25611 operations
./testping: line 12: 24498 Segmentation fault      ping 10.0.0.1 -c 1 -w 1
>&/dev/null
13:01:26 Ping failed after 28516 operations
31745

N.B. Note that bash reports the redirection as >& although the script uses &>.
This could actually be a stdout/stderr problem, but I've also tried not hiding
the (voluminous) ping output and still got segv's.

The working scripts sometimes go up to 3 hours or 50,000 pings without a segv.

Expected results:

[root@mysvr5a cluster]# ./testping
11:08:31 Starting testping
76543


Additional info:
Please advise how to delve deeper...
Comment 1 Mike Yates 2007-03-29 09:52:26 EDT
Another test: I ran "testping" alone on one server and two instances on the other.
Both threw segv's, so the fault is not due to simultaneous pinging.

Fedora 4, Zenwalk and Knoppix (Debian) have each run two instances of testping for
over 60,000 pings now with no segv's.
Comment 2 Mike Yates 2007-04-02 09:15:08 EDT
On repeatability:
A temporary install of FC6 on a third physical PC reproduced the fault immediately.
FC6 installed in a VMware box ran 246559 pings, in 54546 secs, before it
suddenly started throwing segv's at random intervals of 87 to 6804 pings.
Meanwhile, the testpings on the 2 servers had several periods of less than a
minute, spaced many hours apart, when pinging the router failed every 1 to 30
pings without segv's. Very strange. I would guess this happens during DHCP
renewal, except that the servers have fixed IPs!
Comment 3 Martin Bacovsky 2007-04-03 09:04:52 EDT
OK, I've tried to reproduce this, but was unsuccessful. I tried about 5 times, up
to 60000 pings each, and the bug did not occur.
First of all, I will update iputils to the new upstream version in fc6 as well.
If that does not help, then we have to find the simplest environment, as close
to default as possible, where this is reproducible. I will let you know when the
new iputils is available.
Comment 4 Martin Bacovsky 2007-04-03 14:14:12 EDT
iputils-20070202-1.fc6 should be pushed to fc6 testing. Can you, please, test
whether this bug is still reproducible after update?
Comment 5 Mike Yates 2007-04-04 08:08:45 EDT
I'm afraid it is!

10:54:07 Ping failed after 5529 ops, 1201 secs
./testping: line 15:  3628 Segmentation fault      ping 10.0.0.1 -c 1 -w 1
>&/dev/null
11:20:59 Ping failed after 7418 ops, 1612 secs
./testping: line 15:  6423 Segmentation fault      ping 10.0.0.1 -c 1 -w 1
>&/dev/null
11:25:58 Ping failed after 1375 ops, 299 secs
./testping: line 15: 20049 Segmentation fault      ping 10.0.0.1 -c 1 -w 1
>&/dev/null
11:50:29 Ping failed after 6775 ops, 1471 secs
878  
[root@localhost ~]# rpm -Uv iputils-20070202-1.fc7.i386.rpm 
warning: iputils-20070202-1.fc7.i386.rpm: Header V3 DSA signature: NOKEY, key ID
30c9ecf8
Preparing packages for installation...
iputils-20070202-1.fc7
[root@localhost ~]# ping -V
ping utility, iputils-sss20070202
[root@localhost ~]# ./testping 
12:06:09 Starting testping
./testping: line 15: 29596 Segmentation fault      ping 10.0.0.1 -c 1 -w 1
>&/dev/null
12:19:58 Ping failed after 3820 ops, 829 secs
./testping: line 15: 31216 Segmentation fault      ping 10.0.0.1 -c 1 -w 1
>&/dev/null
12:22:53 Ping failed after 806 ops, 175 secs
./testping: line 15: 32641 Segmentation fault      ping 10.0.0.1 -c 1 -w 1
>&/dev/null
12:25:27 Ping failed after 709 ops, 154 secs
2338 
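
To put a number on "pings per segv" from output like the above, the per-failure counts can be averaged. A rough sketch, assuming the log format of this comment ("Ping failed after N ops", where N is counted since the previous failure); mean_ops is a hypothetical helper name and testping.log a hypothetical file holding saved testping output:

```shell
#!/bin/bash
# Sketch only: average N over all "Ping failed after N ops" lines on stdin.
# mean_ops is a hypothetical helper; field 5 is the count in this log format.
mean_ops() {
    awk '/Ping failed after/ { sum += $5; n++ }
         END { if (n) printf "%d\n", sum / n }'
}

# Possible use on saved output (testping.log is a hypothetical file name):
#   mean_ops < testping.log
```

Fed the seven "Ping failed" lines of this comment, it prints 3776. Segmentation fault lines and the running counter are ignored because they do not match the pattern.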
Comment 6 Mike Yates 2007-04-04 09:31:06 EDT
Further "elimination" tests:-
In minimal (Gnome Desktop but no apps) FC6, fully updated, in VMware,
1) Changed ping address from router (Smoothwall) to VMware Zenwalk.
2) In another console, at same time, pinged eth0:1
Both average about 3000 pings per segv, as usual.
Comment 7 Martin Bacovsky 2007-05-09 09:25:53 EDT
I'm still not able to reproduce this. Can you provide me with a core dump?
Comment 8 Martin Bacovsky 2007-05-09 10:41:21 EDT
And please also send the output of rpm -qa, so we can compare installed package versions.
Comment 9 Mike Yates 2007-05-10 06:59:37 EDT
Well, it seems to have gone away with yum updates.
I now have
iputils-20070202-3.fc6
kernel-2.6.20-1.2948.fc6
and, as I write, 9000 pings have succeeded.
So, unless anyone else complains (I'll check with the one I know of)
you can close this off.
Comment 10 Martin Bacovsky 2007-05-10 07:59:46 EDT
Thanks for the info. Based on comment #9, I am closing this as NOTABUG. If it
appears again, feel free to reopen this bug.
Comment 11 Stepan Kasal 2007-05-11 04:04:18 EDT
Well, I had the script running for 2 days on an FC4/i386 and an FC6/x86_64 box, and
in both cases experienced 7 failures out of 740k pings.
Now it's time to fix the script so that it generates and saves some core dumps.  ;-)
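
A possible starting point for that, as a sketch under stated assumptions rather than a finished script: run each ping in a subshell whose soft core-size limit is raised to the hard limit, and log when the command dies from a signal. run_with_core is a hypothetical helper name; where the kernel writes the core file (if at all) depends on the system's core pattern and hard limit, which this sketch does not control.

```shell
#!/bin/bash
# Sketch only: run a command with core dumps enabled and report signal deaths.
# run_with_core is a hypothetical helper name.
run_with_core() {
    (
        # Raise the soft core-size limit to the hard limit, in this subshell only.
        ulimit -S -c "$(ulimit -H -c)" 2>/dev/null || true
        "$@" > /dev/null 2>&1
    )
    local status=$?
    if [ "$status" -gt 128 ]; then
        # Status 128+N means the command was killed by signal N (SIGSEGV is 11).
        echo "$(date +%T) killed by signal $(( status - 128 )): $*" >&2
    fi
    return "$status"
}

# Possible use in the test loop (address taken from this report):
#   if ! run_with_core ping 10.0.0.1 -c 1 -w 1; then ... fi
```

The function returns the command's own exit status, so the existing "if !" test in the loop keeps working unchanged.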
Comment 12 Mike Yates 2007-05-11 04:23:01 EDT
Mine has reached 320000 with no failures this morning, in up-to-date FC6-i386
