Bug 2474

Summary: pump's CPU usage shoots up
Product: [Retired] Red Hat Linux
Reporter: ecarter
Component: pump
Assignee: Erik Troan <ewt>
Status: CLOSED RAWHIDE
Severity: high
Priority: high
Version: 6.0
CC: duanev, gregpublic, kanellis, maurice
Hardware: i386
OS: Linux
Doc Type: Bug Fix
Last Closed: 2000-02-03 15:42:47 UTC

Description ecarter 1999-05-01 21:02:19 UTC
Note: pump wasn't in the list of packages, so I picked
dhcpcd instead.

After about 12 hours of being left running, pump starts
periodically using a lot of CPU.  About every minute, my
system CPU time goes up to 50% while pump's usage goes to
about 25%.  The only way I can fix this problem is to
killall pump ; ifdown eth0 ; ifup eth0 as root (ifdown eth0
hangs without the killall command first).

Comment 1 ecarter 1999-05-03 03:02:59 UTC
How I fixed my system:

First I had to make ifup and ifdown in /etc/sysconfig/network-scripts
use dhcpcd instead of pump when DHCP is being used.  Also, the version
of dhcpcd included in RH 6 is broken, so I upgraded that package.
Patches for ifup and ifdown and the dhcpcd rpm I used are on anonymous
ftp at 150.135.194.252.

Note: I am not the only one who is going to be having problems using
pump instead of dhcpcd.  Cox cable modems in Phoenix, AZ require that
the DHCP client use a hostname parameter.  It does not appear that
pump has this capability.

Comment 2 Jeff Johnson 1999-06-02 11:20:59 UTC
*** Bug 504 has been marked as a duplicate of this bug. ***

Tried a DHCPCD connection by cable modem using DHCPCD V.70
as included in Red Hat 5.2 AXP.
Failed.
Tried with V.65-4 from 5.1 and it works just dandy.

------- Additional Comments From dkl  12/29/98 17:15 -------
We would need to get a cable modem in to accurately test this bug. We
will reopen it when we get one.

Comment 3 Jay Turner 1999-06-04 15:17:59 UTC
This issue has been forwarded to a developer for further consideration.

Comment 4 Erik Troan 1999-06-21 18:27:59 UTC
Have you tested it with the latest pump from the errata? I fixed
something similar to this...

If you still see this, could you start pump, attach an strace to it,
and see what happens when the CPU usage goes crazy? I'd like to see
what strace says.

------- Email Received From  Edward Carter <ecarter.edu> 06/24/99 02:23 -------

Comment 5 gregpublic 1999-07-08 22:04:59 UTC
My Cox at Home cable modem needs my machine name on all DHCP
inquiries.  So when pump requests a renewal without it, pump goes
into an endless cycle. (These notes are against the latest pump from
the errata, pump-0.6.7-1.i386.rpm.)

So I added the BOOTP_OPTION_HOSTNAME option to the following three
queries:
    DHCP_TYPE_DISCOVER
    DHCP_TYPE_REQUEST
    DHCP_TYPE_RELEASE
Similar to the change made to DHCP_TYPE_OFFER.
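
For illustration, a minimal sketch of how a hostname option is laid out in a DHCP options area. The Host Name option is code 12 (RFC 2132, section 3.14), followed by a one-byte length and the name itself. The helper name add_hostname_option and its buffer arguments are assumptions made for this example; they are not pump's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* DHCP option codes from RFC 2132. */
#define OPT_HOSTNAME 12
#define OPT_END      255

/*
 * Append a Host Name option to an options buffer.
 * buf/cap: the options area and its capacity; used: bytes already filled.
 * Returns the new fill level, or 0 if the option does not fit.
 * Hypothetical helper for illustration only.
 */
size_t add_hostname_option(unsigned char *buf, size_t cap, size_t used,
                           const char *hostname)
{
    size_t len;

    assert(buf != NULL && hostname != NULL);
    len = strlen(hostname);

    /* The length field is one byte; also need room for code, length, END. */
    if (len == 0 || len > 255 || used > cap || cap - used < len + 3)
        return 0;

    buf[used++] = OPT_HOSTNAME;
    buf[used++] = (unsigned char)len;
    memcpy(buf + used, hostname, len);
    used += len;
    buf[used] = OPT_END;  /* terminator, not counted so more options can follow */
    return used;
}
```

The same call would be made while building each message type that the server expects to carry the name.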

The following two changes are also needed:
Machine name added to ifcfg-eth1:
    HOST=xx99999-x
Use the HOST macro in /sbin/ifup:
    if /sbin/pump -i $DEVICE -h $HOST

Also, there's an overflow in handleTransaction(), at the nextTimeout
*= 2 line; changed it to:
    nextTimeout = (NUM_RETRIES-tries)*2;
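
A different way to avoid the same overflow, shown only as a sketch: keep the doubling but cap it. This is not the change quoted above, which replaces the doubling with a countdown. The function name next_timeout and the max_timeout parameter are hypothetical.

```c
#include <assert.h>

/*
 * Double a retry timeout (in seconds) without ever exceeding max_timeout.
 * Comparing against max_timeout / 2 before multiplying means the
 * multiplication itself can never overflow.
 * Hypothetical helper; the names are not taken from pump.
 */
int next_timeout(int current, int max_timeout)
{
    assert(current > 0 && max_timeout > 0);

    if (current > max_timeout / 2)
        return max_timeout;
    return current * 2;
}
```

With a cap of 64, a timeout of 4 becomes 8, while 32 and anything larger settle at 64.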

Finally, there's a memory leak in dhcpRelease(); added:
    free( intf->hostname );
    free( intf->domain );

Comment 6 Duane Voth 1999-08-25 03:18:59 UTC
How much strace data would you like?  I got 28M of it.
I've placed the first 7000 lines or so in a file at:
ftp://ftp.io.com/pub/usr/duanev/pump.txt
This is with the errata'ed version from Red Hat (pump-0.7.0
is what the directory says - I wish there was a version
number *inside* the .c file - then I could be more sure).
ls -l gives:
    45295 Jul 27 17:58 pump.c

Comment 7 Duane Voth 1999-08-25 13:34:59 UTC
Hey!  My date is way off!  5 hours slow to be exact.
And something in cron keeps setting it way off.
I've been meaning to fix it but it hasn't been a priority
until now.  I'll bet that when the time jumps around pump
can think brand new leases are up immediately.  I'm adding test
code to check this and to force lease renewals in the main loop
to be at least 30 minutes apart.  I'll keep y'all informed.

Comment 8 Bill Nottingham 1999-08-25 17:11:59 UTC
*** Bug 4306 has been marked as a duplicate of this bug. ***

Pump works OK when I first boot. But hours later "top"
reports that it's eating up all my CPU and system resources.
I have upgraded to the latest version and have the same
results. I am connecting to a 1Mbit ISP called Sympatico in
Toronto, Canada.

------- Additional Comments From duanev  08/25/99 09:58 -------
probably a duplicate of # 2474

Comment 9 Duane Voth 1999-08-31 06:17:59 UTC
Progress report (or lack thereof).  This sucker is difficult to
recreate.  I haven't been able to cause pump to hang by manually
changing the system date (and I deliberately have not fixed my
jumping date problem yet).  But what I do know is the pump problem
occurs when pump fails to wake up in time to renew the lease.  By
then my ISP has revoked my IP number and all subsequent renew
operations fail.  Pump then stupidly sits in a loop forever trying
to renew anyway.

So three things to fix once all is known: 1) pump should react
correctly on renew failures (bring down the interface and fire
it up with a new lease maybe?), 2) to make it more robust pump
should have a backup plan for catching lease expiration (like
waking up more often to see if the "state" of the system has
changed), and 3) fix why pump is occasionally missing the
expiration time.
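
As a rough sketch of items 1 and 2, the decision of what to do next can be derived from how much of the lease has elapsed. RFC 2131 (section 4.4.5) defaults the renewal time T1 to half the lease and the rebinding time T2 to 0.875 of it; once the lease has expired the client is expected to give up the address and start over rather than keep renewing. The enum and function names below are assumptions for this example, not pump's code.

```c
#include <assert.h>

enum lease_state {
    LEASE_BOUND,      /* before T1: nothing to do yet */
    LEASE_RENEWING,   /* T1 up to T2: renew with the server that gave the lease */
    LEASE_REBINDING,  /* T2 up to expiry: ask any server */
    LEASE_EXPIRED     /* lease is gone: drop the address and request a new one */
};

/*
 * Map elapsed seconds since the lease was granted to the state the client
 * should be in, using the RFC 2131 default T1 and T2 values.
 * Hypothetical helper for illustration only.
 */
enum lease_state lease_state_at(long elapsed, long lease_secs)
{
    long t1, t2;

    assert(elapsed >= 0 && lease_secs > 0);
    t1 = lease_secs / 2;               /* 0.5 * lease */
    t2 = lease_secs - lease_secs / 8;  /* 0.875 * lease */

    if (elapsed >= lease_secs)
        return LEASE_EXPIRED;
    if (elapsed >= t2)
        return LEASE_REBINDING;
    if (elapsed >= t1)
        return LEASE_RENEWING;
    return LEASE_BOUND;
}
```

Calling this on every wake-up, instead of only at one precomputed moment, means a missed wake-up still ends in LEASE_EXPIRED instead of an endless renew loop.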

Comment 10 Duane Voth 1999-09-09 03:32:59 UTC
Ok, I've got a patch for #3, it seems stable enough to ignore #2, and
I'll leave #1 up to someone who knows the dhcp protocol better than I
do.  I know about elapsed time vs. wall-clock time and that was the
problem here.  There are two very different types of time on computer
systems - but most everyone thinks there is only one.  Wall clock time
is what you get with date(1) and time(2).  It needs to track GMT a la
your local offset and is expected to speed up or slow down to sync
with whatever external synchronization source you choose.  This can be
a GPS reference via ntpdate or it can be your watch via your fingers
and date(1), but it is expected that this time WILL JUMP periodically
to match the source.  Elapsed time is the precise number of seconds
(or milliseconds) that have elapsed since some time in the past when
your app started its timer.  This time is expected to REMAIN STABLE no
matter if the clock crystals in the computer drift or not.  These two
applications don't mix: elapsed time people hate the jumps in
wall-clock time, and wall-clock time people hate the drift in elapsed
time.  pump was using wall-clock time but needed to use elapsed time.
When the date jumped (wildly on my system) pump would occasionally
botch the lease renewal and lose the IP.  Then it would try forever
to renew the lease to no avail.
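
To make the distinction concrete, here is a minimal sketch of an elapsed-time lease timer. It assumes a system that provides POSIX clock_gettime() with CLOCK_MONOTONIC, which is not affected by date(1) or other changes to the wall clock. It is not the patch linked below; the struct and function names are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <time.h>

/* Seconds from a clock that does not jump when the date is changed. */
long monotonic_seconds(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0);
    (void)rc;  /* keeps the build quiet when asserts are compiled out */
    return (long)ts.tv_sec;
}

/* Hypothetical lease timer that records elapsed time, not a wall-clock deadline. */
struct lease_timer {
    long start;     /* monotonic seconds when the lease was obtained */
    long duration;  /* lease length in seconds */
};

void lease_timer_start(struct lease_timer *t, long duration_secs)
{
    assert(t != NULL && duration_secs >= 0);
    t->start = monotonic_seconds();
    t->duration = duration_secs;
}

/* Seconds left on the lease, never negative. */
long lease_timer_remaining(const struct lease_timer *t)
{
    long left;

    assert(t != NULL);
    left = t->duration - (monotonic_seconds() - t->start);
    return left > 0 ? left : 0;
}
```

A timer built on time(2) would instead store an absolute deadline, and a five-hour date jump would move that deadline by five hours. On glibc older than 2.17, clock_gettime() needs -lrt at link time.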

ftp://ftp.io.com/pub/usr/duanev/pump-0.7.0-djv.patch

I've also added a fair number of new debug messages and made the
--status output easier to awk or grep|cut.  (everyone is going to use
pump to extract the dynamic ip address right?)  Take what you need,
dump the rest.

Comment 12 Hal Burgiss 1999-12-24 05:09:59 UTC
I see a similar thing if the network goes completely down and thus pump is not
able to renew the lease. This is with pump-0.7.2-2. Also, pump stops logging after a while.
Restarting syslog does not help. Restarting pump does.

Comment 13 Erik Troan 2000-02-03 15:42:59 UTC
This should all be fixed in pump 0.7.6, which will make it onto our ftp site
next week.

I'll have this on ftp://people.redhat.com/ewt/ later this afternoon for testing.

Thanks for the good comments and the patch; sorry it took so long to get this
taken care of.