Bug 18979 - Nanny crashes Linux if timeout > 30 in lvs.cf
Status: CLOSED CURRENTRELEASE
Product: Red Hat High Availability Server
Classification: Retired
Component: piranha
Version: 1.0
Hardware: i386 Linux
Priority: medium Severity: high
Assigned To: Phil Copeland
Reported: 2000-10-12 12:55 EDT by Pietro Ravasio
Modified: 2007-04-18 12:29 EDT
Doc Type: Bug Fix
Last Closed: 2000-11-17 14:50:12 EST
Description Red Hat Bugzilla 2000-10-12 12:55:46 EDT
Hi,

I've been using piranha for two months without any problems. I'm using it to
balance www and db-gateway services.

At the moment I'm using versions 17-2 and 17-4.

This morning I modified lvs.cf as follows:

...
virtual serverphone {
        address = 151.39.82.13 eth1:1
        active = 1
        load_monitor = rup
        timeout = 30
        reentry = 180
        port = 5555
        scheduler = rr
...

After 180 seconds, just as the service is about to be made available,
nanny crashes hard, killing syslog, inet and a LOT of other daemons.

This is the last log in /var/log/messages:

Oct 12 17:51:03 lvs1 lvs[1811]: create_monitor for serverphone/rs2b running as pid 1840
Oct 12 17:51:03 lvs1 lvs[1811]: nanny for child serverphone/rs1b died! shutting down lvs
Oct 12 17:51:03 lvs1 lvs[1811]: nanny for child serverphone/rs1 died! shutting down lvs
Oct 12 17:51:03 lvs1 lvs[1811]: shutting down virtual service webserver
Oct 12 17:51:03 lvs1 nanny[1819]: Terminating due to signal 15
Oct 12 17:51:03 lvs1 nanny[1819]: Killing child 1832
Oct 12 17:51:03 lvs1 nanny[1819]: running command  "/usr/sbin/ipvsadm" "-d" "-t" "151.39.82.13:80" "-r" "172.16.0.11"
Oct 12 17:51:03 lvs1 nanny[1820]: Terminating due to signal 15
Oct 12 17:51:03 lvs1 nanny[1820]: Killing child 1833
Oct 12 17:51:03 lvs1 exiting on signal 15

This happens if I set "timeout = 30" (or greater). If I set timeout = 15 
everything works great.
I've tried both the 17-2 and 17-4 versions of piranha.

Kind Regards,
Pietro Ravasio
Comment 1 Red Hat Bugzilla 2000-10-16 13:42:04 EDT
Thanks. Now that we are back from ALS we'll look into this...
Comment 2 Red Hat Bugzilla 2000-11-17 14:50:09 EST
Oh nutbunnies,..

I found that I left a bug in nanny.c. When "timeout" in /etc/lvs.cf is larger
than 20 (interval > 20), the integer division 20/interval evaluates to 0, so
the condition
        (currCount % (20/interval) == 0) 
takes a modulo by zero and makes nanny dump core. We need to change it to
        (interval > 20 || currCount % (20 / interval) == 0) 
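
For illustration only, here is a minimal sketch of how the guarded condition behaves. It assumes interval and currCount are plain C ints and that interval is positive. The function name should_run is hypothetical and is not taken from nanny.c.

```c
#include <assert.h>

/* Hypothetical wrapper around the corrected condition (not the real
 * nanny.c code). Returns 1 when the condition holds, 0 otherwise. */
int should_run(int interval, int currCount)
{
    /* Assumption: interval is positive. An interval of 0 would still
     * divide by zero and is not covered by the fix discussed here. */
    assert(interval > 0);

    /* In integer arithmetic 20 / interval is 0 once interval > 20.
     * Checking interval > 20 first lets || short-circuit, so the
     * modulo by zero is never evaluated. */
    return interval > 20 || currCount % (20 / interval) == 0;
}
```

With interval = 15 the divisor is 20 / 15 = 1, so the modulo is safe and the condition is true on every count. With interval = 30 the first test already succeeds and the division is skipped, where the unguarded expression would have computed currCount % 0.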

Fixed for the next rebuild.

I apologise for not responding sooner (week's holiday).

Phil
=--=
Comment 3 Red Hat Bugzilla 2000-12-21 21:29:38 EST
I'm closing this unless you have any other problems or the latest patch
didn't fix it for you.

Phil
=--=
Comment 4 Red Hat Bugzilla 2000-12-22 03:54:53 EST
Thanks Phil, I must admit I haven't tried to raise the timeout value above 20
again! (I'm using Piranha in a "production environment" so I can't run too
many experiments... ;)

I'm sure everything is working great now, thanks for support! :)
