Bug 505172

Summary: a failure to start a single nanny kills off *all* running nannies
Product: Red Hat Enterprise Linux 5
Reporter: Dan Yocum <dyocum>
Component: piranha
Assignee: Marek Grac <mgrac>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: low
Version: 5.3
CC: adrew, bill-bugzilla.redhat.com, cluster-maint, davidj, djansa, ffotorel, lscalabr, wcooley
Target Milestone: rc
Hardware: i386
OS: Linux
Fixed In Version: piranha-0.8.4-22
Doc Type: Bug Fix
Doc Text: A new keyword for lvs.cf was added: hard_shutdown = (0 | 1). With 1 (the default), a problem with a single nanny kills all nannies. With 0, a problem with a single nanny does not kill all nannies, but the system needs manual intervention.
Last Closed: 2011-07-21 07:23:10 EDT
Bug Blocks: 593728
Attachments: Small patch that prevents the complete shutdown after a nanny error

Description Dan Yocum 2009-06-10 17:17:46 EDT
Description of problem:

After setting a real server to active = 0 and weight = 0 and reloading pulse, <perform some work on the RS>, then setting active = 1 and weight = 3 and reloading pulse again, lvsd first creates the monitor for the process. That monitor dies for some strange reason, and lvsd then proceeds to shut down *all* virtual services!!
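(For context, a rough sketch of the kind of lvs.cf stanza being toggled follows; the service and server names are taken from the logs below, but the addresses and port details are illustrative placeholders, not values from the reporter's actual configuration. The real server is first set to active = 0 / weight = 0, work is done on it, and it is then set back to active = 1 / weight = 3 before pulse is reloaded again.)

    virtual saz-admin {
        address = 10.0.0.100 eth0:1
        port = 8443
        protocol = tcp
        server fg5x3 {
            address = 10.0.0.10
            active = 0
            weight = 0
        }
    }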

Version-Release number of selected component (if applicable):

piranha-0.8.4-9.3.el5 and piranha-0.8.4-11.el5



How reproducible:

always

Steps to Reproduce:
1. configure lvs.cf with a service on a new real server
2. make sure the service is NOT running on the real server
3. service pulse reload; tail -f /var/log/messages
4. PANIC!
  
Actual results:

lvs[19604]: rereading configuration file
lvs[19604]: create_monitor for saz-admin:8443/fg5x3 running as pid 31729
lvs[19604]: create_monitor for saz-admin:8443/fg6x3 running as pid 31730
lvs[19604]: nanny for child saz-admin:8443/fg5x3 died! shutting down lvs
lvs[19604]: shutting down virtual service MYSQL:3306
lvs[19604]: shutting down virtual service SAZ:8888
lvs[19604]: shutting down virtual service SAZ:8881
lvs[19604]: shutting down virtual service SAZ:8882
lvs[19604]: shutting down virtual service voms:8443
lvs[19604]: shutting down virtual service voms-osg:8443
lvs[19604]: shutting down virtual service gums:8443
nanny[19614]: Terminating due to signal 15
nanny[19617]: Terminating due to signal 15
nanny[19622]: Terminating due to signal 15
nanny[19644]: Terminating due to signal 15
nanny[19645]: Terminating due to signal 15
nanny[19647]: Terminating due to signal 15


Expected results:

lvs[18998]: rereading configuration file
lvs[18998]: starting virtual service saz-admin:8450 active: 8450
lvs[18998]: create_monitor for saz-admin:8450/fgt6x6 running as pid 10673
nanny[10673]: starting LVS client monitor for 131.225.81.155:8450
nanny[10673]: [ active ] making 131.225.81.131:8450 available


Additional info:
Comment 1 J. Kost 2009-06-23 04:13:37 EDT
Created attachment 349055 [details]
Small patch that prevents the complete shutdown after a nanny error

Small patch that prevents the complete lvsd shutdown after a nanny error
Comment 8 David Jacobson 2011-04-14 13:06:23 EDT
Hi,

Running the following :

CentOS 5.3
piranha-0.8.4-19.el5
ipvsadm-1.24-12.el5

We have just been hit by this bug; see the logs below:

Apr 14 17:55:18 serverhostname nanny[15067]: [inactive] shutting down 196.x.x.x:25 due to connection failure
Apr 14 17:56:47 serverhostname nanny[20959]: [inactive] shutting down 196.x.x.x:25 due to connection failure
Apr 14 17:56:47 serverhostname nanny[20959]: /sbin/ipvsadm command failed!
Apr 14 17:56:47 serverhostname lvs[20911]: nanny died! shutting down lvs
Apr 14 17:56:47 serverhostname lvs[20911]: shutting down virtual service balancer

Similarly, the problem occurred when restarting pulse, which incorrectly tried to bring up the nanny process twice:

Apr 14 17:59:18 serverhostname nanny[15067]: [ active ] making 196.x.x.x:25 available
Apr 14 17:59:53 serverhostname nanny[22170]: [ active ] making 196.x.x.x:25 available
Apr 14 17:59:53 serverhostname nanny[22170]: /sbin/ipvsadm command failed!
Apr 14 17:59:53 serverhostname lvs[22111]: nanny died! shutting down lvs
Apr 14 17:59:53 serverhostname lvs[22111]: shutting down virtual service balancer

From what I can see, the root cause of the issue is that it tries to shut down the connection twice; if it did so only once, all would be fine.
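(For illustration only: the "ipvsadm command failed" messages are consistent with nanny trying to delete a real-server entry that has already been removed, in which case the delete exits non-zero. A hypothetical example of such a call, where the virtual IP is a placeholder not taken from these logs:

    /sbin/ipvsadm -d -t <virtual-ip>:25 -r 196.x.x.x:25

Run a second time against the same entry, ipvsadm reports an error because the destination no longer exists, the nanny then exits, and lvsd tears down every virtual service.)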

I also agree, though, that lvsd should not die completely.

This bug has been open for over two years; is there going to be any progress on this?

Regards,
David
Comment 10 Marek Grac 2011-06-06 11:40:23 EDT
http://git.fedorahosted.org/git/?p=piranha.git;a=commit;h=8ca36132f5aad67fb0977f2efbf9d25b55776642

[the test case is not valid; it ends in a different branch very close to the original problem]

Joerg, thanks for the patch. I have added a new keyword, hard_shutdown = (0 | 1), which should go in the global section of lvs.cf. 1 (the default) keeps backward compatibility, where the system is either running completely or not at all. Setting hard_shutdown = 0 gives the functionality you would like to use.
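For anyone willing to test, a minimal sketch of what the global section might look like with the new keyword; the other directives are illustrative placeholders, and only hard_shutdown is the addition:

    serial_no = 1
    primary = 10.0.0.1
    service = lvs
    hard_shutdown = 0

With hard_shutdown = 0, a failed nanny is left to be dealt with manually instead of taking the whole lvsd instance down; with 1 (the default), the current all-or-nothing behaviour is kept.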

If you are willing to test, I would like to send you a preliminary version.
Comment 11 Marek Grac 2011-06-06 11:40:23 EDT
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
A new keyword for lvs.cf was added.

hard_shutdown = (0 | 1)

1 (default) => a problem with a single nanny will kill all nannies
0 => a problem with a single nanny won't kill all nannies, but the system needs manual intervention
Comment 13 Bill McGonigle 2011-06-07 15:53:42 EDT
FYI, typo in the Technical Note.
Comment 22 errata-xmlrpc 2011-07-21 07:23:10 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-1059.html