Description of problem:
I set a real server to active = 0 and weight = 0 and reloaded pulse, <perform some work on the RS>, then set it back to active = 1 and weight = 3 and reloaded pulse again. lvsd first creates the monitor (nanny) for the real server. That monitor dies for an unknown reason, and lvsd then proceeds to shut down *all* virtual services!

Version-Release number of selected component (if applicable):
piranha-0.8.4-9.3.el5 and piranha-0.8.4-11.el5

How reproducible:
Always

Steps to Reproduce:
1. Configure lvs.cf with a service on a new real server.
2. Make sure the service is NOT running on the real server.
3. service pulse reload; tail -f /var/log/messages
4. PANIC!

Actual results:
lvs[19604]: rereading configuration file
lvs[19604]: create_monitor for saz-admin:8443/fg5x3 running as pid 31729
lvs[19604]: create_monitor for saz-admin:8443/fg6x3 running as pid 31730
lvs[19604]: nanny for child saz-admin:8443/fg5x3 died! shutting down lvs
lvs[19604]: shutting down virtual service MYSQL:3306
lvs[19604]: shutting down virtual service SAZ:8888
lvs[19604]: shutting down virtual service SAZ:8881
lvs[19604]: shutting down virtual service SAZ:8882
lvs[19604]: shutting down virtual service voms:8443
lvs[19604]: shutting down virtual service voms-osg:8443
lvs[19604]: shutting down virtual service gums:8443
nanny[19614]: Terminating due to signal 15
nanny[19617]: Terminating due to signal 15
nanny[19622]: Terminating due to signal 15
nanny[19644]: Terminating due to signal 15
nanny[19645]: Terminating due to signal 15
nanny[19647]: Terminating due to signal 15

Expected results:
lvs[18998]: rereading configuration file
lvs[18998]: starting virtual service saz-admin:8450 active: 8450
lvs[18998]: create_monitor for saz-admin:8450/fgt6x6 running as pid 10673
nanny[10673]: starting LVS client monitor for 131.225.81.155:8450
nanny[10673]: [ active ] making 131.225.81.131:8450 available

Additional info:
Created attachment 349055: Small patch that prevents the complete lvsd shutdown after a nanny error
Hi,

Running the following:
CentOS 5.3
piranha-0.8.4-19.el5
ipvsadm-1.24-12.el5

We have just been hit by this bug. See the logs below:

Apr 14 17:55:18 serverhostname nanny[15067]: [inactive] shutting down 196.x.x.x:25 due to connection failure
Apr 14 17:56:47 serverhostname nanny[20959]: [inactive] shutting down 196.x.x.x:25 due to connection failure
Apr 14 17:56:47 serverhostname nanny[20959]: /sbin/ipvsadm command failed!
Apr 14 17:56:47 serverhostname lvs[20911]: nanny died! shutting down lvs
Apr 14 17:56:47 serverhostname lvs[20911]: shutting down virtual service balancer

The same problem occurred when we restarted pulse, which incorrectly tried to bring up the nanny process twice:

Apr 14 17:59:18 serverhostname nanny[15067]: [ active ] making 196.x.x.x:25 available
Apr 14 17:59:53 serverhostname nanny[22170]: [ active ] making 196.x.x.x:25 available
Apr 14 17:59:53 serverhostname nanny[22170]: /sbin/ipvsadm command failed!
Apr 14 17:59:53 serverhostname lvs[22111]: nanny died! shutting down lvs
Apr 14 17:59:53 serverhostname lvs[22111]: shutting down virtual service balancer

From what I can see, the root cause is that it tries to shut down the connection twice. If it did so only once, everything would be fine. I also agree, though, that lvsd should not die completely.

This bug has been open for over 2 years. Is there going to be any progress on it?

Regards,
David
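David's hypothesis is that the same real server is added or shut down twice, and the second ipvsadm call fails. The minimal Python sketch below tests that idea. It assumes, as the logs suggest, that a repeated add or delete is reported as a failed ipvsadm command. It shows how a check-before-act guard turns the duplicate request into a no-op instead of an error. `FakeIpvsTable`, `make_available` and `shut_down` are hypothetical names. This is not piranha or ipvsadm code.

```python
# Hypothetical model only: not piranha or ipvsadm code.
class FakeIpvsTable:
    """Stand-in for the kernel IPVS table. As assumed from the logs, adding an
    entry that already exists, or deleting one that does not, is treated as a
    failed ipvsadm command."""

    def __init__(self):
        self.real_servers = set()

    def add(self, rs):
        if rs in self.real_servers:
            raise RuntimeError("/sbin/ipvsadm command failed!")
        self.real_servers.add(rs)

    def delete(self, rs):
        if rs not in self.real_servers:
            raise RuntimeError("/sbin/ipvsadm command failed!")
        self.real_servers.remove(rs)


def make_available(table, rs):
    """Add rs only if it is absent; return True if the table changed."""
    if rs in table.real_servers:
        return False
    table.add(rs)
    return True


def shut_down(table, rs):
    """Remove rs only if it is present; return True if the table changed."""
    if rs not in table.real_servers:
        return False
    table.delete(rs)
    return True


if __name__ == "__main__":
    table = FakeIpvsTable()
    make_available(table, "196.x.x.x:25")
    print(shut_down(table, "196.x.x.x:25"))  # True: first request removes it
    print(shut_down(table, "196.x.x.x:25"))  # False: duplicate is a no-op
```

This models only the "do it once" idea. Two separate nanny processes racing on the same entry would still need coordination. lvsd would also still treat any remaining failure as fatal unless its shutdown policy changes.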
http://git.fedorahosted.org/git/?p=piranha.git;a=commit;h=8ca36132f5aad67fb0977f2efbf9d25b55776642

[The test case is not valid; it ends in another branch very close to the original problem.]

Joerg, thanks for the patch. I have added a new keyword, hard_shutdown = (0 | 1), which goes in the global section of lvs.cf.
1 (default): backward-compatible behavior, where the system is either running completely or not at all.
0: the behavior you asked for.

If you are willing to test, I would like to send you a preliminary version.
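For illustration, here is a Python sketch of how a tool might read hard_shutdown from the global (top-level) section of an lvs.cf-style file and fall back to the default of 1. Only the keyword, its 0/1 values, its default and its global placement come from this report. The helper name `read_hard_shutdown`, the brace-depth heuristic and the sample `virtual`/`server` blocks are assumptions. This is not piranha's actual parser.

```python
import re

# Matches a "hard_shutdown = <value>" line (assumed key = value syntax).
_HARD_SHUTDOWN = re.compile(r"^\s*hard_shutdown\s*=\s*(\S+)\s*$")


def read_hard_shutdown(lvs_cf_text):
    """Return hard_shutdown from the global section, defaulting to 1.

    Hypothetical helper: lines inside { } blocks (e.g. virtual/server
    sections) are skipped by tracking brace depth.
    """
    value = 1  # default: backward-compatible all-or-nothing behaviour
    depth = 0
    for line in lvs_cf_text.splitlines():
        if depth == 0:
            match = _HARD_SHUTDOWN.match(line)
            if match:
                if match.group(1) not in ("0", "1"):
                    raise ValueError("hard_shutdown must be 0 or 1")
                value = int(match.group(1))
        depth += line.count("{") - line.count("}")
    return value


# Illustrative lvs.cf-style fragment (assumed layout).
SAMPLE = """\
hard_shutdown = 0
virtual SAZ {
     active = 1
     port = 8888
     server rs1 {
         active = 1
         weight = 3
     }
}
"""

if __name__ == "__main__":
    print(read_hard_shutdown(SAMPLE))  # 0
```

A hard_shutdown line placed inside a virtual block is ignored here, which reflects the report's statement that the keyword belongs in the global section.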
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
A new keyword was added to lvs.cf: hard_shutdown = (0 | 1)
1 (default) => a problem with a single nanny kills all nannies.
0 => a problem with a single nanny does not kill all nannies, but the system needs manual intervention.
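The semantics in the technical note can be modelled as a small state machine. This Python sketch is hypothetical: `LvsdModel` and `nanny_died` are not lvsd's real structures. It shows only the decision described in the note. With hard_shutdown = 1, every virtual service is shut down and the remaining nannies are terminated, as in the "Actual results" log. With hard_shutdown = 0, only the dead monitor is lost, and it is flagged for manual intervention.

```python
from dataclasses import dataclass, field


@dataclass
class LvsdModel:
    """Hypothetical model of lvsd's reaction to a nanny dying."""
    hard_shutdown: int = 1
    services: set[str] = field(default_factory=set)     # running virtual services
    monitors: set[str] = field(default_factory=set)     # live nannies, "service/server"
    unmonitored: set[str] = field(default_factory=set)  # need manual intervention

    def nanny_died(self, monitor: str) -> list[str]:
        """Handle a dead nanny; return the virtual services that were shut down."""
        self.monitors.discard(monitor)
        if self.hard_shutdown == 1:
            # Backward-compatible behaviour: everything goes down.
            stopped = sorted(self.services)
            self.services.clear()
            self.monitors.clear()  # remaining nannies are terminated too
            return stopped
        # hard_shutdown = 0: keep everything else running, flag the failure.
        self.unmonitored.add(monitor)
        return []


if __name__ == "__main__":
    soft = LvsdModel(0, {"saz-admin:8443", "MYSQL:3306"},
                     {"saz-admin:8443/fg5x3", "saz-admin:8443/fg6x3"})
    print(soft.nanny_died("saz-admin:8443/fg5x3"))  # []
    print(sorted(soft.monitors))                    # ['saz-admin:8443/fg6x3']
```

The trade-off matches the note. With 0, the remaining services stay up, but the real server whose nanny died is no longer monitored until someone intervenes.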
FYI, there is a typo in the Technical Note: "hard_shutdows" should read "hard_shutdown".
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-1059.html