Bug 505172
| Summary: | a failure to start a single nanny kills off *all* running nannys | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Dan Yocum <dyocum> | ||||
| Component: | piranha | Assignee: | Marek Grac <mgrac> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | low | ||||||
| Version: | 5.3 | CC: | adrew, bill-bugzilla.redhat.com, cluster-maint, davidj, djansa, ffotorel, lscalabr, wcooley | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | i386 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | piranha-0.8.4-22 | Doc Type: | Bug Fix | ||||
| Doc Text: |
New keyword for lvs.cf was added.
hard_shutdown = (0 | 1)
1 (default) => problem with single nanny will kill all nannies
0 => problem with single nanny won't kill all nannies but system needs manual intervention
|
Story Points: | --- | ||||
| Clone Of: | |||||||
| : | 593728 (view as bug list) | Environment: | |||||
| Last Closed: | 2011-07-21 11:23:10 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 593728 | ||||||
Description
Dan Yocum
2009-06-10 21:17:46 UTC
Created attachment 349055
Small patch that prevents the complete shutdown after a nanny error
Small patch that prevents the complete lvsd shutdown after a nanny error
Hi,
Running the following:
CentOS 5.3
piranha-0.8.4-19.el5
ipvsadm-1.24-12.el5
We have just been hit by this bug, see logs below:
Apr 14 17:55:18 serverhostname nanny[15067]: [inactive] shutting down 196.x.x.x:25 due to connection failure
Apr 14 17:56:47 serverhostname nanny[20959]: [inactive] shutting down 196.x.x.x:25 due to connection failure
Apr 14 17:56:47 serverhostname nanny[20959]: /sbin/ipvsadm command failed!
Apr 14 17:56:47 serverhostname lvs[20911]: nanny died! shutting down lvs
Apr 14 17:56:47 serverhostname lvs[20911]: shutting down virtual service balancer
Similarly, the problem occurred when restarting pulse, because it incorrectly tried to bring up the nanny process twice:
Apr 14 17:59:18 serverhostname nanny[15067]: [ active ] making 196.x.x.x:25 available
Apr 14 17:59:53 serverhostname nanny[22170]: [ active ] making 196.x.x.x:25 available
Apr 14 17:59:53 serverhostname nanny[22170]: /sbin/ipvsadm command failed!
Apr 14 17:59:53 serverhostname lvs[22111]: nanny died! shutting down lvs
Apr 14 17:59:53 serverhostname lvs[22111]: shutting down virtual service balancer
From what I can see, the root cause of the issue is that it tries to shut down the connection twice; if it did so only once, all would be fine. I also agree, though, that lvsd should not die completely.
This bug has been open for over 2 years. Is there going to be any progress on this?
Regards,
David

http://git.fedorahosted.org/git/?p=piranha.git;a=commit;h=8ca36132f5aad67fb0977f2efbf9d25b55776642

[test case is not valid; it ends in another branch very close to the original problem]

Joerg, thanks for the patch. I have added a new keyword, hard_shutdown = (0 | 1), which should go in the global section of lvs.cf. The default, 1, keeps backward compatibility: the system is either running completely or not at all. Setting hard_shutdown = 0 gives the behavior you asked for. If you are willing to test, I would like to send you a preliminary version.
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
New keyword for lvs.cf was added.
hard_shutdows = (0 | 1)
1 (default) => problem with single nanny will kill all nannies
0 => problem with single nanny won't kill all nannies but system needs manual intervention
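
The two settings described in the note can be sketched as a small Python model. This is only an illustration of the documented semantics, not piranha's lvsd code: the function names, the sample configuration text, and the service names are all hypothetical. It also assumes the keyword is spelled hard_shutdown, as in the developer's comment, rather than hard_shutdows as in the note.

```python
# Illustrative model only: this is NOT piranha's lvsd implementation.
# parse_global_options, hard_shutdown_enabled and on_nanny_exit are
# hypothetical names chosen for this sketch.

# Hypothetical excerpt of a global section in "key = value" form.
SAMPLE_GLOBAL_SECTION = """
service = lvs
hard_shutdown = 0
"""


def parse_global_options(text):
    """Parse 'key = value' lines into a dict, skipping other lines."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def hard_shutdown_enabled(options):
    """Return True unless hard_shutdown is explicitly set to 0.

    The note gives 1 as the default, so a missing keyword counts as 1.
    """
    return options.get("hard_shutdown", "1") != "0"


def on_nanny_exit(running, failed, hard_shutdown):
    """Return the set of nannies still running after `failed` dies."""
    if hard_shutdown:
        # 1 (default): a problem with a single nanny kills all nannies.
        return set()
    # 0: the other nannies keep running; the failed one stays down
    # until someone intervenes manually.
    return set(running) - {failed}


if __name__ == "__main__":
    opts = parse_global_options(SAMPLE_GLOBAL_SECTION)
    survivors = on_nanny_exit({"smtp", "http"}, "smtp",
                              hard_shutdown_enabled(opts))
    print(sorted(survivors))
```

With hard_shutdown = 0 in the sample text, only the failed nanny is dropped from the running set; removing that line falls back to the default and empties the set.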
FYI, typo in the Technical Note.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2011-1059.html