| Summary: | Unstable loadbalancer (piranha) | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Henrik Johansson <henrik.l.johansson> |
| Component: | piranha | Assignee: | Ryan O'Hara <rohara> |
| Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 6.2 | CC: | benjamin.girard, cluster-maint, djansa, jkortus, lhh, mfuruta |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | i686 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | piranha-0.8.5-9.el6 | Doc Type: | Bug Fix |
| Doc Text: | Prior to this update, terminating a nanny or an lvs daemon did not trigger a failover to the backup server. As a consequence, the load balancer stopped working. With this update, the pulse daemon shuts down if either the nanny daemon or the lvs daemon terminates. Now, the load balancer works as expected. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-12-06 17:57:30 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
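The Doc Text describes the fix as making pulse shut down whenever nanny or lvs terminates, so that the service stops as a whole instead of being left half-running. The general pattern (a supervisor that treats the exit of any child as fatal, sends SIGTERM to the remaining children, and then returns) can be sketched in Python as below. This is only an illustration of the technique under stated assumptions: it is not piranha's C implementation, and the function name `supervise` and the use of `sleep` as stand-in daemons are hypothetical.

```python
import subprocess
import time


def supervise(commands, poll_interval=0.05):
    """Run every command; as soon as one exits, SIGTERM the others.

    Hypothetical sketch of the "one daemon dies, the whole service stops"
    pattern; it is not taken from piranha's source code.
    Returns (index of the first child to exit, list of Popen objects).
    """
    children = [subprocess.Popen(cmd) for cmd in commands]
    try:
        while True:
            for index, child in enumerate(children):
                if child.poll() is not None:
                    return index, children
            time.sleep(poll_interval)
    finally:
        # Runs before the return above reaches the caller: stop every child
        # that is still alive so nothing is left half-running.
        for child in children:
            if child.poll() is None:
                child.terminate()  # SIGTERM on POSIX
        for child in children:
            try:
                child.wait(timeout=5)
            except subprocess.TimeoutExpired:
                child.kill()  # escalate to SIGKILL if SIGTERM was ignored
                child.wait()
```

With `supervise([["sleep", "0.2"], ["sleep", "30"]])`, the call returns after roughly 0.2 s with index 0, and the second child's `returncode` is -15 because it was stopped with SIGTERM. Polling keeps the sketch short; a long-running daemon would more typically react to SIGCHLD and reap its children with `waitpid`.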
With patch:

    # service pulse start
    # kill <nanny-pid>
    # tail /var/log/messages
    Aug 11 09:46:55 mobil-virt-09 nanny[14634]: Terminating due to signal 15
    Aug 11 09:46:55 mobil-virt-09 lvs[14625]: nanny died! shutting down lvs
    Aug 11 09:46:55 mobil-virt-09 lvs[14625]: shutting down virtual service HTTP
    Aug 11 09:46:55 mobil-virt-09 nanny[14635]: Terminating due to signal 15
    Aug 11 09:46:55 mobil-virt-09 nanny[14636]: Terminating due to signal 15
    Aug 11 09:46:55 mobil-virt-09 pulse[14622]: Terminating due to signal 15

    # service pulse start
    # kill <lvsd-pid>
    # tail /var/log/messages
    Aug 11 09:58:09 mobil-virt-09 lvs[14675]: shutting down due to signal 15
    Aug 11 09:58:09 mobil-virt-09 lvs[14675]: shutting down virtual service HTTP
    Aug 11 09:58:09 mobil-virt-09 nanny[14684]: Terminating due to signal 15
    Aug 11 09:58:09 mobil-virt-09 nanny[14685]: Terminating due to signal 15
    Aug 11 09:58:09 mobil-virt-09 nanny[14686]: Terminating due to signal 15
    Aug 11 09:58:09 mobil-virt-09 pulse[14671]: Terminating due to signal 15

Killing (with SIGTERM) either nanny or lvsd will cause all pulse/nanny/lvsd processes to exit.
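The verification above is done by reading /var/log/messages by eye. A small check like the following could confirm the same cascade automatically: it parses syslog lines in the layout shown in these transcripts and reports whether a pulse process logged its own termination. It is a hypothetical helper; the regular expression assumes exactly the line format seen here (timestamp, host name, `name[pid]: message`), and the names `parse` and `pulse_shut_down` are illustrative only.

```python
import re

# Matches lines such as:
# "Aug 11 09:46:55 mobil-virt-09 nanny[14634]: Terminating due to signal 15"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+"
    r"(?P<proc>[\w-]+)\[(?P<pid>\d+)\]:\s+(?P<msg>.*)$"
)


def parse(lines):
    """Yield (process name, pid, message) for every matching syslog line."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            yield match.group("proc"), int(match.group("pid")), match.group("msg")


def pulse_shut_down(lines):
    """Return True if a pulse process logged that it is terminating."""
    return any(
        proc == "pulse" and msg.startswith("Terminating due to signal")
        for proc, _pid, msg in parse(lines)
    )
```

Feeding it either transcript returns True, since both end with a `pulse[...]: Terminating due to signal 15` line; dropping that last line makes it return False.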
Which version of piranha has the update? Any suggestions for restarting 'service pulse'?

Oops, piranha-0.8.5-9.el6

Where can I find this update?

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1716.html
Created attachment 500392
An extract from /var/log/messages, from killing a nanny until restart of pulse.

Description of problem:
The load balancer stops working when one of the nanny processes dies or the lvsd process gets a TERM signal. lvsd then stops the remaining nannies and becomes defunct, but the pulse processes (on both MASTER and BACKUP) don't notice the problem. Manual fix: 'service pulse restart'.

Version-Release number of selected component (if applicable):

    [root@lvs2 ~]# cat /etc/redhat-release
    Red Hat Enterprise Linux Server release 6.0 (Santiago)
    [root@lvs2 ~]# rpm -qa | egrep piranha
    piranha-0.8.5-7.el6.i686

How reproducible:
kill <nanny-process> or kill <lvsd-process>

Steps to Reproduce:
1. service pulse start
2. kill <nanny-process>

Actual results:
The load balancer stops and the backup doesn't notice the problem. Extract from ps:

    [root@lvs2 ~]# ps -ef | egrep "piranha|pulse|lvsd|nanny"
    root 15526 1 0 09:10 ? 00:00:00 pulse -v
    root 15533 15526 0 09:10 ? 00:00:00 [lvsd] <defunct>
    root 15579 1614 0 09:11 pts/0 00:00:00 egrep piranha|pulse|lvsd|nanny

Expected results:
Alternatives:
- Restarting the missing nanny.
- Restarting the service pulse.
- Restarting the service pulse, with a timeout.
- Stopping the service pulse.

Additional info:
An extract from /var/log/messages, from killing a nanny until restart of pulse.