
Bug 706881

Summary: Unstable loadbalancer (piranha)
Product: Red Hat Enterprise Linux 6
Component: piranha
Version: 6.2
Hardware: i686
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: medium
Target Milestone: rc
Target Release: ---
Reporter: Henrik Johansson <henrik.l.johansson>
Assignee: Ryan O'Hara <rohara>
QA Contact: Cluster QE <mspqa-list>
Docs Contact:
CC: benjamin.girard, cluster-maint, djansa, jkortus, lhh, mfuruta
Whiteboard:
Fixed In Version: piranha-0.8.5-9.el6
Doc Type: Bug Fix
Doc Text:
Prior to this update, terminating a nanny or an lvs daemon did not trigger a failover to the backup server. As a consequence, the load balancer stopped working. With this update, the pulse daemon shuts down if either the nanny daemon or the lvs daemon terminates. Now, the load balancer works as expected.
Last Closed: 2011-12-06 17:57:30 UTC
Attachments:
An extract from /var/log/messages, from killing a nanny until restart of pulse. (Flags: none)

Description Henrik Johansson 2011-05-23 10:51:47 UTC
Created attachment 500392
An extract from /var/log/messages, from killing a nanny until restart of pulse.

Description of problem:
The load balancer stops working when one of the nanny processes dies or the lvsd process gets a TERM signal.
The lvsd stops the remaining nannies and goes into defunct status, and the pulse processes (on MASTER and BACKUP) do not notice the problem.

Manual fix: 'service pulse restart'.

Version-Release number of selected component (if applicable):
[root@lvs2 ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 6.0 (Santiago)
[root@lvs2 ~]# rpm -qa | egrep piranha
piranha-0.8.5-7.el6.i686


How reproducible:
kill <nanny-process> or kill <lvsd-process>


Steps to Reproduce:
1. service pulse start
2. kill <nanny-process>
  
Actual results:
The loadbalancer stops and the backup doesn't notice the problem. 
Extract from ps:
[root@lvs2 ~]# ps -ef | egrep "piranha|pulse|lvsd|nanny"
root     15526     1  0 09:10 ?        00:00:00 pulse -v
root     15533 15526  0 09:10 ?        00:00:00 [lvsd] <defunct>
root     15579  1614  0 09:11 pts/0    00:00:00 egrep piranha|pulse|lvsd|nanny
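
The state in the ps extract above (pulse still running, with lvsd as a defunct child) can be checked for mechanically. The following is a minimal sketch, not part of piranha: the function name find_defunct_lvsd is hypothetical, and it assumes the `ps -ef` column layout shown in the extract (UID PID PPID C STIME TTY TIME CMD).

```python
def find_defunct_lvsd(ps_output):
    """Return (lvsd_pid, pulse_pid) pairs for defunct lvsd children of pulse.

    Hypothetical helper: assumes `ps -ef` style columns
    (UID PID PPID C STIME TTY TIME CMD), as in the extract above.
    """
    pulse_pids = set()
    defunct = []
    for line in ps_output.splitlines():
        fields = line.split(None, 7)
        # Skip the header line and anything that is not a process row.
        if len(fields) < 8 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        pid, ppid, cmd = int(fields[1]), int(fields[2]), fields[7].strip()
        if cmd.split()[0] == "pulse":
            pulse_pids.add(pid)
        elif cmd.startswith("[lvsd]") and cmd.endswith("<defunct>"):
            defunct.append((pid, ppid))
    # Report only zombies whose parent is a pulse process that is still listed.
    return [(pid, ppid) for pid, ppid in defunct if ppid in pulse_pids]


if __name__ == "__main__":
    import subprocess

    listing = subprocess.run(
        ["ps", "-ef"], capture_output=True, text=True, check=True
    ).stdout
    for lvsd_pid, pulse_pid in find_defunct_lvsd(listing):
        print("defunct lvsd", lvsd_pid, "under pulse", pulse_pid)
```

On an affected system, any pair reported here corresponds to the situation where the manual fix 'service pulse restart' was needed.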


Expected results:
Alternatives:
- Restarting the missing nanny. 
- Restarting the service pulse.
- Restarting the service pulse, with a timeout.
- Stopping the service pulse.


Additional info:
An extract from /var/log/messages, from killing a nanny until restart of pulse.

Comment 6 Ryan O'Hara 2011-08-11 14:59:29 UTC
With patch:

# service pulse start
# kill <nanny-pid>
# tail /var/log/messages

Aug 11 09:46:55 mobil-virt-09 nanny[14634]: Terminating due to signal 15
Aug 11 09:46:55 mobil-virt-09 lvs[14625]: nanny died! shutting down lvs
Aug 11 09:46:55 mobil-virt-09 lvs[14625]: shutting down virtual service HTTP
Aug 11 09:46:55 mobil-virt-09 nanny[14635]: Terminating due to signal 15
Aug 11 09:46:55 mobil-virt-09 nanny[14636]: Terminating due to signal 15
Aug 11 09:46:55 mobil-virt-09 pulse[14622]: Terminating due to signal 15

# service pulse start
# kill <lvsd-pid>
# tail /var/log/messages

Aug 11 09:58:09 mobil-virt-09 lvs[14675]: shutting down due to signal 15
Aug 11 09:58:09 mobil-virt-09 lvs[14675]: shutting down virtual service HTTP
Aug 11 09:58:09 mobil-virt-09 nanny[14684]: Terminating due to signal 15
Aug 11 09:58:09 mobil-virt-09 nanny[14685]: Terminating due to signal 15
Aug 11 09:58:09 mobil-virt-09 nanny[14686]: Terminating due to signal 15
Aug 11 09:58:09 mobil-virt-09 pulse[14671]: Terminating due to signal 15

Killing (with SIGTERM) either nanny or lvsd will cause all pulse/nanny/lvsd processes to exit.
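
The behaviour described above (one supervised process exiting causes the whole group to shut down) can be illustrated with a small sketch. This is not the piranha patch, which is not shown in this report; it is a generic illustration under stated assumptions. The names supervise and spawn_sleeper are hypothetical, and the sleeping child processes only stand in for nanny/lvsd.

```python
import subprocess
import sys
import time


def supervise(children, poll_interval=0.1):
    """Block until any child exits, then stop all the others.

    Returns the index of the child that was first seen to have exited.
    Illustrative only; not how pulse itself is implemented.
    """
    while True:
        for index, child in enumerate(children):
            if child.poll() is not None:  # this child has exited
                for other in children:
                    if other.poll() is None:
                        other.terminate()  # SIGTERM on POSIX
                for other in children:
                    other.wait()  # reap, so no <defunct> entries remain
                return index
        time.sleep(poll_interval)


def spawn_sleeper(seconds):
    """Start a stand-in worker process that just sleeps."""
    code = "import time; time.sleep({})".format(seconds)
    return subprocess.Popen([sys.executable, "-c", code])


if __name__ == "__main__":
    workers = [spawn_sleeper(60), spawn_sleeper(60), spawn_sleeper(60)]
    workers[1].terminate()  # simulate `kill <nanny-pid>`
    first = supervise(workers)
    print("worker", first, "exited first; all workers stopped")
```

Once the supervisor itself exits after such a shutdown, a peer that monitors it (here, the backup server) has something to react to, which is the failover effect the fix is aiming for.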

Comment 9 Eliska Slobodova 2011-10-25 09:50:41 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Prior to this update, terminating a nanny or an lvs daemon did not trigger a failover to the backup server. As a consequence, the load balancer stopped working. With this update, the pulse daemon shuts down if either the nanny daemon or the lvs daemon terminates. Now, the load balancer works as expected.

Comment 10 Henrik Johansson 2011-10-25 12:18:19 UTC
Which version of piranha has the update?

Any suggestions for restart of 'service pulse'?

Comment 11 Henrik Johansson 2011-10-25 12:20:15 UTC
Oops, piranha-0.8.5-9.el6

Comment 12 benjamin.girard 2011-12-02 17:27:11 UTC
Where can I find this update?

Comment 13 errata-xmlrpc 2011-12-06 17:57:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1716.html