Bug 391131

Summary: pulse cannot bind to port 539 after a restart, child processes still have it open
Product: Red Hat Enterprise Linux 5 Reporter: Matthew Whitehead <mwhitehe>
Component: piranha Assignee: Marek Grac <mgrac>
Status: CLOSED ERRATA QA Contact: Cluster QE <mspqa-list>
Severity: medium Docs Contact:
Priority: medium    
Version: 5.0 CC: cluster-maint, james.brown, tao
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-01-20 20:54:35 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 433473    
Attachments:
Description: Fixes part of the problem
Flags: none

Description Matthew Whitehead 2007-11-19 22:27:40 UTC
Description of problem: In the failover server (fos) configuration, the daemon
pulse leaves open a file descriptor for port 539 when it forks sub-processes. 

This includes ALL user specified programs started in /etc/sysconfig/ha/lvs.cf
using the 'start_cmd' directive.

Unless ALL programs terminate (including user specified ones), the next time you
start pulse on the same host, it will fail because it can't bind to the port.
All child processes of the first pulse are bound to the port, making it
unavailable to the new pulse.

Version-Release number of selected component (if applicable):


How reproducible: 100%


Steps to Reproduce:
1. Configure /etc/sysconfig/ha/lvs.cf to have start_cmd call a program that
never ends (e.g. "while [ 1 ]; do sleep 10 ; done ; ")
2. /etc/init.d/pulse stop # fails over to the second node
3. /etc/init.d/pulse start
  
Actual results:


Expected results:


Additional info:

While fos, nanny, and pulse may need port 539 open, pulse should close the
descriptor before it calls "start_cmd".
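
A minimal sketch of the behaviour requested above, not taken from the pulse
sources: the child closes its inherited copy of the listening descriptor
between fork() and exec, so the user command never holds port 539. The helper
name run_without_fd and its parameters are hypothetical. It waits for the
command only to keep the sketch easy to check; a daemon running a long-lived
start_cmd would not block here.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): run `cmd` through /bin/sh in a
 * child process that does not inherit `listen_fd`.
 * Returns the command's exit status, or -1 on failure.
 */
int run_without_fd(int listen_fd, const char *cmd)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: drop the inherited copy. The parent's descriptor is unaffected. */
        close(listen_fd);
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127); /* reached only if exec failed */
    }

    /* Parent: collect the child's exit status. */
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The parent keeps its own descriptor open, so it can go on listening while the
command runs without a reference to the socket.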

Comment 1 Nate Straz 2007-12-13 17:30:57 UTC
Moving all RHCS ver 5 bugs to RHEL 5 so we can remove RHCS v5 which never existed.

Comment 2 Marek Grac 2008-02-20 14:57:54 UTC
Problem confirmed. But it looks like the main problem is in stopping the
applications, since pulse/fos waits until every stop_cmd finishes. And as long
as 'fos' is running, you can't run a second instance. The solution you proposed
covers part of the problem - I will try to put together an acceptable solution.

Comment 3 Lon Hohberger 2008-02-20 16:40:09 UTC
Created attachment 295430 [details]
Fixes part of the problem

This would fix the port being bound in child processes, but... the child
processes *should not* be left around after 'service pulse stop' has completed!

Comment 4 Marek Grac 2008-04-04 10:06:35 UTC
In Lon's patch you have to change F_[GS]ETFL to F_[GS]ETFD. The patch will be
in CVS (after the 5.3 release).
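
For context on the correction above: FD_CLOEXEC is a file descriptor flag, so
it is read and written with F_GETFD/F_SETFD. F_GETFL/F_SETFL operate on the
file status flags (O_NONBLOCK, O_APPEND and so on) and do not change
close-on-exec behaviour. The following is a sketch of the usual idiom, not the
contents of the attached patch; the function name set_cloexec is hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): mark `fd` close-on-exec so that
 * programs started with exec*() do not inherit it.
 * Returns 0 on success, -1 on error (errno is set by fcntl).
 */
int set_cloexec(int fd)
{
    /* Read the descriptor flags, not the file status flags. */
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;

    /* Preserve any existing descriptor flags and add FD_CLOEXEC. */
    if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1)
        return -1;
    return 0;
}
```

Calling this once on the listening socket, right after it is created, would
cover every later fork/exec without an explicit close() in each child.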

Comment 9 errata-xmlrpc 2009-01-20 20:54:35 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0095.html