Bug 391131 - pulse cannot bind to port 539 after a restart, child processes still have it open
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: piranha
Version: 5.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Marek Grac
QA Contact: Cluster QE
Depends On:
Blocks: 433473
Reported: 2007-11-19 17:27 EST by Matthew Whitehead
Modified: 2010-10-22 16:33 EDT

Doc Type: Bug Fix
Last Closed: 2009-01-20 15:54:35 EST

Attachments
Fixes part of the problem (594 bytes, patch)
2008-02-20 11:40 EST, Lon Hohberger

Description Matthew Whitehead 2007-11-19 17:27:40 EST
Description of problem: In the failover server (fos) configuration, the daemon
pulse leaves open a file descriptor for port 539 when it forks sub-processes. 

This includes ALL user specified programs started in /etc/sysconfig/ha/lvs.cf
using the 'start_cmd' directive.

Unless ALL programs terminate (including user specified ones), the next time you
start pulse on the same host, it will fail because it can't bind to the port.
All child processes of the first pulse are bound to the port, making it
unavailable to the new pulse.
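
The failure mode described above can be reproduced outside pulse with a small C sketch. It is an illustration only, not pulse's code: the function names are hypothetical, and it assumes a UDP socket bound to the wildcard address on an ephemeral port instead of port 539, so that it runs without root. A parent binds a socket, forks a child that inherits it, closes its own copy (standing in for the first pulse exiting), and then tries to bind the same port again.

```c
#include <errno.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: bind a UDP socket to the wildcard address on
 * `port` (host byte order; 0 lets the kernel pick a free port).
 * Returns the descriptor, or -1 with errno set. */
static int bind_udp(unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd == -1)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == -1) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Returns the errno of the second bind attempt (0 if it succeeded),
 * or -1 if the setup itself failed. */
int rebind_errno_with_child_alive(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int fd = bind_udp(0);
    if (fd == -1)
        return -1;
    if (getsockname(fd, (struct sockaddr *)&addr, &len) == -1) {
        close(fd);
        return -1;
    }
    pid_t child = fork();
    if (child == -1) {
        close(fd);
        return -1;
    }
    if (child == 0) {
        pause();            /* long-running child keeps the inherited socket */
        _exit(0);
    }
    close(fd);              /* the parent's copy goes away, the child's does not */
    int fd2 = bind_udp(ntohs(addr.sin_port));
    int result = (fd2 == -1) ? errno : 0;
    if (fd2 != -1)
        close(fd2);
    kill(child, SIGKILL);   /* clean up the child */
    waitpid(child, NULL, 0);
    return result;
}
```

On Linux the second bind is expected to fail with EADDRINUSE for as long as the child holds the inherited descriptor.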

Version-Release number of selected component (if applicable):


How reproducible: 100%


Steps to Reproduce:
1. Configure /etc/sysconfig/ha/lvs.cf so that start_cmd calls a program that
never exits (e.g. "while [ 1 ]; do sleep 10 ; done ; ")
2. /etc/init.d/pulse stop # fails over to the second node
3. /etc/init.d/pulse start
  
Actual results:


Expected results:


Additional info:

While fos, nanny, and pulse may need port 539 open, pulse should close the
descriptor before it calls "start_cmd".
Comment 1 Nate Straz 2007-12-13 12:30:57 EST
Moving all RHCS ver 5 bugs to RHEL 5 so we can remove RHCS v5 which never existed.
Comment 2 Marek Grac 2008-02-20 09:57:54 EST
Problem confirmed. However, the main problem appears to be in stopping the
applications, because pulse/fos waits until every stop_cmd has finished, and as
long as 'fos' is running you can't start a second instance. The solution you
proposed covers part of the problem; I will try to put together an acceptable
solution.
Comment 3 Lon Hohberger 2008-02-20 11:40:09 EST
Created attachment 295430 [details]
Fixes part of the problem

This would fix the port being bound in child processes, but... the child
processes *should not* be left around after 'service pulse stop' has completed!
Comment 4 Marek Grac 2008-04-04 06:06:35 EDT
In Lon's patch, F_[GS]ETFL has to be changed to F_[GS]ETFD. The patch will be
in CVS (after the 5.3 release).
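
For context on that correction: FD_CLOEXEC is a file descriptor flag, so it is read and written with F_GETFD and F_SETFD, whereas F_GETFL and F_SETFL operate on file status flags such as O_NONBLOCK. A minimal sketch of the usual idiom follows. It is not the code from the attachment, and the helper name set_cloexec is hypothetical.

```c
#include <fcntl.h>

/* Hypothetical helper: mark `fd` close-on-exec so that programs
 * started with exec() do not inherit it.
 * Returns -1 on error (errno set by fcntl). */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);   /* descriptor flags, not status flags */
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}
```

Calling such a helper once on the listening socket, right after it is created, would cover every later exec without each fork site having to close the descriptor by hand.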
Comment 9 errata-xmlrpc 2009-01-20 15:54:35 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0095.html
