Red Hat Bugzilla – Bug 391131
pulse cannot bind to port 539 after a restart, child processes still have it open
Last modified: 2010-10-22 16:33:14 EDT
Description of problem: In the failover server (fos) configuration, the daemon
pulse leaves open a file descriptor for port 539 when it forks sub-processes.
This includes ALL user specified programs started in /etc/sysconfig/ha/lvs.cf
using the 'start_cmd' directive.
Unless ALL programs terminate (including user specified ones), the next time you
start pulse on the same host, it will fail because it can't bind to the port.
All child processes of the first pulse are bound to the port, making it
unavailable to the new pulse.
Version-Release number of selected component (if applicable):
How reproducible: 100%
Steps to Reproduce:
1. Configure /etc/sysconfig/ha/lvs.cf to have start_cmd call a program that
never ends (e.g. "while [ 1 ]; do sleep 10; done")
2. /etc/init.d/pulse stop # fails over to the second node
3. /etc/init.d/pulse start
While fos, nanny, and pulse may need port 539 open, pulse should close the
descriptor before it calls "start_cmd".
Moving all RHCS ver 5 bugs to RHEL 5 so we can remove RHCS v5 which never existed.
Problem confirmed. However, the main problem appears to be in stopping the
applications, since pulse/fos waits until every stop_cmd finishes, and as long
as 'fos' is running you can't start a second instance. The solution you proposed
covers part of the problem. I will try to put together an acceptable solution.
Created attachment 295430
Fixes part of the problem
This would fix the port being bound in child processes, but... the child
processes *should not* be left around after 'service pulse stop' has completed!
In Lon's patch you have to change F_[GS]ETFL to F_[GS]ETFD. The patch will be in
CVS (after the 5.3 release).
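For context on the F_[GS]ETFL vs. F_[GS]ETFD correction: close-on-exec (FD_CLOEXEC) is a file descriptor flag, so it is read and written with F_GETFD/F_SETFD, while F_GETFL/F_SETFL operate on file status flags such as O_NONBLOCK. The following is a minimal sketch of the corrected call pattern, not the contents of the attached patch; the helper name set_cloexec is hypothetical.

```c
#include <fcntl.h>

/*
 * Hypothetical helper: mark fd close-on-exec so that programs started
 * via exec*() do not inherit it.
 * Returns 0 on success, -1 on error (errno is set by fcntl()).
 */
int set_cloexec(int fd)
{
    /* F_GETFD returns the descriptor flags (not the F_GETFL status flags). */
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1)
        return -1;
    return 0;
}
```

Calling this on the listening socket right after it is created would mean any later exec of a start_cmd program closes the descriptor automatically, without each fork site having to remember it.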
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.