Description of problem:
=======================
Bug 1325861 (launchpad #1565511) aims to solve cases where the lbaas agent goes offline. To have a complete high-availability solution for the lbaas agent with haproxy running in a namespace, we also want to handle the case where the haproxy process itself has stopped. This neutron spec [1] proposes the following approach: "We propose monitoring those processes, and taking a configurable action, making neutron more resilient to external failures."

[1] http://specs.openstack.org/openstack/neutron-specs/specs/juno/agent-child-processes-status.html
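The spec's "monitor and take a configurable action" idea can be sketched as a small monitor that checks whether the child process is still alive and respawns it when it is not. This is a minimal illustration, not neutron's actual implementation; the `ProcessMonitor` class, its `action` values, and the 30-second default interval here are assumptions for the sketch.

```python
import os
import subprocess


def is_alive(pid):
    """Return True if a process with the given PID exists."""
    try:
        os.kill(pid, 0)  # signal 0 performs an existence check only
        return True
    except OSError:
        return False


class ProcessMonitor:
    """Hypothetical monitor: periodically checks a child process and
    takes a configurable action ('respawn' or 'log') when it dies."""

    def __init__(self, spawn_cmd, action='respawn', interval=30):
        self.spawn_cmd = spawn_cmd  # e.g. ['haproxy', '-f', conf_path]
        self.action = action        # configurable action from the spec
        self.interval = interval    # seconds between checks
        self.proc = None

    def start(self):
        self.proc = subprocess.Popen(self.spawn_cmd)

    def check_once(self):
        """One monitoring pass; returns True if the child is healthy,
        otherwise applies the configured action and returns False."""
        if self.proc is not None and self.proc.poll() is None:
            return True
        if self.action == 'respawn':
            self.start()
        return False
```

In a real agent the check would run on a timer (every `interval` seconds) inside the agent's event loop, and `spawn_cmd` would re-exec haproxy inside the correct namespace.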
*** Bug 1269981 has been marked as a duplicate of this bug. ***
Both patches appear to be merged.
https://review.openstack.org/#/c/327966/ was reverted, adding https://review.openstack.org/#/c/344658/ as an external tracker.
Patch has been merged upstream.
How to test:
============
1. Create a loadbalancer.
2. Create a listener.
3. Create a pool and members.
4. Verify loadbalancing functionality.
5. Kill the haproxy process.
6. Wait ~30 sec and see if it respawns.
7. Redo step #4.
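Steps 5-6 above (kill haproxy, wait for it to respawn) can be automated with a small polling helper. This is a sketch only; `get_pid` is a hypothetical callable (e.g. one that reads the haproxy pid file in the loadbalancer's namespace), and the timeout values are assumptions.

```python
import time


def wait_for_respawn(get_pid, old_pid, timeout=60, poll=2):
    """Poll `get_pid` until it returns a PID different from `old_pid`
    (i.e. the process was respawned), or raise after `timeout` seconds.

    get_pid: callable returning the current PID, or None if no process
             is running yet.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        pid = get_pid()
        if pid is not None and pid != old_pid:
            return pid
        time.sleep(poll)
    raise TimeoutError('process was not respawned within %ss' % timeout)
```

A test would record the haproxy PID, kill it, call `wait_for_respawn`, and then rerun the step #4 traffic check against the new process.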
https://review.openstack.org/#/c/344658/21/neutron_lbaas/drivers/haproxy/namespace_driver.py@379 matches what I see in my deployment. Verifying. Thanks.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1245