Prior to this update, the L3 agent failed to respawn the keepalived process if the keepalived parent process died, because the child keepalived process was still running.
Consequently, the L3 agent could not recover from the death of the keepalived parent process, which broke the HA router served by that process.
With this update, the L3 agent is aware of the child keepalived process and cleans it up as well before respawning keepalived.
As a result, the L3 agent can now recover HA routers when the keepalived process dies.
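The cleanup-then-respawn behaviour described above can be illustrated with a minimal sketch. This is not neutron's actual code: the helper names (`read_pid`, `is_alive`, `kill_leftovers`, `respawn`), the use of PID files to locate the parent and child, and the choice of SIGKILL are all assumptions made for illustration.

```python
import errno
import os
import signal


def read_pid(pid_file):
    """Return the positive PID stored in pid_file, or None if unusable."""
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (IOError, OSError, ValueError):
        return None
    return pid if pid > 0 else None


def is_alive(pid):
    """Return True if a process with this PID exists."""
    try:
        os.kill(pid, 0)  # signal 0 performs only the existence check
    except OSError as e:
        # EPERM means the process exists but belongs to another user.
        return e.errno == errno.EPERM
    return True


def kill_leftovers(pid_files, sig=signal.SIGKILL):
    """Signal every live process named in pid_files; return their PIDs."""
    killed = []
    for pid_file in pid_files:
        pid = read_pid(pid_file)
        if pid is not None and is_alive(pid):
            os.kill(pid, sig)
            killed.append(pid)
    return killed


def respawn(parent_pid_file, child_pid_file, spawn):
    """Respawn only if the parent is gone, killing a stale child first.

    `spawn` is a hypothetical callable that starts the daemon.
    Returns True if a respawn was attempted, False otherwise.
    """
    parent_pid = read_pid(parent_pid_file)
    if parent_pid is not None and is_alive(parent_pid):
        return False  # parent still running, nothing to do
    # The parent is dead; an orphaned child would block a clean restart.
    kill_leftovers([child_pid_file])
    spawn()
    return True
```

The point of the sketch is the ordering: the orphaned child is removed before the new parent is started, so the fresh instance does not collide with a leftover one.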
Description of problem: keepalived fails to respawn after crash when running OSP8 (Liberty) neutron.
Version-Release number of selected component (if applicable): neutron 8.0
How reproducible: always.
First, OSP8-based steps to reproduce:
1. Set up an OSP8 system.
2. Run the test_keepalived_respawns functional test for OSP8 neutron.
3. Observe the following failure.
==============================
Failed 1 tests - output below:
==============================
neutron.tests.functional.agent.linux.test_keepalived.KeepalivedManagerTestCase.test_keepalived_respawns
-------------------------------------------------------------------------------------------------------
Captured traceback:
~~~~~~~~~~~~~~~~~~~
Traceback (most recent call last):
  File "neutron/tests/functional/agent/linux/test_keepalived.py", line 73, in test_keepalived_respawns
    exception=RuntimeError(_("Keepalived didn't respawn")))
  File "neutron/agent/linux/utils.py", line 339, in wait_until_true
    eventlet.sleep(sleep)
  File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 34, in sleep
    hub.switch()
  File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 294, in switch
    return self.greenlet.switch()
RuntimeError: Keepalived didn't respawn
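
For readers unfamiliar with the pattern in the traceback, the test polls a condition until a timeout expires and then raises the supplied exception. A simplified stand-in is sketched below. It uses `time.sleep` rather than eventlet, and its signature and defaults are assumptions, not a copy of the helper in neutron/agent/linux/utils.py.

```python
import time


def wait_until_true(predicate, timeout=60, sleep=1, exception=None):
    """Poll predicate() until it returns True or timeout seconds pass.

    On timeout, raise the supplied exception, or a generic RuntimeError
    if none was given.
    """
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise exception or RuntimeError("Timed out waiting for condition")
        time.sleep(sleep)
```

In the failing test, the predicate never becomes true because keepalived is not respawned, so the wait ends with the "Keepalived didn't respawn" error.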
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHEA-2016-0603.html