Bug 1287736

Summary: L3 agent fails to respawn keepalived process
Product: Red Hat OpenStack Reporter: Arie Bregman <abregman>
Component: openstack-neutron Assignee: Ihar Hrachyshka <ihrachys>
Status: CLOSED ERRATA QA Contact: Alexander Stafeyev <astafeye>
Severity: unspecified Docs Contact:
Priority: high    
Version: 8.0 (Liberty) CC: amuller, bperkins, chrisw, ihrachys, mlopes, nyechiel, sclewis, tfreger, yeylon
Target Milestone: ga   
Target Release: 8.0 (Liberty)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-neutron-7.0.1-2.el7ost Doc Type: Bug Fix
Doc Text:
Prior to this update, the L3 agent failed to respawn the keepalived process if the keepalived parent process died, because the child keepalived process was still running. Consequently, the L3 agent could not recover from the death of the keepalived parent process, which broke the HA router served by that process. With this update, the L3 agent is aware of the child keepalived process and cleans it up as well before respawning keepalived. As a result, the L3 agent can now recover HA routers when the keepalived process dies.
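The fix described above (clean up the surviving child, then respawn) could be sketched roughly as follows. This is a minimal illustration, not the neutron patch: it assumes the child's PID is recorded in a PID file, and the names `cleanup_then_respawn`, `read_pid`, `pid_is_alive`, `child_pid_file` and `spawn` are all hypothetical.

```python
import os
import signal


def pid_is_alive(pid):
    """Return True if a process with this PID exists (signal 0 only probes)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def read_pid(pid_file):
    """Return the PID stored in pid_file, or None if it is missing or invalid."""
    try:
        with open(pid_file) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def cleanup_then_respawn(child_pid_file, spawn):
    """Terminate a leftover child process, then start a fresh parent.

    child_pid_file: hypothetical path where the child records its PID.
    spawn: hypothetical callable that starts the new parent process.
    """
    pid = read_pid(child_pid_file)
    if pid is not None and pid_is_alive(pid):
        # Without this step the stale child keeps running and the
        # respawn cannot take over, which is the failure reported here.
        os.kill(pid, signal.SIGTERM)
    return spawn()
```

The point of the sketch is only the ordering: the stale child is signalled before the new parent is started.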
Story Points: ---
Clone Of:
: 1293339 (view as bug list) Environment:
Last Closed: 2016-04-07 21:15:19 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1293339    

Description Arie Bregman 2015-12-02 14:52:27 UTC
Description of problem: keepalived fails to respawn after a crash when running OSP8 (Liberty) neutron.

Version-Release number of selected component (if applicable): neutron 8.0

How reproducible: always.

Steps to reproduce (OSP8-based):
1. Set up an OSP8 system.
2. Run the test_keepalived_respawns functional test for OSP8 neutron.
3. Observe the following failure.

==============================
Failed 1 tests - output below:
==============================

neutron.tests.functional.agent.linux.test_keepalived.KeepalivedManagerTestCase.test_keepalived_respawns
-------------------------------------------------------------------------------------------------------

Captured traceback:
~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):
      File "neutron/tests/functional/agent/linux/test_keepalived.py", line 73, in test_keepalived_respawns
        exception=RuntimeError(_("Keepalived didn't respawn")))
      File "neutron/agent/linux/utils.py", line 339, in wait_until_true
        eventlet.sleep(sleep)
      File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 34, in sleep
        hub.switch()
      File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 294, in switch
        return self.greenlet.switch()
    RuntimeError: Keepalived didn't respawn
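
For readers unfamiliar with the pattern in the traceback: the test polls a condition and raises the supplied exception once a timeout expires. A simplified stand-in using only the standard library might look like the sketch below. It is not neutron's `wait_until_true` (which, per the traceback, sleeps via eventlet); the parameter names and defaults here are assumptions.

```python
import time


def wait_until_true(predicate, timeout=60, sleep=1, exception=None):
    """Poll predicate() until it returns True or timeout seconds elapse.

    Simplified stand-in for illustration; parameter names and defaults
    are assumptions, not neutron's actual signature.
    """
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            if exception is not None:
                raise exception
            raise TimeoutError("condition not met within %s seconds" % timeout)
        time.sleep(sleep)
```

With a predicate that checks whether the process is running again, a call such as `wait_until_true(is_running, exception=RuntimeError("Keepalived didn't respawn"))` fails in the way shown above when the respawn never happens (`is_running` is a hypothetical callable).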

Comment 2 Assaf Muller 2015-12-02 17:40:57 UTC
Adding upstream patch Arie sent. We'll need to backport it, so also adding dev_ack.

Comment 8 errata-xmlrpc 2016-04-07 21:15:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0603.html