Bug 1483673

Summary: Memory leak in L3 HA agent keepalived process
Product: Red Hat OpenStack
Reporter: Jaison Raju <jraju>
Component: openstack-neutron
Assignee: anil venkata <vkommadi>
Status: CLOSED ERRATA
QA Contact: Alexander Stafeyev <astafeye>
Severity: high
Docs Contact:
Priority: high
Version: 7.0 (Kilo)
CC: amuller, bperkins, chrisw, dmacpher, ihrachys, jraju, mschuppe, nyechiel, srevivo, tfreger, vkommadi
Target Milestone: zstream
Keywords: Triaged, Unconfirmed, ZStream
Target Release: 7.0 (Kilo)
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: openstack-neutron-2015.1.4-22.el7ost
Doc Type: Bug Fix
Doc Text:
Due to a memory leak, keepalived consumed a large amount of memory, even after its existing processes were killed. The L3 agent would also repeatedly update the keepalived.conf file with duplicate routes. This fix separates the persistence of keepalived's virtual routes according to their role, which stops the duplicate entries and resolves the memory leak.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-10-25 17:05:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Jaison Raju 2017-08-21 17:12:46 UTC
Description of problem:
One keepalived process of the L3 HA agent consumes system memory continuously and has reached 200 GB.
The keepalived process can be killed and is respawned, but memory consumption keeps rising.
# cat keepalived.conf | wc -l
55705

Currently, one line is appended to keepalived.conf every 40 seconds and memory usage increases by 8040 KB.
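
A quick way to check whether a growing keepalived.conf consists of repeated entries is to count identical lines. The following is a minimal sketch only: the helper name `duplicate_lines` is hypothetical, and the sample text is an illustrative excerpt, not taken from the customer's file.

```python
from collections import Counter


def duplicate_lines(conf_text):
    """Return {line: count} for non-blank lines that occur more than once."""
    counts = Counter(
        line.strip() for line in conf_text.splitlines() if line.strip()
    )
    return {line: n for line, n in counts.items() if n > 1}


# Illustrative excerpt (assumption): the same route entry written three times.
sample = """\
virtual_routes {
    0.0.0.0/0 via 210.93.170.123 dev qg-cc9f8bec-e8
    0.0.0.0/0 via 210.93.170.123 dev qg-cc9f8bec-e8
    0.0.0.0/0 via 210.93.170.123 dev qg-cc9f8bec-e8
}
"""

dupes = duplicate_lines(sample)
for line, n in dupes.items():
    print(n, line)
```

On a real system, the text would be read from the router's keepalived.conf instead of the inline sample.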


Version-Release number of selected component (if applicable):
RHOS7

How reproducible:
Not reproducible. Seen only in the customer environment.

Steps to Reproduce:
1.
2.
3.

Actual results:
Currently, one line is appended to keepalived.conf every 40 seconds and memory usage increases by 8040 KB.

Expected results:
keepalived does not exhibit any memory leak behavior.

Additional info:

Comment 3 Assaf Muller 2017-08-23 13:22:35 UTC
The L3 agent logs in the existing sosreports only show INFO and above. Is it possible to flip to DEBUG, reproduce, and upload a new sosreport?

Comment 5 anil venkata 2017-08-24 12:04:10 UTC
_process_router_if_compatible() will repeatedly retry processing the router if it hits any exception while processing it. In that case, the existing code below appends the route (for example, 0.0.0.0/0 via 210.93.170.123) again, even though it was already added in a previous iteration.

    def routes_updated(self):
        new_routes = self.router['routes']

        instance = self._get_keepalived_instance()

        # Filter out all of the old routes while keeping only the default route
        default_gw = (n_consts.IPv6_ANY, n_consts.IPv4_ANY)
        instance.virtual_routes = [route for route in instance.virtual_routes
                                   if route.destination in default_gw]
        for route in new_routes:
            instance.virtual_routes.append(keepalived.KeepalivedVirtualRoute(
                route['destination'],
                route['nexthop']))

        self.routes = new_routes

We already have a fix for this, https://github.com/openstack/neutron/commit/e214b56da9205be7ba927142cc92e4f69ad09b01, in OSP8 and later branches. It needs to be backported to OSP7 as well.
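
The duplication can be illustrated with a small self-contained sketch. It assumes the extra route has a default destination, as in the 0.0.0.0/0 example above, so the default-route filter keeps the previously appended copy. The names `Route`, `RoutesByRole`, `routes_updated_single_list` and `routes_updated_by_role` are hypothetical stand-ins, not neutron's actual classes, and the per-role variant only approximates the idea behind the fix.

```python
# Illustrative stand-ins; these are not neutron's actual classes or constants.
IPV4_ANY = "0.0.0.0/0"
IPV6_ANY = "::/0"


class Route:
    def __init__(self, destination, nexthop):
        self.destination = destination
        self.nexthop = nexthop


def routes_updated_single_list(virtual_routes, new_routes):
    """Mirror the quoted logic: keep default routes, then append new ones."""
    default_gw = (IPV6_ANY, IPV4_ANY)
    kept = [r for r in virtual_routes if r.destination in default_gw]
    for r in new_routes:
        kept.append(Route(r["destination"], r["nexthop"]))
    return kept


class RoutesByRole:
    """Hypothetical container that stores routes separately per role."""

    def __init__(self):
        self.gateway_routes = []
        self.extra_routes = []

    @property
    def routes(self):
        return self.gateway_routes + self.extra_routes


def routes_updated_by_role(store, new_routes):
    # Replace the extra-route list instead of appending to a shared list.
    store.extra_routes = [
        Route(r["destination"], r["nexthop"]) for r in new_routes
    ]


new_routes = [{"destination": "0.0.0.0/0", "nexthop": "210.93.170.123"}]

single = []
store = RoutesByRole()
for _ in range(3):  # three retries of the same router update
    single = routes_updated_single_list(single, new_routes)
    routes_updated_by_role(store, new_routes)

print(len(single), len(store.routes))  # prints "3 1"
```

With a single shared list, each retry adds another copy of the route. Replacing a per-role list makes the update idempotent.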

As Assaf requested, logs can help us confirm the issue.

Comment 6 Ihar Hrachyshka 2017-08-24 13:17:57 UTC
Great. It would make sense to also determine which exception happens. Wouldn't we see it in sos-report logs?

Comment 7 anil venkata 2017-08-24 13:45:36 UTC
No, I am unable to find the router that is causing the issue in the L3 agent and neutron server logs. keepalived.conf [1] has the 'qg-cc9f8bec-e8' and 'qr-48109b39-be' interfaces, but the corresponding ports are not seen in the L3 agent and neutron server logs (these ports do appear in the L2 agent log, but those entries are not useful here).

[1] http://collab-shell.usersys.redhat.com/01911336/x-text/keepalived.conf

Comment 8 anil venkata 2017-08-30 06:47:24 UTC
I need to release the build, can someone set the proper flags?

Comment 9 anil venkata 2017-08-30 12:09:05 UTC
Build openstack-neutron-2015.1.4-22.el7ost has the fix, which prevents the L3 agent from repeatedly adding the same route to keepalived.conf.

Comment 16 Toni Freger 2017-10-05 06:46:35 UTC
The fix exists within openstack-neutron-2015.1.4-23.el7ost.noarch

Comment 18 errata-xmlrpc 2017-10-25 17:05:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3069