Bug 1483673 - Memory leak in L3 HA agent keepalived process
Summary: Memory leak in L3 HA agent keepalived process
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 7.0 (Kilo)
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: zstream
Target Release: 7.0 (Kilo)
Assignee: anil venkata
QA Contact: Alexander Stafeyev
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-08-21 17:12 UTC by Jaison Raju
Modified: 2020-12-14 09:37 UTC (History)

Fixed In Version: openstack-neutron-2015.1.4-22.el7ost
Doc Type: Bug Fix
Doc Text:
Due to a memory leak, keepalived consumed a large amount of memory, and the growth continued even after its existing processes were killed and respawned. The L3 agent would also repeatedly update the keepalived.conf file with duplicate routes. This fix resolves these issues by persisting keepalived's virtual routes separately according to their role, which prevents the duplicate entries and resolves the memory leak.
Clone Of:
Environment:
Last Closed: 2017-10-25 17:05:26 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 3161881 0 None None None 2017-08-24 04:30:22 UTC
Red Hat Product Errata RHBA-2017:3069 0 normal SHIPPED_LIVE openstack-neutron bug fix advisory 2017-10-25 21:02:55 UTC

Description Jaison Raju 2017-08-21 17:12:46 UTC
Description of problem:
One keepalived process of the L3 HA agent consumes system memory continuously and has reached 200 GB.
The keepalived process can be killed and is then respawned, but memory consumption keeps rising.
# cat keepalived.conf | wc -l
55705

Currently, one line is appended to keepalived.conf every 40 seconds, and memory usage increases by 8040 KB over the same interval.
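
A quick way to check whether the growth in keepalived.conf comes from repeated entries is to count identical lines. The following is only a minimal sketch: the helper name `repeated_lines` and the layout of the sample text are assumptions for illustration, and the only value taken from this report is the example route quoted in comment 5.

```python
from collections import Counter


def repeated_lines(conf_text, min_count=2):
    """Return {line: count} for non-blank lines seen at least min_count times.

    Hypothetical helper for illustration; it is not part of neutron or keepalived.
    """
    counts = Counter(
        line.strip()
        for line in conf_text.splitlines()
        # Skip blank lines and bare braces, which repeat legitimately.
        if line.strip() and line.strip() not in ("{", "}")
    )
    return {line: n for line, n in counts.items() if n >= min_count}


# Assumed sample layout; a real keepalived.conf contains more sections.
sample = """\
virtual_routes {
    0.0.0.0/0 via 210.93.170.123
    0.0.0.0/0 via 210.93.170.123
    0.0.0.0/0 via 210.93.170.123
}
"""

print(repeated_lines(sample))
```

On a real system the text would be read from the router's keepalived.conf instead of the inline sample.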


Version-Release number of selected component (if applicable):
RHOS7

How reproducible:
Not reproducible locally; seen only in the customer environment.

Steps to Reproduce:
1.
2.
3.

Actual results:
One line is appended to keepalived.conf every 40 seconds, and memory usage increases by 8040 KB over the same interval.

Expected results:
keepalived does not exhibit any memory leak behavior.

Additional info:

Comment 3 Assaf Muller 2017-08-23 13:22:35 UTC
The existing sosreports L3 agent logs only show INFO+. Is it possible to flip to DEBUG, reproduce, and upload a new sosreport?

Comment 5 anil venkata 2017-08-24 12:04:10 UTC
_process_router_if_compatible() will repeatedly retry processing the router if it hits any exception while processing it. In that case, the existing code below simply appends the route (for example, 0.0.0.0/0 via 210.93.170.123) again, even though it was already added in a previous iteration.

    def routes_updated(self):
        new_routes = self.router['routes']

        instance = self._get_keepalived_instance()

        # Filter out all of the old routes while keeping only the default route
        default_gw = (n_consts.IPv6_ANY, n_consts.IPv4_ANY)
        instance.virtual_routes = [route for route in instance.virtual_routes
                                   if route.destination in default_gw]
        for route in new_routes:
            instance.virtual_routes.append(keepalived.KeepalivedVirtualRoute(
                route['destination'],
                route['nexthop']))

        self.routes = new_routes

We already have a fix for this https://github.com/openstack/neutron/commit/e214b56da9205be7ba927142cc92e4f69ad09b01 in OSP8 and later branches. We need to backport that to OSP7 also.
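
To illustrate the accumulation, here is a minimal standalone sketch. It assumes the router's extra routes include a default-destination entry, as the example route above suggests. The classes `Route`, `FlatInstance`, and `SplitInstance` and both functions are hypothetical stand-ins, not neutron's actual code; the second variant only shows the general idea of keeping routes in separate lists by role, as described in the Doc Text.

```python
IPV4_ANY = "0.0.0.0/0"
IPV6_ANY = "::/0"
DEFAULT_GW = (IPV6_ANY, IPV4_ANY)


class Route:
    """Hypothetical stand-in for keepalived.KeepalivedVirtualRoute."""

    def __init__(self, destination, nexthop):
        self.destination = destination
        self.nexthop = nexthop


class FlatInstance:
    """Stand-in instance that keeps every route in one list."""

    def __init__(self):
        self.virtual_routes = []


def routes_updated_flat(instance, new_routes):
    # Same shape as the quoted code: default routes survive the filter,
    # so a default-destination extra route is appended again on each call.
    instance.virtual_routes = [
        r for r in instance.virtual_routes if r.destination in DEFAULT_GW
    ]
    for route in new_routes:
        instance.virtual_routes.append(
            Route(route["destination"], route["nexthop"])
        )


class SplitInstance:
    """Stand-in instance that stores routes in separate lists by role."""

    def __init__(self):
        self.gateway_routes = []
        self.extra_routes = []

    @property
    def virtual_routes(self):
        return self.gateway_routes + self.extra_routes


def routes_updated_split(instance, new_routes):
    # Replacing the whole extra-route list makes the update idempotent.
    instance.extra_routes = [
        Route(route["destination"], route["nexthop"]) for route in new_routes
    ]


router_routes = [{"destination": "0.0.0.0/0", "nexthop": "210.93.170.123"}]
flat, split = FlatInstance(), SplitInstance()
for _ in range(3):  # simulate three retries of router processing
    routes_updated_flat(flat, router_routes)
    routes_updated_split(split, router_routes)

print(len(flat.virtual_routes), len(split.virtual_routes))
```

With the single flat list, every retry adds one more copy of the route; with per-role lists, repeated calls leave exactly one entry.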

As Assaf requested, logs can help us confirm the issue.

Comment 6 Ihar Hrachyshka 2017-08-24 13:17:57 UTC
Great. It would make sense to also determine which exception happens. Wouldn't we see it in sos-report logs?

Comment 7 anil venkata 2017-08-24 13:45:36 UTC
No, I am unable to find the router that is causing the issue in the l3 agent and neutron server logs. keepalived.conf[1] has the 'qg-cc9f8bec-e8' and 'qr-48109b39-be' interfaces, but the corresponding ports are not seen in the l3 agent and neutron server logs (these ports do appear in the l2 agent log, but that does not help here).

[1] http://collab-shell.usersys.redhat.com/01911336/x-text/keepalived.conf

Comment 8 anil venkata 2017-08-30 06:47:24 UTC
I need to release the build, can someone set the proper flags?

Comment 9 anil venkata 2017-08-30 12:09:05 UTC
Build openstack-neutron-2015.1.4-22.el7ost has the fix, which prevents the l3 agent from repeatedly adding the same route to keepalived.conf.

Comment 16 Toni Freger 2017-10-05 06:46:35 UTC
The fix exists within openstack-neutron-2015.1.4-23.el7ost.noarch

Comment 18 errata-xmlrpc 2017-10-25 17:05:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3069

