Description of problem:
One keepalived process of the L3 HA agent consumes system memory continuously and has reached 200 GB. The keepalived process can be killed and gets respawned, but memory consumption keeps rising.

# cat keepalived.conf | wc -l
55705

Currently one line is appended every 40 seconds and memory usage increases by 8040 KB.

Version-Release number of selected component (if applicable):
RHOS7

How reproducible:
No. Only on customer environment.

Steps to Reproduce:
1.
2.
3.

Actual results:
Currently one line is appended every 40 seconds and memory usage increases by 8040 KB.

Expected results:
keepalived does not exhibit any memory leak behavior.

Additional info:
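For scale, a rough back-of-the-envelope calculation of the reported growth rate. This is only a sketch: it assumes the 8040 KB increase happens once per 40-second interval, that the rate is constant, and that KB means 1024 bytes. None of that is confirmed by the report.

```python
# Reported figures from the description above.
INTERVAL_SECONDS = 40      # one line appended to keepalived.conf per interval
GROWTH_KB = 8040           # memory growth per interval (assumed reading of the report)

SECONDS_PER_DAY = 24 * 60 * 60

# Number of appended lines (and growth steps) per day.
lines_per_day = SECONDS_PER_DAY // INTERVAL_SECONDS

# Total memory growth per day, in KB and in GiB (assuming 1 KB = 1024 bytes).
kb_per_day = lines_per_day * GROWTH_KB
gib_per_day = kb_per_day / (1024 * 1024)

if __name__ == "__main__":
    print(f"lines appended per day: {lines_per_day}")
    print(f"memory growth per day:  {gib_per_day:.1f} GiB")
```

At that rate a respawned process would climb back toward the reported size within days, which fits the observation that killing keepalived does not help.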
The existing sosreports L3 agent logs only show INFO+. Is it possible to flip to DEBUG, reproduce, and upload a new sosreport?
_process_router_if_compatible() will repeatedly try to process the router if it gets any exception while processing it. In that case, the existing code below appends the route (for example, 0.0.0.0/0 via 210.93.170.123) again, even though it was already added in a previous iteration.

    def routes_updated(self):
        new_routes = self.router['routes']

        instance = self._get_keepalived_instance()

        # Filter out all of the old routes while keeping only the default route
        default_gw = (n_consts.IPv6_ANY, n_consts.IPv4_ANY)
        instance.virtual_routes = [route for route in instance.virtual_routes
                                   if route.destination in default_gw]
        for route in new_routes:
            instance.virtual_routes.append(keepalived.KeepalivedVirtualRoute(
                route['destination'],
                route['nexthop']))

        self.routes = new_routes

We already have a fix for this in OSP8 and later branches:
https://github.com/openstack/neutron/commit/e214b56da9205be7ba927142cc92e4f69ad09b01

We need to backport that to OSP7 as well. As Assaf requested, logs can help us confirm the issue.
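To illustrate the growth, here is a self-contained simulation of the retry loop. Everything in it is a stand-in: VirtualRoute, Instance, and simulate() are hypothetical helpers, and the constant values are assumed to match n_consts.IPv4_ANY / n_consts.IPv6_ANY. The de-duplicating variant is just one possible way to make the update idempotent and is not necessarily what the linked commit does.

```python
from dataclasses import dataclass, field

# Assumed values of n_consts.IPv4_ANY / n_consts.IPv6_ANY.
IPV4_ANY = "0.0.0.0/0"
IPV6_ANY = "::/0"
DEFAULT_GW = (IPV6_ANY, IPV4_ANY)


@dataclass(frozen=True)
class VirtualRoute:
    """Hypothetical stand-in for keepalived.KeepalivedVirtualRoute."""
    destination: str
    nexthop: str


@dataclass
class Instance:
    """Hypothetical stand-in for the keepalived instance object."""
    virtual_routes: list = field(default_factory=list)


def routes_updated_leaky(instance, new_routes):
    # Same logic as the quoted code: default routes survive the filter,
    # so a default route in new_routes is appended again on every call.
    instance.virtual_routes = [r for r in instance.virtual_routes
                               if r.destination in DEFAULT_GW]
    for route in new_routes:
        instance.virtual_routes.append(
            VirtualRoute(route["destination"], route["nexthop"]))


def routes_updated_idempotent(instance, new_routes):
    # Illustrative variant: skip a route that is already present.
    kept = [r for r in instance.virtual_routes
            if r.destination in DEFAULT_GW]
    for route in new_routes:
        candidate = VirtualRoute(route["destination"], route["nexthop"])
        if candidate not in kept:
            kept.append(candidate)
    instance.virtual_routes = kept


def simulate(update, retries):
    """Call `update` `retries` times, as a failing router would be retried."""
    instance = Instance()
    routes = [{"destination": "0.0.0.0/0", "nexthop": "210.93.170.123"}]
    for _ in range(retries):
        update(instance, routes)
    return len(instance.virtual_routes)


if __name__ == "__main__":
    print("leaky:     ", simulate(routes_updated_leaky, 5))
    print("idempotent:", simulate(routes_updated_idempotent, 5))
```

With the leaky version the route list grows by one entry per retry, which would match one line being appended to keepalived.conf per processing attempt. Note that the de-duplicating variant still keeps a stale default route if it is later removed from the router's routes, so a real fix would have to handle that case too.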
Great. It would make sense to also determine which exception happens. Wouldn't we see it in sos-report logs?
No, I am unable to find the router that is causing the issue in the L3 agent and neutron server logs. keepalived.conf [1] has the 'qg-cc9f8bec-e8' and 'qr-48109b39-be' interfaces, but the corresponding ports are not seen in the L3 agent and neutron server logs (these ports do appear in the L2 agent log, but those entries do not help).

[1] http://collab-shell.usersys.redhat.com/01911336/x-text/keepalived.conf
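Even without the logs, the attached keepalived.conf itself can be checked for repeated route lines. A small sketch, assuming route lines contain the token "via" as in the example route quoted earlier in this bug; duplicated_routes() is a hypothetical helper, not part of neutron or keepalived.

```python
from collections import Counter


def duplicated_routes(conf_text):
    """Return {route line: count} for route lines that appear more than once.

    A line is treated as a route if it contains the token "via"
    (an assumption based on the example "0.0.0.0/0 via 210.93.170.123").
    """
    counts = Counter(
        line.strip()
        for line in conf_text.splitlines()
        if "via" in line.split()
    )
    return {line: n for line, n in counts.items() if n > 1}


if __name__ == "__main__":
    import sys

    # Usage: python find_dup_routes.py /path/to/keepalived.conf
    with open(sys.argv[1]) as conf:
        for line, n in sorted(duplicated_routes(conf.read()).items()):
            print(f"{n:6d}  {line}")
```

If a single route accounts for most of the 55705 lines, that would support the repeated-append explanation.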
I need to release the build, can someone set the proper flags?
Build openstack-neutron-2015.1.4-22.el7ost has the fix, which prevents the L3 agent from repeatedly adding the same route to keepalived.conf.
The fix exists within openstack-neutron-2015.1.4-23.el7ost.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:3069