Bug 1693363
| Summary: | Neutron L3 agents crash/flap after applying a minor update; DHCP agents unable to allocate memory | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Ganesh Kadam <gkadam> |
| Component: | openstack-neutron | Assignee: | Nate Johnston <njohnston> |
| Status: | CLOSED DUPLICATE | QA Contact: | Roee Agiman <ragiman> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 10.0 (Newton) | CC: | amuller, chrisw, nchandek, njohnston, pfb29, scohen |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-04-01 20:24:38 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Ganesh Kadam 2019-03-27 16:19:35 UTC

Sorry to intrude on the BZ (I'm the original poster of the linked support case). Our issue, which may not be immediately clear, is that on a fresh boot of a combined OSP10 controller/network node (any of the three; the behaviour is seen on all), Neutron-related services start out consuming ~600MB RSS, but over a couple of days balloon roughly linearly until they have consumed all available memory on the node:

    # Freshly booted node
    top -n1 -b -o %MEM | head -100 | egrep neutron
    16597 neutron   20   0  509912 201876   6828 S  0.0  0.2   0:26.42 neutron-dhcp-agent
    16601 neutron   20   0  506912 200768   6756 S  0.0  0.2   0:25.25 neutron-l3-agent
    36018 neutron   20   0  496984 190912   6760 S  0.0  0.1   0:26.56 neutron-openvswitch

    # Node with 2 days uptime
    top -n1 -b -o %MEM | head -100 | egrep neutron
    17872 neutron   20   0   14.7g  14.4g   6824 S  0.0 11.4  48:43.08 neutron-dhcp-agent
    17881 neutron   20   0   14.7g  14.4g   6816 S  0.0 11.4  21:54.27 neutron-l3-agent
    17992 neutron   20   0   14.7g  14.4g   6760 S  0.0 11.4 445:50.22 neutron-openvswitch

Left unchecked, these processes accumulate all available memory on the node, at which point normal Neutron operations such as creating new networks or L3 routers can no longer proceed, as no memory can be allocated for them. The only fix has been to reboot the controller/network node once memory usage reaches these levels.

I don't think our cloud sees a rate or volume of Neutron resource creation that could explain this memory usage purely by the number of resources Neutron is managing (~130 routers in total, ~250 tenant private networks in total).

Team, any update?

Marking this as a duplicate of bug 1693430, which tracks our memory leak fix for python-openvswitch.

*** This bug has been marked as a duplicate of bug 1693430 ***
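For anyone watching a node for the same symptom before the fix lands, below is a minimal sketch of how the growth shown above could be logged over time using the same `top` filter from the report. The log path, sampling interval, and script name are illustrative assumptions, not part of this BZ:

```sh
#!/bin/bash
# Illustrative RSS sampler (assumed helper, not from this BZ): records the
# Neutron agents' resident memory periodically so that roughly linear growth
# becomes visible after a day or two of samples.
LOG=/var/tmp/neutron-rss.log           # assumed log path
while true; do
    date --iso-8601=seconds >> "$LOG"
    # Same three processes identified as leaking in the report
    top -n1 -b -o %MEM | egrep 'neutron-(dhcp-agent|l3-agent|openvswitch)' >> "$LOG"
    sleep 600                          # 10-minute sampling interval (assumed)
done
```

Once the fix from bug 1693430 is available, checking the installed build with `rpm -q python-openvswitch` on each controller/network node should confirm whether a given node still carries the leaking version.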