Description of problem:
Neutron L3 HA wastes memory by leaving ~12 MB behind after every scenario test. If we used this version of neutron with L3 HA to manage our CI machines, we would be fighting the OOM killer on a weekly basis (our CI jobs always create a new router). This also prevents any longer-running neutron tests or bigger load tests.

Version-Release number of selected component (if applicable):
python-neutronclient-6.0.0-1.1.el7ost.noarch
python-neutron-lib-0.4.0-1.el7ost.noarch
openstack-neutron-common-9.1.0-4.el7ost.noarch
openstack-neutron-bigswitch-lldp-9.40.0-1.1.el7ost.noarch
python-neutron-lbaas-9.1.0-1.el7ost.noarch
openstack-neutron-9.1.0-4.el7ost.noarch
openstack-neutron-openvswitch-9.1.0-4.el7ost.noarch
openstack-neutron-ml2-9.1.0-4.el7ost.noarch
openstack-neutron-metering-agent-9.1.0-4.el7ost.noarch
puppet-neutron-9.4.0-3.el7ost.noarch
python-neutron-9.1.0-4.el7ost.noarch
openstack-neutron-sriov-nic-agent-9.1.0-4.el7ost.noarch
python-neutron-tests-9.1.0-4.el7ost.noarch
openstack-neutron-bigswitch-agent-9.40.0-1.1.el7ost.noarch
openstack-neutron-lbaas-9.1.0-1.el7ost.noarch

How reproducible:
Always

Steps to Reproduce:
1. 3-controller default setup
2. Install tempest
3. Run any tempest scenario test which creates, uses, and deletes a router, for example: ostestr -r 'minimum'

Actual results:
After the router is deleted, a new sudo process remains (rss 2808 kB, shr 2112 kB; at least 684 kB wasted, plus kernel-side memory usage) along with a neutron-rootwrap-daemon process (rss 14888 kB, shr 4288 kB; at least 10600 kB wasted).

Expected results:
The extra neutron-rootwrap-daemon and sudo processes die when the router is deleted.

Additional info:
Similar issue: https://bugzilla.redhat.com/show_bug.cgi?id=1383448
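For reference, a minimal detection sketch (mine, not part of the original report) that sums the RSS of any leftover leaked processes after a test run. It assumes psutil is installed and that the process patterns from the description above are the only ones of interest:

import psutil

# Patterns of the leaked processes described in this report (illustrative only).
PATTERNS = ('ip -o monitor address', 'neutron-rootwrap-daemon')

total_kb = 0
for proc in psutil.process_iter():
    try:
        cmdline = ' '.join(proc.cmdline())
        if any(pat in cmdline for pat in PATTERNS):
            rss_kb = proc.memory_info().rss // 1024
            total_kb += rss_kb
            print('leaked pid %d (rss %d kB): %s' % (proc.pid, rss_kb, cmdline))
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        continue

print('total leaked RSS: %d kB' % total_kb)

Running this before and after a scenario test should show the per-test growth mentioned in the description.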
Hi,

I have done some tests and indeed, there are processes leaked. In my particular case I observed these new processes after running the following two tests:

neutron.tests.functional.agent.l3.test_ha_router.L3HATestCase.test_ha_router_lifecycle
neutron.tests.functional.agent.l3.test_ha_router.LinuxBridgeL3HATestCase.test_ha_router_lifecycle

246a246,250
> root     21276     1  0 11:51 ?  00:00:00 ip -o monitor address
> root     21680     1  0 11:52 ?  00:00:00 ip -o monitor address
> root     21825     1  0 11:52 ?  00:00:00 sudo /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
> root     21826 21825  1 11:52 ?  00:00:00 /usr/bin/python /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf

[centos@devstack bug1397418]$ ps -o rss,sz,vsz 21276
  RSS    SZ    VSZ
  788  1666   6664
[centos@devstack bug1397418]$ ps -o rss,sz,vsz 21680
  RSS    SZ    VSZ
  788  1666   6664
[centos@devstack bug1397418]$ ps -o rss,sz,vsz 21825
  RSS    SZ    VSZ
 2796 48593 194372
[centos@devstack bug1397418]$ ps -o rss,sz,vsz 21826
  RSS    SZ    VSZ
20936 76311 305244

Regarding the 'ip -o monitor' processes: the keepalived-state-change process spawns them, and when an HA router is deleted, keepalived-state-change is stopped with a SIGKILL, leaving 'ip -o monitor' orphaned.

@Attila: does this match what you have observed?

@Assaf: in my opinion, the fix could be to catch the termination signal within keepalived_state_change and clean up its child processes. I also need to investigate further why the rootwrap-daemon processes are being leaked.
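A minimal sketch of that idea (not the actual upstream patch, and all names are illustrative): a monitor-style daemon installs a signal handler that terminates its "ip -o monitor address" child before exiting, so the child is not left orphaned. Since SIGKILL cannot be caught, this assumes the parent is stopped with SIGTERM instead:

import signal
import subprocess
import sys

# Child process that would otherwise be orphaned when the parent is killed.
child = subprocess.Popen(['ip', '-o', 'monitor', 'address'],
                         stdout=subprocess.PIPE)

def _handle_sigterm(signum, frame):
    # Terminate and reap the child before exiting.
    child.terminate()
    child.wait()
    sys.exit(0)

signal.signal(signal.SIGTERM, _handle_sigterm)

# Main loop: consume address-change events from the child's stdout.
for line in iter(child.stdout.readline, b''):
    pass  # event processing omitted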
Sent a patch to upstream gerrit: https://review.openstack.org/#/c/411968/

In my setup, at least, no 'ip -o monitor' processes are orphaned anymore.

Daniel
Looks like the patches have landed on stable/newton.
A build with the fix has been released in openstack-neutron-9.1.1-6.el7ost.

Steps to test:
1. 3-controller default setup
2. Install tempest
3. Run any tempest scenario test which creates, uses, and deletes a router, for example: ostestr -r 'minimum'
To verify, I ran the test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario test and made sure that ps aux | grep "monitor address" returned nothing, i.e. the process no longer exists.

Verified in openstack-neutron-ml2-9.2.0-2.el7ost.noarch
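If someone wants to script that check, a small sketch (my own, equivalent to the manual ps | grep above; fails if any "ip -o monitor address" process is still running):

import subprocess
import sys

# List all processes and keep only the ones matching the leaked command line.
out = subprocess.check_output(['ps', '-eo', 'pid,args']).decode()
leaked = [line for line in out.splitlines() if 'monitor address' in line]

if leaked:
    print('leaked processes:\n' + '\n'.join(leaked))
    sys.exit(1)
print('no leftover "ip -o monitor address" processes')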
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0314.html