Description of problem:
After a set of Rally scenarios completes, neutron-server does not return to its previous idle utilization. See [1] for CPU utilization and [2] for memory.

[1] @ 10:40. At 15:50 the workload has completed, but neutron-server utilization never returns to its idle level.

ML2OVN is experiencing this. I am trying the exact same tests with OVSML2.

[1] http://norton.perf.lab.eng.rdu.redhat.com:3000/dashboard/snapshot/xJjA6dF7KoNPSdCud7VdzhF4M8lZTdDS
[2] http://norton.perf.lab.eng.rdu.redhat.com:3000/dashboard/snapshot/pttXNtlRM2ZQ3ofg1oeLALTyVS9kqQFL

Version-Release number of selected component (if applicable):
OSP14 2018-10-08.4

How reproducible:
N/A

Steps to Reproduce:
1. Run the set of Rally scenarios defined in [ https://gist.github.com/jtaleric/2909dd9a8c0bed13cc93b655a6b3b875 ]

Restarting the containers fixes the problem.
I can confirm that I am seeing high CPU usage caused by neutron on an idle OSP14 deployment as well. The CPU usage increased over time on an idle system; in my case I did not even run any heavy workload, although running one may make the problem quicker to reproduce.
Recreated with OVSML2
Changing state back to New. Tested with puppet-tripleo-9.3.1-0.20181010034745.157eaab.el7ost.noarch (in puddle: 2018-10-25.3).

I ran a set of Rally tests and saw controller-0 still burning CPU even though Neutron had nothing to work on. All that is left over from my tests:

(.browbeat-venv) (overcloud) [stack@c09-h11-r630 browbeat]$ neutron net-list
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+--------------------------------------+----------------------------------------------------+-----------+-------------------------------------------------------+
| id                                   | name                                               | tenant_id | subnets                                               |
+--------------------------------------+----------------------------------------------------+-----------+-------------------------------------------------------+
| 8cdf70ab-2a00-4395-a32f-4f12ede8f8bd | HA network tenant c4ce3b3558394963b425d6ab21bae058 |           | a9d6106a-251b-4459-b4a6-b32e456c2708 169.254.192.0/18 |
+--------------------------------------+----------------------------------------------------+-----------+-------------------------------------------------------+

I restarted Neutron and the CPU utilization came back down; you can see the before and after here [1].

Interesting note, looking at a different controller (where I didn't restart Neutron),
I can see in /var/log/container/neutron/server.log:

2018-11-01 23:11:21.818 29 INFO neutron.wsgi [-] 172.16.0.11 "OPTIONS / HTTP/1.0" status: 200 len: 248 time: 0.0013459
2018-11-01 23:11:22.548 33 INFO neutron.wsgi [-] 172.16.0.32 "OPTIONS / HTTP/1.0" status: 200 len: 248 time: 0.0010641
2018-11-01 23:11:23.816 35 INFO neutron.wsgi [-] 172.16.0.25 "OPTIONS / HTTP/1.0" status: 200 len: 248 time: 0.0010581
2018-11-01 23:11:23.821 35 INFO neutron.wsgi [-] 172.16.0.11 "OPTIONS / HTTP/1.0" status: 200 len: 248 time: 0.0008490
2018-11-01 23:11:24.552 38 INFO neutron.wsgi [-] 172.16.0.32 "OPTIONS / HTTP/1.0" status: 200 len: 248 time: 0.0011449

^ This happens constantly, as if something is querying neutron. However, looking at controller-0, I see the same thing (after a restart).

Since Neutron is in wsgi, I figured I would look there too, but neutron is the only service for which it is empty:

./neutron-api:
total 0
drwxr-xr-x.  2 root root   6 Nov  1 01:20 .
drwxr-xr-x. 14 root root 221 Nov  1 01:21 ..

GMR might help us figure out what Neutron is chewing on? [2]

[1] http://norton.perf.lab.eng.rdu.redhat.com:3000/dashboard/snapshot/kAlDmd16lvs4ccc2qfEBOv08iv6GOZxw
[2] https://wiki.openstack.org/wiki/GuruMeditationReport
The request traffic seen on the controllers is normal healthcheck traffic from haproxy. We have (grabbed from the haproxy container):

listen neutron
  bind 10.0.0.113:9696 transparent
  bind 172.17.1.33:9696 transparent
  mode http
  http-request set-header X-Forwarded-Proto https if { ssl_fc }
  http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
  http-request set-header X-Forwarded-Port %[dst_port]
  option httpchk
  option httplog
  server controller-0.internalapi.localdomain 172.17.1.12:9696 check fall 5 inter 2000 rise 2
  server controller-1.internalapi.localdomain 172.17.1.16:9696 check fall 5 inter 2000 rise 2
  server controller-2.internalapi.localdomain 172.17.1.31:9696 check fall 5 inter 2000 rise 2

So this matches the regular OPTIONS requests in the logs (httpchk without parameters means OPTIONS).
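As a rough cross-check of the healthcheck explanation, the sketch below measures the gap between consecutive OPTIONS requests from each client IP. The sample lines are copied from the server.log excerpt quoted earlier in this bug; the helper name `options_intervals` and the regular expression are illustrative assumptions, not part of neutron or haproxy. If the traffic is haproxy's check with `inter 2000`, each client should show up roughly every 2 seconds.

```python
import re
from collections import defaultdict
from datetime import datetime

# Sample lines copied from the neutron server.log excerpt in this bug.
LOG_LINES = """\
2018-11-01 23:11:21.818 29 INFO neutron.wsgi [-] 172.16.0.11 "OPTIONS / HTTP/1.0" status: 200 len: 248 time: 0.0013459
2018-11-01 23:11:22.548 33 INFO neutron.wsgi [-] 172.16.0.32 "OPTIONS / HTTP/1.0" status: 200 len: 248 time: 0.0010641
2018-11-01 23:11:23.816 35 INFO neutron.wsgi [-] 172.16.0.25 "OPTIONS / HTTP/1.0" status: 200 len: 248 time: 0.0010581
2018-11-01 23:11:23.821 35 INFO neutron.wsgi [-] 172.16.0.11 "OPTIONS / HTTP/1.0" status: 200 len: 248 time: 0.0008490
2018-11-01 23:11:24.552 38 INFO neutron.wsgi [-] 172.16.0.32 "OPTIONS / HTTP/1.0" status: 200 len: 248 time: 0.0011449
""".splitlines()

# Assumed line layout: timestamp, worker pid, level, logger, context, client IP, request.
PATTERN = re.compile(
    r'^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \d+ INFO neutron\.wsgi \[-\] '
    r'(?P<client>\S+) "OPTIONS / HTTP/1\.0"'
)


def options_intervals(lines):
    """Return {client_ip: [seconds between consecutive OPTIONS requests]}."""
    seen = defaultdict(list)
    for line in lines:
        match = PATTERN.match(line)
        if match:
            stamp = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            seen[match.group("client")].append(stamp)
    return {
        client: [round((b - a).total_seconds(), 3) for a, b in zip(stamps, stamps[1:])]
        for client, stamps in seen.items()
    }


intervals = options_intervals(LOG_LINES)
for client, gaps in sorted(intervals.items()):
    print(client, gaps)
# 172.16.0.11 [2.003]
# 172.16.0.25 []
# 172.16.0.32 [2.004]
```

The two clients that appear twice in the excerpt repeat about 2.0 seconds apart, which is consistent with `inter 2000`; the third appears only once, so no interval can be computed for it. Five lines are far too few to prove anything, so this is only a sanity check that could be rerun against a longer slice of the log.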