Bug 1571855

Summary: [Queens release] neutron-openvswitch high cpu usage
Product: [Community] RDO Reporter: Ivan Garcia <igarcia>
Component: openstack-neutron Assignee: Assaf Muller <amuller>
Status: CLOSED NEXTRELEASE QA Contact: Ofer Blaut <oblaut>
Severity: high Docs Contact:
Priority: unspecified    
Version: unspecified CC: chrisw, srevivo
Target Milestone: ---   
Target Release: trunk   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-04-26 18:28:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Ivan Garcia 2018-04-25 14:26:55 UTC
Description of problem:

"We just ran into a case where the openvswitch agent (local devstack, current master branch) eats 100% of CPU time.

Pyflame profiling shows that most of the time is spent in neutron.agent.linux.ip_conntrack, line 95.

https://github.com/openstack/neutron/blob/master/neutron/agent/linux/ip_conntrack.py#L95

The code around this line is:

        while True:
            pool.spawn_n(self._process_queue)

The documentation of eventlet.spawn_n says: "The same as spawn(), but it’s not possible to know how the function terminated (i.e. no return value or exceptions). This makes execution faster. See spawn_n for more details." I suspect that GreenPool.spawn_n may behave similarly.

It seems plausible that spawn_n is returning very quickly because of some error, and then all the time is spent in a short-circuited while loop."


https://bugs.launchpad.net/neutron/+bug/1750777
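The hypothesis above can be illustrated with a small, self-contained sketch. It does not use eventlet or neutron code. As an assumption, `BoundedPool` is a hypothetical thread-based stand-in for `eventlet.GreenPool`: its `spawn_n` blocks only while every slot is occupied and never reports how the spawned function ended. `POOL_SIZE`, `count_spawns` and the two worker functions are likewise illustrative names. A worker that blocks (here, on an empty `queue.Queue`) keeps the slots busy, so the `while True` loop parks inside `spawn_n`. A worker that fails immediately frees its slot at once, so the loop never waits and burns CPU.

```python
import queue
import threading
import time

POOL_SIZE = 8  # hypothetical pool size chosen for illustration


class BoundedPool:
    """Thread-based stand-in for eventlet.GreenPool (assumption: real
    threads instead of green threads). spawn_n blocks while all slots are
    busy, and the caller cannot learn how the spawned function ended."""

    def __init__(self, size):
        self._slots = threading.BoundedSemaphore(size)

    def spawn_n(self, func):
        self._slots.acquire()  # blocks only when every slot is in use

        def run():
            try:
                func()
            except Exception:
                pass  # the failure is swallowed; the spawning loop never sees it
            finally:
                self._slots.release()  # free the slot for the next spawn

        threading.Thread(target=run, daemon=True).start()


def count_spawns(worker, duration=0.2):
    """Run the `while True: pool.spawn_n(worker)` pattern in a background
    thread for `duration` seconds and return how many spawns completed."""
    pool = BoundedPool(POOL_SIZE)
    stop = threading.Event()
    spawns = [0]

    def loop():
        while not stop.is_set():
            pool.spawn_n(worker)
            spawns[0] += 1

    threading.Thread(target=loop, daemon=True).start()
    time.sleep(duration)
    stop.set()
    return spawns[0]


# Healthy worker: blocks on an empty queue, so the pool fills up and the
# loop parks inside spawn_n after POOL_SIZE iterations.
idle_queue = queue.Queue()
blocking = count_spawns(idle_queue.get)


# Broken worker: fails immediately, so every slot is freed at once and
# the loop keeps spawning (the suspected busy loop).
def failing_worker():
    raise RuntimeError("simulated early failure")


spinning = count_spawns(failing_worker)
print(blocking, spinning)
```

In this sketch `blocking` stays at `POOL_SIZE` because the next `spawn_n` call never returns, while `spinning` grows with the length of the time window. It only shows why a fast-exiting worker would turn the loop into a busy loop; it says nothing about what the upstream fix changed.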

Version-Release number of selected component (if applicable):


How reproducible:

Always

Steps to Reproduce:
1. Deploy an RDO OpenStack cloud (Queens).
2. Run the "top" command on a node running the agent.
3. neutron-openvswitch uses 100% of CPU time (on compute nodes and other nodes).
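Step 3 can also be checked non-interactively. The sketch below is one possible alternative to watching `top`, not part of the original report: it samples the user and system CPU ticks of one process from `/proc/<pid>/stat` (Linux only) twice and converts the difference into a percentage of one core. The function name `cpu_percent` is hypothetical. Finding the agent's PID (for example with `pgrep -f neutron-openvswitch-agent`, assuming that is the process name on the node) is left to the reader.

```python
import os
import time


def cpu_percent(pid, interval=1.0):
    """Approximate CPU usage of `pid` over `interval` seconds, as a
    percentage of one core (100.0 means one core fully busy).
    Linux only: reads utime and stime from /proc/<pid>/stat."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")

    def cpu_ticks():
        with open(f"/proc/{pid}/stat") as stat:
            # Split after the ')' that closes the command name, which may
            # itself contain spaces; index 0 is then field 3 (state), so
            # utime (field 14) and stime (field 15) are at indices 11 and 12.
            fields = stat.read().rsplit(")", 1)[1].split()
        return int(fields[11]) + int(fields[12])

    start = cpu_ticks()
    time.sleep(interval)
    used = cpu_ticks() - start
    return 100.0 * used / ticks_per_second / interval
```

Calling `cpu_percent(pid)` with the agent's PID should report a value close to 100 on an affected node. Because `/proc/<pid>/stat` sums all threads of the process, a multithreaded process can report more than 100.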

Actual results:
neutron-openvswitch uses 100% CPU time



Expected results:

neutron-openvswitch does not use 100% CPU time

Additional info:

https://bugs.launchpad.net/neutron/+bug/1750777

Comment 1 Assaf Muller 2018-04-26 18:28:39 UTC
The fix is in upstream stable/queens as of April 5th:
https://review.openstack.org/#/c/554258/