1571855 – [Queens release] neutron-openvswitch high cpu usage

Summary: [Queens release] neutron-openvswitch high cpu usage

Keywords:
Status:	CLOSED NEXTRELEASE
Alias:	None
Product:	RDO
Classification:	Community
Component:	openstack-neutron
Sub Component:
Version:	unspecified
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	trunk
Assignee:	Assaf Muller
QA Contact:	Ofer Blaut
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-04-25 14:26 UTC by Ivan Garcia
Modified:	2018-04-26 18:28 UTC
CC List:	2 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2018-04-26 18:28:39 UTC
Embargoed:
Dependent Products:

Description Ivan Garcia 2018-04-25 14:26:55 UTC

Description of problem:

"We just ran into a case where the openvswitch agent (local dev devstack, current master branch) eats 100% of CPU time.

Pyflame profiling shows the time being largely spent in neutron.agent.linux.ip_conntrack, line 95.

https://github.com/openstack/neutron/blob/master/neutron/agent/linux/ip_conntrack.py#L95

The code around this line is:

        while True:
            pool.spawn_n(self._process_queue)

The documentation of eventlet.spawn_n says: "The same as spawn(), but it’s not possible to know how the function terminated (i.e. no return value or exceptions). This makes execution faster. See spawn_n for more details." I suspect that GreenPool.spawn_n may behave similarly.

It seems plausible that spawn_n is returning very quickly because of some error, so that all the CPU time is then spent spinning in a tight while loop."
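
The busy-loop hypothesis in the report above can be illustrated without eventlet. The following is a minimal sketch using only the Python standard library; it is not the neutron code and not necessarily the upstream fix. The names `fast_spawn`, `count_spins`, and `run_workers` are hypothetical and exist only for this illustration. It contrasts a loop whose body returns immediately (and therefore spins) with a fixed number of workers that block on `queue.Queue.get()` and stay idle until work arrives.

```python
import queue
import threading
import time


def fast_spawn(func):
    """Hypothetical stand-in for a spawn call that returns immediately."""
    return None


def count_spins(duration=0.05):
    """Count iterations of a loop whose body never blocks."""
    spins = 0
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        fast_spawn(lambda: None)  # returns at once, so the loop just spins
        spins += 1
    return spins


def run_workers(items, workers=2):
    """Start a fixed number of workers; each one blocks on the queue."""
    q = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            item = q.get()  # blocks until an entry is added to the queue
            if item is None:  # sentinel: stop this worker
                break
            with lock:
                results.append(item * 2)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for item in items:
        q.put(item)
    for _ in threads:
        q.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return sorted(results)
```

Calling `count_spins()` returns a large iteration count for a very short interval, which is the kind of behaviour that would show up as 100% CPU in "top". `run_workers([1, 2, 3])` processes the items with the workers sleeping inside `q.get()` whenever the queue is empty.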


https://bugs.launchpad.net/neutron/+bug/1750777

Version-Release number of selected component (if applicable):


How reproducible:

Always

Steps to Reproduce:
1. Deploy an RDO OpenStack cloud (Queens)
2. Execute the "top" command
3. neutron-openvswitch will use 100% of CPU time (compute and nodes)

Actual results:
neutron-openvswitch uses 100% of CPU time



Expected results:

neutron-openvswitch does not use 100% of CPU time

Additional info:

https://bugs.launchpad.net/neutron/+bug/1750777

Comment 1 Assaf Muller 2018-04-26 18:28:39 UTC

The fix is in upstream stable/queens as of April 5th:
https://review.openstack.org/#/c/554258/
