Bug 1290562

Summary: [perf] neutron list_ports grows in timing as number of ports increases
Product: Red Hat OpenStack Reporter: Alex Krzos <akrzos>
Component: openstack-neutron Assignee: Ihar Hrachyshka <ihrachys>
Status: CLOSED ERRATA QA Contact: Toni Freger <tfreger>
Severity: high Docs Contact:
Priority: high    
Version: 8.0 (Liberty) CC: amuller, chrisw, jlibosva, jschluet, jtaleric, mlopes, nyechiel, yeylon
Target Milestone: ga   
Target Release: 8.0 (Liberty)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-neutron-7.0.1-10.el7ost Doc Type: Bug Fix
Doc Text:
Red Hat OpenStack Platform 8 introduced a new RBAC feature that allows you to share neutron networks with a specific list of tenants, instead of globally. As part of the feature, the default policy.json file for neutron started triggering I/O-consuming database fetches for every port fetch, in an attempt to allow the owner of a network to list all ports that belong to that network, even if they were created by other tenants. Consequently, the list operation for ports triggered multiple unneeded database fetches, which drastically affected the performance of the operation. This update addresses the issue by running the I/O operations only when they are actually needed, for example, when the port to be validated by the policy engine does not belong to the tenant that invokes the list operation. As a result, list operations for ports scale normally again.
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-04-07 21:17:32 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1292167, 1292179    
Bug Blocks:    
Attachments:
Description Flags
Rally html output displaying the timings for each action performed against neutron none

Description Alex Krzos 2015-12-10 19:48:58 UTC
Created attachment 1104489 [details]
Rally html output displaying the timings for each action performed against neutron

Description of problem:
Testing neutron through the browbeat (https://github.com/jtaleric/browbeat) create and list scenarios for networks, ports, routers, and subnets. The list-ports test shows a drastic increase in the response time of neutron.list_ports.

The benchmark ran 500 iterations of:
create network
Create port x 4
list ports
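
The loop above can be sketched as follows. This is only an illustration of the workload shape, not the Rally scenario itself: FakeNeutron is a hypothetical in-memory stand-in (not the real neutron client API), and the sketch assumes that resources are not cleaned up between iterations, so each list call sees every port created so far.

```python
import time


class FakeNeutron:
    """Hypothetical in-memory stand-in for a neutron client (not the real API)."""

    def __init__(self):
        self.networks = []
        self.ports = []

    def create_network(self):
        net_id = len(self.networks)
        self.networks.append(net_id)
        return net_id

    def create_port(self, network_id):
        port = {"id": len(self.ports), "network_id": network_id}
        self.ports.append(port)
        return port

    def list_ports(self):
        return list(self.ports)


def run_benchmark(client, iterations=500, ports_per_network=4):
    """Run the create/list loop and record (port_count, seconds) per iteration."""
    samples = []
    for _ in range(iterations):
        net_id = client.create_network()
        for _ in range(ports_per_network):
            client.create_port(net_id)
        # Time only the list call, as the report is about list_ports latency.
        start = time.perf_counter()
        ports = client.list_ports()
        samples.append((len(ports), time.perf_counter() - start))
    return samples


samples = run_benchmark(FakeNeutron())
# Port count seen by the first and the last list call.
print(samples[0][0], samples[-1][0])
```

Under these assumptions the first list call returns 4 ports and the last returns 2000, which is why any per-port cost in the list path shows up as a steadily rising response time.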

Timing values for each action are included in the rally html output attached to this bz

Testing was conducted on an OSPd-deployed OSP 8 cloud with a single controller and a single compute node. Pacemaker was deployed despite the single-controller configuration in order to accurately measure response timings with HA enabled.

Version-Release number of selected component (if applicable):
OSP 8
openstack-neutron-common-7.0.0-5.el7ost.noarch
python-neutron-lbaas-7.0.0-2.el7ost.noarch
python-neutronclient-3.1.0-1.el7ost.noarch
openstack-neutron-lbaas-7.0.0-2.el7ost.noarch
openstack-neutron-openvswitch-7.0.0-5.el7ost.noarch
openstack-neutron-metering-agent-7.0.0-5.el7ost.noarch
python-neutron-7.0.0-5.el7ost.noarch
openstack-neutron-7.0.0-5.el7ost.noarch
openstack-neutron-ml2-7.0.0-5.el7ost.noarch


How reproducible:
Every time

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:


See details tab to view the drastic increase in response timing for listing ports as the number of ports created increases.

Comment 1 Assaf Muller 2015-12-10 23:02:26 UTC
I tried reproducing on devstack master. Following the Rally scenario, I created 500 networks and 4 ports in each network. No subnets or any other resources, everything is in the same tenant.

I issued 'time neutron port-list':
real	0m1.325s
user	0m0.572s
sys	0m0.038s

No dice... Creating a devstack setup based off Liberty now.

Comment 2 Assaf Muller 2015-12-15 15:00:51 UTC
There's some u/s traction, adding Launchpad bugs. Fixes will be backported to OSP 8. The summary is that newly added features (RBAC and AZs) added DB lookups on a per-resource basis when listing that resource type. So for example when listing networks, RBAC added a per-network DB lookup, so the time to list networks increases linearly.
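
A generic illustration of that pattern, using the standard-library sqlite3 module. The table and column names here are made up for the example and are not neutron's schema, and the joined query is just one common remedy, not necessarily what the upstream patches do. The point is only the query count: one lookup per listed network versus a single query for the whole list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE networks (id INTEGER PRIMARY KEY, tenant_id TEXT);
    CREATE TABLE rbac (network_id INTEGER, target_tenant TEXT);
    """
)
for net_id in range(500):
    conn.execute("INSERT INTO networks VALUES (?, ?)", (net_id, "tenant-a"))
    conn.execute("INSERT INTO rbac VALUES (?, ?)", (net_id, "tenant-b"))

queries = 0


def run(sql, params=()):
    """Execute a query and count it."""
    global queries
    queries += 1
    return conn.execute(sql, params).fetchall()


def list_networks_per_resource():
    # One query for the list, then one extra lookup per network.
    nets = run("SELECT id FROM networks")
    return {
        net_id: run("SELECT target_tenant FROM rbac WHERE network_id = ?", (net_id,))
        for (net_id,) in nets
    }


def list_networks_joined():
    # A single query that fetches the related rows together with the list.
    rows = run(
        "SELECT n.id, r.target_tenant FROM networks n "
        "LEFT JOIN rbac r ON r.network_id = n.id"
    )
    result = {}
    for net_id, target in rows:
        result.setdefault(net_id, []).append(target)
    return result


queries = 0
slow = list_networks_per_resource()
per_resource_queries = queries

queries = 0
fast = list_networks_joined()
joined_queries = queries

print(per_resource_queries, joined_queries)
```

With 500 networks the per-resource variant issues 501 queries and the joined variant issues 1, so the first grows linearly with the number of resources listed.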

Comment 3 Haim Daniel 2015-12-22 15:10:39 UTC
Backported the following issues to osp-8:

https://bugs.launchpad.net/bugs/1525423 (https://bugzilla.redhat.com/show_bug.cgi?id=1292167)

https://bugs.launchpad.net/bugs/1525295 (https://bugzilla.redhat.com/show_bug.cgi?id=1292179)

Removing https://bugs.launchpad.net/bugs/1525740: its backport is irrelevant to osp-8, as its patch fixes the network availability zones DB code, which didn't make it into osp-8.

Leaving https://bugs.launchpad.net/bugs/1513782 to track this issue.

Comment 5 Assaf Muller 2016-01-13 20:40:56 UTC
Package openstack-neutron-7.0.1-4 has two patches merged (Links in the dependent RHBZs). Joe Talerico installed it in an OSP 8 system and re-ran Rally tests, with these results:

http://10.16.31.132/rook-neutron-test-neutron-24/neutron/neutron-create-list-port-cc/run-1/rook-neutron-test-neutron-24-iteration_1-neutron-create-list-port-cc-0008.html#/NeutronNetworks.create_and_list_ports/details

It looks like it still takes 50 seconds to list 500 ports.

Comment 7 Assaf Muller 2016-01-21 15:03:00 UTC
Added an upstream patch by Ihar that will resolve the issue for most cases. We'll push it upstream and backport it to OSP 8 as fast as we can.
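
Per the Doc Text of this bug, the fix runs the database fetch only when the port being validated does not belong to the tenant that invokes the list. A minimal sketch of that idea follows. FakeDB, can_see_port, and the dict layout are hypothetical names for illustration and do not reflect neutron's policy engine code.

```python
class FakeDB:
    """Hypothetical store that counts network lookups."""

    def __init__(self, network_owner):
        self.network_owner = network_owner  # network_id -> owning tenant_id
        self.fetches = 0

    def get_network_owner(self, network_id):
        self.fetches += 1
        return self.network_owner[network_id]


def can_see_port(db, port, caller_tenant):
    # Cheap check first: the caller owns the port, so no I/O is needed.
    if port["tenant_id"] == caller_tenant:
        return True
    # Only for ports owned by someone else: fetch the network to check
    # whether the caller owns the network the port is attached to.
    return db.get_network_owner(port["network_id"]) == caller_tenant


db = FakeDB({"net-1": "tenant-a", "net-2": "tenant-b"})
ports = [{"tenant_id": "tenant-a", "network_id": "net-1"} for _ in range(2000)]
ports.append({"tenant_id": "tenant-b", "network_id": "net-1"})  # foreign port, own network
ports.append({"tenant_id": "tenant-b", "network_id": "net-2"})  # foreign port, foreign network

visible = [p for p in ports if can_see_port(db, p, "tenant-a")]
print(len(visible), db.fetches)
```

In this sketch tenant-a sees 2001 ports at the cost of 2 fetches instead of 2002. It also shows why the patch helps "most cases" rather than all: a listing dominated by ports owned by other tenants would still pay one fetch per such port.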

Comment 9 Assaf Muller 2016-02-23 16:23:29 UTC
Joe - Package is available for testing. Did you want to re-run Rally?

Comment 10 Joe Talerico 2016-02-24 13:15:58 UTC
Assaf - We are getting an OSP 8 env up for this.

Comment 12 Toni Freger 2016-03-15 09:11:25 UTC
You can find my result here - http://file.tlv.redhat.com/~tfreger/rally_logs/output_1.html

I ran this task on a Controller+Compute setup that was installed via OSPd, with
openstack-neutron-7.0.1-10.el7ost.noarch

From my results it seems to be improved as well.

Comment 13 Toni Freger 2016-03-15 09:15:14 UTC
Assaf, in your opinion, what result should we get in order to verify this bug?

Comment 14 Assaf Muller 2016-03-17 11:58:47 UTC
Looking at your attached results vs. the attached results in comment 0, the linear coefficient has clearly reduced dramatically. We can always improve, but I personally consider this bug closed.
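
One way to put a number on the "linear coefficient" is a least-squares slope of list duration against port count. The sketch below uses synthetic values chosen only to exercise the function; they are not measurements from either Rally report, and linear_coefficient is a hypothetical helper name.

```python
def linear_coefficient(xs, ys):
    """Least-squares slope of ys against xs (seconds per port here)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic, illustrative data only: NOT taken from the attached results.
port_counts = [400, 800, 1200, 1600, 2000]
before = [0.5 + 0.025 * n for n in port_counts]
after = [0.5 + 0.001 * n for n in port_counts]

slope_before = linear_coefficient(port_counts, before)
slope_after = linear_coefficient(port_counts, after)
print(slope_before > slope_after)
```

Feeding the (port count, list duration) pairs from two Rally runs into the same function would give a direct before/after comparison of the slope.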

Comment 15 errata-xmlrpc 2016-04-07 21:17:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0603.html