Bug 1732076

Summary: Neutron List Security Groups API Regression
Product: Red Hat OpenStack
Reporter: Sai Sindhur Malleni <smalleni>
Component: openstack-neutron
Assignee: Slawek Kaplonski <skaplons>
Status: CLOSED CURRENTRELEASE
QA Contact: Eran Kuris <ekuris>
Severity: high
Docs Contact:
Priority: high
Version: 15.0 (Stein)
CC: amuller, chrisw, njohnston, scohen
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-07-29 10:47:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Sai Sindhur Malleni 2019-07-22 15:46:19 UTC
Description of problem: Comparing OSP 15 with OSP 13, both using the ML2/OVN backend, there seems to be a regression in the list security groups API call.

The Rally scenario used is create-list-security-group, which creates a new security group and then lists the security groups. Concurrency was set to 16 and time was set to 500.
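For anyone trying to reproduce this, the scenario above can be sketched as a Rally task file generated from Python. This is only a sketch under assumptions: the scenario name and the args/quotas/sla values are taken from the Rally output in Comment 2, "time was set to 500" is read here as the runner's `times` value, the `constant` runner type and the task-file layout are assumed, and `build_task` is a hypothetical helper, not part of Rally.

```python
import json

# Scenario name as printed in the Rally output in Comment 2 of this bug.
SCENARIO = "NeutronSecurityGroup.create_and_list_security_groups"


def build_task(times=500, concurrency=16):
    """Build a Rally task dict (hypothetical helper, assumed layout)."""
    workload = {
        "args": {"security_group_create_args": {}},
        # Assumption: constant runner, with "time 500" read as 500 iterations.
        "runner": {
            "type": "constant",
            "times": times,
            "concurrency": concurrency,
        },
        # Context values borrowed from Comment 2; the reporter's own
        # context settings are not given in this bug.
        "context": {
            "users": {"tenants": 3, "users_per_tenant": 3},
            "quotas": {"neutron": {"security_group": -1}},
        },
        "sla": {"failure_rate": {"max": 0}},
    }
    return {SCENARIO: [workload]}


if __name__ == "__main__":
    # Print the task as JSON so it can be saved to a file and fed to Rally.
    print(json.dumps(build_task(), indent=2))
```

The printed JSON could be saved to a file and passed to `rally task start`; the runner values may need adjusting if the original run used a duration-based runner instead.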

The 95th percentile API response time to list security groups was 16s on OSP 15, while it was 6s on OSP 13.
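To make the 95% figure concrete, here is a minimal sketch of how a percentile can be computed from a list of per-request response times. It assumes linear interpolation between neighbouring ranks; Rally's own calculation may differ in detail, and `percentile` is a hypothetical helper written for this illustration only.

```python
def percentile(samples, p):
    """Return the p-th percentile (0 <= p <= 1) of samples.

    Uses linear interpolation between the two closest ranks
    (an assumption, not necessarily what Rally does).
    """
    if not samples:
        raise ValueError("no samples given")
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    ordered = sorted(samples)
    # Fractional position of the requested percentile in the sorted list.
    k = (len(ordered) - 1) * p
    lo = int(k)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (k - lo)
```

With 100 response times in a list, `percentile(times, 0.95)` gives the value below which roughly 95% of the requests completed.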


Version-Release number of selected component (if applicable):
15

How reproducible:
100%

Steps to Reproduce:
1. deploy OSP15
2. Run create-list-security-group rally scenario
3.

Actual results:
There is a performance regression

Expected results:
Similar or better results for OSP 15

Additional info:
Not sure if this could be related to https://bugzilla.redhat.com/show_bug.cgi?id=1721273

Comment 1 Nate Johnston 2019-07-25 15:12:42 UTC
This looks like it might be the same issue as https://bugs.launchpad.net/bugs/1830679

We should make sure https://review.opendev.org/670075 is applied and see if that helps the response time.

Comment 2 Slawek Kaplonski 2019-07-29 10:47:34 UTC
I tried to reproduce this issue today and to check whether patch https://review.opendev.org/670075 helps to solve it.

The first problem was that I wasn't able to reproduce this issue. I checked it using:

()[root@controller-0 /]# rpm -qa | grep neutron
puppet-neutron-14.4.1-0.20190531220405.ff3610d.el8ost.noarch
python3-neutron-lib-1.25.0-0.20190521130309.fc2a810.el8ost.noarch
python3-neutron-14.0.3-0.20190704180411.9f4e596.el8ost.noarch
openstack-neutron-common-14.0.3-0.20190704180411.9f4e596.el8ost.noarch
python3-neutron-dynamic-routing-14.0.1-0.20190426180400.f313f0e.1.el8ost.noarch
openstack-neutron-lbaas-14.0.1-0.20190614170521.30bdd86.el8ost.noarch
openstack-neutron-ml2-14.0.3-0.20190704180411.9f4e596.el8ost.noarch
python3-neutronclient-6.12.0-0.20190312100012.680b417.el8ost.noarch
openstack-neutron-14.0.3-0.20190704180411.9f4e596.el8ost.noarch
python3-neutron-lbaas-14.0.1-0.20190614170521.30bdd86.el8ost.noarch

and container version:
192.168.24.1:8787/rhosp15/openstack-neutron-server:20190715.1

This version doesn't have patch https://review.opendev.org/670075, but the results were much better than those mentioned in the bug description:


test scenario NeutronSecurityGroup.create_and_list_security_groups
args position 0
args values:
{
  "args": {
    "security_group_create_args": {}
  },
  "runner": {
    "times": 100,
    "concurrency": 10
  },
  "contexts": {
    "users": {
      "tenants": 3,
      "users_per_tenant": 3
    },
    "quotas": {
      "neutron": {
        "security_group": -1
      }
    }
  },
  "sla": {
    "failure_rate": {
      "max": 0
    }
  },
  "hooks": []
}

--------------------------------------------------------------------------------
Task 138c2349-be2c-4a00-b831-8908f800b525 has 0 error(s)
--------------------------------------------------------------------------------

+----------------------------------------------------------------------------------------------------------------------------------+
|                                                       Response Times (sec)                                                       |
+-------------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+
| Action                        | Min (sec) | Median (sec) | 90%ile (sec) | 95%ile (sec) | Max (sec) | Avg (sec) | Success | Count |
+-------------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+
| neutron.create_security_group | 1.159     | 1.574        | 2.49         | 2.828        | 3.247     | 1.775     | 100.0%  | 100   |
| neutron.list_security_groups  | 0.296     | 1.809        | 2.801        | 3.144        | 4.374     | 1.925     | 100.0%  | 100   |
| total                         | 2.206     | 3.552        | 4.764        | 5.039        | 5.759     | 3.699     | 100.0%  | 100   |
|  -> duration                  | 2.206     | 3.552        | 4.764        | 5.039        | 5.759     | 3.699     | 100.0%  | 100   |
|  -> idle_duration             | 0.0       | 0.0          | 0.0          | 0.0          | 0.0       | 0.0       | 100.0%  | 100   |
+-------------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+

Load duration: 38.97572
Full duration: 121.609626


=====================

So as the next step I manually applied patch https://review.opendev.org/670075 in the neutron_api containers on all 3 controller nodes of my deployment and ran the same rally scenario again. The results were as below:

test scenario NeutronSecurityGroup.create_and_list_security_groups
args position 0
args values:
{
  "args": {
    "security_group_create_args": {}
  },
  "runner": {
    "times": 100,
    "concurrency": 10
  },
  "contexts": {
    "users": {
      "tenants": 3,
      "users_per_tenant": 3
    },
    "quotas": {
      "neutron": {
        "security_group": -1
      }
    }
  },
  "sla": {
    "failure_rate": {
      "max": 0
    }
  },
  "hooks": []
}

--------------------------------------------------------------------------------
Task 56089a6e-fe99-407f-a0fb-e85077b7372f has 0 error(s)
--------------------------------------------------------------------------------

+----------------------------------------------------------------------------------------------------------------------------------+
|                                                       Response Times (sec)                                                       |
+-------------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+
| Action                        | Min (sec) | Median (sec) | 90%ile (sec) | 95%ile (sec) | Max (sec) | Avg (sec) | Success | Count |
+-------------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+
| neutron.create_security_group | 1.145     | 1.564        | 2.528        | 2.937        | 3.484     | 1.69      | 100.0%  | 100   |
| neutron.list_security_groups  | 0.141     | 0.722        | 1.159        | 1.462        | 1.856     | 0.734     | 100.0%  | 100   |
| total                         | 1.566     | 2.227        | 3.466        | 3.723        | 5.322     | 2.424     | 100.0%  | 100   |
|  -> duration                  | 1.566     | 2.227        | 3.466        | 3.723        | 5.322     | 2.424     | 100.0%  | 100   |
|  -> idle_duration             | 0.0       | 0.0          | 0.0          | 0.0          | 0.0       | 0.0       | 100.0%  | 100   |
+-------------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+

Load duration: 25.90234
Full duration: 105.568619

So as you can see, there is a significant improvement when this patch is applied. It is already in the rhos-15.0-trunk-patches branch, so it should be included in the next OSP-15 puddle IMO: https://code.engineering.redhat.com/gerrit/gitweb?p=neutron.git;a=log;h=refs/heads/rhos-15.0-trunk-patches
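To put a number on the improvement, the neutron.list_security_groups rows of the two Rally tables above can be compared directly. The sketch below only copies the values from those tables; `reduction_pct` is a hypothetical helper, and a single run per configuration is of course a small sample. By this calculation the median list time drops by roughly 60% with the patch applied.

```python
# neutron.list_security_groups timings in seconds, copied from the
# two Rally tables above (without and with the patch applied).
BEFORE = {"median": 1.809, "p90": 2.801, "p95": 3.144, "max": 4.374, "avg": 1.925}
AFTER = {"median": 0.722, "p90": 1.159, "p95": 1.462, "max": 1.856, "avg": 0.734}


def reduction_pct(before, after):
    """Return the percentage reduction for each metric (hypothetical helper)."""
    return {
        name: round(100.0 * (before[name] - after[name]) / before[name], 1)
        for name in before
    }


if __name__ == "__main__":
    for name, pct in reduction_pct(BEFORE, AFTER).items():
        print(f"{name}: {pct}% lower with the patch applied")
```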

I will close this BZ as CURRENTRELEASE for now, but feel free to reopen it if you still have the same issue using the newest OSP-15.
In that case please also provide details about the environment on which you had this issue; maybe it would also be possible to get access to your env to debug what is going on there and why it happened.