Bug 2219693 - [OSP 16.2][neutron] slow api reply for tenant ports listing is causing instance creation to fail
Summary: [OSP 16.2][neutron] slow api reply for tenant ports listing is causing instan...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 16.2 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Rodolfo Alonso
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-07-04 19:52 UTC by Flavio Piccioni
Modified: 2023-08-28 13:57 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-23 09:37:44 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-26311 0 None None None 2023-07-04 19:53:50 UTC

Internal Links: 2221701

Description Flavio Piccioni 2023-07-04 19:52:47 UTC
Description of problem:
VM creation is failing. The problem seems related to nova timing out while trying to retrieve the tenant's ports:

2023-06-14 10:39:12.837 30 ERROR nova.api.openstack.wsgi keystoneauth1.exceptions.connection.ConnectTimeout: Request to http://$ip:9696/v2.0/ports?tenant_id=$uuid&fields=id timed out

Running "openstack port list --tenant $tenant" takes 1 minute and 50 seconds for roughly 900 ports.
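
The request in the log and the timing observation can be reproduced outside nova with a small script. The following is a minimal sketch, not the code nova or the customer used: the endpoint "http://neutron.example:9696", the tenant value and the 60-second timeout are hypothetical placeholders, and the helper names are made up for illustration.

```python
import time
from urllib.parse import urlencode


def build_ports_url(endpoint, tenant_id, fields=("id",)):
    """Build a port-list URL of the same shape as the one in the nova log."""
    query = urlencode([("tenant_id", tenant_id)] + [("fields", f) for f in fields])
    return f"{endpoint}/v2.0/ports?{query}"


def timed_call(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.monotonic()
    result = fn(*args, **kwargs)
    return result, time.monotonic() - start


def exceeds_timeout(elapsed_seconds, timeout_seconds):
    """True when a call took longer than the client is willing to wait."""
    return elapsed_seconds > timeout_seconds


# Hypothetical endpoint and tenant; substitute real values when testing.
url = build_ports_url("http://neutron.example:9696", "TENANT_UUID")

# 110 s is the reported duration; 60 s is an assumed client timeout.
would_time_out = exceeds_timeout(110, 60)
```

Wrapping the actual HTTP call in timed_call gives a number that can be compared directly with the timeout configured on the client side.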

Version-Release number of selected component (if applicable):
RHOSP 16.1.8 with opflex and Cisco's plugins
RHOSP 16.2.5 with opflex and Cisco's plugins


How reproducible:

Steps to Reproduce:
1. Find (or create) a tenant with a lot of ports from different subnets (>900 in this case).
2. Try to create a VM (openstack server create --flavor xxx --image xxx --network test-net test-vm).

Actual results:
Listing the tenant's ports takes about 2 minutes, causing nova's instance creation workflow to time out.

Expected results:
Listing a tenant's ports should be more efficient (see, for example, https://bugs.launchpad.net/neutron/+bug/2016704
or https://review.opendev.org/c/openstack/neutron/+/790691).

Comment 2 Ihar Hrachyshka 2023-07-05 14:04:30 UTC
To confirm, there's nothing specific about the ports in question used for the scenario, right? E.g. these are not trunk ports, or ports with a large number of additional SGs / address pairs... Anything particular about the ports in question? Just want to make sure we don't miss something specific about the ports.

===

Another question I have for the scenario is - why does nova list all (?) ports as part of a VM boot? Shouldn't it list just the ports that belong to the VM? Perhaps there's some nova backport missing to optimize it from compute side. Perhaps something to check on with Compute team.
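
To illustrate why this question matters, the sketch below compares the result size of a tenant-wide filter with a filter on a single instance. It is only an in-memory model with synthetic data: the 900 port records, the "tenant-a" and "vm-42" values and the filter_ports helper are hypothetical, and it does not show what nova actually does. "tenant_id" and "device_id" are port attributes in the Neutron API.

```python
def filter_ports(ports, **filters):
    """Return the ports whose attributes match every given filter."""
    return [p for p in ports if all(p.get(k) == v for k, v in filters.items())]


# Synthetic data: one tenant with 900 ports, one port per instance.
ports = [
    {"id": f"port-{i}", "tenant_id": "tenant-a", "device_id": f"vm-{i}"}
    for i in range(900)
]

# Tenant-wide listing returns every port of the tenant.
by_tenant = filter_ports(ports, tenant_id="tenant-a")

# Listing by device returns only the ports bound to one instance.
by_device = filter_ports(ports, device_id="vm-42")

print(len(by_tenant), len(by_device))
```

With this synthetic data set the tenant-wide filter returns all 900 records while the per-device filter returns one, which is the gap the question points at.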

Comment 3 Flavio Piccioni 2023-07-05 19:03:00 UTC
(In reply to Ihar Hrachyshka from comment #2)
> To confirm, there's nothing specific about the ports in question used for
> the scenario, right? E.g. these are not trunk ports, or ports with a large
> number of additional SGs / address pairs... Anything particular about the
> ports in question? Just want to make sure we don't miss something specific
> about the ports.
> 

From the customer's reply: "all these VMs have single connection and no trunks. We have very simple VMs. Regarding security groups, we do not have more than 2-3."


> 
> Another question I have for the scenario is - why does nova list all (?)
> ports as part of a VM boot? Shouldn't it list just the ports that belong to
> the VM? Perhaps there's some nova backport missing to optimize it from
> compute side. Perhaps something to check on with Compute team.

From a very quick query to DFG:compute: there is probably a historical reason, perhaps quotas, as enforcing port quotas

Comment 4 Rodolfo Alonso 2023-07-10 15:47:18 UTC
Hello Flavio:

Some questions about this issue:
1) To which subnets/networks do these 900 ports belong? Do these networks belong to other projects?
2) Does "tenant" (from the "port list" command) have admin powers, or is it a regular user?
3) How many RBAC rules are there in the project? To which project do these RBAC entries belong? *NOTE* I'm not asking about the target project in the RBAC rule but about the project from which these RBAC entries were created. In particular, I'm wondering whether these rules belong to the same project as "tenant". The output of "select * from networkrbacs" will help.
4) Did the customer enable the "slow_query_log" in the database engine? If so, the output will be very helpful. It can be enabled or disabled at runtime with [1].

Regards.

[1]https://access.redhat.com/solutions/321003
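
Question 3 can be answered from the "select * from networkrbacs" output with a simple tally of which project created each entry. The following is a minimal sketch under stated assumptions: rows are exported as dicts, the creating project is in a "project_id" column and the target in a "target_tenant" column (column names assumed, check them against the actual schema), and the sample rows are hypothetical.

```python
from collections import Counter


def rbac_owner_summary(rows, tenant_id):
    """Count RBAC entries by the project that created them."""
    owners = Counter(row["project_id"] for row in rows)
    owned = owners.get(tenant_id, 0)
    return {
        "total": len(rows),
        "owned_by_tenant": owned,
        "owned_by_others": len(rows) - owned,
        "per_owner": dict(owners),
    }


# Hypothetical rows standing in for the networkrbacs table contents.
sample_rows = [
    {"project_id": "tenant-a", "target_tenant": "tenant-b", "action": "access_as_shared"},
    {"project_id": "tenant-a", "target_tenant": "tenant-c", "action": "access_as_shared"},
    {"project_id": "tenant-x", "target_tenant": "tenant-a", "action": "access_as_shared"},
]

summary = rbac_owner_summary(sample_rows, "tenant-a")
print(summary)
```

The "owned_by_tenant" and "owned_by_others" counts separate entries created by the listing project from entries created elsewhere, which is the distinction the note in question 3 asks for.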

Comment 9 Ihar Hrachyshka 2023-07-11 19:30:50 UTC
Reported https://bugzilla.redhat.com/show_bug.cgi?id=2222102 to track nova optimization to avoid fetching all ports on VM creation.

Comment 12 Ihar Hrachyshka 2023-07-12 15:52:09 UTC
Rodolfo, I don't think get_ports calls into ML2 drivers, does it? I believe I checked it in the code when I was looking at the 16.1 counterpart of this bz, and I couldn't find where an ML2 driver could be called by the ML2 plugin. Perhaps I'm missing something?

Comment 15 Rodolfo Alonso 2023-07-13 08:31:40 UTC
Hello Ihar:

I can't confirm that for the 'apic_aim' mechanism driver. With ML2/OVS or ML2/OVN, testing in a similar environment (RBACs, users, networks, ports), the Neutron API response time is around 5-6 seconds. Because we don't provide support for this driver, and I don't know which extensions this mechanism driver is loading (these could affect the "_make_port_dict" method, which is most probably what is happening), I can neither triage nor analyse this issue.

Regards.
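
For a rough sense of scale, the two timings in this report can be put on a per-port basis. This is only back-of-the-envelope arithmetic: it assumes the 5-6 second reference run also listed about 900 ports (the comment says "a similar environment" without giving a count) and uses 5.5 seconds as the midpoint.

```python
def per_port_seconds(total_seconds, port_count):
    """Average time spent per port in a list call."""
    return total_seconds / port_count


# Reported case: 1 min 50 s for about 900 ports.
reported = per_port_seconds(110, 900)

# Reference case: midpoint of 5-6 s; the port count of 900 is an assumption.
reference = per_port_seconds(5.5, 900)

# Ratio of the two totals, independent of the assumed port count.
slowdown = 110 / 5.5

print(f"reported:  {reported * 1000:.1f} ms per port")
print(f"reference: {reference * 1000:.1f} ms per port")
print(f"slowdown:  {slowdown:.0f}x")
```

Under these assumptions the reported deployment spends on the order of a hundred milliseconds per port against a few milliseconds in the reference setup, which is consistent with extra per-port work being done somewhere in the listing path.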

