Description of problem:

OCP installation on OSP takes much longer with the OVN Neutron backend than with the ML2/OVS backend. The slowdown appears to be in the time it now takes to add Neutron subports to VM trunks, which an OCP-on-OSP installation does frequently, since pods are created in each new OCP project. Adding 5 subports to a trunk (creating a pod in a new project) takes:
- 20 seconds with ML2/OVS
- more than 2 minutes with ML2/OVN

Version-Release number of selected component (if applicable):

OSP 13 2019-03-18.1 puddle
python-networking-ovn-4.0.3-3.el7ost.noarch
python-networking-ovn-metadata-agent-4.0.3-3.el7ost.noarch
openvswitch-ovn-central-2.9.0-97.el7fdp.x86_64
openvswitch-ovn-common-2.9.0-97.el7fdp.x86_64
puppet-ovn-12.4.0-2.el7ost.noarch
openvswitch-ovn-host-2.9.0-97.el7fdp.x86_64

How reproducible: 100%

Steps to Reproduce:
1. Deploy OSP with Octavia and the OVN Neutron backend (no DVR).
2. Run the OCP 3.11 playbooks with Kuryr.
3. Once OCP is deployed, ssh into a master node.
4. Create a new project:
   [openshift@master-0 ~]$ oc new-project hello-openshift
5. Obtain the IP range assigned to the network namespace for the project:
   [openshift@master-0 ~]$ oc get kuryrnets
   NAME                 AGE
   ns-default           19h
   ns-hello-openshift   43s
   (...)
   [openshift@master-0 ~]$ oc get kuryrnets ns-hello-openshift -o yaml
   (...)
   spec:
     netId: 72e3b67f-76c5-41e5-817f-e3a3fac2941c
     routerId: 5396d97c-f8d0-4bec-a096-0025afdb75dc
     sgId: 590cd3e3-571c-4126-a707-b50d3a64803b
     subnetCIDR: 10.11.15.0/24   <<<<-----------------
     subnetId: b522840e-c1d2-4565-85be-41fdb9fae0cf
6. Monitor the port status in that IP range and measure how long it takes for all of the ports to reach ACTIVE after the pod creation is launched:
   [stack@undercloud-0 ~]$ watch "openstack port list | grep 10.11.15"
7. Deploy the hello-openshift pod:
   [openshift@master-0 ~]$ oc create -f https://raw.githubusercontent.com/openshift/origin/master/examples/hello-openshift/hello-pod.json

Actual results:

The 5 ports created by Kuryr take more than 2 minutes to be added to the VM trunk (and go to ACTIVE status). They transition to ACTIVE one by one, not as a single operation, even though they are requested in a single call [1].

(shiftstack) [stack@undercloud-0 ~]$ openstack port list | grep 10.11.15
| 30d8ae16-04bf-48c1-91ad-b7d4b6f02712 | fa:16:3e:1a:99:08 | ip_address='10.11.15.7', subnet_id='b522840e-c1d2-4565-85be-41fdb9fae0cf' | DOWN |
| 3e4ff6a6-735c-4e3c-8f41-61af0f342fe4 | fa:16:3e:c4:7f:59 | ip_address='10.11.15.17', subnet_id='b522840e-c1d2-4565-85be-41fdb9fae0cf' | DOWN |
| 3ec4a554-1812-4e26-9321-c47c67271ad1 | fa:16:3e:6c:5c:38 | ip_address='10.11.15.12', subnet_id='b522840e-c1d2-4565-85be-41fdb9fae0cf' | DOWN |
| 4ec67995-7fa9-43d5-b606-34f15e9c174f | fa:16:3e:8e:3e:12 | ip_address='10.11.15.5', subnet_id='b522840e-c1d2-4565-85be-41fdb9fae0cf' | ACTIVE |
| 943270b4-6163-4306-bed7-1266249d1dbb | fa:16:3e:3c:6d:d1 | ip_address='10.11.15.14', subnet_id='b522840e-c1d2-4565-85be-41fdb9fae0cf' | DOWN |
| c07b39f9-6973-48d7-aa08-ece9745c0e9e | fa:16:3e:7d:33:00 | ip_address='10.11.15.1', subnet_id='b522840e-c1d2-4565-85be-41fdb9fae0cf' | ACTIVE |

Expected results:

With the ML2/OVS backend it takes 20 seconds, and the ports transition to ACTIVE at the same time, not one by one.

Additional info:

[1] https://github.com/openstack/kuryr-kubernetes/blob/master/kuryr_kubernetes/controller/drivers/nested_vlan_vif.py#L95-L96
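For anyone re-measuring step 6 programmatically, a small polling sketch equivalent to the watch command, using openstacksdk; the cloud name ('shiftstack') and the 10.11.15. prefix come from this reproduction and would need adjusting for another environment:

import time
import openstack

conn = openstack.connect(cloud='shiftstack')
PREFIX = '10.11.15.'

start = time.time()
while True:
    # Mirror `openstack port list | grep 10.11.15`: keep only ports with
    # a fixed IP in the pod subnet, then count how many are ACTIVE.
    ports = [p for p in conn.network.ports()
             if any(ip['ip_address'].startswith(PREFIX) for ip in p.fixed_ips)]
    active = sum(1 for p in ports if p.status == 'ACTIVE')
    print('%d/%d ports ACTIVE after %.1fs' % (active, len(ports), time.time() - start))
    if ports and active == len(ports):
        break
    time.sleep(5)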
I see a bunch of these messages (the same DEBUG line repeated every couple of milliseconds, eleven times in this excerpt alone):

2019-03-21 12:01:04.077 35 DEBUG neutron.pecan_wsgi.hooks.policy_enforcement [req-ad4b7334-17d5-4bde-90c1-fb5d141af326 c13e0cc8325b47d99780b07e32c1c29e 240611d0d1c84092a8105e7e9f7ec85c - default default] Attributes excluded by policy engine: [u'binding:profile', u'binding:vif_details', u'binding:vif_type', u'binding:host_id'] _exclude_attributes_by_policy /usr/lib/python2.7/site-packages/neutron/pecan_wsgi/hooks/policy_enforcement.py:256
2019-03-21 12:01:04.079 35 DEBUG neutron.pecan_wsgi.hooks.policy_enforcement [req-ad4b7334-17d5-4bde-90c1-fb5d141af326 c13e0cc8325b47d99780b07e32c1c29e 240611d0d1c84092a8105e7e9f7ec85c - default default] Attributes excluded by policy engine: [u'binding:profile', u'binding:vif_details', u'binding:vif_type', u'binding:host_id'] _exclude_attributes_by_policy /usr/lib/python2.7/site-packages/neutron/pecan_wsgi/hooks/policy_enforcement.py:256
(... nine more identical entries, timestamps 12:01:04.081 through 12:01:04.096 ...)

Which I believe is the reason for the slowness. The binding details come from this: https://github.com/openstack/networking-ovn/blob/6.0.0.0b1/networking_ovn/ml2/trunk_driver.py#L53
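To make the connection concrete, a paraphrased sketch (not the verbatim driver code) of the per-subport update pattern at that link; the 'parent_name'/'tag' keys follow the OVN binding-profile convention, but treat the exact shape as an assumption:

def set_sub_ports(plugin, context, parent_port_id, subports):
    # One full Neutron port update per subport: each update passes through
    # the pecan policy hooks, which is what produces one "Attributes
    # excluded by policy engine" DEBUG line per request, as shown above.
    for subport in subports:
        binding_profile = {
            'parent_name': parent_port_id,      # parent trunk port for OVN
            'tag': subport['segmentation_id'],  # VLAN tag on the trunk
        }
        plugin.update_port(
            context, subport['port_id'],
            {'port': {'binding:profile': binding_profile}})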
I was able to reproduce this issue on devstack with ML2/OVN with steps like the following (step 5 expressed with openstacksdk is sketched after the list):

1. Create the trunk parent port:
   openstack port create --network private trunk-port

2. Create the trunk:
   openstack network trunk create --parent-port a103ed20-ea9c-46b6-a89d-4441b6d4fb72 trunk

3. Create the trunked net:
   neutron net-create trunked-net
   neutron subnet-create trunked-net 192.168.100.0/24

4. Create the subports:
   openstack port create --network trunked-net subport1
   openstack port create --network trunked-net subport2
   openstack port create --network trunked-net subport3
   openstack port create --network trunked-net subport4
   openstack port create --network trunked-net subport5

5. Add the subports to the trunk (a single API call):
   openstack network trunk set --subport port=b41659ba-2444-4a34-96b3-9bcd7ec89296,segmentation-type=vlan,segmentation-id=101 --subport port=a70f1dc9-8fe7-481e-9608-d3d51cdc1eac,segmentation-type=vlan,segmentation-id=102 --subport port=2b7dc21f-18ee-42ea-854c-01e4654b7f0c,segmentation-type=vlan,segmentation-id=103 --subport port=8d1e3151-992c-48d4-85ec-618449189423,segmentation-type=vlan,segmentation-id=104 --subport port=91d4d286-81d8-4101-a3c2-359e0c800a7e,segmentation-type=vlan,segmentation-id=105 9da4ef0f-72bf-4031-8c4f-5d3b739b8e5b

6. Spawn a VM connected to this trunk's parent port:
   nova boot --image cirros-0.3.4-i386-disk --flavor m1.micro --nic port-id=a103ed20-ea9c-46b6-a89d-4441b6d4fb72 vm-trunks

After that, watching the status of the subports, it took a long time for all of them to switch to ACTIVE. When I did the same steps on devstack with ML2/OVS it was much faster.
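As promised above, step 5 expressed with openstacksdk, a sketch that makes explicit that all five subports go to Neutron in one request (a single PUT to /v2.0/trunks/<trunk_id>/add_subports), mirroring what Kuryr does in [1]; the cloud name 'devstack' is an assumption:

import openstack

conn = openstack.connect(cloud='devstack')

# Port and trunk IDs from the steps above.
subport_ids = [
    'b41659ba-2444-4a34-96b3-9bcd7ec89296',
    'a70f1dc9-8fe7-481e-9608-d3d51cdc1eac',
    '2b7dc21f-18ee-42ea-854c-01e4654b7f0c',
    '8d1e3151-992c-48d4-85ec-618449189423',
    '91d4d286-81d8-4101-a3c2-359e0c800a7e',
]
subports = [{'port_id': pid,
             'segmentation_type': 'vlan',
             'segmentation_id': 101 + i}
            for i, pid in enumerate(subport_ids)]

# One call adds all five subports; with ML2/OVN they still turn
# ACTIVE one by one on the server side.
conn.network.add_trunk_subports('9da4ef0f-72bf-4031-8c4f-5d3b739b8e5b', subports)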
Some update on what I did and what I found today.

I used a simple script to reproduce the issue: http://pastebin.test.redhat.com/742759

With this script I got results like:

$ bash test_trunks.sh
+------------+------------------------------------------------+
| Property   | Value                                          |
+------------+------------------------------------------------+
| ip_address | 10.0.0.31,fdd5:442b:9728:0:f816:3eff:fef8:f8ac |
| mac_addr   | fa:16:3e:f8:f8:ac                              |
| net_id     | 120fe645-30f8-478a-9446-b357550acb3a           |
| port_id    | adf3ee81-2ab1-478f-abfd-6024991af9b0           |
| port_state | DOWN                                           |
| tag        | -                                              |
+------------+------------------------------------------------+
Number of active ports: 0
....
Number of active ports: 0
....
Number of active ports: 1
....
Number of active ports: 1
....
Number of active ports: 2
....
Number of active ports: 2
....
Number of active ports: 2
....
Number of active ports: 3
....
Number of active ports: 3
....
Number of active ports: 4
....
Number of active ports: 4
....
Number of active ports: 4
....
Number of active ports: 4
....
Number of active ports: 5
....
Transition of ports to ACTIVE took 99.169112739 seconds

So it took almost 100 seconds for all subports to turn ACTIVE after the trunk port was connected to the instance.

While debugging this issue I found that it takes about 20 seconds per subport in set_port_status_up:

Mar 22 16:44:13 devstack-ovn-ubuntu-controller neutron-server[12800]: INFO networking_ovn.ml2.mech_driver [None req-93d6d785-05ee-42a4-b909-206eb9bfd983 None None] OVN reports status up for port: 824dd0b0-5e31-4b44-9718-b4e1c8dc9af5 (id of subport-3)
Mar 22 16:44:37 devstack-ovn-ubuntu-controller neutron-server[12800]: INFO networking_ovn.ml2.mech_driver [None req-93d6d785-05ee-42a4-b909-206eb9bfd983 None None] OVN reports status up for port: f0ebaacd-22ce-454c-a8ac-8dc1dc408f62 (id of subport-5)
Mar 22 16:44:53 devstack-ovn-ubuntu-controller neutron-server[12800]: INFO networking_ovn.ml2.mech_driver [None req-93d6d785-05ee-42a4-b909-206eb9bfd983 None None] OVN reports status up for port: 07c7396b-5c64-4ada-a992-fd6fca1d130a (id of subport-4)
Mar 22 16:45:10 devstack-ovn-ubuntu-controller neutron-server[12800]: INFO networking_ovn.ml2.mech_driver [None req-93d6d785-05ee-42a4-b909-206eb9bfd983 None None] OVN reports status up for port: e5406129-114a-414b-b923-869864e45fd9 (id of subport-2)
Mar 22 16:45:26 devstack-ovn-ubuntu-controller neutron-server[12800]: INFO networking_ovn.ml2.mech_driver [None req-93d6d785-05ee-42a4-b909-206eb9bfd983 None None] OVN reports status up for port: adf3ee81-2ab1-478f-abfd-6024991af9b0 (id of trunk port)
Mar 22 16:45:31 devstack-ovn-ubuntu-controller neutron-server[12800]: INFO networking_ovn.ml2.mech_driver [None req-93d6d785-05ee-42a4-b909-206eb9bfd983 None None] OVN reports status up for port: e80b876f-ee81-484a-b645-4ea616bc4ef9 (id of subport-1)

Next, I wanted to find out where we are losing time in this set_port_status_up method, so I added some "dummy" logging (a patch like http://pastebin.test.redhat.com/742768) to check where it spends most of the time. It looks like most of the time is lost in the _update_subport_host_if_needed() method, and inside that method the vast majority of the time is spent in self._plugin.update_port(): overall around 10 seconds there for each subport. The second bottleneck is the provisioning_blocks.provisioning_complete() method, at around 5 seconds. Both of those methods come not from networking-ovn but from core Neutron code. For now I don't know how to improve this.
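Both pastebin links above are internal, so for context: the "dummy" logging amounts to timing instrumentation along these lines (an illustrative sketch, not the actual patch), wrapped around the suspected calls such as _update_subport_host_if_needed() and provisioning_blocks.provisioning_complete():

import functools
import logging
import time

LOG = logging.getLogger(__name__)

def timed(func):
    # Log how long each wrapped call takes, to locate the bottleneck.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        try:
            return func(*args, **kwargs)
        finally:
            LOG.info('%s took %.2fs', func.__name__, time.time() - start)
    return wrapper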
The first thing that comes to my mind is to call set_port_status_up in parallel, but I don't know the networking-ovn mech driver well, so I have no idea what other problems that might cause, or why it isn't already done that way. A rough sketch of the idea is below.
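For illustration only, something like the following, using eventlet (which Neutron already runs under); `driver` stands for the networking-ovn mechanism driver instance and the helper name is hypothetical:

import eventlet
eventlet.monkey_patch()

def set_ports_status_up(driver, port_ids):
    # Fan the per-port work out to greenthreads instead of serializing it.
    # Whether concurrent set_port_status_up calls are safe with respect to
    # DB sessions and provisioning blocks is exactly the open question.
    pool = eventlet.GreenPool(size=10)
    for port_id in port_ids:
        pool.spawn_n(driver.set_port_status_up, port_id)
    pool.waitall()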
Lucas is working on this BZ and on a general optimization mechanism in the context of this design document: https://docs.openstack.org/networking-ovn/latest/contributor/design/distributed_ovsdb_events.html
Verified on python-networking-ovn-4.0.3-10.el7ost.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2629