Bug 1731214

Summary: After creating a bunch of VMs we get to a point where no VMs can be created due to VirtualInterfaceCreateException: Virtual Interface creation failed
Product: Red Hat OpenStack
Reporter: David Hill <dhill>
Component: openstack-neutron
Assignee: Lucas Alvares Gomes <lmartins>
Status: CLOSED DUPLICATE
QA Contact: Eran Kuris <ekuris>
Severity: urgent
Priority: urgent
Version: 13.0 (Queens)
CC: amuller, chrisw, lmartins, njohnston, pveiga, schari, scohen
Target Milestone: ---
Keywords: Reopened, Scale, TestBlocker
Target Release: ---
Hardware: x86_64
OS: Linux
Last Closed: 2022-06-30 10:46:08 UTC
Type: Bug

Description David Hill 2019-07-18 16:10:49 UTC
Description of problem:
After creating a bunch of VMs, we reach a point where no more VMs can be created: nova-compute fails with VirtualInterfaceCreateException: Virtual Interface creation failed. In the neutron logs, we see the following errors:


2019-07-18 15:44:50.277 32 WARNING neutron.scheduler.dhcp_agent_scheduler [req-0ab42e81-31cc-4b45-ab28-20024b00377c 70bfd15a9f9a3f32a589a3cfcbcb78886d3be060e961dd38048eef87870202a9 01b728e67a5a4d88b655e5b3bc232450 - 0ef23cf14b4a4cecade7d9f1cee1862b 0ef23cf14b4a4cecade7d9f1cee1862b] No more DHCP agents: NeutronDbObjectDuplicateEntry: Failed to create a duplicate IpamAllocation: for attribute(s) ['PRIMARY'] with value(s) 10.4.0.235-b93a20e4-db6b-4b67-82ec-52ccb3d1eb89
2019-07-18 15:44:50.279 32 WARNING neutron.api.rpc.agentnotifiers.dhcp_rpc_agent_api [req-0ab42e81-31cc-4b45-ab28-20024b00377c 70bfd15a9f9a3f32a589a3cfcbcb78886d3be060e961dd38048eef87870202a9 01b728e67a5a4d88b655e5b3bc232450 - 0ef23cf14b4a4cecade7d9f1cee1862b 0ef23cf14b4a4cecade7d9f1cee1862b] Unable to schedule network c3810a7c-f646-444a-a21e-4073d105c48d: no agents available; will retry on subsequent port and subnet creation events.: NeutronDbObjectDuplicateEntry: Failed to create a duplicate IpamAllocation: for attribute(s) ['PRIMARY'] with value(s) 10.4.0.235-b93a20e4-db6b-4b67-82ec-52ccb3d1eb89
2019-07-18 15:44:50.470 35 WARNING neutron.scheduler.dhcp_agent_scheduler [req-c906ea96-31cb-470a-b1e5-eec33f70fcfb 70bfd15a9f9a3f32a589a3cfcbcb78886d3be060e961dd38048eef87870202a9 01b728e67a5a4d88b655e5b3bc232450 - 0ef23cf14b4a4cecade7d9f1cee1862b 0ef23cf14b4a4cecade7d9f1cee1862b] No more DHCP agents: NeutronDbObjectDuplicateEntry: Failed to create a duplicate IpamAllocation: for attribute(s) ['PRIMARY'] with value(s) 10.4.1.1-b93a20e4-db6b-4b67-82ec-52ccb3d1eb89
2019-07-18 15:44:50.471 35 WARNING neutron.api.rpc.agentnotifiers.dhcp_rpc_agent_api [req-c906ea96-31cb-470a-b1e5-eec33f70fcfb 70bfd15a9f9a3f32a589a3cfcbcb78886d3be060e961dd38048eef87870202a9 01b728e67a5a4d88b655e5b3bc232450 - 0ef23cf14b4a4cecade7d9f1cee1862b 0ef23cf14b4a4cecade7d9f1cee1862b] Unable to schedule network c3810a7c-f646-444a-a21e-4073d105c48d: no agents available; will retry on subsequent port and subnet creation events.: NeutronDbObjectDuplicateEntry: Failed to create a duplicate IpamAllocation: for attribute(s) ['PRIMARY'] with value(s) 10.4.1.1-b93a20e4-db6b-4b67-82ec-52ccb3d1eb89
2019-07-18 15:44:50.671 36 ERROR neutron.plugins.ml2.managers [req-3938763b-d18c-4e47-8073-a4ea6ec8ba80 63abb58b0cfa46bca9f7c2e9772f7183 fdf586078d6f465695db34d793436147 - default default] Mechanism driver 'ovn' failed in update_port_postcommit: RowNotFound: Cannot find Logical_Switch_Port with name=9934adc9-eb68-4acd-9c63-20e9895bb842
2019-07-18 15:44:50.671 36 ERROR neutron.plugins.ml2.managers Traceback (most recent call last):
2019-07-18 15:44:50.671 36 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/managers.py", line 427, in _call_on_drivers
2019-07-18 15:44:50.671 36 ERROR neutron.plugins.ml2.managers     getattr(driver.obj, method_name)(context)
2019-07-18 15:44:50.671 36 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python2.7/site-packages/networking_ovn/ml2/mech_driver.py", line 531, in update_port_postcommit
2019-07-18 15:44:50.671 36 ERROR neutron.plugins.ml2.managers     self._ovn_client.update_port(port, port_object=original_port)
2019-07-18 15:44:50.671 36 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python2.7/site-packages/networking_ovn/common/ovn_client.py", line 440, in update_port
2019-07-18 15:44:50.671 36 ERROR neutron.plugins.ml2.managers     ovn_port = self._nb_idl.lookup('Logical_Switch_Port', port['id'])
2019-07-18 15:44:50.671 36 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/__init__.py", line 103, in lookup
2019-07-18 15:44:50.671 36 ERROR neutron.plugins.ml2.managers     return self._lookup(table, record)
2019-07-18 15:44:50.671 36 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/__init__.py", line 143, in _lookup
2019-07-18 15:44:50.671 36 ERROR neutron.plugins.ml2.managers     row = idlutils.row_by_value(self, rl.table, rl.column, record)
2019-07-18 15:44:50.671 36 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/idlutils.py", line 63, in row_by_value
2019-07-18 15:44:50.671 36 ERROR neutron.plugins.ml2.managers     raise RowNotFound(table=table, col=column, match=match)
2019-07-18 15:44:50.671 36 ERROR neutron.plugins.ml2.managers RowNotFound: Cannot find Logical_Switch_Port with name=9934adc9-eb68-4acd-9c63-20e9895bb842
2019-07-18 15:44:50.671 36 ERROR neutron.plugins.ml2.managers 
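
As a quick cross-check for the RowNotFound error (a sketch only; it assumes ovn-nbctl is available in the ovn-dbs bundle container shown below and that the NB database listens on the default port 6641), the port can be looked up both in Neutron and in the OVN northbound DB:

openstack port show 9934adc9-eb68-4acd-9c63-20e9895bb842
docker exec ovn-dbs-bundle-docker-0 ovn-nbctl --db=tcp:10.10.10.10:6641 list Logical_Switch_Port 9934adc9-eb68-4acd-9c63-20e9895bb842

If the port exists in Neutron but not in the NB DB, the two databases are out of sync for that port.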


Version-Release number of selected component (if applicable):
[root@overcloud-control-0 neutron]# docker ps | grep neutron
cc85a2702132        10.10.10.10:8787/rhosp13/openstack-neutron-server-ovn:13.0-76        "kolla_start"            3 days ago          Up 3 days (healthy)                         neutron_api
[root@overcloud-control-0 neutron]# docker ps | grep ovn
769390ad9d63        10.10.10.10:8787/rhosp13/openstack-ovn-controller:13.0-78            "kolla_start"            3 days ago          Up 3 days                                   ovn_controller
cc85a2702132        10.10.10.10:8787/rhosp13/openstack-neutron-server-ovn:13.0-76        "kolla_start"            3 days ago          Up 3 days (healthy)                         neutron_api
567fe6ecc578        10.10.10.10:8787/rhosp13/openstack-nova-novncproxy:13.0-85           "kolla_start"            3 days ago          Up 2 days (healthy)                         nova_vnc_proxy
0c5b579c00eb        10.10.10.10:8787/rhosp13/openstack-ovn-northd:pcmklatest             "/bin/bash /usr/lo..."   3 days ago          Up 3 days                                   ovn-dbs-bundle-docker-0


How reproducible:
Often

Steps to Reproduce:
1. Create a bunch of VMs until you hit this issue

Actual results:
Once this point is reached, every new VM fails during network allocation.

Expected results:
VM creation should continue to succeed; this failure should never occur.

Additional info:
If we delete some VMs and retry later, VM creation works again.

Comment 1 David Hill 2019-07-18 16:49:16 UTC
[root@overcloud-control-0 neutron]# tail -F server.log | grep ERROR
2019-07-18 16:46:28.378 28 ERROR ovsdbapp.backend.ovs_idl.vlog [-] tcp:10.10.10.10:6642: no response to inactivity probe after 5 seconds, disconnecting: Exception: Could not retrieve schema from tcp:10.10.10.10:6642: Connection refused
2019-07-18 16:46:30.810 40 ERROR ovsdbapp.backend.ovs_idl.vlog [-] tcp:10.10.10.10:6642: no response to inactivity probe after 5 seconds, disconnecting: Exception: Could not retrieve schema from tcp:10.10.10.10:6642: Connection refused
2019-07-18 16:46:43.816 40 ERROR ovsdbapp.backend.ovs_idl.vlog [-] tcp:10.10.10.10:6642: no response to inactivity probe after 5 seconds, disconnecting: Exception: Could not retrieve schema from tcp:10.10.10.10:6642: Connection refused
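
A quick way to check whether the southbound DB endpoint is actually reachable at that moment (a sketch only; it assumes ovn-sbctl is available inside the ovn-dbs bundle container listed above) is to query it directly:

docker exec ovn-dbs-bundle-docker-0 ovn-sbctl --db=tcp:10.10.10.10:6642 show

A "Connection refused" here would point at the OVN DB server (or its pacemaker bundle) rather than at neutron-server.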

Comment 3 David Hill 2019-07-18 17:12:08 UTC
2019-07-18 17:11:14.222 53 ERROR ovsdbapp.backend.ovs_idl.transaction [-] OVSDB Error: {"details":"Transaction causes multiple rows in \"Logical_Switch_Port\" table to have identical values (\"2e74e4b2-f0ab-493b-bcdc-aa54273374b1\") for index on column \"name\".  First row, with UUID f85f5e43-bada-41e8-b3e0-9ec6edc04689, was inserted by this transaction.  Second row, with UUID 65449685-0cad-45d3-9dcc-ad957615c606, existed in the database before this transaction and was not modified by the transaction.","error":"constraint violation"}
2019-07-18 17:11:14.223 53 ERROR ovsdbapp.backend.ovs_idl.transaction [req-e16e3e31-96df-484f-897e-db6221b13867 - - - - -] Traceback (most recent call last):
    raise RuntimeError(msg)
RuntimeError: OVSDB Error: {"details":"Transaction causes multiple rows in \"Logical_Switch_Port\" table to have identical values (\"2e74e4b2-f0ab-493b-bcdc-aa54273374b1\") for index on column \"name\".  First row, with UUID f85f5e43-bada-41e8-b3e0-9ec6edc04689, was inserted by this transaction.  Second row, with UUID 65449685-0cad-45d3-9dcc-ad957615c606, existed in the database before this transaction and was not modified by the transaction.","error":"constraint violation"}
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance [req-e16e3e31-96df-484f-897e-db6221b13867 - - - - -] Failed to fix resource 2e74e4b2-f0ab-493b-bcdc-aa54273374b1 (type: ports): RuntimeError: OVSDB Error: {"details":"Transaction causes multiple rows in \"Logical_Switch_Port\" table to have identical values (\"2e74e4b2-f0ab-493b-bcdc-aa54273374b1\") for index on column \"name\".  First row, with UUID f85f5e43-bada-41e8-b3e0-9ec6edc04689, was inserted by this transaction.  Second row, with UUID 65449685-0cad-45d3-9dcc-ad957615c606, existed in the database before this transaction and was not modified by the transaction.","error":"constraint violation"}
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance Traceback (most recent call last):
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance   File "/usr/lib/python2.7/site-packages/networking_ovn/common/maintenance.py", line 232, in check_for_inconsistencies
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance     self._fix_create_update(row)
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance   File "/usr/lib/python2.7/site-packages/networking_ovn/common/maintenance.py", line 160, in _fix_create_update
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance     res_map['ovn_create'](n_obj)
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance   File "/usr/lib/python2.7/site-packages/networking_ovn/common/ovn_client.py", line 350, in create_port
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance     self.add_txns_to_sync_port_dns_records(txn, port)
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance   File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance     self.gen.next()
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance   File "/usr/lib/python2.7/site-packages/networking_ovn/ovsdb/impl_idl_ovn.py", line 139, in transaction
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance     yield t
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance   File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance     self.gen.next()
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance   File "/usr/lib/python2.7/site-packages/ovsdbapp/api.py", line 102, in transaction
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance     del self._nested_txns_map[cur_thread_id]
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance   File "/usr/lib/python2.7/site-packages/ovsdbapp/api.py", line 59, in __exit__
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance     self.result = self.commit()
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance   File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 62, in commit
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance     raise result.ex
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance RuntimeError: OVSDB Error: {"details":"Transaction causes multiple rows in \"Logical_Switch_Port\" table to have identical values (\"2e74e4b2-f0ab-493b-bcdc-aa54273374b1\") for index on column \"name\".  First row, with UUID f85f5e43-bada-41e8-b3e0-9ec6edc04689, was inserted by this transaction.  Second row, with UUID 65449685-0cad-45d3-9dcc-ad957615c606, existed in the database before this transaction and was not modified by the transaction.","error":"constraint violation"}
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance
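
To inspect the pre-existing row behind the constraint violation (again a sketch; the NB DB port 6641 and running the tools from the ovn-dbs bundle container are assumptions based on the docker ps output above), the conflicting Logical_Switch_Port can be listed by name and compared with the Neutron port:

docker exec ovn-dbs-bundle-docker-0 ovn-nbctl --db=tcp:10.10.10.10:6641 list Logical_Switch_Port 2e74e4b2-f0ab-493b-bcdc-aa54273374b1
openstack port show 2e74e4b2-f0ab-493b-bcdc-aa54273374b1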

Comment 4 David Hill 2019-07-18 19:09:07 UTC
It looks like setting this on all the nodes (controller + computes) solved the issue:

ovs-vsctl set open . external_ids:ovn-remote-probe-interval=180000
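
For reference, ovn-remote-probe-interval is expressed in milliseconds, so 180000 corresponds to 3 minutes. The value in effect on a node can be checked with the same ovs-vsctl syntax:

ovs-vsctl get open . external_ids:ovn-remote-probe-interval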

Comment 5 David Hill 2019-07-18 19:12:17 UTC
This issue appears to be related to the following bug:

https://bugzilla.redhat.com/show_bug.cgi?id=1723463