Description of problem:

After creating a number of VMs we reach a point where no new VMs can be created: nova-compute fails with VirtualInterfaceCreateException: Virtual Interface creation failed. In the neutron logs we can see the following errors:

2019-07-18 15:44:50.277 32 WARNING neutron.scheduler.dhcp_agent_scheduler [req-0ab42e81-31cc-4b45-ab28-20024b00377c 70bfd15a9f9a3f32a589a3cfcbcb78886d3be060e961dd38048eef87870202a9 01b728e67a5a4d88b655e5b3bc232450 - 0ef23cf14b4a4cecade7d9f1cee1862b 0ef23cf14b4a4cecade7d9f1cee1862b] No more DHCP agents: NeutronDbObjectDuplicateEntry: Failed to create a duplicate IpamAllocation: for attribute(s) ['PRIMARY'] with value(s) 10.4.0.235-b93a20e4-db6b-4b67-82ec-52ccb3d1eb89
2019-07-18 15:44:50.279 32 WARNING neutron.api.rpc.agentnotifiers.dhcp_rpc_agent_api [req-0ab42e81-31cc-4b45-ab28-20024b00377c 70bfd15a9f9a3f32a589a3cfcbcb78886d3be060e961dd38048eef87870202a9 01b728e67a5a4d88b655e5b3bc232450 - 0ef23cf14b4a4cecade7d9f1cee1862b 0ef23cf14b4a4cecade7d9f1cee1862b] Unable to schedule network c3810a7c-f646-444a-a21e-4073d105c48d: no agents available; will retry on subsequent port and subnet creation events.: NeutronDbObjectDuplicateEntry: Failed to create a duplicate IpamAllocation: for attribute(s) ['PRIMARY'] with value(s) 10.4.0.235-b93a20e4-db6b-4b67-82ec-52ccb3d1eb89
2019-07-18 15:44:50.470 35 WARNING neutron.scheduler.dhcp_agent_scheduler [req-c906ea96-31cb-470a-b1e5-eec33f70fcfb 70bfd15a9f9a3f32a589a3cfcbcb78886d3be060e961dd38048eef87870202a9 01b728e67a5a4d88b655e5b3bc232450 - 0ef23cf14b4a4cecade7d9f1cee1862b 0ef23cf14b4a4cecade7d9f1cee1862b] No more DHCP agents: NeutronDbObjectDuplicateEntry: Failed to create a duplicate IpamAllocation: for attribute(s) ['PRIMARY'] with value(s) 10.4.1.1-b93a20e4-db6b-4b67-82ec-52ccb3d1eb89
2019-07-18 15:44:50.471 35 WARNING neutron.api.rpc.agentnotifiers.dhcp_rpc_agent_api [req-c906ea96-31cb-470a-b1e5-eec33f70fcfb 70bfd15a9f9a3f32a589a3cfcbcb78886d3be060e961dd38048eef87870202a9 01b728e67a5a4d88b655e5b3bc232450 - 0ef23cf14b4a4cecade7d9f1cee1862b 0ef23cf14b4a4cecade7d9f1cee1862b] Unable to schedule network c3810a7c-f646-444a-a21e-4073d105c48d: no agents available; will retry on subsequent port and subnet creation events.: NeutronDbObjectDuplicateEntry: Failed to create a duplicate IpamAllocation: for attribute(s) ['PRIMARY'] with value(s) 10.4.1.1-b93a20e4-db6b-4b67-82ec-52ccb3d1eb89
2019-07-18 15:44:50.671 36 ERROR neutron.plugins.ml2.managers [req-3938763b-d18c-4e47-8073-a4ea6ec8ba80 63abb58b0cfa46bca9f7c2e9772f7183 fdf586078d6f465695db34d793436147 - default default] Mechanism driver 'ovn' failed in update_port_postcommit: RowNotFound: Cannot find Logical_Switch_Port with name=9934adc9-eb68-4acd-9c63-20e9895bb842
2019-07-18 15:44:50.671 36 ERROR neutron.plugins.ml2.managers Traceback (most recent call last):
2019-07-18 15:44:50.671 36 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/managers.py", line 427, in _call_on_drivers
2019-07-18 15:44:50.671 36 ERROR neutron.plugins.ml2.managers     getattr(driver.obj, method_name)(context)
2019-07-18 15:44:50.671 36 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python2.7/site-packages/networking_ovn/ml2/mech_driver.py", line 531, in update_port_postcommit
2019-07-18 15:44:50.671 36 ERROR neutron.plugins.ml2.managers     self._ovn_client.update_port(port, port_object=original_port)
2019-07-18 15:44:50.671 36 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python2.7/site-packages/networking_ovn/common/ovn_client.py", line 440, in update_port
2019-07-18 15:44:50.671 36 ERROR neutron.plugins.ml2.managers     ovn_port = self._nb_idl.lookup('Logical_Switch_Port', port['id'])
2019-07-18 15:44:50.671 36 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/__init__.py", line 103, in lookup
2019-07-18 15:44:50.671 36 ERROR neutron.plugins.ml2.managers     return self._lookup(table, record)
2019-07-18 15:44:50.671 36 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/__init__.py", line 143, in _lookup
2019-07-18 15:44:50.671 36 ERROR neutron.plugins.ml2.managers     row = idlutils.row_by_value(self, rl.table, rl.column, record)
2019-07-18 15:44:50.671 36 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/idlutils.py", line 63, in row_by_value
2019-07-18 15:44:50.671 36 ERROR neutron.plugins.ml2.managers     raise RowNotFound(table=table, col=column, match=match)
2019-07-18 15:44:50.671 36 ERROR neutron.plugins.ml2.managers RowNotFound: Cannot find Logical_Switch_Port with name=9934adc9-eb68-4acd-9c63-20e9895bb842
2019-07-18 15:44:50.671 36 ERROR neutron.plugins.ml2.managers

Version-Release number of selected component (if applicable):

[root@overcloud-control-0 neutron]# docker ps | grep neutron
cc85a2702132  10.10.10.10:8787/rhosp13/openstack-neutron-server-ovn:13.0-76  "kolla_start"  3 days ago  Up 3 days (healthy)  neutron_api
[root@overcloud-control-0 neutron]# docker ps | grep ovn
769390ad9d63  10.10.10.10:8787/rhosp13/openstack-ovn-controller:13.0-78  "kolla_start"  3 days ago  Up 3 days  ovn_controller
cc85a2702132  10.10.10.10:8787/rhosp13/openstack-neutron-server-ovn:13.0-76  "kolla_start"  3 days ago  Up 3 days (healthy)  neutron_api
567fe6ecc578  10.10.10.10:8787/rhosp13/openstack-nova-novncproxy:13.0-85  "kolla_start"  3 days ago  Up 2 days (healthy)  nova_vnc_proxy
0c5b579c00eb  10.10.10.10:8787/rhosp13/openstack-ovn-northd:pcmklatest  "/bin/bash /usr/lo..."  3 days ago  Up 3 days  ovn-dbs-bundle-docker-0

How reproducible:
Often

Steps to Reproduce:
1. Create a number of VMs until the issue is hit

Actual results:
Once this point is reached, every new VM fails to allocate its network.

Expected results:
VM creation keeps working; network allocation should never fail this way.

Additional info:
If we delete some VMs and retry later on, VM creation works again.
[root@overcloud-control-0 neutron]# tail -F server.log | grep ERROR
2019-07-18 16:46:28.378 28 ERROR ovsdbapp.backend.ovs_idl.vlog [-] tcp:10.10.10.10:6642: no response to inactivity probe after 5 seconds, disconnecting: Exception: Could not retrieve schema from tcp:10.10.10.10:6642: Connection refused
2019-07-18 16:46:30.810 40 ERROR ovsdbapp.backend.ovs_idl.vlog [-] tcp:10.10.10.10:6642: no response to inactivity probe after 5 seconds, disconnecting: Exception: Could not retrieve schema from tcp:10.10.10.10:6642: Connection refused
2019-07-18 16:46:43.816 40 ERROR ovsdbapp.backend.ovs_idl.vlog [-] tcp:10.10.10.10:6642: no response to inactivity probe after 5 seconds, disconnecting: Exception: Could not retrieve schema from tcp:10.10.10.10:6642: Connection refused
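These errors suggest the OVN southbound DB endpoint itself was unreachable at that moment. A quick check from the controller that mirrors what ovsdbapp does on reconnect is to ask that endpoint for the OVN_Southbound schema (diagnostic sketch only; assumes ovsdb-client is available on the host or in one of the containers):

  ovsdb-client get-schema tcp:10.10.10.10:6642 OVN_Southbound

If this prints the schema the SB DB is reachable; "Connection refused" reproduces the failure seen in server.log.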
2019-07-18 17:11:14.222 53 ERROR ovsdbapp.backend.ovs_idl.transaction [-] OVSDB Error: {"details":"Transaction causes multiple rows in \"Logical_Switch_Port\" table to have identical values (\"2e74e4b2-f0ab-493b-bcdc-aa54273374b1\") for index on column \"name\". First row, with UUID f85f5e43-bada-41e8-b3e0-9ec6edc04689, was inserted by this transaction. Second row, with UUID 65449685-0cad-45d3-9dcc-ad957615c606, existed in the database before this transaction and was not modified by the transaction.","error":"constraint violation"}
2019-07-18 17:11:14.223 53 ERROR ovsdbapp.backend.ovs_idl.transaction [req-e16e3e31-96df-484f-897e-db6221b13867 - - - - -] Traceback (most recent call last):
    raise RuntimeError(msg)
RuntimeError: OVSDB Error: {"details":"Transaction causes multiple rows in \"Logical_Switch_Port\" table to have identical values (\"2e74e4b2-f0ab-493b-bcdc-aa54273374b1\") for index on column \"name\". First row, with UUID f85f5e43-bada-41e8-b3e0-9ec6edc04689, was inserted by this transaction. Second row, with UUID 65449685-0cad-45d3-9dcc-ad957615c606, existed in the database before this transaction and was not modified by the transaction.","error":"constraint violation"}
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance [req-e16e3e31-96df-484f-897e-db6221b13867 - - - - -] Failed to fix resource 2e74e4b2-f0ab-493b-bcdc-aa54273374b1 (type: ports): RuntimeError: OVSDB Error: {"details":"Transaction causes multiple rows in \"Logical_Switch_Port\" table to have identical values (\"2e74e4b2-f0ab-493b-bcdc-aa54273374b1\") for index on column \"name\". First row, with UUID f85f5e43-bada-41e8-b3e0-9ec6edc04689, was inserted by this transaction. Second row, with UUID 65449685-0cad-45d3-9dcc-ad957615c606, existed in the database before this transaction and was not modified by the transaction.","error":"constraint violation"}
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance Traceback (most recent call last):
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance   File "/usr/lib/python2.7/site-packages/networking_ovn/common/maintenance.py", line 232, in check_for_inconsistencies
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance     self._fix_create_update(row)
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance   File "/usr/lib/python2.7/site-packages/networking_ovn/common/maintenance.py", line 160, in _fix_create_update
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance     res_map['ovn_create'](n_obj)
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance   File "/usr/lib/python2.7/site-packages/networking_ovn/common/ovn_client.py", line 350, in create_port
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance     self.add_txns_to_sync_port_dns_records(txn, port)
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance   File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance     self.gen.next()
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance   File "/usr/lib/python2.7/site-packages/networking_ovn/ovsdb/impl_idl_ovn.py", line 139, in transaction
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance     yield t
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance   File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance     self.gen.next()
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance   File "/usr/lib/python2.7/site-packages/ovsdbapp/api.py", line 102, in transaction
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance     del self._nested_txns_map[cur_thread_id]
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance   File "/usr/lib/python2.7/site-packages/ovsdbapp/api.py", line 59, in __exit__
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance     self.result = self.commit()
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance   File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 62, in commit
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance     raise result.ex
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance RuntimeError: OVSDB Error: {"details":"Transaction causes multiple rows in \"Logical_Switch_Port\" table to have identical values (\"2e74e4b2-f0ab-493b-bcdc-aa54273374b1\") for index on column \"name\". First row, with UUID f85f5e43-bada-41e8-b3e0-9ec6edc04689, was inserted by this transaction. Second row, with UUID 65449685-0cad-45d3-9dcc-ad957615c606, existed in the database before this transaction and was not modified by the transaction.","error":"constraint violation"}
2019-07-18 17:11:14.223 53 ERROR networking_ovn.common.maintenance
It looks like setting the following on all the nodes (controllers + computes), i.e. raising the OVN remote inactivity probe interval to 180 seconds, worked around the issue:

  ovs-vsctl set open . external_ids:ovn-remote-probe-interval=180000
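To confirm the setting took effect on a node, the key can be read back (the value is in milliseconds, so 180000 = 3 minutes):

  ovs-vsctl get open . external_ids:ovn-remote-probe-interval

ovn-controller should pick up the change on its own; restarting the ovn_controller container is an option if in doubt.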
This issue appears to be related to https://bugzilla.redhat.com/show_bug.cgi?id=1723463