1578844 – Transient virtual interface creation failures in dynamic environment

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-neutron
Sub Component:
Version:	10.0 (Newton)
Hardware:	x86_64
OS:	Linux
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Target Release:	10.0 (Newton)
Assignee:	Brian Haley
QA Contact:	Candido Campos
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-05-16 13:28 UTC by Sergii Mykhailushko
Modified:	2022-07-09 10:25 UTC
CC List:	11 users
Fixed In Version:	openstack-neutron-9.4.1-23
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-04-30 16:58:15 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	OSP-13577	None	None	None	2022-03-13 15:03:07 UTC
Red Hat Knowledge Base (Solution)	3540601	None	None	None	2018-07-23 10:10:09 UTC
Red Hat Product Errata	RHSA-2019:0916	None	None	None	2019-04-30 16:58:23 UTC

Description Sergii Mykhailushko 2018-05-16 13:28:41 UTC

Description of problem:

On large hypervisors (3 TB RAM) in very dynamic environments (instances are created and deleted continuously at a fast pace), creation of an instance sometimes fails with "VirtualInterfaceCreateException: Virtual Interface creation failed".

The error affects a large number of instances and persists for a while, then disappears, after which instances are created without errors.

The following appears in the nova log:

~~~
ERROR nova.compute.manager [instance: <uuid>] 
ERROR nova.compute.manager [req-<uuid> - - -] [instance: <uuid>] Failed to allocate network(s)
ERROR nova.compute.manager [instance: <uuid>] Traceback (most recent call last):
ERROR nova.compute.manager [instance: <uuid>]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1928, in _build_and_run_instance
ERROR nova.compute.manager [instance: <uuid>]     block_device_info=block_device_info)
ERROR nova.compute.manager [instance: <uuid>]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2674, in spawn
ERROR nova.compute.manager [instance: <uuid>]     destroy_disks_on_failure=True)
ERROR nova.compute.manager [instance: <uuid>]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5005, in _create_domain_and_network
ERROR nova.compute.manager [instance: <uuid>]     raise exception.VirtualInterfaceCreateException()
ERROR nova.compute.manager [instance: <uuid>] VirtualInterfaceCreateException: Virtual Interface creation failed
ERROR nova.compute.manager [instance: <uuid>] 
ERROR nova.compute.manager [req-<uuid> - - -] [instance: <uuid>] Build of instance <uuid> aborted: Failed to allocate the network(s), not rescheduling.
ERROR nova.compute.manager [instance: <uuid>] Traceback (most recent call last):
ERROR nova.compute.manager [instance: <uuid>]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1787, in _do_build_and_run_instance 
ERROR nova.compute.manager [instance: <uuid>]     filter_properties)
ERROR nova.compute.manager [instance: <uuid>]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1968, in _build_and_run_instance
ERROR nova.compute.manager [instance: <uuid>]     reason=msg)
ERROR nova.compute.manager [instance: <uuid>] BuildAbortException: Build of instance <uuid> aborted: Failed to allocate the network(s), not rescheduling.
~~~

Overall, the symptoms suggest that the main reason for rpc_loop's long run time is that the system is starved for resources (instances vs. host) and hits timeouts (long port deletions):

~~~
Loop iteration exceeded interval (2 vs. 4306.8187561)!
~~~
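
To judge how often and how badly the loop overruns, the warning quoted above can be pulled out of the agent log with a short script. This is a minimal sketch: the regular expression is derived only from the single message shown in this report, and the function name, the `min_ratio` parameter and the sample input are illustrative assumptions, not part of any OpenStack tooling.

```python
import re

# Pattern derived from the one warning quoted in this report:
#   "Loop iteration exceeded interval (2 vs. 4306.8187561)!"
# First number: configured polling interval; second: elapsed seconds.
OVERRUN_RE = re.compile(
    r"Loop iteration exceeded interval "
    r"\((?P<interval>\d+(?:\.\d+)?) vs\. (?P<elapsed>\d+(?:\.\d+)?)\)!"
)


def find_overruns(lines, min_ratio=1.0):
    """Return (interval, elapsed) pairs where elapsed > interval * min_ratio.

    `min_ratio` is a hypothetical knob for ignoring small overruns.
    """
    overruns = []
    for line in lines:
        match = OVERRUN_RE.search(line)
        if match is None:
            continue
        interval = float(match.group("interval"))
        elapsed = float(match.group("elapsed"))
        if elapsed > interval * min_ratio:
            overruns.append((interval, elapsed))
    return overruns


# Sample input: the message from this report plus an unrelated line.
sample = [
    "Loop iteration exceeded interval (2 vs. 4306.8187561)!",
    "some unrelated log line",
]
overruns = find_overruns(sample)
print(overruns)
```

In practice the `lines` argument would be an open file handle for the agent log; the path to that log depends on the deployment and is not given in this report.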

A workaround is to change the CPU allocation policy on the hypervisors so that the 'isolcpus' option is not used, after which the VirtualInterfaceCreateException errors no longer occur:

https://access.redhat.com/solutions/2884991
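
Before applying the workaround, it can help to confirm whether a hypervisor actually boots with 'isolcpus'. The sketch below parses a kernel command line string and returns the isolated CPU ids. It assumes the plain CPU-list form (for example `isolcpus=2-5,8`); the function name and the sample command line are hypothetical and not taken from the affected systems.

```python
def parse_isolcpus(cmdline):
    """Return the sorted CPU ids listed in isolcpus=, or [] if it is absent.

    Assumes the simple list form (single ids and a-b ranges);
    non-numeric entries are skipped.
    """
    prefix = "isolcpus="
    cpus = set()
    for token in cmdline.split():
        if not token.startswith(prefix):
            continue
        for part in token[len(prefix):].split(","):
            if "-" in part:
                low, high = part.split("-", 1)
                if low.isdigit() and high.isdigit():
                    cpus.update(range(int(low), int(high) + 1))
            elif part.isdigit():
                cpus.add(int(part))
    return sorted(cpus)


# Hypothetical command line, for illustration only.
sample_cmdline = "BOOT_IMAGE=/vmlinuz ro quiet isolcpus=2-5,8"
isolated = parse_isolcpus(sample_cmdline)
print(isolated)
```

On a running Linux host the input would come from `/proc/cmdline`, for example `parse_isolcpus(open("/proc/cmdline").read())`. An empty result means the option is not set on that host.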

Version-Release number of selected component (if applicable):

- RHOSP 10
- openstack-neutron-common-9.4.1-5.el7ost.noarch
- openstack-neutron-openvswitch-9.4.1-5.el7ost.noarch
- python-neutron-9.4.1-5.el7ost.noarch
- python-neutron-lib-0.4.0-1.el7ost.noarch
- python-neutronclient-6.0.1-1.el7ost.noarch

Comment 19 Brian Haley 2018-08-06 13:52:27 UTC

Just wanted to make sure needinfo was set so this is visible to Pablo.

Comment 26 Brian Haley 2019-03-18 15:57:08 UTC

Adding fixed-in version as this issue should already be fixed.

Comment 28 Lon Hohberger 2019-03-19 10:34:29 UTC

According to our records, this should be resolved by openstack-neutron-9.4.1-32.el7ost.  This build is available now.

Comment 33 errata-xmlrpc 2019-04-30 16:58:15 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:0916
