Description of problem: During bulk creation of instances on a compute node, some of them fail with the following error (after successful volume creation, during the spawn stage): Instance failed to spawn: VirtualInterfaceCreateException: Virtual Interface creation failed

Version-Release number of selected component (if applicable): Red Hat OpenStack Platform 13

How reproducible: Creating around 13-15 instances via Horizon or the CLI reliably triggers the error.

Steps to Reproduce: run "openstack server create ... --max 15 --min 15 "

Actual results: Some instances randomly fail to spawn.

Expected results: All instances are created successfully.
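For triage after such a burst create, it helps to count how many servers landed in ERROR state. A minimal sketch, assuming the standard openstack CLI is available; the `count_errors` helper is hypothetical and factored out so it can be exercised on sample output:

```shell
# Hypothetical helper: count how many servers ended up in ERROR state.
# Reads one status per line on stdin and prints the count.
count_errors() {
    grep -c '^ERROR$'
}

# On a live deployment you would pipe in real statuses, e.g.:
#   openstack server list -f value -c Status | count_errors
```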
Could you turn on debug mode for both Nova and Neutron, reproduce the issue, and submit a new set of sosreports? I have a vague handle on what's going on, but I need debug-level logs to fully understand it.
I have closed this bug as it has been waiting for more info for at least 4 weeks. We only do this to ensure that we don't accumulate stale bugs which can't be addressed. If you are able to provide the requested information (see comments #4, #7, and #8), please feel free to re-open this bug.
We confirmed with the rest of the Compute DFG that there are only supposed to be 2 compute containers: nova_compute and nova_migration_target (which is just the SSH daemon for migrations). Having an extra nova-compute container would explain what we're seeing. Because both containers are configured identically, they both log to the same file, and what we're seeing is their intermingled logs: compute A prepares to wait for the external event, nova-api receives the event and sends it to compute B, compute B throws it away as unexpected because it never prepared for it, and compute A then times out waiting for the event that was sent to compute B.

Have the customer stop the extra nova_compute container and see if that fixes their issue (also check the other compute hosts for similar problems). If that doesn't help, our next step would be to examine the contents of the containers, as we've noticed that they appear to have patched them:

$ grep -i vendor -A 2 docker/docker_inspect_a76abfd390b4
    "vendor": "Par-Tec",
    "version": "2.1"

This is probably OK if they added, for example, an out-of-tree scheduler filter, but it behooves us to make sure that they're actually running OSP code, and not some random modifications that could introduce weird bugs.
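To check each compute host for the duplicate-container condition described above, something like the following could be run. A minimal sketch, assuming standard `docker ps` output; the `count_nova_compute` helper is hypothetical, factored out so it can be tested on sample input:

```shell
# Hypothetical helper: count containers whose name is exactly nova_compute.
# Reads one container name per line on stdin and prints the count.
count_nova_compute() {
    grep -cx 'nova_compute'
}

# On a compute host you would run:
#   docker ps --format '{{.Names}}' | count_nova_compute
# A healthy node should report exactly 1; anything higher means
# duplicate nova_compute containers are fighting over the same config.
```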
It's been more than a month since we requested confirmation that removing the extra nova_compute container solves the issue. While we didn't get said confirmation, we feel it's a solid enough theory that we can close this BZ.
Hi, I can confirm: we ran this on all compute nodes and then the issue disappeared:

~~~
docker ps | grep nova_compute | awk '{print $NF}' | while read container ; do docker stop $container ; docker rm $container ; done
paunch debug --file /var/lib/tripleo-config/docker-container-startup-config-step_4.json --container nova_compute --action run
docker ps | grep nova_compute
~~~

- Andreas
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days