Description of problem: During bulk creation of instances on a compute node, some of them fail with the following error (after successful volume creation, during the spawn stage): Instance failed to spawn: VirtualInterfaceCreateException: Virtual Interface creation failed

Version-Release number of selected component (if applicable): Red Hat OpenStack Platform 13

How reproducible: Creating around 13-15 instances via Horizon or the CLI reliably triggers the error.

Steps to Reproduce: run "openstack server create ... --max 15 --min 15 "

Actual results: Some instances randomly fail to spawn.

Expected results: All instances are created successfully.
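For triage after such a burst create, it helps to count how many servers landed in ERROR state. A minimal sketch, assuming the standard openstack CLI is available; the `count_errors` helper is hypothetical and factored out so it can be exercised on sample output:

```shell
# Hypothetical helper: count how many servers ended up in ERROR state.
# Reads one status per line on stdin and prints the count.
count_errors() {
    grep -c '^ERROR$'
}

# On a live deployment you would pipe in real statuses, e.g.:
#   openstack server list -f value -c Status | count_errors
```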
Could you turn on debug mode for both Nova and Neutron, reproduce the issue, and submit a new set of sosreports? I have a vague handle on what's going on, but I need debug-level logs to fully understand it.
I have closed this bug as it has been waiting for more info for at least 4 weeks. We only do this to ensure that we don't accumulate stale bugs which can't be addressed. If you are able to provide the requested information (see comments #4, #7, and #8), please feel free to re-open this bug.
We confirmed with the rest of the Compute DFG that there are only supposed to be 2 compute containers: nova_compute and nova_migration_target (which is just the SSH daemon for migrations). Having an extra nova-compute container would explain what we're seeing. Because both containers are configured identically, they both log to the same file, and what we're seeing is their intermingled logs: compute A prepares to wait for the external event, nova-api receives the event and sends it to compute B, compute B throws it away as unexpected because it never prepared for it, and compute A then times out waiting for the event that was sent to compute B.

Have the customer stop the extra nova_compute container and see if that fixes their issue (also check the other compute hosts for similar problems). If that doesn't help, our next step would be to examine the contents of the containers, as we've noticed that they appear to have patched them:

$ grep -i vendor -A 2 docker/docker_inspect_a76abfd390b4
    "vendor": "Par-Tec",
    "version": "2.1"

This is probably OK if they added, for example, an out-of-tree scheduler filter, but it behooves us to make sure that they're actually running OSP code, and not some random modifications that could introduce weird bugs.
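To check each compute host for the duplicate-container condition described above, something like the following could be run. A minimal sketch, assuming standard `docker ps` output; the `count_nova_compute` helper is hypothetical, factored out so it can be tested on sample input:

```shell
# Hypothetical helper: count containers whose name is exactly nova_compute.
# Reads one container name per line on stdin and prints the count.
count_nova_compute() {
    grep -cx 'nova_compute'
}

# On a compute host you would run:
#   docker ps --format '{{.Names}}' | count_nova_compute
# A healthy node should report exactly 1; anything higher means
# duplicate nova_compute containers are fighting over the same config.
```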
It's been more than a month since we requested confirmation that removing the extra nova_compute container solves the issue. While we didn't get said confirmation, we feel it's a solid enough theory that we can close this BZ.
Hi, I can confirm: we ran this on all compute nodes and then the issue disappeared:

~~~
docker ps | grep nova_compute | awk '{print $NF}' | while read container ; do docker stop $container ; docker rm $container ; done
paunch debug --file /var/lib/tripleo-config/docker-container-startup-config-step_4.json --container nova_compute --action run
docker ps | grep nova_compute
~~~

- Andreas
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days