Bug 1323487 - Instances are getting into error state due to vif_plugging timeout when trying to evacuate 5 or more instances.
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 8.0 (Liberty)
Assignee: Eoghan Glynn
QA Contact: nlevinki
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-04-03 14:46 UTC by Udi Shkalim
Modified: 2019-09-09 14:56 UTC
CC: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-04-05 11:02:12 UTC
Target Upstream Version:



Description Udi Shkalim 2016-04-03 14:46:15 UTC
Description of problem:
Setup: 3 controllers, 2 compute nodes, 1 ceph node.
During Instance-HA testing on an openstack-director deployment, I tried to evacuate 5 instances from a compute node; only 3 of the 5 evacuated successfully, and the remaining 2 went into ERROR state:

2016-03-31 16:46:08.030 3690 WARNING nova.virt.libvirt.driver [req-3ebcaf2e-a9b4-4a5a-b53e-94d0704bcbce 3ed9877c4ca94cffa4b3e9b8c6944e65 0452451866d6412fbe39d928bba159a6 - - -] [instance: 0b70fff2-a3c7-4eaf-b02d-5c87cb3d57db] Timeout waiting for vif plugging callback for instance 0b70fff2-a3c7-4eaf-b02d-5c87cb3d57db
2016-03-31 16:46:09.036 3690 WARNING nova.virt.libvirt.driver [req-91fe5184-90b0-4c98-832f-561cec640a55 3ed9877c4ca94cffa4b3e9b8c6944e65 0452451866d6412fbe39d928bba159a6 - - -] [instance: 1c63770f-beee-431c-808e-fa167cb12922] Timeout waiting for vif plugging callback for instance 1c63770f-beee-431c-808e-fa167cb12922
2016-03-31 16:46:09.407 3690 ERROR nova.compute.manager [req-3ebcaf2e-a9b4-4a5a-b53e-94d0704bcbce 3ed9877c4ca94cffa4b3e9b8c6944e65 0452451866d6412fbe39d928bba159a6 - - -] [instance: 0b70fff2-a3c7-4eaf-b02d-5c87cb3d57db] Setting instance vm_state to ERROR
2016-03-31 16:46:09.407 3690 ERROR nova.compute.manager [instance: 0b70fff2-a3c7-4eaf-b02d-5c87cb3d57db] Traceback (most recent call last):
2016-03-31 16:46:09.407 3690 ERROR nova.compute.manager [instance: 0b70fff2-a3c7-4eaf-b02d-5c87cb3d57db]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6351, in _error_out_instance_on_exception
2016-03-31 16:46:09.407 3690 ERROR nova.compute.manager [instance: 0b70fff2-a3c7-4eaf-b02d-5c87cb3d57db]     yield
2016-03-31 16:46:09.407 3690 ERROR nova.compute.manager [instance: 0b70fff2-a3c7-4eaf-b02d-5c87cb3d57db]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2700, in rebuild_instance
2016-03-31 16:46:09.407 3690 ERROR nova.compute.manager [instance: 0b70fff2-a3c7-4eaf-b02d-5c87cb3d57db]     bdms, recreate, on_shared_storage, preserve_ephemeral)
2016-03-31 16:46:09.407 3690 ERROR nova.compute.manager [instance: 0b70fff2-a3c7-4eaf-b02d-5c87cb3d57db]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2739, in _do_rebuild_instance_with_claim
2016-03-31 16:46:09.407 3690 ERROR nova.compute.manager [instance: 0b70fff2-a3c7-4eaf-b02d-5c87cb3d57db]     self._do_rebuild_instance(*args, **kwargs)
2016-03-31 16:46:09.407 3690 ERROR nova.compute.manager [instance: 0b70fff2-a3c7-4eaf-b02d-5c87cb3d57db]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2847, in _do_rebuild_instance
2016-03-31 16:46:09.407 3690 ERROR nova.compute.manager [instance: 0b70fff2-a3c7-4eaf-b02d-5c87cb3d57db]     self._rebuild_default_impl(**kwargs)
2016-03-31 16:46:09.407 3690 ERROR nova.compute.manager [instance: 0b70fff2-a3c7-4eaf-b02d-5c87cb3d57db]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2620, in _rebuild_default_impl
2016-03-31 16:46:09.407 3690 ERROR nova.compute.manager [instance: 0b70fff2-a3c7-4eaf-b02d-5c87cb3d57db]     block_device_info=new_block_device_info)
2016-03-31 16:46:09.407 3690 ERROR nova.compute.manager [instance: 0b70fff2-a3c7-4eaf-b02d-5c87cb3d57db]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2539, in spawn
2016-03-31 16:46:09.407 3690 ERROR nova.compute.manager [instance: 0b70fff2-a3c7-4eaf-b02d-5c87cb3d57db]     block_device_info=block_device_info)
2016-03-31 16:46:09.407 3690 ERROR nova.compute.manager [instance: 0b70fff2-a3c7-4eaf-b02d-5c87cb3d57db]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4675, in _create_domain_and_network
2016-03-31 16:46:09.407 3690 ERROR nova.compute.manager [instance: 0b70fff2-a3c7-4eaf-b02d-5c87cb3d57db]     raise exception.VirtualInterfaceCreateException()
2016-03-31 16:46:09.407 3690 ERROR nova.compute.manager [instance: 0b70fff2-a3c7-4eaf-b02d-5c87cb3d57db] VirtualInterfaceCreateException: Virtual Interface creation failed
2016-03-31 16:46:09.407 3690 ERROR nova.compute.manager [instance: 0b70fff2-a3c7-4eaf-b02d-5c87cb3d57db]
2016-03-31 16:46:10.260 3690 ERROR oslo_messaging.rpc.dispatcher [req-3ebcaf2e-a9b4-4a5a-b53e-94d0704bcbce 3ed9877c4ca94cffa4b3e9b8c6944e65 0452451866d6412fbe39d928bba159a6 - - -] Exception during message handling: Virtual Interface creation failed
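The traceback shows nova's libvirt driver raising VirtualInterfaceCreateException once the wait for Neutron's network-vif-plugged event times out. A minimal sketch of the decision logic implied by the log (illustrative only, not Nova's actual implementation; the function and exception here are stand-ins):

```python
# Simplified model of the behaviour implied by the log: if the
# network-vif-plugged event does not arrive within vif_plugging_timeout
# seconds and vif_plugging_is_fatal is True, the rebuild fails and the
# instance goes to ERROR.

class VirtualInterfaceCreateException(Exception):
    """Stand-in for nova.exception.VirtualInterfaceCreateException."""

def wait_for_vif_plug(event_arrived, vif_plugging_is_fatal=True):
    """event_arrived: callable returning True if the Neutron event came
    back in time (stubbed here; the real driver blocks for up to
    vif_plugging_timeout seconds)."""
    if event_arrived():
        return "ACTIVE"
    if vif_plugging_is_fatal:
        # Fatal branch: matches the traceback above.
        raise VirtualInterfaceCreateException(
            "Virtual Interface creation failed")
    # Non-fatal branch: nova logs a warning and proceeds anyway.
    return "ACTIVE"
```

With the reporter's settings (vif_plugging_is_fatal=True, vif_plugging_timeout=300), any evacuated instance whose vif-plugged event is delayed past the timeout takes the fatal branch, which matches the two instances that ended up in ERROR state.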



Version-Release number of selected component (if applicable):
Director:
openstack-puppet-modules-7.0.17-1.el7ost.noarch
openstack-tripleo-heat-templates-0.8.12-2.el7ost.noarch

Nova:
openstack-nova-common-12.0.2-2.el7ost.noarch
openstack-nova-novncproxy-12.0.2-2.el7ost.noarch
python-novaclient-3.1.0-2.el7ost.noarch
openstack-nova-compute-12.0.2-2.el7ost.noarch
openstack-nova-console-12.0.2-2.el7ost.noarch
openstack-nova-api-12.0.2-2.el7ost.noarch
python-nova-12.0.2-2.el7ost.noarch
openstack-nova-cert-12.0.2-2.el7ost.noarch
openstack-nova-scheduler-12.0.2-2.el7ost.noarch
openstack-nova-conductor-12.0.2-2.el7ost.noarch


How reproducible:
100%

Steps to Reproduce:
1. Set up Instance-HA on an ospd deployment with shared storage
2. Boot 10 instances
3. Hard reboot one of the compute nodes
4. Observe that some of the instances evacuate successfully while others go into ERROR state

Actual results:
Some of the instances go into ERROR state

Expected results:
All instances evacuate successfully

Additional info:
From nova.conf

# Fail instance boot if vif plugging fails (boolean value)
#vif_plugging_is_fatal=true
vif_plugging_is_fatal=True

# Number of seconds to wait for neutron vif plugging events to arrive before
# continuing or failing (see vif_plugging_is_fatal). If this is set to zero and
# vif_plugging_is_fatal is False, events should not be expected to arrive at
# all. (integer value)
#vif_plugging_timeout=300
vif_plugging_timeout=300
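The settings in effect can be confirmed by parsing the config file rather than reading the excerpt by eye. A minimal sketch, assuming these options live under nova.conf's [DEFAULT] section and that the file sits at /etc/nova/nova.conf on the compute node (a sample string stands in for the real file here):

```python
# Sketch: read the vif-plugging options with the stdlib configparser.
# Path and section name are assumptions about a typical deployment.
import configparser

SAMPLE = """\
[DEFAULT]
vif_plugging_is_fatal=True
vif_plugging_timeout=300
"""

cfg = configparser.ConfigParser()
cfg.read_string(SAMPLE)  # in practice: cfg.read("/etc/nova/nova.conf")

fatal = cfg.getboolean("DEFAULT", "vif_plugging_is_fatal")
timeout = cfg.getint("DEFAULT", "vif_plugging_timeout")
print(fatal, timeout)  # → True 300
```

getboolean() accepts both "true" and "True", so either spelling seen in the excerpt above parses to the same value.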

Comment 2 Udi Shkalim 2016-04-05 11:02:12 UTC
The issue was seen once and has not been reproducible since.

Comment 3 zhougc 2016-07-05 07:37:13 UTC
Hi, I have the same environment and configuration, and the issue reproduces 100% of the time. I would like to know whether the bug described above no longer occurs in your environment, and if not, how you solved the problem. Thank you.

Comment 4 Udi Shkalim 2016-07-05 09:10:21 UTC
After deleting all the instances and creating new ones, the issue reported above has not been seen again, even after multiple attempts to reproduce it.

