Bug 1399227

Summary: VM in error state after evacuation
Product: Red Hat OpenStack
Reporter: Jeremy <jmelvin>
Component: openstack-nova
Assignee: Sahid Ferdjaoui <sferdjao>
Status: CLOSED WONTFIX
QA Contact: Prasanth Anbalagan <panbalag>
Severity: high
Docs Contact:
Priority: high
Version: 6.0 (Juno)
CC: awaugama, berrange, dasmith, dmaley, eglynn, jmelvin, jthomas, kchamart, mburns, sbauza, sferdjao, sgordon, sknauss, srevivo, vromanso
Target Milestone: async
Keywords: Unconfirmed
Target Release: 6.0 (Juno)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1406345
Environment:
Last Closed: 2017-01-24 11:08:51 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Bug Blocks: 1406345

Description Jeremy 2016-11-28 15:14:24 UTC
Description of problem:
When a compute node fails, we initiate evacuation of the virtual machines that are configured to be evacuated.
In some cases, when Ceph is used as shared storage, a VM ends up in error state after an evacuation triggered by pulling out a compute blade. The nova compute logs show that the failure is caused by an ImageBusy exception, as can be seen below:


2016-11-23 10:18:27.916 6039 TRACE nova.compute.manager [instance: e31d20a3-f807-48bd-a0dc-f134af5cec3f] ImageBusy: error removing image
2016-11-23 10:18:27.916 6039 TRACE nova.compute.manager [instance: e31d20a3-f807-48bd-a0dc-f134af5cec3f]
2016-11-23 10:18:28.276 6039 ERROR oslo_messaging.rpc.dispatcher [req-90df906c-1b53-436d-9ce2-482df6ffa8e0 faba8d87dd324fdeb8a2d593d177c8a0 7b2adad362354277a8bbbcdbc7ef9809 - - -] Exception during message handling: error removing image
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher Traceback (most recent call last):
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher     executor_callback))
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher     executor_callback)
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 130, in _do_dispatch
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher     result = func(ctxt, **new_args)
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6911, in rebuild_instance
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher     preserve_ephemeral=preserve_ephemeral)
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 461, in decorated_function
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 142, in inner
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher     return func(*args, **kwargs)
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/exception.py", line 88, in wrapped
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher     payload)
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/exception.py", line 71, in wrapped
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher     return f(self, context, *args, **kw)
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 341, in decorated_function
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher     LOG.warning(msg, e, instance_uuid=instance_uuid)
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 312, in decorated_function
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 391, in decorated_function
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 369, in decorated_function
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher     kwargs['instance'], e, sys.exc_info())
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 357, in decorated_function
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3150, in rebuild_instance
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher     self._rebuild_default_impl(**kwargs)
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2995, in _rebuild_default_impl
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher     block_device_info=new_block_device_info)
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2585, in spawn
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher     admin_pass=admin_password)
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2991, in _create_image
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher     instance, configdrive_path, 'disk.config' + suffix)
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/imagebackend.py", line 816, in import_file
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher     self.driver.remove_image(name)
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/rbd_utils.py", line 276, in remove_image
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher     rbd.RBD().remove(client.ioctx, name)
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/rbd.py", line 303, in remove
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher     raise make_ex(ret, 'error removing image')
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher ImageBusy: error removing image
2016-11-23 10:18:28.276 6039 TRACE oslo_messaging.rpc.dispatcher
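
The trace ends in rbd.RBD().remove() raising ImageBusy while the config drive image is being replaced on the target host. One plausible reason, not confirmed in this report, is that the failed host still appears to Ceph as a client of the image for a short time. The following is a minimal sketch of retrying the removal, not the upstream fix and not nova's actual code. The ImageBusy class here is a stand-in for rbd.ImageBusy so the example runs without the Ceph bindings, and remove_with_retry, make_flaky_remove, the attempt count, the delay, and the image name are all hypothetical.

```python
import time


class ImageBusy(Exception):
    """Stand-in for rbd.ImageBusy so the sketch runs without Ceph bindings."""


def remove_with_retry(remove, name, attempts=5, delay=1.0, sleep=time.sleep):
    """Call remove(name), retrying while it raises ImageBusy.

    Returns the number of attempts used. Re-raises ImageBusy if the
    image is still busy after the final attempt.
    """
    for attempt in range(1, attempts + 1):
        try:
            remove(name)
            return attempt
        except ImageBusy:
            if attempt == attempts:
                raise
            # Give the stale client time to drop off before trying again.
            sleep(delay)


def make_flaky_remove(busy_times):
    """Build a fake remove() that is busy for the first busy_times calls."""
    state = {"calls": 0}

    def remove(name):
        state["calls"] += 1
        if state["calls"] <= busy_times:
            raise ImageBusy("error removing image")

    return remove


if __name__ == "__main__":
    # Hypothetical image name; sleep is stubbed out so the demo is instant.
    used = remove_with_retry(make_flaky_remove(2), "example_disk.config",
                             sleep=lambda seconds: None)
    print("removed after %d attempts" % used)
```

In a real deployment the remove argument would wrap a call such as rbd.RBD().remove(ioctx, name), and the retry budget would need to be tuned to how long the cluster takes to release the image.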


Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Evacuate an instance from a compute node.

Actual results:
The instance ends up in error state.

Expected results:
The instance is not in error state after evacuation.

Additional info:
Looks similar to the following bugs, which were reported against different versions:
https://bugzilla.redhat.com/show_bug.cgi?id=1241613
https://bugzilla.redhat.com/show_bug.cgi?id=1287696

Comment 4 Eoghan Glynn 2016-11-30 13:55:52 UTC
Requires backport of https://review.openstack.org/169446 which is in 2014.2.4 upstream but not 2014.2.3 (on which latest OSP7 is based).

Comment 7 Jeremy 2016-11-30 14:48:26 UTC
Hello,
The customer is on OSP6.

Comment 8 GE Scott Knauss 2016-11-30 14:54:38 UTC
Right. OSP6. They also tested in OSP7 and saw the same problem. I believe some of what they uploaded (sosreports) was from the OSP7 environment, which caused the confusion. Their primary concern at this time, though, is OSP6.

-Scott