Bug 1393093 - Race condition in nova compute during snapshot
Summary: Race condition in nova compute during snapshot
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 8.0 (Liberty)
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Eoghan Glynn
QA Contact: Prasanth Anbalagan
URL:
Whiteboard: hot
Depends On:
Blocks: 1194008 1295530
 
Reported: 2016-11-08 20:41 UTC by Srinivas
Modified: 2020-12-21 19:37 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-01-05 18:55:02 UTC
Target Upstream Version:


Attachments
Nova compute log from hypervisor that has the instance (591.08 KB, application/x-gzip)
2016-11-15 15:26 UTC, Srinivas


Links:
Launchpad 1639914 (last updated 2016-11-08 20:41:34 UTC)

Description Srinivas 2016-11-08 20:41:34 UTC
Description of problem:
Creating a snapshot of an instance and then immediately deleting the instance appears to cause a race condition. I was able to reproduce this on Liberty.

Version-Release number of selected component (if applicable):
openstack-nova-common-12.0.2-5_v12.0.6_fusion.noarch
python-nova-12.0.2-5_v12.0.6_fusion.noarch
openstack-nova-compute-12.0.2-5_v12.0.6_fusion.noarch


How reproducible:
1. nova boot --flavor m1.large --image 6d4259ce-5873-42cb-8cbe-9873f069c149 testinstance

id | bef22f9b-ade4-48a1-86c4-b9a007897eb3

2. nova image-create bef22f9b-ade4-48a1-86c4-b9a007897eb3 testinstance-snap ; nova delete bef22f9b-ade4-48a1-86c4-b9a007897eb3
Request to delete server bef22f9b-ade4-48a1-86c4-b9a007897eb3 has been accepted.
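
For reference, a minimal bash sketch of the same reproduction, assuming the nova CLI used above; the instance is polled until it is ACTIVE first, so that the snapshot and the delete are the only two operations racing. The image ID is the one from step 1, the "repro" names are placeholders, and the race may not trigger on every attempt.

IMAGE=6d4259ce-5873-42cb-8cbe-9873f069c149   # image ID from step 1 (any bootable image works)

for i in 1 2 3; do
    id=$(nova boot --flavor m1.large --image "$IMAGE" "repro-$i" | awk '$2 == "id" {print $4}')
    # wait until the instance is ACTIVE before snapshotting
    until nova show "$id" | awk '$2 == "status"' | grep -q ACTIVE; do sleep 5; done
    # snapshot and delete back to back, as in step 2
    nova image-create "$id" "repro-$i-snap" ; nova delete "$id"
done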



Actual results:
 nova image-list doesn't show the snapshot

 nova list doesn't show the instance

Expected results:
The snapshot should exist for the instance that was just deleted.

Additional info:
The nova compute log indicates a race condition while executing the CLI commands in step 2 above:

<182>1 2016-10-28T14:46:41.830208+00:00 hyper1 nova-compute 30056 - [40521 levelname="INFO" component="nova-compute" funcname="nova.compute.manager" request_id="req-e9e4e899-e2a7-4bf8-bdf1-c26f5634cfda" user="51fa0172fbdf495e89132f7f4574e750" tenant="00ead348c5f9475f8940ab29cd767c5e" instance="[instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3] " lineno="/usr/lib/python2.7/site-packages/nova/compute/manager.py:2249"] nova.compute.manager Terminating instance
<183>1 2016-10-28T14:46:42.057653+00:00 hyper1 nova-compute 30056 - [40521 levelname="DEBUG" component="nova-compute" funcname="nova.compute.manager" request_id="req-1c4cf749-a6a8-46af-b331-f70dc1e9f364" user="51fa0172fbdf495e89132f7f4574e750" tenant="00ead348c5f9475f8940ab29cd767c5e" instance="[instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3] " lineno="/usr/lib/python2.7/site-packages/nova/compute/manager.py:420"] nova.compute.manager Cleaning up image ae9ebf4b-7dd6-4615-816f-c2f3c7c08530 decorated_function /usr/lib/python2.7/site-packages/nova/compute/manager.py:420
30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3] Traceback (most recent call last):
30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 416, in decorated_function
30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3] *args, **kwargs)
30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3038, in snapshot_instance
30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3] task_states.IMAGE_SNAPSHOT)
30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3068, in _snapshot_instance
30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3] update_task_state)
30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1447, in snapshot
30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3] guest.save_memory_state()
30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 363, in save_memory_state
30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3] self._domain.managedSave(0)
30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 183, in doit
30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3] result = proxy_call(self._autowrap, f, *args, **kwargs)
30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 141, in proxy_call
30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3] rv = execute(f, *args, **kwargs)
30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 122, in execute
30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3] six.reraise(c, e, tb)
30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 80, in tworker
30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3] rv = meth(*args, **kwargs)
30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3] File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1397, in managedSave
30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3] if ret == -1: raise libvirtError ('virDomainManagedSave() failed', dom=self)
30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3] libvirtError: operation failed: domain is no longer running
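
The traceback shows the snapshot path in the libvirt driver calling guest.save_memory_state(), which invokes virDomainManagedSave() on the domain; because the concurrent delete has already torn the domain down, libvirt fails with "domain is no longer running". The same failure mode can be illustrated directly with virsh (a rough sketch, assuming shell access to the hypervisor and a disposable test domain named testdom; the exact error text may differ):

virsh destroy testdom        # stands in for the concurrent delete tearing down the domain
virsh managedsave testdom    # fails because the domain is no longer running,
                             # the same condition behind the libvirtError above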

Comment 1 Kashyap Chamarthy 2016-11-09 11:25:14 UTC
(In reply to Srinivas from comment #0)
> Description of problem:
> Creating a snapshot of an instance and then immediately deleting the instance
> appears to cause a race condition. I was able to reproduce this on Liberty.
> 
> Version-Release number of selected component (if applicable):
> openstack-nova-common-12.0.2-5_v12.0.6_fusion.noarch
> python-nova-12.0.2-5_v12.0.6_fusion.noarch
> openstack-nova-compute-12.0.2-5_v12.0.6_fusion.noarch

This is not exactly Liberty.  Nor RHOS, because the packages aren't built by Red Hat.

Are you able to reproduce this with an official RHOS version?

[...]

Comment 2 Dan Smith 2016-11-10 16:41:49 UTC
Yeah, I think Kashyap is right. Can you reproduce this on OSP8?

Comment 3 Srinivas 2016-11-15 15:19:44 UTC
I can recreate this issue with OSP8 bits too

bash-4.2# rpm -qa | grep nova
python-nova-12.0.4-16.el7ost.noarch
openstack-nova-common-12.0.4-16.el7ost.noarch
python-novaclient-3.1.0-2.el7ost.noarch
openstack-nova-compute-12.0.4-16.el7ost.noarch

bash-4.2# inst=d971abf5-9620-402c-b572-d7ece69da5e2
bash-4.2# nova image-create $inst test-instance-snap ; nova delete $inst
Request to delete server d971abf5-9620-402c-b572-d7ece69da5e2 has been accepted.

bash-4.2# nova image-list
+--------------------------------------+--------------------------------------------------+--------+--------+
| ID                                   | Name                                             | Status | Server |
+--------------------------------------+--------------------------------------------------+--------+--------+
| 1f459482-8ce2-4eae-8476-ecb605523cb8 | VCO_cisco_metapod_validation_image_DO_NOT_DELETE | ACTIVE |        |
| fd1ee7d4-d11f-4818-8ac1-87a64badf69d | cirros-0.3.4-x86_64-aki                          | ACTIVE |        |
| a59c3718-9838-4306-8af1-43fdabf5d040 | cirros-0.3.4-x86_64-ami                          | ACTIVE |        |
| 6e4f3b9b-d266-40b8-9587-63e8f301283b | cirros-0.3.4-x86_64-ari                          | ACTIVE |        |
| 60692d25-5e23-405a-8e8b-5b23b4a9d8e8 | cirros-0.3.4-x86_64-raw                          | ACTIVE |        |
| 652b6c55-f591-4e9b-b09f-19ba1beebf21 | mc-ubuntu-img                                    | ACTIVE |        |
+--------------------------------------+--------------------------------------------------+--------+--------+
bash-4.2# nova list
+----+------+--------+------------+-------------+----------+
| ID | Name | Status | Task State | Power State | Networks |
+----+------+--------+------------+-------------+----------+
+----+------+--------+------------+-------------+----------+

Comment 4 Srinivas 2016-11-15 15:26:13 UTC
Created attachment 1220862 [details]
Nova compute log from hypervisor that has the instance

Comment 6 Artom Lifshitz 2018-01-05 18:55:02 UTC
Hello,

The corresponding bug has been closed WONTFIX upstream:

> The compute API never blocks a delete request, unless the server is locked. So
> by design you can attempt to delete a server in any case where it's unlocked
> (if you're an admin you can bypass the locked state too). So we aren't going to
> put a conditional on the delete API such that you can't delete the server while
> it's being snapshot.
>
> [We're] not sure what you're looking for as far as a bug or fix. [...] The
> compute manager will cleanup the snapshot image in glance if the server was
> deleted during the snapshot.
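
In practice, the safe ordering is to let the snapshot finish before issuing the delete, or to lock the server for the duration. A hedged workaround sketch using the nova CLI already shown in this report (--poll blocks until the image is created; if it is unavailable, poll nova image-list until the snapshot goes ACTIVE instead):

inst=bef22f9b-ade4-48a1-86c4-b9a007897eb3

nova lock "$inst"                                     # delete requests from non-admin users are refused while locked
nova image-create --poll "$inst" testinstance-snap    # wait for the snapshot to complete
nova unlock "$inst"
nova delete "$inst"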

