Bug 974057 - openstack-nova: snapshots are stuck in saving status forever when trying to create several snapshots on different instances
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: unspecified
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Target Milestone: beta
Target Release: 4.0
Assigned To: Xavier Queralt
QA Contact: Haim
Whiteboard: storage
Reported: 2013-06-13 07:10 EDT by Dafna Ron
Modified: 2014-01-12 19:57 EST
CC: 8 users

Fixed In Version: openstack-nova-2013.2-0.21.b3.el6ost
Doc Type: Bug Fix
Last Closed: 2013-12-19 19:05:58 EST
Type: Bug


Attachments
logs (1.71 MB, application/x-gzip)
2013-06-13 07:10 EDT, Dafna Ron

Description Dafna Ron 2013-06-13 07:10:30 EDT
Created attachment 760604 [details]
logs

Description of problem:

I created an image and launched 10 instances from it on 2 different hosts.
After the instances were running, I tried creating a snapshot of each instance.

The compute log showed some time-out errors and 'model server went away' errors, and some of the snapshots remain in saving status forever.

Please note that some of the instances changed status to SHUTOFF, but some of the snapshots stuck in saving belong to active instances, not only the shut-off ones.

Version-Release number of selected component (if applicable):

openstack-nova-compute-2013.1.1-4.el6ost.noarch
openstack-nova-api-2013.1.1-4.el6ost.noarch
libvirt-0.10.2-18.el6_4.5.x86_64
qemu-img-rhev-0.12.1.2-2.355.el6_4.4.x86_64
qemu-kvm-rhev-0.12.1.2-2.355.el6_4.4.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Create an image and launch 10 instances from it on two different hosts.
2. Once the instances are active, create a snapshot of each instance (a python-novaclient sketch of these steps follows below).
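
For reference, a minimal reproduction sketch using python-novaclient; the credentials, endpoint, image and flavor names below are placeholders/assumptions, not values taken from this setup:

from novaclient.v1_1 import client

# Placeholder credentials and endpoint -- adjust to the environment under test.
nova = client.Client('admin', 'password', 'admin', 'http://127.0.0.1:5000/v2.0/')

image = nova.images.find(name='dafna')        # image the instances boot from
flavor = nova.flavors.find(name='m1.small')   # assumed flavor

# Step 1: launch 10 instances from the same image; the scheduler is expected
# to spread them across the two compute hosts.
for i in range(10):
    nova.servers.create(name='HAHA-%d' % i, image=image, flavor=flavor)

# Step 2: once every instance is ACTIVE, snapshot each one.
for server in nova.servers.list():
    if server.status == 'ACTIVE':
        nova.servers.create_image(server, 'snap-%s' % server.name)

The same can be done with the nova CLI (nova boot, then nova image-create for each instance).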

Actual results:

After time-out errors in the compute log, the snapshots remain in "saving" status forever.

Expected results:

If snapshot creation fails, we should roll back or report an error.

Additional info:

[root@opens-vdsb ~(keystone_admin)]# nova list
+--------------------------------------+-------------------------------------------+---------+---------------------------+
| ID                                   | Name                                      | Status  | Networks                  |
+--------------------------------------+-------------------------------------------+---------+---------------------------+
| 1b1eb170-3bdb-496e-bcaa-2a3cd5078e06 | HAHA-1b1eb170-3bdb-496e-bcaa-2a3cd5078e06 | ACTIVE  | novanetwork=192.168.32.19 |
| 2e19d87f-44ac-4525-bccd-41f2d0350a92 | HAHA-2e19d87f-44ac-4525-bccd-41f2d0350a92 | SHUTOFF | novanetwork=192.168.32.17 |
| 7f4df8e2-77fc-495e-acea-467b039e1ccd | HAHA-7f4df8e2-77fc-495e-acea-467b039e1ccd | SHUTOFF | novanetwork=192.168.32.3  |
| 80fbf669-e8c5-422a-b2ff-90c34e09d8ec | HAHA-80fbf669-e8c5-422a-b2ff-90c34e09d8ec | ACTIVE  | novanetwork=192.168.32.4  |
| 9f0cb489-56d3-4258-93e6-0c927cf70352 | HAHA-9f0cb489-56d3-4258-93e6-0c927cf70352 | ACTIVE  | novanetwork=192.168.32.2  |
| a512a696-13f3-4e48-8154-d8de3ad35af4 | HAHA-a512a696-13f3-4e48-8154-d8de3ad35af4 | SHUTOFF | novanetwork=192.168.32.16 |
| d965ae10-6175-4c57-93f9-47b57c7a2907 | HAHA-d965ae10-6175-4c57-93f9-47b57c7a2907 | ACTIVE  | novanetwork=192.168.32.14 |
| da936d2d-0273-4f47-b6cb-24ea24577317 | HAHA-da936d2d-0273-4f47-b6cb-24ea24577317 | ACTIVE  | novanetwork=192.168.32.15 |
| fa7ea571-a0c1-4cef-9788-ef1e0cf1bf8d | HAHA-fa7ea571-a0c1-4cef-9788-ef1e0cf1bf8d | SHUTOFF | novanetwork=192.168.32.18 |
| 12893599-1418-4cf0-b74d-5f2200418a74 | haha10                                    | ACTIVE  | novanetwork=192.168.32.5  |
+--------------------------------------+-------------------------------------------+---------+---------------------------+
[root@opens-vdsb ~(keystone_admin)]# nova image-list
+--------------------------------------+--------+--------+--------------------------------------+
| ID                                   | Name   | Status | Server                               |
+--------------------------------------+--------+--------+--------------------------------------+
| b083c449-ce01-4831-bb0f-4c212b7c2c5f | dafna  | ACTIVE |                                      |
| 50ad0eb6-662b-4aea-933d-cbc2d45ab32f | snap1  | ACTIVE | 1b1eb170-3bdb-496e-bcaa-2a3cd5078e06 |
| 97167459-b609-4ae2-8afe-ca2649ae38ce | snap10 | SAVING | 12893599-1418-4cf0-b74d-5f2200418a74 |
| 9b388eea-ca2d-4d8c-a260-95e5cd8f5a4e | snap2  | SAVING | 2e19d87f-44ac-4525-bccd-41f2d0350a92 |
| c24722c5-6ba2-4bd8-aee4-47da3bd82c5b | snap3  | SAVING | 7f4df8e2-77fc-495e-acea-467b039e1ccd |
| 740d796e-2155-41c0-af97-dcfbb0d448cc | snap4  | ACTIVE | 80fbf669-e8c5-422a-b2ff-90c34e09d8ec |
| c7b77ade-bd2c-4af2-b4a6-7a82fd9bd36d | snap5  | ACTIVE | 9f0cb489-56d3-4258-93e6-0c927cf70352 |
| 0762e63e-7ec9-42e2-b9f8-65b3bbf1af55 | snap6  | SAVING | a512a696-13f3-4e48-8154-d8de3ad35af4 |
| 527ab8fa-40cb-4290-b672-9d46705fec38 | snap7  | SAVING | d965ae10-6175-4c57-93f9-47b57c7a2907 |
| fba75c83-6056-4119-8657-7a5e45c302eb | snap8  | ACTIVE | da936d2d-0273-4f47-b6cb-24ea24577317 |
| ae1536e7-6a68-4d06-ae9a-710d9a7d56e4 | snap9  | SAVING | fa7ea571-a0c1-4cef-9788-ef1e0cf1bf8d |
+--------------------------------------+--------+--------+--------------------------------------+


2013-06-13 11:31:11.597 2966 INFO nova.compute.manager [-] Lifecycle event 1 on VM 1b1eb170-3bdb-496e-bcaa-2a3cd5078e06
2013-06-13 11:31:11.858 2966 INFO nova.compute.manager [-] [instance: 1b1eb170-3bdb-496e-bcaa-2a3cd5078e06] During sync_power_state the instance has a pending task. Skip.
2013-06-13 11:31:28.739 2966 AUDIT nova.compute.resource_tracker [-] Auditing locally available compute resources
2013-06-13 11:32:27.171 2966 ERROR nova.servicegroup.drivers.db [-] model server went away
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db Traceback (most recent call last):
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.6/site-packages/nova/servicegroup/drivers/db.py", line 92, in _report_state
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db     service.service_ref, state_catalog)
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.6/site-packages/nova/conductor/api.py", line 627, in service_update
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db     return self.conductor_rpcapi.service_update(context, service, values)
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.6/site-packages/nova/conductor/rpcapi.py", line 365, in service_update
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db     return self.call(context, msg, version='1.34')
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/proxy.py", line 80, in call
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db     return rpc.call(context, self._get_topic(topic), msg, timeout)
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/__init__.py", line 140, in call
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db     return _get_impl().call(CONF, context, topic, msg, timeout)
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py", line 610, in call
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db     rpc_amqp.get_connection_pool(conf, Connection))
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py", line 612, in call
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db     rv = list(rv)
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py", line 554, in __iter__
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db     self.done()
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db   File "/usr/lib64/python2.6/contextlib.py", line 23, in __exit__
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db     self.gen.next()
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py", line 551, in __iter__
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db     self._iterator.next()
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py", line 435, in iterconsume
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db     yield self.ensure(_error_callback, _consume)
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py", line 379, in ensure
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db     error_callback(e)
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py", line 420, in _error_callback
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db     raise rpc_common.Timeout()
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db Timeout: Timeout while waiting on RPC response.
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db
Comment 2 Xavier Queralt 2013-09-26 05:11:44 EDT
I've been unable to reproduce this problem with the latest version of OpenStack. I forced a Timeout in the snapshot code and it is now handled properly: the snapshot information is cleaned up and the instance goes back to its initial state.
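
For illustration only (this is not the actual nova code), the behaviour described above amounts to a cleanup-on-failure pattern roughly like the following sketch; the callables are hypothetical stand-ins for the image API delete and the instance task-state reset:

def snapshot_with_cleanup(take_snapshot, delete_image, reset_task_state, image_id):
    """Sketch: if the snapshot upload fails (e.g. with an RPC Timeout),
    roll back instead of leaving the image in SAVING and the instance
    with a stale task state."""
    try:
        take_snapshot(image_id)
    except Exception:
        delete_image(image_id)    # the partial image no longer sits in SAVING
        reset_task_state()        # the instance returns to its previous state
        raise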

There have been a lot of changes in the snapshot code, which makes it hard to pinpoint the version in which it was fixed. I'll move the bug to MODIFIED and mark it as fixed in the latest build so QE can verify it.
Comment 4 Scott Lewis 2013-11-19 11:54:25 EST
Auto adding >= MODIFIED bugs to beta
Comment 8 Haim 2013-12-12 05:53:38 EST
Verified: executed the same steps as above and the snapshots were created successfully.

openstack-ceilometer-common-2013.2-4.el6ost.noarch
openstack-nova-conductor-2013.2-9.el6ost.noarch
openstack-heat-api-cfn-2013.2-4.el6ost.noarch
redhat-access-plugin-openstack-4.0.0-0.el6ost.noarch
openstack-swift-account-1.10.0-2.el6ost.noarch
openssl-devel-1.0.1e-15.el6.x86_64
openstack-neutron-2013.2-14.el6ost.noarch
openstack-nova-common-2013.2-9.el6ost.noarch
openstack-nova-scheduler-2013.2-9.el6ost.noarch
openstack-swift-container-1.10.0-2.el6ost.noarch
b43-openfwwf-5.2-4.el6.noarch
openvswitch-1.11.0-1.el6.x86_64
python-django-openstack-auth-1.1.2-1.el6ost.noarch
openstack-heat-common-2013.2-4.el6ost.noarch
openstack-ceilometer-compute-2013.2-4.el6ost.noarch
openstack-selinux-0.1.3-2.el6ost.noarch
openstack-nova-cert-2013.2-9.el6ost.noarch
openssl-1.0.1e-15.el6.x86_64
openstack-nova-api-2013.2-9.el6ost.noarch
openstack-heat-api-2013.2-4.el6ost.noarch
openstack-packstack-2013.2.1-0.14.dev919.el6ost.noarch
openstack-swift-1.10.0-2.el6ost.noarch
openstack-dashboard-2013.2-8.el6ost.noarch
openstack-ceilometer-collector-2013.2-4.el6ost.noarch
openstack-nova-console-2013.2-9.el6ost.noarch
openstack-heat-api-cloudwatch-2013.2-4.el6ost.noarch
openstack-neutron-openvswitch-2013.2-14.el6ost.noarch
openssh-5.3p1-94.el6.x86_64
openssh-server-5.3p1-94.el6.x86_64
openstack-swift-object-1.10.0-2.el6ost.noarch
openstack-dashboard-theme-2013.2-8.el6ost.noarch
openstack-ceilometer-central-2013.2-4.el6ost.noarch
openstack-nova-compute-2013.2-9.el6ost.noarch
openstack-keystone-2013.2-3.el6ost.noarch
openstack-utils-2013.2-2.el6ost.noarch
openldap-2.4.23-32.el6_4.1.x86_64
openstack-ceilometer-api-2013.2-4.el6ost.noarch
openstack-nova-novncproxy-2013.2-9.el6ost.noarch
openstack-heat-engine-2013.2-4.el6ost.noarch
openssh-clients-5.3p1-94.el6.x86_64
Comment 10 errata-xmlrpc 2013-12-19 19:05:58 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2013-1859.html
