Bug 997840 - live block migration stopped working, claiming DestinationDiskExists
Summary: live block migration stopped working, claiming DestinationDiskExists
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 3.0
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: high
Target Milestone: z2
Target Release: 3.0
Assignee: Xavier Queralt
QA Contact: Jaroslav Henner
URL:
Whiteboard: storage
Depends On:
Blocks: 993100
 
Reported: 2013-08-16 09:35 UTC by Jaroslav Henner
Modified: 2019-09-09 13:26 UTC
CC List: 9 users

Fixed In Version: openstack-nova-2013.1.3-2.el6ost
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-09-03 20:21:48 UTC
Target Upstream Version:
Embargoed:


Attachments
log (9.54 KB, text/plain)
2013-08-16 09:39 UTC, Jaroslav Henner


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1193359 0 None None None Never
OpenStack gerrit 42588 0 None None None Never
Red Hat Product Errata RHSA-2013:1199 0 normal SHIPPED_LIVE Moderate: openstack-nova security and bug fix update 2013-09-04 00:16:56 UTC

Description Jaroslav Henner 2013-08-16 09:35:01 UTC
Description of problem:
Live block migration fails because nova claims it found a disk on the destination host that should not be there (the DestinationDiskExists exception).

Version-Release number of selected component (if applicable):
openstack-nova-common-2013.1.3-1.el6ost.noarch

How reproducible:
always

Steps to Reproduce:
1. nova live-migration --block-migrate $VM
2. check the nova compute logs on both hosts (see the grep sketch under Additional info below)

Actual results:
no migration; the DestinationDiskExists exception appears in the compute logs

Expected results:
migrated, no error

Additional info:
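For step 2, a quick way to spot the failure (assuming the default openstack-nova log location on RHEL):

grep DestinationDiskExists /var/log/nova/compute.log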

Comment 1 Jaroslav Henner 2013-08-16 09:39:16 UTC
Created attachment 787211 [details]
log

Comment 2 Jaroslav Henner 2013-08-16 15:58:33 UTC
I tried to reproduce on a fresh deployment of 2013-08-05.1 and the migration passed. I then updated to 2013-08-15.1 and the migration passed again, so I wonder why it started failing on my production deployment.

Comment 3 Jaroslav Henner 2013-08-16 16:27:07 UTC
I am not sure whether I made an error before, but now, with puddle 2013-08-15.1, I can reproduce:


for node in node-01.lithium node-02.lithium; do echo $node; ssh $node ls /var/lib/nova/instances; echo; done
node-01.lithium                                                                    
Warning: Permanently added 'node-01.lithium' (RSA) to the list of known hosts.
a64c03f1-3d58-4f75-b38a-a526730ca431
_base
locks

node-02.lithium
Warning: Permanently added 'node-02.lithium' (RSA) to the list of known hosts.
a64c03f1-3d58-4f75-b38a-a526730ca431

+-------------------------------------+----------------------------------------------------------+
| Property                            | Value                                                    |
+-------------------------------------+----------------------------------------------------------+
| status                              | BUILD                                                    |
| updated                             | 2013-08-16T16:23:41Z                                     |
| OS-EXT-STS:task_state               | block_device_mapping                                     |
| OS-EXT-SRV-ATTR:host                | node-01.lithium.rhev.lab.eng.brq.redhat.com              |
| key_name                            | None                                                     |
| image                               | cirros1 (ab24ccbd-4c89-4444-b0b2-a06a79c44306)           |
| hostId                              | 911886e953f179550c30da8760ac9d00bd0f5aa76dfcb5c328d7c1e3 |
| OS-EXT-STS:vm_state                 | building                                                 |
| OS-EXT-SRV-ATTR:instance_name       | instance-0000000c                                        |
| OS-EXT-SRV-ATTR:hypervisor_hostname | node-01.lithium.rhev.lab.eng.brq.redhat.com              |
| flavor                              | m1.tiny (1)                                              |
| id                                  | a64c03f1-3d58-4f75-b38a-a526730ca431                     |
...
| config_drive                        |                                                          |
+-------------------------------------+----------------------------------------------------------+
[root@folsom-rhel6 ~(keystone_admin)]# nova live-migration --block-migrate  foo
[root@folsom-rhel6 ~(keystone_admin)]# nova show foo
+-------------------------------------+----------------------------------------------------------+
| Property                            | Value                                                    |
+-------------------------------------+----------------------------------------------------------+
| status                              | ACTIVE                                                   |
| updated                             | 2013-08-16T16:23:58Z                                     |
| OS-EXT-STS:task_state               | None                                                     |
| OS-EXT-SRV-ATTR:host                | node-01.lithium.rhev.lab.eng.brq.redhat.com              |
| key_name                            | None                                                     |
| image                               | cirros1 (ab24ccbd-4c89-4444-b0b2-a06a79c44306)           |
| hostId                              | 911886e953f179550c30da8760ac9d00bd0f5aa76dfcb5c328d7c1e3 |
| OS-EXT-STS:vm_state                 | active                                                   |
| OS-EXT-SRV-ATTR:instance_name       | instance-0000000c                                        |
| OS-EXT-SRV-ATTR:hypervisor_hostname | node-01.lithium.rhev.lab.eng.brq.redhat.com              |
...
| config_drive                        |                                                          |
+-------------------------------------+----------------------------------------------------------+

I saw that the directory appeared on the destination host and then disappeared again, so I will retest. I still think this is a regression, because I was migrating VMs a lot in Grizzly OpenStack.

Comment 4 Jaroslav Henner 2013-08-16 22:26:12 UTC
Reproduced. It really doesn't happen in 2013-08-05.1, but it does happen in 2013-08-15.1. I must have been too quick before; checking for the MIGRATING status of the VM is not enough.
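A stricter check, sketched under the assumption that the nova CLI output matches the "nova show" tables above (fields OS-EXT-STS:task_state and status), is to wait until the migration task state clears before inspecting the result:

VM=foo
nova live-migration --block-migrate $VM
# poll until the task state is no longer "migrating"; a bare check for a
# MIGRATING status can race with the migration still being in flight
while nova show $VM | grep -qi 'task_state.*migrating'; do
    sleep 2
done
# only now is the host/status comparison meaningful
nova show $VM | grep -E 'status|OS-EXT-SRV-ATTR:host'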

Comment 5 Xavier Queralt 2013-08-19 08:57:13 UTC
Proposed a backport to the stable release.

Comment 9 Yogev Rabl 2013-08-25 12:42:50 UTC
We need more information in order to verify this bug:
1. What is the setup of the RHOS components?
2. What is the storage setup?
3. Please attach Cinder's logs.
4. Please elaborate on which logs we should check (step 2).

Please return the bug to me to verify.

Thanks.

Comment 10 Xavier Queralt 2013-08-26 08:24:21 UTC
(In reply to Yogev Rabl from comment #9)
> We need more information in order to verify this bug:
> 1. What is the setup of the RHOS components?
> 2. What is the storage setup?
> 3. Please attach Cinder's logs.
> 4. Please elaborate on which logs we should check (step 2).
> 
> Please return the bug to me to verify.
> 
> Thanks.

1. A plain RHOS setup with at least two compute nodes.
2. Without shared storage (i.e. the instance disks are local).
3. Cinder has nothing to do with this bug. Live block migration moves an instance's disk from one host to the other (don't confuse it with a Cinder volume; it is the local disk created from an image when the instance is built).
4. Consequently, the logs you have to check are the compute logs. Look for the DestinationDiskExists exception, which should no longer appear after the fix; a sketch of the check follows below.
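
Mirroring the reporter's loop from comment 3, the log check across both compute nodes could look like this (hostnames are the ones from that setup; the log path assumes the default):

for node in node-01.lithium node-02.lithium; do
    echo $node
    # the exception should not appear in a fixed build
    ssh $node grep DestinationDiskExists /var/log/nova/compute.log
done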


Steps to reproduce:
1. Create an instance from an image (no volumes).
2. Run "nova show <instance name>" and check which host the instance is running on (see the OS-EXT-SRV-ATTR:host property).
3. Run "nova live-migration --block-migrate <instance name>".
4. Check the compute log file on both hosts; you should not find the DestinationDiskExists exception.
5. Run "nova show <instance name>" again; the host for this instance must have changed and the instance must be in status ACTIVE. (A condensed session sketch follows this list.)
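
Condensed, and reusing the image and flavor names seen earlier in this report as placeholders, the flow is roughly:

nova boot --image cirros1 --flavor m1.tiny foo
nova show foo | grep 'OS-EXT-SRV-ATTR:host'              # note the source host
nova live-migration --block-migrate foo
nova show foo | grep -E 'status|OS-EXT-SRV-ATTR:host'    # host changed, status ACTIVE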

Comment 11 Jaroslav Henner 2013-08-26 13:24:49 UTC
After the upgrade, it works again:
[root@controller ~(keystone_admin)]$ nova show f5c80071-8a4f-4805-8aaa-1487fafca6af | grep host
| OS-EXT-SRV-ATTR:host                | master-01...                          |
| hostId                              | 8c875ab353cd54d8cb39ba4169f51a66c5999a185d598f9754a2e974            |
| OS-EXT-SRV-ATTR:hypervisor_hostname | master-01...                          |
[root@controller ~(keystone_admin)]$ nova live-migration  f5c80071-8a4f-4805-8aaa-1487fafca6af --block-migrate 
[root@controller ~(keystone_admin)]$ nova show f5c80071-8a4f-4805-8aaa-1487fafca6af | grep host
| OS-EXT-SRV-ATTR:host                | master-02...                          |
| hostId                              | e063be730b5e973391d5353e5ce89f1965bedaa2acde75ee08624079            |
| OS-EXT-SRV-ATTR:hypervisor_hostname | master-02...                          |
[root@controller ~(keystone_admin)]$ nova live-migration  f5c80071-8a4f-4805-8aaa-1487fafca6af --block-migrate 
[root@controller ~(keystone_admin)]$ nova show f5c80071-8a4f-4805-8aaa-1487fafca6af | grep host
| OS-EXT-SRV-ATTR:host                | master-01...                          |
| hostId                              | 8c875ab353cd54d8cb39ba4169f51a66c5999a185d598f9754a2e974            |
| OS-EXT-SRV-ATTR:hypervisor_hostname | master-01...                          |

Comment 13 errata-xmlrpc 2013-09-03 20:21:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-1199.html

