
Bug 997840

Summary: live block migration stopped working, claiming DestinationDiskExists
Product: Red Hat OpenStack
Component: openstack-nova
Version: 3.0
Target Release: 3.0
Target Milestone: z2
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: urgent
Keywords: Regression, ZStream
Whiteboard: storage
Reporter: Jaroslav Henner <jhenner>
Assignee: Xavier Queralt <xqueralt>
QA Contact: Jaroslav Henner <jhenner>
CC: ajeain, apevec, dallan, hateya, jhenner, ndipanov, sclewis, xqueralt, yeylon
Fixed In Version: openstack-nova-2013.1.3-2.el6ost
Doc Type: Bug Fix
Type: Bug
Bug Blocks: 993100
Last Closed: 2013-09-03 20:21:48 UTC

Attachments: log

Description Jaroslav Henner 2013-08-16 09:35:01 UTC
Description of problem:
Live block migration fails because nova claims it found a disk on the destination that should not be there.

Version-Release number of selected component (if applicable):
openstack-nova-common-2013.1.3-1.el6ost.noarch

How reproducible:
always

Steps to Reproduce:
1. nova live-migration --block-migrate $VM
2. check the compute logs on both hosts (see the sketch under Additional info)

Actual results:
no migration, errors in logs

Expected results:
migrated, no error

Additional info:
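For reference, a minimal sketch of the log check from step 2. The compute log path /var/log/nova/compute.log is an assumption (the default location), and the node names are the ones that appear in the comments below:

for node in node-01.lithium node-02.lithium; do
  # On an affected build this prints the DestinationDiskExists traceback.
  ssh $node grep DestinationDiskExists /var/log/nova/compute.log
done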

Comment 1 Jaroslav Henner 2013-08-16 09:39:16 UTC
Created attachment 787211 [details]
log

Comment 2 Jaroslav Henner 2013-08-16 15:58:33 UTC
I tried to reproduce on a fresh deployment of 2013-08-05.1. The migration passed.
I then updated to 2013-08-15.1 and the migration passed again, so I wonder why it started failing on my production deployment.

Comment 3 Jaroslav Henner 2013-08-16 16:27:07 UTC
I am not sure whether I made some error, but now, with puddle 2013-08-15.1, I can reproduce it:


for node in node-01.lithium node-02.lithium; do echo $node; ssh $node ls /var/lib/nova/instances; echo; done
node-01.lithium                                                                    
Warning: Permanently added 'node-01.lithium' (RSA) to the list of known hosts.
a64c03f1-3d58-4f75-b38a-a526730ca431
_base
locks

node-02.lithium
Warning: Permanently added 'node-02.lithium' (RSA) to the list of known hosts.
a64c03f1-3d58-4f75-b38a-a526730ca431

+-------------------------------------+----------------------------------------------------------+
| Property                            | Value                                                    |
+-------------------------------------+----------------------------------------------------------+
| status                              | BUILD                                                    |
| updated                             | 2013-08-16T16:23:41Z                                     |
| OS-EXT-STS:task_state               | block_device_mapping                                     |
| OS-EXT-SRV-ATTR:host                | node-01.lithium.rhev.lab.eng.brq.redhat.com              |
| key_name                            | None                                                     |
| image                               | cirros1 (ab24ccbd-4c89-4444-b0b2-a06a79c44306)           |
| hostId                              | 911886e953f179550c30da8760ac9d00bd0f5aa76dfcb5c328d7c1e3 |
| OS-EXT-STS:vm_state                 | building                                                 |
| OS-EXT-SRV-ATTR:instance_name       | instance-0000000c                                        |
| OS-EXT-SRV-ATTR:hypervisor_hostname | node-01.lithium.rhev.lab.eng.brq.redhat.com              |
| flavor                              | m1.tiny (1)                                              |
| id                                  | a64c03f1-3d58-4f75-b38a-a526730ca431                     |
...
| config_drive                        |                                                          |
+-------------------------------------+----------------------------------------------------------+
[root@folsom-rhel6 ~(keystone_admin)]# nova live-migration --block-migrate  foo
[root@folsom-rhel6 ~(keystone_admin)]# nova show foo
+-------------------------------------+----------------------------------------------------------+
| Property                            | Value                                                    |
+-------------------------------------+----------------------------------------------------------+
| status                              | ACTIVE                                                   |
| updated                             | 2013-08-16T16:23:58Z                                     |
| OS-EXT-STS:task_state               | None                                                     |
| OS-EXT-SRV-ATTR:host                | node-01.lithium.rhev.lab.eng.brq.redhat.com              |
| key_name                            | None                                                     |
| image                               | cirros1 (ab24ccbd-4c89-4444-b0b2-a06a79c44306)           |
| hostId                              | 911886e953f179550c30da8760ac9d00bd0f5aa76dfcb5c328d7c1e3 |
| OS-EXT-STS:vm_state                 | active                                                   |
| OS-EXT-SRV-ATTR:instance_name       | instance-0000000c                                        |
| OS-EXT-SRV-ATTR:hypervisor_hostname | node-01.lithium.rhev.lab.eng.brq.redhat.com              |
...
| config_drive                        |                                                          |
+-------------------------------------+----------------------------------------------------------+

I saw the instance directory appear on the destination host and then disappear again, so I will retest. I still think this is a regression because I was live-migrating VMs a lot on grizzly OpenStack.

Comment 4 Jaroslav Henner 2013-08-16 22:26:12 UTC
Reproduced. It really doesn't happen in 2013-08-05.1, but it does happen in 2013-08-15.1. I must have been too quick before; checking for the MIGRATING status of the VM is not enough.

Comment 5 Xavier Queralt 2013-08-19 08:57:13 UTC
Proposed a backport to the stable release.

Comment 9 Yogev Rabl 2013-08-25 12:42:50 UTC
We need more info in order to verify this bug:
1. What is the setup of the RHOS components?
2. What is the storage setup?
3. Can you please add the Cinder logs?
4. Please elaborate on which logs we should check (step 2).

Please return the bug to me to verify.

Thanks.

Comment 10 Xavier Queralt 2013-08-26 08:24:21 UTC
(In reply to Yogev Rabl from comment #9)
> We need more info in order to verify this bug:
> 1. What is the setup of the RHOS components?
> 2. What is the storage setup?
> 3. Can you please add the Cinder logs?
> 4. Please elaborate on which logs we should check (step 2).
>
> Please return the bug to me to verify.
>
> Thanks.

1. A plain RHOS setup with at least two compute nodes.
2. Without shared storage (i.e. the instance disks are local).
3. Cinder has nothing to do with this bug. Live block migration moves an instance's disk (not to be confused with a Cinder volume; it is the local disk created from an image when the instance is spawned) from one host to the other.
4. Consequently, the logs you have to check are the compute logs. Look for the DestinationDiskExists exception, which should no longer appear after the fix.
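To illustrate point 3, the disk in question lives under the instance directory on the compute node. The directory name below is the instance UUID from comment 3; the file names are the usual libvirt-driver layout and are an assumption, not taken from this report:

ssh node-01.lithium ls /var/lib/nova/instances/a64c03f1-3d58-4f75-b38a-a526730ca431
# typically: console.log  disk  libvirt.xml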


Steps to reproduce:
1. Create an instance from an image (no volumes).
2. Run "nova show <instance name>" and check on which host the instance is running (see the property OS-EXT-SRV-ATTR:host).
3. Run "nova live-migration --block-migrate <instance name>".
4. Check the compute log file on both hosts; you should not find the DestinationDiskExists exception there.
5. Run "nova show <instance name>" again; the host for this instance must have changed and the instance must be in status ACTIVE.

A scripted sketch of these steps follows below.
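A minimal sketch, assuming a hypothetical instance named "foo", password-less SSH to both compute nodes, and the default compute log path /var/log/nova/compute.log:

VM=foo
SRC=$(nova show "$VM" | awk '/OS-EXT-SRV-ATTR:host /{print $4}')
nova live-migration --block-migrate "$VM"
sleep 30   # give the migration some time to finish
DST=$(nova show "$VM" | awk '/OS-EXT-SRV-ATTR:host /{print $4}')
echo "migrated from $SRC to $DST"   # the two hosts must differ
for node in "$SRC" "$DST"; do
    # After the fix this grep should print nothing.
    ssh "$node" grep DestinationDiskExists /var/log/nova/compute.log
done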

Comment 11 Jaroslav Henner 2013-08-26 13:24:49 UTC
After the upgrade, it works again:
[root@controller ~(keystone_admin)]$ nova show f5c80071-8a4f-4805-8aaa-1487fafca6af | grep host
| OS-EXT-SRV-ATTR:host                | master-01...                          |
| hostId                              | 8c875ab353cd54d8cb39ba4169f51a66c5999a185d598f9754a2e974            |
| OS-EXT-SRV-ATTR:hypervisor_hostname | master-01...                          |
[root@controller ~(keystone_admin)]$ nova live-migration  f5c80071-8a4f-4805-8aaa-1487fafca6af --block-migrate 
[root@controller ~(keystone_admin)]$ nova show f5c80071-8a4f-4805-8aaa-1487fafca6af | grep host
| OS-EXT-SRV-ATTR:host                | master-02...                          |
| hostId                              | e063be730b5e973391d5353e5ce89f1965bedaa2acde75ee08624079            |
| OS-EXT-SRV-ATTR:hypervisor_hostname | master-02...                          |
[root@controller ~(keystone_admin)]$ nova live-migration  f5c80071-8a4f-4805-8aaa-1487fafca6af --block-migrate 
[root@controller ~(keystone_admin)]$ nova show f5c80071-8a4f-4805-8aaa-1487fafca6af | grep host
| OS-EXT-SRV-ATTR:host                | master-01...                          |
| hostId                              | 8c875ab353cd54d8cb39ba4169f51a66c5999a185d598f9754a2e974            |
| OS-EXT-SRV-ATTR:hypervisor_hostname | master-01...                          |

Comment 13 errata-xmlrpc 2013-09-03 20:21:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-1199.html