Description of problem:
Live block migration fails because nova claims it found a disk that should not be there.

Version-Release number of selected component (if applicable):
openstack-nova-common-2013.1.3-1.el6ost.noarch

How reproducible:
always

Steps to Reproduce:
1. nova live-migration --block-migrate $VM
2. check the logs

Actual results:
no migration, errors in the logs

Expected results:
migrated, no error

Additional info:
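For convenience, the reproduction can also be scripted against the API with python-novaclient. This is only a rough sketch: the credentials and auth URL are placeholders, and it assumes the cirros1 image and m1.tiny flavor that appear elsewhere in this report.

# Rough reproduction sketch using python-novaclient (Grizzly-era v1_1 API).
# Credentials and the auth URL below are placeholders, not values from
# this report.
import time

from novaclient.v1_1 import client

nova = client.Client('admin', 'PASSWORD', 'admin',
                     'http://controller:5000/v2.0/')

image = nova.images.find(name='cirros1')
flavor = nova.flavors.find(name='m1.tiny')
server = nova.servers.create('foo', image, flavor)

# Wait until the instance is ACTIVE before asking for the migration.
while nova.servers.get(server.id).status != 'ACTIVE':
    time.sleep(2)

# Trigger the live block migration; host=None lets the scheduler pick
# the destination. Afterwards, check the compute logs on both nodes.
nova.servers.get(server.id).live_migrate(host=None,
                                         block_migration=True,
                                         disk_over_commit=False)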
Created attachment 787211 [details] log
I tried to reproduce on a fresh deployment of 2013-08-05.1. The migration passed. I updated to 2013-08-15.1. The migration passed again. So I wonder why it started failing on my production deployment.
I am not sure whether I made some error, but now, with puddle 2013-08-15.1, I can reproduce:

for node in node-01.lithium node-02.lithium; do echo $node; ssh $node ls /var/lib/nova/instances; echo; done
node-01.lithium
Warning: Permanently added 'node-01.lithium' (RSA) to the list of known hosts.
a64c03f1-3d58-4f75-b38a-a526730ca431
_base
locks

node-02.lithium
Warning: Permanently added 'node-02.lithium' (RSA) to the list of known hosts.
a64c03f1-3d58-4f75-b38a-a526730ca431

+-------------------------------------+----------------------------------------------------------+
| Property                            | Value                                                    |
+-------------------------------------+----------------------------------------------------------+
| status                              | BUILD                                                    |
| updated                             | 2013-08-16T16:23:41Z                                     |
| OS-EXT-STS:task_state               | block_device_mapping                                     |
| OS-EXT-SRV-ATTR:host                | node-01.lithium.rhev.lab.eng.brq.redhat.com              |
| key_name                            | None                                                     |
| image                               | cirros1 (ab24ccbd-4c89-4444-b0b2-a06a79c44306)           |
| hostId                              | 911886e953f179550c30da8760ac9d00bd0f5aa76dfcb5c328d7c1e3 |
| OS-EXT-STS:vm_state                 | building                                                 |
| OS-EXT-SRV-ATTR:instance_name       | instance-0000000c                                        |
| OS-EXT-SRV-ATTR:hypervisor_hostname | node-01.lithium.rhev.lab.eng.brq.redhat.com              |
| flavor                              | m1.tiny (1)                                              |
| id                                  | a64c03f1-3d58-4f75-b38a-a526730ca431                     |
...
| config_drive                        |                                                          |
+-------------------------------------+----------------------------------------------------------+

[root@folsom-rhel6 ~(keystone_admin)]# nova live-migration --block-migrate foo
[root@folsom-rhel6 ~(keystone_admin)]# nova show foo
+-------------------------------------+----------------------------------------------------------+
| Property                            | Value                                                    |
+-------------------------------------+----------------------------------------------------------+
| status                              | ACTIVE                                                   |
| updated                             | 2013-08-16T16:23:58Z                                     |
| OS-EXT-STS:task_state               | None                                                     |
| OS-EXT-SRV-ATTR:host                | node-01.lithium.rhev.lab.eng.brq.redhat.com              |
| key_name                            | None                                                     |
| image                               | cirros1 (ab24ccbd-4c89-4444-b0b2-a06a79c44306)           |
| hostId                              | 911886e953f179550c30da8760ac9d00bd0f5aa76dfcb5c328d7c1e3 |
| OS-EXT-STS:vm_state                 | active                                                   |
| OS-EXT-SRV-ATTR:instance_name       | instance-0000000c                                        |
| OS-EXT-SRV-ATTR:hypervisor_hostname | node-01.lithium.rhev.lab.eng.brq.redhat.com              |
...
| config_drive                        |                                                          |
+-------------------------------------+----------------------------------------------------------+

I saw that the directory appeared on the destination host, and then it disappeared again. I will try to retest it again. I still think it is a regression, because I was moving VMs a lot in Grizzly OpenStack.
Reproduced. It really doesn't happen in 2013-08-05.1, but it does happen in 2013-08-15.1. I must have been too quick before. Checking for the MIGRATING status of the VM is not enough.
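One way to make that check stricter, sketched with python-novaclient (client setup as in the sketch near the top of this report; the helper name is made up for this example): wait until the task_state clears, then verify that the host actually changed and the instance is ACTIVE.

# Sketch of a stricter post-migration check: instead of relying on seeing
# a MIGRATING status, wait for the task_state to clear and compare hosts.
import time

def migration_succeeded(nova, server_id, old_host, timeout=300):
    deadline = time.time() + timeout
    while time.time() < deadline:
        server = nova.servers.get(server_id)
        task_state = getattr(server, 'OS-EXT-STS:task_state')
        if task_state is None:
            new_host = getattr(server, 'OS-EXT-SRV-ATTR:host')
            return server.status == 'ACTIVE' and new_host != old_host
        time.sleep(2)
    return False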
Proposed backport to the stable release
We need more info in order to verify this bug:
1. What is the setup of the RHOS components?
2. What is the storage setup?
3. Can you please add the Cinder logs?
4. Please elaborate on which logs we should check (step 2).

Please return the bug to me to verify.

Thanks.
(In reply to Yogev Rabl from comment #9)
> We need more info in order to verify this bug:
> 1. What is the setup of the RHOS components?
> 2. What is the storage setup?
> 3. Can you please add the Cinder logs?
> 4. Please elaborate on which logs we should check (step 2).
>
> Please return the bug to me to verify.
>
> Thanks.

1. A plain RHOS setup with at least two compute nodes.
2. Without shared storage (i.e. the instance disks are local).
3. Cinder has nothing to do with this bug. Live block migration moves an instance's disk (not to be confused with a Cinder volume; this is the local disk created from an image when the instance is created) from one host to the other.
4. Consequently, the logs you have to check are the compute logs. Look for the DestinationDiskExists exception, which should not appear after the fix (a rough sketch of where this check happens follows below the steps).

Steps to reproduce:
1. Create an instance from an image (no volumes).
2. Run "nova show <instance name>" and check on which host the instance is running (see the property OS-EXT-SRV-ATTR:host).
3. Run "nova live-migration --block-migrate <instance name>".
4. Check the compute log file on both hosts; you should not find the DestinationDiskExists exception.
5. Run "nova show <instance name>" again; the host of the instance must have changed and the instance must be in status ACTIVE.
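For orientation, the error comes from a check on the destination compute node in the libvirt driver's pre-migration path. The following is only a simplified sketch under that assumption; the function name and argument handling are illustrative, not the exact upstream code.

# Simplified sketch of the destination-side check behind the
# DestinationDiskExists error (illustrative, not the exact nova code;
# the helper name is made up for this example).
import os

from nova import exception


def check_destination_instance_dir(instances_path, instance_uuid):
    instance_dir = os.path.join(instances_path, instance_uuid)
    # For a block migration the instance directory must not already exist
    # on the destination host; a leftover directory from a previous
    # attempt triggers the failure reported in this bug.
    if os.path.exists(instance_dir):
        raise exception.DestinationDiskExists(path=instance_dir)
    os.makedirs(instance_dir)

In practice this surfaces as a DestinationDiskExists traceback in the compute log (typically /var/log/nova/compute.log) on the destination node.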
After upgrade, it works again:

[root@controller ~(keystone_admin)]$ nova show f5c80071-8a4f-4805-8aaa-1487fafca6af | grep host
| OS-EXT-SRV-ATTR:host                | master-01...                                             |
| hostId                              | 8c875ab353cd54d8cb39ba4169f51a66c5999a185d598f9754a2e974 |
| OS-EXT-SRV-ATTR:hypervisor_hostname | master-01...                                             |
[root@controller ~(keystone_admin)]$ nova live-migration f5c80071-8a4f-4805-8aaa-1487fafca6af --block-migrate
[root@controller ~(keystone_admin)]$ nova show f5c80071-8a4f-4805-8aaa-1487fafca6af | grep host
| OS-EXT-SRV-ATTR:host                | master-02...                                             |
| hostId                              | e063be730b5e973391d5353e5ce89f1965bedaa2acde75ee08624079 |
| OS-EXT-SRV-ATTR:hypervisor_hostname | master-02...                                             |
[root@controller ~(keystone_admin)]$ nova live-migration f5c80071-8a4f-4805-8aaa-1487fafca6af --block-migrate
[root@controller ~(keystone_admin)]$ nova show f5c80071-8a4f-4805-8aaa-1487fafca6af | grep host
| OS-EXT-SRV-ATTR:host                | master-01...                                             |
| hostId                              | 8c875ab353cd54d8cb39ba4169f51a66c5999a185d598f9754a2e974 |
| OS-EXT-SRV-ATTR:hypervisor_hostname | master-01...                                             |
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-1199.html