
Bug 1646382

Summary: Live migration of instance with QEMU 2.12.0 on RHEL 7.6 fails
Product: Red Hat OpenStack
Reporter: Vadim Khitrin <vkhitrin>
Component: openstack-nova
Assignee: Lee Yarwood <lyarwood>
Status: CLOSED ERRATA
QA Contact: Joe H. Rahme <jhakimra>
Severity: high
Docs Contact:
Priority: high
Version: 10.0 (Newton)
CC: atelang, berrange, cfontain, dasmith, eglynn, jamsmith, jhakimra, kchamart, lyarwood, pmannidi, sbauza, sgordon, skramaja, ssigwald, supadhya, tcarlin, vromanso, yrachman
Target Milestone: async
Keywords: Triaged, ZStream
Target Release: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-nova-14.1.0-34.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1647362
Environment:
Last Closed: 2018-11-29 19:48:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1647362, 1647441

Description Vadim Khitrin 2018-11-05 13:39:47 UTC
Description of problem:

Attempting to live migrate a guest instance with DPDK NIC results in the following error:

[stack@undercloud-0 ~]$ openstack server list --all
+--------------------------------------+---------------------------------------------+--------+-----------------------------------+---------------------------------------+
| ID                                   | Name                                        | Status | Networks                          | Image Name                            |
+--------------------------------------+---------------------------------------------+--------+-----------------------------------+---------------------------------------+
| c6d4db44-350c-499d-99b8-5cdc40ac192a | tempest-TestDpdkScenarios-server-1407418096 | ACTIVE | data1=10.10.135.108, 10.35.185.93 | rhel-guest-image-7.5-180.x86_64.qcow2 |
+--------------------------------------+---------------------------------------------+--------+-----------------------------------+---------------------------------------+

[stack@undercloud-0 ~]$ openstack server migrate --live compute-0.localdomain --block-migration c6d4db44-350c-499d-99b8-5cdc40ac192a
Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<class 'MigrationError_Remote'> (HTTP 500) (Request-ID: req-15bfb6df-f88c-4ae1-a5dc-a4ccadcbab2b)

The guest enters error state afterwards:

[stack@undercloud-0 ~]$ openstack server show c6d4db44-350c-499d-99b8-5cdc40ac192a
+--------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field                                | Value                                                                                                                                                                 |
+--------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                                                                                                                                                |
| OS-EXT-AZ:availability_zone          | nova                                                                                                                                                                  |
| OS-EXT-SRV-ATTR:host                 | compute-1.localdomain                                                                                                                                                 |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | compute-1.localdomain                                                                                                                                                 |
| OS-EXT-SRV-ATTR:instance_name        | instance-00000047                                                                                                                                                     |
| OS-EXT-STS:power_state               | Running                                                                                                                                                               |
| OS-EXT-STS:task_state                | migrating                                                                                                                                                             |
| OS-EXT-STS:vm_state                  | error                                                                                                                                                                 |
| OS-SRV-USG:launched_at               | 2018-11-05T12:37:55.000000                                                                                                                                            |
| OS-SRV-USG:terminated_at             | None                                                                                                                                                                  |
| accessIPv4                           |                                                                                                                                                                       |
| accessIPv6                           |                                                                                                                                                                       |
| addresses                            | data1=10.10.135.108, 10.35.185.93                                                                                                                                     |
| config_drive                         | True                                                                                                                                                                  |
| created                              | 2018-11-05T12:37:48Z                                                                                                                                                  |
| fault                                | {u'message': u'Disk info file is invalid: qemu-img failed to execute on /var/lib/nova/instances/c6d4db44-350c-499d-99b8-5cdc40ac192a/disk : Unexpected error while    |
|                                      | running command.\nCommand: /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=8 -- env LC_AL', u'code': 400, u'created': u'2018-11-05T13:20:07Z'}     |
| flavor                               | live-migration (4a593598-08d0-4a39-98db-4356b90caf89)                                                                                                                 |
| hostId                               | 8fec9565610917c5cf95909e868bd635d7115c67b1a611b4eb167464                                                                                                              |
| id                                   | c6d4db44-350c-499d-99b8-5cdc40ac192a                                                                                                                                  |
| image                                | rhel-guest-image-7.5-180.x86_64.qcow2 (f9e8a2ac-cc34-4756-8e53-12056400fb9e)                                                                                          |
| key_name                             | tempest-TestDpdkScenarios-1183332307                                                                                                                                  |
| name                                 | tempest-TestDpdkScenarios-server-1407418096                                                                                                                           |
| os-extended-volumes:volumes_attached | []                                                                                                                                                                    |
| project_id                           | 69347614cb9a44ad8e5dbed3db3e80e6                                                                                                                                      |
| properties                           |                                                                                                                                                                       |
| security_groups                      | [{u'name': u'tempest-TestDpdkScenarios-1214750980'}]                                                                                                                  |
| status                               | ERROR                                                                                                                                                                 |
| updated                              | 2018-11-05T13:20:07Z                                                                                                                                                  |
| user_id                              | f0bd094e517d413ba16de31cd0f1da59                                                                                                                                      |
+--------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Snippet of nova-compute.log on compute node that the guest instance resides on (will attach sosreport):
----------------------------------------------------------------------
2018-11-05 13:25:05.448 68539 ERROR nova.compute.manager
2018-11-05 13:25:30.258 68539 INFO nova.compute.manager [-] [instance: c6d4db44-350c-499d-99b8-5cdc40ac192a] During sync_power_state the instance has a pending task (migrating). Skip.
2018-11-05 13:26:07.307 68539 INFO nova.compute.resource_tracker [req-81292d8c-6258-467b-9674-922e6701739e - - - - -] Auditing locally available compute resources for node compute-1.localdomain
2018-11-05 13:26:07.446 68539 ERROR nova.compute.manager [req-81292d8c-6258-467b-9674-922e6701739e - - - - -] Error updating resources for node compute-1.localdomain.
2018-11-05 13:26:07.446 68539 ERROR nova.compute.manager Traceback (most recent call last):
2018-11-05 13:26:07.446 68539 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6460, in update_available_resource_for_node
2018-11-05 13:26:07.446 68539 ERROR nova.compute.manager     rt.update_available_resource(context)
2018-11-05 13:26:07.446 68539 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 511, in update_available_resource
2018-11-05 13:26:07.446 68539 ERROR nova.compute.manager     resources = self.driver.get_available_resource(self.nodename)
2018-11-05 13:26:07.446 68539 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5595, in get_available_resource
2018-11-05 13:26:07.446 68539 ERROR nova.compute.manager     disk_over_committed = self._get_disk_over_committed_size_total()
2018-11-05 13:26:07.446 68539 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7188, in _get_disk_over_committed_size_total
2018-11-05 13:26:07.446 68539 ERROR nova.compute.manager     block_device_info=block_device_info)
2018-11-05 13:26:07.446 68539 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7097, in _get_instance_disk_info
2018-11-05 13:26:07.446 68539 ERROR nova.compute.manager     dk_size = disk_api.get_allocated_disk_size(path)
2018-11-05 13:26:07.446 68539 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/virt/disk/api.py", line 158, in get_allocated_disk_size
2018-11-05 13:26:07.446 68539 ERROR nova.compute.manager     return images.qemu_img_info(path).disk_size
2018-11-05 13:26:07.446 68539 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/virt/images.py", line 72, in qemu_img_info
2018-11-05 13:26:07.446 68539 ERROR nova.compute.manager     raise exception.InvalidDiskInfo(reason=msg)
2018-11-05 13:26:07.446 68539 ERROR nova.compute.manager InvalidDiskInfo: Disk info file is invalid: qemu-img failed to execute on /var/lib/nova/instances/c6d4db44-350c-499d-99b8-5cdc40ac192a/disk : Unexpected error while running command.
2018-11-05 13:26:07.446 68539 ERROR nova.compute.manager Command: /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=8 -- env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/c6d4db44-350c-499d-99b8-5cdc40ac192a/disk
2018-11-05 13:26:07.446 68539 ERROR nova.compute.manager Exit code: 1
2018-11-05 13:26:07.446 68539 ERROR nova.compute.manager Stdout: u''
2018-11-05 13:26:07.446 68539 ERROR nova.compute.manager Stderr: u'qemu-img: Could not open \'/var/lib/nova/instances/c6d4db44-350c-499d-99b8-5cdc40ac192a/disk\': Failed to get shared "write" lock\nIs another process using the image?\n'
2018-11-05 13:26:07.446 68539 ERROR nova.compute.manager
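
The Stderr line in the traceback above is the message QEMU prints when its image locking refuses to open a disk that a running guest already holds. As a rough illustration only, the sketch below recognises that message and builds a `qemu-img info` command line with `--force-share` (`-U`), an option qemu-img has had since QEMU 2.10 for read-only queries on in-use images. The helper names `is_image_lock_error` and `qemu_img_info_cmd` are hypothetical, and this report does not state that the shipped fix works this way.

```python
import re

# Pattern for the locking message seen in the Stderr line of the traceback,
# e.g. 'Failed to get shared "write" lock'.
LOCK_ERROR = re.compile(r'Failed to get (shared )?"\w+" lock')


def is_image_lock_error(stderr):
    """Return True if qemu-img stderr reports an image-lock conflict."""
    return bool(LOCK_ERROR.search(stderr))


def qemu_img_info_cmd(path, force_share=False):
    """Build the argv for 'qemu-img info' (hypothetical helper).

    With force_share=True, --force-share is added so that qemu-img opens
    the image in shared mode instead of requesting the lock.
    """
    cmd = ["env", "LC_ALL=C", "LANG=C", "qemu-img", "info"]
    if force_share:
        cmd.append("--force-share")
    cmd.append(path)
    return cmd
```

A caller could run the plain command first and, only if `is_image_lock_error` matches the captured stderr, retry with `force_share=True`. Whether the option is available depends on the installed QEMU version.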

Version-Release number of selected component (if applicable): 2018-10-30.1

How reproducible: always

Steps to Reproduce:
1. Create network
2. Spawn guest instance with vhost NIC
3. Attempt to live migrate guest instance

Actual results:

Live migration fails, and if the command is run manually with the 'openstack' CLI, the guest instance enters the ERROR state.

Expected results:

Live migration successful

Additional info:

Comment 10 Joe H. Rahme 2018-11-28 14:58:55 UTC
Verified that live migrations work successfully in the latest puddle:

[stack@undercloud-0 ~]$ cat /etc/yum.repos.d/latest-installed 
10   -p 2018-11-27.1
[stack@undercloud-0 ~]$ yum list installed | grep nova
openstack-nova-api.noarch        1:14.1.0-33.el7ost     @rhelosp-10.0-puddle    
openstack-nova-cert.noarch       1:14.1.0-33.el7ost     @rhelosp-10.0-puddle    
openstack-nova-common.noarch     1:14.1.0-33.el7ost     @rhelosp-10.0-puddle    
openstack-nova-compute.noarch    1:14.1.0-33.el7ost     @rhelosp-10.0-puddle    
openstack-nova-conductor.noarch  1:14.1.0-33.el7ost     @rhelosp-10.0-puddle    
openstack-nova-scheduler.noarch  1:14.1.0-33.el7ost     @rhelosp-10.0-puddle    
puppet-nova.noarch               9.6.0-9.el7ost         @rhelosp-10.0-puddle    
python-nova.noarch               1:14.1.0-33.el7ost     @rhelosp-10.0-puddle    
python-novaclient.noarch         1:6.0.2-2.el7ost       @rhelosp-10.0-puddle    



+--------------------------------------+----------------------------------------------------------+
| Field                                | Value                                                    |
+--------------------------------------+----------------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                                   |
| OS-EXT-AZ:availability_zone          | nova                                                     |
| OS-EXT-SRV-ATTR:host                 | compute-1.localdomain                                    |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | compute-1.localdomain                                    |
| OS-EXT-SRV-ATTR:instance_name        | instance-00000001                                        |
| OS-EXT-STS:power_state               | Running                                                  |
| OS-EXT-STS:task_state                | None                                                     |
| OS-EXT-STS:vm_state                  | active                                                   |
| OS-SRV-USG:launched_at               | 2018-11-28T14:52:21.000000                               |
| OS-SRV-USG:terminated_at             | None                                                     |
| accessIPv4                           |                                                          |
| accessIPv6                           |                                                          |
| addresses                            | private=192.168.100.3                                    |
| adminPass                            | 7ykNyFzk75yC                                             |
| config_drive                         |                                                          |
| created                              | 2018-11-28T14:52:14Z                                     |
| flavor                               | m1.smoke (63be7aed-3169-4039-b735-d37b6b9cf3f9)          |
| hostId                               | 1ecf895c3720630f9528eb064991daed0e01eeaaeaf608816a6e97d7 |
| id                                   | c857847b-0c15-49db-8700-9e517da1eb4c                     |
| image                                | cirros (7b602bba-8074-4329-a2e9-2f5151a133d1)            |
| key_name                             | None                                                     |
| name                                 | test-23711                                               |
| os-extended-volumes:volumes_attached | []                                                       |
| progress                             | 0                                                        |
| project_id                           | 03e91516409442df96460b82c5df9bbd                         |
| properties                           |                                                          |
| security_groups                      | [{u'name': u'default'}]                                  |
| status                               | ACTIVE                                                   |
| updated                              | 2018-11-28T14:52:21Z                                     |
| user_id                              | d4a28b5b16c742afa9f92c557e407e26                         |
+--------------------------------------+----------------------------------------------------------+
[stack@undercloud-0 ~]$ openstack server migrate --live compute-0.localdomain --block-migration test-23711
[stack@undercloud-0 ~]$ sleep 10 && openstack server list
+--------------------------------------+------------+--------+-----------------------+------------+
| ID                                   | Name       | Status | Networks              | Image Name |
+--------------------------------------+------------+--------+-----------------------+------------+
| c857847b-0c15-49db-8700-9e517da1eb4c | test-23711 | ACTIVE | private=192.168.100.3 | cirros     |
+--------------------------------------+------------+--------+-----------------------+------------+
[stack@undercloud-0 ~]$ openstack server show test-23711
+--------------------------------------+----------------------------------------------------------+
| Field                                | Value                                                    |
+--------------------------------------+----------------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                                   |
| OS-EXT-AZ:availability_zone          | nova                                                     |
| OS-EXT-SRV-ATTR:host                 | compute-0.localdomain                                    |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | compute-0.localdomain                                    |
| OS-EXT-SRV-ATTR:instance_name        | instance-00000001                                        |
| OS-EXT-STS:power_state               | Running                                                  |
| OS-EXT-STS:task_state                | None                                                     |
| OS-EXT-STS:vm_state                  | active                                                   |
| OS-SRV-USG:launched_at               | 2018-11-28T14:52:21.000000                               |
| OS-SRV-USG:terminated_at             | None                                                     |
| accessIPv4                           |                                                          |
| accessIPv6                           |                                                          |
| addresses                            | private=192.168.100.3                                    |
| config_drive                         |                                                          |
| created                              | 2018-11-28T14:52:14Z                                     |
| flavor                               | m1.smoke (63be7aed-3169-4039-b735-d37b6b9cf3f9)          |
| hostId                               | 5be89ae57a28e58ca2cf8a5049d9d76db01d1cceacbb3b5885776d5c |
| id                                   | c857847b-0c15-49db-8700-9e517da1eb4c                     |
| image                                | cirros (7b602bba-8074-4329-a2e9-2f5151a133d1)            |
| key_name                             | None                                                     |
| name                                 | test-23711                                               |
| os-extended-volumes:volumes_attached | []                                                       |
| progress                             | 0                                                        |
| project_id                           | 03e91516409442df96460b82c5df9bbd                         |
| properties                           |                                                          |
| security_groups                      | [{u'name': u'default'}]                                  |
| status                               | ACTIVE                                                   |
| updated                              | 2018-11-28T14:53:22Z                                     |
| user_id                              | d4a28b5b16c742afa9f92c557e407e26                         |
+--------------------------------------+----------------------------------------------------------+
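
The manual check above (compare the host field before and after, confirm ACTIVE status and no pending task) could be scripted roughly as follows. This is a sketch only: it assumes the two dictionaries come from parsing `openstack server show -f json` output, and `migration_succeeded` is a hypothetical helper name, not part of any OpenStack tool.

```python
HOST_KEY = "OS-EXT-SRV-ATTR:host"
TASK_KEY = "OS-EXT-STS:task_state"


def migration_succeeded(before, after, target_host):
    """Return True if 'after' shows the server active on target_host.

    'before' and 'after' are dicts of server fields captured before and
    after the migration request (assumed to be parsed JSON output).
    """
    return (
        before[HOST_KEY] != after[HOST_KEY]
        and after[HOST_KEY] == target_host
        and after["status"] == "ACTIVE"
        # The task state may arrive as JSON null or as the string "None".
        and after[TASK_KEY] in (None, "None")
    )


# Values taken from the two 'server show' tables in this comment.
before = {HOST_KEY: "compute-1.localdomain", "status": "ACTIVE", TASK_KEY: None}
after = {HOST_KEY: "compute-0.localdomain", "status": "ACTIVE", TASK_KEY: None}
print(migration_succeeded(before, after, "compute-0.localdomain"))
```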

Comment 15 errata-xmlrpc 2018-11-29 19:48:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3735

Comment 16 Kashyap Chamarthy 2018-12-05 08:47:33 UTC
*** Bug 1654804 has been marked as a duplicate of this bug. ***

Comment 17 Kashyap Chamarthy 2018-12-05 09:02:01 UTC
*** Bug 1533444 has been marked as a duplicate of this bug. ***

Comment 18 Lee Yarwood 2019-04-05 07:55:54 UTC
*** Bug 1696578 has been marked as a duplicate of this bug. ***