Bug 1469774 - Vm with large volume fails to live migrate. [NEEDINFO]
Status: ASSIGNED
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-cinder
Version: 11.0 (Ocata)
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 11.0 (Ocata)
Assigned To: Jon Bernard
QA Contact: Avi Avraham
Keywords: Triaged, Unconfirmed, ZStream
Depends On:
Blocks:
Reported: 2017-07-11 15:46 EDT by Jeremy
Modified: 2017-08-09 10:42 EDT
CC List: 14 users

Type: Bug
pgrist: needinfo? (jobernar)


Attachments: None
Description Jeremy 2017-07-11 15:46:48 EDT
Description of problem: A VM with an attached volume fails to live migrate, while another VM with an attached volume migrates successfully. The most noticeable difference is that the failing VM has a 500 GB volume and the working VM has a 110 GB volume.

This is currently a single-controller deployment that does not use director.

Details below.

Version-Release number of selected component (if applicable):
openstack-nova-api-15.0.3-3.el7ost.noarch                   Fri Jun  9 17:36:41 2017


How reproducible:
100%

Steps to Reproduce:
1. See Additional info below.

Actual results:
Migration fails.

Expected results:
Migration succeeds.

Additional info:



###Trying to live migrate from compute 5 to compute 8. 

 nova live-migration a623eaf5-0010-46fd-bca1-b5f3b4320e94 e1-compute-08.eng1.moc.edu


Fails for the VM with the 500 GB volume; works for the VM with the 110 GB volume.


[root@e1-control-02 ~]# . keystonerc_admin
[root@e1-control-02 ~(openstack_admin)]# nova show a623eaf5-0010-46fd-bca1-b5f3b4320e94
/usr/lib/python2.7/site-packages/novaclient/client.py:278: UserWarning: The 'tenant_id' argument is deprecated in Ocata and its use may result in errors in future releases. As 'project_id' is provided, the 'tenant_id' argument will be ignored.
  warnings.warn(msg)
+--------------------------------------+----------------------------------------------------------+
| Property                             | Value                                                    |
+--------------------------------------+----------------------------------------------------------+
| OS-DCF:diskConfig                    | AUTO                                                     |
| OS-EXT-AZ:availability_zone          | nova                                                     |
| OS-EXT-SRV-ATTR:host                 | e1-compute-05.eng1.moc.edu                               |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | e1-compute-05.eng1.moc.edu                               |
| OS-EXT-SRV-ATTR:instance_name        | instance-000006c9                                        |
| OS-EXT-STS:power_state               | 1                                                        |
| OS-EXT-STS:task_state                | -                                                        |
| OS-EXT-STS:vm_state                  | active                                                   |
| OS-SRV-USG:launched_at               | 2017-04-14T20:45:38.000000                               |
| OS-SRV-USG:terminated_at             | -                                                        |
| accessIPv4                           |                                                          |
| accessIPv6                           |                                                          |
| config_drive                         |                                                          |
| created                              | 2017-04-14T20:45:25Z                                     |
| flavor                               | m1.large (4)                                             |
| hostId                               | 128883edbe55d8848fc068cabf35e83deba2dd79700dc581c6fa6ca6 |
| id                                   | a623eaf5-0010-46fd-bca1-b5f3b4320e94                     |
| image                                | ubuntu16.04 (a332ad63-8b38-4700-b818-b39fa69233a9)       |
| key_name                             | worldmap                                                 |
| metadata                             | {}                                                       |
| name                                 | wm-postgres                                              |
| os-extended-volumes:volumes_attached | [{"id": "51a5ba24-1e22-4316-8e13-99085f0826d1"}]         |
| progress                             | 0                                                        |
| security_groups                      | default, ping_and_ssh, postgres                          |
| status                               | ACTIVE                                                   |
| tenant_id                            | 00a8d5e942bb442b86733f0f92280fcc                         |
| updated                              | 2017-07-11T18:49:03Z                                     |
| user_id                              | 21c33e79f076445c86933d1f5472bb64                         |
| wm-network network                   | 192.168.100.8, 128.31.22.41                              |
+--------------------------------------+----------------------------------------------------------+



[root@e1-control-02 ~(openstack_admin)]# cinder show 51a5ba24-1e22-4316-8e13-99085f0826d1
+--------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Property                       | Value                                                                                                                                                                                                                                                                                                     |
+--------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| attachments                    | [{'server_id': 'a623eaf5-0010-46fd-bca1-b5f3b4320e94', 'attachment_id': '47e241a6-98f2-4159-b8ae-23fdf2332e76', 'attached_at': '2017-04-18T20:20:39.000000', 'host_name': None, 'volume_id': '51a5ba24-1e22-4316-8e13-99085f0826d1', 'device': '/dev/vdb', 'id': '51a5ba24-1e22-4316-8e13-99085f0826d1'}] |
| availability_zone              | nova                                                                                                                                                                                                                                                                                                      |
| bootable                       | false                                                                                                                                                                                                                                                                                                     |
| consistencygroup_id            | None                                                                                                                                                                                                                                                                                                      |
| created_at                     | 2017-04-18T20:17:55.000000                                                                                                                                                                                                                                                                                |
| description                    |                                                                                                                                                                                                                                                                                                           |
| encrypted                      | False                                                                                                                                                                                                                                                                                                     |
| id                             | 51a5ba24-1e22-4316-8e13-99085f0826d1                                                                                                                                                                                                                                                                      |
| metadata                       | {'readonly': 'False', 'attached_mode': 'rw'}                                                                                                                                                                                                                                                              |
| migration_status               | None                                                                                                                                                                                                                                                                                                      |
| multiattach                    | False                                                                                                                                                                                                                                                                                                     |
| name                           | postgres-data                                                                                                                                                                                                                                                                                             |
| os-vol-host-attr:host          | e1-control-02.eng1.moc.edu#DEFAULT                                                                                                                                                                                                                                                                        |
| os-vol-mig-status-attr:migstat | None                                                                                                                                                                                                                                                                                                      |
| os-vol-mig-status-attr:name_id | None                                                                                                                                                                                                                                                                                                      |
| os-vol-tenant-attr:tenant_id   | 00a8d5e942bb442b86733f0f92280fcc                                                                                                                                                                                                                                                                          |
| readonly                       | False                                                                                                                                                                                                                                                                                                     |
| replication_status             | disabled                                                                                                                                                                                                                                                                                                  |
| size                           | 500                                                                                                                                                                                                                                                                                                       |
| snapshot_id                    | None                                                                                                                                                                                                                                                                                                      |
| source_volid                   | None                                                                                                                                                                                                                                                                                                      |
| status                         | in-use                                                                                                                                                                                                                                                                                                    |
| updated_at                     | 2017-04-18T20:20:40.000000                                                                                                                                                                                                                                                                                |
| user_id                        | 21c33e79f076445c86933d1f5472bb64                                                                                                                                                                                                                                                                          |
| volume_type                    | Ceph                                                                                                                                                                                                                                                                                                      |
+--------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[root@e1-control-02 ~(openstack_admin)]#




###compute 5 log (source compute)
2017-07-11 15:10:36.247 38310 ERROR nova.compute.manager [req-2d3f03eb-315b-4ab8-b702-4e9a218001e9 533ad9ab4fed403fb98f1ffc2f2b4436 c53c18b2d29641e0877bbbd8d87f8267 - - -] [instance: a623eaf5-0010-46fd-bca1-b5f3b4320e94] Pre live migration failed at e1-compute-08.eng1.moc.edu
2017-07-11 15:10:36.247 38310 ERROR nova.compute.manager [instance: a623eaf5-0010-46fd-bca1-b5f3b4320e94] Traceback (most recent call last):
2017-07-11 15:10:36.247 38310 ERROR nova.compute.manager [instance: a623eaf5-0010-46fd-bca1-b5f3b4320e94]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5379, in _do_live_migration
2017-07-11 15:10:36.247 38310 ERROR nova.compute.manager [instance: a623eaf5-0010-46fd-bca1-b5f3b4320e94]     block_migration, disk, dest, migrate_data)
2017-07-11 15:10:36.247 38310 ERROR nova.compute.manager [instance: a623eaf5-0010-46fd-bca1-b5f3b4320e94]   File "/usr/lib/python2.7/site-packages/nova/compute/rpcapi.py", line 723, in pre_live_migration
2017-07-11 15:10:36.247 38310 ERROR nova.compute.manager [instance: a623eaf5-0010-46fd-bca1-b5f3b4320e94]     disk=disk, migrate_data=migrate_data)
2017-07-11 15:10:36.247 38310 ERROR nova.compute.manager [instance: a623eaf5-0010-46fd-bca1-b5f3b4320e94]   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 169, in call
2017-07-11 15:10:36.247 38310 ERROR nova.compute.manager [instance: a623eaf5-0010-46fd-bca1-b5f3b4320e94]     retry=self.retry)
2017-07-11 15:10:36.247 38310 ERROR nova.compute.manager [instance: a623eaf5-0010-46fd-bca1-b5f3b4320e94]   File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 97, in _send
2017-07-11 15:10:36.247 38310 ERROR nova.compute.manager [instance: a623eaf5-0010-46fd-bca1-b5f3b4320e94]     timeout=timeout, retry=retry)
2017-07-11 15:10:36.247 38310 ERROR nova.compute.manager [instance: a623eaf5-0010-46fd-bca1-b5f3b4320e94]   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 505, in send
2017-07-11 15:10:36.247 38310 ERROR nova.compute.manager [instance: a623eaf5-0010-46fd-bca1-b5f3b4320e94]     retry=retry)
2017-07-11 15:10:36.247 38310 ERROR nova.compute.manager [instance: a623eaf5-0010-46fd-bca1-b5f3b4320e94]   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 494, in _send
2017-07-11 15:10:36.247 38310 ERROR nova.compute.manager [instance: a623eaf5-0010-46fd-bca1-b5f3b4320e94]     result = self._waiter.wait(msg_id, timeout)
2017-07-11 15:10:36.247 38310 ERROR nova.compute.manager [instance: a623eaf5-0010-46fd-bca1-b5f3b4320e94]   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 386, in wait
2017-07-11 15:10:36.247 38310 ERROR nova.compute.manager [instance: a623eaf5-0010-46fd-bca1-b5f3b4320e94]     message = self.waiters.get(msg_id, timeout=timeout)
2017-07-11 15:10:36.247 38310 ERROR nova.compute.manager [instance: a623eaf5-0010-46fd-bca1-b5f3b4320e94]   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 275, in get
2017-07-11 15:10:36.247 38310 ERROR nova.compute.manager [instance: a623eaf5-0010-46fd-bca1-b5f3b4320e94]     'to message ID %s' % msg_id)
2017-07-11 15:10:36.247 38310 ERROR nova.compute.manager [instance: a623eaf5-0010-46fd-bca1-b5f3b4320e94] MessagingTimeout: Timed out waiting for a reply to message ID 3aa25b9c93024a91a319d238ce18978a
2017-07-11 15:10:36.247 38310 ERROR nova.compute.manager [instance: a623eaf5-0010-46fd-bca1-b5f3b4320e94]


###compute 8 log (destination compute)

2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server [req-2d3f03eb-315b-4ab8-b702-4e9a218001e9 533ad9ab4fed403fb98f1ffc2f2b4436 c53c18b2d29641e0877bbbd8d87f8267 - - -] Exception during message handling
2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 155, in _process_incoming
2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 222, in dispatch
2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 192, in _do_dispatch
2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 75, in wrapped
2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server     function_name, call_dict, binary)
2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server     self.force_reraise()
2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 66, in wrapped
2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server     return f(self, context, *args, **kw)
2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5117, in remove_volume_connection
2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server     self.volume_api.terminate_connection(context, volume_id, connector)
2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/volume/cinder.py", line 168, in wrapper
2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server     res = method(self, ctx, *args, **kwargs)
2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/volume/cinder.py", line 190, in wrapper
2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server     res = method(self, ctx, volume_id, *args, **kwargs)
2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/volume/cinder.py", line 396, in terminate_connection
2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server     connector)
2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinderclient/v2/volumes.py", line 414, in terminate_connection
2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server     {'connector': connector})
2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinderclient/v2/volumes.py", line 334, in _action
2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server     resp, body = self.api.client.post(url, body=body)
2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinderclient/client.py", line 167, in post
2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server     return self._cs_request(url, 'POST', **kwargs)
2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinderclient/client.py", line 155, in _cs_request
2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server     return self.request(url, method, **kwargs)
2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinderclient/client.py", line 144, in request
2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server     raise exceptions.from_response(resp, body)
2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server ClientException: The server has either erred or is incapable of performing the requested operation. (HTTP 500) (Request-ID: req-1b9399f7-007f-4fa9-ac28-76bd0dec80eb)
2017-07-11 15:11:36.676 185883 ERROR oslo_messaging.rpc.server


2017-07-11 15:11:37.636 185883 ERROR nova.volume.cinder [req-2d3f03eb-315b-4ab8-b702-4e9a218001e9 533ad9ab4fed403fb98f1ffc2f2b4436 c53c18b2d29641e0877bbbd8d87f8267 - - -] Connection between volume 51a5ba24-1e22-4316-8e13-99085f0826d1 and host e1-compute-08.eng1.moc.edu might have succeeded, but attempt to terminate connection has failed. Validate the connection and determine if manual cleanup is needed. Error: The server has either erred or is incapable of performing the requested operation. (HTTP 500) (Request-ID: req-fc112c56-0d5c-4793-b2dd-e0c57d5697cc) Code: 500.





###cinder api.log (nothing in volume.log for that volume)

2017-07-11 15:11:36.633 25492 INFO cinder.api.middleware.fault [req-1b9399f7-007f-4fa9-ac28-76bd0dec80eb 533ad9ab4fed403fb98f1ffc2f2b4436 c53c18b2d29641e0877bbbd8d87f8267 - default default] https://engage1.massopen.cloud:8776/v2/c53c18b2d29641e0877bbbd8d87f8267/volumes/51a5ba24-1e22-4316-8e13-99085f0826d1/action returned with HTTP 500
2017-07-11 15:11:36.634 25492 INFO eventlet.wsgi.server [req-1b9399f7-007f-4fa9-ac28-76bd0dec80eb 533ad9ab4fed403fb98f1ffc2f2b4436 c53c18b2d29641e0877bbbd8d87f8267 - default default] 192.168.128.8 "POST /v2/c53c18b2d29641e0877bbbd8d87f8267/volumes/51a5ba24-1e22-4316-8e13-99085f0826d1/action HTTP/1.1" status: 500  len: 425 time: 60.0830290
2017-07-11 15:11:37.630 25492 ERROR cinder.api.middleware.fault [req-fc112c56-0d5c-4793-b2dd-e0c57d5697cc 533ad9ab4fed403fb98f1ffc2f2b4436 c53c18b2d29641e0877bbbd8d87f8267 - default default] Caught error: <class 'oslo_messaging.exceptions.MessagingTimeout'> Timed out waiting for a reply to message ID 72ee3cb5e1a645e7b03a464f49c7b456
...
2017-07-11 15:11:37.630 25492 ERROR cinder.api.middleware.fault   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 386, in wait
2017-07-11 15:11:37.630 25492 ERROR cinder.api.middleware.fault     message = self.waiters.get(msg_id, timeout=timeout)
2017-07-11 15:11:37.630 25492 ERROR cinder.api.middleware.fault   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 275, in get
2017-07-11 15:11:37.630 25492 ERROR cinder.api.middleware.fault     'to message ID %s' % msg_id)
2017-07-11 15:11:37.630 25492 ERROR cinder.api.middleware.fault MessagingTimeout: Timed out waiting for a reply to message ID 72ee3cb5e1a645e7b03a464f49c7b456
2017-07-11 15:11:37.630 25492 ERROR cinder.api.middleware.fault
2017-07-11 15:11:37.632 25492 INFO cinder.api.middleware.fault [req-fc112c56-0d5c-4793-b2dd-e0c57d5697cc 533ad9ab4fed403fb98f1ffc2f2b4436 c53c18b2d29641e0877bbbd8d87f8267 - default default] https://engage1.massopen.cloud:8776/v2/c53c18b2d29641e0877bbbd8d87f8267/volumes/51a5ba24-1e22-4316-8e13-99085f0826d1/action returned with HTTP 500
2017-07-11 15:11:37.633 25492 INFO eventlet.wsgi.server [req-fc112c56-0d5c-4793-b2dd-e0c57d5697cc 533ad9ab4fed403fb98f1ffc2f2b4436 c53c18b2d29641e0877bbbd8d87f8267 - default default] 192.168.128.8 "POST /v2/c53c18b2d29641e0877bbbd8d87f8267/volumes/51a5ba24-1e22-4316-8e13-99085f0826d1/action HTTP/1.1" status: 500  len: 425 time: 60.0737071


2017-07-11 15:34:56.179 25490 INFO cinder.api.middleware.fault [req-7936b434-fa90-49f7-80e0-c1d28afed984 533ad9ab4fed403fb98f1ffc2f2b4436 c53c18b2d29641e0877bbbd8d87f8267 - default default] https://engage1.massopen.cloud:8776/v2/c53c18b2d29641e0877bbbd8d87f8267/volumes/51a5ba24-1e22-4316-8e13-99085f0826d1/action returned with HTTP 500
2017-07-11 15:34:56.180 25490 INFO eventlet.wsgi.server [req-7936b434-fa90-49f7-80e0-c1d28afed984 533ad9ab4fed403fb98f1ffc2f2b4436 c53c18b2d29641e0877bbbd8d87f8267 - default default] 192.168.128.8 "POST /v2/c53c18b2d29641e0877bbbd8d87f8267/volumes/51a5ba24-1e22-4316-8e13-99085f0826d1/action HTTP/1.1" status: 500  len: 425 time: 60.0803649
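
Each of the three `POST .../action` requests above returns HTTP 500 after roughly 60 seconds. That duration matches the oslo.messaging default `rpc_response_timeout` of 60 seconds, which suggests cinder-api was waiting for a reply from a cinder-volume service that never answered. The sketch below assumes cinder runs with that default, since the report does not show cinder.conf. It parses the `eventlet.wsgi.server` lines, estimates when each request started, and flags requests that ended at the timeout. The log lines are copied from above with their request paths trimmed. `analyse` and the `slack` tolerance are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumption: cinder-api uses the oslo.messaging default rpc_response_timeout (60 s).
RPC_RESPONSE_TIMEOUT = 60.0

# eventlet.wsgi.server lines from the cinder api.log above (paths trimmed with "...").
WSGI_LINES = [
    "2017-07-11 15:11:36.634 25492 INFO eventlet.wsgi.server "
    "[req-1b9399f7-007f-4fa9-ac28-76bd0dec80eb ...] 192.168.128.8 "
    '"POST /v2/.../action HTTP/1.1" status: 500  len: 425 time: 60.0830290',
    "2017-07-11 15:11:37.633 25492 INFO eventlet.wsgi.server "
    "[req-fc112c56-0d5c-4793-b2dd-e0c57d5697cc ...] 192.168.128.8 "
    '"POST /v2/.../action HTTP/1.1" status: 500  len: 425 time: 60.0737071',
    "2017-07-11 15:34:56.180 25490 INFO eventlet.wsgi.server "
    "[req-7936b434-fa90-49f7-80e0-c1d28afed984 ...] 192.168.128.8 "
    '"POST /v2/.../action HTTP/1.1" status: 500  len: 425 time: 60.0803649',
]

# Captures the end timestamp, request ID, HTTP status and duration of a wsgi line.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) .*?"
    r"\[(?P<req>req-[0-9a-f-]+).*?"
    r"status: (?P<status>\d{3}).*?time: (?P<secs>[\d.]+)$"
)


def analyse(lines, timeout=RPC_RESPONSE_TIMEOUT, slack=1.0):
    """Return (request_id, start, end, seconds, hit_timeout) per access-log line.

    The wsgi line is written when the response is sent, so the request
    start is estimated as end - duration.
    """
    results = []
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # not a wsgi access-log line
        end = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
        secs = float(m.group("secs"))
        start = end - timedelta(seconds=secs)
        hit = m.group("status") == "500" and abs(secs - timeout) <= slack
        results.append((m.group("req"), start, end, secs, hit))
    return results


for req, start, end, secs, hit in analyse(WSGI_LINES):
    print(f"{req[:12]} start={start.time()} end={end.time()} {secs:.1f}s timeout={hit}")
```

On these lines, the first request's estimated start is about 15:10:36.55. That is a fraction of a second after the source compute logged its `MessagingTimeout` at 15:10:36.247. The timing is consistent with the destination's `remove_volume_connection` call (same nova request ID `req-2d3f03eb...`) being issued right after the source gave up.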
Comment 5 Jeremy 2017-07-13 09:38:14 EDT
Update:

The customer created a test VM with exactly the same specs and volume size as the failing VM described above, and the test VM migrates fine between all compute nodes. So the question is why the original VM does not. The customer mentions that the volume on the VM that fails to migrate is serving I/O, while the test volume is not.
Comment 6 Jeremy 2017-07-17 10:30:02 EDT
Update:
The VM that fails to live migrate was left at the GRUB menu so that there was no I/O. The VM still fails to migrate.
Comment 8 Jeremy 2017-07-18 11:20:15 EDT
The customer was able to change the volume's old volume type so that the volume works again.

https://github.com/openstack/cinder/blob/544d13ef0a9397a18af506607150b0f2c2c3752c/doc/source/admin/blockstorage-multi-backend.rst
Comment 9 Jon Bernard 2017-07-18 11:51:03 EDT
The volume type of the failing volume references a backend that's no longer running, and I think this is the reason the messages are timing out: the DB defines a topic that no cinder service is subscribed to, because they've all been reconfigured to respond to the new volume type. Changing the volume type to the new value should fix it.
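
The diagnosis above can be checked mechanically. The volume's `os-vol-host-attr:host` is `e1-control-02.eng1.moc.edu#DEFAULT`, which has no `@backend` part. A multi-backend deployment (see the linked guide) instead registers its cinder-volume services as `host@backend`. The sketch below assumes, as the comment suggests, that cinder delivers requests for a volume to the service named after the volume's host with the `#pool` suffix removed. Under that assumption it flags volumes that no running service would receive requests for. `CinderHost`, `parse_cinder_host`, `find_orphaned` and the backend name `ceph` are hypothetical. In practice the running service hosts would come from `cinder service-list`.

```python
from typing import Dict, Iterable, NamedTuple, Optional


class CinderHost(NamedTuple):
    """A cinder host string of the form host[@backend][#pool]."""
    host: str
    backend: Optional[str]
    pool: Optional[str]

    @property
    def service_name(self) -> str:
        # Assumption: requests go to "host" or "host@backend"; the pool is dropped.
        return self.host if self.backend is None else f"{self.host}@{self.backend}"


def parse_cinder_host(value: str) -> CinderHost:
    """Hypothetical helper: split 'host@backend#pool' into its parts."""
    rest, _, pool = value.partition("#")
    host, _, backend = rest.partition("@")
    return CinderHost(host, backend or None, pool or None)


def find_orphaned(volume_hosts: Dict[str, str],
                  running_services: Iterable[str]) -> Dict[str, str]:
    """Return {volume_id: host} for volumes whose service name is not running."""
    up = set(running_services)
    return {vol: h for vol, h in volume_hosts.items()
            if parse_cinder_host(h).service_name not in up}


# Volume host taken from the `cinder show` output in this report.
volumes = {"51a5ba24-1e22-4316-8e13-99085f0826d1": "e1-control-02.eng1.moc.edu#DEFAULT"}
# Hypothetical: after reconfiguration only a "ceph" backend section is running.
running = ["e1-control-02.eng1.moc.edu@ceph"]

for vol, host in find_orphaned(volumes, running).items():
    print(f"{vol}: no running cinder-volume service named "
          f"{parse_cinder_host(host).service_name}")
```

Under these assumptions the volume is reported as orphaned. That result is consistent with the `MessagingTimeout` errors: the request is addressed to a service name that no running cinder-volume process consumes, so cinder-api waits until the RPC timeout expires.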
