Bug 1347461 - Nova Live Migration Failed to Update Database
Summary: Nova Live Migration Failed to Update Database
Keywords:
Status: CLOSED EOL
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 5.0 (RHEL 7)
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Eoghan Glynn
QA Contact: Prasanth Anbalagan
URL:
Whiteboard:
Depends On:
Blocks: 1630771
 
Reported: 2016-06-17 00:02 UTC by Faiaz Ahmed
Modified: 2022-08-10 09:48 UTC
CC: 14 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1630771 (view as bug list)
Environment:
Last Closed: 2018-01-26 11:58:18 UTC
Target Upstream Version:
Embargoed:




Links
Red Hat Issue Tracker OSP-7797 (last updated 2022-08-10 09:48:20 UTC)
Red Hat Knowledge Base (Solution) 2070503 (last updated 2016-06-17 00:06:27 UTC)

Description Faiaz Ahmed 2016-06-17 00:02:21 UTC
Description of problem:
The live migration itself completes successfully between Nova compute nodes, but the Nova database is not updated with the new host.

Here is a short example.
The following instance was migrated from test-box234 to test-box099 at 2016-06-11 13:30 UTC (or 2016-06-11 06:30 MST):

| OS-EXT-SRV-ATTR:host                 | test-box099.test.gss.lab                                                                 |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | test-box099.test.gss.lab                                                                 |
| OS-EXT-SRV-ATTR:instance_name        | instance-000031dd                                                                                |
| OS-SRV-USG:launched_at               | 2016-05-02T20:44:04.000000                                                                       |
| created                              | 2016-05-02T20:43:58Z                                                                             |
| flavor                               | 8x32 (101)                                                                                       |
| hostId                               | c14fba39a498eda92e0a9d68eaa56ee303a4c2d944185659a3a66867                                         |
| id                                   | 3690dc0c-6973-413b-8ed1-50d05b5213fe                                                             |
| image                                | Template-RHEL6x64 (295f5422-a4fa-4744-9dfa-9d458dfe5a12)                                         |
| name                                 | lgposput00252                                                                                    |
| os-extended-volumes:volumes_attached | [{"id": "c202abda-dba8-407c-9704-1eca7a82901b"}, {"id": "a65d9eb0-de55-48be-9153-a7ef7fd47072"}] |
| prod_app_416_intranet_ipc2_2 network | 10.18.141.89                                                                                     |
| status                               | ACTIVE                                                                                           |
| updated                              | 2016-06-16T00:13:22Z                                                                             |
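
In cases like this, the quickest way to see the mismatch is to compare what the database records with what libvirt reports on the destination. A minimal verification sketch follows; it is hedged: the database name "nova", MySQL access, and SSH access to the compute node are assumptions, not confirmed in this report, while the UUID and hostname are the ones from the example above.

What the Nova database records for the instance:

mysql nova -e "SELECT host, node, vm_state, task_state FROM instances WHERE uuid='3690dc0c-6973-413b-8ed1-50d05b5213fe';"

What libvirt actually reports on the destination compute node:

ssh test-box099.test.gss.lab 'virsh list --all | grep instance-000031dd'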


/var/log/libvirt/qemu/instance-000031dd.log on both nodes indicates that the instance was properly brought up on the destination and shut down on the source:

test-box099:2016-06-11 13:30:15.367+0000: starting up libvirt version: 1.2.17, package: 13.el7_2.3 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2016-01-27-16:53:13, x86-036.build.eng.bos.redhat.com), qemu version: 2.3.0 (qemu-kvm-rhev-2.3.0-31.el7_2.7)
test-box234:2016-06-11 13:32:05.772+0000: shutting down

According to nova-compute.log on test-box234:

2016-06-11 06:32:10.586 9258 INFO nova.compute.manager [-] [instance: 3690dc0c-6973-413b-8ed1-50d05b5213fe] Migrating instance to test-box099.test.gss.labfinished successfully.

According to nova-compute.log on test-box099, this might be related to a lost connection to RabbitMQ:

2016-06-11 06:32:10.367 24004 INFO nova.compute.manager [req-edd77dbb-fc6e-49eb-99c0-d5dab2da38f7 fe4729cf6a7e462fb8925daf36ae3c8e ac59371df2ba455b939d6ed2f79c6b04] [instance: 3690dc0c-6973-413b-8ed1-50d05b5213fe] Post operation of migration started

2016-06-11 06:33:10.506 24004 ERROR oslo.messaging.rpc.dispatcher [-] Exception during message handling: Timed out waiting for a reply to message ID b6a08cc87e0f4c8e9604cf6ee2331c52
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher Traceback (most recent call last):
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line 133, in _dispatch_and_reply
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher     incoming.message))
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line 176, in _dispatch
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher     return self._do_dispatch(endpoint, method, ctxt, args)
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line 122, in _do_dispatch
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher     result = getattr(endpoint, method)(ctxt, **new_args)
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 423, in decorated_function
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/exception.py", line 88, in wrapped
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher     payload)
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/openstack/common/excutils.py", line 68, in __exit__
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/exception.py", line 71, in wrapped
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher     return f(self, context, *args, **kw)
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 329, in decorated_function
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher     e, sys.exc_info())
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/openstack/common/excutils.py", line 68, in __exit__
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 316, in decorated_function
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4864, in post_live_migration_at_destination
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher     self.host)
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/network/api.py", line 101, in wrapped
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher     return func(self, context, *args, **kwargs)
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/network/api.py", line 541, in setup_networks_on_host
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher     self.network_rpcapi.setup_networks_on_host(context, **args)
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/network/rpcapi.py", line 270, in setup_networks_on_host
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher     teardown=teardown)
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/client.py", line 361, in call
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher     return self.prepare().call(ctxt, method, **kwargs)
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/client.py", line 150, in call
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher     wait_for_reply=True, timeout=timeout)
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo/messaging/transport.py", line 90, in _send
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher     timeout=timeout)
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 452, in send
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher     return self._send(target, ctxt, message, wait_for_reply, timeout)
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 443, in _send
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher     result = self._waiter.wait(msg_id, timeout)
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 306, in wait
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher     reply, ending = self._poll_connection(msg_id, timer)
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 253, in _poll_connection
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher     self._raise_timeout_exception(msg_id)
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 222, in _raise_timeout_exception
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher     'Timed out waiting for a reply to message ID %s' % msg_id)
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher MessagingTimeout: Timed out waiting for a reply to message ID b6a08cc87e0f4c8e9604cf6ee2331c52
2016-06-11 06:33:10.506 24004 TRACE oslo.messaging.rpc.dispatcher 
2016-06-11 06:33:10.541 24004 ERROR oslo.messaging._drivers.common [-] Returning exception Timed out waiting for a reply to message ID b6a08cc87e0f4c8e9604cf6ee2331c52 to caller
2016-06-11 06:33:10.541 24004 ERROR oslo.messaging._drivers.common [-] ['Traceback (most recent call last):\n', '  File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line 133, in _dispatch_and_reply\n    incoming.message))\n', '  File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line 176, in _dispatch\n    return self._do_dispatch(endpoint, method, ctxt, args)\n', '  File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line 122, in _do_dispatch\n    result = getattr(endpoint, method)(ctxt, **new_args)\n', '  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 423, in decorated_function\n    return function(self, context, *args, **kwargs)\n', '  File "/usr/lib/python2.7/site-packages/nova/exception.py", line 88, in wrapped\n    payload)\n', '  File "/usr/lib/python2.7/site-packages/nova/openstack/common/excutils.py", line 68, in __exit__\n    six.reraise(self.type_, self.value, self.tb)\n', '  File "/usr/lib/python2.7/site-packages/nova/exception.py", line 71, in wrapped\n    return f(self, context, *args, **kw)\n', '  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 329, in decorated_function\n    e, sys.exc_info())\n', '  File "/usr/lib/python2.7/site-packages/nova/openstack/common/excutils.py", line 68, in __exit__\n    six.reraise(self.type_, self.value, self.tb)\n', '  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 316, in decorated_function\n    return function(self, context, *args, **kwargs)\n', '  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4864, in post_live_migration_at_destination\n    self.host)\n', '  File "/usr/lib/python2.7/site-packages/nova/network/api.py", line 101, in wrapped\n    return func(self, context, *args, **kwargs)\n', '  File "/usr/lib/python2.7/site-packages/nova/network/api.py", line 541, in setup_networks_on_host\n    self.network_rpcapi.setup_networks_on_host(context, **args)\n', '  File "/usr/lib/python2.7/site-packages/nova/network/rpcapi.py", line 270, in setup_networks_on_host\n    teardown=teardown)\n', '  File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/client.py", line 361, in call\n    return self.prepare().call(ctxt, method, **kwargs)\n', '  File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/client.py", line 150, in call\n    wait_for_reply=True, timeout=timeout)\n', '  File "/usr/lib/python2.7/site-packages/oslo/messaging/transport.py", line 90, in _send\n    timeout=timeout)\n', '  File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 452, in send\n    return self._send(target, ctxt, message, wait_for_reply, timeout)\n', '  File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 443, in _send\n    result = self._waiter.wait(msg_id, timeout)\n', '  File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 306, in wait\n    reply, ending = self._poll_connection(msg_id, timer)\n', '  File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 253, in _poll_connection\n    self._raise_timeout_exception(msg_id)\n', '  File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 222, in _raise_timeout_exception\n    \'Timed out waiting for a reply to message ID %s\' % msg_id)\n', 'MessagingTimeout: Timed out waiting for a reply to message ID b6a08cc87e0f4c8e9604cf6ee2331c52\n']
2016-06-11 06:33:20.366 24004 WARNING oslo.messaging._drivers.amqpdriver [-] No calling threads waiting for msg_id : b6a08cc87e0f4c8e9604cf6ee2331c52, message : {u'_unique_id': u'a34017e552d44eeaa72d646e2a6a3417', u'failure': u'{"args": ["Timed out waiting for a reply to message ID 252e8beae9c5452f9704403efa25727a"], "module": "oslo.messaging.exceptions", "kwargs": {}, "message": "Timed out waiting for a reply to message ID 252e8beae9c5452f9704403efa25727a", "tb": ["Traceback (most recent call last):\\n", "  File \\"/usr/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py\\", line 133, in _dispatch_and_reply\\n    incoming.message))\\n", "  File \\"/usr/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py\\", line 176, in _dispatch\\n    return self._do_dispatch(endpoint, method, ctxt, args)\\n", "  File \\"/usr/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py\\", line 122, in _do_dispatch\\n    result = getattr(endpoint, method)(ctxt, **new_args)\\n", "  File \\"/usr/lib/python2.7/site-packages/nova/network/manager.py\\", line 1323, in setup_networks_on_host\\n    gt.wait()\\n", "  File \\"/usr/lib/python2.7/site-packages/eventlet/greenthread.py\\", line 168, in wait\\n    return self._exit_event.wait()\\n", "  File \\"/usr/lib/python2.7/site-packages/eventlet/event.py\\", line 116, in wait\\n    return hubs.get_hub().switch()\\n", "  File \\"/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py\\", line 187, in switch\\n    return self.greenlet.switch()\\n", "  File \\"/usr/lib/python2.7/site-packages/eventlet/greenthread.py\\", line 194, in main\\n    result = function(*args, **kwargs)\\n", "  File \\"/usr/lib/python2.7/site-packages/nova/network/rpcapi.py\\", line 281, in rpc_setup_network_on_host\\n    network_id=network_id, teardown=teardown)\\n", "  File \\"/usr/lib/python2.7/site-packages/oslo/messaging/rpc/client.py\\", line 150, in call\\n    wait_for_reply=True, timeout=timeout)\\n", "  File \\"/usr/lib/python2.7/site-packages/oslo/messaging/transport.py\\", line 90, in _send\\n    timeout=timeout)\\n", "  File \\"/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py\\", line 452, in send\\n    return self._send(target, ctxt, message, wait_for_reply, timeout)\\n", "  File \\"/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py\\", line 443, in _send\\n    result = self._waiter.wait(msg_id, timeout)\\n", "  File \\"/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py\\", line 306, in wait\\n    reply, ending = self._poll_connection(msg_id, timer)\\n", "  File \\"/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py\\", line 253, in _poll_connection\\n    self._raise_timeout_exception(msg_id)\\n", "  File \\"/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py\\", line 222, in _raise_timeout_exception\\n    \'Timed out waiting for a reply to message ID %s\' % msg_id)\\n", "MessagingTimeout: Timed out waiting for a reply to message ID 252e8beae9c5452f9704403efa25727a\\n"], "class": "MessagingTimeout"}', u'result': None}
2016-06-11 06:33:20.378 24004 WARNING oslo.messaging._drivers.amqpdriver [-] _queues: {'5b7f431076a6475b93372c8434469a05': <Queue at 0x36f0850 maxsize=None>}
2016-06-11 06:33:20.379 24004 WARNING oslo.messaging._drivers.amqpdriver [-] No calling threads waiting for msg_id : b6a08cc87e0f4c8e9604cf6ee2331c52, message : {u'_unique_id': u'bdd7eead08594774b9371b7efc985078', u'failure': None, u'result': None, u'ending': True}
2016-06-11 06:33:20.379 24004 WARNING oslo.messaging._drivers.amqpdriver [-] _queues: {'5b7f431076a6475b93372c8434469a05': <Queue at 0x36f0850 maxsize=None>}
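
Note the exact 60 seconds between "Post operation of migration started" (06:32:10) and the MessagingTimeout (06:33:10); that matches the default oslo.messaging rpc_response_timeout of 60 seconds. A hedged mitigation sketch, not a root-cause fix: give slow nova-network/RabbitMQ replies more headroom on the compute nodes (the value 180 below is an arbitrary example).

crudini --set /etc/nova/nova.conf DEFAULT rpc_response_timeout 180
systemctl restart openstack-nova-compute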



Workaround
===========
This can only be resolved by manually updating the database:

update instances set host='NEW COMPUTE HOSTNAME',node='NEW COMPUTE HOSTNAME' where uuid='UUID';

https://access.redhat.com/solutions/2070503
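
A slightly fuller, hedged version of the same workaround; the database name "nova" is an assumption, and the hostname and UUID are the ones from this report.

Back up the affected row first:

mysqldump nova instances --where="uuid='3690dc0c-6973-413b-8ed1-50d05b5213fe'" > /root/instance-3690dc0c-before.sql

Point the record at the compute node the guest is actually running on:

mysql nova -e "UPDATE instances SET host='test-box099.test.gss.lab', node='test-box099.test.gss.lab' WHERE uuid='3690dc0c-6973-413b-8ed1-50d05b5213fe';"

Confirm the API now reports the new host:

nova show 3690dc0c-6973-413b-8ed1-50d05b5213fe | grep OS-EXT-SRV-ATTR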

Comment 3 Faiaz Ahmed 2016-06-17 00:05:36 UTC
On the RabbitMQ side:

What was happening around 11-Jun-2016::06:33?

# egrep -E "11-Jun-2016" *-sosreport-*.tar.xz/sosreport-*/var/log/rabbitmq/ -rl 

The RabbitMQ logs show no crashes or errors on any of the controller nodes.

Comment 6 Jack Waterworth 2016-11-30 15:08:09 UTC
Any movement on this one?

Comment 7 Angela Soni 2017-02-20 20:52:15 UTC
Following up to check for any updates.

Comment 9 David Hill 2017-09-01 22:24:28 UTC
We just had a similar issue where the VM was migrated to the destination compute host but, in the database, it stayed on the source compute host. This is using RHOSP 10.

Comment 11 Lee Yarwood 2018-01-08 11:21:06 UTC
(In reply to David Hill from comment #9)
> We just had a similar issue where the VM is migrated to the destination
> compute host but in the database, it stayed on the source compute host.  
> This is using RHOSP 10.

Any additional details? Suggesting that this is an issue on more recent, supported versions of OSP and providing zero evidence is not helpful.

