Bug 1221776 - nova migrate fails with ssh command failure
Summary: nova migrate fails with ssh command failure
Keywords:
Status: CLOSED DUPLICATE of bug 1267598
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ga
Target Release: 10.0 (Newton)
Assignee: Sven Anderson
QA Contact: Shai Revivo
URL:
Whiteboard:
Duplicates: 1156000
Depends On:
Blocks: 1156010 1198809 1241501 1243520 1258302
Reported: 2015-05-14 19:36 UTC by Sean Toner
Modified: 2019-11-14 06:43 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1240356
Environment:
Last Closed: 2016-11-14 20:47:32 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 882143 | 0 | None | None | None | 2016-09-01 14:12:23 UTC

Description Sean Toner 2015-05-14 19:36:13 UTC
Description of problem:
=======================

While attempting to do a nova migrate command, nova hit this error:

==> /var/log/nova/nova-compute.log <==
2015-05-14 15:04:00.908 31772 INFO nova.compute.manager [req-bb21bdc3-5d12-4c39-a2dd-b4a546b2b0cb 3875ca0280ec42e8aae09532749cfa7c d959e9d532c341179a8f037af0aef7aa - - -] [instance: ef5f72e4-2617-43c6-8243-60bd4999c55e] Setting instance back to ACTIVE after: Instance rollback performed due to: Resize error: not able to execute ssh command: Unexpected error while running command.
Command: ssh 10.8.29.230 mkdir -p /var/lib/nova/instances/ef5f72e4-2617-43c6-8243-60bd4999c55e
Exit code: 255
Stdout: u''
Stderr: u'Host key verification failed.\r\n'
2015-05-14 15:04:01.018 31772 INFO nova.scheduler.client.report [req-bb21bdc3-5d12-4c39-a2dd-b4a546b2b0cb 3875ca0280ec42e8aae09532749cfa7c d959e9d532c341179a8f037af0aef7aa - - -] Compute_service record updated for ('rhel71-7-1.lab.eng.rdu2.redhat.com', 'rhel71-7-1.lab.eng.rdu2.redhat.com')
2015-05-14 15:04:01.194 31772 INFO nova.scheduler.client.report [req-bb21bdc3-5d12-4c39-a2dd-b4a546b2b0cb 3875ca0280ec42e8aae09532749cfa7c d959e9d532c341179a8f037af0aef7aa - - -] Compute_service record updated for ('rhel71-7-1.lab.eng.rdu2.redhat.com', 'rhel71-7-1.lab.eng.rdu2.redhat.com')
2015-05-14 15:04:01.195 31772 ERROR oslo_messaging.rpc.dispatcher [req-bb21bdc3-5d12-4c39-a2dd-b4a546b2b0cb 3875ca0280ec42e8aae09532749cfa7c d959e9d532c341179a8f037af0aef7aa - - -] Exception during message handling: Resize error: not able to execute ssh command: Unexpected error while running command.
Command: ssh 10.8.29.230 mkdir -p /var/lib/nova/instances/ef5f72e4-2617-43c6-8243-60bd4999c55e
Exit code: 255
Stdout: u''
Stderr: u'Host key verification failed.\r\n'
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher Traceback (most recent call last):
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     executor_callback))
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     executor_callback)
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 130, in _do_dispatch
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     result = func(ctxt, **new_args)
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6748, in resize_instance
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     clean_shutdown=clean_shutdown)
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/exception.py", line 88, in wrapped
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     payload)
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/exception.py", line 71, in wrapped
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     return f(self, context, *args, **kw)
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 327, in decorated_function
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     LOG.warning(msg, e, instance_uuid=instance_uuid)
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 298, in decorated_function
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 377, in decorated_function
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 286, in decorated_function
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     migration.instance_uuid, exc_info=True)
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 269, in decorated_function
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 355, in decorated_function
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     kwargs['instance'], e, sys.exc_info())
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 343, in decorated_function
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4012, in resize_instance
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     self.instance_events.clear_events_for_instance(instance)
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     self.gen.throw(type, value, traceback)
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6382, in _error_out_instance_on_exception
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     raise error.inner_exception
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher ResizeError: Resize error: not able to execute ssh command: Unexpected error while running command.
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher Command: ssh 10.8.29.230 mkdir -p /var/lib/nova/instances/ef5f72e4-2617-43c6-8243-60bd4999c55e
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher Exit code: 255
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher Stdout: u''
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher Stderr: u'Host key verification failed.\r\n'
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher 



Version-Release number of selected component (if applicable):
=============================================================

[root@rhel71-7-1 ~(keystone_admin)]# rpm -qa | grep nova
openstack-nova-conductor-2015.1.0-3.el7ost.noarch
openstack-nova-common-2015.1.0-3.el7ost.noarch
openstack-nova-cert-2015.1.0-3.el7ost.noarch
python-novaclient-2.23.0-1.el7ost.noarch
openstack-nova-compute-2015.1.0-3.el7ost.noarch
openstack-nova-novncproxy-2015.1.0-3.el7ost.noarch
openstack-nova-api-2015.1.0-3.el7ost.noarch
openstack-nova-console-2015.1.0-3.el7ost.noarch
openstack-nova-scheduler-2015.1.0-3.el7ost.noarch
python-nova-2015.1.0-3.el7ost.noarch


How reproducible:
=================

Always


Steps to Reproduce:
1. nova boot --flavor 1 --image cirros simple
2. nova migrate instance-id


Actual results:
===============

The ssh error indicated above


Expected results:
=================



Additional info:
================

I tried running the failing command as the nova user:

[root@rhel71-7-1 ~(keystone_admin)]# runuser -u nova ssh 10.8.29.230 mkdir -p /var/lib/nova/instances/ef5f72e4-2617-43c6-8243-60bd4999c55e
The authenticity of host '10.8.29.230 (10.8.29.230)' can't be established.
ECDSA key fingerprint is 21:d1:b4:50:dc:56:b0:0a:25:bb:3e:44:d9:f4:bc:e8.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.8.29.230' (ECDSA) to the list of known hosts.
This account is currently not available.

That is the result of running the command listed above as the nova user. However, running the same command as root succeeds.
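
The message "This account is currently not available." is what the nologin program prints, which suggests that the nova account on the target host has a non-interactive login shell and so cannot run commands over ssh. A minimal sketch of that check follows, assuming you have the account's passwd-style entry (for example from `getent passwd nova`). The function names and the example entry are hypothetical and are not taken from this report.

```python
def login_shell(passwd_line):
    """Return the login shell field from one passwd(5)-style line."""
    fields = passwd_line.rstrip("\n").split(":")
    if len(fields) != 7:
        raise ValueError("expected 7 colon-separated fields")
    return fields[6]


def can_run_remote_commands(passwd_line):
    """Heuristic: False when the login shell is nologin or false."""
    shell = login_shell(passwd_line)
    return not (shell.endswith("/nologin") or shell.endswith("/false"))


# Hypothetical entry for illustration only; the real values were not reported.
example = "nova:x:162:162:OpenStack Nova Daemons:/var/lib/nova:/sbin/nologin"
print(can_run_remote_commands(example))  # prints False
```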

Comment 3 Matthew Booth 2015-06-23 13:56:50 UTC
Looks to me like the target host key isn't in /etc/ssh/ssh_known_hosts.

Unfortunately there are a couple of actions which require compute hosts to be able to ssh directly between each other, and this is one of them. This requires host keys to be propagated to ssh_known_hosts on all nova computes which might have to communicate, so safest to do all of them. It also requires ssh keys to have been configured correctly in /var/lib/nova/.ssh on all hosts.

Packstack will do this at installation time for all hosts it installs. However, it obviously can't do it for hosts installed subsequently. My guess is that this host has been installed later, and its keys haven't been propagated. You'll need to do it outside the control of packstack.
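
One way to confirm this diagnosis is to check which compute hosts have no entry in the known_hosts file. The sketch below is only an illustration: the function names, the second IP address, and the key data are made up, and the parser handles only plain host names (hashed entries and wildcard patterns are skipped or not matched).

```python
def hosts_in_known_hosts(text):
    """Collect plain (non-hashed) host names from known_hosts-format text."""
    found = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split()
        if fields[0].startswith("@"):
            fields = fields[1:]  # drop a @cert-authority / @revoked marker
        if len(fields) < 3:
            continue  # malformed: need hosts, key type, key data
        if fields[0].startswith("|"):
            continue  # hashed host names cannot be read back
        found.update(fields[0].split(","))
    return found


def missing_hosts(text, compute_hosts):
    """Return the compute hosts that have no plain entry in the text."""
    known = hosts_in_known_hosts(text)
    return [host for host in compute_hosts if host not in known]


# Hypothetical file content; the key data is a placeholder, not a real key.
sample = "10.8.29.230 ecdsa-sha2-nistp256 AAAAplaceholder\n"
print(missing_hosts(sample, ["10.8.29.230", "10.8.29.231"]))
```

In practice the text would be read from /etc/ssh/ssh_known_hosts on each compute node, and any missing host keys could then be collected with a tool such as ssh-keyscan.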

Comment 5 Jon Schlueter 2015-06-24 17:04:06 UTC
From comment 3 above, it almost sounds as if this is a design flaw and not a bug per se. But it does raise a good point about what "nova migrate <instance id>" should be expected to do, and it suggests better error handling when the migration fails due to permissions, so that it's obvious to the operator what to do to make it work.

Comment 6 Lon Hohberger 2015-06-26 13:36:07 UTC
I agree that the error message is poor, but I don't necessarily think it's a design issue (that level of error handling is typical).

When you do a libvirt migration using ssh, each host must be able to talk to the other without keys or passwords needing to be exchanged.  Each compute host would need a key generated, then the public keys copied to every other compute host.
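
To make the scale of that key exchange concrete, here is a small sketch that enumerates every (source, destination) pair for which the source host's public key must be installed on the destination. The host names and the function name are hypothetical.

```python
from itertools import permutations


def key_distribution_plan(compute_hosts):
    """Return every ordered (source, destination) pair of distinct hosts.

    For each pair, the source host's public key has to be authorized on
    the destination host, so N hosts need N * (N - 1) installations.
    """
    return list(permutations(compute_hosts, 2))


# Hypothetical host names for illustration.
hosts = ["compute-0", "compute-1", "compute-2"]
for source, destination in key_distribution_plan(hosts):
    print(f"install public key of {source} on {destination}")
```

Adding one node to a deployment of N nodes adds 2 * N new pairs, which is why key distribution has to be repeated whenever a compute node is added.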

Nova can't really orchestrate this sort of thing - maybe the director or packstack can.

Deploying N compute nodes is part of the director workflow.

Adding a new compute node to the openstack deployment is certainly something the director would handle (and would need more key distribution once complete!), so I'm moving this to the director.

Comment 7 Jan Provaznik 2015-06-26 17:43:12 UTC
Proper SSH user configuration is not yet automated in RDO director. Setting up SSH on the compute nodes will be covered in the documentation; the doc patch is here:
https://review.gerrithub.io/#/c/236817/

Comment 8 Mike Burns 2015-07-07 00:52:27 UTC
Per PM, moving this to A1.  Cloned bug 1240356 created on Docs team to make sure this is documented.

Comment 11 Mike Burns 2015-08-28 16:43:05 UTC
*** Bug 1156000 has been marked as a duplicate of this bug. ***

Comment 12 Emilien Macchi 2015-08-31 11:24:27 UTC
There are two steps to allow SSH compute migration:

* Generate a dedicated keypair for Nova Compute service, probably not with Puppet but in TripleO tools (during bootstrap).
* Configure Puppet (puppet-nova: ::nova SSH parameters) with the content of the keys, on compute manifests.

puppet-nova will prepare /var/lib/nova/.ssh directories with the keys and configure libvirt migration automatically. After that, you will be able to migrate instances.
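
The first step above could be sketched roughly as follows, assuming the keypair is created with ssh-keygen during bootstrap. The key type, key size, comment string, and function names are assumptions chosen for illustration; they are not what TripleO or puppet-nova actually use.

```python
import subprocess


def keygen_command(key_path, comment="nova-migration"):
    """Build an ssh-keygen command line for a passphrase-less RSA keypair."""
    return [
        "ssh-keygen",
        "-q",             # quiet
        "-t", "rsa",      # key type (an assumption)
        "-b", "2048",     # key size in bits (an assumption)
        "-N", "",         # empty passphrase so the service can use the key
        "-C", comment,    # comment stored in the public key
        "-f", key_path,   # private key path; the public key gets a .pub suffix
    ]


def generate_keypair(key_path):
    """Run ssh-keygen; raises CalledProcessError if it fails."""
    subprocess.run(keygen_command(key_path), check=True)


# Show the command without running it.
print(" ".join(keygen_command("/tmp/nova_migration_key")))
```

The contents of the two generated files would then be passed to the Puppet configuration described in the second step.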

Comment 15 Hugh Brock 2016-02-03 17:32:21 UTC
Emilien, agree, we should fix this. Can you reassign appropriately? Thanks.

Comment 17 Stephen Gordon 2016-11-14 20:47:32 UTC

*** This bug has been marked as a duplicate of bug 1267598 ***

Comment 18 awaugama 2017-09-07 19:05:55 UTC
Dup -- QE will decide about automating the original

