Description of problem:
=======================
While attempting to do a nova migrate command, nova hit this error:

==> /var/log/nova/nova-compute.log <==
2015-05-14 15:04:00.908 31772 INFO nova.compute.manager [req-bb21bdc3-5d12-4c39-a2dd-b4a546b2b0cb 3875ca0280ec42e8aae09532749cfa7c d959e9d532c341179a8f037af0aef7aa - - -] [instance: ef5f72e4-2617-43c6-8243-60bd4999c55e] Setting instance back to ACTIVE after: Instance rollback performed due to: Resize error: not able to execute ssh command: Unexpected error while running command.
Command: ssh 10.8.29.230 mkdir -p /var/lib/nova/instances/ef5f72e4-2617-43c6-8243-60bd4999c55e
Exit code: 255
Stdout: u''
Stderr: u'Host key verification failed.\r\n'
2015-05-14 15:04:01.018 31772 INFO nova.scheduler.client.report [req-bb21bdc3-5d12-4c39-a2dd-b4a546b2b0cb 3875ca0280ec42e8aae09532749cfa7c d959e9d532c341179a8f037af0aef7aa - - -] Compute_service record updated for ('rhel71-7-1.lab.eng.rdu2.redhat.com', 'rhel71-7-1.lab.eng.rdu2.redhat.com')
2015-05-14 15:04:01.194 31772 INFO nova.scheduler.client.report [req-bb21bdc3-5d12-4c39-a2dd-b4a546b2b0cb 3875ca0280ec42e8aae09532749cfa7c d959e9d532c341179a8f037af0aef7aa - - -] Compute_service record updated for ('rhel71-7-1.lab.eng.rdu2.redhat.com', 'rhel71-7-1.lab.eng.rdu2.redhat.com')
2015-05-14 15:04:01.195 31772 ERROR oslo_messaging.rpc.dispatcher [req-bb21bdc3-5d12-4c39-a2dd-b4a546b2b0cb 3875ca0280ec42e8aae09532749cfa7c d959e9d532c341179a8f037af0aef7aa - - -] Exception during message handling: Resize error: not able to execute ssh command: Unexpected error while running command.
Command: ssh 10.8.29.230 mkdir -p /var/lib/nova/instances/ef5f72e4-2617-43c6-8243-60bd4999c55e
Exit code: 255
Stdout: u''
Stderr: u'Host key verification failed.\r\n'
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher Traceback (most recent call last):
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     executor_callback))
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     executor_callback)
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 130, in _do_dispatch
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     result = func(ctxt, **new_args)
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6748, in resize_instance
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     clean_shutdown=clean_shutdown)
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/exception.py", line 88, in wrapped
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     payload)
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/exception.py", line 71, in wrapped
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     return f(self, context, *args, **kw)
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 327, in decorated_function
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     LOG.warning(msg, e, instance_uuid=instance_uuid)
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 298, in decorated_function
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 377, in decorated_function
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 286, in decorated_function
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     migration.instance_uuid, exc_info=True)
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 269, in decorated_function
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 355, in decorated_function
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     kwargs['instance'], e, sys.exc_info())
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 343, in decorated_function
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4012, in resize_instance
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     self.instance_events.clear_events_for_instance(instance)
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     self.gen.throw(type, value, traceback)
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6382, in _error_out_instance_on_exception
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher     raise error.inner_exception
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher ResizeError: Resize error: not able to execute ssh command: Unexpected error while running command.
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher Command: ssh 10.8.29.230 mkdir -p /var/lib/nova/instances/ef5f72e4-2617-43c6-8243-60bd4999c55e
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher Exit code: 255
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher Stdout: u''
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher Stderr: u'Host key verification failed.\r\n'
2015-05-14 15:04:01.195 31772 TRACE oslo_messaging.rpc.dispatcher

Version-Release number of selected component (if applicable):
=============================================================
[root@rhel71-7-1 ~(keystone_admin)]# rpm -qa | grep nova
openstack-nova-conductor-2015.1.0-3.el7ost.noarch
openstack-nova-common-2015.1.0-3.el7ost.noarch
openstack-nova-cert-2015.1.0-3.el7ost.noarch
python-novaclient-2.23.0-1.el7ost.noarch
openstack-nova-compute-2015.1.0-3.el7ost.noarch
openstack-nova-novncproxy-2015.1.0-3.el7ost.noarch
openstack-nova-api-2015.1.0-3.el7ost.noarch
openstack-nova-console-2015.1.0-3.el7ost.noarch
openstack-nova-scheduler-2015.1.0-3.el7ost.noarch
python-nova-2015.1.0-3.el7ost.noarch

How reproducible:
=================
Always

Steps to Reproduce:
1. nova boot --flavor 1 --image cirros simple
2. nova migrate instance-id

Actual results:
===============
The ssh error indicated above

Expected results:
=================

Additional info:
================
I tried running the failing command as the nova user:

[root@rhel71-7-1 ~(keystone_admin)]# runuser -u nova ssh 10.8.29.230 mkdir -p /var/lib/nova/instances/ef5f72e4-2617-43c6-8243-60bd4999c55e
The authenticity of host '10.8.29.230 (10.8.29.230)' can't be established.
ECDSA key fingerprint is 21:d1:b4:50:dc:56:b0:0a:25:bb:3e:44:d9:f4:bc:e8.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.8.29.230' (ECDSA) to the list of known hosts.
This account is currently not available.

That is the result of running the command listed above as the nova user. However, running this command as root is successful.
It looks to me like the target host key isn't in /etc/ssh/ssh_known_hosts. Unfortunately, there are a couple of actions which require compute hosts to be able to ssh directly between each other, and this is one of them. This requires host keys to be propagated to ssh_known_hosts on all nova computes which might have to communicate, so it is safest to do all of them. It also requires ssh keys to have been configured correctly in /var/lib/nova/.ssh on all hosts. Packstack will do this at installation time for all hosts it installs. However, it obviously can't do it for hosts installed subsequently. My guess is that this host was installed later, and its keys haven't been propagated. You'll need to do it outside the control of packstack.
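As a rough way to check the first requirement above, the sketch below scans the text of a known_hosts file for a given host. The function name known_hosts_has is made up for illustration, and the parser only understands plain (unhashed) entries; it ignores hashed names, wildcard patterns and [host]:port forms.

```python
def known_hosts_has(text, host):
    """Return True if `host` is named in a plain known_hosts entry.

    Illustrative helper only: hashed entries (starting with "|1|"),
    wildcard patterns and "[host]:port" forms are not understood.
    """
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Drop an optional marker such as @cert-authority or @revoked.
        if fields[0].startswith("@"):
            fields = fields[1:]
        if not fields:
            continue
        # The first field is a comma-separated list of host names/addresses.
        if host in fields[0].split(","):
            return True
    return False


if __name__ == "__main__":
    # Path taken from the comment above; adjust if your hosts differ.
    with open("/etc/ssh/ssh_known_hosts") as handle:
        print(known_hosts_has(handle.read(), "10.8.29.230"))
```

On a real host, `ssh-keygen -F 10.8.29.230 -f /etc/ssh/ssh_known_hosts` is the more thorough check, since it also handles hashed entries.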
From comment 3 above, it almost sounds as if this is a design flaw and not a bug per se. But it does bring up a good point about what "nova migrate <instance id>" should be expected to do, and whether it should have better error handling when it fails due to permissions, so that it is obvious to the operator what to do to make it work.
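To illustrate what friendlier error handling could look like, here is a minimal sketch that turns the exit code and stderr from the log above into an operator hint. This is not nova code; explain_ssh_failure and its messages are hypothetical, and the exit code test is only a heuristic.

```python
def explain_ssh_failure(exit_code, stderr):
    """Map a failed ssh invocation to an operator-facing hint.

    Hypothetical helper for illustration; not part of nova.
    """
    # ssh exits with 255 when ssh itself hits an error. A remote command
    # could also return 255, so treat this as a heuristic only.
    if exit_code != 255:
        return ("The remote command failed with exit code %d; "
                "the ssh connection itself appears to have worked."
                % exit_code)
    if "Host key verification failed" in stderr:
        return ("The destination host key is not trusted. Add it to "
                "known_hosts for the nova user on the source host.")
    if "Permission denied" in stderr:
        return ("Authentication failed. Check the nova user's ssh keys "
                "on both hosts.")
    return "ssh failed: %s" % stderr.strip()


if __name__ == "__main__":
    # Values copied from the log in the description.
    print(explain_ssh_failure(255, u"Host key verification failed.\r\n"))
```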
I agree that the error message is poor, but I don't necessarily think it's a design issue (that level of error handling is typical). When you do a libvirt migration using ssh, each host must be able to talk to the other without keys or passwords needing to be exchanged. Each compute host would need a key generated, then the public keys copied to every other compute host. Nova can't really orchestrate this sort of thing - maybe the director or packstack can. Deploying N compute nodes is part of the director workflow. Adding a new compute node to the openstack deployment is certainly something the director would handle (and would need more key distribution once complete!), so I'm moving this to the director.
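The size of that key distribution task can be sketched as follows: with per-host keys, every ordered pair of compute hosts needs the source's public key installed on the destination. The function name and the host names are placeholders chosen for illustration.

```python
from itertools import permutations


def key_distribution_plan(hosts):
    """List (source, destination) pairs that need a key copied.

    Each pair means: the source host's public key must be authorized
    on the destination host. Illustrative only; host names are made up.
    """
    return list(permutations(hosts, 2))


if __name__ == "__main__":
    plan = key_distribution_plan(["compute1", "compute2", "compute3"])
    for source, destination in plan:
        print("%s -> %s" % (source, destination))
    # N hosts need N * (N - 1) copies; adding one host adds 2 * N more.
    print(len(plan))
```

Adding a single compute node therefore touches every existing node, which is why this fits a deployment tool better than nova itself.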
Proper SSH user configuration is not yet automated in RDO director. The process of setting up SSH on compute nodes will be covered in the documentation; the doc patch is here: https://review.gerrithub.io/#/c/236817/
Per PM, moving this to A1. Cloned bug 1240356 created on Docs team to make sure this is documented.
*** Bug 1156000 has been marked as a duplicate of this bug. ***
There are two steps to allow SSH compute migration:
* Generate a dedicated keypair for the Nova Compute service, probably not with Puppet but in the TripleO tools (during bootstrap).
* Configure Puppet (puppet-nova: ::nova SSH parameters) with the content of the keys, in the compute manifests.
puppet-nova will prepare the /var/lib/nova/.ssh directories with the keys and configure libvirt migration automatically. After that, you will be able to migrate instances.
Emilien, agree, we should fix this. Can you reassign appropriately? Thanks.
*** This bug has been marked as a duplicate of bug 1267598 ***
Dup -- QE will decide about automating the original