Description of problem:
Migration of the instance fails due to ssh key misconfiguration.

Version-Release number of selected component (if applicable):
OSP 16.1, puddle RHOS-16.1-RHEL-8-20200604.n.1

How reproducible:
Deploy OSP 16.1 and perform a migration of an instance.

Additional info:

Logs:

2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server During handling of the above exception, another exception occurred:
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 274, in dispatch
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/exception_wrapper.py", line 79, in wrapped
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     function_name, call_dict, binary, tb)
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     self.force_reraise()
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     raise value
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/exception_wrapper.py", line 69, in wrapped
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     return f(self, context, *args, **kw)
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 191, in decorated_function
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     "Error: %s", e, instance=instance)
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     self.force_reraise()
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     raise value
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 161, in decorated_function
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/compute/utils.py", line 1372, in decorated_function
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 219, in decorated_function
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     kwargs['instance'], e, sys.exc_info())
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     self.force_reraise()
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     raise value
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 207, in decorated_function
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 4887, in resize_instance
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     self._revert_allocation(context, instance, migration)
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     self.force_reraise()
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     raise value
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 4884, in resize_instance
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     instance_type, clean_shutdown, request_spec)
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 4943, in _resize_instance
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     request_spec)
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib64/python3.6/contextlib.py", line 99, in __exit__
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     self.gen.throw(type, value, traceback)
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 8987, in _error_out_instance_on_exception
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     raise error.inner_exception
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server nova.exception.ResizeError: Resize error: not able to execute ssh command: Unexpected error while running command.
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server Command: ssh -o BatchMode=yes 10.10.130.118 mkdir -p /var/lib/nova/instances/08d69b1d-fc09-478d-b21e-d1981763ad9f
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server Exit code: 255
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server Stdout: ''
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server Stderr: '@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\r\n@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @\r\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\r\nIT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!\r\nSomeone could be eavesdropping on you right now (man-in-the-middle attack)!\r\nIt is also possible that a host key has just been changed.\r\nThe fingerprint for the RSA key sent by the remote host is\nSHA256:/jb9hU6z4AFwzFbUeYyXtYusVAIOYJsa1aH6gGk+nLI.\r\nPlease contact your system administrator.\r\nAdd correct host key in /dev/null to get rid of this message.\r\nOffending RSA key in /etc/ssh/ssh_known_hosts:3\r\nRSA host key for [10.10.130.118]:2022 has changed and you have requested strict checking.\r\nHost key verification failed.\r\n'
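The "SHA256:..." string in the error above is simply the SHA-256 digest of the raw host key bytes, base64-encoded with the trailing '=' padding stripped. A minimal sketch for computing it from a base64 key blob, which can help when comparing a rejected key against known_hosts entries (the blob below is a truncated placeholder, not a real host key):

```python
import base64
import hashlib

def ssh_fingerprint(key_b64: str) -> str:
    """Return the OpenSSH-style SHA256 fingerprint of a base64 host key blob.

    This is the same format shown in the error log: SHA-256 of the decoded
    key bytes, base64-encoded, with '=' padding removed.
    """
    raw = base64.b64decode(key_b64)
    digest = hashlib.sha256(raw).digest()
    return "SHA256:" + base64.b64encode(digest).rstrip(b"=").decode("ascii")

# Placeholder blob (just the "\x00\x00\x00\x07ssh-rsa" type prefix), only to
# demonstrate the format.
fp = ssh_fingerprint("AAAAB3NzaC1yc2E=")
print(fp)
```

The same fingerprint can be obtained on the host with `ssh-keygen -lf <pubkey file>`, which is the easier path when shell access is available.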
The sosreports link: http://rhos-release.virt.bos.redhat.com/log/bz1845957/
FYI, with compose RHOS-16.1-RHEL-8-20200611.n.0 we are still facing the live migration issue. Requesting this as a blocker.
(In reply to Sanjay Upadhyay from comment #2)
> FYI, With compose RHOS-16.1-RHEL-8-20200611.n.0 we are still facing the
> issue of Live Migration. Requesting this as a blocker

Could you please attach /etc/ssh/ssh_known_hosts from the compute node?
(In reply to Ollie Walsh from comment #3)
> (In reply to Sanjay Upadhyay from comment #2)
> > FYI, With compose RHOS-16.1-RHEL-8-20200611.n.0 we are still facing the
> > issue of Live Migration. Requesting this as a blocker
>
> Could you please attach /etc/ssh/ssh_known_hosts from the compute node?

.. ssh_known_hosts from the live migration source compute. The host key from the dest compute host would also be helpful - /etc/ssh/ssh_host_*.pub.
Created attachment 1697630 [details] computehciovsdpdk-0 /etc/ssh/ssh_known_hosts
Created attachment 1697631 [details] computehciovsdpdk-0 /etc/ssh/ssh_host_*
Created attachment 1697632 [details] computehciovsdpdk-1 /etc/ssh/ssh_known_hosts
Created attachment 1697634 [details] computehciovsdpdk-1 /etc/ssh/ssh_host_rsa_key
Provided requested files from both compute nodes.
(In reply to Maxim Babushkin from comment #9)
> Provided requested files from both compute nodes.

Asked for the public key, e.g. /etc/ssh/ssh_host_rsa_key.pub, but I think the private key can be used to generate this...
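For reference, OpenSSH can derive the public half from a private key with `ssh-keygen -y`, which is what is being suggested here. A sketch wrapping that call (assumes ssh-keygen is installed; `derive_public_blob` is a hypothetical helper name):

```python
import subprocess

def derive_public_blob(private_key_path: str) -> str:
    """Derive the base64 public key blob from an OpenSSH private key file
    using `ssh-keygen -y`. Output fields are: keytype, blob[, comment]."""
    out = subprocess.run(
        ["ssh-keygen", "-y", "-f", private_key_path],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()[1]
```

Comparing `derive_public_blob("/etc/ssh/ssh_host_rsa_key")` against the second field of the corresponding .pub file (or of a known_hosts entry) confirms whether the keys belong together.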
Public rsa key for computehciovsdpdk-1 (generated from attached private key):

AAAAB3NzaC1yc2EAAAADAQABAAABgQCq2Xys18mxUBr4JHDBT2HQlfUB4KqJcysaw/79MMpCGIkaSeBwX+Q9uvo71YVfg5Z3boC/Ch7JMRF3ffAgvthQCIh2zYVVi8R2klyTBjHSFTUkufbirKfd9J01fc7PNfwkWO5mTQM9T0XTUm7X2HwcndyK8MW+ADLMUFFehIuRvLJcOXo5YQl/lISkm5sslKp1KkmVobU2A53zIHduweZEnzzxHd+rJveICI+kAhQ8X7CXBOM3HPgJSVXiiukixf+4dZzMq9pQhnc8Aj22fAlXq+sF+SocyB8pS3yRcbNO0fJclSRQSByL3myfwHQbGrrNIJ/dr3eGASiUqQHXolIL8mRHTPuTKX2CmA0VROV8rfxJQwsPDBDe6WCfFEeV/dSABY4/VcSmjDhRV2V4aQhVobO35iZs/3389OjlMOJQk5prGVF5dmn1x5KT2XlWiZrLOENg/cklKTTCmcnP81IUZfZv3z11qdkjCCoeudpK7Af2eivKhSGM83nURPWzugc=

Which matches the entry on /etc/ssh/ssh_known_hosts on computehciovsdpdk-0:

[192.0.90.19]*,[computehciovsdpdk-1.localdomain]*,[computehciovsdpdk-1]*,[10.10.130.167]*,[computehciovsdpdk-1.internalapi]*,[computehciovsdpdk-1.internalapi.localdomain]*,[10.10.131.142]*,[computehciovsdpdk-1.tenant]*,[computehciovsdpdk-1.tenant.localdomain]*,[10.10.132.122]*,[computehciovsdpdk-1.storage]*,[computehciovsdpdk-1.storage.localdomain]*,[10.10.133.146]*,[computehciovsdpdk-1.storagemgmt]*,[computehciovsdpdk-1.storagemgmt.localdomain]*, ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCq2Xys18mxUBr4JHDBT2HQlfUB4KqJcysaw/79MMpCGIkaSeBwX+Q9uvo71YVfg5Z3boC/Ch7JMRF3ffAgvthQCIh2zYVVi8R2klyTBjHSFTUkufbirKfd9J01fc7PNfwkWO5mTQM9T0XTUm7X2HwcndyK8MW+ADLMUFFehIuRvLJcOXo5YQl/lISkm5sslKp1KkmVobU2A53zIHduweZEnzzxHd+rJveICI+kAhQ8X7CXBOM3HPgJSVXiiukixf+4dZzMq9pQhnc8Aj22fAlXq+sF+SocyB8pS3yRcbNO0fJclSRQSByL3myfwHQbGrrNIJ/dr3eGASiUqQHXolIL8mRHTPuTKX2CmA0VROV8rfxJQwsPDBDe6WCfFEeV/dSABY4/VcSmjDhRV2V4aQhVobO35iZs/3389OjlMOJQk5prGVF5dmn1x5KT2XlWiZrLOENg/cklKTTCmcnP81IUZfZv3z11qdkjCCoeudpK7Af2eivKhSGM83nURPWzugc=
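The match above can also be checked mechanically: a known_hosts line is whitespace-separated (host patterns, key type, base64 blob, optional comment), and the host-pattern field contains no whitespace, so comparing the third field against the public key blob is sufficient. A minimal sketch with shortened placeholder values standing in for the full entries:

```python
def known_hosts_key_blob(entry: str) -> str:
    """Extract the base64 key blob from a known_hosts entry.

    Fields: <host patterns> <keytype> <base64 blob> [comment]
    """
    fields = entry.split()
    return fields[2]

# Shortened, hypothetical values; the real entry and blob are shown above.
entry = "[10.10.130.167]*,[computehciovsdpdk-1]* ssh-rsa AAAAB3EXAMPLEBLOB"
pub_blob = "AAAAB3EXAMPLEBLOB"
print(known_hosts_key_blob(entry) == pub_blob)
```

In practice `ssh-keygen -F '[10.10.130.167]:2022' -f /etc/ssh/ssh_known_hosts` does the lookup directly, but the field comparison makes the structure of the check explicit.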
Are you enabling the infrared option to setup ssh keys?
No. I'm not using any explicit ssh key setup option of infrared. In my opinion, it should happen automatically and be configured by tripleo.
(In reply to Maxim Babushkin from comment #14)
> No.
> I'm not using any explicit ssh key setup option of infrared.
> In my opinion, it should happen automatically and be configured by tripleo.

Indeed, that's why I'm asking. The only thing I can think of is that *something else* is changing the ssh host keys after nova_migration_target has started. Is it possible to get on this env for a closer look?
I will install my setup tomorrow and keep it for you to debug.
This looks like a regression in the latest 16.1 composes (RHOS-16.1-RHEL-8-20200611.n.0, RHOS-16.1-RHEL-8-20200610.n.0). This is an RC blocker for us; changing component to nova for their analysis.
It's either tripleo-ansible/t-h-t or an infra issue.
(In reply to Ollie Walsh from comment #12)
> Which matches the entry on /etc/ssh/ssh_known_hosts on computehciovsdpdk-0:
> [192.0.90.19]*,[computehciovsdpdk-1.localdomain]*,[computehciovsdpdk-1]*,[10.10.130.167]*,[computehciovsdpdk-1.internalapi]*,[computehciovsdpdk-1.internalapi.localdomain]*,[10.10.131.142]*,[computehciovsdpdk-1.tenant]*,[computehciovsdpdk-1.tenant.localdomain]*,[10.10.132.122]*,[computehciovsdpdk-1.storage]*,[computehciovsdpdk-1.storage.localdomain]*,[10.10.133.146]*,[computehciovsdpdk-1.storagemgmt]*,[computehciovsdpdk-1.storagemgmt.localdomain]*, ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCq2Xys18mxUBr4JHDBT2HQlfUB4KqJcysaw/79MMpCGIkaSeBwX+Q9uvo71YVfg5Z3boC/Ch7JMRF3ffAgvthQCIh2zYVVi8R2klyTBjHSFTUkufbirKfd9J01fc7PNfwkWO5mTQM9T0XTUm7X2HwcndyK8MW+ADLMUFFehIuRvLJcOXo5YQl/lISkm5sslKp1KkmVobU2A53zIHduweZEnzzxHd+rJveICI+kAhQ8X7CXBOM3HPgJSVXiiukixf+4dZzMq9pQhnc8Aj22fAlXq+sF+SocyB8pS3yRcbNO0fJclSRQSByL3myfwHQbGrrNIJ/dr3eGASiUqQHXolIL8mRHTPuTKX2CmA0VROV8rfxJQwsPDBDe6WCfFEeV/dSABY4/VcSmjDhRV2V4aQhVobO35iZs/3389OjlMOJQk5prGVF5dmn1x5KT2XlWiZrLOENg/cklKTTCmcnP81IUZfZv3z11qdkjCCoeudpK7Af2eivKhSGM83nURPWzugc=

There is an issue with this entry: 192.0.90.19 is the undercloud ctrl_plane IP, which suggests this is https://bugs.launchpad.net/tripleo/+bug/1861296
I don't believe this is the same issue as https://bugs.launchpad.net/tripleo/+bug/1861296. That one was caused by bad jinja2 syntax that resulted in missing hosts/IPs in the ssh known hosts entry. The patch for that was merged to upstream ussuri and not backported.
Reproduced this on stable/train:
1. Deploy an overcloud.
2. Delete the overcloud.
3. Deploy an overcloud with the same stack name and same host names.

The cached ansible facts from the 1st deployment (overcloud-0_1) are used in the 2nd deployment (overcloud-0):

[CentOS-7.8 - root@undercloud mistral]# grep host_key overcloud-0/.ansible/fact_cache/overcloud-0-novacompute-0
    "ansible_ssh_host_key_ecdsa_public": "AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBG1POUEid7AiBJNsHexvyy4D3oyhKP8ht7zHZ7FktsOb7PrLZVe0wWOxP/X6TdMZYLeTpDDsCo+gEXQXlVZ+hC8=",
    "ansible_ssh_host_key_ed25519_public": "AAAAC3NzaC1lZDI1NTE5AAAAIDD5gi10zP5St8MrsvoUqAbwoZGRHbY2PI7hUA0m3rpd",
    "ansible_ssh_host_key_rsa_public": "AAAAB3NzaC1yc2EAAAADAQABAAABAQCglZI/tVpWdC+71yBsE3HQIkoFcnSSIrtHLxXHGO/M382Z6lNK22oR7athjzsQIKaf6gW+paNI+Uf1DcebHQPpIqYHUl64XlyjayZ5xwdbK/dTgxCLRXvYousIC21Lg/7cpi2aY1dhQ8zLZXKnIveydS+twNRZ1Haol5pWIuB52WgX7idAysMkU6Smsxs/uxsJlMJ6Dby2IK5jXS/N5XM4aHo0gWBZ4Ea4UADXyJKfrrjrjLZHSc58Cp0WFAfgQukfTk9BnUzGVNBLF/w1ihalV1PkbBvv16+PKEDfwXnX49KJ75s76HVh+bD5KLVCCA0QSGLJilC7QqGUVXFlTpSB",

[CentOS-7.8 - root@undercloud mistral]# grep host_key overcloud-0_1/.ansible/fact_cache/overcloud-0-novacompute-0
    "ansible_ssh_host_key_ecdsa_public": "AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBG1POUEid7AiBJNsHexvyy4D3oyhKP8ht7zHZ7FktsOb7PrLZVe0wWOxP/X6TdMZYLeTpDDsCo+gEXQXlVZ+hC8=",
    "ansible_ssh_host_key_ed25519_public": "AAAAC3NzaC1lZDI1NTE5AAAAIDD5gi10zP5St8MrsvoUqAbwoZGRHbY2PI7hUA0m3rpd",
    "ansible_ssh_host_key_rsa_public": "AAAAB3NzaC1yc2EAAAADAQABAAABAQCglZI/tVpWdC+71yBsE3HQIkoFcnSSIrtHLxXHGO/M382Z6lNK22oR7athjzsQIKaf6gW+paNI+Uf1DcebHQPpIqYHUl64XlyjayZ5xwdbK/dTgxCLRXvYousIC21Lg/7cpi2aY1dhQ8zLZXKnIveydS+twNRZ1Haol5pWIuB52WgX7idAysMkU6Smsxs/uxsJlMJ6Dby2IK5jXS/N5XM4aHo0gWBZ4Ea4UADXyJKfrrjrjLZHSc58Cp0WFAfgQukfTk9BnUzGVNBLF/w1ihalV1PkbBvv16+PKEDfwXnX49KJ75s76HVh+bD5KLVCCA0QSGLJilC7QqGUVXFlTpSB",

()[nova@overcloud-0-novacompute-0 /]$ ssh overcloud-0-novacompute-1
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
SHA256:c3KSB9JQENyKvCM5fe/UVUUO6CgvGoORNjvFz1wo18E.
Please contact your system administrator.
Add correct host key in /dev/null to get rid of this message.
Offending RSA key in /etc/ssh/ssh_known_hosts:6
RSA host key for [overcloud-0-novacompute-1]:2022 has changed and you have requested strict checking.
Host key verification failed.

The facts are cached for 2 hours, so it's only likely to be an issue when a deployment is deleted and immediately redeployed. This should be easy to work around, e.g. remove /var/lib/mistral/<stack_name> after the overcloud delete, or just use a different overcloud stack name.
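The workaround (removing the per-stack working directory after an overcloud delete) could be scripted. A hedged sketch with the base path and stack name as parameters, since the exact directory layout may differ between releases; `clear_stack_workdir` is a hypothetical helper name, not an existing tripleo function:

```python
import shutil
from pathlib import Path

def clear_stack_workdir(base: str, stack_name: str) -> bool:
    """Remove the per-stack working directory (which holds the ansible
    fact cache) so stale host keys are not reused on redeploy.

    Returns True if a directory was removed, False if none existed.
    """
    workdir = Path(base) / stack_name
    if workdir.is_dir():
        shutil.rmtree(workdir)
        return True
    return False
```

On the environment in this report the call would be something like `clear_stack_workdir("/var/lib/mistral", "overcloud")`, run on the undercloud after the overcloud delete.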
Hi Ollie,

Thanks for reproducing it and finding the root cause.
Why doesn't tripleo make sure to clean up all the leftovers after a stack delete? Making it a manual step is one more thing for the user to remember to perform.
I think we do delete that stack name. I think the facts end up in a different spot. I believe this change was backported for Upgrades so we'll likely need to address that. https://review.opendev.org/#/c/725515/3/tripleo_common/actions/ansible.py
https://review.opendev.org/#/c/682855 was the original change where the facts end up in /var/tmp
We can force the clearing of the cache at the start of a deployment to avoid this.
Note this is only likely to be an issue for dev/test/POC deployments. It's extremely unlikely that a production deployment would be deployed then, within the next 2 hours, deleted & redeployed.
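For context, a sketch of the mtime-based staleness test that a fact cache of this kind typically applies. The 7200-second default below is an assumption taken from the "2 hours" mentioned above, not a verified tripleo setting (in ansible this is governed by `fact_caching_timeout`):

```python
import os
import time

def cache_entry_is_stale(path: str, timeout_s: int = 7200) -> bool:
    """Return True if a cached fact file is older than the cache timeout.

    Assumption: 7200s (2 hours) matches the timeout described in this
    report; real deployments take the value from configuration.
    """
    age = time.time() - os.path.getmtime(path)
    return age > timeout_s
```

This is why the bug only bites on a delete-and-immediate-redeploy: a cache entry younger than the timeout is served as-is, so the old host keys win over the freshly generated ones.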
1. Deploy an overcloud.
2. Delete the overcloud.
3. Deploy an overcloud with the same stack name and same host names.

New ssh keys were generated and the cached ones were not used.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1 bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:4284