Bug 1845957 - [16.1] Migration of instance fails due to ssh keys misconfiguration
Summary: [16.1] Migration of instance fails due to ssh keys misconfiguration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z2
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Alex Schultz
QA Contact: David Rosenfeld
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-06-10 13:39 UTC by Maxim Babushkin
Modified: 2020-10-28 15:37 UTC (History)
22 users

Fixed In Version: openstack-tripleo-heat-templates-11.3.2-0.20200708133447.c21cc82.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-28 15:37:36 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
computehciovsdpdk-0 /etc/ssh/ssh_known_hosts (5.02 KB, text/plain)
2020-06-16 14:20 UTC, Maxim Babushkin
no flags Details
computehciovsdpdk-0 /etc/ssh/ssh_host_* (3.60 KB, text/plain)
2020-06-16 14:20 UTC, Maxim Babushkin
no flags Details
computehciovsdpdk-1 /etc/ssh/ssh_known_hosts (5.02 KB, text/plain)
2020-06-16 14:21 UTC, Maxim Babushkin
no flags Details
computehciovsdpdk-1 /etc/ssh/ssh_host_rsa_key (3.60 KB, text/plain)
2020-06-16 14:22 UTC, Maxim Babushkin
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1884654 0 None None None 2020-06-22 21:11:20 UTC
OpenStack gerrit 737379 0 None MERGED Always clear cached facts first 2021-02-18 09:41:09 UTC
Red Hat Product Errata RHEA-2020:4284 0 None None None 2020-10-28 15:37:55 UTC

Description Maxim Babushkin 2020-06-10 13:39:50 UTC
Description of problem:
Migration of the instance fails due to ssh keys misconfiguration.

Version-Release number of selected component (if applicable):
OSP 16.1
Puddle RHOS-16.1-RHEL-8-20200604.n.1

How reproducible:
Deploy OSP 16.1 and perform a migration of an instance.

Additional info:
Logs:

2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server 
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server During handling of the above exception, another exception occurred:
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server 
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 274, in dispatch
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/exception_wrapper.py", line 79, in wrapped
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     function_name, call_dict, binary, tb)
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     self.force_reraise()
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     raise value
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/exception_wrapper.py", line 69, in wrapped
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     return f(self, context, *args, **kw)
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 191, in decorated_function
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     "Error: %s", e, instance=instance)
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     self.force_reraise()
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     raise value
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 161, in decorated_function
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/compute/utils.py", line 1372, in decorated_function
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 219, in decorated_function
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     kwargs['instance'], e, sys.exc_info())
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     self.force_reraise()
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     raise value
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 207, in decorated_function
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 4887, in resize_instance
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     self._revert_allocation(context, instance, migration)
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     self.force_reraise()
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     raise value
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 4884, in resize_instance
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     instance_type, clean_shutdown, request_spec)
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 4943, in _resize_instance
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     request_spec)
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib64/python3.6/contextlib.py", line 99, in __exit__
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     self.gen.throw(type, value, traceback)
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 8987, in _error_out_instance_on_exception
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server     raise error.inner_exception
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server nova.exception.ResizeError: Resize error: not able to execute ssh command: Unexpected error while running command.
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server Command: ssh -o BatchMode=yes 10.10.130.118 mkdir -p /var/lib/nova/instances/08d69b1d-fc09-478d-b21e-d1981763ad9f
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server Exit code: 255
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server Stdout: ''
2020-06-10 12:18:19.505 7 ERROR oslo_messaging.rpc.server Stderr: '@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\r\n@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @\r\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\r\nIT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!\r\nSomeone could be eavesdropping on you right now (man-in-the-middle attack)!\r\nIt is also possible that a host key has just been changed.\r\nThe fingerprint for the RSA key sent by the remote host is\nSHA256:/jb9hU6z4AFwzFbUeYyXtYusVAIOYJsa1aH6gGk+nLI.\r\nPlease contact your system administrator.\r\nAdd correct host key in /dev/null to get rid of this message.\r\nOffending RSA key in /etc/ssh/ssh_known_hosts:3\r\nRSA host key for [10.10.130.118]:2022 has changed and you have requested strict checking.\r\nHost key verification failed.\r\n'

Comment 1 Maxim Babushkin 2020-06-10 13:42:20 UTC
The sosreports link:
http://rhos-release.virt.bos.redhat.com/log/bz1845957/

Comment 2 Sanjay Upadhyay 2020-06-16 10:54:46 UTC
FYI, With compose RHOS-16.1-RHEL-8-20200611.n.0 we are still facing the issue of Live Migration. Requesting this as a blocker

Comment 3 Ollie Walsh 2020-06-16 12:04:12 UTC
(In reply to Sanjay Upadhyay from comment #2)
> FYI, With compose RHOS-16.1-RHEL-8-20200611.n.0 we are still facing the
> issue of Live Migration. Requesting this as a blocker

Could you please attach /etc/ssh/ssh_known_hosts from the compute node?

Comment 4 Ollie Walsh 2020-06-16 12:16:34 UTC
(In reply to Ollie Walsh from comment #3)
> (In reply to Sanjay Upadhyay from comment #2)
> > FYI, With compose RHOS-16.1-RHEL-8-20200611.n.0 we are still facing the
> > issue of Live Migration. Requesting this as a blocker
> 
> Could you please attach /etc/ssh/ssh_known_hosts from the compute node?

.. ssh_known_hosts from the live migration source compute.

The host key from dest compute host would also be helpful - /etc/ssh/ssh_host_*.pub.

Comment 5 Maxim Babushkin 2020-06-16 14:20:00 UTC
Created attachment 1697630 [details]
computehciovsdpdk-0 /etc/ssh/ssh_known_hosts

Comment 6 Maxim Babushkin 2020-06-16 14:20:47 UTC
Created attachment 1697631 [details]
computehciovsdpdk-0 /etc/ssh/ssh_host_*

Comment 7 Maxim Babushkin 2020-06-16 14:21:36 UTC
Created attachment 1697632 [details]
computehciovsdpdk-1 /etc/ssh/ssh_known_hosts

Comment 8 Maxim Babushkin 2020-06-16 14:22:12 UTC
Created attachment 1697634 [details]
computehciovsdpdk-1 /etc/ssh/ssh_host_rsa_key

Comment 9 Maxim Babushkin 2020-06-16 14:22:51 UTC
Provided requested files from both compute nodes.

Comment 11 Ollie Walsh 2020-06-16 15:38:40 UTC
(In reply to Maxim Babushkin from comment #9)
> Provided requested files from both compute nodes.

Asked for the public key, e.g. /etc/ssh/ssh_host_rsa_key.pub, but I think the private key can be used to generate this...

Comment 12 Ollie Walsh 2020-06-16 15:52:39 UTC
Public rsa key for computehciovsdpdk-1 (generated from attached private key):
AAAAB3NzaC1yc2EAAAADAQABAAABgQCq2Xys18mxUBr4JHDBT2HQlfUB4KqJcysaw/79MMpCGIkaSeBwX+Q9uvo71YVfg5Z3boC/Ch7JMRF3ffAgvthQCIh2zYVVi8R2klyTBjHSFTUkufbirKfd9J01fc7PNfwkWO5mTQM9T0XTUm7X2HwcndyK8MW+ADLMUFFehIuRvLJcOXo5YQl/lISkm5sslKp1KkmVobU2A53zIHduweZEnzzxHd+rJveICI+kAhQ8X7CXBOM3HPgJSVXiiukixf+4dZzMq9pQhnc8Aj22fAlXq+sF+SocyB8pS3yRcbNO0fJclSRQSByL3myfwHQbGrrNIJ/dr3eGASiUqQHXolIL8mRHTPuTKX2CmA0VROV8rfxJQwsPDBDe6WCfFEeV/dSABY4/VcSmjDhRV2V4aQhVobO35iZs/3389OjlMOJQk5prGVF5dmn1x5KT2XlWiZrLOENg/cklKTTCmcnP81IUZfZv3z11qdkjCCoeudpK7Af2eivKhSGM83nURPWzugc=

Which matches the entry on /etc/ssh/ssh_known_hosts on computehciovsdpdk-0:
[192.0.90.19]*,[computehciovsdpdk-1.localdomain]*,[computehciovsdpdk-1]*,[10.10.130.167]*,[computehciovsdpdk-1.internalapi]*,[computehciovsdpdk-1.internalapi.localdomain]*,[10.10.131.142]*,[computehciovsdpdk-1.tenant]*,[computehciovsdpdk-1.tenant.localdomain]*,[10.10.132.122]*,[computehciovsdpdk-1.storage]*,[computehciovsdpdk-1.storage.localdomain]*,[10.10.133.146]*,[computehciovsdpdk-1.storagemgmt]*,[computehciovsdpdk-1.storagemgmt.localdomain]*, ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCq2Xys18mxUBr4JHDBT2HQlfUB4KqJcysaw/79MMpCGIkaSeBwX+Q9uvo71YVfg5Z3boC/Ch7JMRF3ffAgvthQCIh2zYVVi8R2klyTBjHSFTUkufbirKfd9J01fc7PNfwkWO5mTQM9T0XTUm7X2HwcndyK8MW+ADLMUFFehIuRvLJcOXo5YQl/lISkm5sslKp1KkmVobU2A53zIHduweZEnzzxHd+rJveICI+kAhQ8X7CXBOM3HPgJSVXiiukixf+4dZzMq9pQhnc8Aj22fAlXq+sF+SocyB8pS3yRcbNO0fJclSRQSByL3myfwHQbGrrNIJ/dr3eGASiUqQHXolIL8mRHTPuTKX2CmA0VROV8rfxJQwsPDBDe6WCfFEeV/dSABY4/VcSmjDhRV2V4aQhVobO35iZs/3389OjlMOJQk5prGVF5dmn1x5KT2XlWiZrLOENg/cklKTTCmcnP81IUZfZv3z11qdkjCCoeudpK7Af2eivKhSGM83nURPWzugc=
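The check done in this comment boils down to comparing the base64 key blob from the derived public key with the blob stored in the known_hosts entry (the field after the key type). A toy sketch with shortened, made-up blobs:

```python
# Compare a candidate public-key blob against the blob stored in a
# known_hosts line.  Blobs here are shortened fakes for illustration.
def key_matches(known_hosts_line, candidate_blob):
    # Fields: host-patterns, key-type, base64-blob[, comment].
    fields = known_hosts_line.split()
    return len(fields) >= 3 and fields[2] == candidate_blob.strip()

line = "[compute-1]* ssh-rsa AAAAB3NzaC1yc2EEXAMPLE"
print(key_matches(line, "AAAAB3NzaC1yc2EEXAMPLE"))  # -> True
print(key_matches(line, "AAAAB3NzaC1yc2EOTHER"))    # -> False
```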

Comment 13 Ollie Walsh 2020-06-16 16:34:06 UTC
Are you enabling the infrared option to set up ssh keys?

Comment 14 Maxim Babushkin 2020-06-16 17:27:08 UTC
No.
I'm not using any explicit ssh key setup option of infrared.
In my opinion, it should happen automatically and be configured by tripleo.

Comment 15 Ollie Walsh 2020-06-16 17:35:22 UTC
(In reply to Maxim Babushkin from comment #14)
> No.
> I'm not using any explicit ssh key setup option of infrared.
> In my opinion, it should happen automatically and be configured by tripleo.

indeed, that's why I'm asking. The only thing I can think of is that *something else* is changing the ssh hosts keys after nova_migration_target has started. Is it possible to get on this env for a closer look?

Comment 16 Maxim Babushkin 2020-06-16 17:38:00 UTC
I will install my setup tomorrow and keep it for you to debug.

Comment 17 Haresh Khandelwal 2020-06-17 13:23:20 UTC
Looks like a regression in the latest 16.1 composes (RHOS-16.1-RHEL-8-20200611.n.0, RHOS-16.1-RHEL-8-20200610.n.0).
This is an RC blocker for us; changing component to nova for their analysis.

Comment 18 Ollie Walsh 2020-06-17 13:29:50 UTC
It's either tripleo-ansible/t-h-t or an infra issue.

Comment 19 Ollie Walsh 2020-06-17 14:19:36 UTC
(In reply to Ollie Walsh from comment #12)
> Which matches the entry on /etc/ssh/ssh_known_hosts on computehciovsdpdk-0:
> [192.0.90.19]*,[computehciovsdpdk-1.localdomain]*,[computehciovsdpdk-1]*,[10.
> 10.130.167]*,[computehciovsdpdk-1.internalapi]*,[computehciovsdpdk-1.
> internalapi.localdomain]*,[10.10.131.142]*,[computehciovsdpdk-1.tenant]*,
> [computehciovsdpdk-1.tenant.localdomain]*,[10.10.132.122]*,
> [computehciovsdpdk-1.storage]*,[computehciovsdpdk-1.storage.localdomain]*,
> [10.10.133.146]*,[computehciovsdpdk-1.storagemgmt]*,[computehciovsdpdk-1.
> storagemgmt.localdomain]*, ssh-rsa
> AAAAB3NzaC1yc2EAAAADAQABAAABgQCq2Xys18mxUBr4JHDBT2HQlfUB4KqJcysaw/
> 79MMpCGIkaSeBwX+Q9uvo71YVfg5Z3boC/
> Ch7JMRF3ffAgvthQCIh2zYVVi8R2klyTBjHSFTUkufbirKfd9J01fc7PNfwkWO5mTQM9T0XTUm7X2
> HwcndyK8MW+ADLMUFFehIuRvLJcOXo5YQl/
> lISkm5sslKp1KkmVobU2A53zIHduweZEnzzxHd+rJveICI+kAhQ8X7CXBOM3HPgJSVXiiukixf+4d
> ZzMq9pQhnc8Aj22fAlXq+sF+SocyB8pS3yRcbNO0fJclSRQSByL3myfwHQbGrrNIJ/
> dr3eGASiUqQHXolIL8mRHTPuTKX2CmA0VROV8rfxJQwsPDBDe6WCfFEeV/dSABY4/
> VcSmjDhRV2V4aQhVobO35iZs/3389OjlMOJQk5prGVF5dmn1x5KT2XlWiZrLOENg/
> cklKTTCmcnP81IUZfZv3z11qdkjCCoeudpK7Af2eivKhSGM83nURPWzugc=

There is an issue with this entry: 192.0.90.19 is the undercloud ctrl_plane IP which suggests this is https://bugs.launchpad.net/tripleo/+bug/1861296

Comment 20 Ollie Walsh 2020-06-17 16:55:20 UTC
I don't believe this is the same issue as https://bugs.launchpad.net/tripleo/+bug/1861296. That was caused by bad jinja2 syntax that resulted in missing hosts/IPs in the ssh known_hosts entry. The patch that caused it was merged to upstream Ussuri and not backported.

Comment 25 Ollie Walsh 2020-06-22 19:57:56 UTC
Reproduced this on stable/train:

Deploy an overcloud.
Delete the overcloud.
Deploy an overcloud with the same stack name and same host names.

The cached ansible facts from the 1st deployment (overcloud-0_1) are used in the 2nd deployment (overcloud-0):

    [CentOS-7.8 - root@undercloud mistral]# grep host_key overcloud-0/.ansible/fact_cache/overcloud-0-novacompute-0 
        "ansible_ssh_host_key_ecdsa_public": "AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBG1POUEid7AiBJNsHexvyy4D3oyhKP8ht7zHZ7FktsOb7PrLZVe0wWOxP/X6TdMZYLeTpDDsCo+gEXQXlVZ+hC8=", 
        "ansible_ssh_host_key_ed25519_public": "AAAAC3NzaC1lZDI1NTE5AAAAIDD5gi10zP5St8MrsvoUqAbwoZGRHbY2PI7hUA0m3rpd", 
        "ansible_ssh_host_key_rsa_public": "AAAAB3NzaC1yc2EAAAADAQABAAABAQCglZI/tVpWdC+71yBsE3HQIkoFcnSSIrtHLxXHGO/M382Z6lNK22oR7athjzsQIKaf6gW+paNI+Uf1DcebHQPpIqYHUl64XlyjayZ5xwdbK/dTgxCLRXvYousIC21Lg/7cpi2aY1dhQ8zLZXKnIveydS+twNRZ1Haol5pWIuB52WgX7idAysMkU6Smsxs/uxsJlMJ6Dby2IK5jXS/N5XM4aHo0gWBZ4Ea4UADXyJKfrrjrjLZHSc58Cp0WFAfgQukfTk9BnUzGVNBLF/w1ihalV1PkbBvv16+PKEDfwXnX49KJ75s76HVh+bD5KLVCCA0QSGLJilC7QqGUVXFlTpSB", 

    [CentOS-7.8 - root@undercloud mistral]# grep host_key overcloud-0_1/.ansible/fact_cache/overcloud-0-novacompute-0 
        "ansible_ssh_host_key_ecdsa_public": "AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBG1POUEid7AiBJNsHexvyy4D3oyhKP8ht7zHZ7FktsOb7PrLZVe0wWOxP/X6TdMZYLeTpDDsCo+gEXQXlVZ+hC8=", 
        "ansible_ssh_host_key_ed25519_public": "AAAAC3NzaC1lZDI1NTE5AAAAIDD5gi10zP5St8MrsvoUqAbwoZGRHbY2PI7hUA0m3rpd", 
        "ansible_ssh_host_key_rsa_public": "AAAAB3NzaC1yc2EAAAADAQABAAABAQCglZI/tVpWdC+71yBsE3HQIkoFcnSSIrtHLxXHGO/M382Z6lNK22oR7athjzsQIKaf6gW+paNI+Uf1DcebHQPpIqYHUl64XlyjayZ5xwdbK/dTgxCLRXvYousIC21Lg/7cpi2aY1dhQ8zLZXKnIveydS+twNRZ1Haol5pWIuB52WgX7idAysMkU6Smsxs/uxsJlMJ6Dby2IK5jXS/N5XM4aHo0gWBZ4Ea4UADXyJKfrrjrjLZHSc58Cp0WFAfgQukfTk9BnUzGVNBLF/w1ihalV1PkbBvv16+PKEDfwXnX49KJ75s76HVh+bD5KLVCCA0QSGLJilC7QqGUVXFlTpSB", 

()[nova@overcloud-0-novacompute-0 /]$ ssh overcloud-0-novacompute-1
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
SHA256:c3KSB9JQENyKvCM5fe/UVUUO6CgvGoORNjvFz1wo18E.
Please contact your system administrator.
Add correct host key in /dev/null to get rid of this message.
Offending RSA key in /etc/ssh/ssh_known_hosts:6
RSA host key for [overcloud-0-novacompute-1]:2022 has changed and you have requested strict checking.
Host key verification failed.



The facts are cached for 2 hours, so it's only likely to be an issue when a deployment is deleted and immediately redeployed.

This should be easy to work around, e.g. remove /var/lib/mistral/<stack_name> after the overcloud delete, or just use a different overcloud stack name.
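The workaround can be scripted. The fact-cache location under the mistral working directory is taken from the paths shown in this report and may differ between releases, so treat it as an assumption:

```python
# Remove the stale ansible fact cache left behind after `overcloud delete`,
# so a redeploy with the same stack name cannot pick up old host keys.
# Path layout (/var/lib/mistral/<stack_name>/.ansible/fact_cache) follows
# this report and may differ between releases.
import shutil
from pathlib import Path

def clear_stale_facts(stack_name, mistral_root="/var/lib/mistral"):
    cache = Path(mistral_root) / stack_name / ".ansible" / "fact_cache"
    if cache.is_dir():
        shutil.rmtree(cache)
        return True
    return False
```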

Comment 26 Maxim Babushkin 2020-06-22 20:05:30 UTC
Hi Ollie,

Thanks for reproducing it and finding the root cause.
Why doesn't tripleo make sure to clean up all the leftovers after stack delete?
Making this a manual step adds an additional step for the user to remember to perform.

Comment 27 Alex Schultz 2020-06-22 20:37:41 UTC
I think we do delete that stack name. I think the facts end up in a different spot.  I believe this change was backported for Upgrades so we'll likely need to address that.  https://review.opendev.org/#/c/725515/3/tripleo_common/actions/ansible.py

Comment 28 Alex Schultz 2020-06-22 20:39:17 UTC
https://review.opendev.org/#/c/682855 was the original change where the facts end up in /var/tmp

Comment 29 Alex Schultz 2020-06-22 21:04:06 UTC
We can force the clearing of the cache at the start of a deployment to avoid this.

Comment 33 Ollie Walsh 2020-08-20 13:23:44 UTC
Note this is only likely to be an issue for dev/test/POC deployments. It's extremely unlikely that a production deployment would be deployed then, within the next 2 hours, deleted & redeployed.
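The two-hour window is just the fact-cache TTL at work; its effect can be illustrated with a toy timestamp-based cache (the 7200-second value matches this report, everything else is illustrative):

```python
import time

# Toy fact cache with a TTL: within the TTL a lookup returns the cached
# (possibly stale) host key instead of asking the host again.
class FactCache:
    def __init__(self, ttl=7200):          # 2 hours, as in this report
        self.ttl = ttl
        self._store = {}                   # host -> (timestamp, facts)

    def get(self, host, fetch, now=None):
        now = time.time() if now is None else now
        entry = self._store.get(host)
        if entry and now - entry[0] < self.ttl:
            return entry[1]                # cached facts, maybe stale
        facts = fetch(host)
        self._store[host] = (now, facts)
        return facts

cache = FactCache()
cache.get("compute-0", lambda h: "OLD_KEY", now=0)
# Host redeployed with a new key, but within the TTL the stale key wins:
print(cache.get("compute-0", lambda h: "NEW_KEY", now=3600))  # -> OLD_KEY
# After the TTL expires the fresh key is fetched:
print(cache.get("compute-0", lambda h: "NEW_KEY", now=7300))  # -> NEW_KEY
```

This is why only a delete-and-redeploy inside the window trips over the stale host keys.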

Comment 38 Jad Haj Yahya 2020-09-20 13:59:49 UTC
Deploy an overcloud.
Delete the overcloud.
Deploy an overcloud with the same stack name and same host names.

New ssh keys were generated and the cached ones were not used.

Comment 45 errata-xmlrpc 2020-10-28 15:37:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:4284

