Bug 1649485 - Installation hangs/fails late when SSH host keys are not in ~/.ssh/known_hosts
Summary: Installation hangs/fails late when SSH host keys are not in ~/.ssh/known_hosts
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: rhhi
Version: rhhiv-1.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHHI-V 1.5.z Async
Assignee: Sahina Bose
QA Contact: SATHEESARAN
URL:
Whiteboard:
Duplicates: 1514466 1653603 (view as bug list)
Depends On: 1651516
Blocks:
 
Reported: 2018-11-13 17:17 UTC by John Call
Modified: 2021-12-10 18:14 UTC (History)
CC List: 7 users (show)

Fixed In Version: cockpit-ovirt-dashboard-0.11.38
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1651516 (view as bug list)
Environment:
Last Closed: 2019-05-20 04:55:46 UTC
Embargoed:



Description John Call 2018-11-13 17:17:35 UTC
Description of problem:
This is an RFE to check for accepted host keys up front, instead of going through ~60 minutes of installation and configuration and then hanging indefinitely with no feedback to the user.  The cockpit installer simply hangs forever at "TASK [Set Engine public key as authorized key without validating the TLS/SSL certificates]"

The root cause is that the cockpit wizard asks for the FQDN/IP of the target RHV hosts (ovirtmgmt interface), but doesn't check whether those values are present in /root/.ssh/known_hosts


Version-Release number of selected component (if applicable):
Applicable to RHHI 1.5 and RHV 4.2.7


How reproducible:
Ignore chapter 5 of the Deployment Guide.


Actual results:
Cockpit wizard hangs forever with no feedback at "TASK [Set Engine public key as authorized key without validating the TLS/SSL certificates]"


Expected results:
Cockpit wizard (ansible task) should check for host keys early, and fail if they are not present in /root/.ssh/known_hosts (e.g. via https://docs.ansible.com/ansible/2.5/modules/known_hosts_module.html); a sketch of such a check follows below.

Alternatively, the task could be modified to include "ansible_ssh_extra_args: -o StrictHostKeyChecking=no"
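
For illustration only, a minimal sketch of such an early check (this is not the shipped cockpit/ansible code; the variable he_host_list and the example hostnames are hypothetical placeholders):

# Pre-flight sketch: fail fast if a host's key is missing from
# /root/.ssh/known_hosts instead of hanging an hour into the deployment.
- name: Check that SSH host keys are already accepted
  hosts: localhost
  gather_facts: false
  vars:
    he_host_list:            # hostnames the user entered in the wizard (hypothetical)
      - rhhi2.home.lab
      - rhhi3.home.lab
  tasks:
    - name: Look up each host in /root/.ssh/known_hosts
      command: ssh-keygen -F {{ item }} -f /root/.ssh/known_hosts
      register: key_lookup
      changed_when: false
      failed_when: key_lookup.rc != 0
      loop: "{{ he_host_list }}"

Failing here, per host, would tell the user exactly which FQDN/IP still needs its host key accepted.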


Additional info:
See full thread on rhev-tech at http://post-office.corp.redhat.com/archives/rhev-tech/2018-November/msg00071.html

---------- Forwarded message ---------
From: Gobinda Das
Date: Tue, Nov 13, 2018 at 4:32 AM
Subject: Re: [rhev-tech] RHHI 1.5 / RHV 4.2.7 deployment hung at task "Set Engine public key..."
To: Doron Fediuck

Hi Doron,
Yes, you are right, it should not hang; instead it should give some message or time out.
I will work on it.

On Tue, Nov 13, 2018 at 4:28 PM Doron Fediuck wrote:
Thanks Gobinda.
Sounds like a bug in the sense that we cannot have the installation hang for whatever reason;
either verify the inputs before running, or fail. Am I missing something?

On Tue, 13 Nov 2018 at 09:51, Gobinda Das wrote:
If the TASK [Set Engine public key as authorized key without
validating the TLS/SSL certificates] issue came after HE deployment, then it relates to auto-adding hosts.
It happens when you have not configured passwordless SSH for the FQDN or IP that needs to be auto-added.
In this case rhhi2.home.lab and rhhi3.home.lab need to be configured from the first host.


On Tue, Nov 13, 2018, 07:06 John Call <jcall wrote:
Hi rhev-tech,
I tried to install a fresh RHHI 1.5 / RHV 4.2.7 today, but my deployment has been hung indefinitely (at least 3 hours now without error or progress) at the TASK [Set Engine public key as authorized key without validating the TLS/SSL certificates]

Could this be caused by not having pre-populated the SSH host keys?  Before going through the hosted-engine wizard I created an ssh key and pushed it to the IP addresses (192.168.255.10x) that run the Gluster service, but not to the FQDN/IP address (192.168.0.10x) of the future ovirtmgmt interfaces.

Thanks in advance, John Call.
******************************************
[root@rhhi1 ~]# ip -4 a s 
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
4: enp12s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    inet 192.168.255.105/24 brd 192.168.255.255 scope global noprefixroute enp12s0f1
       valid_lft forever preferred_lft forever
20: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 192.168.124.1/24 brd 192.168.124.255 scope global virbr0
       valid_lft forever preferred_lft forever
26: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 192.168.0.105/24 brd 192.168.0.255 scope global ovirtmgmt
       valid_lft forever preferred_lft forever


[root@rhhi1 ~]# dig +short -x 192.168.0.105
rhhi1.home.lab.


[root@rhhi1 ~]# cat /root/.ssh/known_hosts
192.168.255.105 ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBNGRfEWIQkQB/6H+DMozKvZoh7avvaRgJG4RadXTGcmlnahmtY11yiWfVFyT2pwdVyMRuCxl0Mteo60lEUWltCM=
192.168.255.106 ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBPXBaMsBUAXGlMRN4I8BHKEOT0a2Jevtki6CnYgPGTaVAHmDb4S5D39H01h7tMesUf6U1/PERM80N8/XxyeixOo=
192.168.255.107 ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBPDwIq9MRHRaqWHVjjnX+amDQ02LHJQ57BRSkl802nBXzujugpFWnwhKXxJwiU9WmWAG/o+w/m7UUMBs8CXm3C0=
###
### only Gluster IPs here, no host keys present for future ovirtmgmt FQDN/IP
###
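
A manual workaround along the same lines is to accept the host keys for the future ovirtmgmt FQDNs/IPs before starting the wizard. A minimal sketch using the known_hosts module (illustrative only; hostnames taken from this report, adjust for your environment):

# Workaround sketch: populate /root/.ssh/known_hosts with the ovirtmgmt
# host keys as well, so the "Set Engine public key ..." task does not stall.
- name: Accept host keys for the ovirtmgmt FQDNs
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Add ovirtmgmt host keys to root's known_hosts
      known_hosts:
        path: /root/.ssh/known_hosts
        name: "{{ item }}"
        key: "{{ lookup('pipe', 'ssh-keyscan -t ecdsa ' ~ item) }}"
      loop:
        - rhhi2.home.lab
        - rhhi3.home.lab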

Comment 1 John Call 2018-11-13 17:37:11 UTC
The docs could also be improved; see https://bugzilla.redhat.com/show_bug.cgi?id=1649490 [Need to be clearer about accepting SSH host keys for both Gluster and ovirtmgmt interfaces]

Comment 2 Sahina Bose 2018-11-20 09:24:27 UTC
*** Bug 1514466 has been marked as a duplicate of this bug. ***

Comment 3 Sahina Bose 2018-11-29 06:12:03 UTC
*** Bug 1653603 has been marked as a duplicate of this bug. ***

Comment 6 SATHEESARAN 2019-01-17 09:11:07 UTC
The fix for the dependent ovirt bug has failed verification.

The fix checks for the FQDN hostnames in the known_hosts file,
but that is not sufficient. The check should also cover the hostnames on the gluster network.

Moving this bug to ASSIGNED

Comment 7 SATHEESARAN 2019-01-17 09:31:07 UTC
(In reply to SATHEESARAN from comment #6)
> The fix for the dependent ovirt bug has failed verification.
> 
> The fix checks for the FQDN hostnames in the known_hosts file,
> but that is not sufficient. The check should also cover the hostnames on the
> gluster network.
> 
> Moving this bug to ASSIGNED

I misunderstood the requirement. Based on information from Gobinda and on rereading comment 0,
this check is implemented only for the FQDNs that correspond to the ovirtmgmt interfaces of the additional hosts.
Moving this bug back to ON_QA

Comment 8 SATHEESARAN 2019-01-17 10:53:49 UTC
Verified this bug with cockpit-ovirt-dashboard-0.11.38.

Hostnames entered under the FQDN tab are validated to be present in the known_hosts file.

