Bug 1331555 - Address SSH fingerprints from a host that was used and then reprovisioned.
Summary: Address SSH fingerprints from a host that was used and then reprovisioned.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Quickstart Cloud Installer
Classification: Red Hat
Component: Installation - RHCI
Version: 1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ga
Target Release: 1.0
Assignee: dgao
QA Contact: Tasos Papaioannou
Docs Contact: Dan Macpherson
URL:
Whiteboard:
Depends On: 1255779
Blocks: qci-sprint-17
Reported: 2016-04-28 19:19 UTC by John Matthews
Modified: 2016-09-13 16:28 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 1255779
Environment:
Last Closed: 2016-09-13 16:28:47 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHEA-2016:1862 | 0 | normal | SHIPPED_LIVE | Red Hat Quickstart Installer 1.0 | 2016-09-13 20:18:48 UTC

Description John Matthews 2016-04-28 19:19:27 UTC
See comment #6 in the cloned bug.


+++ This bug was initially created as a clone of Bug #1255779 +++

Description of problem: Currently, when a user tries to reuse already-installed hosts in another deployment, an error appears and the installation cannot continue. A user who decides to redeploy should be able to do so successfully. A simple example use case:

1. User does a deploy of RHEV called Dev1.
2. Deployment finishes successfully.
3. User then decides he suddenly wants CFME also - how to do this?
4. So he makes a second deployment, of RHEV and CFME, called Dev2.
5. Previously used hosts appear in the installation options.
6. Installation starts and completes successfully.
7. User now has deployment Dev2 with RHEV and CFME.

--- Additional comment from John Matthews on 2015-08-21 10:54:06 EDT ---



--- Additional comment from Dave Johnson on 2015-08-21 15:43:18 EDT ---

So the proper procedure would include deleting the hosts from Satellite first, allowing them to re-PXE into Foreman discovery, and then trying again.

We also opened bug 1255893 and bug 1255889, which are related.

--- Additional comment from Dave Johnson on 2015-08-21 15:44:15 EDT ---

Grr, I meant to mention: QE is going to test this out and see how it goes. This will probably need to move to documentation, but I want to see it work before doing that.

--- Additional comment from Antonin Pagac on 2015-08-24 07:39:45 EDT ---

After deleting and rediscovering the hosts and starting a second deployment, the Actions::Fusor::Deployment::Rhev::Deploy task hangs until it times out:

Foreman::Exception: ERF42-7017 [Foreman::Exception]: You've reached the timeout set for this action. If the action is still ongoing, you can click on the "Resume Deployment" button to continue.

This is because one of the machines is trying to download a kickstart file, which fails. foreman-proxy/proxy.log says:

ERROR -- : Failed to retrieve provision template for e4d36a7f-44b1-4f01-a314-0df432e9954f: Error retrieving provision for e4d36a7f-44b1-4f01-a314-0df432e9954f from sat.rhci.com: Net::HTTPNotFound
    "GET /unattended/provision?token=e4d36a7f-44b1-4f01-a314-0df432e9954f HTTP/1.1" 500 184 0.0567

with corresponding production.log message:

   Parameters: {"token"=>"e4d36a7f-44b1-4f01-a314-0df432e9954f", "unattended"=>{}}
unattended: unable to find a host that matches the request from 192.168.77.1
Filter chain halted as :get_host_details rendered or redirected
Completed 404 Not Found in 3ms (ActiveRecord: 0.9ms)

I tried downloading the kickstart multiple times and also restarted the host; nothing helped.
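
To correlate the two log excerpts above, the provisioning token can be pulled out of the proxy.log request line and then searched for in production.log. The following is only a minimal sketch in Python, assuming the request-line format shown above; the helper name `extract_token` is hypothetical and not part of Foreman or Fusor.

```python
import re

# Matches the token query parameter of an /unattended/provision request.
# Assumes the token is a 36-character UUID-style string, as in the logs above.
TOKEN_RE = re.compile(r"/unattended/provision\?token=([0-9a-f-]{36})")


def extract_token(log_line):
    """Return the provisioning token found in a request log line, or None."""
    match = TOKEN_RE.search(log_line)
    return match.group(1) if match else None


# Request line copied from the proxy.log excerpt in this comment.
line = ('"GET /unattended/provision?token=e4d36a7f-44b1-4f01-a314-0df432e9954f '
        'HTTP/1.1" 500 184 0.0567')
print(extract_token(line))
```

The same token value can then be used to find the matching "Parameters:" entry in production.log.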

--- Additional comment from John Matthews on 2015-08-24 08:05:50 EDT ---

(In reply to Antonin Pagac from comment #4)


This sounds similar to when a provisioning token times out. I wonder whether the deletion was incomplete and an older token was used to fetch the kickstart.

I'd recommend chatting with Sat6 QE to learn how they test discovery and how they delete and rediscover a host.

--- Additional comment from Antonin Pagac on 2015-08-25 14:14:11 EDT ---

After a second try, I think we made a mistake when deleting hosts from Satellite. All content hosts should be deleted, as well as all hosts listed under the Hosts -> All hosts menu in Satellite. The host machines can then be rebooted, rediscovered, and successfully used in a second deployment.

However, this holds only for host provisioning and the deployment of RHEV. When the deployment continues and the CFME installation begins, we hit an error:

Net::SSH::HostKeyMismatch: fingerprint 31:68:48:1e:e4:af:65:34:fb:1c:4c:3c:8c:4c:4a:4b does not match for "192.168.77.102"

while running the Actions::Fusor::Deployment::Rhev::TransferImage task.
This makes me think that something still persists somewhere, so the second deployment cannot be completed successfully.
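
One plausible source of such a mismatch is a stale host key recorded for the same IP address before the host was reprovisioned. The sketch below, in Python, shows how plain-text entries for one host could be filtered out of known_hosts content. It is an illustration under assumptions, not what Fusor does: the function name `drop_host_entries` and the sample key strings are made up, hashed entries and marker lines such as @cert-authority are not handled, and the standard tool for this job is `ssh-keygen -R <host>`.

```python
def drop_host_entries(known_hosts_text, host):
    """Return known_hosts content without plain-text entries for `host`.

    Hypothetical helper: only handles unhashed entries whose first field is
    a comma-separated host list. A line naming several hosts is dropped whole.
    """
    kept = []
    for line in known_hosts_text.splitlines():
        fields = line.split()
        # Blank lines and comments are kept unchanged.
        if fields and not line.startswith("#"):
            hosts = fields[0].split(",")
            if host in hosts:
                continue  # skip the stale entry for this host
        kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")


# Made-up sample content; the key strings are placeholders, not real keys.
sample = (
    "192.168.77.102 ssh-rsa AAAAexample1\n"
    "192.168.77.103 ssh-rsa AAAAexample2\n"
)
print(drop_host_entries(sample, "192.168.77.102"), end="")
```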

Comment 1 dgao 2016-07-19 11:57:16 UTC
I believe the issue is corrected with https://github.com/fusor/fusor/pull/340

That pull request was submitted on 8/28/2015 to address https://bugzilla.redhat.com/show_bug.cgi?id=1257312

Since this patch, strict host checking has been set to false when connecting to hosts. As a result, this bug should no longer be an issue.
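
For readers unfamiliar with the setting, the sketch below shows the equivalent idea with the OpenSSH command-line client, written in Python. It is not the code from the pull request (Fusor connects through Ruby's Net::SSH); the function name `ssh_command` and its parameters are hypothetical. `StrictHostKeyChecking` and `UserKnownHostsFile` are standard OpenSSH client options.

```python
def ssh_command(host, remote_command, user="root", strict=False):
    """Build an OpenSSH argv list, optionally disabling strict host key checking."""
    args = ["ssh"]
    if not strict:
        # Accept whatever host key is presented and do not record it, so a
        # reprovisioned host with a new key does not trigger a mismatch.
        args += ["-o", "StrictHostKeyChecking=no",
                 "-o", "UserKnownHostsFile=/dev/null"]
    args += [f"{user}@{host}", remote_command]
    return args


# IP address taken from the error message reported earlier in this bug.
print(ssh_command("192.168.77.102", "true"))
```

The trade-off is that disabling the check also removes protection against a host being impersonated, which is usually acceptable only on a trusted provisioning network.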

Antonin, please retest to see if you can reproduce.

Comment 5 John Matthews 2016-07-19 12:41:11 UTC
Moving to ON_QA as per comment #1 to request QE retest

Comment 10 Tasos Papaioannou 2016-08-16 18:18:50 UTC
Verified on QCI-1.0-RHEL-7-20160815.t.0 that the ssh host key mismatch error no longer appears.

Comment 12 errata-xmlrpc 2016-09-13 16:28:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2016:1862

