Bug 1255779

Summary: RFE: Allow reusing installed hosts in other deployments
Product: Red Hat Quickstart Cloud Installer
Reporter: Antonin Pagac <apagac>
Component: Installation - RHCI
Assignee: John Matthews <jmatthew>
Status: CLOSED NOTABUG
QA Contact: Dave Johnson <dajohnso>
Severity: unspecified
Docs Contact: Dan Macpherson <dmacpher>
Priority: unspecified
Version: 1.0
CC: bthurber, tcarlin, tsanders
Target Milestone: ---
Keywords: FutureFeature, Triaged
Target Release: 1.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Clones: 1331555
Environment:
Last Closed: 2016-04-28 19:21:54 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1331555

Description Antonin Pagac 2015-08-21 14:30:01 UTC
Description of problem: Currently, an attempt to reuse installed hosts in another deployment produces an error and the installation cannot continue. A user who decides to redeploy should be able to do so successfully. A simple example use case:

1. User does a deploy of RHEV called Dev1.
2. Deployment finishes successfully.
3. User then decides he also wants CFME. How should he do this?
4. So he makes a second deployment, of RHEV and CFME, called Dev2.
5. Previously used hosts appear in the installation options.
6. Installation starts and completes successfully.
7. User now has deployment Dev2 with RHEV and CFME.

Comment 1 John Matthews 2015-08-21 14:54:06 UTC
*** Bug 1254712 has been marked as a duplicate of this bug. ***

Comment 2 Dave Johnson 2015-08-21 19:43:18 UTC
So the proper procedure would include deleting the hosts from Satellite first, allowing them to re-PXE into Foreman discovery, and then trying again.

We also opened bug 1255893 and bug 1255889, which are related.

Comment 3 Dave Johnson 2015-08-21 19:44:15 UTC
Grr, I meant to mention: QE is going to test this out and see how it goes. This will probably need to move to documentation, but I want to see it work before doing that.

Comment 4 Antonin Pagac 2015-08-24 11:39:45 UTC
After deleting and rediscovering the hosts and starting a second deployment, the Actions::Fusor::Deployment::Rhev::Deploy task hangs and times out:

Foreman::Exception: ERF42-7017 [Foreman::Exception]: You've reached the timeout set for this action. If the action is still ongoing, you can click on the "Resume Deployment" button to continue.

This is because one of the machines is trying to download a kickstart file, which fails. foreman-proxy/proxy.log says:

ERROR -- : Failed to retrieve provision template for e4d36a7f-44b1-4f01-a314-0df432e9954f: Error retrieving provision for e4d36a7f-44b1-4f01-a314-0df432e9954f from sat.rhci.com: Net::HTTPNotFound
    "GET /unattended/provision?token=e4d36a7f-44b1-4f01-a314-0df432e9954f HTTP/1.1" 500 184 0.0567

with corresponding production.log message:

   Parameters: {"token"=>"e4d36a7f-44b1-4f01-a314-0df432e9954f", "unattended"=>{}}
unattended: unable to find a host that matches the request from 192.168.77.1
Filter chain halted as :get_host_details rendered or redirected
Completed 404 Not Found in 3ms (ActiveRecord: 0.9ms)

I tried to download the kickstart multiple times and also restarted the host; nothing helped.
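
When correlating the two logs, it can help to pull the provisioning token out of the proxy request line and then search production.log for the same value. The following is only an illustrative sketch: the function name `extract_token` is hypothetical, and the sample line is copied from the proxy.log excerpt above.

```python
from urllib.parse import urlsplit, parse_qs


def extract_token(request_line):
    """Return the 'token' query parameter from an HTTP request line, or None."""
    # Drop surrounding whitespace and the leading quote used in the log format.
    parts = request_line.strip().strip('"').split()
    if len(parts) < 2:
        return None
    # parts[0] is the method, parts[1] is the request target (path + query).
    query = urlsplit(parts[1]).query
    values = parse_qs(query).get("token")
    return values[0] if values else None


line = '"GET /unattended/provision?token=e4d36a7f-44b1-4f01-a314-0df432e9954f HTTP/1.1" 500 184 0.0567'
token = extract_token(line)
print(token)
```

The extracted value can then be matched against the `Parameters:` line in production.log to confirm that both entries belong to the same request.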

Comment 5 John Matthews 2015-08-24 12:05:50 UTC
(In reply to Antonin Pagac from comment #4)

This sounds similar to what happens when a provisioning token times out. I wonder whether the deletion was incomplete and an older token was used for fetching the kickstart.

I'd recommend chatting with Sat6 QE to learn how they test discovery and how they delete and rediscover a host.

Comment 6 Antonin Pagac 2015-08-25 18:14:11 UTC
After a second try, I think we made a mistake when deleting hosts from Satellite. All content hosts should be deleted, as well as all hosts listed under the Hosts -> All hosts menu in Satellite. Then the host machines can be rebooted, rediscovered and successfully used in a second deployment.

However, this holds only for host provisioning and the deployment of RHEV. When the deployment continues and the CFME installation begins, we hit an error:
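
Before rebooting the machines for rediscovery, it may be worth confirming that no record of them is left in either list. The following is a minimal sketch, not a Satellite API client: `leftover_records` and its inputs are hypothetical, and the host names would have to be exported from Satellite by whatever means is available.

```python
def leftover_records(reused_hosts, content_hosts, all_hosts):
    """Map each host to be reused to the Satellite lists that still contain it.

    All three arguments are plain iterables of host names (hypothetical input;
    nothing here talks to Satellite).
    """
    content = {name.lower() for name in content_hosts}
    hosts = {name.lower() for name in all_hosts}
    leftovers = {}
    for name in reused_hosts:
        found = []
        if name.lower() in content:
            found.append("Content Hosts")
        if name.lower() in hosts:
            found.append("All hosts")
        if found:
            leftovers[name] = found
    return leftovers


# Example with made-up names: one host was removed from only one of the lists.
remaining = leftover_records(
    ["hv1.example.com", "hv2.example.com"],
    content_hosts=["hv1.example.com"],
    all_hosts=[],
)
print(remaining)
```

An empty result would mean that none of the hosts to be reused is still listed in either place.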

Net::SSH::HostKeyMismatch: fingerprint 31:68:48:1e:e4:af:65:34:fb:1c:4c:3c:8c:4c:4a:4b does not match for "192.168.77.102"

while running the Actions::Fusor::Deployment::Rhev::TransferImage task.
This makes me think that some state still persists somewhere, so the second deployment cannot be completed successfully.
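
One possible source of such leftover state is a host key recorded for 192.168.77.102 during the first deployment that is still present in a known_hosts file after the address is reused. Which file the deployment consults is not stated here, so treat this as an assumption. The sketch below shows how a colon-separated MD5 fingerprint of this form is derived from a known_hosts key field and how plain entries for one address can be filtered out. The function names and the sample key bytes are hypothetical, and hashed (`|1|...`) or marker-prefixed entries are not handled. On a real system, `ssh-keygen -R <host>` performs the removal.

```python
import base64
import hashlib


def md5_fingerprint(key_b64):
    """Legacy colon-separated MD5 fingerprint of a base64 public key blob."""
    digest = hashlib.md5(base64.b64decode(key_b64)).hexdigest()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def drop_host(known_hosts_text, host):
    """Return known_hosts text without plain (unhashed) entries for host."""
    kept = []
    for line in known_hosts_text.splitlines():
        fields = line.split()
        # The first field is a comma-separated list of host names/addresses.
        if fields and not line.startswith("#") and host in fields[0].split(","):
            continue
        kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")


# Made-up key material, used only to exercise the functions.
sample_key = base64.b64encode(b"not a real key").decode()
text = (
    f"192.168.77.102 ssh-rsa {sample_key}\n"
    f"192.168.77.103 ssh-rsa {sample_key}\n"
)
print(md5_fingerprint(sample_key))
print(drop_host(text, "192.168.77.102"), end="")
```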

Comment 7 Todd Sanders 2016-04-28 19:21:54 UTC
Cloned to 1331555 for the SSH fingerprint left in known hosts. Closing this RFE, as the host was not properly deleted.