See comment #6 in the cloned bug.

+++ This bug was initially created as a clone of Bug #1255779 +++

Description of problem:

Currently, if a user tries to reuse installed hosts in another deployment, an error appears and the installation cannot continue. If the user decides to redeploy, he should be able to do so successfully.

A simple example use case:

1. The user deploys RHEV in a deployment called Dev1.
2. The deployment finishes successfully.
3. The user then decides he also wants CFME. How can he add it?
4. He creates a second deployment, of RHEV and CFME, called Dev2.
5. The previously used hosts appear in the installation options.
6. The installation starts and completes successfully.
7. The user now has deployment Dev2 with RHEV and CFME.

--- Additional comment from John Matthews on 2015-08-21 10:54:06 EDT ---

--- Additional comment from Dave Johnson on 2015-08-21 15:43:18 EDT ---

So the proper procedure would include deleting the hosts from Satellite first, allowing them to re-PXE into Foreman discovery, and then trying again.

We also opened bug 1255893 and bug 1255889, which are related.

--- Additional comment from Dave Johnson on 2015-08-21 15:44:15 EDT ---

Grr, I meant to mention: QE is going to test this and see how it goes. This will probably need to move to documentation, but I want to see it work before doing that.

--- Additional comment from Antonin Pagac on 2015-08-24 07:39:45 EDT ---

After deleting and rediscovering the hosts and starting a second deployment, the Actions::Fusor::Deployment::Rhev::Deploy task hangs until it times out:

Foreman::Exception: ERF42-7017 [Foreman::Exception]: You've reached the timeout set for this action. If the action is still ongoing, you can click on the "Resume Deployment" button to continue.

This is because one of the machines is trying to download a kickstart file, which fails.
foreman-proxy/proxy.log says:

ERROR -- : Failed to retrieve provision template for e4d36a7f-44b1-4f01-a314-0df432e9954f: Error retrieving provision for e4d36a7f-44b1-4f01-a314-0df432e9954f from sat.rhci.com: Net::HTTPNotFound
"GET /unattended/provision?token=e4d36a7f-44b1-4f01-a314-0df432e9954f HTTP/1.1" 500 184 0.0567

with the corresponding production.log message:

Parameters: {"token"=>"e4d36a7f-44b1-4f01-a314-0df432e9954f", "unattended"=>{}}
unattended: unable to find a host that matches the request from 192.168.77.1
Filter chain halted as :get_host_details rendered or redirected
Completed 404 Not Found in 3ms (ActiveRecord: 0.9ms)

I tried downloading the kickstart multiple times and also restarted the host, but nothing helped.

--- Additional comment from John Matthews on 2015-08-24 08:05:50 EDT ---

(In reply to Antonin Pagac from comment #4)
> After delete and re-discovery of the hosts and start of a second deployment,
> the Actions::Fusor::Deployment::Rhev::Deploy tasks hangs on timeout:
>
> Foreman::Exception: ERF42-7017 [Foreman::Exception]: You've reached the
> timeout set for this action. If the action is still ongoing, you can click
> on the "Resume Deployment" button to continue.
>
> This is because one of the machines is trying to download a kickstart file,
> which fails.
> foreman-proxy/proxy.log says:
>
> ERROR -- : Failed to retrieve provision template for
> e4d36a7f-44b1-4f01-a314-0df432e9954f: Error retrieving provision for
> e4d36a7f-44b1-4f01-a314-0df432e9954f from sat.rhci.com: Net::HTTPNotFound
> "GET /unattended/provision?token=e4d36a7f-44b1-4f01-a314-0df432e9954f
> HTTP/1.1" 500 184 0.0567
>
> with corresponding production.log message:
>
> Parameters: {"token"=>"e4d36a7f-44b1-4f01-a314-0df432e9954f",
> "unattended"=>{}}
> unattended: unable to find a host that matches the request from 192.168.77.1
> Filter chain halted as :get_host_details rendered or redirected
> Completed 404 Not Found in 3ms (ActiveRecord: 0.9ms)
>
> I tried to download the kickstart multiple times, also restarting the host,
> nothing helped.

This sounds similar to what happens when a provisioning token times out. I wonder if the deletion was incomplete and an older token was used to fetch the kickstart. I'd recommend chatting with Sat6 QE to learn how they test discovery: how they delete a host and rediscover it.

--- Additional comment from Antonin Pagac on 2015-08-25 14:14:11 EDT ---

After a second try, I think we made a mistake when deleting the hosts from Satellite. All content hosts should be deleted, as well as all hosts listed under the Hosts -> All hosts menu in Satellite. The host machines can then be rebooted, rediscovered, and successfully used in a second deployment.

However, this holds only for host provisioning and the RHEV deployment. When the deployment continues and the CFME installation begins, we hit an error:

Net::SSH::HostKeyMismatch: fingerprint 31:68:48:1e:e4:af:65:34:fb:1c:4c:3c:8c:4c:4a:4b does not match for "192.168.77.102"

while running the Actions::Fusor::Deployment::Rhev::TransferImage task. This makes me think that something still persists somewhere, so the second deployment cannot be completed successfully.
I believe the issue is fixed by https://github.com/fusor/fusor/pull/340, which was submitted on 8/28/2015 to address https://bugzilla.redhat.com/show_bug.cgi?id=1257312. Since this patch, strict host key checking is disabled when connecting to hosts, so this bug should no longer occur. Antonin, please retest to see whether you can still reproduce it.
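A minimal sketch of why strict checking produced the HostKeyMismatch error, assuming (this is not confirmed in the thread) that the reinstalled host generated new SSH host keys while keeping its address 192.168.77.102, so the key cached during the first deployment no longer matched. The `KnownHosts` class, its `verify` method, and the `HostKeyMismatch` exception below are hypothetical; they are not the Net::SSH or fusor API and only model a trust-on-first-use cache keyed by address.

```python
# Hypothetical, simplified model of SSH host key verification.
# This is NOT the Net::SSH or fusor implementation; it only illustrates
# why a cached key for a reused address fails after a host is reinstalled.


class HostKeyMismatch(Exception):
    """Raised when a host presents a key that differs from the cached one."""


class KnownHosts:
    def __init__(self):
        # Maps host address -> fingerprint seen on first contact.
        self._keys: dict[str, str] = {}

    def verify(self, host: str, fingerprint: str, strict: bool = True) -> bool:
        if not strict:
            # Checking disabled: accept any key without consulting the cache.
            return True
        # Trust on first use: remember the first fingerprint seen for a host.
        cached = self._keys.setdefault(host, fingerprint)
        if cached != fingerprint:
            raise HostKeyMismatch(
                f"fingerprint {fingerprint} does not match for {host!r}"
            )
        return True


if __name__ == "__main__":
    known = KnownHosts()
    ip = "192.168.77.102"
    known.verify(ip, "old-key")  # first deployment: key is cached
    try:
        known.verify(ip, "new-key")  # reinstalled host, same address
    except HostKeyMismatch as err:
        print(err)  # fingerprint new-key does not match for '192.168.77.102'
    print(known.verify(ip, "new-key", strict=False))  # True
```

Disabling the check, as the patch does, avoids the mismatch on reused addresses, but it also means a host's identity is no longer verified, which removes SSH's protection against man-in-the-middle attacks.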
Moving to ON_QA per comment #1 to request a QE retest.
Verified on QCI-1.0-RHEL-7-20160815.t.0 that the SSH host key mismatch error no longer appears.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2016:1862