Bug 1307072 - CFME deployment stuck retrying CloudForms::WaitForConsole after 2000+ attempts
Summary: CFME deployment stuck retrying CloudForms::WaitForConsole after 2000+ attempts
Keywords:
Status: NEW
Alias: None
Product: Red Hat Quickstart Cloud Installer
Classification: Red Hat
Component: Installation - CloudForms
Version: 1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
: ---
Assignee: John Matthews
QA Contact: Dave Johnson
Dan Macpherson
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-02-12 15:51 UTC by Landon LaSmith
Modified: 2016-09-23 18:50 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:


Attachments
Truncated deployment.log but it's over 8000 lines of "No route to host - connect(2)" (40.61 KB, text/plain)
2016-02-12 15:51 UTC, Landon LaSmith

Description Landon LaSmith 2016-02-12 15:51:33 UTC
Created attachment 1123563
Truncated deployment.log but it's over 8000 lines of "No route to host - connect(2)"

Description of problem:
During an OSP+CFME deployment, the CloudForms install is stuck at 61.1% while retrying Actions::Fusor::Deployment::CloudForms::WaitForConsole more than 2000 times. It fails on the subtask Actions::Fusor::Deployment::CloudForms::UpdateAdminPassword with "No route to host - connect(2)".

Version-Release number of selected component (if applicable):


How reproducible:
Unknown

Steps to Reproduce:
1. Install Sat and OOO from ISO
2. Deploy OSP + CFME

Actual results:
OSP+CFME install stuck at 61.1% while failing to update the CFME admin password

Expected results:
The deployment should fail once Satellite has been unable to contact CFME for a maximum number of attempts.
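
To illustrate the expected behavior, here is a minimal Python sketch of a bounded retry loop. It is not the Fusor implementation; the names wait_for_console, ConsoleUnreachable, probe, max_attempts and delay_seconds are hypothetical and chosen only for this example. The probe is assumed to raise OSError (for example "No route to host") while the console is unreachable.

```python
import time


class ConsoleUnreachable(Exception):
    """Raised when the console never answered within the retry budget."""


def wait_for_console(probe, max_attempts=30, delay_seconds=10.0, sleep=time.sleep):
    """Call probe() until it succeeds or max_attempts is exhausted.

    probe is any zero-argument callable (hypothetical here) that returns
    normally on success and raises OSError on a connection failure.
    Returns the number of attempts that were needed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            probe()
            return attempt
        except OSError as error:
            last_error = error
            # Only pause when another attempt will follow.
            if attempt < max_attempts:
                sleep(delay_seconds)
    # Retry budget exhausted: fail the step instead of looping forever.
    raise ConsoleUnreachable(
        f"console unreachable after {max_attempts} attempts: {last_error}"
    )
```

With a loop of this shape, the task would surface a clear failure after the configured number of attempts rather than retrying thousands of times.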

Additional info:
ISO Installer Versions:
RHCI-6.0-RHEL-7-20160208.1
RHCIOOO-7-RHEL-7-20160127.0

Network Info Used for install:
Sat Prov - 192.168.155.0/24 - GW: 192.168.155.1
OSP Prov - 192.0.2.0/24 - GW: 192.0.2.1
OSP Pub - 192.168.156.0/24 - GW: 192.168.156.1

Comment 1 John Matthews 2016-02-12 16:08:01 UTC
Landon,

Thank you for the log file. Unfortunately, it is not enough for us to determine what went wrong; we need more information. The failure you saw, with the task stuck at WaitForConsole, is not a frequent error. I suspect something went wrong in CFME itself while starting the web service.

If you still have the deployment up, please debug this further so we can learn what is wrong and fix it.

The WaitForConsole task assumes the VM is up and that appliance-console-cli succeeded. This task should be waiting for the CFME web service itself to come up and be ready.

Are you able to access the CFME WebUI?  
If not, please log into the CFME VM and see if you can find more information about what went wrong. Look for the CFME log files; I think they will be at /var/www/miq/vmdb/log
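
If it helps with the log search, here is a rough Python sketch that collects suspicious lines from every *.log file in a directory. The function name find_error_lines and the "ERROR"/"FATAL" markers are assumptions made for this example; the actual CFME log format and location may differ.

```python
from pathlib import Path


def find_error_lines(log_dir, markers=("ERROR", "FATAL")):
    """Return (file name, line number, text) for log lines containing a marker.

    The default markers are an assumption; adjust them to the real log format.
    """
    hits = []
    for log_file in sorted(Path(log_dir).glob("*.log")):
        # errors="replace" keeps the scan going if a log has odd bytes.
        with log_file.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if any(marker in line for marker in markers):
                    hits.append((log_file.name, number, line.rstrip("\n")))
    return hits
```

For example, calling find_error_lines("/var/www/miq/vmdb/log") on the CFME VM would list candidate lines to attach to this bug, assuming the logs do live there.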

Comment 2 Landon LaSmith 2016-02-12 21:33:08 UTC
John, 

I checked the CFME instance and it was unresponsive on the console. There was also an issue with the overcloud hypervisor. After I restarted both, Satellite automatically contacted CFME and continued with the deployment successfully. I'm going to try to reproduce the issue, but it appears that Satellite would have kept trying to reach CFME indefinitely, without enforcing a maximum number of failed attempts.

