Bug 1269468 - RHEV hosted engine deployment stuck @ 92%
Status: CLOSED ERRATA
Product: Red Hat Quickstart Cloud Installer
Classification: Red Hat
Component: Installation - RHEV
Version: 1.0
Hardware: x86_64 Linux
Priority: high
Severity: unspecified
Target Milestone: ga
Target Release: 1.0
Assigned To: John Matthews
QA Contact: Thom Carlin
Docs Contact: Dan Macpherson
Keywords: Triaged
Depends On:
Blocks:
Reported: 2015-10-07 08:09 EDT by Tzach Shefi
Modified: 2016-09-13 12:22 EDT

Doc Type: Bug Fix
Last Closed: 2016-09-13 12:22:16 EDT
Type: Bug

Attachments
Logs (90.21 KB, text/plain)
2015-10-07 08:10 EDT, Tzach Shefi
Description Tzach Shefi 2015-10-07 08:09:45 EDT
Description of problem: While installing RHEV hosted engine + CF on bare metal, Dynaflow shows:

7: Actions::Fusor::Deployment::Rhev::WaitForDataCenter (suspended) [ 8446.15s / 174.90s ]
Started at: 2015-10-07 09:20:39 UTC
Ended at: 2015-10-07 11:41:25 UTC
Real time: 8446.15s
Execution time (excluding suspended state): 174.90s
Input:
---
deployment_id: 2
locale: en
Output:

---
task: false
poll_attempts:
  total: 537
  failed: 0
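
As a quick sanity check on the Dynaflow figures above, the following sketch (plain Python, not part of Fusor) recomputes the wall-clock duration from the start and end timestamps and derives the average interval between polls. The variable names are illustrative only.

```python
from datetime import datetime

# Values copied from the Dynaflow step output in this report.
started = datetime.strptime("2015-10-07 09:20:39", "%Y-%m-%d %H:%M:%S")
ended = datetime.strptime("2015-10-07 11:41:25", "%Y-%m-%d %H:%M:%S")
real_time_s = 8446.15
execution_time_s = 174.90
poll_attempts = 537

# Wall-clock duration between the two timestamps.
elapsed_s = (ended - started).total_seconds()
# Mean time between polls, assuming polls were spread evenly over the run.
avg_poll_interval_s = real_time_s / poll_attempts
# Share of the real time that the step spent suspended.
suspended_fraction = 1 - execution_time_s / real_time_s

print(f"elapsed: {elapsed_s:.0f}s")
print(f"average poll interval: {avg_poll_interval_s:.1f}s")
print(f"time spent suspended: {suspended_fraction:.1%}")
```

The timestamps agree with the reported real time (about 2 h 21 min), and with 537 polls and 0 failures the step appears to have been waiting on a condition that never became true rather than hitting errors on each poll.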


Version-Release number of selected component (if applicable):
RHCI ISO October 2nd, RHCI-6.0-RHEL-7-20151002.t.1-RHCI-x86_64-dvd1.iso

How reproducible:
Unsure; this is the first time I have hit this.

Steps to Reproduce:
1. Start a deployment of RHEV (hosted engine) + CF.
2. Watch the RHEV deployment.

Actual results:
RHEV deployment stuck at 92%

The attached Foreman production log shows:
2015-10-07 07:47:44 [I] ================ Rhev::WaitForDataCenter get_status method ====================


Expected results:
Deployment should successfully complete.  

Additional info:
Attached the Foreman production log, plus RHEV logs.
Comment 1 Tzach Shefi 2015-10-07 08:10 EDT
Created attachment 1080632 [details]
Logs
Comment 2 John Matthews 2015-10-13 14:28:56 EDT
This is likely a problem in the backend that we fail to detect, so the user is never shown that a problem occurred.
Comment 3 David Peacock 2015-11-04 09:53:00 EST
I'm encountering this problem myself; are there steps I can take to investigate further and get more information to help get past this?
Comment 4 John Matthews 2015-11-04 10:18:14 EST
David,

When RHEV is stuck at 92% in the Unified Installer, it means that the Puppet module did not run properly on the RHEV engine and was unable to complete the RHEV setup.

To debug:
1. SSH to the RHEV engine instance.
2. Look at /var/log/messages.

You will see error messages from the Puppet run, which will help narrow down the problem.
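
One way to narrow down /var/log/messages is to filter it for Puppet-related failures. The sketch below is a minimal example, not a Fusor tool: the function names are hypothetical, and the keyword patterns are an assumption about how Puppet failures appear in the log, so adjust them to what the file actually contains.

```python
import re

# Assumption: Puppet-related syslog lines contain the word "puppet".
PUPPET = re.compile(r"puppet", re.IGNORECASE)
# Assumption: failures are flagged with one of these keywords.
FAILURE = re.compile(r"\b(err|error|fail|failed|failure)\b", re.IGNORECASE)


def puppet_failures(lines):
    """Return the lines that mention puppet together with a failure keyword."""
    return [ln.rstrip("\n") for ln in lines
            if PUPPET.search(ln) and FAILURE.search(ln)]


def scan_log(path="/var/log/messages"):
    """Read a log file and return its Puppet failure lines."""
    with open(path, errors="replace") as fh:
        return puppet_failures(fh)
```

Run on the RHEV engine instance (reading /var/log/messages usually requires root), `for line in scan_log(): print(line)` prints only the candidate lines.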

The most common issues are:
 - RHEV hypervisors are not up/accessible
 - RHEV hypervisors are unable to mount the NFS shares for the storage/export domains
 - The NFS shares for the storage/export domains contain RHEV data from a prior run, so the shares need to be cleaned out.
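
Two of the issues listed above can be pre-checked before starting a deployment. The following is a rough sketch under stated assumptions: the helper names are hypothetical, the reachability test uses a TCP connection to port 22 (SSH) as a stand-in for "the hypervisor is up", and the leftover-data check assumes the NFS export is already mounted at a local path.

```python
import os
import socket


def host_reachable(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds.

    Port 22 is an assumption; use whichever port the hypervisor exposes.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def leftover_entries(export_path):
    """List entries in a locally mounted export directory.

    A non-empty result suggests data from a prior run is still present.
    """
    return sorted(os.listdir(export_path))
```

An empty list from `leftover_entries` only shows that the directory has no entries; it does not verify that the hypervisors themselves can mount the share.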

Feel free to ping us in #rhci on internal IRC to talk through any issues.
Comment 5 David Peacock 2015-11-10 09:37:36 EST
Really appreciate this response, thank you so much John. :-)

My lab time expired so I'll be unable to get back to this for a couple of weeks; until then please can you keep this on ice and I'll work through your suggestions when I get back to it?

Thank you so much!
Comment 6 Jean-Francois Saucier 2016-02-23 07:29:47 EST
I can see the same problem on my side. I tried a deployment using the TP2 RC9 ISO. The package sync failed, so my deployment got stuck at 87.5%, at "Rhev::WaitForDataCenter get_status method".

I manually resynced the packages hoping the deployment would continue, but it did not resume.

The weird thing is that I have no way to cancel that deployment. I cannot click the cancel button in my deployment (it is greyed out), and in the list of deployments I only have the "Edit" button, no delete. There is no timeout either; it has been 12 hours.
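
The missing timeout described in this comment is the kind of gap a bounded polling loop closes. The sketch below is a generic Python illustration of that pattern and is not how Fusor or Dynaflow implement polling; `wait_until`, `PollTimeout`, and all parameters are hypothetical names.

```python
import time


class PollTimeout(Exception):
    """Raised when the condition is not met before the deadline."""


def wait_until(check, timeout_s, interval_s,
               clock=time.monotonic, sleep=time.sleep):
    """Call check() until it returns True or timeout_s elapses.

    Returns the number of attempts made on success and raises
    PollTimeout otherwise, so the caller can report a failure
    instead of waiting forever.
    """
    deadline = clock() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        if check():
            return attempts
        # Stop if the next poll would start after the deadline.
        if clock() + interval_s > deadline:
            raise PollTimeout(f"gave up after {attempts} attempts")
        sleep(interval_s)
```

With a deadline in place, a stalled wait surfaces as an explicit error that the UI could show, rather than a progress bar that stays at the same percentage indefinitely.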
Comment 11 Thom Carlin 2016-08-08 15:08:03 EDT
Suspect root cause is a DNS lookup issue?

Failed to establish session with host <<hypervisor_fqdn>>: java.nio.channels.UnresolvedAddressException

Error running command: /usr/share/fusor_ovirt/bin/ovirt_get_datacenter_status.py --api_user <<api_user>> --api_host <<api_host>> --api_pass <<api_pass>> --data_center Default

Marking as needinfo on myself to see if self-hosted works successfully with QCI 1.0
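
The DNS suspicion in this comment can be tested directly: java.nio.channels.UnresolvedAddressException is thrown when a network operation is attempted on an address that could not be resolved. The following minimal sketch checks whether a name resolves from the machine it runs on; `resolves` is a hypothetical helper, and the hostname to check is whatever FQDN the deployment uses for the hypervisor.

```python
import socket


def resolves(hostname):
    """Return True if the local resolver can resolve hostname."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False
```

To be meaningful, run the check on the host that reported the error, since resolver configuration can differ between the engine, the hypervisors, and the installer machine.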
Comment 12 Thom Carlin 2016-08-28 23:35:37 EDT
Verified in QCI-1.0-RHEL-7-201608125.t.0. I was able to successfully deploy RHV self-hosted on bare metal.

Please reopen if this reoccurs.
Comment 14 errata-xmlrpc 2016-09-13 12:22:16 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2016:1862
