Red Hat Bugzilla – Bug 1269468
RHEV hosted engine deployment stuck @ 92%
Last modified: 2016-09-13 12:22:16 EDT
Description of problem: While trying to install RHEV hosted engine + CF on bare metal, Dynaflow shows:
7: Actions::Fusor::Deployment::Rhev::WaitForDataCenter (suspended) [ 8446.15s / 174.90s ]
Started at: 2015-10-07 09:20:39 UTC
Ended at: 2015-10-07 11:41:25 UTC
Real time: 8446.15s
Execution time (excluding suspended state): 174.90s
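The two durations in the Dynaflow output can be related with simple arithmetic. A minimal sketch, using only the timestamps and figures quoted above (the variable names are illustrative), suggests the step spent almost all of its wall-clock time suspended rather than executing:

```python
from datetime import datetime

# Values copied from the Dynaflow output quoted above.
FORMAT = "%Y-%m-%d %H:%M:%S"
started = datetime.strptime("2015-10-07 09:20:39", FORMAT)
ended = datetime.strptime("2015-10-07 11:41:25", FORMAT)
real_time = 8446.15       # seconds, as reported by Dynaflow
execution_time = 174.90   # seconds, excluding the suspended state

# Wall-clock duration derived from the two timestamps (whole seconds).
wall_clock = (ended - started).total_seconds()

# Time the action spent suspended, i.e. waiting between polls.
suspended = real_time - execution_time

print(f"wall clock from timestamps: {wall_clock:.0f}s")
print(f"time spent suspended: {suspended:.2f}s "
      f"({suspended / real_time:.1%} of real time)")
```

The wall-clock value agrees with the reported real time to the second, so the step was waiting on the data center for roughly two hours and twenty minutes rather than doing work.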
Version-Release number of selected component (if applicable):
RHCI ISO October 2nd, RHCI-6.0-RHEL-7-20151002.t.1-RHCI-x86_64-dvd1.iso
How reproducible: Unsure; this is the first time I have hit this.
Steps to Reproduce:
1. Start an installation of RHEV (hosted engine) + CF
2. Watch RHEV deployment
Actual results: RHEV deployment is stuck at 92%.
The attached Foreman production log shows:
2015-10-07 07:47:44 [I] ================ Rhev::WaitForDataCenter get_status method ====================
Expected results: Deployment should complete successfully.
Additional info: Added the Foreman production log, plus RHEV logs.
Created attachment 1080632
This is likely a problem in the backend that we are not detecting, so the user is never shown that a problem occurred.
I'm encountering this problem myself; are there steps I can take to investigate further and get more information to help get past this?
When you run into RHEV being stuck at 92% in the Unified Installer, it means that the puppet module did not run properly on the RHEV engine and was unable to complete the RHEV setup. To investigate:
1. ssh to the RHEV engine instance
2. Look at /var/log/messages
You will see some error messages from the puppet run, which will help narrow down the problem.
The most common issues are:
- RHEV hypervisors are not up/accessible
- RHEV hypervisors are unable to mount the NFS shares for the storage/export domains
- The NFS shares for the storage/export domains contain RHEV data from a prior run, so the shares need to be cleaned out.
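To make the /var/log/messages step quicker, the puppet failures can be filtered out of the log. This is a rough sketch only: the pattern assumes that puppet messages contain the word "puppet" and that failures contain words such as "error" or "failed", which may not match every log format, and the function names are made up for illustration.

```python
import re

# Assumption: puppet lines mention "puppet" and failures contain one of
# these words. Adjust the pattern to the messages you actually see.
PUPPET_ERROR = re.compile(
    r"puppet.*\b(err|error|failed|failure)\b", re.IGNORECASE
)

def puppet_error_lines(lines):
    """Return the lines that look like puppet failures."""
    return [line.rstrip("\n") for line in lines if PUPPET_ERROR.search(line)]

def scan_log(path="/var/log/messages"):
    """Read a syslog file and return its puppet failure lines."""
    with open(path, errors="replace") as handle:
        return puppet_error_lines(handle)
```

Run on the RHEV engine (reading /var/log/messages usually needs root), `scan_log()` returns the matching lines, which can then be compared against the common issues listed above.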
Feel free to ping us in #rhci on internal IRC to talk through any issues.
Really appreciate this response, thank you so much John. :-)
My lab time expired so I'll be unable to get back to this for a couple of weeks; until then please can you keep this on ice and I'll work through your suggestions when I get back to it?
Thank you so much!
I can see the same problem on my side. I tried a deployment using the TP2 RC9 ISO. The package sync failed, so my deployment got stuck at 87.5% and "Rhev::WaitForDataCenter get_status method".
I manually resynced the packages in the hope that the deployment would continue, but it didn't restart.
The weird thing is that I have no way to cancel that deployment. I cannot click the cancel button in my deployment (it is greyed out), and if I go to the list of deployments, I only have the "Edit" button, no delete. There is no timeout either; it's been 12 hours.
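For illustration, this is roughly what a bounded wait could look like. It is a generic sketch, not Fusor's actual implementation (which is a Dynaflow action): `wait_for_status`, `DataCenterTimeout`, the `"up"` status value, and the default timeout and interval are all hypothetical.

```python
import time

class DataCenterTimeout(Exception):
    """Raised when the wanted status is not reached before the deadline."""

def wait_for_status(get_status, wanted="up", timeout=3600.0, interval=30.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it returns `wanted` or `timeout` seconds pass.

    `get_status` is any callable returning the current status string.
    `clock` and `sleep` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == wanted:
            return status
        if clock() >= deadline:
            # Surface the failure instead of staying suspended forever.
            raise DataCenterTimeout(
                f"status still {status!r} after {timeout:.0f}s"
            )
        sleep(interval)
```

With a deadline like this, a stuck deployment would end in an error that could be shown to the user, instead of sitting at the same percentage indefinitely.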
Suspect root cause is a DNS lookup issue?
Failed to establish session with host <<hypervisor_fqdn>>: java.nio.channels.UnresolvedAddressException
Error running command: /usr/share/fusor_ovirt/bin/ovirt_get_datacenter_status.py --api_user <<api_user>> --api_host <<api_host>> --api_pass <<api_pass>> --data_center Default
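One way to test the DNS theory is to check name resolution directly on the machine that logged the UnresolvedAddressException. A minimal sketch, assuming Python is available there; `unresolved_hosts` is an illustrative helper, and the hostnames passed to it should be the real hypervisor and API host names that are redacted above.

```python
import socket

def unresolved_hosts(hostnames):
    """Return the hostnames that fail name resolution on this machine."""
    failed = []
    for name in hostnames:
        try:
            # Uses the system resolver (DNS, /etc/hosts, and so on).
            socket.getaddrinfo(name, None)
        except socket.gaierror:
            failed.append(name)
    return failed
```

If the hypervisor FQDN comes back in the returned list, the lookup failure is reproducible outside the installer, which would point at DNS or /etc/hosts configuration rather than at the deployment itself.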
Marking as needinfo on myself to see if self-hosted works successfully with QCI 1.0
Verified in QCI-1.0-RHEL-7-201608125.t.0. Was able to successfully deploy RHV self-hosted on bare metal.
Please reopen if this reoccurs.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.