Bug 1220310

Summary: [hosted-engine-setup] [Gluster support] Deployment gets stuck: "oVirt API connection failure"
Product: [Retired] oVirt Reporter: Elad <ebenahar>
Component: ovirt-hosted-engine-setup Assignee: Sandro Bonazzola <sbonazzo>
Status: CLOSED DUPLICATE QA Contact: Elad <ebenahar>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.6 CC: acanan, dfediuck, ebenahar, ecohen, gklein, lsurette, rbalakri, yeylon
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Unspecified   
Whiteboard: integration
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-05-12 07:26:35 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1083025, 1173669    
Attachments:
Description Flags
/var/log/ from the host none

Description Elad 2015-05-11 09:47:01 UTC
Created attachment 1024142
/var/log/ from the host

Description of problem:
Tried to deploy hosted engine over Gluster. The deployment reached the phase where the DB health check had completed and the hosted-engine installation was waiting for VDSM to become operational, and it got stuck there.


Version-Release number of selected component (if applicable):
ovirt-3.6.0-1 
ovirt-hosted-engine-setup-1.3.0-0.0.master.20150401110307.git9665976.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1.
- Created a new volume in the Gluster server:

gluster volume create elad3 replica 3 transport tcp 10.35.160.6:/export/elad3 10.35.160.202:/home/elad/1 10.35.160.203:/home/elad/1 force

- Set owner-uid and owner-gid to 36 (vdsm:kvm):

gluster volume set elad3 owner-uid 36
gluster volume set elad3 owner-gid 36

- Started the volume:

gluster volume start elad3 

2. Executed hosted-engine --deploy, selected glusterfs as the storage type and provided the path of the volume
3. Installed RHEL 6.6 on the engine VM and executed engine-setup
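The volume preparation in step 1 can be scripted so that the exact commands are recorded alongside the bug. This is a minimal sketch, assuming the gluster CLI is on PATH; the helper names volume_setup_commands and run are hypothetical, and the volume name, bricks and uid/gid 36 are taken from step 1. By default it only prints the commands (dry run).

```python
import shlex
import subprocess

# Values taken from step 1 of this report.
VOLUME = "elad3"
BRICKS = [
    "10.35.160.6:/export/elad3",
    "10.35.160.202:/home/elad/1",
    "10.35.160.203:/home/elad/1",
]
VDSM_UID = 36  # vdsm user on oVirt hosts
KVM_GID = 36   # kvm group on oVirt hosts


def volume_setup_commands(volume, bricks, uid=VDSM_UID, gid=KVM_GID):
    """Build the gluster CLI calls from step 1 (hypothetical helper).

    The replica count equals the number of bricks, i.e. one replica set.
    """
    return [
        ["gluster", "volume", "create", volume,
         "replica", str(len(bricks)), "transport", "tcp", *bricks, "force"],
        ["gluster", "volume", "set", volume, "owner-uid", str(uid)],
        ["gluster", "volume", "set", volume, "owner-gid", str(gid)],
        ["gluster", "volume", "start", volume],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first failing gluster call.
            subprocess.run(cmd, check=True)


run(volume_setup_commands(VOLUME, BRICKS))
```

Setting both owner-uid and owner-gid matters here: the sketch emits one call per option, so a repeated owner-uid line cannot silently leave the group ownership unset.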

Actual results:
After the DB health check completed, the installation got stuck at the following output:

[ INFO  ] Waiting for the host to become operational in the engine. This may take several minutes...
[ INFO  ] Still waiting for VDSM host to become operational...



I got this error in the setup log:

20**FILTERED**5-05-**FILTERED** 09:00:2**FILTERED** DEBUG otopi.plugins.ovirt_hosted_engine_setup.engine.add_host add_host._wait_host_ready:**FILTERED**89 VDSM host in  state
20**FILTERED**5-05-**FILTERED** 09:02:29 DEBUG otopi.plugins.ovirt_hosted_engine_setup.engine.add_host add_host._wait_host_ready:**FILTERED**83 Error fetching host state: [ERROR]::oVirt API connection failure, (7, 'Failed connect to elad-he.qa.lab.tlv.redhat.com:443; Connection timed out')
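The tuple at the end of the second log line looks like a libcurl (error code, message) pair, presumably raised through pycurl by the oVirt Python SDK; that attribution is an assumption. Code 7 is CURLE_COULDNT_CONNECT, meaning no TCP connection to the engine VM on port 443 could be established, which suggests the engine VM itself was unreachable rather than the API rejecting the request. A minimal sketch for pulling the code out of such lines; the regular expression, the helper parse_api_failure and the small code table are illustrative, not exhaustive.

```python
import re

# Small subset of libcurl's CURLcode values; 7 is the one in this report.
CURL_ERRORS = {
    6: "CURLE_COULDNT_RESOLVE_HOST",
    7: "CURLE_COULDNT_CONNECT",
    28: "CURLE_OPERATION_TIMEDOUT",
    35: "CURLE_SSL_CONNECT_ERROR",
}

# Matches "... oVirt API connection failure, (<code>, '<message>')".
API_FAILURE_RE = re.compile(r"oVirt API connection failure, \((\d+), '([^']*)'\)")


def parse_api_failure(line):
    """Return (code, curl_name, message), or None if the line has no such error."""
    match = API_FAILURE_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, CURL_ERRORS.get(code, "unknown"), match.group(2)


line = ("Error fetching host state: [ERROR]::oVirt API connection failure, "
        "(7, 'Failed connect to elad-he.qa.lab.tlv.redhat.com:443; Connection timed out')")
print(parse_api_failure(line))
```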



Expected results:
Hosted-engine deployment over Gluster should end successfully.

Additional info:
/var/log/ from the host

Comment 2 Sandro Bonazzola 2015-05-11 11:17:29 UTC
The vdsm log ends at 2015-05-11 08:21:07, while the log lines above are from 09:02:29.

At that time, the setup log shows:
20**FILTERED**5-05-**FILTERED** 08:2**FILTERED**:05 DEBUG otopi.plugins.ovirt_hosted_engine_setup.engine.add_host add_host._wait_host_ready:**FILTERED**89 VDSM host in installing state

vdsm was stopped by ovirt-host-deploy (executed by ovirt-engine) and was not restarted.

I need the host-deploy logs and/or the engine logs in order to understand why vdsm was not restarted.
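The timeline argument in this comment reduces to comparing two timestamps. A minimal sketch using the times quoted here, assuming both logs are written with the same clock and timezone; silence_gap is a hypothetical helper.

```python
from datetime import datetime

LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def silence_gap(last_vdsm_entry, setup_entry):
    """Seconds between the last vdsm log line and a later setup log line."""
    start = datetime.strptime(last_vdsm_entry, LOG_TIME_FORMAT)
    end = datetime.strptime(setup_entry, LOG_TIME_FORMAT)
    return (end - start).total_seconds()


gap = silence_gap("2015-05-11 08:21:07", "2015-05-11 09:02:29")
# Prints: vdsm silent for 41.4 minutes before the API failure
print(f"vdsm silent for {gap / 60:.1f} minutes before the API failure")
```

Roughly 41 minutes without any vdsm output, while setup kept polling for the host state, is consistent with vdsm having been stopped during host-deploy and never started again.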

Comment 3 Doron Fediuck 2015-05-12 07:26:35 UTC
See possible workarounds in the duplicate bz.

*** This bug has been marked as a duplicate of bug 1201355 ***

Comment 4 Sandro Bonazzola 2015-05-12 07:28:23 UTC
Closed as a duplicate since it appears to be the same issue described in bug #1201355.
When the vdsmd service is stopped, it kills the glusterfs process, causing the storage domain to disappear.
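One plausible mechanism, assumed here rather than confirmed in this bug, is that the glusterfs FUSE client is spawned from vdsmd and therefore lives in the vdsmd.service cgroup; under systemd's default KillMode=control-group, stopping the unit kills every process in that cgroup. A minimal diagnostic sketch for checking this on a host; the process name glusterfs, the unit name vdsmd.service and the helper names are assumptions.

```python
from pathlib import Path


def services_in_cgroup(cgroup_text):
    """Return the systemd *.service units named in a /proc/<pid>/cgroup dump.

    Each line has the form 'hierarchy-id:controller-list:cgroup-path'.
    """
    units = set()
    for line in cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        for component in parts[2].split("/"):
            if component.endswith(".service"):
                units.add(component)
    return units


def glusterfs_pids(proc=Path("/proc")):
    """Yield PIDs whose command name is 'glusterfs' (assumed FUSE client name)."""
    if not proc.is_dir():
        return
    for entry in proc.iterdir():
        if not entry.name.isdigit():
            continue
        try:
            comm = (entry / "comm").read_text().strip()
        except OSError:
            continue  # process exited or is unreadable
        if comm == "glusterfs":
            yield int(entry.name)


def report(proc=Path("/proc")):
    """Print each glusterfs PID, its service units, and whether vdsmd owns it."""
    for pid in glusterfs_pids(proc):
        try:
            units = services_in_cgroup((proc / str(pid) / "cgroup").read_text())
        except OSError:
            continue
        verdict = "in vdsmd cgroup" if "vdsmd.service" in units else "outside vdsmd cgroup"
        print(pid, sorted(units), verdict)
```

Calling report() on the host before stopping vdsmd would show whether the glusterfs client shares the vdsmd.service cgroup; if it does, stopping vdsmd would be expected to take the mount down with it.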

Comment 5 Elad 2015-05-12 08:37:18 UTC
The engine VM moves to Paused, so it does seem like the issue reported in bug #1201355.