Bug 1220310

Summary:

[hosted-engine-setup] [Gluster support] Deployment gets stuck: "oVirt API connection failure"

Product:

[Retired] oVirt

Reporter:

Elad <ebenahar>

Component:

ovirt-hosted-engine-setup

Assignee:

Sandro Bonazzola <sbonazzo>

Status:

CLOSED DUPLICATE

QA Contact:

Elad <ebenahar>

Severity:

high

Docs Contact:

Priority:

unspecified

Version:

3.6

CC:

acanan, dfediuck, ebenahar, ecohen, gklein, lsurette, rbalakri, yeylon

Target Milestone:

---

Target Release:

---

Hardware:

x86_64

OS:

Unspecified

Whiteboard:

integration

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2015-05-12 07:26:35 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

1083025, 1173669

Attachments:

Description	Flags
/var/log/ from the host	none

Description Elad 2015-05-11 09:47:01 UTC

Created attachment 1024142
/var/log/ from the host

Description of problem:
Tried to deploy hosted engine over Gluster. The deployment reached the phase in which the DB health check had completed and hosted-engine setup was waiting for VDSM to become operational, and it got stuck there.


Version-Release number of selected component (if applicable):
ovirt-3.6.0-1 
ovirt-hosted-engine-setup-1.3.0-0.0.master.20150401110307.git9665976.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1.
- Created a new volume on the Gluster server:

gluster volume create elad3 replica 3 transport tcp 10.35.160.6:/export/elad3 10.35.160.202:/home/elad/1 10.35.160.203:/home/elad/1 force

- Set owner-uid and owner-gid to vdsm:kvm (uid 36, gid 36):

gluster volume set elad3 owner-uid 36
gluster volume set elad3 owner-gid 36

- Started the volume:

gluster volume start elad3 

2. Ran hosted-engine --deploy, selected glusterfs as the storage type and provided the path of the volume.
3. Installed RHEL 6.6 on the engine VM and ran engine-setup.
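
The volume preparation in step 1 can also be scripted. The following is a minimal sketch, assuming the gluster CLI is installed on the Gluster server and that the volume is a pure replica volume whose replica count equals the number of bricks. The function names gluster_setup_commands and run are hypothetical, and by default the script only prints the commands (dry run). It uses uid 36 (vdsm) and gid 36 (kvm).

```python
import shlex
import subprocess

# Ownership expected by oVirt on the storage: uid 36 = vdsm, gid 36 = kvm.
VDSM_UID = 36
KVM_GID = 36


def gluster_setup_commands(volume, bricks):
    """Return the step 1 gluster CLI invocations as argv lists.

    Assumes a pure replica volume: replica count == number of bricks.
    """
    create = ["gluster", "volume", "create", volume,
              "replica", str(len(bricks)), "transport", "tcp",
              *bricks, "force"]
    return [
        create,
        ["gluster", "volume", "set", volume, "owner-uid", str(VDSM_UID)],
        ["gluster", "volume", "set", volume, "owner-gid", str(KVM_GID)],
        ["gluster", "volume", "start", volume],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops at the first command that fails.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    bricks = ["10.35.160.6:/export/elad3",
              "10.35.160.202:/home/elad/1",
              "10.35.160.203:/home/elad/1"]
    run(gluster_setup_commands("elad3", bricks))
```

Building argv lists instead of shell strings avoids quoting problems with brick paths, and keeping the uid and gid as named constants makes a duplicated owner-uid line easy to spot.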

Actual results:
After the DB health check completed, the installation got stuck with the following output:

[ INFO  ] Waiting for the host to become operational in the engine. This may take several minutes...
[ INFO  ] Still waiting for VDSM host to become operational...

I got this error in the setup log:

2015-05-11 09:00:21 DEBUG otopi.plugins.ovirt_hosted_engine_setup.engine.add_host add_host._wait_host_ready:189 VDSM host in  state
2015-05-11 09:02:29 DEBUG otopi.plugins.ovirt_hosted_engine_setup.engine.add_host add_host._wait_host_ready:183 Error fetching host state: [ERROR]::oVirt API connection failure, (7, 'Failed connect to elad-he.qa.lab.tlv.redhat.com:443; Connection timed out')
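
These messages come from a loop that repeatedly fetches the host state from the engine until the host becomes operational. The sketch below is a hypothetical illustration of that polling pattern, not the actual add_host._wait_host_ready code: fetch_state stands in for the oVirt API request, and the timeout, interval and state names ("up", "non_operational", "install_failed") are assumptions. It shows why a lost API connection looks like a hang: each failure is only logged and the poll is retried until the deadline.

```python
import logging
import time

log = logging.getLogger("wait_host_ready_sketch")


class HostNotReady(Exception):
    """Raised when the host fails or the deadline passes."""


def wait_host_ready(fetch_state, timeout=600.0, interval=10.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll fetch_state() until it returns 'up' or timeout seconds elapse.

    fetch_state is a hypothetical callable standing in for the oVirt API
    call. Any exception it raises (e.g. a connection timeout) is logged
    and the poll is retried rather than aborting the wait.
    """
    deadline = clock() + timeout
    while True:
        try:
            state = fetch_state()
        except Exception as exc:  # connection failures are retried, not fatal
            log.debug("Error fetching host state: %s", exc)
        else:
            log.debug("VDSM host in %s state", state)
            if state == "up":
                return state
            if state in ("non_operational", "install_failed"):
                raise HostNotReady(state)
        if clock() >= deadline:
            raise HostNotReady("timed out waiting for host")
        sleep(interval)
```

Injecting clock and sleep keeps the loop testable without real waiting.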

Expected results:
Hosted-engine deployment over Gluster should end successfully.

Additional info:
/var/log/ from the host is attached.

Comment 2 Sandro Bonazzola 2015-05-11 11:17:29 UTC

The vdsm logs end at 2015-05-11 08:21:07, while the logs above are from 09:02:29.

At that time the setup log shows:
2015-05-11 08:21:05 DEBUG otopi.plugins.ovirt_hosted_engine_setup.engine.add_host add_host._wait_host_ready:189 VDSM host in installing state

vdsm was stopped by ovirt-host-deploy (executed by ovirt-engine) and was never restarted.

I need the host-deploy logs and/or the engine logs in order to understand why vdsm was not restarted.
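
The timeline comparison above can be checked mechanically. A minimal sketch, assuming timestamps in the "YYYY-MM-DD HH:MM:SS" form used by the setup log; the helper name gap_after_vdsm_log is hypothetical and the two values are the ones quoted in this comment.

```python
from datetime import datetime

# Timestamp layout used at the start of each setup log line.
FMT = "%Y-%m-%d %H:%M:%S"


def gap_after_vdsm_log(vdsm_last, setup_event):
    """Return how long after the last vdsm log entry a setup event occurred.

    A negative result means the setup event predates the end of the vdsm log.
    """
    return datetime.strptime(setup_event, FMT) - datetime.strptime(vdsm_last, FMT)


gap = gap_after_vdsm_log("2015-05-11 08:21:07", "2015-05-11 09:02:29")
print(gap)  # 0:41:22
```

A gap of more than 40 minutes with no vdsm log output is consistent with vdsm having been stopped and never restarted.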

Comment 3 Doron Fediuck 2015-05-12 07:26:35 UTC

See possible workarounds in the duplicate bz.

*** This bug has been marked as a duplicate of bug 1201355 ***

Comment 4 Sandro Bonazzola 2015-05-12 07:28:23 UTC

Closed as a duplicate since it seems to be the same issue described in bug #1201355.
When the vdsmd service is stopped, it kills the glusterfs process, causing the storage domain to disappear.
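
One way to confirm this symptom on the host is to inspect the GlusterFS FUSE mounts after vdsmd stops. A FUSE mount whose glusterfs client process has died typically stays listed in /proc/mounts but fails with ENOTCONN ("Transport endpoint is not connected") when accessed. A minimal sketch, assuming a Linux host where GlusterFS FUSE mounts appear with filesystem type fuse.glusterfs; the function names are hypothetical.

```python
import errno
import os


def gluster_mounts(mounts_text):
    """Return (source, mountpoint) pairs for GlusterFS FUSE mounts.

    mounts_text uses the /proc/mounts layout:
    source mountpoint fstype options dump pass
    Octal escapes such as \\040 (space) are not decoded here.
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "fuse.glusterfs":
            result.append((fields[0], fields[1]))
    return result


def check_mount(mountpoint):
    """Return 'ok', 'stale' (ENOTCONN) or 'error: <reason>'."""
    try:
        os.stat(mountpoint)
    except OSError as exc:
        if exc.errno == errno.ENOTCONN:
            return "stale"
        return f"error: {exc.strerror}"
    return "ok"


def report(path="/proc/mounts"):
    """Print the state of every GlusterFS FUSE mount on this host."""
    with open(path) as f:
        for source, target in gluster_mounts(f.read()):
            print(source, target, check_mount(target))
```

Running report() on the host before and after vdsmd is stopped would show whether the storage domain mount goes from "ok" to "stale".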

Comment 5 Elad 2015-05-12 08:37:18 UTC

The engine VM moves to Paused, so it does seem like the issue reported in bug #1201355.