Bug 1567237 - HostedEngine going to bad state due to network related issue
Summary: HostedEngine going to bad state due to network related issue
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: ovirt-hosted-engine-setup
Classification: oVirt
Component: General
Version: 2.2.9
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
: ---
Assignee: Ido Rosenzwig
QA Contact: meital avital
URL:
Whiteboard:
Depends On:
Blocks: 1567235
Reported: 2018-04-13 15:48 UTC by bipin
Modified: 2018-04-17 05:30 UTC (History)
6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1567235
Environment:
Last Closed: 2018-04-17 05:30:30 UTC
oVirt Team: Gluster



Description bipin 2018-04-13 15:48:04 UTC
+++ This bug was initially created as a clone of Bug #1567235 +++

Description of problem:
During HE deployment, the engine fails to come up and its state remains bad.
The HE VM is also unreachable from the host, and the logs point to network/IP rule related errors.


Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-2.2.16-1.el7ev.noarch
vdsm-4.20.23-1.el7ev.x86_64
ovirt-hosted-engine-ha-2.2.10-1.el7ev.noarch
rhvm-appliance-4.2-20180410.0.el7.noarch
redhat-release-virtualization-host-4.2-2.1.el7.x86_64

How reproducible:
3/3


Steps to Reproduce:
1. Gluster deployment 
2. Start the HE deployment
3. Observe the failure during the [Wait for the engine to come up on the target VM] task


Actual results:
The HE doesn't come up


Expected results:
The HE should be up and running

Additional info:

[ INFO  ] TASK [Wait for the engine to come up on the target VM]

[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 120, "changed": true, "cmd": ["hosted-engine", "--vm-status", "--json"], "delta": "0:00:00.194213", "end": "2018-04-13 20:32:57.590781", "rc": 0, "start": "2018-04-13 20:32:57.396568", "stderr": "", "stderr_lines": [], "stdout": "{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=2209 (Fri Apr 13 20:32:55 2018)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=2209 (Fri Apr 13 20:32:55 2018)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStarting\\nstopped=False\\n\", \"hostname\": \"rhsqa-grafton7.lab.eng.blr.redhat.com\", \"host-id\": 1, \"engine-status\": {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\": \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false, \"maintenance\": false, \"crc32\": \"7526302f\", \"local_conf_timestamp\": 2209, \"host-ts\": 2209}, \"global_maintenance\": false}", "stdout_lines": ["{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=2209 (Fri Apr 13 20:32:55 2018)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=2209 (Fri Apr 13 20:32:55 2018)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStarting\\nstopped=False\\n\", \"hostname\": \"rhsqa-grafton7.lab.eng.blr.redhat.com\", \"host-id\": 1, \"engine-status\": {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\": \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false, \"maintenance\": false, \"crc32\": \"7526302f\", \"local_conf_timestamp\": 2209, \"host-ts\": 2209}, \"global_maintenance\": false}"]}
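The failing Ansible task repeatedly runs `hosted-engine --vm-status --json` (120 attempts here) and gives up while the engine is still in `EngineStarting` with `"health": "bad"`. A minimal sketch of extracting the per-host `engine-status` from that JSON — field names are taken from the output above; the helper name is hypothetical:

```python
import json

def engine_health(vm_status_json):
    """Map each host id in `hosted-engine --vm-status --json` output
    to its (health, reason) pair from the engine-status field."""
    status = json.loads(vm_status_json)
    result = {}
    for key, host in status.items():
        # Top-level keys are host ids plus the "global_maintenance" flag.
        if key == "global_maintenance":
            continue
        es = host.get("engine-status", {})
        result[key] = (es.get("health"), es.get("reason"))
    return result

# Abbreviated sample based on the output above:
sample = ('{"1": {"engine-status": {"reason": "failed liveliness check", '
          '"health": "bad", "vm": "up", "detail": "Up"}}, '
          '"global_maintenance": false}')
print(engine_health(sample))
```

Running this on the full output from the bug would show host 1 as `("bad", "failed liveliness check")` even though the VM itself is up, which matches the symptom: the VM runs, but the engine inside it never answers the HA agent's liveness probe.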

Comment 3 Ryan Barry 2018-04-13 15:53:28 UTC
From the output, engine is definitely starting.

Can you please post:

/var/log/libvirt/*
/var/log/ovirt-hosted-engine-ha/*

Also, if the VM is available over SSH, engine.log

Comment 7 SATHEESARAN 2018-04-17 05:30:30 UTC
Thanks Simone. We debugged this issue by adding a VNC console to the Hosted Engine VM, and it finally turned out to be a DHCP-related issue in the lab network.

Closing this bug.
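Since the root cause was DHCP in the lab network, a quick first check in similar "failed liveliness check" cases is whether the host can resolve the engine FQDN at all. A minimal sketch (the function name is hypothetical, not part of any oVirt tooling):

```python
import socket

def resolve_engine(fqdn):
    """Return the IPv4 address the host resolves for the engine FQDN,
    or None if resolution fails (e.g. DNS/DHCP misconfiguration)."""
    try:
        return socket.gethostbyname(fqdn)
    except socket.gaierror:
        return None

# Example: a name that can never resolve returns None.
print(resolve_engine("engine.example.invalid"))
```

If this returns None, or an address other than the one the engine VM actually obtained, the liveness check will keep failing regardless of the VM's state.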

