1567235 – HostedEngine going to bad state due to network related issue

Summary: HostedEngine going to bad state due to network related issue

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	rhhi
Sub Component:
Version:	rhhiv-1.5
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Sahina Bose
QA Contact:	SATHEESARAN
Docs Contact:
URL:
Whiteboard:
Depends On:	1567237
Blocks:

Reported:	2018-04-13 15:44 UTC by bipin
Modified:	2018-04-17 05:31 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1567237
Environment:
Last Closed:	2018-04-17 05:31:17 UTC
Embargoed:
Dependent Products:

Description bipin 2018-04-13 15:44:59 UTC

Description of problem:
During HE deployment, the engine fails to come up and its state remains bad.
The HE is also unreachable from the host. The logs suggest network/IP rule related errors.


Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-2.2.16-1.el7ev.noarch
vdsm-4.20.23-1.el7ev.x86_64
ovirt-hosted-engine-ha-2.2.10-1.el7ev.noarch
rhvm-appliance-4.2-20180410.0.el7.noarch
redhat-release-virtualization-host-4.2-2.1.el7.x86_64

How reproducible:
3/3


Steps to Reproduce:
1. Complete the Gluster deployment
2. Start the HE deployment
3. Observe the failure during the [Wait for the engine to come up on the target VM] task


Actual results:
The HE doesn't come up


Expected results:
The HE should be up and running

Additional info:

[ INFO  ] TASK [Wait for the engine to come up on the target VM]

[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 120, "changed": true, "cmd": ["hosted-engine", "--vm-status", "--json"], "delta": "0:00:00.194213", "end": "2018-04-13 20:32:57.590781", "rc": 0, "start": "2018-04-13 20:32:57.396568", "stderr": "", "stderr_lines": [], "stdout": "{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=2209 (Fri Apr 13 20:32:55 2018)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=2209 (Fri Apr 13 20:32:55 2018)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStarting\\nstopped=False\\n\", \"hostname\": \"rhsqa-grafton7.lab.eng.blr.redhat.com\", \"host-id\": 1, \"engine-status\": {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\": \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false, \"maintenance\": false, \"crc32\": \"7526302f\", \"local_conf_timestamp\": 2209, \"host-ts\": 2209}, \"global_maintenance\": false}", "stdout_lines": ["{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=2209 (Fri Apr 13 20:32:55 2018)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=2209 (Fri Apr 13 20:32:55 2018)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStarting\\nstopped=False\\n\", \"hostname\": \"rhsqa-grafton7.lab.eng.blr.redhat.com\", \"host-id\": 1, \"engine-status\": {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\": \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false, \"maintenance\": false, \"crc32\": \"7526302f\", \"local_conf_timestamp\": 2209, \"host-ts\": 2209}, \"global_maintenance\": false}"]}
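
For anyone reading the failed task output above, here is a minimal Python sketch that pulls the relevant fields out of the `hosted-engine --vm-status --json` stdout. The sample string is copied from the "stdout" field in the error above with the "extra" field and a few other keys trimmed for readability. The function names `summarize_vm_status` and `engine_is_healthy` are hypothetical helpers, not part of any oVirt tool, and treating "good" as the healthy value is an assumption based on the "health": "bad" seen here.

```python
import json

# Trimmed copy of the "stdout" field from the failed task above.
SAMPLE_STDOUT = (
    '{"1": {"conf_on_shared_storage": true, "live-data": true, '
    '"hostname": "rhsqa-grafton7.lab.eng.blr.redhat.com", "host-id": 1, '
    '"engine-status": {"reason": "failed liveliness check", "health": "bad", '
    '"vm": "up", "detail": "Up"}, "score": 3400, "stopped": false, '
    '"maintenance": false}, "global_maintenance": false}'
)


def summarize_vm_status(stdout):
    """Return one summary dict per host entry found in the JSON output."""
    status = json.loads(stdout)
    hosts = []
    for key, value in status.items():
        # Top-level flags such as "global_maintenance" are not host entries.
        if not isinstance(value, dict):
            continue
        engine = value.get("engine-status", {})
        hosts.append({
            "host_id": key,
            "hostname": value.get("hostname"),
            "vm": engine.get("vm"),
            "health": engine.get("health"),
            "reason": engine.get("reason"),
        })
    return hosts


def engine_is_healthy(stdout):
    """Assumption: a host reporting health == "good" means the engine is up."""
    return any(h["health"] == "good" for h in summarize_vm_status(stdout))


if __name__ == "__main__":
    for host in summarize_vm_status(SAMPLE_STDOUT):
        print(host)
    print("engine healthy:", engine_is_healthy(SAMPLE_STDOUT))
```

With the output from this bug, the VM is reported as "up" while the health stays "bad" with reason "failed liveliness check", which points at the engine inside the VM not being reachable rather than at the VM failing to start.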

Comment 1 SATHEESARAN 2018-04-17 05:31:17 UTC

Thanks, Simone. We debugged this issue by adding the VNC console option to the Hosted Engine VM, and it turned out to be a DHCP-related issue in the lab network.

Closing this bug.
