Bug 1567237 - HostedEngine going to bad state due to network related issue
Summary: HostedEngine going to bad state due to network related issue
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: ovirt-hosted-engine-setup
Classification: oVirt
Component: General
Version: 2.2.9
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
: ---
Assignee: Ido Rosenzwig
QA Contact: meital avital
URL:
Whiteboard:
Depends On:
Blocks: 1567235
Reported: 2018-04-13 15:48 UTC by bipin
Modified: 2018-04-17 05:30 UTC (History)
6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1567235
Environment:
Last Closed: 2018-04-17 05:30:30 UTC
oVirt Team: Gluster



Description bipin 2018-04-13 15:48:04 UTC
+++ This bug was initially created as a clone of Bug #1567235 +++

Description of problem:
During HE deployment, the engine fails to come up and its state remains bad.
The HE VM is also unreachable from the host, and the logs point to network/IP rule related errors.


Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-2.2.16-1.el7ev.noarch
vdsm-4.20.23-1.el7ev.x86_64
ovirt-hosted-engine-ha-2.2.10-1.el7ev.noarch
rhvm-appliance-4.2-20180410.0.el7.noarch
redhat-release-virtualization-host-4.2-2.1.el7.x86_64

How reproducible:
3/3


Steps to Reproduce:
1. Gluster deployment 
2. Start the HE deployment
3. Observe the failure during the [Wait for the engine to come up on the target VM] task


Actual results:
The HE doesn't come up


Expected results:
The HE should be up and running

Additional info:

[ INFO  ] TASK [Wait for the engine to come up on the target VM]

[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 120, "changed": true, "cmd": ["hosted-engine", "--vm-status", "--json"], "delta": "0:00:00.194213", "end": "2018-04-13 20:32:57.590781", "rc": 0, "start": "2018-04-13 20:32:57.396568", "stderr": "", "stderr_lines": [], "stdout": "{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=2209 (Fri Apr 13 20:32:55 2018)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=2209 (Fri Apr 13 20:32:55 2018)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStarting\\nstopped=False\\n\", \"hostname\": \"rhsqa-grafton7.lab.eng.blr.redhat.com\", \"host-id\": 1, \"engine-status\": {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\": \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false, \"maintenance\": false, \"crc32\": \"7526302f\", \"local_conf_timestamp\": 2209, \"host-ts\": 2209}, \"global_maintenance\": false}", "stdout_lines": ["{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=2209 (Fri Apr 13 20:32:55 2018)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=2209 (Fri Apr 13 20:32:55 2018)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStarting\\nstopped=False\\n\", \"hostname\": \"rhsqa-grafton7.lab.eng.blr.redhat.com\", \"host-id\": 1, \"engine-status\": {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\": \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false, \"maintenance\": false, \"crc32\": \"7526302f\", \"local_conf_timestamp\": 2209, \"host-ts\": 2209}, \"global_maintenance\": false}"]}
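The failing Ansible task repeatedly runs `hosted-engine --vm-status --json` (120 attempts here) and gives up while the engine is still in `EngineStarting` with `"health": "bad"`. A minimal sketch of extracting the per-host `engine-status` from that JSON — field names are taken from the output above; the helper name is hypothetical:

```python
import json

def engine_health(vm_status_json):
    """Map each host id in `hosted-engine --vm-status --json` output
    to its (health, reason) pair from the engine-status field."""
    status = json.loads(vm_status_json)
    result = {}
    for key, host in status.items():
        # Top-level keys are host ids plus the "global_maintenance" flag.
        if key == "global_maintenance":
            continue
        es = host.get("engine-status", {})
        result[key] = (es.get("health"), es.get("reason"))
    return result

# Abbreviated sample based on the output above:
sample = ('{"1": {"engine-status": {"reason": "failed liveliness check", '
          '"health": "bad", "vm": "up", "detail": "Up"}}, '
          '"global_maintenance": false}')
print(engine_health(sample))
```

Running this on the full output from the bug would show host 1 as `("bad", "failed liveliness check")` even though the VM itself is up, which matches the symptom: the VM runs, but the engine inside it never answers the HA agent's liveness probe.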

Comment 3 Ryan Barry 2018-04-13 15:53:28 UTC
From the output, engine is definitely starting.

Can you please post:

/var/log/libvirt/*
/var/log/ovirt-hosted-engine-ha/*

Also, if the VM is available over SSH, engine.log

Comment 7 SATHEESARAN 2018-04-17 05:30:30 UTC
Thanks Simone. We debugged this issue by adding a VNC console to the Hosted Engine VM, and it finally turned out to be a DHCP-related issue in the lab network.

Closing this bug.
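Since the root cause was DHCP in the lab network, a quick first check in similar "failed liveliness check" cases is whether the host can resolve the engine FQDN at all. A minimal sketch (the function name is hypothetical, not part of any oVirt tooling):

```python
import socket

def resolve_engine(fqdn):
    """Return the IPv4 address the host resolves for the engine FQDN,
    or None if resolution fails (e.g. DNS/DHCP misconfiguration)."""
    try:
        return socket.gethostbyname(fqdn)
    except socket.gaierror:
        return None

# Example: a name that can never resolve returns None.
print(resolve_engine("engine.example.invalid"))
```

If this returns None, or an address other than the one the engine VM actually obtained, the liveness check will keep failing regardless of the VM's state.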

