Bug 1567237

Summary: HostedEngine going to bad state due to network related issue
Product: [oVirt] ovirt-hosted-engine-setup Reporter: bipin <bshetty>
Component: General Assignee: Ido Rosenzwig <irosenzw>
Status: CLOSED NOTABUG QA Contact: meital avital <mavital>
Severity: high Docs Contact:
Priority: unspecified    
Version: 2.2.9 CC: bshetty, bugs, cshao, rhs-bugs, sabose, sasundar
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1567235 Environment:
Last Closed: 2018-04-17 05:30:30 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Gluster RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1567235    

Description bipin 2018-04-13 15:48:04 UTC
+++ This bug was initially created as a clone of Bug #1567235 +++

Description of problem:
During HE deployment, the engine fails to come up and the state remains bad.
The HE is also unreachable from the host. Judging from the logs, this appears to be caused by network/IP rule related errors.


Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-2.2.16-1.el7ev.noarch
vdsm-4.20.23-1.el7ev.x86_64
ovirt-hosted-engine-ha-2.2.10-1.el7ev.noarch
rhvm-appliance-4.2-20180410.0.el7.noarch
redhat-release-virtualization-host-4.2-2.1.el7.x86_64

How reproducible:
3/3


Steps to Reproduce:
1. Complete the Gluster deployment.
2. Start the HE deployment.
3. Observe the failure at the [Wait for the engine to come up on the target VM] task.


Actual results:
The HE does not come up.


Expected results:
The HE should be up and running

Additional info:

[ INFO  ] TASK [Wait for the engine to come up on the target VM]

[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 120, "changed": true, "cmd": ["hosted-engine", "--vm-status", "--json"], "delta": "0:00:00.194213", "end": "2018-04-13 20:32:57.590781", "rc": 0, "start": "2018-04-13 20:32:57.396568", "stderr": "", "stderr_lines": [], "stdout": "{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=2209 (Fri Apr 13 20:32:55 2018)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=2209 (Fri Apr 13 20:32:55 2018)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStarting\\nstopped=False\\n\", \"hostname\": \"rhsqa-grafton7.lab.eng.blr.redhat.com\", \"host-id\": 1, \"engine-status\": {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\": \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false, \"maintenance\": false, \"crc32\": \"7526302f\", \"local_conf_timestamp\": 2209, \"host-ts\": 2209}, \"global_maintenance\": false}", "stdout_lines": ["{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=2209 (Fri Apr 13 20:32:55 2018)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=2209 (Fri Apr 13 20:32:55 2018)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStarting\\nstopped=False\\n\", \"hostname\": \"rhsqa-grafton7.lab.eng.blr.redhat.com\", \"host-id\": 1, \"engine-status\": {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\": \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false, \"maintenance\": false, \"crc32\": \"7526302f\", \"local_conf_timestamp\": 2209, \"host-ts\": 2209}, \"global_maintenance\": false}"]}
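
The stdout in the error above is the JSON printed by `hosted-engine --vm-status --json`. As a reading aid, here is a minimal Python sketch that pulls out the fields that matter for this failure. The helper name `engine_health` is hypothetical, and `SAMPLE` is an abbreviated copy of the output above (only a subset of the keys), so treat it as an illustration rather than part of any oVirt tooling.

```python
import json

# Abbreviated copy of the stdout from the failed task above (subset of keys).
SAMPLE = json.dumps({
    "1": {
        "extra": "host-id=1\nscore=3400\nmaintenance=False\n"
                 "state=EngineStarting\nstopped=False\n",
        "hostname": "rhsqa-grafton7.lab.eng.blr.redhat.com",
        "engine-status": {
            "reason": "failed liveliness check",
            "health": "bad",
            "vm": "up",
            "detail": "Up",
        },
        "score": 3400,
    },
    "global_maintenance": False,
})


def engine_health(status_json):
    """Summarise per-host engine status (hypothetical helper, not an oVirt API)."""
    report = {}
    for host_id, host in json.loads(status_json).items():
        if not isinstance(host, dict):
            continue  # skip top-level flags such as "global_maintenance"
        engine = host.get("engine-status", {})
        if isinstance(engine, str):
            # Assumption: tolerate the status arriving as a JSON-encoded string.
            engine = json.loads(engine)
        # "extra" is a newline-separated list of key=value pairs.
        extra = dict(
            line.split("=", 1)
            for line in host.get("extra", "").splitlines()
            if "=" in line
        )
        report[host_id] = {
            "hostname": host.get("hostname"),
            "state": extra.get("state"),
            "vm": engine.get("vm"),
            "health": engine.get("health"),
            "reason": engine.get("reason"),
        }
    return report


if __name__ == "__main__":
    for host_id, info in engine_health(SAMPLE).items():
        print(host_id, info)
```

Read this way, the output shows the VM itself is up (`vm: up`) while the engine inside it fails the liveliness check (`health: bad`, `state: EngineStarting`), which points at the engine being unreachable rather than the VM failing to start.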

Comment 3 Ryan Barry 2018-04-13 15:53:28 UTC
From the output, the engine is definitely starting.

Can you please post:

/var/log/libvirt/*
/var/log/ovirt-hosted-engine-ha/*

Also, if the VM is reachable over SSH, please post engine.log as well.

Comment 7 SATHEESARAN 2018-04-17 05:30:30 UTC
Thanks Simone. We debugged this issue by adding a VNC console option to the Hosted Engine VM, and it finally turned out to be a DHCP-related issue in the lab network.

Closing this bug.