Bug 1346137

Summary: [z-stream clone - 3.6.7] Engine status show as "Can't connect to HA daemon" after reboot RHEV-H.
Product: Red Hat Enterprise Virtualization Manager
Reporter: rhev-integ
Component: ovirt-hosted-engine-setup
Assignee: Simone Tiraboschi <stirabos>
Status: CLOSED ERRATA
QA Contact: Nikolai Sednev <nsednev>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 3.6.7
CC: amarchuk, dfediuck, fdeutsch, gklein, huzhao, jbelka, leiwang, lsurette, nsednev, pstehlik, rbarry, stirabos, weiwang, yaniwang, ycui, ykaul, ylavi
Target Milestone: ovirt-3.6.7
Keywords: Regression, Triaged, ZStream
Target Release: 3.6.7
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1342988
Environment:
Last Closed: 2016-07-11 11:05:03 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Integration
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1342988
Bug Blocks:

Comment 2 Nikolai Sednev 2016-06-14 09:22:32 UTC
Further to https://bugzilla.redhat.com/show_bug.cgi?id=1343980#c5, and together with the results from my latest deployment of the components listed below, both the hosted-engine deployment and the host reboot worked fine for me.

Host:
qemu-kvm-rhev-2.3.0-31.el7_2.15.x86_64
mom-0.5.4-1.el7ev.noarch
sanlock-3.2.4-2.el7_2.x86_64
ovirt-vmconsole-host-1.0.2-2.el7ev.noarch
ovirt-host-deploy-1.4.1-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.7-1.el7ev.noarch
libvirt-client-1.2.17-13.el7_2.5.x86_64
ovirt-setup-lib-1.0.1-1.el7ev.noarch
ovirt-hosted-engine-setup-1.3.7.2-1.el7ev.noarch
ovirt-vmconsole-1.0.2-2.el7ev.noarch
vdsm-4.17.31-0.el7ev.noarch
Linux version 3.10.0-327.22.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Thu Jun 9 10:09:10 EDT 2016
Linux alma03.qa.lab.tlv.redhat.com 3.10.0-327.22.2.el7.x86_64 #1 SMP Thu Jun 9 10:09:10 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)

Engine:
rhevm-dwh-setup-3.6.6-1.el6ev.noarch
rhevm-webadmin-portal-3.6.7.3-0.1.el6.noarch
rhevm-spice-client-x64-cab-3.6-7.el6.noarch
rhevm-setup-plugins-3.6.5-1.el6ev.noarch
rhevm-setup-3.6.7.3-0.1.el6.noarch
rhevm-tools-backup-3.6.7.3-0.1.el6.noarch
rhevm-doc-3.6.7-1.el6eng.noarch
rhevm-branding-rhev-3.6.0-10.el6ev.noarch
rhevm-setup-base-3.6.7.3-0.1.el6.noarch
rhevm-backend-3.6.7.3-0.1.el6.noarch
rhevm-dbscripts-3.6.7.3-0.1.el6.noarch
rhevm-dependencies-3.6.0-1.el6ev.noarch
rhevm-spice-client-x86-cab-3.6-7.el6.noarch
rhevm-sdk-python-3.6.7.0-1.el6ev.noarch
rhevm-guest-agent-common-1.0.11-6.el6ev.noarch
rhevm-image-uploader-3.6.0-1.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-3.6.7.3-0.1.el6.noarch
rhevm-vmconsole-proxy-helper-3.6.7.3-0.1.el6.noarch
rhevm-reports-setup-3.6.5.1-1.el6ev.noarch
rhevm-restapi-3.6.7.3-0.1.el6.noarch
rhevm-3.6.7.3-0.1.el6.noarch
rhevm-log-collector-3.6.1-1.el6ev.noarch
rhevm-spice-client-x86-msi-3.6-7.el6.noarch
rhevm-setup-plugin-vmconsole-proxy-helper-3.6.7.3-0.1.el6.noarch
rhevm-extensions-api-impl-3.6.7.3-0.1.el6.noarch
rhevm-websocket-proxy-3.6.7.3-0.1.el6.noarch
rhevm-reports-3.6.5.1-1.el6ev.noarch
rhevm-tools-3.6.7.3-0.1.el6.noarch
rhevm-setup-plugin-ovirt-engine-common-3.6.7.3-0.1.el6.noarch
rhevm-dwh-3.6.6-1.el6ev.noarch
rhevm-userportal-3.6.7.3-0.1.el6.noarch
rhevm-spice-client-x64-msi-3.6-7.el6.noarch
rhevm-iso-uploader-3.6.0-1.el6ev.noarch
rhevm-lib-3.6.7.3-0.1.el6.noarch
rhevm-cli-3.6.2.1-1.el6ev.noarch
rhevm-setup-plugin-websocket-proxy-3.6.7.3-0.1.el6.noarch
Linux version 2.6.32-642.1.1.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC) ) #1 SMP Fri May 6 14:54:05 EDT 2016
Linux nsednev-he-2.qa.lab.tlv.redhat.com 2.6.32-642.1.1.el6.x86_64 #1 SMP Fri May 6 14:54:05 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 6.8 (Santiago)


I rebooted the host; the agent connected to its storage domain successfully, the HE-VM started, and I could connect to it via the web UI within 4 min 55 s of the agent coming up.

# hosted-engine --vm-status


--== Host 1 status ==--

Status up-to-date                  : True
Hostname                           : alma03.qa.lab.tlv.redhat.com
Host ID                            : 1
Engine status                      : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : e60cc25c
Host timestamp                     : 323
[root@alma03 ~]# systemctl status ovirt-ha-agent -l
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2016-06-13 16:27:09 IDT; 4min 55s ago
 Main PID: 2615 (ovirt-ha-agent)
   CGroup: /system.slice/ovirt-ha-agent.service
           └─2615 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon

Jun 13 16:32:01 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2615]: INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
Jun 13 16:32:01 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2615]: INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Refreshing the storage domain
Jun 13 16:32:02 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2615]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Preparing images
Jun 13 16:32:02 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2615]: INFO:ovirt_hosted_engine_ha.lib.image.Image:Preparing images
Jun 13 16:32:02 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2615]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Reloading vm.conf from the shared storage domain
Jun 13 16:32:02 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2615]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Trying to get a fresher copy of vm configuration from the OVF_STORE
Jun 13 16:32:02 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2615]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config ERROR Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf
Jun 13 16:32:02 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2615]: WARNING:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Unable to find OVF_STORE
Jun 13 16:32:02 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2615]: ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf
Jun 13 16:32:02 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2615]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Current state EngineStarting (score: 3400)
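
The "Engine status" field in the `hosted-engine --vm-status` output above is a JSON object, so a check like this one can be scripted. The following is only a minimal sketch: the function name `parse_vm_status` and the embedded sample are illustrative assumptions, not part of any oVirt tool, and it assumes the "Key : value" layout shown in this comment.

```python
import json


def parse_vm_status(text):
    """Parse 'Key : value' lines of `hosted-engine --vm-status` output into a dict.

    Hypothetical helper: assumes the plain-text layout shown in this comment.
    """
    fields = {}
    for line in text.splitlines():
        # Split on the first " : " only; the JSON value contains ': ' but the
        # key/value separator always comes first on the line.
        key, sep, value = line.partition(" : ")
        if not sep:
            continue  # banner or blank line, not a field
        fields[key.strip()] = value.strip()
    if "Engine status" in fields:
        # The engine status value is itself a JSON object.
        fields["Engine status"] = json.loads(fields["Engine status"])
    return fields


# Sample copied from the output in this comment (trimmed).
SAMPLE = """\
--== Host 1 status ==--

Status up-to-date                  : True
Hostname                           : alma03.qa.lab.tlv.redhat.com
Host ID                            : 1
Engine status                      : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "up"}
Score                              : 3400
"""

status = parse_vm_status(SAMPLE)
engine = status["Engine status"]
print(engine["vm"], engine["health"], engine["reason"])
```

In the captured output the VM is reported as "up" while the health is still "bad" with reason "failed liveliness check"; the agent log shows the state as EngineStarting at that moment.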

I'm moving this bug to Verified, further to https://bugzilla.redhat.com/show_bug.cgi?id=1343980#c5 and because, with the components listed above, I could not reproduce the bug: ha-agent started the engine normally. The issue is not related to a specific OS but to the ovirt-hosted-engine-setup component, and it is fixed starting from ovirt-hosted-engine-setup-1.3.7.2-1.el7ev.noarch.
Please feel free to reopen this bug if this issue still exists on your system.
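
To check whether a host already carries the fixed ovirt-hosted-engine-setup build named above, the installed package string can be compared against it. The sketch below is a simplified, assumption-laden comparison: `parse_nvr`, `build_key` and `has_fix` are hypothetical helpers, only the numeric parts of the version and the leading integer of the release are compared, and this is not RPM's full version-comparison algorithm. The older build string used in the example is made up for illustration.

```python
import re

# First fixed build, as stated in this comment.
FIXED = "ovirt-hosted-engine-setup-1.3.7.2-1.el7ev.noarch"


def parse_nvr(nvra):
    """Split 'name-version-release.arch' into (name, version, release)."""
    base = nvra.rsplit(".", 1)[0]            # drop the trailing '.arch'
    name, version, release = base.rsplit("-", 2)
    return name, version, release


def build_key(nvra):
    """Return a sortable key: (numeric version tuple, leading release number).

    Simplification: ignores epochs and non-numeric version segments.
    """
    _, version, release = parse_nvr(nvra)
    version_key = tuple(int(part) for part in re.findall(r"\d+", version))
    release_match = re.match(r"\d+", release)
    release_key = int(release_match.group()) if release_match else 0
    return version_key, release_key


def has_fix(nvra):
    """True if the given build is the fixed build or newer (same package only)."""
    if parse_nvr(nvra)[0] != parse_nvr(FIXED)[0]:
        raise ValueError("not an ovirt-hosted-engine-setup package: " + nvra)
    return build_key(nvra) >= build_key(FIXED)


print(has_fix("ovirt-hosted-engine-setup-1.3.7.2-1.el7ev.noarch"))
# Hypothetical older build string, used only to exercise the comparison.
print(has_fix("ovirt-hosted-engine-setup-1.3.7.1-1.el7ev.noarch"))
```

The installed package string can be obtained with `rpm -q ovirt-hosted-engine-setup`; for anything beyond a quick check, RPM's own comparison logic is the safer choice.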