Created attachment 1165136: he-faile.png

Description of problem:
The engine status shows as "Can't connect to HA daemon" after rebooting RHEV-H.

Version-Release number of selected component (if applicable):
rhev-hypervisor7-7.2-20160602.0.iso
ovirt-node-3.6.1-12.0.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.7-1.el7ev.noarch
ovirt-hosted-engine-setup-1.3.7.1-1.el7ev.noarch
vdsm-4.17.30-1.el7ev.noarch
rhevm-appliance-20160602.0-1.el7ev.ova

How reproducible:
100%

Steps to Reproduce:
1. Install RHEV-H 7.2-20160602.0.
2. Deploy HE following the correct steps.
3. After the engine is running, reboot the host.
4. Log in to RHEV-H and check the engine status.

Actual results:
The engine status shows as "Can't connect to HA daemon" after rebooting RHEV-H.

Expected results:
The engine should still work normally after rebooting RHEV-H.

Additional info:
# systemctl status ovirt-ha-agent.service
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2016-06-06 09:38:00 UTC; 59s ago
 Main PID: 29650 (ovirt-ha-agent)
   CGroup: /system.slice/ovirt-ha-agent.service
           └─29650 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon

Jun 06 09:38:47 cshaotest.redhat.com ovirt-ha-agent[29650]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Error: 'Connection to storage server failed' - trying to restart agent
Jun 06 09:38:47 cshaotest.redhat.com ovirt-ha-agent[29650]: ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Error: 'Connection to storage server failed' - trying to restart agent
Jun 06 09:38:52 cshaotest.redhat.com ovirt-ha-agent[29650]: WARNING:ovirt_hosted_engine_ha.agent.agent.Agent:Restarting agent, attempt '5'
Jun 06 09:38:52 cshaotest.redhat.com ovirt-ha-agent[29650]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Found certificate common name: cshaotest.redhat.com
Jun 06 09:38:52 cshaotest.redhat.com ovirt-ha-agent[29650]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Initializing VDSM
Jun 06 09:38:52 cshaotest.redhat.com ovirt-ha-agent[29650]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Connecting the storage
Jun 06 09:38:52 cshaotest.redhat.com ovirt-ha-agent[29650]: INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
Jun 06 09:38:55 cshaotest.redhat.com ovirt-ha-agent[29650]: INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
Jun 06 09:38:55 cshaotest.redhat.com ovirt-ha-agent[29650]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Error: 'Connection to storage server failed' - trying to restart agent
Jun 06 09:38:55 cshaotest.redhat.com ovirt-ha-agent[29650]: ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Error: 'Connection to storage server failed' - trying to restart agent

The mount point for the NFS storage configured as the HE-VM storage is missing:
# mount
...
...
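To check the missing-mount symptom without reading the full mount output by eye, the NFS entries can be filtered programmatically. This is a minimal Python sketch, assuming the standard Linux /proc/mounts layout (source, mount point, filesystem type, options, ...); the function name nfs_mounts is hypothetical and is not part of ovirt-hosted-engine-ha.

```python
def nfs_mounts(mounts_text):
    """Return (source, mount_point) pairs for NFS entries in /proc/mounts-style text.

    Hypothetical helper: assumes whitespace-separated fields in the order
    source, mount point, filesystem type, options, dump, pass.
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines; keep only NFSv3/NFSv4 filesystem types.
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            result.append((fields[0], fields[1]))
    return result
```

On an affected host, an empty result from nfs_mounts(open("/proc/mounts").read()) would match the missing NFS mount reported here.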
Created attachment 1165137: all log info
Adding the "Regression" keyword because this issue does not occur on the RHEV-H 7.2 build for RHEV 3.6.6 (rhev-hypervisor7-7.2-20160517.0).
*** Bug 1343383 has been marked as a duplicate of this bug. ***
*** Bug 1343980 has been marked as a duplicate of this bug. ***
I encountered this bug on the rhevh-ng 4.0 beta 1 build (rhev-hypervisor7-ng-4.0-20160607.1, ovirt-hosted-engine-setup-2.0.0-1.el7ev). From the QE point of view, this regression is a 4.0 beta blocker.
ovirt-hosted-engine-setup-1.3.7.2-1.el7ev already includes this fix.
FYI, this looks good with:
# rpm -q ovirt-hosted-engine-setup
ovirt-hosted-engine-setup-1.3.7.3-0.0.master.20160607094202.git6c7a783.el7.centos.noarch
# grep mnt_ /etc/ovirt-hosted-engine/hosted-engine.conf
mnt_options=
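The grep above only confirms that a mnt_options line is present in hosted-engine.conf, even with an empty value. A minimal Python sketch of the same check, assuming the file is a flat key=value list as the grep output suggests; this is not the parser ovirt-hosted-engine-ha itself uses, and read_conf and has_mnt_options are hypothetical helpers.

```python
def read_conf(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments.

    Assumption: hosted-engine.conf is a flat key=value file with no sections.
    """
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, _, value = line.partition("=")
        conf[key.strip()] = value.strip()
    return conf


def has_mnt_options(text):
    """True if a mnt_options key is present, even when its value is empty."""
    return "mnt_options" in read_conf(text)
```

On a host it could be called as has_mnt_options(open("/etc/ovirt-hosted-engine/hosted-engine.conf").read()).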
Following https://bugzilla.redhat.com/show_bug.cgi?id=1343980#c5, and together with the results from my latest deployment of the components listed below, both the hosted-engine deployment and the host reboot worked fine for me.

Host:
qemu-kvm-rhev-2.3.0-31.el7_2.15.x86_64
mom-0.5.4-1.el7ev.noarch
sanlock-3.2.4-2.el7_2.x86_64
ovirt-vmconsole-host-1.0.2-2.el7ev.noarch
ovirt-host-deploy-1.4.1-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.7-1.el7ev.noarch
libvirt-client-1.2.17-13.el7_2.5.x86_64
ovirt-setup-lib-1.0.1-1.el7ev.noarch
ovirt-hosted-engine-setup-1.3.7.2-1.el7ev.noarch
ovirt-vmconsole-1.0.2-2.el7ev.noarch
vdsm-4.17.31-0.el7ev.noarch
Linux version 3.10.0-327.22.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Thu Jun 9 10:09:10 EDT 2016
Linux alma03.qa.lab.tlv.redhat.com 3.10.0-327.22.2.el7.x86_64 #1 SMP Thu Jun 9 10:09:10 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)

Engine:
rhevm-dwh-setup-3.6.6-1.el6ev.noarch
rhevm-webadmin-portal-3.6.7.3-0.1.el6.noarch
rhevm-spice-client-x64-cab-3.6-7.el6.noarch
rhevm-setup-plugins-3.6.5-1.el6ev.noarch
rhevm-setup-3.6.7.3-0.1.el6.noarch
rhevm-tools-backup-3.6.7.3-0.1.el6.noarch
rhevm-doc-3.6.7-1.el6eng.noarch
rhevm-branding-rhev-3.6.0-10.el6ev.noarch
rhevm-setup-base-3.6.7.3-0.1.el6.noarch
rhevm-backend-3.6.7.3-0.1.el6.noarch
rhevm-dbscripts-3.6.7.3-0.1.el6.noarch
rhevm-dependencies-3.6.0-1.el6ev.noarch
rhevm-spice-client-x86-cab-3.6-7.el6.noarch
rhevm-sdk-python-3.6.7.0-1.el6ev.noarch
rhevm-guest-agent-common-1.0.11-6.el6ev.noarch
rhevm-image-uploader-3.6.0-1.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-3.6.7.3-0.1.el6.noarch
rhevm-vmconsole-proxy-helper-3.6.7.3-0.1.el6.noarch
rhevm-reports-setup-3.6.5.1-1.el6ev.noarch
rhevm-restapi-3.6.7.3-0.1.el6.noarch
rhevm-3.6.7.3-0.1.el6.noarch
rhevm-log-collector-3.6.1-1.el6ev.noarch
rhevm-spice-client-x86-msi-3.6-7.el6.noarch
rhevm-setup-plugin-vmconsole-proxy-helper-3.6.7.3-0.1.el6.noarch
rhevm-extensions-api-impl-3.6.7.3-0.1.el6.noarch
rhevm-websocket-proxy-3.6.7.3-0.1.el6.noarch
rhevm-reports-3.6.5.1-1.el6ev.noarch
rhevm-tools-3.6.7.3-0.1.el6.noarch
rhevm-setup-plugin-ovirt-engine-common-3.6.7.3-0.1.el6.noarch
rhevm-dwh-3.6.6-1.el6ev.noarch
rhevm-userportal-3.6.7.3-0.1.el6.noarch
rhevm-spice-client-x64-msi-3.6-7.el6.noarch
rhevm-iso-uploader-3.6.0-1.el6ev.noarch
rhevm-lib-3.6.7.3-0.1.el6.noarch
rhevm-cli-3.6.2.1-1.el6ev.noarch
rhevm-setup-plugin-websocket-proxy-3.6.7.3-0.1.el6.noarch
Linux version 2.6.32-642.1.1.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC) ) #1 SMP Fri May 6 14:54:05 EDT 2016
Linux nsednev-he-2.qa.lab.tlv.redhat.com 2.6.32-642.1.1.el6.x86_64 #1 SMP Fri May 6 14:54:05 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 6.8 (Santiago)

I rebooted the host, and the agent connected to its storage domain successfully. The HE-VM started, and I could connect to it via the web UI 4 minutes 55 seconds after the agent was up.
# hosted-engine --vm-status

--== Host 1 status ==--

Status up-to-date : True
Hostname : alma03.qa.lab.tlv.redhat.com
Host ID : 1
Engine status : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "up"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : e60cc25c
Host timestamp : 323

[root@alma03 ~]# systemctl status ovirt-ha-agent -l
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2016-06-13 16:27:09 IDT; 4min 55s ago
 Main PID: 2615 (ovirt-ha-agent)
   CGroup: /system.slice/ovirt-ha-agent.service
           └─2615 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon

Jun 13 16:32:01 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2615]: INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
Jun 13 16:32:01 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2615]: INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Refreshing the storage domain
Jun 13 16:32:02 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2615]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Preparing images
Jun 13 16:32:02 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2615]: INFO:ovirt_hosted_engine_ha.lib.image.Image:Preparing images
Jun 13 16:32:02 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2615]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Reloading vm.conf from the shared storage domain
Jun 13 16:32:02 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2615]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Trying to get a fresher copy of vm configuration from the OVF_STORE
Jun 13 16:32:02 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2615]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config ERROR Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf
Jun 13 16:32:02 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2615]: WARNING:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Unable to find OVF_STORE
Jun 13 16:32:02 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2615]: ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf
Jun 13 16:32:02 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2615]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Current state EngineStarting (score: 3400)
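In the status output above the VM is up but the engine health is "bad" with "failed liveliness check", so "vm": "up" alone does not mean the engine is reachable. A minimal Python sketch for extracting the Engine status values from hosted-engine --vm-status text, assuming each value is a single-line JSON object as shown in this thread; engine_statuses is a hypothetical helper, not part of the hosted-engine CLI.

```python
import json
import re

# Assumption: "Engine status : {...}" with the JSON object on one line and no nested braces.
_ENGINE_STATUS = re.compile(r"Engine status\s*:\s*(\{.*?\})")


def engine_statuses(vm_status_text):
    """Return one parsed dict per 'Engine status' entry (one per host)."""
    return [json.loads(m.group(1)) for m in _ENGINE_STATUS.finditer(vm_status_text)]
```

A caller could then treat a host as healthy only when both status["vm"] == "up" and status["health"] == "good".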
I'm moving this bug to VERIFIED, following https://bugzilla.redhat.com/show_bug.cgi?id=1343980#c5 and because, with the components listed in https://bugzilla.redhat.com/show_bug.cgi?id=1342988#c9, the bug could not be reproduced and the ha-agent started the engine normally. The issue is not specific to any OS; it lies in the ovirt-hosted-engine-setup component and is fixed starting from ovirt-hosted-engine-setup-1.3.7.2-1.el7ev.noarch. Please feel free to reopen this bug if the issue still exists on your system.
Nikolai, please use this bug to verify the 4.0 build; the 3.6.7 result needs to be reflected in bug 1346137.
(In reply to Fabian Deutsch from comment #12)
> Nikolai, please use this bug to verify the 4.0 build, the 3.6.7 result needs
> to be reflected in bug 1346137

https://bugzilla.redhat.com/show_bug.cgi?id=1346137 is a duplicate of this very bug; both have a target release of 3.6.7. Please consider opening a clone for 4.0, as everything works on 3.6.7.
(In reply to Fabian Deutsch from comment #12)
> Nikolai, please use this bug to verify the 4.0 build, the 3.6.7 result needs
> to be reflected in bug 1346137

Please disregard comment 13; I see that you've changed the target release of this bug to 4.0.
After a successful fresh deployment of HE on the host, I rebooted the host as required by the reproduction steps. After the host booted up, I hit https://bugzilla.redhat.com/show_bug.cgi?id=1343005 and the engine failed to start. After ~9 minutes the engine recovered and started just fine. Moving this bug to VERIFIED, as it is not OS-specific and the initial root cause has been fixed.

● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Sun 2016-06-19 20:53:10 IDT; 12min ago
 Main PID: 2916 (ovirt-ha-agent)
   CGroup: /system.slice/ovirt-ha-agent.service
           └─2916 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon

Jun 19 21:04:48 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2916]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Engine vm running on localhost
Jun 19 21:04:49 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2916]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Initializing VDSM
Jun 19 21:04:54 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2916]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Connecting the storage
Jun 19 21:04:54 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2916]: INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
Jun 19 21:05:04 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2916]: INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
Jun 19 21:05:04 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2916]: INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Refreshing the storage domain
Jun 19 21:05:04 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2916]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Preparing images
Jun 19 21:05:04 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2916]: INFO:ovirt_hosted_engine_ha.lib.image.Image:Preparing images
Jun 19 21:05:09 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2916]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Reloading vm.conf from the shared storage domain
Jun 19 21:05:09 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[2916]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Trying to get a fresher copy of vm configuration from the OVF_STORE

hosted-engine --vm-status

--== Host 1 status ==--

Status up-to-date : True
Hostname : FQDNofmyhosthere
Host ID : 1
Engine status : {"health": "good", "vm": "up", "detail": "up"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : d517f01c
Host timestamp : 929
Extra metadata (valid at timestamp):
  metadata_parse_version=1
  metadata_feature_version=1
  timestamp=929 (Sun Jun 19 21:06:52 2016)
  host-id=1
  score=3400
  maintenance=False
  state=EngineUp
  stopped=False

Components on host:
ovirt-iso-uploader-4.0.0-1.el7ev.noarch
ovirt-engine-restapi-4.0.0.5-0.1.el7ev.noarch
ovirt-vmconsole-1.0.3-1.el7ev.noarch
vdsm-4.18.3-0.el7ev.x86_64
ovirt-host-deploy-1.5.0-1.el7ev.noarch
ovirt-vmconsole-proxy-1.0.3-1.el7ev.noarch
ovirt-engine-extension-aaa-jdbc-1.1.0-1.el7ev.noarch
ovirt-engine-dwh-4.0.0-2.el7ev.noarch
ovirt-engine-setup-4.0.0.5-0.1.el7ev.noarch
sanlock-3.2.4-2.el7_2.x86_64
ovirt-engine-cli-3.6.7.0-1.el7ev.noarch
ovirt-engine-websocket-proxy-4.0.0.5-0.1.el7ev.noarch
ovirt-engine-sdk-python-3.6.7.0-1.el7ev.noarch
libvirt-client-1.2.17-13.el7_2.5.x86_64
ovirt-image-uploader-4.0.0-1.el7ev.noarch
ovirt-engine-userportal-4.0.0.5-0.1.el7ev.noarch
ovirt-engine-webadmin-portal-4.0.0.5-0.1.el7ev.noarch
ovirt-engine-tools-4.0.0.5-0.1.el7ev.noarch
ovirt-engine-setup-plugin-websocket-proxy-4.0.0.5-0.1.el7ev.noarch
ovirt-engine-backend-4.0.0.5-0.1.el7ev.noarch
ovirt-engine-vmconsole-proxy-helper-4.0.0.5-0.1.el7ev.noarch
ovirt-engine-dbscripts-4.0.0.5-0.1.el7ev.noarch
ovirt-hosted-engine-setup-2.0.0.2-1.el7ev.noarch
ovirt-engine-lib-4.0.0.5-0.1.el7ev.noarch
ovirt-engine-setup-plugin-ovirt-engine-common-4.0.0.5-0.1.el7ev.noarch
ovirt-engine-extensions-api-impl-4.0.0.5-0.1.el7ev.noarch
ovirt-host-deploy-java-1.5.0-1.el7ev.noarch
ovirt-engine-dwh-setup-4.0.0-2.el7ev.noarch
ovirt-engine-setup-plugin-vmconsole-proxy-helper-4.0.0.5-0.1.el7ev.noarch
ovirt-engine-4.0.0.5-0.1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.16.x86_64
mom-0.5.4-1.el7ev.noarch
ovirt-engine-tools-backup-4.0.0.5-0.1.el7ev.noarch
ovirt-log-collector-4.0.0-1.el7ev.noarch
ovirt-engine-dashboard-1.0.0-20160615git43298a4.el7ev.x86_64
ovirt-engine-setup-plugin-ovirt-engine-4.0.0.5-0.1.el7ev.noarch
ovirt-vmconsole-host-1.0.3-1.el7ev.noarch
ovirt-hosted-engine-ha-2.0.0-1.el7ev.noarch
ovirt-setup-lib-1.0.2-1.el7ev.noarch
ovirt-engine-setup-base-4.0.0.5-0.1.el7ev.noarch
Linux version 3.10.0-327.22.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Thu Jun 9 10:09:10 EDT 2016
Linux 3.10.0-327.22.2.el7.x86_64 #1 SMP Thu Jun 9 10:09:10 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)

Engine:
rhevm-doc-4.0.0-2.el7ev.noarch
rhev-release-4.0.0-17-001.noarch
rhevm-setup-plugins-4.0.0.1-1.el7ev.noarch
rhevm-spice-client-x64-msi-4.0-2.el7ev.noarch
rhevm-branding-rhev-4.0.0-1.el7ev.noarch
rhevm-guest-agent-common-1.0.12-2.el7ev.noarch
rhevm-dependencies-4.0.0-1.el7ev.noarch
rhevm-4.0.0.5-0.1.el7ev.noarch
rhevm-spice-client-x86-msi-4.0-2.el7ev.noarch
rhev-guest-tools-iso-4.0-2.el7ev.noarch
Linux version 3.10.0-327.18.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Fri Apr 8 05:09:53 EDT 2016
Linux 3.10.0-327.18.2.el7.x86_64 #1 SMP Fri Apr 8 05:09:53 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2016-1744.html