It might take up to 3 minutes to acquire the lock if the host was previously fenced (or, more generally, if the host id wasn't cleanly released). We must increase the timeout in order to be able to start the engine VM. The exception should probably also be handled so that no traceback is logged.

MainThread::ERROR::2013-10-14 13:36:17,834::hosted_engine::457::HostedEngine::(_initialize_domain_monitor) Failed to start monitoring domain (sd_uuid=ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf, host_id=1): timeout during domain acquisition
None
MainThread::WARNING::2013-10-14 13:36:17,834::hosted_engine::247::HostedEngine::(start_monitoring) Error while monitoring engine: Failed to start monitoring domain (sd_uuid=ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf, host_id=1): timeout during domain acquisition
MainThread::WARNING::2013-10-14 13:36:17,834::hosted_engine::250::HostedEngine::(start_monitoring) Unexpected error
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 237, in start_monitoring
    self._initialize_domain_monitor()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 458, in _initialize_domain_monitor
    raise Exception(msg)
Exception: Failed to start monitoring domain (sd_uuid=ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf, host_id=1): timeout during domain acquisition

How to reproduce:
-----
Hosted engine environment with 1 host; the engine VM runs on that host.
Reboot the host.
Check that after the reboot the engine VM was created and started.
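The two changes suggested in the description (a longer acquisition timeout, and handling the failure without a traceback) could look roughly like the sketch below. This is not the actual ovirt-hosted-engine-ha code: the names wait_for_domain, DomainAcquisitionTimeout and the is_acquired callable are hypothetical, the 240-second value is only an assumed margin above the 3 minutes mentioned above, and the sketch targets current Python rather than the Python 2.6 shown in the traceback.

```python
import logging
import time

log = logging.getLogger("HostedEngine")

# Assumption: acquisition may take up to ~3 minutes, so leave some margin.
ACQUIRE_TIMEOUT_SEC = 240
POLL_INTERVAL_SEC = 5


class DomainAcquisitionTimeout(Exception):
    """Hypothetical error: the domain lock was not acquired in time."""


def wait_for_domain(is_acquired, timeout=ACQUIRE_TIMEOUT_SEC,
                    interval=POLL_INTERVAL_SEC,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll is_acquired() until it returns True or the timeout expires."""
    deadline = clock() + timeout
    while True:
        if is_acquired():
            return
        if clock() >= deadline:
            raise DomainAcquisitionTimeout("timeout during domain acquisition")
        sleep(interval)


def start_monitoring(is_acquired, **kwargs):
    """Return True on success; log a one-line error (no traceback) on timeout."""
    try:
        wait_for_domain(is_acquired, **kwargs)
    except DomainAcquisitionTimeout as exc:
        # log.error (not log.exception) keeps the traceback out of the log.
        log.error("Failed to start monitoring domain: %s", exc)
        return False
    return True
```

The clock and sleep parameters exist only so the wait loop can be exercised without real delays; a caller would normally leave them at their defaults and retry later when start_monitoring returns False.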
Merged Change-Id: Ic586b1f11374632724cf71da5d3cac72eb83ca19
Fixed in which version?
Transferred accidentally to VERIFIED; moving back to ON_QA.
Created attachment 819803 [details] Ha agent log
Created attachment 819804 [details] Broker log on host
After a manual reboot of the host, once the host is powered up again, the hosted engine VM stays in down status and is not powered up by the hosted engine. Added agent.log and broker.log of this action.

# hosted-engine --check-liveliness
No handlers could be found for logger "otopi.__main__"
Hosted Engine is not up!

# hosted-engine --vm-status

--== Host 1 status ==--

Hostname       : `host hostname`
Host ID        : 1
Engine status  : vm-up good-health-status
Score          : 2400
Host timestamp : 1383662480
Extra metadata :
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=1383662480 (Tue Nov 5 15:41:20 2013)
    host-id=1
    score=2400
    bridge=True
    cpu-load=0.45875
    engine-health=vm-up good-health-status
    gateway=True
    mem-free=5810
    mem-load=0.00331548074471

On powering down the VM:

#hosted_engine --vm-poweroff
Virtual machine does not exist
Created attachment 819815 [details] HA agent log on host
Created attachment 819816 [details] Broker log on host
Attached the correct logs.
(In reply to Lukas Svaty from comment #7)
[...]
> # hosted-engine --check-liveliness
> No handlers could be found for logger "otopi.__main__"
> Hosted Engine is not up!
>
> # hosted-engine --vm-status
>
> --== Host 1 status ==--
>
> Hostname : `host hostname`
> Host ID : 1
> Engine status : vm-up good-health-status
[...]
> #hosted_engine --vm-poweroff
> Virtual machine does not exist

Based on these results, which indicate some odd behavior or inconsistency in the host's state, I wonder if there is something else going on here. I'd be interested in looking at the vdsm and libvirt logs or, if possible, at the system itself.
(In reply to Lukas Svaty from comment #7)
> After manual reboot on host when host is again powered up
> Vm of hosted engine stays in down status and is not powered
> up by hosted engine. Added agent.log and broker.log of this action.
>
> # hosted-engine --check-liveliness
> No handlers could be found for logger "otopi.__main__"
> Hosted Engine is not up!
>
> # hosted-engine --vm-status
>
> --== Host 1 status ==--
>
> Hostname : `host hostname`
> Host ID : 1
> Engine status : vm-up good-health-status
> Score : 2400
> Host timestamp : 1383662480
> Extra metadata :
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=1383662480 (Tue Nov 5 15:41:20 2013)
> host-id=1
> score=2400
> bridge=True
> cpu-load=0.45875
> engine-health=vm-up good-health-status
> gateway=True
> mem-free=5810
> mem-load=0.00331548074471
>
> on powering down vm command
>
> #hosted_engine --vm-poweroff
> Virtual machine does not exist

Lukas, what vdsm version and ovirt-hosted-engine-ha version are you using?
Leonid:
vdsm-4.13.0-0.5.beta1.el6ev.x86_64
ovirt-hosted-engine-setup-1.0.0-0.7.beta2.el6ev.noarch
ovirt-hosted-engine-ha-0.1.0-0.4.beta1.el6ev.noarch

I just realized this was tested on an already created self-hosted engine which was then updated to the new version (ovirt-hosted-engine-ha-0.1.0-0.4). Could this be an issue? Is an upgrade of the HA package from version 0.1.0-0.3 to 0.1.0-0.4 supported?
Merged Change-Id: I6e0c0dbeb50c2181b29565f0d933ad56ec05bb7b
VM did not start after host reboot; moving back to ASSIGNED.

Host after reboot:

[root@slot-5 ~]# hosted-engine --check-liveliness
No handlers could be found for logger "otopi.__main__"
Hosted Engine is not up!

[root@slot-5 ~]# hosted-engine --vm-status

--== Host 1 status ==--

Status up-to-date : False
Hostname          : slot-5.rhev.lab.eng.brq.redhat.com
Host ID           : 1
Engine status     : unknown stale-data
Score             : 2400
Host timestamp    : 1385029760
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=1385029760 (Thu Nov 21 11:29:20 2013)
    host-id=1
    score=2400
    bridge=True
    cpu-load=0.01
    engine-health=vm-down
    gateway=True
    mem-free=11535
    mem-load=0.000252228014125

Package versions:
vdsm-4.13.0-0.9.beta1.el6ev.x86_64
libvirt-0.10.2-29.el6.x86_64
ovirt-hosted-engine-ha-0.1.0-0.6.beta1.el6ev.noarch
ovirt-hosted-engine-setup-1.0.0-0.9.beta4.el6ev.noarch
ovirt-host-deploy-1.1.1-1.el6ev.noarch

Additional:
host logs: broker.log, agent.log
info: VM started after # hosted-engine --vm-start
Created attachment 827121 [details] HA agent on host
Created attachment 827122 [details] Broker log on host
This bug is currently attached to errata RHEA-2013:15591. If this change is not to be documented in the text for this errata please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:

https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes

Thanks in advance.
ovirt-hosted-engine-ha is a new package; it does not need errata text for bugs found during its development.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2014-0080.html