Bug 1018849 - Timeout for monitor domain should be raised.
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-hosted-engine-ha
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.3.0
Assigned To: Greg Padgett
QA Contact: Lukas Svaty
Whiteboard: sla
Keywords: Triaged
Depends On:
Blocks:
Reported: 2013-10-14 10:07 EDT by Leonid Natapov
Modified: 2016-06-12 19:16 EDT (History)
8 users

See Also:
Fixed In Version: ovirt-hosted-engine-ha-0.1.0-0.6.beta1.el6ev
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-01-21 11:50:58 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Ha agent log (38.40 KB, text/x-log), 2013-11-05 09:39 EST, Lukas Svaty
Broker log on host (473.40 KB, text/x-log), 2013-11-05 09:40 EST, Lukas Svaty
HA agent log on host (1.29 MB, text/x-log), 2013-11-05 10:03 EST, Lukas Svaty
Broker log on host (3.19 MB, text/x-log), 2013-11-05 10:04 EST, Lukas Svaty
HA agent on host (14.90 KB, text/x-log), 2013-11-21 05:38 EST, Lukas Svaty
Broker log on host (80.29 KB, text/x-log), 2013-11-21 05:38 EST, Lukas Svaty


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 20305 None None None Never
oVirt gerrit 21273 None None None Never

Description Leonid Natapov 2013-10-14 10:07:57 EDT
It can take up to 3 minutes to acquire the lock if the host was previously fenced (or, more generally, if the host ID was not cleanly released). We must increase the timeout so that the engine VM can be started.

The exception should probably also be handled, so that no traceback is printed.

 MainThread::ERROR::2013-10-14 13:36:17,834::hosted_engine::457::HostedEngine::(_initialize_domain_monitor) Failed to start monitoring domain (sd_uuid=ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf, host_id=1): timeout during domain acquisition
 None
MainThread::WARNING::2013-10-14 13:36:17,834::hosted_engine::247::HostedEngine::(start_monitoring) Error while monitoring engine: Failed to start monitoring domain (sd_uuid=ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf, host_id=1): timeout during
domain acquisition
MainThread::WARNING::2013-10-14 13:36:17,834::hosted_engine::250::HostedEngine::(start_monitoring) Unexpected error
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 237, in start_monitoring
    self._initialize_domain_monitor()
   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 458, in _initialize_domain_monitor
     raise Exception(msg)
 Exception: Failed to start monitoring domain (sd_uuid=ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf, host_id=1): timeout during domain acquisition
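The two requested changes can be sketched together as a polling helper: raise the acquisition timeout well past the ~3 minute worst case, and return a status instead of letting an exception's traceback escape into the agent log. This is an illustrative sketch only; the function name, timeout value, and logger are assumptions, not the actual ovirt-hosted-engine-ha code.

```python
import logging
import time

log = logging.getLogger("HostedEngine")

# Lock acquisition can take up to ~3 minutes after a fence, so the
# timeout is raised well past that worst case (value is illustrative).
DOMAIN_ACQUISITION_TIMEOUT = 240  # seconds
POLL_INTERVAL = 5  # seconds

def wait_for_domain_monitor(check_status,
                            timeout=DOMAIN_ACQUISITION_TIMEOUT,
                            interval=POLL_INTERVAL):
    """Poll check_status() until the domain monitor reports ready or
    the (raised) timeout expires.  Returns True on success and False
    on timeout, so the caller can log an error instead of raising an
    exception whose traceback ends up in the agent log."""
    deadline = time.time() + timeout
    while True:
        if check_status():
            return True
        if time.time() >= deadline:
            log.error("timeout during domain acquisition")
            return False
        time.sleep(interval)
```

The caller would treat a False result as "retry on the next monitoring loop" rather than as a fatal error.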
How to reproduce:
-----
1. Hosted-engine environment with 1 host; the engine VM runs on that host.
2. Reboot the host.
3. Check that after the reboot the engine VM was created and started.
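The verification step can be sketched as a small poll loop. This is illustrative only: `hosted-engine --check-liveliness` is the real CLI used later in this bug, but the success message, retry counts, and the CHECK_CMD/INTERVAL overrides are assumptions added so the loop can be exercised without a real host.

```shell
# After rebooting the single host, wait for the HA agent to restart
# the engine VM.  CHECK_CMD/INTERVAL are overridable purely for
# illustration; by default the real liveliness check is run.
wait_for_engine() {
    tries=${1:-60}          # ~5 minutes with a 5-second interval
    cmd=${CHECK_CMD:-"hosted-engine --check-liveliness"}
    while [ "$tries" -gt 0 ]; do
        if $cmd 2>/dev/null | grep -q "Hosted Engine is up"; then
            echo "engine VM is up"
            return 0
        fi
        sleep "${INTERVAL:-5}"
        tries=$((tries - 1))
    done
    echo "engine VM did not come up" >&2
    return 1
}
```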
Comment 1 Greg Padgett 2013-10-28 09:47:28 EDT
Merged Change-Id: Ic586b1f11374632724cf71da5d3cac72eb83ca19
Comment 3 Lukas Svaty 2013-11-05 05:45:40 EST
Fixed in version?
Comment 4 Lukas Svaty 2013-11-05 05:55:43 EST
Accidentally moved to VERIFIED; moving back to ON_QA.
Comment 5 Lukas Svaty 2013-11-05 09:39:34 EST
Created attachment 819803 [details]
Ha agent log
Comment 6 Lukas Svaty 2013-11-05 09:40:21 EST
Created attachment 819804 [details]
Broker log on host
Comment 7 Lukas Svaty 2013-11-05 10:02:29 EST
After a manual reboot, once the host is powered up again, the hosted engine VM stays in Down status and is not powered up by hosted engine. Added agent.log and broker.log of this action.

# hosted-engine --check-liveliness
No handlers could be found for logger "otopi.__main__"
Hosted Engine is not up!

# hosted-engine --vm-status

--== Host 1 status ==--

Hostname                           : `host hostname`
Host ID                            : 1
Engine status                      : vm-up good-health-status
Score                              : 2400
Host timestamp                     : 1383662480
Extra metadata                     :
	metadata_parse_version=1
	metadata_feature_version=1
	timestamp=1383662480 (Tue Nov  5 15:41:20 2013)
	host-id=1
	score=2400
	bridge=True
	cpu-load=0.45875
	engine-health=vm-up good-health-status
	gateway=True
	mem-free=5810
	mem-load=0.00331548074471

On powering down the VM with the poweroff command:

#hosted_engine --vm-poweroff
Virtual machine does not exist
Comment 8 Lukas Svaty 2013-11-05 10:03:27 EST
Created attachment 819815 [details]
HA agent log on host
Comment 9 Lukas Svaty 2013-11-05 10:04:49 EST
Created attachment 819816 [details]
Broker log on host
Comment 10 Lukas Svaty 2013-11-05 10:05:22 EST
Attached the correct logs.
Comment 11 Greg Padgett 2013-11-08 17:53:01 EST
(In reply to Lukas Svaty from comment #7)
[...]
> # hosted-engine --check-liveliness
> No handlers could be found for logger "otopi.__main__"
> Hosted Engine is not up!
> 
> # hosted-engine --vm-status
> 
> --== Host 1 status ==--
> 
> Hostname                           : `host hostname`
> Host ID                            : 1
> Engine status                      : vm-up good-health-status
[...]
> #hosted_engine --vm-poweroff
> Virtual machine does not exist

I wonder if there is something else going on here based on these results, which indicate some odd behavior/inconsistency in the host's state.  I'd be interested in looking at the vdsm and libvirt logs, or if possible, in having a look at the system itself.
Comment 12 Leonid Natapov 2013-11-08 19:43:25 EST
(In reply to Lukas Svaty from comment #7)
> After manual reboot on host when host is again powered up
> Vm of hosted engine stays in down status and is not powered
> up by hosted engine. Added agent.log and broker.log of this aciton.
> 
> # hosted-engine --check-liveliness
> No handlers could be found for logger "otopi.__main__"
> Hosted Engine is not up!
> 
> # hosted-engine --vm-status
> 
> --== Host 1 status ==--
> 
> Hostname                           : `host hostname`
> Host ID                            : 1
> Engine status                      : vm-up good-health-status
> Score                              : 2400
> Host timestamp                     : 1383662480
> Extra metadata                     :
> 	metadata_parse_version=1
> 	metadata_feature_version=1
> 	timestamp=1383662480 (Tue Nov  5 15:41:20 2013)
> 	host-id=1
> 	score=2400
> 	bridge=True
> 	cpu-load=0.45875
> 	engine-health=vm-up good-health-status
> 	gateway=True
> 	mem-free=5810
> 	mem-load=0.00331548074471
> 
> on powering down vm command
> 
> #hosted_engine --vm-poweroff
> Virtual machine does not exist

Lukas, what vdsm version and ovirt-hosted-engine-ha version are you using?
Comment 13 Lukas Svaty 2013-11-10 17:50:17 EST
Leonid:

vdsm-4.13.0-0.5.beta1.el6ev.x86_64
ovirt-hosted-engine-setup-1.0.0-0.7.beta2.el6ev.noarch
ovirt-hosted-engine-ha-0.1.0-0.4.beta1.el6ev.noarch


I just realized this was tested on an already-created self-hosted engine that was updated to the new version (ovirt-hosted-engine-ha-0.1.0-0.4). Could this be an issue?
Is an upgrade of the HA package from version 0.1.0-0.3 to 0.1.0-0.4 supported?
Comment 15 Greg Padgett 2013-11-18 10:18:15 EST
Merged Change-Id: I6e0c0dbeb50c2181b29565f0d933ad56ec05bb7b
Comment 16 Lukas Svaty 2013-11-21 05:37:20 EST
The VM did not start after the host reboot; moving back to ASSIGNED.

Host after reboot:

[root@slot-5 ~]# hosted-engine --check-liveliness
No handlers could be found for logger "otopi.__main__"
Hosted Engine is not up!
[root@slot-5 ~]# hosted-engine --vm-status


--== Host 1 status ==--

Status up-to-date                  : False
Hostname                           : slot-5.rhev.lab.eng.brq.redhat.com
Host ID                            : 1
Engine status                      : unknown stale-data
Score                              : 2400
Host timestamp                     : 1385029760
Extra metadata (valid at timestamp):
	metadata_parse_version=1
	metadata_feature_version=1
	timestamp=1385029760 (Thu Nov 21 11:29:20 2013)
	host-id=1
	score=2400
	bridge=True
	cpu-load=0.01
	engine-health=vm-down
	gateway=True
	mem-free=11535
	mem-load=0.000252228014125

Package versions:
vdsm-4.13.0-0.9.beta1.el6ev.x86_64
libvirt-0.10.2-29.el6.x86_64
ovirt-hosted-engine-ha-0.1.0-0.6.beta1.el6ev.noarch
ovirt-hosted-engine-setup-1.0.0-0.9.beta4.el6ev.noarch
ovirt-host-deploy-1.1.1-1.el6ev.noarch

Additional:
host logs: broker.log agent.log

Info: the VM started after running # hosted-engine --vm-start
Comment 17 Lukas Svaty 2013-11-21 05:38:02 EST
Created attachment 827121 [details]
HA agent on host
Comment 18 Lukas Svaty 2013-11-21 05:38:35 EST
Created attachment 827122 [details]
Broker log on host
Comment 20 Charlie 2013-11-27 20:41:33 EST
This bug is currently attached to errata RHEA-2013:15591. If this change is not to be documented in the text for this errata, please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:

https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes 

Thanks in advance.
Comment 21 Greg Padgett 2013-12-06 12:59:26 EST
ovirt-hosted-engine-ha is a new package; it does not need errata text for bugs found during its development.
Comment 22 errata-xmlrpc 2014-01-21 11:50:58 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0080.html
