Bug 1018849 - Timeout for monitor domain should be raised.
Summary: Timeout for monitor domain should be raised.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-hosted-engine-ha
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.3.0
Assignee: Greg Padgett
QA Contact: Lukas Svaty
URL:
Whiteboard: sla
Depends On:
Blocks:
 
Reported: 2013-10-14 14:07 UTC by Leonid Natapov
Modified: 2016-06-12 23:16 UTC (History)

Fixed In Version: ovirt-hosted-engine-ha-0.1.0-0.6.beta1.el6ev
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-01-21 16:50:58 UTC
oVirt Team: SLA
Target Upstream Version:
Embargoed:


Attachments
Ha agent log (38.40 KB, text/x-log)
2013-11-05 14:39 UTC, Lukas Svaty
Broker log on host (473.40 KB, text/x-log)
2013-11-05 14:40 UTC, Lukas Svaty
HA agent log on host (1.29 MB, text/x-log)
2013-11-05 15:03 UTC, Lukas Svaty
Broker log on host (3.19 MB, text/x-log)
2013-11-05 15:04 UTC, Lukas Svaty
HA agent on host (14.90 KB, text/x-log)
2013-11-21 10:38 UTC, Lukas Svaty
Broker log on host (80.29 KB, text/x-log)
2013-11-21 10:38 UTC, Lukas Svaty


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2014:0080 0 normal SHIPPED_LIVE new package: ovirt-hosted-engine-ha 2014-01-21 21:00:07 UTC
oVirt gerrit 20305 0 None MERGED agent: extend timeout for domain monitor acquisition 2020-12-02 03:25:03 UTC
oVirt gerrit 21273 0 None MERGED agent: fixes for startup 2020-12-02 03:25:04 UTC

Description Leonid Natapov 2013-10-14 14:07:57 UTC
It can take up to 3 minutes to acquire the lock if the host was previously fenced (or, more generally, if the host ID wasn't cleanly released). The timeout must be increased so that the engine VM can be started.

The exception should probably also be handled so that no traceback is logged.

MainThread::ERROR::2013-10-14 13:36:17,834::hosted_engine::457::HostedEngine::(_initialize_domain_monitor) Failed to start monitoring domain (sd_uuid=ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf, host_id=1): timeout during domain acquisition
None
MainThread::WARNING::2013-10-14 13:36:17,834::hosted_engine::247::HostedEngine::(start_monitoring) Error while monitoring engine: Failed to start monitoring domain (sd_uuid=ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf, host_id=1): timeout during domain acquisition
MainThread::WARNING::2013-10-14 13:36:17,834::hosted_engine::250::HostedEngine::(start_monitoring) Unexpected error
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 237, in start_monitoring
    self._initialize_domain_monitor()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 458, in _initialize_domain_monitor
    raise Exception(msg)
Exception: Failed to start monitoring domain (sd_uuid=ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf, host_id=1): timeout during domain acquisition

How to reproduce:
-----
Hosted engine environment with 1 host.
The engine VM runs on that host.
Reboot the host.
Check whether the engine VM was created and started after the reboot.
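
The request in the description (a longer acquisition timeout, and no traceback when it expires) could look roughly like the sketch below. This is not the code from the merged change: `wait_for_domain_acquisition`, `start_monitoring`, `DomainAcquisitionTimeout` and the `is_acquired` callable are hypothetical names used for illustration, and the 180-second default is only taken from the "up to 3 minutes" figure above.

```python
import time


class DomainAcquisitionTimeout(Exception):
    """Raised when the domain lock is not acquired within the timeout."""


def wait_for_domain_acquisition(is_acquired, timeout=180.0, interval=5.0,
                                clock=time.monotonic, sleep=time.sleep):
    """Poll is_acquired() until it returns True or `timeout` seconds elapse.

    is_acquired is a hypothetical callable standing in for whatever status
    query the agent really uses. The 180 s default is an assumption based on
    the "up to 3 minutes" mentioned in the report.
    """
    deadline = clock() + timeout
    while True:
        if is_acquired():
            return
        if clock() >= deadline:
            raise DomainAcquisitionTimeout(
                "timeout during domain acquisition after %.0f s" % timeout)
        sleep(interval)


def start_monitoring(is_acquired, log, **kwargs):
    """Return True on success.

    An expected timeout is logged as a single error line instead of being
    allowed to propagate and produce a traceback.
    """
    try:
        wait_for_domain_acquisition(is_acquired, **kwargs)
    except DomainAcquisitionTimeout as exc:
        log.error("Failed to start monitoring domain: %s", exc)
        return False
    return True
```

The `clock` and `sleep` parameters exist only so the loop can be exercised without real waiting; a caller would normally leave them at their defaults and retry `start_monitoring` later if it returns False.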

Comment 1 Greg Padgett 2013-10-28 13:47:28 UTC
Merged Change-Id: Ic586b1f11374632724cf71da5d3cac72eb83ca19

Comment 3 Lukas Svaty 2013-11-05 10:45:40 UTC
Fixed in version?

Comment 4 Lukas Svaty 2013-11-05 10:55:43 UTC
Accidentally moved to VERIFIED; moving back to ON_QA.

Comment 5 Lukas Svaty 2013-11-05 14:39:34 UTC
Created attachment 819803 [details]
Ha agent log

Comment 6 Lukas Svaty 2013-11-05 14:40:21 UTC
Created attachment 819804 [details]
Broker log on host

Comment 7 Lukas Svaty 2013-11-05 15:02:29 UTC
After a manual reboot of the host, when the host is powered up again,
the hosted engine VM stays down and is not powered up by the
hosted engine. Added agent.log and broker.log of this action.

# hosted-engine --check-liveliness
No handlers could be found for logger "otopi.__main__"
Hosted Engine is not up!

# hosted-engine --vm-status

--== Host 1 status ==--

Hostname                           : `host hostname`
Host ID                            : 1
Engine status                      : vm-up good-health-status
Score                              : 2400
Host timestamp                     : 1383662480
Extra metadata                     :
	metadata_parse_version=1
	metadata_feature_version=1
	timestamp=1383662480 (Tue Nov  5 15:41:20 2013)
	host-id=1
	score=2400
	bridge=True
	cpu-load=0.45875
	engine-health=vm-up good-health-status
	gateway=True
	mem-free=5810
	mem-load=0.00331548074471

When running the VM power-off command:

#hosted_engine --vm-poweroff
Virtual machine does not exist

Comment 8 Lukas Svaty 2013-11-05 15:03:27 UTC
Created attachment 819815 [details]
HA agent log on host

Comment 9 Lukas Svaty 2013-11-05 15:04:49 UTC
Created attachment 819816 [details]
Broker log on host

Comment 10 Lukas Svaty 2013-11-05 15:05:22 UTC
Attached the correct logs.

Comment 11 Greg Padgett 2013-11-08 22:53:01 UTC
(In reply to Lukas Svaty from comment #7)
[...]
> # hosted-engine --check-liveliness
> No handlers could be found for logger "otopi.__main__"
> Hosted Engine is not up!
> 
> # hosted-engine --vm-status
> 
> --== Host 1 status ==--
> 
> Hostname                           : `host hostname`
> Host ID                            : 1
> Engine status                      : vm-up good-health-status
[...]
> #hosted_engine --vm-poweroff
> Virtual machine does not exist

I wonder if there is something else going on here based on these results, which indicate some odd behavior/inconsistency in the host's state.  I'd be interested in looking at the vdsm and libvirt logs, or if possible, in having a look at the system itself.
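
The inconsistency noted here (the liveliness check says the engine is not up while the status output says vm-up) can be checked mechanically. Below is a minimal sketch, assuming the status text has the "Key : value" layout with indented "key=value" metadata quoted in comment 7. `parse_vm_status` and `status_conflicts` are hypothetical helpers, not part of the hosted-engine tooling.

```python
def parse_vm_status(text):
    """Split status text into top-level fields and indented metadata.

    Assumes the layout quoted in this bug: "Key   : value" lines at column 0
    and "key=value" lines indented with whitespace under "Extra metadata".
    """
    fields, metadata = {}, {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("--=="):
            continue  # skip blank lines and the "--== Host N status ==--" banner
        if line[:1] in (" ", "\t") and "=" in stripped:
            key, _, value = stripped.partition("=")
            metadata[key] = value
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            fields[key.strip()] = value.strip()
    return fields, metadata


def status_conflicts(fields, engine_is_live):
    """Return True when the reported engine status disagrees with liveliness.

    engine_is_live is the caller's own result of a liveliness check.
    """
    reported_up = fields.get("Engine status", "").startswith("vm-up")
    return reported_up != engine_is_live
```

With the output from comment 7, the "Engine status" field reads "vm-up good-health-status" while the liveliness check failed, so `status_conflicts(fields, False)` would flag the mismatch.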

Comment 12 Leonid Natapov 2013-11-09 00:43:25 UTC
(In reply to Lukas Svaty from comment #7)
> After a manual reboot of the host, when the host is powered up again,
> the hosted engine VM stays down and is not powered up by the
> hosted engine. Added agent.log and broker.log of this action.
> 
> # hosted-engine --check-liveliness
> No handlers could be found for logger "otopi.__main__"
> Hosted Engine is not up!
> 
> # hosted-engine --vm-status
> 
> --== Host 1 status ==--
> 
> Hostname                           : `host hostname`
> Host ID                            : 1
> Engine status                      : vm-up good-health-status
> Score                              : 2400
> Host timestamp                     : 1383662480
> Extra metadata                     :
> 	metadata_parse_version=1
> 	metadata_feature_version=1
> 	timestamp=1383662480 (Tue Nov  5 15:41:20 2013)
> 	host-id=1
> 	score=2400
> 	bridge=True
> 	cpu-load=0.45875
> 	engine-health=vm-up good-health-status
> 	gateway=True
> 	mem-free=5810
> 	mem-load=0.00331548074471
> 
> When running the VM power-off command:
> 
> #hosted_engine --vm-poweroff
> Virtual machine does not exist

Lukas, which vdsm and ovirt-hosted-engine-ha versions are you using?

Comment 13 Lukas Svaty 2013-11-10 22:50:17 UTC
Leonid:

vdsm-4.13.0-0.5.beta1.el6ev.x86_64
ovirt-hosted-engine-setup-1.0.0-0.7.beta2.el6ev.noarch
ovirt-hosted-engine-ha-0.1.0-0.4.beta1.el6ev.noarch


I just realized this was tested on an existing self-hosted engine that was updated to the new version (ovirt-hosted-engine-ha-0.1.0-0.4). Could this be the issue?
Is an upgrade of the HA package from version 0.1.0-0.3 to 0.1.0-0.4 supported?

Comment 15 Greg Padgett 2013-11-18 15:18:15 UTC
Merged Change-Id: I6e0c0dbeb50c2181b29565f0d933ad56ec05bb7b

Comment 16 Lukas Svaty 2013-11-21 10:37:20 UTC
The VM did not start after the host reboot; moving back to ASSIGNED.

Host after reboot:

[root@slot-5 ~]# hosted-engine --check-liveliness
No handlers could be found for logger "otopi.__main__"
Hosted Engine is not up!
[root@slot-5 ~]# hosted-engine --vm-status


--== Host 1 status ==--

Status up-to-date                  : False
Hostname                           : slot-5.rhev.lab.eng.brq.redhat.com
Host ID                            : 1
Engine status                      : unknown stale-data
Score                              : 2400
Host timestamp                     : 1385029760
Extra metadata (valid at timestamp):
	metadata_parse_version=1
	metadata_feature_version=1
	timestamp=1385029760 (Thu Nov 21 11:29:20 2013)
	host-id=1
	score=2400
	bridge=True
	cpu-load=0.01
	engine-health=vm-down
	gateway=True
	mem-free=11535
	mem-load=0.000252228014125

Package versions:
vdsm-4.13.0-0.9.beta1.el6ev.x86_64
libvirt-0.10.2-29.el6.x86_64
ovirt-hosted-engine-ha-0.1.0-0.6.beta1.el6ev.noarch
ovirt-hosted-engine-setup-1.0.0-0.9.beta4.el6ev.noarch
ovirt-host-deploy-1.1.1-1.el6ev.noarch

Additional:
Host logs: broker.log, agent.log

Info: the VM started after running # hosted-engine --vm-start

Comment 17 Lukas Svaty 2013-11-21 10:38:02 UTC
Created attachment 827121 [details]
HA agent on host

Comment 18 Lukas Svaty 2013-11-21 10:38:35 UTC
Created attachment 827122 [details]
Broker log on host

Comment 20 Charlie 2013-11-28 01:41:33 UTC
This bug is currently attached to errata RHEA-2013:15591. If this change is not to be documented in the text for this errata please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:

https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes 

Thanks in advance.

Comment 21 Greg Padgett 2013-12-06 17:59:26 UTC
ovirt-hosted-engine-ha is a new package; does not need errata for bugs during its development.

Comment 22 errata-xmlrpc 2014-01-21 16:50:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0080.html

