Bug 1019229 - ovirt-hosted-engine | Engine VM is not reported by vdsm although it's up and running.
Summary: ovirt-hosted-engine | Engine VM is not reported by vdsm although it's up and ...
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: unspecified
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
: 3.3.0
Assignee: Greg Padgett
QA Contact: Ilanit Stein
URL:
Whiteboard: sla
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2013-10-15 10:36 UTC by Leonid Natapov
Modified: 2016-06-12 23:16 UTC (History)
12 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-11-28 15:37:00 UTC
oVirt Team: SLA
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
vdsm log file (143.79 KB, application/x-gzip)
2013-10-15 10:36 UTC, Leonid Natapov
no flags Details
vdsm.log.3.xz (537.21 KB, application/x-xz)
2013-10-15 15:53 UTC, Federico Simoncelli
no flags Details

Description Leonid Natapov 2013-10-15 10:36:53 UTC
Created attachment 812419 [details]
vdsm log file

Description of problem:

Engine VM is not reported by vdsm although it's up and running.


Scenario:
1.hosted engine environment with 2 hosts.
2.engine vm runs on 1 host and the second host is unavailable.
3.block connection to storage from the 1 host,so vm should start on the second host but it failed because the second host is unavailable. After resuming connection on the first host I can see that engine vm is up and running (i can connect to it) but according to vdsm it's not exist.

Here is the sanlock status output: 
------------------------------------------------------------------
[root@purple-vds3 vdsm]# sanlock status
daemon 6c067f1a-1056-4314-b825-7f2bf73d3b41.purple-vds
p -1 helper
p -1 listener
p 28340 HostedEngine
p 19763 
p -1 status
s ec7d5406-2cb9-472c-b7e7-2ad6fedf9e3a:1:/dev/ec7d5406-2cb9-472c-b7e7-2ad6fedf9e3a/ids:0
s ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf:1:/rhev/data-center/mnt/orion.qa.lab.tlv.redhat.com\:_export_hosted__engine1/ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf/dom_md/ids:0
s hosted-engine:1:/rhev/data-center/mnt/orion.qa.lab.tlv.redhat.com\:_export_hosted__engine1/ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf/ha_agent/hosted-engine.lockspace:0
r ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf:586a0e62-5f38-4e93-848f-57588e128261:/rhev/data-center/mnt/orion.qa.lab.tlv.redhat.com\:_export_hosted__engine1/ef4a31eb-688f-4f19-af9f-6a3e0bf82ebf/images/b64e7d62-c3cb-44d1-b280-f4e1318720fe/586a0e62-5f38-4e93-848f-57588e128261.lease:0:20 p 28340
------------------------------------------------------------------
vdsm version: vdsm-4.13.0-13.gitc08fbb6.el6_4.x86_64
vdsm log attached.

Comment 1 Ayal Baron 2013-10-15 11:28:18 UTC
what does vdsClient getVmsList report?

Comment 2 Leonid Natapov 2013-10-15 11:36:23 UTC
vdsClient list shows nothing.
vdsClient getVmsList can only be used with the pool and run on SPM. I have no pool.

Comment 3 Federico Simoncelli 2013-10-15 15:53:32 UTC
Created attachment 812604 [details]
vdsm.log.3.xz

Attaching some additional logs.
At first glance this looks like a race between vmCreate and vmDestroy:

10:33:44,290::BindingXMLRPC::984::vds::(wrapper) client [127.0.0.1]::call vmCreate with ({'emulatedMachine':
10:33:47,529::BindingXMLRPC::984::vds::(wrapper) client [10.35.102.57]::call vmDestroy with ('ac687a24-05cd-
11:07:51,461::BindingXMLRPC::984::vds::(wrapper) client [127.0.0.1]::call vmCreate with ({'emulatedMachine':
11:07:54,173::BindingXMLRPC::984::vds::(wrapper) client [10.35.102.57]::call vmDestroy with ('ac687a24-05cd-
11:20:00,162::BindingXMLRPC::984::vds::(wrapper) client [127.0.0.1]::call vmCreate with ({'emulatedMachine':
11:20:11,626::BindingXMLRPC::984::vds::(wrapper) client [10.35.109.10]::call vmDestroy with ('ac687a24-05cd-
11:20:12,928::BindingXMLRPC::984::vds::(wrapper) client [127.0.0.1]::call vmCreate with ({'emulatedMachine':
11:20:23,653::BindingXMLRPC::984::vds::(wrapper) client [10.35.109.10]::call vmDestroy with ('ac687a24-05cd-
11:20:24,944::BindingXMLRPC::984::vds::(wrapper) client [127.0.0.1]::call vmCreate with ({'emulatedMachine':
11:20:36,199::BindingXMLRPC::984::vds::(wrapper) client [10.35.109.10]::call vmDestroy with ('ac687a24-05cd-
11:20:37,551::BindingXMLRPC::984::vds::(wrapper) client [127.0.0.1]::call vmCreate with ({'emulatedMachine':
11:20:48,235::BindingXMLRPC::984::vds::(wrapper) client [10.35.109.10]::call vmDestroy with ('ac687a24-05cd-
11:20:49,522::BindingXMLRPC::984::vds::(wrapper) client [127.0.0.1]::call vmCreate with ({'emulatedMachine':
11:21:00,222::BindingXMLRPC::984::vds::(wrapper) client [10.35.109.10]::call vmDestroy with ('ac687a24-05cd-
11:21:01,533::BindingXMLRPC::984::vds::(wrapper) client [127.0.0.1]::call vmCreate with ({'emulatedMachine':
11:21:12,199::BindingXMLRPC::984::vds::(wrapper) client [10.35.109.10]::call vmDestroy with ('ac687a24-05cd-
11:21:13,480::BindingXMLRPC::984::vds::(wrapper) client [127.0.0.1]::call vmCreate with ({'emulatedMachine':
11:21:24,739::BindingXMLRPC::984::vds::(wrapper) client [10.35.109.10]::call vmDestroy with ('ac687a24-05cd-
11:21:26,022::BindingXMLRPC::984::vds::(wrapper) client [127.0.0.1]::call vmCreate with ({'emulatedMachine':

Thread-65601::ERROR::2013-10-15 11:21:22,551::vm::2132::vm.Vm::(_startUnderlyingVm) vmId=`ac687a24-05cd-43fb-9f9d-c53169fa8cd0`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 2092, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/vm.py", line 2959, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 76, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2665, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: Requested operation is not valid: domain is already active as 'HostedEngine'

11:21:37,233::BindingXMLRPC::984::vds::(wrapper) client [10.35.109.10]::call vmDestroy with ('ac687a24-05cd-
11:21:38,538::BindingXMLRPC::984::vds::(wrapper) client [127.0.0.1]::call vmCreate with ({'emulatedMachine':
11:21:49,542::BindingXMLRPC::984::vds::(wrapper) client [10.35.109.10]::call vmDestroy with ('ac687a24-05cd-
11:21:50,831::BindingXMLRPC::984::vds::(wrapper) client [127.0.0.1]::call vmCreate with ({'emulatedMachine':
11:22:01,652::BindingXMLRPC::984::vds::(wrapper) client [10.35.109.10]::call vmDestroy with ('ac687a24-05cd-
11:22:02,975::BindingXMLRPC::984::vds::(wrapper) client [127.0.0.1]::call vmCreate with ({'emulatedMachine':
11:22:13,848::BindingXMLRPC::984::vds::(wrapper) client [10.35.109.10]::call vmDestroy with ('ac687a24-05cd-
11:22:15,168::BindingXMLRPC::984::vds::(wrapper) client [127.0.0.1]::call vmCreate with ({'emulatedMachine':
11:22:25,986::BindingXMLRPC::984::vds::(wrapper) client [10.35.109.10]::call vmDestroy with ('ac687a24-05cd-
11:22:27,313::BindingXMLRPC::984::vds::(wrapper) client [127.0.0.1]::call vmCreate with ({'emulatedMachine':
11:22:38,357::BindingXMLRPC::984::vds::(wrapper) client [10.35.109.10]::call vmDestroy with ('ac687a24-05cd-
11:22:39,771::BindingXMLRPC::984::vds::(wrapper) client [127.0.0.1]::call vmCreate with ({'emulatedMachine':
11:22:50,659::BindingXMLRPC::984::vds::(wrapper) client [10.35.109.10]::call vmDestroy with ('ac687a24-05cd-
11:22:52,039::BindingXMLRPC::984::vds::(wrapper) client [127.0.0.1]::call vmCreate with ({'emulatedMachine':
11:23:02,789::BindingXMLRPC::984::vds::(wrapper) client [10.35.109.10]::call vmDestroy with ('ac687a24-05cd-
11:23:04,095::BindingXMLRPC::984::vds::(wrapper) client [127.0.0.1]::call vmCreate with ({'emulatedMachine':
11:23:15,140::BindingXMLRPC::984::vds::(wrapper) client [10.35.109.10]::call vmDestroy with ('ac687a24-05cd-
11:23:16,511::BindingXMLRPC::984::vds::(wrapper) client [127.0.0.1]::call vmCreate with ({'emulatedMachine':
11:23:27,233::BindingXMLRPC::984::vds::(wrapper) client [10.35.109.10]::call vmDestroy with ('ac687a24-05cd-
11:23:28,556::BindingXMLRPC::984::vds::(wrapper) client [127.0.0.1]::call vmCreate with ({'emulatedMachine':
11:23:39,230::BindingXMLRPC::984::vds::(wrapper) client [10.35.109.10]::call vmDestroy with ('ac687a24-05cd-
11:23:40,521::BindingXMLRPC::984::vds::(wrapper) client [127.0.0.1]::call vmCreate with ({'emulatedMachine':
11:23:51,268::BindingXMLRPC::984::vds::(wrapper) client [10.35.109.10]::call vmDestroy with ('ac687a24-05cd-
11:23:52,646::BindingXMLRPC::984::vds::(wrapper) client [127.0.0.1]::call vmCreate with ({'emulatedMachine':
12:46:07,337::BindingXMLRPC::984::vds::(wrapper) client [10.35.102.57]::call vmDestroy with ('ac687a24-05cd-

Beside fixing this issue in vdsm we should ask ourselves why the he-agent is issuing these commands.

Comment 4 Leonid Natapov 2013-10-15 19:32:18 UTC
After discussion with Greg another bug was opened regarding these messages.
https://bugzilla.redhat.com/show_bug.cgi?id=1019468

Comment 7 Michal Skrivanek 2013-11-13 15:12:00 UTC
from virt point of view the VM was destroyed by
Thread-69404::DEBUG::2013-10-15 12:46:07,337::BindingXMLRPC::984::vds::(wrapper) client [10.35.102.57]::call vmDestroy …
and since then it doesn't exist. If someone somewhere somehow started it again without letting vdsm know then it must have been the ha_agent?

Comment 8 Michal Skrivanek 2013-11-13 15:13:16 UTC
any other logs? from ha_agent perhaps?
only vdsm is not really helpful, I'm missing the time references and what was going on

Comment 9 Leonid Natapov 2013-11-13 15:58:28 UTC
(In reply to Michal Skrivanek from comment #8)
> any other logs? from ha_agent perhaps?
> only vdsm is not really helpful, I'm missing the time references and what
> was going on

No :( it was a month ago when the bug was opened. And Federico asked vdsm logs,so I attached them.

Comment 10 Doron Fediuck 2013-11-18 12:20:26 UTC
Greg & Leonid,
let's revisit this issue;
See if still reproducible. If so who creates / destroys the VM.

Comment 11 Greg Padgett 2013-11-19 00:25:55 UTC
I tried to reproduce it with the latest packages and everything looks good.  I hope Leonid can confirm.

(As an aside, I wonder if this is related to bug 1022924.  It seems to have many similarities.)

Comment 12 Leonid Natapov 2013-11-24 13:21:09 UTC
Indeed. I was unable to reproduce it.

Comment 13 Doron Fediuck 2013-11-28 15:37:00 UTC
Please re-open if you're able to reproduce.


Note You need to log in before you can comment on or make changes to this bug.