Bug 907877 - vdsm: we are re-running vm that raised libvirt error domain is already active (no exception raised by vdsm to engine)
Summary: vdsm: we are re-running vm that raised libvirt error domain is already active...
Keywords:
Status: CLOSED DUPLICATE of bug 907972
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.2.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.2.0
Assignee: Nobody's working on this, feel free to take it
QA Contact:
URL:
Whiteboard: virt
Depends On:
Blocks:
 
Reported: 2013-02-05 13:03 UTC by Dafna Ron
Modified: 2014-01-01 08:42 UTC
6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-02-14 12:38:17 UTC
oVirt Team: ---
Target Upstream Version:
Embargoed:


Attachments
logs (550.85 KB, application/x-gzip)
2013-02-05 13:03 UTC, Dafna Ron

Description Dafna Ron 2013-02-05 13:03:07 UTC
Created attachment 693360
logs

Description of problem:

I had a VM that was stuck in Wait for Launch, so after a minute I decided to power off the VM and restart it.
After I powered off the VM and restarted it, it failed to run on the same host again and was re-run on the second host.
Looking at the error in vdsm, libvirt failed to start the VM because the domain is already up in libvirt.
However, since no specific error was raised to the engine, the engine restarts the VM on the second host.

The event is already listed in the event log:
VM NNNNN is down. Exit message: Requested operation is not valid: domain is already active as 'NNNNN'.

However, I cannot see any exception that would prevent the engine from re-running the VM.

Version-Release number of selected component (if applicable):

sf5
vdsm-4.10.2-5.0.el6ev.x86_64
libvirt-0.10.2-18.el6.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Create a VM and run it.
2. Suspend the VM.
3. Create a live snapshot while the VM is suspended.
4. Once the snapshot is created, resume the VM.
5. Power off the VM.
6. Run the VM again.
7. The VM will be stuck in Wait for Launch -> power it off.
8. Try to start the VM again.
  
Actual results:

The VM is re-run on a second host even though the domain already exists in libvirt on the first host.

Expected results:

The VM should not be re-run if the domain already exists in libvirt.
An exception should be raised to the engine.
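A minimal sketch of the expected decision, written in Python to match vdsm. The function name `should_rerun_on_other_host` and the marker list are hypothetical: the real rerun logic lives in ovirt-engine and is not shown in this report, and deciding from the exit message text is an assumption made only for illustration.

```python
# Hypothetical exit-message fragments meaning a retry on another host cannot help.
NON_RETRYABLE_MARKERS = ("domain is already active",)


def should_rerun_on_other_host(exit_message):
    """Return False when the failure says the domain already exists in libvirt.

    Assumption: the rerun decision is made from the VM's exit message text.
    """
    message = (exit_message or "").lower()
    return not any(marker in message for marker in NON_RETRYABLE_MARKERS)
```

Fed the exit message from the event log above, this returns False, so the VM would stay down with an error instead of being started on the second host.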

Additional info:

First host:


virsh > list
 Id    Name                           State
----------------------------------------------------
 5     KKKKK                          shut off
 8     NNNNN                          shut off


Second host:

 Id    Name                           State
----------------------------------------------------
 29    KKKKK                          running
 31    NNNNN                          running
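The first host's table shows the inconsistent state behind this bug: domains that still hold a numeric Id but report `shut off`. In `virsh list --all`, an inactive domain normally shows `-` as its Id. A minimal sketch for flagging such domains from saved `virsh list` output; the helper name `find_stale_domains` is hypothetical, and it assumes domain names contain no spaces.

```python
def find_stale_domains(virsh_list_output):
    """Return names of domains that have a numeric Id but report 'shut off'.

    Assumes plain-text `virsh list` output and domain names without spaces.
    """
    stale = []
    for line in virsh_list_output.splitlines():
        parts = line.split()
        # Skip the header row, the dashed separator, and blank lines;
        # inactive domains with '-' as their Id are skipped as well.
        if len(parts) < 3 or not parts[0].isdigit():
            continue
        name, state = parts[1], " ".join(parts[2:])
        if state == "shut off":
            stale.append(name)
    return stale
```

Run against the second host's table, it returns an empty list, since both domains there are running.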


  File "/usr/share/vdsm/vm.py", line 662, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/libvirtvm.py", line 1518, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 104, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2645, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: Requested operation is not valid: domain is already active as 'KKKKK'
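A sketch of how the failure in this traceback could be surfaced as a distinct error instead of a generic start failure. To run without libvirt installed, it uses a hypothetical `LibvirtErrorStandIn` in place of `libvirt.libvirtError` and matches on the message text; real code would more likely compare `err.get_error_code()` against `libvirt.VIR_ERR_OPERATION_INVALID`, the code behind "Requested operation is not valid". `create_domain` and `DomainAlreadyActiveError` are hypothetical names, not vdsm APIs.

```python
class LibvirtErrorStandIn(Exception):
    """Hypothetical stand-in for libvirt.libvirtError, so this runs without libvirt."""


class DomainAlreadyActiveError(Exception):
    """Hypothetical non-retryable error that vdsm could report to the engine."""


# Text libvirt returned in the traceback above; matching on it is an assumption.
ALREADY_ACTIVE_MARKER = "domain is already active"


def create_domain(connection, domxml, flags=0):
    """Start a domain, turning the 'already active' failure into a distinct error.

    `connection` is any object with a createXML(domxml, flags) method.
    """
    try:
        return connection.createXML(domxml, flags)
    except LibvirtErrorStandIn as err:
        if ALREADY_ACTIVE_MARKER in str(err):
            # Report a specific error so the caller can skip the rerun.
            raise DomainAlreadyActiveError(str(err))
        # Any other libvirt failure propagates unchanged.
        raise
```

Any object with a `createXML(domxml, flags)` method can stand in for the connection, which keeps the sketch testable without a hypervisor.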


2013-02-05 14:23:00,397 ERROR [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-44) [437e7a48] Rerun vm 11d0501a-59aa-4566-81f5-be8c5eeced79. Called from vds gold-vdsd

Comment 1 Dafna Ron 2013-02-05 14:39:18 UTC
Sorry, I forgot a step:

Steps to Reproduce:
1. Create a VM and run it.
2. Suspend the VM.
3. Create a live snapshot while the VM is suspended.
4. Once the snapshot is created, resume the VM.
5. Power off the VM.
6. Delete the snapshot.
7. Run the VM again.
8. The VM will be stuck in Wait for Launch -> power it off.
9. Try to start the VM again.

Comment 2 Dafna Ron 2013-02-05 15:24:19 UTC
After some more testing, the scenario turns out to be simpler.
The domain is listed as existing in libvirt because of a bug in which, after suspend -> resume -> power off -> power on, the VM starts with status "shut off" in libvirt. As a result, vdsm does not get a PID and the VM is stuck in Wait for Launch.

https://bugzilla.redhat.com/show_bug.cgi?id=907972

Comment 3 Michal Skrivanek 2013-02-14 12:38:17 UTC
Then I'd rather dupe it, if you don't mind. We need to avoid bug 907972 in the first place.

*** This bug has been marked as a duplicate of bug 907972 ***

