Bug 907877
| Summary: | vdsm: we are re-running vm that raised libvirt error domain is already active (no exception raised by vdsm to engine) | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Dafna Ron <dron> |
| Component: | vdsm | Assignee: | Nobody's working on this, feel free to take it <nobody> |
| Status: | CLOSED DUPLICATE | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.2.0 | CC: | bazulay, hateya, iheim, lpeer, michal.skrivanek, ykaul |
| Target Milestone: | --- | | |
| Target Release: | 3.2.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | virt | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-02-14 12:38:17 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
sorry - I forgot a step:

Steps to Reproduce:
1. create a vm and run it
2. suspend the vm
3. create a live snapshot while the vm is suspended
4. once the snapshot was created resume the vm
5. power off the vm
6. delete the snapshot
7. run the vm again
8. vm will be stuck in wait for launch -> power off
9. try to start the vm again.

After some more tests, this scenario is simpler. The domain is listed as existing in libvirt because of a bug: after suspend -> resume -> power off -> power on of a vm, the vm starts with status "shut off" in libvirt, so vdsm does not get a pid and the vm is stuck in wait for launch.
https://bugzilla.redhat.com/show_bug.cgi?id=907972

Then I'd really dupe it, if you don't mind. We need to avoid 907972 in the first place.

*** This bug has been marked as a duplicate of bug 907972 ***
Created attachment 693360 [details]
logs

Description of problem:
I had a vm that was stuck in wait for launch, so after a minute I decided to power off the vm and restart it. After I powered off the vm and restarted it, it failed to run on the same host again and we re-ran it on the second host. Looking at the error in vdsm, libvirt failed to start the vm because the domain is already up in libvirt. However, since no specific error was raised to engine, we restarted the vm on the second host. The event is already listed in the event log:
VM NNNNN is down. Exit message: Requested operation is not valid: domain is already active as 'NNNNN'.
but I cannot see any exception which would prevent the engine from re-running the vm.

Version-Release number of selected component (if applicable):
sf5
vdsm-4.10.2-5.0.el6ev.x86_64
libvirt-0.10.2-18.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. create a vm and run it
2. suspend the vm
3. create a live snapshot while the vm is suspended
4. once the snapshot was created resume the vm
5. power off the vm
6. run the vm again
7. vm will be stuck in wait for launch -> power off
8. try to start the vm again.

Actual results:
we are re-running a domain on a second host when the domain already exists in libvirt.

Expected results:
we should not re-run a vm if the domain already exists in libvirt. An exception should be raised to engine.
Additional info:

first host:

    virsh > list
     Id    Name                           State
    ----------------------------------------------------
     5     KKKKK                          shut off
     8     NNNNN                          shut off

second host:

     Id    Name                           State
    ----------------------------------------------------
     29    KKKKK                          running
     31    NNNNN                          running

      File "/usr/share/vdsm/vm.py", line 662, in _startUnderlyingVm
        self._run()
      File "/usr/share/vdsm/libvirtvm.py", line 1518, in _run
        self._connection.createXML(domxml, flags),
      File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 104, in wrapper
        ret = f(*args, **kwargs)
      File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2645, in createXML
        if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
    libvirtError: Requested operation is not valid: domain is already active as 'KKKKK'

    2013-02-05 14:23:00,397 ERROR [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-44) [437e7a48] Rerun vm 11d0501a-59aa-4566-81f5-be8c5eeced79. Called from vds gold-vdsd
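
The expected behaviour described in this report (do not re-run a vm when libvirt says the domain is already active) could be sketched roughly as below. This is only an illustration under stated assumptions, not vdsm's actual code or the eventual fix: `classify_start_failure`, `start_vm`, `StartRefused` and `FakeLibvirtError` are hypothetical names. `FakeLibvirtError` is a stand-in that mimics only the `get_error_code()` method of the real `libvirt.libvirtError`, so the sketch runs without libvirt installed, and the constant 55 is assumed to match `libvirt.VIR_ERR_OPERATION_INVALID`.

```python
# Illustrative sketch only; the names below are hypothetical, not vdsm APIs.

# Assumed to equal libvirt.VIR_ERR_OPERATION_INVALID.
VIR_ERR_OPERATION_INVALID = 55


class FakeLibvirtError(Exception):
    """Stand-in for libvirt.libvirtError (only get_error_code is mimicked)."""

    def __init__(self, message, code):
        Exception.__init__(self, message)
        self._code = code

    def get_error_code(self):
        return self._code


class StartRefused(Exception):
    """Hypothetical error telling the caller not to rerun the vm elsewhere."""


def classify_start_failure(err):
    """Return 'no-rerun' when the domain is already active, else 'rerun'."""
    already_active = (
        err.get_error_code() == VIR_ERR_OPERATION_INVALID
        and "already active" in str(err)
    )
    return "no-rerun" if already_active else "rerun"


def start_vm(create_domain):
    """Call create_domain(); convert an 'already active' failure into
    StartRefused and re-raise any other failure unchanged."""
    try:
        return create_domain()
    except FakeLibvirtError as err:
        if classify_start_failure(err) == "no-rerun":
            raise StartRefused(str(err))
        raise
```

With something along these lines, the "domain is already active" failure from the traceback above would surface as a distinct error that the caller can treat as "do not rerun on another host", while other start failures would still follow the normal rerun path.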