Bug 912390 - vdsm: race between create and destroy of VM leaves VM running on host while engine thinks it's down.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: ---
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ovirt-4.0.0-beta
Target Release: 4.17.999
Assignee: Francesco Romani
QA Contact: Artyom
URL:
Whiteboard:
Duplicates: 1028045
Depends On:
Blocks:
 
Reported: 2013-02-18 15:18 UTC by Yaniv Kaul
Modified: 2016-08-12 14:05 UTC
13 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-08-12 14:05:12 UTC
oVirt Team: Virt
ylavi: ovirt-4.0.0?
rule-engine: planning_ack+
tjelinek: devel_ack+
mavital: testing_ack+


Attachments
vdsm log (332.69 KB, application/x-gzip)
2013-02-18 15:21 UTC, Yaniv Kaul


Links
System ID Priority Status Summary Last Updated
oVirt gerrit 44989 master ABANDONED vm: improve safety between startup and shutdown 2016-03-24 09:58:02 UTC
oVirt gerrit 54792 master MERGED vm: use proper threading.Event()s 2016-04-05 10:24:48 UTC
oVirt gerrit 55150 master ABANDONED vm: serialize destroy() and creation 2016-04-21 13:45:22 UTC
oVirt gerrit 55151 master MERGED vm: handle destroy request while starting up 2016-04-28 07:47:28 UTC

Description Yaniv Kaul 2013-02-18 15:18:45 UTC
Description of problem:
I tried to run a VM; it did not seem to run, so I pressed stop several times. I guess this caused a race: while RHEVM thought the VM was down, it was in fact still up and running in VDSM:

Possible cause:

Thread-984::ERROR::2013-02-18 17:03:14,129::vm::680::vm.Vm::(_startUnderlyingVm) vmId=`2723115e-744d-40ea-8b7d-57258f2c9d37`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 642, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/libvirtvm.py", line 1480, in _run
    self._domDependentInit()
  File "/usr/share/vdsm/libvirtvm.py", line 1354, in _domDependentInit
    raise Exception('destroy() called before Vm started')
Exception: destroy() called before Vm started


Version-Release number of selected component (if applicable):
vdsm-4.10.2-1.4.el6.x86_64

How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Yaniv Kaul 2013-02-18 15:21:28 UTC
Created attachment 698922 [details]
vdsm log

Comment 2 Michal Skrivanek 2013-02-21 09:11:39 UTC
This is harmless, so I suppose the exception can be downgraded to a normal log message.

You've killed the VM while it was still being created, and we have code to (almost) correctly handle that... well, looking at the code, it's far from bulletproof; actually, it's ugly!
I'd say it's too late for the current release, but let's make it better in 3.3.
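The failure mode can be illustrated with a minimal, self-contained sketch (hypothetical names, not the actual vdsm code): destroy() and the startup path share only an unsynchronized flag, so a destroy request that lands first makes the init step raise, exactly as in the traceback above.

```python
# Toy model of the race; RacyVm and its members are illustrative only.
import threading

class RacyVm:
    def __init__(self):
        self._destroyed = False  # plain flag, no lock or event

    def destroy(self):
        # Runs on the API thread; may complete before startup even begins.
        self._destroyed = True

    def _dom_dependent_init(self):
        # Startup thread: the original code raised here when it lost the race.
        if self._destroyed:
            raise Exception('destroy() called before Vm started')

vm = RacyVm()
t = threading.Thread(target=vm.destroy)
t.start()
t.join()                      # destroy wins the race
try:
    vm._dom_dependent_init()
except Exception as e:
    print(e)                  # -> destroy() called before Vm started
```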

Comment 4 Michal Skrivanek 2013-09-18 09:10:10 UTC
Refactoring of that part of the code didn't make it into the 3.3 timeframe; pushing to 3.4...

Comment 5 Michal Skrivanek 2014-02-12 10:54:40 UTC
let's see if we can refactor this in 3.5 timeframe...

Comment 6 Michal Skrivanek 2014-04-11 06:29:26 UTC
*** Bug 1028045 has been marked as a duplicate of this bug. ***

Comment 7 Michal Skrivanek 2014-04-11 06:30:15 UTC
additional thoughts from bug 1028045:

I don't care much about the vdsm logs, but in the UI we shouldn't report anything if we find the VM down. To differentiate, though, we may need an extra check at the engine level to see whether the VM is running somewhere else. (I'm thinking of a race at the end of migration, before the state is updated in the UI, where one could send poweroff to the source host even though the VM no longer runs there.)

Comment 8 Francesco Romani 2015-08-17 16:55:53 UTC
cleanup started -> POST

Comment 9 Michal Skrivanek 2015-09-18 07:30:11 UTC
The current patch is the right path, but more pieces are missing. We won't be able to fit it into 3.6; proposing to postpone.

Comment 12 Red Hat Bugzilla Rules Engine 2015-11-16 14:11:15 UTC
This bug is flagged for 3.6, yet the milestone is for 4.0 version, therefore the milestone has been reset.
Please set the correct milestone or add the flag.

Comment 13 Mike McCune 2016-03-28 22:37:22 UTC
This bug was accidentally moved from POST to MODIFIED via an error in automation, please see mmccune@redhat.com with any questions

Comment 14 Francesco Romani 2016-04-21 13:48:02 UTC
After some thought, I don't think we can do much better than https://gerrit.ovirt.org/#/c/55151/5
on Vdsm side; otherwise, we'll need a complete overhaul of *both* the creation and destroy flow, which could be done only for 4.1.

Not sure if patches are needed for Engine side.
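The direction of the merged patch ("vm: handle destroy request while starting up") can be sketched roughly as follows; MiniVm and its member names are hypothetical, not the actual vdsm implementation. A destroy that arrives early is recorded in a threading.Event, and the startup path checks it at a safe point and aborts cleanly instead of raising:

```python
# Illustrative sketch only; not the real vdsm Vm class.
import threading

class MiniVm:
    """Toy model of serializing a destroy request against startup."""
    def __init__(self):
        self._destroy_requested = threading.Event()
        self.state = 'WaitForLaunch'

    def destroy(self):
        # API thread: record the request instead of tearing down blindly.
        self._destroy_requested.set()
        self.state = 'Down'

    def _start_underlying_vm(self):
        # Startup thread: honor a destroy that arrived before this point,
        # rather than raising "destroy() called before Vm started".
        if self._destroy_requested.is_set():
            self.state = 'Down'
            return False
        self.state = 'Up'
        return True

vm = MiniVm()
vm.destroy()                        # destroy wins the race
started = vm._start_underlying_vm()
print(started, vm.state)            # -> False Down
```

Using threading.Event here (as in the other merged patch, "vm: use proper threading.Event()s") makes the flag safe to set from one thread and poll from another without extra locking.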

Comment 15 Francesco Romani 2016-04-29 12:33:23 UTC
(In reply to Francesco Romani from comment #14)
> After some thought, I don't think we can do much better than
> https://gerrit.ovirt.org/#/c/55151/5
> on Vdsm side; otherwise, we'll need a complete overhaul of *both* the
> creation and destroy flow, which could be done only for 4.1.
> 
> Not sure if patches are needed for Engine side.

Arik, do you think we need patches on Engine side? Could Vdsm be improved to make Engine's life easier here?

Comment 16 Francesco Romani 2016-05-09 06:52:42 UTC
(In reply to Francesco Romani from comment #15)
> (In reply to Francesco Romani from comment #14)
> > After some thought, I don't think we can do much better than
> > https://gerrit.ovirt.org/#/c/55151/5
> > on Vdsm side; otherwise, we'll need a complete overhaul of *both* the
> > creation and destroy flow, which could be done only for 4.1.
> > 
> > Not sure if patches are needed for Engine side.
> 
> Arik, do you think we need patches on Engine side? Could Vdsm be improved to
> make Engine's life easier here?

Considering the monitoring changes in Engine since this bug was reported, we believe no further patches are required.

Comment 17 Artyom 2016-08-10 11:36:43 UTC
Verified on vdsm-4.18.10-1.el7ev.x86_64

1) Added a sleep to /usr/share/vdsm/virt/vm.py on the host:

self._vmCreationEvent.set()
try:
    from time import sleep
    sleep(120)
    self._run()
2) Started the VM in the engine on the specific host
3) Checked that the host still does not have the VM:
# virsh -r list
 Id    Name                           State
----------------------------------------------------

4) Powered off the VM

Checked the VDSM log; there is no traceback or ERROR related to this bug.

