Bug 1010980 - VM fails to start after OS installation.
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-hosted-engine-setup
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.3.0
Assigned To: Sandro Bonazzola
QA Contact: Leonid Natapov
Whiteboard: integration
Keywords: Triaged
Depends On:
Blocks:
 
Reported: 2013-09-23 08:42 EDT by Leonid Natapov
Modified: 2014-01-21 11:53 EST
CC: 9 users

See Also:
Fixed In Version: ovirt-hosted-engine-setup-1.0.0-0.7.beta2.el6ev
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-01-21 11:53:39 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
logs (3.53 MB, text/plain), 2013-09-23 08:42 EDT, Leonid Natapov
logs (226.54 KB, text/plain), 2013-09-23 08:45 EDT, Leonid Natapov


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 20545 None None None Never
oVirt gerrit 20556 None None None Never

Description Leonid Natapov 2013-09-23 08:42:33 EDT
Created attachment 801592 [details]
logs

Description of problem:

The VM is unable to start after OS installation.

Scenario:
The ovirt-hosted-engine setup creates the VM.
The user connects to the VM and installs the OS.
After a successful OS installation the VM reboots and is destroyed.

At this point the setup asks the user whether the OS installation was successful.
If the answer is yes, the setup should create and start the VM again,
but the VM fails to start.
Here is the traceback from the vdsm.log file:

Traceback (most recent call last):
  File "/usr/share/vdsm/clientIF.py", line 356, in teardownVolumePath
    res = self.irs.teardownImage(drive['domainID'],
  File "/usr/share/vdsm/vm.py", line 1361, in __getitem__
    raise KeyError(key)
KeyError: 'domainID'

The full vdsm.log and ovirt-hosted-engine-setup.log files are attached.
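
For illustration only, a minimal self-contained sketch of the lookup that blows up in the traceback above (this is not the vdsm source, and the guard shown is only an assumption about how such a drive could be handled; it says nothing about the root cause discussed in the comments below):

# Minimal sketch, not vdsm code: an image-backed drive carries the
# 'domainID'/'poolID'/'imageID'/'volumeID' keys, while other drive
# definitions do not, so an unconditional drive['domainID'] raises
# the KeyError seen in the log.
IMAGE_KEYS = ('domainID', 'poolID', 'imageID', 'volumeID')

def teardown_drive(irs, drive):
    if not all(key in drive for key in IMAGE_KEYS):
        # Nothing image-based to tear down; skip instead of raising.
        return None
    # Hypothetical call shape, mirroring the truncated call in the traceback.
    return irs.teardownImage(*(drive[key] for key in IMAGE_KEYS))

# A path-only drive (e.g. the installation CD-ROM) triggers the guard:
print(teardown_drive(irs=None, drive={'device': 'cdrom', 'path': '/tmp/boot.iso'}))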
Comment 1 Leonid Natapov 2013-09-23 08:44:50 EDT
vdsm-4.12.0-127.gitedb88bf.el6ev.x86_64
libvirt-0.10.2-18.el6_4.9.x86_64
ovirt-hosted-engine-setup-1.0.0-0.4.1.beta.1.el6.noarch
ovirt-hosted-engine-ha-0.1.0-0.1.beta.1.el6.noarch
Comment 2 Leonid Natapov 2013-09-23 08:45:16 EDT
Created attachment 801604 [details]
logs
Comment 3 Greg Padgett 2013-09-23 09:55:09 EDT
From what I can tell, the issue is that if qemu exits, vdsm stats for the vm become stale instead of being removed.  This patch should at least serve as a workaround, if not a complete fix:

http://gerrit.ovirt.org/19470

Eduardo, Federico, your thoughts on this would be most welcome.  Thanks!
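
To make the "stale instead of removed" behaviour concrete, here is a toy sketch with made-up names (not vdsm internals): the VM's entry survives the qemu exit so its final status can still be read, and only an explicit destroy frees the id for a new create (compare the "Virtual machine already exists" failure reproduced in comment 4 below).

# Toy model, made-up names (not vdsm internals): entries linger after qemu
# exits, so a second create() for the same id is refused until destroy().
class VmTable:
    def __init__(self):
        self._vms = {}                    # vmId -> status string

    def create(self, vm_id):
        if vm_id in self._vms:
            raise RuntimeError('Virtual machine already exists')
        self._vms[vm_id] = 'Up'

    def on_qemu_exit(self, vm_id):
        self._vms[vm_id] = 'Down'         # kept, not removed (stats stay readable)

    def destroy(self, vm_id):
        self._vms.pop(vm_id, None)        # only this frees the id

table = VmTable()
table.create('HostedEngine')
table.on_qemu_exit('HostedEngine')
try:
    table.create('HostedEngine')          # refused: stale entry still there
except RuntimeError as exc:
    print(exc)
table.destroy('HostedEngine')
table.create('HostedEngine')              # succeeds once the entry is gone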
Comment 4 Greg Padgett 2013-09-23 17:32:38 EDT
In http://gerrit.ovirt.org/#/c/19470/ danken noted that the stats need to stay around until the engine can retrieve the status.  I then did the following test:

1. Start the vm
2. Power off the vm via the console
3. Confirm the bug is reproduced (starting the vm fails with "Virtual machine already exists")
4. Destroy the vm with vdsClient
5. Start the vm - this time it succeeds.

Sandro, perhaps adding a call in hosted-engine-setup to destroy the vm after os installation (as in step 4) would solve the issue?
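
For reference, a sketch of the manual recovery in steps 4-5, wrapped in Python only to stay in one language; the exact vdsClient and hosted-engine invocations below are assumptions based on the tools named in this comment, not commands taken from the eventual fix.

# Sketch of the manual workaround (steps 4-5 above); the command lines are
# assumed invocations of the tools named in this comment.
import subprocess

VM_ID = '00000000-0000-0000-0000-000000000000'   # placeholder hosted-engine vm id

# Step 4: destroy the leftover (Down) vm so vdsm frees its id.
subprocess.check_call(['vdsClient', '-s', '0', 'destroy', VM_ID])

# Step 5: start the vm again; with the stale entry gone this now succeeds.
subprocess.check_call(['hosted-engine', '--vm-start'])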
Comment 5 Sandro Bonazzola 2013-09-24 03:05:37 EDT
(In reply to Greg Padgett from comment #4)

> Sandro, perhaps adding a call in hosted-engine-setup to destroy the vm after
> os installation (as in step 4) would solve the issue?

Greg, the hosted engine VM is created with the 'destroy' action on the 'on_poweroff', 'on_reboot' and 'on_crash' events.
http://gerrit.ovirt.org/gitweb?p=ovirt-hosted-engine-setup.git;a=blob;f=src/vdsm_hooks/hostedengine.py;h=e9e2ac42fe0981606d89a29a7b56cacd5809e928;hb=HEAD#l36

It also already issues a destroy command if the above is not honored:
after OS installation
http://gerrit.ovirt.org/gitweb?p=ovirt-hosted-engine-setup.git;a=blob;f=src/plugins/ovirt-hosted-engine-setup/vm/runvm.py;h=32d82a15fc8d312c9e06227d5a9920f30ba1bcbc;hb=HEAD#l297

and after engine liveliness validation, before starting the HA daemons:
http://gerrit.ovirt.org/gitweb?p=ovirt-hosted-engine-setup.git;a=blob;f=src/plugins/ovirt-hosted-engine-setup/ha/ha_services.py;h=a96fce43ad77c90d6824875e1da12476296eb3a1;hb=HEAD#l70
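
To illustrate what those lifecycle events mean in practice, a rough sketch (not the actual hostedengine.py hook) of forcing the 'destroy' action into a libvirt domain XML, which makes libvirt tear the domain down on poweroff, reboot or crash instead of restarting it:

# Rough sketch, not the real vdsm hook: set <on_poweroff>/<on_reboot>/
# <on_crash> to 'destroy' in a libvirt domain XML (assumes the elements
# are not already present in the input).
import xml.dom.minidom

def force_destroy_lifecycle(domxml):
    dom = xml.dom.minidom.parseString(domxml)
    domain = dom.getElementsByTagName('domain')[0]
    for event in ('on_poweroff', 'on_reboot', 'on_crash'):
        node = dom.createElement(event)
        node.appendChild(dom.createTextNode('destroy'))
        domain.appendChild(node)
    return dom.toxml()

print(force_destroy_lifecycle('<domain type="kvm"><name>HostedEngine</name></domain>'))
# -> ...<on_poweroff>destroy</on_poweroff><on_reboot>destroy</on_reboot>
#    <on_crash>destroy</on_crash></domain>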
Comment 6 Greg Padgett 2013-09-27 17:10:56 EDT
(In reply to Sandro Bonazzola from comment #5)
> (In reply to Greg Padgett from comment #4)
> 
> > Sandro, perhaps adding a call in hosted-engine-setup to destroy the vm after
> > os installation (as in step 4) would solve the issue?
> 
> Greg, the hosted engine VM is created with the 'destroy' action on the
> 'on_poweroff', 'on_reboot' and 'on_crash' events.
> http://gerrit.ovirt.org/gitweb?p=ovirt-hosted-engine-setup.git;a=blob;f=src/vdsm_hooks/hostedengine.py;h=e9e2ac42fe0981606d89a29a7b56cacd5809e928;hb=HEAD#l36
> 
> It also already issues a destroy command if the above is not honored:
> after OS installation
> http://gerrit.ovirt.org/gitweb?p=ovirt-hosted-engine-setup.git;a=blob;f=src/plugins/ovirt-hosted-engine-setup/vm/runvm.py;h=32d82a15fc8d312c9e06227d5a9920f30ba1bcbc;hb=HEAD#l297
> 
> and after engine liveliness validation, before starting the HA daemons:
> http://gerrit.ovirt.org/gitweb?p=ovirt-hosted-engine-setup.git;a=blob;f=src/plugins/ovirt-hosted-engine-setup/ha/ha_services.py;h=a96fce43ad77c90d6824875e1da12476296eb3a1;hb=HEAD#l70

I did some more testing... I think there are a few things going on:

1. I could be mistaken, but I think the on_poweroff/etc hooks only affect libvirt state (or at least, I don't see anything in vdsm that watches them).

2. The `hosted-engine --deploy` operation does indeed run the destroy command, as you mentioned above.  It appears I ran into this failure because I had restarted the vm using `hosted-engine --vm-start`, changed some things, and shut it down again.  Because deploy had already destroyed the vm and displayed the prompt asking whether the OS had installed successfully, it didn't destroy it again.

You could say it's my fault for starting the vm out-of-band from the deployment, rather than just answering "No" and letting the deployment code restart the vm for me.  We could work around this either by adding more messaging so people don't go rogue like me and start the vm by hand, or perhaps by adding a vdsm destroy command somewhere in the `hosted-engine --vm-start` flow (only if the vm isn't already running, of course).
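
A minimal sketch of the `--vm-start` guard floated here (all names are stand-ins, not real hosted-engine-setup or vdsm calls): destroy a leftover vm only when it exists and is already down, otherwise refuse to touch it.

# Sketch of the proposed --vm-start guard; FakeVdsm stands in for the real
# vdsm connection and is not a real API.
class FakeVdsm:
    def __init__(self, status):
        self.status = status              # None, 'Up', 'Down', ...

    def get_status(self, vm_id):
        return self.status

    def destroy(self, vm_id):
        self.status = None

    def create(self, vm_id):
        self.status = 'Up'

def vm_start(vdsm, vm_id):
    status = vdsm.get_status(vm_id)
    if status == 'Down':
        vdsm.destroy(vm_id)               # clear the stale definition first
    elif status is not None:
        raise RuntimeError('vm %s is already running (%s)' % (vm_id, status))
    vdsm.create(vm_id)

vdsm = FakeVdsm('Down')                   # leftover from an out-of-band start
vm_start(vdsm, 'HostedEngine')
print(vdsm.status)                        # -> Up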
Comment 7 Sandro Bonazzola 2013-10-02 05:07:43 EDT
(In reply to Greg Padgett from comment #6)

> You could say it's my fault for starting the vm out-of-band from the
> deployment, rather than just answering "No" and letting the deployment code
> restart the vm for me.  We could work around this either by adding more
> messaging so people don't go rogue like me and start the vm by hand, or
> perhaps by adding a vdsm destroy command somewhere in the
> `hosted-engine --vm-start` flow (only if the vm isn't already running, of course).

I'm not sure that destroying the VM when --vm-start is called is a good idea.
As Dan pointed out:

Whatever started the VM should monitor it and issue the destroy verb when it finds the VM has gone Down.

So if the user starts the VM with --vm-start, it should be the user who also calls --vm-poweroff after the shutdown.

However:
- I think that the destroy-on-shutdown requested by the hook should be honored.
- I think I can add some additional checks before trying to create the VM, to detect whether the user has left a stale VM around (as you did) and tell them to clean it up.
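
A minimal sketch of the extra check described in the last bullet (helper names are assumptions; this is not the merged patch): unlike the auto-destroy guard sketched above, the setup only detects the leftover vm and asks the user to clean it up, in line with Dan's point that whoever started the vm should destroy it.

# Sketch of the pre-create check described above; get_vm_status is a
# stand-in, not a real hosted-engine-setup or vdsm call.
def ensure_no_stale_vm(vm_id, get_vm_status):
    status = get_vm_status(vm_id)
    if status is None:
        return                            # nothing defined, safe to create the vm
    raise RuntimeError(
        'A hosted engine vm (%s) is still defined with status %r; '
        'please power it off / destroy it and run the setup again.'
        % (vm_id, status)
    )

# Usage: abort with an actionable message instead of failing later in create().
try:
    ensure_no_stale_vm('HostedEngine', lambda vm_id: 'Down')
except RuntimeError as exc:
    print(exc)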
Comment 9 Sandro Bonazzola 2013-10-21 10:28:31 EDT
I assume you've moved this to me for:

> - I think I can add some additional checks before trying to create the VM,
> to detect whether the user has left a stale VM around (as you did) and tell
> them to clean it up.

right?
Comment 10 Greg Padgett 2013-10-21 11:02:28 EDT
(In reply to Sandro Bonazzola from comment #9)
> I assume you've moved this to me for:
> 
> > - I think I can add some additional checks before trying to create the VM,
> > to detect whether the user has left a stale VM around (as you did) and tell
> > them to clean it up.
> 
> right?

Right, thanks.
Comment 11 Sandro Bonazzola 2013-10-25 09:22:15 EDT
Patch merged on upstream master and the 1.0 branch.
Comment 13 Leonid Natapov 2013-11-03 06:56:47 EST
Fixed.
Comment 14 Charlie 2013-11-27 20:18:52 EST
This bug is currently attached to errata RHBA-2013:15257. If this change is not to be documented in the text for this errata please either remove it from the errata, set the requires_doc_text flag to 
minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:

https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes 

Thanks in advance.
Comment 15 Sandro Bonazzola 2013-12-05 05:42:08 EST
Hosted engine is a new package; it does not need errata text for specific bugs during its development.
Comment 16 errata-xmlrpc 2014-01-21 11:53:39 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0083.html
