Created attachment 801592 (logs)

Description of problem:
The VM is unable to start after OS installation.

Scenario: ovirt-hosted-engine-setup creates the VM. The user connects to the VM and installs the OS. After a successful OS installation, the VM reboots and is destroyed. At this point, setup asks the user whether the OS installation was successful. If the answer is yes, setup should create and start the VM again, but the VM fails to start.

Here is the traceback from the vdsm.log file:

    Traceback (most recent call last):
      File "/usr/share/vdsm/clientIF.py", line 356, in teardownVolumePath
        res = self.irs.teardownImage(drive['domainID'],
      File "/usr/share/vdsm/vm.py", line 1361, in __getitem__
        raise KeyError(key)
    KeyError: 'domainID'

The full vdsm.log and ovirt-hosted-engine-setup.log files are attached.
Package versions:
vdsm-4.12.0-127.gitedb88bf.el6ev.x86_64
libvirt-0.10.2-18.el6_4.9.x86_64
ovirt-hosted-engine-setup-1.0.0-0.4.1.beta.1.el6.noarch
ovirt-hosted-engine-ha-0.1.0-0.1.beta.1.el6.noarch
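The last two traceback frames suggest that `teardownVolumePath` indexes a drive description with `'domainID'`, and the drive's `__getitem__` raises `KeyError` when that field is missing. One plausible case (an assumption, not confirmed in this report) is a path-based drive such as an installation ISO, which carries no storage-domain IDs. The minimal Python sketch below reproduces that failure mode and shows a defensive guard. `Drive`, `teardown_volume_path`, the `teardown_image` callback and the `poolID`/`imageID` field names are hypothetical stand-ins, not vdsm's real code.

```python
class Drive:
    """Hypothetical stand-in for a vdsm drive object (not vdsm's real class)."""

    def __init__(self, **attrs):
        self._attrs = attrs

    def __getitem__(self, key):
        # Mirrors the traceback: a missing attribute surfaces as KeyError.
        if key not in self._attrs:
            raise KeyError(key)
        return self._attrs[key]

    def __contains__(self, key):
        return key in self._attrs


def teardown_volume_path(drive, teardown_image):
    """Tear down an image only for drives that carry storage-domain IDs.

    `teardown_image` is a hypothetical callback. 'domainID' comes from the
    traceback; 'poolID' and 'imageID' are assumed companion fields.
    """
    if "domainID" not in drive:
        # Path-based drive (e.g. an ISO file): nothing to tear down.
        return None
    return teardown_image(drive["domainID"], drive["poolID"], drive["imageID"])
```

Checking membership first keeps the teardown path from aborting on drives that never had an image to prepare; whether vdsm should skip such drives or handle them differently is not settled in this report.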
Created attachment 801604 (logs)
From what I can tell, the issue is that when qemu exits, the vdsm stats for the VM become stale instead of being removed. This patch should at least serve as a workaround, if not a complete fix:
http://gerrit.ovirt.org/19470

Eduardo, Federico, your thoughts on this would be most welcome. Thanks!
In http://gerrit.ovirt.org/#/c/19470/ danken noted that the stats need to stay around until the engine can retrieve the status. I then ran the following test:
1. Start the VM.
2. Power off the VM via the console.
3. Confirm the bug is reproduced (starting the VM fails with "Virtual machine already exists").
4. Destroy the VM with vdsClient.
5. Start the VM; this time it succeeds.

Sandro, perhaps adding a call in hosted-engine-setup to destroy the VM after OS installation (as in step 4) would solve the issue?
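The five steps above can be modelled with a toy in-memory registry. This is a sketch under assumptions, not vdsm code: `ToyHost`, its methods, the `"he-vm"` ID and the status strings are hypothetical; only the "Virtual machine already exists" message comes from this report. It illustrates the point made in the previous comments: a VM whose qemu process has exited keeps its entry (in a Down state) until something calls destroy, so a second create with the same ID is rejected.

```python
class VMExistsError(Exception):
    pass


class ToyHost:
    """Hypothetical model of a host that keeps VM stats after qemu exits."""

    def __init__(self):
        self.vms = {}  # vm_id -> status string

    def create(self, vm_id):
        if vm_id in self.vms:
            raise VMExistsError("Virtual machine already exists")
        self.vms[vm_id] = "Up"

    def qemu_exited(self, vm_id):
        # The entry is kept so a manager can still read the final status.
        self.vms[vm_id] = "Down"

    def destroy(self, vm_id):
        # Only destroy removes the entry and frees the VM ID.
        self.vms.pop(vm_id, None)


host = ToyHost()
host.create("he-vm")        # 1. start the VM
host.qemu_exited("he-vm")   # 2. power off from inside the guest
try:
    host.create("he-vm")    # 3. restart attempt is rejected
except VMExistsError as err:
    print("create failed:", err)
host.destroy("he-vm")       # 4. destroy the stale entry
host.create("he-vm")        # 5. start succeeds
print(host.vms)
```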
(In reply to Greg Padgett from comment #4)
> Sandro, perhaps adding a call in hosted-engine-setup to destroy the vm after
> os installation (as in step 4) would solve the issue?

Greg, the hosted engine VM is created with the 'destroy' action on the 'on_poweroff', 'on_reboot' and 'on_crash' events:
http://gerrit.ovirt.org/gitweb?p=ovirt-hosted-engine-setup.git;a=blob;f=src/vdsm_hooks/hostedengine.py;h=e9e2ac42fe0981606d89a29a7b56cacd5809e928;hb=HEAD#l36

It also already issues a destroy command in case the above is not honored, both after OS installation:
http://gerrit.ovirt.org/gitweb?p=ovirt-hosted-engine-setup.git;a=blob;f=src/plugins/ovirt-hosted-engine-setup/vm/runvm.py;h=32d82a15fc8d312c9e06227d5a9920f30ba1bcbc;hb=HEAD#l297
and after the engine liveliness validation, before starting the HA daemons:
http://gerrit.ovirt.org/gitweb?p=ovirt-hosted-engine-setup.git;a=blob;f=src/plugins/ovirt-hosted-engine-setup/ha/ha_services.py;h=a96fce43ad77c90d6824875e1da12476296eb3a1;hb=HEAD#l70
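In libvirt domain XML, these lifecycle actions are expressed by the `<on_poweroff>`, `<on_reboot>` and `<on_crash>` elements. The Python sketch below shows how a hook-style function could force all three to `destroy`. It uses `xml.etree.ElementTree` on a plain XML string and is not the content of the `hostedengine.py` hook linked above; the function name and the `example-vm` domain are hypothetical.

```python
import xml.etree.ElementTree as ET

LIFECYCLE_EVENTS = ("on_poweroff", "on_reboot", "on_crash")


def force_destroy_on_lifecycle(domxml: str) -> str:
    """Return domain XML with on_poweroff/on_reboot/on_crash set to 'destroy'."""
    root = ET.fromstring(domxml)
    for tag in LIFECYCLE_EVENTS:
        elem = root.find(tag)
        if elem is None:
            # Element absent: add it as a direct child of <domain>.
            elem = ET.SubElement(root, tag)
        elem.text = "destroy"
    return ET.tostring(root, encoding="unicode")


example = (
    "<domain type='kvm'><name>example-vm</name>"
    "<on_reboot>restart</on_reboot></domain>"
)
print(force_destroy_on_lifecycle(example))
```

Existing elements are rewritten in place rather than duplicated, so a domain that already declares `<on_reboot>restart</on_reboot>` ends up with a single `<on_reboot>destroy</on_reboot>`.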
(In reply to Sandro Bonazzola from comment #5)

I did some more testing, and I think there are a few things going on:
1. I could be mistaken, but I think the on_poweroff/etc. hooks affect libvirt state (or at least, I don't see anything in vdsm that watches them).
2. The `hosted-engine --deploy` operation does indeed run the destroy command, as you mentioned above. It appears I ran into this failure because I had restarted the VM using `hosted-engine --vm-start`, changed some things, and shut it down again. Because deploy had already destroyed the VM and displayed the prompt asking whether the OS had installed successfully, it didn't destroy it again.

You could say it's my fault for starting the VM out-of-band from the deployment, rather than just answering "No" and letting the deployment code restart the VM for me.
We could work around this either by adding more messaging so people don't go rogue like me and start the VM by hand, or perhaps by adding a vdsm destroy command somewhere in the `hosted-engine --vm-start` flow (only if the VM is not already running, of course).
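The second option could look roughly like this: before creating the VM, look up any existing entry with the same ID, destroy it only if it is already Down, and leave it alone if it is still running. The dict-based `vms` map, the status strings and the `destroy`/`create` callbacks are hypothetical stand-ins, not the real `hosted-engine --vm-start` code or the vdsm API.

```python
from typing import Callable, Dict


def vm_start(vms: Dict[str, str], vm_id: str,
             destroy: Callable[[str], None],
             create: Callable[[str], None]) -> str:
    """Start vm_id, first clearing a leftover entry that is already Down.

    `vms` maps VM IDs to status strings (hypothetical representation).
    """
    status = vms.get(vm_id)
    if status is None:
        create(vm_id)
        return "created"
    if status == "Down":
        destroy(vm_id)  # stale leftover from an earlier run: safe to remove
        create(vm_id)
        return "recreated"
    # Any other status means the VM is still alive; do not touch it.
    return "already running"
```

The trade-off raised in the reply below is that this makes `--vm-start` clean up after whoever started the VM previously, instead of leaving that responsibility with them.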
(In reply to Greg Padgett from comment #6)
> You could say it's my fault for starting the vm out-of-band from the
> deployment, rather than just answering "No" and letting the deployment code
> restart the vm for me. We could work around this by either adding more
> messaging so people don't go rogue like me and start the vm by hand, or
> perhaps by having adding a vdsm destroy command somewhere in the
> `hosted-engine --vm-start` flow--only if it's not running already of course.

I'm not sure that destroying the VM when --vm-start is called is a good idea. As Dan pointed out, whatever started the VM should monitor it and issue the destroy verb when it finds that the VM has gone Down. So if the user starts the VM with --vm-start, it is up to the user to also call --vm-poweroff after the shutdown.

However:
- I think that the destroy on shutdown requested by the hook should be honored.
- I think that I can add some additional checks before trying to create the VM, checking if the user has done something like you did, leaving around a stale VM, and telling them to clean it up.
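The additional check proposed above could be sketched as follows: before creating the VM, scan the host's VM list and, if an entry with the same ID is left over, stop with a message telling the user how to clean it up. The list-of-dicts input, its `vmId`/`status` keys, `ensure_no_stale_vm` and `StaleVMError` are assumptions for illustration; this is not the patch that was merged.

```python
from typing import Iterable, Mapping


class StaleVMError(RuntimeError):
    pass


def ensure_no_stale_vm(vm_list: Iterable[Mapping[str, str]], vm_id: str) -> None:
    """Raise StaleVMError if vm_id is still known to the host.

    vm_list is assumed to be a sequence of dicts with 'vmId' and 'status' keys.
    """
    for vm in vm_list:
        if vm.get("vmId") != vm_id:
            continue
        raise StaleVMError(
            "VM %s already exists on this host (status: %s). "
            "Please power it off with 'hosted-engine --vm-poweroff' "
            "and retry." % (vm_id, vm.get("status", "unknown"))
        )
```

Refusing and explaining, rather than destroying automatically, keeps the responsibility for cleanup with whoever started the VM, in line with the reasoning above.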
I assume you've assigned this to me for:
> - I think that I can add some additional checks before trying to create the
> VM, checking if the user has done something like you, leaving around a stale
> VM and tell him to cleanup it.
right?
(In reply to Sandro Bonazzola from comment #9)

Right, thanks.
Patch merged on upstream master and the 1.0 branch.
fixed.
This bug is currently attached to errata RHBA-2013:15257. If this change is not to be documented in the text for this errata, please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise, to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:
* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:
https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes

Thanks in advance.
hosted-engine is a new package, so it does not need errata for specific bugs during its development.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2014-0083.html