Description of problem:
Trying to redeploy fails; the HA service did not stop.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Run "hosted-engine --deploy" and let it fail
2. Rerun "hosted-engine --deploy" using the same NFS share
Additional info: (from vdsm logs)
Thread-53::ERROR::2013-11-26 13:27:48,742::BindingXMLRPC::1003::vds::(wrapper) unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/BindingXMLRPC.py", line 989, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/BindingXMLRPC.py", line 240, in vmSetTicket
    return vm.setTicket(password, ttl, existingConnAction, params)
  File "/usr/share/vdsm/API.py", line 592, in setTicket
    return v.setTicket(password, ttl, existingConnAction, params)
  File "/usr/share/vdsm/vm.py", line 4303, in setTicket
    graphics = _domParseStr(self._dom.XMLDesc(0)).childNodes. \
AttributeError: 'NoneType' object has no attribute 'XMLDesc'
Created attachment 829235
*** Bug 1034826 has been marked as a duplicate of this bug. ***
What should the setup do if there's an already defined VM on this machine with the same name? Stop it? Delete?
What is the valid way to continue?
In this specific case there was an earlier error from libvirt, which did not find a VM because it was not running. So it shouldn't be an issue.
Generally speaking, we should check whether there is a running VM. If we find one, we should ask the user for permission to kill it in order to proceed, and then stop it.
The relevant error in the attached vdsm.log is:
Thread-42::DEBUG::2013-11-26 13:27:37,707::libvirtconnection::108::libvirtconnection::(wrapper) Unknown libvirterror: ecode: 9 edom: 20 level: 2 message: operation failed: domain 'HostedEngine' already exists with uuid 7c13d921-6adf-4737-94fa-e387b3de1c97
Thread-42::DEBUG::2013-11-26 13:27:37,707::vm::2118::vm.Vm::(_startUnderlyingVm) vmId=`af3da3f8-b598-4810-9845-f58f679a6d8e`::_ongoingCreations released
Thread-42::ERROR::2013-11-26 13:27:37,708::vm::2144::vm.Vm::(_startUnderlyingVm) vmId=`af3da3f8-b598-4810-9845-f58f679a6d8e`::The vm start process failed
Hosted engine is trying to create a VM 'HostedEngine' with a new uuid: af3da3f8-b598-4810-9845-f58f679a6d8e
The existing VM was started by the HA daemon at reboot after a partial / aborted setup.
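To make the conflict easier to spot in a log, the libvirt message quoted above can be parsed for the name and uuid of the domain that already exists and compared with the uuid the setup is trying to use. This is only a minimal diagnostic sketch: the function names are hypothetical and are not part of vdsm or hosted-engine-setup, and the pattern assumes the message wording shown in this log.

```python
import re

# Matches the libvirt message wording seen in the vdsm.log excerpt above.
# Assumption: other libvirt versions may word this message differently.
_EXISTS_RE = re.compile(
    r"domain '(?P<name>[^']+)' already exists "
    r"with uuid (?P<uuid>[0-9a-fA-F-]{36})"
)


def parse_domain_conflict(message):
    """Return (name, uuid) of the pre-existing domain, or None if no match."""
    match = _EXISTS_RE.search(message)
    if match is None:
        return None
    return match.group("name"), match.group("uuid")


def is_uuid_conflict(message, requested_uuid):
    """True if the message names an existing domain with a different uuid."""
    conflict = parse_domain_conflict(message)
    if conflict is None:
        return False
    return conflict[1].lower() != requested_uuid.lower()


# Hypothetical helper names; the message and uuids are taken from the log.
log_message = (
    "operation failed: domain 'HostedEngine' already exists "
    "with uuid 7c13d921-6adf-4737-94fa-e387b3de1c97"
)
print(parse_domain_conflict(log_message))
print(is_uuid_conflict(log_message, "af3da3f8-b598-4810-9845-f58f679a6d8e"))
```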
Pushed a first patch that prevents the HA daemons from being started merely by installing the rpm and rebooting.
Pushed a second patch that checks whether any VM is already running on the host, the same way we do for storage pools.
If we find any VM running, we can't deploy hosted engine on the system.
The system lists the UUIDs of the running VMs.
Since this condition should not be reached on a clean system, the user should investigate why the VM is running, so we don't shut it down; we just abort the deploy command.
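The behaviour described for the second patch can be sketched roughly as follows. This is not the merged patch: `DeployAborted` and `ensure_no_running_vms` are hypothetical names, and the way the running VMs are obtained is passed in as a callable because the real query against vdsm is not shown in this report.

```python
class DeployAborted(Exception):
    """Hypothetical exception used here to signal that deploy must stop."""


def ensure_no_running_vms(list_running_vm_uuids):
    """Abort if any VM is running; never shut the VM down.

    list_running_vm_uuids is an assumed callable returning an iterable of
    uuid strings for the VMs currently running on the host.
    """
    uuids = sorted(list_running_vm_uuids())
    if uuids:
        # List the uuids so the user can investigate why the VMs are running.
        raise DeployAborted(
            "Cannot deploy hosted engine, VMs are running on this host: "
            + ", ".join(uuids)
        )


# Example with a stand-in for the real query (uuid taken from the log above).
try:
    ensure_no_running_vms(lambda: ["7c13d921-6adf-4737-94fa-e387b3de1c97"])
except DeployAborted as exc:
    print(exc)
```

Raising instead of stopping the VM mirrors the decision above: the setup only reports the uuids and leaves the investigation to the user.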
The hosted-engine-setup side patches have been merged into the upstream master and 1.1 branches. Review is still pending on the hosted-engine-ha side.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.