Description of problem:
trying to redeploy fails, HA service didn't stop

Version-Release number of selected component (if applicable):
is24.2

How reproducible:
100

Steps to Reproduce:
1. run "hosted-engine --deploy" and fail it
2. rerun "hosted-engine --deploy" using the same NFS share

Actual results:
deploy fails

Expected results:
should work

Additional info:
(from vdsm logs)
Thread-53::ERROR::2013-11-26 13:27:48,742::BindingXMLRPC::1003::vds::(wrapper) unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/BindingXMLRPC.py", line 989, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/BindingXMLRPC.py", line 240, in vmSetTicket
    return vm.setTicket(password, ttl, existingConnAction, params)
  File "/usr/share/vdsm/API.py", line 592, in setTicket
    return v.setTicket(password, ttl, existingConnAction, params)
  File "/usr/share/vdsm/vm.py", line 4303, in setTicket
    graphics = _domParseStr(self._dom.XMLDesc(0)).childNodes[0]. \
AttributeError: 'NoneType' object has no attribute 'XMLDesc'
Created attachment 829235: logs
*** Bug 1034826 has been marked as a duplicate of this bug. ***
@Doron What should the setup do if there's an already defined VM on this machine with the same name? Stop it? Delete? What is the valid way to continue? Thanks.
Hi Alex, in this specific case there was an earlier error from libvirt, which did not find a VM because it was not running. So it shouldn't be an issue. Generally speaking, we should check whether there is a running VM. If we find one, we should ask the user for permission to kill it in order to proceed, and then stop it.
The relevant error in the attached vdsm.log is:

Thread-42::DEBUG::2013-11-26 13:27:37,707::libvirtconnection::108::libvirtconnection::(wrapper) Unknown libvirterror: ecode: 9 edom: 20 level: 2 message: operation failed: domain 'HostedEngine' already exists with uuid 7c13d921-6adf-4737-94fa-e387b3de1c97
Thread-42::DEBUG::2013-11-26 13:27:37,707::vm::2118::vm.Vm::(_startUnderlyingVm) vmId=`af3da3f8-b598-4810-9845-f58f679a6d8e`::_ongoingCreations released
Thread-42::ERROR::2013-11-26 13:27:37,708::vm::2144::vm.Vm::(_startUnderlyingVm) vmId=`af3da3f8-b598-4810-9845-f58f679a6d8e`::The vm start process failed

Hosted engine is trying to create a VM 'HostedEngine' with a new uuid: af3da3f8-b598-4810-9845-f58f679a6d8e
The VM had already been started by the HA daemon at reboot, after a partial / aborted setup.
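For anyone triaging similar logs, the conflict reported above can be picked out of vdsm.log mechanically. The following is only a minimal sketch: the helper name `find_domain_conflict` is hypothetical (it is not part of vdsm or hosted-engine-setup), and the pattern assumes the libvirt message keeps exactly the wording seen in this log.

```python
import re

# Pattern built from the single libvirt message quoted in this bug;
# the wording may differ in other libvirt versions (assumption).
_ALREADY_EXISTS = re.compile(
    r"domain '(?P<name>[^']+)' already exists with uuid "
    r"(?P<uuid>[0-9a-fA-F-]{36})"
)


def find_domain_conflict(log_line):
    """Return (name, uuid) if the line reports an already-defined
    domain, otherwise None. Hypothetical helper for log triage."""
    match = _ALREADY_EXISTS.search(log_line)
    if match is None:
        return None
    return match.group("name"), match.group("uuid")
```

Comparing the returned uuid with the vmId in the following `_startUnderlyingVm` lines shows whether setup is trying to create a second domain under the same name.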
Pushed a first patch that prevents the HA daemons from being started merely by installing the rpm and rebooting.
Pushed a second patch that checks whether any VM is already running on the host, the same way we do for storage pools. If any VM is running, hosted engine cannot be deployed on the system, and setup lists the uuids of the running VMs. Since this condition should not be reached on a clean system, the user should investigate why the VM is running, so we do not shut it down; we just abort the deploy command.
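The behaviour described above (abort and list the uuids, never stop the VM) can be sketched roughly as follows. This is not the merged patch: `DeployAborted`, `ensure_no_running_vms` and the `list_running_vm_uuids` callable are hypothetical names, and how the uuids are actually obtained from vdsm is left out.

```python
class DeployAborted(Exception):
    """Raised when the host is not in a clean state for deployment."""


def ensure_no_running_vms(list_running_vm_uuids):
    """Abort the deploy if any VM is running on this host.

    list_running_vm_uuids is a caller-supplied callable (hypothetical)
    returning the uuids of the VMs currently running on the host.
    Running VMs are reported, never stopped.
    """
    uuids = sorted(list_running_vm_uuids())
    if uuids:
        raise DeployAborted(
            "Cannot deploy hosted engine: the following VMs are "
            "running on this host: " + ", ".join(uuids)
        )
```

Passing the lookup in as a callable keeps the check independent of the way the VM list is queried, so it can be exercised without a running vdsm.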
The hosted-engine-setup side patches have been merged into the upstream master and 1.1 branches. The hosted-engine-ha side is pending review.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2014-0505.html