Bug 1034726

Summary: When re-running --deploy, ha services should be stopped to allow re-using existing storage
Product: Red Hat Enterprise Virtualization Manager
Reporter: Aharon Canan <acanan>
Component: ovirt-hosted-engine-setup
Assignee: Sandro Bonazzola <sbonazzo>
Status: CLOSED ERRATA
QA Contact: movciari
Severity: high
Docs Contact:
Priority: high
Version: 3.3.0
CC: aburden, adingman, dfediuck, didi, gpadgett, iheim, josh, mpavlik, oschreib, pbandark, pstehlik, rshutt, sbonazzo, scohen, talayan, ukar, wdaniel
Target Milestone: ---
Keywords: Triaged, ZStream
Target Release: 3.4.0
Hardware: x86_64
OS: Linux
Whiteboard: integration
Fixed In Version: ovirt-3.4.0-beta3
Doc Type: Bug Fix
Doc Text:
* Previously, the high-availability daemon was enabled by the rpm install and not stopped upon termination of a hosted-engine deployment. This meant that if the hosted engine was deployed, but was aborted or failed after having created the engine virtual machine, the hosted engine could not be redeployed as it conflicted with the virtual machine already started by the high availability daemon. Now, the high availability daemon is enabled by hosted-engine deployment, and the hosted engine checks for an existing virtual machine running on the host. Redeployment of the hosted engine no longer fails due to the presence of a virtual machine created during a previous deployment.
Story Points: ---
Clone Of:
: 1066373 (view as bug list)
Environment:
Last Closed: 2014-06-09 14:47:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1066373, 1078909, 1142926
Attachments:
Description Flags
logs none

Description Aharon Canan 2013-11-26 12:03:22 UTC
Description of problem:
Trying to redeploy fails; the HA service did not stop.

Version-Release number of selected component (if applicable):
is24.2

How reproducible:
100%

Steps to Reproduce:
1. Run "hosted-engine --deploy" and make it fail.
2. Rerun "hosted-engine --deploy" using the same NFS share.

Actual results:
The deployment fails.

Expected results:
The redeployment should succeed.

Additional info: (from vdsm logs)
Thread-53::ERROR::2013-11-26 13:27:48,742::BindingXMLRPC::1003::vds::(wrapper) unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/BindingXMLRPC.py", line 989, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/BindingXMLRPC.py", line 240, in vmSetTicket
    return vm.setTicket(password, ttl, existingConnAction, params)
  File "/usr/share/vdsm/API.py", line 592, in setTicket
    return v.setTicket(password, ttl, existingConnAction, params)
  File "/usr/share/vdsm/vm.py", line 4303, in setTicket
    graphics = _domParseStr(self._dom.XMLDesc(0)).childNodes[0]. \
AttributeError: 'NoneType' object has no attribute 'XMLDesc'

Comment 1 Aharon Canan 2013-11-26 12:07:21 UTC
Created attachment 829235 [details]
logs

Comment 2 Sandro Bonazzola 2013-11-26 16:05:58 UTC
*** Bug 1034826 has been marked as a duplicate of this bug. ***

Comment 3 Alex Lourie 2013-11-27 12:13:45 UTC
@Doron

What should the setup do if there's an already defined VM on this machine with the same name? Stop it? Delete?

What is the valid way to continue?

Thanks.

Comment 4 Doron Fediuck 2013-11-28 08:38:15 UTC
Hi Alex,
in this specific case there was an earlier error from libvirt which did not find a VM, since it was not running. So it shouldn't be an issue.

Generally speaking, we should check whether there is a running VM. If we find one, we should ask the user for permission to kill it in order to proceed, and then stop it.

Comment 12 Sandro Bonazzola 2014-02-14 11:07:49 UTC
The relevant error in the attached vdsm.log is:

Thread-42::DEBUG::2013-11-26 13:27:37,707::libvirtconnection::108::libvirtconnection::(wrapper) Unknown libvirterror: ecode: 9 edom: 20 level: 2 message: operation failed: domain 'HostedEngine' already exists with uuid 7c13d921-6adf-4737-94fa-e387b3de1c97
Thread-42::DEBUG::2013-11-26 13:27:37,707::vm::2118::vm.Vm::(_startUnderlyingVm) vmId=`af3da3f8-b598-4810-9845-f58f679a6d8e`::_ongoingCreations released
Thread-42::ERROR::2013-11-26 13:27:37,708::vm::2144::vm.Vm::(_startUnderlyingVm) vmId=`af3da3f8-b598-4810-9845-f58f679a6d8e`::The vm start process failed

Hosted engine is trying to create a VM 'HostedEngine' with a new uuid: af3da3f8-b598-4810-9845-f58f679a6d8e

The existing VM was started by the HA daemon at reboot, after a partial or aborted setup.
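
For anyone triaging a similar failure, the conflict can be spotted by scanning vdsm.log for the libvirt message quoted above. The following is a minimal sketch, not part of any patch on this bug; the function name and the regular expression are illustrative assumptions based only on the wording of the log line in this comment, and other libvirt versions may phrase the message differently.

```python
import re

# Pattern for the libvirt "already exists" message as it appears in the
# vdsm.log excerpt in this comment (an assumption; wording may vary).
DOMAIN_EXISTS_RE = re.compile(
    r"domain '(?P<name>[^']+)' already exists with uuid "
    r"(?P<uuid>[0-9a-fA-F-]{36})"
)


def find_conflicting_domains(log_lines):
    """Return (name, uuid) pairs for every 'already exists' error found."""
    conflicts = []
    for line in log_lines:
        match = DOMAIN_EXISTS_RE.search(line)
        if match:
            conflicts.append((match.group("name"), match.group("uuid")))
    return conflicts


if __name__ == "__main__":
    sample = (
        "Thread-42::DEBUG::2013-11-26 13:27:37,707::libvirtconnection::108::"
        "libvirtconnection::(wrapper) Unknown libvirterror: ecode: 9 edom: 20 "
        "level: 2 message: operation failed: domain 'HostedEngine' already "
        "exists with uuid 7c13d921-6adf-4737-94fa-e387b3de1c97"
    )
    for name, uuid in find_conflicting_domains([sample]):
        print(name, uuid)
```

The UUID reported by libvirt is that of the already-defined domain, which differs from the new vmId that the setup tried to create.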

Comment 13 Sandro Bonazzola 2014-02-14 12:07:22 UTC
Pushed a first patch that prevents the HA daemons from being started merely by installing the RPM and rebooting.

Comment 14 Sandro Bonazzola 2014-02-14 12:25:48 UTC
Pushed a second patch that checks whether any VM is already running on the host, the same way we do for storage pools.
If we find any running VM, we can't deploy hosted engine on the system.
The system lists the UUIDs of the running VMs.
Since this condition should not be reached on a clean system, the user should investigate why the VM is running. For that reason we don't shut the VM down; we just abort the deploy command.
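
The behavior described in this comment (abort, list the UUIDs, never shut anything down) can be sketched roughly as follows. This is an illustration only and not the merged patch: `RunningVmsError`, `check_no_running_vms`, and the `list_running_vm_uuids` callable are hypothetical names, and the callable stands in for whatever query the real setup code issues against the host.

```python
class RunningVmsError(RuntimeError):
    """Raised when VMs are already running on the host (hypothetical name)."""


def check_no_running_vms(list_running_vm_uuids):
    """Abort if any VM is running; never shut a VM down.

    `list_running_vm_uuids` is a caller-supplied callable returning an
    iterable of UUID strings. It is an assumption of this sketch, not a
    real vdsm or hosted-engine API.
    """
    uuids = sorted(list_running_vm_uuids())
    if uuids:
        # Report the UUIDs so the user can investigate, then abort.
        raise RunningVmsError(
            "Cannot deploy hosted engine: the following VMs are running "
            "on this host: " + ", ".join(uuids)
        )


if __name__ == "__main__":
    # Stubbed host state: one VM left over from an earlier attempt.
    leftover = ["7c13d921-6adf-4737-94fa-e387b3de1c97"]
    try:
        check_no_running_vms(lambda: leftover)
    except RunningVmsError as err:
        print(err)
```

Passing the lister in as a callable keeps the check itself free of any host dependency, so the abort logic can be exercised without a hypervisor.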

Comment 15 Sandro Bonazzola 2014-02-17 11:02:17 UTC
The hosted-engine-setup patches have been merged into the upstream master and 1.1 branches. The hosted-engine-ha patches are pending review.

Comment 22 errata-xmlrpc 2014-06-09 14:47:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0505.html