Bug 1463653

Summary: [RFE] Provide a failure-is-ok command for decommissioning a host
Product: [oVirt] ovirt-engine
Reporter: Martin Sivák <msivak>
Component: BLL.Infra
Assignee: Miroslava Voglova <mvoglova>
Status: CLOSED CURRENTRELEASE
QA Contact: Nikolai Sednev <nsednev>
Severity: medium
Priority: high
Version: 4.1.3.3
CC: bugs, lsvaty, mavital, mgoldboi, mperina, msivak, nsednev, oourfali
Target Milestone: ovirt-4.2.0
Keywords: FutureFeature, Triaged
Target Release: 4.2.0
Flags: rule-engine: ovirt-4.2+
gklein: testing_plan_complete-
mgoldboi: planning_ack+
mperina: devel_ack+
mavital: testing_ack+
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Last Closed: 2017-12-20 10:53:15 UTC
Type: Bug
oVirt Team: Infra
Bug Blocks: 817363, 1321889, 1454308, 1607510

Description Martin Sivák 2017-06-21 12:24:57 UTC
Description of problem:

There are a couple of issues that can arise from removing a hosted engine host from ovirt-engine without undeploying it first (https://bugzilla.redhat.com/show_bug.cgi?id=1321889).

We would like to be able to execute a cleanup routine during the host remove action. The host remove flow should not wait for the cleanup action to succeed or fail; merely reporting the result to the user would be enough.


The other issue might arise if the "autostart VM without engine" feature is ever implemented (https://bugzilla.redhat.com/show_bug.cgi?id=817363 and https://bugzilla.redhat.com/show_bug.cgi?id=1325468#c4).

Idea:

Why don't we just provide a generic fireAndForgetAnsiblePlayForHost and write all the cleanup logic in the form of an extensible playbook (so the sysadmin can extend it)?
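
A minimal sketch of the failure-is-ok flow proposed above, written in Python purely for illustration. The names remove_host, RemovalReport and the inventory set are hypothetical and are not part of ovirt-engine; the cleanup callable stands in for whatever would run the playbook. The point is only that the cleanup result is recorded for the user while the host is removed regardless of that result.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class RemovalReport:
    """Hypothetical result object shown to the user after removal."""
    host: str
    removed: bool = False
    messages: List[str] = field(default_factory=list)


def remove_host(host: str,
                cleanup: Callable[[str], None],
                inventory: Set[str]) -> RemovalReport:
    """Remove `host` from `inventory`; a failing cleanup is reported, not fatal."""
    report = RemovalReport(host=host)
    try:
        cleanup(host)
        report.messages.append("cleanup succeeded")
    except Exception as exc:  # failure is ok: record it and carry on
        report.messages.append(f"cleanup failed: {exc}")
    # The removal itself never depends on the cleanup outcome.
    inventory.discard(host)
    report.removed = host not in inventory
    return report
```

Here the cleanup runs inline for simplicity; a real fire-and-forget variant would more likely start it asynchronously and report the result once it is known.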

Comment 1 Yaniv Kaul 2017-06-22 06:41:44 UTC
(In reply to Martin Sivák from comment #0)
> Description of problem:
> 
> There are couple of issues that can arise from removing a hosted engine host
> from the ovirt-engine without undeploying it first
> (https://bugzilla.redhat.com/show_bug.cgi?id=1321889).
> 
> We would like to be able to execute a cleanup routine during host remove
> action. The host remove flow should not wait for the success or failure of
> the cleanup action, merely reporting the result to the user would be enough.

But when we remove a host we are not communicating with it...? It's in maintenance mode.

Comment 2 Martin Perina 2017-06-22 10:22:29 UTC
We could add an Ansible role into [1] that performs the needed "host cleanup" and is executed during host removal. As mentioned above, we will remove the host even if the removal role is not executed successfully.

We are creating infrastructure to execute Ansible from the engine as part of BZ1462811, so if we are able to finish it successfully, we can use it for this task as well. So at the moment this is targeted to 4.2, but I am leaving dev-ack on ?, as finishing this in the 4.2 timeframe depends on the successful completion of BZ1462811 and on available resources.

Comment 6 Nikolai Sednev 2017-11-29 12:06:31 UTC
Please provide reproduction steps for this RFE.

Comment 7 Martin Perina 2017-11-29 12:22:10 UTC
All the details can be found at [1]. When you remove a host from the engine, this playbook is executed, and if an HE configuration is detected on the host, the following actions will be performed:

1. Stopping and disabling the ovirt-ha-agent and ovirt-ha-broker services
2. Renaming the HE config file to the same name with the suffix '.undeployed'

It should also be tested that the host is still removed successfully if this Ansible playbook fails, and that execution of this playbook does not affect non-HE hosts in any way.


[1] https://github.com/oVirt/ovirt-engine/blob/master/packaging/playbooks/ovirt-host-remove.yml
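
For readers who want the logic of the two actions above without opening the playbook, here is a rough Python sketch. It is not the playbook at [1] and makes no claim about how that playbook is written; the function names undeploy_hosted_engine and run_systemctl are hypothetical. It assumes the HE config file path is passed in by the caller and that systemctl is available on the host.

```python
import subprocess
from pathlib import Path
from typing import Callable

# Service names as listed in the comment above.
HE_SERVICES = ("ovirt-ha-agent", "ovirt-ha-broker")


def run_systemctl(action: str, service: str) -> None:
    """Run `systemctl <action> <service>`; check=False keeps a failure non-fatal."""
    subprocess.run(["systemctl", action, service], check=False)


def undeploy_hosted_engine(conf_path: str,
                           run: Callable[[str, str], None] = run_systemctl) -> bool:
    """Return True if an HE config was found and undeployed, False otherwise."""
    conf = Path(conf_path)
    if not conf.exists():
        # No HE configuration detected: a non-HE host is left untouched.
        return False
    for service in HE_SERVICES:
        run("stop", service)
        run("disable", service)
    # Keep the file, but under the same name with the '.undeployed' suffix.
    conf.rename(conf.with_name(conf.name + ".undeployed"))
    return True
```

The runner is injectable so the rename and the no-op path for non-HE hosts can be exercised without touching real services.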

Comment 8 Nikolai Sednev 2017-11-30 14:35:45 UTC
1. Deployed SHE over Gluster.
2. Added an NFS data storage domain.
3. Got the SHE storage domain and SHE-VM auto-imported.
4. Added an additional clean ha-host that did not have ovirt-hosted-engine-setup previously installed.
5. The additional ha-host became active as an ha-host.
6. Set the additional ha-host into maintenance and removed it using the UI.
7. The additional ha-host was removed with the ha-agent and ha-broker services turned off, and the HE config file was renamed to the same name with the suffix '.undeployed' (/etc/ovirt-hosted-engine/hosted-engine.conf.undeployed).
8. Repeated steps 4-5; hosted-engine.conf was created again, then I blocked ssh connectivity from the engine to the host.
9. Continued with step 6.
10. The additional ha-host was removed from the engine even though the Ansible playbook failed; the ha-agent and ha-broker services were not affected and kept running on the host, and /etc/ovirt-hosted-engine/hosted-engine.conf.undeployed was not created on the removed host, just as expected.

Tested on these components:
ovirt-hosted-engine-ha-2.2.0-0.0.master.20171128125909.20171128125907.gitfa5daa6.el7.centos.noarch
ovirt-hosted-engine-setup-2.2.0-0.0.master.20171129192644.git440040c.el7.centos.noarch
ovirt-engine-appliance-4.2-20171129.1.el7.centos.noarch

Moving this RFE to verified.

Comment 9 Sandro Bonazzola 2017-12-20 10:53:15 UTC
This bugzilla is included in the oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be resolved in the oVirt 4.2.0 release, published on Dec 20th 2017, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.