Bug 1463653 - [RFE] Provide a failure-is-ok command for decommissioning a host
Status: CLOSED CURRENTRELEASE
Product: ovirt-engine
Classification: oVirt
Component: BLL.Infra
Version: 4.1.3.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ovirt-4.2.0
Target Release: 4.2.0
Assigned To: Miroslava Voglova
QA Contact: Nikolai Sednev
Keywords: FutureFeature, Triaged
Depends On:
Blocks: 817363 autostart-w-engine 1321889 1454308
Reported: 2017-06-21 08:24 EDT by Martin Sivák
Modified: 2017-12-20 05:53 EST
CC List: 9 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-12-20 05:53:15 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt-4.2+
gklein: testing_plan_complete-
mgoldboi: planning_ack+
mperina: devel_ack+
mavital: testing_ack+


External Trackers
Tracker ID          Priority  Status  Summary                          Last Updated
oVirt gerrit 83622  master    MERGED  core: cleanup after host remove  2017-11-20 07:27 EST

Description Martin Sivák 2017-06-21 08:24:57 EDT
Description of problem:

There are a couple of issues that can arise from removing a hosted engine host from the ovirt-engine without undeploying it first (https://bugzilla.redhat.com/show_bug.cgi?id=1321889).

We would like to be able to execute a cleanup routine during the host remove action. The host remove flow should not wait for the cleanup action to succeed or fail; merely reporting the result to the user would be enough.


Another issue might arise if the autostart-VM-without-engine feature is ever implemented (https://bugzilla.redhat.com/show_bug.cgi?id=817363 and https://bugzilla.redhat.com/show_bug.cgi?id=1325468#c4).

Idea:

Why don't we just provide a generic fireAndForgetAnsiblePlayForHost and write all the cleanup logic in the form of an extensible playbook (so the sysadmin can extend it)?
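The fire-and-forget idea can be sketched roughly as follows. This is a minimal Python illustration, not engine code: `fire_and_forget_cleanup`, `remove_host`, the in-memory `inventory` dict and the `report` callback are all hypothetical names. The cleanup runs on a background thread, any exception it raises is caught and only reported, and the host is removed without waiting for the cleanup to finish.

```python
import concurrent.futures
from typing import Callable, Dict

# Hypothetical reporter signature: (host, succeeded, message) -> None
Reporter = Callable[[str, bool, str], None]

# Background workers for cleanup jobs; the pool size is an arbitrary choice.
_cleanup_pool = concurrent.futures.ThreadPoolExecutor(max_workers=2)


def fire_and_forget_cleanup(host: str,
                            cleanup: Callable[[str], None],
                            report: Reporter) -> concurrent.futures.Future:
    """Start `cleanup(host)` in the background and return immediately.

    Failure is acceptable: any exception is caught and only reported."""
    def run() -> None:
        try:
            cleanup(host)
        except Exception as exc:
            report(host, False, str(exc))
        else:
            report(host, True, "cleanup finished")

    return _cleanup_pool.submit(run)


def remove_host(inventory: Dict[str, dict], host: str,
                cleanup: Callable[[str], None],
                report: Reporter) -> concurrent.futures.Future:
    """Remove `host` from a hypothetical in-memory engine inventory.

    The cleanup is started first, but the removal never waits for it."""
    future = fire_and_forget_cleanup(host, cleanup, report)
    del inventory[host]
    return future
```

A real caller would not block on the returned future; it is exposed only so a test or a UI hook can observe the reported outcome. In the engine, `cleanup` would presumably trigger the host cleanup playbook over SSH; here it is any function that may raise.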
Comment 1 Yaniv Kaul 2017-06-22 02:41:44 EDT
(In reply to Martin Sivák from comment #0)
> Description of problem:
> 
> There are couple of issues that can arise from removing a hosted engine host
> from the ovirt-engine without undeploying it first
> (https://bugzilla.redhat.com/show_bug.cgi?id=1321889).
> 
> We would like to be able to execute a cleanup routine during host remove
> action. The host remove flow should not wait for the success or failure of
> the cleanup action, merely reporting the result to the user would be enough.

But when we remove a host, we are not communicating with it, are we? It's in maintenance mode.
Comment 2 Martin Perina 2017-06-22 06:22:29 EDT
We could add an Ansible role to [1] that performs the needed "host cleanup" and is executed during host removal. As mentioned above, we will remove the host even if the removal role does not execute successfully.

We are creating infrastructure to execute Ansible from the engine as part of BZ1462811; if we manage to finish it, we can use it for this task as well. So for now this is targeted to 4.2, but dev-ack is left at '?', because finishing it in the 4.2 timeframe depends on the successful completion of BZ1462811 and on available resources.
Comment 6 Nikolai Sednev 2017-11-29 07:06:31 EST
Please provide reproduction steps for this RFE.
Comment 7 Martin Perina 2017-11-29 07:22:10 EST
All the details can be found at [1]. When you remove a host from the engine, this playbook is executed, and if a hosted engine (HE) configuration is detected on the host, the following actions are performed:

1. Stop and disable the ovirt-ha-agent and ovirt-ha-broker services
2. Rename the HE config file to the same name with the suffix '.undeployed'
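The two cleanup actions can be illustrated with a rough Python equivalent. This is not the contents of ovirt-host-remove.yml: the function name `undeploy_hosted_engine`, the injectable `run` callback and the use of `systemctl disable --now` (which stops and disables a unit in a single call) are assumptions made for this sketch. The config path and service names come from this report. When the HE config file is absent, the function does nothing, so non-HE hosts are left untouched.

```python
import subprocess
from pathlib import Path
from typing import Callable, List

# Path and service names as mentioned in this bug; the rest is assumed.
HE_CONF = Path("/etc/ovirt-hosted-engine/hosted-engine.conf")
HA_SERVICES = ("ovirt-ha-agent", "ovirt-ha-broker")


def _run_command(cmd: List[str]) -> None:
    subprocess.run(cmd, check=True)


def undeploy_hosted_engine(conf: Path = HE_CONF,
                           run: Callable[[List[str]], None] = _run_command) -> List[List[str]]:
    """Undo the HE configuration of a host that is being removed.

    Returns the commands issued; returns [] and changes nothing when no
    HE config file is present (i.e. on a non-HE host)."""
    if not conf.exists():
        return []
    commands: List[List[str]] = []
    for service in HA_SERVICES:
        # Step 1: `disable --now` both stops and disables the unit.
        cmd = ["systemctl", "disable", "--now", service]
        run(cmd)
        commands.append(cmd)
    # Step 2: keep the file, renamed to hosted-engine.conf.undeployed.
    conf.rename(conf.with_name(conf.name + ".undeployed"))
    return commands
```

Injecting `run` keeps the sketch testable without touching real services; if a command fails, `subprocess.run(..., check=True)` raises, which a fire-and-forget caller would simply report.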

It should also be tested that the host is still removed successfully if this Ansible playbook fails, and that executing this playbook does not affect non-HE hosts in any way.


[1] https://github.com/oVirt/ovirt-engine/blob/master/packaging/playbooks/ovirt-host-remove.yml
Comment 8 Nikolai Sednev 2017-11-30 09:35:45 EST
1. Deployed SHE over Gluster.
2. Added an NFS data storage domain.
3. The SHE storage domain and the SHE-VM were auto-imported.
4. Added an additional clean ha-host without ovirt-hosted-engine-setup previously installed on it.
5. The additional ha-host became active as an ha-host.
6. Set the additional ha-host to maintenance and removed it using the UI.
7. The additional ha-host was removed, its ha-agent and ha-broker services were turned off, and the HE config file was renamed with the suffix '.undeployed' (/etc/ovirt-hosted-engine/hosted-engine.conf.undeployed).
8. Repeated steps 4-5; hosted-engine.conf was created again. Then I blocked SSH connectivity from the engine to the host.
9. Continued with step 6.
10. The additional ha-host was removed from the engine while the Ansible playbook failed: the ha-agent and ha-broker services were not affected and kept running on the host, and /etc/ovirt-hosted-engine/hosted-engine.conf.undeployed was not created on the removed host, just as expected.

Tested on these components:
ovirt-hosted-engine-ha-2.2.0-0.0.master.20171128125909.20171128125907.gitfa5daa6.el7.centos.noarch
ovirt-hosted-engine-setup-2.2.0-0.0.master.20171129192644.git440040c.el7.centos.noarch
ovirt-engine-appliance-4.2-20171129.1.el7.centos.noarch

Moving this RFE to verified.
Comment 9 Sandro Bonazzola 2017-12-20 05:53:15 EST
This bugzilla is included in oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be resolved in oVirt 4.2.0 release, published on Dec 20th 2017, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.
