Bug 1463653 - [RFE] Provide a failure-is-ok command for decommissioning a host
Summary: [RFE] Provide a failure-is-ok command for decommissioning a host
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Infra
Version: 4.1.3.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ovirt-4.2.0
Target Release: 4.2.0
Assignee: Miroslava Voglova
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On:
Blocks: 817363 1321889 1454308 1607510
Reported: 2017-06-21 12:24 UTC by Martin Sivák
Modified: 2020-02-10 21:30 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-20 10:53:15 UTC
oVirt Team: Infra
Embargoed:
rule-engine: ovirt-4.2+
gklein: testing_plan_complete-
mgoldboi: planning_ack+
mperina: devel_ack+
mavital: testing_ack+




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1613291 | 0 | medium | CLOSED | [text] log says ovirt-ha-agent is starting after HE undeploy but it's actually being disabled and stopped | 2021-02-22 00:41:40 UTC
oVirt gerrit | 83622 | 0 | master | MERGED | core: cleanup after host remove | 2021-01-19 02:23:26 UTC

Internal Links: 1613291

Description Martin Sivák 2017-06-21 12:24:57 UTC
Description of problem:

There are a couple of issues that can arise from removing a hosted engine host from the ovirt-engine without undeploying it first (https://bugzilla.redhat.com/show_bug.cgi?id=1321889).

We would like to be able to execute a cleanup routine during the host remove action. The host remove flow should not wait for the success or failure of the cleanup action; merely reporting the result to the user would be enough.


The other issue might arise if the "autostart VM without engine" feature is ever implemented (https://bugzilla.redhat.com/show_bug.cgi?id=817363 and https://bugzilla.redhat.com/show_bug.cgi?id=1325468#c4).

Idea:

Why don't we just provide a generic fireAndForgetAnsiblePlayForHost and write all the cleanup logic in the form of an extensible playbook (so the sysadmin can extend it)?
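
To illustrate the fire-and-forget idea, here is a minimal Python sketch, not the engine's actual implementation. The names fire_and_forget, remove_host, and the log list are hypothetical and exist only for this illustration. The cleanup command is passed in as an argument; in practice it might be an ansible-playbook invocation, but that is an assumption. The removal flow starts the cleanup, never waits for it, and only records the outcome.

```python
import subprocess
import sys
import threading
from typing import Callable, List, Sequence


def fire_and_forget(cmd: Sequence[str],
                    report: Callable[[bool, str], None]) -> threading.Thread:
    """Run cmd in a background thread and report the outcome.

    The caller is not blocked and never sees an exception: a failing or
    unlaunchable command is only reported through the callback.
    """
    def worker() -> None:
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True)
            report(proc.returncode == 0, proc.stderr.strip())
        except OSError as exc:  # e.g. the executable does not exist
            report(False, str(exc))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def remove_host(host: str, cleanup_cmd: Sequence[str],
                log: List[str]) -> threading.Thread:
    """Hypothetical removal flow: start cleanup, then remove regardless."""
    def report(ok: bool, err: str) -> None:
        log.append(f"cleanup of {host}: {'ok' if ok else 'failed'}")

    thread = fire_and_forget(cleanup_cmd, report)
    # The removal itself does not depend on the cleanup result.
    log.append(f"removed {host}")
    return thread
```

The thread is returned only so that a caller (or a test) can optionally join it; the removal flow itself does not.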

Comment 1 Yaniv Kaul 2017-06-22 06:41:44 UTC
(In reply to Martin Sivák from comment #0)
> Description of problem:
> 
> There are couple of issues that can arise from removing a hosted engine host
> from the ovirt-engine without undeploying it first
> (https://bugzilla.redhat.com/show_bug.cgi?id=1321889).
> 
> We would like to be able to execute a cleanup routine during host remove
> action. The host remove flow should not wait for the success or failure of
> the cleanup action, merely reporting the result to the user would be enough.

But when we remove a host we are not communicating with it...? It's in maintenance mode.

Comment 2 Martin Perina 2017-06-22 10:22:29 UTC
We could add an Ansible role to [1] that performs the needed "host cleanup" and is executed during removal of the host. As mentioned above, we will remove the host even if the removal role does not execute successfully.

We are creating infrastructure to execute Ansible from the engine as part of BZ1462811, so if we manage to finish it, we can use it for this task as well. So at the moment this is targeted to 4.2, but dev-ack stays on '?', because finishing this in the 4.2 timeframe depends on the successful completion of BZ1462811 and on available resources.

Comment 6 Nikolai Sednev 2017-11-29 12:06:31 UTC
Please provide reproduction steps for this RFE.

Comment 7 Martin Perina 2017-11-29 12:22:10 UTC
All the details can be found at [1]. When you remove a host from the engine, this playbook is executed, and if an HE configuration is detected on the host, the following actions are performed:

1. Stopping and disabling the ovirt-ha-agent and ovirt-ha-broker services
2. Renaming the HE config file to the same name with the suffix '.undeployed'

It should also be tested that the host is still removed successfully if this Ansible playbook fails, and that execution of the playbook does not affect non-HE hosts in any way.
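
The two actions above can be sketched in Python as follows. This is only an illustration of the logic, not the content of the playbook at [1]. The function name undeploy_hosted_engine and the stop_and_disable callback are hypothetical; how the services are actually stopped is left to the caller.

```python
import os
from typing import Callable, Iterable

# Service names taken from the description above.
HE_SERVICES = ("ovirt-ha-agent", "ovirt-ha-broker")


def undeploy_hosted_engine(conf_path: str,
                           stop_and_disable: Callable[[str], None],
                           services: Iterable[str] = HE_SERVICES) -> bool:
    """Clean up HE configuration if present.

    Returns True if an HE config file was found and cleaned up,
    False if the host has no HE config (nothing is touched then).
    """
    if not os.path.exists(conf_path):
        return False  # non-HE host: must not be affected in any way
    for service in services:
        stop_and_disable(service)  # hypothetical callback supplied by caller
    # Keep the old config around under the '.undeployed' suffix.
    os.rename(conf_path, conf_path + ".undeployed")
    return True
```

Returning early when no config file exists is what keeps non-HE hosts untouched in this sketch.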


[1] https://github.com/oVirt/ovirt-engine/blob/master/packaging/playbooks/ovirt-host-remove.yml

Comment 8 Nikolai Sednev 2017-11-30 14:35:45 UTC
1. Deployed SHE over Gluster.
2. Added an NFS data storage domain.
3. Got the SHE storage domain and SHE-VM auto-imported.
4. Added an additional clean ha-host without ovirt-hosted-engine-setup previously installed on it.
5. The additional ha-host became active as an ha-host.
6. Set the additional ha-host into maintenance and removed it using the UI.
7. The additional ha-host was removed, with the ha-agent and ha-broker services turned off and the HE config file renamed to the same name with the suffix '.undeployed' (/etc/ovirt-hosted-engine/hosted-engine.conf.undeployed).
8. Returned to steps 4-5; hosted-engine.conf was created again. Then I blocked SSH connectivity from the engine to the host.
9. Continued to step 6.
10. The additional ha-host was removed from the engine while the Ansible playbook failed; the ha-agent and ha-broker services were not affected and kept running on the host, and /etc/ovirt-hosted-engine/hosted-engine.conf.undeployed was not created on the removed host, just as expected.
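
The file-level part of the two expected outcomes (steps 7 and 10) could be checked with a small helper like the one below. This is a hypothetical sketch for illustration only: the function he_config_state and its return labels are not part of any oVirt tool, and the service state would have to be checked separately.

```python
import os


def he_config_state(conf_path: str) -> str:
    """Classify the HE config file state on a removed host.

    "undeployed": cleanup ran (step 7), only the '.undeployed' file exists.
    "untouched":  cleanup did not run (step 10), only the original exists.
    "unknown":    any other combination.
    """
    active = os.path.exists(conf_path)
    undeployed = os.path.exists(conf_path + ".undeployed")
    if undeployed and not active:
        return "undeployed"
    if active and not undeployed:
        return "untouched"
    return "unknown"
```

For the path used in this test, the call would be he_config_state("/etc/ovirt-hosted-engine/hosted-engine.conf").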

Tested on these components:
ovirt-hosted-engine-ha-2.2.0-0.0.master.20171128125909.20171128125907.gitfa5daa6.el7.centos.noarch
ovirt-hosted-engine-setup-2.2.0-0.0.master.20171129192644.git440040c.el7.centos.noarch
ovirt-engine-appliance-4.2-20171129.1.el7.centos.noarch

Moving this RFE to verified.

Comment 9 Sandro Bonazzola 2017-12-20 10:53:15 UTC
This bug is included in the oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be resolved in the oVirt 4.2.0 release, published on Dec 20th 2017, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

