Red Hat Bugzilla – Bug 1463653
[RFE] Provide a failure-is-ok command for decommissioning a host
Last modified: 2017-12-20 05:53:15 EST
Description of problem:
There are a couple of issues that can arise from removing a hosted engine host from the ovirt-engine without undeploying it first (https://bugzilla.redhat.com/show_bug.cgi?id=1321889).
We would like to be able to execute a cleanup routine during the host remove action. The host remove flow should not wait for the success or failure of the cleanup action; merely reporting the result to the user would be enough.
The other issue might arise if the "autostart VM without engine" feature is ever implemented (https://bugzilla.redhat.com/show_bug.cgi?id=817363 and https://bugzilla.redhat.com/show_bug.cgi?id=1325468#c4).
Why don't we just provide a generic fireAndForgetAnsiblePlayForHost and write all the cleanup logic in the form of an extensible playbook (so the sysadmin can extend it)?
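To illustrate the fire-and-forget flow requested above, here is a minimal Python sketch. It is not engine code: remove_host, the hosts set and the cleanup callable are hypothetical stand-ins, and a thread pool replaces whatever mechanism the engine would actually use to run an Ansible play. The point is only that removal proceeds immediately and the cleanup outcome is merely reported.

```python
import logging
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Set

log = logging.getLogger("host_remove")

# Hypothetical background runner; stands in for the engine's Ansible executor.
_executor = ThreadPoolExecutor(max_workers=1)


def remove_host(hosts: Set[str], host: str,
                cleanup: Callable[[str], None]) -> Future:
    """Start the cleanup in the background, then remove the host at once."""
    future = _executor.submit(cleanup, host)

    def report(done: Future) -> None:
        # Only report the outcome; the removal never depends on it.
        error = done.exception()
        if error is None:
            log.info("cleanup of %s succeeded", host)
        else:
            log.warning("cleanup of %s failed: %s", host, error)

    future.add_done_callback(report)
    hosts.discard(host)  # removal does not wait for the cleanup result
    return future
```

The returned future is only there so a caller (or a test) can inspect the cleanup result later; the remove flow itself never blocks on it.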
(In reply to Martin Sivák from comment #0)
> Description of problem:
> There are couple of issues that can arise from removing a hosted engine host
> from the ovirt-engine without undeploying it first
> We would like to be able to execute a cleanup routine during host remove
> action. The host remove flow should not wait for the success or failure of
> the cleanup action, merely reporting the result to the user would be enough.
But when we remove a host we are not communicating with it...? It's in maintenance mode.
We could add an Ansible role into , which could perform the needed "host cleanup" and which would be executed during host removal. As mentioned above, we will remove the host even if the removal role is not executed successfully.
We are creating infrastructure to execute Ansible from the engine as part of BZ1462811, so if we manage to finish it, we can use it for this task as well. So at the moment this is targeted to 4.2, but dev-ack is left on ? because finishing this in the 4.2 timeframe depends on the successful completion of BZ1462811 and on available resources.
Please provide reproduction steps for this RFE.
All the details can be found at . When you remove a host from the engine, this playbook is executed, and if HE configuration is detected on the host, the following actions are performed:
1. Stopping and disabling the ovirt-ha-agent and ovirt-ha-broker services
2. Renaming the HE config file to the same name with the suffix '.undeployed'
It should also be tested that the host is still removed successfully if this Ansible playbook fails, and that executing this playbook does not affect non-HE hosts in any way.
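For readers who want the two cleanup actions spelled out, here is a rough Python equivalent. It is not the Ansible playbook itself: undeploy_hosted_engine and run_systemctl are hypothetical names, and the config path is taken from the verification steps in this bug. The systemctl runner is injectable so the logic can be exercised without touching real services.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence

HA_SERVICES = ("ovirt-ha-agent", "ovirt-ha-broker")
HE_CONF = Path("/etc/ovirt-hosted-engine/hosted-engine.conf")


def run_systemctl(args: Sequence[str]) -> None:
    """Run systemctl and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(["systemctl", *args], check=True)


def undeploy_hosted_engine(
    conf: Path = HE_CONF,
    systemctl: Callable[[Sequence[str]], None] = run_systemctl,
) -> bool:
    """Return True if HE configuration was found and cleaned up."""
    if not conf.exists():
        return False  # non-HE host: nothing is touched
    for service in HA_SERVICES:
        systemctl(["stop", service])
        systemctl(["disable", service])
    # Rename the config file to the same name with the '.undeployed' suffix.
    conf.rename(conf.with_name(conf.name + ".undeployed"))
    return True
```

On a host without the HE config file the function returns early, which corresponds to the requirement that non-HE hosts are not affected.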
1. Deployed SHE over Gluster.
2. Added an NFS data storage domain.
3. Got the SHE storage domain and SHE-VM auto-imported.
4. Added an additional clean ha-host without ovirt-hosted-engine-setup previously installed on it.
5. The additional ha-host became active as an ha-host.
6. Set the additional ha-host into maintenance and removed it using the UI.
7. The additional ha-host was removed with the ha-agent and ha-broker services turned off, and the HE config file was renamed to the same name with the suffix '.undeployed' (/etc/ovirt-hosted-engine/hosted-engine.conf.undeployed).
8. Repeated steps 4-5; hosted-engine.conf was created again. Then I blocked ssh connectivity from the engine to the host.
9. Continued to step 6.
10. The additional ha-host was removed from the engine even though the Ansible playbook failed. The ha-agent and ha-broker services were not affected and kept running on the host, and /etc/ovirt-hosted-engine/hosted-engine.conf.undeployed was not created on the removed host, just as expected.
Tested on these components:
Moving this RFE to verified.
This bugzilla is included in oVirt 4.2.0 release, published on Dec 20th 2017.
Since the problem described in this bug report should be resolved in oVirt 4.2.0 release, published on Dec 20th 2017, it has been closed with a resolution of CURRENT RELEASE.
If the solution does not work for you, please open a new bug report.