Description of problem: Since shutting down the system gracefully requires multiple actions (global maintenance, etc.), we need an Ansible module to allow automating the flow. The module should be reusable so users can also run it from the command line.
Please consider the hyperconverged use case as well, with regard to the Gluster daemons.
Should this be a module, or a role? Is the answer to this question part of the requirements for this bug, or an implementation detail? Should this be part of ovirt-hosted-engine-setup?

If it's a role, I'd expect it to be in the "ovirt-ansible" [1] project; if a module, probably part of the built-in Ansible oVirt modules [2].

[1] https://github.com/oVirt/ovirt-ansible
[2] https://www.ovirt.org/develop/release-management/features/infra/ansible_modules/
Since roles may include modules, I'd go with a module. The point is that it should be reusable for use cases such as ovirt system test / demo tools.
(In reply to Doron Fediuck from comment #3)
> Since roles may include modules, I'd go with a module.
> The point is that it should be reusable for use cases such as ovirt system
> test / demo tools.

I don't think this is a good fit for a module; modules are more or less low-level building blocks. This effort would probably require a role that handles stopping a complete oVirt/RHV instance (both standalone and HE setups):

Stop flow
1. Iterate over all existing VMs (excluding the HE VM on HE setups) and stop them
2. Iterate over all existing hosts (excluding the host the HE VM is running on)
3. Shut down the HE VM (if stopping an HE setup)
4. Shut down all existing hypervisor hosts
5. Shut down the engine host (if stopping a non-HE setup)

It would also be nice to provide a startup flow, although that would probably only be available when all hosts have power management configured (otherwise we have no way to start a host).
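As a rough illustration, the stop flow above could be sketched with the upstream oVirt Ansible modules (ovirt_auth, ovirt_vm_info, ovirt_vm); treat the exact parameters and the "HostedEngine" name check as assumptions rather than a final design:

```yaml
---
# Sketch of steps 1-2 of the proposed stop flow (non-final, parameter
# names assumed from the oVirt Ansible modules).
- name: Stop oVirt environment (sketch)
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Obtain an SSO token from the engine
      ovirt_auth:
        url: "{{ engine_url }}"
        username: "{{ engine_user }}"
        password: "{{ engine_password }}"

    - name: Collect all VMs
      ovirt_vm_info:
        auth: "{{ ovirt_auth }}"
      register: vm_info

    - name: Gracefully shut down every VM except the HE VM
      ovirt_vm:
        auth: "{{ ovirt_auth }}"
        name: "{{ item.name }}"
        state: stopped
      loop: "{{ vm_info.ovirt_vms }}"
      when: item.name != 'HostedEngine'
```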
We should handle starting the environment back up once the HE host or the bare-metal engine machine is up again.
We need to support both HE and non-HE automation.
Hi Simone, Can you please provide reproduction steps for this RFE?
(In reply to Martin Perina from comment #4)
> Stop flow
> 1. Iterate all existing VMs (excluding HE VM on HE setup) and stop them
> 2. Iterate all over existing hosts (excluding host where HE VM is running
> on)

This is more complex in the HC case: there, the engine VM is served over a Gluster volume that becomes read-only once fewer than 2 of the 3 hosts serving the volume are up, so the engine VM would suddenly become paused.

> 3. Shutdown HE VM (if stopping HE setup)

The engine currently refuses to stop its VM with:
  HostedEngine: Cannot shutdown VM. This VM is not managed by the engine.
We can do that with "hosted-engine --vm-shutdown" on the host where the engine VM is running. In the non-HE case we need root access to the engine host.

> 4. Shutdown all existing hypervisor hosts
> 5. Shutdown engine host (if stopping non-HE setup)
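Since the engine refuses to stop its own VM, the HE VM shutdown step would have to run on the host currently serving it; a minimal sketch, with the inventory group name assumed:

```yaml
---
# Sketch: stop the HE VM from the hypervisor running it.
# "he_host" is a hypothetical inventory group; the hosted-engine
# commands are the real CLI mentioned above.
- name: Shut down the hosted-engine VM
  hosts: he_host
  become: true
  tasks:
    - name: Enable global maintenance so the HA agents don't restart the VM
      command: hosted-engine --set-maintenance --mode=global

    - name: Ask the engine VM to shut down gracefully
      command: hosted-engine --vm-shutdown
```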
(In reply to Nikolai Sednev from comment #7)
> Hi Simone,
> Can you please provide reproduction steps for this RFE?

The idea is that you will need to run a small Ansible playbook that just triggers a ready-to-use Ansible role. The VM and host lists will come from a dynamic inventory backed by the engine.

Required role variables:
- engine API URL
- engine admin user
- engine admin password
- engine host FQDN (if not HE)

Other requirements on the host where the shutdown playbook is executed:
- root access to all the hypervisors (we could potentially reduce this to the HE host only in the non-HC case, and to the three HC hosts in the HC case, shutting down the other hypervisors via IPMI from the engine if configured)
- root access to the host where the engine is running (if not HE)

Yaniv, do you think this is acceptable?
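The required variables listed above could end up looking something like this in a vars file; the variable names and values here are purely illustrative assumptions:

```yaml
# Hypothetical vars for the proposed shutdown role:
engine_url: https://engine.example.com/ovirt-engine/api   # engine API URL
engine_user: admin@internal                               # engine admin user
engine_password: "{{ vault_engine_password }}"            # better vaulted than inline
engine_fqdn: engine.example.com                           # engine host FQDN (non-HE only)
```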
I would also consider planning the shutdown 2 minutes ahead instead of shutting down the VMs and hosts immediately, if possible, allowing whoever is logged in to save their work and exit.
Another option is to assume that the role is designed to run just on the engine host (a VM or not, without enforcing it in the role); the user can still execute it with ansible-playbook from another machine through Ansible itself. In this case we have root access to all the hosts by design, so we can push to the hosts a bash script that waits for the HE VM to be down and then triggers a shutdown of the host.
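The "push a script and detach" idea could look roughly like this; the --vm-status output parsing, the group name, and the async values are assumptions:

```yaml
---
# Sketch: schedule each HE host to power itself off once the engine VM
# is down (runs detached so the play can finish before the hosts go down).
- name: Queue host poweroff after the HE VM stops
  hosts: he_hosts        # hypothetical group of hosted-engine hosts
  become: true
  tasks:
    - name: Wait for the HE VM to be down, then shut down the host
      shell: |
        while hosted-engine --vm-status | grep -q '"vm": "up"'; do
          sleep 10
        done
        shutdown -h now
      async: 3600   # allow up to an hour
      poll: 0       # fire and forget
```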
(In reply to Sandro Bonazzola from comment #10)
> I would also consider to plan a shutdown in 2 minutes instead of shutting
> down immediately the vms and the hosts if possible, allowing whoever is
> logged in to save the work and exit.

We probably want two configurable values:
- N: a grace period before the VM shutdown (the shutdown starts in N minutes)
- M: a second timeout to power off VMs that haven't shut down correctly after N+M minutes
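The two timeouts could surface as role defaults; the names and values here are purely illustrative:

```yaml
# Hypothetical defaults for the two proposed knobs:
shutdown_grace_min: 2     # N: announce, then start shutting VMs down after N minutes
poweroff_timeout_min: 5   # M: force-poweroff VMs still up N+M minutes in
```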
(In reply to Simone Tiraboschi from comment #12)
> (In reply to Sandro Bonazzola from comment #10)
> > I would also consider to plan a shutdown in 2 minutes instead of shutting
> > down immediately the vms and the hosts if possible, allowing whoever is
> > logged in to save the work and exit.
>
> Probably we want two configurable values:
> - N: a grace period before the shutdown on VMs (the shutdown will start in N
> minutes)

What would happen during N? How would we announce the shutdown to users?

> - M: a second timeout to poweroff the VMs if they doesn't correctly shutdown
> after N+M minutes

I think we should let the libvirt graceful VM shutdown handle this.
Works for me just as expected.

1. Installed ovirt-ansible-shutdown-env.noarch 0:1.0.0-0.1.master.20180806102555.el7 on the engine VM.
2. Created test.yml with these contents:
"
---
- name: oVirt shutdown environment
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    engine_url: https://ovirt-engine.example.com/ovirt-engine/api
    engine_user: admin@internal
    engine_password: 123456
    engine_cafile: /etc/pki/ovirt-engine/ca.pem
  roles:
    - oVirt.shutdown-env
"
3. Configured one regular host with IPMI.
4. Had a pair of HA hosts without IPMI.
5. Had one guest VM with an OS up and running.
6. Had one VM without an OS and without a disk, booting from PXE.
7. Ran "ansible-playbook -i localhost, test.yml" from the engine VM.
8. Guest VMs went down.
9. The regular host was set into local maintenance.
10. The regular host was stopped via power management.
11. Global maintenance was enabled on the pair of HA hosts.
12. The HA host not running the engine VM was turned off.
13. The engine VM was shut down.
14. The last HA host was shut down.
15. When powering the environment back on, it starts in global maintenance, which has to be removed manually with "hosted-engine --set-maintenance --mode=none".
16. The engine VM was started by one of the HA hosts.

Tested on these components on the engine:
ovirt-engine-setup-4.2.5.2-0.1.el7ev.noarch
ovirt-ansible-shutdown-env-1.0.0-0.1.master.20180806102555.el7.noarch.rpm
Linux 3.10.0-862.9.1.el7.x86_64 #1 SMP Wed Jun 27 04:30:39 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)

Tested on these components on the hosts:
openvswitch-selinux-extra-policy-1.0-6.el7fdp.noarch.rpm
ovirt-hosted-engine-ha-2.2.16-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.25-1.el7ev.noarch
Linux 3.10.0-862.10.2.el7.x86_64 #1 SMP Wed Jul 4 09:41:38 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)
Sas, can you test this on RHHI environment too?
Moving to VERIFIED per https://bugzilla.redhat.com/show_bug.cgi?id=1578339#c15.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:1064
(In reply to Sahina Bose from comment #16)
> Sas, can you test this on RHHI environment too?

This has already been tested on an RHHI-V environment.