Description of problem: Since shutting down the system gracefully requires multiple actions (global maintenance, etc.), we need an Ansible module to allow automating the flow. The module should be reusable so users can also run it from the command line.
Please consider the hyperconverged use case as well, with regard to the Gluster daemons.
Should this be a module, or a role? Is the answer to this question part of the requirements for this bug, or an implementation detail? Should this be part of ovirt-hosted-engine-setup?

If it's a role, I'd expect it to be in the "ovirt-ansible" [1] project; if a module, probably part of the built-in Ansible oVirt modules [2].

[1] https://github.com/oVirt/ovirt-ansible
[2] https://www.ovirt.org/develop/release-management/features/infra/ansible_modules/
Since roles may include modules, I'd go with a module. The point is that it should be reusable for use cases such as ovirt system test / demo tools.
(In reply to Doron Fediuck from comment #3)
> Since roles may include modules, I'd go with a module.
> The point is that it should be reusable for use cases such as ovirt system
> test / demo tools.

I don't think this is a good fit for a module; modules are more or less low-level building blocks. This effort would probably require a role that handles stopping a complete oVirt/RHV instance (both standalone and HE setups):

Stop flow
1. Iterate over all existing VMs (excluding the HE VM on HE setups) and stop them
2. Iterate over all existing hosts (excluding the host the HE VM is running on)
3. Shut down the HE VM (if stopping an HE setup)
4. Shut down all existing hypervisor hosts
5. Shut down the engine host (if stopping a non-HE setup)

It would also be nice to provide a startup flow, although that would probably only be available when all hosts have power management configured (otherwise we have no way to start a host).
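As a rough illustration, the stop flow above could be sketched with the upstream oVirt Ansible modules (ovirt_auth, ovirt_vm_info, ovirt_vm); treat the exact parameters and the "HostedEngine" name check as assumptions rather than a final design:

```yaml
---
# Sketch of steps 1-2 of the proposed stop flow (non-final, parameter
# names assumed from the oVirt Ansible modules).
- name: Stop oVirt environment (sketch)
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Obtain an SSO token from the engine
      ovirt_auth:
        url: "{{ engine_url }}"
        username: "{{ engine_user }}"
        password: "{{ engine_password }}"

    - name: Collect all VMs
      ovirt_vm_info:
        auth: "{{ ovirt_auth }}"
      register: vm_info

    - name: Gracefully shut down every VM except the HE VM
      ovirt_vm:
        auth: "{{ ovirt_auth }}"
        name: "{{ item.name }}"
        state: stopped
      loop: "{{ vm_info.ovirt_vms }}"
      when: item.name != 'HostedEngine'
```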
We should handle starting the environment back up once the HE host or the bare-metal engine machine is up again.
We need to support both HE and non-HE automation.
Hi Simone, Can you please provide reproduction steps for this RFE?
(In reply to Martin Perina from comment #4)
> Stop flow
> 1. Iterate all existing VMs (excluding HE VM on HE setup) and stop them
> 2. Iterate all over existing hosts (excluding host where HE VM is running
> on)

This is more complex in the HC case: there, the engine VM is served over a Gluster volume that becomes read-only once fewer than 2 of the 3 hosts serving the volume are up, so the engine VM would suddenly become paused.

> 3. Shutdown HE VM (if stopping HE setup)

The engine currently refuses to stop its VM with:
  HostedEngine: Cannot shutdown VM. This VM is not managed by the engine.
We can do that with "hosted-engine --vm-shutdown" on the host where the engine VM is running. In the non-HE case we need root access to the engine host.

> 4. Shutdown all existing hypervisor hosts
> 5. Shutdown engine host (if stopping non-HE setup)
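Since the engine refuses to stop its own VM, the HE VM shutdown step would have to run on the host currently serving it; a minimal sketch, with the inventory group name assumed:

```yaml
---
# Sketch: stop the HE VM from the hypervisor running it.
# "he_host" is a hypothetical inventory group; the hosted-engine
# commands are the real CLI mentioned above.
- name: Shut down the hosted-engine VM
  hosts: he_host
  become: true
  tasks:
    - name: Enable global maintenance so the HA agents don't restart the VM
      command: hosted-engine --set-maintenance --mode=global

    - name: Ask the engine VM to shut down gracefully
      command: hosted-engine --vm-shutdown
```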
(In reply to Nikolai Sednev from comment #7)
> Hi Simone,
> Can you please provide reproduction steps for this RFE?

The idea is that you will need to run a small Ansible playbook that just triggers a ready-to-use Ansible role. The VM and host lists will come from a dynamic inventory backed by the engine.

Required role variables:
- engine API URL
- engine admin user
- engine admin password
- engine host FQDN (if not HE)

Other requirements on the host where the shutdown playbook is executed:
- root access to all the hypervisors (we could potentially reduce this to the HE host only in the non-HC case, and to the three HC hosts in the HC case, shutting down the other hypervisors via IPMI from the engine if configured)
- root access to the host where the engine is running (if not HE)

Yaniv, do you think this is acceptable?
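The required variables listed above could end up looking something like this in a vars file; the variable names and values here are purely illustrative assumptions:

```yaml
# Hypothetical vars for the proposed shutdown role:
engine_url: https://engine.example.com/ovirt-engine/api   # engine API URL
engine_user: admin@internal                               # engine admin user
engine_password: "{{ vault_engine_password }}"            # better vaulted than inline
engine_fqdn: engine.example.com                           # engine host FQDN (non-HE only)
```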
I would also consider planning the shutdown 2 minutes ahead instead of shutting down the VMs and hosts immediately, if possible, allowing whoever is logged in to save their work and exit.
Another option is to assume that the role is designed to run just on the engine host (a VM or not, without enforcing it in the role); the user can still execute it with ansible-playbook from another machine through Ansible itself. In this case we have root access to all the hosts by design, so we can push to the hosts a bash script that waits for the HE VM to be down and then triggers a shutdown of the host.
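The "push a script and detach" idea could look roughly like this; the --vm-status output parsing, the group name, and the async values are assumptions:

```yaml
---
# Sketch: schedule each HE host to power itself off once the engine VM
# is down (runs detached so the play can finish before the hosts go down).
- name: Queue host poweroff after the HE VM stops
  hosts: he_hosts        # hypothetical group of hosted-engine hosts
  become: true
  tasks:
    - name: Wait for the HE VM to be down, then shut down the host
      shell: |
        while hosted-engine --vm-status | grep -q '"vm": "up"'; do
          sleep 10
        done
        shutdown -h now
      async: 3600   # allow up to an hour
      poll: 0       # fire and forget
```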
(In reply to Sandro Bonazzola from comment #10)
> I would also consider to plan a shutdown in 2 minutes instead of shutting
> down immediately the vms and the hosts if possible, allowing whoever is
> logged in to save the work and exit.

We probably want two configurable values:
- N: a grace period before the VM shutdown (the shutdown starts in N minutes)
- M: a second timeout to power off VMs that haven't shut down correctly after N+M minutes
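The two timeouts could surface as role defaults; the names and values here are purely illustrative:

```yaml
# Hypothetical defaults for the two proposed knobs:
shutdown_grace_min: 2     # N: announce, then start shutting VMs down after N minutes
poweroff_timeout_min: 5   # M: force-poweroff VMs still up N+M minutes in
```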
(In reply to Simone Tiraboschi from comment #12)
> (In reply to Sandro Bonazzola from comment #10)
> > I would also consider to plan a shutdown in 2 minutes instead of shutting
> > down immediately the vms and the hosts if possible, allowing whoever is
> > logged in to save the work and exit.
>
> Probably we want two configurable values:
> - N: a grace period before the shutdown on VMs (the shutdown will start in N
> minutes)

What would happen during N? How would we announce the shutdown to users?

> - M: a second timeout to poweroff the VMs if they doesn't correctly shutdown
> after N+M minutes

I think we should let the libvirt graceful VM shutdown handle this.
Works for me just as expected.

1. Installed ovirt-ansible-shutdown-env.noarch 0:1.0.0-0.1.master.20180806102555.el7 on the engine VM.
2. Created test.yml with these contents:
"
---
- name: oVirt shutdown environment
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    engine_url: https://ovirt-engine.example.com/ovirt-engine/api
    engine_user: admin@internal
    engine_password: 123456
    engine_cafile: /etc/pki/ovirt-engine/ca.pem
  roles:
    - oVirt.shutdown-env
"
3. Configured one regular host with IPMI.
4. Had a pair of HA hosts without IPMI.
5. Had one guest VM with an OS up and running.
6. Had one VM without an OS and without a disk, booting from PXE.
7. Ran "ansible-playbook -i localhost, test.yml" from the engine VM.
8. Guest VMs went down.
9. The regular host was set into local maintenance.
10. The regular host was stopped via power management.
11. Global maintenance was enabled on the pair of HA hosts.
12. The HA host not running the engine VM was turned off.
13. The engine VM was shut down.
14. The last HA host was shut down.
15. When powering the environment back on, it starts in global maintenance, which has to be removed manually with "hosted-engine --set-maintenance --mode=none".
16. The engine VM was started by one of the HA hosts.

Tested on these components on the engine:
ovirt-engine-setup-4.2.5.2-0.1.el7ev.noarch
ovirt-ansible-shutdown-env-1.0.0-0.1.master.20180806102555.el7.noarch.rpm
Linux 3.10.0-862.9.1.el7.x86_64 #1 SMP Wed Jun 27 04:30:39 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)

Tested on these components on the hosts:
openvswitch-selinux-extra-policy-1.0-6.el7fdp.noarch.rpm
ovirt-hosted-engine-ha-2.2.16-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.25-1.el7ev.noarch
Linux 3.10.0-862.10.2.el7.x86_64 #1 SMP Wed Jul 4 09:41:38 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)
Sas, can you test this on RHHI environment too?
Moving to VERIFIED per https://bugzilla.redhat.com/show_bug.cgi?id=1578339#c15.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:1064
(In reply to Sahina Bose from comment #16)
> Sas, can you test this on RHHI environment too?

This has already been tested on an RHHI-V environment.