1420404 – [RFE] add ability to warm restart a VM so that VM starts on the same host when it is run by Run Once and rebooted

Summary: [RFE] add ability to warm restart a VM so that VM starts on the same host whe...

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Virtualization Manager
Classification:	Red Hat
Component:	ovirt-engine
Sub Component:
Version:	4.0.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	ovirt-4.2.0
Target Release:	4.2.0
Assignee:	Tomas Jelinek
QA Contact:	Israel Pinto
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1519708
Depends On:
Blocks:	1560375

Reported:	2017-02-08 15:02 UTC by Igor Netkachev
Modified:	2022-03-13 14:11 UTC
CC List:	16 users
Fixed In Version:
Doc Type:	Enhancement
Doc Text:	The user can now decide whether a virtual machine should be warm or cold rebooted when started as "Run Once" in the Administration Portal. To facilitate this, the "Trap guest reboots" option has been renamed to "Rollback this configuration during reboots". This enables virtual machines to start on the same host when it is run as "Run Once" and then rebooted.
Clone Of:
Environment:
Last Closed:	2018-05-15 17:40:52 UTC
oVirt Team:	Virt
Target Upstream Version:
Embargoed:
Flags:	gklein: testing_plan_complete+

Attachments
engine, vdsm, qemu logs, ovirt agent logs (1.06 MB, application/x-gzip) 2017-09-12 12:52 UTC, Marian Jankular	no flags

Links
System	ID	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	RHV-43442	None	None	None	2021-09-09 12:10:02 UTC
Red Hat Product Errata	RHEA-2018:1488	None	None	None	2018-05-15 17:42:08 UTC
oVirt gerrit	82652	master	MERGED	core: store volatile run in DB	2020-12-09 11:22:05 UTC

Description Igor Netkachev 2017-02-08 15:02:23 UTC

Description of problem:

In RHV 4.0, VM restarts are handled differently than in earlier versions of RHEV: VMs now perform a so-called 'cold reboot', which can cause a VM to restart on a different host than the one it was running on before the reboot. This also applies to RunOnce stateless (and stateful) VMs that the user started with RunOnce on a specific hypervisor.
It's clear that this change in RHV 4.0 is meant to address an issue with the 'kickstart' use case, where the RunOnce boot order and/or temporarily attached installation media were preserved across reboots initiated from within the VM, so the VM kept booting from the installation media or PXE instead of from the hard disk. However, forcing the VM to do a 'cold reboot' causes issues for the 'VDS' use case, where the customer wants to be able to start VMs on any hypervisor of his choice and keep them running there, without pinning VMs to hosts just to guarantee that a VM restarts on the same hypervisor.


Version-Release number of selected component (if applicable):
RHV 4.0.0 and above


Actual results:
In a VDS scenario, a VM that was started on one hypervisor reboots on a different machine.


Expected results:
Either introduce a way to force the VM to do a 'warm reboot' as in pre-RHV 4.0 versions, or provide another way to force the VM to reboot on the same host where it was started.

Comment 3 Martin Tessun 2017-02-14 08:40:16 UTC

Hi Igor,

as far as I know, a "reboot" of a VM does not kill or terminate the qemu process; the reboot happens within that process.
As such, there are no changes in placement, since for RHV-M the VM is still "up and running" (unless the guest agent tells RHV-M otherwise).

So I just tested this on my 4.0 setup.

Scenario 1:
- Start a RHEL7 VM with guest-agent installed and working.
- Check the qemu process-ID and start time
- Reboot the VM (from inside the guest)
- Check the qemu process-ID and start time
- Reboot the VM from RHV-M
- Check the qemu process-ID and start time

This whole procedure showed that the qemu process ID (as well as the start time of the process) did not change.
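The PID and start-time check used in these scenarios can be scripted on the host. The following is a minimal Python sketch, assuming a Linux host where libvirt launches the VM's qemu process with a `-name guest=<vm name>,...` argument; `qemu_snapshot` and `classify_reboot` are hypothetical helpers, not part of RHV or vdsm. If both the PID and the start time (field 22 of `/proc/<pid>/stat`) are unchanged, the reboot happened inside the same qemu process (warm); otherwise the process was replaced (cold).

```python
import os
from typing import Optional, Tuple

Snapshot = Tuple[int, int]  # (pid, start time in clock ticks since host boot)


def qemu_snapshot(vm_name: str, proc_root: str = "/proc") -> Optional[Snapshot]:
    """Return (pid, start_ticks) of the qemu process running vm_name, or None.

    Assumption: libvirt passes '-name guest=<vm_name>,...' on the qemu command line.
    """
    wanted = b"guest=" + vm_name.encode()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/self
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                args = f.read().split(b"\0")
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        if b"qemu" not in os.path.basename(args[0]):
            continue
        if not any(arg.split(b",")[0] == wanted for arg in args):
            continue
        # The command name in parentheses may contain spaces, so split after
        # the last ')'. Field 3 (state) is then index 0 and field 22
        # (starttime) is index 19.
        fields = stat[stat.rindex(")") + 2:].split()
        return int(entry), int(fields[19])
    return None


def classify_reboot(before: Optional[Snapshot], after: Optional[Snapshot]) -> str:
    """'warm' if the same qemu process survived the reboot, 'cold' if it did not."""
    if before is None:
        return "unknown"  # the VM was not running on this host to begin with
    if after != before:
        return "cold"  # process gone (possibly restarted elsewhere) or replaced
    return "warm"
```

For example, take `before = qemu_snapshot("rhel-test")` on the host running the VM, reboot it from inside the guest or from RHV-M, and then call `classify_reboot(before, qemu_snapshot("rhel-test"))`. Comparing the start time as well as the PID guards against the kernel reusing a PID for a new process.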

So I tried another scenario with no guest-agent installed:

Scenario 2:
- Start a RHEL7 VM with no guest-agent running.
- Check the qemu process-ID and start time
- Reboot the VM (from inside the guest)
- Check the qemu process-ID and start time
- Reboot the VM from RHV-M
- Check the qemu process-ID and start time

This test behaved the same way as the previous one (the qemu PID did not change, as the process was never recycled).

So could you please let us know where you experienced the changed behaviour you are describing? I am not able to reproduce it with the above test scenarios.

Thanks!
Martin

Comment 4 Martin Tessun 2017-02-20 16:40:28 UTC

Hi Igor,

I just did some follow-up tests after discussing this:

Scenario 3:
- Start a RHEL7 VM RunOnce pinned to a specific host.
- Check the qemu process-ID and start time
- Reboot the VM (from inside the guest)
- Check the qemu process-ID and start time
- Reboot the VM from RHV-M
- Check the qemu process-ID and start time

This one showed that rebooting from within the VM still did not change any settings, but when rebooting from RHV-M itself, the VM was shut down and the "RunOnce" configuration was therefore no longer valid (the VM was started with its default configuration).

So I believe (after re-reading your scenario) that the Manager-initiated reboot of "RunOnce" VMs is the one you are talking about.

The change for this was done on purpose to avoid e.g. installation or reboot loops, as you already mentioned.

@Michal: Would it be possible and feasible for "RunOnce" VMs to just "clear" things like cloud-init disks (so mainly Boot Options, Linux Boot Options and Initial Run), but otherwise keep the other values intact (System, Host, Console and Custom Properties)?

Another approach would be to have a switch for the reboot to either keep the RunOnce settings or to drop them with the reboot.

After reading the BZs that Igor mentioned, a simple eject of the attached floppy or CD-ROM would also solve these issues, or am I missing something?

Personally, I believe a reboot should still be a reboot, and if something else is wanted, one should do a "powercycle". So maybe adding this as an alternative to the reboot would also be an option (so you don't need to issue a shutdown and a start separately).

Any further thoughts on this one?

Comment 9 Michal Skrivanek 2017-04-10 05:21:28 UTC

(In reply to Martin Tessun from comment #4)
> Hi Igor,
> 
> just did some follow up tests after discussing this:
> 
> Scenario 3:
> - Start a RHEL7 VM RunOnce pinned to a specific host.
> - Check the qemu process-ID and start time
> - Reboot the VM (from inside the guest)
> - Check the qemu process-ID and start time
> - Reboot the VM from RHV-M
> - Check the qemu process-ID and start time
> 
> This one showed that rebooting from within the VM still did not change any
> settings, but booting from RHV-M itself, the VM was shutdown and the
> "RunOnce" was therefore no longer valid (and the VM was started with default
> configuration).

How is it different from the previous tests? In the previous scenarios, did the same reboot from RHV-M not perform a cold reboot?

Comment 10 Martin Tessun 2017-04-10 09:55:09 UTC

(In reply to Michal Skrivanek from comment #9)
> How is it different from previous test? In previous Scenarios the same
> reboot from RHV-M did not perform cold reboot?

Correct. Previously (RHV 3.x), if you selected "Reboot" in the WebUI, it did a warm reboot, so no machine settings changed.

So having a way to choose between warm and cold reboots would be great to mimic the old behaviour again (and would help customers migrate their automated setups to RHV 4.x more easily).

Comment 11 Michal Skrivanek 2017-04-10 10:09:42 UTC

(In reply to Martin Tessun from comment #10)
> Correct. Previously (RHV 3.x), if you selected "Reboot" in the WebUI it did
> a warm reboot, so no machine settings did change. 

Are you sure? It still does a warm reboot; 4.0 only performs a cold reboot when the warm one fails or times out. This was implemented in bug 751854 / bug 1054070.

> So having a differentiator for warm vs. cold reboots would be great to mimic
> the old behaviour again (and helping customers to easier migrate their
> automated setups to RHV 4.x)

There are a couple of abandoned patches in the bugs above, but the change did not happen back then due to a lack of consensus on the behavior.
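The 4.0 behavior described above (attempt a warm reboot, fall back to a cold one if the guest does not react in time) can be summarized in a short Python sketch. The callables `request_warm_reboot`, `wait_for_guest_reboot` and `cold_reboot` are hypothetical stand-ins for whatever the engine and vdsm actually do; this is an illustration of the control flow, not the ovirt-engine code from bug 751854 / bug 1054070.

```python
from typing import Callable


def reboot_vm(
    request_warm_reboot: Callable[[], bool],
    wait_for_guest_reboot: Callable[[float], bool],
    cold_reboot: Callable[[], None],
    timeout_s: float = 60.0,
) -> str:
    """Try a guest-cooperative (warm) reboot first, falling back to a cold one.

    request_warm_reboot: hypothetical; asks the guest to reboot (e.g. via ACPI
        or the guest agent) and returns False if the request was not delivered.
    wait_for_guest_reboot: hypothetical; True if the guest rebooted within timeout_s.
    cold_reboot: hypothetical; stops the qemu process and starts the VM again,
        so a new process is created and the scheduler may choose a different host.
    """
    if request_warm_reboot() and wait_for_guest_reboot(timeout_s):
        return "warm"  # same qemu process on the same host
    cold_reboot()
    return "cold"
```

The timeout is what separates the two outcomes: a guest without ACPI support or a working guest agent never acknowledges the request, so it always ends up with a cold reboot, which matches the advice in comment 14 to check those two components.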

Comment 12 Martin Tessun 2017-06-08 15:48:40 UTC

(In reply to Michal Skrivanek from comment #11)
> are you sure? It still does warm reboot, only in 4.0 it performs a cold
> reboot when the warm one fails/times out. This was implemented in bug 751854
> / bug 1054070.

In 4.1 it does work this way, so I suggest closing this as CURRENTRELEASE.
Please reopen if it doesn't work this way for you.

Comment 13 Marian Jankular 2017-08-14 09:47:34 UTC

Hi, 

I have tested this with:

rhevm-4.1.4.2-0.1.el7.noarch
vdsm-4.19.24-1.el7ev.x86_64

My results are:

- Start a RHEL7 VM RunOnce pinned to a specific host.
- Check the qemu process-ID and start time
- Reboot the VM (from inside the guest)
- Check the qemu process-ID and start time

RESULT - warm reboot, VM process PID remains, "run once" mode remains


- Reboot the VM from RHV-M
- Check the qemu process-ID and start time

RESULT - cold reboot - VM was restarted on the other node without "run once" mode

Thus reopening the RFE

Comment 14 Michal Skrivanek 2017-08-16 11:29:01 UTC

(In reply to Marian Jankular from comment #13)
> Hi, 
> 
> i have tested this in 
> 
> rhevm-4.1.4.2-0.1.el7.noarch
> vdsm-4.19.24-1.el7ev.x86_64
> 
> my results are 
> 
> - Start a RHEL7 VM RunOnce pinned to a specific host.
> - Check the qemu process-ID and start time
> - Reboot the VM (from inside the guest)
> - Check the qemu process-ID and start time
> 
> RESULT - warm reboot, vm process  pid remains, "run once" mode remains
> 
> 
> - Reboot the VM from RHV-M
> - Check the qemu process-ID and start time
> 
> RESULT - cold reboot - vm was restarted on the other node without "run once
> mode"
> 
> Thus reopening the RFE

Works for me. Please add logs and more details. Specifically, check that the guest OS reacts to the request correctly (ACPI enabled and/or ovirt-guest-agent running).

Comment 15 Michal Skrivanek 2017-09-07 12:44:14 UTC

ping

Comment 16 Marian Jankular 2017-09-12 12:52:13 UTC

Created attachment 1324870
engine, vdsm, qemu logs, ovirt agent logs

               vm_guid                |  vm_name  
--------------------------------------+-----------
 20cbb3fc-b6e4-4601-9784-3df77789d41e | rhel-test


I did a new test with RHEL 7.4 (last time it was CentOS, so no agents were installed).

Current results:

- Start a RHEL7 VM RunOnce pinned to a specific host.
- Check the qemu process-ID and start time
- Reboot the VM (from inside the guest)
- Check the qemu process-ID and start time

RESULT - warm reboot, VM process PID remains, "run once" mode remains


- Reboot the VM from RHV-M
- Check the qemu process-ID and start time

RESULT - cold reboot - VM was restarted on the same node without "run once" mode


Attaching the logs.

Comment 17 Michal Skrivanek 2017-09-12 13:34:05 UTC

Thanks. Next time, please narrow down the occurrence (VM name, time frame).

We may want to improve logging, but for RunOnce the behavior is to perform a cold reboot. If it doesn't reproduce with regular runs, then this is not a bug.

Comment 18 Marian Jankular 2017-09-20 10:06:12 UTC

vm: rhel-test

1st reboot 2017-09-12 13:13:06 - 2017-09-12 13:14:09
2nd reboot 2017-09-12 14:32:10 - 2017-09-12 14:32:49

Comment 23 Tomas Jelinek 2017-10-10 07:35:08 UTC

I'm sorry, I had forgotten about a feature called "volatile run".
What it does is this:
- in the run once dialog there is a new option called "Trap guest reboots"
- in API it is called "volatile"
- by default it is false (both in API and in the UI)
- if it is false, the guest reboot will not trigger cold reboots
- if it is true, it will

So basically the only non-configurable option here is the reboot from the webadmin, which is always cold.

I propose persisting this "volatile run" option in the DB so that it applies to both warm and cold reboots.

Comment 24 Tomas Jelinek 2017-10-11 09:09:51 UTC

To summarize this BZ and an offline discussion with Martin:
- Everything here is about VMs running as run once.
- in 4.1: 
  - the reboot from Web-UI/REST is always cold
  - the reboot from inside of guest is always warm

- in 4.2.alpha: 
  - the reboot from Web-UI/REST is always cold
  - the reboot from inside the guest is configurable in run once dialog/REST. The option is called "Trap guest reboots".

- the proposed patch (https://gerrit.ovirt.org/#/c/82652/) renames the option to "Preserve this configuration during reboots" and makes both the reboots from inside the guest and from Web-UI/REST configurable by the same option. It is easy to merge this patch to master, but it depends on some code from 4.2, which makes it not so easy to backport.

Martin has proposed merging https://gerrit.ovirt.org/#/c/82652/ to master (getting it into 4.2) and not backporting it. Is that acceptable?
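To make the summary above easier to compare, here is a hedged Python sketch that encodes it as a lookup function. `Origin` and `run_once_reboot_kind` are hypothetical names; `volatile` mirrors the REST API flag named in comment 23 ("Trap guest reboots" in the run once dialog). For the "proposed" row I assume, from comments 23 and 24 and the Doc Text, that once the flag is persisted an unset flag yields a warm reboot from either origin. The sketch ignores the warm-to-cold fallback on timeout mentioned in comment 11.

```python
from enum import Enum


class Origin(Enum):
    GUEST = "reboot from inside the guest"
    ENGINE = "reboot from Web-UI/REST"


def run_once_reboot_kind(behavior: str, origin: Origin, volatile: bool = False) -> str:
    """Return "warm" or "cold" for a reboot of a VM started with Run Once.

    behavior: "4.1", "4.2.alpha" or "proposed" (gerrit change 82652).
    volatile: the Run Once "volatile" flag; it defaults to False (comment 23).
    """
    if behavior == "4.1":
        # Guest-initiated reboots stay warm; engine-initiated ones are always cold.
        return "warm" if origin is Origin.GUEST else "cold"
    if behavior == "4.2.alpha":
        # Only the guest-initiated reboot honors the flag.
        if origin is Origin.ENGINE:
            return "cold"
        return "cold" if volatile else "warm"
    if behavior == "proposed":
        # Assumption: the persisted flag governs both kinds of reboot.
        return "cold" if volatile else "warm"
    raise ValueError(f"unknown behavior: {behavior!r}")
```

The only row where the origin of the reboot stops mattering is the proposed one, which is the point of the RFE: the user, not the reboot path, decides whether the Run Once configuration (and therefore the host) survives.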

Comment 33 Israel Pinto 2017-11-29 15:08:42 UTC

Verified with:
Version 4.2.0-0.5.master.el7

Steps:
Polarion test case	RHEVM3/workitem?id=RHEVM-24361
Polarion test case	RHEVM3/workitem?id=RHEVM-24251
Polarion test case	RHEVM3/workitem?id=RHEVM-23495

Results:
PASS

Comment 34 Michal Skrivanek 2018-01-02 12:15:48 UTC

*** Bug 1519708 has been marked as a duplicate of this bug. ***

Comment 35 Israel Pinto 2018-05-08 12:56:18 UTC

RUN:
https://polarion.engineering.redhat.com/polarion/#/project/RHEVM3/testrun?id=12121&tab=records&result=passed

Comment 38 errata-xmlrpc 2018-05-15 17:40:52 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:1488

Comment 39 Franta Kust 2019-05-16 13:07:59 UTC

BZ<2>Jira Resync
