Bug 1631000 - reboot of a VM started from guest or from the Manager does not address pending changes (delta)
Summary: reboot of a VM started from guest or from the Manager does not address pending changes (delta)
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.2.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Nobody
QA Contact: Pavel Stehlik
URL:
Whiteboard:
Depends On:
Blocks: 1417161
 
Reported: 2018-09-19 16:51 UTC by Andrea Perotti
Modified: 2021-12-10 17:44 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-04 14:51:29 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:
lsvaty: testing_plan_complete-


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1519708 0 medium CLOSED [RFE] trigger virtual machine version upgrade from VM or automatically 2022-03-13 15:12:39 UTC
Red Hat Issue Tracker RHV-44284 0 None None None 2021-12-10 17:44:08 UTC

Internal Links: 1519708

Description Andrea Perotti 2018-09-19 16:51:20 UTC
Description of problem:
When making changes to a VM that cannot be applied at runtime, a reboot of the VM does not trigger the application of those changes to the VM.

Version-Release number of selected component (if applicable):
RHV 4.2.6

How reproducible:
always

Steps to Reproduce:
1. turn on a VM with guest agents installed
2. make a change to the VM configuration that requires a shutdown to be applied and produces the pending-changes (delta) sign, such as renaming the VM or switching the graphics subsystem from SPICE to VNC
3a. reboot the VM from the guest
3b. reboot the VM from the RHV Manager (a REST API sketch of steps 2 and 3b is shown below)
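
For reference, a minimal sketch of steps 2 and 3b through the REST API (the Manager FQDN, VM UUID, password and memory value are placeholders, and a memory increase with "apply later" is used here as just one example of a change that produces the delta):

  # queue a configuration change for the next run only (equivalent of "apply later")
  curl -k -u admin@internal:PASSWORD -H 'Content-Type: application/xml' \
       -X PUT 'https://manager.example.com/ovirt-engine/api/vms/VM_UUID?next_run=true' \
       -d '<vm><memory>4294967296</memory></vm>'

  # reboot the VM from the Manager side
  curl -k -u admin@internal:PASSWORD -H 'Content-Type: application/xml' \
       -X POST 'https://manager.example.com/ovirt-engine/api/vms/VM_UUID/reboot' \
       -d '<action/>'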

Actual results:
the pending-changes delta persists

Expected results:
RHV is smart enough to detect that a reboot is in progress and to transform the reboot into a shutdown followed by a power-on

Additional info:
This feature was declared as present in 4.2 but has not been demonstrated to be functional.
This behaviour is needed to reduce the maintenance effort on large clusters operated by more than one person, by taking advantage of guest-initiated reboots to apply pending changes to the VMs.

Comment 2 Michal Skrivanek 2018-09-20 10:51:53 UTC
Works for me correctly (VM without agent running, changed memory (&apply later), simulate Ctrl+Alt+Del in guest)

Please add more details/exact steps

It did not work for me when the VM was not in the Up state and I clicked Reboot in the GUI (the VM stayed down instead) - that's probably a bug:
2018-09-20 12:33:36,988+02 INFO  [org.ovirt.engine.core.bll.RebootVmCommand] (EE-ManagedThreadFactory-engine-Thread-102967) [8c643518-1762-4a21-8f84-84c3848a162d] Running command: RebootVmCommand internal: false. Entities affected :  ID: 3371c2fe-d473-44dd-a1ff-5e3dada567fe Type: VMAction group REBOOT_VM with role type USER
2018-09-20 12:33:36,994+02 INFO  [org.ovirt.engine.core.bll.RebootVmCommand] (EE-ManagedThreadFactory-engine-Thread-102967) [8c643518-1762-4a21-8f84-84c3848a162d] VM 'test' is performing cold reboot; run once: 'false', running as volatile: 'false', has next run configuration: 'true'
2018-09-20 12:33:37,134+02 INFO  [org.ovirt.engine.core.bll.ShutdownVmCommand] (EE-ManagedThreadFactory-engine-Thread-102967) [4ff4cb7e] Running command: ShutdownVmCommand internal: true. Entities affected :  ID: 3371c2fe-d473-44dd-a1ff-5e3dada567fe Type: VMAction group SHUT_DOWN_VM with role type USER
2018-09-20 12:33:37,138+02 INFO  [org.ovirt.engine.core.bll.ShutdownVmCommand] (EE-ManagedThreadFactory-engine-Thread-102967) [4ff4cb7e] Entered (VM 'test').
2018-09-20 12:33:37,139+02 INFO  [org.ovirt.engine.core.bll.ShutdownVmCommand] (EE-ManagedThreadFactory-engine-Thread-102967) [4ff4cb7e] Cannot shutdown VM 'test', status is not up. Stopping instead.
2018-09-20 12:33:37,261+02 INFO  [org.ovirt.engine.core.bll.StopVmCommand] (EE-ManagedThreadFactory-engine-Thread-102967) [329cef07] Running command: StopVmCommand internal: true. Entities affected :  ID: 3371c2fe-d473-44dd-a1ff-5e3dada567fe Type: VMAction group STOP_VM with role type USER
2018-09-20 12:33:37,305+02 INFO  [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (EE-ManagedThreadFactory-engine-Thread-102967) [329cef07] START, DestroyVmVDSCommand( DestroyVmVDSCommandParameters:{hostId='ebe13310-0fef-45ce-af53-6488b6120959', vmId='3371c2fe-d473-44dd-a1ff-5e3dada567fe', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='false'}), log id: 26b37f32
2018-09-20 12:33:37,308+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (EE-ManagedThreadFactory-engine-Thread-102967) [329cef07] START, DestroyVDSCommand(HostName = dev-25.rhev.lab.eng.brq.redhat.com, DestroyVmVDSCommandParameters:{hostId='ebe13310-0fef-45ce-af53-6488b6120959', vmId='3371c2fe-d473-44dd-a1ff-5e3dada567fe', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='false'}), log id: 3a4579e
2018-09-20 12:33:38,534+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (EE-ManagedThreadFactory-engine-Thread-102967) [329cef07] FINISH, DestroyVDSCommand, return: , log id: 3a4579e
2018-09-20 12:33:38,534+02 INFO  [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (EE-ManagedThreadFactory-engine-Thread-102967) [329cef07] FINISH, DestroyVmVDSCommand, return: , log id: 26b37f32
2018-09-20 12:33:38,538+02 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-2) [] VM '3371c2fe-d473-44dd-a1ff-5e3dada567fe' was reported as Down on VDS 'ebe13310-0fef-45ce-af53-6488b6120959'(dev-25.rhev.lab.eng.brq.redhat.com)
2018-09-20 12:33:38,539+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-2) [] START, DestroyVDSCommand(HostName = dev-25.rhev.lab.eng.brq.redhat.com, DestroyVmVDSCommandParameters:{hostId='ebe13310-0fef-45ce-af53-6488b6120959', vmId='3371c2fe-d473-44dd-a1ff-5e3dada567fe', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='true'}), log id: 4ab995bb
2018-09-20 12:33:38,542+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-2) [] Failed to destroy VM '3371c2fe-d473-44dd-a1ff-5e3dada567fe' because VM does not exist, ignoring
2018-09-20 12:33:38,542+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-2) [] FINISH, DestroyVDSCommand, return: , log id: 4ab995bb
2018-09-20 12:33:38,542+02 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-2) [] VM '3371c2fe-d473-44dd-a1ff-5e3dada567fe'(test) moved from 'PoweringUp' --> 'Down'
2018-09-20 12:33:38,553+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-102967) [329cef07] EVENT_ID: USER_STOPPED_VM_INSTEAD_OF_SHUTDOWN(76), VM test was powered off ungracefully by admin@internal-authz (Host: dev-25.rhev.lab.eng.brq.redhat.com).
2018-09-20 12:33:38,573+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-2) [] EVENT_ID: VM_DOWN(61), VM test is down.
2018-09-20 12:33:38,608+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-102967) [329cef07] EVENT_ID: USER_REBOOT_VM(157), User admin@internal-authz initiated reboot of VM test.
2018-09-20 12:33:38,614+02 INFO  [org.ovirt.engine.core.bll.ProcessDownVmCommand] (EE-ManagedThreadFactory-engine-Thread-102968) [55bbcd2e] Running command: ProcessDownVmCommand internal: true.

Comment 3 Andrea Perotti 2018-09-20 16:46:24 UTC
Two tests attempted:

a) change of the compatibility version of the cluster and the data center.
After this change, all the VMs had the delta and, under the Snapshots tab, there was a snapshot with the description "Next Run configuration snapshot".

b) rename of the VM

In both cases, a reboot initiated from the guest did not trigger the resolution of the delta.

Is it possible that only certain types of pending changes (deltas) are addressed by a reboot being transformed into stop + start?

Please let me know if you could benefit from logs as well.

Comment 4 Andrea Perotti 2018-09-20 17:34:37 UTC
Hi Michal, I confirm that for your scenario the delta has been addressed correctly.

The expectation is to have this behavior fully functional for every kind of delta, especially for compatibility version updates.

Comment 5 Michal Skrivanek 2018-09-24 10:15:45 UTC
(In reply to Andrea Perotti from comment #4)
> Hi Michal, I confirm that for your scenario the delta has been addressed
> correctly.
> 
> The expectation is to have this behavior fully functional for every kind of
> delta, especially for compatibility version updates.

so is there anything not working for you? The name was changed recently not to require a restart by popular demand, bug 1601514.

Comment 6 Andrea Perotti 2018-09-24 13:41:13 UTC
(In reply to Michal Skrivanek from comment #5)
> The name was changed recently not to require a restart by popular demand, bug 1601514.

And that was a great improvement. At the same time, a /!\ appears next to the VM when you change its name, as a reminder that while the name changes immediately, the underlying log and process names are still the old ones, and a shutdown/startup is required to have complete coherence (name of the VM + name in the logs + name of the qemu process).

Another scenario not working as expected is changing the compatibility version of the cluster and the data center. After the change, all the VMs had the /!\ and, under the Snapshots tab, there was a snapshot with the description "Next Run configuration snapshot".

In both situations, a reboot of the VM initiated from the guest or from the Manager did not address the pending changes.

I hope this better clarifies the current issue.

Comment 7 Michal Skrivanek 2018-09-26 11:30:32 UTC
(In reply to Andrea Perotti from comment #6)
> (In reply to Michal Skrivanek from comment #5)
> > The name was changed recently not to require a restart by popular demand, bug 1601514.
> 
> And that was a great improvement. At the same time, a /!\ appears next to
> the VM when you change its name, as a reminder that while the name changes
> immediately, the underlying log and process names are still the old ones, and
> a shutdown/startup is required to have complete coherence (name of the VM +
> name in the logs + name of the qemu process).

I can see arguments for both behaviors... it was highly requested not to require a restart, so it's not done automatically.
 
> Another scenario not working as expected is changing the compatibility
> version of the cluster and the data center. After the change, all the VMs had
> the /!\ and, under the Snapshots tab, there was a snapshot with the
> description "Next Run configuration snapshot".

Ok, I see. Yeah, changes in the Cluster are "fake" from the VM perspective; they don't change anything in the VM itself.
It should work, so if it doesn't, it is certainly a bug.

> In both situations, a reboot of the VM initiated from the guest or from the
> Manager did not address the pending changes.
> 
> I hope this better clarifies the current issue.

yes, thanks

Comment 8 Michal Skrivanek 2018-09-26 11:31:54 UTC
Note: the Cluster level upgrade is the only thing which should trigger that behavior - is it the change you've tried? And then, is it a reboot from the guest, from the webadmin UI, or both that doesn't trigger the cold reboot?

Comment 9 Andrea Perotti 2018-09-27 13:13:19 UTC
(In reply to Michal Skrivanek from comment #8)
> Note: the Cluster level upgrade is the only thing which should trigger that
> behavior - is it the change you've tried?

Yes, that is the situation where this has been experienced.

> And then, is it a reboot from the guest, from the webadmin UI, or both that doesn't trigger the cold reboot?

From the guest for sure; I cannot guarantee whether the webadmin UI had the problem as well.

Comment 10 Michal Skrivanek 2018-09-27 13:37:44 UTC
ok, then logs would be handy. If you change the cluster level then a guest-initiated reboot should trigger a cold reboot, i.e. shut down & power on instead.
We set that property on the VM so it can be checked easily in its XML (take a look at the running VM with virsh -r dumpxml <vm>, before and after the cluster upgrade).

You should see <ovirt-vm:destroy_on_reboot type="bool">False</ovirt-vm:destroy_on_reboot> changing into <ovirt-vm:destroy_on_reboot type="bool">True</ovirt-vm:destroy_on_reboot>.
And if that is set and you initiate a guest reboot, it should go through a cold reboot.
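
For example (a minimal sketch run on the host; the VM name "test" is just a placeholder):

  # read-only dump of the running domain XML, filtered for the cold-reboot flag
  virsh -r dumpxml test | grep destroy_on_reboot
  # before the cluster upgrade you should get:
  #   <ovirt-vm:destroy_on_reboot type="bool">False</ovirt-vm:destroy_on_reboot>
  # after the cluster upgrade (with a next-run configuration pending):
  #   <ovirt-vm:destroy_on_reboot type="bool">True</ovirt-vm:destroy_on_reboot>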

Comment 11 Andrea Perotti 2018-10-02 15:40:42 UTC
(In reply to Michal Skrivanek from comment #10)
> If you change the cluster level then a guest-initiated reboot should trigger
> a cold reboot, i.e. shut down & power on instead.
> We set that property on the VM so it can be checked easily in its XML (take a
> look at the running VM with virsh -r dumpxml <vm>, before and after the
> cluster upgrade).

From exactly which release is this supposed to be present and fully working?

I've done some tests on an RHV 4.1 environment, upgraded (hosts and Manager) to RHV 4.2, with some VMs running on it.

> You should see <ovirt-vm:destroy_on_reboot
> type="bool">False</ovirt-vm:destroy_on_reboot> changing into
> <ovirt-vm:destroy_on_reboot type="bool">True</ovirt-vm:destroy_on_reboot>.
> And if that is set and you initiate a guest reboot, it should go through a
> cold reboot.

We have seen 'destroy_on_reboot' set to True only when doing actions on VMs created when the cluster was already in 4.2 compatibility and so VMs were 4.2 type as well.

For VMs with the delta triggered by the migration to cluster 4.2, so effectively VMs still in 4.1, 'destroy_on_reboot' is always False, even when increasing the RAM with the "apply later" flag.

Michal, can you please confirm whether this feature is supposed to work from 4.2 *only* with the Manager in 4.2, hypervisors in 4.2, the cluster in 4.2 compatibility mode and VMs started in 4.2 mode (not with a pending delta)? If you could confirm it, this would explain the experienced behaviour.

thanks

Comment 12 Michal Skrivanek 2018-10-02 18:02:19 UTC
(In reply to Andrea Perotti from comment #11)
> 
> We have seen 'destroy_on_reboot' set to True only when doing actions on VMs
> created when the cluster was already in 4.2 compatibility and so VMs were
> 4.2 type as well.

yes, it's a 4.2 feature, bug 1519708

> 
> For VMs with the delta triggered by the migration to cluster 4.2, so
> effectively VMs still in 4.1, 'destroy_on_reboot' is always False, even when
> increasing the RAM with the "apply later" flag.
> 
> Michal, can you please confirm whether this feature is supposed to work from
> 4.2 *only* with the Manager in 4.2, hypervisors in 4.2, the cluster in 4.2
> compatibility mode and VMs started in 4.2 mode (not with a pending delta)?
> If you could confirm it, this would explain the experienced behaviour.

yes. everything needs to be 4.2

Comment 13 Andrea Perotti 2018-10-04 14:51:29 UTC
This solves the mystery then.

Thanks Michal for your support.

