Bug 1201760

Summary:	on_reboot=destroy does not work when rebooting from inside the guest
Product:	Red Hat Enterprise Linux 7	Reporter:	Jiri Lunacek <jiri.lunacek>
Component:	libvirt	Assignee:	John Ferlan <jferlan>
Status:	CLOSED ERRATA	QA Contact:	Virtualization Bugs <virt-bugs>
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	7.0	CC:	dyuan, fjin, jferlan, mzhan, rbalakri, zhwang
Target Milestone:	rc
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:	libvirt-1.2.17-1.el7	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2015-11-19 06:20:17 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Jiri Lunacek 2015-03-13 12:45:40 UTC

Description of problem:
The combination of the following parameters in the libvirt XML definition causes the qemu-kvm guest not to be destroyed when it is rebooted from inside the guest.

  <on_poweroff>destroy</on_poweroff>
  <on_reboot>destroy</on_reboot>
  <on_crash>coredump-destroy</on_crash>

Version-Release number of selected component (if applicable):
libvirt-1.1.1-29.el7_0.7.x86_64

Steps to Reproduce:
1. Create a QEMU guest with the on_* lifecycle parameters set as shown above.
2. Reboot from inside the guest OS

Actual results:
The QEMU guest reboots, and the qemu-kvm process is not destroyed.

Expected results:
The qemu-kvm process should exit.

Additional info:
This seems to be caused by this patch: https://www.redhat.com/archives/libvir-list/2013-April/msg01734.html
IMHO the -no-reboot flag should be added to the command line for more combinations than "destroy" in all three parameters. VIR_DOMAIN_LIFECYCLE_CRASH_COREDUMP_DESTROY is one of them. "preserve" is possibly another, as mentioned in the original post: https://www.redhat.com/archives/libvir-list/2013-April/msg01731.html

Another option is for libvirt to handle on_reboot=destroy based on a monitor event, which it currently does not do.
The only other reference to the on_reboot parameter I could find in the code is that it changes the action of a libvirt-initiated reboot request.

The only workaround is to set all three parameters to destroy, which in our case makes it impossible to collect a core dump of a failed guest.

Comment 2 John Ferlan 2015-06-30 12:00:15 UTC

A patch has been sent for the "core" issue of not checking for 'coredump-destroy'; see the following series:

http://www.redhat.com/archives/libvir-list/2015-June/msg01627.html


As for the other points raised:

1. Should "preserve" be used

Not sure I agree here. The closest concept to preserve is perhaps the 'suspend' or 'managedsave'/'save' commands. For the latter, not all domain resources would/could be preserved easily, since all those commands do is save the domain state to a file.

2. Should solve on_reboot=destroy based on monitor event

Not quite sure I fully understand your meaning.  Which monitor event?

qemuMonitorEmitEvent
qemuMonitorEmitShutdown
qemuMonitorEmitReset
qemuMonitorEmitPowerdown
qemuMonitorEmitStop
qemuMonitorEmitResume
qemuMonitorEmitGuestPanic

These are generated/called as a result of:

static qemuEventHandler eventHandlers[] = {
...
    { "GUEST_PANICKED", qemuMonitorJSONHandleGuestPanic, },
    { "POWERDOWN", qemuMonitorJSONHandlePowerdown, },
    { "RESET", qemuMonitorJSONHandleReset, },
    { "RESUME", qemuMonitorJSONHandleResume, },
    { "SHUTDOWN", qemuMonitorJSONHandleShutdown, },
    { "STOP", qemuMonitorJSONHandleStop, },
    { "SUSPEND", qemuMonitorJSONHandlePMSuspend, },
...
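The table above is a name-to-handler lookup: each QMP event name that QEMU reports on the monitor is matched against the table, and the corresponding handler is invoked. A minimal, hypothetical C sketch of that kind of table-driven dispatch follows. The types, handler names, and return codes are illustrative only and are not libvirt's real API; the sketch assumes the table is kept sorted by event name so that bsearch() can be used for the lookup.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handler type; real libvirt handlers take a monitor
 * object and the event's JSON data rather than just the name. */
typedef int (*event_handler_fn)(const char *event);

typedef struct {
    const char *type;         /* QMP event name, e.g. "RESET" */
    event_handler_fn handler; /* callback invoked for that event */
} event_entry;

/* Illustrative handlers: each returns a distinct code. */
static int handle_reset(const char *event)    { (void)event; return 1; }
static int handle_shutdown(const char *event) { (void)event; return 2; }
static int handle_other(const char *event)    { (void)event; return 0; }

/* Must stay sorted by 'type' (strcmp order) for bsearch() to work. */
static const event_entry event_table[] = {
    { "GUEST_PANICKED", handle_other },
    { "POWERDOWN",      handle_other },
    { "RESET",          handle_reset },
    { "RESUME",         handle_other },
    { "SHUTDOWN",       handle_shutdown },
    { "STOP",           handle_other },
    { "SUSPEND",        handle_other },
};

static int compare_event(const void *key, const void *elem)
{
    const char *name = key;
    const event_entry *entry = elem;
    return strcmp(name, entry->type);
}

/* Looks up 'name' and runs its handler; returns -1 for unknown events. */
static int dispatch_event(const char *name)
{
    const event_entry *entry;

    assert(name != NULL);
    entry = bsearch(name, event_table,
                    sizeof(event_table) / sizeof(event_table[0]),
                    sizeof(event_table[0]), compare_event);
    if (entry == NULL)
        return -1; /* events without a table entry are ignored */
    return entry->handler(name);
}
```

For example, dispatch_event("RESET") runs handle_reset() and returns 1, while a name missing from the table returns -1. Keeping the table sorted allows a binary search; for a table this small a linear scan would work equally well.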

Comment 3 John Ferlan 2015-06-30 15:37:37 UTC

I have pushed the patches upstream:

commit 0b328383942f1a349ec11f88ce756c4807f236c2
Author: John Ferlan <jferlan>
Date:   Mon Jun 29 15:22:49 2015 -0400

    qemu: Add missing on_crash lifecycle type
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1201760
    
    When the domain "<on_crash>coredump-destroy</on_crash>" is set, the
    domain wasn't being destroyed, rather it was being rebooted.
    
    Add VIR_DOMAIN_LIFECYCLE_CRASH_COREDUMP_DESTROY to the list of
    on_crash types that cause "-no-reboot" to be added to the qemu
    command line.

$ git describe 0b328383942f1a349ec11f88ce756c4807f236c2
v1.2.17-rc1-15-g0b32838
$

If it's felt that either "preserve" or the monitor event handling is really a bug, then a separate issue should be filed for each.
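The commit above widens the condition under which "-no-reboot" is added. A minimal C sketch of that predicate before and after the change is shown below. It assumes hypothetical enum and struct names that mirror the XML values; libvirt's real code uses its own VIR_DOMAIN_LIFECYCLE_* constants, so this is an illustration of the logic, not the actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types mirroring the <on_*> XML values; not libvirt's API. */
typedef enum { ACTION_DESTROY, ACTION_RESTART, ACTION_PRESERVE } lifecycle_action;
typedef enum {
    CRASH_DESTROY,
    CRASH_RESTART,
    CRASH_PRESERVE,
    CRASH_COREDUMP_DESTROY,
    CRASH_COREDUMP_RESTART
} crash_action;

typedef struct {
    lifecycle_action on_poweroff;
    lifecycle_action on_reboot;
    crash_action on_crash;
} lifecycle_config;

/* Before the fix: only a plain "destroy" on_crash qualified. */
static bool wants_no_reboot_before(const lifecycle_config *cfg)
{
    assert(cfg != NULL);
    return cfg->on_poweroff == ACTION_DESTROY &&
           cfg->on_reboot == ACTION_DESTROY &&
           cfg->on_crash == CRASH_DESTROY;
}

/* After the fix: "coredump-destroy" also ends with the domain destroyed. */
static bool wants_no_reboot_after(const lifecycle_config *cfg)
{
    assert(cfg != NULL);
    return cfg->on_poweroff == ACTION_DESTROY &&
           cfg->on_reboot == ACTION_DESTROY &&
           (cfg->on_crash == CRASH_DESTROY ||
            cfg->on_crash == CRASH_COREDUMP_DESTROY);
}
```

With the reporter's configuration (destroy, destroy, coredump-destroy), wants_no_reboot_before() returns false and wants_no_reboot_after() returns true. In this model, that is the difference between QEMU resetting the guest and QEMU exiting on an in-guest reboot.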

Comment 5 Fangge Jin 2015-08-04 11:17:04 UTC

I have some questions when verifying this bug:

1. Why should on_reboot=destroy take effect only when all three events are set to *destroy?

2. Is the behavior of virsh shutdown/reboot expected to be consistent with shutdown/reboot inside the guest?

 I tested with this combination:
  <on_poweroff>restart</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>

 When I ran "shutdown -h now" inside the guest, it shut down.
 When I used "virsh shutdown <guest>", it restarted.

3. If I use "poweroff" inside the guest, should the on_poweroff action take effect?

Comment 6 John Ferlan 2015-08-12 17:54:36 UTC

> 1.Why on_reboot=destroy should take effect only when all the three events are
> to *destroy?

Not quite true, although perhaps the meaning hasn't been conveyed well enough in either the BZ or the docs...

For the QEMU command line to include the "-no-reboot" flag, each of the three on_* conditions must be "destroy" (with the commit above, on_crash may also be "coredump-destroy"); otherwise, the "-no-shutdown" flag is added. The QEMU help describes these as:

‘-no-reboot’

    Exit instead of rebooting. 

‘-no-shutdown’

    Don’t exit QEMU on guest shutdown, but instead only stop the emulation. This allows for instance switching to monitor to commit changes to the disk image. 

So essentially, if all three are destroy, QEMU will exit; if any one isn't, it's perhaps easiest to say that QEMU leaves it to lifecycle event management to handle what gets done.

Hopefully that helps; perhaps I've misread your question.
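The two QEMU options quoted above can be summarized as a small decision table. The C sketch below is a hypothetical model based only on that help text and on the statement that libvirt adds one flag or the other; the names are illustrative, it is not QEMU or libvirt code, and it assumes that without "-no-reboot" QEMU handles a guest reboot by resetting the guest.

```c
#include <assert.h>

/* Which of the two flags libvirt placed on the command line (hypothetical). */
typedef enum { FLAG_NO_REBOOT, FLAG_NO_SHUTDOWN } qemu_flag;
typedef enum { GUEST_REBOOT, GUEST_SHUTDOWN } guest_request;
typedef enum {
    QEMU_RESETS_GUEST,    /* guest restarts, qemu process keeps running */
    QEMU_EXITS,           /* qemu process ends */
    QEMU_STOPS_EMULATION  /* qemu process stays, emulation is stopped */
} qemu_reaction;

static qemu_reaction qemu_react(qemu_flag flag, guest_request req)
{
    assert(req == GUEST_REBOOT || req == GUEST_SHUTDOWN);

    if (req == GUEST_REBOOT)
        /* -no-reboot: "Exit instead of rebooting." */
        return flag == FLAG_NO_REBOOT ? QEMU_EXITS : QEMU_RESETS_GUEST;

    /* -no-shutdown: "Don't exit QEMU on guest shutdown, but instead only
     * stop the emulation." Without it, a guest shutdown ends QEMU. */
    return flag == FLAG_NO_SHUTDOWN ? QEMU_STOPS_EMULATION : QEMU_EXITS;
}
```

In the QEMU_STOPS_EMULATION case the process is still present, so the management layer can decide what happens next. In the QEMU_RESETS_GUEST case nothing in this model consults on_reboot, which is one way to see why an in-guest reboot ignored on_reboot=destroy when "-no-reboot" was absent.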

>2.Are the behaviour of virsh shutdown/reboot and shutdown/reboot inside the
> guest expected to be consistent? 
>
> I tested with this combination:
>  <on_poweroff>restart</on_poweroff>
>  <on_reboot>restart</on_reboot>
>  <on_crash>restart</on_crash>
>
> When I "shutdown -h now" inside the guest, it shut down.
> When I use "virsh shutdown <guest>, it restarted.

First off, my experience has been that the answer to the consistency question is that it's guest dependent. When the guest and host OSes are alike, behavior is perhaps more consistent than when they differ. When the guest goes through its shutdown, it does certain things and sends certain messages or signals, which can be intercepted and handled by the hypervisor (e.g., QEMU). If the guest doesn't send them, or changes how it does things without the host knowing, consistency goes out the window. Other guest issues can also cause unforeseen results: say a guest shutdown is initiated, but somewhere along the way a driver in the guest causes a crash; that can result in an unexpected action.

The short story: guest-initiated lifecycle events are tricky and perhaps less reliable than developers want them to be. They work great in one simple environment but fail in a more complex real-world environment.

With respect to the actions you performed, off the top of my head: 'shutdown -h now' would be akin to poweroff, while 'virsh shutdown' would follow the "on_reboot" event and is "controlled more" by the hypervisor.

> 3.If I use "poweroff" inside the guest, should it take effect on event
> on_poweroff?

I had assumed so until I started looking at the code; it seems on_poweroff really isn't handled. An event handler was never created for it. Other lifecycle events have one (Shutdown, Reset, Stop, Resume), but there is nothing for poweroff. It seems that one was forgotten/missed in the initial implementation many years ago. I haven't looked at "all" the bugs, but I would assume no one has asked for poweroff to restart (it's counter-intuitive to what poweroff would do).

Comment 7 Fangge Jin 2015-08-20 08:12:13 UTC

I can reproduce this bug on build libvirt-1.1.1-29.el7.x86_64

Verified this on build libvirt-1.2.17-5.el7.x86_64.

Steps:
1. Configure the lifecycle events in the guest XML as below:
...
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>destroy</on_reboot>
  <on_crash>coredump-destroy</on_crash>
...

2. Start the guest and run 'reboot' inside the guest: the guest shuts down.

Comment 8 Fangge Jin 2015-08-20 09:14:22 UTC

Additional verification:

Scenario 1: (-no-reboot)

  <on_poweroff>destroy</on_poweroff>
  <on_reboot>destroy</on_reboot>
  <on_crash>coredump-destroy | destroy</on_crash>

  a) Run "reboot" inside the guest OS    ---> Guest shutdown
  b) Use "virsh reboot {domain}"         ---> Guest shutdown


Scenario 2: (-no-shutdown)

1)
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>coredump-destroy | destroy</on_crash>


  a) Run "reboot" inside the guest OS    ---> Guest reboot
  b) Use "virsh reboot {domain}"         ---> Guest reboot

2)
  <on_poweroff>restart</on_poweroff>
  <on_reboot>destroy</on_reboot>
  <on_crash>coredump-destroy | destroy</on_crash>


  a) Run "reboot" inside the guest OS    ---> Guest reboot
  b) Use "virsh reboot {domain}"         ---> Guest shutdown

3)
  <on_poweroff>restart</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>coredump-destroy | destroy</on_crash>


  a) Run "reboot" inside the guest OS    ---> Guest reboot
  b) Use "virsh reboot {domain}"         ---> Guest reboot
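The results above are consistent with a simple model: an in-guest reboot destroys the guest only when "-no-reboot" is used (on_poweroff and on_reboot are destroy and on_crash is destroy or coredump-destroy), while "virsh reboot" destroys it whenever on_reboot is destroy. The hypothetical C sketch below encodes the observed rows of this comment and checks that model against them; the names and the model itself are an illustration drawn from these observations, not libvirt code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { DESTROY, RESTART } action; /* on_poweroff / on_reboot */
typedef enum { OUTCOME_REBOOT, OUTCOME_SHUTDOWN } outcome;

/* One row of the verification results in this comment. */
typedef struct {
    action on_poweroff;
    action on_reboot;
    outcome in_guest_reboot; /* "reboot" run inside the guest OS */
    outcome virsh_reboot;    /* "virsh reboot {domain}" */
} observation;

/* on_crash was coredump-destroy or destroy in every row, so it is omitted. */
static const observation observed[] = {
    { DESTROY, DESTROY, OUTCOME_SHUTDOWN, OUTCOME_SHUTDOWN }, /* scenario 1   */
    { DESTROY, RESTART, OUTCOME_REBOOT,   OUTCOME_REBOOT   }, /* scenario 2.1 */
    { RESTART, DESTROY, OUTCOME_REBOOT,   OUTCOME_SHUTDOWN }, /* scenario 2.2 */
    { RESTART, RESTART, OUTCOME_REBOOT,   OUTCOME_REBOOT   }, /* scenario 2.3 */
};

/* Model: -no-reboot is used only when every action destroys the domain. */
static outcome predict_in_guest(const observation *o)
{
    assert(o != NULL);
    bool no_reboot = o->on_poweroff == DESTROY && o->on_reboot == DESTROY;
    return no_reboot ? OUTCOME_SHUTDOWN : OUTCOME_REBOOT;
}

/* Model: a libvirt-initiated reboot honors on_reboot directly. */
static outcome predict_virsh(const observation *o)
{
    assert(o != NULL);
    return o->on_reboot == DESTROY ? OUTCOME_SHUTDOWN : OUTCOME_REBOOT;
}

/* Returns how many observed results the model fails to predict. */
static size_t count_mismatches(void)
{
    size_t n = sizeof(observed) / sizeof(observed[0]);
    size_t mismatches = 0;

    for (size_t i = 0; i < n; i++) {
        if (predict_in_guest(&observed[i]) != observed[i].in_guest_reboot)
            mismatches++;
        if (predict_virsh(&observed[i]) != observed[i].virsh_reboot)
            mismatches++;
    }
    return mismatches;
}
```

Scenario 2.2 is the interesting row: the same on_reboot=destroy setting yields a reboot from inside the guest but a shutdown via virsh, which is exactly the asymmetry the original report describes.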

Comment 10 errata-xmlrpc 2015-11-19 06:20:17 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2202.html