Bug 1470244

Summary: reboot leads to shutoff of qemu-kvm-vm if i6300esb-watchdog set to poweroff
Product: Red Hat Enterprise Linux 7 Reporter: Klaus Wenninger <kwenning>
Component: qemu-kvm Assignee: Richard W.M. Jones <rjones>
Status: CLOSED ERRATA QA Contact: FuXiangChun <xfu>
Severity: high Docs Contact: Yehuda Zimmerman <yzimmerm>
Priority: high    
Version: 7.4 CC: cfeist, chayang, juzhang, kbenoit, knoel, kwenning, marcel.fischer, michen, pezhang, rbalakri, rjones, virt-bugs, virt-maint, xfu
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qemu-kvm-1.5.3-147.el7 Doc Type: Bug Fix
Doc Text:
Guests no longer shut down unexpectedly during reboot

On a Red Hat Enterprise Linux 7.4 guest running on *qemu-kvm-1.5.3-139.el7*, if the *i6300esb watchdog* was set to `poweroff`, the watchdog was triggered when shutting down due to the timeout being calculated incorrectly. Consequently, when rebooting the guest, it shut down instead. With this update, the timeout calculations in *qemu-kvm* have been corrected. As a result, the virtual machine reboots properly.
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-04-10 14:35:07 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1420851, 1469549, 1469551, 1469590    
Attachments:
Description Flags
VM XML to reproduce of Comment 16 none

Description Klaus Wenninger 2017-07-12 14:59:02 UTC
Description of problem:
rhel 7.4 guest running on qemu-kvm-1.5.3-139.el7
libvirt config:
...
<watchdog model='i6300esb' action='poweroff'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </watchdog>
...
Issuing 'reboot' from a bash shell inside the VM causes the VM to be shut off instead of rebooted.

Version-Release number of selected component (if applicable):
part of kernel  3.10.0-686.el7

How reproducible:
100%

Steps to Reproduce:
1. setup a rhel 7.4 vm on qemu-kvm-1.5.3-139.el7
2. have the i6300esb watchdog enabled
   libvirt snippet:
...
<watchdog model='i6300esb' action='poweroff'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </watchdog>
...
3. boot the vm and type 'reboot' in a bash shell inside it

Actual results:
vm is shut off

Expected results:
vm should reboot internally

Additional info:
This does not seem to happen with qemu-kvm-2.x from RHEV.
However, it still seems solvable with qemu-kvm-1.5.3-139.el7 on the host, since it does not happen after unloading the i6300esb module.
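
The watchdog stanza from the steps above can also be checked programmatically. The following is a minimal sketch, assuming the stanza sits under a `<devices>` element of a domain definition; the surrounding `<domain>` wrapper and the helper name `watchdog_settings` are illustrative and not taken from the reporter's configuration.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal domain XML; only the <watchdog> element mirrors
# the snippet in this report (with the model attribute properly quoted).
DOMAIN_XML = """
<domain type='kvm'>
  <devices>
    <watchdog model='i6300esb' action='poweroff'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </watchdog>
  </devices>
</domain>
"""


def watchdog_settings(domain_xml):
    """Return a list of (model, action) pairs for every <watchdog> element."""
    root = ET.fromstring(domain_xml.strip())
    return [(w.get("model"), w.get("action")) for w in root.iter("watchdog")]


print(watchdog_settings(DOMAIN_XML))  # [('i6300esb', 'poweroff')]
```

A configuration that reports `('i6300esb', 'poweroff')` matches the combination described in this report.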

Comment 12 Richard W.M. Jones 2017-07-15 07:43:47 UTC
I tried to reproduce this by:

(1) Install 10:qemu-kvm-1.5.3-141.el7_4.1.x86_64 on the host.

(2) virt-builder rhel-7.3 --root-password password:123456 

(3) /usr/libexec/qemu-kvm -cpu host -machine pc,accel=kvm -m 2048 -drive file=rhel-7.3.img,format=raw,if=virtio -watchdog i6300esb -watchdog-action poweroff

I connected to the guest's console.  The i6300esb kernel module was
loaded automatically.  When I typed "reboot", the guest powered off
(ie. reproducing the bug).  The qemu process exited normally (exit code
0, no apparent crash, no error message printed).

So I can confirm this bug appears to be real.

Comment 13 Richard W.M. Jones 2017-07-15 07:51:09 UTC
Stack trace at exit:

#0  0x00007fffed794a80 in __GI_exit (status=status@entry=0) at exit.c:99
#1  0x00005555556afb43 in watchdog_perform_action ()
    at hw/watchdog/watchdog.c:130
#2  0x00005555556b00df in i6300esb_timer_expired (vp=0x555556cc1800)
    at hw/watchdog/wdt_i6300esb.c:197
#3  0x00005555556e9a26 in qemu_run_timers (clock=0x555556ced280)
    at qemu-timer.c:394
#4  0x00005555556e9b95 in qemu_run_all_timers (clock=<optimized out>)
    at qemu-timer.c:459
#5  0x00005555556e9b95 in qemu_run_all_timers () at qemu-timer.c:452
#6  0x00005555556b4d2e in main_loop_wait (nonblocking=<optimized out>)
    at main-loop.c:470
#7  0x00005555555cb150 in main () at vl.c:1995
#8  0x00005555555cb150 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4361

It's definitely not supposed to be triggering the watchdog
on exit.

There was a recent change in this part of the code which
may be related:

commit eb7a20a3616085d46aa6b4b4224e15587ec67e6e
Author: Li Qiang <liqiang6-s>
Date:   Mon Nov 28 17:49:04 2016 -0800

    watchdog: 6300esb: add exit function
    
    When the Intel 6300ESB watchdog is hot unplug. The timer allocated
    in realize isn't freed thus leaking memory leak. This patch avoid
    this through adding the exit function.
    
    Signed-off-by: Li Qiang <liqiang6-s>
    Message-Id: <583cde9c.3223ed0a.7f0c2.886e.com>
    Signed-off-by: Paolo Bonzini <pbonzini>

Comment 14 Richard W.M. Jones 2017-07-15 09:49:05 UTC
The same thing happens with qemu-kvm from RHEL 7.3, 7.2 and 7.1 (I didn't
try any earlier versions).  However I did not downgrade any other packages
so it might be another host package that causes this.

One thing I did notice is that the guest kernel writes to the watchdog
port just before reboot.  This is the sequence which is written by the
guest at reboot with my annotations:

  # pings the watchdog
i6300esb: i6300esb_mem_writew: addr = c, val = 80
i6300esb: i6300esb_mem_writew: addr = c, val = 86
i6300esb: i6300esb_mem_writew: addr = c, val = 100

  # writes to the lock reg, I think this enables the WDT
i6300esb: i6300esb_config_write: addr = 68, data = 2, len = 1
i6300esb: i6300esb_restart_timer: stage 1, timeout 15252014545

  # sets 0x4b000 into timer preload 1 & 2
i6300esb: i6300esb_mem_writew: addr = c, val = 80
i6300esb: i6300esb_mem_writew: addr = c, val = 86
i6300esb: i6300esb_mem_writel: addr = 0, val = 4b000
i6300esb: i6300esb_mem_writew: addr = c, val = 80
i6300esb: i6300esb_mem_writew: addr = c, val = 86
i6300esb: i6300esb_mem_writel: addr = 4, val = 4b000

  # pings the watchdog
i6300esb: i6300esb_mem_writew: addr = c, val = 80
i6300esb: i6300esb_mem_writew: addr = c, val = 86
i6300esb: i6300esb_mem_writew: addr = c, val = 100

  # here we see the problem: the timeout calculation is negative
i6300esb: i6300esb_restart_timer: stage 1, timeout -253951953748
i6300esb: i6300esb_mem_writew: addr = c, val = 80
i6300esb: i6300esb_mem_writew: addr = c, val = 86
i6300esb: i6300esb_mem_writew: addr = c, val = 100
i6300esb: i6300esb_restart_timer: stage 1, timeout -253951953748
i6300esb: i6300esb_mem_writew: addr = c, val = 80
i6300esb: i6300esb_timer_expired: stage 1
i6300esb: i6300esb_restart_timer: stage 2, timeout -253951953748
i6300esb: i6300esb_timer_expired: stage 2
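
The negative timeouts in the trace above can be picked out mechanically. This is a small sketch that assumes the debug output keeps the `i6300esb_restart_timer: stage N, timeout T` format shown here; the helper name `negative_timeouts` and the shortened sample are illustrative only.

```python
import re

# Matches the restart-timer debug lines as they appear in the trace above.
TIMER_RE = re.compile(r"i6300esb_restart_timer: stage (\d+), timeout (-?\d+)")


def negative_timeouts(log_text):
    """Return (stage, timeout) pairs for every negative timeout in the log."""
    hits = []
    for match in TIMER_RE.finditer(log_text):
        stage, timeout = int(match.group(1)), int(match.group(2))
        if timeout < 0:
            hits.append((stage, timeout))
    return hits


# Shortened excerpt of the trace from this comment.
SAMPLE = """\
i6300esb: i6300esb_restart_timer: stage 1, timeout 15252014545
i6300esb: i6300esb_mem_writel: addr = 4, val = 4b000
i6300esb: i6300esb_restart_timer: stage 1, timeout -253951953748
i6300esb: i6300esb_restart_timer: stage 2, timeout -253951953748
"""

print(negative_timeouts(SAMPLE))
# [(1, -253951953748), (2, -253951953748)]
```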

Upstream there were a handful of commits which fixed negative timeout
calculations in this driver:

http://git.qemu.org/?p=qemu.git;a=commitdiff;h=4bc7b4d56657ebf75b986ad46e959cf7232ff26a
http://git.qemu.org/?p=qemu.git;a=commitdiff;h=fee562e9e41290a22623de83b673a8929ec5280d
http://git.qemu.org/?p=qemu.git;a=commitdiff;h=9491e9bc019a365dfa9780f462984a0d052f4c0d

I cherry picked all three on top of qemu-kvm-1.5.3-141.el7_4.1 and that
fixed the problem for me.
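
The negative value in the trace can be reproduced by hand. The sketch below assumes that the unfixed driver computed the timeout as `ticks_per_sec * (preload << 15) / 33000000` in signed 64-bit arithmetic, with `ticks_per_sec` equal to 10^9 and a 33 MHz reference clock. These constants, the shift, and all function names are assumptions for illustration and are not stated in this report. Under those assumptions, the preload of 0x4b000 seen in the trace overflows the intermediate product and yields exactly the logged -253951953748.

```python
TICKS_PER_SEC = 1_000_000_000   # assumed: nanosecond clock
PCI_CLOCK_HZ = 33_000_000       # assumed: 33 MHz reference clock


def wrap_int64(value):
    """Emulate two's-complement wraparound of a signed 64-bit integer."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value


def c_div(a, b):
    """Integer division that truncates toward zero, as C does."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def overflowing_timeout(preload, shift=15):
    """Assumed pre-fix calculation: the product wraps before the division."""
    product = wrap_int64(TICKS_PER_SEC * (preload << shift))
    return c_div(product, PCI_CLOCK_HZ)


def wide_timeout(preload, shift=15):
    """Same formula with an intermediate wide enough to hold the product."""
    return TICKS_PER_SEC * (preload << shift) // PCI_CLOCK_HZ


print(overflowing_timeout(0x4B000))  # -253951953748
print(wide_timeout(0x4B000))         # 305040290909
```

With a wide intermediate, the same preload gives roughly 305 seconds per stage instead of a timeout in the past, which is consistent with the watchdog no longer firing immediately once the calculation is corrected.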

I am clearing the regression and rhel-7.4-z flags, because I do not believe
this ever worked.

Comment 15 Richard W.M. Jones 2017-07-15 09:52:09 UTC
For your information, the equivalent bug in qemu-kvm-rhev (fixed
back in 2015) was https://bugzilla.redhat.com/show_bug.cgi?id=1198936

Comment 17 Pei Zhang 2017-07-17 02:44:44 UTC
Created attachment 1299609 [details]
VM XML to reproduce of  Comment 16

Steps of Comment 16:

1. Boot VM with watchdog, see attachment XML.

2. Reboot the VM. The guest shuts down instead, so this bug is reproduced.

Comment 18 Klaus Wenninger 2017-07-17 08:19:15 UTC
btw. the issue does exist the other way round as well:

libvirt snippet:
...
<watchdog model='i6300esb' action='reset'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </watchdog>
...

on cmdline do 'reboot --poweroff'

Result:
reboot of the vm instead of poweroff

Comment 19 Klaus Wenninger 2017-07-17 08:20:15 UTC
(In reply to Klaus Wenninger from comment #18)

as was expected given the findings, actually ...

Comment 20 Klaus Wenninger 2017-07-17 08:35:06 UTC
(In reply to Klaus Wenninger from comment #19)

The strange thing, though, is that calling 'poweroff' seems to have the anticipated result, which is especially interesting as both commands are links to systemctl ...
I didn't check in the systemctl code what it does differently when called via the poweroff link as opposed to seeing the --poweroff switch.

Comment 21 Richard W.M. Jones 2017-07-17 09:10:26 UTC
This is to be expected.  The watchdog incorrectly fires on shutdown,
so whatever watchdog action is specified is whatever is done on shutdown.
For the reasons for this and the fix, see comment 14.

Comment 24 Miroslav Rezanina 2017-11-03 07:53:36 UTC
Fix included in qemu-kvm-1.5.3-147.el7

Comment 27 Richard W.M. Jones 2017-11-21 10:09:20 UTC
It's fine now after I made the corrections on Sunday.  Do I
need to press an 'approve' button?  I don't see one ..

Comment 29 FuXiangChun 2017-12-25 06:43:25 UTC
Reproduced bug with qemu-kvm-1.5.3-145.el7 & 3.10.0-824.el7.x86_64

1) /usr/libexec/qemu-kvm -name RHEL7.5-1 -machine pc -m 8G -smp 8,maxcpus=240,sockets=2,cores=2,threads=2 -cpu Opteron_G5 -rtc base=localtime,clock=host,driftfix=slew -nodefaults -vga qxl -serial unix:/tmp/serial0,server,nowait -device usb-ehci,id=usb1 -device usb-tablet,id=usb-tablet1 -boot menu=on -enable-kvm -monitor stdio -netdev tap,id=netdev0,vhost=on -device virtio-net-pci,mac=BA:BC:13:83:3F:1D,id=net0,netdev=netdev0,status=on -spice port=5800,disable-ticketing -qmp tcp:0:8888,server,nowait \
-drive file=rhel7.5-virtio-seabios.qcow2,format=qcow2,id=drive_sysdisk,if=none,cache=none,aio=native,werror=stop,rerror=stop -device virtio-blk-pci,drive=drive_sysdisk,id=device_sysdisk,bootindex=1 -vnc :1 \
-device i6300esb,id=watchdog0,addr=0x7 -watchdog-action poweroff

2) reboot guest inside guest
#reboot 

Result:

Guest is shut off.


Verified guest with qemu-kvm-1.5.3-151.el7.x86_64 & 3.10.0-824.el7.x86_64

Steps are the same as above.

Result:
The guest reboots, which is the expected result. So, setting this bug as verified.

Comment 33 errata-xmlrpc 2018-04-10 14:35:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0816