Bug 1470244
Summary: | reboot leads to shutoff of qemu-kvm-vm if i6300esb-watchdog set to poweroff
---|---
Product: | Red Hat Enterprise Linux 7
Component: | qemu-kvm
Version: | 7.4
Status: | CLOSED ERRATA
Severity: | high
Priority: | high
Reporter: | Klaus Wenninger <kwenning>
Assignee: | Richard W.M. Jones <rjones>
QA Contact: | FuXiangChun <xfu>
Docs Contact: | Yehuda Zimmerman <yzimmerm>
CC: | cfeist, chayang, juzhang, kbenoit, knoel, kwenning, marcel.fischer, michen, pezhang, rbalakri, rjones, virt-bugs, virt-maint, xfu
Target Milestone: | rc
Hardware: | Unspecified
OS: | Unspecified
Fixed In Version: | qemu-kvm-1.5.3-147.el7
Doc Type: | Bug Fix
Type: | Bug
Last Closed: | 2018-04-10 14:35:07 UTC
Bug Blocks: | 1420851, 1469549, 1469551, 1469590

Doc Text:
Guests no longer shut down unexpectedly during reboot
On a Red Hat Enterprise Linux 7.4 guest running on *qemu-kvm-1.5.3-139.el7*, if the *i6300esb watchdog* was set to `poweroff`, the watchdog was triggered when shutting down due to the timeout being calculated incorrectly. Consequently, when rebooting the guest, it shut down instead. With this update, the timeout calculations in *qemu-kvm* have been corrected. As a result, the virtual machine reboots properly.

Description
Klaus Wenninger
2017-07-12 14:59:02 UTC
I tried to reproduce this by:

(1) Install 10:qemu-kvm-1.5.3-141.el7_4.1.x86_64 on the host.

(2) virt-builder rhel-7.3 --root-password password:123456

(3) /usr/libexec/qemu-kvm -cpu host -machine pc,accel=kvm -m 2048 -drive file=rhel-7.3.img,format=raw,if=virtio -watchdog i6300esb -watchdog-action poweroff

I connected to the guest's console. The i6300esb kernel module was loaded automatically. When I typed "reboot", the guest powered off (i.e. reproducing the bug). The qemu process exited normally (exit code 0, no apparent crash, no error message printed).

So I can confirm this bug appears to be real.

Stack trace at exit:

    #0 0x00007fffed794a80 in __GI_exit (status=status@entry=0) at exit.c:99
    #1 0x00005555556afb43 in watchdog_perform_action () at hw/watchdog/watchdog.c:130
    #2 0x00005555556b00df in i6300esb_timer_expired (vp=0x555556cc1800) at hw/watchdog/wdt_i6300esb.c:197
    #3 0x00005555556e9a26 in qemu_run_timers (clock=0x555556ced280) at qemu-timer.c:394
    #4 0x00005555556e9b95 in qemu_run_all_timers (clock=<optimized out>) at qemu-timer.c:459
    #5 0x00005555556e9b95 in qemu_run_all_timers () at qemu-timer.c:452
    #6 0x00005555556b4d2e in main_loop_wait (nonblocking=<optimized out>) at main-loop.c:470
    #7 0x00005555555cb150 in main () at vl.c:1995
    #8 0x00005555555cb150 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4361

It's definitely not supposed to be triggering the watchdog on exit.

There was a recent change in this part of the code which may be related:

    commit eb7a20a3616085d46aa6b4b4224e15587ec67e6e
    Author: Li Qiang <liqiang6-s>
    Date: Mon Nov 28 17:49:04 2016 -0800

        watchdog: 6300esb: add exit function

        When the Intel 6300ESB watchdog is hot unplug. The timer allocated
        in realize isn't freed thus leaking memory leak. This patch avoid
        this through adding the exit function.

        Signed-off-by: Li Qiang <liqiang6-s>
        Message-Id: <583cde9c.3223ed0a.7f0c2.886e.com>
        Signed-off-by: Paolo Bonzini <pbonzini>

The same thing happens with qemu-kvm from RHEL 7.3, 7.2 and 7.1 (I didn't try any earlier versions). However, I did not downgrade any other packages, so it might be another host package that causes this.

One thing I did notice is that the guest kernel writes to the watchdog port just before reboot. This is the sequence which is written by the guest at reboot, with my annotations:

    # pings the watchdog
    i6300esb: i6300esb_mem_writew: addr = c, val = 80
    i6300esb: i6300esb_mem_writew: addr = c, val = 86
    i6300esb: i6300esb_mem_writew: addr = c, val = 100
    # writes to the lock reg, I think this enables the WDT
    i6300esb: i6300esb_config_write: addr = 68, data = 2, len = 1
    i6300esb: i6300esb_restart_timer: stage 1, timeout 15252014545
    # sets 0x4b000 into timer preload 1 & 2
    i6300esb: i6300esb_mem_writew: addr = c, val = 80
    i6300esb: i6300esb_mem_writew: addr = c, val = 86
    i6300esb: i6300esb_mem_writel: addr = 0, val = 4b000
    i6300esb: i6300esb_mem_writew: addr = c, val = 80
    i6300esb: i6300esb_mem_writew: addr = c, val = 86
    i6300esb: i6300esb_mem_writel: addr = 4, val = 4b000
    # pings the watchdog
    i6300esb: i6300esb_mem_writew: addr = c, val = 80
    i6300esb: i6300esb_mem_writew: addr = c, val = 86
    i6300esb: i6300esb_mem_writew: addr = c, val = 100
    # here we see the problem: the timeout calculation is negative
    i6300esb: i6300esb_restart_timer: stage 1, timeout -253951953748
    i6300esb: i6300esb_mem_writew: addr = c, val = 80
    i6300esb: i6300esb_mem_writew: addr = c, val = 86
    i6300esb: i6300esb_mem_writew: addr = c, val = 100
    i6300esb: i6300esb_restart_timer: stage 1, timeout -253951953748
    i6300esb: i6300esb_mem_writew: addr = c, val = 80
    i6300esb: i6300esb_timer_expired: stage 1
    i6300esb: i6300esb_restart_timer: stage 2, timeout -253951953748
    i6300esb: i6300esb_timer_expired: stage 2

Upstream there were a handful of commits which fixed negative timeout calculations in this driver:

http://git.qemu.org/?p=qemu.git;a=commitdiff;h=4bc7b4d56657ebf75b986ad46e959cf7232ff26a
http://git.qemu.org/?p=qemu.git;a=commitdiff;h=fee562e9e41290a22623de83b673a8929ec5280d
http://git.qemu.org/?p=qemu.git;a=commitdiff;h=9491e9bc019a365dfa9780f462984a0d052f4c0d

I cherry-picked all three on top of qemu-kvm-1.5.3-141.el7_4.1 and that fixed the problem for me.

I am clearing the regression and rhel-7.4-z flags, because I do not believe this ever worked.

For your information, the equivalent bug in qemu-kvm-rhev (fixed back in 2015) was https://bugzilla.redhat.com/show_bug.cgi?id=1198936

Created attachment 1299609 [details]
VM XML to reproduce of Comment 16

Steps of Comment 16:
1. Boot VM with watchdog, see attachment XML.
2. Reboot VM, then the guest is shut down; this bug is reproduced.

btw. the issue does exist the other way round as well:

libvirt snippet:
...
<watchdog model='i6300esb' action='reset'>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
</watchdog>
...

on cmdline do 'reboot --poweroff'

Result:
reboot of the vm instead of poweroff

(In reply to Klaus Wenninger from comment #18)
> btw. the issue does exist the other way round as well:
>
> libvirt snippet:
> ...
> <watchdog model='i6300esb' action='reset'>
>   <address type='pci' domain='0x0000' bus='0x00' slot='0x07'
>     function='0x0'/>
> </watchdog>
> ...
>
> on cmdline do 'reboot --poweroff'
>
> Result:
> reboot of the vm instead of poweroff

as was expected due to the findings actually ...

(In reply to Klaus Wenninger from comment #19)
> (In reply to Klaus Wenninger from comment #18)
> > btw. the issue does exist the other way round as well:
> >
> > libvirt snippet:
> > ...
> > <watchdog model='i6300esb' action='reset'>
> >   <address type='pci' domain='0x0000' bus='0x00' slot='0x07'
> >     function='0x0'/>
> > </watchdog>
> > ...
> >
> > on cmdline do 'reboot --poweroff'
> >
> > Result:
> > reboot of the vm instead of poweroff
>
> as was expected due to the findings actually ...

strange thing though is that calling 'poweroff' seems to have the anticipated result - especially interesting as they are both linking to systemctl ...
didn't check in the systemctl code what it does differently when being called via the poweroff-link as opposed to seeing the --poweroff switch.

This is to be expected. The watchdog incorrectly fires on shutdown, so whatever watchdog action is specified is whatever is done on shutdown. For the reasons for this and the fix, see comment 14.

Fix included in qemu-kvm-1.5.3-147.el7

It's fine now after I made the corrections on Sunday. Do I need to press an 'approve' button? I don't see one ..

Reproduced bug with qemu-kvm-1.5.3-145.el7 & 3.10.0-824.el7.x86_64

1) Boot the guest:

    /usr/libexec/qemu-kvm -name RHEL7.5-1 -machine pc -m 8G -smp 8,maxcpus=240,sockets=2,cores=2,threads=2 -cpu Opteron_G5 -rtc base=localtime,clock=host,driftfix=slew -nodefaults -vga qxl -serial unix:/tmp/serial0,server,nowait -device usb-ehci,id=usb1 -device usb-tablet,id=usb-tablet1 -boot menu=on -enable-kvm -monitor stdio -netdev tap,id=netdev0,vhost=on -device virtio-net-pci,mac=BA:BC:13:83:3F:1D,id=net0,netdev=netdev0,status=on -spice port=5800,disable-ticketing -qmp tcp:0:8888,server,nowait \
    -drive file=rhel7.5-virtio-seabios.qcow2,format=qcow2,id=drive_sysdisk,if=none,cache=none,aio=native,werror=stop,rerror=stop -device virtio-blk-pci,drive=drive_sysdisk,id=device_sysdisk,bootindex=1 -vnc :1 \
    -device i6300esb,id=watchdog0,addr=0x7 -watchdog-action poweroff

2) Reboot the guest from inside the guest:

    #reboot

Result:
Guest is shut off.

Verified with qemu-kvm-1.5.3-151.el7.x86_64 & 3.10.0-824.el7.x86_64

Steps are the same as above.

Result:
Guest reboots, which is the expected result.

So, setting this bug as verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0816
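
For readers who want to see how the negative timeout in the debug log can arise, here is a minimal Python sketch of the arithmetic. It is not taken from the qemu-kvm source. It assumes the stage timeout is computed as ticks_per_sec * (preload << 15) / 33000000 in signed 64-bit arithmetic, with ticks_per_sec equal to 10^9 and the multiplication performed before the division. All function and constant names are hypothetical. The preload value 0x4b000 comes from the log; 0x3c00 does not appear in the log and is only an inferred value that reproduces the first logged timeout under the same assumptions.

```python
# Sketch of the stage timeout arithmetic. Every constant below is an
# assumption about the device model, not a quote from the qemu-kvm source.
TICKS_PER_SEC = 1_000_000_000   # assumed: timer ticks are nanoseconds
PCI_CLOCK_HZ = 33_000_000       # assumed: divisor for the 33 MHz clock
SHIFT_1KHZ = 15                 # assumed: preload is scaled by 2**15


def wrap_int64(value):
    """Reduce a Python int to a two's-complement signed 64-bit value."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value


def c_div(a, b):
    """Integer division that truncates toward zero, as C does."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def timeout_int64(preload):
    """Multiply first in 64 bits, then divide: the overflow-prone order."""
    product = wrap_int64(TICKS_PER_SEC * (preload << SHIFT_1KHZ))
    return c_div(product, PCI_CLOCK_HZ)


def timeout_wide(preload):
    """Same formula, but the intermediate product is never truncated."""
    return TICKS_PER_SEC * (preload << SHIFT_1KHZ) // PCI_CLOCK_HZ


if __name__ == "__main__":
    for preload in (0x3C00, 0x4B000):
        print(hex(preload), timeout_int64(preload), timeout_wide(preload))
```

Under these assumptions, a preload of 0x4b000 gives -253951953748 with the 64-bit product, which matches the negative timeout in the log, and about 305 seconds per stage when the intermediate product is kept wide. Signed overflow is undefined behaviour in C, so the wrap-around modelled here is only what typically happens in practice.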