Bug 2080207
Summary: | qemu iTCO watchdog (ICH9 TCO ACPI device) does not work | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 9 | Reporter: | Richard W.M. Jones <rjones> |
Component: | qemu-kvm | Assignee: | Michael S. Tsirkin <mst> |
qemu-kvm sub component: | Devices | QA Contact: | Yiqian Wei <yiwei> |
Status: | CLOSED NOTABUG | Docs Contact: | |
Severity: | unspecified | ||
Priority: | unspecified | CC: | berrange, coli, jinzhao, juzhang, kchamart, mst, virt-maint |
Version: | 9.1 | Keywords: | Triaged |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2022-10-27 17:49:42 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 2136889, 2137346 |
Description
Richard W.M. Jones
2022-04-29 09:04:26 UTC
> -device i6300esb,id=wdt0,bus=pcie-pci-bridge-0 \
It's probably using this device, not the iTCO watchdog.
(In reply to Richard W.M. Jones from comment #2) > > -device i6300esb,id=wdt0,bus=pcie-pci-bridge-0 \ > > It's probably using this device, not the iTCO watchdog. I tried with the iTCO watchdog,Can reproduce this bug,thanks (In reply to Yiqian Wei from comment #3) > (In reply to Richard W.M. Jones from comment #2) > > > -device i6300esb,id=wdt0,bus=pcie-pci-bridge-0 \ > > > > It's probably using this device, not the iTCO watchdog. > > I tried with the iTCO watchdog,Can reproduce this bug,thanks Reproduce cmd: boot a guest with "-global ICH9-LPC.noreboot=false" /usr/libexec/qemu-kvm \ -name 'avocado-vt-vm1' \ -sandbox on \ -blockdev node-name=file_ovmf_code,driver=file,filename=/usr/share/edk2/ovmf/OVMF_CODE.secboot.fd,auto-read-only=on,discard=unmap \ -blockdev node-name=drive_ovmf_code,driver=raw,read-only=on,file=file_ovmf_code \ -blockdev node-name=file_ovmf_vars,driver=file,filename=/home/yiwei/OVMF_VARS.fd,auto-read-only=on,discard=unmap \ -blockdev node-name=drive_ovmf_vars,driver=raw,read-only=off,file=file_ovmf_vars \ -machine q35,memory-backend=mem-machine_mem,pflash0=drive_ovmf_code,pflash1=drive_ovmf_vars \ -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \ -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 \ -nodefaults \ -device VGA,bus=pcie.0,addr=0x2 \ -m 12G \ -object memory-backend-ram,size=12G,id=mem-machine_mem \ -smp 6 \ -cpu IvyBridge,enforce \ -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \ -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \ -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \ -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \ -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-2,addr=0x0 \ -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/yiwei/rhel9.1-ovmf.qcow2,cache.direct=on,cache.no-flush=off \ -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \ -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \ -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \ -device virtio-net-pci,mac=9a:df:ca:53:c2:69,id=idz43iXV,netdev=idPOEPyA,bus=pcie-root-port-3,addr=0x0 \ -netdev tap,id=idPOEPyA,vhost=on \ -vnc :0 \ -rtc base=utc,clock=host,driftfix=slew \ -boot menu=off,order=cdn,once=c,strict=off \ -enable-kvm \ -monitor stdio \ -qmp tcp:0:4444,server=on,wait=off \ -global ICH9-LPC.noreboot=false \ I can take this bug if Michael doesn't want it. I've spent some time debugging the iTCO_wdt support in QEMU before realizing this BZ existed. The core of the problem is that the second part of this condition in tco_timer_expired() never becomes true: if (!lpc->pin_strap.spkr_hi && !(gcs & ICH9_CC_GCS_NO_REBOOT)) { The 'gcs' value is to be written by the guest to enable/disable the watchdog. uint32_t gcs = pci_get_long(lpc->chip_config + ICH9_CC_GCS); The 'lpc->chip_config' memory region is handled by the lpc_ich9.c file rcrb_mmio_ops callbacks, specifically 'ich9_cc_write'. If I add a trace point to this method, I can see it being invoked twice during Linux guest boot ich9_cc_read addr 0x3410 val=0x20 len=4 ich9_cc_write addr 0x3410 val=0x0 len=4 ich9_cc_read addr 0x3410 val=0x0 len=4 ich9_cc_read addr 0x3410 val=0x0 len=4 ich9_cc_write addr 0x3410 val=0x20 len=4 ich9_cc_read addr 0x3410 val=0x20 len=4 Looking at the Linux guest driver, iTCO_wdt.c, this corresponds to the iTCO_wdt_probe method where it tests if the watchdog is functional /* Check chipset's NO_REBOOT bit */ if (p->update_no_reboot_bit(p->no_reboot_priv, false) && iTCO_vendor_check_noreboot_on()) { dev_info(dev, "unable to reset NO_REBOOT flag, device disabled by hardware/BIOS\n"); return -ENODEV; /* Cannot reset NO_REBOOT bit */ } /* Set the NO_REBOOT bit to prevent later reboots, just for sure */ p->update_no_reboot_bit(p->no_reboot_priv, true); So initially the memory region writes/reads are happening as expected The 'p->update_no_reboot_bit' method should subsequently be called when the watchdog driver starts in iTCO_wdt_start /* disable chipset's NO_REBOOT bit */ if (p->update_no_reboot_bit(p->no_reboot_priv, false)) { spin_unlock(&p->io_lock); dev_err(wd_dev->parent, "failed to reset NO_REBOOT flag, reboot disabled by hardware/BIOS\n"); return -EIO; } A systemtap probe in the guest shows that 'iTCO_wdt_start' is being invoked, however, 'ich9_cc_write' is never called in QEMU IOW, QEMU is never seeing that the guest driver is trying to toggle the noreboot flag in the LPC chip config memory region, and so the watchdog will never fire upon expiry. Also reported on RHEL-8 as bug 2136889, but lets keep investigation focus on this RHEL9 bug. Incidentally, the following shows guest kernel messages during iTCO_wdt.ko loading phase, interleaved with some QEMU trace points I added [ 0.933340] lpc_ich 0000:00:1f.0: I/O space for GPIO uninitialized [ 0.935767] iTCO_vendor_support: vendor-support=0 ich9_cc_read addr 0x3410 val=0x20 len=4 ich9_cc_write addr 0x3410 val=0x0 len=4 ich9_cc_read addr 0x3410 val=0x0 len=4 ich9_cc_read addr 0x3410 val=0x0 len=4 ich9_cc_write addr 0x3410 val=0x20 len=4 ich9_cc_read addr 0x3410 val=0x20 len=4 [ 0.940306] iTCO_wdt iTCO_wdt.1.auto: Found a ICH9 TCO device (Version=2, TCOBASE=0x0660) tco_io_write 0x4 0x8 tco_io_write 0x6 0x2 tco_io_write 0x6 0x4 tco_io_read 0x8 tco_io_read 0x12 tco_io_write 0x12 0x32 tco_io_read 0x12 tco_io_write 0x0 0x1 tco_timer_reload ticks=50 (30000 ms) [ 0.940836] iTCO_wdt iTCO_wdt.1.auto: initialized. heartbeat=30 sec (nowayout=0) [ 0.946827] i801_smbus 0000:00:1f.3: SMBus using PCI interrupt [ 0.947191] i2c i2c-0: 1/1 memory slots populated (from DMI) [ 0.947480] i2c i2c-0: Memory type 0x07 not supported yet, not instantiating SPD (In reply to Daniel Berrangé from comment #8) > The 'p->update_no_reboot_bit' method should subsequently be called when the > watchdog driver starts in iTCO_wdt_start > > /* disable chipset's NO_REBOOT bit */ > if (p->update_no_reboot_bit(p->no_reboot_priv, false)) { > spin_unlock(&p->io_lock); > dev_err(wd_dev->parent, "failed to reset NO_REBOOT flag, reboot disabled > by hardware/BIOS\n"); > return -EIO; > } > > > A systemtap probe in the guest shows that 'iTCO_wdt_start' is being invoked, > however, 'ich9_cc_write' is never called in QEMU I was wrong about that. Only iTCO_wdt_ping is being invoked when /dev/watchdog0 is opened. IOW, Linux thinks the watchdog is already active at boot. QEMU's impl has TCO1_CNT_DEFAULT = 0x0000, The spec https://uefi.org/sites/default/files/resources/Watchdog%20Descriptor%20Table.pdf Section 2.2.7 TCO1_CNT – TCO1 Control Register says this about bit 11 [quote] 1 = The WDT will halt. It will not count, and thus cannot reach a value that would cause a timeout. 0 = The WDT is enabled to count (default). Note that the WDT will not reset or reload when this bit is changed. If the WDT is halted and then enabled, the WDT will continue counting starting at the value where it was halted. [/quote] Since we've set TCO1_CNT to 0x0, bit 11 is clear, and thus the WDT is presumed to be running Thus Linux will NOT invoke iTCO_wdt_start, only iTCO_wdt_ping As a result, it will never try to clear the 'noreboot' flag, and thus the watchdog will never fire. I've found that the RHEL-8 kernel 4.18.0-431.el8.x86_64 works fine with the iTCO watchdog impl by QEMU. I believe the behaviour change is caused by this kernel commit. commit 1ae3e78c08209ac657c59f6f7ea21bbbd7f6a1d4 Author: Mika Westerberg <mika.westerberg.com> Date: Tue Sep 21 13:29:00 2021 +0300 watchdog: iTCO_wdt: No need to stop the timer in probe The watchdog core can handle pinging of the watchdog before userspace opens the device. For this reason instead of stopping the timer, just mark it as running and let the watchdog core take care of it. Cc: Malin Jonsson <malin.jonsson> Signed-off-by: Mika Westerberg <mika.westerberg.com> Reviewed-by: Guenter Roeck <linux> Link: https://lore.kernel.org/r/20210921102900.61586-1-mika.westerberg@linux.intel.com Signed-off-by: Guenter Roeck <linux> Signed-off-by: Wim Van Sebroeck <wim> diff --git a/drivers/watchdog/iTCO_wdt.c b/drivers/watchdog/iTCO_wdt.c index 16ad2c9842c2..3f2f4343644f 100644 --- a/drivers/watchdog/iTCO_wdt.c +++ b/drivers/watchdog/iTCO_wdt.c @@ -423,6 +423,16 @@ static unsigned int iTCO_wdt_get_timeleft(struct watchdog_device *wd_dev) return time_left; } +static void iTCO_wdt_set_running(struct iTCO_wdt_private *p) +{ + u16 val; + + /* Bit 11: TCO Timer Halt -> 0 = The TCO timer is * enabled */ + val = inw(TCO1_CNT(p)); + if (!(val & BIT(11))) + set_bit(WDOG_HW_RUNNING, &p->wddev.status); +} + /* * Kernel Interfaces */ @@ -562,8 +572,7 @@ static int iTCO_wdt_probe(struct platform_device *pdev) watchdog_set_drvdata(&p->wddev, p); platform_set_drvdata(pdev, p); - /* Make sure the watchdog is not running */ - iTCO_wdt_stop(&p->wddev); + iTCO_wdt_set_running(p); /* Check that the heartbeat value is within it's range; if not reset to the default */ Prior to that change, Linux would explicitly set the Timer Halt bit, and set the Noreboot bit. After this change, Linux has itself set the Noreboot bit, but just probes for the current Timer Halt bit status. This leads to broken setup when the current Timer Halt bit is unset at boot time, which is the state with QEMU. The spec indicates that the Timer Halt bit must be unset initially, however, it also indicates that the BIOS would typically set this bit once running, so the spec does NOT imply that it will be unset when the OS boots. SeaBIOS/OVMF don't touch the iTCO WDT AFAICT, so in QEMU the guest OS is getting the iTCO in the exact state QEMU initializes. Possibly this Linux change was made based on assumption about what firmware would do to set the initial iTCO state and thus is not matching how QEMU sets it up. So I'm still not sure which is at fault. I can at least make QEMU work with both old and new linux guest behaviour by setting the Timer Halt bit by default with: diff --git a/hw/acpi/tco.c b/hw/acpi/tco.c index 9ebd3e5e64..5116b8ed61 100644 --- a/hw/acpi/tco.c +++ b/hw/acpi/tco.c @@ -21,7 +21,7 @@ enum { TCO_DAT_OUT_DEFAULT = 0x00, TCO1_STS_DEFAULT = 0x0000, TCO2_STS_DEFAULT = 0x0000, - TCO1_CNT_DEFAULT = 0x0000, + TCO1_CNT_DEFAULT = TCO_TMR_HLT, TCO2_CNT_DEFAULT = 0x0008, TCO_MESSAGE1_DEFAULT = 0x00, TCO_MESSAGE2_DEFAULT = 0x00, but this diverges from the documented spec defaults. Propose cloning to kernel - as it seems to be a regression there. And let's see what kernel devs upstream say about this. Makes sense? I'm surprised that the watchdog isn't halted at machine start. If the guest didn't know about the watchdog at all (say it was a non-Linux guest or a Linux kernel with this driver config'd out) wouldn't the machine hard reboot after N seconds? For i6300ESB the watchdog doesn't start running until the guest sets it up. (In reply to Michael S. Tsirkin from comment #14) > Propose cloning to kernel - as it seems to be a regression there. And let's > see what > kernel devs upstream say about this. Makes sense? Yes, I'm planning to contact the authors/reviewers of the upstream change for their POV (In reply to Richard W.M. Jones from comment #15) > I'm surprised that the watchdog isn't halted at machine start. If > the guest didn't know about the watchdog at all (say it was a non-Linux > guest or a Linux kernel with this driver config'd out) wouldn't the > machine hard reboot after N seconds? For i6300ESB the watchdog > doesn't start running until the guest sets it up. The idea is that the watchdog can even catch hangs in the firmware. I think the expectation is likely that the firmware disables the watchdog prior to the guest OS launching, but has a firmware menu option to leave the watchdog running if the user knows the OS has support. As mentioned above though, it doesn't really work out that way in VMs. I have a machine (Intel NUC11) with a physical iTCO watchdog. The boot messages with Fedora 36 (kernel 5.19) are: [ 25.671966] iTCO_vendor_support: vendor-support=0 [ 25.776238] iTCO_wdt iTCO_wdt: Found a Intel PCH TCO device (Version=6, TCOBASE=0x0400) [ 25.776416] iTCO_wdt iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0) At Dan's suggestion I loaded the systemtap script at the end of this comment. Doing cat /dev/watchdog0 causes stap to print both of these messages: WDT start WDT ping and then each following cat /dev/watchdog0 prints: WDT ping --- probe module("iTCO-wdt").function("iTCO_wdt_start") { print("WDT start\n") } probe module("iTCO-wdt").function("iTCO_wdt_ping") { print("WDT ping\n") } probe module("iTCO-wdt").function("iTCO_wdt_stop") { print("WDT stop\n") } probe begin { print("BEGIN\n") } In the end QEMU's impl is correct and the problems lies in Linux >= 5.15 This commit causes a regression: commit 1ae3e78c08209ac657c59f6f7ea21bbbd7f6a1d4 Author: Mika Westerberg <mika.westerberg.com> Date: Tue Sep 21 13:29:00 2021 +0300 watchdog: iTCO_wdt: No need to stop the timer in probe The watchdog core can handle pinging of the watchdog before userspace opens the device. For this reason instead of stopping the timer, just mark it as running and let the watchdog core take care of it. Cc: Malin Jonsson <malin.jonsson> Signed-off-by: Mika Westerberg <mika.westerberg.com> Reviewed-by: Guenter Roeck <linux> Link: https://lore.kernel.org/r/20210921102900.61586-1-mika.westerberg@linux.intel.com Signed-off-by: Guenter Roeck <linux> Signed-off-by: Wim Van Sebroeck <wim> it marks the watchdog as running, but does NOT disable the "no reboot" flag. This is reported to upstream maintainers listed in that commit, and a fix is in progress. The problem doesn't exist in RHEL-9 kernel since that is 5.14 and does not appear to have backported the problem |