Bug 2080207

Summary:	qemu iTCO watchdog (ICH9 TCO ACPI device) does not work
Product:	Red Hat Enterprise Linux 9	Reporter:	Richard W.M. Jones <rjones>
Component:	qemu-kvm	Assignee:	Michael S. Tsirkin <mst>
qemu-kvm sub component:	Devices	QA Contact:	Yiqian Wei <yiwei>
Status:	CLOSED NOTABUG	Docs Contact:
Severity:	unspecified
Priority:	unspecified	CC:	berrange, coli, jinzhao, juzhang, kchamart, mst, virt-maint
Version:	9.1	Keywords:	Triaged
Target Milestone:	rc	Flags:	pm-rhel: mirror+
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2022-10-27 17:49:42 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	2136889, 2137346

Description Richard W.M. Jones 2022-04-29 09:04:26 UTC

Description of problem:

I couldn't get the qemu watchdog device to work, or it's somehow
incompatible with the Linux iTCO_wdt driver.

Version-Release number of selected component (if applicable):

qemu-kvm-7.0.0-1.el9.x86_64

How reproducible:

100%

Steps to Reproduce:

* Install (eg) a Fedora or other Linux guest on RHEL 9.1 host.

* Inside the guest, download and compile the watchdog test framework:
  http://git.annexia.org/?p=watchdog-test-framework.git;a=summary
  (Alternatively this may be compiled on the host, and the static binary
  copied into the guest)

* Shut down the guest and edit the libvirt XML configuration, adding:
  <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  ...
    <qemu:commandline>
      <qemu:arg value='-global'/>
      <qemu:arg value='ICH9-LPC.noreboot=false'/>
    </qemu:commandline>
  </domain>

* Start up the guest again and check that
  (a) -global ICH9-LPC.noreboot=false appears on the qemu command line
  (b) inside the guest, the TCO device is seen by the guest kernel:
   [    3.652611] iTCO_vendor_support: vendor-support=0
   [    3.665967] iTCO_wdt iTCO_wdt.1.auto: Found a ICH9 TCO device (Version=2, TCOBASE=0x0660)
   [    3.671363] iTCO_wdt iTCO_wdt.1.auto: initialized. heartbeat=30 sec (nowayout=0)

* Run the watchdog-test-framework binary inside the guest and follow instructions.

* The guest should reboot once the test framework stops pinging the guest.

However for me, the guest does *not* reboot at the last step, indicating
that something in qemu (or the guest kernel driver) is not working.

I see:

Stopping pings.
The watchdog action should happen at approximately 30 second mark.

0...
3...
6...
9...
12...
15...
18...
21...
24...
27...
30...   <- reboot should have happened around this point
33...
36...
[continues forever]

Comment 2 Richard W.M. Jones 2022-05-18 08:50:01 UTC

>    -device i6300esb,id=wdt0,bus=pcie-pci-bridge-0 \

It's probably using this device, not the iTCO watchdog.

Comment 3 Yiqian Wei 2022-05-19 02:28:19 UTC

(In reply to Richard W.M. Jones from comment #2)
> >    -device i6300esb,id=wdt0,bus=pcie-pci-bridge-0 \
> 
> It's probably using this device, not the iTCO watchdog.

I tried with the iTCO watchdog,Can reproduce this bug,thanks

Comment 4 Yiqian Wei 2022-05-19 02:51:55 UTC

(In reply to Yiqian Wei from comment #3)
> (In reply to Richard W.M. Jones from comment #2)
> > >    -device i6300esb,id=wdt0,bus=pcie-pci-bridge-0 \
> > 
> > It's probably using this device, not the iTCO watchdog.
> 
> I tried with the iTCO watchdog,Can reproduce this bug,thanks

Reproduce cmd:
boot a guest with "-global ICH9-LPC.noreboot=false"

/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -blockdev node-name=file_ovmf_code,driver=file,filename=/usr/share/edk2/ovmf/OVMF_CODE.secboot.fd,auto-read-only=on,discard=unmap \
    -blockdev node-name=drive_ovmf_code,driver=raw,read-only=on,file=file_ovmf_code \
    -blockdev node-name=file_ovmf_vars,driver=file,filename=/home/yiwei/OVMF_VARS.fd,auto-read-only=on,discard=unmap \
    -blockdev node-name=drive_ovmf_vars,driver=raw,read-only=off,file=file_ovmf_vars \
    -machine q35,memory-backend=mem-machine_mem,pflash0=drive_ovmf_code,pflash1=drive_ovmf_vars \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x2 \
    -m 12G \
    -object memory-backend-ram,size=12G,id=mem-machine_mem  \
    -smp 6   \
    -cpu IvyBridge,enforce \
    -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
    -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-2,addr=0x0 \
    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/yiwei/rhel9.1-ovmf.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
    -device virtio-net-pci,mac=9a:df:ca:53:c2:69,id=idz43iXV,netdev=idPOEPyA,bus=pcie-root-port-3,addr=0x0  \
    -netdev tap,id=idPOEPyA,vhost=on   \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -monitor stdio  \
    -qmp tcp:0:4444,server=on,wait=off \
    -global ICH9-LPC.noreboot=false \

Comment 7 Richard W.M. Jones 2022-10-20 13:00:39 UTC

I can take this bug if Michael doesn't want it.

Comment 8 Daniel Berrangé 2022-10-24 14:19:05 UTC

I've spent some time debugging the iTCO_wdt support in QEMU before realizing this BZ existed. 

The core of the problem is that the second part of this condition in tco_timer_expired() never becomes true:

        if (!lpc->pin_strap.spkr_hi && !(gcs & ICH9_CC_GCS_NO_REBOOT)) {

The 'gcs' value is to be written by the guest to enable/disable the watchdog.

    uint32_t gcs = pci_get_long(lpc->chip_config + ICH9_CC_GCS);


The 'lpc->chip_config' memory region is handled by the  lpc_ich9.c file rcrb_mmio_ops callbacks, specifically  'ich9_cc_write'.

If I add a trace point to this method, I can see it being invoked twice during Linux guest boot

  ich9_cc_read addr 0x3410 val=0x20 len=4
  ich9_cc_write addr 0x3410 val=0x0 len=4
  ich9_cc_read addr 0x3410 val=0x0 len=4

  ich9_cc_read addr 0x3410 val=0x0 len=4
  ich9_cc_write addr 0x3410 val=0x20 len=4
  ich9_cc_read addr 0x3410 val=0x20 len=4

Looking at the Linux guest driver, iTCO_wdt.c, this corresponds to the iTCO_wdt_probe method where it tests if the watchdog is functional


	/* Check chipset's NO_REBOOT bit */
	if (p->update_no_reboot_bit(p->no_reboot_priv, false) &&
	    iTCO_vendor_check_noreboot_on()) {
		dev_info(dev, "unable to reset NO_REBOOT flag, device disabled by hardware/BIOS\n");
		return -ENODEV;	/* Cannot reset NO_REBOOT bit */
	}

	/* Set the NO_REBOOT bit to prevent later reboots, just for sure */
	p->update_no_reboot_bit(p->no_reboot_priv, true);


So initially the memory region writes/reads are happening as expected

The 'p->update_no_reboot_bit' method should subsequently be called when the watchdog driver starts in iTCO_wdt_start

	/* disable chipset's NO_REBOOT bit */
	if (p->update_no_reboot_bit(p->no_reboot_priv, false)) {
		spin_unlock(&p->io_lock);
		dev_err(wd_dev->parent, "failed to reset NO_REBOOT flag, reboot disabled by hardware/BIOS\n");
		return -EIO;
	}


A systemtap probe in the guest shows that 'iTCO_wdt_start' is being invoked, however, 'ich9_cc_write' is never called in QEMU

IOW, QEMU is never seeing that the guest driver is trying to toggle the noreboot flag in the LPC chip config memory region, and so the watchdog will never fire upon expiry.

Comment 10 Daniel Berrangé 2022-10-24 14:27:14 UTC

Also reported on RHEL-8 as bug 2136889, but lets keep investigation focus on this RHEL9 bug.

Comment 11 Daniel Berrangé 2022-10-24 15:04:54 UTC

Incidentally, the following shows guest kernel messages during iTCO_wdt.ko loading phase, interleaved with some QEMU trace points I added

[    0.933340] lpc_ich 0000:00:1f.0: I/O space for GPIO uninitialized
[    0.935767] iTCO_vendor_support: vendor-support=0
ich9_cc_read addr 0x3410 val=0x20 len=4
ich9_cc_write addr 0x3410 val=0x0 len=4
ich9_cc_read addr 0x3410 val=0x0 len=4
ich9_cc_read addr 0x3410 val=0x0 len=4
ich9_cc_write addr 0x3410 val=0x20 len=4
ich9_cc_read addr 0x3410 val=0x20 len=4
[    0.940306] iTCO_wdt iTCO_wdt.1.auto: Found a ICH9 TCO device (Version=2, TCOBASE=0x0660)
tco_io_write 0x4 0x8
tco_io_write 0x6 0x2
tco_io_write 0x6 0x4
tco_io_read 0x8
tco_io_read 0x12
tco_io_write 0x12 0x32
tco_io_read 0x12
tco_io_write 0x0 0x1
tco_timer_reload ticks=50 (30000 ms)
[    0.940836] iTCO_wdt iTCO_wdt.1.auto: initialized. heartbeat=30 sec (nowayout=0)
[    0.946827] i801_smbus 0000:00:1f.3: SMBus using PCI interrupt
[    0.947191] i2c i2c-0: 1/1 memory slots populated (from DMI)
[    0.947480] i2c i2c-0: Memory type 0x07 not supported yet, not instantiating SPD

Comment 12 Daniel Berrangé 2022-10-24 18:15:11 UTC

(In reply to Daniel Berrangé from comment #8)

> The 'p->update_no_reboot_bit' method should subsequently be called when the
> watchdog driver starts in iTCO_wdt_start
> 
> 	/* disable chipset's NO_REBOOT bit */
> 	if (p->update_no_reboot_bit(p->no_reboot_priv, false)) {
> 		spin_unlock(&p->io_lock);
> 		dev_err(wd_dev->parent, "failed to reset NO_REBOOT flag, reboot disabled
> by hardware/BIOS\n");
> 		return -EIO;
> 	}
> 
> 
> A systemtap probe in the guest shows that 'iTCO_wdt_start' is being invoked,
> however, 'ich9_cc_write' is never called in QEMU

I was wrong about that. Only iTCO_wdt_ping is being invoked when /dev/watchdog0 is opened.

IOW, Linux thinks the watchdog is already active at boot.

QEMU's impl has

    TCO1_CNT_DEFAULT        = 0x0000,

The spec https://uefi.org/sites/default/files/resources/Watchdog%20Descriptor%20Table.pdf  

Section 2.2.7 TCO1_CNT – TCO1 Control Register says this about bit 11

[quote]

1 = The WDT will halt. It will not count, and thus cannot reach a value that would cause a
timeout.
0 = The WDT is enabled to count (default). Note that the WDT will not reset or reload
when this bit is changed. If the WDT is halted and then enabled, the WDT will continue
counting starting at the value where it was halted.
[/quote]

Since we've set TCO1_CNT to 0x0,  bit 11 is clear, and thus the WDT is presumed to be running

Thus Linux will NOT invoke iTCO_wdt_start, only iTCO_wdt_ping

As a result, it will never try to clear the 'noreboot' flag, and thus the watchdog will never fire.

Comment 13 Daniel Berrangé 2022-10-24 18:43:18 UTC

I've found that the RHEL-8  kernel 4.18.0-431.el8.x86_64 works fine with the iTCO watchdog impl by QEMU.

I believe the behaviour change is caused by this kernel commit.

commit 1ae3e78c08209ac657c59f6f7ea21bbbd7f6a1d4
Author: Mika Westerberg <mika.westerberg.com>
Date:   Tue Sep 21 13:29:00 2021 +0300

    watchdog: iTCO_wdt: No need to stop the timer in probe
    
    The watchdog core can handle pinging of the watchdog before userspace
    opens the device. For this reason instead of stopping the timer, just
    mark it as running and let the watchdog core take care of it.
    
    Cc: Malin Jonsson <malin.jonsson>
    Signed-off-by: Mika Westerberg <mika.westerberg.com>
    Reviewed-by: Guenter Roeck <linux>
    Link: https://lore.kernel.org/r/20210921102900.61586-1-mika.westerberg@linux.intel.com
    Signed-off-by: Guenter Roeck <linux>
    Signed-off-by: Wim Van Sebroeck <wim>

diff --git a/drivers/watchdog/iTCO_wdt.c b/drivers/watchdog/iTCO_wdt.c
index 16ad2c9842c2..3f2f4343644f 100644
--- a/drivers/watchdog/iTCO_wdt.c
+++ b/drivers/watchdog/iTCO_wdt.c
@@ -423,6 +423,16 @@ static unsigned int iTCO_wdt_get_timeleft(struct watchdog_device *wd_dev)
        return time_left;
 }
 
+static void iTCO_wdt_set_running(struct iTCO_wdt_private *p)
+{
+       u16 val;
+
+       /* Bit 11: TCO Timer Halt -> 0 = The TCO timer is * enabled */
+       val = inw(TCO1_CNT(p));
+       if (!(val & BIT(11)))
+               set_bit(WDOG_HW_RUNNING, &p->wddev.status);
+}
+
 /*
  *     Kernel Interfaces
  */
@@ -562,8 +572,7 @@ static int iTCO_wdt_probe(struct platform_device *pdev)
        watchdog_set_drvdata(&p->wddev, p);
        platform_set_drvdata(pdev, p);
 
-       /* Make sure the watchdog is not running */
-       iTCO_wdt_stop(&p->wddev);
+       iTCO_wdt_set_running(p);
 
        /* Check that the heartbeat value is within it's range;
           if not reset to the default */


Prior to that change, Linux would explicitly set the Timer Halt bit, and set the Noreboot bit.

After this change, Linux has itself set the Noreboot bit, but just probes for the current Timer Halt bit status.

This leads to broken setup when the current Timer Halt bit is unset at boot time, which is the state with QEMU.

The spec indicates that the Timer Halt bit must be unset initially, however, it also indicates that the BIOS would typically set this bit once running, so the spec does NOT imply that it will be unset when the OS boots. SeaBIOS/OVMF don't touch the iTCO WDT AFAICT, so in QEMU the guest OS is getting the iTCO in the exact state QEMU initializes.

Possibly this Linux change was made based on assumption about what firmware would do to set the initial iTCO state and thus is not matching how QEMU sets it up.

So I'm still not sure which is at fault.

I can at least make QEMU work with both old and new linux guest behaviour by setting the Timer Halt bit by default with:

diff --git a/hw/acpi/tco.c b/hw/acpi/tco.c
index 9ebd3e5e64..5116b8ed61 100644
--- a/hw/acpi/tco.c
+++ b/hw/acpi/tco.c
@@ -21,7 +21,7 @@ enum {
     TCO_DAT_OUT_DEFAULT     = 0x00,
     TCO1_STS_DEFAULT        = 0x0000,
     TCO2_STS_DEFAULT        = 0x0000,
-    TCO1_CNT_DEFAULT        = 0x0000,
+    TCO1_CNT_DEFAULT        = TCO_TMR_HLT,
     TCO2_CNT_DEFAULT        = 0x0008,
     TCO_MESSAGE1_DEFAULT    = 0x00,
     TCO_MESSAGE2_DEFAULT    = 0x00,


but this diverges from the documented spec defaults.

Comment 14 Michael S. Tsirkin 2022-10-24 19:16:51 UTC

Propose cloning to kernel - as it seems to be a regression there. And let's see what
kernel devs upstream say about this. Makes sense?

Comment 15 Richard W.M. Jones 2022-10-24 21:26:39 UTC

I'm surprised that the watchdog isn't halted at machine start.  If
the guest didn't know about the watchdog at all (say it was a non-Linux
guest or a Linux kernel with this driver config'd out) wouldn't the
machine hard reboot after N seconds?  For i6300ESB the watchdog
doesn't start running until the guest sets it up.

Comment 16 Daniel Berrangé 2022-10-25 08:20:47 UTC

(In reply to Michael S. Tsirkin from comment #14)
> Propose cloning to kernel - as it seems to be a regression there. And let's
> see what
> kernel devs upstream say about this. Makes sense?

Yes, I'm planning to contact the authors/reviewers of the upstream change for their POV

(In reply to Richard W.M. Jones from comment #15)
> I'm surprised that the watchdog isn't halted at machine start.  If
> the guest didn't know about the watchdog at all (say it was a non-Linux
> guest or a Linux kernel with this driver config'd out) wouldn't the
> machine hard reboot after N seconds?  For i6300ESB the watchdog
> doesn't start running until the guest sets it up.

The idea is that the watchdog can even catch hangs in the firmware. I think the expectation is likely that the firmware disables the watchdog prior to the guest OS launching, but has a firmware menu option to leave the watchdog running if the user knows the OS has support. As mentioned above though, it doesn't really work out that way in VMs.

Comment 17 Richard W.M. Jones 2022-10-25 08:49:50 UTC

I have a machine (Intel NUC11) with a physical iTCO watchdog.  The boot
messages with Fedora 36 (kernel 5.19) are:

[   25.671966] iTCO_vendor_support: vendor-support=0
[   25.776238] iTCO_wdt iTCO_wdt: Found a Intel PCH TCO device (Version=6, TCOBASE=0x0400)
[   25.776416] iTCO_wdt iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)

At Dan's suggestion I loaded the systemtap script at the end of this comment.

Doing cat /dev/watchdog0 causes stap to print both of these messages:

WDT start
WDT ping

and then each following cat /dev/watchdog0 prints:

WDT ping

---

probe module("iTCO-wdt").function("iTCO_wdt_start") {
  print("WDT start\n")
}
 
probe module("iTCO-wdt").function("iTCO_wdt_ping") {
  print("WDT ping\n")
}
 
probe module("iTCO-wdt").function("iTCO_wdt_stop") {
  print("WDT stop\n")
}
 
probe begin {
  print("BEGIN\n")
}

Comment 18 Daniel Berrangé 2022-10-27 17:49:42 UTC

In the end QEMU's impl is correct and the problems lies in Linux >= 5.15

This commit causes a regression:

commit 1ae3e78c08209ac657c59f6f7ea21bbbd7f6a1d4
Author: Mika Westerberg <mika.westerberg.com>
Date:   Tue Sep 21 13:29:00 2021 +0300

    watchdog: iTCO_wdt: No need to stop the timer in probe
    
    The watchdog core can handle pinging of the watchdog before userspace
    opens the device. For this reason instead of stopping the timer, just
    mark it as running and let the watchdog core take care of it.
    
    Cc: Malin Jonsson <malin.jonsson>
    Signed-off-by: Mika Westerberg <mika.westerberg.com>
    Reviewed-by: Guenter Roeck <linux>
    Link: https://lore.kernel.org/r/20210921102900.61586-1-mika.westerberg@linux.intel.com
    Signed-off-by: Guenter Roeck <linux>
    Signed-off-by: Wim Van Sebroeck <wim>


it marks the watchdog as running, but does NOT disable the "no reboot" flag.

This is reported to upstream maintainers listed in that commit, and a fix is in progress.

The problem doesn't exist in RHEL-9 kernel since that is 5.14 and does not appear to have backported the problem