Bug 2137346

Summary: RFE: support 'TCO' watchdog built-in to Q35 machine
Product: Red Hat Enterprise Linux 9 Reporter: Daniel Berrangé <berrange>
Component: libvirt Assignee: Martin Kletzander <mkletzan>
libvirt sub component: General QA Contact: Lili Zhu <lizhu>
Status: CLOSED ERRATA Docs Contact:
Severity: unspecified    
Priority: unspecified CC: dzheng, jdenemar, lmen, mdeng, rjones, sgott, virt-maint, xuzhang, yalzhang
Version: 9.0 Keywords: FutureFeature, Triaged
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: libvirt-9.1.0-1.el9 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-11-07 08:30:47 UTC Type: Feature Request
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version: 9.1.0
Embargoed:
Bug Depends On: 2080207    
Bug Blocks:    

Description Daniel Berrangé 2022-10-24 14:40:17 UTC
Description of problem:
The Q35 machine type chipset comes with unconditional support for a 'TCO' watchdog.

Linux guests automatically detect the TCO watchdog and load the 'iTCO_wdt' kmod to enable it.

There are two problems with this:

 * Since it is a built-in device, there's no <watchdog> element needed to enable it, and thus also no way to set the watchdog action.

 * Even if configured, a weird decision by QEMU causes the watchdog action to never be triggered unless '-global ICH9-LPC.noreboot=off' is set.


This suggests we want

      <watchdog model='tco' action='poweroff'/>

to result in setting  -watchdog-action  and the ICH9-LPC.noreboot flag.
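
As a rough illustration of the mapping proposed above, here is a minimal Python sketch. It is not libvirt's implementation; the helper name watchdog_qemu_args is hypothetical, and the only QEMU arguments it emits are the two named in this report (-watchdog-action and -global ICH9-LPC.noreboot=off). Falling back to 'reset' when no action attribute is given is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET


def watchdog_qemu_args(watchdog_xml):
    """Hypothetical helper: map the proposed <watchdog> element to QEMU args.

    Illustrative only; this is not how libvirt builds its command line.
    """
    elem = ET.fromstring(watchdog_xml)
    if elem.tag != "watchdog" or elem.get("model") != "tco":
        raise ValueError("expected <watchdog model='tco' .../>")
    # Assumption of this sketch: fall back to 'reset' if no action is given.
    action = elem.get("action", "reset")
    return [
        "-watchdog-action", action,
        # Without this flag, QEMU never triggers the watchdog action.
        "-global", "ICH9-LPC.noreboot=off",
    ]


if __name__ == "__main__":
    args = watchdog_qemu_args("<watchdog model='tco' action='poweroff'/>")
    print(" ".join(args))
```

The sketch always emits the noreboot flag, matching the behaviour asked for in this request.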

Oh, and the watchdog is currently broken (see bug 2080207), so beware that it won't work yet if you try to test this. That is rather unfortunate, given that every guest OS is being given this watchdog unconditionally, with no indication that it is broken.

Comment 2 Daniel Berrangé 2023-01-12 15:55:45 UTC
(In reply to Daniel Berrangé from comment #0)
>  * Even if configured, a weird decision by QEMU causes the watchdog action
> to never be triggered unless '-global ICH9-LPC.noreboot=off' is set.

In QEMU 8.0 git this is now changed.

  commit a6b6414f0cf04636dc3d0c21ea4a2f19b7629c93
  Author: Daniel P. Berrangé <berrange>
  Date:   Fri Dec 16 07:57:48 2022 -0500

    hw/isa: enable TCO watchdog reboot pin strap by default


IOW, with -8.0 machine type versions or later, there will be no need for the ICH9-LPC.noreboot=off flag, because the reboot pin strap will be enabled by default. It is still harmless to set the flag explicitly, though, if we want compat with old QEMU.

It is also still required to set a -watchdog-action if the user wants something other than the default QEMU behaviour of 'reset'.

IOW, if libvirt does NOTHING, then with new machine types the watchdog will work OOTB with q35 and result in guest resets.
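
To make the version dependence concrete, here is a small Python sketch of the decision. The function name needs_explicit_noreboot_off is hypothetical, and the sketch assumes upstream-style versioned machine names of the form 'pc-q35-X.Y'; aliases and downstream names are not parsed and are treated conservatively, since setting the flag explicitly is harmless.

```python
import re


def needs_explicit_noreboot_off(machine_type):
    """Hypothetical check: is '-global ICH9-LPC.noreboot=off' still required?

    Assumes upstream-style names such as 'pc-q35-7.2'. Anything else is
    treated as "unknown", and the flag is kept because it is harmless.
    """
    match = re.fullmatch(r"pc-q35-(\d+)\.(\d+)", machine_type)
    if match is None:
        # Unrecognised naming scheme: stay conservative and keep the flag.
        return True
    version = (int(match.group(1)), int(match.group(2)))
    # From the 8.0 machine types on, the reboot pin strap is on by default.
    return version < (8, 0)


if __name__ == "__main__":
    for machine in ("pc-q35-7.2", "pc-q35-8.0", "q35"):
        print(machine, needs_explicit_noreboot_off(machine))
```

Always emitting the flag would be simpler and equally safe; the check only shows where the 8.0 boundary sits.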

We still should express the watchdog in the XML though

> This suggests we want
> 
>       <watchdog model='tco' action='poweroff'/>
> 
> to result in setting  -watchdog-action  and the ICH9-LPC.noreboot flag.
> 
> Oh, and the watchdog is currently broken - see bug 2080207 - so if trying to
> test this beware it won't work yet, which is rather unfortunate given that
> all guest OS are being given this watchdog unconditionally with no info that
> it is broken.

This turned out to be a Linux kernel regression, which didn't affect RHEL-9 or earlier kernels, only Fedora 36/37.

Comment 3 Martin Kletzander 2023-01-31 11:40:06 UTC
Fixed upstream with c5340d5420012412ea298f0102cc7f113e87d89b..2dde3840b1d50e79f6b8161820fff9fe62f613a9

Comment 5 Lili Zhu 2023-03-20 03:09:07 UTC
Submitted a PR:
https://github.com/autotest/tp-libvirt/pull/4811

This PR contains some basic tests of the TCO watchdog; the tests passed.

Comment 10 errata-xmlrpc 2023-11-07 08:30:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: libvirt security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6409