Bug 2137346 - RFE: support 'TCO' watchdog built-in to Q35 machine
Keywords:
Status: VERIFIED
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: libvirt
Version: 9.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Martin Kletzander
QA Contact: Lili Zhu
URL:
Whiteboard:
Depends On: 2080207
Blocks:
 
Reported: 2022-10-24 14:40 UTC by Daniel Berrangé
Modified: 2023-07-26 08:29 UTC (History)

Fixed In Version: libvirt-9.1.0-1.el9
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Feature Request
Target Upstream Version: 9.1.0
Embargoed:



Links
System ID                               Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker LIBVIRTAT-14594   0        None      None    None     2023-07-26 08:26:55 UTC
Red Hat Issue Tracker RHELPLAN-137435   0        None      None    None     2022-10-24 14:52:59 UTC

Description Daniel Berrangé 2022-10-24 14:40:17 UTC
Description of problem:
The Q35 machine type chipset comes with unconditional support for a 'TCO' watchdog.

Linux guests automatically detect the TCO watchdog and load the 'iTCO_wdt' kmod to enable it.

There are two problems with this:

 * Since it is a built-in device, there's no <watchdog> element needed to enable it, and thus also no way to set the watchdog action.

 * Even if configured, a weird decision by QEMU causes the watchdog action to never be triggered unless '-global ICH9-LPC.noreboot=off' is set.


This suggests we want

      <watchdog model='tco' action='poweroff'/>

to result in setting  -watchdog-action  and the ICH9-LPC.noreboot flag.
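
To make the proposed mapping concrete, here is a minimal Python sketch of how such an element could translate into QEMU arguments. It is only an illustration of the request, not libvirt's implementation: the function name `watchdog_qemu_args` is hypothetical, and the only QEMU options it emits are the two named above.

```python
import xml.etree.ElementTree as ET
from typing import List


def watchdog_qemu_args(watchdog_xml: str) -> List[str]:
    """Hypothetical helper: map a <watchdog model='tco'> element to QEMU args."""
    elem = ET.fromstring(watchdog_xml)
    if elem.tag != "watchdog" or elem.get("model") != "tco":
        raise ValueError("expected a <watchdog model='tco' .../> element")

    # The TCO watchdog is built in to Q35, so no -device is added here;
    # only the reboot pin strap and the action need to be configured.
    args = ["-global", "ICH9-LPC.noreboot=off"]

    # Pass the action through only when the XML specifies one; otherwise
    # QEMU's own default action applies.
    action = elem.get("action")
    if action is not None:
        args += ["-watchdog-action", action]
    return args


print(watchdog_qemu_args("<watchdog model='tco' action='poweroff'/>"))
```

The sketch passes the action string through unchanged; a real implementation would also need to validate it and handle any action that has no direct QEMU equivalent.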

Oh, and the watchdog is currently broken (see bug 2080207), so beware that testing this won't work yet. That is rather unfortunate, given that every guest OS is being given this watchdog unconditionally with no info that it is broken.

Comment 2 Daniel Berrangé 2023-01-12 15:55:45 UTC
(In reply to Daniel Berrangé from comment #0)
>  * Even if configured, a weird decision by QEMU causes the watchdog action
> to never be triggered unless '-global ICH9-LPC.noreboot=off' is set.

In QEMU 8.0 git this is now changed.

  commit a6b6414f0cf04636dc3d0c21ea4a2f19b7629c93
  Author: Daniel P. Berrangé <berrange>
  Date:   Fri Dec 16 07:57:48 2022 -0500

    hw/isa: enable TCO watchdog reboot pin strap by default


IOW, with -8.0 machine type versions or later, there will be no need for the ICH9-LPC.noreboot=off flag; the reboot pin strap will be enabled by default. It is still harmless to set the flag explicitly, though, if we want compat with old QEMU.

It is also still required to set a -watchdog-action if the user wants something other than the default QEMU behaviour of 'reset'.

IOW, if libvirt does NOTHING, then with new machine types the watchdog will work OOTB with q35, and result in guest resets.
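
The machine type rule above could be sketched as follows. This is a hedged illustration only: the helper name `needs_explicit_noreboot_off` and the assumption that versioned upstream machine types look like `pc-q35-<major>.<minor>` are mine, and any name that does not match that pattern (an alias such as `q35`, or a downstream name) is treated as "unknown", where setting the flag explicitly is harmless.

```python
import re


def needs_explicit_noreboot_off(machine: str) -> bool:
    """Hypothetical check: must ICH9-LPC.noreboot=off be set explicitly?"""
    match = re.fullmatch(r"pc-q35-(\d+)\.(\d+)", machine)
    if match is None:
        # Unknown or unversioned machine type: set the flag anyway,
        # since setting it explicitly is harmless.
        return True
    version = (int(match.group(1)), int(match.group(2)))
    # Machine type versions 8.0 and later enable the pin strap by default.
    return version < (8, 0)


for name in ("pc-q35-7.2", "pc-q35-8.0", "q35"):
    print(name, needs_explicit_noreboot_off(name))
```

Since the flag is harmless on new machine types, always setting it would be an equally valid and simpler choice.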

We still should express the watchdog in the XML though

> This suggests we want
> 
>       <watchdog model='tco' action='poweroff'/>
> 
> to result in setting  -watchdog-action  and the ICH9-LPC.noreboot flag.
> 
> Oh, and the watchdog is currently broken - see bug 2080207 - so if trying to
> test this beware it won't work yet, which is rather unfortunate given that
> all guest OS are being given this watchdog unconditionally with no info that
> it is broken.

This turned out to be a Linux kernel regression, which didn't affect RHEL-9 or earlier kernels, only Fedora 36/37.

Comment 3 Martin Kletzander 2023-01-31 11:40:06 UTC
Fixed upstream with c5340d5420012412ea298f0102cc7f113e87d89b..2dde3840b1d50e79f6b8161820fff9fe62f613a9

Comment 5 Lili Zhu 2023-03-20 03:09:07 UTC
Submitted a PR:
https://github.com/autotest/tp-libvirt/pull/4811

This PR contains some basic tests of the TCO watchdog; the tests passed.

