Bug 2136889 - iTCO_wdt fails to fire
Summary: iTCO_wdt fails to fire
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: qemu-kvm
Version: 8.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Michael S. Tsirkin
QA Contact: Yiqian Wei
URL:
Whiteboard:
Depends On: 2080207
Blocks:
Reported: 2022-10-21 18:21 UTC by David Teigland
Modified: 2023-03-30 02:52 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-01-31 09:42:24 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | RHELPLAN-137311 | 0 | None | None | None | 2022-10-21 18:28:13 UTC

Description David Teigland 2022-10-21 18:21:51 UTC
Description of problem:

Use the iTCO_wdt driver, which is loaded by default in a VM (do not configure a watchdog in the XML).
Open /dev/watchdog and don't ping it: the watchdog fails to reset the VM after the timeout period.
If I rmmod iTCO_wdt and load softdog or i6300ESB, then the watchdog fires as expected.
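
For readers who want to try the reproduction described above, here is a minimal sketch of a test program. It is not the reporter's actual program, whose source is not shown in this bug. The names `count_without_ping` and `extra_seconds` are hypothetical, and the ioctl number assumes the generic `_IOR()` encoding used on x86. It opens the device, reads the timeout, and then counts seconds without ever pinging.

```python
import fcntl
import os
import struct
import time

# Direction bit for "read" in the generic Linux ioctl encoding.
# Assumption: x86-style encoding; a few architectures use different bits.
_IOC_READ = 2


def _ior(type_char, nr, size):
    """Rebuild the _IOR(type, nr, size) macro from <asm-generic/ioctl.h>."""
    return (_IOC_READ << 30) | (size << 16) | (ord(type_char) << 8) | nr


# WDIOC_GETTIMEOUT is _IOR('W', 7, int) in <linux/watchdog.h>.
WDIOC_GETTIMEOUT = _ior("W", 7, struct.calcsize("i"))


def count_without_ping(path, extra_seconds=5, sleep=time.sleep):
    """Open the watchdog at `path`, never ping it, and count past the timeout.

    Hypothetical helper: if the watchdog works, the machine resets before
    the loop finishes, so only run this inside a disposable VM.
    """
    fd = os.open(path, os.O_WRONLY)  # opening the device arms the watchdog
    buf = bytearray(struct.calcsize("i"))
    fcntl.ioctl(fd, WDIOC_GETTIMEOUT, buf)  # kernel writes the timeout into buf
    (timeout,) = struct.unpack("i", buf)
    print(f"counting to timeout {timeout}...")
    for second in range(1, timeout + extra_seconds + 1):
        sleep(1)
        if second <= timeout:
            print(second)
        else:
            print(f"{second} failed to fire after {timeout} seconds")
    # fd is deliberately left open and is never written to (no keepalive).
```

Calling `count_without_ping("/dev/watchdog")` as root inside the guest should end in a reset when the watchdog is functional. Lines reporting a failure to fire indicate the behaviour described in this bug.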


Version-Release number of selected component (if applicable):

host:

$ uname -a
Linux bp-06.lab.msp.redhat.com 4.18.0-348.el8.x86_64 #1 SMP Mon Oct 4 12:17:22 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux

$ rpm -q qemu-kvm
qemu-kvm-4.2.0-59.module+el8.5.0+12817+cb650d43.x86_64


vm:
$ uname -a
Linux localhost.localdomain 5.14.0-119.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Jun 24 06:37:48 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 3 Daniel Berrangé 2022-10-24 14:26:42 UTC
After some time debugging I found this problem is already known and reported as bug 2080207 in RHEL-9. Let's treat the RHEL-9 version as the primary one to investigate, and this one merely to track any possible backport once a solution is found.

Comment 6 Daniel Berrangé 2022-10-27 17:52:59 UTC
In the end QEMU's impl is correct and the problem lies in Linux >= 5.15.

This commit causes a regression:

commit 1ae3e78c08209ac657c59f6f7ea21bbbd7f6a1d4
Author: Mika Westerberg <mika.westerberg.com>
Date:   Tue Sep 21 13:29:00 2021 +0300

    watchdog: iTCO_wdt: No need to stop the timer in probe
    
    The watchdog core can handle pinging of the watchdog before userspace
    opens the device. For this reason instead of stopping the timer, just
    mark it as running and let the watchdog core take care of it.
    
    Cc: Malin Jonsson <malin.jonsson>
    Signed-off-by: Mika Westerberg <mika.westerberg.com>
    Reviewed-by: Guenter Roeck <linux>
    Link: https://lore.kernel.org/r/20210921102900.61586-1-mika.westerberg@linux.intel.com
    Signed-off-by: Guenter Roeck <linux>
    Signed-off-by: Wim Van Sebroeck <wim>


It marks the watchdog as running, but does NOT disable the "no reboot" flag.

This has been reported to the upstream maintainers listed in that commit, and a fix is in progress.

The problem doesn't exist in the RHEL-8 kernel, since that is much older.


@David can you confirm that the guest OS you were testing with has a Linux >= 5.15 kernel?
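
The version check asked for above can be sketched as follows. This is a minimal illustration, with hypothetical helper names `parse_kernel_release` and `is_at_least`, that compares the leading numeric part of a `uname -r` style string against 5.15.

```python
import re


def parse_kernel_release(release):
    """Return (major, minor, patch) from the start of a kernel release string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def is_at_least(release, minimum=(5, 15, 0)):
    """True if the numeric version in `release` is >= `minimum`."""
    return parse_kernel_release(release) >= minimum


# Release strings taken from this report (guest and host).
for release in ("5.14.0-119.el9.x86_64", "4.18.0-348.el8.x86_64"):
    print(release, is_at_least(release))
```

Note that the guest kernel in this report, 5.14.0-119.el9, compares below 5.15 yet still shows the failure. Distribution kernels may carry backported patches, so a version comparison like this is only a first hint and not conclusive.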

Comment 7 David Teigland 2022-10-27 18:25:27 UTC
Thanks for the update, yes the guest OS is 5.14.0-119.el9

Comment 8 David Teigland 2022-10-27 20:19:08 UTC
> > @David can you confirm that the guest OS you were testing with has a Linux >= 5.15 kernel
> yes the guest OS is 5.14.0-119.el9

Rereading that, I'm not sure my answer made sense... the guest kernel I'm testing is 5.14.0-119.el9 and iTCO_wdt fails to reset the vm.

$ ./a.out /dev/watchdog
counting to timeout 30...
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31 failed to fire after 30 seconds
32 failed to fire after 30 seconds
^C

[localhost ~]$ uname -a
Linux localhost.localdomain 5.14.0-119.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Jun 24 06:37:48 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux

[localhost ~]$ wdctl 
Device:        /dev/watchdog0
Identity:      iTCO_wdt [version 0]
Timeout:       30 seconds
Pre-timeout:    0 seconds
Timeleft:      21 seconds
FLAG           DESCRIPTION               STATUS BOOT-STATUS
KEEPALIVEPING  Keep alive ping reply          1           0
MAGICCLOSE     Supports magic close char      0           0
SETTIMEOUT     Set timeout (in seconds)       0           0

Comment 9 Daniel Berrangé 2022-10-28 08:22:42 UTC
I expect you have not set the flag "-global ICH9-LPC.noreboot=false" for QEMU. This is required to enable the watchdog in QEMU, and is tracked by bug 2137346 for libvirt integration.
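
As an illustration of where that flag goes, here is a small sketch that appends it to a QEMU argument list. Only the `-global ICH9-LPC.noreboot=false` option comes from the comment above. The binary path, the q35 machine type, the memory size, and the helper name `with_itco_enabled` are assumptions chosen for the example.

```python
# The option quoted in the comment above, split into argv form.
NOREBOOT_OFF = ["-global", "ICH9-LPC.noreboot=false"]


def with_itco_enabled(qemu_args):
    """Return a copy of `qemu_args` with the ICH9 'no reboot' flag cleared.

    Hypothetical helper: it does not validate the rest of the command line.
    """
    args = list(qemu_args)
    if "ICH9-LPC.noreboot=false" not in args:
        args += NOREBOOT_OFF
    return args


# Assumed base command line; adjust the binary path and options to your setup.
base = ["/usr/libexec/qemu-kvm", "-machine", "q35", "-m", "2048"]
print(" ".join(with_itco_enabled(base)))
```

When the guest is managed by libvirt, the command line is generated for you, which is why the libvirt integration is tracked separately in the bug mentioned above.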

Comment 13 Michael S. Tsirkin 2023-01-31 09:42:24 UTC
ok we are in agreement here. close/nextrelease this one.

