Bug 2168008
| Summary: | libvirt watchdog action=dump should not resume the vm after coredump the guest | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | yalzhang <yalzhang> |
| Component: | libvirt | Assignee: | Virtualization Maintenance <virt-maint> |
| libvirt sub component: | General | QA Contact: | Lili Zhu <lizhu> |
| Status: | CLOSED MIGRATED | Docs Contact: | |
| Severity: | unspecified | ||
| Priority: | unspecified | CC: | fjin, hutao, lmen, mkletzan, mprivozn, rjones, virt-maint |
| Version: | 9.2 | Keywords: | MigratedToJIRA, Triaged |
| Target Milestone: | rc | Flags: | pm-rhel:
mirror+
|
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2023-07-07 21:23:07 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
yalzhang@redhat.com
2023-02-08 02:22:25 UTC
The issue here is that the watchdog software inside the guest and the watchdog emulation done by qemu is not expecting a watchdog that fires and then resumes the guest. Real watchdog hardware would never behave like this - it's always expected that if the watchdog fires, the machine is reset, reboots, and watchdog initialization is done over again. An effect of this is that after the watchdog has fired once, it will never fire again (making the watchdog ineffective until a human intervenes and reboots the machine). This would be, of course, a considerable change in the behaviour of action=dump, but I think we have to accept that we got this wrong initially. I think we just need to change this code so instead of starting up the vCPUs, it kills the domain: https://gitlab.com/libvirt/libvirt/-/blob/5155ab4b2a704285505dfea6ffee8b980fdaa29e/src/qemu/qemu_driver.c#L3459 It has been like this for ages. Does this mean that it never worked properly or something else changed? So whatever happens after the dump is not described anywhere. Anyone might expect the domain to get reset, destroyed, paused etc. The way this works has not changed since its inception in commit e19cdbfcf16dfec7308f8926b1660533c10bcc7d. Since this is not a qemu specificality, the dump is translated to pause by libvirt, we could theoretically add other actions, for example "dump+reset", "dump+pause", "dump+destroy", but I would rather not change the default, although we could document the current (maybe soon to be legacy) behaviour. We could use this BZ for documenting the behaviour of the option as it is now and if the other ones are wanted we can get another BZ for those. Would that be fine with you? Adding Hu Tao who was the original author of this commit: https://gitlab.com/libvirt/libvirt/-/commit/e19cdbfcf16dfec7308f8926b1660533c10bcc7d (In reply to Michal Privoznik from comment #2) > It has been like this for ages. Does this mean that it never worked properly > or something else changed? I mean, it works to some extent, it just screws up the watchdog state after the first time it fires. (In reply to Martin Kletzander from comment #3) > Since this is not a qemu specificality, the dump is translated to pause by > libvirt, we could theoretically add other actions, for example "dump+reset", > "dump+pause", "dump+destroy", but I would rather not change the default, > although we could document the current (maybe soon to be legacy) behaviour. > > We could use this BZ for documenting the behaviour of the option as it is > now and if the other ones are wanted we can get another BZ for those. Would > that be fine with you? dump+pause isn't really a good idea, but I guess so. I wonder if Hu Tao will have an opinion about this. Maybe Fujitsu depend on a particular behaviour. (In reply to Martin Kletzander from comment #3) > So whatever happens after the dump is not described anywhere. Anyone might > expect the domain to get reset, destroyed, paused etc. The way this works > has not changed since its inception in commit > e19cdbfcf16dfec7308f8926b1660533c10bcc7d. > > Since this is not a qemu specificality, the dump is translated to pause by > libvirt, we could theoretically add other actions, for example "dump+reset", > "dump+pause", "dump+destroy", but I would rather not change the default, > although we could document the current (maybe soon to be legacy) behaviour. > > We could use this BZ for documenting the behaviour of the option as it is > now and if the other ones are wanted we can get another BZ for those. Would > that be fine with you? It's fine for me. I'm expecting more input and discussion and finally we can get a solution fine with everyone. |