RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets there. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. Failing that, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", will have a small "two-footprint" icon next to it, and will direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where each "X" is a digit). The same link will be available in a blue banner at the top of the page informing you that the bug has been migrated.
Bug 1613010 - Hibernate of host fails when KVM guest is running
Summary: Hibernate of host fails when KVM guest is running
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm
Version: 7.5
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Hai Huang
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 980840 1594286
Reported: 2018-08-06 17:55 UTC by Paul Gozart
Modified: 2024-03-25 15:06 UTC
CC List: 16 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1661276
Environment:
Last Closed: 2019-10-08 15:59:21 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 3553191 0 None None None 2020-09-24 18:16:36 UTC

Internal Links: 1819466

Comment 4 Christophe Fergeau 2018-12-17 11:09:54 UTC
Yes, it is not clear why spice would be able to block host-side hibernation; tentatively moving to QEMU.

Comment 6 Daniel Berrangé 2019-02-06 15:11:36 UTC
Libvirt already has logic which aims to inhibit shutdown while VMs are running. It should not be difficult to extend this to inhibit suspend too. We would need to make this a configurable policy for libvirtd, though.
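A minimal sketch of what such a configurable policy could look like, assuming a hypothetical boolean setting (`inhibit_sleep_while_running` is an illustrative name, not an existing libvirtd.conf option). It builds the colon-separated "what" list that systemd-logind inhibitor locks accept, keeping the existing shutdown inhibition and adding `sleep` only when the policy asks for it:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InhibitPolicy:
    # Hypothetical setting, off by default; not a real libvirtd.conf key.
    inhibit_sleep_while_running: bool = False


def inhibit_what(policy: InhibitPolicy, running_domains: int) -> Optional[str]:
    """Return the logind inhibitor "what" string, or None if no lock is needed."""
    if running_domains <= 0:
        # No running VMs: no reason to hold any inhibitor lock.
        return None
    kinds = ["shutdown"]  # the existing shutdown inhibition
    if policy.inhibit_sleep_while_running:
        kinds.append("sleep")  # the proposed, opt-in extension
    return ":".join(kinds)
```

For example, `inhibit_what(InhibitPolicy(), 2)` yields `"shutdown"`, while `inhibit_what(InhibitPolicy(True), 2)` yields `"shutdown:sleep"`. Keeping the new behavior behind a flag reflects the point that this would be a policy choice rather than a hard-coded change.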

Comment 7 Jaroslav Suchanek 2019-02-15 13:10:18 UTC
I am sorry, but I do not see the potential value in implementing such a feature.

While with shutdown inhibition the goal is to correctly shut down/suspend and then start/resume running guests, with host suspend inhibition you have two options:
a) it's off, which is the current situation: the host OS is responsible for gracefully suspending all running processes, including libvirt and qemu. The host kernel may get stuck if a (kernel) thread is not interruptible (or for other reasons).
b) it's on, then the host suspend would not proceed, as the guest would not be suspendable.

Host suspend is clearly a workstation/laptop use case, and inhibiting it may lead to undesired effects such as battery drain.

What is the expectation from this requirement? It would not provide a successful host suspend with running VMs, would it? Should we not focus on the root cause of the kernel hang instead?

Comment 8 Paul Gozart 2019-02-26 16:34:57 UTC
Where do we stand with this request please?

Comment 9 Michal Privoznik 2019-03-20 14:30:53 UTC
I agree with Jarda. What if pm-suspend called 'virsh suspend' on every running domain and then 'virsh resume' on the domains suspended earlier (note that there might be some domains suspended prior to issuing pm-suspend, and we don't want to resume those)?
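The bookkeeping in this proposal can be sketched as follows, assuming the hook captures the output of `virsh list --state-running --name` before sleeping and persists the result in a state file (the path below is hypothetical). Because only domains that were running at that moment are recorded, domains that were already paused are never resumed by the hook. The functions return the `virsh` command lines rather than executing them:

```python
import json
from pathlib import Path
from typing import List

# Hypothetical location for remembering which domains the hook paused.
STATE_FILE = Path("/run/hypothetical-vm-sleep-hook.json")


def parse_running(virsh_output: str) -> List[str]:
    """Parse `virsh list --state-running --name` output (one name per line)."""
    return [line.strip() for line in virsh_output.splitlines() if line.strip()]


def pre_sleep(virsh_output: str, state_file: Path = STATE_FILE) -> List[List[str]]:
    """Record the currently running domains and return commands to pause them."""
    running = parse_running(virsh_output)
    state_file.write_text(json.dumps(running))
    return [["virsh", "suspend", name] for name in running]


def post_resume(state_file: Path = STATE_FILE) -> List[List[str]]:
    """Return commands to resume only the domains this hook paused."""
    if not state_file.exists():
        return []
    names = json.loads(state_file.read_text())
    state_file.unlink()  # forget the list so a second resume is a no-op
    return [["virsh", "resume", name] for name in names]
```

A real hook would pass each returned list to `subprocess.run(cmd, check=True)`.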

Comment 10 Michal Privoznik 2019-03-22 19:14:53 UTC
Looking at /usr/lib64/pm-utils/, there are a lot of scripts that are called on suspend/resume. If anything, there could be one for suspending/resuming libvirt domains. Alternatively, there might be a bug in KVM that prevents host suspend. Moving to pm-utils for now to decide which way to go.

Comment 12 Jaroslav Škarvada 2019-06-14 15:47:21 UTC
Suspend/resume is primarily handled in userspace by systemd. The pm-utils are there for backward compatibility. The project is dead upstream and the package was removed from both Fedora and RHEL-8. So the benefit of adding such a helper script to pm-utils is low.
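For comparison, a sketch of how a single hook could recognize the sleep phase under both conventions. systemd-sleep runs executables in `/usr/lib/systemd/system-sleep/` with `pre` or `post` as the first argument and the sleep type (for example `suspend` or `hibernate`) as the second, while pm-utils hooks receive `suspend`/`hibernate` before sleeping and `resume`/`thaw` after waking. The phase labels returned here are illustrative assumptions:

```python
import sys
from typing import Optional, Sequence

# Illustrative phase labels; neither systemd nor pm-utils defines these names.
BEFORE, AFTER = "before-sleep", "after-wake"


def phase_from_systemd(argv: Sequence[str]) -> Optional[str]:
    """systemd-sleep hooks are called as: <hook> pre|post <sleep-type>."""
    if len(argv) < 3:
        return None
    return {"pre": BEFORE, "post": AFTER}.get(argv[1])


def phase_from_pm_utils(argv: Sequence[str]) -> Optional[str]:
    """pm-utils sleep.d hooks are called as: <hook> suspend|hibernate|resume|thaw."""
    if len(argv) < 2:
        return None
    return {
        "suspend": BEFORE,
        "hibernate": BEFORE,
        "resume": AFTER,
        "thaw": AFTER,
    }.get(argv[1])


if __name__ == "__main__":
    # When installed as a systemd-sleep hook, report the phase we were invoked for.
    phase = phase_from_systemd(sys.argv)
    if phase is not None:
        print(phase)
```

The systemd variant is the one that matters on Fedora and RHEL-8, where pm-utils is no longer shipped.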

Comment 13 Jan Synacek 2019-06-17 06:31:34 UTC
The proper components for this to be fixed in have already declined the request. What is systemd supposed to do with this? Is this another of those "we don't know where to put it" hacks that ends up in systemd?

Comment 15 Michal Privoznik 2019-06-18 15:49:51 UTC
(In reply to Jan Synacek from comment #13)
> The proper components for this to be fixed in have already declined the
> request. What is systemd supposed to do with this? Is this another of those
> "we don't know where to put it" hacks that ends up in systemd?

Let's start from the beginning, what do you think is the best place to fix this issue?

Comment 16 Daniel Berrangé 2019-06-18 16:03:42 UTC
(In reply to Michal Privoznik from comment #9)
> I agree with Jarda. What if pm-suspend called 'virsh suspend' on every
> running domain and then 'virsh resume' on the domains suspended earlier
> (note that there might be some domains suspended prior to issuing
> pm-suspend, and we don't want to resume those)?

Calling "virsh suspend" and "virsh resume" doesn't do anything useful. The VMs' execution is already suspended by virtue of the host being suspended. The only useful thing would be to call the guest agent command to sync the clock upon resume, which merely requires some notification upon host resume.
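As an illustration of the clock-sync idea, a sketch of what a post-resume step could issue, assuming the QEMU guest agent is running in each guest. `virsh domtime <domain> --now` sets the guest's system time to the host's current time (the related `--sync` option instead takes the time from the domain's RTC). The function name, the hook convention it reuses (systemd-sleep's `pre`/`post` argument), and how the running domains are obtained are assumptions:

```python
from typing import Iterable, List, Sequence


def clock_sync_commands(argv: Sequence[str], running_domains: Iterable[str]) -> List[List[str]]:
    """Return virsh commands that reset guest clocks, but only after host wake-up.

    argv follows the systemd-sleep hook convention: <hook> pre|post <sleep-type>.
    """
    if len(argv) < 2 or argv[1] != "post":
        return []  # nothing to do before sleeping
    # Requires the guest agent inside each guest to actually apply the time.
    return [["virsh", "domtime", name, "--now"] for name in running_domains]
```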

Being able to optionally inhibit suspend when VMs are running is the useful thing and that can be done in libvirt itself by talking to the systemd login manager over dbus, in the same way we inhibit shutdown already.
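A sketch of obtaining a sleep inhibitor lock from the systemd login manager. libvirt itself would presumably call logind's `Inhibit` method over D-Bus and hold the returned file descriptor; this sketch instead builds a command line for the `systemd-inhibit` wrapper, which holds the lock for as long as the wrapped command runs. The `who`/`why` strings and the wrapped command are assumptions:

```python
from typing import List, Sequence


def inhibit_sleep_argv(
    command: Sequence[str],
    who: str = "libvirt",
    why: str = "Virtual machines are running",
    mode: str = "block",
) -> List[str]:
    """Build a systemd-inhibit command line that blocks (or delays) host sleep
    while `command` is running."""
    if mode not in ("block", "delay"):
        # This sketch only supports the two classic inhibitor modes.
        raise ValueError(f"unsupported inhibitor mode: {mode!r}")
    if not command:
        raise ValueError("systemd-inhibit needs a command to run")
    return [
        "systemd-inhibit",
        "--what=sleep",
        f"--who={who}",
        f"--why={why}",
        f"--mode={mode}",
        *command,
    ]
```

For example, running `inhibit_sleep_argv(["sleep", "infinity"])` via `subprocess.run` would keep suspend and hibernate blocked until that process exits; with `mode="delay"`, logind would only postpone sleep for a limited time so the holder can prepare.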

I'm not seeing any need for extra features / changes in systemd here.

Comment 17 Paul Gozart 2019-06-21 15:41:25 UTC
Please keep in mind that the problem reported by my TAM customer is not that the processing of the KVM guest is mishandled upon host hibernation, but rather that the KVM host freezes.  The customer doesn't care so much whether the hibernation is ignored or the guest processing is interrupted, but they don't want this scenario to cause the host to lock up and require a hard reboot.

Comment 18 Daniel Berrangé 2019-06-21 15:55:23 UTC
Obviously the KVM host should not freeze/hang in this situation and that smells like a kernel bug. 

The issue is that S3/S4 of the host with VMs running is considered an unsupported scenario, so this bug ended up morphing into a way to disable suspend so that we don't get near the kernel bug in the first place.

Comment 19 Ademar Reis 2019-10-01 14:19:41 UTC
(In reply to Daniel Berrangé from comment #18)
> Obviously the KVM host should not freeze/hang in this situation and that
> smells like a kernel bug. 
> 

It's a kernel bug, but in an unsupported scenario, therefore low priority (for the kernel team). We closed many S3/S4 BZs as "CLOSED/NOTABUG" in the past. This scenario is not in our test plans, so fixing it won't be that valuable.

> The issue is that S3/S4 of the host with VMs running is considered an
> unsupported scenario, so this bug ended up morphing into a way to disable
> suspend so that we don't get near the kernel bug in the first place.

... ^^^ that's a problem we can prevent: crashing the host is very bad, so let's prevent it from ever happening. Given that, historically speaking, we had many S3/S4 issues and decided to declare it unsupported and not to test it, let's disable it.

Comment 20 Jan Synacek 2019-10-02 06:35:01 UTC
(In reply to Michal Privoznik from comment #15)
> (In reply to Jan Synacek from comment #13)
> > The proper components for this to be fixed in have already declined the
> > request. What is systemd supposed to do with this? Is this another of those
> > "we don't know where to put it" hacks that ends up in systemd?
> 
> Let's start from the beginning, what do you think is the best place to fix
> this issue?

See comment 19.

Comment 21 Michal Privoznik 2019-10-02 08:35:45 UTC
(In reply to Ademar Reis from comment #19)
> (In reply to Daniel Berrangé from comment #18)
> > Obviously the KVM host should not freeze/hang in this situation and that
> > smells like a kernel bug. 
> > 
> 
> It's a kernel bug, but in an unsupported scenario, therefore low priority
> (for the kernel team). We closed many S3/S4 BZs as "CLOSED/NOTABUG" in the
> past. This scenario is not in our test plans, so fixing it won't be that
> valuable.
> 
> > The issue is that S3/S4 of the host with VMs running is considered an
> > unsupported scenario, so this bug ended up morphing into a way to disable
> > suspend so that we don't get near the kernel bug in the first place.
> 
> ... ^^^ that's a problem we can prevent: crashing the host is very bad, so
> let's prevent it from ever happening. Given that, historically speaking, we
> had many S3/S4 issues and decided to declare it unsupported and not to test
> it, let's disable it.

Ademar, do you mean to disable suspend on libvirt level, that is - should libvirt do something which would prevent host suspend if there's a domain running?
Well, I've just tested this with 5.2.13-gentoo and was able to suspend & resume successfully with a KVM guest running. So did Jarda with RHEL-AV-8.1.0, and it worked. Therefore, prohibiting suspend in upstream libvirt looks too harsh to me because it obviously works, except with the RHEL kernel. If anything, we can make this opt-in (since libvirt has no way to test whether the kernel it's running under has the bug), at which point we would require users to change a config file, which no one is going to do. Our best shot would be a downstream-only patch.

And one philosophical question: if this is an unsupported scenario and the component where the bug clearly lies refuses to fix it, why should libvirt? I don't want it to be a dump of bug workarounds.

Comment 22 Daniel Berrangé 2019-10-02 09:03:59 UTC
(In reply to Michal Privoznik from comment #21)
> And one philosophical question: if this is an unsupported scenario and the
> component where the bug clearly lies refuses to fix it, why should
> libvirt? I don't want it to be a dump of bug workarounds.

That's not really the case here though. The original title / description of this bug report is that the host OS (probably kernel or KVM module) fails when suspending while VMs are running. This probable kernel or KVM bug was never even investigated, nor has the kernel team rejected any request to fix it since they've never been asked thus far. 

The bug was (IMHO) mistakenly turned into an RFE to block suspend when VMs are running, and systemd quite reasonably rejected that request on the basis that libvirt can already register a suspend blocker if it desires.

IMHO either we investigate the root problem with VMs running or we just admit this is going to be a WONTFIX. 

I'm reverting this bug back to its original title & component assignment, since it is clearly not a systemd problem, and any RFE to libvirt blocking suspend should be considered separately from resolution of the actual customer problem report.

Comment 24 Ademar Reis 2019-10-08 15:56:21 UTC
(In reply to Michal Privoznik from comment #21)
> (In reply to Ademar Reis from comment #19)
> > (In reply to Daniel Berrangé from comment #18)
> > > Obviously the KVM host should not freeze/hang in this situation and that
> > > smells like a kernel bug. 
> > > 
> > 
> > It's a kernel bug, but in an unsupported scenario, therefore low priority
> > (for the kernel team). We closed many S3/S4 BZs as "CLOSED/NOTABUG" in the
> > past. This scenario is not in our test plans, so fixing it won't be that
> > valuable.
> > 
> > > The issue is that S3/S4 of the host with VMs running is considered an
> > > unsupported scenario, so this bug ended up morphing into a way to disable
> > > suspend so that we don't get near the kernel bug in the first place.
> > 
> > ... ^^^ that's a problem we can prevent: crashing the host is very bad, so
> > let's prevent it from ever happening. Given that, historically speaking, we
> > had many S3/S4 issues and decided to declare it unsupported and not to test
> > it, let's disable it.
> 
> Ademar, do you mean to disable suspend on libvirt level, that is - should
> libvirt do something which would prevent host suspend if there's a domain
> running?

Yes: libvirt disabling suspend of the host if a VM is running.

> Well, I've just tested this with 5.2.13-gentoo and was able to suspend &
> resume successfully with a KVM guest running. So did Jarda with
> RHEL-AV-8.1.0, and it worked. Therefore, prohibiting suspend in upstream
> libvirt looks too harsh to me because it obviously works, except with the RHEL
> kernel. If anything, we can make this opt-in (since libvirt has no way
> to test whether the kernel it's running under has the bug), at which point
> we would require users to change a config file, which no one is going to do.
> Our best shot would be a downstream-only patch.

I'm not talking about libvirt unconditionally disabling S3/S4 upstream/everywhere. I'm talking about disabling it by default in downstream RHEL, with a configuration switch for users who want to enable it again (and end up in an unsupported state).

I know it works in many cases, but it's not reliable, and we decided long ago that we don't support S3/S4+KVM because of the multiple obscure bugs that customers hit. QE has not been testing it and we're not actively developing it. We closed and continue to close many S3/S4 bugs as WONTFIX. If you're interested in bug archaeology, please check this tracker: https://bugzilla.redhat.com/show_bug.cgi?id=923626

> 
> And one philosophical question: if this is an unsupported scenario and the
> component where the bug clearly lies refuses to fix it, why should
> libvirt? I don't want it to be a dump of bug workarounds.

The story is much simpler than that: we don't support S3/S4 with KVM in RHEL and therefore we're not allocating resources to fix bugs or test it, so libvirt disables it by default in RHEL. Users who want to run an unsupported configuration can enable it. Upstream is a different story.

We can and should re-evaluate our support statement of S3/S4 with KVM, but the truth is that right now it's not supported.

Comment 25 Ademar Reis 2019-10-08 15:59:21 UTC
With all that said, I think it's time to close this RHEL7 BZ, as obviously we're way too late to fix it there.

We already have a RHEL8-AV RFE: https://bugzilla.redhat.com/show_bug.cgi?id=1568487

