Bug 1049844 - [RFE] balloon: empty large balloon before hibernating
Summary: [RFE] balloon: empty large balloon before hibernating
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm
Version: 7.0
Hardware: x86_64
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Luiz Capitulino
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Duplicates: 1049845
Depends On:
Blocks: Virt-S3/S4-7.0 1049845
Reported: 2014-01-08 10:55 UTC by zhoujunqin
Modified: 2015-12-15 05:30 UTC (History)
CC List: 11 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-12-11 16:34:02 UTC
Target Upstream Version:
Embargoed:


Attachments
the qemu log file for guest qcow22 (2.04 KB, text/plain)
2014-02-07 06:40 UTC, zhoujunqin
the kernel logs for the guest (2.52 KB, text/plain)
2014-02-07 08:19 UTC, zhoujunqin
kernel log for guest qtest1 (2.72 KB, text/plain)
2014-02-10 08:11 UTC, zhoujunqin
kernel log for guest (8G) (2.72 KB, text/plain)
2014-02-10 09:32 UTC, zhoujunqin
kernel log for guest (200M) (141.86 KB, image/jpeg)
2014-02-10 09:35 UTC, zhoujunqin

Description zhoujunqin 2014-01-08 10:55:05 UTC
Description of problem:
S4 (suspend to disk) fails after a running guest's memory is changed from 4G to 2G.

Version-Release number of selected component (if applicable):
kernel-3.10.0-65.el7.x86_64
libvirt-1.1.1-17.el7.x86_64
qemu-kvm-rhev-1.5.3-31.el7.x86_64
qemu-guest-agent-1.5.3-31.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare a RHEL 7 guest with 4G of memory and the qemu-guest-agent service installed.
# virsh start qcow22 (with 4G mem)
# virsh list
 Id    Name                           State
----------------------------------------------------
 35    qcow22                         running
# virsh dominfo qcow22
 Id:             35
 Name:           qcow22
 UUID:           f0295000-dc19-481e-8379-8592c5e437f8
 OS Type:        hvm
 State:          running
 CPU(s):         1
 CPU time:       11.7s
 Max memory:     4194304 KiB
 Used memory:    4194304 KiB
 Persistent:     yes
 Autostart:      disable
 Managed save:   no
 Security model: selinux
 Security DOI:   0
 Security label: system_u:system_r:svirt_t:s0:c53,c111 (enforcing)
2. Set the guest's memory to 2G.
# virsh setmem qcow22 2G

# virsh dominfo qcow22
 Id:             35
 Name:           qcow22
 UUID:           f0295000-dc19-481e-8379-8592c5e437f8
 OS Type:        hvm
 State:          running
 CPU(s):         1
 CPU time:       12.5s
 Max memory:     4194304 KiB
 Used memory:    2097152 KiB
 Persistent:     yes
 Autostart:      disable
 Managed save:   no
 Security model: selinux
 Security DOI:   0
 Security label: system_u:system_r:svirt_t:s0:c53,c111 (enforcing)

3. Do S3 and S4 from the host. S3 is executed successfully, but S4 fails.
# virsh dompmsuspend qcow22 --target mem
Domain qcow22 successfully suspended
# virsh list
 Id    Name                           State
----------------------------------------------------
 2     qcow22                          pmsuspended

# virsh dompmwakeup qcow22
Domain qcow22 successfully woken up

# virsh dompmsuspend qcow22 --target disk
error: Domain qcow22 could not be suspended
error: internal error: unable to execute QEMU agent command 'guest-suspend-disk': child process has failed to suspend

4. I can also hit this issue on RHEL 6.5.

Actual results:
S4 fails after a running guest's memory is changed from 4G to 2G.
Expected results:
S4 succeeds after a running guest's memory is changed from 4G to 2G.
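
The amount of memory held by the balloon can be read off the `virsh dominfo` output in the steps above as "Max memory" minus "Used memory". A minimal Python sketch, assuming the output format shown in this report; the helper name `ballooned_kib` is hypothetical and not part of libvirt:

```python
import re

def ballooned_kib(dominfo_text):
    """Return Max memory - Used memory (in KiB) parsed from `virsh dominfo` text."""
    fields = {}
    for line in dominfo_text.splitlines():
        match = re.match(r"\s*(Max memory|Used memory):\s+(\d+) KiB", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields["Max memory"] - fields["Used memory"]

# Values copied from step 2 of this report (after `virsh setmem qcow22 2G`).
after_setmem = """
 Max memory:     4194304 KiB
 Used memory:    2097152 KiB
"""
print(ballooned_kib(after_setmem))  # 2097152 KiB, i.e. 2 GiB held by the balloon
```

With the step 1 values (both fields at 4194304 KiB) the same helper returns 0, meaning the balloon is empty before `setmem`.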

Additional info:

Comment 2 Jiri Denemark 2014-01-08 14:05:16 UTC
This is most likely a guest issue, as it seems the guest fails to suspend when half of its memory is consumed by the balloon driver. I'm moving this bug to qemu for further investigation.

Comment 3 Jiri Denemark 2014-01-08 14:16:50 UTC
*** Bug 1049845 has been marked as a duplicate of this bug. ***

Comment 4 Amit Shah 2014-01-28 13:19:41 UTC
Please post guest logs.

Does S4 without ballooning work fine?

Comment 5 zhoujunqin 2014-02-07 06:40:57 UTC
Created attachment 860381 [details]
the qemu log file for guest qcow22

Comment 6 Amit Shah 2014-02-07 07:35:58 UTC
I mean the guest kernel logs, where it shows s4 failed (and may have the reason).

Also, please answer the other question in comment 4.

Comment 7 zhoujunqin 2014-02-07 07:47:21 UTC
(In reply to Amit Shah from comment #6)
> I mean the guest kernel logs, where it shows s4 failed (and may have the
> reason).
> 
> Also, please answer the other question in comment 4.

Hi Amit Shah,
Sorry for the late reply; I was away for the Spring Festival.
1) I will post the guest kernel logs again.
2) For your question: Does S4 without ballooning work fine?
I have tried it, and S4 succeeds both with and without the balloon device, but I can't run setmem successfully without ballooning.
Is this the expected result? Please help check, thanks.
# virsh setmem qcow22 2G
error: Requested operation is not valid: Unable to change memory of active domain without the balloon device and guest OS balloon driver

Comment 8 zhoujunqin 2014-02-07 08:19:21 UTC
Created attachment 860390 [details]
the kernel logs for the guest

Comment 9 Amit Shah 2014-02-07 11:28:58 UTC
(In reply to zhoujunqin from comment #8)
> Created attachment 860390 [details]
> the kernel logs for the guest

The guest log looks alright.  Looks like the guest suspended and resumed fine in this run?

(In reply to zhoujunqin from comment #7)
> 2)For your question:Does S4 without ballooning work fine?
> i have tried that we can do S4 successfully with ballooning or not,but i
> can't setmem successfully without ballooning.
> was this the expected result? please help have a check, thanks.
> # virsh setmem qcow22 2G
> error: Requested operation is not valid: Unable to change memory of active
> domain without the balloon device and guest OS balloon driver

Not sure - this looks like a libvirt error, and it indicates the balloon device or the driver might be missing.

Did you start the VM with the balloon device?

Did the guest have enough time to boot itself (i.e. all modules are loaded)?

Is the virtio-balloon module loaded (and not blacklisted)?

Comment 10 zhoujunqin 2014-02-08 03:29:10 UTC
(In reply to Amit Shah from comment #9)
> Not sure - this looks like a libvirt error, and it indicates the balloon
> device or the driver might be missing.
> 
> Did you start the VM with the balloon device?

No, in this situation I started the VM without a balloon device.
# virsh dumpxml qcow22
    <memballoon model='none'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </memballoon>
> 
> Did the guest have enough time to boot itself (i.e. all modules are loaded)?
> 
> Is the virtio-balloon module loaded (and not blacklisted)?
After the guest had booted up fully, I checked:
# lsmod |grep balloon

The virtio-balloon module is not loaded. So if we start a guest without a balloon device and then try to setmem on the guest, we will hit this problem. Is that right? Thanks.

Comment 11 Amit Shah 2014-02-10 05:50:34 UTC
(In reply to zhoujunqin from comment #10)

Without the balloon device, the setmem command will definitely not work.

My intention in asking this question was whether S4 works fine without the balloon device, or without ballooning (meaning start with balloon device, but don't issue any setmem command).  In this scenario, does S4 work fine?  And in the scenario of issuing setmem before S4, does S4 not work at all?

In case S4 does not work, attach the guest kernel logs here.

Comment 12 zhoujunqin 2014-02-10 06:53:41 UTC
(In reply to Amit Shah from comment #11)

To answer your question in two parts:
1) S4 works fine without the balloon device (4G).
# virsh dumpxml qtest1 |grep balloon
    <memballoon model='none'>
      <alias name='balloon0'/>
    </memballoon>
# virsh dompmsuspend qtest1 --target disk 
Domain qtest1 successfully suspended
2) S4 works fine with the balloon device before setmem (from 4G to 2G).
# virsh dumpxml qtest1 |grep balloon
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </memballoon>
# virsh dompmsuspend qtest1 --target disk 
Domain qtest1 successfully suspended

If you need more information, just ask. Thanks.

Comment 13 Amit Shah 2014-02-10 07:24:05 UTC
OK so please get me guest kernel logs when you issue setmem before S4 and when the suspend fails to work.

Comment 14 zhoujunqin 2014-02-10 08:11:28 UTC
Created attachment 861275 [details]
kernel log for guest qtest1

Comment 15 Amit Shah 2014-02-10 08:37:58 UTC
Nothing looks wrong, this looks like the guest kernel is running out of memory to create the S4 image.

Please try a lower setmem value (200M), or a bigger guest (8G instead of 4G).

Comment 16 zhoujunqin 2014-02-10 09:31:21 UTC
(In reply to Amit Shah from comment #15)
> Nothing looks wrong, this looks like the guest kernel is running out of
> memory to create the S4 image.
> 
> Please try a lower setmem value (200M), or a bigger guest (8G instead of 4G).

I tried both suggestions, but S4 still failed.
1) A lower setmem value (200M):
# virsh setmem qtest1 200M
# virsh dominfo qtest1
Id:             9
Name:           qtest1
UUID:           9b76e092-3941-4e36-abf9-ed6817b467dd
OS Type:        hvm
State:          running
CPU(s):         1
CPU time:       36.5s
Max memory:     7791616 KiB
Used memory:    560492 KiB
Persistent:     yes
Autostart:      disable
Managed save:   no
Security model: selinux
Security DOI:   0
Security label: system_u:system_r:svirt_t:s0:c261,c353 (enforcing)
The guest rebooted and then stopped, as shown in the attached picture.
[root@zjqm ~]# virsh dompmsuspend qtest1 --target disk 
error: Domain qtest1 could not be suspended
error: Guest agent is not responding: Guest agent not available for now
2) A bigger guest (8G instead of 4G):
# virsh dominfo qtest1
Id:             8
Name:           qtest1
UUID:           9b76e092-3941-4e36-abf9-ed6817b467dd
OS Type:        hvm
State:          running
CPU(s):         1
CPU time:       12.0s
Max memory:     7791616 KiB
Used memory:    7791616 KiB
Persistent:     yes
Autostart:      disable
Managed save:   no
Security model: selinux
Security DOI:   0
Security label: system_u:system_r:svirt_t:s0:c319,c755 (enforcing)

# virsh dompmsuspend qtest1 --target disk 
Domain qtest1 successfully suspended

# virsh start qtest1 
Domain qtest1 started

# virsh setmem qtest1 2G

# virsh dompmsuspend qtest1 --target disk 
error: Domain qtest1 could not be suspended
error: internal error: unable to execute QEMU agent command 'guest-suspend-disk': child process has failed to suspend
I will attach the kernel log for this run in the next comment.

Comment 17 zhoujunqin 2014-02-10 09:32:05 UTC
Created attachment 861304 [details]
kernel log for guest (8G)

Comment 18 zhoujunqin 2014-02-10 09:35:58 UTC
Created attachment 861305 [details]
kernel log for guest (200M)

Comment 20 Luiz Capitulino 2014-08-26 18:58:46 UTC
Amit,

I've debugged this issue and would like to discuss with you what's the most appropriate resolution for it.

What's happening is that, as the guest was ballooned down, the kernel doesn't have enough memory left to allocate for the hibernation image. So, memory allocation just fails when trying to hibernate and the process is aborted.

This might seem obvious and expected, given that the guest kernel is low on memory. However, the guest balloon driver does have support for automatically releasing ballooned memory on suspend/hibernation. This is indeed what happens when suspend/hibernate is successful: the balloon driver releases *all* the ballooned memory; this is done by virtballoon_freeze(). This action is not taken when the hibernation fails because virtballoon_freeze() is only called *after* the PM subsystem has allocated memory for the hibernation image. If this allocation fails, which is the case here, then hibernation is aborted before virtballoon_freeze() runs.

So, why can't we add support to virtio for the Linux PM's _prepare_ callback? This callback is called really early during suspend/hibernation and we have a chance to empty the balloon before the hibernation image is allocated. I believe this would fix this problem. What do you think?
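
The ordering problem described in this comment can be illustrated with a toy model. This is only a sketch in Python, not kernel code: the function name, the MiB figures, and the `deflate_in_prepare` switch are hypothetical, and the model assumes only what is stated above (the image is allocated before virtballoon_freeze() runs).

```python
def try_hibernate(total_mib, ballooned_mib, in_use_mib, image_mib, deflate_in_prepare):
    """Toy model of the hibernation ordering; returns (ok, steps)."""
    steps = []
    if deflate_in_prepare:
        # Proposed behaviour: empty the balloon before the image is allocated.
        ballooned_mib = 0
        steps.append("prepare: balloon emptied")
    free_mib = total_mib - ballooned_mib - in_use_mib
    if free_mib < image_mib:
        # Current failure mode: abort before the freeze step is ever reached.
        steps.append("image allocation failed, hibernation aborted")
        return False, steps
    steps.append("image allocated")
    # Current behaviour: the balloon is only emptied at freeze time.
    ballooned_mib = 0
    steps.append("freeze: balloon emptied")
    return True, steps

# Hypothetical numbers: 4G guest ballooned down by 2G, 1500 MiB in use,
# and an image that needs 1000 MiB.
print(try_hibernate(4096, 2048, 1500, 1000, deflate_in_prepare=False))
print(try_hibernate(4096, 2048, 1500, 1000, deflate_in_prepare=True))
```

In the first call only 548 MiB is free when the image is allocated, so the model aborts without ever reaching the freeze step; in the second call the early deflate leaves 2596 MiB free and the hibernation proceeds.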

Comment 23 Amit Shah 2014-08-27 06:16:08 UTC
(In reply to Luiz Capitulino from comment #20)

Hm; kernel/power/hibernate.c:hibernation_snapshot() doesn't indicate there's much difference between a ->prepare() and ->freeze().

I did look at these differences when I did the initial support, and didn't find a way to get the balloon to free memory before the allocation of the hibernation image.

We need an ->early() callback, or better still, a hibernation notifier which tells us the system is about to go into hibernation.  At the time when I wrote the initial patches, the notifiers were platform-dependent, and I chose to not extend them then, and go ahead with the patches in the current state as they address the most common cases.

This bug could be taken as an RFE for a new callback that's executed before the allocation of the hibernation image.  There are some caveats to implementing it, but it's quite possible.

Comment 24 Luiz Capitulino 2014-08-27 13:34:20 UTC
Agreed. I'm willing to implement it, although not right now.

I have a few more questions, they are just for my own education and to help me implement this feature. Hope they are not stupid:

1. Why do we empty the balloon before S3/S4 today? Is it that we can't maintain some state between sleep/hibernation and resume?

2. On resume, virtballoon_restore() checks if there's any balloon operation "pending" from the hypervisor. Shouldn't it restore the balloon to its state prior to sleeping/hibernating instead? For example, today if you have 2560 pages in the balloon and you go to sleep, virtballoon_freeze() will release those pages before sleeping. When the guest resumes, the balloon will be empty (unless the guest sends a new balloon operation). Shouldn't virtballoon_restore() restore the balloon back to 2560 pages?

Thanks for your answers Amit!

Comment 25 Amit Shah 2014-09-01 05:47:15 UTC
(In reply to Luiz Capitulino from comment #24)
> Agreed. I'm willing to implement it, although not right now.
> 
> I have a few more questions, they are just for my own education and to help
> me implement this feature. Hope they are not stupid:

Of course not!

> 1. Why do we empty the balloon before S3/S4 today? Is it that we can't
> maintain some state between sleep/hibernation and resume?

For s4 / hibernation: qemu quits when s4 is successful.  Upon next start, the balloon value may or may not be preserved from before s4.  If there is a mismatch between the two balloon values, there'll be problems, i.e. libvirt would have to maintain information about the previous balloon value and whether a guest went into s4.

For s3, now that you ask, I think it's not necessary to empty the balloon.  In fact, it may even be helpful for the host, since a sleeping guest isn't going to use any memory, and the host can use some of it.

> 2. On resume, virtballoon_restore() checks if there's any balloon operation
> "pending" from the hypervisor. Shouldn't it restore the balloon to its state
> prior to sleeping/hibernating instead? For example, today if you have 2560
> pages in the balloon and you go to sleep, virtballoon_freeze() will release
> those pages before sleeping. When the guest resumes, the balloon will be
> empty (unless the guest sends a new balloon operation). Shouldn't
> virtballoon_restore() restore the balloon back to 2560 pages?

The state shouldn't be restored in case the host changed the balloon allocation while the guest was sleeping.  This is most obvious in the S4 case: qemu quits, libvirt starts guest with a different balloon size.  The guest, on resume, should use the new value.
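
The resume behaviour discussed in this comment can be sketched as a small Python model. The class and method names below are hypothetical stand-ins for virtballoon_freeze() and virtballoon_restore(), and the model assumes only what the thread states: the balloon is emptied on freeze, and on resume the guest follows whatever size the host currently requests rather than a saved value.

```python
class BalloonModel:
    """Toy model of the balloon across hibernation; not the real driver."""

    def __init__(self, pages=0):
        self.pages = pages

    def freeze(self):
        # Release all ballooned pages before sleeping/hibernating.
        self.pages = 0

    def restore(self, host_target_pages):
        # Follow the host's current request instead of the pre-sleep state.
        self.pages = host_target_pages

balloon = BalloonModel(pages=2560)
balloon.freeze()
# Hypothetical case: the guest was restarted with a different balloon size.
balloon.restore(host_target_pages=1024)
print(balloon.pages)  # 1024
```

Restoring the saved 2560 pages here would disagree with the size the host requested after the restart, which is the mismatch this comment warns about.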

Comment 27 Luiz Capitulino 2015-07-13 20:48:48 UTC
New feature, moving to 7.3.

Comment 28 Luiz Capitulino 2015-12-11 16:34:02 UTC
I re-discussed this BZ with Amit and it turns out that we don't support S3/S4 for RHEL7 and there are no plans to support them for RHEL8.

So, given that this is more like a feature for good S3/S4 support, I'll just close it as WONTFIX.

