Bug 1049844
| Summary: | [RFE] balloon: empty large balloon before hibernating |
|---|---|
| Product: | Red Hat Enterprise Linux 7 |
| Component: | qemu-kvm |
| Status: | CLOSED WONTFIX |
| Severity: | medium |
| Priority: | medium |
| Version: | 7.0 |
| Target Milestone: | rc |
| Target Release: | --- |
| Hardware: | x86_64 |
| OS: | Unspecified |
| Reporter: | zhoujunqin <juzhou> |
| Assignee: | Luiz Capitulino <lcapitulino> |
| QA Contact: | Virtualization Bugs <virt-bugs> |
| CC: | amit.shah, dyuan, hhuang, jdenemar, juzhang, juzhou, knoel, mzhan, rbalakri, virt-maint, zhwang |
| Keywords: | FutureFeature |
| Doc Type: | Enhancement |
| Story Points: | --- |
| : | 1049845 |
| Last Closed: | 2015-12-11 16:34:02 UTC |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| Category: | --- |
| oVirt Team: | --- |
| Cloudforms Team: | --- |
| Bug Blocks: | 923626, 1049845 |

Description
zhoujunqin
2014-01-08 10:55:05 UTC
This is most likely a guest issue as it seems the guest fails to suspend when half of its memory was consumed by balloon driver. I'm moving this bug to qemu for further investigation.

*** Bug 1049845 has been marked as a duplicate of this bug. ***

Please post guest logs. Does S4 without ballooning work fine?

Created attachment 860381 [details]
the qemu log file for guest qcow22
I mean the guest kernel logs, where it shows s4 failed (and may have the reason).

Also, please answer the other question in comment 4.

(In reply to Amit Shah from comment #6)
> I mean the guest kernel logs, where it shows s4 failed (and may have the
> reason).
>
> Also, please answer the other question in comment 4.

Hi Amit Shah, sorry for replying so late; I was away for the Spring Festival.

1) I will post the guest kernel logs again.

2) For your question, "Does S4 without ballooning work fine?": I found that S4 succeeds with or without ballooning, but I can't setmem successfully without ballooning. Is this the expected result? Please help check, thanks.

    # virsh setmem qcow22 2G
    error: Requested operation is not valid: Unable to change memory of active domain without the balloon device and guest OS balloon driver

Created attachment 860390 [details]
the kernel logs for the guest
(In reply to zhoujunqin from comment #8)
> Created attachment 860390 [details]
> the kernel logs for the guest

The guest log looks alright. Looks like the guest suspended and resumed fine in this run?

(In reply to zhoujunqin from comment #7)
> 2) For your question, "Does S4 without ballooning work fine?": I found that
> S4 succeeds with or without ballooning, but I can't setmem successfully
> without ballooning. Is this the expected result? Please help check, thanks.
> # virsh setmem qcow22 2G
> error: Requested operation is not valid: Unable to change memory of active
> domain without the balloon device and guest OS balloon driver

Not sure - this looks like a libvirt error, and it indicates the balloon device or the driver might be missing.

Did you start the VM with the balloon device?

Did the guest have enough time to boot itself (i.e. all modules are loaded)?

Is the virtio-balloon module loaded, and not blacklisted?

(In reply to Amit Shah from comment #9)
> Did you start the VM with the balloon device?

No, in this situation I started the VM without the balloon device.

    # virsh dumpxml qcow22
    <memballoon model='none'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </memballoon>

> Did the guest have enough time to boot itself (i.e. all modules are loaded)?
>
> Is the virtio-balloon module loaded, and not blacklisted?

After the guest had booted up fully, I checked:

    # lsmod |grep balloon

The virtio-balloon module is not loaded. So if we start a guest without the balloon device and then try to setmem the guest, we will hit this problem, is that right? Thanks.

(In reply to zhoujunqin from comment #10)
> The virtio-balloon module is not loaded. So if we start a guest without the
> balloon device and then try to setmem the guest, we will hit this problem,
> is that right? Thanks.

Without the balloon device, the setmem command will definitely not work.

My intention in asking this question was whether S4 works fine without the balloon device, or without ballooning (meaning start with balloon device, but don't issue any setmem command). In this scenario, does S4 work fine? And in the scenario of issuing setmem before S4, does S4 not work at all?

In case S4 does not work, attach the guest kernel logs here.

(In reply to Amit Shah from comment #11)
> My intention in asking this question was whether S4 works fine without the
> balloon device, or without ballooning (meaning start with balloon device,
> but don't issue any setmem command). In this scenario, does S4 work fine?
> And in the scenario of issuing setmem before S4, does S4 not work at all?
>
> In case S4 does not work, attach the guest kernel logs here.

To answer your question in two parts:

1) S4 works fine without the balloon device (4G).

    # virsh dumpxml qtest1 |grep balloon
    <memballoon model='none'>
      <alias name='balloon0'/>
    </memballoon>
    # virsh dompmsuspend qtest1 --target disk
    Domain qtest1 successfully suspended

2) S4 works fine with the balloon device before setmem (from 4G to 2G).

    # virsh dumpxml qtest1 |grep balloon
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </memballoon>
    # virsh dompmsuspend qtest1 --target disk
    Domain qtest1 successfully suspended

If you need more information, ask me, thanks.

OK so please get me guest kernel logs when you issue setmem before S4 and when the suspend fails to work.

Created attachment 861275 [details]
kernel log for guest qtest1
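
The balloon-device check that comes up in the comments above can be scripted before calling `virsh setmem`. This is only a minimal sketch: it assumes the domain XML arrives on stdin in the form printed by `virsh dumpxml`, and the function names `balloon_model` and `has_balloon` are made up for illustration.

```shell
# Print the model attribute of the first <memballoon> element found in the
# libvirt domain XML read from stdin. Prints nothing if there is no such element.
balloon_model() {
    sed -n "s/.*<memballoon[^>]*model=['\"]\([^'\"]*\)['\"].*/\1/p" | head -n 1
}

# Succeed only if the XML on stdin declares a balloon device whose model is
# something other than "none" (hypothetical helper, not part of libvirt).
has_balloon() {
    local model
    model=$(balloon_model)
    [ -n "$model" ] && [ "$model" != "none" ]
}
```

A possible use is `virsh dumpxml qtest1 | has_balloon && virsh setmem qtest1 2G`, which skips the setmem call for a guest defined with `<memballoon model='none'>`. It does not verify that the guest has loaded the virtio-balloon driver; that still needs a check inside the guest such as the `lsmod` call shown earlier.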
Nothing looks wrong, this looks like the guest kernel is running out of memory to create the S4 image.

Please try a lower setmem value (200M), or a bigger guest (8G instead of 4G).

(In reply to Amit Shah from comment #15)
> Nothing looks wrong, this looks like the guest kernel is running out of
> memory to create the S4 image.
>
> Please try a lower setmem value (200M), or a bigger guest (8G instead of 4G).

I have tried both of your suggestions, but both failed.

1) setmem value (200M):

    # virsh setmem qtest1 200M
    # virsh dominfo qtest1
    Id: 9
    Name: qtest1
    UUID: 9b76e092-3941-4e36-abf9-ed6817b467dd
    OS Type: hvm
    State: running
    CPU(s): 1
    CPU time: 36.5s
    Max memory: 7791616 KiB
    Used memory: 560492 KiB
    Persistent: yes
    Autostart: disable
    Managed save: no
    Security model: selinux
    Security DOI: 0
    Security label: system_u:system_r:svirt_t:s0:c261,c353 (enforcing)

The guest rebooted and then stopped, as shown in picture1.

    [root@zjqm ~]# virsh dompmsuspend qtest1 --target disk
    error: Domain qtest1 could not be suspended
    error: Guest agent is not responding: Guest agent not available for now

2) a bigger guest (8G instead of 4G):

    # virsh dominfo qtest1
    Id: 8
    Name: qtest1
    UUID: 9b76e092-3941-4e36-abf9-ed6817b467dd
    OS Type: hvm
    State: running
    CPU(s): 1
    CPU time: 12.0s
    Max memory: 7791616 KiB
    Used memory: 7791616 KiB
    Persistent: yes
    Autostart: disable
    Managed save: no
    Security model: selinux
    Security DOI: 0
    Security label: system_u:system_r:svirt_t:s0:c319,c755 (enforcing)
    # virsh dompmsuspend qtest1 --target disk
    Domain qtest1 successfully suspended
    # virsh start qtest1
    Domain qtest1 started
    # virsh setmem qtest1 2G
    # virsh dompmsuspend qtest1 --target disk
    error: Domain qtest1 could not be suspended
    error: internal error: unable to execute QEMU agent command 'guest-suspend-disk': child process has failed to suspend

I will put the kernel log for this run in the next comment.

Created attachment 861304 [details]
kernel log for guest (8G)
Created attachment 861305 [details]
kernel log for guest (200M)
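
The `virsh dominfo` output quoted above already shows how far a guest has been ballooned down: the gap between "Max memory" and "Used memory". A minimal sketch for extracting that gap follows; it assumes dominfo-style `Key: value KiB` lines on stdin, and `ballooned_kib` is a made-up helper name. Treating the difference as the balloon size is an approximation for illustration, not a statement about libvirt internals.

```shell
# Read `virsh dominfo` output on stdin and print "Max memory" minus
# "Used memory" in KiB. Prints 0 if neither field is present.
ballooned_kib() {
    awk -F': *' '
        $1 == "Max memory"  { max  = $2 + 0 }   # "+ 0" drops the " KiB" suffix
        $1 == "Used memory" { used = $2 + 0 }
        END { print max - used }
    '
}
```

For example, `virsh dominfo qtest1 | ballooned_kib` on the 200M run above would report 7231124 KiB (7791616 minus 560492), while the freshly started 8G guest would report 0.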
Amit,

I've debugged this issue and would like to discuss with you what's the most appropriate resolution for it.

What's happening is that, as the guest was ballooned down, the kernel doesn't have enough memory left to allocate for the hibernation image. So, memory allocation just fails when trying to hibernate and the process is aborted.

This might seem obvious and expected, given that the guest kernel is low on memory. However, the guest balloon driver does have support for automatically releasing ballooned memory on suspend/hibernation. This is indeed what happens when suspend/hibernate is successful: the balloon driver releases *all* the ballooned memory, and this is done by virtballoon_freeze(). This action is not taken when the hibernation fails because virtballoon_freeze() is only called *after* the PM subsystem has allocated memory for the hibernation image. If this fails, which is the case here, then hibernation is aborted before virtballoon_freeze() runs.

So, why can't we add support to virtio for the Linux PM's _prepare_ callback? This callback is called really early during suspend/hibernation and we have a chance to empty the balloon before the hibernation image is allocated. I believe this would fix this problem. What do you think?

(In reply to Luiz Capitulino from comment #20)
> So, why can't we add support to virtio for the Linux PM's _prepare_
> callback? This callback is called really early during suspend/hibernation
> and we have a chance to empty the balloon before the hibernation image is
> allocated. I believe this would fix this problem. What do you think?

Hm; kernel/power/hibernate.c:hibernation_snapshot() doesn't indicate there's much difference between a ->prepare() and ->freeze(). I did look at these differences when I did the initial support, and didn't find a way to get the balloon to free memory before the allocation of the hibernation image.

We need an ->early() callback, or better still, a hibernation notifier which tells us the system is about to go into hibernation.

At the time when I wrote the initial patches, the notifiers were platform-dependent, and I chose to not extend them then, and go ahead with the patches in the current state as they address the most common cases.

This bug could be taken as an RFE for a new callback that's executed before the allocation of the hibernation image. There are some caveats to implementing it, but it's quite possible.

Agreed. I'm willing to implement it, although not right now.

I have a few more questions, they are just for my own education and to help me implement this feature. Hope they are not stupid:

1. Why do we empty the balloon before S3/S4 today? Is it that we can't maintain some state between sleep/hibernation and resume?

2. On resume, virtballoon_restore() checks if there's any balloon operation "pending" from the hypervisor. Shouldn't it restore the balloon to its state prior to sleeping/hibernating instead? For example, today if you have 2560 pages in the balloon and you go to sleep, virtballoon_freeze() will release those pages before sleeping. When the guest resumes, the balloon will be empty (unless the guest sends a new balloon operation). Shouldn't virtballoon_restore() restore the balloon back to 2560 pages?

Thanks for your answers Amit!

(In reply to Luiz Capitulino from comment #24)
> Agreed. I'm willing to implement it, although not right now.
>
> I have a few more questions, they are just for my own education and to help
> me implement this feature. Hope they are not stupid:

Of course not!

> 1. Why do we empty the balloon before S3/S4 today? Is it that we can't
> maintain some state between sleep/hibernation and resume?

For s4 / hibernation: qemu quits when s4 is successful. Upon next start, the balloon value may or may not be preserved from before s4. If there is a mismatch between the two balloon values, there'll be problems, i.e. libvirt will have to maintain information about the previous balloon value and whether a guest went into s4.

For s3, now that you ask, I think it's not necessary to empty the balloon. In fact, it may even be helpful for the host, since a sleeping guest isn't going to use any memory, and the host can use some of it.

> 2. On resume, virtballoon_restore() checks if there's any balloon operation
> "pending" from the hypervisor. Shouldn't it restore the balloon to its state
> prior to sleeping/hibernating instead? For example, today if you have 2560
> pages in the balloon and you go to sleep, virtballoon_freeze() will release
> those pages before sleeping. When the guest resumes, the balloon will be
> empty (unless the guest sends a new balloon operation). Shouldn't
> virtballoon_restore() restore the balloon back to 2560 pages?

The state shouldn't be restored in case the host changed the balloon allocation while the guest was sleeping. This is most obvious in the S4 case: qemu quits, libvirt starts guest with a different balloon size. The guest, on resume, should use the new value.

New feature, moving to 7.3.

I re-discussed this BZ with Amit and it turns out that we don't support S3/S4 for RHEL7 and there are no plans to support them for RHEL8. So, given that this is more like a feature for good S3/S4 support, I'll just close it as WONTFIX.
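
The ordering problem from the analysis in this thread (the hibernation image is allocated before virtballoon_freeze() gets a chance to empty the balloon) can be illustrated with a toy model. This is not kernel code: `try_hibernate`, its arguments, and the sizes used are all invented for illustration, and "free memory" is simplified to guest RAM minus the balloon.

```shell
# Toy model of one hibernation attempt. All sizes are in MiB and made up.
#   $1 = guest RAM, $2 = memory held by the balloon,
#   $3 = memory needed for the hibernation image,
#   $4 = "late"  : balloon emptied only in the freeze step (behaviour in this bug)
#        "early" : balloon emptied before image allocation (the requested change)
try_hibernate() {
    local total=$1 balloon=$2 image=$3 order=$4
    local free

    if [ "$order" = early ]; then
        balloon=0                     # requested: deflate before allocating
    fi

    free=$(( total - balloon ))
    if [ "$free" -lt "$image" ]; then
        echo "aborted: image allocation failed"
        return 1                      # the freeze step is never reached
    fi

    balloon=0                         # freeze step: balloon is emptied here
    echo "suspended"
}
```

With these invented numbers, `try_hibernate 4096 3000 2000 late` aborts because only 1096 MiB are free when the image is allocated, while `try_hibernate 4096 3000 2000 early` succeeds. That mirrors the reported behaviour, where S4 works before setmem and fails after the guest has been ballooned down.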