Bug 1323988
Summary: | Ballooning doesn't work on power with 4k guests
---|---
Product: | Red Hat Enterprise Linux 7
Reporter: | Dr. David Alan Gilbert <dgilbert>
Component: | qemu-kvm-rhev
Assignee: | Thomas Huth <thuth>
Status: | CLOSED DUPLICATE
QA Contact: | Virtualization Bugs <virt-bugs>
Severity: | medium
Docs Contact: |
Priority: | medium
Version: | 7.3
CC: | chayang, dgibson, hannsj_uhl, juzhang, knoel, mdeng, michen, qzhang, thuth, virt-maint
Target Milestone: | rc
Target Release: | ---
Hardware: | ppc64le
OS: | Linux
Whiteboard: |
Fixed In Version: |
Doc Type: | Bug Fix
Doc Text: |
Story Points: | ---
Clone Of: |
: | 1324092
Environment: |
Last Closed: | 2016-04-29 14:43:49 UTC
Type: | Bug
Regression: | ---
Mount Type: | ---
Documentation: | ---
CRM: |
Verified Versions: |
Category: | ---
oVirt Team: | ---
RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | ---
Target Upstream Version: |
Embargoed: |
Bug Depends On: |
Bug Blocks: | 1308609, 1359843
Attachments: |
Description
Dr. David Alan Gilbert
2016-04-05 09:43:47 UTC
Drat. I thought I'd fixed the balloon on Power way back at IBM, but it looks like that was just fixing it to the point of not blowing up when trying to use it, not actually ballooning usefully.

So... the virtio-balloon is kinda broken by design, but we'll have to do what we can with what we have for now (virtio standardization efforts apparently have a new, better balloon design).

IIRC, the balloon is described by the spec as working in 4kiB chunks, so TARGET_PAGE_SIZE is Just Plain Wrong on that front.

I think what we need to do is to batch contiguous 4kiB chunks listed by the guest until they make a whole host page, then do the madvise().

(In reply to David Gibson from comment #3)
Yes, you'd have to be careful about anything that doesn't start on a host page or anything like that. I'm not sure if there's anything that the guest could know to help it only make sane inflation requests.

I've now tested the balloon on our P8 server, and at first glance it seems to be working - I can see the amount of free memory in the host going up when I decrease the memory of the guest via the balloon.
However, after looking at the code of QEMU and the kernel a little bit closer, it is clear that this is not working as expected *and might even cause memory corruption in some cases*:

- The madvise syscall in the kernel rounds up the length parameter to a multiple of the host PAGE_SIZE = 65536 (see mm/madvise.c):
  len = (len_in + ~PAGE_MASK) & PAGE_MASK;
- The madvise syscall in the kernel returns with an error if the address is not aligned to the host PAGE_SIZE.

So for the very first 4k chunk of the 64k page, the madvise() succeeds, but for all other chunks, the call is rejected. This means that if QEMU tries to free the whole 64k page, it is accidentally working right. But if it only tries to free the 4k chunks that are not aligned to 64k, then it silently fails. *And if it only tries to free the first 4k chunk without all the others, then even guest memory corruption might occur!*

We definitely need to fix this...

Ah! So it's kind of the reverse of what we first thought. In fact, in the common configuration (64kiB pages on both host and guest) it will work (by accident). But if you have a 4kiB page guest on a 64kiB page host you can get data corruption. Nasty.

Same solution, AFAICT, though.

Hi developers,
As we all know, 64kiB guests are supported on a ppc host. Is it possible for QE to create a 4kiB guest on a ppc host? If there is a way, please tell us. Thanks in advance.
Min

Min,

Yes, it's possible to create a guest using 4kiB pages, but it will require building a custom kernel. AFAIK all current distributions supporting Power use 64kiB pages by default.

(In reply to David Gibson from comment #8)
Could you please help to build such a custom kernel? Thanks a lot.
Min

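The two madvise() checks described in the analysis above can be illustrated with a small model. This is only a sketch of the arithmetic, assuming a 64k host page and 4k balloon chunks as stated in the thread; `model_madvise_range` and the macro names are hypothetical and are not taken from the kernel or from QEMU.

```c
#include <stdint.h>

/* Assumed sizes from the discussion: 64k host pages, 4k balloon chunks. */
#define HOST_PAGE_SIZE 65536ULL
#define HOST_PAGE_MASK (~(HOST_PAGE_SIZE - 1))
#define BALLOON_CHUNK  4096ULL

/*
 * Hypothetical model of the two checks quoted above.
 * Returns -1 if 'start' is not aligned to the host page size (the call
 * would be rejected), otherwise the length after rounding up, i.e. the
 * number of bytes the call would actually cover.
 */
static int64_t model_madvise_range(uint64_t start, uint64_t len_in)
{
    if (start & ~HOST_PAGE_MASK) {
        return -1;                      /* unaligned address: error */
    }
    /* Same rounding as: len = (len_in + ~PAGE_MASK) & PAGE_MASK; */
    return (int64_t)((len_in + ~HOST_PAGE_MASK) & HOST_PAGE_MASK);
}
```

Under these assumptions, `model_madvise_range(0, BALLOON_CHUNK)` covers the full 65536 bytes even though only 4096 were requested, which is the corruption case, while `model_madvise_range(4096, BALLOON_CHUNK)` returns -1, which is the silent failure case.
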
I can try to help here. FWIW, I already tried to compile an upstream kernel with 4k pages, but I got an error while trying to compile it... I'll try a downstream kernel next and will then let you know the results.

Suggested patch upstream:
https://patchwork.ozlabs.org/patch/609982/

Created attachment 1147048
KernelPanic
Failure on my guest.

(In reply to dengmin from comment #14)
Looking at that screenshot, the only thing I could imagine is that you accidentally tried to install my 4k kernel on a little endian guest. Could you please verify that your guest is a big endian installation, not a little endian one?

Decreased the priority/severity a little bit since the problem only occurs with 4k guests on 64k hosts - and 4k (server) guests are rather hard to find in the wild nowadays.

Created attachment 1147105
screenshotfor64kitB

Created attachment 1147106
64kib

I'm closing this ticket for PPC, since there are no 4k guests available in the wild (all major distros seem to use 64k page size guests nowadays), and we thus don't have a real problem here. We can still track the issue itself in BZ 1324092 (but we certainly do not need two tickets to track this).

*** This bug has been marked as a duplicate of bug 1324092 ***
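
For reference, the batching idea proposed earlier in this thread (collect 4kiB chunks until they make up a whole host page, then do the madvise()) could look roughly like the sketch below. It is not the suggested upstream patch. `struct balloon_batch`, `balloon_batch_add` and the macros are hypothetical names, and the sketch assumes 64k host pages, 4k chunks, and that only one partially filled host page is tracked at a time.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed sizes from the discussion: 64k host pages, 4k balloon chunks. */
#define HOST_PAGE_SIZE  65536ULL
#define BALLOON_CHUNK   4096ULL
#define CHUNKS_PER_PAGE (HOST_PAGE_SIZE / BALLOON_CHUNK)    /* 16 */
#define FULL_MASK       ((1u << CHUNKS_PER_PAGE) - 1)

/* Hypothetical batching state: one partially ballooned host page. */
struct balloon_batch {
    uint64_t base;   /* host-page-aligned address being collected */
    uint32_t seen;   /* bit i set: chunk i of that page was ballooned */
    bool active;     /* false until the first chunk arrives */
};

/*
 * Record one 4k chunk at 'addr'. Returns true and stores the host page
 * address in '*release' once every chunk of that host page has been
 * seen, meaning the caller could now madvise() the whole page.
 * A chunk belonging to a different host page restarts the batch, so an
 * incomplete page is never released.
 */
static bool balloon_batch_add(struct balloon_batch *b, uint64_t addr,
                              uint64_t *release)
{
    uint64_t base = addr & ~(HOST_PAGE_SIZE - 1);
    unsigned idx = (unsigned)((addr - base) / BALLOON_CHUNK);

    if (!b->active || b->base != base) {
        b->base = base;
        b->seen = 0;
        b->active = true;
    }
    b->seen |= 1u << idx;

    if (b->seen != FULL_MASK) {
        return false;                   /* host page still incomplete */
    }
    b->active = false;
    *release = base;
    return true;
}
```

Tracking chunks in a bitmap rather than counting them means a chunk reported twice cannot complete a page early. The address passed to madvise() would then always be host-page aligned and the length a full host page, which avoids both failure modes described above.
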