Bug 1323988 - Ballooning doesn't work on power with 4k guests
Summary: Ballooning doesn't work on power with 4k guests
Keywords:
Status: CLOSED DUPLICATE of bug 1324092
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.3
Hardware: ppc64le
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Thomas Huth
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: RHEV4.0PPC RHV4.1PPC
 
Reported: 2016-04-05 09:43 UTC by Dr. David Alan Gilbert
Modified: 2016-07-25 14:18 UTC
CC List: 10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1324092 (view as bug list)
Environment:
Last Closed: 2016-04-29 14:43:49 UTC
Target Upstream Version:
Embargoed:


Attachments
KernelPanic (32.00 KB, image/png), 2016-04-14 06:31 UTC, Min Deng
screenshotfor64kitB (332.94 KB, image/png), 2016-04-14 09:59 UTC, Min Deng
64kib (332.94 KB, image/png), 2016-04-14 09:59 UTC, Min Deng

Description Dr. David Alan Gilbert 2016-04-05 09:43:47 UTC
Description of problem:
This is based on an observation from looking at the ballooning code (and on discussion with Amit and Laurent).

The 'balloon_page' code does:

       qemu_madvise(addr, TARGET_PAGE_SIZE,
               deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED);

Thus, if the host page size is larger than TARGET_PAGE_SIZE (which I think is the case when Power is configured for 16k or 64k pages), the qemu_madvise() call should always fail and never actually discard any memory from the host.
So ballooning will appear to work, but you just never get any RAM back on the host.
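
To make the failure mode concrete, here is a minimal, self-contained test program (hypothetical demo code, not taken from QEMU) that issues the same kind of 4 KiB madvise() calls on a freshly dirtied host page; running it on a 64k Power host shows how the host kernel reacts to the aligned and the unaligned chunk:

    /* Hypothetical demo (not QEMU code): issue madvise() on 4 KiB chunks the
     * way balloon_page() does, and see how the host kernel reacts when its
     * page size is larger than 4 KiB (e.g. 64 KiB on Power). */
    #include <stdio.h>
    #include <string.h>
    #include <errno.h>
    #include <unistd.h>
    #include <sys/mman.h>

    int main(void)
    {
        long psize = sysconf(_SC_PAGESIZE);   /* host page size, 65536 on a 64k host */
        size_t chunk = 4096;                  /* what TARGET_PAGE_SIZE amounts to here */

        char *buf = mmap(NULL, psize, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        memset(buf, 0x55, psize);             /* make sure the page is really backed */

        /* First 4 KiB chunk: the address is host-page aligned. */
        int r1 = madvise(buf, chunk, MADV_DONTNEED);
        printf("aligned chunk:   madvise = %d (%s)\n", r1, r1 ? strerror(errno) : "ok");

        /* Second 4 KiB chunk: not host-page aligned on a 64 KiB host. */
        int r2 = madvise(buf + chunk, chunk, MADV_DONTNEED);
        printf("unaligned chunk: madvise = %d (%s)\n", r2, r2 ? strerror(errno) : "ok");

        munmap(buf, psize);
        return 0;
    }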

Version-Release number of selected component (if applicable):
2.6 and older

How reproducible:
theoretically 100%

Steps to Reproduce:
1. Create a large VM using lots of RAM
2. Use that RAM in the guest with something that dirties a lot of it (see the sketch after this list)
3. Record the amount of host RAM used
4. Inflate the balloon in the guest to free most of the RAM
5. Record the amount of host RAM used
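
For step 2, something along these lines can be run inside the guest (an illustrative helper, not part of any package); the host RAM used in steps 3 and 5 can then be read off the RSS of the QEMU process:

    /* Illustrative helper for step 2 (not part of any package): dirty the
     * given number of MiB of guest RAM so the host actually backs it. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/mman.h>

    int main(int argc, char **argv)
    {
        size_t mib = argc > 1 ? strtoul(argv[1], NULL, 0) : 1024;
        size_t len = mib << 20;

        char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        memset(buf, 0xaa, len);   /* touch every page so it is really allocated */
        printf("dirtied %zu MiB, keeping it mapped; measure host RSS now\n", mib);
        pause();                  /* hold the memory while the host side is recorded */
        return 0;
    }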

Actual results:
From looking at the code, I reckon the amounts recorded in steps 3 and 5 will be similar and will not reflect the ballooned RAM.

Expected results:
The host RAM in (5) should decrease by an amount similar to the amount of ballooned memory.

Additional info:

Comment 3 David Gibson 2016-04-06 01:11:45 UTC
Drat.  I thought I'd fixed the balloon on Power way back at IBM, but it looks like that was just fixing it to the point of not blowing up when trying to use it - not actually ballooning usefully.

So... the virtio-balloon is kinda broken by design, but we'll have to do what we can with what we have for now (virtio standardization efforts apparently have a new, better balloon design).

IIRC, the balloon is described by spec as working in 4kiB chunks, so TARGET_PAGE_SIZE is Just Plain Wrong on that front.

I think what we need to do is to batch contiguous 4kiB chunks listed by the guest until they make a whole host page, then do the madvise().
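
A rough sketch of that batching idea (illustrative only, this is not the actual QEMU patch): collect the guest-reported 4 KiB chunks and only call madvise() once a whole, naturally aligned host page has been accumulated.

    /* Illustrative sketch of the batching idea, not the actual QEMU fix:
     * remember contiguous 4 KiB chunks reported by the guest and only call
     * madvise(MADV_DONTNEED) once they cover a whole, aligned host page. */
    #include <stdio.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <unistd.h>
    #include <sys/mman.h>

    #define BALLOON_CHUNK 4096u          /* chunk size fixed by the virtio spec */

    struct balloon_batch {
        uintptr_t base;       /* start of the host page currently being assembled */
        size_t    filled;     /* bytes of that host page reported so far */
        size_t    host_page;  /* host page size, e.g. 65536 on a 64 KiB Power host */
    };

    /* Feed one 4 KiB chunk; returns 1 if a whole host page was just discarded. */
    static int balloon_batch_add(struct balloon_batch *b, uintptr_t addr)
    {
        uintptr_t page = addr & ~(uintptr_t)(b->host_page - 1);

        if (b->filled == 0 || page != b->base || addr != b->base + b->filled) {
            /* Not contiguous with what we have collected so far: start over at
             * this chunk's host page (only counts if it is the page's first chunk). */
            b->base = page;
            b->filled = (addr == page) ? BALLOON_CHUNK : 0;
        } else {
            b->filled += BALLOON_CHUNK;
        }

        if (b->filled == b->host_page) {
            madvise((void *)b->base, b->host_page, MADV_DONTNEED);
            b->filled = 0;
            return 1;
        }
        return 0;
    }

    int main(void)
    {
        size_t host_page = (size_t)sysconf(_SC_PAGESIZE);
        struct balloon_batch b = { 0, 0, host_page };
        char *ram = mmap(NULL, host_page, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (ram == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        /* Pretend the guest reports every 4 KiB chunk of one host page, in order. */
        for (size_t off = 0; off < host_page; off += BALLOON_CHUNK) {
            if (balloon_batch_add(&b, (uintptr_t)ram + off))
                printf("discarded one whole %zu-byte host page\n", host_page);
        }
        return 0;
    }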

Comment 4 Dr. David Alan Gilbert 2016-04-06 09:59:22 UTC
(In reply to David Gibson from comment #3)
> Drat.  I thought I'd fixed the balloon on Power way back at IBM, but looks
> like that was just fixing it to the point of not blowing up when trying to
> use it - not actually ballooning usefully.
> 
> So... the virtio-balloon is kinda broken by design, but we'll have to do
> what we can with what we have for now (virtio standardization efforts
> apparently have a new, better balloon design).
> 
> IIRC, the balloon is described by spec as working in 4kiB chunks, so
> TARGET_PAGE_SIZE is Just Plain Wrong on that front.
> 
> I think what we need to do is to batch contiguous 4kiB chunks listed by the
> guest until they make a whole host page, then do the madvise().

Yes, you'd have to be careful about anything that doesn't start on a host page or anything like that.
I'm not sure if there's anything that the guest could know to help it only make sane inflation requests.

Comment 5 Thomas Huth 2016-04-11 18:09:16 UTC
I've now tested the balloon on our P8 server, and at first glance it seems to be working - I can see the amount of free memory in the host going up when I decrease the memory of the guest via the balloon.
However, after looking at the code of QEMU and the kernel a little more closely, it is clear that this is not working as expected *and might even cause memory corruption in some cases*:

- The madvise syscall in the kernel rounds up the length parameter to a multiple of the host PAGE_SIZE = 65536 (see mm/madvise.c):

	len = (len_in + ~PAGE_MASK) & PAGE_MASK;

- The madvise syscall in the kernel returns with an error if the address is not aligned to the host PAGE_SIZE.

So for the very first 4k chunk of the 64k page, the madvise() succeeds, but for all other chunks, the call is rejected. Meaning that if QEMU tries to free the whole 64k page, it accidentally works right. But if it only tries to free the 4k chunks that are not aligned to 64k, then it silently fails. *And if it only tries to free the first 4k chunk without all the others, then even guest memory corruption might occur!*
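
A small standalone program (hypothetical demo, not QEMU code) makes the corruption scenario visible, assuming the rounding behaviour described above: discarding just the first 4 KiB chunk of an anonymous mapping wipes the whole 64 KiB host page.

    /* Hypothetical demo (not QEMU code) of the corruption case: on a host
     * with 64 KiB pages, discarding just the first 4 KiB chunk throws away
     * the whole page, because the kernel rounds the length up to PAGE_SIZE. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/mman.h>

    int main(void)
    {
        long psize = sysconf(_SC_PAGESIZE);
        size_t len = 64 * 1024;              /* one Power host page worth of RAM */

        char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        memset(buf, 0x5a, len);

        /* "Balloon out" only the first 4 KiB chunk of the mapping. */
        madvise(buf, 4096, MADV_DONTNEED);

        /* With a 64 KiB host page this prints 0x00: data well outside the
         * requested 4 KiB was lost. With a 4 KiB host page it prints 0x5a. */
        printf("host page size %ld, byte at offset 8192 is now 0x%02x\n",
               psize, (unsigned char)buf[8192]);
        munmap(buf, len);
        return 0;
    }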

We definitely need to fix this...

Comment 6 David Gibson 2016-04-12 01:28:18 UTC
Ah!  So it's kind of the reverse of what we first thought.  In fact in the common configuration (64kiB pages on both host and guest) it will work (by accident).  But if you have a 4kiB page guest on a 64kiB page host you can get data corruption.  Nasty.

Same solution, AFAICT, though.

Comment 7 Min Deng 2016-04-12 07:00:25 UTC
Hi developers,
   As we all know, 64 KiB guests are supported on a ppc host. Is it possible for QE to create a 4 KiB guest on a ppc host? If there is a way, please tell us. Thanks in advance.

Min

Comment 8 David Gibson 2016-04-13 05:40:29 UTC
Min,

Yes, it's possible to create a guest using 4kiB pages, but it will require building a custom kernel.  AFAIK all current distributions supporting Power use 64kiB pages by default.
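
For reference, once such a kernel boots, the page size it was built with can be verified from inside the guest with a trivial program like the following (on powerpc the size is selected at kernel build time via the CONFIG_PPC_4K_PAGES / CONFIG_PPC_64K_PAGES Kconfig choice; the snippet itself is just an illustration):

    /* Illustration only: report the page size the running guest kernel uses.
     * On powerpc this is fixed at kernel build time by the page size Kconfig
     * choice (CONFIG_PPC_4K_PAGES vs. CONFIG_PPC_64K_PAGES). */
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        printf("page size: %ld bytes\n", sysconf(_SC_PAGESIZE));
        return 0;
    }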

Comment 9 Min Deng 2016-04-13 06:09:03 UTC
(In reply to David Gibson from comment #8)
> Min,
> 
> Yes, it's possible to create a guest using 4kiB pages, but it will require
> building a custom kernel.  AFAIK all current distributions supporting Power
> use 64kiB pages by default.

 Could you please help to provide such a custom build? Thanks a lot.
Min

Comment 10 Thomas Huth 2016-04-13 06:23:41 UTC
(In reply to dengmin from comment #9)
>  Could you please help to build such a custom build ? Thanks a lot.

I can try to help here. FWIW, I already tried to compile an upstream kernel with 4k pages, but I got an error while trying to compile it... I'll have a try with a downstream kernel next and will then let you know the results.

Comment 12 Thomas Huth 2016-04-13 11:54:18 UTC
Suggested patch upstream: https://patchwork.ozlabs.org/patch/609982/

Comment 14 Min Deng 2016-04-14 06:31:43 UTC
Created attachment 1147048 [details]
KernelPanic

Failure on my guest.

Comment 16 Thomas Huth 2016-04-14 06:52:11 UTC
(In reply to dengmin from comment #14)
> Failure on my guest.

Looking at that screenshot, the only thing I can imagine is that you accidentally tried to install my 4k kernel on a little-endian guest. Could you please verify that your guest is a big-endian installation, not a little-endian one?

Comment 17 Thomas Huth 2016-04-14 09:14:50 UTC
Decreased the priority/severity a little bit since the problem only occurs with 4k guests on 64k hosts - and 4k (server) guests are rather hard to find in the wild nowadays.

Comment 19 Min Deng 2016-04-14 09:59:08 UTC
Created attachment 1147105 [details]
screenshotfor64kitB

Comment 20 Min Deng 2016-04-14 09:59:54 UTC
Created attachment 1147106 [details]
64kib

Comment 23 Thomas Huth 2016-04-29 14:43:49 UTC
I'm closing this ticket for PPC, since there are no 4k guests available in the wild (all major distros seem to use 64k page size guests nowadays), and we thus don't have a real problem here. We can still track the issue itself in BZ 1324092 (but we certainly do not need two tickets to track this).

*** This bug has been marked as a duplicate of bug 1324092 ***

