Bug 507846 - Balloon driver gives up too easily when ballooning up under memory pressure
Summary: Balloon driver gives up too easily when ballooning up under memory pressure
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.3
Hardware: All
OS: Linux
low
medium
Target Milestone: rc
: ---
Assignee: Andrew Jones
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 514490
 
Reported: 2009-06-24 13:59 UTC by Ian Campbell
Modified: 2011-01-13 20:49 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 507847 (view as bug list)
Environment:
Last Closed: 2011-01-13 20:49:20 UTC
Target Upstream Version:
Embargoed:


Attachments
linux-2.6.18-xen.hg 897:329ea0ccb344 ported to 2.6.18-128.1.10.el5 (4.42 KB, patch)
2009-06-24 13:59 UTC, Ian Campbell


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2011:0017 0 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 5.6 kernel security and bug fix update 2011-01-13 10:37:42 UTC

Description Ian Campbell 2009-06-24 13:59:47 UTC
Created attachment 349241
linux-2.6.18-xen.hg 897:329ea0ccb344 ported to 2.6.18-128.1.10.el5

Description of problem:

We recently noticed that the standard Xen balloon driver for Linux will give up on ballooning if it encounters memory pressure. When this occurs, ballooning stops until the target is reset, regardless of how much memory subsequently becomes available.

Version-Release number of selected component (if applicable):

2.6.18-128.1.10.el5 and all previous kernel releases.

How reproducible:

Easily

Steps to Reproduce:

The exact steps depend somewhat on the amount of host RAM available. The trick is to balloon a domain down and then create a guest which uses enough memory to cause the original to exhaust memory as it balloons back up.

1. Create a new domain with e.g. 2G RAM
2. Instruct domain to balloon to 1G RAM: xenstore-write /local/domain/<domid>/memory/target 1048576  
3. Create a second domain which uses up enough RAM to leave <1G free.
4. Instruct the original domain to balloon back up to 2G: xenstore-write /local/domain/<domid>/memory/target 2097152  
5. Shutdown domain 2
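
The target values written in steps 2 and 4 are sizes in kB (1G = 1048576, 2G = 2097152). As a minimal sketch of that arithmetic, the helper below converts a size in GiB to the value written to memory/target. The name gib_to_target_kib is hypothetical and is not part of any Xen tool.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a guest size in GiB to the kB value
 * that the steps above write to /local/domain/<domid>/memory/target. */
static uint64_t gib_to_target_kib(uint64_t gib)
{
    /* Guard against overflow of the multiplication below. */
    assert(gib <= UINT64_MAX / (1024ULL * 1024ULL));
    return gib * 1024ULL * 1024ULL;
}
```

For the 3G guest used in the later verification comment, the same calculation gives 3145728, which matches the "Requested target" shown there.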
  
Actual results:

In step 4 the first domain will stop ballooning up somewhere between 1G and 2G; in step 5 it will remain at that same value.

Expected results:

In step 5 the original domain will balloon up to the full 2G.

Additional info:

This was fixed by the upstream changeset:
http://xenbits.xensource.com/linux-2.6.18-xen.hg?cs=329ea0ccb344
I have attached a version ported to the 2.6.18-128.1.10.el5 kernel.

Comment 4 Paolo Bonzini 2010-06-24 12:14:16 UTC
Well, the patch seems anything but intrusive to me... the gist of it is simply this:

	reservation.nr_extents   = nr_pages;
	rc = HYPERVISOR_memory_op(
		XENMEM_populate_physmap, &reservation);
-	if (rc < nr_pages) {
-		if (rc > 0) {
-			int ret;
-
-			/* We hit the Xen hard limit: reprobe. */
-			reservation.nr_extents = rc;
-			ret = HYPERVISOR_memory_op(XENMEM_decrease_reservation,
-					&reservation);
-			BUG_ON(ret != rc);
-		}
-		goto out;
-	}
+	if (rc < 0)
+		goto out;
 
-	for (i = 0; i < nr_pages; i++) {
+	for (i = 0; i < rc; i++) {


The rest just removes the write-only bs.hard_limit variable.
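
To make the effect of that hunk concrete, here is a toy user-space model of the two behaviours. It is only a sketch, not the kernel code: toy_host, toy_populate and toy_release are hypothetical stand-ins for the hypervisor and for the XENMEM_populate_physmap and XENMEM_decrease_reservation calls, and the model assumes the hypervisor simply grants as many pages as it has free.

```c
#include <assert.h>

/* Toy model of the host: just a count of free pages. */
struct toy_host {
    unsigned long free_pages;
};

/* Stand-in for XENMEM_populate_physmap: grants up to nr_pages
 * and returns how many pages were actually granted. */
static long toy_populate(struct toy_host *h, unsigned long nr_pages)
{
    unsigned long granted = nr_pages < h->free_pages ? nr_pages : h->free_pages;
    h->free_pages -= granted;
    return (long)granted;
}

/* Stand-in for XENMEM_decrease_reservation: hands pages back. */
static long toy_release(struct toy_host *h, unsigned long nr_pages)
{
    h->free_pages += nr_pages;
    return (long)nr_pages;
}

/* Old logic: a partial grant is handed back and no progress is made.
 * Returns the number of pages the guest gained. */
static unsigned long old_increase(struct toy_host *h, unsigned long nr_pages)
{
    long rc = toy_populate(h, nr_pages);
    if (rc < (long)nr_pages) {
        if (rc > 0) {
            long ret = toy_release(h, (unsigned long)rc);
            assert(ret == rc);
        }
        return 0;
    }
    return (unsigned long)rc;
}

/* New logic: only a negative return is an error; a partial grant is kept. */
static unsigned long new_increase(struct toy_host *h, unsigned long nr_pages)
{
    long rc = toy_populate(h, nr_pages);
    if (rc < 0)
        return 0;
    return (unsigned long)rc;
}
```

With 100 free pages and a request for 256, old_increase gains nothing and leaves the host at 100 free pages, while new_increase keeps the 100 pages it was given.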

Comment 5 Ian Campbell 2010-06-25 08:10:23 UTC
It was a long while ago, but IIRC bs.hard_limit essentially tracks the point at which the balloon driver gave up on ballooning up. This patch removes that behaviour, and with it the meaning of the variable, so the variable was removed at the same time.
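
A rough sketch of why a recorded limit makes the driver stay put even after memory is freed. This is a simplified model, not the driver source: the struct, the function names and the record_limit flag are hypothetical, and the model assumes that the recorded limit caps the effective target until the target is reset.

```c
#include <assert.h>

/* Toy balloon state, all values in pages. */
struct toy_balloon {
    unsigned long current_pages;
    unsigned long target_pages;
    unsigned long hard_limit;   /* ~0UL means no limit recorded */
};

/* Assumption of this model: the effective target is capped by the limit. */
static unsigned long toy_effective_target(const struct toy_balloon *b)
{
    return b->target_pages < b->hard_limit ? b->target_pages : b->hard_limit;
}

/* One balloon-up pass against a host with *host_free pages available.
 * If record_limit is non-zero, a shortfall is recorded as the hard limit
 * (the behaviour the patch removes). */
static void toy_balloon_pass(struct toy_balloon *b, unsigned long *host_free,
                             int record_limit)
{
    unsigned long target = toy_effective_target(b);
    unsigned long want, got;

    assert(host_free != 0);
    if (target <= b->current_pages)
        return;                 /* nothing left to do */

    want = target - b->current_pages;
    got = want < *host_free ? want : *host_free;
    *host_free -= got;
    b->current_pages += got;

    if (got < want && record_limit)
        b->hard_limit = b->current_pages;
}
```

In this model, once a shortfall has been recorded, later passes return immediately no matter how much memory the host has, which matches the "Actual results" in the description. Without the recorded limit, the next pass after memory is freed reaches the target.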

Your (pseudo?)patch doesn't seem to include the removal of the update of bs.hard_limit which the original patch included. It should appear just after the "if (rc > 0) { }" block (either as unchanged context if it is staying or as a removed line), unless your kernel has changed significantly since I last looked?

If you are worried about the ABI of the /proc/xen/balloon file, then leaving in the "Xen hard limit:     ??? kB" line, which is currently printed until the first time an upwards balloon is attempted, might make sense.

Comment 6 Paolo Bonzini 2010-06-25 11:37:03 UTC
> Your (pseudo?)patch doesn't seem to include the removal of the update of
> bs.hard_limit which the original patch included. 

Yes, it was just a pseudo patch. I removed everything regarding bs.hard_limit just to show my point.

Andrew or I might split the patch and post it again to see if we have better luck this time.

Comment 7 RHEL Program Management 2010-08-04 12:09:41 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 10 Jarod Wilson 2010-09-17 14:01:55 UTC
in kernel-2.6.18-222.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 12 Chris Ward 2010-11-18 12:22:20 UTC
~~ ATTN Customers & Partners - RHEL 5.6 Public Beta available on RHN ~~

A fix for this BZ should be present and testable in the release. 

If this Bugzilla is verified as resolved, please update the Verified field
above with an appropriate value and include a summary of the testing executed
and the results obtained.

If you encounter any issues or have questions while testing, please describe
them and set this bug into NEED_INFO. 

If you encounter new defects or have additional patches to request for
inclusion, promptly escalate the new issues through your support
representative.

Finally, FYI: future Beta kernels can be found here:
 http://people.redhat.com/jwilson/el5/

Note: Bugs with the 'OtherQA' keyword require Third-Party testing to confirm
the request has been properly addressed. 
See:
 https://bugzilla.redhat.com/describekeywords.cgi#OtherQA

Comment 13 Lei Wang 2010-12-21 07:02:43 UTC
Test steps on a host with 4G memory:
1. create rhel5 PV guest guest1 with 3G memory
2. balloon down guest1 memory to 1G
3. create another guest guest2 with memory 1G
4. balloon up guest1 memory to 3G
5. check /proc/xen/balloon in guest1
6. shutdown guest2
7. check /proc/xen/balloon in guest1 again

host:
x86 and x86_64
xen-3.0.3-120.el5
kernel-xen-2.6.18-237.el5

Reproduce with:
guest: kernel-xen-2.6.18-194.el5

At step 5:
[root@dhcp-66-82-191 ~]# cat /proc/xen/balloon
Current allocation:  2098144 kB
Requested target:    3145728 kB
... ...
Xen hard limit:      2098144 kB

At step 7:
[root@dhcp-66-82-191 ~]# cat /proc/xen/balloon
Current allocation:  2098144 kB
Requested target:    3145728 kB
... ...
Xen hard limit:      2098144 kB


Verified with:
guest: kernel-xen-2.6.18-238.el5
At step 5:
# cat /proc/xen/balloon
Current allocation:  2098220 kB
Requested target:    3145728 kB
... ...
Xen hard limit:          ??? kB

At step 7:
# cat /proc/xen/balloon
Current allocation:  3145728 kB
Requested target:    3145728 kB
... ...
Xen hard limit:          ??? kB

According to the result above, move to VERIFIED.
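
The comparison made at steps 5 and 7 could be automated along these lines. This is a hedged sketch that parses text in the format shown above; the function names are hypothetical, and the caller is assumed to have already read /proc/xen/balloon into a string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Find `label` in `text` and parse the kB value that follows it.
 * Returns 1 on success, 0 if the label is missing or the value is
 * not numeric (for example "???"). */
static int balloon_field_kb(const char *text, const char *label,
                            unsigned long *kb)
{
    const char *p;

    assert(text != NULL && label != NULL && kb != NULL);
    p = strstr(text, label);
    if (p == NULL)
        return 0;
    return sscanf(p + strlen(label), " %lu", kb) == 1;
}

/* Returns 1 if the current allocation equals the requested target,
 * 0 if it does not, and -1 if either field could not be parsed. */
static int balloon_reached_target(const char *text)
{
    unsigned long cur, tgt;

    if (!balloon_field_kb(text, "Current allocation:", &cur) ||
        !balloon_field_kb(text, "Requested target:", &tgt))
        return -1;
    return cur == tgt;
}
```

Applied to the outputs above, the step 7 text from the -194 guest would be reported as not having reached its target, and the step 7 text from the -238 guest as having reached it.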

Comment 15 errata-xmlrpc 2011-01-13 20:49:20 UTC
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html

