Bug 507846

Summary: Balloon driver gives up too easily when ballooning up under memory pressure
Product: Red Hat Enterprise Linux 5
Reporter: Ian Campbell <ijc>
Component: kernel-xen
Assignee: Andrew Jones <drjones>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Priority: low
Version: 5.3
CC: clalance, drjones, leiwang, mshao, pbonzini, xen-maint
Target Milestone: rc
Hardware: All
OS: Linux
Doc Type: Bug Fix
Clone Of:
: 507847
Last Closed: 2011-01-13 20:49:20 UTC
Bug Blocks: 514490
Attachments:
linux-2.6.18-xen.hg 897:329ea0ccb344 ported to 2.6.18-128.1.10.el5 (flags: none)

Description Ian Campbell 2009-06-24 13:59:47 UTC
Created attachment 349241
linux-2.6.18-xen.hg 897:329ea0ccb344 ported to 2.6.18-128.1.10.el5

Description of problem:

We recently noticed that the standard Xen balloon driver for Linux gives up on ballooning up if it encounters memory pressure. When this occurs, ballooning stops until the target is reset, regardless of how much memory subsequently becomes available.

Version-Release number of selected component (if applicable):

2.6.18-128.1.10.el5 and all previous kernel releases.

How reproducible:

Easily

Steps to Reproduce:

The exact steps depend somewhat on the amount of host RAM available. The trick is to balloon a domain down and then create a second guest which uses enough memory that the host runs out of free memory while the original domain balloons back up.

1. Create a new domain with, e.g., 2G RAM.
2. Instruct the domain to balloon down to 1G RAM: xenstore-write /local/domain/<domid>/memory/target 1048576
3. Create a second domain which uses up enough RAM to leave <1G free.
4. Instruct the original domain to balloon back up to 2G: xenstore-write /local/domain/<domid>/memory/target 2097152
5. Shut down domain 2.
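
The target values in steps 2 and 4 are in kB (1048576 kB = 1G, 2097152 kB = 2G). As a minimal sketch, the commands could be generated as below; the helper names gib_to_kb() and format_balloon_cmd() are hypothetical and only format the command line shown in the steps, they do not talk to xenstore.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Convert a size in whole G (GiB) to the kB value written to memory/target. */
static unsigned long gib_to_kb(unsigned long gib)
{
	return gib * 1024UL * 1024UL;
}

/*
 * Hypothetical helper: format the xenstore-write command from steps 2 and 4
 * for a given domain id and size. Returns the snprintf() result, i.e. the
 * length the full command needs (excluding the terminating NUL).
 */
static int format_balloon_cmd(char *buf, size_t len, int domid,
			      unsigned long gib)
{
	assert(buf != NULL && len > 0);

	return snprintf(buf, len,
			"xenstore-write /local/domain/%d/memory/target %lu",
			domid, gib_to_kb(gib));
}
```

For example, format_balloon_cmd(buf, sizeof(buf), domid, 1) produces the step 2 command and format_balloon_cmd(buf, sizeof(buf), domid, 2) the step 4 command.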
  
Actual results:

In step 4 the first domain stops ballooning up somewhere between 1G and 2G; in step 5 it remains at that same value.

Expected results:

In step 5 the original domain will balloon up to the full 2G.

Additional info:

This was fixed by the upstream changeset
http://xenbits.xensource.com/linux-2.6.18-xen.hg?cs=329ea0ccb344. I have attached a version ported to the 2.6.18-128.1.10.el5 kernel.

Comment 4 Paolo Bonzini 2010-06-24 12:14:16 UTC
Well, the patch seems anything but intrusive to me... the gist of it is simply this:

	reservation.nr_extents   = nr_pages;
	rc = HYPERVISOR_memory_op(
		XENMEM_populate_physmap, &reservation);
-	if (rc < nr_pages) {
-		if (rc > 0) {
-			int ret;
-
-			/* We hit the Xen hard limit: reprobe. */
-			reservation.nr_extents = rc;
-			ret = HYPERVISOR_memory_op(XENMEM_decrease_reservation,
-					&reservation);
-			BUG_ON(ret != rc);
-		}
- 		goto out;
-	}
+	if (rc < 0)
+		goto out;
 
-	for (i = 0; i < nr_pages; i++) {
+	for (i = 0; i < rc; i++) {


The rest is just removing the write-only bs.hard_limit variable.
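
To make the behavioural difference concrete, here is a small userspace model of the two variants. It is only a sketch under stated assumptions: fake_host, fake_populate(), fake_release() and balloon_model are hypothetical stand-ins (not kernel or Xen APIs), the driver's batching is ignored, and the way the old code records the limit (current pages plus the partial result) is inferred from the hunk above and the /proc/xen/balloon output later in this bug, not copied from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_HARD_LIMIT (~0UL)	/* model of "Xen hard limit: ??? kB" */

/* Hypothetical stand-in for the hypervisor's free memory pool. */
struct fake_host {
	unsigned long free_pages;
};

/* Grants as many pages as are free, like a partial populate_physmap. */
static long fake_populate(struct fake_host *h, unsigned long nr_pages)
{
	unsigned long granted = nr_pages < h->free_pages ? nr_pages : h->free_pages;

	h->free_pages -= granted;
	return (long)granted;
}

/* Returns pages to the host (partial give-back, or another guest exiting). */
static void fake_release(struct fake_host *h, unsigned long nr_pages)
{
	h->free_pages += nr_pages;
}

struct balloon_model {
	unsigned long current_pages;
	unsigned long target_pages;
	unsigned long hard_limit;	/* only used by the unpatched variant */
	bool patched;			/* false: old behaviour, true: with the fix */
};

/* One ballooning-up pass; returns the number of pages gained. */
static unsigned long balloon_up_once(struct balloon_model *b, struct fake_host *h)
{
	unsigned long goal, want;
	long rc;

	assert(b != NULL && h != NULL);

	/* Old behaviour: a recorded hard limit caps the effective target. */
	goal = b->target_pages;
	if (!b->patched && b->hard_limit < goal)
		goal = b->hard_limit;
	if (b->current_pages >= goal)
		return 0;

	want = goal - b->current_pages;
	rc = fake_populate(h, want);

	if (b->patched) {
		/* New behaviour: keep whatever was granted, retry later. */
		if (rc < 0)
			return 0;
		b->current_pages += (unsigned long)rc;
		return (unsigned long)rc;
	}

	if ((unsigned long)rc < want) {
		/* Old behaviour: hand the partial result back, record a limit. */
		if (rc > 0)
			fake_release(h, (unsigned long)rc);
		b->hard_limit = b->current_pages + (unsigned long)rc; /* assumption */
		return 0;
	}
	b->current_pages += (unsigned long)rc;
	return (unsigned long)rc;
}
```

In this model the unpatched variant climbs to the recorded limit and then never moves again, even after fake_release() makes plenty of memory available, whereas the patched variant keeps the partial allocation and reaches the target on a later pass once memory is free.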

Comment 5 Ian Campbell 2010-06-25 08:10:23 UTC
It was a long while ago, but IIRC bs.hard_limit essentially tracks the point at which the balloon driver gave up on ballooning up. This patch removes that behaviour, and with it the meaning of the variable, so the variable was removed at the same time.

Your (pseudo?)patch doesn't seem to include the removal of the update of bs.hard_limit which the original patch included. It should appear just after the "if (rc > 0) { }" block (either as unchanged context if it is staying, or as a removed line), unless your kernel has changed significantly since I last looked?

If you are worried about the ABI of the /proc/xen/balloon file, then leaving in the "Xen hard limit:     ??? kB" line, which is what is currently printed until the first upwards balloon, might make sense.

Comment 6 Paolo Bonzini 2010-06-25 11:37:03 UTC
> Your (pseudo?)patch doesn't seem to include the removal of the update of
> bs.hard_limit which the original patch included.

Yes, it was just a pseudo patch.  I removed everything regarding bs.hard_limit just to show my point.

Andrew or I might split the patch and post it again to see if we have more luck this time.

Comment 7 RHEL Program Management 2010-08-04 12:09:41 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 10 Jarod Wilson 2010-09-17 14:01:55 UTC
The fix is in kernel-2.6.18-222.el5.
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 12 Chris Ward 2010-11-18 12:22:20 UTC
~~ ATTN Customers & Partners - RHEL 5.6 Public Beta available on RHN ~~

A fix for this BZ should be present and testable in the release. 

If this Bugzilla is verified as resolved, please update the Verified field
above with an appropriate value and include a summary of the testing executed
and the results obtained.

If you encounter any issues or have questions while testing, please describe
them and set this bug into NEED_INFO. 

If you encounter new defects or have additional patches to request for
inclusion, promptly escalate the new issues through your support
representative.

Finally, FYI: future Beta kernels can be found here:
 http://people.redhat.com/jwilson/el5/

Note: Bugs with the 'OtherQA' keyword require Third-Party testing to confirm
the request has been properly addressed. 
See:
 https://bugzilla.redhat.com/describekeywords.cgi#OtherQA

Comment 13 Lei Wang 2010-12-21 07:02:43 UTC
Test steps on a host with 4G of memory:
1. create rhel5 PV guest guest1 with 3G memory
2. balloon down guest1 memory to 1G
3. create another guest guest2 with memory 1G
4. balloon up guest1 memory to 3G
5. check /proc/xen/balloon in guest1
6. shutdown guest2
7. check /proc/xen/balloon in guest1 again

host:
x86 and x86_64
xen-3.0.3-120.el5
kernel-xen-2.6.18-237.el5

Reproduce with:
guest: kernel-xen-2.6.18-194.el5

At step 5:
[root@dhcp-66-82-191 ~]# cat /proc/xen/balloon
Current allocation:  2098144 kB
Requested target:    3145728 kB
... ...
Xen hard limit:      2098144 kB

At step 7:
[root@dhcp-66-82-191 ~]# cat /proc/xen/balloon
Current allocation:  2098144 kB
Requested target:    3145728 kB
... ...
Xen hard limit:      2098144 kB


Verified with:
guest: kernel-xen-2.6.18-238.el5
At step 5:
# cat /proc/xen/balloon
Current allocation:  2098220 kB
Requested target:    3145728 kB
... ...
Xen hard limit:          ??? kB

At step 7:
# cat /proc/xen/balloon
Current allocation:  3145728 kB
Requested target:    3145728 kB
... ...
Xen hard limit:          ??? kB

According to the result above, move to VERIFIED.
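
The check done by eye in steps 5 and 7 can be sketched as a small parser over the text of /proc/xen/balloon. This is only an illustration: balloon_field_kb() and balloon_reached_target() are hypothetical helper names, and the field labels are assumed to be exactly those shown in the output above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Find `label` in the /proc/xen/balloon text and parse the kB value after it.
 * Returns 1 on success, 0 if the label is missing or the value is not a
 * number (e.g. the "??? kB" shown when no hard limit is recorded).
 */
static int balloon_field_kb(const char *text, const char *label,
			    unsigned long *kb)
{
	const char *p;

	assert(text != NULL && label != NULL && kb != NULL);

	p = strstr(text, label);
	if (p == NULL)
		return 0;
	return sscanf(p + strlen(label), " %lu", kb) == 1;
}

/*
 * Returns 1 if the current allocation has reached the requested target,
 * 0 if it has not, and -1 if either field could not be parsed.
 */
static int balloon_reached_target(const char *text)
{
	unsigned long cur, tgt;

	if (!balloon_field_kb(text, "Current allocation:", &cur) ||
	    !balloon_field_kb(text, "Requested target:", &tgt))
		return -1;
	return cur >= tgt;
}
```

Fed the step 7 output from the -194 guest, balloon_reached_target() reports that the target was not reached; fed the step 7 output from the -238 guest, it reports that it was.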

Comment 15 errata-xmlrpc 2011-01-13 20:49:20 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html