Created attachment 349241
linux-2.6.18-xen.hg 897:329ea0ccb344 ported to 2.6.18-128.1.10.el5

Description of problem:
We recently noticed that the standard Xen balloon driver for Linux gives up on ballooning if it encounters memory pressure. When this occurs, ballooning stops until the target is reset, regardless of how much memory subsequently becomes available.

Version-Release number of selected component (if applicable):
2.6.18-128.1.10.el5 and all previous kernel releases.

How reproducible:
Easily

Steps to Reproduce:
This depends somewhat on the amount of host RAM available. The trick is to balloon a domain down and then create a guest that uses enough memory to exhaust the host's free memory while the original domain balloons back up.
1. Create a new domain with, e.g., 2G RAM.
2. Instruct the domain to balloon to 1G RAM:
   xenstore-write /local/domain/<domid>/memory/target 1048576
3. Create a second domain which uses up enough RAM to leave <1G free.
4. Instruct the original domain to balloon back up to 2G:
   xenstore-write /local/domain/<domid>/memory/target 2097152
5. Shut down domain 2.

Actual results:
In step 4 the first domain stops ballooning up somewhere between 1 and 2G; after step 5 it remains at that same value.

Expected results:
After step 5 the original domain balloons up to the full 2G.

Additional info:
This was fixed by the upstream changeset http://xenbits.xensource.com/linux-2.6.18-xen.hg?cs=329ea0ccb344
I have attached a version ported to the 2.6.18-128.1.10.el5 kernel.
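To make the failure mode in the steps above concrete, here is a toy C model of the reproduction. It is a sketch under stated assumptions, not the real balloon driver. Memory is counted in MiB, and BATCH_MIB, NO_LIMIT, struct guest, set_target() and balloon_pass() are all hypothetical names. The unpatched behaviour is modeled on my reading of the pre-fix driver and the bs.hard_limit discussion in this bug: it hands back a partial batch, records a hard limit, and clamps the effective target to that limit until a new target is written.

```c
/* Toy model of the reproduction steps; not the real balloon driver.
 * All names and the MiB granularity are hypothetical. */
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define BATCH_MIB 4UL      /* hypothetical size of one populate request */
#define NO_LIMIT ULONG_MAX /* plays the role of bs.hard_limit == ~0UL */

struct guest {
    unsigned long current_mib;    /* memory the guest holds now */
    unsigned long target_mib;     /* value written to memory/target */
    unsigned long hard_limit_mib; /* only ever set by the unpatched model */
};

/* Writing a new target also forgets any recorded hard limit. */
static void set_target(struct guest *g, unsigned long target_mib)
{
    g->target_mib = target_mib;
    g->hard_limit_mib = NO_LIMIT;
}

/* One pass of the balloon worker against a host pool of *free_mib. */
static void balloon_pass(struct guest *g, unsigned long *free_mib,
                         bool patched)
{
    /* The effective target is clamped to the recorded hard limit. */
    unsigned long target = g->target_mib < g->hard_limit_mib
                               ? g->target_mib : g->hard_limit_mib;

    while (g->current_mib < target) {
        unsigned long want = target - g->current_mib;
        if (want > BATCH_MIB)
            want = BATCH_MIB;
        unsigned long got = want < *free_mib ? want : *free_mib;
        *free_mib -= got;

        if (got == want) {
            g->current_mib += got;
            continue;
        }
        if (patched) {
            g->current_mib += got;   /* keep the partial batch; retry later */
        } else {
            *free_mib += got;        /* hand the partial batch back... */
            g->hard_limit_mib = g->current_mib; /* ...and stop until the next set_target() */
        }
        break;
    }
    assert(g->current_mib <= g->target_mib);
}
```

With 602 MiB free when the guest is asked to go from 1024 to 2048 MiB, the unpatched model stops at 1624 MiB. It stays there after domain 2's 1024 MiB goes back into the pool, and only moves again once the target is rewritten. The patched model keeps the partial batch, retries on the next pass, and reaches 2048 MiB.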
Well, the patch seems anything but intrusive to me... the gist of it is simply this:

         reservation.nr_extents = nr_pages;
         rc = HYPERVISOR_memory_op(
                 XENMEM_populate_physmap, &reservation);
-        if (rc < nr_pages) {
-                if (rc > 0) {
-                        int ret;
-
-                        /* We hit the Xen hard limit: reprobe. */
-                        reservation.nr_extents = rc;
-                        ret = HYPERVISOR_memory_op(XENMEM_decrease_reservation,
-                                        &reservation);
-                        BUG_ON(ret != rc);
-                }
-                goto out;
-        }
+        if (rc < 0)
+                goto out;
 
-        for (i = 0; i < nr_pages; i++) {
+        for (i = 0; i < rc; i++) {

The rest is just removing the write-only bs.hard_limit variable.
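Here is a minimal C sketch of what the diff changes for a single batch. It assumes a fake hypercall: fake_populate_physmap() and pages_gained() are hypothetical stand-ins, and the frame numbers are made up. The only property it relies on is the hypercall's behaviour that XENMEM_populate_physmap reports how many extents it actually populated, which can be fewer than requested.

```c
/* Single-batch sketch of the increase_reservation() change.
 * fake_populate_physmap() is a hypothetical stand-in for the hypercall. */
#include <assert.h>
#include <stdbool.h>

/* Like XENMEM_populate_physmap, returns how many extents were populated,
 * which can be fewer than requested. This fake never fails. */
static long fake_populate_physmap(unsigned long *frame_list,
                                  unsigned long nr_extents,
                                  unsigned long *host_free_pages)
{
    unsigned long done = nr_extents < *host_free_pages
                             ? nr_extents : *host_free_pages;
    for (unsigned long i = 0; i < done; i++)
        frame_list[i] = 0x100000UL + i;   /* made-up machine frames */
    *host_free_pages -= done;
    return (long)done;
}

/* Returns how many pages one batch leaves with the guest (or rc < 0). */
static long pages_gained(unsigned long *frame_list, unsigned long *p2m,
                         unsigned long nr_pages,
                         unsigned long *host_free_pages, bool patched)
{
    long rc = fake_populate_physmap(frame_list, nr_pages, host_free_pages);
    assert(rc <= (long)nr_pages);  /* never more than was asked for */

    if (!patched && rc < (long)nr_pages) {
        /* Old code: give a partial batch straight back
         * (XENMEM_decrease_reservation) and bail out. */
        if (rc > 0)
            *host_free_pages += (unsigned long)rc;
        return rc < 0 ? rc : 0;
    }
    if (rc < 0)            /* patched code: only a real error bails out */
        return rc;

    /* Patched loop bound: wire up only the rc frames Xen handed over. */
    for (long i = 0; i < rc; i++)
        p2m[i] = frame_list[i];
    return rc;
}
```

With 300 pages free and a 512-page batch, the unpatched path hands all 300 pages back and gains nothing. The patched path keeps the 300 pages and leaves the remainder for a later retry.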
It was a long while ago, but IIRC bs.hard_limit essentially tracks the point at which the balloon driver gave up on ballooning up. This patch removes that behaviour, and with it the meaning of the variable, so the variable was removed at the same time.

Your (pseudo?) patch doesn't seem to include the removal of the update of bs.hard_limit, which the original patch included. It should appear just after the "if (rc > 0) { }" block (either as unchanged context if it is staying, or as a removed line), unless your kernel has changed significantly since I last looked.

If you are worried about the ABI of the /proc/xen/balloon file, then it might make sense to keep the "Xen hard limit: ??? kB" line, which is currently printed until the first time ballooning up fails.
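If the /proc/xen/balloon layout is treated as ABI, one option is to keep the "Xen hard limit" line and have the patched kernel always print the "???" form. The sketch below is hypothetical. format_balloon_proc() and NO_HARD_LIMIT are illustration names, and the real file pads its columns and contains more lines than shown. The sketch only demonstrates how a sentinel value of ~0, assumed to mirror an unset bs.hard_limit, maps to "???".

```c
/* Sketch of the "Xen hard limit" line with a "???" sentinel.
 * format_balloon_proc() and NO_HARD_LIMIT are hypothetical names. */
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

#define NO_HARD_LIMIT ULONG_MAX  /* assumed to mirror bs.hard_limit == ~0UL */

/* Writes a trimmed-down /proc/xen/balloon text into buf. A kernel that
 * keeps the line for ABI reasons would always pass NO_HARD_LIMIT. */
static void format_balloon_proc(char *buf, size_t len,
                                unsigned long current_kb,
                                unsigned long target_kb,
                                unsigned long hard_limit_kb)
{
    char limit[32];

    assert(buf != NULL && len > 0);
    if (hard_limit_kb == NO_HARD_LIMIT)
        strcpy(limit, "???");
    else
        snprintf(limit, sizeof(limit), "%lu", hard_limit_kb);

    snprintf(buf, len,
             "Current allocation: %lu kB\n"
             "Requested target: %lu kB\n"
             "Xen hard limit: %s kB\n",
             current_kb, target_kb, limit);
}
```

Because the field is always present, existing parsers that look for the "Xen hard limit:" label keep finding it. They just stop seeing a number.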
> Your (pseudo?) patch doesn't seem to include the removal of the update of
> bs.hard_limit which the original patch included.

Yes, it was just a pseudo patch. I removed everything regarding bs.hard_limit just to make my point. Andrew or I might split the patch and post it again to see if we have better luck this time.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Included in kernel-2.6.18-222.el5.
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.
~~ ATTN Customers & Partners - RHEL 5.6 Public Beta available on RHN ~~ A fix for this BZ should be present and testable in the release. If this Bugzilla is verified as resolved, please update the Verified field above with an appropriate value and include a summary of the testing executed and the results obtained. If you encounter any issues or have questions while testing, please describe them and set this bug into NEED_INFO. If you encounter new defects or have additional patches to request for inclusion, promptly escalate the new issues through your support representative. Finally, FYI: future Beta kernels can be found here: http://people.redhat.com/jwilson/el5/ Note: Bugs with the 'OtherQA' keyword require Third-Party testing to confirm the request has been properly addressed. See: https://bugzilla.redhat.com/describekeywords.cgi#OtherQA
Test steps, on a host with 4G of memory:
1. Create RHEL5 PV guest guest1 with 3G memory.
2. Balloon guest1 down to 1G.
3. Create another guest, guest2, with 1G memory.
4. Balloon guest1 back up to 3G.
5. Check /proc/xen/balloon in guest1.
6. Shut down guest2.
7. Check /proc/xen/balloon in guest1 again.

host: x86 and x86_64
xen-3.0.3-120.el5
kernel-xen-2.6.18-237.el5

Reproduced with:
guest: kernel-xen-2.6.18-194.el5

At step 5:
[root@dhcp-66-82-191 ~]# cat /proc/xen/balloon
Current allocation: 2098144 kB
Requested target: 3145728 kB
... ...
Xen hard limit: 2098144 kB

At step 7:
[root@dhcp-66-82-191 ~]# cat /proc/xen/balloon
Current allocation: 2098144 kB
Requested target: 3145728 kB
... ...
Xen hard limit: 2098144 kB

Verified with:
guest: kernel-xen-2.6.18-238.el5

At step 5:
# cat /proc/xen/balloon
Current allocation: 2098220 kB
Requested target: 3145728 kB
... ...
Xen hard limit: ??? kB

At step 7:
# cat /proc/xen/balloon
Current allocation: 3145728 kB
Requested target: 3145728 kB
... ...
Xen hard limit: ??? kB

Based on the results above, moving to VERIFIED.
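The manual check in steps 5 and 7 could be scripted. It asks two questions: does Current allocation reach Requested target, and is a number recorded as the Xen hard limit? Here is a minimal C sketch that parses text in the format shown. read_kb(), balloon_reached_target() and hard_limit_recorded() are hypothetical helpers. Reading the actual file (for example with fopen()/fread() on /proc/xen/balloon inside the guest) is left to the caller.

```c
/* Parses /proc/xen/balloon-style text; helper names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Finds "<label> <number> kB" and stores the number; false if absent
 * or not numeric (e.g. "Xen hard limit: ??? kB"). */
static bool read_kb(const char *text, const char *label, unsigned long *out)
{
    assert(text != NULL && label != NULL && out != NULL);
    const char *p = strstr(text, label);
    if (p == NULL)
        return false;
    return sscanf(p + strlen(label), " %lu", out) == 1;
}

/* True once the guest holds at least its requested target. */
static bool balloon_reached_target(const char *text)
{
    unsigned long current_kb, target_kb;
    if (!read_kb(text, "Current allocation:", &current_kb) ||
        !read_kb(text, "Requested target:", &target_kb))
        return false;
    return current_kb >= target_kb;
}

/* True if a numeric hard limit was recorded (pre-fix kernels after a
 * failed balloon-up); false when the line shows "???". */
static bool hard_limit_recorded(const char *text)
{
    unsigned long kb;
    return read_kb(text, "Xen hard limit:", &kb);
}
```

Consider the step 7 outputs above. The -194 kernel's text fails the target check and shows a recorded hard limit. The -238 kernel's text passes, with "???" in the hard limit line.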
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html