Version-Release:
vdsm-4.13.0-0.5.beta1.el6ev.x86_64
mom-0.3.2-6.el6ev.noarch

Description:
When the Hypervisor's memory pressure grows, MOM is supposed to reduce guest memory to make more memory available to the Hypervisor. But instead of selecting only the guests with free memory available, MOM reduces the memory of all guests. As a consequence, guests with high memory load start using swap.

Additional info:

- Hypervisor: proliant
- Guests: rhel64-1, rhel64-2 and rhel64-3

- MOM ready:

2013-11-01 14:01:03,090 - mom.Monitor - INFO - GuestMonitor-rhel64_3 starting
2013-11-01 14:01:03,091 - mom.Monitor - INFO - GuestMonitor-rhel64_2 starting
2013-11-01 14:01:03,094 - mom.Monitor - INFO - GuestMonitor-rhel64_1 starting
2013-11-01 14:01:03,110 - mom.Monitor - INFO - GuestMonitor-rhel64_3 is ready
2013-11-01 14:01:03,111 - mom.Monitor - INFO - GuestMonitor-rhel64_2 is ready
2013-11-01 14:01:03,112 - mom.Monitor - INFO - GuestMonitor-rhel64_1 is ready

- Memory status:

[root@proliant mom.d]# date; free -m
Fri Nov 1 14:02:17 BRT 2013
             total       used       free     shared    buffers     cached
Mem:          3787        986       2801          0         11         71
-/+ buffers/cache:        903       2884
Swap:         4095        498       3597

[root@rhel64-1 ~]# date; free -m
Fri Nov 1 14:03:35 BRT 2013
             total       used       free     shared    buffers     cached
Mem:          1877        141       1736          0          6         28
-/+ buffers/cache:        106       1770
Swap:         4031         26       4005

[root@rhel64-2 ~]# date; free -m
Fri Nov 1 14:03:41 BRT 2013
             total       used       free     shared    buffers     cached
Mem:          1877        143       1734          0          6         30
-/+ buffers/cache:        106       1770
Swap:         4031         28       4003

[root@rhel64-3 ~]# date; free -m
Fri Nov 1 14:03:46 BRT 2013
             total       used       free     shared    buffers     cached
Mem:          1877        281       1595          0         26         98
-/+ buffers/cache:        156       1720
Swap:         4031          0       4031

- Starting memory pressure on rhel64-1 and rhel64-2:

[root@rhel64-1 ~]# ./bang.bin 1800 &
[1] 3838
[root@rhel64-1 ~]# Allocating 1800MB to work on.
[root@rhel64-1 ~]# date; free -m
Fri Nov 1 14:03:49 BRT 2013
             total       used       free     shared    buffers     cached
Mem:          1877       1817         60          0          0         13
-/+ buffers/cache:       1803         74
Swap:         4031        150       3881

[root@rhel64-2 ~]# ./bang.bin 1800 &
[1] 3890
[root@rhel64-2 ~]# Allocating 1800MB to work on.
[root@rhel64-2 ~]# date; free -m
Fri Nov 1 14:04:20 BRT 2013
             total       used       free     shared    buffers     cached
Mem:          1877       1736        140          0          6         30
-/+ buffers/cache:       1699        178
Swap:         4031         28       4003

- No pressure on rhel64-3:

[root@rhel64-3 ~]# date; free -m
Fri Nov 1 14:04:38 BRT 2013
             total       used       free     shared    buffers     cached
Mem:          1877        282       1595          0         26         98
-/+ buffers/cache:        157       1720
Swap:         4031          0       4031

- MOM working:

2013-11-01 14:04:50,426 - mom.Controllers.Balloon - INFO - Ballooning guest:rhel64_3 from 2097152 to 1992294
2013-11-01 14:04:51,511 - mom.Collectors.GuestMemory - WARNING - getVmMemoryStats() error: The ovirt-guest-agent is not active
2013-11-01 14:04:52,769 - mom.Controllers.Balloon - INFO - Ballooning guest:rhel64_2 from 2097152 to 1992294
2013-11-01 14:04:52,863 - mom.Controllers.Balloon - INFO - Ballooning guest:rhel64_1 from 2097152 to 1992294
2013-11-01 14:04:52,983 - mom.Controllers.KSM - INFO - Updating KSM configuration: pages_to_scan:364 run:1 sleep_millisecs:43
2013-11-01 14:05:01,769 - mom.Monitor - INFO - GuestMonitor-rhel64_2 is ready
2013-11-01 14:05:04,451 - mom.Controllers.Balloon - INFO - Ballooning guest:rhel64_3 from 1992296 to 1892681
2013-11-01 14:05:04,504 - mom.Controllers.Balloon - INFO - Ballooning guest:rhel64_2 from 1992296 to 1892681
2013-11-01 14:05:04,517 - mom.Controllers.Balloon - INFO - Ballooning guest:rhel64_1 from 1992296 to 1892681
2013-11-01 14:05:04,587 - mom.Controllers.KSM - INFO - Updating KSM configuration: pages_to_scan:664 run:1 sleep_millisecs:43
2013-11-01 14:05:14,633 - mom.Controllers.Balloon - INFO - Ballooning guest:rhel64_3 from 1892684 to 1798049
2013-11-01 14:05:14,835 - mom.Controllers.Balloon - INFO - Ballooning guest:rhel64_2 from 1892684 to 1798049
2013-11-01 14:05:14,885 - mom.Controllers.Balloon - INFO - Ballooning guest:rhel64_1 from 1892684 to 1798049
2013-11-01 14:05:14,937 - mom.Controllers.KSM - INFO - Updating KSM configuration: pages_to_scan:964 run:1 sleep_millisecs:43
2013-11-01 14:05:24,983 - mom.Controllers.Balloon - INFO - Ballooning guest:rhel64_3 from 1798052 to 1708149
2013-11-01 14:05:25,080 - mom.Controllers.Balloon - INFO - Ballooning guest:rhel64_2 from 1798052 to 1708149
2013-11-01 14:05:25,160 - mom.Controllers.Balloon - INFO - Ballooning guest:rhel64_1 from 1798052 to 1708149
2013-11-01 14:05:25,202 - mom.Controllers.KSM - INFO - Updating KSM configuration: pages_to_scan:1250 run:1 sleep_millisecs:43
2013-11-01 14:05:35,242 - mom.Controllers.Balloon - INFO - Ballooning guest:rhel64_3 from 1708152 to 1793559
2013-11-01 14:05:35,281 - mom.Controllers.Balloon - INFO - Ballooning guest:rhel64_2 from 1708152 to 1793559
2013-11-01 14:05:35,314 - mom.Controllers.Balloon - INFO - Ballooning guest:rhel64_1 from 1708152 to 1793559

- After some MOM work, guests memory status:

[root@rhel64-1 ~]# date; free -m
Fri Nov 1 14:05:16 BRT 2013
             total       used       free     shared    buffers     cached
Mem:          1585       1507         78          0          0          6
-/+ buffers/cache:       1500         85
Swap:         4031        487       3544

[root@rhel64-2 ~]# date; free -m
Fri Nov 1 14:05:18 BRT 2013
             total       used       free     shared    buffers     cached
Mem:          1585       1509         75          0          0         10
-/+ buffers/cache:       1499         86
Swap:         4031        491       3540

[root@rhel64-3 ~]# date; free -m
Fri Nov 1 14:05:20 BRT 2013
             total       used       free     shared    buffers     cached
Mem:          1585        281       1303          0         26         98
-/+ buffers/cache:        156       1428
Swap:         4031          0       4031

- Notice that MOM is reducing memory for all guests, even the highly loaded ones.

Expected results:
MOM should not reduce a guest's memory beyond its free memory limit.
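The Balloon log lines in the report can be checked with a little arithmetic. The sketch below copies the from/to values from the log (presumably KiB, since 2097152 = 2048 * 1024) and computes the ratio of each step. It only describes what the log shows; the 5% step size is read off the numbers, not taken from the policy source, and the helper name step_ratio is made up for illustration.

```python
# (before, after) balloon targets copied from the mom.Controllers.Balloon
# log lines in the report. The same pairs appear for all three guests.
steps = [
    (2097152, 1992294),
    (1992296, 1892681),
    (1892684, 1798049),
    (1798052, 1708149),
    (1708152, 1793559),
]


def step_ratio(before, after):
    """Return the factor by which one ballooning step changed the target."""
    return after / before


ratios = [round(step_ratio(before, after), 3) for before, after in steps]

for (before, after), ratio in zip(steps, ratios):
    direction = "shrink" if ratio < 1 else "grow"
    print(f"{before} -> {after}: x{ratio} ({direction})")
```

Every step is the same for rhel64_1, rhel64_2 and rhel64_3, which matches the complaint that loaded and idle guests are treated identically.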
Just a few things I'd like to clarify:

1. Once a VM starts swapping, MOM should detect it and stop inflating the balloon / start deflating it.

2. The floor limit MOM uses is based on the "Physical Memory Guaranteed" setting we define for each VM in the Resource Allocation sub-tab of the new/edit VM dialog. Can you please provide the numbers set for these VMs?
We are not getting swap information from the guest agent, so this is definitely an issue: when we inflate the balloon, the VM will also push some of its data to swap, and mom will then think that there is still enough reclaimable memory in the VM.

To fix this we would have to modify the guest agent(s) (linux, win), the mom collectors and the vdsm policy for mom.

There is also an error in the policy and we have a fix for that - http://gerrit.ovirt.org/#/c/19416/1/doc/balloon.rules
(In reply to Doron Fediuck from comment #1)
> Just a few things I'd like to clarify;
> 1. Once a VM starts swapping MOM should detect it and stop inflating it /
> start deflating.
>
> 2. The floor limit MOM is using is based on the "Physical Memory Guaranteed"
> settings we define for each VM in the Resource Allocation sub-tab of the
> new/edit
> VM dialog. Can you please provide the numbers set for these VMs?

Defined Memory: 2048 MB
Physical Memory Guaranteed: 512 MB
(In reply to Martin Sivák from comment #2)
> We are not getting swap information from the guest agent so this is
> definitely an issue. Because when we inflate the balloon, the VM will also
> put some of its data to swap and mom will then think that there is still
> enough reclaimable memory in the vm.

My concern is also about all VMs being inflated/deflated by the same amount of memory, regardless of their individual memory status. Is this an issue? Shouldn't MOM balloon each VM differently, taking its individual memory availability into account?

>
> To fix this we would have to modify the guest agent(s) - linux, win, mom
> collectors and vdsm policy for mom.
>
> There is also an error in the policy and we have a fix for that -
> http://gerrit.ovirt.org/#/c/19416/1/doc/balloon.rules
> My concern is also about all VMs being inflated/deflated with the same
> amount of memory,

That is fixed in the referenced changeset.
The mom policy is designed to behave in one of two ways depending on the severity of host memory pressure. Under moderate pressure (between 5% and 20% free host memory) we try to balloon away only unused memory in guests. Under severe host pressure (< 5% free) we purposefully cause guest swapping in order to keep the host itself from entering a swap storm. Since you are observing guest swapping, could you share the state of host memory during that behavior? If the host has <5% free (counting Cached pages as free) then I would argue that the policy is behaving as designed.
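The two modes can be sketched as a simple classification of host free memory. This is only an illustration of the thresholds quoted in this comment, not the actual policy code: the function name host_pressure, the exact boundary handling (whether 5% and 20% are inclusive) and the choice to add only Cached to free are assumptions.

```python
def host_pressure(total_mb, free_mb, cached_mb=0):
    """Classify host memory pressure using the 5% / 20% thresholds.

    Cached pages are counted as free, as suggested in the comment.
    Boundary handling is an assumption made for this sketch.
    """
    free_pct = 100.0 * (free_mb + cached_mb) / total_mb
    if free_pct < 5:
        return "severe"    # guest swapping is caused on purpose
    if free_pct < 20:
        return "moderate"  # only unused guest memory is ballooned away
    return "none"


# Host numbers from the 14:02:17 `free -m` output in the report,
# taken before the guests were put under load.
print(host_pressure(total_mb=3787, free_mb=2801, cached_mb=71))
```

Running the same check on a `free -m` sample taken while the guests are swapping would answer the question asked here.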
Adam: there are two issues here.

- Your fix http://gerrit.ovirt.org/#/c/19416/1/doc/balloon.rules is not present in the version he uses, and that causes all the VMs to disregard the computed minimum in favour of the hard minimum (which is lower and the same for all VMs).

- When a guest has most of its memory in swap, MoM will think its RAM is (almost) free and inflate the balloon, because we do not have any info about swap usage in the policy.

But you are totally right about the two modes of ballooning we use.
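The second issue can be illustrated with a toy estimate of reclaimable guest memory. Everything here is hypothetical: the two function names, the idea of subtracting swap usage from free memory, and the example numbers are made up for illustration and are not taken from the MoM policy or from the proposed fix.

```python
def reclaimable_swap_blind(free_mb):
    """Estimate that sees only free RAM, as when no swap data is reported."""
    return free_mb


def reclaimable_swap_aware(free_mb, swap_used_mb):
    """Hypothetical estimate that discounts memory already pushed to swap."""
    return max(0, free_mb - swap_used_mb)


# Made-up guest: RAM looks mostly free only because its working set
# has been swapped out.
free_mb, swap_used_mb = 900, 850

print(reclaimable_swap_blind(free_mb))                # looks safe to balloon
print(reclaimable_swap_aware(free_mb, swap_used_mb))  # very little to reclaim
```

Without swap data the first estimate is all the policy can compute, so the balloon keeps inflating.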
Should this bug be on MODIFIED? If all patches are in, please move to ON_QA and mark it as fixed in is23.
Unfortunately not, there are some patches that are still missing.
The mom part is ready and vdsm contains the fixed policy, so it should behave better now. There are still situations where this won't be enough (mostly related to swap usage), and the related bugs add support for dealing with them.
mom stops changing the balloon size after the first change. Consulted with msivak; a patch is in progress. Moving back to ASSIGNED.
MoM used the same variable stack for all policy runs. That caused old variable values to be used sometimes.
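A minimal sketch of this class of bug, assuming a policy evaluator that keeps its variables in a scope stack. The class names and the toy rule are hypothetical and do not reproduce the real MoM code; they only show how a stack shared between runs lets a stale value leak into a later run.

```python
class SharedStackEvaluator:
    """Reuses one variable stack for every policy run (the buggy pattern)."""

    def __init__(self):
        self.stack = [{}]

    def run(self, free_mb):
        scope = self.stack[-1]
        # Toy rule: the policy only sets "target" when there is free memory.
        if free_mb > 0:
            scope["target"] = free_mb
        return scope.get("target")


class FreshStackEvaluator:
    """Creates a new variable stack for each policy run."""

    def run(self, free_mb):
        stack = [{}]
        scope = stack[-1]
        if free_mb > 0:
            scope["target"] = free_mb
        return scope.get("target")


shared = SharedStackEvaluator()
fresh = FreshStackEvaluator()

first_shared = shared.run(100)
second_shared = shared.run(0)   # still sees "target" from the previous run
first_fresh = fresh.run(100)
second_fresh = fresh.run(0)     # no stale value

print(first_shared, second_shared)
print(first_fresh, second_fresh)
```

With the shared stack, the second run returns the value computed during the first one even though its own inputs never set it.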
mom-0.3.2-8.el6ev tested in
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2014-0064.html