Description Vinzenz Feenstra [evilissimo]
2013-11-18 08:43:15 UTC
+++ This bug was initially created as a clone of Bug #1030515 +++
+++ This bug was initially created as a clone of Bug #1025845 +++
Version-Release:
vdsm-4.13.0-0.5.beta1.el6ev.x86_64
mom-0.3.2-6.el6ev.noarch
Description:
When the hypervisor's memory pressure grows, MOM is supposed to reduce guest memory to make more memory available to the hypervisor. But instead of selecting only the guests with free memory available, MOM reduces the memory of all guests. As a consequence, guests under high memory load start using swap.
Additional info:
- Hypervisor: proliant
- Guests: rhel64-1, rhel64-2 and rhel64-3
- MOM ready:
2013-11-01 14:01:03,090 - mom.Monitor - INFO - GuestMonitor-rhel64_3 starting
2013-11-01 14:01:03,091 - mom.Monitor - INFO - GuestMonitor-rhel64_2 starting
2013-11-01 14:01:03,094 - mom.Monitor - INFO - GuestMonitor-rhel64_1 starting
2013-11-01 14:01:03,110 - mom.Monitor - INFO - GuestMonitor-rhel64_3 is ready
2013-11-01 14:01:03,111 - mom.Monitor - INFO - GuestMonitor-rhel64_2 is ready
2013-11-01 14:01:03,112 - mom.Monitor - INFO - GuestMonitor-rhel64_1 is ready
- Memory status:
[root@proliant mom.d]# date; free -m
Fri Nov 1 14:02:17 BRT 2013
             total       used       free     shared    buffers     cached
Mem:          3787        986       2801          0         11         71
-/+ buffers/cache:        903       2884
Swap:         4095        498       3597
[root@rhel64-1 ~]# date; free -m
Fri Nov 1 14:03:35 BRT 2013
             total       used       free     shared    buffers     cached
Mem:          1877        141       1736          0          6         28
-/+ buffers/cache:        106       1770
Swap:         4031         26       4005
[root@rhel64-2 ~]# date; free -m
Fri Nov 1 14:03:41 BRT 2013
             total       used       free     shared    buffers     cached
Mem:          1877        143       1734          0          6         30
-/+ buffers/cache:        106       1770
Swap:         4031         28       4003
[root@rhel64-3 ~]# date; free -m
Fri Nov 1 14:03:46 BRT 2013
             total       used       free     shared    buffers     cached
Mem:          1877        281       1595          0         26         98
-/+ buffers/cache:        156       1720
Swap:         4031          0       4031
- Starting memory pressure on rhel64-1 and rhel64-2:
[root@rhel64-1 ~]# ./bang.bin 1800 &
[1] 3838
[root@rhel64-1 ~]# Allocating 1800MB to work on.
[root@rhel64-1 ~]# date; free -m
Fri Nov 1 14:03:49 BRT 2013
             total       used       free     shared    buffers     cached
Mem:          1877       1817         60          0          0         13
-/+ buffers/cache:       1803         74
Swap:         4031        150       3881
[root@rhel64-2 ~]# ./bang.bin 1800 &
[1] 3890
[root@rhel64-2 ~]# Allocating 1800MB to work on.
[root@rhel64-2 ~]# date; free -m
Fri Nov 1 14:04:20 BRT 2013
             total       used       free     shared    buffers     cached
Mem:          1877       1736        140          0          6         30
-/+ buffers/cache:       1699        178
Swap:         4031         28       4003
- No pressure on rhel64-3:
[root@rhel64-3 ~]# date; free -m
Fri Nov 1 14:04:38 BRT 2013
             total       used       free     shared    buffers     cached
Mem:          1877        282       1595          0         26         98
-/+ buffers/cache:        157       1720
Swap:         4031          0       4031
- MOM working:
2013-11-01 14:04:50,426 - mom.Controllers.Balloon - INFO - Ballooning guest:rhel64_3 from 2097152 to 1992294
2013-11-01 14:04:51,511 - mom.Collectors.GuestMemory - WARNING - getVmMemoryStats() error: The ovirt-guest-agent is not active
2013-11-01 14:04:52,769 - mom.Controllers.Balloon - INFO - Ballooning guest:rhel64_2 from 2097152 to 1992294
2013-11-01 14:04:52,863 - mom.Controllers.Balloon - INFO - Ballooning guest:rhel64_1 from 2097152 to 1992294
2013-11-01 14:04:52,983 - mom.Controllers.KSM - INFO - Updating KSM configuration: pages_to_scan:364 run:1 sleep_millisecs:43
2013-11-01 14:05:01,769 - mom.Monitor - INFO - GuestMonitor-rhel64_2 is ready
2013-11-01 14:05:04,451 - mom.Controllers.Balloon - INFO - Ballooning guest:rhel64_3 from 1992296 to 1892681
2013-11-01 14:05:04,504 - mom.Controllers.Balloon - INFO - Ballooning guest:rhel64_2 from 1992296 to 1892681
2013-11-01 14:05:04,517 - mom.Controllers.Balloon - INFO - Ballooning guest:rhel64_1 from 1992296 to 1892681
2013-11-01 14:05:04,587 - mom.Controllers.KSM - INFO - Updating KSM configuration: pages_to_scan:664 run:1 sleep_millisecs:43
2013-11-01 14:05:14,633 - mom.Controllers.Balloon - INFO - Ballooning guest:rhel64_3 from 1892684 to 1798049
2013-11-01 14:05:14,835 - mom.Controllers.Balloon - INFO - Ballooning guest:rhel64_2 from 1892684 to 1798049
2013-11-01 14:05:14,885 - mom.Controllers.Balloon - INFO - Ballooning guest:rhel64_1 from 1892684 to 1798049
2013-11-01 14:05:14,937 - mom.Controllers.KSM - INFO - Updating KSM configuration: pages_to_scan:964 run:1 sleep_millisecs:43
2013-11-01 14:05:24,983 - mom.Controllers.Balloon - INFO - Ballooning guest:rhel64_3 from 1798052 to 1708149
2013-11-01 14:05:25,080 - mom.Controllers.Balloon - INFO - Ballooning guest:rhel64_2 from 1798052 to 1708149
2013-11-01 14:05:25,160 - mom.Controllers.Balloon - INFO - Ballooning guest:rhel64_1 from 1798052 to 1708149
2013-11-01 14:05:25,202 - mom.Controllers.KSM - INFO - Updating KSM configuration: pages_to_scan:1250 run:1 sleep_millisecs:43
2013-11-01 14:05:35,242 - mom.Controllers.Balloon - INFO - Ballooning guest:rhel64_3 from 1708152 to 1793559
2013-11-01 14:05:35,281 - mom.Controllers.Balloon - INFO - Ballooning guest:rhel64_2 from 1708152 to 1793559
2013-11-01 14:05:35,314 - mom.Controllers.Balloon - INFO - Ballooning guest:rhel64_1 from 1708152 to 1793559
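(Illustrative note: the balloon targets in the log above change by the same relative amount for every guest in each cycle. The short Python check below recomputes that ratio from the rhel64_1 lines. It is only arithmetic on the logged values; the helper name `relative_change` is made up for this sketch, and reading the values as KiB is an assumption based on 2097152 matching the 2048 MB defined memory.)

```python
# (from, to) balloon targets copied from the rhel64_1 lines of the MOM log.
# Assumption: values are in KiB (2097152 KiB == 2048 MB defined memory).
BALLOON_STEPS = [
    (2097152, 1992294),
    (1992296, 1892681),
    (1892684, 1798049),
    (1798052, 1708149),
    (1708152, 1793559),
]


def relative_change(before, after):
    """Return the fractional change between two balloon targets."""
    return (after - before) / before


for before, after in BALLOON_STEPS:
    print(f"{before} -> {after}: {relative_change(before, after):+.2%}")
```

Each of the first four steps comes out at roughly -5% and the last one at roughly +5%, identical for rhel64_1, rhel64_2 and rhel64_3 since their logged values are the same. That is consistent with a fixed 5% step applied to every guest, although the log alone does not show where that step is configured.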
- After MOM has been working for a while, guest memory status:
[root@rhel64-1 ~]# date; free -m
Fri Nov 1 14:05:16 BRT 2013
             total       used       free     shared    buffers     cached
Mem:          1585       1507         78          0          0          6
-/+ buffers/cache:       1500         85
Swap:         4031        487       3544
[root@rhel64-2 ~]# date; free -m
Fri Nov 1 14:05:18 BRT 2013
             total       used       free     shared    buffers     cached
Mem:          1585       1509         75          0          0         10
-/+ buffers/cache:       1499         86
Swap:         4031        491       3540
[root@rhel64-3 ~]# date; free -m
Fri Nov 1 14:05:20 BRT 2013
             total       used       free     shared    buffers     cached
Mem:          1585        281       1303          0         26         98
-/+ buffers/cache:        156       1428
Swap:         4031          0       4031
- Note that MOM is reducing memory for all guests, even the heavily loaded ones.
Expected results:
MOM should not reduce a guest's memory beyond the amount of memory that guest has free.
--- Additional comment from Doron Fediuck on 2013-11-03 06:25:09 EST ---
Just a few things I'd like to clarify:
1. Once a VM starts swapping MOM should detect it and stop inflating it / start deflating.
2. The floor limit MOM is using is based on the "Physical Memory Guaranteed" setting we define for each VM in the Resource Allocation sub-tab of the new/edit VM dialog. Can you please provide the numbers set for these VMs?
--- Additional comment from Martin Sivák on 2013-11-04 04:42:04 EST ---
We are not getting swap information from the guest agent, so this is definitely an issue: when we inflate the balloon, the VM will also move some of its data to swap, and MOM will then think that there is still enough reclaimable memory in the VM.
To fix this we would have to modify the guest agents (Linux and Windows), the MOM collectors, and the VDSM policy for MOM.
There is also an error in the policy and we have a fix for that - http://gerrit.ovirt.org/#/c/19416/1/doc/balloon.rules
--- Additional comment from Amador Pahim on 2013-11-04 06:54:39 EST ---
(In reply to Doron Fediuck from comment #1)
> Just a few things I'd like to clarify;
> 1. Once a VM starts swapping MOM should detect it and stop inflating it /
> start deflating.
>
> 2. The floor limit MOM is using is based on the "Physical Memory Guaranteed"
> settings we define for each VM in the Resource Allocation sub-tab of the
> new/edit
> VM dialog. Can you please provide the numbers set for these VMs?
Defined Memory: 2048 MB
Physical Memory Guaranteed: 512 MB
--- Additional comment from Amador Pahim on 2013-11-04 08:33:28 EST ---
(In reply to Martin Sivák from comment #2)
> We are not getting swap information from the guest agent so this is
> definitely an issue. Because when we inflate the balloon, the VM will also
> put some of its data to swap and mom will then think that there is still
> enough reclaimable memory in the vm.
My concern is also that all VMs are being inflated/deflated by the same amount of memory, regardless of their individual memory status. Is this an issue? Shouldn't MOM balloon VMs differently, taking each VM's own memory availability into account?
>
> To fix this we would have to modify the guest agent(s) - linux, win, mom
> collectors and vdsm policy for mom.
>
> There is also an error in the policy and we have a fix for that -
> http://gerrit.ovirt.org/#/c/19416/1/doc/balloon.rules
--- Additional comment from Martin Sivák on 2013-11-04 08:41:54 EST ---
> My concern is also about all VMs being inflated/deflated with the same
> amount of memory,
That is fixed in the referenced changeset.
--- Additional comment from Adam Litke on 2013-11-04 08:51:18 EST ---
The mom policy is designed to behave in one of two ways depending on the severity of host memory pressure. Under moderate pressure (between 5% and 20% free host memory) we try to balloon away only unused memory in guests. Under severe host pressure (< 5% free) we purposefully cause guest swapping in order to keep the host itself from entering a swap storm. Since you are observing guest swapping, could you share the state of host memory during that behavior? If the host has <5% free (counting Cached pages as free) then I would argue that the policy is behaving as designed.
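(Illustrative note: the two modes described in this comment can be sketched as a small classifier. This is not MOM's policy code; the function name and the "none" result for hosts with more than 20% free are assumptions made for the example. Only the 5% and 20% thresholds and the idea of counting cached pages as free come from the comment.)

```python
def classify_host_pressure(total_mb, free_mb, cached_mb,
                           moderate_below=0.20, severe_below=0.05):
    """Classify host memory pressure from `free -m` style numbers."""
    # Cached pages are counted as free, as suggested in the comment.
    free_fraction = (free_mb + cached_mb) / total_mb
    if free_fraction < severe_below:
        return "severe"    # guest swapping is accepted to protect the host
    if free_fraction < moderate_below:
        return "moderate"  # balloon away only unused guest memory
    return "none"          # assumption: no ballooning needed above 20% free


# Host numbers from the 14:02:17 snapshot in the description.
print(classify_host_pressure(total_mb=3787, free_mb=2801, cached_mb=71))
```

With the 14:02:17 host snapshot this prints `none`, because that snapshot was taken before the load started. The host state during the ballooning itself is not in the report, which is why the question above matters.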
--- Additional comment from Martin Sivák on 2013-11-04 09:28:50 EST ---
Adam: there are two issues there
- Your fix http://gerrit.ovirt.org/#/c/19416/1/doc/balloon.rules is not present in the version he uses, and that causes all the VMs to disregard the computed minimum in favour of the hard minimum (which is lower and the same for all VMs).
- When you have most of the memory in swap, MoM will think RAM is (almost) free and inflate the balloon because we do not have any info about the swap usage in the policy.
But you are totally right about the two modes of ballooning we use.
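(Illustrative note: the first issue above, a per-guest computed minimum being ignored in favour of a shared hard minimum, can be sketched as follows. This is not the code from the referenced changeset. The functions `shrink_target` and `computed_floor`, the formula for the computed floor, and the 5% step are all assumptions for the example. The guest numbers are taken from the 14:04 `free -m` output and the 512 MB guaranteed memory reported earlier.)

```python
MB = 1024  # KiB per MB


def shrink_target(balloon_cur_kib, floor_kib, step=0.05):
    """Shrink the balloon target by `step`, but never below `floor_kib`."""
    return max(int(balloon_cur_kib * (1 - step)), floor_kib)


def computed_floor(balloon_cur_kib, guest_free_kib, hard_min_kib):
    """Hypothetical per-guest floor: keep at least what the guest is using."""
    in_use_kib = balloon_cur_kib - guest_free_kib
    return max(in_use_kib, hard_min_kib)


HARD_MIN = 512 * MB   # "Physical Memory Guaranteed" from the report
CURRENT = 2048 * MB   # defined memory, the starting balloon target
GUEST_FREE = {"rhel64_1": 60 * MB, "rhel64_3": 1595 * MB}

for name, free_kib in GUEST_FREE.items():
    hard_only = shrink_target(CURRENT, HARD_MIN)
    per_guest = shrink_target(CURRENT, computed_floor(CURRENT, free_kib, HARD_MIN))
    print(name, "hard floor only:", hard_only, "with computed floor:", per_guest)
```

With only the hard floor, both guests get the same target (1992294), which matches the first cycle in the log. With a per-guest floor of this kind, the loaded guest rhel64_1 would only give up its 60 MB of free memory, while the idle guest rhel64_3 would still shrink by the full step. The second issue (missing swap information) is not modelled here.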