Bug 1649328 - no memory deflation/inflation in mom (unchanged balloonInfo).
Summary: no memory deflation/inflation in mom (unchanged balloonInfo).
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Virt
Version: 4.3.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ovirt-4.3.2
Target Release: ---
Assignee: Andrej Krejcir
QA Contact: Polina
URL:
Whiteboard:
Depends On: 1676695
Blocks:
 
Reported: 2018-11-13 12:03 UTC by Polina
Modified: 2019-03-19 10:03 UTC (History)

Fixed In Version: v4.30.9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-03-19 10:03:23 UTC
oVirt Team: Virt
Embargoed:
rule-engine: ovirt-4.3+
rule-engine: blocker+


Attachments
mom, engine, vdsm logs (967.04 KB, application/x-gzip)
2018-11-13 12:03 UTC, Polina
mom debug (963.07 KB, application/gzip)
2018-12-20 15:42 UTC, Polina
mom and engine logs (403.22 KB, application/gzip)
2019-01-06 14:18 UTC, Polina

Description Polina 2018-11-13 12:03:07 UTC
Created attachment 1505205 [details]
mom, engine, vdsm logs

Description of problem: Regression in the memory ballooning tests in 4.3 (the tests pass in 4.2). No memory deflation happens even though the MOM policy is set and most of the free host memory is allocated.

Version-Release number of selected component (if applicable): 
vdsm-4.30.1-35.git4e0049c.el7.x86_64
ovirt-engine-4.3.0-0.0.master.20181101091940.git61310aa.el7.noarch

How reproducible: 100%

Steps to Reproduce:
1. On the host, change the MOM defvar pressure_threshold in the file /etc/vdsm/mom.d/02-balloon.policy to "0.40".
2. Enable ballooning for the host: (a) on the cluster, check Enable Memory Balloon Optimization (Edit Cluster/Optimization tab); (b) deactivate and reactivate the host (or use Sync Mom Policy in Cluster, Host tab).
3. Disable swapping on the host.
4. Update the existing VM mom_vm_0 with {memory size: 2048 MB, max memory: 4096 MB, memory guarantee: 1024 MB}.
5. Start the VM ['mom_vm_0'].
6. Allocate 70% of the host's free memory: int(host_free_memory * 0.7).
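
The allocation size in step 6 can be sketched as follows. This is only an illustration of the int(host_free_memory * 0.7) arithmetic, not the code used by the automation tests; the helper names and the sample text are hypothetical, and it assumes the free memory is taken from the MemFree line of /proc/meminfo.

```python
import re


def mem_free_kib(meminfo_text):
    """Return the MemFree value (in kB) from /proc/meminfo-style text."""
    match = re.search(r"^MemFree:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemFree line not found")
    return int(match.group(1))


def allocation_size_kib(meminfo_text, fraction=0.7):
    """Amount to allocate, mirroring int(host_free_memory * 0.7)."""
    return int(mem_free_kib(meminfo_text) * fraction)


# Illustrative sample text; on a real host, read /proc/meminfo instead.
sample = "MemTotal:       32660836 kB\nMemFree:        24098308 kB\n"
print(allocation_size_kib(sample))
```

On a real host the text would come from open("/proc/meminfo").read(); the fraction is a parameter so the same helper covers other allocation levels.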

Expected results:

The "balloonInfo" returned by vdsm-client VM getStats vmID="6abe3f61-27bd-43d1-b724-8c1b8cdc53e8" must change so that balloon_max - 1024 > balloon_cur

Actual results:
The balloon info in vdsm-client does not change as expected; it remains the same as at VM start.
"balloonInfo": {
            "balloon_max": "2097152",
            "balloon_cur": "2097152",
            "balloon_target": "2097152",
            "balloon_min": "1048576"
        },
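
The expected-result condition can be written as a small check. This is a sketch only: the function name is hypothetical, the dictionary is assumed to have the shape of the "balloonInfo" fragment above (string values), and the 1024 margin is taken directly from the expected-results line.

```python
def balloon_changed_as_expected(balloon_info, margin=1024):
    """Check the condition balloon_max - margin > balloon_cur."""
    balloon_max = int(balloon_info["balloon_max"])
    balloon_cur = int(balloon_info["balloon_cur"])
    return balloon_max - margin > balloon_cur


# Values reported under "Actual results": the condition does not hold.
actual = {
    "balloon_max": "2097152",
    "balloon_cur": "2097152",
    "balloon_target": "2097152",
    "balloon_min": "1048576",
}
print(balloon_changed_as_expected(actual))
```

With the reported values balloon_cur equals balloon_max, so the check fails, which is the behaviour this bug describes.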


mom.log 
2018-11-12 18:38:33,008 - mom.Monitor - ERROR - Unexpected collection error
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/mom/Monitor.py", line 95, in collect
    collected = c.collect()
  File "/usr/lib/python2.7/site-packages/mom/Collectors/GuestBalloon.py", line 41, in collect
    stat = self.hypervisor_iface.getVmBalloonInfo(self.uuid)
  File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmRpcBase.py", line 80, in getVmBalloonInfo
    vm = self._getVmStats(uuid)
  File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmRpcBase.py", line 155, in _getVmStats
    raise HypervisorInterfaceError("VM %s does not exist" % vmId)
HypervisorInterfaceError: VM 66fb330a-0c4d-4a21-8a03-80773f7a218d does not exist
2018-11-12 18:38:33,009 - mom.Monitor - ERROR - Unexpected collection error
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/mom/Monitor.py", line 95, in collect
    collected = c.collect()
  File "/usr/lib/python2.7/site-packages/mom/Collectors/GuestCpuTune.py", line 44, in collect
    stat = self.hypervisor_iface.getVmCpuTuneInfo(self.uuid)
  File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmRpcBase.py", line 100, in getVmCpuTuneInfo
    vm = self._getVmStats(uuid)
  File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmRpcBase.py", line 155, in _getVmStats
    raise HypervisorInterfaceError("VM %s does not exist" % vmId)
HypervisorInterfaceError: VM 66fb330a-0c4d-4a21-8a03-80773f7a218d does not exist


Additional info: logs attached

Comment 1 Red Hat Bugzilla Rules Engine 2018-11-20 23:52:24 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 2 Ryan Barry 2018-12-06 14:00:32 UTC
Can you please re-test now that libvirt/guest-agent bugs are resolved?

Comment 3 Polina 2018-12-10 09:52:18 UTC
Re-tested with the following packages. The problem still happens.

ovirt-guest-agent-common-1.0.14-1.20181008062431.git30a9b91.el7.noarch
ovirt-release-master-4.3.0-0.1.master.20181101000103.git023a723.el7.noarch
libvirt-4.5.0-10.el7_6.3.x86_64

Comment 4 Michal Skrivanek 2018-12-10 10:29:32 UTC
I see in the logs that the highest memUsed percentage was 26, which means that 74% was available, so no ballooning was needed. Please make sure you enable mom debug logs and capture the total/free memory on the host at the time you think the balloon should inflate.

Comment 5 Polina 2018-12-20 15:42:02 UTC
Created attachment 1515919 [details]
mom debug

Please see the attached mom.log with DEBUG info in debug.tar.gz.
Host before the memory allocation:
[root@lynx22 ~]# free
              total        used        free      shared  buff/cache   available
Mem:       32660836     7892352    24098308       27324      670176    24252948
Swap:             0           0           0

Host after the memory allocation:
[root@lynx22 ~]# free
              total        used        free      shared  buff/cache   available
Mem:       32660836    25075744     6912952       27356      672140     7069024
Swap:             0           0           0



"balloonInfo": {
            "balloon_max": "2097152", 
            "balloon_cur": "2097152", 
            "balloon_target": "2097152", 
            "balloon_min": "1048576"
        }

Comment 6 Michal Skrivanek 2019-01-03 11:40:13 UTC
I still see that the lowest available memory was 23%, while the threshold is 20%. Please repeat the test.

Comment 7 Polina 2019-01-06 14:18:54 UTC
Created attachment 1518797 [details]
mom and engine logs

In the previous test the threshold was 40%, not the default 20%, so the inflation should have happened.

Anyway, I have now repeated the test in the latest master 4.3 environment, ovirt-engine-4.3.0-0.4.master.20181231193012.git1f27a84.el7.noarch (the same test passes in 4.2), with the default defvar pressure_threshold 0.20 and allocating about 85% of the host memory.

So, before the memory allocation we have:
free
              total        used        free      shared  buff/cache   available
Mem:       32711616     8003820    23900772       18148      807024    24197248
free -h
Mem:            31G        7.6G         22G         17M        788M         23G


After the memory allocation:
free
Mem:       32711616    28621672     3280708       18180      809236     3578512
free -h
Mem:            31G         27G        3.1G         17M        790M        3.4G

The configuration of:
# If the percentage of host free memory drops below this value
# then we will consider the host to be under memory pressure
(defvar pressure_threshold 0.20)
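
As a rough cross-check of the numbers above against this threshold, the free-memory fraction can be computed from the `free` output. This is a sketch under assumptions: it compares free/total directly, whereas the actual MOM policy may use averaged statistics or different fields, and the function names are hypothetical.

```python
def free_fraction(total_kib, free_kib):
    """Fraction of host memory that is free, from `free` output columns."""
    return free_kib / float(total_kib)


def under_pressure(total_kib, free_kib, pressure_threshold=0.20):
    """True when the free fraction drops below pressure_threshold."""
    return free_fraction(total_kib, free_kib) < pressure_threshold


# "total" and "free" columns from the `free` output in this comment.
TOTAL = 32711616
print(under_pressure(TOTAL, 23900772))  # before the allocation
print(under_pressure(TOTAL, 3280708))   # after the allocation
```

By this simple measure the host is at roughly 10% free memory after the allocation, below the 0.20 threshold.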

I am attaching the debug mom.log and engine.log again.
Please look at the logs at about 2019-01-06 15:32:
2019-01-06 15:32:12,647+02 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-68) [] VM 'edb57570-0091-4222-9fed-5d02ed4f9247'(mom_vm_0) moved from 'WaitForLaunch' --> 'PoweringUp'

Comment 8 Polina 2019-01-06 14:24:36 UTC
I just wanted to note that mom.log contains the message "DEBUG - Field 'mem_free' not known." I don't know whether it relates to the problem.

Comment 9 Ryan Barry 2019-01-21 13:34:20 UTC
Re-targeting, because these bugs either do not have blocker+, or do not have a patch posted

Comment 10 Aileen O'Connor 2019-02-12 11:32:38 UTC
Reviewed at Exec Program call and agreed to keep as a blocker.

Comment 11 Andrej Krejcir 2019-02-19 16:46:18 UTC
This may have the same cause as Bug 1676695.

After I applied the patch that fixes it, ballooning is working as expected.

Comment 12 Michal Skrivanek 2019-02-21 13:32:37 UTC
for retest

Comment 13 Polina 2019-02-24 13:03:23 UTC
verified on ovirt-engine-4.3.1.2-0.0.master.20190220155021.git90ab3d9.el7.noarch
by running the automation mom tests

Comment 14 Sandro Bonazzola 2019-03-19 10:03:23 UTC
This bugzilla is included in oVirt 4.3.2 release, published on March 19th 2019.

Since the problem described in this bug report should be
resolved in oVirt 4.3.2 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

