Created attachment 1505205 [details]
mom, engine, vdsm logs

Description of problem:
Regression in the memory ballooning tests in 4.3 (the tests pass in 4.2). No memory deflation happens while the MOM policy is set and most of the free host memory is allocated.

Version-Release number of selected component (if applicable):
vdsm-4.30.1-35.git4e0049c.el7.x86_64
ovirt-engine-4.3.0-0.0.master.20181101091940.git61310aa.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. On the host, change the MOM defvar pressure_threshold in /etc/vdsm/mom.d/02-balloon.policy to "0.40".
2. Enable ballooning for the host:
   1. On the cluster, check Enable Memory Balloon Optimization (Edit Cluster/Optimization tab).
   2. Deactivate/activate the host (or use Sync Mom Policy in the Cluster, Host tab).
3. Disable swapping on the host.
4. Update the existing VM mom_vm_0 with {memory size: 2048 MB, max memory: 4096 MB, memory guarantee: 1024 MB}.
5. Start VMs ['mom_vm_0'].
6. Allocate 70% of the free memory on the host: int(host_free_memory * 0.7).

Expected results:
"balloonInfo" from vdsm-client VM getStats vmID="6abe3f61-27bd-43d1-b724-8c1b8cdc53e8" must change so that balloon_max - 1024 > balloon_cur.

Actual results:
The balloon info in vdsm-client does not change as expected; it remains the same as at VM start.

"balloonInfo": {
    "balloon_max": "2097152",
    "balloon_cur": "2097152",
    "balloon_target": "2097152",
    "balloon_min": "1048576"
},

mom.log:
2018-11-12 18:38:33,008 - mom.Monitor - ERROR - Unexpected collection error
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/mom/Monitor.py", line 95, in collect
    collected = c.collect()
  File "/usr/lib/python2.7/site-packages/mom/Collectors/GuestBalloon.py", line 41, in collect
    stat = self.hypervisor_iface.getVmBalloonInfo(self.uuid)
  File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmRpcBase.py", line 80, in getVmBalloonInfo
    vm = self._getVmStats(uuid)
  File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmRpcBase.py", line 155, in _getVmStats
    raise HypervisorInterfaceError("VM %s does not exist" % vmId)
HypervisorInterfaceError: VM 66fb330a-0c4d-4a21-8a03-80773f7a218d does not exist
2018-11-12 18:38:33,009 - mom.Monitor - ERROR - Unexpected collection error
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/mom/Monitor.py", line 95, in collect
    collected = c.collect()
  File "/usr/lib/python2.7/site-packages/mom/Collectors/GuestCpuTune.py", line 44, in collect
    stat = self.hypervisor_iface.getVmCpuTuneInfo(self.uuid)
  File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmRpcBase.py", line 100, in getVmCpuTuneInfo
    vm = self._getVmStats(uuid)
  File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmRpcBase.py", line 155, in _getVmStats
    raise HypervisorInterfaceError("VM %s does not exist" % vmId)
HypervisorInterfaceError: VM 66fb330a-0c4d-4a21-8a03-80773f7a218d does not exist

Additional info:
logs attached
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
Can you please re-test now that libvirt/guest-agent bugs are resolved?
Re-tested with the following packages. The problem still happens.

ovirt-guest-agent-common-1.0.14-1.20181008062431.git30a9b91.el7.noarch
ovirt-release-master-4.3.0-0.1.master.20181101000103.git023a723.el7.noarch
libvirt-4.5.0-10.el7_6.3.x86_64
I see in the logs that the highest memUsed percentage was 26, which means that 74% was available, so no ballooning was needed. Please make sure you enable MOM debug logs and capture the total/free memory on the host at the time you think the balloon should inflate.
Created attachment 1515919 [details]
mom debug

Please see the attached mom.log with DEBUG info in debug.tar.gz.

Host before the memory allocation:
[root@lynx22 ~]# free
              total        used        free      shared  buff/cache   available
Mem:       32660836     7892352    24098308       27324      670176    24252948
Swap:             0           0           0

Host after the memory allocation:
[root@lynx22 ~]# free
              total        used        free      shared  buff/cache   available
Mem:       32660836    25075744     6912952       27356      672140     7069024
Swap:             0           0           0

"balloonInfo": {
    "balloon_max": "2097152",
    "balloon_cur": "2097152",
    "balloon_target": "2097152",
    "balloon_min": "1048576"
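For reference, the expected-result condition from the description (balloon_max - 1024 > balloon_cur) can be checked against the reported balloonInfo with a small sketch like the one below. The function name balloon_reduced and the margin parameter are made up for illustration; the values look like KiB (2097152 KiB = 2048 MiB, matching the VM memory size), but that unit is an inference from the numbers, not something the report states.

```python
# balloonInfo values as reported by vdsm-client in this bug (all strings).
# Assumption: the unit is KiB, since 2097152 KiB == 2048 MiB.
REPORTED = {
    "balloon_max": "2097152",
    "balloon_cur": "2097152",
    "balloon_target": "2097152",
    "balloon_min": "1048576",
}


def balloon_reduced(info, margin=1024):
    """Return True if balloon_cur is more than `margin` below balloon_max.

    Mirrors the expected-result condition from the description:
    balloon_max - 1024 > balloon_cur.
    """
    balloon_max = int(info["balloon_max"])
    balloon_cur = int(info["balloon_cur"])
    return balloon_max - margin > balloon_cur


if __name__ == "__main__":
    # With the reported values balloon_cur equals balloon_max,
    # so the condition does not hold.
    print(balloon_reduced(REPORTED))
```

With the values above the check fails, which is the behaviour this bug describes.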
I still see that the lowest available memory was 23%, and the threshold is 20%. Please repeat the test.
Created attachment 1518797 [details]
mom and engine logs

In the previous test the threshold was 40%, not the default 20%, so the inflation should have happened.

Anyway, I have now repeated the test in the latest master 4.3 environment, ovirt-engine-4.3.0-0.4.master.20181231193012.git1f27a84.el7.noarch (the same test passes in 4.2), with the default defvar pressure_threshold 0.20 and allocating about 85% of the host memory.

Before the memory allocation:
free
              total        used        free      shared  buff/cache   available
Mem:       32711616     8003820    23900772       18148      807024    24197248
free -h
Mem:            31G        7.6G         22G         17M        788M         23G

After the memory allocation:
free
Mem:       32711616    28621672     3280708       18180      809236     3578512
free -h
Mem:            31G         27G        3.1G         17M        790M        3.4G

The configuration:
# If the percentage of host free memory drops below this value
# then we will consider the host to be under memory pressure
(defvar pressure_threshold 0.20)

I am attaching the debug mom.log and engine.log again. Please look at the logs at about 2019-01-06 15:32:
2019-01-06 15:32:12,647+02 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-68) [] VM 'edb57570-0091-4222-9fed-5d02ed4f9247'(mom_vm_0) moved from 'WaitForLaunch' --> 'PoweringUp'
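To make the arithmetic explicit, here is a rough sketch that parses the Mem: row of the free output and compares the free fraction with pressure_threshold, following the wording of the policy comment. The helper names parse_free_mem and under_pressure are hypothetical, and using the "free" column as the host free memory is an assumption: MOM collects its own host statistics and may compute the value differently.

```python
def parse_free_mem(output):
    """Return the Mem: row of `free` output as a dict keyed by column name."""
    lines = [ln for ln in output.splitlines() if ln.strip()]
    columns = lines[0].split()
    for line in lines[1:]:
        label, *values = line.split()
        if label == "Mem:":
            return dict(zip(columns, (int(v) for v in values)))
    raise ValueError("no Mem: row found")


def under_pressure(mem, pressure_threshold):
    """True when the free fraction of host memory is below the threshold.

    Assumption: 'free' / 'total' stands in for the policy's notion of
    "percentage of host free memory".
    """
    return mem["free"] / mem["total"] < pressure_threshold


# `free` output after the memory allocation, copied from this comment.
AFTER_ALLOCATION = """\
              total        used        free      shared  buff/cache   available
Mem:       32711616    28621672     3280708       18180      809236     3578512
"""

if __name__ == "__main__":
    mem = parse_free_mem(AFTER_ALLOCATION)
    print("free fraction: %.3f" % (mem["free"] / mem["total"]))
    print("under pressure at 0.20:", under_pressure(mem, 0.20))
```

By this measure about 10% of host memory is free after the allocation, which is below the default threshold of 0.20. The earlier run (6912952 free of 32660836 total, threshold 0.40) is also below its threshold by the same calculation.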
Just wanted to note that there is a message "DEBUG - Field 'mem_free' not known." in mom.log. I don't know whether it is related to the problem.
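A quick way to see how often such messages occur is to count them per logger and level. The sketch below assumes the line layout seen in the mom.log excerpt from the description ("timestamp - logger - LEVEL - message"); the summarize function and its parameters are made up for illustration, and the sample consists only of lines quoted earlier in this bug.

```python
import re
from collections import Counter

# Line layout as seen in the mom.log excerpt in this report:
# "<date> <time>,<ms> - <logger> - <LEVEL> - <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - "
    r"(?P<logger>\S+) - (?P<level>[A-Z]+) - (?P<msg>.*)$"
)


def summarize(lines, levels=("ERROR",), needle="not known"):
    """Count log lines whose level is in `levels` or whose message contains `needle`.

    Lines that do not match the layout (for example traceback lines) are skipped.
    """
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m is None:
            continue
        if m["level"] in levels or needle in m["msg"]:
            counts[(m["logger"], m["level"], m["msg"])] += 1
    return counts


# Sample taken from the mom.log excerpt in the description.
SAMPLE = """\
2018-11-12 18:38:33,008 - mom.Monitor - ERROR - Unexpected collection error
Traceback (most recent call last):
HypervisorInterfaceError: VM 66fb330a-0c4d-4a21-8a03-80773f7a218d does not exist
2018-11-12 18:38:33,009 - mom.Monitor - ERROR - Unexpected collection error
"""

if __name__ == "__main__":
    for key, count in summarize(SAMPLE.splitlines()).items():
        print(count, *key)
```

To run it on a real log, pass an open file object instead of the sample, e.g. summarize(open(path)) with path pointing at the attached mom.log.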
Re-targeting, because these bugs either do not have blocker+, or do not have a patch posted
Reviewed at Exec Program call and agreed to keep as a blocker.
This may have the same cause as Bug 1676695. After I applied the patch that fixes it, ballooning is working as expected.
for retest
Verified on ovirt-engine-4.3.1.2-0.0.master.20190220155021.git90ab3d9.el7.noarch by running the automated MOM tests.
This bugzilla is included in oVirt 4.3.2 release, published on March 19th 2019. Since the problem described in this bug report should be resolved in oVirt 4.3.2 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.