Created attachment 1543175 [details]
engine, vdsm, mom_debug logs

Description of problem:
The same test for memory ballooning (balloon stats changing while allocating the host's memory) passes when the guest OS is RHEL 7.6 and fails when the guest OS is RHEL 8.0.

Version-Release number of selected component (if applicable):
ovirt-engine-4.3.2-0.1.el7.noarch
vdsm-4.30.10-1.el7ev.x86_64
qemu-guest-agent-2.12.0-63.module+el8+2833+c7d6d092.x86_64

How reproducible:
100% when the VM is running with qemu-guest-agent-2.12.0-63.module+el8+2833+c7d6d092.x86_64

Steps to Reproduce:
1. On the host, change the MOM defvar pressure_threshold in /etc/vdsm/mom.d/02-balloon.policy to "0.40".
2. Enable ballooning for the host:
   1. On the cluster, check Enable Memory Balloon Optimization (Edit Cluster / Optimization tab).
   2. Deactivate/activate the host (or use Sync MoM Policy in the Cluster > Hosts tab).
3. Disable swapping on the host.
4. Update the existing VM mom_vm_0 with {memory size: 2048 MB, max memory: 4096 MB, memory guarantee: 1024 MB}.
5. Start VM 'mom_vm_0'.
6. Allocate 70% of the host's free memory, int(host_free_memory * 0.7).

Before allocation:
free -h
              total        used        free      shared  buff/cache   available
Mem:            31G        7.5G         16G        313M        7.0G         22G

"balloonInfo": {
    "balloon_max": "2097152",
    "balloon_cur": "2097152",
    "balloon_target": "2097152",
    "balloon_min": "1048576"

After allocation:
free -h
              total        used        free      shared  buff/cache   available
Mem:            31G         23G        436M        313M        7.1G        6.6G

Actual results:
balloonInfo in vdsm-client never changes. Please see the attached logs - in engine.log the test scenario starts at
2019-03-12 14:23:30,813+02 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-52) [] VM '22c61d4b-d813-40c4-bd34-4147fd13227a'(mom_vm_0) moved from 'PoweringUp' --> 'Up'
vmID="22c61d4b-d813-40c4-bd34-4147fd13227a"

Expected results:
"balloonInfo" obtained from vdsm-client VM getStats vmID="6abe3f61-27bd-43d1-b724-8c1b8cdc53e8" must decrease. When the same test runs with guest OS RHEL 7.6, balloonInfo changes, e.g.:
"balloonInfo": {
    "balloon_max": "2097152",
    "balloon_cur": "1798052",
    "balloon_target": "1708149",
    "balloon_min": "1048576"

Additional info:
mom debug log attached
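For reference, a minimal Python sketch of step 6 plus the balloonInfo check, assuming it runs as root on the host; the chunked allocation and the helper names are illustrative, not the exact tooling used by the automated test:

#!/usr/bin/env python3
# Rough sketch of step 6 plus the balloonInfo check, assuming it runs on
# the host as root. The allocation approach and helper names are
# illustrative only, not the exact test implementation.
import json
import subprocess

VM_ID = "22c61d4b-d813-40c4-bd34-4147fd13227a"  # mom_vm_0, from engine.log above


def host_free_bytes():
    """Read MemFree from /proc/meminfo (reported in kB)."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemFree:"):
                return int(line.split()[1]) * 1024
    raise RuntimeError("MemFree not found in /proc/meminfo")


def balloon_info(vm_id):
    """Return the balloonInfo dict from 'vdsm-client VM getStats'."""
    out = subprocess.check_output(
        ["vdsm-client", "VM", "getStats", "vmID=%s" % vm_id])
    stats = json.loads(out)
    if isinstance(stats, list):      # getStats wraps the stats dict in a list
        stats = stats[0]
    return stats.get("balloonInfo", {})


print("before allocation:", balloon_info(VM_ID))

# Step 6: allocate ~70% of the host's currently free memory and keep it
# referenced. Multiplying a one-byte pattern writes every page, so the
# memory is actually resident rather than merely reserved.
chunk = 256 * 1024 * 1024
ballast = [b"\x01" * chunk
           for _ in range(int(host_free_bytes() * 0.7) // chunk)]

input("memory allocated, press Enter after MOM has run a few cycles... ")
print("after allocation:", balloon_info(VM_ID))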
what did you plan to do with memUsage reported from ovirt-ga?
needs a vdsm fix
posted patches fixing this
Created attachment 1548486 [details]
mom,

Hi,
I've verified it on ovirt-engine-4.3.2.2-0.0.master.20190325152515.gitd0a215d.el7.noarch and vdsm-4.30.11-23.git276b602.el7.x86_64. All the previously failed tests pass now.

However, there is a new failure in an edge case (this test passed before the latest fix). The test goal is to check that there is no balloon deflation/inflation while the guest agent is down. The scenario is exactly as in https://bugzilla.redhat.com/show_bug.cgi?id=1649328#c0, with the difference that after the VM is started (before starting the host memory allocation) I stop the qemu-guest-agent service and expect no changes in "balloonInfo" after the allocation. What actually happens is that "balloonInfo" unexpectedly changes.

Host before memory allocation:
free
              total        used        free      shared  buff/cache   available
Mem:      130421900     8954468   114973900      182880     6493532   120233888
free -h
              total        used        free      shared  buff/cache   available
Mem:           124G        8.5G        109G        178M        6.2G        114G

"balloonInfo": {
    "balloon_max": "2097152",
    "balloon_cur": "2097152",
    "balloon_target": "2097152",
    "balloon_min": "1048576"

Host after memory allocation:
free
              total        used        free      shared  buff/cache   available
Mem:      130421900    93141372    30785508      182912     6495020    36047212
free -h
              total        used        free      shared  buff/cache   available
Mem:           124G         88G         29G        178M        6.2G         34G

"balloonInfo": {
    "balloon_max": "2097152",
    "balloon_cur": "1464528",
    "balloon_target": "1391301",
    "balloon_min": "1048576"
},

I attach the logs (engine, vdsm, mom debug). The test starts at 2019-03-27 11:23:04,540.
Please let me know whether we can fix this case in the scope of this BZ, or whether we should close this BZ and file the issue as a separate minor bug.
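For completeness, a hedged sketch of the edge-case scenario above, assuming SSH access to the guest; the guest address is a placeholder and the final assertion encodes the test's original expectation, not the actual product behavior:

#!/usr/bin/env python3
# Sketch of the edge case: stop the guest agent inside the VM, put the
# host under memory pressure, and compare balloonInfo before/after.
# GUEST is a placeholder; VM_ID must be the vmID of mom_vm_0 on this run.
import json
import subprocess
import time

VM_ID = "VM_UUID_FROM_ENGINE_LOG"   # placeholder
GUEST = "root@GUEST_ADDRESS"        # placeholder


def balloon_info(vm_id):
    out = subprocess.check_output(
        ["vdsm-client", "VM", "getStats", "vmID=%s" % vm_id])
    stats = json.loads(out)
    return (stats[0] if isinstance(stats, list) else stats).get("balloonInfo", {})


# Stop the guest agent before allocating host memory.
subprocess.check_call(["ssh", GUEST, "systemctl", "stop", "qemu-guest-agent"])

before = balloon_info(VM_ID)
# ... allocate ~70% of the host's free memory here, as in the original steps ...
time.sleep(120)                     # give MOM a few policy cycles
after = balloon_info(VM_ID)

# The test's original expectation; per the follow-up comments ballooning does
# not depend on QGA, so balloon_cur/balloon_target are in fact expected to move.
assert before == after, "balloonInfo changed while qemu-guest-agent was down"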
Let's file it as a separate bug. Tomas, does ballooning on EL8 depend on QGA, or does it work without it? If it works without QGA, this may not be a bug at all. The information I found was contradictory.
No, the ballooning does not depend on QGA at all.
Great. The issue in comment#4 is NOTABUG, then, Polina -- let's move this to VERIFIED
Hi Ryan and Tomas, I'm not saying that ballooning depends on QGA. I'm just saying that with the latest upstream engine (ovirt-engine-4.3.2.2-0.0.master.20190325152515.gitd0a215d.el7.noarch), which contains the fix for all the recent ballooning failures, this test started to fail, while with the downstream engine (ovirt-engine-4.3.2.1-0.1.el7.noarch) the same test passes successfully. Both setups have the same qemu-guest-agent-2.12.0-2.el7.x86_64.
I'm not sure this test is valid. Is it modeled after some former oVirt GA test case? The memory statistics are gathered directly from the balloon driver in the guest, not from QEMU-GA, so the behavior should be the same whether QEMU-GA is running or not. But if I understand comment #4 correctly, you expect different behavior when QEMU-GA is (is not) running. Please correct me if I misunderstood.
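For illustration, one way to confirm on the host that these numbers come from the balloon driver via libvirt rather than from QEMU-GA is to query libvirt directly; a minimal sketch, assuming a read-only virsh connection and that the libvirt domain name matches the oVirt VM name:

#!/usr/bin/env python3
# Minimal sketch: read the balloon statistics straight from libvirt on the
# host. These values originate in the virtio balloon driver in the guest,
# so they are reported whether or not qemu-guest-agent is running.
# Assumes read-only virsh access and that the domain name equals the VM name.
import subprocess

DOMAIN = "mom_vm_0"

out = subprocess.check_output(["virsh", "-r", "dommemstat", DOMAIN], text=True)
print(out)   # e.g. 'actual', 'unused', 'available', 'rss' lines in KiB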
Yes, you understand me correctly. This is what the test does - it expects no change in balloonInfo (in vdsm-client VM getStats vmID="<ID>") when the qemu-guest-agent service is down. As for when this test was modeled - it is an old test, part of the regression set from 2017.
So the test case is no longer valid and should be removed or changed. As said, we don't rely on the guest agent for this information. Instead it is taken directly from libvirt/qemu on the host (which in turn gets it from the balloon driver in the guest). The only ways I can think of that would prevent ballooning are:
- disable ballooning in the VM properties, so that the VM has no balloon device at all
- use a guest without a balloon driver, e.g. a fresh Windows install without guest tools
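A quick way to check the first condition on the host - whether the VM actually has a usable balloon device - is to look at the domain XML; a small sketch, assuming read-only virsh access (when ballooning is disabled in the VM properties, the domain XML typically carries a memballoon device with model="none"):

#!/usr/bin/env python3
# Small sketch: check whether the running domain has a usable balloon
# device at all. Assumes read-only virsh access on the host.
import subprocess
import xml.etree.ElementTree as ET

DOMAIN = "mom_vm_0"

xml = subprocess.check_output(["virsh", "-r", "dumpxml", DOMAIN], text=True)
balloon = ET.fromstring(xml).find("./devices/memballoon")

if balloon is None or balloon.get("model") == "none":
    print("no usable balloon device - MOM cannot inflate/deflate this VM")
else:
    print("balloon device present, model=%s" % balloon.get("model"))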
This bugzilla is included in oVirt 4.3.3 release, published on April 16th 2019. Since the problem described in this bug report should be resolved in oVirt 4.3.3 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.