Bug 1687832 - Memory ballooning VM stats remain unchanged upon host memory allocation when guest OS is RHEL8.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: 4.30.0
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ovirt-4.3.3
: ---
Assignee: Tomáš Golembiovský
QA Contact: Polina
URL:
Whiteboard:
Depends On:
Blocks: 1661283
 
Reported: 2019-03-12 13:12 UTC by Polina
Modified: 2019-04-16 13:58 UTC (History)

Fixed In Version: vdsm-4.30.12
Clone Of:
Environment:
Last Closed: 2019-04-16 13:58:17 UTC
oVirt Team: Virt
Embargoed:
pagranat: needinfo-
michal.skrivanek: ovirt-4.3?


Attachments
engine, vdsm, mom_debug logs (924.81 KB, application/gzip)
2019-03-12 13:12 UTC, Polina
mom, (2.27 MB, application/gzip)
2019-03-27 09:54 UTC, Polina


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 98583 0 'None' MERGED vmstats: compute memUsage from balloon stats 2021-01-10 20:44:58 UTC
oVirt gerrit 98584 0 'None' MERGED vmstats: collect buffers and caches 2021-01-10 20:44:23 UTC
oVirt gerrit 98753 0 'None' MERGED vmstats: compute memUsage from balloon stats 2021-01-10 20:44:23 UTC
oVirt gerrit 98754 0 'None' MERGED vmstats: collect buffers and caches 2021-01-10 20:44:23 UTC

Description Polina 2019-03-12 13:12:02 UTC
Created attachment 1543175 [details]
engine, vdsm, mom_debug logs

Description of problem:
The same test, which checks that the memory ballooning stats change while the host's memory is being allocated, passes when the guest OS is RHEL 7.6 and fails when the guest OS is RHEL 8.0.

Version-Release number of selected component (if applicable):
ovirt-engine-4.3.2-0.1.el7.noarch
vdsm-4.30.10-1.el7ev.x86_64
qemu-guest-agent-2.12.0-63.module+el8+2833+c7d6d092.x86_64

How reproducible: 100% when the VM is running with qemu-guest-agent-2.12.0-63.module+el8+2833+c7d6d092.x86_64

Steps to Reproduce:
1. On the host, change the MOM defvar pressure_threshold in the file /etc/vdsm/mom.d/02-balloon.policy to "0.40".
2. Enable ballooning for the host: (a) on the cluster, check Enable Memory Balloon Optimization (Edit Cluster/Optimization tab); (b) deactivate and reactivate the host (or use Sync Mom Policy in Cluster, Host tab).
3. Disable swapping on the host.
4. Update the existing VM mom_vm_0 with {memory size: 2048 MB, max memory: 4096 MB, memory guarantee: 1024 MB}.
5. Start the VM ['mom_vm_0'].
6. Allocate 70% of the host's free memory: int(host_free_memory * 0.7).
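
Step 6 can be sketched as below. This is a minimal illustration, not the actual test code: the helper names parse_free_kib and allocation_size are hypothetical, and the sample input is made-up `free` output (values in KiB), not data from this report.

```python
def parse_free_kib(free_output):
    """Return the 'free' column (KiB) from the Mem: line of `free` output."""
    for line in free_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Mem:":
            # Columns after the label: total, used, free, shared,
            # buff/cache, available
            return int(fields[3])
    raise ValueError("no 'Mem:' line found in free output")


def allocation_size(host_free_memory, fraction=0.7):
    """Amount of memory to allocate, as in int(host_free_memory * 0.7)."""
    return int(host_free_memory * fraction)


# Hypothetical sample, for illustration only.
sample = """\
              total        used        free      shared  buff/cache   available
Mem:       32000000     8000000    16000000      320000     7000000    22000000
"""
free_kib = parse_free_kib(sample)
print(allocation_size(free_kib))
```

The result is in the same unit as the input, so KiB in means KiB out.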

before allocation 
free -h
              total        used        free      shared  buff/cache   available
Mem:            31G        7.5G         16G        313M        7.0G         22G

"balloonInfo": {
            "balloon_max": "2097152",
            "balloon_cur": "2097152",
            "balloon_target": "2097152",
            "balloon_min": "1048576"
}

after allocation:
free -h
              total        used        free      shared  buff/cache   available
Mem:            31G         23G        436M        313M        7.1G        6.6G
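
The host figures above can be related to the pressure_threshold set in step 1. The sketch below assumes that the balloon policy treats the host as under memory pressure when the fraction of free host memory drops below pressure_threshold; that reading of the policy is an assumption and is not confirmed in this report. The function name is hypothetical, and the inputs are the rounded `free -h` values shown above.

```python
def host_under_pressure(free_gib, total_gib, pressure_threshold=0.40):
    """Assumed rule: pressure when free/total is below the threshold."""
    return (free_gib / total_gib) < pressure_threshold


# Rounded values taken from the `free -h` output in this report.
before = host_under_pressure(16.0, 31.0)        # 16G free of 31G
after = host_under_pressure(436 / 1024, 31.0)   # 436M free of 31G
print(before, after)
```

Under this assumption the host is not under pressure before the allocation and is under pressure after it, which is when the balloon would be expected to shrink.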


Actual results:
balloonInfo in vdsm-client never changes.

Please look at the attached logs - in engine.log the test scenario starts at 

2019-03-12 14:23:30,813+02 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-52) [] VM '22c61d4b-d813-40c4-bd34-4147fd13227a'(mom_vm_0) moved from 'PoweringUp' --> 'Up'

vmID="22c61d4b-d813-40c4-bd34-4147fd13227a"


Expected results:
The "balloonInfo" values returned by vdsm-client VM getStats vmID="6abe3f61-27bd-43d1-b724-8c1b8cdc53e8" must decrease.
When the same test runs with a RHEL 7.6 guest, balloonInfo changes, for example:

"balloonInfo": {
            "balloon_max": "2097152",
            "balloon_cur": "1798052",
            "balloon_target": "1708149",
            "balloon_min": "1048576"
}
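
The expectation can be expressed as a small check on two balloonInfo samples. This is a sketch only: balloon_shrunk is a hypothetical helper, and it assumes the values arrive as the string-encoded numbers shown in the vdsm-client output.

```python
def balloon_shrunk(before, after):
    """True if balloon_cur decreased but stayed at or above balloon_min."""
    cur_before = int(before["balloon_cur"])
    cur_after = int(after["balloon_cur"])
    minimum = int(after["balloon_min"])
    return minimum <= cur_after < cur_before


# Values copied from the balloonInfo samples in this report.
initial = {"balloon_max": "2097152", "balloon_cur": "2097152",
           "balloon_target": "2097152", "balloon_min": "1048576"}
rhel76 = {"balloon_max": "2097152", "balloon_cur": "1798052",
          "balloon_target": "1708149", "balloon_min": "1048576"}

print(balloon_shrunk(initial, rhel76))   # RHEL 7.6 guest: balloon shrank
print(balloon_shrunk(initial, initial))  # RHEL 8.0 guest: values unchanged
```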


Additional info: mom debug attached

Comment 1 Michal Skrivanek 2019-03-13 08:01:02 UTC
what did you plan to do with memUsage reported from ovirt-ga?

Comment 2 Michal Skrivanek 2019-03-13 12:34:42 UTC
needs a vdsm fix

Comment 3 Tomáš Golembiovský 2019-03-19 10:56:32 UTC
posted patches fixing this

Comment 4 Polina 2019-03-27 09:54:58 UTC
Created attachment 1548486 [details]
mom,

Hi,

I've verified it on ovirt-engine-4.3.2.2-0.0.master.20190325152515.gitd0a215d.el7.noarch and vdsm-4.30.11-23.git276b602.el7.x86_64.

All the previously failing tests pass now. However, there is a new failure in an edge case (this test passed before the latest fix).

The goal of the test is to check that there is no balloon deflation/inflation while the guest agent is down.

The scenario is exactly as in https://bugzilla.redhat.com/show_bug.cgi?id=1649328#c0, with the difference that after the VM is started (and before host memory allocation starts) I stop the qemu-guest-agent service and expect no changes in "balloonInfo" after the memory allocation.
What actually happens: the "balloonInfo" changes unexpectedly.

host before memory allocation:

free
              total        used        free      shared  buff/cache   available
Mem:      130421900     8954468   114973900      182880     6493532   120233888

free -h
              total        used        free      shared  buff/cache   available
Mem:           124G        8.5G        109G        178M        6.2G        114G

        "balloonInfo": {
            "balloon_max": "2097152",
            "balloon_cur": "2097152",
            "balloon_target": "2097152",
            "balloon_min": "1048576"
        }

host memory after allocation:
free
             total        used        free      shared  buff/cache   available
Mem:      130421900    93141372    30785508      182912     6495020    36047212

free -h
              total        used        free      shared  buff/cache   available
Mem:           124G         88G         29G        178M        6.2G         34G

       "balloonInfo": {
            "balloon_max": "2097152", 
            "balloon_cur": "1464528", 
            "balloon_target": "1391301", 
            "balloon_min": "1048576"
        }, 


I attach the logs (engine, vdsm, mom debug). The test starts at 2019-03-27 11:23:04,540.

Please let me know whether we can fix this case within the scope of this BZ, or whether we should close this BZ and file the issue as a separate minor bug.

Comment 5 Ryan Barry 2019-03-27 10:40:54 UTC
Let's file in a separate bug.

Tomas, does ballooning on EL8 depend on QGA, or does it work without it? If it does, this may not be a bug at all. Information I found was contradictory.

Comment 6 Tomáš Golembiovský 2019-03-27 10:58:06 UTC
No, the ballooning does not depend on QGA at all.

Comment 7 Ryan Barry 2019-03-27 11:06:20 UTC
Great. The issue in comment#4 is NOTABUG, then, Polina -- let's move this to VERIFIED

Comment 8 Polina 2019-03-28 09:36:31 UTC
Hi Ryan and Tomas, I'm not saying that ballooning depends on QGA.
I'm just saying that this test started to fail in the latest upstream engine (ovirt-engine-4.3.2.2-0.0.master.20190325152515.gitd0a215d.el7.noarch), which contains the fix for all the recent ballooning failures.
At the same time, the test passes in the downstream engine (ovirt-engine-4.3.2.1-0.1.el7.noarch). Both setups have the same qemu-guest-agent-2.12.0-2.el7.x86_64.

Comment 9 Tomáš Golembiovský 2019-03-28 11:40:13 UTC
I'm not sure this test is valid. Is it modeled after some former oVirt GA test-case?

The memory statistics are gathered directly from the balloon driver in the guest and not from QEMU-GA. The behavior should be the same whether QEMU-GA is running or not. But if I understand comment #4 correctly, you expect different behavior depending on whether QEMU-GA is running. Please correct me if I misunderstood.

Comment 10 Polina 2019-03-28 12:29:15 UTC
Yes, you understand me correctly. This is what the test does: it expects no change in balloonInfo (in vdsm-client VM getStats vmID="<ID>") when the qemu-guest-agent service is down.
As for the origin of the test: it is an old test, part of the regression set from 2017.

Comment 12 Tomáš Golembiovský 2019-03-28 13:38:27 UTC
So the test case is no longer valid and should be removed or changed. As said, we don't rely on the guest agent for this information. Instead, the information is taken directly from libvirt/qemu on the host (which in turn gets it from the balloon driver in the guest).

The only ways that I can think of that would prevent ballooning are:
  - disable ballooning in VM properties, so that VM has no balloon device at all
  - use guest without balloon driver, e.g. fresh Windows without guest tools installed
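
For illustration, a usage percentage could be derived from balloon driver statistics roughly as follows. This is a sketch under assumptions: the function name is hypothetical, the "available" and "unused" keys are modeled on the memory statistics libvirt reports for a domain, and the formula is not taken from the merged vdsm patches.

```python
def mem_usage_percent(balloon_stats):
    """Percent of guest memory in use, from 'available' and 'unused' (KiB).

    Returns None when no usable statistics were reported, for example
    for a guest without a balloon driver.
    """
    available = balloon_stats.get("available")
    unused = balloon_stats.get("unused")
    if not available or unused is None:
        return None
    return round(100.0 * (available - unused) / available)


# Hypothetical inputs, for illustration only.
print(mem_usage_percent({"available": 2000000, "unused": 500000}))
print(mem_usage_percent({}))
```

Because the inputs come from the host side, the result does not depend on whether a guest agent is running.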

Comment 13 Sandro Bonazzola 2019-04-16 13:58:17 UTC
This bugzilla is included in oVirt 4.3.3 release, published on April 16th 2019.

Since the problem described in this bug report should be
resolved in oVirt 4.3.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

