Bug 1687832

Summary: Memory ballooning VM stats remain unchanged after memory allocation on the host when the guest OS is RHEL 8.
Product: [oVirt] vdsm Reporter: Polina <pagranat>
Component: General Assignee: Tomáš Golembiovský <tgolembi>
Status: CLOSED CURRENTRELEASE QA Contact: Polina <pagranat>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.30.0 CC: bugs, mavital, michal.skrivanek, rbarry, tgolembi
Target Milestone: ovirt-4.3.3 Flags: pagranat: needinfo-, michal.skrivanek: ovirt-4.3?
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: vdsm-4.30.12 Doc Type: If docs needed, set a value
Doc Text:
Memory statistics for guests running only the QEMU Guest Agent were not reported properly, which caused ballooning to fail and left memory consumption unavailable in the UI. This is now fixed.
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-04-16 13:58:17 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Virt RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1661283    
Attachments:
Description                     Flags
engine, vdsm, mom_debug logs    none
mom,                            none

Description Polina 2019-03-12 13:12:02 UTC
Created attachment 1543175 [details]
engine, vdsm, mom_debug logs

Description of problem:
The same test, which checks that memory ballooning stats change while host memory is being allocated, passes when the guest OS is RHEL 7.6 and fails when the guest OS is RHEL 8.0.

Version-Release number of selected component (if applicable):
ovirt-engine-4.3.2-0.1.el7.noarch
vdsm-4.30.10-1.el7ev.x86_64
qemu-guest-agent-2.12.0-63.module+el8+2833+c7d6d092.x86_64

How reproducible: 100% when the VM is running with qemu-guest-agent-2.12.0-63.module+el8+2833+c7d6d092.x86_64

Steps to Reproduce:
1. On the host, change the MOM defvar pressure_threshold in the file /etc/vdsm/mom.d/02-balloon.policy to "0.40".
2. Enable ballooning for the host: (a) on the cluster, check Enable Memory Balloon Optimization (Edit Cluster/Optimization tab); (b) deactivate and reactivate the host (or use Sync Mom Policy in the Cluster, Host tab).
3. Disable swapping on the host.
4. Update the existing VM mom_vm_0 with {memory size: 2048 MB, max memory: 4096 MB, memory guarantee: 1024 MB}.
5. Start the VM ['mom_vm_0'].
6. Allocate 70% of the host's free memory: int(host_free_memory * 0.7).

before allocation 
free -h
              total        used        free      shared  buff/cache   available
Mem:            31G        7.5G         16G        313M        7.0G         22G

"balloonInfo": {
            "balloon_max": "2097152", 
            "balloon_cur": "2097152", 
            "balloon_target": "2097152", 
            "balloon_min": "1048576"

after allocation:
free -h
              total        used        free      shared  buff/cache   available
Mem:            31G         23G        436M        313M        7.1G        6.6G
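
The allocation size in step 6 can be sketched roughly as follows. This is only an illustration, not the test suite's actual code: the function name allocation_size_kib is hypothetical, the input is assumed to be plain `free` output in KiB (the sample line is the "before allocation" output quoted in comment 4 of this report), and integer arithmetic replaces int(host_free_memory * 0.7) to avoid float rounding.

```python
def allocation_size_kib(free_output: str, tenths: int = 7) -> int:
    """Return tenths/10 of the 'free' column of `free` output, in KiB.

    Hypothetical helper: assumes the default `free` layout, where the
    columns after "Mem:" are total, used, free, shared, buff/cache,
    available.
    """
    for line in free_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Mem:":
            free_kib = int(fields[3])  # third numeric column is "free"
            return free_kib * tenths // 10
    raise ValueError("no 'Mem:' line found in free output")


# Sample taken from the host output quoted in comment 4 of this report.
SAMPLE = """\
              total        used        free      shared  buff/cache   available
Mem:      130421900     8954468   114973900      182880     6493532   120233888
"""

print(allocation_size_kib(SAMPLE))
```

How the memory is then actually allocated on the host is not described in this report, so it is left out of the sketch.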


Actual results:
balloonInfo in vdsm-client never changes.

Please look at the attached logs - in engine.log the test scenario starts at 

2019-03-12 14:23:30,813+02 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-52) [] VM '22c61d4b-d813-40c4-bd34-4147fd13227a'(mom_vm_0) moved from 'PoweringUp' --> 'Up'

vmID="22c61d4b-d813-40c4-bd34-4147fd13227a"


Expected results:
The "balloonInfo" values returned by vdsm-client VM getStats vmID="6abe3f61-27bd-43d1-b724-8c1b8cdc53e8" must decrease.
When the same test runs with guest OS RHEL 7.6, balloonInfo changes, for example:

"balloonInfo": {
            "balloon_max": "2097152", 
            "balloon_cur": "1798052", 
            "balloon_target": "1708149", 
            "balloon_min": "1048576"


Additional info: mom debug attached
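
The pass/fail condition described above (balloon_cur must drop after the host memory allocation) can be sketched as a comparison of two balloonInfo snapshots. This is a minimal sketch under assumptions: the helper name balloon_inflated is hypothetical, the dictionaries simply mirror the balloonInfo fragments quoted in this report, and fetching them from vdsm-client is not shown.

```python
def balloon_inflated(before: dict, after: dict) -> bool:
    """Return True if balloon_cur decreased between two snapshots.

    Hypothetical helper; values are strings, as in the balloonInfo
    fragments quoted in this report.
    """
    cur_before = int(before["balloon_cur"])
    cur_after = int(after["balloon_cur"])
    low = int(after["balloon_min"])
    high = int(after["balloon_max"])
    # A current size outside [balloon_min, balloon_max] is treated as invalid.
    if not low <= cur_after <= high:
        raise ValueError("balloon_cur outside [balloon_min, balloon_max]")
    return cur_after < cur_before


BEFORE = {
    "balloon_max": "2097152",
    "balloon_cur": "2097152",
    "balloon_target": "2097152",
    "balloon_min": "1048576",
}
# Values reported for the RHEL 7.6 guest after host memory allocation.
AFTER_RHEL76 = {
    "balloon_max": "2097152",
    "balloon_cur": "1798052",
    "balloon_target": "1708149",
    "balloon_min": "1048576",
}

print(balloon_inflated(BEFORE, AFTER_RHEL76))  # RHEL 7.6: expected behaviour
print(balloon_inflated(BEFORE, BEFORE))        # RHEL 8.0: values never change
```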

Comment 1 Michal Skrivanek 2019-03-13 08:01:02 UTC
what did you plan to do with memUsage reported from ovirt-ga?

Comment 2 Michal Skrivanek 2019-03-13 12:34:42 UTC
needs a vdsm fix

Comment 3 Tomáš Golembiovský 2019-03-19 10:56:32 UTC
posted patches fixing this

Comment 4 Polina 2019-03-27 09:54:58 UTC
Created attachment 1548486 [details]
mom,

Hi , 

I've verified it on ovirt-engine-4.3.2.2-0.0.master.20190325152515.gitd0a215d.el7.noarch and vdsm-4.30.11-23.git276b602.el7.x86_64.

All the previously failing tests now pass. However, there is a new failure in an edge case (this test passed before the latest fix).

The goal of the test is to check that there is no balloon deflation/inflation while the guest agent is down.

The scenario is exactly as in https://bugzilla.redhat.com/show_bug.cgi?id=1649328#c0, except that after the VM is started (and before host memory allocation begins) I stop the qemu-guest-agent service and expect no changes in "balloonInfo" after the memory allocation.
What actually happens: "balloonInfo" changes unexpectedly.

host before memory allocation:

free
              total        used        free      shared  buff/cache   available
Mem:      130421900     8954468   114973900      182880     6493532   120233888

free -h
              total        used        free      shared  buff/cache   available
Mem:           124G        8.5G        109G        178M        6.2G        114G

        "balloonInfo": {
            "balloon_max": "2097152", 
            "balloon_cur": "2097152", 
            "balloon_target": "2097152", 
            "balloon_min": "1048576"

host memory after allocation:
free
             total        used        free      shared  buff/cache   available
Mem:      130421900    93141372    30785508      182912     6495020    36047212

free -h
              total        used        free      shared  buff/cache   available
Mem:           124G         88G         29G        178M        6.2G         34

       "balloonInfo": {
            "balloon_max": "2097152", 
            "balloon_cur": "1464528", 
            "balloon_target": "1391301", 
            "balloon_min": "1048576"
        }, 


I attach the logs (engine, vdsm, mom debug). The test starts at 019-03-27 11:23:04,540 

Please let me know whether we can fix this case within the scope of this BZ, or whether we should close this BZ and file the issue as a separate minor bug.

Comment 5 Ryan Barry 2019-03-27 10:40:54 UTC
Let's file in a separate bug.

Tomas, does ballooning on EL8 depend on QGA, or does it work without it? If it does, this may not be a bug at all. Information I found was contradictory.

Comment 6 Tomáš Golembiovský 2019-03-27 10:58:06 UTC
No, the ballooning does not depend on QGA at all.

Comment 7 Ryan Barry 2019-03-27 11:06:20 UTC
Great. The issue in comment#4 is NOTABUG, then, Polina -- let's move this to VERIFIED

Comment 8 Polina 2019-03-28 09:36:31 UTC
Hi Ryan and Tomas, I am not saying that ballooning depends on QGA.
I am only saying that this test started to fail on the latest upstream engine (ovirt-engine-4.3.2.2-0.0.master.20190325152515.gitd0a215d.el7.noarch), which contains the fix for all the recent ballooning failures.
At the same time, the test passes on the downstream engine (ovirt-engine-4.3.2.1-0.1.el7.noarch). Both setups have the same qemu-guest-agent-2.12.0-2.el7.x86_64.

Comment 9 Tomáš Golembiovský 2019-03-28 11:40:13 UTC
I'm not sure this test is valid. Is it modeled after some former oVirt GA test-case?

The memory statistics are gathered directly from the balloon driver in the guest and not from QEMU-GA. The behavior should be the same whether QEMU-GA is running or not. But if I understand comment #4 correctly, you expect different behavior depending on whether QEMU-GA is running. Please correct me if I misunderstood.

Comment 10 Polina 2019-03-28 12:29:15 UTC
Yes, you understand me correctly. This is what the test does: it expects no change in balloonInfo (in vdsm-client VM getStats vmID="<ID>") when the qemu-guest-agent service is down.
As for where the test comes from: it is an old test, part of the regression set from 2017.

Comment 12 Tomáš Golembiovský 2019-03-28 13:38:27 UTC
So the test case is no longer valid and should be removed or changed. As said, we don't rely on the guest agent for this information. Instead, the information is taken directly from libvirt/qemu on the host (which in turn gets it from the balloon driver in the guest).

The only ways that I can think of that would prevent ballooning are:
  - disable ballooning in VM properties, so that VM has no balloon device at all
  - use guest without balloon driver, e.g. fresh Windows without guest tools installed
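
For illustration only, the host-side path described above (statistics read from libvirt rather than from a guest agent) might be sketched with the libvirt Python bindings as below. This is not vdsm's implementation. The helper names are hypothetical, and it is an assumption that the "unused" and "available" keys appear in the result only when the guest's balloon driver reports statistics, that a read-only connection to qemu:///system is sufficient, and that a stats collection period is already configured for the VM.

```python
# Keys assumed to be present only when the guest balloon driver reports stats.
BALLOON_DRIVER_KEYS = ("unused", "available")


def guest_reports_balloon_stats(stats: dict) -> bool:
    """Return True if a memoryStats()-style dict carries guest-side values."""
    return any(key in stats for key in BALLOON_DRIVER_KEYS)


def read_memory_stats(vm_uuid: str, uri: str = "qemu:///system") -> dict:
    """Fetch memory statistics for a VM from libvirt on the host.

    Requires the libvirt-python package; it is imported lazily so that
    the pure helper above stays usable without it.
    """
    import libvirt

    conn = libvirt.openReadOnly(uri)
    try:
        dom = conn.lookupByUUIDString(vm_uuid)
        # Values are in KiB; no guest agent is involved in this call.
        return dom.memoryStats()
    finally:
        conn.close()


# Illustrative dictionaries only; these numbers are not from the attached logs.
with_driver = {"actual": 2097152, "rss": 900000, "unused": 1200000, "available": 1900000}
without_driver = {"actual": 2097152, "rss": 900000}

print(guest_reports_balloon_stats(with_driver))
print(guest_reports_balloon_stats(without_driver))
```

On a real host one would call read_memory_stats() with the VM's UUID and pass the result to guest_reports_balloon_stats(); the second case corresponds to a guest without a balloon driver, as in the list above.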

Comment 13 Sandro Bonazzola 2019-04-16 13:58:17 UTC
This bugzilla is included in oVirt 4.3.3 release, published on April 16th 2019.

Since the problem described in this bug report should be
resolved in oVirt 4.3.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.