Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1061875

Summary: MOM not working properly with multiple VMs
Product: Red Hat Enterprise Virtualization Manager
Reporter: Lukas Svaty <lsvaty>
Component: mom
Assignee: Martin Sivák <msivak>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Lukas Svaty <lsvaty>
Severity: urgent
Priority: urgent
Docs Contact: Cheryn Tan <chetan>
Version: 3.3.0
CC: dfediuck, gklein, iheim, mavital, rlandman, sherold, yeylon
Keywords: Triaged
Target Release: 3.3.3
Hardware: Unspecified
OS: Linux
Whiteboard: sla
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-02-13 08:40:50 UTC
oVirt Team: SLA
Bug Blocks: 1078909, 1142926
Attachments:
MOM log of started VMs (flags: none)
VDSM log of actions (flags: none)

Description Lukas Svaty 2014-02-05 19:42:45 UTC
Created attachment 859827
MOM log of started VMs

Description of problem:
MOM does not collect statistics from all VMs when a larger number of VMs is running on the system. In my tests, the statistics show only the first 13 running VMs.

Version-Release number of selected component (if applicable):
is32
mom-0.3.2-8.el6ev.noarch
vdsm-4.13.2-0.7.el6ev.x86_64
libvirt-0.10.2-29.el6_5.2.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Enable balloon optimization on the cluster.
2. Create 16 or more VMs.
3. Get statistics from mom.getStatistics() over XMLRPC.
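Step 3 can be scripted. Below is a minimal Python 3 sketch, assuming MOM's XML-RPC server is enabled and reachable at http://localhost:8080/ (the host and port are assumptions; check which RPC port your MOM configuration enables). It calls getStatistics() and counts how many guests appear under the 'guests' key of the result. The helper names fetch_mom_statistics and guest_count are hypothetical.

```python
import xmlrpc.client

# Assumed endpoint (hypothetical): adjust host/port to your MOM RPC setup.
MOM_URL = "http://localhost:8080/"


def fetch_mom_statistics(url=MOM_URL):
    """Call MOM's getStatistics() over XML-RPC and return the result dict."""
    proxy = xmlrpc.client.ServerProxy(url)
    return proxy.getStatistics()


def guest_count(stats):
    """Return the number of guests MOM currently reports statistics for."""
    return len(stats.get("guests", {}))
```

For example, guest_count(fetch_mom_statistics()) can be compared with the number of running VMs (for instance, from `virsh list`) to see whether MOM is skipping any.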


Actual results:
MOM stops collecting statistics for VMs started after the 13th VM.

Expected results:
MOM should take the statistics of all VMs on the system into account. Otherwise, when the system is overloaded, the balloons of some VMs will not be inflated or deflated, or KSM will not work properly across all VMs.

Additional info:
XMLRPC output from MOM:
{'guests': {'balloon-1': {'balloon_cur': 524288,
                          'balloon_max': 524288,
                          'balloon_min': 262144,
                          'host_major_faults': 6,
                          'host_minor_faults': 2,
                          'major_fault': 0,
                          'mem_available': 502256,
                          'mem_free': 402664,
                          'mem_unused': 334888,
                          'minor_fault': 124,
                          'rss': 76286,
                          'swap_in': 0,
                          'swap_out': 0,
                          'swap_total': 2097144,
                          'swap_usage': 0},
            'balloon-10': {'balloon_cur': 524288,
                           'balloon_max': 524288,
                           'balloon_min': 262144,
                           'host_major_faults': 0,
                           'host_minor_faults': 3,
                           'major_fault': 0,
                           'mem_available': 502256,
                           'mem_free': 403436,
                           'mem_unused': 338616,
                           'minor_fault': 131,
                           'rss': 73302,
                           'swap_in': 0,
                           'swap_out': 0,
                           'swap_total': 2097144,
                           'swap_usage': 0},
            'balloon-11': {'balloon_cur': 524288,
                           'balloon_max': 524288,
                           'balloon_min': 262144,
                           'host_major_faults': 0,
                           'host_minor_faults': 0,
                           'major_fault': 0,
                           'mem_available': 502256,
                           'mem_free': 403428,
                           'mem_unused': 338608,
                           'minor_fault': 131,
                           'rss': 73303,
                           'swap_in': 0,
                           'swap_out': 0,
                           'swap_total': 2097144,
                           'swap_usage': 0},
            'balloon-12': {'balloon_cur': 524288,
                           'balloon_max': 524288,
                           'balloon_min': 262144,
                           'host_major_faults': 0,
                           'host_minor_faults': 9,
                           'major_fault': 0,
                           'mem_available': 502256,
                           'mem_free': 401440,
                           'mem_unused': 336624,
                           'minor_fault': 13,
                           'rss': 73842,
                           'swap_in': 0,
                           'swap_out': 0,
                           'swap_total': 2097144,
                           'swap_usage': 0},
            'balloon-13': {'balloon_cur': 524288,
                           'balloon_max': 524288,
                           'balloon_min': 262144,
                           'host_major_faults': 0,
                           'host_minor_faults': 4,
                           'major_fault': 0,
                           'mem_available': 502256,
                           'mem_free': 403384,
                           'mem_unused': 338680,
                           'minor_fault': 13,
                           'rss': 73456,
                           'swap_in': 0,
                           'swap_out': 0,
                           'swap_total': 2097144,
                           'swap_usage': 0},
            'balloon-2': {'balloon_cur': 524288,
                          'balloon_max': 524288,
                          'balloon_min': 262144,
                          'host_major_faults': 3,
                          'host_minor_faults': 21,
                          'major_fault': 0,
                          'mem_available': 502256,
                          'mem_free': 402792,
                          'mem_unused': 336076,
                          'minor_fault': 137,
                          'rss': 75057,
                          'swap_in': 0,
                          'swap_out': 0,
                          'swap_total': 2097144,
                          'swap_usage': 0},
            'balloon-3': {'balloon_cur': 524288,
                          'balloon_max': 524288,
                          'balloon_min': 262144,
                          'host_major_faults': 0,
                          'host_minor_faults': 0,
                          'major_fault': 0,
                          'mem_available': 502256,
                          'mem_free': 398856,
                          'mem_unused': 332044,
                          'minor_fault': 137,
                          'rss': 76621,
                          'swap_in': 0,
                          'swap_out': 0,
                          'swap_total': 2097144,
                          'swap_usage': 0},
            'balloon-4': {'balloon_cur': 524288,
                          'balloon_max': 524288,
                          'balloon_min': 262144,
                          'host_major_faults': 0,
                          'host_minor_faults': 0,
                          'major_fault': 0,
                          'mem_available': 502256,
                          'mem_free': 400344,
                          'mem_unused': 333524,
                          'minor_fault': 137,
                          'rss': 75570,
                          'swap_in': 0,
                          'swap_out': 0,
                          'swap_total': 2097144,
                          'swap_usage': 0},
            'balloon-5': {'balloon_cur': 524288,
                          'balloon_max': 524288,
                          'balloon_min': 262144,
                          'host_major_faults': 0,
                          'host_minor_faults': 31,
                          'major_fault': 0,
                          'mem_available': 502256,
                          'mem_free': 400884,
                          'mem_unused': 334076,
                          'minor_fault': 138,
                          'rss': 74121,
                          'swap_in': 0,
                          'swap_out': 0,
                          'swap_total': 2097144,
                          'swap_usage': 0},
            'balloon-6': {'balloon_cur': 524288,
                          'balloon_max': 524288,
                          'balloon_min': 262144,
                          'host_major_faults': 1,
                          'host_minor_faults': 9,
                          'major_fault': 0,
                          'mem_available': 502256,
                          'mem_free': 400732,
                          'mem_unused': 334020,
                          'minor_fault': 137,
                          'rss': 76436,
                          'swap_in': 0,
                          'swap_out': 0,
                          'swap_total': 2097144,
                          'swap_usage': 0},
            'balloon-7': {'balloon_cur': 524288,
                          'balloon_max': 524288,
                          'balloon_min': 262144,
                          'host_major_faults': 0,
                          'host_minor_faults': 6,
                          'major_fault': 0,
                          'mem_available': 502256,
                          'mem_free': 402732,
                          'mem_unused': 335936,
                          'minor_fault': 14,
                          'rss': 74988,
                          'swap_in': 0,
                          'swap_out': 0,
                          'swap_total': 2097144,
                          'swap_usage': 0},
            'balloon-8': {'balloon_cur': 524288,
                          'balloon_max': 524288,
                          'balloon_min': 262144,
                          'host_major_faults': 0,
                          'host_minor_faults': 2,
                          'major_fault': 0,
                          'mem_available': 502256,
                          'mem_free': 402508,
                          'mem_unused': 335696,
                          'minor_fault': 13,
                          'rss': 76063,
                          'swap_in': 0,
                          'swap_out': 0,
                          'swap_total': 2097144,
                          'swap_usage': 0},
            'balloon-9': {'balloon_cur': 524288,
                          'balloon_max': 524288,
                          'balloon_min': 262144,
                          'host_major_faults': 1,
                          'host_minor_faults': 13,
                          'major_fault': 0,
                          'mem_available': 502256,
                          'mem_free': 402652,
                          'mem_unused': 335936,
                          'minor_fault': 2525,
                          'rss': 75218,
                          'swap_in': 0,
                          'swap_out': 0,
                          'swap_total': 2097144,
                          'swap_usage': 0}},
 'host': {'anon_pages': 7429488,
          'ksm_full_scans': 2029,
          'ksm_pages_shared': 44387,
          'ksm_pages_sharing': 564636,
          'ksm_pages_to_scan': 200,
          'ksm_pages_unshared': 218736,
          'ksm_pages_volatile': 56585,
          'ksm_run': 1,
          'ksm_shareable': 33204704,
          'ksm_sleep_millisecs': 20,
          'ksmd_cpu_usage': 3,
          'mem_available': 8030296,
          'mem_free': 259812,
          'mem_unused': 147340,
          'swap_in': 87,
          'swap_out': 0,
          'swap_total': 2097144,
          'swap_usage': 63792}}
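The output above lists guests in lexicographic key order (balloon-10 through balloon-13 sort before balloon-2), which makes it easy to overlook which VMs are absent. A minimal Python sketch, assuming the VMs follow the balloon-N naming seen in the output: it sorts names numerically and returns the expected VMs that have no entry under 'guests'. The expected list of 16 names is a hypothetical input; build it from your own VM names.

```python
import re


def natural_key(name):
    """Sort key that orders 'balloon-2' before 'balloon-10'."""
    # re.split with a capturing group keeps the digit runs, e.g.
    # 'balloon-10' -> ['balloon-', '10', '']; digit runs become ints.
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", name)]


def missing_guests(stats, expected_names):
    """Return the expected VM names that have no entry under stats['guests']."""
    reported = set(stats.get("guests", {}))
    return sorted((name for name in expected_names if name not in reported),
                  key=natural_key)


# Hypothetical expectation: 16 VMs named balloon-1 .. balloon-16.
EXPECTED_VMS = ["balloon-%d" % i for i in range(1, 17)]
```

If exactly those 16 VMs were running, applying missing_guests to the output above would return ['balloon-14', 'balloon-15', 'balloon-16'].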

Comment 1 Lukas Svaty 2014-02-05 19:43:32 UTC
Created attachment 859828
VDSM log of actions

Comment 2 Lukas Svaty 2014-02-06 11:15:59 UTC
I tried this with 16 smaller VMs (256 MB memory / 128 MB guaranteed memory), and it seems to work fine. However, the problem with 512 MB / 256 MB VMs still persists.

Comment 3 Martin Sivák 2014-02-10 16:12:27 UTC
Can you attach the full mom.log? I suspect that some of your VMs are not running the guest agent.

Comment 4 Lukas Svaty 2014-02-13 08:40:50 UTC
After installing a new environment (setup and hosts) for this bug, it seems to be working now. Since I can't provide any new logs, I'm closing this as INSUFFICIENT_DATA. It must have been some misconfiguration, as Martin suggested. If the bug appears again, I'll reopen it with the appropriate logs.