Bug 1061875 - MOM not working properly with multiple VMs
Summary: MOM not working properly with multiple VMs
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: mom
Version: 3.3.0
Hardware: Unspecified
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.3.3
Assignee: Martin Sivák
QA Contact: Lukas Svaty
Docs Contact: Cheryn Tan
URL:
Whiteboard: sla
Depends On:
Blocks: rhev3.4beta 1142926
 
Reported: 2014-02-05 19:42 UTC by Lukas Svaty
Modified: 2016-02-10 20:14 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-02-13 08:40:50 UTC
oVirt Team: SLA
Target Upstream Version:
Embargoed:


Attachments
MOM log of started VMs (10.99 KB, text/x-log)
2014-02-05 19:42 UTC, Lukas Svaty
VDSM log of actions (1.07 MB, text/x-log)
2014-02-05 19:43 UTC, Lukas Svaty

Description Lukas Svaty 2014-02-05 19:42:45 UTC
Created attachment 859827
MOM log of started VMs

Description of problem:
MOM does not collect statistics from all VMs when a larger number of VMs is running on the system. In my tests, the statistics show only the first 13 running VMs.

Version-Release number of selected component (if applicable):
is32
mom-0.3.2-8.el6ev.noarch
vdsm-4.13.2-0.7.el6ev.x86_64
libvirt-0.10.2-29.el6_5.2.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Enable balloon optimization on the cluster
2. Create 16 or more VMs
3. Get the statistics from mom.getStatistics() over XML-RPC
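
Step 3 can be sketched in Python roughly as follows. This is only an illustration: the endpoint URL passed to main() is a placeholder, since the address and port of MOM's XML-RPC interface depend on the local configuration, and fetch_guest_names() is a hypothetical helper name. Only the getStatistics() call and the 'guests' key come from this report.

```python
import xmlrpc.client


def fetch_guest_names(proxy):
    """Return the sorted names of the guests MOM reports statistics for.

    `proxy` is any object exposing getStatistics(), e.g. an
    xmlrpc.client.ServerProxy pointed at MOM's XML-RPC endpoint.
    """
    stats = proxy.getStatistics()
    # The reply is a dict with a 'guests' dict keyed by VM name
    # (see the XMLRPC output under "Additional info").
    return sorted(stats.get("guests", {}))


def main(url):
    # `url` is an assumption: use whatever address/port MOM's RPC
    # interface is configured to listen on in your environment.
    proxy = xmlrpc.client.ServerProxy(url)
    names = fetch_guest_names(proxy)
    print("%d guests reported: %s" % (len(names), ", ".join(names)))
```

Calling main() with the local endpoint prints the number of guests MOM currently reports, which can then be compared with the number of running VMs.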


Actual results:
MOM stops collecting statistics for VMs started after the 13th VM.

Expected results:
MOM should consider the statistics of all VMs on the system. Otherwise, when the system is overloaded, the balloons of some VMs won't be inflated or deflated, or KSM won't work properly across all VMs.

Additional info:
XMLRPC output from MOM:
{'guests': {'balloon-1': {'balloon_cur': 524288,
                          'balloon_max': 524288,
                          'balloon_min': 262144,
                          'host_major_faults': 6,
                          'host_minor_faults': 2,
                          'major_fault': 0,
                          'mem_available': 502256,
                          'mem_free': 402664,
                          'mem_unused': 334888,
                          'minor_fault': 124,
                          'rss': 76286,
                          'swap_in': 0,
                          'swap_out': 0,
                          'swap_total': 2097144,
                          'swap_usage': 0},
            'balloon-10': {'balloon_cur': 524288,
                           'balloon_max': 524288,
                           'balloon_min': 262144,
                           'host_major_faults': 0,
                           'host_minor_faults': 3,
                           'major_fault': 0,
                           'mem_available': 502256,
                           'mem_free': 403436,
                           'mem_unused': 338616,
                           'minor_fault': 131,
                           'rss': 73302,
                           'swap_in': 0,
                           'swap_out': 0,
                           'swap_total': 2097144,
                           'swap_usage': 0},
            'balloon-11': {'balloon_cur': 524288,
                           'balloon_max': 524288,
                           'balloon_min': 262144,
                           'host_major_faults': 0,
                           'host_minor_faults': 0,
                           'major_fault': 0,
                           'mem_available': 502256,
                           'mem_free': 403428,
                           'mem_unused': 338608,
                           'minor_fault': 131,
                           'rss': 73303,
                           'swap_in': 0,
                           'swap_out': 0,
                           'swap_total': 2097144,
                           'swap_usage': 0},
            'balloon-12': {'balloon_cur': 524288,
                           'balloon_max': 524288,
                           'balloon_min': 262144,
                           'host_major_faults': 0,
                           'host_minor_faults': 9,
                           'major_fault': 0,
                           'mem_available': 502256,
                           'mem_free': 401440,
                           'mem_unused': 336624,
                           'minor_fault': 13,
                           'rss': 73842,
                           'swap_in': 0,
                           'swap_out': 0,
                           'swap_total': 2097144,
                           'swap_usage': 0},
            'balloon-13': {'balloon_cur': 524288,
                           'balloon_max': 524288,
                           'balloon_min': 262144,
                           'host_major_faults': 0,
                           'host_minor_faults': 4,
                           'major_fault': 0,
                           'mem_available': 502256,
                           'mem_free': 403384,
                           'mem_unused': 338680,
                           'minor_fault': 13,
                           'rss': 73456,
                           'swap_in': 0,
                           'swap_out': 0,
                           'swap_total': 2097144,
                           'swap_usage': 0},
            'balloon-2': {'balloon_cur': 524288,
                          'balloon_max': 524288,
                          'balloon_min': 262144,
                          'host_major_faults': 3,
                          'host_minor_faults': 21,
                          'major_fault': 0,
                          'mem_available': 502256,
                          'mem_free': 402792,
                          'mem_unused': 336076,
                          'minor_fault': 137,
                          'rss': 75057,
                          'swap_in': 0,
                          'swap_out': 0,
                          'swap_total': 2097144,
                          'swap_usage': 0},
            'balloon-3': {'balloon_cur': 524288,
                          'balloon_max': 524288,
                          'balloon_min': 262144,
                          'host_major_faults': 0,
                          'host_minor_faults': 0,
                          'major_fault': 0,
                          'mem_available': 502256,
                          'mem_free': 398856,
                          'mem_unused': 332044,
                          'minor_fault': 137,
                          'rss': 76621,
                          'swap_in': 0,
                          'swap_out': 0,
                          'swap_total': 2097144,
                          'swap_usage': 0},
            'balloon-4': {'balloon_cur': 524288,
                          'balloon_max': 524288,
                          'balloon_min': 262144,
                          'host_major_faults': 0,
                          'host_minor_faults': 0,
                          'major_fault': 0,
                          'mem_available': 502256,
                          'mem_free': 400344,
                          'mem_unused': 333524,
                          'minor_fault': 137,
                          'rss': 75570,
                          'swap_in': 0,
                          'swap_out': 0,
                          'swap_total': 2097144,
                          'swap_usage': 0},
            'balloon-5': {'balloon_cur': 524288,
                          'balloon_max': 524288,
                          'balloon_min': 262144,
                          'host_major_faults': 0,
                          'host_minor_faults': 31,
                          'major_fault': 0,
                          'mem_available': 502256,
                          'mem_free': 400884,
                          'mem_unused': 334076,
                          'minor_fault': 138,
                          'rss': 74121,
                          'swap_in': 0,
                          'swap_out': 0,
                          'swap_total': 2097144,
                          'swap_usage': 0},
            'balloon-6': {'balloon_cur': 524288,
                          'balloon_max': 524288,
                          'balloon_min': 262144,
                          'host_major_faults': 1,
                          'host_minor_faults': 9,
                          'major_fault': 0,
                          'mem_available': 502256,
                          'mem_free': 400732,
                          'mem_unused': 334020,
                          'minor_fault': 137,
                          'rss': 76436,
                          'swap_in': 0,
                          'swap_out': 0,
                          'swap_total': 2097144,
                          'swap_usage': 0},
            'balloon-7': {'balloon_cur': 524288,
                          'balloon_max': 524288,
                          'balloon_min': 262144,
                          'host_major_faults': 0,
                          'host_minor_faults': 6,
                          'major_fault': 0,
                          'mem_available': 502256,
                          'mem_free': 402732,
                          'mem_unused': 335936,
                          'minor_fault': 14,
                          'rss': 74988,
                          'swap_in': 0,
                          'swap_out': 0,
                          'swap_total': 2097144,
                          'swap_usage': 0},
            'balloon-8': {'balloon_cur': 524288,
                          'balloon_max': 524288,
                          'balloon_min': 262144,
                          'host_major_faults': 0,
                          'host_minor_faults': 2,
                          'major_fault': 0,
                          'mem_available': 502256,
                          'mem_free': 402508,
                          'mem_unused': 335696,
                          'minor_fault': 13,
                          'rss': 76063,
                          'swap_in': 0,
                          'swap_out': 0,
                          'swap_total': 2097144,
                          'swap_usage': 0},
            'balloon-9': {'balloon_cur': 524288,
                          'balloon_max': 524288,
                          'balloon_min': 262144,
                          'host_major_faults': 1,
                          'host_minor_faults': 13,
                          'major_fault': 0,
                          'mem_available': 502256,
                          'mem_free': 402652,
                          'mem_unused': 335936,
                          'minor_fault': 2525,
                          'rss': 75218,
                          'swap_in': 0,
                          'swap_out': 0,
                          'swap_total': 2097144,
                          'swap_usage': 0}},
 'host': {'anon_pages': 7429488,
          'ksm_full_scans': 2029,
          'ksm_pages_shared': 44387,
          'ksm_pages_sharing': 564636,
          'ksm_pages_to_scan': 200,
          'ksm_pages_unshared': 218736,
          'ksm_pages_volatile': 56585,
          'ksm_run': 1,
          'ksm_shareable': 33204704,
          'ksm_sleep_millisecs': 20,
          'ksmd_cpu_usage': 3,
          'mem_available': 8030296,
          'mem_free': 259812,
          'mem_unused': 147340,
          'swap_in': 87,
          'swap_out': 0,
          'swap_total': 2097144,
          'swap_usage': 63792}}
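
To make the gap in the output above explicit, a small check like the one below can list which VMs are absent from the 'guests' section. It is a sketch under assumptions: missing_guests() is a hypothetical helper, and the expected names balloon-1 through balloon-16 are inferred from the naming pattern in the output and the "16 or more VMs" step, not taken from the attached logs.

```python
def missing_guests(stats, expected_names):
    """Return the expected VM names that have no entry in stats['guests']."""
    reported = set(stats.get("guests", {}))
    return sorted(set(expected_names) - reported)


# Stand-in for the reply above: only the guest names matter here,
# so the per-guest statistics are left empty.
reported_stats = {
    "guests": {"balloon-%d" % i: {} for i in range(1, 14)},
    "host": {},
}

# Assumed VM names: 16 VMs following the balloon-N pattern.
expected = ["balloon-%d" % i for i in range(1, 17)]

print(missing_guests(reported_stats, expected))
# prints ['balloon-14', 'balloon-15', 'balloon-16']
```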

Comment 1 Lukas Svaty 2014-02-05 19:43:32 UTC
Created attachment 859828
VDSM log of actions

Comment 2 Lukas Svaty 2014-02-06 11:15:59 UTC
I tried this with 16 smaller VMs (256 MB memory / 128 MB guaranteed memory) and it seems to work fine. However, the problem with 512 MB / 256 MB VMs still persists.

Comment 3 Martin Sivák 2014-02-10 16:12:27 UTC
Can you attach the full mom.log? I suspect that some of your VMs are not running the guest agent.

Comment 4 Lukas Svaty 2014-02-13 08:40:50 UTC
After installing a new environment (setup and hosts) for this bug, it seems to be working now. Since I can't provide any new logs, I'm closing this as INSUFFICIENT_DATA. It must have been a misconfiguration, as Martin suggested. If the bug appears again, I'll reopen this with the appropriate logs.

