Created attachment 859827 [details]
MOM log of started VMs

Description of problem:
MOM does not collect statistics from all VMs when a larger number of VMs is running on the system. In my tests the statistics cover only the first 13 running VMs.

Version-Release number of selected component (if applicable):
is32
mom-0.3.2-8.el6ev.noarch
vdsm-4.13.2-0.7.el6ev.x86_64
libvirt-0.10.2-29.el6_5.2.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Enable Balloon optimization on the cluster
2. Create 16 or more VMs
3. Get stats from mom.getStatistics() over XMLRPC

Actual results:
MOM stops collecting statistics for VMs started after the 13th VM.

Expected results:
MOM should consider statistics of all VMs on the system. If the system is overloaded, the balloons of some VMs won't be inflated/deflated, or KSM won't work properly with all VMs.

Additional info:
XMLRPC output from MOM:
{'guests': {'balloon-1': {'balloon_cur': 524288, 'balloon_max': 524288, 'balloon_min': 262144, 'host_major_faults': 6, 'host_minor_faults': 2, 'major_fault': 0, 'mem_available': 502256, 'mem_free': 402664, 'mem_unused': 334888, 'minor_fault': 124, 'rss': 76286, 'swap_in': 0, 'swap_out': 0, 'swap_total': 2097144, 'swap_usage': 0}, 'balloon-10': {'balloon_cur': 524288, 'balloon_max': 524288, 'balloon_min': 262144, 'host_major_faults': 0, 'host_minor_faults': 3, 'major_fault': 0, 'mem_available': 502256, 'mem_free': 403436, 'mem_unused': 338616, 'minor_fault': 131, 'rss': 73302, 'swap_in': 0, 'swap_out': 0, 'swap_total': 2097144, 'swap_usage': 0}, 'balloon-11': {'balloon_cur': 524288, 'balloon_max': 524288, 'balloon_min': 262144, 'host_major_faults': 0, 'host_minor_faults': 0, 'major_fault': 0, 'mem_available': 502256, 'mem_free': 403428, 'mem_unused': 338608, 'minor_fault': 131, 'rss': 73303, 'swap_in': 0, 'swap_out': 0, 'swap_total': 2097144, 'swap_usage': 0}, 'balloon-12': {'balloon_cur': 524288, 'balloon_max': 524288, 'balloon_min': 262144, 'host_major_faults': 0, 'host_minor_faults': 9, 'major_fault': 0, 
'mem_available': 502256, 'mem_free': 401440, 'mem_unused': 336624, 'minor_fault': 13, 'rss': 73842, 'swap_in': 0, 'swap_out': 0, 'swap_total': 2097144, 'swap_usage': 0}, 'balloon-13': {'balloon_cur': 524288, 'balloon_max': 524288, 'balloon_min': 262144, 'host_major_faults': 0, 'host_minor_faults': 4, 'major_fault': 0, 'mem_available': 502256, 'mem_free': 403384, 'mem_unused': 338680, 'minor_fault': 13, 'rss': 73456, 'swap_in': 0, 'swap_out': 0, 'swap_total': 2097144, 'swap_usage': 0}, 'balloon-2': {'balloon_cur': 524288, 'balloon_max': 524288, 'balloon_min': 262144, 'host_major_faults': 3, 'host_minor_faults': 21, 'major_fault': 0, 'mem_available': 502256, 'mem_free': 402792, 'mem_unused': 336076, 'minor_fault': 137, 'rss': 75057, 'swap_in': 0, 'swap_out': 0, 'swap_total': 2097144, 'swap_usage': 0}, 'balloon-3': {'balloon_cur': 524288, 'balloon_max': 524288, 'balloon_min': 262144, 'host_major_faults': 0, 'host_minor_faults': 0, 'major_fault': 0, 'mem_available': 502256, 'mem_free': 398856, 'mem_unused': 332044, 'minor_fault': 137, 'rss': 76621, 'swap_in': 0, 'swap_out': 0, 'swap_total': 2097144, 'swap_usage': 0}, 'balloon-4': {'balloon_cur': 524288, 'balloon_max': 524288, 'balloon_min': 262144, 'host_major_faults': 0, 'host_minor_faults': 0, 'major_fault': 0, 'mem_available': 502256, 'mem_free': 400344, 'mem_unused': 333524, 'minor_fault': 137, 'rss': 75570, 'swap_in': 0, 'swap_out': 0, 'swap_total': 2097144, 'swap_usage': 0}, 'balloon-5': {'balloon_cur': 524288, 'balloon_max': 524288, 'balloon_min': 262144, 'host_major_faults': 0, 'host_minor_faults': 31, 'major_fault': 0, 'mem_available': 502256, 'mem_free': 400884, 'mem_unused': 334076, 'minor_fault': 138, 'rss': 74121, 'swap_in': 0, 'swap_out': 0, 'swap_total': 2097144, 'swap_usage': 0}, 'balloon-6': {'balloon_cur': 524288, 'balloon_max': 524288, 'balloon_min': 262144, 'host_major_faults': 1, 'host_minor_faults': 9, 'major_fault': 0, 'mem_available': 502256, 'mem_free': 400732, 'mem_unused': 334020, 
'minor_fault': 137, 'rss': 76436, 'swap_in': 0, 'swap_out': 0, 'swap_total': 2097144, 'swap_usage': 0}, 'balloon-7': {'balloon_cur': 524288, 'balloon_max': 524288, 'balloon_min': 262144, 'host_major_faults': 0, 'host_minor_faults': 6, 'major_fault': 0, 'mem_available': 502256, 'mem_free': 402732, 'mem_unused': 335936, 'minor_fault': 14, 'rss': 74988, 'swap_in': 0, 'swap_out': 0, 'swap_total': 2097144, 'swap_usage': 0}, 'balloon-8': {'balloon_cur': 524288, 'balloon_max': 524288, 'balloon_min': 262144, 'host_major_faults': 0, 'host_minor_faults': 2, 'major_fault': 0, 'mem_available': 502256, 'mem_free': 402508, 'mem_unused': 335696, 'minor_fault': 13, 'rss': 76063, 'swap_in': 0, 'swap_out': 0, 'swap_total': 2097144, 'swap_usage': 0}, 'balloon-9': {'balloon_cur': 524288, 'balloon_max': 524288, 'balloon_min': 262144, 'host_major_faults': 1, 'host_minor_faults': 13, 'major_fault': 0, 'mem_available': 502256, 'mem_free': 402652, 'mem_unused': 335936, 'minor_fault': 2525, 'rss': 75218, 'swap_in': 0, 'swap_out': 0, 'swap_total': 2097144, 'swap_usage': 0}}, 'host': {'anon_pages': 7429488, 'ksm_full_scans': 2029, 'ksm_pages_shared': 44387, 'ksm_pages_sharing': 564636, 'ksm_pages_to_scan': 200, 'ksm_pages_unshared': 218736, 'ksm_pages_volatile': 56585, 'ksm_run': 1, 'ksm_shareable': 33204704, 'ksm_sleep_millisecs': 20, 'ksmd_cpu_usage': 3, 'mem_available': 8030296, 'mem_free': 259812, 'mem_unused': 147340, 'swap_in': 87, 'swap_out': 0, 'swap_total': 2097144, 'swap_usage': 63792}}
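
A quick way to confirm which VMs are absent from output like the above is to compare the keys of the 'guests' map with the list of VMs that should be running. This is only a minimal sketch: the helper name missing_guests is hypothetical, the names balloon-14 to balloon-16 are assumed from the naming pattern of the reported guests, and fetching the dictionary from MOM is left out because the XMLRPC endpoint address is not given in this report.

```python
def missing_guests(stats, expected_names):
    """Return the expected VM names that have no entry under stats['guests'].

    `stats` is assumed to be the dictionary returned by mom.getStatistics(),
    i.e. a mapping with a 'guests' key (per-VM stats) and a 'host' key.
    The order of `expected_names` is preserved in the result.
    """
    reported = set(stats.get("guests", {}))
    return [name for name in expected_names if name not in reported]


# Assumption: the 16 test VMs are named balloon-1 .. balloon-16.
expected = ["balloon-%d" % i for i in range(1, 17)]

# Stand-in for the real output: only the first 13 VMs are reported.
# The per-VM statistics are left empty because only the keys matter here.
sample = {"guests": {name: {} for name in expected[:13]}, "host": {}}

print(missing_guests(sample, expected))
```

With the real getStatistics() result passed in place of sample, an empty list would mean every expected VM is being monitored.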
Created attachment 859828 [details] VDSM log of actions
Tried this with 16 smaller VMs, 256MB/128MB (memory/guaranteed memory), and it seems to be working fine. However, the problem with 512MB/256MB VMs still persists.
Can you attach the full mom.log? I suspect that some of your VMs are not running the guest agent.
After installing a new environment (setup and hosts) for this bug, it seems to be working now. Since I can't provide any new logs, I'm closing this as INSUFFICIENT_DATA. It must have been a misconfiguration, as Martin suggested. If the bug appears again, I'll reopen this with the appropriate logs.