Bug 879930
| Summary: | ovirt-engine-backend [Scalability]: The queries getstorage_domains_by_storagepoolid && getdisksvmguid cause postmaster processes to constantly consume 100% CPU. |
|---|---|
| Product: | Red Hat Enterprise Virtualization Manager |
| Reporter: | Omri Hochman <ohochman> |
| Component: | ovirt-engine |
| Assignee: | mkublin <mkublin> |
| Status: | CLOSED ERRATA |
| QA Contact: | vvyazmin <vvyazmin> |
| Severity: | urgent |
| Docs Contact: | |
| Priority: | high |
| Version: | 3.1.0 |
| CC: | bazulay, dyasny, hateya, iheim, lpeer, mkublin, Rhev-m-bugs, sgrinber, tvvcox, yeylon, ykaul, yzaslavs |
| Target Milestone: | --- |
| Keywords: | TestBlocker |
| Target Release: | 3.2.0 |
| Hardware: | x86_64 |
| OS: | Linux |
| Whiteboard: | infra |
| Fixed In Version: | sf1 |
| Doc Type: | Bug Fix |
| Doc Text: | The getdisksvmguid query issued by GetVmStatsVDSCommand was run for every running virtual machine, causing the postmaster processes on the remote database server to consume 100% CPU. This query is no longer run, which reduces CPU usage to 30% when GetVmStatsVDSCommand is run. |
| Story Points: | --- |
| Clone Of: | |
| Environment: | |
| Last Closed: | 2013-06-10 21:23:25 UTC |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| CRM: | |
| Verified Versions: | |
| Category: | --- |
| oVirt Team: | --- |
| RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- |
| Target Upstream Version: | |
| Embargoed: | |
| Attachments: | |
Created attachment 651554 [details]
engine.log
I have a patch that was tested on that environment; we checked it with Omri and it shows good results: http://gerrit.ovirt.org/#/c/9468/ (patch posted upstream).

Merged upstream. I can easily backport these changes to the downstream 3.0.x and 3.1.x versions.

This bug is currently attached to errata RHEA-2013:14491. If this change is not to be documented in the text for this errata, please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag. Otherwise, to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as "the bug doesn't present anymore".)

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug. For further details on the Cause, Consequence, Fix, Result format, please refer to: https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes Thanks in advance.

No issues found. Verified on a RHEVM 3.2 - SF13.1 environment:

* RHEVM: rhevm-3.2.0-10.19.beta2.el6ev.noarch
* VDSM: vdsm-4.10.2-15.0.el6ev.x86_64
* LIBVIRT: libvirt-0.10.2-18.el6_4.3.x86_64
* QEMU & KVM: qemu-kvm-rhev-0.12.1.2-2.355.el6_4.2.x86_64
* SANLOCK: sanlock-2.6-2.el6.x86_64

Tested on an environment with 800 VMs and 52 hosts (50 of them were fake hosts) on FC and iSCSI data centers.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-0888.html
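The description below identifies the offending queries by mining pg_log for statements slower than 1 second. For reference, a minimal postgresql.conf sketch of the standard setting that produces such duration entries — the 1000 ms threshold mirrors the investigation, but treat the snippet as illustrative rather than the configuration actually used on this environment:

```ini
# postgresql.conf: log any statement that runs longer than 1 second,
# including the bind parameters shown in the DETAIL lines below.
log_min_duration_statement = 1000   # milliseconds; -1 disables, 0 logs everything
```

Reload the server (e.g. `pg_ctl reload` or `SELECT pg_reload_conf();`) for the change to take effect.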
Created attachment 651553 [details]
pg_log

ovirt-engine-backend [Scalability]: The queries getstorage_domains_by_storagepoolid && getdisksvmguid cause postmaster processes to constantly consume 100% CPU.

Description:
*************
On a scale environment (details below), the postmaster processes on the remote DB physical machine constantly consume 100% CPU. Investigating pg_log for queries that take more than 1 second to return showed that getstorage_domains_by_storagepoolid and getdisksvmguid are issued very frequently and take a long time.

RHEVM Environment:
*******************
- RHEVM (build IC24.4) installed on a physical machine.
- Remote PostgreSQL DB on another physical machine.

Objects in RHEVM:
*****************
- 31 hosts in total.
- 50 iSCSI storage domains + 1 ISO + 1 export.
- 1400+ running XP VMs (1 NIC, 1 HD).
- 2300 users/groups.

pg_log (queries that took more than 1 second):
**********************************************
```
LOG: duration: 3339.596 ms execute S_8: select * from getdisksvmguid($1, $2, $3)
DETAIL: parameters: $1 = '4c67e140-55ad-4371-a1c1-dd0f80ae5623', $2 = NULL, $3 = 'f'
LOG: duration: 3681.452 ms execute S_2: select * from getdisksvmguid($1, $2, $3)
DETAIL: parameters: $1 = '8ddcec08-f7ed-4890-98ec-d510dac88c5f', $2 = NULL, $3 = 'f'
LOG: duration: 5153.521 ms execute S_8: select * from getdisksvmguid($1, $2, $3)
DETAIL: parameters: $1 = '1f116259-f701-4620-bd78-bdbba46da6c7', $2 = NULL, $3 = 'f'
LOG: duration: 1607.459 ms execute S_2: select * from getdisksvmguid($1, $2, $3)
DETAIL: parameters: $1 = '4700471d-6d82-4bb3-ba94-09f28879c880', $2 = NULL, $3 = 'f'
LOG: duration: 4932.313 ms execute S_1: select * from getdisksvmguid($1, $2, $3)
DETAIL: parameters: $1 = '76d207cd-eee9-41df-beb4-d59cfea75ed8', $2 = NULL, $3 = 'f'
LOG: duration: 4033.496 ms execute S_2: select * from getdisksvmguid($1, $2, $3)
DETAIL: parameters: $1 = '8ac38784-4a6c-4a9e-8be2-c98128c2a297', $2 = NULL, $3 = 'f'
LOG: duration: 5088.718 ms execute S_10: select * from getdisksvmguid($1, $2, $3)
DETAIL: parameters: $1 = '921b887f-0110-472a-ad2f-439bdd61cfc5', $2 = NULL, $3 = 'f'
LOG: duration: 1996.767 ms execute S_3: select * from getdisksvmguid($1, $2, $3)
DETAIL: parameters: $1 = 'e52b853c-fd23-49c0-8182-90e5c5726277', $2 = NULL, $3 = 'f'
LOG: duration: 1414.949 ms execute S_11: select * from getdisksvmguid($1, $2, $3)
DETAIL: parameters: $1 = '69c6fbca-cd24-4a42-8e6a-accadf55436d', $2 = NULL, $3 = 'f'
LOG: duration: 1699.736 ms execute S_2: select * from getdisksvmguid($1, $2, $3)
DETAIL: parameters: $1 = '98fdadf7-4bea-489f-b3ff-96c25e662a2d', $2 = NULL, $3 = 'f'
LOG: duration: 2528.628 ms execute S_2: select * from getdisksvmguid($1, $2, $3)
DETAIL: parameters: $1 = 'd9cf59cc-a8f0-4b29-af5a-2ba0c03fdec7', $2 = NULL, $3 = 'f'
LOG: duration: 1034.817 ms execute S_1: select * from getdisksvmguid($1, $2, $3)
DETAIL: parameters: $1 = '5e3204fc-1271-4317-a516-d2958eae3cd6', $2 = NULL, $3 = 'f'
LOG: duration: 1109.804 ms execute <unnamed>: select * from getstorage_domains_by_storagepoolid($1, $2, $3)
DETAIL: parameters: $1 = '4a90a284-adbb-465c-bffd-e1703b2c5a66', $2 = NULL, $3 = 'f'
LOG: duration: 1071.572 ms execute S_25: select * from getstorage_domains_by_storagepoolid($1, $2, $3)
DETAIL: parameters: $1 = '4a90a284-adbb-465c-bffd-e1703b2c5a66', $2 = NULL, $3 = 'f'
LOG: duration: 1262.678 ms execute S_25: select * from getstorage_domains_by_storagepoolid($1, $2, $3)
DETAIL: parameters: $1 = '4a90a284-adbb-465c-bffd-e1703b2c5a66', $2 = NULL, $3 = 'f'
LOG: duration: 1680.993 ms execute S_27: select * from getstorage_domains_by_storagepoolid($1, $2, $3)
DETAIL: parameters: $1 = '4a90a284-adbb-465c-bffd-e1703b2c5a66', $2 = NULL, $3 = 'f'
LOG: duration: 1221.055 ms execute S_24: select * from getstorage_domains_by_storagepoolid($1, $2, $3)
DETAIL: parameters: $1 = '4a90a284-adbb-465c-bffd-e1703b2c5a66', $2 = NULL, $3 = 'f'
```

TOP - on remote DB machine:
********************************
```
top - 15:11:26 up 5 days, 5:04, 4 users, load average: 16.58, 16.64, 16.89
Tasks: 369 total, 16 running, 353 sleeping, 0 stopped, 0 zombie
Cpu(s): 99.5%us, 0.1%sy, 0.0%ni, 0.3%id, 0.0%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 32870284k total, 3148328k used, 29721956k free, 154756k buffers
Swap: 16506872k total, 0k used, 16506872k free, 887716k cached

  PID USER     PR NI VIRT RES SHR S %CPU %MEM    TIME+ COMMAND
27399 postgres 20  0 255m 64m 25m R 90.7  0.2 24:33.73 postmaster
27405 postgres 20  0 239m 56m 26m R 88.7  0.2 19:18.91 postmaster
27390 postgres 20  0 255m 73m 28m R 81.5  0.2 18:56.06 postmaster
27394 postgres 20  0 239m 54m 25m R 80.8  0.2 22:02.85 postmaster
27402 postgres 20  0 256m 72m 26m R 80.8  0.2 23:10.96 postmaster
27408 postgres 20  0 255m 72m 26m R 80.8  0.2 22:51.95 postmaster
27415 postgres 20  0 256m 72m 26m R 80.1  0.2 21:09.54 postmaster
27395 postgres 20  0 256m 72m 26m R 78.8  0.2 22:50.98 postmaster
27409 postgres 20  0 239m 55m 26m R 77.8  0.2 16:50.49 postmaster
27398 postgres 20  0 240m 56m 26m R 77.5  0.2 23:00.27 postmaster
27411 postgres 20  0 254m 70m 25m R 75.8  0.2 22:34.93 postmaster
27416 postgres 20  0 252m 68m 26m R 73.2  0.2 21:04.19 postmaster
27417 postgres 20  0 235m 50m 25m R 70.2  0.2 16:26.87 postmaster
27400 postgres 20  0 252m 68m 26m S 53.3  0.2 19:51.41 postmaster
27410 postgres 20  0 249m 62m 25m R 48.7  0.2  4:43.03 postmaster
27413 postgres 20  0 250m 65m 26m R 38.7  0.2 24:23.52 postmaster
27406 postgres 20  0 255m 68m 25m S 15.6  0.2 19:50.49 postmaster
```
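A quick way to rank the offenders in a pg_log excerpt like the one above is to aggregate the duration lines per stored procedure. A minimal sketch — the regex assumes log lines shaped exactly like the `LOG: duration: ... ms execute ...: select * from <proc>(...)` entries above, and is not part of any oVirt or PostgreSQL tooling:

```python
import re
from collections import defaultdict

# Matches slow-query lines such as:
#   LOG: duration: 3339.596 ms execute S_8: select * from getdisksvmguid($1, $2, $3)
LINE_RE = re.compile(
    r"duration:\s+(?P<ms>[\d.]+)\s+ms\s+execute\s+\S+:\s+"
    r"select \* from (?P<proc>\w+)\("
)

def aggregate(lines):
    """Return {procedure: (call_count, total_ms)} for matching log lines."""
    totals = defaultdict(lambda: [0, 0.0])
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            entry = totals[m.group("proc")]
            entry[0] += 1                      # call count
            entry[1] += float(m.group("ms"))   # accumulated duration
    return {proc: (n, ms) for proc, (n, ms) in totals.items()}

if __name__ == "__main__":
    sample = [
        "LOG: duration: 3339.596 ms execute S_8: select * from getdisksvmguid($1, $2, $3)",
        "LOG: duration: 1109.804 ms execute <unnamed>: select * from getstorage_domains_by_storagepoolid($1, $2, $3)",
    ]
    for proc, (n, ms) in sorted(aggregate(sample).items()):
        print(f"{proc}: {n} calls, {ms:.1f} ms total")
```

Feeding it the full pg_log (rather than this two-line sample) would show getdisksvmguid dominating both call count and total time, matching the conclusion in the description.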