Bug 879930 - ovirt-engine-backend [Scalability]: The queries getstorage_domains_by_storagepoolid && getdisksvmguid caused postmaster processes to consume constantly 100%cpu.
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.1.0
Hardware: x86_64 Linux
Priority: high   Severity: urgent
Target Milestone: ---
Target Release: 3.2.0
Assigned To: mkublin
QA Contact: vvyazmin@redhat.com
Whiteboard: infra
Keywords: TestBlocker
Depends On:
Blocks:
 
Reported: 2012-11-25 10:33 EST by Omri Hochman
Modified: 2015-09-22 09 EDT
CC List: 12 users

See Also:
Fixed In Version: sf1
Doc Type: Bug Fix
Doc Text:
The getdisksvmguid query from GetVmStatsVDSCommand was run for every running virtual machine, causing the postmaster processes on the remote database to consume 100% CPU. This query is no longer run, which reduces CPU usage to 30% when GetVmStatsVDSCommand is run.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-06-10 17:23:25 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
pg_log (288.47 KB, application/octet-stream)
2012-11-25 10:33 EST, Omri Hochman
engine.log (2.79 MB, application/octet-stream)
2012-11-25 10:34 EST, Omri Hochman


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 9468 None None None Never
Red Hat Product Errata RHSA-2013:0888 normal SHIPPED_LIVE Moderate: Red Hat Enterprise Virtualization Manager 3.2 update 2013-06-10 20:55:41 EDT

Description Omri Hochman 2012-11-25 10:33:03 EST
Created attachment 651553 [details]
pg_log

ovirt-engine-backend[Scalability]: The queries getstorage_domains_by_storagepoolid && getdisksvmguid caused postmaster processes to consume constantly  100%cpu.    

Description:
*************
On a scale environment (details below), the postmaster processes on the remote DB physical machine constantly consume 100% CPU. Investigating pg_log for queries that take more than 1 second to return showed that getstorage_domains_by_storagepoolid && getdisksvmguid are very frequent and take a long time.
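
(For context, a minimal sketch and not part of the original report: slow-statement entries like the >1 second ones quoted below are typically captured in pg_log via PostgreSQL's log_min_duration_statement setting, normally configured in postgresql.conf; the session-level SET here is only a quick way to try the threshold.)

-- Show the current slow-statement threshold (in milliseconds).
SHOW log_min_duration_statement;
-- Log every statement that runs longer than 1 second (current session only; requires superuser).
SET log_min_duration_statement = 1000;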

RHEVM Environment:
*******************
- RHEVM (Build IC24.4) installed on a physical machine.
- Postgresql remote DB on another physical machine.

Objects in RHEVM: 
*****************
- Total 31 Hosts. 
- Total 50 iSCSI Storage Domains + 1 ISO + 1 Export
- Total 1400+ running XP VMs (1 NIC, 1 HD)
- 2300 Users/Groups

pg_log (queries that took more than 1 second) :
**********************************************
LOG:  duration: 3339.596 ms  execute S_8: select * from  getdisksvmguid($1, $2, $3)
DETAIL:  parameters: $1 = '4c67e140-55ad-4371-a1c1-dd0f80ae5623', $2 = NULL, $3 = 'f'
LOG:  duration: 3681.452 ms  execute S_2: select * from  getdisksvmguid($1, $2, $3)
DETAIL:  parameters: $1 = '8ddcec08-f7ed-4890-98ec-d510dac88c5f', $2 = NULL, $3 = 'f'
LOG:  duration: 5153.521 ms  execute S_8: select * from  getdisksvmguid($1, $2, $3)
DETAIL:  parameters: $1 = '1f116259-f701-4620-bd78-bdbba46da6c7', $2 = NULL, $3 = 'f'
LOG:  duration: 1607.459 ms  execute S_2: select * from  getdisksvmguid($1, $2, $3)
DETAIL:  parameters: $1 = '4700471d-6d82-4bb3-ba94-09f28879c880', $2 = NULL, $3 = 'f'
LOG:  duration: 4932.313 ms  execute S_1: select * from  getdisksvmguid($1, $2, $3)
DETAIL:  parameters: $1 = '76d207cd-eee9-41df-beb4-d59cfea75ed8', $2 = NULL, $3 = 'f'
LOG:  duration: 4033.496 ms  execute S_2: select * from  getdisksvmguid($1, $2, $3)
DETAIL:  parameters: $1 = '8ac38784-4a6c-4a9e-8be2-c98128c2a297', $2 = NULL, $3 = 'f'
LOG:  duration: 5088.718 ms  execute S_10: select * from  getdisksvmguid($1, $2, $3)
DETAIL:  parameters: $1 = '921b887f-0110-472a-ad2f-439bdd61cfc5', $2 = NULL, $3 = 'f'
LOG:  duration: 1996.767 ms  execute S_3: select * from  getdisksvmguid($1, $2, $3)
DETAIL:  parameters: $1 = 'e52b853c-fd23-49c0-8182-90e5c5726277', $2 = NULL, $3 = 'f'
LOG:  duration: 1414.949 ms  execute S_11: select * from  getdisksvmguid($1, $2, $3)
DETAIL:  parameters: $1 = '69c6fbca-cd24-4a42-8e6a-accadf55436d', $2 = NULL, $3 = 'f'
LOG:  duration: 1699.736 ms  execute S_2: select * from  getdisksvmguid($1, $2, $3)
DETAIL:  parameters: $1 = '98fdadf7-4bea-489f-b3ff-96c25e662a2d', $2 = NULL, $3 = 'f'
LOG:  duration: 2528.628 ms  execute S_2: select * from  getdisksvmguid($1, $2, $3)
DETAIL:  parameters: $1 = 'd9cf59cc-a8f0-4b29-af5a-2ba0c03fdec7', $2 = NULL, $3 = 'f'
LOG:  duration: 1034.817 ms  execute S_1: select * from  getdisksvmguid($1, $2, $3)
DETAIL:  parameters: $1 = '5e3204fc-1271-4317-a516-d2958eae3cd6', $2 = NULL, $3 = 'f'
LOG:  duration: 1109.804 ms  execute <unnamed>: select * from  getstorage_domains_by_storagepoolid($1, $2, $3)
DETAIL:  parameters: $1 = '4a90a284-adbb-465c-bffd-e1703b2c5a66', $2 = NULL, $3 = 'f'
LOG:  duration: 1071.572 ms  execute S_25: select * from  getstorage_domains_by_storagepoolid($1, $2, $3)
DETAIL:  parameters: $1 = '4a90a284-adbb-465c-bffd-e1703b2c5a66', $2 = NULL, $3 = 'f'
LOG:  duration: 1262.678 ms  execute S_25: select * from  getstorage_domains_by_storagepoolid($1, $2, $3)
DETAIL:  parameters: $1 = '4a90a284-adbb-465c-bffd-e1703b2c5a66', $2 = NULL, $3 = 'f'
LOG:  duration: 1680.993 ms  execute S_27: select * from  getstorage_domains_by_storagepoolid($1, $2, $3)
DETAIL:  parameters: $1 = '4a90a284-adbb-465c-bffd-e1703b2c5a66', $2 = NULL, $3 = 'f'
LOG:  duration: 1221.055 ms  execute S_24: select * from  getstorage_domains_by_storagepoolid($1, $2, $3)
DETAIL:  parameters: $1 = '4a90a284-adbb-465c-bffd-e1703b2c5a66', $2 = NULL, $3 = 'f'
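
(Sketch only, assuming the pg_stat_statements contrib extension happens to be loaded on the DB host; it is not mentioned in this report. It answers the same "which queries dominate" question directly; note that total_time was renamed total_exec_time in later PostgreSQL releases.)

-- Cumulative cost per normalized statement, most expensive first.
SELECT query, calls, total_time
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;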



TOP - on Remote DB Machine:
********************************
top - 15:11:26 up 5 days,  5:04,  4 users,  load average: 16.58, 16.64, 16.89
Tasks: 369 total,  16 running, 353 sleeping,   0 stopped,   0 zombie
Cpu(s): 99.5%us,  0.1%sy,  0.0%ni,  0.3%id,  0.0%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:  32870284k total,  3148328k used, 29721956k free,   154756k buffers
Swap: 16506872k total,        0k used, 16506872k free,   887716k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                      
27399 postgres  20   0  255m  64m  25m R 90.7  0.2  24:33.73 postmaster                                                                                                   
27405 postgres  20   0  239m  56m  26m R 88.7  0.2  19:18.91 postmaster                                                                                                   
27390 postgres  20   0  255m  73m  28m R 81.5  0.2  18:56.06 postmaster                                                                                                   
27394 postgres  20   0  239m  54m  25m R 80.8  0.2  22:02.85 postmaster                                                                                                   
27402 postgres  20   0  256m  72m  26m R 80.8  0.2  23:10.96 postmaster                                                                                                   
27408 postgres  20   0  255m  72m  26m R 80.8  0.2  22:51.95 postmaster                                                                                                   
27415 postgres  20   0  256m  72m  26m R 80.1  0.2  21:09.54 postmaster                                                                                                   
27395 postgres  20   0  256m  72m  26m R 78.8  0.2  22:50.98 postmaster                                                                                                   
27409 postgres  20   0  239m  55m  26m R 77.8  0.2  16:50.49 postmaster                                                                                                   
27398 postgres  20   0  240m  56m  26m R 77.5  0.2  23:00.27 postmaster                                                                                                   
27411 postgres  20   0  254m  70m  25m R 75.8  0.2  22:34.93 postmaster                                                                                                   
27416 postgres  20   0  252m  68m  26m R 73.2  0.2  21:04.19 postmaster                                                                                                   
27417 postgres  20   0  235m  50m  25m R 70.2  0.2  16:26.87 postmaster                                                                                                   
27400 postgres  20   0  252m  68m  26m S 53.3  0.2  19:51.41 postmaster                                                                                                   
27410 postgres  20   0  249m  62m  25m R 48.7  0.2   4:43.03 postmaster                                                                                                   
27413 postgres  20   0  250m  65m  26m R 38.7  0.2  24:23.52 postmaster                                                                                                   
27406 postgres  20   0  255m  68m  25m S 15.6  0.2  19:50.49 postmaster
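
(Sketch only, not from the original report: each CPU-bound postmaster PID above can be mapped to the statement it is executing via pg_stat_activity. On PostgreSQL 9.2+ the columns are pid/query; on the 8.x/9.0/9.1 releases of that era they are procpid/current_query.)

-- Show what the busiest backends from the top output above are running.
SELECT pid, query
FROM pg_stat_activity
WHERE pid IN (27399, 27405, 27390);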
Comment 1 Omri Hochman 2012-11-25 10:34:36 EST
Created attachment 651554 [details]
engine.log
Comment 2 mkublin 2012-11-26 07:54:19 EST
I have a patch that was tested on that environment; we checked it with Omri and it shows good results.
Comment 3 mkublin 2012-11-27 02:39:02 EST
Patch posted upstream: http://gerrit.ovirt.org/#/c/9468/
Comment 4 mkublin 2012-11-28 02:40:37 EST
Merged upstream
Comment 6 mkublin 2012-12-12 02:34:13 EST
I can easily backport these changes to the downstream 3.0.x and 3.1.x versions.
Comment 12 Cheryn Tan 2013-04-03 02:51:16 EDT
This bug is currently attached to errata RHEA-2013:14491. If this change is not to be documented in the text for this errata, please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.

* Consequence: What happens when the bug presents.

* Fix: What was done to fix the bug.

* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:

https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes

Thanks in advance.
Comment 13 vvyazmin@redhat.com 2013-04-19 01:14:37 EDT
No issues found.

Verified on RHEVM 3.2 - SF13.1 environment:

RHEVM: rhevm-3.2.0-10.19.beta2.el6ev.noarch
VDSM: vdsm-4.10.2-15.0.el6ev.x86_64
LIBVIRT: libvirt-0.10.2-18.el6_4.3.x86_64
QEMU & KVM: qemu-kvm-rhev-0.12.1.2-2.355.el6_4.2.x86_64
SANLOCK: sanlock-2.6-2.el6.x86_64


Tested on an environment with 800 VMs and 52 hosts (50 of them were fake hosts) on FC and iSCSI Data Centers.
Comment 14 errata-xmlrpc 2013-06-10 17:23:25 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0888.html
