Bug 1412626
| Summary: | [scale] High Database Load after updating to oVirt 4.0.4 (select * from getdisksvmguid) | ||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | christian.grundmann | ||||||||||||||||||
| Component: | Database.Core | Assignee: | Liron Aravot <laravot> | ||||||||||||||||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | eberman | ||||||||||||||||||
| Severity: | medium | Docs Contact: | |||||||||||||||||||
| Priority: | unspecified | ||||||||||||||||||||
| Version: | 4.0.4 | CC: | amureini, bugs, christian.grundmann, tnisan | ||||||||||||||||||
| Target Milestone: | ovirt-4.1.1 | Keywords: | Performance | ||||||||||||||||||
| Target Release: | 4.1.1.3 | Flags: | rule-engine:
ovirt-4.1+
rule-engine: exception+ |
||||||||||||||||||
| Hardware: | x86_64 | ||||||||||||||||||||
| OS: | Linux | ||||||||||||||||||||
| Whiteboard: | |||||||||||||||||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||||||||||||||
| Doc Text: | Story Points: | --- | |||||||||||||||||||
| Clone Of: | Environment: | ||||||||||||||||||||
| Last Closed: | 2017-04-21 09:30:39 UTC | Type: | Bug | ||||||||||||||||||
| Regression: | --- | Mount Type: | --- | ||||||||||||||||||
| Documentation: | --- | CRM: | |||||||||||||||||||
| Verified Versions: | Category: | --- | |||||||||||||||||||
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |||||||||||||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||||||||||||
| Embargoed: | |||||||||||||||||||||
| Attachments: |
|
||||||||||||||||||||
|
Description
christian.grundmann
2017-01-12 13:35:57 UTC
Allon, can you please have a look? Hi Christian, can you please specify what version of postgres are you using? additionally - did you try to reproduce the issue on more then one setup? Additionally, I'd appreciate if you could run EXPLAIN ANALYZE for that query on your setup. Hi,
did not install postgres per Hand, so i think it was installed by engine?
rpm -qa |grep postgres
postgresql-libs-9.2.18-1.el7.x86_64
postgresql-9.2.18-1.el7.x86_64
postgresql-jdbc-9.2.1002-5.el7.noarch
postgresql-server-9.2.18-1.el7.x86_64
Can't tell which Parameters used for the Original Query used 'template guid' 't', NULL, 'f' like in the other bug?
engine=> EXPLAIN ANALYZE select * from getdisksvmguid('3987798e-8d6e-4da2-b66d-81cd08d5dc41', 't', NULL, 'f');
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Function Scan on getdisksvmguid (cost=0.25..10.25 rows=1000 width=3107) (actual time=46.452..46.453 rows=1 loops=1)
Total runtime: 46.515 ms
(2 rows)
but thats in 4.0.3 (where the Problem doesn't happen) as a can't have my engine on 4.0.6
I also have only that System no others to try
Thanks christian,
I'd appreciate if you could create a db dump from you setup and attach it or alternatively -
1. Create a db dump of your setup
2. Create a new db and restore the data from the dump to it
3. Run the following and save the output:
a. explain (ANALYZE true, COSTS true) select * from all_disks_for_vms;
b. explain (ANALYZE true, COSTS true) select * from getdisksvmguid('3987798e-8d6e-4da2-b66d-81cd08d5dc41', 't', NULL, 'f');
4. Create the VIEW as defined in 4.0.6 (see the attached sql file).
5. Run the queries in step 3 again and save the output.
Additionally, I'd appreciate if you could answer the following:
1. Did you try multiple times? Did you manage to reproduce always or only in some of the times?
2. Do you have the engine log from the last time you tried that on 4.0.6? I'd appreciate if you could attach it as well.
Created attachment 1245581 [details]
create view sql
Clarification - run all the tests on the cloned database of course. Created attachment 1245833 [details]
alldisks_post_view
Created attachment 1245834 [details]
alldisks_pre_view
Created attachment 1245835 [details]
getdisksvmguid_post_view
Created attachment 1245836 [details]
getdisksvmguid_pre_view
Attached pre_view Queries from original Dump and post_View Queries after creating the views 1. Did you try multiple times? Did you manage to reproduce always or only in some of the times? I am creating a lot of VMs during the day so the problem was directly after the update so I did the downgrade without trying around a log. 2. Do you have the engine log from the last time you tried that on 4.0.6? I'd appreciate if you could attach it as well. no did not save the logs Thanks Christian, The problem cause is that after the changes made in BZ 1302752 the postgres optimizer chooses to perform join and then to reduce the rows instead of performing seq scan - when the rows count in those table is very big that results in worse performance as can be seen in [1]. Generally speaking, the postgres optimizer optimally should've generated the same plan for both cases (EXISTS/LEFT JOIN) - which means that for both cases it should've perform seq scan - it may be that because of the complexity of the query that doesn't happen. Any chance you can install the latest stable release of postgres for your os on a different machine, restore your database dump there and to perform the test again? Thanks, Liron [1] - ------------------------------------------------------------------------------- old view (using EXISTS) - -> -> Seq Scan on snapshots (cost=0.00..6.27 rows=1 width=0) (actual time=0.012..0.012 rows=0 loops=292) Filter: ((public.images.image_group_id = memory_metadata_disk_id) OR (public.images.image_group_id = memory_dump_disk_id)) Rows Removed by Filter: 151 .. Total runtime: 10.278 ms Rows Removed by Filter: 151 ------------------------------------------------------------------------------- Updated view (Using joins)- we can see 44092 rows removed. ------------------------------------------------------------------------------- -> Nested Loop Left Join (cost=0.00..788.16 rows=1 width=119) (actual time=0.125..5.806 rows=292 loops=1) Join Filter: ((public.images.image_group_id = memory_snap.memory_metadata_disk_id) OR (public.images.image_group_id = memory_snap.memory_dump_disk_id)) Rows Removed by Join Filter: 44092 .. Total runtime: 433.765 ms ------------------------------------------------------------------------------- Created attachment 1245920 [details]
Zip with Postgres 9.6 Reports
in Produktion i have postgresql 9.2.18 I rerun the Tests with postgresql 9.6.1 (attached zip) but its only a small improvement 345.151 ms vs. 10.700 ms in ovirt 4.0.3 Thanks Chrisitan, I've also tried to simulate the case in my local env as well (with 2000 disks/2000 snapshots) using postgres 9.5, the optimizer did optimize the query for both to use the seq scan instead of performing a join - therefore the generated query plan is related to your data (and we need to let the postgres developers to check if they have an issue here, if you could open a postgres bug on that it'll be great as they may need more data from you). To not depend on the postgres implementation for that case, I'll change the query to avoid the join after testing that it works in high scale system as expected and not causing to a performance degradation. Thanks for your input, Liron (In reply to Liron Aravot from comment #17) > Thanks Chrisitan, > I've also tried to simulate the case in my local env as well (with 2000 > disks/2000 snapshots) using postgres 9.5, the optimizer did optimize the > query for both to use the seq scan instead of performing a join - therefore > the generated query plan is related to your data (and we need to let the > postgres developers to check if they have an issue here, if you could open a > postgres bug on that it'll be great as they may need more data from you). > > To not depend on the postgres implementation for that case, I'll change the > query to avoid the join after testing that it works in high scale system as > expected and not causing to a performance degradation. > > Thanks for your input, > Liron Hi, i don't think i understand the problem enough to write the postgres bug report. Would be greate if you could do it. If more input from me is necessary I can provide it. Thx Christian Hi Christian, The optimizer wrong planning should be caused by the fact that your db statistics aren't updated -
In order to confirm that, can you please perform the following on the cloned db you created and we tested on before:
1. Create the VIEW as defined in 4.0.6 (see the attached sql file).
2. attach the output of: select * from pg_stat_all_tables;
2. run VACUUM ANALYZE;
3. run
a. explain (ANALYZE true, COSTS true) select * from all_disks_for_vms;
b. explain (ANALYZE true, COSTS true) select * from
getdisksvmguid('3987798e-8d6e-4da2-b66d-81cd08d5dc41', 't', NULL, 'f');
and attach the output?
thanks.
Created attachment 1248994 [details]
Zip with Reports after Analyze
(In reply to christian.grundmann from comment #20) > Created attachment 1248994 [details] > Zip with Reports after Analyze Hi Christian,could you update if issue was solved ? explain (ANALYZE true, COSTS true) select * from all_disks_for_vms;
Total runtime: 43.922 ms
explain (ANALYZE true, COSTS true) select * from getdisksvmguid('3987798e-8d6e-4da2-b66d-81cd08d5dc41', 't', NULL, 'f');
Total runtime: 40.331 ms
Times are from Production not the Clone DB
works for me on oVirt 4.1.1.6-1.el7.centos
(In reply to christian.grundmann from comment #22) > explain (ANALYZE true, COSTS true) select * from all_disks_for_vms; > Total runtime: 43.922 ms > explain (ANALYZE true, COSTS true) select * from > getdisksvmguid('3987798e-8d6e-4da2-b66d-81cd08d5dc41', 't', NULL, 'f'); > Total runtime: 40.331 ms > > Times are from Production not the Clone DB > > works for me on oVirt 4.1.1.6-1.el7.centos moving to verified thank you! |