Bug 1791111
| Summary: | cleanup-data-bunch is really slow | ||
|---|---|---|---|
| Product: | [Community] Spacewalk | Reporter: | vandry.jeanmathieu |
| Component: | Server | Assignee: | Michael Mráka <mmraka> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Red Hat Satellite QA List <satqe-list> |
| Severity: | low | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 2.9 | CC: | vandry.jeanmathieu |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | spacewalk-java-2.10.19-1 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-03-19 12:24:22 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1802137 | ||
Fixed in spacewalk git by
commit 0e7249a4260d0a460c7bde85b69170d81655d8d5
1791111 - improved performance of cleanup-data-bunch
Spacewalk 2.10 has been released. https://github.com/spacewalkproject/spacewalk/wiki/ReleaseNotes210 |
Description of problem: When the task cleanup-data-bunch is running on our spacewalk it can go for more then 12h before finishing. Version-Release number of selected component (if applicable): 2.9 with postgresql10 How reproducible: explain analyze verbose delete from rhnpackagechangelogdata WHERE id NOT IN ( SELECT DISTINCT changelog_data_id FROM rhnpackagechangelogrec ); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Delete on public.rhnpackagechangelogdata (cost=0.44..161614354955.65 rows=363380 width=6) (actual time=331675751.850..331675751.851 rows=0 loops=1) -> Seq Scan on public.rhnpackagechangelogdata (cost=0.44..161614354955.65 rows=363380 width=6) (actual time=331675751.846..331675751.847 rows=0 loops=1) Output: rhnpackagechangelogdata.ctid Filter: (NOT (SubPlan 1)) Rows Removed by Filter: 727584 SubPlan 1 -> Materialize (cost=0.44..444568.04 rows=73893 width=6) (actual time=0.004..218.664 rows=363792 loops=727584) Output: rhnpackagechangelogrec.changelog_data_id -> Unique (cost=0.44..443909.58 rows=73893 width=6) (actual time=0.049..4399.265 rows=727584 loops=1) Output: rhnpackagechangelogrec.changelog_data_id -> Index Only Scan using rhn_pkg_clr_cld_uq on public.rhnpackagechangelogrec (cost=0.44..395645.44 rows=19305654 width=6) (actual time=0.049..2145.520 rows=19305525 loops=1) Output: rhnpackagechangelogrec.changelog_data_id Heap Fetches: 9078 Planning time: 0.704 ms Execution time: 331675756.156 ms Steps to Reproduce: explain analyze verbose delete from rhnpackagechangelogdata WHERE id NOT IN ( SELECT DISTINCT changelog_data_id FROM rhnpackagechangelogrec ); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Delete on public.rhnpackagechangelogdata (cost=0.44..161614354955.65 rows=363380 width=6) (actual time=331675751.850..331675751.851 rows=0 loops=1) -> Seq Scan on public.rhnpackagechangelogdata (cost=0.44..161614354955.65 rows=363380 width=6) (actual time=331675751.846..331675751.847 rows=0 loops=1) Output: rhnpackagechangelogdata.ctid Filter: (NOT (SubPlan 1)) Rows Removed by Filter: 727584 SubPlan 1 -> Materialize (cost=0.44..444568.04 rows=73893 width=6) (actual time=0.004..218.664 rows=363792 loops=727584) Output: rhnpackagechangelogrec.changelog_data_id -> Unique (cost=0.44..443909.58 rows=73893 width=6) (actual time=0.049..4399.265 rows=727584 loops=1) Output: rhnpackagechangelogrec.changelog_data_id -> Index Only Scan using rhn_pkg_clr_cld_uq on public.rhnpackagechangelogrec (cost=0.44..395645.44 rows=19305654 width=6) (actual time=0.049..2145.520 rows=19305525 loops=1) Output: rhnpackagechangelogrec.changelog_data_id Heap Fetches: 9078 Planning time: 0.704 ms Execution time: 331675756.156 ms Actual results: Execution time: 331675756.156 ms Expected results: explain analyze verbose delete from rhnpackagechangelogdata where not exists ( select distinct 1 from rhnpackagechangelogrec where rhnpackagechangelogrec.changelog_data_id = rhnpackagechangelogdata.id ); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- Delete on public.rhnpackagechangelogdata (cost=0.44..361393.90 rows=652868 width=12) (actual time=3201.388..3201.388 rows=0 loops=1) -> Nested Loop Anti Join (cost=0.44..361393.90 rows=652868 width=12) (actual time=3201.385..3201.385 rows=0 loops=1) Output: rhnpackagechangelogdata.ctid, rhnpackagechangelogrec.ctid -> Seq Scan on public.rhnpackagechangelogdata (cost=0.00..26162.61 rows=726761 width=12) (actual time=0.005..343.214 rows=727584 loops=1) Output: rhnpackagechangelogdata.ctid, rhnpackagechangelogdata.id -> Index Scan using rhn_pkg_clr_cld_uq on public.rhnpackagechangelogrec (cost=0.44..5.59 rows=261 width=12) (actual time=0.004..0.004 rows=1 loops=727584) Output: rhnpackagechangelogrec.ctid, rhnpackagechangelogrec.changelog_data_id Index Cond: (rhnpackagechangelogrec.changelog_data_id = rhnpackagechangelogdata.id) Planning time: 0.263 ms Execution time: 3201.425 ms Additional info: By changing the query from: delete from rhnpackagechangelogdata WHERE id NOT IN ( SELECT DISTINCT changelog_data_id FROM rhnpackagechangelogrec ); to: delete from rhnpackagechangelogdata where not exists ( select distinct 1 from rhnpackagechangelogrec where rhnpackagechangelogrec.changelog_data_id = rhnpackagechangelogdata.id ); it speed up things and make more sense in term of speed. https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_NOT_IN