Description of problem: To perform sort and avg function a big data sets, postgres needs memory. When the memory is out, it will create temp files to store the intermediate data. Observing an installation with ~60 hosts and few hundreds of vms, I see hundreds of temp files, with gigs of info, create over a period of an hour. On pg 9.2 the work_mem parameter is 1mb which is very low for this kind of workload. pg 9.5 default work_mem is 4mb. Changing the value to 8mb resulted almost no usage of temp file, and no visible increase of mem usage. Probably it is freed fast enough or just low number of client connections needs it (dashboard + 2 dwhd connections?) 4mb may be too low but okay for small installations 8mb did real difference. 16mb may completely remove temp files, but may be if a lot of connections will need the memory for complex queries.
So I assume your suggesting is to change the setting to 8mb? If so, please change the title accordingly to reflect what's requested in this bug. Thanks!
BTW, it's also interesting that the files are created on /usr/local/... - which means on the local disk, even if the PG DB itself is stored on a fast shared storage of some kind!
Oved I'd like to describe the problem in the subject and not the solution, cause this might lead to other solution (or no solution at all :) ) (In reply to Yaniv Kaul from comment #2) > BTW, it's also interesting that the files are created on /usr/local/... - > which means on the local disk, even if the PG DB itself is stored on a fast > shared storage of some kind! It is an interesting trade-off and really depends on on the machine setup. Since the setting is per instance, potentially this may be a case were we will want to have 2 different pg instances, one for DWH and one for engine. Currently I don't see engine connections utilizing the work_mem but this may change when if we will change a frequent query in the engine.
Roy, are you pushing this to 4.2?
Let me clear this out with mperina and emesika first, cause I was under the impression they are going to.
Here is an excerpt from a pgbadger report analysis (will attach parts of it to this bug) from a system with 30/700/1300 hosts/vms/disks Temporary Files Activity Temp Files Day Hour Count Total size Average size Oct 31 15 660 5.76 GiB 8.94 MiB 16 1,418 11.70 GiB 8.45 MiB 17 715 2.33 GiB 3.34 MiB Top queries responsible for that is *audit_log*, *tags*, and *dashboard queries* (naturally needs sorting and hashing to produce the charts) Queries generating the most temporary files (N) Rank Count Total size Min size Max size Avg size Query *1 748 15.99 GiB 21.86 MiB 21.91 MiB 21.89 MiB * SELECT * FROM ( ( SELECT DISTINCT audit_log.* FROM audit_log WHERE NOT deleted ) ORDER BY audit_log_id DESC ) AS t1 offset ( 0 0 ) LIMIT 0; *2 97 1.51 GiB 15.91 MiB 15.91 MiB 15.91 MiB * SELECT upper ( CAST ( entity_id AS char ( 0 ) ) ) AS entity_join_id, upper ( CAST ( parent_id AS char ( 0 ) ) ) AS parent_join_id FROM v4_1_latest_tags_relations upper ( CAST ( parent_id AS char ( 0 ) ) ) AS parent_join_id FROM v4_1_latest_tags_relations upper ( CAST ( parent_id AS char ( 0 ) ) ) AS parent_join_id FROM v4_1_latest_tags_relations; *3 92 800.98 MiB 4.82 KiB 38.48 MiB 8.71 MiB * SELECT name, avg ( total ) AS total, avg ( used ) AS used, max ( previous_used ) AS previous_used, max ( previous_total ) AS previous_total, max ( vm_disk_usage_percent ) AS usage_percent FROM ( SELECT history_datetime, vms.vm_name AS name, sum ( vm_disk_size_mb ) AS total, sum ( vm_disk_actual_size_mb ) AS used, COALESCE ( max ( previous_actual_size ), 0 ) AS previous_used, COALESCE ( max ( previous_total_size ), 0 ) AS previous_t
Note: The impact of this settings is that each connection can allocate this amount of memory to perform the query. Before we go on and increase this number to make sure any installation size would fit into the work_mem we first optimize the queries to consume less memory(is sorting needed, is merging needed, do we have the right indexes to work on the proper dataset size etc)
Created attachment 1355054 [details] PgBadger report demonstaring the size of temp files generated
Created attachment 1355055 [details] Number of temp files generated
Created attachment 1355056 [details] Temporary files activity
Further analysis show we are generating sql query that does distinct * event though we don't need to. This generates very large structure in order to sort it to make sure we are distinct. See this explain analyze with distinct * Explain Analyze for audit_log query generate by the search[1] with 143,000 records: # with distinct audit_log.* : cost=149597 rows=142941 width=2091 # with distinct audit_log.audit_log_id : cost=14809 rows=142941 width=8 # data set size to process (rows * width / 1024 / 1024) : 285mb vs 1mb [1] EXPLAIN ANALYZE SELECT * FROM ( ( SELECT DISTINCT audit_log.* FROM audit_log WHERE NOT deleted ) ORDER BY audit_log_id DESC ) AS t1 LIMIT 100;
correction for comment 11 - the 'distinct' keyword needs to go away and not adding a field to the distinct expression First part of the solution is to eliminate useless distinct on search queries for tables/view which are unique by they're very definition. This alone creates a far more small dataset and demands few resources. Since search queries, speciially for audit_log, are common queries the UI fires (nt only) it should be a nice improvement. Example of one enviornment with ~150,000 records, the differences are notable. Each record is a query, each column result is with/without distinct usage: search query: "Events:" avg query time: 400ms/90ms (3x-4x improvement) Disk usage Sort: 45456kb/0kb Query plan: Sorting and merging/Index Scan
On scale should be easily reproducible, thx, P.
Does this change the minimum and recommended requirements in terms of memory to be allocated for the engine?
I don't think so but we need the infra/scale QE to verify my analysis. The production setup I was tweaking + my dev setup didn't show increased consumption yet.
On a system with 4,000 Vms and 400 hosts work_mem is set to 8MB and tmp folder is mounted to the main storage, I can see very little temp_files activity,thus verified the bug. postgres=# SHOW work_mem; work_mem ---------- 8MB (1 row) postgres=# select datname,temp_files,temp_bytes from pg_stat_database; datname | temp_files | temp_bytes -----------+------------+------------ template1 | 0 | 0 template0 | 0 | 0 postgres | 0 | 0 engine | 135 | 168329476 DB server -=>>ls -ltr /var/opt/rh/rh-postgresql95/lib/pgsql/data/base/pgsql_tmp/ total 0 DWH server -=>>ls -ltr /mnt/dwh_custom_pg/pg_data/base/pgsql_tmp/ total 0
This bugzilla is included in oVirt 4.2.0 release, published on Dec 20th 2017. Since the problem described in this bug report should be resolved in oVirt 4.2.0 release, published on Dec 20th 2017, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.