Description of problem:
Provide a cron script to execute VACUUM FULL on the engine and DWH databases, by default executed each night.

Actual results:
VACUUM FULL needs to be run manually.

Expected results:
Automate VACUUM FULL execution.

Additional info:
I'd solve this in two parts:

1. Create a script executing VACUUM FULL on a db specified by command-line parameters, so we can distinguish between the engine and DWH db if needed (a sketch follows below).
2. Add a module to engine-setup which will configure automatic execution of this script for each db separately, each day, for example around 2am, and show info about that to users so they can adjust the time as needed.
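As a rough illustration of part 1, a minimal sketch of such a wrapper; the script name and defaults are hypothetical, not an existing oVirt tool:

```
#!/bin/bash
# vacuum_full_db.sh (hypothetical name) - run VACUUM FULL on a given database.
# Usage: vacuum_full_db.sh <dbname>
#   e.g. vacuum_full_db.sh engine
#        vacuum_full_db.sh ovirt_engine_history   # the DWH db
set -e

DB="${1:-engine}"   # default to the engine db

# Intended to be run as the postgres user, so no password handling is needed
# for a local connection.
psql "${DB}" -c 'VACUUM (VERBOSE, FULL, ANALYZE)'
```

engine-setup could then drop a crontab entry that calls this script per database, along the lines of the cron example later in this thread.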
I had this running on a live system:

- The full vacuum performed on rhev.tlv reclaimed 1G of space
- It took around 150-200 seconds for the whole db
- It took 40s for the audit_log table alone (the biggest table)
- The cron job was created at /etc/cron.d/pg_vacuum

```
# Run FULL vacuum on the engine db at 02:00
0 2 * * * postgres psql engine -c 'VACUUM (VERBOSE, FULL, ANALYZE)'
```

Note: before running a full vacuum we should validate that the available space is larger than the biggest table.

Get the biggest tables:
```
bash-4.2$ psql engine -c '\dt+ public.*' | awk -F "|" '{print $5 _ $2}' | egrep "MB|GB"
```
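A sketch of that space check, meant to be run locally as the postgres user; the script name and the abort logic are illustrative only:

```
#!/bin/bash
# check_vacuum_space.sh (hypothetical) - only proceed if free disk space
# exceeds the size of the biggest table, since VACUUM FULL rewrites each
# table into a new file before dropping the old one.
set -e
DB="${1:-engine}"

DATADIR=$(psql -At "${DB}" -c 'SHOW data_directory')
FREE_KB=$(df --output=avail -k "${DATADIR}" | tail -n 1)

# Size of the biggest table (including indexes) in the public schema, in KB.
BIGGEST_KB=$(psql -At "${DB}" -c \
  "SELECT pg_total_relation_size(c.oid) / 1024
     FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE n.nspname = 'public' AND c.relkind = 'r'
    ORDER BY 1 DESC LIMIT 1")

if [ "${FREE_KB}" -gt "${BIGGEST_KB}" ]; then
    echo "Enough free space, proceeding with VACUUM FULL"
else
    echo "Not enough free space for VACUUM FULL, aborting" >&2
    exit 1
fi
```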
(In reply to Roy Golan from comment #2)
> I had this running on a live system:
>
> - The full vacuum performed on rhev.tlv reclaimed 1G of space
> - It took around 150-200 seconds for the whole db
> - It took 40s for the audit_log table alone (the biggest table)
> - The cron job was created at /etc/cron.d/pg_vacuum
>
> ```
> # Run FULL vacuum on the engine db at 02:00
> 0 2 * * * postgres psql engine -c 'VACUUM (VERBOSE, FULL, ANALYZE)'
> ```
>
> Note: before running a full vacuum we should validate that the available
> space is larger than the biggest table.
>
> Get the biggest tables:
> ```
> bash-4.2$ psql engine -c '\dt+ public.*' | awk -F "|" '{print $5 _ $2}' |
> egrep "MB|GB"
> ```

Any news on how long it takes now that it runs regularly on the RHEV.TLV setup? Any stats for DWH?
This has been running nightly for 2 weeks; for some reason I only have this night's durations in the logs. The initial run was 5-8 times longer.

# Per table
- duration: 5640.865 ms vm_statistics
- duration: 14506.883 ms vds_interface_statistics
- duration: 43135.406 ms audit_log
- duration: 2593.314 ms vm_dynamic

# Overall ~2 minutes
2016-11-24 02:01:54 IST [326]: [7-1] db=engine,user=postgres,app=psql,client=[local] LOG: duration: 110753.654 ms statement: VACUUM (VERBOSE, FULL, ANALYZE)

# slowest is audit_log, took 50s
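For reference, those per-statement durations appear in the server log when statement duration logging is enabled; a sketch, with the threshold and the log path only as assumptions (the path depends on the installed PostgreSQL packaging):

```
# In postgresql.conf: log the duration of statements slower than 1 second
# (0 would log every statement's duration).
# log_min_duration_statement = 1000

# Pull the VACUUM durations out of the server log afterwards:
grep 'statement: VACUUM' /var/lib/pgsql/data/pg_log/postgresql-*.log | grep 'duration:'
```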
(In reply to Roy Golan from comment #4)
> This has been running nightly for 2 weeks; for some reason I only have this
> night's durations in the logs. The initial run was 5-8 times longer.
>
> # Per table
> - duration: 5640.865 ms vm_statistics
> - duration: 14506.883 ms vds_interface_statistics
> - duration: 43135.406 ms audit_log
> - duration: 2593.314 ms vm_dynamic
>
> # Overall ~2 minutes
> 2016-11-24 02:01:54 IST [326]: [7-1]
> db=engine,user=postgres,app=psql,client=[local] LOG: duration: 110753.654
> ms statement: VACUUM (VERBOSE, FULL, ANALYZE)
>
> # slowest is audit_log, took 50s

This probably means it does indeed need to run daily. I'd look at what we can do for audit_log (partitioning?), as it takes roughly half of the total vacuuming time.

How much space have we saved? (It will be interesting to see what our vacuum performance is.)

(Note that in newer PG releases, vacuum performance has improved, AFAIK.)
Y.
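One way to answer the space question without a downtime window: pg_stat_user_tables plus the size functions report table sizes and dead-tuple estimates using ordinary reads. A sketch (not part of any shipped tooling, engine db name assumed):

```
# Show the largest tables with their dead-tuple counts; run before and
# after a vacuum to see how much was reclaimed.
psql engine <<'SQL'
SELECT relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
       n_live_tup,
       n_dead_tup
  FROM pg_stat_user_tables
 ORDER BY pg_total_relation_size(relid) DESC
 LIMIT 10;
SQL
```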
Having some second thoughts on this after re-reading [1] and talking with didi. We can't ignore that running this nightly also means downtime: locking tables for minutes is not a good thing, and users who run batch jobs during the night may not find it acceptable. Also, if autovacuum is tuned correctly, we may not need to reclaim the space at all, unless the disk is getting full.

So perhaps what we should do is supply it as a tool, but refrain from scheduling it.

[1] The Postgres wiki suggests not using FULL for routine maintenance:
https://wiki.postgresql.org/wiki/VACUUM_FULL
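For the "autovacuum tuned correctly" part, the relevant knobs live in postgresql.conf; the values below are only illustrative, not a recommendation from this thread:

```
# postgresql.conf - make autovacuum kick in earlier on busy tables
# (sketch values; the defaults are 0.2 / 50 in this PostgreSQL line).
# autovacuum = on
# autovacuum_vacuum_scale_factor = 0.05   # vacuum after ~5% of rows are dead
# autovacuum_vacuum_threshold = 50
# autovacuum_naptime = 1min
```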
(In reply to Roy Golan from comment #6)
> Having some second thoughts on this after re-reading [1] and talking with
> didi. We can't ignore that running this nightly also means downtime:
> locking tables for minutes is not a good thing, and users who run batch
> jobs during the night may not find it acceptable. Also, if autovacuum is
> tuned correctly, we may not need to reclaim the space at all, unless the
> disk is getting full.
>
> So perhaps what we should do is supply it as a tool, but refrain from
> scheduling it.
>
> [1] The Postgres wiki suggests not using FULL for routine maintenance:
> https://wiki.postgresql.org/wiki/VACUUM_FULL

What about doing it during upgrade?
(In reply to Yaniv Kaul from comment #7)
> What about doing it during upgrade?

Makes sense, optionally. We should still provide a tool. Some users prefer to collect several different time-consuming actions into a single downtime event, others prefer to spread them out. IMO.
BTW, it would be very useful to provide some report as well, to help the user decide if/when this should be done. E.g. one or more of:

"Your database is XX% fragmented" (not sure what this means exactly, but that's what most defrag tools report)

"The tables with the highest number of fragments are:
table       number of fragments
audit_log   1000
vm_static   10
..."

"You have XX GB of unused, but allocated, free space."

Of course, only if this requires no downtime/locking and does not take too long (even on a large, fragmented db).
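A rough approximation of such a report that needs no downtime can be built from the statistics collector: dead-tuple counters in pg_stat_user_tables are only estimates, but they are cheap to read. A sketch (the report wording and column aliases are made up for the example):

```
# Report the tables with the most dead (reclaimable) rows, plus the
# percentage of dead rows per table - no locks beyond ordinary reads.
psql engine <<'SQL'
SELECT relname AS table_name,
       n_dead_tup AS dead_rows,
       round(100.0 * n_dead_tup / greatest(n_live_tup + n_dead_tup, 1), 1)
           AS pct_dead
  FROM pg_stat_user_tables
 ORDER BY n_dead_tup DESC
 LIMIT 10;
SQL
```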
Example script: http://postgresql.freeideas.cz/script-vacuum-full-tables/ (I was hoping to find an Ansible role that does this, but haven't found one yet).
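The idea behind per-table scripts like that one is to shrink the lock window by vacuuming one table at a time instead of the whole database in a single statement. A minimal shell sketch of that approach (not the linked script itself):

```
#!/bin/bash
# Vacuum each user table individually, so the exclusive lock is held only
# for the duration of that one table's rewrite.
set -e
DB="${1:-engine}"

for tbl in $(psql -At "${DB}" -c \
    "SELECT quote_ident(schemaname) || '.' || quote_ident(relname)
       FROM pg_stat_user_tables ORDER BY relname"); do
    echo "VACUUM FULL ${tbl}"
    psql "${DB}" -c "VACUUM (FULL, ANALYZE) ${tbl}"
done
```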
PostgreSQL ships a tool for that, 'vacuumdb', so combined with /usr/share/ovirt-engine/dbscripts/engine-psql.sh we should have all we need.
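For reference, vacuumdb can run the same full vacuum from the shell; the invocation below is a sketch assuming a local connection as the postgres user, with the engine database name:

```
# Full vacuum + analyze of the engine db, echoing what it does.
su - postgres -c "vacuumdb --full --analyze --verbose --dbname=engine"
```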
(In reply to Roy Golan from comment #11)
> PostgreSQL ships a tool for that, 'vacuumdb', so combined with
> /usr/share/ovirt-engine/dbscripts/engine-psql.sh we should have all we need.

This script is downstream-only. If we only do this from engine-setup, we already have infrastructure there for handling the db. We mainly need to decide what we want.
Created attachment 1236859 [details]
Vacuum fails log

When running engine-setup with a full vacuum it may get stuck in a deadlock. This happened twice, on both a new and an old database. After forcibly breaking the setup, the database cannot be restored.
(In reply to Shahar Havivi from comment #13)
> Created attachment 1236859 [details]
> Vacuum fails log
>
> When running engine-setup with a full vacuum it may get stuck in a deadlock.
> This happened twice, on both a new and an old database. After forcibly
> breaking the setup, the database cannot be restored.

Thanks a lot for reporting that. Does the restore from the dump of the install fail? Please add your install log and db version.

Next time you run into it, please add the following outputs if you can:
- select * from pg_stat_activity
- select * from pg_locks
- ps -fLU postgres

It might be that the setup is leaving something open (psycopg?); that needs further investigation. CI tests with upgrade are WIP and should be active very soon, so hopefully we will be able to catch more of these, if they happen.
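To make the pg_stat_activity / pg_locks output easier to interpret when this happens again, a simplified version of the usual lock-monitoring query (adapted from the common PostgreSQL wiki pattern; a diagnostic sketch, not part of engine-setup):

```
# Show which backend is waiting on which, and the queries involved.
psql engine <<'SQL'
SELECT blocked.pid        AS blocked_pid,
       blocked_act.query  AS blocked_query,
       blocking.pid       AS blocking_pid,
       blocking_act.query AS blocking_query
  FROM pg_locks blocked
  JOIN pg_stat_activity blocked_act  ON blocked_act.pid = blocked.pid
  JOIN pg_locks blocking
    ON blocking.locktype = blocked.locktype
   AND blocking.relation IS NOT DISTINCT FROM blocked.relation
   AND blocking.transactionid IS NOT DISTINCT FROM blocked.transactionid
   AND blocking.pid <> blocked.pid
  JOIN pg_stat_activity blocking_act ON blocking_act.pid = blocking.pid
 WHERE NOT blocked.granted
   AND blocking.granted;
SQL
```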
(In reply to Roy Golan from comment #14)

PostgreSQL 9.4.4. The log is attachment 1236859 [details]. The restore from the dump failed as well; I installed a new database, so I can't post the queries.
verified in ovirt-engine-setup-4.1.1.2-0.1.el7.noarch