Bug 1388430 - [RFE] Provide a tool to execute vacuum full on engine database
Summary: [RFE] Provide a tool to execute vacuum full on engine database
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Setup.Engine
Version: 4.0.0
Hardware: Unspecified
OS: Unspecified
medium
high
Target Milestone: ovirt-4.1.1
: 4.1.1.2
Assignee: Roy Golan
QA Contact: Lucie Leistnerova
URL: http://www.ovirt.org/develop/release-...
Whiteboard:
Depends On:
Blocks: 1409766 1416049
TreeView+ depends on / blocked
 
Reported: 2016-10-25 10:39 UTC by Eli Mesika
Modified: 2019-04-28 14:20 UTC (History)
9 users (show)

Fixed In Version:
Doc Type: Enhancement
Doc Text:
This release adds a maintenance tool to run vacuum actions on the engine database (or specific tables). This tool optimizes table stats and compacts the internals of tables, resulting in less disk space usage, more efficient future maintenance work, and updated table stats for better query planning. Also provided is an engine-setup dialog that offers to perform vacuum during upgrades. This can be automated by the answer file.
Clone Of:
: 1409766 (view as bug list)
Environment:
Last Closed: 2017-04-21 09:39:20 UTC
oVirt Team: Infra
Embargoed:
rule-engine: ovirt-4.1+
rule-engine: exception+
lsvaty: testing_plan_complete+
ylavi: planning_ack+
rule-engine: devel_ack+
pstehlik: testing_ack+


Attachments (Terms of Use)
Vacuum fails log (2.32 MB, text/plain)
2017-01-03 12:16 UTC, Shahar Havivi
no flags Details


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 67188 0 master ABANDONED db: Run full vacuum using cron 2020-12-09 08:24:31 UTC
oVirt gerrit 68318 0 master MERGED packaging: Add a tool to perform vacuum on engine db 2020-12-09 08:24:31 UTC
oVirt gerrit 69128 0 master ABANDONED answer-file: Run vacuum full on every upgrade 2020-12-09 08:24:33 UTC
oVirt gerrit 70634 0 master MERGED packaging: Don't run vacuum tool by default 2020-12-09 08:25:00 UTC
oVirt gerrit 71372 0 ovirt-engine-4.1 MERGED packaging: Add a tool to perform vacuum on engine db 2020-12-09 08:24:32 UTC
oVirt gerrit 71373 0 ovirt-engine-4.1 MERGED packaging: Don't run vacuum tool by default 2020-12-09 08:25:00 UTC

Description Eli Mesika 2016-10-25 10:39:01 UTC
Description of problem:

Provide cron script to execute vacuum full on engine and dwh databases, by default execute each night 


Actual results:
VACUUM FULL is needed to be run manually 


Expected results:
Automate VACUUM FULL execution 


Additional info:

Comment 1 Martin Perina 2016-10-25 12:30:03 UTC
I'd solve this within two parts:

1. Create script executing VACUUM FULL on db specified by command line parameters (so we can distinguish between engine and DWH db if needed)

2. Add module to engine-setup, which will configure automatic execution of this script for each db separately each day for example around 2am and show info about that to users, so they can adjust the time as needed

Comment 2 Roy Golan 2016-11-03 12:03:24 UTC
I had this running on a live system:

- The full vacuum performed on rhev.tlv reclaimed 1G of space
- It elapsed around 150-200 seconds for the whole db
- It elapsed 40s for the audit_log table only (the biggest table)
- The cron job created at /etc/cron.d/pg_vacuum

```
# Run FULL vacuum on the engine db at 02:00
0 2 * * * postgres psql engine -c 'VACUUM (VERBOSE, FULL, ANALYZE)'
```

note: before running full vacuum we should validate we have enough available space which is bigger than the biggest table.

Get biggest tables:
```
bash-4.2$ psql engine -c '\dt+ public.*' | awk -F "|" '{print $5 _ $2}' | egrep "MB|GB"
```

Comment 3 Yaniv Kaul 2016-11-24 08:34:20 UTC
(In reply to Roy Golan from comment #2)
> I had this running on a live system:
> 
> - The full vacuum performed on rhev.tlv reclaimed 1G of space
> - It elapsed around 150-200 seconds for the whole db
> - It elapsed 40s for the audit_log table only (the biggest table)
> - The cron job created at /etc/cron.d/pg_vacuum
> 
> ```
> # Run FULL vacuum on the engine db at 02:00
> 0 2 * * * postgres psql engine -c 'VACUUM (VERBOSE, FULL, ANALYZE)'
> ```
> 
> note: before running full vacuum we should validate we have enough available
> space which is bigger than the biggest table.
> 
> Get biggest tables:
> ```
> bash-4.2$ psql engine -c '\dt+ public.*' | awk -F "|" '{print $5 _ $2}' |
> egrep "MB|GB"
> ```

Any news on how long it takes now that it runs regularly on RHEV.TLV setup?
Any stats for DWH?

Comment 4 Roy Golan 2016-11-24 10:40:56 UTC
This is running nightly over 2 weeks, for some reason I have only this night durations in the logs. The initial run was 5-8 times longer.

# Per table
 - duration: 5640.865 ms   vm_statistics
 - duration: 14506.883 ms  vds_interface_statistics
 - duration: 43135.406 ms  audit_log
 - duration: 2593.314 ms   vm_dynamic

# Overall ~2 minutes
2016-11-24 02:01:54 IST [326]: [7-1] db=engine,user=postgres,app=psql,client=[local] LOG:  duration: 110753.654 ms  statement: VACUUM (VERBOSE, FULL, ANALYZE)

# slowest is audit_log took 50s

Comment 5 Yaniv Kaul 2016-11-24 10:58:34 UTC
(In reply to Roy Golan from comment #4)
> This is running nightly over 2 weeks, for some reason I have only this night
> durations in the logs. The initial run was 5-8 times longer.
> 
> # Per table
>  - duration: 5640.865 ms   vm_statistics
>  - duration: 14506.883 ms  vds_interface_statistics
>  - duration: 43135.406 ms  audit_log
>  - duration: 2593.314 ms   vm_dynamic
> 
> # Overall ~2 minutes
> 2016-11-24 02:01:54 IST [326]: [7-1]
> db=engine,user=postgres,app=psql,client=[local] LOG:  duration: 110753.654
> ms  statement: VACUUM (VERBOSE, FULL, ANALYZE)
> 
> # slowest is audit_log took 50s

This probably means it needs to run indeed daily. 
i'd look at what we can do for audit log (partitioning?), as it is taking ~half of the time of the vacuuming.

How much space have we saved (it'll be interesting to see what is our performance of vacuum). (note that in newer PG releases, vacuum performance has increased, AFAIK).
Y.

Comment 6 Roy Golan 2016-11-24 13:39:18 UTC
Having some seconds thoughts on it following re-reading [1] and talking with didi. We can't ignore that running this nightly also means downtime. Locking tables for minutes isn't not a good thing. Users who are batching jobs during the night may not find this acceptable. Also, if the autovacuum is tuned correctly, we may not need to reclaim the space, only if the disk is flooded.

So perhaps what we should do is to supply it as a tool, but prevent from using scheduling it.

[1] Postgres wiki suggest not to use FULL for routine maintenance https://wiki.postgresql.org/wiki/VACUUM_FULL

Comment 7 Yaniv Kaul 2016-11-30 09:10:29 UTC
(In reply to Roy Golan from comment #6)
> Having some seconds thoughts on it following re-reading [1] and talking with
> didi. We can't ignore that running this nightly also means
> downtime. Locking tables for minutes isn't not a good thing. Users who are
> batching jobs during the night may not find this acceptable. Also, if the
> autovacuum is tuned correctly, we may not need to reclaim the space, only if
> the disk is flooded.
> 
> So perhaps what we should do is to supply it as a tool, but prevent from
> using scheduling it.
> 
> [1] Postgres wiki suggest not to use FULL for routine maintenance
> https://wiki.postgresql.org/wiki/VACUUM_FULL

What about doing it during upgrade?

Comment 8 Yedidyah Bar David 2016-11-30 12:39:55 UTC
(In reply to Yaniv Kaul from comment #7)
> What about doing it during upgrade?

Makes sense, optionally. We should still provide a tool.

Some users prefer to collect several different time-consuming actions to a single downtime event, some prefer to spread them. IMO.

Comment 9 Yedidyah Bar David 2016-11-30 12:46:13 UTC
BTW, it will be very useful to provide some report as well, to help the user decide if/when this should be done. E.g. one or more of:

"Your database is XX% fragmented" (not sure what this means exactly, but that's what most defrag tools report)

"The tables with the highest number of fragments are:
table              number of fragments
audit_log          1000
vm_static          10
...
"

"You have XX GB of unused, but allocated, free space".

Of course, only if this requires no downtime/locking and does not take too long (even on a large fragmented db).

Comment 10 Yaniv Kaul 2016-11-30 13:40:13 UTC
Example script - http://postgresql.freeideas.cz/script-vacuum-full-tables/

(was hoping to find an Ansible that does that - did not find yet).

Comment 11 Roy Golan 2016-12-05 13:57:38 UTC
postgres ships a tools for that 'vacuumdb' so combined with the  /usr/share/ovirt-engine/dbscripts/engine-psql.sh we should have all we need

Comment 12 Yedidyah Bar David 2016-12-05 14:14:08 UTC
(In reply to Roy Golan from comment #11)
> postgres ships a tools for that 'vacuumdb' so combined with the 
> /usr/share/ovirt-engine/dbscripts/engine-psql.sh we should have all we need

This script is downstream-only. If we only do this from engine-setup, we already have there infrastructure for handling the db. We mainly need to decide what we want.

Comment 13 Shahar Havivi 2017-01-03 12:16:08 UTC
Created attachment 1236859 [details]
Vacuum fails log

When running engine-setup with full vacuum it may stuck in a deadlock,
Happened twice on a new and old database.
After force breaking the setup the database cannot be restored.

Comment 14 Roy Golan 2017-01-03 13:34:02 UTC
(In reply to Shahar Havivi from comment #13)
> Created attachment 1236859 [details]
> Vacuum fails log
> 
> When running engine-setup with full vacuum it may stuck in a deadlock,
> Happened twice on a new and old database.
> After force breaking the setup the database cannot be restored.


Thanks a lot for reporting that.
Does the restore from dump of the install fails?

Please add your install log and db version. Next time if you ran into it please add the following outputs if you can:
- select * from pg_stat_activity
- select * from pg_locks
- ps -fLU postgres


Might be that the setup is leaving something open (psycop?) needs further investigation.

CI tests with upgrade are WIP and should be active very soon so hopefully we would be able to catch more of this, if it happens.

Comment 15 Shahar Havivi 2017-01-03 13:41:01 UTC
(In reply to Roy Golan from comment #14)

(PostgreSQL) 9.4.4
the log is attachment 1236859 [details]
the restore from dump failed as well, I installed a new database so I can't post the queries.

Comment 16 Lucie Leistnerova 2017-02-27 14:04:13 UTC
verified in ovirt-engine-setup-4.1.1.2-0.1.el7.noarch


Note You need to log in before you can comment on or make changes to this bug.