1388433 – [RFE] Change auto-vacuum configuration defaults to match oVirt db usage

Bug 1388433 - [RFE] Change auto-vacuum configuration defaults to match oVirt db usage

Summary: [RFE] Change auto-vacuum configuration defaults to match oVirt db usage

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	ovirt-engine
Classification:	oVirt
Component:	Database.Core
Sub Component:
Version:	4.0.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	ovirt-4.1.2
Target Release:	4.1.2
Assignee:	Roy Golan
QA Contact:	Lucie Leistnerova
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1393270 1411756 1447624
TreeView+	depends on / blocked

Reported:	2016-10-25 10:44 UTC by Eli Mesika
Modified:	2017-06-26 09:01 UTC (History)
CC List:	11 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2017-05-23 08:19:50 UTC
oVirt Team:	Infra
Embargoed:
Dependent Products:
Flags:	mgoldboi: ovirt-4.1+ mgoldboi: exception+ lsvaty: testing_plan_complete- mgoldboi: planning_ack+ mperina: devel_ack+ pstehlik: testing_ack+

Attachments	(Terms of Use)

Links
System	ID	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1393270	high	CLOSED	Database table, vm_dynamic, grows without limit until the disk fills up.	2021-08-30 13:34:21 UTC
oVirt gerrit	68676	'None'	MERGED	packaging: setup: More aggressive autovacuum configuration	2020-12-03 04:29:56 UTC
oVirt gerrit	70309	'None'	MERGED	packaging: setup: More aggressive autovacuum configuration	2020-12-03 04:29:56 UTC
oVirt gerrit	70422	'None'	MERGED	packaging: setup: DB configuration plugin	2020-12-03 04:29:56 UTC
oVirt gerrit	70454	'None'	ABANDONED	packaging: setup: report all misconfigured items at once	2020-12-03 04:29:56 UTC
oVirt gerrit	71889	'None'	ABANDONED	packaging: setup: Add INVALID_CONFIG_ITEMS	2020-12-03 04:29:56 UTC
oVirt gerrit	72111	'None'	MERGED	packaging: setup: DB configuration plugin	2020-12-03 04:29:56 UTC
oVirt gerrit	73818	'None'	MERGED	packaging: Update only invalid values in postgres.conf	2020-12-03 04:29:57 UTC
oVirt gerrit	73942	'None'	MERGED	packaging: Extract the value correctly from postgresql.conf	2020-12-03 04:29:57 UTC
oVirt gerrit	74316	'None'	MERGED	packaging: Introduce unit test	2020-12-03 04:29:58 UTC
oVirt gerrit	74572	'None'	MERGED	packaging: Update only invalid values in postgres.conf	2020-12-03 04:29:57 UTC
oVirt gerrit	74573	'None'	MERGED	packaging: Extract the value correctly from postgresql.conf	2020-12-03 04:29:58 UTC
oVirt gerrit	74574	'None'	MERGED	packaging: Introduce unit test	2020-12-03 04:29:57 UTC

Internal Links: 1393270

Description Eli Mesika 2016-10-25 10:44:18 UTC

Description of problem:
 Change default autovacuuming settings, by default it's execute too often (on 50 tuples update), which makes postgresql process very busy.
We should change the default for all newly created db in engine-setup and probably update doc how to do that for existing ones


Version-Release number of selected component (if applicable):


Actual results:

currently in postgresql.conf the defaults are:

autovacuum_vacuum_threshold = 50 
autovacuum_analyze_threshold = 50


Expected results:

should be changed to :
autovacuum_vacuum_threshold = 5000 
autovacuum_analyze_threshold = 10000



Additional info:

Comment 1 Martin Perina 2016-10-25 12:25:51 UTC

We have to change this for all new automatically provisioned databases, but I'd also add this to "db customization" documentation (if we have any) and also show warning about that during each engine-setup execution.

Comment 2 Michal Skrivanek 2016-11-03 10:06:04 UTC

Any issue doing that for existing dbs? I would expect large existing deployments actually suffer the most right now.

Comment 3 Yedidyah Bar David 2016-11-03 12:51:52 UTC

(In reply to Michal Skrivanek from comment #2)
> Any issue doing that for existing dbs? I would expect large existing
> deployments actually suffer the most right now.

Generally speaking, we do not change postgresql.conf on upgrade, only on provisioning. We try to behave as if postgresql (and its conf) is not owned by us, as if the "normal" behavior is using a remote db. So we confine changes to configuration to be only during provisioning. Also, we do not have currently code for "warning", although that's easier to add.

If we only do the minimum, we'll only change this during provisioning and error during upgrade. See as an example the change for bug 1331168. Of course, everything is possible...

Comment 4 Martin Perina 2016-11-04 10:24:43 UTC

I'd say that changing this pro newly provisioned db and display  a warning (or error?) during engine-setup execution, if not set properly, is enough (I'd assume that large customers may have already changed postgresql.conf to fit their loads by tuning some other parameters).

Comment 7 Roy Golan 2016-11-15 13:32:20 UTC

autovacuum is a very important process for the database and raising the global threshold is a no-go:
1. we have variance in our table usage - some update heavy, rest isn't. 
large global value will starve those tables and they won't be cleaned.
2. there is no single value which is good. The system can grow. We should be more dynamic
3. co-hosted DWH is contending on vacuum workers

What I suggest is a per-table tuning for the update-heavy tables:
- vds_cpu_statistics  (should be removed?)
- vds_interface_statistics
- vds_statistics
- vm_dynamic
- vm_statistics
- vm_interface_statistics

This will set the scale factor to 1:
```
ALTER table NAME SET (autovacuum_vacuum_scale_factor=1);
```

The formula [1] to determine if to run vacuum for a table is:
```
threshold = vacuum base threshold + vacuum scale factor * number of tuples
```

So using defaults threshold and scale factor (50 and 0.2 respectively) and vm table have 1000 records then the effective threshold is:
 50 + (1000 * 0.2) = 250

I took load factor =1 because most of the vms and host should be running then most of the records would be updated regularly.

[1] https://www.postgresql.org/docs/9.1/static/routine-vacuuming.html#AUTOVACUUM

Comment 8 Martin Perina 2016-11-16 10:07:39 UTC

Moving back to infra, as discussed with Roy we can create an upgrade script for relevant tables and alter autovacuum_vacuum_scale_factor for them. Customer can adapt this setting later if needed

Comment 9 Roy Golan 2016-12-15 10:34:44 UTC

A recent change on a production system we made plus more research and tests I made  shows that the factor should be actually smaller. The smaller amount of work the vacuum has to do the better it behaves. The vacuum algorithm also has the cost limit factor, when it reached the worker stops working and waits for the next time with tuples waiting to be collected. 

The vacuum factor is going to set to 0.01 to be fairly aggressive and still support table growth. threshold will stay default.

Comment 12 Martin Perina 2017-01-14 20:36:43 UTC

Moving to 4.1.1 as we most probably won't be able to finish that before RC build

Comment 13 Yedidyah Bar David 2017-02-13 14:17:52 UTC

Removing 71889 because it's not really required for this bug. It (or a variation) will only be needed if/when dwh wants to add its own extra config items.

Comment 14 Lucie Leistnerova 2017-03-02 15:39:07 UTC

engine-setup checks by upgrade the postgres configuration and correctly writes changes that should be made

          Found the following problems in PostgreSQL configuration for the Engine database:
           autovacuum_vacuum_scale_factor required to be at most 0.01
           autovacuum_max_workers required to be at least 6
           maintenance_work_mem required to be at least 65536
          Please set:
           autovacuum_vacuum_scale_factor = 0.01
           autovacuum_max_workers = 6
           maintenance_work_mem = 65536

but when I chose to Fix automatically it also changed the setting that was already correct

#autovacuum_analyze_scale_factor = 0.1
...
autovacuum_analyze_scale_factor = 0.05

-->

#autovacuum_analyze_scale_factor = 0.1
...
# autovacuum_analyze_scale_factor = 0.05
autovacuum_analyze_scale_factor = 0.075


tested in ovirt-engine-setup-4.1.1.2-0.1.el7.noarch

Comment 15 Red Hat Bugzilla Rules Engine 2017-03-02 15:39:13 UTC

Target release should be placed once a package build is known to fix a issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for a oVirt release.

Comment 16 Martin Perina 2017-03-06 13:47:59 UTC

Roy, could you please take a look?

Comment 17 Yaniv Kaul 2017-03-08 19:33:07 UTC

(In reply to Martin Perina from comment #16)
> Roy, could you please take a look?

ping?

Comment 18 Oved Ourfali 2017-03-15 09:34:17 UTC

Roy - what's the status here?

Comment 19 Lucie Leistnerova 2017-05-02 14:34:08 UTC

Now it doesn't rewrite already good values, but there is a problem with maintenance_work_mem when value is set with units.

maintenance_work_mem = 16MB 

engine-setup ends with error
[ ERROR ] Failed to execute stage 'Misc configuration': invalid literal for int() with base 10: '16MB'

2017-05-02 16:22:49 DEBUG otopi.context context._executeMethod:142 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-engine/setup/bin/../plugins/ovirt-engine-setup/ovirt-engine/db/configuration.py", line 222, in _updatePGConf
    transaction=localtransaction,
  File "/usr/share/ovirt-engine/setup/ovirt_engine_setup/engine_common/postgres.py", line 284, in _updatePostgresConf
    needUpdate, content = dbovirtutils.getUpdatedPGConf(content)
  File "/usr/share/ovirt-engine/setup/ovirt_engine_setup/engine_common/database.py", line 1004, in getUpdatedPGConf
    expected=item['expected']
  File "/usr/share/ovirt-engine/setup/bin/../plugins/ovirt-engine-setup/ovirt-engine/db/configuration.py", line 128, in <lambda>
    int(current) >= int(expected)
ValueError: invalid literal for int() with base 10: '16MB'


tested in ovirt-engine-setup-4.1.2-0.1.el7.noarch

Comment 20 Oved Ourfali 2017-05-02 16:34:49 UTC

Roy, can you please take a look?

Comment 21 Roy Golan 2017-05-03 09:12:43 UTC

(In reply to Lucie Leistnerova from comment #19)
> Now it doesn't rewrite already good values, but there is a problem with
> maintenance_work_mem when value is set with units.
> 
> maintenance_work_mem = 16MB 

We don't support units and our scripts never did - i.e you are exposing a limitation that was already there from the beginning. 

I suggest we should VERIFY this RFE and open a new bug for this specific issue since this RFE didn't introduce that.

Comment 22 Lucie Leistnerova 2017-05-03 11:03:35 UTC

Alright. I created Bug 1447624
verified in ovirt-engine-setup-4.1.2-0.1.el7.noarch

Note You need to log in before you can comment on or make changes to this bug.