Bug 1388433
| Summary: | [RFE] Change auto-vacuum configuration defaults to match oVirt db usage | | |
|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | Eli Mesika <emesika> |
| Component: | Database.Core | Assignee: | Roy Golan <rgolan> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Lucie Leistnerova <lleistne> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.0.0 | CC: | bugs, didi, eheftman, jentrena, lleistne, lsvaty, mgoldboi, michal.skrivanek, mperina, oourfali, rgolan |
| Target Milestone: | ovirt-4.1.2 | Keywords: | FutureFeature |
| Target Release: | 4.1.2 | Flags: | mgoldboi: ovirt-4.1+, mgoldboi: exception+, lsvaty: testing_plan_complete-, mgoldboi: planning_ack+, mperina: devel_ack+, pstehlik: testing_ack+ |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Enhancement |
| Doc Text: | Previously, the Manager's frequently updated PostgreSQL tables would fill up with obsolete data, creating the risk of disk flooding and transaction ID wraparound issues. This release introduces a more aggressive vacuum daemon configuration for collecting obsolete rows, which keeps the Manager's tables healthy and better correlates disk space usage with the amount of data actually retained. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-05-23 08:19:50 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1393270, 1411756, 1447624 | | |
Description (Eli Mesika, 2016-10-25 10:44:18 UTC)
We have to change this for all new automatically provisioned databases, but I'd also add this to the "db customization" documentation (if we have any) and show a warning about it during each engine-setup execution.

---

(Michal Skrivanek, comment #2)

Any issue doing that for existing dbs? I would expect large existing deployments actually suffer the most right now.

---

(In reply to Michal Skrivanek from comment #2)
> Any issue doing that for existing dbs? I would expect large existing
> deployments actually suffer the most right now.

Generally speaking, we do not change postgresql.conf on upgrade, only on provisioning. We try to behave as if postgresql (and its conf) is not owned by us, as if the "normal" behavior is using a remote db, so we confine configuration changes to provisioning only. Also, we currently have no code for a "warning", although that's easy to add. If we only do the minimum, we'll only change this during provisioning and error during upgrade. See the change for bug 1331168 as an example. Of course, everything is possible...

---

I'd say that changing this for newly provisioned dbs and displaying a warning (or error?) during engine-setup execution, if it is not set properly, is enough. (I'd assume that large customers may have already changed postgresql.conf to fit their loads by tuning some other parameters.)

---

autovacuum is a very important process for the database, and raising the global threshold is a no-go:

1. We have variance in our table usage - some tables are update-heavy, the rest are not. A large global value will starve those tables and they won't be cleaned.
2. There is no single value which is good; the system can grow. We should be more dynamic.
3. A co-hosted DWH contends for the vacuum workers.

What I suggest is per-table tuning for the update-heavy tables:

- vds_cpu_statistics (should be removed?)
- vds_interface_statistics
- vds_statistics
- vm_dynamic
- vm_statistics
- vm_interface_statistics

This will set the scale factor to 1:

```
ALTER TABLE name SET (autovacuum_vacuum_scale_factor = 1);
```

The formula [1] that determines whether to vacuum a table is:

```
threshold = vacuum base threshold + vacuum scale factor * number of tuples
```

So using the default threshold and scale factor (50 and 0.2 respectively), a vm table with 1000 records has an effective threshold of 50 + (1000 * 0.2) = 250. I chose a scale factor of 1 because most of the vms and hosts should be running, so most of the records would be updated regularly.

[1] https://www.postgresql.org/docs/9.1/static/routine-vacuuming.html#AUTOVACUUM

---

Moving back to infra. As discussed with Roy, we can create an upgrade script for the relevant tables and alter autovacuum_vacuum_scale_factor for them. Customers can adapt this setting later if needed.

---

A recent change we made on a production system, plus more research and testing, shows that the factor should actually be smaller: the less work the vacuum has to do, the better it behaves. The vacuum algorithm also has a cost limit; when it is reached, the worker stops and waits for the next run while tuples are still waiting to be collected. The vacuum scale factor is going to be set to 0.01, which is fairly aggressive while still supporting table growth. The threshold will stay at its default.

---

Moving to 4.1.1 as we most probably won't be able to finish this before the RC build.

---

Removing 71889 because it's not really required for this bug. It (or a variation) will only be needed if/when dwh wants to add its own extra config items.
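---

For illustration, here is a minimal sketch of the per-table tuning described above, in the spirit of the upgrade script mentioned in the comments. It assumes psycopg2 and a locally reachable engine database (the connection parameters are placeholders), applies the 0.01 factor that was ultimately chosen, and is not the actual oVirt upgrade script.

```
# Illustrative sketch only, not the actual oVirt upgrade script.
# Assumes psycopg2 and a locally reachable "engine" database.
import psycopg2

# Update-heavy tables listed in the comments above; vds_cpu_statistics
# is omitted here because its removal was being discussed.
UPDATE_HEAVY_TABLES = [
    "vds_interface_statistics",
    "vds_statistics",
    "vm_dynamic",
    "vm_statistics",
    "vm_interface_statistics",
]

# The value ultimately chosen in this bug; the base threshold (50)
# stays at its default.
SCALE_FACTOR = 0.01

conn = psycopg2.connect(dbname="engine", user="engine", host="localhost")
conn.autocommit = True  # let each ALTER TABLE commit on its own
with conn.cursor() as cur:
    for table in UPDATE_HEAVY_TABLES:
        # Identifiers cannot be bound as query parameters, so the names
        # are interpolated from the fixed whitelist above.
        cur.execute(
            "ALTER TABLE {0} SET (autovacuum_vacuum_scale_factor = {1})"
            .format(table, SCALE_FACTOR)
        )
conn.close()
```

ALTER TABLE ... SET only updates the table's storage parameters, so rerunning the script is harmless; autovacuum picks up the new factor on its next cycle.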
---

During upgrade, engine-setup checks the PostgreSQL configuration and correctly reports the changes that should be made:

```
Found the following problems in PostgreSQL configuration for the Engine database:
 autovacuum_vacuum_scale_factor required to be at most 0.01
 autovacuum_max_workers required to be at least 6
 maintenance_work_mem required to be at least 65536
Please set:
 autovacuum_vacuum_scale_factor = 0.01
 autovacuum_max_workers = 6
 maintenance_work_mem = 65536
```

But when I chose "Fix automatically", it also changed a setting that was already correct:

```
#autovacuum_analyze_scale_factor = 0.1
...
autovacuum_analyze_scale_factor = 0.05
```

became:

```
#autovacuum_analyze_scale_factor = 0.1
...
# autovacuum_analyze_scale_factor = 0.05
autovacuum_analyze_scale_factor = 0.075
```

Tested in ovirt-engine-setup-4.1.1.2-0.1.el7.noarch.

---

Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

---

(Martin Perina, comment #16)

Roy, could you please take a look?

---

(In reply to Martin Perina from comment #16)
> Roy, could you please take a look?

ping?

---

Roy - what's the status here?

---

(Lucie Leistnerova, comment #19)

Now it doesn't rewrite already-good values, but there is a problem with maintenance_work_mem when the value is set with units:

```
maintenance_work_mem = 16MB
```

engine-setup ends with an error:

```
[ ERROR ] Failed to execute stage 'Misc configuration': invalid literal for int() with base 10: '16MB'
2017-05-02 16:22:49 DEBUG otopi.context context._executeMethod:142 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-engine/setup/bin/../plugins/ovirt-engine-setup/ovirt-engine/db/configuration.py", line 222, in _updatePGConf
    transaction=localtransaction,
  File "/usr/share/ovirt-engine/setup/ovirt_engine_setup/engine_common/postgres.py", line 284, in _updatePostgresConf
    needUpdate, content = dbovirtutils.getUpdatedPGConf(content)
  File "/usr/share/ovirt-engine/setup/ovirt_engine_setup/engine_common/database.py", line 1004, in getUpdatedPGConf
    expected=item['expected']
  File "/usr/share/ovirt-engine/setup/bin/../plugins/ovirt-engine-setup/ovirt-engine/db/configuration.py", line 128, in <lambda>
    int(current) >= int(expected)
ValueError: invalid literal for int() with base 10: '16MB'
```

Tested in ovirt-engine-setup-4.1.2-0.1.el7.noarch.

---

Roy, can you please take a look?

---

(In reply to Lucie Leistnerova from comment #19)
> Now it doesn't rewrite already-good values, but there is a problem with
> maintenance_work_mem when the value is set with units.
>
> maintenance_work_mem = 16MB

We don't support units and our scripts never did, i.e. you are exposing a limitation that was there from the beginning. I suggest we VERIFY this RFE and open a new bug for this specific issue, since this RFE didn't introduce it.

---

Alright. I created Bug 1447624.

---

Verified in ovirt-engine-setup-4.1.2-0.1.el7.noarch.
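---

For context on the ValueError above: int() cannot parse values that carry postgresql.conf memory units such as '16MB'. The following is a hedged sketch of unit-aware parsing, assuming maintenance_work_mem is measured in kB when no suffix is given; the helper name is hypothetical and this is not the actual fix tracked in Bug 1447624.

```
# Hypothetical helper, not the actual fix for bug 1447624: convert a
# postgresql.conf memory setting into kB so numeric comparisons work.
# Only the kB/MB/GB suffixes are handled here, for brevity.
_UNITS_KB = {"kB": 1, "MB": 1024, "GB": 1024 ** 2}


def parse_pg_size_kb(value):
    """Parse a value such as '16MB' or '65536' into an int count of kB."""
    value = value.strip()
    for suffix, factor in _UNITS_KB.items():
        if value.endswith(suffix):
            return int(value[:-len(suffix)].strip()) * factor
    return int(value)  # bare number: maintenance_work_mem is in kB


# int('16MB') raises ValueError, as in the traceback above; a
# unit-aware comparison evaluates cleanly instead:
assert parse_pg_size_kb("16MB") == 16384
assert parse_pg_size_kb("16MB") < 65536  # 16MB fails the >= 65536 kB check
```

A check such as parse_pg_size_kb(current) >= parse_pg_size_kb(expected) would then treat '64MB' and '65536' interchangeably.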