Bug 1535590
| Summary: | [RFE] Recommendations and validations for pre-upgrade maintenance |
|---|---|
| Product: | Red Hat OpenStack |
| Component: | RFEs |
| Status: | NEW |
| Severity: | medium |
| Priority: | medium |
| Version: | 13.0 (Queens) |
| Reporter: | Chris Fields <cfields> |
| Assignee: | OSP Team <rhos-maint> |
| CC: | alifshit, bdobreli, ccamacho, cfields, jpretori, mandreou, markmc, mburns, morazi, mrussell, sclewis, smooney, spower, srevivo |
| Target Milestone: | --- |
| Target Release: | --- |
| Keywords: | FutureFeature |
| Flags: | bdobreli: needinfo? (smooney) |
| Hardware: | All |
| OS: | All |
| Doc Type: | Known Issue |
| Type: | Bug |
| Clones: | 1542077 (view as bug list) |
| Bug Blocks: | 1542077 |
I will raise specific bugs on all other DFGs to gather this info:

[Nova]: https://bugzilla.redhat.com/show_bug.cgi?id=1540637

Hello,
So, from Rocky the following command is available:

    nova-manage db purge

This command deletes data from the shadow tables after 'nova-manage db archive_deleted_rows' has been run. It was added upstream in Rocky [1]. We also have support in upstream TripleO to configure this purge in cron jobs [2].

For earlier versions we received a NACK from Nova on backporting this feature, so we will need to document how to truncate the Nova shadow tables manually (the operation that the 'nova-manage db purge' command automates).
This needs to be done by truncating the Nova tables whose names start with _SHADOW_TABLE_PREFIX (i.e. 'shadow_'). This is the python code that selects the tables for purging:

    def _purgeable_tables(metadata):
        return [t for t in metadata.sorted_tables
                if (t.name.startswith(_SHADOW_TABLE_PREFIX) and not
                    t.name.endswith('migrate_version'))]
[1]: https://review.openstack.org/#/q/topic:bp/purge-db+(status:open+OR+status:merged)
[2]: https://review.openstack.org/#/q/topic:bp-db-cleanup
Thanks for the update, Carlos. If I am understanding correctly, to clean up the nova shadow tables in any release prior to Rocky we will need to document what the purge_shadow_tables function is doing [1]. Is that right?

[1]: https://review.openstack.org/#/c/550171/8/nova/db/sqlalchemy/api.py

Hello Chris,

Indeed, that will be the solution prior to Rocky. From Rocky we can execute the CLI commands in two steps:

1. Archive the deleted rows: nova-manage db archive_deleted_rows
2. Purge the shadow tables: nova-manage db purge
3. (optional) Both steps in a single command: nova-manage db archive_deleted_rows --purge

Moving back to 'NEW' to be scoped and reviewed for priority.

*** Bug 1539064 has been marked as a duplicate of this bug. ***

This RFE is not marked as an MVP for 17.0, so it is being moved for consideration to OSP 17.1. As stated in the OSP Program Call, QE and Docs only have the capacity to verify and document MVP features for OSP 17.0.
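The archive-then-purge flow that nova-manage performs can be illustrated with a toy model: "archive" moves soft-deleted rows from a live table into its shadow_ counterpart, and "purge" empties the shadow table. This is a hedged sketch using sqlite3 as a stand-in; the real nova-manage operates per table with batching (--max_rows) and a different soft-delete convention.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instances (id INTEGER PRIMARY KEY, deleted INTEGER DEFAULT 0);
    CREATE TABLE shadow_instances (id INTEGER PRIMARY KEY, deleted INTEGER);
    INSERT INTO instances (id, deleted) VALUES (1, 0), (2, 1), (3, 1);
""")

def archive_deleted_rows(conn):
    # Step 1: move soft-deleted rows to the shadow table
    # (simplified model of 'nova-manage db archive_deleted_rows').
    conn.execute("INSERT INTO shadow_instances "
                 "SELECT id, deleted FROM instances WHERE deleted != 0")
    conn.execute("DELETE FROM instances WHERE deleted != 0")
    conn.commit()

def purge(conn):
    # Step 2: empty the shadow table
    # (simplified model of 'nova-manage db purge', Rocky onward).
    conn.execute("DELETE FROM shadow_instances")
    conn.commit()

archive_deleted_rows(conn)  # live table keeps only the non-deleted row
purge(conn)                 # shadow table is emptied
```

Running both functions back to back corresponds to the optional single command, nova-manage db archive_deleted_rows --purge.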
Description of problem:

Database pruning guidance is needed for customers performing major version and fast forward OSP upgrades. The following OSP databases hold on to deleted artifacts until they are purged: keystone, heat, cinder, glance, nova.

Specific concerns are:
- Are the currently scheduled automated cleanups (see below) keeping too many deleted artifacts, such that they will unreasonably impact the database portion of the upgrade? For example, if the heat db has thousands of deleted stacks, the database upgrade will take much longer than necessary.
- Are there timeout params that need to be tweaked for the upgrade based on the size of these dbs?
- The *-manage commands take a parameter to limit either the number of rows that are purged or the number of days to keep. What is our recommendation for pruning these dbs before an upgrade?

Default db cleanups:

keystone: by default (OSP 10) this is run each hour (crontab -u keystone -l); scheduled in /usr/share/openstack-puppet/modules/keystone/manifests/cron/token_flush.pp

    1 * * * * keystone-manage token_flush >>/var/log/keystone/keystone-tokenflush.log 2>&1

heat: by default (OSP 10) this runs once per day and keeps 30 days (crontab -u heat -l); scheduled in /usr/share/openstack-puppet/modules/heat/manifests/cron/purge_deleted.pp

    1 0 * * * sleep `expr ${RANDOM} \% 3600`; heat-manage purge_deleted -g days 30 >>/dev/null 2>&1

cinder: by default (OSP 10) this runs once per day and keeps 30 days (crontab -u cinder -l); scheduled in /usr/share/openstack-puppet/modules/cinder/manifests/cron/db_purge.pp

    1 0 * * * cinder-manage db purge 30 >>/dev/null 2>&1

glance: not run by default (OSP 10)

    glance-manage db purge --age_in_days <days?> --max-rows <?>

nova: instances, when deleted, remain in the active db tables until archived to the shadow_* tables, which happens automatically every 12 hours (crontab -u nova -l); scheduled in /usr/share/openstack-puppet/modules/nova/manifests/cron/archive_deleted_rows.pp

    1 */12 * * * nova-manage db archive_deleted_rows --max_rows 100 >>/dev/null 2>&1

Is there any process that cleans up the shadow_* tables?
- Is a schema change in the shadow_* tables a concern for db upgrades with a lot of deleted artifacts? Should these tables be trimmed when upgrading? Is there a community standard for doing so?
- Also, do we want to recommend running this before the upgrade?

    nova-manage db null_instance_uuid_scan [--delete]

ceilometer: ceilometer does not keep deleted artifacts until purged like the other databases mentioned. However, because of OSP Director defaults the ceilometer db size may grow unreasonably large. If the upgrade is from OSP 10 or later, the customer may be using gnocchi, which is typically configured for meters and is cleaned up by its own archive policies. If ceilometer is used then, depending on OSP version, these /etc/ceilometer/ceilometer.conf params should be checked:

    time_to_live            # pre OSP 10
    metering_time_to_live   # OSP 10
    event_time_to_live      # OSP 10

If any of the above values is '-1', ceilometer is configured to keep samples/events forever. Before upgrading, customers may want to adjust those values and run ceilometer-expirer; see solution https://access.redhat.com/solutions/2219091. Even if gnocchi is being used, it is likely that events are still stored in a database (grep ^event_dispatch /etc/ceilometer/ceilometer.conf); if so, the event_time_to_live should be noted and the events potentially cleaned up (see the solution above).

Version-Release number of selected component (if applicable): OSP 10 -> 13
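The ceilometer TTL check described above can be scripted. This is a hedged sketch using Python's configparser against a synthetic config fragment; the option names are the ones listed above, but the section they live in on a real deployment may differ by OSP version, so SAMPLE_CONF and its [database] section are assumptions for illustration.

```python
import configparser

# Synthetic stand-in for /etc/ceilometer/ceilometer.conf.
SAMPLE_CONF = """
[database]
metering_time_to_live = -1
event_time_to_live = 259200
"""

TTL_OPTIONS = ("time_to_live",           # pre OSP 10
               "metering_time_to_live",  # OSP 10
               "event_time_to_live")     # OSP 10

def ttls_keeping_data_forever(conf_text):
    # Return the TTL options set to -1, i.e. keep samples/events forever.
    cfg = configparser.ConfigParser()
    cfg.read_string(conf_text)
    forever = []
    for section in cfg.sections():
        for opt in TTL_OPTIONS:
            if cfg.has_option(section, opt) and cfg.getint(section, opt) == -1:
                forever.append(opt)
    return forever

print(ttls_keeping_data_forever(SAMPLE_CONF))  # ['metering_time_to_live']
```

Any option reported by such a check is a candidate for adjustment followed by a ceilometer-expirer run before the upgrade, per the solution article linked above.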