Bug 1980932
Summary: | [RFE][RH OSP18] Add nova-manage cleanup command for 'task_log' database records | | |
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | melanie witt <mwitt> |
Component: | openstack-tripleo-heat-templates | Assignee: | Martin Schuppert <mschuppe> |
Status: | CLOSED DEFERRED | QA Contact: | Joe H. Rahme <jhakimra> |
Severity: | medium | Docs Contact: | |
Priority: | medium | | |
Version: | unspecified | CC: | alifshit, bdobreli, dasmith, efoley, egallen, eglynn, jhakimra, kchamart, mburns, molasaga, mschuppe, nova-maint, sbauza, sgordon, stephenfin, vromanso |
Target Milestone: | --- | Keywords: | FutureFeature, Patch, Triaged |
Target Release: | --- | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | Enhancement |
Doc Text: | | Story Points: | --- |
Clone Of: | 1780867 | Environment: | |
Last Closed: | 2023-04-13 19:03:15 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | Xena |
Embargoed: | | | |
Bug Depends On: | 1780867 | | |
Bug Blocks: | 1381612 | | |
Description
melanie witt
2021-07-09 23:04:21 UTC
(In reply to melanie witt from comment #0)
> We will need to know before proceeding:
>
> * Does Telemetry still use this API? If not, then we need not care about how
> long task_log records are retained.

Telemetry doesn't use this API. It consumes the notifications from the bus as they are emitted.

> * How long does Telemetry need the task_log records to stay available in
> nova after they are generated?

It does not.

As long as this cleanup only affects the logs and not the notifications that are being generated, there is no conflict.
If the cleanup/refactor affects the generation of the notification, then we might have some customer impact.
In short: Ceilometer will receive all the notifications; configuration dictates whether a particular notification will be stored or not.

(In reply to Emma Foley from comment #1)
> (In reply to melanie witt from comment #0)
> > We will need to know before proceeding:
> >
> > * Does Telemetry still use this API? If not, then we need not care about how
> > long task_log records are retained.
>
> Telemetry doesn't use this API. It consumes the notifications from the bus
> as they are emitted.

Thank you for clarifying this. I have proposed a correction to the upstream release note accordingly [1].

> > * How long does Telemetry need the task_log records to stay available in
> > nova after they are generated?
>
> It does not.
>
> As long as this cleanup only affects the logs and not the notifications that
> are being generated, there is no conflict.
>
> If the cleanup/refactor affects the generation of the notification, then we
> might have some customer impact.
> In short: Ceilometer will receive all the notifications; configuration
> dictates whether a particular notification will be stored or not.

OK, great.
Upon inspecting the code [2], we can see that the notification is emitted after the task_log record is created:

    task_log.begin_task()
    for instance in instances:
        try:
            compute_utils.notify_usage_exists(
                self.notifier, context, instance, self.host,
                ignore_missing_network_data=False)
            successes += 1
        except Exception:
            LOG.exception('Failed to generate usage '
                          'audit for instance '
                          'on host %s', self.host,
                          instance=instance)
            errors += 1
    task_log.errors = errors
    task_log.message = (
        'Instance usage audit ran for host %s, %s instances in %s seconds.'
        % (self.host, num_instances, time.time() - start_time))
    task_log.end_task()

The begin_task() call creates the task_log database record, then the notifications are emitted without any use of the task_log record. The end_task() call retrieves the task_log record and saves the 'errors' and 'message' data to it.

There is a chance of a race with a 'nova-manage db archive_deleted_rows --task-log' cron job run: the archival could sweep away the newly created task_log record sometime between begin_task() and end_task(). If this happens, end_task() will raise exception.TaskNotRunning [3]. Even then, the periodic task will continue to run without issue, because we run our oslo.service periodic tasks using the default raise_on_error=False [4][5].

Because a racing archival of task_log database records will neither impact notifications nor crash the periodic task, it is safe to add the --task-log option to the 'nova-manage db archive_deleted_rows' command in the nova_cron, and no additional logic is required.
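To make the race argument above concrete, here is a minimal, self-contained sketch. It does not use nova or oslo.service: the in-memory TASK_LOG_TABLE, the archive_task_log() helper, and run_periodic_task() are hypothetical stand-ins that only mimic the behavior described in this comment (begin_task() creates the record, end_task() fails if the record is gone, and the periodic runner swallows exceptions when raise_on_error is False).

```python
import logging

LOG = logging.getLogger(__name__)


class TaskNotRunning(Exception):
    """Stand-in for nova's exception.TaskNotRunning (assumption)."""


# Hypothetical in-memory replacement for the task_log database table.
TASK_LOG_TABLE = {}
TASK_NAME = "instance_usage_audit"


def begin_task(name):
    # Creates the task_log record, as begin_task() is described to do.
    TASK_LOG_TABLE[name] = {"state": "RUNNING", "errors": 0, "message": None}


def end_task(name, errors, message):
    # Looks the record up again; fails if it vanished in the meantime.
    record = TASK_LOG_TABLE.get(name)
    if record is None:
        raise TaskNotRunning(name)
    record.update(state="DONE", errors=errors, message=message)


def archive_task_log():
    # Simulates a racing archival run that removes task_log records.
    TASK_LOG_TABLE.clear()


def instance_usage_audit(instances, notifications, race=False):
    begin_task(TASK_NAME)
    for instance in instances:
        # Notifications are emitted without reading the task_log record.
        notifications.append(instance)
    if race:
        archive_task_log()  # archival lands between begin and end
    end_task(TASK_NAME, errors=0, message="audit ran")


def run_periodic_task(task, raise_on_error=False):
    # Mimics a periodic-task runner that logs and continues by default.
    try:
        task()
    except Exception:
        if raise_on_error:
            raise
        LOG.exception("Error during periodic task")
        return False
    return True


if __name__ == "__main__":
    emitted = []
    ok = run_periodic_task(
        lambda: instance_usage_audit(["vm-1", "vm-2"], emitted, race=True))
    print("task completed cleanly:", ok)
    print("notifications emitted:", emitted)
```

In the racing run, every notification has already been appended by the time end_task() raises, and the runner logs the error rather than propagating it, which is the same conclusion reached above.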
[1] https://review.opendev.org/c/openstack/nova/+/801144
[2] https://github.com/openstack/nova/blob/3545356ae3a719442833cb8c3c911408d4bd3c15/nova/compute/manager.py#L9566-L9583
[3] https://github.com/openstack/nova/blob/3545356ae3a719442833cb8c3c911408d4bd3c15/nova/db/sqlalchemy/api.py#L4330
[4] https://github.com/openstack/nova/blob/3545356ae3a719442833cb8c3c911408d4bd3c15/nova/service.py#L210
[5] https://github.com/openstack/oslo.service/blob/091fd6510a29891f74f1ab141ef158065f9a56fd/oslo_service/periodic_task.py#L218

It's in POST state, but the remaining work is not clear to me. Could you clarify it?

Moving back to NEW to ensure we explore how to do this in a NextGen world.

Now tracked in https://issues.redhat.com/browse/OSPRH-71