This rhbz is for the deployment part of the RFE. The --task-log flag should be added to the existing 'nova-manage db archive_deleted_rows' command that runs during the nova_cron cron job.

There are some concerns which require knowledge of Telemetry that we need to consider before adding the --task-log flag. The task_log table records are exposed via nova's /os-instance_usage_audit_log API [1], which is called by the Telemetry service to collect usage data.

We will need to know before proceeding:

* Does Telemetry still use this API? If not, then we need not care about how long task_log records are retained.

* How long does Telemetry need the task_log records to stay available in nova after they are generated?

By default, if the --before flag is not also used when calling 'nova-manage db archive_deleted_rows', *all* task_log records will be archived at the time of the call, leaving nothing behind. In order to preserve some task_log records for historical and audit purposes, --task-log *must* be used along with --before.

[1] https://docs.openstack.org/api-ref/compute/?expanded=list-server-usage-audits-detail#list-server-usage-audits

+++ This bug was initially created as a clone of Bug #1780867 +++

I'm going to use this rhbz for an RFE to add cleanup for nova.task_log database records as a new nova-manage command, as described here:

http://lists.openstack.org/pipermail/openstack-discuss/2019-September/009245.html

This will be done as a Wishlist bug upstream.
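To illustrate the --before requirement described above, a cron entry would need to pair --task-log with --before so that recent records survive archival. The 30-day retention window below is an illustrative value only, not a recommendation, and the exact invocation in the deployed nova_cron job may differ:

```
# Illustrative crontab entry: archive deleted rows and task_log records
# older than 30 days (example retention value), looping until done.
0 0 * * * nova-manage db archive_deleted_rows --task-log --before "$(date -d '-30 days' +%F)" --until-complete
```

Without --before, the same command would archive *all* task_log records, including the ones generated since the last audit period.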
(In reply to melanie witt from comment #0) > We will need to know before proceeding: > > * Does Telemetry still use this API? If not, then we need not care about how > long task_log records are retained. Telemetry doesn't use this API. It consumes the notifications from the bus as they are emitted. > > * How long does Telemetry need the task_log records to stay available in > nova after they are generated? > It does not. As long as this cleanup only effects the logs and not the notifications that are being generated, there is no conflict. If cleanup/refactor effects the generation of the notification, then we might have some customer impact. In short: Ceilometer will receive all the notifications, configuration dictates whether a particular notification will be stored or not.
(In reply to Emma Foley from comment #1)
> (In reply to melanie witt from comment #0)
> > * Does Telemetry still use this API? If not, then we need not care about
> > how long task_log records are retained.
>
> Telemetry doesn't use this API. It consumes the notifications from the bus
> as they are emitted.

Thank you for clarifying this. I have proposed a correction to the upstream release note accordingly [1].

> > * How long does Telemetry need the task_log records to stay available in
> > nova after they are generated?
>
> It does not.
>
> As long as this cleanup only affects the logs and not the notifications
> that are being generated, there is no conflict.
>
> If the cleanup/refactor affects the generation of the notifications, then
> we might have some customer impact.
> In short: Ceilometer will receive all the notifications; configuration
> dictates whether a particular notification will be stored or not.

OK, great. Upon inspecting the code [2], we can see that the notification is emitted after the task_log record is created:

    task_log.begin_task()
    for instance in instances:
        try:
            compute_utils.notify_usage_exists(
                self.notifier, context, instance, self.host,
                ignore_missing_network_data=False)
            successes += 1
        except Exception:
            LOG.exception('Failed to generate usage '
                          'audit for instance '
                          'on host %s', self.host,
                          instance=instance)
            errors += 1
    task_log.errors = errors
    task_log.message = (
        'Instance usage audit ran for host %s, %s instances in %s seconds.' %
        (self.host, num_instances, time.time() - start_time))
    task_log.end_task()

The begin_task() call creates the task_log database record, then the notifications are emitted without any use of the task_log record. The end_task() call retrieves the task_log record and saves the 'errors' and 'message' data to it.
There is a chance that a run of the 'nova-manage db archive_deleted_rows --task-log' cron job could race with the periodic task and sweep away the newly created task_log record sometime in between begin_task() and end_task(). If this happens, then end_task() will raise exception.TaskNotRunning [3]. And if that happens, the periodic task will continue to run without issue, because we run our oslo.service periodic tasks using the default raise_on_error=False [4][5].

Based on the fact that racing archival of task_log database records will not impact notifications or crash the periodic task, it is safe to add the --task-log option to the 'nova-manage db archive_deleted_rows' command in the nova_cron cron job, and no additional logic is required.

[1] https://review.opendev.org/c/openstack/nova/+/801144
[2] https://github.com/openstack/nova/blob/3545356ae3a719442833cb8c3c911408d4bd3c15/nova/compute/manager.py#L9566-L9583
[3] https://github.com/openstack/nova/blob/3545356ae3a719442833cb8c3c911408d4bd3c15/nova/db/sqlalchemy/api.py#L4330
[4] https://github.com/openstack/nova/blob/3545356ae3a719442833cb8c3c911408d4bd3c15/nova/service.py#L210
[5] https://github.com/openstack/oslo.service/blob/091fd6510a29891f74f1ab141ef158065f9a56fd/oslo_service/periodic_task.py#L218
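The raise_on_error=False behavior that keeps the periodic task alive can be sketched as follows. The run_periodic() helper and audit_task() below are hypothetical stand-ins for oslo.service's periodic_task machinery and nova's audit, reduced to show only the error-swallowing behavior:

```python
# Sketch of why a TaskNotRunning race does not crash the periodic task:
# with raise_on_error=False (the default), the periodic-task runner
# swallows exceptions and the service keeps running. run_periodic() is
# a hypothetical stand-in for oslo.service's real runner.

class TaskNotRunning(Exception):
    pass


def audit_task(record_archived):
    # Simulates the audit: end_task() fails if a concurrent
    # 'archive_deleted_rows --task-log' swept the record away.
    if record_archived:
        raise TaskNotRunning('task_log record was archived mid-run')
    return 'audit complete'


def run_periodic(task, raise_on_error=False, **kwargs):
    try:
        return task(**kwargs)
    except Exception as exc:
        if raise_on_error:
            raise
        # oslo.service logs the error and continues; we return a
        # marker string here in place of a log line.
        return 'error during periodic task: %s' % exc


# The race fires, but the task run completes without propagating:
result = run_periodic(audit_task, record_archived=True)
print(result)  # -> error during periodic task: task_log record was archived mid-run
```

With raise_on_error=True the exception would propagate instead, which is why the default matters for the safety argument above.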
It's in POST state, but the remaining work is not clear to me. Could you clarify it?
Moving back to NEW to ensure we explore how to do this in a NextGen world.
Now tracked in https://issues.redhat.com/browse/OSPRH-71