Bug 1980932

Summary:	[RFE][RH OSP18] Add nova-manage cleanup command for 'task_log' database records
Product:	Red Hat OpenStack	Reporter:	melanie witt <mwitt>
Component:	openstack-tripleo-heat-templates	Assignee:	Martin Schuppert <mschuppe>
Status:	CLOSED DEFERRED	QA Contact:	Joe H. Rahme <jhakimra>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	unspecified	CC:	alifshit, bdobreli, dasmith, efoley, egallen, eglynn, jhakimra, kchamart, mburns, molasaga, mschuppe, nova-maint, sbauza, sgordon, stephenfin, vromanso
Target Milestone:	---	Keywords:	FutureFeature, Patch, Triaged
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Enhancement
Doc Text:		Story Points:	---
Clone Of:	1780867	Environment:
Last Closed:	2023-04-13 19:03:15 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:	Xena
Embargoed:
Bug Depends On:	1780867
Bug Blocks:	1381612

Description melanie witt 2021-07-09 23:04:21 UTC

This rhbz is for the deployment part of the RFE. The --task-log flag should be added to the existing 'nova-manage db archive_deleted_rows' command that runs during the nova_cron cron job.

There are some concerns which require knowledge of Telemetry that we need to consider when adding the --task-log flag. The task_log table records are exposed via nova's /os-instance_usage_audit_log API [1], which is called by the Telemetry service to collect usage data.

We will need to know before proceeding:

* Does Telemetry still use this API? If not, then we need not care about how long task_log records are retained.

* How long does Telemetry need the task_log records to stay available in nova after they are generated?

If --task-log is passed to 'nova-manage db archive_deleted_rows' without the --before flag, *all* task_log records will be archived at the time of the call, leaving nothing behind. In order to preserve some task_log records for historical and audit purposes, --task-log *must* be used along with --before.
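
As an illustration of pairing --task-log with --before, here is a minimal sketch that builds the command line with a cutoff date derived from a retention period. The helper name build_archive_command, the retention_days parameter, and the YYYY-MM-DD date format are assumptions made for the example; none of them come from this bug or from the nova_cron templates.

```python
from datetime import datetime, timedelta, timezone


def build_archive_command(retention_days, now=None):
    """Return an argv list that archives task_log rows older than a cutoff.

    retention_days is a hypothetical, operator-chosen value; this bug does
    not specify how long task_log records should be kept.
    """
    if retention_days < 0:
        raise ValueError("retention_days must be non-negative")
    now = now or datetime.now(timezone.utc)
    # Assumed date format for --before; only the date part is passed.
    cutoff = (now - timedelta(days=retention_days)).strftime("%Y-%m-%d")
    return [
        "nova-manage", "db", "archive_deleted_rows",
        "--task-log",        # include task_log records in the archive run
        "--before", cutoff,  # keep anything newer than the cutoff
    ]


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch is safe to try.
    print(" ".join(build_archive_command(retention_days=90)))
```

Without the "--before" pair in the returned list, the same command would archive every task_log record, which is the case the paragraph above warns about.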

[1] https://docs.openstack.org/api-ref/compute/?expanded=list-server-usage-audits-detail#list-server-usage-audits

+++ This bug was initially created as a clone of Bug #1780867 +++

I'm going to use this rhbz for an RFE to add cleanup for nova.task_log database records as a new nova-manage command, as described here:

http://lists.openstack.org/pipermail/openstack-discuss/2019-September/009245.html

This will be done as a Wishlist bug upstream.

Comment 1 Emma Foley 2021-07-16 14:34:09 UTC


(In reply to melanie witt from comment #0)
> We will need to know before proceeding:
> 
> * Does Telemetry still use this API? If not, then we need not care about how
> long task_log records are retained.

Telemetry doesn't use this API. It consumes the notifications from the bus as they are emitted.

> 
> * How long does Telemetry need the task_log records to stay available in
> nova after they are generated?
> 
It does not.



As long as this cleanup only affects the logs and not the notifications that are being generated, there is no conflict.

If the cleanup/refactor affects the generation of the notifications, then we might have some customer impact.
In short: Ceilometer will receive all the notifications; configuration dictates whether a particular notification will be stored or not.

Comment 2 melanie witt 2021-07-16 22:25:29 UTC

(In reply to Emma Foley from comment #1)
> 
> (In reply to melanie witt from comment #0)
> > We will need to know before proceeding:
> > 
> > * Does Telemetry still use this API? If not, then we need not care about how
> > long task_log records are retained.
> 
> Telemetry doesn't use this API. It consumes the notifications from the bus
> as they are emitted.

Thank you for clarifying this. I have proposed a correction to the upstream release note accordingly [1].

> > * How long does Telemetry need the task_log records to stay available in
> > nova after they are generated?
> > 
> It does not.
> 
> 
> 
> As long as this cleanup only affects the logs and not the notifications that
> are being generated, there is no conflict.
> 
> If cleanup/refactor affects the generation of the notification, then we
> might have some customer impact.
> In short: Ceilometer will receive all the notifications; configuration
> dictates whether a particular notification will be stored or not.

OK, great. Upon inspecting the code [2], we can see that the notification is emitted after the task_log record is created:

        task_log.begin_task()
        for instance in instances:
            try:
                compute_utils.notify_usage_exists(
                    self.notifier, context, instance, self.host,
                    ignore_missing_network_data=False)
                successes += 1
            except Exception:
                LOG.exception('Failed to generate usage '
                              'audit for instance '
                              'on host %s', self.host,
                              instance=instance)
                errors += 1
        task_log.errors = errors
        task_log.message = (
            'Instance usage audit ran for host %s, %s instances in %s seconds.'
            % (self.host, num_instances, time.time() - start_time))
        task_log.end_task()

The begin_task() call creates the task_log database record, then the notifications are emitted without any use of the task_log record. The end_task() call retrieves the task_log record and saves the 'errors' and 'message' data to it.

There is a chance of a race in which a 'nova-manage db archive_deleted_rows --task-log' cron job run sweeps away the newly created task_log record sometime between begin_task() and end_task(). If this happens, then end_task() will raise exception.TaskNotRunning [3]. Even then, the periodic task will continue to run without issue because we run our oslo.service periodic tasks using the default raise_on_error=False [4][5].

Because racing archival of task_log database records will neither impact notifications nor crash the periodic task, it is safe to add the --task-log option to the 'nova-manage db archive_deleted_rows' command in the nova_cron cron job, and no additional logic is required.
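
The race described above can be modelled with a small, self-contained simulation. This is only a sketch under stated assumptions: FakeTaskLogTable, usage_audit, and run_periodic_task are hypothetical stand-ins written for illustration, not nova or oslo.service code, and TaskNotRunning here is a local class that mirrors the exception name from [3].

```python
import logging

LOG = logging.getLogger(__name__)


class TaskNotRunning(Exception):
    """Local stand-in for nova's exception.TaskNotRunning."""


class FakeTaskLogTable:
    """In-memory stand-in for the task_log table (illustrative only)."""

    def __init__(self):
        self.rows = {}

    def begin_task(self, name):
        self.rows[name] = {"state": "RUNNING", "errors": 0}

    def end_task(self, name, errors):
        row = self.rows.get(name)
        if row is None:
            # The row was archived between begin_task() and end_task().
            raise TaskNotRunning(name)
        row.update(state="DONE", errors=errors)

    def archive_all(self):
        # Models an archive run with --task-log and no --before.
        self.rows.clear()


def usage_audit(table, instances, notifications, archive_midway=False):
    table.begin_task("instance_usage_audit")
    for instance in instances:
        # Notifications are emitted without reading the task_log row.
        notifications.append(instance)
    if archive_midway:
        table.archive_all()  # simulated racing cron job
    table.end_task("instance_usage_audit", errors=0)


def run_periodic_task(task, raise_on_error=False):
    """Log and continue on failure unless raise_on_error is set."""
    try:
        task()
    except Exception:
        if raise_on_error:
            raise
        LOG.exception("Error during periodic task")
        return False
    return True
```

In this model, running usage_audit with archive_midway=True still appends every instance to the notifications list; only the final end_task() call fails, and run_periodic_task logs the error and returns instead of propagating it.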

[1] https://review.opendev.org/c/openstack/nova/+/801144
[2] https://github.com/openstack/nova/blob/3545356ae3a719442833cb8c3c911408d4bd3c15/nova/compute/manager.py#L9566-L9583
[3] https://github.com/openstack/nova/blob/3545356ae3a719442833cb8c3c911408d4bd3c15/nova/db/sqlalchemy/api.py#L4330
[4] https://github.com/openstack/nova/blob/3545356ae3a719442833cb8c3c911408d4bd3c15/nova/service.py#L210
[5] https://github.com/openstack/oslo.service/blob/091fd6510a29891f74f1ab141ef158065f9a56fd/oslo_service/periodic_task.py#L218

Comment 4 Bogdan Dobrelya 2022-09-16 13:46:19 UTC

It's in POST state, but the remaining work is not clear to me.
Could you clarify it?

Comment 6 Artom Lifshitz 2023-04-13 18:50:37 UTC

Moving back to NEW to ensure we explore how to do this in a NextGen world.

Comment 7 Artom Lifshitz 2023-04-13 19:03:15 UTC

Now tracked in https://issues.redhat.com/browse/OSPRH-71