Bug 1980932 - [RFE][RH OSP18] Add nova-manage cleanup command for 'task_log' database records
Summary: [RFE][RH OSP18] Add nova-manage cleanup command for 'task_log' database records
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Martin Schuppert
QA Contact: Joe H. Rahme
URL:
Whiteboard:
Depends On: 1780867
Blocks: 1381612
Reported: 2021-07-09 23:04 UTC by melanie witt
Modified: 2023-04-13 19:03 UTC
CC List: 16 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of: 1780867
Environment:
Last Closed: 2023-04-13 19:03:15 UTC
Target Upstream Version: Xena
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 801924 | 0 | None | MERGED | Add support for --task-log option of db archive | 2021-07-28 16:17:05 UTC
OpenStack gerrit | 801939 | 0 | None | MERGED | Enable archive task_log records while archiving the database | 2021-07-28 16:17:06 UTC
Red Hat Issue Tracker | OSP-24172 | 0 | None | None | None | 2023-04-13 19:03:37 UTC
Red Hat Issue Tracker | OSP-6139 | 0 | None | None | None | 2022-02-03 18:57:46 UTC

Description melanie witt 2021-07-09 23:04:21 UTC
This rhbz is for the deployment part of the RFE. The --task-log flag should be added to the existing 'nova-manage db archive_deleted_rows' command that runs during the nova_cron cron job.
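As a rough illustration of the deployment change, the sketch below assembles the argument list for the cron-driven archive call with and without the new flag. The helper name `build_archive_command` and the `--max_rows 1000` / `--until-complete` choices are assumptions for illustration only; this is not the actual nova_cron definition in tripleo-heat-templates.

```python
from datetime import date
from typing import List, Optional


def build_archive_command(task_log: bool = False,
                          before: Optional[date] = None,
                          max_rows: int = 1000) -> List[str]:
    """Hypothetical helper: assemble a nova-manage archive invocation.

    Not the tripleo-heat-templates implementation; values are illustrative.
    """
    cmd = ["nova-manage", "db", "archive_deleted_rows",
           "--max_rows", str(max_rows), "--until-complete"]
    if before is not None:
        # Restrict archiving to rows older than this date.
        cmd += ["--before", before.isoformat()]
    if task_log:
        # The option this RFE adds to the cron job: also archive task_log rows.
        cmd.append("--task-log")
    return cmd


print(" ".join(build_archive_command(task_log=True, before=date(2021, 6, 1))))
# prints: nova-manage db archive_deleted_rows --max_rows 1000 --until-complete --before 2021-06-01 --task-log
```

Keeping --before and --task-log as separate switches mirrors the CLI, where either can be used without the other.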

There are some Telemetry-related concerns that we need to consider when adding the --task-log flag. The task_log table records are exposed via nova's /os-instance_usage_audit_log API [1], which is called by the Telemetry service to collect usage data.

We will need to know before proceeding:

* Does Telemetry still use this API? If not, then we need not care about how long task_log records are retained.

* How long does Telemetry need the task_log records to stay available in nova after they are generated?

By default, if the --before flag is not also used when calling 'nova-manage db archive_deleted_rows', *all* task_log records will be archived at the time of the call, leaving nothing behind. In order to preserve some task_log records for historical and audit purposes, --task-log *must* be used along with --before.
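The retention rule above can be modeled in a few lines. This is a simplified sketch, not nova's database code: `TaskLogRecord` and `select_for_archive` are hypothetical names, and comparing the --before cutoff against each record's `updated_at` timestamp is an assumption made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class TaskLogRecord:
    # Hypothetical, reduced model of a nova.task_log row.
    id: int
    updated_at: datetime


def select_for_archive(records: List[TaskLogRecord],
                       before: Optional[datetime] = None) -> List[TaskLogRecord]:
    """Return the records an archive run would move (simplified model).

    Without a cutoff every task_log record qualifies; with a cutoff, only
    records last updated before it (assumed comparison on updated_at).
    """
    if before is None:
        return list(records)
    return [r for r in records if r.updated_at < before]


rows = [TaskLogRecord(1, datetime(2021, 5, 1)),
        TaskLogRecord(2, datetime(2021, 7, 1))]
print([r.id for r in select_for_archive(rows)])                        # [1, 2]
print([r.id for r in select_for_archive(rows, datetime(2021, 6, 1))])  # [1]
```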

[1] https://docs.openstack.org/api-ref/compute/?expanded=list-server-usage-audits-detail#list-server-usage-audits

+++ This bug was initially created as a clone of Bug #1780867 +++

I'm going to use this rhbz for an RFE to add cleanup for nova.task_log database records as a new nova-manage command, as described here:

http://lists.openstack.org/pipermail/openstack-discuss/2019-September/009245.html

This will be done as a Wishlist bug upstream.

Comment 1 Emma Foley 2021-07-16 14:34:09 UTC

(In reply to melanie witt from comment #0)
> We will need to know before proceeding:
> 
> * Does Telemetry still use this API? If not, then we need not care about how
> long task_log records are retained.

Telemetry doesn't use this API. It consumes the notifications from the bus as they are emitted.

> 
> * How long does Telemetry need the task_log records to stay available in
> nova after they are generated?
> 
It does not.



As long as this cleanup only affects the task_log records and not the notifications that are being generated, there is no conflict.

If the cleanup/refactor affects the generation of the notifications, then we might have some customer impact.
In short: Ceilometer will receive all the notifications; configuration dictates whether a particular notification will be stored or not.

Comment 2 melanie witt 2021-07-16 22:25:29 UTC
(In reply to Emma Foley from comment #1)
> 
> (In reply to melanie witt from comment #0)
> > We will need to know before proceeding:
> > 
> > * Does Telemetry still use this API? If not, then we need not care about how
> > long task_log records are retained.
> 
> Telemetry doesn't use this API. It consumes the notifications from the bus
> as they are emitted.

Thank you for clarifying this. I have proposed a correction to the upstream release note accordingly [1].

> > * How long does Telemetry need the task_log records to stay available in
> > nova after they are generated?
> > 
> It does not.
> 
> 
> 
> As long as this cleanup only affects the task_log records and not the
> notifications that are being generated, there is no conflict.
> 
> If the cleanup/refactor affects the generation of the notifications, then we
> might have some customer impact.
> In short: Ceilometer will receive all the notifications; configuration
> dictates whether a particular notification will be stored or not.

OK, great. Upon inspecting the code [2], we can see that the notification is emitted after the task_log record is created:

        task_log.begin_task()
        for instance in instances:
            try:
                compute_utils.notify_usage_exists(
                    self.notifier, context, instance, self.host,
                    ignore_missing_network_data=False)
                successes += 1
            except Exception:
                LOG.exception('Failed to generate usage '
                              'audit for instance '
                              'on host %s', self.host,
                              instance=instance)
                errors += 1
        task_log.errors = errors
        task_log.message = (
            'Instance usage audit ran for host %s, %s instances in %s seconds.'
            % (self.host, num_instances, time.time() - start_time))
        task_log.end_task()

The begin_task() call creates the task_log database record, then the notifications are emitted without any use of the task_log record. The end_task() call retrieves the task_log record and saves the 'errors' and 'message' data to it.
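To make the ordering concrete, here is a toy model of the flow quoted above: the record is created by begin_task(), notifications are emitted without touching it, and end_task() looks it up again to save the results. `InMemoryTaskLogStore`, `usage_audit`, and the `emit` callback are hypothetical stand-ins, not nova's classes.

```python
import time
from typing import Callable, Dict, List


class InMemoryTaskLogStore:
    """Hypothetical stand-in for the task_log table, keyed by host."""

    def __init__(self) -> None:
        self.rows: Dict[str, dict] = {}

    def begin_task(self, host: str) -> None:
        # Creates the record before any notification is sent.
        self.rows[host] = {"state": "RUNNING", "errors": 0, "message": ""}

    def end_task(self, host: str, errors: int, message: str) -> None:
        # Looks the record up again and saves the results onto it.
        self.rows[host].update(state="DONE", errors=errors, message=message)


def usage_audit(store: InMemoryTaskLogStore, host: str,
                instances: List[str], emit: Callable[[str], None]) -> int:
    """Hypothetical reduction of the audit loop quoted above."""
    start_time = time.time()
    store.begin_task(host)
    errors = 0
    for instance in instances:
        try:
            emit(instance)  # the notification path never reads the record
        except Exception:
            errors += 1
    store.end_task(
        host, errors,
        "Instance usage audit ran for host %s, %s instances in %s seconds."
        % (host, len(instances), time.time() - start_time))
    return errors


sent: List[str] = []
store = InMemoryTaskLogStore()
usage_audit(store, "compute-0", ["vm1", "vm2"], sent.append)
print(sent)                              # ['vm1', 'vm2']
print(store.rows["compute-0"]["state"])  # DONE
```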

There is a chance that a 'nova-manage db archive_deleted_rows --task-log' cron job run races with the periodic task and sweeps away the newly created task_log record sometime between begin_task() and end_task(). If this happens, end_task() will raise exception.TaskNotRunning [3]. Even then, the periodic task will continue to run without issue, because we run our oslo.service periodic tasks with the default raise_on_error=False [4][5].
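The failure mode can be modeled as follows: an archive run that removes the record between begin and end makes end_task() raise, and a runner that logs exceptions instead of re-raising (the effect of raise_on_error=False) simply moves on. `TaskNotRunning` here is a local class named after the nova exception, and `run_periodic_tasks` is a simplified stand-in, not the oslo.service implementation.

```python
import logging
from typing import Callable, Dict, List

LOG = logging.getLogger(__name__)


class TaskNotRunning(Exception):
    """Local stand-in named after nova's exception.TaskNotRunning."""


class TaskLogTable:
    """Hypothetical in-memory task_log table keyed by host."""

    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}

    def begin_task(self, host: str) -> None:
        self.rows[host] = "RUNNING"

    def end_task(self, host: str) -> None:
        if host not in self.rows:
            # The record was archived between begin_task() and end_task().
            raise TaskNotRunning(host)
        self.rows[host] = "DONE"

    def archive_all(self) -> None:
        # Models --task-log without --before: every record is moved away.
        self.rows.clear()


def run_periodic_tasks(tasks: List[Callable[[], None]],
                       raise_on_error: bool = False) -> List[str]:
    """Simplified runner: log a failing task and go on to the next one."""
    completed = []
    for task in tasks:
        try:
            task()
            completed.append(task.__name__)
        except Exception:
            if raise_on_error:
                raise
            LOG.exception("Error during %s", task.__name__)
    return completed


table = TaskLogTable()
notifications: List[str] = []


def instance_usage_audit() -> None:
    table.begin_task("compute-0")
    notifications.append("vm1")  # emitted without touching the record
    table.archive_all()          # the racing cron job run
    table.end_task("compute-0")  # raises TaskNotRunning


def other_periodic_task() -> None:
    pass


print(run_periodic_tasks([instance_usage_audit, other_periodic_task]))
# ['other_periodic_task']  (the traceback is logged, not raised)
print(notifications)  # ['vm1']
```

The notification is already out before the race can bite, and the only casualty is the final bookkeeping write on the task_log record.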

Because racing archival of task_log database records will neither affect notifications nor crash the periodic task, it is safe to add the --task-log option to the 'nova-manage db archive_deleted_rows' command in the nova_cron job, and no additional logic is required.

[1] https://review.opendev.org/c/openstack/nova/+/801144
[2] https://github.com/openstack/nova/blob/3545356ae3a719442833cb8c3c911408d4bd3c15/nova/compute/manager.py#L9566-L9583
[3] https://github.com/openstack/nova/blob/3545356ae3a719442833cb8c3c911408d4bd3c15/nova/db/sqlalchemy/api.py#L4330
[4] https://github.com/openstack/nova/blob/3545356ae3a719442833cb8c3c911408d4bd3c15/nova/service.py#L210
[5] https://github.com/openstack/oslo.service/blob/091fd6510a29891f74f1ab141ef158065f9a56fd/oslo_service/periodic_task.py#L218

Comment 4 Bogdan Dobrelya 2022-09-16 13:46:19 UTC
It's in POST state, but the remaining work is not clear to me.
Could you clarify it?

Comment 6 Artom Lifshitz 2023-04-13 18:50:37 UTC
Moving back to NEW to ensure we explore how to do this in a NextGen world.

Comment 7 Artom Lifshitz 2023-04-13 19:03:15 UTC
Now tracked in https://issues.redhat.com/browse/OSPRH-71

