Bug 1726256 - nova-manage archive_deleted_rows doesn't cleanup nova_cell0.instance_id_mappings, nova_api.request_specs, nova_api.consumers, nova_api.instance_mappings, nova.instance_id_mappings and nova.task_log
Summary: nova-manage archive_deleted_rows doesn't cleanup nova_cell0.instance_id_mappi...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 13.0 (Queens)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: z11
Target Release: 13.0 (Queens)
Assignee: melanie witt
QA Contact: Paras Babbar
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-07-02 12:41 UTC by David Hill
Modified: 2023-10-20 23:02 UTC (History)

Fixed In Version: openstack-nova-17.0.12-13.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-10 11:27:15 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 697398 0 'None' NEW delete consumers which no longer have allocations 2021-02-12 15:39:31 UTC
Red Hat Issue Tracker OSP-28240 0 None None None 2023-09-07 20:14:48 UTC
Red Hat Knowledge Base (Solution) 4660001 0 None None None 2019-12-16 15:48:10 UTC
Red Hat Product Errata RHBA-2020:0759 0 None None None 2020-03-10 11:28:35 UTC

Description David Hill 2019-07-02 12:41:03 UTC
Description of problem:
nova-manage archive_deleted_rows doesn't clean up instance_id_mappings:

MariaDB [nova_cell0]> SELECT table_name, table_rows FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = "nova_cell0" order by table_rows desc
    -> ;
+--------------------------------------------+------------+
| table_name                                 | table_rows |
+--------------------------------------------+------------+
| instance_id_mappings                       |    1113607 |




MariaDB [nova]> SELECT table_name, table_rows FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = "nova" order by table_rows desc;
+--------------------------------------------+------------+
| table_name                                 | table_rows |
+--------------------------------------------+------------+
| instance_id_mappings                       |    2028944 |


Version-Release number of selected component (if applicable):
openstack-nova-common-17.0.9-9.el7ost.noarch

How reproducible:
Always

Steps to Reproduce:
1. Run nova-manage db archive_deleted_rows

Actual results:
Cleans up all the tables except instance_id_mappings

Expected results:
Should also clean up instance_id_mappings

Additional info:

Comment 1 David Hill 2019-07-02 12:57:13 UTC
Several other tables are in the same situation:

MariaDB [nova_cell0]> SELECT table_name, table_rows FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = "nova_cell0" order by table_rows desc;
+--------------------------------------------+------------+
| table_name                                 | table_rows |
+--------------------------------------------+------------+
| instance_id_mappings                       |    1114038 |
| shadow_instance_system_metadata            |     417237 |
| shadow_instance_metadata                   |      72261 |
| instance_system_metadata                   |      17214 |
| shadow_block_device_mapping                |      12133 |
| shadow_instance_info_caches                |      11979 |
| shadow_instances                           |      11646 |
| shadow_instance_extra                      |      10789 |
| shadow_instance_faults                     |      10633 |
| instance_metadata                          |       2733 |
| instance_info_caches                       |        441 |
| block_device_mapping                       |        440 |
| instances                                  |        420 |
| instance_extra                             |        401 |
| instance_faults                            |        377 |
| s3_images                                  |        274 |
| security_groups                            |         76 |
| shadow_instance_actions_events             |          1 |


MariaDB [nova_api]> SELECT table_name, table_rows FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = "nova_api" and table_rows > 0 order by table_rows desc;
+--------------------+------------+
| table_name         | table_rows |
+--------------------+------------+
| consumers          |    1711625 |
| instance_mappings  |     504378 |
| request_specs      |     391240 |
| allocations        |      11395 |
| quotas             |       2218 |
| key_pairs          |        863 |
| inventories        |        330 |
| users              |        269 |
| projects           |        254 |
| traits             |        164 |
| aggregate_hosts    |        112 |
| resource_providers |        111 |
| host_mappings      |        111 |
| flavor_extra_specs |         68 |
| flavors            |         59 |
| flavor_projects    |         37 |
| aggregate_metadata |         28 |
| build_requests     |          5 |
| aggregates         |          5 |
| cell_mappings      |          2 |
+--------------------+------------+
20 rows in set (0.00 sec)


MariaDB [nova]> SELECT table_name, table_rows FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = "nova" and table_rows > 0 order by table_rows desc;
+---------------------------------+------------+
| table_name                      | table_rows |
+---------------------------------+------------+
| instance_id_mappings            |    2029811 |
| shadow_instance_system_metadata |     741766 |
| task_log                        |     239250 |
| shadow_instance_metadata        |      96226 |
| instance_system_metadata        |      81522 |
| shadow_instance_actions         |      41923 |
| instance_actions_events         |      38227 |
| shadow_instance_actions_events  |      37733 |
| shadow_block_device_mapping     |      23639 |
| shadow_instance_extra           |      21506 |
| shadow_instances                |      18927 |
| shadow_virtual_interfaces       |      16480 |
| shadow_instance_info_caches     |      16006 |
| instance_actions                |      13559 |
| shadow_instance_faults          |       8479 |
| instance_metadata               |       6793 |
| block_device_mapping            |       4320 |
| virtual_interfaces              |       3944 |
| migrations                      |       3940 |
| instance_info_caches            |       3742 |
| instances                       |       3083 |
| s3_images                       |       3071 |
| instance_extra                  |       2881 |
| instance_faults                 |       1494 |
| security_groups                 |        254 |
| services                        |        126 |
| compute_nodes                   |         56 |
| shadow_migrations               |         14 |
| tags                            |          9 |
+---------------------------------+------------+


This is not a problem in most environments with low usage, but in this environment I think they're creating an instance every second or so, and those numbers keep increasing.

Comment 2 Matthew Booth 2019-07-05 12:05:56 UTC

*** This bug has been marked as a duplicate of bug 1693815 ***

Comment 3 David Hill 2019-07-05 12:28:05 UTC
MariaDB [nova_api]> select created_at from request_specs limit 10;
+---------------------+
| created_at          |
+---------------------+
| 2018-10-10 12:55:16 |
| 2018-11-14 09:06:33 |
| 2018-11-14 09:26:22 |
| 2018-11-21 15:44:21 |
| 2018-11-29 00:55:09 |
| 2018-12-03 11:37:19 |
| 2018-12-03 21:35:31 |
| 2018-12-10 12:16:57 |
| 2018-12-10 12:20:20 |
| 2018-12-10 12:23:33 |
+---------------------+
10 rows in set (0.00 sec)

MariaDB [nova_api]> select count(*) from request_specs;
+----------+
| count(*) |
+----------+
|   519463 |
+----------+

Comment 4 David Hill 2019-07-05 12:29:16 UTC
MariaDB [nova_api]> select count(*) from consumers;
+----------+
| count(*) |
+----------+
|  1897238 |
+----------+
1 row in set (0.70 sec)

MariaDB [nova_api]> select created_at from consumers limit 10;
+---------------------+
| created_at          |
+---------------------+
| 2018-10-10 12:33:12 |
| 2018-10-10 12:40:22 |
| 2018-10-10 12:42:40 |
| 2018-10-10 12:42:40 |
| 2018-10-10 12:42:40 |
| 2018-10-10 12:42:41 |
| 2018-10-10 12:42:41 |
| 2018-10-10 12:42:41 |
| 2018-10-10 12:42:41 |
| 2018-10-10 12:42:41 |
+---------------------+
10 rows in set (0.01 sec)

Comment 5 David Hill 2019-07-05 12:30:21 UTC
MariaDB [nova_api]> select count(*) from instance_mappings;
+----------+
| count(*) |
+----------+
|   519267 |
+----------+
1 row in set (0.11 sec)

MariaDB [nova_api]> select created_at from instance_mappings limit 10;
+---------------------+
| created_at          |
+---------------------+
| 2018-10-10 12:55:16 |
| 2018-11-14 09:06:33 |
| 2018-11-14 09:26:22 |
| 2018-11-21 15:44:24 |
| 2018-11-29 00:55:12 |
| 2018-12-03 11:37:19 |
| 2018-12-03 21:35:31 |
| 2018-12-10 12:16:57 |
| 2018-12-10 12:20:20 |
| 2018-12-10 12:23:33 |
+---------------------+
10 rows in set (0.00 sec)

Comment 6 David Hill 2019-07-05 12:36:21 UTC
MariaDB [nova]> select count(*) from task_log;
+----------+
| count(*) |
+----------+
|   257761 |
+----------+
1 row in set (0.05 sec)

MariaDB [nova]> select created_at from task_log limit 10;
+---------------------+
| created_at          |
+---------------------+
| 2018-10-08 18:16:31 |
| 2018-10-08 18:17:08 |
| 2018-10-08 19:00:00 |
| 2018-10-08 19:00:51 |
| 2018-10-08 20:01:35 |
| 2018-10-08 20:01:58 |
| 2018-10-08 21:00:17 |
| 2018-10-08 21:00:37 |
| 2018-10-08 22:00:15 |
| 2018-10-08 22:00:30 |
+---------------------+
10 rows in set (0.00 sec)

Comment 7 Matthew Booth 2019-07-05 13:36:49 UTC
To summarise comments 3, 4, 5, and 6, based on the volume of them remaining the following tables also appear not to be cleaned up:

* nova_api.request_specs
* nova_api.consumers
* nova_api.instance_mappings
* nova_api.task_log

Comment 8 Matthew Booth 2019-07-05 13:43:22 UTC
Based on my investigation of instance_id_mappings, I suspect that this isn't an issue with archive_deleted_rows, but an artifact of the rows never being deleted in the first place. If this is the case we might want to split each of these out into separate bugs, as the fixes are likely to be quite different in each case.

To confirm, please could you check how many deleted rows there are in each of the 4 tables above compared to non-deleted rows? I'm expecting them to be entirely or almost entirely non-deleted. If not we'll have to look elsewhere.
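The check requested above can be sketched roughly as follows. This is only an illustration, not nova code: it assumes the usual soft-delete convention where live rows have deleted = 0 and soft-deleted rows have a non-zero value, it only applies to tables that actually have a deleted column, and it uses Python's sqlite3 module as a stand-in for the MariaDB connection. The helper name count_deleted_vs_live and the SOFT_DELETE_TABLES allow-list are hypothetical.

```python
import sqlite3

# Hypothetical allow-list: table names cannot be bound as SQL parameters,
# so only names from this fixed set are ever interpolated into the query.
SOFT_DELETE_TABLES = {"task_log", "instance_id_mappings"}


def count_deleted_vs_live(conn, table):
    """Return (live, soft_deleted) row counts for a table with a `deleted` column.

    Assumes live rows have deleted = 0 and soft-deleted rows have a
    non-zero value. Rows where deleted is NULL are counted in neither bucket.
    """
    if table not in SOFT_DELETE_TABLES:
        raise ValueError(f"unexpected table: {table}")
    query = f"""
        SELECT COALESCE(SUM(CASE WHEN deleted = 0 THEN 1 ELSE 0 END), 0),
               COALESCE(SUM(CASE WHEN deleted <> 0 THEN 1 ELSE 0 END), 0)
        FROM {table}
    """
    live, soft_deleted = conn.execute(query).fetchone()
    return live, soft_deleted
```

If the soft-deleted count comes back at or near zero while the table is large, that would support the theory that the rows are never being marked deleted, rather than archive_deleted_rows skipping them.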

Comment 9 Matthew Booth 2019-07-05 13:48:15 UTC
(In reply to Matthew Booth from comment #7)
> To summarise comments 3, 4, 5, and 6, based on the volume of them remaining
> the following tables also appear not to be cleaned up:
> 
> * nova_api.request_specs
> * nova_api.consumers
> * nova_api.instance_mappings
> * nova_api.task_log

^^^ should have been nova.task_log

Comment 10 Matthew Booth 2019-07-05 13:57:12 UTC
The 3 tables in nova_api do not have shadow tables, or the SoftDeleteMixin, so archive_deleted_rows explicitly excludes them:

* nova_api.request_specs
* nova_api.consumers
* nova_api.instance_mappings

nova.task_log has both. Most likely the task_log rows are simply never deleted. Will need to look.

Comment 11 David Hill 2019-07-05 14:05:42 UTC
There are no deleted_at/deleted columns in those tables:

MariaDB [nova_api]> desc request_specs;
+---------------+-------------+------+-----+---------+----------------+
| Field         | Type        | Null | Key | Default | Extra          |
+---------------+-------------+------+-----+---------+----------------+
| created_at    | datetime    | YES  |     | NULL    |                |
| updated_at    | datetime    | YES  |     | NULL    |                |
| id            | int(11)     | NO   | PRI | NULL    | auto_increment |
| instance_uuid | varchar(36) | NO   | UNI | NULL    |                |
| spec          | mediumtext  | NO   |     | NULL    |                |
+---------------+-------------+------+-----+---------+----------------+
5 rows in set (0.01 sec)

MariaDB [nova_api]> desc consumers;
+------------+-------------+------+-----+---------+----------------+
| Field      | Type        | Null | Key | Default | Extra          |
+------------+-------------+------+-----+---------+----------------+
| created_at | datetime    | YES  |     | NULL    |                |
| updated_at | datetime    | YES  |     | NULL    |                |
| id         | int(11)     | NO   | PRI | NULL    | auto_increment |
| uuid       | varchar(36) | NO   | UNI | NULL    |                |
| project_id | int(11)     | NO   | MUL | NULL    |                |
| user_id    | int(11)     | NO   |     | NULL    |                |
+------------+-------------+------+-----+---------+----------------+
6 rows in set (0.01 sec)

MariaDB [nova_api]> desc instance_mappings;
+---------------+--------------+------+-----+---------+----------------+
| Field         | Type         | Null | Key | Default | Extra          |
+---------------+--------------+------+-----+---------+----------------+
| created_at    | datetime     | YES  |     | NULL    |                |
| updated_at    | datetime     | YES  |     | NULL    |                |
| id            | int(11)      | NO   | PRI | NULL    | auto_increment |
| instance_uuid | varchar(36)  | NO   | UNI | NULL    |                |
| cell_id       | int(11)      | YES  | MUL | NULL    |                |
| project_id    | varchar(255) | NO   | MUL | NULL    |                |
+---------------+--------------+------+-----+---------+----------------+
6 rows in set (0.00 sec)



And the growth of task_log might be because there is no housekeeping tool to delete older task log entries:
MariaDB [nova]> desc task_log;
+------------------+--------------+------+-----+---------+----------------+
| Field            | Type         | Null | Key | Default | Extra          |
+------------------+--------------+------+-----+---------+----------------+
| created_at       | datetime     | YES  |     | NULL    |                |
| updated_at       | datetime     | YES  |     | NULL    |                |
| deleted_at       | datetime     | YES  |     | NULL    |                |
| id               | int(11)      | NO   | PRI | NULL    | auto_increment |
| task_name        | varchar(255) | NO   | MUL | NULL    |                |
| state            | varchar(255) | NO   |     | NULL    |                |
| host             | varchar(255) | NO   | MUL | NULL    |                |
| period_beginning | datetime     | NO   | MUL | NULL    |                |
| period_ending    | datetime     | NO   | MUL | NULL    |                |
| message          | varchar(255) | NO   |     | NULL    |                |
| task_items       | int(11)      | YES  |     | NULL    |                |
| errors           | int(11)      | YES  |     | NULL    |                |
| deleted          | int(11)      | YES  |     | NULL    |                |
+------------------+--------------+------+-----+---------+----------------+
13 rows in set (0.00 sec)

I guess we would need a new tool to delete older entries in task_log.
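A very rough sketch of what such a tool might look like, purely as an illustration. It assumes that age can be judged by the period_ending column shown in the desc output above and that rows older than some cutoff are safe to remove, which has not been confirmed. It uses Python's sqlite3 module as a stand-in for the MariaDB connection; the function name purge_task_log and its parameters are hypothetical, and it defaults to a dry run that only counts.

```python
import sqlite3


def purge_task_log(conn, cutoff, dry_run=True):
    """Count, and optionally delete, task_log rows whose period ended before cutoff.

    cutoff is a 'YYYY-MM-DD HH:MM:SS' string. Returns the number of
    matching rows. Nothing is deleted unless dry_run is False.
    """
    (matching,) = conn.execute(
        "SELECT COUNT(*) FROM task_log WHERE period_ending < ?", (cutoff,)
    ).fetchone()
    if not dry_run:
        # `with conn` commits on success and rolls back on an exception.
        with conn:
            conn.execute(
                "DELETE FROM task_log WHERE period_ending < ?", (cutoff,)
            )
    return matching
```

Running it first with dry_run=True gives the number of rows that would go, which can be compared against the select count(*) output before anything is actually deleted.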

Comment 13 melanie witt 2019-07-10 03:18:42 UTC
The issue with request_specs and instance_mappings not being deleted during archive_deleted_rows was a bug upstream [1][2] that was fixed on 2017-11-21 in Queens [3] and should be included in any version of OSP 13. It was also backported to Pike and released in version 16.1.5 upstream.

The fix doesn't help with request_specs and instance_mappings for instances that were deleted prior to the fix, though. Manual cleanup is required for request_specs and instance_mappings that are related to instances deleted before the fix was present. To find what is safe to delete, one would need to script something like "if nova_api.request_specs.instance_uuid not found in nova.instances or nova_cell0.instances, delete request spec". The same applies to instance_mappings. If any matching record exists in the nova.instances or nova_cell0.instances tables, you should not delete the record. Records should only be deleted if no reference to them can be found in nova.instances or nova_cell0.instances.

[1] https://bugs.launchpad.net/nova/+bug/1724621
[2] https://bugs.launchpad.net/nova/+bug/1678056
[3] https://review.opendev.org/515034

Comment 14 melanie witt 2019-07-10 03:39:59 UTC
The nova_api.consumers table actually belongs to the placement service, which is a separate extracted service starting in the Train release. Because the consumers table belongs to placement, nova does not take care of deleting anything in it during archive_deleted_rows. If one would like to clean up orphaned records in the consumers table, it must be done manually. To find what is safe to delete, one would need to script something like "if nova_api.consumers.uuid not found in nova.instances.uuid or nova_cell0.instances.uuid, delete consumer". If any matching record exists in nova.instances.uuid or nova_cell0.instances.uuid, you should not delete the record. Records should only be deleted if no reference to them can be found in nova.instances or nova_cell0.instances.

Comment 15 melanie witt 2019-07-10 03:45:37 UTC
For nova.task_log, I see no DB API method for deleting its records, so apparently there was never any intention of deleting them. I'm not familiar with what the task_log records are, so more investigation is needed to determine what they are for and how to go about cleaning them up.

Comment 16 melanie witt 2019-07-12 02:40:00 UTC
(In reply to melanie witt from comment #14)
> The nova_api.consumers table actually belongs to the placement service,
> which is a separate extracted service starting in the Train release. Because
> the consumers table belongs to placement, nova does not take care of
> deleting anything in it during archive_deleted_rows. If one would like to
> clean up orphaned records in the consumers table, it must be done manually.
> To find what is safe to delete, one would need to script something like "if
> nova_api.consumers.uuid not found in nova.instances.uuid or
> nova_cell0.instances.uuid, delete consumer". If any matching record exists
> in nova.instances.uuid or nova_cell0.instances.uuid, you should not delete
> the record. Records should only be deleted if no reference to them can be
> found in nova.instances or nova_cell0.instances.

Update: I found that clean up of consumer records was a bug upstream [1] that was fixed in Rocky [2]. So, integrated placement (in nova) will clean up consumer records as of the Rocky release and extracted placement has the fixed behavior since placement version 0.1.0 [3]. The fix will not clean up orphaned consumer records from before the fix was applied to the code. Such records must be cleaned up manually.

[1] https://bugs.launchpad.net/nova/+bug/1780799
[2] https://review.opendev.org/581086
[3] https://github.com/openstack/placement/commit/580a3b1da67ce79ce7408d0540976e59e43020b9

Comment 18 melanie witt 2019-09-06 18:02:55 UTC
(In reply to David Hill from comment #17)
> Hello guys,
> 
>   I've got a similar issue in RHOSP10 ... should this also be fixed there ?

<snip of nova.instance_id_mappings and nova.task_log table rows not being cleaned up>

Apologies for the delayed reply -- I had missed this comment.

The issue with nova.instance_id_mappings was fixed in RHOSP10 and is available in openstack-nova-14.1.0-47.el7ost [1]. If the extra rows were created before this version, they will need to be cleaned up manually. There is an explanation about how to do the manual cleanup in comment 12.

For the issue of nova.task_log entries, I got some info from upstream today [2] and learned that these records are related to the server usage audit log API [3], which depends on the [DEFAULT]/instance_usage_audit config option [4] being set to True (it defaults to False). Enabling the config option causes nova-compute to emit notifications which are consumed by OpenStack Telemetry. When the config option is enabled, records are periodically created in the nova.task_log table, and nova never cleans them up.

I found that we enable the config option in THT here [5], which causes generation of the nova.task_log records.

In the upstream conversation [2], the possibility of deprecation of the server usage audit log API was mentioned. Some questions: is Telemetry still actively using the audit log notifications? If not, we could make a change to THT to stop setting [DEFAULT]/instance_usage_audit = True and prevent further unused nova.task_log records from being created.

Either way, cleanup of nova.task_log must be done manually. Perhaps something like deleting all records older than a timestamp before which the customer no longer needs Telemetry data.
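
As a rough illustration of that kind of time-based cleanup, here is a sketch in Python using an in-memory SQLite table in place of the real nova.task_log. The reduced column set, the sample rows and the purge_task_log name are assumptions for the example. Filtering on period_ending is also a choice made here; the comment above does not say which timestamp column to use.

```python
import sqlite3


def purge_task_log(conn, before):
    """Delete task_log rows whose period ended before `before`; return the count.

    `before` is a 'YYYY-MM-DD HH:MM:SS' string, which sorts correctly as text.
    """
    cur = conn.execute("DELETE FROM task_log WHERE period_ending < ?", (before,))
    conn.commit()
    return cur.rowcount


def build_demo_db():
    """Create a cut-down task_log table with three illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_log ("
        "id INTEGER PRIMARY KEY, task_name TEXT, host TEXT, "
        "period_beginning TEXT, period_ending TEXT)"
    )
    conn.executemany(
        "INSERT INTO task_log (task_name, host, period_beginning, period_ending) "
        "VALUES (?, ?, ?, ?)",
        [
            ("demo_audit", "compute-0", "2019-05-01 00:00:00", "2019-06-01 00:00:00"),
            ("demo_audit", "compute-0", "2019-06-01 00:00:00", "2019-07-01 00:00:00"),
            ("demo_audit", "compute-0", "2019-08-01 00:00:00", "2019-09-01 00:00:00"),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    print(purge_task_log(demo, "2019-08-01 00:00:00"))
```

Passing the cutoff as a bound parameter keeps the statement reusable for whatever retention period the operator settles on.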

I'm not sure what the next step upstream will be for nova.task_log. First, I want to verify whether the data is still used by Telemetry. If not, we would deprecate and remove the server usage audit log API altogether. If so, maybe we could add a new nova-manage command for cleaning up nova.task_log records that has a --before <date> option as part of it.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1696757
[2] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2019-09-06.log.html#t2019-09-06T14:19:50
[3] https://docs.openstack.org/api-ref/compute/#server-usage-audit-log-os-instance-usage-audit-log
[4] https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.instance_usage_audit
[5] https://github.com/openstack/tripleo-heat-templates/blob/ed6253eac4683a8beecda075cdbb1abae0ce6b90/deployment/nova/nova-compute-container-puppet.yaml#L522

Comment 19 melanie witt 2019-09-07 00:01:53 UTC
(In reply to melanie witt from comment #18)
> I'm not sure what the next step upstream will be for nova.task_log. First, I
> want to verify whether the data is still used by Telemetry. If not, we would
> deprecate and remove the server usage audit log API altogether. If so, maybe
> we could add a new nova-manage command for cleaning up nova.task_log records
> that has a --before <date> option as part of it.

I've started a ML thread here:

http://lists.openstack.org/pipermail/openstack-discuss/2019-September/009199.html

Comment 20 melanie witt 2019-10-16 15:13:09 UTC
I'm going to use this rhbz for an RFE to add cleanup for nova.task_log database records as a new nova-manage command, as described here:

http://lists.openstack.org/pipermail/openstack-discuss/2019-September/009245.html

since the nova.task_log records are the only item mentioned in this rhbz that do *not* yet have a solution for cleanup.

Comment 21 melanie witt 2019-12-08 02:19:07 UTC
(In reply to melanie witt from comment #20)
> I'm going to use this rhbz for an RFE to add cleanup for nova.task_log
> database records as a new nova-manage command, as described here:
> 
> http://lists.openstack.org/pipermail/openstack-discuss/2019-September/009245.
> html
> 
> since the nova.task_log records are the only item mentioned in this rhbz
> that do *not* yet have a solution for cleanup.

After thinking about it more, I have created a new, separate rhbz for adding cleanup for nova.task_log database records upstream:

https://bugzilla.redhat.com/show_bug.cgi?id=1780867

For this rhbz, there is still an open issue with the nova_api.consumers table cleanup where a fix landed in Rocky upstream. This still leaves a problem for Queens (OSP13) users and recently mporrato encountered a situation where > 4M nova_api.consumers records had built up in the system, unable to be cleaned up.

In response to this, I have proposed a backport of the upstream Rocky fix to Queens:

https://review.opendev.org/697398

which is considered low-priority as Queens is in the Extended Maintenance stage of the lifecycle upstream.

I will also pursue backport of this fix downstream to OSP13 and use this rhbz for the backport.

Comment 22 melanie witt 2019-12-08 02:24:37 UTC
Re-posting most of comment 12 as public:

The issue with instance_id_mappings not being deleted was a bug upstream [1] that was fixed on 2018-09-21 in Stein [2] and backported to Rocky (version 18.0.2 released on 2018-10-08) and Queens (version 17.0.11 released on 2019-07-10). It was backported to OSP 13 and was released in version openstack-nova-17.0.9-10.el7ost.

The fix doesn't help with instance_id_mappings for instances that were deleted before the fix was applied, though. Manual steps are required for records related to instances that were deleted before the fix was applied.

To be safest, I would suggest setting the 'deleted' field for instance_id_mappings that match instances where deleted != 0 (similar to how instances are considered deleted before archival):

  update instance_id_mappings set deleted = id where uuid in (select uuid from instances where deleted != 0);

This will make it so the affected instance_id_mappings will be archived at the same time that instances are archived.

Then delete instance_id_mappings that do not match any instances:

  delete from instance_id_mappings where uuid not in (select uuid from instances);

This will delete instance_id_mappings that match no instances in the database.
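
To see what the two statements do to a small data set, here is a sketch in Python that runs them against an in-memory SQLite database. The schemas are reduced to the columns the statements touch, and the sample rows and function names are made up for the example. SQLite is only a stand-in for MariaDB here.

```python
import sqlite3

# The two statements from this comment, unchanged apart from formatting.
MARK_SQL = (
    "UPDATE instance_id_mappings SET deleted = id "
    "WHERE uuid IN (SELECT uuid FROM instances WHERE deleted != 0)"
)
DELETE_SQL = (
    "DELETE FROM instance_id_mappings "
    "WHERE uuid NOT IN (SELECT uuid FROM instances)"
)


def cleanup_instance_id_mappings(conn):
    """Run both steps and return (rows marked deleted, rows removed)."""
    marked = conn.execute(MARK_SQL).rowcount
    removed = conn.execute(DELETE_SQL).rowcount
    conn.commit()
    return marked, removed


def build_demo_db():
    """One live instance, one soft-deleted instance, one mapping with no instance."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE instances (uuid TEXT NOT NULL, deleted INTEGER DEFAULT 0)")
    conn.execute(
        "CREATE TABLE instance_id_mappings ("
        "id INTEGER PRIMARY KEY, uuid TEXT NOT NULL, deleted INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO instances VALUES (?, ?)",
        [("uuid-live", 0), ("uuid-soft-deleted", 42)],
    )
    conn.executemany(
        "INSERT INTO instance_id_mappings (id, uuid) VALUES (?, ?)",
        [(1, "uuid-live"), (2, "uuid-soft-deleted"), (3, "uuid-gone")],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    print(cleanup_instance_id_mappings(build_demo_db()))
```

In this data set the mapping for the soft-deleted instance is kept but marked, so a later archive run can pick it up. Only the mapping with no instance row at all is removed outright.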

[1] https://bugs.launchpad.net/nova/+bug/1786298
[2] https://review.opendev.org/591558

Comment 23 melanie witt 2019-12-11 03:12:46 UTC
(In reply to melanie witt from comment #16)
> (In reply to melanie witt from comment #14)
> > The nova_api.consumers table actually belongs to the placement service,
> > which is a separate extracted service starting in the Train release. Because
> > the consumers table belongs to placement, nova does not take care of
> > deleting anything in it during archive_deleted_rows. If one would like to
> > clean up orphaned records in the consumers table, it must be done manually.
> > To find what is safe to delete, one would need to script something like "if
> > nova_api.consumers.uuid not found in nova.instances.uuid or
> > nova_cell0.instances.uuid, delete consumer". If any matching record exists
> > in nova.instances.uuid or nova_cell0.instances.uuid, you should not delete
> > the record. Records should only be deleted if no reference to them can be
> > found in nova.instances or nova_cell0.instances.
> 
> Update: I found that clean up of consumer records was a bug upstream [1]
> that was fixed in Rocky [2]. So, integrated placement (in nova) will clean
> up consumer records as of the Rocky release and extracted placement has the
> fixed behavior since placement version 0.1.0 [3]. The fix will not clean up
> orphaned consumer records from before the fix was applied to the code. Such
> records must be cleaned up manually.

After talking to mporrato in #rhos-compute, I realized there is a simpler way to manually cleanup orphaned consumers table records. Noting it here:

  delete from nova_api.consumers where nova_api.uuid not in (select nova_api.instance_uuid from nova_api.instance_mappings);

> [1] https://bugs.launchpad.net/nova/+bug/1780799
> [2] https://review.opendev.org/581086
> [3]
> https://github.com/openstack/placement/commit/
> 580a3b1da67ce79ce7408d0540976e59e43020b9

Comment 24 melanie witt 2019-12-11 03:33:43 UTC
(In reply to melanie witt from comment #23)
> (In reply to melanie witt from comment #16)
> > (In reply to melanie witt from comment #14)
> > > The nova_api.consumers table actually belongs to the placement service,
> > > which is a separate extracted service starting in the Train release. Because
> > > the consumers table belongs to placement, nova does not take care of
> > > deleting anything in it during archive_deleted_rows. If one would like to
> > > clean up orphaned records in the consumers table, it must be done manually.
> > > To find what is safe to delete, one would need to script something like "if
> > > nova_api.consumers.uuid not found in nova.instances.uuid or
> > > nova_cell0.instances.uuid, delete consumer". If any matching record exists
> > > in nova.instances.uuid or nova_cell0.instances.uuid, you should not delete
> > > the record. Records should only be deleted if no reference to them can be
> > > found in nova.instances or nova_cell0.instances.
> > 
> > Update: I found that clean up of consumer records was a bug upstream [1]
> > that was fixed in Rocky [2]. So, integrated placement (in nova) will clean
> > up consumer records as of the Rocky release and extracted placement has the
> > fixed behavior since placement version 0.1.0 [3]. The fix will not clean up
> > orphaned consumer records from before the fix was applied to the code. Such
> > records must be cleaned up manually.
> 
> After talking to mporrato in #rhos-compute, I realized there is a simpler
> way to manually cleanup orphaned consumers table records. Noting it here:
> 
>   delete from nova_api.consumers where nova_api.uuid not in (select
> nova_api.instance_uuid from nova_api.instance_mappings);

Sorry, correction:

  delete from nova_api.consumers where nova_api.consumers.uuid not in (select nova_api.instance_mappings.instance_uuid from nova_api.instance_mappings);
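
With millions of orphaned rows, it may be gentler on the database to apply the same condition in batches rather than in one large statement. The following Python sketch shows one way to do that, using in-memory SQLite in place of nova_api. The batch size, the simplified schemas and the function names are assumptions. Like the statement above, it treats any consumer without a row in instance_mappings as an orphan.

```python
import sqlite3


def delete_orphaned_consumers(conn, batch_size=1000):
    """Delete consumers with no instance mapping, committing after each batch."""
    total = 0
    while True:
        ids = [
            row[0]
            for row in conn.execute(
                "SELECT c.id FROM consumers AS c "
                "WHERE c.uuid NOT IN (SELECT im.instance_uuid FROM instance_mappings AS im) "
                "LIMIT ?",
                (batch_size,),
            )
        ]
        if not ids:
            return total
        conn.executemany("DELETE FROM consumers WHERE id = ?", [(i,) for i in ids])
        conn.commit()  # keep each transaction small
        total += len(ids)


def build_demo_db():
    """Five consumers, two of which still have an instance mapping."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE consumers (id INTEGER PRIMARY KEY, uuid TEXT NOT NULL)")
    conn.execute("CREATE TABLE instance_mappings (instance_uuid TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO consumers (id, uuid) VALUES (?, ?)",
        [(1, "u1"), (2, "u2"), (3, "u3"), (4, "u4"), (5, "u5")],
    )
    conn.executemany("INSERT INTO instance_mappings VALUES (?)", [("u2",), ("u4",)])
    conn.commit()
    return conn


if __name__ == "__main__":
    print(delete_orphaned_consumers(build_demo_db(), batch_size=2))
```

Committing per batch means an interrupted run can simply be restarted, since rows that are already gone no longer match the condition.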

> > [1] https://bugs.launchpad.net/nova/+bug/1780799
> > [2] https://review.opendev.org/581086
> > [3]
> > https://github.com/openstack/placement/commit/
> > 580a3b1da67ce79ce7408d0540976e59e43020b9

Comment 32 errata-xmlrpc 2020-03-10 11:27:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0759

