Bug 2120749

Summary: Cinder backup does not clean up rbd volume snapshot (ceph)
Product: Red Hat OpenStack
Reporter: Matt Flusche <mflusche>
Component: openstack-cinder
Assignee: Eric Harney <eharney>
Status: CLOSED MIGRATED
QA Contact: Evelina Shames <eshames>
Severity: high
Docs Contact: RHOS Documentation Team <rhos-docs>
Priority: medium
Version: 16.2 (Train)
CC: alolivei, eharney, eprado, mlaniel, rvaradar
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2024-12-10 14:07:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Matt Flusche 2022-08-23 16:36:42 UTC
Description of problem:
Cinder backup does not clean up rbd volume snapshot (ceph)

Version-Release number of selected component (if applicable):
16.2 current

How reproducible:
100%

Steps to Reproduce:

Lab Example:

$ openstack volume create  --size 10 test01
+---------------------+--------------------------------------+
| Field               | Value                                |
+---------------------+--------------------------------------+
| attachments         | []                                   |
| availability_zone   | nova                                 |
| bootable            | false                                |
| consistencygroup_id | None                                 |
| created_at          | 2022-08-23T16:19:38.000000           |
| description         | None                                 |
| encrypted           | False                                |
| id                  | 03ccce5c-0572-4855-9bab-0bb706d2793f |
| migration_status    | None                                 |
| multiattach         | False                                |
| name                | test01                               |
| properties          |                                      |
| replication_status  | None                                 |
| size                | 10                                   |
| snapshot_id         | None                                 |
| source_volid        | None                                 |
| status              | creating                             |
| type                | tripleo                              |
| updated_at          | None                                 |
| user_id             | 13deedfd3f1a4b609794840cc1f9367e     |
+---------------------+--------------------------------------+

$ openstack volume list |grep test01
| 03ccce5c-0572-4855-9bab-0bb706d2793f | test01      | available |   10 |                                |

- Verify volume in ceph:

# rbd du -p volumes volume-03ccce5c-0572-4855-9bab-0bb706d2793f
NAME                                        PROVISIONED USED 
volume-03ccce5c-0572-4855-9bab-0bb706d2793f      10 GiB  0 B 

- Create backup

$ openstack volume backup create test01
+-------+--------------------------------------+
| Field | Value                                |
+-------+--------------------------------------+
| id    | bd86b8c6-ba93-4a3f-ab84-b4954fa46d2f |
| name  | None                                 |
+-------+--------------------------------------+

- Now a snapshot exists in volumes pool:

# rbd du -p volumes volume-03ccce5c-0572-4855-9bab-0bb706d2793f
NAME                                                                                                            PROVISIONED USED 
volume-03ccce5c-0572-4855-9bab-0bb706d2793f.snap.1661271680.6115808      10 GiB  0 B 
volume-03ccce5c-0572-4855-9bab-0bb706d2793f                                                                          10 GiB  0 B 
<TOTAL>                                                                                                              10 GiB  0 B 

- Backup in backups pool:

# rbd du -p backups
NAME                                                                                                                                                        PROVISIONED USED 
volume-03ccce5c-0572-4855-9bab-0bb706d2793f.backup.bd86b8c6-ba93-4a3f-ab84-b4954fa46d2f.snap.1661271680.6115808      10 GiB  0 B 
volume-03ccce5c-0572-4855-9bab-0bb706d2793f.backup.bd86b8c6-ba93-4a3f-ab84-b4954fa46d2f                                                                          10 GiB  0 B 
<TOTAL>                                                                                                                                                          10 GiB  0 B 

- Remove backup:

$ openstack volume backup delete bd86b8c6-ba93-4a3f-ab84-b4954fa46d2f

- Snapshot still exists in volumes pool:

# rbd du -p volumes volume-03ccce5c-0572-4855-9bab-0bb706d2793f
NAME                                                                                                            PROVISIONED USED 
volume-03ccce5c-0572-4855-9bab-0bb706d2793f.snap.1661271680.6115808      10 GiB  0 B 
volume-03ccce5c-0572-4855-9bab-0bb706d2793f                                                                          10 GiB  0 B 
<TOTAL>                                                                                                              10 GiB  0 B 

- Backup has been removed in backups pool:

# rbd du -p backups
(nil)

- Create 2nd backup:

$ openstack volume backup create test01
+-------+--------------------------------------+
| Field | Value                                |
+-------+--------------------------------------+
| id    | 23a44964-e1b2-4e23-8730-b82e338d0086 |
| name  | None                                 |
+-------+--------------------------------------+

- Now we have multiple snapshots in volumes pool:

# rbd du -p volumes volume-03ccce5c-0572-4855-9bab-0bb706d2793f
NAME                                                                                                            PROVISIONED USED 
volume-03ccce5c-0572-4855-9bab-0bb706d2793f.snap.1661271680.6115808      10 GiB  0 B 
volume-03ccce5c-0572-4855-9bab-0bb706d2793f.snap.1661271809.5704668      10 GiB  0 B 
volume-03ccce5c-0572-4855-9bab-0bb706d2793f                                                                          10 GiB  0 B 
<TOTAL>                                                                                                              10 GiB  0 B 

- And a single backup

# rbd du -p backups
NAME                                                                                                                                                        PROVISIONED USED 
volume-03ccce5c-0572-4855-9bab-0bb706d2793f.backup.23a44964-e1b2-4e23-8730-b82e338d0086.snap.1661271809.5704668      10 GiB  0 B 
volume-03ccce5c-0572-4855-9bab-0bb706d2793f.backup.23a44964-e1b2-4e23-8730-b82e338d0086                                                                          10 GiB  0 B 
<TOTAL>                                                                                                                                                          10 GiB  0 B 

- These snapshots continue to build up in the volumes pool with each additional backup.
- I don't see any errors in cinder volume or backup logs:

# grep -i error /var/log/containers/cinder/cinder-volume.log
(nil)

# grep -i error /var/log/containers/cinder/cinder-backup.log
(nil)
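
The manual check above could be scripted. The following is a minimal sketch and not part of the original report. It assumes the output of `rbd snap ls <pool>/<image>` is piped in, that the snapshot name is the second whitespace-separated column of that output, and that leftover backup snapshots can be recognised by the `.snap.<timestamp>` suffix seen in the listings above. The function name `count_backup_snaps` is hypothetical.

```shell
#!/bin/sh
# Count snapshot names on stdin that end in ".snap.<seconds>[.<fraction>]",
# the suffix visible in the rbd du listings in this report.
# Assumption: input is the output of `rbd snap ls <pool>/<image>`,
# with the snapshot name in the second whitespace-separated column.
count_backup_snaps() {
    awk '$2 ~ /\.snap\.[0-9]+(\.[0-9]+)?$/ { n++ } END { print n + 0 }'
}

# Example (requires access to the Ceph cluster):
#   rbd snap ls volumes/volume-03ccce5c-0572-4855-9bab-0bb706d2793f | count_backup_snaps
```

Running it before and after each `openstack volume backup delete` would show whether the count in the volumes pool ever decreases.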

Comment 2 Eric Harney 2022-09-26 20:03:29 UTC
The root complaint here seems to be that backup snapshots are causing extra RBD storage to be used, and backup snapshots are not always deleted when backups are deleted.

https://review.opendev.org/c/openstack/cinder/+/810457 is an upstream attempt to address the storage usage by limiting the number of backup snapshots that are stored, deleting them when they are no longer necessary.  This means that less space is held, but backups take longer to restore.  This patch is still in review and needs to be assessed for whether these snapshots can always be successfully deleted.  (Currently, the Ceph backup driver attempts to delete the snapshots when backups are deleted, but it isn't clear whether this will always succeed.)
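
To illustrate the idea of limiting how many backup snapshots are kept, here is a rough sketch. It is an assumption-laden illustration only and does not reproduce the logic of the upstream patch. It assumes one snapshot name per line on stdin, each containing `.snap.<timestamp>` as in the listings in the description. The helper name `snaps_to_prune` and its `keep` argument are hypothetical.

```shell
#!/bin/sh
# Print the snapshot names that would be pruned if only the newest
# $1 snapshots were kept. Reads one snapshot name per line on stdin.
# Hypothetical helper: it only lists candidates and deletes nothing.
snaps_to_prune() {
    keep=$1
    # Prefix each name with the timestamp found after ".snap." as a sort key,
    # sort oldest first, then print everything except the newest $keep names.
    awk '{ i = index($0, ".snap."); if (i) print substr($0, i + 6) "\t" $0 }' |
        sort -n |
        awk -F'\t' -v keep="$keep" '
            { name[NR] = $2 }
            END { for (i = 1; i <= NR - keep; i++) print name[i] }'
}

# Example: list all but the newest snapshot name read from a file.
#   snaps_to_prune 1 < snapshot-names.txt
```

The sketch only prints candidates. Whether a given snapshot can always be deleted safely is the open question raised in this comment.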

Comment 3 Alexon Oliveira 2024-03-08 18:36:12 UTC
(In reply to Eric Harney from comment #2)
> The root complaint here seems to be that backup snapshots are causing extra
> RBD storage to be used, and backup snapshots are not always deleted when
> backups are deleted.
> 
> https://review.opendev.org/c/openstack/cinder/+/810457 is an upstream
> attempt to address the storage usage by limiting the number of backup
> snapshots that are stored, deleting them when they are no longer necessary. 
> This means that less space is held, but backups take longer to restore. 
> This patch is still in review and needs to be assessed for whether these
> snapshots can always be successfully deleted.  (Currently, the Ceph backup
> driver attempts to delete the snapshots when backups are deleted, but it
> isn't clear whether this will always succeed.)

Eric, what's the current status of this BZ? It's been more than a whole year without any interaction here. Thanks.