Bug 1917710 - Cinder snapshots remain in the system after e2e job
Summary: Cinder snapshots remain in the system after e2e job
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: ShiftStack Bugwatcher
QA Contact: Jon Uriarte
URL:
Whiteboard:
Duplicates: 1909136
Depends On:
Blocks:
Reported: 2021-01-19 09:01 UTC by Mike Fedosin
Modified: 2023-03-21 15:49 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-03-09 01:01:01 UTC
Target Upstream Version:
Embargoed:

Description Mike Fedosin 2021-01-19 09:01:44 UTC
Description of problem:
When I run the CSI certification job [1] for the Cinder CSI driver, at least one snapshot is left undeleted in CI.

[1] https://github.com/openshift/release/pull/14243

How reproducible:

Always

Steps to Reproduce:
1. Start the CI job https://github.com/openshift/release/pull/14243
2. After it's done, check the snapshots in CI with the `openstack volume snapshot list` command

Actual results:
At least one snapshot remains

Expected results:
There are no snapshots from the CI job
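
The check in step 2 could be scripted. What follows is only a minimal sketch, not part of the CI job. It assumes the listing was exported with `openstack volume snapshot list -f json`, that each row carries `ID`, `Name`, `Description` and `Status` keys, and that snapshots made by the driver carry the description "Created by OpenStack Cinder CSI driver" (as seen on the snapshot reported later in this bug). The name `leftover_csi_snapshots` is hypothetical.

```python
import json

# Assumption: description the Cinder CSI driver sets on snapshots it creates.
CSI_DESCRIPTION = "Created by OpenStack Cinder CSI driver"


def leftover_csi_snapshots(listing_json):
    """Return the rows of a snapshot listing that were created by the CSI driver.

    listing_json is assumed to be the JSON text printed by
    `openstack volume snapshot list -f json`.
    """
    rows = json.loads(listing_json)
    return [row for row in rows if row.get("Description") == CSI_DESCRIPTION]


# Illustrative input. The first row mirrors the snapshot reported in this bug;
# the second row is made up to show a snapshot that the filter skips.
sample_listing = json.dumps([
    {"ID": "3fe214d0-eb63-4bea-bbc4-747e80b4f310",
     "Name": "snapshot-8b9c29c0-8c75-4f88-ad83-f36fb3cfca85",
     "Description": "Created by OpenStack Cinder CSI driver",
     "Status": "available", "Size": 1},
    {"ID": "00000000-0000-0000-0000-000000000000",
     "Name": "manual-backup", "Description": "",
     "Status": "available", "Size": 1},
])

for row in leftover_csi_snapshots(sample_listing):
    print(row["ID"], row["Name"], row["Status"])
```

An empty result after the job would match the expected results above; any row returned is a leftover snapshot.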

Comment 3 Pierre Prinetti 2021-05-27 15:31:04 UTC

*** This bug has been marked as a duplicate of bug 1909136 ***

Comment 4 rlobillo 2021-06-08 13:44:17 UTC
Reopening, as this has been observed while running the CSI test suite.

Versions:
  OCP: 4.8.0-0.nightly-2021-06-03-055145
  OSP: RHOS-16.1-RHEL-8-20210323.n.0
IPI installation.

Two test cases are failing:

- External Storage [Driver: cinder.csi.openstack.org] [Testpattern: Dynamic Snapshot (delete policy)] snapshottable[Feature:VolumeSnapshotDataSource] volume snapshot controller  should check snapshot fields, check restore correctly works after modifying source data, check deletion

- External Storage [Driver: cinder.csi.openstack.org] [Testpattern: Pre-provisioned Snapshot (delete policy)] snapshottable[Feature:VolumeSnapshotDataSource] volume snapshot controller  should check snapshot fields, check restore correctly works after modifying source data, check deletion

In a nutshell, each test creates a pod attached to a PVC, destroys the pod, and takes a snapshot of the volume. It then creates a second pod that uses a PVC restored from that snapshot, and finally destroys everything. The test fails while destroying the first PVC, because its volume still has a snapshot and so cannot be deleted:

	$ oc logs openstack-cinder-csi-driver-controller-6784688d86-ksk8n -n openshift-cluster-csi-drivers csi-driver
	[...]
	E0608 10:16:56.210435       1 utils.go:85] GRPC error: rpc error: code = Internal desc = DeleteVolume failed with error Bad request with: [DELETE https://overcloud.redhat.local:13776/v3/b20e10e10b514fb8a196b7734776b991/volumes/8fc77933-e2b5-4e0f-8cdd-5554b5bb0406], error message: {"badRequest": {"code": 400, "message": "Invalid volume: Volume status must be available or error or error_restoring or error_extending or error_managing and must not be migrating, attached, belong to a group, have snapshots or be disassociated from snapshots after volume transfer."}}

From the OSP perspective, both the volume and the snapshot are still present:

	$ openstack volume show 8fc77933-e2b5-4e0f-8cdd-5554b5bb0406
	+------------------------------+-------------------------------------------------+
	| Field                        | Value                                           |
	+------------------------------+-------------------------------------------------+
	| attachments                  | []                                              |
	| availability_zone            | cinderAZ0                                       |
	| bootable                     | false                                           |
	| consistencygroup_id          | None                                            |
	| created_at                   | 2021-06-08T10:13:08.000000                      |
	| description                  | Created by OpenStack Cinder CSI driver          |
	| encrypted                    | False                                           |
	| id                           | 8fc77933-e2b5-4e0f-8cdd-5554b5bb0406            |
	| multiattach                  | False                                           |
	| name                         | pvc-a85e9a2b-8356-4136-a5e0-26a54df95f36        |
	| os-vol-tenant-attr:tenant_id | b20e10e10b514fb8a196b7734776b991                |
	| properties                   | cinder.csi.openstack.org/cluster='ostest-wjzt5' |
	| replication_status           | None                                            |
	| size                         | 1                                               |
	| snapshot_id                  | None                                            |
	| source_volid                 | None                                            |
	| status                       | available                                       |
	| type                         | tripleo                                         |
	| updated_at                   | 2021-06-08T10:13:56.000000                      |
	| user_id                      | 6752583b0f3141bcbd63848cceb9e67e                |
	+------------------------------+-------------------------------------------------+

	$ openstack volume snapshot show snapshot-8b9c29c0-8c75-4f88-ad83-f36fb3cfca85
	+--------------------------------------------+-----------------------------------------------+
	| Field                                      | Value                                         |
	+--------------------------------------------+-----------------------------------------------+
	| created_at                                 | 2021-06-08T10:13:29.000000                    |
	| description                                | Created by OpenStack Cinder CSI driver        |
	| id                                         | 3fe214d0-eb63-4bea-bbc4-747e80b4f310          |
	| name                                       | snapshot-8b9c29c0-8c75-4f88-ad83-f36fb3cfca85 |
	| os-extended-snapshot-attributes:progress   | 100%                                          |
	| os-extended-snapshot-attributes:project_id | b20e10e10b514fb8a196b7734776b991              |
	| properties                                 |                                               |
	| size                                       | 1                                             |
	| status                                     | available                                     |
	| updated_at                                 | 2021-06-08T10:14:07.000000                    |
	| volume_id                                  | 8fc77933-e2b5-4e0f-8cdd-5554b5bb0406          |
	+--------------------------------------------+-----------------------------------------------+

However, on OCP the resources have disappeared (there is neither a PVC nor a VolumeSnapshot).

	$ oc get pvc -A
	NAMESPACE   NAME             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                AGE
	demo        pvc-cinder-az0   Bound    pvc-f5b28358-c883-415f-95d5-51ef458e85a8   1Gi        RWO            topology-aware-cinder-az0   4d19h
	demo        pvc-cinder-az1   Bound    pvc-174a5a6e-e66e-4b30-9f7d-e085e2039f29   1Gi        RWO            topology-aware-cinder-az1   4d19h
	$ oc get volumesnapshot -A
	No resources found

The must-gather and test suite logs are available at http://file.rdu.redhat.com/rlobillo/BZ1917710.tgz
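
The 400 response quoted above is consistent with the volume and snapshot details: the volume is `available` and has no attachments, but snapshot 3fe214d0-eb63-4bea-bbc4-747e80b4f310 still references it through `volume_id`. The sketch below models only three of the conditions named in the error message (status, attachments, snapshots). It is not Cinder's implementation, and `deletion_blockers` is a hypothetical helper.

```python
# Statuses that the error message lists as acceptable for deletion.
DELETABLE_STATUSES = {
    "available", "error", "error_restoring", "error_extending", "error_managing",
}


def deletion_blockers(volume, snapshots):
    """List the reasons a volume delete would be rejected (simplified model)."""
    blockers = []
    if volume["status"] not in DELETABLE_STATUSES:
        blockers.append("status is " + volume["status"])
    if volume["attachments"]:
        blockers.append("volume is attached")
    for snap in snapshots:
        if snap["volume_id"] == volume["id"]:
            blockers.append("has snapshot " + snap["id"])
    return blockers


# Values taken from the `openstack volume show` and
# `openstack volume snapshot show` output in this comment.
volume = {
    "id": "8fc77933-e2b5-4e0f-8cdd-5554b5bb0406",
    "status": "available",
    "attachments": [],
}
snapshot = {
    "id": "3fe214d0-eb63-4bea-bbc4-747e80b4f310",
    "volume_id": "8fc77933-e2b5-4e0f-8cdd-5554b5bb0406",
}

print(deletion_blockers(volume, [snapshot]))
print(deletion_blockers(volume, []))
```

With these values the only blocker reported is the snapshot, and the list is empty once the snapshot is removed. In this model, the snapshot has to be deleted before its source volume.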

Comment 6 Pierre Prinetti 2021-07-23 08:52:11 UTC
@rlobillo 
This bug was closed as a duplicate of Bug 1909136, which is still open.
Do you have reason to believe it's not a duplicate?

Comment 7 rlobillo 2021-07-23 09:35:39 UTC
Hello @Pierre.

The duplicate BZ is about snapshots not being deleted during cluster removal. This one, however, is about two failing test cases: the test fails while destroying the first PVC, because its volume still has a snapshot and therefore cannot be deleted.

Comment 9 Pierre Prinetti 2021-07-23 10:06:39 UTC
Understood. Both bugs may have the same root cause in Cinder CSI not being able to mark snapshots with some identifier, but once that limitation is lifted each bug may require a separate fix (and separate verification).

Thank you for the explanation.
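
The data in comment 4 illustrates the limitation: the volume carries the property `cinder.csi.openstack.org/cluster='ostest-wjzt5'`, while the snapshot's `properties` field is empty. The following is a minimal sketch of a cleanup that selects resources by that property. The function `owned_by_cluster` and the dict layout are assumptions for illustration, not the installer's or the driver's code.

```python
CLUSTER_KEY = "cinder.csi.openstack.org/cluster"


def owned_by_cluster(resources, cluster_id):
    """Return the resources whose properties mark them as owned by cluster_id."""
    return [
        r for r in resources
        if r.get("properties", {}).get(CLUSTER_KEY) == cluster_id
    ]


# IDs and properties as shown in comment 4; the "kind" field is added here
# only to tell the two resources apart.
resources = [
    {"kind": "volume",
     "id": "8fc77933-e2b5-4e0f-8cdd-5554b5bb0406",
     "properties": {CLUSTER_KEY: "ostest-wjzt5"}},
    {"kind": "snapshot",
     "id": "3fe214d0-eb63-4bea-bbc4-747e80b4f310",
     "properties": {}},
]

found = owned_by_cluster(resources, "ostest-wjzt5")
print([r["kind"] for r in found])  # ['volume']
```

Only the volume is selected. A cleanup keyed on this property would not see the snapshot, and the snapshot in turn keeps the volume from being deleted.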

Comment 11 Martin André 2021-08-24 16:15:57 UTC
IIUC this is a Cinder issue that is being tracked at https://bugzilla.redhat.com/show_bug.cgi?id=1989680.

Comment 12 ShiftStack Bugwatcher 2021-11-25 16:11:14 UTC
Removing the Triaged keyword because:
* the target release value is missing
* the QE automation assessment (flag qe_test_coverage) is missing

Comment 13 Emilien Macchi 2021-12-15 16:40:56 UTC
*** Bug 1909136 has been marked as a duplicate of this bug. ***

Comment 16 Shiftzilla 2023-03-09 01:01:01 UTC
OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira.

https://issues.redhat.com/browse/OCPBUGS-8838
