Description of problem:
When I start the CSI certification job [1] for the Cinder CSI driver, at least one snapshot remains undeleted in CI.

[1] https://github.com/openshift/release/pull/14243

How reproducible:
Always

Steps to Reproduce:
1. Start the CI job https://github.com/openshift/release/pull/14243
2. After it finishes, check the snapshots in CI with the `openstack volume snapshot list` command

Actual results:
At least one snapshot remains

Expected results:
There are no snapshots left over from the CI job
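For reference, the check in step 2 can be scripted. This is only a sketch (the function name is illustrative; it assumes the `openstack` CLI is installed and project credentials are loaded in the environment):

```shell
# Illustrative helper: fail if the CI job left any Cinder snapshots behind.
check_no_leftover_snapshots() {
  local leftover
  leftover=$(openstack volume snapshot list -f value -c ID)
  if [ -n "$leftover" ]; then
    echo "leftover snapshots: $leftover" >&2
    return 1
  fi
  return 0
}
```

Run it after the CI job completes; a non-zero exit code means at least one snapshot was not cleaned up.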
*** This bug has been marked as a duplicate of bug 1909136 ***
Reopening it as it has been observed while running the CSI test suite.

Versions:
OCP: 4.8.0-0.nightly-2021-06-03-055145
OSP: RHOS-16.1-RHEL-8-20210323.n.0
IPI installation.

There are two test cases that are failing:

- External Storage [Driver: cinder.csi.openstack.org] [Testpattern: Dynamic Snapshot (delete policy)] snapshottable[Feature:VolumeSnapshotDataSource] volume snapshot controller should check snapshot fields, check restore correctly works after modifying source data, check deletion
- External Storage [Driver: cinder.csi.openstack.org] [Testpattern: Pre-provisioned Snapshot (delete policy)] snapshottable[Feature:VolumeSnapshotDataSource] volume snapshot controller should check snapshot fields, check restore correctly works after modifying source data, check deletion

In a nutshell, the tests create a pod attached to a PVC, destroy the pod, and take a snapshot. They then create another pod using a PVC based on the previously taken snapshot, and finally destroy everything. The test fails while destroying the first PVC: because the underlying volume still has a snapshot, it cannot be deleted:

$ oc logs openstack-cinder-csi-driver-controller-6784688d86-ksk8n -n openshift-cluster-csi-drivers csi-driver
[...]
E0608 10:16:56.210435       1 utils.go:85] GRPC error: rpc error: code = Internal desc = DeleteVolume failed with error Bad request with: [DELETE https://overcloud.redhat.local:13776/v3/b20e10e10b514fb8a196b7734776b991/volumes/8fc77933-e2b5-4e0f-8cdd-5554b5bb0406], error message: {"badRequest": {"code": 400, "message": "Invalid volume: Volume status must be available or error or error_restoring or error_extending or error_managing and must not be migrating, attached, belong to a group, have snapshots or be disassociated from snapshots after volume transfer."}}

The volume and the snapshot are still present from the OSP perspective:

$ openstack volume show 8fc77933-e2b5-4e0f-8cdd-5554b5bb0406
+------------------------------+-------------------------------------------------+
| Field                        | Value                                           |
+------------------------------+-------------------------------------------------+
| attachments                  | []                                              |
| availability_zone            | cinderAZ0                                       |
| bootable                     | false                                           |
| consistencygroup_id          | None                                            |
| created_at                   | 2021-06-08T10:13:08.000000                      |
| description                  | Created by OpenStack Cinder CSI driver          |
| encrypted                    | False                                           |
| id                           | 8fc77933-e2b5-4e0f-8cdd-5554b5bb0406            |
| multiattach                  | False                                           |
| name                         | pvc-a85e9a2b-8356-4136-a5e0-26a54df95f36        |
| os-vol-tenant-attr:tenant_id | b20e10e10b514fb8a196b7734776b991                |
| properties                   | cinder.csi.openstack.org/cluster='ostest-wjzt5' |
| replication_status           | None                                            |
| size                         | 1                                               |
| snapshot_id                  | None                                            |
| source_volid                 | None                                            |
| status                       | available                                       |
| type                         | tripleo                                         |
| updated_at                   | 2021-06-08T10:13:56.000000                      |
| user_id                      | 6752583b0f3141bcbd63848cceb9e67e                |
+------------------------------+-------------------------------------------------+

$ openstack volume snapshot show snapshot-8b9c29c0-8c75-4f88-ad83-f36fb3cfca85
+--------------------------------------------+-----------------------------------------------+
| Field                                      | Value                                         |
+--------------------------------------------+-----------------------------------------------+
| created_at                                 | 2021-06-08T10:13:29.000000                    |
| description                                | Created by OpenStack Cinder CSI driver        |
| id                                         | 3fe214d0-eb63-4bea-bbc4-747e80b4f310          |
| name                                       | snapshot-8b9c29c0-8c75-4f88-ad83-f36fb3cfca85 |
| os-extended-snapshot-attributes:progress   | 100%                                          |
| os-extended-snapshot-attributes:project_id | b20e10e10b514fb8a196b7734776b991              |
| properties                                 |                                               |
| size                                       | 1                                             |
| status                                     | available                                     |
| updated_at                                 | 2021-06-08T10:14:07.000000                    |
| volume_id                                  | 8fc77933-e2b5-4e0f-8cdd-5554b5bb0406          |
+--------------------------------------------+-----------------------------------------------+

However, on OCP the resources have disappeared (neither the PVC nor the VolumeSnapshot exists):

$ oc get pvc -A
NAMESPACE   NAME             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                AGE
demo        pvc-cinder-az0   Bound    pvc-f5b28358-c883-415f-95d5-51ef458e85a8   1Gi        RWO            topology-aware-cinder-az0   4d19h
demo        pvc-cinder-az1   Bound    pvc-174a5a6e-e66e-4b30-9f7d-e085e2039f29   1Gi        RWO            topology-aware-cinder-az1   4d19h

$ oc get volumesnapshot -A
No resources found

must-gather and test-suite logs on http://file.rdu.redhat.com/rlobillo/BZ1917710.tgz
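The ordering constraint behind the 400 error can be worked around manually: Cinder refuses to delete a volume that still has snapshots, so the snapshots have to be deleted first. A hedged cleanup sketch (the function name is illustrative, and `openstack volume snapshot list --volume` assumes a reasonably recent python-openstackclient):

```shell
# Illustrative cleanup: Cinder rejects deleting a volume that still has
# snapshots, so remove the volume's snapshots before the volume itself.
delete_volume_with_snapshots() {
  local vid=$1 sid
  for sid in $(openstack volume snapshot list --volume "$vid" -f value -c ID); do
    openstack volume snapshot delete "$sid"
  done
  openstack volume delete "$vid"
}
```

This is only a manual unstick procedure for leftover resources like the volume above; the actual fix has to make the driver (or the tests) delete in this order.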
@rlobillo This bug was closed as a duplicate of Bug 1909136, which is still open. Do you have reason to believe it's not a duplicate?
Hello @Pierre. The duplicate BZ is about snapshots not being deleted during cluster removal. This one, however, is about two failing test cases: the test fails while destroying the first PVC because the volume still has a snapshot and therefore cannot be deleted.
Understood. Both bugs may have the same root cause (the Cinder CSI driver not being able to mark snapshots with some identifier), but once that limitation is lifted, each bug may require a separate fix (and separate verification). Thank you for the explanation.
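If snapshots carried the same cluster property the driver already sets on volumes (`cinder.csi.openstack.org/cluster='ostest-wjzt5'` in the volume output above), cleanup tooling could locate them. This is only a sketch of that idea, not the actual fix (the function name is illustrative; the real change belongs in the CSI driver):

```shell
# Illustrative: tag a snapshot with the cluster identifier the driver
# already puts on volumes, so cleanup tooling can find it later.
tag_snapshot_with_cluster() {
  local sid=$1 cluster=$2
  openstack volume snapshot set \
    --property "cinder.csi.openstack.org/cluster=${cluster}" "$sid"
}
```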
IIUC this is a Cinder issue that is being tracked at https://bugzilla.redhat.com/show_bug.cgi?id=1989680.
Removing the Triaged keyword because:
* the target release value is missing
* the QE automation assessment (flag qe_test_coverage) is missing
*** Bug 1909136 has been marked as a duplicate of this bug. ***
OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira. https://issues.redhat.com/browse/OCPBUGS-8838