Bug 2002660 - E2E tests leave local PVs behind
Summary: E2E tests leave local PVs behind
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.8.z
Assignee: Tomas Smetana
QA Contact: Rohit Patil
URL:
Whiteboard:
Depends On: 1959445
Blocks: 1952931
Reported: 2021-09-09 12:59 UTC by Tomas Smetana
Modified: 2022-01-25 12:13 UTC
CC List: 4 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of: 1959445
Environment:
Last Closed: 2022-01-25 12:13:09 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2022:0172 0 None None None 2022-01-25 12:13:25 UTC

Description Tomas Smetana 2021-09-09 12:59:23 UTC
+++ This bug was initially created as a clone of Bug #1959445 +++

Description of problem:

After CSI migration tests finish, there are some leftover PVs in the cluster.

See https://testgrid.k8s.io/redhat-openshift-ocp-release-4.8-broken#periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-csi-migration

From https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-csi-migration/1391694972192821248:

+persistentvolume/local-pv9zb5m
+persistentvolume/local-pvpt5mt
 persistentvolume/pvc-e3b4f183-4c75-4d9c-9385-18b088abcd1b
 persistentvolume/pvc-f15067db-d6d5-4c94-9079-d28bc37ed925
+persistentvolume/pvc-f9c9f7d0-51b8-402a-80f8-4bbc1982a6d0
ERROR: Timed out waiting for PVs to get deleted.
ERROR: It seems that some test left some PVs behind.
ERROR: Check the diff between expected and existing PVs above.

So three PVs were left behind and not deleted. Two of them are local volumes; the third looks like a CSI mock volume:

    name: pvc-f9c9f7d0-51b8-402a-80f8-4bbc1982a6d0
...
    csi:
      driver: csi-mock-e2e-csi-mock-volumes-7168
...
    phase: Released

(from https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-csi-migration/1391694972192821248/artifacts/e2e-aws-csi-migration/storage-pv-check/artifacts/pvs-29.yaml)
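The leftover check in the job log above boils down to a set difference between the PVs that existed before the tests and the PVs that exist afterwards. A minimal sketch of that idea follows; the function name `leftover_pvs` is hypothetical, the PV names are copied from the log, and this is not the actual CI step, only an illustration of the comparison it reports.

```python
def leftover_pvs(expected, existing):
    """Return PV names present in `existing` but not in `expected`, sorted."""
    return sorted(set(existing) - set(expected))


# Names copied from the job log; "expected" is the state before the tests ran.
expected = [
    "persistentvolume/pvc-e3b4f183-4c75-4d9c-9385-18b088abcd1b",
    "persistentvolume/pvc-f15067db-d6d5-4c94-9079-d28bc37ed925",
]
existing = expected + [
    "persistentvolume/local-pv9zb5m",
    "persistentvolume/local-pvpt5mt",
    "persistentvolume/pvc-f9c9f7d0-51b8-402a-80f8-4bbc1982a6d0",
]

leaked = leftover_pvs(expected, existing)
for name in leaked:
    # Mirror the "+" prefix used for unexpected PVs in the log.
    print("+" + name)
```

An empty result means every PV created during the run was cleaned up.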

How reproducible:
Always

Steps to Reproduce:
Run openshift conformance/parallel tests against OCP on AWS.

CSI migration is enabled in the linked CI jobs. IMO it should not affect local or mock volumes, but one never knows. We do not check for leftover PVs in other e2e jobs, so we don't know whether any are left behind on GCE or without migration.

Actual results:
At least two local PVs left behind (this is 100% reproducible)

Sometimes, a CSI mock PV left behind (started appearing in https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-csi-migration/1391332579759624192 ???)

Expected results:
No PVs left behind

--- Additional comment from Tomas Smetana on 2021-05-13 15:14:47 CEST ---

Just a minor update on this: both the local volumes have "persistentVolumeReclaimPolicy: Retain". The mock volume is being used by a snapshot test. Logs from the snapshot controller:

createSnapshotContent: Creating content for snapshot e2e-csi-mock-volumes-7168/snapshot-j2zf8 through the plugin ...
Added protection finalizer to persistent volume claim e2e-csi-mock-volumes-7168/snapshot-test-pvc
Keeping PVC e2e-csi-mock-volumes-7168/snapshot-test-pvc, it is used by snapshot e2e-csi-mock-volumes-7168/snapshot-j2zf8
...
checkandRemovePVCFinalizer[snapshot-j2zf8]: Remove Finalizer for PVC snapshot-test-pvc as it is not used by snapshots in creation
...
cannot get claim from snapshot [snapshot-9jkcn]: [failed to retrieve PVC snapshot-test-pvc from the lister: "persistentvolumeclaim \"snapshot-test-pvc\" not found"] Claim may be deleted already

I don't know yet how snapshot creation and PV protection interact with CSI migration.
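For context on the "Retain" remark above: Kubernetes does not delete a PV with reclaim policy Retain when its claim goes away, so something has to delete it explicitly. The sketch below picks such PVs out of a list of PV objects represented as plain dicts. The helper name `retained_pvs` is hypothetical, and the "Delete" policy on the mock PV is an assumption made for illustration, since the report does not state it.

```python
def retained_pvs(pvs):
    """Return the names of PVs whose reclaim policy is Retain."""
    return [
        pv["metadata"]["name"]
        for pv in pvs
        if pv.get("spec", {}).get("persistentVolumeReclaimPolicy") == "Retain"
    ]


pvs = [
    # Both local PVs are reported as Retain in the comment above.
    {"metadata": {"name": "local-pv9zb5m"},
     "spec": {"persistentVolumeReclaimPolicy": "Retain"}},
    {"metadata": {"name": "local-pvpt5mt"},
     "spec": {"persistentVolumeReclaimPolicy": "Retain"}},
    # Assumption: the report does not give the mock PV's policy.
    {"metadata": {"name": "pvc-f9c9f7d0-51b8-402a-80f8-4bbc1982a6d0"},
     "spec": {"persistentVolumeReclaimPolicy": "Delete"},
     "status": {"phase": "Released"}},
]

print(retained_pvs(pvs))
```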

--- Additional comment from Tomas Smetana on 2021-05-25 15:49:19 CEST ---

The local volumes come from these two tests: https://github.com/kubernetes/kubernetes/blob/master/test/e2e/storage/persistent_volumes-local.go#L375-L381. They appear to create two local volumes in ginkgo.BeforeEach that get cleaned up in ginkgo.AfterEach, but the test function itself (https://github.com/kubernetes/kubernetes/blob/7229ea343dd649f9a6c20fa1fd6b13e602f3f082/test/e2e/storage/persistent_volumes-local.go#L719) also creates a volume, and that one never seems to be removed.
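The leak pattern described above can be illustrated with a small, language-neutral sketch (written in Python here rather than Go, and not the code of the upstream test or its fix). `FakeCluster`, `VolumeTracker`, and the PV names are all hypothetical stand-ins: setup registers its volumes for cleanup, while the test body creates one more volume that nothing tracks.

```python
class FakeCluster:
    """Hypothetical stand-in for the API server: just a set of PV names."""

    def __init__(self):
        self.pvs = set()

    def create_pv(self, name):
        self.pvs.add(name)

    def delete_pv(self, name):
        self.pvs.discard(name)


class VolumeTracker:
    """Records every PV it creates so cleanup can delete all of them."""

    def __init__(self, cluster):
        self.cluster = cluster
        self.created = []

    def create(self, name):
        self.cluster.create_pv(name)
        self.created.append(name)

    def cleanup(self):
        while self.created:
            self.cluster.delete_pv(self.created.pop())


def run_leaky(cluster):
    setup = VolumeTracker(cluster)
    setup.create("local-pv-a")        # setup step (BeforeEach)
    setup.create("local-pv-b")
    cluster.create_pv("local-pv-c")   # test body: created but never tracked
    setup.cleanup()                   # teardown step (AfterEach)


def run_fixed(cluster):
    tracker = VolumeTracker(cluster)
    tracker.create("local-pv-a")
    tracker.create("local-pv-b")
    tracker.create("local-pv-c")      # test body registers its volume too
    tracker.cleanup()


leaky = FakeCluster()
run_leaky(leaky)
fixed = FakeCluster()
run_fixed(fixed)
print(sorted(leaky.pvs), sorted(fixed.pvs))
```

Routing every creation through the same tracker is one way to make sure teardown sees volumes created in the test body as well as those created in setup.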

--- Additional comment from Tomas Smetana on 2021-05-25 16:49:06 CEST ---

Local volumes leak upstream fix:
https://github.com/kubernetes/kubernetes/pull/102292

--- Additional comment from Fabio Bertinatto on 2021-06-02 09:10:02 CEST ---



--- Additional comment from Tomas Smetana on 2021-09-09 14:57:11 CEST ---

This was merged into 4.9 with the rebase to Kubernetes 1.22.

Comment 2 Tomas Smetana 2022-01-18 16:17:44 UTC
This has been merged into 4.8 with the rebase to the latest 1.21. The PR is no longer needed: moving to MODIFIED.

Comment 6 Tomas Smetana 2022-01-21 10:15:33 UTC
This requires rebase of the openshift/origin 4.8 branch.

Comment 8 errata-xmlrpc 2022-01-25 12:13:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.8.28 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0172

