Description of problem: There are a large number of orphaned volumes in the Vexxhost CI tenant that are not tied to any running cluster. MOC does not seem to have the problem (MOC runs some 4.6 presubmit jobs, while Vexxhost runs the 4.6 periodic jobs + some presubmit). It appears the job successfully deleted the cluster but somehow left volumes behind. We need to understand what causes the leaks and fix it. We've observed the 4.6 job leaking about 75 volumes per day since roughly 9/16. We have a quota of 200 volumes on Vexxhost. Hitting the quota causes the jobs to fail with: Sep 21 13:19:56.303: INFO: cinder output: ERROR: VolumeLimitExceeded: Maximum number of volumes allowed (200) exceeded for quota 'volumes'. (HTTP 413) (Request-ID: req-55b7a487-8663-4376-836c-9349bf30ea92)
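For a rough sense of how quickly this bites, here is a minimal sketch of the arithmetic, using the figures above (about 75 leaked volumes per day, quota of 200). The function name `days_until_quota` and the `in_use` parameter are made up for illustration, and the calculation assumes a constant leak rate with no pruning in between.

```python
import math


def days_until_quota(quota, leaked_per_day, in_use=0):
    """Return the number of whole days after which leaked volumes
    reach or exceed the quota, assuming a constant leak rate.

    `in_use` is a hypothetical parameter for volumes already present.
    """
    if leaked_per_day <= 0:
        raise ValueError("leaked_per_day must be positive")
    remaining = quota - in_use
    if remaining <= 0:
        return 0
    return math.ceil(remaining / leaked_per_day)


# Figures reported in this bug: quota of 200, roughly 75 leaks per day.
DAYS = days_until_quota(quota=200, leaked_per_day=75)
```

Under these assumptions an empty tenant reaches the quota on the third day, so the tenant would need pruning at least every couple of days until the leak is fixed.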
MOC does have the same problem; I probably pruned them right before you checked.
The fact that [1] is still in flight upstream to help with debugging suggests it may be some time before we have a handle on this. Leaking volumes is not great, but it also seems unlikely to be severe enough to block 4.6 going GA. Punting to 4.7; fixes can be backported to 4.6.z. [1]: https://github.com/kubernetes/kubernetes/pull/95003
I'm hopeful that [1] is going to fix the volume leak. The upstream tests remove the volumes using the `cinder delete <volume_name>` command, which appears to have been failing since we switched the cinder client from the Stein release to Train [2]. The error coming back is: "Delete for volume e2e-volumemode-9872 failed: Invalid filters all_tenants,name are found in query options". According to Jan, `cinder delete <volume_id>` doesn't yield such an error, and he has switched to volume IDs in his patch. It's good practice anyway to use resource IDs rather than names in OpenStack. [1] https://github.com/kubernetes/kubernetes/pull/95003 [2] https://github.com/openshift/installer/pull/4175
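To illustrate the "IDs rather than names" point, here is a minimal sketch of a cleanup helper that resolves names to IDs up front and only ever passes IDs to `cinder delete`. This is not the code from Jan's patch. The helper names and the shape of the `volumes` list (dicts with `id` and `name` keys, obtained however you list volumes) are assumptions for illustration.

```python
import subprocess


def resolve_volume_ids(volumes, names):
    """Map volume names to IDs using an already-fetched volume listing.

    `volumes` is assumed to be a list of dicts with "id" and "name" keys.
    Names that are missing or ambiguous raise instead of guessing.
    """
    ids = []
    for name in names:
        matches = [v["id"] for v in volumes if v.get("name") == name]
        if len(matches) != 1:
            raise LookupError(
                f"expected exactly one volume named {name!r}, "
                f"found {len(matches)}"
            )
        ids.append(matches[0])
    return ids


def delete_commands(volume_ids):
    """Build one `cinder delete <volume_id>` argv per volume ID."""
    return [["cinder", "delete", volume_id] for volume_id in volume_ids]


def run_deletes(commands, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True makes a failed delete raise, so leaks are not silent.
            subprocess.run(argv, check=True)
```

For example, `run_deletes(delete_commands(resolve_volume_ids(volumes, ["e2e-volumemode-9872"])))` prints the delete command it would run. Failing loudly on a missing or duplicated name, and on a non-zero exit from `cinder delete`, is a deliberate choice here, since a silently ignored delete failure is what let the volumes pile up.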
OpenStack CI has not leaked volumes since 2020-10-02T19:23:37.000000. Verified against both MOC and Vexxhost.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196