Description of problem (please be as detailed as possible and provide log snippets):

In test_multiregion_mirror, we write objects to a bucket backed by two backingstores. We take each backingstore down, one at a time, try to read from the bucket, and verify object integrity. The test fails for reasons that are unclear to us at the moment. Afterwards, we try to clean up all test resources. The OBC and bucketclass are deleted successfully, but the backingstore deletion hangs, and the backingstore remains even *hours* after the command was sent.

We have logs from the run, but some of them were overwritten (because the test failed twice - once when writing the objects, and again when the backingstore deletion timed out). When inspecting the logs, *please note the time each file was created*. The first logs were collected around 2:07, the newer ones around 2:18. The second set is *post-cleanup* and does *not* reflect the status of the bucket/backingstores at the time of the error.

Version of all relevant components (if applicable):
v4.7.0-294.ci
Also seen in v4.8.0-303.ci

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
No

Is there any workaround available to the best of your knowledge?
It's possible to remove the finalizer from the CRD and delete again, but this might lead to problems in the system.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
5

Can this issue be reproduced?
Yes, frequency unknown

Can this issue be reproduced from the UI?
Unknown

If this is a regression, please provide more details to justify this:
Unknown

Steps to Reproduce:
1. Create two AWS backingstores in different regions
2. Create a bucketclass that uses them with a Mirror policy
3. Create an OBC that uses the bucketclass
4. Write objects to the OBC
5. Run into a NoSuchBucket error
6. Delete the bucket, then the bucketclass
7. Verify the OBC and bucketclass were removed
8. Remove the backingstores. Deletion hangs: `Could not delete BackingStore "aws-backingstore-9fefe5ce4d524fb1879746f" in namespace "openshift-storage" as it is being used by one or more buckets`

Actual results:
Backingstore deletion hangs because the backingstore is reported as being used by a bucket that was already deleted.

Expected results:
Backingstore deletion succeeds.

Additional info:
Logs (please note the time of creation - the 2:07 logs are outdated, the 2:18 logs are up-to-date):
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j001vi1cs33-t4a/j001vi1cs33-t4a_20210313T055042/logs/failed_testcase_ocs_logs_1615617722/test_multiregion_mirror_ocs_logs/
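For reference, the finalizer-removal workaround mentioned above might look roughly like the sketch below. The backingstore name here is a placeholder (in a real run it would be the stuck resource from the error message), and the `oc` commands are commented out because they require cluster access; this is not a recommended procedure, as noted, since it can leave orphaned state in the system.

```shell
# Hypothetical names - substitute the actual stuck backingstore.
BS_NAME="aws-backingstore-example"
NS="openshift-storage"

# Inspect which finalizers are blocking the delete (requires a cluster):
# oc get backingstore "$BS_NAME" -n "$NS" -o jsonpath='{.metadata.finalizers}'

# Strip the finalizers so the pending delete can complete:
# oc patch backingstore "$BS_NAME" -n "$NS" --type=merge -p '{"metadata":{"finalizers":[]}}'

echo "patch target: ${NS}/${BS_NAME}"
```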
The issue here is that the deletion of the bucket and its objects in NooBaa is handled in the background, and the logs show an infinite loop of object deletion. This loop keeps the bucket from ever being deleted, so the backingstore really cannot be deleted, because it is still attached to a bucket. I also found that the objects that fail to delete are incomplete multipart uploads; fixed in the attached PR.
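To illustrate the kind of leftover state involved: S3-compatible stores track in-progress multipart uploads separately from completed objects, and an upload that was started but never completed or aborted can keep a bucket "non-empty". The sketch below is not NooBaa's actual fix - it is a minimal illustration, using plain dicts shaped like the `Uploads` entries an S3 `ListMultipartUploads` call returns, of selecting stale uploads that would then be aborted (with a real client, via `abort_multipart_upload`).

```python
from datetime import datetime, timedelta, timezone

def stale_multipart_uploads(uploads, now, max_age=timedelta(hours=1)):
    """Return (Key, UploadId) pairs for uploads older than max_age.

    `uploads` mimics the 'Uploads' list from S3 ListMultipartUploads:
    each entry carries 'Key', 'UploadId', and 'Initiated' (a datetime).
    """
    return [
        (u["Key"], u["UploadId"])
        for u in uploads
        if now - u["Initiated"] > max_age
    ]

# Example: one stale upload left behind by a failed write, one fresh one.
now = datetime(2021, 3, 13, 2, 18, tzinfo=timezone.utc)
uploads = [
    {"Key": "obj-a", "UploadId": "id-1",
     "Initiated": now - timedelta(hours=3)},
    {"Key": "obj-b", "UploadId": "id-2",
     "Initiated": now - timedelta(minutes=5)},
]
print(stale_multipart_uploads(uploads, now))
# -> [('obj-a', 'id-1')]
# With a real S3 client, each returned pair would be passed to
# abort_multipart_upload(Bucket=..., Key=key, UploadId=upload_id).
```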
Verified via regression
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2041