Bug 1940476

Summary: Backingstore deletion hangs
Product: [Red Hat Storage] Red Hat OpenShift Container Storage
Component: Multi-Cloud Object Gateway
Version: 4.7
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Keywords: AutomationBackLog
Target Release: OCS 4.7.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: 4.7.0-336.ci
Doc Type: No Doc Update
Type: Bug
Reporter: Ben Eli <belimele>
Assignee: Romy Ayalon <rayalon>
QA Contact: aberner
CC: ebenahar, etamir, muagarwa, nbecker, ocs-bugs
Last Closed: 2021-05-19 09:20:45 UTC

Description Ben Eli 2021-03-18 14:10:36 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
In test_multiregion_mirror, we write objects to a bucket backed by two backingstores. We take each backingstore down, one at a time, try to read from the bucket, and verify object integrity.
The test fails for reasons that are still unclear to us.
Afterwards, we try to clean up all test resources. The OBC and bucketclass are deleted successfully, but the backingstore deletion hangs, and the backingstore still exists *hours* after the command was sent.

We have logs from the run, but some of them were overwritten, because the test failed twice: once while writing the objects, and a second time because the backingstore deletion timed out.
When inspecting the logs, *please note each file's creation time*.
The first set of logs was collected around 2:07, and the second set around 2:18. The second set was collected *post-cleanup* and *does not* reflect the state of the bucket/backingstores at the time of the error.

Version of all relevant components (if applicable):
v4.7.0-294.ci
Also seen in v4.8.0-303.ci

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
No

Is there any workaround available to the best of your knowledge?
It's possible to remove the finalizer from the BackingStore resource and delete it again, but this bypasses the operator's cleanup and might lead to problems in the system.

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
5

Can this issue be reproduced?
Yes, frequency unknown

Can this issue be reproduced from the UI?
Unknown

If this is a regression, please provide more details to justify this:
Unknown

Steps to Reproduce:
1. Create two AWS backingstores in different regions
2. Create a bucketclass that uses them with a Mirror policy
3. Create an OBC that uses the bucketclass
4. Write objects to the OBC
5. Run into a NoSuchBucket error
6. Delete the OBC, then the bucketclass
7. Verify the OBC and bucketclass were removed
8. Remove the backingstores. The deletion hangs with:
   Could not delete BackingStore "aws-backingstore-9fefe5ce4d524fb1879746f" in namespace "openshift-storage" as it is being used by one or more buckets


Actual results:
Backingstore deletion hangs because the backingstore is still in use by a bucket that was already deleted

Expected results:
Backingstore deletion succeeds
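To make the reported state concrete, here is a toy Python model. It is hypothetical: the class and method names are illustrative and are not NooBaa's code. It assumes that deleting the OBC only marks NooBaa's internal bucket record for background cleanup, and that the backingstore deletion check counts every bucket record referencing the store, including records still being cleaned up. Under those assumptions, the backingstore stays "in use" until the background cleanup finishes.

```python
from dataclasses import dataclass, field


class BackingStoreInUse(Exception):
    """Raised when a backingstore is still referenced by a bucket record."""


@dataclass
class BucketRecord:
    stores: set              # backingstores used by the bucket's placement
    deleting: bool = False   # set when the user deletes the OBC/bucket


@dataclass
class ToySystem:
    # Hypothetical model, not NooBaa's data structures.
    buckets: dict = field(default_factory=dict)      # bucket name -> BucketRecord
    backingstores: set = field(default_factory=set)  # existing backingstore names

    def delete_bucket(self, name: str) -> None:
        # The user-facing delete returns at once: the record is only marked,
        # and a background job is expected to empty and drop it later.
        self.buckets[name].deleting = True

    def finish_bucket_cleanup(self, name: str) -> None:
        # Called by the (hypothetical) background job once the bucket is empty.
        del self.buckets[name]

    def delete_backingstore(self, name: str) -> None:
        # Records marked `deleting` still count: their data may live on this store.
        users = sorted(b for b, rec in self.buckets.items() if name in rec.stores)
        if users:
            raise BackingStoreInUse(
                f"Could not delete BackingStore {name!r}: used by {users}"
            )
        self.backingstores.discard(name)
```

For example, after `delete_bucket("mirror-bucket")`, calling `delete_backingstore("aws-a")` keeps raising `BackingStoreInUse` until `finish_bucket_cleanup("mirror-bucket")` runs. If that cleanup never finishes, the deletion hangs as described in this report.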

Additional info:
Logs (please note the creation time: the 2:07 logs are outdated, the 2:18 logs are up to date)
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j001vi1cs33-t4a/j001vi1cs33-t4a_20210313T055042/logs/failed_testcase_ocs_logs_1615617722/test_multiregion_mirror_ocs_logs/

Comment 2 Romy Ayalon 2021-03-25 08:11:08 UTC
The issue here is that NooBaa deletes the bucket and its objects in the background, and the logs show an infinite loop of object deletion. This loop keeps the bucket from ever being deleted, so the backingstore genuinely cannot be deleted, because it is still attached to a bucket.
I also found that the objects that are never deleted are incomplete multipart uploads; this is fixed in the attached PR.
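A toy Python model of the loop described above follows. It is hypothetical: NooBaa's actual cleanup code and the attached PR may be structured differently. The cleanup job keeps going until the bucket holds nothing, where "nothing" includes incomplete multipart uploads, but the buggy variant only deletes completed objects. `max_rounds` is an illustrative bound that stands in for "forever" so the sketch terminates.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBucket:
    # Hypothetical model, not NooBaa's data structures.
    objects: set = field(default_factory=set)          # completed object keys
    pending_uploads: set = field(default_factory=set)  # incomplete multipart upload IDs

    def is_empty(self) -> bool:
        # Both kinds of entry hold data on the backingstores, so both count.
        return not self.objects and not self.pending_uploads


def cleanup(bucket: ToyBucket, abort_uploads: bool, max_rounds: int = 100) -> bool:
    """Background-style cleanup: repeat until the bucket is empty.

    Returns True once the bucket is empty, or False if it is still non-empty
    after max_rounds (the stand-in for an endless loop).
    """
    for _ in range(max_rounds):
        if bucket.is_empty():
            return True
        bucket.objects.clear()  # delete completed objects
        if abort_uploads:
            # Fixed behavior: also abort incomplete multipart uploads.
            bucket.pending_uploads.clear()
    return bucket.is_empty()
```

With `abort_uploads=False`, a bucket holding even one incomplete upload never becomes empty, so the bucket record (and the backingstore reference) is never released. With `abort_uploads=True`, the first round empties the bucket.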

Comment 5 aberner 2021-04-26 09:36:31 UTC
Verified via regression

Comment 8 errata-xmlrpc 2021-05-19 09:20:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2041