1940476 – Backingstore deletion hangs

Summary: Backingstore deletion hangs

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenShift Container Storage
Classification:	Red Hat Storage
Component:	Multi-Cloud Object Gateway
Sub Component:
Version:	4.7
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	OCS 4.7.0
Assignee:	Romy Ayalon
QA Contact:	aberner
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-03-18 14:10 UTC by Ben Eli
Modified:	2021-06-01 08:49 UTC
CC List:	5 users
Fixed In Version:	4.7.0-336.ci
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-05-19 09:20:45 UTC
Embargoed:
Dependent Products:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	noobaa noobaa-core pull 6423	None	closed	fixed delete_object_version to also find and delete uncompleted multipart uploads	2021-03-30 13:14:42 UTC
Github	noobaa noobaa-core pull 6429	None	open	Backport to 5.7: fix delete_object_version	2021-03-30 13:16:24 UTC
Red Hat Product Errata	RHSA-2021:2041	None	None	None	2021-05-19 09:21:13 UTC

Description Ben Eli 2021-03-18 14:10:36 UTC

Description of problem (please be as detailed as possible and provide log
snippets):
In test_multiregion_mirror, we write objects to a bucket backed by two backingstores. We take each of them down, one at a time, try to read from the bucket, and verify the object integrity.
The test fails for reasons that are unclear to us at the moment.
However, after that, we try to clean up all test resources. The OBC and bucketclass are deleted successfully, but the backingstore deletion hangs, and the backingstore remains even *hours* after the command was sent.

We have logs from the run, but some of them were overwritten (because the test failed twice: once when writing the objects, and a second time because the backingstore deletion timed out).
When inspecting the logs, *please note the time each file was created*.
The first logs were collected around 2:07; the newer ones were collected around 2:18. The newer ones are *post-cleanup* and *do not* reflect the status of the bucket/backingstores at the time of the error.

Version of all relevant components (if applicable):
v4.7.0-294.ci
Also seen in v4.8.0-303.ci

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
No

Is there any workaround available to the best of your knowledge?
It's possible to remove the finalizer from the BackingStore custom resource and delete it again, but this might lead to problems in the system.
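
A minimal sketch of the finalizer workaround mentioned above, for illustration only. It just builds an `oc patch` command with a JSON merge patch that sets `metadata.finalizers` to null; it does not run anything. The helper name `build_remove_finalizers_cmd` is hypothetical, the resource name `backingstore` is assumed to resolve to the NooBaa BackingStore kind on the cluster, and the backingstore name is the one quoted in this report. As noted above, skipping the finalizer may leave the system in an inconsistent state.

```python
import json


def build_remove_finalizers_cmd(name, namespace="openshift-storage"):
    """Build (but do not run) an `oc patch` command that clears finalizers.

    Hypothetical helper: the resource name "backingstore" is an assumption.
    """
    # In a JSON merge patch, setting a field to null removes it.
    patch = json.dumps({"metadata": {"finalizers": None}})
    return [
        "oc", "patch", "backingstore", name,
        "-n", namespace,
        "--type", "merge",
        "-p", patch,
    ]


cmd = build_remove_finalizers_cmd("aws-backingstore-9fefe5ce4d524fb1879746f")
print(" ".join(cmd))
```

Returning the argument list instead of executing it lets the command be reviewed before it is passed to something like `subprocess.run`.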

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
5

Is this issue reproducible?
Yes, frequency unknown

Can this issue reproduce from the UI?
Unknown

If this is a regression, please provide more details to justify this:
Unknown

Steps to Reproduce:
1. Create two AWS backingstores in different regions
2. Create a bucketclass that uses them with a Mirror policy
3. Create an OBC that uses the bucketclass
4. Write objects to the OBC
5. Run into NoSuchBucket error
6. Delete the bucket, then the bucketclass
7. Verify the OBC and bucketclass were removed
8. Remove the backingstores. Deletion hangs with the following error:
Could not delete BackingStore "aws-backingstore-9fefe5ce4d524fb1879746f" in namespace "openshift-storage" as it is being used by one or more buckets
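
The hang in step 8 only becomes a test failure when something bounds the wait. A minimal sketch of such a bounded wait follows; it is not the actual test code, and the `wait_for_deletion` helper and the `exists` callback are hypothetical names. The callback is assumed to return True for as long as the backingstore is still present.

```python
import time


def wait_for_deletion(exists, timeout_s, interval_s=1.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll exists() until it returns False or timeout_s elapses.

    Returns True if the resource disappeared, False on timeout.
    """
    deadline = clock() + timeout_s
    while exists():
        if clock() >= deadline:
            return False  # still present: this is the "hang" case
        sleep(interval_s)
    return True


# A resource that never goes away times out; one that is gone returns at once.
print(wait_for_deletion(lambda: True, timeout_s=0))
print(wait_for_deletion(lambda: False, timeout_s=0))
```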

Actual results:
Backingstore deletion hangs because the backingstore is reported as still in use by a bucket that was already deleted

Expected results:
Backingstore deletion succeeds

Additional info:
Logs (please note the time of creation: the 2:07 logs are outdated, the 2:18 logs are up to date):
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j001vi1cs33-t4a/j001vi1cs33-t4a_20210313T055042/logs/failed_testcase_ocs_logs_1615617722/test_multiregion_mirror_ocs_logs/

Comment 2 Romy Ayalon 2021-03-25 08:11:08 UTC

The issue here is that the deletion of the bucket and its objects in noobaa is handled in the background, and I see in the logs that there is an infinite loop of object deletion. This infinite loop keeps the bucket from being deleted, so the backingstore really cannot be deleted because it is still attached to a bucket.
Also, I found that the objects that are not deleted are uncompleted multipart uploads. This is fixed in the attached PR.
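
The failure mode described above can be illustrated with a toy model. This is a sketch under stated assumptions, not noobaa-core code: the `ObjectMD` class, the `include_uncompleted` flag, and the `empty_bucket` loop are all hypothetical, and only the name `delete_object_version` is borrowed from the PR title. It shows how a cleanup loop never empties a bucket when the delete path cannot find uncompleted multipart uploads, and how it finishes once those are included.

```python
from dataclasses import dataclass


@dataclass
class ObjectMD:
    key: str
    upload_completed: bool  # False models an uncompleted multipart upload


def delete_object_version(store, key, include_uncompleted):
    """Delete one entry by key; return True if something was deleted."""
    for i, obj in enumerate(store):
        if obj.key != key:
            continue
        if obj.upload_completed or include_uncompleted:
            del store[i]
            return True
    return False


def empty_bucket(store, include_uncompleted, max_passes=5):
    """Toy background cleanup.

    Returns the number of passes needed to empty the bucket, or None if it
    is still non-empty after max_passes (standing in for the endless loop).
    """
    for attempt in range(1, max_passes + 1):
        for obj in list(store):
            delete_object_version(store, obj.key, include_uncompleted)
        if not store:
            return attempt
    return None


def make_bucket():
    return [ObjectMD("a", True), ObjectMD("b", False)]


stuck = empty_bucket(make_bucket(), include_uncompleted=False)
fixed = empty_bucket(make_bucket(), include_uncompleted=True)
print(stuck, fixed)
```

In this model the bucket that never empties is what keeps the backingstore "in use", which matches the error seen when deleting it.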

Comment 5 aberner 2021-04-26 09:36:31 UTC

Verified via regression

Comment 8 errata-xmlrpc 2021-05-19 09:20:45 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2041
