Bug 1871408 - Noobaa falsely reports AWS endpoint as having IO_ERRORS if its name doesn't contain "amazonaws.com"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: Multi-Cloud Object Gateway
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: OCS 4.5.0
Assignee: Ohad
QA Contact: Ben Eli
URL:
Whiteboard:
Depends On:
Blocks: 1862755
 
Reported: 2020-08-23 06:44 UTC by Ben Eli
Modified: 2020-09-23 09:07 UTC
CC List: 7 users

Fixed In Version: 4.5.0-64.ci
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-15 10:19:07 UTC
Embargoed:




Links
Github noobaa/noobaa-core pull 6141 (closed): Add proper agent selection to block_store_s3 backed by aws endpoint (last updated 2020-09-04 18:02:24 UTC)
Github noobaa/noobaa-core pull 6142 (closed): Backport to 5.5: Add proper agent selection to block_store_s3 backed by aws endpoint (last updated 2020-09-04 18:02:24 UTC)
Red Hat Product Errata RHBA-2020:3754 (last updated 2020-09-15 10:19:17 UTC)

Description Ben Eli 2020-08-23 06:44:38 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
When creating a backingstore in a proxied AWS environment, where the backingstore connection has to go through the proxy, the backingstore might falsely be reported as having IO_ERRORS because of a check on NooBaa's backend.

Version of all relevant components (if applicable):
OCP 4.5.5, OCS 4.5.0-54.ci

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes, I can't create new AWS backingstores.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
It's not; proxied environments aren't yet fully working on NooBaa.

Steps to Reproduce:
1. Deploy a proxied cluster on AWS
2. Create a target bucket in a region that's different from the one the cluster resides in (see the sketch after these steps)
3. Create a backingstore that uses the target bucket
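
A minimal sketch of step 2 using the AWS SDK for JavaScript v2; the bucket name and regions are hypothetical, and credentials are assumed to come from the environment:

const AWS = require('aws-sdk');

// Hypothetical setup: the cluster lives in us-east-2, so the target
// bucket is created in a different region (us-west-1).
const s3 = new AWS.S3({ region: 'us-west-1' });

s3.createBucket({
    Bucket: 'my-cross-region-target-bucket', // hypothetical name
    CreateBucketConfiguration: { LocationConstraint: 'us-west-1' },
}).promise()
    .then(() => console.log('target bucket created'))
    .catch(err => console.error('bucket creation failed:', err));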


Actual results:
The backingstore hangs without a status for about 15 minutes, and then enters a REJECTED status with an IO_ERRORS message

Expected results:
Backingstore creation is successful

Additional info:

Comment 2 Nimrod Becker 2020-08-23 10:32:53 UTC
Moving out of 4.5; it's not a blocker.

Comment 3 Ben Eli 2020-08-23 11:04:26 UTC
But Nimrod - it blocks 1862755, which is a blocker

Comment 4 Nimrod Becker 2020-08-23 13:29:29 UTC
It's not; it's only when you use an endpoint which is not *.amazonaws.com, or at least that's how it's described. Am I missing something?

Comment 5 Ben Eli 2020-08-23 13:58:34 UTC
It does; BZ1862755 is about backingstores failing to communicate with their target buckets, which is the exact same case here.
When trying to connect to buckets that don't reside in the same region as the cluster, NooBaa has to connect to their endpoint via the proxy, which in turn changes the address.
Ohad should be able to explain this in more detail, but the bottom line is that NooBaa still can't connect to any proxied backingstores, even if both the cluster and the target buckets are on AWS.
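
To illustrate what "via the proxy" means here (a generic sketch of standard proxy-environment behavior, not NooBaa's actual code): in a proxied cluster, HTTPS_PROXY and NO_PROXY decide whether a host is reached directly or through the proxy, and a cross-region S3 endpoint is typically not on the exclusion list.

// Hypothetical helper, illustrative only: hosts listed in NO_PROXY
// (comma-separated names or suffixes) are reached directly; everything
// else goes through the proxy named by HTTPS_PROXY.
function should_proxy(hostname) {
    const no_proxy = (process.env.NO_PROXY || '')
        .split(',')
        .map(s => s.trim())
        .filter(Boolean);
    return Boolean(process.env.HTTPS_PROXY) &&
        !no_proxy.some(entry =>
            hostname === entry ||
            hostname.endsWith(entry.startsWith('.') ? entry : '.' + entry));
}

console.log(should_proxy('s3.us-west-1.amazonaws.com')); // true in a typical proxied cluster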

Comment 6 Elad 2020-08-24 08:29:33 UTC
Nimrod, from Ben's explanation to me, the endpoint name is being modified when communicating through a proxy, so the name is not something the admin has control over.
In addition to that, if this is the situation and this BZ blocks the verification of bug 1862755, we must treat this one (1871408) as a blocker for 4.5.

Comment 9 Ohad 2020-08-24 09:25:44 UTC
Elad, Ben and Nimrod
Please let me clarify:

The name not containing "amazonaws.com" was a preliminary assumption about the origin of the failure in communication with the AWS endpoints.
It was WRONG.

The actual issue here was that one call site that tries to communicate with the cloud did not take the proxy settings into account.
Apparently this call site was used when checking the health of a backing store, which resulted in an IO_ERRORS status.

I fixed this issue in an upstream PR (see links section), and Ben and I manually ran the test on a patched proxied environment, where it finished successfully.
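
As a rough sketch of the kind of fix described above (illustrative only, not the actual PR code; the helper and endpoint names are hypothetical), using the aws-sdk v2 and https-proxy-agent packages: the S3 client used by the block store is handed an agent that honors the proxy settings instead of always dialing directly.

const AWS = require('aws-sdk');
const https = require('https');
const { HttpsProxyAgent } = require('https-proxy-agent');

// Hypothetical helper: choose an HTTP(S) agent according to the proxy
// settings. Without this, a call site tries a direct connection, hangs
// in a proxied cluster, and the health check reports IO_ERRORS.
function select_agent() {
    const proxy = process.env.HTTPS_PROXY || process.env.https_proxy;
    return proxy
        ? new HttpsProxyAgent(proxy)            // tunnel through the proxy
        : new https.Agent({ keepAlive: true }); // direct connection
}

const s3 = new AWS.S3({
    endpoint: 'https://s3.us-west-1.amazonaws.com', // hypothetical endpoint
    httpOptions: { agent: select_agent() },
});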

As a side note, the title of this bug does not reflect the actual issue or its resolution.

Comment 10 Petr Balogh 2020-08-25 07:21:39 UTC
Is this part of new RC2 build? If so, can this be moved to ON_QE?

Comment 11 Mudit Agarwal 2020-08-25 07:28:18 UTC
(In reply to Petr Balogh from comment #10)
> Is this part of new RC2 build? If so, can this be moved to ON_QE?

Yes, it is, and ideally errata should do that. Moving it manually.

Comment 12 Ben Eli 2020-08-25 12:07:27 UTC
The cluster is in us-east-2.
I created two backingstores: one in us-east-2 and one in us-west-1.
I then created a bucketclass that uses both, and an OBC that uses the bucketclass.
The OBC was healthy and the backingstores were ready.

Verified on 4.5.0-64.ci.

Comment 15 errata-xmlrpc 2020-09-15 10:19:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Container Storage 4.5.0 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3754

