Description of problem (please be as detailed as possible and provide log snippets):
When creating a backingstore in an AWS proxied environment, where the backingstore connection has to go through the proxy, the backingstore might falsely report IO_ERRORS because of a check on NooBaa's backend.

Version of all relevant components (if applicable):
OCP 4.5.5
OCS 4.5.0-54.ci

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes, I can't create new AWS backingstores

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
It's not; proxied environments aren't yet fully working on NB

Steps to Reproduce:
1. Deploy a proxied cluster on AWS
2. Create a target bucket in a region different from the one the cluster resides in (see the sketch below)
3. Create a backingstore that uses the target bucket

Actual results:
The backingstore hangs without a status for about 15 minutes, and then enters a REJECTED status with an IO_ERRORS message

Expected results:
Backingstore creation is successful

Additional info:
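For context on step 2, here is a minimal, hypothetical sketch of creating the target bucket in a region other than the cluster's, assuming the AWS SDK for JavaScript v2 is available; the bucket and region names are examples only:

import * as AWS from 'aws-sdk';

// Example only: the cluster runs in one region while the target bucket is created in another.
const s3 = new AWS.S3({ region: 'us-west-1' });

s3.createBucket({
    Bucket: 'example-target-bucket',
    CreateBucketConfiguration: { LocationConstraint: 'us-west-1' },
})
    .promise()
    .then(() => console.log('target bucket created in us-west-1'))
    .catch(err => console.error('bucket creation failed:', err.code));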
Moving out of 4.5; it's not a blocker
But Nimrod - it blocks 1862755, which is a blocker
It's not; it's only when you use an endpoint which is not *.amazonaws.com, or at least that's how it's described. Am I missing something?
It does; BZ 1862755 is about backingstores failing to communicate with their target buckets, which is exactly the case here. When trying to connect to buckets that don't reside in the same region as the cluster, NooBaa has to connect to their endpoint via the proxy, which in turn changes the address. Ohad should be able to explain this in more detail, but the bottom line is that NB still can't connect to any proxied backingstores, even if both the cluster and the target buckets are on AWS.
Nimrod, from Ben's explanation to me, the endpoint name is being modified when communicating through a proxy, so the name is not something the admin has control over. In addition, if this is the situation and this BZ blocks the verification of bug 1862755, we must treat this one (1871408) as a blocker for 4.5.
Elad, Ben and Nimrod, please let me clarify:

The name not containing "amazonaws.com" was a preliminary assumption about the origin of the failure in communication with the AWS endpoints. It was WRONG.

The actual issue here was that one call site that tries to communicate with the cloud did not take the proxy settings into account. Apparently this call site was used when checking the health of a backingstore, which resulted in an IO_ERRORS status.

I fixed this issue in an upstream PR (see links section), and Ben and I manually ran the test on a patched proxied environment, where it finished successfully.

As a side note, the title of this bug does not reflect the actual issue and its resolution.
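To illustrate what "taking the proxy settings into account" means here, this is a minimal, hypothetical sketch (not the actual NooBaa code) of an S3 health-check call that routes through the cluster proxy, assuming the AWS SDK for JavaScript v2 and the https-proxy-agent package:

import * as AWS from 'aws-sdk';
import { HttpsProxyAgent } from 'https-proxy-agent';

// Proxy address comes from the environment, as in a proxied OCP cluster.
const proxyUrl = process.env.HTTPS_PROXY || process.env.HTTP_PROXY;

const s3 = new AWS.S3({
    // Example endpoint in a region different from the cluster's.
    endpoint: 'https://s3.us-west-1.amazonaws.com',
    // Without an agent like this, the request goes out directly, times out,
    // and surfaces as IO_ERRORS on the backingstore.
    httpOptions: proxyUrl ? { agent: new HttpsProxyAgent(proxyUrl) } : undefined,
});

// A lightweight call such as headBucket is enough to probe connectivity.
s3.headBucket({ Bucket: 'example-target-bucket' })
    .promise()
    .then(() => console.log('target bucket is reachable through the proxy'))
    .catch(err => console.error('health check failed:', err.code));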
Is this part of the new RC2 build? If so, can this be moved to ON_QE?
(In reply to Petr Balogh from comment #10)
> Is this part of the new RC2 build? If so, can this be moved to ON_QE?

Yes, it is, and ideally Errata should do that. Moving it manually.
The cluster is in us-east-2. I created two backingstores, one in us-east-2 and one in us-west-1, then created a bucketclass that uses both, and an OBC that uses the bucketclass. The OBC was healthy and both backingstores were ready. Verified on 4.5.0-64.ci.
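For reference, a hypothetical sketch of the kind of IO check behind "the OBC was healthy": writing one object through the NooBaa S3 endpoint with the credentials generated for the OBC, again assuming the AWS SDK for JavaScript v2; the endpoint, bucket name and credential sources are placeholders:

import * as AWS from 'aws-sdk';

const s3 = new AWS.S3({
    // Route to the NooBaa S3 service on the cluster (example hostname).
    endpoint: 'https://s3-openshift-storage.apps.example.com',
    // Credentials taken from the Secret created for the OBC.
    accessKeyId: process.env.AWS_ACCESS_KEY_ID,
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
    s3ForcePathStyle: true,
    signatureVersion: 'v4',
});

s3.putObject({ Bucket: 'example-obc-bucket', Key: 'probe.txt', Body: 'hello' })
    .promise()
    .then(() => console.log('write through the OBC bucket succeeded'))
    .catch(err => console.error('write failed:', err.code));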
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Container Storage 4.5.0 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:3754