Bug 1847099 - [NooBaa] S3 sync fails (Bad Gateway)
Summary: [NooBaa] S3 sync fails (Bad Gateway)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: Multi-Cloud Object Gateway
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: OCS 4.5.0
Assignee: Nimrod Becker
QA Contact: Ben Eli
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-06-15 16:24 UTC by Ben Eli
Modified: 2020-09-23 09:04 UTC
CC List: 4 users

Fixed In Version: v4.5.0-30.ci
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-15 10:17:41 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github noobaa noobaa-core pull 6102 0 None closed Fix error reporting when R/W path has errors 2020-09-22 07:35:00 UTC
Github noobaa noobaa-core pull 6106 0 None closed Backport to 5.5 2020-09-22 07:35:00 UTC
Red Hat Product Errata RHBA-2020:3754 0 None None None 2020-09-15 10:18:00 UTC

Description Ben Eli 2020-06-15 16:24:55 UTC
Description of problem (please be as detailed as possible and provide log snippets):
When trying to sync objects as part of the `test_multiregion_mirror` test, the sync command *sometimes* fails with errors like the following:
E           Error is download failed: s3://oc-bucket-583d3a79d59242aaaa24cacefa4d18ec/random1.txt to temp/random1.txt An error occurred (502) when calling the GetObject operation (reached max retries: 2): Bad Gateway
E           download failed: s3://oc-bucket-583d3a79d59242aaaa24cacefa4d18ec/danny2.webm to temp/danny2.webm An error occurred (502) when calling the GetObject operation (reached max retries: 2): Bad Gateway
E           download failed: s3://oc-bucket-583d3a79d59242aaaa24cacefa4d18ec/danny.webm to temp/danny.webm An error occurred (502) when calling the GetObject operation (reached max retries: 2): Bad Gateway
E           download failed: s3://oc-bucket-583d3a79d59242aaaa24cacefa4d18ec/airbus.jpg to temp/airbus.jpg An error occurred (502) when calling the GetObject operation (reached max retries: 2): Bad Gateway
E           download failed: s3://oc-bucket-583d3a79d59242aaaa24cacefa4d18ec/random3.txt to temp/random3.txt An error occurred (502) when calling the GetObject operation (reached max retries: 2): Bad Gateway
E           download failed: s3://oc-bucket-583d3a79d59242aaaa24cacefa4d18ec/random6.txt to temp/random6.txt An error occurred (502) when calling the GetObject operation (reached max retries: 2): Bad Gateway
E           download failed: s3://oc-bucket-583d3a79d59242aaaa24cacefa4d18ec/random2.txt to temp/random2.txt An error occurred (502) when calling the GetObject operation (reached max retries: 2): Bad Gateway
E           download failed: s3://oc-bucket-583d3a79d59242aaaa24cacefa4d18ec/goldman.webm to temp/goldman.webm An error occurred (502) when calling the GetObject operation (reached max retries: 2): Bad Gateway
E           download failed: s3://oc-bucket-583d3a79d59242aaaa24cacefa4d18ec/random4.txt to temp/random4.txt An error occurred (502) when calling the GetObject operation (reached max retries: 2): Bad Gateway
E           download failed: s3://oc-bucket-583d3a79d59242aaaa24cacefa4d18ec/random7.txt to temp/random7.txt An error occurred (502) when calling the GetObject operation (reached max retries: 2): Bad Gateway
E           download failed: s3://oc-bucket-583d3a79d59242aaaa24cacefa4d18ec/random5.txt to temp/random5.txt An error occurred (502) when calling the GetObject operation (reached max retries: 2): Bad Gateway
E           download failed: s3://oc-bucket-583d3a79d59242aaaa24cacefa4d18ec/rome.jpg to temp/rome.jpg An error occurred (502) when calling the GetObject operation (reached max retries: 2): Bad Gateway
E           download failed: s3://oc-bucket-583d3a79d59242aaaa24cacefa4d18ec/random8.txt to temp/random8.txt An error occurred (502) when calling the GetObject operation (reached max retries: 2): Bad Gateway
E           download failed: s3://oc-bucket-583d3a79d59242aaaa24cacefa4d18ec/steve.webm to temp/steve.webm An error occurred (502) when calling the GetObject operation (reached max retries: 2): Bad Gateway
E           download failed: s3://oc-bucket-583d3a79d59242aaaa24cacefa4d18ec/random9.txt to temp/random9.txt An error occurred (502) when calling the GetObject operation (reached max retries: 2): Bad Gateway

All occurrences seem to have happened in the same part of the test.
One bucket with data on it uses two backingstores: one is blocked, the other is healthy. We unblock the blocked one and then block the previously healthy one. Then we try to download data from the bucket, and that is when the issue occurs.
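For reference, the failing step boils down to an `aws s3 sync` pull from the OBC's bucket through the MCG S3 endpoint. A minimal sketch of that call follows; the endpoint URL and credential values are placeholders rather than the exact ones the test uses (typically they come from the OBC's secret and the NooBaa S3 route):

# Credentials and endpoint below are hypothetical placeholders.
export AWS_ACCESS_KEY_ID=<obc-access-key>
export AWS_SECRET_ACCESS_KEY=<obc-secret-key>
aws s3 sync s3://oc-bucket-583d3a79d59242aaaa24cacefa4d18ec temp/ \
    --endpoint-url https://<noobaa-s3-route> --no-verify-ssl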

Version of all relevant components (if applicable):
OCP 4.5.0-0.nightly-2020-06-11-183238
OCS 4.5.0-448.ci

Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?
Partially. In every case so far, an additional attempt to run the sync command succeeded; however, this disrupts the user flow.

Is there any workaround available to the best of your knowledge?
Retry to sync.
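In a script, the workaround can be approximated with a simple retry loop around the sync; a rough sketch, where the bucket, endpoint, retry count, and sleep interval are all illustrative placeholders:

# Retry the sync a few times before giving up; all values here are arbitrary.
for i in 1 2 3 4 5; do
    aws s3 sync s3://<noobaa-bucket> temp/ --endpoint-url https://<noobaa-s3-route> && break
    sleep 10
done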

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
5

Is this issue reproducible?
Yes. I'd say it happens in about 1 of 3 runs.
The easiest way to reproduce is by running `test_multiregion_mirror`.
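With an ocs-ci checkout, that is roughly the following invocation; the selector and flags are assumptions about the ocs-ci layout and may differ between versions:

# Hypothetical invocation; adjust the cluster path and test selector to your ocs-ci setup.
run-ci -k test_multiregion_mirror --cluster-path <path-to-cluster-dir>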

Can this issue be reproduced from the UI?
No

If this is a regression, please provide more details to justify this:
It is - we did not run into Bad Gateway errors in OCS 4.4.

Steps to Reproduce:
1. Create two AWS backingstores
2. Create a bucketclass that uses them as mirrors
3. Create an OBC that uses the bucketclass
4. Upload all objects from the AWS bucket `ocsci-test-files` to the OBC's bucket
5. Block all I/O to one of the backingstores by applying this bucket policy to its target bucket (see the command sketch after this list):
{
    "Version": "2012-10-17",
    "Id": "DenyReadWrite",
    "Statement": [
        {
            "Effect": "Deny",
            "Principal": {
                "AWS": "*"
            },
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::BUCKETNAME/*",
                "arn:aws:s3:::BUCKETNAME"
            ]
        }
    ]
}
6. Download all objects from the NooBaa bucket and compare their hashes to the original files from `ocsci-test-files`
7. Remove the bucket policy from the backingstore you blocked and place it on the other backingstore's target bucket
8. Download all objects from the NooBaa bucket
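A rough CLI sketch of steps 5-8 follows; bucket names and the endpoint are placeholders, and the test itself performs these operations through its own helpers rather than the raw CLI:

# Step 5: block I/O on the first backingstore's target bucket
# (policy.json is the policy above with BUCKETNAME substituted).
aws s3api put-bucket-policy --bucket <backingstore-1-target-bucket> --policy file://policy.json

# Step 6: download from the NooBaa bucket via the MCG endpoint and compare hashes.
aws s3 sync s3://<noobaa-bucket> temp/ --endpoint-url https://<noobaa-s3-route>
md5sum temp/*    # compare against the originals from ocsci-test-files

# Step 7: unblock the first backingstore and block the second one.
aws s3api delete-bucket-policy --bucket <backingstore-1-target-bucket>
aws s3api put-bucket-policy --bucket <backingstore-2-target-bucket> --policy file://policy.json

# Step 8: download again; this is the point where the intermittent 502 Bad Gateway appears.
aws s3 sync s3://<noobaa-bucket> temp/ --endpoint-url https://<noobaa-s3-route>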


Actual results:
A Bad Gateway (502) error is returned

Expected results:
Download is successful

Additional info:

Comment 6 Nimrod Becker 2020-07-09 12:47:24 UTC
The logs predate the extended log collection that was added; can we have the new logs?

Comment 7 Ben Eli 2020-07-09 13:52:33 UTC
We have not yet run into a reproduction; this bug seems to be exclusive to 4.5, so we'll have to wait until 4.5 is deployable and widely tested again.

Comment 8 Yaniv Kaul 2020-07-12 08:05:23 UTC
(In reply to Ben Eli from comment #7)
> We have not yet run into a reproduction; this bug seems to be exclusive to
> 4.5, so we'll have to wait until 4.5 is deployable and widely tested again.

NEEDINFO on reporter to reproduce.

Comment 15 errata-xmlrpc 2020-09-15 10:17:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Container Storage 4.5.0 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3754

