Bug 2010560 - Gateway timeout with error occurred (504) when calling the CreateBucket operation
Summary: Gateway timeout with error occurred (504) when calling the CreateBucket operation
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: Multi-Cloud Object Gateway
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Nimrod Becker
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-10-05 02:41 UTC by Tiffany Nguyen
Modified: 2023-08-09 16:49 UTC
CC: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-06-09 08:15:08 UTC
Embargoed:



Description Tiffany Nguyen 2021-10-05 02:41:46 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
The error "An error occurred (504) when calling the CreateBucket operation (reached max retries: 4): Gateway Timeout" is seen when running multiple OBCs with I/O.

Error log snippet from the console:
        if http.status_code >= 300:
            error_code = parsed_response.get("Error", {}).get("Code")
            error_class = self.exceptions.from_code(error_code)
>           raise error_class(parsed_response, operation_name)
E           botocore.exceptions.ClientError: An error occurred (504) when calling the CreateBucket operation (reached max retries: 4): Gateway Timeout
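For context, a minimal sketch of how a caller can surface the underlying 504 status carried by botocore's ClientError; the helper name is illustrative and is not part of the ocs-ci test code:

import botocore.exceptions

def create_bucket_reporting_504(s3_client, bucket_name):
    # Illustrative helper (not from the test suite): make MCG gateway
    # timeouts explicit instead of only re-raising the ClientError.
    try:
        s3_client.create_bucket(Bucket=bucket_name)
    except botocore.exceptions.ClientError as err:
        status = err.response.get("ResponseMetadata", {}).get("HTTPStatusCode")
        if status == 504:
            # 504 means the MCG S3 endpoint (or the route in front of it)
            # timed out under load; it is not a client-side failure.
            print(f"Gateway Timeout while creating bucket {bucket_name}")
        raise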


Version of all relevant components (if applicable):
$ oc get csv -n openshift-storage 
NAME                            DISPLAY                       VERSION        REPLACES   PHASE
noobaa-operator.v4.9.0-164.ci   NooBaa Operator               4.9.0-164.ci              Succeeded
ocs-operator.v4.9.0-164.ci      OpenShift Container Storage   4.9.0-164.ci              Succeeded
odf-operator.v4.9.0-164.ci      OpenShift Data Foundation     4.9.0-164.ci              Succeeded


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
This limits the number of OBCs that can run I/O.


Is there any workaround available to the best of your knowledge?
None

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?
100% reproducible.


Can this issue be reproduced from the UI?
N/A

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Create OBCs.
2. Run I/O using mcg_job_factory().
3. Observe the error in the console log (a reproduction sketch follows these steps).
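
Outside the ocs-ci framework (which drives the workload through mcg_job_factory()), a minimal sketch of the kind of S3 call that hits this timeout; the endpoint URL and credential values below are placeholders, not the test's actual configuration (in the real test they come from the OBC's ConfigMap and Secret):

import boto3
from botocore.config import Config

# Placeholders: the real endpoint and keys come from the OBC's ConfigMap/Secret.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3-openshift-storage.apps.example.com",
    aws_access_key_id="<OBC_ACCESS_KEY>",
    aws_secret_access_key="<OBC_SECRET_KEY>",
    config=Config(retries={"max_attempts": 4}),  # matches "reached max retries: 4"
    verify=False,  # lab clusters often use self-signed certificates
)

# With many OBCs doing I/O, the MCG endpoints can take long enough to answer
# that the route returns HTTP 504 and botocore gives up after its retries.
s3.create_bucket(Bucket="scale-test-bucket")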


Actual results:
I/O fails when running on more than 20 OBCs.

Expected results:
I/O should run without any issue.

Additional info:
More errors can be seen at:
https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-trigger-test-pr/1584/testReport/tests.e2e.scale.noobaa.test_scale_obc_creation_repsin_noobaa_pods/TestScaleOCBCreation/test_scale_obc_creation_noobaa_pod_respin_noobaa_core_openshift_storage_noobaa_io_/

Comment 1 Alexander Indenbaum 2021-10-10 13:48:45 UTC
Hello @tunguyen,

Could you please provide must-gather logs of the NooBaa components (endpoint, core, etc.) for this test? Those logs would provide better insight into the root cause of the HTTP 504 error for the `CreateBucket` operation.

Thank you!
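
For reference, ODF must-gather is typically collected with a command along these lines; the exact image tag is an assumption here and should be taken from the ODF 4.9 documentation:

$ oc adm must-gather --image=registry.redhat.io/odf4/ocs-must-gather-rhel8:v4.9 --dest-dir=odf-must-gather

This gathers logs from the openshift-storage namespace, including the NooBaa core and endpoint pods requested above.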

Comment 2 Tiffany Nguyen 2021-10-12 03:14:35 UTC
@ainden

Comment 5 Elad 2021-10-19 08:56:03 UTC
Proposing as a blocker so it won't be pushed out of 4.9.0

Comment 9 Tiffany Nguyen 2021-11-18 17:16:18 UTC
Issue is seen on build ODF 4.9.0-241.ci when creating 100 OBCs with I/O using mcg_job_factory():

        if http.status_code >= 300:
            error_code = parsed_response.get("Error", {}).get("Code")
            error_class = self.exceptions.from_code(error_code)
>           raise error_class(parsed_response, operation_name)
E           botocore.exceptions.ClientError: An error occurred (504) when calling the CreateBucket operation (reached max retries: 4): Gateway Timeout

Must-gather logs: http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz-2010560/

Comment 15 Nimrod Becker 2022-03-03 13:00:10 UTC
At the 4.10 dev freeze milestone checkpoint meeting, it was decided to move this bug out of 4.11.

