Bug 2161279

Summary: [GSS] Getting 502 bad gateway on doing ls to rgw route
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Sonal <sarora>
Component: ceph
Assignee: Matt Benjamin (redhat) <mbenjamin>
ceph sub component: RGW
QA Contact: Elad <ebenahar>
Status: CLOSED DUPLICATE
Docs Contact:
Severity: high
Priority: unspecified
CC: bniver, etamir, hnallurv, jthottan, mbenjamin, muagarwa, ocs-bugs, odf-bz-bot, smulay, sostapov, theo.wilde, thottanjiffin
Version: 4.9
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-03-20 13:05:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Sonal 2023-01-16 13:20:44 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

Frequently getting a "502 Bad Gateway" error when running curl against the RGW route. Frequency of issue: 6/10 times.


The issue is observed after upgrading OCP to 4.9.13 and ODF to 4.9.52.

Version of all relevant components (if applicable):
ceph 16.2.0-152.el8cp
ODF 4.9.13
OCP 4.9.52


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

Yes. It affects end users of the object store in Test/Acceptance and Production, in all clusters running versions 4.9 and 4.10.

It is blocking customer deployments and the upgrade of other OpenShift clusters from 4.8 to 4.10, which have to upgrade to 4.9 first due to the support policy.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes. The same error occurs in a lab cluster when running curl against the route very frequently, or when running ls on a bucket:

~~~
$ curl   http://ocs-storagecluster-cephobjectstore-openshift-storage.apps.sdsupi.ocp.gsslab.pnq2.redhat.com
<html><body><h1>502 Bad Gateway</h1>
The server returned an invalid or incomplete response.
</body></html>
~~~

~~~
 s3cmd -c s3cfg ls
2023-01-12 13:28  s3://test-rgw-89c60861-e307-4dcb-9f6a-6d80c5afee56
[root@smulay sdsupi]# s3cmd -c s3cfg ls
2023-01-12 13:28  s3://test-rgw-89c60861-e307-4dcb-9f6a-6d80c5afee56
[root@smulay sdsupi]# s3cmd -c s3cfg ls
2023-01-12 13:28  s3://test-rgw-89c60861-e307-4dcb-9f6a-6d80c5afee56
[root@smulay sdsupi]# s3cmd -c s3cfg ls
2023-01-12 13:28  s3://test-rgw-89c60861-e307-4dcb-9f6a-6d80c5afee56
[root@smulay sdsupi]# s3cmd -c s3cfg ls
ERROR: Error parsing xml: Malformed error XML returned from remote server..  ErrorXML: b'<html><body><h1>502 Bad Gateway</h1>\nThe server returned an invalid or incomplete response.\n</body></html>\n'
WARNING: Retrying failed request: / (502 (Bad Gateway))
WARNING: Waiting 3 sec...
2023-01-12 13:28  s3://test-rgw-89c60861-e307-4dcb-9f6a-6d80c5afee56 
~~~

Steps to Reproduce:
1. Run curl against the RGW route very frequently
2. Perform a list operation on an RGW bucket
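
Step 1 could be scripted roughly as follows. This is only a sketch, assuming bash and curl are available on the client. The helper names `probe_route` and `count_502` are hypothetical and not part of any ODF or Ceph tooling, and the default of 10 attempts is chosen only to match the 6/10 frequency reported above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: request the URL repeatedly and print one HTTP
# status code per line (curl prints 000 if no HTTP response was received).
probe_route() {
  local url="$1" attempts="${2:-10}"
  local i
  for ((i = 0; i < attempts; i++)); do
    curl -s -o /dev/null -w '%{http_code}\n' "$url"
  done
}

# Hypothetical helper: read status codes on stdin and print how many are 502.
# grep -c exits non-zero when the count is 0, so the exit status is ignored.
count_502() {
  grep -c '^502$' || true
}

# Example invocation (needs network access to the route):
# probe_route "http://ocs-storagecluster-cephobjectstore-openshift-storage.apps.sdsupi.ocp.gsslab.pnq2.redhat.com" 10 | count_502
```

Keeping the probing and the counting in separate functions means the counting can be checked against a saved list of status codes without access to the cluster.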


Actual results:
502 bad gateway

Expected results:
Listing of objects should be successful

Additional info:
in next comment

Comment 28 Red Hat Bugzilla 2023-12-08 04:31:53 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days