Description of problem (please be as detailed as possible and provide log
snippets):
Frequently getting a "502 Bad Gateway" error when running curl against the RGW route. Frequency of issue: about 6 out of 10 requests.
The issue is observed after upgrading OCP to 4.9.52 and ODF to 4.9.13.
Version of all relevant components (if applicable):
ceph 16.2.0-152.el8cp
ODF 4.9.13
OCP 4.9.52
Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes. It affects end users who use the object store in Test/Acceptance and Production, on all clusters running versions 4.9 and 4.10.
It is also blocking customer deployments and the upgrade of other OpenShift clusters from 4.8 to 4.10, because they have to upgrade to 4.9 first due to the support policy.
Is there any workaround available to the best of your knowledge?
No
Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2
Can this issue be reproduced?
Yes. The same error occurs in a lab cluster when running curl against the route repeatedly, or when running ls on a bucket:
~~~
$ curl http://ocs-storagecluster-cephobjectstore-openshift-storage.apps.sdsupi.ocp.gsslab.pnq2.redhat.com
<html><body><h1>502 Bad Gateway</h1>
The server returned an invalid or incomplete response.
</body></html>
~~~
~~~
s3cmd -c s3cfg ls
2023-01-12 13:28 s3://test-rgw-89c60861-e307-4dcb-9f6a-6d80c5afee56
[root@smulay sdsupi]# s3cmd -c s3cfg ls
2023-01-12 13:28 s3://test-rgw-89c60861-e307-4dcb-9f6a-6d80c5afee56
[root@smulay sdsupi]# s3cmd -c s3cfg ls
2023-01-12 13:28 s3://test-rgw-89c60861-e307-4dcb-9f6a-6d80c5afee56
[root@smulay sdsupi]# s3cmd -c s3cfg ls
2023-01-12 13:28 s3://test-rgw-89c60861-e307-4dcb-9f6a-6d80c5afee56
[root@smulay sdsupi]# s3cmd -c s3cfg ls
ERROR: Error parsing xml: Malformed error XML returned from remote server.. ErrorXML: b'<html><body><h1>502 Bad Gateway</h1>\nThe server returned an invalid or incomplete response.\n</body></html>\n'
WARNING: Retrying failed request: / (502 (Bad Gateway))
WARNING: Waiting 3 sec...
2023-01-12 13:28 s3://test-rgw-89c60861-e307-4dcb-9f6a-6d80c5afee56
~~~
Steps to Reproduce:
1. Run curl against the RGW route repeatedly
2. Do a list operation on an RGW bucket
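The first step can be scripted to measure how often the route fails. The following is a minimal sketch, not part of the original report: the function names `probe_once` and `probe_route` and the 10-second timeout are illustrative assumptions, and the only curl options used are the standard `-s`, `-o`, `-w '%{http_code}'` and `--max-time`.

```shell
#!/bin/sh
# Sketch: send repeated GET requests to a route and count the 502 responses.
# probe_once / probe_route are hypothetical helper names, not existing tooling.

# Print the HTTP status code of one GET request to URL $1.
# curl prints 000 when no HTTP response was received at all.
probe_once() {
    curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1"
}

# Probe URL $1 a total of $2 times (default 10) and print "<502 count>/<total>".
probe_route() {
    url=$1
    total=${2:-10}
    failures=0
    i=0
    while [ "$i" -lt "$total" ]; do
        code=$(probe_once "$url")
        if [ "$code" = "502" ]; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    printf '%s/%s\n' "$failures" "$total"
}
```

For example, `probe_route "$ROUTE_URL" 10`, with `ROUTE_URL` set to the RGW route used in the curl example above, prints the number of 502 responses out of 10 requests, which can be compared against the reported frequency.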
Actual results:
Requests intermittently fail with "502 Bad Gateway"
Expected results:
Listing of buckets and objects should succeed on every request
Additional info:
In the next comment.
Comment 28 Red Hat Bugzilla
2023-12-08 04:31:53 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days