Description of problem (please be as detailed as possible and provide log snippets):

S3 PUT requests frequently fail with "We encountered an internal error. Please try again." on an OBC backed by an RGW namespace store.

The issue arose while running the script 'test_longevity_stage2.py' from this PR: https://github.com/red-hat-storage/ocs-ci/pull/5540/

The script used to upload the objects to the bucket (backed by the RGW namespace store) without any issues until a couple of days ago, but in the later runs (4/4) the upload fails with the error below:

```E ocs_ci.ocs.exceptions.CommandFailed: Error during execution of command: oc -n openshift-storage rsh session-awscli-relay-pod-1b40c3df84164d4 sh -c "AWS_CA_BUNDLE=/cert/service-ca.crt AWS_ACCESS_KEY_ID=***** AWS_SECRET_ACCESS_KEY=***** AWS_DEFAULT_REGION=us-east-2 aws s3 --endpoint=***** sync test_longevity_stage2/origin s3://oc-bucket-da750ffd1edb47c78c3495b813faaa".
E Error is upload failed: test_longevity_stage2/origin/test58 to s3://oc-bucket-da750ffd1edb47c78c3495b813faaa/test58 An error occurred (InternalError) when calling the PutObject operation (reached max retries: 2): We encountered an internal error. Please try again.```

https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/12962/consoleFull -> this is the successful run

The exception occurred inside the 'write_empty_files_to_bucket' function, which is run after the OBCs are created inside the '_multi_obc_lifecycle_factory' function: https://github.com/red-hat-storage/ocs-ci/pull/5540/files#diff-008aeb103a5a9ae662ae2e86cf3a0c9335d41b6047180ff64584d9b2243d2ed8R46

Logs from the NooBaa endpoint pod:
----
May-25 16:57:57.143 [Endpoint/13] [ERROR] core.rpc.rpc_schema:: INVALID_SCHEMA_PARAMS CLIENT pool_api#/methods/update_issues_report ERRORS: [ { instancePath: '/error_code', schemaPath: 'pool_api#/methods/update_issues_report/params/properties/error_code/type', keyword: 'type', params: { type: 'string' }, message: 'must be string', schema: 'string', parentSchema: { type: 'string' }, data: 502 }, [length]: 1 ] PARAMS: { namespace_resource_id: '628e5ff0029cdc0029f4f9ea', error_code: 502, time: 1653497877143 }

May-25 16:57:57.143 [Endpoint/13] [ERROR] core.rpc.rpc:: RPC._request: response ERROR srv pool_api.update_issues_report reqid <no-reqid-yet> connid <no-connection-yet> params { namespace_resource_id: '628e5ff0029cdc0029f4f9ea', error_code: 502, time: 1653497877143 } Error: INVALID_SCHEMA_PARAMS CLIENT
----
May-25 16:57:59.858 [Endpoint/13] [ERROR] core.endpoint.s3.s3_rest:: S3 ERROR <?xml version="1.0" encoding="UTF-8"?><Error><Code>InternalError</Code><Message>We encountered an internal error. Please try again.</Message><Resource>/oc-bucket-e6dfcb221a464422836c07f1891e24/test520</Resource><RequestId>l3lty19u-g3u9an-1931</RequestId></Error> PUT /oc-bucket-e6dfcb221a464422836c07f1891e24/test520 {"host":"s3.openshift-storage.svc","accept-encoding":"identity","user-agent":"aws-cli/2.0.13 Python/3.7.3 Linux/4.18.0-305.45.1.el8_4.x86_64 botocore/2.0.0dev17","expect":"100-continue","x-amz-date":"20220525T165759Z","x-amz-content-sha256":"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855","authorization":"AWS4-HMAC-SHA256 Credential=mwLpNOKmzu5z3yx5f69J/20220525/us-east-2/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=b172a99c20ad62858c44bd5f64bd13759d602aaeea0735c66972ecf9698702a2","content-length":"0"} 502: null
----

> The script passed when uploading a single object to the bucket (backed by the RGW namespace store) without any issues.

Version of all relevant components (if applicable):
ODF 4.10.2

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Unable to upload objects.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes (4/4)

Can this issue be reproduced from the UI?
NA

If this is a regression, please provide more details to justify this:
Not sure

Steps to Reproduce:
1. Run the script 'test_longevity_stage2.py' from the PR: https://github.com/red-hat-storage/ocs-ci/pull/5540/
   (The exception occurred inside the 'write_empty_files_to_bucket' function, which is run after the OBCs are created inside the '_multi_obc_lifecycle_factory' function: https://github.com/red-hat-storage/ocs-ci/pull/5540/files#diff-008aeb103a5a9ae662ae2e86cf3a0c9335d41b6047180ff64584d9b2243d2ed8R46)

Actual results:
The following exception occurred:

```E ocs_ci.ocs.exceptions.CommandFailed: Error during execution of command: oc -n openshift-storage rsh session-awscli-relay-pod-1b40c3df84164d4 sh -c "AWS_CA_BUNDLE=/cert/service-ca.crt AWS_ACCESS_KEY_ID=***** AWS_SECRET_ACCESS_KEY=***** AWS_DEFAULT_REGION=us-east-2 aws s3 --endpoint=***** sync test_longevity_stage2/origin s3://oc-bucket-da750ffd1edb47c78c3495b813faaa".
E Error is upload failed: test_longevity_stage2/origin/test58 to s3://oc-bucket-da750ffd1edb47c78c3495b813faaa/test58 An error occurred (InternalError) when calling the PutObject operation (reached max retries: 2): We encountered an internal error. Please try again.```

Expected results:
S3 PUT requests should execute successfully.

Additional info:
> Must gather: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/tdesala-long-testd/tdesala-long-testd_20220525T080711/logs/failed_testcase_ocs_logs_1653499324/test_longevity_stage2_ocs_logs/ocs_must_gather/
> The same error occurred on 4.11.0 as well.
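> For anyone who wants to reproduce this outside of ocs-ci, below is a minimal standalone sketch of what the failing step effectively does: PUT a batch of zero-byte objects to the OBC bucket over the internal S3 endpoint. It is an approximation of the 'write_empty_files_to_bucket' helper, not the helper itself; the endpoint URL, credentials, CA-bundle path, bucket name, and object count are placeholders taken from this report and must be replaced with the values from the OBC's ConfigMap/Secret on the affected cluster.

```python
#!/usr/bin/env python3
"""Reproduction sketch: upload many empty objects to an OBC bucket
backed by an RGW namespace store. All connection details below are
placeholders copied from this bug report, not hard requirements."""
import boto3
from botocore.config import Config

S3_ENDPOINT = "https://s3.openshift-storage.svc"          # internal NooBaa S3 endpoint (placeholder)
ACCESS_KEY = "<AWS_ACCESS_KEY_ID from the OBC secret>"     # placeholder
SECRET_KEY = "<AWS_SECRET_ACCESS_KEY from the OBC secret>" # placeholder
CA_BUNDLE = "/cert/service-ca.crt"                         # CA bundle used in the failing run
BUCKET = "oc-bucket-da750ffd1edb47c78c3495b813faaa"        # example OBC bucket name from this report

s3 = boto3.client(
    "s3",
    endpoint_url=S3_ENDPOINT,
    aws_access_key_id=ACCESS_KEY,
    aws_secret_access_key=SECRET_KEY,
    region_name="us-east-2",
    verify=CA_BUNDLE,
    config=Config(retries={"max_attempts": 2}),  # match the "reached max retries: 2" behaviour
)

# Upload zero-byte objects; with this bug, PutObject fails with
# "InternalError: We encountered an internal error. Please try again."
for i in range(1000):
    s3.put_object(Bucket=BUCKET, Key=f"test{i}", Body=b"")
    print(f"uploaded test{i}")
```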
Harish/Ben, is this still an issue?
Thanks Ben. Please reopen if this is seen again after the CI issue is fixed.