Created attachment 1823937 [details]
ocs-ci-test_case log

Description of problem (please be as detailed as possible and provide log snippets):

This test scenario is part of the ocs-ci test tests/manage/rgw/test_bucket_deletion.py::TestBucketDeletion::test_bucket_delete_with_objects[RGW-OC], which creates an RGW OBC and syncs all the objects and directories in a folder to it. I verified this test manually by creating the RGW OBC and syncing the files to it; the upload of a few files failed after a certain number of S3 requests. However, when I copied the failed files to the same bucket individually, the upload did not fail.

# oc get obc -n openshift-storage
NAME                                       STORAGE-CLASS                 PHASE   AGE
rgw-oc-bucket-76db54f20b3e40ccb8a6798913   ocs-storagecluster-ceph-rgw   Bound   142m

# aws s3 --no-verify-ssl --endpoint <> ls
2021-09-17 14:06:17 rgw-oc-bucket-76db54f20b3e40ccb8a6798913

# oc -n openshift-storage rsh session-awscli-relay-pod-9e55c3f9e24d4fb sh -c "AWS_CA_BUNDLE=/cert/service-ca.crt AWS_ACCESS_KEY_ID="<>" AWS_SECRET_ACCESS_KEY="<>" AWS_DEFAULT_REGION=us-east-1 aws s3 --endpoint=<> sync /test_objects/ s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913"
upload: ../test_objects/book.txt to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/book.txt
upload: ../test_objects/bolder.jpg to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/bolder.jpg
upload: ../test_objects/apple.mp4 to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/apple.mp4
upload failed: ../test_objects/airbus.jpg to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/airbus.jpg Connection was closed before we received a valid response from endpoint URL: "http://ocs-storagecluster-cephobjectstore-openshift-storage.apps.ocsm4205001.lnxne.boe/rgw-oc-bucket-76db54f20b3e40ccb8a6798913/airbus.jpg?uploads".
upload: ../test_objects/canada.jpg to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/canada.jpg
upload: ../test_objects/random1.txt to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/random1.txt
upload: ../test_objects/random2.txt to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/random2.txt
upload: ../test_objects/random10.txt to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/random10.txt
upload: ../test_objects/random4.txt to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/random4.txt
upload: ../test_objects/random5.txt to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/random5.txt
upload: ../test_objects/random3.txt to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/random3.txt
upload: ../test_objects/random7.txt to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/random7.txt
upload: ../test_objects/random6.txt to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/random6.txt
upload: ../test_objects/random9.txt to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/random9.txt
upload: ../test_objects/rome.jpg to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/rome.jpg
upload failed: ../test_objects/goldman.webm to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/goldman.webm An error occurred (502) when calling the CreateMultipartUpload operation (reached max retries: 4): Bad Gateway
upload failed: ../test_objects/random8.txt to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/random8.txt Connection was closed before we received a valid response from endpoint URL: "http://ocs-storagecluster-cephobjectstore-openshift-storage.apps.ocsm4205001.lnxne.boe/rgw-oc-bucket-76db54f20b3e40ccb8a6798913/random8.txt".
upload: ../test_objects/danny.webm to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/danny.webm
upload failed: ../test_objects/enwik8 to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/enwik8 An error occurred (502) when calling the UploadPart operation (reached max retries: 4): Bad Gateway
upload: ../test_objects/steve.webm to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/steve.webm
command terminated with exit code 1

# oc -n openshift-storage rsh session-awscli-relay-pod-9e55c3f9e24d4fb sh -c "AWS_CA_BUNDLE=/cert/service-ca.crt AWS_ACCESS_KEY_ID="<>" AWS_SECRET_ACCESS_KEY="<>" AWS_DEFAULT_REGION=us-east-1 aws s3 --endpoint=<> cp /test_objects/goldman.webm s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913"
upload: ../test_objects/goldman.webm to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/goldman.webm

# oc -n openshift-storage rsh session-awscli-relay-pod-9e55c3f9e24d4fb sh -c "AWS_CA_BUNDLE=/cert/service-ca.crt AWS_ACCESS_KEY_ID="<>" AWS_SECRET_ACCESS_KEY="<>" AWS_DEFAULT_REGION=us-east-1 aws s3 --endpoint=<> cp /test_objects/random8.txt s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913"
upload: ../test_objects/random8.txt to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/random8.txt

Version of all relevant components (if applicable):
OCP: 4.9.0-0.nightly-s390x-2021-09-09-135631
OCS-Operator: 4.9.0-142.ci
LSO: 4.9.0-202109071344
Noobaa: 4.9.0-139.ci

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
The ocs-ci test fails.

Is there any workaround available to the best of your knowledge?
Upload the files individually.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Deploy OCP and OCS
2. Create an RGW OBC with the following YAML:

apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: rgw-oc-bucket-76db54f20b3e40ccb8a6798913
  namespace: openshift-storage
spec:
  bucketName: rgw-oc-bucket-76db54f20b3e40ccb8a6798913
  storageClassName: ocs-storagecluster-ceph-rgw

3. Sync more than 20 objects or directories to the RGW OBC

# oc -n openshift-storage rsh session-awscli-relay-pod-9e55c3f9e24d4fb sh -c "AWS_CA_BUNDLE=/cert/service-ca.crt AWS_ACCESS_KEY_ID="<>" AWS_SECRET_ACCESS_KEY="<>" AWS_DEFAULT_REGION=us-east-1 aws s3 --endpoint=<> sync /test_objects/ s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913"

Actual results:
Sync of objects to the OBC fails when multiple S3 requests are issued.

# oc -n openshift-storage rsh session-awscli-relay-pod-9e55c3f9e24d4fb sh -c "AWS_CA_BUNDLE=/cert/service-ca.crt AWS_ACCESS_KEY_ID="<>" AWS_SECRET_ACCESS_KEY="<>" AWS_DEFAULT_REGION=us-east-1 aws s3 --endpoint=<> sync /test_objects/ s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913"
upload: ../test_objects/book.txt to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/book.txt
upload: ../test_objects/bolder.jpg to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/bolder.jpg
upload: ../test_objects/apple.mp4 to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/apple.mp4
upload failed: ../test_objects/airbus.jpg to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/airbus.jpg Connection was closed before we received a valid response from endpoint URL: "http://ocs-storagecluster-cephobjectstore-openshift-storage.apps.ocsm4205001.lnxne.boe/rgw-oc-bucket-76db54f20b3e40ccb8a6798913/airbus.jpg?uploads".
upload: ../test_objects/canada.jpg to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/canada.jpg
upload: ../test_objects/random1.txt to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/random1.txt
upload: ../test_objects/random2.txt to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/random2.txt
upload: ../test_objects/random10.txt to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/random10.txt
upload: ../test_objects/random4.txt to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/random4.txt
upload: ../test_objects/random5.txt to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/random5.txt
upload: ../test_objects/random3.txt to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/random3.txt
upload: ../test_objects/random7.txt to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/random7.txt
upload: ../test_objects/random6.txt to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/random6.txt
upload: ../test_objects/random9.txt to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/random9.txt
upload: ../test_objects/rome.jpg to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/rome.jpg
upload failed: ../test_objects/goldman.webm to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/goldman.webm An error occurred (502) when calling the CreateMultipartUpload operation (reached max retries: 4): Bad Gateway
upload failed: ../test_objects/random8.txt to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/random8.txt Connection was closed before we received a valid response from endpoint URL: "http://ocs-storagecluster-cephobjectstore-openshift-storage.apps.ocsm4205001.lnxne.boe/rgw-oc-bucket-76db54f20b3e40ccb8a6798913/random8.txt".
upload: ../test_objects/danny.webm to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/danny.webm
upload failed: ../test_objects/enwik8 to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/enwik8 An error occurred (502) when calling the UploadPart operation (reached max retries: 4): Bad Gateway
upload: ../test_objects/steve.webm to s3://rgw-oc-bucket-76db54f20b3e40ccb8a6798913/steve.webm
command terminated with exit code 1

Expected results:
Sync should work and all objects should be uploaded successfully.

Additional info:
https://drive.google.com/file/d/1UyN-_XiC2xlFm1tL_5dgAqg6oV2JMM4q/view?usp=sharing
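The workaround reported above (uploading the failed files individually) could be scripted roughly as follows. This is only a sketch, not part of ocs-ci: the function names `failed_uploads` and `retry_commands` are hypothetical, the regular expression assumes each failure is printed on a single `upload failed: <src> to <dest> ...` line with no whitespace in the path or key, and the commands are only built, never executed. Note that the sync output reports paths relative to the working directory (`../test_objects/...`), so they may need adjusting before use.

```python
import re
import shlex
from typing import List, Tuple

# Matches lines such as:
#   upload failed: ../test_objects/airbus.jpg to s3://bucket/airbus.jpg <error text>
# Assumption: neither the local path nor the object key contains whitespace.
FAILED_RE = re.compile(r"^upload failed: (?P<src>\S+) to (?P<dest>s3://\S+)")


def failed_uploads(sync_output: str) -> List[Tuple[str, str]]:
    """Return (local_path, s3_url) pairs for every failed upload line."""
    pairs = []
    for line in sync_output.splitlines():
        match = FAILED_RE.match(line.strip())
        if match:
            pairs.append((match.group("src"), match.group("dest")))
    return pairs


def retry_commands(sync_output: str, endpoint: str) -> List[str]:
    """Build one `aws s3 cp` command string per failed upload (not executed)."""
    commands = []
    for src, dest in failed_uploads(sync_output):
        parts = ["aws", "s3", f"--endpoint={endpoint}", "cp",
                 shlex.quote(src), shlex.quote(dest)]
        commands.append(" ".join(parts))
    return commands
```

Each resulting string would still have to be run inside the awscli relay pod with the same credentials and CA bundle as the original sync command.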
I am able to reproduce it with the ocs-ci tier2 test tests/manage/rgw/test_object_integrity.py::TestObjectIntegrity::test_empty_file_integrity:

E ocs_ci.ocs.exceptions.CommandFailed: Error during execution of command: oc -n openshift-storage rsh session-awscli-relay-pod-20562f6b72ec44a sh -c "AWS_CA_BUNDLE=/cert/service-ca.crt AWS_ACCESS_KEY_ID=***** AWS_SECRET_ACCESS_KEY=***** AWS_DEFAULT_REGION=us-east-1 aws s3 --endpoint=***** sync test_empty_file_integrity/origin s3://rgw-oc-bucket-1f0ae58edf5b4ae9bc1425f152".
E Error is fatal error: Connection was closed before we received a valid response from endpoint URL: "*****/rgw-oc-bucket-1f0ae58edf5b4ae9bc1425f152?list-type=2&prefix=&encoding-type=url".
E command terminated with exit code 1
Nimrod, can someone please take a look? This is blocking the IBM team.
Hi,

Bad Gateway and ‘Connection was closed before we received a valid response from endpoint URL’ can imply a networking issue. Also, in the logs I see that from Sep-16 22:20:42.060 onward there are many RPC disconnection errors and NO_SUCH_NODE errors in the NooBaa core and NooBaa endpoint logs.

A few questions:
1. Do you experience other networking issues on that cluster?
2. Did you reproduce the issue on the same cluster? If not, can you try to reproduce it on another cluster?
3. Can you please provide a db-dump? From inside the noobaa-db-pg-0 pod, run: pg_dump nbcore | gzip > nbcore_postgres.gz

Thanks
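To see how the two error signatures mentioned here are distributed across a sync run, the output can be tallied with a few lines of Python. This is an illustrative sketch only: `classify_failure` and `tally_failures` are hypothetical helpers, and the matching assumes the error text appears on the same line as `upload failed:`, as in the output pasted in the description.

```python
import re
from collections import Counter

# Extracts the S3 operation name from messages like
# "An error occurred (502) when calling the UploadPart operation ...".
OPERATION_RE = re.compile(r"when calling the (\w+) operation")


def classify_failure(line: str) -> str:
    """Return a short label for a failed-upload line, or "" for any other line."""
    if not line.startswith("upload failed:"):
        return ""
    if "Connection was closed before we received a valid response" in line:
        return "connection closed"
    if "(502)" in line and "Bad Gateway" in line:
        operation = OPERATION_RE.search(line)
        if operation:
            return f"502 Bad Gateway ({operation.group(1)})"
        return "502 Bad Gateway"
    return "other"


def tally_failures(sync_output: str) -> Counter:
    """Count failed uploads per error label."""
    counts = Counter()
    for line in sync_output.splitlines():
        label = classify_failure(line.strip())
        if label:
            counts[label] += 1
    return counts
```

Applied to the sync output in the description, this would separate the closed connections from the 502 responses and show which S3 operation each 502 hit, which may help when checking whether the failures cluster around multipart uploads.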
Hi @rayalon,

1. No, there isn't any network issue on the cluster.
2. This error has been reproduced on multiple clusters and has occurred every time during test case execution.
3. db-dump collected and attached to the BZ (nbcore_postgres.gz).
Created attachment 1840949 [details] nbcore_postgres.gz
Hi Sravika,

This is not an MCG issue. These tests exercise the RGW OBC, not the NooBaa OBC; the bucket is created in Rook Ceph, not in NooBaa. You can also see that from the test path tests/manage/rgw/test_bucket_deletion.py::TestBucketDeletion::test_bucket_delete_with_objects[RGW-OC].

Also, I had a short call with Ben from the OCS-CI team, and he says this was an OCS-CI issue that was fixed by this PR: https://github.com/red-hat-storage/ocs-ci/pull/5011/files

Please check that; I think you can close the bug afterward.

Thanks,
Romy
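The distinction drawn here between an RGW OBC and a NooBaa (MCG) OBC can be checked mechanically from either the storage class or the test path. The sketch below is a heuristic, not an ocs-ci API: both function names are hypothetical, the `ceph-rgw` substring comes from the storage class shown in this report (ocs-storagecluster-ceph-rgw), and the `noobaa` substring and the `tests/manage/mcg/` prefix are assumptions about how MCG resources and tests are named.

```python
def backend_from_storage_class(storage_class_name: str) -> str:
    """Guess which component backs an OBC from its storage class name."""
    name = storage_class_name.lower()
    if "ceph-rgw" in name:
        # e.g. ocs-storagecluster-ceph-rgw, as used in this report
        return "rgw"
    if "noobaa" in name:
        # Assumption: MCG storage classes carry "noobaa" in their name.
        return "mcg"
    return "unknown"


def backend_from_test_path(test_id: str) -> str:
    """Guess the backend under test from an ocs-ci test identifier."""
    if test_id.startswith("tests/manage/rgw/"):
        return "rgw"
    # Assumption: MCG tests live under tests/manage/mcg/.
    if test_id.startswith("tests/manage/mcg/"):
        return "mcg"
    return "unknown"
```

For the OBC in this report both checks point at RGW, which matches the conclusion that the bucket lives in Rook Ceph rather than in NooBaa.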
Confirmed with Sravika, this issue is not seen now.