Bug 2241907

Summary: [rgw][s3select]: Connection broken error seen when querying some CSV objects downloaded from the internet
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Hemanth Sai <hmaheswa>
Component: RGW
Assignee: gal salomon <gsalomon>
Status: CLOSED ERRATA
QA Contact: Hemanth Sai <hmaheswa>
Severity: high
Docs Contact: Rivka Pollack <rpollack>
Priority: unspecified
Version: 7.0
CC: akraj, ceph-eng-bugs, cephqe-warriors, gsalomon, tserlin, vimishra
Keywords: Regression
Target Release: 7.0
Flags: gsalomon: needinfo-
Hardware: Unspecified
OS: Unspecified
Fixed In Version: ceph-18.2.0-105.el9cp
Doc Type: Bug Fix
Doc Text:
.Ceph Object Gateway now parses CSV objects without processing failures
Previously, Ceph Object Gateway failed to properly parse some CSV objects. When parsing failed, the request was aborted without a proper error message. With this fix, the CSV parser works as expected and processes CSV objects without failures.
Last Closed: 2023-12-13 15:24:14 UTC
Type: Bug

Description Hemanth Sai 2023-10-03 10:55:19 UTC
Description of problem:
A "Connection broken" error is returned when running S3 Select queries against some CSV objects downloaded from the internet.

[cephuser@ceph-hmaheswa-rhcs7-ss-99rkno-node6 ~]$ aws s3api --endpoint-url http://10.0.208.32:80 select-object-content  --bucket csvconnectionbrokenbkt1 --key organizations-2000000.csv --expression-type 'SQL' --input-serialization '{"CSV": {}, "CompressionType": "NONE"}' --output-serialization '{"CSV": {}}' --expression "select count(*) from s3object;" /dev/stdout

("Connection broken: InvalidChunkLength(got length b'HTTP/1.1 400 Bad Request\\r\\n', 0 bytes read)", InvalidChunkLength(got length b'HTTP/1.1 400 Bad Request\r\n', 0 bytes read))
[cephuser@ceph-hmaheswa-rhcs7-ss-99rkno-node6 ~]$

The objects that trigger the above error were downloaded from:
https://www.datablist.com/learn/csv/download-sample-csv-files
https://www.kaggle.com/datasets/antoinecarpentier/redditrplacecsv

Version-Release number of selected component (if applicable):
ceph version 18.2.0-52.el9cp

How reproducible:
always

Steps to Reproduce:
1. Deploy an RHCS 7.0 Ceph cluster.
2. Upload one of the above CSV objects using the AWS CLI.
3. Run the query "select count(*) from s3object;" against it with select-object-content.
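A smaller reproducer may be possible. Assuming the trigger is the unterminated quoted field identified in Comment 5, the hypothetical Python sketch below writes a tiny CSV whose last record opens a quote that is never closed. The file name `unterminated-quote.csv` and its columns are made up for illustration, and it has not been verified that this file reproduces the error on ceph-18.2.0-52.el9cp.

```python
import csv
import io


def make_unterminated_csv(rows: int = 3) -> str:
    """Return CSV text whose last record opens a quoted field and never
    closes it. Column names and values are hypothetical placeholders."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", "name", "country"])
    for i in range(1, rows + 1):
        writer.writerow([i, f"org-{i}", "US"])
    # Written by hand because csv.writer would always close the quote.
    buf.write(f'{rows + 1},"broken org,US\n')
    return buf.getvalue()


if __name__ == "__main__":
    # newline="" so the "\n" terminators are written exactly as built above.
    with open("unterminated-quote.csv", "w", encoding="utf-8", newline="") as f:
        f.write(make_unterminated_csv())
```

Uploading it with `aws s3 cp unterminated-quote.csv s3://csvconnectionbrokenbkt1/ --endpoint-url http://10.0.208.32:80` and rerunning the select-object-content command above with `--key unterminated-quote.csv` would show whether the small file behaves like the downloaded ones.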

Actual results:
("Connection broken: InvalidChunkLength(got length b'HTTP/1.1 400 Bad Request\\r\\n', 0 bytes read)", InvalidChunkLength(got length b'HTTP/1.1 400 Bad Request\r\n', 0 bytes read))

Expected results:
A proper error message if the object is malformed; otherwise, successful execution of the command.

Additional info:
RGW logs captured with debug_rgw set to 20 are available at http://magna002.ceph.redhat.com/ceph-qe-logs/HemanthSai/s3select_connection_broken/ceph-client.rgw.rgw.all.ceph-hmaheswa-rhcs7-ss-99rkno-node5.wswnlh.log

Comment 5 gal salomon 2023-10-15 23:27:39 UTC
The "Connection broken" error is caused by a CSV parsing error (it seems a closing quote is missing).
The error message appears in the RGW log when debug_rgw is set to 20.

The fix resides on radosgw-log:

https://github.com/ceph/ceph/pull/53351
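To check a downloaded file for this condition before uploading it, the Python sketch below can help. It is a heuristic, not RGW's s3select parser: every `"` toggles the quoted state (so an escaped `""` inside a quoted field leaves the state unchanged, as in RFC 4180), and it reports the line where a quoted field is opened but never closed. Stray quotes inside unquoted fields are flagged as well. The script name `check_csv_quotes.py` is hypothetical.

```python
import sys
from typing import Iterable, Optional


def find_unterminated_quote(lines: Iterable[str], quote: str = '"') -> Optional[int]:
    """Return the 1-based line number where the still-open quoted field
    starts, or None if every quoted field is closed.

    Heuristic: each quote character toggles the in-quotes state, so an
    escaped quote ("") inside a quoted field toggles twice and leaves the
    state unchanged. Quoted fields may span lines, as RFC 4180 allows.
    """
    in_quotes = False
    open_line: Optional[int] = None
    for lineno, line in enumerate(lines, start=1):
        count = line.count(quote)
        if count % 2 == 1:
            in_quotes = not in_quotes
        # Ending the line inside quotes after seeing a quote on it means
        # this line's last quote opened the field that is still open.
        if in_quotes and count > 0:
            open_line = lineno
    return open_line if in_quotes else None


def check_file(path: str) -> int:
    """Print a verdict for one file; return 1 if a quote is left open."""
    # The file is streamed line by line, so large objects such as
    # organizations-2000000.csv do not have to fit in memory.
    with open(path, encoding="utf-8", errors="replace", newline="") as f:
        bad = find_unterminated_quote(f)
    if bad is None:
        print(f"{path}: all quoted fields are closed")
        return 0
    print(f"{path}: quoted field opened on line {bad} is never closed")
    return 1


if __name__ == "__main__":
    # Usage: python3 check_csv_quotes.py FILE [FILE ...]
    status = 0
    for path in sys.argv[1:]:
        status |= check_file(path)
    if status:
        sys.exit(status)
```

For example, `python3 check_csv_quotes.py organizations-2000000.csv` exits with status 1 and prints the opening line number if it finds a quoted field that is never closed.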

Comment 9 gal salomon 2023-10-30 10:09:24 UTC
`An error occurred (s3select-Syntax-Error) when calling the SelectObjectContent operation: engine_version  function not found`

Without getting the correct engine_version value,
there is no point in testing `18.2.0-102.el9cp`.

Comment 14 errata-xmlrpc 2023-12-13 15:24:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 7.0 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:7780