Bug 2249756

Summary:	[s3select][json][trino]: json object querying through trino fails with error "wrong json dataType should use DOCUMENT" when the client sends JSONL
Product:	[Red Hat Storage] Red Hat Ceph Storage	Reporter:	Hemanth Sai <hmaheswa>
Component:	RGW	Assignee:	gal salomon <gsalomon>
Status:	ASSIGNED ---	QA Contact:	Hemanth Sai <hmaheswa>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	7.0	CC:	ceph-eng-bugs, cephqe-warriors, gsalomon, mbenjamin, mkasturi, rpollack
Target Milestone:	---	Keywords:	Reopened
Target Release:	9.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Known Issue
Doc Text:	.s3select and Trino fail when processing JSON object using s3select When Trino processes a JSON object using the s3select request, the request fails causing Trino to fail too. This emits the `wrong json dataType should use DOCUMENT` error message in the Ceph Object Gateway logs. As a workaround, it is sometimes possible to use the s3select request directly, not using Trino.	Story Points:	---
Clone Of:		Environment:
Last Closed:	2024-04-04 13:29:55 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	2237662

Description Hemanth Sai 2023-11-15 08:00:42 UTC

Description of problem:

while querying json through trino, seeing "wrong json dataType should use DOCUMENT" in rgw logs. don't know how to specify “type: document” through trino.


json data:

[root@extensa022 s3select]# cat ~/file1.json 
{
    "id": 1091,
    "name": "Makoto Satoh"
}
[root@extensa022 s3select]#



trino commands:

trino:cephs3> CREATE TABLE persons (
           ->     id integer,
           ->     name varchar
           -> )
           -> WITH (
           ->    external_location = 's3a://bkt5trinoschema1/',
           ->    format = 'JSON'
           -> );
           -> 
CREATE TABLE
trino:cephs3> 
trino:cephs3> 
trino:cephs3> select * from persons;

Query 20231106_111033_00008_krvwk, FAILED, 1 node
Splits: 2 total, 0 done (0.00%)
0.09 [0 rows, 0B] [0 rows/s, 0B/s]

Query 20231106_111033_00008_krvwk failed: io.trino.hive.$internal.org.codehaus.jackson.JsonParseException: Unexpected end-of-input: expected close marker for OBJECT (from [Source: [B@1e496306; line: 1, column: 0])
 at [Source: [B@1e496306; line: 1, column: 3]

trino:cephs3> 



rgw log snippet:

2023-11-06T11:59:55.622+0000 7f0f84e5f640 10 req 15821040725807975742 0.000999979s s3:get_obj s3-select query: <SelectObjectContentRequest><Expression>SELECT &apos; &apos; FROM S3Object s</Expression><ExpressionType>SQL</ExpressionType><ScanRange><Start>0</Start><End>265</End></ScanRange><InputSerialization><JSON><Type>LINES</Type></JSON><CompressionType>NONE</CompressionType></InputSerialization><OutputSerialization><JSON></JSON></OutputSerialization></SelectObjectContentRequest>
2023-11-06T11:59:55.622+0000 7f0f84e5f640 10 req 15821040725807975742 0.000999979s s3:get_obj s3-select query: request sent by Trino.
2023-11-06T11:59:55.622+0000 7f0f84e5f640 10 req 15821040725807975742 0.000999979s s3:get_obj s3select: engine is set to process JSON objects
2023-11-06T11:59:55.622+0000 7f0f84e5f640 10 S3select: calling execute(async): request-offset :0 request-length :1289 buffer size : 0
2023-11-06T11:59:55.622+0000 7f0f84e5f640 20 req 15821040725807975742 0.000999979s s3:get_obj get_obj_state: rctx=0x55e06014f9d0 obj=bkt5trinoschema1:json1 state=0x55e04460e1e8 s->prefetch_data=1
2023-11-06T11:59:55.622+0000 7f0f84e5f640 20 req 15821040725807975742 0.000999979s s3:get_obj Read xattr rgw_rados: user.rgw.acl
2023-11-06T11:59:55.622+0000 7f0f84e5f640 20 req 15821040725807975742 0.000999979s s3:get_obj Read xattr rgw_rados: user.rgw.content_type
2023-11-06T11:59:55.622+0000 7f0f84e5f640 20 req 15821040725807975742 0.000999979s s3:get_obj Read xattr rgw_rados: user.rgw.etag
2023-11-06T11:59:55.622+0000 7f0f84e5f640 20 req 15821040725807975742 0.000999979s s3:get_obj Read xattr rgw_rados: user.rgw.idtag
2023-11-06T11:59:55.622+0000 7f0f84e5f640 20 req 15821040725807975742 0.000999979s s3:get_obj Read xattr rgw_rados: user.rgw.manifest
2023-11-06T11:59:55.622+0000 7f0f84e5f640 20 req 15821040725807975742 0.000999979s s3:get_obj Read xattr rgw_rados: user.rgw.pg_ver
2023-11-06T11:59:55.622+0000 7f0f84e5f640 20 req 15821040725807975742 0.000999979s s3:get_obj Read xattr rgw_rados: user.rgw.source_zone
2023-11-06T11:59:55.622+0000 7f0f84e5f640 20 req 15821040725807975742 0.000999979s s3:get_obj Read xattr rgw_rados: user.rgw.tail_tag
2023-11-06T11:59:55.622+0000 7f0f84e5f640 20 req 15821040725807975742 0.000999979s s3:get_obj Read xattr rgw_rados: user.rgw.x-amz-content-sha256
2023-11-06T11:59:55.622+0000 7f0f84e5f640 20 req 15821040725807975742 0.000999979s s3:get_obj Read xattr rgw_rados: user.rgw.x-amz-date
2023-11-06T11:59:55.622+0000 7f0f84e5f640 10 req 15821040725807975742 0.000999979s cache get: name=primary.rgw.log++script.getdata. : hit (negative entry)
2023-11-06T11:59:55.622+0000 7f0f84e5f640 15 req 15821040725807975742 0.000999979s Encryption mode: 
2023-11-06T11:59:55.622+0000 7f0f84e5f640 20 req 15821040725807975742 0.000999979s s3:get_obj get_obj_state: rctx=0x55e06014f9d0 obj=bkt5trinoschema1:json1 state=0x55e04460e1e8 s->prefetch_data=1
2023-11-06T11:59:55.623+0000 7f0f84e5f640 10 req 15821040725807975742 0.001999958s s3:get_obj processing segment 0 out of 1 off 0 len 265 obj-size 265
2023-11-06T11:59:55.623+0000 7f0f4bded640 10 req 15821040725807975742 0.001999958s s3:get_obj s3-select query: wrong json dataType should use DOCUMENT; 
2023-11-06T11:59:55.623+0000 7f0f4bded640  0 req 15821040725807975742 0.001999958s s3:get_obj iterate_obj() failed with -22
2023-11-06T11:59:55.623+0000 7f0f4bded640 10 S3select: done waiting, buffer is complete buffer-size:0
2023-11-06T11:59:55.623+0000 7f0f4bded640  2 req 15821040725807975742 0.001999958s s3:get_obj completing
2023-11-06T11:59:55.623+0000 7f0f4bded640 10 req 15821040725807975742 0.001999958s cache get: name=primary.rgw.log++script.postrequest. : hit (negative entry)
2023-11-06T11:59:55.623+0000 7f0f4bded640  2 req 15821040725807975742 0.001999958s s3:get_obj op status=-22
2023-11-06T11:59:55.623+0000 7f0f4bded640  2 req 15821040725807975742 0.001999958s s3:get_obj http status=200
2023-11-06T11:59:55.623+0000 7f0f4bded640  1 ====== req done req=0x7f0f2f4b1710 op status=-22 http_status=200 latency=0.001999958s ======


Version-Release number of selected component (if applicable):
ceph version 18.2.0-122.el9cp

How reproducible:
always

Steps to Reproduce:
1.deploy rhcs7.0 with rgw daemon
2.create a bucket and upload json object
3.deploy trino, create table and query json file

Actual results:
querying json failed

Expected results:
json querying is successful

Additional info:

Comment 5 gal salomon 2024-01-29 14:11:58 UTC

a fix in RGW should enable to process s3select request sent by Trino upon a JSON object.

Comment 6 gal salomon 2024-04-04 13:27:11 UTC

{jsonl} is a JSON document where each object resides on a single line( like a row in CSV)
thus, each line can stand by itself as a JSON document and parsed completely.

this type of object enables parallel processing, the JSONL-object can be split easily.

the s3select JSON parser can process the jsonl (the delimiters are ignored).


the issue is with Trino, it seems that Trino assumes the object is JSONL(should verify this)
it splits the object and sends each part with a type=LINES, 
as a result, RGW must know how to handle such a use case (i.e. load each line, and parse it separately)


useful links:
- https://jsonlines.org/?ref=dbconvert.com
- https://dbconvert.com/blog/json-lines-data-stream/
- https://docs.aws.amazon.com/AmazonS3/latest/userguide/selecting-content-from-objects.html 
("Amazon S3 Select scan range requests support Parquet, CSV (without quoted delimiters), or JSON objects (in LINES mode only). CSV and JSON objects must be uncompressed. For line-based CSV and JSON objects, when a scan range is specified as part of the Amazon S3 Select request, all records that start within the scan range are processed. For Parquet objects, all of the row groups that start within the scan range requested are processed.")