2182421 – Trino / Ceph integration requires changes in S3select engine and RGW

Summary: Trino / Ceph integration requires changes in S3select engine and RGW

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Ceph Storage
Classification:	Red Hat Storage
Component:	RGW
Sub Component:
Version:	6.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	7.0
Assignee:	gal salomon
QA Contact:	Hemanth Sai
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2023-03-28 15:16 UTC by gal salomon
Modified:	2024-02-04 12:26 UTC
CC List:	6 users
Fixed In Version:	ceph-18.2.0-1
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2023-12-13 15:20:13 UTC
Embargoed:
Dependent Products:
Flags:	gsalomon: needinfo-

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	RHCEPH-6338	0	None	None	None	2023-03-28 17:30:49 UTC
Red Hat Product Errata	RHBA-2023:7780	0	None	None	None	2023-12-13 15:20:16 UTC

Description gal salomon 2023-03-28 15:16:13 UTC

Description of problem:

Trino gains efficiency by issuing multiple requests per single query. The results returned by RGW must align with Trino's expectations; otherwise queries are rejected or the results are inaccurate.
For an aggregation statement (count), Trino pushes down a non-aggregation statement that retrieves an empty column. Trino issues multiple s3select requests in parallel, and the deviation in the result appears to be related to the number of parallel requests.
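
To illustrate why the row count should not depend on the number of parallel requests, here is a minimal Python sketch. It is not the RGW or Trino implementation. It assumes the scan-range rule that a row belongs to the range containing its first byte; the function names (split_ranges, count_rows_in_range, parallel_count) are hypothetical and the object is simulated as an in-memory byte string.

```python
def split_ranges(size, parts):
    """Return [start, end) byte ranges that cover an object of `size` bytes."""
    if size == 0:
        return []
    step = -(-size // parts)  # ceiling division
    return [(s, min(s + step, size)) for s in range(0, size, step)]


def count_rows_in_range(data, start, end):
    """Count rows whose first byte lies in [start, end).

    Assumption: a row is owned by the range in which it starts, even if
    the row extends past the end of that range.
    """
    count = 0
    pos = 0
    while pos < len(data):
        if start <= pos < end:
            count += 1
        newline = data.find(b"\n", pos)
        pos = len(data) if newline == -1 else newline + 1
    return count


def parallel_count(data, parts):
    """Sum the per-range counts, as a client merging split results would."""
    return sum(
        count_rows_in_range(data, start, end)
        for start, end in split_ranges(len(data), parts)
    )
```

With this ownership rule, parallel_count(data, n) gives the same total for any n, because every row is counted by exactly one range. A server that counted a row in both neighbouring ranges, or in neither, would produce a total that varies with the number of splits.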

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 gal salomon 2023-08-17 08:46:04 UTC

The following PRs deal with the Trino/Ceph integration and resolve various issues related to it.

https://github.com/ceph/ceph/pull/49411
https://github.com/ceph/ceph/pull/50471
https://github.com/ceph/ceph/pull/52651

Comment 10 gal salomon 2023-11-08 13:48:18 UTC

"Trino/Ceph integration" is an umbrella title for many things.
The CSV flow is one integration point (out of several), meaning that there are specific flows in RGW that handle the Trino/s3select/CSV integration (for example, splitting the object).

JSON and Parquet are the other integration points,
and there could be more in the future.
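
As a rough illustration of how these integration points differ on the request side, the sketch below builds the keyword arguments for boto3's select_object_content() call for each input format. It is not taken from the PRs above. The helper name build_select_kwargs, the bucket and key values, and the choice of query are assumptions made for the example.

```python
def build_select_kwargs(bucket, key, fmt, scan_range=None):
    """Build keyword arguments for boto3's S3 client select_object_content().

    `fmt` is one of "CSV", "JSON" or "Parquet" (hypothetical helper).
    `scan_range` is an optional (start, end) byte pair for a split request.
    """
    input_serialization = {
        "CSV": {"CSV": {"FileHeaderInfo": "NONE"}},
        "JSON": {"JSON": {"Type": "LINES"}},
        "Parquet": {"Parquet": {}},
    }[fmt]
    kwargs = {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": "select count(*) from s3object;",
        "InputSerialization": input_serialization,
        "OutputSerialization": {"CSV": {}},
    }
    if scan_range is not None:
        start, end = scan_range
        kwargs["ScanRange"] = {"Start": start, "End": end}
    return kwargs
```

The resulting dictionary could be passed as client.select_object_content(**kwargs) to a boto3 S3 client pointed at an RGW endpoint. Whether a given format accepts a scan range depends on the server, so treat the scan_range argument as optional.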

Comment 13 errata-xmlrpc 2023-12-13 15:20:13 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 7.0 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:7780
