Bug 2240974
Summary: | [rgw][s3select]: time taken to query 12GB json is high and cpu utilisation by radosgw is also very high | | |
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Hemanth Sai <hmaheswa> |
Component: | RGW | Assignee: | gal salomon <gsalomon> |
Status: | CLOSED NOTABUG | QA Contact: | Hemanth Sai <hmaheswa> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | | |
Version: | 7.0 | CC: | ceph-eng-bugs, cephqe-warriors, gsalomon, mbenjamin, mkogan, rpollack, vumrao |
Target Milestone: | --- | Flags: | rpollack: needinfo? (mbenjamin) |
Target Release: | 9.0 | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | 9.0 | Doc Type: | Known Issue |
Doc Text: |
.JSON `select count(*) from S3Object[*];` queries lag and cause high CPU usage
When running the `select count(*) from S3Object[*];` query against a JSON object, the elapsed time and radosgw CPU utilization are very high compared with equivalent CSV object queries.
As a workaround, use `count()` instead of `count(*)` in the JSON query.
|
Story Points: | --- | | |
Clone Of: | | Environment: | |
Last Closed: | 2025-08-19 12:01:33 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | | | |
Bug Blocks: | 2237662 | | |
Description
Hemanth Sai
2023-09-27 13:59:20 UTC

It should consume 100% CPU; that is by design. There isn't any kind of wait or lock. As for memory, it consumes very little (try using the jq tool or Python on that object). The s3select engine should not limit itself, since it is a pure function; resource-consumption limits should be implemented by the RGW routines.

Is this a release build or a debug build? And the time is "too long" compared to what? Please change the query: instead of `count(*)`, use `count()` (remove the `*`). To establish that the `count()` operator is too slow, it needs to be compared against other operations. The CSV reader scans the whole object the same way the JSON reader does, so we can compare the two readers. We can also use Trino (it runs parallel requests).

NOTE: the JSON reader is much more complex than the CSV reader, since it needs to handle more use cases, but it should still execute in a reasonable time. If the 7.0z1 release has already passed, this is not fixable in that release. It needs to be verified whether the time and CPU utilization for the JSON format are reasonable; compared to the other formats (CSV and Parquet), it is slower.
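As a sketch of the suggested workaround, the snippet below builds the request for an S3 Select query over a JSON object (radosgw accepts the same boto3-style request shape as AWS S3 Select), swapping `count(*)` for `count()`. The bucket and object names are placeholders, and the request is built as a plain dict so no live RGW endpoint is assumed:

```python
# Sketch of the workaround from this bug: query a JSON object via
# S3 Select using count() instead of count(*). Bucket/key names
# below are hypothetical placeholders.

def build_json_select_request(bucket, key, expression):
    """Build keyword arguments for an S3 SelectObjectContent call
    against a single JSON document (boto3-style parameter names)."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        # The object is one JSON document, not line-delimited records.
        "InputSerialization": {"JSON": {"Type": "DOCUMENT"}},
        "OutputSerialization": {"JSON": {}},
    }

# Slow on large JSON objects, per this bug:
slow = build_json_select_request(
    "mybucket", "big.json", "select count(*) from S3Object[*];")

# Suggested workaround: drop the '*'.
fast = build_json_select_request(
    "mybucket", "big.json", "select count() from S3Object[*];")

# An actual call against radosgw would look something like:
#   import boto3
#   s3 = boto3.client("s3", endpoint_url="http://<rgw-host>:8080")
#   resp = s3.select_object_content(**fast)
#   for event in resp["Payload"]:
#       if "Records" in event:
#           print(event["Records"]["Payload"].decode())
```

The only difference between the two requests is the SQL expression; per the comment above, timing both against the same object (and against an equivalent CSV object) is how the slowdown would be isolated to the JSON `count(*)` path.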
it should consume 100% CPU, its by design. there isn't any type of wait or lock. as for memory, it consumes very little (try to use JQ application or Python on that object...) the s3select should not limit itself, since its a pure function. the resource consumption limitation should be implemented by the RGW routines. is it a release version? or debug version? as for expected time(too long) compares to what? please change the query instead of "count(*)" use "count()" remove the * in order to establish that operator `count()` is too slow it needs to compare it to other operations. CSV reader scans the whole object the same as JSON, we can compare both readers. we can also use Trino (it runs parallel requests) NOTE: the JSON is much more complex than the CSV, it needs to handle more use cases. but still, it should be executed in a reasonable time. in case 7.0z1 release has passed, it is not fixable. it needs to verify whether TIME&CPU utilization(per JSON format), is reasonable. compared to the other formats (CSV and Parquet) it is slower. |