Bug 2252403
| Summary: | [rgw][s3select]: radosgw process killed with "Out of memory" while executing query "select * from s3object limit 1" on a 12GB parquet file | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Hemanth Sai <hmaheswa> |
| Component: | RGW | Assignee: | gal salomon <gsalomon> |
| Status: | CLOSED ERRATA | QA Contact: | Hemanth Sai <hmaheswa> |
| Severity: | high | Docs Contact: | Rivka Pollack <rpollack> |
| Priority: | unspecified | | |
| Version: | 7.0 | CC: | ceph-eng-bugs, cephqe-warriors, gsalomon, mbenjamin, mkasturi, rpollack, tserlin, ygayam |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | 8.0z3 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | ceph-19.2.0-100.el9cp | Doc Type: | Bug Fix |
| Story Points: | --- | | |
| Clone Of: | | | |
| | 2275323 2365146 (view as bug list) | Environment: | |
| Last Closed: | 2025-04-07 15:25:49 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2275323, 2365146 | | |

Doc Text:

.Large queries on Parquet objects no longer emit an `out of memory` error
Previously, in some cases, when a query was processed on a Parquet object, that object was read in large chunks. This caused the Ceph Object Gateway to load a large buffer into memory, which was too big for low-end machines. Memory pressure was especially high when the Ceph Object Gateway was co-located with OSD processes, which consume a large amount of memory. As a result, the OS killed the Ceph Object Gateway process with an `Out of memory` error.
With this fix, there is an updated limit on the reader-buffer size used for reading column chunks. The default size is now 16 MB, and the size can be changed through the Ceph Object Gateway configuration file.
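The bounded-buffer idea behind the fix can be illustrated client-side with pyarrow, which exposes a comparable `buffer_size` knob for column-chunk reads. This is only a conceptual sketch of the technique, not the RGW code path; the file path is a placeholder and the 16 MB value simply mirrors the new default mentioned above.

```python
# Illustrative only: open a Parquet file with a bounded read buffer for
# column-chunk deserialization (16 MB here, mirroring the new RGW default).
import pyarrow.parquet as pq

pf = pq.ParquetFile("rideshare_data.parquet", buffer_size=16 * 1024 * 1024)

# Reading one row group at a time keeps peak memory roughly bounded by
# (row-group size + read buffer) instead of loading huge chunks at once.
for i in range(pf.metadata.num_row_groups):
    table = pf.read_row_group(i)
    # ... process `table` incrementally ...
```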
Description (Hemanth Sai, 2023-12-01 13:03:26 UTC)
I downloaded `rideshare_data.parquet` and tried to open it in different ways (apache/c++ and Python).
The Python app (below) crashes with OOM; the apache/c++ app rejects the file (metadata mismatch).
We need to check why RGW is crashing.
```python
# Print the Parquet metadata (schema, row groups, column info) for the file
# passed as the first command-line argument.
import sys

import pyarrow.parquet as pq

parquet_file = pq.ParquetFile(sys.argv[1])
print("==============================")
print(parquet_file.metadata)
print("==============================")
```
- This specific Parquet file has big row groups (500 MB), which means the reader needs to fetch each one, assemble it, and then process it; that takes time.
- `count(*)` requires the s3select engine to extract each value residing in a row, while `count(0)` does not retrieve any value. With 365M rows and 19 columns per row, that is a huge number of extract-value operations (several billion).
- Since the row groups are big and the number of extract-value operations is large, the processing takes time, and that can trigger a timeout.
- The s3select operation sends a continue-message to avoid the timeout.

I tried to reproduce this issue with no success. I did not observe any memory leaks, and I cannot reproduce the OOM.

I measured the memory consumption of `select * from s3object limit 1;` using `pidstat -r -h -p $(pgrep radosgw) 1 300`. It is possible to observe the memory consumption (RSS) "jump" by 1.5 GB while the statement is in progress (a few seconds); it goes back down upon statement completion. With `select count(0) from s3object;` it jumps higher and for a longer time (and comes back down upon completion). This jump may relate to the big row groups (the way the Parquet file was built). What should be the expected result? Currently it is 4 GB RAM; what about 2 GB RAM? (See the boto3 sketch at the end of this report for how such a query can be issued.)

Thanks Hemanth for this important information.

These findings imply that there isn't anything wrong with the radosgw behavior when processing a Parquet object; it depends on machine sizing and workload. This specific 12 GB Parquet file contains *only* 6 row groups (for 365M rows!), so `select *` (extract all columns) forces the reader to load a great amount of data. My opinion is that radosgw cannot satisfy every combination of hardware size and extreme workloads.

Gal.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat Ceph Storage 8.0 security, bug fix, and enhancement updates), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2025:3635

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days
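For reference, a minimal sketch of how the query from the summary can be issued against RGW with boto3; the endpoint, credentials, bucket, and key are placeholders, not values from this report:

```python
# Sketch: run the s3select query from the summary against RGW via boto3.
# Endpoint, credentials, bucket, and key below are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://rgw.example.com:8080",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

resp = s3.select_object_content(
    Bucket="parquet-bucket",
    Key="rideshare_data.parquet",
    ExpressionType="SQL",
    Expression="select * from s3object limit 1",
    InputSerialization={"Parquet": {}},
    OutputSerialization={"CSV": {}},
)

# The response payload is an event stream; print record chunks as they arrive.
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode(), end="")
```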