Bug 2225434 - rgw crashes seen for s3select json query with "where" clause
Summary: rgw crashes seen for s3select json query with "where" clause
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RGW
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 7.0
Assignee: gal salomon
QA Contact: Hemanth Sai
Docs Contact: Rivka Pollack
URL:
Whiteboard:
Depends On:
Blocks: 2237662
Reported: 2023-07-25 10:05 UTC by Hemanth Sai
Modified: 2023-12-13 15:21 UTC (History)

Fixed In Version: ceph-18.2.0-105.el9cp
Doc Type: Bug Fix
Doc Text:
.Ceph Object Gateway daemon no longer crashes with “where” clause in an `s3select` JSON query. Previously, due to a syntax error, an `s3select` JSON query with a “where” clause would cause the Ceph Object Gateway daemon to crash. With this fix, the wrong syntax is identified and reported, and no longer causes the daemon to crash.
Clone Of:
Environment:
Last Closed: 2023-12-13 15:21:03 UTC
Embargoed:
gsalomon: needinfo-




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHCEPH-7070 0 None None None 2023-07-25 10:05:35 UTC
Red Hat Product Errata RHBA-2023:7780 0 None None None 2023-12-13 15:21:07 UTC

Description Hemanth Sai 2023-07-25 10:05:23 UTC
Description of problem:
rgw crashes seen for s3select json query with "where" clause

crash info:
{
    "crash_id": "2023-07-25T08:50:29.789218Z_82a457cb-e975-4636-8804-7bfe1bbd0e08",
    "timestamp": "2023-07-25T08:50:29.789218Z",
    "process_name": "radosgw",
    "entity_name": "client.rgw.shared.pri.ceph-pri-hmaheswa-automation-0jcitv-node5.vnvrex",
    "ceph_version": "18.0.0-5070-g01bc98b4",
    "utsname_hostname": "ceph-pri-hmaheswa-automation-0jcitv-node5",
    "utsname_sysname": "Linux",
    "utsname_release": "5.14.0-284.18.1.el9_2.x86_64",
    "utsname_version": "#1 SMP PREEMPT_DYNAMIC Wed May 31 10:39:18 EDT 2023",
    "utsname_machine": "x86_64",
    "os_name": "CentOS Stream",
    "os_id": "centos",
    "os_version_id": "8",
    "os_version": "8",
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12cf0) [0x7fb199b20cf0]",
        "gsignal()",
        "abort()",
        "/usr/bin/radosgw(+0x673c58) [0x563908f9cc58]",
        "(s3selectEngine::json_object::init_json_processor(s3selectEngine::s3select*)+0x78f) [0x5639093b4a8f]",
        "(RGWSelectObj_ObjStore_S3::run_s3select_on_json(char const*, char const*, unsigned long)+0x364) [0x56390938ba64]",
        "(RGWSelectObj_ObjStore_S3::json_processing(ceph::buffer::v15_2_0::list&, long, long)+0x6a5) [0x5639093903a5]",
        "(RGWRados::get_obj_iterate_cb(DoutPrefixProvider const*, rgw_raw_obj const&, long, long, long, bool, RGWObjState*, void*)+0x131) [0x5639094f1d01]",
        "/usr/bin/radosgw(+0xba8ed6) [0x5639094d1ed6]",
        "(RGWRados::iterate_obj(DoutPrefixProvider const*, RGWObjectCtx&, RGWBucketInfo&, rgw_obj const&, long, long, unsigned long, int (*)(DoutPrefixProvider const*, rgw_raw_obj const&, long, long, long, bool, RGWObjState*, void*), void*, optional_yield)+0x3b6) [0x563909514a36]",
        "(RGWRados::Object::Read::iterate(DoutPrefixProvider const*, long, long, RGWGetDataCB*, optional_yield)+0x138) [0x563909515298]",
        "(RGWGetObj::execute(optional_yield)+0x1122) [0x5639092b8582]",
        "(RGWSelectObj_ObjStore_S3::execute(optional_yield)+0xc1) [0x56390938e131]",
        "(rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, optional_yield, rgw::sal::Driver*, bool)+0xd91) [0x5639090a5171]",
        "(process_request(RGWProcessEnv const&, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWRestfulIO*, optional_yield, rgw::dmclock::Scheduler*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*, int*)+0x2b5c) [0x5639090a875c]",
        "/usr/bin/radosgw(+0x6b347e) [0x563908fdc47e]",
        "/usr/bin/radosgw(+0x6b4147) [0x563908fdd147]",
        "make_fcontext()"
    ]
}
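
To see where the abort originates, the symbolized frames can be pulled out of the backtrace. The following is a minimal sketch and not part of the original report: the helper name `symbolized_frames` is hypothetical, and the sample list holds only the first six frames copied from the crash info above.

```python
import re

# First frames of the backtrace, copied from the crash info in this report.
BACKTRACE = [
    "/lib64/libpthread.so.0(+0x12cf0) [0x7fb199b20cf0]",
    "gsignal()",
    "abort()",
    "/usr/bin/radosgw(+0x673c58) [0x563908f9cc58]",
    "(s3selectEngine::json_object::init_json_processor(s3selectEngine::s3select*)+0x78f) [0x5639093b4a8f]",
    "(RGWSelectObj_ObjStore_S3::run_s3select_on_json(char const*, char const*, unsigned long)+0x364) [0x56390938ba64]",
]

# A symbolized frame looks like "(Namespace::func(args)+0xOFFSET) [0xADDR]".
FRAME_RE = re.compile(r"^\((?P<symbol>.+)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]$")


def symbolized_frames(backtrace):
    """Return the function name (without arguments) of each symbolized frame."""
    names = []
    for frame in backtrace:
        match = FRAME_RE.match(frame)
        if match:
            # Drop the argument list: keep everything before the first "(".
            names.append(match.group("symbol").split("(", 1)[0])
    return names


frames = symbolized_frames(BACKTRACE)
for name in frames:
    print(name)
```

In this sample the first symbolized frame after `abort()` is `s3selectEngine::json_object::init_json_processor`, called from `RGWSelectObj_ObjStore_S3::run_s3select_on_json`.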

Version-Release number of selected component (if applicable):
ceph version 18.0.0-5070-g01bc98b4 (01bc98b489ef938d10e187313be218ecd8a7ef33) reef (dev)

How reproducible:
18/18

Steps to Reproduce:
1. Deploy a Ceph upstream reef cluster.
2. Upload a JSON object using aws-cli.
3. Query the JSON object with aws-cli using a "where" clause.

[cephuser@ceph-pri-hmaheswa-automation-0jcitv-node6 ~]$ venv/bin/aws s3api select-object-content --endpoint-url http://10.0.98.34:80 --bucket bkt1 --key small_json --expression-type 'SQL' --input-serialization '{"JSON": {"Type": "DOCUMENT"}, "CompressionType": "NONE"}' --output-serialization '{"JSON": {}}' --expression "select * from S3Object where employee.name=raju;" /dev/stdout

Could not connect to the endpoint URL: "http://10.0.98.34:80/bkt1/small_json?select&select-type=2"
[cephuser@ceph-pri-hmaheswa-automation-0jcitv-node6 ~]$
[cephuser@ceph-pri-hmaheswa-automation-0jcitv-node6 ~]$ cat small_json
{
"employee": {
"name": "raju",
"salary": 56000,
"married": true
}
}

[cephuser@ceph-pri-hmaheswa-automation-0jcitv-node6 ~]$ venv/bin/aws s3api select-object-content --endpoint-url http://10.0.98.34:80 --bucket bkt1 --key 200_mb_json --expression-type 'SQL' --input-serialization '{"JSON": {"Type": "DOCUMENT"}, "CompressionType": "NONE"}' --output-serialization '{"JSON": {}}' --expression "select * from S3Object[*] where tags=ring;" /dev/stdout

Could not connect to the endpoint URL: "http://10.0.98.34:80/bkt1/200_mb_json?select&select-type=2"

json file is downloaded from here: https://www.kaggle.com/datasets/kristoft/pitt-quantum-repository-106066-molecules


However, the query below, which uses S3Object[*], does not crash on the small object; it crashes only on the large object:
[cephuser@ceph-pri-hmaheswa-automation-0jcitv-node6 ~]$ venv/bin/aws s3api select-object-content --endpoint-url http://10.0.98.34:80 --bucket bkt1 --key small_json --expression-type 'SQL' --input-serialization '{"JSON": {"Type": "DOCUMENT"}, "CompressionType": "NONE"}' --output-serialization '{"JSON": {}}' --expression "select * from S3Object[*] where employee.name=raju;"  /dev/stdout
1 : alias {name} or column not exist in schema
#=== 0 ===#
[cephuser@ceph-pri-hmaheswa-automation-0jcitv-node6 ~]$

Actual results:
The rgw daemon crashes on an s3select query against a JSON object when the query has a "where" clause.

Expected results:
The query succeeds and rgw does not crash.

Additional info:
rgw logs and crash logs are present here: http://magna002.ceph.redhat.com/ceph-qe-logs/HemanthSai/s3select_json_logs/

rgw node: 10.0.98.34
creds: root/passwd ; cephuser/cephuser

also raised upstream tracker for this: https://tracker.ceph.com/issues/62156

Comment 2 gal salomon 2023-07-25 22:00:12 UTC
the JSON query syntax is *wrong*.
 
please review the s3select tests; they cover many use cases,
and for each one you can easily see the query, input, and output.
below is a link to the tests in the s3select repo.
https://github.com/ceph/s3select/blob/45e29caeea37d6f34fad516cbe7fd8f8bd4d68a9/test/s3select_test.cpp#L2967

[ --output-serialization '{"JSON": {}}' ]
as for output-serialization, the engine supports only CSV as output.
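
A minimal sketch of how the reporter's command line could be assembled with CSV output serialization instead of `{"JSON": {}}`. This is illustrative only: the helper name `build_select_command` is hypothetical, the flags are the ones already used in the description, and the expression is passed through unchanged. The query used here is the one from the report that returned an error instead of crashing, not a corrected query.

```python
import json
import shlex


def build_select_command(endpoint, bucket, key, expression, outfile="/dev/stdout"):
    """Build the aws-cli argv for select-object-content with JSON input and CSV output."""
    return [
        "aws", "s3api", "select-object-content",
        "--endpoint-url", endpoint,
        "--bucket", bucket,
        "--key", key,
        "--expression-type", "SQL",
        "--input-serialization",
        json.dumps({"JSON": {"Type": "DOCUMENT"}, "CompressionType": "NONE"}),
        # CSV output, since the engine is said to support only CSV as output.
        "--output-serialization", json.dumps({"CSV": {}}),
        "--expression", expression,
        outfile,
    ]


argv = build_select_command(
    "http://10.0.98.34:80",
    "bkt1",
    "small_json",
    "select * from S3Object[*] where employee.name=raju;",
)
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(argv))
```

Building the argument list in code and quoting it with `shlex.join` avoids hand-escaping the nested JSON in the serialization flags.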


the wrong input is effectively a negative test, and it exposes a faulty code path for invalid input.
the crash happened before query processing.

Comment 4 gal salomon 2023-07-25 22:25:44 UTC
no, it should not crash.

it failed on a negative test, and there can be many combinations of that kind.
i will generate more cases of that type.

Comment 5 gal salomon 2023-07-26 07:18:36 UTC
the input-serialization (AWS CLI) marks the s3select request as a JSON statement,
while the statement itself does not follow the JSON statement syntax (the from clause).

this creates a conflict that is not handled correctly, and that leads to the crash.

Comment 6 gal salomon 2023-08-16 16:07:32 UTC
it is fixed in https://github.com/ceph/ceph/pull/52651

Comment 15 gal salomon 2023-10-30 08:30:07 UTC
`An error occurred (s3select-Syntax-Error) when calling the SelectObjectContent operation: engine_version  function not found`

with that returns the correct value 
there is no point in testing `18.2.0-102.el9cp`

Comment 16 gal salomon 2023-10-30 08:31:51 UTC
`An error occurred (s3select-Syntax-Error) when calling the SelectObjectContent operation: engine_version  function not found`

with that returns the correct value 
there is no point in testing `18.2.0-102.el9cp`

Comment 22 errata-xmlrpc 2023-12-13 15:21:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 7.0 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:7780

