Bug 2225434 - rgw crashes seen for s3select json query with "where" clause [NEEDINFO]
Summary: rgw crashes seen for s3select json query with "where" clause
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RGW
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 7.0
Assignee: gal salomon
QA Contact: Hemanth Sai
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-07-25 10:05 UTC by Hemanth Sai
Modified: 2023-08-16 16:07 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:
mbenjamin: needinfo? (tserlin)



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | RHCEPH-7070 | 0 | None | None | None | 2023-07-25 10:05:35 UTC

Description Hemanth Sai 2023-07-25 10:05:23 UTC
Description of problem:
RGW crashes when an s3select JSON query includes a "where" clause.

crash info:
{
    "crash_id": "2023-07-25T08:50:29.789218Z_82a457cb-e975-4636-8804-7bfe1bbd0e08",
    "timestamp": "2023-07-25T08:50:29.789218Z",
    "process_name": "radosgw",
    "entity_name": "client.rgw.shared.pri.ceph-pri-hmaheswa-automation-0jcitv-node5.vnvrex",
    "ceph_version": "18.0.0-5070-g01bc98b4",
    "utsname_hostname": "ceph-pri-hmaheswa-automation-0jcitv-node5",
    "utsname_sysname": "Linux",
    "utsname_release": "5.14.0-284.18.1.el9_2.x86_64",
    "utsname_version": "#1 SMP PREEMPT_DYNAMIC Wed May 31 10:39:18 EDT 2023",
    "utsname_machine": "x86_64",
    "os_name": "CentOS Stream",
    "os_id": "centos",
    "os_version_id": "8",
    "os_version": "8",
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12cf0) [0x7fb199b20cf0]",
        "gsignal()",
        "abort()",
        "/usr/bin/radosgw(+0x673c58) [0x563908f9cc58]",
        "(s3selectEngine::json_object::init_json_processor(s3selectEngine::s3select*)+0x78f) [0x5639093b4a8f]",
        "(RGWSelectObj_ObjStore_S3::run_s3select_on_json(char const*, char const*, unsigned long)+0x364) [0x56390938ba64]",
        "(RGWSelectObj_ObjStore_S3::json_processing(ceph::buffer::v15_2_0::list&, long, long)+0x6a5) [0x5639093903a5]",
        "(RGWRados::get_obj_iterate_cb(DoutPrefixProvider const*, rgw_raw_obj const&, long, long, long, bool, RGWObjState*, void*)+0x131) [0x5639094f1d01]",
        "/usr/bin/radosgw(+0xba8ed6) [0x5639094d1ed6]",
        "(RGWRados::iterate_obj(DoutPrefixProvider const*, RGWObjectCtx&, RGWBucketInfo&, rgw_obj const&, long, long, unsigned long, int (*)(DoutPrefixProvider const*, rgw_raw_obj const&, long, long, long, bool, RGWObjState*, void*), void*, optional_yield)+0x3b6) [0x563909514a36]",
        "(RGWRados::Object::Read::iterate(DoutPrefixProvider const*, long, long, RGWGetDataCB*, optional_yield)+0x138) [0x563909515298]",
        "(RGWGetObj::execute(optional_yield)+0x1122) [0x5639092b8582]",
        "(RGWSelectObj_ObjStore_S3::execute(optional_yield)+0xc1) [0x56390938e131]",
        "(rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, optional_yield, rgw::sal::Driver*, bool)+0xd91) [0x5639090a5171]",
        "(process_request(RGWProcessEnv const&, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWRestfulIO*, optional_yield, rgw::dmclock::Scheduler*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*, int*)+0x2b5c) [0x5639090a875c]",
        "/usr/bin/radosgw(+0x6b347e) [0x563908fdc47e]",
        "/usr/bin/radosgw(+0x6b4147) [0x563908fdd147]",
        "make_fcontext()"
    ]
}

Version-Release number of selected component (if applicable):
ceph version 18.0.0-5070-g01bc98b4 (01bc98b489ef938d10e187313be218ecd8a7ef33) reef (dev)

How reproducible:
18/18

Steps to Reproduce:
1. Deploy an upstream Ceph Reef cluster.
2. Upload a JSON object using the AWS CLI.
3. Query the JSON object with the AWS CLI using a "where" clause.

[cephuser@ceph-pri-hmaheswa-automation-0jcitv-node6 ~]$ venv/bin/aws s3api select-object-content --endpoint-url http://10.0.98.34:80 --bucket bkt1 --key small_json --expression-type 'SQL' --input-serialization '{"JSON": {"Type": "DOCUMENT"}, "CompressionType": "NONE"}' --output-serialization '{"JSON": {}}' --expression "select * from S3Object where employee.name=raju;" /dev/stdout

Could not connect to the endpoint URL: "http://10.0.98.34:80/bkt1/small_json?select&select-type=2"
[cephuser@ceph-pri-hmaheswa-automation-0jcitv-node6 ~]$
[cephuser@ceph-pri-hmaheswa-automation-0jcitv-node6 ~]$ cat small_json
{
"employee": {
"name": "raju",
"salary": 56000,
"married": true
}
}
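
For reference, the result the first query is presumably meant to produce can be checked locally, without involving RGW. This is a minimal Python sketch, assuming the intent of the "where" clause is to match the string value "raju" at employee.name; the helper names get_path and select_where are hypothetical and are not part of s3select.

```python
import json

# Contents of small_json as shown in this report.
SMALL_JSON = """
{
  "employee": {
    "name": "raju",
    "salary": 56000,
    "married": true
  }
}
"""


def get_path(doc, dotted_path):
    """Follow a dotted key path such as 'employee.name'; return None if a key is missing."""
    node = doc
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def select_where(doc, dotted_path, expected):
    """Hypothetical helper: return [doc] if the value at dotted_path equals expected, else []."""
    return [doc] if get_path(doc, dotted_path) == expected else []


doc = json.loads(SMALL_JSON)
matches = select_where(doc, "employee.name", "raju")
print(json.dumps(matches))
```

This only models the intended predicate on the document; it says nothing about which statement syntax the s3select engine accepts.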

[cephuser@ceph-pri-hmaheswa-automation-0jcitv-node6 ~]$ venv/bin/aws s3api select-object-content --endpoint-url http://10.0.98.34:80 --bucket bkt1 --key 200_mb_json --expression-type 'SQL' --input-serialization '{"JSON": {"Type": "DOCUMENT"}, "CompressionType": "NONE"}' --output-serialization '{"JSON": {}}' --expression "select * from S3Object[*] where tags=ring;" /dev/stdout

Could not connect to the endpoint URL: "http://10.0.98.34:80/bkt1/200_mb_json?select&select-type=2"

The 200_mb_json file was downloaded from here: https://www.kaggle.com/datasets/kristoft/pitt-quantum-repository-106066-molecules
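
To script repeated runs of the commands above, the argument list can be built programmatically. This is a sketch only: build_select_command is a hypothetical helper, and the flags, endpoint, bucket, and key are copied from the commands in this report. It only builds and prints the command; running it against an affected build is expected to crash the RGW daemon.

```python
import json
import shlex


def build_select_command(endpoint, bucket, key, expression, aws_bin="venv/bin/aws"):
    """Hypothetical helper: build the argv for the select-object-content call used in this report."""
    input_serialization = {"JSON": {"Type": "DOCUMENT"}, "CompressionType": "NONE"}
    output_serialization = {"JSON": {}}
    return [
        aws_bin, "s3api", "select-object-content",
        "--endpoint-url", endpoint,
        "--bucket", bucket,
        "--key", key,
        "--expression-type", "SQL",
        "--input-serialization", json.dumps(input_serialization),
        "--output-serialization", json.dumps(output_serialization),
        "--expression", expression,
        "/dev/stdout",
    ]


cmd = build_select_command(
    "http://10.0.98.34:80",
    "bkt1",
    "small_json",
    "select * from S3Object where employee.name=raju;",
)
# Print a shell-quoted form; the list itself can be passed to subprocess.run.
print(shlex.join(cmd))
```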


However, the query below, which uses S3Object[*], does not crash RGW on the small object; a query using S3Object[*] does crash it on the large object (see above):
[cephuser@ceph-pri-hmaheswa-automation-0jcitv-node6 ~]$ venv/bin/aws s3api select-object-content --endpoint-url http://10.0.98.34:80 --bucket bkt1 --key small_json --expression-type 'SQL' --input-serialization '{"JSON": {"Type": "DOCUMENT"}, "CompressionType": "NONE"}' --output-serialization '{"JSON": {}}' --expression "select * from S3Object[*] where employee.name=raju;"  /dev/stdout
1 : alias {name} or column not exist in schema
#=== 0 ===#
[cephuser@ceph-pri-hmaheswa-automation-0jcitv-node6 ~]$

Actual results:
The RGW daemon crashes when an s3select query with a "where" clause is run against a JSON object.

Expected results:
The query succeeds and RGW does not crash.

Additional info:
rgw logs and crash logs are present here: http://magna002.ceph.redhat.com/ceph-qe-logs/HemanthSai/s3select_json_logs/

rgw node: 10.0.98.34
creds: root/passwd ; cephuser/cephuser

An upstream tracker has also been raised for this: https://tracker.ceph.com/issues/62156

Comment 2 gal salomon 2023-07-25 22:00:12 UTC
the JSON query syntax is *wrong*.
 
Please review the s3select tests; they cover many use cases,
and for each one you can easily see the query, the input, and the output.
Below is a link to the tests in the s3select repo.
https://github.com/ceph/s3select/blob/45e29caeea37d6f34fad516cbe7fd8f8bd4d68a9/test/s3select_test.cpp#L2967

[ --output-serialization '{"JSON": {}}' ]
As for output-serialization, the engine supports only CSV as output.


The wrong input is effectively a negative test: it exposes a faulty code path for invalid input.
The crash happened before query processing.

Comment 4 gal salomon 2023-07-25 22:25:44 UTC
No, it should not crash.

It failed on a negative test, and there may be many such combinations.
I will generate more test cases of that type.

Comment 5 gal salomon 2023-07-26 07:18:36 UTC
The input-serialization (AWS CLI) marks the s3select request as a JSON statement,
while the statement itself does not follow the JSON statement syntax (the from clause).

This creates a conflict that is not handled correctly, and that leads to the crash.
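
The mismatch described here could, in principle, be caught by a pre-check that returns an error instead of reaching the abort. The following Python sketch is illustrative only: check_statement and the JSON_FROM pattern are hypothetical, the pattern is an assumption inferred from the queries in this report rather than taken from the s3select grammar, and this is not the change made in the actual fix.

```python
import re

# Assumption: a JSON-mode from-clause looks like "s3object[*]", optionally
# followed by a dotted key path. Inferred from the queries in this report,
# not from the s3select grammar.
JSON_FROM = re.compile(r"\bfrom\s+s3object\[\*\](\.\w+)*(\s|;|$)", re.IGNORECASE)


def check_statement(statement, input_is_json):
    """Hypothetical pre-check: return an error string on a mismatch, or None if acceptable."""
    if input_is_json and not JSON_FROM.search(statement):
        return "from-clause does not match the JSON form (expected s3object[*]...)"
    return None


statements = (
    "select * from S3Object where employee.name=raju;",
    "select * from S3Object[*] where employee.name=raju;",
)
for stmt in statements:
    # The first statement is reported as an error; the second passes the check.
    print(stmt, "->", check_statement(stmt, input_is_json=True))
```

The sketch covers only the from-clause conflict; it does not address the crash reported for the large object with an S3Object[*] query.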

Comment 6 gal salomon 2023-08-16 16:07:32 UTC
It is fixed in https://github.com/ceph/ceph/pull/52651

