Bug 2225434

Summary: rgw crashes seen for s3select json query with "where" clause
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Hemanth Sai <hmaheswa>
Component: RGW
Assignee: gal salomon <gsalomon>
Status: ASSIGNED
QA Contact: Hemanth Sai <hmaheswa>
Severity: high
Priority: unspecified
Version: 7.0
CC: ceph-eng-bugs, cephqe-warriors, gsalomon, mbenjamin, tserlin
Target Milestone: ---
Flags: mbenjamin: needinfo? (tserlin)
Target Release: 7.0
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Type: Bug

Description Hemanth Sai 2023-07-25 10:05:23 UTC
Description of problem:
rgw crashes seen for s3select json query with "where" clause

crash info:
{
    "crash_id": "2023-07-25T08:50:29.789218Z_82a457cb-e975-4636-8804-7bfe1bbd0e08",
    "timestamp": "2023-07-25T08:50:29.789218Z",
    "process_name": "radosgw",
    "entity_name": "client.rgw.shared.pri.ceph-pri-hmaheswa-automation-0jcitv-node5.vnvrex",
    "ceph_version": "18.0.0-5070-g01bc98b4",
    "utsname_hostname": "ceph-pri-hmaheswa-automation-0jcitv-node5",
    "utsname_sysname": "Linux",
    "utsname_release": "5.14.0-284.18.1.el9_2.x86_64",
    "utsname_version": "#1 SMP PREEMPT_DYNAMIC Wed May 31 10:39:18 EDT 2023",
    "utsname_machine": "x86_64",
    "os_name": "CentOS Stream",
    "os_id": "centos",
    "os_version_id": "8",
    "os_version": "8",
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12cf0) [0x7fb199b20cf0]",
        "gsignal()",
        "abort()",
        "/usr/bin/radosgw(+0x673c58) [0x563908f9cc58]",
        "(s3selectEngine::json_object::init_json_processor(s3selectEngine::s3select*)+0x78f) [0x5639093b4a8f]",
        "(RGWSelectObj_ObjStore_S3::run_s3select_on_json(char const*, char const*, unsigned long)+0x364) [0x56390938ba64]",
        "(RGWSelectObj_ObjStore_S3::json_processing(ceph::buffer::v15_2_0::list&, long, long)+0x6a5) [0x5639093903a5]",
        "(RGWRados::get_obj_iterate_cb(DoutPrefixProvider const*, rgw_raw_obj const&, long, long, long, bool, RGWObjState*, void*)+0x131) [0x5639094f1d01]",
        "/usr/bin/radosgw(+0xba8ed6) [0x5639094d1ed6]",
        "(RGWRados::iterate_obj(DoutPrefixProvider const*, RGWObjectCtx&, RGWBucketInfo&, rgw_obj const&, long, long, unsigned long, int (*)(DoutPrefixProvider const*, rgw_raw_obj const&, long, long, long, bool, RGWObjState*, void*), void*, optional_yield)+0x3b6) [0x563909514a36]",
        "(RGWRados::Object::Read::iterate(DoutPrefixProvider const*, long, long, RGWGetDataCB*, optional_yield)+0x138) [0x563909515298]",
        "(RGWGetObj::execute(optional_yield)+0x1122) [0x5639092b8582]",
        "(RGWSelectObj_ObjStore_S3::execute(optional_yield)+0xc1) [0x56390938e131]",
        "(rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, optional_yield, rgw::sal::Driver*, bool)+0xd91) [0x5639090a5171]",
        "(process_request(RGWProcessEnv const&, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWRestfulIO*, optional_yield, rgw::dmclock::Scheduler*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*, int*)+0x2b5c) [0x5639090a875c]",
        "/usr/bin/radosgw(+0x6b347e) [0x563908fdc47e]",
        "/usr/bin/radosgw(+0x6b4147) [0x563908fdd147]",
        "make_fcontext()"
    ]
}
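
To pick the symbolized frames out of a crash report like the one above, a minimal sketch follows. It works on an abbreviated copy of the report (only the first six backtrace frames are kept), and the helper name `symbolized_frames` and the frame regex are illustrative assumptions, not part of any Ceph tooling.

```python
import json
import re

# Abbreviated copy of the crash report above (only the first six frames kept).
CRASH_JSON = r'''
{
    "process_name": "radosgw",
    "ceph_version": "18.0.0-5070-g01bc98b4",
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12cf0) [0x7fb199b20cf0]",
        "gsignal()",
        "abort()",
        "/usr/bin/radosgw(+0x673c58) [0x563908f9cc58]",
        "(s3selectEngine::json_object::init_json_processor(s3selectEngine::s3select*)+0x78f) [0x5639093b4a8f]",
        "(RGWSelectObj_ObjStore_S3::run_s3select_on_json(char const*, char const*, unsigned long)+0x364) [0x56390938ba64]"
    ]
}
'''

# Assumed frame shape: "(symbol(args)+0xOFFSET) [0xADDRESS]".
# Frames that only name a binary or a bare function do not match.
FRAME_RE = re.compile(r"^\((?P<symbol>.+)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]$")


def symbolized_frames(crash):
    """Return the symbol part of every symbolized backtrace frame, in order."""
    frames = []
    for line in crash["backtrace"]:
        match = FRAME_RE.match(line)
        if match:
            frames.append(match.group("symbol"))
    return frames


crash = json.loads(CRASH_JSON)
frames = symbolized_frames(crash)

# First frame that belongs to the s3select engine, i.e. the innermost one.
first_s3select = next(f for f in frames if f.startswith("s3selectEngine::"))
print(first_s3select)
```

On this input the innermost s3select frame is `init_json_processor`, which is the last symbolized call before `abort()` in the backtrace.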

Version-Release number of selected component (if applicable):
ceph version 18.0.0-5070-g01bc98b4 (01bc98b489ef938d10e187313be218ecd8a7ef33) reef (dev)

How reproducible:
18/18

Steps to Reproduce:
1. Deploy a Ceph upstream Reef cluster.
2. Upload a JSON object using aws-cli.
3. Query the JSON object using aws-cli with a "where" clause.

[cephuser@ceph-pri-hmaheswa-automation-0jcitv-node6 ~]$ venv/bin/aws s3api select-object-content --endpoint-url http://10.0.98.34:80 --bucket bkt1 --key small_json --expression-type 'SQL' --input-serialization '{"JSON": {"Type": "DOCUMENT"}, "CompressionType": "NONE"}' --output-serialization '{"JSON": {}}' --expression "select * from S3Object where employee.name=raju;" /dev/stdout

Could not connect to the endpoint URL: "http://10.0.98.34:80/bkt1/small_json?select&select-type=2"
[cephuser@ceph-pri-hmaheswa-automation-0jcitv-node6 ~]$
[cephuser@ceph-pri-hmaheswa-automation-0jcitv-node6 ~]$ cat small_json
{
"employee": {
"name": "raju",
"salary": 56000,
"married": true
}
}
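
For repeated runs, the first failing command above can be assembled from a script. This is only a sketch: the function name `build_select_command` is hypothetical, the endpoint, bucket, key and expression are copied from this report, and the script prints the command rather than sending it.

```python
import json
import shlex


def build_select_command(endpoint, bucket, key, expression,
                         aws_bin="venv/bin/aws"):
    """Build the aws-cli argv for select-object-content as used in this report."""
    # Same serialization settings as the commands in this report.
    input_serialization = {"JSON": {"Type": "DOCUMENT"}, "CompressionType": "NONE"}
    output_serialization = {"JSON": {}}
    return [
        aws_bin, "s3api", "select-object-content",
        "--endpoint-url", endpoint,
        "--bucket", bucket,
        "--key", key,
        "--expression-type", "SQL",
        "--input-serialization", json.dumps(input_serialization),
        "--output-serialization", json.dumps(output_serialization),
        "--expression", expression,
        "/dev/stdout",
    ]


argv = build_select_command(
    "http://10.0.98.34:80", "bkt1", "small_json",
    "select * from S3Object where employee.name=raju;",
)

# Print a shell-quoted command line for copy and paste.
print(shlex.join(argv))
# To actually send the request, pass argv to subprocess.run(argv, check=False).
```

Passing the arguments as a list avoids shell quoting problems with the embedded JSON and the SQL expression.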

[cephuser@ceph-pri-hmaheswa-automation-0jcitv-node6 ~]$ venv/bin/aws s3api select-object-content --endpoint-url http://10.0.98.34:80 --bucket bkt1 --key 200_mb_json --expression-type 'SQL' --input-serialization '{"JSON": {"Type": "DOCUMENT"}, "CompressionType": "NONE"}' --output-serialization '{"JSON": {}}' --expression "select * from S3Object[*] where tags=ring;" /dev/stdout

Could not connect to the endpoint URL: "http://10.0.98.34:80/bkt1/200_mb_json?select&select-type=2"

json file is downloaded from here: https://www.kaggle.com/datasets/kristoft/pitt-quantum-repository-106066-molecules


However, with S3Object[*] in the from clause, the query below does not crash on the small object, whereas the S3Object[*] query on the large object (above) does crash:
[cephuser@ceph-pri-hmaheswa-automation-0jcitv-node6 ~]$ venv/bin/aws s3api select-object-content --endpoint-url http://10.0.98.34:80 --bucket bkt1 --key small_json --expression-type 'SQL' --input-serialization '{"JSON": {"Type": "DOCUMENT"}, "CompressionType": "NONE"}' --output-serialization '{"JSON": {}}' --expression "select * from S3Object[*] where employee.name=raju;"  /dev/stdout
1 : alias {name} or column not exist in schema
#=== 0 ===#
[cephuser@ceph-pri-hmaheswa-automation-0jcitv-node6 ~]$

Actual results:
The rgw daemon crashes on an s3select query against a JSON object with a "where" clause.

Expected results:
The query completes without crashing rgw.

Additional info:
rgw logs and crash logs are present here: http://magna002.ceph.redhat.com/ceph-qe-logs/HemanthSai/s3select_json_logs/

rgw node: 10.0.98.34
creds: root/passwd ; cephuser/cephuser

also raised upstream tracker for this: https://tracker.ceph.com/issues/62156

Comment 2 gal salomon 2023-07-25 22:00:12 UTC
the JSON query syntax is *wrong*.
 
please review the s3select tests; they provide many use cases.
you can easily observe the query, input, and output.
below is a link to the tests in the s3select repo.
https://github.com/ceph/s3select/blob/45e29caeea37d6f34fad516cbe7fd8f8bd4d68a9/test/s3select_test.cpp#L2967

[ --output-serialization '{"JSON": {}}' ]
as for output-serialization, the engine supports only CSV as output.


the wrong input effectively acts as a negative test: it exposes a faulty code path for invalid input.
the crash happens before query processing starts.

Comment 4 gal salomon 2023-07-25 22:25:44 UTC
no, it should not crash.

it failed on a negative test, and there can be many such combinations.
i will generate more test cases of that type.

Comment 5 gal salomon 2023-07-26 07:18:36 UTC
the input-serialization (aws-cli) marks the s3select request as a JSON statement,
while
the statement itself does not follow the JSON statement syntax (the from clause).

this creates a conflict that is not handled correctly, and that leads to the crash.
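
The mismatch described in this comment can be illustrated with a small client-side pre-check. This is a sketch under a loud assumption: it treats `from s3object[*]` as the marker of a JSON-style from clause, based only on the queries quoted in this report, and it does not reproduce the engine's real grammar or the fix in rgw. The names `JSON_FROM_RE`, `from_clause_matches_json` and `check_request` are hypothetical.

```python
import re

# Assumption: a JSON-mode statement uses "from s3object[*]" (case-insensitive).
# This mirrors the queries quoted in this report, not the engine's grammar.
JSON_FROM_RE = re.compile(r"\bfrom\s+s3object\[\*\]", re.IGNORECASE)


def from_clause_matches_json(expression):
    """True if the statement contains a JSON-style from clause."""
    return JSON_FROM_RE.search(expression) is not None


def check_request(input_serialization, expression):
    """Return None if the request looks consistent, else a short message."""
    if "JSON" in input_serialization and not from_clause_matches_json(expression):
        return ("input-serialization is JSON, but the from clause "
                "is not a JSON-style from clause")
    return None


json_input = {"JSON": {"Type": "DOCUMENT"}, "CompressionType": "NONE"}

# The first query from the report: JSON input, plain "from S3Object".
print(check_request(json_input, "select * from S3Object where employee.name=raju;"))

# The query that returned an error instead of crashing on the small object.
print(check_request(json_input, "select * from S3Object[*] where employee.name=raju;"))
```

The check flags only the from-clause conflict; it would not have flagged the S3Object[*] query against 200_mb_json, which the report also lists as crashing.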

Comment 6 gal salomon 2023-08-16 16:07:32 UTC
it is fixed in https://github.com/ceph/ceph/pull/52651