Description of problem:
After upgrading from 8.0 to 8.1, RGW crashes when old versioned objects (uploaded before the upgrade) are downloaded with aws-cli/2.24.22 with its default checksum behavior enabled.

Log snippet:
[cephuser@ceph-pri-hsm-scale-63cc9z-node6 ~]$ /usr/local/bin/aws --endpoint-url http://10.0.67.106:80 s3 cp s3://versioned-bkt1/obj10MB obj10MB.download_post_upgrade2
download failed: s3://versioned-bkt1/obj10MB to ./obj10MB.download_post_upgrade2 Could not connect to the endpoint URL: "http://10.0.67.106:80/versioned-bkt1/obj10MB"
[cephuser@ceph-pri-hsm-scale-63cc9z-node6 ~]$

RGW crash snippet:
2025-03-25T18:31:50.068+0000 7fc4c15e5640 20 req 8669017663611319326 0.003000053s s3:get_obj RGWObjManifest::operator++(): result: ofs=8388608 stripe_ofs=8388608 part_ofs=8388608 rule->part_size=1611392
2025-03-25T18:31:50.073+0000 7fc492d88640 -1 *** Caught signal (Aborted) **
 in thread 7fc492d88640 thread_name:io_context_pool

 ceph version 19.2.1-61.el9cp (df255b5e3adc837e3ed676ab42debe258fcec2ae) squid (stable)
 1: /lib64/libc.so.6(+0x3e730) [0x7fc532a4c730]
 2: /lib64/libc.so.6(+0x8b52c) [0x7fc532a9952c]
 3: raise()
 4: abort()
 5: /usr/bin/radosgw(+0x4214f8) [0x56084f7174f8]
 6: /usr/bin/radosgw(+0x68d88f) [0x56084f98388f]
 7: (RGWGetObj_ObjStore_S3::send_response_data(ceph::buffer::v15_2_0::list&, long, long)+0x9ba) [0x56084fa1806a]
 8: (get_obj_data::flush(rgw::OwningList<rgw::AioResultEntry>&&)+0x788) [0x56084fba12d8]
 9: (RGWRados::Object::Read::iterate(DoutPrefixProvider const*, long, long, RGWGetDataCB*, optional_yield)+0x2fb) [0x56084fba489b]
 10: (RGWGetObj::execute(optional_yield)+0x1228) [0x56084f99ea18]
 11: (rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, optional_yield, rgw::sal::Driver*, bool)+0xa70) [0x56084f841400]
 12: (process_request(RGWProcessEnv const&, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWRestfulIO*, optional_yield, rgw::dmclock::Scheduler*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*, int*)+0xfdd) [0x56084f842bed]
 13: /usr/bin/radosgw(+0xe9b664) [0x560850191664]
 14: /usr/bin/radosgw(+0x49b0e6) [0x56084f7910e6]
 15: /usr/bin/radosgw(+0x47e474) [0x56084f774474]
 16: make_fcontext()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---

get-object-attributes on an old object in a non-versioned bucket also crashes RGW:

Log snippet:
[cephuser@ceph-pri-hsm-scale-63cc9z-node6 ~]$ /usr/local/bin/aws --endpoint-url http://10.0.67.106:80 s3api get-object-attributes --bucket bkt1 --key obj1_awscli_v2 --object-attributes ObjectSize ObjectParts checksum etag StorageClass
Could not connect to the endpoint URL: "http://10.0.67.106:80/bkt1/obj1_awscli_v2?attributes"
[cephuser@ceph-pri-hsm-scale-63cc9z-node6 ~]$

Crash snippet:
2025-03-25T19:39:27.430+0000 7f3300cf0640 20 req 14339725936029400992 0.002000036s s3:get_obj_attrs Read xattr rgw_rados: user.rgw.trace
2025-03-25T19:39:27.430+0000 7f3300cf0640 10 req 14339725936029400992 0.002000036s cache get: name=primary.rgw.log++script.getdata. : hit (negative entry)
2025-03-25T19:39:27.430+0000 7f3300cf0640 15 req 14339725936029400992 0.002000036s Encryption mode:
2025-03-25T19:39:27.430+0000 7f3300cf0640 2 req 14339725936029400992 0.002000036s s3:get_obj_attrs completing
2025-03-25T19:39:27.433+0000 7f3300cf0640 -1 *** Caught signal (Aborted) **
 in thread 7f3300cf0640 thread_name:io_context_pool

 ceph version 19.2.1-61.el9cp (df255b5e3adc837e3ed676ab42debe258fcec2ae) squid (stable)
 1: /lib64/libc.so.6(+0x3e730) [0x7f33320d7730]
 2: /lib64/libc.so.6(+0x8b52c) [0x7f333212452c]
 3: raise()
 4: abort()
 5: /usr/bin/radosgw(+0x4214f8) [0x55a1f16734f8]
 6: /usr/bin/radosgw(+0x68d88f) [0x55a1f18df88f]
 7: (RGWGetObjAttrs_ObjStore_S3::send_response()+0x243) [0x55a1f1998bf3]
 8: (rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, optional_yield, rgw::sal::Driver*, bool)+0xb90) [0x55a1f179d520]
 9: (process_request(RGWProcessEnv const&, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWRestfulIO*, optional_yield, rgw::dmclock::Scheduler*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*, int*)+0xfdd) [0x55a1f179ebed]
 10: /usr/bin/radosgw(+0xe9b664) [0x55a1f20ed664]
 11: /usr/bin/radosgw(+0x49b0e6) [0x55a1f16ed0e6]
 12: /usr/bin/radosgw(+0x47e474) [0x55a1f16d0474]
 13: make_fcontext()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---

Version-Release number of selected component (if applicable):
ceph version 19.2.1-61.el9cp

How reproducible:
Always

Steps to Reproduce:
1. Create a Ceph cluster on 8.0 (19.2.0-120.el9cp).
2. Create a bucket, enable versioning on the bucket, and upload a few objects.
   Objects can be downloaded successfully before the upgrade:
   [cephuser@ceph-pri-hsm-scale-63cc9z-node6 ~]$ /usr/local/bin/aws --endpoint-url http://10.0.67.106:80 s3 mb s3://versioned-bkt1
   make_bucket: versioned-bkt1
   [cephuser@ceph-pri-hsm-scale-63cc9z-node6 ~]$ /usr/local/bin/aws --endpoint-url http://10.0.67.106:80 s3 ls
   2025-03-13 15:09:17 bkt1
   2025-03-25 18:04:25 versioned-bkt1
   [cephuser@ceph-pri-hsm-scale-63cc9z-node6 ~]$ /usr/local/bin/aws --endpoint-url http://10.0.67.106:80 s3api put-bucket-versioning --bucket versioned-bkt1 --versioning-configuration Status=Enabled
   [cephuser@ceph-pri-hsm-scale-63cc9z-node6 ~]$ /usr/local/bin/aws --endpoint-url http://10.0.67.106:80 s3 cp obj10MB s3://versioned-bkt1
   upload: ./obj10MB to s3://versioned-bkt1/obj10MB
   [cephuser@ceph-pri-hsm-scale-63cc9z-node6 ~]$ /usr/local/bin/aws --endpoint-url http://10.0.67.106:80 s3 cp s3://versioned-bkt1/obj10MB obj10MB.download_pre_upgrade
   download: s3://versioned-bkt1/obj10MB to ./obj10MB.download_pre_upgrade
   [cephuser@ceph-pri-hsm-scale-63cc9z-node6 ~]$
3. Upgrade the cluster to 8.1, then try to download the old versioned objects with aws-cli/2.24.22 (default checksum behavior enabled). RGW crashes.
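The pre- and post-upgrade steps above can be wrapped in a small POSIX shell sketch so the reproduction is repeatable. The endpoint, bucket, object and key names are copied from this report; the function names, the `DRY_RUN` switch and the overridable variables are hypothetical conveniences, not part of any Ceph or AWS tooling. The sketch only issues the aws-cli commands already shown above.

```shell
#!/bin/sh
# Hypothetical reproduction helper for this report. Defaults come from the
# report; override ENDPOINT, BUCKET, OBJ, ATTR_BUCKET, ATTR_KEY or AWS as needed.
ENDPOINT="${ENDPOINT:-http://10.0.67.106:80}"
BUCKET="${BUCKET:-versioned-bkt1}"
OBJ="${OBJ:-obj10MB}"
ATTR_BUCKET="${ATTR_BUCKET:-bkt1}"
ATTR_KEY="${ATTR_KEY:-obj1_awscli_v2}"
AWS="${AWS:-/usr/local/bin/aws}"

# Run a command, or only print it when DRY_RUN=1 (assumed convenience switch).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 2: run against the 8.0 cluster, before the upgrade.
pre_upgrade() {
  run "$AWS" --endpoint-url "$ENDPOINT" s3 mb "s3://$BUCKET"
  run "$AWS" --endpoint-url "$ENDPOINT" s3api put-bucket-versioning \
    --bucket "$BUCKET" --versioning-configuration Status=Enabled
  run "$AWS" --endpoint-url "$ENDPOINT" s3 cp "$OBJ" "s3://$BUCKET"
  run "$AWS" --endpoint-url "$ENDPOINT" s3 cp "s3://$BUCKET/$OBJ" "$OBJ.download_pre_upgrade"
}

# Step 3: run after the upgrade to 8.1; in the report, RGW aborts on this GET.
post_upgrade() {
  run "$AWS" --endpoint-url "$ENDPOINT" s3 cp "s3://$BUCKET/$OBJ" "$OBJ.download_post_upgrade"
}

# Second crash path from the report: get-object-attributes on an old object
# in a non-versioned bucket (attribute list copied verbatim from the report).
post_upgrade_attrs() {
  run "$AWS" --endpoint-url "$ENDPOINT" s3api get-object-attributes \
    --bucket "$ATTR_BUCKET" --key "$ATTR_KEY" \
    --object-attributes ObjectSize ObjectParts checksum etag StorageClass
}
```

For example, after sourcing the file (saved as, say, `repro.sh`), `DRY_RUN=1 pre_upgrade` prints the commands without contacting the endpoint. To reproduce, run `pre_upgrade` on 8.0, upgrade to 8.1, then run `post_upgrade` and `post_upgrade_attrs`.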
Actual results:
RGW crashes when old versioned objects are downloaded after the upgrade using aws-cli/2.24.22.

Expected results:
RGW should not crash on object download or get-object-attributes of old objects after the upgrade to 8.1.

Additional info:
RGW logs and the coredump are available here:
http://magna002.ceph.redhat.com/cephci-jenkins/hsm/rgw_crash_with_get_object_post_upgrade/ceph-client.rgw.shared.pri.ceph-pri-hsm-scale-63cc9z-node5.fhwbev.log
http://magna002.ceph.redhat.com/cephci-jenkins/hsm/rgw_crash_with_get_object_post_upgrade/rgw_coredump1_get_object_crash

Test environment details:
pri site:
[cephuser@ceph-pri-hsm-scale-63cc9z-node6 ~]$ ceph orch host ls
HOST                                       ADDR         LABELS                    STATUS
ceph-pri-hsm-scale-63cc9z-node1-installer  10.0.66.218  _admin,mon,mgr,installer
ceph-pri-hsm-scale-63cc9z-node2            10.0.65.55   osd,mgr
ceph-pri-hsm-scale-63cc9z-node3            10.0.64.3    osd,mon
ceph-pri-hsm-scale-63cc9z-node4            10.0.65.231  osd,mon
ceph-pri-hsm-scale-63cc9z-node5            10.0.67.106  rgw,osd
5 hosts in cluster
[cephuser@ceph-pri-hsm-scale-63cc9z-node6 ~]$
pri site client: 10.0.64.233

sec site:
[cephuser@ceph-sec-hsm-scale-63cc9z-node6 ~]$ ceph orch host ls
HOST                                       ADDR         LABELS                    STATUS
ceph-sec-hsm-scale-63cc9z-node1-installer  10.0.65.250  _admin,mon,mgr,installer
ceph-sec-hsm-scale-63cc9z-node2            10.0.67.18   osd,mgr
ceph-sec-hsm-scale-63cc9z-node3            10.0.65.112  osd,mon
ceph-sec-hsm-scale-63cc9z-node4            10.0.65.162  osd,mon
ceph-sec-hsm-scale-63cc9z-node5            10.0.64.20   rgw,osd
5 hosts in cluster
[cephuser@ceph-sec-hsm-scale-63cc9z-node6 ~]$

creds: root/passwd, cephuser/cephuser
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat Ceph Storage 8.1 security, bug fix, and enhancement updates), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2025:9775