Bug 2354911

Summary: [8.1][rgw]: post upgrade from 8.0 to 8.1, rgw crashing with old versioned objects download using aws-cli/2.24.22
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Hemanth Sai <hmaheswa>
Component: RGW
Assignee: Matt Benjamin (redhat) <mbenjamin>
Status: CLOSED ERRATA
QA Contact: Hemanth Sai <hmaheswa>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 8.1
CC: ceph-eng-bugs, cephqe-warriors, tserlin
Target Milestone: ---
Keywords: Regression
Target Release: 8.1
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ceph-19.2.1-99.el9cp
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2025-06-26 12:29:51 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Hemanth Sai 2025-03-25 19:47:47 UTC
Description of problem:
After upgrading from 8.0 to 8.1, RGW crashes when old versioned objects are downloaded using aws-cli/2.24.22 with its default checksum validation enabled.



log snippet:

[cephuser@ceph-pri-hsm-scale-63cc9z-node6 ~]$ /usr/local/bin/aws --endpoint-url http://10.0.67.106:80 s3 cp s3://versioned-bkt1/obj10MB obj10MB.download_post_upgrade2
download failed: s3://versioned-bkt1/obj10MB to ./obj10MB.download_post_upgrade2 Could not connect to the endpoint URL: "http://10.0.67.106:80/versioned-bkt1/obj10MB"
[cephuser@ceph-pri-hsm-scale-63cc9z-node6 ~]$ 
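aws-cli v2.23+ turns on CRC-based request/response checksum handling by default, which appears to be the client-side trigger here. As a possible workaround while the server-side fix is pending (an assumption based on the aws-cli version noted above, not verified against this cluster), the checksum defaults can be relaxed in the client config:

```
# ~/.aws/config -- possible client-side workaround (assumption: the crash is
# triggered by the checksum defaults introduced in aws-cli 2.23+)
[default]
# only compute request checksums when the operation requires them
request_checksum_calculation = when_required
# only validate response checksums when explicitly requested
response_checksum_validation = when_required
```

Note that this would only avoid the trigger on the client; the server-side crash is the actual bug.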



rgw crash snippet:

2025-03-25T18:31:50.068+0000 7fc4c15e5640 20 req 8669017663611319326 0.003000053s s3:get_obj RGWObjManifest::operator++(): result: ofs=8388608 stripe_ofs=8388608 part_ofs=8388608 rule->part_size=1611392
2025-03-25T18:31:50.073+0000 7fc492d88640 -1 *** Caught signal (Aborted) **
 in thread 7fc492d88640 thread_name:io_context_pool

 ceph version 19.2.1-61.el9cp (df255b5e3adc837e3ed676ab42debe258fcec2ae) squid (stable)
 1: /lib64/libc.so.6(+0x3e730) [0x7fc532a4c730]
 2: /lib64/libc.so.6(+0x8b52c) [0x7fc532a9952c]
 3: raise()
 4: abort()
 5: /usr/bin/radosgw(+0x4214f8) [0x56084f7174f8]
 6: /usr/bin/radosgw(+0x68d88f) [0x56084f98388f]
 7: (RGWGetObj_ObjStore_S3::send_response_data(ceph::buffer::v15_2_0::list&, long, long)+0x9ba) [0x56084fa1806a]
 8: (get_obj_data::flush(rgw::OwningList<rgw::AioResultEntry>&&)+0x788) [0x56084fba12d8]
 9: (RGWRados::Object::Read::iterate(DoutPrefixProvider const*, long, long, RGWGetDataCB*, optional_yield)+0x2fb) [0x56084fba489b]
 10: (RGWGetObj::execute(optional_yield)+0x1228) [0x56084f99ea18]
 11: (rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, optional_yield, rgw::sal::Driver*, bool)+0xa70) [0x56084f841400]
 12: (process_request(RGWProcessEnv const&, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWRestfulIO*, optional_yield, rgw::dmclock::Scheduler*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*, int*)+0xfdd) [0x56084f842bed]
 13: /usr/bin/radosgw(+0xe9b664) [0x560850191664]
 14: /usr/bin/radosgw(+0x49b0e6) [0x56084f7910e6]
 15: /usr/bin/radosgw(+0x47e474) [0x56084f774474]
 16: make_fcontext()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---




A get-object-attributes call on an old object in a non-versioned bucket also causes an RGW crash:


log snippet:

[cephuser@ceph-pri-hsm-scale-63cc9z-node6 ~]$ /usr/local/bin/aws --endpoint-url http://10.0.67.106:80 s3api get-object-attributes --bucket bkt1 --key obj1_awscli_v2 --object-attributes ObjectSize ObjectParts checksum etag StorageClass

Could not connect to the endpoint URL: "http://10.0.67.106:80/bkt1/obj1_awscli_v2?attributes"
[cephuser@ceph-pri-hsm-scale-63cc9z-node6 ~]$ 


crash snippet:

2025-03-25T19:39:27.430+0000 7f3300cf0640 20 req 14339725936029400992 0.002000036s s3:get_obj_attrs Read xattr rgw_rados: user.rgw.trace
2025-03-25T19:39:27.430+0000 7f3300cf0640 10 req 14339725936029400992 0.002000036s cache get: name=primary.rgw.log++script.getdata. : hit (negative entry)
2025-03-25T19:39:27.430+0000 7f3300cf0640 15 req 14339725936029400992 0.002000036s Encryption mode:
2025-03-25T19:39:27.430+0000 7f3300cf0640  2 req 14339725936029400992 0.002000036s s3:get_obj_attrs completing
2025-03-25T19:39:27.433+0000 7f3300cf0640 -1 *** Caught signal (Aborted) **
 in thread 7f3300cf0640 thread_name:io_context_pool

 ceph version 19.2.1-61.el9cp (df255b5e3adc837e3ed676ab42debe258fcec2ae) squid (stable)
 1: /lib64/libc.so.6(+0x3e730) [0x7f33320d7730]
 2: /lib64/libc.so.6(+0x8b52c) [0x7f333212452c]
 3: raise()
 4: abort()
 5: /usr/bin/radosgw(+0x4214f8) [0x55a1f16734f8]
 6: /usr/bin/radosgw(+0x68d88f) [0x55a1f18df88f]
 7: (RGWGetObjAttrs_ObjStore_S3::send_response()+0x243) [0x55a1f1998bf3]
 8: (rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, optional_yield, rgw::sal::Driver*, bool)+0xb90) [0x55a1f179d520]
 9: (process_request(RGWProcessEnv const&, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWRestfulIO*, optional_yield, rgw::dmclock::Scheduler*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*, int*)+0xfdd) [0x55a1f179ebed]
 10: /usr/bin/radosgw(+0xe9b664) [0x55a1f20ed664]
 11: /usr/bin/radosgw(+0x49b0e6) [0x55a1f16ed0e6]
 12: /usr/bin/radosgw(+0x47e474) [0x55a1f16d0474]
 13: make_fcontext()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---



Version-Release number of selected component (if applicable):
ceph version 19.2.1-61.el9cp

How reproducible:
always

Steps to Reproduce:
1. Create a Ceph cluster on 8.0 (19.2.0-120.el9cp).
2. Create a bucket, enable versioning on it, and upload a few objects. Objects can be downloaded successfully before the upgrade:
[cephuser@ceph-pri-hsm-scale-63cc9z-node6 ~]$ /usr/local/bin/aws --endpoint-url http://10.0.67.106:80 s3 mb s3://versioned-bkt1
make_bucket: versioned-bkt1
[cephuser@ceph-pri-hsm-scale-63cc9z-node6 ~]$ 
[cephuser@ceph-pri-hsm-scale-63cc9z-node6 ~]$ /usr/local/bin/aws --endpoint-url http://10.0.67.106:80 s3 ls
2025-03-13 15:09:17 bkt1
2025-03-25 18:04:25 versioned-bkt1
[cephuser@ceph-pri-hsm-scale-63cc9z-node6 ~]$ 
[cephuser@ceph-pri-hsm-scale-63cc9z-node6 ~]$ /usr/local/bin/aws --endpoint-url http://10.0.67.106:80 s3api put-bucket-versioning --bucket versioned-bkt1 --versioning-configuration Status=Enabled
[cephuser@ceph-pri-hsm-scale-63cc9z-node6 ~]$ 
[cephuser@ceph-pri-hsm-scale-63cc9z-node6 ~]$ /usr/local/bin/aws --endpoint-url http://10.0.67.106:80 s3 cp obj10MB s3://versioned-bkt1
upload: ./obj10MB to s3://versioned-bkt1/obj10MB                 
[cephuser@ceph-pri-hsm-scale-63cc9z-node6 ~]$ 
[cephuser@ceph-pri-hsm-scale-63cc9z-node6 ~]$ /usr/local/bin/aws --endpoint-url http://10.0.67.106:80 s3 cp s3://versioned-bkt1/obj10MB obj10MB.download_pre_upgrade
download: s3://versioned-bkt1/obj10MB to ./obj10MB.download_pre_upgrade
[cephuser@ceph-pri-hsm-scale-63cc9z-node6 ~]$ 

3. Upgrade the cluster to 8.1, then try to download the old versioned objects with aws-cli/2.24.22 (default checksum validation enabled). RGW crashes.

Actual results:
RGW crashes when old versioned objects are downloaded post upgrade using aws-cli/2.24.22.

Expected results:
RGW should not crash on object download or get-object-attributes of old objects post upgrade to 8.1.

Additional info:

rgw logs and coredump are present here:
http://magna002.ceph.redhat.com/cephci-jenkins/hsm/rgw_crash_with_get_object_post_upgrade/ceph-client.rgw.shared.pri.ceph-pri-hsm-scale-63cc9z-node5.fhwbev.log

http://magna002.ceph.redhat.com/cephci-jenkins/hsm/rgw_crash_with_get_object_post_upgrade/rgw_coredump1_get_object_crash



test environment details:

pri site:

[cephuser@ceph-pri-hsm-scale-63cc9z-node6 ~]$ ceph orch host ls
HOST                                       ADDR         LABELS                    STATUS  
ceph-pri-hsm-scale-63cc9z-node1-installer  10.0.66.218  _admin,mon,mgr,installer          
ceph-pri-hsm-scale-63cc9z-node2            10.0.65.55   osd,mgr                           
ceph-pri-hsm-scale-63cc9z-node3            10.0.64.3    osd,mon                           
ceph-pri-hsm-scale-63cc9z-node4            10.0.65.231  osd,mon                           
ceph-pri-hsm-scale-63cc9z-node5            10.0.67.106  rgw,osd                           
5 hosts in cluster
[cephuser@ceph-pri-hsm-scale-63cc9z-node6 ~]$ 

pri site client: 10.0.64.233

sec site:

[cephuser@ceph-sec-hsm-scale-63cc9z-node6 ~]$ ceph orch host ls
HOST                                       ADDR         LABELS                    STATUS  
ceph-sec-hsm-scale-63cc9z-node1-installer  10.0.65.250  _admin,mon,mgr,installer          
ceph-sec-hsm-scale-63cc9z-node2            10.0.67.18   osd,mgr                           
ceph-sec-hsm-scale-63cc9z-node3            10.0.65.112  osd,mon                           
ceph-sec-hsm-scale-63cc9z-node4            10.0.65.162  osd,mon                           
ceph-sec-hsm-scale-63cc9z-node5            10.0.64.20   rgw,osd                           
5 hosts in cluster
[cephuser@ceph-sec-hsm-scale-63cc9z-node6 ~]$


creds: root/passwd, cephuser/cephuser

Comment 6 errata-xmlrpc 2025-06-26 12:29:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage 8.1 security, bug fix, and enhancement updates), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2025:9775