Bug 2321269

Summary: [8.0][multipart-uploads]: RGW crashes in thread "io_context_pool" during s3 copy object between storage classes
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Vidushi Mishra <vimishra>
Component: RGW
Assignee: Soumya Koduri <skoduri>
Status: CLOSED ERRATA
QA Contact: Hemanth Sai <hmaheswa>
Severity: urgent
Priority: unspecified
Version: 8.0
CC: ceph-eng-bugs, cephqe-warriors, hmaheswa, mwatts, tserlin, vereddy
Target Release: 8.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: ceph-19.2.0-50.el9cp
Doc Type: If docs needed, set a value
Type: Bug
Last Closed: 2024-11-25 09:13:32 UTC

Description Vidushi Mishra 2024-10-23 11:55:34 UTC
Description of problem:

RGW crashes in the "io_context_pool" thread when an S3 copy of a multipart object is performed between storage classes.

========== Snippet of the backtrace ==========

2024-10-23T11:45:36.000+0000 7fca57eb0640  5 req 10767782735017001868 0.003000282s s3:copy_obj Copy object :src1[b3b88145-0183-4daf-9457-a4454d134fe7.25219.1]):object20m => :dest1[b3b88145-0183-4daf-9457-a4454d134fe7.25219.2]):object20m-sc
2024-10-23T11:45:36.092+0000 7fca0de1c640 -1 *** Caught signal (Aborted) **
 in thread 7fca0de1c640 thread_name:io_context_pool

 ceph version 19.2.0-44.el9cp (8c6c17081885f1a0df618a41cd6963567c8412e9) squid (stable)
 1: /lib64/libc.so.6(+0x3e6f0) [0x7fcac13086f0]
 2: /lib64/libc.so.6(+0x8b94c) [0x7fcac135594c]
 3: raise()
 4: abort()
 5: /lib64/libstdc++.so.6(+0xa1b21) [0x7fcac166cb21]
 6: /lib64/libstdc++.so.6(+0xad52c) [0x7fcac167852c]
 7: /lib64/libstdc++.so.6(+0xad597) [0x7fcac1678597]
 8: /lib64/libstdc++.so.6(+0xad51f) [0x7fcac167851f]
 9: /usr/bin/radosgw(+0x43e77f) [0x560b8270977f]
 10: (void boost::asio::detail::executor_function::complete<boost::asio::detail::binder0<ceph::async::ForwardingHandler<ceph::async::CompletionHandler<boost::asio::executor_binder<spawn::detail::coro_handler<boost::asio::executor_binder<void (*)(), boost::asio::any_io_executor>, unsigned long>, boost::asio::any_io_executor>, std::tuple<boost::system::error_code, unsigned long> > > >, std::allocator<void> >(boost::asio::detail::executor_function::impl_base*, bool)+0x235) [0x560b82bf33f5]
 11: (boost::asio::detail::executor_op<boost::asio::detail::executor_function, std::allocator<void>, boost::asio::detail::scheduler_operation>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long)+0xab) [0x560b8273311b]
 12: /usr/bin/radosgw(+0x474d9e) [0x560b8273fd9e]
 13: (boost::asio::detail::executor_op<boost::asio::detail::strand_executor_service::invoker<boost::asio::io_context::basic_executor_type<std::allocator<void>, 0ul> const, void>, std::allocator<void>, boost::asio::detail::scheduler_operation>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long)+0xcc) [0x560b8274022c]
 14: /usr/bin/radosgw(+0xe715ce) [0x560b8313c5ce]
 15: /usr/bin/radosgw(+0x436e71) [0x560b82701e71]
 16: /lib64/libstdc++.so.6(+0xdbad4) [0x7fcac16a6ad4]
 17: /lib64/libc.so.6(+0x89c02) [0x7fcac1353c02]
 18: /lib64/libc.so.6(+0x10ec40) [0x7fcac13d8c40]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
 -9999> 2024-10-23T11:45:01.435+0000 7fc9bf57f640  2 req 3156897534353529934 0.000000000s getting op 0
 -9998> 2024-10-23T11:45:01.435+0000 7fc9bf57f640  2 req 3156897534353529934 0.000000000s :list_data_changes_log verifying requester
 -9997> 2024-10-23T11:45:01.435+0000 7fca90721640  2 req 7363331454563604739 0.000000000s :list_data_changes_log normalizing buckets and tenants
 -9996> 2024-10-23T11:45:01.435+0000 7fca90721640  2 req 7363331454563604739 0.000000000s :list_data_changes_log init permissions
 -9995> 2024-10-23T11:45:01.435+0000 7fca90721640  2 req 7363331454563604739 0.000000000s :list_data_changes_log recalculating target
 -9994> 2024-10-23T11:45:01.435+0000 7fca90721640  2 req 7363331454563604739 0.000000000s :list_data_changes_log reading permissions
 -9993> 2024-10-23T11:45:01.435+0000 7fca90721640  2 req 7363331454563604739 0.000000000s :list_data_changes_log init op
 -9992> 2024-10-23T11:45:01.435+0000 7fca90721640  2 req 7363331454563604739 0.000000000s :list_data_changes_log verifying op mask
 -9991> 2024-10-23T11:45:01.435+0000 7fca90721640  2 req 7363331454563604739 0.000000000s :list_data_changes_log verifying op permissions
 -9990> 2024-10-23T11:45:01.435+0000 7fca90721640  2 overriding permissions due to system operation
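
The same crash, with its full metadata and backtrace, should also be retrievable from the cluster's crash module once the daemon has aborted; a minimal sketch using the standard ceph CLI (the crash ID below is a placeholder):

# ceph crash ls
# ceph crash info <crash-id>   # <crash-id> taken from the 'crash ls' output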

Version-Release number of selected component (if applicable):
ceph version 19.2.0-44.el9cp (8c6c17081885f1a0df618a41cd6963567c8412e9) squid (stable)


How reproducible:
Always

Steps to Reproduce:
1. Create two buckets, 'src1' and 'dest1'.
2. Upload a multipart object to the bucket 'src1'.
3. Copy the object from the source bucket 'src1' to the destination bucket 'dest1', targeting the storage class 'sc1' (the class must already be defined; see the placement sketch below).
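
Note that for step 3 the target storage class has to exist in the zonegroup/zone placement beforehand. A minimal sketch using the standard radosgw-admin placement commands, assuming the default zonegroup, zone, and placement ID; the data pool name is illustrative:

# radosgw-admin zonegroup placement add --rgw-zonegroup default --placement-id default-placement --storage-class sc1
# radosgw-admin zone placement add --rgw-zone default --placement-id default-placement --storage-class sc1 --data-pool default.rgw.sc1.data   # pool name is illustrative

The RGW daemons typically need a restart (or a period commit in multisite) before the new class takes effect.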

# s3cmd mb s3://src1
Bucket 's3://src1/' created
# s3cmd mb s3://dest1
Bucket 's3://dest1/' created
# ls -lkh object20m 
-rw-r--r--. 1 root root 20M Oct 23 07:41 object20m

# s3cmd put object20m s3://src1/
upload: 'object20m' -> 's3://src1/object20m'  [part 1 of 2, 15MB] [1 of 1]
 15728640 of 15728640   100% in    0s    25.82 MB/s  done
upload: 'object20m' -> 's3://src1/object20m'  [part 2 of 2, 5MB] [1 of 1]
 5242880 of 5242880   100% in    0s    14.50 MB/s  done
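
The 20 MB object is split into two parts because s3cmd's default multipart chunk size is 15 MB; the chunk size can also be set explicitly so that smaller test objects take the multipart path as well, e.g.:

# s3cmd put --multipart-chunk-size-mb=5 object20m s3://src1/   # forces 5 MB parts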

# s3cmd cp  s3://src1/object20m s3://dest1/object20m-sc --storage-class sc1
WARNING: Retrying failed request: /object20m-sc (Remote end closed connection without response)
WARNING: Waiting 3 sec...
ERROR: Could not connect to server: [Errno 111] Connection refused

# ceph -v
ceph version 19.2.0-44.el9cp (8c6c17081885f1a0df618a41cd6963567c8412e9) squid (stable)

Actual results:

RGW crashes during an S3 copy between storage classes when the source object was uploaded via multipart upload.

Expected results:

No crash should occur, and the S3 copy should succeed for multipart-uploaded objects (see the verification sketch below).
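
Once fixed, the copy should complete and the destination object should carry the new storage class. One way to verify is a HEAD request on the copied object, shown here with awscli as an illustrative alternative client (the endpoint URL is an assumption for a local RGW):

# aws --endpoint-url http://127.0.0.1:8080 s3api head-object --bucket dest1 --key object20m-sc

The "StorageClass" field in the response should read "sc1".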

Additional info:

Comment 11 errata-xmlrpc 2024-11-25 09:13:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 8.0 security, bug fix, and enhancement updates), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:10216