Bug 2321269 - [8.0][multipart-uploads]: RGW crashes in thread "io_context_pool" during s3 copy object between storage classes
Summary: [8.0][multipart-uploads]: RGW crashes in thread "io_context_pool" during s3 copy object between storage classes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RGW
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 8.0
Assignee: Soumya Koduri
QA Contact: Hemanth Sai
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2024-10-23 11:55 UTC by Vidushi Mishra
Modified: 2024-11-25 09:13 UTC (History)
6 users

Fixed In Version: ceph-19.2.0-50.el9cp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-11-25 09:13:32 UTC
Embargoed:


Links
System                    ID               Last Updated
Red Hat Issue Tracker     RHCEPH-10112     2024-10-23 11:56:57 UTC
Red Hat Product Errata    RHBA-2024:10216  2024-11-25 09:13:35 UTC

Description Vidushi Mishra 2024-10-23 11:55:34 UTC
Description of problem:

RGW crashes in "thread_name:io_context_pool" when attempting to perform an s3 copy operation involving multipart objects across different storage classes.

==========Snippet of the backtrace=========


2024-10-23T11:45:36.000+0000 7fca57eb0640  5 req 10767782735017001868 0.003000282s s3:copy_obj Copy object :src1[b3b88145-0183-4daf-9457-a4454d134fe7.25219.1]):object20m => :dest1[b3b88145-0183-4daf-9457-a4454d134fe7.25219.2]):object20m-sc
2024-10-23T11:45:36.092+0000 7fca0de1c640 -1 *** Caught signal (Aborted) **
 in thread 7fca0de1c640 thread_name:io_context_pool

 ceph version 19.2.0-44.el9cp (8c6c17081885f1a0df618a41cd6963567c8412e9) squid (stable)
 1: /lib64/libc.so.6(+0x3e6f0) [0x7fcac13086f0]
 2: /lib64/libc.so.6(+0x8b94c) [0x7fcac135594c]
 3: raise()
 4: abort()
 5: /lib64/libstdc++.so.6(+0xa1b21) [0x7fcac166cb21]
 6: /lib64/libstdc++.so.6(+0xad52c) [0x7fcac167852c]
 7: /lib64/libstdc++.so.6(+0xad597) [0x7fcac1678597]
 8: /lib64/libstdc++.so.6(+0xad51f) [0x7fcac167851f]
 9: /usr/bin/radosgw(+0x43e77f) [0x560b8270977f]
 10: (void boost::asio::detail::executor_function::complete<boost::asio::detail::binder0<ceph::async::ForwardingHandler<ceph::async::CompletionHandler<boost::asio::executor_binder<spawn::detail::coro_handler<boost::asio::executor_binder<void (*)(), boost::asio::any_io_executor>, unsigned long>, boost::asio::any_io_executor>, std::tuple<boost::system::error_code, unsigned long> > > >, std::allocator<void> >(boost::asio::detail::executor_function::impl_base*, bool)+0x235) [0x560b82bf33f5]
 11: (boost::asio::detail::executor_op<boost::asio::detail::executor_function, std::allocator<void>, boost::asio::detail::scheduler_operation>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long)+0xab) [0x560b8273311b]
 12: /usr/bin/radosgw(+0x474d9e) [0x560b8273fd9e]
 13: (boost::asio::detail::executor_op<boost::asio::detail::strand_executor_service::invoker<boost::asio::io_context::basic_executor_type<std::allocator<void>, 0ul> const, void>, std::allocator<void>, boost::asio::detail::scheduler_operation>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long)+0xcc) [0x560b8274022c]
 14: /usr/bin/radosgw(+0xe715ce) [0x560b8313c5ce]
 15: /usr/bin/radosgw(+0x436e71) [0x560b82701e71]
 16: /lib64/libstdc++.so.6(+0xdbad4) [0x7fcac16a6ad4]
 17: /lib64/libc.so.6(+0x89c02) [0x7fcac1353c02]
 18: /lib64/libc.so.6(+0x10ec40) [0x7fcac13d8c40]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
 -9999> 2024-10-23T11:45:01.435+0000 7fc9bf57f640  2 req 3156897534353529934 0.000000000s getting op 0
 -9998> 2024-10-23T11:45:01.435+0000 7fc9bf57f640  2 req 3156897534353529934 0.000000000s :list_data_changes_log verifying requester
 -9997> 2024-10-23T11:45:01.435+0000 7fca90721640  2 req 7363331454563604739 0.000000000s :list_data_changes_log normalizing buckets and tenants
 -9996> 2024-10-23T11:45:01.435+0000 7fca90721640  2 req 7363331454563604739 0.000000000s :list_data_changes_log init permissions
 -9995> 2024-10-23T11:45:01.435+0000 7fca90721640  2 req 7363331454563604739 0.000000000s :list_data_changes_log recalculating target
 -9994> 2024-10-23T11:45:01.435+0000 7fca90721640  2 req 7363331454563604739 0.000000000s :list_data_changes_log reading permissions
 -9993> 2024-10-23T11:45:01.435+0000 7fca90721640  2 req 7363331454563604739 0.000000000s :list_data_changes_log init op
 -9992> 2024-10-23T11:45:01.435+0000 7fca90721640  2 req 7363331454563604739 0.000000000s :list_data_changes_log verifying op mask
 -9991> 2024-10-23T11:45:01.435+0000 7fca90721640  2 req 7363331454563604739 0.000000000s :list_data_changes_log verifying op permissions
 -9990> 2024-10-23T11:45:01.435+0000 7fca90721640  2 overriding permissions due to system operation

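The full crash dump is not attached to this report. If the cluster's crash module captured this abort, the complete report (backtrace plus daemon metadata such as the exact radosgw version) can usually be retrieved directly from the cluster; the crash ID below is a placeholder, not a value from this bug:

# ceph crash ls
# ceph crash info <crash-id>
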
Version-Release number of selected component (if applicable):
ceph version 19.2.0-44.el9cp (8c6c17081885f1a0df618a41cd6963567c8412e9) squid (stable)


How reproducible:
Always

Steps to Reproduce:
1. Create two buckets, 'src1' and 'dest1'.
2. Upload a multipart object to the bucket 'src1'.
3. Copy the object from the source bucket 'src1' to the destination bucket 'dest1', specifying the storage class 'sc1' (commands shown below).

# s3cmd mb s3://src1
Bucket 's3://src1/' created
# s3cmd mb s3://dest1
Bucket 's3://dest1/' created
# ls -lkh object20m 
-rw-r--r--. 1 root root 20M Oct 23 07:41 object20m

# s3cmd put object20m s3://src1/
upload: 'object20m' -> 's3://src1/object20m'  [part 1 of 2, 15MB] [1 of 1]
 15728640 of 15728640   100% in    0s    25.82 MB/s  done
upload: 'object20m' -> 's3://src1/object20m'  [part 2 of 2, 5MB] [1 of 1]
 5242880 of 5242880   100% in    0s    14.50 MB/s  done

# s3cmd cp  s3://src1/object20m s3://dest1/object20m-sc --storage-class sc1
WARNING: Retrying failed request: /object20m-sc (Remote end closed connection without response)
WARNING: Waiting 3 sec...
ERROR: Could not connect to server: [Errno 111] Connection refused

# ceph -v
ceph version 19.2.0-44.el9cp (8c6c17081885f1a0df618a41cd6963567c8412e9) squid (stable)
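
Note: step 3 assumes a storage class 'sc1' has already been added to the RGW zone placement. A minimal sketch of how such a class might be defined (the placement id and data pool name here are assumptions, not taken from this report):

# radosgw-admin zonegroup placement add --rgw-zonegroup default --placement-id default-placement --storage-class sc1
# radosgw-admin zone placement add --rgw-zone default --placement-id default-placement --storage-class sc1 --data-pool default.rgw.sc1.data

In a multisite configuration the period would also need to be committed (radosgw-admin period update --commit) for the new storage class to become visible to the gateways.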

Actual results:

RGW crashes during an s3 copy between storage classes when the source object was uploaded via multipart upload; the client request fails with "Remote end closed connection without response" and the retry is refused.

Expected results:

No crash should occur, and the s3 copy between storage classes should succeed for multipart-uploaded objects.
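
A quick way to confirm the expected behaviour after the fix, reusing the buckets and object from the reproduction steps (radosgw-admin is run on an admin/RGW node):

# s3cmd cp s3://src1/object20m s3://dest1/object20m-sc --storage-class sc1
# s3cmd info s3://dest1/object20m-sc
# radosgw-admin object stat --bucket=dest1 --object=object20m-sc

The copy should complete without the connection being dropped, and the destination object should exist in 'dest1' with the requested storage class.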

Additional info:

Comment 11 errata-xmlrpc 2024-11-25 09:13:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 8.0 security, bug fix, and enhancement updates), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:10216

