Bug 2356922
| Summary: | Consistent, reproducible RGW crashes in special awscli upload-part-copy scenario | |||
|---|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Scott Nipp <snipp> | |
| Component: | RGW-Multisite | Assignee: | Matt Benjamin (redhat) <mbenjamin> | |
| Status: | CLOSED ERRATA | QA Contact: | Chaithra <ckulal> | |
| Severity: | high | Docs Contact: | Rivka Pollack <rpollack> | |
| Priority: | unspecified | |||
| Version: | 7.1 | CC: | ceph-eng-bugs, cephqe-warriors, ckulal, laurent.barbe, mbenjamin, mkasturi, rpollack, rsachere, tru, tserlin | |
| Target Milestone: | --- | Flags: | mkasturi: needinfo+ | |
| Target Release: | 8.1 | |||
| Hardware: | x86_64 | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | ceph-19.2.1-202.el9cp | Doc Type: | Bug Fix | |
| Doc Text: |
.Invalid URL-encoded text from the client no longer creates errors
Previously, the system improperly handled scenarios where URL decoding resulted in an empty `key.name`. The empty `key.name` was caused by invalid URL-encoded text from the client. As a result, an assertion error could occur during the copy operation, sometimes leading to a crash later.
With this fix, invalid empty `key.name` values are now ignored, and copy operations no longer trigger assertions or cause crashes.
|
Story Points: | --- | |
| Clone Of: | ||||
| : | 2369418 (view as bug list) | Environment: | ||
| Last Closed: | 2025-06-26 12:22:11 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 2370192, 2351689, 2369418 | |||
|
Description
Scott Nipp
2025-04-02 14:22:21 UTC
The attached log file is from a run with an older version of the AWS CLI, which was tried in relation to KCS (https://access.redhat.com/solutions/7109373). However, although the customer (CU) did downgrade the AWS CLI and repeat the test, they also mentioned that the crash is being observed with multiple clients. We just received new information from the CU: they are able to reproduce this without versioning on an object. Below are their abbreviated reproduction steps and the crash log for the case without versioning on the object.
~~~
# Create a bucket
aws-debug s3api create-bucket --bucket lbarbe-debug
# Create file > 8M and upload it
dd if=/dev/zero of=file bs=9M count=1
aws-debug s3 cp file "s3://lbarbe-debug/object_with_%.txt"
# Copy file with multipart upload
aws-debug s3api create-multipart-upload --bucket lbarbe-debug --key 'new_object.txt'
aws-debug s3api upload-part-copy --bucket lbarbe-debug --key 'new_object.txt' --copy-source 'lbarbe-debug/object_with_%.txt' --part-number 1 --upload-id "2~aUmf8gH_YdOCwNxyCrk---i8-1eE0v-"
~~~
# --> RGW Crash
~~~
"/lib64/libc.so.6(+0x3e730) [0x7f8d6e517730]",
"/lib64/libc.so.6(+0x8ba6c) [0x7f8d6e564a6c]",
"raise()",
"abort()",
"/lib64/libc.so.6(+0x2875b) [0x7f8d6e50175b]",
"/lib64/libc.so.6(+0x373c6) [0x7f8d6e5103c6]",
"(RGWObjectCtx::set_atomic(rgw_obj const&)+0xf8) [0x55e1c44658a8]",
"/usr/bin/radosgw(+0x4ff208) [0x55e1c4246208]",
"(RGWPutObj::verify_permission(optional_yield)+0x512) [0x55e1c4266de2]",
"(rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, optional_yield, rgw::sal::Driver*, bool)+0x648) [0x55e1c41b6c58]",
"(process_request(RGWProcessEnv const&, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWRestfulIO*, optional_yield, rgw::dmclock::Scheduler*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*, int*)+0x1002) [0x55e1c41bdeb2]",
"/usr/bin/radosgw(+0xb9cdf1) [0x55e1c48e3df1]",
"/usr/bin/radosgw(+0x385e76) [0x55e1c40cce76]",
"make_fcontext()"
~~~
So the crash appears exactly the same whether or not the object is versioned. I've asked the CU if they can test this in a bucket without versioning enabled to see if that makes a difference. I'll update once they provide those results.
Ignore previous comment #5. I missed that the CU in that test had created a completely new bucket without versioning. I thought they had simply uploaded another object to the same bucket without specifying versioning on the object.
I followed up with the customer on this issue today just to touch base. I expressed that this is really beyond just Ceph/RGW, as the issue seems to be more with the AWS S3 protocol itself. Their response was the concern that a bad actor might use this as a security threat to crash their RGW daemons. They are much less concerned with reliably using the '%' character in object names than with ensuring this does not crash the RGW daemons.
I just wanted to check in again on this for the customer regarding their concern that this could be used as a DoS-type attack by crashing the RGW daemons. Any thoughts on mitigation of this particular AWS S3 protocol issue?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat Ceph Storage 8.1 security, bug fix, and enhancement updates), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2025:9775
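For readers following the Doc Text, the failure mode (a copy source containing a bare `%` that does not decode to a usable key) can be illustrated with a small Python sketch. This is not RGW's code and does not reflect how the fix was implemented; `strict_url_decode` and `parse_copy_source` are hypothetical helper names, and rejecting the request outright is an assumption made here for illustration. The sketch treats any `%` not followed by two hex digits as malformed and refuses an empty bucket or key before any copy logic would run.

```python
import re
from typing import Optional, Tuple
from urllib.parse import quote, unquote

# A '%' must be followed by exactly two hex digits to be a valid escape.
_BAD_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")


def strict_url_decode(text: str) -> Optional[str]:
    """Return the decoded text, or None if any '%' escape is malformed."""
    if _BAD_ESCAPE.search(text):
        return None
    return unquote(text)


def parse_copy_source(header: str) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: split 'bucket/key', rejecting unusable input.

    Returns None when decoding fails or when the bucket or key is empty,
    so the caller can answer with a client error instead of proceeding.
    """
    decoded = strict_url_decode(header.lstrip("/"))
    if decoded is None:
        return None
    bucket, _, key = decoded.partition("/")
    if not bucket or not key:
        return None
    return bucket, key


# The copy source from the reproducer contains a bare '%'.
raw = "lbarbe-debug/object_with_%.txt"
print(parse_copy_source(raw))

# Percent-encoding the same string on the client side turns '%' into '%25'.
encoded = quote(raw, safe="/")
print(encoded)
print(parse_copy_source(encoded))
```

With the reproducer's object name, the raw copy source is rejected, while the percent-encoded form decodes back to the original key. Whether a given client encodes the copy source automatically depends on the client and its version, so that should be checked rather than assumed.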