Bug 2008835

Summary: [GSS][RGW]Arbitrarily-large space leaks generated by re-uploading the same multi-part part multiple times
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Geo Jose <gjose>
Component: RGW
Assignee: Matt Benjamin (redhat) <mbenjamin>
Status: MODIFIED ---
QA Contact: Madhavi Kasturi <mkasturi>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.2
CC: cbodley, ceph-eng-bugs, kbader, mbenjamin, mmuench, vereddy
Target Milestone: ---
Target Release: 7.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Geo Jose 2021-09-29 10:06:10 UTC
In an RGW workload, large numbers of objects that appear to be orphans accumulate in the data pool. The leaked objects are part objects belonging to completed multipart uploads.

Engineering believes this is the primary underlying issue: the ability to generate arbitrarily large space leaks by re-uploading the same part of a multipart upload multiple times. This affects every RGW version that supports S3 multipart upload.

The root cause is as follows. These RGW versions contain logic to detect that an upload-part operation conflicts with a prior upload of the same part, but the code handling that case resolves only the naming conflict. It does not accumulate the full set of object names generated by all upload attempts for a given part. Instead, it overwrites the metadata recorded for prior uploads of the part with the metadata of the latest one, so the objects written by earlier attempts are no longer referenced anywhere.
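A toy Python model of the overwrite described above may make the leak mechanism easier to see. It is a sketch, not RGW code: `PartMeta`, `OverwritingPartIndex`, and `leaked_after_reuploads` are hypothetical names, and the unique-suffix naming only approximates how a conflicting re-upload is given a distinct object name.

```python
import uuid
from dataclasses import dataclass


@dataclass
class PartMeta:
    """Hypothetical per-part record; field names are illustrative only."""
    part_num: int
    etag: str
    rados_objects: list  # object names written for this part


class OverwritingPartIndex:
    """Toy model of the reported bug: each re-upload replaces the part's record."""

    def __init__(self):
        self.parts = {}  # part_num -> PartMeta

    def upload_part(self, part_num, etag):
        # A conflicting re-upload is written under a fresh, unique name,
        # which resolves the naming conflict...
        name = f"part.{part_num}.{uuid.uuid4().hex}"
        # ...but the old record is replaced, so earlier names are forgotten.
        self.parts[part_num] = PartMeta(part_num, etag, [name])
        return name

    def tracked_objects(self):
        return {n for meta in self.parts.values() for n in meta.rados_objects}


def leaked_after_reuploads(attempts):
    """Upload part 1 `attempts` times; return names no record points to."""
    index = OverwritingPartIndex()
    written = {index.upload_part(1, f"etag-{i}") for i in range(attempts)}
    return written - index.tracked_objects()


print(len(leaked_after_reuploads(5)))  # prints 4
```

In this model every extra attempt strands one more object, which is why the leak grows without bound as the same part is re-uploaded.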

To fix this, we propose moving the serialization and storage of part-upload metadata into RGW's OSD-side CLS (object class) interface. There, existing and new part metadata can be combined in a single operation on the OSD, which also avoids races between simultaneous uploads of the same part. Secondarily, the accumulated history will be used to clean up the objects of completed and aborted multipart uploads.
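The proposed fix can be sketched as a toy Python model under stated assumptions. A `threading.Lock` stands in for the per-object serialization a CLS method receives on the OSD; it is not the real cls API. `PartMeta`, `AccumulatingPartIndex`, `store_part`, and `objects_to_clean` are hypothetical names. The point is the merge: each attempt's object name is appended to the part's record instead of replacing it, so cleanup can find every object ever written.

```python
import threading
import uuid
from dataclasses import dataclass, field


@dataclass
class PartMeta:
    """Hypothetical per-part record; field names are illustrative only."""
    part_num: int
    etag: str = ""
    # Every object name written by any upload attempt of this part.
    rados_objects: list = field(default_factory=list)


class AccumulatingPartIndex:
    """Toy model of the proposed fix. The lock stands in for the atomic,
    per-object execution of a CLS method; it is not the real cls API."""

    def __init__(self):
        self._lock = threading.Lock()
        self.parts = {}  # part_num -> PartMeta

    def store_part(self, part_num, etag, new_object):
        # Read-modify-write done as one atomic step, as a CLS op would be.
        with self._lock:
            meta = self.parts.setdefault(part_num, PartMeta(part_num))
            meta.etag = etag                       # latest attempt's etag
            meta.rados_objects.append(new_object)  # prior names are kept

    def objects_to_clean(self, kept):
        """On complete/abort: every recorded name not in `kept`."""
        with self._lock:
            recorded = {n for m in self.parts.values() for n in m.rados_objects}
        return recorded - set(kept)


def concurrent_reuploads(attempts):
    """Race `attempts` uploads of part 1 against each other."""
    index = AccumulatingPartIndex()
    names = [f"part.1.{uuid.uuid4().hex}" for _ in range(attempts)]
    threads = [
        threading.Thread(target=index.store_part, args=(1, f"etag-{i}", n))
        for i, n in enumerate(names)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return index, names


index, names = concurrent_reuploads(8)
final = index.parts[1].rados_objects[-1]  # object of the last stored attempt
stale = index.objects_to_clean([final])   # on complete: remove the other 7
print(len(stale))  # prints 7
```

On abort, `objects_to_clean([])` would return all eight names. In this sketch, putting the merge inside the serialized operation is what closes the race between simultaneous uploads: two writers can no longer each read the old record and then overwrite each other's addition.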


Pull request: https://github.com/ceph/ceph/pull/37260

Comment 1 RHEL Program Management 2021-09-29 10:06:17 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.