Bug 1892644 - [GSS] S3 client is reporting S3 error: 404 (NoSuchKey) for an object which exists in the cluster
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RGW
Version: 4.1
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.1z3
Assignee: J. Eric Ivancich
QA Contact: Uday kurundwade
Docs Contact: Aron Gunn
URL:
Whiteboard:
Depends On:
Blocks: 1816167
Reported: 2020-10-29 11:39 UTC by Gaurav Sitlani
Modified: 2024-06-13 23:18 UTC

Fixed In Version: ceph-14.2.8-115.el8cp ceph-14.2.8-115.el7cp
Doc Type: Bug Fix
Doc Text:
.Parts of some objects were erroneously added to garbage collection
When reading objects using the Ceph Object Gateway, if reading parts of those objects took more than half of the time defined by the `rgw_gc_obj_min_wait` option, then their tail objects were added to the garbage collection list. Those tail objects were then deleted by garbage collection, resulting in data loss. With this release, the garbage collection feature meant to delay garbage collection for deleted objects was disabled. As a result, objects that take a long time to read using the Ceph Object Gateway are no longer added to the garbage collection list.
Clone Of:
Environment:
Last Closed: 2020-12-02 15:22:34 UTC
Embargoed:
gsitlani: needinfo-


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1908910 | 0 | unspecified | CLOSED | [RFE] - Tool to identify objects impacted by BZ 1892644 | 2024-10-01 17:13:46 UTC
Red Hat Issue Tracker | RHCEPH-8255 | 0 | None | None | None | 2024-02-04 15:21:21 UTC
Red Hat Knowledge Base (Solution) | 5592891 | 0 | None | None | None | 2020-11-23 14:42:03 UTC
Red Hat Product Errata | RHSA-2020:5325 | 0 | None | None | None | 2020-12-02 15:22:53 UTC

Internal Links: 1908910

Comment 46 J. Eric Ivancich 2020-11-21 20:25:08 UTC
One of the upstream users hit by this bug has provided some impressive corroboration that this is in fact the root cause. All the missing objects had read times in excess of 1 hour. The one with the smallest read time was 1 hour and 53 seconds, so just barely over the threshold set by the default value of rgw_gc_obj_min_wait.

See: https://tracker.ceph.com/issues/47866?next_issue_id=48255#note-59
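
The arithmetic behind that observation can be sketched as follows. This is an illustration only, not code from Ceph: it assumes the default value of rgw_gc_obj_min_wait is 7200 seconds (check the value configured on your cluster) and that the threshold is half of that value, as the Doc Text for this bug states. The function name is hypothetical.

```python
# Assumption: 7200 s is the default for rgw_gc_obj_min_wait.
# Verify against the value configured on the cluster in question.
ASSUMED_DEFAULT_GC_OBJ_MIN_WAIT = 7200


def read_exceeds_threshold(read_seconds,
                           gc_obj_min_wait=ASSUMED_DEFAULT_GC_OBJ_MIN_WAIT):
    """Return True if a read lasted longer than half of rgw_gc_obj_min_wait.

    Hypothetical helper: it models the condition described in this bug,
    it does not query Ceph.
    """
    return read_seconds > gc_obj_min_wait / 2


# Shortest read time among the missing objects: 1 hour and 53 seconds.
shortest_missing_read = 1 * 3600 + 53

print(read_exceeds_threshold(shortest_missing_read))  # 3653 s vs. 3600 s
print(read_exceeds_threshold(59 * 60))                # 3540 s vs. 3600 s
```

Under those assumptions the shortest affected read is 53 seconds past the threshold, while a 59-minute read stays below it.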

Comment 47 J. Eric Ivancich 2020-11-21 21:44:23 UTC
Another individual from the same upstream user provided additional corroboration that we have the root cause. In fact, I believe we can be *certain* at this point.

He modified parameters and forced a slow download with `curl`. He then saw tail objects appear on the GC queue.

See: https://tracker.ceph.com/issues/47866?next_issue_id=48255#note-62
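
The comment does not say which parameters were changed or how the download was slowed. As a rough sketch of how such a reproduction could be sized, the snippet below assumes rgw_gc_obj_min_wait was lowered to a small value and computes a transfer rate cap that stretches a download past half of it. The function name, the 600-second setting, the 64 MiB object size, and the 20% margin are all hypothetical.

```python
import math


def rate_cap_for_slow_read(object_size_bytes, gc_obj_min_wait_seconds,
                           margin=1.2):
    """Return a bytes-per-second cap that makes a full download last
    longer than half of rgw_gc_obj_min_wait, scaled by `margin`.

    Hypothetical helper for planning a test; it does not talk to Ceph.
    """
    threshold = gc_obj_min_wait_seconds / 2
    target_duration = threshold * margin
    rate = math.floor(object_size_bytes / target_duration)
    if rate < 1:
        # Even 1 byte/s would finish before the threshold is crossed.
        raise ValueError("object too small to exceed the threshold")
    return rate


size = 64 * 1024 * 1024  # hypothetical 64 MiB object
rate = rate_cap_for_slow_read(size, gc_obj_min_wait_seconds=600)
print(rate)         # bytes per second
print(size / rate)  # resulting download time in seconds
```

The resulting value is a plain bytes-per-second figure, which is the unit curl's `--limit-rate` option takes when no suffix is given.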

Comment 73 errata-xmlrpc 2020-12-02 15:22:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 4.1 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5325