Bug 1892644

Summary: [GSS] S3 client is reporting S3 error: 404 (NoSuchKey) for an object which exists in the cluster
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Gaurav Sitlani <gsitlani>
Component: RGW
Assignee: J. Eric Ivancich <ivancich>
Status: CLOSED ERRATA
QA Contact: Uday kurundwade <ukurundw>
Severity: urgent
Docs Contact: Aron Gunn <agunn>
Priority: urgent
Version: 4.1
CC: agunn, assingh, bhubbard, bhull, cbodley, ceph-eng-bugs, cgaynor, ivancich, kbader, kdreyer, linuxkidd, lithomas, mamccoma, mbenjamin, mhackett, mkogan, mmuench, palshure, pdhange, prsrivas, roemerso, sbaldwin, sweil, tchandra, tserlin, vereddy
Target Milestone: ---
Flags: gsitlani: needinfo-
Target Release: 4.1z3
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: ceph-14.2.8-115.el8cp, ceph-14.2.8-115.el7cp
Doc Type: Bug Fix
Doc Text:
.Parts of some objects were erroneously added to garbage collection
When reading objects using the Ceph Object Gateway, if reading parts of those objects took more than half of the time defined by the `rgw_gc_obj_min_wait` option, their tail objects were added to the garbage collection list. Those tail objects on the garbage collection list were then deleted, resulting in data loss. With this release, the garbage collection feature meant to delay garbage collection for deleted objects was disabled. As a result, objects that take a long time to read through the Ceph Object Gateway are no longer added to the garbage collection list. (A reproduction sketch of this failure mode follows the metadata fields below.)
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-12-02 15:22:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1816167    
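
The following is a minimal reproduction sketch of the failure mode described in the Doc Text above: an S3 GET that is deliberately throttled so it stays open past half of `rgw_gc_obj_min_wait`. The endpoint, credentials, bucket, object key, and the lowered `rgw_gc_obj_min_wait` test value are all hypothetical and are not taken from this report.

```python
# Hypothetical reproduction sketch for the behaviour described in the Doc Text.
# Assumes an RGW endpoint, credentials, and a test bucket/object that are NOT
# part of this bug report, and that rgw_gc_obj_min_wait has been lowered on a
# throwaway test cluster (e.g. to 120 seconds) so a slow read crosses the
# half-way mark quickly.
import time

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://rgw.example.com:8080",   # hypothetical RGW endpoint
    aws_access_key_id="ACCESS_KEY",               # hypothetical credentials
    aws_secret_access_key="SECRET_KEY",
)

RGW_GC_OBJ_MIN_WAIT = 120  # seconds; assumed test value, the default is much larger

resp = s3.get_object(Bucket="test-bucket", Key="large-object")
body = resp["Body"]

# Read the object in small chunks with artificial pauses so the GET stays open
# for longer than half of rgw_gc_obj_min_wait -- the condition under which the
# buggy code path queued the object's tail for garbage collection.
deadline = time.time() + RGW_GC_OBJ_MIN_WAIT * 0.75
while True:
    chunk = body.read(64 * 1024)
    if not chunk:
        break
    if time.time() < deadline:
        time.sleep(1)  # throttle the download to simulate a slow client
```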

Comment 46 J. Eric Ivancich 2020-11-21 20:25:08 UTC
One of the upstream users hit by this bug has provided some impressive corroboration that this is in fact the root cause. All of the missing objects had read times in excess of 1 hour; the one with the smallest read time took 1 hour and 53 seconds, just barely over the threshold set by the default value of `rgw_gc_obj_min_wait`.

See: https://tracker.ceph.com/issues/47866?next_issue_id=48255#note-59
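
As a hedged sanity check of the timing argument above: assuming the usual upstream default of `rgw_gc_obj_min_wait` (7200 seconds, an assumption not stated in this report) and the half-of-the-value trigger described in the Doc Text, the deferral threshold works out to one hour, which the smallest observed read time only barely exceeds.

```python
# Hedged sanity check of the timing claim in comment 46. The 7200 s default for
# rgw_gc_obj_min_wait is an assumption (the usual upstream default); the
# "half of the value" trigger comes from the Doc Text above.
RGW_GC_OBJ_MIN_WAIT_DEFAULT = 2 * 60 * 60            # assumed default: 7200 s
defer_threshold = RGW_GC_OBJ_MIN_WAIT_DEFAULT // 2   # 3600 s = 1 hour

smallest_missing_read = 1 * 60 * 60 + 53             # 1 hour and 53 seconds

assert smallest_missing_read > defer_threshold
print(f"exceeds the threshold by {smallest_missing_read - defer_threshold} s")
# -> exceeds the threshold by 53 s
```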

Comment 47 J. Eric Ivancich 2020-11-21 21:44:23 UTC
Another individual from the same upstream user provided additional corroboration that we have the root cause. In fact, I believe we can be *certain* at this point.

He modified parameters and forced a slow download with `curl`. He then saw tail objects appear on the GC queue.

See: https://tracker.ceph.com/issues/47866?next_issue_id=48255#note-62
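
Below is a sketch of the kind of check described in this comment: after forcing a slow download, inspect the garbage-collection queue for entries that reference the test object. It assumes it runs on a cluster node where `radosgw-admin` and an admin keyring are available, and it reuses the hypothetical `large-object` key from the earlier sketch. Tail RADOS object names for multipart uploads normally embed the S3 key, but plain shadow tails may not, so this is only a rough filter.

```python
# Hedged sketch: look for the test object's tail entries on the RGW GC queue
# after a deliberately slow download. Assumes radosgw-admin and an admin
# keyring are available on this node; the object key is hypothetical.
import json
import subprocess

OBJECT_KEY = "large-object"  # hypothetical test object from the earlier sketch

# "gc list --include-all" also shows queued entries whose delay has not expired.
raw = subprocess.run(
    ["radosgw-admin", "gc", "list", "--include-all"],
    check=True,
    capture_output=True,
    text=True,
).stdout

entries = json.loads(raw)

# Field names in the JSON output vary between releases, so match against the
# serialized entry instead of relying on a fixed schema. Multipart tail objects
# normally embed the S3 key in their RADOS names; plain "shadow" tails may not.
suspect = [entry for entry in entries if OBJECT_KEY in json.dumps(entry)]

print(f"{len(suspect)} GC entries reference {OBJECT_KEY!r}")
```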

Comment 73 errata-xmlrpc 2020-12-02 15:22:34 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 4.1 security and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5325