Bug 2181424 - [RGW] bucket stats output has incorrect num_objects in rgw.none and rgw.main on multipart upload
Summary: [RGW] bucket stats output has incorrect num_objects in rgw.none and rgw.main ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RGW
Version: 6.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 6.1z3
Assignee: Matt Benjamin (redhat)
QA Contact: Chaithra
Docs Contact: Disha Walvekar
URL:
Whiteboard:
Depends On: 2021926
Blocks: 227624 2022585
 
Reported: 2023-03-24 04:52 UTC by Tejas
Modified: 2023-12-12 13:55 UTC
CC List: 15 users

Fixed In Version: ceph-17.2.6-155.el9cp
Doc Type: Bug Fix
Doc Text:
Cause: A race condition in S3 CompleteMultipartUpload, where multiple completions land at around the same time, caused cleanup of the uploaded part objects to be skipped. Consequence: Bucket stats showed an unexpected num_objects value. Fix: The race condition is resolved. Result: Bucket stats report the expected num_objects.
Clone Of: 2021926
Environment:
Last Closed: 2023-12-12 13:55:28 UTC
Embargoed:




Links
System ID Last Updated
Red Hat Issue Tracker RHCEPH-6302 2023-03-24 04:56:16 UTC
Red Hat Product Errata RHSA-2023:7740 2023-12-12 13:55:33 UTC

Description Tejas 2023-03-24 04:52:13 UTC
+++ This bug was initially created as a clone of Bug #2021926 +++

Description of problem:
bucket stats displays incorrect num_objects in rgw.main and rgw.none while uploading a multipart object with whitespace in the object name [encyclopedia/space & universe/.bkp/journal]

Version-Release number of selected component (if applicable):
ceph version 14.2.22-16.el8cp

How reproducible:
2/2

Steps to Reproduce:
1. Deploy a Ceph cluster on RHCS 4.3 with RGW daemons [4 RGW instances]

2. Using s3cmd, create a bucket and objects (see the note below on what this loop effectively does):
 s3cmd mb s3://kvm-mp-nbkt1;fallocate -l 25m obj25m;for i in {1..200};do s3cmd put obj25m s3://kvm-mp-nbkt1/encyclopedia/space & universe/.bkp/journal$i;done;
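Note: as written, the unquoted '&' and the space in the key mean the shell backgrounds each s3cmd put and truncates the object name at "encyclopedia/space", so the loop launches 200 concurrent multipart uploads of the same key, which matches the concurrent CompleteMultipartUpload race described in the Doc Text. A minimal expanded sketch of what the loop effectively runs, assuming that reading of the shell quoting:

 s3cmd mb s3://kvm-mp-nbkt1
 fallocate -l 25m obj25m
 for i in {1..200}; do
     # unquoted '&' above -> each 25 MiB multipart upload of the same key runs in the background
     s3cmd put obj25m s3://kvm-mp-nbkt1/encyclopedia/space &
 done
 wait   # wait for all backgrounded uploads (and their CompleteMultipartUpload calls)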

3. Observe the following in the bucket stats output:

a) rgw.none with large num_objects appears in usage
snippet:
    "usage": {
        "rgw.none": {
            "size": 0,
            "size_actual": 0,
            "size_utilized": 0,
            "size_kb": 0,
            "size_kb_actual": 0,
            "size_kb_utilized": 0,
            "num_objects": 18446744073709551612

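The rgw.none figure itself points at an unsigned counter underflow: 18446744073709551612 is 2^64 - 4, i.e. a 64-bit unsigned num_objects decremented 4 steps below zero. A quick check (arithmetic only, not taken from the RGW code):

 echo '2^64 - 4' | bc
 # 18446744073709551612
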
b) rgw.main shows incorrect num_objects
snippet:
        "rgw.main": {
            "size": 3853516800,
            "size_actual": 3853516800,
            "size_utilized": 3853516800,
            "size_kb": 3763200,
            "size_kb_actual": 3763200,
            "size_kb_utilized": 3763200,
            "num_objects": 293
        },
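The rgw.main numbers are also internally inconsistent: the reported size is exactly 147 times the 25 MiB test object (26214400 bytes, per the s3cmd ls output below), yet num_objects is 293. That the surplus entries are leftover multipart part objects whose cleanup was skipped is an assumption consistent with the Doc Text above.

 echo '3853516800 / 26214400' | bc   # bytes in rgw.main / size of obj25m
 # 147   <- data for 147 completed 25 MiB uploads, while num_objects reports 293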


4. s3cmd lists only one object
snippet:
[root@extensa001 ubuntu]# s3cmd ls s3://kvm-mp-nbkt1 --recursive
2021-11-10 12:15  26214400   s3://kvm-mp-nbkt1/encyclopedia/space


Actual results:
usage lists rgw.none with an impossibly large num_objects, and rgw.main lists an incorrect num_objects

Expected results:
bucket stats reflects the correct num_objects
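
The snippets above were presumably gathered with the standard admin command; after the fix, the same command should report consistent counts (bucket name taken from the reproducer):

 radosgw-admin bucket stats --bucket=kvm-mp-nbkt1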

Additional info:

setup details:
rgw nodes: 10.8.130.217;10.8.130.218
login credentials : root/r

--- Additional comment from Madhavi Kasturi on 2021-11-10 12:47:10 UTC ---

PFA: bucket stats, radoslist, s3cmd console output and rgw.logs at http://magna002.ceph.redhat.com/ceph-qe-logs/madhavi/bz2021926/

--- Additional comment from Matt Benjamin (redhat) on 2021-11-10 18:32:10 UTC ---

Hi Casey,

This is the multipart upload race you debugged, correct?

Matt

--- Additional comment from Casey Bodley on 2021-11-11 14:02:58 UTC ---

(In reply to Matt Benjamin (redhat) from comment #2)
> Hi Casey,
> 
> This is the multipart upload race you debugged, correct?
> 
> Matt

right. we need the fix on 5.x also

--- Additional comment from Vikhyat Umrao on 2021-11-17 19:59:02 UTC ---

(In reply to Casey Bodley from comment #3)
> (In reply to Matt Benjamin (redhat) from comment #2)
> > Hi Casey,
> > 
> > This is the multipart upload race you debugged, correct?
> > 
> > Matt
> 
> right. we need the fix on 5.x also

Moving this to 5.1.

--- Additional comment from Vikhyat Umrao on 2021-11-17 20:00:08 UTC ---

For RHCS 4.3 we already have customer bug - https://bugzilla.redhat.com/show_bug.cgi?id=2022585

--- Additional comment from Red Hat Bugzilla on 2021-12-09 06:35:50 UTC ---

remove performed by PnT Account Manager <pnt-expunge>

--- Additional comment from Red Hat Bugzilla on 2021-12-09 06:36:37 UTC ---

remove performed by PnT Account Manager <pnt-expunge>

--- Additional comment from  on 2022-01-06 06:07:20 UTC ---

We need the following commits downstream in ceph-5.1-rhel-patches:

- rgw/rados: index transactions pass remove_objs to cancel() too
- cls/rgw: index cancelation still cleans up remove_objs
- cls/rgw: add complete_remove_obj() helper for remove_objs
- cls/rgw: helpers take const input params
- rgw: fix rgw.none statistics
- cls/rgw: don't add canceled ops to bilog

They were in the RHCS 4.2 hotfix for Wisetech, and are already in ceph-4.3-rhel-patches.

Moving to POST because we have a fix, but it's not downstream for 5.1 yet.

Thomas

--- Additional comment from Jenkins Automation for Ceph (Ken Dreyer) on 2022-01-06 22:38:11 UTC ---

Ken Dreyer <kdreyer> committed to ceph-5.1-rhel-8 in RHEL dist-git:
http://pkgs.devel.redhat.com/cgit/rpms/ceph/commit/?id=18444facfc3297c0e689a0552fd25965d3254227

--- Additional comment from errata-xmlrpc on 2022-01-08 16:53:01 UTC ---

This bug has been added to advisory RHBA-2021:82038 by Thomas Serlin (tserlin)

--- Additional comment from errata-xmlrpc on 2022-01-08 16:53:02 UTC ---

Bug report changed to ON_QA status by Errata System.
A QE request has been submitted for advisory RHBA-2021:82038-01
https://errata.devel.redhat.com/advisory/82038

--- Additional comment from Chaithra on 2022-01-11 07:11:41 UTC ---

Verified with ceph version 16.2.7-18.el8cp.

Bucket inconsistency issue not observed for newly created buckets.

http://magna002.ceph.redhat.com/ceph-qe-logs/BZ_2021926

Moving BZ to Verified

--- Additional comment from errata-xmlrpc on 2022-04-04 08:03:31 UTC ---

Bug report changed to RELEASE_PENDING status by Errata System.
Advisory RHSA-2021:82038-06 has been changed to PUSH_READY status.
https://errata.devel.redhat.com/advisory/82038

--- Additional comment from errata-xmlrpc on 2022-04-04 10:22:28 UTC ---

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 5.1 Security, Enhancement, and Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1174

Comment 6 Scott Ostapovicz 2023-11-09 07:33:13 UTC
Moving this to MODIFIED in accordance with Matt's comment above.  This needs to get tested ASAP (there is a push to get z3 out as smoothly as possible).

Comment 16 errata-xmlrpc 2023-12-12 13:55:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 6.1 security, enhancements, and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:7740

