Bug 2181424

Summary: [RGW] bucket stats output has incorrect num_objects in rgw.none and rgw.main on multipart upload
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Tejas <tchandra>
Component: RGW
Assignee: Matt Benjamin (redhat) <mbenjamin>
Status: CLOSED ERRATA
QA Contact: Chaithra <ckulal>
Severity: urgent
Docs Contact: Disha Walvekar <dwalveka>
Priority: unspecified
Version: 6.0
CC: cbodley, ceph-eng-bugs, cephqe-warriors, ckulal, dwalveka, gsitlani, ivancich, kbader, mbenjamin, mkasturi, pdhange, sostapov, tserlin, vereddy, vumrao
Target Milestone: ---
Keywords: Regression
Target Release: 6.1z3
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ceph-17.2.6-155.el9cp
Doc Type: Bug Fix
Doc Text:
Cause: A race condition in S3 CompleteMultipartUpload, where multiple completions of the same upload finish around the same time, caused cleanup of the uploaded part objects to be skipped. Consequence: Bucket stats showed an unexpected value for num_objects. Fix: The race condition is resolved. Result: The stats inconsistency no longer occurs.
Story Points: ---
Clone Of: 2021926
Environment:
Last Closed: 2023-12-12 13:55:28 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2021926
Bug Blocks: 227624, 2022585

Description Tejas 2023-03-24 04:52:13 UTC
+++ This bug was initially created as a clone of Bug #2021926 +++

Description of problem:
bucket stats displays incorrect num_objects in rgw.main and rgw.none when uploading multipart objects with whitespace in the object name [encyclopedia/space & universe/.bkp/journal]

Version-Release number of selected component (if applicable):
ceph version 14.2.22-16.el8cp

How reproducible:
2/2

Steps to Reproduce:
1. deploy ceph cluster on 4.3 with rgw daemons [4 rgw instances]

2. using s3cmd create bucket and objects
 s3cmd mb s3://kvm-mp-nbkt1; fallocate -l 25m obj25m; for i in {1..200}; do s3cmd put obj25m "s3://kvm-mp-nbkt1/encyclopedia/space & universe/.bkp/journal$i"; done

3. observe the following in the bucket stats output:

a) rgw.none with large num_objects appears in usage
snippet:
    "usage": {
        "rgw.none": {
            "size": 0,
            "size_actual": 0,
            "size_utilized": 0,
            "size_kb": 0,
            "size_kb_actual": 0,
            "size_kb_utilized": 0,
            "num_objects": 18446744073709551612
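The huge num_objects value is characteristic of a 64-bit unsigned counter decremented below zero: 18446744073709551612 is 2^64 - 4, i.e. the unsigned representation of -4. A quick, RGW-independent illustration on a 64-bit system:

```shell
# printf's %u reinterprets a negative value as 64-bit unsigned,
# reproducing the number seen in the rgw.none stats:
printf '%u\n' -4
# -> 18446744073709551612
```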

b) rgw.main shows incorrect num_objects
snippet:
        "rgw.main": {
            "size": 3853516800,
            "size_actual": 3853516800,
            "size_utilized": 3853516800,
            "size_kb": 3763200,
            "size_kb_actual": 3763200,
            "size_kb_utilized": 3763200,
            "num_objects": 293
        },


4. s3cmd lists only one object
snippet:
[root@extensa001 ubuntu]# s3cmd ls s3://kvm-mp-nbkt1 --recursive
2021-11-10 12:15  26214400   s3://kvm-mp-nbkt1/encyclopedia/space
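Note that the object key in step 2 contains both whitespace and an `&`; if the s3cmd URL is not quoted, the shell treats the `&` as a control operator and splits the command, which is consistent with the listing above showing only the key `encyclopedia/space`. A minimal sketch (no cluster needed, bucket name taken from step 2):

```shell
key='encyclopedia/space & universe/.bkp/journal1'

# Unquoted, the shell would background everything before the '&' and try to
# run "universe/.bkp/journal1" as a separate command. Double quotes keep
# the full key intact:
printf '%s\n' "s3://kvm-mp-nbkt1/$key"
# -> s3://kvm-mp-nbkt1/encyclopedia/space & universe/.bkp/journal1
```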


Actual results:
usage lists rgw.none with an underflowed num_objects, and rgw.main lists an incorrect num_objects

Expected results:
bucket stats reflects the correct num_objects
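The counts can be re-checked straight from bucket stats; a sketch, assuming the bucket from step 2 and a host with radosgw-admin and python3 (after the fix, rgw.none should be absent or zero and rgw.main should report the expected count):

```shell
# Print num_objects per usage category from the bucket stats JSON:
radosgw-admin bucket stats --bucket=kvm-mp-nbkt1 \
  | python3 -c 'import json, sys
u = json.load(sys.stdin).get("usage", {})
print({k: v["num_objects"] for k, v in u.items()})'
```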

Additional info:

setup details:
rgw nodes: 10.8.130.217;10.8.130.218
login credentials : root/r

--- Additional comment from Madhavi Kasturi on 2021-11-10 12:47:10 UTC ---

PFA bucket stats, radoslist, s3cmd console output and rgw logs at http://magna002.ceph.redhat.com/ceph-qe-logs/madhavi/bz2021926/

--- Additional comment from Matt Benjamin (redhat) on 2021-11-10 18:32:10 UTC ---

Hi Casey,

This is the multipart upload race you debugged, correct?

Matt

--- Additional comment from Casey Bodley on 2021-11-11 14:02:58 UTC ---

(In reply to Matt Benjamin (redhat) from comment #2)
> Hi Casey,
> 
> This is the multipart upload race you debugged, correct?
> 
> Matt

right. we need the fix on 5.x also

--- Additional comment from Vikhyat Umrao on 2021-11-17 19:59:02 UTC ---

(In reply to Casey Bodley from comment #3)
> (In reply to Matt Benjamin (redhat) from comment #2)
> > Hi Casey,
> > 
> > This is the multipart upload race you debugged, correct?
> > 
> > Matt
> 
> right. we need the fix on 5.x also

Moving this to 5.1.

--- Additional comment from Vikhyat Umrao on 2021-11-17 20:00:08 UTC ---

For RHCS 4.3 we already have customer bug - https://bugzilla.redhat.com/show_bug.cgi?id=2022585

--- Additional comment from Red Hat Bugzilla on 2021-12-09 06:35:50 UTC ---

remove performed by PnT Account Manager <pnt-expunge>

--- Additional comment from Red Hat Bugzilla on 2021-12-09 06:36:37 UTC ---

remove performed by PnT Account Manager <pnt-expunge>

--- Additional comment from Thomas Serlin on 2022-01-06 06:07:20 UTC ---

We need the following commits downstream in ceph-5.1-rhel-patches:

- rgw/rados: index transactions pass remove_objs to cancel() too
- cls/rgw: index cancelation still cleans up remove_objs
- cls/rgw: add complete_remove_obj() helper for remove_objs
- cls/rgw: helpers take const input params
- rgw: fix rgw.none statistics
- cls/rgw: don't add canceled ops to bilog

They were in the RHCS 4.2 hotfix for Wisetech, and are already in ceph-4.3-rhel-patches.

Moving to POST because we have a fix, but it's not downstream for 5.1 yet.

Thomas

--- Additional comment from Jenkins Automation for Ceph (Ken Dreyer) on 2022-01-06 22:38:11 UTC ---

Ken Dreyer <kdreyer> committed to ceph-5.1-rhel-8 in RHEL dist-git:
http://pkgs.devel.redhat.com/cgit/rpms/ceph/commit/?id=18444facfc3297c0e689a0552fd25965d3254227

--- Additional comment from errata-xmlrpc on 2022-01-08 16:53:01 UTC ---

This bug has been added to advisory RHBA-2021:82038 by Thomas Serlin (tserlin)

--- Additional comment from errata-xmlrpc on 2022-01-08 16:53:02 UTC ---

Bug report changed to ON_QA status by Errata System.
A QE request has been submitted for advisory RHBA-2021:82038-01
https://errata.devel.redhat.com/advisory/82038

--- Additional comment from Chaithra on 2022-01-11 07:11:41 UTC ---

Verified with ceph version 16.2.7-18.el8cp.

Bucket inconsistency issue not observed for newly created buckets.

http://magna002.ceph.redhat.com/ceph-qe-logs/BZ_2021926

Moving BZ to Verified

--- Additional comment from errata-xmlrpc on 2022-04-04 08:03:31 UTC ---

Bug report changed to RELEASE_PENDING status by Errata System.
Advisory RHSA-2021:82038-06 has been changed to PUSH_READY status.
https://errata.devel.redhat.com/advisory/82038

--- Additional comment from errata-xmlrpc on 2022-04-04 10:22:28 UTC ---

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 5.1 Security, Enhancement, and Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1174

Comment 6 Scott Ostapovicz 2023-11-09 07:33:13 UTC
Moving this to MODIFIED in accordance with Matt's comment above.  This needs to get tested ASAP (there is a push to get z3 out as smoothly as possible).

Comment 16 errata-xmlrpc 2023-12-12 13:55:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 6.1 security, enhancements, and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:7740