1915078 – rgw: omnibus 3.3 bucket listing correctness and perf issues

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Ceph Storage
Classification:	Red Hat Storage
Component:	RGW
Sub Component:
Version:	3.3
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	3.3z7
Assignee:	J. Eric Ivancich
QA Contact:	Tejas
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-01-11 21:53 UTC by Matt Benjamin (redhat)
Modified:	2023-09-15 00:58 UTC
CC List:	10 users
Fixed In Version:	RHEL: ceph-12.2.12-132.el7cp Ubuntu: ceph_12.2.12-113redhat1xenial
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-05-06 18:32:06 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2021:1518	0	None	None	None	2021-05-06 18:32:15 UTC

Description Matt Benjamin (redhat) 2021-01-11 21:53:59 UTC

Description of problem:

The RGW team believes there are at least 10 commits in rhcs-4.x that address serious bucket listing correctness and performance issues and that are eligible for backport to rhcs-3.x.

We believe that backports of these commits should be included in the upcoming rhcs-3.3z7 release, to reduce the cost of support.

List of commits:

commit 67c36d4a7ff2995a3ad235ea1774274f6397ebf7
Author: J. Eric Ivancich <ivancich>
Date:   Fri Dec 4 18:16:08 2020 -0500

    rgw: in ordered bucket listing skip namespaced entries when possible

    When listing non-namespaced entries in the bucket index, the code
    would march through the namespaced entries in blocks, requesting all
    of them from the CLS layer. When there were many namespaced entries,
    it would significantly affect the performance of ordered listing.

    This commit adds code to advance the marker passed to lower layers to
    skip past namespaced entries. This is challenging in that
    non-namespaced entries can appear in the middle of the namespaced
    entries. We'll ignore the issue of instance tags in names to simplify the
    following discussion. Non-namespaced entries are indexed by
    "name". Namespaced entries are indexed by _namespace_name, using
    underscores to surround the namespace. The challenge comes with
    entries such as "_name", where the name begins with an underscore. In
    that case we index them by "__name", quoting the underscore with
    another.

    Now the extra challenge comes due to the lexical ordering of the
    following:

        ASP
        _BAT_cat
        __DOG
        _eel_FOX
        goat

    Note that the namespaced entries are in positions 2 and 4, and the
    non-namespaced entries are in positions 1, 3, and 5. So when skipping
    past the namespaced entries, we have to be careful not to skip past
    the non-namespaced entries that begin with underscore.

    Additional code clean-ups done as well.

    Signed-off-by: J. Eric Ivancich <ivancich>

    Resolves: rhbz#1883283
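
The ordering problem described in this commit message can be reproduced with a short Python sketch. This is an illustration only, not the RGW code: `index_key`, `is_namespaced`, `skip_target` and `list_plain` are hypothetical helpers, instance tags are ignored as in the commit message, and namespaces are assumed not to begin with an underscore. The skip targets (`"__"` and the character after `"_"`) are one way to jump over runs of namespaced keys without passing the quoted-underscore entries, assuming plain code-point ordering.

```python
import bisect

def index_key(name, namespace=""):
    """Build the index key as described above (hypothetical helper)."""
    if namespace:
        return "_" + namespace + "_" + name
    if name.startswith("_"):
        return "_" + name  # quote the leading underscore with another
    return name

def is_namespaced(key):
    """A single leading underscore marks a namespaced key; "__" is a quoted name."""
    return key.startswith("_") and not key.startswith("__")

def skip_target(key):
    """Smallest key that could follow the run of namespaced keys containing `key`."""
    if key < "__":
        return "__"                # quoted-underscore names sort next
    return chr(ord("_") + 1)       # first code point after every "_..." key

def list_plain(sorted_keys):
    """Return non-namespaced index keys, seeking past namespaced runs."""
    out, i = [], 0
    while i < len(sorted_keys):
        key = sorted_keys[i]
        if is_namespaced(key):
            # Seek instead of visiting every namespaced entry.
            i = bisect.bisect_left(sorted_keys, skip_target(key), i)
            continue
        out.append(key)
        i += 1
    return out

keys = sorted([
    index_key("ASP"),
    index_key("cat", "BAT"),
    index_key("_DOG"),
    index_key("FOX", "eel"),
    index_key("goat"),
])
plain = list_plain(keys)
```

Here `keys` comes out in the same order as the five-entry example in the commit message, and `plain` holds the index keys `ASP`, `__DOG` and `goat`, so the entry whose name begins with an underscore is not skipped.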

commit e70d08c483c15067c5cf2d7c7f8d4fe1b6192bf3
Author: J. Eric Ivancich <ivancich>
Date:   Sat Nov 21 11:10:35 2020 -0500

    rgw: during GC defer, prevent new GC enqueue

    With the new queue-based GC code, when a GC defer operation is
    performed, it adds an "urgent" record to prevent GC from occurring,
    whether there's a GC entry or not (it's not checked).

    But either way the code *also* adds a new GC entry to the queue to
    cause GC to occur at a later time. This would be incorrect if there is
    no GC entry to begin with. This will cause GC to delete tail objects
    when there has been no user-initiated delete. In other words a READ
    operation can result in a permanent DELETE of portions of large
    objects.

    This is a temporary fix for this bug. It marks the code in error and
    prevents GC defer operations from taking place at all as a temporary
    measure.

    Signed-off-by: J. Eric Ivancich <ivancich>

    Resolves: rhbz#1892644

commit 340360faf951c92cc3a3ca5ca57842ac6b4e72ec
Author: J. Eric Ivancich <ivancich>
Date:   Wed Oct 21 10:32:30 2020 -0400

    tools/rados: flush formatter periodically during json output of `rados ls`

    While `rados ls` is emitting object info through a json formatter,
    flush the formatter once at least 4096 bytes are buffered for
    output.

    Signed-off-by: J. Eric Ivancich <ivancich>
    (cherry picked from commit 1548ef7a97559f17023f17842dab51d47cef89df)

    Resolves: rhbz#1883590
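
As a rough illustration of the periodic flush described above, the following Python sketch buffers JSON fragments and writes them out once at least 4096 bytes are pending. It is not the `rados` tool's formatter: `emit_listing` and its JSON layout (an array of `{"name": ...}` objects) are assumptions made for the example.

```python
import io
import json

def emit_listing(names, out, threshold=4096):
    """Write names as a JSON array to `out`, flushing whenever at least
    `threshold` bytes are buffered. Returns the number of flushes."""
    pending = ["["]
    pending_bytes = 1
    flushes = 0

    def flush():
        nonlocal pending, pending_bytes, flushes
        out.write("".join(pending))
        out.flush()
        pending = []
        pending_bytes = 0
        flushes += 1

    for i, name in enumerate(names):
        piece = ("," if i else "") + json.dumps({"name": name})
        pending.append(piece)
        pending_bytes += len(piece.encode("utf-8"))
        if pending_bytes >= threshold:
            flush()  # keep memory bounded and let output appear early

    pending.append("]")
    flush()  # final flush for whatever remains
    return flushes

# Usage: 1000 short object names need several flushes rather than one.
sink = io.StringIO()
flush_count = emit_listing([f"obj{i}" for i in range(1000)], sink)
```

Flushing on a byte threshold rather than per entry keeps the number of writes small while still bounding how much output is held in memory.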

commit 7011bf3f64b183243b788323eb76cdd114a3ed07
Author: J. Eric Ivancich <ivancich>
Date:   Fri Oct 9 16:06:55 2020 -0400

    rgw: rgw-orphan-list should use "plain" formatted `rados ls` output

    The previous version that used "json-pretty" output for `rados ls`
    added complications due to json's escaping of special characters. So
    this version returns to the "plain" output for `rados ls` but deals
    with entries (oids) that might have namespaces and/or locators as
    well.

    Signed-off-by: J. Eric Ivancich <ivancich>
    (cherry picked from commit 5b994f90594208dab81045732099a03686819b30)

    Resolves: rhbz#1883590


commit 501239bb32c4ccf867610625c5fa049b98a43a61
Author: J. Eric Ivancich <ivancich>
Date:   Tue Jun 9 23:12:22 2020 -0400

    rgw: use exponential back-off for retries after bucketinfo update race

    Use a simple exponential back-off mechanism during write races of
    bucketinfo updates to note bucket index reshard status.

    Signed-off-by: J. Eric Ivancich <ivancich>

    Resolves: rhbz#1846504
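
A simple exponential back-off of the kind this commit message mentions might look like the Python sketch below. The names `retry_with_backoff` and `try_update`, the attempt limit, and the starting delay are all assumptions for illustration; the commit message does not state the actual values used in RGW.

```python
import time

def retry_with_backoff(try_update, max_attempts=5, base_delay=0.01, sleep=time.sleep):
    """Call `try_update()` until it returns True, doubling the delay after
    each lost race. Returns the number of attempts used."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        if try_update():
            return attempt
        if attempt < max_attempts:
            sleep(delay)  # wait before retrying the racing write
            delay *= 2    # exponential growth of the wait
    raise RuntimeError("update still racing after %d attempts" % max_attempts)

# Usage: an update that loses the race twice and then succeeds.
outcomes = iter([False, False, True])
waits = []
attempts = retry_with_backoff(lambda: next(outcomes), base_delay=1, sleep=waits.append)
```

Passing `sleep` as a parameter is only a convenience so the delays can be observed without actually waiting.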

commit 7b46439c2a0a5f67c11c7edbbd0252f944f2c045
Author: J. Eric Ivancich <ivancich>
Date:   Tue Sep 15 14:20:04 2020 -0400

    rgw: advance pseudo-folders properly in delimited ordered listing

    The code mistakenly uses the current marker to figure out how to skip
    past a pseudo-directory. This could allow for some entries in a bucket
    to be skipped. The code should have used the current pseudo-directory
    to determine what to skip past.

    Signed-off-by: J. Eric Ivancich <ivancich>
    (cherry picked from commit c3f346e3157ef56254b199ea46e26016166451e4)

    Resolves: rhbz#1874645
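
The idea of skipping past a pseudo-directory based on the pseudo-directory itself, rather than on the current marker, can be sketched in Python as follows. This is not the RGW listing code: `list_delimited` is a hypothetical function over an in-memory sorted key list, and it assumes plain code-point ordering and a delimiter whose last character is not the highest code point.

```python
import bisect

def list_delimited(sorted_keys, prefix="", delimiter="/"):
    """Return (entries, pseudo_dirs) for keys under `prefix`."""
    entries, pseudo_dirs = [], []
    i = bisect.bisect_left(sorted_keys, prefix)
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        key = sorted_keys[i]
        pos = key.find(delimiter, len(prefix))
        if pos < 0:
            entries.append(key)
            i += 1
            continue
        pseudo_dir = key[:pos + len(delimiter)]
        pseudo_dirs.append(pseudo_dir)
        # Derive the skip target from the pseudo-directory: bumping its last
        # character gives the first possible key that is not underneath it.
        skip_to = pseudo_dir[:-1] + chr(ord(pseudo_dir[-1]) + 1)
        i = bisect.bisect_left(sorted_keys, skip_to, i)
    return entries, pseudo_dirs

# Usage: "a0" sorts right after everything under "a/" and must not be skipped.
bucket = sorted(["a/1", "a/2", "a0", "b", "c/x"])
entries, pseudo_dirs = list_delimited(bucket)
```

In this example `entries` holds `a0` and `b`, and `pseudo_dirs` holds `a/` and `c/`.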

commit 6d1105f33479a6f7a2ff110a8540e75c027d2bdf
Author: J. Eric Ivancich <ivancich>
Date:   Mon Sep 14 19:33:51 2020 -0400

    Revert "rgw: fix list bucket with delimiter wrongly skip some special keys"

    This reverts commit 04b15cef88c5d50ce18911f63c63fa094101ced0.

    While this did fix https://tracker.ceph.com/issues/40905, it did so in
    an unnecessarily complex manner. So we're reverting it to more easily
    apply a cleaner solution.

    Signed-off-by: J. Eric Ivancich <ivancich>
    (cherry picked from commit 130a74a60802d8b0db15dc0d5c9fb6164d78d72d)

    Resolves: rhbz#1874645

commit ae82ee1df4e86c0c56570230aaf58cb1f0a5a33a
Author: J. Eric Ivancich <ivancich>
Date:   Tue Oct 6 15:21:02 2020 -0400

    rgw: allow rgw-orphan-list to note when rados objects are in namespace

    Currently namespaces and locators are ignored when `rados ls` is run
    by rgw-orphan-list to record RADOS's known objects.

    However there have been cases where RADOS objects have a locator, and
    when one is included in the listing, the script does not handle it
    correctly. Now when objects have locators, we will prevent their
    output from entering the .intermediate file.

    Additionally we do not expect RGW data objects to be in RADOS
    namespaces, so when a namespaced object is detected, we'll error out
    with a message.

    Signed-off-by: J. Eric Ivancich <ivancich>
    (cherry picked from commit ddf52016fa03ba192f242ad641a5c8e5a95035a1)

    Resolves: rhbz#1883590

commit eb8e59276bec45831fbe123f5c51a9e54270c745
Author: J. Eric Ivancich <ivancich>
Date:   Tue Oct 6 12:42:22 2020 -0400

    rgw: fix setting of namespace in ordered and unordered bucket listing

    The namespace is not always set correctly during bucket listing. This
    can, for example, cause the listing of incomplete multipart uploads,
    which are in the _multipart_ namespace, to not paginate correctly, and
    cause entries to be re-listed.

    Signed-off-by: J. Eric Ivancich <ivancich>
    (cherry picked from commit bd6f163f366753e8ec42b85a53334f4bf78916bd)

    Resolves: rhbz#1883283

commit 1e15dde27c8078f93a98003a3c592255a902af92
Author: J. Eric Ivancich <ivancich>
Date:   Thu Oct 1 13:33:01 2020 -0400

    rgw: radosgw-admin should paginate internally when listing bucket

    Currently `radosgw-admin bucket list ...`, when listing a bucket, asks
    for the value of "--max-entries" internally. To list a large bucket
    entirely the user would have to set "--max-entries" to a large value
    (e.g., 10000000). Internally this doesn't paginate, so it will try to
    produce the entire list at once. This can consume a lot of memory, and
    there are known cases where this induces an out-of-memory crash.

    So now we'll set a maximum pagination size of 10,000. So even with
    large values of "--max-entries" it will still be able to produce the
    full listing without stressing memory, because it will ask for at most
    10,000 entries at a time.

    Signed-off-by: J. Eric Ivancich <ivancich>
    (cherry picked from commit 6d033061bf9eaebf3dab37b9ed45de22ce6fa6b7)

    Resolves: rhbz#1883283
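
The internal pagination described in this commit message can be sketched in Python as a generator that never requests more than a fixed page size at once. The 10,000 limit comes from the commit message; everything else is an assumption for illustration, including the `fetch_page(marker, count)` callback, which is taken to return the entries strictly after `marker` plus a truncation flag.

```python
import bisect

def list_bucket(fetch_page, max_entries, page_limit=10_000):
    """Yield up to `max_entries` entries, asking for at most `page_limit`
    entries per request."""
    marker = ""
    remaining = max_entries
    while remaining > 0:
        page, truncated = fetch_page(marker, min(page_limit, remaining))
        for entry in page:
            yield entry
        remaining -= len(page)
        if not truncated or not page:
            break
        marker = page[-1]  # resume after the last entry returned

def make_fake_bucket(keys, requested_sizes):
    """In-memory stand-in for the lower listing layer (hypothetical)."""
    def fetch_page(marker, count):
        requested_sizes.append(count)
        start = bisect.bisect_right(keys, marker)
        return keys[start:start + count], start + count < len(keys)
    return fetch_page

# Usage: a huge --max-entries value still results in small requests.
all_keys = [f"obj{i:03d}" for i in range(25)]
sizes = []
listed = list(list_bucket(make_fake_bucket(all_keys, sizes), 10_000_000, page_limit=10))
```

Because the entries are yielded page by page, the caller can print them as they arrive instead of holding the full listing in memory.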

Comment 2 J. Eric Ivancich 2021-01-13 13:36:51 UTC

6 bucket listing related commits added to ceph-3.3-rhel-patches.

In addition to the above listed, I added one more (here's the commit):

Author: J. Eric Ivancich <ivancich>
Date:   Fri Jul 19 16:10:59 2019 -0400

    rgw: mitigate bucket list with max-entries excessively high
    
    When listing a bucket with radosgw-admin, the user can specify the
    maximum number of entries. That number can be unreasonably large, and
    can affect the performance and memory availability. For example:
    
        radosgw-admin bucket list --bucket mybucket1 --max-entries=10000000
    
    This has the potential for creating large data structures at multiple
    levels in the call stack of the radosgw(-admin) process,
    potentially causing the process to run out of memory. This change
    limits the maximum number of entries requested in all but the high
    level code to help mitigate this issue.
    
    Signed-off-by: J. Eric Ivancich <ivancich>
    (cherry picked from commit 300429c9e98a27e17c2a20ade82c6c63ac276c20)
    
    Conflicts: variable converted from static constexpr to static const in
               light of varying compiler versions.
    
    Resolves: rhbz#1915078

Comment 3 J. Eric Ivancich 2021-01-13 15:44:18 UTC

Note: one commit listed above is not included:

    rgw: during GC defer, prevent new GC enqueue

The bug that it was intended to address did not appear in 3.3z6, so it's unnecessary.

Comment 12 errata-xmlrpc 2021-05-06 18:32:06 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage 3.3 Security and Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:1518

Comment 13 Red Hat Bugzilla 2023-09-15 00:58:10 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.
