Bug 1884023

Summary: list pending GCs is very slow
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: John Harrigan <jharriga>
Component: RGW
Assignee: Pritha Srivastava <prsrivas>
Status: CLOSED ERRATA
QA Contact: Rachana Patel <racpatel>
Severity: high
Docs Contact: Ranjini M N <rmandyam>
Priority: unspecified
Version: 4.1
CC: cbodley, ceph-eng-bugs, ceph-qe-bugs, kbader, mbenjamin, prsrivas, racpatel, rmandyam, sweil, tchandra, tserlin, twilkins, ukurundw, vimishra, vumrao
Target Milestone: ---
Keywords: Performance
Target Release: 4.2
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ceph-14.2.11-57.el8cp, ceph-14.2.11-57.el7cp
Doc Type: Bug Fix
Doc Text:
.Listing of entries in the last GC object does not enter a loop
Previously, the listing of entries in the last GC object entered a loop because the marker was reset on every pass over the last GC object. With this release, the truncated flag is updated so that the marker is not reset, and the listing completes as expected.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-01-12 14:57:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1890121    
Attachments:
text log of pending GC response times (flags: none)

Description John Harrigan 2020-09-30 19:33:42 UTC
Description of problem:
`radosgw-admin gc list --include-all` can take over an hour to return on RHCS 4.1.
On RHCS 4.0 the same command returns in about 90 seconds under the same workload.

Version-Release number of selected component (if applicable):
RHCS 4.1

How reproducible:
Always

Steps to Reproduce:
1. Two identical clusters, each with 8 OSD/RGW nodes (192 OSDs total):
   Site 1 = RHCS 4.0
   Site 2 = RHCS 4.1 (14.2.8-91.el7cp)
   Both clusters pre-filled to 25% RAW USED.
   Mean object size 62 MB: h(1|1|50,64|64|15,8192|8192|15,65536|65536|15,1048576|1048576|5)KB
   rgw_gc_obj_min_wait manually set to 30 minutes (default is 2 hours).
   Workload: delWrite (50% delete / 50% write), 48-hour runtime.

2. Same polling script on both clusters, executes every three minutes:
   `radosgw-admin gc list --include-all` 

3. On RHCS 4.1 the command periodically takes 50 minutes or more to complete.
   On RHCS 4.0 the command consistently returns in about 90 seconds.
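The polling script from step 2 is not attached; a minimal Python sketch of such a poller follows. The `poll_gc_once` helper, the `"tag"`-counting heuristic for sizing the pending list, and the 180-second interval are assumptions for illustration, not the original script.

```python
import subprocess
import time

# Hypothetical stand-in for the polling script described in step 2.
# Pending entries are counted by the number of "tag" fields in the
# command's JSON output; that heuristic is an assumption.
GC_LIST_CMD = ("radosgw-admin", "gc", "list", "--include-all")

def poll_gc_once(cmd=GC_LIST_CMD):
    """Run the GC listing once; return (pending_entries, elapsed_seconds)."""
    start = time.monotonic()
    out = subprocess.run(cmd, capture_output=True, text=True,
                         check=True).stdout
    elapsed = time.monotonic() - start
    return out.count('"tag"'), elapsed

def poll_loop(interval_s=180, cmd=GC_LIST_CMD):
    """Poll every interval_s seconds, logging count and response time."""
    while True:
        pending, elapsed = poll_gc_once(cmd)
        stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
        print(f"{stamp} pending={pending} elapsed={elapsed:.1f}s")
        time.sleep(interval_s)
```

Logging the elapsed time alongside the pending count is what makes the 90-second vs. 50-minute gap between the two sites visible in the raw results.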

Actual results:
During the 48-hour workload:
Site 1 (v4.0) collected 1526 samples versus 109 on Site 2 (v4.1).
Site 2 (v4.1) shows occasional huge spikes in pending GC count and slow responses:
one hour into the workload, a 17-minute delay; two hours in (timestamp 17:27:44),
a 50-minute delay; again at 18:27:41. This recurs every hour, all on Site 2 (v4.1).
v4.0 shows a much steadier progression.

Expected results:
Consistent command completion time, producing the same number of samples over the 48-hour runtime.
A steady increase and decrease in pending GCs, rather than huge spikes that
coincide with very long command completion times.

Additional info:
Raw results gsheet  https://docs.google.com/spreadsheets/d/1spUzXxiQu3RCioo7FM9vyrOt-g58kzJ9By6s26gys84/edit#gid=126852855

Pending GCs comparison (see attachment: w7pendingGCs.txt)
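The eventual fix (see the Doc Text above) attributes the slowdown to the listing marker being reset on every pass over the last GC object. A minimal Python model of that pagination bug follows; the chunking scheme, chunk size, and names are illustrative, not Ceph's actual cls_rgw code.

```python
# Toy model of paginated GC listing: each call returns up to MAX_ENTRIES
# entries starting at `marker`, plus a truncated flag. Purely illustrative;
# not the real Ceph implementation.
MAX_ENTRIES = 2

def list_chunk(entries, marker):
    chunk = entries[marker:marker + MAX_ENTRIES]
    next_marker = marker + len(chunk)
    return chunk, next_marker, next_marker < len(entries)

def list_all_fixed(entries, max_iters=100):
    """Fixed behavior: the marker advances, so listing terminates."""
    out, marker, truncated, iters = [], 0, True, 0
    while truncated and iters < max_iters:
        chunk, marker, truncated = list_chunk(entries, marker)
        out.extend(chunk)
        iters += 1
    return out, iters

def list_all_buggy(entries, max_iters=100):
    """Buggy behavior: the marker is reset on each pass over the last GC
    object, so the same chunk is re-read until the caller gives up."""
    out, marker, truncated, iters = [], 0, True, 0
    while truncated and iters < max_iters:
        chunk, _next, truncated = list_chunk(entries, marker)
        out.extend(chunk)
        marker = 0  # the bug: marker reset instead of advanced
        iters += 1
    return out, iters
```

With five entries and a chunk size of two, the fixed loop returns all five entries in three passes, while the buggy loop re-reads the first chunk until it hits the iteration cap, mirroring the gc list calls that took 50 minutes instead of 90 seconds.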

Comment 1 RHEL Program Management 2020-09-30 19:33:48 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 38 Tim Wilkinson 2020-12-14 15:56:46 UTC
*** Bug 1898647 has been marked as a duplicate of this bug. ***

Comment 41 John Harrigan 2020-12-14 20:21:53 UTC
Created attachment 1739114 [details]
text log of pending GC response times

Comment 46 errata-xmlrpc 2021-01-12 14:57:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage 4.2 Security and Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0081