Bug 1595833 - parallel GC helps but delete-write workload still fills cluster
Summary: parallel GC helps but delete-write workload still fills cluster
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RGW
Version: 3.1
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 3.2
Assignee: Matt Benjamin (redhat)
QA Contact: Vidushi Mishra
URL:
Whiteboard:
Depends On:
Blocks: 1581350 1584264 1638102 1641792
 
Reported: 2018-06-27 15:04 UTC by John Harrigan
Modified: 2018-10-23 10:21 UTC
CC List: 11 users

Fixed In Version:
Doc Type: Known Issue
Doc Text:
.RGW garbage collection fails to keep pace during evenly balanced delete-write workloads
In testing, during an evenly balanced delete-write (50%/50%) workload, Object Gateway garbage collection fails to keep pace. This causes the cluster to fill completely in eleven hours and the status to switch to the HEALTH_ERR state. Aggressive settings for the new parallel/async garbage collection tunables did significantly delay the onset of cluster fill in testing, and can be helpful for many workloads. Typical real-world cluster workloads are not likely to cause a cluster fill due primarily to garbage collection.
Clone Of:
: 1638102
Environment:
Last Closed: 2018-10-10 18:42:31 UTC
Embargoed:
vakulkar: automate_bug?


Attachments
10hr deleteWrite XML COSbench spec file (869 bytes, application/xml)
2018-07-18 20:19 UTC, John Harrigan

Description John Harrigan 2018-06-27 15:04:22 UTC
Description of problem:
Running an evenly balanced delete-write (50%/50%) workload fills the cluster
in 11 hours. RGW garbage collection fails to keep pace. Note that with the
previous version, RHCS 3, the cluster would fill in about 3 hours, so there
is definite improvement with this release.

Version-Release number of selected component (if applicable):
RHCEPH-3.1-RHEL-7-20180530.ci.0

Steps to Reproduce:
1. Fill the cluster to 30%.
2. Start an evenly balanced delete-write workload.
3. Run it for an extended period, monitoring cluster capacity and pending GCs (see the monitoring commands just after this list).
4. The cluster %RAW USED keeps rising and the pending GCs keep increasing.
5. Eventually the cluster fills and reaches the HEALTH_ERR state.
I have automation at https://github.com/jharriga/GCrate to assist with reproduction.
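
For reference, a minimal way to watch the two quantities from step 3 (plain Ceph/RGW tooling, not part of the GCrate automation; it assumes an admin keyring on the monitoring host):

  # cluster capacity -- watch the %RAW USED column
  ceph df
  # rough count of pending garbage-collection entries across all GC shards
  radosgw-admin gc list --include-all | grep -c '"tag"'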

Actual results:
Cluster fills and reaches HEALTH_ERR state

Expected results:
When the workload requires it, garbage collection can be made aggressive enough to keep pace with the workload.

Additional info:
Product documentation (Ceph Object Gateway for Production) should guide users on monitoring and tuning garbage collection.
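As an illustration only, garbage collection is tuned through RGW options in ceph.conf; the values below are aggressive examples rather than recommendations, and the exact set of parallel/async tunables available depends on the RHCS 3.1 build:

  [client.rgw.<instance-name>]        # hypothetical RGW instance section
  rgw_gc_obj_min_wait = 300           # seconds before a deleted tail is GC-eligible (default 7200)
  rgw_gc_processor_period = 600       # how often the GC thread runs (default 3600)
  rgw_gc_max_concurrent_io = 20       # parallel GC: concurrent IOs (default 10)
  rgw_gc_max_trim_chunk = 64          # parallel GC: entries trimmed per operation (default 16)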

Comment 5 Ben England 2018-07-11 16:43:52 UTC
Did we try an experiment with a mixture of reads, writes, and deletes?  Is a pure write-delete workload normal?  I would suggest we try something like half read and half write and see whether GC keeps up in that case.  If it does, then perhaps this is acceptable for now, and we can document tuning for the case where the workload is pure write-delete.

But my original suggestion was to speed up garbage-collection activity as the system fills up.  There is no harm in doing garbage collection aggressively if the system is about to run out of storage anyway.  I think librados lets you ask how full storage is, perhaps with the rados_cluster_stat() function?  Could that be considered for a future release?  See

http://docs.ceph.com/docs/luminous/rados/api/librados/
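
For illustration, a minimal librados C sketch of the kind of fullness check being suggested (not existing RGW code; error handling is trimmed and the 80% threshold is an arbitrary example):

  #include <stdio.h>
  #include <rados/librados.h>

  /* Returns 1 if raw usage exceeds ~80%, 0 if not, -1 on error. */
  int cluster_nearly_full(void)
  {
      rados_t cluster;
      struct rados_cluster_stat_t st;
      int r;

      if (rados_create(&cluster, NULL) < 0)
          return -1;
      rados_conf_read_file(cluster, NULL);      /* default ceph.conf search path */
      if (rados_connect(cluster) < 0) {
          rados_shutdown(cluster);
          return -1;
      }
      r = rados_cluster_stat(cluster, &st);     /* fills kb, kb_used, kb_avail */
      rados_shutdown(cluster);
      if (r < 0)
          return -1;

      printf("raw used: %.1f%%\n", 100.0 * st.kb_used / st.kb);
      return st.kb_used * 5 > st.kb * 4;        /* arbitrary 80% example threshold */
  }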

Comment 6 John Harrigan 2018-07-11 17:08:52 UTC
I have also done a number of runs using a workload I refer to as "hybrid". It has
this operation mix: 60% read, 16% write, 14% delete, and 10% list. I have been
able to run this for extended periods (24 hours) and the RGW garbage collection
in RHCS 3.1 does keep pace.
Unfortunately, while running the hybrid workload for extended periods I have observed
a significant drop-off in client performance once GC activity kicks in.
See https://bugzilla.redhat.com/show_bug.cgi?id=1596401

Comment 7 John Harrigan 2018-07-18 20:19:29 UTC
Created attachment 1459798 [details]
10hr deleteWrite XML COSbench spec file

Comment 8 John Harrigan 2018-07-18 20:24:01 UTC
I added the deleteWrite COSbench XML file.
The runtime=36000 means that the workload will run for ten hours. Obviously
that can be changed. Be aware that on the Scale Lab configuration I have, with
312 OSDs and 486 TB of storage, the cluster gets full in 11 hours.
Ceph doesn't look kindly on cluster-full situations, and in my experience
that condition necessitates a cluster purge and redeploy.
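
For readers without access to the attachment, the shape of such a COSbench work element is roughly the following; the worker count and the container/object/size expressions here are placeholders, not the values from the attached spec:

  <work name="deleteWrite" workers="64" runtime="36000">
    <operation type="write" ratio="50" config="containers=u(1,10);objects=u(1,1000);sizes=c(64)KB"/>
    <operation type="delete" ratio="50" config="containers=u(1,10);objects=u(1,1000)"/>
  </work>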

Comment 9 Ken Dreyer (Red Hat) 2018-07-24 20:41:04 UTC
Discussed with Matt and Scott, re-targeting to 3.2

Comment 17 John Brier 2018-10-02 20:18:57 UTC
Release Notes Doc Text: changed "may" to "can" per the IBM Style Guide.
