Description of problem:
Running an evenly balanced delete-write (50%/50%) workload fills the cluster in 11 hours; RGW garbage collection fails to keep pace. Note that with the previous version, RHCS 3, the cluster would fill in about 3 hours, so there is definite improvement with this release.

Version-Release number of selected component (if applicable):
RHCEPH-3.1-RHEL-7-20180530.ci.0

Steps to Reproduce:
1. Fill the cluster to 30%
2. Start an evenly balanced delete-write workload
3. Run it for an extended period, monitoring cluster capacity and pending GCs
4. The cluster %RAW USED keeps rising and the pending GCs keep increasing
5. Eventually the cluster fills and reaches HEALTH_ERR state

I have automation at https://github.com/jharriga/GCrate to assist.

Actual results:
Cluster fills and reaches HEALTH_ERR state.

Expected results:
When the workload requires it, garbage collection can be made aggressive enough to keep pace with the workload.

Additional info:
Product documentation (Ceph Object Gateway for Production) should guide users on monitoring and tuning garbage collection.
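For anyone reproducing this, the monitoring in the steps above can be done with the standard CLI (the grep pattern assumes the JSON output of gc list, where each pending entry carries a "tag" field):

  # watch %RAW USED
  ceph df

  # count pending GC entries, including those not yet due for processing
  radosgw-admin gc list --include-all | grep -c '"tag"'

The RGW GC knobs the documentation should cover are rgw_gc_max_objs, rgw_gc_obj_min_wait, rgw_gc_processor_max_time and rgw_gc_processor_period. A sketch of a more aggressive ceph.conf (the section name is a placeholder and the values are illustrative, not tested recommendations):

  [client.rgw.<host>]
  rgw_gc_max_objs = 97            # more GC shards, more parallelism
  rgw_gc_obj_min_wait = 300       # seconds before deleted data is eligible for GC
  rgw_gc_processor_period = 600   # seconds between GC processing cycles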
Did we try an experiment with a mixture of reads, writes, and deletes? Is a pure write-delete workload typical? I would suggest trying something like half-read and half-write and seeing whether GC keeps up in that case. If it does, then perhaps this is acceptable for now, and we can document tuning for the case where the workload is pure write-delete.

But my original suggestion was to speed up garbage-collection activity as the system fills up. There is no harm in doing garbage collection aggressively if the system is about to run out of storage anyway. I think librados lets you ask how full storage is, perhaps with the rados_cluster_stat() function? Could that be considered for a future release? See http://docs.ceph.com/docs/luminous/rados/api/librados/
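To make that suggestion concrete, here is a minimal sketch against the librados C API from that link (error handling trimmed; the 90% threshold, and the idea of using it to drive GC scheduling, are assumptions for illustration, not existing RGW behavior):

  #include <stdio.h>
  #include <rados/librados.h>

  int main(void)
  {
      rados_t cluster;
      struct rados_cluster_stat_t st;

      /* connect using the default ceph.conf search path and admin credentials */
      if (rados_create(&cluster, NULL) < 0)
          return 1;
      rados_conf_read_file(cluster, NULL);
      if (rados_connect(cluster) < 0)
          return 1;

      if (rados_cluster_stat(cluster, &st) == 0) {
          double pct_used = 100.0 * (double)st.kb_used / (double)st.kb;
          printf("raw used: %.1f%%\n", pct_used);
          if (pct_used > 90.0)    /* illustrative threshold */
              printf("nearly full: GC could be made more aggressive here\n");
      }

      rados_shutdown(cluster);
      return 0;
  }

Builds with "gcc -o clusterstat clusterstat.c -lrados".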
I have also done a number of runs using a workload I refer to as "hybrid". It has this operation mix: 60% read, 16% write, 14% delete, and 10% list. I have been able to run this for extended periods (24 hours), and the RGW garbage collection in RHCS 3.1 does keep pace. Unfortunately, while running the hybrid workload for extended periods I have observed a significant drop-off in client performance once GC activity kicks in. See https://bugzilla.redhat.com/show_bug.cgi?id=1596401
Created attachment 1459798 [details] 10hr deleteWrite XML COSbench spec file
I added the deleteWrite COSbench XML file. The runtime=36000 setting means the workload runs for ten hours; that can of course be changed. Be aware that on the Scale Lab configuration I have, with 312 OSDs and 486 TB of storage, the cluster fills in 11 hours. Ceph does not look kindly on cluster-full situations, and in my experience that condition necessitates a cluster purge and redeploy.
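For readers who don't want to open the attachment, the relevant shape of the workstage is roughly this (a trimmed sketch, not the attached file verbatim; the worker count and the container/object/size selectors are placeholder values):

  <workstage name="deleteWrite">
    <work name="deleteWrite" workers="64" runtime="36000">
      <operation type="write"  ratio="50"
                 config="containers=u(1,10);objects=u(1,1000);sizes=c(4)MB" />
      <operation type="delete" ratio="50"
                 config="containers=u(1,10);objects=u(1,1000)" />
    </work>
  </workstage>

The ratio attributes within a work element must sum to 100, so the 50/50 split above is what makes the workload evenly balanced.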
Discussed with Matt and Scott, re-targeting to 3.2
Release Notes Doc Text: changed "may" to "can" per the IBM Style Guide.