
Bug 2401203

Summary: [GSS][LC]: LC is paused for two or more days after restarting the RGW daemons
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Rodrigo Capa <rocapa>
Component: RGW
Assignee: Matt Benjamin (redhat) <mbenjamin>
Status: CLOSED UPSTREAM
QA Contact: Chaithra <ckulal>
Severity: high
Docs Contact: Rivka Pollack <rpollack>
Priority: unspecified
Version: 8.1
CC: bkunal, ceph-eng-bugs, cephqe-warriors, ckulal, kjosy, mbenjamin, rpollack, rsachere
Target Milestone: ---
Target Release: 9.1
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: ceph-20.1.0-50
Doc Type: Known Issue
Doc Text:
.Lifecycle processing stuck in `PROCESSING` state for a given bucket
If a Ceph Object Gateway server is unexpectedly restarted while lifecycle processing is in progress for a given bucket, that bucket does not resume processing lifecycle work for at least two scheduling cycles and remains in the `PROCESSING` state. This is expected behavior: it is intended to prevent multiple Ceph Object Gateway instances or threads from processing the same bucket simultaneously, especially when debugging is in progress in production. Currently, there is no workaround.
Story Points: ---
Clone Of:
: 2402744
Environment:
Last Closed: 2026-03-04 09:55:50 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2388233, 2402744

Description Rodrigo Capa 2025-10-03 08:24:36 UTC
Description of problem:

Whenever the RGW daemons are restarted, the current LC round pauses for two or more days. This behavior is known and documented (BZ#2072681).

As a consequence, a large number of objects queue up for deletion. After a couple of days on a busy cluster, deletion can no longer catch up within the 24-hour period.

To recover from this condition, the running application has to be paused to give the LC time to process all pending removals in less than 24 hours.

Restarting RGW is mandatory for many RGW parameters to take effect.

Manually forcing the bucket LC policy to be processed with radosgw-admin lc process --bucket=<bucket-name> doesn't resume object removal. The object count doesn't decrease.

Modifying rgw_lifecycle_work_time requires an RGW restart, so it is not possible to trigger additional LC runs that would count toward the LC unlock.



Version-Release number of selected component (if applicable):

RHCS 8.1, single site.

How reproducible:

Have an active RHCS cluster configured with LC policies and a large number of objects in a bucket.
Make sure an LC run is in progress.
Restart the RGW daemons.
Run radosgw-admin lc list; it reports status PROCESSING.
Observe that the number of objects stops decreasing for at least two days.
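
The status check in the steps above could be scripted roughly as follows. This is only a sketch: it assumes radosgw-admin lc list prints a JSON array whose entries carry "bucket" and "status" fields, and the sample data and the stuck_buckets helper are hypothetical, not taken from the affected cluster.

```python
import json


def stuck_buckets(lc_list_json: str) -> list:
    """Return the bucket entries whose lifecycle status is PROCESSING."""
    entries = json.loads(lc_list_json)
    return [e["bucket"] for e in entries if e.get("status") == "PROCESSING"]


# Hypothetical sample in the shape `radosgw-admin lc list` is assumed to print.
sample = """
[
  {"bucket": ":bucket-a:id-1", "status": "PROCESSING"},
  {"bucket": ":bucket-b:id-2", "status": "COMPLETE"}
]
"""

print(stuck_buckets(sample))
```

On a live system the JSON text would come from running the command (for example through subprocess) instead of the inline sample.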



Actual results:

radosgw-admin lc list reports status PROCESSING.

Bucket stats show no decrease in object count for two or more days.

When the LC automatically resumes after two or more days, the object count is so large that catching up takes too long, even more than a day, which saturates the cluster.
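
One possible way to confirm the stall from bucket stats is to sample the object count periodically and check that it never decreases. The sketch below assumes the output of radosgw-admin bucket stats --bucket=<bucket-name> is JSON with the count under usage -> rgw.main -> num_objects; the helper names and sample values are hypothetical.

```python
import json


def object_count(bucket_stats_json: str) -> int:
    """Extract the object count from bucket stats JSON (assumed layout)."""
    stats = json.loads(bucket_stats_json)
    return stats.get("usage", {}).get("rgw.main", {}).get("num_objects", 0)


def is_stalled(samples: list) -> bool:
    """True if the count never decreased between consecutive samples."""
    return len(samples) >= 2 and all(b >= a for a, b in zip(samples, samples[1:]))


# Hypothetical sample; real output contains many more fields.
sample_stats = '{"bucket": "bucket-a", "usage": {"rgw.main": {"num_objects": 1200}}}'

print(object_count(sample_stats))
print(is_stalled([1200, 1200, 1350]))  # count only grows: LC is not removing objects
print(is_stalled([1200, 1100]))        # count dropped: LC is making progress
```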



Expected results:

Being able to force the LC process to resume.

Any workaround mechanism to resume the LC will be welcome.



Additional info:

BZ#2072681 relates to this condition in RHCS 4, and documents this behavior as intentional.

This case is especially important in a very busy cluster with tens to hundreds of millions of objects written daily to a few buckets, where the LC policy is 7 days.

The customer will move this multimillion, multicluster account to another provider if performance doesn't meet the needs of their application.

rgw_lifecycle_work_time is configured from 1 AM to 10 PM. In any case, even 24 hours would not be enough time to process the accumulated deletions.
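
A rough back-of-the-envelope sketch of why the window is too short is shown below. All numbers are illustrative assumptions, not measurements from the cluster: 100 million objects expiring per day (steady state with a 7-day policy and the upper ingest figure above), a 2-day pause, and the work time written as "01:00-22:00" in the HH:MM-HH:MM form the option uses.

```python
from datetime import datetime


def window_hours(work_time: str) -> float:
    """Length in hours of an 'HH:MM-HH:MM' window (assumed not to wrap past midnight)."""
    start, end = (datetime.strptime(t, "%H:%M") for t in work_time.split("-"))
    return (end - start).total_seconds() / 3600


def required_deletes_per_second(daily_expired: int, paused_days: int, work_time: str) -> float:
    """Deletion rate needed to clear the backlog within a single work window."""
    # Objects that expired while LC was paused, plus those expiring on the catch-up day.
    backlog = daily_expired * (paused_days + 1)
    return backlog / (window_hours(work_time) * 3600)


hours = window_hours("01:00-22:00")
normal = required_deletes_per_second(100_000_000, 0, "01:00-22:00")
catch_up = required_deletes_per_second(100_000_000, 2, "01:00-22:00")

print(f"window: {hours} h")
print(f"normal day: {normal:.0f} deletes/s, after 2-day pause: {catch_up:.0f} deletes/s")
```

Under these assumptions, a two-day pause triples the deletion rate needed to catch up in one window, which is consistent with the saturation described above.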

There are 24 RGW daemons running on 6 servers.

The cluster is currently single site, but there are multisite clusters in a similar condition.

Comment 11 Red Hat Bugzilla 2026-03-04 09:55:50 UTC
This product has been discontinued or is no longer tracked in Red Hat Bugzilla.