Bug 1944009 - [GSS] "RGWReshardLock::lock failed to acquire lock on reshard.0000000002 ret=-16" messages are reported in rgw log
Summary: [GSS] "RGWReshardLock::lock failed to acquire lock on reshard.0000000002 ret=...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RGW
Version: 4.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 5.1
Assignee: J. Eric Ivancich
QA Contact: Vidushi Mishra
Docs Contact: Ranjini M N
URL:
Whiteboard:
Depends On:
Blocks: 2031073
Reported: 2021-03-29 03:23 UTC by hhuan
Modified: 2024-06-14 01:03 UTC (History)

Fixed In Version: ceph-16.2.6-1.el8cp
Doc Type: Enhancement
Doc Text:
.Lock contention messages from the Ceph Object Gateway reshard queue are marked as informational
Previously, when the Ceph Object Gateway failed to get a lock on a reshard queue, the output log entry would appear to be an error, causing concern to customers. With this release, the entries in the output log appear as informational and are tagged as “INFO:”.
Clone Of:
Environment:
Last Closed: 2022-04-04 10:19:55 UTC
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github ceph ceph pull 40862 0 None closed rgw: during reshard lock contention, adjust logging 2021-04-30 22:07:14 UTC
Red Hat Product Errata RHSA-2022:1174 0 None None None 2022-04-04 10:20:13 UTC

Comment 3 J. Eric Ivancich 2021-04-14 17:07:13 UTC
To address the customer's question more directly: locks are a way for processes running in parallel to coordinate their access to shared objects or data. We would not want multiple RGW processes to process the same reshard log simultaneously, so the first one to try acquires the lock, any other one that tries is locked out for the duration, and the first one releases the lock when it is done.
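
As a rough illustration of the locking pattern described above, here is a minimal Python sketch. It is not RGW code: the class `ReshardLogLock`, its methods, and the owner names `rgw.a` and `rgw.b` are hypothetical, and an in-process `threading.Lock` stands in for whatever shared lock RGW actually uses. The only detail taken from the report is the return value: on Linux, `EBUSY` is errno 16, which matches the `ret=-16` in the logged message.

```python
import errno
import threading


class ReshardLogLock:
    """Hypothetical stand-in for an exclusive lock on one reshard log shard."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()
        self.owner = None

    def try_lock(self, owner):
        # Non-blocking attempt: a contender that loses the race is not an
        # error case, it simply gets -EBUSY back and can skip this shard.
        if not self._lock.acquire(blocking=False):
            return -errno.EBUSY
        self.owner = owner
        return 0

    def unlock(self, owner):
        # Only the current holder may release the lock.
        if self.owner != owner:
            return -errno.EPERM
        self.owner = None
        self._lock.release()
        return 0


lock = ReshardLogLock("reshard.0000000002")
first = lock.try_lock("rgw.a")    # first contender acquires the lock
second = lock.try_lock("rgw.b")   # second contender is locked out (-EBUSY)
lock.unlock("rgw.a")              # first contender finishes and releases
third = lock.try_lock("rgw.b")    # the lock is available again
```

In this sketch the second attempt fails only because another process already holds the lock, which is the expected outcome when several daemons walk the same set of reshard logs.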

The customer clearly diagnosed this when they wrote: "Enable rgw debug log on the first rgw node in test env, find that the error msg is logged when another RGW daemon already acquired lock for reshard.000000000x:"

The links to an analogous situation with LC (lifecycle) logs are therefore relevant: although that case involves a different subsystem of RGW, it is ultimately the same underlying issue.

I think the best course is to mark these messages INFOs rather than WARNINGs or ERRORs, so they don't raise unnecessary concern. If that's the case, remaining at log level 0 would not be an issue.
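
The idea of labelling lock contention as informational can be sketched as follows. This is an assumption-laden illustration in Python, not the change made in the upstream PR: the function name `reshard_lock_log_record` and the message wording are hypothetical. It only shows the classification step, where a busy lock (`-EBUSY`) is reported as INFO and any other failure stays an ERROR.

```python
import errno
import logging


def reshard_lock_log_record(lock_name, ret):
    """Return (level, message) for a failed lock attempt (hypothetical helper)."""
    if ret == -errno.EBUSY:
        # Another process holds the lock: expected contention, not a fault.
        return (
            logging.INFO,
            f"INFO: failed to acquire lock on {lock_name}, "
            "it is already held by another process",
        )
    # Anything else is unexpected and still deserves attention.
    return (
        logging.ERROR,
        f"ERROR: failed to acquire lock on {lock_name} ret={ret}",
    )


busy_level, busy_msg = reshard_lock_log_record("reshard.0000000002", -errno.EBUSY)
io_level, io_msg = reshard_lock_log_record("reshard.0000000002", -errno.EIO)
```

Because the label is carried in the message text itself, the entry can stay at log level 0 and still read as informational.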

I'll put together a fix and target it for 5.1.

Eric

Comment 5 J. Eric Ivancich 2021-04-14 20:20:55 UTC
The upstream PR to address this can be found at https://github.com/ceph/ceph/pull/40862 .

Comment 6 J. Eric Ivancich 2021-04-14 20:46:27 UTC
The commit used from the PR linked in comment #4 is 6d3dee37791ad427a3435c493a1d7874ba075674 .

Comment 21 errata-xmlrpc 2022-04-04 10:19:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 5.1 Security, Enhancement, and Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1174

