Bug 1856100
Summary: | [RGW] Lifecycle policies stopped processing after upgrade | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Steve Baldwin <sbaldwin> |
Component: | RGW | Assignee: | Matt Benjamin (redhat) <mbenjamin> |
Status: | CLOSED ERRATA | QA Contact: | Vidushi Mishra <vimishra> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 3.3 | CC: | cbodley, ceph-eng-bugs, ceph-qe-bugs, gsitlani, kbader, lithomas, mamccoma, mbenjamin, mhackett, mmuench, sweil, tchandra, tserlin, ukurundw, vimishra |
Target Milestone: | z6 | ||
Target Release: | 3.3 | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | RHEL: ceph-12.2.12-124.el7cp Ubuntu: ceph_12.2.12-111redhat1 | Doc Type: | If docs needed, set a value |
Last Closed: | 2020-08-18 18:05:58 UTC | Type: | Bug |
Description
Steve Baldwin
2020-07-12 17:59:56 UTC
The errors below are reported in the rgw logs when lc attempts to process:

2020-07-08 00:00:00.634619 7f3d1c1dc700 0 RGWLC::process() failed to acquire lock on, sleep 5, try againlc.14
2020-07-08 00:00:00.792339 7f3d1e1e0700 0 RGWLC::process() failed to acquire lock on, sleep 5, try againlc.15
2020-07-08 00:00:00.805290 7f3d1a1d8700 0 RGWLC::process() failed to acquire lock on, sleep 5, try againlc.30
2020-07-08 00:00:05.768541 7f3d1c1dc700 0 RGWLC::process() failed to update lc object lc.19-5
2020-07-08 00:00:05.900899 7f3d1a1d8700 0 RGWLC::process() failed to update lc object lc.19-5
2020-07-08 00:00:06.041900 7f3d1e1e0700 0 RGWLC::process() failed to update lc object lc.19-5

When the user attempts to list the lc policies, it results in the following error:

$ radosgw-admin lc list
ERROR: failed to list objs: (5) Input/output error

I had the customer run the lc list with debugging on and will attach the output to the BZ. I have a bucket list and stats as well as the rgw logs; however, the lc logging at the default level is sparse and I could only find the errors mentioned above relating to the lifecycle. Let me know if you need debug logs from when lc attempts to process, and I will have the customer proceed with that data capture.

Thanks,
- Steve

The customer provided some additional data from their staging cluster, which is running 3.3z5 and has the same errors plus these bucket_lc_prepare error messages.

======== Case 02698166 : Comment #25 ================
FYI, we get the same error message on our staging cluster when running LC commands. However, the staging cluster has these errors logged:

2020-07-16 00:40:09.956696 7f621caa9700 0 RGWLC::bucket_lc_prepare() failed to set entry lc.17
2020-07-16 00:40:09.965018 7f621aaa5700 0 RGWLC::bucket_lc_prepare() failed to set entry lc.1
2020-07-16 00:40:09.965737 7f621eaad700 0 RGWLC::bucket_lc_prepare() failed to set entry lc.23
2020-07-16 00:40:09.973526 7f621caa9700 0 RGWLC::bucket_lc_prepare() failed to set entry lc.17
2020-07-16 00:40:09.981725 7f621aaa5700 0 RGWLC::bucket_lc_prepare() failed to set entry lc.1

The RGW log pool there has high IOPS (~2000) but is otherwise idle. I'm guessing it's writing lots of error logs or marker info?
=======================================================

Hello Casey,

The errors included in my last update (c#6) are from the customer's staging cluster. I asked the customer to verify that all components in the staging cluster were at version 12.2.12-115. I had verified the production versions but not the staging cluster. The customer did in fact have 1 node running the older code; they have upgraded that node in staging, and the "bucket_lc_prepare() failed to set entry" errors are no longer being reported. See the update from the customer below:

======== Case 02698166 : Comment #29 ================
There was 1 server that was running old code due to an ansible SNAFU. After upgrading, the "failed to set entry" message has gone away. The error -5 is still thrown when running radosgw-admin lc list.
======================================================

Thanks,
- Steve

Reproduced and fix pushed for 3.3z6, as discussed in RHCS-LT call.

thanks!

Matt

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Ceph Storage 3.3 security and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2020:3504
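
Since the lc logging at the default level is sparse, it can help to tally which lc shards are failing before asking for debug logs. The following is a minimal sketch only, not a Ceph tool: it assumes the log line layout seen in the excerpts quoted in this report (timestamp, thread id, level, then an RGWLC:: message), and the function name summarize_lc_errors is hypothetical.

```python
import re
from collections import Counter

# Assumed layout, taken from the log excerpts in this report:
#   <date> <time> <thread id> <level> RGWLC::<function>() <message>
LC_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<thread>[0-9a-f]+)\s+(?P<level>-?\d+)\s+"
    r"RGWLC::(?P<func>\w+)\(\) (?P<msg>.*)$"
)
# Shard object names look like lc.14, lc.19, ... in the quoted messages.
SHARD = re.compile(r"lc\.\d+")


def summarize_lc_errors(lines):
    """Count RGWLC messages per (function, shard); non-matching lines are skipped."""
    counts = Counter()
    for line in lines:
        m = LC_LINE.match(line.strip())
        if m is None:
            continue
        shard = SHARD.search(m.group("msg"))
        counts[(m.group("func"), shard.group(0) if shard else None)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python lc_summary.py < rgw.log
    for (func, shard), n in sorted(
        summarize_lc_errors(sys.stdin).items(), key=lambda kv: -kv[1]
    ):
        print(f"{n:6d}  {func}  {shard}")
```

Grouping by function and shard makes it easy to see whether failures are concentrated on a few shards (for example lc.19 in the production excerpt) or spread across all of them.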