Bug 2067056
Summary: | [RDR] [tracker for Ceph BZ #2068531] ceph status is in warn state with msg snap trim queue for 22 pg(s) >= 32768 (mon_osd_snap_trim_queue_warn_on) | |||
---|---|---|---|---|
Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Pratik Surve <prsurve> | |
Component: | ceph | Assignee: | Ronen Friedman <rfriedma> | |
ceph sub component: | RADOS | QA Contact: | Elad <ebenahar> | |
Status: | CLOSED CURRENTRELEASE | Docs Contact: | ||
Severity: | urgent | |||
Priority: | urgent | CC: | amagrawa, bniver, edonnell, ekuric, etamir, idryomov, jdurgin, jespy, kramdoss, kseeger, madam, mmuench, muagarwa, nojha, ocs-bugs, odf-bz-bot, owasserm, pcuzner, pdhiran, pnataraj, rcyriac, rfriedma, sostapov, srangana | |
Version: | 4.10 | Keywords: | TestBlocker, Tracking | |
Target Milestone: | --- | |||
Target Release: | ODF 4.12.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | 4.11.0-50 | Doc Type: | Bug Fix | |
Doc Text: |
.Ceph OSD snap trimming is no longer blocked by a running scrub
Previously, OSD snap trimming, once blocked by a running scrub, was not restarted. As a result, no trimming was performed until an OSD reset. This release fixes the restart of trimming that was blocked by a scrub, so snap trimming works as expected.
|
Story Points: | --- | |
Clone Of: | ||||
: | 2068531 2095674 (view as bug list) | Environment: | ||
Last Closed: | 2022-09-06 08:15:41 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 2068531, 2078956, 2094357, 2095674 |
Description
Pratik Surve
2022-03-23 06:56:15 UTC
After investigating a similar setup from Paul Cuzner, it's clear this is due to https://tracker.ceph.com/issues/52026

Raising the priority since this can lead to OOM and out-of-space issues and is easily reproducible with snap mirroring.

For testing purposes, you can avoid hitting the problem by disabling scrubbing in ceph. This would allow longevity testing to proceed. To disable scrub in ceph, use the toolbox pod to run 'ceph osd set noscrub' on all clusters.

*** Bug 2021079 has been marked as a duplicate of this bug. ***

*** Bug 2017429 has been marked as a duplicate of this bug. ***

Moving DR BZs to 4.10.z/4.11

Aman, can you reproduce with higher log levels (debug_osd = 20, debug_ms = 1, log_to_file=true for all osds)? These kinds of bugs can't be investigated without more detailed logs.

@Josh is it fine if we increase the log level after hitting this issue?

@prsurve - it's highly unlikely to help if not on before the bug occurs.

Have we reproduced this on a cluster with the higher log level set yet (see comment 25)?

Is this a TP blocker? If not, I will move it out of 4.11.

(In reply to Mudit Agarwal from comment #32)
> Is this a TP blocker, if not I will move it out of 4.11

This is not considered a TP blocker.

Please provide doc text.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days
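
For reference, the two operational steps discussed in the comments (the 'ceph osd set noscrub' workaround and the higher OSD log levels) can be sketched as a small dry-run helper. This is only an illustration, not a procedure taken from the bug: the function names are hypothetical, and applying the log settings through 'ceph config set osd ...' is an assumption, since the comments name only the option values (debug_osd = 20, debug_ms = 1, log_to_file = true). By default the helper only prints the commands, so it can be reviewed before anything is run from the toolbox pod.

```python
import shlex
import subprocess


def scrub_workaround_commands():
    """Command from the comments that disables scrubbing cluster-wide."""
    return [["ceph", "osd", "set", "noscrub"]]


def osd_debug_logging_commands():
    """Commands raising OSD log levels to the values requested in the comments.

    Assumption: the settings are applied with 'ceph config set osd <option> <value>'.
    """
    settings = {"debug_osd": "20", "debug_ms": "1", "log_to_file": "true"}
    return [["ceph", "config", "set", "osd", opt, val] for opt, val in settings.items()]


def run_all(commands, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # To reproduce with logs: raise the log levels before the issue occurs.
    run_all(osd_debug_logging_commands())
    # To avoid the issue during longevity testing: disable scrubbing.
    run_all(scrub_workaround_commands())
```

The two command groups serve opposite purposes, so they would normally not be applied together: the logging settings are for reproducing the problem with usable logs, while noscrub avoids the problem. The flag can later be cleared with 'ceph osd unset noscrub'.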