Bug 2021079
| Summary: | ceph HEALTH_WARN snap trim queue for 10 pg(s) | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Elvir Kuric <ekuric> |
| Component: | ceph | Assignee: | Josh Durgin <jdurgin> |
| Status: | CLOSED DUPLICATE | QA Contact: | Elad <ebenahar> |
| Severity: | low | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.9 | CC: | bhubbard, bniver, dupadhya, jespy, madam, mmuench, muagarwa, ocs-bugs, odf-bz-bot, vumrao |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-02-15 17:52:35 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Comment 5
Scott Ostapovicz
2021-11-09 15:03:01 UTC
(In reply to Scott Ostapovicz from comment #5)
> So this is a cleanup issue that eventually fixes itself as the PG count
> eventually reaches zero, and that does not degrade the system? In any case
> have you tracked how long it takes for this state to clear up?

Once the cluster ends up in this state and I actively use it (create pods/PVCs, do reads and writes), it takes a very long time to return to a healthy state (hours to days). Leaving the cluster idle, without issuing reads or writes against it, will eventually move it back to HEALTHY. Other than this, the cluster is operational: it is possible to create images and issue read/write operations against it.

I am running a lot of tests with the same scenario (create pods, attach PVCs, run load against the pods, then delete the pods and PVCs once the test is done ...), and I see this problem only when RBD replication/mirroring is involved.

Not a 4.9 blocker, moving it out while we continue the discussion.

This is a symptom of an overloaded cluster, not a bug. We need to test to determine what configuration / workload we can support on given hardware, as described here: https://docs.google.com/document/d/1lLSf2GzdBIt9EATcqMx9jYcX5ylgIu6rzPISVmOYtIA/edit?usp=sharing

This turned out to be a bug in the scrub/snap trim interaction - marking as a duplicate instead.

*** This bug has been marked as a duplicate of bug 2067056 ***
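Not part of the original report: a minimal monitoring sketch in Python for the "have you tracked how long it takes for this state to clear up" question above. It assumes the `ceph` CLI is on PATH with admin credentials and that `ceph pg dump --format json` exposes a per-PG `snap_trimq_len` under `pg_map`/`pg_stats` (the exact JSON layout varies between Ceph releases). The underlying health check is `PG_SLOW_SNAP_TRIMMING`, which fires when a PG's snap trim queue exceeds `mon_osd_snap_trim_queue_warn_on` (32768 by default); run something like this periodically to see whether the backlog is actually draining.

```python
#!/usr/bin/env python3
"""Sketch: list PGs that still have snap trim work queued.

Assumptions (not from the bug report): `ceph` CLI is available and the
JSON layout of `ceph pg dump` (pg_map -> pg_stats, per-PG snap_trimq_len)
matches recent Ceph releases; older releases may differ.
"""

import json
import subprocess

# Default threshold for the PG_SLOW_SNAP_TRIMMING warning
# (mon_osd_snap_trim_queue_warn_on, unless overridden).
WARN_THRESHOLD = 32768


def pg_stats():
    """Return the per-PG stats list from `ceph pg dump`."""
    out = subprocess.check_output(["ceph", "pg", "dump", "--format", "json"])
    dump = json.loads(out)
    # Newer releases nest the stats under "pg_map"; fall back to the top level.
    return dump.get("pg_map", dump).get("pg_stats", [])


def main():
    backlogged = []
    for pg in pg_stats():
        qlen = pg.get("snap_trimq_len", 0)
        if qlen > 0 or "snaptrim" in pg.get("state", ""):
            backlogged.append((pg["pgid"], pg.get("state", "?"), qlen))

    # Largest queues first, flagging anything over the warning threshold.
    backlogged.sort(key=lambda item: item[2], reverse=True)
    for pgid, state, qlen in backlogged:
        flag = "  <-- above warn threshold" if qlen >= WARN_THRESHOLD else ""
        print(f"{pgid:10s} {state:30s} snap_trimq_len={qlen}{flag}")
    print(f"{len(backlogged)} pg(s) with snap trim work queued")


if __name__ == "__main__":
    main()
```

If the warning is only noise on an otherwise healthy but busy cluster, the threshold can presumably be raised via the `mon_osd_snap_trim_queue_warn_on` option (for example with `ceph config set mon ...`), though that hides the symptom rather than addressing the scrub/snap trim interaction tracked in bug 2067056.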