Bug 1798284
| Summary: | clients failing to respond to cache pressure | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Manjunatha <mmanjuna> |
| Component: | CephFS | Assignee: | Jeff Layton <jlayton> |
| Status: | CLOSED ERRATA | QA Contact: | Yogesh Mane <ymane> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | | |
| Version: | 3.3 | CC: | ceph-eng-bugs, dang, dfuller, ffilz, gsitlani, hyelloji, jlayton, kkeithle, pdonnell, sweil, tserlin, vereddy |
| Target Milestone: | --- | | |
| Target Release: | 5.0 | | |
| Hardware: | All | | |
| OS: | All | | |
| Whiteboard: | NeedsCherrypick | | |
| Fixed In Version: | ceph-15.2.4 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-08-30 08:23:40 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1849725 | | |
| Bug Blocks: | | | |
Comment 45
Daniel Gryniewicz
2020-04-09 13:10:28 UTC

> The LRU_Run_Interval is 90s, which seems like a _very_ long time -- long enough to really go way above the cache limits. What exactly does reaper_work_per_lane denote? If we're only trimming max 50 entries per lane and only every 90s, it seems plausible that we could go way over the hwmark.

LRU_Run_Interval has nothing at all to do with reclaiming (reaping) entries. It will not remove or free entries at all. All it does is demote entries from L1 to L2, closing their global FD in the process. It's about global FD management, not about entry management.

Entries are reaped via lru_try_reap_entry(), which is done on demand, every time a new entry is required. Otherwise, entries are freed when they become invalid.

Ok, so that would seem to rule out my hypothesis that this is being caused by ganesha's inability to reap old entries fast enough to keep up with the ones being added. So, we're probably left with two possibilities:

1) there is a refcount leak in ganesha that's causing entries to remain pinned in the cache
2) the working set of open files is just _that_ large

Manjunatha, can you ask them about their workload here? Approximately how many files would they have open at a given time?

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 5.0 bug fix and enhancement), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3294
The LRU_Run_Interval is 90s, which seems like a _very_ long time -- long enough to really go way above the cache limits. What exactly does reaper_work_per_lane denote? If we're only trimming max 50 entries per lane and only every 90s, it seems plausible that we could go way over the hwmark. LRU_Run_Interval has nothing at all to do with reclaiming (reaping) entries. It will not remove or free entries at all. All it does is demote entries from L1 to L2, closing their global FD in the process. It's about global FD management, not about entry management. Entries are reaped via lru_try_reap_entry(), which is done on demand, every time a new entry is required. Otherwise, entries are freed when they become invalid. Ok, so that would seem to rule out my hypothesis that this is being caused by ganesha's inability to reap old entries fast enough to keep up with the ones being added. So, we're probably left with two possibilities: 1) there is a refcount leak in ganesha that's causing entries to remain pinned in the cache 2) the working set of open files is just _that_ large. Manjunatha, can you ask them about their workload here? Approximately how many files would they have open at a given time? Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 5.0 bug fix and enhancement), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:3294 |