Bug 1798284 - clients failing to respond to cache pressure
Summary: clients failing to respond to cache pressure
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: CephFS
Version: 3.3
Hardware: All
OS: All
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 5.0
Assignee: Jeff Layton
QA Contact: Yogesh Mane
URL:
Whiteboard: NeedsCherrypick
Depends On: 1849725
Blocks:
 
Reported: 2020-02-05 04:04 UTC by Manjunatha
Modified: 2023-09-07 21:43 UTC
CC List: 12 users

Fixed In Version: ceph-15.2.4
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-30 08:23:40 UTC
Embargoed:




Links
Ceph Project Bug Tracker 12334 (last updated 2020-05-30 19:30:56 UTC)
Red Hat Issue Tracker RHCEPH-1119 (last updated 2021-08-30 00:11:56 UTC)
Red Hat Product Errata RHBA-2021:3294 (last updated 2021-08-30 08:24:07 UTC)

Comment 45 Daniel Gryniewicz 2020-04-09 13:10:28 UTC
The fact that L2 is empty generally means that reaping is happening.  In addition, if L2 is empty, it will try to reap from L1 if it can, so it will do its best to reap unused entries when it's over HWMark.  (But note, it will never reap entries while it's under HWMark, so a stable, busy system will generally hold ~HWMark entries.)  Reaping attempts the LRU end of each lane; if it doesn't find anything reapable, it stops and allocates a new entry instead.

Reaping is separate from the L1/L2 distinction.  The levels of the LRU are intended to manage open global FDs (FDs used by NFSv3, or by anonymous access in NFSv4).  Entries in L1 may have an open global FD; entries in L2 do not.  LRU_Run_Interval determines how often we scan L1 to demote entries to L2 and close their global FDs.  This interval is variable, and becomes shorter when the number of open global FDs is above FD_LWMark (and much shorter still when it's above FD_HWMark).
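
To make that concrete, here is a rough sketch of the demotion pass described above (hypothetical names, not the real mdcache symbols):

#include <unistd.h>   /* close() */

enum lru_level { LRU_L1, LRU_L2 };

struct lru_entry {
        enum lru_level level;
        int global_fd;                  /* -1 when no global FD is open */
        struct lru_entry *next;
};

/* Walk L1, demoting entries to L2 and closing their global FDs.
 * This pass never frees or removes entries; it only manages global
 * FDs.  Returns the number of FDs closed. */
static unsigned lru_run_pass(struct lru_entry *l1_head)
{
        unsigned closed = 0;

        for (struct lru_entry *e = l1_head; e != NULL; e = e->next) {
                if (e->global_fd >= 0) {
                        close(e->global_fd);
                        e->global_fd = -1;
                        closed++;
                }
                e->level = LRU_L2;
        }
        return closed;
}

/* The interval until the next pass shrinks once the number of open
 * global FDs climbs past the low-water mark, and shrinks further past
 * the high-water mark (the divisors here are purely illustrative). */
static unsigned next_run_interval(unsigned open_fds, unsigned fd_lwmark,
                                  unsigned fd_hwmark, unsigned base_secs)
{
        if (open_fds > fd_hwmark)
                return base_secs / 10;
        if (open_fds > fd_lwmark)
                return base_secs / 2;
        return base_secs;
}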

All that said, an entry can be reaped (and reused) if it's at the tail of any lane, in either L1 or L2, and its refcount is exactly 1.
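
As a rough illustration of that reap path (again hypothetical names, not the real lru_try_reap_entry() internals):

#include <stdlib.h>

struct cache_entry {
        int refcnt;
        /* ... object state ... */
};

struct lru_lane {
        struct cache_entry *lru_tail;   /* least-recently-used end */
};

/* Try to steal a reusable entry from the LRU tail of each lane.  Only
 * an entry whose sole reference is the LRU's own (refcnt == 1) is a
 * candidate, and reaping only happens while we're over HWMark. */
static struct cache_entry *try_reap(struct lru_lane *lanes, size_t nlanes,
                                    size_t nentries, size_t entries_hwmark)
{
        if (nentries <= entries_hwmark)
                return NULL;                    /* never reap under HWMark */

        for (size_t i = 0; i < nlanes; i++) {
                struct cache_entry *e = lanes[i].lru_tail;

                if (e != NULL && e->refcnt == 1)
                        return e;               /* recycle this one */
        }
        return NULL;                            /* nothing reapable */
}

/* Called on demand, whenever a new entry is required. */
static struct cache_entry *get_entry(struct lru_lane *lanes, size_t nlanes,
                                     size_t nentries, size_t entries_hwmark)
{
        struct cache_entry *e = try_reap(lanes, nlanes, nentries,
                                         entries_hwmark);

        if (e == NULL)
                e = calloc(1, sizeof(*e));      /* fall back to a fresh entry */
        return e;
}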

Comment 46 Jeff Layton 2020-04-09 13:15:29 UTC
The LRU_Run_Interval is 90s, which seems like a _very_ long time -- long enough to really go way above the cache limits. What exactly does reaper_work_per_lane denote? If we're only trimming at most 50 entries per lane, and only every 90s, it seems plausible that we could go way over the hwmark.

Comment 47 Daniel Gryniewicz 2020-04-09 14:03:08 UTC
LRU_Run_Interval has nothing at all to do with reclaiming (reaping) entries.  It will not remove or free entries at all.  All it does is demote entries from L1 to L2, closing their global FD in the process.  It's about global FD management, not about entry management.

Entries are reaped via lru_try_reap_entry(), which is done on demand, every time a new entry is required.  Otherwise, entries are freed when they become invalid.
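
For reference, the knobs being discussed all live in ganesha's MDCACHE config block. A hedged example (values are illustrative and defaults vary by version; check the ganesha docs for your release):

MDCACHE {
        # How often (seconds) L1 is scanned to demote entries to L2 and
        # close their global FDs -- the 90s interval discussed above.
        LRU_Run_Interval = 90;

        # Entries are only reaped while the cache is above this mark.
        Entries_HWMark = 100000;

        # Per-lane reap budget mentioned in comment 46.
        Reaper_Work_Per_Lane = 50;

        # Open-FD thresholds (percent of the FD limit) that make the
        # demotion pass run more often.
        FD_LWMark_Percent = 50;
        FD_HWMark_Percent = 90;
}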

Comment 48 Jeff Layton 2020-04-09 14:58:50 UTC
Ok, so that would seem to rule out my hypothesis that this is being caused by ganesha's inability to reap old entries fast enough to keep up with the ones being added. So, we're probably left with two possibilities:

1) there is a refcount leak in ganesha that's causing entries to remain pinned in the cache

2) the working set of open files is just _that_ large.

Manjunatha, can you ask them about their workload here? Approximately how many files would they have open at a given time?
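
For a rough estimate from the Ceph side (a sketch; field names can vary by release), the MDS session listing shows how many caps the ganesha client is currently holding, which tracks the size of its working set:

# Replace <mds-id> with the MDS daemon name or rank.
ceph tell mds.<mds-id> session ls
# Look at the "num_caps" value for the ganesha client's session.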

Comment 67 errata-xmlrpc 2021-08-30 08:23:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 5.0 bug fix and enhancement), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3294

