The fact that L2 is empty generally means that reaping is happening. If L2 is empty, the reaper will also try to reap from L1 if it can, so it does its best to reclaim unused entries whenever the cache is over HWMark. (Note that it will never reap entries while under HWMark, so a stable, busy system will generally sit at ~HWMark entries.) Reaping tries the LRU end of each lane; if it finds nothing reapable, it stops and a new entry is allocated.

Reaping is independent of the L1/L2 split. The two levels of the LRU exist to manage open global FDs (FDs used by NFSv3 or by anonymous access in NFSv4): entries in L1 may have an open global FD, entries in L2 do not. LRU_Run_Interval determines how often we scan L1 to demote entries to L2 and close the global FD. The interval is variable: it becomes shorter when the number of open global FDs is above FD_LWMark, and much shorter when it is above FD_HWMark.

All that said, an entry can be reaped (and reused) if it sits at the tail of any lane, in either L1 or L2, and its refcount is exactly 1.
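To make the reaping path concrete, here is a minimal C sketch of the policy described above. This is not ganesha's actual mdcache code: the struct layouts, lane count, and HWMark value are illustrative, and locking is omitted. It only reaps when over HWMark, tries the tail of each L2 lane and then each L1 lane, and only reuses an entry whose refcount is exactly 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins for the real mdcache structures. */
#define NUM_LANES 7
#define HWMARK    100000

struct entry {
    struct entry *prev, *next;  /* position in the lane's LRU queue */
    int32_t refcnt;             /* 1 == held only by the LRU itself */
};

struct lane {
    struct entry *lru_tail;     /* least recently used end */
    /* per-lane lock omitted for brevity */
};

static struct lane lanes_L1[NUM_LANES];
static struct lane lanes_L2[NUM_LANES];
static size_t total_entries;

/* Try to reclaim the entry at the LRU end of one lane.  Only an entry
 * whose refcount is exactly 1 (nobody else holds it) can be reused. */
static struct entry *try_reap_lane(struct lane *lane)
{
    struct entry *e = lane->lru_tail;

    if (e != NULL && e->refcnt == 1) {
        /* unlink from the queue and hand it back for reuse */
        lane->lru_tail = e->prev;
        if (e->prev)
            e->prev->next = NULL;
        total_entries--;
        return e;
    }
    return NULL;   /* lane empty or tail still in use: skip this lane */
}

/* Reaping policy: only when over HWMark, check the tail of each L2
 * lane first, then each L1 lane.  Returning NULL means nothing was
 * reapable and the caller simply allocates a fresh entry. */
struct entry *get_entry_for_reuse(void)
{
    if (total_entries < HWMARK)
        return NULL;   /* under HWMark: never reap */

    for (int i = 0; i < NUM_LANES; i++) {
        struct entry *e = try_reap_lane(&lanes_L2[i]);
        if (e)
            return e;
    }
    for (int i = 0; i < NUM_LANES; i++) {
        struct entry *e = try_reap_lane(&lanes_L1[i]);
        if (e)
            return e;
    }
    return NULL;       /* nothing reapable at any tail */
}
```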
The LRU_Run_Interval is 90s, which seems like a _very_ long time -- long enough to really go way above the cache limits. What exactly does reaper_work_per_lane denote? If we're only trimming max 50 entries per lane and only every 90s, it seems plausible that we could go way over the hwmark.
LRU_Run_Interval has nothing at all to do with reclaiming (reaping) entries. It will not remove or free entries at all. All it does is demote entries from L1 to L2, closing their global FD in the process. It's about global FD management, not about entry management. Entries are reaped via lru_try_reap_entry(), which is called on demand whenever a new entry is required; otherwise, entries are freed when they become invalid.
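As a rough, self-contained illustration of the variable run interval mentioned above: only the 90s default and the "faster above FD_LWMark, much faster above FD_HWMark" behavior come from these comments; the water-mark values and scaling factors below are guesses, not ganesha's actual numbers.

```c
#include <stdio.h>

/* Hypothetical values standing in for the real config knobs. */
#define LRU_RUN_INTERVAL_S 90u
#define FD_LWMARK        2000u
#define FD_HWMARK        4000u

/* How long the background LRU thread waits before its next pass.
 * The pass itself only demotes entries from L1 to L2 and closes their
 * global FDs; it never frees cache entries. */
static unsigned int next_run_interval(unsigned int open_global_fds)
{
    if (open_global_fds > FD_HWMARK)
        return 1;                           /* run much more aggressively */
    if (open_global_fds > FD_LWMARK)
        return LRU_RUN_INTERVAL_S / 10;     /* run more often */
    return LRU_RUN_INTERVAL_S;              /* normal 90s cadence */
}

int main(void)
{
    printf("%u\n", next_run_interval(1000));  /* 90 */
    printf("%u\n", next_run_interval(2500));  /* 9  */
    printf("%u\n", next_run_interval(5000));  /* 1  */
    return 0;
}
```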
OK, so that would seem to rule out my hypothesis that this is being caused by ganesha's inability to reap old entries fast enough to keep up with the ones being added. So we're probably left with two possibilities: 1) there is a refcount leak in ganesha that is keeping entries pinned in the cache, or 2) the working set of open files is just _that_ large. Manjunatha, can you ask them about their workload here? Approximately how many files would they have open at a given time?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 5.0 bug fix and enhancement), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:3294