Description of problem:
Cachefilesd doesn't appear to properly recognize when a filesystem passes its cull limits due to utilization outside of cachefilesd's control (i.e. when it doesn't have a dedicated filesystem). Once culling is activated, the (empty) cull table will be built, decant_cull_table will schedule a jumpstart scan, and the main event loop will immediately repeat the cycle. This leads to cachefilesd gobbling all of a core reading /proc/fs/cachefiles, reading an empty cache directory, and scheduling the jumpstart scan (which never takes place, since a new one is rescheduled before the alarm() ever fires).

Version-Release number of selected component (if applicable):
cachefilesd-0.8-5.el5 / kernel-2.6.18-92.el5

How reproducible:
1. Configure the cache directory on a non-dedicated filesystem.
2. Don't allow cachefilesd to cache anything (i.e. skip the 'fsc' mount options).
3. Fill that filesystem past the bcull limit. cachefilesd will now start chewing 100% CPU trying desperately to correct things.
4. Free up space above the cull limit and cachefilesd will go back to normal.

Actual results:
From looking at things, it would seem there may need to be an explicit case to handle when /proc/fs/cachefiles indicates a cull is required, but the entire cache has already been vacated.

Additional info:
$ grep -v '^#' /etc/cachefilesd.conf
dir /u0/.fs-cache
tag mycache
brun 25%
bcull 18%
bstop 13%
frun 25%
fcull 18%
fstop 13%
$
I've seen this behaviour in the past too, but did not get around to reporting it. Now it's back on this Fedora 18 installation:

$ grep ^[a-z] /etc/cachefilesd.conf
dir /var/cache/fscache
tag mycache
brun 10%
bcull 7%
bstop 3%
frun 10%
fcull 7%
fstop 3%
secctx system_u:system_r:cachefiles_kernel_t:s0
$ df -h /var/cache/fscache
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       3.9G  3.4G  276M  93% /

Granted, the root filesystem is pretty full, but cachefilesd isn't in use, yet it's using one core, doing the following over and over:

966 18:35:39.303054 lseek(4, 0, SEEK_SET) = 0
966 18:35:39.303243 rt_sigaction(SIGALRM, {0x804a740, [ALRM], SA_RESTART}, {0x804a740, [ALRM], SA_RESTART}, 8) = 0
966 18:35:39.303497 alarm(30) = 30
966 18:35:39.303636 read(3, "cull=1 frun=6400 fcull=4600 fstop=1e00 brun=189b6 bcull=11399 bstop=761d", 4096) = 72
966 18:35:39.303792 fchdir(4) = 0
966 18:35:39.304020 getdents(4, /* 2 entries */, 32768) = 32
966 18:35:39.304189 getdents(4, /* 0 entries */, 32768) = 0

kernel-PAE-3.7.2-204.fc18.i686
cachefilesd-0.10.5-3.fc18.i686
This bug/component is not included in scope for RHEL-5.11.0, which is the last RHEL 5 minor release. This Bugzilla will soon be CLOSED as WONTFIX (at the end of the RHEL 5.11 development phase, Apr 22, 2014). Please contact your account manager or support representative in case you need to escalate this bug.
I've seen this behavior on RHEL 6 as well.
Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We've carefully evaluated the request, but are unable to include it in RHEL5 stream. If the issue is critical for your business, please provide additional business justification through the appropriate support channels (https://access.redhat.com/site/support).
I have opened Bug 1109640 against RHEL 6 for this issue. Please direct future interest to that bug.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days