1. Please describe the problem:

Using 6.12.* as an NFS-4.2 client, with cachefilesd ON, causes mount/unmount problems. Switching off cachefilesd solves the problems.

2. What is the Version-Release number of the kernel:

6.12.4 (but same on .1 and .3)

3. Did it work previously in Fedora? If so, what kernel version did the issue *first* appear? Old kernels are available for download at https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

This has worked without problems in all 6.11.*. First problems were with 6.12.1.
(However: there were previously similar problems in Fedora 40 with much older kernels.)

4. Can you reproduce this issue? If so, please provide the steps to reproduce the issue below:

It's not 100% certain, but simply logging in with an NFS-4.2 home directory triggers a hang with a half-mounted filesystem.

5. Does this problem occur with the latest Rawhide kernel? To install the Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by ``sudo dnf update --enablerepo=rawhide kernel``:

Not tried with Rawhide yet (as 6.12 is already ahead of current).

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

No.

7. Please attach the kernel logs. You can get the complete kernel log for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the issue occurred on a previous boot, use the journalctl ``-b`` flag.

There is nothing logged about this, not in the dmesg, not in the journal.

Reproducible: Always
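For anyone wanting to confirm they are in the same configuration before commenting, a quick sketch of how to check whether FS-Cache is actually active on the NFS mounts (assuming the stock cachefilesd.service and the usual fsc mount option):

# the FSC column shows per-volume whether caching is enabled
$ cat /proc/fs/nfsfs/volumes

# the mount options should include 'fsc', and the daemon should be active
$ nfsstat -m
$ systemctl is-active cachefilesd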
Some extra info:

* Kernel 6.12.5-200.fc41 has the exact same problem
* Also aarch64 has the exact same problem
I have the same issue. I am on:

Linux 6.12.9-200.fc41.x86_64
nfs-utils-2.8.1-4.rc2.fc41.x86_64

After starting up cachefilesd I have:

NV SERVER   PORT DEV  FSID                FSC
v4 0a0c0e01  801 0:90 5c95aeb110ab56f0:0  yes
v4 0a0c0e1e  801 0:76 e228d38d2b7a0f8c:0  yes

I have files (newly created AFTER cachefilesd is started) in /var/cache/fscache:

find /var/cache/fscache/ -type f | wc
     58      58    9135

But very shortly after trying to access anything on my NFS share, the process accessing it hangs. dmesg and journalctl don't show any errors.

I do have some 'stuck' processes that seem to have started up at the time of the filesystem hang:

root  2177  678  0 11:21 ?  00:00:00 systemd-nsresourcework: waiting...
root  2178  678  0 11:21 ?  00:00:00 systemd-nsresourcework: waiting...
root  2182  678  0 11:21 ?  00:00:00 systemd-nsresourcework: waiting...
root  2183  678  0 11:21 ?  00:00:00 systemd-nsresourcework: waiting...
root  2186  678  0 11:21 ?  00:00:00 systemd-nsresourcework: waiting...
root  2198  679  0 11:22 ?  00:00:00 systemd-userwork: waiting...
root  2199  679  0 11:22 ?  00:00:00 systemd-userwork: waiting...
root  2200  679  0 11:22 ?  00:00:00 systemd-userwork: waiting...

I am not 100% certain that they are related, but I don't recall seeing those before, the system is definitely 'waiting' on something, and they appeared at about the time I had the issue.

This is the FIRST time I've ever looked at cachefilesd, so I have no idea whether it worked on a different kernel version.
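Since nothing shows up in dmesg or the journal, one thing that might reveal where the hung processes are actually stuck (a sketch, assuming sysrq is permitted on your setup) is dumping blocked-task traces:

# allow the 'show blocked tasks' sysrq function for this boot
$ sudo sysctl kernel.sysrq=1

# dump stack traces of all tasks in uninterruptible sleep to the kernel log
$ echo w | sudo tee /proc/sysrq-trigger
$ sudo dmesg | tail -n 100

# or look at the kernel stack of one specific hung process (e.g. PID 2177 above)
$ sudo cat /proc/2177/stack

If the traces show threads waiting inside netfs/cachefiles code, that would help narrow this down.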
I downloaded the older packages using koji download-build and installed them:

dnf install kernel-modules-core-6.11.4-301.fc41.x86_64 \
    kernel-core-6.11.4-301.fc41.x86_64 \
    kernel-modules-6.11.4-301.fc41.x86_64 \
    kernel-tools-libs-6.11.4-301.fc41.x86_64 \
    kernel-tools-6.11.4-301.fc41.x86_64 \
    kernel-modules-extra-6.11.4-301.fc41.x86_64 \
    kernel-6.11.4-301.fc41.x86_64

then booted into the older kernel-6.11.4-301.fc41.x86_64 kernel. uname reports:

Linux 6.11.4-301.fc41.x86_64

Now I have 1500+ files in my /var/cache/fscache dir (vs. ~50 when it started and hung up), and my NFS mounts work and do not hang, so I can say that the older 6.11.4-301 kernel works with cachefilesd.

I checked again and I still have:

root  4223  670  0 12:16 ?  00:00:00 systemd-nsresourcework: waiting...
root  4224  671  0 12:16 ?  00:00:00 systemd-userwork: waiting...
root  4225  670  0 12:16 ?  00:00:00 systemd-nsresourcework: waiting...
root  4226  671  0 12:16 ?  00:00:00 systemd-userwork: waiting...
root  4227  671  0 12:16 ?  00:00:00 systemd-userwork: waiting...
root  4228  670  0 12:16 ?  00:00:00 systemd-nsresourcework: waiting...
root  4229  670  0 12:16 ?  00:00:00 systemd-nsresourcework: waiting...
root  4230  670  0 12:16 ?  00:00:00 systemd-nsresourcework: waiting...

but NFS keeps working, so I don't think those are related to the issue with the newer 6.12 kernel.
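In case it helps anyone else roll back, the download step would look roughly like this (a sketch, assuming the koji CLI is installed; the build NVR is the one used above):

# download the x86_64 RPMs of that build from Koji into the current directory
$ koji download-build --arch=x86_64 kernel-6.11.4-301.fc41

# then install the subpackages listed above from the current directory
$ sudo dnf install ./kernel-core-6.11.4-301.fc41.x86_64.rpm ./kernel-modules-core-6.11.4-301.fc41.x86_64.rpm ...

dnf keeps the currently running kernel installed alongside, so you can still boot back into 6.12 from the GRUB menu.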
I'm seeing same on 6.12.9-200.fc41.x86_64 with cachefilesd enabled.
(rolled back to 6.11.11-300.fc41.x86_64 which works fine)
It seems kernel 6.12.10 solved the problem. At least, I'm no longer able to trigger the problem.
Correction: it happens a lot less frequently.
The very same problem reappeared instantly and repeatably on kernel 6.15 on Fedora 42.
Seconded, with all of 6.15.{1..3} affected.

Actually, it's even worse: after a reboot into 6.15.3, even before any NFS is actually mounted, a 'systemctl stop cachefilesd' can already hang the whole machine. So you need to 'systemctl disable' it before anyone boots into the fresh kernel (see the snippet below).

As we heavily use NFS caching, the recurrence of this is really a pain. But it seems to be a little-used feature outside of academia...
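For others in the same boat, a sketch of the pre-reboot step described above (disable only, don't stop, since stopping is what hangs on 6.15.x):

# disable the unit for future boots without touching the running instance
$ sudo systemctl disable cachefilesd

# optionally mask it so nothing can start it by accident after the reboot
$ sudo systemctl mask cachefilesd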
(In reply to Bert DeKnuydt from comment #9)
> Actually, it's even worse: after a reboot into 6.15.3, even before any NFS
> is actually mounted, a 'systemctl stop cachefilesd'
> can already hang the whole machine. So you need to 'systemctl disable' it
> before anyone boots into the fresh kernel.

Identical behaviour here - really not fun when the users who left the lab in the evening and authorized the automatic package update at reboot all come up to you howling that their machines have frozen...

> As we heavily use NFS caching, the recurrence of this is really a pain.
> But it seems to be a little-used feature outside of academia...

At this point I'm considering simply removing cachefilesd and being done with it. The perpetual risk of hosing the whole lab in an ordinary upgrade-and-reboot cycle (which has already happened several times) can't reasonably be justified without hard performance numbers in favour of keeping it enabled, or a glaring difference in responsiveness, which I really don't see...
FYI: 6.15.4, with quite a few NFS fixes, still suffers.

@Francesco: as for the performance of cachefilesd: we measured no increased responsiveness on the NFS client (in fact the opposite: a bit more latency), but ... a lot less traffic to the NFS server. And that makes it worthwhile for us. When it works.
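If anyone wants rough numbers for that trade-off, a simple before/after sketch (the counters are cumulative since boot, so compare deltas over the same workload on the same client):

# NFS client-side RPC/op counts; watch how the read/write counts grow
$ nfsstat -c

# FS-Cache read and cache-I/O counters
$ grep -E 'Reads|IO' /proc/fs/fscache/stats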
Unlike the NFS-triggered hangs described above, I see a consistent failure of `cachefilesd` at start. I'm not sure whether it's the same issue or not. Posting here anyway, even though NFS doesn't appear to be required, since it involves cachefilesd + a recent Fedora kernel and indicates kernel-level breakage in the FS-Cache / cachefiles backend.

With:

$ distro
Name: Fedora Linux 42 (Adams)
Version: 42
Codename:

$ uname -rm
6.15.4-200.fc42.x86_64 x86_64

$ lsmod | grep cachefiles
cachefiles            204800  0
netfs                 602112  1 cachefiles

$ grep dir /etc/cachefilesd.conf
dir /fscache

$ mount | grep /fscache
tmpfs on /fscache type tmpfs (rw,relatime,size=8388608k,mode=755,inode64)

$ systemctl start cachefilesd.service
Job for cachefilesd.service failed because the control process exited with error code.
See "systemctl status cachefilesd.service" and "journalctl -xeu cachefilesd.service" for details.

$ journalctl -f
Jul 09 13:57:59 svr systemd[1]: Starting cachefilesd.service - CacheFiles daemon (loc)...
Jul 09 13:57:59 svr kernel: CacheFiles: Failed to register: -95
Jul 09 13:57:59 svr systemd[1]: cachefilesd.service: Control process exited, code=exited, status=1/FAILURE
Jul 09 13:57:59 svr systemd[1]: cachefilesd.service: Failed with result 'exit-code'.
Jul 09 13:57:59 svr systemd[1]: Failed to start cachefilesd.service - CacheFiles daemon (loc).
Jul 09 13:58:00 svr systemd[1]: cachefilesd.service: Scheduled restart job, restart counter is at 1.

$ test -d /proc/fs/cachefiles && echo OK || echo Missing
Missing

$ zgrep CACHEFILES /proc/config.gz
CONFIG_CACHEFILES=m
# CONFIG_CACHEFILES_DEBUG is not set
# CONFIG_CACHEFILES_ERROR_INJECTION is not set
CONFIG_CACHEFILES_ONDEMAND=y

IIUC, the error here (`-95`, `EOPNOTSUPP`) indicates a kernel-side failure of the cachefiles backend registration request. It occurs even with no NFS mounts and a valid/writable cache dir -> `/fscache` on tmpfs, which suggests a possible regression in kernel-side cachefiles support in 6.15.x.

In conjunction with the above, the broken backend appears to affect both initialization and NFS interaction, depending on system usage. I know I didn't have this problem previously. I haven't bisected, or even tested earlier kernel versions, yet.
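I don't know whether the tmpfs-backed cache directory matters here, but since -95 is EOPNOTSUPP coming from backend registration, one quick way to rule out a backing-filesystem requirement would be something like this (the /srv path is just an example; SELinux labelling of the new directory may also need attention):

# point the cache at a directory on a disk-backed filesystem (ext4/xfs)
$ sudo mkdir -p /srv/fscache-test
$ sudo sed -i 's|^dir /fscache|dir /srv/fscache-test|' /etc/cachefilesd.conf

# retry and check whether the -95 registration error is gone
$ sudo systemctl restart cachefilesd
$ journalctl -u cachefilesd -b --no-pager | tail

If it still fails on ext4/xfs, that would point more clearly at a 6.15.x regression rather than at the cache directory setup.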
An alternative is to switch to in-kernel FS-Cache v2, using its non-fixed/dynamic memory cache.

Given:

$ grep -i fscache /boot/config-6.15.4-200.fc42.x86_64
CONFIG_FSCACHE=y
CONFIG_FSCACHE_STATS=y
CONFIG_NFS_FSCACHE=y
CONFIG_CEPH_FSCACHE=y
CONFIG_CIFS_FSCACHE=y
CONFIG_AFS_FSCACHE=y
CONFIG_9P_FSCACHE=y

with the cachefilesd package removed:

$ rpm -qa | grep -i cachefiles
(empty)

and `fsc` usage enabled:

$ grep fsc /etc/auto.nfs4
TEST -fstype=nfs4,vers=4.2,_netdev,...,fsc,... machine.example.com:/
                                       ^^^ (the fsc option)

when accessing `TEST`, e.g.:

$ cat /proc/fs/fscache/stats | grep -E "Cookies|Acquire|Reads|IO"
Reads  : DR=0 RA=25015 RF=0 RS=0 WB=0 WBZ=0
Cookies: n=20572 v=1 vcol=0 voom=0
Acquire: n=28995 ok=28995 oom=0
IO     : rd=0 wr=0 mis=0

$ ps ax | grep -E "fscache|cachefiles"

$ cat /proc/fs/fscache/caches
CACHE    REF   VOLS  OBJS  ACCES S NAME
======== ===== ===== ===== ===== = ===============
00000024 1     1     0     0     - -
$
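For anyone wanting to try the same without autofs, the equivalent manual mount would presumably be (server and export as in the map entry above; the mountpoint is made up):

# mount with the 'fsc' option so NFS opts in to FS-Cache
$ sudo mkdir -p /mnt/test
$ sudo mount -t nfs4 -o vers=4.2,fsc machine.example.com:/ /mnt/test

# verify FS-Cache activity after reading some files
$ grep -E "Cookies|IO" /proc/fs/fscache/stats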
Looks like FS-Cache v2 can support a disk-backed cache, but currently only (?) via the legacy cachefiles backend, which is what looks broken here. So, for now, FS-Cache v2 is RAM-only unless/until cachefiles is functional again. That is limiting if the working data set is larger than RAM, if the cache needs to persist across boots, or if other memory pressure is significant.
It seems the problems with NFS/fsc/cachefilesd are solved in 6.15.8-200.fc42.x86_64. Ran it over the weekend on a dozen machines without problems. There are indeed changes to this in the Changelog: -- Zizhi Wo (1): cachefiles: Fix the incorrect return value in __cachefiles_write() -- Now what about 6.16 :)
This message is a reminder that Fedora Linux 41 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora Linux 41 on 2025-12-15. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a 'version' of '41'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, change the 'version' to a later Fedora Linux version. Note that the version field may be hidden. Click the "Show advanced fields" button if you do not see it. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora Linux 41 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora Linux, you are encouraged to change the 'version' to a later version prior to this bug being closed.
Fedora Linux 41 entered end-of-life (EOL) status on 2025-12-15. Fedora Linux 41 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora Linux please feel free to reopen this bug against that version. Note that the version field may be hidden. Click the "Show advanced fields" button if you do not see the version field. If you are unable to reopen this bug, please file a new report against an active release. Thank you for reporting this bug and we are sorry it could not be fixed.