1. Please describe the problem:
Using 6.12.* as an NFS 4.2 client with cachefilesd ON causes mount/unmount problems. Switching off cachefilesd solves the problems.

2. What is the Version-Release number of the kernel:
6.12.4 (but the same on .1 and .3)

3. Did it work previously in Fedora? If so, what kernel version did the issue *first* appear? Old kernels are available for download at https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
This has worked without problems on all 6.11.* kernels. The first problems were with 6.12.1. (However, there were previously similar problems in Fedora 40 with much older kernels.)

4. Can you reproduce this issue? If so, please provide the steps to reproduce the issue below:
I'm not 100% sure it always triggers, but simply logging in with an NFS 4.2 home directory triggers a hang with a half-mounted filesystem.

5. Does this problem occur with the latest Rawhide kernel? To install the Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by ``sudo dnf update --enablerepo=rawhide kernel``:
Not tried with Rawhide yet (as 6.12 is already ahead of current).

6. Are you running any modules not shipped directly with Fedora's kernel?:
No.

7. Please attach the kernel logs. You can get the complete kernel log for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the issue occurred on a previous boot, use the journalctl ``-b`` flag.
There is nothing logged about this, not in dmesg, not in the journal.

Reproducible: Always
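For anyone trying to reproduce this outside of a full login session, a minimal sketch of what I believe triggers it; the server name, export and mount point below are placeholders, not my actual setup:

  # start the cache daemon, then mount an NFS 4.2 export with caching enabled
  sudo systemctl start cachefilesd
  sudo mount -t nfs4 -o vers=4.2,fsc nfs-server:/export/home /mnt/test
  # walking the tree is usually enough to hit the hang on the affected 6.12 kernels
  ls -lR /mnt/test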
Some extra info:
* Kernel 6.12.5-200.fc41 has the exact same problem
* aarch64 also has the exact same problem
I have the same issue. I am on:

  Linux 6.12.9-200.fc41.x86_64
  nfs-utils-2.8.1-4.rc2.fc41.x86_64

I start up cachefilesd, and I have:

  NV SERVER   PORT DEV  FSID               FSC
  v4 0a0c0e01 801  0:90 5c95aeb110ab56f0:0 yes
  v4 0a0c0e1e 801  0:76 e228d38d2b7a0f8c:0 yes

I have files (newly created AFTER cachefilesd is started) in /var/cache/fscache:

  find /var/cache/fscache/ -type f | wc
       58      58    9135

But very shortly after trying to access anything on my NFS share, the process accessing the share hangs. dmesg and journalctl don't show any errors.

I do have some 'stuck' processes that seem to have started up at the time of the filesystem hang:

  root  2177  678  0 11:21 ?  00:00:00 systemd-nsresourcework: waiting...
  root  2178  678  0 11:21 ?  00:00:00 systemd-nsresourcework: waiting...
  root  2182  678  0 11:21 ?  00:00:00 systemd-nsresourcework: waiting...
  root  2183  678  0 11:21 ?  00:00:00 systemd-nsresourcework: waiting...
  root  2186  678  0 11:21 ?  00:00:00 systemd-nsresourcework: waiting...
  root  2198  679  0 11:22 ?  00:00:00 systemd-userwork: waiting...
  root  2199  679  0 11:22 ?  00:00:00 systemd-userwork: waiting...
  root  2200  679  0 11:22 ?  00:00:00 systemd-userwork: waiting...

I am not 100% certain that they are related, but I don't recall seeing those before, the system is definitely 'waiting' on something, and they appeared at about the time I had the issue.

This is the FIRST time I've ever looked at cachefilesd, so I have no idea whether it worked on a different kernel version.
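Since nothing shows up in the logs, it might help to capture kernel backtraces of the hung tasks the next time this happens. A rough sketch (the PID is just an example taken from the listing above; run as root):

  # kernel stack of one hung process
  cat /proc/2177/stack
  # or dump all tasks stuck in uninterruptible sleep into the kernel log
  echo 1 > /proc/sys/kernel/sysrq
  echo w > /proc/sysrq-trigger
  journalctl -k -n 200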
I downloaded the packages using 'koji download-build' and installed them:

  dnf install kernel-modules-core-6.11.4-301.fc41.x86_64 \
              kernel-core-6.11.4-301.fc41.x86_64 \
              kernel-modules-6.11.4-301.fc41.x86_64 \
              kernel-tools-libs-6.11.4-301.fc41.x86_64 \
              kernel-tools-6.11.4-301.fc41.x86_64 \
              kernel-modules-extra-6.11.4-301.fc41.x86_64 \
              kernel-6.11.4-301.fc41.x86_64

then booted up with the older kernel-6.11.4-301.fc41.x86_64 kernel. uname reports:

  Linux 6.11.4-301.fc41.x86_64

Now I have 1500+ files in my /var/cache/fscache dir (vs. 50 when it started and hung up) and my NFS stuff works and does not hang, so I can say that the older 6.11.4-301 kernel works with the cachefilesd stuff.

I checked again and I still have:

  root  4223  670  0 12:16 ?  00:00:00 systemd-nsresourcework: waiting...
  root  4224  671  0 12:16 ?  00:00:00 systemd-userwork: waiting...
  root  4225  670  0 12:16 ?  00:00:00 systemd-nsresourcework: waiting...
  root  4226  671  0 12:16 ?  00:00:00 systemd-userwork: waiting...
  root  4227  671  0 12:16 ?  00:00:00 systemd-userwork: waiting...
  root  4228  670  0 12:16 ?  00:00:00 systemd-nsresourcework: waiting...
  root  4229  670  0 12:16 ?  00:00:00 systemd-nsresourcework: waiting...
  root  4230  670  0 12:16 ?  00:00:00 systemd-nsresourcework: waiting...

but the NFS stuff seems to be working, so I don't think those are related to the issue with the newer 6.12 kernel.
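For anyone else who wants to roll back, a sketch of the downgrade procedure, assuming the koji CLI is installed ('dnf install koji') and using the same build NVR as above:

  # fetch the x86_64 RPMs of the known-good build from Koji
  koji download-build --arch=x86_64 kernel-6.11.4-301.fc41
  # install from the current directory (skip the -debug variants if you don't want them),
  # then reboot and pick the 6.11.4-301 entry in the boot menu
  sudo dnf install ./kernel*-6.11.4-301.fc41.x86_64.rpm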
I'm seeing the same on 6.12.9-200.fc41.x86_64 with cachefilesd enabled.
(rolled back to 6.11.11-300.fc41.x86_64 which works fine)
It seems kernel 6.12.10 solved the problem; at least, I'm no longer able to trigger it.
Correction: it happens a lot less frequently.
The very same problem reappeared instantly and repeatably on kernel 6.15 on Fedora 42.
Seconded, with all of 6.15.{1..3} affected.

Actually, it's even worse: after a reboot into 6.15.3, even before any NFS is actually mounted, a 'systemctl stop cachefilesd' can already hang the whole machine. So you need to 'systemctl disable' it before anyone boots into the fresh kernel.

As we heavily use NFS caching, the recurrence of this is really a pain. But it seems a little-used feature outside of academia...
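Concretely, something like this, run while still on a known-good kernel, before the updated kernel gets booted (a sketch; the mask step is an extra precaution, not something we have verified as strictly necessary):

  # do NOT 'systemctl stop' it on an already affected kernel, that can hang the box
  sudo systemctl disable cachefilesd
  # optional, stronger: refuse any start attempt until explicitly unmasked
  sudo systemctl mask cachefilesd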
(In reply to Bert DeKnuydt from comment #9)
> Actually, it's even worse: after a reboot into 6.15.3, even before any NFS
> is actually mounted, a 'systemctl stop cachefilesd' can already hang the
> whole machine. So you need to 'systemctl disable' it before anyone boots
> into the fresh kernel.

Identical behaviour here. It is really not fun when the users who left the lab in the evening and authorized the automatic package update at reboot all come up to you howling that their machine has frozen...

> As we heavily use NFS caching, the recurrence of this is really a pain.
> But it seems a little-used feature outside of academia...

At this point, I'm considering simply removing cachefilesd and being done with it. The perpetual risk of hosing the whole lab during an ordinary upgrade-and-reboot cycle (which has already occurred several times) can't reasonably be justified without hard performance numbers in favour of keeping it enabled, or a glaring difference in responsiveness, which I really don't see...
FYI: 6.15.4, with quite a few NFS fixes, still suffers from this.

@Francesco: As for the performance of cachefilesd: we measured no increased responsiveness on the NFS client (in fact the opposite: a bit more latency), but a lot less traffic to the NFS server. And that makes it worthwhile for us. When it works.
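If you want rough numbers for your own lab before deciding, a sketch of the kind of counters that can be compared with and without the 'fsc' mount option (not exactly what we used, and counter names and locations vary a bit between kernel versions):

  # FS-Cache hit/miss counters (needs CONFIG_FSCACHE_STATS; newer kernels also have /proc/fs/netfs/stats)
  cat /proc/fs/fscache/stats
  # client-side NFS operation counts, to compare traffic towards the server
  nfsstat -c
  # how much data ended up in the local cache
  du -sh /var/cache/fscache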