Created attachment 1556046 (crash log)

The machine becomes unstable or locks up completely when a large number of specific NFSv4 file operations take place. NFSv3 seems unaffected. Any user account with access to an NFSv4 share appears able to crash the server completely within about 5-30 s.

This was tested by admins at the company I work for, on VMs and bare-metal servers, and by me on VMware. We were able to crash several (virtual) machines running recent Fedora (FC) releases within seconds. Sometimes files were lost after the crash, or the whole volume could not be mounted. Strangely, the damage sometimes also affected files that were never written to (source code files, or the app file itself, which should not be damaged because the app was running).

Version-Release number of the kernel:
- 5.0.7-200.fc29.x86_64 #1 SMP (Fedora 29, also other sub-versions)
- 5.0.6-200.fc29.x86_64 #1 SMP (Fedora 29 Server Edition)

I am not sure when the bug first appeared, but it works OK on FC26. I am also not very familiar with Fedora. The crash usually occurs at posix_lock_inode+0x4cf/0x8c0, and all of these (besides one case) were fresh system installs with only the NFS client and NFS server added. Logs are attached; I am also sending a VMware screenshot in case it helps: https://www.screencast.com/t/w06VPrBap

I wrote an app that crashes the server and reproduces this bug. I am not sure whether I should release it, given the severity and the possible data loss. I can also share the complete VMware VM. If you think that is OK, I can post it on GitHub. I have also identified a single, specific file operation that causes the crash when run alongside other file operations. I am not sure whether I should disclose it publicly, because then the crash can be reproduced.

The crashing component is the NFS server (the client works fine; when tested on two machines it just locks up after the server is gone). I am reporting this as urgent because anyone with access to any NFS share on new FC installs can crash the server or cause data loss.
I discovered this bug while working with an SQLite database over an NFS share. There it takes around 5-10 minutes to crash our production server(s), probably because the filesystem load was not as high as the load generated by the app. So servers can also crash on their own when load is high and this specific file operation takes place.
Some more info about the issue. I tested this with Debian 9 (Linux debian 4.9.0-8-amd64 #1 SMP), and it definitely looks like a kernel issue, because there the bug manifests exactly as it did when we used NFSv4 on Fedora 26. Instead of kernel errors or the machine being killed, the NFS server loses the ability to write anything to the share. After issuing "echo 123 > t.txt" the shell hangs indefinitely, and some files can be read but not all (maybe because some cache is still working properly). This happened to us in production just a couple of times. The only error seen in dmesg is "nfs4_reclaim_open_state: Lock reclaim failed!". It kills NFS 4.2 and 4.1: NFS 4.2 takes around 5-10 s to be killed and 4.1 around a minute. After that the server needs a restart; /etc/init.d (stop/start/reload) seems to have no effect. The very strange thing is that if the client connects using NFS 4.0, everything works fine ("mount -t nfs -o vers=4.0 127.0.0.1:/home /homenfs/") and there are no errors in dmesg. Maybe that is because 4.0 is slower; the server also seems harder to kill when nfsd debugging is enabled. This also seems related: https://bugzilla.kernel.org/show_bug.cgi?id=115521
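For anyone triaging saved logs, here is a minimal Python sketch that scans kernel log text for the two symptoms quoted so far: the "Lock reclaim failed!" message and a posix_lock_inode frame. The function name, the dictionary keys, and the sample log lines are made up for illustration; only the two message fragments come from this report.

```python
import re

# The two signatures quoted in this report. The dictionary keys are
# hypothetical labels chosen for this sketch.
SIGNATURES = {
    "lock_reclaim_failed": re.compile(
        r"nfs4_reclaim_open_state: Lock reclaim failed!"
    ),
    "posix_lock_inode_frame": re.compile(
        r"posix_lock_inode\+0x[0-9a-f]+/0x[0-9a-f]+"
    ),
}


def scan_kernel_log(text):
    """Return {label: [matching lines]} for the given log text."""
    hits = {label: [] for label in SIGNATURES}
    for line in text.splitlines():
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[label].append(line.strip())
    return hits


# Illustrative input only: these lines are invented around the quoted
# fragments and are not copied from the attached crash log.
sample_log = "\n".join([
    "[  101.000000] RIP: 0010:posix_lock_inode+0x4cf/0x8c0",
    "[  202.000000] nfs4_reclaim_open_state: Lock reclaim failed!",
    "[  303.000000] some unrelated message",
])

for label, lines in scan_kernel_log(sample_log).items():
    print(label, len(lines))
```

In practice the text would come from saved dmesg or journal output instead of the inline sample.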
Another update on the issue. I tested this on Fedora 30 Server Edition (5.0.7-300.fc30.x86_64 #1 SMP) and the same thing happens. I am posting the test/exploit code; there are also more logs on GitHub (compilation instructions are at the top): https://github.com/slawomir-pryczek/drbd_kill NFS 4.1 and 4.2 are affected. NFS 4.0 and NFS 3 are working fine.
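Since only NFS 4.1 and 4.2 were affected in these tests, it may help to list which client mounts use one of those versions. The sketch below parses text in the /proc/mounts format and assumes the negotiated version appears there as a "vers=" option, which may not hold on every kernel. The function names and sample lines are hypothetical.

```python
# Versions reported as affected in this thread; 4.0 and 3 were not.
AFFECTED_VERSIONS = {"4.1", "4.2"}


def nfs_mount_versions(mounts_text):
    """Return [(mountpoint, version_or_None)] for NFS entries.

    Expects /proc/mounts-style lines:
    device mountpoint fstype options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        version = None
        for option in fields[3].split(","):
            # Assumption: the version is exposed as "vers=<x.y>".
            if option.startswith("vers="):
                version = option[len("vers="):]
        result.append((fields[1], version))
    return result


def affected_mounts(mounts_text):
    """Return mountpoints whose NFS version is in AFFECTED_VERSIONS."""
    return [
        mountpoint
        for mountpoint, version in nfs_mount_versions(mounts_text)
        if version in AFFECTED_VERSIONS
    ]


# Illustrative sample, modelled on the mount command quoted earlier.
sample_mounts = "\n".join([
    "/dev/sda1 / ext4 rw,relatime 0 0",
    "127.0.0.1:/home /homenfs nfs4 rw,relatime,vers=4.2,rsize=1048576 0 0",
    "127.0.0.1:/home /mnt/v40 nfs4 rw,relatime,vers=4.0 0 0",
])

print(affected_mounts(sample_mounts))
```

On a live client the input would be the contents of /proc/mounts. Mounts reported with an affected version are candidates for remounting with "vers=4.0" until the server kernel is updated.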
I reported this to the kernel bug tracker, and it seems a patch is ready: https://bugzilla.kernel.org/show_bug.cgi?id=203363
kernel-tools-5.0.9-300.fc30 kernel-headers-5.0.9-300.fc30 kernel-5.0.9-300.fc30 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-e84f6c34da
kernel-tools-5.0.9-200.fc29 kernel-headers-5.0.9-200.fc29 kernel-5.0.9-200.fc29 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-1e8a4c6958
kernel-tools-5.0.9-100.fc28 kernel-headers-5.0.9-100.fc28 kernel-5.0.9-100.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2019-1b986880ea
kernel-5.0.9-300.fc30, kernel-headers-5.0.9-300.fc30, kernel-tools-5.0.9-300.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-e84f6c34da
kernel-5.0.9-100.fc28, kernel-headers-5.0.9-100.fc28, kernel-tools-5.0.9-100.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-1b986880ea
kernel-5.0.9-200.fc29, kernel-headers-5.0.9-200.fc29, kernel-tools-5.0.9-200.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-1e8a4c6958
kernel-5.0.9-301.fc30 kernel-headers-5.0.9-300.fc30 kernel-tools-5.0.9-300.fc30 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-e84f6c34da
kernel-5.0.9-301.fc30, kernel-headers-5.0.9-300.fc30, kernel-tools-5.0.9-300.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-e84f6c34da
kernel-5.0.9-200.fc29, kernel-headers-5.0.9-200.fc29, kernel-tools-5.0.9-200.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.
kernel-5.0.9-301.fc30, kernel-headers-5.0.9-300.fc30, kernel-tools-5.0.9-300.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.
kernel-5.0.9-100.fc28, kernel-headers-5.0.9-100.fc28, kernel-tools-5.0.9-100.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.
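To check whether a Fedora machine already runs one of the updated builds, a small Python sketch can compare the running kernel release string against 5.0.9. It assumes that any Fedora kernel whose upstream version is at least 5.0.9 carries the fix. It ignores the package release suffix (such as -300 vs. -301) and says nothing about other distributions. The function names are hypothetical.

```python
import re

# Upstream version of the update builds listed in this thread.
UPDATED_VERSION = (5, 0, 9)


def upstream_version(release):
    """Extract (major, minor, patch) from a string like '5.0.7-200.fc29.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def has_updated_kernel(release):
    """True if the upstream version is at least UPDATED_VERSION.

    Assumption: the package release suffix does not matter.
    """
    return upstream_version(release) >= UPDATED_VERSION


for release in ("5.0.7-200.fc29.x86_64", "5.0.9-301.fc30.x86_64"):
    print(release, has_updated_kernel(release))
```

The release string of the running kernel is available from `platform.release()` in Python or from `uname -r`.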