Red Hat Bugzilla – Bug 1269390
fs/nfsd/nfs4state.c:3937 nfsd4_process, and lib/list_debug.c:53 __list_del_entry+0x70/0xe0
Last modified: 2015-10-20 16:01 EDT
Created attachment 1080538 [details]
Extract from journalctl: long before the crash, there is a kernel bug at fs/nfsd/nfs4state.c:3937 nfsd4_process_open
Description of problem:
Our file server has been crashing or freezing without warning for the past few weeks. It happens with Fedora 21 on a 4.x kernel and with Fedora 22 (which I installed on Sep 27, 2015).
Sometimes the server crashes again after a few hours, sometimes only after several days.
The crash always seems to involve the nfsd kernel module. I am attaching several files with extracts from journalctl, all from the same boot (between a reboot and the following crash). They contain several error messages; after the last one in the list, the system crashed (froze).
Could you please help us fix the problem? Is there a kernel that fixes it? Is this problem already known, or is it new?
Version-Release number of selected component (if applicable):
How reproducible:
Non-deterministic; the machine freezes after a random amount of time.
Steps to Reproduce:
1. Start a Fedora 22 system that exports ext4 file systems over NFS, with ACLs (facl) and quotas enabled (used for users' home directories).
2. Use NFS clients that mount and use this NFS server.
3. Wait until the server crashes (random time) - I don't know what triggers the crash.
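Since the trigger is unknown, one way to see which kernel code paths complain during a boot is to tally the source locations named in WARNING/BUG lines of a saved journal extract (for example, output of `journalctl -k`). The following is only a rough sketch: the function name `count_kernel_sites` and the lines in `SAMPLE` are made up for illustration and are not copied from the attachments; only the two file:line locations come from this report.

```python
import re
from collections import Counter

# Matches a kernel source location such as "lib/list_debug.c:53".
SITE_RE = re.compile(r"\b((?:[\w-]+/)*[\w-]+\.c):(\d+)\b")


def count_kernel_sites(lines):
    """Count (file, line) locations on lines that mention WARNING or BUG.

    Hypothetical helper: it only does text matching and knows nothing
    about the real structure of kernel oops output.
    """
    counts = Counter()
    for line in lines:
        if "WARNING" not in line and "BUG" not in line:
            continue  # skip lines such as the rpc-srv/tcp or perf messages
        match = SITE_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


# Invented sample lines, shaped loosely like kernel log output.
SAMPLE = [
    "kernel: WARNING: at fs/nfsd/nfs4state.c:3937 nfsd4_process_open",
    "kernel: rpc-srv/tcp: nfsd: sample unrelated message",
    "kernel: WARNING: at lib/list_debug.c:53 __list_del_entry",
    "kernel: WARNING: at lib/list_debug.c:53 __list_del_entry",
]

for (path, lineno), n in sorted(count_kernel_sites(SAMPLE).items()):
    print(f"{path}:{lineno} seen {n} time(s)")
```

To use it on a real extract, pass the lines of the saved file (for instance `open(path)`) instead of `SAMPLE`. A location that shows up repeatedly shortly before a freeze is a reasonable candidate to mention in the report.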
Created attachment 1080539 [details]
Extract from journalctl: long before the crash, there are strange messages about rpc-srv/tcp: nfsd
Created attachment 1080540 [details]
Extract from journalctl: long before the crash, there are strange messages about perf
Created attachment 1080541 [details]
Extract from journalctl just before the crash, with a kernel bug at lib/list_debug.c:53 __list_del_entry
The rpc and perf messages look unrelated; the list_debug warnings are probably the real cause. This looks like a dup of bug 1236688. Could you confirm after updating to 4.1.10?
*** This bug has been marked as a duplicate of bug 1236688 ***
Our server crashed again today (after 8 days without a crash), but it was running kernel 4.1.8. It shows the same error messages as reported here.
Now, after a reboot, the server is running kernel 4.1.10.
I don't know how to trigger this bug on the server. I can only wait and report in a few days whether the server has crashed again.
I may try the "Steps to Reproduce" from bug #1236688 to see whether I can reproduce the bug on another system with the new kernel and with an older one.
Created attachment 1084919 [details]
kernel trace of fs/nfsd/nfs4state.c:3956 nfsd4_process_open2
My server has been running kernel 4.1.10-200.fc22.x86_64 without a crash since 16 October 2015. However, I found a warning (see attachment).
I will add a new comment if the server crashes again with the error described in this bug report.
Thanks for your comments and patches!