Description of problem:
We have an NFSv4/Kerberos/LDAP setup at work. This server is the NFSv4 server. When RHEL 5.4 clients are connected, everything works as it should. When a Fedora 12 client is connected, the NFSv4 daemon hangs within a short while. The nfs service can't be restarted unless the F12 client is turned off.

These are the error messages in /var/log/messages:

Nov 9 12:53:25 artem kernel: slab error in kmem_cache_destroy(): cache `nfsd4_files': Can't free all objects
Nov 9 12:53:25 artem kernel:
Nov 9 12:53:25 artem kernel: Call Trace:
Nov 9 12:53:25 artem kernel: [<ffffffff800db76f>] kmem_cache_destroy+0x7e/0x177
Nov 9 12:53:25 artem kernel: [<ffffffff888abef9>] :nfsd:nfsd4_free_slab+0x11/0x4d
Nov 9 12:53:25 artem kernel: [<ffffffff888abf51>] :nfsd:nfsd4_free_slabs+0x1c/0x33
Nov 9 12:53:25 artem kernel: [<ffffffff888acec7>] :nfsd:nfs4_state_shutdown+0x17e/0x18a
Nov 9 12:53:25 artem kernel: [<ffffffff88896570>] :nfsd:nfsd_last_thread+0x45/0x76
Nov 9 12:53:25 artem kernel: [<ffffffff8877eff4>] :sunrpc:svc_destroy+0x77/0xad
Nov 9 12:53:25 artem kernel: [<ffffffff88896856>] :nfsd:nfsd+0x2b5/0x2cb
Nov 9 12:53:25 artem kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Nov 9 12:53:25 artem kernel: [<ffffffff888965a1>] :nfsd:nfsd+0x0/0x2cb
Nov 9 12:53:25 artem kernel: [<ffffffff888965a1>] :nfsd:nfsd+0x0/0x2cb
Nov 9 12:53:25 artem kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
Nov 9 12:53:25 artem kernel:
Nov 9 12:53:25 artem kernel: BUG: warning at fs/nfsd/nfs4state.c:1016/nfsd4_free_slab() (Tainted: P )
Nov 9 12:53:25 artem kernel:
Nov 9 12:53:25 artem kernel: Call Trace:
Nov 9 12:53:25 artem kernel: [<ffffffff888abf51>] :nfsd:nfsd4_free_slabs+0x1c/0x33
Nov 9 12:53:25 artem kernel: [<ffffffff888acec7>] :nfsd:nfs4_state_shutdown+0x17e/0x18a
Nov 9 12:53:25 artem kernel: [<ffffffff88896570>] :nfsd:nfsd_last_thread+0x45/0x76
Nov 9 12:53:25 artem kernel: [<ffffffff8877eff4>] :sunrpc:svc_destroy+0x77/0xad
Nov 9 12:53:25 artem kernel: [<ffffffff88896856>] :nfsd:nfsd+0x2b5/0x2cb
Nov 9 12:53:25 artem kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Nov 9 12:53:25 artem kernel: [<ffffffff888965a1>] :nfsd:nfsd+0x0/0x2cb
Nov 9 12:53:25 artem kernel: [<ffffffff888965a1>] :nfsd:nfsd+0x0/0x2cb
Nov 9 12:53:25 artem kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
Nov 9 12:53:25 artem kernel:
Nov 9 12:53:25 artem kernel: slab error in kmem_cache_destroy(): cache `nfsd4_delegations': Can't free all objects
Nov 9 12:53:25 artem kernel:
Nov 9 12:53:25 artem kernel: Call Trace:
Nov 9 12:53:25 artem kernel: [<ffffffff800db76f>] kmem_cache_destroy+0x7e/0x177
Nov 9 12:53:25 artem kernel: [<ffffffff888abef9>] :nfsd:nfsd4_free_slab+0x11/0x4d
Nov 9 12:53:25 artem kernel: [<ffffffff888acec7>] :nfsd:nfs4_state_shutdown+0x17e/0x18a
Nov 9 12:53:25 artem kernel: [<ffffffff88896570>] :nfsd:nfsd_last_thread+0x45/0x76
Nov 9 12:53:25 artem kernel: [<ffffffff8877eff4>] :sunrpc:svc_destroy+0x77/0xad
Nov 9 12:53:25 artem kernel: [<ffffffff88896856>] :nfsd:nfsd+0x2b5/0x2cb
Nov 9 12:53:25 artem kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Nov 9 12:53:25 artem kernel: [<ffffffff888965a1>] :nfsd:nfsd+0x0/0x2cb
Nov 9 12:53:25 artem kernel: [<ffffffff888965a1>] :nfsd:nfsd+0x0/0x2cb
Nov 9 12:53:25 artem kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
Nov 9 12:53:25 artem kernel:
Nov 9 12:53:25 artem kernel: BUG: warning at fs/nfsd/nfs4state.c:1016/nfsd4_free_slab() (Tainted: P )
Nov 9 12:53:25 artem kernel:
Nov 9 12:53:25 artem kernel: Call Trace:
Nov 9 12:53:25 artem kernel: [<ffffffff888acec7>] :nfsd:nfs4_state_shutdown+0x17e/0x18a
Nov 9 12:53:25 artem kernel: [<ffffffff88896570>] :nfsd:nfsd_last_thread+0x45/0x76
Nov 9 12:53:25 artem kernel: [<ffffffff8877eff4>] :sunrpc:svc_destroy+0x77/0xad
Nov 9 12:53:25 artem kernel: [<ffffffff88896856>] :nfsd:nfsd+0x2b5/0x2cb
Nov 9 12:53:25 artem kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Nov 9 12:53:25 artem kernel: [<ffffffff888965a1>] :nfsd:nfsd+0x0/0x2cb
Nov 9 12:53:25 artem kernel: [<ffffffff888965a1>] :nfsd:nfsd+0x0/0x2cb
Nov 9 12:53:25 artem kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
Nov 9 12:53:25 artem kernel:
Nov 9 12:53:25 artem kernel: nfsd: last server has exited
Nov 9 12:53:25 artem kernel: nfsd: unexporting all filesystems
Nov 9 12:53:48 artem kernel: kmem_cache_create: duplicate cache nfsd4_files
Nov 9 12:53:48 artem kernel:
Nov 9 12:53:48 artem kernel: Call Trace:
Nov 9 12:53:48 artem kernel: [<ffffffff800396ce>] kmem_cache_create+0x56a/0x5a4
Nov 9 12:53:48 artem kernel: [<ffffffff888acf25>] :nfsd:nfs4_state_start+0x52/0x18f
Nov 9 12:53:48 artem kernel: [<ffffffff888963ae>] :nfsd:nfsd_svc+0x6c/0x1e9
Nov 9 12:53:48 artem kernel: [<ffffffff88896f8e>] :nfsd:write_threads+0x0/0xa9
Nov 9 12:53:48 artem kernel: [<ffffffff88896ffd>] :nfsd:write_threads+0x6f/0xa9
Nov 9 12:53:48 artem kernel: [<ffffffff8002b8e9>] get_zeroed_page+0x21/0x82
Nov 9 12:53:48 artem kernel: [<ffffffff800f0e2b>] simple_transaction_get+0x8b/0xa5
Nov 9 12:53:48 artem kernel: [<ffffffff88896f8e>] :nfsd:write_threads+0x0/0xa9
Nov 9 12:53:48 artem kernel: [<ffffffff88896d59>] :nfsd:nfsctl_transaction_write+0x42/0x77
Nov 9 12:53:48 artem kernel: [<ffffffff80016927>] vfs_write+0xce/0x174
Nov 9 12:53:48 artem kernel: [<ffffffff800171df>] sys_write+0x45/0x6e
Nov 9 12:53:48 artem kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Nov 9 12:53:48 artem kernel:
Nov 9 12:53:48 artem nfsd[5493]: nfssvc: Cannot allocate memory

Version-Release number of selected component (if applicable):
kernel-2.6.18-164.3.1.el5
nfs-utils-1.0.9-42.el5
nfs-utils-lib-1.0.8-7.6.el5

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
I managed to find the root of the problem. One file did not have a valid uid/gid (it had a huge number instead), so when tracker found this file, the NFS server froze. After I changed the file to be owned by me, the problem never occurred again.
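For anyone hitting the same symptom, here is a minimal sketch of how such a file might be located on the exported filesystem. It walks a directory tree and reports entries whose uid or gid is at or above a cutoff. The function name `find_suspicious_owners` and the default cutoff of 2**31 are assumptions made for illustration; the report only says the ID was "a huge number", so adjust the threshold to your site's ID range.

```python
import os

# Assumption: IDs at or above this value count as suspicious.
# The cutoff is illustrative and does not come from the bug report.
SUSPICIOUS_ID = 2**31


def find_suspicious_owners(root, threshold=SUSPICIOUS_ID):
    """Yield (path, uid, gid) for entries under root whose uid or gid >= threshold."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: inspect symlinks themselves, do not follow them
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            if st.st_uid >= threshold or st.st_gid >= threshold:
                yield path, st.st_uid, st.st_gid
```

Calling `list(find_suspicious_owners("/path/to/export"))` on the server returns the offending entries, which can then be fixed with chown. The path here is a placeholder, and the root directory itself is not checked, only what lies beneath it.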
Espen,

This could be related to bz 519184. A uid/gid of 4294967294 confused idmapd on the NFS server, causing it to hang. A fix for this is available in the RHEL 5.5 beta. If possible, could you please test that version and report on bz 519184 whether it fixes the issue for you?

Sachin Prabhu
Marking as a duplicate of bug 519184 under the presumption that Sachin's analysis is correct. Please reopen if that's not the case.

*** This bug has been marked as a duplicate of bug 519184 ***