This bug has been copied from bug #589512 and has been proposed to be backported to 5.6 z-stream (EUS).
Patch included in kernel-2.6.18-238.12.1.el5: linux-2.6-fs-nfsd-fix-auth_domain-reference-leak-on-nlm-operations.patch
The bug is reproduced in 2.6.18-238.el5 and verified in 2.6.18-238.12.1.el5 (RHEL 5). This test uses one NFS client and one NFS host.

On the NFS client, test command:

[root@ibm-ls22-01 ~]# for i in {1..100}; do mount intel-s3e36-01.rhts.eng.rdu.redhat.com:/mnt/test /mnt/test; flock /mnt/test/lockfile -c "sleep 1" ; umount /mnt/test ; done

On the NFS host, test command:

stap -e 'probe module("sunrpc").function("auth_domain_lookup").return { printf("%s %d\n",kernel_string($return->name), $return->ref->refcount->counter);}'

The output is as follows:

====reproducer
[root@intel-s3e36-01 ~]# uname -a
Linux intel-s3e36-01.rhts.eng.rdu.redhat.com 2.6.18-238.el5 #1 SMP Sun Dec 19 14:22:44 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
[root@intel-s3e36-01 ~]# stap -e 'probe module("sunrpc").function("auth_domain_lookup").return { printf("%s %d\n",kernel_string($return->name), $return->ref->refcount->counter);}'
* 4
* 4
* 4
* 5
* 5
* 5
* 6
* 6
* 6
* 7
* 7
* 7
* 8
* 7
* 8
* 8
* 8

====verify
[root@intel-s3e36-01 ~]# uname -a
Linux intel-s3e36-01.rhts.eng.rdu.redhat.com 2.6.18-238.12.1.el5 #1 SMP Sat May 7 20:18:50 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
[root@intel-s3e36-01 ~]# stap -e 'probe module("sunrpc").function("auth_domain_lookup").return { printf("%s %d\n",kernel_string($return->name), $return->ref->refcount->counter);}'
* 4
* 4
* 4
* 4
* 4
* 4
* 4
* 4
* 4
* 4
* 4
* 4
* 4
* 4
* 4
* 4
* 4
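The comparison above was done by eye. A minimal sketch of how the same check could be scripted is given below, assuming the stap output has been saved to a file first. The function name refcount_trend and the file name stap.log are hypothetical and not part of the original test. It reports the first and last reference count seen per auth_domain name and flags a domain whose count has grown.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original test procedure.
# Reads "name refcount" lines (the format printed by the stap probe)
# on stdin and prints one summary line per auth_domain name.
refcount_trend() {
    awk '
        # Only consider lines of the form "<name> <integer>".
        NF == 2 && $2 ~ /^[0-9]+$/ {
            if (!($1 in first)) first[$1] = $2 + 0
            last[$1] = $2 + 0
        }
        END {
            for (name in first) {
                # A count that ends higher than it started suggests a leak.
                verdict = (last[name] > first[name]) ? "LEAK" : "OK"
                printf "%s first=%d last=%d %s\n", name, first[name], last[name], verdict
            }
        }
    '
}
```

Possible usage, once the probe output has been captured: refcount_trend < stap.log. For the reproducer output above this would be expected to flag the "*" domain, and for the verify output it would not. The check is only a heuristic, since a count can legitimately rise while other references are held.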
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0833.html
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
An NFS server uses reference-counted structures, called auth_domains, to identify which group of clients (for example, 192.168.0.0/24 or *.foo.edu) the client who sent an RPC request belongs to. The server NLM code incorrectly took an extra reference of the auth_domain associated with each NLM RPC request, and never dropped that reference. The reference count is an unsigned 32-bit value, so after 232 (about 4 billion) lock operations from the same client or group of clients, the reference count would overflow to 0, and the kernel would incorrectly think that the auth_domain should be freed. As a result, the kernel would panic. This update removes the extra reference-count increment from the server NLM code, and the kernel no longer panics.
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1 @@
-An NFS server uses reference-counted structures, called auth_domains, to identify which group of clients (for example, 192.168.0.0/24 or *.foo.edu) the client who sent an RPC request belongs to. The server NLM code incorrectly took an extra reference of the auth_domain associated with each NLM RPC request, and never dropped that reference. The reference count is an unsigned 32-bit value, so after 232 (about 4 billion) lock operations from the same client or group of clients, the reference count would overflow to 0, and the kernel would incorrectly think that the auth_domain should be freed. As a result, the kernel would panic. This update removes the extra reference-count increment from the server NLM code, and the kernel no longer panics.
+An NFS server uses reference-counted structures, called auth_domains, to identify which group of clients (for example, 192.168.0.0/24 or *.foo.edu) the client who sent an RPC request belongs to. The server NLM code incorrectly took an extra reference of the auth_domain associated with each NLM RPC request, and never dropped that reference. The reference count is an unsigned 32-bit value, so after 2^32 (about 4 billion) lock operations from the same client or group of clients, the reference count would overflow to 0, and the kernel would incorrectly think that the auth_domain should be freed. As a result, the kernel would panic. This update removes the extra reference-count increment from the server NLM code, and the kernel no longer panics.
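The wraparound described in the technical note can be illustrated with plain shell arithmetic. This is only a sketch of the arithmetic, not kernel code: it models an unsigned 32-bit counter by masking to 32 bits, and assumes the shell evaluates $(( )) with at least 64-bit integers. The function names incr_u32 and leaks_until_zero are hypothetical.

```shell
#!/bin/sh
# Model one increment of an unsigned 32-bit reference count.
# Assumption: shell arithmetic is at least 64 bits wide, so the
# intermediate value does not overflow before the mask is applied.
incr_u32() {
    echo $(( ($1 + 1) & 0xFFFFFFFF ))
}

# Number of leaked references (one per lock operation, per the note)
# needed to take a counter from the given starting value back to 0.
leaks_until_zero() {
    echo $(( (0x100000000 - $1) & 0xFFFFFFFF ))
}

incr_u32 4                # one leaked reference: 4 becomes 5
incr_u32 4294967295       # at the maximum value the count wraps to 0
leaks_until_zero 4        # leaks needed when starting from a count of 4
```

Starting from the count of 4 seen in the test output, the counter reaches 0 after 2^32 - 4 leaked references, which matches the "about 4 billion" figure in the note.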