Bug 697448 - slab corruption after seeing some nfs-related BUG: warning [rhel-5.6.z]
Summary: slab corruption after seeing some nfs-related BUG: warning [rhel-5.6.z]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Phillip Lougher
QA Contact: Jian Li
URL:
Whiteboard:
Depends On: 589512
Blocks:
Reported: 2011-04-18 10:33 UTC by RHEL Program Management
Modified: 2014-03-04 00:07 UTC

Fixed In Version: kernel-2.6.18-238.12.1.el5
Doc Type: Bug Fix
Doc Text:
An NFS server uses reference-counted structures, called auth_domains, to identify which group of clients (for example, 192.168.0.0/24 or *.foo.edu) the client who sent an RPC request belongs to. The server NLM code incorrectly took an extra reference of the auth_domain associated with each NLM RPC request, and never dropped that reference. The reference count is an unsigned 32-bit value, so after 2^32 (about 4 billion) lock operations from the same client or group of clients, the reference count would overflow to 0, and the kernel would incorrectly think that the auth_domain should be freed. As a result, the kernel would panic. This update removes the extra reference-count increment from the server NLM code, and the kernel no longer panics.
Clone Of:
Environment:
Last Closed: 2011-05-31 14:11:07 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2011:0833 0 normal SHIPPED_LIVE Important: kernel security and bug fix update 2011-05-31 14:05:42 UTC

Description RHEL Program Management 2011-04-18 10:33:25 UTC
This bug has been copied from bug #589512 and has been proposed
to be backported to 5.6 z-stream (EUS).

Comment 5 Phillip Lougher 2011-05-09 15:16:15 UTC
in kernel-2.6.18-238.12.1.el5

linux-2.6-fs-nfsd-fix-auth_domain-reference-leak-on-nlm-operations.patch

Comment 7 Jian Li 2011-05-19 10:28:26 UTC
The bug was reproduced on 2.6.18-238.el5 and verified as fixed on 2.6.18-238.12.1.el5 (RHEL 5).

This test uses one NFS client and one NFS server.
On the NFS client, the test command is:
[root@ibm-ls22-01 ~]# for i in {1..100}; do mount intel-s3e36-01.rhts.eng.rdu.redhat.com:/mnt/test /mnt/test; flock /mnt/test/lockfile -c "sleep 1" ; umount /mnt/test ; done

On the NFS server, the test command is:
stap -e 'probe module("sunrpc").function("auth_domain_lookup").return { printf("%s %d\n",kernel_string($return->name), $return->ref->refcount->counter);}'

The output is as follows:
====reproducer
[root@intel-s3e36-01 ~]# uname -a
Linux intel-s3e36-01.rhts.eng.rdu.redhat.com 2.6.18-238.el5 #1 SMP Sun Dec 19 14:22:44 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
[root@intel-s3e36-01 ~]# stap -e 'probe module("sunrpc").function("auth_domain_lookup").return { printf("%s %d\n",kernel_string($return->name), $return->ref->refcount->counter);}'
* 4
* 4
* 4
* 5
* 5
* 5
* 6
* 6
* 6
* 7
* 7
* 7
* 8
* 7
* 8
* 8
* 8
====verify
[root@intel-s3e36-01 ~]# uname -a
Linux intel-s3e36-01.rhts.eng.rdu.redhat.com 2.6.18-238.12.1.el5 #1 SMP Sat May 7 20:18:50 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
[root@intel-s3e36-01 ~]# stap -e 'probe module("sunrpc").function("auth_domain_lookup").return { printf("%s %d\n",kernel_string($return->name), $return->ref->refcount->counter);}'
* 4
* 4
* 4
* 4
* 4
* 4
* 4
* 4
* 4
* 4
* 4
* 4
* 4
* 4
* 4
* 4
* 4

Comment 8 errata-xmlrpc 2011-05-31 14:11:07 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0833.html

Comment 9 Martin Prpič 2011-06-02 13:34:34 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
An NFS server uses reference-counted structures, called auth_domains, to identify which group of clients (for example, 192.168.0.0/24 or *.foo.edu) the client who sent an RPC request belongs to. The server NLM code incorrectly took an extra reference of the auth_domain associated with each NLM RPC request, and never dropped that reference. The reference count is an unsigned 32-bit value, so after 232 (about 4 billion) lock operations from the same client or group of clients, the reference count would overflow to 0, and the kernel would incorrectly think that the auth_domain should be freed. As a result, the kernel would panic. This update removes the extra reference-count increment from the server NLM code, and the kernel no longer panics.

Comment 10 J. Bruce Fields 2011-06-02 15:06:29 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-An NFS server uses reference-counted structures, called auth_domains, to identify which group of clients (for example, 192.168.0.0/24 or *.foo.edu) the client who sent an RPC request belongs to. The server NLM code incorrectly took an extra reference of the auth_domain associated with each NLM RPC request, and never dropped that reference. The reference count is an unsigned 32-bit value, so after 232 (about 4 billion) lock operations from the same client or group of clients, the reference count would overflow to 0, and the kernel would incorrectly think that the auth_domain should be freed. As a result, the kernel would panic. This update removes the extra reference-count increment from the server NLM code, and the kernel no longer panics.
+An NFS server uses reference-counted structures, called auth_domains, to identify which group of clients (for example, 192.168.0.0/24 or *.foo.edu) the client who sent an RPC request belongs to. The server NLM code incorrectly took an extra reference of the auth_domain associated with each NLM RPC request, and never dropped that reference. The reference count is an unsigned 32-bit value, so after 2^32 (about 4 billion) lock operations from the same client or group of clients, the reference count would overflow to 0, and the kernel would incorrectly think that the auth_domain should be freed. As a result, the kernel would panic. This update removes the extra reference-count increment from the server NLM code, and the kernel no longer panics.

