Bug 697448
Summary: | slab corruption after seeing some nfs-related BUG: warning [rhel-5.6.z] | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | RHEL Program Management <pm-rhel> |
Component: | kernel | Assignee: | Phillip Lougher <plougher> |
Status: | CLOSED ERRATA | QA Contact: | Jian Li <jiali> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 5.3 | CC: | anton, bfields, dhoward, ekuric, james.brown, jiali, jlayton, jthomas, kchoi, lwoodman, nmurray, pm-eus, qcai, rmitchel, rwheeler, sprabhu, steved, tao, tumeya, vfalico, vgaikwad, yanwang |
Target Milestone: | rc | Keywords: | OtherQA, Reopened, ZStream |
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | kernel-2.6.18-238.12.1.el5 | Doc Type: | Bug Fix |
Doc Text: |
An NFS server uses reference-counted structures, called auth_domains, to identify which group of clients (for example, 192.168.0.0/24 or *.foo.edu) the client who sent an RPC request belongs to. The server NLM code incorrectly took an extra reference of the auth_domain associated with each NLM RPC request, and never dropped that reference. The reference count is an unsigned 32-bit value, so after 2^32 (about 4 billion) lock operations from the same client or group of clients, the reference count would overflow to 0, and the kernel would incorrectly think that the auth_domain should be freed. As a result, the kernel would panic. This update removes the extra reference-count increment from the server NLM code, and the kernel no longer panics.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2011-05-31 14:11:07 UTC | Type: | --- |
Bug Depends On: | 589512 | ||
Bug Blocks: |
Description
RHEL Program Management
2011-04-18 10:33:25 UTC
in kernel-2.6.18-238.12.1.el5

linux-2.6-fs-nfsd-fix-auth_domain-reference-leak-on-nlm-operations.patch

The bug is reproduced in 2.6.18-238.el5 and verified in 2.6.18-238.12.1.el5 (RHEL6).

This test uses one NFS client and one NFS host.

On the NFS client, the test command is:

[root@ibm-ls22-01 ~]# for i in {1..100}; do mount intel-s3e36-01.rhts.eng.rdu.redhat.com:/mnt/test /mnt/test; flock /mnt/test/lockfile -c "sleep 1" ; umount /mnt/test ; done

On the NFS host, the test command is:

stap -e 'probe module("sunrpc").function("auth_domain_lookup").return { printf("%s %d\n",kernel_string($return->name), $return->ref->refcount->counter);}'

Output is as follows:

====reproducer
[root@intel-s3e36-01 ~]# uname -a
Linux intel-s3e36-01.rhts.eng.rdu.redhat.com 2.6.18-238.el5 #1 SMP Sun Dec 19 14:22:44 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
[root@intel-s3e36-01 ~]# stap -e 'probe module("sunrpc").function("auth_domain_lookup").return { printf("%s %d\n",kernel_string($return->name), $return->ref->refcount->counter);}'
* 4
* 4
* 4
* 5
* 5
* 5
* 6
* 6
* 6
* 7
* 7
* 7
* 8
* 7
* 8
* 8
* 8

====verify
[root@intel-s3e36-01 ~]# uname -a
Linux intel-s3e36-01.rhts.eng.rdu.redhat.com 2.6.18-238.12.1.el5 #1 SMP Sat May 7 20:18:50 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
[root@intel-s3e36-01 ~]# stap -e 'probe module("sunrpc").function("auth_domain_lookup").return { printf("%s %d\n",kernel_string($return->name), $return->ref->refcount->counter);}'
* 4
* 4
* 4
* 4
* 4
* 4
* 4
* 4
* 4
* 4
* 4
* 4
* 4
* 4
* 4
* 4
* 4

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0833.html

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
An NFS server uses reference-counted structures, called auth_domains, to identify which group of clients (for example, 192.168.0.0/24 or *.foo.edu) the client who sent an RPC request belongs to. The server NLM code incorrectly took an extra reference of the auth_domain associated with each NLM RPC request, and never dropped that reference. The reference count is an unsigned 32-bit value, so after 232 (about 4 billion) lock operations from the same client or group of clients, the reference count would overflow to 0, and the kernel would incorrectly think that the auth_domain should be freed. As a result, the kernel would panic. This update removes the extra reference-count increment from the server NLM code, and the kernel no longer panics.

Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1 @@
-An NFS server uses reference-counted structures, called auth_domains, to identify which group of clients (for example, 192.168.0.0/24 or *.foo.edu) the client who sent an RPC request belongs to. The server NLM code incorrectly took an extra reference of the auth_domain associated with each NLM RPC request, and never dropped that reference. The reference count is an unsigned 32-bit value, so after 232 (about 4 billion) lock operations from the same client or group of clients, the reference count would overflow to 0, and the kernel would incorrectly think that the auth_domain should be freed. As a result, the kernel would panic. This update removes the extra reference-count increment from the server NLM code, and the kernel no longer panics.
+An NFS server uses reference-counted structures, called auth_domains, to identify which group of clients (for example, 192.168.0.0/24 or *.foo.edu) the client who sent an RPC request belongs to. The server NLM code incorrectly took an extra reference of the auth_domain associated with each NLM RPC request, and never dropped that reference. The reference count is an unsigned 32-bit value, so after 2^32 (about 4 billion) lock operations from the same client or group of clients, the reference count would overflow to 0, and the kernel would incorrectly think that the auth_domain should be freed. As a result, the kernel would panic. This update removes the extra reference-count increment from the server NLM code, and the kernel no longer panics. |
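
The wraparound described in the technical note can be illustrated with a small user-space model. This is only a sketch, not the kernel code: `struct domain`, `domain_get`, `domain_put`, `handle_request` and `requests_until_free` are hypothetical names invented for this example, and the counter is modelled as a plain `uint32_t` as the note describes. Each simulated request takes one reference and drops one; with the leak enabled it also takes one extra reference that is never dropped, so the count grows by one per request until it wraps to 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a reference-counted auth_domain. */
struct domain {
    uint32_t refcount;
};

/* Take a reference. Unsigned arithmetic wraps modulo 2^32. */
static void domain_get(struct domain *d)
{
    d->refcount++;
}

/* Drop a reference. Returns 1 when the count reaches 0, which is
 * the point at which the object would be treated as free. */
static int domain_put(struct domain *d)
{
    return --d->refcount == 0;
}

/* One simulated lock request. `leak` models the extra reference
 * that was taken per request and never dropped. */
static int handle_request(struct domain *d, int leak)
{
    domain_get(d);            /* reference taken for this request */
    if (leak)
        domain_get(d);        /* extra reference, never released */
    return domain_put(d);     /* reference dropped when the request ends */
}

/* Run up to max_requests simulated requests starting from `start`.
 * Returns the 1-based number of the request whose put reported
 * "free", or 0 if that never happened. */
uint32_t requests_until_free(uint32_t start, int leak, uint32_t max_requests)
{
    struct domain d = { .refcount = start };

    assert(max_requests > 0);
    for (uint32_t n = 0; n < max_requests; n++) {
        if (handle_request(&d, leak))
            return n + 1;
    }
    return 0;
}
```

Starting close to the top of the range keeps the run short: `requests_until_free(UINT32_MAX - 2, 1, 10)` returns 3, because the third leaking request wraps the counter through 0 even though other holders still exist. With the leak disabled, `requests_until_free(4, 0, 1000)` returns 0, which matches the steady count of 4 seen in the verification output.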