Bug 432393 - memory leak on size-8192 buckets with NFSV4
Summary: memory leak on size-8192 buckets with NFSV4
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.6
Hardware: x86_64
OS: Linux
Priority: high
Severity: medium
Target Milestone: rc
Assignee: Jeff Layton
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On: 423521
Blocks: 461297
 
Reported: 2008-02-11 19:33 UTC by Jeff Layton
Modified: 2009-05-18 19:19 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-05-18 19:19:08 UTC
Target Upstream Version:
Embargoed:


Attachments
patch -- fix several problems with callback thread shutdown (938 bytes, patch)
2008-02-19 18:35 UTC, Jeff Layton
patch -- fix several problems with callback thread shutdown (2.69 KB, patch)
2008-02-20 20:50 UTC, Jeff Layton


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2009:1024 0 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 4.8 kernel security and bug fix update 2009-05-18 14:57:26 UTC

Description Jeff Layton 2008-02-11 19:33:39 UTC
+++ This bug was initially created as a clone of Bug #423521 +++

Description of problem:

On the client side, we get a memory leak in the size-8192 buckets when running
the iozone benchmark with the -U option in an infinite loop against an NFSv4
mount.

With "slabtop -s c" we can see that the number of size-8192 buckets constantly
increases and never decreases.

At the beginning:
    33     33 100%    8.00K     33        1       264K size-8192
After half an hour:
    89     89 100%    8.00K     89        1       712K size-8192
After one hour:  
   174    172  98%    8.00K    174        1      1392K size-8192
After two hours:
   302    300  99%    8.00K    302        1      2416K size-8192
After four hours:
   533    533 100%    8.00K    533        1      4264K size-8192
After six and a half hours:
   804    804 100%    8.00K    804        1      6432K size-8192

Version-Release number of selected component (if applicable):
Linux version 2.6.18-53.el5 (brewbuilder.redhat.com) (gcc
version 4.1.2 20070626 (Red Hat 4.1.2-14)) #1 SMP Wed Oct 10 16:34:19 EDT 2007



How reproducible:

Do the NFSv4 mount and run the benchmark against the mounted directory:

It is possible to reproduce it on a single machine with an NFSv4 mount in loopback.
For example, on the nfs1gb machine:

mount -t nfs4 nfs1gb:/ /mnt/nosec

while true; do ./iozone -+q 30 -ace -r 64 -i 0 -i 1 -i 2 -f
/mnt/nosec/nfs1_nfs4_gb -U /mnt/nosec; date; sleep 30; done



The nfs1gb machine used is a two-way Intel x86_64 (64-bit) system.

Best Regards


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

-- Additional comment from jlayton on 2008-02-06 15:25 EST --
Finally got a chance to look over this today. I see a similar memory leak when
just mounting and unmounting an NFSv4 share in a loop:

# for i in `seq 1 100`; do mount /mnt/rhel4; umount /mnt/rhel4; done

...after this, the size-8192 slab gains ~100 more active objects. I'll have to
ponder how we can best track down what's actually doing these kmallocs. Maybe a
systemtap script that traps on kmalloc and does a dump_stack for any allocation
that is >4096 and <8192 bytes?
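
For illustration, a minimal sketch of that idea (untested; the script actually
used is the attachment in the next comment, and parameter names such as
kfree()'s "objp" vary with the kernel's allocator):

#!/usr/bin/stap
# Sketch: log backtraces for kmallocs that land in the size-8192 bucket
# and match them against their kfrees.  In a .return probe, $size carries
# the value the argument had at function entry.

global live

probe kernel.function("__kmalloc").return {
        if ($size > 4096 && $size <= 8192) {
                live[$return] = $size
                printf("size = %d, addr = %p\n", $size, $return)
                print_backtrace()
        }
}

# kfree()'s parameter is named "objp" in the SLAB allocator of this era.
probe kernel.function("kfree") {
        if ($objp in live) {
                printf("kfree: addr = %p\n", $objp)
                delete live[$objp]
        }
}

# On Ctrl-C, dump whatever was never freed.
probe end {
        foreach (addr in live)
                printf("leaked: addr = %p, size = %d\n", addr, live[addr])
}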

-- Additional comment from jlayton on 2008-02-06 16:12 EST --
Created an attachment (id=294150)
stap script for looking at size 8192 kmallocs and their corresponding kfrees

Systemtap script to try and track this down...


-- Additional comment from jlayton on 2008-02-06 16:17 EST --
Created an attachment (id=294151)
output from stap script

Output from stap script. It looks like we have 6 kmallocs and 5 kfrees. The
lingering kmalloc seems to be the second one that returned 0xffff8800025b8000.

Stack trace from print_backtrace is:

size = 4120, addr = 0xffff8800025b8000
Returning from: 0xffffffff802c9899 : __kmalloc+0x0/0x9f []
Returning to  : 0xffffffff802bd91a : __kzalloc+0x9/0x21 []
 0xffffffff802774b8 : kretprobe_trampoline_holder+0x0/0x2 []
 0xffffffff80404f07 : reqsk_queue_alloc+0x21/0x99 []
 0xffffffff8042bc7b : inet_csk_listen_start+0x1a/0x135 []
 0xffffffff8043c59b : inet_listen+0x42/0x68 []
 0xffffffff881e2464 : svc_makesock+0x127/0x183 [sunrpc]
 0xffffffff881e18b7 : svc_create+0xee/0xf8 [sunrpc]
 0xffffffff883711bf : nfs_callback_up+0x9c/0x14d [nfs]
 0xffffffff8834fe2f : nfs_get_client+0xfd/0x3df [nfs]
 0xffffffff88350158 : nfs4_set_client+0x47/0x173 [nfs]
 0xffffffff88350909 : nfs4_create_server+0x7a/0x393 [nfs]
 0xffffffff8025e823 : error_exit+0x0/0x6e []
 0xffffffff883573b4 : nfs_copy_user_string+0x3c/0x89 [nfs]
 0xffffffff88357cdc : nfs4_get_sb+0x1fc/0x323 [nfs]
 0xffffffff8020adff : get_page_from_freelist+0x32e/0x3bc []
 0xffffffff802cee21 : vfs_kern_mount+0x93/0x11a []
 0xffffffff802ceeea : do_kern_mount+0x36/0x4d []
 0xffffffff802d855b : do_mount+0x68c/0x6fc []
 0xffffffff80418c8b : __qdisc_run+0x36/0x1bb []
 0xffffffff8022bf6b : local_bh_enable+0x9/0xa5 []
 0xffffffff80230ebb : dev_queue_xmit+0x2f2/0x313 []
 0xffffffff80233001 : ip_output+0x29a/0x2dd []
 0xffffffff802628b1 : _spin_lock_irqsave+0x9/0x14 []
 0xffffffff802229d4 : __up_read+0x19/0x7f []
 0xffffffff802d72eb : copy_mount_options+0xce/0x127 []
 0xffffffff80297b70 : search_exception_tables+0x1d/0x2d []
 0xffffffff802654f6 : do_page_fault+0x10e7/0x12cc []
 0xffffffff80263786 : do_debug+0x70/0x151 []
 0xffffffff8020b663 : kfree+0x0/0xc5 []
 0xffffffff80263f9d : kprobe_handler+0x1ac/0x1c8 []
 0xffffffff80263ff4 : kprobe_exceptions_notify+0x3b/0x75 []
 0xffffffff802656fb : notifier_call_chain+0x20/0x32 []
 0xffffffff8020acc5 : get_page_from_freelist+0x1f4/0x3bc []
 0xffffffff8020f2d0 : __alloc_pages+0x65/0x2ce []
 0xffffffff8024c3cf : sys_mount+0x8a/0xcd []
 0xffffffff8025e2f1 : tracesys+0xa7/0xb2 []

I think there's a lot of garbage in there though, but that gives me some idea
of where to look...


-- Additional comment from jlayton on 2008-02-06 16:27 EST --
It looks like we're calling svc_setup_socket to create a socket for the nfs4
callback thread, but I don't see where that gets torn down. I suspect that's
where the problem is, but need to look a bit more closely.


-- Additional comment from jlayton on 2008-02-11 07:59 EST --
The problem seems to be with the sv_nrthreads count for the callback thread.
It's at 2 when we do the umount:

RPC: svc_destroy(NFSv4 callback, 2)

...so it doesn't actually tear down the socket or the svc_serv. nfsd and lockd
also use those functions and when they go down their refcounts seem to be OK...


-- Additional comment from jlayton on 2008-02-11 09:05 EST --
This appears to be an upstream bug too. On a rawhide machine after mounting and
unmounting: 

svc: svc_destroy(NFSv4 callback, 2)

...and I don't see where the socket got torn down. I think I see the problem:
svc_create() starts the service with sv_nrthreads == 1. Then svc_create_thread()
increments that count.

nfs_callback_up() isn't handling this correctly. It should be calling
svc_destroy() on success and failure, but it isn't. As an example,
lockd_up_proto() handles this correctly. I'll post a patch here soon that I can
propose upstream to fix this.
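
As a rough sketch of the pattern (modeled on lockd_up_proto(); illustrative
only, not the actual patch posted to this BZ, with locking and error handling
simplified):

/*
 * svc_create() hands back a svc_serv with sv_nrthreads == 1, and
 * svc_create_thread() takes a second reference for the new thread,
 * so the caller must drop its own reference on both the success and
 * the failure path.
 */
int nfs_callback_up(void)
{
        struct svc_serv *serv;
        int ret;

        serv = svc_create(&nfs4_callback_program, NFS4_CALLBACK_BUFSIZE);
        if (!serv)
                return -ENOMEM;

        ret = svc_makesock(serv, IPPROTO_TCP, 0);
        if (ret < 0)
                goto out_destroy;

        ret = svc_create_thread(nfs_callback_svc, serv); /* sv_nrthreads: 1 -> 2 */

out_destroy:
        /*
         * Drop the caller's reference unconditionally.  On success the
         * thread now holds the last reference; on failure this frees the
         * svc_serv (and its 8k buffer) that was previously being leaked.
         */
        svc_destroy(serv);
        return ret;
}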


-- Additional comment from jlayton on 2008-02-11 09:44 EST --
Created an attachment (id=294563)
patch -- fix reference counting for NFS4 callback thread

This patch seems to fix the problem on rawhide. Backporting to RHEL5 and RHEL4
should be trivial. I'm assuming RHEL4 has this problem too, though I need to
check. I'll clone this BZ if so.


-- Additional comment from jlayton on 2008-02-11 10:00 EST --
Patch posted upstream. Awaiting comment...


-- Additional comment from jlayton on 2008-02-11 11:05 EST --
Looks like Trond applied the patch, so I'll plan to propose a similar one for
RHEL5. It's a bit too late for RHEL5.2, but I'll try to make sure we get
something for 5.3.

I'll also plan to take this patch into my test kernels for you to test. Once I
do, I'll post a note here so that you can test them.

Comment 1 Jeff Layton 2008-02-11 19:38:23 UTC
For RHEL4, we'll need the same patch as RHEL5, and will also need the following
upstream patch. nfs_callback_svc() never calls svc_exit_thread(), so it's also
leaking memory in this situation:

commit f25bc34967d76610d17bc70769d7c220976eeeb1
Author: Trond Myklebust <Trond.Myklebust>
Date:   Mon Mar 20 13:44:46 2006 -0500

    NFSv4: Ensure nfs_callback_down() calls svc_destroy()
    
    Signed-off-by: Trond Myklebust <Trond.Myklebust>
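
The thread-side half of the fix, sketched for illustration (the authoritative
change is the upstream commit above):

/*
 * Sketch: the callback thread must call svc_exit_thread() on its way
 * out, releasing its svc_rqst and dropping the sv_nrthreads reference
 * taken when the thread was created.
 */
static void nfs_callback_svc(struct svc_rqst *rqstp)
{
        /* ... daemonize and main svc_recv()/svc_process() loop elided ... */

        svc_exit_thread(rqstp); /* frees rqstp, drops serv's thread count */
}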



Comment 2 Jeff Layton 2008-02-19 18:35:24 UTC
Created attachment 295316
patch -- fix several problems with callback thread shutdown

Untested patch. I'm planning to add this to my test kernels for a bit before
proposing internally...

Comment 3 Jeff Layton 2008-02-20 20:50:07 UTC
Created attachment 295447
patch -- fix several problems with callback thread shutdown

Posted the wrong patch to this BZ yesterday. This one is the real patch...

I've got kernels on my people page with this patch in case anyone is
interested:

http://people.redhat.com/jlayton

Comment 5 RHEL Program Management 2008-04-18 11:21:15 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 7 RHEL Program Management 2008-09-03 13:02:06 UTC
Updating PM score.

Comment 8 Vivek Goyal 2009-01-14 14:22:04 UTC
Committed in 78.28.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/

Comment 10 Jan Tluka 2009-05-05 16:13:10 UTC
Patch is in -89.EL kernel.

Comment 12 errata-xmlrpc 2009-05-18 19:19:08 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1024.html

