Bug 157028

Summary: Memory leak in rpc.idmapd
Product: Red Hat Enterprise Linux 4
Component: nfs-utils
Version: 4.0
Hardware: i386
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: medium
Reporter: Axel Thimm <axel.thimm>
Assignee: Jeff Layton <jlayton>
QA Contact: Ben Levenson <benl>
CC: chad_gatesman, kevin_m_lange, mef, steved
Keywords: Reopened
Fixed In Version: RHBA-2007-0316
Doc Type: Bug Fix
Last Closed: 2007-05-01 23:09:23 UTC
Attachments:
  patch 1 (flags: none)
  patch 2 (flags: none)

Description Axel Thimm 2005-05-06 11:29:41 UTC
Description of problem:

rpc.idmapd shows monotonic growth on a client accessing O(1000) mountpoints over
NFS:

USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
root     16731  0.0 21.1 441160 438276 ?     Ss   Apr29   2:06 rpc.idmapd

Version-Release number of selected component (if applicable):
nfs-utils-1.0.6-52
autofs-4.1.3-114

How reproducible:
always.

Steps to Reproduce:
1. Have each user's home directory as a separate mount point in autofs.
2. Access these home directories over NFS.
  
Actual results:
rpc.idmapd's memory usage grows by 40MB whenever all 1000 mountpoints are
accessed (some fail due to the 800 mountpoint limit). rpc.idmapd never reduces
its memory footprint.

Expected results:
rpc.idmapd should free the memory again, at the very least after automount
umounts the mountpoints.

Additional info:
The NFS server is a dated Tru64 UNIX cluster. Mounts are performed via autofs,
for example:
nukleon:/amd/nukleon/1/home/ag-hamprecht/thimm on /home/thimm type nfs (rw,nosuid,nodev,intr,proplist,udp,addr=160.45.32.130)

Comment 1 Axel Thimm 2005-05-20 11:01:16 UTC
This also occurs under RHEL4/x86_64, and has a larger memory leak per mount
(~50-60kB). I'm moving this to RHEL.

Comment 2 Michael Forrest 2005-10-01 02:25:13 UTC
I appear to be suffering from this leak, with automounts to Solaris 8 servers
(aka SunOS 5.8).

I am using RHEL 4 (Update 1) i386 on a Pentium 3 processor.

There are only O(10) mounted NFS filesystems, but after 2 days of uptime, the
rpc.idmapd process has grown to 5233 blocks (as reported in the SZ column of
'ps -l'). The size of this process appears to increase with the first read of
directories and files on just one of those mounts (it increased from 5170
blocks in a read traversal of O(8000) directories and files). It does not
appear to grow with repeated traversals of the same partition. However,
when the mount is unmounted through the action of automountd, accumulated
memory is not released.

Last week I had to reboot the box because it had become unusable (rpc.idmapd
had grown to over 25000 blocks over 2 weeks). This box only has 384 MB of
physical RAM.


Comment 5 Steve Dickson 2006-02-10 02:27:30 UTC
Is this still a problem? I have not been able to reproduce it in my testing...

Comment 6 Axel Thimm 2006-02-10 10:34:58 UTC
We turned off rpc.idmapd, later merged all mount points into one and finally
even decommissioned the TruCluster in favor of a Linux NFS server, so I cannot
provide any useful feedback anymore. Maybe Michael Forrest still has a setup to
test this.

Comment 7 Steve Dickson 2006-02-10 12:12:56 UTC
OK, for now I'm going to put this bug in the DEFERRED state. So if
I come across this problem in my travels or if other people start
to see this problem again, please feel free to REOPEN the bug...

Thank you for your patience!

Comment 8 Chad Gatesman 2006-05-02 17:01:53 UTC
Please reopen this.

I am seeing this very same problem on our servers. I am running RHEL ES 4
Update 2 (32-bit). I have a little over 100 automounted file systems from a
variety of OSes, but most are from Solaris 8 (SPARC). Here are my package
versions:

autofs-4.1.3-155
am-utils-6.0.9-15.RHEL4
nfs-utils-1.0.6-65.EL4

Let me know if there is anything else I can provide or do to help diagnose and
fix this problem.

Comment 9 RHEL Program Management 2006-08-18 17:43:42 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 10 Kevin Lange 2006-10-23 12:23:30 UTC
Can this be escalated so that a fix for RHEL4 is released as an erratum or
through the hotfix channel? We're seeing memory growth of 2.5GB after 2 days.

Comment 11 Jeff Layton 2006-10-24 19:46:51 UTC
This upstream list post looks like it might be relevant:

http://linux-nfs.org/pipermail/nfsv4/2006-August/004917.html

The list post makes it sound like the problem occurs primarily due to NFSv2/3
usage when rpc.idmapd is running.


Comment 12 Jeff Layton 2006-10-25 20:16:23 UTC
I've been playing with rpc.idmapd on a client, doing 300 mounts and unmounts.
After this, when I kill rpc.idmapd, valgrind says this:

==6115== 1,499,216 (284,880 direct, 1,214,336 indirect) bytes in 397 blocks are definitely lost in loss record 15 of 15
==6115==    at 0x400579F: realloc (vg_replace_malloc.c:306)
==6115==    by 0xA39BC1: scandir (in /lib/tls/libc-2.3.4.so)
==6115==    by 0x804AF1A: dirscancb (idmapd.c:308)
==6115==    by 0x804DB14: event_loop (event.c:210)
==6115==    by 0x804DBD8: event_dispatch (event.c:222)
==6115==    by 0x804C493: main (idmapd.c:293)

I'll have a closer look at this code tomorrow...


Comment 13 Jeff Layton 2006-10-26 15:44:41 UTC
Created attachment 139476 [details]
patch 1

The leak seems to be coming from the scandir() call. scandir() allocates an
array of directory entries via malloc, and each entry in it is malloc'd as
well. dirscancb() calls this function but frees neither the entries nor the
array when it is done.

This patch seems like it should fix the problem, but with it, I'm getting a
reproducible segfault in idmapd once the last filesystem is unmounted. This is
*probably* an existing bug that has only become evident now that we're freeing
things properly.

The segfault is occurring in this line of code:

	TAILQ_FOREACH(ic, icq, ic_next) {

so it seems like something with the list handling here isn't right.

Comment 14 Jeff Layton 2006-10-26 17:43:10 UTC
Created attachment 139493 [details]
patch 2

Yes indeed. This line:

	TAILQ_FOREACH(ic, icq, ic_next) {

unrolls into:

	for(ic=icq->tqh_first; ic != NULL; ic=ic->ic_next.tqe_next) {

...and within this loop we are freeing "ic", so the loop's increment step reads
ic->ic_next from memory that has already been freed. The easiest fix is to not
use the TAILQ_FOREACH macro so we can work around the free. This patch does
that and seems to avoid the segfault.
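
For readers unfamiliar with the pattern, here is a minimal sketch of walking a
TAILQ while freeing elements, written without TAILQ_FOREACH as this comment
suggests. It is not the attached patch: struct client, its stale flag, and the
functions client_add and prune_stale are hypothetical stand-ins, and only the
ic/icq/ic_next names are borrowed from the loop quoted above.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical stand-in for idmapd's per-client record. */
struct client {
    int stale;                      /* nonzero when the entry should be dropped */
    TAILQ_ENTRY(client) ic_next;    /* queue linkage */
};
TAILQ_HEAD(client_queue, client);

/* Append a new entry; returns 0 on success, -1 if malloc fails. */
int client_add(struct client_queue *icq, int stale)
{
    struct client *ic;

    assert(icq != NULL);

    ic = malloc(sizeof(*ic));
    if (ic == NULL)
        return -1;
    ic->stale = stale;
    TAILQ_INSERT_TAIL(icq, ic, ic_next);
    return 0;
}

/* Remove and free every stale entry; returns how many were removed. */
int prune_stale(struct client_queue *icq)
{
    struct client *ic = TAILQ_FIRST(icq);
    int removed = 0;

    while (ic != NULL) {
        /* Fetch the successor first, while ic is still valid memory. */
        struct client *next = TAILQ_NEXT(ic, ic_next);

        if (ic->stale) {
            TAILQ_REMOVE(icq, ic, ic_next);
            free(ic);
            removed++;
        }
        ic = next;                  /* never touches ic after the free */
    }
    return removed;
}
```

The key point is that the next pointer is read before free(ic) runs. Some BSD
versions of <sys/queue.h> provide a TAILQ_FOREACH_SAFE macro for the same
purpose, but it is not available everywhere, so the explicit loop is the more
portable choice.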

Comment 15 Jeff Layton 2006-10-26 18:18:53 UTC
I've placed i386, x86_64 and SRPM packages on my people page:

http://people.redhat.com/jlayton/bz157028/

Please test them and post here whether they seem to take care of the problem.
Also please post here if you need packages for other arches for testing.


Comment 16 Jeff Layton 2006-10-26 19:03:29 UTC
1.0.10 Patches posted to:

nfs.net

Subject: [NFS] [PATCH 1/2] idmapd: plug memory leak in dirscancb
Subject: [NFS] [PATCH 2/2] idmapd: fix use after free in dirscancb cleanup loop


Comment 19 Steve Dickson 2007-01-12 12:07:13 UTC
*** Bug 222421 has been marked as a duplicate of this bug. ***

Comment 23 Red Hat Bugzilla 2007-05-01 23:09:24 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0316.html