+++ This bug was initially created as a clone of Bug #212547 +++

+++ This bug was initially created as a clone of Bug #157028 +++

Description of problem:
rpc.idmapd shows monotonic memory growth on a client accessing O(1000) mountpoints over NFS:

USER       PID %CPU %MEM    VSZ    RSS TTY STAT START TIME COMMAND
root     16731  0.0 21.1 441160 438276 ?   Ss   Apr29 2:06 rpc.idmapd

Version-Release number of selected component (if applicable):
nfs-utils-1.0.6-52
autofs-4.1.3-114

How reproducible:
always.

Steps to Reproduce:
1. Have each user's home directory as a separate mount point in autofs
2. Access these home directories over NFS
3.

Actual results:
rpc.idmapd's memory usage grows by 40MB whenever all 1000 mountpoints are accessed (some fail due to the 800 mountpoint limit). rpc.idmapd never reduces its memory footprint.

Expected results:
rpc.idmapd should free the memory again, at the very least after automount umounts the mountpoints.

Additional info:
The NFS server is a dated TruUnix cluster. Mounts are performed over autofs, for example:

nukleon:/amd/nukleon/1/home/ag-hamprecht/thimm on /home/thimm type nfs (rw,nosuid,nodev,intr,proplist,udp,addr=160.45.32.130)

-- Additional comment from Axel.Thimm on 2005-05-20 07:01 EST --
This also occurs under RHEL4/x86_64, and has a larger memory leak per mount (~50-60kB). I'm moving this to RHEL.

-- Additional comment from mef on 2005-09-30 22:25 EST --
I appear to be suffering from this leak, with automounts to Solaris 8 servers (aka SunOS 5.8). I am using RHEL 4 (Update 1) i386 on a Pentium 3 processor. There are only O(10) mounted NFS filesystems, but after 2 days of uptime, the rpc.idmapd process has grown to 5233 blocks (as reported in the SZ column of 'ps -l'). The size of this process appears to increase with the first read of directories and files on just one of those mounts (it increased from 5170 blocks in a read traversal of O(8000) directories and files). It does not appear to grow with repeated traversals of the same partition.
However, when the mount is unmounted through the action of automountd, the accumulated memory is not released. Last week I had to reboot the box because it had become unusable (rpc.idmapd had grown to over 25000 blocks over 2 weeks). This box only has 384 MB of physical RAM.

-- Additional comment from poelstra on 2005-10-11 20:42 EST --
QE ACK

-- Additional comment from kanderso on 2005-10-20 16:27 EST --
Devel ACK and move to CanFix for U3.

-- Additional comment from steved on 2006-02-09 21:27 EST --
Is this still a problem? I have not been able to reproduce it in my testing...

-- Additional comment from Axel.Thimm on 2006-02-10 05:34 EST --
We turned off rpc.idmapd, later merged all mount points into one, and finally even decommissioned the TrueCluster in favor of a Linux NFS server, so I cannot provide any useful feedback anymore. Maybe Michael Forrest still has a setup to test this.

-- Additional comment from steved on 2006-02-10 07:12 EST --
OK, for now I'm going to put this bug in the DEFERRED state. If I come across this problem in my travels, or if other people start to see this problem again, please feel free to REOPEN the bug... Thank you for your patience!

-- Additional comment from chad_gatesman on 2006-05-02 13:01 EST --
Please reopen this. I am seeing this very same problem on our servers. I am running RHEL ES 4 Update 2 (32-bit). I have a little over 100 automounted file systems from a variety of OSes, but most are from Solaris 8 (Sparc). Here are my package versions:

autofs-4.1.3-155
am-utils-6.0.9-15.RHEL4
nfs-utils-1.0.6-65.EL4

Let me know if there is anything else I can provide or do to help diagnose and fix this problem.

-- Additional comment from pm-rhel on 2006-08-18 13:43 EST --
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release.
Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

-- Additional comment from Kevin_M_Lange on 2006-10-23 08:23 EST --
Can this be escalated to be a fix for RHEL4 as errata or through the hotfix channel? We're seeing memory growth of 2.5GB after 2 days.

-- Additional comment from jlayton on 2006-10-24 15:46 EST --
This upstream list post looks like it might be relevant:

http://linux-nfs.org/pipermail/nfsv4/2006-August/004917.html

The list post makes it sound like the problem occurs primarily due to NFSv2/3 usage when rpc.idmapd is running.

-- Additional comment from jlayton on 2006-10-25 16:16 EST --
Playing with rpc.idmapd on a client and doing 300 mounts and unmounts. After this, when I kill rpc.idmapd, valgrind says this:

==6115== 1,499,216 (284,880 direct, 1,214,336 indirect) bytes in 397 blocks are definitely lost in loss record 15 of 15
==6115==    at 0x400579F: realloc (vg_replace_malloc.c:306)
==6115==    by 0xA39BC1: scandir (in /lib/tls/libc-2.3.4.so)
==6115==    by 0x804AF1A: dirscancb (idmapd.c:308)
==6115==    by 0x804DB14: event_loop (event.c:210)
==6115==    by 0x804DBD8: event_dispatch (event.c:222)
==6115==    by 0x804C493: main (idmapd.c:293)

I'll have a closer look at this code tomorrow...

-- Additional comment from jlayton on 2006-10-26 11:44 EST --
Created an attachment (id=139476)
patch 1

The leak seems to be coming from the "scandir". scandir() allocates an array of strings via malloc. dirscancb() is calling this function, but isn't freeing the strings and the array when it's complete.

This patch seems like it should fix the problem, but with it, I'm getting a reproducible segfault in idmapd once the last filesystem is unmounted. This is *probably* an existing bug that's only evident now that we're freeing things properly.
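For reference, the cleanup that the patch 1 comment describes can be sketched roughly as follows. This is a minimal illustration of the general scandir() ownership rule (the caller frees each entry, then the array), not the contents of the attached patch. count_dir_entries is a hypothetical helper and does not appear in idmapd.c.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* request the POSIX.1-2008 scandir() declaration */
#endif
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from idmapd.c): count the entries in `path`
 * and release everything scandir() allocated on our behalf.
 * Returns the number of entries, or -1 if scandir() fails.
 */
int count_dir_entries(const char *path)
{
    struct dirent **ents = NULL;
    int n;

    assert(path != NULL);

    n = scandir(path, &ents, NULL, alphasort);
    if (n < 0)
        return -1;

    /* A real caller would inspect ents[i]->d_name here. */

    /* Each entry is malloc'ed separately, so free them one by one... */
    for (int i = 0; i < n; i++)
        free(ents[i]);
    /* ...and then free the array that held the pointers. */
    free(ents);

    return n;
}
```

Skipping either the per-entry free() calls or the final free(ents) would leave memory behind on every scan, which matches the shape of the valgrind report above (a direct loss for the array plus indirect losses for the entries).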
The segfault is occurring in this line of code:

    TAILQ_FOREACH(ic, icq, ic_next) {

so it seems like something with the list handling here isn't right.

-- Additional comment from jlayton on 2006-10-26 13:43 EST --
Created an attachment (id=139493)
patch 2

Yes indeed. This line:

    TAILQ_FOREACH(ic, icq, ic_next) {

unrolls into:

    for(ic=icq->tqh_first; ic != NULL; ic=ic->ic_next.tqe_next) {

...and within this loop we are freeing "ic". The easiest fix is to not use the TAILQ_FOREACH macro so we can work around the free. This patch does that and seems to avoid the segfault.

-- Additional comment from jlayton on 2006-10-26 14:18 EST --
I've placed i386, x86_64 and SRPM packages on my people page:

http://people.redhat.com/jlayton/bz157028/

Please test them and post here whether they seem to take care of the problem. Also please post here if you need packages for other arches for testing.

-- Additional comment from jlayton on 2006-10-26 15:03 EST --
1.0.10 Patches posted to: nfs.net

Subject: [NFS] [PATCH 1/2] idmapd: plug memory leak in dirscancb
Subject: [NFS] [PATCH 2/2] idmapd: fix use after free in dirscancb cleanup loop

-- Additional comment from jlayton on 2006-10-27 07:52 EST --
Going ahead and opening a RHEL5 bug on this. I've not actually tested RHEL5 to make sure this bug is there, but this bug exists upstream in the latest nfs-utils packages, so I expect that it does.

I pushed these patches to the nfs list yesterday, so I expect they'll make it upstream soon. If we've frozen the nfs-utils version for RHEL5, however, we should add these to it.

-- Additional comment from pm-rhel on 2006-10-27 08:00 EST --
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Major release. This request is not yet committed for inclusion.
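The use-after-free described in the patch 2 comment comes from the loop's increment step reading the link field of an element that the loop body has already freed. A minimal sketch of the usual workaround is to save the next pointer before freeing. The struct and function names here (client, add_client, remove_odd_clients) are hypothetical stand-ins, not the idmapd.c code or the attached patch. Some BSD-derived sys/queue.h headers offer TAILQ_FOREACH_SAFE for this purpose; the sketch does not assume it is available.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a per-mount record kept on a tail queue. */
struct client {
    int id;
    TAILQ_ENTRY(client) link;
};
TAILQ_HEAD(client_queue, client);

/* Append a new record; returns 0 on success, -1 if malloc() fails. */
int add_client(struct client_queue *q, int id)
{
    struct client *c;

    assert(q != NULL);
    c = malloc(sizeof(*c));
    if (c == NULL)
        return -1;
    c->id = id;
    TAILQ_INSERT_TAIL(q, c, link);
    return 0;
}

/*
 * Remove and free every record with an odd id; returns how many were freed.
 * The next pointer is fetched BEFORE the current element can be freed, so
 * the loop never dereferences freed memory.
 */
int remove_odd_clients(struct client_queue *q)
{
    int freed = 0;
    struct client *c;

    assert(q != NULL);
    c = TAILQ_FIRST(q);
    while (c != NULL) {
        struct client *next = TAILQ_NEXT(c, link);

        if (c->id % 2 != 0) {
            TAILQ_REMOVE(q, c, link);
            free(c);
            freed++;
        }
        c = next;
    }
    return freed;
}
```

With TAILQ_FOREACH, the equivalent of `c = TAILQ_NEXT(c, link)` would run after free(c), which is the pattern the patch 2 comment identifies as the cause of the segfault.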
-- Additional comment from jlayton on 2006-10-27 14:58 EST --
Tried to set devel_ack here, but I don't seem to have the right permissions. We have the patches to fix this already, though.

-- Additional comment from jturner on 2006-12-01 15:34 EST --
QE ack for RHEL5.

-- Additional comment from jlayton on 2006-12-04 07:34 EST --
Committed in 1.0.9-12...

-- Additional comment from pm-rhel on 2006-12-22 20:46 EST --
A package has been built which should help the problem described in this bug report. This report is therefore being closed with a resolution of CURRENTRELEASE. You may reopen this bug report if the solution does not work for you.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
*** This bug has been marked as a duplicate of 157028 ***