Bug 157028
Summary: Memory leak in rpc.idmapd
Product: Red Hat Enterprise Linux 4
Component: nfs-utils
Version: 4.0
Hardware: i386
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: medium
Keywords: Reopened
Reporter: Axel Thimm <axel.thimm>
Assignee: Jeff Layton <jlayton>
QA Contact: Ben Levenson <benl>
CC: chad_gatesman, kevin_m_lange, mef, steved
Fixed In Version: RHBA-2007-0316
Doc Type: Bug Fix
Last Closed: 2007-05-01 23:09:23 UTC
Description
Axel Thimm
2005-05-06 11:29:41 UTC
This also occurs under RHEL4/x86_64, where the leak per mount is larger (~50-60 kB). I'm moving this to RHEL.

I appear to be suffering from this leak, with automounts to Solaris 8 servers (aka SunOS 5.8). I am using RHEL 4 (Update 1) i386 on a Pentium 3 processor. There are only O(10) mounted NFS filesystems, but after 2 days of uptime, the rpc.idmapd process has grown to 5233 blocks (as reported in the SZ column of 'ps -l'). The size of this process appears to increase with the first read of directories and files on just one of those mounts (it increased from 5170 blocks in a read traversal of O(8000) directories and files). It does not appear to grow with repeated traversals of the same partition. However, when the mount is unmounted through the action of automountd, the accumulated memory is not released. Last week I had to reboot the box because it had become unusable (rpc.idmapd had grown to over 25000 blocks over 2 weeks). This box only has 384 MB of physical RAM.

Is this still a problem? I have not been able to reproduce it in my testing...

We turned off rpc.idmapd, later merged all mount points into one, and finally even decommissioned the TrueCluster in favor of a Linux NFS server, so I cannot provide any useful feedback anymore. Maybe Michael Forrest still has a setup to test this.

OK. For now, I'm going to put this bug in the DEFERRED state. If I come across this problem in my travels, or if other people start to see this problem again, please feel free to REOPEN the bug... Thank you for your patience!

Please reopen this. I am seeing this very same problem on our servers. I am running RHEL ES 4 Update 2 (32-bit). I have a little over 100 automounted file systems from a variety of OSes, but most are from Solaris 8 (SPARC). Here are my package versions:
autofs-4.1.3-155
am-utils-6.0.9-15.RHEL4
nfs-utils-1.0.6-65.EL4
Let me know if there is anything else I can provide or do to help diagnose and fix this problem.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Can this be escalated to be a fix for RHEL4 as errata or hotfix channel? We're seeing memory growth of 2.5GB after 2 days.

This upstream list post looks like it might be relevant:
http://linux-nfs.org/pipermail/nfsv4/2006-August/004917.html
The list post makes it sound like the problem occurs primarily due to NFSv2/3 usage when rpc.idmapd is running.

Playing with rpc.idmapd on a client and doing 300 mounts and unmounts. After this, when I kill rpc.idmapd, valgrind says this:
==6115== 1,499,216 (284,880 direct, 1,214,336 indirect) bytes in 397 blocks are definitely lost in loss record 15 of 15
==6115==    at 0x400579F: realloc (vg_replace_malloc.c:306)
==6115==    by 0xA39BC1: scandir (in /lib/tls/libc-2.3.4.so)
==6115==    by 0x804AF1A: dirscancb (idmapd.c:308)
==6115==    by 0x804DB14: event_loop (event.c:210)
==6115==    by 0x804DBD8: event_dispatch (event.c:222)
==6115==    by 0x804C493: main (idmapd.c:293)
I'll have a closer look at this code tomorrow...

Created attachment 139476
patch 1
The leak seems to be coming from the scandir() call. scandir() allocates an array
of directory entries (struct dirent pointers) via malloc. dirscancb() is calling
this function, but isn't freeing the entries and the array when it's complete.
This patch seems like it should fix the problem, but with it, I'm getting a
reproducible segfault in idmapd once the last filesystem is unmounted. This is
*probably* an existing bug that has only become evident now that we're freeing
things properly.
The segfault is occurring in this line of code:
TAILQ_FOREACH(ic, icq, ic_next) {
so it seems like something with the list handling here isn't right.
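For reference, the cleanup that scandir() expects from its caller can be sketched as below. This is a minimal illustration, not the attached patch: count_dir_entries is a hypothetical helper that does not exist in idmapd.c, and the real dirscancb() does more with the entries than count them.

```c
#define _POSIX_C_SOURCE 200809L  /* for scandir() and alphasort() */
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of idmapd.c): scan `path`, then release
 * everything scandir() allocated. Returns the number of entries found,
 * or -1 if the directory could not be scanned.
 */
int count_dir_entries(const char *path)
{
    struct dirent **ents = NULL;
    int n;

    assert(path != NULL);

    n = scandir(path, &ents, NULL, alphasort);
    if (n < 0)
        return -1;

    /* ... a real caller would inspect ents[0] .. ents[n-1] here ... */

    /* scandir() malloc()s each entry and the array itself: free both. */
    for (int i = 0; i < n; i++)
        free(ents[i]);
    free(ents);

    return n;
}
```

Freeing only the array, or only the entries, still leaks; valgrind reports the two as the "direct" and "indirect" parts of a loss record.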
Created attachment 139493
patch 2
Yes indeed. This line:
TAILQ_FOREACH(ic, icq, ic_next) {
unrolls into:
for(ic=icq->tqh_first; ic != NULL; ic=ic->ic_next.tqe_next) {
...and within this loop we are freeing "ic", so the loop increment then reads
ic->ic_next from memory that has already been freed. The easiest fix is to not
use the TAILQ_FOREACH macro so we can work around the free. This patch does that
and seems to avoid the segfault.
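The general shape of such a workaround, saving the successor before the current element can be freed, might look like the sketch below. It is an illustration under assumptions rather than the attached patch: struct client, its stale flag, client_add and client_reap are hypothetical stand-ins for idmapd's own structures, and only the TAILQ macros from <sys/queue.h> are real.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical stand-in for idmapd's per-client structure. */
struct client {
    int id;
    int stale;                    /* nonzero: entry should be released */
    TAILQ_ENTRY(client) ic_next;
};
TAILQ_HEAD(client_queue, client);

/* Append a client; returns 0 on success, -1 if allocation fails. */
int client_add(struct client_queue *icq, int id, int stale)
{
    struct client *ic = malloc(sizeof(*ic));

    if (ic == NULL)
        return -1;
    ic->id = id;
    ic->stale = stale;
    TAILQ_INSERT_TAIL(icq, ic, ic_next);
    return 0;
}

/* Remove and free every stale client; returns how many were freed. */
int client_reap(struct client_queue *icq)
{
    int freed = 0;
    struct client *ic;

    assert(icq != NULL);

    ic = TAILQ_FIRST(icq);
    while (ic != NULL) {
        /* Fetch the successor first: ic may not exist after this pass. */
        struct client *next = TAILQ_NEXT(ic, ic_next);

        if (ic->stale) {
            TAILQ_REMOVE(icq, ic, ic_next);
            free(ic);
            freed++;
        }
        ic = next;
    }
    return freed;
}
```

Some BSD versions of <sys/queue.h> provide a TAILQ_FOREACH_SAFE macro for this pattern, but it is not available everywhere, so the explicit loop is the more portable choice.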
I've placed i386, x86_64 and SRPM packages on my people page:
http://people.redhat.com/jlayton/bz157028/
Please test them and post here whether they seem to take care of the problem. Also please post here if you need packages for other arches for testing.

1.0.10 Patches posted to: nfs.net
Subject: [NFS] [PATCH 1/2] idmapd: plug memory leak in dirscancb
Subject: [NFS] [PATCH 2/2] idmapd: fix use after free in dirscancb cleanup loop

*** Bug 222421 has been marked as a duplicate of this bug. ***

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2007-0316.html