Bug 174847 - Memory leak in glibc NIS+ resolver when doing repetitive getpwent calls
Summary: Memory leak in glibc NIS+ resolver when doing repetitive getpwent calls
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: glibc
Version: 4.0
Hardware: i386
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 181409
 
Reported: 2005-12-02 18:36 UTC by Raviprasad
Modified: 2007-11-30 22:07 UTC (History)
2 users (show)

Fixed In Version: RHBA-2006-0510
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-08-10 21:33:48 UTC
Target Upstream Version:
Embargoed:


Attachments
Application which exposes the memory leak in the NIS+ resolver in glibc (3.13 KB, text/plain)
2005-12-02 18:38 UTC, Raviprasad
Valgrind on RHEL4.0 AS U2 configured to retrieve 50 users (3.60 KB, text/plain)
2005-12-06 00:46 UTC, Raviprasad
Valgrind on RHEL4.0 AS U2 configured to retrieve 1500 users (50.27 KB, text/plain)
2005-12-06 00:47 UTC, Raviprasad
Valgrind with verbose option on RHEL4.0 AS U2 configured to retrieve 50 users (56.33 KB, text/plain)
2005-12-06 00:48 UTC, Raviprasad
mtrace on RHAS 2.1 retrieving 75 users (13.45 KB, text/plain)
2005-12-06 00:49 UTC, Raviprasad
mtrace on RHAS 2.1 retrieving 1500 users (519.99 KB, text/plain)
2005-12-06 00:50 UTC, Raviprasad


Links
Red Hat Product Errata RHBA-2006:0510 (normal, SHIPPED_LIVE): glibc bug fix update, 2006-08-09 04:00:00 UTC

Description Raviprasad 2005-12-02 18:36:07 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.10) Gecko/20050716 Firefox/1.0.6

Description of problem:
We at VMware recently came across an issue where our application, which uses the glibc API calls getpwent, getgrent, getpwnam, and getgrnam, showed symptoms of a memory leak. Our team isolated the issue: it occurs only when the host is configured to authenticate against a NIS+ server. Our application makes a large number of the above-mentioned API calls to get the entire user and group list for internal use in the application.

The issue is seen when the NIS+ server is configured with 8000 or more users and a similar number of groups. The memory leak is not seen if the local user/group list is configured similarly to the NIS+ server with 8000+ entries. The leak soon (within a couple of hours) leads to an out-of-memory condition, which causes the process, and hence our application, to crash.

For ease of use and test purposes, we have built a simple utility that is a stripped-down version of the code the application uses to gather the user/group data. Similar memory leak symptoms can be seen with this simple utility as well. I am attaching its source code.
 
The hosts where this application has been tested are:
1) VMware ESX Server 2.5.1 using glibc version 2.2.4-32 and version 2.2.4-32.20
2) RHEL Workstation 3.0 U4
3) RHEL Advanced Server 4.0 U2

We could observe the memory leak in all of these cases.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1) Configure the NIS+ server with 8000+ users and groups
2) Configure a Linux host to authenticate against the NIS+ server. [Make the host a NIS+ client and configure nsswitch.conf to use NIS+; see the example after these steps.]
3) Compile the source code of the simple utility, attached to this report, using gcc [gcc -o nisMem NISMemoryChecking.c]
4) Open two terminal sessions.
    a) In one terminal session, run the simple utility as
        ./nisMem 0 2
              0 - The first parameter is the maximum loop count [One full iteration of gathering user/group list]
              2 - The second parameter is the sleep interval between two consecutive loops
    b) In the other terminal, run top | grep nis.
5) You will soon notice that the memory utilization of the program keeps rising and never goes down until the application is killed with "^C" or terminated through other mechanisms.
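
For step 2, a minimal nsswitch.conf fragment matching the configuration the reporter quotes later in comment 8 (the ordering of "files" before "nisplus" is taken from there, not a requirement):

passwd:     files nisplus
group:      files nisplus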

Actual Results: We observed the memory leak.

Additional info:

Comment 1 Raviprasad 2005-12-02 18:38:17 UTC
Created attachment 121773 [details]
Application which exposes the memory leak in the NIS+ resolver in glibc

Comment 2 Jakub Jelinek 2005-12-02 21:01:53 UTC
Neither RHEL nor any other Linux distro includes a NIS+ server, and the NIS+
client is included in RHEL as-is, unsupported.
If you want this to be fixed, please install glibc-debuginfo on RHEL4
(from
http://ftp.redhat.com/pub/redhat/linux/updates/enterprise/4AS/en/os/Debuginfo/
) and run valgrind on the application to find out where exactly the memory leaks
come from.  Alternatively or additionally, you can stick an mtrace () call
at the beginning of main () and run the program under mtrace.
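
For anyone reproducing this, a minimal sketch of the mtrace() approach suggested above; the file names and arguments are placeholders, not anything mandated by glibc:

#include <mcheck.h>      /* declares mtrace() and muntrace() */

int main (void)
{
  mtrace ();             /* log malloc/free pairs to the file named by $MALLOC_TRACE */
  /* ... the getpwent()/getgrent() loops under test ... */
  muntrace ();
  return 0;
}

Then run the program with MALLOC_TRACE set and post-process the log with the mtrace(1) script shipped with glibc:

MALLOC_TRACE=/tmp/nis.mtrace ./nisMem 1 2
mtrace ./nisMem /tmp/nis.mtrace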

Comment 3 Raviprasad 2005-12-06 00:46:48 UTC
Created attachment 121876 [details]
Valgrind on RHEL4.0 AS U2 configured to retrieve 50 users

Comment 4 Raviprasad 2005-12-06 00:47:36 UTC
Created attachment 121877 [details]
Valgrind on RHEL4.0 AS U2 configured to retrieve 1500 users

Comment 5 Raviprasad 2005-12-06 00:48:33 UTC
Created attachment 121878 [details]
Valgrind with verbose option on RHEL4.0 AS U2 configured to retrieve 50 users

Comment 6 Raviprasad 2005-12-06 00:49:20 UTC
Created attachment 121879 [details]
mtrace on RHAS 2.1 retrieving 75 users

Comment 7 Raviprasad 2005-12-06 00:50:01 UTC
Created attachment 121880 [details]
mtrace on RHAS 2.1 retrieving 1500 users

Comment 8 Raviprasad 2005-12-06 00:51:38 UTC
The NIS+ server is set up on a Sun Solaris server. Sorry for not specifying this
info earlier. I have attached the valgrind outputs from running the application
accessing around 25 users and 1500 users on RHEL4.0 Update 2.

valgrindoutput.txt - Test app limited to accessing 50 total users, 24 local and
the rest from the NIS+ server.

valgrindoutput-1500.txt - Test app limited to accessing 1500 total users, 24
local and the rest from the NIS+ server. Syntax used for valgrind:
valgrind --leak-check=yes --tool=memcheck ./test.o

valgrindoutput-1500-verb.txt - same as valgrindoutput-1500.txt but with the
verbose option turned on in valgrind. Syntax used for valgrind:
valgrind -v --show-reachable=yes --leak-check=yes --tool=memcheck ./test.o

The test was run after installing the glibc-debuginfo-2.3.4.2-13 on the RHEL4.0
U2 server. 

Performed the following test on an RHAS 2.1 server:
Rebuilt glibc from glibc-2.2.4-32.20.src.rpm and ran the test app with 75 users
and 1500 users, with malloc tracing enabled using mtrace(). Results are attached.

mtrace-75.txt - Output of mtrace with 75 users (42 local users).
mtrace-1500.txt - Output of mtrace with 1500 users (42 local users).

Here is the source of the test app.
#include <pwd.h>
#include <stdio.h>
#include <mcheck.h>   /* for mtrace() */

#define USER_COUNT 50

int main (void)
{
  struct passwd *pwent;
  int i = 0;

  mtrace ();
  setpwent ();
  for (i = 0; i < USER_COUNT; i++) {
      pwent = getpwent ();
      if (pwent == NULL)   /* stop when the enumeration runs out of entries */
          break;
      printf ("USER %s \n", pwent->pw_name);
  }
  endpwent ();
  return 0;
}


nsswitch.conf is set up for passwd and group as:
passwd:     files nisplus
group:      files nisplus

We would like to have a fix for RHAS 2.1 since our application is based on glibc
from RHAS 2.1.

Thanks.

Comment 11 Raviprasad 2005-12-06 23:35:30 UTC
I found an issue in sunrpc/auth_des.c.
In function authdes_pk_create,

 if (ckey == NULL)
    {
      if (key_gendes (&auth->ah_key) < 0)
        {
          debug ("authdes_create: unable to gen conversation key");
          return NULL;   <----
        }
    }

If key_gendes returns a negative value, then the memory allocation for auth and
ad is not cleaned up. Shouldn't it instead be

 if (ckey == NULL)
    {
      if (key_gendes (&auth->ah_key) < 0)
        {
          debug ("authdes_create: unable to gen conversation key");
          goto failed;
        }
    }
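
For context, this is roughly what the cleanup path jumped to by that goto would have to do; the names below are a sketch based on the excerpt above, not the exact glibc source (glibc's authdes_pk_create allocates through its own wrapper macros, and the real label may also have to release other fields of ad):

failed:
  if (ad != NULL)
    free (ad);       /* the struct ad_private allocated earlier in the function */
  if (auth != NULL)
    free (auth);     /* the AUTH handle allocated earlier in the function */
  return NULL;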

Thanks.

Comment 12 Ulrich Drepper 2005-12-07 04:09:14 UTC
The auth_des.c case is certainly wrong.  I've changed it upstream.

Comment 14 Jakub Jelinek 2005-12-09 13:33:03 UTC
An RHEL4 U3 (proposed) rpm with the last fortnight's NIS+ fixes is at:
ftp://people.redhat.com/jakub/glibc/2.3.4-2.17.174847/
Can you please check it out and test it?
Thanks.
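
A hypothetical way to test such a proposed package on the affected host, assuming the usual glibc/glibc-common pair from the URL above (the exact file names will differ, and any other installed glibc subpackages such as glibc-devel and glibc-headers would need to be updated in the same transaction):

rpm -Fvh glibc-2.3.4-2.17.174847.*.rpm glibc-common-2.3.4-2.17.174847.*.rpm
./nisMem 0 2     # re-run the reproducer and watch its memory use in top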

Comment 15 Jakub Jelinek 2005-12-09 13:38:01 UTC
Perhaps bad wording, sorry. The RHEL4 U3 proposed rpm is 2.3.4-2.17, and
the backports of recent changes were applied on top of it.  While -2.17
is currently undergoing QA and has already seen some testing, the NIS+
changes on top of it are mostly untested, so we'd appreciate any feedback
on problems you encounter.

Comment 16 Jakub Jelinek 2005-12-22 13:55:22 UTC
Ping.

Comment 17 Michael Waite 2005-12-22 14:58:25 UTC
Who are we waiting on?

Comment 19 Jakub Jelinek 2006-01-16 10:43:54 UTC
Without testing feedback we really can't include the NIS+ fixes in any updates.
I'll close this bug as WONTFIX if no feedback is provided within the next 14 days.

Comment 36 Jakub Jelinek 2006-04-03 20:59:27 UTC
We found a fatal bug in glibc-nis+.patch.  Fixed testing rpms uploaded to:
RHEL4 U3 glibc with NIS+ fixes:
ftp://people.redhat.com/jakub/glibc/2.3.4-2.19.174847/
RHEL3 U7 glibc with NIS+ fixes:
ftp://people.redhat.com/jakub/glibc/2.3.2-95.39.184362.2/


Comment 43 Red Hat Bugzilla 2006-08-10 21:33:48 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0510.html


