From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.10) Gecko/20050716 Firefox/1.0.6

Description of problem:
We at VMware recently came across an issue where our application, which uses the glibc API calls getpwent, getgrent, getpwnam, and getgrnam, showed symptoms of a memory leak. Our team isolated the issue: it occurs only when the host is configured to authenticate against a NIS+ server. Our application makes a large number of the above-mentioned API calls to retrieve the entire user and group list for internal use.

The issue is seen when the NIS+ server is configured with 8000 or more users and a similar number of groups. The memory leak is not seen if the local user/group list is configured similarly to the NIS+ server with 8000+ entries. The leak soon (within a couple of hours) leads to an out-of-memory condition, which causes the process, and hence our application, to crash.

For ease of use and test purposes, we have built a simple utility that is a stripped-down version of the code the application uses to gather user/group data. The same memory leak symptoms can be seen with this simple utility as well. I am attaching its source code.

The hosts where this application has been tested are:
1) VMware ESX Server 2.5.1 using glibc version 2.2.4-32 and version 2.2.4-32.20
2) RHEL Workstation 3.0 U4
3) RHEL Advanced Server 4.0 U2
We could observe the memory leak in all cases.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1) Configure the NIS+ server with 8000+ users and groups.
2) Configure a Linux host to authenticate against the NIS+ server. [Make the host a NIS+ client and configure nsswitch.conf to use NIS+.]
3) Compile the source code of the simple utility, attached with this email, using gcc [gcc -o nisMem NISMemoryChecking.c].
4) Open two terminal sessions.
   a) In one terminal session, run the simple utility as ./nisMem 0 2
      0 - the first parameter is the maximum loop count [one full iteration of gathering the user/group list]
      2 - the second parameter is the sleep interval between two consecutive loops
   b) In the other terminal, run top | grep nis.
5) You will soon notice that the memory utilization of the program keeps rising constantly and never goes down until the application is killed with "^C" or terminated through other mechanisms.

Actual Results: We could observe the memory leak.

Additional info:
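For context, the enumeration pattern the utility exercises boils down to a full pass over the passwd and group databases via getpwent()/getgrent(). A minimal self-contained sketch (the real utility repeats this in a loop with a sleep between iterations; the counting and printing here are illustrative only):

    /* Minimal sketch: enumerate all users and groups once via NSS.
       With nsswitch.conf pointing at nisplus, each pass goes through
       the NIS+ resolver, which is where the leak shows up. */
    #include <pwd.h>
    #include <grp.h>
    #include <stdio.h>

    int main(void)
    {
        struct passwd *pw;
        struct group *gr;
        int users = 0, groups = 0;

        setpwent();
        while ((pw = getpwent()) != NULL)   /* one pass over all users */
            users++;
        endpwent();

        setgrent();
        while ((gr = getgrent()) != NULL)   /* one pass over all groups */
            groups++;
        endgrent();

        printf("users=%d groups=%d\n", users, groups);
        return 0;
    }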
Created attachment 121773 [details] Application which exposes the memory leak in the NIS+ resolver in glibc
Neither RHEL nor any other Linux distro includes a NIS+ server, and the NIS+ client is included in RHEL just as-is, unsupported. If you want this to be fixed, please install glibc-debuginfo on RHEL4 (from http://ftp.redhat.com/pub/redhat/linux/updates/enterprise/4AS/en/os/Debuginfo/ ) and run valgrind on the application to find out exactly where the memory leaks come from. Alternatively or additionally, you can stick an mtrace () call at the beginning of main () and run the program under mtrace.
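The suggested mtrace workflow looks roughly like the following sketch (leaky_work is a hypothetical stand-in for the real application code): run the binary with MALLOC_TRACE pointing at a log file, then post-process the log with the mtrace(1) script, e.g. mtrace ./a.out /tmp/mtrace.log.

    /* Sketch: instrument a program with mtrace().  Run as
       MALLOC_TRACE=/tmp/mtrace.log ./a.out, then inspect the log. */
    #include <mcheck.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static void leaky_work(void)
    {
        char *buf = malloc(64);   /* deliberately never freed */
        strcpy(buf, "leak");
        printf("%s\n", buf);
    }

    int main(void)
    {
        mtrace();      /* start logging malloc/free to $MALLOC_TRACE */
        leaky_work();
        muntrace();    /* stop tracing; unfreed blocks remain in the log */
        return 0;
    }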
Created attachment 121876 [details] Valgrind on RHEL4.0 AS U2 configured to retrieve 50 users
Created attachment 121877 [details] Valgrind on RHEL4.0 AS U2 configured to retrieve 1500 users
Created attachment 121878 [details] Valgrind with verbose option on RHEL4.0 AS U2 configured to retrieve 50 users
Created attachment 121879 [details] mtrace on RHAS 2.1 retrieving 75 users
Created attachment 121880 [details] mtrace on RHAS 2.1 retrieving 1500 users
The NIS+ server is set up on a Sun Solaris server. Sorry for not specifying that info earlier.

I have attached the valgrind outputs from running the application accessing around 50 users and 1500 users on RHEL4.0 Update 2:

valgrindoutput.txt - test app limited to accessing 50 total users, 24 local and the rest from the NIS+ server.
valgrindoutput-1500.txt - test app limited to accessing 1500 total users, 24 local and the rest from the NIS+ server.
Syntax used for valgrind:
  valgrind --leak-check=yes --tool=memcheck ./test.o
valgrindoutput-1500-verb.txt - same as valgrindoutput-1500.txt but with the verbose option turned on in valgrind.
Syntax used for valgrind:
  valgrind -v --show-reachable=yes --leak-check=yes --tool=memcheck ./test.o

The test was run after installing glibc-debuginfo-2.3.4.2-13 on the RHEL4.0 U2 server.

Performed the following test on a RHAS 2.1 server: rebuilt glibc from glibc-2.2.4-32.20.src.rpm and ran the test app with 75 users and 1500 users with malloc tracing enabled using mtrace(). Results are attached.

mtrace-75.txt - output of mtrace with 75 users (42 local users).
mtrace-1500.txt - output of mtrace with 1500 users (42 local users).

Here is the source of the test app (with the includes, return type, and a NULL check added so it compiles cleanly and does not crash when fewer than USER_COUNT entries exist):

  #include <pwd.h>
  #include <stdio.h>
  #include <mcheck.h>

  #define USER_COUNT 50

  int main(void)
  {
      struct passwd *pwent;
      int i = 0;

      mtrace();
      setpwent();
      for (i = 0; i < USER_COUNT; i++) {
          pwent = getpwent();
          if (pwent == NULL)
              break;   /* fewer than USER_COUNT entries available */
          printf("USER %s \n", pwent->pw_name);
      }
      endpwent();
      return 0;
  }

nsswitch.conf is set up for passwd and group as:
  passwd: files nisplus
  group:  files nisplus

We would like to have a fix for RHAS 2.1, since our application is based on glibc from RHAS 2.1. Thanks.
I found an issue in sunrpc/auth_des.c. In function authdes_pk_create:

  if (ckey == NULL)
    {
      if (key_gendes (&auth->ah_key) < 0)
        {
          debug ("authdes_create: unable to gen conversation key");
          return NULL;    <----
        }
    }

If key_gendes returns a negative value, then the memory allocated for auth and ad is not cleaned up. Shouldn't it instead be:

  if (ckey == NULL)
    {
      if (key_gendes (&auth->ah_key) < 0)
        {
          debug ("authdes_create: unable to gen conversation key");
          goto failed;
        }
    }

Thanks.
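The suggested goto failed is the standard C error-path cleanup idiom: every allocation made before the failure is released at a single exit label. A minimal self-contained sketch (struct and function names are illustrative, not the real auth_des.c code; fail != 0 simulates key_gendes returning a negative value):

    #include <stdio.h>
    #include <stdlib.h>

    struct auth { char *ah_key; };

    static struct auth *auth_create(int fail)
    {
        struct auth *auth = malloc(sizeof *auth);
        if (auth == NULL)
            return NULL;
        auth->ah_key = malloc(16);
        if (auth->ah_key == NULL)
            goto failed;
        if (fail)                /* stands in for key_gendes () < 0 */
            goto failed;         /* releases everything allocated so far */
        return auth;

    failed:
        free(auth->ah_key);      /* free(NULL) is a no-op, so always safe */
        free(auth);
        return NULL;
    }

    int main(void)
    {
        printf("%s\n", auth_create(1) == NULL ? "cleaned up" : "leaked");
        return 0;
    }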
The auth_des.c case is certainly wrong. I've changed it upstream.
A RHEL4 U3 (proposed) rpm with recent fortnight of NIS+ fixes is at: ftp://people.redhat.com/jakub/glibc/2.3.4-2.17.174847/ Can you please check it out and test it? Thanks.
Perhaps bad wording, sorry. The RHEL4 U3 proposed rpm is 2.3.4-2.17, and the backports of recent changes were applied on top of it. While -2.17 is currently undergoing QA and has already seen some testing, the NIS+ changes on top of it are mostly untested, so we'd appreciate feedback on any problems you encounter.
Ping.
who are we waiting on?
Without testing feedback we really can't include the NIS+ fixes in any updates. I'll close this bug as WONTFIX if no feedback is provided within the next 14 days.
We found a fatal bug in glibc-nis+.patch. Fixed testing rpms uploaded to: RHEL4 U3 glibc with NIS+ fixes: ftp://people.redhat.com/jakub/glibc/2.3.4-2.19.174847/ RHEL3 U7 glibc with NIS+ fixes: ftp://people.redhat.com/jakub/glibc/2.3.2-95.39.184362.2/
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2006-0510.html