Bug 103142 - [GLIBC AS2.1] getpwuid() leaks memory when "compat" is used in nsswitch.conf
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 2.1
Classification: Red Hat
Component: glibc
Version: 2.1
Hardware: i686 Linux
Priority: medium
Severity: medium
Assigned To: Jakub Jelinek
QA Contact: Brian Brock
Reported: 2003-08-26 17:51 EDT by Ian McLeod
Modified: 2016-11-24 09:52 EST (History)
Doc Type: Bug Fix
Last Closed: 2003-10-03 14:24:20 EDT

Attachments
testbed_getpwnam (4.93 KB, text/plain)
2003-10-02 14:02 EDT, Eric Hagberg
testbed_getpwnam_nscd (2.18 KB, text/plain)
2003-10-02 14:03 EDT, Eric Hagberg

Description Ian McLeod 2003-08-26 17:51:03 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003

Description of problem:
Machines configured to use compat mode for passwd (through /etc/nsswitch.conf)
leak memory with each call to getpwuid() that results in an NIS lookup. 
(Lookups that are satisfied by the local passwd file, before going to NIS, do
not show this behavior.)

If nscd is running, the leak occurs in that daemon and not the process calling
getpwuid().  

Version-Release number of selected component (if applicable):
All versions through most recent errata

How reproducible:
Always

Steps to Reproduce:
Configure a machine to use NIS in compat mode.  
Add a "+::::::" line to the end of the password file.
Stop nscd.
Run a C program that does an infinite loop of getpwuid() calls
(call must be passed a UID that exists only in NIS and not in the local password
file)

Actual Results:  Process memory consumption grows until it fails with ENOMEM
errors.  (If nscd is running it will leak memory instead, albeit at a slower
rate due to caching.) 

Expected Results:  All but the first call to getpwuid() should have resulted in
no new memory being allocated.  Instead, it should be returning a pointer to the
existing passwd struct for the ID in question.

Additional info:

I have discussed this bug with Ulrich over IRC.  He indicated that it was a
known glibc issue when dealing with compat mode.  He further indicated that it
is fixed in the upstream glibc.
Comment 1 Ulrich Drepper 2003-08-26 18:22:16 EDT
The leak I remember was fixed in

2003-04-23  Ulrich Drepper  <drepper@redhat.com>

	* nis/ypclnt.c (__yp_bind): Expect YPDB parameter to always be !=
	NULL.  Remove code made redundant by this assumption.
	(__yp_unbind): Add call to free.  Adjust all callers.

	* nis/ypclnt.c (yp_all): Free the dom_binding object after
	unbinding it.

	* grp/initgroups.c (getgrouplist): Don't copy too much into the
	user buffer if more groups are found than fit into it.

	* nis/nss_nis/nis-initgroups.c (_nss_nis_initgroups_dyn): Use
	extend_alloca.


But there have been a number of other bugs in that code.
Comment 2 Jakub Jelinek 2003-09-29 06:21:36 EDT
Should be fixed in glibc-2.2.4-32.9.
Comment 3 Eric Hagberg 2003-10-02 11:47:44 EDT
the patch in glibc-2.2.4-nis3.patch from the glibc-2.2.4-32.9 package seems to
break things for me. I start getting out of memory errors, at least when running
heavily threaded apps that make simultaneous calls to getpwnam_r where
getnetgrent is called (due to netgroup expansion in /etc/passwd w/ compat turned
on in the passwd map in nsswitch.conf).

It seems that the patch returns enomem, where it used to just continue on.
Shouldn't it just free the structure and continue?

In fact, if I remove the bits of the patch to the nis/nis_table.c file that look
like this:

+      if (ibreq->ibr_name == NULL)
+       {
+         nis_free_request (ibreq);
+         NIS_RES_STATUS (res) = NIS_NOMEMORY;
+         return res;
+       }

but leave everything else the same, the out of memory errors go away.
Comment 4 Ulrich Drepper 2003-10-02 12:35:38 EDT
Do you have a line number for that piece of code?

I added these error handling blocks where they are necessary.  If these
conditionals trigger for you, you _are_ out of memory.  The implementation
avoided failing instantly in the past, producing unrelated errors.  E.g., in the
situation handled by the first block in the patch where this check is added, the
following __nisfind_server call would previously fail with NIS_BADNAME.  That's
wrong.  The name is not bad; it doesn't exist because no memory is available.

So, I tend to not believe at this point that things are worse.  Instead the
system is now reporting the real reason.

If you have reason to believe it's different, let me know how I can reproduce it.  I
can create NIS databases etc but I need some more info on the kind of entries
you are using.
Comment 5 Eric Hagberg 2003-10-02 14:01:57 EDT
OK. Here's how to cause the problem (and I know that I'm not out of memory). I
have this passwd file:

vampire # cat /etc/passwd
root:x:0:1:0000-Admin(0000):/:/bin/ksh
daemon:x:1:1:0000-Admin(0000):/:
bin:x:2:2:0000-Admin(0000):/usr/bin:
sys:x:3:3:0000-Admin(0000):/:
adm:x:4:4:0000-Admin(0000):/var/adm:
lp:x:71:8:0000-lp(0000):/usr/spool/lp:
uucp:x:5:5:0000-uucp(0000):/usr/lib/uucp:
nuucp:x:9:9:0000-uucp(0000):/var/spool/uucppublic:/usr/lib/uucp/uucico
listen:x:37:4:Network Admin:/usr/net/nls:
nobody:x:60001:60001:uid no body:/:
noaccess:x:60002:60002:uid no access:/:
postfix:x:89:89::/var/spool/postfix:/bin/true
+@mqops::0:0:::
+:NOLOGIN:0:0:::/usr/bin/nologin
vampire # ypmatch mqops netgroup
(,wpm,) (,Xlbarnes,) (,molinam,) (,biersma,) (,phuang,) (,Xmortonm,) (,klantsd,)
(,mqm,) (,Xmqadmin,) (,Xangho,) (,esuss,) (,umesh,) (,lonstein,) (,tanski,)
vampire #

I'll attach the test programs and how to reproduce.

The nis3 patchfile contains 5 references to the same code construct in
nis/nis_table.c, which I remove to stop the bogus out of memory errors from
occurring. I think you have the patchfile to get the line numbers.
Comment 6 Eric Hagberg 2003-10-02 14:02:56 EDT
Created attachment 94887
testbed_getpwnam
Comment 7 Eric Hagberg 2003-10-02 14:03:27 EDT
Created attachment 94888
testbed_getpwnam_nscd
Comment 8 Eric Hagberg 2003-10-02 14:06:16 EDT
when I run the testbeds simultaneously, I sometimes get this problem:

In one window:
$ while true;do ./testbed_getpwnam_nscd 10000;done
Client: start (children=10000)
Did read 10000 userids
Testbed complete
Client: start (children=10000)
Did read 10000 userids
Testbed complete
(...snip...)

In another, I run this on the same machine, and get:
saias10 /var/tmp 3$ ./testbed_getpwnam
Client: start (threads=100, runs=10000)
Did read 10000 userids
Maximum no of threads reached (run 102) - sleep 1 second
Maximum no of threads reached (run 207) - sleep 1 second
(...snip...)
Maximum no of threads reached (run 7511) - sleep 1 second
Maximum no of threads reached (run 7624) - sleep 1 second
pthread_create error: Cannot allocate memory
saias10 /var/tmp 4$

The machine has memory free, according to /proc/meminfo when the error occurs:


Wed Oct  1 11:56:40 EDT 2003
       total:    used:    free:  shared: buffers:  cached:
Mem:  2372177920 2309406720 62771200        0 345853952 1710452736
Swap: 2147442688        0 2147442688
MemTotal:      2316580 kB
MemFree:         61300 kB
MemShared:           0 kB
Buffers:        337748 kB
Cached:        1670364 kB
SwapCached:          0 kB
Active:        1052512 kB
Inact_dirty:    955600 kB
Inact_clean:         0 kB
Inact_target:   578764 kB
HighTotal:     1441776 kB
HighFree:         2040 kB
LowTotal:       874804 kB
LowFree:         59260 kB
SwapTotal:     2097112 kB
SwapFree:      2097112 kB
BigPagesFree:        0 kB
Wed Oct  1 11:56:43 EDT 2003
       total:    used:    free:  shared: buffers:  cached:
Mem:  2372177920 2317090816 55087104        0 345853952 1710669824
Swap: 2147442688        0 2147442688
MemTotal:      2316580 kB
MemFree:         53796 kB
MemShared:           0 kB
Buffers:        337748 kB
Cached:        1670576 kB
SwapCached:          0 kB
Active:        1051728 kB
Inact_dirty:    956600 kB
Inact_clean:         0 kB
Inact_target:   578764 kB
HighTotal:     1441776 kB
HighFree:         2040 kB
LowTotal:       874804 kB
LowFree:         51756 kB
SwapTotal:     2097112 kB
SwapFree:      2097112 kB
BigPagesFree:        0 kB
Wed Oct  1 11:56:45 EDT 2003
       total:    used:    free:  shared: buffers:  cached:
Mem:  2372177920 2302304256 69873664        0 345853952 1710936064
Swap: 2147442688        0 2147442688
MemTotal:      2316580 kB
MemFree:         68236 kB
MemShared:           0 kB
Buffers:        337748 kB
Cached:        1670836 kB
SwapCached:          0 kB
Active:        1051732 kB
Inact_dirty:    956852 kB
Inact_clean:         0 kB
Inact_target:   578764 kB
HighTotal:     1441776 kB
HighFree:         2040 kB
LowTotal:       874804 kB
LowFree:         66196 kB
SwapTotal:     2097112 kB
SwapFree:      2097112 kB
BigPagesFree:        0 kB

The problem happened at 11:56:43. I ran "while true;do date;cat
/proc/meminfo;sleep 2;done" to get the data.
Comment 9 Eric Hagberg 2003-10-02 14:09:10 EDT
I do still occasionally get "pthread_create error: Resource temporarily
unavailable" from testbed_getpwnam, but rather rarely, compared to the very
easily repeated out of memory error if I run with an unmodified 2.2.4-32.9.
Comment 10 Ulrich Drepper 2003-10-02 14:41:18 EDT
> The machine has memory free, according to /proc/meminfo when the error occurs:

This does not mean much.  What does 'ulimit -a' show in that environment?
Comment 11 Eric Hagberg 2003-10-02 14:46:19 EDT
We were asked to use /proc/meminfo by our technical contact at Red Hat, as he
thought that we might be running out of memory in some particular zone (we weren't).

saias10 /var/tmp 37# ulimit -a
address space limit (kbytes)   (-M)  unlimited
core file size (blocks)        (-c)  unlimited
cpu time (seconds)             (-t)  unlimited
data size (kbytes)             (-d)  unlimited
file size (blocks)             (-f)  unlimited
locks                          (-L)  unlimited
locked address space (kbytes)  (-l)  unlimited
nofile                         (-n)  1024
nproc                          (-u)  9215
pipe buffer size (bytes)       (-p)  4096
resident set size (kbytes)     (-m)  unlimited
socket buffer size (bytes)     (-b)  4096
stack size (kbytes)            (-s)  8192
threads                        (-T)  not supported
process size (kbytes)          (-v)  unlimited
Comment 12 Ulrich Drepper 2003-10-02 14:51:50 EDT
Can you run the application under strace (with strace writing the output to a
file to speed up the execution)?
Comment 13 Eric Hagberg 2003-10-02 17:27:24 EDT
OK, for now I'm gonna say "never mind"... I can't cause the problem anymore,
(under strace or not) though it was reproducible for me the other day right
after the upgrade, but prior to a reboot. This time I rebooted after going back
to the official 2.2.4-32.9.

Very weird that it's working fine now, though. I had seen the "out of memory"
problem on a couple machines - one test machine of mine and a test MQ series server.

Maybe going from 2.2.4-32.8 to -32.9 requires a reboot in some cases?
Comment 14 Ulrich Drepper 2003-10-03 14:24:20 EDT
A reboot shouldn't be needed.  But if it helped, it might point to memory
fragmentation or something like that.  glibc itself needs a reboot only in some
situations, to get rid of all users of the old binaries.  That is especially
needed for code outside libc.so, e.g., the NSS modules.

I'm closing the bug as WORKSFORME.  If you see problems again, reopen.  We'll
then have to decide whether it's not really a kernel problem.
Comment 15 John Flanagan 2003-12-19 12:49:36 EST
An errata has been issued which should help the problem described in this bug report. 
This report is therefore being closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, please follow the link below. You may reopen 
this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2003-359.html
