Bug 66174 - getgrent() consumes all memory after seeing large group in db
Status: CLOSED CANTFIX
Product: Red Hat Linux
Classification: Retired
Component: nss_db
Version: 7.3
Hardware: i686 Linux
Priority: medium
Severity: high
Assigned To: Nalin Dahyabhai
QA Contact: Brian Brock
Reported: 2002-06-05 16:29 EDT by Alan Sundell
Modified: 2007-04-18 12:42 EDT (History)
Doc Type: Bug Fix
Last Closed: 2006-10-18 13:03:58 EDT

Attachments
sloppy configure patch that gets configure to work for nss_db (327 bytes, patch)
2002-06-06 13:32 EDT, Alan Sundell
sloppy patch to db-XXX.c that seems to solve the problem (267 bytes, patch)
2002-06-06 13:33 EDT, Alan Sundell

Description Alan Sundell 2002-06-05 16:29:19 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.0rc1) Gecko/20020418

Description of problem:
After encountering a large group in group.db, getgrent() will spin after the
last entry, consuming CPU and memory without bound.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Create a long group entry (~1000 characters) anywhere in group source file
2. Set group entry in nsswitch.conf to include "db" (mine says "files db")
3. Make db files in /var/db
4. Run a program that calls getgrent() iteratively, like "id -Gn user" or
"perl -e 'setgrent; while(@ent=getgrent) { print join(":", @ent), "\n"; } print
"DONE\n"; endgrent;'"

Actual Results:  The perl one-liner listed above, for instance, gets past the
large group, and all the groups, but never to the DONE statement -- it hangs on
getgrent.  Simpler things like 'id -Gn' seem to hang as well.  In any case, the
processes are not merely hanging, but looping and consuming CPU and memory,
seemingly without bound (I killed them after 1 GB).  Presumably they would get
ENOMEM after consuming all memory and swap, but the system isn't exactly happy
at that point.

Expected Results:  The last getgrent() should return NULL, and processes should
therefore know that they've reached the end of the group list.

Additional info:

I was able to produce similar behavior on RedHat 6.2, with spinning CPU
consumption, but without the ever-growing memory.  I was not able to reproduce
the problem on a Debian machine.  I therefore assume that this is a bug in
either one of the patches RedHat applies or in db4, which RedHat seems to be using.

The problem occurs no matter the position in which the large group appears in
the db, so long as it is there.  Without it, the problem goes away.  It appears
to be tied to the length of the entry, rather than, for instance, the number of
users in the entry.  It does not occur with a large entry in the group flat file
(that's why I think it's nss_db).  The spinning and memory consumption do not
occur until the program calls getgrent() *after* the getgrent() call that
returned the last group.

I'm no expert on these things, but here's my shaky theory of what is
happening:

glibc has a wrapper function [__nss_getent() in nss/getnssent.c] that calls the
nss_db internal version of getgrent() [lookup() in db-XXX.c].  It passes the
internal function a buffer.  If the group is too big to fit in the buffer
provided, the internal function returns an error and sets errno to ERANGE.  When
the wrapper sees that the internal function has returned an error, it checks to
see if errno is set to ERANGE; if so, then it reallocs the buffer and tries again.

Once errno is set to ERANGE, it does not get reset upon success.
When lookup() tries to look up the record after the last record in the db,
db->get returns 1 to indicate that the record does not exist, but, because of
the nss_db-2.2-compat.patch, errno does not get reset.  Therefore, the wrapper
loops, and keeps realloc-ing the buffer.  What I can't figure out is why this
would happen in nss_db without the compat patch, so maybe the theory is just a
bunch of bunk.
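
The theory above can be modelled in a few lines of C. This is a simplified sketch, not the glibc or nss_db source: mock_lookup_past_end and retry_calls_with_errno are hypothetical names, the starting buffer size of 64 bytes is arbitrary, and a max_calls cap is added so the model terminates where the real loop reportedly does not.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the backend being asked for the record after
   the last one: it reports failure but leaves errno untouched. */
int mock_lookup_past_end(char *buf, size_t buflen)
{
    (void)buf;
    (void)buflen;
    return -1;
}

/* Simplified model of the wrapper: retry with a doubled buffer for as long
   as the backend fails and errno says ERANGE.  Returns how many times the
   backend was called, capped at max_calls (the cap exists only here). */
int retry_calls_with_errno(int initial_errno, int max_calls)
{
    size_t buflen = 64;
    char *buf = malloc(buflen);
    int calls = 0;

    assert(max_calls > 0);
    errno = initial_errno;      /* e.g. ERANGE left over from a big group */
    while (buf != NULL && calls < max_calls) {
        calls++;
        if (mock_lookup_past_end(buf, buflen) == 0)
            break;              /* success */
        if (errno != ERANGE)
            break;              /* genuine failure: give up */
        char *bigger = realloc(buf, buflen * 2);
        if (bigger == NULL)
            break;              /* out of memory */
        buf = bigger;
        buflen *= 2;
    }
    free(buf);
    return calls;
}
```

In this model, retry_calls_with_errno(ERANGE, 8) runs until the cap is hit, while retry_calls_with_errno(0, 8) stops after a single call, which matches the observation that the problem only appears once a large group has been seen.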


Anyway, that's my current theory, but I'm unable to verify it, because even "rpm
--rebuild nss_db-2.2-14.src.rpm" is erroring out with:
+ popd
/usr/src/redhat/BUILD/nss_db-2.2
+ CFLAGS=-O2 -march=i386 -mcpu=i686
+ export CFLAGS
+ CXXFLAGS=-O2 -march=i386 -mcpu=i686
+ export CXXFLAGS
+ FFLAGS=-O2 -march=i386 -mcpu=i686
+ export FFLAGS
+ ./configure i386-redhat-linux --prefix=/usr --exec-prefix=/usr
--bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share
--includedir=/usr/include --libdir=/usr/lib --libexecdir=/usr/libexec
--localstatedir=/var --sharedstatedir=/usr/com --mandir=/usr/share/man
--infodir=/usr/share/info --with-db=/usr/src/redhat/BUILD/nss_db-2.2/db-instroot
[...]
checking for db.h... yes
checking for db_version in -ldb... no
configure: error: 
*** Could not find Berkeley DB library.
error: Bad exit status from /var/tmp/rpm-tmp.3344 (%build)

from config.log:
configure: failed program was:
#line 5345 "configure"
#include "confdefs.h"
/* Override any gcc2 internal prototype to avoid an error.  */
/* We use char because int might match the return type of a gcc2
    builtin and then its argument prototype would still apply.  */
char db_version();

int main() {
db_version()
; return 0; }

# nm /usr/src/redhat/BUILD/nss_db-2.2/db-instroot/lib/libdb.a  | grep db_version
00000004 T db_version_nssdb
         U db_version_nssdb

...so I believe that has something to do with the fact that db_version has been
renamed db_version_nssdb in your included db4.  Maybe whoever was working on
this had another db library installed on the system, so configure found that one
(like I said, I'm no expert, so who knows -- maybe it's all my fault).

At this point, however, it's probably time that I turn this over to you guys,
before I head off on any more wild goose chases.  I've rated this as severity
high, because it seems to qualify as a serious memory leak, and it can rapidly
take a system down into swapping hell.
Comment 1 Alan Sundell 2002-06-06 13:32:42 EDT
Created attachment 59904 [details]
sloppy configure patch that gets configure to work for nss_db
Comment 2 Alan Sundell 2002-06-06 13:33:49 EDT
Created attachment 59905 [details]
sloppy patch to db-XXX.c that seems to solve the problem
Comment 3 Alan Sundell 2002-06-06 13:56:29 EDT
OK, so my initial hypothesis about this being a problem with RedHat's patch
seems to have been wrong (due to a misunderstanding on my part about db->get's
return values).  It seems this bug may exist in the sources straight from GNU
(which doesn't explain my success with Debian, but whatever, I was tired).

The attached patch [db.patch] makes lookup() in db-XXX.c set errno to ENOENT if
a lookup fails.  Without this, errno remains set to ERANGE, and __nss_getent()
loops, realloc()-ing 'till the cows come home, as described in my initial report.

However, the patch is incomplete, since it only sets errno in the one case
("case DB_NOTFOUND:") that matters to me.  Given __nss_getent()'s expectations,
it should, IMHO, probably set errno to an appropriate value in other cases as
well, and someone with closer knowledge of this package should probably have it
do that, lest someone else have similar problems.  I'd take care of case default,
too, but I'm unsure of what a good default errno would be, which is why I am
deferring on the matter.
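
The shape of the change described in this comment can be sketched as follows. This is not the attached patch: the enums and the function map_db_result are hypothetical stand-ins (the real code uses Berkeley DB's DB_NOTFOUND and glibc's NSS status values), and the default case is deliberately left without an errno assignment, mirroring the deferral above.

```c
#include <errno.h>

/* Hypothetical stand-ins for the Berkeley DB return codes and the NSS
   status values used by the real lookup() in db-XXX.c. */
enum mock_db_result { MOCK_DB_OK, MOCK_DB_NOTFOUND, MOCK_DB_OTHER_ERROR };
enum mock_nss_status { MOCK_NSS_SUCCESS, MOCK_NSS_NOTFOUND, MOCK_NSS_UNAVAIL };

/* Map a database result to a status, clearing a stale ERANGE in the
   not-found case so a retry-on-ERANGE caller stops looping. */
enum mock_nss_status map_db_result(enum mock_db_result result)
{
    switch (result) {
    case MOCK_DB_OK:
        return MOCK_NSS_SUCCESS;
    case MOCK_DB_NOTFOUND:
        errno = ENOENT;          /* the one case the comment says is covered */
        return MOCK_NSS_NOTFOUND;
    default:
        /* No errno chosen here: which value is appropriate is the open
           question raised in the comment. */
        return MOCK_NSS_UNAVAIL;
    }
}
```

With errno preset to ERANGE, map_db_result(MOCK_DB_NOTFOUND) leaves errno at ENOENT, whereas the default branch still leaves the stale ERANGE in place, which is the gap the comment warns about.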
Comment 4 Bill Nottingham 2006-08-05 01:22:52 EDT
Red Hat apologizes that these issues have not been resolved yet. We do want to
make sure that no important bugs slip through the cracks.

Red Hat Linux 7.3 and Red Hat Linux 9 are no longer supported by Red Hat, Inc.
They are maintained by the Fedora Legacy project (http://www.fedoralegacy.org/)
for security updates only. If this is a security issue, please reassign to the
'Fedora Legacy' product in bugzilla. Please note that Legacy security update
support for these products will stop on December 31st, 2006.

If this is not a security issue, please check if this issue is still present
in a current Fedora Core release. If so, please change the product and version
to match, and check the box indicating that the requested information has been
provided.

If you are currently still running Red Hat Linux 7.3 or 9, please note that
Fedora Legacy security update support for these products will stop on December
31st, 2006. You are strongly advised to upgrade to a current Fedora Core release
or Red Hat Enterprise Linux or comparable. Some information on which option may
be right for you is available at http://www.redhat.com/rhel/migrate/redhatlinux/.

Any bug still open against Red Hat Linux 7.3 or 9 at the end of 2006 will be
closed 'CANTFIX'. Again, if this bug still exists in a current release, or is a
security issue, please change the product as necessary. We thank you for your
help, and apologize again that we haven't handled these issues to this point.
Comment 5 Bill Nottingham 2006-10-18 13:03:58 EDT
Red Hat Linux is no longer supported by Red Hat, Inc. If you are still
running Red Hat Linux, you are strongly advised to upgrade to a
current Fedora Core release or Red Hat Enterprise Linux or comparable.
Some information on which option may be right for you is available at
http://www.redhat.com/rhel/migrate/redhatlinux/.

Closing as CANTFIX.
