Bug 173358

Summary: glibc attempts to free an invalid pointer
Product: Red Hat Enterprise Linux 4
Reporter: Nathan Ehresman <nehresma>
Component: glibc
Assignee: Jakub Jelinek <jakub>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 4.0
CC: poelstra
Hardware: i386
OS: Linux
Fixed In Version: RHBA-2006-0124
Doc Type: Bug Fix
Last Closed: 2006-03-07 18:27:40 UTC
Bug Blocks: 168429

Description Nathan Ehresman 2005-11-16 16:25:06 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050922 Fedora/1.0.7-1.1.fc4 Firefox/1.0.7

Description of problem:
I have a machine on which standard utilities such as tar and ps occasionally crash with this error:

--- begin quote from ps crash ---
*** glibc detected *** free(): invalid pointer: 0x0050809c ***


Signal 6 (ABRT) caught by ps (procps version 3.2.3).
--- end quote ---

Twice, our Amanda backups have failed because tar crashed trying to free the exact same pointer.  Normally this would point to an application bug such as a double free, but I suspect glibc, and here's why (disclaimer: I do not have much experience disassembling binaries).

Poking around in the ps binary with objdump -D shows nothing that uses the address 0x0050809c (the same is true of tar).  So I looked through the libraries ps is linked against to see which one owns that address.  It turns out to be in the .bss section of /lib/tls/libc.so.6:

0050809c <resbuf.1>:
        ...

Some googling shows that the resbuf pointer is used in glibc's MD5 implementation.  Since two different applications are both showing the exact same behavior, my hunch is that it is caused by a common library (glibc?).  Also, do ps and tar actually use MD5 for something?  I could possibly see tar using it, but ps?

Other useful (hopefully) notes:
- We did not experience this behavior at all until we updated to U2.
- It only happens occasionally: an application crashes like this about once every 4 to 5 days.
- I have been unable to reproduce this on demand.

Version-Release number of selected component (if applicable):
glibc-2.3.4-2.13

How reproducible:
Couldn't Reproduce

Steps to Reproduce:
Unable to reproduce on demand.

Additional info:

Comment 1 Jakub Jelinek 2005-11-16 17:03:10 UTC
resbuf is just a function argument in md5, so it doesn't live in .bss.
But many non-reentrant functions that use NSS name their static result variable
resbuf, e.g.
fgetgrent fgetpwent fgetspent getaliasname getgrgid getgrnam gethstbyad
gethstbynm gethstbynm2 getnetbyad getnetbynm getproto getprtname getpwnam
getpwuid getrpcbyname getrpcbynumber getspnam getsrvbynm getsrvbypt sgetspent
So, first of all, it would be interesting to know which one exactly it is.
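
To illustrate what "non-reentrant with a static result variable" means in practice, here is a minimal Python sketch that calls the C getpwuid() through ctypes and compares the raw pointers it returns. It is only an illustration: the helper name getpwuid_result_addresses is made up, and the sketch assumes a Unix system where ctypes.CDLL(None) exposes libc and uid_t is an unsigned int.

```python
import ctypes
import os


def getpwuid_result_addresses(uid_a, uid_b):
    """Call the C getpwuid() twice and return the two raw result pointers.

    Each element is an integer address, or None if the lookup failed.
    (Hypothetical helper, written for this illustration only.)
    """
    # CDLL(None) gives access to symbols already loaded into this process,
    # which on Unix includes libc. Assumption: uid_t fits in c_uint.
    libc = ctypes.CDLL(None)
    libc.getpwuid.restype = ctypes.c_void_p
    libc.getpwuid.argtypes = [ctypes.c_uint]
    first = libc.getpwuid(uid_a)
    second = libc.getpwuid(uid_b)
    return first, second


first, second = getpwuid_result_addresses(os.getuid(), 0)
if first is not None and second is not None:
    # With a single static result struct, both calls hand back the same
    # address and the second call overwrites the first result.
    print("same storage reused:", first == second)
```

On glibc both calls should return the same address, because the result lives in one static struct inside libc rather than in memory the caller owns. That is also why a pointer into that struct must never be passed to free().
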
Since your libc.so.6 is likely prelinked, you'd need to provide at least the output of
readelf -l /lib/tls/libc.so.6 | grep LOAD
(and the exact NVR of the glibc you are using; is it 2.3.4-2.13?).
A backtrace would of course be much better.
Run ulimit -c unlimited in the script/shell that starts the program that triggers
it.  Once you get a core dump, installing the debuginfo packages from
ftp://people.redhat.com/jakub/glibc/2.3.4-2.13/
could give more detail in the backtrace.
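
If the crashing program is launched from a Python wrapper rather than a shell script, the core-file limit can be raised from inside the wrapper instead. The sketch below is only an approximation of ulimit -c unlimited: it raises the soft limit to whatever the hard limit already allows, which an unprivileged process is permitted to do. The function name raise_core_limit is made up for illustration.

```python
import resource


def raise_core_limit():
    """Raise the soft core-file size limit to the current hard limit.

    Returns the resulting (soft, hard) pair. Child processes started
    afterwards inherit the new limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # Only the soft limit changes; the hard limit is left as it was.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


soft, hard = raise_core_limit()
print("core limit (soft, hard):", soft, hard)
```

If the hard limit itself is 0, no core file will be written and the limit has to be raised by whatever starts the wrapper.
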

Comment 2 Nathan Ehresman 2005-11-16 17:35:44 UTC
I have turned on core dumps now in the backup scripts as well as another script
that I was seeing the ps error from occasionally.  I'll keep an eye on it and
report back once the error happens again.  Thanks!

Comment 3 Nathan Ehresman 2005-11-16 17:37:01 UTC
FYI, here's the output from readelf:

readelf -l /lib/tls/libc.so.6 | grep LOAD
  LOAD           0x000000 0x003e0000 0x003e0000 0x123991 0x123991 R E 0x1000
  LOAD           0x1245e4 0x005045e4 0x005045e4 0x02a80 0x056d8 RW  0x1000


Comment 4 Jakub Jelinek 2005-11-16 22:55:13 UTC
In that case 0x0050809c is the static resbuf variable in getpwuid.
BTW, what NSS modules are you using?
grep passwd: /etc/nsswitch.conf
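
For readers following along, the arithmetic that places 0x0050809c within the readelf output from comment 3 can be sketched as below. This is a minimal Python sketch, not necessarily the procedure used above. The names Segment and classify are made up for illustration; only the two LOAD lines come from the report. It assumes the lowest LOAD address is the prelinked load base.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    offset: int  # offset of the segment in the file
    vaddr: int   # (prelinked) virtual address
    filesz: int  # bytes backed by file contents
    memsz: int   # bytes occupied in memory
    flags: str


# The two LOAD lines from the readelf output in comment 3.
SEGMENTS = [
    Segment(0x000000, 0x003E0000, 0x123991, 0x123991, "R E"),
    Segment(0x1245E4, 0x005045E4, 0x02A80, 0x056D8, "RW"),
]


def classify(addr, segments=SEGMENTS):
    """Return (flags, region, offset_from_base) for addr, or None if unmapped."""
    # Assumption: the lowest LOAD address is the prelinked load base.
    base = min(seg.vaddr for seg in segments)
    for seg in segments:
        if seg.vaddr <= addr < seg.vaddr + seg.memsz:
            # Memory past filesz is zero-filled at load time; that is
            # where uninitialized static variables (.bss) end up.
            in_file = addr < seg.vaddr + seg.filesz
            region = "file-backed" if in_file else "zero-filled (.bss-like)"
            return seg.flags, region, addr - base
    return None


flags, region, rel = classify(0x0050809C)
print(flags, region, hex(rel))  # RW zero-filled (.bss-like) 0x12809c
```

The base-relative offset 0x12809c is the value to look up in the symbol table of a matching, non-prelinked libc build. The sketch does not do that lookup.
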


Comment 6 Nathan Ehresman 2005-11-17 19:01:46 UTC
I'm using the files and LDAP nss modules (in that order) on this machine.  In
answer to the second question, I am indeed using nscd on the machine to reduce
the number of queries made to the LDAP server.  I have shut down nscd for now to
see if this helps things out as well.  If it crashes again, hopefully I'll have
a core dump to report back with.

Comment 12 Red Hat Bugzilla 2006-03-07 18:26:50 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0124.html

