From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050922 Fedora/1.0.7-1.1.fc4 Firefox/1.0.7

Description of problem:
I have a machine on which standard utilities such as tar and ps occasionally crash with the error:

--- begin quote from ps crash ---
*** glibc detected *** free(): invalid pointer: 0x0050809c ***
Signal 6 (ABRT) caught by ps (procps version 3.2.3).
--- end quote ---

Twice, our Amanda backups have failed because tar crashed trying to free the exact same pointer. Normally this would be the result of a poorly coded application doing a double free or something similar, but I suspect glibc, and here's why (disclaimer: I do not have much experience disassembling binaries).

Poking around in the ps binary with objdump -D does not show anything using the address 0x0050809c (the same goes for tar). So I started looking through the libraries ps is linked against to see which one owns that pointer. It turns out to be in the bss section of /lib/tls/libc.so.6:

0050809c <resbuf.1>:
        ...

Some googling suggests that the resbuf pointer is used in glibc's MD5 implementation. Since two different applications show the exact same behavior, my hunch is that the cause is a common library (glibc?). Also, do ps and tar actually use MD5 for something? I could possibly see tar using it, but ps?

Other (hopefully) useful notes:
- We did not experience this behavior at all until we updated to U2.
- It only happens occasionally: an application crashes like this once every 4 to 5 days.
- I have been unable to reproduce this on demand.

Version-Release number of selected component (if applicable):
glibc-2.3.4-2.13

How reproducible:
Couldn't Reproduce

Steps to Reproduce:
Unable to reproduce on demand.

Additional info:
resbuf is just a function argument in md5, so it doesn't live in .bss. But many non-reentrant functions using NSS call their static variables resbuf, e.g.:

fgetgrent fgetpwent fgetspent getaliasname getgrgid getgrnam gethstbyad gethstbynm gethstbynm2 getnetbyad getnetbynm getproto getprtname getpwnam getpwuid getrpcbyname getrpcbynumber getspnam getsrvbynm getsrvbypt sgetspent

So, first of all, it would be interesting to know exactly which one it is. As your libc.so.6 is likely prelinked, you'd need to at least provide the output of
readelf -l /lib/tls/libc.so.6 | grep LOAD
(and the exact NVR of the glibc you are using; is that 2.3.4-2.13?).

Much better, of course, would be a backtrace. Run ulimit -c unlimited in the script/shell that starts the program which triggers the crash. Once you get a core dump, installing the debuginfo packages from ftp://people.redhat.com/jakub/glibc/2.3.4-2.13/ could give more details in the backtrace.
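If the crashing program is started from a Python wrapper rather than a shell script, the ulimit step has a rough equivalent in the standard resource module. This is only a minimal sketch: the helper name enable_core_dumps is made up for illustration, and it assumes the hard limit on the machine is already high enough, since an unprivileged process can raise its soft limit only up to the hard limit.

```python
import resource


def enable_core_dumps():
    """Raise the soft core-file size limit to the current hard limit.

    Similar in spirit to `ulimit -c unlimited`, except that the result is
    capped by whatever hard limit the process already has.
    Hypothetical helper, not part of glibc, procps or Amanda.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    # Return the limits now in effect so the caller can log them.
    return resource.getrlimit(resource.RLIMIT_CORE)
```

Child processes inherit the limit, so calling this once before launching tar or ps (for example via subprocess.run) should be enough for them to leave a core file when they abort.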
I have turned on core dumps now in the backup scripts as well as another script that I was seeing the ps error from occasionally. I'll keep an eye on it and report back once the error happens again. Thanks!
FYI, here's the output from readelf:

readelf -l /lib/tls/libc.so.6 | grep LOAD
  LOAD 0x000000 0x003e0000 0x003e0000 0x123991 0x123991 R E 0x1000
  LOAD 0x1245e4 0x005045e4 0x005045e4 0x02a80 0x056d8 RW 0x1000
In that case 0x0050809c is the resbuf static variable in getpwuid. BTW, what NSS modules are you using? Please post the output of:
grep passwd: /etc/nsswitch.conf
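The address arithmetic behind that kind of lookup can be sketched as follows. The segment values are copied from the readelf output in this report; the function name locate and the region labels are made up for illustration, and the sketch assumes the first LOAD segment starts at file offset 0, so that its virtual address is the prelinked load base. It only places the pointer inside a segment and computes its offset from the base. Mapping that offset to a particular resbuf variable still requires symbol information for the same glibc build, which is not reproduced here.

```python
# (file offset, virtual address, file size, memory size) for each LOAD
# segment, taken from the readelf output posted in this report.
SEGMENTS = [
    (0x000000, 0x003E0000, 0x123991, 0x123991),  # R E: code, read-only data
    (0x1245E4, 0x005045E4, 0x02A80, 0x056D8),    # RW: .data followed by .bss
]


def locate(addr, segments=SEGMENTS):
    """Return (segment index, region, offset from load base), or None.

    Hypothetical helper. Bytes past the file size of a segment are
    zero-filled at load time, which is where .bss lives.
    """
    base = segments[0][1]  # assumption: first segment has file offset 0
    for index, (_offset, vaddr, filesz, memsz) in enumerate(segments):
        if vaddr <= addr < vaddr + memsz:
            if addr < vaddr + filesz:
                region = "file-backed"
            else:
                region = "zero-fill (.bss)"
            return index, region, addr - base
    return None


if __name__ == "__main__":
    index, region, offset = locate(0x0050809C)
    print(index, region, hex(offset))
```

For the pointer from the crash message this places 0x0050809c in the zero-filled tail of the RW segment, 0x12809c bytes above the load base, which is consistent with a static variable in libc's .bss rather than anything owned by ps or tar.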
Are you using nscd? If so, this is likely
http://sources.redhat.com/bugzilla/show_bug.cgi?id=1363
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/ChangeLog.diff?cvsroot=glibc&r1=1.9536&r2=1.9537
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/nscd/nscd_getpw_r.c.diff?cvsroot=glibc&r1=1.30&r2=1.31
I'm using the files and LDAP nss modules (in that order) on this machine. In answer to the second question, I am indeed using nscd on the machine to reduce the number of queries made to the LDAP server. I have shut down nscd for now to see if this helps things out as well. If it crashes again, hopefully I'll have a core dump to report back with.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2006-0124.html