Bug 633519
Summary: NSPR pthread_key_t leak and memory corruption
Product: Red Hat Enterprise Linux 5
Reporter: Jeff Bastian <jbastian>
Component: nspr
Assignee: Elio Maldonado Batiz <emaldona>
Status: CLOSED ERRATA
QA Contact: Aleš Mareček <amarecek>
Severity: high
Priority: high
Version: 5.5
CC: ablum, amarecek, azelinka, cww, dpal, emaldona, jorton, kengert, ksrot, nalin, rcritten, rrelyea, sforsber
Target Milestone: rc
Hardware: All
OS: Linux
Fixed In Version: nspr-4.9.1-6.el5
Doc Type: Bug Fix
Last Closed: 2013-01-08 07:38:25 UTC
Bug Blocks: 668957, 719046, 743405, 807971, 817178, 905013
Description
Jeff Bastian
2010-09-13 21:32:11 UTC
Created attachment 447053 [details]
core dump
Attached is a core dump from httpd. Here is the backtrace:
#0 0x006d3402 in __kernel_vsyscall ()
#1 0x00397df0 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2 0x00399701 in abort () at abort.c:88
#3 0x003d028b in __libc_message (do_abort=2,
fmt=0x497904 "*** glibc detected *** %s: %s: 0x%s ***\n")
at ../sysdeps/unix/sysv/linux/libc_fatal.c:170
#4 0x003d69f8 in malloc_printerr (av=0x4b2140) at malloc.c:6205
#5 malloc_consolidate (av=0x4b2140) at malloc.c:5083
#6 0x003d8d17 in _int_malloc (av=0x4b2140, bytes=1164) at malloc.c:4313
#7 0x003dae97 in __libc_malloc (bytes=1164) at malloc.c:3605
#8 0x00a0e987 in init_common (context=0xbff6fbd0, secure=0, kdc=0)
at init_ctx.c:157
#9 0x00fb7139 in authenticate_user_krb5pwd (r=0x83c8c28)
at src/mod_auth_kerb.c:871
#10 kerb_authenticate_user (r=0x83c8c28) at src/mod_auth_kerb.c:1502
#11 0x00d2994d in ap_run_check_user_id (r=0x83c8c28)
at /usr/src/debug/httpd-2.2.3/server/request.c:70
#12 0x00d2ac57 in ap_process_request_internal (r=0x83c8c28)
at /usr/src/debug/httpd-2.2.3/server/request.c:218
#13 0x00d2b06b in ap_sub_req_method_uri (method=0x5eee74 "GET",
new_uri=0x83a7e78 "/svn/helloworld/!svn/ver/1/lots-o-files/file13982.txt",
r=0x8396b60, next_filter=0x8397ad8)
at /usr/src/debug/httpd-2.2.3/server/request.c:1630
#14 0x005e6f29 in dav_svn_authz_read (allowed=0xbff70124, root=0x83870c0,
path=0x836d098 "/lots-o-files/file13982.txt", baton=0xbff700dc,
pool=0x8396b20)
at /usr/src/debug/subversion-1.4.2/subversion/mod_dav_svn/update.c:200
#15 0x005dca87 in check_readability (readable=0xbff70124,
r=<value optimized out>, repos=0x836d0c8,
path=0x836d098 "/lots-o-files/file13982.txt", pool=0x8396b20)
at /usr/src/debug/subversion-1.4.2/subversion/mod_dav_svn/lock.c:252
#16 0x005dcb26 in dav_svn_has_locks (lockdb=0x8386308, resource=0x836cfa8,
locks_present=0xbff70174)
at /usr/src/debug/subversion-1.4.2/subversion/mod_dav_svn/lock.c:672
#17 0x00e99b99 in dav_get_resource_state (r=0x8396b60, resource=0x836cfa8)
at /usr/src/debug/httpd-2.2.3/modules/dav/main/util_lock.c:717
#18 0x00e9063f in dav_method_propfind (r=0x8396b60)
at /usr/src/debug/httpd-2.2.3/modules/dav/main/mod_dav.c:1965
#19 0x00e94585 in dav_handler (r=0x8396b60)
at /usr/src/debug/httpd-2.2.3/modules/dav/main/mod_dav.c:4644
#20 0x00d2ea4d in ap_run_handler (r=0x8396b60)
at /usr/src/debug/httpd-2.2.3/server/config.c:157
#21 0x00d32413 in ap_invoke_handler (r=0x8396b60)
at /usr/src/debug/httpd-2.2.3/server/config.c:375
#22 0x00d3e55e in ap_process_request (r=0x8396b60)
at /usr/src/debug/httpd-2.2.3/modules/http/http_request.c:258
#23 0x00d3b2ff in ap_process_http_connection (c=0x83608d0)
at /usr/src/debug/httpd-2.2.3/modules/http/http_core.c:184
#24 0x00d3694d in ap_run_process_connection (c=0x83608d0)
at /usr/src/debug/httpd-2.2.3/server/connection.c:43
#25 0x00d36a4c in ap_process_connection (c=0x83608d0, csd=0x8360738)
at /usr/src/debug/httpd-2.2.3/server/connection.c:178
#26 0x00d42e44 in child_main (child_num_arg=<value optimized out>)
at /usr/src/debug/httpd-2.2.3/server/mpm/prefork/prefork.c:640
#27 0x00d43151 in make_child (s=0x81533f0, slot=4)
at /usr/src/debug/httpd-2.2.3/server/mpm/prefork/prefork.c:736
#28 0x00d4322a in startup_children (number_to_start=4)
at /usr/src/debug/httpd-2.2.3/server/mpm/prefork/prefork.c:754
#29 0x00d43d8b in ap_mpm_run (_pconf=0x8151548, plog=0x817f600, s=0x81533f0)
at /usr/src/debug/httpd-2.2.3/server/mpm/prefork/prefork.c:975
#30 0x00d1a157 in main (argc=135591368, argv=0x82083e0)
at /usr/src/debug/httpd-2.2.3/server/main.c:717
Curiously, if I downgrade the nss package (and dependencies), it works: I can commit all 20,000 files with no httpd seg-faults nor corrupted-double-linked-list errors. That is, originally I had the errata packages installed:
nspr-4.8.4-1.el5_4.i386
nss-3.12.6-2.el5_5.i386
nss-tools-3.12.6-2.el5_5.i386
But when I downgrade to the original versions included in RHEL 5.5:
nspr-4.7.6-1.el5_4.i386
nss-3.12.3.99.3-1.el5_3.2.i386
nss-tools-3.12.3.99.3-1.el5_3.2.i386
the problem goes away.
I say "curiously" because I don't appear to be using nss: mod_nss is not installed, and the stack trace doesn't seem to involve the nss libraries. But obviously they're involved in some fashion.

Curious. Could you try reproducing with:
export MALLOC_CHECK_=2
in /etc/sysconfig/httpd to see if the segfault triggers earlier?
Have you checked /proc/$PID/maps of a running httpd child to see whether any of the NSS libraries are getting pulled in somehow?

Created attachment 447202 [details]
httpd error_log
The memory maps were actually captured in httpd's error_log at the time of the crash; see the attached file. I don't see any of the nss libraries listed.
I ran my test again, but this time I started sampling /proc/$PID/maps during the 'svn commit' stage:
let n=0
while true; do
    for p in $(pgrep http) ; do
        cat /proc/$p/maps > /tmp/maps-$n-$p.out
    done
    let n=n+1
    sleep 15
done
The nss libraries are indeed in use:
$ egrep 'libfreebl3.so|libnss3.so|libnssckbi.so|libnssutil3.so|libsmime3.so|libsoftokn3.so|libssl3.so' /tmp/maps-*
/tmp/maps-1-17259.out:00924000-0093a000 r-xp 00000000 fc:00 868435 /usr/lib/libnssutil3.so
/tmp/maps-1-17259.out:0093a000-0093d000 rwxp 00016000 fc:00 868435 /usr/lib/libnssutil3.so
/tmp/maps-1-17259.out:009a8000-009db000 r-xp 00000000 fc:00 868644 /usr/lib/libssl3.so
/tmp/maps-1-17259.out:009db000-009dd000 rwxp 00033000 fc:00 868644 /usr/lib/libssl3.so
/tmp/maps-1-17259.out:00e0e000-00e33000 r-xp 00000000 fc:00 868643 /usr/lib/libsmime3.so
/tmp/maps-1-17259.out:00e33000-00e35000 rwxp 00025000 fc:00 868643 /usr/lib/libsmime3.so
...
It looks like the nss libraries might be pulled in via the pkinit-nss kerberos plugin:
$ grep nss /tmp/maps-1-17259.out | egrep -v '_files|_dns'
00924000-0093a000 r-xp 00000000 fc:00 868435 /usr/lib/libnssutil3.so
0093a000-0093d000 rwxp 00016000 fc:00 868435 /usr/lib/libnssutil3.so
00be2000-00c07000 r-xp 00000000 fc:00 1020848 /usr/lib/krb5/plugins/preauth/pkinit-nss.so
00c07000-00c15000 rwxp 00024000 fc:00 1020848 /usr/lib/krb5/plugins/preauth/pkinit-nss.so
05a22000-05b43000 r-xp 00000000 fc:00 868641 /usr/lib/libnss3.so
05b43000-05b47000 rwxp 00121000 fc:00 868641 /usr/lib/libnss3.so
$ ldd /usr/lib/krb5/plugins/preauth/pkinit-nss.so
linux-gate.so.1 => (0x004ed000)
libssl3.so => /usr/lib/libssl3.so (0x00143000)
libsmime3.so => /usr/lib/libsmime3.so (0x004c2000)
libnss3.so => /usr/lib/libnss3.so (0x00178000)
libnssutil3.so => /usr/lib/libnssutil3.so (0x00f12000)
...
$ rpm -q --qf '%{description}\n' pkinit-nss
The pkinit-nss package implements the PKINIT standard for MIT Kerberos. It does so using the Mozilla NSS library.
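The egrep check above can also be expressed as a small C helper, for example to embed in a diagnostic tool. This is only a sketch: the function names line_mentions_nss and count_nss_mappings are hypothetical, and the library list is copied from the egrep pattern in this comment. It assumes each mapping fits in a 4096-byte line buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Library names taken from the egrep pattern used in this report. */
static const char *const nss_libs[] = {
    "libfreebl3.so", "libnss3.so", "libnssckbi.so", "libnssutil3.so",
    "libsmime3.so", "libsoftokn3.so", "libssl3.so",
};

/* Return 1 if a single maps line names one of the NSS libraries. */
int line_mentions_nss(const char *line)
{
    size_t i;

    for (i = 0; i < sizeof nss_libs / sizeof nss_libs[0]; i++) {
        if (strstr(line, nss_libs[i]) != NULL)
            return 1;
    }
    return 0;
}

/* Count the lines in a /proc/PID/maps style stream that name an NSS
 * library. Assumes no line is longer than the buffer. */
int count_nss_mappings(FILE *fp)
{
    char line[4096];
    int hits = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        if (line_mentions_nss(line))
            hits++;
    }
    return hits;
}
```

A caller would open /proc/PID/maps with fopen() and pass the stream to count_nss_mappings(). Note that pkinit-nss.so itself does not match the list, just as it does not match the egrep pattern.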
Can you try renaming that pkinit-nss.so and testing whether the issue is still reproducible? Also, any luck with malloc debugging giving a different segfault?
Nalin/Elio, will libkrb5 link in that plugin by default without any explicit configuration?

The pkinit plugin will be loaded when the client library attempts to get initial creds, so if you're using Basic auth, I'd expect that to happen. (For Negotiate cases, that code path shouldn't be hit.) If you're not using pkinit, removing the plugin package is probably the simplest workaround.

I didn't see anything interesting when I added
export MALLOC_CHECK_=2
to /etc/sysconfig/httpd and started the service normally. All I got was the standard seg fault error in /var/log/httpd/error_log:
[Tue Sep 14 08:59:49 2010] [notice] child pid 17258 exit signal Segmentation fault (11), possible coredump in /etc/httpd
I'll try it again running httpd in the foreground since I believe the malloc check messages go to stderr. After that I'll try removing the pkinit-nss package and see what happens.

The MALLOC_CHECK_=2 didn't reveal anything in the terminal:
# MALLOC_CHECK_=2 /usr/sbin/httpd -X
Segmentation fault (core dumped)
However, the core dump has a new backtrace:
Core was generated by `/usr/sbin/httpd -X'.
Program terminated with signal 11, Segmentation fault.
#0 _int_malloc (av=0x531140, bytes=5825) at malloc.c:4380
4380 bck->fd = unsorted_chunks(av);
(gdb) bt
#0 _int_malloc (av=0x531140, bytes=5825) at malloc.c:4380
#1 0x0045a9cf in malloc_check (sz=5824, caller=0xa2a1c1) at hooks.c:266
#2 0x00459fb7 in __libc_malloc (bytes=5824) at malloc.c:3567
#3 0x00a2a1c1 in zcalloc () from /usr/lib/libz.so.1
#4 0x00a27c12 in deflateInit2_ () from /usr/lib/libz.so.1
#5 0x00a27e42 in deflateInit_ () from /usr/lib/libz.so.1
#6 0x00a23c4d in compress2 () from /usr/lib/libz.so.1
#7 0x00390381 in zlib_encode (
data=0x9653e80 "Bs2zyWhQ9d2sjeHcMG8UcDVS9HnUlwQJSPxIktZySR1E0+QcraKyV6p3N/Kq 8EOCdaRccbYmBczM\nh0xHhGJtXSup63z40M12vbyCqUPrIN6Wsc5USHg4y/N5ryoM9rXA9K0TQwilUiPl3QOfQEn2Fg2K\nWqb/UWeAJtB9y1m8gsyr6FKUa0HuskDyU92qnOEkeaaBdR"..., len=15202, out=0x94cba48)
at subversion/libsvn_delta/svndiff.c:154
#8 0x003906b2 in window_handler (window=0x9657e88, baton=0x94c38a8)
at subversion/libsvn_delta/svndiff.c:250
#9 0x00390de2 in tpush_close_handler (baton=0xa06cc40)
at subversion/libsvn_delta/text_delta.c:460
#10 0x0064d549 in svn_stream_close (stream=0x531170)
at subversion/libsvn_subr/stream.c:116
#11 0x003b87cf in rep_write_contents_close (baton=0x9a1d320)
at subversion/libsvn_fs_fs/fs_fs.c:3547
#12 0x0064d549 in svn_stream_close (stream=0x531170)
at subversion/libsvn_subr/stream.c:116
#13 0x003c00a6 in window_consumer (window=0x0, baton=0xa344508)
at subversion/libsvn_fs_fs/tree.c:2347
#14 0x0038fd66 in close_handler (baton=0x9a1e600)
at subversion/libsvn_delta/svndiff.c:785
#15 0x0064d549 in svn_stream_close (stream=0x531170)
at subversion/libsvn_subr/stream.c:116
#16 0x00540f81 in dav_svn_close_stream (stream=0x9638840, commit=1)
at /usr/src/debug/subversion-1.4.2/subversion/mod_dav_svn/repos.c:2177
#17 0x00e21f9d in dav_method_put (r=0x95b6d60)
at /usr/src/debug/httpd-2.2.3/modules/dav/main/mod_dav.c:1028
#18 dav_handler (r=0x95b6d60)
at /usr/src/debug/httpd-2.2.3/modules/dav/main/mod_dav.c:4628
#19 0x00607a4d in ap_run_handler (r=0x95b6d60)
at /usr/src/debug/httpd-2.2.3/server/config.c:157
#20 0x0060b3f8 in ap_invoke_handler (r=0x95b6d60)
at /usr/src/debug/httpd-2.2.3/server/config.c:371
#21 0x0061753e in ap_process_request (r=0x95b6d60)
at /usr/src/debug/httpd-2.2.3/modules/http/http_request.c:258
#22 0x006142df in ap_process_http_connection (c=0x8d35540)
at /usr/src/debug/httpd-2.2.3/modules/http/http_core.c:184
#23 0x0060f92d in ap_run_process_connection (c=0x8d35540)
at /usr/src/debug/httpd-2.2.3/server/connection.c:43
#24 0x0060fa2c in ap_process_connection (c=0x8d35540, csd=0x8d353a8)
at /usr/src/debug/httpd-2.2.3/server/connection.c:178
#25 0x0061be24 in child_main (child_num_arg=<value optimized out>)
at /usr/src/debug/httpd-2.2.3/server/mpm/prefork/prefork.c:640
#26 0x0061c094 in make_child (s=0x8b613d8, slot=0)
at /usr/src/debug/httpd-2.2.3/server/mpm/prefork/prefork.c:680
#27 0x0061cfa9 in ap_mpm_run (_pconf=0x8b5f538, plog=0x8b8d5f0, s=0x8b613d8)
at /usr/src/debug/httpd-2.2.3/server/mpm/prefork/prefork.c:956
#28 0x005f3157 in main (argc=146134448, argv=0x0)
at /usr/src/debug/httpd-2.2.3/server/main.c:717

Removing the pkinit-nss package seems to do the trick, I was able to successfully commit all 20,000 files with no hiccups. I'm repeating it just to be sure.

Created attachment 449420 [details]
PoC fix
Proof-of-concept fix.
Created attachment 449421 [details]
Test case
Minimal repro case for crash.
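The attachment itself is not reproduced in this report. As a rough illustration of the pthread_key_t leak named in the bug summary, the following sketch simulates a library that creates a thread-specific-data key on every initialization and never deletes it, until pthread_key_create() fails. The names simulate_key_leak, leak_result and MAX_TRACKED are hypothetical and are not taken from NSPR or from the attached test case; the sketch also assumes the platform's key limit is below MAX_TRACKED, which holds for glibc's limit of 1024.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Upper bound on keys this sketch will track (an assumption). */
#define MAX_TRACKED 4096

struct leak_result {
    int created; /* keys created before the first failure */
    int err;     /* error returned by the failing call, or 0 */
};

/* Create keys without deleting them, as a leaking init routine would
 * across repeated load/unload cycles, until creation fails. The keys
 * are deleted before returning so the process is left usable. */
struct leak_result simulate_key_leak(void)
{
    static pthread_key_t keys[MAX_TRACKED];
    struct leak_result res = { 0, 0 };
    int i;

    while (res.created < MAX_TRACKED) {
        res.err = pthread_key_create(&keys[res.created], NULL);
        if (res.err != 0)
            break; /* typically EAGAIN once the limit is reached */
        res.created++;
    }

    /* Cleanup that a leaking library would skip. */
    for (i = 0; i < res.created; i++)
        pthread_key_delete(keys[i]);

    return res;
}
```

Because the keys are deleted at the end, a second call can create the same number of keys again, which is the behaviour a leak-free library would rely on.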
There is still potential corruption of thread-specific data with that proof-of-concept patch if PR_Init() is called in a state where pthread_key_create() will fail. The only way to fix that is to properly catch the error and do something about it; it's not obvious to me what to do other than abort().

What other libraries are loaded? We've run into other issues where libraries quietly allocate thread-local storage and then close their allocation. They later come back and use the storage location that they closed. If NSPR has allocated the storage in the meantime, the offender will stomp on the NSPR thread-local storage pool. The last offender I knew of was an XML parser. Evidently there's a shutdown routine in the parser that is never supposed to be called, but users tend to call it anyway...
bob

Bob: see test case in comment 32, this is reproducible with no other libraries loaded.

Created attachment 449849 [details]
tsdcorrupt.c - test case to show how PR_Init() corrupts TSD bindings
This is a test case showing that PR_Init() does not fail safely in the case of a pthread_key_create() failure; this is the root cause of heap corruption in this bug, ignoring the fact that NSPR also causes the TSD key leak.
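The attached tsdcorrupt.c is not reproduced here. As a minimal sketch of what "failing safely" could look like, the following hypothetical library checks the result of pthread_key_create(), refuses to touch thread-specific data without a valid key, and deletes the key on shutdown. The lib_* names are invented for illustration and do not correspond to NSPR's code or to the fix that shipped; the sketch is also not thread-safe during initialization (a real library would likely use pthread_once()).

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical library state; not NSPR's actual data structures. */
static pthread_key_t lib_key;
static int lib_key_valid = 0;

/* Create the TSD key and report failure instead of ignoring it.
 * Returns 0 on success or the pthread_key_create() error code. */
int lib_init(void)
{
    int rc;

    if (lib_key_valid)
        return 0;
    rc = pthread_key_create(&lib_key, NULL);
    if (rc != 0) {
        fprintf(stderr, "lib_init: pthread_key_create: %s\n", strerror(rc));
        return rc; /* lib_key is left unused, never passed to setspecific */
    }
    lib_key_valid = 1;
    return 0;
}

/* Store a per-thread value; fails with -1 if there is no valid key. */
int lib_set(void *value)
{
    if (!lib_key_valid)
        return -1;
    return pthread_setspecific(lib_key, value);
}

/* Fetch the per-thread value, or NULL if there is no valid key. */
void *lib_get(void)
{
    return lib_key_valid ? pthread_getspecific(lib_key) : NULL;
}

/* Delete the key so repeated init/shutdown cycles do not leak it. */
void lib_shutdown(void)
{
    if (lib_key_valid) {
        int rc = pthread_key_delete(lib_key);
        assert(rc == 0);
        (void)rc;
        lib_key_valid = 0;
    }
}
```

The point of the lib_key_valid flag is that an uninitialized pthread_key_t is never handed to pthread_setspecific(), which is where another component's binding could otherwise be overwritten.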
Created attachment 477368 [details]
the valgrind logs are attached.
Created attachment 477369 [details]
the valgrind logs are attached.
This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.

Upstream is looking at Joe's proposed patch. The expectation is that either that patch or something similar should be available shortly.
dev_ack
bob

I cannot see any indication that upstream has discussed or reviewed the suggested patch. Given that this patch changes memory management, it's too risky to pick up the patch without review and without upstream testing. Given the deadline for RHEL 5.9 packages (must be done by tomorrow), it seems like this bug will miss RHEL 5.9.

We worked on this today and will build for RHEL 5.9.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHBA-2013-0081.html