Red Hat Bugzilla – Bug 1094468
389-ds-base server reported crash in stan_GetCERTCertificate (pki3hack.c) under the replication replay failure condition
Last modified: 2015-03-05 03:27:44 EST
Description of problem: Directory Server reported a crash when running acceptance tests. Version-Release number of selected component (if applicable): 389-ds-base-1.3.1.6-25 How reproducible: Consistently Steps to Reproduce: 1. Run automated acceptance for mmrepl/accept tests 2. Clone a similar beaker job - https://beaker.engineering.redhat.com/jobs/643479 3. Observe the test results - 15/66 FAIL Actual results: Test failures reported with trac47606 tests. Server crashed when trying initialize one of the consumer. Expected results: No crash reported. Additional info: The crash is in NSS.... Program terminated with signal 11, Segmentation fault. #0 PR_EnterMonitor (mon=0x0) at ../../../nspr/pr/src/pthreads/ptsynch.c:527 527 if (!pthread_equal(mon->owner, self)) Missing separate debuginfos, use: debuginfo-install audit-libs-2.3.3-4.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 libselinux-2.2.2-6.el7.x86_64 nss-softokn-3.15.4-2.el7.x86_64 nss-softokn-freebl-3.15.4-2.el7.x86_64 nss-util-3.15.4-2.el7.x86_64 sqlite-3.7.17-4.el7.x86_64 xz-libs-5.1.2-8alpha.el7.x86_64 zlib-1.2.7-13.el7.x86_64 (gdb) bt #0 PR_EnterMonitor (mon=0x0) at ../../../nspr/pr/src/pthreads/ptsynch.c:527 #1 0x00007f9fe29c74b6 in nssPKIObject_Lock (object=object@entry=0x7f9fc82836f0) at pkibase.c:22 #2 0x00007f9fe29c96e9 in stan_GetCERTCertificate (c=0x7f9fc82836f0, forceUpdate=forceUpdate@entry=1) at pki3hack.c:858 #3 0x00007f9fe29c9d04 in STAN_ForceCERTCertificateUpdate (c=<optimized out>) at pki3hack.c:914 #4 0x00007f9fe29c59c0 in nssTrustDomain_RemoveTokenCertsFromCache (td=0x7f9fe6101c60, token=<optimized out>) at tdcache.c:448 #5 0x00007f9fe298627c in nssToken_NotifyCertsNotVisible (tok=<optimized out>) at dev3hack.c:303 #6 0x00007f9fe29cac59 in nssSlot_IsTokenPresent (slot=0x7f9fe624e030) at devslot.c:172 #7 0x00007f9fe29cd66e in nssToken_IsPresent (token=<optimized out>) at devtoken.c:1442 #8 0x00007f9fe29af5d4 in pk11_IsPresentCertLoad (slot=0x7f9fbc030aa0, loadCerts=1) at pk11slot.c:1436 #9 0x00007f9fe29b28b1 in SECMOD_CloseUserDB (slot=0x7f9fbc030aa0) at pk11util.c:1522 #10 0x00007f9fe312bc73 in tlsm_ctx_free (ctx=0x7f9fbc009710) at tls_m.c:2148 #11 0x00007f9fe3127c85 in ldap_int_tls_destroy (lo=0x7f9fbc000e80) at tls2.c:105 #12 0x00007f9fe310c6d7 in ldap_ld_free (ld=0x7f9fbc000dd0, close=close@entry=1, sctrls=sctrls@entry=0x0, cctrls=cctrls@entry=0x0) at unbind.c:209 #13 0x00007f9fe310c807 in ldap_unbind_ext (ld=<optimized out>, sctrls=sctrls@entry=0x0, cctrls=cctrls@entry=0x0) at unbind.c:52 #14 0x00007f9fe3d2d3de in slapi_ldap_unbind (ld=<optimized out>) at ldap/servers/slapd/ldaputil.c:148 #15 0x00007f9fd97cbf99 in close_connection_internal (conn=conn@entry=0x7f9fe64211b0) at ldap/servers/plugins/replication/repl5_connection.c:1203 #16 0x00007f9fd97cc8c6 in perform_operation (conn=0x7f9fe64211b0, optype=optype@entry=1, dn=<optimized out>, attrs=attrs@entry=0x7f9fbc080de0, newrdn=newrdn@entry=0x0, newparent=newparent@entry=0x0, deleteoldrdn=deleteoldrdn@entry=0, update_control=update_control@entry=0x7f9fbc056430, extop_oid=extop_oid@entry=0x0, extop_payload=extop_payload@entry=0x0, message_id=message_id@entry=0x7f9fd1f1cc6c) at ldap/servers/plugins/replication/repl5_connection.c:738 #17 0x00007f9fd97cd01d in conn_send_add (conn=<optimized out>, dn=<optimized out>, attrs=attrs@entry=0x7f9fbc080de0, update_control=update_control@entry=0x7f9fbc056430, message_id=message_id@entry=0x7f9fd1f1cc6c) at ldap/servers/plugins/replication/repl5_connection.c:771 #18 0x00007f9fd97cf94a in replay_update (prp=prp@entry=0x7f9fe64465f0, op=0x7f9fd1f1ccd0, message_id=message_id@entry=0x7f9fd1f1cc6c) at ldap/servers/plugins/replication/repl5_inc_protocol.c:1362 #19 0x00007f9fd97d0e56 in send_updates (num_changes_sent=0x7f9fd1f1cc60, remote_update_vector=<optimized out>, prp=0x7f9fe64465f0) at ldap/servers/plugins/replication/repl5_inc_protocol.c:1719 #20 repl5_inc_run (prp=<optimized out>) at ldap/servers/plugins/replication/repl5_inc_protocol.c:1056 #21 0x00007f9fd97d5a5c in prot_thread_main (arg=0x7f9fe6422c30) at ldap/servers/plugins/replication/repl5_protocol.c:296 #22 0x00007f9fe233a740 in _pt_root (arg=0x7f9fe641ff20) at ../../../nspr/pr/src/pthreads/ptthread.c:204 #23 0x00007f9fe1cdbdf3 in start_thread (arg=0x7f9fd1f1d700) at pthread_create.c:308 #24 0x00007f9fe1a093dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113 # rpm -q nss nss-3.15.4-6.el7.x86_64 This is the latest NSS for rhel-7.0. Just in case, other nss related package versions. nss-3.15.4-6.el7.x86_64 nss-softokn-freebl-3.15.4-2.el7.x86_64 nss-util-3.15.4-2.el7.x86_64 nss-tools-3.15.4-6.el7.x86_64 nss-debuginfo-3.15.4-6.el7.x86_64 nss-softokn-3.15.4-2.el7.x86_64 nss-sysinit-3.15.4-6.el7.x86_64
Additional information. Output from valgrind ==9703== Invalid read of size 8 ==9703== at 0x61E0880: stan_GetCERTCertificate (pki3hack.c:834) ==9703== by 0x61DC9BF: nssTrustDomain_RemoveTokenCertsFromCache (tdcache.c:448) ==9703== by 0x61E1C58: nssSlot_IsTokenPresent (devslot.c:172) ==9703== by 0x61C65D3: pk11_IsPresentCertLoad (pk11slot.c:1436) ==9703== by 0x61C98B0: SECMOD_CloseUserDB (pk11util.c:1522) ==9703== by 0x5B06C72: tlsm_ctx_free (tls_m.c:2148) ==9703== by 0x5B02C84: ldap_int_tls_destroy (tls2.c:105) ==9703== by 0x5AE76D6: ldap_ld_free (unbind.c:209) ==9703== by 0x10E04F98: close_connection_internal (repl5_connection.c:1203) ==9703== by 0x10E058C5: perform_operation (repl5_connection.c:738) ==9703== by 0x10E0601C: conn_send_add (repl5_connection.c:771) ==9703== by 0x10E08949: replay_update (repl5_inc_protocol.c:1362) ==9703== by 0x10E09E55: repl5_inc_run (repl5_inc_protocol.c:1719) ==9703== by 0x10E0EA5B: prot_thread_main (repl5_protocol.c:296) ==9703== by 0x68E773F: ??? (in /usr/lib64/libnspr4.so) ==9703== by 0x6F26DF2: start_thread (in /usr/lib64/libpthread-2.17.so) ==9703== by 0x72313DC: clone (in /usr/lib64/libc-2.17.so) ==9703== Address 0xe86ca80 is 240 bytes inside a block of size 2,048 free'd ==9703== at 0x4C29577: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==9703== by 0x66BC61A: PL_FinishArenaPool (in /usr/lib64/libplds4.so) ==9703== by 0x61E712E: nssArena_Destroy (arena.c:509) ==9703== by 0x61D92B7: nssCertificate_Destroy (certificate.c:125) ==9703== by 0x5F3ED48: ssl3_CleanupPeerCerts.isra.20 (ssl3con.c:9635) ==9703== by 0x5F4AFE2: ssl3_DestroySSL3Info (ssl3con.c:11990) ==9703== by 0x5F5883A: ssl_DestroySocketContents (sslsock.c:335) ==9703== by 0x5F59BC1: ssl_FreeSocket (sslsock.c:398) ==9703== by 0x5F5006F: ssl_DefClose (ssldef.c:205) ==9703== by 0x5B05B49: tlsm_sb_remove (tls_m.c:3255) ==9703== by 0x5D2BD7C: ber_sockbuf_remove_io (sockbuf.c:230) ==9703== by 0x5D2C584: ber_int_sb_destroy (sockbuf.c:404) ==9703== by 0x5D2C5FB: ber_sockbuf_free (sockbuf.c:75) ==9703== by 0x5AE753D: ldap_ld_free (unbind.c:133) ==9703== by 0x10E04F98: close_connection_internal (repl5_connection.c:1203) ==9703== by 0x10E058C5: perform_operation (repl5_connection.c:738) ==9703== by 0x10E0601C: conn_send_add (repl5_connection.c:771) ==9703== by 0x10E08949: replay_update (repl5_inc_protocol.c:1362) ==9703== by 0x10E09E55: repl5_inc_run (repl5_inc_protocol.c:1719) ==9703== by 0x10E0EA5B: prot_thread_main (repl5_protocol.c:296) ==9703== by 0x68E773F: ??? (in /usr/lib64/libnspr4.so) ==9703== by 0x6F26DF2: start_thread (in /usr/lib64/libpthread-2.17.so) ==9703== by 0x72313DC: clone (in /usr/lib64/libc-2.17.so) 1) Not a race condition. "perform_operation" calls PR_Lock(conn->lock) at the head of the function and holds it when close_connection_internal is called. ==9703== by 0x10E04F98: close_connection_internal (repl5_connection.c:1203) ==9703== by 0x10E058C5: perform_operation (repl5_connection.c:738) 2) Not a double free. close_connection_internal sets NULL to the ldap handle conn->ld once slapi_ldap_unbind is done. Thus,in the valgrind output, the 2 stacktraces cannot be from 2 different calls since if the call for the lower stacktrace is successfully finished, there is no handle conn->ld available to call slapi_ldap_unbind with for the upper stacktrace... close_connection_internal(Repl_Connection *conn) if (NULL != conn->ld) { /* Since we call slapi_ldap_init, we must call slapi_ldap_unbind */ slapi_ldap_unbind(conn->ld); } conn->ld = NULL; NSS related packages installed. nss-softokn-freebl-3.15.4-2.el7.x86_64 nss-sysinit-3.15.4-6.el7.x86_64 nss-util-devel-3.15.4-2.el7.x86_64 nss-softokn-devel-3.15.4-2.el7.x86_64 nss-debuginfo-3.15.4-6.el7.x86_64 libsss_nss_idmap-1.11.2-65.el7.x86_64 nss-util-3.15.4-2.el7.x86_64 nss-softokn-freebl-devel-3.15.4-2.el7.x86_64 nss-devel-3.15.4-6.el7.x86_64 mod_nss-1.0.8-32.el7.x86_64 nss-3.15.4-6.el7.x86_64 python-nss-0.14.0-5.el7.x86_64 libsss_nss_idmap-python-1.11.2-65.el7.x86_64 nss-softokn-3.15.4-2.el7.x86_64 nss-tools-3.15.4-6.el7.x86_64
Hi Elio, could there be any progress on this bug? We are getting this crash report from the automated beaker tests (sorry, debuginfo was not installed). Do you need any other information for your debugging? [IPA CRASH ALERT] Test_Suite: Submitter: Beaker JobID: 655359 :Thread 1 (Thread 0x7f29fffff700 (LWP 21597)): :#0 0x00007f2a1d0221eb in PORT_ArenaAlloc_Util () from /lib64/libnssutil3.so :No symbol table info available. :#1 0x00007f2a1eb46b7a in cert_trust_from_stan_trust () from /lib64/libnss3.so :No symbol table info available. :#2 0x00007f2a1eb47640 in nssTrust_GetCERTCertTrustForCert () from /lib64/libnss3.so :No symbol table info available. :#3 0x00007f2a1eb479a5 in stan_GetCERTCertificate () from /lib64/libnss3.so :No symbol table info available. :#4 0x00007f2a1eb439c0 in nssTrustDomain_RemoveTokenCertsFromCache () from /lib64/libnss3.so :No symbol table info available. :#5 0x00007f2a1eb48c59 in nssSlot_IsTokenPresent () from /lib64/libnss3.so :No symbol table info available. :#6 0x00007f2a1eb2d5d4 in pk11_IsPresentCertLoad () from /lib64/libnss3.so :No symbol table info available. :#7 0x00007f2a1eb308b1 in SECMOD_CloseUserDB () from /lib64/libnss3.so :No symbol table info available. :#8 0x00007f2a1f2a9c73 in tlsm_ctx_free () from /lib64/libldap_r-2.4.so.2 :No symbol table info available. :#9 0x00007f2a1f2a5c85 in ldap_int_tls_destroy () from /lib64/libldap_r-2.4.so.2 :No symbol table info available. :#10 0x00007f2a1f28a6d7 in ldap_ld_free () from /lib64/libldap_r-2.4.so.2 :No symbol table info available. :#11 0x00007f2a14b23f99 in close_connection_internal (conn=conn@entry=0x7f2a2179c6d0) at ldap/servers/plugins/replication/repl5_connection.c:1203 [...] Thanks, --noriko
Created attachment 924708 [details] Manage objects references to ensure object does not go away until we finish This is Bob Relyea's fix to get rid of the crash.
Created attachment 924709 [details] changes to nss.spec in patch format
Comment on attachment 924709 [details] changes to nss.spec in patch format r+ rrelyea
Once you have an official errata filed from NSS team, please let DS QE team and we will do the necessary verification. Please contact sramling@redhat.com for any further queries related to verifying this issue.
Tests automated in TET RHEL7 branch for 389-ds-base as trac47606 tests.
This crash is observed with the other beaker run. Hence, re-opening the bug. Beaker job - https://beaker.engineering.redhat.com/jobs/812004
(In reply to Sankar Ramalingam from comment #20) > This crash is observed with the other beaker run. Hence, re-opening the bug. > Beaker job - https://beaker.engineering.redhat.com/jobs/812004 Sankar, I cannot find the crash report (including the stacktraces) in the link you put. https://beaker.engineering.redhat.com/jobs/812004 Could you attach the file to this bug? If you don't find it, could you repeat running the test? Also, every time you run, could you give the output of "rpm -qa |egrep '389-ds-base|nss'"? Thanks, --noriko
Created attachment 970672 [details] Fix latest race
Created attachment 970673 [details] spec file with latest race fix.
Created attachment 977437 [details] Fix latest race Updated patch from Bob that gave good results in Noriko's testing.
Created attachment 977439 [details] spec file with latest patch applied
Created attachment 978539 [details] spec file with latest patch applied - actually used for build
No crashes observed with the latest nss/nspr and 389-ds-base packages. Hence, marking the bug as Verified. Packages tested: [root@intel-s3eb1-03 MMR_WINSYNC]# rpm -qa |egrep 'nss-|nspr-' nss-softokn-freebl-3.16.2.3-7.el7.x86_64 nss-tools-3.16.2.3-5.el7.x86_64 nspr-4.10.6-3.el7.x86_64 nss-util-3.16.2.3-2.el7.x86_64 nss-softokn-3.16.2.3-7.el7.x86_64 nss-3.16.2.3-5.el7.x86_64 nspr-debuginfo-4.10.6-3.el7.x86_64 nss-sysinit-3.16.2.3-5.el7.x86_64 nss-debuginfo-3.16.2.3-5.el7.x86_64 [root@intel-s3eb1-03 MMR_WINSYNC]# rpm -qa |egrep 389-ds-base 389-ds-base-libs-1.3.3.1-13.el7.x86_64 389-ds-base-debuginfo-1.3.3.1-13.el7.x86_64 389-ds-base-1.3.3.1-13.el7.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-0364.html