RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1094468 - 389-ds-base server reported crash in stan_GetCERTCertificate (pki3hack.c) under the replication replay failure condition
Summary: 389-ds-base server reported crash in stan_GetCERTCertificate (pki3hack.c) und...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: nss
Version: 7.0
Hardware: x86_64
OS: Linux
high
high
Target Milestone: rc
: ---
Assignee: Elio Maldonado Batiz
QA Contact: Alicja Kario
URL:
Whiteboard:
Depends On:
Blocks: 1064025 1113520 1139349
TreeView+ depends on / blocked
 
Reported: 2014-05-05 18:38 UTC by Sankar Ramalingam
Modified: 2015-03-05 08:27 UTC (History)
9 users (show)

Fixed In Version: nss-3.16.2.3-4.el7
Doc Type: Bug Fix
Doc Text:
Cause: nss internal call stan_GetCERTCertificate did nor properly ensure sure that objects do not go away until it it's finished with the operation Consequence: A crash in 389-ds-base server was reported in in nss's stan_GetCERTCertificate (pki3hack.c) under the replication replay failure condition. Fix: The certificate code was fixed to properly manage objects references. Result: The crashes reported by the 389 directory server no longer occur
Clone Of:
: 1139349 (view as bug list)
Environment:
Last Closed: 2015-03-05 08:27:44 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Manage objects references to ensure object does not go away until we finish (4.71 KB, patch)
2014-08-07 04:28 UTC, Elio Maldonado Batiz
rrelyea: review+
Details | Diff
changes to nss.spec in patch format (1.49 KB, patch)
2014-08-07 04:57 UTC, Elio Maldonado Batiz
rrelyea: review+
Details | Diff
Fix latest race (1.19 KB, patch)
2014-12-18 17:28 UTC, Bob Relyea
no flags Details | Diff
spec file with latest race fix. (70.41 KB, patch)
2014-12-18 17:29 UTC, Bob Relyea
no flags Details | Diff
Fix latest race (1.20 KB, patch)
2015-01-07 16:59 UTC, Elio Maldonado Batiz
no flags Details | Diff
spec file with latest patch applied (70.54 KB, text/x-matlab)
2015-01-07 17:00 UTC, Elio Maldonado Batiz
no flags Details
spec file with latest patch applied - actually used for build (70.95 KB, text/x-matlab)
2015-01-10 16:31 UTC, Elio Maldonado Batiz
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Mozilla Foundation 1050069 0 None None None Never
Red Hat Product Errata RHBA-2015:0364 0 normal SHIPPED_LIVE nss, nss-softokn, nss-util, and nspr bug fix and enhancement update 2015-03-05 12:51:43 UTC

Description Sankar Ramalingam 2014-05-05 18:38:18 UTC
Description of problem: Directory Server reported a crash when running acceptance tests.


Version-Release number of selected component (if applicable): 389-ds-base-1.3.1.6-25


How reproducible: Consistently


Steps to Reproduce:
1. Run automated acceptance for mmrepl/accept tests
2. Clone a similar beaker job - https://beaker.engineering.redhat.com/jobs/643479
3. Observe the test results - 15/66 FAIL

Actual results: Test failures reported with trac47606 tests. Server crashed when trying initialize one of the consumer.


Expected results: No crash reported.


Additional info:

The crash is in NSS....

Program terminated with signal 11, Segmentation fault.
#0  PR_EnterMonitor (mon=0x0) at ../../../nspr/pr/src/pthreads/ptsynch.c:527
527        if (!pthread_equal(mon->owner, self))
Missing separate debuginfos, use: debuginfo-install audit-libs-2.3.3-4.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 libselinux-2.2.2-6.el7.x86_64 nss-softokn-3.15.4-2.el7.x86_64 nss-softokn-freebl-3.15.4-2.el7.x86_64 nss-util-3.15.4-2.el7.x86_64 sqlite-3.7.17-4.el7.x86_64 xz-libs-5.1.2-8alpha.el7.x86_64 zlib-1.2.7-13.el7.x86_64
(gdb) bt
#0  PR_EnterMonitor (mon=0x0) at ../../../nspr/pr/src/pthreads/ptsynch.c:527
#1  0x00007f9fe29c74b6 in nssPKIObject_Lock (object=object@entry=0x7f9fc82836f0) at pkibase.c:22
#2  0x00007f9fe29c96e9 in stan_GetCERTCertificate (c=0x7f9fc82836f0, forceUpdate=forceUpdate@entry=1)
    at pki3hack.c:858
#3  0x00007f9fe29c9d04 in STAN_ForceCERTCertificateUpdate (c=<optimized out>) at pki3hack.c:914
#4  0x00007f9fe29c59c0 in nssTrustDomain_RemoveTokenCertsFromCache (td=0x7f9fe6101c60,
    token=<optimized out>) at tdcache.c:448
#5  0x00007f9fe298627c in nssToken_NotifyCertsNotVisible (tok=<optimized out>) at dev3hack.c:303
#6  0x00007f9fe29cac59 in nssSlot_IsTokenPresent (slot=0x7f9fe624e030) at devslot.c:172
#7  0x00007f9fe29cd66e in nssToken_IsPresent (token=<optimized out>) at devtoken.c:1442
#8  0x00007f9fe29af5d4 in pk11_IsPresentCertLoad (slot=0x7f9fbc030aa0, loadCerts=1)
    at pk11slot.c:1436
#9  0x00007f9fe29b28b1 in SECMOD_CloseUserDB (slot=0x7f9fbc030aa0) at pk11util.c:1522
#10 0x00007f9fe312bc73 in tlsm_ctx_free (ctx=0x7f9fbc009710) at tls_m.c:2148
#11 0x00007f9fe3127c85 in ldap_int_tls_destroy (lo=0x7f9fbc000e80) at tls2.c:105
#12 0x00007f9fe310c6d7 in ldap_ld_free (ld=0x7f9fbc000dd0, close=close@entry=1,
    sctrls=sctrls@entry=0x0, cctrls=cctrls@entry=0x0) at unbind.c:209
#13 0x00007f9fe310c807 in ldap_unbind_ext (ld=<optimized out>, sctrls=sctrls@entry=0x0,
    cctrls=cctrls@entry=0x0) at unbind.c:52
#14 0x00007f9fe3d2d3de in slapi_ldap_unbind (ld=<optimized out>) at ldap/servers/slapd/ldaputil.c:148
#15 0x00007f9fd97cbf99 in close_connection_internal (conn=conn@entry=0x7f9fe64211b0)
    at ldap/servers/plugins/replication/repl5_connection.c:1203
#16 0x00007f9fd97cc8c6 in perform_operation (conn=0x7f9fe64211b0, optype=optype@entry=1,
    dn=<optimized out>, attrs=attrs@entry=0x7f9fbc080de0, newrdn=newrdn@entry=0x0,
    newparent=newparent@entry=0x0, deleteoldrdn=deleteoldrdn@entry=0,
    update_control=update_control@entry=0x7f9fbc056430, extop_oid=extop_oid@entry=0x0,
    extop_payload=extop_payload@entry=0x0, message_id=message_id@entry=0x7f9fd1f1cc6c)
    at ldap/servers/plugins/replication/repl5_connection.c:738
#17 0x00007f9fd97cd01d in conn_send_add (conn=<optimized out>, dn=<optimized out>,
    attrs=attrs@entry=0x7f9fbc080de0, update_control=update_control@entry=0x7f9fbc056430,
    message_id=message_id@entry=0x7f9fd1f1cc6c)
    at ldap/servers/plugins/replication/repl5_connection.c:771
#18 0x00007f9fd97cf94a in replay_update (prp=prp@entry=0x7f9fe64465f0, op=0x7f9fd1f1ccd0,
    message_id=message_id@entry=0x7f9fd1f1cc6c)
    at ldap/servers/plugins/replication/repl5_inc_protocol.c:1362
#19 0x00007f9fd97d0e56 in send_updates (num_changes_sent=0x7f9fd1f1cc60,
    remote_update_vector=<optimized out>, prp=0x7f9fe64465f0)
    at ldap/servers/plugins/replication/repl5_inc_protocol.c:1719
#20 repl5_inc_run (prp=<optimized out>) at ldap/servers/plugins/replication/repl5_inc_protocol.c:1056
#21 0x00007f9fd97d5a5c in prot_thread_main (arg=0x7f9fe6422c30)
    at ldap/servers/plugins/replication/repl5_protocol.c:296
#22 0x00007f9fe233a740 in _pt_root (arg=0x7f9fe641ff20)
    at ../../../nspr/pr/src/pthreads/ptthread.c:204
#23 0x00007f9fe1cdbdf3 in start_thread (arg=0x7f9fd1f1d700) at pthread_create.c:308
#24 0x00007f9fe1a093dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113


# rpm -q nss
nss-3.15.4-6.el7.x86_64

This is the latest NSS for rhel-7.0.


Just in case, other nss related package versions.

nss-3.15.4-6.el7.x86_64

nss-softokn-freebl-3.15.4-2.el7.x86_64
nss-util-3.15.4-2.el7.x86_64
nss-tools-3.15.4-6.el7.x86_64
nss-debuginfo-3.15.4-6.el7.x86_64
nss-softokn-3.15.4-2.el7.x86_64
nss-sysinit-3.15.4-6.el7.x86_64

Comment 2 Noriko Hosoi 2014-05-08 20:07:12 UTC
Additional information.

Output from valgrind

    ==9703== Invalid read of size 8
    ==9703==    at 0x61E0880: stan_GetCERTCertificate (pki3hack.c:834)
    ==9703==    by 0x61DC9BF: nssTrustDomain_RemoveTokenCertsFromCache (tdcache.c:448)
    ==9703==    by 0x61E1C58: nssSlot_IsTokenPresent (devslot.c:172)
    ==9703==    by 0x61C65D3: pk11_IsPresentCertLoad (pk11slot.c:1436)
    ==9703==    by 0x61C98B0: SECMOD_CloseUserDB (pk11util.c:1522)
    ==9703==    by 0x5B06C72: tlsm_ctx_free (tls_m.c:2148)
    ==9703==    by 0x5B02C84: ldap_int_tls_destroy (tls2.c:105)
    ==9703==    by 0x5AE76D6: ldap_ld_free (unbind.c:209)
    ==9703==    by 0x10E04F98: close_connection_internal (repl5_connection.c:1203)
    ==9703==    by 0x10E058C5: perform_operation (repl5_connection.c:738)
    ==9703==    by 0x10E0601C: conn_send_add (repl5_connection.c:771)
    ==9703==    by 0x10E08949: replay_update (repl5_inc_protocol.c:1362)
    ==9703==    by 0x10E09E55: repl5_inc_run (repl5_inc_protocol.c:1719)
    ==9703==    by 0x10E0EA5B: prot_thread_main (repl5_protocol.c:296)
    ==9703==    by 0x68E773F: ??? (in /usr/lib64/libnspr4.so)
    ==9703==    by 0x6F26DF2: start_thread (in /usr/lib64/libpthread-2.17.so)
    ==9703==    by 0x72313DC: clone (in /usr/lib64/libc-2.17.so)
    ==9703==  Address 0xe86ca80 is 240 bytes inside a block of size 2,048 free'd
    ==9703==    at 0x4C29577: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
    ==9703==    by 0x66BC61A: PL_FinishArenaPool (in /usr/lib64/libplds4.so)
    ==9703==    by 0x61E712E: nssArena_Destroy (arena.c:509)
    ==9703==    by 0x61D92B7: nssCertificate_Destroy (certificate.c:125)
    ==9703==    by 0x5F3ED48: ssl3_CleanupPeerCerts.isra.20 (ssl3con.c:9635)
    ==9703==    by 0x5F4AFE2: ssl3_DestroySSL3Info (ssl3con.c:11990)
    ==9703==    by 0x5F5883A: ssl_DestroySocketContents (sslsock.c:335)
    ==9703==    by 0x5F59BC1: ssl_FreeSocket (sslsock.c:398)
    ==9703==    by 0x5F5006F: ssl_DefClose (ssldef.c:205)
    ==9703==    by 0x5B05B49: tlsm_sb_remove (tls_m.c:3255)
    ==9703==    by 0x5D2BD7C: ber_sockbuf_remove_io (sockbuf.c:230)
    ==9703==    by 0x5D2C584: ber_int_sb_destroy (sockbuf.c:404)
    ==9703==    by 0x5D2C5FB: ber_sockbuf_free (sockbuf.c:75)
    ==9703==    by 0x5AE753D: ldap_ld_free (unbind.c:133)
    ==9703==    by 0x10E04F98: close_connection_internal (repl5_connection.c:1203)
    ==9703==    by 0x10E058C5: perform_operation (repl5_connection.c:738)
    ==9703==    by 0x10E0601C: conn_send_add (repl5_connection.c:771)
    ==9703==    by 0x10E08949: replay_update (repl5_inc_protocol.c:1362)
    ==9703==    by 0x10E09E55: repl5_inc_run (repl5_inc_protocol.c:1719)
    ==9703==    by 0x10E0EA5B: prot_thread_main (repl5_protocol.c:296)
    ==9703==    by 0x68E773F: ??? (in /usr/lib64/libnspr4.so)
    ==9703==    by 0x6F26DF2: start_thread (in /usr/lib64/libpthread-2.17.so)
    ==9703==    by 0x72313DC: clone (in /usr/lib64/libc-2.17.so) 

1) Not a race condition.  "perform_operation" calls PR_Lock(conn->lock) at the head of the function and holds it when close_connection_internal is called.

==9703==    by 0x10E04F98: close_connection_internal (repl5_connection.c:1203)
==9703==    by 0x10E058C5: perform_operation (repl5_connection.c:738)

2) Not a double free.  close_connection_internal sets NULL to the ldap handle conn->ld once slapi_ldap_unbind is done.  Thus,in the valgrind output, the 2 stacktraces cannot be from 2 different calls since if the call for the lower stacktrace is successfully finished, there is no handle conn->ld available to call slapi_ldap_unbind with for the upper stacktrace...
   close_connection_internal(Repl_Connection *conn)
    if (NULL != conn->ld)
    {
        /* Since we call slapi_ldap_init, we must call slapi_ldap_unbind */
        slapi_ldap_unbind(conn->ld);
    }
    conn->ld = NULL;

NSS related packages installed.

    nss-softokn-freebl-3.15.4-2.el7.x86_64
    nss-sysinit-3.15.4-6.el7.x86_64
    nss-util-devel-3.15.4-2.el7.x86_64
    nss-softokn-devel-3.15.4-2.el7.x86_64
    nss-debuginfo-3.15.4-6.el7.x86_64
    libsss_nss_idmap-1.11.2-65.el7.x86_64
    nss-util-3.15.4-2.el7.x86_64
    nss-softokn-freebl-devel-3.15.4-2.el7.x86_64
    nss-devel-3.15.4-6.el7.x86_64
    mod_nss-1.0.8-32.el7.x86_64
    nss-3.15.4-6.el7.x86_64
    python-nss-0.14.0-5.el7.x86_64
    libsss_nss_idmap-python-1.11.2-65.el7.x86_64
    nss-softokn-3.15.4-2.el7.x86_64
    nss-tools-3.15.4-6.el7.x86_64

Comment 3 Noriko Hosoi 2014-05-20 01:08:13 UTC
Hi Elio, could there be any progress on this bug?

We are getting this crash report from the automated beaker tests (sorry, debuginfo was not installed).  Do you need any other information for your debugging?

[IPA CRASH ALERT] Test_Suite:  Submitter:  Beaker JobID: 655359
:Thread 1 (Thread 0x7f29fffff700 (LWP 21597)):
:#0  0x00007f2a1d0221eb in PORT_ArenaAlloc_Util () from /lib64/libnssutil3.so
:No symbol table info available.
:#1  0x00007f2a1eb46b7a in cert_trust_from_stan_trust () from /lib64/libnss3.so
:No symbol table info available.
:#2  0x00007f2a1eb47640 in nssTrust_GetCERTCertTrustForCert () from /lib64/libnss3.so
:No symbol table info available.
:#3  0x00007f2a1eb479a5 in stan_GetCERTCertificate () from /lib64/libnss3.so
:No symbol table info available.
:#4  0x00007f2a1eb439c0 in nssTrustDomain_RemoveTokenCertsFromCache () from /lib64/libnss3.so
:No symbol table info available.
:#5  0x00007f2a1eb48c59 in nssSlot_IsTokenPresent () from /lib64/libnss3.so
:No symbol table info available.
:#6  0x00007f2a1eb2d5d4 in pk11_IsPresentCertLoad () from /lib64/libnss3.so
:No symbol table info available.
:#7  0x00007f2a1eb308b1 in SECMOD_CloseUserDB () from /lib64/libnss3.so
:No symbol table info available.
:#8  0x00007f2a1f2a9c73 in tlsm_ctx_free () from /lib64/libldap_r-2.4.so.2
:No symbol table info available.
:#9  0x00007f2a1f2a5c85 in ldap_int_tls_destroy () from /lib64/libldap_r-2.4.so.2
:No symbol table info available.
:#10 0x00007f2a1f28a6d7 in ldap_ld_free () from /lib64/libldap_r-2.4.so.2
:No symbol table info available.
:#11 0x00007f2a14b23f99 in close_connection_internal (conn=conn@entry=0x7f2a2179c6d0) at ldap/servers/plugins/replication/repl5_connection.c:1203
[...]

Thanks,
--noriko

Comment 10 Elio Maldonado Batiz 2014-08-07 04:28:42 UTC
Created attachment 924708 [details]
Manage objects references to ensure object does not go away until we finish

This is Bob Relyea's fix to get rid of the crash.

Comment 11 Elio Maldonado Batiz 2014-08-07 04:57:52 UTC
Created attachment 924709 [details]
changes to nss.spec in patch format

Comment 12 Bob Relyea 2014-08-11 18:26:05 UTC
Comment on attachment 924709 [details]
changes to nss.spec in patch format

r+ rrelyea

Comment 13 Sankar Ramalingam 2014-08-24 23:42:06 UTC
Once you have an official errata filed from NSS team, please let DS QE team and we will do the necessary verification. Please contact sramling for any further queries related to verifying this issue.

Comment 17 Sankar Ramalingam 2014-09-15 10:43:19 UTC
Tests automated in TET RHEL7 branch for 389-ds-base as trac47606 tests.

Comment 20 Sankar Ramalingam 2014-11-27 14:12:31 UTC
This crash is observed with the other beaker run. Hence, re-opening the bug.
Beaker job - https://beaker.engineering.redhat.com/jobs/812004

Comment 21 Noriko Hosoi 2014-12-12 19:53:55 UTC
(In reply to Sankar Ramalingam from comment #20)
> This crash is observed with the other beaker run. Hence, re-opening the bug.
> Beaker job - https://beaker.engineering.redhat.com/jobs/812004

Sankar, I cannot find the crash report (including the stacktraces) in the link you put.
https://beaker.engineering.redhat.com/jobs/812004
Could you attach the file to this bug?  If you don't find it, could you repeat running the test?

Also, every time you run, could you give the output of "rpm -qa |egrep '389-ds-base|nss'"?

Thanks,
--noriko

Comment 24 Bob Relyea 2014-12-18 17:28:18 UTC
Created attachment 970672 [details]
Fix latest race

Comment 25 Bob Relyea 2014-12-18 17:29:00 UTC
Created attachment 970673 [details]
spec file with latest race fix.

Comment 26 Elio Maldonado Batiz 2015-01-07 16:59:13 UTC
Created attachment 977437 [details]
Fix latest race

Updated patch from Bob that gave good results in Noriko's testing.

Comment 27 Elio Maldonado Batiz 2015-01-07 17:00:39 UTC
Created attachment 977439 [details]
spec file with latest patch applied

Comment 28 Elio Maldonado Batiz 2015-01-10 16:31:01 UTC
Created attachment 978539 [details]
spec file with latest patch applied - actually used for build

Comment 32 Sankar Ramalingam 2015-01-27 14:01:32 UTC
No crashes observed with the latest nss/nspr and 389-ds-base packages. Hence, marking the bug as Verified.

Packages tested:
[root@intel-s3eb1-03 MMR_WINSYNC]# rpm -qa |egrep 'nss-|nspr-'
nss-softokn-freebl-3.16.2.3-7.el7.x86_64
nss-tools-3.16.2.3-5.el7.x86_64
nspr-4.10.6-3.el7.x86_64
nss-util-3.16.2.3-2.el7.x86_64
nss-softokn-3.16.2.3-7.el7.x86_64
nss-3.16.2.3-5.el7.x86_64
nspr-debuginfo-4.10.6-3.el7.x86_64
nss-sysinit-3.16.2.3-5.el7.x86_64
nss-debuginfo-3.16.2.3-5.el7.x86_64

[root@intel-s3eb1-03 MMR_WINSYNC]# rpm -qa |egrep 389-ds-base
389-ds-base-libs-1.3.3.1-13.el7.x86_64
389-ds-base-debuginfo-1.3.3.1-13.el7.x86_64
389-ds-base-1.3.3.1-13.el7.x86_64

Comment 35 errata-xmlrpc 2015-03-05 08:27:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0364.html


Note You need to log in before you can comment on or make changes to this bug.