Bug 701678 - SSL_ForceHandshake function is not thread safe
Summary: SSL_ForceHandshake function is not thread safe
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: openldap
Version: 14
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ---
Assignee: Jan Vcelak
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
: 720896 (view as bug list)
Depends On:
Blocks: 709407 855159
TreeView+ depends on / blocked
 
Reported: 2011-05-03 14:26 UTC by Dennis Krul
Modified: 2013-03-04 01:29 UTC (History)
9 users (show)

Fixed In Version: openldap-2.4.24-5.fc15
Clone Of:
: 855159 (view as bug list)
Environment:
Last Closed: 2011-11-17 23:41:23 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
slapd stackstrace (31.98 KB, text/plain)
2011-05-11 08:51 UTC, Dennis Krul
no flags Details
slapd.conf for provider (6.04 KB, application/octet-stream)
2011-08-26 22:09 UTC, Mark Nipper
no flags Details
slapd.conf for consumer (6.63 KB, application/octet-stream)
2011-08-26 22:10 UTC, Mark Nipper
no flags Details
stack trace from crash (60.33 KB, text/plain)
2011-09-06 17:41 UTC, Rich Megginson
no flags Details
stack trace of just the two problematic threads (20.20 KB, text/plain)
2011-09-06 17:43 UTC, Rich Megginson
no flags Details
0001-use-mutex-for-connection-handshake-when-using-PEM-ns.patch (1.93 KB, patch)
2011-09-06 19:06 UTC, Rich Megginson
no flags Details | Diff
0001-use-mutex-for-connection-handshake-when-using-PEM-ns.patch (2.80 KB, patch)
2011-09-06 20:52 UTC, Rich Megginson
no flags Details | Diff

Description Dennis Krul 2011-05-03 14:26:53 UTC
Description of problem:

Several times a day slapd crashes and has to be restarted.

This ldap server is used by about 50 Fedora and FreeBSD servers for passwd lookups and pam authentication. The clients show no issues. 

Syslog shows this message:

kernel: [68789.935862] slapd[28586] general protection ip:7f7738fa5340 sp:7f77089437b0 error:0
kernel: [68789.960255] slapd[28582] general protection ip:7f7738fa5419 sp:7f7709e150b0 error:0 in libssl3.so[7f7738f89000+32000]

From F9 up to F13 everything was stable. This is happening since we upgraded to F14. I think it might be related to the switch to Mozilla's nss library in F14.


Version-Release number of selected component (if applicable):

kernel-2.6.35.12-90.fc14.x86_64
openldap-servers-2.4.23-4.fc14.x86_64
nss-3.12.9-10.fc14.x86_64

How reproducible:

Steps to Reproduce:
1. Setup a LDAP server with TLS
2. Setup a few clients
3. Wait..
  
Actual results:


Expected results:


Additional info:

SElinux is disabled. 
Latest patches have been installed.

Comment 1 Jan Vcelak 2011-05-04 10:17:44 UTC
Dennis, please try latest OpenLDAP package from updates-testing.

https://admin.fedoraproject.org/updates/openldap-2.4.23-10.fc14
# yum --enablerepo updates-testing update "openldap\*"

Comment 2 Jan Vcelak 2011-05-04 10:18:31 UTC
Sorry, a typo:
# yum --enablerepo updates-testing update "openldap*"

Comment 3 jonnya99 2011-05-05 14:19:43 UTC
Hi Jan/Dennis

I've also experienced this error. Updating to openldap-servers-2.4.23-10.fc14 did not resolve the issue. 

Some additional info: I use nss-pam-ldapd(-0.7.7-1) instead of nss_ldap.

Disabling TLS results in a stable slapd.

Comment 4 Rich Megginson 2011-05-05 14:27:15 UTC
A stack trace would be very helpful.
1) allow setuid programs to create core files:
sysctl -w fs.suid_dumpable=1
2) install the openldap-debuginfo package
3) not sure where openldap leaves core files - Jan?
4) reproduce the crash
5) run gdb with the core file:
gdb /usr/sbin/slapd /path/to/core.PID
(gdb) set logging on
(gdb) set logging redirect on
(gdb) set logging file /tmp/openldap.crash.txt
(gdb) thread apply all bt full
(gdb) set logging redirect off
(gdb) quit

Then attach /tmp/openldap.crash.txt to this bug as an attachment.  Note that you will probably want to scrub/obscure any sensitive information from /tmp/openldap.crash.txt

Comment 5 Dennis Krul 2011-05-09 10:30:42 UTC
Hello Jan,

I'm now running:

openldap-2.4.23-10.fc14.x86_64
openldap-clients-2.4.23-10.fc14.x86_64
openldap-servers-2.4.23-10.fc14.x86_64

The "general protection" kernel messages seem to have disappeared, but slapd still crashes. The crashes do appear to be less frequent.

I've installed the debuginfo pkg and enabled the sysctl. When the crash returns I will post the gdb output.

Thanks

Comment 6 Jan Vcelak 2011-05-09 19:08:39 UTC
Sorry Dennis for complicating this a little bit more.

(In reply to comment #4)
> A stack trace would be very helpful.
> 1) allow setuid programs to create core files:
> sysctl -w fs.suid_dumpable=1
> 2) install the openldap-debuginfo package
> 3) not sure where openldap leaves core files - Jan?

Unfortunately, this will not work. Forcing OpenLDAP to drop the core was always problematic. There are two problems, 1) core file size is set to zero by initscripts, 2) slapd will change working directory to / and drops root privileges. In this situation, no core file can be created.

But there is one (ugly) solution:

# service slapd stop
# ulimit -c unlimited
# sysctl -w fs.suid_dumpable=1
# mkdir -p /var/run/cores
# chmod a+w /var/run/cores
# sysctl -w "kernel.core_pattern=/var/run/cores/%e.%p"
# slapd -u ldap -g ldap -F /etc/openldap/slapd.d -h ldapi:/// ldap://localhost ldaps:///

This will cause all core files to be created in /var/run/cores directory, where all users are allowed to write. And the server is started manually with appropriate ulimit settings.

When the server crashes, you should find slapd.pid file in /var/run/cores.

I'm not aware of any other (better) solution...

(kill -6 $(pidof slapd) can verify, if the core dumping works)

Comment 7 Ruben Kerkhof 2011-05-10 00:21:03 UTC
Hi, I'm a colleague of Dennis.

Here's the backtrace you asked for:

Thread 1 (Thread 0x7f6b3885f700 (LWP 8827)):
#0  0x00007f6b58b040c5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007f6b58b05a76 in abort () at abort.c:92
#2  0x00007f6b58b40cfb in __libc_message (do_abort=2, fmt=0x7f6b58c2fa98 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:186
#3  0x00007f6b58b47d63 in malloc_printerr (av=<value optimized out>, p=0x7f6b30213010, have_lock=0) at malloc.c:6283
#4  _int_free (av=<value optimized out>, p=0x7f6b30213010, have_lock=0) at malloc.c:4795
#5  0x00007f6b3b5e04a2 in nss_ZFreeIf (pointer=0x7f6b30213030) at arena.c:975
#6  0x00007f6b3b5d1574 in pem_getPrivateKey (arena=0x7f6b24123230, rawkey=<value optimized out>, pError=0x7f6b3885e0d8, modulus=0x7f6b3417f7a8) at prsa.c:202
#7  0x00007f6b3b5d16cd in pem_mdCryptoOperationRSAPriv_Create (proto=0x7f6b3b7f9140, mdMechanism=0x7f6b3b7f9200, pError=0x7f6b3885e0d8, mdKey=<value optimized out>) at prsa.c:361
#8  0x00007f6b3b5de72f in nssCKFWMechanism_DecryptInit (fwMechanism=0x7f6b24127f10, pMechanism=0x7f6b3885e3f0, fwSession=0x7f6b1c16af30, fwObject=0x7f6b34156298) at mechanism.c:538
#9  0x00007f6b3b5db5d1 in NSSCKFWC_DecryptInit (fwInstance=<value optimized out>, hSession=<value optimized out>, pMechanism=0x7f6b3885e3f0, hKey=<value optimized out>) at wrap.c:3105
#10 0x00007f6b5993b091 in pk11_HandUnwrap (slot=0x7f6b34154cc0, wrappingKey=12, mech=0x7f6b3885e3f0, target=881, keyTemplate=0x7f6b3885e250, templateCount=3, key_size=0, wincx=0x0, crvp=0x7f6b3885e408, isPerm=0, inKey=<value optimized out>) at pk11skey.c:1960
#11 0x00007f6b5993b687 in pk11_AnyUnwrapKey (slot=0x7f6b34154cc0, wrappingKey=12, wrapType=1, param=<value optimized out>, wrappedKey=0x7f6b3885e5e0, target=881, operation=268, keySize=0, wincx=0x0, userAttr=<value optimized out>, numAttrs=0, isPerm=0) at pk11skey.c:2104
#12 0x00007f6b5993e6ec in PK11_PubUnwrapSymKey (wrappingKey=0x7f6b2c1124d0, wrappedKey=0x7f6b3885e5e0, target=881, operation=268, keySize=0) at pk11skey.c:2232
#13 0x00007f6b59e63f54 in ssl3_HandleRSAClientKeyExchange (ss=0x7f6b1c164bb0, b=<value optimized out>, length=<value optimized out>) at ssl3con.c:7360
#14 ssl3_HandleClientKeyExchange (ss=0x7f6b1c164bb0, b=<value optimized out>, length=<value optimized out>) at ssl3con.c:7476
#15 ssl3_HandleHandshakeMessage (ss=0x7f6b1c164bb0, b=<value optimized out>, length=<value optimized out>) at ssl3con.c:8648
#16 0x00007f6b59e6694c in ssl3_HandleHandshake (ss=0x7f6b1c164bb0, cText=<value optimized out>, databuf=0x7f6b1c164f18) at ssl3con.c:8727
#17 ssl3_HandleRecord (ss=0x7f6b1c164bb0, cText=<value optimized out>, databuf=0x7f6b1c164f18) at ssl3con.c:9066
#18 0x00007f6b59e67a06 in ssl3_GatherCompleteHandshake (ss=0x7f6b1c164bb0, flags=0) at ssl3gthr.c:209
#19 0x00007f6b59e7024f in SSL_ForceHandshake (fd=<value optimized out>) at sslsecur.c:397
#20 0x00007f6b5b00a025 in tlsm_session_accept_or_connect (session=<value optimized out>, is_accept=1) at tls_m.c:2350
#21 0x00007f6b5b008087 in ldap_pvt_tls_accept (sb=0x7f6b24151500, ctx_arg=0x7f6b5c62fb10) at tls2.c:429
#22 0x00007f6b5aee5dc3 in connection_read (ctx=0x7f6b3885eb80, argv=0x2a) at ../../../servers/slapd/connection.c:1326
#23 connection_read_thread (ctx=0x7f6b3885eb80, argv=0x2a) at ../../../servers/slapd/connection.c:1238
#24 0x00007f6b5afe5278 in ldap_int_thread_pool_wrapper (xpool=0x7f6b5c5c3760) at ../../../libraries/libldap_r/tpool.c:685
#25 0x00007f6b58e73ccb in start_thread (arg=0x7f6b3885f700) at pthread_create.c:301

Thanks,

Ruben

Comment 8 Rich Megginson 2011-05-10 03:28:07 UTC
Can you attach your tls configuration (olcTls* parameters from cn=config)?
Can you attach the output of openssl x509 -in /path/to/your/servercert.pem -text?  Be sure to obscure any sensitive information.
Can you paste the header of your private key file?  e.g. is it
-----BEGIN PRIVATE KEY-----
or something else?

Comment 9 Dennis Krul 2011-05-10 08:30:53 UTC
Hello Rich,

We've not yet switched to cn=config, but here's the relevant parts of our slapd.conf:

TLSCACertificateFile /etc/openldap/cacerts/ca.pem
TLSCertificateFile /etc/pki/tls/certs/slapd.pem
TLSCertificateKeyFile /etc/pki/tls/private/slapd.key
TLSCipherSuite TLSv1+RSA:!EXPORT:!NULL:!MD5
security ssf=128

Our private key:

# head -2 /etc/pki/tls/private/slapd.key 
-----BEGIN RSA PRIVATE KEY-----
MIICXAIBAAKBgQCngGIkmOR94SzNXsULQn9EmzUE1Cy+6QrQE/2d1iwDOCKJ5KH4

The certificate:

# openssl x509 -in /etc/pki/tls/certs/slapd.pem -text
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 117 (0x75)
        Signature Algorithm: sha1WithRSAEncryption
        Issuer: CN=puppet.ourdomain
        Validity
            Not Before: Dec 26 12:14:35 2010 GMT
            Not After : Dec 25 12:14:35 2015 GMT
        Subject: CN=fqdn.of.ldap.host
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (1024 bit)
                Modulus:
                    00:a7:80:62:24:98:e4:7d:e1:2c:cd:5e:c5:0b:42:
                    7f:44:9b:35:04:d4:2c:be:e9:0a:d0:13:fd:9d:d6:
                    2c:03:38:22:89:e4:a1:f8:9a:19:45:0e:6f:a6:e2:
                    5b:77:4a:aa:df:a8:7f:39:f3:8e:44:6f:8e:89:38:
                    24:2c:a0:b3:83:ae:d9:89:89:42:58:7a:3b:f5:a3:
                    a0:84:39:1e:5e:02:bc:a2:c1:69:74:3c:a9:1c:51:
                    c1:d8:6c:13:49:27:18:f9:c6:40:48:11:62:f7:60:
                    1a:c2:40:0a:ba:93:b5:ab:74:01:4f:20:eb:2b:a8:
                    e2:eb:a7:88:57:48:9e:38:c5
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            Netscape Comment: 
                Puppet Ruby/OpenSSL Generated Certificate
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Subject Key Identifier: 
                01:2F:91:13:B7:07:94:7E:65:82:0C:EC:72:E2:BE:F2:51:8E:95:AF
            X509v3 Key Usage: 
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage: 
                TLS Web Server Authentication, TLS Web Client Authentication, E-mail Protection
            X509v3 Subject Alternative Name: 
                DNS:fqdn.of.ldap.host, DNS:fqdn.of.vip, DNS:another.alias
    Signature Algorithm: sha1WithRSAEncryption

Thanks

Dennis

Comment 10 Dennis Krul 2011-05-11 08:51:32 UTC
Created attachment 498225 [details]
slapd stackstrace

Full stacktrace for another slapd crash

Comment 11 Ruben Kerkhof 2011-05-12 08:18:04 UTC
The same crash happens with openldap-servers-2.4.24-2.fc15.x86_64:

Thread 1 (Thread 0x7f52ca7fd700 (LWP 11854)):
#0  0x00007f52f943b0c5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007f52f943ca76 in abort () at abort.c:92
#2  0x00007f52f9477cfb in __libc_message (do_abort=2, fmt=0x7f52f9566a98 "*** glibc detected *** %s: %s: 0x%s ***\n")
    at ../sysdeps/unix/sysv/linux/libc_fatal.c:186
#3  0x00007f52f947ed63 in malloc_printerr (av=<value optimized out>, p=0x7f52d42dab50, have_lock=0) at malloc.c:6283
#4  _int_free (av=<value optimized out>, p=0x7f52d42dab50, have_lock=0) at malloc.c:4795
#5  0x00007f52e01fa4a2 in nss_ZFreeIf (pointer=0x7f52d42dab70) at arena.c:975
#6  0x00007f52e01eb574 in pem_getPrivateKey (arena=0x7f52bc11f750, rawkey=<value optimized out>, 
    pError=0x7f52ca7fc0f8, modulus=0x7f52d417f868) at prsa.c:202
#7  0x00007f52e01eb6cd in pem_mdCryptoOperationRSAPriv_Create (proto=0x7f52e0413140, mdMechanism=0x7f52e0413200, 
    pError=0x7f52ca7fc0f8, mdKey=<value optimized out>) at prsa.c:361
#8  0x00007f52e01f872f in nssCKFWMechanism_DecryptInit (fwMechanism=0x7f52bc147cc0, pMechanism=0x7f52ca7fc410, 
    fwSession=0x7f52c412a0e0, fwObject=0x7f52d4156358) at mechanism.c:538
#9  0x00007f52e01f55d1 in NSSCKFWC_DecryptInit (fwInstance=<value optimized out>, hSession=<value optimized out>, 
    pMechanism=0x7f52ca7fc410, hKey=<value optimized out>) at wrap.c:3105
#10 0x00007f52fa272091 in pk11_HandUnwrap (slot=0x7f52d4154d80, wrappingKey=12, mech=0x7f52ca7fc410, target=881, 
    keyTemplate=0x7f52ca7fc270, templateCount=3, key_size=0, wincx=0x0, crvp=0x7f52ca7fc428, isPerm=0, 
    inKey=<value optimized out>) at pk11skey.c:1960
#11 0x00007f52fa272687 in pk11_AnyUnwrapKey (slot=0x7f52d4154d80, wrappingKey=12, wrapType=1, 
    param=<value optimized out>, wrappedKey=0x7f52ca7fc600, target=881, operation=268, keySize=0, wincx=0x0, 
    userAttr=<value optimized out>, numAttrs=0, isPerm=0) at pk11skey.c:2104
#12 0x00007f52fa2756ec in PK11_PubUnwrapSymKey (wrappingKey=0x7f52d41c3020, wrappedKey=0x7f52ca7fc600, target=881, 
    operation=268, keySize=0) at pk11skey.c:2232
#13 0x00007f52fa79af54 in ssl3_HandleRSAClientKeyExchange (ss=0x7f52b410aa30, b=<value optimized out>, 
    length=<value optimized out>) at ssl3con.c:7360
#14 ssl3_HandleClientKeyExchange (ss=0x7f52b410aa30, b=<value optimized out>, length=<value optimized out>)
    at ssl3con.c:7476
#15 ssl3_HandleHandshakeMessage (ss=0x7f52b410aa30, b=<value optimized out>, length=<value optimized out>)
    at ssl3con.c:8648
#16 0x00007f52fa79d94c in ssl3_HandleHandshake (ss=0x7f52b410aa30, cText=<value optimized out>, databuf=0x7f52b410ad98)
    at ssl3con.c:8727
#17 ssl3_HandleRecord (ss=0x7f52b410aa30, cText=<value optimized out>, databuf=0x7f52b410ad98) at ssl3con.c:9066
#18 0x00007f52fa79ea06 in ssl3_GatherCompleteHandshake (ss=0x7f52b410aa30, flags=0) at ssl3gthr.c:209
#19 0x00007f52fa7a724f in SSL_ForceHandshake (fd=<value optimized out>) at sslsecur.c:397
#20 0x00007f52fb943f8a in tlsm_session_accept_or_connect (session=0x7f52b41246f0, is_accept=1) at tls_m.c:2338
#21 0x00007f52fb9420d3 in ldap_pvt_tls_accept (sb=0x7f52b011d700, ctx_arg=0x7f52fc5c50b0) at tls2.c:421
#22 0x00007f52fb81e143 in connection_read (ctx=0x7f52ca7fcb80, argv=0x2b) at ../../../servers/slapd/connection.c:1330
#23 connection_read_thread (ctx=0x7f52ca7fcb80, argv=0x2b) at ../../../servers/slapd/connection.c:1242
#24 0x00007f52fb91eeea in ldap_int_thread_pool_wrapper (xpool=0x7f52fc55b810)
    at ../../../libraries/libldap_r/tpool.c:685
#25 0x00007f52f97aaccb in start_thread (arg=0x7f52ca7fd700) at pthread_create.c:301
#26 0x00007f52f94e8c2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Comment 12 Rich Megginson 2011-06-03 00:56:43 UTC
I tried to reproduce on the latest RHEL6.1 but no luck
I used /etc/pki/tls/misc/CA to create a CA cert and an SSL server cert which matches the one from Comment 9 as closely as possible.  One hitch was that it creates a private key like this:
-----BEGIN PRIVATE KEY-----
I used openssl rsa to convert it to a key like this:
-----BEGIN RSA PRIVATE KEY-----

I'm not sure if that's valid.  How was the key in comment 9 created?
are you able to reproduce the problem with ldapsearch?  on f14 or on another platform?

Comment 13 Jan Vcelak 2011-06-03 09:51:16 UTC
(In reply to comment #12)
> I'm not sure if that's valid.  How was the key in comment 9 created?
> are you able to reproduce the problem with ldapsearch?  on f14 or on another
> platform?

It seems that it was created by Puppet.

# yum install puppet-server
# puppetca --generate ldaphostname

CA certificate: /var/lib/puppet/ssl/ca/ca_crt.pem
Certificate: /var/lib/puppet/ssl/ca/signed/ldaphostname.pem
Key: /var/lib/puppet/ssl/private_keys/ldaphostname.pem

But I'm still not experiencing the crash...

Comment 14 Jan Vcelak 2011-06-03 11:03:31 UTC
Tried to reproduce on F14, F15 and RHEL6... unsuccessfully.

Comment 15 Dennis Krul 2011-06-04 18:28:22 UTC
Yes, the CA and the client certs have been generated by puppetca.

In the meantime we've downgraded one server to openldap-servers-2.4.21-11.fc13.x86_64 and haven't seen any more issues there. The rest of the stack on this server is still F14 and is still using the Mozilla nss package (same version).

So the problem might in fact not be caused by Mozilla nss, but be a regression in openldap itself.

It's unfortunate that you can't reproduce this issue.. Have you put the ldap server under load? Are you using the same security settings we have?

TLSCipherSuite TLSv1+RSA:!EXPORT:!NULL:!MD5
security ssf=128

Comment 16 Jan Vcelak 2011-06-05 09:32:29 UTC
(In reply to comment #15)
> In the meantime we've downgraded one server to
> openldap-servers-2.4.21-11.fc13.x86_64 and haven't seen any more issues there.
> The rest of the stack on this server is still F14 and is still using the
> Mozilla nss package (same version).
> 
> So the problem might in fact not be caused by Mozilla nss, but be a regression
> in openldap itself.
> 

openldap-servers-2.4.21-11.fc13.x86_64 is linked against OpenSSL. And according to backtrace you posted, this is a crash "somewhere inside" SSL_ForceHandshake.

> It's unfortunate that you can't reproduce this issue.. Have you put the ldap
> server under load? Are you using the same security settings we have?
> 
> TLSCipherSuite TLSv1+RSA:!EXPORT:!NULL:!MD5
> security ssf=128

I have the same settings. Can you be more specific about the load? I tried to create some fake load by simple multi-thread script querying the server. But maybe it was not enough.

How long does it take, before the server crashes?

Please, can you provide your configuration file with obfuscated sensitive data? Do you have some ulimit settings applied?

Comment 17 Dennis Krul 2011-06-05 10:05:54 UTC
Oh I assumed slapd was dynamically linked to libnss, but apparently it's been hard linked.

The server doesn't crash very often anymore and we don't see the general protection messages in the kernel log anymore either. The server now crashes perhaps one or two times a week.

So perhaps if you keep the load script running for a week or so you might be able to reproduce it.

Our obfuscated slapd conf. Nothing special in there:

loglevel sync stats

include /etc/openldap/schema/core.schema
include /etc/openldap/schema/cosine.schema
include /etc/openldap/schema/inetorgperson.schema
include /etc/openldap/schema/misc.schema
include /etc/openldap/schema/openldap.schema
include /etc/openldap/schema/rfc2307bis.schema

pidfile         /var/run/openldap/slapd.pid
argsfile        /var/run/openldap/slapd.args

TLSCACertificateFile /etc/openldap/cacerts/ca.pem
TLSCertificateFile /etc/pki/tls/certs/slapd.pem
TLSCertificateKeyFile /etc/pki/tls/private/slapd.key
TLSCipherSuite TLSv1+RSA:!EXPORT:!NULL:!MD5

security ssf=128

#######################################################################
# ldbm and/or bdb database definitions
#######################################################################

database    hdb
suffix      "dc=foo,dc=bar"
checkpoint  1024 15
rootdn      "cn=admin,dc=foo,dc=bar"
rootpw      sekrit:)
directory   /var/lib/ldap

# Ldap slave
syncrepl        rid=001
                provider=ldap://ldap1a.foo.bar
                binddn="uid=replicator,dc=foo,dc=bar"
                starttls=critical
                bindmethod=simple
                credentials=sekrit:)
                searchbase="dc=foo,dc=bar"
                type=refreshAndPersist
                retry="5 5 300 5"
                syncdata=accesslog
                logbase="cn=log"
                logfilter="(&(objectclass=auditWriteObject)(reqResult=0))"

# refer writes to master
updateref ldap://ldap1a.foo.bar

index cn,uid pres,eq
index objectClass eq
index uidNumber eq
index gidNumber eq
index memberUid eq
index entryCSN eq
index entryUUID eq
index ou eq
index uniqueMember eq
index member eq
index default eq,sub

# self can change password
access to attrs=userPassword
        by anonymous auth
        by self write
        by dn="uid=replicator,dc=foo,dc=bar" read
        by * none

# anyone can read
access to dn.subtree="dc=foo,dc=bar"
        by self write
        by * read

# enable monitoring
database monitor

access to dn.subtree="cn=monitor"
        by * read

Comment 18 Ruben Kerkhof 2011-06-05 12:13:16 UTC
Hi Jan,

One thing that may differ from our setup to yours is that we let puppet create the certificate with an extra X509v3 Subject Alternative Name (via the puppet certdnsname setting). This is needed so TLS works behind a VIP. Maybe nss has some difficulties with that.

Just a though...

Kind regards,

Ruben

Comment 19 Rich Megginson 2011-06-06 15:55:20 UTC
(In reply to comment #15)
> Yes, the CA and the client certs have been generated by puppetca.
> 
> In the meantime we've downgraded one server to
> openldap-servers-2.4.21-11.fc13.x86_64 and haven't seen any more issues there.
> The rest of the stack on this server is still F14 and is still using the
> Mozilla nss package (same version).
> 
> So the problem might in fact not be caused by Mozilla nss, but be a regression
> in openldap itself.
> 
> It's unfortunate that you can't reproduce this issue.. Have you put the ldap
> server under load? Are you using the same security settings we have?
> 
> TLSCipherSuite TLSv1+RSA:!EXPORT:!NULL:!MD5
> security ssf=128

Yes.  I'm using cn=config - here are my settings:
olcTLSCACertificateFile: /path/to/CA/cacert.pem
olcTLSCertificateFile: /path/to/CA/certs/localhost.localdomain.1024.cert.pem
olcTLSCertificateKeyFile: /path/to/CA/certs/localhost.localdomain.1024.rsa.key.pem
olcTLSCipherSuite: TLSv1+RSA:!EXPORT:!NULL:!MD5
olcSecurity: ssf=128

cert.pem and key.pem look almost identical to https://bugzilla.redhat.com/show_bug.cgi?id=701678#c9 including SubjectAltName (hostname, issuer, and subject values are different).

Is it possible to reproduce the problem with ldapsearch or some other simple load client?  It is impractical to set up an environment with many client machines using nss_ldap/pam_ldap.

Comment 20 Rich Megginson 2011-06-06 15:59:54 UTC
(In reply to comment #15)

> In the meantime we've downgraded one server to
> openldap-servers-2.4.21-11.fc13.x86_64 and haven't seen any more issues there.

(In reply to comment #17)
> The server doesn't crash very often anymore and we don't see the general
> protection messages in the kernel log anymore either. The server now crashes
> perhaps one or two times a week.

Can you explain what you mean here?  The server doesn't crash very often anymore?  So at some point it was crashing more and now it doesn't crash as often?  What changed?  Are you saying that it crashes with openldap-servers-2.4.21, but less often than it crashed with openldap-servers-2.4.24-2.fc15.x86_64?

Does f15 crash more/less often than f14?

I notice in your syncrepl configuration you have starttls=critical but do not provide a ca cert - does it fallback to 
TLSCACertificateFile /etc/openldap/cacerts/ca.pem?

Comment 21 Dennis Krul 2011-06-10 15:38:55 UTC
Hello Rick,

Sorry for the late response. I've been busy :)

With openldap-servers-2.4.23-4.fc14.x86_64 we've seen crashes up to 4 times a day.
With openldap-servers-2.4.23-10.fc14.x86_64 we see maybe 1 or 2 crashes a week.
With openldap-servers-2.4.21-11.fc13.x86_64 we don't see any crashes at all.

Why would you want to specify the same CA twice? :) Replication works, so I guess it does use the global CA cert setting.

I'll see if we can set up a lab environment and try to reproduce the crashes as well as test out F15.

Comment 22 Ruben Kerkhof 2011-08-01 06:58:48 UTC
Hey Rich,

Just upgraded last friday to openldap-servers-2.6.24 from Rawhide, which I've rebuild for F-14, and we're still seeing the same crashes unfortunately.

Comment 23 Rich Megginson 2011-08-01 14:35:24 UTC
(In reply to comment #22)
> Hey Rich,
> 
> Just upgraded last friday to openldap-servers-2.6.24 from Rawhide, which I've
> rebuild for F-14, and we're still seeing the same crashes unfortunately.

We cannot reproduce this issue.  We are not going to be able to fix this issue without a reproducer or a full stack trace from core file.

Comment 24 Ruben Kerkhof 2011-08-10 14:07:14 UTC
Hi Rich,

I still haven't been able to reliably trigger this.
In the meantime we switched to sssd on all our clients, except for a few which use nss_ldapd. So there are no clients using nss_ldap anymore.

Here's a backtrace of openldap-2.4.26-1.fc14.x86_64:

(gdb) thread all apply bt
No symbol "all" in current context.
(gdb) thread apply all bt

Thread 6 (Thread 0x7fc90d05e700 (LWP 25667)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007fc94632a167 in ldap_int_thread_pool_wrapper (xpool=0x7fc9481cb830) at ../../../libraries/libldap_r/tpool.c:672
#2  0x00007fc9441abccb in start_thread (arg=0x7fc90d05e700) at pthread_create.c:301
#3  0x00007fc943ee9c2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 5 (Thread 0x7fc90fffd700 (LWP 22431)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007fc94632a167 in ldap_int_thread_pool_wrapper (xpool=0x7fc9481cb830) at ../../../libraries/libldap_r/tpool.c:672
#2  0x00007fc9441abccb in start_thread (arg=0x7fc90fffd700) at pthread_create.c:301
#3  0x00007fc943ee9c2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 4 (Thread 0x7fc9461c5720 (LWP 22245)):
#0  0x00007fc9441acde5 in pthread_join (threadid=140502006011648, thread_return=0x0) at pthread_join.c:89
#1  0x00007fc9462212e2 in slapd_daemon () at ../../../servers/slapd/daemon.c:2922
#2  0x00007fc946208b01 in main (argc=6, argv=<value optimized out>) at ../../../servers/slapd/main.c:983

Thread 3 (Thread 0x7fc92c289700 (LWP 22428)):
#0  0x00007fc943eea193 in epoll_wait () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007fc94621f535 in slapd_daemon_task (ptr=<value optimized out>) at ../../../servers/slapd/daemon.c:2528
#2  0x00007fc9441abccb in start_thread (arg=0x7fc92c289700) at pthread_create.c:301
#3  0x00007fc943ee9c2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 2 (Thread 0x7fc928ea8700 (LWP 22430)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007fc94632a167 in ldap_int_thread_pool_wrapper (xpool=0x7fc9481cb830) at ../../../libraries/libldap_r/tpool.c:672
#2  0x00007fc9441abccb in start_thread (arg=0x7fc928ea8700) at pthread_create.c:301
#3  0x00007fc943ee9c2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 1 (Thread 0x7fc9296a9700 (LWP 22429)):
#0  0x00007fc943e3c0c5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007fc943e3da76 in abort () at abort.c:92
#2  0x00007fc943e78cfb in __libc_message (do_abort=2, fmt=0x7fc943f67a98 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:186
#3  0x00007fc943e7fd63 in malloc_printerr (av=<value optimized out>, p=0x7fc900392b20, have_lock=0) at malloc.c:6283
#4  _int_free (av=<value optimized out>, p=0x7fc900392b20, have_lock=0) at malloc.c:4795
#5  0x00007fc90f5de4a2 in nss_ZFreeIf (pointer=0x7fc900392b40) at arena.c:975
#6  0x00007fc90f5cf574 in pem_getPrivateKey (arena=0x7fc9081f09d0, rawkey=<value optimized out>, pError=0x7fc9296a80d8, modulus=0x7fc900179938) at prsa.c:202
#7  0x00007fc90f5cf6cd in pem_mdCryptoOperationRSAPriv_Create (proto=0x7fc90f7f7140, mdMechanism=0x7fc90f7f7200, pError=0x7fc9296a80d8, mdKey=<value optimized out>) at prsa.c:361
#8  0x00007fc90f5dc72f in nssCKFWMechanism_DecryptInit (fwMechanism=0x7fc9081f3730, pMechanism=0x7fc9296a83f0, fwSession=0x7fc900381b40, fwObject=0x7fc900150408) at mechanism.c:538
#9  0x00007fc90f5d95d1 in NSSCKFWC_DecryptInit (fwInstance=<value optimized out>, hSession=<value optimized out>, pMechanism=0x7fc9296a83f0, hKey=<value optimized out>) at wrap.c:3105
#10 0x00007fc944c733a1 in pk11_HandUnwrap (slot=0x7fc90014ee30, wrappingKey=9, mech=0x7fc9296a83f0, target=881, keyTemplate=0x7fc9296a8250, templateCount=3, key_size=0, wincx=0x0, crvp=0x7fc9296a8408, isPerm=0, inKey=<value optimized out>) at pk11skey.c:1960
#11 0x00007fc944c73997 in pk11_AnyUnwrapKey (slot=0x7fc90014ee30, wrappingKey=9, wrapType=1, param=<value optimized out>, wrappedKey=0x7fc9296a85e0, target=881, operation=268, keySize=0, wincx=0x0, userAttr=<value optimized out>, numAttrs=0, isPerm=0) at pk11skey.c:2104
#12 0x00007fc944c769fc in PK11_PubUnwrapSymKey (wrappingKey=0x7fc900181ad0, wrappedKey=0x7fc9296a85e0, target=881, operation=268, keySize=0) at pk11skey.c:2232
#13 0x00007fc94519d05c in ssl3_HandleRSAClientKeyExchange (ss=0x7fc90038ab50, b=<value optimized out>, length=<value optimized out>) at ssl3con.c:7358
#14 ssl3_HandleClientKeyExchange (ss=0x7fc90038ab50, b=<value optimized out>, length=<value optimized out>) at ssl3con.c:7474
#15 ssl3_HandleHandshakeMessage (ss=0x7fc90038ab50, b=<value optimized out>, length=<value optimized out>) at ssl3con.c:8646
#16 0x00007fc94519fa2c in ssl3_HandleHandshake (ss=0x7fc90038ab50, cText=<value optimized out>, databuf=0x7fc90038aeb8) at ssl3con.c:8725
#17 ssl3_HandleRecord (ss=0x7fc90038ab50, cText=<value optimized out>, databuf=0x7fc90038aeb8) at ssl3con.c:9064
#18 0x00007fc9451a0ae6 in ssl3_GatherCompleteHandshake (ss=0x7fc90038ab50, flags=0) at ssl3gthr.c:209
#19 0x00007fc9451a93af in SSL_ForceHandshake (fd=<value optimized out>) at sslsecur.c:397
#20 0x00007fc94634fa15 in tlsm_session_accept_or_connect (session=<value optimized out>, is_accept=1) at tls_m.c:2341
#21 0x00007fc94634d877 in ldap_pvt_tls_accept (sb=0x7fc8f4196490, ctx_arg=0x7fc948242e90) at tls2.c:421
#22 0x00007fc9462247b3 in connection_read (ctx=0x7fc9296a8b80, argv=0x38) at ../../../servers/slapd/connection.c:1355
#23 connection_read_thread (ctx=0x7fc9296a8b80, argv=0x38) at ../../../servers/slapd/connection.c:1267
#24 0x00007fc94632a108 in ldap_int_thread_pool_wrapper (xpool=0x7fc9481cb830) at ../../../libraries/libldap_r/tpool.c:685
#25 0x00007fc9441abccb in start_thread (arg=0x7fc9296a9700) at pthread_create.c:301
#26 0x00007fc943ee9c2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Comment 25 Rich Megginson 2011-08-16 20:18:28 UTC
I have still not been able to reproduce this crash.  But testing with syncrepl over TLS/SSL has revealed that NSS_Init* functions are not thread-safe, and I was able to crash the server quite easily.

see https://bugzilla.redhat.com/show_bug.cgi?id=731112

I don't know if you are able to build from source and test, but if so, would it be possible for you to attempt to reproduce your crash with the patch posted in the ITS linked to the bug?

Comment 26 Jan Vcelak 2011-08-17 10:37:28 UTC
I will prepare scratch build with mentioned patch applied.

Comment 27 Jan Vcelak 2011-08-17 11:13:26 UTC
Scratch build is done:
http://koji.fedoraproject.org/koji/taskinfo?taskID=3279353

In case the build got deleted:
http://jvcelak.fedorapeople.org/openldap-701678/

If you need a build for later Fedora, let me know.

Comment 28 Ruben Kerkhof 2011-08-25 18:48:15 UTC
Hi Jan, thanks for the build.

We've downgraded our ldap servers. Hopefully they keep running, but in any case we'll report in a few days.

Comment 29 Mark Nipper 2011-08-26 15:37:14 UTC
I can reliably reproduce this every single time under RHEL 6.1.  Do you want me to open a separate ticket under that product?

And separately, do you want me to provide any additional information or does everyone feel the non-thread safe calls are the problem?

For what it's worth, I'm using a wild card certificate from GoDaddy.  The problem happens immediately every single time I start a consumer using syncrepl.  The provider dies immediately as soon as the consumer tries to initiate a connection.

I'm still using slapd.conf and my TLS options look like:
---
TLSCACertificateFile /etc/pki/tls/certs/ca-bundle.crt
TLSCertificateFile /etc/pki/tls/certs/wild_intermediate.crt
TLSCertificateKeyFile /etc/pki/tls/private/wild_intermediate.key
TLSVerifyClient never

The main certificate file is the wild card certificate and the one intermediate certificate needed to verify trust back up to the GoDaddy CA certificate.

Comment 30 Jan Vcelak 2011-08-26 21:40:15 UTC
(In reply to comment #29)
> I can reliably reproduce this every single time under RHEL 6.1.  Do you want me
> to open a separate ticket under that product?

Yes, please. We need a reliable reproducer (or steps to reproduce).
 
> And separately, do you want me to provide any additional information or does
> everyone feel the non-thread safe calls are the problem?

I don't know if I understand correctly. You were able to reproduce it on F14 as well? And did the crashes disappear after installing the scratch build from comment #27? In that case, it should be fixed in F15 (openldap-2.4.24-4.fc15) and F16 (openldap-2.4.26-2.fc16). And I will update F14 as well because this seems quite critical.

Comment 31 Mark Nipper 2011-08-26 22:05:45 UTC
(In reply to comment #30)
> (In reply to comment #29)
> > I can reliably reproduce this every single time under RHEL 6.1.  Do you want me
> > to open a separate ticket under that product?
> 
> Yes, please. We need a reliable reproducer (or steps to reproduce).
> 
> > And separately, do you want me to provide any additional information or does
> > everyone feel the non-thread safe calls are the problem?
> 
> I don't know if I understand correctly. You were able to reproduce it on F14 as
> well? And did the crashes disappear after installing the scratch build from
> comment #27? In that case, it should be fixed in F15 (openldap-2.4.24-4.fc15)
> and F16 (openldap-2.4.26-2.fc16). And I will update F14 as well because this
> seems quite critical.

Your comment is equally confusing to me.  Let's take it one at a time.

Did you want me to open a separate ticket on this bug for the version of OpenLDAP in Red Hat Enterprise Linux 6.x (2.4.23-15.el6_1.1)?

I have not tried any of this in Fedora.  But I found this bug report after having almost the exact same issue in RHEL, and I assume they're related or the exact same problem.  I have not tried any other version of OpenLDAP.  I simply turned off TLS for the time being between my provider and consumers to avoid the problem.

I will attach my provider and consumer slapd.conf files.  If it's the same problem, you shouldn't even need any data in your databases.  Just starting the consumers is enough to crash the provider, even when all of them are empty.  Hopefully it won't be specific to my certificates either since that is harder to reproduce on your end.  Give me a second to sanitize and attach my configuration files.

Comment 32 Mark Nipper 2011-08-26 22:09:52 UTC
Created attachment 520145 [details]
slapd.conf for provider

Comment 33 Mark Nipper 2011-08-26 22:10:20 UTC
Created attachment 520146 [details]
slapd.conf for consumer

Comment 34 Mark Nipper 2011-08-26 22:15:15 UTC
Derp.  You need to uncomment the two starttls lines in the consumer slapd.conf obviously to test this (and provide your own SSL certificates obviously).

Comment 35 Jan Vcelak 2011-08-27 09:11:09 UTC
(In reply to comment #31)
> Did you want me to open a separate ticket on this bug for the version of
> OpenLDAP in Red Hat Enterprise Linux 6.x (2.4.23-15.el6_1.1)?
> 
> I have not tried any of this in Fedora.  But I found this bug report after
> having almost the exact same issue in RHEL, and I assume they're related or the
> exact same problem.  I have not tried any other version of OpenLDAP.  I simply
> turned off TLS for the time being between my provider and consumers to avoid
> the problem.
> 
> I will attach my provider and consumer slapd.conf files.  If it's the same
> problem, you shouldn't even need any data in your databases.  Just starting the
> consumers is enough to crash the provider, even when all of them are empty. 
> Hopefully it won't be specific to my certificates either since that is harder
> to reproduce on your end.  Give me a second to sanitize and attach my
> configuration files.

The steps to reproduce are almost the same as for bug #701678, where the cause is clear. And we have already fixed that in RHEL (bug #731168). So I don't need a separate ticket.  

This bug is about random crashes. And we would like to know, if the cause is the same. Mark, thank you for your patience. :-)

Let's give Ruben some time to test.

Comment 36 Ruben Kerkhof 2011-08-29 11:16:46 UTC
It took a while for this to reoccur, but attached is a backtrace. This time on a slave.

Loaded symbols for /lib64/libgcc_s-4.5.1-20100924.so.1
Core was generated by `/usr/sbin/slapd -h  ldap:/// ldaps:/// -u ldap -4'.
Program terminated with signal 6, Aborted.
#0  0x00007f1a23a720c5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64	  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) thread apply all bt

Thread 5 (Thread 0x7f1a0b720700 (LWP 14467)):
#0  0xffffffffff600046 in ?? ()
#1  0x00007f1a002292b0 in ?? ()
#2  0x00007f1a23ab3a03 in _IO_str_init_static_internal (sf=0x0, ptr=0x7f1a0b71f7c0 "\360\367q\v\032\177", size=<value optimized out>, pstart=0x15220490 <Address 0x15220490 out of bounds>) at strops.c:50
#3  0x0000000000034ecc in ?? ()
#4  0x00007f1a25d4cad0 in ?? ()
#5  0x00007f1a001d23a0 in ?? ()
#6  0x00007f1a0b71f8c8 in ?? ()
#7  0x00007f1a23ada12d in time () at ../sysdeps/unix/sysv/linux/x86_64/time.S:36
#8  0x00007f1a23b1b020 in __vsyslog_chk (pri=<value optimized out>, flag=-10484695, fmt=0x7f1a25f8c4f8 "conn=%lu fd=%d TLS established tls_ssf=%u ssf=%u\n", ap=0x7f1a0b71f910) at ../misc/syslog.c:196
#9  0x00007f1a23b1b5a3 in __syslog_chk (pri=<value optimized out>, flag=<value optimized out>, fmt=<value optimized out>) at ../misc/syslog.c:131
#10 0x00007f1a25e565ba in syslog (ctx=0x7f1a0b71fb80, argv=<value optimized out>) at /usr/include/bits/syslog.h:32
#11 connection_read (ctx=0x7f1a0b71fb80, argv=<value optimized out>) at ../../../servers/slapd/connection.c:1360
#12 connection_read_thread (ctx=0x7f1a0b71fb80, argv=<value optimized out>) at ../../../servers/slapd/connection.c:1238
#13 0x00007f1a25f55218 in ldap_int_thread_pool_wrapper (xpool=0x7f1a26aa1760) at ../../../libraries/libldap_r/tpool.c:685
#14 0x00007f1a23de1ccb in start_thread (arg=0x7f1a0b720700) at pthread_create.c:301
#15 0x00007f1a23b1fc2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 4 (Thread 0x7f1a0bf21700 (LWP 14466)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f1a25f55277 in ldap_int_thread_pool_wrapper (xpool=0x7f1a26aa1760) at ../../../libraries/libldap_r/tpool.c:672
#2  0x00007f1a23de1ccb in start_thread (arg=0x7f1a0bf21700) at pthread_create.c:301
#3  0x00007f1a23b1fc2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 3 (Thread 0x7f1a0c722700 (LWP 14465)):
#0  0x00007f1a23b20193 in epoll_wait () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007f1a25e512b3 in slapd_daemon_task (ptr=<value optimized out>) at ../../../servers/slapd/daemon.c:2467
#2  0x00007f1a23de1ccb in start_thread (arg=0x7f1a0c722700) at pthread_create.c:301
#3  0x00007f1a23b1fc2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 2 (Thread 0x7f1a25df8720 (LWP 14462)):
#0  0x00007f1a23de2de5 in pthread_join (threadid=139749854684928, thread_return=0x0) at pthread_join.c:89
#1  0x00007f1a25e529fc in slapd_daemon () at ../../../servers/slapd/daemon.c:2842
#2  0x00007f1a25e3b251 in main (argc=6, argv=<value optimized out>) at ../../../servers/slapd/main.c:961

Thread 1 (Thread 0x7f19fbfff700 (LWP 16053)):
#0  0x00007f1a23a720c5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007f1a23a73a76 in abort () at abort.c:92
#2  0x00007f1a23aaecfb in __libc_message (do_abort=2, fmt=0x7f1a23b9da78 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:186
#3  0x00007f1a23ab5d63 in malloc_printerr (av=<value optimized out>, p=0x7f1a001fdb40, have_lock=0) at malloc.c:6283
#4  _int_free (av=<value optimized out>, p=0x7f1a001fdb40, have_lock=0) at malloc.c:4795
#5  0x00007f1a0a8314a2 in nss_ZFreeIf (pointer=0x7f1a001fdb60) at arena.c:975
#6  0x00007f1a0a822574 in pem_getPrivateKey (arena=0x7f19f4119e00, rawkey=<value optimized out>, pError=0x7f19fbffe0d8, modulus=0x7f19fc17f838) at prsa.c:202
#7  0x00007f1a0a8226cd in pem_mdCryptoOperationRSAPriv_Create (proto=0x7f1a0aa4a140, mdMechanism=0x7f1a0aa4a200, pError=0x7f19fbffe0d8, mdKey=<value optimized out>) at prsa.c:361
#8  0x00007f1a0a82f72f in nssCKFWMechanism_DecryptInit (fwMechanism=0x7f19f4119dd0, pMechanism=0x7f19fbffe3f0, fwSession=0x7f19f4119420, fwObject=0x7f19fc156308) at mechanism.c:538
#9  0x00007f1a0a82c5d1 in NSSCKFWC_DecryptInit (fwInstance=<value optimized out>, hSession=<value optimized out>, pMechanism=0x7f19fbffe3f0, hKey=<value optimized out>) at wrap.c:3105
---Type <return> to continue, or q <return> to quit---
#10 0x00007f1a248a93a1 in pk11_HandUnwrap (slot=0x7f19fc154d30, wrappingKey=12, mech=0x7f19fbffe3f0, target=881, keyTemplate=0x7f19fbffe250, templateCount=3, key_size=0, wincx=0x0, crvp=0x7f19fbffe408, 
    isPerm=0, inKey=<value optimized out>) at pk11skey.c:1960
#11 0x00007f1a248a9997 in pk11_AnyUnwrapKey (slot=0x7f19fc154d30, wrappingKey=12, wrapType=1, param=<value optimized out>, wrappedKey=0x7f19fbffe5e0, target=881, operation=268, keySize=0, wincx=0x0, 
    userAttr=<value optimized out>, numAttrs=0, isPerm=0) at pk11skey.c:2104
#12 0x00007f1a248ac9fc in PK11_PubUnwrapSymKey (wrappingKey=0x7f19fc1c3090, wrappedKey=0x7f19fbffe5e0, target=881, operation=268, keySize=0) at pk11skey.c:2232
#13 0x00007f1a24dd305c in ssl3_HandleRSAClientKeyExchange (ss=0x7f19f4115cc0, b=<value optimized out>, length=<value optimized out>) at ssl3con.c:7358
#14 ssl3_HandleClientKeyExchange (ss=0x7f19f4115cc0, b=<value optimized out>, length=<value optimized out>) at ssl3con.c:7474
#15 ssl3_HandleHandshakeMessage (ss=0x7f19f4115cc0, b=<value optimized out>, length=<value optimized out>) at ssl3con.c:8646
#16 0x00007f1a24dd5a2c in ssl3_HandleHandshake (ss=0x7f19f4115cc0, cText=<value optimized out>, databuf=0x7f19f4116028) at ssl3con.c:8725
#17 ssl3_HandleRecord (ss=0x7f19f4115cc0, cText=<value optimized out>, databuf=0x7f19f4116028) at ssl3con.c:9064
#18 0x00007f1a24dd6ae6 in ssl3_GatherCompleteHandshake (ss=0x7f19f4115cc0, flags=0) at ssl3gthr.c:209
#19 0x00007f1a24ddf3af in SSL_ForceHandshake (fd=<value optimized out>) at sslsecur.c:397
#20 0x00007f1a25f7a025 in tlsm_session_accept_or_connect (session=<value optimized out>, is_accept=1) at tls_m.c:2434
#21 0x00007f1a25f78027 in ldap_pvt_tls_accept (sb=0x7f19fc2d4200, ctx_arg=0x7f1a26b06e70) at tls2.c:429
#22 0x00007f1a25e55d63 in connection_read (ctx=0x7f19fbffeb80, argv=0x22) at ../../../servers/slapd/connection.c:1326
#23 connection_read_thread (ctx=0x7f19fbffeb80, argv=0x22) at ../../../servers/slapd/connection.c:1238
#24 0x00007f1a25f55218 in ldap_int_thread_pool_wrapper (xpool=0x7f1a26aa1760) at ../../../libraries/libldap_r/tpool.c:685
#25 0x00007f1a23de1ccb in start_thread (arg=0x7f19fbfff700) at pthread_create.c:301
#26 0x00007f1a23b1fc2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Comment 37 Rich Megginson 2011-08-29 15:48:29 UTC
Almost always in the same place -

#5  0x00007f1a0a8314a2 in nss_ZFreeIf (pointer=0x7f1a001fdb60) at arena.c:975
#6  0x00007f1a0a822574 in pem_getPrivateKey (arena=0x7f19f4119e00,
rawkey=<value optimized out>, pError=0x7f19fbffe0d8, modulus=0x7f19fc17f838) at
prsa.c:202
#7  0x00007f1a0a8226cd in pem_mdCryptoOperationRSAPriv_Create
(proto=0x7f1a0aa4a140, mdMechanism=0x7f1a0aa4a200, pError=0x7f19fbffe0d8,
mdKey=<value optimized out>) at prsa.c:361
#8  0x00007f1a0a82f72f in nssCKFWMechanism_DecryptInit
(fwMechanism=0x7f19f4119dd0, pMechanism=0x7f19fbffe3f0,
fwSession=0x7f19f4119420, fwObject=0x7f19fc156308) at mechanism.c:538

I'm inclined to believe this is a problem with moznss pem - looks like a double free.  The section of code around 202 is quite strange.

Ruben, best case - how long does it take to reproduce this problem? 1 day?  several days?

I'm pretty sure the problem that Mark ran into is https://bugzilla.redhat.com/show_bug.cgi?id=731168

Comment 38 Mark Nipper 2011-08-29 15:57:40 UTC
And after further investigation, I don't even thing my configuration was actually kosher.  pbind (ldap backend) and syncprov don't play very well together.  I was trying to create a configuration where a primary LDAP server was feeding two slaves, and all three would be using pbind to proxy bind requests to an external server.  That's the configuration I have attached here in this report.  Unfortunately, this meant that syncprov wasn't propagating to the slaves from the DIT on the master which was using pbind as the ldap backend which pbind uses also turns off lastmod which creates the information used by syncprov.

I'm not sure whether this is anything that can be worked around.  But basically, I think it's a "bad" configuration in OpenLDAP.  Now, having said that, it still seems like a TLS connection initiated from an, as yet, unauthenticated consumer shouldn't be able to crash the provider.  But once I stopped using pbind on the provider entirely and switched to using only one instance of syncprov/syncrepl on each end, TLS does not crash the consumer any longer.

Blah.  Ignore my contributions here.  :)

Comment 39 Ruben Kerkhof 2011-08-29 17:57:11 UTC
(In reply to comment #37)
> Almost always in the same place -
> 
> #5  0x00007f1a0a8314a2 in nss_ZFreeIf (pointer=0x7f1a001fdb60) at arena.c:975
> #6  0x00007f1a0a822574 in pem_getPrivateKey (arena=0x7f19f4119e00,
> rawkey=<value optimized out>, pError=0x7f19fbffe0d8, modulus=0x7f19fc17f838) at
> prsa.c:202
> #7  0x00007f1a0a8226cd in pem_mdCryptoOperationRSAPriv_Create
> (proto=0x7f1a0aa4a140, mdMechanism=0x7f1a0aa4a200, pError=0x7f19fbffe0d8,
> mdKey=<value optimized out>) at prsa.c:361
> #8  0x00007f1a0a82f72f in nssCKFWMechanism_DecryptInit
> (fwMechanism=0x7f19f4119dd0, pMechanism=0x7f19fbffe3f0,
> fwSession=0x7f19f4119420, fwObject=0x7f19fc156308) at mechanism.c:538
> 
> I'm inclined to believe this is a problem with moznss pem - looks like a double
> free.  The section of code around 202 is quite strange.
> 
> Ruben, best case - how long does it take to reproduce this problem? 1 day? 
> several days?

It depends, my master has been up for several days now. This particular slave crashed twice today, but due to some missing sysctl settings I've only been able catch a core of the last crash.

Our ldap tree is fairly static and only has about 10 objects in it, so there's almost no load and no replication traffic.

Anyway, I'll keep supplying you with more stackstraces...

Comment 40 Rich Megginson 2011-08-29 18:51:19 UTC
Another clue here
Thread 5 has just accepted an incoming TLS connection
Thread 1 which has the segv is in the process of accepting an incoming TLS connection

Might be some sort of thread safety issue here - perhaps thread 5 has just freed the private key modulus out from under thread 1.

Do you get the crashes on both the syncrepl provider and consumer?

Comment 41 Ruben Kerkhof 2011-08-29 20:03:28 UTC
Just on the consumer at the time of that particular stacktrace.
There's nothing strange in the logs, I've increased the loglevel on all servers to capture more information.

Comment 42 Ruben Kerkhof 2011-09-02 23:13:52 UTC
Here's another one, from the same consumer:

Core was generated by `/usr/sbin/slapd -h  ldap:/// ldaps:/// -u ldap -4'.
Program terminated with signal 6, Aborted.
#0  0x00007ff5303cc0c5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64	  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) thread apply all bt

Thread 7 (Thread 0x7ff502ffe700 (LWP 9589)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007ff5328af277 in ldap_int_thread_pool_wrapper (xpool=0x7ff5342fe740) at ../../../libraries/libldap_r/tpool.c:672
#2  0x00007ff53073bccb in start_thread (arg=0x7ff502ffe700) at pthread_create.c:301
#3  0x00007ff530479c2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 6 (Thread 0x7ff51907c700 (LWP 24955)):
#0  0x00007ff53047a193 in epoll_wait () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007ff5327ab2b3 in slapd_daemon_task (ptr=<value optimized out>) at ../../../servers/slapd/daemon.c:2467
#2  0x00007ff53073bccb in start_thread (arg=0x7ff51907c700) at pthread_create.c:301
#3  0x00007ff530479c2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 5 (Thread 0x7ff513fff700 (LWP 24957)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007ff5328af277 in ldap_int_thread_pool_wrapper (xpool=0x7ff5342fe740) at ../../../libraries/libldap_r/tpool.c:672
#2  0x00007ff53073bccb in start_thread (arg=0x7ff513fff700) at pthread_create.c:301
#3  0x00007ff530479c2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 4 (Thread 0x7ff510b90700 (LWP 27857)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007ff5328af277 in ldap_int_thread_pool_wrapper (xpool=0x7ff5342fe740) at ../../../libraries/libldap_r/tpool.c:672
#2  0x00007ff53073bccb in start_thread (arg=0x7ff510b90700) at pthread_create.c:301
#3  0x00007ff530479c2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 3 (Thread 0x7ff5017fc700 (LWP 24325)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007ff5328af277 in ldap_int_thread_pool_wrapper (xpool=0x7ff5342fe740) at ../../../libraries/libldap_r/tpool.c:672
#2  0x00007ff53073bccb in start_thread (arg=0x7ff5017fc700) at pthread_create.c:301
#3  0x00007ff530479c2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 2 (Thread 0x7ff532752720 (LWP 24951)):
#0  0x00007ff53073cde5 in pthread_join (threadid=140690663655168, thread_return=0x0) at pthread_join.c:89
#1  0x00007ff5327ac9fc in slapd_daemon () at ../../../servers/slapd/daemon.c:2842
#2  0x00007ff532795251 in main (argc=6, argv=<value optimized out>) at ../../../servers/slapd/main.c:961

Thread 1 (Thread 0x7ff51887b700 (LWP 24956)):
#0  0x00007ff5303cc0c5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007ff5303cda76 in abort () at abort.c:92
#2  0x00007ff530408cfb in __libc_message (do_abort=2, fmt=0x7ff5304f7a78 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:186
#3  0x00007ff53040fd63 in malloc_printerr (av=<value optimized out>, p=0x7ff504236300, have_lock=0) at malloc.c:6283
#4  _int_free (av=<value optimized out>, p=0x7ff504236300, have_lock=0) at malloc.c:4795
#5  0x00007ff5131104a2 in nss_ZFreeIf (pointer=0x7ff504236320) at arena.c:975
#6  0x00007ff513101574 in pem_getPrivateKey (arena=0x7ff50c386e20, rawkey=<value optimized out>, pError=0x7ff51887a0d8, modulus=0x7ff50c17f838) at prsa.c:202
#7  0x00007ff5131016cd in pem_mdCryptoOperationRSAPriv_Create (proto=0x7ff513329140, mdMechanism=0x7ff513329200, pError=0x7ff51887a0d8, mdKey=<value optimized out>) at prsa.c:361
#8  0x00007ff51310e72f in nssCKFWMechanism_DecryptInit (fwMechanism=0x7ff50c386df0, pMechanism=0x7ff51887a3f0, fwSession=0x7ff50c3865a0, fwObject=0x7ff50c156308) at mechanism.c:538
#9  0x00007ff51310b5d1 in NSSCKFWC_DecryptInit (fwInstance=<value optimized out>, hSession=<value optimized out>, pMechanism=0x7ff51887a3f0, hKey=<value optimized out>) at wrap.c:3105
#10 0x00007ff5312033a1 in pk11_HandUnwrap (slot=0x7ff50c154d30, wrappingKey=12, mech=0x7ff51887a3f0, target=881, keyTemplate=0x7ff51887a250, templateCount=3, key_size=0, wincx=0x0, 
---Type <return> to continue, or q <return> to quit---
    crvp=0x7ff51887a408, isPerm=0, inKey=<value optimized out>) at pk11skey.c:1960
#11 0x00007ff531203997 in pk11_AnyUnwrapKey (slot=0x7ff50c154d30, wrappingKey=12, wrapType=1, param=<value optimized out>, wrappedKey=0x7ff51887a5e0, target=881, operation=268, keySize=0, wincx=0x0, 
    userAttr=<value optimized out>, numAttrs=0, isPerm=0) at pk11skey.c:2104
#12 0x00007ff5312069fc in PK11_PubUnwrapSymKey (wrappingKey=0x7ff50c1c30a0, wrappedKey=0x7ff51887a5e0, target=881, operation=268, keySize=0) at pk11skey.c:2232
#13 0x00007ff53172d05c in ssl3_HandleRSAClientKeyExchange (ss=0x7ff5042d10b0, b=<value optimized out>, length=<value optimized out>) at ssl3con.c:7358
#14 ssl3_HandleClientKeyExchange (ss=0x7ff5042d10b0, b=<value optimized out>, length=<value optimized out>) at ssl3con.c:7474
#15 ssl3_HandleHandshakeMessage (ss=0x7ff5042d10b0, b=<value optimized out>, length=<value optimized out>) at ssl3con.c:8646
#16 0x00007ff53172fa2c in ssl3_HandleHandshake (ss=0x7ff5042d10b0, cText=<value optimized out>, databuf=0x7ff5042d1418) at ssl3con.c:8725
#17 ssl3_HandleRecord (ss=0x7ff5042d10b0, cText=<value optimized out>, databuf=0x7ff5042d1418) at ssl3con.c:9064
#18 0x00007ff531730ae6 in ssl3_GatherCompleteHandshake (ss=0x7ff5042d10b0, flags=0) at ssl3gthr.c:209
#19 0x00007ff5317393af in SSL_ForceHandshake (fd=<value optimized out>) at sslsecur.c:397
#20 0x00007ff5328d4025 in tlsm_session_accept_or_connect (session=<value optimized out>, is_accept=1) at tls_m.c:2434
#21 0x00007ff5328d2027 in ldap_pvt_tls_accept (sb=0x7ff5080017e0, ctx_arg=0x7ff534363cd0) at tls2.c:429
#22 0x00007ff5327afd63 in connection_read (ctx=0x7ff51887ab80, argv=0x21) at ../../../servers/slapd/connection.c:1326
#23 connection_read_thread (ctx=0x7ff51887ab80, argv=0x21) at ../../../servers/slapd/connection.c:1238
#24 0x00007ff5328af218 in ldap_int_thread_pool_wrapper (xpool=0x7ff5342fe740) at ../../../libraries/libldap_r/tpool.c:685
#25 0x00007ff53073bccb in start_thread (arg=0x7ff51887b700) at pthread_create.c:301
#26 0x00007ff530479c2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Comment 43 Rich Megginson 2011-09-02 23:51:21 UTC
(In reply to comment #42)
> Here's another one, from the same consumer:
> snip

This one doesn't have another thread which has just accepted a TLS connection.

I suppose you could try increasing or decreasing the number of threads that openldap uses.  If it is a race condition, decreasing the number of threads may make the problem harder to reproduce, and conversely for increasing the number of threads.

Comment 44 Ruben Kerkhof 2011-09-04 11:00:21 UTC
Ok, I'll try to do that.

Meanwhile, here's another trace, from the master:

Core was generated by `/usr/sbin/slapd -h  ldap:/// ldaps:/// -u ldap -4'.
Program terminated with signal 6, Aborted.
#0  0x00007f19dc7210c5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64	  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) thread all apply bt
No symbol "all" in current context.
(gdb) thread apply all bt

Thread 8 (Thread 0x7f19ae305700 (LWP 11681)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f19dec04277 in ldap_int_thread_pool_wrapper (xpool=0x7f19df43d760) at ../../../libraries/libldap_r/tpool.c:672
#2  0x00007f19dca90ccb in start_thread (arg=0x7f19ae305700) at pthread_create.c:301
#3  0x00007f19dc7cec2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 7 (Thread 0x7f19aeb06700 (LWP 11680)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f19dec04277 in ldap_int_thread_pool_wrapper (xpool=0x7f19df43d760) at ../../../libraries/libldap_r/tpool.c:672
#2  0x00007f19dca90ccb in start_thread (arg=0x7f19aeb06700) at pthread_create.c:301
#3  0x00007f19dc7cec2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 6 (Thread 0x7f19a4ffc700 (LWP 16613)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f19dec04277 in ldap_int_thread_pool_wrapper (xpool=0x7f19df43d760) at ../../../libraries/libldap_r/tpool.c:672
#2  0x00007f19dca90ccb in start_thread (arg=0x7f19a4ffc700) at pthread_create.c:301
#3  0x00007f19dc7cec2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 5 (Thread 0x7f19af307700 (LWP 11679)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f19dec04277 in ldap_int_thread_pool_wrapper (xpool=0x7f19df43d760) at ../../../libraries/libldap_r/tpool.c:672
#2  0x00007f19dca90ccb in start_thread (arg=0x7f19af307700) at pthread_create.c:301
#3  0x00007f19dc7cec2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 4 (Thread 0x7f19afb08700 (LWP 11678)):
#0  0x00007f19dc7cf193 in epoll_wait () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007f19deb002b3 in slapd_daemon_task (ptr=<value optimized out>) at ../../../servers/slapd/daemon.c:2467
#2  0x00007f19dca90ccb in start_thread (arg=0x7f19afb08700) at pthread_create.c:301
#3  0x00007f19dc7cec2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 3 (Thread 0x7f19deaa6720 (LWP 11674)):
#0  0x00007f19dca91de5 in pthread_join (threadid=139748298491648, thread_return=0x0) at pthread_join.c:89
#1  0x00007f19deb019fc in slapd_daemon () at ../../../servers/slapd/daemon.c:2842
#2  0x00007f19deaea251 in main (argc=6, argv=<value optimized out>) at ../../../servers/slapd/main.c:961

Thread 2 (Thread 0x7f1991ffd700 (LWP 26835)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f19dec04277 in ldap_int_thread_pool_wrapper (xpool=0x7f19df43d760) at ../../../libraries/libldap_r/tpool.c:672
#2  0x00007f19dca90ccb in start_thread (arg=0x7f1991ffd700) at pthread_create.c:301
#3  0x00007f19dc7cec2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 1 (Thread 0x7f19ace98700 (LWP 3007)):
#0  0x00007f19dc7210c5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007f19dc722a76 in abort () at abort.c:92
#2  0x00007f19dc75dcfb in __libc_message (do_abort=2, fmt=0x7f19dc84ca78 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:186
#3  0x00007f19dc764d63 in malloc_printerr (av=<value optimized out>, p=0x7f1994379a80, have_lock=0) at malloc.c:6283
#4  _int_free (av=<value optimized out>, p=0x7f1994379a80, have_lock=0) at malloc.c:4795
#5  0x00007f19ad4164a2 in nss_ZFreeIf (pointer=0x7f1994379aa0) at arena.c:975
#6  0x00007f19ad407574 in pem_getPrivateKey (arena=0x7f198c201170, rawkey=<value optimized out>, pError=0x7f19ace970d8, modulus=0x7f19940770b8) at prsa.c:202
#7  0x00007f19ad4076cd in pem_mdCryptoOperationRSAPriv_Create (proto=0x7f19ad62f140, mdMechanism=0x7f19ad62f200, pError=0x7f19ace970d8, mdKey=<value optimized out>) at prsa.c:361
#8  0x00007f19ad41472f in nssCKFWMechanism_DecryptInit (fwMechanism=0x7f198c201140, pMechanism=0x7f19ace973f0, fwSession=0x7f1994402350, fwObject=0x7f199404db88) at mechanism.c:538
#9  0x00007f19ad4115d1 in NSSCKFWC_DecryptInit (fwInstance=<value optimized out>, hSession=<value optimized out>, pMechanism=0x7f19ace973f0, hKey=<value optimized out>) at wrap.c:3105
#10 0x00007f19dd5583a1 in pk11_HandUnwrap (slot=0x7f199404c5b0, wrappingKey=9, mech=0x7f19ace973f0, target=881, keyTemplate=0x7f19ace97250, templateCount=3, key_size=0, wincx=0x0, crvp=0x7f19ace97408, isPerm=0, inKey=<value optimized out>) at pk11skey.c:1960
#11 0x00007f19dd558997 in pk11_AnyUnwrapKey (slot=0x7f199404c5b0, wrappingKey=9, wrapType=1, param=<value optimized out>, wrappedKey=0x7f19ace975e0, target=881, operation=268, keySize=0, wincx=0x0, userAttr=<value optimized out>, numAttrs=0, isPerm=0)
    at pk11skey.c:2104
#12 0x00007f19dd55b9fc in PK11_PubUnwrapSymKey (wrappingKey=0x7f199407f250, wrappedKey=0x7f19ace975e0, target=881, operation=268, keySize=0) at pk11skey.c:2232
#13 0x00007f19dda8205c in ssl3_HandleRSAClientKeyExchange (ss=0x7f19a03d9470, b=<value optimized out>, length=<value optimized out>) at ssl3con.c:7358
#14 ssl3_HandleClientKeyExchange (ss=0x7f19a03d9470, b=<value optimized out>, length=<value optimized out>) at ssl3con.c:7474
#15 ssl3_HandleHandshakeMessage (ss=0x7f19a03d9470, b=<value optimized out>, length=<value optimized out>) at ssl3con.c:8646
#16 0x00007f19dda84a2c in ssl3_HandleHandshake (ss=0x7f19a03d9470, cText=<value optimized out>, databuf=0x7f19a03d97d8) at ssl3con.c:8725
#17 ssl3_HandleRecord (ss=0x7f19a03d9470, cText=<value optimized out>, databuf=0x7f19a03d97d8) at ssl3con.c:9064
#18 0x00007f19dda85ae6 in ssl3_GatherCompleteHandshake (ss=0x7f19a03d9470, flags=0) at ssl3gthr.c:209
#19 0x00007f19dda8e3af in SSL_ForceHandshake (fd=<value optimized out>) at sslsecur.c:397
#20 0x00007f19dec29025 in tlsm_session_accept_or_connect (session=<value optimized out>, is_accept=1) at tls_m.c:2434
#21 0x00007f19dec27027 in ldap_pvt_tls_accept (sb=0x7f198c1c7de0, ctx_arg=0x7f19df4afa10) at tls2.c:429
---Type <return> to continue, or q <return> to quit---
#22 0x00007f19deb04d63 in connection_read (ctx=0x7f19ace97b80, argv=0x3f) at ../../../servers/slapd/connection.c:1326
#23 connection_read_thread (ctx=0x7f19ace97b80, argv=0x3f) at ../../../servers/slapd/connection.c:1238
#24 0x00007f19dec04218 in ldap_int_thread_pool_wrapper (xpool=0x7f19df43d760) at ../../../libraries/libldap_r/tpool.c:685
#25 0x00007f19dca90ccb in start_thread (arg=0x7f19ace98700) at pthread_create.c:301
#26 0x00007f19dc7cec2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb)

Comment 45 Ruben Kerkhof 2011-09-04 11:36:45 UTC
Wow, that was easy. I got the consumer to crash in a few minutes by running:
while true; do ldapsearch -xZZ -H ldap://localhost uid=ruben >/dev/null || exit 1; done &
about ten times in parallel in the background.

Here's the trace:



Thread 20 (Thread 0x7fc886ffe700 (LWP 25932)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007fc8b13fa277 in ldap_int_thread_pool_wrapper (xpool=0x7fc8b308b760) at ../../../libraries/libldap_r/tpool.c:672
#2  0x00007fc8af286ccb in start_thread (arg=0x7fc886ffe700) at pthread_create.c:301
#3  0x00007fc8aefc4c2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 19 (Thread 0x7fc871ffc700 (LWP 2377)):
#0  __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:97
#1  0x00007fc8aefc07b4 in _L_lock_505 () from /lib64/libc-2.13.so
#2  0x00007fc8aefc01aa in __vsyslog_chk (pri=<value optimized out>, flag=1, fmt=0x7fc8b14315a8 "ber_get_next on fd %d failed errno=%d (%s)\n", ap=0x7fc871ffb910) at ../misc/syslog.c:260
#3  0x00007fc8aefc05a3 in __syslog_chk (pri=<value optimized out>, flag=<value optimized out>, fmt=<value optimized out>) at ../misc/syslog.c:131
#4  0x00007fc8b12fa9e9 in syslog (ctx=0x7fc871ffbb80, argv=<value optimized out>) at /usr/include/bits/syslog.h:32
#5  connection_input (ctx=0x7fc871ffbb80, argv=<value optimized out>) at ../../../servers/slapd/connection.c:1493
#6  connection_read (ctx=0x7fc871ffbb80, argv=<value optimized out>) at ../../../servers/slapd/connection.c:1414
#7  connection_read_thread (ctx=0x7fc871ffbb80, argv=<value optimized out>) at ../../../servers/slapd/connection.c:1238
#8  0x00007fc8b13fa218 in ldap_int_thread_pool_wrapper (xpool=0x7fc8b308b760) at ../../../libraries/libldap_r/tpool.c:685
#9  0x00007fc8af286ccb in start_thread (arg=0x7fc871ffc700) at pthread_create.c:301
#10 0x00007fc8aefc4c2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 18 (Thread 0x7fc8517fc700 (LWP 2531)):
#0  __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:97
#1  0x00007fc8aefc07b4 in _L_lock_505 () from /lib64/libc-2.13.so
#2  0x00007fc8aefc01aa in __vsyslog_chk (pri=<value optimized out>, flag=1, fmt=0x7fc8b1430d60 "connection_get(%d): got connid=%lu\n", ap=0x7fc8517fb8e0) at ../misc/syslog.c:260
#3  0x00007fc8aefc05a3 in __syslog_chk (pri=<value optimized out>, flag=<value optimized out>, fmt=<value optimized out>) at ../misc/syslog.c:131
#4  0x00007fc8b12f7e05 in syslog (s=37) at /usr/include/bits/syslog.h:32
#5  connection_get (s=37) at ../../../servers/slapd/connection.c:286
#6  0x00007fc8b12fa685 in connection_read (ctx=0x7fc8517fbb80, argv=0x25) at ../../../servers/slapd/connection.c:1292
#7  connection_read_thread (ctx=0x7fc8517fbb80, argv=0x25) at ../../../servers/slapd/connection.c:1238
#8  0x00007fc8b13fa218 in ldap_int_thread_pool_wrapper (xpool=0x7fc8b308b760) at ../../../libraries/libldap_r/tpool.c:685
#9  0x00007fc8af286ccb in start_thread (arg=0x7fc8517fc700) at pthread_create.c:301
#10 0x00007fc8aefc4c2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 17 (Thread 0x7fc8427fd700 (LWP 2551)):
#0  __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:97
#1  0x00007fc8aefc07b4 in _L_lock_505 () from /lib64/libc-2.13.so
#2  0x00007fc8aefc01aa in __vsyslog_chk (pri=<value optimized out>, flag=1, fmt=0x7fc8b142fe07 ">>> slap_listener(%s)\n", ap=0x7fc8427fb4e0) at ../misc/syslog.c:260
#3  0x00007fc8aefc05a3 in __syslog_chk (pri=<value optimized out>, flag=<value optimized out>, fmt=<value optimized out>) at ../misc/syslog.c:131
#4  0x00007fc8b12f3cae in syslog (sl=0x7fc8b305b240) at /usr/include/bits/syslog.h:32
#5  slap_listener (sl=0x7fc8b305b240) at ../../../servers/slapd/daemon.c:1781
#6  0x00007fc8b12f44e9 in slap_listener_thread (ctx=<value optimized out>, ptr=0x7fc8b305b240) at ../../../servers/slapd/daemon.c:2040
#7  0x00007fc8b13fa218 in ldap_int_thread_pool_wrapper (xpool=0x7fc8b308b760) at ../../../libraries/libldap_r/tpool.c:685
#8  0x00007fc8af286ccb in start_thread (arg=0x7fc8427fd700) at pthread_create.c:301
#9  0x00007fc8aefc4c2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 16 (Thread 0x7fc897bc7700 (LWP 24122)):
#0  0x00007fc8aefc5193 in epoll_wait () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007fc8b12f62b3 in slapd_daemon_task (ptr=<value optimized out>) at ../../../servers/slapd/daemon.c:2467
#2  0x00007fc8af286ccb in start_thread (arg=0x7fc897bc7700) at pthread_create.c:301
#3  0x00007fc8aefc4c2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 15 (Thread 0x7fc87dffd700 (LWP 30558)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007fc8b13fa277 in ldap_int_thread_pool_wrapper (xpool=0x7fc8b308b760) at ../../../libraries/libldap_r/tpool.c:672
#2  0x00007fc8af286ccb in start_thread (arg=0x7fc87dffd700) at pthread_create.c:301
#3  0x00007fc8aefc4c2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 14 (Thread 0x7fc896bc5700 (LWP 24124)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007fc8b13fa277 in ldap_int_thread_pool_wrapper (xpool=0x7fc8b308b760) at ../../../libraries/libldap_r/tpool.c:672
#2  0x00007fc8af286ccb in start_thread (arg=0x7fc896bc5700) at pthread_create.c:301
#3  0x00007fc8aefc4c2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 13 (Thread 0x7fc8717fb700 (LWP 2378)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007fc8b13fa277 in ldap_int_thread_pool_wrapper (xpool=0x7fc8b308b760) at ../../../libraries/libldap_r/tpool.c:672
#2  0x00007fc8af286ccb in start_thread (arg=0x7fc8717fb700) at pthread_create.c:301
#3  0x00007fc8aefc4c2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 12 (Thread 0x7fc870ffa700 (LWP 2384)):
#0  __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:97
#1  0x00007fc8aefc07b4 in _L_lock_505 () from /lib64/libc-2.13.so
#2  0x00007fc8aefc01aa in __vsyslog_chk (pri=<value optimized out>, flag=1, fmt=0x7fc8b14339a0 "send_ldap_extended: err=%d oid=%s len=%ld\n", ap=0x7fc870ff9770) at ../misc/syslog.c:260
#3  0x00007fc8aefc05a3 in __syslog_chk (pri=<value optimized out>, flag=<value optimized out>, fmt=<value optimized out>) at ../misc/syslog.c:131
#4  0x00007fc8b130b9ba in syslog (op=0x7fc8580181d0, rs=0x7fc870ff9930) at /usr/include/bits/syslog.h:32
#5  slap_send_ldap_extended (op=0x7fc8580181d0, rs=0x7fc870ff9930) at ../../../servers/slapd/result.c:793
#6  0x00007fc8b132c809 in fe_extended (op=0x7fc8580181d0, rs=0x7fc870ff9930) at ../../../servers/slapd/extended.c:240
#7  0x00007fc8b132c572 in do_extended (op=0x7fc8580181d0, rs=0x7fc870ff9930) at ../../../servers/slapd/extended.c:180
#8  0x00007fc8b12fa48f in connection_operation (ctx=0x7fc870ff9b80, arg_v=0x7fc8580181d0) at ../../../servers/slapd/connection.c:1109
#9  0x00007fc8b12fadff in connection_read_thread (ctx=0x7fc870ff9b80, argv=<value optimized out>) at ../../../servers/slapd/connection.c:1245
#10 0x00007fc8b13fa218 in ldap_int_thread_pool_wrapper (xpool=0x7fc8b308b760) at ../../../libraries/libldap_r/tpool.c:685
#11 0x00007fc8af286ccb in start_thread (arg=0x7fc870ffa700) at pthread_create.c:301
#12 0x00007fc8aefc4c2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 11 (Thread 0x7fc842ffe700 (LWP 2550)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007fc8b13fa277 in ldap_int_thread_pool_wrapper (xpool=0x7fc8b308b760) at ../../../libraries/libldap_r/tpool.c:672
#2  0x00007fc8af286ccb in start_thread (arg=0x7fc842ffe700) at pthread_create.c:301
#3  0x00007fc8aefc4c2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 10 (Thread 0x7fc8b129d720 (LWP 24118)):
#0  0x00007fc8af287de5 in pthread_join (threadid=140499515897600, thread_return=0x0) at pthread_join.c:89
#1  0x00007fc8b12f79fc in slapd_daemon () at ../../../servers/slapd/daemon.c:2842
#2  0x00007fc8b12e0251 in main (argc=6, argv=<value optimized out>) at ../../../servers/slapd/main.c:961

Thread 9 (Thread 0x7fc850ffb700 (LWP 2549)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007fc8b13fa277 in ldap_int_thread_pool_wrapper (xpool=0x7fc8b308b760) at ../../../libraries/libldap_r/tpool.c:672
#2  0x00007fc8af286ccb in start_thread (arg=0x7fc850ffb700) at pthread_create.c:301
#3  0x00007fc8aefc4c2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 8 (Thread 0x7fc851ffd700 (LWP 2530)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007fc8b13fa277 in ldap_int_thread_pool_wrapper (xpool=0x7fc8b308b760) at ../../../libraries/libldap_r/tpool.c:672
#2  0x00007fc8af286ccb in start_thread (arg=0x7fc851ffd700) at pthread_create.c:301
#3  0x00007fc8aefc4c2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 7 (Thread 0x7fc83dff8700 (LWP 3777)):
#0  0x00007fc8af28df1d in write () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007fc8b142892c in sb_debug_write (sbiod=0x7fc854012f40, buf=0x7fc84c255fe0, len=59) at ../../../libraries/liblber/sockbuf.c:852
#2  0x00007fc8b141f1ed in tlsm_PR_Send (fd=<value optimized out>, buf=<value optimized out>, len=59, flags=<value optimized out>, timeout=<value optimized out>) at tls_m.c:2819
#3  0x00007fc8b02807fb in ssl_DefSend (ss=0x7fc8540008c0, buf=0x7fc84c255fe0 "\024\003\001", len=59, flags=0) at ssldef.c:127
#4  0x00007fc8b02845fe in ssl_SendSavedWriteData (ss=0x7fc8540008c0) at sslsecur.c:508
#5  0x00007fc8b027130a in ssl3_SendRecord (ss=0x7fc8540008c0, type=content_handshake, pIn=<value optimized out>, nIn=<value optimized out>, flags=0) at ssl3con.c:2268
#6  0x00007fc8b027141e in ssl3_FlushHandshake (ss=0x7fc8540008c0, flags=<value optimized out>) at ssl3con.c:2418
#7  0x00007fc8b02748e2 in ssl3_SendFinished (ss=0x7fc8540008c0, flags=0) at ssl3con.c:8226
#8  0x00007fc8b0276ef2 in ssl3_HandleFinished (ss=0x7fc8540008c0, b=<value optimized out>, length=<value optimized out>, hashes=<value optimized out>) at ssl3con.c:8422
#9  0x00007fc8b02772f1 in ssl3_HandleHandshakeMessage (ss=0x7fc8540008c0, 
    b=0x7fc854131504 "\304d\316\177.'\362\240\066\275\330Y\327\b\217\252w\022\261闅\022\321\064Y\360\237\326 \344\002", '\v' <repeats 12 times>"\355, xI\357\071\237\377\303\322\324N\240\035\t\217#\225c\314\301\260\357\362>쳓\264\243\021da\250\324Bg\335\311\375\364c\036\306?K\305\064\246o\335zi\326\330\335;\027\247Q\255\336\270\002\360\230i<ug9\302M\257\204\030\325\377\257\033\371R¨ɓ", <incomplete sequence \346>, length=12) at ssl3con.c:8657
#10 0x00007fc8b027aa2c in ssl3_HandleHandshake (ss=0x7fc8540008c0, cText=<value optimized out>, databuf=0x7fc854000c28) at ssl3con.c:8725
#11 ssl3_HandleRecord (ss=0x7fc8540008c0, cText=<value optimized out>, databuf=0x7fc854000c28) at ssl3con.c:9064
#12 0x00007fc8b027bae6 in ssl3_GatherCompleteHandshake (ss=0x7fc8540008c0, flags=0) at ssl3gthr.c:209
#13 0x00007fc8b02843af in SSL_ForceHandshake (fd=<value optimized out>) at sslsecur.c:397
#14 0x00007fc8b141f025 in tlsm_session_accept_or_connect (session=<value optimized out>, is_accept=1) at tls_m.c:2434
#15 0x00007fc8b141d027 in ldap_pvt_tls_accept (sb=0x7fc88819b570, ctx_arg=0x7fc8b30f0d00) at tls2.c:429
#16 0x00007fc8b12fad63 in connection_read (ctx=0x7fc83dff7b80, argv=0x24) at ../../../servers/slapd/connection.c:1326
#17 connection_read_thread (ctx=0x7fc83dff7b80, argv=0x24) at ../../../servers/slapd/connection.c:1238
#18 0x00007fc8b13fa218 in ldap_int_thread_pool_wrapper (xpool=0x7fc8b308b760) at ../../../libraries/libldap_r/tpool.c:685
#19 0x00007fc8af286ccb in start_thread (arg=0x7fc83dff8700) at pthread_create.c:301
#20 0x00007fc8aefc4c2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 6 (Thread 0x7fc8973c6700 (LWP 24123)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007fc8b13fa277 in ldap_int_thread_pool_wrapper (xpool=0x7fc8b308b760) at ../../../libraries/libldap_r/tpool.c:672
#2  0x00007fc8af286ccb in start_thread (arg=0x7fc8973c6700) at pthread_create.c:301
#3  0x00007fc8aefc4c2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 5 (Thread 0x7fc85cffc700 (LWP 2491)):
#0  __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:97
#1  0x00007fc8aefc07b4 in _L_lock_505 () from /lib64/libc-2.13.so
#2  0x00007fc8aefc01aa in __vsyslog_chk (pri=<value optimized out>, flag=1, fmt=0x7fc8b1430d60 "connection_get(%d): got connid=%lu\n", ap=0x7fc85cffb8e0) at ../misc/syslog.c:260
#3  0x00007fc8aefc05a3 in __syslog_chk (pri=<value optimized out>, flag=<value optimized out>, fmt=<value optimized out>) at ../misc/syslog.c:131
#4  0x00007fc8b12f7e05 in syslog (s=30) at /usr/include/bits/syslog.h:32
#5  connection_get (s=30) at ../../../servers/slapd/connection.c:286
#6  0x00007fc8b12fa685 in connection_read (ctx=0x7fc85cffbb80, argv=0x1e) at ../../../servers/slapd/connection.c:1292
#7  connection_read_thread (ctx=0x7fc85cffbb80, argv=0x1e) at ../../../servers/slapd/connection.c:1238
#8  0x00007fc8b13fa218 in ldap_int_thread_pool_wrapper (xpool=0x7fc8b308b760) at ../../../libraries/libldap_r/tpool.c:685
#9  0x00007fc8af286ccb in start_thread (arg=0x7fc85cffc700) at pthread_create.c:301
#10 0x00007fc8aefc4c2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 4 (Thread 0x7fc8737fe700 (LWP 2369)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007fc8b13fa277 in ldap_int_thread_pool_wrapper (xpool=0x7fc8b308b760) at ../../../libraries/libldap_r/tpool.c:672
#2  0x00007fc8af286ccb in start_thread (arg=0x7fc8737fe700) at pthread_create.c:301
#3  0x00007fc8aefc4c2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 3 (Thread 0x7fc873fff700 (LWP 2368)):
#0  __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:97
#1  0x00007fc8aefc07b4 in _L_lock_505 () from /lib64/libc-2.13.so
#2  0x00007fc8aefc01aa in __vsyslog_chk (pri=<value optimized out>, flag=1, fmt=0x7fc8b1430d60 "connection_get(%d): got connid=%lu\n", ap=0x7fc873ffe8e0) at ../misc/syslog.c:260
#3  0x00007fc8aefc05a3 in __syslog_chk (pri=<value optimized out>, flag=<value optimized out>, fmt=<value optimized out>) at ../misc/syslog.c:131
#4  0x00007fc8b12f7e05 in syslog (s=35) at /usr/include/bits/syslog.h:32
#5  connection_get (s=35) at ../../../servers/slapd/connection.c:286
#6  0x00007fc8b12fa685 in connection_read (ctx=0x7fc873ffeb80, argv=0x23) at ../../../servers/slapd/connection.c:1292
#7  connection_read_thread (ctx=0x7fc873ffeb80, argv=0x23) at ../../../servers/slapd/connection.c:1238
#8  0x00007fc8b13fa218 in ldap_int_thread_pool_wrapper (xpool=0x7fc8b308b760) at ../../../libraries/libldap_r/tpool.c:685
#9  0x00007fc8af286ccb in start_thread (arg=0x7fc873fff700) at pthread_create.c:301
#10 0x00007fc8aefc4c2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 2 (Thread 0x7fc884ffb700 (LWP 28529)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007fc8b13fa277 in ldap_int_thread_pool_wrapper (xpool=0x7fc8b308b760) at ../../../libraries/libldap_r/tpool.c:672
#2  0x00007fc8af286ccb in start_thread (arg=0x7fc884ffb700) at pthread_create.c:301
#3  0x00007fc8aefc4c2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 1 (Thread 0x7fc8857fc700 (LWP 27301)):
#0  0x00007fc8aef170c5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007fc8aef18a76 in abort () at abort.c:92
#2  0x00007fc8aef53cfb in __libc_message (do_abort=2, fmt=0x7fc8af042a78 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:186
#3  0x00007fc8aef5ad63 in malloc_printerr (av=<value optimized out>, p=0x7fc868012030, have_lock=0) at malloc.c:6283
#4  _int_free (av=<value optimized out>, p=0x7fc868012030, have_lock=0) at malloc.c:4795
#5  0x00007fc895cd64a2 in nss_ZFreeIf (pointer=0x7fc868012050) at arena.c:975
#6  0x00007fc895cc7574 in pem_getPrivateKey (arena=0x7fc87814c0d0, rawkey=<value optimized out>, pError=0x7fc8857fb0d8, modulus=0x7fc88817f838) at prsa.c:202
#7  0x00007fc895cc76cd in pem_mdCryptoOperationRSAPriv_Create (proto=0x7fc895eef140, mdMechanism=0x7fc895eef200, pError=0x7fc8857fb0d8, mdKey=<value optimized out>) at prsa.c:361
#8  0x00007fc895cd472f in nssCKFWMechanism_DecryptInit (fwMechanism=0x7fc87814c0a0, pMechanism=0x7fc8857fb3f0, fwSession=0x7fc878134e70, fwObject=0x7fc888156308) at mechanism.c:538
#9  0x00007fc895cd15d1 in NSSCKFWC_DecryptInit (fwInstance=<value optimized out>, hSession=<value optimized out>, pMechanism=0x7fc8857fb3f0, hKey=<value optimized out>) at wrap.c:3105
#10 0x00007fc8afd4e3a1 in pk11_HandUnwrap (slot=0x7fc888154d30, wrappingKey=12, mech=0x7fc8857fb3f0, target=881, keyTemplate=0x7fc8857fb250, templateCount=3, key_size=0, wincx=0x0, crvp=0x7fc8857fb408, isPerm=0, inKey=<value optimized out>) at pk11skey.c:1960
#11 0x00007fc8afd4e997 in pk11_AnyUnwrapKey (slot=0x7fc888154d30, wrappingKey=12, wrapType=1, param=<value optimized out>, wrappedKey=0x7fc8857fb5e0, target=881, operation=268, keySize=0, wincx=0x0, userAttr=<value optimized out>, numAttrs=0, isPerm=0)
    at pk11skey.c:2104
#12 0x00007fc8afd519fc in PK11_PubUnwrapSymKey (wrappingKey=0x7fc8881c30a0, wrappedKey=0x7fc8857fb5e0, target=881, operation=268, keySize=0) at pk11skey.c:2232
#13 0x00007fc8b027805c in ssl3_HandleRSAClientKeyExchange (ss=0x7fc88c258b90, b=<value optimized out>, length=<value optimized out>) at ssl3con.c:7358
#14 ssl3_HandleClientKeyExchange (ss=0x7fc88c258b90, b=<value optimized out>, length=<value optimized out>) at ssl3con.c:7474
#15 ssl3_HandleHandshakeMessage (ss=0x7fc88c258b90, b=<value optimized out>, length=<value optimized out>) at ssl3con.c:8646
#16 0x00007fc8b027aa2c in ssl3_HandleHandshake (ss=0x7fc88c258b90, cText=<value optimized out>, databuf=0x7fc88c258ef8) at ssl3con.c:8725
#17 ssl3_HandleRecord (ss=0x7fc88c258b90, cText=<value optimized out>, databuf=0x7fc88c258ef8) at ssl3con.c:9064
#18 0x00007fc8b027bae6 in ssl3_GatherCompleteHandshake (ss=0x7fc88c258b90, flags=0) at ssl3gthr.c:209
#19 0x00007fc8b02843af in SSL_ForceHandshake (fd=<value optimized out>) at sslsecur.c:397
#20 0x00007fc8b141f025 in tlsm_session_accept_or_connect (session=<value optimized out>, is_accept=1) at tls_m.c:2434
#21 0x00007fc8b141d027 in ldap_pvt_tls_accept (sb=0x7fc874125640, ctx_arg=0x7fc8b30f0d00) at tls2.c:429
#22 0x00007fc8b12fad63 in connection_read (ctx=0x7fc8857fbb80, argv=0x21) at ../../../servers/slapd/connection.c:1326
#23 connection_read_thread (ctx=0x7fc8857fbb80, argv=0x21) at ../../../servers/slapd/connection.c:1238
#24 0x00007fc8b13fa218 in ldap_int_thread_pool_wrapper (xpool=0x7fc8b308b760) at ../../../libraries/libldap_r/tpool.c:685
#25 0x00007fc8af286ccb in start_thread (arg=0x7fc8857fc700) at pthread_create.c:301
#26 0x00007fc8aefc4c2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Comment 46 Rich Megginson 2011-09-06 17:31:43 UTC
Yes!  Was able to reproduce the exact same crash
1) using openldap master branch (openldap requires patches to be against master branch)
2) using syncrepl with TLS/SSL
3) while syncrepl is occurring, do
ii=0 ; while [ $ii -lt 100 ] ; do LDAPTLS_CACERT=~/save/CA/cacert.pem ldapsearch -xLLL -ZZ -H "ldap://localhost.localdomain:2389" -D "cn=manager,dc=example,dc=com" -w secret -b dc=example,dc=com uid=scarter > /dev/null 2>&1 & ii=`expr $ii + 1` ; done

against the syncrepl consumer

Thanks for the stack traces, Ruben - I don't think we will need any more - they have been most helpful

Stay tuned

Comment 47 Rich Megginson 2011-09-06 17:41:15 UTC
Created attachment 521722 [details]
stack trace from crash

Comment 48 Rich Megginson 2011-09-06 17:43:25 UTC
Created attachment 521723 [details]
stack trace of just the two problematic threads

Comment 49 Rich Megginson 2011-09-06 19:06:05 UTC
Created attachment 521742 [details]
0001-use-mutex-for-connection-handshake-when-using-PEM-ns.patch

This patch is against the ol24 branch rebased on top of the earlier NSS_Init thread safety patch.

Is anyone in a position to test this with a build from openldap 2.4.x from source?

Comment 50 Rich Megginson 2011-09-06 19:12:25 UTC
Upstream ITS and patch submission
http://www.openldap.org/its/index.cgi?findid=7034

Comment 51 Ruben Kerkhof 2011-09-06 20:28:54 UTC
That's great to hear Rich!

If I can get a Koji build I can test your patch. Jan, would you be willing to do a scratch build for F-14?

Comment 52 Rich Megginson 2011-09-06 20:52:28 UTC
Created attachment 521759 [details]
0001-use-mutex-for-connection-handshake-when-using-PEM-ns.patch

new patch (also updated in openldap ITS)

Comment 53 Jan Vcelak 2011-09-12 12:07:23 UTC
(In reply to comment #50)
> Upstream ITS and patch submission
> http://www.openldap.org/its/index.cgi?findid=7034

Great work, Rich!

(In reply to comment #51)
> Jan, would you be willing to do a scratch build for F-14?

Definitely. I will post the links after submitting the build.

Comment 54 Jan Vcelak 2011-09-12 15:34:43 UTC
I am able to verify the fix only partially.

I still do not manage to reproduce the crash caused by SSL_ForceHandshake(). However I'm able to reproduce the crash caused by SSL_ConfigServerSessionIDCache().

Updated builds will follow soon.

Comment 55 Jan Vcelak 2011-09-12 15:38:29 UTC
*** Bug 720896 has been marked as a duplicate of this bug. ***

Comment 56 Rich Megginson 2011-09-12 15:47:02 UTC
(In reply to comment #54)
> I am able to verify the fix only partially.
> 
> I still do not manage to reproduce the crash caused by SSL_ForceHandshake().

It is difficult to do.  I've used something like this:
1) set up two openldap servers as SSL servers and set up syncrepl between them
also using SSL
2) have a client add a lot of entries to the syncrepl provider and confirm that
replication is working to the consumer
3) in another window, do something like this:

ii=0 ; while [ $ii -lt 1000 ] ; do LDAPTLS_CACERT=/path/to/cacert.pem
ldapsearch -xLLL -ZZ -h fqdn.of.consumer -D "cn=manager,cn=example,dc=com" -w
secret -b dc=example,dc=com > /dev/null 2>&1 & ii=`expr $ii + 1` ; done

> However I'm able to reproduce the crash caused by
> SSL_ConfigServerSessionIDCache().
> 
> Updated builds will follow soon.

Comment 57 Jan Vcelak 2011-09-12 18:51:09 UTC
Resolved in:

openldap-2.4.24-5.fc15
openldap-2.4.26-3.fc16
openldap-2.4.26-3.fc17

Scratch build with all NSS patches from F17 addressing memory leaks or thread safety is here:
http://koji.fedoraproject.org/koji/taskinfo?taskID=3345870

Please verify, that the issue is gone. I can create an update for F14 as well.

Comment 58 Jan Vcelak 2011-09-12 18:51:45 UTC
(In reply to comment #57)
> Scratch build ...
... for F14

Comment 59 Fedora Update System 2011-09-12 18:54:46 UTC
openldap-2.4.26-3.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/openldap-2.4.26-3.fc16

Comment 60 Fedora Update System 2011-09-12 19:01:26 UTC
openldap-2.4.24-5.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/openldap-2.4.24-5.fc15

Comment 61 Fedora Update System 2011-09-13 05:43:11 UTC
Package openldap-2.4.24-5.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing openldap-2.4.24-5.fc15'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/openldap-2.4.24-5.fc15
then log in and leave karma (feedback).

Comment 62 Ruben Kerkhof 2011-09-13 15:53:57 UTC
I've updated my servers to openldap-2.4.26-3.fc17, rebuild for F-14.

I'll report back in a few days.

Comment 63 Ruben Kerkhof 2011-09-22 11:31:45 UTC
No crashes for the last 8 days, so it looks likes this is fixed.
Thank you all for your help!

Comment 64 jonnya99 2011-10-13 10:40:26 UTC
(In reply to comment #57)
> Resolved in:
> openldap-2.4.24-5.fc15
> openldap-2.4.26-3.fc16
> openldap-2.4.26-3.fc17
> Scratch build with all NSS patches from F17 addressing memory leaks or thread
> safety is here:
> http://koji.fedoraproject.org/koji/taskinfo?taskID=3345870
> Please verify, that the issue is gone. I can create an update for F14 as well.

Hi Jan

Would it be possible for you to create an update for F14 please?
If not, could you point me to the location of the source RPM on koji please?

Comment 65 Jan Vcelak 2011-10-13 11:15:38 UTC
(In reply to comment #64)
> Would it be possible for you to create an update for F14 please?
> If not, could you point me to the location of the source RPM on koji please?

I do not plan to update the package in F14 now.

You can rebuild F15 package for F14, it should work:
http://kojipkgs.fedoraproject.org/packages/openldap/2.4.24/5.fc15/src/openldap-2.4.24-5.fc15.src.rpm

Comment 66 Fedora Update System 2011-11-05 01:24:38 UTC
openldap-2.4.26-5.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 67 Fedora Update System 2011-11-17 23:41:23 UTC
openldap-2.4.24-5.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.