Bug 783431 - replication (syncrepl) with TLS causes segfault
Product: Fedora
Classification: Fedora
Component: openldap
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Assigned To: Jan Vcelak
QA Contact: Fedora Extras Quality Assurance
Blocks: 783445
Reported: 2012-01-20 06:51 EST by Jan Vcelak
Modified: 2013-03-03 20:29 EST
CC List: 4 users

Fixed In Version: openldap-2.4.26-6.fc16
Doc Type: Bug Fix
Clones: 783445
Last Closed: 2012-02-16 19:58:01 EST

Attachments
configs and scripts to reproduce (7.34 KB, application/x-compressed-tar)
2012-01-20 07:05 EST, Jan Vcelak
no flags
proposed patch (3.29 KB, patch)
2012-01-25 04:40 EST, Jan Vcelak
rmeggins: review+

Description Jan Vcelak 2012-01-20 06:51:48 EST
Description of problem:

In my test configuration, master-master replication with TLS enabled (ldaps://, or ldap:// with StartTLS) causes one of the servers to crash with a segmentation fault:

#0  __strrchr_sse2 () at ../sysdeps/x86_64/strrchr.S:33
#1  0x000055555574874f in tlsm_get_certdb_prefix (certdir=0x34 <Address 0x34 out of bounds>, realcertdir=0x7fffeedd0c28, 
    prefix=0x7fffeedd0c20) at tls_m.c:1521
#2  0x0000555555748960 in tlsm_deferred_init (arg=0x7fffe8108eb0) at tls_m.c:1608
#3  0x00005555557497e8 in tlsm_deferred_ctx_init (arg=0x7fffe8108eb0) at tls_m.c:2068
#4  0x00007ffff63f2ed5 in PR_CallOnceWithArg (once=0x7fffe8108ee8, func=<optimized out>, arg=<optimized out>)
    at ../../../mozilla/nsprpub/pr/src/misc/prinit.c:836
#5  0x000055555574a4b1 in tlsm_session_new (ctx=0x7fffe8108eb0, is_server=0) at tls_m.c:2432
#6  0x0000555555743842 in alloc_handle (ctx_arg=0x7fffe8108eb0, is_server=0) at tls2.c:288
#7  0x0000555555743972 in ldap_int_tls_connect (ld=0x7fffe0100910, conn=0x7fffe0108ec0) at tls2.c:333
#8  0x0000555555744bd5 in ldap_int_tls_start (ld=0x7fffe0100910, conn=0x7fffe0108ec0, srv=0x7fffe0108de0) at tls2.c:834
#9  0x0000555555717920 in ldap_int_open_connection (ld=0x7fffe0100910, conn=0x7fffe0108ec0, srv=0x7fffe0108de0, async=0) at open.c:437
#10 0x000055555572ecb6 in ldap_new_connection (ld=0x7fffe0100910, srvlist=0x7fffe0100a30, use_ldsb=1, connect=1, bind=0x0, m_req=0, 
    m_res=0) at request.c:480
#11 0x0000555555716b43 in ldap_open_defconn (ld=0x7fffe0100910) at open.c:41
#12 0x000055555571ec83 in ldap_int_sasl_bind (ld=0x7fffe0100910, dn=0x555555b406a0 "cn=manager,dc=redhat,dc=bug", 
    mechs=0x555555b40080 "EXTERNAL", sctrls=0x0, cctrls=0x0, flags=2, interact=0x555555709a58 <lutil_sasl_interact>, 
    defaults=0x7fffe0108e80, result=0x0, rmech=0x7fffeedd13a0, msgid=0x7fffeedd1394) at cyrus.c:425
#13 0x0000555555721db7 in ldap_sasl_interactive_bind (ld=0x7fffe0100910, dn=0x555555b406a0 "cn=manager,dc=redhat,dc=bug", 
    mechs=0x555555b40080 "EXTERNAL", serverControls=0x0, clientControls=0x0, flags=2, interact=0x555555709a58 <lutil_sasl_interact>, 
    defaults=0x7fffe0108e80, result=0x0, rmech=0x7fffeedd13a0, msgid=0x7fffeedd1394) at sasl.c:474
#14 0x0000555555721e6a in ldap_sasl_interactive_bind_s (ld=0x7fffe0100910, dn=0x555555b406a0 "cn=manager,dc=redhat,dc=bug", 
    mechs=0x555555b40080 "EXTERNAL", serverControls=0x0, clientControls=0x0, flags=2, interact=0x555555709a58 <lutil_sasl_interact>, 
    defaults=0x7fffe0108e80) at sasl.c:511
#15 0x000055555559c5d6 in slap_client_connect (ldp=0x555555b40570, sb=0x555555b40350) at ../../../servers/slapd/config.c:2041
#16 0x0000555555628d19 in do_syncrep1 (op=0x7fffeedd14c0, si=0x555555b40320) at ../../../servers/slapd/syncrepl.c:611
#17 0x000055555562c5ab in do_syncrepl (ctx=0x7fffeedd1ba0, arg=0x555555b408a0) at ../../../servers/slapd/syncrepl.c:1510
#18 0x000055555571546d in ldap_int_thread_pool_wrapper (xpool=0x555555ae6b70) at ../../../libraries/libldap_r/tpool.c:685
#19 0x00007ffff73e7bd0 in start_thread (arg=0x7fffeedd2700) at pthread_create.c:309
#20 0x00007ffff5cfea0d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Version-Release number of selected component (if applicable):
Possibly all openldap versions in Fedora and RHEL since the switch to Mozilla NSS.


Steps to Reproduce:

A reproducer with instructions will follow.
Comment 1 Jan Vcelak 2012-01-20 07:05:50 EST
Created attachment 556509
configs and scripts to reproduce

Steps to reproduce:

(run everything as root)

1. Decompress the archive in /root (this should create the /root/bz783431 directory).
2. Append the contents of /root/bz783431/hosts.add to your /etc/hosts.
3. Start the first server by running "make run" in /root/bz783431/server1.
4. Start the second server by running "make run" in /root/bz783431/server2.

The first server will crash within a few moments.
Comment 2 Jan Vcelak 2012-01-20 07:39:52 EST
When the first server starts, it tries to open a new connection to the second server for replication. This is done in "do_syncrep1" by calling "slap_client_connect". The sb_tls_do_init flag in the slap_bindconf structure (sb) is cleared and a new TLS context is created. This context is also stored in sb and reused for subsequent connections.

The MozNSS backend uses deferred initialization: the real initialization only takes place when the second server comes up and the first TLS session is created. At that point, the TLS parameters are read through the TLS context structure (the tc_config pointer in struct tlsm_ctx). It seems that this information is no longer valid and memory that is no longer valid is accessed, which causes the segfault.
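The deferred-initialization pattern described above can be sketched in C as follows: the context only borrows a pointer to the configuration at creation time and dereferences it later, when the first session is created. All names here (struct tls_config, struct tls_ctx, ctx_init, deferred_init, session_new) are hypothetical simplifications, not the actual tls_m.c code, and the one-shot guard is a plain flag, whereas tls_m.c uses NSPR's PR_CallOnceWithArg. The sketch keeps the configuration alive, so it runs correctly; the comments mark where the crash would happen if the owner freed it first.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for struct ldaptls / struct tlsm_ctx. */
struct tls_config {
    const char *certfile;
    const char *cacertdir;
};

struct tls_ctx {
    const struct tls_config *config; /* BORROWED: owned by the LDAP handle */
    int initialized;                 /* one-shot guard (PR_CallOnceWithArg in tls_m.c) */
    int init_count;                  /* how many times the deferred init ran */
    char certdb[256];                /* state produced by the deferred init */
};

/* Context creation: only the configuration pointer is remembered. */
static void ctx_init(struct tls_ctx *ctx, const struct tls_config *cfg)
{
    memset(ctx, 0, sizeof *ctx);
    ctx->config = cfg;
}

/* The real initialization runs later and dereferences the borrowed pointer.
 * If the owner has freed the configuration in the meantime, this read is a
 * use-after-free, as in the tlsm_get_certdb_prefix() frame of the backtrace. */
static void deferred_init(struct tls_ctx *ctx)
{
    assert(ctx->config != NULL);
    snprintf(ctx->certdb, sizeof ctx->certdb, "%s", ctx->config->cacertdir);
    ctx->init_count++;
}

/* Creating a session triggers the deferred init exactly once.
 * Single-threaded for brevity. */
static void session_new(struct tls_ctx *ctx)
{
    if (!ctx->initialized) {
        deferred_init(ctx);
        ctx->initialized = 1;
    }
}
```

The window between ctx_init() and the first session_new() is exactly where, in slapd, the failed first connection destroys the LDAP handle that owns the configuration.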
Comment 3 Jan Vcelak 2012-01-20 07:59:27 EST
The TLS parameters are indeed no longer available. During tlsm_ctx_init, only a pointer to lo->ldo_tls_info is stored:

(gdb) p lo->ldo_tls_info 
$7 = {lt_certfile = 0x7fffe4100d00 "replicator", lt_keyfile = 0x0, lt_dhfile = 0x0, lt_cacertfile = 0x0, 
  lt_cacertdir = 0x7fffe4100ce0 "/root/bz783431/certdb", lt_ciphersuite = 0x0, lt_crlfile = 0x0, lt_randfile = 0x0, 
  lt_protocol_min = 0}
(gdb) p &lo->ldo_tls_info 
$8 = (struct ldaptls *) 0x7fffe41009d8

When the first connection attempt fails, this structure is freed in ldap_int_tls_destroy, but the TLS context cached in sb still points to it.

This can be solved by temporarily copying the TLS initialization data into the TLS context structure and freeing the copy once the deferred initialization has finished.

I will write a patch.
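A minimal sketch of such a deep copy, assuming a local mirror of the fields gdb printed for struct ldaptls above (the real definition is private to libldap and may differ). The helpers tls_info_dup() and tls_info_free() are hypothetical names for illustration, not functions from the proposed patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Local mirror of the fields gdb printed for lo->ldo_tls_info above;
 * the real struct ldaptls is private to libldap. */
struct tls_info {
    char *lt_certfile;
    char *lt_keyfile;
    char *lt_dhfile;
    char *lt_cacertfile;
    char *lt_cacertdir;
    char *lt_ciphersuite;
    char *lt_crlfile;
    char *lt_randfile;
    int   lt_protocol_min;
};

/* Duplicate a string; NULL (option not set) stays NULL. */
static char *dup_or_null(const char *s)
{
    if (s == NULL)
        return NULL;
    size_t n = strlen(s) + 1;
    char *d = malloc(n);
    if (d != NULL)
        memcpy(d, s, n);
    return d;
}

/* Free a copy made by tls_info_dup(); free(NULL) is a no-op. */
static void tls_info_free(struct tls_info *ti)
{
    if (ti == NULL)
        return;
    free(ti->lt_certfile);
    free(ti->lt_keyfile);
    free(ti->lt_dhfile);
    free(ti->lt_cacertfile);
    free(ti->lt_cacertdir);
    free(ti->lt_ciphersuite);
    free(ti->lt_crlfile);
    free(ti->lt_randfile);
    free(ti);
}

/* Deep-copy every string so the copy outlives the original structure.
 * Returns NULL if any allocation fails. */
static struct tls_info *tls_info_dup(const struct tls_info *src)
{
    struct tls_info *ti;
    int failed = 0;

    assert(src != NULL);
    ti = calloc(1, sizeof *ti);
    if (ti == NULL)
        return NULL;
#define COPY_FIELD(f)                              \
    do {                                           \
        ti->f = dup_or_null(src->f);               \
        if (src->f != NULL && ti->f == NULL)       \
            failed = 1;                            \
    } while (0)
    COPY_FIELD(lt_certfile);
    COPY_FIELD(lt_keyfile);
    COPY_FIELD(lt_dhfile);
    COPY_FIELD(lt_cacertfile);
    COPY_FIELD(lt_cacertdir);
    COPY_FIELD(lt_ciphersuite);
    COPY_FIELD(lt_crlfile);
    COPY_FIELD(lt_randfile);
#undef COPY_FIELD
    ti->lt_protocol_min = src->lt_protocol_min;
    if (failed) {
        tls_info_free(ti);
        return NULL;
    }
    return ti;
}
```

Copying the strings, not just the struct, matters: a shallow struct copy would still hold pointers into the buffers that ldap_int_tls_destroy releases.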
Comment 4 Jan Vcelak 2012-01-25 04:40:09 EST
Created attachment 557415
proposed patch

Fix: make a copy of the TLS configuration in tlsm_ctx_init and free it when tlsm_deferred_ctx_init finishes. If the deferred initialization never takes place, the copy is freed in tlsm_ctx_free.

(The patch applies cleanly to the 2.4.28 release tarball used in Fedora Rawhide.)
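The ownership rules of the fix can be sketched as follows. This is not the attached patch: the two-field struct tls_config, the function names ctx_init, ctx_deferred_init and ctx_free, and the live_copies counter are hypothetical and only show who owns the copy and when it is released, mirroring the roles of tlsm_ctx_init, tlsm_deferred_ctx_init and tlsm_ctx_free described above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced TLS configuration (two of the lt_* fields). */
struct tls_config {
    char *certfile;
    char *cacertdir;
};

struct tls_ctx {
    struct tls_config *config; /* OWNED copy; NULL once released */
    int initialized;
    char certdb[256];
};

static int live_copies; /* sketch-only counter used to check for leaks */

/* Duplicate a string; NULL (option not set) stays NULL. */
static char *dup_or_null(const char *s)
{
    if (s == NULL)
        return NULL;
    size_t n = strlen(s) + 1;
    char *d = malloc(n);
    if (d != NULL)
        memcpy(d, s, n);
    return d;
}

static void config_free(struct tls_config *c)
{
    if (c == NULL)
        return;
    free(c->certfile);
    free(c->cacertdir);
    free(c);
    live_copies--;
}

static struct tls_config *config_dup(const struct tls_config *src)
{
    struct tls_config *c = calloc(1, sizeof *c);

    if (c == NULL)
        return NULL;
    live_copies++;
    c->certfile = dup_or_null(src->certfile);
    c->cacertdir = dup_or_null(src->cacertdir);
    if ((src->certfile && !c->certfile) || (src->cacertdir && !c->cacertdir)) {
        config_free(c);
        return NULL;
    }
    return c;
}

/* Role of tlsm_ctx_init after the fix: keep a private copy, not a pointer. */
static int ctx_init(struct tls_ctx *ctx, const struct tls_config *cfg)
{
    memset(ctx, 0, sizeof *ctx);
    ctx->config = config_dup(cfg);
    return ctx->config != NULL ? 0 : -1;
}

/* Role of tlsm_deferred_ctx_init: consume the copy, then release it. */
static void ctx_deferred_init(struct tls_ctx *ctx)
{
    if (ctx->initialized)
        return;
    assert(ctx->config != NULL);
    snprintf(ctx->certdb, sizeof ctx->certdb, "%s",
             ctx->config->cacertdir ? ctx->config->cacertdir : "");
    config_free(ctx->config);
    ctx->config = NULL;
    ctx->initialized = 1;
}

/* Role of tlsm_ctx_free: release the copy if the deferred init never ran. */
static void ctx_free(struct tls_ctx *ctx)
{
    config_free(ctx->config); /* no-op if already released */
    ctx->config = NULL;
}
```

Because the context owns its copy outright, it no longer matters when the LDAP handle that supplied the options is destroyed; ctx_free() covers contexts that are discarded before any session triggers the deferred initialization.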
Comment 5 Jan Vcelak 2012-01-25 10:58:23 EST
Thank you for the review, Rich.

Patch submitted upstream:
Comment 6 Jan Vcelak 2012-01-31 12:48:36 EST
Fixed in:
Comment 7 Fedora Update System 2012-01-31 12:51:14 EST
openldap-2.4.26-6.fc16 has been submitted as an update for Fedora 16.
Comment 8 Fedora Update System 2012-02-01 14:26:57 EST
Package openldap-2.4.26-6.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing openldap-2.4.26-6.fc16'
as soon as you are able to.
Please go to the following url:
then log in and leave karma (feedback).
Comment 9 Fedora Update System 2012-02-16 19:58:01 EST
openldap-2.4.26-6.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.
