Bug 783431

Summary: replication (syncrepl) with TLS causes segfault
Product: Fedora
Component: openldap
Version: rawhide
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Reporter: Jan Vcelak <jvcelak>
Assignee: Jan Vcelak <jvcelak>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: jsynacek, jvcelak, rmeggins, tsmetana
Fixed In Version: openldap-2.4.26-6.fc16
Doc Type: Bug Fix
Last Closed: 2012-02-17 00:58:01 UTC
Bug Blocks: 783445
Attachments:
- configs and scripts to reproduce (flags: none)
- proposed patch (flags: rmeggins: review+)

Description Jan Vcelak 2012-01-20 11:51:48 UTC
Description of problem:

In my test configuration, master-master replication with TLS enabled (ldaps:// or ldap:// with starttls) causes one of the servers to segfault.

#0  __strrchr_sse2 () at ../sysdeps/x86_64/strrchr.S:33
#1  0x000055555574874f in tlsm_get_certdb_prefix (certdir=0x34 <Address 0x34 out of bounds>, realcertdir=0x7fffeedd0c28, 
    prefix=0x7fffeedd0c20) at tls_m.c:1521
#2  0x0000555555748960 in tlsm_deferred_init (arg=0x7fffe8108eb0) at tls_m.c:1608
#3  0x00005555557497e8 in tlsm_deferred_ctx_init (arg=0x7fffe8108eb0) at tls_m.c:2068
#4  0x00007ffff63f2ed5 in PR_CallOnceWithArg (once=0x7fffe8108ee8, func=<optimized out>, arg=<optimized out>)
    at ../../../mozilla/nsprpub/pr/src/misc/prinit.c:836
#5  0x000055555574a4b1 in tlsm_session_new (ctx=0x7fffe8108eb0, is_server=0) at tls_m.c:2432
#6  0x0000555555743842 in alloc_handle (ctx_arg=0x7fffe8108eb0, is_server=0) at tls2.c:288
#7  0x0000555555743972 in ldap_int_tls_connect (ld=0x7fffe0100910, conn=0x7fffe0108ec0) at tls2.c:333
#8  0x0000555555744bd5 in ldap_int_tls_start (ld=0x7fffe0100910, conn=0x7fffe0108ec0, srv=0x7fffe0108de0) at tls2.c:834
#9  0x0000555555717920 in ldap_int_open_connection (ld=0x7fffe0100910, conn=0x7fffe0108ec0, srv=0x7fffe0108de0, async=0) at open.c:437
#10 0x000055555572ecb6 in ldap_new_connection (ld=0x7fffe0100910, srvlist=0x7fffe0100a30, use_ldsb=1, connect=1, bind=0x0, m_req=0, 
    m_res=0) at request.c:480
#11 0x0000555555716b43 in ldap_open_defconn (ld=0x7fffe0100910) at open.c:41
#12 0x000055555571ec83 in ldap_int_sasl_bind (ld=0x7fffe0100910, dn=0x555555b406a0 "cn=manager,dc=redhat,dc=bug", 
    mechs=0x555555b40080 "EXTERNAL", sctrls=0x0, cctrls=0x0, flags=2, interact=0x555555709a58 <lutil_sasl_interact>, 
    defaults=0x7fffe0108e80, result=0x0, rmech=0x7fffeedd13a0, msgid=0x7fffeedd1394) at cyrus.c:425
#13 0x0000555555721db7 in ldap_sasl_interactive_bind (ld=0x7fffe0100910, dn=0x555555b406a0 "cn=manager,dc=redhat,dc=bug", 
    mechs=0x555555b40080 "EXTERNAL", serverControls=0x0, clientControls=0x0, flags=2, interact=0x555555709a58 <lutil_sasl_interact>, 
    defaults=0x7fffe0108e80, result=0x0, rmech=0x7fffeedd13a0, msgid=0x7fffeedd1394) at sasl.c:474
#14 0x0000555555721e6a in ldap_sasl_interactive_bind_s (ld=0x7fffe0100910, dn=0x555555b406a0 "cn=manager,dc=redhat,dc=bug", 
    mechs=0x555555b40080 "EXTERNAL", serverControls=0x0, clientControls=0x0, flags=2, interact=0x555555709a58 <lutil_sasl_interact>, 
    defaults=0x7fffe0108e80) at sasl.c:511
#15 0x000055555559c5d6 in slap_client_connect (ldp=0x555555b40570, sb=0x555555b40350) at ../../../servers/slapd/config.c:2041
#16 0x0000555555628d19 in do_syncrep1 (op=0x7fffeedd14c0, si=0x555555b40320) at ../../../servers/slapd/syncrepl.c:611
#17 0x000055555562c5ab in do_syncrepl (ctx=0x7fffeedd1ba0, arg=0x555555b408a0) at ../../../servers/slapd/syncrepl.c:1510
#18 0x000055555571546d in ldap_int_thread_pool_wrapper (xpool=0x555555ae6b70) at ../../../libraries/libldap_r/tpool.c:685
#19 0x00007ffff73e7bd0 in start_thread (arg=0x7fffeedd2700) at pthread_create.c:309
#20 0x00007ffff5cfea0d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Version-Release number of selected component (if applicable):
Possibly all openldap builds in Fedora and RHEL since the switch to NSS:

openldap-2.4.28-1.fc17.x86_64
openldap-2.4.23-20.el6.x86_64

Steps to Reproduce:

Reproducer with instructions will follow

Comment 1 Jan Vcelak 2012-01-20 12:05:50 UTC
Created attachment 556509 [details]
configs and scripts to reproduce

Steps to reproduce:

(work as root)

1. decompress in /root (you should get /root/bz783431 dir)
2. append content of /root/bz783431/hosts.add to your /etc/hosts
3. run first server by running "make run" in /root/bz783431/server1
4. run second server by running "make run" in /root/bz783431/server2

The first server will crash within a few moments.

Comment 2 Jan Vcelak 2012-01-20 12:39:52 UTC
When the first server is started, it tries to create a new connection to the second server for replication. This is done in "do_syncrep1" by calling "slap_client_connect". sb_tls_do_init in the slap_bindconf structure (sb) is unset and a new TLS context is created. This context is also stored in sb and reused for subsequent connections.

Deferred initialization is used with the MozNSS backend. When the second server comes up, the real initialization takes place. At this point, the TLS parameters are taken from the TLS context structure (*tc_config member of struct tlsm_ctx). It seems that this information is no longer valid by then and invalid memory is accessed, which causes the segfault.

Comment 3 Jan Vcelak 2012-01-20 12:59:27 UTC
The TLS parameters are indeed no longer available. During tlsm_ctx_new, a pointer to lo->ldo_tls_info is taken:

(gdb) p lo->ldo_tls_info 
$7 = {lt_certfile = 0x7fffe4100d00 "replicator", lt_keyfile = 0x0, lt_dhfile = 0x0, lt_cacertfile = 0x0, 
  lt_cacertdir = 0x7fffe4100ce0 "/root/bz783431/certdb", lt_ciphersuite = 0x0, lt_crlfile = 0x0, lt_randfile = 0x0, 
  lt_protocol_min = 0}
(gdb) p &lo->ldo_tls_info 
$8 = (struct ldaptls *) 0x7fffe41009d8

When the first connection fails, the structure is freed in ldap_int_tls_destroy, so the pointer stored in the TLS context is left dangling.
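
The lifetime problem can be sketched in isolation. This is only an illustration, not the libldap code: the demo_* types and functions are hypothetical stand-ins for struct ldaptls, struct tlsm_ctx and the functions named in this report. To keep the sketch well-defined, the owner clears its settings instead of freeing them, so the deferred step sees NULL where the real code sees a dangling pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the TLS settings (not the real struct ldaptls). */
struct demo_tls_config {
    const char *certfile;
    const char *cacertdir;
};

/* Hypothetical stand-in for the TLS context (not the real struct tlsm_ctx). */
struct demo_tls_ctx {
    const struct demo_tls_config *borrowed; /* points at memory the caller owns */
    int initialized;
};

/* Context creation only records the caller's pointer; real setup is deferred. */
void demo_ctx_new(struct demo_tls_ctx *ctx, const struct demo_tls_config *cfg)
{
    ctx->borrowed = cfg;
    ctx->initialized = 0;
}

/* The owner discards its settings, e.g. after a failed connection.
 * The real code frees the memory; this sketch only clears the fields. */
void demo_config_destroy(struct demo_tls_config *cfg)
{
    cfg->certfile = NULL;
    cfg->cacertdir = NULL;
}

/* Deferred setup reads the settings through the borrowed pointer. */
int demo_deferred_init(struct demo_tls_ctx *ctx)
{
    if (ctx->borrowed == NULL || ctx->borrowed->cacertdir == NULL)
        return -1; /* settings are gone by the time they are needed */
    ctx->initialized = 1;
    return 0;
}
```

Calling demo_ctx_new, then demo_config_destroy, then demo_deferred_init makes the last call fail, because the context never owned the data it depends on.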


This can be solved by temporarily copying the TLS initialization data into the TLS context structure. The copy can be freed once the deferred initialization has finished.

I will write a patch.

Comment 4 Jan Vcelak 2012-01-25 09:40:09 UTC
Created attachment 557415 [details]
proposed patch

Fix: Make a copy of the TLS configuration in tlsm_ctx_init and free it when tlsm_deferred_ctx_init finishes. If the initialization never takes place, the copy is freed in tlsm_ctx_free.
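
The ownership scheme described above might look roughly like the following. This is a minimal sketch, not the attached patch: the sketch_* names and the two-field configuration structure are assumptions made for illustration, and the real struct ldaptls has more members.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the TLS settings (not the real struct ldaptls). */
struct sketch_tls_config {
    char *certfile;
    char *cacertdir;
};

/* Hypothetical stand-in for the TLS context (not the real struct tlsm_ctx). */
struct sketch_tls_ctx {
    struct sketch_tls_config *config_copy; /* owned by the context; NULL once released */
    int initialized;
};

/* Duplicate a string, treating NULL as "not set". */
char *sketch_strdup(const char *s)
{
    if (s == NULL)
        return NULL;
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Release the context's private copy; safe to call more than once. */
void sketch_release_copy(struct sketch_tls_ctx *ctx)
{
    if (ctx->config_copy == NULL)
        return;
    free(ctx->config_copy->certfile);
    free(ctx->config_copy->cacertdir);
    free(ctx->config_copy);
    ctx->config_copy = NULL;
}

/* Context setup: take a deep copy instead of keeping the caller's pointer. */
int sketch_ctx_init(struct sketch_tls_ctx *ctx, const struct sketch_tls_config *cfg)
{
    ctx->initialized = 0;
    ctx->config_copy = calloc(1, sizeof(*ctx->config_copy));
    if (ctx->config_copy == NULL)
        return -1;
    ctx->config_copy->certfile = sketch_strdup(cfg->certfile);
    ctx->config_copy->cacertdir = sketch_strdup(cfg->cacertdir);
    if ((cfg->certfile != NULL && ctx->config_copy->certfile == NULL) ||
        (cfg->cacertdir != NULL && ctx->config_copy->cacertdir == NULL)) {
        sketch_release_copy(ctx); /* allocation failed part-way through */
        return -1;
    }
    return 0;
}

/* Deferred setup: read the private copy, then drop it. */
int sketch_deferred_init(struct sketch_tls_ctx *ctx)
{
    if (ctx->config_copy == NULL || ctx->config_copy->cacertdir == NULL)
        return -1;
    /* A real backend would open the certificate database here. */
    ctx->initialized = 1;
    sketch_release_copy(ctx);
    return 0;
}

/* Context teardown: also covers the case where deferred setup never ran. */
void sketch_ctx_free(struct sketch_tls_ctx *ctx)
{
    sketch_release_copy(ctx);
    ctx->initialized = 0;
}
```

Because the context owns its copy, the caller can free or reuse its own settings right after sketch_ctx_init without affecting a later sketch_deferred_init. Setting config_copy to NULL on release keeps the teardown path from freeing it twice.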

(Patch applies cleanly on 2.4.28 release tarball -- Fedora Rawhide.)

Comment 5 Jan Vcelak 2012-01-25 15:58:23 UTC
Thank you for the review, Rich.

Patch submitted upstream:
http://www.openldap.org/its/index.cgi?findid=7136

Comment 6 Jan Vcelak 2012-01-31 17:48:36 UTC
Fixed in:
openldap-2.4.26-6.fc16
openldap-2.4.28-3.fc17

Comment 7 Fedora Update System 2012-01-31 17:51:14 UTC
openldap-2.4.26-6.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/openldap-2.4.26-6.fc16

Comment 8 Fedora Update System 2012-02-01 19:26:57 UTC
Package openldap-2.4.26-6.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing openldap-2.4.26-6.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-1135/openldap-2.4.26-6.fc16
then log in and leave karma (feedback).

Comment 9 Fedora Update System 2012-02-17 00:58:01 UTC
openldap-2.4.26-6.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.