Bug 1519167 - Third ldap_start_tls_s() call with OPT_X_TLS_CACERTFILE fails in NSS layer
Summary: Third ldap_start_tls_s() call with OPT_X_TLS_CACERTFILE fails in NSS layer
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: openldap
Version: 27
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Matus Honek
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On: 1520990
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-11-30 10:35 UTC by Christian Heimes
Modified: 2017-12-13 13:51 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-13 13:42:52 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)

Description Christian Heimes 2017-11-30 10:35:13 UTC
Description of problem:
The OpenLDAP client library fails with CONNECT_ERROR. The error always occurs with the third LDAP connection within the ldap_start_tls_s(). It's only reproducible when I set LDAP_OPT_X_TLS_CACERTFILE to a valid CA file and apply the settings with ldap_set_option(conn, LDAP_OPT_X_TLS_NEWCTX, 0). It does not occur on Travis CI, where libldap_r is compiled against OpenSSL.

I was able to track down the issue to a failing call to SECMOD_RestartModules() in tls_m.c around line 1836. It looks like PR_CallOnceWithArg( &c->tc_callonce, tlsm_deferred_ctx_init, c ); is called more than once. https://github.com/python-ldap/python-ldap/issues/60#issuecomment-348134403 contains additional information.


Version-Release number of selected component (if applicable):
* nss-3.34.0-1.0.fc27.x86_64
* openldap-2.4.45-3.fc27.x86_64

How reproducible:
Always


Steps to Reproduce:
I have created a reproducer script:

1. sudo dnf builddep python-ldap
2. git clone --branch tls_nss https://github.com/tiran/python-ldap.git
3. cd python-ldap
4. contrib/reproducer_issue60.py


Actual results:
Iteration 1
dn:cn=Manager,dc=slapd-test,dc=python-ldap,dc=org
Iteration 2
dn:cn=Manager,dc=slapd-test,dc=python-ldap,dc=org
Iteration 3
Traceback (most recent call last):
  File "contrib/reproducer_issue60.py", line 55, in <module>
    main()
  File "contrib/reproducer_issue60.py", line 49, in main
    reproducer(server)
  File "contrib/reproducer_issue60.py", line 40, in reproducer
    conn = connect(server)
  File "contrib/reproducer_issue60.py", line 32, in connect
    conn.start_tls_s()
  File "Lib/ldap/ldapobject.py", line 806, in start_tls_s
    return self._ldap_call(self._l.start_tls_s)
  File "Lib/ldap/ldapobject.py", line 298, in _ldap_call
    reraise(exc_type, exc_value, exc_traceback)
  File "Lib/ldap/ldapobject.py", line 282, in _ldap_call
    result = func(*args,**kwargs)
ldap.CONNECT_ERROR: {u'info': 'Is a directory', 'errno': 21, 'desc': u'Connect error'}

NOTE: "u'info': 'Is a directory'" is a red herring. python-ldap wrongly maps errno 21 to EISDIR

Expected results:
No exception


Additional info:
Upstream issue: https://github.com/python-ldap/python-ldap/issues/60

I don't know if this code ever worked. I just started to add tests for start_tls_s() to python-ldap.

Comment 1 Kai Engert (:kaie) (inactive account) 2017-11-30 10:55:36 UTC
IIUC we don't know yet if there is a bug in NSS, or if openldap is using NSS incorrectly, right?

If SECMOD_RestartModules() fails, have you checked that openldap uses this call appropriately, in other words, it only calls it once in response to a fork?

Since you were already in the debugger, I guess it would be easy for you to find out at which point inside of SECMOD_RestartModules we run into a failure? (If you do, please paste the failing line, as line numbers can vary between snapshots.)



> I was able to track down the issue to a failing call to
> SECMOD_RestartModules() in tls_m.c around line 1836. It looks like
> PR_CallOnceWithArg( &c->tc_callonce, tlsm_deferred_ctx_init, c ); is called
> more than once.

Did you draw that conclusion, because of the following debugger output, which you pasted into the github issue?

(gdb) n
2777            LDAP_MUTEX_LOCK( &tlsm_init_mutex );
(gdb) n
2778            status = PR_CallOnceWithArg( &c->tc_callonce, tlsm_deferred_ctx_init, c );
(gdb) n
2779            LDAP_MUTEX_UNLOCK( &tlsm_init_mutex );
(gdb) n
2778            status = PR_CallOnceWithArg( &c->tc_callonce, tlsm_deferred_ctx_init, c );
(gdb) n


This doesn't look like a duplicate execution. There is no loop here. I'm guessing what you saw is a behavior of gdb, which I've frequently seen with optimized builds, which might seem confusing.

gdb might go back multiple times to a single line, after it has executed only portions of the underlying assembler code.

To verify your theory that tlsm_deferred_ctx_init gets called multiple times, you'd have to set a breakpoint on tlsm_deferred_ctx_init and check how often it's being reached.

Also, you might prefer to use binaries compiled with debug information, to make it easier for gdb. You'll get less confusing experiences when using gdb.

Also, at one point your debugger session printed
  warning: Source file is more recent than executable.
so you should make sure there's nothing out of date in your debugging session.

Comment 2 Christian Heimes 2017-11-30 10:57:36 UTC
I added a --no-cacertfile option to the script. With --no-cacertfile, it does not set OPT_X_TLS_CACERTFILE and disables cert validation with OPT_X_TLS_REQUIRE_CERT ldap.OPT_X_TLS_NEVER.

$ contrib/reproducer_issue60.py --no-cacertfile
running build_ext
copying build/lib.linux-x86_64-2.7/_ldap.so -> Lib
Testing with cacertfile=False...
  Iteration 1, cacacertfile=False
     dn:cn=Manager,dc=slapd-test,dc=python-ldap,dc=org
  Iteration 2, cacacertfile=False
     dn:cn=Manager,dc=slapd-test,dc=python-ldap,dc=org
  Iteration 3, cacacertfile=False
     dn:cn=Manager,dc=slapd-test,dc=python-ldap,dc=org
  Iteration 4, cacacertfile=False
     dn:cn=Manager,dc=slapd-test,dc=python-ldap,dc=org
  Iteration 5, cacacertfile=False
     dn:cn=Manager,dc=slapd-test,dc=python-ldap,dc=org
  Iteration 6, cacacertfile=False
     dn:cn=Manager,dc=slapd-test,dc=python-ldap,dc=org
  Iteration 7, cacacertfile=False
     dn:cn=Manager,dc=slapd-test,dc=python-ldap,dc=org
  Iteration 8, cacacertfile=False
     dn:cn=Manager,dc=slapd-test,dc=python-ldap,dc=org
  Iteration 9, cacacertfile=False
     dn:cn=Manager,dc=slapd-test,dc=python-ldap,dc=org
  Iteration 10, cacacertfile=False
     dn:cn=Manager,dc=slapd-test,dc=python-ldap,dc=org


$ contrib/reproducer_issue60.py
running build_ext
copying build/lib.linux-x86_64-2.7/_ldap.so -> Lib
Testing with cacertfile=True...
  Iteration 1, cacacertfile=True
     dn:cn=Manager,dc=slapd-test,dc=python-ldap,dc=org
  Iteration 2, cacacertfile=True
     dn:cn=Manager,dc=slapd-test,dc=python-ldap,dc=org
  Iteration 3, cacacertfile=True
Traceback (most recent call last):
  File "contrib/reproducer_issue60.py", line 76, in <module>
    main()
  File "contrib/reproducer_issue60.py", line 70, in main
    reproducer(server, cacertfile=args.cacertfile)
  File "contrib/reproducer_issue60.py", line 52, in reproducer
    conn = connect(server, cacertfile=cacertfile)
  File "contrib/reproducer_issue60.py", line 43, in connect
    conn.start_tls_s()
  File "Lib/ldap/ldapobject.py", line 806, in start_tls_s
    return self._ldap_call(self._l.start_tls_s)
  File "Lib/ldap/ldapobject.py", line 298, in _ldap_call
    reraise(exc_type, exc_value, exc_traceback)
  File "Lib/ldap/ldapobject.py", line 282, in _ldap_call
    result = func(*args,**kwargs)
ldap.CONNECT_ERROR: {u'info': 'Is a directory', 'errno': 21, 'desc': u'Connect error'}

Comment 3 Kai Engert (:kaie) (inactive account) 2017-11-30 10:58:48 UTC
Please explain the meaning of comment 2 for this bug.

Comment 4 Christian Heimes 2017-11-30 11:08:12 UTC
Please ignore "warning: Source file is more recent than executable.". openldap-debuginfo does not provide debug sources. I had to extract files from the SRPM and add the source directory to gdb

I can confirm that tlsm_deferred_ctx_init() is called for every ldap_start_tls_s() call because the old context is destroyed by LDAP_OPT_X_TLS_NEWCTX. Please note that LDAP_OPT_X_TLS_NEWCTX is strictly required by libldap to apply all outstanding LDAP_OPT_X_TLS options to the underlying SSL implementation. Without LDAP_OPT_X_TLS_NEWCTX, new settings are ignored.

$ gdb python
(gdb) dir openldap-2.4.45-3.fc27.x86_64/libraries/libldap
(gdb) b tlsm_deferred_ctx_ini
(gdb) run contrib/reproducer_issue60.py
...
Testing with cacertfile=True...
  Iteration 1, cacacertfile=True

Breakpoint 1, tlsm_deferred_ctx_init (arg=0x555555973450) at tls_m.c:2329
warning: Source file is more recent than executable.
2329    {
(gdb) c
Continuing.
     dn:cn=Manager,dc=slapd-test,dc=python-ldap,dc=org
  Iteration 2, cacacertfile=True

Breakpoint 1, tlsm_deferred_ctx_init (arg=0x555555ae5f40) at tls_m.c:2329
2329    {
Missing separate debuginfos, use: dnf debuginfo-install sqlite-libs-3.20.1-1.fc27.x86_64
(gdb) c
Continuing.
     dn:cn=Manager,dc=slapd-test,dc=python-ldap,dc=org
  Iteration 3, cacacertfile=True

Breakpoint 1, tlsm_deferred_ctx_init (arg=0x555555b14060) at tls_m.c:2329
2329    {
(gdb) c
Continuing.
Traceback (most recent call last):
  File "contrib/reproducer_issue60.py", line 76, in <module>
    main()
  File "contrib/reproducer_issue60.py", line 70, in main
    reproducer(server, cacertfile=args.cacertfile)
  File "contrib/reproducer_issue60.py", line 52, in reproducer
    conn = connect(server, cacertfile=cacertfile)
  File "contrib/reproducer_issue60.py", line 43, in connect
    conn.start_tls_s()
  File "Lib/ldap/ldapobject.py", line 806, in start_tls_s
    return self._ldap_call(self._l.start_tls_s)
  File "Lib/ldap/ldapobject.py", line 298, in _ldap_call
    reraise(exc_type, exc_value, exc_traceback)
  File "Lib/ldap/ldapobject.py", line 282, in _ldap_call
    result = func(*args,**kwargs)
ldap.CONNECT_ERROR: {u'info': 'Is a directory', 'errno': 21, 'desc': u'Connect error'}

Comment 5 Christian Heimes 2017-11-30 11:10:32 UTC
(In reply to Kai Engert (:kaie) from comment #3)
> Please explain the meaning of comment 2 for this bug.

The error does not occur unless I set a CA cert file with LDAP_OPT_X_TLS_CACERTFILE. Without a custom CA cert, start_tls_s() just works fine.


Note You need to log in before you can comment on or make changes to this bug.