Cause: All connections using the GSSAPI plugin in cyrus-sasl shared a single lock across both the server and client implementations.
Consequence: A deadlock in the cyrus-sasl GSSAPI plugin caused the whole directory server to stop responding when both outgoing replication and an incoming client connection used SASL GSSAPI.
Fix: Per-thread locks were introduced, which minimizes the required synchronization and prevents a deadlock when some threads are blocked.
Result: The directory server now reliably handles concurrent connections that use the GSSAPI plugin in cyrus-sasl.
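The difference between one shared lock and separate locks can be illustrated with a small sketch. This is not the cyrus-sasl code; it is a simplified Python model, and every name in it (SHARED_LOCK, decode, run_while_blocked) is hypothetical. It assumes the problem case is a thread that stalls while holding the lock, and it models the fix by giving each worker its own lock object.

```python
import threading

# Hypothetical stand-in for the single plugin-wide lock described under "Cause".
SHARED_LOCK = threading.Lock()


def decode(lock, timeout=0.2):
    """Try to take `lock` the way a decode call would; True if it was acquired."""
    got = lock.acquire(timeout=timeout)
    if got:
        lock.release()
    return got


def run_while_blocked(held_lock, wanted_lock):
    """Stall one thread while it holds `held_lock`, then report whether a
    second thread can still make progress using `wanted_lock`."""
    holding = threading.Event()
    release = threading.Event()

    def stalled_peer():
        with held_lock:
            holding.set()
            release.wait()  # simulates blocking (e.g. on I/O) with the lock held

    peer = threading.Thread(target=stalled_peer)
    peer.start()
    holding.wait()

    result = {}
    worker = threading.Thread(target=lambda: result.update(ok=decode(wanted_lock)))
    worker.start()
    worker.join()

    release.set()  # let the stalled thread finish so the demo terminates
    peer.join()
    return result["ok"]


# One shared lock: the second thread cannot proceed while the first is stalled.
shared_ok = run_while_blocked(SHARED_LOCK, SHARED_LOCK)

# Separate locks (assumed granularity): the second thread is unaffected.
lock_a, lock_b = threading.Lock(), threading.Lock()
separate_ok = run_while_blocked(lock_a, lock_b)

print("shared lock progress:", shared_ok)
print("separate locks progress:", separate_ok)
```

In the real server there is no timeout on the lock acquisition, so the first case would block indefinitely instead of returning False.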
Description of problem:
Deadlock in GSSAPI in an IPA context with 389-ds-base-1.3.3.1-20.el7_1.x86_64.
The pstack output needs review.
Several threads show a trace similar to this:
Thread 33 (Thread 0x7f0fecd58700 (LWP 2420)):
#0 0x00007f1018682f7d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f101867ed68 in _L_lock_975 () from /lib64/libpthread.so.0
#2 0x00007f101867ed11 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f1018cd5cb9 in PR_Lock () from /lib64/libnspr4.so
#4 0x00007f101adc7a99 in nssasl_mutex_lock ()
#5 0x00007f101301430c in gssapi_decode_packet () from /usr/lib64/sasl2/libgssapiv2.so
#6 0x00007f1018aabb25 in _plug_decode () from /lib64/libsasl2.so.3
#7 0x00007f1013014826 in gssapi_decode () from /usr/lib64/sasl2/libgssapiv2.so
#8 0x00007f1018aa15bd in sasl_decode () from /lib64/libsasl2.so.3
#9 0x00007f101adc773a in sasl_io_recv ()
#10 0x00007f101adb78f1 in connection_read_operation ()
#11 0x00007f101adb88df in connection_threadmain ()
#12 0x00007f1018cdb7bb in _pt_root () from /lib64/libnspr4.so
#13 0x00007f101867cdf5 in start_thread () from /lib64/libpthread.so.0
#14 0x00007f10183aa1ad in clone () from /lib64/libc.so.6
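When reviewing a full pstack capture, it can help to list every thread that is waiting on a mutex beneath a given function. The following is a minimal sketch, assuming the capture uses the same "Thread N (Thread 0x... (LWP n)):" and "#k 0xADDR in func ()" layout as the trace above; frames printed without an address are ignored. The helper name threads_waiting_in is hypothetical, and Thread 1 in the sample is made up for contrast.

```python
import re
from collections import defaultdict

THREAD_RE = re.compile(r"^Thread (\d+) \(Thread 0x[0-9a-f]+ \(LWP (\d+)\)\):")
FRAME_RE = re.compile(r"^#(\d+)\s+0x[0-9a-f]+ in (\S+) \(")


def threads_waiting_in(pstack_text, function):
    """Return the thread numbers whose stack contains both `function`
    and pthread_mutex_lock, i.e. threads waiting on a mutex under it."""
    stacks = defaultdict(list)
    current = None
    for line in pstack_text.splitlines():
        line = line.strip()
        thread = THREAD_RE.match(line)
        if thread:
            current = int(thread.group(1))
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None:
            stacks[current].append(frame.group(2))
    return sorted(
        number
        for number, frames in stacks.items()
        if function in frames and "pthread_mutex_lock" in frames
    )


# Abridged copy of the trace above, plus an invented idle thread for contrast.
SAMPLE = """\
Thread 33 (Thread 0x7f0fecd58700 (LWP 2420)):
#0 0x00007f1018682f7d in __lll_lock_wait () from /lib64/libpthread.so.0
#2 0x00007f101867ed11 in pthread_mutex_lock () from /lib64/libpthread.so.0
#4 0x00007f101adc7a99 in nssasl_mutex_lock ()
#5 0x00007f101301430c in gssapi_decode_packet () from /usr/lib64/sasl2/libgssapiv2.so
Thread 1 (Thread 0x7f101b1f5840 (LWP 2380)):
#0 0x00007f101839fb7d in poll () from /lib64/libc.so.6
"""

blocked = threads_waiting_in(SAMPLE, "nssasl_mutex_lock")
print(blocked)
```

Running the same function with "gssapi_decode_packet" or "gssapi_encode" as the second argument shows which side of the plugin each waiting thread came from.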
Version-Release number of selected component (if applicable):
389-ds-base-1.3.3.1-20.el7_1.x86_64
redhat-release-server-7.1-1.el7.x86_64
How reproducible:
N/A
Steps to Reproduce:
1. N/A
Actual results:
ns-slapd hangs in a deadlock, as shown in the pstack output.
Expected results:
Additional info:
The fix for this bug was delivered in RHEL 7.2.z. This component was not updated in RHEL 7.3, so RHEL 7.3 already contains the fix from RHEL 7.2.z. Therefore, this bug has been closed as CURRENTRELEASE.