Bug 1466724 - Suffix deletion hangs with namedPipeLog tests
Suffix deletion hangs with namedPipeLog tests
Status: NEW
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: 389-ds-base (Show other bugs)
x86_64 Linux
medium Severity high
: rc
: ---
Assigned To: mreynolds
Viktor Ashirov
Depends On:
  Show dependency treegraph
Reported: 2017-06-30 06:31 EDT by Sankar Ramalingam
Modified: 2017-11-16 11:30 EST (History)
5 users (show)

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)

  None (edit)
Comment 4 thierry bordaz 2017-06-30 09:13:52 EDT
The DS server hangs while processing
[30/Jun/2017:06:43:48.711609117 +051800] conn=8 op=8 DEL dn="cn=aci,cn=index,cn=NamedPipeDB,cn=ldbm database,cn=plugins,cn=config"

The DS thread is hanging on a DB mutex

Thread 10 (Thread 0x7f810e3d1700 (LWP 24803)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f8130d07403 in __db_pthread_mutex_condwait (env=0x5583130d0a00, mutex=140192266383520, timespec=0x0, mutexp=<optimized out>) at ../../src/mutex/mut_pthread.c:321
#2  __db_hybrid_mutex_suspend (env=env@entry=0x5583130d0a00, mutex=mutex@entry=19492, timespec=timespec@entry=0x0, exclusive=exclusive@entry=1) at ../../src/mutex/mut_pthread.c:577
#3  0x00007f8130d06750 in __db_tas_mutex_lock_int (nowait=0, timeout=0, mutex=<optimized out>, env=0x5583130d0a00) at ../../src/mutex/mut_tas.c:271
#4  __db_tas_mutex_lock (env=env@entry=0x5583130d0a00, mutex=19492, timeout=timeout@entry=0) at ../../src/mutex/mut_tas.c:302
#5  0x00007f8130db0e4a in __lock_get_internal (lt=lt@entry=0x5583132f1a20, sh_locker=sh_locker@entry=0x7f812a63ccc8, flags=flags@entry=0, obj=<optimized out>, lock_mode=<optimized out>, timeout=0, lock=lock@entry=0x7f810e3cee38) at ../../src/lock/lock.c:983
#6  0x00007f8130db21f8 in __lock_vec (env=0x5583130d0a00, sh_locker=sh_locker@entry=0x7f812a63ccc8, flags=<optimized out>, list=list@entry=0x7f810e3cedf0, nlist=nlist@entry=2, elistp=elistp@entry=0x7f810e3ced98) at ../../src/lock/lock.c:140
#7  0x00007f8130e0b0ab in __fop_lock_handle (env=<optimized out>, dbp=0x558313316600, locker=0x7f812a63ccc8, mode=<optimized out>, elockp=0x7f810e3ceec0, flags=<optimized out>) at ../../src/fileops/fop_util.c:144
#8  0x00007f8130e0d078 in __fop_remove_setup (dbp=dbp@entry=0x558313316600, txn=txn@entry=0x0, name=0x558313e2adc0 "/var/lib/dirsrv/slapd-deftestinst/db/NamedPipeDB/aci.db", flags=flags@entry=0) at ../../src/fileops/fop_util.c:1084
#9  0x00007f8130df05da in __db_remove_int (dbp=dbp@entry=0x558313316600, ip=0x0, txn=txn@entry=0x0, name=name@entry=0x7f810e3cf2f0 "/var/lib/dirsrv/slapd-deftestinst/db/NamedPipeDB/aci.db", subdb=subdb@entry=0x0, flags=flags@entry=0) at ../../src/db/db_remove.c:289
#10 0x00007f8130df107f in __db_remove (dbp=dbp@entry=0x558313316600, ip=<optimized out>, txn=txn@entry=0x0, name=name@entry=0x7f810e3cf2f0 "/var/lib/dirsrv/slapd-deftestinst/db/NamedPipeDB/aci.db", subdb=subdb@entry=0x0, flags=flags@entry=0) at ../../src/db/db_remove.c:220
#11 0x00007f8130df11e4 in __db_remove_pp (dbp=0x558313316600, name=0x7f810e3cf2f0 "/var/lib/dirsrv/slapd-deftestinst/db/NamedPipeDB/aci.db", subdb=0x0, flags=0) at ../../src/db/db_remove.c:194
#12 0x00007f812de9ea6d in dblayer_db_remove_ex (env=env@entry=0x5583134194c0, path=0x7f810e3cf2f0 "/var/lib/dirsrv/slapd-deftestinst/db/NamedPipeDB/aci.db", dbName=dbName@entry=0x0, use_lock=use_lock@entry=0) at ldap/servers/slapd/back-ldbm/dblayer.c:3155
#13 0x00007f812de9ecf6 in dblayer_erase_index_file_ex (be=<optimized out>, a=0x5583132bac40, use_lock=use_lock@entry=1, no_force_checkpoint=no_force_checkpoint@entry=0) at ldap/servers/slapd/back-ldbm/dblayer.c:3273
#14 0x00007f812de9ee1c in dblayer_erase_index_file (be=<optimized out>, a=<optimized out>, no_force_chkpt=no_force_chkpt@entry=0) at ldap/servers/slapd/back-ldbm/dblayer.c:3299
#15 0x00007f812ded5c0a in ldbm_instance_index_config_delete_callback (pb=0x7f810e3d0a50, e=0x558313402180, entryAfter=<optimized out>, returncode=0x7f810e3d0428, returntext=<optimized out>, arg=0x558313482820) at ldap/servers/slapd/back-ldbm/ldbm_index_config.c:173
#16 0x00007f813995e3e9 in dse_call_callback (pb=pb@entry=0x7f810e3d0a50, operation=operation@entry=32, flags=flags@entry=1, entryBefore=entryBefore@entry=0x558313402180, entryAfter=entryAfter@entry=0x0, returncode=returncode@entry=0x7f810e3d0428, returntext=returntext@entry=0x7f810e3d0470 "", pdse=<optimized out>) at ldap/servers/slapd/dse.c:2637
#17 0x00007f8139961828 in dse_delete (pb=0x7f810e3d0a50) at ldap/servers/slapd/dse.c:2499
#18 0x00007f8139957fd0 in op_shared_delete (pb=pb@entry=0x7f810e3d0a50) at ldap/servers/slapd/delete.c:331
#19 0x00007f8139958352 in do_delete (pb=pb@entry=0x7f810e3d0a50) at ldap/servers/slapd/delete.c:97
#20 0x000055831199d642 in connection_dispatch_operation (pb=0x7f810e3d0a50, op=0x558313483ee0, conn=0x558313f19cd0) at ldap/servers/slapd/connection.c:623
#21 connection_threadmain () at ldap/servers/slapd/connection.c:1772
#22 0x00007f8137d0a9bb in _pt_root (arg=0x5583130c7cc0) at ../../../nspr/pr/src/pthreads/ptthread.c:216
#23 0x00007f81376aae25 in start_thread (arg=0x7f810e3d1700) at pthread_create.c:308
#24 0x00007f8136f8c34d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

The DS process (24756) is hanging (thread 10) because an other process (16411) acquired that mutex and did not release it.

    21d8 dd= 0 locks held 0    write locks 0    pid/thread 24756/140192266393344 flags 0    priority 100
    21d8 WRITE         1 WAIT    NamedPipeDB/aci.db        handle        0
    2147 dd=47 locks held 1    write locks 0    pid/thread 16411/140050708789440 flags 10   priority 100
    2147 READ          1 HELD    NamedPipeDB/aci.db        handle        0

The process 16411 is no longer running. Nothing in DS error logs or audit logs mentioned that process.

No core was dumped and audit log does not record crashes at that time.

The BZ https://bugzilla.redhat.com/show_bug.cgi?id=1277887, is fixed in the running version libdb-5.3.21-20.el7.x86_64.

Next steps

  - It would be great to find which process was 16411
  - .... :(
Comment 9 mreynolds 2017-10-19 16:47:21 EDT
This test passes for me on 1.3.7 for RHEL 7.5.  I ran the namedPipe tests 10 times without a hang.

Can someone please test the latest version of 1.3.7 from brew and verify the results?  Thanks!
Comment 10 Viktor Ashirov 2017-10-20 04:14:11 EDT
When these tests are executed standalone, they pass. But if they're executed in a full acceptance run, with namedPipeLog tests being last, hang occurs. I have a hunch that this bug is a side effect of bz1463204, but I need to confirm it.

I'll rerun the tests. Since it's a full run, it'll take some time, results should be available by Monday.
Comment 11 Viktor Ashirov 2017-10-26 10:42:00 EDT
Tests still hang, when namedPipeLog is the last test. I'm working on a standalone reproducer.

Note You need to log in before you can comment on or make changes to this bug.