Description of problem:
Running system tests against IdM RHEL 7.5. Authentication and admin tests were scheduled to run for 3 days. After 1 day of running, all authentication and admin tests failed. Tried "kinit admin", which failed with "kinit: Generic error (see e-text) while getting initial credentials".

Version-Release number of selected component (if applicable):
RHEL 7.5

How reproducible:
Install 1 IPA server and 1 replica.
Config: 1000 users, 100 groups, 200 TOTP tokens, 200 HOTP tokens, 10 admin users.
Run the reliability system test load and monitor test results.

Steps to Reproduce:
1. Build a test environment as defined below
2. Configure the user/group/token environment as defined below
3. Apply STR periodic concurrent authn/admin system pos/neg load tests, such as:
   - authn kerberos test load
   - authn TOTP/HOTP 2-factor test load
   - sudo runtime test
   - admin user test scenarios
   - admin group test scenarios
   - admin enable/disable users
   - admin active/stage user test scenario

Actual results:
Authentication tests and admin tests all failed after 1 day of load.

Expected results:
Test failure rate should be consistent as tests continue running.

Additional info:
RPM info:
rpm -qi ipa-server
Name        : ipa-server
Version     : 4.5.4
Release     : 9.el7
Architecture: x86_64
Install Date: Tue 30 Jan 2018 08:19:29 PM EST
Group       : System Environment/Base
Size        : 1052958
License     : GPLv3+
Signature   : RSA/SHA256, Wed 24 Jan 2018 02:23:32 PM EST, Key ID 199e2f91fd431d51
Source RPM  : ipa-4.5.4-9.el7.src.rpm
Build Date  : Wed 24 Jan 2018 12:11:56 PM EST
Build Host  : x86-019.build.eng.bos.redhat.com
Relocations : (not relocatable)
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
Vendor      : Red Hat, Inc.
URL         : http://www.freeipa.org/
Summary     : The IPA authentication server

rpm -qi 389-ds-base
Name        : 389-ds-base
Version     : 1.3.7.5
Release     : 14.el7
Architecture: x86_64
Install Date: Tue 30 Jan 2018 08:19:28 PM EST
Group       : System Environment/Daemons
Size        : 5509074
License     : GPLv3+
Signature   : RSA/SHA256, Fri 26 Jan 2018 12:08:52 PM EST, Key ID 199e2f91fd431d51
Source RPM  : 389-ds-base-1.3.7.5-14.el7.src.rpm
Build Date  : Fri 26 Jan 2018 10:17:57 AM EST
Build Host  : x86-041.build.eng.bos.redhat.com
Relocations : (not relocatable)
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
Vendor      : Red Hat, Inc.
URL         : https://www.port389.org/
Summary     : 389 Directory Server (base)

**Test Environment: IdM RHEL 7.5
1. IPA server - metal
   RedHat-Linux-7.5-Maipo master.qe.sti.com 3.10.0-838.el7.x86_64
   RAM: 65GB, CPU: 10 core / 40 proc
2. replica 1 - metal
   RedHat-Linux-7.5-Maipo replica1.qe.sti.com 3.10.0-843.el7.x86_64
   RAM: 16GB, CPU: 10 core / 16 proc
**users: str_user1k / 10 groups / 400 tokens
**********************dirsrv error log*****************
[10/Feb/2018:11:45:50.847697465 -0500] - ERR - set_krb5_creds - Could not get initial credentials for principal [ldap/master95.qe.sti.com.COM] in keytab [FILE:/etc/dirsrv/ds.keytab]: -1765328324 (Generic error (see e-text))
[10/Feb/2018:11:45:54.689541815 -0500] - ERR - set_krb5_creds - Could not get initial credentials for principal [ldap/master95.qe.sti.com.COM] in keytab [FILE:/etc/dirsrv/ds.keytab]: -1765328324 (Generic error (see e-text))
********************************************************************************
*********************pstack <ns-slapd pid>*************
pstack 71735
Thread 1 (process 71735):
#0  0x00007f525ea68bb3 in select () at /lib64/libc.so.6
#1  0x00007f5261c970f9 in DS_Sleep () at /usr/lib64/dirsrv/libslapd.so.0
#2  0x00007f525388efc7 in trickle_threadmain () at /usr/lib64/dirsrv/plugins/libback-ldbm.so
#3  0x00007f525fa23c8b in _pt_root () at /lib64/libnspr4.so
#4  0x00007f525f3c3dd5 in start_thread () at /lib64/libpthread.so.0
#5  0x00007f525ea71aed in clone () at /lib64/libc.so.6
********************************************************************************
[root@master95 ~]# pstack 71648
Thread 1 (process 71648):
#0  0x00007f525f3c7945 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1  0x00007f5261efee63 in [IDLE THREAD] work_q_wait (tp=0x5644661cb740) at src/nunc-stans/ns/ns_thrpool.c:344
#2  0x00007f5261efee63 in worker_thread_func (arg=<optimized out>) at src/nunc-stans/ns/ns_thrpool.c:396
#3  0x00007f525f3c3dd5 in start_thread () at /lib64/libpthread.so.0
#4  0x00007f525ea71aed in clone () at /lib64/libc.so.6
********************************************************************************
[root@master95 ~]# pstack 71649
Thread 1 (process 71649):
#0  0x00007f525f3c7945 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1  0x00007f5261efee63 in [IDLE THREAD] work_q_wait (tp=0x5644661cb740) at src/nunc-stans/ns/ns_thrpool.c:344
#2  0x00007f5261efee63 in worker_thread_func (arg=<optimized out>) at src/nunc-stans/ns/ns_thrpool.c:396
#3  0x00007f525f3c3dd5 in start_thread () at /lib64/libpthread.so.0
#4  0x00007f525ea71aed in clone () at /lib64/libc.so.6
[root@master95 ~]#
********************************************************************************
Created attachment 1395010 [details] This is additional stack trace information.
More is needed. You are pinning this on 389-ds, but it is unclear what led you to that conclusion. We need the full logs from 389-ds and the krb5kdc log at a minimum. If 389-ds is hung, then a gdb traceback is more useful:

gdb -ex 'set confirm off' -ex 'set pagination off' -ex 'thread apply all bt full' -ex 'quit' /usr/sbin/ns-slapd `pidof ns-slapd` > stacktrace.`date +%s`.txt 2>&1
Created attachment 1395042 [details] Gdb stacktrace is attached.
*********************krb5kdc stack trace**********************************
[root@master95 ~]# pstack 71832
#0  0x00007fa3fc5b3032 in wait () from /lib64/libc.so.6
#1  0x000055aa39d68cbf in main ()
[root@master95 ~]# pstack 71833
#0  0x00007fa3fc5ed0a3 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1  0x00007fa3ee61b803 in epoll_dispatch () from /lib64/libevent-2.0.so.5
#2  0x00007fa3ee6073ea in event_base_loop () from /lib64/libevent-2.0.so.5
#3  0x000055aa39d68a76 in main ()
[root@master95 ~]# pstack 71834
#0  0x00007fa3fc5ed0a3 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1  0x00007fa3ee61b803 in epoll_dispatch () from /lib64/libevent-2.0.so.5
#2  0x00007fa3ee6073ea in event_base_loop () from /lib64/libevent-2.0.so.5
#3  0x000055aa39d68a76 in main ()
[root@master95 ~]# pstack 71844
#0  0x00007fa3fc5ed0a3 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1  0x00007fa3ee61b803 in epoll_dispatch () from /lib64/libevent-2.0.so.5
#2  0x00007fa3ee6073ea in event_base_loop () from /lib64/libevent-2.0.so.5
#3  0x000055aa39d68a76 in main ()
Created attachment 1395072 [details] Dirsrv error log
Created attachment 1395073 [details] Dirsrv access log 1
Created attachment 1395146 [details] Dirsrv access log 2
Created attachment 1395148 [details] Krb5kdc log is attached
Created attachment 1395149 [details] dse.ldif is attached
Created attachment 1395207 [details] pstack log attached
Created attachment 1395208 [details] top stack trace output file is attached
The first error in the DS error log happened at 09/Feb/2018 11:35:55. The test failure rate was increasing from that point, but not yet at 100%. At 00:41:00 on 10/Feb/2018, all authentication tests failed.
***********************from DS error log**********************************
[09/Feb/2018:11:35:55.587426531 -0500] - ERR - schema-compat-plugin - scheduled schema-compat-plugin tree scan in about 5 seconds after the server startup
- The two threads are in deadlock because they acquire two locks in the opposite order:
  - monitor lock (conn->c_job->monitor)
  - connection lock (conn->c_mutex)

On the same connection/job:
(gdb) thread 46
(gdb) frame 7
(gdb) print opconnid
$4 = 31572
(gdb) thread 100
(gdb) frame 2
(gdb) print c->c_connid
$3 = 31572

- The two threads are:
(gdb) thread 46
[Switching to thread 46 (Thread 0x7f51a408d700 (LWP 71786))]
#0  0x00007f525f3ca4cd in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) where
#0  0x00007f525f3ca4cd in __lll_lock_wait () at /lib64/libpthread.so.0
#1  0x00007f525f3c5de6 in _L_lock_870 () at /lib64/libpthread.so.0
#2  0x00007f525f3c5cdf in pthread_mutex_lock () at /lib64/libpthread.so.0
#3  0x00007f5261efebc6 in ns_job_done (job=0x5644764a3770) at src/nunc-stans/ns/ns_thrpool.c:760
#4  0x0000564465373cd7 in ns_connection_post_io_or_closing (conn=conn@entry=0x56447030bd40) at ldap/servers/slapd/daemon.c:1693
#5  0x000056446536fb04 in disconnect_server_nomutex_ext (conn=conn@entry=0x56447030bd40, opconnid=opconnid@entry=31572, opid=opid@entry=276, reason=reason@entry=-4996, error=error@entry=0, schedule_closure_job=schedule_closure_job@entry=1) at ldap/servers/slapd/connection.c:2301
#6  0x00005644653703a3 in disconnect_server_nomutex (error=0, reason=-4996, opid=276, opconnid=31572, conn=0x56447030bd40) at ldap/servers/slapd/connection.c:2314
#7  0x00005644653703a3 in disconnect_server (conn=0x56447030bd40, opconnid=31572, opid=276, reason=-4996, error=0) at ldap/servers/slapd/connection.c:2209
#8  0x00007f5261c73c13 in flush_ber (pb=pb@entry=0x5644702952c0, conn=0x56447030bd40, op=0x564478a12000, ber=ber@entry=0x56447899bef0, type=type@entry=0) at ldap/servers/slapd/result.c:1780
#9  0x00007f5261c75db0 in send_ldap_result_ext (pb=0x5644702952c0, err=0, matched=<optimized out>, text=<optimized out>, nentries=0, urls=<optimized out>, ber=0x56447899bef0, ber@entry=0x0) at ldap/servers/slapd/result.c:600
#10 0x00007f5261c75f71 in send_ldap_result (pb=<optimized out>, err=<optimized out>, matched=<optimized out>, text=<optimized out>, nentries=<optimized out>, urls=<optimized out>) at ldap/servers/slapd/result.c:199
#11 0x00007f5261c5f673 in slapi_send_ldap_result (pb=pb@entry=0x5644702952c0, err=0, matched=matched@entry=0x0, text=0x0, nentries=nentries@entry=0, urls=urls@entry=0x0) at ldap/servers/slapd/plugin.c:781
#12 0x00007f52538cce49 in ldbm_back_modify (pb=<optimized out>) at ldap/servers/slapd/back-ldbm/ldbm_modify.c:1000
#13 0x00007f5261c4d656 in op_shared_modify (pb=pb@entry=0x5644702952c0, pw_change=pw_change@entry=0, old_pw=0x0) at ldap/servers/slapd/modify.c:1021
#14 0x00007f5261c4eb7b in do_modify (pb=pb@entry=0x5644702952c0) at ldap/servers/slapd/modify.c:391
#15 0x00005644653711e2 in connection_dispatch_operation (pb=0x5644702952c0, op=0x564478a12000, conn=0x56447030bd40) at ldap/servers/slapd/connection.c:624
#16 0x00005644653711e2 in connection_threadmain () at ldap/servers/slapd/connection.c:1769
#17 0x00007f525fa23c8b in _pt_root () at /lib64/libnspr4.so
#18 0x00007f525f3c3dd5 in start_thread () at /lib64/libpthread.so.0
#19 0x00007f525ea71aed in clone () at /lib64/libc.so.6

Thread 100 (Thread 0x7f52284e6700 (LWP 71728)):
#0  0x00007f525f3c7945 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1  0x00007f525fa1e503 in PR_EnterMonitor () at /lib64/libnspr4.so
#2  0x0000564465373ffa in ns_handle_pr_read_ready ()
#3  0x00007f5261efec89 in work_job_execute (job=0x5644764a3770) at src/nunc-stans/ns/ns_thrpool.c:291
#4  0x00007f5261effbe3 in event_cb (fd=<optimized out>, event=<optimized out>, arg=<optimized out>) at src/nunc-stans/ns/ns_event_fw_event.c:118
#5  0x00007f525ef54a14 in event_base_loop () at /lib64/libevent-2.0.so.5
#6  0x00007f5261effeae in ns_event_fw_loop (ns_event_fw_ctx=<optimized out>) at src/nunc-stans/ns/ns_event_fw_event.c:308
#7  0x00007f5261efeac9 in event_loop_thread_func (arg=0x5644661cb740) at src/nunc-stans/ns/ns_thrpool.c:581
#8  0x00007f525f3c3dd5 in start_thread () at /lib64/libpthread.so.0
#9  0x00007f525ea71aed in clone () at /lib64/libc.so.6

- Details of the debug:
Thread 100 acquired the 'monitor' lock (in work_job_execute) before trying to acquire 'c_mutex' (ns_handle_pr_read_ready).
Thread 100 waits on conn->c_mutex:
(gdb) thread 100
(gdb) frame 2
(gdb) print c->c_mutex
$12 = (PRMonitor *) 0x564472171130
(gdb) print (void *)((struct MY_PRMonitor *) 0x564472171130)->owner
$37 = (void *) 0x7f51a408d700
--> That is held by Thread 46

Thread 46 acquired the 'c_mutex' lock (disconnect_server) before trying to acquire the 'monitor' (ns_job_done).
Thread 46 waits on conn->c_job->monitor:
(gdb) print job
$5 = (ns_job_t *) 0x5644764a3770
(gdb) print job->monitor.__data.__owner
$3 = 71728
--> That is held by Thread 100

- The scenario is rare because it happens only when, at the same time, the server:
  - adds the connection to the event queue to "poll" it
  - closes the connection
[root@master95 ~]# rpm -qi libdb
Name        : libdb
Version     : 5.3.21
Release     : 24.el7
Architecture: x86_64
Install Date: Tue 30 Jan 2018 03:42:19 PM EST
Group       : System Environment/Libraries
Size        : 1853360
License     : BSD and LGPLv2 and Sleepycat
Signature   : RSA/SHA256, Thu 11 Jan 2018 10:27:08 AM EST, Key ID 199e2f91fd431d51
Source RPM  : libdb-5.3.21-24.el7.src.rpm
Build Date  : Thu 11 Jan 2018 07:54:21 AM EST
Build Host  : x86-019.build.eng.bos.redhat.com
Relocations : (not relocatable)
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
Vendor      : Red Hat, Inc.
URL         : http://www.oracle.com/database/berkeley-db/
Summary     : The Berkeley DB database library for C
So, under this condition, when the threads are in deadlock and the system is no longer operational, could you please indicate which service/process needs to be restarted for the IdM environment to become operational again? Thanks.
Hi, this has been found and reported upstream. For more, see https://pagure.io/389-ds-base/issue/49569. I'll be working on it soon.
We ran the same test on RHEL 7.4 for 12 days during the last holiday shutdown. We did not see this issue during that 12-day test period.
Thank you sooo much for having already tested it! We can try to determine what differs between 7.4 and 7.5 in the connection handling. By any chance, do you know which 389-ds-base version you tested on 7.4 (1.3.6.xx)?
RPM info when running IPA system testing on RHEL 7.4:

rpm -qi ipa-server
Name        : ipa-server
Version     : 4.5.0
Release     : 21.el7_4.1.2
Architecture: x86_64
Install Date: Mon 16 Oct 2017 10:58:54 AM EDT
Group       : System Environment/Base
Size        : 1050087
License     : GPLv3+
Signature   : RSA/SHA256, Thu 17 Aug 2017 03:51:44 AM EDT, Key ID 199e2f91fd431d51
Source RPM  : ipa-4.5.0-21.el7_4.1.2.src.rpm
Build Date  : Wed 16 Aug 2017 02:31:08 PM EDT
Build Host  : x86-041.build.eng.bos.redhat.com

rpm -qi 389-ds-base
Name        : 389-ds-base
Version     : 1.3.6.1
Release     : 19.el7_4
Architecture: x86_64
Install Date: Mon 16 Oct 2017 10:58:39 AM EDT
Group       : System Environment/Daemons
Size        : 5480103
License     : GPLv3+
Signature   : RSA/SHA256, Fri 25 Aug 2017 03:59:01 AM EDT, Key ID 199e2f91fd431d51
Source RPM  : 389-ds-base-1.3.6.1-19.el7_4.src.rpm
Build Date  : Tue 22 Aug 2017 09:33:10 AM EDT
Build Host  : x86-019.build.eng.bos.redhat.com
Hi Min, The investigations to fix the current deadlock can be long/risky. We know deadlock condition was introduced in some builds in 7.5. We identified possible guilty fixes that came after 1.3.7.5-9. Would it be possible to run the STR test with 1.3.7.5-9 (https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=618728) ? If it does not hang, it would greatly help to reduce the scope of investigations.
STR tests complete successfully (*without* hang) on 1.3.7.5-9. So the problem was introduced somewhere in the range (1.3.7.5-9, 1.3.7.5-14]. Suspecting a batch of Coverity fixes done in 1.3.7.5-14, the next tests will be done on 1.3.7.5-13.
We ran the system load test with DS versions 1.3.7.5-9 and 1.3.7.5-13 and did not observe the IPA-server-not-responding issue during a two-and-a-half-day test period. The server-not-responding issue is reproduced with DS version 1.3.7.5-14 after 27 hours of system load testing. Load type and thread numbers are the same as when we reported the bug on 2018-02-12. The server stopped responding and all tests started failing around 14:29 EDT on March 21st, 2018.
The correct failure time is 21/Mar/2018:14:36. DIRSRV log errors:
[21/Mar/2018:14:36:09.887351063 -0400] - ERR - set_krb5_creds - Could not get initial credentials for principal [ldap/master95.qe.sti.com.COM] in keytab [FILE:/etc/dirsrv/ds.keytab]: -1765328228 (Cannot contact any KDC for requested realm)
[21/Mar/2018:14:36:13.161005418 -0400] - ERR - set_krb5_creds - Could not get initial credentials for principal [ldap/master95.qe.sti.com.COM] in keytab [FILE:/etc/dirsrv/ds.keytab]: -1765328228 (Cannot contact any KDC for requested realm)
Thanks Min for this great piece of information. Your tests on -14 hit the same deadlock as https://bugzilla.redhat.com/show_bug.cgi?id=1544477#c20

Thread 60 (Thread 0x7f67513d0700 (LWP 2414)):
#0  0x00007f68048be4cd in __lll_lock_wait () at /lib64/libpthread.so.0
#1  0x00007f68048b9de6 in _L_lock_870 () at /lib64/libpthread.so.0
#2  0x00007f68048b9cdf in pthread_mutex_lock () at /lib64/libpthread.so.0
#3  0x00007f68073f2bc6 in ns_job_done (job=0x5636f47962d0) at src/nunc-stans/ns/ns_thrpool.c:760
#4  0x00005636e583dcd7 in ns_connection_post_io_or_closing (conn=conn@entry=0x5636f09ecb18) at ldap/servers/slapd/daemon.c:1693
#5  0x00005636e5839b04 in disconnect_server_nomutex_ext (conn=conn@entry=0x5636f09ecb18, opconnid=opconnid@entry=23341, opid=opid@entry=608, reason=reason@entry=-4996, error=error@entry=0, schedule_closure_job=schedule_closure_job@entry=1) at ldap/servers/slapd/connection.c:2301
#6  0x00005636e583a3a3 in disconnect_server_nomutex (error=0, reason=-4996, opid=608, opconnid=23341, conn=0x5636f09ecb18) at ldap/servers/slapd/connection.c:2314
#7  0x00005636e583a3a3 in disconnect_server (conn=0x5636f09ecb18, opconnid=23341, opid=608, reason=-4996, error=0) at ldap/servers/slapd/connection.c:2209
#8  0x00007f6807167c13 in flush_ber (pb=pb@entry=0x5636f0d46240, conn=0x5636f09ecb18, op=0x5636fbfae000, ber=ber@entry=0x5636f57e1450, type=type@entry=0) at ldap/servers/slapd/result.c:1780
#9  0x00007f6807169db0 in send_ldap_result_ext (pb=0x5636f0d46240, err=0, matched=<optimized out>, text=<optimized out>, nentries=0, urls=<optimized out>, ber=0x5636f57e1450, ber@entry=0x0) at ldap/servers/slapd/result.c:600
#10 0x00007f6807169f71 in send_ldap_result (pb=<optimized out>, err=<optimized out>, matched=<optimized out>, text=<optimized out>, nentries=<optimized out>, urls=<optimized out>) at ldap/servers/slapd/result.c:199
#11 0x00007f6807153673 in slapi_send_ldap_result (pb=pb@entry=0x5636f0d46240, err=0, matched=matched@entry=0x0, text=0x0, nentries=nentries@entry=0, urls=urls@entry=0x0) at ldap/servers/slapd/plugin.c:781
#12 0x00007f67f8dbfe49 in ldbm_back_modify (pb=<optimized out>) at ldap/servers/slapd/back-ldbm/ldbm_modify.c:1000
#13 0x00007f6807141656 in op_shared_modify (pb=pb@entry=0x5636f0d46240, pw_change=pw_change@entry=0, old_pw=0x0) at ldap/servers/slapd/modify.c:1021
#14 0x00007f6807142b7b in do_modify (pb=pb@entry=0x5636f0d46240) at ldap/servers/slapd/modify.c:391
#15 0x00005636e583b1e2 in connection_dispatch_operation (pb=0x5636f0d46240, op=0x5636fbfae000, conn=0x5636f09ecb18) at ldap/servers/slapd/connection.c:624
#16 0x00005636e583b1e2 in connection_threadmain () at ldap/servers/slapd/connection.c:1769
#17 0x00007f6804f17c8b in _pt_root () at /lib64/libnspr4.so
#18 0x00007f68048b7dd5 in start_thread () at /lib64/libpthread.so.0
#19 0x00007f6803f64b3d in clone () at /lib64/libc.so.6

Thread 99 (Thread 0x7f67cd9d9700 (LWP 2372)):
#0  0x00007f68048bb945 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1  0x00007f6804f12503 in PR_EnterMonitor () at /lib64/libnspr4.so
#2  0x00005636e583dffa in ns_handle_pr_read_ready (job=0x5636f47962d0) at ldap/servers/slapd/daemon.c:1782
#3  0x00007f68073f2c89 in work_job_execute (job=0x5636f47962d0) at src/nunc-stans/ns/ns_thrpool.c:291
#4  0x00007f68073f3be3 in event_cb (fd=<optimized out>, event=<optimized out>, arg=<optimized out>) at src/nunc-stans/ns/ns_event_fw_event.c:118
#5  0x00007f6804447a14 in event_base_loop () at /lib64/libevent-2.0.so.5
#6  0x00007f68073f3eae in ns_event_fw_loop (ns_event_fw_ctx=<optimized out>) at src/nunc-stans/ns/ns_event_fw_event.c:308
#7  0x00007f68073f2ac9 in event_loop_thread_func (arg=0x5636e68df740) at src/nunc-stans/ns/ns_thrpool.c:581
#8  0x00007f68048b7dd5 in start_thread () at /lib64/libpthread.so.0
#9  0x00007f6803f64b3d in clone () at /lib64/libc.so.6

I will investigate what could trigger this issue in -14 that was not in -13.
Regarding your tests on -14, did you notice any failures around OTP authentication like those you reported during the tests of -13?
The OTP failure rate is high on builds -13 and -14. OTP bug: https://bugzilla.redhat.com/show_bug.cgi?id=1508498
The only difference between our test runs against build -13 and build -14 is: when running build -13, we did not run the HOTP test, only TOTP. When running build -14, we ran both HOTP and TOTP. We could rerun build -13 with HOTP.
If the load differs between -13 and -14, it can contribute to the connection hang (which is dynamic). I agree it would be good to test -13 with HOTP and TOTP, because we expect it to fail like -14.
Hi Viktor, the STR team successfully ran the IdM test with Directory Server version/build 1.3.7.5-11 with no deadlock issue. System test duration: 4+ days.

Run ID: R-3909
Profile Name: idm-75-sys-load-test
Run Name: idm system test_11_lce-2
Tester: milei
State: Complete
Group: idm
Start Date: 2018-03-28 16:04:24
Stop Date: 2018-04-02 08:04:24
Test Duration: Days:4, Hrs:16, Mins:0, Secs:0
Lab Controller Engine: LCE-2-10.8.48.121:8085
(In reply to baiesi from comment #53)
> Hi Viktor, The STR team successfully ran Idm Test with DirServer ver/build
> 1.3.7.5-11 with no deadlock issue. System test duration 4+ days.
>
> Run ID R-3909
> Profile Name idm-75-sys-load-test
> Run Name idm system test_11_lce-2
> Tester milei
> State Complete
> Group idm
> Start Date 2018-03-28 16:04:24
> Stop Date 2018-04-02 08:04:24
> Test Duration Days:4, Hrs:16, Mins:0, Secs:0
> Lab Controller Engine LCE-2-10.8.48.121:8085

Hi Bruce, thanks for this great news. It confirms the current plan is the right one. A new build (0day) that reverts a patch (DS ticket 48184) introduced in 1.3.7.5-12 will be available soon (https://bugzilla.redhat.com/show_bug.cgi?id=1563107). Will you have bandwidth to test this 0day build when it is available?
Hi Thierry: The STR team will install the 0day build with the fix when it is available.
Min, Bruce, thank you soooo much. The build is now available https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=668919
Verified.

Version ::
ipa-server-4.6.4-6.el7.x86_64
389-ds-base-1.3.8.4-10.el7.x86_64

Results ::
Run ID: R-4740
Profile Name: idm-76-sys-load-test-bz1544477-qe.sti.com-2-7day
Run Name: idm system test_3_lce-1
Tester: spoore
State: Complete
Group: idm
Start Date: 2018-08-21 18:30:39
Stop Date: 2018-08-25 18:30:39
Test Duration: Days:4, Hrs:0, Mins:0, Secs:0
Lab Controller Engine: LCE-1-10.8.48.121:8084

Regardless of its name, this was a 4-day system load test run. The profile was created with the original intention of 7-day runs but, due to timing, that was shortened for this particular run. Excluding OTP tests, which were affected by another bug, the results for this 4-day run were:

Passed: 26,446
Failed: 8

The failures above were not related to the authentication and admin test failures seen initially. No drastic numbers or rates of failure were seen. The same so far appears to be true for another run going on now.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:3127