Bug 1243950

Summary: When starting a replica agreement a deadlock can occur with an op updating nsuniqueid index
Product: Red Hat Enterprise Linux 7 Reporter: Noriko Hosoi <nhosoi>
Component: 389-ds-base Assignee: Noriko Hosoi <nhosoi>
Status: CLOSED ERRATA QA Contact: Viktor Ashirov <vashirov>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.0 CC: mreynolds, nhosoi, nkinder, rmeggins, sramling, tbordaz
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 389-ds-base-1.3.4.0-8.el7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-11-19 11:43:08 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Noriko Hosoi 2015-07-16 16:29:35 UTC
The version was 389-ds-base-1.3.3.9-1 (F21).

A write operation (such as a DEL) can update the nsuniqueid index. In betxn_postop, when it updates the changelog/RUV, it also tries to update the replica agreements, so it tries to acquire the RA locks while its transaction on the nsuniqueid index is still open.

If a replica agreement is started at the same time, it triggers an internal search to retrieve the current RUV. That internal search uses nsuniqueid, so the agreement accesses the nsuniqueid index while it is holding the RA lock. The two threads then want the same two resources in opposite order, and each can end up waiting on the other.
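The lock-order inversion described above can be illustrated with a small, self-contained Python sketch. It is not 389-ds-base code: the two `threading.Lock` objects are only stand-ins for the nsuniqueid index and the RA lock, and the names `simulate_opposite_lock_order`, `write_op`, and `agreement_start` are made up for this illustration. A timeout is used on the second acquire so the sketch reports the stall instead of hanging.

```python
import threading


def simulate_opposite_lock_order(timeout=0.2):
    """Two threads take two locks in opposite order; report who got stuck."""
    index_lock = threading.Lock()  # stand-in for the nsuniqueid index
    ra_lock = threading.Lock()     # stand-in for the replica agreement (RA) lock
    barrier = threading.Barrier(2)
    results = {}

    def worker(name, first, second):
        with first:
            # Wait until both threads hold their first lock.
            barrier.wait()
            # Try to take the lock the other thread is holding.
            got = second.acquire(timeout=timeout)
            results[name] = got
            # Keep holding the first lock until both attempts have finished,
            # so the outcome does not depend on thread timing.
            barrier.wait()
            if got:
                second.release()

    threads = [
        # Write operation: index first, then RA lock.
        threading.Thread(target=worker, args=("write_op", index_lock, ra_lock)),
        # Agreement start: RA lock first, then index (internal search).
        threading.Thread(target=worker, args=("agreement_start", ra_lock, index_lock)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Neither thread can obtain its second lock while the other holds it.
    print(simulate_opposite_lock_order())
```

In the sketch, both entries of the returned dictionary are `False`: each thread times out waiting for the lock the other one holds. Without the timeout, both would wait forever, which is the behaviour reported here.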

Could be related to fix:
Ticket 47368 - IPA server dirsrv RUV entry data excluded from replication

How to reproduce:
I reproduced it several times on an F21 VM with ticket47787_test.py

Comment 1 mreynolds 2015-07-17 20:31:01 UTC
Fixed upstream

Comment 3 Sankar Ramalingam 2015-07-30 09:37:09 UTC
To verify this bug, I think it's enough to run the upstream test ticket47787_test.py on RHEL 7.2 with the latest 389-ds-base.

Comment 4 Noriko Hosoi 2015-08-25 17:03:25 UTC
(In reply to Sankar Ramalingam from comment #3)
> To verify this bug, I think its enough to run the upstream test
> ticket47787_test.py on RHEL7.2 with latest 389-ds-base.

Yes, Thierry noted it in the ticket https://fedorahosted.org/389/ticket/48179.  Please repeat the testcase several times.  Just once may not be good enough...
> How to reproduce:
> I reproduced it several times on VM F21 with ticket47787_test.py

Thanks!
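Because the deadlock is timing dependent, repeating the test case as suggested above can be scripted. The following is a minimal sketch, not part of the upstream test suite: the helper name `run_repeatedly` and the 600-second timeout are assumptions, and the default command is taken from the verification log in comment 5. A run that exceeds the timeout is reported separately, since a hang is how a deadlock would most likely show up.

```python
import subprocess
import sys


def run_repeatedly(cmd, runs=4, timeout=600):
    """Run cmd `runs` times and return a list of (run_number, reason) failures."""
    failures = []
    for i in range(1, runs + 1):
        try:
            proc = subprocess.run(cmd, timeout=timeout)
        except subprocess.TimeoutExpired:
            # The run did not finish in time; this may indicate a deadlock.
            failures.append((i, "timeout"))
            continue
        if proc.returncode != 0:
            failures.append((i, "exit %d" % proc.returncode))
    return failures


if __name__ == "__main__":
    # Default command is an assumption based on the log in comment 5.
    command = sys.argv[1:] or ["./run_dirsrv.sh", "ticket47787_test.py"]
    failed = run_repeatedly(command)
    for run_number, reason in failed:
        print("run %d failed: %s" % (run_number, reason))
    sys.exit(1 if failed else 0)
```

An empty result means every run exited with status 0 within the timeout. Note that `subprocess.run` only kills the direct child on timeout, so a wrapper script may leave a hung server process behind that needs to be inspected or cleaned up by hand.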

Comment 5 Sankar Ramalingam 2015-09-14 08:53:16 UTC
I repeated the tests 4 times on the latest RHEL 7.2 VM and didn't see any deadlock or failure in the tests. Hence, marking the bug as Verified.

Build tested:
389-ds-base-1.3.4.0-15.el7.x86_64
389-ds-base-libs-1.3.4.0-15.el7.x86_64

DEBUG:lib389:running: /usr/sbin/remove-ds.pl -i slapd-master_1 
Instance slapd-master_1 removed.
INFO:lib389:dir (sys) : //etc/sysconfig
INFO:lib389:dir (priv): /home/sramling/.dirsrv
INFO:lib389:List from /home/sramling/.dirsrv
INFO:lib389:list instance {'RUN_DIR': '/var/run/dirsrv', 'DS_ROOT': '', 'SERVER_DIR': '/usr/lib64/dirsrv', 'INST_DIR': '/usr/lib64/dirsrv/slapd-master_2', 'SERVERBIN_DIR': '/usr/sbin', 'CONFIG_DIR': '/etc/dirsrv/slapd-master_2', 'PRODUCT_NAME': 'slapd'}

DEBUG:lib389:running: /usr/sbin/remove-ds.pl -i slapd-master_2 
Instance slapd-master_2 removed.
INFO:ticket47787_test:Testcase PASSED
PASSED

================================================================= 3 passed in 108.10 seconds ==================================================================
[root@dhcp35-196 sramling]# ./run_dirsrv.sh    ticket47787_test.py

Comment 6 errata-xmlrpc 2015-11-19 11:43:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2351.html