Bug 677483

Summary: export task followed by import task causes cache assertion
Product: Red Hat Enterprise Linux 6
Reporter: Rich Megginson <rmeggins>
Component: 389-ds-base
Assignee: Rich Megginson <rmeggins>
Status: CLOSED ERRATA
QA Contact: Chandrasekar Kannan <ckannan>
Severity: high
Priority: high
Version: 6.1
CC: amsharma, benl, jgalipea, nhosoi, shaines
Target Milestone: rc
Keywords: screened
Hardware: x86_64
OS: Linux
Fixed In Version: 389-ds-base-1.2.8-0.3.a3.el6
Doc Type: Bug Fix
Clone Of: 676053
Last Closed: 2011-05-19 08:41:51 EDT
Bug Depends On: 676053, 678369
Bug Blocks: 639035, 656390, 676871

Comment 1 Scott Haines 2011-02-21 16:35:54 EST
*** Bug 678369 has been marked as a duplicate of this bug. ***
Comment 6 Amita Sharma 2011-03-31 02:32:44 EDT
Hi Noriko,

I am testing Bug 677483 - export task followed by import task causes cache assertion.
This is a very interesting and big one, like an interesting novel :)

I concluded there are in total 3 bugs under this one -

1. Description: The task version of export had a bug in handling the busy instance error case. When returning because of the busy error, the function ldbm_back_ldbm2ldif reset the busy bit even when it had been set by another thread. This patch checks the special return value set in the busy error case and resets the busy bit only when it was set by the function itself.

Verify steps :
Set up 2-way MMR (multi-master replication)
1. In one window, as root:
# cd /usr/lib64/dirsrv/slapd-amsharma
# while true; do
./db2ldif.pl -D "cn=directory Manager" -w Secret123 -r -n userRootNT -a /tmp/export.ldif; ./ldif2db.pl -D "cn=Directory Manager" -w Secret123 -n userRootNT -i /tmp/export.ldif; done

**********It gives me Operations error (1) **********************************


2. In another window:
$ while true; do ldapsearch -x -h localhost -p 1389 -D "cn=Directory Manager" -w Secret123 -b "dc=example,dc=com" "(cn=*)"; done

**********It gives me Operations error (1)**************************************

Run the commands for an hour or so.
If the entrycache_clear_int error message does not appear in the error log
and the server keeps running, the fix is verified.
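
To bound the "hour or so" run instead of interrupting the loops by hand, the loops above could be wrapped in a small helper. This is only a sketch: repeat_for is a hypothetical name, not something shipped with 389-ds-base, and it assumes a date command that supports the +%s format.

```shell
# repeat_for SECONDS COMMAND [ARGS...]
# Hypothetical helper: rerun COMMAND until SECONDS have elapsed.
repeat_for() {
    duration=$1
    shift
    deadline=$(( $(date +%s) + duration ))
    while [ "$(date +%s)" -lt "$deadline" ]; do
        # A failing iteration (for example "Operations error") must not end the run.
        "$@" || true
    done
}
```

For the export/import window this might be called as repeat_for 3600 sh -c './db2ldif.pl ...; ./ldif2db.pl ...', with the same options as in step 1. The elapsed-time check happens between iterations, so the last iteration can run slightly past the deadline.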

************I am getting the output below ***********************************************
[amsharma@amsharma /]$ tail -f /var/log/dirsrv/slapd-amsharma/errors
[30/Mar/2011:19:08:00 +051800] - ldbm: 'example' is already in the middle of another task and cannot be disturbed.
[30/Mar/2011:19:08:00 +051800] - import example: Processing file "/tmp/export.ldif"
[30/Mar/2011:19:08:00 +051800] - import example: Finished scanning file "/tmp/export.ldif" (9 entries)
[30/Mar/2011:19:08:01 +051800] - import example: Workers finished; cleaning up...
[30/Mar/2011:19:08:01 +051800] - import example: Workers cleaned up.
[30/Mar/2011:19:08:01 +051800] - import example: Cleaning up producer thread...
[30/Mar/2011:19:08:01 +051800] - import example: Indexing complete.  Post-processing...
[30/Mar/2011:19:08:01 +051800] - import example: Flushing caches...
[30/Mar/2011:19:08:01 +051800] - import example: Closing files...
[30/Mar/2011:19:08:02 +051800] - import example: Import complete.  Processed 9 entries in 1 seconds. (9.00 entries/sec)

********************Noriko, is this as expected? I am not getting the "entrycache_clear_int" error.*********************************************************************************
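
When reading a long errors log from this test, it can help to tally how often a task was turned away as busy versus how often an import ran to completion. The sketch below counts the two message texts that appear in the log excerpt above; summarize_tasks is a hypothetical helper name, and treating the "already in the middle of another task" line as the busy-instance case from description 1 is an assumption.

```shell
# summarize_tasks LOGFILE
# Hypothetical helper: count busy rejections and completed imports in an errors log.
summarize_tasks() {
    log=$1
    # grep -c prints 0 and exits with status 1 when nothing matches, so ignore the status.
    busy=$(grep -c 'is already in the middle of another task' "$log" || true)
    completed=$(grep -c 'Import complete' "$log" || true)
    echo "busy=$busy imports_completed=$completed"
}
```

For example: summarize_tasks /var/log/dirsrv/slapd-amsharma/errors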

2. Description: 
When Simple Paged Results is requested and a page is
returned, one entry is read ahead to check whether more entries
exist.  The read-ahead retrieves an entry (if any) and adds
it to the entry cache.  The Simple Paged Results code puts the
read-ahead entry back, but the call to cache_return for the entry
(which decrements the refcnt) was missing.  If ldif2db.pl is called
with the cache in this state, it finds the entry that is still referenced.
This patch calls cache_return when Simple Paged Results puts
the read-ahead entry back.  It also adds a debug function, dump_hash.

Verify steps :
Prepare a server with some entries (> 10 entries).
Run a Simple Paged Results search and stop before getting all entries.
  $ ldapsearch -x -h localhost -p 1389 -b "dc=example,dc=com" -E pr=2 "(cn=*)"
  ...
  # search result
  search: 8
  result: 0 Success
  control: 1.2.840.113556.1.4.319 false MAcCAgPbBAEy
  pagedresults: estimate=987 cookie=Mg==
  Press [size] Enter for the next {2|size} entries.

Run ./db2ldif.pl
  # ./db2ldif.pl -D 'cn=directory manager' -w Secret123 -n userRootNT -a /tmp/export.ldif

Run ./ldif2db.pl
  # ./ldif2db.pl -n userRootNT -D 'cn=directory manager' -w Secret123 -i /tmp/export.ldif
  
Check error log:
  # grep entrycache_clear_int /var/log/dirsrv/slapd-ID/errors
  # echo $?
  1
If the keyword is not found in the error log, the bug is verified.
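
One caveat with the grep / echo $? check: grep exits with 1 when the keyword is absent, but with 2 when the log cannot be read, and only the first case means the fix is verified. A minimal sketch that keeps the three outcomes apart follows; check_cache_assertion is a hypothetical helper name.

```shell
# check_cache_assertion LOGFILE
# Hypothetical helper: returns 0 when the keyword is absent (verified),
# 1 when it is present, and 2 when the log could not be read.
check_cache_assertion() {
    log=$1
    status=0
    grep -q 'entrycache_clear_int' "$log" 2>/dev/null || status=$?
    case $status in
        0) echo "FAILED: entrycache_clear_int found in $log"; return 1 ;;
        1) echo "VERIFIED: entrycache_clear_int not found in $log"; return 0 ;;
        *) echo "ERROR: cannot read $log"; return 2 ;;
    esac
}
```

For example: check_cache_assertion /var/log/dirsrv/slapd-ID/errors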

*************This one is all set, no errors are there**********************************************

3. Description: 
When a search request with a VLV and/or SORT control
failed, the entry was not returned to the entry cache.  The
entry keeps a positive refcnt and is not cleared even by cache_clear.
This patch adds a CACHE_RETURN call for the error cases.

********I think it is covered by the 4th one? Or do I need to verify this with some other steps? ********

4. Description:
There were 3 places where an entry was not released
by CACHE_RETURN (so its refcnt was not decremented).  If an entry has a positive
refcnt in the entry cache, it is not released even if the entry
is never accessed again.
1. When a search request with a VLV and/or SORT control fails.
2. When comparing entries in compare_entries_sv, and the second
entry is not found, the first entry is not released.
3. vlv_trim_candidates_byvalue retrieves entries to perform a
binary search over the candidate list and puts them into the cache.
They are not released.

Verify steps.
1. Set up a server with suffix "o=umc".
./AddSuffix "o=umc" userRootNT 1389 localhost

Shut down the server

put 99umcschema.ldif in /etc/dirsrv/slapd-ID/schema

open dse.ldif and append the contents of index.ldif at the end of the file

import db_Febr15_noRep.ldif.

ldif2db -n userRootNT -i /tmp/db_Febr15_noRep.ldif

start the server

2. run ldapsearch:
$ /usr/lib[64]/mozldap/ldapsearch -p <port> -D 'cn=cli,ou=components,o=operators,o=UMC' -w cli -b "o=umc" -s sub -S '-createTimestamp' -x -G 0:2:3 "(objectClass=*)" "createTimestamp modifyTimestamp"

cd /usr/lib64/mozldap
/usr/lib64/mozldap/ldapsearch -p 1389 -D 'cn=cli,ou=components,o=operators,o=UMC' -w cli -b "o=umc" -s sub -S '-createTimestamp' -x -G 0:2:3 "(objectClass=*)" "createTimestamp modifyTimestamp"

******** The above ldapsearch is not giving me any output. Is that as expected? ****************

3. run task import
cd /usr/lib64/dirsrv/slapd-amsharma
# ./ldif2db.pl -D 'cn=directory manager' -w Secret123 -n userRootNT -i /tmp/db_Febr15_noRep.ldif

4. check error log
# grep entrycache_clear_int /var/log/dirsrv/slapd-amsharma/errors
If the keyword is not found, the bug is verified.
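
Because the errors log accumulates across runs, a plain grep can also pick up entrycache_clear_int lines left over from an earlier test. One way around that, sketched here with hypothetical helper names (log_mark, new_hits), is to note the log length before the test and scan only the lines written afterwards. It assumes the log is not rotated or truncated between the two calls.

```shell
# log_mark LOGFILE: print the current line count of LOGFILE (0 if it is unreadable).
log_mark() {
    if [ -r "$1" ]; then
        wc -l < "$1" | tr -d ' '
    else
        echo 0
    fi
}

# new_hits LOGFILE MARK: count entrycache_clear_int lines added after line MARK.
new_hits() {
    # grep -c prints 0 and exits with status 1 when nothing matches, so ignore the status.
    tail -n "+$(( $2 + 1 ))" "$1" | grep -c 'entrycache_clear_int' || true
}
```

A possible sequence: mark=$(log_mark /var/log/dirsrv/slapd-amsharma/errors) before step 2, then new_hits /var/log/dirsrv/slapd-amsharma/errors "$mark" after step 3. A result of 0 corresponds to "keyword not found".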

****Note : I did not face any slapd crash or "entrycache_clear_int" error while executing the above scenarios************************
Comment 8 errata-xmlrpc 2011-05-19 08:41:51 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-0533.html