Bug 676053 - export task followed by import task causes cache assertion
Summary: export task followed by import task causes cache assertion
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: 389
Classification: Retired
Component: Database - Import/Export
Version: 1.2.7
Hardware: x86_64
OS: Linux
high
unspecified
Target Milestone: ---
Assignee: Noriko Hosoi
QA Contact: Viktor Ashirov
URL:
Whiteboard:
Depends On:
Blocks: 639035 389_1.2.8 677483 678369
TreeView+ depends on / blocked
 
Reported: 2011-02-08 17:58 UTC by reinhard nappert
Modified: 2015-12-07 17:09 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 677483 678369 (view as bug list)
Environment:
Last Closed: 2015-12-07 17:09:38 UTC
Embargoed:


Attachments (Terms of Use)
git patch file (master) (4.72 KB, patch)
2011-02-11 00:46 UTC, Noriko Hosoi
nkinder: review+
Details | Diff
git patch file (master) (4.18 KB, patch)
2011-02-14 22:31 UTC, Noriko Hosoi
no flags Details | Diff
revised git patch file (master) (4.25 KB, patch)
2011-02-14 23:10 UTC, Noriko Hosoi
nkinder: review+
rmeggins: review+
Details | Diff
The errors and the exported ldif file (68.44 KB, application/x-gzip)
2011-02-15 16:15 UTC, reinhard nappert
no flags Details
core file, when export fails (73.30 KB, text/plain)
2011-02-15 19:36 UTC, reinhard nappert
no flags Details
access file in response of Comment #29 (7.02 MB, application/octet-stream)
2011-02-15 21:26 UTC, reinhard nappert
no flags Details
git patch file (master) (6.64 KB, patch)
2011-02-16 02:49 UTC, Noriko Hosoi
nkinder: review+
Details | Diff
access, errors and core info file (18.61 KB, application/x-gzip)
2011-02-16 15:16 UTC, reinhard nappert
no flags Details
git patch file (master) (15.24 KB, patch)
2011-02-17 00:54 UTC, Noriko Hosoi
nhosoi: review?
rmeggins: review+
Details | Diff

Description reinhard nappert 2011-02-08 17:58:34 UTC
I have a working MM setup and I exported my db with db2ldif.pl with the -r option:
 
db2ldif.pl -D 'cn=Directory Manager' -w password -n userRoot -r -a /tmp/db_replica.ldif
 
The errors file do not indicate an issue:
[01/Feb/2011:09:23:59 -0500] - Beginning export of 'userRoot'
[01/Feb/2011:09:23:59 -0500] - export userRoot: Processed 1000 entries (10%).
[01/Feb/2011:09:23:59 -0500] - export userRoot: Processed 2000 entries (21%).
[01/Feb/2011:09:23:59 -0500] - export userRoot: Processed 3000 entries (32%).
[01/Feb/2011:09:24:00 -0500] - export userRoot: Processed 4000 entries (43%).
[01/Feb/2011:09:24:00 -0500] - export userRoot: Processed 5000 entries (54%).
[01/Feb/2011:09:24:00 -0500] - export userRoot: Processed 6000 entries (65%).
[01/Feb/2011:09:24:00 -0500] - export userRoot: Processed 7000 entries (76%).
[01/Feb/2011:09:24:00 -0500] - export userRoot: Processed 8000 entries (87%).
[01/Feb/2011:09:24:00 -0500] - export userRoot: Processed 9000 entries (98%).
[01/Feb/2011:09:24:00 -0500] - export userRoot: Processed 9160 entries (100%).
[01/Feb/2011:09:24:00 -0500] - Export finished.

and the ldif file itself looks fine to me as well.
 
Then, I tried to import the ldif file with
 
ldif2db.pl -D 'cn=Directory Manager' -w password -n userRoot -i /tmp/db_replica.ldif
 
This fails with the following errors log:
[01/Feb/2011:09:29:45 -0500] - Bringing userRoot offline...
[01/Feb/2011:09:29:45 -0500] NSMMReplicationPlugin - multimaster_be_state_change: replica o=umc is going offline; disabling replication
[01/Feb/2011:09:29:46 -0500] - entrycache_clear_int: there are still 1 entries in the entry cache. :/
[01/Feb/2011:09:29:49 -0500] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[01/Feb/2011:09:29:49 -0500] - import userRoot: Beginning import job...
[01/Feb/2011:09:29:49 -0500] - import userRoot: Index buffering is disabled.
[01/Feb/2011:09:29:49 -0500] - import userRoot: Processing file "/tmp/db_replica.ldif"
[01/Feb/2011:09:29:49 -0500] - BAD CACHE ASSERTION at ../ldap/servers/slapd/back-ldbm/cache.c/883: e->ep_refcnt > 0

Comment 1 Noriko Hosoi 2011-02-08 18:35:04 UTC
Could it be possible that you are running (or have been running) any permanent search?
[...] - entrycache_clear_int: there are still 1 entries in the entry cache. :/

I ran psearch, then even if I kill it, my import still issues this error message.

Comment 2 reinhard nappert 2011-02-08 18:54:35 UTC
No, I don't.
We do poll it every 30 seconds or so for changes (VLV searches). Is it possible that the permanent search comes from replication?

Comment 3 reinhard nappert 2011-02-08 19:00:29 UTC
No, I don't.
We do poll it every 30 seconds or so for changes (VLV searches). Is it possible that the permanent search comes from replication?

Comment 4 Noriko Hosoi 2011-02-11 00:46:35 UTC
Created attachment 478160 [details]
git patch file (master)

Description: Task version of export had a bug in handling the busy
instance error case.  When returning due to the busy error, the
function ldbm_back_ldbm2ldif reset the busy bit set by other threads.
This patch checks the special return value set in the busy error
case and resets the busy bit only when it is set by the function.

Also, this patch fixes a bug in dse_add which replaces an adding
entry with its copy in pblock and the original entry is consumed.
But the caller frees the original entry.  Applying this patch,
instead of consuming the original entry, its copy is consumed and
the original adding entry stays in the pblock.

Comment 5 Noriko Hosoi 2011-02-11 00:50:26 UTC
Hi reinhard,

Can I ask you a favor to apply the attached patch, rebuild the server and check if the problem still occurs?  I don't see the entrycache_clear_int error any more in my testing, but not 100% sure if this patch solves the problem you are facing...

Thanks a lot in advance,
--noriko

Comment 7 reinhard nappert 2011-02-11 15:45:50 UTC
I will and let you know.
Thanks for the quick fix
-Reinhard

Comment 8 reinhard nappert 2011-02-11 19:06:55 UTC
I still need to look into this, because I got some permission denied message, when I do an export, which I can not explain. I see the following:
[11/Feb/2011:13:46:19 -0500] - 389-Directory/1.2.7.5 B2011.042.1610 starting up
[11/Feb/2011:13:46:19 -0500] - slapd started.  Listening on All Interfaces port
389 for LDAP requests
[11/Feb/2011:13:49:33 -0500] - Beginning export of 'userRoot'
[11/Feb/2011:13:49:33 -0500] - db2ldif: can't open /tmp/db_Febr11.ldif: 13 (Perm
ission denied)
[11/Feb/2011:13:49:33 -0500] - ldbm2ldif: backend 'userRoot' export failed (-1)
[11/Feb/2011:13:49:33 -0500] - Export failed.

Comment 9 reinhard nappert 2011-02-11 20:57:48 UTC
I am not sure what is going on there. I did "back-port" the changes to 1.2.7.5
I noticed that the code has changed. Is it possible that the back-port caused the permission denied failure?

Comment 10 Noriko Hosoi 2011-02-11 21:04:00 UTC
(In reply to comment #9)
> I am not sure what is going on there. I did "back-port" the changes to 1.2.7.5
> I noticed that the code has changed. Is it possible that the back-port caused
> the permission denied failure?

It's hard to imagine...  
Who owns the ldif file /tmp/db_Febr11.ldif and what is the mode?  
$ ls -l /tmp/db_Febr11.ldif

And who owns ns-slapd?
$ps -ef | grep ns-slapd

Comment 11 Noriko Hosoi 2011-02-11 21:10:44 UTC
My server is running as myself (nhosoi):
nhosoi   15464     1  0 11:55 ?        00:00:05 ./ns-slapd -D /etc/dirsrv/slapd-ID -i /var/run/dirsrv/slapd-ID.pid -w /var/run/dirsrv/slapd-ID.startpid

I exported the db as follows:
# ./db2ldif.pl -D 'cn=directory manager' -w <pw> -r -n userRoot -a /tmp/export.ldif

The mode is 0600 and the owner is myself:
# ls -l /tmp/export.ldif 
-rw-------. 1 nhosoi nhosoi 1726411 Feb 11 13:06 /tmp/export.ldif

Comment 12 reinhard nappert 2011-02-11 21:29:29 UTC
nobody   13264     1  0 15:53 ?        00:00:01 ./ns-slapd -D /opt/UMC/jdb/etc/dirsrv/slapd-sdx -i /opt/UMC/jdb/var/run/dirsrv/slapd-sdx.pid -w /opt/UMC/jdb/var/run/dirsrv/slapd-sdx.startpid

I run the command as root:
 ./db2ldif.pl -D 'cn=directory manager' -w <pw> -r -n userRoot -a  /tmp/db_Febr11.ldif

[root@louise ~]# ls -ltr /tmp/db_Febr11.ldif
-rw-r--r--  1 root root 21 Feb 11 11:36 /tmp/db_Febr11.ldif

This explains why it does not work. With my previous installation, the access rights would look differently (like yours):
-rw-------  1 nobody nobody 18369062 Feb 11 13:28 db_Febr11.ldif


I did change (manually) the mode to 666 and export works. However, I still run into the issue, when I import this ldif file:

[11/Feb/2011:16:27:23 -0500] - Bringing userRoot offline...
[11/Feb/2011:16:27:23 -0500] NSMMReplicationPlugin - multimaster_be_state_change
: replica o=umc is going offline; disabling replication
[11/Feb/2011:16:27:24 -0500] - entrycache_clear_int: there are still 1 entries i
n the entry cache. :/
[11/Feb/2011:16:27:24 -0500] - WARNING: Import is running with nsslapd-db-privat
e-import-mem on; No other process is allowed to access the database
[11/Feb/2011:16:27:24 -0500] - import userRoot: Beginning import job...
[11/Feb/2011:16:27:24 -0500] - import userRoot: Index buffering is disabled.
[11/Feb/2011:16:27:25 -0500] - import userRoot: Processing file "/tmp/db_Febr11.
ldif"
[11/Feb/2011:16:27:25 -0500] - BAD CACHE ASSERTION at ../ldap/servers/slapd/back
-ldbm/cache.c/883: e->ep_refcnt > 0

Comment 13 Noriko Hosoi 2011-02-11 21:38:28 UTC
Thank you for running the test, reinhard.

All right.  The problem is still there.  I'm going to run my test with 1.2.7-5 and see how it goes/fails...

Comment 14 Noriko Hosoi 2011-02-14 22:31:50 UTC
Created attachment 478729 [details]
git patch file (master)

Description: When Simple Paged Results is requested and a page is
returned, one entry is read ahead to check whether more entries
exist or not.  The read-ahead retrieves an entry (if any) and adds
it into the entry cache.  Simple Paged Results code puts the read-
ahead entry back, but there was missing to call cache_return for
the entry (that decrementing refcnt).  If ldif2db.pl is called with
the cache state, it finds out the entry which is still referred.
This patch calls cache_return when the Simple Paged Results puts
the read-ahead entry back.  Plus, adding a debug function dump_hash.

Comment 15 Noriko Hosoi 2011-02-14 22:44:59 UTC
Hi reinhard,

I've found another bug that causes an abnormality in the entry cache.  Do you happen to use Simple Paged Results on your server?  It could leave a read-ahead entry in the entry cache that makes clearing the cache failed in the import.

If this is not your case, could it be possible to apply the patch "attachment 478729 [details]", build the server with LDAP_CACHE_DEBUG defined (I guess you've already done so) and once "entrycache_clear_int: there are still 1 entries
in the entry cache." is observed, could you find out which entry/entries are in the entry cache?  The entry IDs are supposed to be printed out in the error log:
  entrycache_clear_int: there are still 1 entries in the entry cache.
  entry cache:
  <id> ...
Then, you should be able to find out the corresponding entry in your exported ldif file.  If you could tell us which entry/entries are in the cache (e.g., recently added, modified, modrdn'ed, searched, etc.), it'd be a big help for us.

Thanks!
--noriko

Comment 16 Noriko Hosoi 2011-02-14 23:10:00 UTC
Created attachment 478734 [details]
revised git patch file (master)

Thanks to Nathan for his comments.  Following his suggestion, this revised patch is adding a check for sr->sr_entry.

Description: When Simple Paged Results is requested and a page is
returned, one entry is read ahead to check whether more entries
exist or not.  The read-ahead retrieves an entry (if any) and adds
it into the entry cache.  Simple Paged Results code puts the read-
ahead entry back, but there was missing to call cache_return for
the entry (that decrementing refcnt).  If ldif2db.pl is called with
the cache state, it finds out the entry which is still referred.
This patch calls cache_return when the Simple Paged Results puts
the read-ahead entry back.  Plus, adding a debug function dump_hash.

Comment 18 reinhard nappert 2011-02-15 16:15:21 UTC
Created attachment 478908 [details]
The errors and the exported ldif file

errors and ldif file

Comment 19 reinhard nappert 2011-02-15 16:20:08 UTC
I did build the server with the patch and I installed it from scratch. During the setup and directory instance creation, I load a sample database with 451 entries and configure the indexes and the vlv indexes. Now, it got worse. The slapd process just terminated after while, before I did the export. See errors (which I attached):
[15/Feb/2011:10:50:14 -0500] - WARNING: Import is running with nsslapd-db-privat
e-import-mem on; No other process is allowed to access the database
[15/Feb/2011:10:50:14 -0500] - check_and_set_import_cache: pagesize: 4096, pages
: 4110397, procpages: 24904
[15/Feb/2011:10:50:14 -0500] - Import allocates 6576632KB import cache.
[15/Feb/2011:10:50:14 -0500] - Setting ncache to: 2 to keep each chunk below 4Gb
ytes
[15/Feb/2011:10:50:18 -0500] - userRoot: Indexing VLV: CRT
[15/Feb/2011:10:50:18 -0500] - userRoot: Finished indexing.
[15/Feb/2011:10:50:18 -0500] - All database threads now stopped
[15/Feb/2011:10:50:19 -0500] - WARNING: Import is running with nsslapd-db-privat
e-import-mem on; No other process is allowed to access the database
[15/Feb/2011:10:50:19 -0500] - check_and_set_import_cache: pagesize: 4096, pages
: 4110397, procpages: 24904
[15/Feb/2011:10:50:19 -0500] - Import allocates 6576632KB import cache.
[15/Feb/2011:10:50:19 -0500] - Setting ncache to: 2 to keep each chunk below 4Gb
ytes
[15/Feb/2011:10:50:22 -0500] - userRoot: Indexing VLV: MT
[15/Feb/2011:10:50:22 -0500] - userRoot: Finished indexing.
[15/Feb/2011:10:50:23 -0500] - All database threads now stopped

At that time, I the vlv indexes were already done. Then, I exported the data:
[root@louise slapd-sdx]# date; ./db2ldif.pl -D cn=umcadmin,o=umc -w <password> -s o=UMC  -a /tmp/db_Febr15_noRep.ldif
Tue Feb 15 10:53:19 EST 2011
Exporting to ldif file: /tmp/db_Febr15_noRep.ldif
adding new entry "cn=export_2011_2_15_10_53_19, cn=export, cn=tasks, cn=config"

When this was done, the slapd process died again (same vlv messages) and I started the server:
[root@louise slapd-sdx]# date; service UMCjdb start
Tue Feb 15 10:54:54 EST 2011
Starting SRC Database Service

Then, I imported it, which seems to work always for the first time:
[root@louise slapd-sdx]# date; ./ldif2db.pl -D cn=umcadmin,o=umc -w <password> -s o=UMC -i /tmp/db_Febr15_noRep.ldif
Tue Feb 15 10:55:28 EST 2011
adding new entry "cn=import_2011_2_15_10_55_28, cn=import, cn=tasks, cn=config"

But, it fails with the second time:
[root@louise slapd-sdx]# date; ./ldif2db.pl -D cn=umcadmin,o=umc -w <password> -s o=UMC -i /tmp/db_Febr15_noRep.ldif
Tue Feb 15 10:55:48 EST 2011
adding new entry "cn=import_2011_2_15_10_55_48, cn=import, cn=tasks, cn=config"

errors:
[15/Feb/2011:10:55:47 -0500] - Bringing userRoot offline...
[15/Feb/2011:10:55:47 -0500] - entrycache_clear_int: there are still 16 entries
in the entry cache. :/
[15/Feb/2011:10:55:47 -0500] - entry cache:
[15/Feb/2011:10:55:47 -0500] - 1198334336392393408420424426434438445448449450
[15/Feb/2011:10:55:47 -0500] - WARNING: Import is running with nsslapd-db-privat
e-import-mem on; No other process is allowed to access the database
[15/Feb/2011:10:55:47 -0500] - import userRoot: Beginning import job...
[15/Feb/2011:10:55:47 -0500] - import userRoot: Index buffering is disabled.
[15/Feb/2011:10:55:47 -0500] - import userRoot: Processing file "/tmp/db_Febr15_
noRep.ldif"
[15/Feb/2011:10:55:48 -0500] - BAD CACHE ASSERTION at ../ldap/servers/slapd/back
-ldbm/cache.c/922: e->ep_refcnt > 0

Again, I attached the ldif and the errors file.

Comment 20 Noriko Hosoi 2011-02-15 17:54:24 UTC
Thank you, reinhard.  

I tried to find out the pattern why these 16 entries are in the entry cache, but it was too tough for me.  I'd like to reproduce your problem in house.  It looks your system has some custom schema.  Could it be possible to share the schema with us?  And the configuration file dse.ldif?  If you are not confortable to attach them to this bug report, you could just send them to my address nhosoi.  

Also, do you have any custom plugin?  Or are you using any specific plugins other than the replication?  Such as memberOf, linkedattrs, dna, USN, etc.?  

What is your platform?  OS?  CPU?

Any information would be greatly appreciated!

Comment 21 Noriko Hosoi 2011-02-15 18:26:58 UTC
>  The slapd process just terminated after while, before I did the export.

Is it possible to attach gdb to ns-slapd and get the stacktrace when the process gets terminated?  Thanks!

Comment 22 reinhard nappert 2011-02-15 19:36:38 UTC
Created attachment 478958 [details]
core file, when export fails

I was not able to reproduce the crash of slapd. However, I produced a core for the export issue. Attached is the gdb file. Here is the output to stdout of the gdb as well:
[root@louise log]# gdb /opt/UMC/jdb/sbin/ns-slapd /var/UMC/jdb/log/core.22423
GNU gdb Red Hat Linux (6.3.0.0-1.132.EL4rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/tls/libthread_db.so.1".

Failed to read a valid object file image from memory.
Core was generated by `./ns-slapd -D /opt/UMC/jdb/etc/dirsrv/slapd-sdx -i /opt/UMC/jdb/var/run/dirsrv/'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /opt/UMC/jdb/lib/dirsrv/libslapd.so.0...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/libslapd.so.0
Reading symbols from /usr/lib64/libkrb5.so.3...done.
Loaded symbols for /usr/lib64/libkrb5.so.3
Reading symbols from /usr/lib64/libk5crypto.so.3...done.
Loaded symbols for /usr/lib64/libk5crypto.so.3
Reading symbols from /lib64/libcom_err.so.2...done.
Loaded symbols for /lib64/libcom_err.so.2
Reading symbols from /opt/UMC/jdb/lib/dirsrv/libpcre.so.0...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/libpcre.so.0
Reading symbols from /opt/UMC/jdb/lib/dirsrv/libssldap60.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/libssldap60.so
Reading symbols from /opt/UMC/jdb/lib/dirsrv/libprldap60.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/libprldap60.so
Reading symbols from /opt/UMC/jdb/lib/dirsrv/libldap60.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/libldap60.so
Reading symbols from /opt/UMC/jdb/lib/dirsrv/libldif60.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/libldif60.so
Reading symbols from /opt/UMC/jdb/lib/dirsrv/libsoftokn3.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/libsoftokn3.so
Reading symbols from /usr/lib64/libsasl2.so.2...done.
Loaded symbols for /usr/lib64/libsasl2.so.2
Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libresolv.so.2...done.
Loaded symbols for /lib64/libresolv.so.2
Reading symbols from /lib64/libcrypt.so.1...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /opt/UMC/jdb/lib/dirsrv/libsvrcore.so.0...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/libsvrcore.so.0
Reading symbols from /opt/UMC/jdb/lib/dirsrv/libssl3.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/libssl3.so
Reading symbols from /opt/UMC/jdb/lib/dirsrv/libnss3.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/libnss3.so
Reading symbols from /opt/UMC/jdb/lib/dirsrv/libplds4.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/libplds4.so
Reading symbols from /opt/UMC/jdb/lib/dirsrv/libplc4.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/libplc4.so
Reading symbols from /opt/UMC/jdb/lib/dirsrv/libnspr4.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/libnspr4.so
Reading symbols from /lib64/tls/libpthread.so.0...done.
Loaded symbols for /lib64/tls/libpthread.so.0
Reading symbols from /lib64/tls/libc.so.6...done.
Loaded symbols for /lib64/tls/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /opt/UMC/jdb/lib/dirsrv/plugins/libsyntax-plugin.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/plugins/libsyntax-plugin.so
Reading symbols from /opt/UMC/jdb/lib/dirsrv/plugins/libbitwise-plugin.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/plugins/libbitwise-plugin.so
Reading symbols from /opt/UMC/jdb/lib/dirsrv/plugins/libcollation-plugin.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/plugins/libcollation-plugin.so
Reading symbols from /opt/UMC/jdb/lib/dirsrv/libicui18n.so.36...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/libicui18n.so.36
Reading symbols from /opt/UMC/jdb/lib/dirsrv/libicuuc.so.36...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/libicuuc.so.36
Reading symbols from /opt/UMC/jdb/lib/dirsrv/libicudata.so.36...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/libicudata.so.36
Reading symbols from /usr/lib64/libstdc++.so.6...done.
Loaded symbols for /usr/lib64/libstdc++.so.6
Reading symbols from /lib64/tls/libm.so.6...done.
Loaded symbols for /lib64/tls/libm.so.6
Reading symbols from /lib64/libgcc_s.so.1...done.
Loaded symbols for /lib64/libgcc_s.so.1
Reading symbols from /opt/UMC/jdb/lib/dirsrv/plugins/libpwdstorage-plugin.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/plugins/libpwdstorage-plugin.so
Reading symbols from /opt/UMC/jdb/lib/dirsrv/plugins/libdes-plugin.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/plugins/libdes-plugin.so
Reading symbols from /usr/lib64/sasl2/libplain.so.2...done.
Loaded symbols for /usr/lib64/sasl2/libplain.so.2
Reading symbols from /usr/lib64/sasl2/libcrammd5.so.2...done.
Loaded symbols for /usr/lib64/sasl2/libcrammd5.so.2
Reading symbols from /usr/lib64/sasl2/libsasldb.so.2...done.
Loaded symbols for /usr/lib64/sasl2/libsasldb.so.2
Reading symbols from /usr/lib64/sasl2/liblogin.so.2...done.
Loaded symbols for /usr/lib64/sasl2/liblogin.so.2
Reading symbols from /usr/lib64/sasl2/libdigestmd5.so.2...done.
Loaded symbols for /usr/lib64/sasl2/libdigestmd5.so.2
Reading symbols from /lib64/libcrypto.so.4...done.
Loaded symbols for /lib64/libcrypto.so.4
Reading symbols from /usr/lib64/libgssapi_krb5.so.2...done.
Loaded symbols for /usr/lib64/libgssapi_krb5.so.2
Reading symbols from /usr/lib64/libz.so.1...done.
Loaded symbols for /usr/lib64/libz.so.1
Reading symbols from /usr/lib64/sasl2/libanonymous.so.2...done.
Loaded symbols for /usr/lib64/sasl2/libanonymous.so.2
Reading symbols from /opt/UMC/jdb/lib/dirsrv/plugins/libattr-unique-plugin.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/plugins/libattr-unique-plugin.so
Reading symbols from /opt/UMC/jdb/lib/dirsrv/plugins/libacl-plugin.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/plugins/libacl-plugin.so
Reading symbols from /opt/UMC/jdb/lib/dirsrv/libns-dshttpd.so.0...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/libns-dshttpd.so.0
Reading symbols from /opt/UMC/jdb/lib/dirsrv/plugins/libchainingdb-plugin.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/plugins/libchainingdb-plugin.so
Reading symbols from /opt/UMC/jdb/lib/dirsrv/plugins/libcos-plugin.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/plugins/libcos-plugin.so
Reading symbols from /opt/UMC/jdb/lib/dirsrv/plugins/libderef-plugin.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/plugins/libderef-plugin.so
Reading symbols from /opt/UMC/jdb/lib/dirsrv/plugins/libdna-plugin.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/plugins/libdna-plugin.so
Reading symbols from /opt/UMC/jdb/lib/dirsrv/plugins/libhttp-client-plugin.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/plugins/libhttp-client-plugin.so
Reading symbols from /opt/UMC/jdb/lib/dirsrv/plugins/libback-ldbm.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/plugins/libback-ldbm.so
Reading symbols from /opt/UMC/jdb/lib/dirsrv/libdb-4.2.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/libdb-4.2.so
Reading symbols from /opt/UMC/jdb/lib/dirsrv/plugins/libreplication-plugin.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/plugins/libreplication-plugin.so
Reading symbols from /opt/UMC/jdb/lib/dirsrv/plugins/liblinkedattrs-plugin.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/plugins/liblinkedattrs-plugin.so
Reading symbols from /opt/UMC/jdb/lib/dirsrv/plugins/libmanagedentries-plugin.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/plugins/libmanagedentries-plugin.so
Reading symbols from /opt/UMC/jdb/lib/dirsrv/plugins/libmemberof-plugin.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/plugins/libmemberof-plugin.so
Reading symbols from /opt/UMC/jdb/lib/dirsrv/plugins/libpam-passthru-plugin.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/plugins/libpam-passthru-plugin.so
Reading symbols from /lib64/libpam.so.0...done.
Loaded symbols for /lib64/libpam.so.0
Reading symbols from /lib64/libaudit.so.0...done.
Loaded symbols for /lib64/libaudit.so.0
Reading symbols from /opt/UMC/jdb/lib/dirsrv/plugins/libpassthru-plugin.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/plugins/libpassthru-plugin.so
Reading symbols from /opt/UMC/jdb/lib/dirsrv/plugins/libreferint-plugin.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/plugins/libreferint-plugin.so
Reading symbols from /opt/UMC/jdb/lib/dirsrv/plugins/libretrocl-plugin.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/plugins/libretrocl-plugin.so
Reading symbols from /opt/UMC/jdb/lib/dirsrv/plugins/libroles-plugin.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/plugins/libroles-plugin.so
Reading symbols from /opt/UMC/jdb/lib/dirsrv/plugins/libschemareload-plugin.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/plugins/libschemareload-plugin.so
Reading symbols from /opt/UMC/jdb/lib/dirsrv/plugins/libstatechange-plugin.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/plugins/libstatechange-plugin.so
Reading symbols from /opt/UMC/jdb/lib/dirsrv/plugins/libusn-plugin.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/plugins/libusn-plugin.so
Reading symbols from /opt/UMC/jdb/lib/dirsrv/plugins/libviews-plugin.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/plugins/libviews-plugin.so
Reading symbols from /opt/UMC/jdb/lib/dirsrv/libfreebl3.so...done.
Loaded symbols for /opt/UMC/jdb/lib/dirsrv/libfreebl3.so
#0  0x00007f4b5d5ebd3a in cache_remove (cache=0x7254e8, ptr=0x7f4b5ae2d740)
    at ../ldap/servers/slapd/back-ldbm/cache.c:922
922     ../ldap/servers/slapd/back-ldbm/cache.c: No such file or directory.
        in ../ldap/servers/slapd/back-ldbm/cache.c

Comment 23 Noriko Hosoi 2011-02-15 19:43:10 UTC
Could you do 'bt' (backtrace) there?
gdb> bt

Comment 24 reinhard nappert 2011-02-15 19:48:14 UTC
(gdb) bt
#0  0x00007f4b5d5ebd3a in cache_remove (cache=0x7254e8, ptr=0x7f4b5ae2d740)
    at ../ldap/servers/slapd/back-ldbm/cache.c:922
#1  0x00007f4b5d605467 in import_foreman (param=Variable "param" is not available.
)
    at ../ldap/servers/slapd/back-ldbm/import-threads.c:2334
#2  0x00007f4b6078edfe in _pt_root () from /opt/UMC/jdb/lib/dirsrv/libnspr4.so
#3  0x00007f4b6065910a in start_thread () from /lib64/tls/libpthread.so.0
#4  0x00007f4b604e58b3 in clone () from /lib64/tls/libc.so.6
#5  0x0000000000000000 in ?? ()


Did you go through slapd.core.txt file, I attached. I did 
thread apply all bt

-Reinhard

Comment 25 Noriko Hosoi 2011-02-15 19:49:10 UTC
Thanks, Reinhard!

My side does not have much luck.

I installed the server (both the latest and 1.2.7-5) with your schema, index configuration, and data.  I could successfully run ./db2index.pl, ./db2ldif.pl, and ./ldif2db.pl repeatedly.  I don't see any entry cache failure... :(

Now, I'm comparing your dse.ldif with mine and looking for any differences.

Comment 26 Noriko Hosoi 2011-02-15 19:50:31 UTC
BTW, I had to add attributeTypes for 'keywords' to the custom schema file to start the server...

Comment 27 Noriko Hosoi 2011-02-15 19:58:56 UTC
Could some other threads be running at the same time?  What this command tells us?
(gdb) thread apply all bt

Comment 28 reinhard nappert 2011-02-15 20:10:51 UTC
(In reply to comment #26)
> BTW, I had to add attributeTypes for 'keywords' to the custom schema file to
> start the server...

Yes, this is correct. 389 DS used to have it in 28pilot.ldif. You are right, I altered 28pilot.ldif and added keywords in there.

(In reply to comment #27)
> Could some other threads be running at the same time?  What this command tells
> us?
> (gdb) thread apply all bt

I guess you get a backtrace for all threads. 

(In reply to comment #25)
> Thanks, Reinhard!
> 
> My side does not have much luck.
> 
> I installed the server (both the latest and 1.2.7-5) with your schema, index
> configuration, and data.  I could successfully run ./db2index.pl, ./db2ldif.pl,
> and ./ldif2db.pl repeatedly.  I don't see any entry cache failure... :(
> 
> Now, I'm comparing your dse.ldif with mine and looking for any differences.

This is interesting! How can this be?

Comment 29 Noriko Hosoi 2011-02-15 21:11:03 UTC
(In reply to comment #28)
> (In reply to comment #26)
> > BTW, I had to add attributeTypes for 'keywords' to the custom schema file to
> > start the server...
> 
> Yes, this is correct. 389 DS used to have it in 28pilot.ldif. You are right, I
> altered 28pilot.ldif and added keywords in there.

Thanks for confirming it.

> (In reply to comment #27)
> > Could some other threads be running at the same time?  What this command tells
> > us?
> > (gdb) thread apply all bt
> 
> I guess you get a backtrace for all threads. 

Right.  Do you see any active threads there other than import?

> (In reply to comment #25)
> > Thanks, Reinhard!
> > 
> > My side does not have much luck.
> > 
> > I installed the server (both the latest and 1.2.7-5) with your schema, index
> > configuration, and data.  I could successfully run ./db2index.pl, ./db2ldif.pl,
> > and ./ldif2db.pl repeatedly.  I don't see any entry cache failure... :(
> > 
> > Now, I'm comparing your dse.ldif with mine and looking for any differences.
> 
> This is interesting! How can this be?

Yeah, I'm completely puzzled...  Everything is going well on my test system 64-bit F-13 (yes, it's not centos...).  I also rebuilt the server with LDAP_CACHE_DEBUG enabled, but I don't see any sign of cache corruption.

I guess some activities should be around on your system...  Could you give us the access log, too?

Comment 30 reinhard nappert 2011-02-15 21:26:05 UTC
Created attachment 478974 [details]
access file in response of Comment #29

I have also a 64 bit environment. I can not see that F-13 makes a difference.
You see a ton of searches in access. Those a performed by our multi-component application.

Hope, this helps

Comment 31 reinhard nappert 2011-02-15 21:46:09 UTC
Your last post gave me the idea to import the ldif file, while all of our components were shutdown. Guess what? I imported the ldif file for at least 5/6 times. Then, I started one of our components. I was able to import it for more than 2 times, but eventually it failed, however I do not see anything in access from our component.

[root@louise slapd-sdx]# date; ./ldif2db.pl -D cn=umcadmin,o=umc -w <password> -s o=UMC -i /tmp/db_Febr15_noRep.ldif
Tue Feb 15 16:35:20 EST 2011
adding new entry "cn=import_2011_2_15_16_35_20, cn=import, cn=tasks, cn=config"

errors:
[15/Feb/2011:16:35:20 -0500] - Bringing userRoot offline...
[15/Feb/2011:16:35:20 -0500] - entrycache_clear_int: there are still 16 entries
in the entry cache. :/
[15/Feb/2011:16:35:20 -0500] - entry cache:
[15/Feb/2011:16:35:20 -0500] - 1198334336392393408420424426434438445448449450
[15/Feb/2011:16:35:20 -0500] - WARNING: Import is running with nsslapd-db-privat
e-import-mem on; No other process is allowed to access the database
[15/Feb/2011:16:35:20 -0500] - import userRoot: Beginning import job...
[15/Feb/2011:16:35:20 -0500] - import userRoot: Index buffering is disabled.
[15/Feb/2011:16:35:20 -0500] - import userRoot: Processing file "/tmp/db_Febr15_
noRep.ldif"
[15/Feb/2011:16:35:20 -0500] - BAD CACHE ASSERTION at ../ldap/servers/slapd/back
-ldbm/cache.c/922: e->ep_refcnt > 0

access:
[15/Feb/2011:16:33:02 -0500] conn=1 fd=64 slot=64 connection from 10.227.6.53 to
 10.227.6.53
[15/Feb/2011:16:33:02 -0500] conn=1 op=0 BIND dn="cn=umcadmin,o=umc" method=128
version=3
[15/Feb/2011:16:33:02 -0500] conn=1 op=0 RESULT err=0 tag=97 nentries=0 etime=0
dn="cn=umcadmin,o=umc"
[15/Feb/2011:16:33:02 -0500] conn=1 op=1 ADD dn="cn=import_2011_2_15_16_33_2,cn=
import,cn=tasks,cn=config"
[15/Feb/2011:16:33:02 -0500] conn=1 op=1 RESULT err=0 tag=105 nentries=0 etime=0
[15/Feb/2011:16:33:02 -0500] conn=1 op=2 UNBIND
[15/Feb/2011:16:33:02 -0500] conn=1 op=2 fd=64 closed - U1
[15/Feb/2011:16:33:21 -0500] conn=2 fd=64 slot=64 connection from 10.227.6.53 to
 10.227.6.53
[15/Feb/2011:16:33:21 -0500] conn=2 op=0 BIND dn="cn=umcadmin,o=umc" method=128
version=3
[15/Feb/2011:16:33:21 -0500] conn=2 op=0 RESULT err=0 tag=97 nentries=0 etime=0
dn="cn=umcadmin,o=umc"
[15/Feb/2011:16:33:21 -0500] conn=2 op=1 ADD dn="cn=import_2011_2_15_16_33_21,cn
=import,cn=tasks,cn=config"
[15/Feb/2011:16:33:21 -0500] conn=2 op=1 RESULT err=0 tag=105 nentries=0 etime=0
[15/Feb/2011:16:33:21 -0500] conn=2 op=2 UNBIND
[15/Feb/2011:16:33:21 -0500] conn=2 op=2 fd=64 closed - U1
[15/Feb/2011:16:33:33 -0500] conn=3 fd=64 slot=64 connection from 10.227.6.53 to
 10.227.6.53
[15/Feb/2011:16:33:33 -0500] conn=3 op=0 BIND dn="cn=umcadmin,o=umc" method=128
version=3
[15/Feb/2011:16:33:33 -0500] conn=3 op=0 RESULT err=0 tag=97 nentries=0 etime=0
dn="cn=umcadmin,o=umc"
[15/Feb/2011:16:33:33 -0500] conn=3 op=1 ADD dn="cn=import_2011_2_15_16_33_34,cn
=import,cn=tasks,cn=config"
[15/Feb/2011:16:33:33 -0500] conn=3 op=1 RESULT err=0 tag=105 nentries=0 etime=0
[15/Feb/2011:16:33:33 -0500] conn=3 op=2 UNBIND
[15/Feb/2011:16:33:33 -0500] conn=3 op=2 fd=64 closed - U1
[15/Feb/2011:16:33:45 -0500] conn=4 fd=64 slot=64 connection from 10.227.6.53 to  10.227.6.53
[15/Feb/2011:16:33:45 -0500] conn=4 op=0 BIND dn="cn=umcadmin,o=umc" method=128
version=3
[15/Feb/2011:16:33:45 -0500] conn=4 op=0 RESULT err=0 tag=97 nentries=0 etime=0
dn="cn=umcadmin,o=umc"
[15/Feb/2011:16:33:45 -0500] conn=4 op=1 ADD dn="cn=import_2011_2_15_16_33_45,cn
=import,cn=tasks,cn=config"
[15/Feb/2011:16:33:45 -0500] conn=4 op=1 RESULT err=0 tag=105 nentries=0 etime=0
[15/Feb/2011:16:33:45 -0500] conn=4 op=2 UNBIND
[15/Feb/2011:16:33:45 -0500] conn=4 op=2 fd=64 closed - U1
[15/Feb/2011:16:33:51 -0500] conn=5 fd=64 slot=64 connection from 10.227.6.53 to
 10.227.6.53
[15/Feb/2011:16:33:51 -0500] conn=5 op=0 BIND dn="cn=umcadmin,o=umc" method=128
version=3
[15/Feb/2011:16:33:51 -0500] conn=5 op=0 RESULT err=0 tag=97 nentries=0 etime=0
dn="cn=umcadmin,o=umc"
[15/Feb/2011:16:33:51 -0500] conn=5 op=1 ADD dn="cn=import_2011_2_15_16_33_51,cn
=import,cn=tasks,cn=config"
[15/Feb/2011:16:33:51 -0500] conn=5 op=1 RESULT err=0 tag=105 nentries=0 etime=0
[15/Feb/2011:16:33:51 -0500] conn=5 op=2 UNBIND
[15/Feb/2011:16:33:51 -0500] conn=5 op=2 fd=64 closed - U1
[15/Feb/2011:16:34:02 -0500] conn=6 fd=64 slot=64 connection from 10.227.6.53 to
 10.227.6.53
[15/Feb/2011:16:34:02 -0500] conn=6 op=0 BIND dn="cn=umcadmin,o=umc" method=128
version=3
[15/Feb/2011:16:34:02 -0500] conn=6 op=0 RESULT err=0 tag=97 nentries=0 etime=0
dn="cn=umcadmin,o=umc"
[15/Feb/2011:16:34:02 -0500] conn=6 op=1 ADD dn="cn=import_2011_2_15_16_34_3,cn=
import,cn=tasks,cn=config"
[15/Feb/2011:16:34:03 -0500] conn=6 op=2 UNBIND
[15/Feb/2011:16:34:03 -0500] conn=6 op=2 fd=64 closed - U1
[15/Feb/2011:16:34:03 -0500] conn=6 op=1 RESULT err=0 tag=105 nentries=0 etime=1
[15/Feb/2011:16:34:08 -0500] conn=7 fd=64 slot=64 connection from 10.227.6.53 to
 10.227.6.53
[15/Feb/2011:16:34:08 -0500] conn=7 op=0 BIND dn="cn=umcadmin,o=umc" method=128
version=3
[15/Feb/2011:16:34:08 -0500] conn=7 op=0 RESULT err=0 tag=97 nentries=0 etime=0
dn="cn=umcadmin,o=umc"
[15/Feb/2011:16:34:08 -0500] conn=7 op=1 ADD dn="cn=import_2011_2_15_16_34_8,cn=
import,cn=tasks,cn=config"
[15/Feb/2011:16:34:08 -0500] conn=7 op=1 RESULT err=0 tag=105 nentries=0 etime=0
[15/Feb/2011:16:34:08 -0500] conn=7 op=2 UNBIND
[15/Feb/2011:16:34:08 -0500] conn=7 op=2 fd=64 closed - U1
[15/Feb/2011:16:34:14 -0500] conn=8 fd=64 slot=64 connection from 10.227.6.53 to
 10.227.6.53
[15/Feb/2011:16:34:14 -0500] conn=8 op=0 BIND dn="cn=umcadmin,o=umc" method=128
version=3
[15/Feb/2011:16:34:14 -0500] conn=8 op=0 RESULT err=0 tag=97 nentries=0 etime=0
dn="cn=umcadmin,o=umc"
[15/Feb/2011:16:34:14 -0500] conn=8 op=1 ADD dn="cn=import_2011_2_15_16_34_14,cn
=import,cn=tasks,cn=config"
[15/Feb/2011:16:34:14 -0500] conn=8 op=1 RESULT err=0 tag=105 nentries=0 etime=0
[15/Feb/2011:16:34:14 -0500] conn=8 op=2 UNBIND
[15/Feb/2011:16:34:14 -0500] conn=8 op=2 fd=64 closed - U1
[15/Feb/2011:16:34:49 -0500] conn=9 fd=64 slot=64 connection from 10.227.6.53 to
 10.227.6.53
[15/Feb/2011:16:34:49 -0500] conn=9 op=0 BIND dn="cn=umcadmin,o=umc" method=128
version=3
[15/Feb/2011:16:34:49 -0500] conn=9 op=0 RESULT err=0 tag=97 nentries=0 etime=0
dn="cn=umcadmin,o=umc"
[15/Feb/2011:16:34:49 -0500] conn=9 op=1 ADD dn="cn=import_2011_2_15_16_34_49,cn
=import,cn=tasks,cn=config"
[15/Feb/2011:16:34:49 -0500] conn=9 op=1 RESULT err=0 tag=105 nentries=0 etime=0
[15/Feb/2011:16:34:49 -0500] conn=9 op=2 UNBIND
[15/Feb/2011:16:34:49 -0500] conn=9 op=2 fd=64 closed - U1
[15/Feb/2011:16:34:57 -0500] conn=10 fd=64 slot=64 connection from 10.227.6.53 t
o 10.227.6.53
[15/Feb/2011:16:34:57 -0500] conn=10 op=0 BIND dn="cn=umcadmin,o=umc" method=128
 version=3
[15/Feb/2011:16:34:57 -0500] conn=10 op=0 RESULT err=0 tag=97 nentries=0 etime=0
 dn="cn=umcadmin,o=umc"
[15/Feb/2011:16:34:57 -0500] conn=10 op=1 ADD dn="cn=import_2011_2_15_16_34_58,c
n=import,cn=tasks,cn=config"
[15/Feb/2011:16:34:57 -0500] conn=10 op=1 RESULT err=0 tag=105 nentries=0 etime=
0
[15/Feb/2011:16:34:57 -0500] conn=10 op=2 UNBIND
[15/Feb/2011:16:34:57 -0500] conn=10 op=2 fd=64 closed - U1
[15/Feb/2011:16:35:07 -0500] conn=11 fd=64 slot=64 connection from 10.227.6.53 t
o 10.227.6.53
[15/Feb/2011:16:35:07 -0500] conn=11 op=0 BIND dn="cn=umcadmin,o=umc" method=128
 version=3
[15/Feb/2011:16:35:07 -0500] conn=11 op=0 RESULT err=0 tag=97 nentries=0 etime=0
 dn="cn=umcadmin,o=umc"
[15/Feb/2011:16:35:07 -0500] conn=11 op=1 ADD dn="cn=import_2011_2_15_16_35_7,cn
=import,cn=tasks,cn=config"
[15/Feb/2011:16:35:07 -0500] conn=11 op=2 UNBIND
[15/Feb/2011:16:35:07 -0500] conn=11 op=2 fd=64 closed - U1
[15/Feb/2011:16:35:07 -0500] conn=11 op=1 RESULT err=0 tag=105 nentries=0 etime=
0

Comment 32 Noriko Hosoi 2011-02-15 22:59:49 UTC
> Hope, this helps
Thanks A LOT, Reinhard.  It did!

I could reproduce the assertion failure after running some VLV searches.  This is not confirmed yet, but most likely VLV is has some culprit...
Thread 1 (Thread 18773):
#0  0x00007f29a382f69b in cache_remove (cache=0x2081728, ptr=0x7f2934007e60)
    at ldap/servers/slapd/back-ldbm/cache.c:922
#1  0x00007f29a385599e in import_foreman (param=0x7f291c1cc900)
    at ldap/servers/slapd/back-ldbm/import-threads.c:2334
#2  0x00000034f6229593 in _pt_root (arg=0x7f291c001490)
    at ../../../mozilla/nsprpub/pr/src/pthreads/ptthread.c:228
#3  0x00000034e7207761 in start_thread (arg=0x7f29853f4710)
    at pthread_create.c:301

Comment 33 Noriko Hosoi 2011-02-16 02:49:20 UTC
Created attachment 479007 [details]
git patch file (master)

Description: When a search request with VLV and/or SORT control
fails, it did not returning an entry to the entry cache.  The
entry has positive refcnt and won't be cleared even by cache_clear.
This patch adds CACHE_RETURN call for the error cases.

Comment 34 Noriko Hosoi 2011-02-16 02:52:10 UTC
Hi Reinhard,

I've attached the third patch around this bug...  I hope this solves the problem you are facing...  Thank you sooooo much for your help and patience.
--noriko

Comment 35 reinhard nappert 2011-02-16 14:59:29 UTC
(In reply to comment #34)
> Hi Reinhard,
> 
> I've attached the third patch around this bug...  I hope this solves the
> problem you are facing...  Thank you sooooo much for your help and patience.
> --noriko

Hi Noriko,
I am really sorry to disappoint you. I still see the issue :(
[16/Feb/2011:09:54:15 -0500] - Bringing userRoot offline...
[16/Feb/2011:09:54:15 -0500] - entrycache_clear_int: there are still 15 entries
in the entry cache. :/
[16/Feb/2011:09:54:15 -0500] - ID(s) in entry cache:
[16/Feb/2011:09:54:15 -0500] - 1
[16/Feb/2011:09:54:15 -0500] - WARNING: Import is running with nsslapd-db-privat
e-import-mem on; No other process is allowed to access the database
[16/Feb/2011:09:54:15 -0500] - import userRoot: Beginning import job...
[16/Feb/2011:09:54:15 -0500] - import userRoot: Index buffering is disabled.
[16/Feb/2011:09:54:15 -0500] - import userRoot: Processing file "/tmp/db_Febr16.
ldif"
[16/Feb/2011:09:54:15 -0500] - BAD CACHE ASSERTION at ../ldap/servers/slapd/back
-ldbm/cache.c/922: e->ep_refcnt > 0

I am pretty sure that we are on the right track. I gather all the info from my last attempt, zip them up and add it as an attachment.

Comment 36 reinhard nappert 2011-02-16 15:16:06 UTC
Created attachment 479128 [details]
access, errors and core info file

as promised, attached is the info
-Reinhard

Comment 37 Noriko Hosoi 2011-02-16 17:21:51 UTC
(In reply to comment #35)
> (In reply to comment #34)
> Hi Noriko,
> I am really sorry to disappoint you. I still see the issue :(

Oh, rather, I'd say big "Thank you!" to you.  This is an excellent report.  There were lots of bugs hidden around this point.  I agree with you that we are on the right track.  Once again, thank you so much for your help!
--noriko

Comment 38 reinhard nappert 2011-02-16 18:29:17 UTC
(In reply to comment #37)
> (In reply to comment #35)
> > (In reply to comment #34)
> > Hi Noriko,
> > I am really sorry to disappoint you. I still see the issue :(
> 
> Oh, rather, I'd say big "Thank you!" to you.  This is an excellent report. 
> There were lots of bugs hidden around this point.  I agree with you that we are
> on the right track.  Once again, thank you so much for your help!
> --noriko

Let me know if you need more data....

Comment 39 Noriko Hosoi 2011-02-17 00:54:06 UTC
Created attachment 479245 [details]
git patch file (master)

Thanks to Rich for his comments.  I've updated ldbm_back_search_cleanup following his suggestions.

And thanks to Reinhard for his testing and reports.  I've found 2 more places entries are not released.

Description: There were 3 places where an entry was not released
by CACHE_RETURN (not decrimenting refcnt).  If an entry has positive
refcnt in the entry cache, it won't be released even if the entry
never be accessed.
1. When a search request with VLV and/or SORT control fails.
2. When comparing entries in compare_entries_sv, and the second
entry is not found, the first entry is not released.
3. vlv_trim_candidates_byvalue retrieves entries for performing
binary search over the candidate list and put them into the cache.
They are not released.

This patch adds CACHE_RETURN call for the above cases.

Comment 40 Noriko Hosoi 2011-02-17 01:47:08 UTC
Reviewed by Rich (Thank you!!!)

Pushed to master.

$ git merge 676053
Updating 79a3fae..da14d3c
Fast-forward
 ldap/servers/slapd/back-ldbm/cache.c       |   43 ++++++++++++-
 ldap/servers/slapd/back-ldbm/id2entry.c    |    3 +-
 ldap/servers/slapd/back-ldbm/ldbm_search.c |   95 ++++++++++++++++++++++------
 ldap/servers/slapd/back-ldbm/ldif2ldbm.c   |   10 +--
 ldap/servers/slapd/back-ldbm/sort.c        |    1 +
 ldap/servers/slapd/back-ldbm/vlv.c         |   10 ++-
 ldap/servers/slapd/dse.c                   |   14 ++---
 7 files changed, 133 insertions(+), 43 deletions(-)

$ git push
Counting objects: 39, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (27/27), done.
Writing objects: 100% (27/27), 5.18 KiB, done.
Total 27 (delta 21), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
   79a3fae..da14d3c  master -> master

Cherry picking 3 patches on 389-ds-base-1.2.8
commit f68c9e5ad8f8bff536a30e4fcdf089fd9d3c412e
commit 00a13794271527c7c474cd4d61f8ace5e7e7da32
commit da14d3c8a14b015d17f8b8bbcb16f9aa7d6ccd86

Pushed to 389-ds-base-1.2.8, as well.

$ git push origin ds128-local:389-ds-base-1.2.8
Counting objects: 39, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (27/27), done.
Writing objects: 100% (27/27), 5.18 KiB, done.
Total 27 (delta 21), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
   b339ee0..7205ca0  ds128-local -> 389-ds-base-1.2.8

I leave the status of this bug NEW until it is verified that the original bug is fixed by Reinhard on his system.  Thanks in advance, Reinhard.

Comment 42 reinhard nappert 2011-02-17 15:52:47 UTC
Noriko,

thanks for fixing the bug. I did apply the patch and re-compiled 1.2.7.5. I was able to import our sample database and a larger database, while our components were running. I performed those imports at least 20 times and all of them worked. Of course, I could write a test client, performing vlv searches (I still may do this at a later time), but it looks to me that you nailed it, since I was not able to do it that often before. I only tested it in a stand-alone mode. I will run some tests in  MMR environment as well.

Anyway, for now I am happy :)

Thanks, again
-Reinhard

Comment 43 Noriko Hosoi 2011-02-17 17:30:12 UTC
Thank YOU so much, Reinhard.  Without your detailed bug reports/analysis and quick patch testings, I believe we never have been able to figure out what was the real cause.  Besides, your bug was very prolific.  We found multiple culprits, in addition!

Please keep us informed how you are doing and what kind of difficulties you are facing.  That'll be a big help to improve the 389 product!

Warmest regards,
--noriko

Comment 44 Amita Sharma 2011-06-08 12:53:18 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=677483 - Duplicate.


Note You need to log in before you can comment on or make changes to this bug.