Bug 627738

Summary: The cn=monitor statistics entries for the dnentry cache do not change or change very rarely
Product: [Retired] 389 Reporter: Andrey Ivanov <andrey.ivanov>
Component: Database - Indexes/SearchesAssignee: Noriko Hosoi <nhosoi>
Status: CLOSED CURRENTRELEASE QA Contact: Viktor Ashirov <vashirov>
Severity: low Docs Contact:
Priority: medium    
Version: 1.2.6CC: amsharma, jgalipea
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-12-07 17:13:55 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 639035    
Attachments:
Description Flags
git patch file (master)
nhosoi: review?, nhosoi: review?, rmeggins: review+
Fix for the previous patch: id=441634 none

Description Andrey Ivanov 2010-08-26 19:29:30 UTC
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 (.NET CLR 3.5.30729)

The statistics of the new dn cache monitor attributes does not seem to be up to date and consistent

Reproducible: Always

Steps to Reproduce:
1. Start the server in entryrdn mode (default in 1.2.6rc7).
2. Make some load and noise :) (in our case it's production)
3. Watch the  cn=monitor,cn=userRoot,cn=ldbm database,cn=plugins,cn=config entry

Actual Results:  
dnentrycachehits: 9531
dnentrycachetries: 19054
dnentrycachehitratio: 50
dncurrententrycachesize: 1776227
dnmaxentrycachesize: 10485760
dncurrententrycachecount: 9531
dnmaxentrycachecount: -1

dnentrycachehits and dnentrycachetries do not change even under a significant load. The dncurrententrycachecount is correct - it's the number of entries in 389.

Expected Results:  
dnentrycachehits and dnentrycachetries should be up-to-date and follow the hits and misses.

These counters change sometimes but extremely rarely and they don't seem to be consistent, maybe once every 12 hours...

Comment 1 Noriko Hosoi 2010-08-27 23:59:38 UTC
Dn cache works like a backup of entry cache.  For instance, when you run a search, if the search target is found in the entry cache, the entry is returned without searching the dn cache.  So, if you have a large entry cache and the entry cache is filled with the entries, there could be few chances the dn cache is consulted.  Dn cache is implemented to go to the entryrdn index as little as possible.

I ran a test adding 10K entries with a very small entry cache:
maxentrycachesize: 1048576
maxdncachesize: 10485760 <== not allowed to change the size
Started with this set of values:

    entrycachehits: 6
    entrycachetries: 8
    entrycachehitratio: 75
    currententrycachesize: 2418
    maxentrycachesize: 1048576
    currententrycachecount: 1
    maxentrycachecount: -1
    dncachehits: 0
    dncachetries: 1
    dncachehitratio: 0
    currentdncachesize: 67
    maxdncachesize: 10485760
    currentdncachecount: 1
    maxdncachecount: -1

and ended with this set:

    entrycachehits: 30001
    entrycachetries: 40013
    entrycachehitratio: 74
    currententrycachesize: 1048129
    maxentrycachesize: 1048576
    currententrycachecount: 170
    maxentrycachecount: -1
    dncachehits: 10000
    dncachetries: 10006
    dncachehitratio: 99
    currentdncachesize: 1292746
    maxdncachesize: 10485760
    currentdncachecount: 10006
    maxdncachecount: -1

Since this is repeated add, entrycache was not very useful and dncache's hit ratio was pretty good (most likely it's used for getting the parents of the adding entries).  Then, I ran repeated "(cn=*)" search, which overflows small entry cache.  Again, the entry cache hit ratio was pretty poor, while the dn cache hit ratio was good.  This is because, the dn cache size is usually large enough for the most case.

    entrycachehits: 30060
    entrycachetries: 565454
    entrycachehitratio: 5
    currententrycachesize: 1044494
    maxentrycachesize: 1048576
    currententrycachecount: 166
    maxentrycachecount: -1
    dncachehits: 535267
    dncachetries: 535283
    dncachehitratio: 99
    currentdncachesize: 1293986
    maxdncachesize: 10485760
    currentdncachecount: 10016
    maxdncachecount: -1

Please evaluate your entry cache size.  I imagine you allocated enough size for your server (which is ideal.  Please bear in mind the dn cache is secondary cache to assist the primary entry cache.

Comment 2 Noriko Hosoi 2010-08-28 00:11:44 UTC
Created attachment 441634 [details]
git patch file (master)

These are not the fixes for the subject of this bug, but found while looking into the dn cache.

Description:
1) id2entry_add_ext replaces an dn instance in the dn cache even if
   the DN value is identical.  Replace it only when they don't match.
   File:  ldap/servers/slapd/back-ldbm/id2entry.c

2) Modifying the dn cache related labels:
    dnEntrycachehits
    dnEntrycachetries
    dnEntrycachehitratio
    dnCurrententrycachesize
    dnMaxentrycachesize
    dnCurrententrycachecount
    dnMaxentrycachecount
   ===>
    dncachehits
    dncachetries
    dncachehitratio
    currentdncachesize
    maxdncachesize
    currentdncachecount
    maxdncachecount
   File:  ldap/servers/slapd/back-ldbm/monitor.c

Comment 3 Andrey Ivanov 2010-08-28 11:50:47 UTC
Thank you, Noriko, for your explanations. Indeed, i have allocated enough space in memory to keep all the entries in the entry cache. And the fact that the dn cache is used only if the entry cache does not help can be deduced from the description of dn cache in the design document (http://directory.fedoraproject.org/wiki/Subtree_Rename#dn_cache).


The strange thing for me was that these values were still changing sometimes, very rarely. After some research i have found the reason for this behaviour. The hits and tries of dn cache change every time the server does the backup export (with db2ldif.pl). The delta of the change corresponds to the number of exported entries. And we export every 12 hours, so everything is clear now :)

Thanks again!

Comment 4 Noriko Hosoi 2010-08-31 17:39:50 UTC
Comment on attachment 441634 [details]
git patch file (master)

Reviewed by Rich (Thanks!!!)

Pushed to master.
$ git push
Counting objects: 67, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (41/41), done.
Writing objects: 100% (41/41), 14.98 KiB, done.
Total 41 (delta 33), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
   a2bcd81..cc36301  master -> master

Comment 5 Andrey Ivanov 2010-09-02 11:17:01 UTC
When i apply the id2entry.c part of this patch, namely this code part :

             struct backdn *bdn = backdn_init(sdn, e->ep_id, 0);
             options |= SLAPI_DUMP_RDN_ENTRY;
 
-            /* If the ID already exists in the DN cache, replace it. */
-            if (CACHE_ADD( &inst->inst_dncache, bdn, &oldbdn ) == 1) {
+            /* If the ID already exists in the DN cache && the DNs do not match,
+             * replace it. */
+            if ((CACHE_ADD( &inst->inst_dncache, bdn, &oldbdn ) == 1) &&
+                (slapi_sdn_compare(sdn, oldbdn->dn_sdn))) {
                 cache_replace( &inst->inst_dncache, oldbdn, bdn );
                 CACHE_RETURN(&inst->inst_dncache, &oldbdn); /* to free oldbdn */
             }



, there is an error during import with the dn cache :

[02/Sep/2010:13:10:34 +0200] - import userRoot: Workers finished; cleaning up...
[02/Sep/2010:13:10:34 +0200] - import userRoot: Workers cleaned up.
[02/Sep/2010:13:10:34 +0200] - import userRoot: Cleaning up producer thread...
[02/Sep/2010:13:10:34 +0200] - import userRoot: Indexing complete.  Post-processing...
[02/Sep/2010:13:10:34 +0200] - import userRoot: Flushing caches...
[02/Sep/2010:13:10:34 +0200] - import userRoot: Closing files...
---->
[02/Sep/2010:13:10:34 +0200] - dncache_clear_int: there are still 53 dn's in the dn cache. :/
<----
[02/Sep/2010:13:10:35 +0200] - All database threads now stopped
[02/Sep/2010:13:10:35 +0200] - import userRoot: Import complete.  Processed 9527 entries in 13 seconds. (732.85 entries/sec)

Comment 6 Noriko Hosoi 2010-09-02 17:56:22 UTC
Created attachment 442689 [details]
Fix for the previous patch: id=441634

Description:
Change made in this commit: cc36301a7cae6737d9f8a0e53bed653a52130a1d
for the following bug introduced a missing-cache-return bug:
  id2entry_add_ext replaces an dn instance in the dn cache even if
  the DN value is identical.  Replace it only when they don't match.
Thanks to andrey.ivanov for finding it out:
https://bugzilla.redhat.com/show_bug.cgi?id=627738#c5

Comment 8 Jenny Severance 2011-06-06 19:14:39 UTC
Can you please add steps to reproduce and verify, what was actually fixed Thanks!

Comment 9 Noriko Hosoi 2011-06-06 21:48:05 UTC
Run some repeated operations on a server.

While the operations are running, run the search on the monitor and grep dncache related attributes:
# ldapsearch -x -h localhost -p <port> -D 'cn=directory manager' -w <pw> -b "cn=monitor,cn=<backend>,cn=ldbm database,cn=plugins,cn=config" "(cn=*)" | egrep dncache
dncachehits: 5122
dncachetries: 5124
dncachehitratio: 99
currentdncachesize: 548818
maxdncachesize: 10485760
currentdncachecount: 5124
maxdncachecount: -1

Wait for a while and run it again.
# ldapsearch -x -h localhost -p <port> -D 'cn=directory manager' -w <pw> -b "cn=monitor,cn=<backend>,cn=ldbm database,cn=plugins,cn=config" "(cn=*)" | egrepdncache
dncachehits: 5143
dncachetries: 5145
dncachehitratio: 99
currentdncachesize: 550971
maxdncachesize: 10485760
currentdncachecount: 5145
maxdncachecount: -1

If the counters are updated, this bug could be marked verified.

Comment 10 Amita Sharma 2011-06-16 09:53:00 UTC
Hi Noriko,

In my case,these entries did not get changed:
[root@rhel61-ds90-amita pwpolicy]# ldapsearch -x -h localhost -p 20100 -D 'cn=directory manager' -w Secret123 -b "cn=monitor,cn=userRoot,cn=ldbm database,cn=plugins,cn=config" "(cn=*)" | egrep dncache
dncachehits: 0
dncachetries: 6
dncachehitratio: 0
currentdncachesize: 696
maxdncachesize: 10485760
currentdncachecount: 6
maxdncachecount: -1
[root@rhel61-ds90-amita pwpolicy]# sleep 15
[root@rhel61-ds90-amita pwpolicy]# ldapsearch -x -h localhost -p 20100 -D 'cn=directory manager' -w Secret123 -b "cn=monitor,cn=userRoot,cn=ldbm database,cn=plugins,cn=config" "(cn=*)" | egrep dncache
dncachehits: 0
dncachetries: 6
dncachehitratio: 0
currentdncachesize: 696
maxdncachesize: 10485760
currentdncachecount: 6
maxdncachecount: -1
[root@rhel61-ds90-amita pwpolicy]# ldapsearch -x -h localhost -p 20100 -D 'cn=directory manager' -w Secret123 -b "cn=monitor,cn=userRoot,cn=ldbm database,cn=plugins,cn=config" "(cn=*)" | egrep dncache
dncachehits: 0
dncachetries: 6
dncachehitratio: 0
currentdncachesize: 696
maxdncachesize: 10485760
currentdncachecount: 6
maxdncachecount: -1

I am running this operation in parallel :
while [ 0=0 ] ; do ldapsearch -x -p 20100 -h localhost -D "cn=Directory Manager" -w Secret123 -b "ou=people,dc=replsuffix,dc=com"; done

Q: do I need to execute some specific operation in parallel?

Comment 11 Noriko Hosoi 2011-06-16 16:19:27 UTC
Is the suffix "dc=replsuffix,dc=com" belongs to the backend userRoot?
I guess when you set NEEDINFO, you'd better give me the IP address of your test system.

Comment 12 Noriko Hosoi 2011-06-16 16:40:04 UTC
Is the suffix "dc=replsuffix,dc=com" belongs to the backend userRoot?
==>
Does the suffix "dc=replsuffix,dc=com" belong to the backend userRoot?

:p

Comment 13 Amita Sharma 2011-06-17 08:54:48 UTC
no, but this time I am using the same suffix to execute the operation which is having the backend userRoot :

1. Executing in parallel window:
while [ 0=0 ] ; do ldapsearch -x -p 1389 -h localhost -D "cn=Directory Manager" -w Secret123 -b "dc=pnq,dc=redhat,dc=com"; done

2. On other tty:
[root@rhel61 dirsrv]# ldapsearch -x -h localhost -p 1389 -D 'cn=directory manager' -w Secret123 -b "cn=monitor,cn=userRoot,cn=ldbm database,cn=plugins,cn=config" "(cn=*)" | egrep dncache
dncachehits: 0
dncachetries: 9
dncachehitratio: 0
currentdncachesize: 1057
maxdncachesize: 10485760
currentdncachecount: 9
maxdncachecount: -1
[1]+  Done                    389-console
[root@rhel61 dirsrv]# sleep 15
[root@rhel61 dirsrv]# ldapsearch -x -h localhost -p 1389 -D 'cn=directory manager' -w Secret123 -b "cn=monitor,cn=userRoot,cn=ldbm database,cn=plugins,cn=config" "(cn=*)" | egrep dncache
dncachehits: 0
dncachetries: 9
dncachehitratio: 0
currentdncachesize: 1057
maxdncachesize: 10485760
currentdncachecount: 9
maxdncachecount: -1
[root@rhel61 dirsrv]# 389-console &
[1] 10436
[root@rhel61 dirsrv]# ldapsearch -x -h localhost -p 1389 -D 'cn=directory manager' -w Secret123 -b "cn=monitor,cn=userRoot,cn=ldbm database,cn=plugins,cn=config" "(cn=*)" | egrep dncache
dncachehits: 0
dncachetries: 9
dncachehitratio: 0
currentdncachesize: 1057
maxdncachesize: 10485760
currentdncachecount: 9
maxdncachecount: -1
[root@rhel61 dirsrv]# 

sending info of test machine in mail.

Comment 14 Noriko Hosoi 2011-06-17 16:23:33 UTC
This is because you had just 9 entries and all of them were in the entry cache.  Entry cache is more powerful and if entries are already in the entry cache, dn cache won't be accessed.

I added one entry uid=tuser0, then it changed the dncache status:
Before>
# ldapsearch -x -h localhost -p 1389 -D 'cn=directory manager' -w Secret123 -b "cn=monitor,cn=userRoot,cn=ldbm database,cn=plugins,cn=config" "(cn=*)" | egrep dncache
dncachehits: 0
dncachetries: 9
dncachehitratio: 0
currentdncachesize: 1057
maxdncachesize: 10485760
currentdncachecount: 9
maxdncachecount: -1

After>
# ldapsearch -x -h localhost -p 1389 -D 'cn=directory manager' -w Secret123 -b "cn=monitor,cn=userRoot,cn=ldbm database,cn=plugins,cn=config" "(cn=*)" | egrep dncache
dncachehits: 1
dncachetries: 10
dncachehitratio: 10
currentdncachesize: 1158
maxdncachesize: 10485760
currentdncachecount: 10
maxdncachecount: -1

Comment 15 Amita Sharma 2011-06-20 07:14:27 UTC
That makes sense, hence marking verified