737811 – tombstone entries are not deleted completely

Summary: tombstone entries are not deleted completely

Keywords:
Status:	CLOSED DUPLICATE of bug 736431
Alias:	None
Product:	389
Classification:	Retired
Component:	Replication - General
Sub Component:
Version:	1.2.8
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Assignee:	Noriko Hosoi
QA Contact:	Chandrasekar Kannan
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	690319

Reported:	2011-09-13 07:29 UTC by Toshimichi Aoki
Modified:	2015-01-04 23:51 UTC
CC List:	3 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2011-12-21 00:11:29 UTC
Embargoed:

Attachments
LDIF file that constructs nested entries (879 bytes, application/octet-stream), 2012-03-07 05:24 UTC, Mitsuru Tabata

Description Toshimichi Aoki 2011-09-13 07:29:50 UTC

Environment:
OS: RHEL 5.x
389DS: 389-ds-base-1.2.8.3-1.el5
Replication: 3 suppliers, 3 consumers

In the environment above, tombstone entries are not removed completely; remnants of them stay in the database.

I first thought this was related to Bug 696407 or Bug 684996,
but that does not appear to be the case.

After I deleted the entries below, error messages are logged during
the purge process.

cn=111111100,ou=group,o=example,c=jp
cn=222222200,cn=111111100,ou=group,o=example,c=jp



The error messages are:

[errors]
[13/Sep/2011:10:34:22 +0900] str2entry - Failed to convert DN cn=222222200 to RDN
[13/Sep/2011:10:34:22 +0900] id2entry - str2entry returned NULL for id 99152, string="rdn"



Here is the dbscan output:

[dbscan]
# dbscan -f /var/lib/dirsrv/slapd-wam-ldap01/db/example/id2entry.db4 -K 99152
id 99152
        rdn: nsuniqueid=4284f902-dd3911e0-883998b7-9c671d85,cn=222222200
        objectClass;vucsn-4e6df8ab0001044c0000: top
        objectClass;vucsn-4e6df8ab0001044c0000: groupOfUniqueNames
        objectClass;vucsn-4e6df8d10000044c0000: nsTombstone
        ou;vucsn-4e6df8ab0001044c0000: groups
        cn;vucsn-4e6df8ab0001044c0000;mdcsn-4e6df8ab0001044c0000: 222222200
        creatorsName;vucsn-4e6df8ab0001044c0000: cn=admin
        modifiersName;vucsn-4e6df8ab0001044c0000: cn=admin
        createTimestamp;vucsn-4e6df8ab0001044c0000: 20110912121849Z
        modifyTimestamp;vucsn-4e6df8ab0001044c0000: 20110912121849Z
        nsUniqueId: 4284f902-dd3911e0-883998b7-9c671d85
        parentid: 99151
        entryid: 99152
        nsParentUniqueId: 4284f901-dd3911e0-883998b7-9c671d85
        nscpEntryDN: cn=222222200,cn=111111100,ou=group,o=example,c=jp

# dbscan -f /var/lib/dirsrv/slapd-wam-ldap01/db/example/id2entry.db4 -K 99151
Can't set cursor to returned item: DB_NOTFOUND: No matching key/data pair found
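The two dbscan lookups above show the inconsistency directly: tombstone 99152 still records parentid 99151, but key 99151 is no longer in id2entry. As a rough illustration only, that check can be sketched in Python over a map of entryid to parentid copied by hand from dbscan output; `find_orphans` is a hypothetical helper, not part of 389-ds or dbscan.

```python
from typing import Dict, List, Optional


def find_orphans(parent_of: Dict[int, Optional[int]]) -> List[int]:
    """Return entry IDs whose parentid points to an ID absent from the map.

    parent_of maps entryid -> parentid (None for an entry with no parent).
    """
    present = set(parent_of)
    return sorted(
        eid for eid, pid in parent_of.items()
        if pid is not None and pid not in present
    )


# From the dbscan output above: 99152 has parentid 99151,
# and looking up key 99151 returned DB_NOTFOUND.
scanned = {99152: 99151}
print(find_orphans(scanned))  # [99152]
```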

Thanks for any help in advance.

Comment 3 Noriko Hosoi 2011-12-13 00:07:52 UTC

Aoki-san,

I'm trying to reproduce the problem.  So far, no luck.  Could you share the steps to reproduce the bug?

What are the values of nsds5ReplicaPurgeDelay and nsds5ReplicaTombstonePurgeInterval in your replica entry?

You had a subtree like this:
  ou=group,o=example,c=jp
  cn=111111100,ou=group,o=example,c=jp
  cn=222222200,cn=111111100,ou=group,o=example,c=jp
How did you delete the subtree?

Does the problem occur every time you repeat the test?  Or does it depend on when you delete the subtree?

Comment 4 Toshimichi Aoki 2011-12-14 11:12:29 UTC

Thank you for the response.

> I'm trying to reproduce the problem.  So far, no luck.  Could you share the
> steps to reproduce the bug?
> 
> What are the values of nsds5ReplicaPurgeDelay and
> nsds5ReplicaTombstonePurgeInterval in your replica entry?

The default values are used:
nsds5ReplicaPurgeDelay            : 604800
nsDS5ReplicaTombstonePurgeInterval:  86400
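For orientation, a simplified reading of these defaults: a tombstone becomes eligible for purging once it is older than nsds5ReplicaPurgeDelay (604800 s, 7 days), and a purge pass runs every nsDS5ReplicaTombstonePurgeInterval (86400 s, 1 day). The sketch below computes the worst-case lifetime under that assumed model. It ignores the replication state the server actually consults when deciding what may be purged, so treat the result as an estimate, not a guarantee; the function name is hypothetical.

```python
SECONDS_PER_DAY = 86400


def worst_case_tombstone_lifetime(purge_delay_s: int, purge_interval_s: int) -> int:
    """Upper bound, in seconds, on a tombstone's lifetime under a simplified model.

    Assumed model: a tombstone becomes eligible once it is older than
    purge_delay_s, and a purge pass runs every purge_interval_s. In the worst
    case it becomes eligible just after a pass and waits one full interval.
    """
    return purge_delay_s + purge_interval_s


# Default values reported above.
delay, interval = 604800, 86400
total = worst_case_tombstone_lifetime(delay, interval)
print(delay // SECONDS_PER_DAY, interval // SECONDS_PER_DAY, total // SECONDS_PER_DAY)  # 7 1 8
```

With the 60/60 test settings used later in this thread, the same formula gives 120 seconds, although comment 9 notes that in practice the purge also waits for a subsequent update operation.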

> You had a subtree like this:
>   ou=group,o=example,c=jp
>   cn=111111100,ou=group,o=example,c=jp
>   cn=222222200,cn=111111100,ou=group,o=example,c=jp
> How did you delete the subtree?

The problem occurred when the lower-level entry was deleted first, then its parent (steps 1 then 2):
1. ldapdelete -x -D cn=root -W cn=222222200,cn=111111100,ou=group,o=osakagas,c=jp
2. ldapdelete -x -D cn=root -W cn=111111100,ou=group,o=osakagas,c=jp

> Does the problem occur every time you repeat the test?  Or does it depend on
> when you delete the subtree?

I retried this case.
After restarting dirsrv or running db2ldif, the problem always occurred.
(The problem may not occur if dirsrv is never restarted after it is started
and the delete operations in steps 1 and 2 are performed.)
Please try restarting dirsrv.

Thanks for any help in advance.

Comment 5 Noriko Hosoi 2011-12-21 00:11:29 UTC

Thank you for the steps to reproduce the problem.  I was able to reproduce it, and it looks like a duplicate of this bug:

Bug 736431 - parent tombstone entry could be reaped even if its child
             tombstone entries still exist
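The title of bug 736431 describes the failure: a parent tombstone can be reaped while child tombstones still reference it, which leaves a child such as 99152 pointing at a missing parentid. The toy model below illustrates the ordering constraint that title implies: reap children before parents, and keep any parent that still has a surviving child. It is a hypothetical sketch for illustration, not the actual 389-ds fix; the IDs come from the dbscan output above and `reap_order` is an invented name.

```python
from typing import Dict, List, Optional, Set


def reap_order(parent_of: Dict[int, Optional[int]], expired: Set[int]) -> List[int]:
    """Return the expired tombstones that may be reaped, children before parents.

    parent_of maps entryid -> parentid; a parentid that is None or not in the
    map is treated as a non-tombstone ancestor (a root of this forest).
    A tombstone is reaped only if it has expired AND all of its children
    are reaped in the same pass.
    """
    children: Dict[int, List[int]] = {eid: [] for eid in parent_of}
    roots: List[int] = []
    for eid, pid in parent_of.items():
        if pid is not None and pid in children:
            children[pid].append(eid)
        else:
            roots.append(eid)

    order: List[int] = []

    def visit(eid: int) -> bool:
        # Post-order walk: decide on every child before deciding on the parent.
        all_children_reaped = True
        for child in sorted(children[eid]):
            if not visit(child):
                all_children_reaped = False
        if eid in expired and all_children_reaped:
            order.append(eid)
            return True
        return False

    for root in sorted(roots):
        visit(root)
    return order


# 99151 = cn=111111100 tombstone, 99152 = cn=222222200 tombstone (its child).
tree = {99151: None, 99152: 99151}
print(reap_order(tree, {99151, 99152}))  # [99152, 99151]
print(reap_order(tree, {99151}))         # []
```

In the second call the parent has expired but its child has not, so nothing is reaped; reaping 99151 at that point is the situation that leaves 99152 orphaned.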

Please take a look at that bug as well as this Trac ticket:

https://fedorahosted.org/389/ticket/2

*** This bug has been marked as a duplicate of bug 736431 ***

Comment 6 Noriko Hosoi 2012-01-03 22:28:23 UTC

[trac] https://fedorahosted.org/389/ticket/2

Comment 7 Mitsuru Tabata 2012-03-07 05:24:46 UTC

Created attachment 568133
LDIF file that constructs nested entries.

Comment 8 Mitsuru Tabata 2012-03-07 05:28:41 UTC

I tested following "Trac-Ticket2-test-scenario.txt", using source code downloaded on 2012/2/17.

* test scenario
https://fedorahosted.org/389/attachment/ticket/2/Trac-Ticket2-test-scenario.txt

* download command
git clone http://git.fedorahosted.org/git/389/ds.git
 -> 389ds version is "1.2.11.a1.gitf7b882a"

* Entry with a nested structure that was deleted in this test
(See attachment 568133, 'import.ldif'.)
# ldapdelete -x -D xxxxxxxx -W "uid=000000001,ou=people,o=example,c=jp" -r

As a result of the test, the following errors were not recorded in the log file:
  "_entry_set_tombstone_rdn - Failed to convert DN uid=... to RDN"
  "id2entry - str2entry returned NULL for id 30, string="rdn""

However, the deleted entries still seem to remain in the database:
# dbscan -f /var/lib/dirsrv/slapd-xxxxxxxx/db/example/id2entry.db4 -K XX
  id XX
    rdn: nsuniqueid=a7c47f04-610c11e1-839ef58b-164ce082,uid=000000001

My environment is configured as follows:
  "nsds5ReplicaPurgeDelay:60"
  "nsds5ReplicaTombstonePurgeInterval: 60"

When is this entry deleted from the database?


I also found a new problem, separate from the one above.

If an entry is deleted while the "referential integrity postoperation" plug-in is enabled, the process terminates unexpectedly. (When the plug-in is not enabled, the process does not terminate.)
The test steps are as follows.

1. Enable the "referential integrity postoperation" plug-in:
  cn: referential integrity postoperation
  nsslapd-pluginEnabled: on

2. Delete an entry that is referenced by other entries.
(See attachment 568133, 'import.ldif'.)
# ldapdelete -x -D xxxxxxxx -W "uid=000000001,ou=people,o=example,c=jp" -r

3. Check "dirsrv" service
# service dirsrv status
dirsrv "instance" dead but pid file exists


Thanks for any help in advance.

Comment 9 Noriko Hosoi 2012-03-07 17:39:45 UTC

Tabata-san,

> When is this entry deleted from the database?

The next time any update operation is made.  So you may need to perform an update operation such as an add, modify, or delete.

> 3. Check "dirsrv" service
> # service dirsrv status
> dirsrv "instance" dead but pid file exists

You mean your server crashed?  Do you have any core?  Or what happens if you attach gdb to ns-slapd and delete the entries?

Thanks!

Comment 10 Noriko Hosoi 2012-03-07 19:33:40 UTC

> 3. Check "dirsrv" service

I could not reproduce the problem.
$ service dirsrv status
dirsrv s1 (pid 3450) is running...
dirsrv s2 (pid 4013) is running...

$ ldapsearch -h localhost -p <port1> -LLLx -b "o=my.com" -D 'cn=directory manager' -w <pw> dn
dn: o=example,c=jp
dn: ou=people,o=example,c=jp
dn: ou=group,o=example,c=jp
dn: cn=000000003,ou=group,o=example,c=jp

$ ldapsearch -h localhost -p <port2> -LLLx -b "o=my.com" -D 'cn=directory manager' -w Secret123 dn
dn: o=example,c=jp
dn: ou=people,o=example,c=jp
dn: ou=group,o=example,c=jp
dn: cn=000000003,ou=group,o=example,c=jp

Also, could you try the same steps against 389-ds-base 1.2.10-3?  That version should be more stable; a build from git master may contain unexpected bugs.

Comment 11 Mitsuru Tabata 2012-03-14 06:54:19 UTC

Thank you for the response.

> The next time any update operation is made.  So you may need to perform an
> update operation such as an add, modify, or delete.

After deleting other entries, I confirmed that the entries mentioned above were deleted.

> Also, could you try the same steps against 389-ds-base 1.2.10-3?  That version
> should be more stable; a build from git master may contain unexpected
> bugs.

I tried the same steps against 389-ds-base 1.2.10-3. As a result, the process was not terminated when the "referential integrity postoperation" plug-in was enabled.

Thanks for any help in advance.

Comment 12 Noriko Hosoi 2012-03-14 17:20:35 UTC

(In reply to comment #11)
> I tried the same steps against 389-ds-base 1.2.10-3. As a result, the process
> was not terminated when the "referential integrity postoperation" plug-in
> was enabled.

When you write "process is not terminated", does that mean "if you tried to shut down the server, the server hung"?  Or some other symptom?

If your server is hanging, could you attach your server's stack trace to this bug?

Here is the how-to for the crash case.  If the server hangs, you need to attach gdb to the running server directly to get the stack traces.
http://port389.org/wiki/FAQ#Debugging_Crashes

Comment 13 Mitsuru Tabata 2012-04-02 06:47:53 UTC

Thank you for the response.

> When you write "process is not terminated", does that mean "if you tried to
> shut down the server, the server hung"?  Or some other symptom?

It does not mean "the server hung"; it is simply the result of the test against 389-ds-base 1.2.10-3 (with that version, the process did not terminate).

Thanks for any help in advance.
