Bug 572018

Summary: Upgrading from 1.2.5 to 1.2.6.a2 deletes userRoot
Product: [Retired] 389
Reporter: Anthony Messina <amessina>
Component: Install/Uninstall
Assignee: Nathan Kinder <nkinder>
Status: CLOSED CURRENTRELEASE
QA Contact: Viktor Ashirov <vashirov>
Severity: medium
Priority: high
Version: 1.2.6
CC: amsharma, d.bz-redhat, jgalipea, lsc55578, msauton, nhosoi, rmeggins
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2015-12-07 16:40:44 UTC
Bug Blocks: 576869

Attachments:

Description	Flags
error log from upgrade	none
error log from successful upgrade	none
Output from the dbscan commands (cfr. comments #15-#16)	none
diff output (comment 28)	none
dn2rdn output (comment 26)	none
1.2.7a3 DB upgrade log	none
Output from dbscan (cfr. comment #43)	none
Output from dbscan (cfr. comment #45)	none
script to replace attr value with "X..."	none
git patch file (master)	nkinder: review+

Description Anthony Messina 2010-03-10 01:32:37 UTC

Created attachment 398981
error log from upgrade

During 'yum upgrade', userRoot is deleted.  See the attachment containing error logs from one of my servers.  This happened on an x86_64 master and an i686 slave.

Comment 1 Rich Megginson 2010-03-11 19:09:26 UTC

I'm using Fedora 12 x86_64.  So far I've tried upgrading from a clean 1.2.2 install and a clean 1.2.5 install.

install 389-ds-base package - downloaded the rpm from koji
setup-ds.pl - select defaults, except use dc=example,dc=com for suffix
run ldif2db to load the database
stop the server
set nsslapd-errorlog-level 16384 to enable any LDAPDebug(LDAP_DEBUG_ANY) messages
start the server

yum --enablerepo=updates-testing update 389-ds-base

Everything works fine

So I'm currently at a loss about how to reproduce this problem.

There are a few other error messages that seem odd:

[06/Mar/2010:21:50:15 -0600] - nsslapd-subtree-rename-switch is on, while the instance userRoot is in the DN format. Please run dn2rdn to convert the database format.
[06/Mar/2010:21:50:15 -0600] - nsslapd-subtree-rename-switch is on, while the instance NetscapeRoot is in the DN format. Please run dn2rdn to convert the database format.
[06/Mar/2010:21:50:15 -0600] - start: Failed to start databases, err=-1 Unknown error: -1
[06/Mar/2010:21:50:15 -0600] - Failed to start database plugin ldbm database
[06/Mar/2010:21:50:15 -0600] - WARNING: ldbm instance userRoot already exists
[06/Mar/2010:21:50:15 -0600] - WARNING: ldbm instance NetscapeRoot already exists
[06/Mar/2010:21:50:15 -0600] binder-based resource limits - nsLookThroughLimit: parameter error (slapi_reslimit_register() already registered)
[06/Mar/2010:21:50:15 -0600] - start: Resource limit registration failed
[06/Mar/2010:21:50:15 -0600] - Failed to start database plugin ldbm database

I don't know what would have caused these messages.

This is the main problem with the upgrade:
[06/Mar/2010:21:55:40 -0600] - ancestorid BAD 13110, err=-30988 DB_NOTFOUND: No matching key/data pair found
[06/Mar/2010:21:55:40 -0600] - Failed to create ancestorid index
[06/Mar/2010:21:55:40 -0600] - import userRoot: Failed to create ancestorid index

It could not find the entry by id in id2entry.  I don't know how this could have happened.

Note that the upgrade did back up your database to /var/lib/dirsrv/slapd-elburn/bak/reindex_2010_03_06_21_55_35, so if all else fails, you should be able to restore from that.  You may have to turn off the nsslapd-subtree-rename-switch first.

Is there anything else that might help me reproduce this problem?

Comment 2 Anthony Messina 2010-03-12 04:03:55 UTC

I guess I'm not sure how to answer that.  I was using the stock 1.2.5 from the Fedora repos, then used the Fedora updates-testing repo to upgrade to 1.2.6.a2.

I did not use Koji, not that it should matter.  I did see the problem on both i686 and x86_64 machines.  Perhaps this was caused by some of the data in the server that wouldn't let it go through the db upgrade process.

Either way, it happened on both.

I have one other server (i686) that I *might* be able to try this weekend.  I'll report back on that.

Comment 3 Rich Megginson 2010-03-12 04:11:34 UTC

Could have something to do with replication.  I'll try a test of replicated databases - it could have something to do with the way we handle tombstones and the interaction with ancestorid.

Comment 4 Rich Megginson 2010-03-12 17:10:20 UTC

Tried replication -
setup master to master replication on F-12 with 389-ds-base 1.2.5
deleted several leaf (user) entries on one master - verified the entries were deleted on the other master
yum --enablerepo=updates-testing update 389-ds-base
verified that no errors occurred

So I'm still at a loss as to how I can reproduce this problem

Comment 5 Anthony Messina 2010-03-13 02:41:23 UTC

Well, Rich, I'll be darned, I can't replicate it with my other (slave) server now.  However, I will tell you that the first two servers were using data that contained the "$$" in the postalAddress fields (https://bugzilla.redhat.com/show_bug.cgi?id=570905).

This last server of mine had been completely reinitialized with clean data after my first upgrade disaster.  It seems possible (from my end-user perspective) that the syntax errors might have prevented the data in userRoot from being upgraded properly since the syntax rules changed.

I will attach the error log from this successful upgrade (i386) in case you can spot anything different.

Perhaps you might try using a 1.2.5 setup with some of the "$$" stuff in several postalAddress values, then try the upgrade to 1.2.6.a2.

Thanks for your attentiveness.

Comment 6 Anthony Messina 2010-03-13 02:41:56 UTC

Created attachment 399801
error log from successful upgrade

Comment 7 Rich Megginson 2010-03-15 15:28:02 UTC

Does not appear to have anything to do with syntax problems.

Installed clean 389-ds-base 1.2.5 on F-12 x86_64
Imported Example.ldif
Added several homePostalAddress values with $$ in them (NOTE: I had to turn off syntax checking in order to do this - then I turned syntax checking back on)
yum --enablerepo=updates-testing upgrade 389-ds-base

Everything worked fine - no errors.

Comment 8 Anthony Messina 2010-03-18 18:12:22 UTC

Ok, I'm not sure where to go with this either.  I'll wait until the next updates-testing release and try again.

Comment 9 Rich Megginson 2010-04-23 15:08:57 UTC

*** Bug 585067 has been marked as a duplicate of this bug. ***

Comment 10 Rich Megginson 2010-04-23 22:49:09 UTC

I've tried setting up mmr 2 master - add/delete/reap tombstones - still cannot reproduce.  One thing that is weird about both of the problem cases - the banner is printed in the error log:

[23/Apr/2010:23:05:42 -0400] - slapd stopped.
	389-Directory/1.2.6.a3 B2010.105.1818
	agcdvldbr01:3389 (/etc/dirsrv/slapd-agcdvldbr01)

[23/Apr/2010:23:05:49 -0400] - autosize_import_cache: pagesize: 4096, pages: 193761, procpages: 7690

The banner is the 389-Directory/1.2.6.a3 etc.  In my test cases, I do not see this, I just see something like the following:

[23/Apr/2010:23:05:42 -0400] - slapd stopped.
[23/Apr/2010:23:05:49 -0400] - autosize_import_cache: pagesize: 4096, pages: 193761, procpages: 7690

Are both of you using chkconfig to start 389/dirsrv at boot time?  Are you using init or daemontools or something like that to make sure 389/dirsrv is automatically started if it ever goes down?

Another weird thing is that even though userRoot fails, NetscapeRoot succeeds.

Finally, one of the cases is from an upgrade of 1.2.2, and another from 1.2.5.  Do you know when you originally started using Fedora DS/389?  That is, did you upgrade to 1.2.2 or 1.2.5 from a version 1.2.0 or older of Fedora DS/389?

Comment 11 Rich Megginson 2010-04-27 01:48:24 UTC

To ssh://git.fedorahosted.org/git/389/ds.git
   1d7f7f5..4fa2ee8  master -> master
commit 4fa2ee84eb3dfdfd202585a59403195b408bbb8f
Author: Rich Megginson <rmeggins>
Date:   Mon Apr 26 17:26:00 2010 -0600
    Fix Description: According to the error message, the entry id cannot be
    found in the id2entry file.  The entry id comes from the parentid index,
    which has just been created by the dn2rdn upgradedb process.  The entryid
    is the key in the parentid index.  I'm not sure how this can happen -
    either the parentid contains the id of an entry that does not exist, or
    the entryid was somehow corrupted.  I've added some additional debugging
    statements to try to narrow this down.
    Platforms tested: RHEL5 x86_64
    Flag Day: no
    Doc impact: no

Comment 12 Anthony Messina 2010-06-23 20:42:58 UTC

I don't see these errors with:

389-ds-base-1.2.6-0.7.rc2.fc13.x86_64
389-admin-1.1.11-0.5.rc1.fc13.x86_64

Comment 13 Didier 2010-10-12 10:17:42 UTC

Experiencing the same issue with 1.2.6-1 :

# grep 389-ds /var/log/yum.log
Jan 28 16:42:52 Installed: 389-ds-base-1.2.4-1.el5.x86_64
Jan 28 16:42:52 Installed: 389-ds-console-1.2.0-5.el5.noarch
Jan 28 16:42:53 Installed: 389-dsgw-1.1.4-1.el5.x86_64
Jan 28 16:42:53 Installed: 389-ds-console-doc-1.2.0-5.el5.noarch
Jan 28 16:42:59 Installed: 389-ds-1.1.3-6.el5.noarch
Jan 28 16:42:59 Installed: 389-ds-base-1.2.4-1.el5.x86_64
Feb 10 04:51:47 Updated: 389-ds-base-1.2.5-1.el5.x86_64
Feb 10 04:51:54 Updated: 389-ds-base-1.2.5-1.el5.x86_64
Oct 08 12:39:41 Updated: 389-ds-base-1.2.6-1.el5.x86_64
Oct 08 12:40:00 Updated: 389-dsgw-1.1.5-1.el5.x86_64
Oct 08 12:40:02 Updated: 389-ds-console-1.2.3-1.el5.noarch
Oct 08 12:40:02 Updated: 389-ds-console-doc-1.2.3-1.el5.noarch
Oct 08 12:40:03 Updated: 389-ds-1.2.1-1.el5.noarch
Oct 08 12:40:34 Updated: 389-ds-base-1.2.6-1.el5.x86_64


* Banner is printed in the error log too :
        389-Directory/1.2.6 B2010.238.2134


* userRoot and NetscapeRoot are processed OK :

[12/Oct/2010:09:28:11 +0200] upgrade DB - userRoot: Start upgradedb.
...
[12/Oct/2010:09:28:12 +0200] - reindex userRoot: Reindexing complete.  Processed 497 entries in 1 seconds. (497.00 entries/sec)
...
[12/Oct/2010:09:28:12 +0200] upgrade DB - NetscapeRoot: Start upgradedb.
...
[12/Oct/2010:09:28:13 +0200] - reindex NetscapeRoot: Reindexing complete.  Processed 247 entries in 1 seconds. (247.00 entries/sec)


* other databases yield errors :

[12/Oct/2010:09:28:14 +0200] upgrade DB - certprx.dmbr.ugent.be-pki-ocsp-dmbr: Start upgradedb.
[12/Oct/2010:09:28:14 +0200] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[12/Oct/2010:09:28:14 +0200] - reindex certprx.dmbr.ugent.be-pki-ocsp-dmbr: Index buffering enabled with bucket size 100
[12/Oct/2010:09:28:14 +0200] - reindex certprx.dmbr.ugent.be-pki-ocsp-dmbr: Workers finished; cleaning up...
[12/Oct/2010:09:28:15 +0200] - reindex certprx.dmbr.ugent.be-pki-ocsp-dmbr: Workers cleaned up.
[12/Oct/2010:09:28:15 +0200] - reindex certprx.dmbr.ugent.be-pki-ocsp-dmbr: Cleaning up producer thread...
[12/Oct/2010:09:28:15 +0200] - reindex certprx.dmbr.ugent.be-pki-ocsp-dmbr: Indexing complete.  Post-processing...
[12/Oct/2010:09:28:15 +0200] - ancestorid BAD 13110, err=-30989 DB_NOTFOUND: No matching key/data pair found
[12/Oct/2010:09:28:15 +0200] ancestorid - Error: unable to find entry id [234881024] (original [14]) in id2entry
[12/Oct/2010:09:28:15 +0200] ancestorid - Error: ldbm_parentid on node index [3] of [4]
[12/Oct/2010:09:28:15 +0200] - Failed to create ancestorid index
[12/Oct/2010:09:28:15 +0200] - reindex certprx.dmbr.ugent.be-pki-ocsp-dmbr: Failed to create ancestorid index
[12/Oct/2010:09:28:15 +0200] - reindex certprx.dmbr.ugent.be-pki-ocsp-dmbr: Closing files...
...
[12/Oct/2010:09:28:15 +0200] - reindex certprx.dmbr.ugent.be-pki-ocsp-dmbr: Reindexing failed.
[12/Oct/2010:09:28:15 +0200] upgrade DB - upgradedb: Failed to upgrade database certprx.dmbr.ugent.be-pki-ocsp-dmbr
...
...
[12/Oct/2010:09:28:17 +0200] - reindex cert.dmbr.UGent.be-pki-ca-dmbr: Reindexing failed.
[12/Oct/2010:09:28:17 +0200] upgrade DB - upgradedb: Failed to upgrade database cert.dmbr.UGent.be-pki-ca-dmbr
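With several backends in one instance, it can help to pull out just the names of the databases that failed. The following is a minimal sketch, assuming the error log uses the same "upgradedb: Failed to upgrade database NAME" wording as the lines quoted above; the function name `failed_backends` is made up for illustration and the wording may differ in other server versions.

```python
import re

# Pattern copied from the log lines quoted in this comment; the exact
# wording in other server versions is an assumption.
FAILED_UPGRADE = re.compile(r"upgradedb: Failed to upgrade database (\S+)")


def failed_backends(log_lines):
    """Return backend names reported as failed, in order of first appearance."""
    seen = []
    for line in log_lines:
        match = FAILED_UPGRADE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

Feeding it the lines of the errors file (for example `failed_backends(open(path))`, where `path` is wherever the instance writes its error log) returns a list of backend names to look at first.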

Comment 14 Rich Megginson 2010-10-12 14:29:54 UTC

Can you try 389-ds-base-1.2.6.1-2 which is in epel-testing?

Comment 15 Noriko Hosoi 2010-10-12 16:38:04 UTC

Would it be possible to run these commands against the pre-upgrade DB on 389 v1.2.5 and share the results with us?  You could use the backed-up DB as well.

# dbscan -f /var/lib/dirsrv/slapd-YOURID/db/certprx.dmbr.ugent.be-pki-ocsp-dmbr/ancestorid.db4 -r
=1                                      
	# # # ... 
=2                                      
	... 

# dbscan -f /var/lib/dirsrv/slapd-YOURID/db/certprx.dmbr.ugent.be-pki-ocsp-dmbr/id2entry.db4 -K 4

Comment 16 Noriko Hosoi 2010-10-12 16:39:08 UTC

Oops, I meant the entry id 14 not 4.  Sorry...

# dbscan -f
/var/lib/dirsrv/slapd-YOURID/db/certprx.dmbr.ugent.be-pki-ocsp-dmbr/id2entry.db4
-K 14

Comment 17 Didier 2010-10-12 18:00:57 UTC

Created attachment 453004
Output from the dbscan commands (cfr. comments #15-#16)

Re: comment #16 :

Please find attached the requested dbscan outputs :

- these are from the most recent pre-1.2.6 backup ("2010_05_25_13_17_51") ;
- certificate bodies have been snipped for brevity.

Comment 18 Didier 2010-10-12 18:18:10 UTC

Re: comment #14 :

Upgraded to 389-ds-base-1.2.6.1-2.el5 ; identical error messages :


[12/Oct/2010:20:14:12 +0200] upgrade DB - certprx.dmbr.ugent.be-pki-ocsp-dmbr: Start upgradedb.
[12/Oct/2010:20:14:12 +0200] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[12/Oct/2010:20:14:12 +0200] - reindex certprx.dmbr.ugent.be-pki-ocsp-dmbr: Index buffering enabled with bucket size 100
[12/Oct/2010:20:14:13 +0200] - reindex certprx.dmbr.ugent.be-pki-ocsp-dmbr: Workers finished; cleaning up...
[12/Oct/2010:20:14:13 +0200] - reindex certprx.dmbr.ugent.be-pki-ocsp-dmbr: Workers cleaned up.
[12/Oct/2010:20:14:13 +0200] - reindex certprx.dmbr.ugent.be-pki-ocsp-dmbr: Cleaning up producer thread...
[12/Oct/2010:20:14:13 +0200] - reindex certprx.dmbr.ugent.be-pki-ocsp-dmbr: Indexing complete.  Post-processing...
[12/Oct/2010:20:14:13 +0200] - ancestorid BAD 13110, err=-30989 DB_NOTFOUND: No matching key/data pair found
[12/Oct/2010:20:14:13 +0200] ancestorid - Error: unable to find entry id [234881024] (original [14]) in id2entry
[12/Oct/2010:20:14:13 +0200] ancestorid - Error: ldbm_parentid on node index [3] of [4]
[12/Oct/2010:20:14:13 +0200] - Failed to create ancestorid index
[12/Oct/2010:20:14:13 +0200] - reindex certprx.dmbr.ugent.be-pki-ocsp-dmbr: Failed to create ancestorid index
[12/Oct/2010:20:14:13 +0200] - reindex certprx.dmbr.ugent.be-pki-ocsp-dmbr: Closing files...

Comment 19 Didier 2010-10-18 06:42:45 UTC

Would you like me to provide more logs and/or debug outputs ?

Comment 20 Noriko Hosoi 2010-10-18 16:32:02 UTC

(In reply to comment #19)
> Would you like me to provide more logs and/or debug outputs ?

Thank you.  We are trying to reproduce the problem in house.  So far no luck...
This error means the ancestorid indexing code failed to find entry ID 14 in the primary db (id2entry.db4).  But as the dbscan results you shared with us show, the entry does exist in it.
  ancestorid BAD 13110, err=-30989 DB_NOTFOUND: No matching key/data pair found

We are trying to figure out how the DB_NOTFOUND error could have happened.  Do you think you have any special server setups we'd better be aware of?
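One arithmetic observation about the ids in these log lines: 234881024 is exactly 14 with its four bytes in reverse order (0x0000000E versus 0x0E000000), and the same holds for the 738197504 / 44 pair reported elsewhere in this thread. The sketch below only checks that arithmetic. Whether byte order plays any part in the DB_NOTFOUND failure is not stated anywhere in this report, so treat that as an open question, not a diagnosis; the helper name `swap_bytes_32` is made up for illustration.

```python
def swap_bytes_32(value):
    """Reverse the byte order of an unsigned 32-bit integer."""
    return int.from_bytes(value.to_bytes(4, "big"), "little")


# The two (original id, logged id) pairs that appear in the error logs.
for original, logged in ((14, 234881024), (44, 738197504)):
    print(original, swap_bytes_32(original), swap_bytes_32(original) == logged)
```

Both comparisons come out true, which suggests the bracketed number is just a byte-swapped rendering of the "original" id and not a second, unrelated id.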

Comment 21 Didier 2010-10-19 07:33:54 UTC

(In reply to comment #20)

Dear Noriko,
Additional data points : 

1. The LDAP setup consists of a master and a (read-only) consumer.

2. both the LDAP and the internal (Dogtag) PKI server ran out of root filesystem space somewhere in August (yeah, I know) ; this was corrected at a later date.
If I recall correctly, 389-DS complained about an inconsistent state due to an unsolicited shutdown.

However, when restoring to a known good .ldif backup (from May), I got the same behaviour as described in this BR.

3. Since the upgrade of 389-DS and Dogtag, I am experiencing a problem with Dogtag too, with its logs referring to an LDAP issue ("unknown LDAP object classes") : bugzilla #642398 .


To stress again : I am experiencing these issues after a restoration of a known good .ldif backup.

Comment 22 Noriko Hosoi 2010-10-19 17:55:12 UTC

That's interesting input.  Could you run these commands?  If you don't see any output (or echo $? reports 0), your DB files are healthy.
db_verify /path/to/your/backup/id2entry.db4
db_verify /path/to/your/backup/ancestorid.db4

If you get these errors from ancestorid.db4, please ignore them.
    db_verify: Page 7: out-of-order key at entry <num>

It's a known bug in older Berkeley DB versions.
https://bugzilla.redhat.com/show_bug.cgi?id=472131
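When there are many backends to check, the same exit-status test can be scripted. This is a rough sketch, assuming `db_verify` is on the PATH and is called with just the file name, as in the commands above; the function name and the `command` parameter are made up for illustration. Per the note above, a failure on ancestorid.db4 caused by the "out-of-order key" message may be spurious, and this sketch does not try to tell the two apart.

```python
import subprocess


def verify_db_files(paths, command=("db_verify",)):
    """Run the verifier on each file; map path -> True when it exits with 0."""
    results = {}
    for path in paths:
        completed = subprocess.run(
            [*command, path],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
        # Exit status 0 is the "healthy" case described in the comment above.
        results[path] = completed.returncode == 0
    return results
```

The `command` parameter exists only so that the verifier binary can be swapped out (for a different install path, or for a stand-in during a dry run).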

Comment 23 Didier 2010-10-19 18:33:41 UTC

db_verify (from db4 & db4-utils 4.3.29-10.el5_5.2) did not yield any output, hence the DB files should be healthy ...

Comment 24 Noriko Hosoi 2010-10-19 18:57:22 UTC

(In reply to comment #23)
> db_verify (from db4 & db4-utils 4.3.29-10.el5_5.2) did not yield any output,
> hence the DB files should be healthy ...

Thank you, Didier!  That's good to know.

BTW, I've noticed you keep quite a lot of state information -- I counted 25 of them.  They are not very useful when you have just one master.  Thus, you may want to decrease the value of nsDS5ReplicaPurgeDelay, which makes the entry size much smaller.  (I don't think that's related to this upgrade problem, though...)

http://docs.redhat.com/docs/en-US/Red_Hat_Directory_Server/8.2/html/Configuration_and_Command-Line_Tool_Reference/Core_Server_Configuration_Reference.html#Replication_Attributes_under_cnreplica_cnsuffixName_cnmapping_tree_cnconfig-nsDS5ReplicaPurgeDelay

Comment 25 Didier 2010-10-19 21:12:41 UTC

Thank you for the information.
Anything else I can help you with, Noriko ? I'd hate it if this turned out to be a user (= me) invoked error and I'd be just wasting precious developers' time.

If needed, I'd be glad to provide you with the raw DB data ; however, this being certificates data, of course I would need to be informed about all possible ramifications if/when making this available. :)

Comment 26 Noriko Hosoi 2010-10-19 22:32:24 UTC

Thank YOU for providing us with the good information.  This problem is quite serious, so any input would be much appreciated.

If you could provide us with data with which we could reproduce the problem, that'd be fabulous.

Compared to that, this should be much easier :).  Could you share the entire output from the upgrade process?  (The errors log should store all the output.)  You gave us snippets in comments 13 and 18.  I'd like to check whether anything suspicious was logged before that...

Thanks!

Comment 27 Marc Sauton 2010-10-20 00:41:15 UTC

Question for comment 25:
Maybe the data is not relevant to the real issue reported here, but is the suffix replicated from a clone of a Dogtag OCSP responder? (CRL is small?)

Comment 28 Noriko Hosoi 2010-10-20 18:36:17 UTC

Didier, Could you try one more thing?  Could you run this command line?
# diff -r -twU4 /etc/dirsrv/schema /etc/dirsrv/slapd-YOURID/schema

Do you see any difference in files other than 99user.ldif?

Also, we are looking forward to your error log and an answer to Marc.

Comment 29 Didier 2010-10-21 13:05:10 UTC

Created attachment 454818
diff output (comment 28)

Output from diff, as requested in comment 28.

Comment 30 Didier 2010-10-21 13:15:04 UTC

Created attachment 454822
dn2rdn output (comment 26)

Output from dn2rdn, as requested in comment 26.

Comment 31 Didier 2010-10-21 13:26:08 UTC

(In reply to comment #27)

The setup is as follows :
- Master ldap server (ldap1) in intranet, with read-only replica (ldap2) in DMZ ;
- CA Dogtag server (cert) in intranet, with auxiliary OCSP/RA (certprx) in DMZ ;
- 'cert' stores its certificates in 'ldap1', which is also set up to authenticate our network ssh/cifs users (not in production yet) ;
- 'ldap1' and 'ldap2' communicate over LDAPS, using certificates signed by the 'cert' CA.

I do not know whether this answers your question, Marc ?


(as previously noted in comment #21, since the upgrade I am experiencing CSR issues too, as described in bug #642398).

Comment 32 Marc Sauton 2010-10-21 16:58:57 UTC

When I looked at the attachment from comment 17 at
https://bugzilla.redhat.com/attachment.cgi?id=453004
I was wondering why the suffix used by the Dogtag OCSP instance is replicated, with the idea to create a test case to help reproduce the issue to work with Noriko.

Comment 33 Didier 2010-10-21 17:57:33 UTC

If this can help : I have no objection in providing you with a dump of the database, if a script could be provided which obfuscates LDAP auth data, user shadow & samba passwords, and certificate private key data.

(please do not consider this a sign of mistrust ; rather, it is merely an IMHO best practices precaution).


Alternatively : would supplying a login credential with applicable permissions on the ldap1 server be of any use ?

Comment 34 Noriko Hosoi 2010-10-21 20:47:48 UTC

(In reply to comment #33)
> If this can help : I have no objection in providing you with a dump of the
> database, if a script could be provided which obfuscates LDAP auth data, user
> shadow & samba passwords, and certificate private key data.

Thanks for the offer, Didier.  I'll ask around to look for such a tool.

> Alternatively : would supplying a login credential with applicable permissions
> on the ldap1 server be of any use ?

Comment 36 Noriko Hosoi 2010-10-26 18:49:34 UTC

Didier,

Can we ask you a favour? We fixed one severe bug in the upgrade. We are about to release 389 v1.2.7 alpha3. Is it possible to run the upgrade test using the new bits in your environment?

If your system is RHEL5 equivalent, you can download the package from here.
http://koji.fedoraproject.org/koji/buildinfo?buildID=201596

If you need some other platform version, you can choose it from here.
http://koji.fedoraproject.org/koji/packageinfo?packageID=8423

Please be careful, the page contains other versions/revisions, as well. The latest alpha 3 should be located at the top of the page starting with this string: 389-ds-base-1.2.7-0.3.a3

These are the steps you could take...
1. restore the 389 v1.2.5 server and database.
(please make sure /var/lib/dirsrv/slapd-ID/db/DBVERSION and /var/lib/dirsrv/slapd-ID/*/DBVERSION do not contain the "dn-4514" keyword. If they do, the database files won't be upgraded.)
2. remove 389-ds-base-1.2.5 packages
rpm -e --nodeps 389-ds-base
rpm -e --nodeps 389-ds-base-selinux-devel # if you have installed
rpm -e --nodeps 389-ds-base-debuginfo # if you have installed
rpm -e --nodeps 389-ds-base-devel # if you have installed
3. install 389-ds-base-1.2.7-0.3.a3 packages.
rpm -ivh 389-ds-base-1.2.7-0.3.a3.....
rpm -ivh 389-ds-base-selinux-devel-1.2.7-0.3.a3.....
4. make sure the directory server is not running
service dirsrv stop
5. run "setup-ds.pl -u"
...
Would you like to continue with update? [yes]: yes
...
Which update mode do you want to use? [quit]: Offline
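The DBVERSION check in step 1 can be scripted before starting the upgrade. The following is a minimal sketch, assuming the files are plain text and that searching every DBVERSION file below the instance directory is acceptable (the two paths above are narrower); the function name `already_upgraded` is made up for illustration.

```python
from pathlib import Path


def already_upgraded(instance_dir, keyword="dn-4514"):
    """Return DBVERSION files under instance_dir whose text contains keyword."""
    hits = []
    for version_file in sorted(Path(instance_dir).rglob("DBVERSION")):
        text = version_file.read_text(errors="replace")
        if keyword in text:
            hits.append(version_file)
    return hits
```

Calling `already_upgraded("/var/lib/dirsrv/slapd-ID")` with the real instance name in place of ID should return an empty list on a freshly restored 1.2.5 database; any path it does return marks a database that, per step 1, would be skipped by the upgrade.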

We'd greatly appreciate your help.

Comment 37 Didier 2010-11-02 09:56:03 UTC

(In reply to comment #36)
Apologies for missing this comment from a few days ago.

I just upgraded from 389-ds-base-1.2.6.1-2.el5 to 389-ds-base-1.2.7-0.3.a3.el5 ; please check attachment "errors.Nov2.out" for the full log.


1. I first ran dn2rdn , resulting in the same errors ("ancestorid - Error: unable to find entry id [738197504] (original [44]) in id2entry") ;

2. I then restored from a verified good backup ('bkp20100520a'), and

3a. "# grep -i rename /etc/dirsrv/slapd-dmbrldap1/dse.ldif"
     nsslapd-subtree-rename-switch: on

3b. ran "# setup-ds.pl -u".

Apparently, this yielded a successful upgrade, although :
- there are lots of "WARNING: skipping an entry with no RDN (id {XXX})" warnings;
- "Dryrun" and "Processed 0 entries" ?

As I am not at the office today, I cannot verify the functionality of the databases until tomorrow.

Comment 38 Didier 2010-11-02 09:58:23 UTC

Created attachment 457139
1.2.7a3 DB upgrade log

Logfile, showing results of dn2rdn / DB restore / setup-ds.pl -u

Comment 39 Noriko Hosoi 2010-11-02 16:25:37 UTC

Thank you for testing v1.2.7.a3, Didier.

Unfortunately, the bug is still there.  The new version does restore better when the upgradednformat/reindex went wrong.  That part was "fixed", but not the real cause of the failure.

[02/Nov/2010:10:21:34 +0100] upgrade DB - cert.dmbr.UGent.be-pki-ca-dmbr: Start upgradedb.
[02/Nov/2010:10:21:34 +0100] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[02/Nov/2010:10:21:34 +0100] - reindex cert.dmbr.UGent.be-pki-ca-dmbr: Index buffering enabled with bucket size 100
[02/Nov/2010:10:21:35 +0100] - reindex cert.dmbr.UGent.be-pki-ca-dmbr: Workers finished; cleaning up...
[02/Nov/2010:10:21:35 +0100] - reindex cert.dmbr.UGent.be-pki-ca-dmbr: Workers cleaned up.
[02/Nov/2010:10:21:35 +0100] - reindex cert.dmbr.UGent.be-pki-ca-dmbr: Cleaning up producer thread...
[02/Nov/2010:10:21:35 +0100] - reindex cert.dmbr.UGent.be-pki-ca-dmbr: Indexing complete.  Post-processing...
[02/Nov/2010:10:21:35 +0100] - ancestorid BAD 13110, err=-30989 DB_NOTFOUND: No matching key/data pair found
[02/Nov/2010:10:21:36 +0100] ancestorid - Error: unable to find entry id [738197504] (original [44]) in id2entry
[02/Nov/2010:10:21:36 +0100] ancestorid - Error: ldbm_parentid on node index [11] of [12]
[02/Nov/2010:10:21:36 +0100] - Failed to create ancestorid index
[02/Nov/2010:10:21:36 +0100] - reindex cert.dmbr.UGent.be-pki-ca-dmbr: Failed to create ancestorid index
[02/Nov/2010:10:21:36 +0100] - reindex cert.dmbr.UGent.be-pki-ca-dmbr: Closing files...

I asked around the team for a tool to obfuscate data, with no luck...  For us, the contents of the db should not matter.  So, I think I'm going to come up with a simple tool to do it.  Thank you!!

Comment 40 Didier 2010-11-02 17:10:50 UTC

Noriko,

IMO, the error you quoted in comment #39, is part of the "dn2rdn" attempt.

To reiterate :

10:21:29 : "dn2rdn" (fail)
10:32:23 : "bak2db" (restore from backup, succeeded)
10:35:29 : "setup-ds.pl -u" (succeeded)

Apologies if this is off-topic.

Comment 41 Noriko Hosoi 2010-11-02 17:45:42 UTC

(In reply to comment #40)
> Noriko,
> 
> IMO, the error you quoted in comment #39, is part of the "dn2rdn" attempt.
> 
> To reiterate :
> 
> 10:21:29 : "dn2rdn" (fail)
> 10:32:23 : "bak2db" (restore from backup, succeeded)
> 10:35:29 : "setup-ds.pl -u" (succeeded)

Thank you for pointing it out.  Actually, dn2rdn is called in "setup-ds.pl -u" via 90subtreerename.pl, and it's supposed to be called after upgradednformat (80upgradednformat.pl).
  /usr/share/dirsrv/updates
  80upgradednformat.pl
  90subtreerename.pl

Could it be possible to rerun the upgrade test just with "setup-ds.pl -u" without running "dn2rdn" manually?

Comment 42 Didier 2010-11-02 18:38:12 UTC

(In reply to comment #41)
 
> Could it be possible to rerun the upgrade test just with "setup-ds.pl -u"
> without running "dn2rdn" manually?

Maybe I'm just being a bit thick here, but isn't this what I did, by restoring from backup (10:32:23) before upgrading (10:35:29) ?

Comment 43 Noriko Hosoi 2010-11-02 20:33:53 UTC

I guess I misunderstood the steps.  I thought restore is from the ancestorid error...

May I ask you another favour?  What does this command line return?
dbscan -f /path/to/cert.dmbr.UGent.be-pki-ca-dmbr/id2entry.db4 | egrep dn:

I'm curious about the deleted entries.

Comment 44 Didier 2010-11-02 21:07:53 UTC

Created attachment 457289
Output from dbscan (cfr. comment #43)

(In reply to comment #43)

Requested dbscan output (from *after* the 'setup-ds.pl -u' command).

Comment 45 Noriko Hosoi 2010-11-02 21:27:55 UTC

Indeed this entry (entryid 44) does not exist.
cn=CN-DMBR Certificate Authority_O-DMBR Security Domain,dc=certprx.dmbr.ug...

To compare, please run the same command against the original backup.
Thank you!

Comment 46 Didier 2010-11-02 21:54:19 UTC

Created attachment 457295
Output from dbscan (cfr. comment #45)

(In reply to comment #45)

Requested dbscan output (from *after* restoration of original backup, *before* the 1.2.7a3 'setup-ds.pl -u' command).

Comment 47 Noriko Hosoi 2010-11-02 22:41:58 UTC

Oops. My fault.  The backend in Comment 17 is certprx.dmbr.ugent.be-pki-ocsp-dmbr, and the current one is cert.dmbr.UGent.be-pki-ca-dmbr.  No wonder "cn=CN-DMBR Certificate Authority_O-DMBR Security Domain" does not exist. :p

Interestingly, this is the only difference between dbscan_dn.out and dbscan_dn-orig.out.
$ diff -twU2 dbscan_dn.out dbscan_dn-orig.out 
--- dbscan_dn.out	2010-11-02 14:18:32.000000000 -0700
+++ dbscan_dn-orig.out	2010-11-02 15:21:36.000000000 -0700
@@ -177,4 +177,3 @@
         entrydn: cn=21,ou=certificaterepository,ou=ca,dc=cert.dmbr.ugent.be-pki-ca-dmb
         dn: nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff, dc=cert.dmbr.ugent.be-pki-
-        rdn: nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff
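The same comparison can be done without saving intermediate files. This is a rough sketch, assuming the two dbscan dumps are already available as text; it keeps every line containing "dn:" (which, like the egrep used earlier, also matches "entrydn:" and "rdn:" lines) and reports the ones missing from the second dump. The function names are made up for illustration.

```python
def dn_lines(dump_text):
    """Collect the stripped lines of a dbscan dump that contain 'dn:'."""
    return [line.strip() for line in dump_text.splitlines() if "dn:" in line]


def only_in_first(first_dump, second_dump):
    """Return 'dn:' lines present in the first dump but missing from the second."""
    second = set(dn_lines(second_dump))
    return [line for line in dn_lines(first_dump) if line not in second]
```

Unlike diff, this ignores ordering and surrounding context, so it only answers "which dn, entrydn or rdn lines disappeared", not where in the dump they were.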

Comment 48 Noriko Hosoi 2010-11-02 23:26:30 UTC

Created attachment 457305
script to replace attr value with "X..."

Hi Didier,

Attached is a very simple perl script which replaces the specified attributes' value (ascii chars) with the X's.  It's a quick and dirty script.  Please try it and examine the result if it's okay to send it to us or not.  Please feel free to modify the script, as well.

How to run the script:
1) export the backend db.
/usr/lib[64]/dirsrv/slapd-ID/db2ldif -n certprx.dmbr.ugent.be-pki-ocsp-dmbr -r -U -a /tmp/ocsp.ldif
/usr/lib[64]/dirsrv/slapd-ID/db2ldif -n cert.dmbr.UGent.be-pki-ca-dmbr -r -U -a /tmp/ca.ldif

2) replace the sensitive attribute values.  Note: please add any other attributes whose values you want to replace.
perl /path/to/replace.pl /tmp/ocsp.ldif userPassword userCertificate > /tmp/ocsp-replaced.ldif
perl /path/to/replace.pl /tmp/ca.ldif userPassword userCertificate > /tmp/ca-replaced.ldif
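For readers without access to the attachment, the idea can be sketched in a few lines. This is not the attached perl script; it is a hypothetical Python equivalent written under these assumptions: attribute names are matched case-insensitively with options such as ";binary" ignored, folded continuation lines (starting with one space) are masked along with the line they continue, and base64 values ("attr:: ...") are written back as plain "attr: XXXX" values.

```python
def mask_ldif(lines, attributes):
    """Yield LDIF lines with the values of the named attributes replaced by X's."""
    wanted = {name.lower() for name in attributes}
    masking = False  # True while inside a value that is being masked
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(" "):
            # Folded continuation of the previous line's value.
            if masking:
                yield " " + "X" * (len(line) - 1)
            else:
                yield line
            continue
        name, sep, value = line.partition(":")
        # Drop attribute options such as ";binary" before comparing names.
        masking = bool(sep) and name.split(";")[0].lower() in wanted
        if masking:
            # Strip the base64 (":") or URL ("<") marker and leading spaces.
            body = value.lstrip(":< ")
            yield name + ": " + "X" * len(body)
        else:
            yield line
```

Something like `"\n".join(mask_ldif(open("/tmp/ca.ldif"), ["userPassword", "userCertificate"]))` would produce the masked text. As with the attached script, the result should be inspected by hand before it is sent anywhere.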

If you prefer, you could directly send the ldif files to me.  My email address is "nhosoi".

Comment 49 Didier 2010-11-04 11:35:34 UTC

(In reply to comment #48)

> If you prefer, you could directly send the ldif files to me.  My email address
> is "nhosoi".

Mail sent, subject "Bugzilla #572018 ldif data".

Comment 50 Noriko Hosoi 2010-11-08 19:35:58 UTC

Created attachment 458863
git patch file (master)

Description: The upgrade tool dn2rdn shares the reindex code, in
which a cursor is set to each entry in id2entry.db#, the dn is
converted to an rdn, and the entry is put back into the btree with
db->put, one entry at a time.  It turned out that this strategy does
not work in the case where the btree structure is modified; in that
case, the btree could get corrupted.  This patch instead creates a new
id2entry_tmp.db# from scratch.  Once all the entries are successfully
converted, the old db is removed and id2entry_tmp.db# is renamed to
id2entry.db#.
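The "build a new file, then rename" strategy can be illustrated with ordinary text files. The sketch below is only an analogy under stated assumptions: it uses one record per line instead of a Berkeley DB btree, it is not the code from the patch, and the names `rewrite_via_temp` and `convert` are made up for illustration. The property it shows is that the original file is never modified while it is being read, so a failed conversion leaves it intact.

```python
import os


def rewrite_via_temp(path, convert):
    """Convert every line of `path` into a fresh file, then swap it into place."""
    tmp_path = path + "_tmp"
    try:
        with open(path, "r") as src, open(tmp_path, "w") as dst:
            for record in src:
                dst.write(convert(record.rstrip("\n")) + "\n")
    except Exception:
        # Conversion failed: discard the partial copy, leave the original untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    # All records converted: the new file takes the place of the old one.
    os.replace(tmp_path, path)
```

The in-place alternative (rewriting records in the same file while iterating over it) is the pattern the description above identifies as unsafe when the underlying structure changes during the scan.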

Files:
 ldap/servers/slapd/back-ldbm/back-ldbm.h
 ldap/servers/slapd/back-ldbm/dblayer.c
 ldap/servers/slapd/back-ldbm/import-threads.c
 ldap/servers/slapd/back-ldbm/import.c
 ldap/servers/slapd/back-ldbm/proto-back-ldbm.h

Comment 52 Noriko Hosoi 2010-11-08 20:47:28 UTC

Reviewed by Nathan (Thank you!!!)

Pushed to master.

$ git merge 572018
Updating edeb742..becb87f
Fast-forward
 ldap/servers/slapd/back-ldbm/back-ldbm.h       |    3 +
 ldap/servers/slapd/back-ldbm/dblayer.c         |  163 ++++++++++++++++--------
 ldap/servers/slapd/back-ldbm/import-threads.c  |   76 ++++++++++-
 ldap/servers/slapd/back-ldbm/import.c          |    2 +-
 ldap/servers/slapd/back-ldbm/proto-back-ldbm.h |    3 +-
 5 files changed, 185 insertions(+), 62 deletions(-)

$ git push
Counting objects: 20, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (11/11), done.
Writing objects: 100% (11/11), 10.40 KiB, done.
Total 11 (delta 8), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
   edeb742..becb87f  master -> master

Comment 53 Didier 2010-11-08 22:49:23 UTC

Any chance of testing the patch through a rawhide RPM, or is this still premature ?

Comment 54 Rich Megginson 2010-11-08 22:54:32 UTC

(In reply to comment #53)
> Any chance of testing the patch through a rawhide RPM, or is this still
> premature ?

Yes.  We're working on a 1.2.7.a5 release.

Comment 55 Rich Megginson 2010-11-09 18:15:21 UTC

(In reply to comment #53)
> Any chance of testing the patch through a rawhide RPM, or is this still
> premature ?

389-ds-base-1.2.7.a5 is now built in koji.  It should be available in rawhide very soon (not sure when the push happens).  I've released this as an update to Testing on all Fedora platforms.  If you want the rpms directly, go to koji:
http://koji.fedoraproject.org/koji/packageinfo?packageID=8423

Comment 56 Didier 2010-11-10 10:51:56 UTC

Tested with 389-ds-base-1.2.7-0.6.a5.el5.x86_64.

The issue (started in comment #13) appears to be fixed : dn2rdn works, and dirsrv starts with 'nsslapd-subtree-rename-switch: on' in dse.ldif.

Thanks to all !

Comment 57 Rich Megginson 2010-11-10 14:20:35 UTC

(In reply to comment #56)
> Tested with 389-ds-base-1.2.7-0.6.a5.el5.x86_64.
> 
> The issue (started in comment #13) appears to be fixed : dn2rdn works, and
> dirsrv starts with 'nsslapd-subtree-rename-switch: on' in dse.ldif.
> 
> Thanks to all !

Thank you for patience and assistance!

Comment 58 Noriko Hosoi 2010-11-10 16:41:01 UTC

(In reply to comment #57)
> (In reply to comment #56)
> > Tested with 389-ds-base-1.2.7-0.6.a5.el5.x86_64.
> > 
> > The issue (started in comment #13) appears to be fixed : dn2rdn works, and
> > dirsrv starts with 'nsslapd-subtree-rename-switch: on' in dse.ldif.
> > 
> > Thanks to all !
> 
> Thank you for patience and assistance!

Didier, without your help, we could not find out the cause of this bug.  Thank you very much for your support!

Comment 59 Didier 2010-11-10 19:17:38 UTC

My pleasure.
I really enjoy that warm, cosy feeling of being a (very small) part of the OSS community, of which Red Hat truly is an exemplary citizen.

Comment 60 Amita Sharma 2011-05-25 11:02:33 UTC

Hi Noriko,

I have seen the steps at comment#51, my question is "is there a way I can test it with latest version of 389-ds?"

Thanks,
Amita

Comment 63 Rich Megginson 2011-08-31 17:45:06 UTC

Since upgrade works from 8.2 to 9.0, can we mark this bug as VERIFIED?

Comment 64 Amita Sharma 2011-08-31 17:49:56 UTC

yes.