Bug 572018
Summary: Upgrading from 1.2.5 to 1.2.6.a2 deletes userRoot
Product: [Retired] 389
Reporter: Anthony Messina <amessina>
Component: Install/Uninstall
Assignee: Nathan Kinder <nkinder>
Status: CLOSED CURRENTRELEASE
QA Contact: Viktor Ashirov <vashirov>
Severity: medium
Priority: high
Version: 1.2.6
CC: amsharma, d.bz-redhat, jgalipea, lsc55578, msauton, nhosoi, rmeggins
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2015-12-07 16:40:44 UTC
Bug Blocks: 576869

I'm using Fedora 12 x86_64. So far I've tried upgrading from a clean 1.2.2 install and a clean 1.2.5 install.

install 389-ds-base package - downloaded the rpm from koji
setup-ds.pl - select defaults, except use dc=example,dc=com for suffix
run ldif2db to load the database
stop the server
set nsslapd-errorlog-level 16384 to enable any LDAPDebug(LDAP_DEBUG_ANY) messages
start the server
yum --enablerepo=updates-testing update 389-ds-base

Everything works fine

So I'm currently at a loss about how to reproduce this problem.

There are a few other error messages that seem odd:
[06/Mar/2010:21:50:15 -0600] - nsslapd-subtree-rename-switch is on, while the instance userRoot is in the DN format. Please run dn2rdn to convert the database format.
[06/Mar/2010:21:50:15 -0600] - nsslapd-subtree-rename-switch is on, while the instance NetscapeRoot is in the DN format. Please run dn2rdn to convert the database format.
[06/Mar/2010:21:50:15 -0600] - start: Failed to start databases, err=-1 Unknown error: -1
[06/Mar/2010:21:50:15 -0600] - Failed to start database plugin ldbm database
[06/Mar/2010:21:50:15 -0600] - WARNING: ldbm instance userRoot already exists
[06/Mar/2010:21:50:15 -0600] - WARNING: ldbm instance NetscapeRoot already exists
[06/Mar/2010:21:50:15 -0600] binder-based resource limits - nsLookThroughLimit: parameter error (slapi_reslimit_register() already registered)
[06/Mar/2010:21:50:15 -0600] - start: Resource limit registration failed
[06/Mar/2010:21:50:15 -0600] - Failed to start database plugin ldbm database

I don't know what would have caused these messages.

This is the main problem with the upgrade:
[06/Mar/2010:21:55:40 -0600] - ancestorid BAD 13110, err=-30988 DB_NOTFOUND: No matching key/data pair found
[06/Mar/2010:21:55:40 -0600] - Failed to create ancestorid index
[06/Mar/2010:21:55:40 -0600] - import userRoot: Failed to create ancestorid index

It could not find the entry by id in id2entry. I don't know how this could have happened.
Note that the upgrade did back up your database to /var/lib/dirsrv/slapd-elburn/bak/reindex_2010_03_06_21_55_35, so if all else fails, you should be able to restore from that. You may have to turn off the nsslapd-subtree-rename-switch first.

Is there anything else that might help me reproduce this problem?

I guess I'm not sure how to answer that. I was using the stock 1.2.5 from the Fedora repos, then used the Fedora updates-testing repo to upgrade to 1.2.6.a2. I did not use Koji, not that it should matter. I did see the problem on both i686 and x86_64 machines. Perhaps this was caused by some of the data in the server that wouldn't let it go through the db upgrade process. Either way, it happened on both. I have one other server (i686) that I *might* be able to try this weekend. I'll report back on that.

Could have something to do with replication. I'll try a test of replicated databases - it could have something to do with the way we handle tombstones and the interaction with ancestorid.

Tried replication - set up master to master replication on F-12 with 389-ds-base 1.2.5
deleted several leaf (user) entries on one master - verified the entries were deleted on the other master
yum --enablerepo=updates-testing update 389-ds-base
verified that no errors occurred

So I'm still at a loss as to how I can reproduce this problem

Well, Rich, I'll be darned, I can't replicate it with my other (slave) server now. However, I will tell you that the first two servers were using data that contained the "$$" in the postalAddress fields (https://bugzilla.redhat.com/show_bug.cgi?id=570905). This last server of mine had been completely reinitialized with clean data after my first upgrade disaster. It seems possible (from my end user perspective) that the syntax errors might have prevented the data in userRoot from being upgraded properly since the syntax rules changed. I will attach the error log from this successful upgrade (i386) in case you can spot anything different.
Perhaps you might try using a 1.2.5 setup with some of the "$$" stuff in several postalAddress values, then try the upgrade to 1.2.6.a2.

Thanks for your attentiveness.

Created attachment 399801 [details]
error log from successful upgrade
Does not appear to have anything to do with syntax problems.
Installed clean 389-ds-base 1.2.5 on F-12 x86_64
Imported Example.ldif
Added several homePostalAddress values with $$ in them (NOTE: I had to turn off syntax checking in order to do this - then I turned syntax checking back on)
yum --enablerepo=updates-testing upgrade 389-ds-base
Everything worked fine - no errors.

Ok, I'm not sure where to go with this either. I'll wait until the next updates-testing release and try again.

*** Bug 585067 has been marked as a duplicate of this bug. ***

I've tried setting up mmr 2 master - add/delete/reap tombstones - still cannot reproduce.

One thing that is weird about both of the problem cases - the banner is printed in the error log:
[23/Apr/2010:23:05:42 -0400] - slapd stopped.
389-Directory/1.2.6.a3 B2010.105.1818 agcdvldbr01:3389 (/etc/dirsrv/slapd-agcdvldbr01)
[23/Apr/2010:23:05:49 -0400] - autosize_import_cache: pagesize: 4096, pages: 193761, procpages: 7690

The banner is the 389-Directory/1.2.6.a3 etc. In my test cases, I do not see this, I just see something like the following:
[23/Apr/2010:23:05:42 -0400] - slapd stopped.
[23/Apr/2010:23:05:49 -0400] - autosize_import_cache: pagesize: 4096, pages: 193761, procpages: 7690

Are both of you using chkconfig to make 389/dirsrv started at boot time? Are you using init or daemontools or something like that to make sure 389/dirsrv is automatically started if it ever goes down?

Another weird thing is that even though userRoot fails, NetscapeRoot succeeds.

Finally, one of the cases is from an upgrade of 1.2.2, and another from 1.2.5. Do you know when you originally started using Fedora DS/389? That is, did you upgrade to 1.2.2 or 1.2.5 from a version 1.2.0 or older of Fedora DS/389?

To ssh://git.fedorahosted.org/git/389/ds.git
1d7f7f5..4fa2ee8 master -> master

commit 4fa2ee84eb3dfdfd202585a59403195b408bbb8f
Author: Rich Megginson <rmeggins>
Date: Mon Apr 26 17:26:00 2010 -0600

Fix Description: According to the error message, the entry id cannot be found in the id2entry file. The entry id comes from the parentid index, which has just been created by the dn2rdn upgradedb process. The entryid is the key in the parentid index. I'm not sure how this can happen - either the parentid contains the id of an entry that does not exist, or the entryid was somehow corrupted. I've added some additional debugging statements to try to narrow this down.
Platforms tested: RHEL5 x86_64
Flag Day: no
Doc impact: no

I don't see these errors with:
389-ds-base-1.2.6-0.7.rc2.fc13.x86_64
389-admin-1.1.11-0.5.rc1.fc13.x86_64

Experiencing the same issue with 1.2.6-1 :

# grep 389-ds /var/log/yum.log
Jan 28 16:42:52 Installed: 389-ds-base-1.2.4-1.el5.x86_64
Jan 28 16:42:52 Installed: 389-ds-console-1.2.0-5.el5.noarch
Jan 28 16:42:53 Installed: 389-dsgw-1.1.4-1.el5.x86_64
Jan 28 16:42:53 Installed: 389-ds-console-doc-1.2.0-5.el5.noarch
Jan 28 16:42:59 Installed: 389-ds-1.1.3-6.el5.noarch
Jan 28 16:42:59 Installed: 389-ds-base-1.2.4-1.el5.x86_64
Feb 10 04:51:47 Updated: 389-ds-base-1.2.5-1.el5.x86_64
Feb 10 04:51:54 Updated: 389-ds-base-1.2.5-1.el5.x86_64
Oct 08 12:39:41 Updated: 389-ds-base-1.2.6-1.el5.x86_64
Oct 08 12:40:00 Updated: 389-dsgw-1.1.5-1.el5.x86_64
Oct 08 12:40:02 Updated: 389-ds-console-1.2.3-1.el5.noarch
Oct 08 12:40:02 Updated: 389-ds-console-doc-1.2.3-1.el5.noarch
Oct 08 12:40:03 Updated: 389-ds-1.2.1-1.el5.noarch
Oct 08 12:40:34 Updated: 389-ds-base-1.2.6-1.el5.x86_64

* Banner is printed in the error log too :
389-Directory/1.2.6 B2010.238.2134

* userRoot and NetscapeRoot are processed OK :
[12/Oct/2010:09:28:11 +0200] upgrade DB - userRoot: Start upgradedb.
...
[12/Oct/2010:09:28:12 +0200] - reindex userRoot: Reindexing complete. Processed 497 entries in 1 seconds. (497.00 entries/sec)
...
[12/Oct/2010:09:28:12 +0200] upgrade DB - NetscapeRoot: Start upgradedb.
...
[12/Oct/2010:09:28:13 +0200] - reindex NetscapeRoot: Reindexing complete. Processed 247 entries in 1 seconds. (247.00 entries/sec)

* other databases yield errors :
[12/Oct/2010:09:28:14 +0200] upgrade DB - certprx.dmbr.ugent.be-pki-ocsp-dmbr: Start upgradedb.
[12/Oct/2010:09:28:14 +0200] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[12/Oct/2010:09:28:14 +0200] - reindex certprx.dmbr.ugent.be-pki-ocsp-dmbr: Index buffering enabled with bucket size 100
[12/Oct/2010:09:28:14 +0200] - reindex certprx.dmbr.ugent.be-pki-ocsp-dmbr: Workers finished; cleaning up...
[12/Oct/2010:09:28:15 +0200] - reindex certprx.dmbr.ugent.be-pki-ocsp-dmbr: Workers cleaned up.
[12/Oct/2010:09:28:15 +0200] - reindex certprx.dmbr.ugent.be-pki-ocsp-dmbr: Cleaning up producer thread...
[12/Oct/2010:09:28:15 +0200] - reindex certprx.dmbr.ugent.be-pki-ocsp-dmbr: Indexing complete. Post-processing...
[12/Oct/2010:09:28:15 +0200] - ancestorid BAD 13110, err=-30989 DB_NOTFOUND: No matching key/data pair found
[12/Oct/2010:09:28:15 +0200] ancestorid - Error: unable to find entry id [234881024] (original [14]) in id2entry
[12/Oct/2010:09:28:15 +0200] ancestorid - Error: ldbm_parentid on node index [3] of [4]
[12/Oct/2010:09:28:15 +0200] - Failed to create ancestorid index
[12/Oct/2010:09:28:15 +0200] - reindex certprx.dmbr.ugent.be-pki-ocsp-dmbr: Failed to create ancestorid index
[12/Oct/2010:09:28:15 +0200] - reindex certprx.dmbr.ugent.be-pki-ocsp-dmbr: Closing files...
...
[12/Oct/2010:09:28:15 +0200] - reindex certprx.dmbr.ugent.be-pki-ocsp-dmbr: Reindexing failed.
[12/Oct/2010:09:28:15 +0200] upgrade DB - upgradedb: Failed to upgrade database certprx.dmbr.ugent.be-pki-ocsp-dmbr
...
...
[12/Oct/2010:09:28:17 +0200] - reindex cert.dmbr.UGent.be-pki-ca-dmbr: Reindexing failed.
[12/Oct/2010:09:28:17 +0200] upgrade DB - upgradedb: Failed to upgrade database cert.dmbr.UGent.be-pki-ca-dmbr

Can you try 389-ds-base-1.2.6.1-2 which is in epel-testing?

Could it be possible to run these command lines against the pre-upgrade DB on the 389 v1.2.5 and share the result with us? You could use the backed up DB, as well.
# dbscan -f /var/lib/dirsrv/slapd-YOURID/db/certprx.dmbr.ugent.be-pki-ocsp-dmbr/ancestorid.db4 -r
=1
# # # ...
=2
...
# dbscan -f /var/lib/dirsrv/slapd-YOURID/db/certprx.dmbr.ugent.be-pki-ocsp-dmbr/id2entry.db4 -K 4

Oops, I meant the entry id 14 not 4. Sorry...
# dbscan -f /var/lib/dirsrv/slapd-YOURID/db/certprx.dmbr.ugent.be-pki-ocsp-dmbr/id2entry.db4 -K 14

Created attachment 453004 [details]
Output from the dbscan commands (cfr. comments #15-#16)

Re: comment #16 : Please find attached the requested dbscan outputs :
- these are from the most recent pre-1.2.6 backup ("2010_05_25_13_17_51") ;
- certificate bodies have been snipped for brevity.

Re: comment #14 : Upgraded to 389-ds-base-1.2.6.1-2.el5 ; identical error messages :
[12/Oct/2010:20:14:12 +0200] upgrade DB - certprx.dmbr.ugent.be-pki-ocsp-dmbr: Start upgradedb.
[12/Oct/2010:20:14:12 +0200] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[12/Oct/2010:20:14:12 +0200] - reindex certprx.dmbr.ugent.be-pki-ocsp-dmbr: Index buffering enabled with bucket size 100
[12/Oct/2010:20:14:13 +0200] - reindex certprx.dmbr.ugent.be-pki-ocsp-dmbr: Workers finished; cleaning up...
[12/Oct/2010:20:14:13 +0200] - reindex certprx.dmbr.ugent.be-pki-ocsp-dmbr: Workers cleaned up.
[12/Oct/2010:20:14:13 +0200] - reindex certprx.dmbr.ugent.be-pki-ocsp-dmbr: Cleaning up producer thread...
[12/Oct/2010:20:14:13 +0200] - reindex certprx.dmbr.ugent.be-pki-ocsp-dmbr: Indexing complete. Post-processing...
[12/Oct/2010:20:14:13 +0200] - ancestorid BAD 13110, err=-30989 DB_NOTFOUND: No matching key/data pair found
[12/Oct/2010:20:14:13 +0200] ancestorid - Error: unable to find entry id [234881024] (original [14]) in id2entry
[12/Oct/2010:20:14:13 +0200] ancestorid - Error: ldbm_parentid on node index [3] of [4]
[12/Oct/2010:20:14:13 +0200] - Failed to create ancestorid index
[12/Oct/2010:20:14:13 +0200] - reindex certprx.dmbr.ugent.be-pki-ocsp-dmbr: Failed to create ancestorid index
[12/Oct/2010:20:14:13 +0200] - reindex certprx.dmbr.ugent.be-pki-ocsp-dmbr: Closing files...

Would you like me to provide more logs and/or debug outputs ?

(In reply to comment #19)
> Would you like me to provide more logs and/or debug outputs ?
Thank you. We are trying to reproduce the problem in house. So far no luck...

This error means the ancestorid indexing code failed to find the entry ID 14 in the primary db (id2entry.db4). But as you shared the dbscan results with us, the entry does exist in it.
ancestorid BAD 13110, err=-30989 DB_NOTFOUND: No matching key/data pair found

We are trying to figure out how the DB_NOTFOUND error could have happened. Do you think you have any special server setups we'd better be aware of?

(In reply to comment #20)
Dear Noriko,

Additional data points :
1. The LDAP setup consists of a master and a (read-only) consumer.
2. both the LDAP and the internal (Dogtag) PKI server ran out of root filesystem space somewhere in August (yeah, I know) ; this was corrected at a later date. If I recall correctly, 389-DS complained about an inconsistent state due to unsolicited shutdown. However, when restoring to a known good .ldif backup (from May), I got the same behaviour as described in this BR.
3. Since the upgrade of 389-DS and Dogtag, I am experiencing a problem with Dogtag too, with its logs referring to an LDAP issue ("unknown LDAP object classes") : bugzilla #642398 .
To stress again : I am experiencing these issues after a restoration of a known good .ldif backup.

That's an interesting input. Could you run these command lines? If you don't see any output (or echo $? reports 0), your DB files are healthy.
db_verify /path/to/your/backup/id2entry.db4
db_verify /path/to/your/backup/ancestorid.db4

If you get these errors from ancestorid.db4, please ignore them.
db_verify: Page 7: out-of-order key at entry <num>
It's a known bug in the older Berkeley DB.
https://bugzilla.redhat.com/show_bug.cgi?id=472131

db_verify (from db4 & db4-utils 4.3.29-10.el5_5.2) did not yield any output, hence the DB files should be healthy ...

(In reply to comment #23)
> db_verify (from db4 & db4-utils 4.3.29-10.el5_5.2) did not yield any output,
> hence the DB files should be healthy ...
Thank you, Didier! That's good to know.

BTW, I've noticed you keep quite a lot of state information -- I counted 25 of them. They are not very useful when you have just one master. Thus, you may want to decrease the value of nsDS5ReplicaPurgeDelay, which makes the entry size much smaller. (I don't think that's related to this upgrade problem, though...)
http://docs.redhat.com/docs/en-US/Red_Hat_Directory_Server/8.2/html/Configuration_and_Command-Line_Tool_Reference/Core_Server_Configuration_Reference.html#Replication_Attributes_under_cnreplica_cnsuffixName_cnmapping_tree_cnconfig-nsDS5ReplicaPurgeDelay

Thank you for the information. Anything else I can help you with, Noriko ? I'd hate it if this turned out to be a user (= me) invoked error and I'd be just wasting precious developers' time. If needed, I'd be glad to provide you with the raw DB data ; however, this being certificates data, of course I would need to be informed about all possible ramifications if/when making this available.

:) Thank YOU for providing us the good information. This problem is quite serious, so any input would be very appreciated.
If you could provide us the data which we could duplicate the problem with, that'd be fabulous. Compared to it, this should be much easier :). Could you share the entire output from the upgrade process? (the errors log should store all the output.) You gave us the snippet in the comment 13 and 18. I'd like to examine if there could be anything suspicious logged before... Thanks!

Question for comment 25: May be the data is not relevant to the real issue reported here, but is the suffix replicated from a clone of a Dogtag OCSP responder? (CRL is small?)

Didier, Could you try one more thing? Could you run this command line?
# diff -r -twU4 /etc/dirsrv/schema /etc/dirsrv/slapd-YOURID/schema
Do you see any difference in other than 99user.ldif? Also, we are looking forward to your error log and an answer to Marc.

Created attachment 454818 [details]
diff output (comment 28)
Output from diff, as requested in comment 28.

Created attachment 454822 [details]
dn2rdn output (comment 26)
Output from dn2rdn, as requested in comment 26.

(In reply to comment #27)
The setup is as follows :
- Master ldap server (ldap1) in intranet, with read-only replica (ldap2) in DMZ ;
- CA Dogtag server (cert) in intranet, with auxiliary OCSP/RA (certprx) in DMZ ;
- 'cert' stores its certificates in 'ldap1', which is also set up to authenticate our network ssh/cifs users (not in production yet) ;
- 'ldap1' and 'ldap2' communicate over LDAPS, using certificates signed by the 'cert' CA.
I do not know whether this answers your question, Marc ? (as previously noted in comment #21, since the upgrade I am experiencing CSR issues too, as described in bug #642398).

When I looked at the attachment from comment 17 at https://bugzilla.redhat.com/attachment.cgi?id=453004 I was wondering why the suffix used by the Dogtag OCSP instance is replicated, with the idea to create a test case to help reproduce the issue to work with Noriko.

If this can help : I have no objection in providing you with a dump of the database, if a script could be provided which obfuscates LDAP auth data, user shadow & samba passwords, and certificate private key data. (please do not consider this a sign of mistrust ; rather, it is merely an IMHO best practices precaution).
Alternatively : would supplying a login credential with applicable permissions on the ldap1 server be of any use ?

(In reply to comment #33)
> If this can help : I have no objection in providing you with a dump of the
> database, if a script could be provided which obfuscates LDAP auth data, user
> shadow & samba passwords, and certificate private key data.
Thanks for the offer, Didier. I'll ask around to look for such a tool.
> Alternatively : would supplying a login credential with applicable permissions
> on the ldap1 server be of any use ?

Didier, Can we ask you a favour? We fixed one severe bug in the upgrade. We are about to release 389 v1.2.7 alpha3. Is it possible to run the upgrade test using the new bit in your environment?

If your system is RHEL5 equivalent, you can download the package from here.
http://koji.fedoraproject.org/koji/buildinfo?buildID=201596
If you need some other platform version, you can choose it from here.
http://koji.fedoraproject.org/koji/packageinfo?packageID=8423
Please be careful, the page contains other versions/revisions, as well. The latest alpha 3 should be located at the top of the page starting with this string: 389-ds-base-1.2.7-0.3.a3

These are the steps you could take...
1. restore the 389 v1.2.5 server and database. (please make sure /var/lib/dirsrv/slapd-ID/db/DBVERSION and /var/lib/dirsrv/slapd-ID/*/DBVERSION do not contain "dn-4514" keyword. If it does, the database files won't be upgraded.)
2. remove 389-ds-base-1.2.5 packages
rpm -e --nodeps 389-ds-base
rpm -e --nodeps 389-ds-base-selinux-devel # if you have installed
rpm -e --nodeps 389-ds-base-debuginfo # if you have installed
rpm -e --nodeps 389-ds-base-devel # if you have installed
3. install 389-ds-base-1.2.7-0.3.a3 packages.
rpm -ivh 389-ds-base-1.2.7-0.3.a3.....
rpm -ivh 389-ds-base-selinux-devel-1.2.7-0.3.a3.....
4. make sure the directory server is not running
service dirsrv stop
5. run "setup-ds.pl -u"
...
Would you like to continue with update? [yes]: yes
...
Which update mode do you want to use? [quit]: Offline

We'd greatly appreciate your help.

(In reply to comment #36)
Apologies for missing this comment from a few days ago.

I just upgraded from 389-ds-base-1.2.6.1-2.el5 to 389-ds-base-1.2.7-0.3.a3.el5 ; please check attachment "errors.Nov2.out" for the full log.

1. I first ran dn2rdn , resulting in the same errors ("ancestorid - Error: unable to find entry id [738197504] (original [44]) in id2entry") ;
2. I then restored from a verified good backup ('bkp20100520a'), and
3a. "# grep -i rename /etc/dirsrv/slapd-dmbrldap1/dse.ldif"
nsslapd-subtree-rename-switch: on
3b. ran "# setup-ds.pl -u".

Apparently, this yielded a successful upgrade, although :
- there are lots of "WARNING: skipping an entry with no RDN (id {XXX})" warnings;
- "Dryrun" and "Processed 0 entries" ?

As I am not at the office today, I cannot verify the functionality of the databases until tomorrow.

Created attachment 457139 [details]
1.2.7a3 DB upgrade log
Logfile, showing results of dn2rdn / DB restore / setup-ds.pl -u
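
For anyone triaging a similar errors log, here is a minimal sketch (Python, not part of 389-ds) that lists which backends started upgradedb and which were reported as failed. The two patterns are taken from the log lines quoted in this report; the function name summarize_upgrade is made up for illustration, and other 389-ds versions may word these messages differently.

```python
import re

# Patterns based on the log lines quoted in this bug report, e.g.
#   "... upgrade DB - userRoot: Start upgradedb."
#   "... upgrade DB - upgradedb: Failed to upgrade database <backend>"
START_RE = re.compile(r"upgrade DB - (\S+): Start upgradedb\.")
FAILED_RE = re.compile(r"upgradedb: Failed to upgrade database (\S+)")


def summarize_upgrade(log_text):
    """Return (started, failed) backend names in order of first appearance."""
    started, failed = [], []
    for line in log_text.splitlines():
        match = START_RE.search(line)
        if match and match.group(1) not in started:
            started.append(match.group(1))
        match = FAILED_RE.search(line)
        if match and match.group(1) not in failed:
            failed.append(match.group(1))
    return started, failed
```

Backends that appear in the first list but not the second either completed or failed with a message this sketch does not look for, so the raw log still needs a read.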
Thank you for testing v1.2.7.a3, Didier.

Unfortunately, the bug is still there. The new version does restore better when the upgradednformat/reindex went wrong. That part was "fixed", but not the real cause of the failure.
[02/Nov/2010:10:21:34 +0100] upgrade DB - cert.dmbr.UGent.be-pki-ca-dmbr: Start upgradedb.
[02/Nov/2010:10:21:34 +0100] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[02/Nov/2010:10:21:34 +0100] - reindex cert.dmbr.UGent.be-pki-ca-dmbr: Index buffering enabled with bucket size 100
[02/Nov/2010:10:21:35 +0100] - reindex cert.dmbr.UGent.be-pki-ca-dmbr: Workers finished; cleaning up...
[02/Nov/2010:10:21:35 +0100] - reindex cert.dmbr.UGent.be-pki-ca-dmbr: Workers cleaned up.
[02/Nov/2010:10:21:35 +0100] - reindex cert.dmbr.UGent.be-pki-ca-dmbr: Cleaning up producer thread...
[02/Nov/2010:10:21:35 +0100] - reindex cert.dmbr.UGent.be-pki-ca-dmbr: Indexing complete. Post-processing...
[02/Nov/2010:10:21:35 +0100] - ancestorid BAD 13110, err=-30989 DB_NOTFOUND: No matching key/data pair found
[02/Nov/2010:10:21:36 +0100] ancestorid - Error: unable to find entry id [738197504] (original [44]) in id2entry
[02/Nov/2010:10:21:36 +0100] ancestorid - Error: ldbm_parentid on node index [11] of [12]
[02/Nov/2010:10:21:36 +0100] - Failed to create ancestorid index
[02/Nov/2010:10:21:36 +0100] - reindex cert.dmbr.UGent.be-pki-ca-dmbr: Failed to create ancestorid index
[02/Nov/2010:10:21:36 +0100] - reindex cert.dmbr.UGent.be-pki-ca-dmbr: Closing files...

I asked around the team for a tool to obfuscate data, with no luck... For us, the contents of the db should not matter. So, I think I'm going to come up with some simple tool to do it. Thank you!!

Noriko,

IMO, the error you quoted in comment #39, is part of the "dn2rdn" attempt.

To reiterate :

10:21:29 : "dn2rdn" (fail)
10:32:23 : "bak2db" (restore from backup, succeeded)
10:35:29 : "setup-ds.pl -u" (succeeded)

Apologies if this is off-topic.

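
A side observation on the two id pairs printed by the ancestorid error in this thread: in both cases the large value is the 32-bit byte-swapped form of the "original" id (14 = 0x0000000E and 234881024 = 0x0E000000; 44 = 0x0000002C and 738197504 = 0x2C000000). The sketch below only checks that arithmetic. It makes no claim about why the server prints both forms or whether byte order is related to the failure; swap32 is a hypothetical helper, not a 389-ds function.

```python
def swap32(n):
    """Reverse the byte order of a 32-bit unsigned integer."""
    return int.from_bytes(n.to_bytes(4, "big"), "little")


# (original id, id printed as "unable to find") pairs quoted in this report
for original, reported in ((14, 234881024), (44, 738197504)):
    print(original, hex(reported), swap32(original) == reported)
```

Both comparisons come out true, so the odd-looking ids refer to the same entries 14 and 44 that dbscan was asked to look up.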
(In reply to comment #40)
> Noriko,
>
> IMO, the error you quoted in comment #39, is part of the "dn2rdn" attempt.
>
> To reiterate :
>
> 10:21:29 : "dn2rdn" (fail)
> 10:32:23 : "bak2db" (restore from backup, succeeded)
> 10:35:29 : "setup-ds.pl -u" (succeeded)

Thank you for pointing it out. Actually, dn2rdn is called in "setup-ds.pl -u" via 90subtreerename.pl and it's supposed to be called after upgradednformat (80upgradednformat.pl).
/usr/share/dirsrv/updates
80upgradednformat.pl
90subtreerename.pl

Could it be possible to rerun the upgrade test just with "setup-ds.pl -u" without running "dn2rdn" manually?

(In reply to comment #41)
> Could it be possible to rerun the upgrade test just with "setup-ds.pl -u"
> without running "dn2rdn" manually?

Maybe I'm just being a bit thick here, but isn't this what I did, by restoring from backup (10:32:23) before upgrading (10:35:29) ?

I guess I misunderstood the steps. I thought restore is from the ancestorid error... May I ask you another favour? What does this command line return?
dbscan -f /path/to/cert.dmbr.UGent.be-pki-ca-dmbr/id2entry.db4 | egrep dn:
I'm curious about the deleted entries.

Created attachment 457289 [details]
Output from dbscan (cfr. comment #43)

(In reply to comment #43)
Requested dbscan output (from *after* the 'setup-ds.pl -u' command).

Indeed this entry (entryid 44) does not exist.
cn=CN-DMBR Certificate Authority_O-DMBR Security Domain,dc=certprx.dmbr.ug...
To compare, please run the same command line to the original back up. Thank you!

Created attachment 457295 [details]
Output from dbscan (cfr. comment #45)

(In reply to comment #45)
Requested dbscan output (from *after* restoration of original backup, *before* the 1.2.7a3 'setup-ds.pl -u' command).

Oops. My fault. The backend in Comment 17 is certprx.dmbr.ugent.be-pki-ocsp-dmbr, and the current one is cert.dmbr.UGent.be-pki-ca-dmbr. No wonder "cn=CN-DMBR Certificate Authority_O-DMBR Security Domain" does not exist. :p

Interestingly, this is the only difference between dbscan_dn.out and dbscan_dn-orig.out.
$ diff -twU2 dbscan_dn.out dbscan_dn-orig.out
--- dbscan_dn.out 2010-11-02 14:18:32.000000000 -0700
+++ dbscan_dn-orig.out 2010-11-02 15:21:36.000000000 -0700
@@ -177,4 +177,3 @@
entrydn: cn=21,ou=certificaterepository,ou=ca,dc=cert.dmbr.ugent.be-pki-ca-dmb
dn: nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff, dc=cert.dmbr.ugent.be-pki-
- rdn: nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff

Created attachment 457305 [details]
script to replace attr value with "X..."
Hi Didier,
Attached is a very simple perl script which replaces the specified attributes' value (ascii chars) with the X's. It's a quick and dirty script. Please try it and examine the result if it's okay to send it to us or not. Please feel free to modify the script, as well.
How to run the script:
1) export the backend db.
/usr/lib[64]/dirsrv/slapd-ID/db2ldif -n certprx.dmbr.ugent.be-pki-ocsp-dmbr -r -U -a /tmp/ocsp.ldif
/usr/lib[64]/dirsrv/slapd-ID/db2ldif -n cert.dmbr.UGent.be-pki-ca-dmbr -r -U -a /tmp/ca.ldif
2) replace the sensitive attribute values. Note: please add attributes you want to replace the value.
perl /path/to/replace.pl /tmp/ocsp.ldif userPassword userCertificate > /tmp/ocsp-replaced.ldif
perl /path/to/replace.pl /tmp/ca.ldif userPassword userCertificate > /tmp/ca-replaced.ldif
If you prefer, you could directly send the ldif files to me. My email address is "nhosoi".
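
The attached perl script itself is not reproduced in this report. As a rough illustration of the same idea, here is a minimal Python sketch that masks the values of chosen attributes in LDIF lines with X's. It is not the attached replace.pl: the function name mask_ldif is hypothetical, and it assumes plain LDIF where a continuation line starts with a single space and attribute options follow a ";".

```python
def mask_ldif(lines, attrs):
    """Replace the values of the named attributes with X's, line by line."""
    wanted = {a.lower() for a in attrs}
    out = []
    masking = False  # True while inside a value that is being masked
    for line in lines:
        if masking and line.startswith(" "):
            # Continuation line of a masked value: keep its length, hide its content
            out.append(" " + "X" * (len(line) - 1))
            continue
        masking = False
        name, sep, value = line.partition(":")
        # Ignore attribute options such as ";binary" when matching the name
        base_name = name.split(";")[0].lower()
        if sep and base_name in wanted:
            masking = True
            # "attr:: ..." marks a base64 value; the mask is written as plain text
            value = value.lstrip(":").strip()
            out.append(f"{name}: " + "X" * len(value))
        else:
            out.append(line)
    return out
```

Whatever tool is used, the masked LDIF is worth reading through by hand before sending it anywhere, since attributes not named on the command line pass through unchanged.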
(In reply to comment #48)
> If you prefer, you could directly send the ldif files to me. My email address
> is "nhosoi".

Mail sent, subject "Bugzilla #572018 ldif data".

Created attachment 458863 [details]
git patch file (master)
Description: The upgrade tool dn2rdn shares the reindex code, in
which a cursor is set to each entry in id2entry.db#, converting
dn to rdn and putting it back into the btree with db->put one by one.
It turned out that this strategy does not work when the btree
structure is modified; in that case, the btree could get corrupted.
This patch creates a new id2entry_tmp.db# from scratch. Once
all the entries are successfully converted, the old db is removed
and id2entry_tmp.db# is renamed to id2entry.db#.
Files:
ldap/servers/slapd/back-ldbm/back-ldbm.h
ldap/servers/slapd/back-ldbm/dblayer.c
ldap/servers/slapd/back-ldbm/import-threads.c
ldap/servers/slapd/back-ldbm/import.c
ldap/servers/slapd/back-ldbm/proto-back-ldbm.h
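
The strategy in the description above (convert into a fresh temporary store, then rename it over the original only after every entry has converted) can be sketched in a few lines. This is an illustration in Python on a JSON-lines file, not the C code from the patch; rebuild_store, the "_tmp" suffix handling and the file format are assumptions made for the example.

```python
import json
import os


def rebuild_store(path, convert):
    """Write a converted copy of a JSON-lines store, then swap it into place."""
    tmp_path = path + "_tmp"
    try:
        with open(path) as src, open(tmp_path, "w") as dst:
            for line in src:
                record = json.loads(line)
                dst.write(json.dumps(convert(record)) + "\n")
    except Exception:
        # Conversion failed part-way: drop the partial copy, keep the original
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    # Reached only when every record converted cleanly
    os.replace(tmp_path, path)
```

The point of the design is that the original is never modified while it is being read, so a failure in the middle leaves it exactly as it was.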
Reviewed by Nathan (Thank you!!!)

Pushed to master.

$ git merge 572018
Updating edeb742..becb87f
Fast-forward
ldap/servers/slapd/back-ldbm/back-ldbm.h | 3 +
ldap/servers/slapd/back-ldbm/dblayer.c | 163 ++++++++++++++++--------
ldap/servers/slapd/back-ldbm/import-threads.c | 76 ++++++++++-
ldap/servers/slapd/back-ldbm/import.c | 2 +-
ldap/servers/slapd/back-ldbm/proto-back-ldbm.h | 3 +-
5 files changed, 185 insertions(+), 62 deletions(-)

$ git push
Counting objects: 20, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (11/11), done.
Writing objects: 100% (11/11), 10.40 KiB, done.
Total 11 (delta 8), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
edeb742..becb87f master -> master

Any chance of testing the patch through a rawhide RPM, or is this still premature ?

(In reply to comment #53)
> Any chance of testing the patch through a rawhide RPM, or is this still
> premature ?
Yes. We're working on a 1.2.7.a5 release.

(In reply to comment #53)
> Any chance of testing the patch through a rawhide RPM, or is this still
> premature ?
389-ds-base-1.2.7.a5 is now built in koji. It should be available in rawhide very soon (not sure when the push happens). I've released this as an update to Testing on all Fedora platforms. If you want the rpms directly, go to koji: http://koji.fedoraproject.org/koji/packageinfo?packageID=8423

Tested with 389-ds-base-1.2.7-0.6.a5.el5.x86_64.

The issue (started in comment #13) appears to be fixed : dn2rdn works, and dirsrv starts with 'nsslapd-subtree-rename-switch: on' in dse.ldif.

Thanks to all !

(In reply to comment #56)
> Tested with 389-ds-base-1.2.7-0.6.a5.el5.x86_64.
>
> The issue (started in comment #13) appears to be fixed : dn2rdn works, and
> dirsrv starts with 'nsslapd-subtree-rename-switch: on' in dse.ldif.
>
> Thanks to all !
Thank you for patience and assistance!

(In reply to comment #57)
> (In reply to comment #56)
> > Tested with 389-ds-base-1.2.7-0.6.a5.el5.x86_64.
> >
> > The issue (started in comment #13) appears to be fixed : dn2rdn works, and
> > dirsrv starts with 'nsslapd-subtree-rename-switch: on' in dse.ldif.
> >
> > Thanks to all !
>
> Thank you for patience and assistance!
Didier, without your help, we could not find out the cause of this bug. Thank you very much for your support!

My pleasure. I really enjoy that warm, cosy feeling of being a (very small) part of the OSS community, of which Red Hat truly is an exemplary citizen.

Hi Noriko,
I have seen the steps at comment#51, my question is "is there a way I can test it with latest version of 389-ds?"
Thanks,
Amita

Since upgrade works from 8.2 to 9.0, can we mark this bug as VERIFIED?

yes.

Created attachment 398981 [details]
error log from upgrade

During the 'yum upgrade' the userRoot is deleted. See the attachment containing error logs from one of my servers. This happened on an x86_64 master and an i686 slave.